ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at and - Contact ARM for binary driver support!
ninolein has quit [Ping timeout: 264 seconds]
ninolein has joined #lima
_whitelogger has joined #lima
dddddd has quit [Remote host closed the connection]
Barada has joined #lima
guillaume_g has joined #lima
yuq825 has joined #lima
guillaume_g has quit [Quit: Konversation terminated!]
guillaume_g has joined #lima
cwabbott has joined #lima
ninolein_ has joined #lima
ninolein has quit [Ping timeout: 264 seconds]
<rellla> cwabbott: is there an example anywhere in the nir_lower/opt_* code which inserts a mov op that depends on the parent instruction?
guillaume_g has left #lima ["Konversation terminated!"]
<rellla> (('fddx', ('fabs', a)), ('fddx', ('fmov', ('fabs', a))))
<rellla> in pseudocode :)
<rellla> *opt_algebraic pseudocode
<cwabbott> rellla: first off, opt_algebraic doesn't even work after modifiers are introduced (it isn't meant to be run that late)
<cwabbott> you'll need to do it yourself, or do it in ppir
<rellla> cwabbott: i think i'll do it in ppir, because it's lima specific anyway...
<enunes> rellla: don't we always lower abs to a mov already, you need another one?
<rellla> enunes: yes, ppir_op_abs gets ppir_op_mov, but it seems that we need an extra mov if fddx has fabs as its source. i suspect it's only necessary if fabs is combined with a negation, but the blob seems to do the following all the time:
<rellla> fddy(fabs) -> fddy(fmov(fabs))
<rellla> that's probably why glsl-derivs-abs-sign fails
<enunes> so that results in fddy(mov(mov)) ?
<rellla> enunes: this is what the blob produces for glsl-derivs-abs-sign: (from cwabbott)
<cwabbott> enunes: no, it's just that the blob doesn't support abs/neg modifiers on the ddx/ddy instructions, so we have to undo the nir modifier pass
<cwabbott> apparently you can get it to do neg and abs by being a little clever, but both together doesn't work
<cwabbott> it's a little tricky since the HW op actually has two sources, presumably it works like an add where one of the sources is swapped with another pixel in the quad
<enunes> I see, so this is for when the nir src already comes with .negate set, not for when we have a separate fneg/fabs node
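The rewrite discussed above (wrapping the source of fddx/fddy in an extra mov so that abs/neg modifiers never sit directly on the derivative op) can be sketched as a toy pass over nir_opt_algebraic-style tuples. This is a standalone Python illustration, not mesa code; `DERIV_OPS`, `MODIFIER_OPS` and `insert_mov` are names made up for the sketch:

```python
# Toy sketch of the rewrite rellla describes: turn (fddx, (fabs, a)) into
# (fddx, (fmov, (fabs, a))), mimicking the (op, src...) tuple syntax of
# nir_opt_algebraic.py. NOT mesa code -- just an illustration of the idea.

DERIV_OPS = {"fddx", "fddy"}
MODIFIER_OPS = {"fabs", "fneg"}  # ops that become src modifiers in the backend

def insert_mov(expr):
    """Recursively wrap modifier-producing sources of derivative ops in fmov."""
    if not isinstance(expr, tuple):
        return expr  # leaf: a variable name like "a"
    op, *srcs = expr
    srcs = [insert_mov(s) for s in srcs]
    if op in DERIV_OPS:
        srcs = [("fmov", s) if isinstance(s, tuple) and s[0] in MODIFIER_OPS else s
                for s in srcs]
    return (op, *srcs)

print(insert_mov(("fddx", ("fabs", "a"))))
# -> ('fddx', ('fmov', ('fabs', 'a')))
```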
yuq825 has quit [Remote host closed the connection]
ente has joined #lima
dddddd has joined #lima
<rellla> hm, what happens if both the negate and absolute modifiers are set?
Unit193 has quit [Read error: Connection reset by peer]
Unit193 has joined #lima
Barada has quit [Quit: Barada]
jrmuizel has joined #lima
afaerber has quit [Quit: Leaving]
<rellla> enunes: regarding the force zswrite option, see;
<rellla> i don't really understand the cases, where it needs to be enabled automatically... i just followed anarsoul's suggestion
<anarsoul> rellla: it has to be enabled for all cases but for scanout
<rellla> anarsoul, so it does what we want already?!
<anarsoul> I think so
<anarsoul> I doubt any other app tries to read out depth/stencil buffer via glReadPixels()
<enunes> is it possible to enable it automatically when we have glReadPixels?
<enunes> or glReadPixels reading those buffers
<enunes> yeah probably not if that needs to go on the frame reg
<enunes> I guess I would vote to remove the optimization if the buffer was allocated with depth/stencil, otherwise we rely on a debug option to do things that should be valid on opengl es
<enunes> or have the debug option to do the optimization
afaerber has joined #lima
yuq825 has joined #lima
<anarsoul> enunes: I believe glReadPixels does it *after* drawing is done
<anarsoul> enunes: and removing this optimization isn't a good idea since it'll hurt performance badly
<anarsoul> basically you need 2x the memory bandwidth if you always write out the depth/stencil buffer
<anarsoul> hi yuq825
<anarsoul> enunes: IIRC I tried it and it was 25-30% FPS drop in glmark
<cwabbott> anarsoul: you can't use that optimization unless the app tells you to via glInvalidateFramebuffer(), the driver has no idea if the app will use the depth buffer in the future
<cwabbott> if the app doesn't perform as well, well then that's the app's fault, and the blob won't do any better
<anarsoul> cwabbott: blob does this optimization
<cwabbott> anarsoul: for glmark2, how does the blob know that the depth buffer is invalidated?
<anarsoul> cwabbott: I believe it just never allocates depth buffer for scanout
<cwabbott> ok, so then it sounds like you're missing some optimization
<cwabbott> it's better to be correct by default first, and then later add per-app workarounds or optimizations, than adding a flag for piglit to enable the correct thing
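For context on cwabbott's point: glInvalidateFramebuffer is the OpenGL ES 3.0 entry point by which an application declares attachment contents dead, which is what lets a tile-based driver legitimately skip the depth/stencil writeback. A minimal app-side sketch (not mesa code; `draw_frame` and the surrounding EGL setup are hypothetical, error handling omitted):

```c
/* Sketch: an app that permits the driver to skip depth/stencil writeback
 * by invalidating those attachments once drawing is done.
 * Requires an OpenGL ES 3.0 context. */
#include <GLES3/gl3.h>

void draw_frame(void)
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    /* ... issue draw calls that use depth testing ... */

    /* Declare the depth/stencil contents dead after this frame, so a
     * tiler like mali4x0 (which keeps depth in the tile buffer) need not
     * write them back to memory. For the default framebuffer the enums
     * are GL_DEPTH/GL_STENCIL; for an FBO use GL_DEPTH_ATTACHMENT etc. */
    const GLenum attachments[] = { GL_DEPTH, GL_STENCIL };
    glInvalidateFramebuffer(GL_FRAMEBUFFER, 2, attachments);

    /* eglSwapBuffers(...) */
}
```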
<rellla> personally i think we don't need this to be merged, it only showed me that the gl_FragCoord implementation was right...
yuq825 has quit [Remote host closed the connection]
yuq825 has joined #lima
<anarsoul> rellla: well, we still need it to confirm that gl_FragCoord isn't broken :)
<rellla> anarsoul: well, that's the confirmation: :)
<rellla> if i remove the debug option, the latter 2 fail.
<anarsoul> rellla: well, it's not a one-time thing
<anarsoul> what if it breaks in future?
<bshah> hm, the depth/stencil buffer topic in the messages here triggered my curiosity... so remember I talked about a simple Qt/QML app not rendering? one of the bits I can see is that qt complains about a missing depth/stencil buffer, however AFAIU it is implemented in mesa already.. : (19.1.0 release is what I am using)
buzzmarshall has joined #lima
<bshah> am I missing something basic here? or?
yuq825 has quit [Read error: Connection reset by peer]
yuq825 has joined #lima
<enunes> anarsoul: do you mean that the blob doesn't allocate the depth buffer in glmark2, or the application? if the application doesn't allocate it, then it makes sense to not do the writeback right? this is what we already have?
<enunes> other than that I agree with cwabbott , what we can also possibly do is have the env var to enable the optimization and maybe spit out a ppir_debug suggesting its use to improve performance in some applications, if it makes that big a difference
<enunes> bshah: yes it should already be supported, the discussion is about an optimization that breaks glReadPixels from the depth buffer, not sure if Qt does that
<enunes> bshah: 19.1 misses many features, I would really recommend doing your testing with master
<anarsoul> enunes: it doesn't allocate depth buffer for scanout since it makes no sense if application doesn't read it back
<anarsoul> enunes: mali4x0 uses tile buffer for depth
<bshah> enunes: hm, now I wonder why qt thinks it is not supported.. hmm
<bshah> I'll read qt code I guess :)
<enunes> well it can't know if the application will want to read it or not
yuq825 has quit [Remote host closed the connection]
<bshah> question, if I want to grab apitrace for debugging application with lima, should I use any special args?
<bshah> or simply apitrace trace --api egl would do?
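On bshah's question: no lima-specific arguments should be needed, since apitrace only records the EGL/GLES calls the app makes. A typical session might look like this (assuming apitrace is installed; `./yourapp` is a placeholder):

```shell
# Record: wraps the app and writes yourapp.trace in the current directory
apitrace trace --api egl ./yourapp

# Inspect: human-readable dump of the recorded calls
apitrace dump yourapp.trace

# Replay the trace against the driver (eglretrace ships with apitrace)
eglretrace yourapp.trace
```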
guillaume_g has joined #lima
drod has joined #lima
deesix has quit [Ping timeout: 246 seconds]
jkucia has joined #lima
deesix has joined #lima
guillaume_g has quit [Quit: Konversation terminated!]
deesix has quit [Ping timeout: 244 seconds]
deesix has joined #lima
<anarsoul> enunes: what would you read it for? :)
buzzmarshall has quit [Remote host closed the connection]
drod has quit [Quit: Leaving you (xchat 2.4.5 or later)]
afaerber has quit [Quit: Leaving]
afaerber has joined #lima
drod has joined #lima
drod has quit [Remote host closed the connection]
adjtm has joined #lima
jrmuizel has quit [Remote host closed the connection]