ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!
gcl has joined #lima
<enunes> rellla anarsoul well this is what I have for now, it fixes the same set of tests as the previous branch but looks less weird https://gitlab.freedesktop.org/enunes/mesa/commits/ppir-liveness-fixes-2
<enunes> could still check the last patch for a missing dep in ppir scheduler, but too late for today
<anarsoul> yeah, last patch looks fishy
<anarsoul> could you submit an MR with first 3 patches?
<anarsoul> feel free to add my r-b to 1-3
<anarsoul> (you also may need to fix deqp-lima-fails.txt)
<enunes> I suppose deqp-lima-fails.txt is done per-patch? or are people just adding a patch on top of the series with overall fixes?
<anarsoul> per-patch is cleaner
<anarsoul> but probably more time consuming
<enunes> yeah in that case I guess tomorrow I'll do that
<enunes> I guess I'll submit anyway to see if it finds the same thing as my run
<anarsoul> you can just push 3 branches
<anarsoul> gather unexpected passes
<anarsoul> and then finish it tomorrow :)
<anarsoul> if you push a branch prefixed with ci- you can start jobs you need manually
<anarsoul> see "pipelines" in your repo
<anarsoul> enunes: btw I'm not sure how we're going to add depth/stencil writes to ppir
<enunes> I never looked into that either
<anarsoul> so currently we're using $0 for gl_FragColor
<anarsoul> technically it can be any other register, the register number is configured in the RSW
<anarsoul> (I wrote about it a few months ago - accidentally discovered it while implementing dithering)
<enunes> yeah the way we use $0 is pretty weird, we just leave it completely out of regalloc which leaves it at 0
<anarsoul> yeah
<anarsoul> so I assume for depth/stencil it's similar
<enunes> I considered adding other register classes to output registers and ensure they are what we want
<anarsoul> we configure which reg to use in rsw
<anarsoul> and it's written as depth/stencil from this register when shader is terminated
<anarsoul> but
<anarsoul> ppir assumes that st_col is last op
<anarsoul> and it won't be true anymore with st_zs
<enunes> I investigated whether we could get rid of st_col completely and just mark registers as output registers and use that to end the program
<anarsoul> yeah, that's what we need
<anarsoul> or something similar
<enunes> but there are all sorts of corner cases with discard or jumping to that st_col, so it needs to exist
<enunes> I guess having register classes for output and labeling the registers properly to make regalloc assign the registers we want to them doesn't require a major redesign
<anarsoul> I think we still have to get rid of the requirement that it be the last node
<anarsoul> enunes: whichever register works.
<anarsoul> it doesn't have to be reg 0 for color
<anarsoul> I already tried that by changing it to 1
<anarsoul> I guess we should just adjust its liveness accordingly (from instr to the end), let regalloc do its job and then provide the number to upper level
<enunes> that should work too
<anarsoul> there's not a lot of unused bits in rsw
<anarsoul> reg number is 4 bits
<anarsoul> :)
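The approach anarsoul sketches above (let the output register's liveness run from its defining instruction to the end of the program, then let regalloc pick the number and report it upward) could look roughly like this; the per-instruction live-set lists here are hypothetical stand-ins for ppir's real structures:

```python
# Sketch: instead of hardcoding $0 for the output, keep the output register
# live from its defining instruction to the end of the program, let register
# allocation pick any register, and report the chosen number to the upper
# level so the RSW can be programmed accordingly. Data structures are
# hypothetical, not actual ppir types.

def extend_output_liveness(live_in, def_index, out_reg):
    """live_in: per-instruction live sets; out_reg stays live after def_index."""
    for i in range(def_index + 1, len(live_in)):
        live_in[i].add(out_reg)
    return live_in

live = [set(), {"r1"}, set(), set()]   # output defined at instruction 1
print(extend_output_liveness(live, 1, "out"))
```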
<enunes> sounds good, do you see this as more priority than ppir general improvements like scheduler?
<anarsoul> so if you change "render->multi_sample = 0x0000F807" to 0x1111F807, gl_FragColor will be taken from $1
<anarsoul> enunes: well, I think that it may have influence on scheduler redesign
<enunes> must look nice taking color from $1 with the current implementation
<anarsoul> :D
<anarsoul> try it
<anarsoul> it's like white noise projected on something
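The multi_sample trick above can be illustrated with a small sketch; the field layout (four copies of the register index in the high nibbles) is an assumption inferred only from the single observed change 0x0000F807 → 0x1111F807 selecting $1:

```python
# Sketch of how the gl_FragColor source register appears to be encoded in the
# RSW multi_sample word. Assumption: the four high nibbles (bits 16..31) each
# hold the register index, inferred from 0x0000F807 -> 0x1111F807 selecting $1.

def set_frag_color_reg(multi_sample: int, reg: int) -> int:
    """Patch the four high nibbles with the register index."""
    assert 0 <= reg < 16                # reg number is 4 bits, as noted above
    multi_sample &= 0x0000FFFF          # clear the four register nibbles
    for shift in (16, 20, 24, 28):      # presumably one nibble per sample
        multi_sample |= reg << shift
    return multi_sample

print(hex(set_frag_color_reg(0x0000F807, 1)))  # -> 0x1111f807
```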
<anarsoul> enunes: looks like your change isn't based on master
<anarsoul> rebase? :)
<anarsoul> (it won't run lima jobs in CI)
<enunes> yeah that kinda works, some programs just show as red and others have some random noise
<enunes> Fix a few exposed regressions with the new ppir liveness analysis patch.
<enunes> oops, mispaste
<enunes> well, gonna continue tomorrow, bye for now
<anarsoul> good night
buzzmarshall has quit [Quit: Leaving]
chewitt has quit [Quit: Zzz..]
jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]
ente has quit [Read error: Connection reset by peer]
jernej has joined #lima
ente has joined #lima
<anarsoul> darn, chroot doesn't work into 32-bit userspace for some reason :\
<anarsoul> it chroots and immediately exits
megi has quit [Ping timeout: 268 seconds]
chewitt has joined #lima
<anarsoul> missing syscall
<anarsoul> now it's working
chewitt has quit [Read error: Connection reset by peer]
chewitt has joined #lima
Barada has joined #lima
Barada has quit [Ping timeout: 258 seconds]
Barada has joined #lima
dddddd has joined #lima
yuq825 has joined #lima
Barada has quit [Ping timeout: 268 seconds]
Barada has joined #lima
<anarsoul> yuq825: hi, do you know if drm_sched exports gpu usage statistics somewhere?
<anarsoul> mostly interested in idle and busy time
<yuq825> it does not do this, the driver needs to do it
<anarsoul> I see
<anarsoul> I was kind of hoping for something like gputop :)
<yuq825> tools like gpuvis uses trace point for statistics
deesix has quit [Ping timeout: 258 seconds]
dddddd has quit [Ping timeout: 268 seconds]
deesix has joined #lima
<yuq825> and we do not use any GPU performance counters yet
<rellla> yuq825: i tested the heap kernel and mesa patches and didn't notice anything negative. should i see something positive? :)
<rellla> @all: i'm still working on depth/stencil/clear/scissor things. dEQP-GLES2.functional.depth_stencil_clear.depth_* fail and i wonder why https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/gallium/drivers/lima/lima_draw.c#L878 is always 0?
<yuq825> thanks, just no change is OK, unless there was a test case that failed before due to needing > 1MB heap memory
<anarsoul> rellla: i.e. ctx->rasterizer->base.scissor is 0?
<rellla> one moment
<anarsoul> because rellla I guess I know what's the issue :)
<anarsoul> oops
<anarsoul> s/because//
<anarsoul> :)
<anarsoul> we don't reload depth/stencil
<anarsoul> so depth_scissored, stencil_scissored, etc won't work
dddddd has joined #lima
<anarsoul> rellla: mali400 can write depth and stencil in fragment shader
<anarsoul> since RSW controls which register to use for gl_FragColor it likely controls which register to use for depth and which for stencil
<anarsoul> also you likely need to set some bit to enable write of depth/stencil from fragment shader
<rellla> hm, let me think about it
<rellla> sorry, was another test, but i think they are related
<rellla> it is dEQP-GLES2.functional.color_clear.long_masked_rgb
<anarsoul> oh, OK
<anarsoul> color clears should work fine
<anarsoul> "0x0C [3] depth test" has 12 bits that are not described
<anarsoul> I bet depth register number and bits to enable depth/stencil write are somewhere here
<rellla> anyway, http://imkreisrum.de/deqp/tmp/ is what lima does (including the mesa api log) and http://imkreisrum.de/deqp/tmp2/ is what mali does
<rellla> mali sets PLBU CMD SCISSORS according to glScissor(), but lima/mesa always sets it to fb dimensions
<anarsoul> that's how clear works in gallium
<anarsoul> iirc it draws a quad if it needs scissored clear
<anarsoul> so scissor set to full fb is expected
<rellla> ok then.
<anarsoul> I'd say something's up with blending
<anarsoul> anyway, it's pretty late here
<anarsoul> good night :)
<anarsoul> rellla: depth/stencil clear issues are unlikely to be related to this particular test
<rellla> blending is also what i guess. for now, i think mali does some magic within ALPHA_BLEND...
yuq825 has quit [Remote host closed the connection]
yuq825 has joined #lima
Barada has quit [Quit: Barada]
<anarsoul|c> Yeah
cwabbott has joined #lima
megi has joined #lima
Xalius has joined #lima
Barada has joined #lima
buzzmarshall has joined #lima
<MoeIcenowy> anarsoul: for the flickering texture issue
<MoeIcenowy> I think I may have triggered it on KWinn
<MoeIcenowy> which I can capture apitrace
<MoeIcenowy> s/KWinn/KWin/
monstr has joined #lima
<MoeIcenowy> okay my apitrace fails for KWin
Barada has quit [Quit: Barada]
<MoeIcenowy> apitrace cannot work for GLES KWin
yuq825 has quit [Quit: Leaving.]
deesix has quit [Ping timeout: 268 seconds]
dddddd has quit [Ping timeout: 268 seconds]
deesix has joined #lima
dddddd has joined #lima
<anarsoul> MoeIcenowy: it can
<anarsoul> just use --api=egl
monstr has quit [Remote host closed the connection]
chewitt has quit [Ping timeout: 240 seconds]
deesix has quit [Ping timeout: 265 seconds]
dddddd has quit [Ping timeout: 265 seconds]
deesix has joined #lima
chewitt has joined #lima
dddddd has joined #lima
<UnivrslSuprBox> Just built master merged with !3502. The color issue has not been fixed.
<anarsoul> UnivrslSuprBox: likely requires last commit (missing dependency)
<UnivrslSuprBox> Missing something?
<anarsoul> it's not been finalized yet
<anarsoul> well, wait for enunes to reply :)
<enunes> UnivrslSuprBox: so your issue was likely the patch which is still not in -2. please wait some more
<UnivrslSuprBox> No problem
<enunes> UnivrslSuprBox: cherry-picking https://gitlab.freedesktop.org/enunes/mesa/commit/d01ffcd944bf87b77acd3150a48b1a7dd664ca3a on top of that branch should fix it
<enunes> but that one still needs more investigation before it goes to a MR
<UnivrslSuprBox> Alright. Building now.
<UnivrslSuprBox> Yes, d01ffcd fixed the issue all the way.
<anarsoul> UnivrslSuprBox: but it's likely workaround, not a fix
buzzmarshall has quit [Remote host closed the connection]
<anarsoul> enunes: it looks like vertes shader actually differs *a lot* in https://gitlab.freedesktop.org/lima/mesa/issues/127 and mesa master
<anarsoul> *vertex
<enunes> anarsoul: great, more gpir fun then?
<anarsoul> maybe
<enunes> did you generate yours in that chroot?
<anarsoul> I asked Adam to share LIMA_DEBUG=dump,gp for the same mesa that I have (at 3abfde13be198449230e48c5f277e0b62a0e96c4)
<anarsoul> I generated mine in archlinux chroot
<anarsoul> the shader binary in my case is ~30% larger
<anarsoul> let's see what we get from nir :)
<anarsoul> fragment shader is the same
<anarsoul> (but it's expected since it's trivial)
<enunes> so is there a plausible explanation for why it would be different in your 32-bit chroot and in theirs?
<anarsoul> not yet
<anarsoul> waiting for them to send the logs with exactly the same mesa as I'm using
yann has quit [Ping timeout: 265 seconds]
<enunes> anarsoul: yeah I'm not seeing any easy way around that problem; nir dce doesn't detect it as dead code while still in ssa form because of the vec4 instruction, and after it converts vec4 to reg movs, nir dce doesn't work on non-ssa regs
<enunes> it doesn't really matter where the ppir scheduler places it
<enunes> az and aw are not detected as dead code
<anarsoul> can you show the nir representation?
<anarsoul> enunes: yet I don't understand why it's an issue for ppir
<anarsoul> let me grab some coffee
<enunes> instructions 3 and 4 on that paste are dead, ideally nir should eliminate them but unfortunately it currently doesn't
<enunes> the actual issue is in a bit more complex shader, https://paste.centos.org/view/raw/65e99c2d
<enunes> instruction 6 is a dead load just like those ones, since nobody conflicts with it, regalloc assigns any register to it, which may be, say $0, which might contain some actual value
<anarsoul> but it's a bug in regalloc then
<anarsoul> so let's look at nir
<enunes> well yeah the algorithm for liveness analysis assumes that dead code like that doesn't exist, which is pretty much always true
<enunes> or we can detect these corner cases and put them in a live set, so regalloc won't assign a live register for it
<enunes> which is what my patch does
<anarsoul> so
<anarsoul> line (10) is dead code, right?
<enunes> yes
<anarsoul> and its live range is 10-10
<anarsoul> so it's not zero
<enunes> there are no live ranges, only live sets
<anarsoul> OK, live set :)
<anarsoul> yet r0 is vec4
<anarsoul> so I don't really understand where the conflict is coming from
<anarsoul> are you calculating per-component live sets for registers?
<anarsoul> even if you do
<anarsoul> r0.y conflicts with r0.x
<anarsoul> since r0.x is live 9-13
<enunes> registers are carried on by masks
<enunes> and analysis is backwards
<enunes> so something at the next instruction needs to read it so it's part of that instruction live_in
<anarsoul> but you also need to update live_out if something in current instruction writes to the reg
<anarsoul> well
<enunes> no because live_in from the next instruction is propagated as live_out to the previous instruction
<enunes> so it comes live, and dies there when it is written
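The backwards live-set propagation enunes describes (live_in of the next instruction becomes live_out of the previous one; reads add a register to the set, writes kill it) can be sketched as follows, with instructions reduced to hypothetical (reads, writes) pairs rather than real ppir ops:

```python
# Minimal sketch of backwards liveness: walk the program from the end, kill
# registers at their write and add them at their reads. live_in of instruction
# i+1 is the live_out of instruction i.

def compute_live_sets(instrs):
    """instrs: list of (reads, writes) sets. Returns live_in per instruction."""
    live_in = [set() for _ in instrs]
    live = set()                         # live_out at the end of the program
    for i in range(len(instrs) - 1, -1, -1):
        reads, writes = instrs[i]
        live = (live - writes) | reads   # kill writes, then add reads
        live_in[i] = set(live)
    return live_in

# r1 = load; r2 = r1 + 1; store r2
prog = [(set(), {"r1"}), ({"r1"}, {"r2"}), ({"r2"}, set())]
print(compute_live_sets(prog))           # r1 live into instr 1, r2 into instr 2
```

Note that a register which is written but never read afterwards never enters any live set, which is exactly the dead-code corner case discussed above: regalloc then sees no conflicts for it and may assign it a register that holds a live value.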
<anarsoul> let me guess, our granularity doesn't allow us to specify this live set? :)
<enunes> it's just the life cycle starts when it is read, and we propagate it up until it is written
<anarsoul> what if it's written several times?
<anarsoul> enunes: hold on
<anarsoul> what happens if we read and write in the same instruction?
<enunes> the read will prevail and it will stay in the live set
<enunes> until the "first" write
<anarsoul> but write is in this instr
<enunes> but it was never read, since it runs backwards it was never part of any set
<anarsoul> I guess write should also start life cycle but at the same time end it
<enunes> it's what my patch does pretty much
<enunes> but I can't put it into live_in otherwise it will be propagated up looking for another write
<anarsoul> but what do you do if we have multiple writes?
<anarsoul> and single read
<enunes> I think ppir is then pretty much all broken, not sure it ever happens in nir
<enunes> I think nir will just coalesce them and put a phi node
<anarsoul> that's for ssa
<anarsoul> but phi is then converted to a reg
<anarsoul> enunes: we definitely can have multiple writes to a reg from different blocks
<champagneg> hi, i'm still using arm's libmali, but I was wondering if it's normal/expected to see up to 3 "PLBU needs more heap memory" interrupts per frame? If it is not, is this something lima would fix?
<anarsoul> otherwise it would be ssa, not nir reg
<anarsoul> champagneg: we don't provide support for the blob
<anarsoul> ask ARM
<enunes> I tried to force that but nir always resolves it into a single write, and we can also have cases with multiple writes in case of loops, but liveness is designed to handle that
<enunes> in that case it's not ssa
<champagneg> anarsoul: yeah, I get that. I was wondering if anyone with more architecture knowledge would know if lima currently does this back and forth to allocate more memory for the PLBU.
<anarsoul> something like: uniform int a; void main() { float f; if (a == 1) { f = 1.0; } else { f = 2.0; } gl_FragColor = vec4(f, 0.0, 0.0, 1.0); } should do multiple writes
<anarsoul> enunes: ^^
<anarsoul> champagneg: currently we allocate 1MB for the tile heap; there are kernel and mesa patches to support a growing heap (up to 16MB)
<enunes> anarsoul: that is far too simple it resolves to a fcsel after the if
<anarsoul> darn
<anarsoul> :)
<anarsoul> disable control flow flattening?
<enunes> even this resolves to a single write https://paste.centos.org/view/raw/660a972a
<anarsoul> I see
<enunes> but yeah, if that happens, it's bad
<enunes> I suppose we'd have picked it up as a regression back when I was writing that, though
<anarsoul> you should terminate the live set with a write only if the block doesn't have predecessors that write to this reg
<anarsoul> btw that's how loops are supposed to be handled
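Extending the backwards analysis to multiple blocks and loops, as suggested above, is the classic dataflow fixed-point iteration: each block's live_in is propagated to the live_out of its predecessors (equivalently, a block's live_out is the union of its successors' live_in) until nothing changes. The block summaries here are hypothetical (reads, writes) sets, not real ppir blocks:

```python
# Sketch of cross-block liveness with a fixed-point loop; back edges (loops)
# are handled by simply iterating until the sets stabilize.

def block_liveness(blocks, succs):
    """blocks: {name: (reads, writes)} summarizing each block.
    succs: {name: [successor names]}. Returns live_in per block."""
    live_in = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b, (reads, writes) in blocks.items():
            # live_out is the union of successors' live_in
            live_out = set().union(*(live_in[s] for s in succs[b])) if succs[b] else set()
            new_in = (live_out - writes) | reads
            if new_in != live_in[b]:
                live_in[b] = new_in
                changed = True
    return live_in

# loop: header reads i, body writes i and jumps back to header
blocks = {"header": ({"i"}, set()), "body": (set(), {"i"}), "exit": (set(), set())}
succs = {"header": ["body", "exit"], "body": ["header"], "exit": []}
print(block_liveness(blocks, succs))
```

The write in the loop body kills i for the body's own live_in while still keeping it live across the back edge into the header, which is the behavior the loop handling needs.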
<enunes> it's a different issue though, I am looking for a solution to fix that regression so we have something usable for 20.0
<anarsoul> OK
<anarsoul> then push your current fix
<enunes> I think it's not feasible to improve nir dce or implement full ppir dce without refactors
<anarsoul> fair enough
<anarsoul> let's fix it for 20.0 then
<enunes> I could implement some early ppir dce that just replaces the dead writes with writes to ^discard
<enunes> removing dead ops is not trivial without rescheduling
<champagneg> anarsoul: Ok, thank you. The heap size is decided by the userspace driver, but the kernel driver allocates the memory, right?
<anarsoul> champagneg: I have no idea how it's done in blob
<champagneg> anarsoul: Fair enough, I dont expect anyone but ARM/xlnx to know that ;). I am asking how it's done in mesa/lima
<anarsoul> champagneg: mesa calls lima_bo_alloc() which results in ioctl to kernel driver
<champagneg> cool, thanks.
<anarsoul> enunes: why is it not trivial without rescheduling?
<enunes> because registers are allocated per instruction, and if there are other ops in the instruction I want to remove, I can't remove it
<enunes> I considered doing it per-op instead, but I don't remember what was the blocker I encountered...
<anarsoul> remove ops, not instruction
<anarsoul> enunes: btw, why not do it before scheduling?
<anarsoul> oh, won't work, you need liveness info
<enunes> yeah I considered that, if I could run liveness analysis per-op, I could do it at any time
<enunes> I think it's that our op dependency graph is not complete currently, it doesn't go across blocks
<anarsoul> probably worth doing it and then translating liveness to instruction level
<enunes> so the only time I have something sensible to iterate across the entire program is after we have scheduled instructions
<enunes> but technically we could have sets of live registers without instructions
<anarsoul> enunes: it shouldn't matter whether you iterate over instructions
<anarsoul> or over ops
<anarsoul> actually it totally makes sense to do liveness analysis before scheduling
<anarsoul> and do scheduling decisions based on liveness info
<enunes> yeah but I found it's hard to follow the order of ops backwards currently
<anarsoul> enunes: IIRC we print it during debug
<enunes> there is no pred/succ across blocks
<anarsoul> there is pred/succ for blocks, that's all you need
<enunes> I think there is only succ
<anarsoul> you calculate liveness in one block, then propagate it
<anarsoul> yeah, you're right
<anarsoul> we can walk it once to collect predecessors :)
<enunes> yeah it's totally possible
<enunes> I guess it's part of "things that could be much better in ppir" :)
<enunes> not sure if it fixes many applications though
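Deriving block predecessors with a single walk over the existing successor lists, as proposed above, is straightforward; the block names here are hypothetical:

```python
# One-pass derivation of predecessors from successor lists, so the analysis
# can propagate live sets backwards across blocks even though ppir only
# stores successors today.

def collect_preds(succs):
    """succs: {block: [successor blocks]} -> {block: [predecessor blocks]}"""
    preds = {b: [] for b in succs}
    for b, ss in succs.items():
        for s in ss:
            preds[s].append(b)
    return preds

print(collect_preds({"a": ["b"], "b": ["a", "c"], "c": []}))
```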
Xalius has quit [Quit: Leaving]
Xalius has joined #lima
warpme_ has quit [Quit: Connection closed for inactivity]