#lima on 2020-01-22 — irc logs at freenode.irclog.whitequark.org

2019-07-03 10:24 ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!

00:07 gcl has joined #lima

00:20 <enunes> rellla anarsoul well this is what I have for now, it fixes the same set of tests as the previous branch but looks less weird https://gitlab.freedesktop.org/enunes/mesa/commits/ppir-liveness-fixes-2

00:21 <enunes> could still check the last patch for a missing dep in ppir scheduler, but too late for today

00:22 <anarsoul> yeah, last patch looks fishy

00:23 <anarsoul> could you submit an MR with first 3 patches?

00:23 <anarsoul> feel free to add my r-b to 1-3

00:23 <anarsoul> (you also may need to fix deqp-lima-fails.txt)

00:27 <enunes> I suppose deqp-lima-fails.txt is done per-patch? or are people just adding a patch on top of the series with overall fixes?

00:29 <anarsoul> per-patch is cleaner

00:29 <anarsoul> but probably more time consuming

00:29 <enunes> yeah in that case I guess tomorrow I'll do that

00:29 <enunes> I guess I'll submit anyway to see if it finds the same thing as my run

00:29 <anarsoul> you can just push 3 branches

00:30 <anarsoul> gather unexpected passes

00:30 <anarsoul> and then finish it tomorrow :)

00:30 <anarsoul> if you push a branch prefixed with ci- you can start jobs you need manually

00:30 <anarsoul> see "pipelines" in your repo

00:37 <anarsoul> enunes: btw I'm not sure how we're going to add depth/stencil writes to ppir

00:38 <enunes> I never looked into that either

00:38 <anarsoul> so currently we're using $0 for gl_FragColor

00:38 <anarsoul> technically it can be any other register, number of reg is configured in rsw

00:39 <anarsoul> (I wrote about it few months ago - accidentally discovered while implementing dithering)

00:39 <enunes> yeah the way we use $0 is pretty weird, we just leave it completely out of regalloc which leaves it at 0

00:39 <anarsoul> yeah

00:39 <anarsoul> so I assume for depth/stencil it's similar

00:39 <enunes> I considered adding other register classes to output registers and ensure they are what we want

00:39 <anarsoul> we configure which reg to use in rsw

00:40 <anarsoul> and it's written as depth/stencil from this register when shader is terminated

00:40 <anarsoul> but

00:40 <anarsoul> ppir assumes that st_col is last op

00:40 <anarsoul> and it won't be true anymore with st_zs

00:41 <enunes> I investigated whether we could get rid of st_col completely and just mark registers as output registers and use that to end the program

00:41 <anarsoul> yeah, that's what we need

00:41 <anarsoul> or something similar

00:41 <enunes> but there are all sorts of corner cases with discard or jumping to that st_col, so it needs to exist

00:42 <enunes> I guess having register classes for output and labeling the registers properly to make regalloc assign the registers we want to them doesnt require a major redesign

00:42 <anarsoul> I think we still have to get rid of requirement it being the last node

00:43 <anarsoul> enunes: whichever register works.

00:43 <anarsoul> it doesn't have to be reg 0 for color

00:43 <anarsoul> I already tried that by changing it to 1

00:44 <anarsoul> I guess we should just adjust its liveness accordingly (from instr to the end), let regalloc do its job and then provide the number to upper level

00:45 <enunes> that should work too

00:46 <anarsoul> there's not a lot of unused bits in rsw

00:46 <anarsoul> reg number is 4 bits

00:46 <anarsoul> :)

00:48 <enunes> sounds good, do you see this as more priority than ppir general improvements like scheduler?

00:50 <anarsoul> so if you change "render->multi_sample = 0x0000F807" to 0x1111F807, gl_FragColor will be taken from $1

00:50 <anarsoul> enunes: well, I think that it may have influence on scheduler redesign

00:50 <enunes> must look nice taking color from $1 with the current implementation

00:51 <anarsoul> :D

00:51 <anarsoul> try it

00:51 <anarsoul> it's like white noise projected on something

00:54 <anarsoul> enunes: looks like your change isn't based on master

00:54 <anarsoul> rebase? :)

00:55 <anarsoul> (it won't run lima jobs in CI)

00:55 <enunes> yeah that kinda works, some programs just show as red and others have some random noise

00:57 <anarsoul> enunes: that should fix it : https://gist.github.com/anarsoul/aec916571d258f3e541229a5f7ee3b9d

00:57 <enunes> Fix a few exposed regressions with the new ppir liveness analysis patch.

00:57 <enunes> oops, mispaste

00:58 <enunes> well, gonna continue tomorrow, bye for now

00:58 <anarsoul> good night

01:39 buzzmarshall has quit [Quit: Leaving]

02:42 chewitt has quit [Quit: Zzz..]

02:52 jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]

02:55 ente has quit [Read error: Connection reset by peer]

02:55 jernej has joined #lima

03:30 ente has joined #lima

04:04 <anarsoul> darn, chroot doesn't work into 32-bit userspace for some reason :\

04:04 <anarsoul> it chroots and immediately exits

04:20 megi has quit [Ping timeout: 268 seconds]

04:24 chewitt has joined #lima

04:28 <anarsoul> missing syscall

04:28 <anarsoul> now it's working

04:47 chewitt has quit [Read error: Connection reset by peer]

04:47 chewitt has joined #lima

05:21 Barada has joined #lima

05:27 Barada has quit [Ping timeout: 258 seconds]

06:45 Barada has joined #lima

06:51 dddddd has joined #lima

06:58 yuq825 has joined #lima

07:00 Barada has quit [Ping timeout: 268 seconds]

07:05 Barada has joined #lima

07:57 <anarsoul> yuq825: hi, do you know if drm_sched exports gpu usage statistics somewhere?

07:58 <anarsoul> mostly interested in idle and busy time

07:59 <yuq825> it does not do this, need driver do

08:00 <anarsoul> I see

08:01 <anarsoul> I was kind of hoping for something like gputop :)

08:06 <yuq825> tools like gpuvis uses trace point for statistics

08:07 deesix has quit [Ping timeout: 258 seconds]

08:08 dddddd has quit [Ping timeout: 268 seconds]

08:08 deesix has joined #lima

08:10 <yuq825> and we does not use any GPU performance counters yet

08:10 <rellla> yuq825: i tested heap kernel and mesa patches and didn't notice sth negative. should i see sth positive? :)

08:15 <rellla> @all: i still working on depth/stencil/clear/scissor things. dEQP-GLES2.functional.depth_stencil_clear.depth_* fail and i wonder why https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/gallium/drivers/lima/lima_draw.c#L878 is always 0?

08:16 <yuq825> thanks, just no change is OK, unless there is a test case which fail due to need > 1MB heap memory before

08:17 <anarsoul> rellla: i.e. ctx->rasterizer->base.scissor is 0?

08:18 <rellla> one moment

08:18 <anarsoul> because rellla I guess I know what's the issue :)

08:18 <anarsoul> oops

08:18 <anarsoul> s/because//

08:18 <anarsoul> :)

08:18 <anarsoul> we don't reload depth/stencil

08:20 <anarsoul> so depth_scissored, stencil_scissored, etc won't work

08:21 dddddd has joined #lima

08:21 <anarsoul> rellla: mali400 can write depth and stencil in fragment shader

08:21 <anarsoul> since RSW controls which register to use for gl_FragColor it likely controls which register to use for depth and which for stencil

08:22 <anarsoul> also you likely need to set some bit to enable write of depth/stencil from fragment shader

08:22 <rellla> hm, let me think about it

08:23 <rellla> sorry, was another test, but i think they are related

08:23 <rellla> it is dEQP-GLES2.functional.color_clear.long_masked_rgb

08:23 <anarsoul> oh, OK

08:23 <anarsoul> color clears should work fine

08:24 <anarsoul> see https://web.archive.org/web/20171026123213/http://limadriver.org/Render_State/

08:24 <anarsoul> "0x0C [3] depth test" has 12 bits that are not described

08:25 <anarsoul> I bet depth register number and bits to enable depth/stencil write are somewhere here

08:25 <rellla> anyway, http://imkreisrum.de/deqp/tmp/ is what lima does (including the mesa api log) and http://imkreisrum.de/deqp/tmp2/ is what mali does

08:28 <rellla> mali sets PLBU CMD SCISSORS according to glScissor(), but lima/mesa always sets it to fb dimensions

08:28 <anarsoul> that's how clear works in gallium

08:29 <anarsoul> iirc it draws a quad if it needs scissored clear

08:29 <anarsoul> so scissor set to full fb is expected

08:29 <rellla> ok then.

08:30 <anarsoul> I'd say something's up with blending

08:31 <anarsoul> anyway, it's pretty late here

08:31 <anarsoul> good night :)

08:31 <anarsoul> rellla: depth/stencil clear issues are unlikely to be related to this particular test

08:32 <rellla> blending is also what i guess. for now, i think mali does some magic within ALPHA_BLEND...

08:35 yuq825 has quit [Remote host closed the connection]

08:35 yuq825 has joined #lima

08:37 Barada has quit [Quit: Barada]

08:38 <anarsoul|c> Yeah

09:31 cwabbott has joined #lima

10:36 megi has joined #lima

11:41 Xalius has joined #lima

11:43 Barada has joined #lima

12:32 buzzmarshall has joined #lima

13:05 <MoeIcenowy> anarsoul: for the flickering texture issue

13:05 <MoeIcenowy> I think I may triggered it on KWinn

13:05 <MoeIcenowy> which I can capture apitrace

13:05 <MoeIcenowy> s/KWinn/KWin/

13:24 monstr has joined #lima

13:38 <MoeIcenowy> okay my apitrace fails for KWin

13:48 Barada has quit [Quit: Barada]

14:17 <MoeIcenowy> apitrace cannot work for GLES KWin

14:42 yuq825 has quit [Quit: Leaving.]

14:59 deesix has quit [Ping timeout: 268 seconds]

15:00 dddddd has quit [Ping timeout: 268 seconds]

15:06 deesix has joined #lima

15:12 dddddd has joined #lima

16:15 <anarsoul> MoeIcenowy: it can

16:16 <anarsoul> just use --api=egl

17:12 monstr has quit [Remote host closed the connection]

17:32 chewitt has quit [Ping timeout: 240 seconds]

17:36 deesix has quit [Ping timeout: 265 seconds]

17:36 dddddd has quit [Ping timeout: 265 seconds]

17:37 deesix has joined #lima

17:45 chewitt has joined #lima

17:49 dddddd has joined #lima

17:50 <UnivrslSuprBox> Just built master merged with !3502. The color issue has not been fixed.

17:57 <anarsoul> UnivrslSuprBox: likely requires last commit (missing dependency)

17:59 <UnivrslSuprBox> https://github.com/ubports/mesa/commits/xenial_-_edge_-_mesa_-_ppir-liveness-fixes

17:59 <UnivrslSuprBox> Missing something?

18:00 <anarsoul> it's not been finalized yet

18:00 <anarsoul> well, wait for enunes to reply :)

18:01 <enunes> UnivrslSuprBox: so your issue was likely the patch which is still not in -2. please wait some more

18:01 <UnivrslSuprBox> No problem

18:03 <enunes> UnivrslSuprBox: cherry-picking https://gitlab.freedesktop.org/enunes/mesa/commit/d01ffcd944bf87b77acd3150a48b1a7dd664ca3a on top of that branch should fix it

18:03 <enunes> but that one still needs more investigation before it goes to a MR

18:05 <UnivrslSuprBox> Alright. Building now.

19:19 <UnivrslSuprBox> Yes, d01ffcd fixed the issue all the way.

19:37 <anarsoul> UnivrslSuprBox: but it's likely workaround, not a fix

19:50 buzzmarshall has quit [Remote host closed the connection]

19:50 <anarsoul> enunes: it looks like vertes shader actually differs *a lot* in https://gitlab.freedesktop.org/lima/mesa/issues/127 and mesa master

19:50 <anarsoul> *vertex

19:52 <enunes> anarsoul: great, more gpir fun then?

19:53 <anarsoul> maybe

19:53 <enunes> did you generate yours in that chroot?

19:53 <anarsoul> I asked Adam to share LIMA_DEBUG=dump,gp for the same mesa that I have (at 3abfde13be198449230e48c5f277e0b62a0e96c4)

19:53 <anarsoul> I generated mine in archlinux chroot

19:55 <anarsoul> shader binary for in my case is ~30% larger

19:55 <anarsoul> let's see what we get from nir :)

19:56 <anarsoul> fragment shader is the same

19:56 <anarsoul> (but it's expected since it's trivial)

20:02 <enunes> so is there a plausible explanation for why it would be different in your 32-bit chroot and in theirs?

20:13 <anarsoul> not yet

20:14 <anarsoul> waiting for them to send the logs with exactly the same mesa as I'm using

20:24 yann has quit [Ping timeout: 265 seconds]

21:21 <enunes> anarsoul: yeah I'm not seeing any easy way around that problem, nir dce doesn't detect it as dead code while still on ssa form because of vec4 instruction and after it converts vec4 to reg movs, nir dce doesn't work on non-ssa regs

21:21 <enunes> it doesn't really matter where ppir scheduler place it

21:21 <enunes> the issue is this https://paste.centos.org/view/raw/593d3900

21:21 <enunes> az and aw are not detected as dead code

21:21 <anarsoul> can you show nir represenation?

21:22 <anarsoul> enunes: yet I don't understand why it's an issue for ppir

21:22 <enunes> https://paste.centos.org/view/raw/6a4f4569

21:23 <anarsoul> let me grab some coffee

21:24 <enunes> instructions 3 and 4 on that paste are dead, ideally nir should eliminate them but unfortunately it currently doesn't

21:25 <enunes> the actual issue is in a bit more complex shader, https://paste.centos.org/view/raw/65e99c2d

21:26 <enunes> instruction 6 is a dead load just like those ones, since nobody conflicts with it, regalloc assigns any register to it, which may be, say $0, which might contain some actual value

21:27 <anarsoul> but it's a bug in regalloc then

21:28 <anarsoul> so let's look at nir

21:29 <enunes> well yeah the algorithm for liveness analysis assumes that dead code like that doesn't exist, which is pretty much always true

21:29 <enunes> or we can detect these corner cases and put them in a live set, so regalloc won't assign a live register for it

21:29 <enunes> which is what my patch does

21:30 <anarsoul> so

21:30 <anarsoul> https://gist.github.com/anarsoul/99a531bce0ef954045af8174acaa493b

21:30 <anarsoul> line (10) is dead code, right?

21:30 <enunes> yes

21:30 <anarsoul> and it's live range is 10-10

21:30 <anarsoul> so it's not zero

21:30 <enunes> there are no live ranges, only live sets

21:30 <anarsoul> OK, live set :)

21:31 <anarsoul> yet r0 is vec4

21:31 <anarsoul> so I don't really understand where the conflict is coming from

21:32 <anarsoul> are you calculating per-component live sets for registers?

21:32 <anarsoul> even if you do

21:32 <anarsoul> r0.y conflicts with r0.x

21:32 <anarsoul> since r0.x is live 9-13

21:34 <enunes> registers are carried on by masks

21:34 <enunes> and analysis is backwards

21:34 <enunes> so something at the next instruction needs to read it so it's part of that instruction live_in

21:38 <anarsoul> but you also need to update live_out if something in current instruction writes to the reg

21:38 <anarsoul> well

21:38 <enunes> no because live_in from the next instruction is propagated as live_out to the previous instruction

21:39 <enunes> so it comes live, and dies there when it is written

21:40 <anarsoul> let me guess, our granularity doesn't allow us to specify this live set? :)

21:41 <enunes> it's just the life cycle starts when it is read, and we propagate it up until it is written

21:41 <anarsoul> what if it's written several times?

21:41 <anarsoul> enunes: hold on

21:41 <anarsoul> what happens if we read and write in the same instruction?

21:42 <enunes> the read will prevail and it will stay in the live set

21:42 <enunes> until the "first" write

21:42 <anarsoul> but write is in this instr

21:43 <enunes> but it was never read, since it runs backwards it was never part of any set

21:43 <anarsoul> I guess write should also start life cycle but at the same time end it

21:43 <enunes> it's what my patch does pretty much

21:43 <enunes> but I can't put it into live_in otherwise it will be propagated up looking for another write

21:44 <anarsoul> but what do you do if we have multiple writes?

21:44 <anarsoul> and single read

21:45 <enunes> I think then ppir pretty much all broken, not sure it ever happens in nir

21:45 <enunes> I think nir will just coalesce them and put a phi node

21:45 <anarsoul> that's for ssa

21:45 <anarsoul> but phi is then converted to a reg

21:46 <anarsoul> enunes: we definitely can have multiple writes to a reg from different blocks

21:47 <champagneg> hi, i'm still using arm's libmali, but I was wondering if it's normal/expected to see up to 3 "PLBU needs more heap memory" interrupt per frame? If it is not, is this something lima would fix?

21:47 <anarsoul> otherwise it would be ssa, not nir reg

21:47 <anarsoul> champagneg: we don't provide support for the blob

21:47 <anarsoul> ask ARM

21:48 <enunes> I tried to force that but nir always resolves it into a single write, and we can also have cases with multiple writes in case of loops, but liveness is designed to handle that

21:48 <enunes> in that case it's not ssa

21:49 <champagneg> anarsoul. yeah, I get that. I was wondering if anyone with more architecture knowledge would know if lima currently does this back and forth for allocate more mem for the PLBU.

21:50 <anarsoul> something like: uniform int a; main() { float f; if (a == 1) { f = 1.0 } else { f = 2.0}; gl_FragColor = vec4(f, 0, 0, 1.0); } should do multiple writes

21:50 <anarsoul> enunes: ^^

21:50 <anarsoul> champagneg: currently we allocate 1mb for tile heap, there're kernel and mesa patches to support growing heap (up to 16mb)

21:52 <enunes> anarsoul: that is far too simple it resolves to a fcsel after the if

21:53 <anarsoul> darn

21:53 <anarsoul> :)

21:53 <anarsoul> disable control flow flattening?

21:54 <enunes> even this resolves to a single write https://paste.centos.org/view/raw/660a972a

21:55 <anarsoul> I see

21:56 <enunes> but yeah, if that happens, it's bad

21:58 <enunes> I suppose we'd pick it up as a regression though when I was writing that

21:58 <anarsoul> you should terminate live set with write only if block doesn't have predecessors that write to this reg

21:59 <anarsoul> btw that's how loops are supposed to be handled

22:00 <enunes> it's a different issue though, I am looking for a solution to fix that regression so we have something usable for 20.0

22:00 <anarsoul> OK

22:01 <anarsoul> then push your current fix

22:01 <enunes> I think it's not feasible to improve nir dce or implement full ppir dce without refactors

22:01 <anarsoul> fair enough

22:01 <anarsoul> let's fix it for 20.0 then

22:02 <enunes> I could implement some early ppir dce that just replaces the dead writes with writes to ^discard

22:02 <enunes> removing dead ops is not trivial without rescheduling

22:03 <champagneg> anarsoul: Ok, thank you. The heap size is decided by the userspace driver, but the kernel driver allocates the memory, right?

22:04 <anarsoul> champagneg: I have no idea how it's done in blob

22:05 <champagneg> anarsoul: Fair enough, I dont expect anyone but ARM/xlnx to know that ;). I am asking how it's done in mesa/lima

22:06 <anarsoul> champagneg: mesa calls lima_bo_alloc() which results in ioctl to kernel driver

22:06 <champagneg> cool, thanks.

22:34 <anarsoul> enunes: why it's not trivial without rescheduling?

22:38 <enunes> because registers are allocated per instruction, and if there are other ops in the instruction I want to remove, I can't remove it

22:38 <enunes> I considered doing it per-op instead, but I don't remember what was the blocker I encountered...

22:38 <anarsoul> remove ops, not instruction

22:39 <anarsoul> enunes: btw, why not to do it before scheduling?

22:40 <anarsoul> oh, won't work, you need liveness info

22:40 <enunes> yeah I considered that, if I could run liveness analysis per-op, I could do it at any time

22:40 <enunes> I think it's that our op dependency graph is not complete currently, it doesn't go across blocks

22:40 <anarsoul> probably worth doing it and then translating liveness to instruction level

22:41 <enunes> so the only time I have something sensible to iterate across the entire program is after we have scheduled instructions

22:41 <enunes> but technically we could have sets of live registers without instructions

22:42 <anarsoul> enunes: it shouldn't matter whether you iterate over instructions

22:42 <anarsoul> or over ops

22:42 <anarsoul> actually it totally makes sense to do liveness analysis before scheduling

22:42 <anarsoul> and do scheduling decisions based on liveness info

22:43 <enunes> yeah but I found it's hard to follow the order of ops backwards currently

22:43 <anarsoul> enunes: IIRC we print it during debug

22:43 <enunes> there is no pred/succ across blocks

22:44 <anarsoul> there is pred/succ for blocks, that's all you need

22:44 <enunes> I think there is only succ

22:44 <anarsoul> you calculate liveness in one block, then propagate it

22:45 <anarsoul> yeah, you're right

22:45 <anarsoul> we can walk it once to collect predecessors :)

22:45 <enunes> yeah it's totally possible

22:48 <enunes> I guess it's part of "things that could be much better in ppir" :)

22:48 <enunes> not sure if fixes many applications though

23:01 Xalius has quit [Quit: Leaving]

23:05 Xalius has joined #lima

23:40 warpme_ has quit [Quit: Connection closed for inactivity]