#lima on 2019-08-26 — irc logs at freenode.irclog.whitequark.org

2019-07-03 10:24 ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!

00:14 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

00:29 jrmuizel has quit [Remote host closed the connection]

00:29 nerdboy has quit [Ping timeout: 245 seconds]

00:29 nerdboy has joined #lima

00:44 jrmuizel has joined #lima

00:47 nerdboy has quit [Excess Flood]

00:47 nerdboy has joined #lima

00:48 jrmuizel has quit [Ping timeout: 245 seconds]

00:49 Da_Coynul has joined #lima

00:50 jrmuizel has joined #lima

00:52 Da_Coynul has quit [Client Quit]

00:53 Da_Coynul has joined #lima

01:10 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

01:16 yuq825 has joined #lima

01:25 jrmuizel has quit [Remote host closed the connection]

01:29 Da_Coynul has joined #lima

01:31 megi has quit [Ping timeout: 246 seconds]

01:34 nerdboy has quit [Changing host]

01:34 nerdboy has joined #lima

01:59 yuq8251 has joined #lima

02:00 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

02:03 yuq825 has quit [Ping timeout: 272 seconds]

02:36 kaspter has joined #lima

02:39 camus has joined #lima

02:41 kaspter has quit [Ping timeout: 244 seconds]

02:41 camus is now known as kaspter

02:57 <anarsoul> enunes: yeah, we need to insert uniforms to every successor

02:57 <anarsoul> I read some guide on ARM site re: Mali400 and it's essentially free to read uniform and/or varying in every instruction

02:58 <anarsoul> so we should do that when possible

02:58 <anarsoul> that should reduce register pressure for sure as well as instruction count

02:59 <anarsoul> basically use the same approach as with consts

02:59 <anarsoul> but it's a bit more tricky

02:59 <anarsoul> we'll have to create movs in scheduler (or rather in node_to_instr)

03:00 <anarsoul> it's possible and I think I know how to do that

03:00 <anarsoul> just need another refactoring to split out code that creates moves into a helper

03:00 <anarsoul> and then use this helper in scheduler

03:09 <anarsoul> see https://developer.arm.com/solutions/graphics/developer-guides/the-utgard-shader-core

03:29 dddddd has quit [Remote host closed the connection]

03:39 <anarsoul> enunes: yuq8251: btw, I'm having an issue described here in sway, weston or X11: https://community.arm.com/developer/tools-software/graphics/f/discussions/9384/incorrect-drawing-image-with-wayland-on-mali400

03:40 <anarsoul> looks like we must load varying and fetch texture in the same instruction

03:40 <anarsoul> otherwise fp16 precision is not enough for accurate sampling

03:40 <anarsoul> I'll look into it
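
[editor's sketch] The precision concern above can be illustrated with plain arithmetic. The snippet below is not lima code; it only assumes that a texture coordinate stored in a register is rounded to IEEE 754 half precision (10 mantissa bits), and measures how far the texel-center coordinates of a texture of a given width drift after that rounding. The helper names `to_fp16` and `worst_texel_error` are made up for this illustration.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def worst_texel_error(width: int) -> float:
    """Largest error, in texels, after rounding every texel-center coordinate to fp16."""
    worst = 0.0
    for i in range(width):
        coord = (i + 0.5) / width  # normalized coordinate of the center of texel i
        worst = max(worst, abs(to_fp16(coord) - coord) * width)
    return worst

for width in (64, 256, 1024, 2048):
    print(width, worst_texel_error(width))
```

Under this assumption the texel centers stay exact up to a width of 1024, while at 2048 the coordinates in the upper half of the texture land half a texel away from where they should be, which is consistent with sampling artifacts on large surfaces.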

05:23 Barada has joined #lima

07:19 Elpaulo has quit [Ping timeout: 258 seconds]

07:35 Elpaulo has joined #lima

07:39 megi has joined #lima

09:28 <wens> bad memory access in drm_gem_fence_array_add() when using lima # https://pastebin.com/xhxnrt2U

09:29 <wens> this is with sunxi-next ( 2f2e616b03d6 )

09:30 nerdboy has quit [Ping timeout: 248 seconds]

09:30 <wens> and mesa HEAD ( 4379dcc12d3 )

09:31 nerdboy has joined #lima

09:49 gtucker has joined #lima

10:12 yuq8251 has quit [Quit: Leaving.]

10:13 Da_Coynul has joined #lima

10:18 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

10:29 Da_Coynul has joined #lima

10:42 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

11:41 cp has quit [Quit: Disappeared in a puff of smoke]

11:41 nerdboy has quit [Ping timeout: 246 seconds]

11:42 cp has joined #lima

11:56 jrmuizel has joined #lima

11:59 yuq825 has joined #lima

12:02 kaspter has quit [Read error: Connection reset by peer]

12:02 kaspter has joined #lima

12:08 dddddd has joined #lima

12:53 kaspter has quit [Quit: kaspter]

13:04 jrmuizel has quit [Remote host closed the connection]

13:05 jrmuizel has joined #lima

13:06 jrmuizel has quit [Remote host closed the connection]

14:01 Barada has quit [Quit: Barada]

14:03 yuq825 has quit [Ping timeout: 245 seconds]

14:06 jrmuizel has joined #lima

15:59 <rellla> enunes: i can't remember if you did already, but can you guide me through how to set up shader-db for lima? i'm a bit lost ...

16:15 <enunes> rellla: there should be no setup needed, you just use the 'run' application from https://gitlab.freedesktop.org/mesa/shader-db , collect stdout and then 'report.py' on the pc to generate a comparison report between two runs

16:16 <enunes> just like the readme on that page

16:16 <rellla> enunes: thanks. now that i read it more times and simply tried it, it works :)

16:19 <enunes> I haven't been using the shaders from shader-db itself as they are rather complicated, so I just captured all piglit shaders using the method from the readme too

16:33 <anarsoul> enunes: it would be a good idea to add glamor shaders

16:33 <anarsoul> :)

16:34 nerdboy has joined #lima

16:36 <enunes> anarsoul: yeah maybe it is even there, I think the main reason those more complicated shaders were unusable yet is that all of them just aborted due to missing control flow, now we can probably start using them

16:36 nerdboy has quit [Changing host]

16:36 nerdboy has joined #lima

16:39 drod has joined #lima

18:15 drod has quit [Ping timeout: 248 seconds]

18:27 drod has joined #lima

18:49 <anarsoul> enunes: btw, can you publish MR with your improved spilling?

18:49 <enunes> anarsoul: I can make it WIP, there is still 1 spot without saving the instruction creation

18:49 <anarsoul> sounds good

18:50 <anarsoul> I'm planning to work on merging load.v and load.u into successors this week

18:50 <anarsoul> and I'd prefer to base it on top of your branch

18:50 <enunes> ok

18:51 <enunes> I'm not sure if it would be trivial to remove the remaining spot without merging the instruction, due to the forced vec4 spilling that we have right now

18:51 <enunes> I'd be ok merging the ones I already have anyway before we have that

18:52 <anarsoul> is it possible to detect it, mark it with 'no_spill' and try another reg?

18:53 <enunes> I guess so, we could also try to only attempt the optimization if it is already vec4

18:53 <anarsoul> it makes no sense to spill if it increases reg pressure anyway

18:54 <enunes> the tricky part of having non-vec4 spills is that if I recall correctly there is no vec3 store, only vec1 vec2 and vec4, so some exception handling needs to happen anyway
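
[editor's sketch] One possible way to handle the missing vec3 store is to break a spill into the widths that do exist. The following is a hypothetical illustration, not the ppir implementation; `SUPPORTED_STORE_WIDTHS` and `split_store` are invented names, and the only fact taken from the chat is that vec1, vec2 and vec4 stores are available while vec3 is not.

```python
# Store widths mentioned in the chat: vec4, vec2 and vec1 exist, vec3 does not.
SUPPORTED_STORE_WIDTHS = (4, 2, 1)

def split_store(num_components: int):
    """Return (component_offset, width) pairs covering num_components components."""
    if not 1 <= num_components <= 4:
        raise ValueError("a register holds between 1 and 4 components")
    stores = []
    offset = 0
    remaining = num_components
    for width in SUPPORTED_STORE_WIDTHS:
        # Greedily emit the widest store that still fits.
        while remaining >= width:
            stores.append((offset, width))
            offset += width
            remaining -= width
    return stores

print(split_store(3))  # a vec3 becomes a vec2 store followed by a vec1 store
```

The alternative discussed in the channel, always spilling as vec4, avoids the extra store at the price of touching components that are not live.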

18:54 <enunes> but doing something else for work now, I'll just push it so you can also try to use it and work on it later

18:55 <anarsoul> ok

19:15 alyssa has joined #lima

19:15 <alyssa> rellla: Ya rang?

19:17 <alyssa> On Midgard, lowering to #0 is ~free since we get a free inline constant and 0 always inlines

19:17 <alyssa> I guess you don't have that for PP so that changes a bit.

19:18 <alyssa> OTOH, a lot of f(x, 0) can be optimized for all sorts of f, so it might still be a win

19:20 <alyssa> fadd/fsub/fmul/iand/ixor/ior/inot/csel/fdot/fsum.... all go away/simplify with a zero

19:20 <anarsoul> alyssa: using any reg for undef is completely free

19:20 <alyssa> Anyway, I was less interested in the opt as much as eliminating UB in the output, which is not required by the spec but makes debugging easier

19:20 <anarsoul> and no need to take const slot for that

19:20 <alyssa> anarsoul: Run opt_algebraic/const fold after lowering to zero... cheaper than free..

19:21 <alyssa> fsub(fadd(x, ssa_undef), 1) -> fsub(fadd(x, 0), 1) -> fsub(x, 1)

19:21 <alyssa> saved an instruction without using any const slots
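
[editor's sketch] The rewrite chain above can be reproduced on a toy expression tree. This is not NIR; expressions are plain tuples, `lower_undef` and `simplify` are names invented for the illustration, and only the `x + 0` and `x - 0` identities are implemented (signed-zero subtleties are ignored).

```python
UNDEF = "ssa_undef"

def lower_undef(expr):
    """Replace every undef leaf with the constant 0.0."""
    if isinstance(expr, tuple):
        op, *args = expr
        return (op, *[lower_undef(a) for a in args])
    return 0.0 if expr == UNDEF else expr

def simplify(expr):
    """Fold additions and subtractions of a zero constant, bottom-up."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [simplify(a) for a in args]
    if op == "fadd":
        a, b = args
        if a == 0.0:
            return b
        if b == 0.0:
            return a
    if op == "fsub" and args[1] == 0.0:
        return args[0]
    return (op, *args)

expr = ("fsub", ("fadd", "x", UNDEF), 1.0)
print(simplify(expr))               # unchanged: nothing is known about undef
print(simplify(lower_undef(expr)))  # the fadd disappears once undef is zero
```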

19:21 <anarsoul> we had a discussion re: undef with cwabbott a week ago and he convinced me that using any reg is better

19:22 <alyssa> anarsoul: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1767#note_214311 seems to be pretty clearly a win

19:23 <alyssa> (Notice big shaders get crushed into nothing since the flush-to-zero allows further opts to happen that you don't get otherwise)

19:24 <alyssa> Rewriting to zero at codegen time is certainly a bad idea (because of const slots).. rewriting at NIR opt time I think is totally reasonable

19:24 <alyssa> csel(c, b, undef) -> csel(c, b, 0) -> iand(c, b) .... for another example for Midgard
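
[editor's sketch] The csel-to-iand step holds when a boolean is represented as a 32-bit mask that is either all zeros or all ones. That representation is an assumption made for this illustration, and `csel` and `iand` below are toy models of the opcodes, not driver code.

```python
MASK32 = 0xFFFFFFFF

def csel(c: int, b: int, a: int) -> int:
    """Toy conditional select: b when the condition is non-zero, otherwise a."""
    return b if c != 0 else a

def iand(a: int, b: int) -> int:
    """Toy 32-bit bitwise AND."""
    return a & b & MASK32

# With all-zeros / all-ones booleans, selecting against 0 is just a mask.
for c in (0, MASK32):
    for b in (0, 1, 0xF0, 0x12345678, MASK32):
        assert csel(c, b, 0) == iand(c, b)

# With a 0/1 boolean the identity breaks, which is why the representation matters.
print(csel(1, 0xF0, 0), iand(1, 0xF0))
```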

19:24 <anarsoul> fair enough

19:35 niceplace has quit [Ping timeout: 245 seconds]

19:41 niceplace has joined #lima

19:56 <rellla> i will check shader-db with all piglit tests as i think that there are 9 tests that failed due to missing undef handling.

19:57 <rellla> ... tomorrow

19:58 <anarsoul> rellla: sounds good

20:01 <rellla> if we still get no hurts, we should take the nir pass imho

20:01 <alyssa> Would certainly help me :)

20:01 <rellla> the possible follow-up opts should be convincing ;)

20:03 <rellla> though one should not get undef instr with a "good" shader anyway.

20:08 drod has quit [Read error: Connection reset by peer]

20:08 drod has joined #lima

20:28 <anarsoul> alyssa: nice to see you on #lima channel btw :)

20:39 <alyssa> anarsoul: rellla linked it

20:54 <enunes> anarsoul: pushed https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1772

20:54 <anarsoul> enunes: looking at it :)

20:54 <enunes> no piglit regressions too

20:55 alyssa has left #lima [#lima]

20:59 <enunes> looks like it even fixes a few tests, not sure how

20:59 <enunes> and 3 timeouts which presumably were infinite loops without the first commit

21:01 <anarsoul> enunes: I think we should improve spilling cost once again

21:01 <enunes> I started working on something in that direction

21:02 <enunes> to increase cost of registers in instructions that have load or store slots filled

21:02 <anarsoul> basically walk through all instructions and mark regs that are used in instructions with busy load uniform and store temporary slots as unspillable

21:02 <anarsoul> yeah, exactly

21:02 <anarsoul> I don't think that it makes sense to spill these anyway

21:03 <anarsoul> or even better:

21:03 <anarsoul> walk through instructions once calculating register pressure at each instruction and assign it to some field in instruction

21:03 <anarsoul> if instruction is at max reg pressure and has uniform and store slots taken, mark its regs as unspillable

21:04 <anarsoul> (that's in 2nd pass)

21:06 <enunes> not sure about marking it, as the 'unspillable' information is currently kept across regalloc attempts and that might change

21:06 <enunes> so maybe just add a large cost to them

21:06 <anarsoul> I think it does

21:08 <enunes> I mean, the 'unspillable' flag probably shouldn't be set based on information that might change (max register pressure) as max register pressure may change between regalloc attempts

21:08 <enunes> but we can probably achieve the same result setting a high cost, as the cost varies between runs
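
[editor's sketch] The two-pass idea, combined with the suggestion to use a high cost instead of a sticky flag, could look roughly like this. It is a simplified model for straight-line code only; `Instr`, `compute_pressure`, `spill_costs` and the penalty value are all hypothetical and do not correspond to the ppir data structures or to Mesa's register allocator API.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    defs: set                        # registers written by this instruction
    uses: set                        # registers read by this instruction
    uniform_slot_busy: bool = False  # load-uniform slot already taken
    store_slot_busy: bool = False    # store-temporary slot already taken
    pressure: int = 0                # filled in by compute_pressure()

def compute_pressure(instrs):
    """Pass 1: count how many registers are live at each instruction."""
    first, last = {}, {}
    for i, ins in enumerate(instrs):
        for reg in ins.defs | ins.uses:
            first.setdefault(reg, i)
            last[reg] = i
    for i, ins in enumerate(instrs):
        ins.pressure = sum(1 for reg in first if first[reg] <= i <= last[reg])

def spill_costs(instrs, max_regs, base_cost=1.0, penalty=1000.0):
    """Pass 2: penalize registers touched where spilling could not help."""
    compute_pressure(instrs)
    costs = {}
    for ins in instrs:
        regs = ins.defs | ins.uses
        for reg in regs:
            costs.setdefault(reg, base_cost)
        if ins.pressure >= max_regs and ins.uniform_slot_busy and ins.store_slot_busy:
            for reg in regs:
                costs[reg] += penalty
    return costs

program = [
    Instr({"a"}, set()),
    Instr({"b"}, set()),
    Instr({"c"}, {"a", "b"}, uniform_slot_busy=True, store_slot_busy=True),
    Instr({"d"}, {"c"}),
]
print(spill_costs(program, max_regs=3))
```

Because the costs are recomputed on every call, nothing from a previous allocation attempt is carried over, which is the property asked for in the discussion above.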

21:11 <anarsoul> yeah, maybe

21:14 <anarsoul> anyway, creating new instructions and using a reg for loading or storing temporaries increases reg pressure in already bottlenecked place

21:15 <anarsoul> enunes: next step would be to clone load.v and load.u for each user

21:16 <enunes> I thought we were already doing that from the cf MR?

21:17 <anarsoul> no, we still create a mov

21:17 <anarsoul> but we clone them for each block

21:17 <enunes> that should be fine as the created register is only alive for 2 instructions

21:17 <anarsoul> we need to clone them for each instruction when possible
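
[editor's sketch] Cloning a load for each user can be modeled on a flat node list. This is an illustration under assumptions, not ppir code: `Node`, `LOAD_OPS` and `clone_loads_per_user` are invented, and the clone is simply placed right before its user rather than merged into the user's instruction.

```python
from dataclasses import dataclass, field

LOAD_OPS = {"load.u", "load.v"}

@dataclass(eq=False)
class Node:
    op: str
    srcs: list = field(default_factory=list)
    slot: int = 0  # uniform/varying index, only meaningful for loads

def clone_loads_per_user(nodes):
    """Give every user of a uniform/varying load its own private copy of the load."""
    result = []
    for node in nodes:
        if node.op in LOAD_OPS:
            continue  # the shared load is dropped; copies are created per user
        new_srcs = []
        for src in node.srcs:
            if src.op in LOAD_OPS:
                clone = Node(src.op, [], src.slot)
                result.append(clone)
                new_srcs.append(clone)
            else:
                new_srcs.append(src)
        node.srcs = new_srcs
        result.append(node)
    return result

u = Node("load.u", slot=3)
a = Node("fmul", [u])
b = Node("fadd", [u, a])
print([n.op for n in clone_loads_per_user([u, a, b])])
```

Since no two users share a load result any more, the loaded value does not have to stay in a register between them.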

21:18 <anarsoul> enunes: that's a disaster for texture coords anyway

21:18 <enunes> same for the instruction in the created reg for spilling

21:18 <anarsoul> we've got only 10 bits of precision if it gets stored into a reg

21:18 <enunes> for some reason ideas needs ~20 registers from top to bottom

21:18 <enunes> even across blocks

21:19 <enunes> I saw the blog you pasted, interesting indeed

21:19 <anarsoul> enunes: yeah, look at glsl source and you'll understand why :)

21:19 <anarsoul> enunes: I actually hit this issue with X11 or any wayland compositor :)

21:20 <anarsoul> I thought that it was caused by my recent change that introduced a struct for texture descriptor

21:20 <anarsoul> (and I actually found a bug there - but it's not related)

21:21 <enunes> I think I removed the optimization where we had the special case for load tex coords

21:29 <anarsoul> enunes: btw, offline compiler uses positive indices for temporaries for ideas-lamp-lit.frag in some cases

21:37 <anarsoul> my guess is that it's similar to python, i.e. positive indices address it from beginning, negative - from the end

21:44 <anarsoul> also it uses sum3()

21:44 <anarsoul> and relative indexing of temporaries

21:52 <anarsoul> enunes: why do you create load node in ppir_update_spilled_dest()?

21:57 <enunes> anarsoul: I think it's required for when there is a register and some instructions only update some components of the register, the other components need to be preserved

22:02 <enunes> I suppose at this optimization stage it might be avoidable too if it's ssa and not register
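
[editor's sketch] The reason for the extra load is a read-modify-write: when only some components are written and the spill slot is stored as a whole vec4, the untouched components have to be fetched first. The function below is a toy model with invented names (`update_spilled`, a dict standing in for temporary memory), not the body of ppir_update_spilled_dest().

```python
def update_spilled(temp_memory, slot, new_value, write_mask):
    """Update a spilled vec4, preserving components the instruction does not write."""
    if all(write_mask):
        merged = tuple(new_value)  # full write: the old contents are irrelevant
    else:
        old = temp_memory[slot]    # partial write: load the old vec4 first
        merged = tuple(n if m else o
                       for o, n, m in zip(old, new_value, write_mask))
    temp_memory[slot] = merged     # store back the whole vec4
    return merged

memory = {0: (1.0, 2.0, 3.0, 4.0)}
print(update_spilled(memory, 0, (9.0, 9.0, 9.0, 9.0), (True, False, False, True)))
```

An SSA value is written exactly once and in full, which matches the remark that the load might be avoidable in that case.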

22:05 <anarsoul> oh

22:37 drod has quit [Quit: I'm leaving you (xchat 2.4.5 or older)]

22:38 jrmuizel has quit [Remote host closed the connection]

22:49 Da_Coynul has joined #lima

22:57 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

22:58 Da_Coynul has joined #lima