ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
jrmuizel has quit [Remote host closed the connection]
nerdboy has quit [Ping timeout: 245 seconds]
nerdboy has joined #lima
jrmuizel has joined #lima
nerdboy has quit [Excess Flood]
nerdboy has joined #lima
jrmuizel has quit [Ping timeout: 245 seconds]
Da_Coynul has joined #lima
jrmuizel has joined #lima
Da_Coynul has quit [Client Quit]
Da_Coynul has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
yuq825 has joined #lima
jrmuizel has quit [Remote host closed the connection]
Da_Coynul has joined #lima
megi has quit [Ping timeout: 246 seconds]
nerdboy has quit [Changing host]
nerdboy has joined #lima
yuq8251 has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
yuq825 has quit [Ping timeout: 272 seconds]
kaspter has joined #lima
camus has joined #lima
kaspter has quit [Ping timeout: 244 seconds]
camus is now known as kaspter
<anarsoul> enunes: yeah, we need to insert uniforms to every successor
<anarsoul> I read some guide on ARM site re: Mali400 and it's essentially free to read uniform and/or varying in every instruction
<anarsoul> so we should do that when possible
<anarsoul> that should reduce register pressure for sure as well as instructions count
<anarsoul> basically use the same approach as with consts
<anarsoul> but it's a bit more tricky
<anarsoul> we'll have to create movs in scheduler (or rather in node_to_instr)
<anarsoul> it's possible and I think I know how to do that
<anarsoul> just need another refactoring to split out code that creates moves into a helper
<anarsoul> and then use this helper in scheduler
dddddd has quit [Remote host closed the connection]
<anarsoul> enunes: yuq8251: btw, I'm having an issue described here in sway, weston or X11: https://community.arm.com/developer/tools-software/graphics/f/discussions/9384/incorrect-drawing-image-with-wayland-on-mali400
<anarsoul> looks like we must load varying and fetch texture in the same instruction
<anarsoul> otherwise fp16 precision is not enough for accurate sampling
<anarsoul> I'll look into it
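A minimal sketch of why fp16 can be too coarse for texture coordinates, as suggested above. IEEE 754 half precision has 10 explicit mantissa bits, so values in [0.5, 1) are spaced 1/2048 apart. The helper names (`to_fp16`, `texel_center`, `error_in_texels`) and the texture widths are hypothetical and chosen only for illustration; this does not model the actual Mali400 data path.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def texel_center(index: int, width: int) -> float:
    """Normalized coordinate of the center of texel `index` in a `width`-wide texture."""
    return (index + 0.5) / width

def error_in_texels(index: int, width: int) -> float:
    """Distance, in texels, between the exact coordinate and its fp16 rounding."""
    u = texel_center(index, width)
    return abs(to_fp16(u) - u) * width

# 1024-wide texture: texel centers in [0.5, 1) are multiples of 1/2048, so they survive.
print(error_in_texels(700, 1024))   # 0.0
# 2048-wide texture: centers are odd multiples of 1/4096 and get pushed to a texel edge.
print(error_in_texels(1500, 2048))  # 0.5
```

With a half-texel error the sample lands on the boundary between two texels, which is consistent with the idea that the coordinate should not pass through an fp16 register before the texture fetch.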
Barada has joined #lima
Elpaulo has quit [Ping timeout: 258 seconds]
Elpaulo has joined #lima
megi has joined #lima
<wens> bad memory access in drm_gem_fence_array_add() when using lima # https://pastebin.com/xhxnrt2U
<wens> this is with sunxi-next ( 2f2e616b03d6 )
nerdboy has quit [Ping timeout: 248 seconds]
<wens> and mesa HEAD ( 4379dcc12d3 )
nerdboy has joined #lima
gtucker has joined #lima
yuq8251 has quit [Quit: Leaving.]
Da_Coynul has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
Da_Coynul has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
cp has quit [Quit: Disappeared in a puff of smoke]
nerdboy has quit [Ping timeout: 246 seconds]
cp has joined #lima
jrmuizel has joined #lima
yuq825 has joined #lima
kaspter has quit [Read error: Connection reset by peer]
kaspter has joined #lima
dddddd has joined #lima
kaspter has quit [Quit: kaspter]
jrmuizel has quit [Remote host closed the connection]
jrmuizel has joined #lima
jrmuizel has quit [Remote host closed the connection]
Barada has quit [Quit: Barada]
yuq825 has quit [Ping timeout: 245 seconds]
jrmuizel has joined #lima
<rellla> enunes: i can't remember if you did already, but can you guide through how to setup shader-db for lima? i'm a bit lost ...
<enunes> rellla: there should be no setup needed, you just use the 'run' application from https://gitlab.freedesktop.org/mesa/shader-db , collect stdout and then 'report.py' on the pc to generate a comparison report between two runs
<enunes> just like the readme on that page
<rellla> enunes: thanks. now that i read it more times and simply tried it, it works :)
<enunes> I haven't been using the shaders from shader-db itself as they are rather complicated, so I just captured all piglit shaders using the method from the readme too
<anarsoul> enunes: it would be a good idea to add glamor shaders
<anarsoul> :)
nerdboy has joined #lima
<enunes> anarsoul: yeah maybe it is even there, I think the main reason those more complicated shaders were unusable yet is that all of them just aborted due to missing control flow, now we can probably start using them
nerdboy has quit [Changing host]
nerdboy has joined #lima
drod has joined #lima
drod has quit [Ping timeout: 248 seconds]
drod has joined #lima
<anarsoul> enunes: btw, can you publish MR with your improved spilling?
<enunes> anarsoul: I can make it WIP, there is still 1 spot without saving the instruction creation
<anarsoul> sounds good
<anarsoul> I'm planning to work on merging load.v and load.u into successors this week
<anarsoul> and I'd prefer to base it on top of your branch
<enunes> ok
<enunes> I'm not sure if it would be trivial to remove the remaining spot without merging the instruction, due to the forced vec4 spilling that we have right now
<enunes> I'd be ok merging the ones I already have anyway before we have that
<anarsoul> is it possible to detect it, marking it with 'no_spill' and try another reg?
<enunes> I guess so, we could also try to only attempt the optimization if it is already vec4
<anarsoul> it makes no sense to spill if it increases reg pressure anyway
<enunes> the tricky part of having non-vec4 spills is that if I recall correctly there is no vec3 store, only vec1 vec2 and vec4, so some exception handling needs to happen anyway
<enunes> but doing something else for work now, I'll just push it so you can also try to use it and work on it later
<anarsoul> ok
alyssa has joined #lima
<alyssa> rellla: Ya rang?
<alyssa> On Midgard, lowering to #0 is ~free since we get a free inline constant and 0 always inlines
<alyssa> I guess you don't have that for PP so that changes a bit.
<alyssa> OTOH, a lot of f(x, 0) can be optimized for all sorts of f, so it might still be a win
<alyssa> fadd/fsub/fmul/iand/ixor/ior/inot/csel/fdot/fsum.... all go away/simplify with a zero
<anarsoul> alyssa: using any reg for undef is completely free
<alyssa> Anyway, I was less interested in the opt as much as eliminating UB in the output, which is not required by the spec but makes debugging easier
<anarsoul> and no need to take const slot for that
<alyssa> anarsoul: Run opt_algebraic/const fold after lowering to zero... cheaper than free..
<alyssa> fsub(fadd(x, ssa_undef), 1) -> fsub(fadd(x, 0), 1) -> fsub(x, 1)
<alyssa> saved an instruction without using any const slots
<anarsoul> we had a discussion re: undef with cwabbott a week ago and he convinced me that using any reg is better
<alyssa> anarsoul: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1767#note_214311 seems to be pretty clearly a win
<alyssa> (Notice big shaders get crushed into nothing since the flush-to-zero allows further opts to happen that you don't get otherwise)
<alyssa> Rewriting to zero at codegen time is certainly a bad idea (because of const slots).. rewriting at NIR opt time I think is totally reasonable
<alyssa> csel(c, b, undef) -> csel(c, b, 0) -> iand(c, b) .... for another example for Midgard
<anarsoul> fair enough
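A toy sketch of the argument above: lower undef to zero first, then let an algebraic pass clean up. Expressions are nested tuples and the rule set is deliberately tiny. The names `lower_undef` and `simplify` are hypothetical and are not the NIR API; the real work would be done by the NIR passes mentioned in the discussion.

```python
UNDEF = "ssa_undef"

def lower_undef(expr):
    """Replace every undef leaf with the constant 0.0."""
    if isinstance(expr, tuple):
        op, *args = expr
        return (op, *[lower_undef(a) for a in args])
    return 0.0 if expr == UNDEF else expr

def is_zero(e):
    return isinstance(e, float) and e == 0.0

def simplify(expr):
    """Apply a few zero-related identities bottom-up.

    These are the usual inexact-math style rules (they ignore -0.0 corner cases).
    """
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [simplify(a) for a in args]
    if op == "fadd":
        a, b = args
        if is_zero(a):
            return b
        if is_zero(b):
            return a
    elif op == "fsub":
        a, b = args
        if is_zero(b):
            return a
    return (op, *args)

before = ("fsub", ("fadd", "x", UNDEF), 1.0)
after = simplify(lower_undef(before))
print(after)  # ('fsub', 'x', 1.0)
```

The fadd disappears without any constant being emitted, which matches the fsub(fadd(x, ssa_undef), 1) example quoted in the log.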
niceplace has quit [Ping timeout: 245 seconds]
niceplace has joined #lima
<rellla> i will check shader-db with all piglit tests as i think that there are 9 tests that failed due to missing undef handling.
<rellla> ... tomorrow
<anarsoul> rellla: sounds good
<rellla> if we still get no hurts, we should take the nir pass imho
<alyssa> Would certainly help me :)
<rellla> the possible follow-up opts should be convincing ;)
<rellla> though one should not get undef instr with a "good" shader anyway.
drod has quit [Read error: Connection reset by peer]
drod has joined #lima
<anarsoul> alyssa: nice to see you on #lima channel btw :)
<alyssa> anarsoul: rellla linked it
<anarsoul> enunes: looking at it :)
<enunes> no piglit regressions too
alyssa has left #lima [#lima]
<enunes> looks like it even fixes a few tests, not sure how
<enunes> and 3 timeouts which presumably were infinite loops without the first commit
<anarsoul> enunes: I think we should improve spilling cost once again
<enunes> I started working on something on that direction
<enunes> to increase cost of registers in instructions that have load or store slots filled
<anarsoul> basically walk through all instructions and mark regs that are used in instructions with busy load uniform and store temporary slots as unspillable
<anarsoul> yeah, exactly
<anarsoul> I don't think that it makes sense to spill these anyway
<anarsoul> or even better:
<anarsoul> walk through instructions once calculating register pressure at each instruction and assign it to some field in instruction
<anarsoul> if instruction is at max reg pressure and has uniform and store slots taken, mark its regs as unspillable
<anarsoul> (that's in 2nd pass)
<enunes> not sure about marking them as 'unspillable', as the 'unspillable' information currently persists across regalloc attempts and the pressure might change
<enunes> so maybe just add a large cost to them
<anarsoul> I think it does
<enunes> I mean, the 'unspillable' flag probably shouldn't be set based on information that might change (max register pressure) as max register pressure may change between regalloc attempts
<enunes> but we can probably achieve the same result setting a high cost, as the cost varies between runs
<anarsoul> yeah, maybe
<anarsoul> anyway, creating new instructions and using a reg for loading or storing temporaries increases reg pressure in already bottlenecked place
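A rough sketch of the "large cost instead of an unspillable flag" idea discussed above. The `Instr` class, the `PENALTY` weight, and the rule that either busy slot triggers the penalty are all assumptions made for illustration; none of this is the ppir data model. Because the costs are recomputed from scratch on every call, nothing persists between regalloc attempts.

```python
from dataclasses import dataclass

PENALTY = 1000.0  # hypothetical weight, large enough to dominate plain use counts

@dataclass
class Instr:
    regs: frozenset                  # registers read or written by this instruction
    load_uniform_busy: bool = False  # slot a spill load would need
    store_temp_busy: bool = False    # slot a spill store would need

def spill_costs(instrs):
    """Cost per register: one per use, plus a penalty where spill slots are taken."""
    costs = {}
    for ins in instrs:
        blocked = ins.load_uniform_busy or ins.store_temp_busy
        for reg in ins.regs:
            costs[reg] = costs.get(reg, 0.0) + 1.0 + (PENALTY if blocked else 0.0)
    return costs

def pick_spill_candidate(instrs):
    """Cheapest register to spill; ties are broken by name for determinism."""
    costs = spill_costs(instrs)
    return min(sorted(costs), key=costs.__getitem__)

program = [
    Instr(frozenset({"r0", "r1"}), load_uniform_busy=True),
    Instr(frozenset({"r1", "r2"})),
    Instr(frozenset({"r2"})),
]
print(pick_spill_candidate(program))  # r2
```

Here r0 and r1 appear in an instruction whose uniform slot is already occupied, so r2 wins even though it has more plain uses than r0.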
<anarsoul> enunes: next step would be to clone load.v and load.u for each user
<enunes> I thought we were already doing that from the cf MR?
<anarsoul> no, we still create a mov
<anarsoul> but we clone them for each block
<enunes> that should be fine as the created register is only alive for 2 instructions
<anarsoul> we need to clone them for each instruction when possible
<anarsoul> enunes: that's a disaster for texture coords anyway
<enunes> same for the instruction in the created reg for spilling
<anarsoul> we've got only 10 bits of precision if it gets stored into a reg
<enunes> for some reason ideas needs ~20 registers from top to bottom
<enunes> even across blocks
<enunes> I saw the blog you pasted, interesting indeed
<anarsoul> enunes: yeah, look at glsl source and you'll understand why :)
<anarsoul> enunes: I actually hit this issue with X11 or any wayland compositor :)
<anarsoul> I thought that it was caused by my recent change that introduced a struct for texture descriptor
<anarsoul> (and I actually found a bug there - but it's not related)
<enunes> I think I removed the optimization where we had the special case for load tex coords
<anarsoul> enunes: btw, offline compiler uses positive indices for temporaries for ideas-lamp-lit.frag in some cases
<anarsoul> my guess is that it's similar to python, i.e. positive indices address it from beginning, negative - from the end
<anarsoul> also it uses sum3()
<anarsoul> and relative indexing of temporaries
<anarsoul> enunes: why do you create load node in ppir_update_spilled_dest()?
<enunes> anarsoul: I think it's required for when there is a register and some instructions only update some components of the register, the other components need to be preserved
<enunes> I suppose at this optimization stage it might be avoidable too if it's ssa and not register
<anarsoul> oh
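A small sketch of the reason given above for the extra load: if the spill store always writes a whole vec4 (an assumption based on the "forced vec4 spilling" mentioned earlier), an instruction that updates only some components has to read the old value first so the rest is preserved. The function name and the dict used as temporary storage are hypothetical.

```python
def store_spilled(temp, slot, new_values, write_mask):
    """Update a spilled vec4 in temp[slot], honoring a per-component write mask."""
    old = temp[slot]  # the load: fetch the current contents of the slot
    merged = [n if w else o for o, n, w in zip(old, new_values, write_mask)]
    temp[slot] = merged  # full vec4 store
    return merged

temp = {0: [1.0, 2.0, 3.0, 4.0]}
print(store_spilled(temp, 0, [9.0, 9.0, 9.0, 9.0], [True, False, False, True]))
# [9.0, 2.0, 3.0, 9.0]
```

When the destination is written in full, as with an SSA value covering all components, the mask is all True and the load contributes nothing, which fits the remark that it might be avoidable in that case.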
drod has quit [Quit: I'm leaving you (xchat 2.4.5 or older)]
jrmuizel has quit [Remote host closed the connection]
Da_Coynul has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
Da_Coynul has joined #lima