Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
Da_Coynul has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
cp has quit [Quit: Disappeared in a puff of smoke]
nerdboy has quit [Ping timeout: 246 seconds]
cp has joined #lima
jrmuizel has joined #lima
yuq825 has joined #lima
kaspter has quit [Read error: Connection reset by peer]
kaspter has joined #lima
dddddd has joined #lima
kaspter has quit [Quit: kaspter]
jrmuizel has quit [Remote host closed the connection]
jrmuizel has joined #lima
jrmuizel has quit [Remote host closed the connection]
Barada has quit [Quit: Barada]
yuq825 has quit [Ping timeout: 245 seconds]
jrmuizel has joined #lima
<rellla>
enunes: i can't remember if you did already, but can you guide me through how to set up shader-db for lima? i'm a bit lost ...
<enunes>
rellla: there should be no setup needed, you just use the 'run' application from https://gitlab.freedesktop.org/mesa/shader-db , collect stdout and then 'report.py' on the pc to generate a comparison report between two runs
<enunes>
just like the readme on that page
<rellla>
enunes: thanks. now that i read it more times and simply tried it, it works :)
<enunes>
I haven't been using the shaders from shader-db itself as they are rather complicated, so I just captured all piglit shaders using the method from the readme too
<anarsoul>
enunes: it would be a good idea to add glamor shaders
<anarsoul>
:)
nerdboy has joined #lima
<enunes>
anarsoul: yeah maybe it is even there, I think the main reason those more complicated shaders were still unusable is that all of them just aborted due to missing control flow, now we can probably start using them
nerdboy has quit [Changing host]
nerdboy has joined #lima
drod has joined #lima
drod has quit [Ping timeout: 248 seconds]
drod has joined #lima
<anarsoul>
enunes: btw, can you publish MR with your improved spilling?
<enunes>
anarsoul: I can make it WIP, there is still 1 spot where it doesn't avoid creating a new instruction
<anarsoul>
sounds good
<anarsoul>
I'm planning to work on merging load.v and load.u into successors this week
<anarsoul>
and I'd prefer to base it on top of your branch
<enunes>
ok
<enunes>
I'm not sure if it would be trivial to remove the remaining spot without merging the instruction, due to the forced vec4 spilling that we have right now
<enunes>
I'd be ok merging the ones I already have anyway before we have that
<anarsoul>
is it possible to detect it, mark it with 'no_spill' and try another reg?
<enunes>
I guess so, we could also try to only attempt the optimization if it is already vec4
<anarsoul>
it makes no sense to spill if it increases reg pressure anyway
<enunes>
the tricky part of having non-vec4 spills is that, if I recall correctly, there is no vec3 store, only vec1, vec2 and vec4, so some exception handling needs to happen anyway
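A minimal C sketch of the exception handling this implies, assuming (as recalled above, not confirmed) that the PP only has vec1, vec2 and vec4 temporary stores. `spill_store_components()` is a hypothetical helper, not a function from the lima source:

```c
#include <assert.h>

/* Hypothetical helper: picks the store width used to spill a value with
 * 'num_components' live components. Assumes only vec1, vec2 and vec4
 * stores exist, so a vec3 has to be widened to a vec4 store. */
static int spill_store_components(int num_components)
{
    assert(num_components >= 1 && num_components <= 4);
    return num_components == 3 ? 4 : num_components;
}
```

Widening vec3 to vec4 is the simplest choice; splitting it into a vec2 plus a vec1 store would also work but costs an extra instruction.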
<enunes>
but I'm doing something else for work now, so I'll just push it so you can also try it and I'll work on it later
<anarsoul>
ok
alyssa has joined #lima
<alyssa>
rellla: Ya rang?
<alyssa>
On Midgard, lowering to #0 is ~free since we get a free inline constant and 0 always inlines
<alyssa>
I guess you don't have that for PP so that changes a bit.
<alyssa>
OTOH, a lot of f(x, 0) can be optimized for all sorts of f, so it might still be a win
<alyssa>
fadd/fsub/fmul/iand/ixor/ior/inot/csel/fdot/fsum.... all go away/simplify with a zero
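To make the "f(x, 0) simplifies" point concrete, here is a small C sketch of what a few of those ops fold to once an undef source has been lowered to zero. In Mesa this kind of rewrite is done by NIR's algebraic optimizations; the enum and function below are hypothetical and only illustrate the identities:

```c
#include <assert.h>

/* Hypothetical subset of ALU ops, named after the NIR opcodes above. */
enum op { OP_FADD, OP_FMUL, OP_IAND, OP_IOR, OP_IXOR };

enum folded { KEEP, BECOMES_X, BECOMES_ZERO };

/* What op(x, 0) can fold to. */
static enum folded fold_with_zero(enum op op)
{
    switch (op) {
    case OP_FADD: /* x + 0.0 -> x (turns -0.0 into +0.0, so not exact) */
    case OP_IOR:  /* x | 0 -> x */
    case OP_IXOR: /* x ^ 0 -> x */
        return BECOMES_X;
    case OP_FMUL: /* x * 0.0 -> 0.0 (ignores NaN/Inf inputs) */
    case OP_IAND: /* x & 0 -> 0 */
        return BECOMES_ZERO;
    }
    return KEEP;
}
```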
<anarsoul>
alyssa: using any reg for undef is completely free
<alyssa>
Anyway, I was less interested in the opt than in eliminating UB in the output, which is not required by the spec but makes debugging easier
<anarsoul>
and no need to take const slot for that
<alyssa>
anarsoul: Run opt_algebraic/const fold after lowering to zero... cheaper than free.
<enunes>
looks like it even fixes a few tests, not sure how
<enunes>
and 3 timeouts which presumably were infinite loops without the first commit
<anarsoul>
enunes: I think we should improve spilling cost once again
<enunes>
I started working on something on that direction
<enunes>
to increase cost of registers in instructions that have load or store slots filled
<anarsoul>
basically walk through all instructions and mark regs that are used in instructions with busy load uniform and store temporary slots as unspillable
<anarsoul>
yeah, exactly
<anarsoul>
I don't think that it makes sense to spill these anyway
<anarsoul>
or even better:
<anarsoul>
walk through the instructions once, calculating register pressure at each instruction and storing it in some field of the instruction
<anarsoul>
if an instruction is at max reg pressure and has its uniform and store slots taken, mark its regs as unspillable
<anarsoul>
(that's in 2nd pass)
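The two-pass idea above could look roughly like this in C. The structs are hypothetical, heavily simplified stand-ins (the real ppir data structures differ), and "pressure" is approximated by the number of live registers at each instruction:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for ppir structures. */
struct instr {
    int num_live;              /* regs live at this instruction */
    int used_regs[4];          /* regs read or written by it */
    int num_used;
    bool uniform_slot_busy;    /* load uniform slot filled */
    bool store_temp_slot_busy; /* store temporary slot filled */
    int pressure;              /* filled in by pass 1 */
};

struct reg {
    bool unspillable;
};

static void mark_unspillable_at_max_pressure(struct instr *instrs, int n,
                                             struct reg *regs)
{
    int max_pressure = 0;

    /* Pass 1: record the register pressure at each instruction. */
    for (int i = 0; i < n; i++) {
        instrs[i].pressure = instrs[i].num_live;
        if (instrs[i].pressure > max_pressure)
            max_pressure = instrs[i].pressure;
    }

    /* Pass 2: spilling here would need new instructions and extra regs
     * exactly where the shader is already bottlenecked. */
    for (int i = 0; i < n; i++) {
        const struct instr *in = &instrs[i];
        if (in->pressure != max_pressure ||
            !in->uniform_slot_busy || !in->store_temp_slot_busy)
            continue;
        for (int j = 0; j < in->num_used; j++)
            regs[in->used_regs[j]].unspillable = true;
    }
}
```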
<enunes>
not sure about marking it unspillable, as the 'unspillable' information currently persists across regalloc attempts and that might change
<enunes>
so maybe just add a large cost to them
<anarsoul>
I think it does
<enunes>
I mean, the 'unspillable' flag probably shouldn't be set based on information that might change (max register pressure), as max register pressure may change between regalloc attempts
<enunes>
but we can probably achieve the same result setting a high cost, as the cost varies between runs
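A sketch of the cost-based alternative: the penalty is recomputed on every regalloc attempt, while the sticky flag is kept only for registers that must never be spilled. The struct, the `BUSY_SLOT_PENALTY` value and the "negative means never spill" convention are all assumptions made for this example, not ppir or Mesa behaviour:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-register data, rebuilt before each regalloc attempt. */
struct reg_info {
    int num_uses;       /* base cost: each use needs a reload */
    bool in_busy_instr; /* used by an instr with load/store slots filled */
    bool unspillable;   /* sticky across regalloc attempts */
};

/* Arbitrary large value, chosen for illustration only. */
#define BUSY_SLOT_PENALTY 1000.0f

/* Higher cost makes the allocator less likely to pick this reg. Since the
 * penalty is derived from per-attempt data, it can go away when register
 * pressure changes, unlike the sticky flag. */
static float spill_cost(const struct reg_info *r)
{
    if (r->unspillable)
        return -1.0f; /* convention in this sketch: never spill */
    float cost = (float)r->num_uses;
    if (r->in_busy_instr)
        cost += BUSY_SLOT_PENALTY;
    return cost;
}
```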
<anarsoul>
yeah, maybe
<anarsoul>
anyway, creating new instructions and using a reg for loading or storing temporaries increases reg pressure in an already bottlenecked place
<anarsoul>
enunes: next step would be to clone load.v and load.u for each user
<enunes>
I thought we were already doing that from the cf MR?
<anarsoul>
no, we still create a mov
<anarsoul>
but we clone them for each block
<enunes>
that should be fine as the created register is only alive for 2 instructions
<anarsoul>
we need to clone them for each instruction when possible
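A rough C sketch of "clone the load for each user": instead of one load.v/load.u whose result goes through a mov into a register, each consumer gets its own copy of the load node, so the load could be merged into that consumer's instruction. The `node` struct and `clone_load_per_user()` are hypothetical, and the "when possible" checks (slot availability, etc.) are left out:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified node; not the real ppir_node. */
struct node {
    int kind;         /* e.g. a load.v or load.u opcode */
    int index;        /* varying/uniform index */
    struct node *src; /* single source, for consumers */
};

/* Give every user its own copy of 'load' so no shared register is needed. */
static void clone_load_per_user(const struct node *load,
                                struct node **users, int num_users)
{
    for (int i = 0; i < num_users; i++) {
        struct node *copy = malloc(sizeof *copy);
        assert(copy != NULL);
        *copy = *load;
        users[i]->src = copy;
    }
}
```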
<anarsoul>
enunes: that's a disaster for texture coords anyway
<enunes>
same for the instruction in the created reg for spilling
<anarsoul>
we've got only 10 bits of precision if it gets stored into a reg
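Why 10 bits hurts texture coordinates can be shown numerically. This sketch assumes a register holds an fp16-style value with 10 explicit mantissa bits and emulates that by rounding a float's 23-bit mantissa to 10 bits (round to nearest even). `round_to_10_bit_mantissa()` is a hypothetical helper; fp16's smaller exponent range, NaN and Inf are ignored:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Keep 10 of the 23 float mantissa bits, rounding to nearest even. */
static float round_to_10_bit_mantissa(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint32_t lsb = (bits >> 13) & 1u; /* lowest mantissa bit that is kept */
    bits += 0x0FFFu + lsb;            /* round half to even */
    bits &= ~0x1FFFu;                 /* drop the low 13 bits */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

For a 4096-texel-wide texture, u = 3073/4096 rounds to 0.75 (3072/4096), so two neighbouring texel coordinates become indistinguishable once the coordinate goes through such a register.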
<enunes>
for some reason ideas needs ~20 registers from top to bottom
<enunes>
even across blocks
<enunes>
I saw the blog you pasted, interesting indeed
<anarsoul>
enunes: yeah, look at glsl source and you'll understand why :)
<anarsoul>
enunes: I actually hit this issue with X11 or any wayland compositor :)
<anarsoul>
I thought that it was caused by my recent change that introduced a struct for texture descriptor
<anarsoul>
(and I actually found a bug there - but it's not related)
<enunes>
I think I removed the optimization where we had the special case for load tex coords
<anarsoul>
enunes: btw, offline compiler uses positive indices for temporaries for ideas-lamp-lit.frag in some cases
<anarsoul>
my guess is that it's similar to python, i.e. positive indices address it from the beginning, negative ones from the end
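The guessed addressing scheme (unconfirmed) is easy to state as code. `num_temps`, the size of the temporary area, and `resolve_temp_index()` are hypothetical names for illustration:

```c
#include <assert.h>

/* Python-like indexing, as guessed above: non-negative indices count
 * from the start of the temporary area, negative ones from its end. */
static int resolve_temp_index(int index, int num_temps)
{
    int slot = index >= 0 ? index : num_temps + index;
    assert(slot >= 0 && slot < num_temps);
    return slot;
}
```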
<anarsoul>
also it uses sum3()
<anarsoul>
and relative indexing of temporaries
<anarsoul>
enunes: why do you create load node in ppir_update_spilled_dest()?
<enunes>
anarsoul: I think it's required for when there is a register and some instructions only update some components of the register, the other components need to be preserved
<enunes>
I suppose at this optimization stage it might be avoidable too if it's ssa and not register
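A C model of why the load is needed: a write that covers only some components of a spilled register has to reload the slot, merge the new components in and store the result back, or the untouched components are lost. Treating an SSA destination as always fully written is the assumption from the message above. All names here are hypothetical, not the actual ppir_update_spilled_dest() code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a vec4 spill slot. */
struct vec4 { float c[4]; };

/* A partial write to a (non-SSA) register needs the slot reloaded first. */
static bool needs_reload(unsigned write_mask, bool is_ssa)
{
    return !is_ssa && (write_mask & 0xFu) != 0xFu;
}

/* load slot -> overwrite only the masked components -> store back */
static struct vec4 merge_partial_write(struct vec4 loaded, struct vec4 value,
                                       unsigned write_mask)
{
    struct vec4 merged = loaded;
    for (int i = 0; i < 4; i++)
        if (write_mask & (1u << i))
            merged.c[i] = value.c[i];
    return merged;
}
```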
<anarsoul>
oh
drod has quit [Quit: I'm leaving you (xchat 2.4.5 or later)]
jrmuizel has quit [Remote host closed the connection]
Da_Coynul has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]