<enunes>
could still check the last patch for a missing dep in ppir scheduler, but too late for today
<anarsoul>
yeah, last patch looks fishy
<anarsoul>
could you submit an MR with first 3 patches?
<anarsoul>
feel free to add my r-b to 1-3
<anarsoul>
(you also may need to fix deqp-lima-fails.txt)
<enunes>
I suppose deqp-lima-fails.txt is done per-patch? or are people just adding a patch on top of the series with overall fixes?
<anarsoul>
per-patch is cleaner
<anarsoul>
but probably more time consuming
<enunes>
yeah in that case I guess tomorrow I'll do that
<enunes>
I guess I'll submit anyway to see if it finds the same thing as my run
<anarsoul>
you can just push 3 branches
<anarsoul>
gather unexpected passes
<anarsoul>
and then finish it tomorrow :)
<anarsoul>
if you push a branch prefixed with ci- you can start jobs you need manually
<anarsoul>
see "pipelines" in your repo
<anarsoul>
enunes: btw I'm not sure how we're going to add depth/stencil writes to ppir
<enunes>
I never looked into that either
<anarsoul>
so currently we're using $0 for gl_FragColor
<anarsoul>
technically it can be any other register; the register number is configured in the rsw
<anarsoul>
(I wrote about it a few months ago - accidentally discovered it while implementing dithering)
<enunes>
yeah, the way we use $0 is pretty weird: we just leave it completely out of regalloc, which leaves it at 0
<anarsoul>
yeah
<anarsoul>
so I assume for depth/stencil it's similar
<enunes>
I considered adding other register classes to output registers and ensure they are what we want
<anarsoul>
we configure which reg to use in rsw
<anarsoul>
and it's written out as depth/stencil from this register when the shader terminates
<anarsoul>
but
<anarsoul>
ppir assumes that st_col is last op
<anarsoul>
and it won't be true anymore with st_zs
<enunes>
I investigated whether we could get rid of st_col completely and just mark registers as output registers and use that to end the program
<anarsoul>
yeah, that's what we need
<anarsoul>
or something similar
<enunes>
but there are all sorts of corner cases with discard or jumping to that st_col, so it needs to exist
<enunes>
I guess having register classes for outputs and labeling the registers properly so regalloc assigns them the registers we want doesn't require a major redesign
<anarsoul>
I think we still have to get rid of the requirement that it be the last node
<anarsoul>
enunes: whichever register works.
<anarsoul>
it doesn't have to be reg 0 for color
<anarsoul>
I already tried that by changing it to 1
<anarsoul>
I guess we should just adjust its liveness accordingly (from the instr to the end), let regalloc do its job, and then provide the number to the upper level
<enunes>
that should work too
<anarsoul>
there aren't a lot of unused bits in the rsw
<anarsoul>
reg number is 4 bits
<anarsoul>
:)
<enunes>
sounds good, do you see this as more priority than ppir general improvements like scheduler?
<anarsoul>
so if you change "render->multi_sample = 0x0000F807" to 0x1111F807, gl_FragColor will be taken from $1
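From the two values above (0x0000F807 selects $0, 0x1111F807 selects $1) and the earlier remark that the register number is 4 bits, it looks like the output register is replicated into the four high nibbles of the multi_sample word. A minimal sketch under that assumption; the field layout is a guess from those two observed values, not from documentation:

```c
#include <assert.h>
#include <stdint.h>

/* Hedged sketch: build the rsw multi_sample word for a given output
 * register. Assumes the 4-bit register number is simply replicated into
 * the four high nibbles, with the low bits fixed as observed for $0. */
static uint32_t rsw_multi_sample(unsigned out_reg)
{
    assert(out_reg < 16);            /* register number is 4 bits */
    uint32_t word = 0x0000F807u;     /* low bits as observed for $0 */
    for (unsigned i = 0; i < 4; i++)
        word |= (uint32_t)out_reg << (16 + 4 * i);
    return word;
}
```

With out_reg = 1 this reproduces the 0x1111F807 value quoted above.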
<anarsoul>
enunes: well, I think that it may have influence on scheduler redesign
<enunes>
must look nice taking color from $1 with the current implementation
<anarsoul>
:D
<anarsoul>
try it
<anarsoul>
it's like white noise projected on something
<anarsoul>
enunes: looks like your change isn't based on master
<anarsoul>
rebase? :)
<anarsoul>
(it won't run lima jobs in CI)
<enunes>
yeah that kinda works, some programs just show as red and others have some random noise
<anarsoul>
I asked Adam to share LIMA_DEBUG=dump,gp for the same mesa that I have (at 3abfde13be198449230e48c5f277e0b62a0e96c4)
<anarsoul>
I generated mine in archlinux chroot
<anarsoul>
the shader binary in my case is ~30% larger
<anarsoul>
let's see what we get from nir :)
<anarsoul>
fragment shader is the same
<anarsoul>
(but it's expected since it's trivial)
<enunes>
so is there a plausible explanation for why it would be different in your 32-bit chroot and in theirs?
<anarsoul>
not yet
<anarsoul>
waiting for them to send the logs with exactly the same mesa as I'm using
<enunes>
anarsoul: yeah, I'm not seeing any easy way around that problem; nir dce doesn't detect it as dead code while still in SSA form because of the vec4 instruction, and after it converts the vec4 to reg movs, nir dce doesn't work on non-SSA regs
<enunes>
it doesn't really matter where the ppir scheduler places it
<enunes>
instruction 6 is a dead load just like those; since nothing conflicts with it, regalloc assigns it any register, which may be, say, $0, which might contain an actual value
<anarsoul>
but it's a bug in regalloc then
<anarsoul>
so let's look at nir
<enunes>
well yeah the algorithm for liveness analysis assumes that dead code like that doesn't exist, which is pretty much always true
<enunes>
or we can detect these corner cases and put them in a live set, so regalloc won't assign a live register to them
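Detecting that corner case can fall out of the usual backward walk: any def whose bits are not in the current live set is a dead write. A hedged sketch with illustrative structures, not ppir's actual ones:

```c
#include <assert.h>
#include <stdint.h>

/* Registers are bits in a mask; use = regs read, def = regs written.
 * These types are illustrative, not ppir's real data structures. */
struct instr {
    uint32_t use;
    uint32_t def;
};

/* Walk the block backwards; a def that is not live out of its own
 * instruction is dead code. Recording it lets the allocator keep the
 * dead result from sharing a register with values live at that point. */
static void find_dead_defs(const struct instr *insns, int n, uint32_t *dead)
{
    uint32_t live = 0;  /* nothing is live out of the last instruction */
    for (int i = n - 1; i >= 0; i--) {
        dead[i] = insns[i].def & ~live;  /* written but never read later */
        live = insns[i].use | (live & ~insns[i].def);
    }
}
```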
<anarsoul>
so I don't really understand where the conflict is coming from
<anarsoul>
are you calculating per-component live sets for registers?
<anarsoul>
even if you do
<anarsoul>
r0.y conflicts with r0.x
<anarsoul>
since r0.x is live 9-13
<enunes>
registers are carried on by masks
<enunes>
and analysis is backwards
<enunes>
so something at the next instruction needs to read it so that it's part of that instruction's live_in
<anarsoul>
but you also need to update live_out if something in current instruction writes to the reg
<anarsoul>
well
<enunes>
no because live_in from the next instruction is propagated as live_out to the previous instruction
<enunes>
so it comes live, and dies there when it is written
<anarsoul>
let me guess, our granularity doesn't allow us to specify this live set? :)
<enunes>
it's just the life cycle starts when it is read, and we propagate it up until it is written
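The backward walk being described follows the classic dataflow equations: live_in = use ∪ (live_out ∖ def), with each instruction's live_in propagated as the previous instruction's live_out. A minimal sketch on a straight-line block, with illustrative structures rather than ppir's actual ones:

```c
#include <assert.h>
#include <stdint.h>

/* Registers are bits in a mask; use = regs read, def = regs written.
 * Illustrative types, not ppir's real data structures. */
struct instr {
    uint32_t use;
    uint32_t def;
};

/* Backward liveness over a single basic block:
 *   live_out[i] = live_in[i + 1]   (0 for the last instruction)
 *   live_in[i]  = use[i] | (live_out[i] & ~def[i])
 * A value becomes live where it is read and dies at its write. */
static void liveness(const struct instr *insns, int n,
                     uint32_t *live_in, uint32_t *live_out)
{
    uint32_t live = 0;               /* live_out of the last instruction */
    for (int i = n - 1; i >= 0; i--) {
        live_out[i] = live;
        live_in[i] = insns[i].use | (live & ~insns[i].def);
        live = live_in[i];
    }
}
```

A read and a write of the same reg in one instruction keeps it in live_in, matching "the read will prevail" above.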
<anarsoul>
what if it's written several times?
<anarsoul>
enunes: hold on
<anarsoul>
what happens if we read and write in the same instruction?
<enunes>
the read will prevail and it will stay in the live set
<enunes>
until the "first" write
<anarsoul>
but write is in this instr
<enunes>
but it was never read; since the analysis runs backwards, it was never part of any set
<anarsoul>
I guess write should also start life cycle but at the same time end it
<enunes>
it's what my patch does pretty much
<enunes>
but I can't put it into live_in otherwise it will be propagated up looking for another write
<anarsoul>
but what do you do if we have multiple writes?
<anarsoul>
and single read
<enunes>
I think ppir is pretty much all broken then; not sure that ever happens with nir
<enunes>
I think nir will just coalesce them and put a phi node
<anarsoul>
that's for ssa
<anarsoul>
but phi is then converted to a reg
<anarsoul>
enunes: we definitely can have multiple writes to a reg from different blocks
<champagneg>
hi, i'm still using arm's libmali, but I was wondering if it's normal/expected to see up to 3 "PLBU needs more heap memory" interrupts per frame? If it is not, is this something lima would fix?
<anarsoul>
otherwise it would be ssa, not nir reg
<anarsoul>
champagneg: we don't provide support for the blob
<anarsoul>
ask ARM
<enunes>
I tried to force that but nir always resolves it into a single write; we can also have multiple writes in the case of loops, but liveness is designed to handle that
<enunes>
in that case it's not ssa
<champagneg>
anarsoul: yeah, I get that. I was wondering if anyone with more architecture knowledge would know whether lima currently does this back and forth to allocate more memory for the PLBU.
<anarsoul>
something like: uniform int a; void main() { float f; if (a == 1) { f = 1.0; } else { f = 2.0; } gl_FragColor = vec4(f, 0.0, 0.0, 1.0); } should do multiple writes
<anarsoul>
enunes: ^^
<anarsoul>
champagneg: currently we allocate 1mb for the tile heap; there are kernel and mesa patches to support a growing heap (up to 16mb)
<enunes>
anarsoul: that is far too simple; it resolves to an fcsel after the if