<rellla>
i think all of the 6 "regressions" accidently passed before and simply expose the bugs now, so i think we have no real regressions.
<enunes>
rellla: do you see new fixes with those patches too?
<enunes>
still, my series still needs to be improved and it's not completely clear why they fix some of the tests, so I still need to work on them before MR
<anarsoul>
rellla: enunes: so I used lima with desktop apps a bit yesterday
<anarsoul>
and I'd say we have *huge* frame latency
<anarsoul>
it's noticable with mpv playing some video
<anarsoul>
mpv shows no drops and video is smooth enough
<anarsoul>
but there's a/v desync
<anarsoul>
i.e. video is a bit late
<anarsoul>
I'd say it's *at least* couple frames late if I can notice that
megi has quit [Ping timeout: 272 seconds]
megi has joined #lima
buzzmarshall has quit [Remote host closed the connection]
<anarsoul>
btw, q3a has quite interesting artefacts with gl2 renderer
<enunes>
anarsoul: hmm it seems we might have to assign registers to undefs, we create undefs when they are read before write, but it may be in a loop so after the first iteration they are not undef anymore
<anarsoul>
oh
<anarsoul>
hold on
<anarsoul>
undef is ssa
<anarsoul>
it can't be assigned in loop
<enunes>
we also create ppir_op_undef in ppir_node_add_src
<enunes>
/* Reg is read before it was written, create a dummy node for it */
<enunes>
this seems to happen in that regression
<anarsoul>
ah
<anarsoul>
then we need to split ppir_op_undef and ppir_op_dummy
<anarsoul>
and use ppir_op_dummy in this case that would get a reg
<anarsoul>
looks like it wasn't a good idea to merge them
<enunes>
can we still treat this case as undef or dummy? it seems to require an actual register like anything else
<enunes>
since the value needs to be saved
<anarsoul>
no, this case is dummy
<anarsoul>
not undef
<enunes>
it's written later on and the next read will be valid
<anarsoul>
dummy needs register to be assigned
<anarsoul>
undef does not
<enunes>
so what is the difference between dummy and "normal"?
<anarsoul>
"dummy" is just a placeholder for ppir_dest since we don't actually have a node for src yet
<anarsoul>
yet we need to allocate a register for it
<enunes>
well it is defined in the beginning of the program as a nir register so I think we have space for it
<anarsoul>
"undef" is also a placeholder for ppir_dest, but we don't need a register for it (any register would do)
<anarsoul>
enunes: that's just implementation details
<anarsoul>
we keep ppir_node pointer in src to the node that does write
<anarsoul>
but for "dummy" we don't have this node yet
* rellla
feels guilty :p
<enunes>
I think that pointer is only valid for ssa
<anarsoul>
is it?
<anarsoul>
enunes: feel free to rewrite it if you have better solution in mind :)
<enunes>
I actually just found, this discussion was good as at least I understand why that part is there
<enunes>
just found that out*
<enunes>
need to check if we actually pre-reserve space for the nir declared registers/ssa
<enunes>
if so (or if we can do that) then we might leave undefs as-is but not use then in this case
<anarsoul>
enunes: oh, we need a node to add dependency if necessary
<anarsoul>
nope, we don't
<anarsoul>
in theory you can change "ppir_node_target_assign" to accept ppir_dest instead
<anarsoul>
and deal without undef for read before write
<anarsoul>
I'm not sure if it'll break anywhere else
<enunes>
I'll proceed looking into it
<anarsoul>
rellla: don't be, bugs happen :)
<anarsoul>
also it was my suggestion to merge them
<anarsoul>
(undef and dummy)
<rellla>
yeah, i wonder who the reviewers have been :)
<anarsoul>
:P
<anarsoul>
you can't foresee everything
<anarsoul>
and tests didn't catch it
<anarsoul>
Connor's branch gpir fixes OOMs in CI :(
* enunes
wants to completely refactor ppir now too
<anarsoul>
:)
<enunes>
I guess we could go with separating dummy and undef for now... and *then* refactor
<enunes>
otherwise we likely miss 20.0
<anarsoul>
yeah
<enunes>
I guess another "quick" solution would be to emit all nodes first with dest only
<enunes>
and then go through all nodes and resolve src
<anarsoul>
that adds extra pass
<anarsoul>
so I guess variant with splitting dummy and undef should perform better
<enunes>
probably not a big issue with ~100 nodes average
<enunes>
each pass would do half the work anyway
<enunes>
I don't know if separating actually will work though, since the two instructions will end up referring to different ppir_reg
<anarsoul>
it's gonna be the same ppir_reg
<anarsoul>
just use dummy for read before write
<anarsoul>
and undef only for undef
<enunes>
so the read before write will read from the reg in the dummy node, but then the actual write before the read will write to the same dummy node?
<enunes>
sorry, write *after* the read
<anarsoul>
enunes: but the point is that reg in dummy node is supposed to be the same reg
<enunes>
yeah if that actually happens, then it will work
dddddd has quit [Ping timeout: 258 seconds]
<anarsoul>
enunes: it should be the case for register
<anarsoul>
i.e. for nir_register
<anarsoul>
since for ssa it's not possible to have read before write
<enunes>
I was just afraid the dummy node would end up creating a separate ppir_reg than the write uses later
<enunes>
but it's not the case, it will work, I'll try that
<anarsoul>
see "nir_register *reg = ns->reg.reg"
buzzmarshall has joined #lima
gcl has quit [Ping timeout: 265 seconds]
anarsoul has quit [Remote host closed the connection]