<cwabbott>
rellla: no, that assumption is correct... it's only talking about within a single instruction, in the case where you have something like r5 = add r5, r6
<cwabbott>
that gets expanded to something like "read r5; read r6; add; write r5"
<cwabbott>
it assumes that the write gets placed before the read, which is the case because we schedule bottom-up
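A minimal sketch of why bottom-up scheduling places the write before the reads. It uses a toy dependency graph for `r5 = add r5, r6`; the node names and the dict representation are hypothetical, not gpir's actual data structures. The scheduler walks from the end of the program, so a node becomes ready only after all of its users have been placed.

```python
# Toy dependency graph for "r5 = add r5, r6" (hypothetical node names):
# each node maps to the nodes that consume its result.
users = {
    "read r5": ["add"],
    "read r6": ["add"],
    "add": ["write r5"],
    "write r5": [],
}

def schedule_bottom_up(users):
    """Return nodes in the order a bottom-up list scheduler places them.

    A node is ready once every one of its users has already been placed,
    so consumers are always placed before their producers.
    """
    placed = []
    remaining = list(users)
    while remaining:
        # Pick the first ready node; a real scheduler would use a priority heuristic.
        ready = [n for n in remaining if all(u in placed for u in users[n])]
        node = ready[0]
        placed.append(node)
        remaining.remove(node)
    return placed

order = schedule_bottom_up(users)
print(order)  # ['write r5', 'add', 'read r5', 'read r6']
```

Because the write of r5 is placed before the read of r5, code that assumes this ordering within one instruction holds by construction.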
<cwabbott>
anarsoul|c: and yes, you can have two writes to different things that point to the same ALU slot, and in that case the disassembler will print out something like that
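A toy model of that situation. It assumes an instruction whose write fields each name the ALU slot they take their value from; the field names, slot name and print format are hypothetical and are not the real GP encoding or disassembler output. Nothing stops two writes from naming the same slot, so the same ALU result shows up on two lines of the listing.

```python
from dataclasses import dataclass

@dataclass
class Write:
    dest: str      # hypothetical destination, e.g. a register or a store
    src_slot: str  # ALU slot whose result is written

def disassemble(slots, writes):
    """Return one line per write, naming the ALU op it reads from."""
    lines = []
    for w in writes:
        lines.append(f"{w.dest} = {slots[w.src_slot]}  ({w.src_slot})")
    return lines

slots = {"add0": "add $1.x, $2.x"}
writes = [Write("r5", "add0"), Write("store.x", "add0")]
for line in disassemble(slots, writes):
    print(line)
```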
<rellla>
cwabbott: did you mean anarsoul with your first ping?
<anarsoul|2>
rellla: thanks
<rellla>
anarsoul|2: I had an issue in mali_syscall_tracker with GP uniform decoding. I'm doing new dumps now, including your gpir patch, for better readability...
<anarsoul|2>
cwabbott: thanks for comment on write/read order. What about missing movs? I don't understand how we're tracking dependencies if we don't have a node for mov
<rellla>
btw, we have sum3 in the dumps...
<anarsoul|2>
rellla: heh. So it's some new op?
<anarsoul|2>
that looks like sum3 but not sum3?
<cwabbott>
rellla: iirc sum3 is used for dot products
<cwabbott>
so if you use dot() it should show up
<cwabbott>
(of vec3's obviously)
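A minimal sketch of that lowering. It assumes `dot(a, b)` on vec3s becomes three component-wise multiplies feeding a single `sum3`; the textual IR format and temporary names are illustrative only, not gpir or blob syntax.

```python
def lower_dot3(a, b, dest):
    """Lower dot(a, b) for vec3 operands into per-component muls plus one sum3."""
    ops = []
    temps = []
    for i, c in enumerate("xyz"):
        t = f"t{i}"
        ops.append(f"mul {t}, {a}.{c}, {b}.{c}")
        temps.append(t)
    # The three products are reduced by a single three-input sum.
    ops.append(f"sum3 {dest}, {', '.join(temps)}")
    return ops

for op in lower_dot3("$1", "$2", "d"):
    print(op)
```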
<anarsoul|2>
cwabbott: see scrollback, rellla found that the blob uses another undecoded op
<cwabbott>
anarsoul|2: not sure exactly what you're asking about
<rellla>
cwabbott: so it's not ($1.x + $1.y + $1.z) ?
<cwabbott>
anarsoul|2: yeah, no idea about that one
<anarsoul|2>
op18.v1
<anarsoul|2>
cwabbott: do you by any chance remember why the scheduler doesn't expect movs?
<cwabbott>
rellla: if you have a good guess about what it is, you can try to create a simple shader that exhibits it and see if you can get offline-shader-compiler to emit it
<rellla>
see original shader and disassembled code.
<cwabbott>
rellla: my guess is that it's like sum3 but with different precision
<cwabbott>
it's possible that sum3 is actually higher-precision than just naively doing (x + y) + z
<cwabbott>
since otherwise it's the same
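To make the precision hypothesis concrete, here is a sketch comparing a naive float32 `(x + y) + z` with a sum3 that is *assumed* to keep its intermediate sum in higher precision and round only once. Python floats are doubles, so `struct` is used to round to float32. Whether the Mali GP's sum3 actually behaves like `wide_sum3` is unconfirmed; the function names are hypothetical.

```python
import struct

def f32(v):
    """Round a Python float (double) to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]

def naive_add3(x, y, z):
    # (x + y) + z with a float32 rounding after every add.
    return f32(f32(x + y) + z)

def wide_sum3(x, y, z):
    # Hypothetical higher-precision sum3: accumulate in double, round once at the end.
    return f32(x + y + z)

# 2**-24 is half an ulp of 1.0 in float32, so each separate add rounds
# (ties-to-even) back down to 1.0, while the combined 2**-23 survives.
x, y, z = 1.0, 2.0**-24, 2.0**-24
print(naive_add3(x, y, z))  # 1.0
print(wide_sum3(x, y, z))   # 1.0000001192092896 (1 + 2**-23)
```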
<rellla>
a loop like for (int i = 0; i < 4; i++) { res += tmp[i]; } uses op18 with an additional add of $.w on top. Taking just 2 components leads to a simple add.
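The observed pattern can be mimicked with a greedy lowering that uses a three-input add while at least three terms remain and a plain two-input add for the rest. The name `add3` here is a hypothetical stand-in for the undecoded op18, and this only reproduces the output seen in the dumps; whether the blob compiler really works this way is an assumption.

```python
def lower_sum(terms):
    """Greedily lower a left-to-right sum of scalar terms.

    Uses a three-input add ("add3", standing in for op18) while at least
    three terms remain, then a two-input "add" for the last pair.
    """
    ops = []
    terms = list(terms)
    n = 0
    while len(terms) > 1:
        t = f"t{n}"
        n += 1
        if len(terms) >= 3:
            a, b, c = terms[:3]
            ops.append(f"add3 {t}, {a}, {b}, {c}")
            terms = [t] + terms[3:]
        else:
            a, b = terms
            ops.append(f"add {t}, {a}, {b}")
            terms = [t]
    return ops

print(lower_sum(["$1.x", "$1.y", "$1.z", "$1.w"]))  # add3 over x, y, z, then add of w
print(lower_sum(["$1.x", "$1.y"]))                  # just a simple add
```

Keeping the left-to-right grouping matches "do exactly what the user wrote", which is why the fourth component ends up in a separate add on top.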
<rellla>
seems I need some tests to find that out...
<rellla>
imho there must be a difference, because we have sum3 and op18 in the blob dumps
<cwabbott>
ok, so my hypothesis is probably right then... when the shader uses dot() they take the liberty to compute the intermediate results in higher precision, but when you manually do it they do exactly what the user says
<cwabbott>
and op18 is just a faster (but still low-precision) way of doing exactly what the user says
<rellla>
ok, so op18 could be a lower-precision sum3 then: not a dot(), but an add(1,2,3)
<cwabbott>
yeah
<rellla>
ok, understood
<cwabbott>
anarsoul|2: we don't use movs because they're never necessary before the scheduler
<cwabbott>
there are some situations where they're always necessary (e.g. a load feeding directly into a store), but the scheduler has to handle those on the fly anyway, since they can also be generated when spilling
<cwabbott>
so inserting movs beforehand doesn't actually save any complexity, and making the scheduler handle pre-inserted movs would just add more
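A minimal sketch of handling such a case on the fly. It assumes a toy rule that a store cannot take its value directly from a load; the node representation, the rule and the function names are hypothetical simplifications, not the actual gpir scheduler. When the scheduler places a store whose source is a load, it emits a `mov` between them, so the IR never needs to contain mov nodes beforehand.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str
    srcs: list = field(default_factory=list)  # producers this node reads from

def needs_mov(consumer, producer):
    # Toy rule (assumption): a load result cannot feed a store directly.
    return consumer.op == "store" and producer.op == "load"

def schedule_with_movs(root):
    """Place nodes consumer-first, inserting a mov only where needs_mov() fires."""
    placed = []

    def visit(node):
        placed.append(node.op)
        for src in node.srcs:
            if needs_mov(node, src):
                placed.append("mov")  # created on the fly; the IR never contained it
            visit(src)

    visit(root)
    return placed

load = Node("load")
print(schedule_with_movs(Node("store", [load])))                 # ['store', 'mov', 'load']
print(schedule_with_movs(Node("store", [Node("add", [load])])))  # ['store', 'add', 'load']
```

Since the same on-the-fly path is needed for spill code anyway, pre-inserting movs would only duplicate this logic.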