jrmuizel has quit [Remote host closed the connection]
jrmuizel has joined #lima
dddddd has quit [Remote host closed the connection]
deesix has quit [Ping timeout: 252 seconds]
deesix has joined #lima
jrmuizel has quit [Remote host closed the connection]
Wizzup has quit [Ping timeout: 248 seconds]
Wizzup has joined #lima
Wizzup has quit [Ping timeout: 272 seconds]
Wizzup has joined #lima
Wizzup has quit [Ping timeout: 245 seconds]
Wizzup has joined #lima
BenG83 has quit [Ping timeout: 248 seconds]
Barada has joined #lima
Barada has quit [Read error: Connection reset by peer]
hoijui has joined #lima
Barada has joined #lima
guillaume_g has joined #lima
hoijui has quit [Quit: Leaving]
adjtm has quit [Ping timeout: 245 seconds]
cwabbott has joined #lima
senilius has joined #lima
ggardet has joined #lima
adjtm has joined #lima
guillaume_g has quit [Ping timeout: 244 seconds]
<senilius>
Even though the info may look vague to you, it is the hardware that determines the behavior. Hence inter-CTA the last LSU access wins, and intra-CTA the last LSU access fails, out of two accesses done consecutively/back to back.
<senilius>
highly deterministic and not really black magic, right? but then fences need not be used.
<senilius>
since that is deterministic both with texture unit samplers and without, one can instruct the chip to schedule using a little logic to do cache-distance based selection on a wfid row
<senilius>
i forgot to mention that those two accesses go to the very same location, of course
<senilius>
then an FU becomes pinned to queues, and the instruction is preloaded into certain regs of the register file as indices
ggardet has quit [Quit: Konversation terminated!]
ggardet has joined #lima
ggardet is now known as guillaume_g
<senilius>
even though some earlier chips are missing address calculator stride and base components, they still have index buffer arrays.
rabend has joined #lima
senilius has quit [Read error: Connection reset by peer]
rabend has quit [Ping timeout: 250 seconds]
afaerber has joined #lima
<rellla>
omg, he's back.
Hell__ has joined #lima
Hell__ has left #lima [#lima]
Hellsenberg is now known as hellsenberg
<hellsenberg>
oh dear
libv has quit [Ping timeout: 258 seconds]
libv has joined #lima
jrmuizel has joined #lima
dddddd has joined #lima
jrmuizel has quit [Remote host closed the connection]
adjtm has quit [Remote host closed the connection]
jrmuizel has joined #lima
jrmuizel has quit [Remote host closed the connection]
jrmuizel has joined #lima
jrmuizel has quit [Remote host closed the connection]
jrmuizel has joined #lima
monstr has joined #lima
jbrown has quit [Ping timeout: 258 seconds]
jbrown has joined #lima
Barada has quit [Quit: Barada]
monstr has quit [Remote host closed the connection]
adjtm has joined #lima
sibelius has joined #lima
<sibelius>
having looked at glsl_to_tgsi.cpp, this index buffer stuff is more complex than using indirections; be that file abandoned or not, it is just for reference for now, and nicely documented.
<sibelius>
it very much looks like hardware potential is always intentionally handicapped by sw, with pretty enormous stunts there to do that.
<sibelius>
the comparison being running the hw in the very first gear, or in a so to speak lobotomized way, to put it in human terms.
adjtm has quit [Quit: Leaving]
<sibelius>
the open source movement is doing a disservice by following commercial stacks and the spec. actually there are a bunch of talented or hardworking men among you, and they probably already understand what the hw is actually capable of; it is just making things hard for the users, especially the communication with proprietary hw vendors
sibelius has quit [Quit: Leaving]
guillaume_g has quit [Quit: Konversation terminated!]
sibelius has joined #lima
<sibelius>
it drops the second access intra-CTA because one compute unit (or SM equivalent) only has a single LSU, and the address calculator is done with continuous assignment; however, two accesses to the same location going to different CUs will apply the second access instead.
<sibelius>
this can be demonstrated with a little verilog program. LSUs are instanced for global memory in CUDA for example, even though there are more texture units then global memory LSUs per SM clusters, texture based access is deprecated.
<anarsoul>
sibelius: get lost
<sibelius>
*than, but anyhow this is the major clue or hw inherent inevitably avoidable trick to real scheduling with some changes.
<sibelius>
*unavoidable
<sibelius>
this year i may do the prototype and guidelines on how to lift various drivers to glory; however, i get exhausted too, tired of doing this all on my own, and cannot really waste so much time anymore.
drod has joined #lima
sibelius has quit [Remote host closed the connection]
rabend has joined #lima
drod has quit [Read error: Connection reset by peer]
drod has joined #lima
rabend has quit [Quit: Leaving]
adjtm has joined #lima
afaerber has quit [Quit: Leaving]
jrmuizel has quit [Remote host closed the connection]
drod has quit [Read error: Connection reset by peer]