<HdkR>
Sadly 72 RK3399 boards is a different type of scaling than what I'm looking for :P
<HdkR>
Hoping for something like Neoverse V1 upgrade to the Honeycomb LX2K board. Planning on a couple Orin Jetsons...
<HdkR>
Or N2 I guess
stikonas_ is now known as stikonas
wwilly_ has joined #panfrost
wwilly has quit [Ping timeout: 265 seconds]
gcl has joined #panfrost
camus has joined #panfrost
kaspter has quit [Ping timeout: 265 seconds]
camus is now known as kaspter
<HdkR>
Dang it, now I'm sleepily looking at how to deploy ctest things across a cluster. Nope. nooooope
<amonakov>
on Bifrost, is a "warp" dispatched simultaneously to "lanes", or is it dispatched to one lane on consecutive cycles (and therefore different lanes handle different warps)?
<amonakov>
we've investigated instruction issue on g72 and g76 a bit, and from what we see, each pipe can accept an instruction from a particular thread each 12'th cycle
<amonakov>
which makes sense on g72 with 4-wide warps (cycles over 3 warps per lane), but doesn't on g76 (8-wide warps); apparently on g76 "each 12'th cycle" becomes "each 16'th cycle" in case there's another warp
<amonakov>
this is a bit of a mindfuck, but I suspect this would not be the first time "Mali" and "crazy" go hand in hand
<cwabbott>
amonakov: all lanes are dispatched at the same time, the "every 12'th cycle" bit probably comes from the pipeline depth
<cwabbott>
from the ISA's perspective, each instruction starts immediately after the previous one finishes
<cwabbott>
specifically the write stage of an instruction overlaps with the read stage of the next
<amonakov>
cwabbott: but for simultaneous dispatch to work, the core would need to cycle over 12 warps to cover that, and t registers would need to be queues of about 6 entries or so
<cwabbott>
obviously everything in between doesn't happen in one cycle, so instructions for each thread are only dispatched on every n'th cycle where n depends on the pipeline depth (and apparently is 12 on g72) and you need n quads/waves active at the same time to keep it fed
<cwabbott>
yes, that's exactly right
<amonakov>
how can you tell the dispatch is simultaneous and not [the other approach I described]?
<cwabbott>
that's what ARM's docs say
<cwabbott>
also derivatives probably wouldn't work well that way
<cwabbott>
sorry, that's what the anandtech article says
<cwabbott>
(i don't have access to ARM's docs)
<amonakov>
I am not entirely convinced, since then we'd observe that 12 warps are enough to saturate a g76 core, and yet that's not the case
raster has quit [Quit: Gettin' stinky!]
<amonakov>
but we also found a related difficult-to-explain effect on g76, so who knows... I'll try to run more experiments
<amonakov>
one more reason I am having doubts is that on Nvidia selecting a warp is known to have a cost, so it is surprising that Mali would be engineered to select one of 12 warps on each cycle like it's no big deal
raster has joined #panfrost
<amonakov>
(correction, 12*3 warps since there are 3 sets of lanes per core; on g72 36 4-wide warps do saturate)
raster has quit [Quit: Gettin' stinky!]
raster has joined #panfrost
<amonakov>
(and to clarify one more thing, 12 cycles would be the sum of MUL and ADD pipe latencies, not their individual depths)
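[editor's note] The occupancy arithmetic discussed above can be sketched as follows. This is only a back-of-the-envelope model, assuming a thread can issue once every n cycles and that each set of lanes needs n resident warps to issue on every cycle. The function names are hypothetical, and the inputs (12 cycles, 3 sets of lanes) are the g72 figures quoted in the chat, not values taken from ARM documentation.

```python
def warps_to_saturate(issue_interval_cycles: int, lane_sets_per_core: int) -> int:
    """Resident warps needed so every set of lanes can issue on every cycle.

    Assumption: one warp can issue only once per `issue_interval_cycles`,
    so each set of lanes needs that many warps to rotate through.
    """
    return issue_interval_cycles * lane_sets_per_core


def issue_utilization(active_warps: int, issue_interval_cycles: int,
                      lane_sets_per_core: int) -> float:
    """Fraction of issue slots filled under the same simple model."""
    needed = warps_to_saturate(issue_interval_cycles, lane_sets_per_core)
    return min(1.0, active_warps / needed)


# g72 figures from the discussion: 12-cycle interval, 3 sets of lanes per core.
print(warps_to_saturate(12, 3))      # 36
print(issue_utilization(12, 12, 3))  # one third of the issue slots
print(issue_utilization(36, 12, 3))  # 1.0
```

Under this model 12 warps fill only a third of the issue slots on a core with 3 sets of lanes, which matches the correction from 12 to 12*3 warps; it does not attempt to explain the 16-cycle behaviour reported for g76.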
raster has quit [Quit: Gettin' stinky!]
<alyssa>
amonakov: it also might be that g76 and g72 are dramatically different uarchs
<alyssa>
(I don't know if this is the case)
<alyssa>
there is no 'promise' of stability between major archs
<amonakov>
anandtech covered g76 in another article, and it mostly enumerated differences without saying there was a major redesign
mixfix41 has quit [Ping timeout: 265 seconds]
mixfix41 has joined #panfrost
<alyssa>
amonakov: It's a different major arch (G71/G72 vs everything else) so who knows
<amonakov>
I am not making assumptions about G76 based on G72 data, or vice versa; we ran the same experiments on both
* alyssa
probably misread your results then
<alyssa>
It has been a very long week and it's only Wednesday 😿
<urjaman>
I had that mood already at the end of monday
<icecream95>
wdym it's only Wednesday?
<macc24>
it is wednesday my dudes
<icecream95>
[looks outside] Nope, definitely a Thursday
<macc24>
*looks at clock* 30 minutes of wednesday left for me
<macc24>
i'd surely have more wednesday if i didn't sleep through it
jernej has quit [Ping timeout: 272 seconds]
warpme_ has quit [Quit: Connection closed for inactivity]
jernej has joined #panfrost
jernej has quit [Client Quit]
jernej has joined #panfrost
jschwart has quit [Read error: Connection reset by peer]