<enunes>
I thought about moving it to the panfrost/shared directory, renaming it to panfrost
<enunes>
probably turn it into a mul by 1/pi which is easier to implement
<enunes>
any thoughts?
<alyssa>
enunes: \o
<alyssa>
I mean, if you don't already have a lima-specific nir_algebraic.py, you should definitely start one; it's great :)
<alyssa>
It looks like you need 1/tau instead of 1/pi?
<enunes>
hmm right, we have 1/2pi...
<alyssa>
enunes: Anyway, it doesn't sound like we need to share code per se
<alyssa>
Just do your own lima_nir_algebraic.py with the pass you need :)
<enunes>
yeah that makes more sense for now
<alyssa>
(There's delightfully little boilerplate needed... just the python which you can cargo cult from us / freedreno, some meson magic, and then you just #include and call it as a NIR opt)
<alyssa>
Let me know if you have any questions! :)
<enunes>
sure, at my first glance it seemed like it was the same
<alyssa>
enunes: BTW, looking forward to meeting y'all at XDC? :)
<enunes>
alyssa: yeah I'll attend this year, looking forward too
<alyssa>
\o/
<alyssa>
Now, back to our usual channel of ALYSSA BANGING HER HEAD AGAINST A WALL (aka opencl):
<alyssa>
I added a pass to string masks through so addresses are only vec2 now
<alyssa>
Unfortunately, it's still totally broken since addresses are 64-bit and need to be 64-bit aligned, which RA doesn't know about yet
<alyssa>
So we'll need to teach RA about alignment for 64-bit.. but first we'll need to string 64-bit support through in the first place, good golly.
<alyssa>
I'm thinking I should rebase and try to clean up first but.. hmm
<alyssa>
I'm in for a world of pain when I need to support 8/16 bit RA, huh :(
* alyssa
added a hack
<alyssa>
We'll deal with this properly later; I just want to make forward progress rn
<alyssa>
(Not a hack to be pushed to master, ever, just so I can proceed on my local branch)
<alyssa>
DOH~!
<alyssa>
I just realized the trick for load/store
<alyssa>
Those extra bits aren't specifying register *size*, they're specifying a register *shift*
<alyssa>
This is Arm we're talking about, of course they'll add barrel shifters to random places ;D
<alyssa>
HdkR: ^^
* alyssa
verifies
<alyssa>
Hrm
<alyssa>
I see it generating shifts/muls/etc, despite there being no real reason to... so maybe it's not a shift
<alyssa>
Or maybe the compiler just didn't anticipate this pattern
<alyssa>
Thinking the latter tbh
<alyssa>
Regardless that's defn what it is
<alyssa>
3 bits up there
<alyssa>
Hm, case not totally closed..
yann has joined #panfrost
<alyssa>
Because for some reason, the addresses themselves are being shifted too...
<alyssa>
------Oh
<alyssa>
I have to think *even more Arm*
<alyssa>
Barrel shifters only work on the second source.
<alyssa>
Right.
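(A toy model of the addressing mode as worked out above: the shift field scales the second, index-like source before it is added to the base, and only that source goes through the barrel shifter. The field width and behaviour here are as inferred in the chat, not from documentation.)

    def effective_address(base, index, shift):
        # Only the second source is shifted; the base address is used as-is.
        return base + (index << shift)

    # e.g. indexing an array of 16-byte elements: shift = 4 scales the index
    # by 16, so element 3 lives at base + 0x30.
    assert effective_address(0x1000, 3, 4) == 0x1030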
NeuroScr has joined #panfrost
adjtm has joined #panfrost
afaerber has joined #panfrost
<alyssa>
Headmashing successful.
<alyssa>
Indirect SSBO accesses work now.
<HdkR>
alyssa: offset scale is very useful :)
<alyssa>
HdkR: You think useful, I think another 150 lines of backend opt pass to make use of it :(
<HdkR>
:)
<alyssa>
HdkR: It occurs to me there's probably someone out there who's doing compute stuff in vertex shaders with SSBO + rasterizer discard
<alyssa>
Tho TBF vertex and compute shaders are almost identical on Midgard so maybe it's reasonable :P
<HdkR>
Could do
<HdkR>
Not sure what you would be testing in the vertex stage that regular culling doesn't solve though
<HdkR>
Also AAA games typically do a compute pass first before passing data over to VSPS these days :P
<HdkR>
But this is just because AMD's geometry pipeline has been relatively low performing
<alyssa>
Blop
<HdkR>
(Navi fixes that problem)
<HdkR>
Supposedly 4x throughput improvement in best case
<alyssa>
Hmmm
<bnieuwenhuizen>
HdkR: woah, source?
stikonas has quit [Remote host closed the connection]
<HdkR>
ah. Some random AMD engineer who probably shouldn't have said that :P
<bnieuwenhuizen>
:P
raster has quit [Remote host closed the connection]