stikonas has quit [Remote host closed the connection]
* alyssa returns
<alyssa>
0x7930 is clearly another F32_TO_I32 variant
<alyssa>
I'm not sure what the distinction is.. maybe handling of NaN / infinity or something
<HdkR>
Rounding direction isn't too uncommon
<alyssa>
HdkR: Ah, yes, could be
<alyssa>
midgard has variants for rounding mode
<HdkR>
If they moved even closer to NEON then maybe we have 4 rounding modes to choose :P
<alyssa>
same behaviour at +inf
<alyssa>
HdkR: we've had 4 modes since midgard t6xx :D
<HdkR>
Perfect
<alyssa>
and yes, can confirm it's round mode :)
<alyssa>
which means there's way more variants. todo: fix packing code for that :'(
<HdkR>
trunc, -inf, +inf, ties even? :D
<alyssa>
sounds right
<alyssa>
rtz, rte, rtn, rtp
<HdkR>
yep yep
<alyssa>
Anyway, interpolating a little bit, that means this is F32_TO_I32.RTE
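A quick way to tell the four variants apart is to probe inputs that sit exactly on a tie, such as ±2.5, because that is where rtz, rte, rtn and rtp disagree. The sketch below models them with Python's standard rounding functions. The `f32_to_i32` name and the mode strings are hypothetical, and the sketch ignores whatever the hardware does for NaN, infinity or out-of-range inputs.

```python
import math

def f32_to_i32(x, mode):
    """Hypothetical model of the F32_TO_I32 rounding-mode variants."""
    if mode == "rtz":
        return math.trunc(x)   # round toward zero
    if mode == "rte":
        return round(x)        # Python's round() is round-half-to-even
    if mode == "rtn":
        return math.floor(x)   # round toward -inf
    if mode == "rtp":
        return math.ceil(x)    # round toward +inf
    raise ValueError(f"unknown rounding mode: {mode}")
```

Feeding 2.5 and -2.5 through an unknown variant and comparing against these four results is enough to pin down the mode, assuming the hardware follows IEEE-style rounding.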
<alyssa>
So we have
<alyssa>
2^x = opCD58(f32_to_i32.rte(x * 2^24))
<alyssa>
so I guess opCD58 is a lookup table for exp2... but with 24 bits of precision, I guess?
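One reading of `2^x = opCD58(f32_to_i32.rte(x * 2^24))` is that the conversion produces a signed fixed-point value with 24 fractional bits, so the op only has to split it into an integer exponent and a fraction in [0, 1). The sketch below assumes that interpretation. `exp2_fixed` is a hypothetical stand-in for opCD58, and it evaluates 2^frac directly where hardware would presumably use a table or polynomial.

```python
import math

FRAC_BITS = 24  # assumption: taken from the x * 2^24 scale in the disassembly

def to_fixed_rte(x):
    """Model of f32_to_i32.rte(x * 2^24); Python's round() is ties-to-even."""
    return round(x * (1 << FRAC_BITS))

def exp2_fixed(fixed):
    """Hypothetical stand-in for opCD58: split a Q.24 value into exponent and fraction."""
    int_part = fixed >> FRAC_BITS  # arithmetic shift = floor, so negatives work too
    frac = (fixed & ((1 << FRAC_BITS) - 1)) / (1 << FRAC_BITS)  # always in [0, 1)
    # Hardware would look 2^frac up or approximate it; here we evaluate it exactly.
    return math.ldexp(2.0 ** frac, int_part)
```

Because `>>` floors and the mask keeps a non-negative fraction, x = -0.5 becomes 2^-1 * 2^0.5, which is the right answer without any extra fixup.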
<alyssa>
Odd that there's no fixup needed. Seems.. hm.
<alyssa>
uh, ok. cd58(..) takes that reduction *and* the original x.
<icecream95>
Does it make more sense for performance counter values to be per frame or per second?
<alyssa>
icecream95: Probably depends what you're measuring? But I don't think mali's counters are reliable to per-frame granularity (though I'd love to be proven wrong)
<alyssa>
what does gator do?
<icecream95>
alyssa: This isn't about how often to read the values, but how to scale them.
<alyssa>
Ah
<alyssa>
Probably per second since fps is variable then?
<alyssa>
er
<alyssa>
but the counts will vary with the fps
<alyssa>
um
<alyssa>
this seems recursive
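The scaling question stops being recursive if the counter delta and the frame count are sampled over the same window: per-second and per-frame values are then both plain divisions, related by per_second = per_frame × fps. A minimal sketch with a hypothetical `scale_counter` helper; it does not reflect how gator or Panfrost actually report counters.

```python
def scale_counter(delta, seconds, frames):
    """Normalize one counter delta sampled over a single window (hypothetical helper).

    Returns (per_second, per_frame); per_second == per_frame * (frames / seconds).
    """
    if seconds <= 0 or frames <= 0:
        raise ValueError("need a non-empty sampling window")
    per_second = delta / seconds
    per_frame = delta / frames
    return per_second, per_frame
```

For example, 1200 events over 0.5 s and 30 frames gives 2400 per second and 40 per frame, at 60 fps.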
<alyssa>
okay, the values returned from cd58(..) become quickly wrong..
<alyssa>
I don't see how it would work for x > 15
<alyssa>
Yeah, it only respects 31 bits of its input
<alyssa>
top bit is ignored
<alyssa>
er no
<alyssa>
top nibble. so 28 bits
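If the op only sees the low 28 bits of an input with 24 fractional bits, just 4 integer bits survive, which matches results going wrong past x = 15: x = 16 wraps around to 2^0. The sketch below assumes exactly that masking. `exp2_fixed_28` is hypothetical and, because the sign bits are masked off too, it also mishandles negative x.

```python
import math

FRAC_BITS = 24
INPUT_BITS = 28  # observation from the log: the top nibble of the input is ignored

def to_fixed_rte(x):
    """Model of f32_to_i32.rte(x * 2^24); Python's round() is ties-to-even."""
    return round(x * (1 << FRAC_BITS))

def exp2_fixed_28(fixed):
    """Hypothetical model of an exp2 step that only looks at the low 28 input bits."""
    fixed &= (1 << INPUT_BITS) - 1  # drop the top nibble (and with it the sign)
    int_part = fixed >> FRAC_BITS   # only 4 integer bits remain: 0..15
    frac = (fixed & ((1 << FRAC_BITS) - 1)) / (1 << FRAC_BITS)
    return math.ldexp(2.0 ** frac, int_part)
```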
<alyssa>
so maybe there's more to that mscale
<alyssa>
also, the spec says exp2(x) has precision (3 + 2|x|) ULP
<alyssa>
and yet it works
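To check a result against that bound, the error has to be counted in float32 ULPs. One common way, sketched below, is to reinterpret each value's bits as an integer on a monotonic scale and take the difference. The helper names are hypothetical, the reference is `2**x` rounded to float32, and inputs must stay finite in float32 (`struct.pack` raises `OverflowError` otherwise).

```python
import struct

def f32_bits(v):
    """Raw IEEE-754 binary32 bits of v (v is rounded to float32 first)."""
    return struct.unpack("<I", struct.pack("<f", v))[0]

def _ordered(v):
    """Map float32 bit patterns onto a monotonic integer scale (+0 and -0 both -> 0)."""
    bits = f32_bits(v)
    if bits & 0x80000000:
        return -(bits & 0x7FFFFFFF)
    return bits

def ulp_distance(a, b):
    """Distance between a and b in float32 steps (finite inputs only)."""
    return abs(_ordered(a) - _ordered(b))

def within_exp2_spec(x, result):
    """Check result against the (3 + 2|x|) ULP tolerance quoted from the spec."""
    reference = 2.0 ** x  # double-precision reference; rounded to float32 when packed
    return ulp_distance(result, reference) <= 3 + 2 * abs(x)
```

The tolerance grows with |x|, so the same absolute error that fails at x = 0 can pass at larger x.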
<alyssa>
okay, it genuinely uses both the fixed int version and the float. fun!
<icecream95>
alyssa: I re-uploaded the image. It looks like it broke when I had to enable Javascript to participate in training AI for cyclist detection
<alyssa>
assimilate into the borg.js
<alyssa>
and super neat!!
<Lyude>
alyssa: any packing work I could help out with?
<alyssa>
Lyude: Have anything in mind?
<alyssa>
it's probably todo ;)
<Lyude>
alyssa: well what do you have working so far?
<alyssa>
Packing of some common ops generally to one unit (FMA *or* ADD but not both)
<alyssa>
Clauses with exactly 1 instruction bundle (and possibly 1 constant)
<alyssa>
(but not larger shaders)
<alyssa>
s/shaders/clauses/
<alyssa>
fp32 and a bit of fp16, no int yet or fp64
<alyssa>
Oh, and unit testing so you can test new packing code against real hardware without touching anything else in the compiler
<alyssa>
(so you can dig into alt units or bigger clauses without making the scheduler do that yet)
<alyssa>
(Don't take that as pressure, if you want to just hack away I'd be happy to write the tests :) )
<Lyude>
alyssa: i'm fine with writing tests
<alyssa>
Lyude: fair enough, lmk if you have questions :)
<alyssa>
Okay, it looks like I misinterpreted op1e80 and it's in fact a 2-op
<alyssa>
op1e80(-1, x) seems to just be doing x - 1?
<alyssa>
Not sure why they're not just using a regular ADD.f32 then
<alyssa>
at least for 0.75 < x < 1.5, let's see what happens with bigger x
<alyssa>
op1e80(-1, 2.5) = 0.25
<alyssa>
so clearly it's more involved than an add. that'd be... an add and a >>1?
<alyssa>
That is - reduce to [0.75, 1.5) and then subtract 1
<alyssa>
= 2^{-f(x)} x - 1 ... might be clearer
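The behaviour described above (scale x into [0.75, 1.5), then subtract 1) can be modelled with `math.frexp`, which returns a mantissa in [0.5, 1); mantissas below 0.75 are doubled and the exponent adjusted. This sketch covers only the observed input/output relation for positive x. `op1e80_model` and `reduce_075_15` are hypothetical names, and the -1 operand is treated as a fixed addend.

```python
import math

def reduce_075_15(x):
    """Return (u, f) with x == u * 2**f and u in [0.75, 1.5); requires x > 0."""
    m, e = math.frexp(x)       # x == m * 2**e with m in [0.5, 1)
    if m < 0.75:
        return 2.0 * m, e - 1  # [0.5, 0.75) -> [1.0, 1.5)
    return m, e                # [0.75, 1.0) is already in range

def op1e80_model(x):
    """Hypothetical model of op1e80(-1, x): 2^{-f(x)} x - 1."""
    u, _ = reduce_075_15(x)
    return u - 1.0
```

With x = 2.5 this gives 1.25 - 1 = 0.25, matching the value observed on hardware.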
<alyssa>
letting opC..whatever be T(x), that means we have
<alyssa>
log2(x) = (2^{-f(x)} x - 1) T(x) + f(x)
<alyssa>
If $2^{-f(x)} x - 1 = 0$, then we see $x = 2^{f(x)}$, so we're just taking a log2 of a power of two (which is what f(x) = log2_frexpe(x) does anyway)
<alyssa>
Otherwise we can divide through and see, for non-power-of-two x..
<alyssa>
Letting $u = 2^{-f(x)} x$ -- that is, the reduced version of $x$, we see that T(x) = log2(u) / (u - 1)
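A sketch of the full decomposition, assuming the same [0.75, 1.5) reduction. `T` is computed straight from its definition, with the u = 1 limit 1/ln 2 filled in, so this only confirms the algebra log2(x) = (u - 1) T(x) + f(x), not how the hardware approximates T. All names are hypothetical.

```python
import math

def reduce_075_15(x):
    """Return (u, f) with x == u * 2**f and u in [0.75, 1.5); requires x > 0."""
    m, e = math.frexp(x)
    return (2.0 * m, e - 1) if m < 0.75 else (m, e)

def T(u):
    """log2(u) / (u - 1), with the removable singularity at u == 1 filled in."""
    if u == 1.0:
        return 1.0 / math.log(2.0)  # limit of log2(u) / (u - 1) as u -> 1
    return math.log2(u) / (u - 1.0)

def log2_via_reduction(x):
    """Recombine the pieces: log2(x) = (u - 1) * T(u) + f."""
    u, f = reduce_075_15(x)
    return (u - 1.0) * T(u) + f
```

For a power of two, u is exactly 1 and the result is just f, as in the case discussed above.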
<alyssa>
Why that's easier to calculate in hw I don't yet know but I suspect...
<chewitt>
I got excited reading the backlog .. bits of cube and such!
<chewitt>
now I'm back to feeling stupid again :)
<alyssa>
Ah, yes, okay
<alyssa>
So, T(x) *is* our logarithm approximation, the rest is just setup, right?
<alyssa>
And all this setup is argument reduction, so instead of dealing with (0, inf), we just need hw for (0.75, 1.5)
<alyssa>
So going back to `T(x) = log2(u) / (u - 1)`, we can look at the Taylor expansion of log2
<alyssa>
actually, one more thing to make this super textbook. Let u' = u + 1
<alyssa>
er, no, u' = u - 1, sorry messy handwriting
<alyssa>
So assuming they're using a Taylor approximation to compute this (this is pure speculation, this detail is invisible to us AFAIK), the extra multiplication on the ISA side removes a bunch of multiplications in hw, which is important to keep everything to one cycle
<alyssa>
I realize this is a "just so" explanation, but it's as good as any?
<alyssa>
Oh, d'oh, that's the series for natural log, not log2
<alyssa>
But the idea's probably similar
<alyssa>
Just multiply by 1/ln(2) at the end
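The series itself isn't in the log. For reference, here is the standard expansion of the natural log about 1, with $u' = u - 1$ and $u \in [0.75, 1.5)$ so that $u' \in [-0.25, 0.5)$. Dividing by $u'$ (the extra ISA-side multiplication) drops one power of $u'$ from every term, and the $1/\ln 2$ factor converts to $\log_2$. Whether the hardware evaluates anything like this remains the speculation stated above.

```latex
\ln(1 + u') = u' - \frac{u'^2}{2} + \frac{u'^3}{3} - \cdots
            = \sum_{k=1}^{\infty} \frac{(-1)^{k+1}\, u'^k}{k},
  \qquad |u'| < 1

T(x) = \frac{\log_2 u}{u - 1}
     = \frac{1}{\ln 2} \cdot \frac{\ln(1 + u')}{u'}
     = \frac{1}{\ln 2} \left( 1 - \frac{u'}{2} + \frac{u'^2}{3} - \frac{u'^3}{4} + \cdots \right)
     = \frac{1}{\ln 2} \sum_{k=0}^{\infty} \frac{(-u')^k}{k+1}
```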
<alyssa>
("Are you even doing panfrost anymore alyssa?"
<alyssa>
"Calculus exam is approaching from the right...")
<alyssa>
Main hole in my theory is that it takes bunches of terms for that series to converge but hey.
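How many terms? The sketch below counts, by brute force, how many terms of the truncated series are needed before the absolute error in T drops below 2^-24 (about float32 precision near 1) at a given u'. Near the edge of the range (u' = 0.5) it needs far more terms than near u' = 0, which is one reason math implementations commonly use tables or fitted polynomials rather than a raw Taylor series. Function names are hypothetical.

```python
import math

def T_series(up, n_terms):
    """First n_terms of sum_{k>=0} (-u')^k / (k + 1), scaled by 1/ln 2."""
    total, power = 0.0, 1.0
    for k in range(n_terms):
        total += power / (k + 1)
        power *= -up  # next power of (-u')
    return total / math.log(2.0)

def T_exact(up):
    """log2(1 + u') / u' computed directly; requires u' != 0."""
    return math.log1p(up) / up / math.log(2.0)

def terms_needed(up, tol=2.0 ** -24):
    """Smallest n whose truncated series is within tol of T_exact (brute force)."""
    n = 1
    while abs(T_series(up, n) - T_exact(up)) >= tol:
        n += 1
    return n
```

Comparing `terms_needed(0.5)` with `terms_needed(0.1)` shows how quickly the cost grows toward the top of the reduced range.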
davidlt_ has joined #panfrost
davidlt_ is now known as davidlt
vstehle has joined #panfrost
buzzmarshall has quit [Remote host closed the connection]
<tomeu>
alyssa: yep, happy to look at textures and sampling
<tomeu>
great work!
<tomeu>
icecream95: that looks very good!
tomboy65 has quit [Ping timeout: 240 seconds]
tomboy65 has joined #panfrost
<Werner>
Hey there. I tried to get ioquake3 on my H6 based SBC running and got this error. Not sure if related to ioquake or Panfrost: tty]ioquake3.aarch64: ../src/gallium/drivers/panfrost/pan_sfbd.c:77: panfrost_sfbd_format: Assertion `!"Invalid format rendering"' failed.
tomboy65 has quit [Ping timeout: 240 seconds]
swiftgeek is now known as swiftgeek_
swiftgeek_ is now known as swiftgeek__
swiftgeek__ is now known as swiftgeek
<icecream95>
Werner: The OpenGL 2 renderer tries to use a R16G16B16A16_FLOAT framebuffer, which I don't think SFBD GPUs support.
<icecream95>
You can try using the OpenGL 1 renderer for ioquake by appending +set cl_renderer opengl1 to the command line
<Werner>
Will do, thanks.
<Werner>
Okay, starts, but it is unplayable due to artefacts. Well, it was worth the try :)
<icecream95>
Werner: What Mesa version are you using?
<Werner>
2.1 Mesa 20.1.0-deve- (git-7aa6720ba4)
<Werner>
*devel
<Werner>
Running on Linux 5.6.2
davidlt has quit [Ping timeout: 265 seconds]
davidlt has joined #panfrost
stikonas has joined #panfrost
raster has joined #panfrost
icecream95 has quit [Ping timeout: 265 seconds]
yann|work has quit [Ping timeout: 260 seconds]
yann|work has joined #panfrost
ChanServ has quit [shutting down]
cwabbott has quit [Quit: cwabbott]
ChanServ has joined #panfrost
ChanServ has quit [*.net *.split]
stikonas has quit [Ping timeout: 246 seconds]
ChanServ has joined #panfrost
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
stikonas has quit [Ping timeout: 246 seconds]
stikonas has joined #panfrost
cwabbott has joined #panfrost
mmind00 has quit [Remote host closed the connection]
mmind00 has joined #panfrost
cwabbott has quit [Quit: cwabbott]
buzzmarshall has joined #panfrost
yann|work has quit [Ping timeout: 265 seconds]
yann|work has joined #panfrost
yann|work has quit [Read error: No route to host]
yann|work has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
yann|work has quit [Read error: No route to host]
yann|work has joined #panfrost
raster has joined #panfrost
nerdboy has joined #panfrost
stikonas_ has joined #panfrost
stikonas has quit [Ping timeout: 246 seconds]
raster has quit [Quit: Gettin' stinky!]
davidlt has quit [Ping timeout: 264 seconds]
indy has quit [Ping timeout: 264 seconds]
raster has joined #panfrost
indy has joined #panfrost
yann|work has quit [Ping timeout: 256 seconds]
yann has joined #panfrost
afaerber has quit [Quit: Leaving]
afaerber has joined #panfrost
buzzmarshall has quit [Remote host closed the connection]