alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
<alyssa> not sure about that second R0 argument... the port is being used..
<alyssa> regardless:
<alyssa> 2^x = opcd58(op7930(E_24(x), x))
<alyssa> opcd58 is near the various special TABLE ops so that's good
<alyssa> op7930 is actually near.. type conversions?
<alyssa> looks like a variant of f32 to i32, oddly
<alyssa> Time to break out the assembler, I guess. That or eat dinner.
<alyssa> Also, the blob seems unwilling to fold things into the lowered code. Not sure what's up with that yet.
<icecream95> Performance counters + Gallium HUD: https://gitlab.freedesktop.org/snippets/953
<alyssa> icecream95: niiiice :D
<anarsoul> icecream95: doom?
stikonas has quit [Remote host closed the connection]
* alyssa returns
<alyssa> 0x7930 is clearly another F32_TO_I32 variant
<alyssa> I'm not sure what the distinction is.. maybe handling of NaN / infinity or something
<HdkR> Rounding direction isn't too uncommon
<alyssa> HdkR: Ah, yes, could be
<alyssa> midgard has variants for rounding mode
<HdkR> If they moved even closer to NEON then maybe we have 4 rounding modes to choose :P
<alyssa> same behaviour at +inf
<alyssa> HdkR: we've had 4 modes since midgard t6xx :D
<HdkR> Perfect
<alyssa> and yes, can confirm it's round mode :)
<alyssa> which means there's way more variants. todo: fix packing code for that :'(
<HdkR> trunc, -inf, +inf, ties even? :D
<alyssa> sounds right
<alyssa> rtz, rte, rtn, rtp
<HdkR> yep yep
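(A minimal C sketch of how those four conversion flavours would behave, assuming they map onto the usual IEEE rounding directions named above; the function names here are made up for illustration:)

```c
/* Hypothetical models of the four f32 -> i32 rounding variants (rtz, rtn,
 * rtp, rte). The names and the mapping are guesses for illustration only. */
#include <math.h>
#include <stdint.h>

static int32_t f32_to_i32_rtz(float x) { return (int32_t)truncf(x); } /* toward zero */
static int32_t f32_to_i32_rtn(float x) { return (int32_t)floorf(x); } /* toward -inf */
static int32_t f32_to_i32_rtp(float x) { return (int32_t)ceilf(x); }  /* toward +inf */
static int32_t f32_to_i32_rte(float x) { return (int32_t)rintf(x); }  /* nearest, ties to even
                                                                          (default FP environment) */
```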
<alyssa> Anyway, interpolating a little bit, that means this is F32_TO_I32.RTE
<alyssa> So we have
<alyssa> 2^x = opCD58(f32_to_i32.rte(x * 2^24))
<alyssa> so I guess opCD58 is a lookup table for exp2... but with 24-bits of precision, I guess?
<alyssa> Odd that there's no fixup needed. Seems.. hm.
<alyssa> uh, ok. cd58(..) takes that reduction *and* the original x.
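(Reading that back: the .rte conversion presumably turns x into s8.24 fixed point, i.e. $\mathrm{round}(x \cdot 2^{24}) = i \cdot 2^{24} + f$ with $0 \le f < 2^{24}$, so $2^x \approx 2^i \cdot 2^{f / 2^{24}}$ — the high bits pick the final exponent and the low 24 bits feed a fractional lookup.)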
<icecream95> Does it make more sense for performance counter values to be per frame or per second?
<alyssa> icecream95: Probably depends what you're measuring? But I don't think mali's counters are reliable to per-frame granularity (though I'd love to be proven wrong)
<alyssa> what does gator do?
<icecream95> alyssa: This isn't about how often to read the values, but how to scale them.
<alyssa> Ah
<alyssa> Probably per second since fps is variable then?
<alyssa> er
<alyssa> but the counts will vary with the fps
<alyssa> um
<alyssa> this seems recursive
<alyssa> okay, the values returned from cd58(..) become quickly wrong..
<alyssa> I don't see how it would work for x > 15
<alyssa> Yeah, it only respects 31-bits of its input
<alyssa> top bit is ignored
<alyssa> er no
<alyssa> top nibble. so 28-bits
<alyssa> so maybe there's more to that mscale
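(That would also line up with the x > 15 breakage above: 28 honoured bits minus 24 fractional bits leaves only 4 integer bits, i.e. $\lfloor x \rfloor < 2^{28 - 24} = 16$.)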
<alyssa> also, the spec says exp2(x) has precision (3 + 2 |x|) ULP
<alyssa> and yet it works
<alyssa> okay, it genuinely uses both the fixed int version and the float. fun!
<alyssa> mystery closed, I guess.
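(For reference, a rough software model of the sequence as reconstructed here; opCD58 itself is a black box, so libm's exp2f stands in for whatever table the hardware uses, and only the fixed-point plumbing reflects what was actually observed:)

```c
/* Sketch of 2^x = opCD58(f32_to_i32.rte(x * 2^24), x) as understood above.
 * The table op is modelled with exp2f(); everything past the fixed-point
 * split is speculation. */
#include <math.h>
#include <stdint.h>

static float exp2_model(float x)
{
    int32_t fx    = (int32_t)rintf(x * (float)(1 << 24)); /* s8.24 fixed point, RTE */
    int32_t ipart = fx >> 24;                 /* integer exponent (arithmetic shift) */
    float   fpart = (float)(fx & 0xffffff) / (float)(1 << 24);
    /* opCD58 presumably looks up 2^fpart and folds in the exponent; it also
     * receives the original x, perhaps for fixup at the range edges. */
    return ldexpf(exp2f(fpart), ipart);
}
```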
<alyssa> Next up - log2
adjtm_ has quit [Ping timeout: 240 seconds]
adjtm has joined #panfrost
<alyssa> log2(x) = opCC68(x) * op1E80(-1, x, x) + i2f(log_frexpe(x))
<alyssa> CC68 looks like another table
<alyssa> (but cf60 is already flog2_table..)
<alyssa> At least the + i2f(log_frexpe(x)) part is the same as the g71 formula
<alyssa> op1e80 is fma fwiw
<alyssa> er unit
vstehle has quit [Ping timeout: 260 seconds]
<alyssa> e1e80, I think
<alyssa> to the asm!
<icecream95> alyssa: Per frame values now work: https://gitlab.freedesktop.org/snippets/955
<alyssa> icecream95: neat!
<alyssa> (image doesn't load here..)
<icecream95> alyssa: I re-uploaded the image. It looks like it broke when I had to enable Javascript to participate in training AI for cyclist detection
<alyssa> assimilate into the borg.js
<alyssa> and super neat!!
<Lyude> alyssa: any packing work I could help out with?
<alyssa> Lyude: Have anything in mind?
<alyssa> it's probably todo ;)
<Lyude> alyssa: well what do you have working so far?
<alyssa> Packing of some common ops generally to one unit (FMA *or* ADD but not both)
<alyssa> Clauses with exactly 1 instruction bundle (and possibly 1 constant)
<alyssa> (but not larger shaders)
<alyssa> s/shaders/clauses/
<alyssa> fp32 and a bit of fp16, no int yet or fp64
<alyssa> Oh, and unit testing so you can test new packing code against real hardware without touching anything else in the compiler
<alyssa> (so you can dig into alt units or bigger clauses without making the scheduler do that yet)
<alyssa> (Don't take that as pressure, if you want to just hack away I'd be happy to write the tests :) )
<Lyude> alyssa: i'm fine with writing tests
<alyssa> Lyude: fair enough, lmk if you have questions :)
<alyssa> Okay, it looks like I misinterpreted op1e80 and it's in fact a 2-op
<alyssa> op1e80(-1, x) seems to just be doing x - 1?
<alyssa> Not sure why they're not just using a regular ADD.f32 then
<alyssa> at least for -0.75 < x < 1.5, let's see what happens with bigger x
<alyssa> op1e80(-1, 2.5) = 0.25
<alyssa> so clearly it's more involved than an add. that'd be... an add and a >>1?
<alyssa> op1e80(-1, 3.5) = -0.125
<alyssa> op1e80(-1, 4.5) = 0.125
<alyssa> Uhm. Okay.
<alyssa> Ahh
<alyssa> op1e80(-1, x) = (x / 2^{log_frexpe(x)}) - 1
<alyssa> That is - reduce to [0.75, 1.5) and then subtract 1
<alyssa> = 2^{-f(x)} x - 1 ... might be clearer
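(A quick C model of that behaviour, reducing the mantissa into [0.75, 1.5) with frexpf, just to check it reproduces the probed values 0.25, -0.125 and 0.125 above; purely illustrative:)

```c
/* Model of op1e80(-1, x) = x * 2^(-log_frexpe(x)) - 1, with the mantissa
 * reduced into [0.75, 1.5) as described above. Illustrative only. */
#include <math.h>
#include <stdio.h>

static float op1e80_model(float x)
{
    int   e;
    float m = frexpf(x, &e);   /* m in [0.5, 1), x = m * 2^e */
    if (m < 0.75f) {           /* shift into [0.75, 1.5)     */
        m *= 2.0f;
        e -= 1;
    }
    return m - 1.0f;
}

int main(void)
{
    /* Matches the probed values: 0.25, -0.125, 0.125 */
    printf("%f %f %f\n", op1e80_model(2.5f), op1e80_model(3.5f), op1e80_model(4.5f));
    return 0;
}
```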
<alyssa> letting opC..whatever be T(x), that means we have
<alyssa> log2(x) = (2^{-f(x)} x - 1) T(x) + f(x)
<alyssa> If $2^{-f(x)} x - 1 = 0$, then we see $x = 2^f$, so we're just taking a log2 of a power of two (which is what f(x) = log2_frexpe(x) does anyway)
<alyssa> Otherwise we can divide through and see for nonpower of two x..
<alyssa> Letting $u = 2^{-f(x)} x$ -- that is, the reduced version of $x$, we see that T(x) = log2(u) / (u - 1)
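(Sanity check: with $u = 2^{-f(x)} x$ and $T(x) = \log_2(u) / (u - 1)$, the formula collapses back to the logarithm: $(u - 1)\,T(x) + f(x) = \log_2(u) + f(x) = \log_2(2^{-f(x)} x) + f(x) = \log_2(x)$.)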
<alyssa> Why that's easier to calculate in hw I don't yet know but I suspect...
<chewitt> I got excited reading the backlog .. bits of cube and such!
<chewitt> now I'm back to feeling stupid again :)
<alyssa> Ah, yes, okay
<alyssa> So, T(x) *is* our logarithm approximation, the rest is just setup, right?
<alyssa> And all this setup is argument reduction, so instead of dealing with (0, inf), we just need hw for (0.75, 1.5)
<alyssa> So going back to `T(x) = log2(u) / (u - 1)`, we can look at the Taylor expansion of log2
<alyssa> actually, one more thing to make this super textbook. Let u' = u + 1
<alyssa> er, no, u' = u - 1, sorry messy handwriting
<alyssa> T(x) = log2(1 + u') / u'
<alyssa> But log2(1 + u') has the Taylor series
<alyssa> $\sum_{n = 1}^{\infty} (-1)^{n + 1} \frac{(u')^n}{n}$
<alyssa> so log2(1 + u') / u' has the Taylor series
<alyssa> $\sum_{n = 1}^{\infty} (-1)^{n + 1} \frac{(u')^n}{n - 1}$
<alyssa> erm
<alyssa> $\sum_{n = 1}^{\infty} (-1)^{n + 1} \frac{(u')^{n - 1}}{n}$
<alyssa> So assuming they're using a taylor approximation to compute this (this is pure speculation, this detail is invisible to us AFAIK) - the extra multiplication on the ISA side removes a bunch of multiplications in hw, which is important to keep everything to one cycle
<alyssa> I realize this is a "just so" explanation, but it's as good as any?
<alyssa> Oh, d'oh, that's the series for natural log, not log2
<alyssa> But the idea's probably similar
<alyssa> Just multiply by 1/log(2) at the end
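(Putting the corrected pieces together, with the natural-log factor folded in: $\log_2(1 + u') = \frac{1}{\ln 2} \sum_{n = 1}^{\infty} (-1)^{n + 1} \frac{(u')^n}{n}$, so $T(x) = \frac{\log_2(1 + u')}{u'} = \frac{1}{\ln 2} \sum_{n = 1}^{\infty} (-1)^{n + 1} \frac{(u')^{n - 1}}{n}$.)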
<alyssa> ("Are you even doing panfrost anymore alyssa?"
<alyssa> "Calculus exam is approaching from the right...")
<alyssa> Main hole in my theory is that it takes bunches of terms for that series to converge but hey.
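(And a software model of the whole reconstructed log2 sequence, with libm's log2f standing in for the opCC68 table since only its role, not its contents, is known:)

```c
/* Model of log2(x) = (2^-f(x) * x - 1) * T(x) + f(x) as reconstructed above,
 * with T(x) = log2(u) / (u - 1) standing in for the opCC68 table. */
#include <math.h>
#include <stdio.h>

static float log2_model(float x)
{
    int   e;
    float m = frexpf(x, &e);        /* x = m * 2^e, m in [0.5, 1) */
    if (m < 0.75f) {                /* reduce into [0.75, 1.5)    */
        m *= 2.0f;
        e -= 1;
    }
    float u = m;                    /* reduced argument             */
    float f = (float)e;             /* what log_frexpe would return */
    float t = (u == 1.0f) ? 1.0f / logf(2.0f)        /* limit as u -> 1 */
                          : log2f(u) / (u - 1.0f);   /* "table" value   */
    return (u - 1.0f) * t + f;
}

int main(void)
{
    printf("%f vs %f\n", log2_model(10.0f), log2f(10.0f)); /* ~3.321928 both */
    return 0;
}
```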
davidlt_ has joined #panfrost
davidlt_ is now known as davidlt
vstehle has joined #panfrost
buzzmarshall has quit [Remote host closed the connection]
<tomeu> alyssa: yep, happy to look at textures and sampling
<tomeu> great work!
<tomeu> icecream95: that looks very good!
tomboy65 has quit [Ping timeout: 240 seconds]
tomboy65 has joined #panfrost
<Werner> Hey there. I tried to get ioquake3 on my H6 based SBC running and got this error. Not sure if related to ioquake or Panfrost: ioquake3.aarch64: ../src/gallium/drivers/panfrost/pan_sfbd.c:77: panfrost_sfbd_format: Assertion `!"Invalid format rendering"' failed.
<icecream95> Werner: The OpenGL 2 renderer tries to use a R16G16B16A16_FLOAT framebuffer, which I don't think SFBD GPUs support.
<icecream95> You can try using the OpenGL 1 renderer for ioquake by appending +set cl_renderer opengl1 to the command line
<Werner> Will do, thanks.
<Werner> Okay, starts, but it is unplayable due to artefacts. Well, it was worth the try :)
<icecream95> Werner: What Mesa version are you using?
<Werner> 2.1 Mesa 20.1.0-deve- (git-7aa6720ba4)
<Werner> *devel
<Werner> Running on Linux 5.6.2