<daniels>
the idea is to reuse someone else's profiling UI that already works, so we can only do the gfx-related bits, like GPU perf counters & display-server timeline
<robmur01>
What, no plan to interface panfrost to Gator? :P
* robmur01
runs
<icecream95>
daniels: Some of the counter blocks are swapped on Panfrost compared to Gator, so make sure you're getting sane values for the counters
<daniels>
robmur01: ha!
<daniels>
shadeslayer: ^ might be useful to beware of the swapped counter blocks
<daniels>
icecream95: btw I completely missed the earlyz fix, that's cool :)
Elpaulo has joined #panfrost
<HdkR>
daniels: Ah nice, so this project lets you profile GPU side using perf counters.
<HdkR>
GPU profile tooling is all over the place, would be nice if it was a bit more..sane
<daniels>
yeah, Perfetto is the Android profiling environment, which can do separate/offline capture & analysis
<HdkR>
That's pretty great, I had hooked in to that interface when doing game porting to Android
<daniels>
I've wanted to make life better for ages, but sysprof is too deeply tied to running in literal GNOME, gpuvis is really cool but misses a lot of the analysis stuff from Perfetto and is also another random tooling environment ... so we decided to suck up the Chromium/C++ pain and do a Perfetto plugin, so someone else could worry about how to do good filtering/analysis and make it fast and not ugly :P
<HdkR>
haha
<HdkR>
Are there any good frontends yet? I always just jump in to chrome://tracing
<HdkR>
Linux should have taken the Windows approach and demoted long double to double everywhere...
<robmur01>
"for when having 80 bits is more important than subtle intermediate precision artefacts inconsistent with any other platform"
<HdkR>
It's a rough time, at least I've mostly dealt with it. Although I have ignored the two BCD instructions that x87 has :)
<HdkR>
If anyone is using those then they deserve no being ran
<HdkR>
s/no/not
<robmur01>
but what about the ASCII ones? :P
<icecream95>
HdkR: Compiler Explorer uses soft float on ARM by default, try it with -mfpu=neon -mfloat-abi=hard
<HdkR>
robmur01: Did I miss a table of x87 instructions somewhere? :)
<HdkR>
icecream95: Yea, it jumps to VFP/NEON then as expected. So really AArch64 just gets to deal with the performance hit for any app using long double :/
<HdkR>
Maybe someone should go through supertuxkart and stop making it use x87
<robmur01>
oh, sorry, misread - BCD took me back to x86, I didn't realise it had leaked into x87 too
<robmur01>
I have a lovely old book here with a whole chapter entitled "Arithmetic II: Processing ASCII and BCD data" :D
<HdkR>
Yea, x87 BCD lets you load an 18 digit decimal
<HdkR>
hah
<robmur01>
obligatory "at least it's not EBCDIC"
<HdkR>
At least it isn't Java bytecode
<icecream95>
At least it isn't hardware accelerated Java bytecode...
<HdkR>
:)
<robmur01>
I do like that Jazelle is still mandatory in Armv8 AArch32
<icecream95>
shadeslayer: With panfrost the "L2 and MMU" perf counter block is before the (multiple) "Shader Core" blocks. Use DRM_PANFROST_PARAM_SHADER_PRESENT to check the number of shader cores. Counters saturate, so you'll need to disable and re-enable them at least every five seconds.
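A minimal sketch of the layout icecream95 describes, assuming the value returned for DRM_PANFROST_PARAM_SHADER_PRESENT is a bitmask with one bit per present shader core. The helper names are hypothetical. The only ordering taken from the discussion is that the "L2 and MMU" block precedes the shader core blocks; any other counter blocks are left out.

```python
from typing import List


def shader_core_count(shader_present: int) -> int:
    """Count the set bits in a shader-present bitmask (assumed: one bit per core)."""
    return bin(shader_present).count("1")


def counter_block_order(shader_present: int) -> List[str]:
    """List the blocks in the order described above: L2/MMU first, then one per core.

    Hypothetical helper: cores are labelled by their bit position in the mask.
    """
    blocks = ["L2 and MMU"]
    bit = 0
    mask = shader_present
    while mask:
        if mask & 1:
            blocks.append(f"Shader Core {bit}")
        mask >>= 1
        bit += 1
    return blocks
```

Since the counters saturate, whatever reads these blocks would also need to disable and re-enable them at least every five seconds, as noted above.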
<HdkR>
Even though Jazelle is mandatory, at least it can be implemented fully as a fault catching interpreter
<HdkR>
Since it's only a hundred ops or something
<HdkR>
Just a bit silly that the mode is still there and available :)
<robmur01>
HdkR: that's what's fun: hardware still must 'implement' it, but is forbidden from doing anything other than raising the "handle this one in software" exception
<HdkR>
yea
<daniels>
awesome
<daniels>
BXJ is definitely the best opcode
<HdkR>
Long live BXJ, forever stuck in compatibility near-death
icecream95 has quit [Quit: leaving]
raster has quit [Quit: Gettin' stinky!]
raster has joined #panfrost
<tomeu>
robmur01: I heard you were going to do the gator integration!
tomboy65 has quit [Quit: Off to see the wizard.]
tomboy64 has joined #panfrost
bnieuwenhuizen has quit [Ping timeout: 256 seconds]
bnieuwenhuizen_ has joined #panfrost
bnieuwenhuizen_ is now known as bnieuwenhuizen
<alyssa>
icecream95: There goes my theory. thanks, though :p
<alyssa>
Maybe when it's right from a varying, that's also cached
<alyssa>
I know utgard had a varying->tex fast path
<alyssa>
but shader #2 in that file invalidates *that*
<alyssa>
clearly it *isn't* a memoryBarrierBuffer() instruction >:
<alyssa>
Oh, solution - both r28/r29 are cached simultaneously
<alyssa>
So you set the flag if both are valid, and clear if at least one is not
<alyssa>
in shader 1, r28/r29 are fully determined/cached at the beginning I guess
<alyssa>
but then never touched so you keep setting
<alyssa>
shader 2, the first is direct, but the second cache got invalidated so you stop setting ..
<alyssa>
Oh, I wonder..
<alyssa>
FWIW I can't reproduce the varying/cache thing with the blob on t860
<alyssa>
I can't even reproduce the regular caching thing now uhm
<alyssa>
Something I've also noticed is the blob compiler refuses to load more than 128-bit at a time
<alyssa>
^ Notice 64-bit load of each given the masks, though internally it's still effectively accessing 256-bit
<alyssa>
Panfrost doesn't bother and I haven't had issues, so i dunno.
raster has quit [Quit: Gettin' stinky!]
raster has joined #panfrost
raster has quit [Client Quit]
raster has joined #panfrost
karolherbst has quit [Quit: duh 🐧]
karolherbst has joined #panfrost
buzzmarshall has joined #panfrost
nerdboy has joined #panfrost
kherbst has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
karolherbst has quit [Ping timeout: 272 seconds]
kherbst has quit [Ping timeout: 240 seconds]
karolherbst has joined #panfrost
karolherbst has quit [Client Quit]
karolherbst has joined #panfrost
raster has joined #panfrost
* alyssa
dusted off her c201 to debug t760 fails
<alyssa>
wee
<alyssa>
aside: realized a chunk of -bterrain slowness is from mipmapping issues
<alyssa>
eliminating some of the superfluous flushing does improve fps
<alyssa>
Not massively but every bit is welcome
davidlt has quit [Ping timeout: 265 seconds]
raster has quit [Quit: Gettin' stinky!]
nerdboy has quit [Ping timeout: 256 seconds]
NeuroScr has joined #panfrost
karolherbst has joined #panfrost
cwabbott_ has joined #panfrost
cwabbott_ is now known as cwabbott
<alyssa>
Oh, I just noticed the OpenCL blob sets the 0x200 flag
<alyssa>
I wonder if it controls numerical handling or something?
<HdkR>
nan versus FTZ?
<alyssa>
perhaps?
karolherbst has quit [Quit: duh 🐧]
karolherbst has joined #panfrost
<alyssa>
mmind00: Case closed.
<alyssa>
Rabbit hole looking at kbase stuff (see above for why but I admit yes I am distracted)
<alyssa>
and realized kbase has a product ID stringify routine that I somehow missed
<alyssa>
Nothing there we didn't already know except it maps TNAX to "Mali-G57"
<alyssa>
TNAX = Natt
<alyssa>
so your shiny new Rockchip will be G57 :-)
<mmind00>
alyssa: aaaah, now I remember which rabbit hole you're talking about :-D
<mmind00>
but still it was marketing material ... I guess we'll see when the chip finally arrives
<alyssa>
mmind00: Fair. But Natt is definitely G57, we now know :)
<alyssa>
Something else.... there are duplicates of a bunch of IDs
<alyssa>
some are tYYx where YY is the first two letters of codename (i.e. mali g52 <--> dvalin <--> tDVx)
<alyssa>
but in Valhall, we see a couple duplicates with l instead of t, and with a newer product # within the architecture
<alyssa>
tBEx, lBEx, tODx, lODx, tTUx, lTUx
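The tYYx naming pattern can be sketched as a small function. This only illustrates the convention as described in the chat. `product_tag` is a hypothetical helper, not something from kbase, and the meaning of the `l` prefix is unknown.

```python
def product_tag(codename: str, prefix: str = "t") -> str:
    """Build a tYYx-style tag: prefix, first two letters of the codename, then 'x'."""
    return f"{prefix}{codename[:2].upper()}x"


# Codenames mentioned in the discussion.
print(product_tag("Dvalin"))     # Mali-G52
print(product_tag("Natt"))       # Mali-G57
print(product_tag("Odin", "l"))  # the 'l' variant; Odin is a guess in the chat
```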
<alyssa>
robmur01: lTUx, the Linux GPU? :D
<alyssa>
Also weird is tE2x. I've never seen numbers in the codenames. Maybe they ran out of Norse names :p
<alyssa>
For architecture #s, we know 6,7 are bifrost
<alyssa>
G77/G57 are 9, which is valhall
<alyssa>
But I think we have a citation that tBOx (8, 2) is also Valhall, so.. I guess they released out-of-order.
<alyssa>
10 contains tODx -- which I have to guess is Odin, no citation but c'mon :) - which is known Valhall cross referencing more things
<alyssa>
#11 just showed up in this year's kbase, with the weird names.. tTUx, lTUx, tE2x... could be even more Valhall, could be something new entirely tbh
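The architecture numbers above can be collected into a lookup table. This is a sketch of the guesses made in the chat, not an authoritative mapping. The names `ARCH_FAMILY` and `arch_family` are hypothetical, and the entries for 8 and 10 rest on the tentative reasoning above.

```python
from typing import Dict, Optional

# Architecture number -> family, as inferred in the discussion.
ARCH_FAMILY: Dict[int, Optional[str]] = {
    6: "Bifrost",
    7: "Bifrost",
    8: "Valhall",   # tBOx (8, 2); "I think we have a citation"
    9: "Valhall",   # G77 / G57
    10: "Valhall",  # tODx, guessed to be Odin
    11: None,       # tTUx, lTUx, tE2x: more Valhall or something new
}


def arch_family(arch: int) -> str:
    """Return the inferred family name, or 'unknown' when the chat has no answer."""
    return ARCH_FAMILY.get(arch) or "unknown"
```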
robink has joined #panfrost
<alyssa>
Worth noting l/t versions have identical issues/features lists.
<alyssa>
So very closely related at any rate.
<alyssa>
Anyway, uh, back to real work.
<alyssa>
but I won't be surprised if there's serious new Mali in the pipes :)
<alyssa>
'course, Midgard was new once :-)
<alyssa>
--Actually, one more thing, E2 does have precedent
<alyssa>
looking at ancient kbase, we have:
<alyssa>
#define GPU_ID_PI_TFRX 0x0880
<alyssa>
#define GPU_ID_PI_TF2X 0x0860
<alyssa>
Anyway, seriously long rabbit hole, serves me right looking at kbase for answers..
<alyssa>
Trying to figure out what the sign extension bits do when the GPU is accessing with full precision... they should be no-ops
<alyssa>
And for actual arguments I believe they are
<alyssa>
but the sign-ext bit is set for the r24 dummy with imov..
<alyssa>
hrm. Maybe it'll come to me in my sleep ;_0
<alyssa>
(actually, sign-ext is the default, so probably it's ignored by hw and the blob got lazy. but then I have some other bug..)
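To illustrate why the sign-extension bit should be a no-op at full precision, here is a generic sketch of sign versus zero extension on plain integers. It models ordinary two's-complement widening, not the actual hardware encoding. Both helpers are hypothetical.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Widen a from_bits-wide two's-complement value to to_bits, copying the sign bit."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        value -= 1 << from_bits            # reinterpret as a negative number
    return value & ((1 << to_bits) - 1)    # re-encode at the wider width


def zero_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Widen a from_bits-wide value to to_bits, filling the upper bits with zeros."""
    return value & ((1 << from_bits) - 1) & ((1 << to_bits) - 1)


# 16 -> 32 bits: the two modes disagree when the top bit is set.
print(hex(sign_extend(0x8000, 16, 32)), hex(zero_extend(0x8000, 16, 32)))

# 32 -> 32 bits (full precision): there is nothing to extend, so they agree.
print(sign_extend(0x80000000, 32, 32) == zero_extend(0x80000000, 32, 32))
```

When the source is already full width there are no upper bits to fill, which fits the guess that the hardware ignores the bit in that case.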