alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
raster has quit [Quit: Gettin' stinky!]
ndufresne has joined #panfrost
cphealy has joined #panfrost
LinguinePenguiny has quit [Quit: LinguinePenguiny]
Depau has quit [Quit: ZNC 1.7.5 - https://znc.in]
Depau has joined #panfrost
kaspter has quit [Ping timeout: 258 seconds]
kaspter has joined #panfrost
vstehle has quit [Ping timeout: 265 seconds]
kaspter has quit [Ping timeout: 272 seconds]
kaspter has joined #panfrost
davidlt has joined #panfrost
icecream95 has joined #panfrost
vstehle has joined #panfrost
_whitelogger has joined #panfrost
buzzmarshall has quit [Remote host closed the connection]
robink has quit [Ping timeout: 260 seconds]
robink has joined #panfrost
<icecream95> alyssa: Here is another shader using barrier_buffer, but here it's the first texture fetch, not the last: https://gitlab.freedesktop.org/snippets/1020/raw
<icecream95> The offline shader compiler uses it for t760 up (including t820)
icecream95 has quit [Ping timeout: 246 seconds]
raster has joined #panfrost
kaspter has quit [Ping timeout: 246 seconds]
kaspter has joined #panfrost
icecream95 has joined #panfrost
camus1 has joined #panfrost
kaspter has quit [Ping timeout: 256 seconds]
camus1 is now known as kaspter
stikonas has joined #panfrost
Elpaulo has joined #panfrost
Elpaulo has quit [Client Quit]
Elpaulo has joined #panfrost
Elpaulo has quit [Client Quit]
<icecream95> alyssa: If unsetting 0x200 improves performance and doesn't break CI, you might as well stop setting it and wait for the bug reports...
<icecream95> alyssa: An allocator for packing small temporary BOs into larger blocks might help with bo_wait
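The packing idea icecream95 suggests can be sketched as a bump sub-allocator: small temporaries are carved out of one large backing block, so only that block needs to be tracked and waited on. This is a minimal sketch, not Panfrost code; `struct sub_pool`, `sub_alloc`, `sub_pool_reset` and `SUB_ALLOC_FAIL` are hypothetical names, and the backing block stands in for a single large BO.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bump sub-allocator: hands out small ranges from one
 * large backing block (imagined here as a single big BO). */
struct sub_pool {
    size_t size;   /* size of the backing block in bytes */
    size_t offset; /* first free byte */
};

#define SUB_ALLOC_FAIL ((size_t)-1)

/* Returns the byte offset of the new range inside the backing block,
 * or SUB_ALLOC_FAIL if the request does not fit. */
static size_t sub_alloc(struct sub_pool *pool, size_t bytes, size_t align)
{
    /* align must be a non-zero power of two */
    assert(align != 0 && (align & (align - 1)) == 0);

    size_t start = (pool->offset + align - 1) & ~(align - 1);

    /* Reject wrap-around and requests that run past the block. */
    if (start < pool->offset || start > pool->size ||
        bytes > pool->size - start)
        return SUB_ALLOC_FAIL;

    pool->offset = start + bytes;
    return start;
}

/* Once everything in the block is idle, all ranges are freed at once. */
static void sub_pool_reset(struct sub_pool *pool)
{
    pool->offset = 0;
}
```

Individual ranges are never freed on their own; the whole block is recycled with `sub_pool_reset`, which is what keeps the bookkeeping down to one object per block.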
<daniels> HdkR: https://gitlab.freedesktop.org/Fahien/gfx-pps is something we've been bashing on
<daniels> the idea is to reuse someone else's profiling UI that already works, so we only have to do the gfx-related bits, like GPU perf counters & display-server timeline
<robmur01> What, no plan to interface panfrost to Gator? :P
* robmur01 runs
<icecream95> daniels: Some of the counter blocks are swapped on Panfrost compared to Gator, so make sure you're getting sane values for the counters
<daniels> robmur01: ha!
<daniels> shadeslayer: ^ might be useful to beware of the swapped counter blocks
<daniels> icecream95: btw I completely missed the earlyz fix, that's cool :)
Elpaulo has joined #panfrost
<HdkR> daniels: Ah nice, so this project lets you profile GPU side using perf counters.
<HdkR> GPU profile tooling is all over the place, would be nice if it was a bit more..sane
<daniels> yeah, Perfetto is the Android profiling environment, which can do separate/offline capture & analysis
<HdkR> That's pretty great, I had hooked in to that interface when doing game porting to Android
<daniels> I've wanted to make life better for ages, but sysprof is too deeply tied to running in literal GNOME, gpuvis is really cool but misses a lot of the analysis stuff from Perfetto and is also another random tooling environment ... so we decided to suck up the Chromium/C++ pain and do a Perfetto plugin, so someone else could worry about how to do good filtering/analysis and make it fast and not ugly :P
<HdkR> haha
<HdkR> Are there any good frontends yet? I always just jump in to chrome://tracing
<daniels> you can also use https://perfetto.dev
<HdkR> That looks pretty nice. Looks like I'm going to be spending some time annotating FEX with atrace wrappers
<daniels> FEX?
<HdkR> https://github.com/FEX-Emu/FEX My x86-64 userland emulator :)
<daniels> oh yeah, that :)
<daniels> btw, other options are available https://perfetto.dev/#/trace-processor.md
<icecream95> HdkR: But can it run Crysis?
<HdkR> I assume Crysis is a 32bit application, so not yet :P
<HdkR> https://twitter.com/Sonicadvance1/status/1262897549951660033 But we do have real games running. Coming soon to an ARM device near you
<daniels> I'm not cool enough to even know which game that is :(
<HdkR> FTL, fun little indie game
<HdkR> Roguelite, really hard to actually "beat", runs typically take about an hour
<daniels> anyway, awesome to see it moving along! :)
<HdkR> It finally almost provides benefit to end users. It's pretty exciting
<HdkR> Panfrost/Freedreno are important to the success of FEX as well. Otherwise no ARM graphics drivers to run these games :)
<icecream95> HdkR: Are you using x86 panfrost, or native libraries like box86 does?
<HdkR> Currently we are using fully emulated libraries. We plan to thunk libraries in the future though
<HdkR> In...two days I'll be testing out x86-64 Freedreno. Right now I'm just running things through radeon si
NeuroScr has quit [Quit: NeuroScr]
<HdkR> I spent the past week implementing x87 and crying, so a change of pace is a good thing
<urjaman> so, on to implementing something else and crying?
<HdkR> Pretty much :D
<urjaman> :P
<robmur01> isn't x87 totally deprecated in x86_64?
<HdkR> anyone that is using libstdc++ and std::unordered_* is using x87 on this very day
* robmur01 wonders what abomination of code would still be using it today
<HdkR> `long double` goes down the x87 path on x86, and goes down soft float on everything else
<HdkR> libstdc++'s prime bucket size calculation uses long double
<HdkR> supertuxkart uses x87 as well
<icecream95> HdkR: 32bit ARM doesn't have long double - it's 64-bit like a normal double
<urjaman> ^^ that's what i assumed everything non-x86 did
<HdkR> icecream95: ooo, cheating, I like it
<robmur01> bleh, long double is its own abomination
<HdkR> https://godbolt.org/z/qnbzQ7 for x86, aarch64, and aarch32
<HdkR> Linux should have taken the Windows approach and demote long double to double everywhere...
<robmur01> "for when having 80 bits is more important than subtle intermediate precision artefacts inconsistent with any other platform"
<HdkR> It's a rough time, at least I've mostly dealt with it. Although I have ignored the two BCD instructions that x87 has :)
<HdkR> If anyone is using those then they deserve no being ran
<HdkR> s/no/not
<robmur01> but what about the ASCII ones? :P
<icecream95> HdkR: Compiler Explorer uses soft float on ARM by default, try it with -mfpu=neon -mfloat-abi=hard
<HdkR> robmur01: Did I miss a table of x87 instructions somewhere? :)
<HdkR> icecream95: Yea, it jumps to VFP/NEON then as expected. So really AArch64 just gets to deal with the performance hit for any app using long double :/
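The per-target differences in `long double` discussed above can be checked from `<float.h>` without reading disassembly. A minimal sketch: the helper names `long_double_mantissa_bits` and `long_double_kind` are made up for illustration, and the labels assume the usual Linux ABIs (64 mantissa bits on x86, 53 on 32-bit ARM, 113 on AArch64).

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Mantissa width of long double, in bits, for the target being compiled. */
static int long_double_mantissa_bits(void)
{
    /* LDBL_MANT_DIG counts digits in base FLT_RADIX; expect binary here. */
    assert(FLT_RADIX == 2);
    return LDBL_MANT_DIG;
}

/* Rough label for the format implied by the mantissa width. */
static const char *long_double_kind(void)
{
    switch (LDBL_MANT_DIG) {
    case 53:  return "same as double";
    case 64:  return "x87 80-bit extended";
    case 113: return "IEEE binary128";
    default:  return "other";
    }
}

/* Prints one line describing long double on this target. */
static void print_long_double_info(void)
{
    printf("long double: %d mantissa bits, %zu bytes of storage (%s)\n",
           long_double_mantissa_bits(), sizeof(long double),
           long_double_kind());
}
```

Calling `print_long_double_info()` on each target shows which case applies there; `sizeof` alone is not enough, since the 80-bit x87 format is padded to 12 or 16 bytes.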
<HdkR> Maybe someone should go through supertuxkart and stop making it use x87
<robmur01> oh, sorry, misread - BCD took me back to x86, I didn't realise it had leaked into x87 too
<robmur01> I have a lovely old book here with a whole chapter entitled "Arithmetic II: Processing ASCII and BCD data" :D
<HdkR> Yea, x87 BCD lets you load an 18 digit decimal
<HdkR> hah
<robmur01> obligatory "at least it's not EBCDIC"
<HdkR> At least it isn't Java bytecode
<icecream95> At least it isn't hardware accelerated Java bytecode...
<HdkR> :)
<robmur01> I do like that Jazelle is still mandatory in Armv8 AArch32
<icecream95> shadeslayer: With panfrost the "L2 and MMU" perf counter block is before the (multiple) "Shader Core" blocks. Use DRM_PANFROST_PARAM_SHADER_PRESENT to check the number of shader cores. Counters saturate, so you'll need to disable and re-enable them at least every five seconds.
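Counting the shader cores comes down to counting set bits, assuming the value returned for DRM_PANFROST_PARAM_SHADER_PRESENT is a bitmask with one bit per present core. A minimal sketch with hypothetical helper names; the ioctl call that fetches the mask is left out, and the mask is just passed in.

```c
#include <assert.h>
#include <stdint.h>

/* Number of shader cores, given a "shader present" bitmask
 * (assumed: one bit per present core). */
static unsigned count_shader_cores(uint64_t shader_present)
{
    unsigned n = 0;
    while (shader_present) {
        shader_present &= shader_present - 1; /* clear lowest set bit */
        n++;
    }
    return n;
}

/* Bit index of the n-th present core (n counts from 0).
 * Returns 64 if n is out of range and asserts are compiled out. */
static unsigned nth_shader_core(uint64_t shader_present, unsigned n)
{
    assert(n < count_shader_cores(shader_present));
    for (unsigned bit = 0; bit < 64; bit++) {
        if ((shader_present >> bit) & 1) {
            if (n == 0)
                return bit;
            n--;
        }
    }
    return 64;
}
```

With the count in hand, a counter dump can be read as one "L2 and MMU" block followed by that many "Shader Core" blocks, per the ordering described above.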
<HdkR> Even though Jazelle is mandatory, at least it can be implemented fully as a fault catching interpreter
<HdkR> Since it's only a hundred ops or something
<HdkR> Just a bit silly that the mode is still there and available :)
<robmur01> HdkR: that's what's fun: hardware still must 'implement' it, but is forbidden from doing anything other than raising the "handle this one in software" exception
<HdkR> yea
<daniels> awesome
<daniels> BXJ is definitely the best opcode
<HdkR> Long live BXJ, forever stuck in compatibility near-death
icecream95 has quit [Quit: leaving]
raster has quit [Quit: Gettin' stinky!]
raster has joined #panfrost
<tomeu> robmur01: I heard you were going to do the gator integration!
tomboy65 has quit [Quit: Off to see the wizard.]
tomboy64 has joined #panfrost
bnieuwenhuizen has quit [Ping timeout: 256 seconds]
bnieuwenhuizen_ has joined #panfrost
bnieuwenhuizen_ is now known as bnieuwenhuizen
<alyssa> icecream95: There goes my theory. thanks, though :p
<alyssa> Maybe when it's right from a varying, that's also cached
<alyssa> I know utgard had a varying->tex fast path
<alyssa> but shader #2 in that file invalidates *that*
<alyssa> clearly it *isn't* a memoryBarrierBuffer() instruction >:
<alyssa> Oh, solution - both r28/r29 are cached simultaneously
<alyssa> So you set the flag if both are valid, and clear if at least one is not
<alyssa> in shader 1, r28/r29 are fully determined/cached at the beginning I guess
<alyssa> but then never touched so you keep setting
<alyssa> shader 2, the first is direct, but the second cache got invalidated so you stop setting ..
<alyssa> Oh, I wonder..
<alyssa> FWIW I can't reproduce the varying/cache thing with the blob on t860
<alyssa> I can't even reproduce the regular caching thing now uhm
<alyssa> Something I've also noticed is the blob compiler refuses to load more than 128-bit at a time
<alyssa> per bundle, I mean
<alyssa> ld_vary_32 r28.zw, 1.xxxy, 0x9E, 0x1E /* A0 */
<alyssa> ld_vary_32 r29.zw, 2.xxxy, 0x9E, 0x1E /* A0 */
<alyssa> ^ Notice 64-bit load of each given the masks, though internally it's still effectively accessing 256-bit
<alyssa> Panfrost doesn't bother and I haven't had issues, so i dunno.
raster has quit [Quit: Gettin' stinky!]
raster has joined #panfrost
raster has quit [Client Quit]
raster has joined #panfrost
karolherbst has quit [Quit: duh 🐧]
karolherbst has joined #panfrost
buzzmarshall has joined #panfrost
nerdboy has joined #panfrost
kherbst has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
karolherbst has quit [Ping timeout: 272 seconds]
kherbst has quit [Ping timeout: 240 seconds]
karolherbst has joined #panfrost
karolherbst has quit [Client Quit]
karolherbst has joined #panfrost
raster has joined #panfrost
* alyssa dusted off her c201 to debug t760 fails
<alyssa> wee
<alyssa> aside: realized a chunk of -bterrain slowness is from mipmapping issues
<alyssa> eliminating some of the superfluous flushing does improve fps
<alyssa> Not massively but every bit is welcome
davidlt has quit [Ping timeout: 265 seconds]
raster has quit [Quit: Gettin' stinky!]
nerdboy has quit [Ping timeout: 256 seconds]
NeuroScr has joined #panfrost
karolherbst has joined #panfrost
cwabbott_ has joined #panfrost
cwabbott_ is now known as cwabbott
<alyssa> Oh, I just noticed the OpenCL blob sets the 0x200 flag
<alyssa> I wonder if it controls numerical handling or something?
<HdkR> nan versus FTZ?
<alyssa> perhaps?
karolherbst has quit [Quit: duh 🐧]
karolherbst has joined #panfrost
<alyssa> mmind00: Case closed.
<alyssa> Rabbit hole looking at kbase stuff (see above for why but I admit yes I am distracted)
<alyssa> and realized kbase has a product ID stringify routine that I somehow missed
<alyssa> Nothing there we didn't already know except it maps TNAX to "Mali-G57"
<alyssa> TNAX = Natt
<alyssa> so your shiny new Rockchip will be G57 :-)
<mmind00> alyssa: aaaah, now I remember which rabbit hole you're talking about :-D
<mmind00> but still it was marketing material ... I guess we'll see when the chip finally arrives
<alyssa> mmind00: Fair. But Natt is definitely G57, we now know :)
<alyssa> Something else.... there are duplicates of a bunch of IDs
<alyssa> some are tYYx where YY is the first two letters of codename (i.e. mali g52 <--> dvalin <--> tDVx)
<alyssa> but in Valhall, we see a couple duplicates with l instead of t, and with a newer product # within the architecture
<alyssa> tBEx, lBEx, tODx, lODx, tTUx, lTUx
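The naming pattern alyssa describes (prefix letter, first two letters of the codename in upper case, then `x`) can be written out as a tiny helper. A minimal sketch; `product_tag` is a hypothetical name and not something from kbase, and it only reproduces the pattern, not the list of real products.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Builds a tag such as "tDVx" from a codename such as "dvalin".
 * prefix is 't' or 'l'; out must have room for 5 bytes. */
static void product_tag(const char *codename, char prefix, char out[5])
{
    assert(strlen(codename) >= 2);
    out[0] = prefix;
    out[1] = (char)toupper((unsigned char)codename[0]);
    out[2] = (char)toupper((unsigned char)codename[1]);
    out[3] = 'x';
    out[4] = '\0';
}
```

For example, `product_tag("dvalin", 't', buf)` yields "tDVx" and `product_tag("natt", 't', buf)` yields "tNAx", matching the TNAX entry mentioned earlier.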
<alyssa> robmur01: lTUx, the Linux GPU? :D
<alyssa> Also weird is tE2x. I've never seen numbers in the codenames. Maybe they ran out of Norse names :p
<alyssa> For architecture #s, we know 6,7 are bifrost
<alyssa> G77/G57 are 9, which is valhall
<alyssa> But I think we have a citation that tBOx (8, 2) is also Valhall, so.. I guess they released out-of-order.
<alyssa> 10 contains tODx -- which I have to guess is Odin, no citation but c'mon :) - which is known Valhall cross referencing more things
<alyssa> #11 just showed up in this year's kbase, with the weird names.. tTUx, lTUx, tE2x... could be even more Valhall, could be something new entirely tbh
robink has joined #panfrost
<alyssa> Worth noting l/t versions have identical issues/features lists.
<alyssa> So very closely related at any rate.
<alyssa> Anyway, uh, back to real work.
<alyssa> but I won't be surprised if there's serious new Mali in the pipes :)
<alyssa> 'course, Midgard was new once :-)
<alyssa> --Actually, one more thing, E2 does have precedent
<alyssa> looking at ancient kbase, we have:
<alyssa> #define GPU_ID_PI_TFRX 0x0880
<alyssa> #define GPU_ID_PI_TF2X 0x0860
<alyssa> Anyway, seriously long rabbit hole, serves me right looking at kbase for answers..
<alyssa> Trying to figure out what the sign extension bits do when the GPU is accessing with full precision... they should be no-ops
<alyssa> And for actual arguments I believe they are
<alyssa> but the sign-ext bit is set for the r24 dummy with imov..
<alyssa> hrm. Maybe it'll come to me in my sleep ;_0
<alyssa> (actually, sign-ext is the default, so probably it's ignored by hw and the blob got lazy. but then I have some other bug..)