<bbrezillon>
robmur01: Hi! I've been chasing an issue we have on S922 (Amlogic) (tomeu, narmstrong and/or chewitt probably reported it here a while ago) where the first few jobs we start on a new GL context fail with faults (DATA_RANGE_FAULT, TILE_RANGE_FAULT, ...)
<bbrezillon>
things stabilize after a while, but I found out that disabling the BO cache in panfrost makes things worse (it basically faults on every BO we pass, unless it's already been passed to a previous job)
<bbrezillon>
after further investigation it seems to be caused by the shareability attribute when adding pages to the page table
<bbrezillon>
when I force it to non-shareable (instead of inner-shareable), the faults disappear (and that's also what mali_kbase/libmali seem to use), but I'm not sure I understand what happens here
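For reference, the attribute being toggled is the SH[1:0] field at bits [9:8] of an ARM long-descriptor (LPAE/AArch64) leaf entry: 0b00 non-shareable, 0b10 outer shareable, 0b11 inner shareable, 0b01 reserved. Linux's io-pgtable-arm names these ARM_LPAE_PTE_SH_NS, ARM_LPAE_PTE_SH_OS and ARM_LPAE_PTE_SH_IS. A minimal Python sketch of forcing an entry to non-shareable, assuming only that encoding; the helper names and the sample descriptor are made up, and this is not the kernel code:

```python
# SH[1:0] occupies bits [9:8] of an LPAE/AArch64 leaf descriptor.
SH_SHIFT = 8
SH_MASK = 0b11 << SH_SHIFT

# Architectural encodings (0b01 is reserved).
SH_NS = 0b00 << SH_SHIFT  # non-shareable
SH_OS = 0b10 << SH_SHIFT  # outer shareable
SH_IS = 0b11 << SH_SHIFT  # inner shareable

SH_NAMES = {SH_NS: "non-shareable", SH_OS: "outer-shareable", SH_IS: "inner-shareable"}


def set_shareability(pte: int, sh: int) -> int:
    """Return pte with its SH field replaced by sh (SH_NS, SH_OS or SH_IS)."""
    if sh not in SH_NAMES:
        raise ValueError(f"invalid SH encoding {sh:#x}")
    return (pte & ~SH_MASK) | sh


def shareability(pte: int) -> str:
    """Decode the SH field of a descriptor into a readable name."""
    return SH_NAMES.get(pte & SH_MASK, "reserved")


# Made-up descriptor: output address 0x80000000, two low type bits set,
# currently outer shareable.
pte = 0x80000000 | SH_OS | 0b11
print(shareability(pte), "->", shareability(set_shareability(pte, SH_NS)))
```

Only the SH bits change; the output address and every other attribute bit are preserved by the mask.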
<icecream95>
raster: It was probably 40b99bb79e1 ("panfrost: Revert "Disable frame throttling"") in Mesa that improved things for you
<raster>
icecream95: not a kernel change?
<raster>
this was on my list of annoyances to look into...
<icecream95>
There's still no real scheduling, but at least GPU-heavy applications don't fill the job queue with too many jobs anymore
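Frame throttling in general means letting the CPU queue at most a few frames ahead of the GPU, blocking on the oldest outstanding frame's fence once that limit is hit. A minimal sketch of the idea, assuming a caller-supplied wait() that blocks on a fence; this is not Mesa's implementation, and FrameThrottle and max_in_flight are illustrative names:

```python
from collections import deque
from typing import Callable, Deque


class FrameThrottle:
    """Cap how many frames the CPU may queue ahead of the GPU (hypothetical)."""

    def __init__(self, max_in_flight: int, wait: Callable[[object], None]):
        self.max_in_flight = max_in_flight
        self.wait = wait  # stands in for blocking on a GPU fence
        self.in_flight: Deque[object] = deque()

    def submit(self, fence: object) -> None:
        # Remember the fence of the frame that was just flushed.
        self.in_flight.append(fence)
        # Too far ahead: block on the oldest frame before returning to the app.
        while len(self.in_flight) > self.max_in_flight:
            self.wait(self.in_flight.popleft())
```

With max_in_flight=2, submitting five frames makes the caller wait on the first three, so a GPU-heavy app can never pile up more than two unfinished frames.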
<icecream95>
alyssa: With AFBC, glmark2-es2 -b texture is probably still slower than before 528e132d4f7
<robmur01>
bbrezillon: that all seems to chime with the working theory of (at least some part of) the cache being f'ed
<robmur01>
shareability may well affect how things allocate into the caches in the first place
<bbrezillon>
robmur01: I tried invalidating/flushing the MMU and L2 caches aggressively, but still had the issue
<bbrezillon>
so I'm wondering what in this inner shareability domain could influence the cache entries
<robmur01>
bbrezillon: my gut feeling is that allocating into L2 is most likely the problem, so invalidating is liable to make it worse ;)
<bbrezillon>
(again, I'm not sure what is in the inner domain in that case, and I won't pretend I get all the subtleties of the shareability concept)
<robmur01>
it seems plausible that NS might mean bypassing (shared) L2 and allocating directly into L1 (and possibly subsequent evictions from L1 out to L2 might not be broken)
<bbrezillon>
hm, ok, so you think it's an issue that only impacts Amlogic?
<robmur01>
yup, I'm fairly convinced it's some issue in their integration
<robmur01>
if I could un-break my Juno I'd flash a G52 onto it ;)
* robmur01
might have to brave the faff of trying to book a trip into the office
<bbrezillon>
robmur01: hm, my bad, mainline sets the outer-shareable attribute, not inner-shareable
<robmur01>
note that there's more awkwardness with shareability - for Midgard "LPAE" the inner domain is essentially "everything in the GPU" while the outer domain is "the rest of the system"
<bbrezillon>
which is called "SHARED_BOTH" in mali_kbase BTW
<bbrezillon>
hm, ok
<robmur01>
however with AArch64 format the meanings are changed to work more like VMSA
<robmur01>
I *think* it becomes more like inner = shader core and outer = whole GPU
<bbrezillon>
how do you think we should fix that?
<bbrezillon>
1. add an IO_PGTABLE_QUIRK_ARM_SH_NS flag
<bbrezillon>
2. use AArch64 tables
<bbrezillon>
actually, I didn't check what's used when running in AArch64 mode
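Option 1 could look roughly like this: a page-table quirk that makes every leaf entry non-shareable instead of using the default. IO_PGTABLE_QUIRK_ARM_SH_NS is only the name proposed here, not an existing io-pgtable flag; the Python below models it with a placeholder bit and assumes the outer-shareable default mentioned above for mainline:

```python
from enum import IntFlag

SH_SHIFT = 8
SH_NS = 0b00 << SH_SHIFT  # non-shareable
SH_OS = 0b10 << SH_SHIFT  # outer shareable


class PgtableQuirk(IntFlag):
    NONE = 0
    # Hypothetical: name taken from option 1, bit position is a placeholder.
    ARM_SH_NS = 1 << 0


def leaf_shareability(quirks: PgtableQuirk, default_sh: int = SH_OS) -> int:
    """Pick the SH bits every leaf PTE of this page table gets."""
    if quirks & PgtableQuirk.ARM_SH_NS:
        return SH_NS
    return default_sh


def make_leaf_pte(paddr: int, attrs: int, quirks: PgtableQuirk) -> int:
    """Combine output address, other attribute bits and the chosen SH field."""
    return paddr | attrs | leaf_shareability(quirks)
```

Keeping the choice in page-table setup means the driver could pass the quirk only where it is needed, whereas option 2 changes the whole descriptor format rather than a single attribute.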
<robmur01>
it probably makes sense to hook up AArch64 properly for Bifrost, which might mean some fiddling with io-pgtable attributes to match kbase
<tomeu>
when I tried AArch64, I saw the same issues
<tomeu>
fwiw :)
<bbrezillon>
tomeu: do you have a branch to share?
<robmur01>
BTW, is the GPU revision in S922 r0px or r1px?
<bbrezillon>
robmur01: "GPU identified as 0x2 arch 7.2.1 r0p0 status 0"
<robmur01>
cool, thanks
<robmur01>
(apparently that generation of GPUs played the nasty Cortex-A9 trick of having significant functional differences between revisions)
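The log line above appears to be a field-by-field decode of the GPU_ID register. Assuming the Bifrost GPU_ID layout used by mali_kbase (arch major/minor/rev, product major, then the rNpM version and status fields), a raw value of 0x72120000, i.e. product ID 0x7212 as Panfrost names the G52, is consistent with exactly that line; the rN field is what tells r0px from r1px:

```python
def decode_gpu_id(gpu_id: int) -> dict:
    """Split a Bifrost-style GPU_ID value into its fields.

    Assumed layout: [31:28] arch major, [27:24] arch minor, [23:20] arch rev,
    [19:16] product major, [15:12] version major (rN),
    [11:4] version minor (pN), [3:0] version status.
    """
    return {
        "arch_major": (gpu_id >> 28) & 0xF,
        "arch_minor": (gpu_id >> 24) & 0xF,
        "arch_rev": (gpu_id >> 20) & 0xF,
        "product_major": (gpu_id >> 16) & 0xF,
        "version_major": (gpu_id >> 12) & 0xF,
        "version_minor": (gpu_id >> 4) & 0xFF,
        "version_status": gpu_id & 0xF,
    }


def describe(gpu_id: int) -> str:
    """Format the decoded fields in the same shape as the log line above."""
    f = decode_gpu_id(gpu_id)
    return (f"GPU identified as {f['product_major']:#x} "
            f"arch {f['arch_major']}.{f['arch_minor']}.{f['arch_rev']} "
            f"r{f['version_major']}p{f['version_minor']} "
            f"status {f['version_status']}")


print(describe(0x72120000))
```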
<icecream95>
robmur01: rvgl is working great for me - the only problem I have is stuttering during shader compilation, but after the first lap everything is smooth
<tomeu>
hmm, wonder how hard it would be to add disk cache support to Panfrost
<robmur01>
icecream95: on RK3399 (using the binary slurped out of the odroid arm64 .deb) occasionally black cones appear on the faces of all the wheels, and the sky on the stunt arena glitches between textured and black
<robmur01>
(also it segfaults a fair bit, and my Xbox 360 controller has an annoying tendency to pull to the left... might be time to dig out the Dreamcast for some 'proper' Re-Volt again :D)
<icecream95>
robmur01: What Mesa version are you using? The "black cones" on wheels and the stunt arena sky have been fixed for months...
<robmur01>
hmm, it *should* be git master, but I suppose it's possible that SDL somehow gets around my ld.so.conf trick and gets the distro mesa instead - I'll double-check
<robmur01>
nope, just built master as of right now, hacked panfrost_model_name() to verify in-game that it's picking up the right mesa, and the glitches are very much still there
<robmur01>
(if it matters, I'm using ES1.1 mode without shaders)
<robmur01>
bbrezillon: actually, there is another possible reason for wonky cache behaviour...
<icecream95>
robmur01: I'm using GL3 mode (Profile=0 in rvgl.ini, and PAN_MESA_DEBUG=gl3 mesa_glthread=true)
<robmur01>
the pgprot_writecombine() mapping means who knows what stale crap is sat (clean) in the CPU cache for snoops to hit
<robmur01>
non-shareable should happen to have the side effect of making non-snooping accesses, so there's no possibility of inadvertently hitting stale CPU cache lines
<bbrezillon>
robmur01: the GPU reports ACE-Lite support
<bbrezillon>
but mali_kbase ignores it and sets the coherency reg to non-coherent
<bbrezillon>
I'm trying your suggestion
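The choice being discussed (the GPU advertises ACE-Lite, yet kbase programs it as non-coherent) can be sketched as a small decision: advertising ACE-Lite is not enough by itself, the SoC also has to integrate the GPU coherently, which a DT `dma-coherent` property expresses. The register encodings below (ACE-Lite = 0, ACE = 1, none = 31, with COHERENCY_FEATURES as a bitmask indexed by those numbers) are assumed from the kbase/panfrost headers, and pick_coherency is a hypothetical helper:

```python
# Assumed encodings: COHERENCY_FEATURES is a bitmask indexed by protocol
# number, and COHERENCY_ENABLE takes the protocol number itself.
COHERENCY_ACE_LITE = 0
COHERENCY_ACE = 1
COHERENCY_NONE = 31


def pick_coherency(features: int, dma_coherent: bool) -> int:
    """Choose the value to program into COHERENCY_ENABLE.

    Claim I/O coherency only when the platform says the GPU is wired up for
    it *and* the GPU advertises ACE-Lite; otherwise stay non-coherent.
    """
    if dma_coherent and features & (1 << COHERENCY_ACE_LITE):
        return COHERENCY_ACE_LITE
    return COHERENCY_NONE
```

If the page tables are also switched to inner-shareable in that mode, the DT and the page-table walker configuration have to agree, otherwise the GPU and CPU disagree about which accesses snoop.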
<robmur01>
another trick is to run some kind of memory benchmark/test in the background to thrash the CPU cache and make sure BOs don't get a chance to hang around in there
<robmur01>
if that visibly reduces the appearance of faults it would point strongly to this
<robmur01>
(this is pretty much what I was doing with Juno last year)
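A crude version of that background thrasher, as a sketch: it writes one byte per cache line of a buffer much larger than the CPU's last-level cache, so any BO contents lingering there get evicted. The 64 MiB size and 64-byte line size are guesses to adjust per SoC, and a native memory benchmark will stress the caches harder than Python can:

```python
def thrash_cache(size_bytes: int = 64 << 20, line_size: int = 64, passes: int = 10) -> int:
    """Touch every cache line of a large buffer `passes` times.

    Assumes size_bytes comfortably exceeds the last-level cache and that
    line_size matches the CPU's cache line size. Returns lines per pass.
    """
    buf = bytearray(size_bytes)
    lines = len(range(0, size_bytes, line_size))
    for i in range(passes):
        # Extended-slice assignment writes every line_size-th byte,
        # i.e. one store per cache line across the whole buffer.
        buf[::line_size] = bytes([i & 0xFF]) * lines
    return lines
```

For the experiment, run it in a loop from a separate process while replaying the faulting GL workload and compare how often the faults show up.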
<bbrezillon>
robmur01: hm, so you force ARM_LPAE_PTE_SH_IS?
<robmur01>
Midgard needs the OS to emit snoops properly (which is what the patch does), but Bifrost may be different and do so anyway
<robmur01>
(I'm trying to look that up ATM)
<bbrezillon>
nope, it doesn't help
<bbrezillon>
uh, wait
<bbrezillon>
forgot to update the dtb
<bbrezillon>
robmur01: nope, that's even worse, now I have translation faults
<robmur01>
then make sure cfg->coherent_walk gets set
<robmur01>
Maybe I should try to update that branch properly...
<bbrezillon>
robmur01: nope, still failing
<bbrezillon>
with INSTR_INVALID_ENC faults now
<bbrezillon>
I guess there's a good reason libmali disables coherency and sets the shareability attrib to NS
<bbrezillon>
well, libmali+kbase
<alyssa>
icecream95: reality check, thanks, still need to figure out what to do about that
<alyssa>
/me thinks her ducks are in a row to refactor formats
<alyssa>
I refactored everything so the 22 bits of { swizzle, format, sRGB, zero } are all together in textures/attributes
<alyssa>
so now need to redo the table mapping PIPE to MALI to actually map PIPE to { swizzle, MALI, sRGB, zero }, all packed at compile-time
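A sketch of what "PIPE to { swizzle, MALI, sRGB, zero }, packed ahead of time" could look like. The bit layout (swizzle [11:0] as four 3-bit selectors, format [19:12], sRGB [20], zero [21]) only assumes the 22 bits split as 12 + 8 + 1 + 1, the channel selector values are assumed, and MALI_RGBA8_UNORM is a placeholder code, not the hardware's:

```python
# Assumed 3-bit channel selectors.
R, G, B, A, ZERO, ONE = range(6)


def swizzle(x: int, y: int, z: int, w: int) -> int:
    """Pack four 3-bit channel selectors into a 12-bit swizzle."""
    return x | (y << 3) | (z << 6) | (w << 9)


def pack_format(sw: int, mali_fmt: int, srgb: bool, zero: bool = False) -> int:
    """Pack { swizzle, format, sRGB, zero } into one 22-bit word.

    Hypothetical layout: swizzle [11:0], format [19:12], sRGB [20], zero [21].
    """
    if not (0 <= sw < (1 << 12) and 0 <= mali_fmt < (1 << 8)):
        raise ValueError("field out of range")
    return sw | (mali_fmt << 12) | (int(srgb) << 20) | (int(zero) << 21)


MALI_RGBA8_UNORM = 0x10  # placeholder code, not the real hardware value

# Everything is resolved once, so a lookup is a single table read.
FORMATS = {
    "PIPE_FORMAT_R8G8B8A8_UNORM": pack_format(swizzle(R, G, B, A), MALI_RGBA8_UNORM, False),
    # Memory order B,G,R,A read as RGBA8: logical red sits in fetched slot 2.
    "PIPE_FORMAT_B8G8R8A8_UNORM": pack_format(swizzle(B, G, R, A), MALI_RGBA8_UNORM, False),
    "PIPE_FORMAT_R8G8B8X8_UNORM": pack_format(swizzle(R, G, B, ONE), MALI_RGBA8_UNORM, False),
    "PIPE_FORMAT_R8G8B8A8_SRGB": pack_format(swizzle(R, G, B, A), MALI_RGBA8_UNORM, True),
}
```

In the driver this would be a const C table; the point is that BGRA and RGBX fall out of the swizzle stored alongside the format, with no runtime fix-up.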
<alyssa>
then the explicit format swizzle handling goes away on <= v6 (Midgard, G71, G72)
<alyssa>
when we start paying attention to v7 (later Bifrost, without HAS_SWIZZLES quirk), we can do the same trick, just without the full swizzle (only "swap r/b" etc bits)
<alyssa>
So then we'll end up with two lookup tables depending on version, things are a lot cleaner, less runtime work too :)
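The "two lookup tables depending on version" shape, sketched in Python: one table per hardware generation, chosen once per device instead of patching swizzles per draw. It assumes v6-and-earlier entries carry a full swizzle while v7 entries only carry a coarse component order; the Order values, entry types and contents are placeholders, not Panfrost's definitions:

```python
from enum import Enum
from typing import Dict, NamedTuple


class Order(Enum):
    """Hypothetical coarse component order for v7 ("swap r/b" etc.)."""
    RGBA = 0
    BGRA = 1


class V6Entry(NamedTuple):
    swizzle: str  # full per-channel swizzle, e.g. "BGRA"
    mali: str     # placeholder hardware format name


class V7Entry(NamedTuple):
    order: Order  # only a coarse reordering, no arbitrary swizzle
    mali: str


TABLE_V6: Dict[str, V6Entry] = {
    "PIPE_FORMAT_R8G8B8A8_UNORM": V6Entry("RGBA", "RGBA8"),
    "PIPE_FORMAT_B8G8R8A8_UNORM": V6Entry("BGRA", "RGBA8"),
}

TABLE_V7: Dict[str, V7Entry] = {
    "PIPE_FORMAT_R8G8B8A8_UNORM": V7Entry(Order.RGBA, "RGBA8"),
    "PIPE_FORMAT_B8G8R8A8_UNORM": V7Entry(Order.BGRA, "RGBA8"),
}


def format_table(arch: int) -> dict:
    """v6 and earlier (Midgard, G71, G72) use full swizzles; v7 the reduced form."""
    return TABLE_V6 if arch <= 6 else TABLE_V7
```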
<alyssa>
table itself can be done compactly with some macros, and also some python slop to ingest our current source code + Gallium format list and help do the generation
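A sketch of that generation step: read format names from a Gallium-style format CSV (assuming, as with Mesa's format list, that the first column is the format name and `#` starts a comment) and emit a C table through a per-entry macro. The FMT() macro, struct pan_format and the mapping contents are hypothetical:

```python
from typing import Dict, Iterable, List, Tuple


def pipe_formats(csv_lines: Iterable[str]) -> List[str]:
    """Collect format names from a format CSV (first column), skipping comments."""
    names = []
    for line in csv_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        names.append(line.split(",")[0].strip())
    return names


def emit_c_table(name: str, known: Iterable[str],
                 mapping: Dict[str, Tuple[str, str]]) -> str:
    """Emit a C initializer using a hypothetical FMT(pipe, mali, swizzle) macro.

    Formats without a driver mapping are skipped, so only supported
    formats appear in the generated table.
    """
    out = [f"static const struct pan_format {name}[PIPE_FORMAT_COUNT] = {{"]
    for pipe in known:
        if pipe in mapping:
            mali, swz = mapping[pipe]
            out.append(f"   FMT({pipe}, {mali}, {swz}),")
    out.append("};")
    return "\n".join(out)
```

Driving the output from the format list keeps the generated table in sync with Gallium, while the hand-written mapping stays the only driver-specific input.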
<robmur01>
BTW kmscube no longer works on RK3399 at the moment (mesa master, kernel 5.8) - "failed to set mode: Invalid argument" - will that be the AFBC modifier thing?
<robmur01>
(plus a big old pile'o'warnings from mesa about "failed to remap gl<blah>NV")
<alyssa>
robmur01: current master only uses AFBC for internal textures/fbos, anything shared will still be linear (or u-interleaved tiled)
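The policy described here (AFBC only for driver-internal textures and FBOs, shared buffers linear or u-interleaved tiled) as a small decision function. Everything is hypothetical, including the tiled_ok_for_sharing knob standing in for whatever decides between linear and u-interleaved for a shared buffer:

```python
from enum import Enum


class Layout(Enum):
    LINEAR = "linear"
    U_INTERLEAVED = "u-interleaved tiled"
    AFBC = "AFBC"


def choose_layout(shared: bool, afbc_ok: bool = True,
                  tiled_ok_for_sharing: bool = False) -> Layout:
    """Pick a layout for a new resource (hypothetical helper).

    Shared buffers never get AFBC, since the importer (e.g. the display)
    may not understand it; private resources prefer AFBC when the format
    supports it.
    """
    if shared:
        return Layout.U_INTERLEAVED if tiled_ok_for_sharing else Layout.LINEAR
    return Layout.AFBC if afbc_ok else Layout.U_INTERLEAVED
```

Under this policy a scanout buffer like kmscube's never takes the AFBC path, which is why the modeset failure is unlikely to come from the AFBC change.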
<alyssa>
kmscube wfm
<alyssa>
maybe need to force one of `-D /dev/dri/card{0, 1}`?
<robmur01>
nope, definitely the right device - it goes through all the normal blurb up to "using modifier fff...f" before failing
<alyssa>
:|
<robmur01>
I wonder what else could be different on my system (Arch)... libdrm perhaps?
* alyssa
poking at invalidate_resource
<alyssa>
breaks webgl somehow
<alyssa>
invalidate_resource getting called but we're having to wallpaper anyway
<alyssa>
Probably some race between our batch tracking and global gallium, um