<icecream95>
False alarm, having Firefox open has a larger effect on performance than I expected
<bbrezillon>
icecream95: feel free to add your Tested-by
chewitt has quit [Ping timeout: 268 seconds]
yann|work has quit [Ping timeout: 240 seconds]
marex-cloud has joined #panfrost
warpme_ has joined #panfrost
yann|work has joined #panfrost
alpernebbi has joined #panfrost
<tomeu>
robmur01: it was indeed a PERMISSION_FAULT with EXECUTE access
<tomeu>
patch coming
<tomeu>
robmur01: what I don't understand is that on a kevin I hit the mmap codepath, but on a rockpi4 I went straight to the terminal fault
icecream95 has quit [Quit: leaving]
toggleton has joined #panfrost
chewitt has joined #panfrost
Xalius has joined #panfrost
<Xalius>
what patches are currently recommended/needed on top of the 5.5 release?
<Xalius>
I saw some mentions reading the backlog from the last 5 days
raster has joined #panfrost
megi has joined #panfrost
<tomeu>
bbrezillon, robmur01, robher: we seem to have some concurrency problem, I see faults when running deqp-gles3 only when running several instances in parallel
<tomeu>
the faults can be translation faults, but also misc others such as INSTR_OPERAND_FAULT, DATA_INVALID_FAULT, etc
<tomeu>
which suggests BOs being overwritten
<tomeu>
guess it has to be a kernel problem
<bbrezillon>
tomeu: not necessarily
<bbrezillon>
if the BO accessed by the job is not added to the bos array passed at submit time, the kernel doesn't know about it
<tomeu>
yes, but how could processes affect each other without it being kernel's fault?
<bbrezillon>
tomeu: well, if the memory backing the BO is released, it can be allocated to another process
<bbrezillon>
something like a use-after-free, except this time the user is the GPU
<tomeu>
oh, I see
<bbrezillon>
but maybe you're right and it's a kernel problem
<tomeu>
shouldn't that cause a translation fault though?
<tomeu>
as the previous mapping would have been deleted
<bbrezillon>
tomeu: are all the gles3 tests impacted?
<tomeu>
bbrezillon: you mean if faults happen with all gles3 tests?
<bbrezillon>
I'd expect it to happen on gles2 too if that was a kernel bug
<tomeu>
good point, maybe there's some kind of BO that is only used in gles3 and it's not being added to the list sent to the kernel
<bbrezillon>
tomeu: yep, if it only happens with a subset of the gles3 testsuite, maybe we can figure out what those tests have in common
<tomeu>
will give that a look
<bbrezillon>
well, if INSTR_OPERAND_FAULT is what I think (fault caused by a wrong instruction in the midgard shader bytecode), that'd be weird. I'm pretty sure BOs backing shaders' bytecode are added to the list
Xalius has quit [Remote host closed the connection]
<alyssa>
bbrezillon: INSTR_OPERAND_FAULT is caused by accessing the wrong *operand* in the midgard shader, that is, too many registers, too many uniforms, etc vs the counts reported in the shader metadata descriptor
chewitt has quit [Ping timeout: 260 seconds]
NeuroScr has quit [Quit: NeuroScr]
<bbrezillon>
alyssa: ok, so it can also be caused by a cmdstream corruption
<tomeu>
I have seen other faults in shaders, but I'm not sure they still happen
<alyssa>
Yeah, although DATA_INVALID_FAULT would probably be the more likely result of just random cmdstream corruption
<tomeu>
hmm, good point
<bbrezillon>
tomeu: then maybe double-check the transient allocation path
<tomeu>
though I was planning to see if copying to transient memory in one go before submitting would speed things up
<tomeu>
to reduce cache flushes
<tomeu>
so maybe I should do that first
<bbrezillon>
yep, it should help write-combine
<tomeu>
ok, will try that before doing any further debugging
<tomeu>
once I finish all the paperwork...
chewitt has joined #panfrost
chewitt has quit [Ping timeout: 240 seconds]
raster has quit [Quit: Gettin' stinky!]
chewitt has joined #panfrost
vstehle has quit [Ping timeout: 240 seconds]
vstehle has joined #panfrost
raster has joined #panfrost
raster has quit [Ping timeout: 240 seconds]
<alyssa>
Looking at shared memory sizing fields
<alyssa>
I think I have it understood when the size is a power-of-two
<alyssa>
things get a little (lot) more complicated when it isn't, though :-(
raster has joined #panfrost
chewitt has quit [Ping timeout: 265 seconds]
chewitt has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
raster has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
steev has quit []
steev has joined #panfrost
MoeIcenowy has quit [Quit: ZNC 1.7.2+deb3 - https://znc.in]
<bbrezillon>
tomeu: BTW, you probably want to keep the shadow buf per-batch, and not per BO
<bbrezillon>
like, fill the shadow buf until it gets full or the batch is flushed, and copy to the actual BO at that time
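[Editor's sketch] The per-batch shadow buffer idea can be modelled roughly as follows. This is a minimal Python model, not Panfrost code: the `ShadowBatch` class, its method names, and the use of a `bytearray` to stand in for the BO's CPU mapping are all assumptions made for illustration. It only shows the shape of the scheme: writes accumulate in ordinary memory and are copied to the BO in one go when the shadow buffer fills up or the batch is flushed.

```python
class ShadowBatch:
    """Hypothetical model of a per-batch shadow buffer (not the Mesa API)."""

    def __init__(self, capacity: int, target: bytearray):
        self.shadow = bytearray(capacity)  # cached CPU memory, cheap to write
        self.used = 0                      # bytes currently staged in the shadow
        self.target = target               # stands in for the BO's CPU mapping
        self.target_offset = 0             # next free byte in the BO
        self.copies = 0                    # number of bulk copies performed

    def write(self, data: bytes) -> int:
        """Stage data and return the offset it will occupy in the BO."""
        if len(data) > len(self.shadow):
            raise ValueError("write larger than the shadow buffer")
        # Shadow buffer full: copy what we have to the BO first.
        if self.used + len(data) > len(self.shadow):
            self.flush()
        offset = self.target_offset + self.used
        self.shadow[self.used:self.used + len(data)] = data
        self.used += len(data)
        return offset

    def flush(self) -> None:
        """Copy all staged bytes to the BO in a single contiguous write."""
        if self.used == 0:
            return
        end = self.target_offset + self.used
        if end > len(self.target):
            raise ValueError("BO too small for the staged data")
        self.target[self.target_offset:end] = self.shadow[:self.used]
        self.target_offset = end
        self.used = 0
        self.copies += 1
```

In this model the caller would call `flush()` once more at batch submission, so the BO sees a few large sequential writes instead of many small scattered ones.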
<tomeu>
oh, good idea
<tomeu>
but that shouldn't affect the result, right?
<bbrezillon>
no no
<bbrezillon>
it's unrelated
<tomeu>
afaics, all transient usage is for cmdstream stuff
<bbrezillon>
it is
<bbrezillon>
are you sure bo->cpu is not used directly in some path?
<bbrezillon>
to read back the content of a cmdstream element
<tomeu>
ah, good idea
<tomeu>
though it should use the transfer
<bbrezillon>
or even updated
<tomeu>
I knew it couldn't be that easy :)
<bbrezillon>
maybe add a panfrost_bo_cpu_addr() helper and patch all places where bo->cpu is accessed directly
<tomeu>
yeah, most use transfer.cpu, but those job.cpu references look suspicious
<tomeu>
but the cmdstream was identical between a passing and failing case, so it would be something else :/
<tomeu>
as dumped, at least
<tomeu>
first_job is also a transfer
* tomeu
keeps going through each straw
<tomeu>
all are transfers :/
chewitt has quit [Ping timeout: 260 seconds]
<tomeu>
ah, seems to work if we don't sync fences for dumping traces
<bbrezillon>
tomeu: duh
<bbrezillon>
it's hiding another bug :P
<tomeu>
but even then, it regresses on 863 gles3 tests out of 8k
<robmur01>
hmm, even more fun - hit the as_count < 0 error when *starting* kmscube, it runs fine, exits fine, but trying to start it again is the point where the deadlock kicked in
<alyssa>
From the behaviour around powers-of-two it's clear at least the upper nibble is logarithmic (base 2) but that doesn't explain the behaviour for non-powers-of-two
<alyssa>
urjaman: ^ more $\LaTeX$ for you :p
<alyssa>
It *appears* the upper nibble is a shift and the bottom nibble is a linear factor
<alyssa>
But that doesn't always work, the fields with 0 in the bottom are the most egregiously wrong here
<alyssa>
Oh, hey...
<alyssa>
Notice in particular the bottom nibble only takes on one of four values in the table:
<alyssa>
0, 2, 4, 6
<alyssa>
in binary:
<alyssa>
000, 010, 100, 110
<alyssa>
So the bottom bit we don't see used here, but bits 1 and 2 form a 2-bit field.
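[Editor's sketch] The bit-level observation above can be written down as a small helper that splits a field value into the pieces being discussed. This is only a sketch: it assumes the field is a single byte, the function and key names are made up, and it deliberately assigns no meaning to the pieces, since the log has not established one at this point.

```python
def split_size_field(value: int) -> dict:
    """Split a byte into the sub-fields discussed in the log (names are hypothetical)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a single byte")
    upper = (value >> 4) & 0xF   # upper nibble: appears to behave like a shift
    lower = value & 0xF          # bottom nibble
    return {
        "upper_nibble": upper,
        "lower_nibble": lower,
        "bit0": lower & 0x1,             # not seen set in the observed values
        "bits_1_2": (lower >> 1) & 0x3,  # the 2-bit field formed by bits 1 and 2
        "bit3": (lower >> 3) & 0x1,      # also clear for 0, 2, 4, 6
    }


# The four observed bottom-nibble values map onto the four values of the 2-bit field.
observed = [0, 2, 4, 6]
two_bit_values = [split_size_field(v)["bits_1_2"] for v in observed]
```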
steev has quit [Ping timeout: 248 seconds]
<alyssa>
The question is of course.... for what?
narmstrong has quit [Ping timeout: 245 seconds]
<alyssa>
Does shared memory have to be a POT? If so, then you just need the logarithm field, no 2-bit field
<alyssa>
Maybe it has to be at least cache line aligned? That we see distinct encodings for <64 bytes invalidates that.
<alyssa>
(The fact that 12 bytes has an encoding distinct from 16 says... a lot)
<alyssa>
I'm tempted to think there's a way to do *3 multipliers but... why?
<alyssa>
Of course panfrost could just round up sizes to powers of two, minimum 128, since those are understood. But we shouldn't really have to?
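[Editor's sketch] The fallback mentioned above, rounding every size up to a power of two with a floor of 128, is simple to express. The function name is hypothetical and this is not the driver's code; it assumes sizes are byte counts and that the minimum is itself a power of two.

```python
def round_up_shared_size(size: int, minimum: int = 128) -> int:
    """Round size up to the next power of two, never going below minimum.

    Assumes minimum is itself a power of two (128 in the discussion above).
    """
    if size < 0:
        raise ValueError("size must be non-negative")
    size = max(size, minimum)
    # (size - 1).bit_length() is the exponent of the smallest power of two >= size.
    return 1 << (size - 1).bit_length()
```

Under this scheme a 12-byte and a 16-byte allocation would both be treated as 128 bytes, which wastes some space but stays within the encodings that are understood.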
<alyssa>
oh! oh! but it's even more wild
<alyssa>
since apparently the type matters. which invalidates some of the earlier data, gah
guillaume_g has quit [Quit: Konversation terminated!]
<alyssa>
in other news, three.js demos mostly work now
<alyssa>
I guess I'll start implementing the compiler side of the shared memory while my brain tries to piece together this~
<alyssa>
Good news is that the ISA side of shared memory is nearly identical to stack accesses, OpenCL globals, and OpenGL SSBOs
jolan has quit [Quit: leaving]
yann|work has joined #panfrost
<alyssa>
Okay, I got the general structure for shared memory right
<alyssa>
Passing dEQP-GLES31.functional.compute.basic.shared_var_single_group now but I think I need to do some pretty serious refactoring for this to be practical..
buzzmarshall has joined #panfrost
* alyssa
needs a good name for this structure
<alyssa>
Configuration for the stack and shared memory, and the pointers to those buffers..
<alyssa>
I guess I'll just go with mali_stack_shared, though that's kind of a lame name :p
<alyssa>
mali_shared_memory is probably better, stack is shared
icecream95 has joined #panfrost
alpernebbi has quit [Quit: alpernebbi]
jolan has joined #panfrost
<alyssa>
Admittedly with some hacks, but now passing dEQP-GLES31.functional.compute.basic.shared_var_multiple_groups
<alyssa>
(Hack being worstcasing a size, still need to route the size through properly)
<alyssa>
dEQP-GLES31.functional.compute.shared_var.* failing but in part that's because of barriers right now ...
davidlt has quit [Ping timeout: 265 seconds]
NeuroScr has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
steev has joined #panfrost
narmstrong has joined #panfrost
NeuroScr has joined #panfrost
pH5 has quit [Ping timeout: 260 seconds]
warpme_ has quit [Quit: Connection closed for inactivity]