<tomeu>
alyssa: I'm back to looking at why some tests fail when run in different order, and was wondering if you had thoughts on what gets rendered for dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.2d_rgb in https://people.collabora.com/~tomeu/TestResults-bad.xml
<tomeu>
alyssa: I know you don't trust fault_pointer, but for me it has always pointed to where the problem was
<tomeu>
wonder if there could be a dangling reference to the mali_sampler_descriptor
<tomeu>
oh, back to the sampler_count issue :p
<tomeu>
well, guess it's cool that the same difference appears in both t720 and t760 even if different stuff breaks
robmur01_ is now known as robmur01
megi has quit [Ping timeout: 276 seconds]
<tomeu>
alyssa: you may be able to reproduce those problems on t860 if you run these tests in this order: http://paste.debian.net/1120908/
<alyssa>
tomeu: For the dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.2d_rgb .xml you linked, the first thing I noticed is the issue is entirely along a triangle edge
<alyssa>
tomeu: That pointer to sampler_descriptor would be telling, yes.
<alyssa>
IME fault_pointer goes broadly to the right place, it just doesn't have the precision to make it overly useful.
<alyssa>
tomeu: Running those 6 tests in that order, they all pass here.
<tomeu>
damn
<tomeu>
alyssa: I'm looking at adding a --no-shuffle flag to the runner :/
<alyssa>
I mean
<alyssa>
Shuffle is a good thing if we don't break when we do it ....
<tomeu>
totally
<tomeu>
quite important for product-readiness, IMO
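One way to keep shuffling while still being able to replay a failing order is to seed the shuffle and log the seed. A minimal sketch, assuming a hypothetical helper `order_tests` (not part of the actual runner) whose `shuffle` parameter stands in for the proposed --no-shuffle flag:

```python
import random
from typing import List, Optional


def order_tests(names: List[str], seed: Optional[int], shuffle: bool = True) -> List[str]:
    """Return the test names in execution order (hypothetical helper).

    With shuffle=True the order is a permutation driven only by `seed`,
    so re-running with the same seed replays the same order.
    With shuffle=False the original order is kept unchanged.
    """
    ordered = list(names)  # copy so the caller's list is not mutated
    if shuffle:
        # A private Random instance avoids touching global random state.
        random.Random(seed).shuffle(ordered)
    return ordered
```

Passing the seed from a failed run back in reproduces that run's order without turning shuffling off everywhere.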
<tomeu>
alyssa: btw, wonder if we shouldn't be printing the whole cmdstream, without hiding stuff
<tomeu>
and then have tools that one can optionally run to hide stuff or detect weird situations, etc
<tomeu>
because I'm always afraid when hunting differences in the cmdstream of some being hidden
<alyssa>
tomeu: I mean ... the goal of the pandecode stuff is that it doesn't hide anything that's not already as expected
<alyssa>
If you want the verbosity, I mean, by all means we can add a verbose mode but :shrug:
<alyssa>
I suspect that has a subtle bug but I don't know what.
<tomeu>
alyssa: well, but our expectations may have bugs
<tomeu>
and then we don't have tools to debug those
<alyssa>
tomeu: If the code doing the expectation has a bug, then yes, that's a problem.
<tomeu>
that's what I'm afraid of right now
<alyssa>
If the expectation is just flat-out wrong, we do have a tool -- trace the blob; there should be no warnings/errors since that's ground-truth for us.
<tomeu>
but what if a difference between the blob and panfrost is hidden because there's more than one possible value that matches the expectations?
<alyssa>
tomeu: That's a bug in pandecode, then - expectations must be unambiguous.
<tomeu>
alyssa: why do you think that that patch has a bug?
<alyssa>
tomeu: It's right near when the flake starts coming, and everything else around it is unambiguously correct; that's the only one that's complex enough to cause issues I think
<alyssa>
But not sure.
<alyssa>
since that's coming up green in CI somehow.
<tomeu>
grr, when I was young, tests either passed or failed
<tomeu>
this flakes thing is pure nonsense
<alyssa>
tomeu: FWIW, even without expectation checking / hiding, pandecode is necessarily lossy.
<alyssa>
There's always the possibility that a struct is larger than we expect and there are fields after it that we just don't notice.
<alyssa>
The only way around that is to do a complete hexdump of the entirety of mapped memory.
<alyssa>
And -- to be clear -- that is *exactly* what we did in the early days of panfrost when we didn't know the sizes of anything.
guillaume_g has quit [Quit: Konversation terminated!]
<alyssa>
Nowadays that is ... not helpful.
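The "complete hexdump" fallback described above can be sketched as follows. This is only an illustration of dumping every byte with no decoding or hiding; `hexdump` and its parameters are hypothetical and are not pandecode's actual code:

```python
def hexdump(data: bytes, base: int = 0, width: int = 16) -> str:
    """Format every byte of `data`, `width` bytes per row, with no filtering.

    `base` is the address the buffer is assumed to be mapped at; it is
    only used to label each row.
    """
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        rows.append(f"{base + offset:08x}  {hex_part}")
    return "\n".join(rows)
```

Nothing is lost this way, including bytes past the end of a struct whose size was guessed wrong, but the output grows with the size of the mapping, which is the trade-off being discussed.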
robertfoss has quit [Ping timeout: 240 seconds]
robertfoss has joined #panfrost
<alyssa>
Oh come on.
<alyssa>
If I have just the FBD change, green.
<alyssa>
If I have just the stack size change, green.
<alyssa>
If I have both, flake.
<alyssa>
This is disturbing :|
enunes has quit [Read error: Connection reset by peer]
nerdboy has quit [Ping timeout: 268 seconds]
enunes has joined #panfrost
<alyssa>
but now
<alyssa>
uck.
enunes has quit [Read error: Connection reset by peer]
megi has joined #panfrost
enunes has joined #panfrost
<alyssa>
Now the flake seems to have gone away
<alyssa>
tomeu: I think one of your ubsan changes might've done it.
<alyssa>
Or dumb luck.
<alyssa>
....But then I had messed with the skips file, so that might be spurious. Uhm, rerunning more CI.
cowsay_ has joined #panfrost
cowsay has quit [Ping timeout: 276 seconds]
chewitt has quit [Read error: Connection reset by peer]
chewitt has joined #panfrost
<alyssa>
Meh.
<alyssa>
Code is landed, so current panfrost master should work with your favourite apps/games/whatever
<alyssa>
glamor, neverball, webgl stuff, etc. seem to work okay now.
<sravn>
alyssa: how far is panfrost from supporting Chromium? Or maybe it is Chromium that does not yet support panfrost?
<alyssa>
sravn: Not sure how much we're missing, iirc it was just buggier than I was comfortable with
<alyssa>
Firefox works wonderfully though!
<anarsoul>
alyssa: great!
<alyssa>
anarsoul: Don't say great yet, I broke CI thanks to a flake :V
<anarsoul>
what's with CI?
<alyssa>
test is failing in master but only sometimes
<anarsoul>
ah, the nastiest kind of failures
<sravn>
alyssa: thanks for the quick Chromium update. We have at work some HTML app thingy that today runs only on Chromium - on top of i.MX with etnaviv.
<sravn>
With all the wonderful work you and others do, we will have a much wider choice in the future.
<alyssa>
Oh!
<sravn>
That future may also bring another browser :-)
<anarsoul>
oh nooo
raster has quit [Quit: Gettin' stinky!]
<daniels>
alyssa: if you've found the flake, please push a commit to add it to the flake list with my A-b
stikonas has joined #panfrost
nerdboy has joined #panfrost
Lyude has quit [Quit: WeeChat 2.4]
Lyude has joined #panfrost
<alyssa>
daniels: It's.. not one flake..
<alyssa>
daniels: If I ban the flaking test, 1 other test flakes, etc
<alyssa>
if I ban the whole section, one test from the next section flakes
<daniels>
eugh ... state leaks then?
<alyssa>
Probably.
<daniels>
but if you write it off as a flake, it'll still be executed, right? so you eat the flakiness in one spot, and have time to hunt the leak later
<alyssa>
Flake list is stuff that's not executed
<alyssa>
Which is why it just moves the flake elsewhere
<alyssa>
So yeah, a state leak seems likely, but I'm not seeing anything
<alyssa>
Oh, here's a theory ... Maybe a BO is being used uninitialized
<alyssa>
so state is leaked that way
jschwart has joined #panfrost
<daniels>
ahh, I thought that flake still ran but ignored the result :\
<alyssa>
Nope
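The two readings of "flake list" in this exchange differ only in whether the listed test still executes. A minimal sketch, assuming a hypothetical `run_suite` function (not the real CI runner) where `execute_flakes` selects between the two policies:

```python
from typing import Callable, Dict, Set


def run_suite(
    tests: Dict[str, Callable[[], bool]],
    flakes: Set[str],
    execute_flakes: bool,
) -> Dict[str, str]:
    """Run tests in order and report a status per test (hypothetical runner).

    execute_flakes=False: listed tests are not executed at all ("skipped").
    execute_flakes=True:  listed tests still run, but their result is
                          reported as "ignored" instead of pass/fail.
    """
    results: Dict[str, str] = {}
    for name, test in tests.items():
        if name in flakes and not execute_flakes:
            results[name] = "skipped"  # the test body never runs
            continue
        passed = test()  # the test body runs, side effects included
        if name in flakes:
            results[name] = "ignored"
        else:
            results[name] = "pass" if passed else "fail"
    return results
```

Under the first policy the sequence of tests that actually execute changes, which fits the observation that banning one test just moves the flake to another; under the second the executed sequence stays the same and only the reporting changes.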
<alyssa>
Also I tried a whole series of fixes
<alyssa>
Still the one flake.
<alyssa>
And I can confirm this is nondeterministic
<alyssa>
(I have a green CI run)
<alyssa>
(Buried between red runs)
<daniels>
woop.
<alyssa>
:|
* alyssa
doesn't know what to do about the flake
abordado has joined #panfrost
<urjaman>
can you skip the one before the flaking one?
<urjaman>
or does that have no effect?
davidlt has quit [Ping timeout: 240 seconds]
jschwart has quit [Ping timeout: 245 seconds]
jschwart has joined #panfrost
warpme_ has quit [Quit: Connection closed for inactivity]