alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
raster has quit [Quit: Gettin' stinky!]
<icecream95> For my previous question about threading with LTO, I just changed the linker flag returned by meson:
<icecream95> sudo sed -i 's/flto/flto=4/' /path/to/mesonbuild/linkers.py
<icecream95> Let's see how much memory it ends up using...
stikonas has quit [Remote host closed the connection]
<tlwoerner> wow! my first time running glmark2 (-es2) on tinker with panfrost :-D
<tlwoerner> NICE! B)
<tlwoerner> it even did amazingly well with the "terrain" test, which is usually a killer
<tlwoerner> regular glmark2 segfaults?
<icecream95> glmark2 (not -es2) segfaults for me too (at least the last time I tried it...)
megi has quit [Ping timeout: 268 seconds]
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
nerdboy has quit [Ping timeout: 240 seconds]
vstehle has quit [Ping timeout: 240 seconds]
mearon has quit [Ping timeout: 245 seconds]
mearon has joined #panfrost
<alyssa> that sounds like a bug :v
* alyssa builds
<alyssa> tlwoerner: :-D
<alyssa> icecream95: ....Huh.
<tlwoerner> also, specifying the --annotate option to glmark2-es2 seems to do wonky things (on the tinker)
<alyssa> tlwoerner: Wonky how..?
<alyssa> Seems fine for me here on kevin
<tlwoerner> my screen goes blank, i just restart the xorg server to recover
<alyssa> *blinks*
<alyssa> That is wonky.
<alyssa> Looking at the segfault now
<alyssa> Looks like we're missing a gallium callback related to FBOs
<tlwoerner> hmmm... and, of course, it's doing something different now. when it happened (twice now, once in fullscreen, once not) the output on the console keeps going as if nothing is amiss, but the screen is blank
<alyssa> Welcome to my life!~
<tlwoerner> lol
<tlwoerner> meh, ignore that one; too wonky to bother chasing it down
<alyssa> #0 0x0000000000000000 in ?? ()
<alyssa> #1 0x0000aaaaaab15bc8 in DepthRenderTarget::setup (
<alyssa> Not helpful ... more things happen in between!
<tlwoerner> it's really nice seeing panfrost building "out of the box" and it's really nice fullscreen!
* alyssa has been using panfrost fulltime for two months now
<HdkR> nullptr? Sounds like it is a failure on the game's side
<HdkR> Probably exposed a new extension and the thing is assuming one feature also implies another
<alyssa> HdkR: In glGenFramebuffers() and glGenerateMipmap()
<alyssa> so no
<HdkR> ah, can't tell from that backtrace :P
* alyssa doesn't really understand mesa/st
<alyssa> `glGenerateMipmap` should not be crashing
<alyssa> _mesa_GenerateMipmap isn't being called?
<alyssa> This shouldn't be possible ...
<alyssa> Oh agh
<alyssa> HdkR: If I do MESA_GL_VERSION_OVERRIDE=3.0 it's fine.
<alyssa> 2.1 does the crashy path
<alyssa> Whaaat.
<HdkR> Fun stuff
<alyssa> tlwoerner: As a workaround, set MESA_GL_VERSION_OVERRIDE=3.0 and glmark2 is fine
<tlwoerner> thanks!
<alyssa> (looks like a bug in glmark)
<tlwoerner> in --fullscreen with either glmark2 or glmark2-es2 the "ideas" test is missing its shadows
* alyssa can't reproduce that here
<tlwoerner> ok
<tlwoerner> a couple of the mesa_demos programs produce an "illegal instruction" (sigkill), is that panfrost-related?
<tlwoerner> (oops, sigill)
<HdkR> disas $pc,+4?
<HdkR> Could be an assert being hit without a message
<HdkR> Which would be a `brk #1000` on AArch64
<tlwoerner> HdkR: this is rk3288
<tlwoerner> tinkerboard
<icecream95> While we're on the topic of bugs in glmark2, is anyone going to update the Wayland backend to work on sway?
<HdkR> ARMv7 would be bkpt then I think
<icecream95> I think glmark2 uses deprecated stuff in wayland that Sway has already removed
<icecream95> alyssa: I've been using panfrost fulltime for two months and two weeks now :)
<endrift> HdkR: ARMv5-ARMv7 have bkpt yes
<HdkR> endrift: Was a guess because I couldn't remember the exact instruction name :P
<icecream95> That's why you should create an ARM account under a fake company name (I chose G00gLe) so you can download the ARM arch reference manual. :)
<HdkR> You don't need an account anymore
<icecream95> At least for this one you do: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0406c/index.html
<urjaman> heh, i have DDI0406B in my "hard to get pdfs" folder
<urjaman> apparently they've made a new rev since...
vstehle has joined #panfrost
nerdboy has joined #panfrost
guillaume_g has joined #panfrost
davidlt has joined #panfrost
yann has quit [Ping timeout: 240 seconds]
<tomeu> bbrezillon: what do you think of these oopses? https://lava.collabora.co.uk/scheduler/job/2099293
yann has joined #panfrost
icecream95 has quit [Ping timeout: 268 seconds]
davidlt has quit [Ping timeout: 240 seconds]
<tomeu> bbrezillon: that's with 5.5-rc1 plus your patches
<tomeu> the results vary wildly between the machines: t820 just has the 31 expected fails, t720 has an extra 100 failures, t760 hangs as per above and t860 has a few hundreds extra failures
raster has joined #panfrost
<tomeu> one more run of what should have been the same, now with devfreq errors: https://lava.collabora.co.uk/scheduler/job/2099412
<bbrezillon> tomeu: looks like recursive memory corruption (or use-after-free) bugs to me
<tomeu> bbrezillon: do you think kasan would help here?
<bbrezillon> tomeu: it could
<bbrezillon> but reproducing locally would be better
<bbrezillon> debugging those issues through mesa CI sounds like a massive pain to me
<tomeu> yeah, I'm seeing similar issues here
<bbrezillon> tomeu: what's the first oops you get?
<tomeu> actually, locally I'm not seeing the oopses in the kernel side, just the devfreq warnings and the GPU-side faults
megi has joined #panfrost
<tomeu> will look at the devfreq thing first
Elpaulo has joined #panfrost
<tomeu> bbrezillon: the upside is that with the new runner, we'll have a good way of stress testing new kernels :)
<tomeu> need to remember to update it often though
karolherbst has quit [Ping timeout: 252 seconds]
<tomeu> bbrezillon: first oops with drm-misc/for-linux-next: http://paste.debian.net/1120468/
<tomeu> hrm, looks like dma_fence_set_error is being called with a wrong error
<tomeu> will check what
<tomeu> 0
karolherbst has joined #panfrost
<tomeu> bbrezillon: looks like that warn is harmless
<tomeu> oh, you already sent a patch for that!
Pak0st has joined #panfrost
<bbrezillon> tomeu: yep
<Pak0st> can the apitrace tool be built on the arm device itself? or does the tool require libunwind (which spews some compiler error on arm)?
<tomeu> bbrezillon: wonder what kind of race would manifest more often on machines with faster GPUs
<tomeu> bbrezillon: another interesting data point is that most of the time, the fault doesn't cause the test to fail
davidlt has joined #panfrost
<tomeu> also wonder if instead of overwritten BOs what I'm seeing isn't stale data on the GPU side
<tomeu> so unflushed caches or so
<bbrezillon> tomeu: so, you're now testing the close/open gpu context case, right?
<bbrezillon> did you make sure all jobs are done when the context (panfrost_file_priv) is closed?
<tomeu> yes
<tomeu> and no, let me do that now
<tomeu> bbrezillon: did I understand correctly from the other day that I would need to wait for panfrost_mmu.as_count to be zero?
Pak0st has quit [Remote host closed the connection]
Pak0st has joined #panfrost
<bbrezillon> tomeu: yep
<bbrezillon> tomeu: should be 0 after panfrost_job_close()
<bbrezillon> if not, we have a problem
<tomeu> ok, going with:
<tomeu> panfrost_job_close(panfrost_priv);
<tomeu> + WARN_ON(atomic_read(&panfrost_priv->mmu.as_count) > 0);
<tomeu> keep seeing faults, but that WARN isn't triggered
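The check above warns if the address-space use count is still positive after the jobs have been closed. A minimal userspace sketch of the same pattern follows, using C11 atomics. The names `fake_mmu`, `fake_mmu_get`, `fake_mmu_put` and `fake_mmu_still_in_use` are hypothetical and only illustrate the idea; this is not the kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-file MMU state; not the real kernel struct. */
struct fake_mmu {
    atomic_int as_count; /* number of in-flight users of the address space */
};

/* Take a reference before submitting work that uses the address space. */
static void fake_mmu_get(struct fake_mmu *mmu)
{
    atomic_fetch_add(&mmu->as_count, 1);
}

/* Drop the reference once that work has completed. */
static void fake_mmu_put(struct fake_mmu *mmu)
{
    atomic_fetch_sub(&mmu->as_count, 1);
}

/* Same shape as WARN_ON(atomic_read(...) > 0): true means a user is left. */
static bool fake_mmu_still_in_use(struct fake_mmu *mmu)
{
    return atomic_load(&mmu->as_count) > 0;
}
```

If every get is paired with a put before close, `fake_mmu_still_in_use()` returns false at close time, which matches the observation that the WARN does not trigger.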
<tomeu> bbrezillon: what do you think of the hypothesis of some caches in the GPU needing to be flushed?
<bbrezillon> tomeu: could be
<bbrezillon> tomeu: but I suspect we would have seen issues before if that was the case
<bbrezillon> tomeu: could also be the TLB that's not flushed when an AS becomes unused because its previous user has closed its FD
<bbrezillon> so maybe something to check in panfrost_mmu.c
Pak0st has quit [Remote host closed the connection]
abordado has joined #panfrost
<tomeu> robher: is it expected that https://gitlab.freedesktop.org/tomeu/linux/commit/d6ffdabdce55fc9b54f8f05a79a2de627eb1044d isn't in drm-misc/for-linux-next ?
<robmur01> tomeu: according to my tree that already landed way back in 5.4-rc6
<robmur01> 4cad2a574d
<tomeu> robmur01: hrm, through some other tree?
<tomeu> that drm-misc branch seems to be based on 5.4-rc4
<robmur01> gitk claims it's in drm-misc/drm-misc-fixes and drm-misc/for-linux-next-fixes
* robmur01 continues to find the drm-misc branch system horribly confusing
* tomeu as well
<tomeu> guess linux-next merges both for-linux-next and for-linux-next-fixes?
Pak0st has joined #panfrost
<tomeu> that seems to be the case
<tomeu> robher: do you have a branch with everything panfrost that is pending?
Pak0st has quit [Remote host closed the connection]
<tomeu> there isn't kasan for arm32 yet, but I'm going to try it on the h64
<tomeu> even if I don't see any kernel oopses there, only GPU faults
<tomeu> kasan isn't showing anything at all
<abordado> Hey, I submitted a Merge Request. Can I add a label to it, or does it have to be someone else?
<tomeu> abordado: I think you should be able to do so in the right side slide-in
<abordado> I don't see an edit button
<abordado> I see it for the merge request, but not on the sidebar
<tomeu> bbrezillon: 2019-12-10T13:32:00 [ 13.807043] panfrost ffa30000.gpu: AS_ACTIVE bit stuck :/
<tomeu> abordado: I see the labels in the right-side sidebar, and the possibility to edit them
<bbrezillon> tomeu: it would be interesting to disable per-FD AS and see if you still have this problem
<tomeu> let's see if I can reproduce it locally
<robmur01> abordado: tomeu: I believe labels might only be editable by those with "developer" status
abordado has quit [Read error: Connection reset by peer]
abordado has joined #panfrost
abordado has quit [Read error: Connection reset by peer]
abordado has joined #panfrost
abordado has quit [Read error: Connection reset by peer]
abordado has joined #panfrost
abordado_ has joined #panfrost
abordado_ has quit [Read error: Connection reset by peer]
megi has quit [Ping timeout: 250 seconds]
abordado has quit [Ping timeout: 265 seconds]
<alyssa> tlwoerner: I suppose that's an assert as HdkR mentioned, yes. I'm working on it :)
<alyssa> one fix just hit master last night, more to come
<robher> tomeu: That's always drm-misc-next plus the fixes branches. The problem is fixes (and Linus' tree) are merged into drm-misc-next somewhat ad hoc.
<alyssa> icecream95: Woo!
abordado has joined #panfrost
abordado has quit [Read error: Connection reset by peer]
abordado has joined #panfrost
<alyssa> abordado: Congrats on your first patch!
abordado has quit [Remote host closed the connection]
abordado has joined #panfrost
abordado has quit [Read error: Connection reset by peer]
<tlwoerner> it looks like glmark2 has a memory leak, it didn't survive an over-night test
abordado_ has joined #panfrost
<alyssa> Ruh roh.
<tlwoerner> it got oom'ed at some point
abordado_ has quit [Read error: Connection reset by peer]
abordado has joined #panfrost
<tlwoerner> alyssa: the illegal instruction?
<alyssa> Ye
<tlwoerner> is there anything i can do to help track it down?
<alyssa> I mean, a debug build would give you the assertions "properly"
<alyssa> Otherwise, just lmk which test you're looking at and I'll see what's broken
<alyssa> but if you build mesa from source, that should be debug by default
<tlwoerner> a debug build of mesa i assume?
<alyssa> Mm
<abordado> alyssa: Thanks!
<tlwoerner> yes, i use OpenEmbedded
<alyssa> abordado: Good work, here's to a great v2! Let me know if you have any questions!
<robmur01> out of curiosity, does GL have bitwise booleans like good ol' Visual Basic then? As a C person I see #2025 and I'm all "but... but... but !(1) == !(2)" :)
<alyssa> robmur01: Doesn't matter what GL does, so much as what our IR does
<alyssa> And in our IR, `->invert` just means "do a bitwise complement after"
<alyssa> so not inherently boolean at all
<alyssa> maybe I should reserve that for just booleans but it hasn't been necessary
<alyssa> (Note that Midgard has 0/~0 convention for booleans so that corresponds well)
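The difference between C's logical `!` and a bitwise complement, and why the complement still behaves like a boolean NOT under a 0/~0 convention, can be sketched in plain C. The helper names below are made up for illustration and are not part of Mesa or the Panfrost IR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Logical NOT, as in C's `!`: every nonzero input collapses to 0. */
static uint32_t logical_not(uint32_t x) { return !x; }

/* Bitwise complement: flips every bit, with no notion of truthiness. */
static uint32_t bitwise_not(uint32_t x) { return ~x; }

/* Encode a C truth value in the 0/~0 convention (all bits clear or all set). */
static uint32_t to_mask_bool(bool b) { return b ? UINT32_MAX : 0u; }

/* Decode a 0/~0 boolean back into a C truth value. */
static bool from_mask_bool(uint32_t m) { return m != 0u; }
```

With these, `logical_not(1) == logical_not(2)` holds while `bitwise_not(1) != bitwise_not(2)`, which is the "!(1) == !(2)" surprise. For values restricted to 0 and ~0, `bitwise_not(to_mask_bool(true))` equals `to_mask_bool(false)` and vice versa, so the complement acts as a boolean NOT there.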
<robmur01> thanks - as someone who has barely touched any high-level code in years and has essentially no background in graphics or compilers anyway, I'm finding it fascinating to watch
megi has joined #panfrost
<alyssa> Note that, while `->saturate`/etc is a real thing in the hardware that Mali can do, `->invert` is not
<alyssa> I just felt like adding the feature to the IR since it's convenient for a lot of opt passes
<alyssa> Usually can be optimized out, and if not, well we can do an inot instruction (actually, mali doesn't have an inot, but inor with #0 is canonically inot)
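The "inor with #0 is inot" remark follows from ~(a | 0) == ~a. A small C sketch, assuming 32-bit operands and using function names borrowed from the chat purely for illustration (this models the arithmetic, not the actual instruction encoding):

```c
#include <stdint.h>

/* Bitwise NOR of two 32-bit values. */
static uint32_t inor(uint32_t a, uint32_t b) { return ~(a | b); }

/* NOT expressed as NOR with a constant zero: ~(a | 0) == ~a. */
static uint32_t inot_via_inor(uint32_t a) { return inor(a, 0u); }
```

Applying `inot_via_inor` twice returns the original value, as expected of a bitwise NOT.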
abordado has quit [Quit: Leaving]
abordado has joined #panfrost
abordado_ has joined #panfrost
abordado has quit [Ping timeout: 250 seconds]
guillaume_g has quit [Quit: Konversation terminated!]
fysa has joined #panfrost
guillaume_g has joined #panfrost
fysa has quit [Ping timeout: 245 seconds]
<tomeu> bbrezillon: alyssa ok, looks like the issues I was chasing were due to state leaks that manifest with the mesa runner because it shuffles
<tomeu> and we seem to have some problems with jobs that time out
abordado_ has quit [Remote host closed the connection]
<alyssa> tomeu: Alright.
<alyssa> which specific issues are these?
<alyssa> The compiler fixes flake?
yann has quit [Ping timeout: 268 seconds]
karolherbst has quit [Ping timeout: 265 seconds]
nerdboy has quit [Ping timeout: 252 seconds]
raster has quit [Quit: Gettin' stinky!]
karolherbst has joined #panfrost
nerdboy has joined #panfrost
yann has joined #panfrost
indy has quit [Ping timeout: 240 seconds]
indy has joined #panfrost
stikonas has joined #panfrost
jschwart has joined #panfrost
guillaume_g has quit [Quit: Konversation terminated!]
davidlt has quit [Ping timeout: 276 seconds]
robertfoss has quit [Ping timeout: 276 seconds]
robertfoss has joined #panfrost
stikonas_ has joined #panfrost
stikonas has quit [Ping timeout: 246 seconds]
jolan has quit [Quit: leaving]
<steev> swinging back around.... reading back... spilling fixes... fixed enough to launch X with modesetting? or am i misreading
jolan has joined #panfrost
icecream95 has joined #panfrost
icecream95 has quit [Remote host closed the connection]
icecream95 has joined #panfrost
jschwart has quit [Ping timeout: 245 seconds]
raster has joined #panfrost
<tlwoerner> tunnel: ../mesa-19.2.4/src/gallium/drivers/panfrost/pan_context.c:292: translate_tex_wrap: Assertion `!"Invalid wrap"' failed.
<icecream95> Why does everyone else have problems with spilling on X, while I was using X with modesetting a month ago?
<steev> were you using the 5.4 kernel?
<icecream95> The 5.4 kernel didn't exist then, so I must have been using 5.3.
<steev> well that could be why
<icecream95> It looks like you can get a release build with assertions using `--buildtype=release -Db_ndebug=false`
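For context on why the build options matter: C's `assert()` is compiled out when `NDEBUG` is defined, and Meson's `b_ndebug` option controls whether that macro gets defined. The sketch below shows the `assert(!"message")` idiom seen in the `translate_tex_wrap` failure, plus a way to detect whether assertions are active. The enum and function names are hypothetical, not Mesa's.

```c
#include <assert.h>

/* Hypothetical wrap modes, for illustration only; not Mesa's real enums. */
enum demo_wrap { DEMO_WRAP_REPEAT, DEMO_WRAP_CLAMP };

/* Translate an API-level wrap mode to a made-up hardware value. */
static int demo_translate_wrap(enum demo_wrap w)
{
    switch (w) {
    case DEMO_WRAP_REPEAT: return 0x8;
    case DEMO_WRAP_CLAMP:  return 0x9;
    default:
        /* A string literal is a non-null pointer, so !"..." is always 0:
         * the assert fails and prints the message, unless NDEBUG is set. */
        assert(!"Invalid wrap");
        return 0;
    }
}

/* Returns 1 when assert() is active, 0 when NDEBUG compiled it out. */
static int assertions_enabled(void)
{
    int enabled = 0;
    assert((enabled = 1)); /* side effect only runs if assert is active */
    return enabled;
}
```

In a build where `NDEBUG` is defined, an unexpected value would fall through silently and return 0 instead of aborting with a readable message.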
<anarsoul> I doubt that spilling has to do anything with kernel :)
stikonas has joined #panfrost
nerdboy has quit [Ping timeout: 250 seconds]
raster has quit [Quit: Gettin' stinky!]