alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
yann has quit [Ping timeout: 240 seconds]
vstehle has quit [Ping timeout: 256 seconds]
Green has quit [Quit: ...]
davidlt has joined #panfrost
buzzmarshall has quit [Remote host closed the connection]
kinkinkijkin has joined #panfrost
Green has joined #panfrost
Green has quit [Quit: Ping timeout (120 seconds)]
Green has joined #panfrost
vstehle has joined #panfrost
rcf has quit [Quit: WeeChat 2.7]
rcf has joined #panfrost
rcf has quit [Client Quit]
rcf has joined #panfrost
Elpaulo has quit [Quit: Elpaulo]
nerdboy has joined #panfrost
icecream95 has quit [Quit: leaving]
kinkinkijkin has quit [Remote host closed the connection]
NeuroScr has quit [Ping timeout: 240 seconds]
NeuroScr has joined #panfrost
yann has joined #panfrost
yann has quit [Ping timeout: 272 seconds]
NeuroScr has quit [Read error: Connection reset by peer]
NeuroScr has joined #panfrost
raster has joined #panfrost
<la-s> alyssa: how would I debug bad GPU performance? I am using sway now, and though it works mostly great (the background has some graphical glitches), the performance is still not great, just as with weston.
<la-s> was thinking of trying to fix it myself
nerdboy has quit [Ping timeout: 264 seconds]
<tomeu> la-s: first step is figuring out if the bottleneck is cpu or gpu
<la-s> good point yeah
<la-s> should figure out how to profile sway
adjtm has joined #panfrost
Green has quit [Ping timeout: 256 seconds]
adjtm_ has quit [Ping timeout: 256 seconds]
Green has joined #panfrost
<tomeu> well, if it's gpu, then you can look at performance counters to figure out why
<tomeu> but if it's cpu, then something like perf top could give an indication quite quickly
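A minimal sketch of that first check, assuming sway is the compositor process, perf is installed, and an RK3399-style devfreq node at ff9a0000.gpu (the path is board-specific):

    # sample where CPU time is going in the compositor
    perf top -p "$(pidof sway)"
    # or record a profile for later inspection
    perf record -g -p "$(pidof sway)" -- sleep 10
    perf report
    # rough GPU-side hint: is the GPU already pinned at its maximum frequency?
    cat /sys/class/devfreq/ff9a0000.gpu/cur_freq /sys/class/devfreq/ff9a0000.gpu/max_freq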
<tomeu> alyssa: so the depth fbo is totally black
<tomeu> well, they are very different; not sure they are wrong because the format is different:
<tomeu> +rg32f varying_0.rrrr;
<tomeu> -rgba32f varying_0.rrrr;
<tomeu> actually, if we ignore the 3rd and 4th components, the output matches:
<tomeu> -<0.000000, 0.000000, 0.000000, 0.000000>
<tomeu> -<1.000000, 0.000000, 0.000000, 0.000000>
<tomeu> +<0.000000, 0.000000, 0.000000, 1.000000>
<tomeu> -<0.000000, 1.000000, 0.000000, 0.000000>
<tomeu> -<1.000000, 1.000000, 0.000000, 0.000000>
<tomeu> +<1.000000, 0.000000, 1.000000, 1.000000>
<tomeu> so I guess the suspect is the fragment job in the depth fbo job chain
<tomeu> the pos varyings in the color fbo job chain don't match either, but it seems like the drawing is y-flipped?
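A quick way to compare only the two components that rg32f actually carries, assuming the dumps are plain text with one <x, y, z, w> vector per line (blob.txt and mesa.txt are hypothetical file names):

    # strip everything past the first two components before diffing
    awk -F'[<,>]' '{printf "<%s,%s>\n", $2, $3}' blob.txt > blob.rg
    awk -F'[<,>]' '{printf "<%s,%s>\n", $2, $3}' mesa.txt > mesa.rg
    diff -u blob.rg mesa.rg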
adjtm_ has joined #panfrost
ente has quit [Read error: Connection reset by peer]
adjtm has quit [Ping timeout: 256 seconds]
Elpaulo has joined #panfrost
Elpaulo has quit [Quit: Elpaulo]
<alyssa> tomeu: mesa and blob are y-flipped from each other's perspective, that's normal
<alyssa> so yes, frag job I guess
thecycoone has quit [Ping timeout: 256 seconds]
buzzmarshall has joined #panfrost
mixfix41 has quit [Ping timeout: 260 seconds]
mixfix4one has quit [Ping timeout: 272 seconds]
bnieuwenhuizen has quit [Ping timeout: 260 seconds]
mixfix41 has joined #panfrost
mixfix41 has quit [Ping timeout: 264 seconds]
raster has quit [Quit: Gettin' stinky!]
raster has joined #panfrost
xdarklight has quit [Quit: ZNC - http://znc.in]
xdarklight has joined #panfrost
cwabbott has quit [Ping timeout: 252 seconds]
cwabbott has joined #panfrost
nerdboy has joined #panfrost
<alyssa> robher: I'm seeing some pretty serious regressions in 5.6 (from 5.4)
<alyssa> Easy reproduction: open weston and run glmark2-es2-wayland -bterrain
<alyssa> (Or even -bshadow)
<alyssa> Anything that uses FBOs is hosed.
<alyssa> Even glmark2-es2-drm -bterrain (w/o a display manager) reproduces.
yann has joined #panfrost
<alyssa> I've downgraded to 5.4 in the meantime.
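For reference, the reproduction described above boils down to this, assuming glmark2 is built with both the wayland and drm flavours:

    # under weston:
    glmark2-es2-wayland -bterrain   # or -bshadow
    # without any display manager at all, straight on the DRM device:
    glmark2-es2-drm -bterrain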
<urjaman> is that the same thing that i have with 5.7 (rc any) or less severe? (it complains a bunch, fails to reset the gpu, and eventually just kinda hangs the process doing the GPU stuff)
<urjaman> and yeah i jumped from 5.4 to 5.7rc, so it could have been introduced in 5.6 for all i know
<alyssa> urjaman: not sure, try the above repro (super obvious with weston)
<urjaman> ... i'ma build glmark2 then ...
<alyssa> urjaman: fair enough :p
<alyssa> it's a fast build, dw
<urjaman> yeah more surprised i havent used it before
<urjaman> i've legit just been super lazy since 5.4 works fine for me :P
<alyssa> relatable
<urjaman> umm i'll update mesa too first
<urjaman> i did a comparison test first, running -bterrain on weston, and got weston crashing after a few seconds, with a "pan_bo.c:176: pan_bucket_index: Assertion 'bucket_index >= MIN_BO_CACHE_BUCKET' failed" in the terminal
<urjaman> (comparison on 5.4 that is...)
<urjaman> i assume that's fixed already but like whoops
<urjaman> good idea to do a control test first :P
<alyssa> uhhhh
<urjaman> we'll see after some 800 build objects churned through by this lap warmer of a C201 :P
<urjaman> yep updated mesa, and this repro runs fine on 5.4, now to reboot into 5.7rcsomething ÖP
<urjaman> *:P
<alyssa> Nyoof
<urjaman> okay interesting ... it flickered white like a handful of times and dmesg shows a bunch of gpu sched timeouts and 2+ faults
<urjaman> actually two faults and one "There were multiple GPU faults - some have not been reported"
<alyssa> urjaman: That sounds about right
<alyssa> I mean wrong but
<urjaman> and now i need to ssh in to restart this thing because i tried to start my Xorg session (just to confirm it still fails the same way-ish i guess... yup it was laggy and then hung a bit after starting firefox, same as before)
<urjaman> ... i suppose that was pointless since the kernel doesnt manage to reboot from a "reboot -f" in this state
<Lyude> alyssa: sounds like it's time for a bisect?
<alyssa> I mean wrong but
<alyssa> uhm
<alyssa> silly arrow keys
<alyssa> Lyude: probably, yeah. though there haven't been many changes, so
<Lyude> might be a change outside of panfrost maybe
davidlt has quit [Ping timeout: 260 seconds]
<alyssa> Perhaps
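A rough outline of what that bisect would look like, assuming a kernel tree with v5.4 known good and v5.6 known bad, plus whatever cross-build and boot flow the board normally uses:

    git bisect start
    git bisect bad v5.6
    git bisect good v5.4
    # at each step: build, boot the board, run the glmark2 repro, then report
    git bisect good    # or: git bisect bad
    # when the offending commit is found:
    git bisect reset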
<robmur01> hmm, -ENOREPRO here: 5.4-rc7 and glmark2-es2-drm runs all the way through just fine
<bbrezillon> robmur01: the problem is on 5.6+
<robmur01> derp, that was supposed to say 5.7-rc4
<robmur01> been playing with Firefox under GDM with 5.6/5.7-rc with no issue either
<alyssa> Hmm
<alyssa> robmur01: is this real hw?
<robmur01> NanoPC-T4 (RK3399)
<alyssa> Alright.
<alyssa> 12 files changed, 260 insertions(+), 365 deletions(-)
<alyssa> I should maybe clean that up. Uhm.
<alyssa> Anyway, I have thousands of conformance fails to fix for fp16 now. tata :p
<urjaman> my kernel building process isnt really set up for bisecting :/
<urjaman> i guess i could set something up, but like that sounds like work
<urjaman> i guess i should check with 5.7-rc4 for completeness too (my last one was rc3)
<urjaman> but right now upgrading the Arch linux on my C201 (since i realized that was over a month old too)
<alyssa> Ahhh working in Weston feels so different after being in GNOME for so long
<urjaman> somehow the situation(TM) feels like time doesnt exist (and isnt really moving) but then suddenly you havent updated your linuces in a month+
<alyssa> urjaman: I had a terrible nightmare a few days ago where there was a worldwide pandemic
<urjaman> alyssa: how do you distinguish that from reality tho
<alyssa> I was asleep.
<urjaman> ah yeah that bit
<alyssa> fails.txt is 1519 lines long, wee. but just fixed a bunch
<alyssa> so down to 1438 :P
<alyssa> er 1133, one thing fixed a bunch
* alyssa under 1000 in her to-triage list, this is going faster than expected :~)
<bbrezillon> alyssa: same as robmur01, works fine here with mesa/master and linux/master (AKA 5.7-rc4)
<alyssa> bbrezillon: Maybe something was fixed between 5.6.1 and master?
<bbrezillon> I can test on 5.6.1
<alyssa> vmlinuz-5.6.0-1-arm64 from debian
<bbrezillon> ok, so 5.6
<bbrezillon> alyssa: and I did not test things extensively, just ran glmark2 under weston
<alyssa> bbrezillon: glmark2 -bterrain reproduced reliably
<alyssa> (-bbuild etc do not)
<bbrezillon> yep, -bterrain
<alyssa> OK
<bbrezillon> it works fine here
<bbrezillon> but that's a debug build
<bbrezillon> maybe it has an impact
<bbrezillon> (I mean mesa debug build)
<alyssa> Same
<alyssa> here
<bbrezillon> could also be a platform issue
<bbrezillon> I'm testing on a rockpi
<alyssa> Plausibly
<alyssa> kevin here
<bbrezillon> which doesn't have the same OPP
<bbrezillon> IIRC
<bbrezillon> alyssa: same result with 5.6.0
bnieuwenhuizen_ has joined #panfrost
bnieuwenhuizen_ is now known as bnieuwenhuizen
<alyssa> under 900 :)
<urjaman> i'm building a 5.7-rc4
raster has quit [Quit: Gettin' stinky!]
<alyssa> well, I've worked through my fails list
<alyssa> but -bterrain is still a bit broken
<alyssa> time to run through CI from scratch :>
<alyssa> also not sure why I'm not seeing a statistically significant fps difference with fp16 on glmark
<alyssa> I guess except for -bterrain, register pressure isn't the bottleneck since the other scenes are simple enough
<HdkR> Not bounded by ALU? :)
<alyssa> HdkR: Well, lower pressure ==> more threads in flight
<HdkR> Ah right
<alyssa> But if it's memory bound, well.
<HdkR> Sounds like we just need more SoCs with >100GB/s memory bandwidth
<robmur01> Oh FFS... how do we keep forgetting this? :P
<robmur01> what does -bterrain do? pretty much guarantee running at max OPP
<robmur01> what landed since 5.4? The generic OPP support that broke voltage scaling :(
<robmur01> default GPU voltage on my board seems to be nominally 1.0V, so probably close enough to the top OPP's 1.1V to squeak by
<robmur01> and more than enough for 600MHz and below
<alyssa> robmur01: sorry? :innocent:
<urjaman> oh i thought that was something that only applied to some other board
<urjaman> not to everything
<urjaman> (like yes i had read about it here but...)
<urjaman> (also, how many kernel versions you need to fix setting a voltage...............................)
<robmur01> urjaman: the default voltage (and thus how likely higher OPPs are to go wrong) is somewhat board-dependent
<robmur01> Chromebooks seem to hurt the most since they have a different regulator setup to most reference-design-based boards
<robmur01> as far as I've seen, fixing it has turned out to be really quite fiddly thanks to awkward interaction between the regulator and devfreq APIs, and both devfreq and/or explicit regulators being optional from our PoV
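One way to see what the GPU supply is actually doing, assuming debugfs is mounted (the regulator names and OPP entries are board-specific):

    # current voltage and constraints of every regulator, including the GPU supply
    sudo cat /sys/kernel/debug/regulator/regulator_summary
    # OPPs the kernel has registered for each device
    sudo ls /sys/kernel/debug/opp/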
<alyssa> Erg why is this test failing CI but passing local
<robmur01> "Continuous Instability"
<alyssa> >:D
<urjaman> that also applies to my experience with the kernel development process
<alyssa> Oh, joy - the behaviour changes with gles3 exposed
<alyssa> Okay, I see the problem. But making that test pass still doesn't fix -bterrain
icecream95 has joined #panfrost
<robmur01> does `echo 300000000 | sudo tee /sys/class/devfreq/ff9a0000.gpu/max_freq` fix it?
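If the clamp does help, it can be verified under load and lifted again afterwards (same RK3399 devfreq path; the 800 MHz ceiling is an assumption about the top OPP):

    # confirm the cap took effect while the benchmark runs
    cat /sys/class/devfreq/ff9a0000.gpu/cur_freq
    # restore the original ceiling afterwards
    echo 800000000 | sudo tee /sys/class/devfreq/ff9a0000.gpu/max_freq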
<icecream95> Speaking of things that got broken in the last few kernel releases, the microphone doesn't work anymore on c201 - it tries recording through the speaker instead
<alyssa> has it worked recently?
<alyssa> it's been broken on kevin since forever..
<icecream95> alyssa: I'm pretty sure it was working on 5.3, or at least 5.1
<alyssa> Neigh
<alyssa> (have you tried various alsa devices btw?)
<alyssa> still a bug but maybe a userspace workaround
<icecream95> I spent a while trying to change stuff in alsamixer, but didn't manage to fix it
<alyssa> meh
<alyssa> (also, same here for kevin but I digress)
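A couple of userspace checks that might narrow the microphone issue down, assuming the standard ALSA tools are installed (card/device numbers are guesses; adjust to what arecord -l reports):

    # list the capture devices the kernel exposes
    arecord -l
    # record from an explicit capture device instead of the default
    arecord -D hw:0,0 -f S16_LE -r 48000 -d 5 /tmp/mic-test.wav
    # dump all mixer controls, including capture routing, outside of alsamixer
    amixer -c 0 contents | less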