alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
stikonas has quit [Remote host closed the connection]
yann has quit [Ping timeout: 246 seconds]
kaspter has joined #panfrost
kaspter has quit [Ping timeout: 240 seconds]
kaspter has joined #panfrost
buzzmarshall has quit [Remote host closed the connection]
l-as has quit [Ping timeout: 246 seconds]
milkii has quit [Ping timeout: 272 seconds]
clementp[m] has quit [Remote host closed the connection]
nhp[m] has quit [Remote host closed the connection]
Ke has quit [Remote host closed the connection]
milkii has joined #panfrost
marcodiego has quit [Quit: Leaving]
clementp[m] has joined #panfrost
l-as has joined #panfrost
Ke has joined #panfrost
nhp[m] has joined #panfrost
davidlt has joined #panfrost
tomboy65 has quit [Read error: Connection reset by peer]
tomboy65 has joined #panfrost
tomboy65 has quit [Read error: Connection reset by peer]
tomboy65 has joined #panfrost
guillaume_g has joined #panfrost
<tomeu> bbrezillon: indeed ouch :(
kaspter has quit [Ping timeout: 240 seconds]
kaspter has joined #panfrost
warpme_ has quit [Quit: Connection closed for inactivity]
raster has joined #panfrost
<tomeu> bbrezillon: alyssa: one more instance: https://gitlab.freedesktop.org/jekstrand/mesa/-/jobs/4117639
<tomeu> alyssa: could it be related to a recent change? I haven't seen those in a long time
camus1 has joined #panfrost
kaspter has quit [Ping timeout: 246 seconds]
camus1 is now known as kaspter
camus1 has joined #panfrost
kaspter has quit [Ping timeout: 256 seconds]
camus1 is now known as kaspter
unoccupied has joined #panfrost
BenG83 has joined #panfrost
BenG83 has quit [Ping timeout: 246 seconds]
<bbrezillon> tomeu: had a quick look at the T76X HW issues defined in mali_kbase, and nothing related to AS_ACTIVE stuck popped up
<bbrezillon> but I wonder if we shouldn't reset the GPU when that happens
<tomeu> hmm, I thought that precisely happened when resetting the GPU
<tomeu> maybe we should reset harder when that happens
<bbrezillon> there doesn't seem to be a timeout prior to those "AS_ACTIVE bit stuck" messages in https://gitlab.freedesktop.org/bbrezillon/mesa/-/jobs/4125895
<bbrezillon> my guess is that the timeout happens because the MMU is stuck and the job wants to do a flush/invalidate
<bbrezillon> oh, and we also ignore the return of write_cmd(), meaning that the MMU command might be skipped entirely without ever blocking the rest of the submission
yann has joined #panfrost
clementp[m] has quit [Quit: killed]
Ke has quit [Quit: killed]
nhp[m] has quit [Quit: killed]
l-as has quit [Quit: killed]
clementp[m] has joined #panfrost
<tomeu> ah, that's bad in itself
<tomeu> so what we should do: propagate errors and don't try to submit a job if we weren't able to prepare its AS, and reset the whole GPU if the MMU appears stuck?
<bbrezillon> sounds like a sane approach
<tomeu> hmm, or maybe propagate errors so submission fails, and reset the GPU whenever that happens?
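The approach tomeu and bbrezillon converge on (stop ignoring the return of write_cmd(), refuse to submit a job whose address space could not be prepared, and reset the GPU when the MMU looks stuck) can be modelled roughly as below. This is a toy user-space sketch in Python, not the kernel driver: `FakeMMU`, `wait_ready`, `submit_job`, the `"FLUSH"` command string and the poll count are all hypothetical names chosen for illustration; only `write_cmd` and the AS_ACTIVE bit come from the discussion.

```python
import errno

AS_STATUS_ACTIVE = 0x1  # hypothetical value standing in for the "AS_ACTIVE" bit


class FakeMMU:
    """Toy stand-in for the MMU address-space registers (not real hardware)."""

    def __init__(self, stuck=False):
        self.stuck = stuck      # True models "AS_ACTIVE bit stuck"
        self.commands = []      # commands that actually reached the MMU
        self.resets = 0         # number of full resets requested

    def read_status(self):
        return AS_STATUS_ACTIVE if self.stuck else 0

    def reset(self):
        self.resets += 1
        self.stuck = False      # assumption: a full reset clears the stuck bit


def wait_ready(mmu, max_polls=1000):
    """Poll until AS_ACTIVE clears; return 0 or a negative errno on timeout."""
    for _ in range(max_polls):
        if not (mmu.read_status() & AS_STATUS_ACTIVE):
            return 0
    return -errno.ETIMEDOUT


def write_cmd(mmu, cmd):
    """Issue an MMU command, reporting failure instead of silently skipping it."""
    ret = wait_ready(mmu)
    if ret == 0:
        mmu.commands.append(cmd)
    return ret


def submit_job(mmu, job, submitted):
    """Propagate MMU errors: on failure, reset and do not submit the job."""
    ret = write_cmd(mmu, "FLUSH")
    if ret:
        mmu.reset()             # "reset the whole GPU if the MMU appears stuck"
        return ret
    submitted.append(job)
    return 0
```

In this model the first submission against a stuck MMU fails with a timeout and triggers a reset, and a later submission goes through normally. Whether the reset should happen inside the submit path or be scheduled elsewhere is a design question the sketch does not try to answer.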
l-as has joined #panfrost
Ke has joined #panfrost
nhp[m] has joined #panfrost
icecream95 has joined #panfrost
<icecream95> bbrezillon: tomeu: AS_ACTIVE got stuck three times on rk3288-veyron-jaq-cbg-0 this month but none on -1
<tomeu> hmm
<tomeu> and when did it happen for the first time?
<icecream95> I suspect that -0 needs more voltage when running the GPU at 600MHz than -1
<icecream95> The kernel used for CI still doesn't have dynamic voltage scaling, right?
<tomeu> not yet, indeed
<tomeu> could be that, let me check where the patches are
warpme_ has joined #panfrost
<icecream95> To confirm, try adding 'echo 600000000 >/sys/class/devfreq/*.gpu/min_freq' before dEQP runs and see if that causes -0 to fail
<icecream95> alyssa: I use zram for swap (zstd, currently 3.6G/5G used with 50% compression) and have never had OOM issues like you mention
<icecream95> I think this memory leak I reported 2 months ago is still at large: https://gitlab.freedesktop.org/snippets/1031
stikonas has joined #panfrost
<daniels> icecream95: that's a _really_ good spot, thank you! I know we've had issues with -0 and not -1 in the past; they should have identical firmware but even that shouldn't matter as the kernel sets up the whole clock tree; I wonder if it's having thermal issues, or if it's simply just a bit older and needs to be put out to pasture
<daniels> robmur01: could you please register an account on https://gitlab.freedesktop.org so I can harass you there? :)
<icecream95> daniels: Because dynamic voltage scaling isn't being used, the GPU is kept at the same low voltage the firmware sets it to. I think -1 can undervolt better, so still works at the low voltage, but -0 needs a higher voltage to be stable
<icecream95> Setting a maximum frequency of 400 MHz until voltage scaling arrives should make it more stable: echo 400000000 >/sys/class/devfreq/*.gpu/max_freq
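The two one-liners icecream95 suggests (pinning min_freq to 600 MHz to reproduce the failure, or capping max_freq at 400 MHz as a workaround) both write a value in Hz to every devfreq node matching `*.gpu`. A minimal Python sketch of the same operation follows, assuming the sysfs layout implied by those commands. The function name `set_gpu_freq_limit` and its `devfreq_root` parameter are hypothetical; the parameter only exists so the helper can be pointed at a scratch directory instead of the real /sys.

```python
import glob
import os


def set_gpu_freq_limit(hz, limit="max_freq", devfreq_root="/sys/class/devfreq"):
    """Write `hz` to <devfreq_root>/*.gpu/<limit>, like the shell one-liners.

    `limit` must be "min_freq" or "max_freq". Returns the list of files
    written, so an empty list means no GPU devfreq node was found.
    """
    if limit not in ("min_freq", "max_freq"):
        raise ValueError("limit must be 'min_freq' or 'max_freq'")

    written = []
    for dev in sorted(glob.glob(os.path.join(devfreq_root, "*.gpu"))):
        path = os.path.join(dev, limit)
        with open(path, "w") as f:
            f.write(f"{hz}\n")
        written.append(path)
    return written
```

On a real board this needs root, for example `set_gpu_freq_limit(400_000_000)` for the workaround or `set_gpu_freq_limit(600_000_000, limit="min_freq")` for the reproduction attempt before dEQP runs.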
icecream95 has quit [Ping timeout: 240 seconds]
<robmur01> daniels: you know I'm just the pagetable guy, right? :P
<daniels> think of it as a personal growth plan?
<daniels> (more seriously, does this mean I should be tagging stepri01 for non-MMU things?)
<robmur01> Why yes Office 365, the confirmation email most definitely deserves to be quarantined as a phishing attempt. Sigh...
<daniels> Office365 is generally pretty skeptical of fd.o due to the volume of spam which comes through Mailman
<robmur01> daniels: technically Steve and RobH are more officially involved than I am
<daniels> sure :)
<robmur01> I'm mostly squeezing it under my general "upstream kernel support" remit because it's more fun and interesting than reviewing SMMU patches ;)
<robmur01> anyway, I'm in - usual work username because laziness
davidlt has quit [Ping timeout: 246 seconds]
nlhowell has quit [Ping timeout: 246 seconds]
davidlt has joined #panfrost
<alyssa> tomeu: bbrezillon: I first saw that with the genxml attribute/varying series but I couldn't bisect it since nondeterminism and nothing stood out as wrong so I thought it was a fluke..
<alyssa> "think of it as a personal growth plan?" lol
<alyssa> robmur01: the trick is to just quarantine EVERYTHING, as 2020 has taught us :p
<alyssa> icecream95: hm, interesting. It's certainly a lot better on 4gb than 2gb as mentioned. I also run without swap at all since I'm stubborn, so that isn't helping :)
nlhowell has joined #panfrost
<alyssa> icecream95: Oh, TIL about heaptrack, neat!
<alyssa> seems a lot more pleasant to use for leaks than valgrind :)
<bbrezillon> tomeu: is it caused by my patch?
<bbrezillon> do you have a branch I can look at?
<tomeu> wonder if any of the panfrost patches I backported depend on stuff outside of panfrost
<tomeu> trying now with drm-misc-next
nlhowell has quit [Ping timeout: 246 seconds]
BenG83 has joined #panfrost
raster has quit [Remote host closed the connection]
raster has joined #panfrost
kaspter has quit [Quit: kaspter]
<macc24> does panfrost support S3TC textures?
nlhowell has joined #panfrost
davidlt has quit [Remote host closed the connection]
davidlt has joined #panfrost
tgall_foo has quit [Quit: Textual IRC Client: www.textualapp.com]
<alyssa> macc24: yes, if your device supports it
<alyssa> rk3399 does, rk3288 does not
<macc24> :(
<macc24> alyssa: can those textures be implemented on rk3288 in software?
<urjaman> i think mesa provides a software implementation ... or was it for some other required format?
<macc24> when i try to run anything that needs s3tc it complains that it is missing
<urjaman> okay was something else then (or related to details about when it is mandatory...)
tgall_foo has joined #panfrost
<macc24> oops
raster has quit [Quit: Gettin' stinky!]
davidlt has quit [Read error: Connection reset by peer]
raster has joined #panfrost
davidlt has joined #panfrost
guillaume_g has quit [Quit: Konversation terminated!]
Elpaulo has quit [Read error: Connection reset by peer]
Elpaulo has joined #panfrost
BenG83 has quit [Ping timeout: 246 seconds]
<HdkR> alyssa: Can confirm, heaptrack is great
<HdkR> Really helped me smash down small allocations
<alyssa> HdkR: :D
<HdkR> TFW hunting a SIGBUS in an application that catches SIGBUS
raster has quit [Remote host closed the connection]
<alyssa> ;-;
<HdkR> What's even more fun is that it seems to be a SIGBUS that my SIGBUS handler just doesn't catch ¯\_(ツ)_/¯
<HdkR> Oh frick frack, I missed a commit, so sigprocmask was killing it :|
<urjaman> catching SIGBUS sounds oddly like you're talking public transit :P
<HdkR> I tried to catch the SIGBUS but it turns out it was SIGILL
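HdkR's sigprocmask remark above comes down to this: a handler never runs while its signal is blocked in the signal mask. A small Python sketch of that effect follows, assuming a Unix system and a single-threaded script run from the main thread. It sends SIGBUS to itself with `os.kill`, so the signal simply stays pending while blocked. A SIGBUS raised by an actual faulting memory access while blocked is a different case, and on Linux it is expected to terminate the process rather than wait, which would match the behaviour described.

```python
import os
import signal

caught = []


def on_sigbus(signum, frame):
    # Record that the handler actually ran.
    caught.append(signum)


signal.signal(signal.SIGBUS, on_sigbus)

# Block SIGBUS, the way a stray sigprocmask() call would.
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGBUS})

# Send ourselves a SIGBUS. It is queued as pending; the handler does not run.
os.kill(os.getpid(), signal.SIGBUS)
pending_while_blocked = signal.SIGBUS in signal.sigpending()
caught_while_blocked = list(caught)

# Unblock it again: the pending signal is delivered and the handler runs.
signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGBUS})
```

After the script runs, `caught_while_blocked` is empty while `caught` holds the one delivered signal, so a handler that "just doesn't catch" anything is worth checking against the current signal mask first.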
gcl_ has joined #panfrost
gcl has quit [Ping timeout: 240 seconds]
gcl_ has quit [Ping timeout: 246 seconds]
<HdkR> Oh, Valhal device is arriving today
<HdkR> Valhall even
gcl has joined #panfrost
<Lyude> HdkR: see you soon in Valhall[a]
<HdkR> :P
davidlt has quit [Ping timeout: 240 seconds]
stikonas has quit [Remote host closed the connection]
davidlt has joined #panfrost
jgmdev has joined #panfrost
jgmdev has quit [Client Quit]
AreaScout_ has quit [Ping timeout: 240 seconds]
ezequielg has quit [Read error: Connection reset by peer]
enunes has quit [Read error: Connection reset by peer]
enunes has joined #panfrost
ezequielg has joined #panfrost
enunes has quit [Ping timeout: 240 seconds]
davidlt has quit [Ping timeout: 256 seconds]
stikonas has joined #panfrost
enunes has joined #panfrost
buzzmarshall has joined #panfrost
<Lyude> tomeu: do you have any idea how the panfrost tests in IGT get built for autotools? I thought this would be more obvious but I don't see anything listed in tests/Makefile.sources
<alyssa> Lyude: *distant voice* they don't
<Lyude> alyssa: figured it might be something like that, I'm just a little surprised because it seems like something we test in CI according to the gitlab pipeline from here: https://patchwork.freedesktop.org/series/74811/
<Lyude> oh wait
<Lyude> duh, it says right there | grep -v vc4\|v4d\|panfrost
* alyssa shrugs
* Lyude has answered her question :), will just make nouveau exempt from that check as well
<HdkR> Valhall is here
<alyssa> HdkR: Woo!
<alyssa> what about godot
<HdkR> pfft
<HdkR> We seem to have a fun person over here
<alyssa> Who, Godot?
enunes has quit [Quit: ZNC - https://znc.in]
<HdkR> :>
<HdkR> Oh wow, it didn't even have the typical setup process on it
<HdkR> Now let's see if I can remember how to build something using the android NDK
<alyssa> :d
<HdkR> There we go, did it
* HdkR generates new es2_info
BenG83 has joined #panfrost
BenG83 has quit [Quit: Leaving]
<HdkR> There we go. G77 es2_info
<HdkR> and of course my desktop would lock up right when I send that
<alyssa> and it's a he! hi! coming down the plains!
<HdkR> lol
<HdkR> TFW a USB hub causes a kernel panic
<alyssa> Android called?
<alyssa> (TFW video conferencing causes a kernel panic, my laptop called)
<HdkR> There's a couple of devices in a chain, hard to know exactly which one caused it
<HdkR> I blame the final hub, it has had issues in the past
* HdkR orders a new less derpy hub
<alyssa> muffin?
<HdkR> zucchini cake
<HdkR> Now where did I put that triangle drawing code
<HdkR> Alright, now where is pantrace
<alyssa> I ate it
<alyssa> ("the muffin?" "no, pantrace.")
<HdkR> Hm, looks like panwrap is looping on its injection points?
<HdkR> Hm, what is /vendor/etc/meow.cfg
<urjaman> lmao
<HdkR> Hmmmm, why is it opening `/proc/getppid()/cmdline` a dozen times and pushing ioctls through that
<HdkR> Things are happening here
austriancoder has quit [Ping timeout: 244 seconds]
lvrp16 has quit [Read error: Connection reset by peer]
austriancoder has joined #panfrost
lvrp16 has joined #panfrost
<HdkR> Yea, definitely missing files being opened
<HdkR> FDs jump from 6 to 12 without catching what is opened in between
<HdkR> If this was Linux I could just strace
<alyssa> HdkR: meow.cfg, beautiful
<HdkR> :D
<HdkR> LD_PRELOAD is behaving a bit weirdly as well. constructor is running after a few logs are being pushed through...?
raster has joined #panfrost
macc24_ has joined #panfrost
macc24_ has quit [Client Quit]
yann has quit [Ping timeout: 256 seconds]
yann has joined #panfrost