alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
warpme_ has quit [Quit: Connection closed for inactivity]
raster has quit [Quit: Gettin' stinky!]
archetech has quit [Quit: Leaving]
stikonas has quit [Remote host closed the connection]
vstehle has quit [Ping timeout: 240 seconds]
kaspter has quit [Ping timeout: 265 seconds]
kaspter has joined #panfrost
icecream95 has joined #panfrost
<icecream95> alyssa: Just borrow a Threadripper and bisect the kernel between 5.8 and 5.9 to find the leak
<dstzd> i happen to have a 3900x i could attach over a wireguard tunnel.....
<dstzd> this leak makes me sad
<icecream95> dstzd: Thanks for volunteering to do the bisecting
<dstzd> lol
<dstzd> it was volunteered as a distcc
<dstzd> i don't know the first thing about bisecting 😕
<icecream95> dstzd: `man git-bisect`
icecream95 has quit [Quit: leaving]
davidlt has joined #panfrost
chewitt has quit [Quit: Adios!]
archetech has joined #panfrost
kaspter has quit [Ping timeout: 272 seconds]
kaspter has joined #panfrost
davidlt has quit [Ping timeout: 260 seconds]
camus1 has joined #panfrost
kaspter has quit [Ping timeout: 260 seconds]
camus1 is now known as kaspter
camus1 has joined #panfrost
kaspter has quit [Ping timeout: 240 seconds]
camus1 is now known as kaspter
vstehle has joined #panfrost
felipealmeida has quit [Ping timeout: 246 seconds]
felipealmeida has joined #panfrost
camus1 has joined #panfrost
kaspter has quit [Ping timeout: 260 seconds]
camus1 is now known as kaspter
<archetech> and the odroid crowd goes wild with the 5.10rc1 release
<HdkR> oh?
<HdkR> Something special about that RC?
<archetech> has the pfrost patches in it for the ones who've waited vs patched
<HdkR> ah
nlhowell has quit [Ping timeout: 265 seconds]
icecream95 has joined #panfrost
<endrift> oh right linux 5.9 came out already
Elpaulo has quit [Remote host closed the connection]
Elpaulo has joined #panfrost
nlhowell has joined #panfrost
camus1 has joined #panfrost
kaspter has quit [Ping timeout: 256 seconds]
camus1 is now known as kaspter
hoddblaegg has joined #panfrost
davidlt has joined #panfrost
raster has joined #panfrost
archetech has quit [Quit: Leaving]
archetech has joined #panfrost
chewitt has joined #panfrost
kaspter has quit [Ping timeout: 260 seconds]
kaspter has joined #panfrost
icecream95 has quit [Ping timeout: 265 seconds]
archetech_n2 has joined #panfrost
stikonas has joined #panfrost
alpernebbi has joined #panfrost
archetech_n2 has quit [Quit: Konversation terminated!]
<dstzd> bisect done.... https://bpa.st/2NSQ i lost confidence that i got it right after i skipped a commit: the panic i got there was not the same as the leak, so i wasn't sure how to mark it .... but i definitely narrowed down the panic and possibly the leak too? idk... maybe the narrowing can at least help someone.
<dstzd> i'm also not sure my reproduction method of furiously scrolling around in evince was a sure enough thing.... but if someone wants to give me more direction i'm open to it.
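For readers unfamiliar with bisecting, the following is a minimal Python sketch of the search that `git bisect` performs, including what a `skip` does to the result. It is a toy model, not git's implementation: the function name `bisect`, the commit labels, and the `test` callback are all hypothetical, and it assumes the history is linear with a single good-to-bad transition. When an untestable commit sits next to the transition, the result is a range of candidates rather than one commit, which matches the uncertainty described above.

```python
from typing import Callable, List

GOOD, BAD, SKIP = "good", "bad", "skip"


def bisect(commits: List[str], test: Callable[[str], str]) -> List[str]:
    """Return the candidate first-bad commits.

    `commits` is ordered oldest to newest. Assumption: commits[0] is known
    good, commits[-1] is known bad, and there is exactly one transition.
    """
    lo, hi = 0, len(commits) - 1  # lo: newest known good, hi: oldest known bad
    skipped = set()
    while True:
        # Commits strictly between the known-good and known-bad ends
        # that have not been skipped.
        untested = [i for i in range(lo + 1, hi) if i not in skipped]
        if not untested:
            # Whatever remains is either the first bad commit or was skipped.
            return commits[lo + 1 : hi + 1]
        mid = (lo + hi) // 2
        # Test the untested commit closest to the midpoint.
        pick = min(untested, key=lambda i: abs(i - mid))
        verdict = test(commits[pick])
        if verdict == GOOD:
            lo = pick
        elif verdict == BAD:
            hi = pick
        else:
            skipped.add(pick)
```

In this model, marking a commit that fails in a different way (for example a panic unrelated to the leak) as `skip` rather than `bad` keeps the search honest at the cost of a wider final range.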
yann|work has joined #panfrost
yann has quit [Ping timeout: 260 seconds]
warpme_ has joined #panfrost
kaspter has quit [Ping timeout: 260 seconds]
kaspter has joined #panfrost
<warpme_> guys: i'm testing current master with drm_prime EGL_LINUX_DMA_BUF_EXT, and on panfrost mesa gives a kernel "execute from non-executable memory" error at mediaplayer exit. It is on t820/g31. Mali/vc4 with exactly the same sw stack exits ok, so the issue is panfrost related and seen only when EGL_LINUX_DMA_BUF_EXT is used. kernel dmesg says the following:
<warpme_> is this known issue?
<HdkR> Mali/vc4?
<warpme_> HdkR: bcm2835 vc4 and mali450mp4
<archetech> those are tot diff chips
<archetech> neither use pnafrost
<archetech> pan
<HdkR> Was weird since t820 and g31 is also a Mali :P
<warpme_> exactly. by this i want to say that v4l2 m2m decode with rendering by drm prime via EGL_LINUX_DMA_BUF_EXT root cause and exactly the same code patch works ok on vc4+mali450 and has an issue on t820/g31 (panfrost)
<warpme_> root cause-> is not riot cause
<archetech> yeah 450 uses lima drv pfrost is in flux
<warpme_> geez... this spellchecker kills me
<HdkR> Weird that the kernel is jumping to non-executable code page. Assumption being broken somewhere
<robmur01> warpme_: which of those SoCs are using the same video codec drivers? Seeing vb2_common_vm_close in the LR as the most recent call (and thus the likely culprit for trashing the stack) makes me wonder if the difference in GPU IP is just coincidental with a difference in video IP...
<robmur01> probably thinks it's calling vb2_vmarea_handler::put, but the VMA's vm_private_data was actually something else entirely
<warpme_> robmur01: well - video IP is the same for affected SoC (in a sense common kernel module using different firmwares) - but i have 3rd SoC which has also the same video IP (s905) and it works Ok - but it uses lima - not panfrost. So 1:1 mapping between failing / working SoC is: panfrost on s912/sm1 (nok) and lima on s905 + vc4 on brcm2835 (ok).
* robmur01 doesn't really know how dma-buf works, but wonders if the buffer somehow gets mmap'ed through both VB2 and DRM such that one manages to trample the other's private data
<warpme_> robmur01: i decided to put issue question here only because the same code works ok on lima/s905 + vc4/brcm2835. By this i'm assuming dmabuf/EGL_LINUX_DMA_BUF_EXT part is ok.....
<robmur01> sure - if the other variables are sufficiently close then the question is what could panfrost possibly do that apparently lima doesn't, to make vb2 code crash?
<robmur01> the symptom looks to be a VMA originally mapped via vb2_*_mmap() (for vb2_common_vm_ops to be in place) but whose private data has somehow been stomped in the meantime
<robmur01> I refuse to believe that a wild write from the GPU itself puts a kernel address pointing to another reasonable-looking kernel address into a random kernel data structure, reproducibly ;)
<robmur01> the only other obvious assignments I can see to vma->vm_private_data are in core DRM and dma-buf code :/
gcl has joined #panfrost
<robmur01> drm_gem_mmap_obj() in particular looks suspect (as most others rewrite the ops at the same time)
<narmstrong> The panfrost iommu setup of this imported buffer could be faulty ?
gcl_ has quit [Ping timeout: 258 seconds]
alpernebbi has quit [Quit: alpernebbi]
hoddblaegg has quit [Quit: Konversation terminated!]
gcl has quit [Ping timeout: 256 seconds]
gcl_ has joined #panfrost
<warpme_> and I remember it was working ok in past.....
<robmur01> narmstrong: I highly doubt it, per 12:33:14
<alyssa> robmur01: fun fact: rust makes it impossible for memory addresses to be corrupted by coprocessors 🙃
<alyssa> s/
<raster> \o
gcl has joined #panfrost
<kinkinkijkin> ah yes, the good ol "all outside changes are corruption" approach
<kinkinkijkin> of course, if you have more than one register on your cpu then multitasking is actually corruption
gcl_ has quit [Ping timeout: 256 seconds]
<robmur01> I'll save my trust for a language whose name isn't literally 'broken trust'
<kinkinkijkin> did you know that using mapped memory is an inherently corruptive task? at least that's what [insert modern compiler here] tells me
<raster> asan?
<narmstrong> warpme_: did you try with older kernels ?
<kinkinkijkin> also filesystems? pff, you mean corruption blocks
<archetech> brust?
<chewitt> kinkinkijkin can you share your xu4 kernel defconfig .. I'm trying to spot what changed between 5.7 and 5.9
<chewitt> panfrost appears to load okay, but I am missing something in/amongst V4L2 stuff I think
<kinkinkijkin> im just using armbian's 5.4-x-odroidxu4 as it comes
<chewitt> 5.4?
<kinkinkijkin> and 5.8 crashes on low load consistently
<kinkinkijkin> which is why i backed out to 5.4
<chewitt> and @Igorpec was using my 5.7 branch for experiments :)
<chewitt> I have to go find their 5.8 .. one step closer to 5.9
<archetech> try 5.10rc1
<kinkinkijkin> if i had a source to a working patched 5.7 i'd switch up immediately, i don't want to test on 5.4 anymore
<chewitt> it was using kbase/blob not panfrost tho
<chewitt> I have drivers and such to go with it
<warpme_> guys: re: kernel execute from non-executable memory on panfrost: I have it at mediaplayer exit. Start and playback is ok. Exit is issue.
<robmur01> I'd be a little worried if you were calling munmap() *during* playback ;)
jernej has quit [Read error: Connection reset by peer]
jernej_ has joined #panfrost
jernej_ is now known as jernej
<robmur01> yeah, AFAICS it looks possible to get into that state via a sequence of drm_gem_mmap_obj->drm_gem_shmem_mmap->dma_buf_mmap->vb2_*_dmabuf_ops_mmap
stikonas has quit [Ping timeout: 272 seconds]
<robmur01> vma->vm_ops = vb2_common_vm_ops but then vma->vm_private_data is stomped with a drm_gem_object
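The state described above can be illustrated with a small Python toy model. This is only a sketch of the type confusion, not kernel code: the classes and functions below are hypothetical stand-ins that borrow names from the discussion, and the "safe" ordering is just one arrangement that avoids the problem in this model, not a statement about how the kernel was or should be fixed. In the model, the close callback belongs to one subsystem while the private data has been overwritten by another, so the callback ends up calling something that does not exist; in Python that surfaces as an `AttributeError`, where the kernel instead jumps to an address that is not code.

```python
class Vb2VmareaHandler:
    """Stand-in for the private data the vb2 close callback expects."""

    def __init__(self):
        self.put_calls = 0

    def put(self):
        self.put_calls += 1


class GemObject:
    """Stand-in for a DRM GEM object; note that it has no put() callback."""

    def __init__(self, name):
        self.name = name


class Vma:
    """Stand-in for a VMA: a table of callbacks plus an opaque pointer."""

    def __init__(self):
        self.vm_ops = None
        self.vm_private_data = None


def vb2_common_vm_close(vma):
    # Trusts that vm_private_data is still the handler vb2 installed.
    vma.vm_private_data.put()


def vb2_mmap(vma, handler):
    # The exporter installs both its callbacks and its private data.
    vma.vm_ops = {"close": vb2_common_vm_close}
    vma.vm_private_data = handler


def gem_mmap_buggy(vma, gem, exporter_mmap):
    # Hypothetical ordering: forward to the exporter, then overwrite the
    # private data while leaving the exporter's callbacks in place.
    exporter_mmap(vma)
    vma.vm_private_data = gem


def gem_mmap_safe(vma, gem, exporter_mmap):
    # Hypothetical ordering: the exporter writes last, so callbacks and
    # private data always come from the same owner.
    vma.vm_private_data = gem
    exporter_mmap(vma)


def munmap(vma):
    vma.vm_ops["close"](vma)
```

The model also matches the observation that start and playback are fine and only exit fails: nothing touches the mismatched pointer until the mapping is torn down.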
<robmur01> so how does that manage to *not* happen on S905 if everything else is nominally the same? :/
dstzd has quit [Read error: Connection reset by peer]
<robmur01> panfrost, lima and vc4 all use drm_gem_shmem_mmap...
<robmur01> anyway, that's about as far as I can get - time to get back to 'real' work :(
<robmur01> might be worth a mail to dri-devel and linux-media for more knowledgeable folk to dig into
<warpme_> narmstrong: let me rebuild with 5.8.....
stikonas has joined #panfrost
camus1 has joined #panfrost
kaspter has quit [Ping timeout: 240 seconds]
camus1 is now known as kaspter
<robmur01> if 5.8 is OK, 26d3ac3cb04d smells relevant
felipealmeida has quit [Ping timeout: 260 seconds]
felipealmeida has joined #panfrost
<macc24> i run 5.8.1 on my daily driver laptop and it's 100% fine on t760
kaspter has quit [Quit: kaspter]
<warpme_> re: kernel execute from non-executable memory on panfrost: 5.8.15 works ok.... interesting
gcl_ has joined #panfrost
gcl has quit [Ping timeout: 265 seconds]
gcl has joined #panfrost
gcl_ has quit [Ping timeout: 260 seconds]
<chewitt> kmscube is running on t628 at the first attempt
<chewitt> sorry.. second attempt
<chewitt> first attempt locked the OS :)
<rtp> chewitt: last time i tried on my t628 device kmscube was working and it was long time ago. gnome-shell was broken iirc
stikonas_ has joined #panfrost
stikonas has quit [Ping timeout: 240 seconds]
<chewitt> I got the kodi home screen up for ~5 secs, then it locked up :)
<chewitt> this is still progress since I tried things in the summer with 5.7 kernel
<rtp> oh, funny. According to the last commit in the mesa snapshot I have, my last try was nearly one year ago (few days off).
<narmstrong> warpme_: can you try reverting 26d3ac3cb04d on 5.9 ? maybe bisect between 5.8 and 5.9 if it doesn't fix anything...
<warpme_> narmstrong: yes. already building 5.9.1 with reverted 26d3ac3cb04d...
<robmur01> note that the commit log alludes to some cleanup, which I'm guessing 7d2cd72a9aa3 and 526408357318 subsequently implement, so that revert alone might go wrong in a different way
gcl_ has joined #panfrost
gcl has quit [Ping timeout: 240 seconds]
gcl_ has quit [Ping timeout: 260 seconds]
gcl has joined #panfrost
cphealy_ has joined #panfrost
davidlt has quit [Quit: Leaving]
stikonas_ is now known as stikonas
rando25892 has quit [Ping timeout: 265 seconds]
<archetech> see how this does
<archetech> manj-n2 Kernel: 5.10.0-rc1-ARCH aarch64 bits: 64 Console: tty 0 Distro: Manjaro ARM
<archetech> Machine: Type: ARM Device System: Hardkernel ODROID-N2Plus
<alyssa> 🙉
<chewitt> been there, done that :)
<alyssa> wha
<archetech> manjaro dont have this kern yet its custom
davidlt has joined #panfrost
<chewitt> the manjaro dev who does amlogic normally uses my kernel sources, and I didn't push 5.10.y public yet
<archetech> strit?
<archetech> he tried to build it, guess there's an issue
<narmstrong> nice to see my email patches flow to chewitt's tree to Armbian and Manjaro trees :-)
<chewitt> spikerguy is the one who picks my stuff (which I pick from patchwork)
<archetech> and next is Linux From Scratch, which is where this will get copied to. yes, you folks have lots of downhill effect
archetech has quit [Quit: Konversation terminated!]
<chewitt> alyssa: any suggestions for things that might make t628 less insta-crashy?
<chewitt> or how to get dumps/traces or other cruft that might be useful in figuring out why it's crashy
archetech has joined #panfrost
cphealy_ has quit [Remote host closed the connection]
<warpme_> narmstrong: robmur01: re: kernel execute from non-executable memory on panfrost: reverting 26d3ac3cb04d + 7d2cd72a9aa3 and 526408357318 + manual fix for compile on 5.9.1 solves issue. So it looks like 26d3ac3cb04d is regression on 5.9. panfrost bits: sorry :-)
<warpme_> should i call dri-devel?
<robmur01> yeah, it definitely needs fixing - even if your userspace were doing something completely wacky and unexpected with mmaps, it still shouldn't be able to trip the kernel up that badly
<alyssa> 👻
<warpme_> alyssa: btw: v.nice work with recent bifrost: it is the first time i no longer see a total mess on screen when a pop-up stacking window appears during my appliance's online updates. nice work!
<alyssa> warpme_: that was bbrezillon :)
<alyssa> (I did help, I guess)
<warpme_> oooops. sorry! bbrezillon: qll work!
<kinkinkijkin> apparently since i got covid i can use a specific social support i have access to for portable computing devices, scanners, and printers, possibly desks
<kinkinkijkin> is there a laptop/chromebook that you guys would like some testing for which would be available in canada?
<kinkinkijkin> i will be getting an arm device anyways, pointing me to a device of interest for here would let me provide an extra level of help
<alyssa> kinkinkijkin: that's a lot to take in
<kinkinkijkin> if you want it to be enough that you can just ignore most of it, i just had cancer go into remission too and got covid within a month of that
<alyssa> eek, sorry to hear that :(
<kinkinkijkin> :p
<kinkinkijkin> sorry, im extremely jaded at this point
<kinkinkijkin> anyways, what's a portable device that has a configuration you'd like tested, and a good amount of power?
<kinkinkijkin> or just give a list of types of host configs you'd like tested and i'll figure it out
gcl_ has joined #panfrost
<daniels> kinkinkijkin: sorry to hear that - at the moment we're mostly working on the Acer C201 (Midgard) or R13 (Bifrost) Chromebooks
<daniels> at this point, it's pretty much that hardware we don't have is definitely going to be broken, because every generation has non-trivial differences we need to account for
gcl has quit [Ping timeout: 256 seconds]
<daniels> and we also need to get one of every type into CI really, so we don't regress it instantly
<kinkinkijkin> if it's gotta go into ci that doesn't apply to my purchase unfortunately, if it needs user testing it does apply
<daniels> right :)
<daniels> that's fine, we (Collabora) are working on getting stuff into CI, it's just slightly bottlenecked at the moment
<kinkinkijkin> ah
gcl has joined #panfrost
<kinkinkijkin> your first line is unclear; do you mean those are the chromebooks being used by developers mainly, or those are the chromebooks that would help that you can think of?
gcl_ has quit [Ping timeout: 264 seconds]
<macc24> kinkinkijkin: anything rk3288 or rk3399 would probably work
davidlt has quit [Read error: Connection reset by peer]
davidlt has joined #panfrost
* alyssa is down a yak shaving rabbit hole
<kinkinkijkin> okay, wall of text full of nerd stuff written, i should know within an hour whether im getting an acer c101p or a lenovo chromebook duet rn
<alyssa> Does wall of nerd work with the govt here?
<kinkinkijkin> it's not the govt i was talking to
kinkinkijkin has quit [Remote host closed the connection]
kinkinkijkin has joined #panfrost
davidlt has quit [Ping timeout: 240 seconds]
<kinkinkijkin> alright, it seems to be decided that it's gonna be a duet
dstzd has joined #panfrost
<alyssa> 🎼
<macc24> how's g72mp3 performance>
<macc24> ?
<macc24> on mt8183
<macc24> daniels: doesn't R13 have mt8173 with powervr gpu?
<kinkinkijkin> btw, how far is progress for the most-implemented gpus right now?
<kinkinkijkin> how many perceptually-conformant frames per second can the most implemented gpu draw of a spinning cube?
<macc24> yes
<kinkinkijkin> exactly the answer i was looking for
<warpme_> now all plays nicely :-)
<narmstrong> Nice !
<warpme_> yeah - Daniel is a pretty smart guy!
robmur01 has quit [Ping timeout: 264 seconds]
robmur01 has joined #panfrost
<alyssa> RiiR
<alyssa> ;P
tgall_foo has quit [Read error: Connection reset by peer]
tgall_foo has joined #panfrost
<daniels> alyssa: I lold at that
<daniels> macc24: sorry, you're right - I mean the Duet
<alyssa> daniels: it's a data race from improper memory management
* alyssa salivates
<alyssa> shoot running late for class uh
<alyssa> programming is risky like that