alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
NeuroScr has joined #panfrost
vstehle has quit [Ping timeout: 246 seconds]
NeuroScr has quit [Quit: NeuroScr]
marcodiego has quit [Quit: Leaving]
robink has joined #panfrost
rcf has quit [Ping timeout: 258 seconds]
vstehle has joined #panfrost
Elpaulo has quit [Read error: Connection reset by peer]
Elpaulo has joined #panfrost
pH5 has joined #panfrost
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
gcl_ has quit [Quit: Moving day; offline until I have Internet again.]
yann has quit [Ping timeout: 272 seconds]
yann has joined #panfrost
raster has joined #panfrost
raster has quit [Remote host closed the connection]
raster has joined #panfrost
_whitelogger has joined #panfrost
jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]
rhyskidd has joined #panfrost
rhyskidd has quit [Quit: rhyskidd]
rhyskidd has joined #panfrost
davidlt has joined #panfrost
rhyskidd has quit [Quit: rhyskidd]
davidlt has quit [Remote host closed the connection]
rhyskidd has joined #panfrost
davidlt has joined #panfrost
rhyskidd has quit [Quit: rhyskidd]
rhyskidd has joined #panfrost
rhyskidd has quit [Remote host closed the connection]
rhyskidd has joined #panfrost
<bbrezillon> robher, tomeu: I'm currently debugging an iommu fault issue, and I think it might come from the dmabuf import/export logic
<bbrezillon> sorry, just import, not export
<bbrezillon> for each dmabuf import we create a new GEM object and map the region on the GPU side
rhyskidd has quit [Ping timeout: 258 seconds]
<bbrezillon> now, imagine the userspace app keeps importing the same dmabuf and releasing the gem object that has been created from it
<bbrezillon> we might have a race between the mmu_map() of the newly imported dmabuf and the mmu_unmap() of the destroyed GEM, leading to a situation where the new GEM is actually not mapped GPU-side
<tomeu> looks like a good one
<tomeu> bbrezillon: how do other drivers deal with this?
<tomeu> bbrezillon: btw, I'm implementing caching of BOs, because if a BO is imported twice, when it is closed for the first time a dangling reference will remain
<tomeu> could be related to the symptoms you are seeing
<bbrezillon> tomeu: hm, probably not, I mean, the thing you're describing would lead to a mem leak, not a use after free
<bbrezillon> (well, it's more a use-after-unmap than a use-after-free)
<tomeu> use-after-unmap, yes
<tomeu> which would cause page faults
<bbrezillon> tomeu: isn't caching of BOs (which I'm sure is useful for other reasons) actually hiding the real problem here?
<tomeu> well, I think it's adding reference counting on top of a resource that isn't reference-counted
<bbrezillon> you mean, on the mesa end?
jolan has quit [Quit: leaving]
jolan has joined #panfrost
<tomeu> yep
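The userspace reference counting tomeu describes can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual Mesa code: it assumes that importing the same dmabuf twice yields the same GEM handle, and the names `bo_cache`, `bo_cache_import` and `bo_cache_release` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BOS 64 /* arbitrary limit for this sketch */

/* Hypothetical table entry: one per live GEM handle. */
struct bo_entry {
    uint32_t handle; /* GEM handle obtained from the import */
    unsigned refcnt; /* number of outstanding userspace imports */
};

struct bo_cache {
    struct bo_entry entries[MAX_BOS];
    size_t count;
};

static struct bo_entry *bo_cache_find(struct bo_cache *c, uint32_t handle)
{
    for (size_t i = 0; i < c->count; i++)
        if (c->entries[i].handle == handle)
            return &c->entries[i];
    return NULL;
}

/* Record an import. Returns the new refcount, or 0 if the table is full. */
static unsigned bo_cache_import(struct bo_cache *c, uint32_t handle)
{
    struct bo_entry *e = bo_cache_find(c, handle);
    if (e)
        return ++e->refcnt;
    if (c->count == MAX_BOS)
        return 0;
    c->entries[c->count] = (struct bo_entry){ .handle = handle, .refcnt = 1 };
    c->count++;
    return 1;
}

/* Drop one reference. Returns true only when the last reference is gone,
 * i.e. when the caller should really unmap and close the handle. */
static bool bo_cache_release(struct bo_cache *c, uint32_t handle)
{
    struct bo_entry *e = bo_cache_find(c, handle);
    if (!e)
        return false;
    if (--e->refcnt > 0)
        return false;
    *e = c->entries[--c->count]; /* swap-remove the dead entry */
    return true;
}
```

With such a table, the first release of a twice-imported BO only decrements the count, and the handle is closed on the second release.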
<robher> I think multiple imports is not something the kernel deals with. Android does this and bringup of every driver requires adding the support in Mesa.
<bbrezillon> robher: well, looks like Xwayland keeps importing dmabuf/freeing resulting objs at a high rate, and since the cleanup is happening asynchronously in the DRM driver, I fear we fall in the case I described above
afaerber has quit [Quit: Leaving]
<bbrezillon> robher: I'm probably wrong, if things happen concurrently we'd have 2 different GPU mem regions pointing to the same BO (drm_mm_insert_node_generic() is supposed to get a free region), and what I'm seeing is the same GPU-mem range re-used over and over, and suddenly an iommu fault
<bbrezillon> robher, tomeu: interestingly, if I call panfrost_drm_free_imported_bo() (which should probably be killed BTW) instead of panfrost_drm_free_slab() things are a bit more stable
<tomeu> ah yes, I'm carrying that change here :/
<tomeu> I think I would kill all of pan_drm.c if I had some spare time
<bbrezillon> why?
<tomeu> because there's a lot of stuff that should be in pan_resource.c (or a new pan_bo.c), and the rest isn't that interesting
<tomeu> guess having pan_resource.c and pan_bo.c would make us more similar to the other gallium drivers
<bbrezillon> makes sense
afaerber has joined #panfrost
<tomeu> btw, I'm looking now at why, after some time, the mmap in import_bo fails with ENOMEM
<bbrezillon> tomeu: on arm32 or arm64?
<bbrezillon> panfrost_drm_free_imported_bo() seems to leak the CPU mapping
<bbrezillon> but it's never called anyway (unless you have a patch that assigns ->imported to true in the import BO path)
<tomeu> yeah
<bbrezillon> keeping the cpu mapping alive might also explain why I don't see the page fault in that case (I see other page faults though)
* bbrezillon goes check what's done in the mmap
<tomeu> robher: is anything blocking growable, btw?
<robher> tomeu: implementing shrinker support so we free it on memory pressure.
<robher> though maybe that can be done later. I guess we're better off growing than allocating it all up front.
<robher> We just delay OOM...
<tomeu> robher: yeah
<tomeu> and userspace can move forward with using the new UABI
<bbrezillon> tomeu, robher: and the fix is a one-liner, as usual
<tomeu> bbrezillon: cannot wait to see it!
pH5 has quit [Quit: bye]
<bbrezillon> tomeu: I just sent the patch
<bbrezillon> and you're in Cc
<tomeu> oh, cool!
<tomeu> bbrezillon: may be a good idea to add the BO to bo_handles, in panfrost_drm_submit_job
<tomeu> guess that omission could be causing some flip-flops as well
<alyssa> gfx driver dev is hard
<alyssa> why didn't i listen to all those people that tried to warn me
<alyssa> bbrezillon: Are you working on the transient stuff? (I might take another stab if you're not, but if you are that's great too :) )
<alyssa> ...or I could hack on MRT :angel:
<bbrezillon> alyssa: I'm not
<bbrezillon> thought this was shadeslayer's task
<bbrezillon> tomeu: yes, I wondered why we're not passing them to the driver
<alyssa> bbrezillon: Oh, yeah, maybe you're right
* alyssa can't keep track
* alyssa is already lost in GLES3 lalaland
pH5 has joined #panfrost
<bbrezillon> tomeu: this being said, I'm not sure it would be any safer to pass them to the driver
<bbrezillon> since userspace is still in charge of this BO list creation
jernej has joined #panfrost
<urjaman> alyssa: sometimes, it's better not to listen to those people :P
<alyssa> urjaman: Fair.
herbmillerjr has quit [Quit: Konversation terminated!]
herbmillerjr has joined #panfrost
yann has quit [Ping timeout: 272 seconds]
raster has quit [Read error: Connection reset by peer]
stikonas_ has joined #panfrost
TheKit has quit [Ping timeout: 245 seconds]
<alyssa> Render target format madness
* alyssa shivers
<bbrezillon> tomeu: hm, actually I omitted one important aspect that's taken care of kernel-side => wait on BO fences
<bbrezillon> so yes, I guess we should pass all BOs attached to a job to the SUBMIT ioctl
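Collecting every BO attached to a job before submission might look roughly like this. It is a minimal sketch with hypothetical names (`job_bo_list`, `job_add_bo`), not the driver's real data structures; the resulting array is what would be handed to the kernel so it can wait on each BO's fences.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_JOB_BOS 128 /* arbitrary limit for this sketch */

/* Hypothetical per-job list of GEM handles referenced by the job. */
struct job_bo_list {
    uint32_t handles[MAX_JOB_BOS];
    size_t count;
};

/* Add a handle to the job's list, skipping duplicates.
 * Returns false only if the list is full. */
static bool job_add_bo(struct job_bo_list *l, uint32_t handle)
{
    for (size_t i = 0; i < l->count; i++)
        if (l->handles[i] == handle)
            return true; /* already tracked */
    if (l->count == MAX_JOB_BOS)
        return false;
    l->handles[l->count++] = handle;
    return true;
}
```

Deduplicating at insertion time keeps the list short when the same BO is bound several times within one job. As bbrezillon notes, userspace still builds this list, so the kernel cannot rely on it being complete.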
<alyssa> Oh, lovely, even the MRT hardware doesn't support MRT sanely.
TheKit has joined #panfrost
<alyssa> So, nir_format_convert.h seems to cover a lot of what will be needed
<alyssa> I guess the trick will be to integrate that with the NIR blend shader code
<alyssa> So then we get blend shaders to do the heavy lifting
yann has joined #panfrost
pH5 has quit [Quit: bye]
jcureton has joined #panfrost
<jcureton> hi! i've been following T720 development, and haven't seen any updates for a while here or on the dri-devel list. I'm currently running with the 32-bit job hacks in Mesa to deal with a 64-bit userspace. is the plan still for the kernel driver to unify to 64-bit descriptors even on the smaller GPU designs?
<alyssa> jcureton: That's the plan, yeah!
<alyssa> I think tomeu was working on that..?
<jcureton> thanks alyssa! yeah, i was under the impression tomeu would be the right person to ask
hlmjr has joined #panfrost
herbmilleriw has quit [Ping timeout: 252 seconds]
afaerber has quit [Quit: Leaving]
afaerber has joined #panfrost
<alyssa> Okay, so I guess the trick will be to add new intrinsics so we can do type conversion explicitly.
stikonas_ has quit [Ping timeout: 252 seconds]
herbmilleriw has joined #panfrost
hlmjr has quit [Ping timeout: 268 seconds]
maciejjo has quit [Ping timeout: 244 seconds]
davidlt has quit [Ping timeout: 245 seconds]
stikonas_ has joined #panfrost
NeuroScr has joined #panfrost
stikonas_ has quit [Remote host closed the connection]
stikonas_ has joined #panfrost
stikonas_ has quit [Remote host closed the connection]
NeuroScr has quit [Quit: NeuroScr]