davidlt has quit [Remote host closed the connection]
rhyskidd has joined #panfrost
davidlt has joined #panfrost
rhyskidd has quit [Quit: rhyskidd]
rhyskidd has joined #panfrost
rhyskidd has quit [Remote host closed the connection]
rhyskidd has joined #panfrost
<bbrezillon>
robher, tomeu: I'm currently debugging an iommu fault issue, and I think it might come from the dmabuf import/export logic
<bbrezillon>
sorry, just import, not export
<bbrezillon>
for each dmabuf import we create a new GEM object and map the region on the GPU side
rhyskidd has quit [Ping timeout: 258 seconds]
<bbrezillon>
now, imagine the userspace app keeps importing the same dmabuf and releasing the gem object that has been created from it
<bbrezillon>
we might have a race between the mmu_map() of the newly imported dmabuf and the mmu_unmap() of the destroyed GEM, leading to a situation where the new GEM is actually not mapped GPU-side
<tomeu>
looks like a good one
<tomeu>
bbrezillon: how do other drivers deal with this?
<tomeu>
bbrezillon: btw, I'm implementing caching of BOs, because if a BO is imported twice, when it is closed for the first time a dangling reference will remain
<tomeu>
could be related to the symptoms you are seeing
<bbrezillon>
tomeu: hm, probably not, I mean, the thing you're describing would lead to a mem leak, not a use after free
<bbrezillon>
(well, it's more a use-after-unmap than a use-after-free)
<tomeu>
use-after-unmap, yes
<tomeu>
which would cause page faults
<bbrezillon>
tomeu: isn't caching of BOs (which I'm sure is useful for other reasons) actually hiding the real problem here?
<tomeu>
well, I think it's adding reference counting on top of a resource that isn't reference-counted
<bbrezillon>
you mean, on the mesa end?
jolan has quit [Quit: leaving]
jolan has joined #panfrost
<tomeu>
yep
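The userspace reference counting tomeu describes could look roughly like the sketch below. It assumes, as his remark implies, that importing the same dmabuf twice hands back the same GEM handle, so the handle may only be closed once the last user is done with it. The names `bo_cache`, `bo_cache_import` and `bo_cache_release` are hypothetical and are not taken from Mesa. The fixed-size table is only there to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BO_CACHE_SLOTS 64

/* Hypothetical per-screen table: one entry per live GEM handle. */
struct bo_entry {
    uint32_t handle;   /* GEM handle; 0 marks an unused slot */
    unsigned refcnt;   /* number of outstanding imports */
};

struct bo_cache {
    struct bo_entry slots[BO_CACHE_SLOTS];
};

/* Linear search for a handle (or for a free slot when handle == 0). */
static struct bo_entry *bo_cache_find(struct bo_cache *c, uint32_t handle)
{
    for (size_t i = 0; i < BO_CACHE_SLOTS; i++)
        if (c->slots[i].handle == handle)
            return &c->slots[i];
    return NULL;
}

/* Record one more import of 'handle'.
 * Returns the new reference count, or 0 if the handle is invalid
 * or the table is full. */
static unsigned bo_cache_import(struct bo_cache *c, uint32_t handle)
{
    if (handle == 0)
        return 0;

    struct bo_entry *e = bo_cache_find(c, handle);
    if (e)
        return ++e->refcnt;      /* same BO imported again */

    e = bo_cache_find(c, 0);     /* grab a free slot */
    if (!e)
        return 0;
    e->handle = handle;
    e->refcnt = 1;
    return 1;
}

/* Drop one reference. Returns 1 when this was the last reference and
 * the caller should now really close the GEM handle, 0 otherwise. */
static int bo_cache_release(struct bo_cache *c, uint32_t handle)
{
    if (handle == 0)
        return 0;

    struct bo_entry *e = bo_cache_find(c, handle);
    if (!e)
        return 0;

    assert(e->refcnt > 0);
    if (--e->refcnt > 0)
        return 0;                /* still imported elsewhere */

    e->handle = 0;               /* free the slot */
    return 1;
}
```

With such a table, the first release of a twice-imported BO leaves the handle alone, and only the second one tells the caller to close it. This is only about lifetime tracking in userspace and says nothing about the kernel-side map/unmap ordering bbrezillon is chasing.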
<robher>
I think multiple imports is not something the kernel deals with. Android does this and bringup of every driver requires adding the support in Mesa.
<bbrezillon>
robher: well, looks like Xwayland keeps importing dmabuf/freeing resulting objs at a high rate, and since the cleanup is happening asynchronously in the DRM driver, I fear we fall in the case I described above
afaerber has quit [Quit: Leaving]
<bbrezillon>
robher: I'm probably wrong, if things happen concurrently we'd have 2 different GPU mem regions pointing to the same BO (drm_mm_insert_node_generic() is supposed to get a free region), and what I'm seeing is the same GPU-mem range re-used over and over, and suddenly an iommu fault
<bbrezillon>
robher, tomeu: interestingly, if I call panfrost_drm_free_imported_bo() (which should probably be killed BTW) instead of panfrost_drm_free_slab() things are a bit more stable
<tomeu>
ah yes, I'm carrying that change here :/
<tomeu>
I think I would kill all of pan_drm.c if I had some spare time
<bbrezillon>
why?
<tomeu>
because there's a lot of stuff that should be in pan_resource.c (or a new pan_bo.c), and the rest isn't that interesting
<tomeu>
guess having pan_resource.c and pan_bo.c would make us more similar to the other gallium drivers
<bbrezillon>
makes sense
afaerber has joined #panfrost
<tomeu>
btw, I'm looking now at why, after some time, the mmap in import_bo fails with ENOMEM
<bbrezillon>
tomeu: on arm32 or arm64?
<bbrezillon>
panfrost_drm_free_imported_bo() seems to leak the CPU mapping
<bbrezillon>
but it's never called anyway (unless you have a patch that assigns ->imported to true in the import BO path)
<tomeu>
yeah
<bbrezillon>
keeping the cpu mapping alive might also explain why I don't see the page fault in that case (I see other page faults though)
* bbrezillon
goes check what's done in the mmap
<tomeu>
robher: is anything blocking growable, btw?
<robher>
tomeu: implementing shrinker support so we free it on memory pressure.
<robher>
though maybe that can be done later. I guess we're better off growing than allocating it all up front.
<robher>
We just delay OOM...
<tomeu>
robher: yeah
<tomeu>
and userspace can move forward with using the new UABI
<bbrezillon>
tomeu, robher: and the fix is a oneliner, as usual
<tomeu>
bbrezillon: cannot wait to see it!
pH5 has quit [Quit: bye]
<bbrezillon>
tomeu: I just sent the patch
<bbrezillon>
and you're in Cc
<tomeu>
oh, cool!
<tomeu>
bbrezillon: may be a good idea to add the BO to bo_handles, in panfrost_drm_submit_job
<tomeu>
guess that omission could be causing some flip-flops as well
<alyssa>
gfx driver dev is hard
<alyssa>
why didn't i listen to all those people that tried to warn me
<alyssa>
bbrezillon: Are you working on the transient stuff? (I might take another stab if you're not, but if you are that's great too :) )
<alyssa>
...or I could hack on MRT :angel:
<bbrezillon>
alyssa: I'm not
<bbrezillon>
thought this was shadeslayer's task
<bbrezillon>
tomeu: yes, I wondered why we're not passing them to the driver
<alyssa>
bbrezillon: Oh, yeah, maybe you're right
* alyssa
can't keep track
* alyssa
is already lost in GLES3 lalaland
pH5 has joined #panfrost
<bbrezillon>
tomeu: this being said, I'm not sure it would be any safer to pass them to the driver
<bbrezillon>
since userspace is still in charge of this BO list creation
jernej has joined #panfrost
<urjaman>
alyssa: sometimes, it's better not to listen to those people :P
<alyssa>
urjaman: Fair.
herbmillerjr has quit [Quit: Konversation terminated!]
herbmillerjr has joined #panfrost
yann has quit [Ping timeout: 272 seconds]
raster has quit [Read error: Connection reset by peer]
stikonas_ has joined #panfrost
TheKit has quit [Ping timeout: 245 seconds]
<alyssa>
Render target format madness
* alyssa
shivers
<bbrezillon>
tomeu: hm, actually I omitted one important aspect that's taken care of kernel-side => wait on BO fences
<bbrezillon>
so yes, I guess we should pass all BOs attached to a job to the SUBMIT ioctl
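A minimal sketch of what "pass all BOs attached to a job" could mean on the userspace side: collect every handle the job touches into one deduplicated array, so the kernel can wait on the fences of each of them. The `job_bo_list` struct and `job_add_bo` helper are hypothetical names for illustration, not Mesa's actual code, and how the array is then handed to the SUBMIT ioctl is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

#define JOB_MAX_BOS 32

/* Hypothetical per-job list of GEM handles referenced by the job. */
struct job_bo_list {
    uint32_t handles[JOB_MAX_BOS];
    uint32_t count;
};

/* Add 'handle' to the list unless it is already there.
 * Returns 0 on success (including the already-present case),
 * -1 if the list is full. */
static int job_add_bo(struct job_bo_list *l, uint32_t handle)
{
    for (uint32_t i = 0; i < l->count; i++)
        if (l->handles[i] == handle)
            return 0;            /* already tracked for this job */

    if (l->count == JOB_MAX_BOS)
        return -1;

    l->handles[l->count++] = handle;
    return 0;
}
```

Every BO the job reads or writes (including imported ones) would go through `job_add_bo()` before submission. As bbrezillon notes, userspace is still the one building this list, so it only helps if nothing is forgotten.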
<alyssa>
Oh, lovely, even the MRT hardware doesn't support MRT sanely.
TheKit has joined #panfrost
<alyssa>
So, nir_format_convert.h seems to cover a lot of what will be needed
<alyssa>
I guess the trick will be to integrate that with the NIR blend shader code
<alyssa>
So then we get blend shaders to do the heavy lifting
yann has joined #panfrost
pH5 has quit [Quit: bye]
jcureton has joined #panfrost
<jcureton>
hi! i've been following T720 development, and haven't seen any updates for a while here or on the dri-devel list. I'm currently running with the 32-bit job hacks in Mesa to deal with a 64-bit userspace. is the plan still for the kernel driver to unify to 64-bit descriptors even on the smaller GPU designs?
<alyssa>
jcureton: That's the plan, yeah!
<alyssa>
I think tomeu was working on that..?
<jcureton>
thanks alyssa! yeah, i was under the impression tomeu would be the right person to ask
hlmjr has joined #panfrost
herbmilleriw has quit [Ping timeout: 252 seconds]
afaerber has quit [Quit: Leaving]
afaerber has joined #panfrost
<alyssa>
Okay, so I guess the trick will be to add new intrinsics so we can do type conversion explicitly.
stikonas_ has quit [Ping timeout: 252 seconds]
herbmilleriw has joined #panfrost
hlmjr has quit [Ping timeout: 268 seconds]
maciejjo has quit [Ping timeout: 244 seconds]
davidlt has quit [Ping timeout: 245 seconds]
stikonas_ has joined #panfrost
NeuroScr has joined #panfrost
stikonas_ has quit [Remote host closed the connection]
stikonas_ has joined #panfrost
stikonas_ has quit [Remote host closed the connection]