* hanetzer
is anticipating kernel 5.2; panfrost.ko should be in that, no?
griffinp has quit [Ping timeout: 248 seconds]
vstehle has joined #panfrost
mani_s_ has quit [Ping timeout: 268 seconds]
<tomeu>
robher: I thought that userspace broke when mmap started failing noisely, so not mmaping would unbreak userspace?
<tomeu>
robher: regarding testing, we have panfrost just now built as a module in next, so we should start running the panfrost igt tests and probably extend them to exercise this code path
<tomeu>
well, if we want for userspace to be able to mmap an imported buffer :)
<bbrezillon>
tomeu, alyssa, robher: just my 2cents regarding the mmap regression => we're talking about a driver that's been merged in 5.2-rc1 (5.2 is not even release yet), and AFAICT, the only user we have right now is the Panfrost driver is mesa which is still under development
<bbrezillon>
not to mention that it's the mesa side of the driver that was not conforming to the "don't mmap imported BO" rule
<bbrezillon>
so maybe we should just fix the bug in mesa and inform everyone that they should update their mesa/panfrost driver if they want it to work with the latest kernel version
<bbrezillon>
I know that the kernel is supposed to provide some guarantees about not changing existing behavior if some userspace lib/app rely on it, even if it's known to be broken
<bbrezillon>
but given how recent the driver and driver user (mesa/panfrost) are, I'd suggest that we just bend the rules
milloni has quit [Remote host closed the connection]
milloni has joined #panfrost
<bbrezillon>
alyssa, tomeu, robher: looks like dropping the mmap() on imported BO does not work, there's that tries to access the memory region
<bbrezillon>
*there's something
<tomeu>
bbrezillon: I'm not sure we'd be bending anything if no kernel has been released with that behavior
<bbrezillon>
yep
<bbrezillon>
also think so
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
yann has joined #panfrost
yann has quit [Ping timeout: 268 seconds]
<bbrezillon>
tomeu, robher: getting rid of the mmap() on imported BO might be harder than we thought
<tomeu>
did you find out what needs it?
<bbrezillon>
I have a crash with xwayland
<bbrezillon>
I'm not sure yet, but it seems like the GEM are exported, the DMABUF fd passed through a wayland message and imported on the other end
<bbrezillon>
(client app I guess)
<bbrezillon>
and this client tries to write to the surface
<tomeu>
hmm
<tomeu>
daniels: wonder if there's any way around it
<daniels>
xwl doesn't use server-allocated buffers
<bbrezillon>
I tried to mmap the dmabuf directly (mmap() on the dmabuf fd), but the FD has RO perm
<daniels>
it allocates its own with gbm and passes them to the host compositor
<daniels>
do you have a backtrace of where the failing map is called from?
<bbrezillon>
yes, but my xwayland bin does not have debug symbols
<bbrezillon>
so it's useless
<daniels>
i wouldn't have thought that xwl would be mmaping client buffers directly; is it not coming through Mesa?
<tomeu>
hmm
<tomeu>
I'm changing things right now so all BOs are allocated by the panfrost driver, and imported into the KMS device for scanout
<tomeu>
but I don't know if we can guarantee that all midgard+bifrost implementations will be coupled with graphics hw that can scanout whatever the GPU driver allocates
<bbrezillon>
daniels: it's xwayland that crashes, so I'd say it's the one importing the BO and trying to write to it
<tomeu>
specifically, if the graphics hw won't require CMA-allocated buffers
<tomeu>
bbrezillon: strace should tell?
<daniels>
bbrezillon: sure, but I meant that the access is presumably Xwayland -> GL -> Mesa -> map dmabuf
<bbrezillon>
on a side note, I looked at other drivers and they do not seem to check whether it's an imported BO or not before mmap()-ing it through the GEM path
<bbrezillon>
daniels: oh, yes, that's probably the case
<tomeu>
bbrezillon: btw, if you are on debian, you may need to only install debug symbols to get a nice trace
<bbrezillon>
tomeu: what's the package name?
ente has joined #panfrost
<bbrezillon>
something -dbg
<bbrezillon>
tomeu: okay, have it now
<bbrezillon>
it's something in _mesa_format_convert()
<tomeu>
bbrezillon: what's the bt?
<tomeu>
wonder if it isn't tiling an uploaded texture
<tomeu>
so it's uploading data to a texture, and for some reason mesa decides it needs to be converted to another format
<tomeu>
so not tiling
<bbrezillon>
hm, not sure
<bbrezillon>
there's definitely a format conversion
<bbrezillon>
but it doesn't explain why dst is an imported BO
<tomeu>
maybe because it's probably going to be scanned out
<tomeu>
so it was allocated on the KMS driver
<tomeu>
this should change with my modifiers branch
<bbrezillon>
(assuming imported BO are not supposed to be written to, which, based on the other drivers I looked at, am not sure is a rule that's enforced by all drivers)
<tomeu>
well, if it's a wayland client's surface then it's susceptible of being scanned out
<tomeu>
so it will currently be allocated in the kms device
<tomeu>
because a surface can end up in its own plane
<bbrezillon>
ok, but is that really an invalid use case?
<bbrezillon>
tomeu: maybe I'm missing something
<tomeu>
yeah, I'm afraid it's valid
<tomeu>
but hopefully we won't hit it :)
<bbrezillon>
hm, not sure I like that :)
<bbrezillon>
tomeu: I can mmap() the dmabuf directly (instead of importing the BO and then mapping the GEM object), but for some reason, the FD I'm being passed is opened RO
<bbrezillon>
and mmap(RW) fails with EPERM
<bbrezillon>
interestingly, if we import the BO (like we previously did), the dmabuf permission are completely ignored, and the resulting GEM obj can be mmap()-ed in RW mode
<daniels>
bbrezillon: the buffer that TexSubImage is trying to upload to - where was it allocated?
<daniels>
oh, I think I can answer this, actually
<daniels>
glamor uses GBM as an actual generic buffer allocator - it'll allocate a single BO for a window's backing storage, then import that as an EGLImage, within the same process
<daniels>
which is a legitimate thing which has to work
<bbrezillon>
tomeu, daniels: hm, do you know why BO are always exported in RO mode (drmPrimeHandleToFD() is not passed the DRM_RDWR flag)
<bbrezillon>
?
<daniels>
bbrezillon: do you get RO unless you explicitly pass RW?
<bbrezillon>
daniels: yes
<bbrezillon>
but maybe that's what we want
<daniels>
well, it depends on the intended usage ... for passing into KMS and to other clients (e.g. Wayland client -> compositor), we surely want the export to be RO
<daniels>
but given that GPUs ignore the permissions and allow writes anyway, and that caching means that we might end up with an import with flags from a previous import, it seems sensible to allow RW regardless
<bbrezillon>
if we change that tree-wide in mesa, I should be able to mmap() the dmabuf directly instead of going through an import (which no longer works after Steven's patch)
<bbrezillon>
daniels: I just tested it, and if I patch the call in renderonly.c plus the one in pan_drm.c it works
<daniels>
(note that direct access to a dmabuf's backing storage through mmap should be bracketed by dmabuf access begin/end calls btw)
<bbrezillon>
daniels: yep
<bbrezillon>
I have a comment in the code
<daniels>
c'est bon
<bbrezillon>
it's a bit complicated to do that right now
<bbrezillon>
since we'd need to sync things before mmap()-ing, but also before submitting a job
<daniels>
how do other drivers do it? import to a handle and then map through the handle, rather than map through the fd? or something else?
<bbrezillon>
and before re-using the BO (after the GPU is done)
<bbrezillon>
daniels: yep
<bbrezillon>
and they also map on demand
<bbrezillon>
which is not the case in panfrost yet
<bbrezillon>
for buffers allocated by the GPU driver or display controller it shouldn't be a problem because they are mapped write-combine
<bbrezillon>
(at least CPU-side, don't remember if things are coherent GPU-side)
<bbrezillon>
daniels: BTW, I'm not sure the FD -> handle conversion provides guarantees on CPU <-> GPU sync, so, in theory, even without the dmabuf syncs, it shouldn't be worse than what we have right now
<bbrezillon>
but it definitely deserves a FIXME
<daniels>
(i wonder if this might be related to rk3288 flip-flops ...)
<bbrezillon>
could be
yann has joined #panfrost
afaerber has quit [Quit: Leaving]
maciejjo has quit [Remote host closed the connection]
afaerber has joined #panfrost
xdarklight has joined #panfrost
maciejjo has joined #panfrost
<tomeu>
yeah
<alyssa>
tomeu: 'I don't know
<alyssa>
' never try to gaurantee anything about midgard/bifrost impl
<alyssa>
tomeu: bbrezillon daniels: Keep in mind that winsys stuff is totally over my head, but one thing I can think of:
<alyssa>
A lot of these apps seem to want to render to RGB10_A2. I explicitly disable that format in pan_screen's format_supported, and then Gallium has them render to RGBA8
<alyssa>
....but maybe it's still pretending to be RGB10_A2, and whenever the app does shm access Gallium is incurring a (possibly very) expensive software colourspace conversion
<bbrezillon>
alyssa: the format conversion seems to be SW-based indeed
<bbrezillon>
but that's not a reason to not support that (even if it should probably be optimized at some point)
<alyssa>
bbrezillon: Do you know how often that code is running? (From a perf standpoint)
<alyssa>
Is that every frame for shm clients?
<bbrezillon>
didn't check
<tomeu>
guess that perf should show that if it has a noticeable impact
<alyssa>
The unfortunate part is that rendering to RGB10_A2 is slooow
<alyssa>
because it's one of the formats that needs a blend shaders
<alyssa>
*shader
<alyssa>
Yes we can do it but we're just pushing slowness around
<alyssa>
(If it's only shm clients, that's a different story, texturing from RGB10_A2 is ok)
jesse74 has joined #panfrost
jesse74 has left #panfrost [#panfrost]
jcureton has joined #panfrost
davidlt has quit [Ping timeout: 272 seconds]
davidlt has joined #panfrost
<bbrezillon>
tomeu, daniels: given that the GPU takes care of cache maintenance at the beginning/end of each job chain (as explained by Steven) and the fact that BOs are mapped WC on the CPU side, I don't think the absence of explicit CPU <-> GPU syncs could be a problem
<bbrezillon>
so, that's probably not the cause of those flip-flop we have on rk3288
Lyude has quit [Read error: Connection reset by peer]
<alyssa>
bbrezillon: From my perspective in userspace, what's the significance of WC mapping in terms of making fast code?
<tomeu>
hmm, wonder if some GPU versions behave differently in that regard
<tomeu>
there's at least some erratas
Lyude has joined #panfrost
<bbrezillon>
alyssa: don't read from the BO :)
<bbrezillon>
WC (write-combine) means there's no caching
<bbrezillon>
the only optimization is on the write path, where the CPU tries to buffer sequential write access to try grouping several small mem accesses into bigger ones