alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
vstehle has quit [Ping timeout: 248 seconds]
_whitelogger has joined #panfrost
* hanetzer is anticipating kernel 5.2; panfrost.ko should be in that, no?
griffinp has quit [Ping timeout: 248 seconds]
vstehle has joined #panfrost
mani_s_ has quit [Ping timeout: 268 seconds]
<tomeu> robher: I thought that userspace broke when mmap started failing noisely, so not mmaping would unbreak userspace?
<tomeu> robher: regarding testing, we have panfrost just now built as a module in next, so we should start running the panfrost igt tests and probably extend them to exercise this code path
<tomeu> well, if we want for userspace to be able to mmap an imported buffer :)
<bbrezillon> tomeu, alyssa, robher: just my 2cents regarding the mmap regression => we're talking about a driver that's been merged in 5.2-rc1 (5.2 is not even release yet), and AFAICT, the only user we have right now is the Panfrost driver is mesa which is still under development
<bbrezillon> not to mention that it's the mesa side of the driver that was not conforming to the "don't mmap imported BO" rule
<bbrezillon> so maybe we should just fix the bug in mesa and inform everyone that they should update their mesa/panfrost driver if they want it to work with the latest kernel version
<bbrezillon> I know that the kernel is supposed to provide some guarantees about not changing existing behavior if some userspace lib/app rely on it, even if it's known to be broken
<bbrezillon> but given how recent the driver and driver user (mesa/panfrost) are, I'd suggest that we just bend the rules
milloni has quit [Remote host closed the connection]
milloni has joined #panfrost
<bbrezillon> alyssa, tomeu, robher: looks like dropping the mmap() on imported BO does not work, there's that tries to access the memory region
<bbrezillon> *there's something
<tomeu> bbrezillon: I'm not sure we'd be bending anything if no kernel has been released with that behavior
<bbrezillon> yep
<bbrezillon> also think so
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
yann has joined #panfrost
yann has quit [Ping timeout: 268 seconds]
<bbrezillon> tomeu, robher: getting rid of the mmap() on imported BO might be harder than we thought
<tomeu> did you find out what needs it?
<bbrezillon> I have a crash with xwayland
<bbrezillon> I'm not sure yet, but it seems like the GEM are exported, the DMABUF fd passed through a wayland message and imported on the other end
<bbrezillon> (client app I guess)
<bbrezillon> and this client tries to write to the surface
<tomeu> hmm
<tomeu> daniels: wonder if there's any way around it
<tomeu> guess that's basically server-allocated shm buffers
<bbrezillon> as I said, I'm not sure yet
<daniels> xwl doesn't use server-allocated buffers
<bbrezillon> I tried to mmap the dmabuf directly (mmap() on the dmabuf fd), but the FD has RO perm
<daniels> it allocates its own with gbm and passes them to the host compositor
<daniels> do you have a backtrace of where the failing map is called from?
<bbrezillon> yes, but my xwayland bin does not have debug symbols
<bbrezillon> so it's useless
<daniels> i wouldn't have thought that xwl would be mmaping client buffers directly; is it not coming through Mesa?
<tomeu> hmm
<tomeu> I'm changing things right now so all BOs are allocated by the panfrost driver, and imported into the KMS device for scanout
<tomeu> but I don't know if we can guarantee that all midgard+bifrost implementations will be coupled with graphics hw that can scanout whatever the GPU driver allocates
<bbrezillon> daniels: it's xwayland that crashes, so I'd say it's the one importing the BO and trying to write to it
<tomeu> specifically, if the graphics hw won't require CMA-allocated buffers
<tomeu> bbrezillon: strace should tell?
<daniels> bbrezillon: sure, but I meant that the access is presumably Xwayland -> GL -> Mesa -> map dmabuf
<bbrezillon> on a side note, I looked at other drivers and they do not seem to check whether it's an imported BO or not before mmap()-ing it through the GEM path
<bbrezillon> daniels: oh, yes, that's probably the case
<tomeu> bbrezillon: btw, if you are on debian, you may need to only install debug symbols to get a nice trace
<bbrezillon> tomeu: what's the package name?
ente has joined #panfrost
<bbrezillon> something -dbg
<bbrezillon> tomeu: okay, have it now
<bbrezillon> it's something in _mesa_format_convert()
<tomeu> bbrezillon: what's the bt?
<tomeu> wonder if it isn't tiling an uploaded texture
<bbrezillon> glTexSubImage2D()?
<tomeu> so it's uploading data to a texture, and for some reason mesa decides it needs to be converted to another format
<tomeu> so not tiling
<bbrezillon> hm, not sure
<bbrezillon> there's definitely a format conversion
<bbrezillon> but it doesn't explain why dst is an imported BO
<tomeu> maybe because it's probably going to be scanned out
<tomeu> so it was allocated on the KMS driver
<tomeu> this should change with my modifiers branch
<bbrezillon> (assuming imported BO are not supposed to be written to, which, based on the other drivers I looked at, am not sure is a rule that's enforced by all drivers)
<tomeu> well, if it's a wayland client's surface then it's susceptible of being scanned out
<tomeu> so it will currently be allocated in the kms device
<tomeu> because a surface can end up in its own plane
<bbrezillon> ok, but is that really an invalid use case?
<bbrezillon> tomeu: maybe I'm missing something
<tomeu> yeah, I'm afraid it's valid
<tomeu> but hopefully we won't hit it :)
<bbrezillon> hm, not sure I like that :)
<bbrezillon> tomeu: I can mmap() the dmabuf directly (instead of importing the BO and then mapping the GEM object), but for some reason, the FD I'm being passed is opened RO
<bbrezillon> and mmap(RW) fails with EPERM
<bbrezillon> interestingly, if we import the BO (like we previously did), the dmabuf permission are completely ignored, and the resulting GEM obj can be mmap()-ed in RW mode
<daniels> bbrezillon: the buffer that TexSubImage is trying to upload to - where was it allocated?
<daniels> oh, I think I can answer this, actually
<daniels> glamor uses GBM as an actual generic buffer allocator - it'll allocate a single BO for a window's backing storage, then import that as an EGLImage, within the same process
<daniels> which is a legitimate thing which has to work
<daniels> e.g. if you changed https://gitlab.freedesktop.org/daniels/kms-quads/ to upload to the FBO with TexImage then you'd have the same problem
mani_s has joined #panfrost
gtucker has joined #panfrost
davidlt has joined #panfrost
mani_s has quit [Quit: ZNC 1.7.2 - https://znc.in]
raster has joined #panfrost
mani_s has joined #panfrost
<bbrezillon> tomeu, daniels: hm, do you know why BO are always exported in RO mode (drmPrimeHandleToFD() is not passed the DRM_RDWR flag)
<bbrezillon> ?
<daniels> bbrezillon: do you get RO unless you explicitly pass RW?
<bbrezillon> daniels: yes
<bbrezillon> but maybe that's what we want
<daniels> well, it depends on the intended usage ... for passing into KMS and to other clients (e.g. Wayland client -> compositor), we surely want the export to be RO
<daniels> but given that GPUs ignore the permissions and allow writes anyway, and that caching means that we might end up with an import with flags from a previous import, it seems sensible to allow RW regardless
<bbrezillon> if we change that tree-wide in mesa, I should be able to mmap() the dmabuf directly instead of going through an import (which no longer works after Steven's patch)
<bbrezillon> daniels: I just tested it, and if I patch the call in renderonly.c plus the one in pan_drm.c it works
<daniels> (note that direct access to a dmabuf's backing storage through mmap should be bracketed by dmabuf access begin/end calls btw)
<bbrezillon> daniels: yep
<bbrezillon> I have a comment in the code
<daniels> c'est bon
<bbrezillon> it's a bit complicated to do that right now
<bbrezillon> since we'd need to sync things before mmap()-ing, but also before submitting a job
<daniels> how do other drivers do it? import to a handle and then map through the handle, rather than map through the fd? or something else?
<bbrezillon> and before re-using the BO (after the GPU is done)
<bbrezillon> daniels: yep
<bbrezillon> and they also map on demand
<bbrezillon> which is not the case in panfrost yet
<bbrezillon> for buffers allocated by the GPU driver or display controller it shouldn't be a problem because they are mapped write-combine
<bbrezillon> (at least CPU-side, don't remember if things are coherent GPU-side)
<bbrezillon> daniels: BTW, I'm not sure the FD -> handle conversion provides guarantees on CPU <-> GPU sync, so, in theory, even without the dmabuf syncs, it shouldn't be worse than what we have right now
<bbrezillon> but it definitely deserves a FIXME
<daniels> (i wonder if this might be related to rk3288 flip-flops ...)
<bbrezillon> could be
yann has joined #panfrost
afaerber has quit [Quit: Leaving]
maciejjo has quit [Remote host closed the connection]
afaerber has joined #panfrost
xdarklight has joined #panfrost
maciejjo has joined #panfrost
<tomeu> yeah
<alyssa> tomeu: 'I don't know
<alyssa> ' never try to gaurantee anything about midgard/bifrost impl
<alyssa> tomeu: bbrezillon daniels: Keep in mind that winsys stuff is totally over my head, but one thing I can think of:
<alyssa> A lot of these apps seem to want to render to RGB10_A2. I explicitly disable that format in pan_screen's format_supported, and then Gallium has them render to RGBA8
<alyssa> ....but maybe it's still pretending to be RGB10_A2, and whenever the app does shm access Gallium is incurring a (possibly very) expensive software colourspace conversion
<bbrezillon> alyssa: the format conversion seems to be SW-based indeed
<bbrezillon> but that's not a reason to not support that (even if it should probably be optimized at some point)
<alyssa> bbrezillon: Do you know how often that code is running? (From a perf standpoint)
<alyssa> Is that every frame for shm clients?
<bbrezillon> didn't check
<tomeu> guess that perf should show that if it has a noticeable impact
<alyssa> The unfortunate part is that rendering to RGB10_A2 is slooow
<alyssa> because it's one of the formats that needs a blend shaders
<alyssa> *shader
<alyssa> Yes we can do it but we're just pushing slowness around
<alyssa> (If it's only shm clients, that's a different story, texturing from RGB10_A2 is ok)
jesse74 has joined #panfrost
jesse74 has left #panfrost [#panfrost]
jcureton has joined #panfrost
davidlt has quit [Ping timeout: 272 seconds]
davidlt has joined #panfrost
<bbrezillon> tomeu, daniels: given that the GPU takes care of cache maintenance at the beginning/end of each job chain (as explained by Steven) and the fact that BOs are mapped WC on the CPU side, I don't think the absence of explicit CPU <-> GPU syncs could be a problem
<bbrezillon> so, that's probably not the cause of those flip-flop we have on rk3288
Lyude has quit [Read error: Connection reset by peer]
<alyssa> bbrezillon: From my perspective in userspace, what's the significance of WC mapping in terms of making fast code?
<tomeu> hmm, wonder if some GPU versions behave differently in that regard
<tomeu> there's at least some erratas
Lyude has joined #panfrost
<bbrezillon> alyssa: don't read from the BO :)
<bbrezillon> WC (write-combine) means there's no caching
<bbrezillon> the only optimization is on the write path, where the CPU tries to buffer sequential write access to try grouping several small mem accesses into bigger ones
<bbrezillon> *to group
<raster> hmmm
<raster> fascinating
<raster> drmModeAtomicCommit() time out
<raster> or more specifically we dont see complete events at all
<raster> UNTIL i try and run another gl client - it fails entirely in eglCreateWindowSurface
<raster> but after this then i get complete events
* raster stares at the kernel with an evil eye
<alyssa> bbrezillon: Blah, ok
<alyssa> bbrezillon: So, is it safe to do random-access writes to a WC buffer?
<alyssa> Also, is there a sane way to test that we're doing this right and not incurring the cache wrath?
<HdkR> perf top :P
<raster> on arm write combining is not defined like x86
<alyssa> raster: *dun dun dunnnn*
<raster> i dont want to update/rebuild my kernel
<raster> why does the mainline dt for rockpro64 have to be so broken?
<raster> oh ... interesting
<raster> glmark in offscreen mode is not happy with panfrost
<alyssa> This is true..
<raster> but on screen it's gotten pretyt good
<raster> oh
<raster> fbos magically have been fixed
<raster> mostly
<raster> it seems
<raster> :)
<raster> good stuff
<alyssa> "Mostly", alas
<raster> kms seems to be unhappy tho
<raster> as i said
<alyssa> raster: See -bterrain for how far that's come
<alyssa> (Or -bdesktop)
<raster> i dont get completion events from swaps
<raster> yeah
<raster> i saw terrain :)
<raster> and the desktopones
<raster> :)
<raster> dang
<alyssa> Only known visual bug in glmark is the shadow -bideas is screwy
<raster> i may have to update my kernel
<raster> hmm
<raster> i got a segv
<raster> oh
<raster> the flip timeouts are because we drop back to sw rendering as eglCreateWindowSurface fails
<raster> until i somehow swizzle kernel state by another client also failing
<raster> oh
<raster> damn
<alyssa> For E?
<raster> its always failing now
<raster> yeah
<raster> well if i run elementary_test and enable gl
<raster> it fails (and segv's)
<raster> but then enlightenment starst without failures
<raster> so something gets swizzled in kernel to make it then work
<raster> i havent figured out what yet
<raster> but i'd need to get up to date with kernel as i'm probably a month or 2 behind
<raster> 5.2.0-rc3-next-20190605
<raster> not going to start that now today :)
<raster> but it does seem something somewhere may be broken with atomic flips of sw mmaped drm buffers
<raster> on this kernel anyway
<alyssa> There's been talk on the list about sw mmaping imported buffers being broken
<alyssa> So maybe that's realted
<alyssa> *related
<raster> it may be tho that'd be client buffers
<raster> but when egl init fails then we go back to sw
<raster> and we're getting issues there that our timeout checks are detecting (and complaining about) with nothing on screen
<raster> but no point doing much until i am up to date
<raster> :)
<raster> but
<raster> much has improved since my last check maybe ~2-4 weeks back
<alyssa> Full-time development does wonders for a driver :0
<raster> ahhahahahahahahahaa
<raster> :)
pH5 has joined #panfrost
<bbrezillon> alyssa: random access write is okay, if you don't care about perfs
<alyssa> So not ok? :p
<bbrezillon> okay as in "will do what you expect"
<bbrezillon> "but slowly"
<bbrezillon> :)
<alyssa> So not ok if I care about perf :(
<bbrezillon> obviously not
<bbrezillon> alyssa: what's the case you have in mind? sw-based linear <-> tiled transformation?
<alyssa> bbrezillon: Not even that, I just sometimes get lazy with what's GPU mapped and what's not and want to make sure I don't do anything silly
<bbrezillon> alyssa: the sw-based colorspace conversion is likely to fall in that "read from WC-mapped BO" case
<bbrezillon> I know that etnativ has an option to map enable caching
<bbrezillon> s/map//
<bbrezillon> but then you have to take care of cache maintainanc
stikonas has joined #panfrost
yann has quit [Ping timeout: 248 seconds]
<alyssa> ^ certainly could be worse
<alyssa> ...Right, I also still have sRGB blend shaders TODO. Oy vey
<alyssa> Blend shaders: the gift that keeps on giving (TM)
raster has quit [Remote host closed the connection]
tgall_foo has quit [Remote host closed the connection]
jcureton has quit [Remote host closed the connection]
tgall_foo has joined #panfrost
jcureton has joined #panfrost
pH5 has quit [Quit: -_-]
* alyssa probably messed up the stride for RGB5_A1 or something
afaerber has quit [Quit: Leaving]
davidlt has quit [Ping timeout: 272 seconds]
afaerber has joined #panfrost
yann has joined #panfrost
<alyssa> So, at least on T860 (probably different for other models, based on notes in some old code I have here afddafdas):
<alyssa> There are *3* types of loads you might encounter in a blend shader
<alyssa> 1) Raw load. Manually unpack the raw pixel with ALU ops.
<alyssa> 2) Raw load and accelerated unpack. Two LD/ST ops back-to-back.
<alyssa> 3) Unpacking load. Single ld/st op.
<alyssa> A given format maps to one of those. #1 is the most general, of course.
<alyssa> Blend shaders are going to make me lose my santy.
<HdkR> Sounds like fun :)
stikonas has quit [Remote host closed the connection]