#panfrost on 2019-07-03 — irc logs at freenode.irclog.whitequark.org

2019-02-15 17:52 alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature

01:25 vstehle has quit [Ping timeout: 248 seconds]

03:35 _whitelogger has joined #panfrost

04:44 * hanetzer is anticipating kernel 5.2; panfrost.ko should be in that, no?

04:46 griffinp has quit [Ping timeout: 248 seconds]

05:00 vstehle has joined #panfrost

05:18 mani_s_ has quit [Ping timeout: 268 seconds]

05:25 <tomeu> robher: I thought that userspace broke when mmap started failing noisily, so not mmaping would unbreak userspace?

05:31 <tomeu> robher: regarding testing, we have panfrost just now built as a module in next, so we should start running the panfrost igt tests and probably extend them to exercise this code path

05:32 <tomeu> well, if we want for userspace to be able to mmap an imported buffer :)

06:01 <bbrezillon> tomeu, alyssa, robher: just my 2cents regarding the mmap regression => we're talking about a driver that's been merged in 5.2-rc1 (5.2 is not even released yet), and AFAICT, the only user we have right now is the Panfrost driver in mesa, which is still under development

06:02 <bbrezillon> not to mention that it's the mesa side of the driver that was not conforming to the "don't mmap imported BO" rule

06:04 <bbrezillon> so maybe we should just fix the bug in mesa and inform everyone that they should update their mesa/panfrost driver if they want it to work with the latest kernel version

06:05 <bbrezillon> I know that the kernel is supposed to provide some guarantees about not changing existing behavior if some userspace lib/app rely on it, even if it's known to be broken

06:09 <bbrezillon> but given how recent the driver and driver user (mesa/panfrost) are, I'd suggest that we just bend the rules

06:26 milloni has quit [Remote host closed the connection]

06:27 milloni has joined #panfrost

06:34 <bbrezillon> alyssa, tomeu, robher: looks like dropping the mmap() on imported BO does not work, there's that tries to access the memory region

06:34 <bbrezillon> *there's something

06:50 <tomeu> bbrezillon: I'm not sure we'd be bending anything if no kernel has been released with that behavior

06:53 <bbrezillon> yep

06:53 <bbrezillon> also think so

06:56 stikonas has joined #panfrost

07:10 stikonas has quit [Remote host closed the connection]

07:25 yann has joined #panfrost

07:30 yann has quit [Ping timeout: 268 seconds]

07:40 <bbrezillon> tomeu, robher: getting rid of the mmap() on imported BO might be harder than we thought

07:41 <tomeu> did you find out what needs it?

07:41 <bbrezillon> I have a crash with xwayland

07:42 <bbrezillon> I'm not sure yet, but it seems like the GEM are exported, the DMABUF fd passed through a wayland message and imported on the other end

07:42 <bbrezillon> (client app I guess)

07:43 <bbrezillon> and this client tries to write to the surface

07:43 <tomeu> hmm

07:43 <tomeu> daniels: wonder if there's any way around it

07:43 <tomeu> guess that's basically server-allocated shm buffers

07:44 <bbrezillon> as I said, I'm not sure yet

07:44 <daniels> xwl doesn't use server-allocated buffers

07:44 <bbrezillon> I tried to mmap the dmabuf directly (mmap() on the dmabuf fd), but the FD has RO perm

07:44 <daniels> it allocates its own with gbm and passes them to the host compositor

07:45 <daniels> do you have a backtrace of where the failing map is called from?

07:45 <bbrezillon> yes, but my xwayland bin does not have debug symbols

07:45 <bbrezillon> so it's useless

07:50 <daniels> i wouldn't have thought that xwl would be mmaping client buffers directly; is it not coming through Mesa?

07:50 <tomeu> hmm

07:51 <tomeu> I'm changing things right now so all BOs are allocated by the panfrost driver, and imported into the KMS device for scanout

07:51 <tomeu> but I don't know if we can guarantee that all midgard+bifrost implementations will be coupled with graphics hw that can scanout whatever the GPU driver allocates

07:52 <bbrezillon> daniels: it's xwayland that crashes, so I'd say it's the one importing the BO and trying to write to it

07:52 <tomeu> specifically, if the graphics hw won't require CMA-allocated buffers

07:52 <tomeu> bbrezillon: strace should tell?

07:53 <daniels> bbrezillon: sure, but I meant that the access is presumably Xwayland -> GL -> Mesa -> map dmabuf

07:54 <bbrezillon> on a side note, I looked at other drivers and they do not seem to check whether it's an imported BO or not before mmap()-ing it through the GEM path

07:54 <bbrezillon> daniels: oh, yes, that's probably the case

08:00 <tomeu> bbrezillon: btw, if you are on debian, you may need to only install debug symbols to get a nice trace

08:03 <bbrezillon> tomeu: what's the package name?

08:04 ente has joined #panfrost

08:04 <bbrezillon> something -dbg

08:13 <bbrezillon> tomeu: okay, have it now

08:14 <bbrezillon> it's something in _mesa_format_convert()

08:17 <tomeu> bbrezillon: what's the bt?

08:17 <tomeu> wonder if it isn't tiling an uploaded texture

08:20 <bbrezillon> tomeu: http://code.bulix.org/9hwopr-784837

08:20 <bbrezillon> glTexSubImage2D()?

08:22 <tomeu> so it's uploading data to a texture, and for some reason mesa decides it needs to be converted to another format

08:22 <tomeu> so not tiling

08:24 <bbrezillon> hm, not sure

08:27 <bbrezillon> there's definitely a format conversion

08:27 <bbrezillon> but it doesn't explain why dst is an imported BO

08:27 <tomeu> maybe because it's probably going to be scanned out

08:28 <tomeu> so it was allocated on the KMS driver

08:28 <tomeu> this should change with my modifiers branch

08:28 <bbrezillon> (assuming imported BO are not supposed to be written to, which, based on the other drivers I looked at, I'm not sure is a rule that's enforced by all drivers)

08:29 <tomeu> well, if it's a wayland client's surface then it's susceptible of being scanned out

08:29 <tomeu> so it will currently be allocated in the kms device

08:29 <tomeu> because a surface can end up in its own plane

08:30 <bbrezillon> ok, but is that really an invalid use case?

08:38 <bbrezillon> tomeu: maybe I'm missing something

08:43 <tomeu> yeah, I'm afraid it's valid

08:43 <tomeu> but hopefully we won't hit it :)

08:45 <bbrezillon> hm, not sure I like that :)

08:46 <bbrezillon> tomeu: I can mmap() the dmabuf directly (instead of importing the BO and then mapping the GEM object), but for some reason, the FD I'm being passed is opened RO

08:46 <bbrezillon> and mmap(RW) fails with EPERM

08:49 <bbrezillon> interestingly, if we import the BO (like we previously did), the dmabuf permissions are completely ignored, and the resulting GEM obj can be mmap()-ed in RW mode

08:51 <daniels> bbrezillon: the buffer that TexSubImage is trying to upload to - where was it allocated?

08:51 <daniels> oh, I think I can answer this, actually

08:52 <daniels> glamor uses GBM as an actual generic buffer allocator - it'll allocate a single BO for a window's backing storage, then import that as an EGLImage, within the same process

08:52 <daniels> which is a legitimate thing which has to work

08:53 <daniels> e.g. if you changed https://gitlab.freedesktop.org/daniels/kms-quads/ to upload to the FBO with TexImage then you'd have the same problem

09:04 mani_s has joined #panfrost

09:05 gtucker has joined #panfrost

09:09 davidlt has joined #panfrost

09:10 mani_s has quit [Quit: ZNC 1.7.2 - https://znc.in]

09:15 raster has joined #panfrost

09:23 mani_s has joined #panfrost

09:32 <bbrezillon> tomeu, daniels: hm, do you know why BO are always exported in RO mode (drmPrimeHandleToFD() is not passed the DRM_RDWR flag)

09:32 <bbrezillon> ?

09:35 <daniels> bbrezillon: do you get RO unless you explicitly pass RW?

09:36 <bbrezillon> daniels: yes

09:36 <bbrezillon> but maybe that's what we want

09:37 <daniels> well, it depends on the intended usage ... for passing into KMS and to other clients (e.g. Wayland client -> compositor), we surely want the export to be RO

09:38 <daniels> but given that GPUs ignore the permissions and allow writes anyway, and that caching means that we might end up with an import with flags from a previous import, it seems sensible to allow RW regardless

09:39 <bbrezillon> if we change that tree-wide in mesa, I should be able to mmap() the dmabuf directly instead of going through an import (which no longer works after Steven's patch)

09:43 <bbrezillon> daniels: I just tested it, and if I patch the call in renderonly.c plus the one in pan_drm.c it works

09:45 <daniels> (note that direct access to a dmabuf's backing storage through mmap should be bracketed by dmabuf access begin/end calls btw)

09:46 <bbrezillon> daniels: yep

09:46 <bbrezillon> I have a comment in the code

09:46 <daniels> c'est bon

09:46 <bbrezillon> it's a bit complicated to do that right now

09:47 <bbrezillon> since we'd need to sync things before mmap()-ing, but also before submitting a job

09:47 <daniels> how do other drivers do it? import to a handle and then map through the handle, rather than map through the fd? or something else?

09:47 <bbrezillon> and before re-using the BO (after the GPU is done)

09:47 <bbrezillon> daniels: yep

09:48 <bbrezillon> and they also map on demand

09:48 <bbrezillon> which is not the case in panfrost yet

09:49 <bbrezillon> for buffers allocated by the GPU driver or display controller it shouldn't be a problem because they are mapped write-combine

09:50 <bbrezillon> (at least CPU-side, don't remember if things are coherent GPU-side)

09:52 <bbrezillon> daniels: BTW, I'm not sure the FD -> handle conversion provides guarantees on CPU <-> GPU sync, so, in theory, even without the dmabuf syncs, it shouldn't be worse than what we have right now

09:52 <bbrezillon> but it definitely deserves a FIXME
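
The bracketing daniels mentions at 09:45 can be sketched as follows. This is a minimal illustration, not the code from the Mesa patch under discussion: the constants are taken from the Linux uapi header <linux/dma-buf.h>, the ioctl number assumes the generic ioctl encoding (x86, arm, arm64), and the helper name dmabuf_cpu_access and its injectable ioctl parameter are hypothetical.

```python
import fcntl
import struct
from contextlib import contextmanager

# Flag values from <linux/dma-buf.h>.
DMA_BUF_SYNC_READ = 1 << 0
DMA_BUF_SYNC_WRITE = 2 << 0
DMA_BUF_SYNC_RW = DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE
DMA_BUF_SYNC_START = 0 << 2
DMA_BUF_SYNC_END = 1 << 2

# _IOW('b', 0, struct dma_buf_sync): struct dma_buf_sync holds a single
# __u64 `flags` field (8 bytes). Assumes the generic ioctl encoding; a few
# architectures (e.g. mips, powerpc) encode the direction bits differently.
_IOC_WRITE = 1
DMA_BUF_IOCTL_SYNC = (_IOC_WRITE << 30) | (8 << 16) | (ord("b") << 8) | 0


@contextmanager
def dmabuf_cpu_access(fd, flags=DMA_BUF_SYNC_RW, ioctl=fcntl.ioctl):
    """Bracket CPU access to an mmap()-ed dmabuf with SYNC_START/SYNC_END.

    `ioctl` is injectable only so the sketch can be exercised without a
    real dmabuf fd (hypothetical parameter, not part of any library).
    """
    ioctl(fd, DMA_BUF_IOCTL_SYNC, struct.pack("=Q", flags | DMA_BUF_SYNC_START))
    try:
        yield
    finally:
        # Always end the access window, even if the CPU access raised.
        ioctl(fd, DMA_BUF_IOCTL_SYNC, struct.pack("=Q", flags | DMA_BUF_SYNC_END))
```

With a dmabuf fd that was exported read-write (DRM_RDWR, as discussed above), the CPU writes to the mapping would go inside a `with dmabuf_cpu_access(fd):` block.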

09:55 <daniels> (i wonder if this might be related to rk3288 flip-flops ...)

10:03 <bbrezillon> could be

10:12 yann has joined #panfrost

11:44 afaerber has quit [Quit: Leaving]

11:57 maciejjo has quit [Remote host closed the connection]

11:58 afaerber has joined #panfrost

12:48 xdarklight has joined #panfrost

12:57 maciejjo has joined #panfrost

13:12 <tomeu> yeah

13:36 <alyssa> tomeu: 'I don't know

13:36 <alyssa> ' never try to guarantee anything about midgard/bifrost impl

13:38 <alyssa> tomeu: bbrezillon daniels: Keep in mind that winsys stuff is totally over my head, but one thing I can think of:

13:39 <alyssa> A lot of these apps seem to want to render to RGB10_A2. I explicitly disable that format in pan_screen's format_supported, and then Gallium has them render to RGBA8

13:40 <alyssa> ....but maybe it's still pretending to be RGB10_A2, and whenever the app does shm access Gallium is incurring a (possibly very) expensive software colourspace conversion

13:40 <bbrezillon> alyssa: the format conversion seems to be SW-based indeed

13:41 <bbrezillon> but that's not a reason to not support that (even if it should probably be optimized at some point)
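
For a sense of what a per-pixel software conversion between these two formats involves, here is a rough sketch. It is not Mesa's _mesa_format_convert() implementation; the function names are hypothetical, and the bit layout (R in the low 10 bits, then G, then B, with A in the top 2 bits) is an assumption matching the usual packed 2_10_10_10_REV ordering.

```python
def unorm_rescale(value, src_bits, dst_bits):
    """Rescale an unsigned-normalized integer to another bit width, rounding to nearest."""
    src_max = (1 << src_bits) - 1
    dst_max = (1 << dst_bits) - 1
    return (value * dst_max + src_max // 2) // src_max


def rgb10_a2_to_rgba8(pixel):
    """Unpack one 32-bit RGB10_A2 pixel into an (r, g, b, a) tuple of 8-bit values.

    Assumed layout: bits 0-9 R, 10-19 G, 20-29 B, 30-31 A.
    """
    r = pixel & 0x3FF
    g = (pixel >> 10) & 0x3FF
    b = (pixel >> 20) & 0x3FF
    a = (pixel >> 30) & 0x3
    return (
        unorm_rescale(r, 10, 8),
        unorm_rescale(g, 10, 8),
        unorm_rescale(b, 10, 8),
        unorm_rescale(a, 2, 8),
    )
```

Doing this for every pixel on the CPU, reading the source out of a buffer that may be mapped uncached, is where the cost alyssa is worried about would come from.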

13:41 <alyssa> bbrezillon: Do you know how often that code is running? (From a perf standpoint)

13:41 <alyssa> Is that every frame for shm clients?

13:42 <bbrezillon> didn't check

13:43 <tomeu> guess that perf should show that if it has a noticeable impact

13:45 <alyssa> The unfortunate part is that rendering to RGB10_A2 is slooow

13:45 <alyssa> because it's one of the formats that needs a blend shaders

13:45 <alyssa> *shader

13:45 <alyssa> Yes we can do it but we're just pushing slowness around

13:46 <alyssa> (If it's only shm clients, that's a different story, texturing from RGB10_A2 is ok)

14:03 jesse74 has joined #panfrost

14:04 jesse74 has left #panfrost [#panfrost]

14:05 jcureton has joined #panfrost

14:20 davidlt has quit [Ping timeout: 272 seconds]

14:56 davidlt has joined #panfrost

15:13 <bbrezillon> tomeu, daniels: given that the GPU takes care of cache maintenance at the beginning/end of each job chain (as explained by Steven) and the fact that BOs are mapped WC on the CPU side, I don't think the absence of explicit CPU <-> GPU syncs could be a problem

15:13 <bbrezillon> so, that's probably not the cause of those flip-flops we have on rk3288

15:18 Lyude has quit [Read error: Connection reset by peer]

15:36 <alyssa> bbrezillon: From my perspective in userspace, what's the significance of WC mapping in terms of making fast code?

15:36 <tomeu> hmm, wonder if some GPU versions behave differently in that regard

15:37 <tomeu> there's at least some errata

15:38 Lyude has joined #panfrost

15:39 <bbrezillon> alyssa: don't read from the BO :)

15:39 <bbrezillon> WC (write-combine) means there's no caching

15:40 <bbrezillon> the only optimization is on the write path, where the CPU tries to buffer sequential write access to try grouping several small mem accesses into bigger ones

15:41 <bbrezillon> *to group

15:44 <bbrezillon> alyssa: https://fgiesen.wordpress.com/2013/01/29/write-combining-is-not-your-friend/
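
The access pattern bbrezillon describes (never read from the WC mapping, and write to it sequentially) can be sketched like this: assemble the data in ordinary cached memory, then copy it into the mapping in one forward pass. This is only an illustration of the pattern. The function name upload_rows is hypothetical, and the anonymous mmap used in the usage note is ordinary cached memory standing in for a real WC-mapped BO.

```python
import mmap


def upload_rows(dst, rows, stride):
    """Copy `rows` (bytes objects) into `dst`, one row every `stride` bytes.

    All scattered writes happen in a cached staging buffer; `dst` (the
    stand-in for a write-combined mapping) is only ever written, once,
    front to back, and never read.
    """
    staging = bytearray(stride * len(rows))
    for y, row in enumerate(rows):
        if len(row) > stride:
            raise ValueError("row longer than stride")
        staging[y * stride : y * stride + len(row)] = row
    # Single sequential, write-only copy into the mapping.
    dst[0 : len(staging)] = staging
```

For example, `upload_rows(mmap.mmap(-1, 16), [b"abc", b"de"], 8)` fills a 16-byte anonymous mapping with two zero-padded 8-byte rows.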

16:04 <raster> hmmm

16:04 <raster> fascinating

16:04 <raster> drmModeAtomicCommit() time out

16:05 <raster> or more specifically we dont see complete events at all

16:06 <raster> UNTIL i try and run another gl client - it fails entirely in eglCreateWindowSurface

16:06 <raster> but after this then i get complete events

16:07 * raster stares at the kernel with an evil eye

16:10 <alyssa> bbrezillon: Blah, ok

16:11 <alyssa> bbrezillon: So, is it safe to do random-access writes to a WC buffer?

16:12 <alyssa> Also, is there a sane way to test that we're doing this right and not incurring the cache wrath?

16:12 <HdkR> perf top :P

16:13 <raster> on arm write combining is not defined like x86

16:14 <alyssa> raster: *dun dun dunnnn*

16:15 <raster> i dont want to update/rebuild my kernel

16:15 <raster> why does the mainline dt for rockpro64 have to be so broken?

16:16 <raster> oh ... interesting

16:16 <raster> glmark in offscreen mode is not happy with panfrost

16:18 <alyssa> This is true..

16:18 <raster> but on screen it's gotten pretty good

16:18 <raster> oh

16:18 <raster> fbos magically have been fixed

16:18 <raster> mostly

16:18 <raster> it seems

16:18 <raster> :)

16:18 <raster> good stuff

16:18 <alyssa> "Mostly", alas

16:18 <raster> kms seems to be unhappy tho

16:18 <raster> as i said

16:18 <alyssa> raster: See -bterrain for how far that's come

16:18 <alyssa> (Or -bdesktop)

16:18 <raster> i dont get completion events from swaps

16:19 <raster> yeah

16:19 <raster> i saw terrain :)

16:19 <raster> and the desktop ones

16:19 <raster> :)

16:19 <raster> dang

16:19 <alyssa> Only known visual bug in glmark is the shadow -bideas is screwy

16:19 <raster> i may have to update my kernel

16:19 <raster> hmm

16:19 <raster> i got a segv

16:20 <raster> oh

16:20 <raster> the flip timeouts are because we drop back to sw rendering as eglCreateWindowSurface fails

16:20 <raster> until i somehow swizzle kernel state by another client also failing

16:21 <raster> oh

16:21 <raster> damn

16:21 <alyssa> For E?

16:21 <raster> its always failing now

16:21 <raster> yeah

16:21 <raster> well if i run elementary_test and enable gl

16:22 <raster> it fails (and segv's)

16:22 <raster> but then enlightenment starts without failures

16:22 <raster> so something gets swizzled in kernel to make it then work

16:22 <raster> i havent figured out what yet

16:22 <raster> but i'd need to get up to date with kernel as i'm probably a month or 2 behind

16:22 <raster> 5.2.0-rc3-next-20190605

16:23 <raster> not going to start that now today :)

16:26 <raster> but it does seem something somewhere may be broken with atomic flips of sw mmaped drm buffers

16:26 <raster> on this kernel anyway

16:26 <alyssa> There's been talk on the list about sw mmaping imported buffers being broken

16:26 <alyssa> So maybe that's realted

16:27 <alyssa> *related

16:27 <raster> it may be tho that'd be client buffers

16:27 <raster> but when egl init fails then we go back to sw

16:28 <raster> and we're getting issues there that our timeout checks are detecting (and complaining about) with nothing on screen

16:28 <raster> but no point doing much until i am up to date

16:28 <raster> :)

16:28 <raster> but

16:29 <raster> much has improved since my last check maybe ~2-4 weeks back

16:30 <alyssa> Full-time development does wonders for a driver :0

16:30 <raster> ahhahahahahahahahaa

16:30 <raster> :)

16:41 pH5 has joined #panfrost

16:52 <bbrezillon> alyssa: random access write is okay, if you don't care about perfs

16:53 <alyssa> So not ok? :p

16:53 <bbrezillon> okay as in "will do what you expect"

16:53 <bbrezillon> "but slowly"

16:53 <bbrezillon> :)

16:57 <alyssa> So not ok if I care about perf :(

16:57 <bbrezillon> obviously not

16:58 <bbrezillon> alyssa: what's the case you have in mind? sw-based linear <-> tiled transformation?

16:59 <alyssa> bbrezillon: Not even that, I just sometimes get lazy with what's GPU mapped and what's not and want to make sure I don't do anything silly

17:02 <bbrezillon> alyssa: the sw-based colorspace conversion is likely to fall in that "read from WC-mapped BO" case

17:03 <bbrezillon> I know that etnaviv has an option to map enable caching

17:03 <bbrezillon> s/map//

17:05 <bbrezillon> but then you have to take care of cache maintenance

17:13 stikonas has joined #panfrost

17:32 yann has quit [Ping timeout: 248 seconds]

17:47 <alyssa> https://people.collabora.com/~alyssa/formats.txt

17:47 <alyssa> ^ certainly could be worse

17:52 <alyssa> ...Right, I also still have sRGB blend shaders TODO. Oy vey

17:52 <alyssa> Blend shaders: the gift that keeps on giving (TM)

17:56 raster has quit [Remote host closed the connection]

18:41 tgall_foo has quit [Remote host closed the connection]

19:03 jcureton has quit [Remote host closed the connection]

19:37 tgall_foo has joined #panfrost

19:51 jcureton has joined #panfrost

20:11 pH5 has quit [Quit: -_-]

20:45 * alyssa probably messed up the stride for RGB5_A1 or something

21:34 afaerber has quit [Quit: Leaving]

21:45 davidlt has quit [Ping timeout: 272 seconds]

22:09 afaerber has joined #panfrost

22:37 yann has joined #panfrost

22:48 <alyssa> So, at least on T860 (probably different for other models, based on notes in some old code I have here afddafdas):

22:48 <alyssa> There are *3* types of loads you might encounter in a blend shader

22:48 <alyssa> 1) Raw load. Manually unpack the raw pixel with ALU ops.

22:48 <alyssa> 2) Raw load and accelerated unpack. Two LD/ST ops back-to-back.

22:49 <alyssa> 3) Unpacking load. Single ld/st op.

22:49 <alyssa> A given format maps to one of those. #1 is the most general, of course.

23:05 <alyssa> Blend shaders are going to make me lose my sanity.

23:14 <HdkR> Sounds like fun :)

23:49 stikonas has quit [Remote host closed the connection]