#panfrost on 2019-07-01 — irc logs at freenode.irclog.whitequark.org

2019-02-15 17:52 alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature

00:57 NeuroScr has joined #panfrost

01:04 vstehle has quit [Ping timeout: 246 seconds]

01:36 NeuroScr has quit [Quit: NeuroScr]

03:14 marcodiego has quit [Quit: Leaving]

03:20 robink has joined #panfrost

04:20 rcf has quit [Ping timeout: 258 seconds]

05:00 vstehle has joined #panfrost

05:31 Elpaulo has quit [Read error: Connection reset by peer]

05:33 Elpaulo has joined #panfrost

06:24 pH5 has joined #panfrost

07:27 stikonas has joined #panfrost

07:41 stikonas has quit [Remote host closed the connection]

07:43 gcl_ has quit [Quit: Moving day; offline until I have Internet again.]

07:48 yann has quit [Ping timeout: 272 seconds]

09:08 yann has joined #panfrost

09:20 raster has joined #panfrost

09:23 raster has quit [Remote host closed the connection]

09:24 raster has joined #panfrost

10:26 _whitelogger has joined #panfrost

11:03 jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]

11:24 rhyskidd has joined #panfrost

11:44 rhyskidd has quit [Quit: rhyskidd]

11:45 rhyskidd has joined #panfrost

11:53 davidlt has joined #panfrost

12:05 rhyskidd has quit [Quit: rhyskidd]

12:05 davidlt has quit [Remote host closed the connection]

12:06 rhyskidd has joined #panfrost

12:07 davidlt has joined #panfrost

12:26 rhyskidd has quit [Quit: rhyskidd]

12:27 rhyskidd has joined #panfrost

12:27 rhyskidd has quit [Remote host closed the connection]

12:27 rhyskidd has joined #panfrost

12:41 <bbrezillon> robher, tomeu: I'm currently debugging an iommu fault issue, and I think it might come from the dmabuf import/export logic

12:41 <bbrezillon> sorry, just import, not export

12:42 <bbrezillon> for each dmabuf import we create a new GEM object and map the region on the GPU side

12:43 rhyskidd has quit [Ping timeout: 258 seconds]

12:43 <bbrezillon> now, imagine the userspace app keeps importing the same dmabuf and releasing the gem object that has been created from it

12:44 <bbrezillon> we might have a race between the mmu_map() of the newly imported dmabuf and the mmu_unmap() of the destroyed GEM, leading to a situation where the new GEM is actually not mapped GPU-side

12:47 <tomeu> looks like a good one

12:47 <tomeu> bbrezillon: how do other drivers deal with this?

12:48 <tomeu> bbrezillon: btw, I'm implementing caching of BOs, because if a BO is imported twice, when it is closed for the first time a dangling reference will remain

12:48 <tomeu> could be related to the symptoms you are seeing

12:51 <bbrezillon> tomeu: hm, probably not, I mean, the thing you're describing would lead to a mem leak, not a use after free

12:51 <bbrezillon> (well, it's more a use-after-unmap than a use-after-free)

12:52 <tomeu> use-after-unmap, yes

12:52 <tomeu> which would cause page faults

12:53 <bbrezillon> tomeu: isn't caching of BOs (which I'm sure is useful for other reasons) actually hiding the real problem here?

12:53 <tomeu> well, I think it's adding reference counting on top of a resource that isn't reference-counted

12:53 <bbrezillon> you mean, on the mesa end?

12:56 jolan has quit [Quit: leaving]

12:57 jolan has joined #panfrost

12:59 <tomeu> yep
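
The reference counting tomeu describes above can be sketched as a small userspace table keyed by GEM handle. This is a toy model in Python, not Mesa code: it assumes that importing the same dmabuf twice yields the same handle, and `BoCache`, `import_handle`, `release` and the `close_handle` callback are hypothetical names standing in for the real import and close paths.

```python
class BoCache:
    """Toy model of a userspace BO table keyed by GEM handle."""

    def __init__(self, close_handle):
        # close_handle is a hypothetical stand-in for the GEM close ioctl.
        self._close_handle = close_handle
        self._refs = {}  # handle -> number of live imports

    def import_handle(self, handle):
        """Record one more user of an already known or new handle."""
        self._refs[handle] = self._refs.get(handle, 0) + 1
        return handle

    def release(self, handle):
        """Drop one reference; close the handle only when none remain.

        Returns True if the handle was actually closed.
        """
        count = self._refs[handle] - 1
        if count > 0:
            self._refs[handle] = count
            return False
        del self._refs[handle]
        self._close_handle(handle)
        return True
```

With such a table, the first release of a doubly imported BO only drops a reference; the handle is closed when the last user goes away.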

13:00 <robher> I think multiple imports is not something the kernel deals with. Android does this and bringup of every driver requires adding the support in Mesa.

13:02 <bbrezillon> robher: well, looks like Xwayland keeps importing dmabuf/freeing resulting objs at a high rate, and since the cleanup is happening asynchronously in the DRM driver, I fear we fall in the case I described above

13:03 afaerber has quit [Quit: Leaving]

13:12 <bbrezillon> robher: I'm probably wrong, if things happen concurrently we'd have 2 different GPU mem regions pointing to the same BO (drm_mm_insert_node_generic() is supposed to get a free region), and what I'm seeing is the same GPU-mem range re-used over and over, and suddenly an iommu fault

13:15 <bbrezillon> robher, tomeu: interestingly, if I call panfrost_drm_free_imported_bo() (which should probably be killed BTW) instead of panfrost_drm_free_slab() things are a bit more stable

13:15 <tomeu> ah yes, I'm carrying that change here :/

13:16 <tomeu> I think I would kill all of pan_drm.c if I had some spare time

13:16 <bbrezillon> why?

13:17 <tomeu> because there's a lot of stuff that should be in pan_resource.c (or a new pan_bo.c), and the rest isn't that interesting

13:18 <tomeu> guess having pan_resource.c and pan_bo.c would make us more similar to the other gallium drivers

13:18 <bbrezillon> makes sense

13:20 afaerber has joined #panfrost

13:20 <tomeu> btw, I'm looking now at why, after some time, the mmap in import_bo fails with ENOMEM

13:21 <bbrezillon> tomeu: on arm32 or arm64?

13:22 <bbrezillon> panfrost_drm_free_imported_bo() seems to leak the CPU mapping

13:22 <bbrezillon> but it's never called anyway (unless you have a patch that assigns ->imported to true in the import BO path)

13:23 <tomeu> yeah

13:23 <bbrezillon> keeping the cpu mapping alive might also explain why I don't see the page fault in that case (I see other page faults though)

13:25 * bbrezillon goes check what's done in the mmap

13:25 <tomeu> robher: is anything blocking growable, btw?

13:26 <robher> tomeu: implementing shrinker support so we free it on memory pressure.

13:27 <robher> though maybe that can be done later. I guess we're better off growing than allocating it all up front.

13:27 <robher> We just delay OOM...

13:29 <tomeu> robher: yeah

13:30 <tomeu> and userspace can move forward with using the new UABI

15:24 <bbrezillon> tomeu, robher: and the fix is a one-liner, as usual

15:24 <tomeu> bbrezillon: cannot wait to see it!

15:26 pH5 has quit [Quit: bye]

15:27 <bbrezillon> tomeu: I just sent the patch

15:27 <bbrezillon> and you're in Cc

15:27 <tomeu> oh, cool!

15:36 <tomeu> bbrezillon: may be a good idea to add the BO to bo_handles, in panfrost_drm_submit_job

15:36 <tomeu> guess that omission could be causing some flip-flops as well

16:02 <alyssa> gfx driver dev is hard

16:02 <alyssa> why didn't i listen to all those people that tried to warn me

16:03 <alyssa> bbrezillon: Are you working on the transient stuff? (I might take another stab if you're not, but if you are that's great too :) )

16:05 <alyssa> ...or I could hack on MRT :angel:

16:17 <bbrezillon> alyssa: I'm not

16:17 <bbrezillon> thought this was shadeslayer's task

16:18 <bbrezillon> tomeu: yes, I wondered why we're not passing them to the driver

16:18 <alyssa> bbrezillon: Oh, yeah, maybe you're right

16:18 * alyssa can't keep track

16:18 * alyssa is already lost in GLES3 lalaland

16:19 pH5 has joined #panfrost

16:19 <bbrezillon> tomeu: this being said, I'm not sure it would be any safer to pass them to the driver

16:19 <bbrezillon> since userspace is still in charge of this BO list creation

16:23 jernej has joined #panfrost

16:33 <urjaman> alyssa: sometimes, it's better not to listen to those people :P

16:34 <alyssa> urjaman: Fair.

16:46 herbmillerjr has quit [Quit: Konversation terminated!]

16:52 herbmillerjr has joined #panfrost

17:10 yann has quit [Ping timeout: 272 seconds]

17:11 raster has quit [Read error: Connection reset by peer]

17:23 stikonas_ has joined #panfrost

18:34 TheKit has quit [Ping timeout: 245 seconds]

18:47 <alyssa> Render target format madness

18:47 * alyssa shivers

18:59 <bbrezillon> tomeu: hm, actually I omitted one important aspect that's taken care of kernel-side => wait on BO fences

19:00 <bbrezillon> so yes, I guess we should pass all BOs attached to a job to the SUBMIT ioctl
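
Why the completeness of the BO list matters can be illustrated with a toy model. It assumes, as the exchange above suggests, that the kernel only waits on fences of BOs named in the submit call; `jobs_to_wait_for` and `pending_writers` are hypothetical names, and nothing here is the actual SUBMIT ioctl interface.

```python
def jobs_to_wait_for(pending_writers, bo_handles):
    """Return the ids of in-flight jobs that a new job must wait for.

    pending_writers maps a BO handle to the id of the in-flight job still
    using that BO (a stand-in for the BO's fence). Only BOs named in
    bo_handles are considered, which is the point of the example.
    """
    return {pending_writers[h] for h in bo_handles if h in pending_writers}


# Job 1 is still rendering to BO 10; the new job uses BOs 10 and 11.
pending = {10: 1}
complete_list = jobs_to_wait_for(pending, [10, 11])  # BO 10 is listed
partial_list = jobs_to_wait_for(pending, [11])       # BO 10 is omitted
```

In this model the complete list makes the new job wait for job 1, while the partial list yields no dependency at all, so the new job could run while BO 10 is still in use.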

19:08 <alyssa> Oh, lovely, even the MRT hardware doesn't support MRT sanely.

19:16 TheKit has joined #panfrost

19:23 <alyssa> So, nir_format_convert.h seems to cover a lot of what will be needed

19:23 <alyssa> I guess the trick will be to integrate that with the NIR blend shader code

19:23 <alyssa> So then we get blend shaders to do the heavy lifting

19:46 yann has joined #panfrost

20:22 pH5 has quit [Quit: bye]

20:45 jcureton has joined #panfrost

20:50 <jcureton> hi! i've been following T720 development, and haven't seen any updates for a while here or on the dri-devel list. I'm currently running with the 32-bit job hacks in Mesa to deal with a 64-bit userspace. is the plan still for the kernel driver to unify to 64-bit descriptors even on the smaller GPU designs?

20:51 <alyssa> jcureton: That's the plan, yeah!

20:51 <alyssa> I think tomeu was working on that..?

20:51 <jcureton> thanks alyssa! yeah, i was under the impression tomeu would be the right person to ask

21:25 hlmjr has joined #panfrost

21:28 herbmilleriw has quit [Ping timeout: 252 seconds]

21:32 afaerber has quit [Quit: Leaving]

21:46 afaerber has joined #panfrost

21:47 <alyssa> Okay, so I guess the trick will be to add new intrinsics so we can do type conversion explicitly.

21:55 stikonas_ has quit [Ping timeout: 252 seconds]

22:07 herbmilleriw has joined #panfrost

22:10 hlmjr has quit [Ping timeout: 268 seconds]

22:11 maciejjo has quit [Ping timeout: 244 seconds]

22:18 davidlt has quit [Ping timeout: 245 seconds]

22:47 stikonas_ has joined #panfrost

22:47 NeuroScr has joined #panfrost

23:02 stikonas_ has quit [Remote host closed the connection]

23:04 stikonas_ has joined #panfrost

23:04 stikonas_ has quit [Remote host closed the connection]

23:43 NeuroScr has quit [Quit: NeuroScr]