<bbrezillon>
I realized DRM_IOCTL_SYNCOBJ_TRANSFER is only supported if the driver has the SYNCOBJ_TIMELINE cap set, which is not yet the case on panfrost
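    (A minimal sketch in C of probing for timeline-syncobj support before relying on DRM_IOCTL_SYNCOBJ_TRANSFER; only existing libdrm calls are used, the helper names are made up:)

    #include <stdbool.h>
    #include <stdint.h>
    #include <xf86drm.h>

    /* The transfer ioctl is rejected by drivers that don't expose the
     * timeline capability, so probe it first. */
    static bool
    has_syncobj_transfer(int drm_fd)
    {
       uint64_t cap = 0;

       return !drmGetCap(drm_fd, DRM_CAP_SYNCOBJ_TIMELINE, &cap) && cap;
    }

    /* Copy the current fence of src into dst (binary-to-binary transfer). */
    static int
    copy_syncobj_fence(int drm_fd, uint32_t dst, uint32_t src)
    {
       return drmSyncobjTransfer(drm_fd, dst, 0, src, 0, 0);
    }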
<bbrezillon>
ok, so that works for intra-cmdbuf signaling, but IIUC, vkEvent also handles CPU <-> GPU sync, and inter-queue/cmdbuf sync
<tomeu>
tbh, I thought this was such a huge hammer that it covers all possible sync needs
<bbrezillon>
I think we need to embed a syncobj in our vkevent, and then transfer the batch out_fence to those syncobjs
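    (A rough sketch of that idea; struct and function names are hypothetical, not actual panvk code:)

    #include <stdint.h>
    #include <xf86drm.h>

    /* Hypothetical: one DRM syncobj embedded per VkEvent. */
    struct panvk_event {
       uint32_t syncobj;
    };

    /* After queueing a batch, mirror its out_fence into every event targeted
     * by a vkCmdSetEvent recorded in that batch. */
    static int
    propagate_out_fence(int drm_fd, uint32_t batch_out_sync,
                        struct panvk_event **events, unsigned count)
    {
       for (unsigned i = 0; i < count; i++) {
          int ret = drmSyncobjTransfer(drm_fd, events[i]->syncobj, 0,
                                       batch_out_sync, 0, 0);
          if (ret)
             return ret;
       }
       return 0;
    }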
<bbrezillon>
tomeu: no, what you do doesn't really wait for the batch to complete
<bbrezillon>
unless you have the trace or sync debug options
<tomeu>
but, that wait is the same one we do when the trace or sync debug options are enabled
<bbrezillon>
ok, maybe I missed something
<bbrezillon>
I'm a bit lost, you wait for events before queueing the batch?
<tomeu>
before queueing the second half of the batch we split
<tomeu>
but of course, after submitting the first half
<bbrezillon>
oh, I get it now
<bbrezillon>
still not a huge fan of this wait in the submit path
<bbrezillon>
I feel like the syncobj solution wouldn't be much more complicated
<bbrezillon>
and would avoid this wait
<tomeu>
yeah, tbh, I would prefer to get conformant first
<tomeu>
or at least conformant enough so we can run deqp in CI
<tomeu>
then it will be much easier to improve stuff without regressing
<tomeu>
also, I'm not really sure how much better waiting in the kernel would be if we had a separate submission thread
<tomeu>
but it's something I would very gladly leave for the future :)
<tomeu>
this passes all the dEQP-VK.synchronization.basic.* tests and allows me to move on to the dEQP-VK.api.command_buffers.* ones
<bbrezillon>
yes, but again, I'm not sure what you're doing here is correct
<bbrezillon>
I still don't see how GPU waits can work
<tomeu>
so, when we insert a wait cmd, any subsequent commands will be submitted after all the previous ones have completed
<tomeu>
inefficient, but seems correct to me?
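    (A sketch of the scheme being described, with the wait sitting in the submit path; everything here is illustrative:)

    #include <stdint.h>
    #include <xf86drm.h>

    /* Before submitting the half of the batch that follows a wait command,
     * block until every wait-event's syncobj has signaled. */
    static int
    wait_events_then_submit(int drm_fd, uint32_t *event_syncobjs,
                            uint32_t count)
    {
       if (count) {
          int ret = drmSyncobjWait(drm_fd, event_syncobjs, count, INT64_MAX,
                                   DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL, NULL);
          if (ret)
             return ret;
       }

       /* ...then issue the SUBMIT ioctl for the second half of the batch. */
       return 0;
    }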
<bbrezillon>
that's only intra-queue sync
<bbrezillon>
AFAICT
<tomeu>
bbrezillon: vkQueue you mean? or hw queue?
<bbrezillon>
vkQueue
<tomeu>
how does that work, multiple app threads could each have their own vkQueue and call the submit ioctl concurrently?
<tomeu>
ah, looks like that case isn't supported by the spec
<tomeu>
"Events must not be used to insert a dependency between commands submitted to different queues."
<tomeu>
and earlier:
<tomeu>
"Events are a synchronization primitive that can be used to insert a fine-grained dependency between commands submitted to the same queue, or between the host and a queue. "
<bbrezillon>
ok
<amonakov>
(inter-thread synch is done via semaphores)
<amonakov>
erm, inter-queue I mean
<bbrezillon>
that leaves the "GPU waits on CPU signaling an event" case, which AFAICT is not handled
<bbrezillon>
or did I miss it
<bbrezillon>
tomeu: and I'd rather think that through than introduce an implementation that adds waits in the submit path
<tomeu>
yeah, we miss that, v3dv handles those waits in a separate thread
<bbrezillon>
yeah, but if you do that you clearly can't wait in the submit path
<bbrezillon>
you'd need to have a deferred queue for batches that have such deps
<tomeu>
yep
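    (A very rough sketch of that deferred-queue idea, reportedly what v3dv does; all names are hypothetical:)

    #include <pthread.h>
    #include <stdbool.h>

    struct deferred_batch {
       struct deferred_batch *next;
       /* event syncobjs to wait on, submit payload, ... */
    };

    struct panvk_queue_thread {
       pthread_t thread;
       pthread_mutex_t lock;
       pthread_cond_t cond;
       struct deferred_batch *pending;
       bool stop;
    };

    /* Batches with host-signaled-event deps go on q->pending instead of
     * being submitted directly; this thread flushes them once the events
     * signal, so vkQueueSubmit itself never blocks. */
    static void *
    submit_thread(void *data)
    {
       struct panvk_queue_thread *q = data;

       pthread_mutex_lock(&q->lock);
       while (!q->stop) {
          while (!q->pending && !q->stop)
             pthread_cond_wait(&q->cond, &q->lock);

          while (q->pending) {
             struct deferred_batch *batch = q->pending;
             q->pending = batch->next;
             pthread_mutex_unlock(&q->lock);

             /* wait on the batch's event syncobjs, then SUBMIT it */

             pthread_mutex_lock(&q->lock);
          }
       }
       pthread_mutex_unlock(&q->lock);
       return NULL;
    }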
<bbrezillon>
which would make the implementation a lot messier than if we were using syncobjs and extending the SUBMIT_IOCTL to allow unsignaling a syncobj (or any other solution we find to handle the reset case)
<tomeu>
hmm, wonder if we need to unsignal a syncobj, if we couldn't just replace an event's syncobj with a new one
<bbrezillon>
you can definitely transfer an unsignaled fence to a syncobj
<tomeu>
ok, that would work
<tomeu>
I'm wondering now if we shouldn't then use write_value jobs for the set and reset commands
<tomeu>
then add an ioctl for the kernel to poll a memory region and signal when it changes value
<bbrezillon>
ok, so a fence for the sync, and a GPU buf for the value?
<tomeu>
yep
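    (The poll ioctl mentioned above doesn't exist; purely to make the idea concrete, its UAPI could look something like this, all of it hypothetical:)

    #include <linux/types.h>

    /* Hypothetical ioctl payload: the GPU uses WRITE_VALUE jobs to set/reset
     * a word in a BO, and the kernel signals out_sync once that word reads
     * back the expected value. */
    struct drm_panfrost_wait_bo_value {
       __u32 handle;     /* BO holding the event word */
       __u32 offset;     /* offset of the word within the BO */
       __u32 value;      /* value to wait for (e.g. "set") */
       __u32 out_sync;   /* syncobj to signal when the word matches */
    };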
<tomeu>
we would need to get the caching right I guess
<tomeu>
as it would be writable by both gpu and cpu
<bbrezillon>
if it's in a separate batch it shouldn't be an issue
<bbrezillon>
(at least not yet :))
<tomeu>
then we would be only splitting batches on waits
<tomeu>
ok, let me think about all this over lunch
<bbrezillon>
GPU buffers are mapped NC on the CPU side, that leaves GPU caches, which are flushed at job boundaries
<tomeu>
guess if we find a relatively simple solution that is optimal and we can be sure it's the way forward, it's better than a more complex one that doesn't require kernel changes
<bbrezillon>
but I wouldn't throw away the syncobj-only solution so quickly
<bbrezillon>
I mean, if there's a way to transfer an unsignaled fence to the vkevent syncobj, it should work without requiring kernel changes
<bbrezillon>
just not sure when this transfer should happen
<bbrezillon>
and I feel it would have to be done kernel side
<tomeu>
ah, I thought we needed to do it in the kernel so we don't have to block as we do now
<bbrezillon>
most likely yes
<tomeu>
then maybe the write_value option is better because we avoid splits?
<bbrezillon>
I mean, unsignaling the syncobj should happen when the job is done executing, not earlier
<tomeu>
the mali supports only two queued jobs, so small chains are worse for latency than on other gpus
<bbrezillon>
we avoid splits at the expense of active waits
<bbrezillon>
and it's still not clear to me how you'd implement GPU waits without splitting the batch
<tomeu>
with write_value and a poll ioctl? I think we don't have any additional active waits that way
<bbrezillon>
the only thing you could avoid is splits on Set/Reset
<tomeu>
for the waits we need to split
<tomeu>
because the HW doesn't support it
<tomeu>
the question is now whether the wait should happen in the kernel or in userspace
<tomeu>
yeah, I think we are in agreement on that
<bbrezillon>
if GPU has to wait, I'd rather do that in kernel space, with an explicit dep (in_sync)
<tomeu>
yeah, I also like that
<tomeu>
in the mem-backed solution, it would be waiting on the syncobj returned by the new poll ioctl
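    (A sketch of expressing that GPU-side wait as an explicit kernel dependency through the existing in_syncs field of drm_panfrost_submit; in_sync here would be the event's syncobj, or the one signaled by the hypothetical poll ioctl:)

    #include <stdint.h>
    #include <xf86drm.h>
    #include "panfrost_drm.h"   /* kernel UAPI header, path depends on the tree */

    static int
    submit_with_in_sync(int drm_fd, uint64_t jc, uint32_t in_sync,
                        uint32_t out_sync)
    {
       struct drm_panfrost_submit submit = {
          .jc = jc,
          .in_syncs = (uintptr_t)&in_sync,   /* array of one u32 syncobj handle */
          .in_sync_count = 1,
          .out_sync = out_sync,
          /* bo_handles/requirements omitted for brevity */
       };

       return drmIoctl(drm_fd, DRM_IOCTL_PANFROST_SUBMIT, &submit);
    }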
<bbrezillon>
I feel like the "split batches on events" option, with a way to unsignal syncobjs would be the simplest solution, but I guess we can try the write-value approach too
<tomeu>
by unsignaling syncobjs, that would be transferring a new syncobj to the event's?
<bbrezillon>
no, same syncobj, new unsignaled fence attached to it
<tomeu>
ok, so it's all userspace then
<bbrezillon>
not really, because it has to be done when the batch is done
<bbrezillon>
otherwise you screw up the sequencing (or maybe not, I'm not sure)
<tomeu>
but, do we really want to change the kernel for something that we know is not optimal?
<bbrezillon>
is it really suboptimal?
<tomeu>
well, we have the added splits, right?
<bbrezillon>
sure, it adds a GPU -> CPU -> GPU round trip
<tomeu>
what you suggest is probably what the ddk does, though
<bbrezillon>
I'll think about it and see if we can do it without kernel changes
<bbrezillon>
unsignaling syncobjs at submit time might work
<bbrezillon>
because we only care about intra-queue synchronisation
<bbrezillon>
and submission is serialized at the queue level
<tomeu>
ah, true
<bbrezillon>
so, for reset, the only thing you'd need to do is transfer an unsignaled fence to the syncobj, and for set, transfer the queue syncobj
<bbrezillon>
after queueing the batch that has events attached to it
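    (A sketch of the "set" half of that scheme, relying on per-queue submission being serialized; the helper name is made up:)

    #include <stdint.h>
    #include <xf86drm.h>

    /* After queueing the batch containing the vkCmdSetEvent, copy the
     * queue/batch out_sync fence into the event's syncobj, so later waits
     * on the event pick up that fence. */
    static int
    panvk_event_set_after_submit(int drm_fd, uint32_t event_syncobj,
                                 uint32_t batch_out_sync)
    {
       return drmSyncobjTransfer(drm_fd, event_syncobj, 0,
                                 batch_out_sync, 0, 0);
    }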
<bbrezillon>
tomeu: BTW, for the reset operation, you can use DRM_IOCTL_SYNCOBJ_RESET
<tomeu>
ah, cool, thanks
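    (And the "reset" half using the reset ioctl just mentioned, again only a sketch:)

    #include <stdint.h>
    #include <xf86drm.h>

    /* drmSyncobjReset() wraps DRM_IOCTL_SYNCOBJ_RESET: it drops the event
     * syncobj's fence so it reads as unsignaled again. */
    static int
    panvk_event_reset_after_submit(int drm_fd, uint32_t event_syncobj)
    {
       return drmSyncobjReset(drm_fd, &event_syncobj, 1);
    }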
<bbrezillon>
oh, and I just found out that msm has a flag for this 'reset syncobj at the end of the job' case => MSM_SUBMIT_SYNCOBJ_RESET,
<bbrezillon>
but it seems to apply to in_fences, not out_fences
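    (For reference, the msm UAPI bits being referred to look roughly like this, from include/uapi/drm/msm_drm.h; the flag is indeed applied per in-syncobj:)

    #define MSM_SUBMIT_SYNCOBJ_RESET 0x00000001  /* reset the syncobj after the wait */

    struct drm_msm_gem_submit_syncobj {
       __u32 handle;   /* syncobj handle */
       __u32 flags;    /* MSM_SUBMIT_SYNCOBJ_FLAGS */
       __u64 point;    /* timeline point for timeline syncobjs */
    };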