alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
vstehle has quit [Ping timeout: 252 seconds]
archetech has quit [Quit: Konversation terminated!]
stikonas has quit [Ping timeout: 240 seconds]
atler is now known as Guest21362
atler has joined #panfrost
Guest21362 has quit [Ping timeout: 246 seconds]
jernej has quit [Ping timeout: 276 seconds]
jernej has joined #panfrost
atler has quit [*.net *.split]
endrift has quit [*.net *.split]
bbrezillon has quit [*.net *.split]
paulk-leonov has quit [*.net *.split]
tlwoerner has quit [*.net *.split]
tomeu has quit [*.net *.split]
Prf_Jakob has quit [*.net *.split]
remexre has quit [*.net *.split]
atler has joined #panfrost
endrift has joined #panfrost
paulk-leonov has joined #panfrost
bbrezillon has joined #panfrost
Prf_Jakob has joined #panfrost
tlwoerner has joined #panfrost
remexre has joined #panfrost
tomeu has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
kaspter has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
kaspter has quit [Ping timeout: 240 seconds]
kaspter has joined #panfrost
davidlt has joined #panfrost
vstehle has joined #panfrost
camus has joined #panfrost
kaspter has quit [Ping timeout: 268 seconds]
camus is now known as kaspter
kaspter has quit [Remote host closed the connection]
kaspter has joined #panfrost
guillaume_g has joined #panfrost
nlhowell has joined #panfrost
<bbrezillon> I realized DRM_IOCTL_SYNCOBJ_TRANSFER is only supported if the driver has the SYNCOBJ_TIMELINE cap set, which is not yet the case on panfrost
<bbrezillon> that explains why the app was stuck waiting for a fence that was never signalled (reproduced locally after compiling a kernel without https://lore.kernel.org/dri-devel/20210311092539.2405596-1-boris.brezillon@collabora.com applied)
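For context, the kernel rejects that ioctl early when the timeline cap is missing; roughly (paraphrased from drivers/gpu/drm/drm_syncobj.c, not a verbatim copy):

```c
/* Paraphrased from drivers/gpu/drm/drm_syncobj.c: the transfer ioctl
 * is refused outright unless the driver advertises the timeline cap,
 * which panfrost doesn't set without the patch linked above. */
int drm_syncobj_transfer_ioctl(struct drm_device *dev, void *data,
                               struct drm_file *file_private)
{
        if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
                return -EOPNOTSUPP;
        /* ... transfer the fence between the source and destination
         * syncobjs ... */
        return 0;
}
```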
kaspter has quit [Ping timeout: 240 seconds]
kaspter has joined #panfrost
camus has joined #panfrost
kaspter has quit [Ping timeout: 246 seconds]
camus is now known as kaspter
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
nlhowell has quit [Ping timeout: 240 seconds]
raster has joined #panfrost
<bbrezillon> ok, so that works for intra-cmdbuf signaling, but IIUC, vkEvent also handles CPU <-> GPU sync, and inter-queue/cmdbuf sync
<tomeu> tbh, I thought this was such a huge hammer that it covered all possible sync needs
<bbrezillon> I think we need to embed a syncobj in our vkevent, and then transfer the batch out_fence to those syncobjs
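A minimal sketch of that idea with libdrm; struct panvk_event and the helper name are illustrative, not actual panvk code:

```c
#include <stdint.h>
#include <xf86drm.h>

/* Hypothetical event object: the Vulkan event state is backed by a
 * DRM syncobj. */
struct panvk_event {
        uint32_t syncobj;
};

/* On set, copy the batch's out-fence into the event's syncobj so the
 * event becomes signaled when the batch retires. Requires the
 * SYNCOBJ_TIMELINE cap discussed above for the TRANSFER ioctl to be
 * accepted. */
static int
panvk_event_attach_batch_fence(int drm_fd, struct panvk_event *event,
                               uint32_t batch_out_sync)
{
        return drmSyncobjTransfer(drm_fd, event->syncobj, 0,
                                  batch_out_sync, 0, 0);
}
```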
<bbrezillon> tomeu: no, what you do doesn't really wait for the batch to complete
<bbrezillon> unless you have the trace or sync debug options enabled
<tomeu> but, that wait is the same we do when the trace or sync debug options are enabled
<bbrezillon> ok, maybe I missed something
<bbrezillon> I'm a bit lost, you wait for events before queueing the batch?
<tomeu> before queueing the second half of the batch we split
<tomeu> but of course, after submitting the first half
<bbrezillon> oh, I get it now
<bbrezillon> still not a huge fan of this wait in the submit path
<bbrezillon> I feel like the syncobj solution wouldn't be much more complicated
<bbrezillon> and would avoid this wait
<tomeu> yeah, tbh, I would prefer to get conformant first
<tomeu> or at least conformant enough so we can run deqp in CI
<tomeu> then it will be much easier to improve stuff without regressing
<tomeu> also, I'm not really sure how much better waiting in the kernel would be if we had a separate submission thread
<tomeu> but it's something I would very gladly leave for the future :)
<tomeu> this passes all the dEQP-VK.synchronization.basic.* tests and allows me to move on to the dEQP-VK.api.command_buffers.* ones
<bbrezillon> yes, but again, I'm not sure what you're doing here is correct
<bbrezillon> I still don't see how GPU waits can work
<tomeu> so, when we insert a wait cmd, any subsequent commands will be submitted after all the previous ones have completed
<tomeu> inefficient, but seems correct to me?
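A rough sketch of that wait in the submit path, assuming the first half's submit put its completion fence in first_half_out_sync (names illustrative):

```c
#include <stdint.h>
#include <xf86drm.h>

/* Block on the first half of the split batch before the second half
 * is submitted; this is the CPU-side wait being discussed. */
static int
wait_first_half(int drm_fd, uint32_t first_half_out_sync)
{
        return drmSyncobjWait(drm_fd, &first_half_out_sync, 1,
                              INT64_MAX /* no timeout */,
                              DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL, NULL);
}
```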
<bbrezillon> that's only intra-queue sync
<bbrezillon> AFAICT
<tomeu> bbrezillon: vkQueue you mean? or hw queue?
<bbrezillon> vkQueue
<tomeu> how does that work, multiple app threads could each have their own vkQueue and call the submit ioctl concurrently?
<tomeu> ah, looks like that case isn't supported by the spec
<tomeu> "Events must not be used to insert a dependency between commands submitted to different queues."
<tomeu> and earlier:
<tomeu> "Events are a synchronization primitive that can be used to insert a fine-grained dependency between commands submitted to the same queue, or between the host and a queue. "
<bbrezillon> ok
<amonakov> (inter-queue sync is done via semaphores)
<bbrezillon> that leaves the "GPU waits on CPU signaling an event" case, which AFAICT is not handled
<bbrezillon> or did I miss it
<bbrezillon> tomeu: and I'd rather think that through than introducing an implementation that adds waits in the submit path
<tomeu> yeah, we miss that, v3dv handles those waits in a separate thread
<bbrezillon> yeah, but if you do that you clearly can't wait in the submit path
<bbrezillon> you'd need to have a deferred queue for batches that have such deps
<tomeu> yep
<bbrezillon> which would make the implementation a lot messier than if we were using syncobjs and extending the submit ioctl to allow unsignaling a syncobj (or any other solution we find to handle the reset case)
<tomeu> hmm, wonder if we need to unsignal a syncobj, if we couldn't just replace an event's syncobj with a new one
<bbrezillon> you can definitely transfer an unsignaled fence to a syncobj
<tomeu> ok, that would work
<tomeu> I'm wondering now if we shouldn't then use write_value jobs for the set and reset commands
<tomeu> then add an ioctl for the kernel to poll a memory region and signal when the value changes
<bbrezillon> ok, so a fence for the sync, and a GPU buf for the value?
<tomeu> yep
<tomeu> we would need to get the caching right I guess
<tomeu> as it would be writable by both gpu and cpu
<bbrezillon> if it's in a separate batch it shouldn't be an issue
<bbrezillon> (at least not yet :))
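Purely speculative sketch of what such a poll ioctl could look like; nothing like this exists in the panfrost uAPI today:

```c
/* Hypothetical uAPI: the GPU's WRITE_VALUE job stores the event state
 * to a BO, and the kernel watches that word, signaling out_sync once
 * it reads the expected value. Illustrative only. */
struct drm_panfrost_poll_value {
        __u32 bo_handle;  /* BO holding the event word */
        __u32 offset;     /* byte offset of the word in the BO */
        __u32 value;      /* value to wait for */
        __u32 out_sync;   /* syncobj to signal on a match */
};
```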
<tomeu> then we would be only splitting batches on waits
<tomeu> ok, let me think about all this over lunch
<bbrezillon> GPU buffers are mapped NC on the CPU side, that leaves GPU caches, which are flushed at job boundaries
<tomeu> guess if we find a relatively simple solution that is optimal and we can be sure it's the way forward, it's better than a more complex one that doesn't require kernel changes
<bbrezillon> but I wouldn't throw away the syncobj-only solution so quickly
<bbrezillon> I mean, if there's a way to transfer an unsignaled fence to the vkevent syncobj, it should work without requiring kernel changes
<bbrezillon> just not sure when this transfer should happen
<bbrezillon> and I feel it would have to be done kernel side
<tomeu> ah, I thought we needed to do it in the kernel so we don't have to block as we do now
<bbrezillon> most likely yes
<tomeu> then maybe the write_value option is better because we avoid splits?
<bbrezillon> I mean, unsignaling the syncobj should happen when the job is done executing, not earlier
<tomeu> the Mali supports only two queued jobs, so small chains are worse for latency than on other GPUs
<bbrezillon> we avoid splits at the expense of active waits
<bbrezillon> and it's still not clear to me how you'd implement GPU waits without splitting the batch
<tomeu> with write_value and a poll ioctl? I think we don't have any additional active waits that way
<bbrezillon> the only thing you could avoid is splits on Set/Reset
<tomeu> for the waits we need to split
<tomeu> because the HW doesn't support it
<tomeu> the question is now whether the wait should happen in the kernel or in userspace
<tomeu> yeah, I think we are in agreement on that
<bbrezillon> if GPU has to wait, I'd rather do that in kernel space, with an explicit dep (in_sync)
<tomeu> yeah, I also like that
<tomeu> in the mem-backed solution, it would be waiting on the syncobj returned by the new poll ioctl
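For reference, the existing panfrost submit uAPI already expresses such deps: in_syncs is an array of syncobjs the job waits on before it runs (abridged from include/uapi/drm/panfrost_drm.h):

```c
struct drm_panfrost_submit {
        /** Address to GPU mapping of job descriptor */
        __u64 jc;
        /** An optional array of sync objects to wait on before starting this job. */
        __u64 in_syncs;
        /** Number of sync objects to wait on before starting this job. */
        __u32 in_sync_count;
        /** An optional sync object to place the completion fence in. */
        __u32 out_sync;
        /** Pointer to a u32 array of the BOs that are referenced by the job. */
        __u64 bo_handles;
        /** Number of BO handles passed in (size is that times 4). */
        __u32 bo_handle_count;
        /** A combination of PANFROST_JD_REQ_* */
        __u32 requirements;
};
```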
<bbrezillon> I feel like the "split batches on events" option, with a way to unsignal syncobjs would be the simplest solution, but I guess we can try the write-value approach too
<tomeu> by unsignaling syncobjs, that would be transferring a new syncobj to the event's?
<bbrezillon> no, same syncobj, new unsignaled fence attached to it
<tomeu> ok, so it's all userspace then
<bbrezillon> not really, because it has to be done when the batch is done
<bbrezillon> otherwise you screw up the sequencing (or maybe not, I'm not sure)
<tomeu> but, do we really want to change the kernel for something that we know is not optimal?
<bbrezillon> is it really suboptimal?
<tomeu> well, we have the added splits, right?
<bbrezillon> sure, it adds a GPU -> CPU -> GPU round trip
ids1024 has quit [Ping timeout: 245 seconds]
<tomeu> what you suggest is probably what the ddk does, though
<bbrezillon> I'll think about it and see if we can do it without kernel changes
<bbrezillon> unsignaling syncobjs at submit time might work
<bbrezillon> because we only care about intra-queue synchronisation
<bbrezillon> and submission is serialized at the queue level
<tomeu> ah, true
ids1024 has joined #panfrost
<bbrezillon> so, for reset, the only thing you'd need to do is transfer an unsignaled fence to the syncobj, and for set, transfer the queue syncobj
<bbrezillon> after queueing the batch that has events attached to it
<bbrezillon> tomeu: BTW, for the reset operation, you can use DRM_IOCTL_SYNCOBJ_RESET
<tomeu> ah, cool, thanks
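Putting the set/reset plan together as a sketch (helper names illustrative; submission being serialized per queue is what makes doing this at submit time safe):

```c
#include <stdint.h>
#include <xf86drm.h>

/* Set: after queueing the batch, point the event's syncobj at the
 * queue's out-fence, so the event signals when the batch does. */
static int
panvk_event_set(int drm_fd, uint32_t event_syncobj, uint32_t queue_out_sync)
{
        return drmSyncobjTransfer(drm_fd, event_syncobj, 0,
                                  queue_out_sync, 0, 0);
}

/* Reset: drop the fence so the event reads as unsignaled
 * (DRM_IOCTL_SYNCOBJ_RESET under the hood). */
static int
panvk_event_reset(int drm_fd, uint32_t event_syncobj)
{
        return drmSyncobjReset(drm_fd, &event_syncobj, 1);
}
```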
<bbrezillon> oh, and I just found out that msm has a flag for this 'reset syncobj at the end of the job' case => MSM_SUBMIT_SYNCOBJ_RESET
<bbrezillon> but it seems to apply to in_fences, not out_fences
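The msm bits referenced, from include/uapi/drm/msm_drm.h; the flag sits on the per-syncobj descriptor used for in-fences, so the syncobj is reset after it has been waited on rather than when the job completes:

```c
#define MSM_SUBMIT_SYNCOBJ_RESET 0x00000001  /* Reset syncobj after wait. */

struct drm_msm_gem_submit_syncobj {
        __u32 handle; /* in, syncobj handle. */
        __u32 flags;  /* in, from MSM_SUBMIT_SYNCOBJ_FLAGS */
        __u64 point;  /* in, timepoint for timeline syncobjs. */
};
```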
newton688 has joined #panfrost
camus has joined #panfrost
kaspter has quit [Ping timeout: 246 seconds]
camus is now known as kaspter
zkrx has quit [Ping timeout: 252 seconds]
chewitt has joined #panfrost
karolherbst has quit [Remote host closed the connection]
karolherbst has joined #panfrost
zkrx has joined #panfrost
catfella has quit [Remote host closed the connection]
catfella has joined #panfrost
karolherbst has quit [Quit: duh 🐧]
karolherbst has joined #panfrost
kaspter has quit [Ping timeout: 240 seconds]
kaspter has joined #panfrost
kaspter has quit [Quit: kaspter]
newton688 has quit [Quit: Connection closed]
guillaume_g has quit [Ping timeout: 260 seconds]
zkrx has quit [Ping timeout: 265 seconds]
davidlt has quit [Ping timeout: 252 seconds]
adjtm has joined #panfrost
adjtm_ has quit [Ping timeout: 260 seconds]
zkrx has joined #panfrost
leah has quit [Quit: WeeChat 2.8]
warpme_ has quit [Quit: Connection closed for inactivity]
raster has quit [Quit: Gettin' stinky!]
sigmaris has quit [Ping timeout: 258 seconds]