<alyssa>
italove: looking good on the disasm, I think if you cleanup the series (squashing so everything is bisectable, I mean -- no need to write expansive commit messages or anything) we're pretty close to landable :)
<alyssa>
though now that iris/fd do it, it shouldn't be so bad
<bbrezillon>
stepri01: may I ask you a few questions about the NO_IMPLICIT/NO_FENCE case?
<stepri01>
bbrezillon: sure
<bbrezillon>
I guess the idea is to limit the number of fences to wait on to a single fence, so the scheduler gets to schedule our job as soon as this fence is signaled
<bbrezillon>
is that correct?
<stepri01>
there's no need to limit the number of fences as such. It's more about letting user space control the fences so you don't have unnecessary fences
<stepri01>
Ultimately user space (usually) has a good idea what the actual dependencies are between jobs, so letting it encode that rather than trying to deduce it from the implicit fences can be beneficial
<stepri01>
e.g. you don't need to have implicit fences on things user space knows are effectively immutable - so we can save time by not processing those fences
<stepri01>
equally there are complex situations such as sub-buffer accesses which user space can optimise by fencing appropriately, whereas the kernel doesn't know how they might or might not conflict
<stepri01>
the blob/kbase mostly use the explicit fencing approach for the above reasons, only using implicit fencing when necessary, e.g. because a buffer is imported
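For reference, submitting with explicit syncobj dependencies through the existing Panfrost UAPI looks roughly like the sketch below. This is only a minimal illustration, not the proposed new submit interface; the NO_IMPLICIT/NO_FENCE flag being discussed is deliberately left out since its final name and placement aren't settled.

    /* Minimal sketch: a Panfrost job submission that expresses its
     * dependencies as explicit syncobjs via the existing UAPI.
     * "panfrost_drm.h" is the Panfrost uapi header; the include path
     * depends on your build setup.
     */
    #include <stdint.h>
    #include <xf86drm.h>
    #include "panfrost_drm.h"

    static int submit_with_explicit_fences(int fd, uint64_t jc,
                                           const uint32_t *bo_handles, uint32_t bo_count,
                                           const uint32_t *in_syncobjs, uint32_t in_count,
                                           uint32_t out_syncobj)
    {
            struct drm_panfrost_submit submit = {
                    .jc = jc,                           /* GPU address of the job chain */
                    .bo_handles = (uintptr_t)bo_handles,
                    .bo_handle_count = bo_count,
                    .in_syncs = (uintptr_t)in_syncobjs, /* explicit dependencies to wait on */
                    .in_sync_count = in_count,
                    .out_sync = out_syncobj,            /* signalled when the job completes */
                    .requirements = 0,
            };

            return drmIoctl(fd, DRM_IOCTL_PANFROST_SUBMIT, &submit);
    }

User space then chains the out_sync of one job into the in_syncs of the next, encoding only the dependencies it actually knows about instead of whatever the implicit fences on the BOs would imply.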
<bbrezillon>
ok, I got the sub-buffer case, but I don't see what the front-buffer update case you mentioned in your reply is about
<stepri01>
so in the normal double (or more) buffering case it makes sense for the display driver to hold a (shared) lock on the buffer that's being scanned out. That allows you to schedule a buffer swap and immediately send the kernel GPU work which would render to what was (and for a while still will be) the front buffer
<stepri01>
The GPU work will block until the display driver releases the lock when flipping to a back buffer, unblocking the GPU and allowing the rendering to happen straight away
<stepri01>
Clearly this falls down if for whatever reason you then want to actually render to the displayed buffer. Either you need to have a way of reconfiguring the display driver not to hold the lock (i.e. fence) or you need to convince the GPU driver to ignore the fence
<stepri01>
Usually you can get away with using a shared access on the GPU (even though you are actually writing), but I seem to remember there are corner cases even with that
<bbrezillon>
ok, but how does NO_IMPLICIT simplify/optimize this case? I mean, I'd expect it to work similarly with the implicit fences: the GPU job will be blocked until the display controller signals the front-buffer fence
<bbrezillon>
the only difference being the number of fences to test
<stepri01>
I think there are two things. First there is overhead juggling the unnecessary fences in the kernel - whether that's measurable I don't know.
<stepri01>
Secondly you need to be able to use shared fences (a problem with the current Panfrost kernel) and you need to ensure that any other drivers you are working with also support shared fences
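For the shared-fence point, a rough kernel-side sketch follows, using the dma_resv helpers as they existed around this time (they have since been replaced by dma_resv_add_fence() with a usage argument). The write_as_shared flag is a made-up stand-in for whatever the new UAPI would let user space request.

    /* Rough sketch only: attach a job's done-fence to a BO's reservation
     * object, either as the exclusive fence (normal write) or as a shared
     * fence (reads, or front-buffer-style writes that must not serialise
     * against the display's shared fence). Caller holds the resv lock.
     */
    #include <linux/dma-resv.h>
    #include <drm/drm_gem.h>

    static int attach_job_fence(struct drm_gem_object *obj,
                                struct dma_fence *done_fence,
                                bool writes, bool write_as_shared)
    {
            int ret;

            if (writes && !write_as_shared) {
                    /* Normal write: later users (display, readers) must wait
                     * on this job before touching the buffer. */
                    dma_resv_add_excl_fence(obj->resv, done_fence);
                    return 0;
            }

            /* Shared slot: consumers that only synchronise against the
             * exclusive fence (e.g. scanout with implicit sync) won't be
             * made to wait on this job. */
            ret = dma_resv_reserve_shared(obj->resv, 1);
            if (ret)
                    return ret;

            dma_resv_add_shared_fence(obj->resv, done_fence);
            return 0;
    }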
<bbrezillon>
stepri01: ok, after looking at the KMS API more closely I get why GPU drivers take a sync_file FD and not a syncobj: that's what atomic plane updates return (passing a syncobj would require importing the sync_file first)
<bbrezillon>
and the NO_IMPLICIT mechanism makes more sense now. Thanks for the detailed explanation
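To make the sync_file point concrete: the fence that comes back from an atomic commit is a sync_file fd (via the OUT_FENCE_PTR CRTC property), so a submit interface that only accepted syncobjs would force the extra import step sketched below with standard libdrm calls. Property-ID lookup and most error handling are omitted.

    /* Sketch: get the commit's out-fence as a sync_file fd and wrap it in a
     * syncobj. out_fence_prop_id is assumed to have been looked up with
     * drmModeObjectGetProperties() beforehand.
     */
    #include <stdint.h>
    #include <unistd.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    static int out_fence_to_syncobj(int fd, drmModeAtomicReq *req,
                                    uint32_t crtc_id, uint32_t out_fence_prop_id,
                                    uint32_t *syncobj_out)
    {
            int out_fence_fd = -1;
            int ret;

            /* OUT_FENCE_PTR takes a userspace pointer; the kernel writes a
             * sync_file fd there when the commit is queued. */
            drmModeAtomicAddProperty(req, crtc_id, out_fence_prop_id,
                                     (uint64_t)(uintptr_t)&out_fence_fd);

            ret = drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_NONBLOCK, NULL);
            if (ret)
                    return ret;

            /* Wrap the sync_file in a syncobj so a submit ioctl that only
             * takes syncobj handles can wait on it. */
            ret = drmSyncobjCreate(fd, 0, syncobj_out);
            if (ret)
                    return ret;

            ret = drmSyncobjImportSyncFile(fd, *syncobj_out, out_fence_fd);
            close(out_fence_fd);
            return ret;
    }

A submit interface that takes a sync_file fd directly avoids the create/import pair entirely, which is the point made above.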
<raster>
and i had such good uptime.. :| about 2h! :)
<daniels>
stepri01: KMS synchronises against exclusive fences before making the framebuffer current, but that's the last involvement it has; as soon as it's synchronised against all fences placed before the commit was made, it doesn't do anything else related to fencing, including holding a shared reservation
<stepri01>
daniels: To be fair, I'm more familiar with how Android used to do these things; I'm not so familiar with KMS. There are also cases like a video encoder reading the buffer, e.g. for casting to a remote screen.
<daniels>
yep, the video encoder will take a shared fence, but I'd argue that the number of people doing active frontbuffer rendering (X11, XR) and simultaneous streaming from that frontbuffer is ... like none?
<stepri01>
I think it can be done on Android (cast while using the phone as a VR headset), but it was a while ago when I was involved in such discussions. Like you say it's pretty rare
<daniels>
racing the encoder against scanout seems pretty brave, but what do I know :P
<stepri01>
Yeah - I can't remember the details these days. The main thing is that you need to have a design that can at least support both independently. And it's much better if the GPU doesn't need to change too much based on the exact use case
<macc24_>
HdkR: how would you test if threaded context actually works?