#videocore on 2021-03-02 — irc logs at freenode.irclog.whitequark.org

2021-02-26 10:57 nroberts changed the topic of #videocore to: Raspberry Pi Mesa drivers discussion - Logs http://freenode.irclog.whitequark.org/videocore

00:02 jcea has quit [Ping timeout: 240 seconds]

02:06 jcea has joined #videocore

02:36 jcea has quit [Ping timeout: 264 seconds]

03:07 gpoo_ has quit [Ping timeout: 245 seconds]

05:36 egilhh_ has quit [Quit: leaving]

06:37 itoral has joined #videocore

07:51 <itoral> apinheiro: I have been thinking about the ldvary optimization and I think I might be able to take it a bit further and make it a bit more aggressive now that I added that patch that also pipelines flat and noperspective varyings. I'll give it a go.

07:54 <itoral> and I also have an idea to potentially improve the ldunifa optimizations slightly that may come in handy if we start using nir_opt_load_store_vectorize like Eric suggested

07:54 <itoral> so I'll be working on those two things next

08:46 <abergmeier> Just so I understand the context a bit better: does v3dv use gbm, or should it use gbm? And why?

09:07 <itoral> not sure what you're asking exactly. Typically, when people refer to gbm, they refer to the scenario in which you don't have a display server, which in Vulkan maps to VK_KHR_display

09:07 <itoral> in any case, for anything that has to do with presentation/display, we rely on the existing Vulkan WSI implementation in Mesa

09:08 <itoral> so v3dv itself is not expected to have to use GBM directly at all in any case

09:14 gpoo_ has joined #videocore

09:56 <apinheiro[m]1> <itoral "so I'll be working on those two "> ok

10:00 <apinheiro[m]1> I will keep fighting with the one-step pipeline cache. As I solve regressions, new ones appear

10:00 <apinheiro[m]1> btw, now that I'm talking about this

10:00 <apinheiro[m]1> with this work, I found that this test:

10:00 <apinheiro[m]1> dEQP-VK.api.object_management.max_concurrent.graphics_pipeline

10:00 <apinheiro[m]1> fails if we disable the cache

10:01 <apinheiro[m]1> that also happens with mesa master (so it is not related to what I'm doing)

10:01 <apinheiro[m]1> basically that test creates 16k pipelines. without the cache it raises an oom

10:02 <apinheiro[m]1> but taking into account that having a default cache is, well, the default

10:02 <apinheiro[m]1> I didn't plan to look too much into it

10:02 <apinheiro[m]1> but you can disagree now ;)

10:07 apinheiro has joined #videocore

10:08 apinheiro has quit [Client Quit]

10:14 <itoral> apinheiro[m]1: if the problem is an OOM, I think it is not too relevant, that is, it is not that the driver is broken, it is just that it requires too much memory

10:15 <itoral> it would be interesting to understand why we run out of memory with 16k pipelines though, and see if we can optimize our setup a little

10:15 <itoral> but I'd not call that priority work

10:16 <apinheiro[m]1> itoral: well, yes, but the test is expected to pass, and digging into old CTS issues, it is expected that all drivers would be able to allocate that number of pipelines. As for options, the only one would be to check the pipeline struct and see if we can somehow reduce its size

10:16 <apinheiro[m]1> <itoral "it would be interesting to under"> yes, this

10:16 <apinheiro[m]1> but ok, when I finish the current task I will take a look

10:16 <itoral> ok

10:16 <apinheiro[m]1> but will not use too much time if I don't get it working without the default cache

10:17 <itoral> I presume the underlying issue might be less related to the actual amount of memory used, and maybe more to the number of BOs that we end up allocating for this

10:17 <itoral> since we would be allocating at least 3 BOs per pipeline for vs, vsbin and fs

10:17 <itoral> and I imagine there is a limit to the number of BOs we can allocate

10:18 <itoral> I think I had arrived at that conclusion once some time ago when figuring out an OOM with the pipeline cache IIRC

10:19 <itoral> I vaguely remember that we have a hard limit at 64K BOs, and 16K pipelines would put us at 48K of BOs for shaders alone
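
The arithmetic behind that estimate can be sketched as follows. This is only a back-of-the-envelope check: the 64K limit is the value vaguely remembered in the discussion, not a confirmed kernel or driver constant, and the variable names are illustrative.

```python
# Rough BO budget for a test that creates 16K graphics pipelines.
# BO_LIMIT is the "vaguely remembered" figure from the chat, so treat it
# as an assumption rather than a verified limit.
BO_LIMIT = 64 * 1024          # assumed hard limit on the number of BOs
PIPELINES = 16 * 1024         # pipelines created by the test
SHADER_BOS_PER_PIPELINE = 3   # one BO each for vs, vs_bin and fs

shader_bos = PIPELINES * SHADER_BOS_PER_PIPELINE
remaining = BO_LIMIT - shader_bos

print(f"shader BOs: {shader_bos // 1024}K")
print(f"BOs left for everything else: {remaining // 1024}K")
```

Under those assumptions, shaders alone take three quarters of the budget, leaving 16K BOs for every other allocation.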

10:19 <apinheiro[m]1> itoral: yes. another option I was thinking of, even if writing a general "resource BO allocator" would be too complex

10:19 <apinheiro[m]1> one option would be to use the same bo for the vs and vs_bin

10:20 <apinheiro[m]1> shader source

10:20 <itoral> yep, that would probably be a good idea

10:20 <apinheiro[m]1> it would not be a total change, but I guess that it would also be straightforward to do (famous last words)

10:21 <itoral> yeah, that was my impression as well

10:21 <itoral> it would trim 33% of the BO requirements though, so it would still be a significant improvement

10:21 <itoral> that not only means more successful BO allocations, it also means more efficient BO memory usage

10:22 <itoral> and fewer cache lookups too

10:22 <itoral> (probably)

10:22 <itoral> and it would also mean shorter BO lists for the kernel submissions

10:23 <itoral> which is also a good thing
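
A quick way to check the 33% figure is to compare the shader BO count with and without a BO shared between vs and vs_bin. This sketch uses only the numbers from the conversation; the helper name is hypothetical.

```python
def shader_bo_count(pipelines, bos_per_pipeline):
    """Total shader BOs needed for a given number of pipelines."""
    return pipelines * bos_per_pipeline

PIPELINES = 16 * 1024

separate = shader_bo_count(PIPELINES, 3)   # vs, vs_bin and fs in their own BOs
shared_vs = shader_bo_count(PIPELINES, 2)  # vs and vs_bin share one BO, fs separate

saved_fraction = (separate - shared_vs) / separate
print(f"{separate // 1024}K -> {shared_vs // 1024}K shader BOs "
      f"({saved_fraction:.0%} fewer)")
```

The same reduction applies to the BO list passed with each submission, since each pipeline contributes one entry less.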

10:23 <apinheiro[m]1> itoral: fwiw, we could be even more aggressive

10:24 <apinheiro[m]1> for graphics pipelines we are already creating the hash using both the vs and fs

10:24 <itoral> oh, are we?

10:24 <itoral> is that a good idea?

10:24 <apinheiro[m]1> it is needed ;)

10:24 <itoral> well, you mean for the pipeline cache, right

10:24 <apinheiro[m]1> because the final variant

10:24 <apinheiro[m]1> is a compiled shader

10:25 <apinheiro[m]1> and it depends on the outcome of linking the shaders

10:25 <apinheiro[m]1> I remember a CTS test

10:25 <apinheiro[m]1> that reused the same vs spirv, so the same original nir

10:25 <itoral> but we also cache the individual shaders, no? At least I remember that we did that for the NIR

10:25 <apinheiro[m]1> but the final vs depended on the linking

10:26 <apinheiro[m]1> <itoral "but we also cache the individual"> yes, and this "reuse" thing is the reason we cache them before linking

10:26 <apinheiro[m]1> but my point is that the shader source BOs are using nir shaders that are linked together

10:26 <apinheiro[m]1> so I think that in the general case, using the same BO for the vs, vs_bin and fs would be possible
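
One way to picture a single BO for all three stages is to pack the compiled binaries into one buffer and record the offset at which each stage starts. The sketch below is a plain-Python model of that idea, not v3dv code: `pack_shaders`, the 16-byte `ALIGNMENT` and the dummy binaries are all hypothetical, and the real alignment requirements of the hardware are not stated in the discussion.

```python
ALIGNMENT = 16  # hypothetical start alignment for each shader, in bytes

def align(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def pack_shaders(binaries):
    """Pack per-stage binaries into one buffer; return (buffer, offsets)."""
    blob = bytearray()
    offsets = {}
    for stage, code in binaries.items():
        start = align(len(blob), ALIGNMENT)
        blob.extend(b"\0" * (start - len(blob)))  # pad up to the aligned start
        offsets[stage] = start
        blob.extend(code)
    return bytes(blob), offsets

# Dummy "compiled shaders" standing in for the linked vs, vs_bin and fs.
blob, offsets = pack_shaders({
    "vs": b"\x01" * 40,
    "vs_bin": b"\x02" * 24,
    "fs": b"\x03" * 100,
})
print(offsets, len(blob))
```

With this layout a pipeline would need one shader allocation instead of three, at the cost of tracking a per-stage offset into the shared buffer.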

10:28 jcea has joined #videocore

10:28 <itoral> I guess we would only know for sure if that is beneficial once we try

10:29 <itoral> I think it would be a worthwhile experiment

10:29 <apinheiro[m]1> yes

10:29 <apinheiro[m]1> as mentioned, will experiment a little after finishing the pipeline cache work

13:29 itoral has quit [Quit: Leaving]

14:00 egilhh has joined #videocore

20:51 txenoo has quit [Quit: Leaving]

22:45 gpoo_ has quit [Ping timeout: 245 seconds]

22:46 gpoo_ has joined #videocore

22:55 gpoo_ has quit [Quit: Leaving]