<itoral>
apinheiro: I have been thinking about the ldvary optimization and I think I might be able to take it a bit further and make it a bit more aggressive now that I added that patch that also pipelines flat and noperspective varyings. I'll give it a go.
<itoral>
and I also have an idea to potentially improve the ldunifa optimizations slightly that may come in handy if we start using nir_opt_load_store_vectorize like Eric suggested
<itoral>
so I'll be working on those two things next
<abergmeier>
So just so I maybe understand a bit more context - does v3dv use gbm or should it use gbm? And why?
<itoral>
not sure what you're asking exactly. Typically, when people refer to gbm, they refer to the scenario in which you don't have a display server, which in Vulkan maps to VK_KHR_display
<itoral>
in any case, for anything that has to do with presentation/display, we rely on the existing Vulkan WSI implementation in Mesa
<itoral>
so v3dv itself is not expected to have to use GBM directly at all in any case
gpoo_ has joined #videocore
<apinheiro[m]1>
<itoral "so I'll be working on those two "> ok
<apinheiro[m]1>
I will keep fighting with the one-step pipeline cache. As I solve regressions, new ones appear
<apinheiro[m]1>
btw, now that I'm talking about this
<apinheiro[m]1>
with this work, I found that this test:
<apinheiro[m]1>
that also happens with mesa master (so it is not related to what I'm doing)
<apinheiro[m]1>
basically that test creates 16k pipelines. without the cache it raises an oom
<apinheiro[m]1>
but taking into account that having a default cache is, well, the default
<apinheiro[m]1>
I didn't plan to look too much into it
<apinheiro[m]1>
but you can disagree now ;)
apinheiro has joined #videocore
apinheiro has quit [Client Quit]
<itoral>
apinheiro[m]1: if the problem is an OOM, I think it is not too relevant; that is, it is not that the driver is broken, it just requires too much memory
<itoral>
it would be interesting to understand why we run out of memory with 16k pipelines though, and see if we can optimize our setup a little
<itoral>
but I'd not call that priority work
<apinheiro[m]1>
itoral: well, yes, but the test is expected to pass, and digging into old CTS issues, it is expected that all drivers should be able to allocate that number of pipelines. About options, the only one would be to check the pipeline struct and see if we can somehow reduce its size
<apinheiro[m]1>
<itoral "it would be interesting to under"> yes, this
<apinheiro[m]1>
but ok, when I finish the current task I will take a look
<itoral>
ok
<apinheiro[m]1>
but I will not spend too much time on it if I don't get it working without the default cache
<itoral>
I presume the underlying issue might be less related to the actual amount of memory used, and more to the number of BOs that we end up allocating for this
<itoral>
since we would be allocating at least 3 BOs per pipeline for vs, vsbin and fs
<itoral>
and I imagine there is a limit to the number of BOs we can allocate
<itoral>
I think I arrived at that conclusion once some time ago when figuring out an OOM with the pipeline cache, IIRC
<itoral>
I vaguely remember that we have a hard limit of 64K BOs, and 16K pipelines would put us at 48K BOs for shaders alone
<apinheiro[m]1>
itoral: yes. Another option I was thinking of: even if writing a general "resource BO allocator" would be too complex,
<apinheiro[m]1>
one option would be to use the same BO for the vs and vs_bin shader source
<itoral>
yep, that would probably be a good idea
<apinheiro[m]1>
it would not be a total change, but I guess that it would also be straightforward to do (famous last words)
<itoral>
yeah, that was my impression as well
<itoral>
it would trim 33% of the BO requirements though, so it would still be a significant improvement
<itoral>
that not only means more successful BO allocations, it also means more efficient BO memory usage
<itoral>
and less cache lookups too
<itoral>
(probably)
<itoral>
and it would also mean shorter BO lists for the kernel submissions
<itoral>
which is also a good thing
<apinheiro[m]1>
itoral: fwiw, we could be even more aggressive
<apinheiro[m]1>
for graphics pipelines we are already creating the hash using both the vs and fs
<itoral>
oh, are we?
<itoral>
is that a good idea?
<apinheiro[m]1>
it is needed ;)
<itoral>
well, you mean for the pipeline cache, right?
<apinheiro[m]1>
because the final variant
<apinheiro[m]1>
is a compiled shader
<apinheiro[m]1>
and it depends on the outcome of linking the shaders
<apinheiro[m]1>
I remember a CTS test
<apinheiro[m]1>
that reused the same vs spirv, so the same original nir
<itoral>
but we also cache the individual shaders, no? At least I remember that we did that for the NIR
<apinheiro[m]1>
but the final vs depended on the linking
<apinheiro[m]1>
<itoral "but we also cache the individual"> yes, and this "reuse" thing is the reason we cache them before linking
<apinheiro[m]1>
but my point is that the shader source BOs are using nir shaders that are linked together
<apinheiro[m]1>
so I think that in the general case, using the same BO for the vs, vs_bin and fs would be possible
jcea has joined #videocore
<itoral>
I guess we would only know for sure if that is beneficial once we try it
<itoral>
I think it would be a worthwhile experiment
<apinheiro[m]1>
yes
<apinheiro[m]1>
as mentioned, I will experiment a little after finishing the pipeline cache work