alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
raster has quit [Quit: Gettin' stinky!]
stikonas has quit [Remote host closed the connection]
kaspter has joined #panfrost
tgall_fo_ is now known as tgall-foo
tgall-foo is now known as tgall_foo
kaspter has quit [Ping timeout: 246 seconds]
kaspter has joined #panfrost
icecream95 has joined #panfrost
davidlt has quit [Ping timeout: 264 seconds]
guillaume_g has joined #panfrost
raster has joined #panfrost
karolherbst has quit [Quit: duh 🐧]
karolherbst has joined #panfrost
karolherbst has quit [Client Quit]
karolherbst has joined #panfrost
Elpaulo has quit [Read error: Connection reset by peer]
Elpaulo has joined #panfrost
kaspter has quit [Ping timeout: 265 seconds]
kaspter has joined #panfrost
andrey-konovalov has joined #panfrost
<warpme_> guys - just a quick Q regarding current mesa master on g31: it has been non-functional for some time. xorg says: (EE) modeset(0): Failed to initialize glamor at ScreenInit() time. Is this because it's WIP, or rather a regression?
<tomeu> warpme_: I think it should work with PAN_MESA_DEBUG=bifrost?
stikonas has joined #panfrost
ezequielg has quit [Ping timeout: 260 seconds]
ezequielg has joined #panfrost
kaspter has quit [Ping timeout: 240 seconds]
kaspter has joined #panfrost
davidlt has joined #panfrost
davidlt has quit [Read error: Connection reset by peer]
davidlt has joined #panfrost
<warpme_> tomeu: of course I have PAN_MESA_DEBUG=bifrost in the env. bifrost has been non-working for a few weeks now. The test I do is simply recompiling different versions of the mesa sources; tested today with current master....
<tomeu> ah, ok, maybe you could bisect it?
<tomeu> hope we'll have bifrost in CI soon
<warpme_> tomeu: i can play with this, but probably in the next few days, as I'm now trying to nail a 5.8 kernel regression causing non-booting amlogic sm1 :-(
<macc24> warpme_: does glxgears work on g31?
<warpme_> nope. I can't get Xorg working. glamor can't initialise....
<macc24> hmm
<macc24> i have no idea
<warpme_> issue started around 3 weeks ago...
<macc24> have you tried with wayland compositor?
<warpme_> no as I don't use wayland (yet)
yann has joined #panfrost
Lyude has quit [Ping timeout: 246 seconds]
ente has quit [Remote host closed the connection]
nlhowell has quit [Ping timeout: 240 seconds]
ente has joined #panfrost
raster has quit [Remote host closed the connection]
icecream95 has quit [Quit: leaving]
<alyssa> So I spent 3am on Saturday thinking about cache maintenance
<alyssa> Maybe my subconscious is telling me something about Panfrost performance...
<alyssa> ("Did you have any eureka moments?" "Uhhhh... it was 3am.")
Lyude has joined #panfrost
robmur01_ is now known as robmur01
<alyssa> Basically trying to figure out how we manage to have incredible CPU overhead from memory management, and still OOM on the regular ;-;
indy has quit [Read error: Connection reset by peer]
<robmur01> we shouldn't be doing much cache maintenance CPU-side, since we remap everything as non-cacheable anyway
<alyssa> robmur01: buffer object cache, I mean
<alyssa> We've evolved some... interesting mechanisms here
<robmur01> well that's just unfairly misleading :P
<alyssa> First we created BOs on demand and freed them, and everything was simple, and super slow since BO creation is so slow.
<alyssa> So we added a BO cache to keep freed objects around in userspace.
<alyssa> Except then that ate exorbitant amounts of memory and led to frequent OOMs.
<alyssa> So we added madvise() to the kernel.
<alyssa> So now instead of a quick OOM freeze, the machine just locks up and gets insanely slow as there's a back-and-forth between userspace repopulating the cache and the kernel freeing it.
<alyssa> Meanwhile this means for each per-frame BO, we have 5 ioctls
<alyssa> wait_bo, madvise, mmap, munmap, madvise
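(A minimal sketch of that five-call round-trip against the panfrost UAPI, for reference. The ioctls, structs, and madvise modes are from panfrost_drm.h; the frame_bo_fetch/frame_bo_release helper names, the header path, and the absence of error handling are illustrative only, not Mesa's actual BO-cache code.)

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <drm/panfrost_drm.h>

    /* Fetch a cached BO for this frame: wait for the GPU to be done with
     * it, madvise it back to WILLNEED so the kernel stops treating it as
     * reclaimable, then map it for CPU access. */
    static void *
    frame_bo_fetch(int fd, uint32_t handle, size_t size)
    {
        struct drm_panfrost_wait_bo wait = {
            .handle = handle,
            .timeout_ns = INT64_MAX,
        };
        ioctl(fd, DRM_IOCTL_PANFROST_WAIT_BO, &wait);      /* 1: wait_bo */

        struct drm_panfrost_madvise madv = {
            .handle = handle,
            .madv = PANFROST_MADV_WILLNEED,
        };
        ioctl(fd, DRM_IOCTL_PANFROST_MADVISE, &madv);      /* 2: madvise */
        if (!madv.retained)
            return NULL;  /* kernel already purged the pages */

        struct drm_panfrost_mmap_bo mb = { .handle = handle };
        ioctl(fd, DRM_IOCTL_PANFROST_MMAP_BO, &mb);
        return mmap(NULL, size, PROT_READ | PROT_WRITE,    /* 3: mmap */
                    MAP_SHARED, fd, mb.offset);
    }

    /* Return the BO to the userspace cache: unmap, then mark DONTNEED so
     * the kernel may reclaim the pages under memory pressure. */
    static void
    frame_bo_release(int fd, uint32_t handle, void *cpu, size_t size)
    {
        munmap(cpu, size);                                 /* 4: munmap */

        struct drm_panfrost_madvise madv = {
            .handle = handle,
            .madv = PANFROST_MADV_DONTNEED,
        };
        ioctl(fd, DRM_IOCTL_PANFROST_MADVISE, &madv);      /* 5: madvise */
    }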
indy has joined #panfrost
<alyssa> So allocating a BO is slow again, which matters when we need to allocate memory every frame for e.g. job structures
<alyssa> (Or possibly worse - main memory backing varyings, which is GPU R/W but CPU invisible but still needs us to manage.)
<alyssa> IIRC kbase played some "interesting" games with the cache to optimize that. Minimally we should skip the mmap/munmap there.
<robmur01> is part of the problem that BOs are not necessarily consistent, such that you can have loads sat in the cache, none of which are quite the right size for the thing you need right now?
<warpme_> alyssa: yeah. such eureka moments are usually a subconscious thing. I've had them many times. usually: I have a heavy problem to solve. days/weeks without a solution. so I quit for some time. and usually at a very unexpected moment I have a flash of thought. bingo. it's solved! at the brain level it is well explained: even when I quit thinking on the subject, the subconscious is not suspended and still "works" on the problem.
<alyssa> robmur01: Possibly? It's just a lot of memory in either case. All the job structures are small but add up quickly with thousands of draws per second.
<alyssa> Varying memory is large since that's proportional to vertex count (*after* instancing).
<alyssa> warpme_: Indeed.
raster has joined #panfrost
<robmur01> I wonder if it's worth being more selective about what gets cached, i.e. favour fixed-size descriptors that will definitely be reused quickly, but maybe don't bother hanging on to flexible-sized things
<alyssa> robmur01: oh and those heinous CoW faults -.-
<alyssa> endrift: "mmapped but not read in from disk" not sure how that fits in a gfx stack, not sure I want to know
<alyssa> robmur01: Perhaps.
<alyssa> following the sysprof further, it's routing to drm_gem_shmem_fault/vm_insert_page
<alyssa> Oh, does that mean it's mmaped() but not actually mapped yet?
<robmur01> probably - that's likely to be the "read in from disk" part for things that aren't 'real' files
<robmur01> see MAP_POPULATE (I think)
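(For context: MAP_POPULATE asks the kernel to prefault the whole mapping at mmap() time instead of taking a per-page fault on first touch. A one-line sketch, with the flag being the only change from the mapping in the earlier sketch:)

    /* Prefault up front so first-touch faults
     * (drm_gem_shmem_fault -> vm_insert_page) don't land mid-frame. */
    void *cpu = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_POPULATE, fd, mb.offset);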
<alyssa> robmur01: gets back to my point that munmapping/madvising things in cache has higher CPU overhead than we thought
<alyssa> and doesn't actually solve OOMs in practice, only slows them down
<alyssa> better question perhaps is why we're prone to OOMs in the first place, 4GB of memory should be plenty
<robmur01> are we talking kernel task-killing OOMs, or just failure to allocate new buffers?
<robmur01> the latter is more likely CMA exhaustion than 'real' OOM
<alyssa> all of the above
<alyssa> system becoming super slow as madvise reclaims memory and userspace fights for it back, userspace winning out and freezing the system requiring a hard reboot
<alyssa> I guess what I don't understand is where all this memory usage is coming from. Are we leaking terribly?
<alyssa> That's probably the first question.
<warpme_> dears - regarding the non-working current mesa master on g31, bisecting:
kaspter has quit [Quit: kaspter]
<alyssa> warpme_: Ah :|
<alyssa> chrisf: ^ saw something like that IIRC
<warpme_> alyssa: :-p
<alyssa> warpme_: hm?
<robmur01> I guess the upshot of 96fa8d70 is that we end up exposing fewer capabilities for Bifrost regardless of the debug flags
<robmur01> so I suppose either the unsupported features were "working" in the sense of not blowing up horribly, or the client doesn't actually use them but still throws a tantrum about them not being present.
kaspter has joined #panfrost
<alyssa> ^^ that
<bbrezillon> alyssa: regarding your OOM issue, did you try disabling the BO cache?
<bbrezillon> just to see if the memory consumption keeps growing without it
<alyssa> bbrezillon: it's not one OOM issue, it's just something that happens a lot while dogfooding panfrost
<alyssa> no easy repro other than "use the machine for 8 hours"
<alyssa> or in the case of the 3gb veyron, "open GNOME, Zoom, Chromium, and Firefox at the same time" :p
<urjaman> 3GB?
<alyssa> i thought so?
<urjaman> there are 2GB and 4GB models
<alyssa> somewhere in between then ;p
<bbrezillon> alyssa: yes, but I'm wondering if disabling the cache wouldn't help reproducing the problem more quickly
<alyssa> urjaman: /proc/meminfo says 2GB
<alyssa> so that explains that issue :p
kaspter has quit [Quit: kaspter]
<urjaman> "MemTotal: 4032536 kB" for the 4GB model (i mean yeah it isnt quite the full 4GB which isnt surprising on a 32bit SoC, needs to have the registers somewhere)
<alyssa> nods
<alyssa> The allocation strategy for varyings is really not clear to me either.
<alyssa> Maybe having a separate invisible pool would do it.
<tomeu> alyssa: I thought that the slow part of BO creation was the mmap
<alyssa> tomeu: it's both mmap and create
<alyssa> bo cache only helps with the latter, and adds overhead in the form of madvise and wait
<tomeu> hrm, what could be slow in create besides the mmap?
<alyssa> er, by mmap I mean CPU-side mmap()
<alyssa> I guess by create I mean GPU-side mmap, but that's not mmap() from userspace perspective
nlhowell has joined #panfrost
<tomeu> ah yeah
<tomeu> wonder why the GPU-side mmap needs to be so slow
<tomeu> robmur01: do you know?
<alyssa> regardless it is, hence the BO cache
<tomeu> well, but what if we could make it fast enough to not need a BO cache? :)
<alyssa> we'd still have old kernels to worry about ;)
<robmur01> hmm, mapping stuff into the GPU pagetables shouldn't be slow, there's very little to it
<alyssa> anyways, if I split to two pools, one CPU-accessible, one GPU-only, that helps a bit
<alyssa> since that saves the mmap/munmap on the GPU-only side.
<alyssa> (which varyings are)
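(A sketch of what that GPU-only path could look like: a single CREATE_BO ioctl whose returned offset is the GPU VA, with no MMAP_BO/mmap/munmap at all. Struct and flag names are from panfrost_drm.h; the helper name is made up for illustration.)

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <drm/panfrost_drm.h>

    /* Allocate a CPU-invisible BO (e.g. varying storage): the CPU never
     * maps it, so the whole mmap()/munmap() pair disappears. */
    static uint64_t
    alloc_invisible_bo(int fd, uint32_t size, uint32_t *handle)
    {
        struct drm_panfrost_create_bo create = {
            .size = size,
            .flags = PANFROST_BO_NOEXEC,
        };
        if (ioctl(fd, DRM_IOCTL_PANFROST_CREATE_BO, &create))
            return 0;
        *handle = create.handle;
        return create.offset;  /* GPU VA of the new BO */
    }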
<tomeu> ah, cool
<alyssa> anyways, if the BO cache is disabled, we spend a huge amount of time in panfrost_mmu_map
<alyssa> a big chunk of which is in shmem_getpage_gfp
marcodiego has joined #panfrost
<alyssa> drm_gem_get_pages is just expensive ig
<tomeu> maybe we need a new data structure for the pages
<robmur01> shmem_getpage_gfp()... there's rather a lot of that function. I hope we're not making a separate call for each individual 4KB page :(
<alyssa> Interesting, bo_cache_fetch itself has some weirdly high overhead
<alyssa> perf blames a load instruction in the linked list iteration
<alyssa> guess we're thrashing the cache
<alyssa> Might be worth trying out util_sparse_array_free_list instead
<alyssa> (I know I went thru this exercise a few months ago..)
raster has quit [Quit: Gettin' stinky!]
nlhowell has quit [Ping timeout: 264 seconds]
<nhp[m]> So, not a strictly panfrost related question, but is it possible to have mesa compile a shader for you ahead of time for an arbitrary target?
davidlt has quit [Ping timeout: 256 seconds]
raster has joined #panfrost
ente has quit [Remote host closed the connection]
ente has joined #panfrost
<tomeu> nhp[m]: guess you want to use glShaderBinary ?
<tomeu> I expect at least all drivers that support OpenGL 4.1 to also support it
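(For context, glShaderBinary consumes a driver-specific precompiled blob. A minimal sketch, assuming the driver advertises a matching format under GL_SHADER_BINARY_FORMATS and the blob came from an offline compile; it presupposes a driver on the target that can consume the blob:)

    #include <GLES2/gl2.h>

    /* Create a shader object from a precompiled binary instead of
     * compiling GLSL at runtime. */
    static GLuint
    load_prebuilt_shader(GLenum stage, GLenum binary_format,
                         const void *blob, GLsizei blob_len)
    {
        GLuint shader = glCreateShader(stage);
        glShaderBinary(1, &shader, binary_format, blob, blob_len);
        return shader;
    }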
<nhp[m]> issue is that I don't have a PC with the actual hardware, what I'm trying to do is build shaders for an old AMD GPU used in the Wii U
BenG83 has joined #panfrost
guillaume_g has quit [Quit: Konversation terminated!]
<chrisf> nhp[m]: are you running mesa on that device?
<nhp[m]> No, it has its own graphics API and no ability to compile shaders at runtime, so they must be compiled ahead of time
<nhp[m]> there's a very barebones shader compiler for it that uses AMD's shaderanalyzer, but it chokes on anything nontrivial, so I was wondering if it might be possible to have mesa do it
<chrisf> theoretically, sure; but it's going to take some hacking on your part, and you shouldn't assume that their graphics API has made the same decisions as mesa's r600 driver where there are multiple ways to do things
davidlt has joined #panfrost
kherbst has joined #panfrost
<alyssa> nhp[m]: ^^ that
karolherbst has quit [Disconnected by services]
kherbst is now known as karolherbst
<alyssa> there's an offline shader compiler API ("standalone") in mesa; panfrost implements it
<alyssa> so you could implement that for AMD, but a lot of shader compiler decisions are arbitrary and depend on decisions on the command stream side
<alyssa> given the drivers on the other side are proprietary.... :(
gcl_ has joined #panfrost
<nhp[m]> Yeah...it's probably not worth the effort :(
gcl is now known as Guest66166
gcl_ is now known as gcl
<alyssa> if you could boot linux and use mesa otoh... :~)
<nhp[m]> There's a Linux port for it actually, no GPU driver yet though :(
<alyssa> Now that sounds like a fun project ;)
<alyssa> #dri-devel is that way, enjoy
Guest66166 has quit [Ping timeout: 240 seconds]
<nhp[m]> heh, sorry for being totally offtopic here, just figured someone here might be able to point me in the right direction :)
<alyssa> and we did ;)
<alyssa> c'mon, it'll be fun
<alyssa> :P
<nhp[m]> Heh, even rendering things with GX2 is enough suffering for me already lol
<nhp[m]> But still thanks for the help :)
<alyssa> ;)
tomboy65 has quit [Remote host closed the connection]
remexre has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
raster has joined #panfrost
tomboy65 has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
BenG83 has quit [Ping timeout: 240 seconds]
raster has joined #panfrost
davidlt has quit [Ping timeout: 256 seconds]
nlhowell has joined #panfrost
nlhowell has quit [Ping timeout: 256 seconds]
<endrift> alyssa: point taken
<alyssa> line taken
anarsoul has quit [Read error: Connection reset by peer]
anarsoul has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
megi has joined #panfrost
buzzmarshall has joined #panfrost
raster has joined #panfrost
nlhowell has joined #panfrost
unoccupied has quit [Ping timeout: 256 seconds]
TheMojoMan has joined #panfrost
TheMojoMan has quit [Client Quit]
raster has quit [Quit: Gettin' stinky!]