alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
mearon has quit [Ping timeout: 246 seconds]
embed-3d has quit [Ping timeout: 248 seconds]
embed-3d has joined #panfrost
mearon has joined #panfrost
vstehle has quit [Ping timeout: 268 seconds]
herbmilleriw has quit [Ping timeout: 258 seconds]
herbmilleriw has joined #panfrost
herbmilleriw has quit [Remote host closed the connection]
_whitelogger has joined #panfrost
davidlt has joined #panfrost
vstehle has joined #panfrost
<tomeu> alyssa: fdo just allocates a BO per shader, any reason not to do that, now that we have the BO cache?
chewitt has joined #panfrost
<tomeu> bbrezillon: are you happy now with how the polygon list work is looking?
<bbrezillon> tomeu: haven't finished yet
<bbrezillon> but I'm making progress
<EmilKarlson> bbrezillon: is bootlin aware that linux-5.3 rockchip-drm is perhaps 11x slower compared to 5.2
<EmilKarlson> x11perf -tilerect500
<EmilKarlson> Xorg without glamor
<bbrezillon> EmilKarlson: I don't know (I no longer work for bootlin :))
<EmilKarlson> ah ok, thanks
<bbrezillon> EmilKarlson: what's slower?
<EmilKarlson> I checked the commit log, it was very bootliny
<EmilKarlson> mostly virtual desktop change on xmonad
<EmilKarlson> x11perf -tilerect500 gave me the numbers
<EmilKarlson> window redraw in general
<EmilKarlson> rxvt-unicode scrolling or whatever seems to do full window updates
<bbrezillon> you mean panfrost is slower, right?
<EmilKarlson> no, this is without anyone making any requests to the gpu afaik
<EmilKarlson> Xorg without glamor, as mentioned
<bbrezillon> ok
<bbrezillon> v5.2 vs v5.3-rc1 ?
<EmilKarlson> yes, latest comparison with 5.2.5 and 5.3-rc2
<EmilKarlson> I don't strictly have numbers for -rc1 and v5.2, but a subjectively measured slowdown was discussed on #linux-rockchip
<bbrezillon> you're using the emulated fbdev or the KMS interface?
<EmilKarlson> I believe kms, I can check later, whatever Xorg selects by default on debian buster
<bbrezillon> just had a quick look at the commit log
<bbrezillon> and the only commit that could potentially be harmful is 6c83ca795f2c ("drm/rockchip: Use dirtyfb helper")
<bbrezillon> you can try reverting that one
<EmilKarlson> thanks, will do, though have to work for a few hours now
<EmilKarlson> obviously regressions are not strictly limited to inside rockchip-drm
<EmilKarlson> for rt2x00 I actually reverted the whole driver to fix regression there
<EmilKarlson> not sure, if that would work for rockchip-drm
<bbrezillon> EmilKarlson: well, if there's a perf regression, we want to know where it comes from
<bbrezillon> reverting the driver to its v5.2 state doesn't help
chewitt has quit [Quit: Adios!]
<EmilKarlson> you tried already?
<EmilKarlson> or I mean testing revert will help exclude other causes
<bbrezillon> no, I mean it doesn't help us figure out which commit is causing that
<bbrezillon> and no, I haven't tried
<tomeu> EmilKarlson: what about bisecting?
<EmilKarlson> well it's about the same thing
<EmilKarlson> but perhaps at some point
<EmilKarlson> I mean, whatever helps exclude causes
<EmilKarlson> if the hypothesis is that the only commit in rockchip-drm that could potentially be harmful is 6c83ca795f2c, then either reverting that commit helps, or the regression is outside the driver and reverting the driver does not help, unless there is a compatibility issue
_whitelogger has joined #panfrost
<tomeu> well, if using git-bisect, you would be bisecting the whole kernel
<tomeu> guess it could be a change in the clock configuration, DDR, devfreq, etc
<EmilKarlson> true
<EmilKarlson> but that's a lot of work on the kernel that has more than one regression per system I tested
pH5 has joined #panfrost
<EmilKarlson> and also git bisect accepts paths
yann has quit [Ping timeout: 246 seconds]
<tomeu> bbrezillon: awesome, that's going to help a lot :)
<tomeu> if we only had that for 19.2, I would be already happy :)
<tomeu> but let's see if I manage to put NOEXEC and HEAP support in it as well
<tomeu> bbrezillon: any idea what it means to hit this warning? https://elixir.bootlin.com/linux/latest/source/drivers/iommu/io-pgtable-arm.c#L325
<bbrezillon> tomeu: nope
<tomeu> hmm, it's XWayland trying to import a buffer
<tomeu> now I got it as well when Chromium creates a buffer
yann has joined #panfrost
megi has joined #panfrost
jcureton has quit [Remote host closed the connection]
raster has joined #panfrost
davidlt has quit [Ping timeout: 258 seconds]
davidlt has joined #panfrost
adjtm has quit [Ping timeout: 245 seconds]
jcureton has joined #panfrost
adjtm has joined #panfrost
<tomeu> robher: your heap+noexec branch looks good in my testing here
<robher> tomeu: great! I should get the next version sent out today. Also, I have madvise patches about ready.
<tomeu> awesome, we have gotten memory usage really low
<tomeu> bbrezillon: I think panfrost should work much better now on your 1GB board
<tomeu> well, once everything lands :)
<bbrezillon> tomeu: did you have a look at the armhf/rk3288 test results?
<bbrezillon> tomeu: does any of what we've done help with the flip/flop issues we had?
<tomeu> bbrezillon: in which branch?
megi has quit [Ping timeout: 245 seconds]
<tomeu> my branch has a weird crash when the EGL context is destroyed
<tomeu> cannot reproduce here though
hlmjr has joined #panfrost
herbmillerjr has quit [Ping timeout: 248 seconds]
JaceAlvejetti has joined #panfrost
<bbrezillon> tomeu: I didn't have any particular branch in mind
<tomeu> ah, I see
<bbrezillon> was just wondering if the work that's been pushed during the last 4 weeks had helped get some of those problems fixed
<tomeu> not long ago I checked and the flip-flops were still there
<tomeu> and I can see that the perennial unmasked flip-flops are still there
<bbrezillon> :-(
<tomeu> I think there's some difference in the cmdstream that needs to be addressed
<tomeu> but in my local testing, rk3288 works quite fine here
<tomeu> (I debug most of the time on a veyron)
<tomeu> alyssa, bbrezillon: was quite happy with this branch regarding reduced memory usage: https://gitlab.freedesktop.org/tomeu/mesa/commits/panfrost-ci-noexec
<tomeu> but I get a crash just after the last test and I cannot reproduce locally
<tomeu> trying now with a debug build, so I get a better backtrace
<alyssa> tomeu: The problem with a BO-per-shader is twofold
<alyssa> One is that allocating BOs is expensive, and the BO cache can't save the upfront cost, so lots of overhead
<alyssa> Two is that executable memory, IIRC, has some funky alignment reqs in the kernel, so you'd be wasting memory and/or fragmenting stuff? But maybe that's not too terrible in practice
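For context on the BO-per-shader question, here is a minimal sketch of the alternative: carving every shader binary out of one shared executable BO with a bump pointer, so each shader costs only some alignment padding instead of its own kernel allocation. `struct shader_pool`, `shader_pool_alloc()` and the alignment values used are hypothetical; they are not Panfrost's code and not the kernel's actual executable-memory requirements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool: one large executable BO carved up with a bump pointer. */
struct shader_pool {
        uint64_t gpu_base; /* GPU VA of the executable BO (nonzero) */
        size_t size;       /* total BO size in bytes */
        size_t offset;     /* next free byte */
};

/* Round x up to a power-of-two alignment. */
static size_t align_pot(size_t x, size_t align)
{
        assert(align && (align & (align - 1)) == 0);
        return (x + align - 1) & ~(align - 1);
}

/* Returns the GPU address for a shader of `len` bytes, or 0 if the pool is
 * full. The gap between shaders is the memory cost of the alignment rule. */
static uint64_t shader_pool_alloc(struct shader_pool *pool, size_t len, size_t align)
{
        size_t start = align_pot(pool->offset, align);

        if (start > pool->size || len > pool->size - start)
                return 0;
        pool->offset = start + len;
        return pool->gpu_base + start;
}
```

Individual shaders can't be freed in this scheme; the whole pool goes away at once, which is one reason a BO per shader can still be attractive.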
<alyssa> tomeu: Memory usage reduction is from HEAP, yeah?
JaceAlvejetti has quit [Remote host closed the connection]
JaceAlvejetti has joined #panfrost
davidlt has quit [Ping timeout: 246 seconds]
megi has joined #panfrost
<bbrezillon> alyssa: regarding the ctx->job field, do you think avoiding the job lookup in the hash table makes a huge difference?
<alyssa> bbrezillon: Huge? No. But it does get called very frequently and hash lookups aren't free. I've seen it show up as taking some nontrivial time in sysprof but certainly not the bottleneck.
<alyssa> Not going to make or break anything, but might as well get it right
wens has quit [Ping timeout: 268 seconds]
pH5 has quit [Quit: bye]
<alyssa> So, working on better uniform allocation
<alyssa> If I just cap it at 8 registers, it's actually quite a win
<alyssa> The issue is that now we, well, only have 8 registers available -> register spilling
<alyssa> Uniform spilling is cheaper than register spilling, so let's handle that.
yann has quit [Ping timeout: 244 seconds]
<alyssa> Uniform spilling implemented. I'm quite happy with the results + no loss in performance.
<alyssa> Er, no losses in shader-db I mean
<alyssa> Uh oh, regression city
belgin has joined #panfrost
<alyssa> Huh. Actually, spilling might be a win here.
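One way to read the uniform-vs-register trade-off, sketched under simplifying assumptions (one register per value, and a fixed register budget shared by work values and uniforms kept in registers): uniforms are read-only, so demoting one to a memory load costs only a reload, whereas spilling a work value costs a store plus a reload. The sketch therefore hands registers to work values first and demotes uniforms before anything else has to spill. `split_registers()` and `struct reg_split` are hypothetical names, not the Midgard compiler's register allocator.

```c
#include <assert.h>

/* Hypothetical result of splitting a register budget. */
struct reg_split {
        unsigned promoted_uniforms; /* uniform slots kept in registers */
        unsigned demoted_uniforms;  /* uniform slots loaded from memory instead */
        unsigned spilled_work;      /* work values that still have to be spilled */
};

static struct reg_split split_registers(unsigned budget, unsigned work_needed,
                                        unsigned uniform_count)
{
        struct reg_split s = {0};
        /* Registers left over once every work value has one. */
        unsigned left = budget > work_needed ? budget - work_needed : 0;

        s.spilled_work = work_needed > budget ? work_needed - budget : 0;
        s.promoted_uniforms = uniform_count < left ? uniform_count : left;
        s.demoted_uniforms = uniform_count - s.promoted_uniforms;
        return s;
}
```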
<bbrezillon> alyssa: added some assert()s to make sure ctx->job is consistent, and it's not
<bbrezillon> (even after I added the ctx->job = job; at the end of the get_fbo_job() func)
<alyssa> *blink*
<bbrezillon> looks like panfrost_get_job_for_fbo() gets called when ctx->pipe_framebuffer is still zeroed
<bbrezillon> which fills the ctx->job entry with a dummy FBO job
<alyssa> Grah
<bbrezillon> and when the function is called again, this time with a valid pipe_framebuffer state, the implementation returns the dummy job
<alyssa> bbrezillon: You know I rather detest Gallium/OpenGL, right?
<alyssa> bbrezillon: Anyways, why not add "ctx->job = NULL" at the beginning of set_framebuffer_state?
herbmillerjr has joined #panfrost
<bbrezillon> I did that too
<bbrezillon> not at the beginning though
hlmjr has quit [Ping timeout: 245 seconds]
<bbrezillon> alyssa: that's it, was done too late in the set_framebuffer_state() func
<bbrezillon> thx
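The bug and fix discussed above can be modelled in a few lines. This is a heavily simplified sketch, not Panfrost's code: `struct context`, `get_job_for_fbo()`, `set_framebuffer_state()` and the small array standing in for the FBO-to-job hash table are hypothetical stand-ins. The cached `ctx->job` pointer skips the table lookup on the fast path, and clearing it as the very first step of `set_framebuffer_state()` keeps a job created against zeroed framebuffer state from being handed out later.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the Gallium/Panfrost types. */
struct fb_key { unsigned width, height; };
struct job { struct fb_key key; };

#define MAX_JOBS 8

struct context {
        struct fb_key fb;          /* current framebuffer state */
        struct job *job;           /* cached job for ctx->fb, or NULL */
        struct job jobs[MAX_JOBS]; /* stand-in for the FBO -> job hash table */
        unsigned job_count;
        unsigned lookups;          /* times we fell back to the table */
};

static struct job *lookup_or_create_job(struct context *ctx)
{
        ctx->lookups++;
        for (unsigned i = 0; i < ctx->job_count; i++) {
                if (!memcmp(&ctx->jobs[i].key, &ctx->fb, sizeof(ctx->fb)))
                        return &ctx->jobs[i];
        }
        assert(ctx->job_count < MAX_JOBS);
        struct job *job = &ctx->jobs[ctx->job_count++];
        job->key = ctx->fb;
        return job;
}

/* Fast path: reuse the cached pointer and skip the table lookup. */
static struct job *get_job_for_fbo(struct context *ctx)
{
        if (!ctx->job)
                ctx->job = lookup_or_create_job(ctx);
        return ctx->job;
}

static void set_framebuffer_state(struct context *ctx, const struct fb_key *fb)
{
        /* Invalidate first, so a job cached while ctx->fb was still zeroed
         * can never be returned for the new framebuffer. */
        ctx->job = NULL;
        ctx->fb = *fb;
}
```

The same pattern applies to any cached derived state: invalidate it before anything in the state-setting path can repopulate the cache.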
<EmilKarlson> bbrezillon: initially it seems your hypothesis that 6c83ca795f2c causes the performance regression is correct
<alyssa> bbrezillon: +1
herbmilleriw has joined #panfrost
herbmilleriw has quit [Quit: Konversation terminated!]
herbmilleriw has joined #panfrost
<bbrezillon> alyssa: anything you want me to address in patch 6 and 8?
* alyssa eyes
<alyssa> bbrezillon: I guess it's fine... not excited about more things to fix for pipelining but..
<alyssa> 8 is R-b, just a question
herbmilleriw has quit [Client Quit]
yann has joined #panfrost
<alyssa> My RA bug sense is a tingling
herbmilleriw has joined #panfrost
<alyssa> I'm giving it the right interference graph sooooo
<alyssa> Ohhh
<alyssa> This is.... delicate...
<alyssa> (the analysis was right in the RA, but pipeline register creation made other stuff simple, hence cascade effect)
pH5 has joined #panfrost
belgin has quit [Quit: Leaving]
herbmilleriw has quit [Quit: Konversation terminated!]
JaceAlvejetti has quit [Remote host closed the connection]
adjtm has quit [Quit: Leaving]
raster has quit [Remote host closed the connection]
<alyssa> .....Texture ops can have ALU outmods on them in Midgard
<alyssa> I give up.
<alyssa> This arch is too weird.
<anarsoul> hehe
<alyssa> anarsoul: Mali-PP too?
<anarsoul> what is outmod?
<alyssa> anarsoul: fsat/etc
<anarsoul> well, you can pass texture fetch result into alu and have modifiers there
<alyssa> anarsoul: Yeah, that makes sense.
<anarsoul> but it's separate instruction
<alyssa> We can have the modifier on the texture op itself.
<alyssa> Somehow.
<anarsoul> alyssa: probably they wanted to make it as flexible as possible
<alyssa> Why? It's more gates..
<anarsoul> no idea
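To illustrate what an outmod on a texture op buys, here is a sketch of fsat-style output modifiers applied directly to each channel of a fetched texel, instead of through a separate ALU instruction as described for Mali-PP. The enum names and values are illustrative only and don't reflect Midgard's actual encoding.

```c
#include <assert.h>

/* Illustrative output modifiers; names and values are hypothetical. */
enum outmod {
        OUTMOD_NONE,
        OUTMOD_SAT,        /* clamp to [0, 1], i.e. fsat */
        OUTMOD_SAT_SIGNED, /* clamp to [-1, 1] */
};

static float clampf(float x, float lo, float hi)
{
        return x < lo ? lo : (x > hi ? hi : x);
}

static float apply_outmod(float x, enum outmod mod)
{
        switch (mod) {
        case OUTMOD_SAT:
                return clampf(x, 0.0f, 1.0f);
        case OUTMOD_SAT_SIGNED:
                return clampf(x, -1.0f, 1.0f);
        case OUTMOD_NONE:
        default:
                return x;
        }
}

/* A texture op carrying its own outmod: the clamp is applied to every
 * channel of the fetched texel with no extra ALU instruction. */
static void tex_with_outmod(const float texel[4], enum outmod mod, float out[4])
{
        for (int i = 0; i < 4; i++)
                out[i] = apply_outmod(texel[i], mod);
}
```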
<anarsoul> I still wonder why Utgard GP was designed that way
<anarsoul> i.e. with pipeline internals exposed
<anarsoul> alyssa: I'd bet on "no one asked software guys"
<alyssa> Hardware guys are the ones who would complain
<anarsoul> well, you never know
<anarsoul> I've seen pretty weird hw designs
herbmilleriw has joined #panfrost
TheKit has quit [Ping timeout: 244 seconds]
TheKit has joined #panfrost
<alyssa> I feel like I'm playing whack-a-mole
* alyssa tries to fudge control flow graphs
davidlt has joined #panfrost
stikonas has joined #panfrost
herbmillerjr has quit [Ping timeout: 248 seconds]
raster has joined #panfrost
davidlt has quit [Ping timeout: 268 seconds]
ente has quit [Ping timeout: 272 seconds]
raster has quit [Remote host closed the connection]
ente has joined #panfrost
pH5 has quit [Quit: -_-]
<alyssa> Woo, direct SSBO writes work.
<alyssa> (Caveat: there are still security issues, so not for prod; broken with helper invocations for now)
<alyssa> Next up is testing direct SSBO reads so I can see if I'm losing my mind
<alyssa> Direct SSBO reads also work (similar caveats)
<alyssa> So next step will be indirect SSBO reads/writes
<alyssa> Which should be easy enough to add
<alyssa> To be implemented.. right after the break :p
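A rough model of the direct/indirect distinction, assuming "direct" means an offset known at compile time and "indirect" means an offset computed at run time (the log doesn't define the terms precisely). `struct ssbo_access` and the helpers are hypothetical; the bounds check is part of the sketch and is not a description of the security issues mentioned above or of how Panfrost addresses them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical SSBO access: a compile-time offset (could be an immediate),
 * optionally plus a run-time offset (would live in a register). */
struct ssbo_access {
        uint32_t const_offset; /* compile-time part, in bytes */
        int has_indirect;      /* nonzero if a run-time offset is added */
};

/* 64-bit sum so a large run-time offset can't wrap past the bounds check. */
static uint64_t ssbo_offset(const struct ssbo_access *a, uint32_t runtime_offset)
{
        return (uint64_t)a->const_offset + (a->has_indirect ? runtime_offset : 0);
}

static int ssbo_load_u32(const uint8_t *buf, size_t size, const struct ssbo_access *a,
                         uint32_t runtime_offset, uint32_t *out)
{
        uint64_t off = ssbo_offset(a, runtime_offset);

        if (off > size || size - off < sizeof(*out))
                return -1;
        memcpy(out, buf + off, sizeof(*out));
        return 0;
}

static int ssbo_store_u32(uint8_t *buf, size_t size, const struct ssbo_access *a,
                          uint32_t runtime_offset, uint32_t value)
{
        uint64_t off = ssbo_offset(a, runtime_offset);

        if (off > size || size - off < sizeof(value))
                return -1;
        memcpy(buf + off, &value, sizeof(value));
        return 0;
}
```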
<alyssa> aaand we're back!
<alyssa> Uh oh, RA is on the fritz
<alyssa> Unfortunately a fix may be a little complex since we only have 2 ld/st regs
<alyssa> but.. 3 sources
<alyssa> It's not a *huge* obstacle since one source is scalar and another is 64-bit only
<alyssa> But... it does mean we need to handle ld/st reg subdivision now
* alyssa just landed a bunch of stuff
<alyssa> Anyway, on to special reg subdivision
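Why subdivision is needed can be shown with a small packer: if each source claimed a whole ld/st register, three sources could never fit in two, but splitting registers into components lets the 64-bit source and the scalar share one. This sketch assumes four 32-bit components per register and that each source is aligned to a multiple of its own size; `pack_ldst_sources()` and `struct ldst_src` are hypothetical, not the Midgard compiler's RA.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_LDST_REGS 2
#define COMPS_PER_REG 4 /* 32-bit components per register (assumed) */

struct ldst_src {
        unsigned comps; /* 1 = scalar, 2 = one 64-bit value, up to 4 */
        unsigned reg;   /* assigned ld/st register */
        unsigned first; /* first assigned component */
};

/* Greedy first-fit in the order given (pass the largest source first).
 * Each source starts at a multiple of its own size (assumed alignment rule).
 * Returns false if the sources don't fit. */
static bool pack_ldst_sources(struct ldst_src *srcs, unsigned count)
{
        unsigned used[NUM_LDST_REGS] = {0}; /* bitmask of occupied components */

        for (unsigned i = 0; i < count; i++) {
                unsigned n = srcs[i].comps;
                unsigned mask = (1u << n) - 1;
                bool placed = false;

                assert(n >= 1 && n <= COMPS_PER_REG);
                for (unsigned r = 0; r < NUM_LDST_REGS && !placed; r++) {
                        for (unsigned c = 0; c + n <= COMPS_PER_REG; c += n) {
                                if (!(used[r] & (mask << c))) {
                                        used[r] |= mask << c;
                                        srcs[i].reg = r;
                                        srcs[i].first = c;
                                        placed = true;
                                        break;
                                }
                        }
                }
                if (!placed)
                        return false;
        }
        return true;
}
```

For a vec4 data source, a 64-bit address and a scalar, the vec4 fills register 0, the address takes components 0-1 of register 1 and the scalar lands in component 2 of register 1.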