#panfrost on 2019-08-02 — irc logs at freenode.irclog.whitequark.org

2019-02-15 17:52 alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature

01:08 mearon has quit [Ping timeout: 246 seconds]

01:08 embed-3d has quit [Ping timeout: 248 seconds]

01:08 embed-3d has joined #panfrost

01:09 mearon has joined #panfrost

01:27 vstehle has quit [Ping timeout: 268 seconds]

01:36 herbmilleriw has quit [Ping timeout: 258 seconds]

01:41 herbmilleriw has joined #panfrost

02:06 herbmilleriw has quit [Remote host closed the connection]

03:14 _whitelogger has joined #panfrost

04:06 davidlt has joined #panfrost

05:00 vstehle has joined #panfrost

05:26 <tomeu> alyssa: fdo just allocates a BO per shader, any reason not to do that, now that we have the BO cache?

05:29 chewitt has joined #panfrost

05:33 <tomeu> bbrezillon: are you happy now with how the polygon list work is looking?

06:18 <bbrezillon> tomeu: haven't finished yet

06:18 <bbrezillon> but I'm making progress

06:36 <EmilKarlson> bbrezillon: is bootlin aware that linux-5.3 rockchip-drm is perhaps 11x slower compared to 5.2

06:37 <EmilKarlson> x11perf -tilerect500

06:38 <EmilKarlson> Xorg without glamor

06:52 <bbrezillon> EmilKarlson: I don't know (I no longer work for bootlin :))

06:53 <EmilKarlson> ah ok, thanks

06:53 <bbrezillon> EmilKarlson: what's slower?

06:53 <EmilKarlson> I checked the commit log, it was very bootliny

06:53 <EmilKarlson> mostly virtual desktop change on xmonad

06:54 <EmilKarlson> x11perf -tilerect500 gave me the numbers

06:54 <EmilKarlson> window redraw in general

06:54 <EmilKarlson> rxvt-unicode scrolling or whatever seems to do full window updates

06:56 <bbrezillon> you mean panfrost is slower, right?

06:57 <EmilKarlson> no, this is without anyone making any requests to the gpu afaik

06:58 <EmilKarlson> Xorg without glamor, as mentioned

06:58 <bbrezillon> ok

06:59 <bbrezillon> v5.2 vs v5.3-rc1 ?

07:00 <EmilKarlson> yes, latest comparison with 5.2.5 and 5.3-rc2

07:03 <EmilKarlson> I don't strictly have numbers for -rc1 and v5.2, but subjectively measured slowdown was discussed on #linux-rockchip

07:04 <bbrezillon> you're using the emulated fbdev or the KMS interface?

07:04 <EmilKarlson> I believe kms, I can check later, whatever Xorg selects by default on debian buster

07:05 <bbrezillon> just had a quick look at the commit log

07:06 <bbrezillon> and the only commit that could potentially be harmful is 6c83ca795f2c ("drm/rockchip: Use dirtyfb helper")

07:06 <bbrezillon> you can try reverting that one

07:07 <EmilKarlson> thanks, will do, though have to work for a few hours now

07:07 <EmilKarlson> obviously regressions are not strictly limited to inside rockchip-drm

07:08 <EmilKarlson> for rt2x00 I actually reverted the whole driver to fix regression there

07:08 <EmilKarlson> not sure, if that would work for rockchip-drm

07:17 <bbrezillon> EmilKarlson: well, if there's a perf regression, we want to know where it comes from

07:18 <bbrezillon> reverting the driver to its v5.2 state doesn't help

07:25 chewitt has quit [Quit: Adios!]

07:33 <EmilKarlson> you tried already

07:33 <EmilKarlson> or I mean testing revert will help exclude other causes

07:34 <bbrezillon> no, I mean it doesn't help us figure out which commit is causing that

07:34 <bbrezillon> and no, I haven't tried

07:38 <tomeu> EmilKarlson: what about bisecting?

07:42 <EmilKarlson> well it's about the same thing

07:42 <EmilKarlson> but perhaps at some point

07:49 <EmilKarlson> I mean, whatever helps exclude causes

07:51 <EmilKarlson> if hypothesis is that "only commit that could potentially be harmful is 6c83ca795f2c" in rockchip-drm it means either reverting that commit helps, the regression is outside the driver or reverting driver helps, unless there is compatibility issue

07:51 <EmilKarlson> s/or/and/

07:52 <EmilKarlson> and does not help, whatever

07:59 _whitelogger has joined #panfrost

08:00 <tomeu> well, if using git-bisect, you would be bisecting the whole kernel

08:00 <tomeu> guess it could be a change in the clock configuration, DDR, devfreq, etc

08:00 <EmilKarlson> true

08:01 <EmilKarlson> but that's a lot of work on the kernel that has more than one regression per system I tested

08:09 pH5 has joined #panfrost

08:23 <EmilKarlson> and also git bisect accepts paths

08:30 yann has quit [Ping timeout: 246 seconds]

09:23 <bbrezillon> tomeu: it's ready => https://gitlab.freedesktop.org/tomeu/mesa/pipelines/53032

09:29 <tomeu> bbrezillon: awesome, that's going to help a lot :)

09:29 <tomeu> if we only had that for 19.2, I would be already happy :)

09:29 <tomeu> but let's see if I manage to put NOEXEC and HEAP support in it as well

09:31 <tomeu> bbrezillon: any idea what it means to hit this warning? https://elixir.bootlin.com/linux/latest/source/drivers/iommu/io-pgtable-arm.c#L325

09:32 <bbrezillon> tomeu: nope

09:34 <tomeu> hmm, it's XWayland trying to import a buffer

09:37 <tomeu> now I got it as well when chromium creates a buffer

10:05 yann has joined #panfrost

10:29 megi has joined #panfrost

10:33 jcureton has quit [Remote host closed the connection]

10:58 raster has joined #panfrost

12:35 davidlt has quit [Ping timeout: 258 seconds]

12:36 davidlt has joined #panfrost

12:37 adjtm has quit [Ping timeout: 245 seconds]

13:28 jcureton has joined #panfrost

13:35 adjtm has joined #panfrost

13:37 <tomeu> robher: your heap+noexec branch looks good in my testing here

13:39 <robher> tomeu: great! I should get the next version sent out today. Also, I have madvise patches about ready.

13:41 <tomeu> awesome, we have gotten memory usage really low

13:42 <tomeu> bbrezillon: I think panfrost should work much better now on your 1GB board

13:42 <tomeu> well, once everything lands :)

13:47 <bbrezillon> tomeu: did you have a look at the armhf/rk3288 test results?

13:47 <bbrezillon> tomeu: does any of what we've done help with the flip/flop issues we had?

13:48 <tomeu> bbrezillon: in which branch?

13:51 megi has quit [Ping timeout: 245 seconds]

13:56 <tomeu> my branch has a weird crash when the EGL context is destroyed

13:56 <tomeu> https://gitlab.freedesktop.org/tomeu/mesa/-/jobs/467519

13:56 <tomeu> cannot reproduce here though

14:11 hlmjr has joined #panfrost

14:13 herbmillerjr has quit [Ping timeout: 248 seconds]

14:15 JaceAlvejetti has joined #panfrost

14:27 <bbrezillon> tomeu: I didn't have any particular branch in mind

14:27 <tomeu> ah, I see

14:27 <bbrezillon> was just wondering if the work that's been pushed during the last 4 weeks had helped getting some of those problems fixed

14:27 <tomeu> not long ago I checked and the flip-flops were still there

14:27 <tomeu> and I can see that the perennial unmasked flip-flops are still there

14:28 <bbrezillon> :-(

14:28 <tomeu> I think there's some difference in the cmdstream that needs to be addressed

14:28 <tomeu> but in my local testing, rk3288 works quite fine here

14:28 <tomeu> (I debug most of the time on a veyron)

14:48 <tomeu> alyssa, bbrezillon: was quite happy with this branch regarding reduced memory usage: https://gitlab.freedesktop.org/tomeu/mesa/commits/panfrost-ci-noexec

14:48 <tomeu> but I get a crash just after the last test and I cannot reproduce locally

14:50 <tomeu> trying now with a debug build, so I get a better backtrace

14:53 <alyssa> tomeu: The problem with a BO-per-shader is twofold

14:53 <alyssa> One is that allocating BOs is expensive and the BO cache can't save the upfront cost, lots of overhead

14:54 <alyssa> Two is that executable memory, IIRC, has some funky alignment reqs in the kernel so you'd be wasting memory and/or fragmenting stuff? But maybe that's not too terrible in practice
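
A rough way to see the second concern is to compare the total footprint of one BO per shader against packing shaders into a shared pool. The sketch below is only an illustration: the function names are made up, and the alignment values are caller-supplied assumptions, not the kernel's actual requirements for executable memory.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to a power-of-two alignment. */
static size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Total bytes if every shader gets its own BO, each rounded up to
 * bo_align (a hypothetical per-BO alignment chosen by the caller). */
size_t bytes_one_bo_per_shader(const size_t *sizes, size_t count,
                               size_t bo_align)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += align_up(sizes[i], bo_align);
    return total;
}

/* Total bytes if shaders are packed back to back into one pool BO,
 * each padded only to shader_align, with the pool itself rounded up
 * to bo_align. Both alignments are assumptions for illustration. */
size_t bytes_pooled(const size_t *sizes, size_t count,
                    size_t shader_align, size_t bo_align)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += align_up(sizes[i], shader_align);
    return align_up(total, bo_align);
}
```

With three small shaders and an assumed 4096-byte BO alignment, the per-shader scheme pays for three full units while the pooled scheme fits in one; whether that matters in practice is exactly the open question in the log.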

14:56 <alyssa> tomeu: Memory usage reduction is from HEAP, yeah?

15:03 JaceAlvejetti has quit [Remote host closed the connection]

15:03 JaceAlvejetti has joined #panfrost

15:06 davidlt has quit [Ping timeout: 246 seconds]

15:21 megi has joined #panfrost

15:36 <bbrezillon> alyssa: regarding the ctx->job field, do you think avoiding the job lookup in the hash table makes a huge difference?

15:36 <alyssa> bbrezillon: Huge? No. But it does get called very frequently and hash lookups aren't free. I've seen it show up as taking some nontrivial time in sysprof but certainly not the bottleneck.

15:37 <alyssa> Not going to make or break anything, but might as well get it right

15:41 wens has quit [Ping timeout: 268 seconds]

15:45 pH5 has quit [Quit: bye]

15:50 <alyssa> So, working on better uniform allocation

15:51 <alyssa> If I just cap it at 8 registers, it's actually quite a win

15:51 <alyssa> https://people.collabora.com/~alyssa/big-win

15:51 <alyssa> The issue is that now we, well, only have 8 registers available -> register spilling

15:51 <alyssa> Uniform spilling is cheaper than register spilling, so let's handle that.
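
One simple policy consistent with the cap described above would keep the first few uniforms in uniform registers and fetch the rest with an explicit load. This is a sketch under that assumption only; the names are hypothetical, and the actual compiler may choose which uniforms to keep by some other criterion (usage count, for instance).

```c
#include <assert.h>

/* The cap mentioned in the log; everything else here is hypothetical. */
#define UNIFORM_REG_CAP 8

enum uniform_access {
    UNIFORM_IN_REGISTER,   /* read directly from a uniform register */
    UNIFORM_EXPLICIT_LOAD  /* "spilled": fetched with a load instruction */
};

/* Index-based policy: uniforms below the cap stay in registers. */
enum uniform_access classify_uniform(unsigned index, unsigned cap)
{
    return index < cap ? UNIFORM_IN_REGISTER : UNIFORM_EXPLICIT_LOAD;
}

/* Number of uniforms that need an explicit load under this policy. */
unsigned count_spilled_uniforms(unsigned uniform_count, unsigned cap)
{
    assert(cap > 0);
    return uniform_count > cap ? uniform_count - cap : 0;
}
```

A shader with 11 uniforms and a cap of `UNIFORM_REG_CAP` would read 8 of them from registers and load the remaining 3 explicitly.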

16:11 yann has quit [Ping timeout: 244 seconds]

16:18 <alyssa> Uniform spilling implemented. I'm quite happy with the results + no loss in performance.

16:18 <alyssa> Er, no losses in shader-db I mean

16:18 <alyssa> ---Uh oh regression city

16:23 belgin has joined #panfrost

16:24 <alyssa> Huh. Actually, spilling might be a win here.

16:24 <bbrezillon> alyssa: added some assert()s to make sure ctx->job is consistent, and it's not

16:25 <bbrezillon> (even after I added the ctx->job = job; at the end of the get_fbo_job() func)

16:25 <alyssa> *blink*

16:25 <bbrezillon> looks like panfrost_get_job_for_fbo() gets called when ctx->pipe_framebuffer is still zeroed

16:26 <bbrezillon> which fills the ctx->job entry with a dummy FBO job

16:27 <alyssa> Grah

16:27 <bbrezillon> and when the function is called again, this time with a valid pipe_framebuffer state, the implementation returns the dummy job

16:27 <alyssa> bbrezillon: You know I rather detest Gallium/OpenGL, right?

16:28 <alyssa> bbrezillon: Anyways, why not add "ctx->job = NULL" at the beginning of set_framebuffer_state?

16:28 herbmillerjr has joined #panfrost

16:28 <bbrezillon> I did that too

16:29 <bbrezillon> not at the beginning though

16:29 hlmjr has quit [Ping timeout: 245 seconds]

16:32 <bbrezillon> alyssa: that's it, was done too late in the set_framebuffer_state() func

16:32 <bbrezillon> thx
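
The bug discussed above (a cached job created for a zeroed framebuffer being returned later for a valid one) can be sketched as follows. The structures and the linear-search "hash table" are simplified stand-ins invented for illustration, not the real Mesa code; only the idea of clearing the cached pointer first thing in set_framebuffer_state() comes from the log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are not the real Mesa structures. */
struct fb_state {
    unsigned width, height;
};

struct job {
    struct fb_state key; /* framebuffer state this job was created for */
    int in_use;
};

#define MAX_JOBS 8

struct context {
    struct fb_state framebuffer; /* currently bound framebuffer state */
    struct job *job;             /* cached job for that state, or NULL */
    struct job jobs[MAX_JOBS];   /* stand-in for the per-FBO hash table */
};

static int fb_equal(const struct fb_state *a, const struct fb_state *b)
{
    return a->width == b->width && a->height == b->height;
}

/* Slow path: a linear search standing in for the hash table lookup.
 * Creates an entry on a miss; returns NULL if the table is full. */
static struct job *lookup_job(struct context *ctx, const struct fb_state *key)
{
    for (size_t i = 0; i < MAX_JOBS; i++)
        if (ctx->jobs[i].in_use && fb_equal(&ctx->jobs[i].key, key))
            return &ctx->jobs[i];

    for (size_t i = 0; i < MAX_JOBS; i++) {
        if (!ctx->jobs[i].in_use) {
            ctx->jobs[i].key = *key;
            ctx->jobs[i].in_use = 1;
            return &ctx->jobs[i];
        }
    }
    return NULL;
}

/* Fast path first: reuse the cached job if there is one. */
struct job *get_job_for_fbo(struct context *ctx)
{
    if (ctx->job) {
        /* Consistency check in the spirit of the assert()s from the log. */
        assert(fb_equal(&ctx->job->key, &ctx->framebuffer));
        return ctx->job;
    }
    ctx->job = lookup_job(ctx, &ctx->framebuffer);
    return ctx->job;
}

void set_framebuffer_state(struct context *ctx, const struct fb_state *fb)
{
    /* Invalidate the cache before anything else, so nothing called
     * later in this function can pick up the stale job. */
    ctx->job = NULL;
    ctx->framebuffer = *fb;
}
```

In this sketch, calling get_job_for_fbo() on a zero-initialized context creates a dummy job; after set_framebuffer_state() with a real size, the next call returns a different job keyed on the new state, and repeated calls hit the cached pointer.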

16:33 <EmilKarlson> bbrezillon: initially it seems your hypothesis of 6c83ca795f2c causing the performance regression is correct

16:33 <alyssa> bbrezillon: +1

16:39 herbmilleriw has joined #panfrost

16:49 herbmilleriw has quit [Quit: Konversation terminated!]

16:53 herbmilleriw has joined #panfrost

16:53 <bbrezillon> alyssa: anything you want me to address in patch 6 and 8?

16:55 * alyssa eyes

16:56 <alyssa> bbrezillon: I guess it's fine... not excited about More things to fix for pipelining but..

16:56 <alyssa> 8 is R-b, just a question

16:57 herbmilleriw has quit [Client Quit]

17:09 yann has joined #panfrost

17:17 <alyssa> My RA bug sense is a tingling

17:24 herbmilleriw has joined #panfrost

17:28 <alyssa> I'm giving it the right interference graph sooooo

17:32 <alyssa> Ohhh

17:32 <alyssa> This is.... delicate...

17:34 <alyssa> (the analysis was right in the RA, but pipeline register creation made other stuff simple, hence cascade effect)

17:43 pH5 has joined #panfrost

17:45 belgin has quit [Quit: Leaving]

17:55 herbmilleriw has quit [Quit: Konversation terminated!]

18:06 JaceAlvejetti has quit [Remote host closed the connection]

18:15 adjtm has quit [Quit: Leaving]

18:20 raster has quit [Remote host closed the connection]

18:38 <alyssa> .....Texture ops can have ALU outmods on them in Midgard

18:38 <alyssa> I give up.

18:38 <alyssa> This arch is too weird.

18:39 <anarsoul> hehe

18:39 <alyssa> anarsoul: Mali-PP too?

18:40 <anarsoul> what is outmod?

18:40 <alyssa> anarsoul: fsat/etc

18:41 <anarsoul> well, you can pass texture fetch result into alu and have modifiers there

18:41 <alyssa> anarsoul: Yeah, that makes sense.

18:41 <anarsoul> but it's separate instruction

18:41 <alyssa> We can have the modifier on the texture op itself.

18:41 <alyssa> Somehow.
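
For readers unfamiliar with the term, an output modifier such as fsat post-processes an instruction's result before it is written. The sketch below shows the effect on a single float; the enum and function names are hypothetical, and it says nothing about how the hardware encodes the modifier on a texture op.

```c
/* Hypothetical names for illustration; not the hardware encoding. */
enum outmod {
    OUTMOD_NONE, /* write the result unchanged */
    OUTMOD_SAT   /* fsat: clamp the result to [0, 1] */
};

/* Apply an output modifier to one scalar result. */
float apply_outmod(float value, enum outmod mod)
{
    switch (mod) {
    case OUTMOD_SAT:
        /* Written so that NaN falls into the first branch and becomes 0. */
        if (!(value > 0.0f))
            return 0.0f;
        if (value > 1.0f)
            return 1.0f;
        return value;
    case OUTMOD_NONE:
    default:
        return value;
    }
}
```

Having the modifier on the texture op itself, as described above, would be equivalent to feeding the fetched texel through apply_outmod() without a separate ALU instruction.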

18:42 <anarsoul> alyssa: probably they wanted to make it as flexible as possible

18:42 <alyssa> Why? It's more gates..

18:42 <anarsoul> no idea

18:43 <anarsoul> I still wonder why Utgard GP was designed that way

18:43 <anarsoul> i.e. with pipeline internals exposed

18:44 <anarsoul> alyssa: I'd bet on "no one asked software guys"

18:44 <alyssa> Hardware guys are the ones who would complain

18:45 <anarsoul> well, you never know

18:46 <anarsoul> I've seen pretty weird hw designs

18:54 herbmilleriw has joined #panfrost

18:55 TheKit has quit [Ping timeout: 244 seconds]

18:56 TheKit has joined #panfrost

19:22 <alyssa> I feel like I'm playing whack-am-ole

19:22 <alyssa> whack-a-mole

20:07 * alyssa tries to fudge control flow graphs

20:34 davidlt has joined #panfrost

20:43 stikonas has joined #panfrost

21:39 herbmillerjr has quit [Ping timeout: 248 seconds]

21:58 raster has joined #panfrost

22:26 davidlt has quit [Ping timeout: 268 seconds]

22:33 ente has quit [Ping timeout: 272 seconds]

22:33 raster has quit [Remote host closed the connection]

22:37 ente has joined #panfrost

22:41 pH5 has quit [Quit: -_-]

23:07 <alyssa> Woo, direct SSBO writes work.

23:07 <alyssa> (Caveat: security issues still so not for prod, broken with helper invocations for now)

23:07 <alyssa> Next up is testing direct SSBO reads so I can see if I'm losing my mind

23:09 <alyssa> Direct SSBO reads also work (similar caveats)

23:09 <alyssa> So next step will be indirect SSBO reads/writes

23:09 <alyssa> Which should be easy enough to add

23:11 <alyssa> To be implemented.. right after the break :p

23:30 <alyssa> aaand we're back!

23:37 <alyssa> Uh oh, RA is on the fritz

23:42 <alyssa> Unfortunately a fix may be a little complex since we only have 2 ld/st regs

23:42 <alyssa> but.. 3 sources

23:42 <alyssa> It's not a *huge* obstacle since one source is scalar and another is 64-bit only

23:43 <alyssa> But... it does mean we need to handle ld/st reg subdivision now

23:56 * alyssa just landed a bunch of stuff

23:57 <alyssa> Anyway, so to special reg subdivision