#panfrost on 2020-02-16 — irc logs at freenode.irclog.whitequark.org

2019-09-06 11:20 alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature

00:02 raster has joined #panfrost

01:04 stikonas has quit [Remote host closed the connection]

01:10 leper` has joined #panfrost

01:24 raster has quit [Quit: Gettin' stinky!]

02:03 vstehle has quit [Ping timeout: 240 seconds]

02:25 rcf has quit [Quit: WeeChat 2.5]

02:28 rcf has joined #panfrost

03:44 icecream95 has joined #panfrost

03:57 megi has quit [Ping timeout: 240 seconds]

04:37 anarsoul|c has quit [Quit: Connection closed for inactivity]

05:15 icecrea105 has joined #panfrost

05:16 icecream95 has quit [Ping timeout: 268 seconds]

05:20 icecrea105 is now known as icecream95

05:58 davidlt has joined #panfrost

06:00 vstehle has joined #panfrost

06:09 buzzmarshall has quit [Remote host closed the connection]

06:26 anarsoul|c has joined #panfrost

08:40 davidlt has quit [Ping timeout: 268 seconds]

08:49 MoeIcenowy has quit [Quit: ZNC 1.7.2+deb3 - https://znc.in]

08:49 MoeIcenowy has joined #panfrost

09:05 pH5 has joined #panfrost

09:07 JaceAlvejetti has quit [Ping timeout: 248 seconds]

09:09 JaceAlvejetti has joined #panfrost

09:53 megi has joined #panfrost

09:56 Space_Man has joined #panfrost

10:42 daniels has quit [Ping timeout: 240 seconds]

10:43 daniels has joined #panfrost

10:50 raster has joined #panfrost

11:10 stikonas has joined #panfrost

11:33 icecream95 has quit [Ping timeout: 268 seconds]

12:34 adjtm_ has joined #panfrost

12:37 adjtm has quit [Ping timeout: 265 seconds]

13:58 megi has quit [Ping timeout: 240 seconds]

14:29 <alyssa> Meanwhile, sysprof has regressed :|

14:33 * alyssa valgrinds

14:34 <alyssa> sysprof bug looks like.

14:37 <alyssa> Looks to be the same as https://gitlab.gnome.org/GNOME/sysprof/issues/23

14:38 <alyssa> Guess I'll really have to learn perf, eh?

14:56 raster has quit [Quit: Gettin' stinky!]

15:02 pak0st has joined #panfrost

15:44 buzzmarshall has joined #panfrost

16:23 megi has joined #panfrost

16:27 davidlt has joined #panfrost

16:29 karolherbst has joined #panfrost

16:52 pak0st has quit [Remote host closed the connection]

18:15 * alyssa tries to get proper proper mainline

18:16 <alyssa> So far most things seem to work.... except for panfrost :p

18:43 <alyssa> (Fixed)

18:43 * alyssa discovers perf :)

18:53 <anarsoul|c> alyssa: check out flamegraph in addition to perf

19:01 <alyssa> anarsoul|c: alright

19:02 * alyssa revisits scoreboarding now that the compute stuff is mildly sane

19:02 <alyssa> Doing it right is... obnoxiously difficult. But we'll get there.

19:16 pH5 has quit [Read error: Connection reset by peer]

19:20 pH5 has joined #panfrost

19:31 * alyssa has an algorithm sketched out, but u_blitter makes this needlessly complicated.

19:43 nerdboy has joined #panfrost

20:01 <alyssa> 7 files changed, 94 insertions(+), 513 deletions(-)

20:01 <alyssa> oh boy~

20:01 <alyssa> Not done but I digress

20:27 <alyssa> Making forward progress... I think..

20:32 <alyssa> Okay, stuff that doesn't wallpaper works (again)

20:33 yann has joined #panfrost

20:38 yann has quit [Ping timeout: 260 seconds]

20:49 <alyssa> MR opened. I'm quite pleased with how this turned out!

20:49 <alyssa> 9 files changed, 118 insertions(+), 527 deletions(-)

20:49 <alyssa> Lots of cleanup *and* a perf improvement? Count me in ;)

21:18 * alyssa tries to learn `perf annotate`

21:18 davidlt_ has joined #panfrost

21:20 davidlt has quit [Ping timeout: 265 seconds]

21:26 <alyssa> Evidently bitfields are slow.

21:26 davidlt_ has quit [Ping timeout: 265 seconds]

21:31 <urjaman> I'm not exactly surprised

21:31 <alyssa> Actually that wasn't the issue, it was another reading-from-GPU-memory issue. How many of those will we keep catching, idk :c

21:33 <alyssa> -----Actually, it *is* still the issue. Because bitfields necessarily imply reading from memory.

21:33 <HdkR> x86 cheats with bitfields since it has some instructions for helping accessing them directly from memory

21:33 <HdkR> ARM has to do a loadstore shuffle

21:36 <alyssa> HdkR: The issue here isn't the shuffling around per se, it's that reading from GPU mapped memory is stupidly expensive

21:37 <HdkR> ah, uncached then?

21:37 <alyssa> Yeah, for now at least

21:37 <alyssa> When you manually inspect code like `foo.bar = 5;` it's like, cool, that's just a write, no reads here

21:37 <alyssa> but if you look at the assembly level, that has to load the uncached memory to do the dance... and that becomes slow.

21:37 <HdkR> yea, uncached ends up being pretty bad

21:38 <HdkR> usually ends up being worth keeping it cached then doing a dcache flush at the end of whatever you need to do
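The bitfield problem alyssa and HdkR describe can be sketched in C. Assigning to one bitfield member compiles to a read-modify-write of the containing word, so when the struct lives in an uncached GPU mapping every assignment pays for an uncached read. A common workaround, shown here as a minimal sketch, is to fill the struct in ordinary cached memory (a local variable) and copy it to the mapping with a single write-only `memcpy`. The `example_desc` layout and the `fill_desc` helper are hypothetical. They do not correspond to any real Mali descriptor or Panfrost function.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit descriptor, for illustration only. */
struct example_desc {
    uint32_t format  : 8;
    uint32_t swizzle : 12;
    uint32_t flags   : 12;
};

/*
 * Slow on uncached memory: writing `gpu->format = ...` directly through a
 * pointer into the GPU mapping loads the whole 32-bit word, masks in the new
 * bits and stores it back, so each field assignment costs an uncached read.
 *
 * Faster: assemble the descriptor in a local (cached) variable, where the
 * read-modify-write cycles are cheap, then emit it with one copy that only
 * writes to the destination.
 */
static void fill_desc(void *gpu_mapping, unsigned format, unsigned swizzle,
                      unsigned flags)
{
    struct example_desc desc = {0};

    desc.format = format;
    desc.swizzle = swizzle;
    desc.flags = flags;

    memcpy(gpu_mapping, &desc, sizeof(desc));
}
```

HdkR's alternative, keeping the mapping cached and flushing the data cache afterwards, avoids the slow reads at the source instead, but it depends on architecture-specific cache maintenance and is not shown here.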

21:45 <alyssa> oh hey it's register spilling

21:45 <alyssa> isn't that cute

21:45 * alyssa shivers

21:45 <alyssa> (I must say - as far as learning perf goes, this has been extremely educative. Way more productive with this than I've ever been with any other profiler ever and this is day #1. <3)

22:03 <alyssa> On min/max index computation... I see 99% of the time spent loading indices, which absolutely supports the theory that this is a caching issue (so the proposed Gallium-based fix ought to work well)

22:04 <alyssa> Next to the memory access, the actual min'ing and max'ing is effectively free.

22:05 <alyssa> Same thing with the heavy access_tiled_image_generic usage in stk

22:07 <HdkR> vector min/max ends up being three cycles per op, compared to uncached memory accesses it is nothing :P

22:07 <alyssa> true!
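The min/max index computation alyssa profiles amounts to a linear scan like the one below. This is a minimal sketch, assuming 16-bit indices. `index_min_max_u16` is a hypothetical helper, not Mesa's implementation. The comparisons are nearly free. The cost perf attributes to this loop comes from loading `indices`, which is why it matters whether that pointer refers to cached CPU memory or to an uncached GPU mapping.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Scan a 16-bit index buffer for the smallest and largest vertex index.
 * The loop body is two compares; if `indices` points into uncached memory,
 * each load dominates the runtime. For count == 0 the result is
 * min = UINT16_MAX, max = 0 (an empty range).
 */
static void index_min_max_u16(const uint16_t *indices, size_t count,
                              unsigned *min_out, unsigned *max_out)
{
    unsigned lo = UINT16_MAX, hi = 0;

    for (size_t i = 0; i < count; ++i) {
        unsigned v = indices[i];
        if (v < lo)
            lo = v;
        if (v > hi)
            hi = v;
    }

    *min_out = lo;
    *max_out = hi;
}
```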

22:07 <anarsoul> alyssa: btw do you see heavy access_tiled_image_generic usage in weston?

22:08 <anarsoul> I'm seeing it with lima for some reason :(

22:08 <alyssa> anarsoul: not sure, I can look in a bit

22:08 <alyssa> currently in gnome

22:08 <anarsoul> I assume it'd be the same

22:10 <HdkR> It's a sad day that we don't get gather loads on ARM until SVE

22:10 <alyssa> (Note: this is again the same bottleneck we see for WebGL on firefox. Unfortunately I don't think there's much to be done there.)

22:14 TheKit has quit [Read error: Connection reset by peer]

22:16 <alyssa> anarsoul: In weston, I'm seeing the top function be panfrost_store_tiled_image_yes.

22:17 <alyssa> It's just a lot of memory access anyway.

22:17 <anarsoul> alyssa: it's not the case with gnome-shell?

22:17 <alyssa> Apparently not? Dunno

22:18 <anarsoul> I see

22:19 <alyssa> anarsoul: At any rate I suspect something like https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3818 would help

22:19 <anarsoul> I see

22:20 <anarsoul> thanks for the pointer

22:20 <alyssa> If you do an impl for lima, please do check what the win is. But it might be pretty decent for glamor at least :)

22:23 <anarsoul> do you have any specific benchmark in mind?

22:24 <alyssa> anarsoul: the MR linked, and the issue linked from that, talk about ShmPutImage in x11perf, which seems a decent proxy for glamor perf

22:32 <alyssa> tomeu: dj,hgskkgyrs ack, I didn't realize you were *already* working on this in a branch, aaa I didn't mean to duplicate effort >..<

22:33 <alyssa> Not time wasted - I did need to learn perf - but still feel bad :|

22:35 <alyssa> Actually, it looks like most of it is complementary (so conflicts will be "fun" but not strictly duplicated work)

22:53 <daniels> anarsoul: that's weird, we don't ourselves do any readbacks unless you're taking screenshots, we don't use FBOs, and we only do software uploads when software clients give us changed buffers

22:53 <daniels> so it shouldn't be spending a ton of time doing that

22:57 <anarsoul> daniels: according to perf it's coming from gl-renderer, which (indirectly) calls _mesa_TexSubImage2D

23:00 <daniels> anarsoul: right, we do that to upload client content which has been given to us as a SHM buffer

23:00 <daniels> we only do it clipped to the changed region(s), but that means the TexSubImage2D path might not be tile-aligned

23:01 <daniels> alyssa changed Panfrost so that it would only do a partial fallback for unaligned regions (i.e. use the generic unaligned access routine for the sub-tile regions, use the fast routine for the others), rather than doing all the accesses using the generic helper if the region was unaligned

23:03 <daniels> hmm yeah, 2091d311c9d0 applied that fix to the shared code, so it should've helped Lima as well
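daniels' description of the partial fallback can be illustrated by splitting an upload rectangle into one tile-aligned block, which the fast routine handles, and up to four unaligned border strips, which the generic routine handles. This sketch assumes 16x16-pixel tiles and uses hypothetical names (`struct rect`, `split_region`). It is not the code from commit 2091d311c9d0.

```c
#include <stdbool.h>

#define TILE 16u /* assumed tile edge in pixels; must be a power of two */

struct rect { unsigned x, y, w, h; };

static unsigned align_down(unsigned v) { return v & ~(TILE - 1u); }
static unsigned align_up(unsigned v) { return (v + TILE - 1u) & ~(TILE - 1u); }

/*
 * Split r into at most one tile-aligned rectangle (*fast, fast path) and up
 * to four unaligned border strips (slow[], generic path). Returns the number
 * of strips written to slow[]; *has_fast says whether *fast is valid.
 */
static unsigned split_region(struct rect r, struct rect *fast, bool *has_fast,
                             struct rect slow[4])
{
    unsigned x0 = align_up(r.x), y0 = align_up(r.y);
    unsigned x1 = align_down(r.x + r.w), y1 = align_down(r.y + r.h);
    unsigned n = 0;

    if (x1 <= x0 || y1 <= y0) {
        /* No whole tile inside r: everything takes the generic path. */
        *has_fast = false;
        slow[n++] = r;
        return n;
    }

    *has_fast = true;
    *fast = (struct rect){ x0, y0, x1 - x0, y1 - y0 };

    /* Full-width strips above and below the aligned block. */
    if (y0 > r.y)
        slow[n++] = (struct rect){ r.x, r.y, r.w, y0 - r.y };
    if (r.y + r.h > y1)
        slow[n++] = (struct rect){ r.x, y1, r.w, r.y + r.h - y1 };

    /* Side strips, limited to the aligned rows so no pixel is covered twice. */
    if (x0 > r.x)
        slow[n++] = (struct rect){ r.x, y0, x0 - r.x, y1 - y0 };
    if (r.x + r.w > x1)
        slow[n++] = (struct rect){ x1, y0, r.x + r.w - x1, y1 - y0 };

    return n;
}
```

For a clipped damage region such as x=5, y=3, 40x30, only the 16x16 block at (16, 16) goes through the fast routine. The four thin strips around it go through the generic helper, rather than the whole region falling back because its edges are unaligned.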

23:10 NeuroScr has joined #panfrost

23:45 pH5 has quit [Quit: -_-]