#panfrost on 2020-03-04 — irc logs at freenode.irclog.whitequark.org

2019-09-06 11:20 alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature

01:35 stikonas has quit [Remote host closed the connection]

01:37 tgall_foo has quit [Read error: Connection reset by peer]

01:53 jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]

01:53 jernej has joined #panfrost

02:15 <alyssa> anarsoul: If you can do so cleanly, go ahead!

02:15 <alyssa> (Cleanly = without regressing anything in terms of performance/functionality)

02:16 <alyssa> Also, I was later informed that mesa/st has its own index cache that doesn't seem to be used much, so it might be easier to try to promote that to core Gallium and/or have a CAP to make it always be used

02:17 <alyssa> Er, not even mesa/st, the mesa GL itself

02:17 <alyssa> src/mesa/vbo/vbo_minmax_index.c:

02:27 <anarsoul> interesting

02:37 vstehle has quit [Ping timeout: 256 seconds]

03:21 <anarsoul> alyssa: hm, I don't see much difference with minmax cache :(

03:33 icecream95 has joined #panfrost

03:37 jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]

03:39 mearon has quit [Ping timeout: 265 seconds]

03:40 <icecream95> alyssa: I remember seeing vbo_get_minmax_index in a profile for something (STK?) so it is already being used for some things.

03:41 mearon has joined #panfrost

03:42 <icecream95> The function has a duplicate of u_vbuf_get_minmax_index_mapped, but without my optimisation for vectorisation
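
For context, a min/max index scan of the kind being discussed might look roughly like the sketch below. This is not the Mesa code: `minmax_index_u16` is a hypothetical helper, it only handles 16-bit indices, and it ignores primitive restart. icecream95's actual optimisation isn't shown in the log; the sketch only illustrates a loop shape (conditional selects instead of branches) that compilers tend to auto-vectorise.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Mesa function): scan 16-bit indices and
 * report the smallest and largest value seen. Assumes count > 0 and
 * no primitive-restart index in the buffer. */
static void
minmax_index_u16(const uint16_t *indices, size_t count,
                 unsigned *out_min, unsigned *out_max)
{
   unsigned min_v = UINT16_MAX;
   unsigned max_v = 0;

   for (size_t i = 0; i < count; ++i) {
      unsigned v = indices[i];

      /* Selects rather than if-statements: no data-dependent control
       * flow, which usually makes auto-vectorisation easier. */
      min_v = v < min_v ? v : min_v;
      max_v = v > max_v ? v : max_v;
   }

   *out_min = min_v;
   *out_max = max_v;
}
```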

03:43 jernej has joined #panfrost

03:57 buzzmarshall has quit [Remote host closed the connection]

04:03 tgall_foo has joined #panfrost

04:11 QwertyChouskie has joined #panfrost

04:29 QwertyChouskie has quit [Remote host closed the connection]

04:29 QwertyChouskie has joined #panfrost

04:47 icecream95 has quit [Ping timeout: 255 seconds]

04:54 icecream95 has joined #panfrost

05:14 mixfix41 has joined #panfrost

05:55 QwertyChouskie has quit [Ping timeout: 255 seconds]

06:00 vstehle has joined #panfrost

06:18 <icecream95> It looks like WebGL on Firefox 75 is going to have decent performance, at least on Wayland. :)

06:24 mixfix41 has quit [Remote host closed the connection]

06:38 _whitelogger has joined #panfrost

06:40 <icecream95> tomeu: https://mastransky.wordpress.com/2020/03/03/webgl-and-fgx-acceleration-on-wayland/

06:42 <tomeu> nice!

07:15 MastaG has quit [Quit: The Lounge - https://thelounge.chat]

07:19 MastaG has joined #panfrost

07:41 pH5 has quit [Quit: bye]

08:26 yann has quit [Ping timeout: 258 seconds]

08:49 pH5 has joined #panfrost

09:06 mias has joined #panfrost

09:06 mias has quit [Quit: Konversation terminated!]

09:09 stikonas has joined #panfrost

09:09 icecream95 has quit [Ping timeout: 258 seconds]

09:12 mias has joined #panfrost

09:17 MastaG has quit [Quit: The Lounge - https://thelounge.chat]

09:22 yann has joined #panfrost

09:22 stikonas has quit [Remote host closed the connection]

09:47 MastaG has joined #panfrost

11:00 rak-zero has joined #panfrost

11:03 nerdboy has quit [Ping timeout: 255 seconds]

11:37 MastaG has quit [Quit: The Lounge - https://thelounge.chat]

11:41 MastaG has joined #panfrost

12:34 gcl_ has joined #panfrost

12:38 gcl has quit [Ping timeout: 258 seconds]

12:39 apol has joined #panfrost

12:41 <apol> I'm adding the use of eglSetDamageRegionKHR in kwin and I see this weird effect when I add it https://youtu.be/6v7hbMJYO58

12:41 <apol> if I remove the call or pass 0 n_rects, I get perfectly fine behaviour

12:42 <apol> any ideas?

12:42 <apol> (sorry if this is offtopic here, I'm really at a loss right now and I can only test on panfrost)

12:55 warpme_ has quit [Read error: Connection reset by peer]

12:55 warpme_ has joined #panfrost

13:05 <tomeu> bbrezillon may know

13:14 <bbrezillon> apol: are you taking buffer age into account?

13:16 <apol> bbrezillon: yes

13:16 <apol> let me show you the patch

13:17 <apol> bbrezillon: although if I force the damage to be { 0, 0, 1920, 1080 } I get the same issue

13:18 <apol> https://phabricator.kde.org/D27788

13:39 <bbrezillon> apol: did you check the damage rect coordinates origin (the spec says "Coordinates are specified relative to the lower left corner")
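
A minimal sketch of the origin conversion bbrezillon is hinting at, assuming the compositor tracks rectangles with a top-left origin (an assumption; the log doesn't say how kwin stores them). `struct rect`, `flip_rect_y` and `rect_to_egl` are hypothetical names. The output layout is the four integers per rectangle (x, y, width, height) that eglSetDamageRegionKHR reads.

```c
/* Hypothetical rectangle type; the origin depends on who produced it. */
struct rect {
   int x, y, width, height;
};

/* Convert a rectangle whose y is measured from the top edge of the
 * surface into one whose y is measured from the bottom edge. */
static struct rect
flip_rect_y(struct rect r, int surface_height)
{
   struct rect out = r;
   out.y = surface_height - (r.y + r.height);
   return out;
}

/* Flatten into the {x, y, width, height} integer layout, one group of
 * four per rectangle. */
static void
rect_to_egl(struct rect r, int out[4])
{
   out[0] = r.x;
   out[1] = r.y;
   out[2] = r.width;
   out[3] = r.height;
}
```

A rectangle covering the whole surface is unchanged by the flip, which fits with apol seeing the same issue with { 0, 0, 1920, 1080 }: in that case the origin can't be the culprit.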

13:40 <bbrezillon> well, if you set damage rect to cover the whole surface, then you have to redraw everything

13:41 <apol> bbrezillon: I added this change since I couldn't figure out what was going wrong and I get similar output as weston https://invent.kde.org/snippets/748

13:41 <apol> bbrezillon: well we are rendering the same AFAIU, like I said, if I don't call eglSetDamageRegionKHR it all works okay

13:43 <bbrezillon> apol: which is expected since we reload the entire FB into the tile buffer in that case

13:44 <bbrezillon> I guess you only redraw a sub-region if eglSetDamageRegion() is set

13:45 <bbrezillon> if the trace you've added to mesa and some extra traces in kwin, you can make sure the damage rects match

13:46 <bbrezillon> s/if/with/

13:58 <apol> let me look at what we are rendering, thanks

13:58 * apol is feeling n00b

14:23 <alyssa> bifrost branching is complicated~

14:25 <alyssa> Anything to save a bit, I guess.

14:25 <cwabbott> ^ the mantra of the entire bifrost ISA

14:31 <alyssa> cwabbott: :3

14:31 <alyssa> I thought it was "don't do whatever *that* was again *vigorous waving in midgard's general direction*"

14:32 <cwabbott> there are so many clever hacks to squeeze out those bits, though

14:33 <cwabbott> I vaguely remember figuring out the branch condition stuff... the biggest PITA was that the blob compiler sucked ass and didn't use the floating-point comparisons nearly as much as it could've

14:33 <alyssa> Wee.

14:33 <alyssa> Well, I guess that Bifrost motto (the second one) is my philosophy for IR design here

14:34 <alyssa> "You know all the zillions of rewrites you did for midgard? yeah, don't do that again."

14:36 <alyssa> See: register allocation, scheduling, control flow, !32-bit..

14:38 <alyssa> Meanwhile, the blob here is literally emitting FCMP.D3D.OGT.f32 and then BRANCH.EQ.i32.Z

14:39 <alyssa> (which I guess is what you meant)

14:51 <apol> bbrezillon: so the damage traces match, I see on the logs what I expect to see. It's actually these artifacts I start seeing outside the damaged region that I don't understand

14:56 <bbrezillon> apol: does the damage extent (mesa trace you've added) cover the region showing artifacts?

14:57 <apol> bbrezillon: I don't think so, do you know if there's a good tool to debug this?

14:58 <bbrezillon> not that I know

14:59 <bbrezillon> apol: what GPU are you testing on BTW?

14:59 <apol> that's a pinebook pro

14:59 <apol> OpenGL renderer string: Mali T860 (Panfrost)

15:00 <bbrezillon> should work just fine

15:04 <bbrezillon> alyssa: should I move all panfrost_emit_ functions to a new file (pan_cmdstream.{c,h} ?) or should they stay where they are (pan_context.c, pan_varyings.c, pan_compute.c, pan_attributes.c)?

15:07 <bbrezillon> apol: if the artifacts appear outside the damage extent, maybe it's a bug in the damage region tracking logic (you have to merge damage regions of the N last frames, N being the buffer age)
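
The merge bbrezillon describes could be sketched as follows. This is not kwin's code: the names are hypothetical, and a single bounding box stands in for a real multi-rectangle region to keep it short. It assumes history[0] is the damage of the frame being drawn now and history[i] the damage from i frames earlier, so a buffer of age N needs history[0..N-1].

```c
#include <stdbool.h>

/* Hypothetical box type: empty when x2 <= x1 or y2 <= y1. */
struct box {
   int x1, y1, x2, y2;
};

static bool
box_empty(struct box b)
{
   return b.x2 <= b.x1 || b.y2 <= b.y1;
}

/* Smallest box containing both inputs; an empty input is ignored. */
static struct box
box_union(struct box a, struct box b)
{
   if (box_empty(a))
      return b;
   if (box_empty(b))
      return a;

   struct box out = {
      a.x1 < b.x1 ? a.x1 : b.x1,
      a.y1 < b.y1 ? a.y1 : b.y1,
      a.x2 > b.x2 ? a.x2 : b.x2,
      a.y2 > b.y2 ? a.y2 : b.y2,
   };
   return out;
}

/* age == 0 means the buffer contents are undefined, and an age beyond
 * the recorded history can't be reconstructed: repaint everything in
 * both cases. Otherwise merge the damage of the last `age` frames. */
static struct box
repaint_box(const struct box *history, int history_len, int age,
            struct box full)
{
   if (age <= 0 || age > history_len)
      return full;

   struct box acc = history[0];
   for (int i = 1; i < age; ++i)
      acc = box_union(acc, history[i]);
   return acc;
}
```

Passing only the newest entry of the history corresponds to age 1; with an older buffer, everything outside that entry keeps stale content.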

15:14 <bbrezillon> apol: I'd say that you should pass 'region' to regionToRects(), not 'output.damageHistory.constFirst()'

15:14 <apol> bbrezillon: I've already tried this

15:15 <apol> there's something else wrong

15:15 <apol> but I'm sadly not very familiar with our codebase, which doesn't help

15:17 yann has quit [Ping timeout: 260 seconds]

16:41 <apol> bbrezillon: yep, I was creating the rects wrong :'(

16:57 <apol> bbrezillon: it's working great now, thanks for the patience :)

17:00 apol has quit [Remote host closed the connection]

17:04 gcl_ has quit [Ping timeout: 240 seconds]

17:06 gcl has joined #panfrost

17:43 nerdboy has joined #panfrost

17:58 pH5 has quit [Quit: bye]

18:04 <alyssa> bbrezillon: Up to you, whatever you think is easier / less churn :)

18:33 pH5 has joined #panfrost

18:48 rak-zero has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

18:51 stikonas has joined #panfrost

18:59 gcl has quit [Ping timeout: 260 seconds]

19:01 gcl has joined #panfrost

19:10 gcl has quit [Ping timeout: 256 seconds]

19:12 gcl has joined #panfrost

19:21 adjtm_ has joined #panfrost

19:24 adjtm has quit [Ping timeout: 256 seconds]

19:26 NeuroScr has quit [Ping timeout: 256 seconds]

19:37 mixfix41 has joined #panfrost

19:50 _whitelogger has joined #panfrost

20:04 anarsoul|c has quit [Ping timeout: 256 seconds]

20:04 anarsoul|c has joined #panfrost

20:04 _whitelogger has quit [Ping timeout: 256 seconds]

20:07 _whitelogger has joined #panfrost

20:11 nerdboy has quit [Ping timeout: 256 seconds]

20:13 gcl_ has joined #panfrost

20:16 mixfix41 has quit [Remote host closed the connection]

20:16 gcl has quit [Ping timeout: 258 seconds]

20:40 mixfix41 has joined #panfrost

20:40 gcl_ has quit [Ping timeout: 256 seconds]

20:40 gcl has joined #panfrost

20:45 buzzmarshall has joined #panfrost

21:10 mias has quit [Ping timeout: 256 seconds]

21:23 TheKit has joined #panfrost

22:10 <anarsoul> alyssa: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4051

22:40 karolherbst has quit [Quit: duh 🐧]

22:50 karolherbst has joined #panfrost

22:51 pH5 has quit [Ping timeout: 256 seconds]

22:53 QwertyChouskie has joined #panfrost

23:03 megi has quit [Ping timeout: 260 seconds]

23:09 robmur01_ has joined #panfrost

23:10 robmur01 has quit [Read error: Connection reset by peer]

23:26 megi has joined #panfrost

23:35 QwertyChouskie has quit [Ping timeout: 260 seconds]

23:35 megi has quit [Ping timeout: 268 seconds]

23:37 megi has joined #panfrost

23:40 <alyssa> anarsoul: fwiw, the win of that series was specifically for gles3 supertuxkart

23:40 <alyssa> gles2 supertuxkart is bottlenecked on *user* indices

23:41 <alyssa> which can also be optimized a bit but not via caching, the patch that helped there I dropped since it was kinda sketchy

23:41 <anarsoul> oh

23:41 <anarsoul> I wonder if disabling user indices would help

23:43 <alyssa> anarsoul: I don't believe that's an option iirc

23:44 <alyssa> anarsoul: For gl2.1 supertuxkart anyway, you might try something like https://people.collabora.com/~alyssa/0001-panfrost-Add-minmax-upload-path-for-user-buffers.patch

23:45 <alyssa> patch was dropped since 1) it doesn't handle other sizes but you could fix that and 2) I never got around to doing good benchmarking and from the asm I'm not convinced it's actually a win

23:45 <alyssa> Theoretically it should definitely be a win since it amortizes the cost of reading user indices so it'll save [size of index buffer] in read traffic

23:46 <alyssa> But in practice I don't know if `for (i < len) out[i] = in[i]` is as efficient as memcpy ... obviously a compiler on -O3 should be able to vectorize it and get close but memcpy I think can have some hand asm tricks on some platforms, and I don't know if arm32/64 do that

23:46 <alyssa> If it's a bottleneck for you, though, worth a look
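
To make the trade-off concrete, here is a rough sketch of the two approaches alyssa is weighing. It is not the linked patch: both function names are hypothetical and only 16-bit indices are handled. The fused version reads the user indices once; the two-pass version lets memcpy do the copy and then scans the copy, so the data is read twice. Which one is faster would need benchmarking, as the log says.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fused: copy each index and track min/max in the same pass.
 * Assumes count > 0 and that dst and src do not overlap. */
static void
copy_minmax_fused_u16(uint16_t *dst, const uint16_t *src, size_t count,
                      unsigned *out_min, unsigned *out_max)
{
   unsigned min_v = UINT16_MAX;
   unsigned max_v = 0;

   for (size_t i = 0; i < count; ++i) {
      uint16_t v = src[i];
      dst[i] = v;
      min_v = v < min_v ? v : min_v;
      max_v = v > max_v ? v : max_v;
   }

   *out_min = min_v;
   *out_max = max_v;
}

/* Two-pass: let the C library copy, then scan the destination. */
static void
copy_minmax_twopass_u16(uint16_t *dst, const uint16_t *src, size_t count,
                        unsigned *out_min, unsigned *out_max)
{
   unsigned min_v = UINT16_MAX;
   unsigned max_v = 0;

   memcpy(dst, src, count * sizeof(*src));

   for (size_t i = 0; i < count; ++i) {
      unsigned v = dst[i];
      min_v = v < min_v ? v : min_v;
      max_v = v > max_v ? v : max_v;
   }

   *out_min = min_v;
   *out_max = max_v;
}
```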

23:49 <HdkR> ARM64 has some major memcpy optimizations in the libraries

23:49 <anarsoul> iirc supertuxkart spends quite some time in min/max calculation

23:49 <anarsoul> but not q3a

23:50 <alyssa> anarsoul: yeah, but it's not clear if the bottleneck is actually the min/max ALU side or just the I/O of reading all that data

23:50 <alyssa> HdkR: significantly better than an unrolled NEON vectorized load/store then?

23:53 <HdkR> https://github.com/lattera/glibc/blob/master/sysdeps/aarch64/memcpy.S It uses 128bit paired loadstores which saturates the memory

23:53 <HdkR> Unrolled loops as well

23:54 <HdkR> NEON loadstores doesn't help significantly here since there isn't a consumer ARM CPU that has a 512bit loadstore pipeline

23:55 <anarsoul> alyssa: I think it's already in cache after util_upload_index_buffer() so it shouldn't be I/O