alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
_whitelogger has joined #panfrost
<alyssa> HdkR: lol
* alyssa returns
<alyssa> Oh, a thousand regressions. Wee.
vstehle has quit [Ping timeout: 248 seconds]
embed-3d has quit [Ping timeout: 258 seconds]
<alyssa> Yes!
* alyssa just did her first compute-like thing
<alyssa> A shader reading from an SSBO in a loop, doing a transform, and writing back
<alyssa> So demonstrating indirect reads/writes from an SSBO (and not kerplutzing with CF)
<alyssa> Next up will be decrappening this series which could... take time...
<alyssa> Also, implementing some of the compute builtins so I can start running deqp tests
<alyssa> So, IDs:
<alyssa> get_global_id(0) =
<alyssa> ld_global_id r0.x, 0.xxxx, 0x18, 0x0
<alyssa> + r23.z
<alyssa> I'm trying to figure out what r23.z means here, it's zero here
<alyssa> get_global_id(1) is the same, but 0.yyyy
<alyssa> So `ld_global_id` loads the whole vector at once, convenient
<alyssa> Arguably weirder is local_id, which is as follows
<alyssa> ld_global_id r0.x, 0.xxxx, 0x10, 0x0
<alyssa> No addition needed.
<alyssa> I'll have to read the OpenCL spec to grok the difference tho
<alyssa> get_group_id(0) is
<alyssa> ld_global_id r0.x, 0.xxxx, 0x14, 0x0
<alyssa> and then plus r23.z again
<alyssa> Again, that uniform is *zero* so what am I missing
<HdkR> Are you only kicking off one work group?
<alyssa> HdkR: (2, 2, 1) it looks like
<alyssa> Maybe I'll have to up that :p
<alyssa> get_num_groups is just a uniform
<alyssa> get_local_size too
<alyssa> aaaand this will resume tomorrow
<alyssa> Later y'all
hlmjr has quit [Quit: Konversation terminated!]
herbmillerjr has joined #panfrost
<anarsoul> alyssa: did you have to do any magic to get perf working on arm64?
NeuroScr has quit [Quit: NeuroScr]
NeuroScr has joined #panfrost
herbmilleriw has quit [Remote host closed the connection]
herbmilleriw has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
vstehle has joined #panfrost
herbmilleriw has quit [Remote host closed the connection]
herbmilleriw has joined #panfrost
<tomeu> anarsoul: for me it just worked
<tomeu> (on debian with 5.1+)
<tomeu> I just used the perf in debian, probably perf_4.19
<anarsoul> tomeu: doesn't work for me with perf 5.2.1 from archlinux nor self-compiled perf-5.2.5
<anarsoul> perf top is just silent
<tomeu> hmm, that has never happened to me
<tomeu> anarsoul: can only think of some kconfig, what about CONFIG_PERF_EVENTS=y ?
<anarsoul> it's set
yann has quit [Ping timeout: 258 seconds]
davidlt has joined #panfrost
NeuroScr has joined #panfrost
davidlt has quit [Ping timeout: 245 seconds]
yann has joined #panfrost
megi has quit [Ping timeout: 246 seconds]
megi has joined #panfrost
adjtm has quit [Ping timeout: 272 seconds]
davidlt has joined #panfrost
adjtm has joined #panfrost
davidlt has quit [Quit: Leaving]
<shadeslayer> alyssa: awesome, shall I send it to the ML ?
pH5 has joined #panfrost
_whitelogger has joined #panfrost
chewitt has quit [Quit: Adios!]
afaerber has quit [Quit: Leaving]
yann has quit [Ping timeout: 272 seconds]
buzzmarshall has joined #panfrost
<alyssa> anarsoul: I couldn't get it working; I just used sysprof (which turned out to be awesome)
<alyssa> shadeslayer: Do you have results from the CI test group I listed?
<alyssa> s/CI/CTS/
<anarsoul> alyssa: I got it working, but I had to disable PMU node in dts
<anarsoul> looks like PMU is broken in Allwinner A64
<TheCycoTWO> alyssa: what was the result of your gnome bottleneck profiling?
herbmilleriw has quit [Quit: Konversation terminated!]
<alyssa> TheCycoTWO: G-S JavaScript
<alyssa> Rewriting GNOME in a language that was designed in more than 10 days is outside of my internship scope ;)
<TheCycoTWO> ug
<alyssa> Yeah...
<alyssa> TheCycoTWO: It's possible the GPU is slowing down and sysprof isn't showing that but I.. somehow doubt that terribly...
<alyssa> In the benchmarks with the largest disparity with the blob, we're a bit above 1/2 the blob's fps
<alyssa> Which is annoying, but... on a desktop compositor, that means GPU isn't going to be the bottleneck, esp. when CPU usage is through the roof
<tomeu> wonder if the lack of parallel GPU jobs couldn't be impacting the user experience
<alyssa> tomeu: Pipelining you mean?
herbmilleriw has joined #panfrost
<tomeu> guess so, depending on what one means by that :)
<alyssa> tomeu: Two:
<tomeu> basically, we only send to the GPU one job at a time
<alyssa> 1) Running two FBOs concurrently if they don't depend on each other
<alyssa> 2) Running two frames of the same FBO concurrently, doing the VS shader of frame N+1 while running the FS of frame N
<tomeu> well, I think the GPU can schedule jobs
<tomeu> so the kernel can send to the HW more jobs than can be run at a given time
<tomeu> so we could save a bit of time there as well
<alyssa> tomeu: Job (JOB) or batch (panfrost_job)?
* alyssa really needs to do that s/panfrost_job/panfrost_batch/g already
<tomeu> alyssa: batches
<alyssa> tomeu: Ah. I don't believe the GPU can schedule batches.
<alyssa> In kbase, you could submit multiple batches from userspace at once with a dep graph
<alyssa> But afaik, that was scheduled in kernel-space
<alyssa> The GPU scheduler (based on scoreboarding) is there to order jobs within a batch
<daniels> kernel-space scheduling at least lets userspace get on with doing CPU-side prep work for more GPU jobs
<tomeu> kernel-space scheduling should be fine I think, unless we are flushing too often within mesa
<tomeu> but I thought kbase was able to schedule a second batch
<alyssa> tomeu: TBH eliminating needless wallpapers is probably an easier first task
<tomeu> well, or job chain
<tomeu> alyssa: do we need to increase performance now though? looks quite usable on my veyron
<alyssa> tomeu: Always room to make something faster ;P
<tomeu> sure, just a matter of what to work on next
<EmilKarlson> chromium^^
<tomeu> hehe
<tomeu> cool
<alyssa> Although actually -bterrain perf is only bumped 1fps when wallpapering is totally disabled
<alyssa> So maybe :/
<alyssa> -bdesktop is the big win from disabling wallpaper
<alyssa> (desktop jumps like 20fps)
<tomeu> alyssa: I think what we need to do in the kernel is to write the next job to JS_COMMAND_NEXT as soon as the hw has scheduled the execution of the previous batch
<tomeu> not sure how to model that with drm-sched though
rhyskidd has quit [Ping timeout: 258 seconds]
somy has quit [Ping timeout: 250 seconds]
rhyskidd has joined #panfrost
rhyskidd has quit [Ping timeout: 244 seconds]
rhyskidd has joined #panfrost
pH5 has quit [Quit: bye]
Elpaulo has quit [Quit: Elpaulo]
Elpaulo has joined #panfrost
buzzmarshall has quit [Remote host closed the connection]
adjtm has quit [Ping timeout: 268 seconds]
stikonas has joined #panfrost
raster has quit [Remote host closed the connection]
<alyssa> You know, on further thought, let's just press ahead on the CL branch, this is fine
<HdkR> It's OpenCL via Clover I presume?
<alyssa> HdkR: eventually
<alyssa> Just doing GLES3.1 compute shaders for PoC right now
<HdkR> Sounds like a good first step :)
<alyssa> :)
<HdkR> OpenCL 2.0 requires SVM which means you need to capture GPU faults and recover from them
<alyssa> Bwa
<HdkR> Unless you feel like mapping the full process into GPU space I guess?
<alyssa> Oh, hey, milestone!
<alyssa> tomeu: I just submitted the first successful COMPUTE job (that did something, ie with an SSBO)
<Lyude> HdkR: hmm time :)
<HdkR> hmm
<Lyude> (heterogeneous memory management, not the onomatopoeia)
<HdkR> Oh right
<shadeslayer> alyssa: do all of them need to pass?
<alyssa> shadeslayer: You can skip the builtin_variable.* section
<alyssa> But the dEQP-GLES31.functional.draw_base_vertex.draw_elements_base_vertex.base_vertex.* section needs to all pass, yes.
<alyssa> It looks like your patch is half there, just needs to handle negative biases correctly.
<shadeslayer> hm, I'll look into that image comparison fail then
<shadeslayer> negative bias q.q
<alyssa> Yeah, it's where you're so biased that you start being impartial again
<alyssa> :p
<shadeslayer> sounds lovely
<shadeslayer> alyssa: Thanks, I'll debug what's going on :)
<shadeslayer> alyssa: well, that was simple http://paste.ubuntu.com/p/9SmW8vzBNH/
<alyssa> shadeslayer: I'm not sure whether this necessarily makes sense if we think about what's really going on..
<alyssa> But if that fixes those dEQP tests, hm
<alyssa> shadeslayer: Could you try doing:
<alyssa> info->count + MAX2(info->index_bias, 00)
<alyssa> s/00/0
<alyssa> (So clamping positive -- negative biases would just hit zero)
<shadeslayer> alyssa: is there a way to tell deqp not to run the builtin_variable test with a regex?
<shadeslayer> or is the regex only for whitelisting and not blacklisting
<alyssa> Not sure..
* shadeslayer looks at the documentation
<shadeslayer> apparently there's a --exclude=
<shadeslayer> ah nope, that's for something called an executor
<alyssa> shadeslayer: android/cts/master/gles31-master.txt
<alyssa> has a list of tests
<alyssa> So you can run the ones you want
<shadeslayer> yeah but then I'll have to hand pick everything except the builtin
<shadeslayer> I was looking for a more lazy way out
<shadeslayer> this is with clamping
<alyssa> shadeslayer: Hmm, so maybe abs is the way to go?
<shadeslayer> let me test it ;)
<shadeslayer> alyssa: indeed https://paste.ubuntu.com/p/WNWmDkTkPh/
<alyssa> shadeslayer: Can't argue with 100% :)
<alyssa> Thank you for the investigation! :)
<shadeslayer> alyssa: I've uploaded it here https://gitlab.freedesktop.org/shadeslayer/mesa/tree/index_bias
<shadeslayer> alyssa: thanks for the help and patience! :)
* alyssa will get to it in a bit, swimming in compute stuff
<shadeslayer> sure thing, I'm gonna go look at https://people.collabora.com/~shadeslayer/DEQP/foo.xml
<shadeslayer> not sure what's going on there because afaict the image on the screen looked right
<alyssa> Oh, cute.
<alyssa> Midgard has native support for indexing gl_WorkGroupID / gl_LocalInvocationID in a ld/st op
<alyssa> 0x90 = gl_LocalInvocationID.x
<alyssa> 0x91 = gl_LocalInvocationID.y
<alyssa> 0x94 = gl_WorkGroupID.x
<alyssa> 0x95 = gl_WorkGroupID.y
<alyssa> etc
<alyssa> So we see pretty conclusively the middle 2 bits are a type selector
<alyssa> If both zero, it's a ld/st reg
<alyssa> If 0b10, it's a compute param as above
<alyssa> 0b01 allows indexing into Thread Local Storage iirc
raster has joined #panfrost
<HdkR> alyssa: Is TLS compute only or fragment supporting as well? :)
<alyssa> HdkR: TLS is universal since it implemented spilling
<HdkR> Nice
adjtm has joined #panfrost
<alyssa> HdkR: compute shaders work from freedreno ./test-compute
<alyssa> er
<alyssa> ./test-compiler
<alyssa> But not from deqp
<alyssa> Clearly deqp is broken :^)
stikonas has quit [Remote host closed the connection]
raster has quit [Remote host closed the connection]
<HdkR> clearly
<alyssa> I tried the shader out of dEQP and it's fine, except for binding= being different..
<alyssa> But if binding is just different, I should be able to flip that in the compiler and have the bugs cancel
<alyssa> (And using the shader outside of freedreno, with identical bindings but flipped in the backend, is fine)
<alyssa> This doesn't make sense...
<alyssa> shadeslayer: Oh, one more thing, have you run your branch through general CI?
* alyssa just triggered a run