00:23
_whitelogger has joined #panfrost
00:43
<alyssa> Oh, a thousand regressions. Wee.
01:04
vstehle has quit [Ping timeout: 248 seconds]
01:10
embed-3d has quit [Ping timeout: 258 seconds]
01:13
* alyssa just did her first compute-like thing
01:14
<alyssa> A shader reading from an SSBO in a loop, doing a transform, and writing back
01:14
<alyssa> So demonstrating indirect reads/writes from an SSBO (and not kerplutzing with CF)
01:22
<alyssa> Next up will be decrappening this series, which could... take time...
01:23
<alyssa> Also, implementing some of the compute builtins so I can start running deqp tests
01:33
<alyssa> get_global_id(0) =
01:33
<alyssa> ld_global_id r0.x, 0.xxxx, 0x18, 0x0
01:33
<alyssa> I'm trying to figure out what r23.z means here; it's zero
01:34
<alyssa> get_global_id(1) is the same, but 0.yyyy
01:34
<alyssa> So `ld_global_id` is to load the whole vector at once, convenient
01:36
<alyssa> Arguably weirder is local_id, which is
01:36
<alyssa> ld_global_id r0.x, 0.xxxx, 0x10, 0x0
01:36
<alyssa> No addition needed.
01:36
<alyssa> I'll have to read the OpenCL spec to grok the difference tho
01:37
<alyssa> get_group_id(0) is
01:37
<alyssa> ld_global_id r0.x, 0.xxxx, 0x14, 0x0
01:37
<alyssa> and then plus r23.z again
01:37
<alyssa> Again, that uniform is *zero*, so what am I missing?
01:38
<HdkR> Are you only kicking off one work group?
01:39
<alyssa> HdkR: (2, 2, 1) it looks like
01:39
<alyssa> Maybe I'll have to up that :p
01:39
<alyssa> get_num_groups is just a uniform
01:39
<alyssa> get_local_size too
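Taken together, the lowerings quoted above follow one pattern: a single `ld_global_id` op with a per-builtin offset and a per-dimension swizzle. A sketch of that pattern, where the offsets come from the disassembly lines in this log and every name is invented for illustration:

```python
# Sketch of the builtin lowerings quoted above. The offsets (0x10,
# 0x14, 0x18) come from the disassembly in this log; the dict and
# function names are invented for illustration.
LD_GLOBAL_ID_OFFSETS = {
    "get_local_id": 0x10,   # no addition needed
    "get_group_id": 0x14,   # plus r23.z in the real disassembly
    "get_global_id": 0x18,
}

def lower_builtin(name, dim):
    """Emit pseudo-assembly for a compute builtin.

    dim selects the swizzle: get_global_id(1) used 0.yyyy above.
    """
    swizzle = "0." + "xyz"[dim] * 4
    return "ld_global_id r0.x, %s, %#x, 0x0" % (swizzle, LD_GLOBAL_ID_OFFSETS[name])

print(lower_builtin("get_global_id", 0))  # ld_global_id r0.x, 0.xxxx, 0x18, 0x0
```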
01:39
<alyssa> aaaand this will resume tomorrow
01:40
<alyssa> Later y'all
02:37
hlmjr has quit [Quit: Konversation terminated!]
02:45
herbmillerjr has joined #panfrost
03:21
<anarsoul> alyssa: did you have to do any magic to get perf working on arm64?
03:39
NeuroScr has quit [Quit: NeuroScr]
03:40
NeuroScr has joined #panfrost
04:29
herbmilleriw has quit [Remote host closed the connection]
04:30
herbmilleriw has joined #panfrost
04:31
NeuroScr has quit [Quit: NeuroScr]
05:00
vstehle has joined #panfrost
05:05
herbmilleriw has quit [Remote host closed the connection]
05:05
herbmilleriw has joined #panfrost
05:21
<tomeu> anarsoul: for me it just worked
05:21
<tomeu> (on debian with 5.1+)
05:22
<tomeu> I just used the perf in debian, probably perf_4.19
05:25
<anarsoul> tomeu: doesn't work for me with perf 5.2.1 from archlinux nor self-compiled perf-5.2.5
05:25
<anarsoul> perf top is just silent
05:26
<tomeu> hmm, that has never happened to me
05:28
<tomeu> anarsoul: can only think of some kconfig, what about CONFIG_PERF_EVENTS=y ?
05:29
<anarsoul> it's set
05:51
yann has quit [Ping timeout: 258 seconds]
06:30
davidlt has joined #panfrost
06:42
NeuroScr has joined #panfrost
06:42
davidlt has quit [Ping timeout: 245 seconds]
07:14
yann has joined #panfrost
07:44
megi has quit [Ping timeout: 246 seconds]
08:20
megi has joined #panfrost
08:30
adjtm has quit [Ping timeout: 272 seconds]
08:57
davidlt has joined #panfrost
09:09
adjtm has joined #panfrost
09:16
davidlt has quit [Quit: Leaving]
10:13
<shadeslayer> alyssa: awesome, shall I send it to the ML?
10:19
pH5 has joined #panfrost
10:32
_whitelogger has joined #panfrost
10:56
chewitt has quit [Quit: Adios!]
11:57
afaerber has quit [Quit: Leaving]
12:08
yann has quit [Ping timeout: 272 seconds]
12:50
buzzmarshall has joined #panfrost
13:56
<alyssa> anarsoul: I couldn't get it; I just used sysprof (which turned out to be awesome)
13:56
<alyssa> shadeslayer: Do you have results from the CI test group I listed?
14:03
<anarsoul> alyssa: I got it working, but I had to disable the PMU node in the dts
14:04
<anarsoul> looks like the PMU is broken on the Allwinner A64
14:07
<TheCycoTWO> alyssa: what was the result of your gnome bottleneck profiling?
14:10
herbmilleriw has quit [Quit: Konversation terminated!]
14:10
<alyssa> TheCycoTWO: G-S JavaScript
14:10
<alyssa> Rewriting GNOME in a language that was designed in more than 10 days is outside of my internship scope ;)
14:11
<alyssa> TheCycoTWO: It's possible the GPU is slowing down and sysprof isn't showing that, but I.. somehow doubt that terribly...
14:12
<alyssa> In the benchmarks with the largest disparity with the blob, we're a bit above 1/2 the blob's fps
14:12
<alyssa> Which is annoying, but... on a desktop compositor, that means the GPU isn't going to be the bottleneck, esp. when CPU usage is through the roof
14:15
<tomeu> wonder if the lack of parallel GPU jobs couldn't be impacting the user experience
14:15
<alyssa> tomeu: Pipelining, you mean?
14:15
herbmilleriw has joined #panfrost
14:16
<tomeu> guess so, depending on what one means by that :)
14:16
<alyssa> tomeu: Two:
14:16
<tomeu> basically, we only send one job to the GPU at a time
14:16
<alyssa> 1) Running two FBOs concurrently if they don't depend on each other
14:17
<alyssa> 2) Running two frames of the same FBO concurrently, doing the VS of frame N+1 while running the FS of frame N
14:19
<tomeu> well, I think the GPU can schedule jobs
14:19
<tomeu> so the kernel can send to the HW more jobs than can be run at a given time
14:20
<tomeu> so we could save a bit of time there as well
14:20
<alyssa> tomeu: Job (JOB) or batch (panfrost_job)?
14:20
* alyssa really needs to do that s/panfrost_job/panfrost_batch/g already
14:23
<tomeu> alyssa: batches
14:24
<alyssa> tomeu: Ah. I don't believe the GPU can schedule batches.
14:24
<alyssa> In kbase, you could submit multiple batches from userspace at once with a dep graph
14:24
<alyssa> But afaik, that was scheduled in kernel-space
14:25
<alyssa> The GPU scheduler (based on scoreboarding) is for ordering jobs within a batch
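The kbase model described here (batches submitted with a dependency graph, dispatched from kernel-space once their prerequisites retire) amounts to a topological ordering; a generic sketch under that assumption, with all names invented and no relation to kbase's real API:

```python
# Generic sketch of dependency-graph batch scheduling (not kbase's
# real API): each batch is dispatched only after all batches it
# depends on have retired.
from collections import deque

def schedule(batches, deps):
    """batches: iterable of names; deps: {batch: set of prerequisites}."""
    indegree = {b: len(deps.get(b, ())) for b in batches}
    waiters = {b: [] for b in batches}
    for b, prereqs in deps.items():
        for p in prereqs:
            waiters[p].append(b)
    ready = deque(b for b in batches if indegree[b] == 0)
    order = []
    while ready:
        b = ready.popleft()   # "submit" this batch to the hardware
        order.append(b)
        for w in waiters[b]:  # on retire, unblock dependents
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return order

# A fragment batch that depends on its vertex/tiler batch:
print(schedule(["vertex", "fragment"], {"fragment": {"vertex"}}))  # ['vertex', 'fragment']
```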
14:25
<daniels> kernel-space scheduling at least lets userspace get on with doing CPU-side prep work for more GPU jobs
14:25
<tomeu> kernel-space scheduling should be fine I think, unless we are flushing too often within mesa
14:27
<tomeu> but I thought kbase was able to schedule a second batch
14:27
<alyssa> tomeu: TBH eliminating needless wallpapers is probably an easier first task
14:27
<tomeu> well, or job chain
14:28
<tomeu> alyssa: do we need to increase performance now though? looks quite usable on my veyron
14:28
<alyssa> tomeu: Always room to make something faster ;P
14:28
<tomeu> sure, just a matter of what to work on next
14:30
<EmilKarlson> chromium^^
14:42
<alyssa> Although actually -bterrain perf is only bumped 1fps when wallpapering is totally disabled
14:42
<alyssa> So maybe :/
14:43
<alyssa> -bdesktop is the big win from disabling wallpaper
14:43
<alyssa> (desktop jumps like 20fps)
14:44
<tomeu> alyssa: I think what we need to do in the kernel is to write the next job to JS_COMMAND_NEXT as soon as the hw has scheduled the execution of the previous batch
14:45
<tomeu> not sure how to model that with drm-sched though
14:53
rhyskidd has quit [Ping timeout: 258 seconds]
14:56
somy has quit [Ping timeout: 250 seconds]
14:58
rhyskidd has joined #panfrost
15:05
rhyskidd has quit [Ping timeout: 244 seconds]
15:07
rhyskidd has joined #panfrost
16:08
pH5 has quit [Quit: bye]
16:35
Elpaulo has quit [Quit: Elpaulo]
17:45
Elpaulo has joined #panfrost
18:47
buzzmarshall has quit [Remote host closed the connection]
18:47
adjtm has quit [Ping timeout: 268 seconds]
19:03
stikonas has joined #panfrost
19:04
raster has quit [Remote host closed the connection]
19:28
<alyssa> You know, on further thought, let's just press ahead on the CL branch, this is fine
19:31
<HdkR> It's OpenCL via Clover, I presume?
19:31
<alyssa> HdkR: eventually
19:31
<alyssa> Just doing GLES3.1 compute shaders for PoC right now
19:31
<HdkR> Sounds like a good first step :)
19:32
<HdkR> OpenCL 2.0 requires SVM, which means you need to capture GPU faults and recover from them
19:33
<HdkR> Unless you feel like mapping the full process into GPU space, I guess?
19:33
<alyssa> Oh, hey, milestone!
19:34
<alyssa> tomeu: I just submitted the first successful COMPUTE job (that did something, i.e. with an SSBO)
19:38
<Lyude> HdkR: hmm time :)
19:40
<Lyude> (heterogeneous memory management, not the onomatopoeia)
20:25
<shadeslayer> alyssa: do all of them need to pass?
20:26
<alyssa> shadeslayer: You can skip the builtin_variable.* section
20:26
<alyssa> But the dEQP-GLES31.functional.draw_base_vertex.draw_elements_base_vertex.base_vertex.* section needs to all pass, yes.
20:27
<alyssa> It looks like your patch is half there; it just needs to handle negative biases correctly.
20:27
<shadeslayer> hm, I'll look into that image comparison fail then
20:27
<shadeslayer> negative bias q.q
20:27
<alyssa> Yeah, it's where you're so biased that you start being impartial again
20:27
<shadeslayer> sounds lovely
20:28
<shadeslayer> alyssa: Thanks, I'll debug what's going on :)
21:23
<alyssa> shadeslayer: I'm not sure whether this necessarily makes sense if we think about what's really going on..
21:23
<alyssa> But if that fixes those dEQP tests, hm
21:23
<alyssa> shadeslayer: Could you try doing:
21:24
<alyssa> info->count + MAX2(info->index_bias, 0)
21:24
<alyssa> (So clamping positive -- negative biases would just hit zero)
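The difference between the suggested clamp and the abs() variant discussed later only shows up for negative biases; a minimal sketch, with `max2` standing in for Mesa's MAX2 macro and the numbers chosen purely for illustration:

```python
# Sketch of the two candidate fixes discussed here; max2 stands in
# for Mesa's MAX2 macro, and the count/bias values are made up.
def max2(a, b):
    return a if a > b else b

def count_with_clamp(count, index_bias):
    # the suggestion above: negative biases just hit zero
    return count + max2(index_bias, 0)

def count_with_abs(count, index_bias):
    # the abs() variant discussed later in the log
    return count + abs(index_bias)

print(count_with_clamp(6, -3))  # 6: a negative bias is clamped away
print(count_with_abs(6, -3))    # 9: a negative bias still widens the count
```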
21:29
<shadeslayer> alyssa: is there a way to tell deqp not to run the builtin_variable tests with a regex?
21:29
<shadeslayer> or is the regex only for whitelisting and not blacklisting
21:29
<alyssa> Not sure..
21:30
* shadeslayer looks at the documentation
21:30
<shadeslayer> apparently there's a --exclude=
21:32
<shadeslayer> ah nope, that's for something called an executor
21:34
<alyssa> shadeslayer: android/cts/master/gles31-master.txt
21:34
<alyssa> has a list of tests
21:34
<alyssa> So you can run the ones you want
21:34
<shadeslayer> yeah, but then I'll have to hand-pick everything except the builtins
21:35
<shadeslayer> I was looking for a more lazy way out
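One lazy way out would be to pre-filter the caselist file and feed the result back via dEQP's --deqp-caselist-file option; a sketch, with case names and file paths purely illustrative:

```python
# Hypothetical helper: drop builtin_variable.* cases from a caselist
# so the remainder can be run with --deqp-caselist-file. Case names
# and file paths here are illustrative.
def filter_caselist(lines, exclude_substring):
    """Keep every case whose name does not contain exclude_substring."""
    return [line for line in lines if exclude_substring not in line]

cases = [
    "dEQP-GLES31.functional.compute.basic.empty",
    "dEQP-GLES31.functional.builtin_variable.something",
]
print(filter_caselist(cases, "builtin_variable"))
# write the result to filtered.txt, then e.g.:
#   ./deqp-gles31 --deqp-caselist-file=filtered.txt
```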
21:40
<shadeslayer> this is with clamping
21:40
<alyssa> shadeslayer: Hmm, so maybe abs is the way to go?
21:41
<shadeslayer> let me test it ;)
21:42
<alyssa> shadeslayer: Can't argue with 100% :)
21:42
<alyssa> Thank you for the investigation! :)
21:44
<shadeslayer> alyssa: thanks for the help and patience! :)
21:44
* alyssa will get to it in a bit, swimming in compute stuff
21:45
<shadeslayer> not sure what's going on there, because afaict the image on the screen looked right
22:05
<alyssa> Midgard has native support for indexing (gl_WorkGroupID / gl_LocalInvocationID) in a ld/st op
22:05
<alyssa> 0x90 = gl_LocalInvocationID.x
22:05
<alyssa> 0x91 = gl_LocalInvocationID.y
22:05
<alyssa> 0x94 = gl_WorkGroupID.x
22:05
<alyssa> 0x95 = gl_WorkGroupID.y
22:06
<alyssa> So we see pretty conclusively the middle two bits are a type selector
22:06
<alyssa> If both are zero, it's a ld/st reg
22:06
<alyssa> If 0b10, it's a compute param as above
22:06
<alyssa> 0b01 allows indexing into Thread Local Storage iirc
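The type-selector observation can be turned into a decoder; the exact bit position of the selector is a guess (bits [4:3] happen to fit all the values quoted above), and only the 0b00/0b01/0b10 meanings come from the log:

```python
# Sketch of the ld/st index decoding described above. Only the
# 0b00/0b01/0b10 meanings come from the log; placing the selector at
# bits [4:3] is a guess that happens to fit the quoted 0x90-0x94 values.
SELECTOR_KIND = {
    0b00: "ld/st register",
    0b01: "thread local storage index",
    0b10: "compute parameter",
}

def decode(value):
    selector = (value >> 3) & 0b11
    return SELECTOR_KIND.get(selector, "unknown")

print(decode(0x90))  # compute parameter (gl_LocalInvocationID.x)
```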
22:11
raster has joined #panfrost
22:14
<HdkR> alyssa: Is TLS compute-only, or does it support fragment as well? :)
22:14
<alyssa> HdkR: TLS is universal since it implements spilling
22:21
adjtm has joined #panfrost
22:34
<alyssa> HdkR: compute shaders work from freedreno ./test-compute
22:34
<alyssa> ./test-compiler
22:34
<alyssa> But not from deqp
22:34
<alyssa> Clearly deqp is broken :^)
22:44
stikonas has quit [Remote host closed the connection]
22:46
raster has quit [Remote host closed the connection]
22:58
<alyssa> I tried the shader out of dEQP and it's fine, except for binding= being different..
22:59
<alyssa> But if binding is just different, I should be able to flip that in the compiler and have the bugs cancel
23:00
<alyssa> (And using the shader outside of freedreno, with identical bindings but flipped in the backend, is fine)
23:09
<alyssa> This doesn't make sense...
23:19
<alyssa> shadeslayer: Oh, one more thing, have you run your branch through general CI?
23:21
* alyssa just triggered a run