alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
_whitelogger has joined #panfrost
<alyssa> HdkR: lol
* alyssa returns
<alyssa> Oh, a thousand regressions. Wee.
vstehle has quit [Ping timeout: 248 seconds]
embed-3d has quit [Ping timeout: 258 seconds]
<alyssa> Yes!
* alyssa just did her first compute-like thing
<alyssa> A shader reading from an SSBO in a loop, doing a transform, and writing back
<alyssa> So demonstrating indirect reads/writes from an SSBO (and not kerplutzing with CF)
<alyssa> Next up will be decrappening this series which could... take time...
<alyssa> Also, implementing some of the compute builtins so I can start running deqp tests
<alyssa> So, IDs:
<alyssa> get_global_id(0) =
<alyssa> ld_global_id r0.x, 0.xxxx, 0x18, 0x0
<alyssa> + r23.z
<alyssa> I'm trying to figure out what r23.z means here, it's zero here
<alyssa> get_global_id(1) is the same, but 0.yyyy
<alyssa> So `ld_global_id` loads the whole vector at once, convenient
<alyssa> Arguably weirder is local_id, which is as follows
<alyssa> ld_global_id r0.x, 0.xxxx, 0x10, 0x0
<alyssa> No addition needed.
<alyssa> I'll have to read the OpenCL spec to grok the difference tho
<alyssa> get_group_id(0) is
<alyssa> ld_global_id r0.x, 0.xxxx, 0x14, 0x0
<alyssa> and then plus r23.z again
<alyssa> Again, that uniform is *zero* so what am I missing
<HdkR> Are you only kicking off one work group?
<alyssa> HdkR: (2, 2, 1) it looks like
<alyssa> Maybe I'll have to up that :p
<alyssa> get_num_groups is just a uniform
<alyssa> get_local_size too
<alyssa> aaaand this will resume tomorrow
<alyssa> Later y'all
hlmjr has quit [Quit: Konversation terminated!]
herbmillerjr has joined #panfrost
<anarsoul> alyssa: did you have to do any magic to get perf working on arm64?
NeuroScr has quit [Quit: NeuroScr]
NeuroScr has joined #panfrost
herbmilleriw has quit [Remote host closed the connection]
herbmilleriw has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
vstehle has joined #panfrost
herbmilleriw has quit [Remote host closed the connection]
herbmilleriw has joined #panfrost
<tomeu> anarsoul: for me it just worked
<tomeu> (on debian with 5.1+)
<tomeu> I just used the perf in debian, probably perf_4.19
<anarsoul> tomeu: doesn't work for me with perf 5.2.1 from archlinux nor self-compiled perf-5.2.5
<anarsoul> perf top is just silent
<tomeu> hmm, that has never happened to me
<tomeu> anarsoul: can only think of some kconfig, what about CONFIG_PERF_EVENTS=y ?
<anarsoul> it's set
yann has quit [Ping timeout: 258 seconds]
davidlt has joined #panfrost
NeuroScr has joined #panfrost
davidlt has quit [Ping timeout: 245 seconds]
yann has joined #panfrost
megi has quit [Ping timeout: 246 seconds]
megi has joined #panfrost
adjtm has quit [Ping timeout: 272 seconds]
davidlt has joined #panfrost
adjtm has joined #panfrost
davidlt has quit [Quit: Leaving]
<shadeslayer> alyssa: awesome, shall I send it to the ML ?
pH5 has joined #panfrost
_whitelogger has joined #panfrost
chewitt has quit [Quit: Adios!]
afaerber has quit [Quit: Leaving]
yann has quit [Ping timeout: 272 seconds]
buzzmarshall has joined #panfrost
<alyssa> anarsoul: I couldn't get it working; I just used sysprof (which turned out to be awesome)
<alyssa> shadeslayer: Do you have results from the CI test group I listed?
<alyssa> s/CI/CTS/
<anarsoul> alyssa: I got it working, but I had to disable PMU node in dts
<anarsoul> looks like PMU is broken in Allwinner A64
<TheCycoTWO> alyssa: what was the result of your gnome bottleneck profiling?
herbmilleriw has quit [Quit: Konversation terminated!]
<alyssa> TheCycoTWO: G-S JavaScript
<alyssa> Rewriting GNOME in a language that was designed in more than 10 days is outside of my internship scope ;)
<TheCycoTWO> ug
<alyssa> Yeah...
<alyssa> TheCycoTWO: It's possible the GPU is slowing down and sysprof isn't showing that but I.. somehow doubt that terribly...
<alyssa> In the benchmarks with the largest disparity with the blob, we're a bit above 1/2 the blob's fps
<alyssa> Which is annoying, but... on a desktop compositor, that means GPU isn't going to be the bottleneck, esp. when CPU usage is through the roof
<tomeu> wonder if the lack of parallel GPU jobs couldn't be impacting the user experience
<alyssa> tomeu: Pipelining you mean?
herbmilleriw has joined #panfrost
<tomeu> guess so, depending on what one means by that :)
<alyssa> tomeu: Two:
<tomeu> basically, we only send to the GPU one job at a time
<alyssa> 1) Running two FBOs concurrently if they don't depend on each other
<alyssa> 2) Running two frames of the same FBO concurrently, doing the VS shader of frame N+1 while running the FS of frame N
<tomeu> well, I think the GPU can schedule jobs
<tomeu> so the kernel can send to the HW more jobs than can be run at a given time
<tomeu> so we could save a bit of time there as well
<alyssa> tomeu: Job (JOB) or batch (panfrost_job)?
* alyssa really needs to do that s/panfrost_job/panfrost_batch/g already
<tomeu> alyssa: batches
<alyssa> tomeu: Ah. I don't believe the GPU can schedule batches.
<alyssa> In kbase, you could submit multiple batches from userspace at once with a dep graph
<alyssa> But afaik, that was scheduled in kernel-space
<alyssa> The GPU scheduler (based on scoreboarding) is there to order jobs within a batch
<daniels> kernel-space scheduling at least lets userspace get on with doing CPU-side prep work for more GPU jobs
<tomeu> kernel-space scheduling should be fine I think, unless we are flushing too often within mesa
<tomeu> but I thought kbase was able to schedule a second batch
<alyssa> tomeu: TBH eliminating needless wallpapers is probably an easier first task
<tomeu> well, or job chain
<tomeu> alyssa: do we need to increase performance now though? looks quite usable on my veyron
<alyssa> tomeu: Always room to make something faster ;P
<tomeu> sure, just a matter of what to work on next
<EmilKarlson> chromium^^
<tomeu> hehe
<tomeu> cool
<alyssa> Although actually -bterrain perf is only bumped 1fps when wallpapering is totally disabled
<alyssa> So maybe :/
<alyssa> -bdesktop is the big win from disabling wallpaper
<alyssa> (desktop jumps like 20fps)
<tomeu> alyssa: I think what we need to do in the kernel is to write the next job to JS_COMMAND_NEXT as soon as the hw has scheduled the execution of the previous batch
<tomeu> not sure how to model that with drm-sched though
rhyskidd has quit [Ping timeout: 258 seconds]
somy has quit [Ping timeout: 250 seconds]
rhyskidd has joined #panfrost
rhyskidd has quit [Ping timeout: 244 seconds]
rhyskidd has joined #panfrost
pH5 has quit [Quit: bye]
Elpaulo has quit [Quit: Elpaulo]
Elpaulo has joined #panfrost
buzzmarshall has quit [Remote host closed the connection]
adjtm has quit [Ping timeout: 268 seconds]
stikonas has joined #panfrost
raster has quit [Remote host closed the connection]
<alyssa> You know, on further thought, let's just press ahead on the CL branch, this is fine
<HdkR> It's OpenCL via Clover I presume?
<alyssa> HdkR: eventually
<alyssa> Just doing GLES3.1 compute shaders for PoC right now
<HdkR> Sounds like a good first step :)
<alyssa> :)
<HdkR> OpenCL 2.0 requires SVM which means you need to capture GPU faults and recover from them
<alyssa> Bwa
<HdkR> Unless you feel like mapping the full process into GPU space I guess?
<alyssa> Oh, hey, milestone!
<alyssa> tomeu: I just submitted the first successful COMPUTE job (that did something, ie with an SSBO)
<Lyude> HdkR: hmm time :)
<HdkR> hmm
<Lyude> (heterogeneous memory management, not the onomatopoeia)
<HdkR> Oh right
<shadeslayer> alyssa: do all of them need to pass?
<alyssa> shadeslayer: You can skip the builtin_variable.* section
<alyssa> But the dEQP-GLES31.functional.draw_base_vertex.draw_elements_base_vertex.base_vertex.* section needs to all pass, yes.
<alyssa> It looks like your patch is half there, just needs to handle negative biases correctly.
<shadeslayer> hm, I'll look into that image comparison fail then
<shadeslayer> negative bias q.q
<alyssa> Yeah, it's where you're so biased that you start being impartial again
<alyssa> :p
<shadeslayer> sounds lovely
<shadeslayer> alyssa: Thanks, I'll debug what's going on :)
<shadeslayer> alyssa: well, that was simple http://paste.ubuntu.com/p/9SmW8vzBNH/
<alyssa> shadeslayer: I'm not sure whether this necessarily makes sense if we think about what's really going on..
<alyssa> But if that fixes those dEQP tests, hm
<alyssa> shadeslayer: Could you try doing:
<alyssa> info->count + MAX2(info->index_bias, 00)
<alyssa> s/00/0
<alyssa> (So clamping positive -- negative biases would just hit zero)
<shadeslayer> alyssa: is there a way to tell deqp not to run the builtin_variable test with a regex?
<shadeslayer> or is the regex only for whitelisting and not blacklisting
<alyssa> Not sure..
* shadeslayer looks at the documentation
<shadeslayer> apparently there's a --exclude=
<shadeslayer> ah nope, that's for something called an executor
<alyssa> shadeslayer: android/cts/master/gles31-master.txt
<alyssa> has a list of tests
<alyssa> So you can run the ones you want
<shadeslayer> yeah but then I'll have to hand pick everything except the builtin
<shadeslayer> I was looking for a more lazy way out
<shadeslayer> this is with clamping
<alyssa> shadeslayer: Hmm, so maybe abs is the way to go?
<shadeslayer> let me test it ;)
<shadeslayer> alyssa: indeed https://paste.ubuntu.com/p/WNWmDkTkPh/
<alyssa> shadeslayer: Can't argue with 100% :)
<alyssa> Thank you for the investigation! :)
<shadeslayer> alyssa: I've uploaded it here https://gitlab.freedesktop.org/shadeslayer/mesa/tree/index_bias
<shadeslayer> alyssa: thanks for the help and patience! :)
* alyssa will get to it in a bit, swimming in compute stuff
<shadeslayer> sure thing, I'm gonna go look at https://people.collabora.com/~shadeslayer/DEQP/foo.xml
<shadeslayer> not sure what's going on there because afaict the image on the screen looked right
<alyssa> Oh, cute.
<alyssa> Midgard has native support for indexing gl_WorkGroupID / gl_LocalInvocationID in a ld/st op
<alyssa> 0x90 = gl_LocalInvocationID.x
<alyssa> 0x91 = gl_LocalInvocationID.y
<alyssa> 0x94 = gl_WorkGroupID.x
<alyssa> 0x95 = gl_WorkGroupID.y
<alyssa> etc
<alyssa> So we see pretty conclusively the middle 2 bits are a type selector
<alyssa> If both zero, it's a ld/st reg
<alyssa> If 0b10, it's a compute param as above
<alyssa> 0b01 allows indexing into Thread Local Storage iirc
raster has joined #panfrost
<HdkR> alyssa: Is TLS compute only or fragment supporting as well? :)
<alyssa> HdkR: TLS is universal since it implemented spilling
<HdkR> Nice
adjtm has joined #panfrost
<alyssa> HdkR: compute shaders work from freedreno ./test-compute
<alyssa> er
<alyssa> ./test-compiler
<alyssa> But not from deqp
<alyssa> Clearly deqp is broken :^)
stikonas has quit [Remote host closed the connection]
raster has quit [Remote host closed the connection]
<HdkR> clearly
<alyssa> I tried the shader out of dEQP and it's fine, except for binding= being different..
<alyssa> But if binding is just different, I should be able to flip that in the compiler and have the bugs cancel
<alyssa> (And using the shader outside of freedreno, with identical bindings but flipped in the backend, is fine)
<alyssa> This doesn't make sense...
<alyssa> shadeslayer: Oh, one more thing, have you run your branch through general CI?
* alyssa just triggered a run