alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
stikonas has quit [Remote host closed the connection]
tgall_foo has quit [Read error: Connection reset by peer]
jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]
jernej has joined #panfrost
<alyssa> anarsoul: If you can do so cleanly, go ahead!
<alyssa> (Cleanly = without regressing anything in terms of performance/functionality)
<alyssa> Also, I was later informed that mesa/st has its own index cache that doesn't seem to be used much, so it might be easier to try to promote that to core Gallium and/or have a CAP to make it always be used
<alyssa> Er, not even mesa/st, the mesa GL itself
<alyssa> src/mesa/vbo/vbo_minmax_index.c:
<anarsoul> interesting
vstehle has quit [Ping timeout: 256 seconds]
<anarsoul> alyssa: hm, I don't see much difference with minmax cache :(
icecream95 has joined #panfrost
jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]
mearon has quit [Ping timeout: 265 seconds]
<icecream95> alyssa: I remember seeing vbo_get_minmax_index in a profile for something (STK?) so it is already being used for some things.
mearon has joined #panfrost
<icecream95> The function has a duplicate of u_vbuf_get_minmax_index_mapped, but without my optimisation for vectorisation
jernej has joined #panfrost
buzzmarshall has quit [Remote host closed the connection]
tgall_foo has joined #panfrost
QwertyChouskie has joined #panfrost
QwertyChouskie has quit [Remote host closed the connection]
QwertyChouskie has joined #panfrost
icecream95 has quit [Ping timeout: 255 seconds]
icecream95 has joined #panfrost
mixfix41 has joined #panfrost
QwertyChouskie has quit [Ping timeout: 255 seconds]
vstehle has joined #panfrost
<icecream95> It looks like WebGL on Firefox 75 is going to have decent performance, at least on Wayland. :)
mixfix41 has quit [Remote host closed the connection]
_whitelogger has joined #panfrost
<tomeu> nice!
MastaG has quit [Quit: The Lounge - https://thelounge.chat]
MastaG has joined #panfrost
pH5 has quit [Quit: bye]
yann has quit [Ping timeout: 258 seconds]
pH5 has joined #panfrost
mias has joined #panfrost
mias has quit [Quit: Konversation terminated!]
stikonas has joined #panfrost
icecream95 has quit [Ping timeout: 258 seconds]
mias has joined #panfrost
MastaG has quit [Quit: The Lounge - https://thelounge.chat]
yann has joined #panfrost
stikonas has quit [Remote host closed the connection]
MastaG has joined #panfrost
rak-zero has joined #panfrost
nerdboy has quit [Ping timeout: 255 seconds]
MastaG has quit [Quit: The Lounge - https://thelounge.chat]
MastaG has joined #panfrost
gcl_ has joined #panfrost
gcl has quit [Ping timeout: 258 seconds]
apol has joined #panfrost
<apol> I'm adding the use of eglSetDamageRegionKHR in kwin and I see this weird effect when I add it https://youtu.be/6v7hbMJYO58
<apol> if I remove the call or pass 0 n_rects, I get perfectly fine behaviour
<apol> any ideas?
<apol> (sorry if this is offtopic here, I'm really at a loss right now and I can only test on panfrost)
warpme_ has quit [Read error: Connection reset by peer]
warpme_ has joined #panfrost
<tomeu> bbrezillon may know
<bbrezillon> apol: are you taking buffer age into account?
<apol> bbrezillon: yes
<apol> let me show you the patch
<apol> bbrezillon: although if I force the damage to be { 0, 0, 1920, 1080 } I get the same issue
<bbrezillon> apol: did you check the damage rect coordinates origin? (the spec says "Coordinates are specified relative to the lower left corner")
<bbrezillon> well, if you set damage rect to cover the whole surface, then you have to redraw everything
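[editor's sketch] The lower-left origin bbrezillon quotes from the spec is easy to get wrong when the compositor tracks damage with a top-left origin. A minimal sketch of the conversion, assuming a compositor-side rect given as top-left x/y plus width/height; the struct and helper names here are hypothetical and come from neither kwin nor Mesa:

```c
#include <assert.h>

/* One damage rectangle in the order EGL_KHR_partial_update uses:
 * x, y, width, height, where (x, y) is the LOWER-left corner. */
struct egl_rect {
    int x, y, width, height;
};

/* Hypothetical helper: convert a rect whose origin is the top-left
 * corner of the surface (y grows downward) into the lower-left-origin
 * form. Only the y coordinate changes. */
static struct egl_rect
flip_rect_to_egl(int x, int y_top, int width, int height, int surface_height)
{
    struct egl_rect r;
    r.x = x;
    r.y = surface_height - (y_top + height);
    r.width = width;
    r.height = height;
    return r;
}
```

On a 1080-line surface, a 100x50 rect whose top edge is at y=20 ends up with y=1010 after the flip. A full-surface rect such as { 0, 0, 1920, 1080 } is unchanged, which is why forcing full damage cannot expose an origin bug.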
<apol> bbrezillon: I added this change since I couldn't figure out what was going wrong and I get similar output as weston https://invent.kde.org/snippets/748
<apol> bbrezillon: well we are rendering the same AFAIU, like I said, if I don't call eglSetDamageRegionKHR it all works okay
<bbrezillon> apol: which is expected since we reload the entire FB into the tile buffer in that case
<bbrezillon> I guess you only redraw a sub-region if eglSetDamageRegionKHR() is set
<bbrezillon> if the trace you've added to mesa and some extra traces in kwin, you can make sure the damage rects match
<bbrezillon> s/if/with/
<apol> let me look at what we are rendering, thanks
* apol is feeling n00b
<alyssa> bifrost branching is complicated~
<alyssa> Anything to save a bit, I guess.
<cwabbott> ^ the mantra of the entire bifrost ISA
<alyssa> cwabbott: :3
<alyssa> I thought it was "don't do whatever *that* was again *vigorous waving in midgard's general direction*"
<cwabbott> there are so many clever hacks to squeeze out those bits, though
<cwabbott> I vaguely remember figuring out the branch condition stuff... the biggest PITA was that the blob compiler sucked ass and didn't use the floating-point comparisons nearly as much as it could've
<alyssa> Wee.
<alyssa> Well, I guess that Bifrost motto (the second one) is my philosophy for IR design here
<alyssa> "You know all the zillions of rewrites you did for midgard? yeah, don't do that again."
<alyssa> See: register allocation, scheduling, control flow, !32-bit..
<alyssa> Meanwhile, the blob here is literally emitting FCMP.D3D.OGT.f32 and then BRANCH.EQ.i32.Z
<alyssa> (which I guess is what you meant)
<apol> bbrezillon: so the damage traces match, I see on the logs what I expect to see. It's actually these artifacts I start seeing outside the damaged region that I don't understand
<bbrezillon> apol: does the damage extent (mesa trace you've added) cover the region showing artifacts?
<apol> bbrezillon: I don't think so, do you know if there's a good tool to debug this?
<bbrezillon> not that I know
<bbrezillon> apol: what GPU are you testing on BTW?
<apol> that's a pinebook pro
<apol> OpenGL renderer string: Mali T860 (Panfrost)
<bbrezillon> should work just fine
<bbrezillon> alyssa: should I move all panfrost_emit_ functions to a new file (pan_cmdstream.{c,h} ?) or should they stay where they are (pan_context.c, pan_varyings.c, pan_compute.c, pan_attributes.c)?
<bbrezillon> apol: if the artifacts appear outside the damage extent, maybe it's a bug in the damage region tracking logic (you have to merge damage regions of the N last frames, N being the buffer age)
<bbrezillon> apol: I'd say that you should pass 'region' to regionToRects(), not 'output.damageHistory.constFirst()'
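[editor's sketch] The buffer-age rule bbrezillon describes (merge the damage of the last N frames, N being the buffer age) can be sketched as below. This is an illustration under stated assumptions, not kwin's code: it tracks a single bounding box per frame instead of a real region, history[0] is taken to be the current frame's damage, and all names are hypothetical.

```c
#include <assert.h>

/* Axis-aligned box, half-open: covers [x1, x2) x [y1, y2). */
struct box {
    int x1, y1, x2, y2;
};

static int box_is_empty(struct box b)
{
    return b.x2 <= b.x1 || b.y2 <= b.y1;
}

/* Smallest box containing both inputs; an empty input is ignored. */
static struct box box_union(struct box a, struct box b)
{
    struct box r;
    if (box_is_empty(a)) return b;
    if (box_is_empty(b)) return a;
    r.x1 = a.x1 < b.x1 ? a.x1 : b.x1;
    r.y1 = a.y1 < b.y1 ? a.y1 : b.y1;
    r.x2 = a.x2 > b.x2 ? a.x2 : b.x2;
    r.y2 = a.y2 > b.y2 ? a.y2 : b.y2;
    return r;
}

/* history[0] = damage of the frame being drawn, history[1] = previous
 * frame, and so on. A buffer of age N still holds the frame from N
 * frames ago, so it needs the damage of the last N frames repainted.
 * Age 0 (undefined contents) or an age older than the recorded
 * history falls back to repainting the full surface. */
static struct box
accumulate_damage(const struct box *history, int history_len,
                  int buffer_age, struct box full_surface)
{
    struct box out;
    int i;

    if (buffer_age <= 0 || buffer_age > history_len)
        return full_surface;

    out = history[0];
    for (i = 1; i < buffer_age; i++)
        out = box_union(out, history[i]);
    return out;
}
```

The accumulated result, not just the newest history entry, is what would be converted to rects and handed to eglSetDamageRegionKHR.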
<apol> bbrezillon: I've already tried this
<apol> there's something else wrong
<apol> but I'm sadly not very familiar with our codebase, which doesn't help
yann has quit [Ping timeout: 260 seconds]
<apol> bbrezillon: yep, I was creating the rects wrong :'(
<apol> bbrezillon: it's working great now, thanks for the patience :)
apol has quit [Remote host closed the connection]
gcl_ has quit [Ping timeout: 240 seconds]
gcl has joined #panfrost
nerdboy has joined #panfrost
pH5 has quit [Quit: bye]
<alyssa> bbrezillon: Up to you, whatever you think is easier / less churn :)
pH5 has joined #panfrost
rak-zero has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
stikonas has joined #panfrost
gcl has quit [Ping timeout: 260 seconds]
gcl has joined #panfrost
gcl has quit [Ping timeout: 256 seconds]
gcl has joined #panfrost
adjtm_ has joined #panfrost
adjtm has quit [Ping timeout: 256 seconds]
NeuroScr has quit [Ping timeout: 256 seconds]
mixfix41 has joined #panfrost
_whitelogger has joined #panfrost
anarsoul|c has quit [Ping timeout: 256 seconds]
anarsoul|c has joined #panfrost
_whitelogger has quit [Ping timeout: 256 seconds]
_whitelogger has joined #panfrost
nerdboy has quit [Ping timeout: 256 seconds]
gcl_ has joined #panfrost
mixfix41 has quit [Remote host closed the connection]
gcl has quit [Ping timeout: 258 seconds]
mixfix41 has joined #panfrost
gcl_ has quit [Ping timeout: 256 seconds]
gcl has joined #panfrost
buzzmarshall has joined #panfrost
mias has quit [Ping timeout: 256 seconds]
TheKit has joined #panfrost
karolherbst has quit [Quit: duh 🐧]
karolherbst has joined #panfrost
pH5 has quit [Ping timeout: 256 seconds]
QwertyChouskie has joined #panfrost
megi has quit [Ping timeout: 260 seconds]
robmur01_ has joined #panfrost
robmur01 has quit [Read error: Connection reset by peer]
megi has joined #panfrost
QwertyChouskie has quit [Ping timeout: 260 seconds]
megi has quit [Ping timeout: 268 seconds]
megi has joined #panfrost
<alyssa> anarsoul: fwiw, the win of that series was specifically for gles3 supertuxkart
<alyssa> gles2 supertuxkart is bottlenecked on *user* indices
<alyssa> which can also be optimized a bit but not via caching, the patch that helped there I dropped since it was kinda sketchy
<anarsoul> oh
<anarsoul> I wonder if disabling user indices would help
<alyssa> anarsoul: I don't believe that's an option iirc
<alyssa> anarsoul: For gl2.1 supertuxkart anyway, you might try something like https://people.collabora.com/~alyssa/0001-panfrost-Add-minmax-upload-path-for-user-buffers.patch
<alyssa> patch was dropped since 1) it doesn't handle other sizes but you could fix that and 2) I never got around to doing good benchmarking and from the asm I'm not convinced it's actually a win
<alyssa> Theoretically it should definitely be a win since it amortizes the cost of reading user indices so it'll save [size of index buffer] in read traffic
<alyssa> But in practice I don't know if `for (i < len) out[i] = in[i]` is as efficient as memcpy ... obviously a compiler on -O3 should be able to vectorize it and get close but memcpy I think can have some hand asm tricks on some platforms, and I don't know if arm32/64 do that
<alyssa> If it's a bottleneck for you, though, worth a look
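[editor's sketch] The idea alyssa describes, reading the user indices once and computing min/max during the copy instead of in a second pass, might look roughly like this for 16-bit indices. This is not the dropped patch; the function name is hypothetical, and whether it beats memcpy followed by a separate scan is exactly the open question in the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `count` 16-bit indices from `in` to `out` and compute their
 * minimum and maximum in the same pass, so the source is read once.
 * With count == 0 the results are min = UINT16_MAX and max = 0. */
static void
copy_indices_minmax_u16(uint16_t *out, const uint16_t *in, size_t count,
                        uint16_t *min_index, uint16_t *max_index)
{
    uint16_t lo = UINT16_MAX;
    uint16_t hi = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        uint16_t v = in[i];
        out[i] = v;
        if (v < lo) lo = v;
        if (v > hi) hi = v;
    }

    *min_index = lo;
    *max_index = hi;
}
```

The loop body is simple enough that a compiler at -O3 may vectorize it, but that is not guaranteed, so measuring on the target CPU is the only way to settle it.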
<HdkR> ARM64 has some major memcpy optimizations in the libraries
<anarsoul> iirc supertuxkart spends quite some time in min/max calculation
<anarsoul> but not q3a
<alyssa> anarsoul: yeah, but it's not clear if the bottleneck is actually the min/max ALU side or just the I/O of reading all that data
<alyssa> HdkR: significantly better than an unrolled NEON vectorized load/store then?
<HdkR> https://github.com/lattera/glibc/blob/master/sysdeps/aarch64/memcpy.S It uses 128bit paired loadstores which saturates the memory
<HdkR> Unrolled loops as well
<HdkR> NEON loadstores don't help significantly here since there isn't a consumer ARM CPU that has a 512bit loadstore pipeline
<anarsoul> alyssa: I think it's already in cache after util_upload_index_buffer() so it shouldn't be I/O