alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
stikonas has quit [Ping timeout: 272 seconds]
Xalius has joined #panfrost
vstehle has quit [Ping timeout: 265 seconds]
<alyssa> 8 files changed, 70 insertions(+), 113 deletions(-)
<alyssa> I love deleting code :p
<alyssa> Next up is doing some heavier duty tag analysis
<alyssa> Which is a bit of yak shaving but having tags everywhere in the disassembly is quite annoying and rather useless information
<alyssa> We already verify tags in the disassembler, might as well go all the way.
<alyssa> is it wrong that i end comments with QED
<alyssa> Alright - got both next tags and regular tags inferred and removed from disassembly. Things are looking a lot cleaner now :)
afaerber has quit [Ping timeout: 240 seconds]
buzzmarshall has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
buzzmarshall has quit [Remote host closed the connection]
icecream95 has joined #panfrost
megi has quit [Ping timeout: 240 seconds]
<icecream95> tomeu: Try compiling with -ftree-vectorize (and -mfpu=neon for 32-bit ARM)
<icecream95> IIRC, minmax is about 7x faster vectorised
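For reference, a minimal sketch of the kind of index min/max loop being discussed, assuming 16-bit indices. The name `minmax_u16` and its signature are hypothetical and are not taken from Mesa. The loop body is written as branch-free selects, which is the shape GCC's auto-vectorizer (`-ftree-vectorize`, plus `-mfpu=neon` on 32-bit ARM as noted above) can usually turn into SIMD min/max instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the actual Mesa function): scan an array of
 * 16-bit indices and report the smallest and largest value seen.
 * For count == 0 the results are UINT16_MAX and 0 respectively. */
void
minmax_u16(const uint16_t *indices, size_t count,
           uint16_t *out_min, uint16_t *out_max)
{
    uint16_t lo = UINT16_MAX;
    uint16_t hi = 0;

    /* No early exits and no data-dependent control flow beyond the
     * selects, so the compiler is free to process several indices
     * per iteration. */
    for (size_t i = 0; i < count; ++i) {
        uint16_t v = indices[i];
        lo = (uint16_t)(v < lo ? v : lo);
        hi = (uint16_t)(v > hi ? v : hi);
    }

    *out_min = lo;
    *out_max = hi;
}
```

Whether the loop is actually vectorized depends on the compiler version, optimization level, and target, so the generated assembly is worth checking before relying on a speedup.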
chewitt has quit [Quit: Zzz..]
<anarsoul|c> icecream95: then maybe it makes sense to enable this flag for this file unconditionally?
<icecream95> The minmax function is called on the same set of indices a lot,
<anarsoul> but how do we know whether it's the same set or not?
<icecream95> Looking at the indices pointer will work most of the time, but isn't 100% reliable
<anarsoul> exactly
<anarsoul> especially if you have BO cache
vstehle has joined #panfrost
afaerber has joined #panfrost
chewitt has joined #panfrost
<icecream95> set overload-resolution off
<icecream95> Whoops, was trying to paste that into gdb...
chewitt has quit [Quit: Adios!]
guillaume_g has joined #panfrost
<icecream95> HdkR: https://github.com/ptitSeb/box86 is a reasonably fast x86 JIT that supports accelerated OpenGL
<HdkR> icecream95: Aye, only 32bit x86 though :P
<anarsoul> and 32-bit ARM
davidlt has joined #panfrost
NeuroScr has joined #panfrost
<anarsoul> anyway nice piece of software
davidlt has quit [Remote host closed the connection]
yann has quit [Ping timeout: 268 seconds]
<tomeu> icecream95: thanks, -ftree-vectorize indeed reduces the portion of time spent there from 50% to 17% and increases FPS a lot on stk with the gles3 renderer
pH5 has joined #panfrost
icecream95 has quit [Ping timeout: 240 seconds]
<tomeu> icecream95: __builtin_prefetch doesn't seem to help here, guess we only go over the indices once per frame
NeuroScr has quit [Quit: NeuroScr]
Xalius has quit [Remote host closed the connection]
<anarsoul> tomeu: I'd say once per draw
<tomeu> ah yes
<anarsoul> tomeu: btw if indices array is sparse you're likely wasting VS cycles as well
yann has joined #panfrost
megi has joined #panfrost
NeuroScr has joined #panfrost
raster has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
<alyssa> tomeu: 17% is still a ton of time fwiw.
<tomeu> alyssa: yeah, I'd like to stage the data to a malloced BO and copy later to WC on submission
<tomeu> and see what the change is
<alyssa> tomeu: I don't think we need to introduce copies for this.
<tomeu> I suspect this is triggered in the aquarium
<alyssa> (in fact I think any sort of copying would be slower overall)
<alyssa> There are two cases, indices are BOs and indices are user pointers
<tomeu> that's what I wanted to check, so many uncached reads must have been making everything much slower
<alyssa> For BOs, we know exactly when things are updated and in fact we have the normal malloced copy at our disposal during transfer_map
<alyssa> For user pointers -- which STK is unfortunately using and is firmly not recommended for perf -- you lose all that information and mesa is forced to copy the entire thing from user malloc'd memory to the BO. But if you sneak in the min/max computation into that copying routine you avoid doing another pass over the data (requires hacking up Gallium but that's probably okay)
<alyssa> [And doing it in Gallium would benefit lima as well]
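The idea of folding the min/max computation into the user-pointer copy can be sketched roughly as follows, assuming 16-bit indices and non-overlapping source and destination. `copy_indices_minmax_u16` is a hypothetical name for illustration only; it is not an existing Gallium or Mesa entry point, and where such a routine would live in Gallium is left open here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an existing Gallium function): copy indices
 * from user memory into the destination buffer and track the min/max
 * in the same pass, so the data is only read once.
 * Assumes src and dst do not overlap. */
void
copy_indices_minmax_u16(uint16_t *restrict dst,
                        const uint16_t *restrict src,
                        size_t count,
                        uint16_t *out_min, uint16_t *out_max)
{
    uint16_t lo = UINT16_MAX;
    uint16_t hi = 0;

    for (size_t i = 0; i < count; ++i) {
        uint16_t v = src[i];   /* single read from user memory */
        dst[i] = v;            /* write into the BO mapping */
        lo = (uint16_t)(v < lo ? v : lo);
        hi = (uint16_t)(v > hi ? v : hi);
    }

    *out_min = lo;
    *out_max = hi;
}
```

The destination is only ever written, never read back, which matters if it is a write-combined mapping where reads are slow.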
<tomeu> hmm, that sounds like a plan
<alyssa> :+1:
<alyssa> The only caveat is that you can't do that for MAP_DIRECTLY BOs
<alyssa> but if you can avoid MAP_DIRECTLY for index buffers, you're solid.
<alyssa> (I don't know if MAP_DIRECTLY is presently used for index buffers)
<alyssa> Ideally PIPE_BIND_INDEX_BUFFER would be a reliable indicator here
<alyssa> for mesa/st at least, I think that'll do.
<tomeu> I was kind of expecting that MAP_DIRECTLY was what stk was doing
<alyssa> nope, user pointers (iirc) :\
guillaume_g has quit [Quit: Konversation terminated!]
guillaume_g has joined #panfrost
guillaume_g has quit [Read error: Connection reset by peer]
guillaume_g has joined #panfrost
<MoeIcenowy> alyssa: is STK only using it in the fallback renderer or it also uses it in GLES3 renderer?
<tomeu> MoeIcenowy: I think it only uses it on the gles3 renderer
<tomeu> alyssa: so I have downloaded and booted the same kernel and ramdisk from https://gitlab.freedesktop.org/tomeu/mesa/-/jobs/1608289 on my nanopc-t4
<tomeu> have run deqp-gles2 in the same way, and everything looked fine
<tomeu> so we have some problem with the kevins
yann has quit [Ping timeout: 246 seconds]
<alyssa> MoeIcenowy: IIRC I saw this with gles3 as well
<alyssa> tomeu: ...hm.
guillaume_g has quit [Quit: Konversation terminated!]
* alyssa takes another look at ld/st argument packing
<alyssa> I can't help but feel like I'm getting close but.. hm
<alyssa> Oh... I bet the shift field is actually outside the ldst_reg abstraction, so the ldst_reg thing is really just 5-bits (not 8), and there's some other field where shift 'would' be for arg_1
<alyssa> Recovered another bit of barrier state while poking at this so I guess that's still a net win
pH5 has quit [Quit: bye]
pH5 has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
stikonas has joined #panfrost
chewitt has joined #panfrost
yann has joined #panfrost
raster has joined #panfrost
ChanServ has quit [shutting down]
ChanServ has joined #panfrost
buzzmarshall has joined #panfrost
TheKit has joined #panfrost
pH5 has quit [Quit: -_-]
anarsoul has quit [Remote host closed the connection]
anarsoul has joined #panfrost
NeuroScr has joined #panfrost
chewitt has quit [Read error: Connection reset by peer]
gtucker has joined #panfrost
NeuroScr has quit [Ping timeout: 240 seconds]
enunes has quit [Quit: ZNC 1.7.2 - https://znc.in]
enunes has joined #panfrost