davidlt has quit [Remote host closed the connection]
yann has quit [Ping timeout: 268 seconds]
<tomeu>
icecream95: thanks, -ftree-vectorize indeed reduces the portion of time spent there from 50% to 17% and increases FPS a lot on stk with the gles3 renderer
pH5 has joined #panfrost
icecream95 has quit [Ping timeout: 240 seconds]
<tomeu>
icecream95: __builtin_prefetch doesn't seem to help here, guess we only go over the indices once per frame
NeuroScr has quit [Quit: NeuroScr]
Xalius has quit [Remote host closed the connection]
<anarsoul>
tomeu: I'd say once per draw
<tomeu>
ah yes
<anarsoul>
tomeu: btw if indices array is sparse you're likely wasting VS cycles as well
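[A rough way to gauge anarsoul's point, assuming the driver runs the vertex shader for every vertex between the minimum and maximum index of a draw. The helper name is hypothetical and exists in neither Mesa nor STK. It counts how many vertices inside that range are never referenced by a 16-bit index array.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical diagnostic, not part of Mesa: returns how many vertices in
 * the range [min index, max index] are never referenced by `indices`.
 * Under the assumption that the whole range gets shaded, each of these is
 * a wasted vertex shader invocation. */
static size_t
count_unreferenced_vertices_u16(const uint16_t *indices, size_t count)
{
   assert(count > 0);

   uint8_t seen[65536 / 8];      /* one bit per possible 16-bit index */
   memset(seen, 0, sizeof(seen));

   uint16_t lo = UINT16_MAX;
   uint16_t hi = 0;
   size_t unique = 0;

   for (size_t i = 0; i < count; ++i) {
      uint16_t idx = indices[i];
      uint8_t bit = (uint8_t)(1u << (idx & 7));

      /* Count each distinct index only once. */
      if (!(seen[idx >> 3] & bit)) {
         seen[idx >> 3] |= bit;
         ++unique;
      }

      if (idx < lo) lo = idx;
      if (idx > hi) hi = idx;
   }

   size_t range = (size_t)hi - (size_t)lo + 1;
   return range - unique;
}
```

[A dense index array gives 0 here. An array that touches only a few vertices at each end of a large vertex buffer gives a number close to the size of the range.]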
yann has joined #panfrost
megi has joined #panfrost
NeuroScr has joined #panfrost
raster has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
<alyssa>
tomeu: 17% is still a ton of time fwiw.
<tomeu>
alyssa: yeah, I'd like to stage the data to a malloced BO and copy later to WC on submission
<tomeu>
and see what the change is
<alyssa>
tomeu: I don't think we need to introduce copies for this.
<tomeu>
I suspect this is triggered in the aquarium
<alyssa>
(in fact I think any sort of copying would be slower overall)
<alyssa>
There are two cases, indices are BOs and indices are user pointers
<tomeu>
that's what I wanted to check, so many uncached reads must have been making everything much slower
<alyssa>
For BOs, we know exactly when things are updated and in fact we have the normal malloced copy at our disposal during transfer_map
<alyssa>
For user pointers -- which STK is unfortunately using and is firmly not recommended for perf -- you lose all that information and mesa is forced to copy the entire thing from user malloc'd memory to the BO. But if you sneak in the min/max computation into that copying routine you avoid doing another pass over the data (requires hacking up Gallium but that's probably okay)
<alyssa>
[And doing it in Gallium would benefit lima as well]
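[A minimal sketch of the idea alyssa describes for user-pointer indices: compute min/max inside the loop that copies the indices to the BO, so the data is read only once and only from cached user memory. The function name and signature are hypothetical, not an existing Gallium or Mesa interface. Only 16-bit indices are handled, and primitive restart is ignored.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an existing Gallium function: copy `count`
 * 16-bit indices from user memory into `dst` (for example a mapped BO)
 * and track the smallest and largest index in the same pass. */
static void
copy_indices_minmax_u16(uint16_t *dst, const uint16_t *src, size_t count,
                        uint16_t *out_min, uint16_t *out_max)
{
   assert(count > 0);

   uint16_t lo = UINT16_MAX;
   uint16_t hi = 0;

   for (size_t i = 0; i < count; ++i) {
      uint16_t idx = src[i];   /* the only read, from the user's memory */

      dst[i] = idx;            /* the destination is written, never read */
      if (idx < lo) lo = idx;
      if (idx > hi) hi = idx;
   }

   *out_min = lo;
   *out_max = hi;
}
```

[Because the destination is never read back, this avoids a separate min/max pass over the uncached mapping. A real version would also need 8-bit and 32-bit index variants.]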
<tomeu>
hmm, that sounds like a plan
<alyssa>
:+1:
<alyssa>
The only caveat is that you can't do that for MAP_DIRECTLY BOs
<alyssa>
but if you can avoid MAP_DIRECTLY for index buffers, you're solid.
<alyssa>
(I don't know if MAP_DIRECTLY is presently used for index buffers)
<alyssa>
Ideally PIPE_BIND_INDEX_BUFFER would be a reliable indicator here
<alyssa>
for mesa/st at least, I think that'll do.
<tomeu>
I was kind of expecting that MAP_DIRECTLY was what stk was doing
<alyssa>
nope, user pointers (iirc) :\
guillaume_g has quit [Quit: Konversation terminated!]
guillaume_g has joined #panfrost
guillaume_g has quit [Read error: Connection reset by peer]
guillaume_g has joined #panfrost
<MoeIcenowy>
alyssa: is STK only using it in the fallback renderer or it also uses it in GLES3 renderer?
<tomeu>
MoeIcenowy: I think it only uses it on the gles3 renderer
<tomeu>
have run deqp-gles2 in the same way, and everything looked fine
<tomeu>
so we have some problem with the kevins
yann has quit [Ping timeout: 246 seconds]
<alyssa>
MoeIcenowy: IIRC I saw this with gles3 as well
<alyssa>
tomeu: ...hm.
guillaume_g has quit [Quit: Konversation terminated!]
* alyssa
takes another look at ld/st argument packing
<alyssa>
I can't help but feel like I'm getting close but.. hm
<alyssa>
Oh... I bet the shift field is actually outside the ldst_reg abstraction, so the ldst_reg thing is really just 5-bits (not 8), and there's some other field where shift 'would' be for arg_1
<alyssa>
Recovered another bit of barrier state while poking at this so I guess that's still a net win
pH5 has quit [Quit: bye]
pH5 has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
stikonas has joined #panfrost
chewitt has joined #panfrost
yann has joined #panfrost
raster has joined #panfrost
ChanServ has quit [shutting down]
ChanServ has joined #panfrost
buzzmarshall has joined #panfrost
TheKit has joined #panfrost
pH5 has quit [Quit: -_-]
anarsoul has quit [Remote host closed the connection]
anarsoul has joined #panfrost
NeuroScr has joined #panfrost
chewitt has quit [Read error: Connection reset by peer]