alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
raster has joined #panfrost
stikonas has quit [Remote host closed the connection]
leper` has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
vstehle has quit [Ping timeout: 240 seconds]
rcf has quit [Quit: WeeChat 2.5]
rcf has joined #panfrost
icecream95 has joined #panfrost
megi has quit [Ping timeout: 240 seconds]
anarsoul|c has quit [Quit: Connection closed for inactivity]
icecrea105 has joined #panfrost
icecream95 has quit [Ping timeout: 268 seconds]
icecrea105 is now known as icecream95
davidlt has joined #panfrost
vstehle has joined #panfrost
buzzmarshall has quit [Remote host closed the connection]
anarsoul|c has joined #panfrost
davidlt has quit [Ping timeout: 268 seconds]
MoeIcenowy has quit [Quit: ZNC 1.7.2+deb3 - https://znc.in]
MoeIcenowy has joined #panfrost
pH5 has joined #panfrost
JaceAlvejetti has quit [Ping timeout: 248 seconds]
JaceAlvejetti has joined #panfrost
megi has joined #panfrost
Space_Man has joined #panfrost
daniels has quit [Ping timeout: 240 seconds]
daniels has joined #panfrost
raster has joined #panfrost
stikonas has joined #panfrost
icecream95 has quit [Ping timeout: 268 seconds]
adjtm_ has joined #panfrost
adjtm has quit [Ping timeout: 265 seconds]
megi has quit [Ping timeout: 240 seconds]
<alyssa> Meanwhile, sysprof has regressed :|
* alyssa valgrinds
<alyssa> sysprof bug looks like.
<alyssa> Looks to be the same as https://gitlab.gnome.org/GNOME/sysprof/issues/23
<alyssa> Guess I'll really have to learn perf, eh?
raster has quit [Quit: Gettin' stinky!]
pak0st has joined #panfrost
buzzmarshall has joined #panfrost
megi has joined #panfrost
davidlt has joined #panfrost
karolherbst has joined #panfrost
pak0st has quit [Remote host closed the connection]
* alyssa tries to get proper proper mainline
<alyssa> So far most things seem to work.... except for panfrost :p
<alyssa> (Fixed)
* alyssa discovers perf :)
<anarsoul|c> alyssa: check out flamegraph in addition to perf
<alyssa> anarsoul|c: alright
* alyssa revisits scoreboarding now that the compute stuff is mildly sane
<alyssa> Doing it right is... obnoxiously difficult. But we'll get there.
pH5 has quit [Read error: Connection reset by peer]
pH5 has joined #panfrost
* alyssa has an algorithm sketched out, but u_blitter makes this needlessly complicated.
nerdboy has joined #panfrost
<alyssa> 7 files changed, 94 insertions(+), 513 deletions(-)
<alyssa> oh boy~
<alyssa> Not done but I digress
<alyssa> Making forward progress... I think..
<alyssa> Okay, stuff that doesn't wallpaper works (again)
yann has joined #panfrost
yann has quit [Ping timeout: 260 seconds]
<alyssa> MR opened. I'm quite pleased with how this turned out!
<alyssa> 9 files changed, 118 insertions(+), 527 deletions(-)
<alyssa> Lots of cleanup *and* a perf improvement? Count me in ;)
* alyssa tries to learn `perf annotate`
davidlt_ has joined #panfrost
davidlt has quit [Ping timeout: 265 seconds]
<alyssa> Evidently bitfields are slow.
davidlt_ has quit [Ping timeout: 265 seconds]
<urjaman> I'm not exactly surprised
<alyssa> Actually that wasn't the issue, it was another reading-from-GPU-memory issue. How many of those will we keep catching, idk :c
<alyssa> -----Actually, it *is* still the issue. Because bitfields necessarily imply reading from memory.
<HdkR> x86 cheats with bitfields since it has some instructions for helping accessing them directly from memory
<HdkR> ARM has to do a loadstore shuffle
<alyssa> HdkR: The issue here isn't the shuffling around per se, it's that reading from GPU mapped memory is stupidly expensive
<HdkR> ah, uncached then?
<alyssa> Yeah, for now at least
<alyssa> When you manually inspect code like `foo.bar = 5;` it's like, cool, that's just a write, no reads here
<alyssa> but if you look at the assembly level, that has to load the uncached memory to do the dance... and that becomes slow.
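[editor's note] A minimal C sketch of the point above, not Panfrost's actual code. The struct layout and both function names are hypothetical. Assigning to a bitfield through a pointer into uncached memory typically compiles to a load, a mask/insert, and a store per field, whereas building the value in a local and copying it out needs only writes to the mapped memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor layout, for illustration only. */
struct desc {
    uint32_t bar : 4;
    uint32_t baz : 12;
    uint32_t qux : 16;
};

/* True on common GCC/Clang ABIs; bitfield layout is implementation-defined. */
static_assert(sizeof(struct desc) == sizeof(uint32_t), "fields pack into one word");

/* Each assignment is a read-modify-write of the containing word.
 * If dst points at uncached GPU-mapped memory, every read is expensive. */
static void fill_in_place(volatile struct desc *dst)
{
    dst->bar = 5;
    dst->baz = 100;
    dst->qux = 7;
}

/* Build the descriptor in a local (cached) copy, then write it out once.
 * The destination is only ever written, never read. */
static void fill_via_local(struct desc *dst)
{
    struct desc tmp = { .bar = 5, .baz = 100, .qux = 7 };
    memcpy(dst, &tmp, sizeof(tmp));
}
```

Both functions leave the same bits in memory; the difference is only in how many times the destination has to be read along the way.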
<HdkR> yea, uncached ends up being pretty bad
<HdkR> usually ends up being worth keeping it cached then doing a dcache flush at the end of whatever you need to do
<alyssa> oh hey it's register spilling
<alyssa> isn't that cute
* alyssa shivers
<alyssa> (I must say - as far as learning perf goes, this has been extremely educative. Way more productive with this than I've ever been with any other profiler ever and this is day #1. <3)
<alyssa> On min/max index computation... I see 99% of the time spent loading indices, which absolutely supports the theory that this is a caching issue (so the proposed Gallium-based fix ought to work well)
<alyssa> Next to the memory access, the actual min'ing and max'ing is effectively free.
<alyssa> Same thing with the heavy access_tiled_image_generic usage in stk
<HdkR> vector min/max ends up being three cycles per op, compared to uncached memory accesses it is nothing :P
<alyssa> true!
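[editor's note] For reference, a minimal C sketch of a min/max index scan over a 16-bit index buffer. This is an illustration, not the Mesa implementation, and the function name is hypothetical. The arithmetic per element is two comparisons; if the buffer lives in uncached memory, the load on each iteration is where the time goes, which matches the profile described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scan `count` 16-bit indices and report the smallest
 * and largest value seen. With count == 0 the outputs are UINT16_MAX and 0. */
static void minmax_u16(const uint16_t *idx, size_t count,
                       uint16_t *out_min, uint16_t *out_max)
{
    assert(idx != NULL || count == 0);

    uint16_t lo = UINT16_MAX;
    uint16_t hi = 0;

    for (size_t i = 0; i < count; ++i) {
        uint16_t v = idx[i]; /* the load: dominant cost when memory is uncached */
        if (v < lo)
            lo = v;
        if (v > hi)
            hi = v;
    }

    *out_min = lo;
    *out_max = hi;
}
```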
<anarsoul> alyssa: btw do you see heavy access_tiled_image_generic usage in weston?
<anarsoul> I'm seeing it with lima for some reason :(
<alyssa> anarsoul: not sure, I can look in a bit
<alyssa> currently in gnome
<anarsoul> I assume it'd be the same
<HdkR> It's a sad day that we don't get gather loads on ARM until SVE
<alyssa> (Note: this is again the same bottleneck we see for WebGL on firefox. Unfortunately I don't think there's much to be done there.)
TheKit has quit [Read error: Connection reset by peer]
<alyssa> anarsoul: In weston, I'm seeing the top function be panfrost_store_tiled_image_yes.
<alyssa> It's just a lot of memory access anyway.
<anarsoul> alyssa: it's not the case with gnome-shell?
<alyssa> Apparently not? Dunno
<anarsoul> I see
<alyssa> anarsoul: At any rate I suspect something like https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3818 would help
<anarsoul> I see
<anarsoul> thanks for the pointer
<alyssa> If you do an impl for lima, please do check what the win is. But it might be pretty decent for glamor at least :)
<anarsoul> do you have any specific benchmark in mind?
<alyssa> anarsoul: the MR linked, and the issue linked from that, talk about ShmPutImage in x11perf, which seems a decent proxy for glamor perf
<alyssa> tomeu: dj,hgskkgyrs ack, I didn't realize you were *already* working on this in a branch, aaa I didn't mean to duplicate effort >..<
<alyssa> Not time wasted - I did need to learn perf - but still feel bad :|
<alyssa> Actually, it looks like most of it is complementary (so conflicts will be "fun" but not strictly duplicated work)
<daniels> anarsoul: that's weird, we don't ourselves do any readbacks unless you're taking screenshots, we don't use FBOs, and we only do software uploads when software clients give us changed buffers
<daniels> so it shouldn't be spending a ton of time doing that
<anarsoul> daniels: according to perf it's coming from gl-renderer, which (indirectly) calls _mesa_TexSubImage2D
<daniels> anarsoul: right, we do that to upload client content which has been given to us as a SHM buffer
<daniels> we only do it clipped to the changed region(s), but that means the TexSubImage2D path might not be tile-aligned
<daniels> alyssa changed Panfrost so that it would only do a partial fallback for unaligned regions (i.e. use the generic unaligned access routine for the sub-tile regions, use the fast routine for the others), rather than doing all the accesses using the generic helper if the region was unaligned
<daniels> hmm yeah, 2091d311c9d0 applied that fix to the shared code, so it should've helped Lima as well
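[editor's note] A rough C sketch of the partial-fallback idea daniels describes, under stated assumptions: the 16x16 tile size and all names here are illustrative and not taken from the commit. The tile-aligned interior of an update rectangle can go through the fast routine, and only the unaligned border needs the generic one. The sketch just counts how many pixels land on each path.

```c
#include <assert.h>

#define TILE 16u /* assumed tile edge in pixels; must be a power of two */

static_assert((TILE & (TILE - 1u)) == 0, "TILE must be a power of two");

struct split {
    unsigned fast_px;    /* pixels inside the tile-aligned interior */
    unsigned generic_px; /* pixels in the unaligned border */
};

static unsigned align_up(unsigned v)   { return (v + TILE - 1u) & ~(TILE - 1u); }
static unsigned align_down(unsigned v) { return v & ~(TILE - 1u); }

/* Hypothetical helper: split the rectangle (x, y, w, h) into the part that
 * covers whole tiles and the remainder. */
static struct split split_region(unsigned x, unsigned y, unsigned w, unsigned h)
{
    unsigned x0 = align_up(x), x1 = align_down(x + w);
    unsigned y0 = align_up(y), y1 = align_down(y + h);

    unsigned fast = 0;
    if (x1 > x0 && y1 > y0)
        fast = (x1 - x0) * (y1 - y0);

    struct split s = { fast, w * h - fast };
    return s;
}
```

For a rectangle smaller than one tile, or one that straddles tile boundaries without covering a whole tile, the interior is empty and everything stays on the generic path.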
NeuroScr has joined #panfrost
pH5 has quit [Quit: -_-]