<alyssa>
^ That's a good example of a failing test I'd like to address in this refactor
<alyssa>
There's gotta be something silly I'm missing.
<alyssa>
I don't understand why there are a zillion UBO ops
<alyssa>
Something more complex is going on, I think
<alyssa>
hm.
<HdkR>
alyssa: Are there a zillion?
<HdkR>
Exponential growth I guess
<alyssa>
Alas
<alyssa>
wait what.
<alyssa>
I can't tell if the hardware is terribly limited or if this code is just unoptimized.
<alyssa>
I have confirmed that the immediate offset and shift in ld_ubo_* are consistent with our expectations
<alyssa>
On an int4, the swizzle acts on 32-bit components
<alyssa>
and the mask too
<alyssa>
on char4, the swizzle acts on ????? and the mask on 32-bit
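A rough mental model of what "the swizzle acts on 32-bit components" could mean, and why char4 is the confusing case (a hedged sketch, not from any hardware documentation; the function and values are illustrative):

```python
# Hedged sketch: model a swizzle that operates at 32-bit granularity,
# as observed for ld_ubo_* on int4.
def swizzle32(words, sel):
    """Reorder 32-bit lanes: words is a list of 32-bit values,
    sel is a tuple of source lane indices, e.g. (0, 0, 1, 2)."""
    return [words[i] for i in sel]

# int4: each component is its own 32-bit lane, so the swizzle
# addresses components directly.
int4 = [0x11111111, 0x22222222, 0x33333333, 0x44444444]
assert swizzle32(int4, (3, 2, 1, 0)) == [0x44444444, 0x33333333,
                                         0x22222222, 0x11111111]

# char4: all four 8-bit components are packed into a single 32-bit
# lane, so a 32-bit swizzle can only replicate or move the packed
# word as a whole -- it cannot address the individual bytes, which
# would explain why its per-byte behaviour looks undefined.
char4_packed = [0x44332211]
assert swizzle32(char4_packed, (0, 0, 0, 0)) == [0x44332211] * 4
```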
<HdkR>
<32bit UBOs have issues everywhere
<alyssa>
HdkR: Wait
<alyssa>
<32bit UBOs are a thing?
<HdkR>
With some of the GL 4.x things you can stuff smaller than 32bit values in to them yea
<alyssa>
I see.
<HdkR>
They've had fun restrictions through the years since they aren't "just memory"
<alyssa>
Is this in GLES..?
<HdkR>
I don't think the restrictions were entirely lifted in ES land
<HdkR>
Maybe Vulkan
<HdkR>
Latest GL they effectively act like SSBOs with API facing restrictions on things like size and alignment :P
<alyssa>
Weeee.
<HdkR>
(Also some hardware ends up generating less effective code on edge cases)
<alyssa>
ld_ubo_short4 has a swizzle that also acts like 32-bit
<alyssa>
so why are there 3 different ops?
<alyssa>
Is this some sort of opt?
<alyssa>
Or is there some subtle semantic distinction?
<alyssa>
I could believe it being an opt, actually
<alyssa>
If it's faster to load 8 bytes instead of 16 bytes
<alyssa>
so the opcode acts as a worst case
<alyssa>
I don't know enough about the ld/st pipeline to know if that's really true.
<alyssa>
I also don't know why the hardware can't just look at the mask to figure that out for itself.
<alyssa>
I suppose the risk might be crossing pages/cache lines/vectors/whatever
<alyssa>
(something that the compiler can sort out but would be harder to guess from hw)
<alyssa>
So maybe that's it..?
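The optimization being speculated about here can be sketched as the compiler (not the hardware) choosing the narrowest load variant that still covers every component the writemask needs, since only the compiler can prove the access stays within a cache line. A hedged sketch; the opcode names follow the ones in the discussion above, and the byte sizes are illustrative assumptions:

```python
# Hypothetical ld_ubo_* variants, narrowest first.  Sizes are assumed,
# not taken from hardware documentation.
LD_OPS = [
    ("ld_ubo_char4",  4),   # loads 4 bytes
    ("ld_ubo_short4", 8),   # loads 8 bytes
    ("ld_ubo_int4",  16),   # loads 16 bytes
]

def pick_ld_op(mask, comp_bytes):
    """Pick the narrowest load covering the writemask.

    mask: bitmask of needed components (bit i = component i).
    comp_bytes: bytes per component."""
    highest = mask.bit_length()       # components up to the last used one
    needed = highest * comp_bytes
    for name, size in LD_OPS:
        if size >= needed:
            return name
    raise ValueError("load too wide for any ld_ubo_* variant")

# Only .xy of a vec4 of 32-bit values needed -> an 8-byte load suffices.
assert pick_ld_op(0b0011, 4) == "ld_ubo_short4"
# All of .xyzw needed -> full 16-byte load.
assert pick_ld_op(0b1111, 4) == "ld_ubo_int4"
```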
<tomeu>
robmur01: that's cool I think, we can bring back the per-gpu ifs
<tomeu>
robmur01: do you know what specifically is t720-specific and what is SFBD-specific?
<tomeu>
alyssa: any idea of why a hierarchy_mask of 0x41 makes glmark run smooth as opposed to 0xff?
<tomeu>
these are all the different hierarchy masks that the blob uses on t720 in a whole glmark2 run: http://paste.debian.net/1116154/
<tomeu>
0xfff is when there are no draws
<tomeu>
alyssa: one more data point: replacing 0xff for 0x41 adds around 30 deqp regressions
<warpme_>
tomeu: fyi: i gave https://gitlab.freedesktop.org/tomeu/mesa/commits/lava-ci-t720-III a try on H6 (t720) and the results are... perfect! I have a fully working GL UI in mythtv, playback with OpenGL & yv12 renderers + all GL shader based deinterlacers (OneField, LinearBlend & Kernel). this is fantastic work!
<tomeu>
cool!
<warpme_>
how far are we with mainlining it?
<tomeu>
warpme_: the commit on top is just a big hack for now, we need to understand better what is different in the tiler
<tomeu>
once that's done, there will be something like 200 deqp tests that would be good to fix
<tomeu>
most will be due to 1 or 2 issues
<warpme_>
sure. btw: how will this work benefit t820? (it's still not usable at all for me)
<tomeu>
hard to tell, we've been changing the behavior depending on whether sfbd or 720 or 760, but we don't have a good way of figuring out what the differences are due to
<tomeu>
so we need to test and find out
<tomeu>
I don't have any hw with 820, but it should be easy for someone to grep for gpu_id and requires_sfbd and flip the switches and observe what happens
<warpme_>
ok. thx!
<tomeu>
probably something that we believe applies to >760 actually applies only to >=860
<tomeu>
alyssa: why would 0x1C7 be used by glmark then? FBO?
<alyssa>
tomeu: As I outline in the notes, you want to use bigger tiles if your geometry is expected to be sparse.
<alyssa>
If you think about something like Weston
<alyssa>
You don't *want* to have a zillion little 16x16 tiles
<alyssa>
You pretty much will be content with massive tiles (like 512x512, say) since you're drawing so few triangles on such a large screen
<alyssa>
So then you use a mask like 0x186
<alyssa>
If you're drawing a single fullscreen quad and your quad is 1920x1080, then the blob will want to use 1024x1024 tiles ===> 0x1C7
<alyssa>
The reason you can't use 0x1C7 everywhere is that the framebuffer has to be at least 1024x1024 to have 1024x1024 tiles, hence the regressions
<alyssa>
(I think)
<alyssa>
And also that would trash your performance for the same reason 0xfff did
<alyssa>
(Actually, if the formula I gave doesn't work, substitute 0xfff for 0x0, not sure which one)
<alyssa>
It's not obvious to me that T720 does hierarchical tiling, in the "hierarchy" sense
<alyssa>
so much as just supporting rectangular tiles of a size the driver chooses
<alyssa>
(Whereas T860 literally has many overlapping tiles, afaik)
<tomeu>
alyssa: but I just checked and the fb is 1920x1080 when 0x1c7 is used
<alyssa>
tomeu: Exactly :)
<alyssa>
1920x1080 > 1024x1024
<alyssa>
so it's fine then
<alyssa>
But if the fb is 512x512, then you couldn't use 0x1c7 since that's smaller than 1024x1024
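The mask values in this discussion can be made concrete with a small decoder. This is a guess at the encoding (assuming bit i of the hierarchy mask enables square tiles of `16 << i` pixels on a side, which fits the masks above), not something confirmed from hardware documentation:

```python
# Hedged sketch: decode a tiler hierarchy mask into the tile sizes it
# enables, under the ASSUMED encoding bit i -> (16 << i) pixel tiles.
def tile_sizes(mask):
    return [16 << i for i in range(12) if mask & (1 << i)]

# 0x41 (bits 0 and 6): 16x16 tiles for dense geometry plus one big
# 1024x1024 level for sparse fullscreen-ish draws.
assert tile_sizes(0x41) == [16, 1024]

# 0xff (bits 0-7): every level from 16x16 up to 2048x2048.
assert tile_sizes(0xff) == [16, 32, 64, 128, 256, 512, 1024, 2048]

# 0x1C7 (bits 0-2 and 6-8) enables the >=1024 levels, which only make
# sense when the framebuffer is at least that large -- matching the
# observation that 0x1c7 shows up with a 1920x1080 fb but would
# regress on small framebuffers.
assert tile_sizes(0x1C7) == [16, 32, 64, 1024, 2048, 4096]
```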
<tomeu>
oh, finally got it
<tomeu>
sorry, doing too many things at once
<tomeu>
will go back to this later
<warpme_>
tomeu: You wrote @10:44AM: " could be good to have that in mind when one starts flipping switches :)" - this could be me :-) but I will need strong guidance....
<alyssa>
If someone could get a single good trace from T820, that would probably be enough to figure out most of the magic switches, I think
<alyssa>
good as in exercising all the different code paths, not just kmscube
<tomeu>
guess a glmark trace could be enough?
<warpme_>
alyssa: do you mean apitrace from app on t820?
<alyssa>
tomeu: glmark would be more than enough :)
<alyssa>
robmur01: Probably stopped working when I refactored all that code without ever testing on real SFBD hardware :)
<alyssa>
tomeu: If nr_cbufs == 0, you probably need a dummy cbuf
<tomeu>
it almost worked :p
<alyssa>
See the corresponding code in pan_mfbd.c
* robmur01
should really try building on something faster than the Juno itself
<alyssa>
tomeu: Looks like that commit is good, just don't do it on non-t720
<raster>
bbrezillon: managed to get to my rockpro ... patch does seem to work... i see no junk :)
<bbrezillon>
raster: great!
<bbrezillon>
raster: and thx for testing
<raster>
though there are some commits in mesa over the last month that lead to kernel faults in the kernel panfrost driver that eventually take my entire system down
<raster>
i backed out to 5d085ad052aac1f35cef7b60c0e6ecad65a6807b and it now doesn't end up with a kernel oops eventually
<raster>
i can run glmark to completion at least :)
<raster>
i have to bisect to figure out what the commit is that causes this so i can then have an idea of what it might be in the kernel
<robmur01>
raster: 5.4-rc kernel or something older?
<raster>
haven't done that yet.
<raster>
robmur01: 5.3.0-rc6
<raster>
with patches to allow ddk to work
<robmur01>
ah, you may well be missing one or two significant kernel fixes (like the drm_sched one)
<raster>
it's possible for sure
<raster>
there are so many moving parts here
<raster>
like needed patches for kernel to work on hikey960
<raster>
needed patches for ddk to work
<raster>
need a patched mesa to work
<raster>
so much forkage :)
<robmur01>
there's still definitely *something* up with buffer lifetimes being slightly wonky, but only to the point of spewing errors rather than killing the kernel
<raster>
oh this kills my kernel
<raster>
a bunch of paging faults then eventually an oops
<raster>
yaaargh
<raster>
and now gstreamer is deadlocking somewhere
<raster>
i also have to patch my devicetree to work on my rockpro64
<raster>
(otherwise i cant even boot)
<tomeu>
raster: at least we have a path to reduce the forkage :)
<tomeu>
damn, I have found a test case that fails with the DDK :/
<tomeu>
I think they got their tiler hierarchy mask wrong :p
<robmur01>
OK, apparently I'm mistaken - it looks like shadow could never have run on t620... I must be misremembering testing an FPGA GPU on the same board :(
<raster>
robmur01: too many moving parts. trees, branches, patches, revisions, mesa, kernel, ddk, different gpu variants with different errata...