alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
karolherbst has quit [Ping timeout: 246 seconds]
karolherbst has joined #panfrost
karolherbst has quit [Remote host closed the connection]
karolherbst has joined #panfrost
warpme_ has quit [Quit: Connection closed for inactivity]
stikonas has quit [Remote host closed the connection]
vstehle has quit [Ping timeout: 240 seconds]
* alyssa tries to refactor UBO reads to be more, well, uniform
* alyssa tries to figure out how to do this sanely.
<alyssa> There are a lot of weird corner cases with the current model
<alyssa> Ideally we would fix that and things would be overall happier
<alyssa> I guess really I just need to represent these constants
<alyssa> Okay, this is fine, tear out some code and put it back together again :)
<alyssa> Should fix some GLES3 stuff I think
<alyssa> dEQP-GLES3.functional.ubo.single_basic_type.shared.row_major_lowp_mat2_fragment
<alyssa> ^ That's a good example of a failing test I'd like to address in this refactor
<alyssa> There's gotta be something silly I'm missing.
fysa has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
<alyssa> I don't understand why there are a zillion UBO ops
<alyssa> Something more complex is going on, I think
<alyssa> hm.
fysa has quit [Ping timeout: 268 seconds]
<HdkR> alyssa: Are there a zillion?
<HdkR> Exponential growth I guess
<alyssa> Alas
<alyssa> wait what.
<alyssa> I can't tell if the hardware is terribly limited or if this code is just unoptimized.
<alyssa> I have confirmed that the immediate offset and shift in ld_ubo_* are consistent with our expectations
<alyssa> On an int4, the swizzle acts on 32-bit components
<alyssa> and the mask too
<alyssa> on char4, the swizzle acts on ????? and the mask on 32-bit
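[Editor's note: to make the int4 observation above concrete, here is a minimal C sketch of what "the swizzle acts on 32-bit components" means, assuming Midgard's usual 2-bits-per-lane swizzle encoding; the exact ld/st field layout is an assumption for illustration, not confirmed hardware format.]

    /* Sketch: applying a per-32-bit-lane swizzle and write mask to a
     * loaded int4, as observed above. Assumes 2 swizzle bits per lane;
     * illustrative only. */
    #include <stdint.h>

    static void
    apply_swizzle32(const uint32_t src[4], uint32_t dst[4],
                    uint8_t swizzle, uint8_t mask)
    {
        for (unsigned c = 0; c < 4; ++c) {
            if (mask & (1u << c))                       /* mask is per 32-bit lane */
                dst[c] = src[(swizzle >> (2 * c)) & 3]; /* 2 bits select a source lane */
        }
    }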
<HdkR> <32bit UBOs have issues everywhere
<alyssa> HdkR: Wait
<alyssa> <32bit UBOs are a thing?
<HdkR> With some of the GL 4.x things you can stuff smaller-than-32bit values into them, yeah
<alyssa> I see.
<HdkR> They've had fun restrictions through the years since they aren't "just memory"
<alyssa> Is this in GLES..?
<HdkR> I don't think the restrictions were entirely lifted in ES land
<HdkR> Maybe Vulkan
<HdkR> In latest GL they effectively act like SSBOs, with API-facing restrictions on things like size and alignment :P
<alyssa> Weeee.
<HdkR> (Also some hardware ends up generating less effective code on edge cases)
<alyssa> ld_ubo_short4 has a swizzle that also acts like 32-bit
<alyssa> so why are there 3 different ops?
<alyssa> Is this some sort of opt?
<alyssa> Or is there some subtle semantic distinction?
<alyssa> I could believe it being an opt, actually
<alyssa> If it's faster to load 8 bytes instead of 16 bytes
<alyssa> so the opcode acts as a worst case
<alyssa> I don't know enough about the ld/st pipeline to know if that's really true.
<alyssa> I also don't know why the hardware can't just look at the mask to figure that out for itself.
<alyssa> I suppose the risk might be crossing pages/cache lines/vectors/whatever
<alyssa> (something that the compiler can sort out but would be harder to guess from hw)
<alyssa> So maybe that's it..?
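[Editor's note: a minimal C sketch of the opcode-as-optimization hypothesis above: pick the narrowest ld_ubo_* variant whose load width still covers every lane enabled in the write mask, so a wider load never strays across a page or cache line. Only the opcode names come from the log; the 4/8/16-byte widths and the selection heuristic are speculation from this conversation, not confirmed Midgard behaviour.]

    /* Hypothetical ld_ubo_* selection from the destination write mask,
     * assuming char4/short4/int4 load 4/8/16 bytes respectively. */
    #include <assert.h>

    enum ld_ubo_op { LD_UBO_CHAR4, LD_UBO_SHORT4, LD_UBO_INT4 };

    static enum ld_ubo_op
    pick_ld_ubo_op(unsigned mask /* 4 bits, one per 32-bit lane */)
    {
        assert(mask != 0 && mask <= 0xF);

        /* Bytes needed to cover the highest enabled 32-bit lane. */
        unsigned highest = 32 - __builtin_clz(mask);  /* 1..4 */
        unsigned bytes = highest * 4;

        /* Narrowest load that still covers every enabled lane, so we
         * never read past the data we actually need. */
        if (bytes <= 4)
            return LD_UBO_CHAR4;
        else if (bytes <= 8)
            return LD_UBO_SHORT4;
        else
            return LD_UBO_INT4;
    }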
megi has quit [Ping timeout: 240 seconds]
fysa has joined #panfrost
fysa has quit [Ping timeout: 240 seconds]
fysa has joined #panfrost
davidlt has joined #panfrost
fysa has quit [Read error: Connection reset by peer]
anarsoul_ has joined #panfrost
anarsoul_ has quit [Remote host closed the connection]
fysa has joined #panfrost
fysa has quit [Ping timeout: 265 seconds]
Thra11_ has quit [Ping timeout: 240 seconds]
chewitt has joined #panfrost
Thra11_ has joined #panfrost
vstehle has joined #panfrost
<tomeu> robmur01: that's cool I think, we can bring back the per-gpu ifs
<tomeu> robmur01: do you know what specifically is t720-specific and what is SFBD-specific?
fysa has joined #panfrost
Thra11_ has quit [Ping timeout: 276 seconds]
fysa has quit [Ping timeout: 265 seconds]
Thra11_ has joined #panfrost
fysa has joined #panfrost
fysa has quit [Ping timeout: 265 seconds]
<tomeu> alyssa: any idea of why a hierarchy_mask of 0x41 makes glmark run smooth as opposed to 0xff?
<tomeu> these are all the different hierarchy masks that the blob uses on t720 in a whole glmark2 run: http://paste.debian.net/1116154/
<tomeu> 0xfff is when there are no draws
<tomeu> alyssa: one more data point: replacing 0xff with 0x41 adds around 30 deqp regressions
warpme_ has joined #panfrost
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
<warpme_> tomeu: fyi: I gave https://gitlab.freedesktop.org/tomeu/mesa/commits/lava-ci-t720-III a try on H6 (t720) and the results are... perfect! I have in mythtv a fully working GL UI, playback with OpenGL & yv12 renderers + all GL shader based deinterlacers (OneField, LinearBlend & Kernel). this is fantastic work!
<tomeu> cool!
<warpme_> how far are we with mainlining it?
<tomeu> warpme_: the commit on top is just a big hack for now, we need to understand better what is different in the tiler
<tomeu> once that's done, there will be something like 200 deqp tests that would be good to fix
<tomeu> most will be due to 1 or 2 issues
megi has joined #panfrost
<warpme_> sure. btw: how might this work benefit the t820? (still not usable at all for me)
<tomeu> hard to tell, we've been changing the behavior depending on whether it's sfbd or 720 or 760, but we don't have a good way of figuring out what the differences are due to
<tomeu> so we need to test and find out
<tomeu> I don't have any hw with 820, but it should be easy for someone to grep for gpu_id and requires_sfbd and flip the switches and observe what happens
<warpme_> ok. thx!
<tomeu> probably something that we believe applies to >760 applies only to >=860
<tomeu> hmm
<tomeu> so looks like the t820 may relate more closely to the t720 than to the t760
<tomeu> warpme_: could be good to have that in mind when one starts flipping switches :)
<narmstrong> seems tXYZ encodes: X = family, Y = features, Z = zero for marketing?
fysa has joined #panfrost
raster has joined #panfrost
fysa has quit [Ping timeout: 276 seconds]
<bbrezillon> raster: when you find some time => https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2729
<raster> bbrezillon: hey hey
<bbrezillon> and I figured where those implicit flushes were coming from (CPU access to the render target)
<raster> i probably won't today
<bbrezillon> np
<raster> i'm off at conference these 2 days
<raster> nowhere near my rockpro64 :|
<bbrezillon> don't worry, it can wait 2 more days
<raster> cpu access to render target?
<raster> why would mesa need to fallback to the cpu?
<bbrezillon> I didn't dig that far
<raster> that smells of a sw fallback path that probably shouldn't happen
<raster> ... :|
<bbrezillon> probably
<bbrezillon> but we need to handle implicit flushes anyway, just in case
<raster> oh sure
<raster> this needs to not break
<raster> but .. it points out a whole new "bug" too :)
<bbrezillon> and if we manage to get rid of those CPU accesses, that's even better
<bbrezillon> bug?
<raster> well the cpu accesses i would consider a performance bug
<bbrezillon> a sub-optimal way of doing things, certainly
<bbrezillon> but there's no regression here, since I suspect we've been relying on the SW fallback from the beginning :P
<raster> bad that we are though :)
<bbrezillon> nothing that can't be fixed
<bbrezillon> s/fixed/addressed/
<bbrezillon> but that's for another week, have to get back to !panfrost stuff now ;)
<raster> indeed
<raster> it was just an interesting code path, which is why it was probably odd/unusual to see it triggered
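[Editor's note: the "implicit flushes" above are the gallium driver kicking pending GPU work when the CPU maps a render target, so readback sees the rendered pixels. A minimal sketch of the idea follows; every name in it is a hypothetical stand-in, not actual Mesa API.]

    /* Sketch: why CPU access to a render target forces a flush. */
    struct resource;                          /* stand-in for pipe_resource   */
    int  batch_writes(struct resource *r);    /* is a GPU batch writing r?    */
    void flush_batches_writing(struct resource *r);
    void *cpu_map(struct resource *r);

    void *
    transfer_map_sketch(struct resource *r)
    {
        /* If the GPU still has pending draws targeting r, they must be
         * submitted and waited on before the CPU can read the pixels. */
        if (batch_writes(r))
            flush_batches_writing(r);
        return cpu_map(r);
    }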
raster has quit [Read error: Connection reset by peer]
raster has joined #panfrost
<tomeu> narmstrong: that makes sense
fysa has joined #panfrost
fysa has quit [Ping timeout: 276 seconds]
raster has quit [Quit: Gettin' stinky!]
davidlt has quit [Ping timeout: 240 seconds]
fysa has joined #panfrost
fysa has quit [Ping timeout: 265 seconds]
Elpaulo has quit [Quit: Elpaulo]
BenG83 has joined #panfrost
BenG83 has quit [Remote host closed the connection]
maccraft123 has joined #panfrost
karolherbst has quit [*.net *.split]
AreaScout_ has quit [*.net *.split]
robertfoss has quit [*.net *.split]
bbrezillon has quit [*.net *.split]
austriancoder has quit [*.net *.split]
Green has quit [*.net *.split]
mani_s has quit [*.net *.split]
fysa has joined #panfrost
karolherbst has joined #panfrost
AreaScout_ has joined #panfrost
robertfoss has joined #panfrost
bbrezillon has joined #panfrost
Green has joined #panfrost
austriancoder has joined #panfrost
mani_s has joined #panfrost
fysa has quit [Ping timeout: 265 seconds]
<alyssa> tomeu: the 0x41 thing is baffling
<alyssa> AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
<urjaman> channeling @infinite_scream? :P
<alyssa> urjaman: 0x41 0x41 0x41....
<alyssa> or probing for a buffer overflow, up to you
<alyssa> Ooooooooooh.
<alyssa> Okay
<alyssa> Yeah
<alyssa> Silly ducks.
<alyssa> tomeu: Hint: look at the bit pattern
<urjaman> honk.
<alyssa> Or just wait a minute and I'll copypaste over my notes
<alyssa> okay maybe slightly more complicated
<alyssa> but this is still important
maccraft123 has quit [Ping timeout: 252 seconds]
<alyssa> tomeu: What happens if you use 0x1C7 everywhere?
<tomeu> alyssa: stuff broke
<alyssa> What stuff, specifically?
<tomeu> don't remember, can test after this meeting(s)
<alyssa> OK
<alyssa> Oh hrm.
maccraft123 has joined #panfrost
<alyssa> Oh, I see what would've broken, yes.
<alyssa> Alright.
<tomeu> alyssa: ok, I'm done
<tomeu> alyssa: have 27 minutes free if you want me to test anything
<alyssa> Merry Christmas.
<alyssa> Read in full if you want to see how witchcraft happens, skip to the end for a formula that should get good perf without regressions.
<tomeu> alyssa: btw, in https://docplayer.net/21423359-Midgard-gpu-architecture-october-2014.html , the t760 is said to have an "Advanced Tiling Unit", but the t720 doesn't
* alyssa eyes
maccraft123 has quit [Quit: WeeChat 2.6]
<tomeu> alyssa: why would 0x1C7 be used by glmark then? FBO?
<alyssa> tomeu: As I outline in the notes, you want to use bigger tiles if your geometry is expected to be sparse.
<alyssa> If you think about something like Weston
<alyssa> You don't *want* to have a zillion little 16x16 tiles
<alyssa> You pretty much will be content with massive tiles (like 512x512, say) since you're drawing so few triangles on such a large screen
<alyssa> So then you use a mask like 0x186
<alyssa> If you're drawing a single fullscreen quad and your quad is 1920x1080, then the blob will want to use 1024x1024 tiles ===> 0x1C7
<alyssa> The reason you can't use 0x1C7 everywhere is that the framebuffer has to be at least 1024x1024 to have 1024x1024 tiles, hence the regressions
<alyssa> (I think)
<alyssa> And also that would trash your performance for the same reason 0xfff did
<alyssa> (Actually, if the formula I gave doesn't work, substitute 0xfff for 0x0, not sure which one)
<alyssa> It's not obvious to me that T720 does hierarchical tiling, in the "hierarchy" sense
<alyssa> so much as just supporting rectangular tiles of a size the driver chooses
<alyssa> (Whereas T860 literally has many overlapping tiles, afaik)
<tomeu> alyssa: but just checked and the fb is 1920x1080 when 0x1c7 is used
maccraft123 has joined #panfrost
<alyssa> tomeu: Exactly :)
<alyssa> 1920x1080 > 1024x1024
<alyssa> so it's fine then
<alyssa> But if the fb is 512x512, then you couldn't use 0x1c7 since that's smaller than 1024x1024
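[Editor's note: a short C sketch decoding the bit pattern alyssa hints at, assuming the i-th bit of the hierarchy mask enables square bins of (16 << i) pixels. That mapping is an inference from the masks discussed here (0x41 = bits 0 and 6 = 16x16 plus 1024x1024 bins), not a documented format.]

    /* Decode which bin sizes a tiler hierarchy mask enables, under the
     * assumed bit-i == (16 << i)-pixel-bins mapping. */
    #include <stdio.h>

    static void
    print_hierarchy_mask(unsigned mask)
    {
        printf("0x%x:", mask);
        for (unsigned i = 0; i < 12; ++i)
            if (mask & (1u << i))
                printf(" %ux%u", 16u << i, 16u << i);
        printf("\n");
    }

    /* print_hierarchy_mask(0x41);  -> 0x41: 16x16 1024x1024
     * print_hierarchy_mask(0x1c7); -> 0x1c7: 16x16 32x32 64x64 1024x1024 2048x2048 4096x4096 */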
raster has joined #panfrost
<tomeu> oh, finally got it
<tomeu> sorry, doing too many things at once
<tomeu> will go back to this later
<warpme_> tomeu: You wrote @10:44AM: "could be good to have that in mind when one starts flipping switches :)" - this could be me :-) but I will need strong guidance....
<alyssa> If someone could get a single good trace from T820, that would probably be enough to figure out most of the magic switches, I think
<alyssa> good as in exercising all the different code paths, not just kmscube
<tomeu> guess a glmark trace could be enough?
<warpme_> alyssa: do you mean apitrace from app on t820?
<alyssa> tomeu: glmark would be more than enough :)
<alyssa> warpme_: pantrace
<alyssa> Dunno how I missed that =p
<tomeu> yeah, I started debugging the wrong test :p
<alyssa> Hey :p
<alyssa> Fixed tests are fixed tests
<warpme_> alyssa: ok. my environment is a cross-build appliance OS. Where are the sources of pantrace? Also, is the usage model similar to apitrace?
<alyssa> warpme_: 1) Kind of a mess but "within mesa and an out-of-tree repo simultaneously ..." and 2) fraid not
<alyssa> (#1 is because the code isn't in mesa but links with some code in mesa)
<tomeu> a meson arg for specifying mesa's build dir could improve things
<tomeu> alyssa: btw, what can be done about MRT on SFBD?
<tomeu> [shadow] <default>:glmark2-es2-drm: ../mesa/src/gallium/drivers/panfrost/pan_sfbd.c:210: panfrost_sfbd_fragment: Assertion `batch->key.nr_cbufs == 1' failed.
<robmur01> FWIW I'm sure that shadow on t620 did once work; I don't remember exactly how long ago it stopped
<tomeu> well, that's very interesting
<tomeu> robmur01: did it look fine?
<tomeu> or could the horse shadow just be missing?
<tomeu> maybe I should just comment that assert out and try it :p
<robmur01> hmm, I honestly can't remember whether the shadow actually rendered properly - I might try winding back to 19.1 or so to see
<alyssa> tomeu: glmark doesn't use MRT
<alyssa> -bshadow uses a depth-only render target
<alyssa> i.e. nr_cbufs == 0
<alyssa> SFBD supports that just fine
<tomeu> oh, I see
<robmur01> ha, so s/==/<=/ ?
<alyssa> something like that
<alyssa> robmur01: Probably stopped working when I refactored all that code without ever testing on real SFBD hardware :)
<alyssa> tomeu: If nr_cbufs == 0, you probably need a dummy cbuf
<tomeu> it almost worked :p
<alyssa> See the corresponding code in pan_mfbd.c
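[Editor's note: a sketch of the fix discussed above for panfrost_sfbd_fragment() in pan_sfbd.c: relax the assert as robmur01 suggests and wire up a dummy color buffer for the nr_cbufs == 0 (depth-only) case, mirroring what pan_mfbd.c is said to do. The helper names are hypothetical, not the actual Mesa functions.]

    /* In panfrost_sfbd_fragment(): accept depth-only render targets. */
    assert(batch->key.nr_cbufs <= 1);   /* was: == 1, tripped by -bshadow */

    if (batch->key.nr_cbufs == 0) {
        /* Depth-only FBO (e.g. glmark2 -bshadow): give the SFBD a dummy,
         * write-disabled color target so the descriptor stays valid
         * (hypothetical helper, modeled on the MFBD path). */
        panfrost_sfbd_set_dummy_cbuf(fb);
    } else {
        panfrost_sfbd_set_cbuf(fb, batch->key.cbufs[0]);
    }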
* robmur01 should really try building on something faster than the Juno itself
<alyssa> tomeu: Looks like that commit is good, just don't do it on non-t720
<raster> bbrezillon: managed to get to my rockpro ... patch does seem to work... i see no junk :)
<bbrezillon> raster: great!
<bbrezillon> raster: and thx for testing
davidlt has joined #panfrost
<raster> though there are some commits in mesa over the last month that lead to kernel faults in the kernel panfrost driver that eventually take my entire system down
<raster> i backed out to 5d085ad052aac1f35cef7b60c0e6ecad65a6807b and it now doesn't end up with a kernel oops eventually
<raster> i can run glmark to completion at least :)
<raster> i have to bisect to figure out what the commit is that causes this so i can then have an idea of what it might be in the kernel
<robmur01> raster: 5.4-rc kernel or something older?
<raster> haven't done that yet.
<raster> robmur01: 5.3.0-rc6
<raster> with patches to allow ddk to work
<robmur01> ah, you may well be missing one or two significant kernel fixes (like the drm_sched one)
<raster> it's possible for sure
<raster> there are so many moving parts here
<raster> like needed patches for kernel to work on hikey960
<raster> needed patches for ddk to work
<raster> need a patched mesa to work
<raster> so much forkage :)
<robmur01> there's still definitely *something* up with buffer lifetimes being slightly wonky, but only to the point of spewing errors rather than killing the kernel
<raster> oh this kills my kernel
<raster> a bunch of paging faults then eventually an oops
<raster> yaaargh
<raster> and now gstreamer is deadlocking somewhere
<raster> i also have to patch my devicetree to work on my rockpro64
<raster> (otherwise i cant even boot)
fysa has joined #panfrost
fysa has quit [Ping timeout: 276 seconds]
<tomeu> raster: at least we have a path to reduce the forkage :)
chewitt has quit [Quit: Zzz..]
chewitt has joined #panfrost
chewitt has quit [Client Quit]
<tomeu> damn, I have found a test case that fails with the DDK :/
<tomeu> I think they got their tiler hierarchy mask wrong :p
maccraft123 has quit [Ping timeout: 240 seconds]
maccraft123 has joined #panfrost
maccraft123 has quit [Client Quit]
shadeslayer has joined #panfrost
tomeu has quit [Quit: Coyote finally caught me]
tomeu has joined #panfrost
Thra11_ has quit [Ping timeout: 252 seconds]
fysa has joined #panfrost
Thra11_ has joined #panfrost
karolherbst has quit [Ping timeout: 276 seconds]
karolherbst has joined #panfrost
lfrb has joined #panfrost
fysa has quit [Ping timeout: 265 seconds]
rhyskidd has quit [Ping timeout: 265 seconds]
fysa has joined #panfrost
rhyskidd has joined #panfrost
fysa has quit [Remote host closed the connection]
fysa has joined #panfrost
Thra11_ has quit [Ping timeout: 276 seconds]
eballetbo[m] has quit [Remote host closed the connection]
thefloweringash has quit [Read error: Connection reset by peer]
EmilKarlson has quit [Read error: Connection reset by peer]
TheCycoONE has quit [Write error: Connection reset by peer]
Thra11_ has joined #panfrost
pH5 has joined #panfrost
fysa has quit [Remote host closed the connection]
<robmur01> OK, apparently I'm mistaken - it looks like shadow could never have run on t620... I must be misremembering testing an FPGA GPU on the same board :(
fysa has joined #panfrost
<raster> robmur01: too many moving parts. trees, branches, patches, revisions, mesa, kernel, ddk, different gpu variants with different errata...
raster has quit [Quit: Gettin' stinky!]
fysa has quit [Remote host closed the connection]
davidlt has quit [Ping timeout: 268 seconds]
NeuroScr has joined #panfrost
BenG83 has joined #panfrost
pH5 has quit [Remote host closed the connection]