<alyssa>
^ That's a good example of a failing test I'd like to address in this refactor
<alyssa>
There's gotta be something silly I'm missing.
<alyssa>
I don't understand why there are a zillion UBO ops
<alyssa>
Something more complex is going on, I think
<alyssa>
hm.
<HdkR>
alyssa: Are there a zillion?
<HdkR>
Exponential growth I guess
<alyssa>
Alas
<alyssa>
wait what.
<alyssa>
I can't tell if the hardware is terribly limited or if this code is just unoptimized.
<alyssa>
I have confirmed that the immediate offset and shift in ld_ubo_* are consistent with our expectations
<alyssa>
On an int4, the swizzle acts on 32-bit components
<alyssa>
and the mask too
<alyssa>
on char4, the swizzle acts on ????? and the mask on 32-bit
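A rough mental model of what "the swizzle acts on 32-bit components" could mean, and why char4 is the confusing case (a hedged sketch, not from any hardware documentation; the function and values are illustrative):

```python
# Hedged sketch: model a swizzle that operates at 32-bit granularity,
# as observed for ld_ubo_* on int4.
def swizzle32(words, sel):
    """Reorder 32-bit lanes: words is a list of 32-bit values,
    sel is a tuple of source lane indices, e.g. (0, 0, 1, 2)."""
    return [words[i] for i in sel]

# int4: each component is its own 32-bit lane, so the swizzle
# addresses components directly.
int4 = [0x11111111, 0x22222222, 0x33333333, 0x44444444]
assert swizzle32(int4, (3, 2, 1, 0)) == [0x44444444, 0x33333333,
                                         0x22222222, 0x11111111]

# char4: all four 8-bit components are packed into a single 32-bit
# lane, so a 32-bit swizzle can only replicate or move the packed
# word as a whole -- it cannot address the individual bytes, which
# would explain why its per-byte behaviour looks undefined.
char4_packed = [0x44332211]
assert swizzle32(char4_packed, (0, 0, 0, 0)) == [0x44332211] * 4
```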
<HdkR>
<32bit UBOs have issues everywhere
<alyssa>
HdkR: Wait
<alyssa>
<32bit UBOs are a thing?
<HdkR>
With some of the GL 4.x things you can stuff smaller than 32bit values in to them yea
<alyssa>
I see.
<HdkR>
They've had fun restrictions through the years since they aren't "just memory"
<alyssa>
Is this in GLES..?
<HdkR>
I don't think the restrictions were entirely lifted in ES land
<HdkR>
Maybe Vulkan
<HdkR>
Latest GL they effectively act like SSBOs with API facing restrictions on things like size and alignment :P
<alyssa>
Weeee.
<HdkR>
(Also some hardware ends up generating less effective code on edge cases)
<alyssa>
ld_ubo_short4 has a swizzle that also acts like 32-bit
<alyssa>
so why are there 3 different ops?
<alyssa>
Is this some sort of opt?
<alyssa>
Or is there some subtle semantic distinction?
<alyssa>
I could believe it being an opt, actually
<alyssa>
If it's faster to load 8 bytes instead of 16 bytes
<alyssa>
so the opcode acts as a worst case
<alyssa>
I don't know enough about the ld/st pipeline to know if that's really true.
<alyssa>
I also don't know why the hardware can't just look at the mask to figure that out for itself.
<alyssa>
I suppose the risk might be crossing pages/cache lines/vectors/whatever
<alyssa>
(something that the compiler can sort out but would be harder to guess from hw)
<alyssa>
So maybe that's it..?
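The optimization being speculated about here can be sketched as the compiler (not the hardware) choosing the narrowest load variant that still covers every component the writemask needs, since only the compiler can prove the access stays within a cache line. A hedged sketch; the opcode names follow the ones in the discussion above, and the byte sizes are illustrative assumptions:

```python
# Hypothetical ld_ubo_* variants, narrowest first.  Sizes are assumed,
# not taken from hardware documentation.
LD_OPS = [
    ("ld_ubo_char4",  4),   # loads 4 bytes
    ("ld_ubo_short4", 8),   # loads 8 bytes
    ("ld_ubo_int4",  16),   # loads 16 bytes
]

def pick_ld_op(mask, comp_bytes):
    """Pick the narrowest load covering the writemask.

    mask: bitmask of needed components (bit i = component i).
    comp_bytes: bytes per component."""
    highest = mask.bit_length()       # components up to the last used one
    needed = highest * comp_bytes
    for name, size in LD_OPS:
        if size >= needed:
            return name
    raise ValueError("load too wide for any ld_ubo_* variant")

# Only .xy of a vec4 of 32-bit values needed -> an 8-byte load suffices.
assert pick_ld_op(0b0011, 4) == "ld_ubo_short4"
# All of .xyzw needed -> full 16-byte load.
assert pick_ld_op(0b1111, 4) == "ld_ubo_int4"
```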
<tomeu>
robmur01: that's cool I think, we can bring back the per-gpu ifs
<tomeu>
robmur01: do you know what specifically is t720-specific and what is SFBD-specific?
<tomeu>
alyssa: any idea of why a hierarchy_mask of 0x41 makes glmark run smooth as opposed to 0xff?
<tomeu>
these are all the different hierarchy masks that the blob uses on t720 in a whole glmark2 run: http://paste.debian.net/1116154/
<tomeu>
0xfff is when there are no draws
<tomeu>
alyssa: one more data point: replacing 0xff for 0x41 adds around 30 deqp regressions
<warpme_>
tomeu: fyi: i gave https://gitlab.freedesktop.org/tomeu/mesa/commits/lava-ci-t720-III a try on H6 (t720) and the results are... perfect! I have a fully working GL UI in mythtv, playback with OpenGL & yv12 renderers + all GL shader based deinterlacers (OneField, LinearBlend & Kernel). this is fantastic work!
<tomeu>
cool!
<warpme_>
how far are we with mainlining it?
<tomeu>
warpme_: the commit on top is just a big hack for now, we need to understand better what is different in the tiler
<tomeu>
once that's done, there will be something like 200 deqp tests that would be good to fix
<tomeu>
most will be due to 1 or 2 issues
<warpme_>
sure. btw: how will this work benefit t820? (it's still not usable at all for me)
<tomeu>
hard to tell, we've been changing the behavior depending on whether sfbd or 720 or 760, but we don't have a good way of figuring out what the differences are due to
<tomeu>
so we need to test and find out
<tomeu>
I don't have any hw with 820, but it should be easy for someone to grep for gpu_id and requires_sfbd and flip the switches and observe what happens
<warpme_>
ok. thx!
<tomeu>
probably something that we believe applies to >760 actually applies only to >=860
<tomeu>
alyssa: why would 0x1C7 be used by glmark then? FBO?
<alyssa>
tomeu: As I outline in the notes, you want to use bigger tiles if your geometry is expected to be sparse.
<alyssa>
If you think about something like Weston
<alyssa>
You don't *want* to have a zillion little 16x16 tiles
<alyssa>
You pretty much will be content with massive tiles (like 512x512, say) since you're drawing so few triangles on such a large screen
<alyssa>
So then you use a mask like 0x186
<alyssa>
If you're drawing a single fullscreen quad and your quad is 1920x1080, then the blob will want to use 1024x1024 tiles ===> 0x1C7
<alyssa>
The reason you can't use 0x1C7 everywhere is that the framebuffer has to be at least 1024x1024 to have 1024x1024 tiles, hence the regressions
<alyssa>
(I think)
<alyssa>
And also that would trash your performance for the same reason 0xfff did
<alyssa>
(Actually, if the formula I gave doesn't work, substitute 0xfff for 0x0, not sure which one)
<alyssa>
It's not obvious to me that T720 does hierarchical tiling, in the "hierarchy" sense
<alyssa>
so much as just supporting rectangular tiles of a size the driver chooses
<alyssa>
(Whereas T860 literally has many overlapping tiles, afaik)
<tomeu>
alyssa: but I just checked and the fb is 1920x1080 when 0x1c7 is used
<alyssa>
tomeu: Exactly :)
<alyssa>
1920x1080 > 1024x1024
<alyssa>
so it's fine then
<alyssa>
But if the fb is 512x512, then you couldn't use 0x1c7 since that's smaller than 1024x1024
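The mask values in this discussion can be made concrete with a small decoder. This is a guess at the encoding (assuming bit i of the hierarchy mask enables square tiles of `16 << i` pixels on a side, which fits the masks above), not something confirmed from hardware documentation:

```python
# Hedged sketch: decode a tiler hierarchy mask into the tile sizes it
# enables, under the ASSUMED encoding bit i -> (16 << i) pixel tiles.
def tile_sizes(mask):
    return [16 << i for i in range(12) if mask & (1 << i)]

# 0x41 (bits 0 and 6): 16x16 tiles for dense geometry plus one big
# 1024x1024 level for sparse fullscreen-ish draws.
assert tile_sizes(0x41) == [16, 1024]

# 0xff (bits 0-7): every level from 16x16 up to 2048x2048.
assert tile_sizes(0xff) == [16, 32, 64, 128, 256, 512, 1024, 2048]

# 0x1C7 (bits 0-2 and 6-8) enables the >=1024 levels, which only make
# sense when the framebuffer is at least that large -- matching the
# observation that 0x1c7 shows up with a 1920x1080 fb but would
# regress on small framebuffers.
assert tile_sizes(0x1C7) == [16, 32, 64, 1024, 2048, 4096]
```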
<tomeu>
oh, finally got it
<tomeu>
sorry, doing too many things at once
<tomeu>
will go back to this later
<warpme_>
tomeu: You wrote @10:44AM: " could be good to have that in mind when one starts flipping switches :)" - this could be me :-) but I will need strong guidance....
<alyssa>
If someone could get a single good trace from T820, that would probably be enough to figure out most of the magic switches, I think
<alyssa>
good as in exercising all the different code paths, not just kmscube
<tomeu>
guess a glmark trace could be enough?
<warpme_>
alyssa: do you mean apitrace from app on t820?
<alyssa>
tomeu: glmark would be more than enough :)
<alyssa>
robmur01: Probably stopped working when I refactored all that code without ever testing on real SFBD hardware :)
<alyssa>
tomeu: If nr_cbufs == 0, you probably need a dummy cbuf
<tomeu>
it almost worked :p
<alyssa>
See the corresponding code in pan_mfbd.c
* robmur01
should really try building on something faster than the Juno itself
<alyssa>
tomeu: Looks like that commit is good, just don't do it on non-t720
<raster>
bbrezillon: managed to get to my rockpro ... patch does seem to work... i see no junk :)
<bbrezillon>
raster: great!
<bbrezillon>
raster: and thx for testing
<raster>
though there are some commits in mesa over the last month that lead to kernel faults in the kernel panfrost driver that eventually take my entire system down
<raster>
i backed out to 5d085ad052aac1f35cef7b60c0e6ecad65a6807b and it now doesn't end up with a kernel oops eventually
<raster>
i can run glmark to completion at least :)
<raster>
i have to bisect to figure out what the commit is that causes this so i can then have an idea of what it might be in the kernel
<robmur01>
raster: 5.4-rc kernel or something older?
<raster>
haven't done that yet.
<raster>
robmur01: 5.3.0-rc6
<raster>
with patches to allow ddk to work
<robmur01>
ah, you may well be missing one or two significant kernel fixes (like the drm_sched one)
<raster>
it's possible for sure
<raster>
there are so many moving parts here
<raster>
like needed patches for kernel to work on hikey960
<raster>
needed patches for ddk to work
<raster>
need a patched mesa to work
<raster>
so much forkage :)
<robmur01>
there's still definitely *something* up with buffer lifetimes being slightly wonky, but only to the point of spewing errors rather than killing the kernel
<raster>
oh this kills my kernel
<raster>
a bunch of paging faults then eventually an oops
<raster>
yaaargh
<raster>
and now gstreamer is deadlocking somewhere
<raster>
i also have to patch my devicetree to work on my rockpro64
<raster>
(otherwise i cant even boot)
<tomeu>
raster: at least we have a path to reduce the forkage :)
<tomeu>
damn, I have found a test case that fails with the DDK :/
<tomeu>
I think they got their tiler hierarchy mask wrong :p
<robmur01>
OK, apparently I'm mistaken - it looks like shadow could never have run on t620... I must be misremembering testing an FPGA GPU on the same board :(
<raster>
robmur01: too many moving parts. trees, branches, patches, revisions, mesa, kernel, ddk, different gpu variants with different errata...