alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
davidlt has joined #panfrost
davidlt has quit [Remote host closed the connection]
guillaume_g has joined #panfrost
klaxa has quit [Ping timeout: 252 seconds]
yann has quit [Ping timeout: 268 seconds]
raster has joined #panfrost
<raster> mjorns
<bbrezillon> shadeslayer: do you think you can submit a v2 of the mult
<bbrezillon> *multi-ctx fixes
<bbrezillon> ?
<bbrezillon> I'm reworking the job deps logic and I'd like to apply those patches first
megi has joined #panfrost
guillaume_g has quit [*.net *.split]
adjtm_ has quit [*.net *.split]
hopetech has quit [*.net *.split]
Lyude has quit [*.net *.split]
paulk-leonov has quit [*.net *.split]
nickolas360 has quit [*.net *.split]
marvs has quit [*.net *.split]
maciejjo has quit [*.net *.split]
griffinp- has quit [*.net *.split]
urjaman has quit [*.net *.split]
bbrezillon has quit [*.net *.split]
gcl has quit [*.net *.split]
alyssa has quit [*.net *.split]
Prf_Jakob has quit [*.net *.split]
Depau has quit [*.net *.split]
anarsoul has quit [*.net *.split]
robmur01 has quit [*.net *.split]
ente has quit [*.net *.split]
WeaselSoup has quit [*.net *.split]
mani_s has quit [*.net *.split]
suihkulokki has quit [*.net *.split]
maciejjo has joined #panfrost
hopetech has joined #panfrost
bbrezillon has joined #panfrost
paulk-leonov has joined #panfrost
Depau has joined #panfrost
adjtm has joined #panfrost
anarsoul has joined #panfrost
mani_s has joined #panfrost
suihkulokki has joined #panfrost
guillaume_g has joined #panfrost
gcl has joined #panfrost
WeaselSoup has joined #panfrost
ente has joined #panfrost
Prf_Jakob has joined #panfrost
urjaman has joined #panfrost
megi has quit [Ping timeout: 258 seconds]
jolan has quit [Quit: leaving]
jolan has joined #panfrost
robmur01 has joined #panfrost
chewitt has joined #panfrost
megi has joined #panfrost
<shadeslayer> bbrezillon: Sure thing
<shadeslayer> bbrezillon: I'll send those out in a couple of hours
<shadeslayer> I had a question about kernel-side BOs: is there a possibility that we're going to be creating different kinds of BOs in the future, like VC4 does?
<shadeslayer> bbrezillon: tomeu daniels ^^
guillaume_g has quit [Quit: Konversation terminated!]
davidlt has joined #panfrost
<shadeslayer> bbrezillon: v2 of the patches out
<shadeslayer> bbrezillon: hooray for that rename though
chewitt has quit [Remote host closed the connection]
raster has quit [Remote host closed the connection]
<bbrezillon> shadeslayer: what do you mean by different kind of BOs?
<bbrezillon> shadeslayer: thx for the v2 BTW
<bbrezillon> shadeslayer: I don't know, maybe robher has an idea
<bbrezillon> I don't think the kernel driver needs to allocate BOs for its own usage
<bbrezillon> right now, the only piece of code doing that is the perfcnt stuff, and we don't need BO labeling to identify the perfcnt BO
<robher> bbrezillon, shadeslayer: we now do with heap and noexec flags.
<robher> we have no need for a kernel side cache. I looked at that in vc4 and that was my conclusion, but I don't remember why.
<bbrezillon> VC4 is doing a lot more kernel side
<bbrezillon> like handling BO allocation for binning
<shadeslayer> robher: re kernel side cache, I discussed this as well, and what we concluded was that with madvise it wasn't required since we have a userspace BO cache
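The userspace-cache-plus-madvise split shadeslayer describes can be sketched as a toy. Everything here (`toy_bo`, the `purgeable` flag, the bucket list) is invented for illustration; the real driver would mark cached BOs purgeable via the kernel madvise ioctl rather than a struct field:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model of a userspace BO cache: freed BOs go on a free list and are
 * marked purgeable (standing in for madvise(DONTNEED)), so the kernel may
 * reclaim them under memory pressure while userspace keeps them for reuse. */

struct toy_bo {
    size_t size;
    bool purgeable;        /* stand-in for the kernel-side DONTNEED state */
    struct toy_bo *next;
};

static struct toy_bo *cache_head;

static void toy_bo_free(struct toy_bo *bo)
{
    bo->purgeable = true;  /* tell the kernel it may reclaim this BO */
    bo->next = cache_head;
    cache_head = bo;
}

static struct toy_bo *toy_bo_alloc(size_t size)
{
    /* First try to reuse a cached BO of sufficient size. */
    for (struct toy_bo **p = &cache_head; *p; p = &(*p)->next) {
        if ((*p)->size >= size) {
            struct toy_bo *bo = *p;
            *p = bo->next;
            bo->purgeable = false; /* would be madvise(WILLNEED) for real */
            bo->next = NULL;
            return bo;
        }
    }
    /* Cache miss: allocate a fresh BO. */
    struct toy_bo *bo = calloc(1, sizeof(*bo));
    bo->size = size;
    return bo;
}
```

The point of the design is that no kernel-side cache is needed: userspace keeps the fast-reuse path, and the madvise state is the only thing the kernel has to know about.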
<bbrezillon> shadeslayer: do you know what BO labeling will be used for?
<shadeslayer> bbrezillon: from what I understand, it's useful for debugging purposes
<shadeslayer> and vc4 also apparently prints out stats to debugfs for how many BOs are allocated and for what etc
<bbrezillon> yes, but it depends what you want to track
<bbrezillon> if it's just about tracking BOs allocated by a specific context, you can keep this information in mesa
<bbrezillon> if you care about imported BOs, it has to be implemented kernel side
<shadeslayer> bbrezillon: I'm not sure tbh, daniels just mentioned that it'd be nice to have a couple of weeks ago and I put it on my todo after chatting with him for a bit
<bbrezillon> ok, then I fear you'll have to wait for daniels' reply :)
<shadeslayer> ack :)
<shadeslayer> bbrezillon: I don't think vc4 labels imported buffers fwiw
<bbrezillon> no, but the label is attached to the kernel-side BO object
<bbrezillon> so it should be kept when exporting/importing BO
<shadeslayer> but that would be dependent on whether the exporting driver labelled it as well?
<bbrezillon> hm, maybe not
<bbrezillon> I don't remember if there's a specific case for imported BOs that have been produced by the DRM driver
stikonas has joined #panfrost
<daniels> i don't know if we have a use for multiple types of BO, but tbh I'm not sure I can see it
<HdkR> Would a type of BO also mean a BO with different caching behaviours?
<HdkR> Since that would make more sense in Vulkan land where the client has more control over that
<shadeslayer> daniels: I think what we really want to figure out is whether we should track imported bufs
<shadeslayer> er, track labels for imported bufs
TheKit has quit [Remote host closed the connection]
<daniels> shadeslayer: probably not for now at least
<daniels> HdkR: very good point, but changing cache attributes is super difficult on arm
<HdkR> That's the point of Vulkan exposing different memory regions so you can allocate a Vulkan buffer uncached :)
<HdkR> Let the kernel interface handle the fudging of cache attributes so the app doesn't need to touch it
<HdkR> Lack of `HOST_CACHED_BIT` specifically I'm thinking about here
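The pattern HdkR is alluding to is the standard Vulkan one: the driver advertises memory types with property flags, and the client picks one. A minimal sketch of the selection logic, assuming flag macros that mirror the `VkMemoryPropertyFlagBits` values (real code would use `vulkan.h` and `vkGetPhysicalDeviceMemoryProperties`):

```c
#include <assert.h>
#include <stdint.h>

/* These mirror VkMemoryPropertyFlagBits; defined here only so the
 * sketch is self-contained. */
#define DEVICE_LOCAL  0x1
#define HOST_VISIBLE  0x2
#define HOST_COHERENT 0x4
#define HOST_CACHED   0x8

/* Scan the advertised memory types and pick the first one allowed by the
 * resource (type_bits) that has every required property flag, e.g.
 * HOST_VISIBLE | HOST_CACHED for a CPU-readback buffer. */
static int find_memory_type(const uint32_t *type_flags, uint32_t type_count,
                            uint32_t type_bits, uint32_t required)
{
    for (uint32_t i = 0; i < type_count; i++) {
        if ((type_bits & (1u << i)) &&
            (type_flags[i] & required) == required)
            return (int)i;
    }
    return -1; /* no suitable type: caller falls back to weaker flags */
}
```

If the driver never exposes a `HOST_CACHED` type, the lookup simply fails and the app has to settle for uncached memory, which is HdkR's complaint.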
adjtm has quit [Ping timeout: 244 seconds]
yann|work has joined #panfrost
TheKit has joined #panfrost
alyssa has joined #panfrost
<alyssa> So, my scheduling experiment is wrapping up
<alyssa> I'm spending some time to cherry-pick all the general changes it entailed, since it forced me to cleanup a lot of IR stuff
<alyssa> So all that stuff will be pushed shortly
<alyssa> The out-of-order scheduler itself, in this v1, probably won't see the light of day, for now
<alyssa> Far too many regressions and while I'd love to get the code in shape, well, that brings me to point 2:
<alyssa> This is my last day of summer; studies start up next week.
<HdkR> Was it educational? :)
<alyssa> HdkR: Quite! I learned a ton which will help for future Midgard scheduling work but also any other compiler I end up poking at
<HdkR> Then nothing is lost :)
<alyssa> 3 files changed, 953 insertions(+), 572 deletions(-)
<alyssa> HdkR: ^^ That's what's being lost (diff with the branch with all the cherry-picked improvements, diff with master is even larger)
<alyssa> Admittedly, 175 lines there is just prose describing the algorithm.
<alyssa> If you're wondering why it took me 175 lines to explain the algorithm.. Midgard is *complicated*
<HdkR> Important bit is the learning for when v2 comes along
<HdkR> :D
<alyssa> Something else I'm trying to cherry-pick is scheduling-before-RA
<alyssa> There's no reason to do this with the old algo but I'd like to have it working nicely and merged
<alyssa> One less thing to worry about when replacing the scheduler
<alyssa> ^ shader-db regressions doing that on master
<daniels> alyssa: single-bundle bumps seem minor enough to not worry tbf
<alyssa> daniels: Thing is, out-of-order scheduling sometimes only saves a bundle or two ;)
<daniels> heh
<alyssa> If I can get this stuff cleaned up, no shader-db changes (or just wins), no deqp regressions, and pushed
<alyssa> That eliminates a *huge* source of friction for doing out-of-order on top
<anarsoul> alyssa: err
<anarsoul> are you doing RA *before* scheduling?
<alyssa> anarsoul: On current master, yes
<alyssa> Since current master is purely in-order "scheduling"
<anarsoul> well, I guess it's possible with in-order scheduling
<alyssa> Obviously that's a non-starter for out-of-order, hence why I want to flip that.
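Why out-of-order scheduling before RA matters can be seen with a toy liveness model (nothing Midgard-specific; the op encoding is invented for illustration): reordering, e.g. hoisting independent loads above their uses, changes peak register pressure, so register allocation done before the reorder is working with stale liveness.

```c
#include <assert.h>

/* Each op defines value 'def' and reads 'src0'/'src1'; -1 = slot unused.
 * Peak pressure = max number of values defined but not yet past their
 * last use, scanning the program in issue order. */
struct op { int def, src0, src1; };

static int max_live(const struct op *ops, int n)
{
    int live = 0, peak = 0;
    for (int i = 0; i < n; i++) {
        if (ops[i].def >= 0)
            live++;                         /* dest is live at issue */
        if (live > peak)
            peak = live;
        int s[2] = { ops[i].src0, ops[i].src1 };
        if (s[1] == s[0])
            s[1] = -1;                      /* avoid retiring twice */
        for (int k = 0; k < 2; k++) {
            if (s[k] < 0)
                continue;
            int dead = 1;
            for (int j = i + 1; j < n; j++)
                if (ops[j].src0 == s[k] || ops[j].src1 == s[k])
                    dead = 0;
            if (dead)
                live--;                     /* past last use: reg freed */
        }
    }
    return peak;
}
```

With three load/use/store chains, keeping each chain together peaks at 2 live values, while hoisting all three loads to the top peaks at 4, so an allocation computed for the first order is wrong for the second.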
<anarsoul> you know, I thought it would be easier if the ppir compiler had been written with CF in mind, but it turns out that you have to keep *everything* in mind :)
<alyssa> Yeah
<anarsoul> we had out of order scheduler before CF in ppir
<anarsoul> but it wasn't CF-aware
<alyssa> That would've made the scheduler waaaay easier to write
<alyssa> and then made CF a nightmare to add
<alyssa> right?
<anarsoul> yeah
<anarsoul> I had to refactor it
<anarsoul> with some help from enunes :)
<anarsoul> and then re-add some features back, like pipelining ops
<anarsoul> it's still not perfect though
<anarsoul> it's never perfect
adjtm has joined #panfrost
<anarsoul> but I like current state of ppir :)
<alyssa> !pyt
<alyssa> Wrong window
<alyssa> Applied one of the patches from the new scheduler against the old
<anarsoul> alyssa: your "bundle" is actually a single instruction
<anarsoul> I'm not sure if it makes sense to count "instructions" which are ops
<anarsoul> I'm pretty sure it takes 1 clock to execute whole "bundle" unless there's penalty for cache miss
<alyssa> anarsoul: Correct (the last statement).
<alyssa> I like my naming better.
<alyssa> One NIR instruction -> one MIR instruction
<anarsoul> don't you find it a bit confusing?
<alyssa> shader-db counts # of MIR instructions as "instructions"
<alyssa> Multiple MIR instructions -> 1 scheduled bundle
<alyssa> A bundle of instructions
<alyssa> anarsoul: Having two different things called instruction would be confusing!
<anarsoul> alyssa: yeah, but # of MIR instructions doesn't have significant impact on performance
<anarsoul> but # of bundles has
<alyssa> anarsoul: This is true. However, it's worth reporting both on shader-db
<alyssa> `# of instructions` is a report about code generation quality
<alyssa> `# of bundles` is a report about scheduler quality
<anarsoul> OK, fair enough
<alyssa> If I make a change to core NIR, I would report only the former
<anarsoul> I still find your naming confusing though
<alyssa> If I make a change to the scheduling algorithm, ideally the former doesn't change but I report the latter
<alyssa> Likewise, I report `registers` and `threads` separately
<anarsoul> fair enough
<alyssa> (Even though threads is just a function of registers)
<anarsoul> btw, # of threads vs # of regs was pretty smart decision. On Utgard we have 128 threads with fixed number of regs
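The "threads is just a function of registers" relationship could be sketched like this; the threshold constant here is an assumption for illustration, not a documented Midgard value:

```c
#include <assert.h>

/* Assumed, illustrative threshold: shaders fitting in few enough work
 * registers can run with an extra hardware thread per core. */
#define ASSUMED_THREAD_REG_LIMIT 8

static int threads_for_registers(int work_registers)
{
    return work_registers <= ASSUMED_THREAD_REG_LIMIT ? 2 : 1;
}
```

This is why shader-db reporting `registers` and `threads` separately is still useful: a register-count change only matters for occupancy when it crosses the threshold.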
<alyssa> Hehe
<alyssa> anarsoul: FWIW, Intel does the same for their shader-db results
<alyssa> Although they're not VLIW
<alyssa> instructions vs cycles
<anarsoul> I haven't looked into what intel reports
<alyssa> Ohhh!
* alyssa understands how to fix this now
<alyssa> The issue is that the spill cost in bundles is higher than in instructions
<anarsoul> is it?
<alyssa> Kinda
<anarsoul> don't you have a slot in bundle that loads a temporary for you?
<alyssa> For this to make sense I probably need to cherry-pick my spilling stuff as well
<alyssa> It's.. complicated
<anarsoul> and then you can reference load result as a regular reg?
<alyssa> Honestly at this point I think I'm going to eat the shader-db difference, it's good enough
* anarsoul just wonders how far they went with spilling in Midgard
<anarsoul> alyssa: I can't find anything about temporaries in https://gitlab.freedesktop.org/panfrost/mali-isa-docs/blob/master/Midgard.md
<anarsoul> is it documented yet?
<alyssa> To load from TLS, you can do a load instruction and that goes to a work register of your choice
<alyssa> To store to TLS, you put the value you want to write in a special pipeline register and then do a store instruction
<anarsoul> *ugh*
<anarsoul> so you have to use a whole register to load a value?
<alyssa> Yeah
<anarsoul> terrible
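The asymmetry alyssa describes might look like this in illustrative pseudo-assembly (the mnemonics and register names are made up for the sketch, not real Midgard syntax):

```
; spill: the value must first pass through the dedicated pipeline register
mov    pr_st, r5            ; pr_st = special store-value pipeline register
st_32  [tls + 0x10], pr_st  ; store to thread-local storage

; fill: the load writes straight into a work register of our choice,
; burning a whole work register for the result
ld_32  r5, [tls + 0x10]
```

This is anarsoul's complaint: the fill path cannot target a pipeline register, so every reload occupies a full work register, which eats into the very register budget the spill was trying to relieve.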
<HdkR> alyssa: You can whip us up a quick Vulkan driver before your studies start next week right? ;)
<alyssa> HdkR: Hush you.
<HdkR> hehe
<HdkR> How else am I to run D9VK and VKD3D games on Mali? :D
<alyssa> Zink⁻¹
<HdkR> Whoa
<HdkR> How many meta layers can we go through
<alyssa> Vulkan state tracker for Gallium
<HdkR> Very aware of kusma's great work :D
<alyssa> No, no, that's Zink
<alyssa> totally different
<HdkR> ooooooh
<alyssa> Zink⁻¹ is a moonshot by amsuk on behalf of Aroballoc
<HdkR> I see the difference
<HdkR> Silly
<anarsoul> I like "Aroballoc" name. Has it been registered yet?
<urjaman> the aromantic byte allocator
<alyssa> Anyways, I guess this afternoon is "push Alyssa's random branches to master on a Friday after 10pm GMT and then not be back on Monday to bisect regressions" day
<alyssa> ;P
<HdkR> If it is on a branch that isn't master then it can't regress :D
<alyssa> "If a tree regresses in a forest but Alyssa is at uni, does it matter?"
<alyssa> ^ Overall results for the series (which is a big potpourri of minor improvements to make room for a better scheduler)
<alyssa> The only "huge" change is from reversing the order of scheduling/RA which makes spills more expensive (we would need a little post-RA scheduler to fix that)
<alyssa> compensated by rebasing my patches to make spilling way cheaper, so it's still a net win
<alyssa> --Desktop GL regression. Lovely.
<alyssa> Alright. Pushed what I could. Spilling stuff didn't make the cut.
<HdkR> :+1:
raster has joined #panfrost
* alyssa would love to take another stab at a v2 of the scheduler but
<alyssa> there's only an hour left of summer.
raster has quit [Remote host closed the connection]
<anarsoul> alyssa: so your internship is almost over?
stikonas has quit [Ping timeout: 252 seconds]
<alyssa> anarsoul: Yeah..