alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
davidlt has joined #panfrost
davidlt has quit [Remote host closed the connection]
guillaume_g has joined #panfrost
klaxa has quit [Ping timeout: 252 seconds]
yann has quit [Ping timeout: 268 seconds]
raster has joined #panfrost
<raster> mjorns
<bbrezillon> shadeslayer: do you think you can submit a v2 of the mult
<bbrezillon> *multi-ctx fixes
<bbrezillon> ?
<bbrezillon> I'm reworking the job deps logic and I'd like to apply those patches first
megi has joined #panfrost
guillaume_g has quit [*.net *.split]
adjtm_ has quit [*.net *.split]
hopetech has quit [*.net *.split]
Lyude has quit [*.net *.split]
paulk-leonov has quit [*.net *.split]
nickolas360 has quit [*.net *.split]
marvs has quit [*.net *.split]
maciejjo has quit [*.net *.split]
griffinp- has quit [*.net *.split]
urjaman has quit [*.net *.split]
bbrezillon has quit [*.net *.split]
gcl has quit [*.net *.split]
alyssa has quit [*.net *.split]
Prf_Jakob has quit [*.net *.split]
Depau has quit [*.net *.split]
anarsoul has quit [*.net *.split]
robmur01 has quit [*.net *.split]
ente has quit [*.net *.split]
WeaselSoup has quit [*.net *.split]
mani_s has quit [*.net *.split]
suihkulokki has quit [*.net *.split]
maciejjo has joined #panfrost
hopetech has joined #panfrost
bbrezillon has joined #panfrost
paulk-leonov has joined #panfrost
Depau has joined #panfrost
adjtm has joined #panfrost
anarsoul has joined #panfrost
mani_s has joined #panfrost
suihkulokki has joined #panfrost
guillaume_g has joined #panfrost
gcl has joined #panfrost
WeaselSoup has joined #panfrost
ente has joined #panfrost
Prf_Jakob has joined #panfrost
urjaman has joined #panfrost
megi has quit [Ping timeout: 258 seconds]
jolan has quit [Quit: leaving]
jolan has joined #panfrost
robmur01 has joined #panfrost
chewitt has joined #panfrost
megi has joined #panfrost
<shadeslayer> bbrezillon: Sure thing
<shadeslayer> bbrezillon: I'll send those out in a couple of hours
<shadeslayer> I had a question about kernel-side BOs: is there a possibility that we're going to be creating different kinds of BOs in the future, like VC4 does?
<shadeslayer> bbrezillon: tomeu daniels ^^
guillaume_g has quit [Quit: Konversation terminated!]
davidlt has joined #panfrost
<shadeslayer> bbrezillon: v2 of the patches out
<shadeslayer> bbrezillon: hooray for that rename though
chewitt has quit [Remote host closed the connection]
raster has quit [Remote host closed the connection]
<bbrezillon> shadeslayer: what do you mean by different kind of BOs?
<bbrezillon> shadeslayer: thx for the v2 BTW
<bbrezillon> shadeslayer: I don't know, maybe robher has an idea
<bbrezillon> I don't think the kernel driver needs to allocate BOs for its own usage
<bbrezillon> right now, the only piece of code doing that is the perfcnt stuff, and we don't need BO labeling to identify the perfcnt BO
<robher> bbrezillon, shadeslayer: we now do with heap and noexec flags.
<robher> we have no need for a kernel side cache. I looked at that in vc4 and that was my conclusion, but I don't remember why.
<bbrezillon> VC4 is doing a lot more kernel side
<bbrezillon> like handling BO allocation for binning
<shadeslayer> robher: re kernel side cache, I discussed this as well, and what we concluded was that with madvise it wasn't required since we have a userspace BO cache
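The userspace-cache-plus-madvise split shadeslayer describes can be sketched as a toy. Everything here (`toy_bo`, the `purgeable` flag, the bucket list) is invented for illustration; the real driver would mark cached BOs purgeable via the kernel madvise ioctl rather than a struct field:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model of a userspace BO cache: freed BOs go on a free list and are
 * marked purgeable (standing in for madvise(DONTNEED)), so the kernel may
 * reclaim them under memory pressure while userspace keeps them for reuse. */

struct toy_bo {
    size_t size;
    bool purgeable;        /* stand-in for the kernel-side DONTNEED state */
    struct toy_bo *next;
};

static struct toy_bo *cache_head;

static void toy_bo_free(struct toy_bo *bo)
{
    bo->purgeable = true;  /* tell the kernel it may reclaim this BO */
    bo->next = cache_head;
    cache_head = bo;
}

static struct toy_bo *toy_bo_alloc(size_t size)
{
    /* First try to reuse a cached BO of sufficient size. */
    for (struct toy_bo **p = &cache_head; *p; p = &(*p)->next) {
        if ((*p)->size >= size) {
            struct toy_bo *bo = *p;
            *p = bo->next;
            bo->purgeable = false; /* would be madvise(WILLNEED) for real */
            bo->next = NULL;
            return bo;
        }
    }
    /* Cache miss: allocate a fresh BO. */
    struct toy_bo *bo = calloc(1, sizeof(*bo));
    bo->size = size;
    return bo;
}
```

The point of the design is that no kernel-side cache is needed: userspace keeps the fast-reuse path, and the madvise state is the only thing the kernel has to know about.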
<bbrezillon> shadeslayer: do you know what BO labeling will be used for?
<shadeslayer> bbrezillon: from what I understand, it's useful for debugging purposes
<shadeslayer> and vc4 also apparently prints out stats to debugfs for how many BOs are allocated and for what etc
<bbrezillon> yes, but it depends what you want to track
<bbrezillon> if it's just about tracking BOs allocated by a specific context, you can keep this information in mesa
<bbrezillon> if you care about imported BOs, it has to be implemented kernel side
<shadeslayer> bbrezillon: I'm not sure tbh, daniels just mentioned that it'd be nice to have a couple of weeks ago and I put it on my todo after chatting with him for a bit
<bbrezillon> ok, then I fear you'll have to wait for daniels' reply :)
<shadeslayer> ack :)
<shadeslayer> bbrezillon: I don't think vc4 labels imported buffers fwiw
<bbrezillon> no, but the label is attached to the kernel-side BO object
<bbrezillon> so it should be kept when exporting/importing BO
<shadeslayer> but that would be dependent on whether the exporting driver labelled it as well?
<bbrezillon> hm, maybe not
<bbrezillon> I don't remember if there's a specific case for imported BOs that have been produced by the DRM driver
stikonas has joined #panfrost
<daniels> i don't know if we have a use for multiple types of BO, but tbh I'm not sure I can see it
<HdkR> Would a type of BO also mean a BO with different caching behaviours?
<HdkR> Since that would make more sense in Vulkan land where the client has more control over that
<shadeslayer> daniels: I think what we really want to figure out is whether we should track imported bufs
<shadeslayer> er, track labels for imported bufs
TheKit has quit [Remote host closed the connection]
<daniels> shadeslayer: probably not for now at least
<daniels> HdkR: very good point, but changing cache attributes is super difficult on arm
<HdkR> That's the point of Vulkan exposing different memory regions so you can allocate a Vulkan buffer uncached :)
<HdkR> Let the kernel interface handle the fudging of cache attributes so the app doesn't need to touch it
<HdkR> Lack of `HOST_CACHED_BIT` specifically I'm thinking about here
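The pattern HdkR is alluding to is the standard Vulkan one: the driver advertises memory types with property flags, and the client picks one. A minimal sketch of the selection logic, assuming flag macros that mirror the `VkMemoryPropertyFlagBits` values (real code would use `vulkan.h` and `vkGetPhysicalDeviceMemoryProperties`):

```c
#include <assert.h>
#include <stdint.h>

/* These mirror VkMemoryPropertyFlagBits; defined here only so the
 * sketch is self-contained. */
#define DEVICE_LOCAL  0x1
#define HOST_VISIBLE  0x2
#define HOST_COHERENT 0x4
#define HOST_CACHED   0x8

/* Scan the advertised memory types and pick the first one allowed by the
 * resource (type_bits) that has every required property flag, e.g.
 * HOST_VISIBLE | HOST_CACHED for a CPU-readback buffer. */
static int find_memory_type(const uint32_t *type_flags, uint32_t type_count,
                            uint32_t type_bits, uint32_t required)
{
    for (uint32_t i = 0; i < type_count; i++) {
        if ((type_bits & (1u << i)) &&
            (type_flags[i] & required) == required)
            return (int)i;
    }
    return -1; /* no suitable type: caller falls back to weaker flags */
}
```

If the driver never exposes a `HOST_CACHED` type, the lookup simply fails and the app has to settle for uncached memory, which is HdkR's complaint.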
adjtm has quit [Ping timeout: 244 seconds]
yann|work has joined #panfrost
TheKit has joined #panfrost
alyssa has joined #panfrost
<alyssa> So, my scheduling experiment is wrapping up
<alyssa> I'm spending some time to cherry-pick all the general changes it entailed, since it forced me to cleanup a lot of IR stuff
<alyssa> So all that stuff will be pushed shortly
<alyssa> The out-of-order scheduler itself, in this v1, probably won't see the light of day, for now
<alyssa> Far too many regressions and while I'd love to get the code in shape, well, that brings me to point 2:
<alyssa> This is my last day of summer; studies start up next week.
<HdkR> Was it educational? :)
<alyssa> HdkR: Quite! I learned a ton which will help for future Midgard scheduling work but also any other compiler I end up poking at
<HdkR> Then nothing is lost :)
<alyssa> 3 files changed, 953 insertions(+), 572 deletions(-)
<alyssa> HdkR: ^^ That's what's being lost (diff with the branch with all the cherry-picked improvements, diff with master is even larger)
<alyssa> Admittedly, 175 lines there is just prose describing the algorithm.
<alyssa> If you're wondering why it took me 175 lines to explain the algorithm.. Midgard is *complicated*
<HdkR> Important bit is the learning for when v2 comes along
<HdkR> :D
<alyssa> Something else I'm trying to cherry-pick is scheduling-before-RA
<alyssa> There's no reason to do this with the old algo but I'd like to have it working nicely and merged
<alyssa> One less thing to worry about when replacing the scheduler
<alyssa> ^ shader-db regressions doing that on master
<daniels> alyssa: single-bundle bumps seem minor enough to not worry tbf
<alyssa> daniels: Thing is, out-of-order scheduling sometimes only saves a bundle or two ;)
<daniels> heh
<alyssa> If I can get this stuff cleaned up, no shader-db changes (or just wins), no deqp regressions, and pushed
<alyssa> That eliminates a *huge* source of friction for doing out-of-order on top
<anarsoul> alyssa: err
<anarsoul> are you doing RA *before* scheduling?
<alyssa> anarsoul: On current master, yes
<alyssa> Since current master is purely in-order "scheduling"
<anarsoul> well, I guess it's possible with in-order scheduling
<alyssa> Obviously that's a non-starter for out-of-order, hence why I want to flip that.
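Why out-of-order scheduling before RA matters can be seen with a toy liveness model (nothing Midgard-specific; the op encoding is invented for illustration): reordering, e.g. hoisting independent loads above their uses, changes peak register pressure, so register allocation done before the reorder is working with stale liveness.

```c
#include <assert.h>

/* Each op defines value 'def' and reads 'src0'/'src1'; -1 = slot unused.
 * Peak pressure = max number of values defined but not yet past their
 * last use, scanning the program in issue order. */
struct op { int def, src0, src1; };

static int max_live(const struct op *ops, int n)
{
    int live = 0, peak = 0;
    for (int i = 0; i < n; i++) {
        if (ops[i].def >= 0)
            live++;                         /* dest is live at issue */
        if (live > peak)
            peak = live;
        int s[2] = { ops[i].src0, ops[i].src1 };
        if (s[1] == s[0])
            s[1] = -1;                      /* avoid retiring twice */
        for (int k = 0; k < 2; k++) {
            if (s[k] < 0)
                continue;
            int dead = 1;
            for (int j = i + 1; j < n; j++)
                if (ops[j].src0 == s[k] || ops[j].src1 == s[k])
                    dead = 0;
            if (dead)
                live--;                     /* past last use: reg freed */
        }
    }
    return peak;
}
```

With three load/use/store chains, keeping each chain together peaks at 2 live values, while hoisting all three loads to the top peaks at 4, so an allocation computed for the first order is wrong for the second.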
<anarsoul> you know, I thought it would be easier if the ppir compiler had been written with CF in mind, but it turns out that you have to keep *everything* in mind :)
<alyssa> Yeah
<anarsoul> we had out of order scheduler before CF in ppir
<anarsoul> but it wasn't CF-aware
<alyssa> That would've made the scheduler waaaay easier to write
<alyssa> and then made CF a nightmare to add
<alyssa> right?
<anarsoul> yeah
<anarsoul> I had to refactor it
<anarsoul> with some help from enunes :)
<anarsoul> and then re-add some features back, like pipelining ops
<anarsoul> it's still not perfect though
<anarsoul> it's never perfect
adjtm has joined #panfrost
<anarsoul> but I like current state of ppir :)
<alyssa> !pyt
<alyssa> Wrong window
<alyssa> Applied one of the patches from the new scheduler against the old
<anarsoul> alyssa: your "bundle" is actually a single instruction
<anarsoul> I'm not sure if it makes sense to count "instructions" which are ops
<anarsoul> I'm pretty sure it takes 1 clock to execute whole "bundle" unless there's penalty for cache miss
<alyssa> anarsoul: Correct (the last statement).
<alyssa> I like my naming better.
<alyssa> One NIR instruction -> one MIR instruction
<anarsoul> don't you find it a bit confusing?
<alyssa> shader-db counts # of MIR instructions as "instructions"
<alyssa> Multiple MIR instructions -> 1 scheduled bundle
<alyssa> A bundle of instructions
<alyssa> anarsoul: Having two different things called instruction would be confusing!
<anarsoul> alyssa: yeah, but # of MIR instructions doesn't have significant impact on performance
<anarsoul> but # of bundles has
<alyssa> anarsoul: This is true. However, it's worth reporting both on shader-db
<alyssa> `# of instructions` is a report about code generation quality
<alyssa> `# of bundles` is a report about scheduler quality
<anarsoul> OK, fair enough
<alyssa> If I make a change to core NIR, I would report only the former
<anarsoul> I still find your naming confusing though
<alyssa> If I make a change to the scheduling algorithm, ideally the former doesn't change but I report the latter
<alyssa> Likewise, I report `registers` and `threads` separately
<anarsoul> fair enough
<alyssa> (Even though threads is just a function of registers)
<anarsoul> btw, # of threads vs # of regs was pretty smart decision. On Utgard we have 128 threads with fixed number of regs
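The "threads is just a function of registers" relationship could be sketched like this; the threshold constant here is an assumption for illustration, not a documented Midgard value:

```c
#include <assert.h>

/* Assumed, illustrative threshold: shaders fitting in few enough work
 * registers can run with an extra hardware thread per core. */
#define ASSUMED_THREAD_REG_LIMIT 8

static int threads_for_registers(int work_registers)
{
    return work_registers <= ASSUMED_THREAD_REG_LIMIT ? 2 : 1;
}
```

This is why shader-db reporting `registers` and `threads` separately is still useful: a register-count change only matters for occupancy when it crosses the threshold.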
<alyssa> Hehe
<alyssa> anarsoul: FWIW, Intel does the same for their shader-db results
<alyssa> Although they're not VLIW
<alyssa> instructions vs cycles
<anarsoul> I haven't looked into what intel reports
<alyssa> Ohhh!
* alyssa understands how to fix this now
<alyssa> The issue is that the spill cost in bundles is higher than in instructions
<anarsoul> is it?
<alyssa> Kinda
<anarsoul> don't you have a slot in bundle that loads a temporary for you?
<alyssa> For this to make sense I probably need to cherry-pick my spilling stuff as well
<alyssa> It's.. complicated
<anarsoul> and then you can reference load result as a regular reg?
<alyssa> Honestly at this point I think I'm going to eat the shader-db difference, it's good enough
* anarsoul just wonders how far they went with spilling in Midgard
<anarsoul> alyssa: I can't find anything about temporaries in https://gitlab.freedesktop.org/panfrost/mali-isa-docs/blob/master/Midgard.md
<anarsoul> is it documented yet?
<alyssa> To load from TLS, you can do a load instruction and that goes to a work register of your choice
<alyssa> To store to TLS, you put the value you want to write in a special pipeline register and then do a store instruction
<anarsoul> *ugh*
<anarsoul> so you have to use a whole register to load a value?
<alyssa> Yeah
<anarsoul> terrible
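The asymmetry alyssa describes might look like this in illustrative pseudo-assembly (the mnemonics and register names are made up for the sketch, not real Midgard syntax):

```
; spill: the value must first pass through the dedicated pipeline register
mov    pr_st, r5            ; pr_st = special store-value pipeline register
st_32  [tls + 0x10], pr_st  ; store to thread-local storage

; fill: the load writes straight into a work register of our choice,
; burning a whole work register for the result
ld_32  r5, [tls + 0x10]
```

This is anarsoul's complaint: the fill path cannot target a pipeline register, so every reload occupies a full work register, which eats into the very register budget the spill was trying to relieve.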
<HdkR> alyssa: You can whip us up a quick Vulkan driver before your studies start next week right? ;)
<alyssa> HdkR: Hush you.
<HdkR> hehe
<HdkR> How else am I to run D9VK and VKD3D games on Mali? :D
<alyssa> Zink⁻¹
<HdkR> Whoa
<HdkR> How many meta layers can we go through
<alyssa> Vulkan state tracker for Gallium
<HdkR> Very aware of kusma's great work :D
<alyssa> No, no, that's Zink
<alyssa> totally different
<HdkR> ooooooh
<alyssa> Zink⁻¹ is a moonshot by amsuk on behalf of Aroballoc
<HdkR> I see the difference
<HdkR> Silly
<anarsoul> I like "Aroballoc" name. Has it been registered yet?
<urjaman> the aromantic byte allocator
<alyssa> Anyways, I guess this afternoon is "push Alyssa's random branches to master on a Friday after 10pm GMT and then not be back on Monday to bisect regressions" day
<alyssa> ;P
<HdkR> If it is on a branch that isn't master then it can't regress :D
<alyssa> "If a tree regresses in a forest but Alyssa is at uni, does it matter?"
<alyssa> ^ Overall results for the series (which is a big potpourri of minor improvements to make room for a better scheduler)
<alyssa> The only "huge" change is from reversing the order of scheduling/RA which makes spills more expensive (we would need a little post-RA scheduler to fix that)
<alyssa> compensated by rebasing my patches to make spilling way cheaper, so it's still a net win
<alyssa> --Desktop GL regression. Lovely.
<alyssa> Alright. Pushed what I could. Spilling stuff didn't make the cut.
<HdkR> :+1:
raster has joined #panfrost
* alyssa would love to take another stab at a v2 of the scheduler but
<alyssa> there's only an hour left of summer.
raster has quit [Remote host closed the connection]
<anarsoul> alyssa: so your internship is almost over?
stikonas has quit [Ping timeout: 252 seconds]
<alyssa> anarsoul: Yeah..