alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
<alyssa> robclark: nice :)
<alyssa> urjaman: woop!
adjtm has quit [Remote host closed the connection]
adjtm has joined #panfrost
<urjaman> woop woop for boring office tasks on the C201 :P
stikonas has quit [Remote host closed the connection]
<alyssa> Bifrost sure generates a lot of moves..
kaspter has joined #panfrost
vstehle has quit [Ping timeout: 256 seconds]
camus1 has joined #panfrost
kaspter has quit [Ping timeout: 258 seconds]
camus1 is now known as kaspter
<alyssa> Interesting. It doesn't look like ATEST is inherently a high latency instruction
<alyssa> which makes you wonder what it's actually doing
<alyssa> OTOH it definitely ingests alpha so...
<alyssa> Er, yes it is, but it does look like the rule that you can't have stuff after high latency isn't enforced as strictly as intuition would lead you to believe
icecream95 has joined #panfrost
<icecream95> alyssa: STK doesn't reach 100% CPU even when a lot of text is being drawn, so I don't think lowering CPU overhead will do much.
kaspter has quit [Remote host closed the connection]
kaspter has joined #panfrost
<alyssa> icecream95: gotcha.
camus1 has joined #panfrost
kaspter has quit [Ping timeout: 256 seconds]
camus1 is now known as kaspter
icecream95 has quit [Quit: leaving]
kaspter has quit [Ping timeout: 256 seconds]
kaspter has joined #panfrost
buzzmarshall has quit [Remote host closed the connection]
kaspter has quit [Ping timeout: 256 seconds]
kaspter has joined #panfrost
davidlt has joined #panfrost
QwertyChouskie has joined #panfrost
QwertyChouskie has quit [Ping timeout: 256 seconds]
_whitelogger has joined #panfrost
vstehle has joined #panfrost
kaspter has quit [Ping timeout: 256 seconds]
kaspter has joined #panfrost
Elpaulo has quit [Quit: Elpaulo]
kaspter has quit [Ping timeout: 268 seconds]
kaspter has joined #panfrost
camus1 has joined #panfrost
kaspter has quit [Ping timeout: 256 seconds]
camus1 is now known as kaspter
icecream95 has joined #panfrost
pH5 has quit [Quit: Lost terminal]
nerdboy has joined #panfrost
yann has quit [Ping timeout: 256 seconds]
camus1 has joined #panfrost
kaspter has quit [Ping timeout: 240 seconds]
camus1 is now known as kaspter
icecream95 has quit [Ping timeout: 256 seconds]
yann has joined #panfrost
gcl_ has joined #panfrost
gcl has quit [Ping timeout: 256 seconds]
thefloweringash has quit [Quit: killed]
EmilKarlson has quit [Quit: killed]
nhp[m] has quit [Quit: killed]
clementp[m] has quit [Quit: killed]
Ke has quit [Quit: killed]
davidlt has quit [Ping timeout: 260 seconds]
<daniels> tomeu: xfb fixup looks good, only one unrelated flake https://mesa-ci.01.org/daniels/builds/37/group/63a9f0ea7bb98050796b649e85481845
<daniels> tomeu: thanks for catching that!
<tomeu> daniels: nice, thanks for testing!
<daniels> np :)
<daniels> you should be able to push to lfrb's branch from !2433 and take it up
<daniels> and then just merge
thefloweringash has joined #panfrost
clementp[m] has joined #panfrost
EmilKarlson has joined #panfrost
nhp[m] has joined #panfrost
paulk-leonov has quit [Ping timeout: 272 seconds]
paulk-leonov has joined #panfrost
<alyssa> +1:
<tomeu> I cannot, have already asked lfrb to open the MR to others
<daniels> tomeu: done
<alyssa> 10 days until Valhall is released to consumers...
<daniels> alyssa: which SoC/product?
<alyssa> daniels: Samsung Galaxy 20 or something. US version is Snapdragon since 'merica.
<alyssa> Europe version is Exynos 990 which has Mali-G77
<alyssa> (s/Europe/!US/)
<daniels> so, totally locked down then
<alyssa> Alas.
<alyssa> Not sure when the first openish valhall product will come along
<daniels> no open (i.e. non-Galaxy) Exynos devices since 2014's 542x
<alyssa> Eep, okay.
<tomeu> daniels: thanks!
<daniels> and looks like the only other announced option is a phone-only MediaTek SoC :(
<tomeu> I guess it will take 1-2 years to get valhall into a sbc or a chromebook
<tomeu> but maybe before then, we may see a low-power valhall for STBs?
<tomeu> with amlogic I guess
<tomeu> or realtek
<tomeu> but this one won't have good mainline support for a very long time, probably
<alyssa> tomeu: Keep in mind Bifrost is 3-4 years old now and still hasn't shown up in a released chromebook
<tomeu> I thought there were already chromebooks out there with mt8183
<tomeu> but yeah, it tends to take quite some time before stuff for new phones reaches something mainlined
<robmur01> aren't the MTK chromebooks using the SoCs with PowerVR? I know the first one was
<tomeu> yes, the ones with mt8173 have PVR
* alyssa blinks
<alyssa> why is fround() a different opcode from froundeven/trunc/floor/ceil
<alyssa> are those different
* alyssa checks
<alyssa> Oh, right, they are
<robmur01> different rounding modes maybe? IIRC on the CPU there are some funky NEON instructions that go that way
<robmur01> (by which I mean do a specific thing regardless of what mode FPCR has set)
<alyssa> round(3.5) = 4.0, round(4.5) = 5.0
<alyssa> roundEven(3.5) = roundEven(4.5) = 4.0
<alyssa> trunc(3.5) = 3.0 = floor(3.5)
<alyssa> ceil(3.5) = 4.0
<robmur01> and now with negative inputs ;)
<alyssa> Alas.
<robmur01> derp, I've even commented on the patches pertaining to Bifrost for MT8183 for an apparent Chromebook. How did I manage to forget that?
* robmur01 retreats in shame
<tomeu> hehe
<tomeu> apparently the igt tests passed there :p
<alyssa> \o/
klaxa has quit [Killed (Sigyn (Spam is off topic on freenode.))]
raster has joined #panfrost
davidlt has joined #panfrost
<alyssa> IR design continues.
<alyssa> Getting close to something reasonable, I think
<alyssa> Main question now is how to handle the vector corners of the ISA
<alyssa> One corner is the support for i8/i16/f16, which is vec4/2/2
<alyssa> I guess better make swizzles first class depending on CLASS.
<alyssa> er caps
xantoz has quit [Ping timeout: 260 seconds]
<alyssa> Then the other issue is the vectorized I/O
<alyssa> For stores writing 4 registers, the two options are essentially..
<alyssa> 1) Allow four destinations, and then force them to be paired at RA time
<alyssa> er, loads
<alyssa> 2) Or allow one destination but taking up four registers.
<alyssa> (at which point, the IR becomes essentially vector again. which isn't the end of the world.)
<alyssa> LCRA can of course handle #2 easily, just not sure if it has negative effects across the rest of the ISA.
<alyssa> Likewise for stores reading 4 registers...
<alyssa> 1) Allow four sources (we already do), and force pairing at RA time -- maybe not so terrible, but this would also require a lowering pass before sched/RA to ensure that actually works
<alyssa> 2) Allow vector sources and implement nir_vec4, probably with the same lower-before-sched then pair-after-RA dance.
<alyssa> Oh, and then 64-bit ops need aligned register pairs, so there's that
<alyssa> I guess I need to just accept my fate that I can't escape vector-aware RA
<alyssa> (if I want to have optimal codegen, anyway)
<HdkR> vector GPR classes for the vec4 stuff is pretty classic. Where you insert an "extract" on scalar use which is a no-op
<alyssa> Meep.
* alyssa contemplates
<alyssa> Okay, that seems reasonable.
<HdkR> basically the same for SWAR based class where you have packed 16bit, where extract may be a no-op if the instruction can consume it directly, or it can be an expand otherwise
<alyssa> So, maintain a table of vec4_regs[components] <--> scalar indices
<alyssa> and let the RA sort out the details?
<alyssa> (for vec4 loads)
<HdkR> Yea, you have the class interference in the RA
<alyssa> (and actually for stores too, with an appropriate reduction)
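[editor's note] The "vec4 class plus no-op extract" idea could be sketched roughly as follows. This is neither Mesa's register allocator API nor LCRA. Every name here (`reg_range`, `vec4_range`, `vec4_component_to_scalar`, `ranges_interfere`, `is_aligned_pair`) is hypothetical, and the layout, where vec4 register v covers scalar registers 4v..4v+3, is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* A value occupies `count` consecutive scalar registers starting at `base`. */
struct reg_range {
    unsigned base;
    unsigned count;
};

/* Assumed layout: vec4 register v aliases scalar registers 4v .. 4v+3. */
static struct reg_range vec4_range(unsigned v)
{
    struct reg_range r = { v * 4, 4 };
    return r;
}

/* "Extract" of component c is a no-op under this layout: it only names the
 * scalar register that already holds the component. */
static unsigned vec4_component_to_scalar(unsigned v, unsigned c)
{
    assert(c < 4);
    return v * 4 + c;
}

/* Partial interference: two values conflict if their register ranges overlap. */
static bool ranges_interfere(struct reg_range a, struct reg_range b)
{
    return a.base < b.base + b.count && b.base < a.base + a.count;
}

/* 64-bit operations need an aligned pair, i.e. an even base register. */
static bool is_aligned_pair(unsigned base)
{
    return (base & 1u) == 0;
}
```

Under these assumptions a scalar living in register 6 interferes with vec4 register 1 (registers 4..7), while a scalar in register 8 does not.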
<alyssa> I mean, I was hoping to use LCRA since NIH but yes.
<HdkR> Which is why the mesa RA supports that :P
<alyssa> You're saying I should use the normal RA system
* alyssa gasps
<HdkR> oh no not that, I'm saying that mesa RA supports it
* alyssa admits LCRA's selling points apply specifically to Midgard
<alyssa> Without first class support for swizzles/masks it doesn't help much
<HdkR> Actually for packed registers it works pretty well since reusing part of the register is hard/impossible
<HdkR> for vec4 it gets a bit tricky to partial kill
<alyssa> Mm
<HdkR> Works pretty well for vec4 if you managed to "extract" the components early and don't need them to stay as vec4
<HdkR> but then you can get into an extract + scalar ops + recombine dance which may or may not always shake out correctly
<alyssa> So, literally maintain vec_to_scalar/scalar_to_vec tables?
<alyssa> (Is that better or worse than synthetic extract instructions?)
<HdkR> eh, it's just register class interference tables that understand partial interferences :P
<HdkR> LLVM already does this, just use that </s>
<alyssa> Shh :p
<alyssa> Wait, uh, maybe extract/combine instructions would make more sense after all.
<HdkR> They turn into no-ops or moves in most cases
<HdkR> moves when you need some explicit vector ordering, no-ops otherwise
<alyssa> Can we know if they'll disappear during RA before actually running RA?
<alyssa> (In particular, can we know before scheduling?)
<alyssa> Seems like it should be possible so we can lower before sched/RA.
<HdkR> I think so?
<alyssa> :D
<alyssa> Oh, and then we have that adorable preload mechanism
<HdkR> Which is pretty cute yea
<alyssa> Oh joy, texture descriptors seem to have changed dramatically on bifrost
<HdkR> Of course
<HdkR> Got to do the Nvidia approach and change it every generation
<alyssa> :d
<cwabbott> alyssa: I actually kinda remember running into that
<cwabbott> what I remember is that they replaced the array of pointers with an array of fixed-size structures with pointers to the miplevels if necessary
<cwabbott> so they only need to chase an extra pointer if there are actually a lot of miplevels
<cwabbott> makes sense, actually
<milkii> 3 times in two hours! maybe time to turn off the "bifrost" notification for this channel.
<milkii> btw, thanks to all for your works :)
raster has quit [Quit: Gettin' stinky!]
<alyssa> cwabbott: oh, hi there! :)
<alyssa> Yeah, that's what I would assume... not sure why they didn't on midgard anyway.
<alyssa> thank you :)
<alyssa> milkii: :)
<alyssa> My original plan was to make this my Secret Project(TM) for a while and then do a big splash when we got the first renders going
<alyssa> Of course you can't have secret projects when you do all development upstream-first ;)
<alyssa> So now that the cat is out of the bag, and the bag didn't exist to begin with
<alyssa> I'm still doing Midgard work and of course class so hours are limited but I've been pressing forward on Bifrost (initially focusing on G52) and we'll see what happens
<alyssa> Estimating about a month until anything interesting happens but we'll see. Lots and lots of plumbing.
<alyssa> The Midgard driver was developed very haphazardly and there was a lot of code that had to get rewritten sometimes multiple times due to initial oversights
<alyssa> so I'm trying to use Bifrost as an opportunity to fix some of those big mistakes (type sizing issues are the #1 here, control flow is also high)
klaxa has joined #panfrost
raster has joined #panfrost
cphealy has quit [Remote host closed the connection]
davidlt has quit [Ping timeout: 258 seconds]
<MastaG> Hi alyssa I haven't been following the IRC chat (at all) but for midgard do you also develop on the odroid xu3/4 boards?
<urjaman> Those are a T620 ... the answer to that is afaik nope (not yet at least)
<robmur01> a fair amount of stuff does actually work on T620, but more by chance than intent :)
<tomeu> guess it could be good to have those in CI, so at least we know when they break (even if probably nobody will be able to fix them)
tlwoerner is now known as tw-eh
tw-eh is now known as tlwoerner
davidlt has joined #panfrost
davidlt has quit [Quit: Leaving]
davidlt has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
raster has joined #panfrost
<alyssa> MastaG: ^^
<alyssa> Reportedly some things work and some things don't, YMMV. If folks are interested in developing for it, I'm happy to guide/mentor/code review/etc. That said at this time I'm not able to develop for it myself.
gcl_ has quit [Ping timeout: 260 seconds]
gcl has joined #panfrost
yann has quit [Ping timeout: 258 seconds]
Stary has quit [Quit: ZNC - http://znc.in]
Stary has joined #panfrost
MastaG has quit [Quit: The Lounge - https://thelounge.chat]
pH5 has joined #panfrost
<alyssa> oh gosh it looks like bifrost has indirect jumps?
<anarsoul> midgard doesn't have them?
<bbrezillon> alyssa: before I spend time debugging it, can you have a look at https://gitlab.freedesktop.org/bbrezillon/mesa/-/blob/panfrost-cmdstream-rework/src/gallium/drivers/panfrost/pan_context.c and let me know if I'm going in the right direction with the cmdstream rework
<alyssa> :eyes:
<alyssa> anarsoul: not to my knowledge
<krh> alyssa: shaders with virtual functions!
<anarsoul> yay!
<alyssa> krh: bu-- bu--
<bbrezillon> alyssa: descs are still prepared/emitted at draw time, but we might be able to do the preparation at CSO bind time for some of them
MastaG has joined #panfrost
<alyssa> bbrezillon: It looks like you're on the right track, yeah :)
<alyssa> Ideally they could all be done at bind time
<alyssa> Even better at create time, of course
<alyssa> (Descriptors which depend on multiple CSOs would have to be fixed up at bind times for any of them. Other gallium drivers deal with this of course.)
<alyssa> It's not that we need to shy away from the native structures, per se -- if mali_shader_meta is the right structure, it's the right structure, no inherent need to duplicate that
<alyssa> Rather, we want to make sure that we don't have state flying around and that we can precompute everything so draw time is just shuffling pointers. Forcing a structure to be generated all "at once" and then uploading immediately and throwing out the CPU side pointer is a way to do that (see textures).
<alyssa> Breaking up emit_for_draw as you've done there is probably a good intermediate step.
<alyssa> So yeah, I think you're on the right track :)
<alyssa> Good stuff :)
yann has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
<bbrezillon> alyssa: pre-populating a desc template at create time, adjusting it at bind time and further adjusting it at draw/emit time before the upload-to-transient takes place should be doable
<bbrezillon> not sure we want to upload things to an actual GPU buf earlier than draw time though
nerdboy has quit [Ping timeout: 256 seconds]
<alyssa> bbrezillon: I guess it depends on the descriptor in question
<alyssa> For things like textures you really can do it all at create time, no adjustments needed
<alyssa> Fragment shader descriptors... less so :)
stikonas has joined #panfrost
<bbrezillon> alyssa: you mean the tex trampolines and tex sampler descs?
<bbrezillon> I guess we could, but that means reserving 4K every time for something that's likely to be much smaller
<alyssa> Fair
<bbrezillon> or having desc pools, but using desc pools means tracking job states to get those descs back
<bbrezillon> not so easy
<alyssa> Mm, why not track just the last one in the batch
raster has joined #panfrost
<alyssa> So if two consecutive draws use the same textures for instance it'll reuse the same slot of transient memory
<bbrezillon> you mean having a desc cache
<alyssa> Not even that fancy
<alyssa> Just whatever the *last* bound one is
<alyssa> Since then there's nothing to hash/cache/diff/etc, you just check if it's there or not
<bbrezillon> well, you'd need to have that cache in pure CPU mem (AKA mapped cacheable)
<alyssa> (And it gets changed out at bind time, and you upload at bind time. And also upload on draw time)
<alyssa> Again, no cache needed
<alyssa> struct panfrost_batch { mali_ptr bound_texture_trampolines; mali_ptr bound_samplers; }
<bbrezillon> ok, got it, just track if something changed
<alyssa> Yeah :)
<bbrezillon> but the state is attached to the ctx, not the batch
<alyssa> --Right, yes, that was the wrinkle
<alyssa> Reset all that to zero on set_framebuffer_state then
<bbrezillon> if a new FBO is bound in between, we need to invalidate things
<alyssa> ?
<bbrezillon> still doable
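[editor's note] The "just remember the last bound one" scheme discussed above might look roughly like this. It is a sketch under assumptions, not Panfrost code: `gpu_ptr` stands in for `mali_ptr`, and `sketch_ctx`, `sketch_upload`, `sketch_bind_textures`, `sketch_emit_textures` and `sketch_set_framebuffer` are hypothetical names. As a simplification it uploads lazily at draw time only, whereas the discussion also mentions uploading at bind time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gpu_ptr; /* stand-in for mali_ptr; 0 means "nothing uploaded" */

struct sketch_ctx {
    gpu_ptr last_textures; /* last uploaded texture descriptors, or 0 */
    gpu_ptr last_samplers; /* last uploaded sampler descriptors, or 0 */
    unsigned uploads;      /* number of transient uploads, for illustration */
};

/* Pretend to copy descriptors into transient GPU memory; returns a fake,
 * always non-zero address. */
static gpu_ptr sketch_upload(struct sketch_ctx *ctx)
{
    assert(ctx != NULL);
    return 0x1000u + 64u * (gpu_ptr)ctx->uploads++;
}

/* Binding different textures invalidates the remembered upload. */
static void sketch_bind_textures(struct sketch_ctx *ctx)
{
    ctx->last_textures = 0;
}

/* A new framebuffer starts a new batch, so forget everything. */
static void sketch_set_framebuffer(struct sketch_ctx *ctx)
{
    ctx->last_textures = 0;
    ctx->last_samplers = 0;
}

/* Draw time: no hashing or diffing, just check whether a slot is there. */
static gpu_ptr sketch_emit_textures(struct sketch_ctx *ctx)
{
    if (!ctx->last_textures)
        ctx->last_textures = sketch_upload(ctx);
    return ctx->last_textures;
}
```

Two consecutive draws with the same textures share one upload; a rebind or a framebuffer change forces a fresh one on the next draw.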
<alyssa> Gallium and well-optimized apps are predicated on avoiding state changes, so you should get an okay hit rate I suspect.
<alyssa> s/predicated/designed/
<bbrezillon> yep, we could try that
<bbrezillon> I'll finish the split first
<alyssa> Sounds good :-)
<bbrezillon> do you have CPU bound workloads where draw_vbo() is the culprit?
<alyssa> I don't think so, the performance side is low-prio
<bbrezillon> well, not necessarily CPU bound, as long as draw_vbo() is high in the perf report
<bbrezillon> okay
raster has quit [Quit: Gettin' stinky!]
<alyssa> It certainly has come up in profiles, so it's not cold, but it's not the hottest thing ever?
raster has joined #panfrost
<anarsoul> hm
buzzmarshall has joined #panfrost
davidlt has quit [Ping timeout: 268 seconds]
raster has quit [Quit: Gettin' stinky!]
mias has joined #panfrost
mias has quit [Ping timeout: 256 seconds]
nerdboy has joined #panfrost
<anarsoul> alyssa: mind if I move your index cache into shared part?
QwertyChouskie has joined #panfrost
QwertyChouskie has quit [Ping timeout: 256 seconds]