alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - Discord Discard
<HdkR> Lyude: How much do you know about GPU hotplugging? :P
<alyssa> That's a thing? O_O
<HdkR> Of course. eGPU is a significant use case now
<urjaman> yes
<HdkR> bnieuwen1uizen: Speaking of which, how does mesa + radeon handle this? :P
<alyssa> that's a little unsettling
<HdkR> Fun fact, ARM devices could use Thunderbolt + eGPU to have the same issue
<alyssa> That's it done with graphics
<HdkR> Just need 4x PCIe lanes + buying a chip more expensive than the SoC it is attached to
<bnieuwen1uizen> HdkR: handle what?
<HdkR> bnieuwen1uizen: Surprise hotplug of GPUs
<HdkR> I have literally zero idea how X/Wayland reacts, but as long as the driver is sane then...eh?
<bnieuwen1uizen> well, the kernel driver exposes them, next time an app lists all devices it will show up
<urjaman> based on very little info (some comments on an lwn post i read a while ago) my guess is: not very well :P
<bnieuwen1uizen> as far as how it handles unplugs, no clue
<HdkR> https://twitter.com/whitequark/status/1056465535477710856 Someone linked me to this this morning which is why I'm curious about it
<bnieuwen1uizen> also no comment on whether the kernel driver is buggy for this case :P
<HdkR> :D
<bnieuwen1uizen> I think if you'll just fail any ioctl we eventually get the message and will return a DEVICE_LOST error*
<bnieuwen1uizen> *once the kernel driver is bug-free enough for us to get motivation to write the handling :P
<HdkR> Let's rig up a CI machine that just pulls out and plugs in thunderbolt cables for maximum silliness
<bnieuwen1uizen> which is actually around now, since gpu reset has just been declared stable enough to enable it for some
<HdkR> Neat
<bnieuwen1uizen> of course you'll still lose all VRAM so it is not quite transparent, but not unlike a hotplug in that regard?
<HdkR> Yea, fairly similar
<HdkR> Just that it can never recover
<bnieuwen1uizen> device lost says nothing about device being available again :P
<HdkR> aye
<bnieuwen1uizen> hmm, now that I think about it, is the app required to relist the physical devices?
* bnieuwen1uizen checks
<HdkR> Tell that to nvidia-uvm and... NVRM? I think is the one complaining in that twitter post
<bnieuwen1uizen> wow, "In some cases, the physical device may also be lost, and attempting to create a new logical device will fail, returning VK_ERROR_DEVICE_LOST."
<bnieuwen1uizen> How I love the vulkan spec thinking about corner cases
<HdkR> Nice
<alyssa> Vulkan is too pure for this world
<bnieuwen1uizen> wha why?
<alyssa> bnieuwen1uizen: How I love the vulkan spec thinking about corner cases
<alyssa> Humans aren't ready for this
<bnieuwen1uizen> hmm, I should have phrased that as how I love it when the vulkan spec thinks of a corner case.
<bnieuwen1uizen> There are lots where it does not :P
<alyssa> :P
<alyssa> Can I just
<alyssa> OpenGL's blending is _madness_
<alyssa> just like
<HdkR> AMD + Khronos made a good spec, then Nvidia comes along and sticks a bunch of GL garbage in it. Nobody wants subroutines :P
<bnieuwen1uizen> well, vulkan has no subroutines?
<alyssa> Isn't "opaque" and "transparent" enough for all sane applications and then just offer a programmable fallback or something? Meh
<HdkR> bnieuwen1uizen: NVX_raytracing adds them
<HdkR> :D
<bnieuwen1uizen> ... why?
<bnieuwen1uizen> also it is NV only and experimental only?
<HdkR> Good question
<bnieuwen1uizen> then again, Khronos does not want KHX anymore because it is confusing, time for NV to let go of NVX?
<HdkR> The NVX extension will quickly die and it'll convert to NV_
<alyssa> seriously uh
<HdkR> If it ever wants to be core Vulkan then something will need to change with that callable bit :)
<alyssa> what's the use case of literally anything but opaque and alpha blending?
<bnieuwen1uizen> alyssa: what other kinds of blending are you talking about?
<bnieuwen1uizen> also if you want lots of weirdness where you see no point of an app ever using it, try logical operations
<HdkR> Oh hey, Dolphin uses those
<HdkR> ;)
<HdkR> and games mix logic ops with blending, so wtf
<bnieuwen1uizen> HdkR: how did I know radv did not implement them? :P
<HdkR> :D
<alyssa> bnieuwen1uizen: Literally any other argument to glBlend*
<bnieuwen1uizen> but logical ops can do stuff like a andnot or a nand on the color output
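For readers unfamiliar with logic ops, here is a minimal sketch of what they compute on one colour channel. It assumes 8-bit channels and follows the GL definitions of `GL_AND_INVERTED` (`~s & d`) and `GL_NAND` (`~(s & d)`); the enum and function names are made up for illustration and do not come from any driver.

```c
#include <stdint.h>

/* Hypothetical names; the operations mirror a few of the GL logic ops. */
enum logic_op {
    LOGIC_COPY,         /* s        */
    LOGIC_AND_INVERTED, /* ~s & d   */
    LOGIC_NAND,         /* ~(s & d) */
    LOGIC_XOR,          /* s ^ d    */
};

/* Combine one 8-bit source channel (s) with the destination channel (d)
 * bitwise, instead of arithmetically as blending would. */
static uint8_t apply_logic_op(enum logic_op op, uint8_t s, uint8_t d)
{
    switch (op) {
    case LOGIC_AND_INVERTED: return (uint8_t) (~s & d);
    case LOGIC_NAND:         return (uint8_t) ~(s & d);
    case LOGIC_XOR:          return (uint8_t) (s ^ d);
    case LOGIC_COPY:
    default:                 return s;
    }
}
```

For instance, `apply_logic_op(LOGIC_AND_INVERTED, 0xF0, 0xFF)` clears the bits that are set in the source and leaves `0x0F`.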
<alyssa> CONSTANT_COLOR/ALPHA, weird tricks with e.g. source factor = destination color, etc
<HdkR> Like GL_ONE, GL_ONE, etc?
<HdkR> ..
<alyssa> mm
<HdkR> GL_ZERO, GL_ONE, CONSTANT, yea
<bnieuwen1uizen> probably for strange stuff like approximations of order independent blending?
<bnieuwen1uizen> subtract might be useful for fog
<HdkR> You can also do wacky things like generating a mask
<HdkR> I've seen this happen for sprite generation with a mask
<HdkR> Did some magic with intersection testing with the mask they generated I think
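The blend modes being discussed all fit one fixed-function formula: scale the source and destination by a factor each, then combine them with an equation. Below is a rough per-channel sketch, assuming a normalized target where the result is clamped to [0, 1]. The names are hypothetical, and only a handful of the GL factors are modelled.

```c
/* Hypothetical subset of the GL blend factors and equations. */
enum blend_factor {
    FACTOR_ZERO,
    FACTOR_ONE,
    FACTOR_SRC_ALPHA,
    FACTOR_ONE_MINUS_SRC_ALPHA,
    FACTOR_DST_COLOR,
    FACTOR_CONSTANT_COLOR,
};

enum blend_func { FUNC_ADD, FUNC_SUBTRACT, FUNC_REVERSE_SUBTRACT };

/* Resolve a factor to the scalar weight it applies to one channel. */
static float factor_value(enum blend_factor f, float dst,
                          float src_alpha, float constant)
{
    switch (f) {
    case FACTOR_ONE:                 return 1.0f;
    case FACTOR_SRC_ALPHA:           return src_alpha;
    case FACTOR_ONE_MINUS_SRC_ALPHA: return 1.0f - src_alpha;
    case FACTOR_DST_COLOR:           return dst;
    case FACTOR_CONSTANT_COLOR:      return constant;
    case FACTOR_ZERO:
    default:                         return 0.0f;
    }
}

/* Blend one channel: weight source and destination, combine, then clamp. */
static float blend_channel(enum blend_func func,
                           enum blend_factor sf, enum blend_factor df,
                           float src, float dst,
                           float src_alpha, float constant)
{
    float s = src * factor_value(sf, dst, src_alpha, constant);
    float d = dst * factor_value(df, dst, src_alpha, constant);
    float r;

    switch (func) {
    case FUNC_SUBTRACT:         r = s - d; break;
    case FUNC_REVERSE_SUBTRACT: r = d - s; break;
    case FUNC_ADD:
    default:                    r = s + d; break;
    }

    if (r < 0.0f) r = 0.0f;
    if (r > 1.0f) r = 1.0f;
    return r;
}
```

"Opaque" is `FUNC_ADD` with (`FACTOR_ONE`, `FACTOR_ZERO`), classic alpha blending is (`FACTOR_SRC_ALPHA`, `FACTOR_ONE_MINUS_SRC_ALPHA`), and the "source factor = destination color" trick mentioned above is (`FACTOR_DST_COLOR`, `FACTOR_ZERO`), which multiplies the two colours.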
<bnieuwen1uizen> funny thing: dota2 uses a stencil mask to only render parts that are not obscured by the UI
<bnieuwen1uizen> (not blending but still)
<HdkR> Neat
* bnieuwen1uizen is still annoyed that nobody uses VK_EXT_discard_rectangles for this (or the GL equivalent)
<HdkR> Got to get as much performance as possible so dota can run on the slowest of hardware
<bnieuwen1uizen> HdkR: or for VR
<bnieuwen1uizen> though getting 90 fps on a threadripper is pretty hard
<HdkR> oof
<bnieuwen1uizen> seriously, be prepared to get like -20% perf due to too many threads
<HdkR> anything that communicates cross-CCX is going to hurt
<alyssa> Wee
<alyssa> Never understood overlay till now so that's cool
<HdkR> Don't you love having photoshop blend modes implemented in hardware? :D
<alyssa> I mean
<alyssa> I'd rather just have blend shaders but
<HdkR> I think it is funnier to have them implemented in hardware
<alyssa> Then again Midgard does just use blend shaders
<alyssa> but pretends it's hardware
<HdkR> When you don't have any form of programmable blending :)
<alyssa> and then when your performance nosedives we just shrug
<HdkR> hehe
<bnieuwen1uizen> alyssa: your blend shader time will come probably, with GL_KHR_blend_equation_advanced
<alyssa> Seriously would it be so terrible to at least document which ones are accelerated and which ones are sw? :P
<alyssa> bnieuwen1uizen: Yeah that's all shaders
<alyssa> but also like
<bnieuwen1uizen> (needed for the GLES AEP)
<alyssa> some pure ES 2.0 blend modes are shader
<alyssa> ....ES 3.2 spec is 600 pages
<alyssa> Ugh
<HdkR> It's big
<bnieuwen1uizen> well, unless you are ROP bound, I'd expect a blend shader to be not too terrible?
<HdkR> It's like GL 4.x with a bunch of dumb removed
<bnieuwen1uizen> (for comparison, vulkan is like 1767 already)
<bnieuwen1uizen> you can ignore all the extensions though
<alyssa> Oh dear
<alyssa> 3.0 is only 350 pages. That seems a lot more manageable
<alyssa> (2.0 is 200 pages and we essentially have the big stuff there down)
<bnieuwen1uizen> the real question is why care about the GL spec if you're doing gallium ;)
<alyssa> bnieuwen1uizen: I mean
<alyssa> all of the tests I'm using are GL
<alyssa> all the apps I care about are GL
<alyssa> and the hardware is rated for a GL version level, not a Gallium one
<HdkR> Guess there is something to be said for knowing the enemy from the spec to know why you're implementing something in gallium
<HdkR> Hard to know what an SSBO is if you've never once read the GL spec :)
<HdkR> Or done nothing with GL
<bnieuwen1uizen> right, but a quick introduction is very different from parsing standardese
<alyssa> bnieuwen1uizen: also, it's invaluable for RE
<alyssa> at least for my workflow
<alyssa> (Something I imagine you don't have to deal with for your hw?)
<bnieuwen1uizen> less, but you'd be surprised
<alyssa> rip
<bnieuwen1uizen> sometimes you have interesting register fields with very cryptic names: CB_HW_CONTROL_3__DISABLE_ROP3_FIXES_OF_BUG_511967
<alyssa> ah
<bnieuwen1uizen> and you have large swathes of registers/commands for which the existence is not documented, and for most we only know the name but not what they do
<bnieuwen1uizen> most that we don't already use*
<alyssa> Nice.
<alyssa> surely somebody has the Verilog (or whatever)? :P
<HdkR> That's when you have to shoot off an email and hope someone answers your question about the register :D
<bnieuwen1uizen> well, we have a partial leaked register documentation for an older generation chip that the linux on PS4 developers found
<HdkR> Which is funny
<bnieuwen1uizen> and the rest is just RE, get a hint from the name and see how it works
<bnieuwen1uizen> the funniest changes are IMO when Mareko comes with some magic number changes for some situation, claiming it is faster, and routinely I can't find/create a bench that shows it is faster
<alyssa> bnieuwen1uizen: (don't you work for the same company that makes the chip?)
<bnieuwen1uizen> alyssa: what gave you that impression?
<alyssa> Not sure. Too many people to keep track of
<alyssa> er wait
* bnieuwen1uizen sometimes doubts AMD has competent documentation themselves
<alyssa> You're Valve working on AMD?
<bnieuwen1uizen> Half hobby, half 20% project at Google
<alyssa> ...huh, ok
<alyssa> I can barely keep track of the ARM GPU space as it is
<bnieuwen1uizen> and yes, AMD HW
<HdkR> hehe
<bnieuwen1uizen> so, what if somebody runs AMD GPUs on an ARM host? Isn't it included in your GPU space already? ;)
<alyssa> By ARM GPUs, I meant GPUs produced by ARM ;)
<bnieuwen1uizen> ah
<alyssa> (Actually I included Adreno, VideoCore, and Vivante but yeah)
<bnieuwen1uizen> well, that is a pretty wide group :P
<alyssa> ...and I can barely keep track :)
<bnieuwen1uizen> then again, even most people who do keep track tend to have the wrong impression about my affiliation :P
<bnieuwen1uizen> always interesting at XDC telling half the people you don't work for AMD
<HdkR> haha
<HdkR> I just like poking you about random AMD hardware quirks :)
<bnieuwen1uizen> hey I do work on AMD HW, that is totally valid B)
<alyssa> bnieuwen1uizen: I mean, it's confusing since AMD does support foss drivers
<alyssa> With me there's no ambiguity since nobody is funding free Mali :p
<bnieuwen1uizen> alyssa: have you heard about this situation with two AMD open-source Vulkan drivers?
* bnieuwen1uizen is working on the one that is not supported by AMD
<HdkR> <3 that situation
<bnieuwen1uizen> (which shares the most code with the GL driver, that is supported by AMD, to make things more confusing)
<bnieuwen1uizen> HdkR: the big question is: how are we ever to get out of this situation in a reasonable way? :P
<alyssa> bnieuwen1uizen: I have and don't understand it
<HdkR> I think the reasonable way is that the other one dies
<alyssa> how can there be competition if they're uh
<alyssa> both foss
<bnieuwen1uizen> alyssa: competition between which development team gets funding?
<bnieuwen1uizen> I mean at the end of the day it is a question of whether we can do two drivers mediocre in their own way or one great driver
<alyssa> Code sharing tho?
<alyssa> or is it too different
<bnieuwen1uizen> with the same developer resources
<bnieuwen1uizen> some of it, but not all of it right now. It is complicated
<bnieuwen1uizen> HdkR: the AMD driver dying is a long way off, like I'd expect this to simmer for 5 years or so unless the radv side gives up ...
* bnieuwen1uizen gets the feeling he is ranting too much about AMD in a Mali channel
<HdkR> haha
<alyssa> levenstein distance of 2 is ARM, so
<alyssa> *levenshtein
<HdkR> `class SleepyLatentWorker(SleepyBaseWorker):`
<HdkR> Oh such a sleepy baby
<alyssa> Nini
<alyssa> The good news is that the cmdstream side of blend shaders is reasonable
<alyssa> And there's not a lot of ABI stuff to worry about, I don't think
<alyssa> i.e. I can compile a blend shader with the blob and use that for starting out, independent of being able to generate them from the compiler
<alyssa> ("Alyssa, have you lost your mind?" "I think I left it in Galicia")
<anarsoul> alyssa: btw, I just tested latest lima and weston works here :P
<alyssa> anarsoul: hooray! :D
<alyssa> (what's the :P for?)
<anarsoul> for nothing
<anarsoul> ignore it
<anarsoul> :)
<urjaman> i've noticed i have a similar issue lol
<alyssa> :P
<alyssa> anarsoul: for my vain interest
<alyssa> could you test which scenes in glmark do/don't work?
<urjaman> it was extremely hard not to end that message with ":P"
<alyssa> (https://rosenzweig.io/glmark.txt for reference on Panfrost progress)
bnieuwenhuizen has joined #panfrost
* alyssa dumps a blend shader
<urjaman> ...
<alyssa> urjaman: what
<alyssa> I'll add 6 to whatever the number is to compensate for the kernelspace :p
<urjaman> nvm
* alyssa is confused
* urjaman didnt parse "dump" correctly
<alyssa> geez urja, I'm not _dating_ the shader!
* alyssa is exclusive with Kevin
<alyssa> Hm, so injecting a blend shader I'm getting an OPER_FAULT
<alyssa> There's probably a work_register_count field somewhere here
<alyssa> (The good news is that it's definitely executing the shader, or at least trying)
<alyssa> Who needs fragment shaders when you have blend shaders?!
<alyssa> :P
<urjaman> i dont know much but i guess both would be optimal :P
<alyssa> okay what
<alyssa> AH!
<alyssa> It reuses work_count what lol
<alyssa> so er
<alyssa> uh-huh
<alyssa> Well then
<alyssa> First blend shader injected successfully!
<alyssa> (Admittedly it's an uphill battle since This Was The Easy Part..)
<alyssa> Let's clean up that code and write some docs
paulk-leonov has quit [Ping timeout: 272 seconds]
paulk-leonov has joined #panfrost
embed-3d has joined #panfrost
jernej has joined #panfrost
<alyssa> You know, let's demoify this
<alyssa> Context: with some hacks added to the assembler for uint8/fp16/etc stuff, I can now write a blend shader
<alyssa> (manually)
<alyssa> I'm not ready for the compiler to start outputting this stuff, but I could bundle just this one shader and hot patch into the constants, to finish up the demo I was trying to do :P
<alyssa> Cool!
<alyssa> Just pushed a set of changes to be able to assemble the shader
<alyssa> (It's simpler than the shader the blob emits, unclear if I'm missing functionality or what. Shrug)
<alyssa> 12 files changed, 519 insertions(+), 54 deletions(-)
<alyssa> been busy today
chewitt has quit [Quit: Zzz..]
afaerber has quit [Quit: Leaving]
jernej has quit [Ping timeout: 240 seconds]
_whitelogger has joined #panfrost
jernej has quit [Ping timeout: 244 seconds]
_whitelogger has joined #panfrost
indy has quit [Read error: Connection reset by peer]
indy has joined #panfrost
jernej has joined #panfrost
paulk-leonov has quit [Ping timeout: 246 seconds]
paulk-leonov has joined #panfrost
pH5 has joined #panfrost
afaerber has joined #panfrost
TheCycoONE has quit [Read error: Connection reset by peer]
TheCycoONE has joined #panfrost
afaerber has quit [Quit: Leaving]
TheCycoONE has quit [Quit: ZNC 1.7.1 - https://znc.in]
TheCycoONE has joined #panfrost
cwabbott_ has joined #panfrost
cwabbott has quit [Ping timeout: 250 seconds]
cwabbott_ is now known as cwabbott
pH5 has quit [Quit: bye]
anarsoul|2 has joined #panfrost
pH5 has joined #panfrost
jernej has left #panfrost ["Konversation terminated!"]
jernej has joined #panfrost
jernej has quit [Quit: ZNC 1.6.5-elitebnc:7 - http://elitebnc.org]
jernej has joined #panfrost
tlwoerner has joined #panfrost
mearon has joined #panfrost
<mearon> Hey all. I own a Chromebook Plus. And today I discovered panfrost (I honestly thought this was never going to happen). This really made my day!
* mearon will have sweet dreams tonight
<mearon> A big Thank You to all the devs :3
<alyssa> mearon: <2
<alyssa> erm
<alyssa> <3
<HdkR> <4
<HdkR> 4>?
<HdkR> 4:) The 4 looks like a little hat =o
<alyssa> HdkR: Hey question
<alyssa> What are your thoughts about patching compiled binaries in real-time?
<anarsoul|2> alyssa: don't do it
<alyssa> anarsoul|2: why not tho
<HdkR> It's a common practice though
<HdkR> Either that or maintaining a set of known good blobs with patch points that you append to the end
<alyssa> HdkR: No, not blobs
<alyssa> Like, compile the shader once and save the patch point, and then patch it for later runs
<alyssa> (The glBlendColor is an immediate hardcoded into the blend shader. I don't want to reinvoke the compiler just because you changed the color. That's dumb :P)
<alyssa> Just writing to a standard vec4 in the binary, alignment is sane, etc
<alyssa> So it's as easy as
<HdkR> When you're patching are you thinking of saving a copy of the shader to not have to repatch when it goes back to the previous blend mode?
<alyssa> blend color doesn't work like that :p
<alyssa> If the _mode_ changes, we're forced to recompile of course
<alyssa> memcpy((float *) ((uintptr_t) shader_binary + color_offset), color, sizeof(float) * 4);
<HdkR> I mean when the application changes the blend with another API call, another draw call, but uses the same program
<alyssa> I'm confused what the question is.
<anarsoul|2> alyssa: ah, so you're talking about shader binaries...
<HdkR> I'm confused what the use case is
<alyssa> Impedance mismatch between Gallium and Mali, I guess
<alyssa> In Gallium, the blend mode is part of a constant-state object (which is cached nicely and whatever), so I can elegantly express "compile on CSO create, attachment is free, so we compile only once and then the application can flip-flop how much it wants"
<alyssa> The wacky exception is the blend color, which is _not_ part of the CSO since it's too variable I guess
<alyssa> So I shouldn't try to back it by CSO either -- I should just make updating the blend color fast
<alyssa> And patching the shader binary directly seems like a really good solution technically, even if it sounds ugly
<HdkR> Just make blend color be a uniform?
<alyssa> Blend shaders don't have uniforms :p
<alyssa> No choice but to hardcode it
<HdkR> Does it have multiple input values that can be passed from the fragment side?
<HdkR> Data other than colour
<HdkR> er, output colour*
<alyssa> Don't think so
<HdkR> So number of input colours has to match number of blending output colours in the blend stage?
<alyssa> Erm
<HdkR> (Also Constant colour isn't used extensively so it may be worth eating the recompile and deal with the issue later)
<alyssa> Look uh
<alyssa> What is your objection to it? I thought it was a really cute solution, why are we arguing :P
<HdkR> I'm attempting to explore all choices before suggesting patch points because yes they area usually pretty ugly
<alyssa> Hmph
<HdkR> s/area/are
<alyssa> If there's ever a clean patch point thing, it's this
<alyssa> Since:
<alyssa> - We're patching data, not code
<alyssa> - The patch points are computed dynamically as a compiler output, not hardcoded offsets into blobs
<alyssa> - It avoids passing the constant into the compiler context directly, which goes against any reasonable model :p
<alyssa> - From the compiler side, it's really easy to implement
<alyssa> - From the cmdstream side, it's even easier to implement
<alyssa> - Avoids ridiculous CPU overhead
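The patch-point idea in the list above can be sketched roughly as follows. This is an illustration under stated assumptions and not Panfrost's actual code: `struct blend_shader`, its fields, and `patch_blend_color` are hypothetical names, and the compiler is assumed to report the byte offset of the embedded vec4 alongside the binary. `memcpy` is used only to keep the sketch short and free of aliasing concerns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical compiler output: the shader binary plus the byte offset of
 * the blend-colour vec4 embedded in it (the "patch point"). */
struct blend_shader {
    uint8_t *binary;
    size_t size;
    size_t color_offset;
};

/* Overwrite the embedded constant colour in place. Only data is touched,
 * never instructions, so no recompile is needed when the colour changes. */
static void patch_blend_color(struct blend_shader *shader,
                              const float color[4])
{
    /* The patch point must lie entirely inside the binary. */
    assert(shader->color_offset + 4 * sizeof(float) <= shader->size);

    memcpy(shader->binary + shader->color_offset, color,
           4 * sizeof(float));
}
```

A blend-colour state change would then call `patch_blend_color` with the new RGBA value, while a blend-mode change would still go through the compiler.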
<HdkR> Yea, there are lots of upsides to it
<alyssa> Look I just think blend shaders are adorable and I've always wanted to say "Oh, yeah, I patch compiled binaries in real-time" *waggles eyebrows*
<HdkR> So does the PS3 though, it isn't completely archaic :P
<HdkR> Although that has to patch shader programs to emulate uniforms
* HdkR shivers
<urjaman> lgtm :P (actually, to me a memcpy like that would seem a bit over the top, but the compiler just saving a pointer to a struct with the appropriate color data (or data to make said ptr) and later using it, fine...)
<alyssa> urjaman: I wouldn't actually do a memcpy; that was just to fit it on one-line for irc
<HdkR> alyssa: So another question. What if you don't do a full recompile in case blending mode changes and instead just generate blend shaders that are reused to match a program's blending. Then your constant one you just patch + give to the duplicated program for that state?
<alyssa> I'm confused
<urjaman> yeah me too
<alyssa> Are you suggesting ubershaders
<alyssa> Because if so, go away Dolphin shill :v
<alyssa> :p
<HdkR> Think you have three draw calls with different blend state. One with "regular" blend modes, one with GL_CONSTANT with constant A, then GL_CONSTANT with constant B. The application spams between the three but they all use the same original shaders
<HdkR> Do you patch the constant versions of the programs between every draw call or retain a unique copy for each?
<alyssa> Patch between them but again, patching is free
<alyssa> (Well, not free, but.... let's say it costs 10 cycles per patch :P)
<HdkR> Does this mean you have to flush the previous draw call before modifying the blend stage to make sure you don't get corrupted output for modifying the constant colour live while rasterization is still happening?
<alyssa> I mean you can duplicate the shader in memory
<HdkR> That's what I'm getting to
<alyssa> this is bikeshedding
<alyssa> :P
<HdkR> Or trying to and constantly failing by sticking foot in mouth
<HdkR> That's the main thing I was wondering about though D:
<HdkR> Duplicate program then modify constant color, or modify program and not duplicate
<alyssa> idk that's a minor detail rn :p
<HdkR> Because if you're duplicating then I don't see an issue with it
<HdkR> Eats some additional memory but programs are typically small
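The "duplicate, then patch the copy" option could look something like the sketch below, so that a draw still reading the original binary never sees a half-written colour. All names are hypothetical, and `malloc` stands in for whatever GPU-visible allocator a real driver would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copy a compiled blend shader and patch the constant colour in the copy
 * only; the original stays untouched for any draw that still uses it.
 * Returns NULL on allocation failure; the caller frees the result. */
static uint8_t *clone_with_color(const uint8_t *binary, size_t size,
                                 size_t color_offset, const float color[4])
{
    /* The patch point must lie entirely inside the binary. */
    assert(color_offset + 4 * sizeof(float) <= size);

    uint8_t *copy = malloc(size);
    if (!copy)
        return NULL;

    memcpy(copy, binary, size);
    memcpy(copy + color_offset, color, 4 * sizeof(float));
    return copy;
}
```

The trade-off is the one raised in the discussion: one extra copy of a small program per distinct colour, in exchange for not having to flush earlier draws before changing the constant.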
<HdkR> The workaround for that being if you can pass a uniform in to the fragment stage in the case of constant color and passing that uniform to the blend stage then the overhead is just filling a uniform
<alyssa> Blah
<HdkR> :)
<HdkR> Just wait until you have a shader that live modifies itself
<alyssa> that's not legal on our hardware :p
<HdkR> hehe
<HdkR> Determine how big the icache is, how much it can prefetch. Modify the blend shader patchpoint itself
HdkR was kicked from #panfrost by alyssa [Heresy]
<alyssa> :P
HdkR has joined #panfrost
<alyssa> <3
<HdkR> lol
tgall_foo has quit [Read error: Connection reset by peer]
<HdkR> My cute idea is only viable if you 1) Can read/write the blend shader memory space, 2) Can't pass things that aren't output colours to it
<HdkR> Something that would clear up a lot of this for me. How does the fragment stage and blend stage pass data between them?
<HdkR> fixed size sram or something?
<alyssa> I'm not sure yet
<HdkR> Does it appear as a specialized write in the fragment shader?
<alyssa> It's the same write used for the fixed function blending pass
<HdkR> Some sort of indexed store?
<alyssa> Not indexed, just a store
<alyssa> well
<alyssa> a branch but I digress
<HdkR> some sort of auto post-decrement for choosing which channel it ends up in?
<alyssa> ryan i have no idea how they wired it up, I'm not psychic :p
<HdkR> Effectively disallowing writing results out of order?
<HdkR> So if you only write alpha it still has to first pass in rgb? :P
<HdkR> (Sometimes I'm exhausting)
<alyssa> ...hm