<icecream95>
tomeu: The first step would be reading (and ignoring) the comment "NOT FOR HARDWARE DRIVERS NEVER WILL BE"
nlhowell has joined #panfrost
paulk-leonov has quit [Ping timeout: 272 seconds]
guillaume_g has quit [Quit: Konversation terminated!]
guillaume_g has joined #panfrost
paulk-leonov has joined #panfrost
nlhowell has quit [Ping timeout: 246 seconds]
guillaume_g has quit [Quit: Konversation terminated!]
guillaume_g has joined #panfrost
icecream95 has quit [Ping timeout: 264 seconds]
tomboy65 has quit [Ping timeout: 240 seconds]
tomboy65 has joined #panfrost
tomboy65 has quit [Ping timeout: 240 seconds]
tomboy65 has joined #panfrost
kaspter has joined #panfrost
kaspter has quit [Quit: kaspter]
kaspter has joined #panfrost
nlhowell has joined #panfrost
guillaume_g has quit [Quit: Konversation terminated!]
guillaume_g has joined #panfrost
cwabbott_ has joined #panfrost
cwabbott has quit [Ping timeout: 272 seconds]
cwabbott_ is now known as cwabbott
<chrisf>
tomeu, it's possible that was a mistake -- but it does populate /boot with the bits that look necessary.
* chrisf
is far more comfortable with shader compiler guts than fiddling with getting the board to work
<tomeu>
yeah, I also hate that
<tomeu>
not sure your u-boot knows it has to mount the disk and look in /boot
<chrisf>
the existing u-boot (hardkernel's one) was loading an uncompressed image, dtb, and uinitrd from this disk
<tomeu>
mmind00: is there an easy way for chrisf to update the kernel in his go advance if he doesn't have a serial console?
<mmind00>
tomeu: not sure ... like on my Go, I cleared the vendor uboot from the spi, put one on the sd-card and just am using extlinux to load a kernel also from there
<mmind00>
tomeu: so I don't really know how the procedure _with_ the vendor uboot is
<tomeu>
chrisf: and what happens when the board boots?
<chrisf>
tomeu, in the failure case?
<tomeu>
chrisf: yep
<chrisf>
the uboot splash stays up, and nothing appears to happen
<chrisf>
i imagine with the console i'd see it upset about something
<tomeu>
hmm, I think I saw some complaint in the splash when it wasn't able to find the kernel
<chrisf>
i *have* confirmed that it's using the uboot on the sd-card. in another experiment i rebuilt that and i could see my one was running.
<chrisf>
there
<tomeu>
guess you have the display driver built-in in the kernel?
<chrisf>
's a generic "system failure" complaint in the splash which tells you nothing useful
<tomeu>
ah, guess a custom u-boot could make it easier to figure out
<chrisf>
i was trying to get the uboot netconsole working so i could see what it was doing
* alyssa
popcorns
<alyssa>
daniels: when is rk3399 expected back up?
<daniels>
alyssa: I was told 'a few hours'
<daniels>
alyssa: so probably somewhere between 2-4h from now?
<daniels>
(Vivek is replacing switches, recabling, and reconfiguring the network, so it should be a lot more stable and hopefully also faster - at some stage it's also going to get nicely sharded so we don't lose an entire class of test devices from network/switch/power/USB/rack/... outages)
<alyssa>
gotcha!
<tomeu>
chrisf: no easy way to get a serial cable?
<chrisf>
tomeu, last part i need *should* arrive today
<tomeu>
nice :)
<chrisf>
assuming the uart connected to header along the top actually works :)
<chrisf>
on a completely different tack, for vulkan -- it seems there's a few ugly things about the hw that complicate a very cheap mapping
<alyssa>
chrisf: oh?
<alyssa>
blending for one :p
<chrisf>
i think that's actually not the end of the world -- we get to build a monolithic pipeline object which can contain the blend shader if we need one
<alyssa>
so what is the end of the world?
<alyssa>
min/max index?
<chrisf>
ideally we could go all the way to the hardware descriptor structures during command buffer recording
<chrisf>
min/max index is kinda gross, yes
<chrisf>
but the job descriptor headers are mutated by the hw for status etc -- how to deal with a command buffer being submitted multiple times?
<alyssa>
oh, oof
<alyssa>
in GL we re-emit the headers (and payloads)
<alyssa>
in practice it's not the *worst* thing since all the interesting bits are in other descriptors pointed around
<alyssa>
so the actual main job descriptor header+payload isn't as large as it would be in a more conventional architecture
<alyssa>
but yeah it's ugly
nlhowell has quit [Ping timeout: 260 seconds]
<chrisf>
also what to do with secondary command buffers, where we record a bunch of draws to later include in a primary, but dont necessarily know what render targets will be in use
<chrisf>
afaik the blob doesnt even try there -- they defer everything, and inline secondaries into the stream of stuff they produce at the last moment at submission time
nlhowell has joined #panfrost
<chrisf>
the job resubmission thing is why vulkan has explicit one-time submission and simultaneous use flags on its command buffers
<chrisf>
so if the app says it doesnt need the fully general thing you can do something less weird
kaspter has quit [Remote host closed the connection]
kaspter has joined #panfrost
<chrisf>
icecream95: had you given this stuff any thought yet?
<chrisf>
i suppose if you do a gallium state tracker then you dont have to deal with a lot of it, but you burn a bunch of cpu vs a direct mapping
<bbrezillon>
chrisf: I had, and IIRC, the conclusion was that we need to have templates for some of those descs and re-emit them (some descs can be kept around, like textures)
<chrisf>
bbrezillon: for resubmission, for secondaries, or both?
<bbrezillon>
unfortunately I don't remember
<bbrezillon>
I guess it was both
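A minimal sketch of the "template and re-emit" idea discussed above, assuming a made-up job header layout. The struct fields and the name `emit_job_from_template` are hypothetical and do not reflect the real Mali descriptor format or any panfrost code; the only point is that a pristine CPU-side copy is kept and copied into fresh GPU-visible memory for each submission, so hardware write-back to one copy never leaks into the next.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified job header. NOT the real Mali layout.
 * The hardware is assumed to overwrite `exception_status` while running. */
struct job_header {
    uint32_t exception_status; /* written back by the (imagined) hardware */
    uint32_t job_type;
    uint64_t next_job;         /* GPU address of the next job in the chain */
    uint64_t payload;          /* GPU address of descriptors that can be kept */
};

/* Copy a pristine template into per-submission memory and patch only the
 * fields that differ between submissions. The template is never modified. */
static void
emit_job_from_template(struct job_header *dst,
                       const struct job_header *tmpl,
                       uint64_t next_job)
{
    assert(dst != NULL && tmpl != NULL);
    memcpy(dst, tmpl, sizeof(*dst));
    dst->exception_status = 0; /* start every submission with clean status */
    dst->next_job = next_job;
}
```

Descriptors that the hardware does not mutate (textures, per the discussion) would stay behind the `payload` pointer and be shared by every emitted copy, which is why the per-submission copy stays small.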
<chrisf>
bbrezillon: my real mission on this thing is to beat the pants off the blob on cpu overhead.. but that's going to be a long road :)
<bbrezillon>
chrisf: I guess we can start with a sub-optimal solution involving a lot of CPU -> GPU copies, and see how we can improve that afterwards
<alyssa>
but yes, definitely go for a 'real' vk driver
<bbrezillon>
there's a lot of plumbing to do before we can even run a simple VK prog
<alyssa>
(i.e. without depending on Gallium)
nlhowell has quit [Quit: WeeChat 2.8]
nlhowell has joined #panfrost
<chrisf>
bbrezillon: oh yes :)
<chrisf>
bbrezillon: i just got done implementing it in software, well aware of the amount of plumbing :)
<bbrezillon>
chrisf: if you don't want to start from scratch, you can check my branch
<bbrezillon>
but it's far from functional
<tomeu>
chrisf: oh, ANGLE?
Ntemis has joined #panfrost
<chrisf>
tomeu: swiftshader
<tomeu>
ah, mixed the two
<tomeu>
guess then that panfrost will be a walk in the park for you :p
<chrisf>
tomeu: on the vulkan side, sure. mali is still plenty weird though ;)
<tomeu>
we have alyssa for that :)
raster has quit [Remote host closed the connection]
raster has joined #panfrost
<chrisf>
unrelated -- do we have any idea how the idvs jobs work?
<chrisf>
this was supposed to be a big deal on bifrost
raster has quit [Client Quit]
raster has joined #panfrost
indy has quit [Ping timeout: 265 seconds]
buzzmarshall has joined #panfrost
indy has joined #panfrost
<cwabbott>
chrisf: sounds like a fun project... and yeah, needing the index bounds is a big ouch there... best case you have to patch on command submission (because you don't know if it's been touched by the CPU before then) and worst case you have to emit a compute shader on-the-fly to calculate it or something
<chrisf>
cwabbott: the blob emits a compute job to do it
<cwabbott>
makes sense I guess
<cwabbott>
the user could be evil and record two command buffers, one that writes to a buffer and one that uses the buffer as an index buffer, and submit them at the same time
<chrisf>
cwabbott: super evil case: you have two draws in a single renderpass, the first has side effects to mangle the index buffer for the second.
<cwabbott>
so you can't really know whether it'll get overwritten by the GPU
<cwabbott>
chrisf: at least you can detect that case using dependencies
<cwabbott>
and pipeline barriers, if it's between passes
<chrisf>
cwabbott: yes, you'll see a pipeline barrier for it
<cwabbott>
but if it's in a separate cmd buffer then you won't see the pipeline barrier, is what I'm trying to say
<chrisf>
the app *still* has to provide the pipeline barrier even if the dependency is across a command buffer boundary
<cwabbott>
it could be in the earlier command buffer though
<chrisf>
ah, i see what you're saying
<chrisf>
yeah, that's ok.
<cwabbott>
so I think that completely defeats your ability to calculate the bounds on the CPU
<chrisf>
i think you just dont bother and do it on the GPU
<cwabbott>
yeah
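For reference, the CPU-side bounds scan being ruled out here looks roughly like the sketch below. The names `index_bounds` and `scan_index_bounds_u16` are hypothetical, it handles only 16-bit indices, and it ignores primitive restart. The result is only valid for the buffer contents at the moment of the scan, which is exactly the problem: if the GPU rewrites the index buffer afterwards, the bounds are stale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest and largest vertex index referenced by a draw. */
struct index_bounds {
    uint32_t min;
    uint32_t max;
};

/* Linear scan over a CPU-visible 16-bit index buffer.
 * Assumes count > 0 and no primitive-restart index in the data. */
static struct index_bounds
scan_index_bounds_u16(const uint16_t *indices, size_t count)
{
    assert(indices != NULL && count > 0);

    struct index_bounds b = { indices[0], indices[0] };
    for (size_t i = 1; i < count; i++) {
        if (indices[i] < b.min)
            b.min = indices[i];
        if (indices[i] > b.max)
            b.max = indices[i];
    }
    return b;
}
```

A compute job doing the same reduction on the GPU, ordered after any writes to the buffer, sidesteps the staleness issue at the cost of extra GPU work per indexed draw.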
raster has quit [Ping timeout: 240 seconds]
raster has joined #panfrost
tomeu has quit [Quit: Ping timeout (120 seconds)]
shadeslayer has quit [Quit: Ping timeout (120 seconds)]
ndufresne has quit [Quit: Ping timeout (120 seconds)]
ndufresne has joined #panfrost
tomeu has joined #panfrost
shadeslayer has joined #panfrost
<chrisf>
tomeu: anything else worth poking at before i get the serial console working?
<alyssa>
cwabbott: I had not considered that case ... that is horrifying
yann has quit [Ping timeout: 240 seconds]
<alyssa>
Does GL also have that issue, I guess?
<chrisf>
alyssa: this is why GL grew things like DrawRangeElements
<alyssa>
chrisf: I specifically meant "bind an index buffer as an SSBO and mangle it" etc
<chrisf>
oh, absolutely. GL buffers are buffers for any purpose
<alyssa>
I guess with our gallium stack, that would trigger a flush so it'd be fine
<chrisf>
yeah, im pretty sure gallium takes care of it
<alyssa>
on T720 with my scheduler/RA changes, but fine on t860. wat
robmur01_ is now known as robmur01
<alyssa>
Argh, okay, register spilling is broken on t720 at least under some circumstances
<alyssa>
Unrelatedly, wondering if we should really just flip fp16 on
<alyssa>
We still take a hit on cycle count but I suspect the win in register pressure makes up for it
<alyssa>
register spilling fixed on t720 :)
<chrisf>
well, i have a working serial console now... but no joy
guillaume_g has quit [Quit: Konversation terminated!]
<alyssa>
:v
<HdkR>
:>
<robmur01>
:^
<chrisf>
ive already run the gamut of unimpressed facial expressions ;)
buzzmarshall has quit [Ping timeout: 260 seconds]
tomboy65 has quit [Ping timeout: 240 seconds]
tomboy65 has joined #panfrost
karolherbst has quit [Quit: duh 🐧]
karolherbst has joined #panfrost
<alyssa>
so txf with msaa has the sample # in coord.z
jolan has quit [Quit: leaving]
jolan has joined #panfrost
<HdkR>
alyssa: As in number of samples or sample index?
<alyssa>
index
<alyssa>
Okay, so I have txf_ms handled now
<HdkR>
With 3D MSAA textures does it extend to w then?
<alyssa>
GLES doesn't do 3D MSAA textures
<alyssa>
anyway .z makes sense because Mali is treating MSAA textures as 3D textures
<alyssa>
where depth = sample count
<HdkR>
Ah, it needs GL_OES_texture_storage_multisample_2d_array for MSAA 2D array
<cwabbott>
alyssa: yeah, the blessing/curse with vulkan is that the driver never gets the full picture... you can record various commands in parallel, and then only at the very end, after all the descriptors etc. have been generated, does the driver find out what order they're going to be executed
<alyssa>
cwabbott: joy
<alyssa>
Oh, even better, they literally have 4 separate layers for each sampling... ugh
<alyssa>
this can't be good for perf
<alyssa>
So all the 'magic' is in rendering ... fun
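A rough sketch of what "MSAA texture as a 3D texture with depth = sample count" implies for addressing, assuming a plain linear layout with one full layer per sample. The struct, the function name and the linear layout are all assumptions for illustration; real Mali surfaces may be tiled or laid out differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a multisampled image stored as `samples`
 * consecutive 2D layers, i.e. viewed as a 3D texture of depth `samples`. */
struct ms_image {
    uint32_t width;
    uint32_t height;
    uint32_t samples;         /* acts as the depth of the 3D view */
    uint32_t bytes_per_texel;
};

/* Byte offset of one sample of one pixel: the sample index plays the role
 * of coord.z, selecting the layer; x and y address within that layer. */
static size_t
ms_texel_offset(const struct ms_image *img,
                uint32_t x, uint32_t y, uint32_t sample)
{
    assert(img != NULL);
    assert(x < img->width && y < img->height && sample < img->samples);

    size_t row_stride = (size_t)img->width * img->bytes_per_texel;
    size_t layer_stride = row_stride * img->height;

    return (size_t)sample * layer_stride
         + (size_t)y * row_stride
         + (size_t)x * img->bytes_per_texel;
}
```

Under this assumed layout the samples of a single pixel sit a whole layer apart in memory, which is one way to read the performance worry above.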
nlhowell has quit [Ping timeout: 260 seconds]
<cwabbott>
chrisf: one thing you should think about early is how to handle renderpasses
<cwabbott>
in turnip we're rather lazy at the moment, since we can always fall back to immediate-rendering mode if something goes wrong
<Lyude>
HdkR: you started any work on vulkan w/ bifrost yet btw?
<HdkR>
wha? Nah. that was just a weekend thing to see what the initial infrastructure work takes
<Lyude>
ah
<cwabbott>
so we statically divvy up the tile buffer between all the attachments, and if it runs out of space then whoops, let's just use sysmem (immediate-mode rendering) for this renderpass instead
<cwabbott>
but you don't really have that luxury with mali
<HdkR>
Dang Adreno, being able to cheat
<HdkR>
Probably isn't even completely terrible with the platforms that have 68GB/s of memory bandwidth :P
<cwabbott>
my understanding is that you're supposed to think of each subpass like an instruction and do "register allocation" on the tile buffer, including spilling
buzzmarshall has joined #panfrost
<cwabbott>
and in the downstream kernel there's even some special JIT memory allocation path to handle the "oh shit, we need to allocate a giant framebuffer during command submission" case if you spill
nlhowell has joined #panfrost
<cwabbott>
also there are some, err, fun rules around vertex/fragment atomics which mean that as soon as you enable that feature, you may have to split the renderpass into smaller parts
<cwabbott>
(that's yet another thing we can get around in turnip by forcing sysmem)
<cwabbott>
so getting the "render pass allocator" right is going to be one of the toughest parts of a mali vulkan driver, I think
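A minimal sketch of the static "divvy up the tile buffer" approach described for turnip, assuming a per-pixel byte budget and per-attachment byte costs. The function name and parameters are hypothetical and this is not turnip's actual code. It only reports failure when the attachments do not fit; a driver that can fall back to sysmem stops there, whereas a Mali driver would have to go further and spill attachments or split the render pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assign each attachment a fixed offset inside the per-pixel tile buffer
 * budget, in order. Returns false if the attachments do not all fit, in
 * which case the contents of offsets_out are only partially filled. */
static bool
assign_tile_offsets(const uint32_t *att_bytes_per_pixel, size_t count,
                    uint32_t budget_bytes_per_pixel, uint32_t *offsets_out)
{
    assert(att_bytes_per_pixel != NULL && offsets_out != NULL);

    uint32_t used = 0; /* invariant: used <= budget_bytes_per_pixel */
    for (size_t i = 0; i < count; i++) {
        if (att_bytes_per_pixel[i] > budget_bytes_per_pixel - used)
            return false; /* out of tile memory: caller must fall back */
        offsets_out[i] = used;
        used += att_bytes_per_pixel[i];
    }
    return true;
}
```

Treating subpasses like instructions and attachments like live values, as suggested above, would replace the single linear pass here with liveness tracking so that tile memory can be reused once an attachment is no longer needed.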
Ntemis has quit [Read error: Connection reset by peer]