stikonas has quit [Remote host closed the connection]
<Lyude>
lvrp16: got my system, thanks a ton!
<lvrp16>
Lyude: speak of the devil
<lvrp16>
ask and your wish is granted by the USPS
<Lyude>
Hehe
<Lyude>
alyssa: ^ that's my rk3399
<alyssa>
Lyude: Awesomesauce :)
<anarsoul>
Lyude: so now you're working on midgard? :)
<tomeu>
o/
<hanetzer>
o/
<Lyude>
anarsoul: a bit! I really needed a reference system for it since a lot of the work that's getting done right now with winsys and others is happening there
<Lyude>
hanetzer: also re chromeos using xwayland: Huh.
<Lyude>
Huh.....
<Lyude>
so are they using glamor with gbm?
<hanetzer>
Lyude: unsure about the specifics, but if you install their linux-on-chromeos thing (it's some form of aarch64 VM, while the real system is compiled in 32-bit arm mode with some flags that make it work better on aarch64) and check the env, it's full of Wayland stuff :)
<tomeu>
hopefully I will find time to replace that custom driver with vsock and out-of-band buffer passing
<tomeu>
alyssa: what's the rationale for pre-baking stuff instead of waiting until we emit?
<tomeu>
it's being a bit problematic when trying to replace the GPU addresses that we are adding to the cmd stream with relocs
<tomeu>
if it was done all during the emit phase, we would have a single point of divergence between the DRM and non-DRM drivers
<tomeu>
having it all together in the same place would also help with readability
<tomeu>
alyssa: what do you think of moving closer to what other drivers do and have a panfrost_emit.c with all the emission code, with macros such as OUT_RING and OUT_RELOC?
<tomeu>
the non-DRM backend would emit a GPU address with OUT_RING, but the DRM one would instead emit an OUT_RELOC
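For reference, OUT_RING/OUT_RELOC are the command-stream helpers used by drivers such as nouveau and freedreno. A rough sketch of how the split tomeu is proposing could look; every name here (pan_stream, pan_reloc, panfrost_bo, PAN_USE_RELOCS) is invented for illustration, not actual panfrost code:

    #include <stdint.h>

    struct panfrost_bo {
            uint64_t gpu;      /* fixed GPU VA, only meaningful without relocs */
            uint32_t handle;   /* GEM handle, used by the DRM backend */
    };

    struct pan_reloc {
            unsigned idx;      /* dword index to patch at submit time */
            uint32_t handle;
            uint32_t offset;
    };

    struct pan_stream {
            uint32_t *map;     /* CPU mapping of the command buffer */
            unsigned cur;      /* next dword to write */
            struct pan_reloc relocs[256];   /* no overflow handling in this sketch */
            unsigned nr_relocs;
    };

    static inline void
    OUT_RING(struct pan_stream *s, uint32_t dword)
    {
            s->map[s->cur++] = dword;
    }

    static inline void
    OUT_RELOC(struct pan_stream *s, struct panfrost_bo *bo, uint32_t offset)
    {
    #ifdef PAN_USE_RELOCS
            /* DRM backend: record where the address goes; it gets patched
             * once the BO has actually been placed. */
            s->relocs[s->nr_relocs++] =
                    (struct pan_reloc){ s->cur, bo->handle, offset };
            OUT_RING(s, 0);    /* lo placeholder */
            OUT_RING(s, 0);    /* hi placeholder */
    #else
            /* Non-DRM backend: the BO already has a fixed GPU address. */
            OUT_RING(s, (uint32_t)(bo->gpu + offset));
            OUT_RING(s, (uint32_t)((bo->gpu + offset) >> 32));
    #endif
    }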
pH5 has quit [Quit: _]
cwabbott has quit [Quit: cwabbott]
cwabbott has joined #panfrost
<cwabbott>
tomeu: why do you want relocs at all? all the other major drivers have ditched them
<cwabbott>
intel is currently rewriting their driver so that they can do exactly what alyssa's code is doing
<tomeu>
cwabbott: oh, do you have any pointers to any discussions on this?
<tomeu>
well, or to code, I wasn't aware of the switch away from relocs
<cwabbott>
tomeu: anholt has a blog post on how he decided to go userptr-only for v3d
<cwabbott>
that kernel driver is upstream afaik
cwabbott has quit [Client Quit]
cwabbott has joined #panfrost
<cwabbott>
it turns out that relocs are a huge source of draw-time overhead, and extra complexity in the userspace driver
<cwabbott>
and on modern hardware with an mmu, that's almost never worth it
<cwabbott>
in terms of drivers to emulate, v3d is probably better than etnaviv or lima since it's newer, written by someone with experience, and also only has to deal with modern hw
<tomeu>
cwabbott: hmm, just checked emit_one_texture and cl_address returns a v3d_cl_reloc
<tomeu>
which is placed in the cmd stream instead of a gpu address
<cwabbott>
tomeu: I don't see that, there's no v3d_cl_reloc in the uapi
<tomeu>
hmm, so maybe relocations are resolved at a later stage in userspace instead of in the kernel?
<cwabbott>
from drm_v3d_create_bo: "Returned offset for the BO in the V3D address space. This offset is private to the DRM fd and is valid for the lifetime of the GEM handle."
<cwabbott>
tomeu: it might just be leftovers from vc4
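For context, this is roughly what the reloc-free model looks like from userspace on v3d: the kernel assigns the BO an address in the fd's private GPU address space at creation time, and userspace writes that offset straight into the command stream. A sketch based on the v3d uapi (the struct and ioctl are real, but the field details are from memory and the header path depends on your libdrm/kernel headers setup):

    #include <stdint.h>
    #include <xf86drm.h>
    #include "drm-uapi/v3d_drm.h"   /* as included in Mesa; path may vary */

    static int
    v3d_bo_create(int fd, uint32_t size, uint32_t *handle, uint32_t *gpu_offset)
    {
            struct drm_v3d_create_bo create = { .size = size };

            if (drmIoctl(fd, DRM_IOCTL_V3D_CREATE_BO, &create))
                    return -1;

            *handle = create.handle;
            /* "Returned offset for the BO in the V3D address space" -- valid
             * for the lifetime of the GEM handle, so it can be emitted into
             * the command stream directly, with no relocations. */
            *gpu_offset = create.offset;
            return 0;
    }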
<tomeu>
cwabbott: do you know what's used to provide isolation between processes?
<cwabbott>
tomeu: like midgard and above, each process has its own GPU page table
<cwabbott>
this is something pretty much all modern GPUs support
<tomeu>
awesome, lots of work just saved, thanks!
<cwabbott>
tomeu: btw, one kinda kernel-uapi-related thing I've come across: apparently, the program counter is only 24 bits (or something like that)
<cwabbott>
this means that every location for every instruction must be in the same 2^24 bits
<cwabbott>
*2^24 bytes
<cwabbott>
since otherwise it'll just wrap around
<cwabbott>
I'm not sure what it takes to run two programs with different upper 64-24 bits, but it might involve a cache flush or something like that
<cwabbott>
but the point is, the blob basically deals with this by allocating a whole aligned 2^24 bytes at once for shaders, and the kernel automagically aligns the allocation to 2^24 when the GPU execute permission is set
<cwabbott>
then anything allocated within that 2^24 byte pool has the same upper bits
<cwabbott>
I'm not sure how you want to deal with it, but panfrost userspace and/or kernel has to deal with it somehow
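A minimal sketch of the "allocate one aligned 2^24-byte pool for all shaders" scheme cwabbott describes; the pool and allocator names are invented for illustration, only the alignment math is the point:

    #include <stdint.h>

    #define SHADER_POOL_SIZE  (1u << 24)   /* 16 MiB: the 24-bit PC wraps past this */

    struct shader_pool {
            uint64_t gpu_base;   /* must itself be 2^24-byte aligned */
            uint32_t offset;     /* trivial bump allocator */
    };

    /* Returns the GPU address for the shader, or 0 if the pool is full.
     * align must be a nonzero power of two. Because gpu_base is
     * 2^24-aligned and offset stays below 2^24, every shader in the pool
     * shares the same upper address bits, so the 24-bit program counter
     * never wraps into a different region. */
    static uint64_t
    shader_pool_alloc(struct shader_pool *pool, uint32_t size, uint32_t align)
    {
            uint32_t offset = (pool->offset + align - 1) & ~(align - 1);

            if (offset + size > SHADER_POOL_SIZE)
                    return 0;

            pool->offset = offset + size;
            return pool->gpu_base + offset;
    }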
pH5 has joined #panfrost
<tomeu>
ok, I see
<tomeu>
wonder if that's different for all the other GPUs supported in mesa
<cwabbott>
tomeu: I've never seen that on another GPU
<cwabbott>
what I have seen, is something similar on Intel where there are different base addresses, and everything has to be within a fixed small distance of the base address
<cwabbott>
but that's better than what ARM does because there's no alignment restriction
<cwabbott>
this just seems like a suckier version of the base address concept
<cwabbott>
the motivation is similar, they want to make the instruction cache and program counters smaller, but the execution is much worse
<tomeu>
yeah
<cwabbott>
I can't think of a better way to handle it than allocating an aligned 2^24 bytes for shaders
<cwabbott>
although, maybe that alignment should be explicit in the uapi instead of added automatically for executable allocations
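Purely as an illustration of "making the alignment explicit in the uapi" rather than implied by an executable flag, a hypothetical BO-creation struct; the actual panfrost uapi was not settled at this point and may look nothing like this:

    /* Hypothetical, for illustration only. */
    struct drm_panfrost_create_bo_sketch {
            uint32_t size;
            uint32_t flags;      /* e.g. an EXECUTABLE flag */
            uint32_t handle;     /* returned GEM handle */
            uint32_t pad;
            uint64_t align;      /* explicit alignment requested by userspace
                                    (2^24 for shader pools) instead of being
                                    implied by the EXECUTABLE flag */
            uint64_t offset;     /* returned GPU VA chosen by the kernel */
    };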
<alyssa>
Which means the prebaking is taking full advantage of Gallium capabilities and is the Right way to handle it for our hardware; OUT_RING etc would be a regression in cleanliness and performance
<alyssa>
tomeu: And yeah, as cwabbott says we have a full MMU so it's fine :)
<alyssa>
narmstrong: Nice nice
BenG83 has quit [Ping timeout: 250 seconds]
pH5 has quit [Quit: bye]
TheKit has quit [Remote host closed the connection]