alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
youcai has quit [Read error: Connection reset by peer]
youcai has joined #panfrost
<alyssa> robmur01: \o/
<HdkR> Woo bifrost
raster has quit [Quit: Gettin' stinky!]
archetech has quit [Quit: Konversation terminated!]
vstehle has quit [Read error: Connection reset by peer]
stikonas has quit [Remote host closed the connection]
camus1 has joined #panfrost
kaspter has quit [Ping timeout: 264 seconds]
camus1 is now known as kaspter
kaspter has quit [Remote host closed the connection]
kaspter has joined #panfrost
kaspter has quit [Client Quit]
robink has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
robink has joined #panfrost
kaspter has joined #panfrost
kaspter has quit [Excess Flood]
kaspter has joined #panfrost
robink has quit [Ping timeout: 272 seconds]
robink has joined #panfrost
camus1 has joined #panfrost
kaspter has quit [Ping timeout: 256 seconds]
camus1 is now known as kaspter
<kinkinkijkin> duet will be around in a week or so
kaspter has quit [Quit: kaspter]
<kinkinkijkin> and I've just realized that, somewhat annoyingly, i actually cannot find anything on google about installing a base gnu distro directly on a chromebook
<kinkinkijkin> it's all about running a distro in a container on top of the existing chromeos
<kinkinkijkin> which is obviously not what i want
kaspter has joined #panfrost
icecream95 has joined #panfrost
<icecream95> kinkinkijkin: The instructions at https://archlinuxarm.org/platforms/armv8/rockchip/samsung-chromebook-plus should work, except the rootfs tarball linked from there ships a kernel that's too old
<kinkinkijkin> thanks, bookmarking
<alyssa> icecream95: blog post is up, not sure if you saw, let me know if I butchered the description of your changes and we'll fix it 😇
<icecream95> alyssa: You misspelt 'OpenGL 3.3 with working geometry shaders' in the last line :P
<HdkR> Now we just need Valhall and a devboard that has it
<HdkR> :)
<anarsoul> I thought that midgard has geometry shaders
<HdkR> They are implemented as compute
<HdkR> I presume Bifrost would be the first to have it as a real hardware stage
<chewitt> no more PAN_MESA_DEBUG=bifrost! .. congrats and thanks to all involved :)
<HdkR> next up PAN_MESA_DEBUG=valhall
kaspter has quit [Remote host closed the connection]
kaspter has joined #panfrost
davidlt has joined #panfrost
kaspter has quit [Ping timeout: 256 seconds]
kaspter has joined #panfrost
<chewitt> :)
<icecream95> then PAN_MESA_DEBUG=nv
vstehle has joined #panfrost
nlhowell has joined #panfrost
<macc24> kinkinkijkin: when I get my duet I will make debian run on it without any containers :D
camus1 has joined #panfrost
kaspter has quit [Ping timeout: 256 seconds]
camus1 is now known as kaspter
chewitt has quit [Read error: Connection reset by peer]
chewitt has joined #panfrost
<narmstrong> VIM3L (G31) on GloDroid \o/
<narmstrong> rsglobal did all the work
<narmstrong> only integration with Amlogic-specific stuff was needed
<tomeu> nice!
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
icecream95 has quit [Ping timeout: 260 seconds]
_whitelogger has joined #panfrost
raster has joined #panfrost
kaspter has quit [Ping timeout: 260 seconds]
camus1 has joined #panfrost
sphalerite has quit [Ping timeout: 260 seconds]
camus1 is now known as kaspter
alpernebbi has joined #panfrost
sphalerite has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
raster has joined #panfrost
<brads> I now have more frost in my pan by adding "-Dgles2=true -Dglvnd=true -Dglx-direct=true -Dgbm=true -Ddri3=true", GNOME runs like a rocket (definitely outperforms libMali now, no doubt) and my mouse has become smooth :)
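(For reference, those are Mesa meson options, so a full configure line along these lines would enable them. This is a sketch assuming a Mesa checkout of that era; the -Dgallium-drivers value is an assumption, not taken from the log:

    meson setup build/ -Dgallium-drivers=panfrost,kmsro \
        -Dgles2=true -Dglvnd=true -Dglx-direct=true -Dgbm=true -Ddri3=true
    ninja -C build/ install
)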
kaspter has quit [Ping timeout: 240 seconds]
kaspter has joined #panfrost
archetech has joined #panfrost
<alyssa> icecream95: >:
<alyssa> HdkR: Bifrost has no native geom/tess either
nlhowell has quit [Quit: WeeChat 2.9]
nlhowell has joined #panfrost
kaspter has quit [Ping timeout: 256 seconds]
kaspter has joined #panfrost
<bbrezillon> robmur01, stepri01: I faced this error https://gitlab.freedesktop.org/-/snippets/1305, which makes me wonder if our MMU AS removal/re-assignment is safe
<bbrezillon> say we have a context that's assigned an AS on which an MMU fault happens, but by the time we reach the MMU fault handler (threaded IRQ), the AS gets re-assigned to a different context
<alyssa> Hmm can I bang out +ZS_EMIT support in the next 45 minutes? let's find out!
<robmur01> bbrezillon: hmm, so panfrost_mmu_map_fault_addr() tries to resolve the fault, looks up the wrong context and maps the page into someone else's pagetable?
<robmur01> bleh :(
<bbrezillon> well, that trace says it tries to map something that's already mapped
<bbrezillon> which means the region the fault happened on matches a heap BO in both contexts
<robmur01> yup, it seems entirely possible
<bbrezillon> so there shouldn't be any security issues, but we might try to map something that's not needed
<bbrezillon> or remap something that's already mapped (the case I hit here)
<alyssa> okay, dropping off the internet, hopefully bbiab with working z/s stuff
<bbrezillon> robmur01: clearing IRQs when we re-assign an AS should help
<robmur01> a simple approach might be to just not reschedule an AS while it's in fault state, but that seems counterproductive...
<bbrezillon> but I'm not sure it's enough
<robmur01> since ideally a fault would be a great time to schedule something else in to do useful work while we resolve it :/
nlhowell has quit [Ping timeout: 240 seconds]
<robmur01> do we have anything to uniquely identify a context irrespective of which AS it happens to be running in (or not) at any given time?
<robmur01> Actually, isn't "heap BO in both contexts" the best case? If a legitimate fault on a heap BO is pending and we switch in a context where a fault at that address *isn't* valid, won't that end up killing the second (innocent) job?
<bbrezillon> yep
<bbrezillon> probably
<bbrezillon> I don't think we have anything identifying the context apart from the AS it's been assigned
<bbrezillon> maybe we should collect/clear faults in the hard irq and assign them to the currently bound context
<robmur01> suddenly I feel unusually glad that I need to go off and do other things now :P
<bbrezillon> :D
* bbrezillon regrets that the lockdown happened one week earlier in France
<stepri01> panfrost_mmu_as_get() does look faulty - it should only reclaim an address space if that other address space is actually free (i.e. not running a job on the hardware). There is a separate potential issue of the MMU fault handler still dealing with a fault *after* the job it belongs to has finished (one of those 'really shouldn't happen, but technically can' situations)
<brads> bbrezillon: just had a screen freeze doing silly Xwayland stuff and these locks being held on closedown of glmark2 (CTRL-C) - https://pastebin.com/eSe5HLab
<stepri01> but I think we should hit a WARN_ON() in panfrost_mmu_as_get() if we attempt a reclaim on an in-use AS, so I'm not sure why that isn't triggering too if that's the bug
<bbrezillon> stepri01: maybe the AS is no longer used but still has faults pending
<stepri01> that's the 'shouldn't really happen' situation. If there's a fault pending the hardware will stall. However it is possible:
<stepri01> if the fault happens and another action restarts the hardware (e.g. userspace maps/unmaps something on the GPU), then the job can continue; if it just so happens that the fault condition has gone away (e.g. userspace mapped something in the area that caused the fault), then the job can complete. And the kernel might then handle the JOB irq before it gets around to the MMU irq
<stepri01> the upshot is really we should synchronise with the MMU irq before reassigning an address space - but it's an unlikely situation as far as I know
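A minimal sketch of that synchronisation, folded into AS reclaim. mmu_write(), MMU_INT_CLEAR and the BIT(as) | BIT(as + 16) status layout match the upstream panfrost driver, but the helper itself and the pfdev->mmu_irq field are hypothetical, for illustration only:

    #include <linux/bits.h>
    #include <linux/interrupt.h>

    /* Hypothetical helper: drain in-flight MMU fault handling before an
     * address space is reclaimed for a different context, e.g. called
     * from panfrost_mmu_as_get() when an AS is taken off the LRU list. */
    static void panfrost_mmu_as_quiesce(struct panfrost_device *pfdev, u32 as)
    {
            /* Wait for the threaded MMU IRQ handler to finish whatever
             * fault it may currently be resolving, so it cannot look up
             * the old context after the AS has been handed over.
             * pfdev->mmu_irq is an assumed field holding the MMU
             * interrupt line. */
            synchronize_irq(pfdev->mmu_irq);

            /* Discard faults still latched for this AS; they belong to
             * the outgoing context.  Page-fault and bus-error status for
             * AS n live at bits n and n + 16. */
            mmu_write(pfdev, MMU_INT_CLEAR, BIT(as) | BIT(as + 16));
    }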
<bbrezillon> note that this has been triggered while debugging the timeout/reset handling stuff
<bbrezillon> so I had a lot of job faults and reset happening
<stepri01> ah - perhaps the DRM scheduler's idea of whether a job is running is different from the hardware's...
<bbrezillon> and I still have a drm_sched hang BTW :'-(
<bbrezillon> which I can only reproduce on CI despite using the same kernel+config locally
<stepri01> :(
<bbrezillon> and as soon as I add traces, it goes away, of course
<brads> bbrezillon: it seems not, I might have to move to a newer kernel I think
<alyssa> answer: no, but I have the compiler side piped through and pushed
<alyssa> need to fix a few things on the cmdstream before the tests pass, but close
<alyssa> bbrezillon: branch pushed if you want to take a look
<bbrezillon> stepri01: nailed it (I think)
kaspter has quit [Remote host closed the connection]
kaspter has joined #panfrost
kaspter has quit [Read error: Connection reset by peer]
kaspter has joined #panfrost
<stepri01> bbrezillon: cool - I hope it survives some testing this time! ;)
* alyssa shills for TLA+
archetech has quit [Quit: Konversation terminated!]
archetech has joined #panfrost
<bbrezillon> stepri01: well, it's hard to be sure given the number of times I thought I had it fixed
* alyssa shills more for TLA+
* bbrezillon waits for alyssa to convert the linux kernel (or even just the DRM part of it) to TLA+ :P
<daniels> 'the language is similar to LaTeX'
* daniels closes tab
<bbrezillon> :D
<alyssa> bbrezillon: It's not a programming language
<alyssa> It's a specification language, it's about precisely expressing what the system _should_ do.
<alyssa> The actual implementation is still in C or Rust or VHDL or whatever; it's ballparked that the actual code will be 10x larger than the spec.
<stepri01> I think at the moment the 'spec' is "run stuff on the hardware" and the code is significantly larger ;)
<alyssa> But the precision of it forces things to be really explicit, makes it possible to do formal proofs, and allows a lot of invariants to be machine-checked;
<alyssa> its specialty is exposing concurrency bugs
<bbrezillon> if only I knew what drm_sched tries to do/expects :p
<alyssa> ^^ exactly :p
<daniels> I mean if you manage to write a meaningful spec for Mali I'll be _super_ impressed
<bbrezillon> more seriously, the real problem boils down to the fact that drm_sched expects things to be controlled at the queue/scheduler granularity, including resets, while panfrost wants 3 schedulers (one per job slot) and resets to happen globally
<stepri01> well what we really need is one queue that feeds the three slots, but where jobs can overtake other jobs to keep the hardware busy
<bbrezillon> we're bending the drm_sched logic to make it fit our needs
<alyssa> bbrezillon: ^ then that's the sort of level you spec at
<bbrezillon> stepri01: yes, that's also an option I thought about
<alyssa> Internals are a black box, but you would have the exact interactions between drm_sched (as a black box) and Mali's hw schedulers (as black boxes) precisely specced
<stepri01> that's the logic that kbase uses - but instead of a "queue" it's a tree of dependencies
<alyssa> Assuming a priori that drm_sched and the hardware are both correct
<bbrezillon> stepri01: but drm_sched is really not designed for that
<bbrezillon> AFAICT
<stepri01> it certainly doesn't seem to be :(
<bbrezillon> so maybe the right thing to do would be to have our own scheduler
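To make the granularity mismatch concrete, here's a rough sketch of the shape being discussed: one drm_gpu_scheduler per hardware job slot, with a reset that has to park all of them at once. The per-slot queue array and panfrost_device_reset() follow the upstream driver, but panfrost_global_reset() as a standalone helper is a simplified assumption:

    #include <drm/gpu_scheduler.h>

    #define NUM_JOB_SLOTS 3 /* Midgard/Bifrost job slots */

    static void panfrost_global_reset(struct panfrost_device *pfdev,
                                      struct drm_sched_job *bad)
    {
            int i;

            /* drm_sched only knows how to stop one queue at a time, so a
             * global reset has to park every per-slot scheduler by hand. */
            for (i = 0; i < NUM_JOB_SLOTS; i++)
                    drm_sched_stop(&pfdev->js->queue[i].sched, bad);

            panfrost_device_reset(pfdev); /* whole-GPU soft reset */

            /* Requeue everything that was in flight, then restart. */
            for (i = 0; i < NUM_JOB_SLOTS; i++) {
                    drm_sched_resubmit_jobs(&pfdev->js->queue[i].sched);
                    drm_sched_start(&pfdev->js->queue[i].sched, true);
            }
    }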
<stepri01> of course I wouldn't say kbase's scheduler is great either. I was responsible for the rewrite to the current design, but it was constrained somewhat by having to have a migration path from the previous design
<stepri01> and then of course it evolved
<stepri01> although my biggest bugbear is that atom ID 0 is 'special' but it doesn't need to be ;)
* alyssa squints
<alyssa> Did you say bugbear?
<stepri01> yes...?
* alyssa sighs
<alyssa> My name isn't Alyssa. It's Special Agent Sweetie Drops. I worked...
<alyssa> :p
<stepri01> *whoosh* not a reference I understand :p
<alyssa> mlp:fim
<stepri01> yeah I got that much from google - but my knowledge of mlp is very limited!
<alyssa> I'd send the video but trying to cram fragdepth support before the branchpoint
<tomeu> priorities!
<alyssa> ^^ This hack fixes a bunch of failing L8/A8/L8A8 tests. Probably worth finding the root cause, but if it's too complicated that'll be easy to backport later.
<bbrezillon> stepri01: looks like amdgpu has pretty much the same model, with one scheduler per queue (there's also the distinction between gfx and compute queues there) and a global reset used when the per queue reset is not possible
<stepri01> bbrezillon: yeah it should work - it just feels like we're hacking around a limitation in the drm scheduler unfortunately
<bbrezillon> stepri01: I had a quick look, and I could add a multi-queue sched, sharing most of the logic with the single-queue one, I'm just wondering if it's the right solution
<stepri01> might be worth trying to get a view from the amdgpu folks to see if they'd be interested in it as well
<alyssa> ^^
<bbrezillon> stepri01: ok, let's get this hack/fix merged first
<stepri01> yes it would be good to get the fix in first
raster has quit [Quit: Gettin' stinky!]
raster has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
raster has joined #panfrost
archetech has quit [Ping timeout: 265 seconds]
tomboy64 has quit [Remote host closed the connection]
tomboy64 has joined #panfrost
rando25892 has quit [Ping timeout: 272 seconds]
tgall_fo_ has joined #panfrost
tgall_foo has quit [Ping timeout: 256 seconds]
rando25892 has joined #panfrost
davidlt has quit [Ping timeout: 240 seconds]
mfilion has left #panfrost ["The Lounge - https://thelounge.chat"]
alpernebbi has quit [Quit: alpernebbi]
raster has quit [Quit: Gettin' stinky!]
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
camus1 has joined #panfrost
raster has joined #panfrost
kaspter has quit [Ping timeout: 260 seconds]
camus1 is now known as kaspter
archetech has joined #panfrost