alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
vstehle has quit [Ping timeout: 246 seconds]
_whitelogger has joined #panfrost
fysa has joined #panfrost
NeuroScr has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
vstehle has joined #panfrost
<tomeu> chewitt: don't know about amlogic :/
<tomeu> chewitt: wonder if we could find someone to be our amlogic champion :)
<chewitt> narmstrong is the person i'd normally point fingers at, but he's currently enjoying his summer vacation
<chewitt> tomeu: do you have any Amlogic S912 hardware?
<tomeu> chewitt: nope
<chewitt> can I fix that?
<tomeu> not sure, it doesn't seem that scalable
<tomeu> to collabora, it makes more sense to focus on rockchip, as we have made big investments in other parts of the hw
<tomeu> if we had a customer that wanted panfrost on amlogic, all would change, of course :)
stikonas has joined #panfrost
raster has joined #panfrost
stikonas has quit [Remote host closed the connection]
<chewitt> understood
<tomeu> it's not just getting things to work, btw, it's also keeping them working
<tomeu> so we need CI, and we need to keep that working so it doesn't disrupt development
raster has quit [Remote host closed the connection]
raster has joined #panfrost
<chewitt> Neil is the right guy.. Baylibre have all the right stuff in their lava lab afaik
<tomeu> we just need to wait then :)
<tomeu> alyssa: wonder if our CI should run astyle
<tomeu> so we don't have to choose later between having inconsistent style and having to re-run it again on the code base
<tomeu> alyssa: what's the reason for tiler_dummy, btw?
<tomeu> can't we just use the tiler_polygon_list BO?
Elpaulo has joined #panfrost
<tomeu> alyssa: it's starting to look as if the hierarchy mask was totally different on T720
<tomeu> otherwise, the biggest difference I see with the blob is the depth stencil stuff in mali_shader_meta
davidlt has joined #panfrost
<tomeu> alyssa: have hacked the cmdstream to match libmali in terms of ds and blend, but keep seeing the same NULL dereference from the GPU
<tomeu> wonder if it could be related to how now the tiler jobs depend on the set_value job, but the vertex jobs execute before
<tomeu> that doesn't match what libmali is doing here
<tomeu> (set_value is submitted first in the chain without dependencies, then come the vertex jobs and then the tiler jobs)
<tomeu> this is the working cmdstream: http://paste.debian.net/1091233/
<tomeu> this one corresponds to the pile of hacks I pushed above: http://paste.debian.net/1091234/
<tomeu> and this is what I get in dmesg: http://paste.debian.net/1091235/
<tomeu> the DATA_INVALID_FAULT is due to the polygon_list being empty, because the previous job chain didn't finish successfully
<tomeu> because of the NULL access
<tomeu> ah, that's a single frame from kmscube, but with only one face of the cube being rendered
guillaume_g has joined #panfrost
<alyssa> tomeu: I don't really want CI to run astyle because I mean
<alyssa> I'd rather push badly spaced but working code than no code at all
<alyssa> and CI slows things down
<alyssa> tomeu: I'm not 100% certain about tiler-dummy, but this is a very firm guess:
<guillaume_g> Hi. I am trying to use mali-t604 on my chromebook snow, but it fails on boot, I get:
<guillaume_g> [ 14.002299] panfrost 11800000.mali: clock rate = 200000000
<guillaume_g> [ 14.002411] Unhandled fault: asynchronous external abort (0x211) at 0x00000000
<guillaume_g> How could I move forward?
<alyssa> Mali T604 isn't supported (yet)..
<guillaume_g> it seems to fail a panfrost_gpu_soft_reset
<alyssa> Although that does seem to be a legitimate bug in the kernel
<alyssa> robher: ^^
<guillaume_g> alyssa: ok. What is missing?
<alyssa> tomeu: SET_VALUE jobs reset the polygon list; TILER jobs add to the polygon list; FRAGMENT jobs read the polygon list
<alyssa> tomeu: So if you have no draws, there is no TILER but also no SET_VALUE.
<guillaume_g> maybe my DTB fragment is wrong.
<alyssa> tomeu: .....Meaning your fragment-only frame will actually end up redrawing whatever you drew last frame!
<tomeu> alyssa: ah, but the CPU writes to it :)
<alyssa> tomeu: So the blob's apparent solution is to have the actual per-FBO polygon list, and also a dummy empty one they keep. They switch them out depending on whether there are draws
<alyssa> tomeu: Well, it's possible that on T720, the internal polygon list structures were invalid if zeroed out
<tomeu> yeah, that's how it looks
<alyssa> But with a field or two in the header set, they became valid but still empty
<alyssa> Regardless, you really do need the tiler_dummy
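[editor's sketch] The polygon-list flow alyssa describes above (SET_VALUE resets, TILER appends, FRAGMENT reads, and a dummy list is swapped in when a frame has no draws) can be modelled roughly as below. This is a toy model, not Panfrost or libmali code: `PolygonList`, `run_frame` and the string "triangles" are hypothetical names, and the real buffer has a header layout this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PolygonList:
    """Toy stand-in for a polygon list buffer (not the real hardware layout)."""
    entries: List[str] = field(default_factory=list)


def run_frame(per_fbo: PolygonList, dummy: PolygonList, draws: List[str]) -> List[str]:
    """Return what the FRAGMENT job would read for this frame."""
    if not draws:
        # No draws means no TILER job and no SET_VALUE job either, so the
        # per-FBO list still holds last frame's geometry. Reading it would
        # redraw the previous frame, so FRAGMENT is pointed at the dummy.
        return list(dummy.entries)

    per_fbo.entries.clear()        # SET_VALUE: reset the polygon list
    for draw in draws:             # TILER: add to the polygon list
        per_fbo.entries.append(draw)
    return list(per_fbo.entries)   # FRAGMENT: read the polygon list
```

In this model a frame with draws followed by a fragment-only frame reads an empty list the second time, even though the per-FBO list still contains the stale entries.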
<tomeu> ah, haven't looked in the header
<robher> guillaume_g, alyssa: need to look at kbase and see if there are any reset related errata for t604. I don't recall any, but there's lots for t604 in snow. Reset needs to work well on it because my understanding is that h/w has to be reset several times a second.
<alyssa> robher: That's terrifying.
<tomeu> well, there's a bunch of microseconds in a second :p
<robher> Can we just buy everyone that asks about snow a new chromebook...
<tomeu> works for me, I'm not looking forward to debugging panfrost on t6xx with 64-bit descriptors...
<alyssa> Juno :o
<HdkR> Someone still has a juno?
<HdkR> Madness
davidlt has quit [Remote host closed the connection]
<tomeu> hmm, not sure there's a 64-bit DDK for Juno
davidlt has joined #panfrost
<alyssa> tomeu: I don't see the difference with depth/stencil? The defaults between Gallium and libmali are probably different without going out of spec
<alyssa> tomeu: I thought Juno *is* 64-bit
<tomeu> alyssa: yeah, I think you can ignore all my suggestions before
<tomeu> I'm really out of ideas :)
<alyssa> tomeu: I'm lookin
<alyssa> First weirdness:
<alyssa> Why on *earth* is the blob allocating the polygon list as executable?
<alyssa> vt_sfbd.clear_flags differ
<alyssa> workgroups_z_shift differs but I doubt that affects anything
<alyssa> tomeu: vertex payload's gl_enables differ <------ this one probably matters
<guillaume_g> alyssa, robher: adding OPP, I do not have a crash anymore, but still external aborts: https://pastebin.com/kGnq1L3d
chewitt has quit [Quit: Zzz..]
<tomeu> I seem to have broken CI and have no idea how
<robher> guillaume_g: Looks to me like there's a problem accessing the GPU registers. Probably something else needs to be enabled and that could be exynos specific. There were some patches on the list for exynos support on panfrost. Do you have those?
chewitt has joined #panfrost
chewitt has quit [Client Quit]
hanetzer has quit [Changing host]
hanetzer has joined #panfrost
<guillaume_g> robher: no, I am using kernel 5.2.0 atm.
<guillaume_g> robher: do you have a link, please?
<tomeu> alyssa: well, the mir_foreach_instr_in_block_safe patch seems to have broken everything
chewitt has joined #panfrost
<robher> guillaume_g: "[PATCH v2 3/7] arm64: dts: exynos: Add GPU/Mali T760 node to Exynos5433", but I don't see anything extra needed on other chips (there's no 5250 support in the series though).
<robher> guillaume_g: maybe one of the exynos folks can help. It's going to be some clock, regulator, or power domain most likely.
<chewitt> tomeu: narmstrong: I've tested reverting all commits since the initial panfrost merge in 5.2, e.g. all here https://github.com/torvalds/linux/commits/master/drivers/gpu/drm/panfrost
<robher> guillaume_g: Or you just have the wrong address.
<chewitt> and it still shows the same "panfrost: probe of d00c0000.gpu failed with error -12"
<guillaume_g> robher: I will switch to the arndale board, at least I will have a serial to debug things.
<chewitt> tomeu: narmstrong: so I have to conclude it's something external to panfrost (something else in the kernel changed)
<chewitt> one last thing to try is an aarch64 image
<chewitt> as we normally build 'arm'
<tomeu> ah, guess that could be it
<tomeu> alyssa: guess I will have to revert it in the meantime
<guillaume_g> robher: adding pd_g3d to power-domains makes the system freeze :(
<robher> guillaume_g: perhaps the PD is on already, and then on probe failure it gets turned off.
raster has quit [Remote host closed the connection]
<guillaume_g> robher: and it could freeze the board?
<robher> guillaume_g: certainly if something else is relying on the PD to be default enabled.
<guillaume_g> robher: ok. The last line I have before the freeze is "panfrost 11800000.gpu: clock rate = 200000000"
<robher> guillaume_g: I think there are also some issues in the clean-up error paths in the panfrost driver interacting with runtime-pm. I'm not certain though.
<chewitt> tomeu: narmstrong: for kicks I fully reverted panfrost and then cherry-picked the commit that added panfrost to our 5.1 kernel sources .. and this works
<chewitt> so i've exported the two commits as patch files .. now doing 'diff -y' to compare them
<guillaume_g> robher: it seems that upstream dts fragments have only one clock, whereas my downstream reference code has multiple clocks
<guillaume_g> robher: it seems I am missing some configuration for the main clock :(
<chewitt> tomeu: narmstrong: robher: this is the (inverse) diff between the "add panfrost" 5.1 kernel patch and the initial 5.2 commit merged in mainline
<chewitt> although when building with that patch added it doesn't compile http://paste.ubuntu.com/p/QQMFkZV4Bs/
<chewitt> could be my bad copy/pasting .. but something there is why T820 (S912) stopped working after the 5.2 bump
<robher> chewitt: there was some issue with 32-bit GPU VA being rejected by io-pgtable code. I'm not sure if Robin ever got a fix in.
<chewitt> we run a split 64/32 arrangement so kernel is 64-bit
<alyssa> tomeu: Hrm
<chewitt> alyssa: with the patch hacking .. I still see the black text problem I described before
<guillaume_g> robher: I got it "working". I mean the driver does not crash on boot. ;) https://pastebin.com/w9F6KKYt My problem was wrong clock + missing power-domain
<guillaume_g> robher: I hope what is printed makes sense
gtucker has quit [Ping timeout: 252 seconds]
tomeu has quit [Ping timeout: 252 seconds]
<alyssa> chewitt: :/
<alyssa> chewitt: Could you send the output with "MIDGARD_MESA_DEBUG=shaders" set?
<alyssa> And if I can't figure it out from that, maybe even with "PAN_MESA_DEBUG=trace" set (but it will be very large, so quit Kodi as soon as the black text appears)
<alyssa> It's just odd given that it worked fine before and our conformance numbers have been steadily _increasing_
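[editor's sketch] One way to capture the debug output alyssa asks for is to launch the application with the two environment variables set and send its output to a file, since the trace is expected to be very large. The variable names and values come from the messages above; the helper names, the log path, the `kodi` command and the assumption that the debug output goes to stdout/stderr are illustrative guesses, not something confirmed in the channel.

```python
import os
import subprocess
from typing import Dict, List, Optional


def panfrost_debug_env(base: Optional[Dict[str, str]] = None,
                       shaders: bool = True,
                       trace: bool = False) -> Dict[str, str]:
    """Copy an environment and add the debug variables quoted in the log."""
    env = dict(os.environ if base is None else base)
    if shaders:
        env["MIDGARD_MESA_DEBUG"] = "shaders"
    if trace:
        env["PAN_MESA_DEBUG"] = "trace"
    return env


def run_with_log(cmd: List[str], log_path: str, env: Dict[str, str]) -> int:
    """Run cmd with env, writing stdout and stderr to log_path (assumed sink)."""
    with open(log_path, "wb") as log:
        result = subprocess.run(cmd, env=env, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

A hypothetical invocation would be `run_with_log(["kodi"], "/tmp/panfrost-trace.log", panfrost_debug_env(trace=True))`, stopped as soon as the black text appears.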
<calcprogrammer1> I flashed a fresh install of Debian (unofficial arm64 this time) to my Rock Pi 4 and then built kernel, libdrm, and mesa as I was before. Can't confirm yet but it appears to be working now, no dmesg errors when running lightdm or kmscube. I'm remoted in right now so can't see screen. Are there any known issues with running a 32-bit distribution with Panfrost on arm64 hardware (rk3399)? Radxa's official Debian
<calcprogrammer1> build for the Rock Pi 4 is a 32-bit Debian armhf with an aarch64 kernel.
<alyssa> calcprogrammer1: 32-on-64 is rather broken right now, but we're working on it!
<alyssa> LibreELEC has a workaround but I don't recommend it at this time
stikonas has joined #panfrost
stikonas has quit [Read error: Connection reset by peer]
stikonas has joined #panfrost
davidlt has quit [Remote host closed the connection]
davidlt has joined #panfrost
BenG83 has joined #panfrost
davidlt has quit [Ping timeout: 244 seconds]
MistahDarcy has quit [Quit: Leaving]
jcureton has joined #panfrost
guillaume_g has quit [Quit: Konversation terminated!]
<jcureton> is there anything being done toward a platform quirks framework? i know people have added some vendor-specific compat flags but i haven't seen any patches toward building it
<jcureton> ^ within the kernel drm driver
jcureton has quit [Remote host closed the connection]
stikonas has quit [Remote host closed the connection]
jcureton has joined #panfrost
<robher> jcureton: didn't know we needed one.
<jcureton> robher: i'm trying to figure out if there's generically a need for one. i'm working on getting a T720 running as a backport on an SoC that definitely needs to handle some platform quirks. i've seen some conversation around amlogic devices also having some oddities
<robher> jcureton: We need to see what the changes needed are first, then we can decide if a 'framework' is needed. Sounds like overkill is my first thought.
<jcureton> robher: kind of what i figured. see example amlogic quirk here https://github.com/superna9999/linux/commit/df28223b151155ab0edd8419b0347a7135443fd3
<jcureton> the above is a simple one, my requirement is a bit broader requiring poking quite a few registers on my SoC outside of the GPU address space.
<jcureton> no issues maintaining mine in my tree, but if there's a wider need for dealing with platform-specifics i can try to make it upstreamable
<chewitt> alyssa: output from MIDGARD_MESA_DEBUG=shaders => http://ix.io/1Oca
<chewitt> let me know if you need the other output
<chewitt> partial output from PAN_MESA_DEBUG=trace => http://ix.io/1Oce
<chewitt> partial because it overflows the weeny journal buffer before I can login to stop kodi
<alyssa> chewitt: *eyes*
<chewitt> anything useful there ^ ?
<alyssa> chewitt: It'll take me a bit to chew through so maybe?!
<chewitt> happy hunting :)
<alyssa> :)
<alyssa> Here's sth
<alyssa> shift/extra_flags getting set for LINEAR
<alyssa> Probably not the issue but it's semantically nonsense and should be fixed
<alyssa> (Set to ~0)
<alyssa> for both attrs and varyings..
<alyssa> chewitt: I'm just mighty confused given our conformance status and Kodi is a very normal GLES app
<alyssa> chewitt: Ohhhhhhh I also did some work on blending, I wonder if that's messing with something
<alyssa> That said, the blend mode used in that log is the same on my local Kodi which works fine so
<alyssa> chewitt: Question: Is this a 32-bit build with the LE Panfrost patch or honest-to-goodness 64-bit?
<chewitt> 64-bit kernel and 32-bit userspace
<chewitt> standard LE config
<alyssa> Alright, that helps
<alyssa> (Well, it doesn't help the problem, but it might help me narrow the problem :P)
<alyssa> I'm guessing 32-bit support is buggy in some way, trying to think of the easiest way to debug this
<alyssa> Let me check if CI keeps interesting logs
<chewitt> I can make an aarch64 image and test again .. but that will be overnight
<alyssa> If it's not too much trouble, that would certainly help narrow down the issue :)
<alyssa> But I know of a few outstanding 32-bit-related issues which I can bump the prio and look into now :)
<alyssa> Best case, it solves your bug
<alyssa> Worst case, it solves a bug you didn't know you had :p
<chewitt> sounds like a good idea
BenG83 has quit [Quit: Leaving]
<alyssa> chewitt: I'm collecting possible fixes in tomeu/fix32
<alyssa> Er, tomeu/mesa branch fix32
<alyssa> There's a small chance that branch will magically work better