alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
<TheCycoONE> using a mesa build from a couple of days ago on gnome-shell I got a black screen crash a couple of times. [drm:vop_crtc_atomic_flush [rockchipdrm]] *ERROR* VOP vblank IRQ stuck for 10ms, and lots of page faults.
<TheCycoONE> known?
vstehle has quit [Ping timeout: 245 seconds]
Depau has joined #panfrost
<Depau> Hello! I heard somebody here is looking for people with a snow chromebook ;)
<Depau> I've read the logs trying to do what guillaume_g did to get it working, but I'm kinda stuck. I'll tell you more about what I did
<Depau> First of all I'm running ALARM. I noticed the linux-armv7 kernel it ships with has the panfrost module, however I figured it would be better to build it from scratch from drm-misc/drm-misc-next. Mesa from alarm also builds with kmsro and panfrost by default when building for armv7, though I rebuilt it anyway just to be sure.
<Depau> I tried to run kmscube, sway. Sway runs, but it uses llvmpipe. Kmscube crashes but it says it's running with llvmpipe. I looked around and noticed that the device tree was missing the gpu, which was there in chromeos
MistahDarcy has joined #panfrost
<Depau> I hadn't found the relevant logs of this channel at that time, so I thought I'd try to learn how the dtb is built and add it myself. I decompiled the dtb from chromeos and tried to patch it myself. Looking at the patch from this pastebin I found on this channel https://pastebin.com/2944Y1Sm mine was pretty close, except for the gpu-opp-table and power-domains parts which I didn't know I should have added.
<Depau> With my patch I got an error "Unhandled fault: imprecise external abort (0x406) at 0x0000000" followed by a trace. Then I saw the pastebin patches and rebuilt the dtb accordingly
<Depau> With that patch, it would boot normally but 1) panfrost was not loaded, 2) if I loaded it manually, nothing would happen. My best guess was that "status = "disabled";" was doing what I'd expect it to do, so I commented it out. With this dtb the screen stays off. Not having a debug cable to find out what's going on in the kernel, I'm stuck here. I tried to wait a bit and see if it would connect to wifi but it did not, so I'm guessing it's a panic or a deadlock
<Depau> What I'm going to do now is try to blacklist panfrost in /etc/modprobe.d and see if it boots, then look around and try to modprobe it
<Depau> Let me know if you have any suggestions :)
<alyssa> TheCycoONE: Hrm :/ What device?
<alyssa> I've been using GNOME ok with my RK3399 setup..
MistahDarcy has quit [Quit: Leaving]
<alyssa> Depau: A few things:
<alyssa> - Snow is not supported in the kernelspace or the userspace. It may work (possibly with some changes), but it's very much YMMV; no guarantees of functionality, stability, or performance.
<alyssa> - To ensure users don't end up with broken systems by installing mesa with an unsupported Mali, panfrost userspace works on a whitelist to ensure we only load against "known good" hardware revisions.
<alyssa> - For dev, you can patch out that check at the bottom of pan_screen.c
<alyssa> - I don't do a ton with kernelspace. For general DTB questions, you may have luck on #linux-rockchip. Otherwise, robher ^^ may have some pointers?
<alyssa> - There may not be a ton of point to getting the kernel up when the userspace is still unsupported.
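The whitelist idea described above can be sketched as a simple ID lookup. This is a minimal illustration, not the check in pan_screen.c: the function names and the list of product IDs are assumptions made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative "known good" GPU product IDs. These values are assumptions
 * chosen for the example, not the list Mesa actually ships. */
static const uint32_t known_good_gpu_ids[] = { 0x750, 0x820, 0x860 };
#define KNOWN_GOOD_COUNT \
    (sizeof(known_good_gpu_ids) / sizeof(known_good_gpu_ids[0]))

/* Hypothetical helper: returns true if gpu_id appears in ids[0..count). */
bool gpu_id_in_list(uint32_t gpu_id, const uint32_t *ids, size_t count)
{
    assert(ids != NULL || count == 0);

    for (size_t i = 0; i < count; ++i) {
        if (ids[i] == gpu_id)
            return true;
    }
    return false;
}

/* Hypothetical helper: a driver would refuse to create a screen when this
 * returns false, so unsupported hardware falls back to another driver. */
bool gpu_id_is_whitelisted(uint32_t gpu_id)
{
    return gpu_id_in_list(gpu_id, known_good_gpu_ids, KNOWN_GOOD_COUNT);
}
```

Patching out the check for development amounts to making the lookup succeed for the ID of the board under test.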
<Depau> > Snow is not supported...
<Depau> If it breaks I'll send it back to amazon lol
<alyssa> Depau: Ok, I've never heard of a Mali being broken because of a bad driver.
<alyssa> And believe me, if it were possible to break one of these things, I imagine I would have done it by now ;)
<Depau> I'll check pan_screen.c. I'm doing the modprobe.d thing now. I guess it's in the kernel, I forgot to set up ccache so if it's not in mesa you'll hear from me about it tomorrow haha
<alyssa> pan_screen.c is mesa (user)
<Depau> Awesome
<Depau> Okay: after modprobe-ing it manually it prints this "panfrost 1180000.gpu: clock rate = 200000000", then it freezes completely (it doesn't answer pings any more)
<Depau> But I think this is related to the device tree, so I'll wait for robher
davidlt__ has joined #panfrost
<robher> Depau: probably some clock or power domain not enabled before accessing mali registers. Every platform seems to be a little different.
<Depau> robher: any pointers on that?
<Depau> This is the relevant part of the dts anyway: https://hastebin.com/iyiravaxac.rb
<Depau> I commented out the clock lines to match the pastebin I linked earlier. I have a very basic understanding of what it's doing and the fact that they're defined in chromeos's dtb but guillaume_g got it working without them confuses me
<robher> Depau: perhaps some of the commented out clocks need to be enabled.
<Depau> robher: Alright, I'll try. Now that I'm thinking about it, as you can see the second clock is defined as a magic constant. That's because I couldn't find it in the relevant included header. What do you think about it?
<Depau> And I also have another question: how come in Chrome OS the EDT looks good, even though the dts files in Google's kernel repo are missing stuff? I literally went through all of the included files and I couldn't find the mali definition
<Depau> It still freezes with the clocks defined
vstehle has joined #panfrost
fysa has joined #panfrost
_whitelogger has joined #panfrost
guillaume_g has joined #panfrost
pH5 has joined #panfrost
yann has quit [Ping timeout: 272 seconds]
hexdump0815 has joined #panfrost
<hexdump0815> Depau: from the irc logs of the last days (just search for guillaume_g)
<hexdump0815> Depau: dtb - https://pastebin.com/2944Y1Sm
<hexdump0815> Depau: he also posted some mesa diff, but somehow the link does not seem to work anymore
<hexdump0815> Depau: the chromeos dtb will not help much as it is for linux 3.08 and legacy mali
<Depau> hexdump0815: yeah i read those messages, I used his dtb
davidlt_ has joined #panfrost
maciejjo has quit [Ping timeout: 245 seconds]
rhyskidd has quit [Ping timeout: 245 seconds]
davidlt__ has quit [Ping timeout: 245 seconds]
tlwoerner has quit [Excess Flood]
phh has quit [Quit: No Ping reply in 180 seconds.]
guillaume_g has quit [Ping timeout: 245 seconds]
tlwoerner_ has joined #panfrost
phh has joined #panfrost
<hexdump0815> Depau: ok - then you'll most probably have to wait until guillaume_g is around again
maciejjo has joined #panfrost
guillaume_g has joined #panfrost
<rtp> Depau: why is your dts using g2d clock and not the g3d clock ? Does the panfrost kernel module load and print something ?
yann has joined #panfrost
<narmstrong> Lyude: hi, what are these SCPI issues ? can you detail and give me your config so I can reproduce ?
davidlt_ is now known as davidlt
<rtp> alyssa: after quick tests yesterday, if I exclude the whitelist/mod linear things, all I needed was that http://paste.debian.net/hidden/25a3db68/ on top of mesa master to get kmscube on my peachpit.
<rtp> alyssa: I've got black screen with weston but there are chances that it's a bug on my rootfs.
<rtp> (not tested on my snow, -ENOTIME)
cwabbott has quit [Read error: Connection reset by peer]
cwabbott has joined #panfrost
hexdump0815 has quit [Remote host closed the connection]
raster has joined #panfrost
<tomeu> rtp: I'm not sure, but I think it's the rockchip DRM driver that needs to be fixed
<tomeu> panfrost in mesa doesn't support modifiers atm
<tomeu> sorry, I meant exynos DRM before
<tomeu> the rockchip drm driver doesn't support modifiers at all atm
<tomeu> the exynos DRM driver should be able to accept INVALID
<tomeu> rtp: if you grep for DRM_FORMAT_MOD_INVALID in the kernel, you will see that all/most drivers have it within their list of accepted modifiers
<rtp> tomeu: and how must the driver react to INVALID ? from what I understand, the exynos hw can only do linear or samsung-specific tiled mode so the code seems fine
<rtp> tomeu: is INVALID something like "drm driver default modifier" ?
<tomeu> rtp: INVALID is what you put in modifiers when the client doesn't support modifiers
<tomeu> something like this is needed, I think: http://paste.debian.net/1092735/
<rtp> tomeu: I'm not convinced that it's the right thing to do. from a quick look at some existing .format_mod_supported hooks, they don't seem to do that
<tomeu> rtp: you may be right, I haven't checked in detail
<tomeu> but I feel very weird that a KMS driver would force modifier support on userspace
<tomeu> maybe ask in #dri-devel?
<rtp> I guess I'll try to find the answer by myself first, even if in the end I'll do that.
<Depau> rtp: Good catch, it was a typo. It loads and it prints "panfrost 1180000.gpu: clock rate = 200000000", then it freezes
<TheCycoONE> alyssa: kevin, same as yours
<TheCycoONE> I don't remember it happening on the earlier mesa build, so - I'll try a newer one.
<rtp> Depau: even with the right clock ? sounds like either clocking or power issue
<Depau> rtp: nope, I haven't tried yet, I'll test it later
<rtp> Depau: ok. If it's still not working, I may test it on my snow but no timeline for that (it may be tonight or in a month or... )
<Depau> No hurry ;)
afaerber has quit [Quit: Leaving]
guillaume_g has left #panfrost ["Konversation terminated!"]
afaerber has joined #panfrost
rhyskidd has joined #panfrost
herbmilleriw has quit [Quit: Konversation terminated!]
<alyssa> rtp: Nice commit, thank you for the sign off; I'm happy to push that once I finish waking up / reading email / etc :)
<alyssa> TheCycoONE: Hrm
<alyssa> Kevins seem pretty... homogeneous, if that makes sense
<alyssa> Wonder what could be different between me+CI and you..
herbmilleriw has joined #panfrost
<TheCycoONE> it's possible I just hit a bad build if I'm using master right?
<alyssa> TheCycoONE: Quite, yeah
<alyssa> master runs CI but it's not, ahem, used perfectly
<TheCycoONE> I just bumped to d35af71 and linux 5.2.2 - I'll make noise if it happens again, but so far ok
<alyssa> :+1:
<alyssa> rtp: I just did some cleanup around the patch (to make sure we don't regress anything else in the process)
<alyssa> Your patch is off to CI for review and should be pushed in about half an hour :)
<rtp> alyssa: great ! thanks !
<alyssa> rtp: Thank *you* for the debug! :)
BenG83 has joined #panfrost
LinguinePenguiny has joined #panfrost
LinguinePenguiny has quit [Remote host closed the connection]
LinguinePenguiny has joined #panfrost
LinguinePenguiny has quit [Read error: Connection reset by peer]
tlwoerner_ is now known as tlwoerner
<alyssa> rtp: Congratulations, you're officially a Panfrost contributor :)
<Depau> rtp: I fixed it. No luck, it still freezes
<Depau> Does anybody know how to debug the kernel on a chromebook? I wasn't able to find any uart on the board
<rtp> alyssa: I'm not convinced that such a small patch would really make me a contributor :)
<rtp> Depau: freezing at the same place ?
<Depau> rtp: Yep
<rtp> hm. annoying. Try booting with clk_ignore_unused=1
LinguinePenguiny has joined #panfrost
hlmjr has joined #panfrost
jcureton has joined #panfrost
herbmilleriw has quit [Ping timeout: 244 seconds]
<alyssa> rtp: Hey, otherwise T6xx would still be totally broken
<alyssa> If you want to really become a contributor, start working through glmark2-es2-drm :)
<alyssa> -bbuild would be a good start, if kmscube works it should Just Work
<alyssa> same with -bshading
<alyssa> After that would be -btexture
<alyssa> You'll probably need an errata workaround, uh
<alyssa> t6xx has an erratum where the texture payload pointers (the texture_descriptor payload) are... off-by-one, somehow?
<alyssa> For a single texture / single mip level / 2D, what that means is the payload pointer is specified twice in a row.
<alyssa> For a mipmapped texture or a cubemap, I'm not sure how that works; you'll have to grab a trace and find out (or experiment -- for t860, I did it trial and error since that was easier lol)
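For the single texture, single mip level, 2D case described above, the workaround can be sketched as writing the same pointer into two consecutive payload slots. The helper name, its parameters, and the flat `uint64_t` payload layout are assumptions for illustration; the mipmapped and cubemap layouts are left out because the discussion does not establish them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill the payload for one 2D texture with a single
 * mip level. When duplicate_pointer is set (the t6xx workaround described
 * in the discussion), the level address is written twice in a row.
 * Returns the number of payload entries written. */
size_t fill_single_level_payload(uint64_t *payload, size_t capacity,
                                 uint64_t level_addr, bool duplicate_pointer)
{
    size_t needed = duplicate_pointer ? 2 : 1;

    assert(payload != NULL);
    assert(capacity >= needed);

    for (size_t i = 0; i < needed; ++i)
        payload[i] = level_addr;

    return needed;
}
```

A caller would pass `duplicate_pointer = true` only for hardware revisions known to need the workaround.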
pH5 has quit [Quit: bye]
* rcf might need to dig into some of the glmark2 bugs he's encountered on the T760
* tomeu thinks that's an excellent idea :)
<rtp> alyssa: I've no clue about all this graphical stuff. too "high level", I usually stop at kernel boundaries. I can track regressions, test but I'm not sure I can do more.
afaerber has quit [Ping timeout: 276 seconds]
<alyssa> rcf: Go for it!
<alyssa> rtp: Everyone starts somewhere!
LinguinePenguiny has quit [Ping timeout: 244 seconds]
<alyssa> chrisf: What's the right way to handle OOM tests (other than freezing the entire system, which is what happens now)?
<anarsoul> rtp: well, mesa code is not that messy, if you stick to running piglit/deqp to test your work it's more or less the same as hacking kernel
<anarsoul> also panfrost (and lima) are written in C
<alyssa> anarsoul: BTW, how is lima in low-memory situations?
<anarsoul> alyssa: haven't tested
<alyssa> robher: I wonder if it would make sense to have the userspace BO cache evict stuff on its own (even when not low mem) just over time
<anarsoul> but probably not very good, we're allocating huge buffers for tile heap (2MB per context)
<alyssa> anarsoul: Pff
<anarsoul> alyssa: I know
<alyssa> We're allocating like 64MB for the tile heap .. :P
<alyssa> Recently slashed that to 16MB (!)
<anarsoul> alyssa: mali4x0 can't render more than 64k vertices at a time :)
<alyssa> Yes, I suppose that solves that problem
<anarsoul> so vertex job needs to be split
<anarsoul> and it's not implemented atm (there's WIP MR though)
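Splitting a draw into chunks that each stay under a per-job vertex limit can be sketched as below. This is only an illustration of the arithmetic: the names are hypothetical, "64k" is read as 65536 by assumption, and a real implementation would also have to respect primitive boundaries and index buffers, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: "64k vertices" is taken to mean 65536 for this example. */
#define EX_MAX_VERTICES_PER_JOB UINT32_C(65536)

/* One contiguous range of vertices to submit as its own job. */
struct vertex_range {
    uint32_t first;
    uint32_t count;
};

/* Hypothetical helper: split [first, first + count) into ranges of at most
 * max_per_job vertices. Returns the number of ranges written to out. */
size_t split_vertex_job(uint32_t first, uint32_t count, uint32_t max_per_job,
                        struct vertex_range *out, size_t capacity)
{
    size_t n = 0;

    assert(max_per_job > 0);
    assert(out != NULL || capacity == 0);

    while (count > 0) {
        uint32_t chunk = count < max_per_job ? count : max_per_job;

        assert(n < capacity);
        out[n].first = first;
        out[n].count = chunk;
        ++n;

        first += chunk;
        count -= chunk;
    }
    return n;
}
```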
<robher> alyssa: Not really sure. What do other drivers do?
<anarsoul> so refract demo doesn't render correctly :)
<alyssa> robher: Haven't looked, I'll ask in #dri-devel
* robher wonders why every driver has to go reimplement this stuff...
<alyssa> Sigh.
<robher> alyssa: IIRC, freedreno doesn't do any aging. vc4 has an in kernel cache (for some private buffers in addition to the userspace caching) which does.
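The aging idea raised here, where idle buffers are evicted after some time even without memory pressure, can be sketched with a small fixed-size cache. Everything below is hypothetical (struct names, slot count, time units); it is not how vc4, freedreno, or Panfrost implement their caches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BO_CACHE_SLOTS 8

/* One idle buffer object parked in the cache (hypothetical layout). */
struct cached_bo {
    bool     occupied;  /* slot currently holds an idle BO */
    uint32_t handle;    /* opaque buffer handle */
    uint64_t freed_at;  /* time, in seconds, the BO entered the cache */
};

struct bo_cache {
    struct cached_bo slots[BO_CACHE_SLOTS];
};

void bo_cache_init(struct bo_cache *cache)
{
    assert(cache != NULL);
    memset(cache, 0, sizeof(*cache));
}

/* Park an idle BO in the cache. Returns false if the cache is full. */
bool bo_cache_put(struct bo_cache *cache, uint32_t handle, uint64_t now)
{
    assert(cache != NULL);
    for (size_t i = 0; i < BO_CACHE_SLOTS; ++i) {
        if (!cache->slots[i].occupied) {
            cache->slots[i].occupied = true;
            cache->slots[i].handle = handle;
            cache->slots[i].freed_at = now;
            return true;
        }
    }
    return false;
}

/* Drop every BO that has sat idle for longer than max_age seconds.
 * A real driver would release the buffer here; the sketch only clears
 * the slot. Returns the number of evicted entries. */
size_t bo_cache_evict_stale(struct bo_cache *cache, uint64_t now,
                            uint64_t max_age)
{
    size_t evicted = 0;

    assert(cache != NULL);
    for (size_t i = 0; i < BO_CACHE_SLOTS; ++i) {
        struct cached_bo *bo = &cache->slots[i];

        if (bo->occupied && now >= bo->freed_at &&
            now - bo->freed_at > max_age) {
            bo->occupied = false;
            ++evicted;
        }
    }
    return evicted;
}
```

Calling `bo_cache_evict_stale` from the allocation path or a periodic hook keeps the cache from holding memory indefinitely.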
raster has quit [Remote host closed the connection]
<chrisf> alyssa: ideally you'd be more robust than that...
<alyssa> chrisf: Indeed. What are the better options then..?
<chrisf> do you have particular tests in mind?
<alyssa> chrisf: Anything in stress.memory.* I think
<alyssa> Or alternatively, GNOME and Chromium running at the same time with 'frost ;)
<chrisf> do you just run the machine out of pages, or another constraint?
stikonas has joined #panfrost
raster has joined #panfrost
raster has quit [Read error: Connection reset by peer]
Elpaulo has joined #panfrost
BenG has joined #panfrost
BenG83 has quit [Ping timeout: 246 seconds]
BenG has quit [Client Quit]
stikonas has quit [Read error: Connection reset by peer]
stikonas_ has joined #panfrost
Elpaulo has quit [Ping timeout: 246 seconds]
<jcureton> i've noticed that on two different T720 platforms that the first job submitted to the GPU always returns a DATA_INVALID_FAULT. does anyone know why that is?
<daniels> rtp: it sounds like kmscube is screwed up - it needs to check whether or not the kms driver supports modifiers and not try to use modifiers if the driver doesn't support it
<rtp> daniels: the exynos drm driver does support modifiers (only linear one). it's panfrost forcing it to the invalid mod.
<rtp> alyssa: glmark2-es2-drm -bbuild is crashing badly. Current workaround for this is http://paste.debian.net/hidden/7634e528/. I've not tried yet to understand what's happening.
* rtp off to bed
<daniels> rtp: ok, in that case if it's receiving INVALID then it should never be passing it to KMS for any reason
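The rule stated here, that an INVALID modifier must never be handed to KMS, can be sketched as a small decision helper on the client side. The constants mirror the values of DRM_FORMAT_MOD_LINEAR and DRM_FORMAT_MOD_INVALID from drm_fourcc.h but are redefined under example names so the sketch builds without kernel headers; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as DRM_FORMAT_MOD_LINEAR / DRM_FORMAT_MOD_INVALID in
 * drm_fourcc.h, redefined here so the example is self-contained. */
#define EX_FORMAT_MOD_LINEAR  UINT64_C(0)
#define EX_FORMAT_MOD_INVALID UINT64_C(0x00ffffffffffffff)

/* Which framebuffer-creation path the client should take. */
enum fb_path {
    FB_PATH_NO_MODIFIERS,  /* e.g. drmModeAddFB2: layout is implicit */
    FB_PATH_WITH_MODIFIER, /* e.g. drmModeAddFB2WithModifiers */
    FB_PATH_UNSUPPORTED    /* explicit modifier that KMS does not list */
};

/* Hypothetical helper: INVALID means the buffer producer has no explicit
 * modifier, so it is never forwarded to KMS. Any other modifier is only
 * forwarded if the KMS plane advertises it. */
enum fb_path choose_fb_path(uint64_t buffer_modifier,
                            const uint64_t *kms_modifiers, size_t kms_count)
{
    assert(kms_modifiers != NULL || kms_count == 0);

    if (buffer_modifier == EX_FORMAT_MOD_INVALID)
        return FB_PATH_NO_MODIFIERS;

    for (size_t i = 0; i < kms_count; ++i) {
        if (kms_modifiers[i] == buffer_modifier)
            return FB_PATH_WITH_MODIFIER;
    }
    return FB_PATH_UNSUPPORTED;
}
```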
stikonas_ has quit [Remote host closed the connection]
<alyssa> chrisf: Not entirely sure, but that sounds about right sadly..
<alyssa> jcureton: Ooooooo, now *that* is an interesting bug!
<alyssa> So, from the description, I'm very tempted to blame the polygon list
<alyssa> The reason is, this was one of the big disastrous differences between T720 and T760 that slowed down the port so much
<alyssa> On T760+, the polygon list is managed entirely by the GPU.
<anarsoul> so T720 has PLBU like Utgard?
<alyssa> The buffer itself, both with the blob and Panfrost (*), is set to be "invisible". That is, although the CPU allocates it, it is not actually mapped to be CPU readable/writeable.
<alyssa> We just supply the buffer and we're ok.
<alyssa> T720 is... Well, we were rather surprised to see this pattern broken.
<alyssa> With the T720 blob, the polygon list is seemingly mapped CPU-writeable (?!)
<alyssa> And, if we just leave it uninitialized, the GPU DATA_INVALID_FAULTs out when we try to render !
<anarsoul> I wonder if it has PLBU commands in it :)
<alyssa> Rather, Tomeu discovered we had to initialize the polygon list ourselves, CPU side. Which is somewhat bizarre, honestly.
<alyssa> anarsoul: I don't *think* so, but conceptually similar..?
<alyssa> jcureton: I'm wondering if we're only initializing half of what we're supposed to be, and the GPU is supposed to initialize the other half
<alyssa> Or we're initializing it half-wrong ;)
<alyssa> ....Speaking of, where is that initialization code?
* alyssa doesn't see it at all -- maybe that code isn't even here in mainline and the fact it's missing explains the behaviour
<alyssa> Tomeu wrote code for that; maybe it never got merged for some reason.
<alyssa> Subject: [RFC] panfrost/midgard: Hack some bits to get things working on T720
<alyssa> was the patch in question
<alyssa> So the solution was (roughly -- this violates aliasing rules):
<alyssa> *((uint32_t *) tiler_polygon_list_body) = 0xA0000000;
<alyssa> (I think -- trying to forward port the logic; we changed a lot of the names since that patch was written early June)
<alyssa> I don't have hardware to test on, but that should be something to spring off
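The one-line store quoted above can be written without the pointer cast by copying the value with `memcpy`, which sidesteps the aliasing concern mentioned. This is a sketch only: the function name and size parameter are hypothetical, the magic value is taken verbatim from the discussion, and it has not been tested on T720 hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value quoted in the discussion; what it encodes is not stated there. */
#define POLYGON_LIST_BODY_MAGIC UINT32_C(0xA0000000)

/* Hypothetical helper: write the magic word at the start of the CPU-mapped
 * polygon list body. memcpy is used instead of a casted store so the write
 * is well defined regardless of the buffer's declared type. */
void init_polygon_list_body(void *body, size_t size)
{
    const uint32_t magic = POLYGON_LIST_BODY_MAGIC;

    assert(body != NULL);
    assert(size >= sizeof(magic));

    memcpy(body, &magic, sizeof(magic));
}
```

Compilers typically lower a fixed-size `memcpy` like this to a single 32-bit store, so there is no cost compared with the cast.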
<alyssa> Meanwhile, I'm going to prototype a new RA/scheduler approach since I grok how to do this now !
* alyssa has been looking at some of the PPIR compiler improvements (@enunes has been doing cool stuff!) since Midgard is fancier PPIR
<anarsoul> alyssa: dropping some code is always nice :)
<anarsoul> if I understand correctly midgard is a better organized utgard pp
<anarsoul> utgard pp isn't as bad as utgard gp but it could have been better
<anarsoul> (I'm still wondering who decided that 6 regs is enough...)
<alyssa> anarsoul: Yes, Midgard (the ISA, not the cmdstream) is pretty perfect for the architecture
<alyssa> It's just that the architecture turned out to be... unfashionable... hence Bifrost
<HdkR> <3