<TheCycoONE>
Using a Mesa build from a couple of days ago on gnome-shell, I got a black-screen crash a couple of times: [drm:vop_crtc_atomic_flush [rockchipdrm]] *ERROR* VOP vblank IRQ stuck for 10ms, and lots of page faults.
<TheCycoONE>
known?
vstehle has quit [Ping timeout: 245 seconds]
Depau has joined #panfrost
<Depau>
Hello! I heard somebody here is looking for people with a snow chromebook ;)
<Depau>
I've read the logs and tried to do what guillaume_g did to get it working, but I'm kinda stuck. I'll tell you more about what I did
<Depau>
First of all, I'm running ALARM. I noticed the linux-armv7 kernel it ships with has the panfrost module; however, I figured it would be better to build it from scratch from drm-misc/drm-misc-next. Mesa from ALARM also builds with kmsro and panfrost by default when building for armv7, though I rebuilt it anyway just to be sure.
<Depau>
I tried to run kmscube and sway. Sway runs, but it uses llvmpipe. Kmscube crashes, but it says it's running with llvmpipe. I looked around and noticed that the device tree was missing the GPU, which was there in Chrome OS
MistahDarcy has joined #panfrost
<Depau>
I hadn't found the relevant logs of this channel at that time, so I thought I'd try to learn how the dtb is built and add it myself. I decompiled the dtb from Chrome OS and tried to patch it myself. Looking at the patch from this pastebin I found on this channel https://pastebin.com/2944Y1Sm, mine was pretty close, except for the gpu-opp-table and power-domains parts, which I didn't know I should have added.
<Depau>
With my patch I got an error "Unhandled fault: imprecise external abort (0x406) at 0x0000000" followed by a trace. Then I saw the pastebin patches and rebuilt the dtb accordingly
<Depau>
With that patch, it would boot normally, but 1) panfrost was not loaded, and 2) if I loaded it manually, nothing would happen. My best guess was that "status = "disabled";" was doing what I'd expect it to do, so I commented it out. With this dtb the screen stays off. Not having a debug cable to find out what's going on in the kernel, I'm stuck here. I tried to wait a bit and see if it would connect to wifi, but it did not, so I'm guessing it's a panic or a deadlock
<Depau>
What I'm going to do now is try to blacklist panfrost in /etc/modprobe.d and see if it boots, then look around and try to modprobe it
<Depau>
Let me know if you have any suggestions :)
<alyssa>
TheCycoONE: Hrm :/ What device?
<alyssa>
I've been using GNOME ok with my RK3399 setup..
MistahDarcy has quit [Quit: Leaving]
<alyssa>
Depau: A few things:
<alyssa>
- Snow is not supported in the kernelspace or the userspace. It may work (possibly with some changes), but it's very much YMMV; no guarantees of functionality, stability, or performance.
<alyssa>
- To ensure users don't end up with broken systems by installing Mesa with an unsupported Mali, panfrost userspace works on a whitelist to ensure we only load against "known good" hardware revisions.
<alyssa>
- For dev, you can patch out that check at the bottom of pan_screen.c
<alyssa>
- I don't do a ton with kernelspace. For general DTB questions, you may have luck on #linux-rockchip. Otherwise, robher ^^ may have some pointers?
<alyssa>
- There may not be a ton of point to getting the kernel up when the userspace is still unsupported.
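A minimal sketch of the kind of whitelist check described above, in C. Everything here is hypothetical: the function names, the dev_override flag (standing in for patching the check out at the bottom of pan_screen.c), and the example GPU product IDs are assumptions, not Mesa's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "known good" list. The IDs are illustrative only;
 * the real list in Mesa's pan_screen.c may differ. */
static const unsigned known_good_gpu_ids[] = { 0x750, 0x860 };

static bool gpu_id_in_list(unsigned gpu_id, const unsigned *ids, size_t count)
{
    assert(ids != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (ids[i] == gpu_id)
            return true;
    return false;
}

/* Returns true if the userspace driver should bind to this GPU.
 * dev_override models a developer patching the check out (hypothetical). */
static bool sketch_should_load(unsigned gpu_id, bool dev_override)
{
    if (dev_override)
        return true;
    return gpu_id_in_list(gpu_id, known_good_gpu_ids,
                          sizeof(known_good_gpu_ids) / sizeof(known_good_gpu_ids[0]));
}
```

With this shape, an unknown GPU ID simply makes the driver decline to load (falling back to something like llvmpipe) instead of risking a broken desktop.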
<Depau>
> Snow is not supported...
<Depau>
If it breaks I'll send it back to amazon lol
<alyssa>
Depau: Ok, I've never heard of a Mali being broken because of a bad driver.
<alyssa>
And believe me, if it were possible to break one of these things, I imagine I would have done it by now ;)
<Depau>
I'll check pan_screen.c. I'm doing the modprobe.d thing now. I guess it's in the kernel; I forgot to set up ccache, so if it's not in Mesa you'll hear from me about it tomorrow haha
<alyssa>
pan_screen.c is mesa (user)
<Depau>
Awesome
<Depau>
Okay: after modprobe-ing it manually it prints this "panfrost 1180000.gpu: clock rate = 200000000", then it freezes completely (it doesn't answer pings any more)
<Depau>
But I think this is related to the device tree, so I'll wait for robher
davidlt__ has joined #panfrost
<robher>
Depau: probably some clock or power domain not enabled before accessing mali registers. Every platform seems to be a little different.
<Depau>
I commented out the clock lines to match the pastebin I linked earlier. I have a very basic understanding of what it's doing and the fact that they're defined in chromeos's dtb but guillaume_g got it working without them confuses me
<robher>
Depau: perhaps some of the commented out clocks need to be enabled.
<Depau>
robher: Alright, I'll try. Now that I'm thinking about it, as you can see the second clock is defined as a magic constant. That's because I couldn't find it in the relevant included header. What do you think about it?
<Depau>
And I also have another question: how the hell does the EDT look good in Chrome OS when, looking at the dts files in Google's kernel repo, there's stuff missing? I literally went through all of the included files and I couldn't find the Mali definition
<Depau>
It still freezes with the clocks defined
vstehle has joined #panfrost
fysa has joined #panfrost
_whitelogger has joined #panfrost
guillaume_g has joined #panfrost
pH5 has joined #panfrost
yann has quit [Ping timeout: 272 seconds]
hexdump0815 has joined #panfrost
<hexdump0815>
Depau: from the irc logs of the last days (just search for guillaume_g)
<hexdump0815>
Depau: he also posted some mesa diff, but somehow the link does not seem to work anymore
<hexdump0815>
Depau: the Chrome OS dtb will not help much, as it is for Linux 3.8 and the legacy Mali driver
<Depau>
hexdump0815: yeah i read those messages, I used his dtb
davidlt_ has joined #panfrost
maciejjo has quit [Ping timeout: 245 seconds]
rhyskidd has quit [Ping timeout: 245 seconds]
davidlt__ has quit [Ping timeout: 245 seconds]
tlwoerner has quit [Excess Flood]
phh has quit [Quit: No Ping reply in 180 seconds.]
guillaume_g has quit [Ping timeout: 245 seconds]
tlwoerner_ has joined #panfrost
phh has joined #panfrost
<hexdump0815>
Depau: ok - then you'll most probably have to wait until guillaume_g is around again
maciejjo has joined #panfrost
guillaume_g has joined #panfrost
<rtp>
Depau: why is your dts using g2d clock and not the g3d clock ? Does the panfrost kernel module load and print something ?
yann has joined #panfrost
<narmstrong>
Lyude: hi, what are these SCPI issues ? can you detail and give me your config so I can reproduce ?
davidlt_ is now known as davidlt
<rtp>
alyssa: after quick tests yesterday, if I exclude the whitelist/mod linear things, all I needed was that http://paste.debian.net/hidden/25a3db68/ on top of mesa master to get kmscube on my peachpit.
<rtp>
alyssa: I've got a black screen with weston, but there's a good chance it's a bug in my rootfs.
<rtp>
(not tested on my snow, -ENOTIME)
cwabbott has quit [Read error: Connection reset by peer]
cwabbott has joined #panfrost
hexdump0815 has quit [Remote host closed the connection]
raster has joined #panfrost
<tomeu>
rtp: I'm not sure, but I think it's the rockchip DRM driver that needs to be fixed
<tomeu>
panfrost in mesa doesn't support modifiers atm
<tomeu>
sorry, I meant exynos DRM before
<tomeu>
the rockchip drm driver doesn't support modifiers at all atm
<tomeu>
the exynos DRM driver should be able to accept INVALID
<tomeu>
rtp: if you grep for DRM_FORMAT_MOD_INVALID in the kernel, you will see that all/most drivers have it within their list of accepted modifiers
<rtp>
tomeu: and how must the driver react to INVALID? From what I understand, the exynos hw can only do linear or a Samsung-specific tiled mode, so the code seems fine
<rtp>
tomeu: is INVALID something like "drm driver default modifier" ?
<tomeu>
rtp: INVALID is what you put in modifiers when the client doesn't support modifiers
<rtp>
tomeu: I'm not convinced that it's the right thing to do. From a quick look at some existing .format_mod_supported hooks, they don't seem to do that
<tomeu>
rtp: you may be right, I haven't checked in detail
<tomeu>
but I find it very weird that a KMS driver would force modifier support on userspace
<tomeu>
maybe ask in #dri-devel?
<rtp>
I guess I'll try to find the answer by myself first, even if I end up asking there in the end.
<Depau>
rtp: Good catch, it was a typo. It loads and it prints "panfrost 1180000.gpu: clock rate = 200000000", then it freezes
<TheCycoONE>
alyssa: kevin, same as yours
<TheCycoONE>
I don't remember it happening on the earlier mesa build, so - I'll try a newer one.
<rtp>
Depau: even with the right clock ? sounds like either clocking or power issue
<Depau>
rtp: nope, I haven't tried yet, I'll test it later
<rtp>
Depau: ok. If it's still not working, I may test it on my snow but no timeline for that (it may be tonight or in a month or... )
<Depau>
No hurries ;)
afaerber has quit [Quit: Leaving]
guillaume_g has left #panfrost ["Konversation terminated!"]
afaerber has joined #panfrost
rhyskidd has joined #panfrost
herbmilleriw has quit [Quit: Konversation terminated!]
<alyssa>
rtp: Nice commit, thank you for the sign off; I'm happy to push that once I finish waking up / reading email / etc :)
<alyssa>
TheCycoONE: Hrm
<alyssa>
Kevins seem pretty... homogenous, if that makes sense
<alyssa>
Wonder what could be different between me+CI and you..
herbmilleriw has joined #panfrost
<TheCycoONE>
it's possible I just hit a bad build if I'm using master right?
<alyssa>
TheCycoONE: Quite, yeah
<alyssa>
master runs CI but it's not, ahem, used perfectly
<TheCycoONE>
I just bumped to d35af71 and linux 5.2.2 - I'll make noise if it happens again, but so far ok
<alyssa>
:+1:
<alyssa>
rtp: I just did some cleanup around the patch (to make sure we don't regress anything else in the process)
<alyssa>
Your patch is off to CI for review and should be pushed in about half an hour :)
<rtp>
alyssa: great ! thanks !
<alyssa>
rtp: Thank *you* for the debug! :)
BenG83 has joined #panfrost
LinguinePenguiny has joined #panfrost
LinguinePenguiny has quit [Remote host closed the connection]
LinguinePenguiny has joined #panfrost
LinguinePenguiny has quit [Read error: Connection reset by peer]
tlwoerner_ is now known as tlwoerner
<alyssa>
rtp: Congratulations, you're officially a Panfrost contributor :)
<Depau>
rtp: I fixed it. No luck, it still freezes
<Depau>
Does anybody know how to debug the kernel on a chromebook? I wasn't able to find any uart on the board
<rtp>
alyssa: I'm not convinced that such a small patch would really make me a contributor :)
<rtp>
Depau: freezing at the same place ?
<Depau>
rtp: Yep
<rtp>
hm. annoying. Try booting with clk_ignore_unused=1
LinguinePenguiny has joined #panfrost
hlmjr has joined #panfrost
jcureton has joined #panfrost
herbmilleriw has quit [Ping timeout: 244 seconds]
<alyssa>
rtp: Hey, otherwise T6xx would still be totally broken
<alyssa>
If you want to really become a contributor, start working through glmark2-es2-drm :)
<alyssa>
-bbuild would be a good start, if kmscube works it should Just Work
<alyssa>
same with -bshading
<alyssa>
After that would be -btexture
<alyssa>
You'll probably need an errata workaround, uh
<alyssa>
t6xx has an erratum where the texture payload pointers (the texture_descriptor payload) are... off-by-one, somehow?
<alyssa>
For a single texture / single mip level / 2D, what that means is the payload pointer is specified twice in a row.
<alyssa>
For a mipmapped texture or a cubemap, I'm not sure how that works; you'll have to grab a trace and find out (or experiment -- for t860, I did it trial and error since that was easier lol)
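A sketch of what the single-texture workaround might look like, assuming (hypothetically) that the payload is a flat array of 64-bit GPU addresses written right after the descriptor. The helper name and the quirk flag are made up for illustration; mipmapped and cubemap layouts are deliberately left out since, as noted, the rule for them is unknown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Write the texture payload for a single-level 2D texture.
 * On T6xx (per the erratum above) the one payload pointer is written
 * twice in a row; elsewhere it is written once.
 * Returns the number of pointers written. */
static size_t emit_texture_payload(uint64_t *payload, size_t capacity,
                                   uint64_t bo_gpu_va, bool t6xx_quirk)
{
    size_t n = t6xx_quirk ? 2 : 1;

    assert(payload != NULL && capacity >= n);
    for (size_t i = 0; i < n; i++)
        payload[i] = bo_gpu_va;
    return n;
}
```

The returned count lets the caller size the descriptor correctly whether or not the quirk is active.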
pH5 has quit [Quit: bye]
* rcf
might need to dig into some of the glmark2 bugs he's encountered on the T760
* tomeu
thinks that's an excellent idea :)
<rtp>
alyssa: I've no clue about all this graphical stuff. too "high level", I usually stop at kernel boundaries. I can track regressions, test but I'm not sure I can do more.
afaerber has quit [Ping timeout: 276 seconds]
<alyssa>
rcf: Go for it!
<alyssa>
rtp: Everyone starts somewhere!
LinguinePenguiny has quit [Ping timeout: 244 seconds]
<alyssa>
chrisf: What's the right way to handle OOM tests (other than freezing the entire system, which is what happens now)?
<anarsoul>
rtp: well, mesa code is not that messy, if you stick to running piglit/deqp to test your work it's more or less the same as hacking kernel
<anarsoul>
also panfrost (and lima) are written in C
<alyssa>
anarsoul: BTW, how is lima in low-memory situations?
<anarsoul>
alyssa: haven't tested
<alyssa>
robher: I wonder if it would make sense to have the userspace BO cache evict stuff on its own (even when not low on memory), just over time
<anarsoul>
but probably not very good, we're allocating huge buffers for tile heap (2MB per context)
<alyssa>
anarsoul: Pff
<anarsoul>
alyssa: I know
<alyssa>
We're allocating like 64MB for the tile heap .. :P
<alyssa>
Recently slashed that to 16MB (!)
<anarsoul>
alyssa: mali4x0 can't render more than 64k vertices at a time :)
<alyssa>
Yes, I suppose that solves that problem
<anarsoul>
so vertex job needs to be split
<anarsoul>
and it's not implemented atm (there's WIP MR though)
<robher>
alyssa: Not really sure. What do other drivers do?
<anarsoul>
so refract demo doesn't render correctly :)
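A sketch of how a draw might be split to stay under such a limit, for the simple case of a non-indexed triangle list. The function and callback names are hypothetical, and the limit is passed in as a parameter because the exact hardware bound (64k, as stated above) is treated as an assumption here; indexed draws and strips/fans need more care than this.

```c
#include <assert.h>
#include <stddef.h>

/* Callback that would emit one vertex job covering [start, start + count). */
typedef void (*emit_chunk_fn)(unsigned start, unsigned count, void *user);

/* Split a non-indexed triangle-list draw into chunks of at most max_vertices,
 * rounding the chunk size down to whole triangles so none straddles two jobs.
 * emit may be NULL to just count chunks. Returns the number of chunks. */
static unsigned split_triangle_draw(unsigned start, unsigned count,
                                    unsigned max_vertices,
                                    emit_chunk_fn emit, void *user)
{
    assert(max_vertices >= 3);
    unsigned chunk_max = max_vertices - max_vertices % 3;
    unsigned chunks = 0;

    while (count > 0) {
        unsigned n = count < chunk_max ? count : chunk_max;
        if (emit != NULL)
            emit(start, n, user);
        start += n;
        count -= n;
        chunks++;
    }
    return chunks;
}
```

For example, with a limit of 65536 a 150000-vertex draw becomes three jobs of 65535, 65535 and 18930 vertices.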
<alyssa>
robher: Haven't looked, I'll ask in #dri-devel
* robher
wonders why every driver has to go reimplement this stuff...
<alyssa>
Sigh.
<robher>
alyssa: IIRC, freedreno doesn't do any aging. vc4 has an in kernel cache (for some private buffers in addition to the userspace caching) which does.
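A minimal sketch of time-based aging for a userspace BO cache, along the lines of what alyssa floats and what robher says vc4 does in-kernel. The structures and function names are hypothetical; free() stands in for whatever a real driver would do to release the buffer (unmapping it and closing the GEM handle).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A freed buffer object kept around for reuse. */
struct cached_bo {
    size_t size;            /* would be used for size-matched reuse (not shown) */
    uint64_t freed_at_ms;   /* time it was returned to the cache */
    struct cached_bo *next;
};

struct bo_cache {
    struct cached_bo *head; /* most recently freed first */
};

/* Return a BO to the cache instead of releasing it. */
static void bo_cache_put(struct bo_cache *cache, struct cached_bo *bo, uint64_t now_ms)
{
    assert(cache != NULL && bo != NULL);
    bo->freed_at_ms = now_ms;
    bo->next = cache->head;
    cache->head = bo;
}

/* Release every BO that has sat unused for longer than max_age_ms,
 * whether or not the system is under memory pressure. Returns the count. */
static size_t bo_cache_evict_stale(struct bo_cache *cache, uint64_t now_ms,
                                   uint64_t max_age_ms)
{
    size_t evicted = 0;
    struct cached_bo **link = &cache->head;

    while (*link != NULL) {
        struct cached_bo *bo = *link;
        assert(now_ms >= bo->freed_at_ms);
        if (now_ms - bo->freed_at_ms > max_age_ms) {
            *link = bo->next;   /* unlink, then release */
            free(bo);
            evicted++;
        } else {
            link = &bo->next;
        }
    }
    return evicted;
}
```

Running the eviction pass periodically (for instance on each new allocation) bounds how long idle buffers linger, at the cost of occasionally reallocating a buffer that would otherwise have been reused.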
raster has quit [Remote host closed the connection]
<chrisf>
alyssa: ideally you'd be more robust than that...
<alyssa>
chrisf: Indeed. What are the better options then..?
<chrisf>
do you have particular tests in mind?
<alyssa>
chrisf: Anything in stress.memory.* I think
<alyssa>
Or alternatively, GNOME and Chromium running at the same time with 'frost ;)
<chrisf>
do you just run the machine out of pages, or another constraint?
stikonas has joined #panfrost
raster has joined #panfrost
raster has quit [Read error: Connection reset by peer]
Elpaulo has joined #panfrost
BenG has joined #panfrost
BenG83 has quit [Ping timeout: 246 seconds]
BenG has quit [Client Quit]
stikonas has quit [Read error: Connection reset by peer]
stikonas_ has joined #panfrost
Elpaulo has quit [Ping timeout: 246 seconds]
<jcureton>
i've noticed that on two different T720 platforms that the first job submitted to the GPU always returns a DATA_INVALID_FAULT. does anyone know why that is?
<daniels>
rtp: it sounds like kmscube is screwed up - it needs to check whether or not the kms driver supports modifiers and not try to use modifiers if the driver doesn't support it
<rtp>
daniels: the exynos DRM driver does support modifiers (only the linear one). It's panfrost forcing it to the INVALID modifier.
<rtp>
alyssa: glmark2-es2-drm -bbuild is crashing badly. Current workaround for this is http://paste.debian.net/hidden/7634e528/. I've not tried yet to understand what's happening.
* rtp
off to bed
<daniels>
rtp: ok, in that case if it's receiving INVALID then it should never be passing it to KMS for any reason
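A sketch of the decision daniels describes, assuming a client that has already picked a modifier for its buffer. The two constants mirror DRM_FORMAT_MOD_LINEAR and DRM_FORMAT_MOD_INVALID from the kernel's drm_fourcc.h and are copied locally so the example builds without libdrm; the enum and function names are hypothetical. Treating the implicit layout as linear when KMS lacks modifier support is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as DRM_FORMAT_MOD_LINEAR / DRM_FORMAT_MOD_INVALID in drm_fourcc.h. */
#define SKETCH_MOD_LINEAR  0x0000000000000000ULL
#define SKETCH_MOD_INVALID 0x00ffffffffffffffULL

enum fb_path {
    FB_PATH_IMPLICIT,    /* legacy AddFB2-style call, no modifier passed */
    FB_PATH_EXPLICIT,    /* pass the modifier (DRM_MODE_FB_MODIFIERS path) */
    FB_PATH_UNSUPPORTED, /* buffer layout cannot be described to this KMS */
};

static enum fb_path choose_fb_path(uint64_t modifier, bool kms_supports_modifiers)
{
    /* INVALID means "no explicit modifier": never forward it to KMS. */
    if (modifier == SKETCH_MOD_INVALID)
        return FB_PATH_IMPLICIT;
    if (kms_supports_modifiers)
        return FB_PATH_EXPLICIT;
    /* Assumption: without modifier support, the implicit layout is linear. */
    if (modifier == SKETCH_MOD_LINEAR)
        return FB_PATH_IMPLICIT;
    return FB_PATH_UNSUPPORTED;
}
```

In the exynos case discussed above, a buffer tagged INVALID would take the implicit path, so the INVALID value never reaches the driver's modifier check.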
stikonas_ has quit [Remote host closed the connection]
<alyssa>
chrisf: Not entirely sure, but that sounds about right, sadly..
<alyssa>
jcureton: Ooooooo, now *that* is an interesting bug!
<alyssa>
So, from the description, I'm very tempted to blame the polygon list
<alyssa>
The reason is, this was one of the big disastrous differences between T720 and T760 that slowed down the port so much
<alyssa>
On T760+, the polygon list is managed entirely by the GPU.
<anarsoul>
so T720 has PLBU like Utgard?
<alyssa>
The buffer itself, both with the blob and Panfrost (*), is set to be "invisible". That is, although the CPU allocates it, it is not actually mapped to be CPU readable/writeable.
<alyssa>
We just supply the buffer and we're ok.
<alyssa>
T720 is... Well, we were rather surprised to see this pattern broken.
<alyssa>
With the T720 blob, the polygon list is seemingly mapped CPU-writeable (?!)
<alyssa>
And, if we just leave it uninitialized, the GPU DATA_INVALID_FAULTs out when we try to render !
<anarsoul>
I wonder if it has PLBU commands in it :)
<alyssa>
Rather, Tomeu discovered we had to initialize the polygon list ourselves, CPU side. Which is somewhat bizarre, honestly.
<alyssa>
anarsoul: I don't *think* so, but conceptually similar..?
<alyssa>
jcureton: I'm wondering if we're only initializing half of what we're supposed to, and the GPU is supposed to initialize the other half
<alyssa>
Or we're initializing it half-wrong ;)
<alyssa>
....Speaking of, where is that initialization code?
* alyssa
doesn't see it at all -- maybe that code isn't even here in mainline and the fact it's missing explains the behaviour
<alyssa>
Tomeu wrote code for that; maybe it never got merged for some reason.
<alyssa>
Subject: [RFC] panfrost/midgard: Hack some bits to get things working on T720
<alyssa>
was the patch in question
<alyssa>
So the solution was (roughly -- this violates aliasing rules):