#panfrost on 2019-07-23 — irc logs at freenode.irclog.whitequark.org

2019-02-15 17:52 alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature

01:06 <TheCycoONE> using a mesa build from a couple days ago on gnome-shell I got a black screen crash a couple times. [drm:vop_crtc_atomic_flush [rockchipdrm]] *ERROR* VOP vblank IRQ stuck for 10ms, and lots of page faults.

01:06 <TheCycoONE> known?

01:26 vstehle has quit [Ping timeout: 245 seconds]

02:14 Depau has joined #panfrost

02:15 <Depau> Hello! I heard somebody here is looking for people with a snow chromebook ;)

02:16 <Depau> I've read the logs trying to do what guillaume_g did to get it working, but I'm kinda stuck. I'll tell you more about what I did

02:19 <Depau> First of all I'm running ALARM. I noticed the linux-armv7 kernel it ships with has the panfrost module; however, I figured it would be better to build it from scratch from drm-misc/drm-misc-next. Mesa from alarm also builds with kmsro and panfrost by default when building for armv7, though I rebuilt it anyway just to be sure.

02:21 <Depau> I tried to run kmscube, sway. Sway runs, but it uses llvmpipe. Kmscube crashes but it says it's running with llvmpipe. I looked around and noticed that the device tree was missing the gpu, which was there in chromeos

02:24 MistahDarcy has joined #panfrost

02:25 <Depau> I hadn't found the relevant logs of this channel at that time, so I thought I'd try to learn how the dtb is built and add it myself. I decompiled the dtb from chromeos and tried to patch it myself. Looking at the patch from this pastebin I found on this channel, https://pastebin.com/2944Y1Sm, mine was pretty close, except for the gpu-opp-table and power-domains parts which I didn't know I should have added.

02:26 <Depau> With my patch I got an error "Unhandled fault: imprecise external abort (0x406) at 0x0000000" followed by a trace. Then I saw the pastebin patches and rebuilt the dtb accordingly

02:30 <Depau> With that patch, it would boot normally but 1) panfrost was not loaded, 2) if I loaded it manually, nothing would happen. My best guess was that "status = "disabled";" was doing what I'd expect it to do, so I commented it out. With this dtb the screen stays off. Not having a debug cable to find out what's going on in the kernel, I'm stuck here. I tried to wait a bit and see if it would connect to wifi but it did not, so I'm guessing it's a panic or a deadlock

02:33 <Depau> What I'm going to do now is try to blacklist panfrost in /etc/modprobe.d and see if it boots, then look around and try to modprobe it

02:34 <Depau> Let me know if you have any suggestions :)

02:41 <alyssa> TheCycoONE: Hrm :/ What device?

02:41 <alyssa> I've been using GNOME ok with my RK3399 setup..

02:44 MistahDarcy has quit [Quit: Leaving]

02:49 <alyssa> Depau: A few things:

02:50 <alyssa> - Snow is not supported in the kernelspace or the userspace. It may work (possibly with some changes), but it's very much YMMV; no guarantees of functionality, stability, or performance.

02:51 <alyssa> - To ensure users don't end up with broken systems by installing mesa with an unsupported Mali, panfrost userspace works on a whitelist to ensure we only load against "known good" hardware revisions.

02:51 <alyssa> - For dev, you can patch out that check at the bottom of pan_screen.c
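
The whitelist described above boils down to a switch on the GPU's product ID that refuses to load on anything not listed. A minimal sketch, assuming a hypothetical function name (`gpu_is_known_good`) and illustrative IDs (0x750 for T760, 0x860 for T860); the real check at the bottom of pan_screen.c may list different models:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of a "known good" GPU whitelist.
 * The product IDs are illustrative assumptions, not Mesa's actual list. */
static bool gpu_is_known_good(uint32_t gpu_id)
{
    switch (gpu_id) {
    case 0x750: /* assumed: Mali-T760 */
    case 0x860: /* assumed: Mali-T860 */
        return true;
    default:
        /* Unlisted model: refuse to load, so the system falls back to
         * software rendering instead of a possibly broken driver. */
        return false;
    }
}
```

Patching the check out for development, as suggested, amounts to making this function return true for the unlisted model.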

02:52 <alyssa> - I don't do a ton with kernelspace. For general DTB questions, you may have luck on #linux-rockchip. Otherwise, robher ^^ may have some pointers?

02:52 <alyssa> - There may not be a ton of point to getting the kernel up when the userspace is still unsupported.

02:53 <Depau> > Snow is not supported...

02:53 <Depau> If it breaks I'll send it back to amazon lol

02:53 <alyssa> Depau: Ok, I've never heard of a Mali being broken because of a bad driver.

02:53 <alyssa> And believe me, if it were possible to break one of these things, I imagine I would have done it by now ;)

02:55 <Depau> I'll check pan_screen.c. I'm doing the modprobe.d thing now. I guess it's in the kernel; I forgot to set up ccache, so if it's not in mesa you'll hear from me about it tomorrow haha

02:55 <alyssa> pan_screen.c is mesa (user)

02:56 <Depau> Awesome

03:03 <Depau> Okay: after modprobe-ing it manually it prints this "panfrost 1180000.gpu: clock rate = 200000000", then it freezes completely (it doesn't answer pings any more)

03:04 <Depau> But I think this is related to the device tree, so I'll wait for robher

03:07 davidlt__ has joined #panfrost

03:08 <robher> Depau: probably some clock or power domain not enabled before accessing mali registers. Every platform seems to be a little different.

03:14 <Depau> robher: any pointers on that?

03:14 <Depau> This is the relevant part of the dts anyway: https://hastebin.com/iyiravaxac.rb

03:15 <Depau> I commented out the clock lines to match the pastebin I linked earlier. I have a very basic understanding of what it's doing and the fact that they're defined in chromeos's dtb but guillaume_g got it working without them confuses me

03:16 <robher> Depau: perhaps some of the commented out clocks need to be enabled.

03:21 <Depau> robher: Alright, I'll try. Now that I'm thinking about it, as you can see the second clock is defined as a magic constant. That's because I couldn't find it in the relevant included header. What do you think about it?

03:24 <Depau> And I also have another question: how the hell does the EDT look good in chrome os when the dts's in google's kernel repo are missing stuff? I literally went through all of the included files and I couldn't find the mali definition

04:03 <Depau> It still freezes with the clocks defined

05:00 vstehle has joined #panfrost

05:43 fysa has joined #panfrost

05:50 _whitelogger has joined #panfrost

06:27 guillaume_g has joined #panfrost

06:30 pH5 has joined #panfrost

06:49 yann has quit [Ping timeout: 272 seconds]

07:02 hexdump0815 has joined #panfrost

07:04 <hexdump0815> Depau: from the irc logs of the last days (just search for guillaume_g)

07:04 <hexdump0815> Depau: dtb - https://pastebin.com/2944Y1Sm

07:07 <hexdump0815> Depau: he also posted some mesa diff, but somehow the link does not seem to work anymore

07:08 <hexdump0815> Depau: the chromeos dtb will not help much as it is for linux 3.8 and legacy mali

07:19 <Depau> hexdump0815: yeah i read those messages, I used his dtb

07:20 davidlt_ has joined #panfrost

07:23 maciejjo has quit [Ping timeout: 245 seconds]

07:23 rhyskidd has quit [Ping timeout: 245 seconds]

07:23 davidlt__ has quit [Ping timeout: 245 seconds]

07:23 tlwoerner has quit [Excess Flood]

07:23 phh has quit [Quit: No Ping reply in 180 seconds.]

07:23 guillaume_g has quit [Ping timeout: 245 seconds]

07:23 tlwoerner_ has joined #panfrost

07:23 phh has joined #panfrost

07:28 <hexdump0815> Depau: ok - then you'll most probably have to wait until guillaume_g is around again

07:29 maciejjo has joined #panfrost

07:29 guillaume_g has joined #panfrost

07:30 <rtp> Depau: why is your dts using the g2d clock and not the g3d clock? Does the panfrost kernel module load and print something?

07:45 yann has joined #panfrost

07:56 <narmstrong> Lyude: hi, what are these SCPI issues ? can you detail and give me your config so I can reproduce ?

07:57 davidlt_ is now known as davidlt

08:02 <rtp> alyssa: after quick tests yesterday, if I exclude the whitelist/mod linear things, all I needed was that http://paste.debian.net/hidden/25a3db68/ on top of mesa master to get kmscube on my peachpit.

08:03 <rtp> alyssa: I've got black screen with weston but there are chances that it's a bug on my rootfs.

08:03 <rtp> (not tested on my snow, -ENOTIME)

09:14 cwabbott has quit [Read error: Connection reset by peer]

09:15 cwabbott has joined #panfrost

09:47 hexdump0815 has quit [Remote host closed the connection]

10:49 raster has joined #panfrost

10:56 <tomeu> rtp: I'm not sure, but I think it's the rockchip DRM driver that needs to be fixed

10:56 <tomeu> panfrost in mesa doesn't support modifiers atm

10:57 <tomeu> sorry, I meant exynos DRM before

10:57 <tomeu> the rockchip drm driver doesn't support modifiers at all atm

10:57 <tomeu> the exynos DRM driver should be able to accept INVALID

10:59 <tomeu> rtp: if you grep for DRM_FORMAT_MOD_INVALID in the kernel, you will see that all/most drivers have it within their list of accepted modifiers

11:06 <rtp> tomeu: and how must the driver react to INVALID? from what I understand, the exynos hw can only do linear or samsung-specific tiled mode so the code seems fine

11:07 <rtp> tomeu: is INVALID something like "drm driver default modifier"?

11:09 <tomeu> rtp: INVALID is what you put in modifiers when the client doesn't support modifiers

11:09 <tomeu> something like this is needed, I think: http://paste.debian.net/1092735/

11:26 <rtp> tomeu: I'm not convinced that it's the right thing to do. from a quick look at some existing .format_mod_supported hooks, they don't seem to do that

11:31 <tomeu> rtp: you may be right, I haven't checked in detail

11:31 <tomeu> but it feels very weird that a KMS driver would force modifier support on userspace

11:32 <tomeu> maybe ask in #dri-devel?

11:38 <rtp> I guess I'll try to find the answer by myself first, even if in the end I'll do that.

11:55 <Depau> rtp839496: Good catch, it was a typo. It loads and it prints "panfrost 1180000.gpu: clock rate = 200000000", then it freezes

12:13 <TheCycoONE> alyssa: kevin, same as yours

12:14 <TheCycoONE> I don't remember it happening on the earlier mesa build, so - I'll try a newer one.

12:29 <rtp> Depau: even with the right clock? sounds like either a clocking or a power issue

12:31 <Depau> rtp: nope, I haven't tried yet, I'll test it later

12:41 <rtp> Depau: ok. If it's still not working, I may test it on my snow but no timeline for that (it may be tonight or in a month or... )

12:42 <Depau> No hurries ;)

12:52 afaerber has quit [Quit: Leaving]

12:57 guillaume_g has left #panfrost ["Konversation terminated!"]

13:09 afaerber has joined #panfrost

13:17 rhyskidd has joined #panfrost

13:32 herbmilleriw has quit [Quit: Konversation terminated!]

13:32 <alyssa> rtp: Nice commit, thank you for the sign off; I'm happy to push that once I finish waking up / reading email / etc :)

13:33 <alyssa> TheCycoONE: Hrm

13:33 <alyssa> Kevins seem pretty... homogenous, if that makes sense

13:33 <alyssa> Wonder what could be different between me+CI and you..

13:37 herbmilleriw has joined #panfrost

13:41 <TheCycoONE> it's possible I just hit a bad build if I'm using master right?

13:41 <alyssa> TheCycoONE: Quite, yeah

13:41 <alyssa> master runs CI but it's not, ahem, used perfectly

13:44 <TheCycoONE> I just bumped to d35af71 and linux 5.2.2 - I'll make noise if it happens again, but so far ok

13:44 <alyssa> :+1:

13:58 <alyssa> rtp: I just did some cleanup around the patch (to make sure we don't regress anything else in the process)

13:58 <alyssa> Your patch is off to CI for review and should be pushed in about half an hour :)

14:08 <rtp> alyssa: great! thanks!

14:12 <alyssa> rtp: Thank *you* for the debug! :)

14:30 BenG83 has joined #panfrost

14:35 LinguinePenguiny has joined #panfrost

14:36 LinguinePenguiny has quit [Remote host closed the connection]

14:49 LinguinePenguiny has joined #panfrost

15:01 LinguinePenguiny has quit [Read error: Connection reset by peer]

15:01 tlwoerner_ is now known as tlwoerner

15:05 <alyssa> rtp: Congratulations, you're officially a Panfrost contributor :)

15:07 <Depau> rtp: I fixed it. No luck, it still freezes

15:07 <Depau> Does anybody know how to debug the kernel on a chromebook? I wasn't able to find any uart on the board

15:25 <rtp> alyssa: I'm not convinced that such a small patch would really make me a contributor :)

15:25 <rtp> Depau: freezing at the same place?

15:27 <Depau> rtp: Yep

15:29 <rtp> hm. annoying. Try booting with clk_ignore_unused=1

15:30 LinguinePenguiny has joined #panfrost

15:38 hlmjr has joined #panfrost

15:40 jcureton has joined #panfrost

15:41 herbmilleriw has quit [Ping timeout: 244 seconds]

15:53 <alyssa> rtp: Hey, otherwise T6xx would still be totally broken

15:53 <alyssa> If you want to really become a contributor, start working through glmark2-es2-drm :)

15:53 <alyssa> -bbuild would be a good start, if kmscube works it should Just Work

15:53 <alyssa> same with -bshading

15:54 <alyssa> After that would be -btexture

15:54 <alyssa> You'll probably need an errata workaround, uh

15:54 <alyssa> t6xx has an erratum where the texture payload pointers (the texture_descriptor payload) are... off-by-one, somehow?

15:54 <alyssa> For a single texture / single mip level / 2D, what that means is the payload pointer is specified twice in a row.

15:55 <alyssa> For a mipmapped texture or a cubemap, I'm not sure how that works; you'll have to grab a trace and find out (or experiment -- for t860, I did it trial and error since that was easier lol)
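
The T6xx workaround for the one case that is understood (single texture, single mip level, 2D) can be sketched as follows. This assumes the payload is an array of 64-bit GPU addresses, one per mip level; the helper name is hypothetical, and it deliberately refuses multi-level layouts since the log says those still need a trace to work out:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fills a texture payload (array of GPU addresses).
 * Returns the number of 64-bit words written, or -1 if the layout is
 * unsupported or `out` is too small. */
static int emit_texture_payload(uint64_t *out, size_t out_len,
                                const uint64_t *level_ptrs, size_t n_levels,
                                bool t6xx_errata)
{
    if (n_levels == 0)
        return -1;

    if (!t6xx_errata) {
        /* Normal case: one pointer per mip level. */
        if (out_len < n_levels)
            return -1;
        for (size_t i = 0; i < n_levels; ++i)
            out[i] = level_ptrs[i];
        return (int)n_levels;
    }

    /* T6xx erratum, single-level 2D only: the payload pointer is
     * specified twice in a row. Mipmaps/cubemaps are unknown here. */
    if (n_levels != 1 || out_len < 2)
        return -1;
    out[0] = level_ptrs[0];
    out[1] = level_ptrs[0];
    return 2;
}
```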

16:22 pH5 has quit [Quit: bye]

16:22 * rcf might need to dig into some of the glmark2 bugs he's encountered on the T760

16:26 * tomeu thinks that's an excellent idea :)

16:59 <rtp> alyssa: I've no clue about all this graphical stuff. too "high level", I usually stop at kernel boundaries. I can track regressions, test but I'm not sure I can do more.

17:04 afaerber has quit [Ping timeout: 276 seconds]

17:05 <alyssa> rcf: Go for it!

17:05 <alyssa> rtp: Everyone starts somewhere!

17:13 LinguinePenguiny has quit [Ping timeout: 244 seconds]

17:25 <alyssa> chrisf: What's the right way to handle OOM tests (other than freezing the entire system, which is what happens now)?

17:26 <anarsoul> rtp: well, mesa code is not that messy, if you stick to running piglit/deqp to test your work it's more or less the same as hacking kernel

17:27 <anarsoul> also panfrost (and lima) are written in C

17:29 <alyssa> anarsoul: BTW, how is lima in low-memory situations?

17:29 <anarsoul> alyssa: haven't tested

17:31 <alyssa> robher: I wonder if it would make sense to have the userspace BO cache evict stuff on its own (even when not low mem) just over time

17:31 <anarsoul> but probably not very good, we're allocating huge buffers for tile heap (2MB per context)

17:32 <alyssa> anarsoul: Pff

17:32 <anarsoul> alyssa: I know

17:32 <alyssa> We're allocating like 64MB for the tile heap .. :P

17:32 <alyssa> Recently slashed that to 16MB (!)

17:32 <anarsoul> alyssa: mali4x0 can't render more than 64k vertices at a time :)

17:32 <alyssa> Yes, I suppose that solves that problem

17:33 <anarsoul> so vertex job needs to be split

17:33 <anarsoul> and it's not implemented atm (there's WIP MR though)
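
Splitting a draw to respect the 64k-vertex limit mentioned above can be sketched for the simplest case, a non-indexed triangle list, where each chunk must hold whole triangles. This takes 65536 as the per-job limit from the message above; the function name and interface are hypothetical and this is not the WIP lima MR:

```c
#include <stdint.h>

#define MAX_VERTICES_PER_JOB 65536u /* limit quoted for Mali-4x0 */

/* Hypothetical helper: splits a non-indexed triangle list of `count`
 * vertices into per-job vertex counts written to `sizes`.
 * Returns the number of jobs, or 0 if there is nothing to draw or more
 * than `max_jobs` jobs would be needed. */
static unsigned split_triangle_draw(uint32_t count, uint32_t *sizes,
                                    unsigned max_jobs)
{
    /* Largest multiple of 3 under the limit (65535), so no triangle
     * straddles two jobs. */
    const uint32_t chunk = MAX_VERTICES_PER_JOB - (MAX_VERTICES_PER_JOB % 3u);

    count -= count % 3u; /* drop an incomplete trailing triangle */

    uint64_t needed = ((uint64_t)count + chunk - 1) / chunk;
    if (needed > max_jobs)
        return 0;

    unsigned jobs = 0;
    while (count > 0) {
        uint32_t n = count < chunk ? count : chunk;
        sizes[jobs++] = n;
        count -= n;
    }
    return jobs;
}
```

Indexed draws and strips/fans would need more care (index ranges, primitive restart, shared vertices at chunk boundaries).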

17:33 <robher> alyssa: Not really sure. What do other drivers do?

17:33 <anarsoul> so refract demo doesn't render correctly :)

17:33 <alyssa> robher: Haven't looked, I'll ask in #dri-devel

17:34 * robher wonders why every driver has to go reimplement this stuff...

17:34 <alyssa> Sigh.

17:36 <robher> alyssa: IIRC, freedreno doesn't do any aging. vc4 has an in kernel cache (for some private buffers in addition to the userspace caching) which does.
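
One reading of evicting cached BOs "just over time" is a periodic sweep that frees any buffer sitting idle in the cache longer than a threshold, roughly the aging robher describes for vc4's in-kernel cache. A minimal sketch with a hypothetical fixed-size cache; the struct, slot count, millisecond clock, and `release` callback are all assumptions, not Panfrost's actual BO cache:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 64 /* assumed cache capacity */

/* Hypothetical cache entry; a real driver would track a GEM handle. */
struct cached_bo {
    uint32_t handle;
    size_t size;
    uint64_t freed_at_ms; /* monotonic time the BO entered the cache */
    bool in_use;          /* slot currently holds a BO */
};

struct bo_cache {
    struct cached_bo slots[CACHE_SLOTS];
};

/* Put a freed BO in the cache. Returns false if the cache is full, in
 * which case the caller should free the BO directly. */
static bool bo_cache_put(struct bo_cache *c, uint32_t handle, size_t size,
                         uint64_t now_ms)
{
    for (size_t i = 0; i < CACHE_SLOTS; ++i) {
        if (!c->slots[i].in_use) {
            c->slots[i] = (struct cached_bo){ handle, size, now_ms, true };
            return true;
        }
    }
    return false;
}

/* Release every cached BO idle for at least max_age_ms. `release` stands
 * in for the real "free the GEM object" call and may be NULL.
 * Assumes now_ms comes from a monotonic clock. */
static unsigned bo_cache_evict_old(struct bo_cache *c, uint64_t now_ms,
                                   uint64_t max_age_ms,
                                   void (*release)(uint32_t handle))
{
    unsigned evicted = 0;
    for (size_t i = 0; i < CACHE_SLOTS; ++i) {
        struct cached_bo *bo = &c->slots[i];
        if (bo->in_use && now_ms - bo->freed_at_ms >= max_age_ms) {
            if (release)
                release(bo->handle);
            bo->in_use = false;
            ++evicted;
        }
    }
    return evicted;
}
```

Calling the sweep on each new allocation (or from a timer) bounds how long idle memory stays pinned, which also helps in the low-memory cases discussed next.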

17:36 raster has quit [Remote host closed the connection]

17:50 <chrisf> alyssa: ideally you'd be more robust than that...

17:50 <alyssa> chrisf: Indeed. What are the better options then..?

17:55 <chrisf> do you have particular tests in mind?

17:55 <alyssa> chrisf: Anything in stress.memory.* I think

17:55 <alyssa> Or alternatively, GNOME and Chromium running at the same time with 'frost ;)

17:59 <chrisf> do you just run the machine out of pages, or another constraint?

18:13 stikonas has joined #panfrost

18:28 raster has joined #panfrost

18:56 raster has quit [Read error: Connection reset by peer]

19:45 Elpaulo has joined #panfrost

20:45 BenG has joined #panfrost

20:47 BenG83 has quit [Ping timeout: 246 seconds]

20:49 BenG has quit [Client Quit]

20:54 stikonas has quit [Read error: Connection reset by peer]

20:55 stikonas_ has joined #panfrost

21:29 Elpaulo has quit [Ping timeout: 246 seconds]

21:31 <jcureton> i've noticed that on two different T720 platforms that the first job submitted to the GPU always returns a DATA_INVALID_FAULT. does anyone know why that is?

21:46 <daniels> rtp: it sounds like kmscube is screwed up - it needs to check whether or not the kms driver supports modifiers and not try to use modifiers if the driver doesn't support it

22:09 <rtp> daniels: the exynos drm driver does support modifiers (only the linear one). it's panfrost forcing it to the invalid mod.

22:10 <rtp> alyssa: glmark2-es2-drm -bbuild is crashing badly. Current workaround for this is http://paste.debian.net/hidden/7634e528/. I've not tried yet to understand what's happening.

22:10 * rtp off to bed

22:14 <daniels> rtp: ok, in that case if it's receiving INVALID then it should never be passing it to KMS for any reason
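
daniels' rule can be written as a small decision: a buffer whose modifier is INVALID carries "no modifier" and must go through the legacy `drmModeAddFB2` path, while an explicit modifier is only forwarded (e.g. via `drmModeAddFB2WithModifiers`) when KMS advertises support, for instance through `DRM_CAP_ADDFB2_MODIFIERS`. A minimal sketch with a hypothetical function name; the constants are redefined locally with the values of `DRM_FORMAT_MOD_LINEAR` and `DRM_FORMAT_MOD_INVALID` from drm_fourcc.h so it builds without libdrm, and it does not reproduce kmscube's actual code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of DRM_FORMAT_MOD_LINEAR / DRM_FORMAT_MOD_INVALID. */
#define SKETCH_MOD_LINEAR  0x0ULL
#define SKETCH_MOD_INVALID 0x00ffffffffffffffULL

/* Hypothetical helper: true if the modifier should be handed to KMS,
 * false if the caller should create the framebuffer without one. */
static bool should_pass_modifier_to_kms(uint64_t modifier,
                                        bool kms_supports_modifiers)
{
    if (modifier == SKETCH_MOD_INVALID)
        return false; /* INVALID means "no modifier": never forward it */
    return kms_supports_modifiers;
}
```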

22:50 stikonas_ has quit [Remote host closed the connection]

23:23 <alyssa> chrisf: Not entirely sure, but that sounds about right sadly..

23:23 <alyssa> jcureton: Ooooooo, now *that* is an interesting bug!

23:23 <alyssa> So, from the description, I'm very tempted to blame the polygon list

23:23 <alyssa> The reason is, this was one of the big disastrous differences between T720 and T760 that slowed down the port so much

23:25 <alyssa> On T760+, the polygon list is managed entirely by the GPU.

23:25 <anarsoul> so T720 has PLBU like Utgard?

23:25 <alyssa> The buffer itself, both with the blob and Panfrost (*), is set to be "invisible". That is, although the CPU allocates it, it is not actually mapped to be CPU readable/writeable.

23:25 <alyssa> We just supply the buffer and we're ok.

23:26 <alyssa> T720 is... Well, we were rather surprised to see this pattern broken.

23:26 <alyssa> With the T720 blob, the polygon list is seemingly mapped CPU-writeable (?!)

23:26 <alyssa> And, if we just leave it uninitialized, the GPU DATA_INVALID_FAULTs out when we try to render!

23:26 <anarsoul> I wonder if it has PLBU commands in it :)

23:26 <alyssa> Rather, Tomeu discovered we had to initialize the polygon list ourselves, CPU side. Which is somewhat bizarre, honestly.

23:27 <alyssa> anarsoul: I don't *think* so, but conceptually similar..?

23:27 <alyssa> jcureton: I'm wondering if we're only initializing half of what we're supposed to, and the GPU is supposed to initialize the other half

23:27 <alyssa> Or we're initializing it half-wrong ;)

23:28 <alyssa> ....Speaking of, where is that initialization code?

23:28 * alyssa doesn't see it at all -- maybe that code isn't even here in mainline and the fact it's missing explains the behaviour

23:29 <alyssa> Tomeu wrote code for that; maybe it never got merged for some reason.

23:30 <alyssa> Subject: [RFC] panfrost/midgard: Hack some bits to get things working on T720

23:30 <alyssa> was the patch in question

23:31 <alyssa> So the solution was (roughly -- this violates aliasing rules):

23:31 <alyssa> *((uint32_t *) tiler_polygon_list_body) = 0xA0000000;

23:32 <alyssa> (I think -- trying to forward port the logic; we changed a lot of the names since that patch was written early June)
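
The one-liner above writes through a `uint32_t *` cast, which alyssa notes violates aliasing rules. The usual aliasing-safe alternative is `memcpy` into the buffer. A minimal sketch, assuming the polygon list body is CPU-mapped (as the T720 blob apparently does) and that this single header word is what the RFC patch needed; the function name is hypothetical, and whether one word is sufficient is exactly what the log is still unsure about:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes the 0xA0000000 header word from the RFC
 * patch into the start of the tiler polygon list body. memcpy avoids the
 * strict-aliasing problem of dereferencing a cast pointer and copies the
 * value in host byte order (little-endian on the ARM hosts involved). */
static void init_polygon_list_header(void *tiler_polygon_list_body)
{
    const uint32_t header = 0xA0000000u;
    memcpy(tiler_polygon_list_body, &header, sizeof(header));
}
```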

23:34 <alyssa> I don't have hardware to test on, but that should be something to spring off

23:35 <alyssa> Meanwhile, I'm going to prototype a new RA/scheduler approach since I grok how to do this now!

23:40 * alyssa has been looking at some of the PPIR compiler improvements (@enunes has been doing cool stuff!) since Midgard is fancier PPIR

23:40 <anarsoul> alyssa: dropping some code is always nice :)

23:43 <anarsoul> if I understand correctly, midgard is a better-organized utgard pp

23:43 <anarsoul> utgard pp isn't as bad as utgard gp but it could have been better

23:43 <anarsoul> (I'm still wondering who decided that 6 regs is enough...)

23:48 <alyssa> anarsoul: Yes, Midgard (the ISA, not the cmdstream) is pretty perfect for the architecture

23:48 <alyssa> It's just that the architecture turned out to be... unfashionable... hence Bifrost

23:49 <HdkR> <3