#panfrost on 2019-07-11 — irc logs at freenode.irclog.whitequark.org

2019-02-15 17:52 alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature

01:05 vstehle has quit [Ping timeout: 246 seconds]

03:32 _whitelogger has joined #panfrost

03:34 fysa has joined #panfrost

04:00 NeuroScr has joined #panfrost

04:35 NeuroScr has quit [Quit: NeuroScr]

05:00 vstehle has joined #panfrost

06:00 <tomeu> chewitt: don't know about amlogic :/

06:01 <tomeu> chewitt: wonder if we could find someone to be our amlogic champion :)

06:45 <chewitt> narmstrong is the person i'd normally point fingers at, but he's currently enjoying his summer vacation

06:48 <chewitt> tomeu: do you have any Amlogic S912 hardware?

06:49 <tomeu> chewitt: nope

06:57 <chewitt> can I fix that?

06:58 <tomeu> not sure, it doesn't seem that scalable

06:59 <tomeu> to collabora, it makes more sense to focus on rockchip, as we have made big investments in other parts of the hw

07:00 <tomeu> if we had a customer that wanted panfrost on amlogic, all would change, of course :)

07:39 stikonas has joined #panfrost

07:43 raster has joined #panfrost

07:46 stikonas has quit [Remote host closed the connection]

07:50 <chewitt> understood

07:54 <tomeu> it's not just getting things to work, btw, it's also keeping them working

07:54 <tomeu> so we need CI, and we need to keep that working so it doesn't disrupt development

07:56 raster has quit [Remote host closed the connection]

07:57 raster has joined #panfrost

08:02 <chewitt> Neil is the right guy.. Baylibre have all the right stuff in their lava lab afaik

08:09 <tomeu> we just need to wait then :)

09:07 <tomeu> alyssa: wonder if our CI should run astyle

09:07 <tomeu> so we don't have to choose later between having inconsistent style and having to re-run it again on the code base

09:12 <tomeu> alyssa: what's the reason for tiler_dummy, btw?

09:12 <tomeu> can't we just use the tiler_polygon_list BO?

09:53 Elpaulo has joined #panfrost

10:43 <tomeu> alyssa: it's starting to look as if the hierarchy mask was totally different on T720

10:56 <tomeu> otherwise, the biggest difference I see with the blob is the depth stencil stuff in mali_shader_meta

11:21 davidlt has joined #panfrost

11:24 <tomeu> alyssa: have hacked the cmdstream to match libmali in terms of ds and blend, but keep seeing the same NULL dereference from the GPU

11:25 <tomeu> wonder if it could be related to how now the tiler jobs depend on the set_value job, but the vertex jobs execute before

11:25 <tomeu> that doesn't match what libmali is doing here

11:25 <tomeu> (set_value is submitted first in the chain without dependencies, then come the vertex jobs and then the tiler jobs)

11:43 <tomeu> alyssa: have pushed to https://gitlab.freedesktop.org/tomeu/mesa/commits/panfrost-T720

11:45 <tomeu> this is the working cmdstream: http://paste.debian.net/1091233/

11:45 <tomeu> this one corresponds to the pile of hacks I pushed above: http://paste.debian.net/1091234/

11:45 <tomeu> and this is what I get in dmesg: http://paste.debian.net/1091235/

11:47 <tomeu> the DATA_INVALID_FAULT is due to the polygon_list being empty, because the previous job chain didn't finish successfully

11:47 <tomeu> because of the NULL access

11:47 <tomeu> ah, that's a single frame from kmscube, but with only one face of the cube being rendered

12:49 guillaume_g has joined #panfrost

12:50 <alyssa> tomeu: I don't really want CI to run astyle because I mean

12:50 <alyssa> I'd rather push badly spaced but working code than no code at all

12:50 <alyssa> and CI slows things down

12:51 <alyssa> tomeu: I'm not 100% certain about tiler-dummy, but this is a very firm guess:

12:51 <guillaume_g> Hi. I am trying to use mali-t604 on my chromebook snow, but it fails on boot, I get:

12:51 <guillaume_g> [ 14.002299] panfrost 11800000.mali: clock rate = 200000000

12:51 <guillaume_g> [ 14.002411] Unhandled fault: asynchronous external abort (0x211) at 0x00000000

12:51 <guillaume_g> How could I move forward?

12:51 <alyssa> Mali T604 isn't supported (yet)..

12:52 <guillaume_g> it seems to fail a panfrost_gpu_soft_reset

12:52 <alyssa> Although that does seem to be a legitimate bug in the kernel

12:52 <alyssa> robher: ^^

12:52 <guillaume_g> alyssa: ok. What is missing?

12:52 <alyssa> tomeu: SET_VALUE jobs reset the polygon list; TILER jobs add to the polygon list; FRAGMENT jobs read the polygon list

12:53 <alyssa> tomeu: So if you have no draws, there is no TILER but also no SET_VALUE.

12:53 <guillaume_g> maybe my DTB fragment is wrong.

12:53 <alyssa> tomeu: .....Meaning your fragment-only frame will actually end up redrawing whatever you drew last frame!

12:54 <tomeu> alyssa: ah, but the CPU writes to it :)

12:54 <alyssa> tomeu: So the blob's apparent solution is to have the actual per-FBO polygon list, and also a dummy empty one they keep. They switch them out depending on whether there are draws

12:54 <alyssa> tomeu: Well, it's possible that on T720, the internal polygon list structures were invalid if zeroed out

12:55 <tomeu> yeah, that's how it looks

12:55 <alyssa> But with a field or two in the header set, they became valid but still empty

12:55 <alyssa> Regardless, you really do need the tiler_dummy
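
The polygon-list lifecycle described above (SET_VALUE resets, TILER appends, FRAGMENT reads) can be sketched as a toy model. This is only an illustration under stated assumptions: `PolygonList`, `render_frame` and `render_frame_without_dummy` are hypothetical names, the list of strings stands in for the real hardware structure, and nothing here reflects the actual Mesa or libmali code.

```python
class PolygonList:
    """Toy stand-in for a polygon list buffer (not the real hardware layout)."""

    def __init__(self):
        self.entries = []


def set_value(plist):
    # SET_VALUE job: resets the polygon list.
    plist.entries.clear()


def tiler(plist, draw):
    # TILER job: adds to the polygon list.
    plist.entries.append(draw)


def fragment(plist):
    # FRAGMENT job: reads the polygon list.
    return list(plist.entries)


def render_frame(fbo_list, dummy_list, draws):
    """Pick the per-FBO list when there are draws, the empty dummy otherwise."""
    if not draws:
        # No draws means no TILER and no SET_VALUE, so read the dummy list.
        return fragment(dummy_list)
    set_value(fbo_list)
    for draw in draws:
        tiler(fbo_list, draw)
    return fragment(fbo_list)


def render_frame_without_dummy(fbo_list, draws):
    """Same flow without the dummy: a draw-less frame reads stale data."""
    if draws:
        set_value(fbo_list)
        for draw in draws:
            tiler(fbo_list, draw)
    return fragment(fbo_list)


fbo, dummy = PolygonList(), PolygonList()
print(render_frame(fbo, dummy, ["cube"]))     # frame with one draw
print(render_frame(fbo, dummy, []))           # fragment-only frame, dummy list
print(render_frame_without_dummy(fbo, []))    # fragment-only frame, stale list
```

In this model the last call returns the previous frame's draw, which is the "redrawing whatever you drew last frame" failure the dummy list avoids.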

12:55 <tomeu> ah, haven't looked in the header

13:23 <robher> guillaume_g, alyssa: need to look at kbase and see if there are any reset related errata for t604. I don't recall any, but there's lots for t604 in snow. Reset needs to work well on it because my understanding is that h/w has to be reset several times a second.

13:25 <alyssa> robher: That's terrifying.

13:26 <tomeu> well, there's a bunch of microseconds in a second :p

13:26 <robher> Can we just buy everyone that asks about snow a new chromebook...

13:27 <tomeu> works for me, I'm not looking forward to debugging panfrost on t6xx with 64-bit descriptors...

13:27 <alyssa> Juno :o

13:32 <HdkR> Someone still has a juno?

13:32 <HdkR> Madness

13:32 davidlt has quit [Remote host closed the connection]

13:32 <tomeu> hmm, not sure there's a 64-bit DDK for Juno

13:33 davidlt has joined #panfrost

13:35 <alyssa> tomeu: I don't see the difference with depth/stencil? The defaults between Gallium and libmali are probably different without going out of spec

13:36 <alyssa> tomeu: I thought Juno *is* 64-bit

13:36 <tomeu> alyssa: yeah, I think you can ignore all my suggestions before

13:36 <tomeu> I'm really out of ideas :)

13:37 <alyssa> tomeu: I'm looking

13:37 <alyssa> First weirdness:

13:37 <alyssa> Why on *earth* is the blob allocating the polygon list as executable?

13:38 <alyssa> vt_sfbd.clear_flags differ

13:40 <alyssa> workgroups_z_shift differs but I doubt that affects anything

13:41 <alyssa> tomeu: vertex payload's gl_enables differ <------ this one probably matters

13:45 <guillaume_g> alyssa, robher: adding OPP, I do not have a crash anymore, but still external aborts: https://pastebin.com/kGnq1L3d

13:46 chewitt has quit [Quit: Zzz..]

13:57 <tomeu> I seem to have broken CI and have no idea how

14:24 <robher> guillaume_g: Looks to me like there's a problem accessing the GPU registers. Probably something else needs to be enabled and that could be exynos specific. There were some patches on the list for exynos support on panfrost. Do you have those?

14:25 chewitt has joined #panfrost

14:25 chewitt has quit [Client Quit]

14:27 hanetzer has quit [Changing host]

14:27 hanetzer has joined #panfrost

14:30 <guillaume_g> robher: no, I am using kernel 5.2.0 atm.

14:32 <guillaume_g> robher: do you have a link, please?

14:36 <tomeu> alyssa: well, the mir_foreach_instr_in_block_safe patch seems to have broken everything

14:40 chewitt has joined #panfrost

14:41 <robher> guillaume_g: "[PATCH v2 3/7] arm64: dts: exynos: Add GPU/Mali T760 node to Exynos5433", but I don't see anything extra needed on other chips (there's no 5250 support in the series though).

14:41 <robher> guillaume_g: maybe one of the exynos folks can help. It's going to be some clock, regulator, or power domain most likely.

14:42 <chewitt> tomeu: narmstrong: I've tested reverting all commits since the initial panfrost merge in 5.2, e.g. all here https://github.com/torvalds/linux/commits/master/drivers/gpu/drm/panfrost

14:42 <robher> guillaume_g: Or you just have the wrong address.

14:42 <chewitt> and it still shows the same "panfrost: probe of d00c0000.gpu failed with error -12"
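
For reading a probe failure like the one quoted above, the negative number in the message is a negated errno value. A minimal sketch for decoding it, assuming it runs on a Linux host where Python's errno table matches the kernel's numbering; `decode_kernel_error` is a hypothetical helper name:

```python
import errno
import os


def decode_kernel_error(code):
    """Map a negative kernel return code to its errno name and description."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


name, message = decode_kernel_error(-12)
print(name, "-", message)
```

Error -12 decodes to ENOMEM, so the probe is failing on an allocation or mapping step rather than, say, a missing clock or regulator.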

14:46 <guillaume_g> robher: I will switch to the arndale board, at least I will have a serial to debug things.

14:49 <chewitt> tomeu: narmstrong: so I have to conclude it's something external to panfrost (something else in the kernel changed)

14:49 <chewitt> one last thing to try is an aarch64 image

14:50 <chewitt> as we normally build 'arm'

14:51 <tomeu> ah, guess that could be it

14:53 <tomeu> alyssa: guess I will have to revert it in the meantime

15:03 <guillaume_g> robher: adding pd_g3d to power-domains makes the system freeze :(

15:05 <robher> guillaume_g: perhaps the PD is on already, and then on probe failure it gets turned off.

15:06 raster has quit [Remote host closed the connection]

15:07 <guillaume_g> robher: and it could freeze the board?

15:08 <robher> guillaume_g: certainly if something else is relying on the PD to be default enabled.

15:09 <guillaume_g> robher: ok. The last line I have before the freeze is "panfrost 11800000.gpu: clock rate = 200000000"

15:11 <robher> guillaume_g: I think there are also some issues in the clean-up error paths in the panfrost driver interacting with runtime-pm. I'm not certain though.

15:17 <chewitt> tomeu: narmstrong: for kicks I fully reverted panfrost and then cherry-picked the commit that added panfrost to our 5.1 kernel sources .. and this works

15:18 <chewitt> so i've exported the two commits as patch files .. now doing 'diff -y' to compare them

15:23 <guillaume_g> robher: it seems that upstream dts fragments have only one clock, whereas my downstream reference code has multiple clocks

15:30 <guillaume_g> robher: it seems I am missing some configuration for the main clock :(

15:39 <chewitt> tomeu: narmstrong: robher: this is the (inverse) diff between the "add panfrost" 5.1 kernel patch and the initial 5.2 commit merged in mainline

15:39 <chewitt> http://paste.ubuntu.com/p/FmkJPGNpn3/

15:40 <chewitt> although when building with that patch added it doesn't compile http://paste.ubuntu.com/p/QQMFkZV4Bs/

15:42 <chewitt> could be my bad copy/pasting .. but something there is why T820 (S912) stopped working after the 5.2 bump

15:44 <robher> chewitt: there was some issue with 32-bit GPU VA being rejected by io-pgtable code. I'm not sure if Robin ever got a fix in.

15:47 <chewitt> we run a split 64/32 arrangement so kernel is 64-bit

15:50 <alyssa> tomeu: Hrm

15:53 <chewitt> alyssa: with the patch hacking .. I still see the black text problem I described before

16:13 <guillaume_g> robher: I got it "working". I mean the driver does not crash on boot. ;) https://pastebin.com/w9F6KKYt My problem was wrong clock + missing power-domain

16:14 <guillaume_g> robher: I hope what is printed makes sense

16:28 gtucker has quit [Ping timeout: 252 seconds]

16:28 tomeu has quit [Ping timeout: 252 seconds]

16:33 <alyssa> chewitt: :/

16:33 <alyssa> chewitt: Could you send the output with "MIDGARD_MESA_DEBUG=shaders" set?

16:34 <alyssa> And if I can't figure it out from that, maybe even with "PAN_MESA_DEBUG=trace" set (but it will be very large, so quit Kodi as soon as the black text appears)
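
One way to collect the debug output requested above is to launch the application with the variables set and send everything it prints to a file. This is a rough sketch only: the two variable names and values come from the messages above, while `run_with_debug`, the log path and the `kodi` command line are assumptions for illustration. Both stdout and stderr are captured because the sketch does not assume which stream the driver writes to.

```python
import os
import subprocess


def run_with_debug(cmd, log_path, shaders=True, trace=False):
    """Run cmd with the Panfrost debug variables set, logging output to a file."""
    env = dict(os.environ)
    if shaders:
        env["MIDGARD_MESA_DEBUG"] = "shaders"
    if trace:
        # Trace output can be very large, so keep the run short.
        env["PAN_MESA_DEBUG"] = "trace"
    with open(log_path, "wb") as log:
        result = subprocess.run(cmd, env=env, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


if __name__ == "__main__":
    # Hypothetical invocation; the binary name and path depend on the system.
    run_with_debug(["kodi"], "/tmp/panfrost-shaders.log")
```

Writing to a plain file means the size of the log is limited only by free disk space.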

16:41 <alyssa> It's just odd given that it worked fine before and our conformance numbers have been steadily _increasing_

17:23 <calcprogrammer1> I flashed a fresh install of Debian (unofficial arm64 this time) to my Rock Pi 4 and then built kernel, libdrm, and mesa as I was before. Can't confirm yet but it appears to be working now, no dmesg errors when running lightdm or kmscube. I'm remoted in right now so can't see screen. Are there any known issues with running a 32-bit distribution with Panfrost on arm64 hardware (rk3399)? Radxa's official Debian

17:23 <calcprogrammer1> build for the Rock Pi 4 is a 32-bit Debian armhf with an aarch64 kernel.

17:33 <alyssa> calcprogrammer1: 32-on-64 is rather broken right now, but we're working on it!

17:33 <alyssa> LibreELEC has a workaround but I don't recommend it at this time

17:58 stikonas has joined #panfrost

18:02 stikonas has quit [Read error: Connection reset by peer]

18:03 stikonas has joined #panfrost

18:40 davidlt has quit [Remote host closed the connection]

18:40 davidlt has joined #panfrost

19:05 BenG83 has joined #panfrost

19:44 davidlt has quit [Ping timeout: 244 seconds]

19:59 MistahDarcy has quit [Quit: Leaving]

20:11 jcureton has joined #panfrost

20:12 guillaume_g has quit [Quit: Konversation terminated!]

20:14 <jcureton> is there anything being done toward a platform quirks framework? i know people have added some vendor-specific compat flags but i haven't seen any patches toward building it

20:15 <jcureton> ^ within the kernel drm driver

20:54 jcureton has quit [Remote host closed the connection]

20:58 stikonas has quit [Remote host closed the connection]

21:05 jcureton has joined #panfrost

21:16 <robher> jcureton: didn't know we needed one.

21:18 <jcureton> robher: i'm trying to figure out if there's generically a need for one. i'm working on getting a T720 running as a backport on an SoC that definitely needs to handle some platform quirks. i've seen some conversation around amlogic devices also having some oddities

21:24 <robher> jcureton: We need to see what the changes needed are first, then we can decide if a 'framework' is needed. Sounds like overkill is my first thought.

21:26 <jcureton> robher: kind of what i figured. see example amlogic quirk here https://github.com/superna9999/linux/commit/df28223b151155ab0edd8419b0347a7135443fd3

21:28 <jcureton> the above is a simple one, my requirement is a bit broader requiring poking quite a few registers on my SoC outside of the GPU address space.

21:31 <jcureton> no issues maintaining mine in my tree, but if there's a wider need for dealing with platform-specifics i can try to make it upstreamable

21:43 <chewitt> alyssa: output from MIDGARD_MESA_DEBUG=shaders => http://ix.io/1Oca

21:43 <chewitt> let me know if you need the other output

21:51 <chewitt> partial output from PAN_MESA_DEBUG=trace => http://ix.io/1Oce

21:51 <chewitt> partial because it overflows the weeny journal buffer before I can login to stop kodi

21:59 <alyssa> chewitt: *eyes*

22:00 <chewitt> anything useful there ^ ?

22:01 <alyssa> chewitt: It'll take me a bit to chew through so maybe?!

22:01 <chewitt> happy hunting :)

22:01 <alyssa> :)

22:08 <alyssa> Here's sth

22:08 <alyssa> shift/extra_flags getting set for LINEAR

22:08 <alyssa> Probably not the issue but it's semantically nonsense and should be fixed

22:09 <alyssa> (Set to ~0)

22:09 <alyssa> for both attrs and varyings..

22:10 <alyssa> chewitt: I'm just mighty confused given our conformance status and Kodi is a very normal GLES app

22:14 <alyssa> chewitt: Ohhhhhhh I also did some work on blending, I wonder if that's messing with something

22:15 <alyssa> That said, the blend mode used in that log is the same on my local Kodi which works fine so

22:17 <alyssa> chewitt: Question: Is this a 32-bit build with the LE Panfrost patch or honest-to-goodness 64-bit?

22:18 <chewitt> 64-bit kernel and 32-bit userspace

22:18 <chewitt> standard LE config

22:18 <alyssa> Alright, that helps

22:18 <alyssa> (Well, it doesn't help the problem, but it might help me narrow the problem :P)

22:22 <alyssa> I'm guessing 32-bit support is buggy in some way, trying to think of the easiest way to debug this

22:22 <alyssa> Let me check if CI keeps interesting logs

22:22 <chewitt> I can make an aarch64 image and test again .. but that will be overnight

22:23 <alyssa> If it's not too much trouble, that would certainly help narrow down the issue :)

22:23 <alyssa> But I know of a few outstanding 32-bit-related issues which I can bump the prio and look into now :)

22:23 <alyssa> Best case, it solves your bug

22:23 <alyssa> Worst case, it solves a bug you didn't know you had :p

22:24 <chewitt> sounds like a good idea

22:36 BenG83 has quit [Quit: Leaving]

22:50 <alyssa> chewitt: I'm collecting possible fixes in tomeu/fix32

22:50 <alyssa> Er, tomeu/mesa branch fix32

22:51 <alyssa> There's a small chance that branch will magically work better