#panfrost on 2020-12-09 — irc logs at freenode.irclog.whitequark.org

2019-09-06 11:20 alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature

00:30 icecream95 has joined #panfrost

00:32 nerdboy has quit [Excess Flood]

00:33 * macc24 got gnome running on duet

00:34 <macc24> aside from fact that it's super slow, it's fine

00:34 <HdkR> woo

00:34 alyssa has quit [Remote host closed the connection]

00:35 <macc24> it turns out that gdm dislikes /etc/default/locales being empty

00:35 icecream95 has quit [Ping timeout: 260 seconds]

00:38 <HdkR> Who needs a locale other than "C"? :P

00:39 <macc24> gdm needs it!

00:39 <macc24> for whatever reason

00:39 * macc24 thinks that gnome is cursed permamently

00:39 * HdkR looks at gnome-shell

00:39 nerdboy has joined #panfrost

00:39 <HdkR> yep

00:40 <macc24> on the other hand, phosh runs pretty well

00:40 <macc24> and is near edge of usability aside from fact that there are still some `issues

00:44 tomboy64 has quit [Remote host closed the connection]

00:45 tomboy64 has joined #panfrost

00:54 justin3 has quit [Ping timeout: 265 seconds]

01:20 icecream95 has joined #panfrost

01:25 stikonas has quit [Remote host closed the connection]

01:31 popolon has quit [Quit: WeeChat 3.0]

01:36 <kinkinkijkin> writing a hacky attempt at fixing this

01:37 <kinkinkijkin> just changing a magic constant to see if its fixed by changing that

01:37 <kinkinkijkin> and some other changes im testing

01:39 <kinkinkijkin> but the most important test is that there magic number

01:41 robmur01 has quit [Read error: Connection reset by peer]

01:42 robmur01 has joined #panfrost

01:44 <chewitt> kinkinkijkin share the patch once you've hacked it, I have an XU4 image for my distro that deadlocks trying to start kodi .. i'm always keen in t6xx experiments

01:45 karolherbst has quit [Remote host closed the connection]

01:46 karolherbst has joined #panfrost

01:55 kaspter has joined #panfrost

02:04 vstehle has quit [Ping timeout: 256 seconds]

02:06 bbrezillon has quit [Ping timeout: 272 seconds]

02:07 megi has joined #panfrost

02:20 kaspter has quit [Ping timeout: 256 seconds]

02:20 kaspter has joined #panfrost

02:24 archetech has quit [Quit: Konversation terminated!]

02:30 kaspter has quit [Ping timeout: 265 seconds]

02:30 kaspter has joined #panfrost

02:39 <kinkinkijkin> of course i am getting an issue with permissions

02:43 nerdboy has quit [Ping timeout: 256 seconds]

02:48 nerdboy has joined #panfrost

03:05 archetech has joined #panfrost

03:21 megi has quit [Ping timeout: 260 seconds]

03:22 megi has joined #panfrost

03:40 <kinkinkijkin> okay, fix confirmed bootable but i havent gotten panfrost to work at all on a working kernel yet so im installing 5.10

03:41 <kinkinkijkin> (its only in mesa side, didnt cause any issues loading on a crashing kernel ver)

03:42 kaspter has quit [Ping timeout: 265 seconds]

03:42 kaspter has joined #panfrost

04:27 <kinkinkijkin> okay, i have it booting armbian with the new kernel

04:27 <kinkinkijkin> okay, i need to reconfigure that kernel

04:29 <archetech> N2 is what needs attn. G52

04:29 <kinkinkijkin> okay, i can successfully hang the system very reproducibly

04:30 <kinkinkijkin> downside: reading logs is one of the things that hangs the system

04:30 <HdkR> archetech: Random morale boost?

04:31 <archetech> you need to back off

04:32 <kinkinkijkin> now what makes you say that, archetech?

04:33 <archetech> was at this HdkR guy who likes to harrras me

04:33 <kinkinkijkin> don't make demands in dev chats then

04:33 <archetech> \demand?

04:34 <archetech> how was that a demand

04:34 <kinkinkijkin> and he wasnt harassing, he was checking if it was a demand or not

04:34 <kinkinkijkin> you just entered and told us what apparently needs attention

04:34 <archetech> its a comment

04:35 <archetech> and an opinion

04:35 <kinkinkijkin> yes, and a very useless, brash comment

04:35 <HdkR> harassment claims are something that need to be taken seriously. You made an unrelated comment without any context.

04:36 <HdkR> Is there something specific about G52 that needs attention? Bifrost work is coming up already with G3x and G5x being the targets of choice already.

04:37 <HdkR> I recommend filing an issue at https://gitlab.freedesktop.org/mesa/mesa/-/issues for specific features or reproducible problems for developers to triage successfully.

04:38 <archetech> so this irc chan isnt for users of panfrost and errors they get?

04:38 <archetech> just devs ?

04:38 <kinkinkijkin> give the error

04:38 <kinkinkijkin> not "fix this device"

04:40 <HdkR> This channel can be used for discussion yes. It just needs context rather than a comment drop without context

04:40 <archetech> putting words in my mouth now eh

04:40 <archetech> context is obvious I said G52

04:41 <HdkR> G52 is in active development. There of course will be issues in the driver. What problem are you encountering this day?

04:41 <archetech> G52 also hangs freezes is also in context of what kinkinkijkin was seeing

04:42 <kinkinkijkin> im testing fixes for a specific device.

04:42 <archetech> thats my problem

04:42 <archetech> lots of n2 owners problem

04:43 <kinkinkijkin> so you've just come to tell us to divert all resources to a device that's already being worked on very heavily

04:43 <kinkinkijkin> driver work takes a long time, and not everyone working on this driver has incredibly much time, or an n2 for that matter

04:44 <archetech> wow you're defensive

04:45 <kinkinkijkin> and keep in mind a good handful of the developers involved with this driver are working for free or on individual sponsorship, though a lot of the work is done by folk employed to

04:46 <archetech> yes Im aware of what dev work is like

04:46 <kinkinkijkin> like I'm working on fixes right now with individual sponsorship, do not own an n2, and have 3 other projects im split between, two of which generated my individual sponsorship

04:46 <archetech> please continue on

04:47 <archetech> no need to educate me. for a simple g52 comment

04:47 <HdkR> archetech: dmesg or kernel logs should give information about what the problem is. Make sure to open a bug report with this information to be recorded for inactive developers to see when they come online

04:48 <archetech> right. will do

04:49 <kinkinkijkin> thanks, a bug report will do better yeah

04:49 <archetech> I do debugging for qt5 kde frameworks and plasme

04:49 <archetech> so far its not fun to run it on my N2+

04:49 <HdkR> Make sure to have reproduction steps in the issue of course, otherwise they won't know how to repro

04:50 <archetech> yup. I know ya need the disro pkg versions etc

04:50 <kinkinkijkin> i recognized your name, nice to know you guys over there are putting in some work on getting plasma running on panfrost

04:50 <archetech> what im doing when it freezes. etc

04:51 <archetech> im not a dev I there I build from source for the Linux from Scratch project

04:52 <kinkinkijkin> i ran plasma on libmali on an xu4 once, so far only recorded person ive found to even attempt it, wasnt aware other people would ever try running plasma on a device without advertised desktop gl

04:53 <kinkinkijkin> btw it wasn't usable, but it booted, in wayland

04:54 <archetech> exactly. that's what I'm troubleshooting for last 2-3 months.

04:55 <archetech> diff kernels mesa versions etc

04:56 <kinkinkijkin> when i did it with libmali, it required a version of libgbm with a set of patches i made that i lost :'D

04:56 <archetech> my tests are for helping armbian and odroid ubu 20.10

04:56 <archetech> not just myself or lfs

04:57 <archetech> libmali is fine for running gnome on wayland. it works well

04:57 <archetech> but panfrost is faster and is close to running plasma

04:58 <archetech> best libmali can do is run plasma on softpipe

04:58 <kinkinkijkin> suggestion: wait for more complete desktop gl support, use latest master of mesa and latest kernel rc (if it works on n2), and exercise patience/report crashes heavily verbosely

04:58 <archetech> thats what ive been doing all along

04:58 <kinkinkijkin> not much you can do but help

04:59 <archetech> I can give reports on prgress that I do on other irc's and forums

05:00 <kinkinkijkin> the most helpful thing right now would be providing an extra hand in the code

05:01 <archetech> I wish. no coding experience just configs and compiles

05:01 <kinkinkijkin> always time to learn

05:02 <archetech> I took assembly class on wasnt my cup of tea but was interesting

05:05 <archetech> I've read the Collabra blog too. good articles

05:05 <archetech> so im not the enemy. carry on :)

05:07 kaspter has quit [Ping timeout: 240 seconds]

05:08 kaspter has joined #panfrost

05:12 <kinkinkijkin> okay, so the hanging issue looked really similar to the one i had trying to compile a preemptible kernel with hmp awareness a few years back, and hmp awareness is still marked as experimental, so i turned that off, as well as a couple options that would screw with a non-hmp-aware scheduler on an hmp system

05:13 <kinkinkijkin> probably not in affectuous ways but always good to go too safe rather than waste an extra two hours reconfiguring, recompiling, and reinstalling the kernel over and over again

05:25 <archetech> <alyssa> daniels: Hoping to have the Bifrost scheduler for Christmas. Code name: Santa Clause. Scheduler is the piece G52 needs. Ill wait for the sleigh

05:27 youcai has quit [Ping timeout: 264 seconds]

05:27 ezequielg has quit [Read error: Connection reset by peer]

05:27 camus has joined #panfrost

05:28 kaspter has quit [Ping timeout: 258 seconds]

05:28 camus is now known as kaspter

05:28 ezequielg has joined #panfrost

05:28 youcai has joined #panfrost

05:38 <kinkinkijkin> hanging hasnt stopped

05:38 <kinkinkijkin> hmm

05:59 camus has joined #panfrost

06:00 kaspter has quit [Ping timeout: 256 seconds]

06:00 camus is now known as kaspter

06:00 vstehle has joined #panfrost

06:01 archetech has quit [Quit: Textual IRC Client: www.textualapp.com]

06:14 davidlt has joined #panfrost

06:29 <kinkinkijkin> local developer finds forbidden knowledge to halve kernel compilation time, click here to find out this one easy trick that is guaranteed to make recompiling your kernel more enjoyable

06:29 <kinkinkijkin> click the link and it just leads to a page containing instructions for disabling nouveau

06:33 <HdkR> Is it opening newegg.com and buying a Ryzen 5950X? :P

06:33 <kinkinkijkin> silly that's more than a halving

06:34 <kinkinkijkin> i like doing all my kernel compilation on-device

06:34 <kinkinkijkin> it's not because i dont want to learn how to crosscompile who told you that

06:34 <kinkinkijkin> it's because it forms a tighter bond between me and my hardare

06:35 <HdkR> At least the Linux kernel is one of the easier projects to cross-compile :D

06:36 <kinkinkijkin> i kinda wish nouveau wasn't absolutely massive btw

06:36 <kinkinkijkin> nouveau compilation was taking 53% of my kernel compilation time before

06:37 <kinkinkijkin> it's bigger than a barely-stripped-down kernel

06:39 <kinkinkijkin> probably required though since nvidia doesn't like cooperating and i feel like it probably contains unique bespoke firmwares for every device it supports

06:40 <kinkinkijkin> hdkr, how about those new AMD gpus? heard one of them was a little ridiculous

06:41 <kinkinkijkin> but also abandonment of hbm boo

06:41 <kinkinkijkin> (dw i know why, mostly joking)

06:42 <HdkR> ridiculous in what way?

06:42 <kinkinkijkin> power compared to their recent offerings overall

06:42 <HdkR> The performance is pretty good compared to Nvidia for gaming workloads

06:43 <HdkR> Unless you want to play RT games anyway

06:43 <kinkinkijkin> i understand hbm was causing pricing issues for high-perf gaming cards potentially

06:43 <HdkR> HBM is more expensive than GDDR yea

06:43 <kinkinkijkin> since it has to be printed at the same time with the same process as the core

06:44 <HdkR> ehhhh, it's on the same silicon substrate, doesn't necessarily need to match the process

06:44 <kinkinkijkin> leading to massive wafers, more material loss for each bunk wafer, more possible defect points

06:44 <kinkinkijkin> well

06:44 <HdkR> They are still separate silicon dies

06:44 <kinkinkijkin> oh really?

06:44 <kinkinkijkin> hmm

06:45 <HdkR> https://en.wikipedia.org/wiki/High_Bandwidth_Memory#/media/File:AMD_Fiji_GPU_package_with_GPU,_HBM_memory_and_interposer.jpg

06:45 <HdkR> See how there is a physical separation

06:45 <kinkinkijkin> i hope amd sticks to hbm for commercial and workspace offerings going forward though

06:45 <kinkinkijkin> where the pricing issue is a little less of an issue

06:46 kaspter has quit [Ping timeout: 258 seconds]

06:46 camus has joined #panfrost

06:47 <HdkR> Depends on the market, GDDR6X is still really high bandwidth if your bus is wide enough

06:47 <HdkR> and if they tied a larger memory bus to the Infinity Cache idea then it may be good enough for a lot of work loads

06:48 <kinkinkijkin> i know for ocl raytracing specifically hbm is a ridiculous advantage

06:48 <HdkR> Yea, Ray tracing is almost entirely bandwidth bounded

06:48 camus is now known as kaspter

06:48 <HdkR> both to vram AND to caches

06:48 <kinkinkijkin> ex. radeon vii being the fastest card available for luxcore

06:49 <kinkinkijkin> 16gb of ludicrously low-latency high-bandwidth memory is a huge boon

06:50 <HdkR> Which Nvidia's RTCore acceleration in Luxcore crushes perf there

06:50 <HdkR> So hard sell for professionals even if they want to go AMD

06:51 <kinkinkijkin> hmm, actually havent used luxcore recently enough to know they support that now

06:51 <kinkinkijkin> neat

06:52 <HdkR> Yea, they added OptiX support in....2.5?

06:54 <HdkR> https://www.legitreviews.com/wp-content/uploads/2020/11/luxmark-geforce3060ti.jpg

06:54 <kinkinkijkin> was about to dip for a second since my kernel finished building, but ive got a non-booter

06:54 <HdkR> oop, https://www.legitreviews.com/wp-content/uploads/2020/12/luxmark.jpg

06:54 <HdkR> Better one since it has the 6900 XT

06:55 <kinkinkijkin> now that's a crush

06:56 <HdkR> Sadly anything that manages to go over the infinity cache size will fall off a cliff

06:58 <HdkR> Which is why some benchmarks of the 6900XT showed only marginally performance loss going from 1440p to 4k. 1440p had already fallen off

06:59 <kinkinkijkin> kernel building again, think ive figured out the non-booting

07:00 <kinkinkijkin> silly me forgot the scheduler doesnt like preemption on the xu4

07:00 <HdkR> :D

07:00 <kinkinkijkin> next to see if i figured out the hanging, the hanging resembles an unstable overclock hang

07:01 <kinkinkijkin> but the device isnt over or under clocked

07:02 <kinkinkijkin> could just be that my xu4 is getting a little older than optimal and came with a small voltage regulator issue

07:04 <kinkinkijkin> within tolerances for shipment and not at all likely to become an issue (vr in question is for the hdmi phy) but it's worried me forever

07:04 <HdkR> https://imgur.com/a/O0tGklE Fun bench result of a game falling out of AMD's cache :D

07:19 <kinkinkijkin> 19 minute kernel build, impressive reduction by removing nouveau

07:21 <kinkinkijkin> aaaaaand i have to do it again

07:39 camus has joined #panfrost

07:39 kaspter has quit [Ping timeout: 260 seconds]

07:39 camus is now known as kaspter

07:45 <kinkinkijkin> 20 minutes that time

08:11 bbrezillon has joined #panfrost

08:17 <kinkinkijkin> okay, got the random hangs stopped

08:17 <kinkinkijkin> now the only hang left is starting sway

08:22 <kinkinkijkin> alright, my fix doesnt cause corruption and seems to stop the flickering, i just cant get the device to actually keep from hanging, it's a coinflip every time i open a graphical application

08:23 <kinkinkijkin> has been since 5.9

08:24 <kinkinkijkin> successfully opened sway and terminator, then terminator loading bash hung the system

08:24 <bbrezillon> kinkinkijkin: do you have kernel traces?.

08:24 <bbrezillon> page faults, job timeouts, ...

08:25 <kinkinkijkin> not that i know of, tell me how to collect these and ill get one immediately

08:25 <kinkinkijkin> fun note is that i never got as far as terminator without my fix, and i see no more flickering in sway when moving mouse with this fix

08:26 <kinkinkijkin> it was EXACTLY what i thought it would be, a buffer was being improperly sized due to host differences

08:27 <kinkinkijkin> i just Used A Magic Number to double the size of the buffer and the flickering went away and the system is minourly more stable

08:29 <HdkR> =o

08:29 icecream95 has quit [Ping timeout: 258 seconds]

08:29 <kinkinkijkin> oop the random hang came back

08:29 bbrezillon has quit [Read error: Connection reset by peer]

08:38 bbrezillon has joined #panfrost

08:43 <bbrezillon> kinkinkijkin: none of that should hang the system though

08:43 <bbrezillon> if you have GPU faults/timeouts they should appear in the kernel logs

08:50 <kinkinkijkin> every hang is punctuated with just uh

08:50 <kinkinkijkin> vdd_ldo12: disabling

08:50 <kinkinkijkin> and

08:50 <kinkinkijkin> vdd_g3d: disabling

08:51 <kinkinkijkin> absolutely no other info at crash time

08:54 <kinkinkijkin> both of these are xu4-specific names so ill have to go ask in #odroid

08:59 <bbrezillon> can you try to disable runtime PM?

09:00 <bbrezillon> echo -1 > /sys/devices/platform/ff9a0000.gpu/power/autosuspend_delay_ms

09:00 <bbrezillon> the path should be slightly different though

09:03 <kinkinkijkin> echo: write error: input/output error

09:03 <kinkinkijkin> from root

09:04 <bbrezillon> find /sys -name autosuspend_delay_ms|grep gpu

09:05 <kinkinkijkin> instant hang

09:06 <kinkinkijkin> going to check kern.log again now

09:08 <bbrezillon> what, the find made it hang?

09:09 <kinkinkijkin> no, setting it successfully

09:10 <kinkinkijkin> reproducible

09:11 <kinkinkijkin> is there a default setting for this in kernel config? ill set it and rebuild and see if that helps

09:15 nlhowell has joined #panfrost

09:19 camus has joined #panfrost

09:20 kaspter has quit [Ping timeout: 260 seconds]

09:20 camus is now known as kaspter

09:24 <bbrezillon> for setting what?

09:24 <kinkinkijkin> autosuspend_delay_ms

09:25 <bbrezillon> not that I know of, but you can hack the driver to disable runtime-PM

09:25 <bbrezillon> but you said it was not helping, right?

09:27 <kinkinkijkin> it was hanging immediately to change that value

09:27 <kinkinkijkin> so it's something to do with pm somewhere, whether that's in panfrost or elsewhere

09:27 nlhowell has quit [Remote host closed the connection]

09:28 <kinkinkijkin> which file should i go to in order to hack this out?

09:29 <kinkinkijkin> mightve found it, panfrost_devfreq.c, at void panfrost_devfreq_suspend?

09:34 nlhowell has joined #panfrost

09:36 <kinkinkijkin> compiling with that function replaced with a dummy, since it does seem to be what is called for autosuspend, though it's a bit brutish

09:39 davidlt has quit [Ping timeout: 240 seconds]

09:58 <bbrezillon> kinkinkijkin: that should do the trick => https://gitlab.freedesktop.org/-/snippets/1352

09:59 <bbrezillon> but we should also investigate on why suspend/resume cause those hangs :)

10:02 <bbrezillon> tomeu: ^

10:06 <kinkinkijkin> that's no longer crashing now

10:06 <kinkinkijkin> all that's left is the random hangs

10:09 <kinkinkijkin> wait, i solved the random hangs somehow, turns out my fix is now causing a hang it seems

10:10 stikonas has joined #panfrost

10:13 <kinkinkijkin> weird, fix slaughtered any notion of stability but fixed the bug, with a single magic number

10:14 <kinkinkijkin> also resulted in a massive perf improvement

10:14 raster has joined #panfrost

10:19 <kinkinkijkin> okay, the bug is where i thought it was but it wasn't what i thought it was

10:19 <kinkinkijkin> hardware bug of some sort

10:19 <kinkinkijkin> or a quirk

10:21 <kinkinkijkin> it's fixed witth something extremely similar to the fix for what i thought it was though

10:22 <kinkinkijkin> i dont quite know how to make diffs by hand (never had a reason to), this is going to be hard to get across

10:26 <urjaman> ... yeah that's why there's a program for that? o.O

10:26 <kinkinkijkin> no i mean

10:26 <kinkinkijkin> i havent run the program by hand

10:26 <kinkinkijkin> always had git do it for me

10:27 <urjaman> diff -u file1 file2

10:28 <urjaman> (but yeah git makes it easier, just have the stuff be in git and life gets a lot easier...)

10:29 <urjaman> I tend to stuff things into git (just git init, and git add ., commit that as initial state and off to hacking) even if it natively isnt, just to know what i changed

10:32 <daniels> also `git diff` can output a diff ...

10:40 <kinkinkijkin> with my hack, there's no flickering but buffering too much data results in a hang

10:48 camus has joined #panfrost

10:50 kaspter has quit [Ping timeout: 260 seconds]

10:50 camus is now known as kaspter

10:51 <kinkinkijkin> going off

11:12 davidlt has joined #panfrost

11:39 chewitt has quit [Quit: Zzz..]

11:53 alpernebbi has joined #panfrost

12:01 chewitt has joined #panfrost

12:19 <robmur01> re XU4: anything involving "half of" anything instantly makes me suspicious of caching issues - T628 with two core groups is the weirdo where half the GPU isn't cache-coherent with the other half

12:19 <robmur01> there are potentially some flushes that we might need to do there that we'd never need to do on anything else

12:21 nlhowell has quit [Ping timeout: 260 seconds]

12:23 <stepri01> robmur01: Yeah - there's a TODO about that in panfrost_job_write_affinity(): "Eventually we may need to support [...] h/w with multiple (2) coherent core groups"

12:26 <robmur01> stepri01: true, I guess scheduling data-dependent jobs behind the same L2 is probably more desirable than brute-force flushing both L2s all the time :)

12:27 <robmur01> (I failed to consider that we also have control of that...)

12:27 <stepri01> ideally user space and kernel work together on it. where jobs don't have data dependencies they can be run on all cores (with occasional flushes as necessary), but sometimes there are data dependencies in which case it's best to restrict to a coherent set of cores

12:28 <stepri01> kbase gives the options to user space to work out how to handle it

12:29 <stepri01> I haven't looked into what Panfrost user space does. I suspect the main issue is the vertex shading to tiler coherency (the tiler being in the first core group)

12:45 alyssa has joined #panfrost

13:06 patrik has joined #panfrost

14:03 chewitt has quit [Ping timeout: 240 seconds]

14:05 kaspter has quit [Ping timeout: 258 seconds]

14:06 kaspter has joined #panfrost

14:09 <macc24> kinkinkijkin: yeah defconfig for arm64 is quite big

14:10 <macc24> this makes pretty small kernel, it's hopefully bare minimum + mtk drivers, https://github.com/Maccraft123/Cadmium/blob/master/kernel/config-duet

14:10 kaspter has quit [Ping timeout: 265 seconds]

14:48 raster has quit [Quit: Gettin' stinky!]

14:50 raster has joined #panfrost

14:55 robmur01_ has joined #panfrost

14:57 <macc24> robmur01_: wouldn't cache coherency decrease performance due to effective cache size being lowered if all cores have local copy of cache of other cores?

14:58 robmur01 has quit [Ping timeout: 258 seconds]

14:58 chewitt has joined #panfrost

15:00 robmur01_ is now known as robmur01

15:01 <robmur01> macc24: I don't think you have the right idea of how coherency works :/

15:02 <robmur01> if two cores are working on the same data, there's no "effective size" difference between both holding a line in their own cache without the other's knowledge, and both holding a line in their own cache in a shared state with the ability to snoop updates from each other

15:03 raster has quit [Quit: Gettin' stinky!]

15:04 raster has joined #panfrost

15:05 <macc24> okay

15:05 <robmur01> if only one core is using the data, it can still hold that line in a unique state by itself - it only gets shared (and thus copied) if somebody else actually needs it at the same time

15:07 <robmur01> https://en.wikipedia.org/wiki/MOESI_protocol for example

15:11 <robmur01> (also note that anything I say about coherency is likely to be a mishmash of AMBA ACE terminology which almost certainly doesn't represent what any GPU is using internally...)

15:12 * robmur01 is still "interconnect guy" far more than "GPU guy"

15:26 <tomeu> bbrezillon: guess it should be easy to come up with a test case for igt that reproduces that

15:28 davidlt has quit [Ping timeout: 240 seconds]

15:33 bschiett has joined #panfrost

15:33 kaspter has joined #panfrost

15:34 <bschiett> hi all, trying kmscube with panfrost gives me this, any ideas?

15:34 <bschiett> /dev/dri/card0 does not look like a modeset device

15:34 <bschiett> drmModeGetResources failed: Operation not supported

15:34 <bschiett> failed to initialize legacy DRM

15:34 <bschiett> using stable kernel 5.9.12 with buildroot 2020.08.2

15:34 <macc24> try other debices using -d parameter

15:35 <bschiett> @macc24 I have card0 and renderD128 in /dev/dri, that's all

15:35 <macc24> what device do you have?

15:35 <bschiett> @macc24 rk3288

15:36 <bschiett> so mali T760 MP4

15:37 <bschiett> see https://pastebin.com/T98KyCSD and https://pastebin.com/kdfv14Gc for strace kmscube

15:37 <macc24> does /sys/devices/platform/ffa30000.gpu exist?

15:38 <bschiett> @macc24 yes

15:38 <macc24> how about /sys/module/panfrost?

15:39 <bschiett> exists also

15:39 <macc24> what's in /sys/devices/platform/ffa30000/of_node/status ?

15:39 <bschiett> okay

15:40 <macc24> do you have vgem enabled in kernel config?

15:41 <macc24> and what dts is your device using?

15:42 <bschiett> checking for vgem

15:42 <bschiett> DRM_VGEM is not enabled in kernel. should it be?

15:42 <macc24> i think yes

15:42 <macc24> what dts is your device using?

15:43 <bschiett> I have a custom dts for my board based on rk3288-firefly-reload-core.dtsi

15:43 <robmur01> AFAIK you should have two /dev/dri/card<n> entries, one for the display (which is the one kmscube is looking for) plus another for the GPU

15:43 <bschiett> if you check https://pastebin.com/T98KyCSD you can see I still have an issue with binding my lvds driver, I'm not sure if this could be the reason it's not working?

15:44 <robmur01> I'm slightly puzzled how you could have a display at all without the DRM driver having registered properly :/

15:44 <macc24> robmur01: simple-framebuffer?

15:44 <bschiett> I am going to add DRM_VGEM and then report back, give me a few minutes

15:45 <macc24> bschiett: can you set ROCKCHIP_LVDS to n too?

15:45 <bschiett> @macc24 will do

15:45 <macc24> it's near rockchip drm options

15:45 <robmur01> macc24: aha, yes, that would make sense :)

15:47 <robmur01> OK, so it sounds like it's purely an issue with getting rockchip-drm to probe at all, and nothing to do with panfrost (which *has* worked just fine)

15:48 <macc24> robmur01: my guess is that rockchipdrm needs lvds to exist to if has support for lvds compiled into it

15:50 <robmur01> bschiett: do you have the DT graph entries describing the connection between VOP and LVDS? That's most likely what the "can't find port" thing is about

15:53 <robmur01> they might need some massaging around the VOPB/VOPL shenanigans

15:53 <bschiett> ok here is the dmesg output - https://pastebin.com/JxWxymU6

15:54 <bschiett> DRM_VGEM enabled and LVDS disabled

15:55 <bschiett> @robmur01 I previously had my lvds stuff working but I had hacked the timings into one of the simple panel modules, and I now upgraded to 5.9.12 and still need to figure out how to properly add my LVDS timings for my display without hacking into any of the drivers and do it properly all in the DTS.

15:56 <bbrezillon> tomeu: sure, anyone volunteering? :)

16:02 <robmur01> OK, that's probably good, but I guess it's still possible that something's changed WRT the endpoint parsing. That -EINVAL still seems most likely to stem from DT stuff to me

16:02 <bschiett> @robmur01 are you talking about this line in the dmesg output? [ 0.964715] rockchip-drm display-subsystem: master bind failed: -22

16:08 <bschiett> @robmur01 here are the lvds related nodes - https://pastebin.com/FHX0sguw

16:24 nerdboy has quit [Ping timeout: 256 seconds]

16:25 <bschiett> @robmur01 the connection between VOP and LVDS seems to be in rk3288.dtsi, and VOPB/VOPL are enabled in rk3288-firefly-reload-core.dtsi

16:25 <bschiett> @robmur01 in my DTS I enable the LVDS node

16:27 <alyssa> dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_mag is evil

16:36 <kinkinkijkin> if it's a cache issue then it makes sense with my hack

16:36 nlhowell has joined #panfrost

16:39 <robmur01> bschiett: according to the bindings you need a further graph edge between the LVDS controller and the panel as well - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/display/rockchip/rockchip-lvds.txt

16:39 <robmur01> that appears to be what rockchip_lvds is looking for and complaining about

16:40 <bschiett> @robmur01 yeah, I can see that ports { ... } is missing so I just added that and I'm recompiling.

16:40 <robmur01> I guess the overall initialisation failure is just because the VOP finds no valid outputs to bind to

16:40 <kinkinkijkin> my hack just doubles the length of elements and buffers in memory; providing one core group with 0s and the other with data rather than splitting the data down the middle between core groups then giving unintentional memory would be what my hack is doing in this case

16:40 <kinkinkijkin> er, correction, doubles the size of allocations

16:42 <kinkinkijkin> except, not every allocation, which would explain why buffering too much data leads to a hang

16:44 <kinkinkijkin> the hang might just be a panic from the panfrost driver segfaulting

16:46 <bschiett> @robmur01 I now have this - [ 0.964437] rockchip-lvds ff96c000.lvds: [drm:rockchip_lvds_bind] *ERROR* failed to find panel and bridge node

16:47 <kinkinkijkin> okay, i need to figure out everything i need to double up with 0s and i may have a suggestion for an actual patch in a few hours

16:47 <macc24> kinkinkijkin: \o/

16:47 <kinkinkijkin> well, not an actual patch, a temp hack that can actually be patched in

16:47 <kinkinkijkin> but still

16:54 <robmur01> bschiett: at this point it's probably one for #dri-devel and/or #linux-rockchip - I don't have any actual experience with the LVDS driver or how it's supposed to work in general

16:56 <robmur01> my only guess would be to try making sure the panel driver has probed first

16:58 <robmur01> it appears it *should* probe-defer and wait for one, but I'm just reading code at face value here

16:59 <bschiett> @robmur01 ok thanks, i'll do some more hunting

17:16 davidlt has joined #panfrost

18:20 BorgCuba has joined #panfrost

18:32 patrik has quit [Quit: Leaving]

18:39 Green has quit [Quit: Ping timeout (120 seconds)]

18:39 Green has joined #panfrost

18:41 <tomeu> bbrezillon: I wouldn't mind working on it :)

18:43 <kinkinkijkin> kernel 5.4, panfrost userspace side refuses to load, kernel 5.10 kernel panic trying to use more than a tiny amount of data

18:45 <kinkinkijkin> doubling the size of various buffers has different affect depending the buffer, cant quite remember all the ones i tested since most just caused instant hanging

18:48 <kinkinkijkin> my best guess is that the separate core groups only need two caches for *a couple* things.

18:54 <kinkinkijkin> still hanging when terminator tries to load zsh regardless of hack in place or not, but doubling the size of (MALI_ATTRIBUTE_LENGTH * vs->attribute_count) in creation of struct panfrost_ptr T in function mali_ptr panfrost_emit_vertex_data removes most of the flickering and allows terminator to load before crashing

18:55 <kinkinkijkin> inside of pan_cmdstream.c

19:00 <alyssa> ---Hmm

19:01 <kinkinkijkin> OH

19:01 <kinkinkijkin> got it to load zsh successfully

19:01 <alyssa> :)

19:01 <alyssa> L1253 pan_cmdstream.c has a hack for bifrost

19:01 <alyssa> what happens if you do that on your board too? (it should be harmless, but might help? idk)

19:02 <kinkinkijkin> flickering persists but it's not half of renderables now with current setting

19:02 <kinkinkijkin> i will try that

19:03 <kinkinkijkin> L1253, which function is that in and what is the first word on that line? i am using gnu nano because installing vim is hard when your wifi drivers don't work

19:03 <alyssa> emit_vertex_data

19:03 <kinkinkijkin> alright

19:04 <alyssa> pan_pack(&bufs[k]...)

19:04 <alyssa> "We need an empty.."

19:07 <kinkinkijkin> trying that out

19:08 <kinkinkijkin> no change in behaviour, lemme try that removing my fix

19:09 <kinkinkijkin> great, my rtc has stopped working

19:13 <alyssa> You know, this would be easier if I didn't care about it working.. :p

19:14 <kinkinkijkin> rtc stopped working independant of this

19:20 <kinkinkijkin> no change in behaviour from using that line

19:21 <kinkinkijkin> with or without my fix

19:22 <alyssa> Ack

19:23 archetech has joined #panfrost

19:24 <kinkinkijkin> i got an error on-screen trying to run x out of curiosity

19:24 <kinkinkijkin> gpu sched timeout

19:24 <kinkinkijkin> gpu soft reset timeout

19:25 <kinkinkijkin> and a bunch of trace info in binary form

19:25 <kinkinkijkin> x actually server a useful purpose

19:26 <kinkinkijkin> the deadlock is from the scheduler failing to stop cpus during kernel panic

19:27 <kinkinkijkin> also getting a fair few messages in this tracelog about

19:27 <kinkinkijkin> cpu idle driver crashing

19:28 <kinkinkijkin> alyssa: bbrezillon tomeu idk if this is completely the right pings, got a tracelog on screen just now, see above

19:28 <kinkinkijkin> i can reproduce this and take video with my chromebook

19:32 kaspter has quit [Quit: kaspter]

19:40 stikonas has quit [Ping timeout: 272 seconds]

19:43 stikonas has joined #panfrost

19:46 alpernebbi has quit [Quit: alpernebbi]

19:46 davidlt has quit [Ping timeout: 240 seconds]

20:01 <kinkinkijkin> https://drive.google.com/file/d/1MR0oullOdDwIyfCHuq1wMLxbWaxVTK21/view?usp=sharing

20:02 <kinkinkijkin> might as well get video of sway too

21:04 cphealy_ has quit [Remote host closed the connection]

21:19 raster has quit [Quit: Gettin' stinky!]

21:27 stikonas has quit [Ping timeout: 272 seconds]

21:28 stikonas has joined #panfrost

21:40 archetech has quit [Quit: Konversation terminated!]

22:00 <kinkinkijkin> looking through historical errors with the same set of errors on kernel panic, looks related to devfreq

22:07 rando25892 has quit [Ping timeout: 260 seconds]

22:09 archetech has joined #panfrost

22:14 rando25892 has joined #panfrost

22:20 archetech has quit [Read error: Connection reset by peer]

22:21 archetech has joined #panfrost

22:37 <bschiett> @robmur01 @macc24 got it working by using panel-lvds in my dts file :-) now back to panfrost :-)

22:38 <bschiett> still getting this though ... not sure what i'm missing in my mesa config in buildroot?

22:38 <bschiett> [root@rockchip:/tmp]# kmscube

22:38 <bschiett> MESA-LOADER: failed to open rockchip (search paths /usr/lib/dri)

22:38 <bschiett> failed to load driver: rockchip

22:38 <bschiett> MESA-LOADER: failed to open kms_swrast (search paths /usr/lib/dri)

22:38 <bschiett> failed to load driver: kms_swrast

22:38 <bschiett> MESA-LOADER: failed to open swrast (search paths /usr/lib/dri)

22:38 <bschiett> failed to load swrast driver

22:38 <bschiett> Segmentation fault

22:38 <bschiett> [root@rockchip:/tmp]# ls /usr/lib/dri

22:38 <bschiett> panfrost_dri.so

22:38 <bschiett> [root@rockchip:/tmp]#

22:44 <urjaman> i think kmsro needs to be enabled too? (that makes the rockchip etc stubby drivers to glue the panfrost onto whatever display controllers... sorta kinda i think lol)

22:44 <bschiett> @urjaman checking

22:46 <bschiett> @urjaman BR2_PACKAGE_MESA3D_GALLIUM_DRIVER_PANFROST=y will cause BR2_PACKAGE_MESA3D_GALLIUM_KMSRO=y to be set

22:46 <bschiett> @urjaman BUT ... there is also BR2_PACKAGE_MESA3D_GALLIUM_DRIVER_KMSRO and that one is NOT set

22:47 <bschiett> @urjaman so i'm wondering if this is actually what needs to be set? if that is the case then the rule in buildroot is not correct.

22:48 <urjaman> idk about how buildroot does it, but kmsro is specified in the same driver list as panfrost ... so in that way the _DRIVER_ one seems the one you need (eg you configure for the gallium drivers panfrost,kmsro)

22:48 <urjaman> and the non-driver one is a typo/thinko or something else idk?

22:49 <bschiett> @urjaman i'll try enabling it

23:07 <bschiett> @urjaman that didn't fix it, still the same thing

23:16 icecream95 has joined #panfrost

23:18 <icecream95> bschiett: What happens if you run 'ln -s panfrost_dri.so /usr/lib/dri/rockchip_dri.so' ?

23:20 <kinkinkijkin> crashes ive been getting seem to be related to voltage missets

23:21 <kinkinkijkin> raised the min value of vdd_g3d by 300mV and the crashes got immediately much more predictable, and entirely tied to high gpu load

23:25 <bschiett> @icecream95 [root@rockchip:/usr/lib/dri]# kmscube

23:25 <bschiett> failed to bind extensions

23:25 <bschiett> failed to load driver: rockchip

23:25 <bschiett> MESA-LOADER: failed to open kms_swrast (search paths /usr/lib/dri)

23:25 <bschiett> failed to load driver: kms_swrast

23:25 <bschiett> MESA-LOADER: failed to open swrast (search paths /usr/lib/dri)

23:25 <bschiett> failed to load swrast driver

23:25 <bschiett> Segmentation fault

23:25 <bschiett> (after ln -s panfrost_dri.so rockchip_dri.so)

23:27 <icecream95> bschiett: That seems to indicate that kmsro wasn't built at all

23:33 <bschiett> @icecream95 I found this in package/mesa3d/Config.in:

23:33 <bschiett> 84 # Quote from mesa3d meson.build: "kmsro driver requires one or more

23:33 <bschiett> 85 # renderonly drivers (vc4, etnaviv, freedreno)".

23:34 <bschiett> maybe this is the reason kmsro wasn't built (?)

23:37 <alyssa> kinkinkijkin: that's almost definitely a kernel issue then, not mesa

23:37 <kinkinkijkin> yep

23:37 <alyssa> aka not my problem™️ :p

23:37 <kinkinkijkin> im going through dts and changing values, seeing results

23:37 <kinkinkijkin> i am Not Enthused

23:38 <kinkinkijkin> the flickering is still somewhat happening though

23:42 <macc24> bschiett: got it working?

23:43 <macc24> alyssa: ™ is better than ™️ :P

23:45 stikonas has quit [Remote host closed the connection]

23:45 stikonas has joined #panfrost

23:49 <kinkinkijkin> typos in voltage values suck

23:51 <bschiett> @macc24 I got lvds working but panfrost not yet (see above), @icecream95 thinks it has to do with kmsro not being built by buildroot

23:51 <macc24> kmsro <thinking> seems likely

23:52 <bschiett> @macc24 the weird thing is that I see kmsro header files in my build dir. but it seems it is not being built, unless I enable freedreno stuff etc. makes no sense

23:52 rando25892 has quit [Ping timeout: 264 seconds]

23:53 <macc24> do you have kmsro in -Dgallium-drivers option in meson in mesa compiling script thing?

23:53 <bschiett> @macc24 buildroot is building, will check logs in a minute

23:56 BorgCuba has quit [Quit: Leaving]

23:56 <bschiett> @macc24 i found an override option in mesa3d.mk which does not set -Dgallium-drivers=... correctly, going to change this now and try again