#panfrost on 2021-04-23 — irc logs at freenode.irclog.whitequark.org

2019-09-06 11:20 alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature

00:00 stikonas has quit [Remote host closed the connection]

00:05 <alyssa> Ouch, figured out what the draw_buffers fails are about.. affects all the way back to ES3.0

00:05 <alyssa> will fix tomorrow

00:14 vstehle has quit [Ping timeout: 252 seconds]

01:00 atler has quit [Killed (rothfuss.freenode.net (Nickname regained by services))]

01:00 atler has joined #panfrost

01:17 * icecream95 is hitting a stack overflow in Firefox

01:17 <icecream95> #250 0x0000ffffde35fec8 in panfrost_batch_submit

01:20 <alyssa> let me guess, blitter recursion?

01:20 <icecream95> Nope, just a *really* long dependency chain

01:23 <alyssa> blink

01:30 <icecream95> Splitting panfrost_batch_submit so that recursing through dependencies happens in a different function to actually submitting seems to help

01:34 kaspter has quit [Ping timeout: 240 seconds]

01:34 kaspter has joined #panfrost

01:34 jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]

01:36 jernej has joined #panfrost

01:36 jernej has quit [Client Quit]

01:39 jernej has joined #panfrost

01:44 <icecream95> The function used about 2 KB of stack space, times 250 levels of recursion makes 500 KB of stack, but the stack is only 256 KB

01:45 <icecream95> ("Shouldn't it have segfaulted long before 250 levels if the stack was that small?" "Uhh...")

01:53 <HdkR> icecream95: Automatic stack growing

01:59 <icecream95> Answer: I was using the stack space for a different build of Mesa, in the instance where it recursed 250 times it used only 1 KB of stack
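
[editor's sketch] The arithmetic in the exchange above: about 2 KB per frame times 250 levels is roughly 500 KB, which would not fit a 256 KB stack, while 1 KB per frame is roughly 250 KB, which just fits. The split icecream95 describes can be sketched as below. This is a minimal illustration only, not the Mesa change: `struct batch`, `batch_submit_one`, `batch_submit` and `MAX_DEPS` are hypothetical names, and the 2 KB scratch array merely stands in for large locals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPS 4

/* Hypothetical stand-in for a batch; not Mesa's struct panfrost_batch. */
struct batch {
    struct batch *deps[MAX_DEPS];
    size_t num_deps;
    int submitted;
    int order; /* position in the submission order, kept for illustration */
};

static int next_order = 0;

/* Heavy part: the large locals live here. This function never recurses,
 * so at most one copy of its frame is on the stack at any time.
 * (An optimizing compiler may inline it into the caller; a real build
 * might need a compiler-specific noinline attribute to keep the split.) */
static void batch_submit_one(struct batch *b)
{
    char scratch[2048]; /* stands in for roughly 2 KB of locals */
    memset(scratch, 0, sizeof(scratch));
    b->order = next_order++ + scratch[0];
    b->submitted = 1;
}

/* Light part: only walks the dependency chain, so each level of
 * recursion costs a small frame. Assumes the dependencies form a DAG. */
static void batch_submit(struct batch *b)
{
    assert(b != NULL);
    if (b->submitted)
        return;
    for (size_t i = 0; i < b->num_deps; i++)
        batch_submit(b->deps[i]);
    batch_submit_one(b);
}
```

With a 250-entry chain where each batch depends on the previous one, calling `batch_submit` on the last entry submits the first entry first and the last entry last, while the deep recursion only ever stacks the small walker frames.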

02:07 kaspter has quit [Ping timeout: 252 seconds]

02:07 kaspter has joined #panfrost

02:21 kaspter has quit [Ping timeout: 240 seconds]

02:25 <icecream95> I don't remember Xorg being this broken before--even glxgears crashes with MALI_BIFROST_TILER_pack: Assertion `values->fb_width >= 1' failed

02:27 kaspter has joined #panfrost

02:27 kaspter has quit [Excess Flood]

02:28 kaspter has joined #panfrost

02:32 kaspter has quit [Ping timeout: 240 seconds]

02:36 kaspter has joined #panfrost

02:56 kaspter has quit [Quit: kaspter]

02:57 kaspter has joined #panfrost

02:57 davidlt has joined #panfrost

02:57 camus has joined #panfrost

03:00 camus1 has joined #panfrost

03:01 kaspter has quit [Ping timeout: 265 seconds]

03:01 camus1 is now known as kaspter

03:02 camus has quit [Ping timeout: 252 seconds]

03:04 kaspter has quit [Read error: Connection reset by peer]

03:05 kaspter has joined #panfrost

03:09 kaspter has quit [Ping timeout: 252 seconds]

03:19 kaspter has joined #panfrost

03:38 <icecream95> The glxgears crashes are caused by one of the commits in cfe9bca9120..9d0ad7fd2e1

03:38 <alyssa> are those AFBC

03:39 <icecream95> alyssa: No, pan_image stuff

03:40 <alyssa> Oh.

03:40 * icecream95 sees !10415 fixes a commit in that range

03:40 <alyssa> that only affects es31 afaik

03:40 <alyssa> (the fix i mean)

03:42 <icecream95> All the commits in that range cause Xorg to give DATA_INVALID_FAULTs, making further bisecting difficult

03:44 <alyssa> bbrezillon: ^^^^

03:53 <icecream95> erm, efcb1e494b7..9d0ad7fd2e1

04:02 <icecream95> The bad commit is 9d0ad7fd2e1 ("panfrost: Patch the gallium driver to use pan_image_layout_init()") itself

04:02 <alyssa> lovely

04:09 <icecream95> If I use 9d0ad7fd2e1 for Xorg and 051d62cf041 for glxgears then it gives BadAlloc errors from X

04:13 * icecream95 wonders if it's a good idea to set breakpoints on a running Xorg instance

04:14 <alyssa> no.

04:17 * icecream95 boots speedy to SSH in and kill Xorg

05:00 vstehle has joined #panfrost

05:27 WoC has quit [Remote host closed the connection]

05:28 WoC has joined #panfrost

05:29 <icecream95> It seems the problem is just that pan_image_layout_init is returning false because (line_stride & 63) != 0

05:30 <icecream95> If I remove that if statement then everything seems to work fine

05:56 davidlt has quit [Remote host closed the connection]

05:56 davidlt has joined #panfrost

06:05 cowsay has joined #panfrost

06:07 cowsay_ has quit [Ping timeout: 265 seconds]

07:09 <bbrezillon> icecream95: hm, I remember having issues when the stride was not 64B aligned on Bifrost

07:09 <bbrezillon> but maybe that was not on linear buffers

07:13 <icecream95> bbrezillon: If there are issues for non-aligned strides (I didn't see any on G72) then panfrost_create_scanout_res would have to align the width to make the stride aligned

07:14 <bbrezillon> just to be sure, this problem happens when you import a buffer, right?

07:15 <icecream95> yes

07:17 <bbrezillon> then the align() done here is also problematic => https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/panfrost/lib/pan_texture.c#L633

07:17 <bbrezillon> and if (explicit_layout->line_stride < stride) might fail too

07:21 <bbrezillon> something like that => https://gitlab.freedesktop.org/-/snippets/1912

07:34 <icecream95> bbrezillon: That works, except you forgot to remove the old ALIGN_POT
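
[editor's sketch] The stride discussion above can be illustrated as follows. This is only one possible shape for the logic, under stated assumptions, and not the contents of the linked snippet or of `pan_image_layout_init()`: align the stride when the driver picks it, but accept an importer's explicit stride as long as it covers a full row, comparing against the unaligned minimum. `choose_line_stride`, `align_pot`, `is_aligned` and `STRIDE_ALIGN` are hypothetical names, and whether the hardware really needs 64-byte strides for linear buffers is left open, as it is in the conversation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STRIDE_ALIGN 64u /* the 64-byte alignment discussed above */

/* Round x up to a power-of-two alignment. */
static uint32_t align_pot(uint32_t x, uint32_t pot)
{
    assert(pot != 0 && (pot & (pot - 1)) == 0);
    return (x + pot - 1) & ~(pot - 1);
}

/* Note the parentheses: in C, != binds tighter than &, so an
 * unparenthesized "x & 63 != 0" would test "x & 1" instead. */
static bool is_aligned(uint32_t x, uint32_t pot)
{
    return (x & (pot - 1)) == 0;
}

/* Hypothetical helper: choose the line stride for a linear image.
 * explicit_stride == 0 means the driver allocates the buffer itself;
 * a non-zero value is a stride dictated by whoever exported the buffer. */
static bool choose_line_stride(uint32_t width, uint32_t bytes_per_pixel,
                               uint32_t explicit_stride, uint32_t *out)
{
    uint32_t min_stride = width * bytes_per_pixel;

    if (explicit_stride == 0) {
        /* Our own allocation: free to pad each row up to the alignment. */
        *out = align_pot(min_stride, STRIDE_ALIGN);
        return true;
    }

    /* Imported buffer: compare against the unaligned minimum, so a
     * tightly packed but unaligned stride is not rejected. */
    if (explicit_stride < min_stride)
        return false;

    *out = explicit_stride;
    return true;
}
```

For a 300-pixel-wide image at 4 bytes per pixel the minimum stride is 1200 bytes, which is not a multiple of 64. The sketch pads it to 1216 for a driver allocation and accepts 1200 unchanged for an import.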

08:30 camus has joined #panfrost

08:31 kaspter has quit [Ping timeout: 268 seconds]

08:31 camus is now known as kaspter

08:44 raster has joined #panfrost

08:59 warpme_ has joined #panfrost

09:13 stikonas has joined #panfrost

09:27 <wicast> hey guys, I'm finally able to run vkcube. But I'm not sure why it stays stationary and waits on a fence that has already finished.

09:28 <HdkR> vkcube with...which driver?

09:29 stikonas has quit [Remote host closed the connection]

09:29 stikonas has joined #panfrost

09:30 <wicast> panvk, in weston/wayland

09:31 <HdkR> Time to dive in to the source and figure out why it is sleeping on a fence. It isn't upstreamed for a reason you know :)

09:34 <bbrezillon> wicast: weird, I don't see that here

09:34 <bbrezillon> could it be a KMS fence you're blocked on?

09:35 <wicast> strace stays at ioctl(DRM_IOCTL_SYNCOBJ_WAIT...)

09:42 <wicast> I've tested with glmark2-es-wayland. Hardware should be good.

09:43 <wicast> oh, I get some error from dmesg

09:43 <wicast> [ 4323.980226] panfrost ffe40000.gpu: js fault, js=1, status=OUT_OF_MEMORY, head=0x40c3000, tail=0x40c3000

09:43 <wicast> [ 4323.984011] panfrost ffe40000.gpu: gpu sched timeout, js=1, config=0x7300, status=0x60, head=0x40c3000, tail=0x40c3000, sched_job=000000001fc9d5eb

09:43 <wicast> [ 4324.503657] panfrost ffe40000.gpu: gpu sched timeout, js=0, config=0x7300, status=0x8, head=0x40c33c0, tail=0x40c33c0, sched_job=00000000e9ba870f

09:47 <wicast> Same after I reboot(

09:47 <wicast> [ 3.982041] panfrost ffe40000.gpu: dev_pm_opp_set_regulators: no regulator (mali) found: -19

09:48 <wicast> Is this a known error? I have seen this somewhere else.

09:53 <bbrezillon> wicast: which kernel are you using?

09:53 <wicast> manjaro-arm

09:54 <wicast> 5.11.7-1

09:58 <bbrezillon> it should work just fine :-/

09:59 <bbrezillon> I mean, I don't see these faults here

10:01 <wicast> Just guessing, my sdcard reader is really unstable, could this cause some glitches in the kernel?

10:02 <wicast> it can fail to boot, sometime.

10:05 <bbrezillon> nope

10:06 <bbrezillon> the kernel is loaded in RAM at boot time

10:19 <bbrezillon> wicast: this is a G52, right?

10:20 <wicast> yes G52

10:20 <wicast> vim3 4G

10:20 <wicast> but I can't boot any more after I tried to reinstall the kernel(

10:21 <wicast> filesystem corrupted

10:22 <wicast> I need to buy an ssd first...

10:38 pendingchaos_ is now known as pendingchaos

11:22 <shadeslayer> alyssa: bbrezillon Plasma seems to want to disable the blur effect https://invent.kde.org/plasma/kwin/-/merge_requests/883 on panfrost, do you think you have cycles to spare to look into optimizing the effect?

11:23 <shadeslayer> alyssa: bbrezillon https://github.com/KDE/kwin/blob/master/src/effects/blur/blurshader.cpp is the shader

11:26 apol has joined #panfrost

11:46 <bbrezillon> shadeslayer: not immediately, but maybe you could open an issue in mesa (ideally with a trace we can replay)

11:46 <shadeslayer> apol: ^^

11:47 <shadeslayer> bbrezillon: trace might be tricky though, let's see

11:49 <macc24> i remember issue about blur stuff on kwin

11:49 stikonas has quit [Remote host closed the connection]

11:49 <macc24> #4687

11:51 stikonas has joined #panfrost

12:45 <raster> shadeslayer: realistically the blur needs to be isolated into a test of its own

12:46 <raster> eg just an app that uses the same algorithm to blur some image

12:46 <raster> before i bother with that i'm going to see if plasma works with the mali ddk and if they seem to be comparable in perf (ddk and panfrost)

12:47 stikonas has quit [Remote host closed the connection]

12:48 <raster> and dang task-kde wants to install 1.6g of stuff... poor emmc...

12:51 <tomeu> raster: will be interesting to know, thanks!

12:52 <raster> tomeu: i was just re-setting up my dual ddk vs panfrost thing after it bitrotted a bit

12:52 <raster> evas in efl also has a blur filter that uses the gpu too - i havent actually profiled it but its far easier to cook up a demo of that :)

12:53 <raster> but if i see a big difference between ddk and panfrost it's worth making a dedicated test case

12:53 <raster> it's certainly an interesting shader use case thats real-life

12:56 <raster> ok... now.. how do i get this blur working?

12:57 <tomeu> I think qml is different from weston in that it uses fbos

12:57 urjaman has quit [Read error: Connection reset by peer]

12:57 urjaman has joined #panfrost

12:58 <raster> arg

12:58 <raster> this is a bug fest

12:58 <raster> cant even use menus properly

12:58 <tomeu> I kind of remember others at collabora working on getting plasma work well with etnaviv or something like that

12:58 <tomeu> daniels: do you remember any details on that?

12:58 <tomeu> there was also something about qt not supporting wl_dmabuf or so

12:58 <daniels> QML, not KWin/Plasma

12:59 <raster> settings crashes the desktop...

12:59 <raster> hmm

12:59 <daniels> but yeah, QML is pretty FBO-happy, copies all over the place

13:00 <tomeu> but doesn't plasma use qml?

13:00 <raster> but wouldnt you use fbo's anyway to do your initial downscale

13:00 <daniels> tomeu: probably

13:00 <daniels> yep

13:01 <raster> well i would use fbo's for the intermediate buffers - it's kind of a necessary for a blur :)

13:03 <raster> :(

13:03 <tomeu> maybe panfrost needs to learn some new tricks so that those copies can happen with less bw?

13:04 <raster> i cant manage to enable blur... opening settings crashes the desktop... compositor itself seems to stay alive - not sure how they structured it but looks like desktop is a wl client

13:05 <raster> uh oh... well i guess this isnt going to work... plasma doesnt start with ddk - just a black screen

13:05 <raster> weston and enlightenment are all happy with both ddk and panfrost

13:05 <raster> :|

13:06 <raster> well let me quick and dirty use my blur alt-tab effect

13:40 camus has joined #panfrost

13:41 kaspter has quit [Ping timeout: 252 seconds]

13:41 camus is now known as kaspter

13:59 <daniels> raster: by the QML thing, I don't mean 'FBOs are a problem', I mean 'QML doesn't see through render chains and will insert totally unnecessary intermediate FBOs when you could just draw from A->B instead'

14:03 <alyssa> that sounds like a them problem

14:04 <raster> daniels: oh... ugh.

14:05 cphealy has quit [Remote host closed the connection]

14:09 <tomeu> alyssa: well, if we can learn about something that could improve panfrost...

14:24 <raster> fan-friggin-tastic

14:24 <raster> installing kde involved a libwayland upgrade and this now has segv's inside libwayland-server ... yay

14:27 cphealy has joined #panfrost

14:29 Elpaulo has joined #panfrost

14:32 <alyssa> ...

14:41 <raster> wat?

14:41 <raster> ==143342== Jump to the invalid address stated on the next line

14:41 <raster> ==143342== at 0x0: ???

14:41 <raster> ==143342== Address 0x0 is not stack'd, malloc'd or (recently) free'd

14:42 <raster> inside wl_signal_emit()

14:42 <raster> #0 0x0000000000000000 in ()

14:42 <raster> #1 0x0000000000265024 in wl_signal_emit (data=0x43e37610, signal=0x43e37618) at /usr/include/wayland-server-core.h:478

14:42 <raster> #2 _e_comp_wl_buffer_cb_destroy (listener=0x43e37628, data=<optimized out>) at ../src/bin/e_comp_wl.c:999

14:42 <raster> ughh...

14:43 <raster> this is one of those days where to look at thing A i have to fix a chain of bugs over at B ...

14:43 <raster> B will need to fix C then D .. then eventually i can go back to A... hooray

14:53 kaspter has quit [Quit: kaspter]

15:05 <raster> this seems to be a change in libwayland where during client destroy i cant call signals registered...

15:07 <raster> or well the state is nulled out.. hmm

15:18 <alyssa> this is so broken

15:18 <alyssa> so many layers of wrong

15:28 <raster> what is broken?

15:28 <raster> because something weird is going on inside libwayland now...

15:29 <raster> here's the funtimes...

15:31 rcf has quit [Quit: WeeChat 3.2-dev]

15:31 <raster> https://termbin.com/246k

15:32 <raster> the exact same ptr to the same memory struct (wl_listener *)

15:32 <raster> in the parent frame notify is a valid ptr

15:32 <raster> in the child... it's not.

15:32 WoC has quit [Remote host closed the connection]

15:32 <raster> the list has only a single node...

15:36 <raster> yargh

15:41 <alyssa> woof woof

15:58 <daniels> raster: are you trying to remove a signal handler from a signal handler? because that’s guaranteed corruption

15:59 <daniels> you cannot change the list during a walk
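
[editor's sketch] daniels' point about changing a list during a walk can be illustrated with a small self-contained listener list. This is not libwayland's code: `struct listener`, `struct signal`, `signal_emit` and the callbacks are hypothetical names. The emit loop saves the next pointer before invoking a callback, which makes it tolerate a listener removing itself, and nothing more.

```c
#include <assert.h>
#include <stddef.h>

struct listener;
typedef void (*notify_fn)(struct listener *l, void *data);

/* Hypothetical listener node in a circular doubly linked list. */
struct listener {
    struct listener *prev, *next;
    notify_fn notify;
    int calls; /* how many times notify ran, kept for illustration */
};

struct signal {
    struct listener head; /* sentinel node, never invoked */
};

static void signal_init(struct signal *s)
{
    s->head.prev = s->head.next = &s->head;
    s->head.notify = NULL;
    s->head.calls = 0;
}

/* Append a listener at the tail of the list. */
static void signal_add(struct signal *s, struct listener *l)
{
    l->prev = s->head.prev;
    l->next = &s->head;
    s->head.prev->next = l;
    s->head.prev = l;
}

static void listener_remove(struct listener *l)
{
    l->prev->next = l->next;
    l->next->prev = l->prev;
    l->prev = l->next = NULL;
}

/* Walk the list, saving the next pointer before each callback so the
 * current node may unlink itself. If a callback removes the *next*
 * node instead, the saved pointer goes stale and the walk breaks. */
static void signal_emit(struct signal *s, void *data)
{
    struct listener *l = s->head.next;
    while (l != &s->head) {
        struct listener *next = l->next;
        assert(l->notify != NULL); /* a NULL callback would jump to address 0 */
        l->notify(l, data);
        l = next;
    }
}

static void count_only(struct listener *l, void *data)
{
    (void)data;
    l->calls++;
}

static void count_and_remove_self(struct listener *l, void *data)
{
    (void)data;
    l->calls++;
    listener_remove(l);
}
```

Emitting twice with one self-removing listener and one plain listener runs the first callback once and the second twice. Without the saved next pointer, the first emit would follow the NULL link left behind by the removal.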

16:26 <raster> daniels: actually just calling the signal handler that was already registered

16:28 <raster> so the signal handler is just registered in the buffer destroy_signal (a wl_signal)

16:28 <raster> that wl_signal is in our own data structs (that is our buffer wrapper which tracks a bunch of things)

16:29 <raster> something nulled out the notify

16:29 <raster> (notify cb ptr)

16:36 rcf has joined #panfrost

17:17 stikonas has joined #panfrost

17:22 <raster> oh now

17:22 <raster> heisenbug

17:23 <raster> i now compile libwayland with -O0 and i no longer have crashes.. it's all fine... wtf...

17:23 <raster> oh wait

17:23 <raster> no - i'm not using my compiled libwayland anymore - using the system pkgs again which first started the crashes...

17:23 <raster> argh

17:24 <raster> i hate heisenbugs!

17:28 <raster> alyssa: bad news - in performance ticket :|

17:34 <alyssa> ruh roh

17:35 WoC has joined #panfrost

19:21 davidlt has quit [Ping timeout: 268 seconds]

19:47 macc24 has quit [Ping timeout: 250 seconds]

19:59 <daniels> raster: it sounds like your nested structure is invalidating assumptions

20:10 macc24 has joined #panfrost

20:31 * alyssa forgets how differential equations work

20:34 WoC has quit [Remote host closed the connection]

20:34 WoC has joined #panfrost

21:29 stikonas has quit [Remote host closed the connection]

21:29 kherbst has quit [Ping timeout: 260 seconds]

21:29 stikonas has joined #panfrost

21:31 karolherbst has joined #panfrost

21:32 <cphealy> Does Panfrost support YUV render targets with Mali GPUs that support this?

21:32 warpme_ has quit [Quit: Connection closed for inactivity]

21:33 <raster> daniels: the callback is stored in a list of listeners attached to our datastruct. this is really weird... but now the bug went away magically after i rebooted...

21:33 <raster> ¯\_(ツ)_/¯

21:53 neonking has quit [Remote host closed the connection]