alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
_whitelogger has joined #panfrost
<SolidHal> Cool, so assuming everything works, installing all of those packages should enable the use of panfrost?
<SolidHal> or rather grabbing that source package, building it, and installing it since it doesn't look like prebuilt packages are available
<SolidHal> *for everything
<alyssa> Yeah, with a 5.2+ kernel with DRM_PANFROST=m loaded in
<alyssa> Mesa 19.2 or master (either way) built and installed
<alyssa> Can't think of anything else
<SolidHal> Awesome, thanks a ton.
<anarsoul> alyssa: dts
rcf has quit [Ping timeout: 258 seconds]
<alyssa> anarsoul: Ah, yeah, right
<alyssa> Hal is working on RK3288 which should be dts'd as is
<alyssa> bbrezillon: "in the vt1,frag1 + vt2,frag2 example, can vt1 and vt2 run in //"
<alyssa> Not sure what you mean by //?
<anarsoul> parallel
<alyssa> Ahhh
<alyssa> Yes, conceptually vt1/vt2 could run in parallel
<alyssa> In practice, AFAIK they can't, because the hardware only has one job slot you can use for that (at least as implemented in the DRM kernel driver)
<alyssa> See some of the emails with stepri0
<anarsoul> oh, alyssa, I have a question for you
<anarsoul> how do you recover from GPU hangs in userspace driver?
<alyssa> We don't since they almost never happen
<anarsoul> infinite loop?
<alyssa> Shouldn't happen
<anarsoul> you can have a shader with infinite loop in it
<alyssa> Not in the backend!
<anarsoul> you can't detect all the infinite loops
* alyssa hears Turing rolling in his grave
<alyssa> I suppose not... In practice this has not been an issue
<alyssa> With WebGL, conceivably it could be.
<alyssa> WebGL is usually the problem.
<alyssa> Joking aside, the hardware has a watchdog. If the job takes too long, it times out which is equivalent to a fault.
<alyssa> As long as you don't overwhelm it, it usually recovers automatically, since the hw/kernel is reliable
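A minimal userspace-side sketch of that behaviour, assuming the drm_panfrost_wait_bo ioctl from the 5.2 panfrost uapi (a handle plus an absolute timeout_ns); the helper name and include path are illustrative:

    /* Sketch: bounded wait on a job's write BO. If the GPU watchdog killed a
     * hung job, the kernel resets the GPU and the wait returns; a persistent
     * timeout here is treated the same way as a fault. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <time.h>
    #include <xf86drm.h>
    #include "drm-uapi/panfrost_drm.h"   /* illustrative include path */

    static bool
    job_bo_idle_within(int fd, uint32_t bo_handle, int64_t budget_ns)
    {
            struct timespec now;
            clock_gettime(CLOCK_MONOTONIC, &now);

            struct drm_panfrost_wait_bo wait = {
                    .handle = bo_handle,
                    /* timeout_ns is an absolute CLOCK_MONOTONIC deadline */
                    .timeout_ns = (int64_t)now.tv_sec * 1000000000ll +
                                  now.tv_nsec + budget_ns,
            };

            /* 0 on success; on timeout the ioctl fails, meaning the job is
             * still running (or hung and about to be reset) */
            return drmIoctl(fd, DRM_IOCTL_PANFROST_WAIT_BO, &wait) == 0;
    }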
<anarsoul> right
<anarsoul> but what if vertex job failed
<alyssa> What about it?
<anarsoul> do you run fragment job in this case?
<alyssa> Dunno
rcf has joined #panfrost
vstehle has quit [Ping timeout: 246 seconds]
<TheCycoTWO> has anyone investigated the weston 7 segfault?
lvrp16 has quit [Read error: Connection reset by peer]
austriancoder has quit [Ping timeout: 276 seconds]
austriancoder has joined #panfrost
lvrp16 has joined #panfrost
belgin has joined #panfrost
vstehle has joined #panfrost
fysa has joined #panfrost
chewitt has quit [Quit: Adios!]
NeuroScr has joined #panfrost
davidlt has joined #panfrost
davidlt_ has joined #panfrost
belgin has quit [Quit: Leaving]
davidlt has quit [Ping timeout: 245 seconds]
davidlt__ has joined #panfrost
davidlt_ has quit [Read error: Connection reset by peer]
davidlt_ has joined #panfrost
davidlt__ has quit [Ping timeout: 240 seconds]
<bbrezillon> alyssa: ok, I think the part I was missing is that tiles are written back to DRAM and the fragment job has an fbd that describes where those tiles are (tiler_heap I guess)
<daniels> TheCycoTWO: which 'weston 7 segfault'? a fair few people here are running weston 7 and it works fine
<bbrezillon> anarsoul: just had a look at the scheduler code, and the fence is not signaled when a job fails. AFAIU, if the vertex job fails repeatedly (the scheduler retries a few times), the fragment job will never be executed (actually, all jobs coming from this context/FD will be ignored after that)
warpme_ has joined #panfrost
adjtm has quit [Ping timeout: 246 seconds]
warpme_ has quit [Quit: warpme_]
warpme_ has joined #panfrost
<narmstrong> tomeu: I fixed the rootfs build, added the tags and switched to my kernel branch + fixed the arm64 config https://gitlab.freedesktop.org/narmstrong/mesa/-/jobs/592083
<narmstrong> tomeu: the job submission won't work, but I'll test locally
adjtm has joined #panfrost
warpme_ has quit [Quit: warpme_]
<narmstrong> tomeu: using variables is better for handling which board to test, otherwise the pipeline is set to `stuck` if we use tags https://gitlab.freedesktop.org/narmstrong/mesa/pipelines/62723
raster has joined #panfrost
<narmstrong> I managed to get artifacts; I'm testing them, then I'll set up the job submission
davidlt_ is now known as davidlt
<tomeu> narmstrong: oh, forgot to CC you on the CI changes I sent yesterday :/
<narmstrong> tomeu: no problem, I'm locally testing the artifact, and it works :-)
<narmstrong> tomeu: using surfaceless means it won't need a valid display anymore right ?
NeuroScr has quit [Quit: NeuroScr]
NeuroScr has joined #panfrost
<narmstrong> :: All test completed.
<narmstrong> ok: 11570, warn: 54, bad: 5771, skip: 76, total: 11767
<narmstrong> nice !
yann has joined #panfrost
<tomeu> narmstrong: yep
<tomeu> narmstrong: btw, I have started work on moving our CI to mesa's main CI scripts
<tomeu> so it's probably not worth it to make many changes to the current panfrost setup
<narmstrong> tomeu: ok, so in this case tags are preferable, to test only t820 on my runner
raster has quit [Remote host closed the connection]
<tomeu> narmstrong: maybe yeah, is the runner ready now?
<narmstrong> tomeu: nop, I need to finalize the lava part
<tomeu> ok
<tomeu> I need to check some regressions that have come from core mesa
<tomeu> narmstrong: btw, one more comment: for now we should probably only test one arch per board
<tomeu> once we think we have free capacity, we can change that
chewitt has joined #panfrost
raster has joined #panfrost
<TheCycoTWO> daniels: on start, right after it shows the GL info: "segmentation fault (core dumped) weston" [rest of the terminal output interleaved/garbled]
<daniels> TheCycoTWO: i've not seen that and it generally works really well for us - one thing you can do is to redirect stdout and stderr to a file to capture that, as well as look at the core dump to see where the crash happened
<TheCycoTWO> Stack trace shows #0 0x0 n/a (n/a) #1 0x0000ffffab6e0668 n/a (gl-renderer.so) and one more n/a level above that.
<TheCycoTWO> Was trying to build weston myself with symbols but ran into complaints about egl.
<TheCycoTWO> That's archlinuxarm on kevin with latest master mesa (and several other mesa versions back to its release) linux 5.2
* TheCycoTWO posted a file: weston-start.log (9KB) < https://matrix.org/_matrix/media/v1/download/matrix.org/yysLfNBgeULeZRcRcvidkUST >
<TheCycoTWO> will the coredump be useful without symbols?
warpme_ has joined #panfrost
chewitt has quit [Quit: Adios!]
<TheCycoTWO> daniels ^?
<daniels> TheCycoTWO: unfortunately not really useful without symbols, no - getting a dump with symbols would be really useful
<TheCycoTWO> ok. First thing to do is figure out why it says "Runtime dependency egl found: NO" when I run meson then
raster has quit [Remote host closed the connection]
<alyssa> bbrezillon: Re tiles written to memory and fragment reading it back, correct. Think what you will of the efficiency..... but that's what the tiler_* structures are for.
raster has joined #panfrost
raster- has joined #panfrost
<TheCycoTWO> ah, mesa had stopped providing the pc file for egl when compiled with glvnd - um - back in july. I have no need for glvnd anyway.
<bbrezillon> alyssa: I have a version with the dep graph stuff that starts working (at least CI is happy with it :)) => https://gitlab.freedesktop.org/bbrezillon/mesa/tree/panfrost-job-rework
<bbrezillon> but I have a few questions
<bbrezillon> looks like I have a few BOs that I was expecting to be used only by the vertex+tiler jobs, but if I don't attach them to the fragment job it starts failing
<bbrezillon> alyssa: the SSBO buf for instance
<alyssa> bbrezillon: Ah! :)
<alyssa> Fragment shaders run during FRAGMENT, not during TILER
<alyssa> Ummm
<alyssa> VERTEX jobs = upload the vertex shader, run the vertex shader
<alyssa> TILER jobs = upload the fragment shader, run the *tiler*
<alyssa> FRAGMENT jobs = run the fragment shader, copy to framebuffer
<alyssa> So if a fragment shader needs a BO, it needs to be attached to the FRAGMENT job
<alyssa> But if a vertex shader needs it, to the VERTEX
<alyssa> And if tiling itself needs it (varyings, tiler structures, fragment shader binaries, etc), to the TILER
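A rough sketch of the rule described above; all of the batch/BO names here (panfrost_batch, panfrost_batch_add_bo, PAN_JOB_*, draw_bos) are made up for illustration rather than the exact Mesa API:

    /* Sketch: attach each BO to the job stage(s) that actually touch it. */
    struct panfrost_bo;      /* a GPU buffer object (illustrative) */
    struct panfrost_batch;   /* one VERTEX/TILER/FRAGMENT job chain (illustrative) */

    enum pan_job_type {
            PAN_JOB_VERTEX,     /* runs the vertex shader */
            PAN_JOB_TILER,      /* runs the tiler */
            PAN_JOB_FRAGMENT,   /* runs the fragment shader, writes the framebuffer */
    };

    void panfrost_batch_add_bo(struct panfrost_batch *batch,
                               struct panfrost_bo *bo,
                               enum pan_job_type stage);   /* hypothetical helper */

    struct draw_bos {
            struct panfrost_bo *vertex_shader, *attributes;
            struct panfrost_bo *varyings, *fragment_shader;
            struct panfrost_bo *ssbo;
    };

    static void
    attach_draw_bos(struct panfrost_batch *batch, struct draw_bos *d)
    {
            /* vertex shader binary + inputs: needed while VERTEX runs */
            panfrost_batch_add_bo(batch, d->vertex_shader, PAN_JOB_VERTEX);
            panfrost_batch_add_bo(batch, d->attributes,    PAN_JOB_VERTEX);

            /* varyings, tiler structures, fragment shader *binary*: TILER */
            panfrost_batch_add_bo(batch, d->varyings,        PAN_JOB_TILER);
            panfrost_batch_add_bo(batch, d->fragment_shader, PAN_JOB_TILER);

            /* anything the fragment shader reads while shading (e.g. the SSBO
             * bbrezillon hit) must also be on the FRAGMENT job */
            panfrost_batch_add_bo(batch, d->ssbo, PAN_JOB_FRAGMENT);
    }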
raster- has quit [Read error: Connection reset by peer]
NeuroScr has quit [Quit: NeuroScr]
<bbrezillon> alyssa: ok
<TheCycoTWO> ok - weston 7 is working for me now. (mesa built without glvnd) so I guess it's not panfrost's fault at any rate (probably my fault)
NeuroScr has joined #panfrost
<bbrezillon> alyssa: so, the FRAGMENT job knows where the fragment shader (and associated BOs) are
<bbrezillon> something defined in the fbd I guess
<tomeu> alyssa: o/
<tomeu> alyssa: do you know why this commit broke two tests in panfrost, but not on llvmpipe? https://gitlab.freedesktop.org/tomeu/mesa/commit/b6384e57f5f6e454c06ec1ada1c1138dd0dc84f2
<tomeu> 2019-09-11T11:18:09 dEQP-GLES2.functional.uniform_api.random.3 Fail REGRESSED from (NotListed)
<tomeu> 2019-09-11T11:18:09 dEQP-GLES2.functional.uniform_api.random.21 Fail REGRESSED from (NotListed)
<alyssa> bbrezillon: Yeah, somehow
<alyssa> I suspect it's somewhere in the tiler_heap but I never investigated
<TheCycoTWO> popolon: you were the other person that mentioned weston 7, and you are on almost an identical setup to me. Did you sort it out already, or see my libglvnd notes?
<alyssa> tomeu: That is a good question :|
<alyssa> I mean
<alyssa> The obvious answer is that Panfrost ingests NIR but llvmpipe ingests LLVM IR :P
<alyssa> (and that commit is in the glsl_to_nir pass)
<tomeu> guess that iris isn't broken by it either, but I haven't seen the CI run for it
<alyssa> Mm, the uniform/varying/attribute counting code is... fragile... so I'm not surprised it broke on some edge cases
<alyssa> Could you grab a testlog.xml so I can see what the shaders are?
<tomeu> hmm, let me check
* tomeu is at plumbers now
raster has quit [Read error: Connection reset by peer]
<alyssa> How's the plumbing going?
<tomeu> there's no plumbing that I can see, only plumbing professionals going around and eating and drinking
<alyssa> No plumbing?
<alyssa> What if you need to use the washroom? :/
raster has joined #panfrost
raster has quit [Read error: Connection reset by peer]
raster has joined #panfrost
<tomeu> hmm, maybe there's a secret plumber somewhere...
<bbrezillon> alyssa, daniels, tomeu: looks like -bideas prefers the serialization => http://code.bulix.org/lje2vf-865832
<narmstrong> robmur01: thanks ! testing them right away
<daniels> bbrezillon: *blink*
urjaman has joined #panfrost
urjaman has quit [Quit: WeeChat 2.5]
urjaman has joined #panfrost
davidlt has quit [Ping timeout: 240 seconds]
warpme_ has quit [Quit: warpme_]
<tomeu> bbrezillon: any ideas why it would?
<alyssa> bbrezillon: What's the difference between v1 and v2 of serialization here?
<alyssa> Also, it is possible the parallel submission code has a subtle bug that could manifest as suboptimal performance (even if it's still correct)
<tomeu> narmstrong: it helps to have arm people around :p
<bbrezillon> alyssa: v1 is the version I sent last week
<bbrezillon> v2 is the one using a dep graph to delay even more the batch submission
<bbrezillon> and I'd expect v2 to be faster than v1 once Steven's patch has landed
<bbrezillon> tomeu: nope
adjtm has quit [Ping timeout: 258 seconds]
<bbrezillon> well, delaying the submission also means the GPU starts executing jobs later, and I suspect it makes a difference for GPU-bound workloads
<bbrezillon> or it could be a bug, as pointed by alyssa :)
<tomeu> yeah, I would bet on that
<bbrezillon> hm, I applied Steven's patch and it doesn't change the results
<tomeu> we need to improve tools to see how CPU and GPU load affect throughput
davidlt has joined #panfrost
raster has quit [Remote host closed the connection]
adjtm has joined #panfrost
yann has quit [Ping timeout: 276 seconds]
jolan has quit [Quit: leaving]
jolan has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
NeuroScr has joined #panfrost
<alyssa> Agh, GLSL IR, my eyes! :p
<alyssa> I just meant the GLSL itself (and also the NIR)
<alyssa> bbrezillon: Hmm, I'm trying to think why the dep graph could possibly be slower
<alyssa> "delaying the submission also means the GPU starts executing jobs later, and I suspect it makes a difference for gpu bound workloads"
<alyssa> This is a very real possibility and something I thought about; I think I had concluded to myself it should be negligible, but let's see..
<alyssa> The drop between v1/v2 on the simple scenes is probably just the extra overhead
<alyssa> Since there's no dependent FBOs or anything, so there's nothing to delay really
<alyssa> (We should try to keep down the overhead but again, negligible)
<alyssa> v2 is a big win on -bdesktop:effect=shadow
<alyssa> If I had to guess it maybe is eliminating wallpapers and things
<alyssa> I'm trying to figure out why -bideas is hurt with both v2 and v1..
* alyssa is suspecting some sort of bug introducing waits in places they shouldn't be but she's not sure
yann has joined #panfrost
<bbrezillon> alyssa: looks like calling bo_wait() (even with a 0 timeout, or when the BO is already ready) has a non-negligible impact
<bbrezillon> the userspace -> kernelspace context switch hurts
davidlt has quit [Ping timeout: 244 seconds]
<alyssa> bbrezillon: Eek, yeah, that sounds like it could definitely contribute..
<alyssa> Wonder if that can be mitigated in userspace somehow...?
<anarsoul> why do you call bo_wait()?
<bbrezillon> alyssa: it does not explain the 100 fps diff
<bbrezillon> but accounts for at least 20 AFAICT
<bbrezillon> we can mark BOs ready, but they need to be tested at least once
<alyssa> bbrezillon: I'm questioning if we made a bad ABI choice that we're stuck with now :-(
* alyssa wishes she understood fences/syncs when this stuff was first being done
<alyssa> Wait
<alyssa> bbrezillon: Why can't we use syncs?
<alyssa> When a BO is in flight, track "BO<->sync"
<alyssa> -----------Oh!
<alyssa> I am in Class but you know what this seems like fun, sec
<alyssa> Yeah, can't we specify the dependency graph in terms of syncs and eliminate the waits entirely?
<alyssa> You would still need wait_bo for shared BOs, I guess.... but the majority of BOs we deal with are not shared so it doesn't -terribly- matter
<alyssa> If there's really a reason that can't work, I mean, we can patch the kernel to do waits on `bo_handles` automatically, but ideally we make it work without any ABI changes so everything continues to work against 5.2
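A sketch of that idea, assuming the 5.2 drm_panfrost_submit uapi (bo_handles, in_syncs, out_sync) and libdrm's drmSyncobjCreate; the per-BO "last writer" bookkeeping (tracked_bo, remember_writer) is made up:

    /* Sketch: give every submit its own out_sync and remember it as the last
     * writer of each BO the job touches.  A later batch that reads one of
     * those BOs passes that syncobj in in_syncs instead of calling bo_wait(). */
    #include <stdint.h>
    #include <xf86drm.h>
    #include "drm-uapi/panfrost_drm.h"   /* illustrative include path */

    struct tracked_bo {
            uint32_t handle;
            uint32_t last_write_syncobj;   /* 0 when the BO is idle */
    };

    /* hypothetical bookkeeping helper */
    void remember_writer(struct tracked_bo *bos, uint32_t count, uint32_t syncobj);

    static int
    submit_batch(int fd, uint64_t first_job, uint32_t requirements,
                 uint32_t *bo_handles, uint32_t bo_count,
                 uint32_t *dep_syncobjs, uint32_t dep_count,
                 struct tracked_bo *tracked)
    {
            uint32_t out_sync;
            if (drmSyncobjCreate(fd, 0, &out_sync))
                    return -1;

            struct drm_panfrost_submit submit = {
                    .jc = first_job,
                    .in_syncs = (uintptr_t)dep_syncobjs,   /* wait on deps in-kernel */
                    .in_sync_count = dep_count,
                    .out_sync = out_sync,                  /* signalled when done */
                    .bo_handles = (uintptr_t)bo_handles,
                    .bo_handle_count = bo_count,
                    .requirements = requirements,
            };

            if (drmIoctl(fd, DRM_IOCTL_PANFROST_SUBMIT, &submit))
                    return -1;

            /* every BO written by this batch now depends on out_sync */
            remember_writer(tracked, bo_count, out_sync);
            return 0;
    }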
<anarsoul> bbrezillon: just check first BO, if it's ready - grab it, if it's not - others aren't ready either, so allocate new BO. Make sure that you add BO to list tail when you free them
<anarsoul> that'd be 1-2 syscalls per BO allocation and that's not bad
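A sketch of that allocation strategy; the list layout and the bo_is_ready()/bo_alloc_new() helpers are illustrative, with bo_is_ready() standing in for a wait_bo call with a 0 timeout:

    /* Sketch: keep freed BOs on the tail of a free list; on allocation, test
     * only the head.  If the oldest freed BO is still busy, the more recently
     * freed ones behind it are (roughly) busy too, so allocate a fresh BO
     * rather than scanning the list. */
    #include <stdbool.h>
    #include <stddef.h>

    struct cached_bo {
            struct cached_bo *next;
            /* ... handle, size, CPU mapping ... */
    };

    struct bo_cache {
            struct cached_bo *head;   /* oldest free BO: first to become idle */
            struct cached_bo *tail;   /* newest free BO: append here on free */
    };

    bool bo_is_ready(struct cached_bo *bo);        /* wait_bo with a 0 timeout */
    struct cached_bo *bo_alloc_new(size_t size);   /* fresh allocation */

    static struct cached_bo *
    bo_cache_get(struct bo_cache *cache, size_t size)
    {
            struct cached_bo *bo = cache->head;

            if (bo && bo_is_ready(bo)) {           /* 1 syscall on the hit path */
                    cache->head = bo->next;
                    if (!cache->head)
                            cache->tail = NULL;
                    return bo;
            }

            /* head still in flight => don't bother scanning the rest */
            return bo_alloc_new(size);             /* +1 syscall for the alloc */
    }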
* alyssa reads up on how sync objects work
<bbrezillon> alyssa: that's what I do
<bbrezillon> to track job lifetime
<bbrezillon> but we still need to wait for something when testing BO readiness
<bbrezillon> and calling bo_wait() is the same as calling wait_syncobj
<bbrezillon> anarsoul: exactly what we do already
<bbrezillon> we have extra BO waits in transfer_map()
<bbrezillon> but that's all
<anarsoul> it doesn't look like it's possible to optimize it further
<bbrezillon> the big difference is that, by serializing jobs (and waiting on their out_sync fences), we had a single sync point, instead of one per BO-cache alloc + one per transfer_map
<anarsoul> don't serialize jobs?
<bbrezillon> that's what I'm trying to do
<bbrezillon> and it regresses the -bideas benchmark
<bbrezillon> it's definitely not the only problem, but it still accounts for almost 20 fps out of 100
<anarsoul> interesting
<bbrezillon> what I noticed is that, in that test, batching is not possible (we only ever have one job to submit)
<bbrezillon> and when that happens, it seems we have a huge perf drop
<anarsoul> profile it?
<anarsoul> bbrezillon: also check if you're actually getting BO from cache or you're allocating it
<bbrezillon> I'm definitely not an expert in profiling, but I tried callgrind, and didn't spot anything obvious
<anarsoul> try perf and flamegraph
<bbrezillon> good point, I'll check if the cache does its job
* anarsoul loves flamegraphs
<bbrezillon> thanks for the tip, I'll try that
<bbrezillon> anarsoul: I have a pretty decent number of cache hits
<anarsoul> then profile it
<bbrezillon> yep, will do that tomorrow
<bbrezillon> anarsoul, alyssa: thanks for your help
stikonas has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
NeuroScr has joined #panfrost
<alyssa> I swear by the gnome profiling thing
<HdkR> This a wrapper around perf?
<alyssa> Dunno
<bbrezillon> anarsoul: flamegraph is awesome!
<anarsoul> yeah, told you :)
<HdkR> I use flamegraphs all the time
<bbrezillon> alyssa: I think I found the problem
<HdkR> flamegraph with xray and perf works great
<anarsoul> who's the culprit?
<bbrezillon> it's a mix of "panfrost is currently a bit lax regarding resource sync on transfer_map" (which my patch is addressing)
<bbrezillon> and "the kernel driver is a bit to restrictive when it comes to reserving BOs that we only read from"
<anarsoul> what side of 'we' you're referring to?
<bbrezillon> I think we'll have to extend the submit ioctl
<anarsoul> GPU? CPU?
<bbrezillon> GPU
<anarsoul> you don't have per-bo flags in submit ioctl yet?
<anarsoul> :qa
<anarsoul> oops
<bbrezillon> no, we don't
<anarsoul> wrong window :)
<bbrezillon> hm, let me check
<bbrezillon> no, we definitely don't have that
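A purely hypothetical sketch of what such an extension could look like; nothing here exists in the panfrost uapi, and the struct layout and PANFROST_BO_REF_* names are made up for illustration:

    /* Hypothetical submit ioctl extension with per-BO access flags, so the
     * kernel could take a shared (read) reservation instead of an exclusive
     * one for BOs a job only reads from. */
    #include <stdint.h>

    #define PANFROST_BO_REF_READ   (1u << 0)
    #define PANFROST_BO_REF_WRITE  (1u << 1)

    struct drm_panfrost_submit_v2 {
            uint64_t jc;              /* GPU address of the first job descriptor */
            uint64_t in_syncs;        /* pointer to an array of syncobj handles */
            uint32_t in_sync_count;
            uint32_t out_sync;
            uint64_t bo_handles;      /* pointer to an array of u32 BO handles */
            uint64_t bo_flags;        /* NEW: parallel array of PANFROST_BO_REF_*
                                         flags, one entry per handle */
            uint32_t bo_handle_count;
            uint32_t requirements;
    };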
enunes has quit [Quit: ZNC 1.7.2 - https://znc.in]
enunes has joined #panfrost
cwabbott has quit [Remote host closed the connection]
griffinp has quit [Quit: ZNC - http://znc.in]
cwabbott has joined #panfrost
stikonas has quit [Remote host closed the connection]
<alyssa> bbrezillon: Hm, not sure I follow. Could you elaborate?