alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
vstehle has quit [Ping timeout: 276 seconds]
NeuroScr has quit [Quit: NeuroScr]
NeuroScr has joined #panfrost
<bbrezillon> alyssa: right now we add an exclusive fence to all BOs passed to the submit ioctl, which means jobs reading the same BO can't run concurrently, but that also means userspace has to wait for all jobs reading this BO to finish (because of the wait_bo() call in transfer_map())
<bbrezillon> if we add the concept of access type (rw or readonly), we can avoid useless waits in such situations
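A minimal kernel-side sketch of the idea being discussed, under stated assumptions: the per-BO access flags and struct below are hypothetical, not the actual panfrost UAPI, and the dma_resv helpers are the ones the kernel exposed around that time. Writers take the exclusive fence while readers only add a shared fence, so jobs that only read a BO can overlap and the wait_bo() in transfer_map() does not have to wait for readers.

    #include <linux/dma-resv.h>

    /* Hypothetical per-BO access flags -- not the actual panfrost UAPI. */
    #define PANFROST_BO_ACCESS_READ   (1 << 0)
    #define PANFROST_BO_ACCESS_WRITE  (1 << 1)

    struct panfrost_submit_bo {      /* hypothetical per-BO descriptor */
            __u32 handle;
            __u32 access;            /* READ, WRITE or both */
    };

    /* Sketch: only writers get the exclusive fence; readers add a shared
     * fence (a real implementation would also reserve the shared slot
     * with dma_resv_reserve_shared() first). */
    static void attach_job_fence(struct dma_resv *resv,
                                 struct dma_fence *fence,
                                 u32 access)
    {
            if (access & PANFROST_BO_ACCESS_WRITE)
                    dma_resv_add_excl_fence(resv, fence);
            else
                    dma_resv_add_shared_fence(resv, fence);
    }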
_whitelogger has joined #panfrost
vstehle has joined #panfrost
fysa has quit []
fysa has joined #panfrost
fysa has quit [Read error: Connection reset by peer]
mani_s has quit [Read error: Connection reset by peer]
mani_s has joined #panfrost
warpme_ has joined #panfrost
guillaume_g has joined #panfrost
warpme_ has quit [Quit: warpme_]
yann has quit [Ping timeout: 246 seconds]
NeuroScr has quit [Quit: NeuroScr]
davidlt has joined #panfrost
yann|work has joined #panfrost
davidlt has quit [Ping timeout: 245 seconds]
warpme_ has joined #panfrost
<alyssa> Right, okay
<eballetbo[m]> alyssa, bbrezillon: I saw some patches around panfrost on linux-next for the next merge window, is there any possibility that one of them fixes my issue with the artefacts on my external display?
<bbrezillon> eballetbo[m]: I can't tell, I guess it's worth a try :)
* eballetbo[m] was trying to avoid building linux-next after a few days of holidays :-P Yeah, worth giving it a try
guillaume_g has quit [Quit: Konversation terminated!]
guillaume_g has joined #panfrost
<narmstrong> tomeu: successfully submitted job https://gitlab.freedesktop.org/narmstrong/mesa/-/jobs/598263
<narmstrong> tomeu: but I needed to tweak the job definition, we use the u-boot boot method
<narmstrong> but now I have a weird `[ 16.014376] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038` in drm_sched_increase_karma :-/
megi has joined #panfrost
<tomeu> narmstrong: guess it could be a regression in gpu-sched
<tomeu> tomorrow I need to do some mesa and kernel debugging I guess
<tomeu> one goes away for a few days and everything starts breaking
<alyssa> Relatable
<alyssa> TBF everything is breaking when I'm not away too
<mmind00> tomeu: easy solution, just don't go away :-P
adjtm has quit [Ping timeout: 276 seconds]
<HdkR> Or the sad alternative, be the only person working on a project
<alyssa> HdkR: Doesn't help when regressions come from common code
<HdkR> Just write all the code in a vacuum :P
<bbrezillon> alyssa: ok, I think I filled the gap in -bideas
<alyssa> Yeah?
<bbrezillon> there are these wait_bo changes I was talking about, plus I now cache the BO status (keep track of job accesses so I don't need to call the ioctl() when no job touching this BO is in flight)
<bbrezillon> and I made the fence gc less aggressive
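A rough userspace-side sketch of the BO-status caching being described; the struct and field names are made up, not the actual Mesa code. The point is that the wait-bo ioctl is only issued when the cached state says an in-flight job might still be using the BO.

    #include <stdbool.h>
    #include <stdint.h>

    struct cached_bo_status {       /* hypothetical bookkeeping, not Mesa's */
            uint32_t gem_handle;
            /* in-flight batches touching this BO; bits are set at submit
             * time and cleared when the corresponding job fence signals */
            uint64_t gpu_readers;
            uint64_t gpu_writers;
    };

    /* transfer_map()-style check: skip the wait-bo ioctl entirely when
     * nothing on the GPU can conflict with the mapping. */
    static bool bo_needs_wait(const struct cached_bo_status *bo, bool map_for_write)
    {
            if (map_for_write)
                    /* a CPU write has to wait for GPU readers and writers */
                    return bo->gpu_readers || bo->gpu_writers;
            /* a read-only mapping only conflicts with GPU writers */
            return bo->gpu_writers != 0;
    }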
<alyssa> So kernel update then?
<bbrezillon> yes
<alyssa> Ah
<bbrezillon> but it still works without the kernel changes
<bbrezillon> (haven't checked the perf penalty in that case though)
<alyssa> Ok
<alyssa> I'm still not sure why there's a perf penalty without the changes
<alyssa> The kernel changes should allow even better perf, but why does perf regress going from serialized jobs to pipelining on the old kernel?
<bbrezillon> alyssa: because we were lacking flushes
<bbrezillon> and only if the resource being read is the FBO
<alyssa> Hm
<alyssa> So how did -bideas work at all before?
<alyssa> Conceptually, either:
<alyssa> - We were not flushing when we should have been:
<alyssa> - Either this is incorrect, so -bideas shouldn't've worked.
<bbrezillon> the one that was generating all those bo_wait was pipe_draw_info.index.resource
<alyssa> - Or this is correct, so the flush is unnecessary..?
<alyssa> Er uh
<alyssa> My point is, it sounds like GPU utilization is dropping on -bideas (on current kernel), and it's not obvious that's functionally necessary
<alyssa> The index buffer, hmm
<alyssa> But.... the index buffer isn't written from any batches, no?
<bbrezillon> and now we keep track of/flush all kinds of BOs, not just the FBO ones
<alyssa> Sure, but if the problem BO is "pipe_draw_info.index.resource" and the problem flush is "flushing batches that write it"
<alyssa> What batch is writing to the index buffer?
<bbrezillon> ok, just ran the test again (without the kernel changes), and there's no perf regression
<alyssa> *for
<alyssa> And for GLES2, afaik the index buffer is written from the CPU only
<alyssa> So what's up?
<bbrezillon> since the BO is never written by the GPU
<alyssa> Okay..
<bbrezillon> but removing the exclusive fences when we can is still a good thing
<alyssa> bbrezillon: Let me make sure I understand -- the perf gap in -bideas was solely due to the overhead of the extra ioctls(), which turn out to be no-ops, so the solution is to skip the ioctls()
<alyssa> *not* due to the kernel *actually* waiting on BOs that weren't ready?
<bbrezillon> alyssa: not only
<alyssa> (Hence no kernel change is needed to fix the regression?)
<bbrezillon> right, that's what I was testing
<bbrezillon> like a minute ago
<alyssa> Yeah, just trying to make sure I follow
adjtm has joined #panfrost
<bbrezillon> the other part of the problem was coming from excessive calls to panfrost_gc_fence()
<bbrezillon> alyssa: you mean 'sizeof(mask) * 8' ?
<bbrezillon> alyssa: and IIRC, unsigned (int) is still 32 bits on a 64-bit arch
<bbrezillon> ok, time for cleanup now
<alyssa> bbrezillon: Uh, yeah, sizeof(mask)*8
<alyssa> It is, but I'm still fairly sure this is UB at least on exotic archs, so it should probably be avoided for that reason if nothing else.
<HdkR> Can't wait for those exotic architectures rocking Mali :)
<alyssa> HdkR: ...or exotic compilers? :)
<alyssa> These are the kinds of assumptions that can break e.g. obscure BSDs, etc
<HdkR> haha sure
* HdkR hides all the UB under the carpet
<alyssa> :D
<HdkR> Don't step on the carpet, it may fall through the floor
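For reference, a generic illustration of the shift-width concern (not the actual code under discussion): shifting a 32-bit unsigned value by its full width is undefined behaviour in C, and sizeof(x) * 8 only equals the bit width where CHAR_BIT is 8.

    #include <limits.h>
    #include <stdint.h>

    static inline uint32_t bit_if_set(uint32_t mask, unsigned bit)
    {
            /* UB when bit == 32:  mask & (1u << bit) */
            if (bit >= sizeof(mask) * CHAR_BIT)
                    return 0;
            return mask & ((uint32_t)1 << bit);
    }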
adjtm has quit [Remote host closed the connection]
adjtm has joined #panfrost
adjtm has quit [Remote host closed the connection]
adjtm has joined #panfrost
yann|work has quit [Ping timeout: 245 seconds]
forkbomb has quit [Ping timeout: 264 seconds]
forkbomb has joined #panfrost
raster has joined #panfrost
yann|work has joined #panfrost
belgin has joined #panfrost
belgin has quit [Read error: Connection reset by peer]
adjtm has quit [Ping timeout: 276 seconds]
adjtm has joined #panfrost
guillaume_g has quit [Quit: Konversation terminated!]
raster has quit [Remote host closed the connection]
stikonas has joined #panfrost
warpme_ has quit [Quit: warpme_]
Prf_Jakob has quit [Ping timeout: 245 seconds]
Prf_Jakob has joined #panfrost
NeuroScr has joined #panfrost
stikonas has quit [Ping timeout: 276 seconds]
empty_string has joined #panfrost