MistahDarcy has quit [Remote host closed the connection]
davidlt has joined #panfrost
megi has quit [Ping timeout: 272 seconds]
davidlt has quit [Remote host closed the connection]
davidlt has joined #panfrost
davidlt has quit [Remote host closed the connection]
davidlt has joined #panfrost
davidlt_ has joined #panfrost
davidlt has quit [Ping timeout: 258 seconds]
davidlt_ has quit [Ping timeout: 248 seconds]
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 245 seconds]
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 272 seconds]
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 246 seconds]
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 258 seconds]
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 268 seconds]
sravn has quit [Quit: WeeChat 2.4]
davidlt has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
vstehle has joined #panfrost
robert_ancell has quit [Ping timeout: 272 seconds]
davidlt has quit [Ping timeout: 268 seconds]
milloni has quit [Quit: No Ping reply in 210 seconds.]
<tomeu>
alyssa: oh, thanks for mentioning that, I guess that's what is going on
milloni has joined #panfrost
NeuroScr has joined #panfrost
<tomeu>
alyssa, Prf_Jakob: actually, those failures seem to happen when using --deqp-gl-config-name=rgba8888d24s8ms0
<tomeu>
otherwise, we get NotSupported
<tomeu>
Prf_Jakob: what do you think about adding an arg to set the config?
pH5 has joined #panfrost
<tomeu>
Prf_Jakob: with it, I think I could be close to being able to stop nagging you :)
<tomeu>
the time spent actually running the deqp tests has been reduced to two and a half minutes
<tomeu>
so I think we can increase test coverage quite a bit without impacting total time much
megi has joined #panfrost
<tomeu>
robher: the shrinker seems to be working very well now
<tomeu>
I still got the lockdep warning though
<tomeu>
but I can smoothly run glmark2 almost to the end on gnome-shell when booting with 512MB
<tomeu>
then the OOM killer steps in, but that was towards the end
<tomeu>
will review next
davidlt has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
<daniels>
tomeu: how long was it previously?
<tomeu>
daniels: around 8 mins I think
<tomeu>
most of the time is now spent waiting for gitlab runners to be assigned and for the lava device to boot up
raster has joined #panfrost
<daniels>
woosh, that's awesome
unoccupied has quit [Ping timeout: 246 seconds]
<tomeu>
yeah, we can now increase coverage without seriously impacting total times
<daniels>
when you say waiting for runners to be assigned - which runners?
<daniels>
arm64, x86-64, idle?
<tomeu>
x86-64 and idle
<daniels>
idle should be assigned instantly
<daniels>
you're the only one using idle, so I have no idea why it doesn't immediately take it
<tomeu>
yeah, sometimes it takes several minutes
<tomeu>
but in general, I think it just takes a while to set up the env in the assigned runner, etc
<daniels>
not several minutes it doesn't
<tomeu>
I would merge everything in a single stage if I could :)
<tomeu>
it definitely has in the past
<daniels>
whois agx
<daniels>
*slaps forehead*
<daniels>
tomeu: weird, i have no idea what's going on in that case, sorry
<tomeu>
np
<tomeu>
maybe we could come up with some scripts to submit jobs to lava without going through gitlab-ci, for the remote testing use case
<tomeu>
then gitlab-ci not being as fast as it could be isn't such a problem
<tomeu>
alyssa: just tested Steve's patch for _NEXT with glmark2 and the perf governor, and I see a drop in performance
<tomeu>
do you have any ideas on why that could be?
<tomeu>
I'm guessing mesa isn't submitting jobs to the kernel as fast as it could
<daniels>
the only thing I can think of is that you end up with suboptimal job distribution between the different job slots?
<daniels>
like, you end up with JS0 idle whilst JS1 has one job queued for now and another in _NEXT, when they could have been running in parallel
<daniels>
tomeu: gitlab-ci is fixable, and it should usually pick up jobs in _seconds_
<daniels>
so i have no idea why idle is being so rubbish
stepri01 has joined #panfrost
<tomeu>
I think in general it's being fast
unoccupied has joined #panfrost
<tomeu>
what always takes longest is checking out the code, setting up the container, etc
<tomeu>
ah, another source of delays is that we have one job per arch per stage
<tomeu>
and all jobs need to have finished in one stage to get to the next one
<tomeu>
so any delay in a particular job delays the others as well
<stepri01>
daniels: We have a scheduler per JS so there shouldn't be a situation of JS1 blocking JS0 like that
<stepri01>
But clearly if you submit work faster to the same job slot you could end up slowing down work on the other slot
<stepri01>
So it may be that the order of the work is less optimal with the _NEXT registers
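For context, the _NEXT registers being discussed let the driver queue the following job while the slot is still busy, roughly along these lines (register names as in the panfrost kernel driver; the helper itself is a simplified illustration, the real submit path also programs affinity and job configuration):

    /* Simplified sketch: queue another job chain on slot `js` while the
     * current one is still running, so the hw can start it the moment the
     * running job finishes instead of waiting for a new submit. */
    static void queue_next_job(struct panfrost_device *pfdev, int js, u64 jc_head)
    {
            job_write(pfdev, JS_HEAD_NEXT_LO(js), lower_32_bits(jc_head));
            job_write(pfdev, JS_HEAD_NEXT_HI(js), upper_32_bits(jc_head));
            job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START);
    }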
<daniels>
tomeu: yeah, that's pretty unavoidable I'm afraid - what's the slowest one? aarch64 or armhf?
<daniels>
if it's armhf we could look at whether standing up a native runner would be any quicker but tbh I'm not convinced it would be
<daniels>
stepri01: you'd definitely know better than me, I'm just spitballing :)
<stepri01>
kbase used to be quite good at scheduling vertex work multiple frames in advance and not quite getting round to doing the fragment work :)
<stepri01>
it creates an interesting stuttering effect
<daniels>
heh
davidlt_ has joined #panfrost
davidlt has quit [Ping timeout: 246 seconds]
adjtm has quit [Ping timeout: 268 seconds]
<tomeu>
daniels: sometimes one is slower, sometimes the other
<tomeu>
I think it's mostly which one has slower network at that moment
<tomeu>
or maybe i/o load
<daniels>
if you can tie that down to a particular runner (or group of runners) being generally slower than the others, that would be interesting to find out
<tomeu>
ack, will keep an eye on it
<robher>
tomeu: How? I pushed 2 fixes to my kernel.org tree, but am in lockdep hell now. Basically, we can't take any lock that ever allocs memory in the shrinker call.
<tomeu>
robher: how what?
<robher>
tomeu: devfreq->lock is one.
<robher>
Runtime pm in panfrost_mmu_unmap.
* robher
about to lose internet...
<tomeu>
robher: I mean, I don't know what you are asking about when you said "How?"
<tomeu>
is it how I reproduce the lockdep warning?
<robher>
How is it fine?
davidlt_ has quit [Ping timeout: 245 seconds]
<tomeu>
robher: ah, the shrinker just seems to work as expected
<tomeu>
as long as memory pressure kicks in, memory gets released and we get to do more stuff than otherwise
<robher>
tomeu: devfreq takes a lock on registration. Seems unnecessary, but if I remove it, I just get a lockdep warning on another lock. What we need is to lock the runtime pm state, and skip hw access if suspended. No point in waking up.
<tomeu>
sounds good!
<robher>
tomeu: yes, but I haven't found a way to prevent waking, only to keep it awake.
<robher>
Yet another lock I guess...
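A minimal sketch of what robher is describing, assuming the runtime-PM state can simply be probed without adding a new lock: only touch the hardware from the shrinker if the GPU is already awake, so there is never a reason to wake it. The function and the surrounding panfrost_* names are illustrative, not the actual driver code:

    #include <linux/pm_runtime.h>

    /* Illustrative only: pm_runtime_get_if_in_use() never resumes the device
     * and never allocates, so it is safe to call from the shrinker path. */
    static bool shrinker_try_purge(struct panfrost_device *pfdev,
                                   struct panfrost_gem_object *bo)
    {
            if (pm_runtime_get_if_in_use(pfdev->dev) <= 0)
                    return false;           /* suspended: skip the hw access */

            panfrost_mmu_unmap(bo);         /* safe, the GPU is already awake */

            pm_runtime_mark_last_busy(pfdev->dev);
            pm_runtime_put_autosuspend(pfdev->dev);
            return true;
    }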
davidlt has quit [Remote host closed the connection]
davidlt has joined #panfrost
adjtm has joined #panfrost
davidlt_ has joined #panfrost
davidlt has quit [Ping timeout: 272 seconds]
davidlt_ has quit [Ping timeout: 258 seconds]
raster has quit [Read error: Connection reset by peer]
raster has joined #panfrost
mupuf has joined #panfrost
<mupuf>
I just introduced a "not set" value for priority and severity in fd.o's bugzilla. Would anyone mind if I made it the default for new bugs?
<alyssa>
mupuf: #dri-devel?
<mupuf>
alyssa: already asked there :)
<alyssa>
Ah
<alyssa>
Well, I don't even know if we use the fd.o bugzilla, so presumably nobody here would mind :)
<mupuf>
I asked in #dri-devel, #freedesktop, #freedreno, #intel-3d, #intel-gfx, #nouveau, #panfrost, #xorg-devel :D
<alyssa>
Right now, our tracing tools are essentially pretty-printers for GPU memory
<alyssa>
That's faithful to the original representation -- good for computers -- but is extremely verbose
<alyssa>
500 lines per draw, huge numbers of draws per frame, huge numbers of frames ===> traces are often many megabytes in size, and when debugging you're looking for a needle in a haystack.
<alyssa>
I'd like to add some analysis passes into the tracer so it reports *intention* rather than the raw bits
<alyssa>
In particular, it goes through the cmdstream, and determines semantically what you tried to do
<alyssa>
If it matches our understanding of the hardware, we print that semantic.
<alyssa>
If it does not, we print a huge XXXXXXXXX comment and dump the original bit-level representation
<alyssa>
(So if we trace the blob and see that comment, we know our understanding is wrong, so we fix it. If we trace Panfrost and get that, we know we have a driver bug.)
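A rough sketch of that validate-then-dump flow; decoded_draw, pack_draw(), emit_semantic() and hexdump() are made-up names standing in for pandecode's real structures and helpers:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Re-pack our semantic interpretation and compare it against the raw
     * words from the trace: print the intent if they agree, dump the bits
     * (with a loud marker) if they don't. */
    static void validate_and_print(const uint32_t *raw, size_t nwords,
                                   const struct decoded_draw *draw)
    {
            uint32_t repacked[MAX_DRAW_WORDS] = { 0 };

            pack_draw(repacked, draw);      /* our model of the hw layout */

            if (!memcmp(raw, repacked, nwords * sizeof(*raw))) {
                    emit_semantic(draw);    /* concise, intent-level output */
            } else {
                    printf("/* XXXXXXXXX packing doesn't match our model */\n");
                    hexdump(raw, nwords);   /* fall back to the raw bits */
            }
    }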
<shadeslayer>
alyssa: nice
<shadeslayer>
sounds pretty useful
<alyssa>
shadeslayer: You interested? :)
<alyssa>
hereby dubbed "anti-Gallium" since that name sounds awesome
stikonas has joined #panfrost
fysa has joined #panfrost
fysa has quit [Ping timeout: 268 seconds]
adjtm has quit [Ping timeout: 248 seconds]
* alyssa
thinks this holds promise.
<endrift>
just don't call it Mercury
<alyssa>
What I can't decide is whether I want to use literal Gallium structs or roll our own thing.
<alyssa>
context functions don't make sense, but the structs in p_state.h and the enums might be useful.
* alyssa
thinks
<alyssa>
Yeah, I think that might be good.
<alyssa>
Okay, yes, this is interesting
davidlt has quit [Remote host closed the connection]
fysa has joined #panfrost
fysa has quit [Ping timeout: 245 seconds]
unoccupied has quit [Ping timeout: 244 seconds]
unoccupied has joined #panfrost
davidlt has joined #panfrost
adjtm has joined #panfrost
NeuroScr has joined #panfrost
adjtm_ has joined #panfrost
fysa has joined #panfrost
adjtm has quit [Ping timeout: 268 seconds]
fysa has quit [Ping timeout: 268 seconds]
sravn has joined #panfrost
stikonas has quit [Read error: Connection reset by peer]
stikonas_ has joined #panfrost
<alyssa>
Essentially, I'm doing a shift of pandecode from focusing on packing to focusing on meaning.
<alyssa>
Focusing on packing was useful in the early days (when we were able to literally compile a trace and execute it against a Mali)
<alyssa>
But now, we want to go to higher levels of understanding, and focusing on the nitty gritty details of packing bits for the hw is... not useful.
<alyssa>
Better to simply *validate* the packing is correct / canonical.
<alyssa>
if it's not, yeah, dump everything around so we can fix things
<alyssa>
But if it is, no need to print anything except the actually decoded meaning.
<alyssa>
Cute..
<alyssa>
TILER jobs within a job chain are totally independent.
<alyssa>
So you can have different framebuffers (of different sizes / polygon lists) bound for different ones
<alyssa>
And then you just have multiple FRAGMENT jobs
<alyssa>
Blob does this to avoid flushing like crazy while mipmapping
<alyssa>
And likewise, you can have a whole bunch of FRAGMENT jobs in a chain together, with job_barrier set.
<alyssa>
This hw quirk was discovered via anti-Gallium.
<alyssa>
Good news: we know more about the hardware now!
<alyssa>
Bad news: I need to redesign a bunch of things in anti-Gallium to account for the new model of the hw.
<alyssa>
Still a net win.
<alyssa>
New model for anti-Gallium has to be stateless (well, not totally stateless, but not as stateful as GL)
<alyssa>
Instead, have hashmaps for things, so the check is more like "this framebuffer size corresponds to this framebuffer / polygon list / etc"
<alyssa>
This will take... somewhat more thought to get right.
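One way that hashmap idea could look, using Mesa's util/hash_table.h u64 API; the decoded_fb type, the callback names and the exact check are hypothetical:

    #include <inttypes.h>
    #include <stdio.h>
    #include "util/hash_table.h"

    /* Remember each framebuffer descriptor we decode, keyed by its GPU
     * address, so later jobs can be checked against it without carrying
     * GL-style bound state around. */
    static struct hash_table_u64 *known_fbs;

    static void on_fb_descriptor(uint64_t gpu_va, struct decoded_fb *fb)
    {
            if (!known_fbs)
                    known_fbs = _mesa_hash_table_u64_create(NULL);
            _mesa_hash_table_u64_insert(known_fbs, gpu_va, fb);
    }

    static void on_fragment_job(uint64_t fb_gpu_va)
    {
            if (!known_fbs || !_mesa_hash_table_u64_search(known_fbs, fb_gpu_va))
                    printf("/* XXXXXXXXX FRAGMENT job points at an unknown fb"
                           " (0x%" PRIx64 ") */\n", fb_gpu_va);
    }

Keying on the GPU address rather than on driver-side bound state is what makes the check stateless in the sense alyssa describes.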
fysa has joined #panfrost
<alyssa>
As for workgroups_x_shift_*:
<alyssa>
workgroups_x_shift_3 I've seen as 7 or 8, even
<alyssa>
and zero?
<alyssa>
0x8 seems to all be CL workloads with images
fysa has quit [Ping timeout: 245 seconds]
<alyssa>
Wut.
<alyssa>
Oh
<alyssa>
So, in OpenCL: workgroup_x_shift_size_3 =