#panfrost on 2020-02-05 — irc logs at freenode.irclog.whitequark.org

2019-09-06 11:20 alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature

00:21 NeuroScr has quit [Quit: NeuroScr]

00:31 raster has quit [Quit: Gettin' stinky!]

00:41 stikonas has quit [Remote host closed the connection]

01:06 nerdboy has quit [Ping timeout: 240 seconds]

02:04 vstehle has quit [Ping timeout: 260 seconds]

02:06 bbrezillon has quit [Ping timeout: 265 seconds]

02:07 bbrezillon has joined #panfrost

02:42 chewitt has quit [Quit: Zzz..]

02:43 chewitt has joined #panfrost

03:56 buzzmarshall has quit [Remote host closed the connection]

04:11 megi has quit [Ping timeout: 265 seconds]

04:26 icecream95 has joined #panfrost

05:03 robert_ancell has quit [Ping timeout: 268 seconds]

05:20 <icecream95> bbrezillon: Your patch does seem to fix the errors when killing glmark2.

05:26 MoeIcenowy has quit [Quit: ZNC 1.7.2+deb3 - https://znc.in]

05:27 MoeIcenowy has joined #panfrost

05:29 <icecream95> LZDoom seems to be running slightly slower than it used to, though...

05:34 robink has joined #panfrost

05:44 robink has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

05:46 robink has joined #panfrost

06:00 vstehle has joined #panfrost

06:01 NeuroScr has joined #panfrost

06:04 robink has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

06:04 cowsay_ has joined #panfrost

06:06 cowsay has quit [Ping timeout: 272 seconds]

06:09 robink has joined #panfrost

07:10 guillaume_g has joined #panfrost

07:32 <icecream95> False alarm, having Firefox open has a larger effect on performance than I expected

07:40 <bbrezillon> icecream95: feel free to add your Tested-by

07:52 chewitt has quit [Ping timeout: 268 seconds]

08:10 yann|work has quit [Ping timeout: 240 seconds]

08:28 marex-cloud has joined #panfrost

09:02 warpme_ has joined #panfrost

09:11 yann|work has joined #panfrost

09:31 alpernebbi has joined #panfrost

09:43 <tomeu> robmur01: it was indeed a PERMISSION_FAULT with EXECUTE access

09:43 <tomeu> patch coming

09:44 <tomeu> robmur01: what I don't understand is that on a kevin I hit the mmap codepath, but on a rockpi4 I went straight to the terminal fault

09:51 icecream95 has quit [Quit: leaving]

09:58 toggleton has joined #panfrost

10:00 chewitt has joined #panfrost

10:10 Xalius has joined #panfrost

10:12 <Xalius> what patches are currently recommended/needed on top of the 5.5 release?

10:12 <Xalius> I saw some mentions reading the backlog from the last 5 days

10:13 raster has joined #panfrost

10:17 megi has joined #panfrost

10:19 <tomeu> bbrezillon, robmur01, robher: we seem to have some concurrency problem, I see faults when running deqp-gles3 only when running several instances in parallel

10:19 <tomeu> the faults can be translation faults, but also misc others such as INSTR_OPERAND_FAULT, DATA_INVALID_FAULT, etc

10:19 <tomeu> which suggests BOs being overwritten

10:20 <tomeu> guess it has to be a kernel problem

10:23 <bbrezillon> tomeu: not necessarily

10:24 <bbrezillon> if the BO accessed by the job is not added to the bos array passed at submit time, the kernel doesn't know about it

10:24 <tomeu> yes, but how could processes affect each other without it being kernel's fault?

10:26 <bbrezillon> tomeu: well, if the memory backing the BO is released, it can be allocated to another process

10:26 <bbrezillon> something like a use-after-free, except this time the user is the GPU

10:26 <tomeu> oh, I see

10:27 <bbrezillon> but maybe you're right and it's a kernel problem

10:27 <tomeu> shouldn't that cause a translation fault though?

10:27 <tomeu> as the previous mapping would have been deleted

10:30 <bbrezillon> tomeu: are all the gles3 tests impacted?

10:31 <tomeu> bbrezillon: you mean if faults happen with all gles3 tests?

10:31 <bbrezillon> I'd expect it to happen on gles2 too if that was a kernel bug

10:32 <tomeu> good point, maybe there's some kind of BO that is only used in gles3 and it's not being added to the list sent to the kernel

10:32 <bbrezillon> tomeu: yep, if it only happens with a subset of the gles3 testsuite, maybe we can figure out what those tests have in common

10:32 <tomeu> will give that a look

10:35 <bbrezillon> well, if INSTR_OPERAND_FAULT is what I think (fault caused by a wrong instruction in the midgard shader bytecode), that'd be weird. I'm pretty sure BOs backing shaders' bytecode are added to the list

10:46 Xalius has quit [Remote host closed the connection]

10:51 robink has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

10:53 robink has joined #panfrost

11:17 raster has quit [Quit: Gettin' stinky!]

11:40 raster has joined #panfrost

11:51 buzzmarshall has joined #panfrost

12:11 <alyssa> bbrezillon: INSTR_OPERAND_FAULT is caused by accessing the wrong *operand* in the midgard shader, that is, too many registers, too many uniforms, etc vs the counts reported in the shader metadata descriptor

12:18 chewitt has quit [Ping timeout: 260 seconds]

12:20 NeuroScr has quit [Quit: NeuroScr]

12:26 <bbrezillon> alyssa: ok, so it can also be caused by a cmdstream corruption

12:27 <tomeu> I have seen other faults in shaders, but I'm not sure they still happen

12:27 <alyssa> Yeah, although DATA_INVALID_FAULT would probably be the more likely result of just random cmdstream corruption

12:28 <tomeu> hmm, good point

12:29 <bbrezillon> tomeu: then maybe double-check the transient allocation path

12:29 <tomeu> though I was planning to see if copying to transient memory in one go before submitting would speed things up

12:30 <tomeu> to reduce cache flushes

12:30 <tomeu> so maybe I should do that first

12:30 <bbrezillon> yep, it should help write-combine

12:32 <tomeu> ok, will try that before doing any further debugging

12:32 <tomeu> once I finish all the paperwork...

12:33 chewitt has joined #panfrost

12:42 chewitt has quit [Ping timeout: 240 seconds]

12:46 raster has quit [Quit: Gettin' stinky!]

12:58 chewitt has joined #panfrost

13:25 vstehle has quit [Ping timeout: 240 seconds]

13:26 vstehle has joined #panfrost

13:35 raster has joined #panfrost

13:40 raster has quit [Ping timeout: 240 seconds]

13:47 <alyssa> Looking at shared memory sizing fields

13:47 <alyssa> I think I have it understood when the size is a power-of-two

13:48 <alyssa> things get a little (lot) more complicated when it isn't, though :-(

13:54 raster has joined #panfrost

13:58 chewitt has quit [Ping timeout: 265 seconds]

14:11 chewitt has joined #panfrost

14:26 raster has quit [Quit: Gettin' stinky!]

14:39 raster has joined #panfrost

14:48 raster has quit [Quit: Gettin' stinky!]

14:53 steev has quit []

14:55 steev has joined #panfrost

15:02 MoeIcenowy has quit [Quit: ZNC 1.7.2+deb3 - https://znc.in]

15:02 MoeIcenowy has joined #panfrost

15:10 raster has joined #panfrost

15:25 <tomeu> bbrezillon: do you spot the error? https://paste.internal.collabora.co.uk/17917

15:34 <tomeu> clears work, so readpixels isn't broken

15:34 <bbrezillon> tomeu: nope

15:35 <bbrezillon> tomeu: BTW, you probably want to keep the shadow buf per-batch, and not per BO

15:36 <bbrezillon> like, fill the shadow buf until it gets full or the batch is flushed, and copy to the actual BO at that time
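
The per-batch shadow buffer idea above can be sketched roughly as follows. This is a toy model, not Panfrost code: `BatchShadow`, `write`, `flush` and the `bytearray` standing in for the mapped BO are all hypothetical names chosen for illustration. It only shows the "accumulate in CPU memory, copy to the BO when full or on flush" pattern.

```python
class BatchShadow:
    """Toy model of a per-batch shadow buffer (hypothetical, for illustration)."""

    def __init__(self, bo: bytearray, capacity: int):
        self.bo = bo              # stands in for the CPU mapping of the real BO
        self.capacity = capacity  # maximum bytes held in the shadow buffer
        self.shadow = bytearray() # pending data not yet copied to the BO
        self.bo_offset = 0        # where the next copy will land in the BO
        self.copies = 0           # number of bulk copies performed

    def write(self, data: bytes) -> int:
        """Queue data and return the offset it will occupy in the BO."""
        if len(data) > self.capacity:
            raise ValueError("write larger than shadow buffer")
        # Shadow buffer would overflow: copy what we have to the BO first.
        if len(self.shadow) + len(data) > self.capacity:
            self.flush()
        offset = self.bo_offset + len(self.shadow)
        if offset + len(data) > len(self.bo):
            raise MemoryError("BO is full")
        self.shadow += data
        return offset

    def flush(self) -> None:
        """Copy all pending data to the BO in one go (e.g. at batch flush)."""
        if not self.shadow:
            return
        end = self.bo_offset + len(self.shadow)
        self.bo[self.bo_offset:end] = self.shadow
        self.bo_offset = end
        self.shadow = bytearray()
        self.copies += 1
```

In this model several small `write` calls result in a single bulk copy per fill or flush, which is the property the suggestion is after; whether that actually reduces cache flushes on real hardware is not something the sketch can show.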

15:36 <tomeu> oh, good idea

15:36 <tomeu> but that shouldn't affect the result, right?

15:37 <bbrezillon> no no

15:37 <bbrezillon> it's unrelated

15:37 <tomeu> afaics, all transient usage is for cmdstream stuff

15:38 <bbrezillon> it is

15:38 <bbrezillon> are you sure bo->cpu is not used directly in some path?

15:39 <bbrezillon> to read back the content of a cmdstream element

15:39 <tomeu> ah, good idea

15:39 <tomeu> though it should use the transfer

15:39 <bbrezillon> or even updated

15:40 <tomeu> I knew it couldn't be that easy :)

15:41 <bbrezillon> maybe add a panfrost_bo_cpu_addr() helper and patch all places where bo->cpu is accessed directly

15:43 <tomeu> yeah, most uses go through transfer.cpu, but those job.cpu references look suspicious

15:43 <tomeu> but the cmdstream was identical between a passing and failing case, so it would be something else :/

15:43 <tomeu> as dumped, at least

15:45 <tomeu> first_job is also a transfer

15:45 * tomeu keeps going through each straw

15:48 <tomeu> all are transfers :/

15:50 chewitt has quit [Ping timeout: 260 seconds]

15:52 <tomeu> ah, seems to work if we don't sync fences for dumping traces

15:58 <bbrezillon> tomeu: duh

15:58 <bbrezillon> it's hiding another bug :P

16:00 <tomeu> but even then, it regresses on 863 gles3 tests out of 8k

16:12 <robmur01> hmm, even more fun - hit the as_count < 0 error when *starting* kmscube, it runs fine, exits fine, but trying to start it again is the point where the deadlock kicked in

16:16 raster has quit [Quit: Gettin' stinky!]

17:16 yann|work has quit [Ping timeout: 265 seconds]

17:17 narmstrong has quit []

17:17 narmstrong has joined #panfrost

17:23 <alyssa> tomeu: oh joy :|

17:28 davidlt has joined #panfrost

18:01 pH5 has joined #panfrost

18:05 nerdboy has joined #panfrost

18:14 davidlt_ has joined #panfrost

18:17 davidlt has quit [Ping timeout: 268 seconds]

18:18 davidlt_ is now known as davidlt

18:40 davidlt has quit [Ping timeout: 272 seconds]

19:01 davidlt has joined #panfrost

19:11 stikonas has joined #panfrost

19:21 <alyssa> https://people.collabora.com/~alyssa/table.pdf

19:21 <alyssa> I too like highly nonlinear encodings.

19:21 <alyssa> From the behaviour around powers-of-two it's clear at least the upper nibble is logarithmic (base 2) but that doesn't explain the behaviour for non-powers-of-two

19:21 <alyssa> urjaman: ^ more $\LaTeX$ for you :p

19:24 <alyssa> It *appears* the upper nibble is a shift and the bottom nibble is a linear factor

19:25 <alyssa> But that doesn't always work, the fields with 0 in the bottom are the most egregiously wrong here

19:28 <alyssa> Oh, hey...

19:28 <alyssa> Notice in particular the bottom nibble only takes on one of four values in the table:

19:28 <alyssa> 0, 2, 4, 6

19:28 <alyssa> in binary:

19:28 <alyssa> 000, 010, 100, 110

19:29 <alyssa> So the bottom bit we don't see used here, but bits 1 and 2 form a 2-bit field.
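
The bit layout hypothesised in the messages above can be written down as a small helper. This is only a sketch of the guess as stated in the log (upper nibble, plus a 2-bit field in bits 1 and 2 of the bottom nibble); the function name and the returned keys are made up for illustration, and it does not claim to decode the real hardware encoding.

```python
def split_size_field(field: int) -> dict:
    """Split an 8-bit value into the pieces discussed above (hypothetical layout)."""
    if not 0 <= field <= 0xFF:
        raise ValueError("expected an 8-bit value")
    upper = (field >> 4) & 0xF   # upper nibble: appears logarithmic (a shift)
    lower = field & 0xF          # bottom nibble: only 0, 2, 4, 6 seen in the table
    return {
        "upper_nibble": upper,
        "two_bit_field": (lower >> 1) & 0x3,  # bits 1 and 2 of the bottom nibble
        "bit0": lower & 0x1,                  # not seen set in the table
        "bit3": (lower >> 3) & 0x1,           # zero for all of 0, 2, 4, 6
    }
```

With this split, the four observed bottom-nibble values 0, 2, 4, 6 map to 2-bit field values 0, 1, 2, 3, which is what makes the "2-bit field" reading plausible.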

19:38 steev has quit [Ping timeout: 248 seconds]

19:38 <alyssa> The question is of course.... for what?

19:39 narmstrong has quit [Ping timeout: 245 seconds]

19:39 <alyssa> Does shared memory have to be a POT? If so, then you just need the logarithm field, no 2-bit field

19:40 <alyssa> Maybe it has to be at least cache line aligned? That we see distinct encodings for <64 bytes invalidates that.

19:40 <alyssa> (The fact that 12 bytes has an encoding distinct from 16 says... a lot)

19:42 <alyssa> I'm tempted to think there's a way to do *3 multipliers but... why?

19:44 <alyssa> Of course panfrost could just round up sizes to powers of two, minimum 128, since those are understood. But we shouldn't really have to?
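
The fallback mentioned above (round sizes up to a power of two, with a minimum of 128) is simple to express. A minimal sketch, assuming sizes are in bytes; the function name is hypothetical and this is not taken from the driver:

```python
def round_up_shared_size(size: int, minimum: int = 128) -> int:
    """Round size up to the next power of two, never going below `minimum`."""
    if size < 0:
        raise ValueError("size must be non-negative")
    size = max(size, minimum, 1)
    # (size - 1).bit_length() is the exponent of the smallest power of two >= size.
    return 1 << (size - 1).bit_length()
```

Sizes that are already powers of two at or above the minimum are returned unchanged; anything smaller than 128 becomes 128.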

19:45 <alyssa> oh! oh! but it's even more wild

19:46 <alyssa> since apparently the type matters. which invalidates some of the earlier data, gah

19:46 <alyssa> er no, just bad data

19:50 <bbrezillon> robmur01: that might fix your problem => http://code.bulix.org/mhzkfl-1122038

19:53 TheKit has joined #panfrost

19:56 tasinofan has joined #panfrost

19:57 buzzmarshall has quit [Quit: Leaving]

19:59 <alyssa> Hmm

20:00 guillaume_g has quit [Quit: Konversation terminated!]

20:04 <alyssa> in other news, three.js demos mostly work now

20:08 <alyssa> I guess I'll start implementing the compiler side of the shared memory while my brain tries to piece together this~

20:13 <alyssa> Good news is that the ISA side of shared memory is nearly identical to stack accesses, OpenCL globals, and OpenGL SSBOs

20:36 jolan has quit [Quit: leaving]

20:39 yann|work has joined #panfrost

20:39 <alyssa> Okay, I got the general structure for shared memory right

20:39 <alyssa> Passing dEQP-GLES31.functional.compute.basic.shared_var_single_group now but I think I need to do some pretty serious refactoring for this to be practical..

20:41 buzzmarshall has joined #panfrost

20:42 * alyssa needs a good name for this structure

20:42 <alyssa> Configuration for the stack and shared memory, and the pointers to those buffers..

20:43 <alyssa> I guess I'll just go with mali_stack_shared, though that's kind of a lame name :p

20:47 <alyssa> mali_shared_memory is probably better, stack is shared

20:53 icecream95 has joined #panfrost

20:55 alpernebbi has quit [Quit: alpernebbi]

21:09 jolan has joined #panfrost

21:09 <alyssa> Admittedly with some hacks, but now passing dEQP-GLES31.functional.compute.basic.shared_var_multiple_groups

21:10 <alyssa> (Hack being worstcasing a size, still need to route the size through properly)

21:11 <alyssa> dEQP-GLES31.functional.compute.shared_var.* failing but in part that's because of barriers right now ...

21:35 davidlt has quit [Ping timeout: 265 seconds]

22:01 NeuroScr has joined #panfrost

22:32 NeuroScr has quit [Quit: NeuroScr]

22:58 steev has joined #panfrost

23:00 narmstrong has joined #panfrost

23:00 NeuroScr has joined #panfrost

23:06 pH5 has quit [Ping timeout: 260 seconds]

23:58 warpme_ has quit [Quit: Connection closed for inactivity]