#panfrost on 2020-05-25 — irc logs at freenode.irclog.whitequark.org

2019-09-06 11:20 alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature

00:00 cowsay has joined #panfrost

00:57 stikonas has quit [Remote host closed the connection]

00:58 stikonas has joined #panfrost

01:14 stikonas has quit [Remote host closed the connection]

01:48 rokquarry has quit [Quit: Leaving]

01:49 buzzmarshall has joined #panfrost

02:04 NeuroScr has quit [Quit: NeuroScr]

02:07 nerdboy has quit [Ping timeout: 240 seconds]

02:42 macc24 has joined #panfrost

03:46 davidlt has joined #panfrost

04:48 icecream95 has joined #panfrost

05:18 buzzmarshall has quit [Remote host closed the connection]

05:18 davidlt has quit [Ping timeout: 272 seconds]

05:20 davidlt has joined #panfrost

05:24 horsewater has joined #panfrost

05:36 macc24 has quit [Ping timeout: 246 seconds]

06:01 <icecream95> bbrezillon: Starting any GL application with Mesa 794c239a990 or later and using a lot of RAM (e.g. dd from /dev/zero to a tmpfs) will trigger it

06:38 <bbrezillon> icecream95: completely untested, but can you try moving the drm_vma_node_unmap() earlier in drm_gem_shmem_purge_locked() => https://gitlab.freedesktop.org/snippets/1026/raw

07:30 <icecream95> bbrezillon: I haven't got DRM compiled as a module, but I tried copying the unmap call to panfrost_gem_purge, and it didn't seem to help

07:42 <bbrezillon> icecream95: exact same bug? do you have a diff to share?

07:46 <icecream95> bbrezillon: The backtrace is exactly the same

07:48 <bbrezillon> icecream95: hm, that's weird, I don't see how page_mapped() can return true if we call vma_node_unmap()

07:49 <bbrezillon> can you check if unmap_mapping_range() is really called in drm_vma_node_unmap()?

07:50 <bbrezillon> or print the result of drm_mm_node_allocated() in the panfrost driver

07:57 <icecream95> bbrezillon: drm_mm_node_allocated is returning 1

07:58 nerdboy has joined #panfrost

08:00 <bbrezillon> then vma_node_unmap() should be called, and I don't see how we can have a page that's still mapped

08:01 yann has joined #panfrost

08:13 rcf has quit [Read error: Connection reset by peer]

08:16 nlhowell has quit [Ping timeout: 260 seconds]

08:50 Elpaulo has quit [Read error: Connection reset by peer]

08:52 Elpaulo has joined #panfrost

09:04 rcf has joined #panfrost

09:14 raster has joined #panfrost

09:44 <icecream95> bbrezillon: I get a similar error from panfrost_mmu_map very occasionally: https://gitlab.freedesktop.org/snippets/1027

09:46 <icecream95> (this is not related to the change in Mesa, I've had it about once or twice a month since at least 5.4)

09:55 macc24 has joined #panfrost

09:58 <bbrezillon> icecream95: that's not the same problem

10:03 <bbrezillon> looks like this one is happening when you try to re-import an already imported dmabuf, or some missing unmap calls in the gem_close/fd_close path

10:15 rcf has quit [Quit: WeeChat 2.7]

10:16 rcf has joined #panfrost

10:20 icecream95 has quit [Quit: leaving]

10:25 nlhowell has joined #panfrost

10:33 stikonas has joined #panfrost

10:41 <bbrezillon> robmur01, robher: are you sure we should call shmem_truncate_range(0, -1) when purging a BO?

10:41 horsewater is now known as mixfix41

10:42 <bbrezillon> shouldn't we pass the range attached to the GEM object being purged instead?

10:42 <bbrezillon> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_gem_shmem_helper.c#L411

10:44 <bbrezillon> nevermind

10:45 <bbrezillon> there's a file per obj, so we really want to truncate the whole file

12:14 nlhowell has quit [Ping timeout: 260 seconds]

13:16 raster has quit [Quit: Gettin' stinky!]

13:59 <alyssa> icecream95: bbrezillon: I can revert the mesa change (it was a CPU overhead optimization), but I don't see how that could trigger kernel issues.

14:01 <bbrezillon> alyssa: no, it's really a kernel bug

14:01 <bbrezillon> I'm chasing it right now

14:01 <alyssa> OK

14:01 <bbrezillon> I think it has to do with BOs flagged growable

14:01 <alyssa> Does it make sense to predicate that commit on new kernel versions once it's fixed, though?

14:02 <alyssa> Growable shouldn't be in the cache afaik.

14:02 <alyssa> actually did we change that?

14:02 <bbrezillon> hm, I think they are

14:02 <alyssa> Better point is growable should never be CPU mapped

14:02 <alyssa> (IIRC the DDK only CPU maps them for debug)

14:04 <bbrezillon> well, I see some pages are marked unevictable, and there's this set_unevictable() call here https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/panfrost/panfrost_mmu.c#L501

14:09 <alyssa> bbrezillon: I'm trying to think more long-term what to do, since AFAIK users tend to get new mesa earlier than new kernel (or at least I do).

14:09 <alyssa> So even if the bug is fixed in 5.8, I mean, how many users do we have that are still on 5.2?

14:10 <bbrezillon> well, fixes are supposed to be backported

14:11 <alyssa> True

14:11 <bbrezillon> and distros are expected to update their kernels

14:11 <alyssa> but do users?

14:11 <bbrezillon> so I wouldn't worry about that

14:11 <alyssa> Personally I'm stuck on an older version since the new kernel in debian has major devfreq regressions as discussed..

14:11 <bbrezillon> dunno, but then it's their problem, no?

14:12 <bbrezillon> have you mentioned that to mmind00 ?

14:12 <bbrezillon> it's on kevin, right?

14:14 <alyssa> Well, you just pinged em :)

14:27 buzzmarshall has joined #panfrost

15:08 <bbrezillon> alyssa: ok, so forbidding MADV on heaps fixes the BUG reported by icecream95, but now I hit a NULL pointer dereference :'-(

15:15 <bbrezillon> my bad, it's caused by my own traces :)

15:18 <alyssa> bbrezillon: We should probably do that anyway tbh.

15:20 <mmind00> "new kernel in debian has major devfreq regressions as discussed" ... what does that mean? Aka is that device-specific or general?

15:31 <bbrezillon> alyssa: nope, still have unevictable pages after that, so it's something else

15:31 <bbrezillon> and drm_gem_get_pages() calls mapping_set_unevictable() too

15:40 <bbrezillon> mmind00: I think it's kevin specific

15:40 <bbrezillon> (or the rk3399 variant used on this chromebook)

15:44 <alyssa> ^^ yeah

15:45 <alyssa> bbrezillon: does madv just rely on things being munmapped?

15:46 <urjaman> afaik it could affect any panfrost using thingy, but actually only happens when the default voltage is too low for max speed

15:46 <urjaman> but kevin and also my veyron speedy

15:47 <urjaman> (and i assume other veyrons too... tho also it's like overclocking (or undervolting) so device specific as to whether bad stuff will happen)

15:48 <mmind00> bbrezillon: is there a problem description somewhere ... because at least the op1 operating points didn't change since 2017 - so I'm not really sure what should've broken

15:49 <urjaman> iirc it just doesn't change the voltage

15:51 <urjaman> ask robmur01 ? i think he knows more...

15:52 <bbrezillon> alyssa: if it is, we should refuse the MADV(DONT_NEED) instead of adding it to the purgeable pool

15:54 <alyssa> bbrezillon: If it's madv(..) and the kernel wants to claim it back, the kernel should go ahead and munmap it I think?

15:56 nlhowell has joined #panfrost

15:59 <bbrezillon> alyssa: that's what I'm trying to figure out

15:59 <bbrezillon> it does unmap()

16:00 <bbrezillon> but apparently that's not enough

16:00 <alyssa> alright :)

16:00 <bbrezillon> *unmap it

16:03 <bbrezillon> I mean, drm_vma_node_unmap() is called, but page->_mapcount is not decremented as a result

16:08 nlhowell has quit [Quit: WeeChat 2.8]

16:09 <alyssa> Uh oh.

16:12 nlhowell has joined #panfrost

16:21 macc24 has quit [Quit: WeeChat 2.8]

16:28 Stary- is now known as Stary

17:42 <alyssa> Gotta love me some Midgard.

17:42 <alyssa> glBlendFunc(GL_DST_COLOR, GL_SRC_COLOR)

17:42 <alyssa> (with FUNC_ADD)

17:42 <alyssa> This computes the final colour as:

17:42 <alyssa> (dst)(src) + (src)(dst)

17:43 <alyssa> Multiplication is commutative so that's equal to just 2(src)(dst)

17:43 <alyssa> (src)(dst) we can easily compute with fixed-function, but the above split we can't since there's no dominant factor, and the latter we can't since there's a 2

17:43 <alyssa> so all because of that 2, we need to trigger a blend shader.
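The arithmetic above can be written out with the standard OpenGL blend equation for GL_FUNC_ADD, where S is the source colour, D the destination colour, and F_s, F_d the factors selected by glBlendFunc. This is a worked restatement of the messages, applied per channel; the clamp to [0, 1] assumes a normalized fixed-point render target such as RGB565.

```latex
\begin{aligned}
C_{\text{out}} &= S \cdot F_s + D \cdot F_d && \text{(GL\_FUNC\_ADD)} \\
F_s &= D, \quad F_d = S && \text{(GL\_DST\_COLOR, GL\_SRC\_COLOR)} \\
C_{\text{out}} &= S D + D S = 2\,S D, \qquad C_{\text{stored}} = \min\bigl(\max(2\,S D,\ 0),\ 1\bigr)
\end{aligned}
```

The product SD alone is expressible in fixed function; the factor of 2 is what forces the blend shader path.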

17:51 <alyssa> Anyway, for RGB565 the blend shader we emit on T860 is 14 cycles. We can do a lot better.

17:51 <alyssa> { This comes up in t-rex, for those following along. }

17:59 <alyssa> Using a native unpack gets us down to 12 cycles

18:21 NeuroScr has joined #panfrost

18:44 <alyssa> Fusing in more things into the convert gets us to 11

18:44 <alyssa> "vadd.f2u_rteh"

18:45 <alyssa> ^ I can use fancy ops too.

18:56 macc24 has joined #panfrost

19:00 macc24 has quit [Client Quit]

19:17 davidlt has quit [Ping timeout: 240 seconds]

19:22 cwabbott has joined #panfrost

20:01 macc24 has joined #panfrost

20:10 macc24 has quit [Quit: WeeChat 2.8]

20:33 macc24_ has joined #panfrost

20:34 macc24_ is now known as macc24

20:35 cwabbott has quit [Ping timeout: 256 seconds]

21:12 robert_ancell has joined #panfrost

21:19 macc24 has quit [Quit: WeeChat 2.8]

21:55 macc24 has joined #panfrost

22:59 mixfix41 has quit [Ping timeout: 272 seconds]

23:18 nlhowell has quit [Ping timeout: 246 seconds]

23:19 nlhowell has joined #panfrost

23:22 Werner has quit [Quit: ZNC 1.7.2+deb3 - https://znc.in]

23:23 Werner has joined #panfrost

23:23 macc24_ has joined #panfrost

23:25 macc24 has quit [Ping timeout: 265 seconds]

23:30 macc24_ is now known as macc24