alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
cowsay has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
rokquarry has quit [Quit: Leaving]
buzzmarshall has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
nerdboy has quit [Ping timeout: 240 seconds]
macc24 has joined #panfrost
davidlt has joined #panfrost
icecream95 has joined #panfrost
buzzmarshall has quit [Remote host closed the connection]
davidlt has quit [Ping timeout: 272 seconds]
davidlt has joined #panfrost
horsewater has joined #panfrost
macc24 has quit [Ping timeout: 246 seconds]
<icecream95> bbrezillon: Starting any GL application with Mesa 794c239a990 or later and using a lot of RAM (e.g. dd from /dev/zero to a tmpfs) will trigger it
<bbrezillon> icecream95: completely untested, but can you try moving the drm_vma_node_unmap() earlier in drm_gem_shmem_purge_locked() => https://gitlab.freedesktop.org/snippets/1026/raw
<icecream95> bbrezillon: I haven't got DRM compiled as a module, but I tried copying the unmap call to panfrost_gem_purge, and it didn't seem to help
<bbrezillon> icecream95: exact same bug? do you have a diff to share?
<icecream95> bbrezillon: The backtrace is exactly the same
<bbrezillon> icecream95: hm, that's weird, I don't see how page_mapped() can return true if we call vma_node_unmap()
<bbrezillon> can you check if unmap_mapping_range() is really called in drm_vma_node_unmap()?
<bbrezillon> or print the result of drm_mm_node_allocated() in the panfrost driver
<icecream95> bbrezillon: drm_mm_node_allocated is returning 1
nerdboy has joined #panfrost
<bbrezillon> then vma_node_unmap() should be called, and I don't see how we can have a page that's still mapped
yann has joined #panfrost
rcf has quit [Read error: Connection reset by peer]
nlhowell has quit [Ping timeout: 260 seconds]
Elpaulo has quit [Read error: Connection reset by peer]
Elpaulo has joined #panfrost
rcf has joined #panfrost
raster has joined #panfrost
<icecream95> bbrezillon: I get a similar error from panfrost_mmu_map very occasionally: https://gitlab.freedesktop.org/snippets/1027
<icecream95> (this is not related to the change in Mesa, I've had it about once or twice a month since at least 5.4)
macc24 has joined #panfrost
<bbrezillon> icecream95: that's not the same problem
<bbrezillon> looks like this one is happening when you try to re-import an already imported dmabuf, or some missing unmap calls in the gem_close/fd_close path
rcf has quit [Quit: WeeChat 2.7]
rcf has joined #panfrost
icecream95 has quit [Quit: leaving]
nlhowell has joined #panfrost
stikonas has joined #panfrost
<bbrezillon> robmur01, robher: are you sure we should call shmem_truncate_range(0, -1) when purging a BO
horsewater is now known as mixfix41
<bbrezillon> shouldn't we pass the range attached to GEM object being purged instead?
<bbrezillon> nevermind
<bbrezillon> there's a file per obj, so we really want to truncate the whole file
nlhowell has quit [Ping timeout: 260 seconds]
raster has quit [Quit: Gettin' stinky!]
<alyssa> icecream95: bbrezillon: I can revert the mesa change (it was a CPU overhead optimization), but I don't see how that could trigger kernel issues.
<bbrezillon> alyssa: no, it's really a kernel bug
<bbrezillon> I'm chasing it right now
<alyssa> OK
<bbrezillon> I think it has to do with BOs flagged growable
<alyssa> Does it make sense to predicate that commit on new kernel versions once it's fixed, though?
<alyssa> Growable shouldn't be in the cache afaik.
<alyssa> actually did we change that?
<bbrezillon> hm, I think they are
<alyssa> Better point is growable should never be CPU mapped
<alyssa> (IIRC the DDK only CPU maps them for debug)
<bbrezillon> well, I see some pages are marked unevictable, and there's this set_unevictable() call here https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/panfrost/panfrost_mmu.c#L501
<alyssa> bbrezillon: I'm trying to think more long-term what to do, since AFAIK users tend to get new mesa earlier than new kernel (or at least I do).
<alyssa> So even if the bug is fixed in 5.8, I mean, how many users do we have that are still on 5.2?
<bbrezillon> well, fixes are supposed to be backported
<alyssa> True
<bbrezillon> and distros are expected to update their kernels
<alyssa> but do users?
<bbrezillon> so I wouldn't worry about that
<alyssa> Personally I'm stuck on an older version since the new kernel in debian has major devfreq regressions as discussed.
<bbrezillon> dunno, but then it's their problem, no?
<bbrezillon> have you mentioned that to mmind00 ?
<bbrezillon> it's on kevin, right?
<alyssa> Well, you just pinged em :)
buzzmarshall has joined #panfrost
<bbrezillon> alyssa: ok, so forbidding MADV on heaps fixes the BUG reported by icecream95, but now I hit a NULL pointer dereference :'-(
<bbrezillon> my bad, it's caused by my own traces :)
<alyssa> bbrezillon: We should probably do that anyway tbh.
<mmind00> "new kernel in debian has major devfreq regressions as discussed" ... what does that mean? I.e. is that device-specific or general?
<bbrezillon> alyssa: nope, still have unevictable pages after that, so it's something else
<bbrezillon> and drm_gem_get_pages() calls mapping_set_unevictable() too
<bbrezillon> mmind00: I think it's kevin specific
<bbrezillon> (or the rk3399 variant used on this chromebook)
<alyssa> ^^ yeah
<alyssa> bbrezillon: does madv just rely on things being munmapped?
<urjaman> afaik it could affect any panfrost using thingy, but actually only happens when the default voltage is too low for max speed
<urjaman> but kevin and also my veyron speedy
<urjaman> (and i assume other veyrons too... tho also it's like overclocking (or undervolting) so device specific as to whether bad stuff will happen)
<mmind00> bbrezillon: is there a problem description somewhere ... because at least the op1 operating points didn't change since 2017 - so I'm not really sure what should've broken
<urjaman> iirc it just doesn't change the voltage
<urjaman> ask robmur01 ? i think he knows more...
<bbrezillon> alyssa: if it is, we should refuse the MADV(DONT_NEED) instead of adding it to the purgeable pool
<alyssa> bbrezillon: If it's madv(..) and the kernel wants to claim it back, the kernel should go ahead and munmap it I think?
nlhowell has joined #panfrost
<bbrezillon> alyssa: that's what I'm trying to figure out
<bbrezillon> it does unmap()
<bbrezillon> but apparently that's not enough
<alyssa> alright :)
<bbrezillon> *unmap it
<bbrezillon> I mean, drm_vma_node_unmap() is called, but page->_mapcount is not decremented as a result
nlhowell has quit [Quit: WeeChat 2.8]
<alyssa> Uh oh.
nlhowell has joined #panfrost
macc24 has quit [Quit: WeeChat 2.8]
Stary- is now known as Stary
<alyssa> Gotta love me some Midgard.
<alyssa> glBlendFunc(GL_DST_COLOR, GL_SRC_COLOR)
<alyssa> (with FUNC_ADD)
<alyssa> This computes the final colour as:
<alyssa> (dst)(src) + (src)(dst)
<alyssa> Multiplication is commutative so that's equal to just 2(src)(dst)
<alyssa> (src)(dst) alone we can easily compute with fixed-function, but the split form above we can't since there's no dominant factor, and the 2(src)(dst) form we can't since there's a 2
<alyssa> so all because of that 2, we need to trigger a blend shader.
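As an illustration of the algebra above, here is a minimal CPU-side sketch in C, working per channel on normalized [0, 1] values. The helper names (`clamp01`, `blend_dst_src`, `blend_2x`) are hypothetical and exist only for this example; this models the GL blend equation, not what the Midgard hardware or Panfrost actually emits.

```c
/* Hypothetical helper for this sketch: clamp a blended channel to [0, 1]. */
static float clamp01(float x)
{
    if (x < 0.0f)
        return 0.0f;
    if (x > 1.0f)
        return 1.0f;
    return x;
}

/* GL_FUNC_ADD with glBlendFunc(GL_DST_COLOR, GL_SRC_COLOR):
 * out = src * sfactor + dst * dfactor, where sfactor = dst and dfactor = src. */
static float blend_dst_src(float src, float dst)
{
    return clamp01(src * dst + dst * src);
}

/* The same result written as a single product with the troublesome factor of 2. */
static float blend_2x(float src, float dst)
{
    return clamp01(2.0f * src * dst);
}
```

Calling both helpers with the same inputs, for instance `blend_dst_src(0.25f, 0.5f)` and `blend_2x(0.25f, 0.5f)`, should give matching results up to floating-point rounding.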
<alyssa> Anyway, for RGB565 the blend shader we emit on T860 is 14 cycles. We can do a lot better.
<alyssa> { This comes up in t-rex, for those following along. }
<alyssa> Using a native unpack gets us down to 12 cycles
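For readers following along, the work such a blend shader has to do for an RGB565 target can be sketched as a CPU reference in C: unpack the destination pixel, compute 2(src)(dst) per channel, clamp, and repack. This is only a rough model under stated assumptions: the bit layout (red in bits 15:11, green in 10:5, blue in 4:0) and round-to-nearest packing are assumed, and the function names are hypothetical. It is not the Midgard instruction sequence discussed here.

```c
#include <stdint.h>

/* Hypothetical helper for this sketch: clamp a channel to [0, 1]. */
static float clamp01(float x)
{
    if (x < 0.0f)
        return 0.0f;
    if (x > 1.0f)
        return 1.0f;
    return x;
}

/* Unpack RGB565 to normalized floats (assumed layout: R 15:11, G 10:5, B 4:0). */
static void unpack_rgb565(uint16_t px, float rgb[3])
{
    rgb[0] = (float)((px >> 11) & 0x1F) / 31.0f;
    rgb[1] = (float)((px >> 5) & 0x3F) / 63.0f;
    rgb[2] = (float)(px & 0x1F) / 31.0f;
}

/* Pack normalized floats back to RGB565, rounding to nearest. */
static uint16_t pack_rgb565(const float rgb[3])
{
    uint16_t r = (uint16_t)(clamp01(rgb[0]) * 31.0f + 0.5f);
    uint16_t g = (uint16_t)(clamp01(rgb[1]) * 63.0f + 0.5f);
    uint16_t b = (uint16_t)(clamp01(rgb[2]) * 31.0f + 0.5f);
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* Reference blend for glBlendFunc(GL_DST_COLOR, GL_SRC_COLOR) with FUNC_ADD:
 * out = 2 * src * dst per channel, clamped during packing. */
static uint16_t blend_rgb565(const float src[3], uint16_t dst_px)
{
    float dst[3], out[3];

    unpack_rgb565(dst_px, dst);
    for (int i = 0; i < 3; i++)
        out[i] = 2.0f * src[i] * dst[i];

    return pack_rgb565(out);
}
```

The unpack, multiply, convert and pack steps here correspond loosely to the stages the cycle counts refer to; the actual savings come from hardware ops that fuse several of these steps.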
NeuroScr has joined #panfrost
<alyssa> Fusing more things into the convert gets us to 11
<alyssa> "vadd.f2u_rteh"
<alyssa> ^ I can use fancy ops too.
macc24 has joined #panfrost
macc24 has quit [Client Quit]
davidlt has quit [Ping timeout: 240 seconds]
cwabbott has joined #panfrost
macc24 has joined #panfrost
macc24 has quit [Quit: WeeChat 2.8]
macc24_ has joined #panfrost
macc24_ is now known as macc24
cwabbott has quit [Ping timeout: 256 seconds]
robert_ancell has joined #panfrost
macc24 has quit [Quit: WeeChat 2.8]
macc24 has joined #panfrost
mixfix41 has quit [Ping timeout: 272 seconds]
nlhowell has quit [Ping timeout: 246 seconds]
nlhowell has joined #panfrost
Werner has quit [Quit: ZNC 1.7.2+deb3 - https://znc.in]
Werner has joined #panfrost
macc24_ has joined #panfrost
macc24 has quit [Ping timeout: 265 seconds]
macc24_ is now known as macc24