#lima on 2020-04-08 — irc logs at freenode.irclog.whitequark.org

2019-07-03 10:24 ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!

00:11 machinehum has quit [Ping timeout: 256 seconds]

00:22 buzzmarshall has joined #lima

00:24 yuq825 has joined #lima

00:30 <anarsoul> yuq825: hi

00:30 <anarsoul> any opinion on https://gitlab.freedesktop.org/mesa/mesa/-/issues/2736 ?

00:30 <yuq825> hi

00:31 <anarsoul> basically the issue is that mpv passes the video decoder buffer into glTexSubImage2D or glTexImage2D directly

00:32 <anarsoul> and as a result we end up copying data from an uncached linear buffer (from the hw decoder) into an uncached tiled buffer (gpu texture)

00:32 <anarsoul> prior to https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4281 we had cached intermediate buffer and thus we didn't hit the issue

00:34 <anarsoul> but it resulted in significant slowdown in glamor on slower CPUs

00:36 <anarsoul> I'm inclined to leave this MR in since it addresses generic performance issue

00:36 <anarsoul> while mpv is corner case that can be fixed in the app

00:36 <anarsoul> I'm not sure why they are not importing the hw decoder buffer as a texture, but it would be the right approach

00:58 machinehum has joined #lima

00:58 <yuq825> so mpv mmaps the hw decode buffer and passes it directly to texsubimage?

01:00 <anarsoul> I think so

01:01 <anarsoul> well, it definitely uploads texture data with texsubimage2d

01:01 <anarsoul> and support for v4l2 m2m decoder is out of tree patches if I understand correctly

01:01 <anarsoul> narmstrong: ^^

01:06 <yuq825> why does 4281 expose this? I suppose the intermediate buffer is for the dst buffer, which is only written to, and that is fine for a wc buffer

01:07 <yuq825> does the panfrost_store_tiled_image need to read out the dst buffer?

01:08 <anarsoul> yuq825: because prior to 4281 we dealt with tiled buffers differently

01:09 <anarsoul> basically u_default_texture_subdata() called transfer_map() for the dst buffer, which creates a staging buffer (which is cached) and returns a pointer, then it copies src into this staging buffer, then calls transfer_unmap(), which copies data from the linear staging buffer into the tiled dst
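A minimal C sketch of the two upload paths described above, to show where the copies happen and how the source is read in each. All names here (`tiled_offset`, `store_tiled`, `upload_via_staging`, `upload_direct`) are hypothetical, not mesa functions, and the layout (16x16-texel tiles, row-major inside each tile) is an assumed simplification, not the real Mali 4xx tiling. Caching is not modelled; the malloc'd staging buffer just stands in for the cached staging buffer mentioned in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified layout: 16x16-texel tiles stored in row-major
 * tile order, texels row-major inside each tile (not the real Mali order). */
#define TILE 16

/* Offset (in texels) of texel (x, y) in a tiled texture w texels wide. */
static size_t tiled_offset(size_t x, size_t y, size_t w)
{
    size_t tile_index = (y / TILE) * (w / TILE) + (x / TILE);
    return tile_index * TILE * TILE + (y % TILE) * TILE + (x % TILE);
}

/* Linear -> tiled, filling the destination tile by tile. Each memcpy reads
 * only TILE texels from src before jumping to the next source row. */
static void store_tiled(uint32_t *dst, const uint32_t *src,
                        size_t w, size_t h, size_t src_stride)
{
    assert(w % TILE == 0 && h % TILE == 0);
    for (size_t ty = 0; ty < h; ty += TILE)
        for (size_t tx = 0; tx < w; tx += TILE)
            for (size_t y = ty; y < ty + TILE; y++)
                memcpy(dst + tiled_offset(tx, y, w),
                       src + y * src_stride + tx,
                       TILE * sizeof *src);
}

/* Pre-4281 path as described in the log: copy src row by row into a
 * linear staging buffer (ordinary malloc'd, i.e. cached, memory), then
 * tile from the staging copy. */
static int upload_via_staging(uint32_t *dst, const uint32_t *src,
                              size_t w, size_t h, size_t src_stride)
{
    uint32_t *staging = malloc(w * h * sizeof *staging);
    if (staging == NULL)
        return -1;
    for (size_t y = 0; y < h; y++)
        memcpy(staging + y * w, src + y * src_stride, w * sizeof *src);
    store_tiled(dst, staging, w, h, w);
    free(staging);
    return 0;
}

/* Post-4281 path as described in the log: tile straight from src,
 * saving a full copy but doing the short strided reads on src itself. */
static void upload_direct(uint32_t *dst, const uint32_t *src,
                          size_t w, size_t h, size_t src_stride)
{
    store_tiled(dst, src, w, h, src_stride);
}
```

Both paths produce identical tiled output. The difference is which memory the short 16-texel reads hit: the cached staging copy in the first path, and `src` itself in the second, which in the mpv case is an uncached hw decoder buffer.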

01:10 <anarsoul> 4281 eliminated that by directly using src

01:12 <anarsoul> as a result it gives a nice improvement in some x11perf benchmarks

01:13 <yuq825> either way we need to read from the uncached src, and I suppose that step is the most expensive one; would uc/wc -> cached really be any different from uc/wc -> wc?

01:14 <anarsoul> yuq825: likely because we don't read linearly?

01:15 <anarsoul> also memcpy is heavily optimized, our tiling routines can't compete with it

01:15 <yuq825> possible

01:16 <yuq825> reading from wc memory sequentially can be optimized with CPU SIMD instructions in memcpy

01:17 <anarsoul> yeah

01:22 <yuq825> I agree that passing an uncached pointer to glTexImage2D is not a performance-friendly usage for the driver; the user app should be responsible for it

01:22 <yuq825> a gpu with a dma engine may avoid this, but not mali4xx

01:30 <anarsoul> yeah, I don't have any ideas how we can improve it from driver side

01:31 <anarsoul> backing out the optimization looks like a bad option to me

01:32 kaspter has joined #lima

01:33 <yuq825> btw. copying from uc/wc memory is always a bad idea even with SIMD optimization, so mpv should use the hw decoder output buffer as a texture directly whenever possible

01:47 <anarsoul> yeah

01:47 <anarsoul> well, even for sw decoder it's not a good idea to use glTexSubImage to upload textures to the GPU

01:47 <anarsoul> they should import the buffer directly

01:57 machinehum has quit [Ping timeout: 260 seconds]

02:37 machinehum has joined #lima

04:53 _whitelogger has joined #lima

05:26 Barada has joined #lima

05:26 Barada has quit [Client Quit]

05:27 Barada has joined #lima

06:13 <narmstrong> anarsoul: it’s upstream now

06:13 <anarsoul> hm

06:14 <narmstrong> warpme_: sync should be ok, it was fixed in the early days of lima with fences

06:14 <narmstrong> warpme_: and a proper usage of drm submit

06:15 <anarsoul> narmstrong: is it part of ffmpeg or mpv?

06:16 <narmstrong> anarsoul: the driver is an m2m v4l2 driver, ffmpeg master and mpv should be able to use it

06:17 <anarsoul> I see

06:17 <anarsoul> well, it's unfortunate that mpv uses a suboptimal method for rendering video

06:18 <anarsoul> texsubimage2d is suboptimal even on x86 machines

06:22 <anarsoul> btw, v4l2 request support hasn't landed in ffmpeg yet, has it?

06:33 machinehum has quit [Ping timeout: 260 seconds]

06:33 monstr has joined #lima

06:51 <anarsoul> narmstrong: I checked mpv code and it should be using MapBufferRange() instead of TexSubImage() unless it hits fallback

07:04 <anarsoul> but tbh it causes higher cpu load for me with sw rendering on pinebook 1080p

07:04 <anarsoul> 100% vs 150%

07:05 <anarsoul> since MapRange() would result in extra copy :)

08:22 <rellla> anarsoul: guess I have to implement GL_EXT_timer_query :)

08:30 machinehum has joined #lima

08:34 machinehum has quit [Ping timeout: 246 seconds]

09:35 <narmstrong> anarsoul: v4l2 request hasn't, but the aml decoder is stateful (not easier...)

10:18 psydread has left #lima [#lima]

10:18 psydread has joined #lima

10:44 psydread has left #lima [#lima]

10:44 psydread has joined #lima

11:06 Barada has quit [Quit: Barada]

11:19 Barada has joined #lima

11:26 Ntemis has joined #lima

11:35 Ntemis has quit [Remote host closed the connection]

11:53 Barada has quit [Quit: Barada]

12:11 cwabbott has quit [Quit: cwabbott]

12:12 cwabbott has joined #lima

12:31 machinehum has joined #lima

12:35 cwabbott has quit [Quit: cwabbott]

12:35 cwabbott has joined #lima

12:36 machinehum has quit [Ping timeout: 246 seconds]

12:38 cwabbott has quit [Client Quit]

12:38 cwabbott has joined #lima

12:40 cwabbott has quit [Client Quit]

12:40 cwabbott has joined #lima

12:45 cwabbott has quit [Client Quit]

12:45 cwabbott has joined #lima

12:45 cwabbott has quit [Remote host closed the connection]

12:46 cwabbott has joined #lima

13:26 dddddd has joined #lima

14:02 yuq825 has quit [Quit: Leaving.]

14:32 machinehum has joined #lima

14:37 machinehum has quit [Ping timeout: 260 seconds]

15:41 monstr has quit [Remote host closed the connection]

16:10 <anarsoul> rellla: good luck :)

16:33 machinehum has joined #lima

16:37 machinehum has quit [Ping timeout: 246 seconds]

17:14 machinehum has joined #lima

17:15 cwabbott has quit [Ping timeout: 246 seconds]

17:18 cwabbott has joined #lima

18:56 megi has quit [Quit: WeeChat 2.7.1]

18:57 megi has joined #lima

19:25 machinehum has quit [Ping timeout: 260 seconds]

19:26 machinehum has joined #lima

19:40 deesix_ has joined #lima

19:40 dddddd_ has joined #lima

19:42 deesix has quit [Ping timeout: 240 seconds]

19:42 deesix_ is now known as deesix

19:42 dddddd has quit [Ping timeout: 240 seconds]

19:43 dddddd_ is now known as dddddd

20:17 tautologico has joined #lima

20:17 tautologico has left #lima [#lima]

21:15 psydread has left #lima [#lima]

21:22 psydread has joined #lima

22:59 machinehum has quit [Ping timeout: 265 seconds]

23:46 machinehum has joined #lima

23:51 machinehum has quit [Ping timeout: 260 seconds]