ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!
machinehum has quit [Ping timeout: 256 seconds]
buzzmarshall has joined #lima
yuq825 has joined #lima
<anarsoul> yuq825: hi
<yuq825> hi
<anarsoul> basically the issue is that mpv passes the video decoder buffer into glTexSubImage2D or glTexImage2D directly
<anarsoul> and as a result we end up copying data from an uncached linear buffer (from the hw decoder) into an uncached tiled buffer (gpu texture)
<anarsoul> prior to https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4281 we had cached intermediate buffer and thus we didn't hit the issue
<anarsoul> but it resulted in significant slowdown in glamor on slower CPUs
<anarsoul> I'm inclined to leave this MR in since it addresses generic performance issue
<anarsoul> while mpv is corner case that can be fixed in the app
<anarsoul> I'm not sure why they are not importing hw decoder buffer as a texture but it would be right approach
machinehum has joined #lima
<yuq825> so mpv mmaps the hw decode buffer and passes it directly to texsubimage?
<anarsoul> I think so
<anarsoul> well, it definitely uploads texture data with texsubimage2d
<anarsoul> and support for the v4l2 m2m decoder is in out-of-tree patches if I understand correctly
<anarsoul> narmstrong: ^^
<yuq825> why does 4281 expose this? I suppose the intermediate buffer is for the dst buffer, which is only written to, and that is fine for a wc buffer
<yuq825> does the panfrost_store_tiled_image need to read out the dst buffer?
<anarsoul> yuq825: because prior to 4281 we dealt with tiled buffers differently
<anarsoul> basically u_default_texture_subdata() called transfer_map() for dst buffer which creates staging buffer (which is cached) and returns a pointer, then it copies src into this staging buffer, then calls transfer_unmap() which copies data from linear staging buffer into tiled dst
<anarsoul> 4281 eliminated that by directly using src
<anarsoul> as result it gives nice improvement in some x11perf benchmarks
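A minimal Python sketch of the two upload paths described above, for illustration only. It assumes a simplified layout of row-major 4x4 tiles, which is not the real Mali tiling format, and every name in it (`TILE`, `tiled_offset`, `store_tiled`, `upload_via_staging`, `upload_direct`) is hypothetical rather than a mesa function. It models only the data flow, not cache behavior: the staging path does one bulk copy of src and then tiles from the copy, while the direct path tiles straight from src.

```python
TILE = 4  # hypothetical tile edge; real hardware layouts differ


def tiled_offset(x, y, width):
    """Index of pixel (x, y) in a buffer made of TILE x TILE row-major tiles.

    Assumes width is a multiple of TILE.
    """
    tiles_per_row = width // TILE
    tile_index = (y // TILE) * tiles_per_row + (x // TILE)
    return tile_index * TILE * TILE + (y % TILE) * TILE + (x % TILE)


def store_tiled(dst, src, width, height):
    """Write a linear image into a tiled buffer, reading src one pixel at a time."""
    for y in range(height):
        for x in range(width):
            dst[tiled_offset(x, y, width)] = src[y * width + x]


def upload_via_staging(dst, src, width, height):
    """Staging path: one bulk copy of src, then tile from the copy."""
    staging = bytearray(src)  # stands in for a memcpy into a cached staging buffer
    store_tiled(dst, staging, width, height)


def upload_direct(dst, src, width, height):
    """Direct path: tile straight from src, so the per-pixel loop reads src itself."""
    store_tiled(dst, src, width, height)
```

Both functions produce the same tiled output. The only difference is which buffer the per-pixel tiling loop reads from, which is the part that matters when src is an uncached mapping.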
<yuq825> either way we need to read from the uncached src, and I suppose that step is the most expensive one; would uc/wc -> cached make any difference compared with uc/wc -> wc?
<anarsoul> yuq825: likely because we don't read linearly?
<anarsoul> also memcpy is heavily optimized, our tiling routines can't compete with it
<yuq825> possible
<yuq825> continuous reads from wc can be optimized with some CPU SIMD instructions in memcpy
<anarsoul> yeah
<yuq825> I agree that passing an uncached pointer to glTexImage2D is not a performance-friendly usage for the driver, the user app should be responsible for it
<yuq825> gpu with dma engine may avoid this, but not for mali4xx
<anarsoul> yeah, I don't have any ideas how we can improve it from driver side
<anarsoul> backing out optimization looks like bad option to me
kaspter has joined #lima
<yuq825> btw, copying from uc/wc mem is always a bad idea even with SIMD optimization, so mpv should use the hw decoder output buffer as a texture directly whenever possible
<anarsoul> yeah
<anarsoul> well, even for sw decoder it's not a good idea to use glTexSubImage to upload textures to the GPU
<anarsoul> they should import the buffer directly
<narmstrong> anarsoul: it’s upstream now
<anarsoul> hm
<narmstrong> warpme_: sync should be ok, it was fixed in the early days of lima with fences
<narmstrong> warpme_: and a proper usage of drm submit
<anarsoul> narmstrong: is it part of ffmpeg or mpv?
<narmstrong> anarsoul: the driver is a m2m v4l2 driver, ffmpeg master and mpv should be able to use it
<anarsoul> I see
<anarsoul> well, it's unfortunate that mpv uses suboptimal method for rendering video
<anarsoul> texsubimage2d is suboptimal even on x86 machines
<anarsoul> btw, v4l2 request support hasn't landed in ffmpeg yet, has it?
machinehum has quit [Ping timeout: 260 seconds]
monstr has joined #lima
<anarsoul> narmstrong: I checked mpv code and it should be using MapBufferRange() instead of TexSubImage() unless it hits fallback
<anarsoul> but tbh it causes higher cpu load for me with sw rendering on pinebook 1080p
<anarsoul> 100% vs 150%
<anarsoul> since MapBufferRange() would result in an extra copy :)
<rellla> anarsoul: guess, i have to implement GL_EXT_timer_query :)
machinehum has joined #lima
machinehum has quit [Ping timeout: 246 seconds]
<narmstrong> anarsoul: v4l2 request hasn't, but the aml decoder is stateful (not easier...)
<anarsoul> rellla: good luck :)