<anarsoul>
but it resulted in a significant slowdown in glamor on slower CPUs
<anarsoul>
I'm inclined to leave this MR in since it addresses a generic performance issue
<anarsoul>
while mpv is a corner case that can be fixed in the app
<anarsoul>
I'm not sure why they aren't importing the hw decoder buffer as a texture, but that would be the right approach
machinehum has joined #lima
<yuq825>
so mpv mmaps the hw decoder buffer and passes it directly to texsubimage?
<anarsoul>
I think so
<anarsoul>
well, it definitely uploads texture data with texsubimage2d
<anarsoul>
and support for the v4l2 m2m decoder is in out-of-tree patches, if I understand correctly
<anarsoul>
narmstrong: ^^
<yuq825>
why does 4281 expose this? I suppose the intermediate buffer is for the dst buffer, which only gets written to, and that's fine for a wc buffer
<yuq825>
does panfrost_store_tiled_image need to read back the dst buffer?
<anarsoul>
yuq825: because prior to 4281 we dealt with tiled buffers differently
<anarsoul>
basically u_default_texture_subdata() called transfer_map() for the dst buffer, which created a staging buffer (cached) and returned a pointer to it; then it copied src into that staging buffer and called transfer_unmap(), which copied the data from the linear staging buffer into the tiled dst
<anarsoul>
4281 eliminated that by using src directly
<anarsoul>
as a result, it gives a nice improvement in some x11perf benchmarks
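A minimal sketch of the two upload paths described above, written in Python purely for illustration. It assumes a hypothetical 4x4 block-linear layout (tiles stored row-major, pixels row-major inside a tile), not the real Mali-4xx layout, and assumes width and height are multiples of the tile size; `store_tiled`, `upload_via_staging` and `upload_direct` are made-up names, not Mesa functions. Both paths produce the same tiled result; the difference is whether the (possibly uncached) src is read row by row into a cached staging buffer first, or read straight by the tiling loop.

```python
TILE = 4  # hypothetical tile edge; the real Mali-4xx layout differs


def store_tiled(src, stride, width, height):
    """Tile a linear image (stride in pixels) into a flat list.

    The output is produced sequentially, tile by tile, but the source
    is read in TILE-pixel fragments from TILE different rows per tile.
    """
    dst = []
    for ty in range(0, height, TILE):
        for tx in range(0, width, TILE):
            for y in range(ty, ty + TILE):
                start = y * stride + tx
                dst.extend(src[start:start + TILE])
    return dst


def upload_via_staging(src, stride, width, height):
    """Old path: copy rows into a linear (cached) staging buffer, then tile it."""
    staging = []
    for y in range(height):
        # one contiguous, memcpy-like read per source row
        staging.extend(src[y * stride:y * stride + width])
    return store_tiled(staging, width, width, height)


def upload_direct(src, stride, width, height):
    """New path: tile straight from the source buffer, no staging copy."""
    return store_tiled(src, stride, width, height)
```

For example, with an 8x8 image stored at a stride of 10 pixels, `upload_via_staging(src, 10, 8, 8) == upload_direct(src, 10, 8, 8)` holds; the direct path saves one full copy, which only pays off when reads from src are cheap.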
<yuq825>
either way we need to read from the uncached src, and I suppose that step is the most expensive one; would it make any difference going uc/wc -> cached vs. uc/wc -> wc?
<anarsoul>
yuq825: likely because we don't read linearly?
<anarsoul>
also memcpy is heavily optimized, our tiling routines can't compete with it
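To make the "we don't read linearly" point concrete, here is a small Python sketch that lists the contiguous source reads made by a tile-order store versus a row-by-row memcpy. It assumes a hypothetical 4x4 tile size (not the real Mali-4xx layout) and image dimensions that are multiples of it; the function names are illustrative only.

```python
TILE = 4  # hypothetical tile edge; the real Mali-4xx layout differs


def tiled_read_runs(stride, width, height):
    """(start, length) of each contiguous source read made by a tile-order store."""
    runs = []
    for ty in range(0, height, TILE):
        for tx in range(0, width, TILE):
            for y in range(ty, ty + TILE):
                runs.append((y * stride + tx, TILE))
    return runs


def linear_read_runs(stride, width, height):
    """(start, length) of each contiguous read made by a row-by-row memcpy."""
    return [(y * stride, width) for y in range(height)]
```

For an 8x8 image with stride 8, the tiled store issues 16 reads of 4 pixels that jump back and forth (offsets 0, 8, 16, 24, then back to 4), while memcpy issues 8 forward reads of 8 pixels. Short, non-sequential reads are what hurt most on uncached or write-combined source memory.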
<yuq825>
possible
<yuq825>
reading from wc memory sequentially can be optimized with some CPU SIMD instructions in memcpy
<anarsoul>
yeah
<yuq825>
I agree that passing an uncached pointer to teximage2d is not a performance-friendly usage for the driver; the user app should be responsible for it
<yuq825>
a GPU with a DMA engine may avoid this, but not mali4xx
<anarsoul>
yeah, I don't have any ideas on how we can improve it from the driver side
<anarsoul>
backing out the optimization looks like a bad option to me
kaspter has joined #lima
<yuq825>
btw, copying from uc/wc memory is always a bad idea, even with SIMD optimization, so mpv should use the hw decoder output buffer as a texture directly whenever possible
<anarsoul>
yeah
<anarsoul>
well, even for a sw decoder it's not a good idea to use glTexSubImage to upload textures to the GPU
<anarsoul>
they should import the buffer directly
machinehum has quit [Ping timeout: 260 seconds]
machinehum has joined #lima
_whitelogger has joined #lima
Barada has joined #lima
Barada has quit [Client Quit]
Barada has joined #lima
<narmstrong>
anarsoul: it’s upstream now
<anarsoul>
hm
<narmstrong>
warpme_: sync should be ok, it was fixed in the early days of lima with fences
<narmstrong>
warpme_: and proper use of drm submit
<anarsoul>
narmstrong: is it part of ffmpeg or mpv?
<narmstrong>
anarsoul: the driver is an m2m v4l2 driver; ffmpeg master and mpv should be able to use it
<anarsoul>
I see
<anarsoul>
well, it's unfortunate that mpv uses suboptimal method for rendering video
<anarsoul>
texsubimage2d is suboptimal even on x86 machines
<anarsoul>
btw, v4l2 request support hasn't landed in ffmpeg yet, has it?
machinehum has quit [Ping timeout: 260 seconds]
monstr has joined #lima
<anarsoul>
narmstrong: I checked mpv code and it should be using MapBufferRange() instead of TexSubImage() unless it hits fallback
<anarsoul>
but tbh it causes higher CPU load for me with sw rendering on a Pinebook 1080p
<anarsoul>
100% vs 150%
<anarsoul>
since MapRange() would result in an extra copy :)
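A back-of-the-envelope Python model of the "extra copy" remark, under loudly stated assumptions: the frame is 8-bit yuv420p (1.5 bytes per pixel), the GPU has no DMA engine so the CPU does all tiling, TexSubImage costs one CPU pass (the driver tiles straight from the app's buffer), and MapBufferRange costs two (the app fills the mapped buffer, then the driver tiles from it). The pass counts are a simplification for illustration, not measured mpv or lima behaviour.

```python
def yuv420p_frame_bytes(width, height):
    """8-bit 4:2:0 frame: full-size luma plane plus two quarter-size chroma planes."""
    return width * height + 2 * (width // 2) * (height // 2)


# Assumed number of full-frame CPU passes per upload path (hypothetical model).
CPU_PASSES = {
    "TexSubImage": 1,     # driver tiles directly from the app's buffer
    "MapBufferRange": 2,  # app copies into mapped buffer, driver then tiles it
}


def cpu_bytes_per_second(path, width, height, fps):
    """Bytes the CPU has to move per second for the given upload path."""
    return CPU_PASSES[path] * yuv420p_frame_bytes(width, height) * fps
```

At 1920x1080 and 30 fps this gives 93,312,000 bytes/s for TexSubImage and 186,624,000 bytes/s for MapBufferRange. The model only shows why an extra pass costs CPU time; it does not predict the measured load figures above.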
<rellla>
anarsoul: guess, i have to implement GL_EXT_timer_query :)
machinehum has joined #lima
machinehum has quit [Ping timeout: 246 seconds]
<narmstrong>
anarsoul: v4l2 request hasn't, but the aml decoder is stateful (not easier...)
psydread has left #lima [#lima]
psydread has joined #lima
psydread has left #lima [#lima]
psydread has joined #lima
Barada has quit [Quit: Barada]
Barada has joined #lima
Ntemis has joined #lima
Ntemis has quit [Remote host closed the connection]
Barada has quit [Quit: Barada]
cwabbott has quit [Quit: cwabbott]
cwabbott has joined #lima
machinehum has joined #lima
cwabbott has quit [Quit: cwabbott]
cwabbott has joined #lima
machinehum has quit [Ping timeout: 246 seconds]
cwabbott has quit [Client Quit]
cwabbott has joined #lima
cwabbott has quit [Client Quit]
cwabbott has joined #lima
cwabbott has quit [Client Quit]
cwabbott has joined #lima
cwabbott has quit [Remote host closed the connection]
cwabbott has joined #lima
dddddd has joined #lima
yuq825 has quit [Quit: Leaving.]
machinehum has joined #lima
machinehum has quit [Ping timeout: 260 seconds]
monstr has quit [Remote host closed the connection]