<urjaman>
there's some sort of a cell update glitch that happens with libreoffice calc (has for a long time), but i finally took video to show it: https://youtu.be/BPg9JKdbmmI
<urjaman>
that video is ridiculously low quality but i guess you can make out the point ...
<urjaman>
which is that some cells don't get their contents drawn initially, but as you cursor around the grid and approach them (redrawing the area where you're cursoring) their contents appear (sometimes only partially if you're to the side of them)
<urjaman>
i just have no clue what would be causing this or how to debug lol (something something glamor? libreoffice calc is not a GL program (afaik :P) lol)
_whitelogger has joined #panfrost
raster has joined #panfrost
yann has joined #panfrost
kaspter has quit [Ping timeout: 240 seconds]
kaspter has joined #panfrost
cowsay has quit [Read error: Connection reset by peer]
cowsay has joined #panfrost
nlhowell has joined #panfrost
<icecream95>
alyssa: The AFBC flush issue sounds very similar to the DarkPlaces performance regression I reported three weeks ago...
<icecream95>
Testing all patches with Quake E1M1 in both Darkplaces and QuakeSpasm would probably reduce the number of regressions by >50%
<icecream95>
alyssa: "reading from uninitialized AFBC is invalid". Filling with copy-blocks can be done by setting the second header word for each block to 1
stikonas has joined #panfrost
<icecream95>
To store data (with no compression) instead of repeating a single 4x4 sub-block, use 010101010101, 02020202020 and 0404040404 octal in words 2-4
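A rough C sketch of the header-filling trick described above, assuming the 16-byte AFBC superblock header can be viewed as four 32-bit words; the word meanings and the octal values are taken from the messages above, not from the AFBC spec, so treat the layout as unverified:

```c
#include <stdint.h>
#include <string.h>

/* One AFBC superblock header, viewed as four 32-bit words (16 bytes).
 * The field semantics here follow the discussion above and are not
 * verified against the AFBC documentation. */
struct afbc_block_header {
    uint32_t word[4];
};

/* "Copy block" fill: per the note above, setting the second header word
 * to 1 gives the block defined contents, so reading it is no longer
 * reading uninitialized AFBC. Zeroing the rest is an assumption. */
static void afbc_header_fill_copy_block(struct afbc_block_header *hdr)
{
    memset(hdr, 0, sizeof(*hdr));
    hdr->word[1] = 1;
}

/* Uncompressed fill: words 2-4 (indices 1-3) get the octal patterns
 * quoted above, presumably marking every 4x4 sub-block as stored at
 * full size instead of repeating a single sub-block. The first word
 * (assumed to be the payload offset) is left to the caller. */
static void afbc_header_fill_uncompressed(struct afbc_block_header *hdr)
{
    hdr->word[1] = 010101010101; /* values quoted verbatim from above */
    hdr->word[2] = 02020202020;
    hdr->word[3] = 0404040404;
}
```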
<macc24>
icecream95: when did performance hit happen with darkplaces?
<alyssa>
macc24: from AFBC iirc
rcf has quit [Ping timeout: 258 seconds]
<macc24>
alyssa: and when is AFBC a thing in panfrost?
<alyssa>
already is :)
<macc24>
from which mesa version?
<macc24>
20.1.6?
kinkinkijkin has quit [Ping timeout: 244 seconds]
Ntemis has joined #panfrost
yann has joined #panfrost
<alyssa>
20.2
<macc24>
what speedup does afbc provide?
rak-zero has quit [Ping timeout: 240 seconds]
<alyssa>
depends on workload, I guess
<warpme_>
macc24: the numbers I saw were likely 40-50% less memory BW
<macc24>
:o
<HdkR>
Which only really matters if you're BW bounded
<warpme_>
HdkR: on "small" SoCs it is easy to meet the constraints for HD content. For 4K - almost any SoC will have issues (I think)
<macc24>
in which scenario does memory bandwidth matter?
<HdkR>
Almost all in ARM SoCs
<HdkR>
:P
<Lyude>
memory == power consumption
<macc24>
vnc?
<Lyude>
less memory bandwidth needed is usually better :)
<alyssa>
power consumption always, and if BW bounded (which can often happen at high res)
<HdkR>
Porting games to the SHIELD with 25.6GB/s memory bandwidth was a nightmare because of memory BW limitations D:
<robmur01>
display scanout on its own can consume major amounts of bandwidth with modern-day resolutions
<macc24>
1280x800@60hz is not that bad
<robmur01>
on my RK3328, running a memory-heavy workload will glitch out a 1080p display no problem ;)
<macc24>
oh god
<robmur01>
(since by default the interconnect QoS doesn't prioritize the VOP)
<urjaman>
yeah the C201 feels significantly slushier with a second 1080p display attached
<urjaman>
i'm guessing some of this could be optimized/prioritized (if you have a video playing, sometimes terminal response to a keypress feels like it happens in the next second...), but still...
<warpme_>
macc24: re: "in which scenario does memory bandwidth matter?" All contend which is displayed needs to be provided to display subsystem (DRM plane). Of course zero-copy on UMA or DMA assisted on NUMA systems can offload CPU - but in any case data needs to be provided to DRM plane. Having less data to provide means less power to deal with it. Of course compression of data also requires power - but it looks like
<warpme_>
require less than moving bigger data....
<urjaman>
i mean ... a second display, that happens to be 1080p (the built-in one is, as we know, 1366x768)
<daniels>
robmur01: really 3328, not 3326/3288? I thought 3288 was a relatively high-end media SoC
<urjaman>
yeah i think he meant 3328 (since i remember talk of this previously)
<daniels>
wow
<robmur01>
yup, 3328 is basically meant to be a video decoder and not much else (hence lame-o Mali 450)
<daniels>
(fun galore on the OMAP1710/2420 where we could barely barely do 800x480 due to memory bandwidth constraints)
<alyssa>
urjaman: I was always a little miffed the c201 subjectively outperformed kevin, since the screen resolution was increased more than the GPU horsepower :p
<daniels>
(that was also with an external display controller so scanout didn't smash L3 to bits)
<robmur01>
down there in the $25-$50 TV box market with S905 and H6
<daniels>
well, at least they didn't butcher the cache on the Mali :P
<alyssa>
;P
<HdkR>
We just need a Mali bearing SoC with 138GB/s memory bandwidth to take on Tegra
<HdkR>
:>
<daniels>
presumably Neoverse isn't bandwidth-shy
<warpme_>
i have a question regarding the current AFBC implementation, mesa, video decoders and kernel DRM: may I assume that if the video decoder is set up to output AFBC format, the compressed frame can be provided to mesa via EGL_LINUX_DMA_BUF_EXT (so zero-copy dma_buf), mesa can import this and pass it to a DRM fb plane (still compressed), and the DRM CRTC will decompress the AFBC and put the decompressed frame through the DRM encoder to finally display the content to the user?
<robmur01>
HdkR: maybe if an Arm-based laptop/desktop market emerges and someone like Samsung/MTK decides they want a piece of the pie... I can but dream :)
* HdkR
dreams of a better world
<robmur01>
daniels: TBH it's not the CPUs so much as the interconnect/memory controller setup that's constraining mobile SoCs
<robmur01>
consider Graviton1 with "just" Cortex-A72s
<daniels>
robmur01: right, I didn't mean the core, I meant whichever complete solution they were selling
<daniels>
which iirc is configured with some ridiculous cloud-friendly scale ootb
<alyssa>
warpme_: currently none of that is tested, but in theory yes, that's supposed to work, and PAN_MESA_DEBUG=afbc will flip on the mesa bits for AFBC buffer sharing
<robmur01>
Ah CMN-600, the millstone around my neck... :)
<daniels>
warpme_: in the case you're describing, as alyssa says it should work and will once it's been proven enough to remove the need for debug bits, however you don't need Mesa in that picture unless you're actually operating on frames with the GPU?
<daniels>
warpme_: if you want to display from V4L2, you can just import those dmabufs directly into KMS and it'll work
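A minimal libdrm sketch of that direct V4L2-to-KMS path, assuming a single-plane format, an already-exported dmabuf fd, and an AFBC modifier matching what the decoder produced (drm_fd, stride and the chosen modifier flags are placeholders):

```c
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>
#include <drm_fourcc.h>

/* Import a decoder dmabuf straight into KMS as a framebuffer, no GPU in
 * the path. Sketch only: single plane, minimal error handling, and the
 * AFBC modifier flags must match what the producer actually wrote. */
static int dmabuf_to_kms_fb(int drm_fd, int dmabuf_fd,
                            uint32_t width, uint32_t height,
                            uint32_t stride, uint32_t *fb_id)
{
    uint32_t handles[4] = { 0 }, pitches[4] = { 0 }, offsets[4] = { 0 };
    uint64_t modifiers[4] = { 0 };

    /* Turn the PRIME fd into a GEM handle on this DRM device. */
    if (drmPrimeFDToHandle(drm_fd, dmabuf_fd, &handles[0]))
        return -1;

    pitches[0] = stride;
    modifiers[0] = DRM_FORMAT_MOD_ARM_AFBC(AFBC_FORMAT_MOD_BLOCK_SIZE_16x16 |
                                           AFBC_FORMAT_MOD_SPARSE);

    /* Register the buffer as a framebuffer a plane can scan out. */
    return drmModeAddFB2WithModifiers(drm_fd, width, height,
                                      DRM_FORMAT_XRGB8888,
                                      handles, pitches, offsets, modifiers,
                                      fb_id, DRM_MODE_FB_MODIFIERS);
}
```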
<daniels>
warpme_: if you want to process the V4L2 content with the GPU, indeed you do need to import it as an EGLImage, then after that what's rendered by the GPU will be in a different buffer, with whichever allocation you separately made for it (e.g. gbm_surface)
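A minimal sketch of that EGLImage import, using EGL_EXT_image_dma_buf_import (with the _modifiers variant) and GL_OES_EGL_image_external; single-plane format again, and dpy, fd, stride and modifier are placeholders for whatever the decoder hands over:

```c
#include <stdint.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>
#include <drm_fourcc.h>

/* Wrap a decoder dmabuf as an EGLImage and bind it to an external
 * texture so the GPU can sample it. Sketch only: no error handling,
 * and it assumes the driver exposes the dma_buf import and external
 * texture extensions. */
static GLuint texture_from_dmabuf(EGLDisplay dpy, int fd,
                                  int width, int height,
                                  int stride, uint64_t modifier)
{
    const EGLint attrs[] = {
        EGL_WIDTH, width,
        EGL_HEIGHT, height,
        EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_XRGB8888,
        EGL_DMA_BUF_PLANE0_FD_EXT, fd,
        EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
        EGL_DMA_BUF_PLANE0_PITCH_EXT, stride,
        EGL_DMA_BUF_PLANE0_MODIFIER_LO_EXT, (EGLint)(modifier & 0xffffffff),
        EGL_DMA_BUF_PLANE0_MODIFIER_HI_EXT, (EGLint)(modifier >> 32),
        EGL_NONE,
    };

    PFNEGLCREATEIMAGEKHRPROC create_image =
        (PFNEGLCREATEIMAGEKHRPROC)eglGetProcAddress("eglCreateImageKHR");
    PFNGLEGLIMAGETARGETTEXTURE2DOESPROC image_target_texture =
        (PFNGLEGLIMAGETARGETTEXTURE2DOESPROC)
            eglGetProcAddress("glEGLImageTargetTexture2DOES");

    /* EGL_NO_CONTEXT is required for the dma_buf target. */
    EGLImageKHR img = create_image(dpy, EGL_NO_CONTEXT,
                                   EGL_LINUX_DMA_BUF_EXT, NULL, attrs);

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_EXTERNAL_OES, tex);
    image_target_texture(GL_TEXTURE_EXTERNAL_OES, img);
    return tex;
}
```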
kinkinkijkin has joined #panfrost
<warpme_>
daniels: the issue with going exclusively with the KMS model of DRM_PRIME for video rendering is post-processing of the video (mainly DI). But I agree - having an alternative mode where v4l2 draws directly to a DRM plane is a good option (losing post-processing capability, or more precisely - narrowing post-processing capabilities to only what the HW provides). This is what I'm postulating as the target for mythtv.
<warpme_>
daniels: with AFBC however - is the KMS model the only option, as post-processing on AFBC is not possible?
<daniels>
you mean doing in-place post-processing ... ?
<daniels>
else I don't really understand what you mean
<daniels>
traditionally, you would decode the video via V4L2, get one buffer for one frame, import that buffer in as an EGLImage, then use that as part of a GPU job which renders to another buffer as an output (one allocated by the GPU)
<daniels>
this can all be AFBC: AFBC out of V4L2, AFBC into EGL, AFBC out of EGL, AFBC out of KMS
<daniels>
I just wanted to draw the distinction that usually it's not the same actual buffer in memory, because the V4L2 output buffer != the GL output buffer
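A short GBM sketch of how that separate GL output buffer could be allocated AFBC-capable; the modifier list here is illustrative only, and in practice you would pass the set the display driver advertises:

```c
#include <stdint.h>
#include <gbm.h>
#include <drm_fourcc.h>

/* Allocate the GPU-side render surface, separate from the decoder's
 * buffers, and let the driver pick AFBC if it supports it. Sketch only:
 * the modifier list is an assumption, not a queried set. */
static struct gbm_surface *create_render_surface(struct gbm_device *gbm,
                                                 uint32_t width,
                                                 uint32_t height)
{
    const uint64_t modifiers[] = {
        DRM_FORMAT_MOD_ARM_AFBC(AFBC_FORMAT_MOD_BLOCK_SIZE_16x16 |
                                AFBC_FORMAT_MOD_SPARSE |
                                AFBC_FORMAT_MOD_YTR),
        DRM_FORMAT_MOD_LINEAR, /* fallback if AFBC can't be used */
    };

    return gbm_surface_create_with_modifiers(gbm, width, height,
                                             GBM_FORMAT_XRGB8888, modifiers,
                                             sizeof(modifiers) / sizeof(modifiers[0]));
}
```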
<warpme_>
oh, of course I had in-place post-processing in mind. But... if we want to do post-processing on an AFBC frame (i.e. DI) - then the content must first be decompressed. And some time ago I was told the AFBC compression algorithm isn't well known. So my statement about the issue with post-processing on AFBC frames was made with that assumption. Should I conclude mesa is capable of decompressing AFBC in-place and doing post-processing, i.e. GLSL shaders?
<urjaman>
the GPU will do it both ways for you
<urjaman>
read AFBC and write AFBC
<warpme_>
and I'm asking about compression/decompression...
<urjaman>
in hardware ... that's kinda the point
<warpme_>
urjaman: when you write "read" - do you mean read+decompress?
<urjaman>
yes (and same for write, as in write a compressed form)
<urjaman>
that's why it's called framebuffer compression...
raster has quit [Quit: Gettin' stinky!]
raster has joined #panfrost
<warpme_>
urjaman: ok, this sounds very interesting then. So an interesting comparison would be: non-compressed in/out + GLSL-based in-place post-processing vs. AFBC-compressed in/out + GLSL-based in-place post-processing... any thoughts?
<warpme_>
a comparison in the context of the GPU processing power needed for simple GLSL operations on HD content (like DI) + AFBC compression/decompression...
<warpme_>
btw: i must say i'm really impressed by mesa 20.2.0-rc3. It is FULLY working on aw/aml/rk/brcm with mali450/t720/t820/t860/g31. No issues noted so far... Impressive, especially as we are using DRM_PRIME via EGL_LINUX_DMA_BUF_EXT - not DRM planes (and I consider EGL mode more demanding of GL drivers than DRM planes). FANTASTIC work!
<anarsoul|2>
why is EGL more demanding than DRM planes?
guillaume_g has quit [Quit: Konversation terminated!]
<warpme_>
anarsoul|2: well, here is how I understand things: let's assume we have a video player. It plays video + some OSD stuff (i.e. subtitles). In KMS mode, what we need from the GL stack is: prepare a surface with subtitles and provide it to a DRM plane framebuffer - and do this in a non-real-time regime. Now compare this with EGL mode: GL needs to (in real time) import frames, mix the video surface with the OSD surface, and export the frame to a DRM plane. So comparing both I would say: in KMS mode, mixing surfaces + keeping the real-time regime is handled by the DRM subsystem, while in EGL mode it is handled by the GL subsystem. So from a requirements point of view, in the EGL model the GL stack must operate in real time at frame rate and do the surface mixing in real time at frame rate. This is not the case with KMS, as this work is offloaded from GL to the DRM subsystem (done by the CRTC component).
<anarsoul|2>
importing a frame is essentially free
<anarsoul|2>
on UMA architectures
guillaume_g has joined #panfrost
camus1 has joined #panfrost
<warpme_>
indeed. More costly is scaling the planes to the target resolution + mixing them into the target frame.
<anarsoul|2>
so the difference is essentially one more sampler?
kaspter has quit [Ping timeout: 260 seconds]
camus1 is now known as kaspter
m][sko7 has joined #panfrost
<warpme_>
display engines in today's mid/high-end SoCs can do all this nicely (+ DI). IMHO the issue is uniformization + a common API, I think. I'm not aware of any abstraction in the DRM subsystem for DI, for example. Comparing this with GL I see the difference: GL and GLSL are fully uniform, standardised, portable, etc. Ehh - for me all this depends on the angle: if I were an embedded developer - then KMS looks really sexy, but it's different for almost every SoC family :-) . Show this model to an app (player) developer and he will say: omg, I expect this to be unified by the operating system (or runtime libs). And I agree. Now let's compare this with the GL mode: the player developer will say: nice, we have well-known GL here + some extensions for performance (dmabuf EGL exports). For me - if asked what I'd choose - by default I'll go with GL...
Ntemis has quit [Read error: Connection reset by peer]
<HdkR>
This is why it needs bisected :)
<warpme_>
m][sko7: the mesa devs decided to be more precise about caps. As bifrost is deep WIP - after https://gitlab.freedesktop.org/mesa/mesa/-/commit/96fa8d70bc13f8b21e4a8bfb91128bd85055990c some apps stopped working (they worked in the past only because the over-reported caps incidentally matched the real working ones; now they don't). In such a case you may need to hack the caps reporting. In my case (mythtv) this was necessary to get the app working again (and working really well on g31). Doing this is a pure HACK (so please don't bother the mesa devs about g31 not working, as the WIP nature of bifrost support is overstretched by such HACKs).
tomboy64 has quit [Remote host closed the connection]
tomboy64 has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
guillaume_g has quit [Quit: Konversation terminated!]
m][sko7 has quit [Quit: Connection closed]
buzzmarshall has joined #panfrost
tomboy64 has quit [Remote host closed the connection]