#panfrost on 2020-09-02 — irc logs at freenode.irclog.whitequark.org

2019-09-06 11:20 alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature

00:04 tgall_fo_ has joined #panfrost

00:05 tgall_foo has quit [Ping timeout: 265 seconds]

00:27 Ntemis has quit [Read error: Connection reset by peer]

01:03 enunes has quit [Quit: ZNC - https://znc.in]

01:03 vstehle has quit [Ping timeout: 258 seconds]

01:04 enunes has joined #panfrost

03:08 buzzmarshall has quit [Remote host closed the connection]

03:39 davidlt has joined #panfrost

05:00 vstehle has joined #panfrost

05:43 icecream95 has joined #panfrost

05:49 mupuf has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]

06:01 ezequielg has quit [Ping timeout: 260 seconds]

06:02 ezequielg has joined #panfrost

06:18 nlhowell has quit [Ping timeout: 240 seconds]

06:24 guillaume_g has joined #panfrost

06:41 <urjaman> there's some sort of a cell update glitch that happens with libreoffice calc (has for a long time), but i finally took video to show it: https://youtu.be/BPg9JKdbmmI

06:43 <urjaman> that video is ridiculously low quality but i guess you can make out the point ...

06:44 <urjaman> which is that some cells dont get their contents drawn initially, but as you cursor around the grid and approach them (redrawing the area where you're cursoring) their contents appear (sometimes only partially if you're to the side of them)

06:45 <urjaman> i just have no clue what would be causing this or how to debug lol (something something glamor? libreoffice calc is not a GL program (afaik :P) lol)

06:56 _whitelogger has joined #panfrost

07:20 raster has joined #panfrost

07:21 yann has joined #panfrost

07:27 kaspter has quit [Ping timeout: 240 seconds]

07:28 kaspter has joined #panfrost

07:40 cowsay has quit [Read error: Connection reset by peer]

07:41 cowsay has joined #panfrost

08:00 nlhowell has joined #panfrost

08:32 <icecream95> alyssa: The AFBC flush issue sounds very similar to the DarkPlaces performance regression I reported three weeks ago...

08:32 <icecream95> Testing all patches with Quake E1M1 in both Darkplaces and QuakeSpasm would probably reduce the number of regressions by >50%

08:40 <icecream95> alyssa: "reading from uninitialized AFBC is invalid". Filling with copy-blocks can be done by setting the second header word for each block to 1

08:46 stikonas has joined #panfrost

08:49 <icecream95> To store data (with no compression) instead of repeating a single 4x4 sub-block, use 010101010101, 02020202020 and 0404040404 octal in words 2-4
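
One way to read the three octal constants above, offered as an interpretation rather than something stated in the channel: if words 2-4 of the header are treated as sixteen 6-bit fields packed lowest-first into 96 bits, then setting every field to 1 yields exactly those three values. The field layout and all names in this sketch are assumptions made for illustration.

```python
# Assumption (not stated in the log): header words 2-4 form a 96-bit
# bitfield holding sixteen 6-bit entries, one per 4x4 sub-block,
# with the lowest-numbered entry in the least significant bits.
FIELD_BITS = 6
NUM_FIELDS = 16
WORD_BITS = 32


def pack_fields(values):
    """Pack sixteen 6-bit values into three 32-bit words, lowest field first."""
    if len(values) != NUM_FIELDS:
        raise ValueError("expected 16 field values")
    packed = 0
    for index, value in enumerate(values):
        if not 0 <= value < (1 << FIELD_BITS):
            raise ValueError("field value does not fit in 6 bits")
        packed |= value << (FIELD_BITS * index)
    mask = (1 << WORD_BITS) - 1
    return [(packed >> (WORD_BITS * i)) & mask for i in range(3)]


# Every field set to 1 gives the constants quoted in the message.
words = pack_fields([1] * NUM_FIELDS)
print([oct(w) for w in words])
```

Because 6 does not divide 32, some fields straddle word boundaries, which is why the three constants look different even though every field holds the same value.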

09:15 MastaG has quit [Quit: The Lounge - https://thelounge.chat]

09:22 karolherbst has quit [Quit: duh 🐧]

09:25 karolherbst has joined #panfrost

09:43 nlhowell has quit [Ping timeout: 246 seconds]

10:06 nlhowell has joined #panfrost

10:20 MastaG has joined #panfrost

10:29 icecream95 has quit [Ping timeout: 256 seconds]

10:44 nlhowell has quit [Ping timeout: 264 seconds]

10:46 nlhowell has joined #panfrost

11:04 <alyssa> icecream95: thanks :+1:

12:21 gcl has quit [Ping timeout: 240 seconds]

12:28 gcl has joined #panfrost

12:29 MastaG has quit [Quit: The Lounge - https://thelounge.chat]

12:33 kaspter has quit [Ping timeout: 246 seconds]

12:33 gcl has quit [Ping timeout: 240 seconds]

12:33 kaspter has joined #panfrost

12:33 gcl has joined #panfrost

12:35 MastaG has joined #panfrost

13:09 tgall_fo_ is now known as tgall_foo

13:42 MastaG has quit [Quit: The Lounge - https://thelounge.chat]

14:10 nlhowell has quit [Ping timeout: 260 seconds]

14:15 MastaG has joined #panfrost

14:34 nlhowell has joined #panfrost

14:51 yann has quit [Read error: No route to host]

15:04 <macc24> icecream95: when did performance hit happen with darkplaces?

15:15 <alyssa> macc24: from AFBC iirc

15:28 rcf has quit [Ping timeout: 258 seconds]

15:29 <macc24> alyssa: and when is AFBC a thing in panfrost?

15:30 <alyssa> already is :)

15:30 <macc24> from which mesa version?

15:31 <macc24> 20.1.6?

15:44 kinkinkijkin has quit [Ping timeout: 244 seconds]

15:48 Ntemis has joined #panfrost

15:48 yann has joined #panfrost

15:58 <alyssa> 20.2

15:59 <macc24> what speedup does afbc provide?

16:00 rak-zero has quit [Ping timeout: 240 seconds]

16:00 <alyssa> depends on workload, I guess

16:01 <warpme_> macc24: numbers I was was likely 40..50% less memory BW

16:01 <macc24> :o

16:01 <warpme_> was was->saw

16:01 <HdkR> Which only really matters if you're BW bounded

16:03 <warpme_> HdkR: on "small" SOCs it is easy to meet constraints for HD content. for 4k - almost any SoC will have issues (i think)

16:03 <macc24> in which scenario does memory bandwidth matter?

16:03 <HdkR> Almost all in ARM SoCs

16:03 <HdkR> :P

16:04 <Lyude> memory == power consumption

16:04 <macc24> vnc?

16:04 <Lyude> less memory bandwidth needed is usually better :)

16:06 <alyssa> power consumption always, and if BW bounded (which can often happen at high res)

16:06 <HdkR> Porting games to the SHIELD with 25.6GB/s memory bandwidth was a nightmare because of memory BW limitations D:

16:06 <robmur01> display scanout on its own can consume major amounts of bandwidth with modern-day resolutions

16:06 <macc24> 1280x800@60hz is not that bad
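
As a rough back-of-envelope check on the scanout figures being discussed, raw scanout bandwidth is approximately width × height × bytes per pixel × refresh rate. The sketch below assumes a single uncompressed plane at 4 bytes per pixel and ignores blanking, overlays, and any GPU or CPU traffic; the function name is made up for illustration.

```python
def scanout_bytes_per_second(width, height, refresh_hz, bytes_per_pixel=4):
    """Raw bandwidth needed to scan out one uncompressed plane."""
    return width * height * bytes_per_pixel * refresh_hz


# Display modes mentioned or implied in the discussion, all at 60 Hz.
modes = {
    "1280x800@60": (1280, 800, 60),
    "1920x1080@60": (1920, 1080, 60),
    "3840x2160@60": (3840, 2160, 60),
}

for name, (w, h, hz) in modes.items():
    rate = scanout_bytes_per_second(w, h, hz)
    print(f"{name}: {rate / 1e6:.1f} MB/s")
```

Under those assumptions 1280x800 at 60 Hz needs roughly 246 MB/s, 1080p roughly 498 MB/s, and 4K roughly 2 GB/s for scanout alone, before any rendering traffic is counted.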

16:07 <robmur01> on my RK3328, running a memory-heavy workload will glitch out a 1080p display no problem ;)

16:07 <macc24> oh gof

16:08 <robmur01> (since by default the interconnect QoS doesn't prioritize the VOP)

16:08 <urjaman> yeah the C201 feels significantly slushier with a second 1080p display attached

16:09 <urjaman> i'm guessing some of this could be optimized/prioritized (if you have a video playing, sometimes terminal response to a keypress feels like it happens in the next second...), but still...

16:10 <warpme_> macc24: re: "in which scenario does memory bandwidth matter?" All content which is displayed needs to be provided to display subsystem (DRM plane). Of course zero-copy on UMA or DMA assisted on NUMA systems can offload CPU - but in any case data needs to be provided to DRM plane. Having less data to provide means less power to deal with it. Of course compression of data also requires power - but it looks like

16:10 <warpme_> require less than moving bigger data....

16:10 <urjaman> i mean ... a second display, that happens to be 1080p (the built in is as we know a 1366x768 one)

16:12 <daniels> robmur01: really 3328, not 3326/3288? I thought 3288 was a relatively high-end media SoC

16:13 <urjaman> yeah i think he meant 3328 (since i remember talk of this previously)

16:13 <daniels> wow

16:13 <robmur01> yup, 3328 is basically meant to be a video decoder and not much else (hence lame-o Mali 450)

16:14 <daniels> (fun galore on the OMAP1710/2420 where we could barely barely do 800x480 due to memory bandwidth constraints)

16:14 <alyssa> urjaman: I was always a little miffed the c201 subjectively outperformed kevin, since the screen resolution was increased more than the GPU horse power :p

16:14 <daniels> (that was also with an external display controller so scanout didn't smash L3 to bits)

16:14 <robmur01> down there in the $25-$50 TV box market with S905 and H6

16:15 <daniels> well, at least they didn't butcher the cache on the Mali :P

16:15 <alyssa> ;P

16:15 <HdkR> We just need a Mali bearing SoC with 138GB/s memory bandwidth to take on Tegra

16:16 <HdkR> :>

16:20 <daniels> presumably Neoverse isn't bandwidth-shy

16:21 <warpme_> i have q regarding current implementation AFBC, mesa and video decoders and kernel DRM: may i assume that: if video decoder will be setup to output AFBC format, compressed frame can be provided to mesa by EGL_LINUX_DMA_BUF_EXT (so zero copy or dma_buf). Mesa can import this and pass to DRM fb plane (still compressed) and DRM CRTC will decompress AFBC and put decompressed frame to DRM encoder to finally display content

16:21 <warpme_> to user?

16:21 <robmur01> HdkR: maybe if an Arm-based laptop/desktop market emerges and someone like Samsung/MTK decides they want a piece of the pie... I can but dream :)

16:21 * HdkR dreams of a better world

16:22 <robmur01> daniels: TBH it's not the CPUs so much as the interconnect/memory controller setup that's constraining mobile SoCs

16:22 <robmur01> consider Graviton1 with "just" Cortex-A72s

16:23 <daniels> robmur01: right, I didn't mean the core, I meant whichever complete solution they were selling

16:23 <daniels> which iirc is configured with some ridiculous cloud-friendly scale ootb

16:23 <alyssa> warpme_: currently none of that is tested, but in theory yes, that's supposed to work, and PAN_MESA_DEBUG=afbc will flip on the mesa bits for AFBC buffer sharing

16:24 <robmur01> Ah CMN-600, the millstone around my neck... :)

16:25 <daniels> warpme_: in the case you're describing, as alyssa says it should work and will once it's been proven enough to remove the need for debug bits, however you don't need Mesa in that picture unless you're actually operating on frames with the GPU?

16:25 <daniels> warpme_: if you want to display from V4L2, you can just import those dmabufs directly into KMS and it'll work

16:26 <daniels> warpme_: if you want to process the V4L2 content with the GPU, indeed you do need to import it as an EGLImage, then after that what's rendered by the GPU will be in a different buffer, with whichever allocation you separately made for it (e.g. gbm_surface)

16:33 kinkinkijkin has joined #panfrost

16:38 <warpme_> daniels: issue with going exclusively with KMS model of DRM_PRIME for video rendering is about post-processing of video (mainly DI). But I agree - having alternative mode where v4l2 draws directly to DRM plane is good option (with losing post-processing capability. or more precisely - narrowing post-processing capabilities to only HW provided). This is what I'm postulating as target for mythtv.

16:40 <warpme_> daniels: with AFBC however - KMS model is only option as post-processing on AFBC is not possible?

16:43 <daniels> you mean doing in-place post-processing ... ?

16:43 <daniels> else I don't really understand what you mean

16:44 <daniels> traditionally, you would decode the video via V4L2, get one buffer for one frame, import that buffer in as an EGLImage, then use that as part of a GPU job which renders to another buffer as an output (one allocated by the GPU)

16:44 <daniels> this can all be AFBC: AFBC out of V4L2, AFBC into EGL, AFBC out of EGL, AFBC out of KMS

16:44 <daniels> I just wanted to draw the distinction that usually it's not the same actual buffer in memory, because the V4L2 output buffer != the GL output buffer

17:00 <warpme_> oh of course i had in mind in-place post-processing. But... if we want to realise post-processing on AFBC frame (i.e. DI) - then content must be first decompressed. And some time ago I was told AFBC compression algo. isn't well known. So my statement about issue with post-processing on AFBC frames was with this assumption. Should I conclude mesa is capable of decompressing AFBC in-place and doing post-processing i.e. GLSL

17:00 <warpme_> shaders?

17:01 <urjaman> the GPU will do it both ways for you

17:01 <urjaman> read AFBC and write AFBC

17:02 <warpme_> and i'm asking for compression/decompression....

17:02 <urjaman> in hardware ... that's kinda the point

17:03 <warpme_> urjaman: when you write "read" - do you mean read+decompress?

17:04 <urjaman> yes (and same for write, as in write a compressed form)

17:05 <urjaman> that's why it's called framebuffer compression...

17:06 raster has quit [Quit: Gettin' stinky!]

17:07 raster has joined #panfrost

17:07 <warpme_> urjaman: ok. this then sounds very interesting. So interesting will be comparison of models with non-compressed in/out + GLSL based in-place post-processing vs. AFBC compressed in/out + GLSL based in-place post-processing... any thoughts?

17:09 <warpme_> comparison in context of GPU processing power to deal with HD content GLSL simple operations (like DI) + AFBC compression/decompression....

17:13 <warpme_> btw: i must say i'm really impressed by mesa 20.2.0-rc3. It is FULLY working on aw/aml/rk/brcm on mali450/t720/t820/t860/g31. No any issues noted so far... Impressive especially as we are using DRM_PRIME via EGL_LINUX_DMA_BUF_EXT - not DRM planes (and i'm considering EGL mode more demanding from GL drivers than DRM planes). FANTASTIC work!

17:47 <anarsoul|2> why EGL is more demanding than DRM planes?

18:27 guillaume_g has quit [Quit: Konversation terminated!]

18:29 <warpme_> anarsoul|2: well: here is how I understand things: let assume we have video player. It plays video + some OSD stuff (i.e. subtitles). With KMS mode what we need from GL stack is: preparing surface with subtitles and providing it to DRM plane frame buffer. And do this in non-real time regime. Now compare this with EGL mode: GL needs (in real time) import frames, mix video surface with OSD surface, export frame to DRM

18:29 <warpme_> plane. So comparing both i would say: mixing surfaces + keeping real-time regime in KMS mode is by DRM subsystem while in EGL mode it is in GL subsystem. So from requirements point of view: for EGL model - GL stack is required to: operate at real-time at frame rate; do mixing surfaces with real-time at frame rate. This is not a case with KMS as this stuff is offloaded from GL to DRM subsystem (done by CRTC

18:29 <warpme_> component).

18:30 <anarsoul|2> importing a frame is essentially free

18:30 <anarsoul|2> on UMA architectures

18:31 guillaume_g has joined #panfrost

18:32 camus1 has joined #panfrost

18:32 <warpme_> indeed. More costly is scaling planes to target res. + mixing them to target frame.

18:33 <anarsoul|2> so the difference is essentially one more sampler?

18:33 kaspter has quit [Ping timeout: 260 seconds]

18:33 camus1 is now known as kaspter

18:41 m][sko7 has joined #panfrost

18:46 <warpme_> video engines in today's mid/high end SoC can do all this nicely (+ DI). IMHO issue is uniformization + common API I think. I'm not aware any abstractions in DRM subsystem for DI i.e. Comparing this with GL i see difference: GL and GLSL are fully uniform, standardised, portable, etc. Ehh - for me all this depends on angle: if i'll be embedded developer - then KMS looks really sexy. but different per almost each SoC

18:46 <warpme_> family :-) . Showing this model for app (player) developer: he will say: omg. i'm expecting this should be uniformed by operating system (or runtime libs). And I agree. Now lets compare this with GL mode: player developer will say. nice. we have here well known GL + some extensions for performance (dmabuf EGL exports). For me - if asked what I'll choose - as default I'll go with GL....

18:50 <warpme_> video engines -> i was mean display engines....

19:04 davidlt has quit [Ping timeout: 240 seconds]

19:05 <m][sko7> I am testing panfrost(mesa master) on Odroid C4 armbian with 5.8 kernel with with patchset https://github.com/bbrezillon/linux/tree/panfrost/vim3 and gnome3(ubuntu 20.04) and I have this problem https://www.youtube.com/watch?v=TovfrGw4KnA

19:18 <anarsoul|2> m][sko7: looks like old mesa to me

19:19 <anarsoul|2> are you sure it's mesa from git master and not something that ubuntu ships by default?

19:21 <m][sko7> I use this https://launchpad.net/~oibaf/+archive/ubuntu/graphics-drivers and

19:21 <m][sko7> GL_VERSION: OpenGL ES 2.0 Mesa 20.3.0-devel (git-4500e6e 2020-09-02 focal-oibaf-ppa)

19:21 <m][sko7> so it is today build

19:22 <m][sko7> https://pastebin.com/X5n8CFyR

19:52 <Ntemis> lol

19:53 <anarsoul|2> oh, it's bifrost

19:53 <Ntemis> @anarsoul|2 was right

19:54 <Ntemis> yeah G series = bifrost

19:54 <Ntemis> even with master support is not ready

19:55 <m][sko7> anarsoul|2 I remember when everything works just fine on odroid C4 (g31)

19:55 <anarsoul|2> bisect?

19:55 <HdkR> Bifrost isn't actively testing, so breakages occurring is natural in early driver work :)

19:55 <Ntemis> ^

19:56 <m][sko7> I hope that you are testing it from time to time :)

19:56 <Ntemis> not this time :D

19:56 <m][sko7> glmark-es2 is fine for example.

19:57 <m][sko7> Ok so I will wait :)

19:57 <HdkR> Needs some CI love, otherwise no one can know when things break aside from user testing

19:57 <anarsoul|2> it looks like reload (aka wallpapering in panfrost terms) is broken

19:57 <anarsoul|2> glmark-es2 doesn't need reload, it clears buffer on each frame

19:59 <m][sko7> this is my screenshot from time when driver "works just fine" https://gitlab.freedesktop.org/mesa/mesa/-/issues/3308 :)

20:01 Ntemis has quit [Read error: Connection reset by peer]

20:01 <HdkR> This is why it needs bisected :)

20:20 <warpme_> m][sko7: mesa devs decided to be more precise about caps. As bifrost is deep wip - after https://gitlab.freedesktop.org/mesa/mesa/-/commit/96fa8d70bc13f8b21e4a8bfb91128bd85055990c some apps stopped working (they were working in the past as overreported caps incidentally met real working ones; now they do not work). In such a case you may need to hack caps reporting. In my case (mythtv) this was necessary to get back app working

20:20 <warpme_> (and working again really well on g31). Doing this is pure HACK (so pls not bother mesa devs about g31 not-working as wip nature of bifrost support is overstretched but such HACKs)

20:21 <warpme_> but such HACKs - by such HACKs

20:38 tomboy64 has quit [Remote host closed the connection]

20:39 tomboy64 has joined #panfrost

20:42 raster has quit [Quit: Gettin' stinky!]

20:58 guillaume_g has quit [Quit: Konversation terminated!]

20:59 m][sko7 has quit [Quit: Connection closed]

21:00 buzzmarshall has joined #panfrost

21:16 tomboy64 has quit [Remote host closed the connection]

21:17 tomboy64 has joined #panfrost

21:59 klaxa has quit [*.net *.split]

21:59 doublej472 has quit [*.net *.split]

21:59 hl has quit [*.net *.split]

22:00 klaxa has joined #panfrost

22:00 doublej472 has joined #panfrost

22:01 hl has joined #panfrost

22:22 yann has quit [Remote host closed the connection]

22:41 Depau has quit [Quit: ZNC 1.8.1 - https://znc.in]

22:44 Depau has joined #panfrost

23:16 rcf has joined #panfrost

23:16 <HdkR> let's see if I can get this silly Valhall device capturing

23:25 <HdkR> ...and host USB controller died

23:26 * HdkR restarts

23:35 <HdkR> https://pastebin.com/MZ9fvgsc Give me your secrets valhal!

23:39 * HdkR keeps missing that second l