#panfrost on 2021-01-20 — irc logs at freenode.irclog.whitequark.org

2019-09-06 11:20 alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature

00:15 cphealy has joined #panfrost

00:39 enunes has joined #panfrost

00:43 yann has quit [Ping timeout: 260 seconds]

00:44 yann has joined #panfrost

00:58 raster has quit [Quit: Gettin' stinky!]

01:02 enunes has quit [Ping timeout: 256 seconds]

01:02 enunes has joined #panfrost

01:43 stikonas has quit [Remote host closed the connection]

02:00 jgmdev has quit [Ping timeout: 272 seconds]

02:01 atler has quit [Killed (beckett.freenode.net (Nickname regained by services))]

02:01 atler has joined #panfrost

02:03 vstehle has quit [Ping timeout: 246 seconds]

02:08 kaspter has joined #panfrost

02:08 kaspter has quit [Excess Flood]

02:08 kaspter has joined #panfrost

02:20 archetech has joined #panfrost

03:29 camus1 has joined #panfrost

03:29 kaspter has quit [Remote host closed the connection]

03:29 camus1 is now known as kaspter

04:09 davidlt has joined #panfrost

04:27 ente has quit [Ping timeout: 272 seconds]

04:53 chewitt has quit [Quit: Zzz..]

05:07 ente has joined #panfrost

05:41 warpme_ has quit [Quit: Connection closed for inactivity]

06:02 _whitelogger has joined #panfrost

06:32 camus1 has joined #panfrost

06:32 kaspter has quit [Ping timeout: 264 seconds]

06:32 camus1 is now known as kaspter

07:25 guillaume_g has joined #panfrost

07:33 mixfix41 is now known as h0tp0ck3t

07:35 h0tp0ck3t has left #panfrost [#panfrost]

07:41 xdarklight has quit [Ping timeout: 272 seconds]

07:44 xdarklight has joined #panfrost

08:10 camus1 has joined #panfrost

08:12 kaspter has quit [Ping timeout: 256 seconds]

08:12 camus1 is now known as kaspter

08:20 orkid has quit [Quit: leaving]

08:22 orkid has joined #panfrost

08:27 tgall_foo has quit [Read error: Connection reset by peer]

08:30 tgall_foo has joined #panfrost

08:31 archetech has quit [Quit: Konversation terminated!]

08:37 m][sko has joined #panfrost

08:47 ente has quit [Ping timeout: 246 seconds]

08:59 m][sko has quit [Quit: Connection closed]

09:04 ente has joined #panfrost

09:18 pmjdebruijn has quit [Ping timeout: 240 seconds]

09:56 stikonas has joined #panfrost

10:14 raster has joined #panfrost

11:11 kaspter has quit [Ping timeout: 264 seconds]

11:21 kaspter has joined #panfrost

11:23 leah is now known as _4of7

11:57 warpme_ has joined #panfrost

12:02 kaspter has quit [Ping timeout: 260 seconds]

12:04 kaspter has joined #panfrost

12:07 kaspter has quit [Remote host closed the connection]

12:07 kaspter has joined #panfrost

12:56 alpernebbi has joined #panfrost

13:15 alyssa has joined #panfrost

13:16 <alyssa> icecream95: crc-hud is super neat!

13:16 * alyssa had to go 'test' it with glmark, and uh supertuxkart

13:17 <alyssa> Unsurprisingly, CRC does very well on 2D UI, and very poorly on 3D content.

13:19 <alyssa> I don't know if drirc would make sense here.

13:23 <alyssa> (Best would be calibrating based on perf counters at runtime but unfortunately mali doesn't expose them at enough granularity for that =p)

13:31 <alyssa> --Although actually supertuxkart fps seems to be helped?

13:32 <alyssa> Yeah, it looks like CRC either helps or makes no difference

13:32 <alyssa> So keeping it on unconditionally is probably fine

13:39 <macc24> alyssa: bifrost optimizations?

13:39 <macc24> :o

13:40 <alyssa> macc24: and midgard

13:40 <alyssa> Courtesy of icecream95 , I'm just 'testing'

13:41 <alyssa> For anyone too lazy to do the algebra themselves:

13:42 <alyssa> The expected cost of rendering with CRC is [m T_miss + (1 - m) T_hit] for the miss rate m

13:42 <alyssa> T_miss = T_r + T_w + F_w

13:42 <alyssa> For CRC read cost T_r, CRC write cost T_w, and framebuffer write cost F_w

13:42 <alyssa> T_hit = T_r

13:43 <alyssa> On the other hand, the expected cost of rendering without CRC is simply F_w

13:43 <alyssa> So CRC is a win iff [m T_miss + (1 - m) T_hit] < F_w

13:44 <alyssa> Noting that T_r = T_w = 8 bytes/tile since CRCs are 64-bit, and that F_w = (width)(height)(bpp) for the tile width/height and bpp bytes per pixel, CRC is a win iff:

13:44 <macc24> alyssa: finally gpu in my laptop will be faster than 2010 integrated graphics

13:44 <alyssa> m < ((width)(height)(bpp) - 8)/((width)(height)(bpp) + 8)

13:45 <alyssa> Said differently, letting h=1-m be the hit rate, CRC is a win iff:

13:46 <alyssa> h > 16 / ((width)(height)(bpp) + 8)

13:46 <alyssa> This is instructive: it says CRC is more effective as the tile size increases, and as the bytes per pixel increases.
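
For readers who want the intermediate algebra, the steps between the cost model and the two inequalities above can be written out as follows. This is only a restatement of the chat's own definitions (a hit costs just the CRC read, T_r = T_w = 8 bytes per tile, and F_w = width x height x bpp bytes per tile); nothing here comes from Arm documentation.

```latex
\begin{align*}
m\,T_{\text{miss}} + (1-m)\,T_{\text{hit}} &< F_w \\
m\,(T_r + T_w + F_w) + (1-m)\,T_r &< F_w \\
T_r + m\,(T_w + F_w) &< F_w \\
m &< \frac{F_w - T_r}{F_w + T_w} = \frac{F_w - 8}{F_w + 8} \\
h = 1 - m &> 1 - \frac{F_w - 8}{F_w + 8} = \frac{16}{F_w + 8}
\end{align*}
```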

13:47 alyssa has quit [Remote host closed the connection]

13:48 alyssa has joined #panfrost

13:48 <alyssa> For the average case, where tile size is 16x16 and there are 4 bytes per pixel, the hit rate needs to be 1.6%

13:49 <alyssa> This agrees with Arm's marketing materials which mention ~2% if I recall, I just wanted to do the calculation myself :p

13:51 <alyssa> For the two limiting cases:

13:52 <alyssa> * Tile size 16x16, with 256-bits per pixel -- hit rate needs to be only 0.2% for it to be a win

13:53 <alyssa> * Tile size 4x4, with 8-bits per pixel -- hit rate needs to be 66%!

13:53 <alyssa> Actually, the latter case is totally fantasy -- small tile sizes should only be used at high bpps

13:53 <alyssa> So for Midgard, where 128-bits is the cap, we have...

13:54 <alyssa> 0.4% for any power-of-two bpp greater than 128-bits

13:55 <alyssa> at worst 0.8% for npot bpp

13:56 <alyssa> So really, the limiting case is 6% for 8bpp, but already down to less than 2% for the expected bpp4 case.

13:56 <alyssa> So this justifies turning it on even for 3D: you can fill that 2% just with the onscreen HUD, or the sky.
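
The break-even figures quoted above can be reproduced with a few lines of Python. This is a minimal sketch of the inequality h > 16 / (F_w + 8) from the chat; the function name `break_even_hit_rate` and the list of cases are illustrative assumptions, not anything taken from Mesa.

```python
CRC_BYTES = 8  # one 64-bit CRC per tile, as stated in the chat (T_r = T_w = 8)


def break_even_hit_rate(width, height, bytes_per_pixel):
    """Minimum CRC hit rate at which transaction elimination starts to win.

    Implements h > (T_r + T_w) / (F_w + T_w) with T_r = T_w = CRC_BYTES,
    where F_w is the framebuffer write cost of one tile in bytes.
    """
    framebuffer_write = width * height * bytes_per_pixel  # F_w
    return 2 * CRC_BYTES / (framebuffer_write + CRC_BYTES)


# (label, tile width, tile height, bytes per pixel) for the cases in the chat
cases = [
    ("16x16, 32 bits per pixel", 16, 16, 4),
    ("16x16, 256 bits per pixel", 16, 16, 32),
    ("4x4, 8 bits per pixel", 4, 4, 1),
    ("16x16, 128 bits per pixel", 16, 16, 16),
    ("16x16, 8 bits per pixel", 16, 16, 1),
]

for label, w, h, bpp in cases:
    print(f"{label}: hit rate must exceed {100 * break_even_hit_rate(w, h, bpp):.2f}%")
```

The model only counts bytes moved per tile, so it says nothing about latency or about how the hit rate itself varies between workloads.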

13:57 <alyssa> TL;DR transaction elimination is totally OP

13:58 <macc24> alyssa: will it help with performance on kevin

13:58 <macc24> ?

13:58 <alyssa> Hope so

13:58 <alyssa> Synthetic benchmarks are seeing slight win.

13:59 <alyssa> Most significant being glmark -bdesktop, which IIRC is bandwidth limited, which I see up by 4%

14:00 <macc24> does it help when some parts of window stay the same?

14:01 alyssa has quit [Remote host closed the connection]

14:20 kaspter has quit [Quit: kaspter]

14:24 _4of7 is now known as leah

14:49 nlhowell has joined #panfrost

15:16 amonakov has joined #panfrost

15:23 raster has quit [Quit: Gettin' stinky!]

15:30 alpernebbi has quit [Remote host closed the connection]

15:39 raster has joined #panfrost

15:42 <amonakov> Hi folks. Today for work I was looking into how f16vec2 fma is compiled for Bifrost, and using malisc with panfrost disassembler I discovered Mali compiler has a pretty bad performance bug:

15:42 <amonakov> despite presence of fma.v2f16, they unpack f16vec2 to f32 registers, perform fma.f32 twice, and repack back

15:42 <amonakov> This would have been impossible to notice and track down without panfrost disassembler, kudos to you!

15:46 <daniels> amonakov: glad to hear it! :) what's stopping you from using Panfrost directly, ooi?

15:48 <amonakov> well, it's for work, targeting Android with Mali drivers?..

15:49 <daniels> there are a couple of people looking into AOSP enablement, but it should largely work with gbm_gralloc as I understand it

15:50 <amonakov> non-rooted Android, and I believe Mesa does not speak the /dev/mali0 language :)

16:00 alyssa has joined #panfrost

16:04 <alyssa> amonakov: FMA.v2f16 should be used by the DDK under 'good' circumstances..

16:05 <amonakov> what is the DDK in this context?

16:06 <HdkR> Driver blob

16:08 <amonakov> yeah, they can use fma.v2f16 for 'x*y+z', but not when passed spir-v has the explicit fma insn

16:26 <HdkR> https://www.anandtech.com/show/16436/mediatek-announces-dimensity-1100-1200-socs-a78-on-6nm Huh, Valhall 1st gen rather than 2nd gen. Weird

16:27 <HdkR> Only minor changes to the SoC. Guess they didn't want to disturb the design too much :D

16:36 <robmur01> those specs certainly sound like a "tock" to me

16:36 <macc24> HdkR: i hope chips like these find their way to chromebooks

16:38 <HdkR> Me too

16:39 * robmur01 hopes they find their way into useful computers :P

16:40 <HdkR> Chromebook is completely useful for RE and then copying a Linux install on to it instead ;)

16:40 <macc24> s/Linux/Cadmium/

16:45 <robmur01> meh, still not the same as a machine that's actually designed to run your own OS of choice, or at least can offer a normal EFI boot menu like this thing

16:45 <macc24> robmur01: still, depthcharge is better than traditional bios

16:51 <macc24> and developing linux distro for chromebooks is easier than developing linux distro for regular x86 hardware

16:57 <robmur01> equally, developing a towbar for blue Honda CB-F motorcycles is easier than developing a towbar for all cars

16:58 <robmur01> market size and diversity still doesn't make a blue motorcycle a *good* choice of towing machine

16:58 <macc24> robmur01: chromebooks have no legacy baggage and no need to put boot code in specific sector of a random drive

16:58 <HdkR> :D

16:58 <macc24> less choice = easier to develop for

16:59 <macc24> less choice for user, that is

17:00 <robmur01> macc24: what part of the standard EFI boot protocol on this SDM835 machine requires that, exactly?

17:00 <macc24> robmur01: i was talking about bios

17:01 <robmur01> pretty sure last time I tried I just booted a kernel image off a FAT32 partition on a USB stick

17:02 <HdkR> I boot my ProX off an EFI partition pointing to a Linux image on USB :P

17:02 * robmur01 doesn't understand what legacy PC BIOS has to do with anything Arm-related :/

17:02 <macc24> robmur01: my point was that making linux on chromebooks is easier than making linux on "regular" x86 hardware

17:03 <macc24> all chromebooks just load kernel from first partition without any additional config

17:03 <robmur01> and when exactly do you expect MTK SoCs to start turning up in regular x86 hardware?

17:03 * robmur01 is massively confused and going off to do something else

17:04 <macc24> TIL that i'm good at confusing people accidentally

17:05 <macc24> and i wish arm socs replaced intel/amd cpus

17:05 guillaume_g has quit [Quit: Konversation terminated!]

17:06 <HdkR> Gimme a 64 core post-hercules SoC and I'll replace a computer in my house :P

17:06 <macc24> HdkR: if mt8183 laptops had as many ports as my thinkpad x201 i would have typed this message from mt8183

17:07 <HdkR> :P

17:08 <macc24> the only reasons that i keep my xeon desktop are its gpu and x86 software

17:10 <HdkR> Luckily GPU is easy to fix in ATX form factor even with ARM

17:11 <HdkR> x86 software is a bit more rough without ARMv8.4

17:11 <macc24> well, it's more about fact that it can drive more than 2 displays at the same time

17:12 <amonakov> how do I build Mesa's in-tree panfrost disassembler? was using the one from ShaderProgramDisassembler repo, but it's probably very outdated by now

17:17 <robmur01> HdkR: y'know I've been here via x86 software on Armv8.0 for pushing a year now, right? :D

17:17 <HdkR> robmur01: FEX also supports that config. It's just a nightmare if you need to deal with say...unaligned atomics

17:19 <alyssa> amonakov: build mesa with -Dtools=panfrost

17:19 <alyssa> and it'll show up as `bifrost_compiler` in the build dir nested deeply

17:19 <alyssa> `bifrost_compiler disasm foo.bin` should work if mesa is built from git master

17:36 alyssa has quit [Remote host closed the connection]

17:38 alyssa has joined #panfrost

17:50 <macc24> HdkR: armv8.4?

17:51 <HdkR> macc24: ARMv8.4 mandates support for unaligned atomics

17:51 <HdkR> Where with ARMv8.1 it is optional to support and nobody supports it

17:52 <amonakov> alyssa: thanks (also disabled a bunch of stuff to cut down dependencies)

17:53 <amonakov> alyssa: do I understand correctly that I need to manually remove the mali blob shader header, unlike for ShaderProgramDisassembler?

17:53 <alyssa> if you built latest, not needed

17:57 <amonakov> ah, nice, thanks

18:12 <macc24> alyssa: https://twitter.com/cmwdotme/status/1351838924621099008?s=21

18:16 <macc24> it would be embarrassing if linux on m1 had more features than cadmium in shorter time

18:16 <alyssa> this is offtopic for #panfrost

18:16 <macc24> i see no alyssa on ##panfrost-offtopic

18:16 <daniels> there's also the #asahi family of channels for M1 things :)

18:17 <daniels> I won't go into it here because way offtopic, but there are legal concerns raised around the Corellium port, which you can find out more about by looking into marcan's Twitter posts

18:20 <anarsoul> yay!

18:29 <alyssa> macc24: also, I would add "completely usable" is ... a stretch

18:30 <alyssa> by that metric panfrost has been "completely usable" since Jan 2019

18:31 <macc24> alyssa: completely usable as in all essential stuff like gpu, usb, suspending, display, whatever else working fine

18:31 <macc24> sound too

18:33 <macc24> anyway i'm gonna go run away since my desktop froze

18:33 <alyssa> macc24: Definitely no GPU support on that image.

18:34 <alyssa> Display is whatever was provided at boot only.

18:34 <alyssa> I don't believe suspend works.

18:34 <alyssa> I don't believe sound works.

18:34 <alyssa> I don't mean to knock the work -- the speed of the port is incredible -- but "completely usable" is misleading.

18:48 <robmur01> also good luck hotplugging monitors or changing resolution with simple-framebuffer

18:54 davidlt has quit [Ping timeout: 272 seconds]

18:55 <anarsoul> who cares about suspend on mac mini?

19:03 <robmur01> People who suspend their machine when they're not using it? I almost never cold-boot my desktop.

19:04 <robmur01> (note that hibernate needs baseline suspend support too)

19:18 nlhowell has quit [Ping timeout: 265 seconds]

19:35 <macc24> alyssa: i mean, sound and suspend doesn't work in cadmium too

19:36 <macc24> and i have seen people use llvmpipe on c201pa

19:39 <anarsoul> robmur01: I keep my laptop always on if it's on charger

19:39 <anarsoul> anyway, they'll likely get to working suspend some day

19:39 <anarsoul> one step at a time

19:39 <macc24> shit i gotta speed up with getting suspend to work on duet

20:10 raster has quit [Quit: Gettin' stinky!]

20:39 raster has joined #panfrost

20:53 * alyssa is embarrassed by the # of open panfrost MRs

20:57 <HdkR> That just means more people need poked for review. Which is a better problem than a large downstream fork without any MRs :)

20:59 Net147 has quit [Read error: Connection reset by peer]

20:59 Net147 has joined #panfrost

21:01 <warpme_> Guys: i'm scratching my head why amlogic is only platform giving me app. segfault on Qt EGLFS (EGLFS: Qt draws to fullscreen EGL surface) while exactly the same sw. stack works well on rk/aw/rpi. Some datapoints: 1\GL provider (mesa) is the same on all HW; 2\ Qt X11/GL(glamour) works OK on AML. 3\AML issue is on all AML HW i have (lima on mali450, panfrost on t760 and bifrost on g31). My hypothesis is: EGLFS uses mesa

21:01 <warpme_> GLES call(s) which are exposing issue at mesa-drm in aml drm. What will be your opinion here?

21:02 <alyssa> backtrace? but yeah, sounds like amlogic display stuff is to blame

21:02 <alyssa> if both lima and panfrost are affected, but rk is not, it isn't a GL issue

21:03 <alyssa> scheduler constants are breaking dEQP-GLES3.functional.texture.format.sized.3d.rgba8_pot why...

21:07 <warpme_> alyssa: segfault is deep in Qt EGLplatformintegration driver. To get meaningful data from Qt internals - i need a debug build of Qt. Even cross-compiled on i7 it takes hours. I think it is not worth it - as at end we will end with the same conclusion: issue is in aml drm driver??

21:08 <warpme_> asking here as suspect aml guys will point finger to mesa :-p

21:12 <anarsoul> if the crash happens in the same place on lima and panfrost it's very unlikely to be mesa

21:13 <anarsoul> anyway, try building mesa with debug info to see if it even appears in backtrace

21:14 <warpme_> anarsoul: can't compare exact call/stack regs - but Qt call seen in gdb seems to be the same...

21:14 <anarsoul> could also be some Qt bug

21:15 <anarsoul> aml folks will ask you for a backtrace anyway

21:15 <anarsoul> :)

21:15 <warpme_> hmm - might be but why then all is ok on: aw/rk/rpi/intel/amd/nvidia?

21:16 <macc24> i bet it will be ok with LIBGL_ALWAYS_SOFTWARE=1

21:19 <alyssa> macc24: doesn't mean much, s/w drivers are special-cased for the winsys

21:20 <macc24> 'ok' as in 'the same result'

21:21 <warpme_> macc24: nope. with LIBGL_ALWAYS_SOFTWARE=1 segfaults the same....

21:23 <alyssa> bbrezillon: why the heck do *{S,U}{8,16}_TO_{S,U}32 exist

21:23 <alyssa> that is strictly equivalent to *MKVEC.v2i16 [whatever], #0

21:24 <alyssa> The + versions make sense though

21:24 <amonakov> hm? the signed variants shouldn't be equivalent?

21:25 <HdkR> Saves a constant needing to be encoded?

21:26 <alyssa> amonakov: --right, they're not, my bad.

21:26 <alyssa> My point still stands for *U16_TO_U32 at least

21:26 <alyssa> HdkR: #0 is free in the FMA pipe

21:27 <HdkR> So just the signed bit that matters :D
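
The point about zero- versus sign-extension can be checked with a small model. This is only a sketch: `mkvec_v2i16` is a hypothetical stand-in for the instruction, and it assumes the first operand lands in the low 16-bit lane, which the chat does not state.

```python
MASK16 = 0xFFFF


def mkvec_v2i16(lo, hi):
    """Hypothetical model: pack two 16-bit lanes into one 32-bit word."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def u16_to_u32(x):
    """Zero-extend a 16-bit value to 32 bits."""
    return x & MASK16


def s16_to_s32(x):
    """Sign-extend a 16-bit value to 32 bits (result shown as unsigned bits)."""
    x &= MASK16
    return (x | 0xFFFF0000) if (x & 0x8000) else x


for value in (0x0001, 0x7FFF, 0x8001, 0xFFFF):
    packed = mkvec_v2i16(value, 0)
    print(
        f"{value:#06x}: mkvec={packed:#010x} "
        f"zext={u16_to_u32(value):#010x} sext={s16_to_s32(value):#010x}"
    )
```

Under that lane-order assumption, packing with a zero upper lane matches zero-extension for every input, while sign-extension diverges as soon as bit 15 is set.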

21:33 <alyssa> rrright

21:34 <alyssa> amonakov: oh hey, a Pidgin user! =)

21:34 <amonakov> yep, that I am :)

21:41 <alyssa> raster: in case you were wondering about yesterday's bug -- data race reading consecutive staging registers

21:41 <alyssa> my fault, but also so, so, bifrost

21:44 <raster> alyssa: ugh... race conditions in hw... :|

21:44 <alyssa> raster: Reading from staging registers is architecturally defined to be racy.

21:44 <alyssa> (I don't remember if it's undefined behaviour or flat out will never work.)

21:45 <raster> thats what i mean...

21:45 <raster> :)

21:45 <raster> ugh. :|

21:45 <alyssa> it's not a bug if it's documented right???

21:45 <raster> :P

21:45 <raster> x86 vs any other arch...

21:45 <raster> x86 is documented

21:45 <alyssa> [Anyway, I had accounted for this when a single reg is read, but had a subtle issue with vector regs]

21:45 <raster> but its ugly

21:45 <raster> well ok 6502 was not pretty either

21:45 <HdkR> Can confirm ugly x86

21:46 <alyssa> I like 6502.

21:46 <raster> but 69k, ppc, arm ... all much cleaner and nicer

21:46 <urjaman> lmao 69k

21:46 <alyssa> 68k?

21:46 <urjaman> Nice

21:46 <alyssa> but 1k more?

21:46 <raster> haha ok 68k :)

21:46 <alyssa> 1k more... another k, another destiny...

21:47 <amonakov> alyssa: hold up, there's architecture guide with register definitions for Bifrost? where?

21:47 <alyssa> amonakov: effectively, mesa's comments ;)

21:47 <amonakov> ah, and git commit messages :)

21:48 <raster> amonakov: lies! there can be no such thing! :)

21:49 <icecream95> z80 > ARM > AArch64 > everything else

21:49 <amonakov> [as an application dev looking at Arm Mali tools, the more I look, the less sense it makes]

21:49 <alyssa> amonakov: that's correct.

21:50 <alyssa> if this stuff was publicly documented, people would realize none of this makes sense, by design ;p

21:50 <alyssa> icecream95: z81 > z80

21:51 <icecream95> alyssa: Only if z is positive

21:51 <alyssa> drat, foiled again

21:54 <icecream95> alyssa: You missed https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8583

21:54 <alyssa> whoops

21:57 <alyssa> Pass: 21949, Fail: 13, Warn: 28, Skip: 93, Flake: 3, Duration: 5:12, Remaining: 0

21:57 <alyssa> that's like, most of them, right?

22:08 <icecream95> macc24: Enabling pstore might help with debugging suspend

22:13 * alyssa broke reg spilling again

22:28 <alyssa> trying against CI

22:28 <alyssa> 24 files changed, 2667 insertions(+), 652 deletions(-)

22:28 <alyssa> ughhh

22:29 <HdkR> ooo, that's a good one

22:29 <alyssa> HdkR: still the scheduler

22:30 <HdkR> `15 files changed, 1621 insertions(+), 4 deletions(-)` I've got this fun one today

22:31 <HdkR> Doomed to massive commits

22:31 <alyssa> a single commit?

22:31 <alyssa> mine was across the series :<

22:31 <HdkR> Yea, half of it is unit tests though

22:31 <alyssa> tests can go in a 2nd commit :p

22:32 <HdkR> That would be too easy then :D

22:33 <macc24> icecream95: display doesn't power on on resume

22:33 <macc24> i can ssh into machine just fine

22:56 <alyssa> glmark2 -bterrain is improved on bifrost by schedule patches :)

22:58 <alyssa> (9fps to 11fps, at 1080p)

22:58 <alyssa> not that 11fps is anything to write home about...

22:58 <alyssa> :shader5: - MESA_SHADER_FRAGMENT shader: 1314 inst, 1314 nops, 1314 clauses, 1 threads, 0 loops, 0:0 spills:fills

22:59 <alyssa> :shader5: - MESA_SHADER_FRAGMENT shader: 1314 inst, 872 nops, 141 clauses, 1 threads, 0 loops, 0:0 spills:fills

22:59 <italove> why so many nops?

22:59 <alyssa> italove: 16% reduction in nops, think positive! :p

23:00 <alyssa> and nops are from failing to fill the pipeline

23:00 <italove> haha, right

23:00 <alyssa> (that's before/after the schedule MR)

23:00 <alyssa> italove: Technically Midgard has lots of nops too, you just don't notice them

23:01 <alyssa> if you have a bundle of 2 instructions, say, { vmul.fmul ... / vadd.fadd ... }, effectively sadd/smul/vlut are all nops

23:02 <italove> oh I see, so the disasm doesn't show it because it's not part of the code, it's just something that happens when the scheduling can't fill the bundle with instructions?

23:02 <italove> makes sense

23:02 <italove> scheduler*

23:02 <icecream95> alyssa: G72 already gets 11 fps, so will scheduling make it 13 or 13.44444 fps?

23:07 <anarsoul> or 111?

23:11 <icecream95> alyssa: How should panpackcolor handle 12-byte formats?

23:12 <icecream95> pan_pack_color*

23:21 <alyssa> icecream95: 13.35195876432 +/- 1fps, with 30% confidence :p

23:21 <alyssa> wdym how?

23:24 <icecream95> alyssa: Should the size == 12 case be just a memcpy like size == 16 or does it need something like the size == 6 case?

23:24 raster has quit [Quit: Gettin' stinky!]

23:42 karolherbst has quit [Ping timeout: 272 seconds]

23:42 <alyssa> icecream95: First question is what the heck the size == 6 special case was about...

23:43 <alyssa> git blames me

23:43 <alyssa> 1b86e0927d4c829209a6134223b0ca5aff771c8d commit message sounds wholly unconvincing

23:44 <alyssa> icecream95: For a bit of context, the hardware just does a dumb copy of clear_color (128-bits) straight into the tilebuffer, completely disregarding the pixel format.

23:45 <alyssa> Which means pan_pack_color is really a CPU side tilebuffer pixel pack, analogous to lower_framebuffer for blend shaders

23:45 <alyssa> I don't think that function has been touched since GenXML.

23:46 <alyssa> But if I were writing the function now, the 'right' approach would be:

23:46 <alyssa> 1. Get the tilebuffer format, like we do in pan_mfbd.c -- check panfrost_blend_format and if none exists, use an appropriately sized RAW format

23:47 <alyssa> 2. For a blendable format, special case packs by tilebuffer format ("Color Buffer Internal Format")

23:48 <alyssa> The underlying principle is that the tib can be reinterpreted as RGBX8 and it should still make sense.

23:49 <alyssa> So RGBA8 is just a copy, RGB10A2 shifts off the lower 2 bits and then packs them in the top byte, RGBA4 has everything shifted 4, etc.

23:50 <alyssa> 3. For a raw format, *we* determine the interpretation (these are the formats requiring blend shaders). So we do the obvious thing and always use the same format as Gallium, which means we can just use Gallium packs...

23:50 <alyssa> ...except stuff needs to be replicated to keep the hardware happy for reasons I don't remember.

23:55 <alyssa> icecream95: ---I guess to answer your question, probably just a memcpy, and you can probably garbage collect the == 6 case to be just a memcpy too.
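
The "reinterpret as RGBX8 and it should still make sense" principle from step 2 above can be sketched for the two simplest cases mentioned, RGBA8 (a plain copy) and RGBA4 (each channel shifted up by 4 within its byte). This is not Mesa's pan_pack_color: the function names are hypothetical, and placing red in the lowest byte is an assumption. RGB10A2 is left out because the chat does not give its exact bit layout.

```python
def unorm(value, bits):
    """Convert a float in [0, 1] to an unsigned normalized integer."""
    value = min(max(value, 0.0), 1.0)
    return int(value * ((1 << bits) - 1) + 0.5)


def pack_bytes(channels):
    """Pack four byte-sized channels into a 32-bit word, first channel lowest.

    The channel order is an assumption made for illustration.
    """
    word = 0
    for index, channel in enumerate(channels):
        word |= (channel & 0xFF) << (8 * index)
    return word


def pack_rgba8(r, g, b, a):
    """RGBA8: the tilebuffer layout is the format itself, so this is a copy."""
    return pack_bytes([unorm(c, 8) for c in (r, g, b, a)])


def pack_rgba4(r, g, b, a):
    """RGBA4: each 4-bit channel sits in the top nibble of its own byte."""
    return pack_bytes([unorm(c, 4) << 4 for c in (r, g, b, a)])


print(hex(pack_rgba8(1.0, 0.0, 0.0, 1.0)))
print(hex(pack_rgba4(1.0, 0.0, 0.0, 1.0)))
```

Read back as RGBX8, the RGBA4 word for opaque red has 0xF0 in the red byte, which is still a nearly saturated red. That is the sense in which the packed value remains meaningful under reinterpretation.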