#panfrost on 2019-08-01 — irc logs at freenode.irclog.whitequark.org

2019-02-15 17:52 alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature

01:15 vstehle has quit [Ping timeout: 244 seconds]

01:28 megi has quit [Ping timeout: 272 seconds]

02:17 rcf has quit [Ping timeout: 246 seconds]

02:19 rcf has joined #panfrost

02:33 chewitt has joined #panfrost

02:46 rcf has quit [Ping timeout: 245 seconds]

02:47 rcf has joined #panfrost

03:59 davidlt has joined #panfrost

04:38 _whitelogger has joined #panfrost

04:55 chewitt has quit [Quit: Zzz..]

05:00 vstehle has joined #panfrost

06:20 chewitt has joined #panfrost

06:24 chewitt has quit [Client Quit]

06:40 guillaume_g has joined #panfrost

07:54 pH5 has joined #panfrost

08:41 yann has quit [Ping timeout: 248 seconds]

08:41 Kwiboo has quit [Quit: .]

08:42 Kwiboo has joined #panfrost

08:46 pH5 has quit [Ping timeout: 264 seconds]

09:42 megi has joined #panfrost

09:53 jcureton has quit [Remote host closed the connection]

10:00 <tomeu> alyssa: btw, CI runs are much shorter now

10:00 <tomeu> guess it's due to the BO cache?

10:22 raster has joined #panfrost

10:28 pH5 has joined #panfrost

10:55 raster has quit [Remote host closed the connection]

10:55 raster has joined #panfrost

11:01 yann has joined #panfrost

11:17 <daniels> alyssa: eglChooseConfig() generally takes channel sizes as _at least_, so will use a config with wider channels than requested if one is available
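
A minimal sketch of the "at least" matching daniels describes, written in Python rather than against the real EGL API. The config dictionaries and the `choose_configs` helper are hypothetical, and only one ordering rule is modeled (configs with more total bits in the requested channels come first); the actual EGL sort has more criteria than this.

```python
# Hypothetical, simplified model of eglChooseConfig() channel-size matching.
# Two ideas only: requested sizes are minimums, and wider matches sort first.

def choose_configs(configs, requested):
    """Return configs whose channels are at least as wide as requested."""
    matching = [
        c for c in configs
        if all(c[ch] >= bits for ch, bits in requested.items())
    ]
    # Only channels requested with a non-zero size count toward the ordering.
    counted = [ch for ch, bits in requested.items() if bits > 0]
    return sorted(matching, key=lambda c: -sum(c[ch] for ch in counted))


configs = [
    {"name": "RGB565", "r": 5, "g": 6, "b": 5, "a": 0},
    {"name": "RGBA8888", "r": 8, "g": 8, "b": 8, "a": 8},
]

# Asking for 5/6/5 still matches the 8-bit config, and it sorts ahead of RGB565.
result = choose_configs(configs, {"r": 5, "g": 6, "b": 5})
print([c["name"] for c in result])
```

An application that needs exactly 5/6/5 therefore has to inspect the returned configs itself instead of taking the first one.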

11:50 adjtm has joined #panfrost

13:27 <tomeu> alyssa: when marking all BOs except ctx->shaders as non-executable, I get an invalid access when trying to execute at a weird address: http://paste.debian.net/1093867/

13:28 <tomeu> the fault only happens once job_2fb0480_17 gets submitted

13:28 <tomeu> which I guess is just a clear?

13:46 herbmillerjr has quit [Ping timeout: 248 seconds]

13:51 herbmillerjr has joined #panfrost

14:24 <alyssa> tomeu: I noticed that, wasn't sure what to thank/blame

14:24 <alyssa> But armhf runs are.... getting more problematic by the day

14:24 <alyssa> tomeu: Umm we sometimes shove blend shaders in transient memory

14:24 <alyssa> which might be dumb but oops

14:25 <tomeu> because of how much we allocate (for now) in each context, and because of how often deqp allocates contexts, I think it's the BO cache that sped it up

14:25 <tomeu> alyssa: aha :)

14:25 <tomeu> definitely looks like that

14:25 <alyssa> tomeu: Yeah, it's a Bug (TM)

14:26 <alyssa> But I was lazy and it didn't break anything at the time even though I felt bad about it and still kinda do

14:41 <tomeu> alyssa: it's just a one-liner :p

14:49 belgin has joined #panfrost

14:49 <belgin> hello

14:49 <belgin> so you don't like x11?

14:49 <belgin> i have found out something fun about it today

14:50 <belgin> if you strip your window of wm decorations and set it to the same size as the "root" window, it just ignores all resize attempts afterwards

14:51 <belgin> however, if you set one of the dimensions off by 1, it works normally

14:51 <belgin> take that x11

14:53 jcureton has joined #panfrost

15:02 guillaume_g has quit [Quit: Konversation terminated!]

15:06 <jcureton> the good news is my two T720 platforms are consistent on the dEQP blend tests, the bad news is they both pass 0% :)

15:11 <tomeu> jcureton: do you have an idea already of what's wrong?

15:12 <alyssa> tomeu: https://gitlab.freedesktop.org/tomeu/mesa/commit/ed98964abaf6d45c9cb8831b23124c2e202a91e8

15:12 <alyssa> tomeu: Unfortunately, I don't think that's the right approach

15:12 <alyssa> It *will* work, but it'll also leak memory terribly

15:12 <jcureton> tomeu: not quite yet. pass rate for all of dEQP-GLES2.* on Allwinner H6 T720 is quite low - 7.1%

15:13 <alyssa> tomeu: Anything uploaded to the shaders BO will stay alive for forever.

15:13 <alyssa> jcureton: Ouch D:

15:13 <tomeu> alyssa: ah, the transient pool has magic to know what can be released?

15:13 <alyssa> tomeu: Transient memory, by definition, is freed automatically every frame

15:14 <tomeu> ah, all of it?

15:14 <alyssa> All of it!

15:14 <alyssa> Think of `transient` like the stack, and main BOs like the heap.

15:14 <tomeu> gotcha

15:14 <alyssa> Small allocations within the function itself you put on the stack

15:14 <alyssa> But something big, or something you need to persist after we return, goes on the heap since the stack will be destroyed

15:15 <alyssa> But allocation/freeing on the heap can be expensive (mitigated somewhat with the BO cache), whereas stack allocations/frees are essentially free
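
To illustrate the stack/heap analogy above, here is a small sketch of a per-frame bump allocator. It is not Panfrost's code: the `TransientPool` class, its method names, and the 16-byte default alignment are assumptions made for the example.

```python
class TransientPool:
    """Bump allocator: allocating is a pointer increment, freeing is a reset."""

    def __init__(self, size):
        self.size = size
        self.offset = 0

    def alloc(self, nbytes, align=16):
        # Round the current offset up to the requested alignment.
        start = (self.offset + align - 1) // align * align
        if start + nbytes > self.size:
            raise MemoryError("transient pool exhausted")
        self.offset = start + nbytes
        return start

    def end_frame(self):
        # Everything allocated during the frame is released in one step.
        self.offset = 0


pool = TransientPool(4096)
first = pool.alloc(100)    # offset 0
second = pool.alloc(40)    # next 16-byte boundary after byte 100
pool.end_frame()
reused = pool.alloc(8)     # starts at offset 0 again
```

Nothing in the pool survives `end_frame()`, which is why a shader that must outlive the frame cannot live there without being reuploaded.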

15:16 <tomeu> hmm, wonder how bad it could be to mark the transient pools executable

15:16 <tomeu> quite bad, I think

15:16 <alyssa> tomeu: Pretty bad.

15:16 <tomeu> and have an executable transient pool?

15:16 <alyssa> That's an option

15:17 belgin has quit [Quit: Leaving]

15:17 <alyssa> Another option is just doing proper memory management on ctx->shaders

15:17 <alyssa> But so far that's been quite low-prio since shaders are small.

15:18 <cwabbott> can't you just never free blend shaders except on context destruction? afaik that's what the blob does

15:18 <cwabbott> after all, you'll never know when you'll need it again

15:18 <alyssa> cwabbott: Mostly we do that

15:19 <alyssa> cwabbott: The exception is glBlendColor().. afaik, there's no provision for passing it (via a uniform or whatever)

15:19 Prf_Jakob has joined #panfrost

15:19 <alyssa> So we just patch it into the binary directly

15:19 <alyssa> But BlendColor is not tied to the CSO in Gallium

15:19 <alyssa> So the easiest approach is, for shaders with a BlendColor, to reupload once a frame. It's an edge case that doesn't come up much outside dEQP anyway, so I'm not too worried.

15:20 <cwabbott> oh, that sounds like fun indeed

15:20 <alyssa> cwabbott: Blending is by far the most complex part of the driver :(

15:21 <cwabbott> it sounds like you need another hash table with (blend CSO, blend color) -> patched shader

15:22 <alyssa> foreach frame { glBlendColor(i++, 0.0, 0.0, 1.0); ... Draw() }
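
A rough sketch of the hash table cwabbott suggests, together with what alyssa's loop appears to show about it. The `PatchedBlendShaderCache` class and the `fake_patch` function are hypothetical stand-ins; the real driver patches the constant into a compiled shader binary.

```python
class PatchedBlendShaderCache:
    """Maps (blend CSO, blend color) to a patched shader, creating it on demand."""

    def __init__(self, patch):
        self.patch = patch   # hypothetical callback: (cso, color) -> bytes
        self.entries = {}

    def get(self, cso, color):
        key = (cso, color)
        if key not in self.entries:
            self.entries[key] = self.patch(cso, color)
        return self.entries[key]


def fake_patch(cso, color):
    # Stand-in for patching the blend constant into the shader binary.
    return f"{cso}:{color}".encode()


cache = PatchedBlendShaderCache(fake_patch)

# A constant blend color: one entry, reused on every frame.
for _ in range(100):
    cache.get("cso0", (0.5, 0.0, 0.0, 1.0))
stable_entries = len(cache.entries)

# A blend color that changes every frame: one new entry per frame.
for frame in range(100):
    cache.get("cso0", (float(frame), 0.0, 0.0, 1.0))
changing_entries = len(cache.entries)
```

With a color that changes each frame the table never gets a hit and grows without bound unless entries are evicted, which is the same lifetime problem as keeping every upload alive in the shaders BO.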

15:22 <cwabbott> alyssa: you should have a look at the shaders the blob uses for indirect draws / geometry shaders / tess shaders :)

15:22 <alyssa> cwabbott: I saw the indirect draw mess yesterday

15:22 <alyssa> Needless to say I aim for ES3.0 only ;)

15:22 <cwabbott> I think it's just doing all the magic instance division stuff in the shader

15:23 <cwabbott> instead of in the driver

15:23 <alyssa> Well, the instancing magic is `only` 350 lines in Panfrost so I guess 200 lines of Midgard asm is reasonable t_t

15:34 <tomeu> jcureton: I'm asking because I used to test on a H6 and things didn't seem that bad

15:36 <jcureton> tomeu: i guess anecdotally most stuff seems to run without many issues. most of the ones i've run into have been blending related

15:37 <tomeu> ah, I see

15:37 <tomeu> jcureton: do you get anything in dmesg?

15:38 <jcureton> not in running dEQP, although with most applications the first job submission always results in a DATA_INVALID_FAULT before continuing

15:39 <jcureton> occasionally I'll see a job timeout, particularly running glmark. there are also some graphical glitches in glmark, although i don't remember which tests off the top of my head

15:39 <tomeu> ok, so rendering with blending is just wrong

15:39 <jcureton> ^ yes

15:39 <tomeu> ah, and some other issue I guess

15:39 <alyssa> Blending on SFBD is pretty different from MFBD

15:40 <alyssa> So I'm willing to believe we (...I) butchered it

15:41 <tomeu> I could dust off my H6, but not having the board in CI means it will break again

15:42 <alyssa> jcureton: The first job submit thing is probably why dEQP results are so terrible

15:43 <jcureton> hmm, maybe. dEQP doesn't trigger any faults in the kernel

15:43 <alyssa> m

15:43 <alyssa> Hm

15:52 pH5 has quit [Quit: bye]

15:57 TheCycoTWO has joined #panfrost

15:57 TheCycoONE has left #panfrost ["WeeChat 2.4"]

16:38 shadeslayer has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]

16:43 yann has quit [Ping timeout: 246 seconds]

16:51 pH5 has joined #panfrost

17:40 raster has quit [Remote host closed the connection]

17:45 yann has joined #panfrost

17:52 stikonas has joined #panfrost

18:30 jernej has quit [Remote host closed the connection]

19:05 shadeslayer has joined #panfrost

19:31 * alyssa thinks of landing the compute shader preliminary code now

19:31 <alyssa> It's useless on its own (still needs SSBOs implemented and a few other things to actually do anything)

19:31 <alyssa> But it's a bunch of patches and it doesn't regress anything so ya know

19:36 afaerber has joined #panfrost

19:42 <alyssa> Anyway, for SSBO stuff, I'm really going to need to figure out how load/store arguments are passed

19:43 * alyssa fudges around for her notes

19:43 <alyssa> We talked about this like last week

19:43 <alyssa> Here 'tis

19:53 <alyssa> D'oh!

19:53 <alyssa> In an indirect load, `base address` and `offset` are not separate arguments

19:53 <alyssa> They're just two arguments that are added together, end of story

19:53 <alyssa> So they're totally commutative

20:00 <alyssa> Arguments themselves are 8-bit

20:00 <alyssa> For a register argument (like an indirect offset), we need to specify:

20:01 <alyssa> - Component (2-bit)

20:01 <alyssa> - Register (1-bit)

20:01 <alyssa> - Size (2-bit)

20:03 <alyssa> So we're up to 5-bits of 8
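
The field list above can be sketched as a pack/unpack pair. The bit positions here are an assumption made purely for illustration (component in bits 0-1, register in bit 2, size in bits 3-4, the top three bits left unknown); the log gives only the field widths, and alyssa notes shortly afterwards that the size part is wrong.

```python
# ASSUMED layout, not confirmed by the log:
#   bits 0-1: component, bit 2: register, bits 3-4: size, bits 5-7: unknown

def pack_reg_arg(component, register, size):
    """Pack the three known fields into the low 5 bits of an 8-bit argument."""
    assert 0 <= component < 4 and 0 <= register < 2 and 0 <= size < 4
    return component | (register << 2) | (size << 3)


def unpack_reg_arg(arg):
    """Split an 8-bit argument back into fields under the assumed layout."""
    return {
        "component": arg & 0x3,
        "register": (arg >> 2) & 0x1,
        "size": (arg >> 3) & 0x3,
        "unknown": (arg >> 5) & 0x7,
    }


packed = pack_reg_arg(component=3, register=1, size=2)
fields = unpack_reg_arg(packed)
```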

20:07 <alyssa> https://people.collabora.com/~alyssa/ldst_arg.c roughly

20:10 <alyssa> Something about the sizing is totally wrong tho :/

20:13 <alyssa> Yeah, size is wrong ummmm

20:14 <alyssa> But.. hrm

20:15 <alyssa> Let's do some decoding for the bottom 3-bits at least, since those are consistently sane for register inputs

20:19 <HdkR> alyssa: No 128bit load for SSBO?

20:20 <alyssa> HdkR: You can definitely do a 128-bit load?

20:21 <HdkR> size = element size then rather than full memory load size?

20:22 <HdkR> If the hardware could do a 128bit load then I'd expect 3bits for size to be able to do 8bit all the way to 128bit

20:22 <alyssa> HdkR: size being size of the register

20:22 <alyssa> And this is the register for offsets, not for the result itself

20:22 <HdkR> ah

20:22 <alyssa> You don't need 128-bit offsets.. yet ;)

20:22 <HdkR> 128bit offset would be a bit mad by today's standards yea :P

20:32 davidlt has quit [Ping timeout: 245 seconds]

20:54 <alyssa> Hum, we're raising more questions than we're answering

20:56 <HdkR> :D

21:10 <alyssa> Sweet!

21:11 <alyssa> We're nowhere near "there" yet, but already we can see that "r27-only ops" don't exist

21:11 <alyssa> If we don't set the "r27?" flag unconditionally, then we RA them onto r26, hah

21:33 <alyssa> https://gitlab.freedesktop.org/tomeu/mesa/commit/1af22ac413d60dcb2935e819ebe7c3e4847c1683 for a satisfying cleanup :)

21:42 <alyssa> So the next question is how to interpret the dummy args

21:42 <alyssa> E.g.: 0x7E

21:43 <alyssa> Which in any SSBO access seems to mean "ignore this arg / zero"

21:45 <alyssa> The top bit of arg_1 seems to mean "indirect?" in some cases but I recall there being exceptions

21:46 <alyssa> Ah, yes, indirect varyings are an exception

21:47 <alyssa> For varyings, 0x9E and 0x1E are ignored args

21:47 <alyssa> And indirect is specified by all bits being zero except bottom 3

21:48 * alyssa wonders if the 0x8 bit causes a field to be ignored/clamped to zero

21:49 <alyssa> It certainly matches all the data points

21:49 <alyssa> ----Wait. We can test these assumptions

21:54 <alyssa> It would appear `(arg & 0xF) == 0xE` is the real 'ignore argument' here

21:57 <alyssa> But that can't be totally right, since it still matters in some cases what's put in the upper nibble
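
The test alyssa arrives at can be checked against the three dummy values quoted earlier in the log. The function name `is_ignored_arg` is made up for this sketch, and, as she notes, the low-nibble check is only a working hypothesis.

```python
def is_ignored_arg(arg):
    """Working hypothesis from the log: low nibble 0xE means 'ignore this arg'."""
    return (arg & 0xF) == 0xE


# Dummy args seen in the log: 0x7E for SSBO access, 0x9E and 0x1E for varyings.
dummy_args = (0x7E, 0x9E, 0x1E)
all_ignored = all(is_ignored_arg(a) for a in dummy_args)

# The upper nibbles differ, which is the part the hypothesis leaves unexplained.
upper_nibbles = [a >> 4 for a in dummy_args]

# The earlier guess about the 0x8 bit is also consistent with these values.
all_have_bit3 = all(a & 0x8 for a in dummy_args)
```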

22:00 raster has joined #panfrost

22:01 stikonas has quit [Remote host closed the connection]

22:39 rhyskidd has quit [Remote host closed the connection]

22:45 <alyssa> Aaaaa, the register spilling, the register spilling!

22:45 <alyssa> Anything but the register spilling!

22:45 * alyssa faints

22:46 <HdkR> Maximum register spilling engaged

22:51 <anarsoul> alyssa: you still need spilling with 16-32 vec4?

22:52 <HdkR> You'll always need spilling

22:53 <HdkR> Nvidia has 256 and still needs spilling :P

22:53 <alyssa> anarsoul: 16 vec4 and definitely

22:53 <alyssa> Though preferably shaders only use 8 vec4

22:53 <alyssa> since stuff goes south after 8

22:53 <anarsoul> spilling is hard :(

22:53 <bnieuwenhuizen> I've had shaders spilling kilobytes per invocation, the sky is the limit :)

22:54 <alyssa> bnieuwenhuizen: shadertoy?

22:54 <anarsoul> bnieuwenhuizen: ouch

22:55 <HdkR> bnieuwenhuizen: Well, amount of memory is the limit :P

22:55 <bnieuwenhuizen> alyssa: not sure what made us originally detect it, but I had a piglit test somewhere. gfxbench5 the vulkan test also has it

22:55 <bnieuwenhuizen> HdkR: ever heard of swap? :P

22:55 <alyssa> bnieuwenhuizen: remind me why i'm writing an opencl impl again T_T

22:55 <alyssa> :p

22:56 <bnieuwenhuizen> opencl?

22:56 <bnieuwenhuizen> anyway, first time I saw was definitely OpenGL, so you can clearly have this badness in any API

22:57 <alyssa> bnieuwenhuizen: It's kind of a long rabbit hole I'm down under

22:57 <alyssa> Fixing load/store handling because I'm figuring out more about load/store ops

22:57 <alyssa> because I want to implement SSBOs

22:57 <alyssa> because I want GLES3.1 compute shaders

22:57 <bnieuwenhuizen> alyssa: every driver goes through it at some point. you'll get through it

22:57 <alyssa> because people are bugging about compute

23:03 <HdkR> alyssa: Bugging about compute or OpenCL specifically? :P

23:05 <alyssa> HdkR: Yes.

23:05 <HdkR> haha

23:05 * alyssa is hoping any compute will do and then Other People (possibly including Future Alyssa) deal with CL specifically

23:06 <HdkR> alyssa: You know at least one person is going to ask about it at XDC :)

23:07 <alyssa> HdkR: Who says I'm going to XDC?!

23:07 <alyssa> :p

23:07 <bnieuwenhuizen> alyssa: how are you giving your witchcraft talk then?

23:07 <HdkR> :D

23:07 <HdkR> That was a great talk description

23:07 <alyssa> bnieuwenhuizen: Oh, is that public now?

23:08 * bnieuwenhuizen has half a clue

23:08 <bnieuwenhuizen> I found it through my talk having a link to the main track and the link works for non-logged in visits

23:08 * alyssa can't tell if talks were released or y'all just have Special Privileges

23:09 * bnieuwenhuizen has no clue how to find it through the main page though

23:10 <alyssa> bnieuwenhuizen: I'm not seeing said link

23:10 raster has quit [Remote host closed the connection]

23:10 <bnieuwenhuizen> alyssa: your own talk description, there is a "main track" button

23:11 * alyssa doesn't see it

23:11 <alyssa> Oh, here tis

23:13 <alyssa> There are a bunch of cool talks on Wednesday and Thursday I'm going to miss, awww :(

23:14 <bnieuwenhuizen> alyssa: not coming for the full conference?

23:14 <alyssa> bnieuwenhuizen: Can't get away with skipping more than a day of class ;)

23:14 megi has quit [Quit: WeeChat 2.5]

23:16 <bnieuwenhuizen> alyssa: time to ask for recordings?

23:32 jernej has joined #panfrost

23:35 <alyssa> Ah-ha! It just occurred to me how to handle uniform/work split

23:36 <alyssa> Rather than promoting UBO reads to uniform registers, we should be optimistic.

23:36 <alyssa> And use a full set of 16 uniform registers.

23:37 <alyssa> Then, when RA fails, rather than spilling registers, we demote uniform registers to UBO reads ("spilling uniforms")

23:37 <alyssa> After doing that 8 times if RA is still failing, well, yeah, start spilling real stuff I guess

23:38 <alyssa> This should be a promising opt, I think
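
The idea above can be sketched as a retry loop. The numbers 16 and 8 come from the log; everything else is an assumption, including the `try_ra` callback standing in for the real register allocator and the choice to demote one uniform register per retry.

```python
MAX_UNIFORM_REGS = 16   # from the log: "a full set of 16 uniform registers"
MAX_DEMOTIONS = 8       # from the log: "After doing that 8 times"


def allocate(try_ra):
    """Start optimistic, demote uniforms on failure, spill work regs last.

    try_ra(uniform_regs) -> bool is a hypothetical stand-in for running RA
    with the given number of registers reserved for uniforms.
    """
    for demoted in range(MAX_DEMOTIONS + 1):
        uniform_regs = MAX_UNIFORM_REGS - demoted
        if try_ra(uniform_regs):
            return {"uniform_regs": uniform_regs, "spill_work_regs": False}
    # Still failing after all demotions: fall back to real spilling.
    return {
        "uniform_regs": MAX_UNIFORM_REGS - MAX_DEMOTIONS,
        "spill_work_regs": True,
    }


# Example: a shader whose RA only succeeds once 3 uniform registers are given up.
outcome = allocate(lambda uniform_regs: uniform_regs <= 13)
```

Demoted uniforms turn back into UBO reads, so the cost of a failed guess is extra loads rather than the stores and reloads of a real spill.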

23:44 * alyssa will finish tmrw

23:58 pH5 has quit [Quit: bye]