alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
vstehle has quit [Ping timeout: 244 seconds]
megi has quit [Ping timeout: 272 seconds]
rcf has quit [Ping timeout: 246 seconds]
rcf has joined #panfrost
chewitt has joined #panfrost
rcf has quit [Ping timeout: 245 seconds]
rcf has joined #panfrost
davidlt has joined #panfrost
_whitelogger has joined #panfrost
chewitt has quit [Quit: Zzz..]
vstehle has joined #panfrost
chewitt has joined #panfrost
chewitt has quit [Client Quit]
guillaume_g has joined #panfrost
pH5 has joined #panfrost
yann has quit [Ping timeout: 248 seconds]
Kwiboo has quit [Quit: .]
Kwiboo has joined #panfrost
pH5 has quit [Ping timeout: 264 seconds]
megi has joined #panfrost
jcureton has quit [Remote host closed the connection]
<tomeu> alyssa: btw, CI runs are much shorter now
<tomeu> guess it's due to the BO cache?
raster has joined #panfrost
pH5 has joined #panfrost
raster has quit [Remote host closed the connection]
raster has joined #panfrost
yann has joined #panfrost
<daniels> alyssa: eglChooseConfig() generally takes channel sizes as _at least_, so will use a config with wider channels than requested if one is available
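A simplified sketch of the "at least" matching daniels describes. The config tuples and the `choose_configs` helper are hypothetical, and the real EGL sort has further keys (caveats, buffer type, and so on) that are left out; this only models the minimum-size filter plus the preference for more total color bits among the requested channels.

```python
def choose_configs(configs, requested):
    """configs and requested are (red, green, blue, alpha) bit sizes."""
    # A config matches if every channel is at least as wide as requested.
    matching = [
        cfg for cfg in configs
        if all(have >= want for have, want in zip(cfg, requested))
    ]

    def requested_bits(cfg):
        # Only channels the caller asked for (size > 0) count for sorting.
        return sum(have for have, want in zip(cfg, requested) if want > 0)

    # Wider configs sort first, so a narrow request can return a wide config.
    return sorted(matching, key=requested_bits, reverse=True)


CONFIGS = [(5, 6, 5, 0), (8, 8, 8, 8), (10, 10, 10, 2)]
rgb565_request = choose_configs(CONFIGS, (5, 6, 5, 0))
rgba8888_request = choose_configs(CONFIGS, (8, 8, 8, 8))
```

In this toy model a request for RGB565 lists the 10-bit config first and the exact 565 match last, so a caller that needs an exact format has to inspect the returned configs itself.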
adjtm has joined #panfrost
<tomeu> alyssa: when marking all BOs except ctx->shaders as non-executable, I get an invalid access when trying to execute at a weird address: http://paste.debian.net/1093867/
<tomeu> the fault only happens once job_2fb0480_17 gets submitted
<tomeu> which I guess is just a clear?
herbmillerjr has quit [Ping timeout: 248 seconds]
herbmillerjr has joined #panfrost
<alyssa> tomeu: I noticed that, wasn't sure what to thank/blame
<alyssa> But armhf runs are.... getting more problematic by the day
<alyssa> tomeu: Umm we sometimes shove blend shaders in transient memory
<alyssa> which might be dumb but oops
<tomeu> because of how much we allocate (for now) in each context, and because of how often deqp allocates contexts, I think it's the BO cache that sped it up
<tomeu> alyssa: aha :)
<tomeu> definitely looks like that
<alyssa> tomeu: Yeah, it's a Bug (TM)
<alyssa> But I was lazy and it didn't break anything at the time even though I felt bad about it and still kinda do
<tomeu> alyssa: it's just a one-liner :p
belgin has joined #panfrost
<belgin> hello
<belgin> so you don't like x11?
<belgin> i have found out something fun about it today
<belgin> if you strip your window of wm decorations and set it to the same size as the "root" window, it just ignores all resize attempts afterwards
<belgin> however, if you set one of the dimensions off by 1, it works normally
<belgin> take that x11
jcureton has joined #panfrost
guillaume_g has quit [Quit: Konversation terminated!]
<jcureton> the good news is my two T720 platforms are consistent on the dEQP blend tests, the bad news is they both pass 0% :)
<tomeu> jcureton: do you have an idea already of what's wrong?
<alyssa> tomeu: Unfortunately, I don't think that's the right approach
<alyssa> It *will* work, but it'll also leak memory terribly
<jcureton> tomeu: not quite yet. pass rate for all of dEQP-GLES2.* on Allwinner H6 T720 is quite low - 7.1%
<alyssa> tomeu: Anything uploaded to the shaders BO will stay alive forever.
<alyssa> jcureton: Ouch D:
<tomeu> alyssa: ah, the transient pool has magic to know what can be released?
<alyssa> tomeu: Transient memory, by definition, is freed automatically every frame
<tomeu> ah, all of it?
<alyssa> All of it!
<alyssa> Think of `transient` like the stack, and main BOs like the heap.
<tomeu> gotcha
<alyssa> Small allocations within the function itself you put on the stack
<alyssa> But something big, or something you need to persist after we return, goes on the heap since the stack will be destroyed
<alyssa> But allocation/freeing on the heap can be expensive (mitigated somewhat with the BO cache), whereas stack allocations/frees are essentially free
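The stack analogy above can be sketched as a bump allocator that is reset once per frame. `TransientPool`, its method names, and the 64-byte alignment are hypothetical and only illustrate the idea; this is not Panfrost's actual allocator.

```python
class TransientPool:
    """Bump allocator: allocating is a pointer increment, freeing is a reset."""

    def __init__(self, size):
        self.size = size
        self.offset = 0

    def alloc(self, nbytes, align=64):
        # Round the current offset up to the requested alignment.
        start = (self.offset + align - 1) // align * align
        if start + nbytes > self.size:
            raise MemoryError("transient pool exhausted")
        self.offset = start + nbytes
        return start

    def end_frame(self):
        # Everything allocated during the frame is released at once.
        self.offset = 0


pool = TransientPool(4096)
a = pool.alloc(100)
b = pool.alloc(100)
pool.end_frame()
c = pool.alloc(100)  # reuses the memory that `a` occupied last frame
```

Because `end_frame` invalidates every earlier offset, anything that has to outlive the frame (such as a blend shader that is still referenced) cannot safely live in a pool like this.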
<tomeu> hmm, wonder how bad it could be to mark the transient pools executable
<tomeu> quite bad, I think
<alyssa> tomeu: Pretty bad.
<tomeu> and have an executable transient pool?
<alyssa> That's an option
belgin has quit [Quit: Leaving]
<alyssa> Another option is just doing proper memory management on ctx->shaders
<alyssa> But so far that's been quite low-prio since shaders are small.
<cwabbott> can't you just never free blend shaders except on context destruction? afaik that's what the blob does
<cwabbott> after all, you'll never know when you'll need it again
<alyssa> cwabbott: Mostly we do that
<alyssa> cwabbott: The exception is glBlendColor().. afaik, there's no provision for passing it (via a uniform or whatever)
Prf_Jakob has joined #panfrost
<alyssa> So we just patch it into the binary directly
<alyssa> But BlendColor is not tied to the CSO in Gallium
<alyssa> So the easiest approach is, for shaders with a BlendColor, to reupload once a frame. It's an edge case that doesn't come up much outside dEQP anyway, so I'm not too worried.
<cwabbott> oh, that sounds like fun indeed
<alyssa> cwabbott: Blending is by far the most complex part of the driver :(
<cwabbott> it sounds like you need another hash table with (blend CSO, blend color) -> patched shader
<alyssa> foreach frame { glBlendColor(i++, 0.0, 0.0, 1.0); ... Draw() }
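A minimal sketch of the hash table cwabbott suggests, keyed on (blend CSO, blend color). `BlendShaderCache`, the integer CSO ids, and the `fake_patch` stand-in are all hypothetical; the real patching step would rewrite constants inside a shader binary.

```python
class BlendShaderCache:
    """Maps (blend CSO id, blend color) to an already-patched shader."""

    def __init__(self, patch):
        self._patch = patch   # hypothetical: (cso_id, color) -> patched binary
        self._cache = {}
        self.uploads = 0      # counts how often a new shader had to be built

    def get(self, cso_id, color):
        key = (cso_id, tuple(color))
        if key not in self._cache:
            self._cache[key] = self._patch(cso_id, color)
            self.uploads += 1
        return self._cache[key]


def fake_patch(cso_id, color):
    # Stand-in for patching the constant color into the shader binary.
    return ("shader", cso_id, tuple(color))


cache = BlendShaderCache(fake_patch)
cache.get(1, (0.0, 0.0, 0.0, 1.0))
cache.get(1, (0.0, 0.0, 0.0, 1.0))      # hit: no new upload
uploads_after_repeat = cache.uploads

for i in range(3):                       # a new color every "frame"
    cache.get(1, (float(i + 1), 0.0, 0.0, 1.0))
uploads_after_animation = cache.uploads
```

A repeated color costs one upload in total, but a color that changes every frame, as in the loop alyssa sketches, adds one entry per frame, so a cache like this would need some eviction policy to avoid growing without bound.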
<cwabbott> alyssa: you should have a look at the shaders the blob uses for indirect draws / geometry shaders / tess shaders :)
<alyssa> cwabbott: I saw the indirect draw mess yesterday
<alyssa> Needless to say I aim for ES3.0 only ;)
<cwabbott> I think it's just doing all the magic instance division stuff in the shader
<cwabbott> instead of in the driver
<alyssa> Well, the instancing magic is `only` 350 lines in Panfrost so I guess 200 lines of Midgard asm is reasonable t_t
<tomeu> jcureton: I'm asking because I used to test on a H6 and things didn't seem that bad
<jcureton> tomeu: i guess anecdotally most stuff seems to run without many issues. most of the ones i've run into have been blending related
<tomeu> ah, I see
<tomeu> jcureton: do you get anything in dmesg?
<jcureton> not in running dEQP, although with most applications the first job submission always results in a DATA_INVALID_FAULT before continuing
<jcureton> occasionally I'll see a job timeout, particularly running glmark. there are also some graphical glitches in glmark, although i don't remember which tests off the top of my head
<tomeu> ok, so rendering with blending is just wrong
<jcureton> ^ yes
<tomeu> ah, and some other issue I guess
<alyssa> Blending on SFBD is pretty different from MFBD
<alyssa> So I'm willing to believe we (...I) butchered it
<tomeu> I could dust off my H6, but not having the board in CI means it will break again
<alyssa> jcureton: The first job submit thing is probably why dEQP results are so terrible
<jcureton> hmm, maybe. dEQP doesn't trigger any faults in the kernel
<alyssa> m
<alyssa> Hm
pH5 has quit [Quit: bye]
TheCycoTWO has joined #panfrost
TheCycoONE has left #panfrost ["WeeChat 2.4"]
shadeslayer has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
yann has quit [Ping timeout: 246 seconds]
pH5 has joined #panfrost
raster has quit [Remote host closed the connection]
yann has joined #panfrost
stikonas has joined #panfrost
jernej has quit [Remote host closed the connection]
shadeslayer has joined #panfrost
* alyssa thinks of landing the compute shader preliminary code now
<alyssa> It's useless on its own (still needs SSBOs implemented and a few other things to actually do anything)
<alyssa> But it's a bunch of patches and it doesn't regress anything so ya know
afaerber has joined #panfrost
<alyssa> Anyway, for SSBO stuff, I'm really going to need to figure out how load/store arguments are passed
* alyssa fudges around for her notes
<alyssa> We talked about this like last week
<alyssa> Here 'tis
<alyssa> D'oh!
<alyssa> In an indirect load, `base address` and `offset` are not separate arguments
<alyssa> They're just two arguments that are added together, end of story
<alyssa> So they're totally commutative
<alyssa> Arguments themselves are 8-bit
<alyssa> For a register argument (like an indirect offset), we need to specify:
<alyssa> - Component (2-bit)
<alyssa> - Register (1-bit)
<alyssa> - Size (2-bit)
<alyssa> So we're up to 5-bits of 8
<alyssa> Something about the sizing is totally wrong tho :/
<alyssa> Yeah, size is wrong ummmm
<alyssa> But.. hrm
<alyssa> Let's do some decoding for the bottom 3-bits at least, since those are consistently sane for register inputs
<HdkR> alyssa: No 128bit load for SSBO?
<alyssa> HdkR: You can definitely do a 128-bit load?
<HdkR> size = element size then rather than full memory load size?
<HdkR> If the hardware could do a 128bit load then I'd expect 3bits for size to be able to do 8bit all the way to 128bit
<alyssa> HdkR: size being size of the register
<alyssa> And this is the register for offsets, not for the result itself
<HdkR> ah
<alyssa> You don't need 128-bit offsets.. yet ;)
<HdkR> 128bit offset would be a bit mad by today's standards yea :P
davidlt has quit [Ping timeout: 245 seconds]
<alyssa> Hum, we're raising more questions than we're answering
<HdkR> :D
<alyssa> Sweet!
<alyssa> We're nowhere near "there" yet, but already we can see that "r27-only ops" don't exist
<alyssa> If we don't set the "r27?" flag unconditionally, then we RA them onto r26, hah
<alyssa> So the next question is how to interpret the dummy args
<alyssa> E.g.: 0x7E
<alyssa> Which in any SSBO access seems to mean "ignore this arg / zero"
<alyssa> The top bit of arg_1 seems to mean "indirect?" in some cases but I recall there being exceptions
<alyssa> Ah, yes, indirect varyings are an exception
<alyssa> For varyings, 0x9E and 0x1E are ignored args
<alyssa> And indirect is specified by all bits being zero except bottom 3
* alyssa wonders if the 0x8 bit causes a field to be ignored/clamped to zero
<alyssa> It certainly matches all the data points
<alyssa> ----Wait. We can test these assumptions
<alyssa> It would appear `(arg & 0xF) == 0xE` is the real 'ignore argument' here
<alyssa> But that can't be totally right, since it still matters in some cases what's put in the upper nibble
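The working hypothesis above can be checked against the dummy values quoted in the discussion. The helper names are hypothetical, and the meaning of the upper nibble is still unknown at this point, so the sketch only extracts it without interpreting it.

```python
IGNORE_LOW_NIBBLE = 0xE  # from the chat: (arg & 0xF) == 0xE


def is_ignored_arg(arg):
    # Hypothesis: the low nibble alone marks an "ignore this arg / zero" slot.
    return (arg & 0xF) == IGNORE_LOW_NIBBLE


def upper_nibble(arg):
    # Still matters in some cases, but its meaning is not yet understood.
    return (arg >> 4) & 0xF


# Dummy arguments mentioned in the discussion.
observed = {0x7E: "SSBO dummy", 0x9E: "varying dummy", 0x1E: "varying dummy"}
decoded = {
    hex(arg): (is_ignored_arg(arg), upper_nibble(arg)) for arg in observed
}
```

All three observed values satisfy the predicate and differ only in the upper nibble, which fits the remark that the upper bits still carry some meaning.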
raster has joined #panfrost
stikonas has quit [Remote host closed the connection]
rhyskidd has quit [Remote host closed the connection]
<alyssa> Aaaaa, the register spilling, the register spilling!
<alyssa> Anything but the register spilling!
* alyssa faints
<HdkR> Maximum register spilling engaged
<anarsoul> alyssa: you still need spilling with 16-32 vec4?
<HdkR> You'll always need spilling
<HdkR> Nvidia has 256 and still needs spilling :P
<alyssa> anarsoul: 16 vec4 and definitely
<alyssa> Though preferably shaders only use 8 vec4
<alyssa> since stuff goes south after 8
<anarsoul> spilling is hard :(
<bnieuwenhuizen> I've had shaders spilling kilobytes per invocation, the sky is the limit :)
<alyssa> bnieuwenhuizen: shadertoy?
<anarsoul> bnieuwenhuizen: ouch
<HdkR> bnieuwenhuizen: Well, amount of memory is the limit :P
<bnieuwenhuizen> alyssa: not sure what made us originally detect it, but I had a piglit test somewhere. gfxbench5 the vulkan test also has it
<bnieuwenhuizen> HdkR: ever heard of swap? :P
<alyssa> bnieuwenhuizen: remind me why i'm writing an opencl impl again T_T
<alyssa> :p
<bnieuwenhuizen> opencl?
<bnieuwenhuizen> anyway, first time I saw was definitely OpenGL, so you can clearly have this badness in any API
<alyssa> bnieuwenhuizen: It's kind of a long rabbit hole I'm down under
<alyssa> Fixing load/store handling because I'm figuring out more about load/store ops
<alyssa> because I want to implement SSBOs
<alyssa> because I want GLES3.1 compute shaders
<bnieuwenhuizen> alyssa: every driver goes through it at some point. you'll get through it
<alyssa> because people are bugging about compute
<HdkR> alyssa: Bugging about compute or OpenCL specifically? :P
<alyssa> HdkR: Yes.
<HdkR> haha
* alyssa is hoping any compute will do and then Other People (possibly including Future Alyssa) deal with CL specifically
<HdkR> alyssa: You know at least one person is going to ask about it at XDC :)
<alyssa> HdkR: Who says I'm going to XDC?!
<alyssa> :p
<bnieuwenhuizen> alyssa: how are you giving your witchcraft talk then?
<HdkR> :D
<HdkR> That was a great talk description
<alyssa> bnieuwenhuizen: Oh, is that public now?
* bnieuwenhuizen has half a clue
<bnieuwenhuizen> I found it through my talk having a link to the main track and the link works for non-logged in visits
* alyssa can't tell if talks were released or y'all just have Special Privileges
* bnieuwenhuizen has no clue how to find it through the main page though
<alyssa> bnieuwenhuizen: I'm not seeing said link
raster has quit [Remote host closed the connection]
<bnieuwenhuizen> alyssa: your own talk description, there is a "main track" button
* alyssa doesn't see it
<alyssa> Oh, here tis
<alyssa> There are a bunch of cool talks on Wednesday and Thursday I'm going to miss, awww :(
<bnieuwenhuizen> alyssa: not coming for the full conference?
<alyssa> bnieuwenhuizen: Can't get away with skipping more than a day of class ;)
megi has quit [Quit: WeeChat 2.5]
<bnieuwenhuizen> alyssa: time to ask for recordings?
jernej has joined #panfrost
<alyssa> Ah-ha! It just occurred to me how to handle uniform/work split
<alyssa> Rather than promoting UBO reads to uniform registers, we should be optimistic.
<alyssa> And use a full set of 16 uniform registers.
<alyssa> Then, when RA fails, rather than spilling registers, we demote uniform registers to UBO reads ("spilling uniforms")
<alyssa> After doing that 8 times, if RA is still failing, well, yeah, start spilling real stuff I guess
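A toy model of the "spilling uniforms" idea. It assumes that each demoted uniform register frees exactly one work register, that the shader starts with 16 uniform and 8 work registers, and that register pressure is a single number; none of this is confirmed by the discussion beyond the 16 and 8 figures, and `allocate` is a hypothetical name.

```python
def allocate(pressure, max_uniforms=16, min_uniforms=8, base_work=8):
    """Return (uniform registers, work registers, needs_real_spill)."""
    uniforms = max_uniforms  # be optimistic: start with the full uniform set
    while True:
        # Assumption: every demoted uniform frees one work register.
        work = base_work + (max_uniforms - uniforms)
        if pressure <= work:
            return uniforms, work, False
        if uniforms == min_uniforms:
            # Demoted 8 times and RA still fails: spill work registers.
            return uniforms, work, True
        uniforms -= 1  # demote one uniform register to a UBO read


light = allocate(6)    # fits without demoting anything
medium = allocate(12)  # fits after demoting four uniforms
heavy = allocate(20)   # does not fit even after eight demotions
```

The appeal of this ordering is that a demoted uniform costs a UBO read, which should be cheaper than spilling a live work register to memory, so real spilling is only reached as a last resort.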
<alyssa> This should be a promising opt, I think
* alyssa will finish tmrw
pH5 has quit [Quit: bye]