alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
megi has quit [Ping timeout: 258 seconds]
vstehle has quit [Ping timeout: 245 seconds]
Elpaulo has quit [Read error: Connection reset by peer]
Elpaulo has joined #panfrost
chewitt has joined #panfrost
chewitt has quit [Quit: Zzz..]
chewitt has joined #panfrost
chewitt has quit [Client Quit]
vstehle has joined #panfrost
yann has quit [Ping timeout: 248 seconds]
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
cwabbott has joined #panfrost
pH5 has joined #panfrost
sphalerit has quit [Read error: Connection reset by peer]
calcprogrammer1 has quit [Remote host closed the connection]
EmilKarlson has quit [Read error: Connection reset by peer]
thefloweringash has quit [Remote host closed the connection]
megi has joined #panfrost
raster has joined #panfrost
thefloweringash has joined #panfrost
sphalerit has joined #panfrost
EmilKarlson has joined #panfrost
Depau has quit [Quit: ZNC 1.7.4 - https://znc.in]
Depau has joined #panfrost
buzzmarshall has joined #panfrost
jcureton has quit [Remote host closed the connection]
davidlt has joined #panfrost
cwabbott has quit [Quit: cwabbott]
afaerber has quit [Quit: Leaving]
cwabbott has joined #panfrost
buzzmarshall has quit [Quit: Leaving]
buzzmarshall has joined #panfrost
buzzmarshall has quit [Quit: Leaving]
buzzmarshall has joined #panfrost
tlwoerner has quit [Quit: Leaving]
yann has joined #panfrost
jcureton has joined #panfrost
<tomeu> robher: what's the plan regarding the BO cache in userspace and madvise?
<tomeu> ah, guess a BO could be marked as MADV_DONTNEED when placed in the cache, and MADV_NORMAL when taken out of it
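The idea tomeu describes can be sketched as a toy model. This is only an illustration, not the Panfrost implementation: the `BO` and `BOCache` classes, the `madvise` method with its "retained" return value, and `simulate_shrinker` are all hypothetical stand-ins for the real kernel interface, and the MADV_* names simply follow the wording in the message above.

```python
from dataclasses import dataclass, field

MADV_NORMAL = "normal"      # kernel must keep the backing pages
MADV_DONTNEED = "dontneed"  # kernel may reclaim the pages under memory pressure


@dataclass
class BO:
    size: int
    madv: str = MADV_NORMAL
    purged: bool = False  # set by the (simulated) kernel shrinker

    def madvise(self, advice: str) -> bool:
        """Hypothetical stand-in for a madvise ioctl; True if pages were retained."""
        self.madv = advice
        return not self.purged


@dataclass
class BOCache:
    free: list = field(default_factory=list)

    def put(self, bo: BO) -> None:
        # An idle BO's contents are disposable, so let the kernel reclaim it.
        bo.madvise(MADV_DONTNEED)
        self.free.append(bo)

    def get(self, size: int):
        # Walk a copy so entries can be dropped while iterating.
        for bo in list(self.free):
            if bo.size < size:
                continue
            self.free.remove(bo)
            # Pin the pages again; a purged BO is useless and is dropped.
            if bo.madvise(MADV_NORMAL):
                return bo
        return None  # caller falls back to a fresh allocation


def simulate_shrinker(cache: BOCache) -> None:
    """Pretend memory pressure: purge everything marked DONTNEED."""
    for bo in cache.free:
        if bo.madv == MADV_DONTNEED:
            bo.purged = True
```

The point of checking the return value in `get` is that a cached BO may have been reclaimed while it sat in the cache, so taking it out of the cache can fail and the caller has to be ready to allocate a new one.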
<alyssa> wens: Is it..? I thought the issue was us exposing OES_standard_derivatives (an extension) incorrectly
<alyssa> Or, wait, no, I'm confused with index_bias, mea culpa
<alyssa> chewitt: Could you try to figure out how to start weston on your board? Having troubles over SSH
pH5 has quit [Quit: bye]
yann has quit [Ping timeout: 268 seconds]
<alyssa> Never mind -- figured it out :)
<jcureton> alyssa: i'm pulling Mesa up past 5a7688fdecd7 ("panfrost: Use 64-bit descriptors globally"), and i'm seeing the same thing on two T720 platforms that rtp saw on T628 last week. kmscube gives a grey background but no cube. this is true on an Allwinner H6 (aarch64+T720) as well as an armv7l+T720 SoC I can't disclose. the armv7l+T720 works well prior to the 64-bit descriptor change.
<jcureton> any pointers on where to look? there's a lot of internal dependencies in that patch and all of the manual bit poking appears to only happen if ctx->is_t6xx
<EmilKarlson> are you people using linux-5.3-rc*, did you notice slowdown in non-panfrost graphics
<EmilKarlson> whatever designware controller
<alyssa> EmilKarlson: We've received reports that Panfrost gfx is slower in 5.3 than 5.2, but we were unable to reproduce this. If this applies to non-Panfrost as well, that'd be good to know..
<rtp> jcureton: you have the issue on 32 and 64 bit? maybe I'm totally wrong on this, but it might be interesting to look at this change https://gitlab.freedesktop.org/mesa/mesa/commit/83a1d5544a78b6f741523aa1689ab0c0941d549b
<jcureton> i do have the issue on 32 and 64 bit. i'm actually building right now with something comparable to that and will report back!
<EmilKarlson> alyssa: any tips on how one could profile this?
<alyssa> EmilKarlson: Oh, I'm not great with profilers..
<alyssa> jcureton: So, T720 support is still new to the tree, to preface this
<EmilKarlson> other option would be bisect I guess, but everything is a bit harder, when you don't have a benchmark with numbers
<alyssa> EmilKarlson: Try glmark2-es2-drm as a benchmark, and if that doesn't repro, glmark2-es2-wayland under weston
<EmilKarlson> does it specifically measure the kms part?
<EmilKarlson> glmark sounds a bit 3d'y
<alyssa> jcureton: It's possible some magic bits marked as T6xx are also needed on T720, hence why rtp was able to get it working on T6xx
<alyssa> EmilKarlson: Oh, no, glmark is specifically 3D..
<alyssa> jcureton: rtp sent in 397f9ba69fcaef17de5c8f639957743890fa7805 to fix T6xx after that commit, not sure if you cherry-picked that into your tree / if it's active (is_t6xx may not be set on T720 depending on whether you have 83a1d5544a78b6f741523aa1689ab0c0941d549b in your tree, as rtp linked)
<jcureton> alyssa: yeah, reverting 83a1d5544a78 where you change is_t6xx fixes 32-bit
<jcureton> testing on 64 now
calcprogrammer1 has joined #panfrost
<alyssa> Alright, that's good to know
<jcureton> also fixes 64
<alyssa> jcureton:
<alyssa> *curses tab complete, didn't mean to ping*
<alyssa> Hm, so we have a few options then
<alyssa> 1) Change is_t6xx to is_t72x_or_earlier (or something) and be satisfied with the pile of hacks
<alyssa> 2) Figure out specifically which is_t6xx-only changes are needed on T720 (it may be all or just a subset)
<alyssa> 3) Something else?
<jcureton> i can experiment on (2) and see which ones appear to be explicitly needed on T720
<alyssa> Go for it! :)
raster has quit [Read error: Connection reset by peer]
<jcureton> alyssa: just realized that the only magic bit that needs set is the one Arnaud fixed in 397f9ba69fca.. should have been obvious at the outset since that's what I built with :)
<rtp> I'm not sure that the is_t6xx is used somewhere else nowadays
<alyssa> jcureton: \o/
<alyssa> rtp: I suppose it isn't. Huh, right, a bunch of old is_t6xx properties turned out to be 32-bit hacks
<alyssa> So the 64-bit descriptors removed all that
<alyssa> I'd love to know what rtp's magic bit does
<rtp> I'd like to know that too :)
<alyssa> Maybe a chicken bit for some errata...?
<alyssa> All in due time, I suppose! :)
<alyssa> tomeu: daniels: Re your questions about SSBOs, they're handled identically to __global buffers in OpenCL
<alyssa> So all the CL-related RE applies directly here
<alyssa> So conceptually, allocate the SSBO as just any old BO
<alyssa> Upload the GPU side address as a uniform (sysval) and pass it to the shader
<alyssa> In the shader, use generic load/store ops with a direct address and move in that address from the uniform (with uint64 ops)
<alyssa> Stores to SSBOs are slightly more complex than the CL case
<alyssa> Before storing, we do:
<alyssa> ldst_op_9E [scalar temp], 96.xxxx, 0x1E00
<alyssa> We branch on that result (comparing to zero). If that's true, we skip over the actual store.
<alyssa> Guessing this is something related to memory barriers..?
<alyssa> This seems familiar, um
* alyssa greps notes
<alyssa> Ah-hah, I knew it looked familiar!
<alyssa> That same magic ldst (op 9E, address 96.xxxx) is also used:
<alyssa> - when reading from the stencil buffer in a Fragment shader
<alyssa> To read from a stencil buffer, the shader does a ld_color_buffer_8 op, with
<alyssa> r27.w = findMSB(that scalar temp)
<alyssa> - when reading from the depth buffer in certain Z formats (at which point we're identical to the stencil case)
<alyssa> - when reading from the color buffer... again similar but with a loop craziness
<alyssa> - a similar pattern is used in Pixel Local Storage writes (98.x instead of 96.x, discarding the write if the LSB isn't set)
<alyssa> It is NOT used for reads from PLS
<alyssa> Evidently it's some kind of barrier or atomic-hack or something..
* alyssa should read the SSBO spec
<daniels> hmm, the store-and-loop sounds super familiar ... wasn't that also used for frag writes?
<alyssa> daniels: Yeah, see above :)
<alyssa> Well, here it's a store-maybe
<alyssa> This occurs regardless of volatile/coherent/restrict qualifiers
<daniels> ah, before it was a load-maybe?
<daniels> it's such a shame we don't have the same joy on branches
<daniels> (can you see where i'm going with this ...)
<alyssa> daniels: Which joy? :P
<daniels> well, if you had a possibly-branch opcode, you could name it 'call-me-maybe'
<daniels> seems like a huge missed opportunity
<alyssa> Hay, I just met you
<alyssa> I'm feel hazy!
<alyssa> So here's an opcode
<alyssa> Now call me maybe
<daniels> (chorus repeats x5)
<alyssa> AHhhhhh!h!hhH!h!H!HH!
<alyssa> daniels: Helper invocations!
<alyssa> "Stores to image and buffer variables performed by helper invocations have no effect on the underlying image or buffer memory."
<alyssa> From page 122 of the GLSL ES 3.20 spec
<alyssa> gl_HelperInvocation is exposed, so this is easy to verify
<alyssa> Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
<alyssa> ("was that a lightbulb moment or were you fuzzing irssi for buffer overflows?" "Yes.")
<daniels> he!
<alyssa> gl_HelperInvocation = (ldst_op_9E(96.x, 0x1E00) == 0);
<daniels> nice find!
<alyssa> Thanks :)
<alyssa> daniels: TBF, nothing in this shader uses derivatives
<alyssa> Helper invos shouldn't even be enabled, so that check should be useless
<alyssa> But maybe the blob doesn't care
<alyssa> Wait, wat
<alyssa> ----Oh. Hm
<alyssa> Yeah, dunno, helper invos are disabled for that shader. Whatever. Onwards :p
<alyssa> Anyways, so it's zero for helper invos, nonzero for real invos. Wonder what it means for real invos.
<alyssa> (Maybe one of the other builtins. multisampling perhaps?)
<alyssa> Yeah, sure
<alyssa> gl_SampleMaskIn[0] is read in with ldst_op_9E(96.x, 0x1E00)
<alyssa> gl_MaxSamples = 16 so that's easy
<alyssa> So, conceptually, Midgard defines:
<alyssa> gl_HelperInvocation = gl_SampleMaskIn[0] != 0;
<alyssa> er wait
<alyssa> gl_HelperInvocation = gl_SampleMaskIn[0] == 0;
<alyssa> That is, "this is a helper invocation if it doesn't actually correspond to any samples"
<alyssa> gl_SampleID = ldst_op_9E(97.x, 0x1E00)
<alyssa> gl_SamplePosition is done by uploading sample positions as a UBO and then indexing by gl_SampleID
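The builtin mappings worked out above can be written down as a small model. This is a sketch under stated assumptions: `sample_mask_in` and `sample_id` stand for the values the log says come from the magic ldst reads (96.x and 97.x), the function names are hypothetical, and the entries in `EXAMPLE_POSITIONS` are made-up placeholder values, not real hardware sample positions.

```python
# Placeholder sample positions standing in for the UBO contents (made-up values).
EXAMPLE_POSITIONS = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]


def helper_invocation(sample_mask_in: int) -> bool:
    """gl_HelperInvocation: true when the invocation covers no samples."""
    return sample_mask_in == 0


def sample_position(sample_id: int, positions=EXAMPLE_POSITIONS):
    """gl_SamplePosition: index the uploaded position table by gl_SampleID."""
    return positions[sample_id]


def find_msb(value: int) -> int:
    """GLSL-style findMSB for non-negative values; -1 when no bit is set."""
    return value.bit_length() - 1
```

`find_msb` is included only because the stencil-read case earlier in the log feeds findMSB of the same scalar into r27.w; what the hardware does with that value is not established here.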
<alyssa> Anyways, to recap SSBOs:
<alyssa> - Regular buffers GPU-mapped
<alyssa> - Address passed as uniforms
<alyssa> - Read just like OpenCL
<alyssa> - Written just like OpenCL, but only if it's not a gl_HelperInvocation
<alyssa> - gl_HelperInvocation = (gl_SampleMaskIn[0] == 0) = (ldst_op_9E(96.x, 0x1E00) == 0)
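The recap above can be modelled as follows. This is a rough sketch, not driver or compiler code: `GPUMemory`, the `uniforms` dictionary, and the `ssbo_base` key are hypothetical names, and memory is simulated as a plain dictionary keyed by 64-bit address.

```python
MASK64 = (1 << 64) - 1


class GPUMemory:
    """Toy flat address space: address -> value."""

    def __init__(self):
        self.cells = {}

    def load(self, addr: int) -> int:
        return self.cells.get(addr & MASK64, 0)

    def store(self, addr: int, value: int) -> None:
        self.cells[addr & MASK64] = value


def ssbo_load(mem: GPUMemory, uniforms: dict, offset: int) -> int:
    # Reads are unconditional, as in the OpenCL __global case.
    return mem.load(uniforms["ssbo_base"] + offset)


def ssbo_store(mem: GPUMemory, uniforms: dict, offset: int,
               value: int, sample_mask_in: int) -> bool:
    # Helper invocations (sample mask == 0) must not touch buffer memory.
    if sample_mask_in == 0:
        return False
    mem.store(uniforms["ssbo_base"] + offset, value)
    return True
```

The only difference between the two paths is the guard on the store, which matches the branch-over-the-store pattern seen in the disassembly.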
<alyssa> Atomics are up next, I guess
stikonas has joined #panfrost
buzzmarshall has quit [Remote host closed the connection]
<alyssa> Er, images, I guess
yann has joined #panfrost
herbmillerjr has quit [Quit: Konversation terminated!]
* alyssa wonders why derivatives in both directions are computed for a single dFdx or single dFdy call
herbmillerjr has joined #panfrost
<HdkR> Woo helpers
<alyssa> HdkR: Helpers are enabled regardless
<alyssa> It's just doing 2x as much work as it should be for dfdx/dfdy alone
<alyssa> cwabbott: I'm looking into helper invo stuff
<alyssa> In a shader like:
<alyssa> texture(tex0, texture(tex1, coord).xy)
<alyssa> The first tex op actually has _both_ "cont" and "last" flags set
<alyssa> So maybe all the flags are inverted?
<alyssa> (The second tex op just has "last")
<alyssa> cont = keep the helper invo alive
<alyssa> er wait inverted again um
<alyssa> Likewise, for dFdx(texture...)
<alyssa> The texture op has _both_ cont and last set
<alyssa> And then the first four tex ops for derivs have just cont, and finally the last tex op (for deriv) has just last
<alyssa> Now, for texture ops with an explicit LOD (so no helper invos needed), we actually set _both_ cont and last
<alyssa> Is it possible maybe... hmm..
<alyssa> There are four cases, I think:
<alyssa> - Result not used by a future helper invo, no more future helper invos: LAST
<alyssa> - Result not used by a future helper invo, yes more future helper invos: CONT
<alyssa> - Result yes used by a future helper invo, no more future helper invos: [-]
<alyssa> - Result yes used by a future helper invo, yes more future helper invos: CONT/LAST
<alyssa> Seemingly, CONT means "keep helper invo alive"
<alyssa> And LAST means... "helper invo still alive == result used"? I guess?
<alyssa> "!cont || used" I guess
<alyssa> textureLod has both flags set unconditionally, it looks like (flags might be ignored for derivs?)
<alyssa> uh, instructions that don't compute derivatives
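The four cases listed above can be captured as a lookup table, which also makes it easy to compare them against the tentative "!cont || used" reading of LAST. This is only a sketch of the observations in the log: the bit values of `CONT` and `LAST` are arbitrary, the function names are hypothetical, and CONT is assumed to be set exactly when more helper invocations follow.

```python
CONT = 1  # arbitrary bit values for illustration
LAST = 2


def helper_flags(used_by_helper: bool, more_helpers: bool) -> int:
    """Flags observed for each of the four cases."""
    table = {
        (False, False): LAST,
        (False, True): CONT,
        (True, False): 0,
        (True, True): CONT | LAST,
    }
    return table[(used_by_helper, more_helpers)]


def guessed_last(cont: bool, used: bool) -> bool:
    """The tentative formula for LAST: !cont || used."""
    return (not cont) or used


def rows_where_guess_disagrees():
    """Return the (used, more) cases where the formula and the table differ."""
    bad = []
    for used in (False, True):
        for more in (False, True):
            flags = helper_flags(used, more)
            cont = bool(flags & CONT)
            last = bool(flags & LAST)
            if guessed_last(cont, used) != last:
                bad.append((used, more))
    return bad
```

Under these assumptions the formula reproduces three of the four rows; the "result used, no more helper invos" row is the one where it predicts LAST but the table lists no flags.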
* alyssa has some draft code implementing derivatives in the Mdg compiler
<alyssa> Codegen isn't terrible either
paulk-leonov has quit [Ping timeout: 250 seconds]
<alyssa> Only float/vec2
<alyssa> Still need a lowering (splitting) pass for vec3/vec4
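A splitting pass of the kind mentioned above might look roughly like this. It is a guess at the shape of such a lowering, not the actual Midgard compiler pass: `split_components` and `lower_derivative` are hypothetical names, and `native_op` stands for whatever derivative operation handles at most two components.

```python
def split_components(count: int, max_width: int = 2):
    """Split a vector of `count` components into (start, width) chunks."""
    if not 1 <= count <= 4:
        raise ValueError("expected a vector of 1 to 4 components")
    return [(start, min(max_width, count - start))
            for start in range(0, count, max_width)]


def lower_derivative(vec, native_op):
    """Apply a float/vec2-only op chunk by chunk and reassemble the result."""
    out = []
    for start, width in split_components(len(vec)):
        out.extend(native_op(vec[start:start + width]))
    return out
```

A vec3 becomes a vec2 chunk plus a scalar chunk, and a vec4 becomes two vec2 chunks, so the native operation never sees more than two components.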
paulk-leonov has joined #panfrost
stikonas has quit [Remote host closed the connection]
davidlt has quit [Ping timeout: 245 seconds]
<alyssa> Wrote a lowering pass but the GPU is still not quite happy
<alyssa> Oh, huh, there's a swizzle we need to set on the sampler itself
<HdkR> Welcome to GL texture swizzle extension? :D
<HdkR> Which is core in ES 3.0
<alyssa> HdkR: No, that's handled with one of the swizzles in the cmdstream
<HdkR> huh
<alyssa> There are like 5 swizzles associated with a given texture
<alyssa> Ok, now it works but throws a quality warning
<alyssa> For vec4 only
<alyssa> vec2/3 are fine
<alyssa> And only in highp, it's fine for mediump
<HdkR> Oh, swizzle is only defined in GL on the texture, not the sampler. I feel like I've seen it in sampler somewhere...
<HdkR> oop
<alyssa> My derivatives are failing along edges
<alyssa> Oy vey
<HdkR> :)
<HdkR> It's a typical problem
<alyssa> Last time I saw something like this it turned out I had helper invos disabled
<alyssa> They're definitely enabled now though :P
<HdkR> I'm assuming it is derivative usage while there are dead threads in the quad
* alyssa was under the impression that's what helper invos are for
<HdkR> Yep
<alyssa> OH!
<alyssa> idhsgfyuirfgresghkrughurilhgfduyksgriseuyfgresugselrg
<alyssa> That is all.
<HdkR> I understand
<alyssa> HdkR: The test in question is doing derivatives in a loop.
<HdkR> Fun :D
<alyssa> The skip/kill flags Connor identified (cont/last in Panfrost)
<alyssa> If we set the kill flag (so cyberpunk), helper thread goes poof
<alyssa> So derivatives are wrong for iterations > 1
<HdkR> Makes sense
<calcprogrammer1> I tried to run OpenJK (open source Jedi Academy, Quake 3 based engine) and I can move around in spectate mode in an empty map but as soon as a player/bot joins it crashes and spams PIPE_FORMAT_R16G16B16A16_UNORM. Just wanted to let you know if you haven't tested this game, was just experimenting with the latest build of Mesa.
<HdkR> Time to take control flow into account while determining if kill should be used :P
<alyssa> calcprogrammer1: Hm, I haven't tested that game
<alyssa> RGBA16_UNORM should theoretically be supported.. wonder what it's doing with the format (vertex? texture? render target?)
<alyssa> calcprogrammer1: Could you grab a backtrace, please? :)
<calcprogrammer1> mind walking me through how to do that?
<alyssa> $ gdb [program]
<alyssa> r
<calcprogrammer1> going to try openarena as well since it's a similar engine
<alyssa> (do whatever to make it crash)
<alyssa> bt
<alyssa> (Paste output)
<calcprogrammer1> ok
<HdkR> Looks like it is using texrect + rgba16
<alyssa> HdkR: Added a dumb heuristic of "always keep helpers alive if loops are used", problem solved .. :P
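The heuristic described above can be sketched like this. It is an illustrative model only: `TexOp` and `keep_helpers_alive` are hypothetical names, and the straight-line rule (keep helpers alive until the last op that needs them) is an assumption added for contrast, not something the log spells out.

```python
from dataclasses import dataclass


@dataclass
class TexOp:
    needs_helpers: bool  # e.g. implicit-LOD texturing or a derivative


def keep_helpers_alive(ops, has_loops: bool):
    """Return one bool per op: True means helper threads survive past it."""
    if has_loops:
        # A later loop iteration may still need helpers, so never kill them.
        return [True] * len(ops)
    # Straight-line code: helpers can go after the last op that needs them.
    last_needed = max(
        (i for i, op in enumerate(ops) if op.needs_helpers), default=-1
    )
    return [i < last_needed for i in range(len(ops))]
```

With a loop present every op keeps helpers alive, which avoids the wrong-derivatives-after-the-first-iteration failure at the cost of running helper threads longer than strictly necessary.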
<HdkR> alyssa: pfft
<HdkR> How does that behave in a world with SSBOs and image stores? :P
<alyssa> HdkR: As I discovered this morning, SSBO stores are wrapped in "if(!gl_HelperInvocation) { ... }"
<alyssa> I would assume image stores work the same way
<HdkR> Ah, you wrap stores in a helper invocation check yourself?
<alyssa> HdkR: Well, I haven't implemented SSBOs or images yet, but yes, that's required on mdg
<HdkR> :D
<alyssa> Oh, and then there's this weirdo unknown4 flag
<alyssa> "unknown4 = 0x1" when the results of texturing are fed back into derivatives, I guess
<alyssa> (Set on the derivative, I mean)
<HdkR> flag on texture op?
<alyssa> Yeah
<alyssa> (Derivatives are special texture ops)
<HdkR> Does behaviour change when it is set to zero?
<alyssa> TbD
<alyssa> TBD
<HdkR> hehe