alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
vstehle has quit [Ping timeout: 245 seconds]
megi has quit [Ping timeout: 252 seconds]
_whitelogger has joined #panfrost
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 245 seconds]
davidlt has joined #panfrost
tlwoerner has joined #panfrost
rhyskidd has joined #panfrost
vstehle has joined #panfrost
davidlt has quit [Ping timeout: 272 seconds]
davidlt has joined #panfrost
yann has quit [Ping timeout: 245 seconds]
pH5 has joined #panfrost
yann has joined #panfrost
megi has joined #panfrost
raster has joined #panfrost
jcureton has quit [Remote host closed the connection]
warpme_ has joined #panfrost
warpme_ has quit [Quit: warpme_]
Depau has quit [Ping timeout: 272 seconds]
Depau has joined #panfrost
davidlt has quit [Remote host closed the connection]
davidlt has joined #panfrost
warpme_ has joined #panfrost
warpme_ has quit [Quit: warpme_]
megi has quit [Quit: WeeChat 2.5]
warpme_ has joined #panfrost
cwabbott has quit [Quit: cwabbott]
<tomeu> robher: ^
cwabbott has joined #panfrost
jcureton has joined #panfrost
<narmstrong> tomeu: robher: seems to crash with CONFIG_CPU_BIG_ENDIAN=y and CONFIG_RANDOMIZE_BASE=y very weird
<robher> narmstrong: with something recent going into -next?
<tomeu> gtucker: has there been a bisection to go with that change?
<tomeu> s/change/report
<narmstrong> tomeu: let me analyse which board crashes, it may be a power issue we already solved on other boards
TheCycoONE has quit [Quit: ZNC 1.7.4 - https://znc.in]
TheCycoONE has joined #panfrost
herbmilleriw has quit [Quit: Konversation terminated!]
guillaume_g has joined #panfrost
* alyssa will have to do the dataflow analysis pass for texturing and is rather dreading it
<alyssa> But the good news is derivatives *are* working ok
guillaume_g has quit [Quit: Konversation terminated!]
herbmilleriw has joined #panfrost
<wens> yay!
JuJu has joined #panfrost
<alyssa> wens: What was the test for derivatives in chromium?
<wens> just open any page other than the default blank page
<wens> even chrome://gpu would do
<wens> if it crashes, i.e. aborts, then it doesn't work
<alyssa> chrome://gpu has rendered for a bit, I had thought? \shrug/
<alyssa> wens: It's extremely WIP (maybe wait until tomorrow for it to be cleaned up and hit master)
<alyssa> but if you're antsy to try something, deriv-v3 branch at gitl.fd.o/tomeu/mesa
warpme_ has quit [Quit: warpme_]
megi has joined #panfrost
pH5 has quit [Quit: bye]
warpme_ has joined #panfrost
ezequielg has quit [Ping timeout: 264 seconds]
ezequielg has joined #panfrost
<alyssa> Oh, I have a bunch of opts from last week which never got pushed
<alyssa> let's clean that up
megi has quit [Quit: WeeChat 2.5]
warpme_ has quit [Quit: warpme_]
yann has quit [Ping timeout: 245 seconds]
raster has quit [Remote host closed the connection]
<alyssa> So I'm working on representing conditionals explicitly in MIR
<alyssa> (As additional args, basically)
<alyssa> Moving the r31 magic from codegen to scheduling
<alyssa> The tricky part is that now we have to insert instructions in scheduling (part 2)
<alyssa> Which in turn modifies the schedule, etc
* alyssa needs to think about this
<alyssa> Maybe it's easiest to do some backtracking
<alyssa> Oh, no, even easier
<alyssa> When scheduling an op taking conditions, check back to see if all were emitted this bundle
<alyssa> If no, break the bundle and insert moves then and there
<alyssa> (Dummy moves, cond=cond)
<alyssa> Then do a pass after scheduling to create the pipeline registers (the r31 writes)
<alyssa> [Just to take some logic out of the scheduler]
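A toy sketch of the bundle-breaking idea described above, not the actual Midgard scheduler. The op dictionaries, the field names ("dest", "conds", "ends_bundle") and the `schedule` function are all invented for illustration, and real bundle constraints and the later r31 pass are not modelled.

```python
def schedule(ops):
    """Group ops into bundles, breaking a bundle where a condition is stale.

    Hypothetical model: each op is a dict with "name" and "dest", plus
    optional "conds" (conditions it reads) and "ends_bundle".
    """
    bundles = [[]]
    for op in ops:
        # Which values were written earlier in the bundle we are filling?
        emitted = {o["dest"] for o in bundles[-1]}
        missing = [c for c in op.get("conds", ()) if c not in emitted]
        if missing:
            if bundles[-1]:
                bundles.append([])  # break the bundle
            for cond in missing:
                # Dummy move, cond = cond, so the condition is fresh again
                bundles[-1].append({"name": "mov", "dest": cond, "srcs": [cond]})
        bundles[-1].append(op)
        if op.get("ends_bundle"):
            bundles.append([])
    return [b for b in bundles if b]


stale = schedule([
    {"name": "cmp", "dest": "c0", "ends_bundle": True},
    {"name": "add", "dest": "t0"},
    {"name": "csel", "dest": "t1", "conds": ["c0"]},
])
fresh = schedule([
    {"name": "cmp", "dest": "c0"},
    {"name": "csel", "dest": "t1", "conds": ["c0"]},
])
print([[o["name"] for o in b] for b in stale])
print([[o["name"] for o in b] for b in fresh])
```

In the first case the condition was produced two bundles earlier, so the sketch breaks the bundle and inserts a move; in the second the producer and consumer share a bundle and nothing is inserted.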
<alyssa> Um, in advance of everything, let's fix branching/const special casing
<alyssa> Since they're just not that special
MistahDarcy has joined #panfrost
stikonas has joined #panfrost
megi has joined #panfrost
<alyssa> Oh, more cascading kittens...
* alyssa meows
<alyssa> (We also need to insert moves if a condition is read as a non-condition)
<alyssa> Mooooooooo
<alyssa> This is a contradiction?!
<alyssa> Ac
<alyssa> k
<alyssa> Midgard what even
<alyssa> These branch combiners are what-in-the-world
yann has joined #panfrost
<alyssa> Hrm.
<alyssa> Okay, so a branch can take up to 4 conditions
<alyssa> r31.w for scalar units, r31.x for vector units
<alyssa> NOT modifiers are evidently not per source
<alyssa> Conceptually, we have 16-bits available
<alyssa> The fact that it's able to do these logic ops is insane by itself..
<alyssa> So, taking a complement of the condition corresponds to complementing the entire 16-bits, ummm ok
<alyssa> This is just so.. chaotic
<alyssa> OH!
<alyssa> ...Hm
<alyssa> I thought I was so clever for a second
<alyssa> No, I'm still on the right track, I think?
<alyssa> Got it.
<alyssa> The key realization was wondering, not how I would encode the function in software, but how a hardware designer would do it
<alyssa> Semi-formally: We have four booleans in A, B, C, D. We output a single boolean
<alyssa> f(A, B, C, D) -> Q
<alyssa> So the branching field is a Godel numbering of the function f.
<alyssa> But here's the key realization: for 4 independent 1-bit inputs, there are 2^4 = 16 elements in f's domain
<alyssa> Each gets mapped to a 1-bit element, so we can write out the entire behaviour of f as a 16-bit number
<alyssa> And how big is our branching field? Why, exactly 16-bit! :)
<alyssa> It was around this point (combined with the fact that inverting the condition inverts the number) that I realized...
<alyssa> This is just a lookup table.
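A minimal sketch of the lookup-table idea, assuming (purely for illustration) that input A is the most significant bit of the table index. The helper name `lut16` is made up here and this is not the lut.py script mentioned in the log. Under that assumption it reproduces the two facts stated above: A&B&C&D packs to 0x8000, and negating the function complements all 16 bits.

```python
from itertools import product


def lut16(f):
    """Pack the truth table of a 4-input boolean function into 16 bits.

    Assumption: A is the most significant bit of the table index.
    """
    code = 0
    for a, b, c, d in product((0, 1), repeat=4):
        index = (a << 3) | (b << 2) | (c << 1) | d
        if f(a, b, c, d):
            code |= 1 << index
    return code


all_true = lut16(lambda a, b, c, d: a and b and c and d)
not_all = lut16(lambda a, b, c, d: not (a and b and c and d))

print(hex(all_true))  # 0x8000
print(hex(not_all))   # 0x7fff
```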
<alyssa> So I whipped up a Python script to try to figure out how exactly the scheme worked
<alyssa> I assumed, based on A&B&C&D mapped to 0x8000, that it was just the four elements in order, etc
<alyssa> That didn't work, but comparing the binary forms of my results against the known references, the number of 0s and 1s was right
<alyssa> Just the swizzle was off
<alyssa> So at this point I modified the script to bruteforce the ordering, and one order -- ACDB -- quickly revealed itself as the winner
<alyssa> And that's it :)
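The brute-force step can be sketched roughly as below. This is not the original script: the reference codes are generated from a hidden ordering inside the example rather than taken from hardware captures, and reading an ordering string as "most to least significant index bit" is an assumption. The names `lut16` and `find_orders` are hypothetical.

```python
from itertools import permutations, product

NAMES = "ABCD"


def lut16(f, order):
    """Truth table of f(A, B, C, D) as 16 bits.

    `order` lists the inputs from most to least significant index bit
    (an assumed layout, chosen only for this sketch).
    """
    code = 0
    for bits in product((0, 1), repeat=4):
        env = dict(zip(order, bits))  # order[0] gets the top index bit
        index = int("".join(map(str, bits)), 2)
        if f(*(env[n] for n in NAMES)):
            code |= 1 << index
    return code


def find_orders(references):
    """Return every ordering consistent with all (function, code) pairs."""
    return [
        "".join(order)
        for order in permutations(NAMES)
        if all(lut16(f, order) == code for f, code in references)
    ]


# Synthetic references standing in for known-good encodings
functions = [
    lambda a, b, c, d: a and b and c and d,
    lambda a, b, c, d: a,
    lambda a, b, c, d: b,
    lambda a, b, c, d: c and not d,
]
hidden = "ACDB"
references = [(f, lut16(f, hidden)) for f in functions]

print(len(find_orders(references[:1])))  # A&B&C&D alone fits every ordering
print(find_orders(references))
```

A symmetric function such as A&B&C&D cannot distinguish orderings, so the search only narrows once asymmetric references are included.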
<alyssa> Any questions? :P
<alyssa> (https://people.collabora.com/~alyssa/lut.py if class wants to follow along in their books ;P)
<alyssa> Case closed? Case _not_ closed. What about the codes for 2-arg and 3-arg?
<alyssa> Well, let's do 3-arg next
<alyssa> 2^3 = 8-bit codes, which we see as using only the bottom 8 bits and duplicating them into the upper 8 bits
<alyssa> Having troubles narrowing the ordering for 3-arg
<alyssa> Either XZY or ZXY
<alyssa> Regardless, the other question is _how on earth do we print this_
<alyssa> Like, in the disassembly
<alyssa> I guess theoretically you could reconstruct expressions from the LUT (a truth table essentially)
<alyssa> But that would require an algebra system to do well, quite outside the scope of our disassembler :p
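For illustration only, the naive reconstruction is one product term per set bit of the LUT, with no simplification at all. The function name `lut_to_dnf`, the `!` / `&` / `|` notation, and the assumption that the first name is the most significant index bit are all choices made for this sketch, not the disassembler's actual output format.

```python
def lut_to_dnf(code, names="ABCD"):
    """Render a LUT as an unminimised sum of products.

    Assumption: names[0] is the most significant bit of the table index.
    """
    n = len(names)
    terms = []
    for index in range(1 << n):
        if not (code >> index) & 1:
            continue  # this input combination yields 0
        literals = []
        for pos, name in enumerate(names):
            bit = (index >> (n - 1 - pos)) & 1
            literals.append(name if bit else "!" + name)
        terms.append(" & ".join(literals))
    if not terms:
        return "false"
    return " | ".join("(" + t + ")" for t in terms)


print(lut_to_dnf(0x8000))  # (A & B & C & D)
print(lut_to_dnf(0x0003))  # (!A & !B & !C & !D) | (!A & !B & !C & D)
```

The second output is logically just `!A & !B & !C`, which shows why readable output would need a minimisation step on top of this.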
davidlt has quit [Ping timeout: 245 seconds]
<alyssa> Ack, wow, okay
<alyssa> When we're only dealing with a single condition, take a look at this enum:
<alyssa> cond_[special-mostly-unused] = 0
<alyssa> cond_false = 1
<alyssa> cond_true = 2
<alyssa> cond_always = 3
<alyssa> Just a bunch of cases, right? What happens when you look at it in *binary*?
<alyssa> 00: never
<alyssa> 01: false
<alyssa> 10: true
<alyssa> 11: always
<alyssa> Using the usual numbering scheme, that's a 2-bit LUT
<alyssa> When c=0, it's 1 for "false" or "always" modes. When c=1, it's 1 for "true" or "always" modes.
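A small sketch of that reading: treat the 2-bit mode as a truth table and index it with the condition bit c. The names `COND_MODES` and `branch_taken` are invented for the example; the mode values and labels are the ones listed above.

```python
COND_MODES = {0b00: "never", 0b01: "false", 0b10: "true", 0b11: "always"}


def branch_taken(mode, c):
    """Index the 2-bit mode with the condition bit c (0 or 1)."""
    return bool((mode >> c) & 1)


for mode, name in COND_MODES.items():
    print(name, [branch_taken(mode, c) for c in (0, 1)])
```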
herbmilleriw has quit [Remote host closed the connection]
herbmilleriw has joined #panfrost
stikonas has quit [Remote host closed the connection]
herbmilleriw has quit [Ping timeout: 245 seconds]
herbmilleriw has joined #panfrost
herbmilleriw has quit [Ping timeout: 245 seconds]
<alyssa> I still don't understand when r31.x vs r31.w
<alyssa> It's something about scheduling but.. :/
<alyssa> r31.x for vector, r31.w for scalar but... that still isn't quite right
<alyssa> if (all(equal(...))) discard;
<alyssa> This triggers some _very_ interesting assembly
<alyssa> First of all, it's triggering a new outmod (.unk2)
<alyssa> (on the ball_eq, which is on a vmul)
<alyssa> That's going to generic r31, unmasked (?)
<alyssa> And then we have on the branch a lut condition F0F0
<alyssa> If you write out the truth table for F0, you see that condition is just for grabbing the one element
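One way to check this mechanically is to ask which index bits of a LUT actually change its output. The sketch below is only an illustration; `depends_on` is a made-up helper, and bit positions refer to bits of the table index, not to any particular hardware source.

```python
def depends_on(code, n_inputs):
    """Return the index-bit positions that affect an n-input LUT's output."""
    used = []
    for bit in range(n_inputs):
        for index in range(1 << n_inputs):
            flipped = index ^ (1 << bit)
            # If toggling this bit ever changes the output, the bit matters
            if ((code >> index) & 1) != ((code >> flipped) & 1):
                used.append(bit)
                break
    return used


print(depends_on(0xF0, 3))           # [2]
print(depends_on(0xF0F0, 4))         # [2]
print(hex((0xF0 << 8) | 0xF0))       # 0xf0f0
```

Both the 8-bit F0 and the duplicated 16-bit F0F0 depend on a single index bit, so the table just passes that one input through.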
<alyssa> Oh, .unk2 is set for ball always
<alyssa> Bet I just messed up the disasm for type converts
<alyssa> Oh ball/bany are even more messed up than I thought O-o
<alyssa> Ok, fixed like 3 bugs with ball/bany um ok, back to what we were doing
<alyssa> Okay, down to two weirdnesses:
<alyssa> 1) Why isn't it masking a single component?!
<alyssa> 2) Why can't we index directly?