vstehle has quit [Ping timeout: 245 seconds]
megi has quit [Ping timeout: 252 seconds]
_whitelogger has joined #panfrost
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 245 seconds]
davidlt has joined #panfrost
tlwoerner has joined #panfrost
rhyskidd has joined #panfrost
vstehle has joined #panfrost
davidlt has quit [Ping timeout: 272 seconds]
davidlt has joined #panfrost
yann has quit [Ping timeout: 245 seconds]
pH5 has joined #panfrost
yann has joined #panfrost
megi has joined #panfrost
raster has joined #panfrost
jcureton has quit [Remote host closed the connection]
warpme_ has joined #panfrost
warpme_ has quit [Quit: warpme_]
Depau has quit [Ping timeout: 272 seconds]
Depau has joined #panfrost
davidlt has quit [Remote host closed the connection]
davidlt has joined #panfrost
warpme_ has joined #panfrost
warpme_ has quit [Quit: warpme_]
megi has quit [Quit: WeeChat 2.5]
warpme_ has joined #panfrost
cwabbott has quit [Quit: cwabbott]
cwabbott has joined #panfrost
jcureton has joined #panfrost
<
narmstrong>
tomeu: robher: seems to crash with CONFIG_CPU_BIG_ENDIAN=y and CONFIG_RANDOMIZE_BASE=y very weird
<
robher>
narmstrong: with something recent going into -next?
<
tomeu>
gtucker: has there been a bisection to go with that change?
<
tomeu>
s/change/report
<
narmstrong>
tomeu: let me analyse which board crashes, it may be a power issue we already solved on other boards
TheCycoONE has joined #panfrost
herbmilleriw has quit [Quit: Konversation terminated!]
guillaume_g has joined #panfrost
* alyssa
will have to do the dataflow analysis pass for texturing and is rather dreading it
<
alyssa>
But the good news is derivatives *are* working ok
guillaume_g has quit [Quit: Konversation terminated!]
herbmilleriw has joined #panfrost
JuJu has joined #panfrost
<
alyssa>
wens: What was the test for derivatives in chromium?
<
wens>
just open any page other than the default blank page
<
wens>
even chrome://gpu would do
<
wens>
if it crashes, i.e. aborts, then it doesn't work
<
alyssa>
chrome://gpu has rendered for a bit I had thought? \shrug/
<
alyssa>
wens: It's extremely WIP (maybe wait until tomorrow for it to be cleaned up and hit master)
<
alyssa>
but if you're antsy to try something, deriv-v3 branch at gitl.fd.o/tomeu/mesa
warpme_ has quit [Quit: warpme_]
megi has joined #panfrost
pH5 has quit [Quit: bye]
warpme_ has joined #panfrost
ezequielg has quit [Ping timeout: 264 seconds]
ezequielg has joined #panfrost
<
alyssa>
Oh, I have a bunch of opts from last week which never got pushed
<
alyssa>
let's clean that up
megi has quit [Quit: WeeChat 2.5]
warpme_ has quit [Quit: warpme_]
yann has quit [Ping timeout: 245 seconds]
raster has quit [Remote host closed the connection]
<
alyssa>
So I'm working on representing conditionals explicitly in MIR
<
alyssa>
(As additional args, basically)
<
alyssa>
Moving the r31 magic from codegen to scheduling
<
alyssa>
The tricky part is that now we have to insert instructions in scheduling (part 2)
<
alyssa>
Which in turn modifies the schedule, etc
* alyssa
needs to think about this
<
alyssa>
Maybe it's easiest to do some backtracking
<
alyssa>
Oh, no, even easier
<
alyssa>
When scheduling an op taking conditions, check back to see if all were emitted this bundle
<
alyssa>
If no, break the bundle and insert moves then and there
<
alyssa>
(Dummy moves, cond=cond)
<
alyssa>
Then do a pass after scheduling to create the pipeline registers (the r31 writes)
<
alyssa>
[Just to take some logic out of the scheduler]
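A rough sketch of the bundle-breaking idea described above, not the actual Midgard scheduler. Everything here is a hypothetical stand-in: the `Op` record, the `schedule` function, and the fixed `capacity` per bundle (real bundles are constrained by execution units, not a simple count). The later pass that turns the dummy moves into r31 writes is not shown.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Op:
    name: str
    dest: Optional[str] = None      # value written by this op, if any
    conds: Tuple[str, ...] = ()     # condition values this op reads


def schedule(ops, capacity=4):
    """Greedily pack ops into bundles (hypothetical fixed capacity)."""
    bundles = [[]]
    for op in ops:
        # Capacity-driven break: the current bundle is full.
        if len(bundles[-1]) >= capacity:
            bundles.append([])
        written_here = {o.dest for o in bundles[-1]}
        if any(c not in written_here for c in op.conds):
            # Condition-driven break: some condition was not emitted in this
            # bundle, so start a fresh one and re-emit every condition with a
            # dummy move (cond = cond) right before the consumer.
            if bundles[-1]:
                bundles.append([])
            for c in op.conds:
                bundles[-1].append(Op(f"mov {c}, {c}", dest=c))
        bundles[-1].append(op)
    return bundles


ops = [
    Op("cmp", dest="c0"),
    Op("add", dest="t0"),
    Op("csel", dest="t1", conds=("c0",)),
]
split = [[o.name for o in b] for b in schedule(ops, capacity=2)]
fused = [[o.name for o in b] for b in schedule(ops, capacity=4)]
print(split)
print(fused)
```

With room for all three ops in one bundle no move is needed; with a capacity of two the consumer lands in a new bundle and gets a dummy move first. The sketch does not check that the inserted moves plus the consumer still fit within `capacity`.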
<
alyssa>
Um, in advance of everything, let's fix branching/const special casing
<
alyssa>
Since they're just not that special
MistahDarcy has joined #panfrost
stikonas has joined #panfrost
megi has joined #panfrost
<
alyssa>
Oh, more cascading kittens...
<
alyssa>
(We also need to insert moves if a condition is read as a non-condition)
<
alyssa>
Mooooooooo
<
alyssa>
This is a contradiction?!
<
alyssa>
Midgard what even
<
alyssa>
These branch combiners are what-in-the-world
yann has joined #panfrost
<
alyssa>
Okay, so a branch can take up to 4 conditions
<
alyssa>
r31.w for scalar units, r31.x for vector units
<
alyssa>
NOT modifiers are evidently not per source
<
alyssa>
Conceptually, we have 16-bits available
<
alyssa>
The fact that it's able to do these logic ops is insane by itself..
<
alyssa>
So, taking a complement of the condition corresponds to complementing the entire 16-bits, ummm ok
<
alyssa>
This is just so.. chaotic
<
alyssa>
I thought I was so clever for a second
<
alyssa>
No, I'm still on the right track, I think?
<
alyssa>
The key realization was wondering, not how I would encode the function in software, but how a hardware designer would do it
<
alyssa>
Semi-formally: We have four booleans in A, B, C, D. We output a single boolean
<
alyssa>
f(A, B, C, D) -> Q
<
alyssa>
So the branching field is a Gödel numbering of the function f.
<
alyssa>
But here's the key realization: for 4 independent 1-bit inputs, there are 2^4 = 16 elements in f's domain
<
alyssa>
Each gets mapped to a 1-bit element, so we can write out the entire behaviour of f as a 16-bit number
<
alyssa>
And how big is our branching field? Why, exactly 16-bit! :)
<
alyssa>
It was around this point (combined with the fact that inverting the condition inverts the number) that I realized...
<
alyssa>
This is just a lookup table.
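A minimal sketch of the lookup-table reading, assuming the simplest convention: bit i of the field is f's output for input row i, with A as the most significant bit of the row index. The name `lut16` and that bit convention are illustrative assumptions, and the convention is the naive one that turns out not to match the hardware ordering.

```python
def lut16(f, order="ABCD"):
    """Pack the truth table of a 4-input boolean function into 16 bits.

    Assumption: bit `index` of the result is f's output for the row whose
    index has the first name in `order` as its most significant bit.
    """
    names = "ABCD"
    table = 0
    for index in range(16):
        # Decode the row index into one bit per input name.
        bits = {name: (index >> (3 - pos)) & 1 for pos, name in enumerate(order)}
        if f(*(bits[n] for n in names)):
            table |= 1 << index
    return table


and_all = lut16(lambda a, b, c, d: a & b & c & d)
nand_all = lut16(lambda a, b, c, d: not (a & b & c & d))
print(hex(and_all))   # 0x8000
print(hex(nand_all))  # 0x7fff
```

Complementing the function complements all 16 bits, which lines up with the observation that inverting the condition inverts the number.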
<
alyssa>
So I whipped up a Python script to try to figure out how exactly the scheme worked
<
alyssa>
I assumed, based on A&B&C&D mapped to 0x8000, that it was just the four elements in order, etc
<
alyssa>
That didn't work, but analyzing the binary forms of my results with the known references, the number of 0s and 1s were right
<
alyssa>
Just the swizzle was off
<
alyssa>
So at this point I modified the script to bruteforce the ordering, and one order -- ACDB -- quickly revealed itself as the winner
<
alyssa>
And that's it :)
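The brute-force step could look roughly like this. This is not the original script: the reference codes below are synthetic, generated from the reported ACDB order under an assumed bit convention (bit i is the output for row i, first input is the most significant index bit), so they stand in for real observed encodings without being hardware data. `lut16`, `find_orders`, and the probe functions are hypothetical names.

```python
from itertools import permutations

NAMES = "ABCD"


def lut16(f, order):
    """16-bit truth table of f, with inputs packed into the row index in `order`."""
    table = 0
    for index in range(16):
        bits = {name: (index >> (3 - pos)) & 1 for pos, name in enumerate(order)}
        if f(*(bits[n] for n in NAMES)):
            table |= 1 << index
    return table


def find_orders(references):
    """Return every input ordering consistent with all (function, code) pairs."""
    return [
        "".join(order)
        for order in permutations(NAMES)
        if all(lut16(f, order) == code for f, code in references)
    ]


# Asymmetric probe functions, so each one pins down where its inputs sit.
probes = [
    lambda a, b, c, d: a and not b,
    lambda a, b, c, d: b and not c,
    lambda a, b, c, d: c and not d,
]

# Synthetic "known references": encoded with the order the log reports.
references = [(f, lut16(f, "ACDB")) for f in probes]

matches = find_orders(references)
print(matches)  # ['ACDB']
```

Symmetric probes such as A&B&C&D encode identically under every ordering, which is why a reference like 0x8000 alone cannot reveal the swizzle.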
<
alyssa>
Any questions? :P
<
alyssa>
Case closed? Case _not_ closed. What about the codes for 2-arg and 3-arg?
<
alyssa>
Well, let's do 3-arg next
<
alyssa>
2^3 = 8-bit codes, which we see as using only the bottom 8-bit and duplicating it into the upper 8-bit
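A small sketch of that duplication, assuming the 16-bit field is indexed by a 4-bit row number whose top bit belongs to the unused fourth input. `widen_lut8` is a hypothetical helper name.

```python
def widen_lut8(code8):
    """Copy an 8-bit, 3-input table into both halves of a 16-bit field."""
    if not 0 <= code8 <= 0xFF:
        raise ValueError("expected an 8-bit table")
    return (code8 << 8) | code8


wide = widen_lut8(0xF0)
print(hex(wide))  # 0xf0f0

# Under the stated assumption, the top index bit no longer matters:
# rows i and i + 8 always give the same answer.
same = all(((wide >> i) & 1) == ((wide >> (i + 8)) & 1) for i in range(8))
print(same)  # True
```

Read this way, duplicating the byte is exactly what makes the result independent of the missing fourth argument.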
<
alyssa>
Having troubles narrowing the ordering for 3-arg
<
alyssa>
Either XZY or ZXY
<
alyssa>
Regardless, the other question is _how on earth do we print this_
<
alyssa>
Like, in the disassembly
<
alyssa>
I guess theoretically you could reconstruct expressions from the LUT (a truth table essentially)
<
alyssa>
But that would require an algebra system to do well, quite outside the scope of our disassembler :p
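For reference, the naive reconstruction needs no algebra system: print one AND term per set bit of the table (an unminimized sum of products). The sketch below assumes bit i is the output for row i with the first name as the most significant index bit; `lut_to_expr` is a hypothetical name and this is not what the disassembler does.

```python
def lut_to_expr(table, names="ABCD"):
    """Render a truth table as an unminimized sum-of-products string."""
    n = len(names)
    rows = 1 << n
    full = (1 << rows) - 1
    table &= full
    if table == 0:
        return "false"
    if table == full:
        return "true"
    terms = []
    for index in range(rows):
        if (table >> index) & 1:
            literals = []
            for pos, name in enumerate(names):
                bit = (index >> (n - 1 - pos)) & 1
                literals.append(name if bit else "!" + name)
            terms.append("(" + " & ".join(literals) + ")")
    return " | ".join(terms)


print(lut_to_expr(0x8000))        # (A & B & C & D)
print(lut_to_expr(0b0110, "AB"))  # (!A & B) | (A & !B)
```

The output is correct but can run to 15 terms for a 4-input table, which is where a minimizer would be needed to print something readable.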
davidlt has quit [Ping timeout: 245 seconds]
<
alyssa>
Ack, wow, okay
<
alyssa>
When we're only dealing with a single condition, take a look at this enum:
<
alyssa>
cond_[special-mostly-unused] = 0
<
alyssa>
cond_false = 1
<
alyssa>
cond_true = 2
<
alyssa>
cond_always = 3
<
alyssa>
Just a bunch of cases, right? What happens when you look at it in *binary*
<
alyssa>
11: always
<
alyssa>
Using the usual numbering scheme, that's a 2-bit LUT
<
alyssa>
When c=0, it's 1 for "false" or "always" modes. When c=1, it's 1 for "true" or "always" modes.
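The single-condition enum read as a 2-bit lookup table, as a small sketch: bit c of the mode says whether the branch is taken when the condition equals c. The constant and function names are made up for illustration, and mode 0 is left out because the log only calls it special and mostly unused.

```python
COND_FALSE = 0b01   # taken only when c == 0
COND_TRUE = 0b10    # taken only when c == 1
COND_ALWAYS = 0b11  # taken for either value of c


def branch_taken(mode, c):
    """Index the 2-bit table `mode` with the 1-bit condition `c`."""
    return bool((mode >> c) & 1)


for label, mode in [("false", COND_FALSE), ("true", COND_TRUE), ("always", COND_ALWAYS)]:
    print(label, [branch_taken(mode, c) for c in (0, 1)])
```

This prints `[True, False]` for "false", `[False, True]` for "true", and `[True, True]` for "always", matching the c=0 and c=1 cases described above.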
herbmilleriw has quit [Remote host closed the connection]
herbmilleriw has joined #panfrost
stikonas has quit [Remote host closed the connection]
herbmilleriw has quit [Ping timeout: 245 seconds]
herbmilleriw has joined #panfrost
herbmilleriw has quit [Ping timeout: 245 seconds]
<
alyssa>
I still don't understand when r31.x vs r31.w
<
alyssa>
It's something about scheduling but.. :/
<
alyssa>
r31.x for vector, r31.w for scalar but... that still isn't quite right
<
alyssa>
if (all(equal(...))) discard;
<
alyssa>
This triggers some _very_ interesting assembly
<
alyssa>
First of all, it's triggering a new outmod (.unk2)
<
alyssa>
(on the ball_eq, which is on a vmul)
<
alyssa>
That's going to generic r31, unmasked (?)
<
alyssa>
And then we have on the branch a lut condition F0F0
<
alyssa>
If you write out the truth table for F0, you see that condition is just for grabbing the one element
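One way to check that claim mechanically is to ask which index bits can change the table's output. The sketch assumes bit i of the table is the output for row i; `depends_on` is a hypothetical helper, and which named component the surviving bit corresponds to depends on the input ordering, which this does not settle.

```python
def depends_on(table, n_inputs):
    """List the index-bit positions whose value can change the table output."""
    used = []
    for bit in range(n_inputs):
        for index in range(1 << n_inputs):
            flipped = index ^ (1 << bit)
            if ((table >> index) & 1) != ((table >> flipped) & 1):
                used.append(bit)
                break
    return used


# Write out the 8 rows of the 3-input table 0xF0.
for index in range(8):
    print(format(index, "03b"), (0xF0 >> index) & 1)

print(depends_on(0xF0, 3))  # [2]
```

Rows 000 through 011 give 0 and rows 100 through 111 give 1, so the table simply passes through its top index bit and ignores the other two.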
<
alyssa>
Oh, .unk2 is set for ball always
<
alyssa>
Bet I just messed up the disasm for type converts
<
alyssa>
Oh ball/bany are even more messed up than I thought O-o
<
alyssa>
Ok, fixed like 3 bugs with ball/bany um ok, back to what we were doing
<
alyssa>
Okay, down to two weirdnesses:
<
alyssa>
1) Why isn't it masking a single component?!
<
alyssa>
2) Why can't we index directly?