<robmur01>
Either way I suspect "designed to support OpenGL 4.2" is probably more like "designed to support DX11 and somebody got carried away cross-referencing standards while writing tables of texture formats" :P
<alyssa>
robmur01: :P
* alyssa
has been figuring out how to schedule Bifrost.
<alyssa>
..Okay, Bifrost has ceased to make any sense to my brain
<HdkR>
Oh no
<alyssa>
there is no dependency between these two clauses
<alyssa>
but there is an implicit RAW dep
<alyssa>
what gives
<alyssa>
also, looks like bundles with a STORE in them can't read anything from the reg file. so that's fun.
<alyssa>
Perhaps there is a distinction between hi- and low-latency deps
<alyssa>
or loads vs stores
<alyssa>
maybe there is implicitly no concurrency across stores
<HdkR>
high latency could potentially cause a full stall, and is known to be done once the bundle completes?
<HdkR>
So most memory loads?
<alyssa>
HdkR: In that case there's no reason to track dependencies at all.
<alyssa>
The arch explicitly allows you to do
<alyssa>
{ load } { load } { something with both loads }
<alyssa>
with deps (), (), (0, 1)
<alyssa>
and then the hw can execute the loads in either order (so "effectively" concurrent)
<HdkR>
I feel like that is sane
<alyssa>
I do too
<alyssa>
But I'm trying to understand how a load dep is conceptually different from a store dep
<alyssa>
HdkR: Ohh, derp, okay, yeah.
<alyssa>
I've been thinking about clauses all wrong.
<HdkR>
:D
<alyssa>
AFAICT, clauses can't execute out-of-order or in parallel or anything
<alyssa>
Rather, you can think of taking all the clauses in source order and just running through them linearly
<alyssa>
So conceptually the hardware just runs that with zero latency, just instruction after instruction in linear order.
<HdkR>
That's my understanding as long as they don't do any voodoo hardware scheduling behind your back. Which would be invisible to the isa encoding anyway
<alyssa>
Dependencies just act as barriers to ensure previous instructions terminated (so it's safe to read their values)
<alyssa>
Right, the implication is that
<alyssa>
{ R0 = foo, R1 = bar, STORE R0 } { STORE R1 }
<alyssa>
^ those clauses don't need any dependencies, since R1 is guaranteed to be written even without a dep flag
<alyssa>
the dep flag would force the first store to complete, which is not necessary. So you can still get the stores in parallel.
<alyssa>
So this simplifies a lot, tbh.
<alyssa>
I was nervous RA would have to deal with concurrency issues but it sounds like that's not the case at all, concurrency can *only* happen with the hi-latency ops, there's no clause-level concurrency of ALUs
<alyssa>
to think I read "Dependency tracking" in Connor's notes like four times and still had that misconception, sigh
<HdkR>
Right, so as long as that assignment of R1 wasn't a long latency op `{R1 = Load}, {Store R1}` sort of thing, then you theoretically don't need a dep between the clauses is what you're positing
<HdkR>
Sounds correct
<alyssa>
Right, yeah
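The model settled on above (clauses run in source order, and a dependency is only a barrier on an earlier clause's high-latency result) can be sketched roughly as follows. This is a toy illustration only: `Clause`, `slow_writes`, and `needed_deps` are hypothetical names, not anything from the Panfrost compiler, and write-after-write hazards are ignored for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Clause:
    # Hypothetical toy model of a clause, not the real compiler structure.
    reads: set = field(default_factory=set)        # registers the clause reads
    slow_writes: set = field(default_factory=set)  # registers written by hi-latency ops (e.g. loads)


def needed_deps(clauses):
    """For each clause, list the earlier clauses it has to wait on.

    ALU results are assumed to be visible to later clauses for free,
    because clauses execute one after another in source order.
    """
    deps = []
    pending = {}  # register -> index of the clause whose hi-latency op writes it
    for i, clause in enumerate(clauses):
        wait = sorted({pending[r] for r in clause.reads if r in pending})
        deps.append(tuple(wait))
        # Once we have waited on a clause, its results have landed.
        for reg in [r for r, j in pending.items() if j in wait]:
            del pending[reg]
        for reg in clause.slow_writes:
            pending[reg] = i
    return deps


# { load } { load } { something with both loads }
two_loads = [Clause(slow_writes={"R0"}), Clause(slow_writes={"R1"}),
             Clause(reads={"R0", "R1"})]
# { R0 = foo, R1 = bar, STORE R0 } { STORE R1 }  (foo/bar are ALU ops)
two_stores = [Clause(reads={"R0"}), Clause(reads={"R1"})]
# { R1 = Load } { Store R1 }
load_then_store = [Clause(slow_writes={"R1"}), Clause(reads={"R1"})]

print(needed_deps(two_loads))        # [(), (), (0, 1)]
print(needed_deps(two_stores))       # [(), ()]
print(needed_deps(load_then_store))  # [(), (0,)]
```

The first result matches the `(), (), (0, 1)` deps quoted earlier; the second shows the two stores needing no dep flag, so they are free to overlap.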
<alyssa>
Regardless, scheduler is definitely the one laying out clauses.
<alyssa>
And in one stage, not a two-stage inter-clause/intra-clause schedule
<alyssa>
Which can be... accommodated, with some refactoring
<alyssa>
And solves my branching woes nicely :)
<alyssa>
So packing ends up being dumb =D
<HdkR>
woo
<alyssa>
Hopefully I can get branching going tomorrow then
<HdkR>
(Also find it strange that they claim up to four displays can be run off the thing)
<Lyude>
HdkR: it shouldn't, I'll fix any you find
<Lyude>
(that's one of the ones I support for my job)
<HdkR>
Awesome. I just glued it to my working desk. My Laptop will be arriving in a couple days to play with it
<HdkR>
Well, double side sticky taped, but about the same when it's VHB
<alyssa>
# unknown pos 0xe
<alyssa>
cwabbott: ^ :D
<alyssa>
6 constants, 6 instructions in a clause
<Lyude>
o:
<Lyude>
that's a first
<alyssa>
so (6, 4) in the table, I guess?
<alyssa>
or (6, 5) maybe
<alyssa>
uh wait
<alyssa>
const0....const9
<alyssa>
(6, 4) I guess
* alyssa
has confused herself
<alyssa>
It's (6, 5)
<alyssa>
So 6 ins clause has 7 64-bit constants, 6 in use here
<alyssa>
Not sure why it refuses to use the 6th constant
<alyssa>
with 7 instructions, I mean
<alyssa>
wait what
<alyssa>
Now it's imagining instructions that don't exist
<alyssa>
6 constants, 7 instructions is okay
<alyssa>
via 0xd I guess
<alyssa>
But it refuses to do 7 constants, 7 instructions
<alyssa>
(or 7 constants 8 instructions)
<alyssa>
So the missing combinations are:
<alyssa>
6 instructions, 6 constants
<alyssa>
7 instructions, 7 constants
<alyssa>
8 instructions, 6 constants
<alyssa>
8 instructions, 7 constants
<alyssa>
8 instructions, 8 constants
<alyssa>
Presumably `pos = 0xf` is one of those
<alyssa>
Oh, but the first one is the 0xe I have above
<alyssa>
So one of the latter 4
<alyssa>
But it's refusing to do 8 instructions, 6 constants too. So not sure what 0xf is.
<alyssa>
It's happy to do 8, 5
<alyssa>
Assuming 0xf doesn't exist, this actually satisfies a really nice invariant
<alyssa>
constant_count + instruction_count <= 13
<alyssa>
No clue why 13 :p
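As a quick sanity check, the conjectured invariant can be tested against the combinations reported in this log. A minimal sketch, assuming pairs are written as (instructions, constants); `MAX_SUM` and `clause_fits` are hypothetical names, and 13 is the bound guessed here, not a documented hardware limit.

```python
MAX_SUM = 13  # bound conjectured in the chat, not a documented limit


def clause_fits(instructions, constants):
    """Conjectured rule: a clause is encodable iff the counts sum to at most 13."""
    return instructions + constants <= MAX_SUM


# (instructions, constants) pairs the blob was seen to accept
accepted = [(6, 6), (7, 6), (8, 5)]
# pairs the blob refused to emit
rejected = [(7, 7), (8, 6), (8, 7), (8, 8)]

for pair in accepted:
    print(pair, "fits:", clause_fits(*pair))
for pair in rejected:
    print(pair, "fits:", clause_fits(*pair))
```

Every accepted pair satisfies the rule and every refused pair violates it, which is consistent with the invariant but of course does not prove it.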
<Lyude>
alyssa: you're not on the scheduler yet are you? (curious what's got you playing around with multiple instructions per clause)
<alyssa>
Lyude: Not exactly, but I'm giving serious thought to figuring out how I'll want to write it.
<alyssa>
So I don't paint myself into corners now
<alyssa>
Why now? I'd like to implement branching, which depends on calculating the sizes of clauses before packing. So that's related to scheduler data structures.
<Lyude>
ah, cool
<alyssa>
cwabbott: Re the above invariant - the limit then works out to be "no more than 8 quadwords per clause"
<alyssa>
Which is very reasonable.
<alyssa>
Per the packing algo: (8, 5), (7, 6), (6, 7) -- the borderline cases with sum of 13 -- work out to exactly 8 quadwords
<alyssa>
Any more and you can't pack, any less and there's no more quadwords anyway. So 8.
<alyssa>
hm what happens with 4 instructions
<alyssa>
oh, format #3.3 has an end clause, ok
* alyssa
can't get the blob to fuse any conditions into branches
<alyssa>
with float, I mean
<alyssa>
And when it fuses with int the disasm doesn't agree.
<alyssa>
Will stick to unfused for now.
<alyssa>
Oh wait, it does agree, it's just an inverted branch
<robmur01>
:q
* robmur01
struggles to keep track of which of the 3 keyboards here is which...