fdalleau` has quit [Quit: It is time for you to leave]
<Sarayan>
oh that sucks
<Sarayan>
I currently have the route nodes as 8.8.8.8 type/x/y/z
<Sarayan>
there are up to 4 io pads associated to one node
<Sarayan>
there are *97* route nodes of the same type (IO_RE) associated to each io pad
<Lofty>
Sarayan: so it needs to be extended further?
<Sarayan>
Lofty: either that (I can reduce the number of bits on type/x/y) or pre-classify the IO_RE, or even go 64 bits
<Lofty>
Well, x/y is, what, 64 max?
<Sarayan>
hmmm, I'm close to getting somewhere w.r.t the timings for yosys amusingly
<Lofty>
Maybe it's higher for bigger dies, but
<Sarayan>
sx120f is 90x82
<Lofty>
So, uh, you can save two whole bits in x/y coord
<Sarayan>
yup
<Sarayan>
which is, in fact, a lot
<Lofty>
That's 1024 subtypes
<Sarayan>
that or 1024 z values
<Sarayan>
both are possible
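A minimal sketch of the packing under discussion, assuming a single 32-bit word per route node. Only the "8.8.8.8 type/x/y/z" layout comes from the conversation. The two 7-bit-coordinate layouts, the field order within the word, and the names `LAYOUTS`, `pack` and `unpack` are hypothetical illustrations of "1024 subtypes" versus "1024 z values".

```python
# Field layouts as (name, width in bits), most significant field first.
# "8.8.8.8" is the layout mentioned in the discussion; the other two are
# hypothetical ways to spend the two bits saved on x and y.
LAYOUTS = {
    "8.8.8.8": (("type", 8), ("x", 8), ("y", 8), ("z", 8)),
    "10.7.7.8": (("type", 10), ("x", 7), ("y", 7), ("z", 8)),   # 1024 subtypes
    "8.7.7.10": (("type", 8), ("x", 7), ("y", 7), ("z", 10)),   # 1024 z values
}


def pack(layout, **fields):
    """Pack named fields into one integer, checking each fits its width."""
    word = 0
    for name, width in LAYOUTS[layout]:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(layout, word):
    """Inverse of pack(): split the integer back into named fields."""
    out = {}
    for name, width in reversed(LAYOUTS[layout]):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out
```

With a 90x82 grid (assuming 0-based coordinates) x and y stay below 128, so 7 bits each are enough, and either wider layout still fits in 32 bits.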
<Lofty>
inb4 we get access to the commercial dies and they're huge
<Sarayan>
commercial dies?
<Lofty>
Commercially-licensed dies
<Lofty>
AKA the ones you have to pay Intel to get Quartus for
<Sarayan>
there are massively many more LABs & co?
<Lofty>
So, Cyclone V tops out at like 120k ALMs, right?
<Lofty>
Arria V GX tops out at 190,240
<Sarayan>
11356 (M)LABs, x10 to get ALMs (we have 4191 in the de10-nano)
<Sarayan>
a (m)lab being what is in the grid, hence why I count that
<Sarayan>
that puts the d9 over 128 I'm sure
<Lofty>
Stratix V GX tops out at 359,200
<Lofty>
That *definitely* does
<Sarayan>
Is there the x10 in there?
<Lofty>
Yes
<Lofty>
I'm counting ALMs rather than LABs because I'm reading from the spec sheet
<Sarayan>
stratix 10 gx tops are 10M, dunno if it's a x10, x20 or more
<Sarayan>
still, fucking big
<Lofty>
Indeed.
<Sarayan>
I suspect you can more easily afford a 64-bit cpu than even one of these beasts :-)
<Lofty>
Quite probably
<Lofty>
I know daveshah can target UltraScale+ thanks to RapidWright, and those chips are huge, but for the poorer of us, it's probably a good idea not to target the larger chips until we can afford them :P
<Sarayan>
the de10-pro, with 2.8M (want!) costs $10K (maybe not)
<Lofty>
Even ignoring the hardware, quartus alone is like $2K a year there
<Sarayan>
plus it's pcie and doesn't even have a hdmi connector
<Sarayan>
so meh
<Sarayan>
I've yet to see a successor to the nano that can do the same things
<Lofty>
The Nano's a really nice little board
<Sarayan>
the han pilot platform could be except for its price ($3K)
<Sarayan>
but fpga-wise it's 10 times bigger
<Sarayan>
and the memory is 5 times faster or so I think
<Lofty>
"Stratix V device configuration is enhanced for ease-of-use, speed, and cost."
<Lofty>
"Enhanced-cost FPGA"
<Sarayan>
yep, I can decode the dmf tables, need to write the python code for that
<Lofty>
Oh god, yet another internal naming scheme
<Sarayan>
that one is easy :-)
<Sarayan>
one second, lemme find the table
<Sarayan>
a-h = e0/f0/a/b/c0/c1/e1/f1
<Sarayan>
in that order
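The mapping above, written out as a lookup table. This is only a sketch: the letter-to-name pairs are taken from the line above, while the names `LETTER_TO_INPUT` and `translate` are hypothetical.

```python
# a-h = e0/f0/a/b/c0/c1/e1/f1, in that order (from the discussion).
LETTER_TO_INPUT = dict(
    zip("abcdefgh", ["e0", "f0", "a", "b", "c0", "c1", "e1", "f1"])
)


def translate(letter):
    """Map a table letter (case-insensitive) to the corresponding input name."""
    return LETTER_TO_INPUT[letter.lower()]
```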
<Sarayan>
hmmm, looks like set: can have negative values
<Sarayan>
fixed
<Sarayan>
(see XTALK)
<Lofty>
Okay, hmm
<Lofty>
(This is useful to know)
<Lofty>
But it means I need to figure out how the fuck to translate these into Yosys timings
<Sarayan>
yep
<Sarayan>
that's a fucklot of information
<Lofty>
Time for epic diagram annotation time!
<Sarayan>
and that's just our fpga, there is an infinite number of files like this one
<Lofty>
I suspect they'll all have a pattern
<Sarayan>
hmm, copy/paste fuckup, file name should be ddb_*, whatever
<Sarayan>
db_cyclonev_sx120f-7_revprod_1100mv_n40c.dmf.txt for the -40C version if you want to compare
<Lofty>
I think the approach daveshah took was to just use the slow corner timings everywhere
<daveshah>
The nextpnr API supports four quadrant timings (slow/fast and rising/falling)
<Sarayan>
synth uses the 100c only, fit uses both and the fdi file (net delays)
<daveshah>
I think ECP5 is just two quadrant though, fast/slow as the rising/falling are the same in Diamond
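A rough sketch of what "four quadrant" versus "two quadrant" timing data could look like. `QuadDelay` is a hypothetical container written for illustration, not the nextpnr type, and the picosecond unit is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuadDelay:
    """Hypothetical four-quadrant delay (assumed to be in picoseconds)."""
    fast_rise: int
    slow_rise: int
    fast_fall: int
    slow_fall: int

    @classmethod
    def two_quadrant(cls, fast, slow):
        """Rising and falling delays identical: only a fast/slow pair is stored."""
        return cls(fast, slow, fast, slow)

    def worst_case(self):
        """Slow-corner-everywhere simplification: the largest slow delay."""
        return max(self.slow_rise, self.slow_fall)

    def best_case(self):
        """Smallest fast-corner delay, e.g. for hold-style checks."""
        return min(self.fast_rise, self.fast_fall)
```

Using only `worst_case()` everywhere corresponds to the "slow corner timings everywhere" approach mentioned above.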
<Sarayan>
the fdi file seems to use distances between connection points and rc values
<Sarayan>
plus some kind of driver power I suspect
<Sarayan>
only handling cyclonev should simplify things though
<Lofty>
"should"
<daveshah>
I'm vaguely interested in having an RC-based timing model that could be used in a couple of nextpnr arches, as that is similar to the model xilinx uses
<daveshah>
So if someone here is interested in that, it would be great to have
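For reference, one common first-order RC model is the Elmore delay. Nothing in the discussion says this is the model Quartus or Xilinx actually use, so the sketch below is only an illustration of what "RC based" can mean; the function name, the lumped per-segment model and the parameters are all assumptions.

```python
def elmore_delay(segments, load_cap=0.0, driver_res=0.0):
    """Elmore delay at the far end of an RC chain.

    segments: list of (resistance, capacitance) per wire segment, driver
    side first. Each segment is lumped as a series resistor followed by a
    capacitor to ground, so every resistor charges all capacitance that
    sits downstream of it.
    """
    downstream = load_cap + sum(c for _, c in segments)
    delay = driver_res * downstream  # the driver sees the whole net
    for r, c in segments:
        delay += r * downstream
        downstream -= c  # this segment's cap is upstream of the next resistor
    return delay
```

With resistances in ohms and capacitances in farads the result is in seconds. Segment values could plausibly be derived from wire lengths and per-unit RC values, which is how the fdi file is described above.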
<daveshah>
it might be that that model would only be enabled for signoff timing, and a faster approximation used during routing (and necessarily placement where you don't know the exact interconnect path anyway)
<Sarayan>
don't know as in not decided yet?
<daveshah>
Yes, the exact interconnect path is the job of the router
<daveshah>
in practice, in some of the most advanced flows, there's more of a blurring between place and route but nextpnr is some way off that kind of stuff yet
<Sarayan>
well, I find the idea of optimizing for the cv interesting, but that requires managing to do stuff for the cv in the first place :-)
<daveshah>
yeah, I think there could be quite a bit of shared work between cv (and bigger) and xilinx
<daveshah>
first I need to get ripple done, which seems to be expanding into a never-ending set of subproblems
<Sarayan>
what's ripple?
<daveshah>
it's a routability-driven placement algorithm
<daveshah>
I'm working on some tweaks on top of it too, to make it generic to stuff other than Xilinx and add a few features like SLR (chiplet) partitioning
<Sarayan>
oh, there's going to be an interesting issue in cv
<Lofty>
daveshah: to place and route your FPGA you must first invent the universer
<Lofty>
-r
<daveshah>
yeah, things like discovering that I needed to do hypergraph partitioning myself have been something of an interesting distraction
<Sarayan>
a lab has 40 ffs. the ff clocks can be connected to one of 3 clock lines. The clock lines are created globally in the lab, from (n) clock inputs and an optional inverter for each line
<daveshah>
and now I'm looking at how to deal with cases of fracturable hard blocks (like the Xilinx RAM36/2x RAM18 and eventually Lattice DSPs) in the bipartite matching based legalisation
<daveshah>
That does seem like it should be representable in the per-tile legality checks
<daveshah>
I guess a LAB would be a tile?
<Sarayan>
yeah
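A minimal sketch of the per-tile legality check being discussed, assuming each LAB clock line is fully described by a (clock net, inverted) pair. The 40-FF and 3-line limits are from the conversation. The function name and data representation are hypothetical, and the limit on the (n) clock inputs is ignored because its value is not given.

```python
MAX_LAB_FFS = 40          # from the discussion
MAX_LAB_CLOCK_LINES = 3   # from the discussion


def lab_clocks_legal(ff_clocks):
    """Check whether a set of flip-flops can share one LAB.

    ff_clocks: one (clock_net, inverted) tuple per flip-flop in the LAB.
    Each distinct pair needs its own clock line, since the inverter is
    per line rather than per flip-flop.
    """
    if len(ff_clocks) > MAX_LAB_FFS:
        return False
    return len(set(ff_clocks)) <= MAX_LAB_CLOCK_LINES
```

Note that under this assumption a clock and its inverse used in the same LAB consume two of the three lines.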
<Lofty>
Cyclone V DSPs are three-way partitionable, daveshah
<Lofty>
Have fun with that
<daveshah>
Interesting
<Sarayan>
a (m)lab tile has 20 lut-6-equivalent and 40 ffs
<Lofty>
3 9x9s, 2 18x18s or 1 27x27
<Sarayan>
mlab can also switch to memory mode instead of comb
<daveshah>
Interesting, Lattice DSPs can be 1x 18x18 or 2x 9x9
<daveshah>
or four of them combined into a 36x36
<daveshah>
but I haven't even finished fuzzing that for ECP5 yet, they are very painful in terms of the number of random bits toggling everywhere
<Lofty>
Sarayan: is PROPAGATEIN the same as DATAA?
<Lofty>
[note: pronounced dat<screaming>]
<daveshah>
lol
<daveshah>
I've always thought the ecp5 JTAGG primitive to be really cute for some reason
<Sarayan>
well, propagatein is 3 while dataa is 12, so your guess is as good as mine
<Sarayan>
(it also has data and ndataa, the problem mostly is that the enum is generic for everything quartus handles)
<Sarayan>
(ndata, not ndataa)
<Sarayan>
oh, and datain too
<daveshah>
could it be two ways of routing to the same pin, with another bit somewhere to select what is used?
<Sarayan>
no, there are 8 input pins on the data side
<Sarayan>
they really seem to have used PROPAGATEIN instead of datas just because
<Lofty>
It's what it seems like from the dump, anyway
<Lofty>
Which is...substantially slower than the numbers I had before
<Lofty>
(there are no [FH] => COMBOUT numbers)
<Lofty>
the FRACT/6LUT_TO_COMBOUT numbers line up better, but they seem incomplete...
<Lofty>
...I have a suspicion the numbers are correct
<Lofty>
The separate (A, B, C, D, E, G) => REGOUT numbers line up nicely
<Lofty>
mwk, daveshah: do you think I should use the COMBOUT numbers or the REGOUT numbers?
<Lofty>
(well in this case the numbers are (B, C, D, E, G, H) but anyway)
<Lofty>
It'd be funny to have a pass that examines the netlist and sets a parameter which lets you change the timings according to what something is connected to
<Lofty>
Okay, I've done some more research
<Lofty>
I'm pretty sure PROPAGATEIN is *not* DATAA
<Lofty>
But instead possibly - possibly - carry in
<Lofty>
Or the share input
<Sarayan>
where is e0 then?
<Sarayan>
incidentally, sharein exists (273) and so does cin (2)
<Sarayan>
og.kervella.org/enums.txt, we're talking about DB_INPUT_PORT_TYPE_STRING here
<Lofty>
I'm just grepping through your data dump (which is very handy)
<Sarayan>
the file is organized as a tree of nodes with DBS_DTM_NODE_ENUM_STRING types that finally (the dash) go to a table indexed with a vector of (DTM_ENUM or DB_BURIED_PORT_TYPE or DB_INPUT_PORT_TYPE or DB_OUTPUT_PORT_TYPE or CDB_RE_TYPE or DEV_IO_STANDARD_ENUM), with an integer or float value associated to each index
<Sarayan>
so find a table, then find a value in the table
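The "find a table, then find a value" procedure could be sketched like this, assuming the decoded file has been loaded into nested dicts. The structure, node names and numbers in `TREE` are invented for illustration; only the port names DATAA and COMBOUT appear in the discussion, and the real file layout may differ.

```python
# Toy stand-in for a decoded file: nested dicts are tree nodes, and a leaf
# is a table mapping a tuple of enum names to a number. Node names and
# values here are made up.
TREE = {
    "NODE_A": {
        "NODE_B": {
            ("DATAA", "COMBOUT"): 1.5,
            ("DATAB", "COMBOUT"): 2.5,
        },
    },
}


def find_table(tree, path):
    """Walk down the tree along a list of node names and return the table."""
    node = tree
    for name in path:
        node = node[name]
    return node


def lookup(tree, path, key):
    """Find the table at `path`, then the value indexed by the enum vector."""
    return find_table(tree, path)[tuple(key)]
```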
<Sarayan>
enums.txt gives you, well, the vocabulary I guess
<Sarayan>
does tell which mapping they decided on for cv
<Sarayan>
oh also, s2t_dump_delay_info=on
<Sarayan>
s2t_delay_model_dump_delays=on
<Sarayan>
in quartus.ini makes it dump the real times for every pair of nodes and quartus_map time, *if* it doesn't segfault (and even then you get intermediate files)
<Sarayan>
way more interesting than the slack files which give you how much margin you have
<Lofty>
Indeed :P
<Lofty>
I have this nasty suspicion that both "COMBOUT" and "PROPAGATE{IN,OUT}" are overloaded here
<Lofty>
I think COMBOUT is being used for both COMBOUT and SUMOUT
<Sarayan>
could be
<Lofty>
And PROPAGATE is both carry and share{in,out}
<Sarayan>
I gave you the files as soon as I've been able to decode them
<Lofty>
And I appreciate it
<Lofty>
I hope I'm helping too >.>
<Sarayan>
haven't yet tried to trace if they're actually used
<Lofty>
282 roughly matches the CLK -> DATAOUT arrival time
<Lofty>
(which I have as 262 but)
<Sarayan>
Seems to match, kinda. I'm not sure I understand latch_enable vs. enable
<Lofty>
There's also WE
<Lofty>
And I bet that's an enable too :P
<Sarayan>
It feels like P2P means "point-to-point timing inside an element", with (in|buried)_(buried|out)_node telling you where the points are placed w.r.t the logical periphery of the element
<Lofty>
That sounds about right
<Lofty>
What are the numbers in set:N?
<Lofty>
Do you know?
<Sarayan>
mux settings I'm pretty sure
<Sarayan>
configurable stuff in any case
<Lofty>
I think P2P stuff is effectively arrival time
<Sarayan>
not sure what you mean by arrival
<Lofty>
Right
<Lofty>
When the clock edge triggers, there's a propagation delay between that and the flop output changing
phire has quit [Remote host closed the connection]
<Lofty>
That delay is the arrival time, because it's when the effect of the clock edge arrives at the flop output
<Sarayan>
ok, we agree then
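The arrival-time definition above as arithmetic, in a small sketch. The 282 figure is the CLK -> DATAOUT number quoted earlier (its unit is not stated in the discussion). The downstream path delays, the capture edge and the setup time used in the usage note are made-up values, and both function names are hypothetical.

```python
def arrival_time(launch_edge, clk_to_out, path_delays=()):
    """Time at which data launched by a clock edge reaches the end of a path.

    clk_to_out is the delay between the clock edge and the flop output
    changing; path_delays are any further delays after the flop output.
    """
    return launch_edge + clk_to_out + sum(path_delays)


def setup_slack(arrival, capture_edge, setup_time=0):
    """Margin before the capturing flop needs the data; positive means met."""
    return (capture_edge - setup_time) - arrival
```

For example, `arrival_time(0, 282)` is the arrival at the flop output itself, and `setup_slack(arrival_time(0, 282, [100, 50]), 1000, 68)` gives the margin for an invented path with two extra delays.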
<Sarayan>
Not sure I entirely understand something like that though: