oh that sucks
I currently have the route nodes as type/x/y/z
there are up to 4 io pads associated to one node
there are *97* route nodes of the same type (IO_RE) associated to each io pad
Sarayan: so it needs to be extended further?
Lofty: either that (I can reduce the number of bits on type/x/y) or pre-classify the IO_RE, or even go 64 bits
Well, x/y is, what, 64 max?
hmmm, I'm close to getting somewhere w.r.t the timings for yosys amusingly
Maybe it's higher for bigger dies, but
sx120f is 90x82
So, uh, you can save two whole bits in x/y coord
which is, in fact, a lot
That's 1024 subtypes
that or 1024 z values
both are possible
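[editor's note] A minimal sketch of the type/x/y/z packing being discussed, in Python. The field widths are assumptions chosen for illustration (8 bits of type, 7 bits each for x and y since 90x82 fits under 128, and the remaining 10 bits for z, which gives 1024 z values); the real encoding is not stated in the log, and the function names are hypothetical.

```python
# Hypothetical field widths; the actual route node encoding is not given here.
TYPE_BITS, X_BITS, Y_BITS, Z_BITS = 8, 7, 7, 10  # 8 + 7 + 7 + 10 = 32 bits


def pack_rnode(rtype: int, x: int, y: int, z: int) -> int:
    """Pack (type, x, y, z) into one 32-bit integer, type in the top bits."""
    fields = (("type", rtype, TYPE_BITS), ("x", x, X_BITS),
              ("y", y, Y_BITS), ("z", z, Z_BITS))
    word = 0
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
        # Shift what we have so far and append the next field below it.
        word = (word << bits) | value
    return word


def unpack_rnode(word: int) -> tuple:
    """Inverse of pack_rnode: return (type, x, y, z)."""
    z = word & ((1 << Z_BITS) - 1)
    word >>= Z_BITS
    y = word & ((1 << Y_BITS) - 1)
    word >>= Y_BITS
    x = word & ((1 << X_BITS) - 1)
    word >>= X_BITS
    return (word, x, y, z)
```

With these widths the largest sx120f coordinate (89, 81) still fits, and trimming x/y is what frees the extra bits for z or for subtypes.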
inb4 we get access to the commercial dies and they're huge
commercial dies?
Commercially-licensed dies
AKA the ones you have to pay Intel to get Quartus for
there are massively many more LABs & co?
So, Cyclone V tops out at like 120k ALMs, right?
Arria V GX tops out at 190,240
11356 (M)LABs, x10 to get ALMs (we have 4191 in the de10-nano)
a (m)lab being what is in the grid, hence why I count that
that puts the d9 over 128 I'm sure
Stratix V GX tops out at 359,200
That *definitely* does
Is there the x10 in there?
I'm counting ALMs rather than LABs because I'm reading from the spec sheet
stratix 10 gx tops are 10M, dunno if it's a x10, x20 or more
still, fucking big
I suspect you can more easily afford a 64-bit CPU than even one of these beasts :-)
Quite probably
I know daveshah can target UltraScale+ thanks to RapidWright, and those chips are huge, but for the poorer of us, it's probably a good idea not to target the larger chips until we can afford them :P
the de10-pro, with 2.8M (want!) costs $10K (maybe not)
Even ignoring the hardware, quartus alone is like $2K a year there
plus it's pcie and doesn't even have a hdmi connector
so meh
I've yet to see a successor to the nano that can do the same things
The Nano's a really nice little board
the han pilot platform could be except for its price ($3K)
but fpga-wise it's 10 times bigger
and the memory is 5 times faster or so I think
"Stratix V device configuration is enhanced for ease-of-use, speed, and cost."
"Enhanced-cost FPGA"
yep, I can decode the dmf tables, need to write the python code for that
Oh god, yet another internal naming scheme
that one is easy :-)
one second, lemme find the table
a-h = e0/f0/a/b/c0/c1/e1/f1
in that order
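[editor's note] The letter mapping above can be written down as a small lookup table. This is only a sketch in Python; the names `LETTER_TO_INPUT` and `translate` are made up for illustration.

```python
# a-h map, in order, to e0/f0/a/b/c0/c1/e1/f1 as listed in the table above.
LETTER_TO_INPUT = dict(
    zip("abcdefgh", ["e0", "f0", "a", "b", "c0", "c1", "e1", "f1"])
)


def translate(letter: str) -> str:
    """Translate one internal letter (a-h, case-insensitive) to its input name."""
    return LETTER_TO_INPUT[letter.lower()]
```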
hmmm, looks like set: can have negative values
(see XTALK)
Okay, hmm
(This is useful to know)
But it means I need to figure out how the fuck to translate these into Yosys timings
that's a fucklot of information
Time for epic diagram annotation time!
and that's just our fpga, there is an infinite number of files like this one
I suspect they'll all have a pattern
hmm, copy/paste fuckup, file name should be ddb_*, whatever
db_cyclonev_sx120f-7_revprod_1100mv_n40c.dmf.txt for the -40C version if you want to compare
I think the approach daveshah took was to just use the slow corner timings everywhere
The nextpnr API supports four quadrant timings (slow/fast and rising/falling)
synth uses the 100c only, fit uses both and the fdi file (net delays)
I think ECP5 is just two quadrant though, fast/slow as the rising/falling are the same in Diamond
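[editor's note] A rough sketch of what "four quadrant" versus "two quadrant" timing data could look like, in Python. `QuadTiming` is a hypothetical container written for this note, not the nextpnr type, and picoseconds are an assumed unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuadTiming:
    """Hypothetical delay record: one value per (corner, edge) combination."""
    slow_rise: int
    slow_fall: int
    fast_rise: int
    fast_fall: int

    @classmethod
    def from_corners(cls, slow: int, fast: int) -> "QuadTiming":
        """Two-quadrant case: rising and falling share one value per corner."""
        return cls(slow, slow, fast, fast)

    def max_delay(self) -> int:
        """Largest of the four values (what a setup check would care about)."""
        return max(self.slow_rise, self.slow_fall, self.fast_rise, self.fast_fall)

    def min_delay(self) -> int:
        """Smallest of the four values (what a hold check would care about)."""
        return min(self.slow_rise, self.slow_fall, self.fast_rise, self.fast_fall)
```

Using only the slow corner everywhere would amount to filling all four fields with the same slow value.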
the fdi file seems to use distances between connection points and rc values
plus some kind of driver power I suspect
only handling cyclonev should simplify things though
I'm vaguely interested in having an RC-based timing model that could be used in a couple of nextpnr arches, as that is similar to the model xilinx uses
So if someone here is interested in that, it would be great to have
it might be that that model would only be enabled for signoff timing, and a faster approximation used during routing (and necessarily placement where you don't know the exact interconnect path anyway)
don't know as in not decided yet?
Yes, the exact interconnect path is the job of the router
in practice, in some of the most advanced flows, there's more of a blurring between place and route but nextpnr is some way off that kind of stuff yet
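[editor's note] For readers unfamiliar with RC-based timing: one common first-order model is the Elmore delay. The Python sketch below uses it purely as an illustration. Nothing in the log says that Quartus, the fdi file, or Xilinx tools use exactly this formula, and the segment representation (one series resistance and one capacitance to ground per segment) is an assumption.

```python
def elmore_delay(driver_r: float, segments: list, load_c: float) -> float:
    """Elmore delay of a driver feeding a chain of RC segments and a load.

    driver_r: output resistance of the driver (ohms)
    segments: list of (r, c) pairs from driver to sink (ohms, farads)
    load_c:   input capacitance of the sink (farads)
    Returns the delay in seconds.
    """
    upstream_r = driver_r  # total resistance between the source and this node
    delay = 0.0
    for r, c in segments:
        upstream_r += r
        # Each capacitor is charged through all the resistance upstream of it.
        delay += upstream_r * c
    # The sink load sits at the end of the chain.
    delay += upstream_r * load_c
    return delay
```

A router could use a cheaper estimate (for example, a delay proportional to distance) and keep something like this for a final timing pass.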
well, I find the idea of optimizing for the cv interesting, but that requires managing to do stuff for the cv in the first place :-)
yeah, I think there could be quite a bit of shared work between cv (and bigger) and xilinx
first I need to get ripple done, which seems to be expanding into a never-ending set of subproblems
what's ripple?
it's a routability-driven placement algorithm
I'm working on some tweaks on top of it too, to make it generic to stuff other than Xilinx and add a few features like SLR (chiplet) partitioning
oh, there's going to be an interesting issue in cv
daveshah: to place and route your FPGA you must first invent the universe
yeah, things like discovering that I needed to do hypergraph partitioning myself has been something of an interesting distraction
a lab has 40 ffs. the ff clocks can be connected to one of 3 clock lines. The clock lines are created globally in the lab, from (n) clock inputs and an optional inverter for each line
and now I'm looking at how to deal with cases of fracturable hard blocks (like the Xilinx RAM36/2x RAM18 and eventually Lattice DSPs) in the bipartite matching based legalisation
That does seem like it should be representable in the per-tile legality checks
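[editor's note] A sketch of what such a per-tile legality check might look like for the LAB clocking constraint described above, in Python. It assumes each clock line is fully described by a (clock net, inverted) pair and ignores the limit on the number of clock inputs, since (n) is not given; the function name is hypothetical.

```python
MAX_FFS_PER_LAB = 40   # from the description above
MAX_CLOCK_LINES = 3    # from the description above


def lab_clocking_is_legal(ffs) -> bool:
    """Check whether a set of flip-flops can share one LAB.

    ffs: iterable of (clock_net, inverted) tuples, one per flip-flop.
    Assumption: two flip-flops can share a clock line only if they use the
    same clock net with the same inversion setting.
    """
    ffs = list(ffs)
    if len(ffs) > MAX_FFS_PER_LAB:
        return False
    # Every distinct (net, inverted) pair needs its own clock line.
    return len(set(ffs)) <= MAX_CLOCK_LINES
```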
I guess a LAB would be a tile?
Cyclone V DSPs are three-way partitionable, daveshah
Have fun with that
a (m)lab tile has 20 lut-6-equivalent and 40 ffs
3 9x9s, 2 18x18s or 1 27x27
mlab can also switch to memory mode instead of comb
Interesting, Lattice DSPs can be 1x 18x18 or 2x 9x9
or four of them combined into a 36x36
but I haven't even finished fuzzing that for ECP5 yet, they are very painful in terms of the number of random bits toggling everywhere
Sarayan: is PROPAGATEIN the same as DATAA?
[note: pronounced dat<screaming>]
I've always thought the ecp5 JTAGG primitive to be really cute for some reason
well, propagatein is 3 while dataa is 12, so your guess is as good as mine
(it also has data and ndataa, the problem mostly is that the enum is generic for everything quartus handles)
(ndata, not ndataa)
oh, and datain too
could it be two ways of routing to the same pin, with another bit somewhere to select what is used?
no, there are 8 input pins on the data side
they really seem to have used PROPAGATEIN instead of datas just because
It's what it seems like from the dump, anyway
Which is...substantially slower than the numbers I had before
(there are no [FH] => COMBOUT numbers)
the FRACT/6LUT_TO_COMBOUT numbers line up better, but they seem incomplete...
...I have a suspicion the numbers are correct
The separate (A, B, C, D, E, G) => REGOUT numbers line up nicely
mwk, daveshah: do you think I should use the COMBOUT numbers or the REGOUT numbers?
(well in this case the numbers are (B, C, D, E, G, H) but anyway)
It'd be funny to have a pass that examines the netlist and sets a parameter which lets you change the timings according to what something is connected to
Okay, I've done some more research
I'm pretty sure PROPAGATEIN is *not* DATAA
But instead possibly - possibly - carry in
Or the share input
where is e0 then?
incidentally, sharein exists (273) and so does cin (2)
og.kervella.org/enums.txt, we're talking about DB_INPUT_PORT_TYPE_STRING here
I'm just grepping through your data dump (which is very handy)
the file is organized as a tree of nodes with DBS_DTM_NODE_ENUM_STRING types that finally (the dash) lead to a table indexed by a vector of (DTM_ENUM or DB_BURIED_PORT_TYPE or DB_INPUT_PORT_TYPE or DB_OUTPUT_PORT_TYPE or CDB_RE_TYPE or DEV_IO_STANDARD_ENUM), with an integer or float value associated to each index
so find a table, then find a value in the table
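[editor's note] The "find a table, then find a value" procedure can be sketched with nested dictionaries in Python. Everything in `TREE` is a hand-made placeholder (node names, key names, and values are not from a real .dmf dump), and the function names are hypothetical.

```python
# Placeholder data only: a tree of named nodes ending in a table whose keys
# are tuples of enum names and whose values are ints or floats.
TREE = {
    "NODE_A": {
        "NODE_B": {
            ("KEY_X", "KEY_Y"): 100,
            ("KEY_X", "KEY_Z"): 2.5,
        },
    },
}


def find_table(tree: dict, node_path: list) -> dict:
    """Walk down the tree along node_path and return the table at the end."""
    node = tree
    for name in node_path:
        node = node[name]
    return node


def find_value(table: dict, key):
    """Look up one entry of a table by its vector of enum names."""
    return table[tuple(key)]
```

For example, `find_value(find_table(TREE, ["NODE_A", "NODE_B"]), ["KEY_X", "KEY_Y"])` walks two levels down and then indexes the table.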
enums.txt gives you, well, the vocabulary I guess
does tell which mapping they decided on for cv
oh also, s2t_dump_delay_info=on
in quartus.ini makes it dump the real times for every pair of node and quartus_map time, *if* it doesn't segfault (and even then you get intermediate files)
way more interesting than the slack files which give you how much margin you have
Indeed :P
I have this nasty suspicion that both "COMBOUT" and "PROPAGATE{IN,OUT}" are overloaded here
I think COMBOUT is being used for both COMBOUT and SUMOUT
could be
And PROPAGATE is both carry and share{in,out}
I gave you the files as soon as I've been able to decode them
And I appreciate it
I hope I'm helping too >.>
haven't yet tried to trace if they're actually used
282 roughly matches the CLK -> DATAOUT arrival time
(which I have as 262 but)
Seems to match, kinda. I'm not sure I understand latch_enable vs. enable
There's also WE
And I bet that's an enable too :P
It feels like P2P means "point-to-point timing inside an element", with (in|buried)_(buried|out)_node telling you where the points are placed w.r.t the logical periphery of the element
That sounds about right
What are the numbers in set:N?
Do you know?
mux settings I'm pretty sure
configurable stuff in any case
I think P2P stuff is effectively arrival time
not sure what you mean by arrival
When the clock edge triggers, there's a propagation delay between that and the flop output changing
That delay is the arrival time, because it's when the effect of the clock edge arrives at the flop output
ok, we agree then
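[editor's note] The definition of arrival time agreed on above, as a tiny Python sketch. The function name and the picosecond unit are assumptions; 282 is just the CLK -> DATAOUT figure quoted earlier in the log, used as an example input.

```python
def arrival_time(clock_edge_ps: int, clk_to_out_ps: int) -> int:
    """Time at which the effect of a clock edge shows up at the flop output.

    clock_edge_ps: when the triggering clock edge happens
    clk_to_out_ps: propagation delay from that edge to the output changing
    """
    return clock_edge_ps + clk_to_out_ps


# With the clock edge at t=0 the arrival time is just the propagation delay.
example = arrival_time(0, 282)
```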
Not sure I entirely understand something like that though: