<rqou>
anyways, time to actually get something done today :P
<awygle>
hey folks, new to this space, mostly just lurking to increase my understanding. had a quick question about tools. it seems like synthesis is fairly cohesive (mostly yosys) but p&r is more fragmented. even within openfpga gp4par and xbpar don't seem to share code. why is that? iiuc the underlying algorithm is basically the same, and the problems seem superficially similar
<rqou>
so xbpar is a subcomponent of gp4par
<rqou>
as for the algorithms, there seem to be currently two(ish) major approaches to p&r
<rqou>
naive simulated annealing, and VPR
<rqou>
afaik all the existing p&r tools are the first kind
<rqou>
azonenberg can probably provide more info
<rqou>
one big problem seems to be that a lot of the open-source p&r work either doesn't scale or has lots of assumptions about the particular target
<azonenberg>
awygle: so, gp4par is the application that does par for greenpak4 specifically
<azonenberg>
xbpar is an abstracted library for par of a netlist in a device with a crossbar-based (vs 2D routing fabric) interconnect
<azonenberg>
including but not limited to greenpak and coolrunner
<azonenberg>
rqou's coolrunner par and gp4par both use xbpar under the hood
<azonenberg>
there's probably not as much generic code in xbpar as there could be, that's a TODO refactoring
<azonenberg>
But they do share code
<azonenberg>
I don't know much about how the VPR project's routing works
<azonenberg>
The two main approaches are simulated annealing, which is what xbpar does
<azonenberg>
and some kind of fancy linear algebra that i think vivado does
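[editor's sketch] For readers new to the term, the simulated annealing approach mentioned here can be illustrated roughly as below. This is a generic toy placer on a 2D grid, not xbpar's actual code (xbpar targets crossbar devices). The names `wirelength` and `anneal`, the geometric cooling schedule, and the swap-two-cells move are all assumptions made for illustration.

```python
import math
import random


def wirelength(placement, nets):
    """Sum of Manhattan distances over all two-pin nets."""
    total = 0
    for a, b in nets:
        (ax, ay), (bx, by) = placement[a], placement[b]
        total += abs(ax - bx) + abs(ay - by)
    return total


def anneal(cells, nets, sites, steps=20000, t_start=5.0, t_end=0.01, seed=0):
    """Toy annealer: swap two cells, keep the swap if it helps, or
    occasionally even if it hurts (more often while the temperature is high).

    Assumes len(sites) >= len(cells); unused sites are simply ignored.
    """
    rng = random.Random(seed)
    sites = list(sites)
    rng.shuffle(sites)
    placement = dict(zip(cells, sites))
    cost = wirelength(placement, nets)
    best, best_cost = dict(placement), cost

    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        a, b = rng.sample(cells, 2)
        placement[a], placement[b] = placement[b], placement[a]
        new_cost = wirelength(placement, nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost  # accept the move
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        else:
            # Reject: undo the swap.
            placement[a], placement[b] = placement[b], placement[a]
    return best, best_cost


if __name__ == "__main__":
    cells = list("abcdef")
    nets = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
    sites = [(x, y) for x in range(3) for y in range(2)]
    print(anneal(cells, nets, sites, steps=2000))
```

A real tool would use an incremental cost update and a timing-aware cost function rather than recomputing total wirelength after every move.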
<azonenberg>
i have ideas on how to optimize using physics simulation techniques (basically treat the timing paths as a mass-spring system) but no idea if it'll work
<balrog>
VPR seems to use a smarter simulated annealing approach? I might be looking at an old and not as relevant paper though
<rqou>
pointfree mentioned that a while ago
<azonenberg>
Could be, i've only targeted crossbar based devices where none of that worked
<rqou>
clifford came over and pointed out that the mass-spring-analogy approaches are basically how ASIC p&r works
<azonenberg>
interesting
<azonenberg>
that may be what vivado is doing?
<balrog>
simple harmonic oscillator, shows up everywhere :P
<azonenberg>
i used to work on molecular dynamics stuff
<azonenberg>
that scales very well
<azonenberg>
like, to hundreds of thousands of cores
<azonenberg>
i've dreamed my entire FPGA career of something that you can run on a rack of xeons and PAR a giant netlist in 30 seconds
<azonenberg>
So it may have potential for that
<awygle>
rqou: azonenberg: thanks :) my knowledge has been increased. so i'd imagine when targeting the greenpak looking at arachne-pnr which targets a 2D fabric architecture was of limited use? i remember being surprised that yosys was used in both places but the p&r was different
<azonenberg>
buy up 1024 ec2 instances, run a par job in a couple minutes
<azonenberg>
Yes, arachne is of no use for greenpak
<azonenberg>
i wrote my own b/c i couldnt find any good open par tools for crossbar architectures
<rqou>
i seem to remember reading that arachne doesn't scale either?
<azonenberg>
I'm going to look at both arachne and VPR as possible baseline tools for xilinx FPGA support down the road
<azonenberg>
I know nothing about the internals of either
<azonenberg>
they're both on my reading list
<awygle>
azonenberg: thanks again, think i get it now
<azonenberg>
rqou: what i want to try doing, when we design an FPGA par
<azonenberg>
is focus on scalability
<azonenberg>
and parallelism, from the beginning
<rqou>
dumb verilog question: how do i model a transparent latch with async set?
<azonenberg>
or at least early on
<azonenberg>
just an always @(*) block
<azonenberg>
specify the desired behavior, when nothing writes it'll latch
<rqou>
yosys somehow generates a giant mess when I do that
<azonenberg>
Yosys has poor support for latches last time i checked
<azonenberg>
i have support for instantiating them in gp4par but i havent tried inferring
<azonenberg>
generally, it doesnt handle them well
<azonenberg>
I have a long list of grievances with yosys that i want to fix
<rqou>
ah so this isn't a problem on my end specifically
<azonenberg>
my #1 priority is preserving instance names better through ABC and various optimization passes
<azonenberg>
so you can figure out what primitive in the generated netlist maps to what hdl object
<balrog>
why does everyone say "don't use latches in FPGA/CPLD design"?
<rqou>
timing analysis doesn't work, for one thing
<balrog>
ah, feedback
<azonenberg>
async stuff in general is a huge pain in the butt
<azonenberg>
But since i use latches so rarely it isnt a big priority
<azonenberg>
i can instantiate primitives on the rare occasions they're needed
<azonenberg>
but generally a latch is a bug in what's supposed to be stateless combinatorial logic
<rqou>
but the cpld latches have bonus fun with the async set/reset :P
<whitequark>
when ever are latches appropriate?
<azonenberg>
whitequark: ultra low power designs in greenpak when you don't have a clock and don't want constant dynamic power
<rqou>
when you are my father working on a design decades ago and were manually stealing time from other pipeline stages? :P
<cr1901_modern>
constant dynamic power?
<azonenberg>
the async state machine block is in fact specifically designed for this
<azonenberg>
cr1901_modern: as in, the clock toggling uses power even if nothing is happening
<azonenberg>
if you dont have a clock, things only use dynamic power when an input changes
<rqou>
what about pipeline stage timing stealing/shifting hacks? :P
<cr1901_modern>
Oh right. I guess the clock transition would use _some_ power even if input didn't change
<azonenberg>
Yeah exactly
<azonenberg>
and with a device that has standby power in the hundreds of nA
<rqou>
i thought the clock tree uses most of the power? not the useless toggling?
<azonenberg>
it makes a difference
<azonenberg>
rqou: even having the oscillator enabled uses power
<azonenberg>
greenpak standby @ 3.3V, typical process/temp: 370 nA
<rqou>
oh wtf
<rqou>
wow
<azonenberg>
LF oscillator (1.7 kHz): 890 nA
<azonenberg>
RC oscillator, 25 kHz : 6020 nA
<azonenberg>
that's before you add any loads to the osc output
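[editor's sketch] To put the quoted currents in perspective, the snippet below converts them to power and compares each to standby. It assumes that all three figures are total supply current at the same 3.3 V typical condition; the chat only states the voltage for the standby number, so treat that as an assumption. The function names are illustrative.

```python
SUPPLY_V = 3.3  # volts; stated in the chat for the standby figure only

# Currents in nanoamps, as quoted in the chat.
CURRENT_NA = {
    "standby": 370,
    "LF oscillator (1.7 kHz)": 890,
    "RC oscillator (25 kHz)": 6020,
}


def power_uw(current_na, volts=SUPPLY_V):
    """P = I * V, with I given in nA and the result in microwatts."""
    return current_na * 1e-9 * volts * 1e6


def relative_to_standby(currents):
    """Each current as a multiple of the standby current."""
    base = currents["standby"]
    return {name: na / base for name, na in currents.items()}


if __name__ == "__main__":
    ratios = relative_to_standby(CURRENT_NA)
    for name, na in CURRENT_NA.items():
        print(f"{name}: {power_uw(na):.3f} uW, {ratios[name]:.1f}x standby")
```

Under that assumption, enabling the 25 kHz RC oscillator costs roughly sixteen times the standby current before any logic toggles at all.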
<azonenberg>
They do not publish dynamic power for FF/LUT toggling etc
<azonenberg>
that's something i want to try measuring
<rqou>
hmm i just noticed my techmapping has a problem
<rqou>
abc -sop is a giant hammer that applies to _everything_
<rqou>
not just "stuff feeding into macrocells"
<azonenberg>
lol
<azonenberg>
yes, abc is a hammer
<rqou>
it also eats all your cell names :P
<rqou>
i think fixing this entails the same work that can probably make the xor gate work
<azonenberg>
Yes
<azonenberg>
Re eating cell names
<azonenberg>
I dont plan to patch ABC
<azonenberg>
but i think if i know the nets going in and out
<azonenberg>
i can assign names that make some degree of sense
<azonenberg>
at least so you can tell what line or two of rtl it's related to
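[editor's sketch] The naming idea described here (derive a readable name from the nets going in and out of a cell) might look something like the following. Nothing here is Yosys or ABC API: the cell dictionaries, the `recover_name` helper, and the naming scheme are all hypothetical. The only convention borrowed from Yosys is that auto-generated names start with `$`.

```python
def is_auto(net):
    """Treat names starting with '$' as tool-generated (Yosys convention)."""
    return net.startswith("$")


def recover_name(cell):
    """Pick a readable name for a cell whose original name was lost.

    Prefer the output net if the user named it, then any user-named
    input nets, and finally fall back to the cell's existing name.
    """
    out = cell["output"]
    if not is_auto(out):
        return f"{cell['type']}_drives_{out}"
    named_inputs = sorted(n for n in cell["inputs"] if not is_auto(n))
    if named_inputs:
        return f"{cell['type']}_from_{'_'.join(named_inputs)}"
    return cell["name"]


if __name__ == "__main__":
    cells = [
        {"name": "$abc$1", "type": "LUT", "inputs": ["$n1", "clk_en"], "output": "led_count[0]"},
        {"name": "$abc$2", "type": "LUT", "inputs": ["rst", "a"], "output": "$n7"},
        {"name": "$abc$3", "type": "LUT", "inputs": ["$n7"], "output": "$n8"},
    ]
    for c in cells:
        print(c["name"], "->", recover_name(c))
```

Cells buried between two anonymous nets keep their generated name, which matches the "some degree of sense" caveat: only cells adjacent to user-named nets can be traced back to a line of RTL this way.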
<cr1901_modern>
"stealing time from other pipeline stages" can you elaborate (bed time for me I think)?
<rqou>
so the magic search term is "time borrowing"
<rqou>
but basically for a fully synchronous design, you can only run as fast as the _slowest_ pipeline stage
<rqou>
but if you have a really slow pipeline stage next to a really fast pipeline stage...
<rqou>
if you replace the FF between them with a latch, this allows the "slow" stage to use some of the time from the "fast" stage
<rqou>
because as long as the clock is still high, the output from the "slow" stage will still propagate through the latch (rather than missing the edge and getting blocked with a FF)
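[editor's sketch] A small worked example of the time-borrowing argument above. It uses an idealized model: two pipeline stages, a latch between them that is transparent while the clock is high, and no setup/hold time, clock-to-Q delay, latch delay, or skew. The function names and the 50% duty cycle default are assumptions for illustration.

```python
def min_period_ff(slow, fast):
    """With a flip-flop between the stages, each stage needs a full period."""
    return max(slow, fast)


def min_period_latch(slow, fast, duty=0.5):
    """With a transparent latch between the stages.

    The slow stage may finish any time before the latch closes, i.e.
    within (1 + duty) periods, but both stages together must still
    fit in two periods, and the fast stage alone in one.
    """
    return max(fast, slow / (1.0 + duty), (slow + fast) / 2.0)


if __name__ == "__main__":
    slow_ns, fast_ns = 12, 4
    print(min_period_ff(slow_ns, fast_ns))     # 12
    print(min_period_latch(slow_ns, fast_ns))  # 8.0
```

With a 12 ns stage next to a 4 ns stage, the flip-flop version is limited to a 12 ns period while the latch version can run at 8 ns in this model. If the two stages are already balanced, the latch buys nothing.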
<azonenberg>
Better option: re-time the registers :p
<azonenberg>
push them a few gates later in the path
<rqou>
that works too :P
<rqou>
azonenberg: something is wrong with the xilinx LDCP primitive documentation
<rqou>
look at the second and third rows of the logic table
<openfpga-github>
[openfpga] azonenberg pushed 2 new commits to master: https://git.io/vQ36x
<openfpga-github>
openfpga/master b3f3b55 Andrew Zonenberg: Added configurable-edge flipflop for use in the macrocell. Supports async set/reset and rising/falling/DDR edges. No latch support yet.
<openfpga-github>
openfpga/master bae7c52 Andrew Zonenberg: Imported synchronizer cores from Antikernel IP cores repo
<azonenberg>
rqou: and what's wrong with them?
<rqou>
the second row has an X in the PRE input
<rqou>
so it overlaps the third row
<azonenberg>
Hmm
<azonenberg>
Yeah, that does seem wrong. I think that was probably meant to be a 1
<azonenberg>
or...
<azonenberg>
hmm
<azonenberg>
PRE is lower precedence than G
<rqou>
yeah, that too
<rqou>
this also means that it's not equivalent to $_DLATCHSR_PPP_
<rqou>
$_DLATCHSR_PPP_ works the "9500" way
<rqou>
where the precedence is reset, set, latch
<rqou>
but the text says that xc2 is reset, latch, set
<rqou>
fun
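[editor's sketch] The precedence mismatch described above can be made concrete with two tiny behavioral models. These follow the orderings exactly as stated in the chat ("reset, set, latch" versus "reset, latch, set") with all controls active high; they have not been checked against the Yosys cell library or the Xilinx documentation, and the function names are made up.

```python
from itertools import product


def latch_9500_style(q, d, g, set_, rst):
    """Precedence: reset, then set, then the transparent gate."""
    if rst:
        return 0
    if set_:
        return 1
    if g:
        return d
    return q  # hold


def latch_xc2_style(q, d, g, set_, rst):
    """Precedence: reset, then the transparent gate, then set."""
    if rst:
        return 0
    if g:
        return d
    if set_:
        return 1
    return q  # hold


def mismatches():
    """All (q, d, g, set_, rst) input combinations where the models differ."""
    return [
        bits
        for bits in product([0, 1], repeat=5)
        if latch_9500_style(*bits) != latch_xc2_style(*bits)
    ]


if __name__ == "__main__":
    print(mismatches())
```

In this model the two orderings disagree only when set and the gate are both asserted, reset is not, and D is 0, so a design that never asserts set while the latch is transparent would not see the difference.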
<rqou>
i don't actually know how to teach yosys about this
<rqou>
screw it for now i guess?
<azonenberg>
Lol
<azonenberg>
yeah leave it out for now
<azonenberg>
but try to issue a warning if they're used
<rqou>
oh yosys can't even infer $_DLATCHSR_* so that can't happen unless someone is doing something really weird
<rqou>
heh i just realized that my par engine was completely missing edges into BUFGs
<azonenberg>
how do you do that? lol
<rqou>
i just never implemented this particular function because registers weren't implemented
<azonenberg>
Any time gp4par tried to route a path that didnt exist it'd just fail b/c it couldn't find an edge to map it to
<rqou>
yeah so it fails right now too
<rqou>
or at least it should
<rqou>
i don't think i'm going to actually add it yet because the code needs a huge refactor
<rqou>
hmm i wonder if inverting the clock on a DDR FF has any observable effect?
<azonenberg>
Don't think so
<rqou>
it shouldn't, but do we know for sure? :P
<azonenberg>
Nope
<azonenberg>
:p
<rqou>
hmm why does my yosys netlist have extraneous BUF cells?
<azonenberg>
no idea
<azonenberg>
also, i think i just found a bug in my ZIA docs
<azonenberg>
oh nvm its not the zia
<azonenberg>
i see what i did
* azonenberg
pokes a bit
<azonenberg>
Soooo ibuf_to_zia[20] should be a 1 Hz squarewave...
* azonenberg
waits for build to confirm
<rqou>
FTDCP is my favorite cpld primitive :P
<azonenberg>
Why?
<azonenberg>
Ok, so...
<rqou>
dual-edge TFF
<azonenberg>
lolwut
<azonenberg>
so it tracks the incoming clock
<azonenberg>
anyway, ibuf_to_zia[20] is a 1 Hz squarewave
<azonenberg>
right_zia_out[6] is 1'b0
<azonenberg>
and the bitstream in my JED for row 6 is L000048 01111011*
<azonenberg>
This smells wrong
<azonenberg>
zia_row_inputs[6][2] is zbus[21], which should be ibuf_in[20]
<azonenberg>
Welllp
<azonenberg>
the ZIA bitstream is wrong
<azonenberg>
now to figure out how the fsck THAT happened...
<rqou>
azonenberg: we should think about how to move forward
<azonenberg>
Awesome
<azonenberg>
let me send him a PR for my recent fix too
<rqou>
the code with the giant Rust->C++->Rust stack is kinda a mess
<rqou>
also hugely un-ergonomic
<azonenberg>
Yes
<azonenberg>
honestly, if i had anything to say i'd rewrite it in C++
<rqou>
biggest missing xbpar features right now are *) nodes that have to move as a group *) nodes that can become shared
<rqou>
e.g. right now each AND term uses a unique ZIA row to get its inputs
<rqou>
the biggest thing i hated about C++ was how much effort "parsing stuff" took
<azonenberg>
So the way i'd implement this is
<azonenberg>
i'd first place all of the PLA terms
<azonenberg>
actually no
<azonenberg>
that wouldnt work, nvm
<rqou>
sharing andterms doesn't work either :P
<azonenberg>
But yeah i think all c++ is likely to be more maintainable
<azonenberg>
for the short term
<rqou>
all rust? :P
<azonenberg>
Lol
<azonenberg>
also, THIS is interesting...
<rqou>
the FFI shit is actually more lines of code than the entire PAREngine
* azonenberg
looks
<azonenberg>
so i have a counter working now
<azonenberg>
but something is seriously borked with the ordering
<azonenberg>
i'm not even sure how to explain it
<azonenberg>
Yeah i def still have bugs
<rqou>
heh xc2par is less than 2k lines of code
<azonenberg>
ok yeah i have bugs
<azonenberg>
i have a count that is supposed to go up every 50 clocks
<azonenberg>
i.e. every 50 clocks i increment the 4-bit LED counter
<azonenberg>
instead, led_count[0] and [2] are flashing at ~1 Hz
<azonenberg>
maybe 2 Hz?
<azonenberg>
and 1 and 3 are off
<azonenberg>
so, buggy... just not sure why yet
<azonenberg>
lol
<azonenberg>
and now i have another test that keeps led[3:1] at 0 and drives led_0 every ~500ms
<azonenberg>
led[3] is somehow blinking
<azonenberg>
Gonna investigate that tomorrow
<openfpga-github>
[openfpga] azonenberg pushed 2 new commits to master: https://git.io/vQ3QV
<openfpga-github>
openfpga/master bc6d7af Andrew Zonenberg: Continued macrocell support. Still lots of known bugs
<openfpga-github>
openfpga/master 40f2adf Andrew Zonenberg: Fixed incorrect ZIA bitstream ordering
<rqou>
offtopic: I _just now_ realized (by looking at the map) that Tuen Mun's "giant shopping complex clusterf*ck" is actually a mixed-use area with residential space
<rqou>
but i have absolutely no clue how these are connected together
<rqou>
huh OSM has better data here, but it's out of date