<tnt>
daveshah: Am I reading this right: icestorm says "A logic tile sends the output of its eight logic cells to its neighbour tiles.". So when connecting an LC (or RAM) output to a cell that's right next to it (including diagonal), you don't need to go through the v/h spans at all? And does that prevent that output from also being connected to a span to go a bit further, or can it be connected to both or even several neighbors?
<daveshah>
The output fans out to all 8 neighbours, all can be used if needed
rohitksingh_work has quit [Read error: Connection reset by peer]
Flea86 has quit [Quit: Goodbye and thanks for all the dirty sand ;-)]
the_new_lfsr has joined ##openfpga
<tnt>
mwk: oh, my bad, I didn't know.
feuerrot has quit [Ping timeout: 264 seconds]
ym has joined ##openfpga
<ZipCPU>
keesj: Yes, I have used AutoFPGA with DDR3 designs. AutoFPGA doesn't provide the DDR3 logic, but it can glue the controller to the rest of the design
<ZipCPU>
whitequark: I think it's a good start! Very similar to my own first FIFO proof
<ZipCPU>
You seem to have also gotten hooked--having now found bugs in "working"/production code
m4ssi has joined ##openfpga
<tnt>
I should try formal at some point ... but it really seems like so much work.
rohitksingh_work has joined ##openfpga
<q3k>
it's not
feuerrot has joined ##openfpga
rohitksingh_work has quit [Read error: Connection reset by peer]
<ZipCPU>
tnt: q3k is right. It's actually less work
rohitksingh has joined ##openfpga
<G33KatWork>
mwk: btw, my offer from 35C3 still stands - if you need test devices, I'll send some boards over. I'm drowning in Spartan 6 boards
<mwk>
G33KatWork: I have one now
<G33KatWork>
ah, cool
Miyu has joined ##openfpga
Zorix has quit [Ping timeout: 264 seconds]
Zorix has joined ##openfpga
pie_ has joined ##openfpga
pie___ has quit [Ping timeout: 245 seconds]
xobs has quit [Ping timeout: 252 seconds]
nrossi has quit [Ping timeout: 252 seconds]
thehurley3[m] has quit [Ping timeout: 250 seconds]
jfng has quit [Ping timeout: 250 seconds]
galv[m] has quit [Ping timeout: 260 seconds]
indefini[m] has quit [Ping timeout: 252 seconds]
AlexDaniel-old[m has quit [Ping timeout: 264 seconds]
azonenberg_work has quit [Ping timeout: 268 seconds]
nrossi has joined ##openfpga
xobs has joined ##openfpga
AlexDaniel-old[m has joined ##openfpga
indefini[m] has joined ##openfpga
jfng has joined ##openfpga
galv[m] has joined ##openfpga
thehurley3[m] has joined ##openfpga
pie_ has quit [Ping timeout: 246 seconds]
the_new_lfsr has quit [Quit: Page closed]
azonenberg_work has joined ##openfpga
kiboneu is now known as cnomad
<kc8apf>
ugh, why did they rewrite the xilinx bitstream parsing _again_
azonenberg_work has quit [Ping timeout: 252 seconds]
Ultrasauce has quit [Quit: Ultrasauce]
Ultrasauce has joined ##openfpga
jevinskie has quit [Ping timeout: 268 seconds]
jevinskie has joined ##openfpga
azonenberg_work has joined ##openfpga
pie_ has joined ##openfpga
m4ssi has quit [Remote host closed the connection]
<davidc__>
FWIW, I have a bunch of S3, S3E, S6, V2, V4, V5 and other boards sitting in my board scrap bin
emeb has joined ##openfpga
<davidc__>
if anyone needs a test device hooked up to JTAG, I can provide a shell
ayjay_t has quit [Read error: Connection reset by peer]
<davidc__>
(re the discussion above)
ayjay_t has joined ##openfpga
zem has quit [Ping timeout: 244 seconds]
<kc8apf>
davidc__: I'm building a library of dev boards. I'm happy to accept V2, V4, or V5 boards. Already have S3, S3E, and S6
<whitequark_>
ZipCPU: I'm having some trouble with the two-entry trick to verify the FIFO contract.
<whitequark_>
I wrote the state machine in the obvious way and it of course passes BMC.
<whitequark_>
but I think it's only useful for k-induction.
<davidc__>
kc8apf: the V2, V4, V5 stuff I have isn't dev boards; just scrap boards with working FPGAs
zem has joined ##openfpga
<kc8apf>
ah. I prefer boards I can get schematics to
<whitequark_>
as far as I can tell, verifying that with induction requires $live (assert eventually)
<whitequark_>
which does not seem to work with all backends
<whitequark_>
or engines
whitequark_ is now known as whitequark
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
Ultrasauce has quit [Quit: Ultrasauce]
Ultrasauce has joined ##openfpga
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
rohitksingh has quit [Remote host closed the connection]
<cr1901_modern>
Hmmm, from reading symbiyosys docs, proving liveness is a different mode from induction. I wonder if this means "whatever $live cells are transformed to" are ignored in aiger solvers during induction.
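For reference, one common safety-only way to phrase the two-entry FIFO contract in nMigen, so it stays within Assert/Assume and never needs $live. This is only a sketch with invented port names (w_en/w_rdy/w_data, r_en/r_rdy/r_data), not whitequark's or ZipCPU's actual harness:

    from nmigen import Elaboratable, Module, Signal
    from nmigen.asserts import Assert, Assume, AnyConst

    class FIFOOrderContract(Elaboratable):
        def __init__(self, dut, width=8, counter_bits=16):
            self.dut = dut                  # assumed stream-style FIFO ports
            self.width = width
            self.counter_bits = counter_bits

        def elaborate(self, platform):
            m = Module()
            m.submodules.dut = dut = self.dut

            wr = dut.w_en & dut.w_rdy       # a write is accepted this cycle
            rd = dut.r_en & dut.r_rdy       # a read is accepted this cycle

            wr_count = Signal(self.counter_bits)
            rd_count = Signal(self.counter_bits)
            with m.If(wr):
                m.d.sync += wr_count.eq(wr_count + 1)
            with m.If(rd):
                m.d.sync += rd_count.eq(rd_count + 1)

            idx = AnyConst(self.counter_bits)   # position of the first tracked write
            d1  = AnyConst(self.width)
            d2  = AnyConst(self.width)

            # pin down what is written at positions idx and idx+1
            with m.If(wr & (wr_count == idx)):
                m.d.comb += Assume(dut.w_data == d1)
            with m.If(wr & (wr_count == idx + 1)):
                m.d.comb += Assume(dut.w_data == d2)

            d1_written = Signal()
            d2_written = Signal()
            with m.If(wr & (wr_count == idx)):
                m.d.sync += d1_written.eq(1)
            with m.If(wr & (wr_count == idx + 1)):
                m.d.sync += d2_written.eq(1)

            # FIFO ordering: the idx-th read returns d1, the next one returns d2
            with m.If(rd & (rd_count == idx) & d1_written):
                m.d.comb += Assert(dut.r_data == d1)
            with m.If(rd & (rd_count == idx + 1) & d2_written):
                m.d.comb += Assert(dut.r_data == d2)

            return m

As written this only gets you through BMC; making it hold under unbounded induction still means adding invariants that tie wr_count/rd_count to the DUT's internal pointers, which is the strengthening being discussed above.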
ZipCPU has quit [Ping timeout: 252 seconds]
ZipCPU has joined ##openfpga
kem_ has quit [Quit: Connection closed for inactivity]
the_new_lfsr has joined ##openfpga
X-Scale` has joined ##openfpga
X-Scale has quit [Ping timeout: 258 seconds]
X-Scale` is now known as X-Scale
<azonenberg_work>
cr1901_modern: $live isnt something i remember seeing
<azonenberg_work>
yosys has added a lot of new formal capabilities lately
<azonenberg_work>
i need to start playing with them...
<cr1901_modern>
azonenberg_work: They're implemented in the AIGER backend
jcarpenter2 has quit [Ping timeout: 250 seconds]
jcarpenter2 has joined ##openfpga
<azonenberg_work>
cr1901_modern: well i'm also embarking on a massive campaign to systemverilog-ify all of my infrastructure
<azonenberg_work>
so everything sucks less
<cr1901_modern>
My sincere sympathies
<azonenberg_work>
The next step is to fox yosys so it can build my stuff
<azonenberg_work>
fix*
<azonenberg_work>
And then work on getting formal working on that
jcarpenter2 has quit [Ping timeout: 245 seconds]
<kbeckmann>
nextpnr-ecp5: what are reasonable timings for a design that basically just reads and stores 2048 * 16 bit words in BRAM and then writes them back? I'm currently writing a loopback for a ft600 + ecp5 combo and am getting ~125 MHz. Just want to check if that is good/decent/bad. I need 100MHz so it's fast enough for my usecase right now so there isn't a problem, I'm just wondering.
the_new_lfsr has quit [Quit: Page closed]
<whitequark>
what is your critical path?
<daveshah>
ECP5 BRAMs have comparatively high clock-to-out times (5.6ns in the slowest grade)
<kbeckmann>
1.5 ns logic, 6.4 ns routing
<kbeckmann>
i'm not really sure how to read the output from the critical path report.. i guess i'm looking at the most expensive step(s) in the path? I'm still learning, coming from software.
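If the slow step turns out to be a path that starts at a BRAM output, a common mitigation is to spend one cycle re-registering the read data before any wide logic looks at it. A rough nMigen sketch (names invented here), at the cost of one cycle of latency:

    from nmigen import Elaboratable, Module, Memory, Signal

    class BufferedRead(Elaboratable):
        def __init__(self):
            self.mem    = Memory(width=16, depth=2048)
            self.rp     = self.mem.read_port(transparent=False)
            self.r_data = Signal(16)     # registered copy for downstream logic

        def elaborate(self, platform):
            m = Module()
            m.submodules.rp = self.rp
            # the BRAM clock-to-out now only has to reach this flop, not the
            # whole downstream cone, so that path is clk-to-out + one route
            m.d.sync += self.r_data.eq(self.rp.data)
            return m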
<adamgreig>
it's hard when the critical path is "a net i recognise", twenty $abc$????, "another net, what are you doing there"
<whitequark>
ah, abc.
<adamgreig>
my ip stack is only hitting 77MHz now ;(
<adamgreig>
but nmigen is sending arp replies so that's something
<whitequark>
adamgreig: you should try synthesizing with flowmap.
<whitequark>
it's depth optimal but not area optimal, so your design may grow about 1.5-2x
<adamgreig>
atm about 500 luts for the mac and another 500 for the ip stack, compiling now
<adamgreig>
1013 to 1672 luts
<adamgreig>
77 to... 61MHz :p
<whitequark>
yeah, but now you get to learn the critical path
<whitequark>
well, in this case, there's another problem
<whitequark>
flowmap is not doing resynthesis, which is probably what's biting you so much
<whitequark>
e.g. on picorv32 there is very little difference in Fmax
<whitequark>
fwiw, i am quite interested in such an IP stack
<daveshah>
That's because the crit path in picorv32 is mostly carry...
<daveshah>
The problem is, flowmap is a depth optimal mapping for a given circuit
<whitequark>
glasgow revE might use it
<adamgreig>
this is probably a bit too minimal to be that interesting, in that it's quite "answer icmp echo and arp, and send/receive udp, and let's do without a local arp cache" sort of thing
<gruetzkopf>
mmh i really need some ECP5 board
<daveshah>
It doesn't guarantee that circuit is optimal in the first place
<adamgreig>
though some of the ideas for nmigen might still work with something a bit more general purpose
<whitequark>
yes, that's what i mean
<whitequark>
there is such a DSL for Yumewatari too
<sorear>
nothing can *guarantee* that a large circuit is optimal
<whitequark>
does yours handle arbitrary gearing btw?
<adamgreig>
the ip stack talks to the mac through a bram for packet data and an async fifo for bram start address and length
<adamgreig>
sorry that's not really an answer to your question
<adamgreig>
i have avoided worrying about gearing by putting all the data through brams as it crosses clock domains, bytes at a time
<sorear>
how many bytes wide is your datapath in each clock domain?
<adamgreig>
1
<whitequark>
can you enlarge that?
<adamgreig>
probably yea
<adamgreig>
just avoided needing to special case not-full-words in the mac
<whitequark>
mmkay
* whitequark
goes back to formally verifying a fifo
<adamgreig>
didn't expect to need to and thought this would be cheaper area-wise
<adamgreig>
anyway it's a wip, we'll see how it ends up once it's actually got udp working
<adamgreig>
pleased at how relatively quickly a first cut comes together though
<whitequark>
i want tcp :]
<adamgreig>
honestly i was and still am tempted to just stick a small cpu in
<whitequark>
boneless? :D
<adamgreig>
i'm spending 500 luts and a bram or two on the ip stack so far and it's very bare-bones
<adamgreig>
well quite
<adamgreig>
i was working my way to the pun but yes
<adamgreig>
it'd cost as much resource and i could just program a small ip stack instead
<whitequark>
lmk, i'll finish boneless then
<whitequark>
right now i'm somewhat trapped in nmigen hell
<adamgreig>
whereas my critical path now is the mux chain writing the rx memory address through three protocol stacks or whatever
<whitequark>
sounds like it needs more registers
<adamgreig>
the eth fsm is delegating to the ip4 fsm which delegates to the icmp fsm etc etc
<adamgreig>
yea ideally i should pipeline it but that makes working out the timing much more annoying
<whitequark>
hm
<adamgreig>
especially because currently each byte of incoming packet to process is one clock and one state in the fsm so it just eats through it
<whitequark>
why can't you make it one FSM?
<adamgreig>
probably can do
<adamgreig>
this seemed nicer conceptually but is definitely worse from a circuit architecture pov
<sorear>
explaining "arbitrary gearing": yumewatari is parameterized over the datapath width, so you can run it with a 32-bit datapath to more easily make timing at 2.5 or 5 GT/s, or maybe 16-bit to save LUTs if you have faster fabric and better placement
<whitequark>
you can have them separate in code
<adamgreig>
mmm
<adamgreig>
i expect to rejig it a few times as it goes
<adamgreig>
right now it's quite oriented to replying to incoming packets so i'm not sure how i'll do transmitting a new packet ab initio
<whitequark>
ah, the smoltcp problem :D
<adamgreig>
i might also rejig it to just stream the incoming packet byte by byte without any control from the protocol layers
<adamgreig>
right now them having control lets them skip over unneeded bytes but probably quicker overall to not
<adamgreig>
then you'd lose the long mux chain and can stream through the pipeline to the tx bram without worrying about pipeline delay
<adamgreig>
hah, yes
<adamgreig>
funny how some things just keep happening
<adamgreig>
sorear: is that both the mac layer and the protocol engine?
<adamgreig>
so like my "ip stack" processes incoming packets a byte at a time, but in ip4 many fields are 16bit anyway so you could imagine rewriting it to do that, but i'm less sure how you'd parameterise it
<adamgreig>
i guess cleverer metaprogramming
<sorear>
yumewatari is a PCIe stack, it currently has most of the Physical Layer and none of the Data Link Layer, "MAC" is not a term used in the PCIe spec
<adamgreig>
tbh most of it atm is already "skip 2 bytes, check next 4 bytes match this constant, copy next 6 bytes to start of tx memory" and you could just divide all those numbers by 2
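That recipe maps almost one-for-one onto a byte-per-cycle nMigen FSM; a rough sketch, with the magic constant, port names, and error handling all made up for illustration:

    from nmigen import Elaboratable, Module, Signal, Array

    class HeaderMatch(Elaboratable):
        def __init__(self):
            self.rx_stb  = Signal()      # a received byte is valid this cycle
            self.rx_data = Signal(8)
            self.tx_we   = Signal()      # write strobe into the tx memory
            self.tx_addr = Signal(3)
            self.tx_data = Signal(8)
            self.bad     = Signal()      # the constant check failed

        def elaborate(self, platform):
            m = Module()
            magic = Array([0x08, 0x00, 0x45, 0x00])   # invented 4-byte constant
            count = Signal(3)
            m.d.comb += self.tx_data.eq(self.rx_data)

            with m.FSM():
                with m.State("SKIP"):        # skip 2 bytes
                    with m.If(self.rx_stb):
                        m.d.sync += count.eq(count + 1)
                        with m.If(count == 1):
                            m.d.sync += count.eq(0)
                            m.next = "CHECK"
                with m.State("CHECK"):       # next 4 bytes must match the constant
                    with m.If(self.rx_stb):
                        with m.If(self.rx_data != magic[count]):
                            m.d.comb += self.bad.eq(1)
                            m.d.sync += count.eq(0)
                            m.next = "SKIP"  # real code would resynchronise properly
                        with m.Elif(count == 3):
                            m.d.sync += count.eq(0)
                            m.next = "COPY"
                        with m.Else():
                            m.d.sync += count.eq(count + 1)
                with m.State("COPY"):        # copy next 6 bytes to start of tx memory
                    m.d.comb += self.tx_addr.eq(count)
                    with m.If(self.rx_stb):
                        m.d.comb += self.tx_we.eq(1)
                        m.d.sync += count.eq(count + 1)
                        with m.If(count == 5):
                            m.d.sync += count.eq(0)
                            m.next = "SKIP"
            return m

Halving the byte counts for a 16-bit datapath mostly means changing the constants and widening rx_data, which is the "divide all those numbers by 2" point.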
<adamgreig>
does it have to deal with things akin to ip headers or once the link is established is it just streaming plain data?
<adamgreig>
i could readily imagine having the mac layer in this do 16 bits or whatever into a wider memory (16bit would match ice40 nicely), but ip protocol stuff seems like it might be more annoying
<sorear>
whitequark has a thing which uses a byte-level grammar to generate multi-byte datapaths; I suspect it's overkill for processing PCIe training sequences but quite appropriate for TCP
<sorear>
the data link layer is packet oriented
<sorear>
but my knowledge of the plans falls off rapidly and I don't want to speak for things I don't know
<whitequark>
sorear: not entirely overkill if i want to do full validation of training sets
<whitequark>
however, it is definitely more oriented at DLLPs
<sorear>
also, PCIe has a 32-bit bias; TLPs are explicitly defined as sequences of 32-bit words, the data link and physical layers are nominally byte oriented but go to some trouble to preserve 32-bit alignment
<adamgreig>
i imagine that helps
<adamgreig>
ip4 is pretty alright for 16bit at least
<adamgreig>
how bad is using an ice40 bram as 8bit wide? maybe i could save some address logic by using twice as many 16bit ones
<adamgreig>
(I assume the hardware is only 16bit and using them as 8bit instantiates some address logic to select output bytes)
<whitequark>
no
<adamgreig>
huh
<whitequark>
the hardware can do 16,8,4,2
<adamgreig>
why have i heard the non-16bit primitives described as pseudo-primitives?
<adamgreig>
is it just that the actual primitive has a parameter..
Bike has joined ##openfpga
<sorear>
right
<adamgreig>
SB_RAM40_4K vs SB_RAM512x8 for instance
<sorear>
the primitive has All Of The Parameters
<adamgreig>
got it
<sorear>
most of the pseudo-primitives are just primitives with some parameters fixed
<adamgreig>
so using it as 8x512 is just fine?
<whitequark>
yes
<adamgreig>
that's nice
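In nMigen terms the whole question reduces to something like the sketch below (names invented); with an 8-bit width, yosys's synth_ice40 should drop this straight into one SB_RAM40_4K configured as 512x8, with no byte-select logic bolted around a 256x16 block:

    from nmigen import Elaboratable, Module, Memory

    class ByteBuf(Elaboratable):
        def __init__(self):
            self.mem = Memory(width=8, depth=512)
            self.wp  = self.mem.write_port()
            self.rp  = self.mem.read_port(transparent=False)

        def elaborate(self, platform):
            m = Module()
            m.submodules.wp = self.wp
            m.submodules.rp = self.rp
            return m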
the_new_lfsr has joined ##openfpga
<adamgreig>
should have time this weekend to push the ip stack a bit further anyway
<adamgreig>
right now it massively cheats on the asyncfifo but once a formally verified one lands in lib.io it should slot right in ;)