<meawoppl>
can anyone in the channel help me with how DDR signals should be treated in ice40 packages?
<ZipCPU>
Sure
<ZipCPU>
What's up?
<ZipCPU>
Typically, I handle DDR signals by directly instantiating an SB_IO primitive
<meawoppl>
that is already a helpful lead
<whitequark>
there's no other way to do this on iCE40 other than instantiating SB_IO
<meawoppl>
looking at that (macro?) it looks like you then get two output signals?
<daveshah>
No, those will be the outputs from the DDR input
<daveshah>
The output of the DDR output block is "PACKAGE_PIN"
<daveshah>
that must be driving a top level output (or inout)
<meawoppl>
oh, I think I am getting the language backward here
<daveshah>
You mean a DDR input primitive (i.e. the external pin is an input) ?
<daveshah>
or perhaps better said input DDR primitive
<meawoppl>
package pin (or two b/c of differential input here) -> SB_IO -> 2 inputs
<meawoppl>
I am implementing a MIPI receiver
<daveshah>
Yeah
<daveshah>
so you want to use D_IN_0 and D_IN_1 on the SB_IO to drive your logic
<daveshah>
the former is registered on the positive edge and the latter on the negative edge, iirc
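As an illustration of the direct instantiation, here is a minimal sketch of a DDR input, assuming a differential pair whose positive pin is `dq_p` and a sampling clock `clk` (names hypothetical):
```
wire d0, d1;
SB_IO #(
    .PIN_TYPE(6'b0000_00),          // no output; DDR-registered input
    .IO_STANDARD("SB_LVDS_INPUT")   // differential input buffer
) dq_io (
    .PACKAGE_PIN(dq_p),             // connect to the positive pin of the pair
    .INPUT_CLK(clk),
    .D_IN_0(d0),                    // captured on the posedge of INPUT_CLK
    .D_IN_1(d1)                     // captured on the negedge of INPUT_CLK
);
```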
<meawoppl>
ah, so those are set on the edges; then can I just look at one edge of the signal in my logic reading those?
<daveshah>
Yes, all your logic following would be on the posedge
<meawoppl>
posedge of the clock used to sync SB_IO
<daveshah>
yes
<meawoppl>
and I expect a delay of 2 cycles then?
<daveshah>
often you would have a posedge register as the next thing after the SB_IO at least on D_IN_1
<daveshah>
yeah
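In code, that re-registering might look like this (a sketch, continuing the `d0`/`d1` nets from the SB_IO sketch above):
```
reg d0_r, d1_r;
always @(posedge clk) begin
    d0_r <= d0;
    d1_r <= d1;   // brings the negedge-captured bit into the posedge domain
end
```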
<meawoppl>
awesome, that makes so much more sense now
<meawoppl>
I tried some really hacky stuff making an xor'ed signal based on posedge+negedge logic
<meawoppl>
and, the more I think about it, the more I am surprised it worked at all
<daveshah>
Yeah that's nasty
<ZipCPU>
meawoppl: Yeah, most FPGAs don't support that kind of logic. It's in the language, I think, because certain ASIC logic needs to do that kind of stuff
<ZipCPU>
(Not certain, though, since ... I've never done ASIC work)
<meawoppl>
it just seems really fraught to me now, thinking about the flip-flop state progression, and I think the data would be underdetermined if the input clock was anything less than perfect
<ZipCPU>
meawoppl: Incidentally, some of the ugliest "yosys" bugs have been linked to not using the SB_IOs.
<ZipCPU>
The result is typically that yosys + (then) arachne-pnr would place the logic *anywhere* within the chip, leading to horrible I/O timings
<meawoppl>
interesting
<ZipCPU>
A Yosys update might adjust where the placement was made, since it was never controlled, and the design might go from working to not working. The student or other user then blames the "yosys" change for why the design no longer works
<meawoppl>
so SB_IO is basically 1:1 with some chip-edge special hardware?
<ZipCPU>
Absolutely!
<meawoppl>
(new to this all)
<ZipCPU>
If you want timing to be controlled across multiple pins, you'll also want to make certain that the SB_IO uses the clock and registers all outputs as well.
<ZipCPU>
For a single pin it usually doesn't make a difference, but across several pins in some I/O interface or another--perhaps one is a clock, another data, then ... yeah, you want to use the SB_IO primitives
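For the registered-output case, a sketch might look like this (pin and net names hypothetical; 4'b0101 in the upper PIN_TYPE bits selects a registered output, assuming the usual encoding):
```
SB_IO #(
    .PIN_TYPE(6'b0101_01)   // registered output, simple input path
) data_io (
    .PACKAGE_PIN(data_pin),
    .OUTPUT_CLK(clk),
    .D_OUT_0(dout)          // launched on the posedge of OUTPUT_CLK
);
```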
<meawoppl>
ZipCPU thanks
<meawoppl>
so here I am doing a differential clock and differential signal
<ZipCPU>
Are you creating the clock signal?
<meawoppl>
so I use 1 SB_IO for the clock, then a second using that input clock to clock the DDR data input
<meawoppl>
(two sub-lvds pairs)
<ZipCPU>
I mean ... Are you generating the clock signal and outputting it from your design, or is it coming into your design as an input?
<meawoppl>
thats coming in
<ZipCPU>
Are you using any global buffers? SB_GB() ?
<meawoppl>
I am for the clock (totally cargo-culted), I am honestly not sure what it buys me
<whitequark>
fun fact: some Altera FPGAs actually use regular flip-flops to implement DDR I/O. they do constrain placement to be right next to the I/O tile though
<whitequark>
regular fabric flip-flops, I mean
<ZipCPU>
whitequark: Wow ... is that how those design elements worked?
<meawoppl>
it looks like `SB_GB` does signal fanout to minimize latency?
<ZipCPU>
meawoppl: It buys you low clock skew across the chip, making it more likely that everything within your design uses the same clock with the same skew
<ZipCPU>
Yes, that's it
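A minimal SB_GB sketch, assuming the clock is already on an ordinary fabric net (names hypothetical):
```
wire clk_fabric;   // e.g. the D_IN_0 of the clock pin's SB_IO
wire clk_global;
SB_GB gb_inst (
    .USER_SIGNAL_TO_GLOBAL_BUFFER(clk_fabric),
    .GLOBAL_BUFFER_OUTPUT(clk_global)   // low-skew global clock network
);
```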
<whitequark>
ZipCPU: I'll tell you something worse. unless I misremember or misunderstood how it works, Altera actually implements *clock muxes* with LUTs on some FPGAs like Cyclone V
<ZipCPU>
That said, I think there is a certain latency by going through the global buffer network, but it would be more controlled than just routing the pin without using the clock network
<whitequark>
that seemed very very strange to me, so I dug into it, and again, unless I really misunderstood something in their toolchain, it seems that's what they do.
<ZipCPU>
whitequark: I'm not sure if I should be impressed and stand in awe, or if I should rather cringe at the sound of that
<whitequark>
I suspect the latter. I have seen reports on the web that their trick of using FFs for DDR IO has rather unfavorable results.
<whitequark>
which is exactly what you would expect.
<whitequark>
remember that you need a LUT to mux the output from the posedge and negedge FF... so the timings of that complete construct are not great
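For illustration, the fabric construct being described is roughly this (a generic sketch of an emulated DDR output, not any vendor's actual netlist):
```
reg q_pos, q_neg;
always @(posedge clk) q_pos <= d0;
always @(negedge clk) q_neg <= d1;
// the mux below is what lands in a LUT: clk itself selects which FF drives the pad
assign q = clk ? q_pos : q_neg;
```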
<whitequark>
bizarrely, the *input* DDR path on Cyclone V is actually hard logic in the IO tile. I'm wondering if they are working around a silicon bug or something.
<ZipCPU>
Yeah, I suppose that would make sense
<ZipCPU>
Makes you wonder if it gets fixed in a future silicon revision --- and even if so ... how would you be able to tell?
<whitequark>
I think they've been dragging that design along for a rather long time, across many FPGA families
<meawoppl>
interesting, so when I plumb the clock signal (post `SB_IO`) should I route it directly into the data-SB_IO or should I use the post SB_GB signal?
<ZipCPU>
meawoppl: I'd use the SB_GB signal if possible
<meawoppl>
gotcha, but it will introduce some latency into the read, right?
<ZipCPU>
whitequark: "HTTP request sent, awaiting response... 403 Forbidden" ... well, maybe I'll look into it some other day
<meawoppl>
but consistent latency.... hermmm
<ZipCPU>
meawoppl: Yes, but that's not saying much. *Everything* will introduce some latency. The question is whether or not that latency is significant in your application. That I cannot answer.
<ZipCPU>
If it is a problem, you might be able to adjust the phase of the clock ... but I wouldn't be able to cite information on that off the top of my head
<meawoppl>
awesome, thanks for helping me understand all these tradeoffs
<whitequark>
note that SB_IO+SB_GB is not the same as SB_GB_IO!
<whitequark>
if you can, you really should use SB_GB_IO
<whitequark>
as this can change the phase of your clock quite significantly. I hit that bug some time ago.
<ZipCPU>
?? whitequark: Can you explain the difference?
<whitequark>
I think SB_IO+SB_GB actually routes your clock through fabric first
<tpb>
Title: Use SB_GB_IO instead of SB_IO+SB_GB · Issue #89 · GlasgowEmbedded/glasgow · GitHub (at github.com)
<whitequark>
I didn't check the actual netlist, but all signs point to SB_IO+SB_GB routing the clock through fabric, and not even always the same way
* ZipCPU
searches iCE40 docs for SB_GB_IO information ...
* ZipCPU
finds references in the family handbook
<whitequark>
unfortunately iCE40 does not have particularly great documentation
<meawoppl>
interesting, I can use that if I don't use the inputs, and just rely on the global buffer it produces...
<whitequark>
for example, have you seen the circuit diagram that describes the SB_IO behavior? it is profoundly wrong
<whitequark>
(quiz: where is it wrong?)
<whitequark>
meawoppl: yep, you should preferably do that
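A sketch of that usage, assuming the clock arrives on a pin that sits on a global-buffer-capable site (names hypothetical):
```
wire clk_g;
SB_GB_IO #(
    .PIN_TYPE(6'b0000_01),          // simple input path, no output
    .IO_STANDARD("SB_LVDS_INPUT")
) clk_gb_io (
    .PACKAGE_PIN(clk_pin),          // must be the pin with the GB tap
    .GLOBAL_BUFFER_OUTPUT(clk_g)    // drives the global network directly
);
```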
<rombik_su>
whitequark: That's interesting! I'm looking rn at my DDR3 Cyclone V project. At least in the floorplan, Quartus shows that both DDIO_IN and DDIO_OUT are contained within the IOB (as dedicated h/w) and enabled. Judging from the post-route netlist, it looks like it's dedicated.
<whitequark>
hmmm
<whitequark>
then I might have misunderstood something
<rombik_su>
I'll scramble together a simple project to inspect
<whitequark>
can you check for Cyclone III too?
<whitequark>
I originally got a Cyclone III board by mistake and I might have first checked on it
<whitequark>
and then misremembered
<rombik_su>
whitequark: No problem, I'll check
<meawoppl>
one other question for the group here
<meawoppl>
what is the typical testing narrative/process for a `yosys` workflow
<meawoppl>
right now I am using an oscilloscope for everything, but there are a bunch of modules I have written that seem very testable
<whitequark>
typically people write Verilog testbenches and use Icarus Verilog
<meawoppl>
`testbench` was the keyword I needed there
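A minimal Icarus Verilog testbench sketch, assuming a DUT module named `demux` with this port list (all names hypothetical); run with `iverilog -o tb tb.v demux.v && vvp tb`:
```
`timescale 1ns / 1ps
module tb;
    reg clk = 0;
    always #5 clk = ~clk;                 // 100 MHz clock

    reg  d = 0;
    wire q;
    demux dut (.clk(clk), .d(d), .q(q));  // DUT name/ports hypothetical

    initial begin
        $dumpfile("tb.vcd");              // waveform viewable in gtkwave
        $dumpvars(0, tb);
        repeat (20) @(posedge clk) d <= $random;
        $finish;
    end
endmodule
```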
<rombik_su>
whitequark: From Cyclone III handbook: The DDR input registers are implemented with three internal logic element (LE) registers for every DQ pin. These LE registers are located in the logic array block (LAB) adjacent to the DDR input pin.
<rombik_su>
A dedicated write DDIO block is implemented in the DDR output and output enable paths. Figure 8–5 shows how Cyclone III device family dedicated write DDIO block is implemented in the I/O element (IOE) registers
<whitequark>
aha, right, that's what I was missing.
<whitequark>
ZipCPU: ^
<whitequark>
looks like Cyclone (original) implemented DDR input and output in fabric, Cyclone III moved output into IOB, and Cyclone V has input and output in IOB
<rombik_su>
I'm pretty sure C4 has dedicated DDR h/w in IOBs
<ZirconiumX>
rombik_su: Ah, a fellow Cyclone V user
* rombik_su
checking
<rombik_su>
ZirconiumX: \o/
<ZirconiumX>
I've actually been working a bit on the Cyclone V stuff today
<ZirconiumX>
We now have carry chain support :P
<ZirconiumX>
Unfortunately integrating into Quartus is hell on earth
<rombik_su>
whitequark: I'm wrong, Cyclone IV has the same story as Cyclone III wrt DDR in IOB
<ZipCPU>
meawoppl: There's a "better" Yosys workflow that goes through a formal verification step before going into the simulator. Spares you some simulation cycles
<meawoppl>
is simulation really slow?
<rombik_su>
meawoppl: It depends on the simulator (iverilog vs commercial) and design size/complexity
<ZipCPU>
Definitely depends upon design size and complexity
<sorear>
formal verification (satisifiability) can also be very slow
<ZipCPU>
sorear: It can be, but in 90% of my example cases, it takes less than 2 minutes
<ZipCPU>
See for reference: https://zipcpu.com/formal/2019/08/03/proof-duration.html
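For a flavor of what such a formal step looks like, a minimal sketch in ZipCPU's usual style (the property and names are hypothetical), checked with SymbiYosys before any simulator run:
```
`ifdef FORMAL
    // f_past_valid guards $past() on the first cycle (ZipCPU's idiom)
    reg f_past_valid = 1'b0;
    always @(posedge clk)
        f_past_valid <= 1'b1;

    // example property (hypothetical): registered output tracks last input
    always @(posedge clk)
        if (f_past_valid)
            assert(q == $past(d));
`endif
```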
<dh73>
It might be difficult to integrate into Quartus because there are parameters still needed, and special inputs that you should be using. I can't remember if "shared_arith" is needed for the carry chain; also, the sumout and cin inputs should be used for this instead of the normal dataa..datag, afaik
<ZirconiumX>
Yes, that's what I'm using dh73
<ZirconiumX>
data{a,b,c,d,f}
<ZirconiumX>
But that's not my point here
<ZirconiumX>
Quartus breaks on valid Verilog if you try to pass it as VQM
<ZirconiumX>
(Syntax error last time I checked)
<ZirconiumX>
If you pass it as Verilog, Quartus will instead ICE
<ZirconiumX>
And if you pass it as EDIF, Quartus complains that the ground net cannot be used more than once
<dh73>
Wait a second
<dh73>
What I said is: the carry chain needs cin and datad as inputs and cout and sumout as outputs, not the normal dataa..datag. In any case, Quartus clearbox will merge the logic in that fashion if you don't; it's a kind of carry computation, just mentioning. I didn't know Quartus supports EDIF now, but I wouldn't expect that to work fine at all. Can I use one of your examples to see what errors the tool is giving? Just out of curiosity
<ZirconiumX>
I have timing tables for C10GX, but I don't want to overload the patch reviewers just yet :P
<ZirconiumX>
A10GX would need new tables I think because it uses a different process
<rombik_su>
Arria 10 is 20 nm, Cyclone V is 28 nm
<ZirconiumX>
C10 is also 20nm
<ZirconiumX>
AIUI
<rombik_su>
Cyclone 10 GX is 20 nm, Cyclone 10 LP is *60* nm
<ZirconiumX>
...I'd noticed that the 10LP seemed to be slower than the IV
<quigonjinn>
I know arachne-pnr is not maintained, just a question in case someone here can answer. Running the tests of the latest commit fails with yosys-0.9 and 0.8, but is successful with 0.7. The relevant part is in the following paste: https://paste.debian.net/1122888/ It keeps allocating memory until the system runs out of memory. Is this some bug with the latest versions of yosys, or is arachne-pnr just not compatible with them?
<ZirconiumX>
...If arachne-pnr is not maintained, isn't it a bit counterproductive to ask a question that involves arachne-pnr maintenance?
<whitequark>
you really just shouldn't use arachne-pnr
<whitequark>
it has no value beyond being a proof of concept. the quality of routing is very poor
<whitequark>
even if I knew the answer, I'd just tell you to not use it.
<quigonjinn>
just wondering if this may be a yosys bug, because it occurs while yosys is being run
<meawoppl>
Another question re: Global buffers and the ice40 package
<meawoppl>
I am getting this error:
<meawoppl>
`ERROR: BEL 'X9/Y0/io0' has no global buffer connection available`
<meawoppl>
and I am not sure what to make of it
<meawoppl>
I think this is confusing because I am mixing global-buffer with LVDS here
<meawoppl>
If I use SB_GB_IO with lvds, does the + pin have to be one with the global buffer tap?
<daveshah>
Yes, it does
<meawoppl>
daveshah will it not let me use the COMP pin, or will it just be inverted?
<daveshah>
You will get an error if you use the COMP pin
<meawoppl>
makes sense, I am going to see if I can get away without using it, because this boxes me into an annoying position: I have to use bank 3 (subLVDS), and the GB wiring is on the comp line :/
<daveshah>
Which chip is this?
<meawoppl>
the sg48 package
<meawoppl>
weird, I may have misread this
<meawoppl>
what is `the sysIO buffer`?
<daveshah>
This is a up5k
<daveshah>
?
<meawoppl>
yessir
<daveshah>
the sysIO buffer is just the IO buffer
<daveshah>
which pins are you trying to use
<daveshah>
I think at least one Lattice doc is wrong in terms of the up5k
<daveshah>
You don't have to use bank 3 for LVDS, definitely, and afaik the positive side is A not B
<meawoppl>
specifically that `IOB_3b_G6` seems to be the negative differential pair
<daveshah>
Yes, that looks to be a problem
<meawoppl>
confusingly numbered bank2 here
<daveshah>
That is the correct document
<daveshah>
The bank thing only applies to earlier iCE40 devices
<daveshah>
All of the UP5K pairs can be used as differential inputs regardless of bank
<meawoppl>
bah, so I was just looking at an old doc somewhere?
<meawoppl>
heh, I resistor-swapped this bank down to subLVDS voltages, now I suspect I have to do the swap... again to get to a differential pair with global input
<meawoppl>
bah
<meawoppl>
I am going to see if I can get away with this configuration for a bit; I am only using the clock as DDR input to the SB_IO, plus a demuxing layer, so it might be ok
<meawoppl>
so, somewhat confusing question
<meawoppl>
is partial bit-range assignment allowed in yosys?
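If the question is about assignments like the following, yosys's Verilog frontend does accept both constant and indexed part-selects on the left-hand side (a minimal sketch, names hypothetical):
```
reg  [7:0] bus;
wire [3:0] nibble;
wire       sel;
always @(posedge clk)
    bus[sel*4 +: 4] <= nibble;   // indexed part-select: writes bus[3:0] or bus[7:4]
                                 // a constant range like bus[3:0] <= nibble also works
```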