<sb0>
byte == deserialized word (10-bit, 20-bit, ...)?
<whitequark>
sb0: byte == word
<whitequark>
inconsistent use of terms
<sb0>
cr1901_modern: yes, the xilinx stuff does only barrel-shifting. there are features to shift the clock instead, but they are complex and broken.
<sb0>
the hack I'm using is to reset this crap until the comma is aligned
<sb0>
you get a random clock phase at each reset
<whitequark>
with ecp5 you just get a bitslip input you can strobe
<whitequark>
doesn't really get simpler than that
<whitequark>
i have only used the barrel shifting mode with pcie though, since it's the only mode supported with native pcie mode for some reason
<sb0>
someone wrote a paper on determining the clock/comma phase, then using an MMCM to compensate, but it's quite a mess
<sb0>
"for some reason" - sounds like xilinx :)
<whitequark>
i think it's related to the custom pcie features like receiver detection
<whitequark>
not sure though
<whitequark>
anyway, i never had to do comma alignment manually in the first place
<whitequark>
with ecp5 you put the commas (normal and inverted) into the serdes as parameters
<whitequark>
and it automatically aligns to them and locks
<whitequark>
that actually works well
<whitequark>
sb0: oh, i know why it does barrel shifting in pcie mode, actually
<whitequark>
the UG says that it can align to commas in some low time interval, like 4 symbol times
<whitequark>
you cannot do that with bitslip
<whitequark>
so i think what it does is it simultaneously matches commas in several different alignments
<whitequark>
probably not all 10
<whitequark>
but maybe 5 or 4
<whitequark>
and then reconfigures the barrel shifter after each symbol
<whitequark>
instead of doing bitslip 10 times
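<whitequark>
the multi-alignment matching i'm describing could be sketched in plain python like this (illustrative only, not what the silicon actually does — K28_5 here is the 10-bit RD− variant of the comma):

```python
# Check the incoming bit window against the comma at every possible
# alignment at once, so a barrel shifter could be reconfigured after a
# single symbol instead of strobing bitslip up to 10 times.

K28_5 = 0b0011111010  # K28.5 comma, negative running disparity

def comma_alignment(window, comma=K28_5, width=10):
    """Return the first bit offset at which `comma` appears in
    `window`, or None if no alignment matches."""
    mask = (1 << width) - 1
    for offset in range(width):
        if (window >> offset) & mask == comma:
            return offset
    return None

# A window holding the comma shifted up by 3 bits, with junk below it:
print(comma_alignment((K28_5 << 3) | 0b101))  # → 3
```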
<sb0>
yeah well, that's not very hard to do and you can do it in fabric
<whitequark>
i know why they did it in the serdes
<sb0>
I implemented it for HDMI (with barrel shift)
<whitequark>
when you use 5 Gbps mode of SERDES, it only works with 1:2 gearing, because the fabric is not fast enough for 500 Msps rate
<whitequark>
actually it isn't really fast enough for 250 Msps rate either, you really want to use a PSCLKDIV primitive to do a 1:4 total gearing
<whitequark>
and then process it at 125 Msps
<whitequark>
with a very small gearbox in 250 MHz domain
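<whitequark>
the arithmetic behind those rates, for reference (assuming 10-bit symbols as with 8b10b):

```python
def fabric_clock(line_rate_hz, symbol_bits=10, gearing=1):
    """Fabric clock needed to process a serial line at a given gearing."""
    return line_rate_hz // symbol_bits // gearing

# 5 Gbps line rate: 500 Msps symbols, 250 MHz at 1:2, 125 MHz at 1:4.
for g in (1, 2, 4):
    mhz = fabric_clock(5_000_000_000, gearing=g) // 1_000_000
    print(f"1:{g} gearing -> {mhz} MHz fabric clock")
```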
<whitequark>
so, if you were doing bitslip in fabric, you'd probably have to do it post gearboxes, and if you do that, i think you blow timing constraints of some protocols it supports
<sb0>
what's the latency of those things?
<sb0>
with xilinx it's very high, 100-300ns, for some reason
<whitequark>
there is not enough info in UG for me to give you a precise number
<whitequark>
I can measure it for you though
<whitequark>
what methodology should i use?
<whitequark>
send a stream of 0 symbols, put a pulse in it, compare transmitted pulse time to received pulse time?
<sb0>
yes, with 8b10b disabled
<sb0>
or well, you can keep it enabled
<sb0>
just connect transceiver TX with RX, send something, and measure latency inside the fpga
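<whitequark>
so the measurement itself reduces to something like this (a toy model of the loopback methodology, with an assumed pure-delay link; names are illustrative):

```python
def loopback_latency(tx_symbols, rx_symbols, idle=0):
    """Position of the first non-idle symbol in RX minus the same in
    TX, i.e. the round-trip latency in symbol times."""
    tx_t = next(i for i, s in enumerate(tx_symbols) if s != idle)
    rx_t = next(i for i, s in enumerate(rx_symbols) if s != idle)
    return rx_t - tx_t

tx = [0] * 8 + [1] + [0] * 8   # a pulse in a stream of idle symbols
rx = [0] * 3 + tx[:-3]         # model the link as a 3-symbol delay
print(loopback_latency(tx, rx))  # → 3
```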
<sb0>
they don't document the latency?
<whitequark>
maybe I can't read
<whitequark>
but they are definitely better at making bug-free IP than at documenting it
<whitequark>
remember when i said about several departments that dont talk to each other?
<whitequark>
sb0: oh, found the table
<whitequark>
they do document latency
<whitequark>
sb0: what specific configuration are you interested in
<attie>
but they use the same signal in xilinx bram so you have to make the output look like that eventually.
<whitequark>
strictly speaking, xilinx synthesizer has to fold unnecessary registers in
<attie>
although I guess it doesn't matter if it's a different signal, as long as they're not both present simultaneously?
<attie>
actually idk what vivado does if you put two, isn't that the optional output register
<attie>
hmm actually latch address mode is only useful if you are writing on another port and waiting to see those changes. and unless you are in read first mode, that's undefined behavior.
<attie>
no wait, unless *the other port* is in read first mode
<attie>
maybe just wrapping it into something that disallows all the more complicated options is fine...
<attie>
would it be possible to have a lower-level "expert mode" module that has all the options, and a higher level one that offers the most sensible options sans footgun?
<whitequark>
attie: i think a set of warnings (on by default) would be useful
<whitequark>
can you look at uh
<whitequark>
give me a moment to push
<attie>
btw PR #90 was exactly that footgun, and the bug was in migen for years before anyone noticed
<whitequark>
yeah...
<whitequark>
and my ghetto solution (making re a constant) would have caught the bug
<attie>
for extra fun there was also a bug in the sim implementation that hid this bug
<sb0>
hartytp: so all SPI would be RTIO/realtime then?
<sb0>
then it's exactly the same driver as mirny
<sb0>
btw when will you test the PLL chip for phase determinism? that would be a big problem if it doesn't work
<sb0>
rewriting into kernels will add some more development time before the board does anything, though, and joe will likely be unhappy with that. and it seems joe still hasn't learned from what happens when he pushes unrealistic deadlines
<sb0>
actually, if we do that then we can strip the non-RT SPI support from the DRTIO firmware on the RTM side, and gain some memory
<sb0>
all satman would do then is basically program the si5324 and answer aux ping, moninj, and a few minor things
<sb0>
we can strip routing management since it's always a leaf node
<whitequark>
attie: sb0: do asynchronous memory write ports ever make sense?
<whitequark>
how would that even work?
<whitequark>
technically, yosys implements them, though i don't think it actually does anything meaningful with the cells
<whitequark>
write_verilog definitely emits invalid verilog for asynchronous write ports
<whitequark>
oh, nevermind, I've realized
<whitequark>
asynchronous write ports use enable as a strobe
<whitequark>
it definitely never makes sense to instantiate it in an nmigen design
<whitequark>
but the semantics of a $memwr cell like that is at least defined
<whitequark>
wow, there are *so many* errors migen isn't checking when using memories
<_whitenotifier-6>
[m-labs/nmigen] whitequark pushed 3 commits to master [+1/-0/±6] https://git.io/fhvHB
<_whitenotifier-6>
[m-labs/nmigen] whitequark 8d58cbf - back.rtlil: more consistent prefixing for subfragment port wires.
<_whitenotifier-6>
[m-labs/nmigen] whitequark a061bfa - hdl.mem: tie rdport.en high for asynchronous or transparent ports.
<_whitenotifier-6>
[m-labs/nmigen] whitequark c49211c - hdl.mem: add tests for all error conditions.
<whitequark>
attie: okay, can you go over the code i just pushed and see if you can find any combinations that are definitely undefined?
<attie>
whitequark: I think only combinations of two ports are undefined, no? if you have a transparent write port, a read from the same address is undefined.
<attie>
you have only write or read ports now, no ports that can do both?
<whitequark>
attie: hmm
<whitequark>
but you can request several ports from the same memory
<whitequark>
I'm confused as to how "transparent" is a feature of combinations of two ports
<whitequark>
"transparent" means that if a write (somehow) touches a memory location, it is immediately reflected at the output of this read port
<attie>
yeah, but your write port doesn't even have a read data signal any more
<attie>
so where would that even be reflected?
<whitequark>
on a different read port of the same memory
<whitequark>
that (dynamically) reads the same address
<attie>
yeah, except no because that's not how xilinx bram works
<attie>
it literally cannot offer that feature, and gives you undefined instead
<attie>
(IIRC in practice comes out to value changes next cycle)
<attie>
xilinx bram is transparent *only* on the same port you are writing from
<whitequark>
hm
<whitequark>
so how does their synthesizer know when to assemble read and write ports into a single bram port?
<whitequark>
does it look at the address syntactically?
<attie>
yeah one address signal = one port i think
<cr1901_modern>
pretty sure any pipelined softcore relies on a write being immediately available to read on a second port... and well, I've seen plenty of soft cores work on Xilinx
<whitequark>
(drive them to the same address)
<whitequark>
if you request a read port and a write port with the same address
<whitequark>
attie: right so the way you get a read/write port in nmigen
<whitequark>
since this *can* represent xilinx (with caveats) but the opposite cannot represent e.g. ice40
<attie>
cr1901_modern: well, once you start pipelining you have to detect hazards anyway. so you can wait one clock cycle longer.
<cr1901_modern>
attie: Fair, wasn't meant to derail, just I could've sworn I've relied on the exact behavior to work that Xilinx claims doesn't work. Hmmm...
<attie>
cr1901_modern: did you check that the component did end up in bram and not in dram?
<attie>
the *same* verilog description will infer to either, and merrily change behavior, merely based on size and the mood of the synthesizer that day.
<cr1901_modern>
attie: Will check tomorrow/when I'm a bit more focused. Tbh, I don't remember where the reg file goes on Xilinx devices
<cr1901_modern>
(same thing of course works on ice40, where only BRAM is present)
<attie>
unless you have an extremely large reg file, likely to be dram.
<whitequark>
06:34 < attie> the *same* verilog description will infer to either, and merrily change behavior, merely based on size and the mood of the
<whitequark>
wtf
<whitequark>
is this ise or vivado?
<attie>
vivado
<attie>
that's what PR #105 is about and why I run my own fork of migen atm
<whitequark>
ugh
<attie>
because I have a SyncFIFO, and depending on its depth and width, sometimes it works and sometimes it doesn't
<whitequark>
incredible
<whitequark>
i hate this
<whitequark>
i hate this passionately
<whitequark>
why do we even *use* xilinx
<attie>
personally, I do because someone gave me free FPGAs
<cr1901_modern>
attie: Is PR #105 your PR?
<attie>
yes, I'm nakengelhardt on github
<_whitenotifier-6>
[nmigen] whitequark opened issue #12: Implement a sanitizer for memory port combinations - https://git.io/fhvQl
<whitequark>
attie: ok so
<whitequark>
i've opened #12
<whitequark>
do you think you can write in the most painfully verbose way the exact requirements xilinx has for its bram and dram
<whitequark>
because i can implement the sanitizer but i find the requirements extremely confusing
<whitequark>
there will probably also have to be a "mode" parameter for Memory, to select between bram and dram and possibly dff
<cr1901_modern>
s/targets RTLIL/devel is underway/
<attie>
honestly, I've come to the conclusion that 'portable' verilog is a lie, and you need to manually instantiate xilinx BRAM to have any behavior guarantees.
<whitequark>
or at least stuff it full of attributes
<whitequark>
which is my plan here
<attie>
yeah that would work
<attie>
could also extend in the future to ultraram, which is only inferred with the attribute
<attie>
(haven't used it at all yet, no idea what new and exciting pitfalls await there :)
<cr1901_modern>
That's what attributes are for, no? :)
<whitequark>
no
<whitequark>
attributes are not to keep the synthesizer from illegally converting valid behavioral verilog to completely different gateware
<cr1901_modern>
I meant using attributes in the context of representing inherently non-portable HDL in a unified manner
<cr1901_modern>
If you can't reliably get xilinx BRAM to behave in the context of nmigen, using attributes is, well IMO, a good solution
<cr1901_modern>
which vivado output file would have the "this signal X mapped to Y primitive" information?
<attie>
At one point last year I tried to experiment with what verilog descriptions would synthesize into what BRAM settings, but I could not because *there was not enough other crap* in my design and so it never mapped to BRAM at all
<whitequark>
daveshah: question, TN1250 says that RE is only available for 256x16 BRAM configuration
<whitequark>
however, all the other primitives still have RE
<whitequark>
what is the reality like
<cr1901_modern>
attie: http://ix.io/1wul Gonna go ahead and guess Xilinx has been using dist ram all this time
<_whitenotifier-6>
[nmigen] whitequark commented on issue #12: Implement a sanitizer for memory port combinations - https://git.io/fhv73
<attie>
yep, that one has the behavior you'd expect (writes are immediately visible on the read port).
<attie>
hope you don't have a generic somewhere to change the size of the register file, or it might suddenly give you stale data without notice :)
<cr1901_modern>
nope, for lm32 it's set
<attie>
I think it's fairly rare to run into these problems because the usual use cases map to the appropriate sizes
<cr1901_modern>
I did some experiments w/ ice40 lm32 early this year; transparent reads are implemented using $memrd bypass circuitry that compares the write address to the current read address and enables a mux if they match.
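<cr1901_modern>
as a behavioral sketch (plain python, not the actual yosys circuit), the bypass is just this:

```python
class SyncReadPortWithBypass:
    """Synchronous read port with write-to-read bypass: when the write
    address equals the read address in the same cycle, a mux forwards
    the write data instead of the stale array contents."""
    def __init__(self, depth):
        self.mem = [0] * depth

    def cycle(self, raddr, waddr=None, wdata=None):
        rdata = self.mem[raddr]      # the plain synchronous read
        if waddr is not None:
            if waddr == raddr:       # address comparison...
                rdata = wdata        # ...enables the bypass mux
            self.mem[waddr] = wdata
        return rdata
```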
<attie>
yeah, I guess they added that to the hard logic in xilinx bram but didn't add the cross-check to the other port.
<cr1901_modern>
IIRC, yosys does this at the RTLIL stage though (so the other backends might have the same impl w/ $memrd bypass as well)
<cr1901_modern>
I've never used the xilinx backend of yosys to check tho
<attie>
since they could be both write addresses and then you are personally responsible for ensuring no conflict anyway.
<cr1901_modern>
one write port, two read ports- write conflict isn't possible in this case (unless 2am me is more tired than usual)
<sb0>
whitequark: afaict intel isn't better and there are no alternatives to xilinx and intel for large and less-slow FPGAs
<whitequark>
ok
<attie>
yeah, if each of the ports only has one direction then there are fewer problematic cases and interactions.
<sb0>
might have used ECP5 on kasli though, if the speed is acceptable
<cr1901_modern>
sb0: pretty sure you've said intel is even worse. Like their high speed transceivers are completely unusable
<cr1901_modern>
does this ring a bell/like something you've said?
<whitequark>
it's not that they are unusable
<whitequark>
it's that they kill themselves if you stop the clock
<whitequark>
which is a hilarious silicon bug if you aren't trying to use that silicon
<attie>
ahaha that was also my discovery :D
<attie>
november last year was a fun month!
<whitequark>
i mean transceivers are fun in every family
<sb0>
yeah, other than that they just seem roughly as crappy as the xilinx transceiver
<whitequark>
ah.
<whitequark>
fun fact: ECP5 was likely going to be all 5G, but they fucked up
<whitequark>
with transceivers
<whitequark>
so they have to heavily bin them and *then* they overvolt them by 100 mV to actually get them to work at 5G
<attie>
They do give you a big critical warning about it though so at least I didn't fry my hosts' brand-new cards before they even got to use them.
<sb0>
I guess we can put a ECP5 on the sayma rtm too
<whitequark>
and the entire ECP4 series got cancelled because they couldn't get transceivers to behave at all
<whitequark>
sb0: this doesn't apply to operating at 3G btw
<whitequark>
you can use the cheaper non-5G parts at normal Vcore
<whitequark>
Vtr* even
<whitequark>
ECP4 was also going to get much larger fabric
<whitequark>
as it is, ECP3 has FPGAs with more LUTs than the largest ECP5 in existence
<whitequark>
which is pretty weird
<sb0>
what about speed?
<sb0>
this is usually the FPGA metric that sucks the most
<whitequark>
ECP5 has a smaller node so it's generally faster than ECP3
<whitequark>
do you want any specific numbers I can find?
<sb0>
how fast does misoc run on it?
<whitequark>
I haven't run misoc on it
<whitequark>
but I can try
<sb0>
also, artiq rtio
<whitequark>
oh something interesting wrt RTIO on ECP5
<whitequark>
many ECP5 pins (I think half of them? this part is a bit confusing) have an integrated gearbox
<whitequark>
so you can select 1:2, 1:4 or 1:7
<whitequark>
is this useful for RTIO maybe?
<sb0>
is it like a small SERDES?
<sb0>
Xilinx has this too, no?
<whitequark>
essentially
<sb0>
isn't that used for the SDRAM?
<sb0>
and 1:2 is a DDR register, right?
<whitequark>
yes and yes
<whitequark>
well, they group them all together
<sb0>
yeah we use this for the TTL PHY on Xilinx to get 1ns I/O resolution
<whitequark>
ah ok
<whitequark>
ECP5 clocking for this feature is very complicated and obnoxious
<whitequark>
I am not sure if it really is so complicated, or it is just because it is made for MIPI and SDRAM
<whitequark>
which are by themselves complicated and obnoxious
<sb0>
SDRAM is relatively reasonable given the constraints
<whitequark>
you can also align the data to clock edge or between clock transitions
<sb0>
it's basically designed like that to be cheap
<whitequark>
without having to mess with PLLs
<whitequark>
ah I see
<sb0>
low pin count, high bandwidth, high SDRAM chip yields
<sb0>
the problem is most SDRAM PHYs are fucked, but this is the case for most commercial IPs that have CDCs or asynchronous parts, especially when it's from xilinx
<sb0>
and SDRAM has a lot of async stuff
<sb0>
in misoc we're doing a bit of a hack for reading (not using DQS), and there's another hack I want to try to clean that up
<sb0>
the main issue is DQS is not free-toggling, so some data gets stuck unless you have some async logic that isn't implementable on xilinx fabric
<whitequark>
hm this is definitely complicated and obnoxious
<sb0>
what I want to do is have the controller issue 1 dummy-read after each legitimate read (just repeat the last read command) to make the SDRAM chips toggle DQS and pump data out of the FPGA I/O cell and into the elastic buffer
<whitequark>
as a matter of fact
<whitequark>
i should study how SDRAM works
<sb0>
then everything is implementable cleanly and without any xilinx hard-IP bullshit
<sb0>
well, yes, but it's done like that because DQS is bidirectional
<sb0>
doing it otherwise requires more pins, and SDRAM is optimized for rock-bottom cost
<sb0>
those pins have to be routed on every DIMM, connector, motherboard etc.
<sb0>
also if it's a free-running clock, you can't mux it. with the current design you can connect DQS pins of several DRAM ranks on a bus, and use chip select
<whitequark>
so why is DQS a thing? don't you already have a clock?
<whitequark>
or is that clock used only as reference and DQS for actually transferring data?
<sb0>
yes, but the skew isn't known and can vary from chip to chip and with PVT
<whitequark>
ok
<sb0>
with DQS all data transfers are source-synchronous
<whitequark>
i see, your workaround makes sense
<sb0>
we need some sort of FIFO that can write on both clock edges
<sb0>
this isn't doable on xilinx fabric
<sb0>
so we need IDDR + regular FIFO, which needs some cycles to flush the pipeline
<whitequark>
what if you use two FIFOs?
<sb0>
maybe, but matching the delays (since this will be coming straight from the I/O pin at >1Gb/s) will be very tricky
<whitequark>
hmm okay
<sb0>
also, more routing delays inside the FPGA = more absolute VT drift
<sb0>
the "repeat last read command" hack seems much more reliable and easier, and also should have a negligible impact on performance
<whitequark>
i mean you only need that before going to a different state, right?
<whitequark>
back to back reads should be fine
<sb0>
the main issue with it is the additional cycle it would take for read-to-write turnaround
<sb0>
absolutely, it's a pipeline
<whitequark>
yeah, it definitely sounds much more reliable
<daveshah>
whitequark: I'm pretty sure the RE/WE pins on the ice40 are subtly broken in various ways and both Yosys and icecube use RCLKE/WCLKE
<whitequark>
wtf
<whitequark>
ok good old fpga bugs
<daveshah>
I think tying WCLKE high and using WE instead breaks initialised BRAM
<whitequark>
Is setting up PDH locks really something you want/need to do with ARTIQ, or is this better done as a simple "standalone" board of some sort (e.g. an Arduino shield) where you write some quick and dirty GUI and then set-and-forget?
<whitequark>
arduino shield...
<whitequark>
really?
<sb0>
probably even less than ECP5, maybe even MCU
<whitequark>
ah, then put iCE40 there
<sb0>
though a nice FPGA potentially allows fancier locks, but the NIST folks know a lot more than I do...
<whitequark>
I should see how fast picorv32 can get on UP5K
<whitequark>
remember those numbers you said are shit? well I looked closer and we've actually improved the toolchain quite a bit since then
<whitequark>
there was a lot of low-hanging fruit in yosys and of course arachne was not timing-driven
<sb0>
I have very little experience with lasers, chiefly because doing it with trashed equipment from eBay (which is the only thing I can afford) is a pain and time sink
<whitequark>
well I know iCE40 practically inside out at this point
<sb0>
it's much worse than vacuum where parts are more repairable (give them a good cleanup), cheaper, and more generally available
<whitequark>
it's easy to design for, easy to write HDL for, and generally not a pain
<whitequark>
there are a few stupid gotchas with pin assignment, mainly
<whitequark>
the way they do clocks and PLLs is just not very good and requires forethought at PCB design phase
<whitequark>
each PLL is assigned to an IO buffer and if you feed a PLL to a GB then it eats the I part of IO (?!)
<whitequark>
this is of course only mentioned in the footnotes and there is no table anywhere that says which PLL goes where
<_whitenotifier-6>
[nmigen] nakengelhardt commented on issue #12: Implement a sanitizer for memory port combinations - https://git.io/fhvFk
<whitequark>
you have to look at bondouts and such
<whitequark>
again the department that made silicon did not talk to the one making documentation
<attie>
ok, I've added what I think I know about the xilinx BRAM behavior.
<whitequark>
thanks!
<whitequark>
>sometimes it will find a read register somewhere even if you intended to write an asynchronous read port
<whitequark>
...
<whitequark>
sb0: what do you think about running *only* $mem cells through Yosys Xilinx techmapping in nMigen?
<whitequark>
because this kind of shit is just absurd
<whitequark>
are they even trying?
<attie>
that one isn't really a "bad xilinx" though
<attie>
it's just that if your address didn't have to go through many combinatorial stages, the resulting verilog looks the same
<whitequark>
syntactically or semantically?
<attie>
both
<whitequark>
hm ok
<attie>
I mean, if you got the address from some module that has an output register
<attie>
and then feed it to your asynchronous read port
<attie>
and migen puts the whole thing in a single file
<whitequark>
ah I think I see
<whitequark>
so what does Yosys do here...
<attie>
how would it know that the register was meant to be associated with *this* bit of code rather than *that* bit of code
<_whitenotifier-6>
[nmigen] daveshah1 commented on issue #12: Implement a sanitizer for memory port combinations - https://git.io/fhvFg
<whitequark>
hm, Yosys always infers 7-Series BRAMs in READ_FIRST mode.
<daveshah>
whitequark: a small picorv32 design can do about 26MHz with nextpnr
<whitequark>
daveshah: iirc when I last tried it was about 13 MHz with arachne
<whitequark>
but it was mor1kx
<whitequark>
I should try misoc again
<daveshah>
That was on up5k
<whitequark>
oh it was on hx8k
<daveshah>
Probably 60MHz on hx8k is doable
<whitequark>
sb0: ^ that's far more reasonable than arachne numbers
<whitequark>
though picorv32 is still weirdly slow
<whitequark>
I should definitely try mor1kx again
<daveshah>
I think nextpnr is at least 30% better than arachne on picorv32
<daveshah>
Plus there have been the recent improvements to Yosys by you and tnt
<whitequark>
attie: do I understand it right that READ_FIRST/WRITE_FIRST/NO_CHANGE is primarily to determine what happens with data out of the writing port?
<whitequark>
effectively that configures some sort of mux and register of the read port, right?
<whitequark>
in READ_FIRST it just does a read, in WRITE_FIRST it muxes data in to data out, in NO_CHANGE, WE gates the clock to that register
<whitequark>
this probably makes more sense if you're looking at the bitstream...
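<whitequark>
restating my reading of the three write modes as a behavioral model (this is my interpretation of the semantics, so treat it as an assumption):

```python
def write_port_cycle(mem, addr, we, wdata, dout, mode):
    """One clock edge of a BRAM write port's data-out register under
    the three write modes."""
    if mode == "READ_FIRST":
        new_dout = mem[addr]                    # old contents, pre-write
    elif mode == "WRITE_FIRST":
        new_dout = wdata if we else mem[addr]   # data-in muxed to data-out
    elif mode == "NO_CHANGE":
        new_dout = dout if we else mem[addr]    # WE gates the register
    else:
        raise ValueError(mode)
    if we:
        mem[addr] = wdata
    return new_dout
```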
<daveshah>
At least on ECP5 it's very uninformative - just one random bit for either READBEFOREWRITE or WRITETHROUGH modes
<daveshah>
Which I think behave quite similarly
<whitequark>
hm, I think I get it
<whitequark>
>one port is writing AND it is in READ_FIRST mode, in which case the other port will read the old data. (Note that it is the setting of the write port that defines the behavior of the read port. Setting of the read port is irrelevant.)
<whitequark>
this part is just what it would do anyway, isn't it
<whitequark>
like, you are configuring the read behavior through the write port because the read mux/register is fed by the WE signal
<whitequark>
of the write port
<whitequark>
daveshah: does this make sense to you?
<whitequark>
it feels like a really sloppy abstraction to me, barely covering the underlying macro
<whitequark>
which is why it's so weird
<daveshah>
Interestingly the ECP5 also has a NORMAL mode where the read port is undefined while writing
<whitequark>
what bit does that set?
<whitequark>
or
<daveshah>
No bits
<daveshah>
It's the default
<whitequark>
does that just remove a false path between DIN and DOUT?
<whitequark>
ah
<whitequark>
so you have two bits for READBEFOREWRITE and WRITETHROUGH
<whitequark>
interesting
<daveshah>
Yes
<whitequark>
I wonder what the former bit does
<daveshah>
I'm not fully sure how they behave when dealing with both ports
<whitequark>
yeah I'm coming to conclusion that we need some sort of testbench
<whitequark>
to validate whether we actually instantiated memory in a sane way
<daveshah>
I wonder if they have to have explicit bypass logic even for the READBEFOREWRITE case
<whitequark>
this is very obnoxious
<daveshah>
I dare say no one has mentioned Intel yet either
<daveshah>
God knows what cursed shit they do
<attie>
I didn't really think about the underlying hardware yet.
<attie>
But to sum up my feeling about xilinx bram handling, "I can see how you got here but maybe you should reevaluate your life choices."
<whitequark>
yeah i definitely concur
<whitequark>
i sort of understand it now but it's still a nightmare
<attie>
oh since I just had to scroll through another hundred of this delightful message, Xilinx's official advice on this: "It is suggested to confirm via simulation that an address collision never occurs and if so it is suggested to try and avoid this situation."
<whitequark>
wonderful.
<whitequark>
yeah there's definitely a huge sim/synth mismatch in migen wrt RE
<whitequark>
actually, this doesn't even make any sense
<whitequark>
WRITE_FIRST merely latches address
<whitequark>
so in case of address conflict MemoryToArray is just wrong
<whitequark>
but it's wrong in a yet another way, different from Xilinx
<whitequark>
this is a horrible mess
<whitequark>
i think MemoryToArray is just mostly completely wrong
<whitequark>
this needs to be simulated in a completely different way
<attie>
I thought I made it match with what BRAM is doing at least. What else is wrong?
<whitequark>
it might match the Xilinx behavior, I don't understand it that well
<whitequark>
but it's wrong in general
<whitequark>
well
<whitequark>
ohhhh I see
<whitequark>
WRITE_FIRST effectively implements an async read port
<whitequark>
except the address is latched
<whitequark>
this is horrible but it should be correct
<whitequark>
ah, no, it's not exactly correct
<whitequark>
attie: so, let's say you have a WRITE_FIRST port with has_re=True
<whitequark>
now let's say you set re low
<attie>
mm yeah that case might be badly handled
<whitequark>
this is the general problem with the address latching trick
<whitequark>
I thought it was more broken than it is, but it's still broken
<whitequark>
can't exactly blame you
<attie>
it's not "address latching trick" so much as "this is the xilinx macro for write first"
<whitequark>
attie: what I do in nMigen simulator is I abuse one of the simulator implementation details to do actual forwarding
<whitequark>
abuse in the sense that this is an implementation detail no nMigen consumer may rely on
<whitequark>
but it will work correctly with any number and any kind of ports
<whitequark>
even multiple write ports
<whitequark>
let me write the tests for it and push so you can see
<attie>
but the re signal is not conforming to the xilinx macros, so I'm not sure exactly how the xilinx tools interpret it
<whitequark>
yes, this thing also bothers me about Migen's Memory primitive
<whitequark>
it's tailored to Xilinx enough to make it badly fit other FPGAs, but at the same time not enough that you won't get UB
<whitequark>
like half of the possible combinations give you nonsensical behavior on Xilinx
<whitequark>
honestly, the more i look at it, the more unhappy i am about it
<whitequark>
and xilinx FPGAs
<whitequark>
and Verilog
<whitequark>
really, fuck all of this shit, who thought synthesis from Verilog was a good idea in the first place
<attie>
that's a permanent way of life if you work in this area :D
<whitequark>
well ice40 does not give me this kind of pain, and ecp5 also seems to behave sanely
<attie>
we practically have weekly bitching sessions about "why are our tools so terrible"
<whitequark>
"weekly" implies this ever stops
<attie>
well in a 20 person office you have to shut up sometimes or the people from other departments will be cross.
<whitequark>
lol
<_whitenotifier-6>
[m-labs/nmigen] whitequark pushed 1 commit to master [+0/-0/±1] https://git.io/fhffT
<_whitenotifier-6>
[m-labs/nmigen] whitequark a40e2ca - back.pysim: fix an issue with too few funclet slots.
<whitequark>
attie: can you take a look at my simulation models and tests?
<_whitenotifier-6>
[m-labs/nmigen] whitequark pushed 1 commit to master [+0/-0/±2] https://git.io/fhfIy
<_whitenotifier-6>
[m-labs/nmigen] whitequark fbb5eab - hdl.mem: add simulation model for memory.
<whitequark>
basically, in nmigen, there is no MemoryToArray; it directly emits $memrd and $memwr cells for Yosys while providing a behavioral model for the simulator. the simulator actually doesn't understand memory at all.
<whitequark>
and I model transparent ports by combining an async port with a latch that prevents any changes while clock is high
<_whitenotifier-6>
[m-labs/nmigen] whitequark pushed 1 commit to master [+0/-0/±2] https://git.io/fhfIN
<_whitenotifier-6>
[m-labs/nmigen] whitequark e58d9ec - hdl.mem: add simulation model for memory.
<whitequark>
the *only* case where you could observe mismatch is if you gate a clock to some submodule while it is low and start changing the address and look at the output
<whitequark>
but this is sufficiently pathological that it's not worth fixing I think
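<whitequark>
the latch trick looks roughly like this as a python sketch (assuming a rising-edge read port; names are illustrative, not the actual nmigen internals):

```python
class TransparentPortModel:
    """Transparent synchronous read port modeled as an asynchronous
    read behind an address latch that only passes changes while the
    clock is low: writes show through immediately, but the address is
    held stable over the high phase."""
    def __init__(self, mem):
        self.mem = mem
        self._addr = 0

    def set_addr(self, addr, clk):
        if not clk:            # latch is transparent while clock is low
            self._addr = addr

    def data(self):            # asynchronous read through the latch
        return self.mem[self._addr]
```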
<hartytp>
whitequark: "hopefully artiq will run on nmigen (in compat mode) before end of year " cool!
<hartytp>
sb0: "the only potential issue I see with RTM DRTIO is the size of the satman firmware that has to fit in BRAM " shall we stick a bigger Artix on the RTM?
<hartytp>
we can always go back to a smaller one in a future design revision once we've finished debugging
<hartytp>
but right now I need something that works and FPGAs are cheaper than code optimization
<hartytp>
"hartytp: so all SPI would be RTIO/realtime then? "
<hartytp>
"then it's exactly the same driver as mirny "
<hartytp>
yes and yes
<hartytp>
seems like a much nicer way of doing things so long as RTM DRTIO works
<hartytp>
if it doesn't then we can fall back to doing things in fw
<hartytp>
but that feels like a hack
<hartytp>
"btw when will you test the PLL chip for phase determinism? that would be a big problem if it doesn't work "
<hartytp>
soon. The eval board is waiting for me in Ox, but I'm in the US now for Christmas
<hartytp>
will test it in first week of Jan
<hartytp>
but, even if it doesn't work, it's still a better choice than the HMC830. We'd just need to add some extra logic to measure the phase and reset until it gives us what we want
<hartytp>
"rewriting into kernels will add some more development time before the board does anything, though, and joe will likely be unhappy with that. and it seems joe still hasn't learned from what happens when he pushes unrealistic deadlines "
<hartytp>
not necessarily
<hartytp>
my plan would be that I do the port to kernels in a branch
<hartytp>
you can focus on getting the existing system working without synchronisation (which no one needs straight away anyway)
<hartytp>
and I'll get the kernel port working in parallel. Also a good time to clean up some of the code which is getting to be a bit of a mess
<hartytp>
"actually, if we do that then we can strip the non-RT SPI support from the DRTIO firmware on the RTM side, and gain some memory "
<hartytp>
that sounds like a good plan! Let's just make the RTM a minimal satman
<hartytp>
"whitequark: hey maybe the new laser PDH locker can use Lattice "
<hartytp>
why would that card need an FPGA? I'd assumed it would be a simple microcontroller to just configure some SPI settings and do some non-realtime ADC readout for diagnostics
<hartytp>
the PDH modulation/demodulation would usually be on a separate card to the servo. the servo might want an FPGA if you really want to push BW to the few MHz level (max you can get out of a diode)
<hartytp>
but that would be on a different board, or could just use stabilizer if you are happy with BW in the hundreds of kHz range (which one usually is)
<_whitenotifier-6>
[m-labs/nmigen] whitequark pushed 2 commits to master [+2/-0/±1] https://git.io/fhfGg
<_whitenotifier-6>
[m-labs/nmigen] whitequark fc7da1b - hdl.ir: do not flatten instances or collect ports from their statements.
<_whitenotifier-6>
[m-labs/nmigen] whitequark 00ef7a7 - compat: provide verilog.convert shim.
<sb0>
hartytp: the 15t is the same silicon as the 50t :)
<sb0>
but we can put devices that say 50t on the package and the idcode if that's safer...
<sb0>
26MHz or even 60 is much slower than xilinx. and I don't know why everyone uses picorv32, it's a pretty bad CPU
<sb0>
whitequark: synthesizing the memory cells is a good idea. i was considering doing it within migen, but it's even better if we can reuse the yosys code
hartytp has joined #m-labs
<sb0>
note that SyncFIFOBuffered uses an external register to turn the RAM read from async into sync
<sb0>
so it will need rewriting
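The trick sb0 describes, turning an async RAM read into a sync one with an external register, is roughly this (an illustrative Python model, not the actual migen SyncFIFOBuffered code):

```python
class RegisteredAsyncRead:
    """Sketch: async read port + external output register = sync port."""

    def __init__(self, mem):
        self.mem = mem   # backing storage, read asynchronously
        self.addr = 0
        self.dout = 0    # external register: valid one cycle after addr

    def clock_edge(self):
        # On each rising edge the register captures the async read
        # result, so dout behaves like a synchronous read output.
        self.dout = self.mem[self.addr]
```

This is why it needs rewriting if the simulator only models genuine sync/transparent ports: the register lives outside the memory primitive.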
<hartytp>
sb0: you mean that we can probably tell vivado we have a 50T and flash that onto a 15T and it will probably work?
<sb0>
iirc that's the only significant place where this trick is used
<sb0>
hartytp: no, it will fail idcode check. so that and the bitstream CRC need to be edited after synthesis.
<hartytp>
so, question is how many Sayma boards we'll make in the next few years and the cost of FPGAs versus cost of paying someone to hack bitstreams
<hartytp>
my guess is the FPGAs are cheaper ;)
<sb0>
oh it's a trivial hack. i've been wondering if it should be enabled in migen by default.
<sb0>
that and smashing other obnoxious vivado features such as webtalk
<hartytp>
well, ultimately I'll leave FPGA-related decisions up to you and rjo
<hartytp>
my only comment is that Sayma working quickly is quite valuable to me, so if we're going to rely on doing this then it had better work with minimal fuss
<sb0>
yeah, maybe we can stuff 50t on the protos and use 15t later...
<sb0>
if that needs to be done in a hurry then i'd say go for 50t
<sb0>
they're pin compatible
hartytp has quit [Ping timeout: 256 seconds]
<sb0>
but then those protos will have a small compatibility issue
hartytp has joined #m-labs
<hartytp>
sb0: ack
<hartytp>
the real question is how many Sayma boards are ever going to be produced
<hartytp>
if they become widely used then there will be plenty of incentive to improve things
<hartytp>
but, right now, we have a really expensive RF card produced in small quantities and we're worrying about like $30 worth of additional FPGA costs. Unless the batch size increases that FPGA cost is nothing compared to even relatively trivial gateware work
<hartytp>
anyway, as I said, your call.
<hartytp>
if we need to, TS/Creotech also offer FPGA replacement pretty cheaply (we've had some dead Kaslis fixed that way).
<sb0>
so the 15T has ~100kbytes of BRAM
<sb0>
(usable)
<sb0>
and the 50T has 300
<sb0>
without any optimization and with switching support, satman is at 88K of code
<sb0>
you need some memory on top of that for bss and stack
<sb0>
if we really want to spend the absolute minimum amount of time on this, then the 50t is a safer choice
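The headroom arithmetic behind that conclusion, using the rough figures above:

```python
def bram_headroom_kb(bram_kb, code_kb):
    # Whatever BRAM remains after the code is what bss and the
    # stack must fit into.
    return bram_kb - code_kb

# ~100 KB usable on the 15T minus 88 KB of satman code leaves only
# ~12 KB for bss + stack; the 50T's ~300 KB leaves ~212 KB of slack.
```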
<hartytp>
let's assume Sayma is >$500 per channel (so $4k for the AMC + RTM), which would be really good going for an RF card like this and is almost certainly an underestimate of actual costs
<hartytp>
so adding $40 for the RTM is a 1% increase on the cost.
<hartytp>
obviously it's a slippery slope if one takes that attitude everywhere, but in this case it seems like a no brainer to me
<hartytp>
ultimately, the thing that is most likely to kill Sayma as a project is lots of small issues causing delays until everyone gets fed up and gives up
<hartytp>
anywhere where we can inject a small amount of cash to reduce time/risks seems worth doing to me
<sb0>
oh but hacking xilinx fpgas is fun
<sb0>
unlike e.g. microtca supplies not working
<hartytp>
haha
<hartytp>
well, up to you
<sb0>
let's go for 50t, we can populate the other one later, and we have sayma v1 to test
<hartytp>
really nothing more I can add to this other than to reiterate that currently my plans for Sayma rely on DRTIO so I need it to work fairly fast
<hartytp>
:)
hartytp has quit [Ping timeout: 256 seconds]
<cr1901_modern>
sb0: The idea behind picorv32 is that in most designs it can run as a control CPU without requiring a separate clock domain between your speed-sensitive logic and the CPU. It runs at like 700MHz on Virtex devices.
<cr1901_modern>
But of course to meet that goal, everything is registered, and CPI is > 1. The fact that it is small is a nice side effect.
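The tradeoff described here (register everything to hit a high clock, at the cost of CPI > 1) comes down to clock divided by CPI. A toy comparison, with the CPI figures being illustrative guesses rather than measured numbers:

```python
def throughput_mips(clock_mhz, cpi):
    # Useful instruction rate: clock frequency divided by the
    # average cycles-per-instruction.
    return clock_mhz / cpi

# A heavily registered core at 700 MHz with CPI ~4 still outruns a
# single-cycle core stuck at 100 MHz:
#   throughput_mips(700, 4) -> 175.0
#   throughput_mips(100, 1) -> 100.0
```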
m4ssi has quit [Remote host closed the connection]
<key2>
whitequark: we managed to port minerva to nMigen
<key2>
but that crashes yosys
<key2>
(have not tested it yet tho, just generated)
<key2>
so we generate the rtlil, and use yosys to do opt before generating the verilog