<whitequark>
the problem is that the value of next_state is never advanced to state
<whitequark>
next_state and next_state$next are both 0, and state and state$next are both 1
<whitequark>
line 616 is what should transfer that value
<whitequark>
except... it never happens
<whitequark>
i *suspect* that what would work, is if i separated that gigantic always @* block into smaller processes
<whitequark>
like, determined sensitivity lists for each signal, then grouped signals by sensitivity lists, then grouped signals so that signals that appear together on lhs are combined
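A minimal Python sketch of the grouping described above, under assumptions: each statement is modelled as a pair of (signals written, signals read), and the signal names are illustrative only. This is not nmigen's actual implementation, just one way to read the three steps.

```python
from collections import defaultdict

def split_processes(statements):
    """Group driven signals into processes.

    statements: list of (lhs_signals, rhs_signals) pairs, each a set of names.
    Returns a sorted list of sorted signal lists, one list per process.
    """
    # Step 1: the sensitivity list of a signal is everything read by the
    # statements that drive it.
    sensitivity = defaultdict(set)
    for lhs, rhs in statements:
        for sig in lhs:
            sensitivity[sig] |= rhs

    # Union-find over driven signals, used by steps 2 and 3.
    parent = {sig: sig for sig in sensitivity}

    def find(sig):
        while parent[sig] != sig:
            parent[sig] = parent[parent[sig]]
            sig = parent[sig]
        return sig

    def union(a, b):
        parent[find(a)] = find(b)

    # Step 2: signals with identical sensitivity lists share a process.
    by_sensitivity = defaultdict(list)
    for sig, sens in sensitivity.items():
        by_sensitivity[frozenset(sens)].append(sig)
    for sigs in by_sensitivity.values():
        for other in sigs[1:]:
            union(sigs[0], other)

    # Step 3: signals that appear together on one lhs share a process.
    for lhs, _ in statements:
        ordered = sorted(lhs)
        for other in ordered[1:]:
            union(ordered[0], other)

    groups = defaultdict(list)
    for sig in sensitivity:
        groups[find(sig)].append(sig)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical statements, loosely named after the signals in this discussion.
statements = [
    ({"state$next"}, {"state", "next_state"}),
    ({"next_state$next"}, {"state", "start"}),
    ({"busy", "done"}, {"state"}),
    ({"led"}, {"counter"}),
]
processes = split_processes(statements)
```

Here `busy` and `done` end up in one process because they share a sensitivity list, while the two `$next` signals each get their own.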
<cr1901_modern>
Also "$cr1901cc -pedantic > a.out; cat a.out" "Verilog was made in 1984 and wasn't meant for synthesis. It was shoehorned into that role."
<whitequark>
cr1901_modern: IEEE 1364.1
<cr1901_modern>
I know... still I don't think synthesis was what the 1984 WG had in mind :P. Anyways, what does "translation layer" mean in the context of the nmigen simulator?
<whitequark>
the $next stuff
<cr1901_modern>
$next is a verilog function?
<whitequark>
no
<cr1901_modern>
So you have an nmigen simulator written in python. Okay cool. But you also want the ability to generate Verilog code that works with an external simulator? And with nmigen there's no longer a way to generate Verilog code without sim/synth mismatches (because of yosys)?
<cr1901_modern>
So you have to add compat RTLIL to make the output Verilog play nice w/ sims?
<whitequark>
not necessarily because of yosys
<whitequark>
for some reason.
<cr1901_modern>
If it's not yosys, what could it be? (yosys RTLIL generation being arguably the biggest change from migen)
<whitequark>
well, all the $next stuff
<attie>
wait so if I'm reading this correctly
<attie>
next_state is the same as \next_state$next
<attie>
(by assign, lower down)
<attie>
and just before \state$next is set, \next_state$next is set to state
<attie>
by a blocking assign
<attie>
so next_state will always be assigned the value of state?
<attie>
but that's not what you're seeing either, is it?
<whitequark>
indeed
<attie>
I did observe the same problem when I tried to use blocking assign last year
<whitequark>
what did you do?
<attie>
...not use blocking assigns, and therefore not use non-migen simulators.
<attie>
and yes, the conclusion everyone came to is that we have to analyse the code and print only the direct DFG antecedents of a signal.
<attie>
I do believe that some of these simulation bugs (or maybe just features of the verilog specification?) do happen with manually written code too.
<attie>
it's just that you see them coming and avoid them by structuring your code differently.
<whitequark>
>There can be whole groups of those in blocks with multiple assignments. Crucially, all the resets happen at the beginning of the block. With blocking assignments, these resets sometimes end up overwriting the values that should be assigned, and spurious reset values get propagated through the design.
<whitequark>
right, this is exactly what I'm hitting
<whitequark>
attie: by "direct DFG antecedents" what do you mean here exactly?
<whitequark>
I'm not yet sure if we're talking about the same transform
<sb0>
blocking assignments are also quite problematic, since they happen without a delta-cycle the result of the simulation may depend on the order that the simulator chooses for always blocks
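The order dependence can be shown with a toy Python model. It assumes each always block is a function from the current signal values to the values it writes, and it is not a model of a real Verilog event scheduler. In "blocking" mode a write is visible to the next block immediately; in "nonblocking" mode all blocks read the old values and the writes are committed together.

```python
def run_blocking(blocks, state):
    """Run blocks in the given order; each write is visible immediately."""
    state = dict(state)
    for block in blocks:
        state.update(block(state))
    return state

def run_nonblocking(blocks, state):
    """Run blocks in the given order; writes are committed after all have run."""
    pending = {}
    for block in blocks:
        pending.update(block(state))  # every block reads the old state
    return {**state, **pending}

# Two hypothetical blocks that each copy the other's signal.
def block_a(signals):
    return {"a": signals["b"]}

def block_b(signals):
    return {"b": signals["a"]}

initial = {"a": 0, "b": 1}

blocking_ab = run_blocking([block_a, block_b], initial)
blocking_ba = run_blocking([block_b, block_a], initial)
nonblocking_ab = run_nonblocking([block_a, block_b], initial)
nonblocking_ba = run_nonblocking([block_b, block_a], initial)
```

With immediate writes the two orders disagree (`a` and `b` both end up 1, or both 0); with deferred writes both orders swap the values.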
<attie>
well, ultimately, all signal values have to come from a combination of register values and inputs
<attie>
otherwise you have a combinatorial loop
<attie>
so you can construct a directed acyclic graph of signals and operators
<attie>
and then for each signal
<attie>
print only the subgraph that is strictly necessary for that signal
<sb0>
this will probably speed up the simulation too
<whitequark>
attie: and this subgraph goes into the individual always @* block, right?
<attie>
yeah
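A rough Python sketch of the "print only the subgraph that is strictly necessary" idea, assuming the design is available as a mapping from each combinational signal to the signals it is computed from. The `drivers` layout and the signal names are hypothetical; traversal stops at registers and inputs.

```python
def antecedents(drivers, signal, leaves):
    """Return every signal that `signal` transitively depends on.

    drivers: dict mapping a combinational signal to the set of signals it
             is computed from.
    leaves:  registers and inputs, where the traversal stops.
    """
    cone = set()
    stack = [signal]
    while stack:
        node = stack.pop()
        for source in drivers.get(node, ()):
            if source not in cone:
                cone.add(source)
                if source not in leaves:
                    stack.append(source)
    return cone

# Hypothetical design: two unrelated cones of logic.
drivers = {
    "next_state": {"state", "start"},
    "busy": {"next_state"},
    "counter_msb": {"counter"},
    "led": {"counter_msb"},
}
leaves = {"state", "start", "counter"}

busy_cone = antecedents(drivers, "busy", leaves)
led_cone = antecedents(drivers, "led", leaves)

# A signal that shows up in its own cone indicates a combinational loop.
looped = antecedents({"x": {"y"}, "y": {"x"}}, "x", set())
```

Each cone would then go into its own always @* block, so a change in `counter` never re-evaluates the logic for `busy`.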
<whitequark>
ok
<whitequark>
sure
<whitequark>
i hate it, but i can do this
<whitequark>
it is necessary anyway for the integrated simulator
<whitequark>
because it is too slow otherwise
<sb0>
isn't there an existing yosys transform that does it?
<whitequark>
sb0: the short answer is there is, but the resulting verilog will not be human readable
<attie>
do we care about that?
<whitequark>
well, how do i debug it?
<sb0>
yes
<whitequark>
sb0: also the yosys transform does not help pysim
<sb0>
and isn't this a problem with yosys that should be improved?
<whitequark>
yosys does not generally aim to produce human readable verilog with write_verilog
<whitequark>
i.e. all yosys transforms work on the IR where processes are expanded
<attie>
how do they debug it then?
<whitequark>
that's different
<whitequark>
in case of nmigen you have one more layer, nmigen itself
<whitequark>
nmigen's debug output is yosys verilog
<whitequark>
sb0: anyway, so write_verilog does not guarantee, in general, that processes are represented correctly
<whitequark>
nmigen specifically emits only processes that write_verilog *can* represent
<whitequark>
but yosys transforms neither try to preserve that nor do they attempt to transform processes
<whitequark>
this means that if we run a yosys transform, all processes are gone
<sb0>
processes?
<whitequark>
and you are left with a sea of operators and functions
<whitequark>
a process is an always block with if, case, etc stuffed inside
<whitequark>
almost all yosys passes require this to be expanded into mux trees
<whitequark>
write_verilog ends up representing these muxes as verilog functions with case statements inside
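To illustrate what "expanded into mux trees" means, here is a small Python sketch that lowers a process for a single signal into nested muxes. The statement encoding (`("assign", value)` and `("if", cond, then, else)`) is made up for this example, and the ternary rendering is only for display; it is not what write_verilog emits.

```python
def lower(statements, current):
    """Lower a list of statements driving one signal into a mux tree.

    current is the value the signal holds before these statements run.
    Later assignments override earlier ones, as in a process.
    """
    for statement in statements:
        kind = statement[0]
        if kind == "assign":
            current = statement[1]
        elif kind == "if":
            _, cond, then_body, else_body = statement
            current = ("mux", cond,
                       lower(then_body, current),
                       lower(else_body, current))
        else:
            raise ValueError(f"unknown statement kind: {kind!r}")
    return current

def render(expr):
    """Render a mux tree as a nested ternary expression."""
    if isinstance(expr, tuple):
        _, cond, if_true, if_false = expr
        return f"({cond} ? {render(if_true)} : {render(if_false)})"
    return expr

# Hypothetical process: default assignment, then a nested if/else.
process = [
    ("assign", "state"),
    ("if", "start",
        [("assign", "1")],
        [("if", "done", [("assign", "0")], [])]),
]
tree = lower(process, "undriven")
text = render(tree)
```

The default assignment at the top of the process becomes the innermost fallback of the tree, which here renders as `(start ? 1 : (done ? 0 : state))`.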
<whitequark>
and adds a ton of spaghetti
_whitelogger has joined #m-labs
<whitequark>
daveshah: ECP5 has a command to zero an entire BRAM, right?
<whitequark>
ice40 doesn't iirc
<_whitenotifier-6>
[m-labs/nmigen] whitequark pushed 1 commit to master [+0/-0/±1] https://git.io/fhJTz
<whitequark>
ironically, this actually made the generated verilog more readable
<whitequark>
even though it is now much larger
<whitequark>
interestingly, this substantially pessimized synthesis with yosys
<whitequark>
by like 40%
<sb0>
no bug even when I put two cards in the same rack. surprising!
_whitelogger has joined #m-labs
<daveshah>
whitequark: I don't think so, I think zeroing is a bit trickier than that
<daveshah>
As far as I know, BRAM are zeroed by setting their WID to 0 in the main bitstream (WIDs of 3 and above are used for initialized BRAM)
<daveshah>
But I haven't really played with that and just play it safe and init everything as if it were initialised with data
<whitequark>
interesting
<whitequark>
sb0: wow, you are right
<whitequark>
iverilog is several ORDERS OF MAGNITUDE faster processing nmigen code after I added the transform
<whitequark>
it is not even funny
<whitequark>
and yes, it no longer craps out due to fake comb loops
<whitequark>
sb0: unfortunately, since this is FPGAs, of course, this broke synthesis
<whitequark>
pessimized LUT count by like 50%
<daveshah>
I'd be curious to see if this was just Yosys or a more general issue
<whitequark>
daveshah: well
<whitequark>
there are two possible outcomes here
<whitequark>
depressing and really depressing
<whitequark>
so I prefer to not know
<whitequark>
but I can give you two .v files
<whitequark>
sb0: so it looks like the "new" nmigen generated code is at least 200 times faster than "old" code plus a workaround to make iverilog function at all
<daveshah>
whitequark: yeah, I'd be happy to take a look
<whitequark>
the plan is to make the Yosys frontend tolerate nMigen output
<daveshah>
yes, given that Vivado copes with this then Yosys should too imo
<whitequark>
and then, if your shitty proprietary toolchain cannot cope with this input
<whitequark>
you do coarse synthesis with Yosys and feed it the result
<whitequark>
and/or complain to the vendor of the toolchain
<daveshah>
echo complaint > /dev/null
<whitequark>
daveshah: well maybe someone in $1**mm in revenue will complain some day and they will care.
<whitequark>
but until then, this is not my problem anymore
<whitequark>
it is the problem of the user of the shitty toolchain
<whitequark>
I am far more interested in improving Yosys frontend than adapting nMigen to badly programmed proprietary frontends, given that the result for this code is the same anyway
<whitequark>
anyway, I think I can also use this process splitting to speed up pysim
<whitequark>
er
<sb0>
whitequark: oh of course, you need to disable this for synthesis
<sb0>
otherwise there will be a lot of duplicated logic
<sb0>
or is vivado really smart enough to produce exactly the same result after the transform?
<sb0>
how well-tested is this?
<sb0>
amazingly, I can even flash the sayma in the crate, no JTAG scansta/utca shenanigans so far
<sb0>
ah one of the two boards craps out
<sb0>
Error: JTAG scan chain interrogation failed: all zeroes
<sb0>
that's more like it
<sb0>
wow, even ethernet works in the crate
<sb0>
no RF output at all, of course, so I can't reproduce the ramp issue...
<sb0>
log looks ok
<sb0>
ah it's only one channel that is borked and of course i was measuring that one. could be just a broken SMP or something silly like that
<GitHub-m-labs>
migen/master 5e7c71a Florent Kermarrec: build/xilinx/vivado: use build_name instead of top in synth_design
<bb-m-labs>
build #357 of migen is complete: Failure [failed python_unittest] Build details are at http://buildbot.m-labs.hk/builders/migen/builds/357 blamelist: Florent Kermarrec <florent@enjoy-digital.fr>
<hartytp>
the only ones I've played with are the Keysight Infiniium
<hartytp>
lovely things
<hartytp>
but you need $20k to get your hands on one
<sb0>
what about something less expensive?
<sb0>
maybe 10k
<hartytp>
tbh I've not played with much else.
<hartytp>
we also have an old Tek scope
<hartytp>
it runs on windows me and has a really old-school printer built into it (in case you want to grab a hilariously crappy hard copy of your data straight from the scope)
<hartytp>
not surprisingly we don't use that one much
<hartytp>
scopes which can see microwaves are expensive
<hartytp>
FWIW, R&S and Lecroy suck
<hartytp>
IME
<sb0>
what's the software on keysight? isn't it also windows-based?
<hartytp>
yes
<hartytp>
can't remember the vintage though
<sb0>
seems it's xp or 7
<hartytp>
one of the things I like about the Keysight one is that it seems to do a lot more dsp on its FPGA
<hartytp>
much more responsive than the equivalents from other vendors that I tried
<hartytp>
we have the 4GHz ABW model. it's the same hw as the 20GHz BW model, but has software locks on it
hartytp has quit [Ping timeout: 256 seconds]
<rjo>
IMO it's pointless to get a scope just for the glitches.
<key2>
can't u just rent one ?
<sb0>
I'm thinking it could be used for other things, e.g. debugging SYSREF issues
<sb0>
otherwise this has to be funnelled through WUT
<sb0>
now I know I have to expect tons of hardware bugs with this stuff, so I want to plan and budget accordingly
<rjo>
ack. then get a good 1 GHz bandwidth scope. i'd look at the second hand refurbished dealers.
<rjo>
sb0: big carry toggles. data that doesn't do much if the code downstream does the right thing but wreaks havoc if there are framing/byte/nibble shuffling/scrambling issues.
<rjo>
it was just a hunch that there might be something wrong with the jesd core that the simple linear ramp would not expose. but AFAICT it's not needed since the 50 MHz generator also exposes the same issue.
<rjo>
sb0: my suspects for #1166 are: 1) some CDC mix up/issue 2) jesd core/packing.
<rjo>
sb0: i went through the verilog to check (1) a couple months back when this occurred and didn't find anything.
<rjo>
sync.rtio is the right domain AFAICT.
<rjo>
but there were some refactorings around the CDs some time ago.
rohitksingh has joined #m-labs
rohitksingh has quit [Remote host closed the connection]
<cr1901_modern>
sb0: Are you talking about the broken test, or the PR I made that broke said test?
<cr1901_modern>
sb0: All I wanted was the ability for migen to autodetect the diamond toolchain, like it does for ISE and Xilinx. Diamond's versioning scheme doesn't satisfy StrictVersion, so I improvised using LooseVersion
<cr1901_modern>
ISE and Vivado*
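A hedged Python sketch of the strict versus loose distinction. It does not import `distutils` (which was removed from the standard library in Python 3.12); instead it uses a regex that approximates the format `StrictVersion` accepts and a hand-rolled loose sort key. The version strings are made up for illustration and are not real Diamond or ISE version numbers.

```python
import re

# Approximates StrictVersion: two or three numeric components, optionally
# followed by an "a" or "b" pre-release tag with a number.
_STRICT_RE = re.compile(r"^(\d+)\.(\d+)(\.(\d+))?([ab](\d+))?$")

def is_strict(version):
    """True if the string fits the strict major.minor[.patch][(a|b)N] form."""
    return _STRICT_RE.match(version) is not None

def loose_key(version):
    """Sort key that tolerates any number of components.

    Numeric components sort before text components, so that keys remain
    comparable even when a component is not a number.
    """
    key = []
    for part in re.split(r"[.\-_]", version):
        if part.isdigit():
            key.append((0, int(part)))
        else:
            key.append((1, part))
    return tuple(key)

# Hypothetical version strings.
two_part = "14.7"
four_part = "3.10.0.111"

strict_ok = is_strict(two_part)
strict_fails = is_strict(four_part)
ordered = sorted(["3.10.0.111", "3.9", "3.10"], key=loose_key)
```

The four-component string is rejected by the strict form but still sorts sensibly with the loose key, and `3.9` correctly sorts before `3.10`.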
<whitequark>
sb0: vivado actually seems to cope with this splitting okay
<cr1901_modern>
daveshah: LSE is icecube only, or does Diamond let you use it as well?
<cr1901_modern>
B/c I thought Diamond was Synopsys
<daveshah>
I was using LSE with Radiant here
<daveshah>
All three tools - Diamond, Radiant and icecube - have both synthesis tools
<cr1901_modern>
Hmmm interesting.
<cr1901_modern>
icecube synopsys will choke on even a simple soc design, but diamond handles it just fine
mumptai has joined #m-labs
<daveshah>
So the synplify in icecube will be configured (and its techmaps developed) by SiliconBlue, whereas Diamond's Synplify will be configured by Lattice (and to some extent perhaps, AT&T/Lucent/Agere)
lkcl has quit [Ping timeout: 268 seconds]
lkcl has joined #m-labs
<_whitenotifier-6>
[m-labs/nmigen] whitequark pushed 2 commits to master [+0/-0/±3] https://git.io/fhJuz
<_whitenotifier-6>
[m-labs/nmigen] whitequark fd89d2f - hdl.ir: factor out _merge_subfragment. NFC.
<_whitenotifier-6>
[m-labs/nmigen] whitequark 68dae9f - hdl.ir: flatten hierarchy based on memory accesses, too.