ChanServ changed the topic of #nmigen to: nMigen hardware description language · code at · logs at · IRC meetings each Monday at 1800 UTC · next meeting TBD
jeanthom has quit [Ping timeout: 265 seconds]
Degi has quit [Ping timeout: 260 seconds]
Degi has joined #nmigen
<whitequark> mh, guess not, let me finish describing this stuff anyway
<awygle> oh sorry
<awygle> i'm back
<awygle> drifted away
<whitequark> awygle: alright
<whitequark> so, i explained the state of cxxrtl, where the inelegance of triggers matters relatively little
<awygle> mhm
<whitequark> ah, one thing that cxxrtl would greatly benefit from is if you could clock domains directly rather than clock signals. with the current edge detector architecture that can't work as well.
<awygle> ah
<whitequark> i haven't worked on that goal specifically because almost everyone who tried cxxrtl said that it's more than fast enough
<whitequark> anyway. let's look at cxxsim now.
<whitequark> cxxsim performs cosimulation of python processes with the cxxrtl process (singular), which adds a lot of moving parts
<whitequark> for example, cxxrtl inputs (of the toplevel module) are value<>, which is fine for cxxrtl, which never writes to those. however, python processes most certainly can both read and write. what do? making those toplevel inputs wire<> would double the amount of delta cycles in best case.
<whitequark> well, the solution i came up with is that cxxsim (the python module) creates a virtual wire<> whose `curr` part is c++ owned value<> that's a part of the netlist, and `next` part is python owned pseudo-value<> that only exists to make python processes deterministic.
<whitequark> which is lowkey cursed but it works as long as multiple processes only ever arise python-side
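The curr/next split described above can be sketched as a toy model in Python. This is not cxxsim's actual code: the `Wire` class and its methods are hypothetical names chosen for illustration, and the sketch only shows why reading `curr` while writing `next` makes the outcome independent of process order.

```python
class Wire:
    """Toy double-buffered signal: reads see `curr`, writes go to `next`."""

    def __init__(self, init=0):
        self.curr = init
        self.next = init

    def get(self):
        return self.curr

    def set(self, value):
        self.next = value

    def commit(self):
        """Make the pending write visible; report whether the value changed."""
        changed = self.curr != self.next
        self.curr = self.next
        return changed


# Two hypothetical processes swap a and b. Each reads `curr` and writes
# `next`, so the result is the same whichever one runs first.
a, b = Wire(1), Wire(2)

def proc_a():
    a.set(b.get())

def proc_b():
    b.set(a.get())

for order in ([proc_a, proc_b], [proc_b, proc_a]):
    a.curr = a.next = 1
    b.curr = b.next = 2
    for proc in order:
        proc()
    a.commit()
    b.commit()
    assert (a.get(), b.get()) == (2, 1)
```

With a single-buffered value the second process would observe the first one's write, and the result would depend on scheduling.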
<whitequark> but that's not the problem. the problem is triggering
<whitequark> see, if a python process waits on some signal (currently, that's pretty much always a clock), and the cxxrtl process has registers that trigger on that same signal, they *must* be evaluated concurrently, or you'll get a race.
electronic_eel has quit [Ping timeout: 246 seconds]
electronic_eel has joined #nmigen
<whitequark> okay, i explained enough context to explain the actual issue
sakirious has joined #nmigen
PyroPeter_ has joined #nmigen
PyroPeter has quit [Ping timeout: 256 seconds]
PyroPeter_ is now known as PyroPeter
emeb_mac has quit [Quit: Leaving.]
Bertl_oO is now known as Bertl_zZ
<whitequark> so, the issue arises when python processes wait on signals. with the cxxrtl process it's easy: it simply polls the async input on every call to eval(). this works because the simulator can advance the simulated time once the cxxrtl process converges.
<whitequark> python processes can't busy wait, so instead they register a trigger and go to sleep.
<whitequark> right now, the way i process triggers, is that during commit i check every one of them in sequence and compare curr/next
<whitequark> (similar to what cxxrtl does when the clock is a wire<>)
<whitequark> this works for inputs and registers, but it doesn't work for comb outputs, aliases, or anything else
<whitequark> since comb outputs in cxxrtl netlists don't have curr/next, to fix this, i would have to save the old value somewhere else
<whitequark> so basically, python processes right now only work if you wait on a wire
<whitequark> conversely, cxxrtl processes only work if their clock is *not* a wire, because cxxsim does an equivalent of this c++ code: top.p_clk.set(true); top.commit(); top.eval();
<whitequark> and in that sequence, commit() makes curr == next, and then eval() doesn't actually trigger any sync logic
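The lost edge in that set/commit/eval sequence can be reproduced with a small toy model. It assumes, for illustration only, that a wire-typed clock is checked for a rising edge by comparing `curr` against `next`; `ToyTop` and its members are hypothetical and do not mirror the code cxxrtl generates.

```python
class ToyTop:
    """Hypothetical stand-in for a design whose clock is a curr/next wire."""

    def __init__(self):
        self.clk_curr = False
        self.clk_next = False
        self.count = 0  # register incremented on a rising clock edge

    def set_clk(self, value):
        self.clk_next = value

    def commit(self):
        changed = self.clk_curr != self.clk_next
        self.clk_curr = self.clk_next
        return changed

    def eval(self):
        # Assumed edge check: a rising edge is a pending change from 0 to 1.
        if not self.clk_curr and self.clk_next:
            self.count += 1


# Same order as the C++ sequence quoted above: commit() runs before eval(),
# so curr == next by the time the edge is checked.
lost = ToyTop()
lost.set_clk(True)
lost.commit()
lost.eval()

# Evaluating while the write is still pending sees the edge.
kept = ToyTop()
kept.set_clk(True)
kept.eval()
kept.commit()

print(lost.count, kept.count)  # prints: 0 1
```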
<whitequark> you might ask: but if p_clk is a value<>, then prev_p_clk is set during commit, why doesn't that break in the same way?
<whitequark> and the answer is that it works more or less by accident. specifically, because i commit signals driven by the cxxrtl process first, and signals driven by python processes second
<whitequark> to make this even worse, currently every single cxxrtl-owned signal you use is registered as a trigger for the cxxrtl process
<whitequark> and is linear searched multiple times for every simulation instant
gkelly has quit [Quit: Idle for 30+ days]
<whitequark> i don't know how coherent what i just wrote is (i'm pretty sure it's impossible to understand), but it's ok if it isn't. i'm only really trying to convey the amount of ad-hoc hacks related to edge triggered logic here
nelgau has joined #nmigen
<whitequark> and it's not just that they're ad-hoc, it's that they violate basic invariants that both pysim and cxxsim are built around
<whitequark> - order of eval is unimportant
<whitequark> - order of commit is unimportant
<whitequark> i want something that would:
<whitequark> - trigger cxxrtl processes whether the clock is wire<> or value<>, and without doing unsound reads from .next
<whitequark> - trigger python processes in exactly the same way as cxxrtl processes are
<whitequark> - free python code from the need to do O(n) operations on large amounts of signals
<whitequark> - ideally, greatly reduce the cost of eval on the inactive edge of clock
jeanthom has joined #nmigen
nelgau has quit [Remote host closed the connection]
<cesar[m]> Regarding the last point, what if you split the eval function into eval_level_sensitive and eval_edge_sensitive?
<cesar[m]> eval_edge_sensitive would only run if it's triggered by an edge
<cesar[m]> if it's the wrong edge, it would not be run, neither would eval_level_sensitive
<cesar[m]> if eval_edge_sensitive does run, eval_level_sensitive would be run afterwards.
<cesar[m]> eval_level_sensitive would not have any clocks in its sensitivity list.
<whitequark> yes, that's exactly what i'm thinking about here
<whitequark> the devil is in the details, really; i'd like to preserve the beautifully simple `top.step()` interface, yet also enable this
<whitequark> well, it's not really "edge sensitive" and "level sensitive"
<whitequark> both comb and sync processes are edge sensitive, because comb processes are iterated to fixpoint
<whitequark> unlike in verilog, a process that reads its own output will self-trigger
<whitequark> so, perhaps something like eval_comb() and eval_sync(), where eval_sync() would perhaps further delegate its job to eval_posedge_p_clk_negedge_p_rst() or something like that
<whitequark> and then eval() would just be eval_comb();eval_sync()
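A minimal sketch of that split, written as a Python toy rather than generated C++. The design (a counter register plus a combinational `doubled` output), the `prev_clk` bookkeeping, and the method name `eval_posedge_clk` are all assumptions made for illustration; only the shape `eval() = eval_comb(); eval_sync()` comes from the discussion above.

```python
class ToyDesign:
    def __init__(self):
        self.clk = False
        self.prev_clk = False
        self.count = 0    # register, updated on the rising edge of clk
        self.doubled = 0  # combinational output derived from count

    def eval_comb(self):
        """Recompute combinational outputs; return True if any changed."""
        new = self.count * 2
        changed = new != self.doubled
        self.doubled = new
        return changed

    def eval_posedge_clk(self):
        self.count += 1

    def eval_sync(self):
        """Run clocked logic only if the active edge occurred."""
        posedge = self.clk and not self.prev_clk
        self.prev_clk = self.clk
        if posedge:
            self.eval_posedge_clk()
        return posedge

    def eval(self):
        comb = self.eval_comb()
        sync = self.eval_sync()
        return comb or sync

    def step(self):
        """Iterate eval() to a fixpoint; return the number of delta cycles."""
        deltas = 0
        while self.eval():
            deltas += 1
        return deltas


d = ToyDesign()
d.clk = True
print(d.step(), d.count, d.doubled)  # prints: 2 1 2
d.clk = False
print(d.step(), d.count, d.doubled)  # prints: 0 1 2
```

On the inactive edge the loop exits after a single pass in which `eval_sync()` does no clocked work, which is the case the last wishlist item is about.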
jeanthom has quit [Ping timeout: 246 seconds]
chipmuenk has joined #nmigen
<_whitenotifier> [nmigen/nmigen] whitequark pushed 2 commits to cxxsim [+0/-0/±2]
<_whitenotifier> [nmigen/nmigen] whitequark 4027103 - sim.cxxsim: simplify handling of Python-owned signals. NFCI.
<_whitenotifier> [nmigen/nmigen] whitequark 0807149 - sim.cxxsim: dump simulation-only signals to VCD, when possible.
<_whitenotifier> [nmigen] whitequark commented on issue #556: cxxsim: simulator-only signals not included in VCD and GTKWave files -
<_whitenotifier> [nmigen] whitequark closed issue #556: cxxsim: simulator-only signals not included in VCD and GTKWave files -
<_whitenotifier> [nmigen] whitequark edited a comment on issue #556: cxxsim: simulator-only signals not included in VCD and GTKWave files -
<_whitenotifier> [nmigen] whitequark edited issue #324: Integrate the CXXSim simulator -
<_whitenotifier> [nmigen/nmigen] whitequark pushed 1 commit to cxxsim [+0/-0/±1]
<_whitenotifier> [nmigen/nmigen] whitequark 547c296 - sim.cxxsim: dump simulation-only signals to VCD, when possible.
<_whitenotifier> [nmigen/nmigen] whitequark pushed 2 commits to master [+0/-0/±3]
<_whitenotifier> [nmigen/nmigen] whitequark 4e7e0b3 - back.rtlil: give private items an appropriate name. NFCI.
<_whitenotifier> [nmigen/nmigen] whitequark 818c8bc - hdl.ast: normalize case values to two's complement, not signed binary.
<_whitenotifier> [nmigen] whitequark closed issue #559: Negative values support in Switch-Case -
<_whitenotifier> [nmigen/nmigen] github-actions[bot] pushed 1 commit to gh-pages [+0/-0/±13]
<_whitenotifier> [nmigen/nmigen] whitequark 2034c40 - Deploying to gh-pages from @ 818c8bc46485ada0f31ad8ec23182ad01a6c7da1 🚀
<_whitenotifier> [nmigen] nturley opened issue #560: Rounding errors in vcd simulator time -
<_whitenotifier> [nmigen] whitequark commented on issue #560: Rounding errors in vcd simulator time -
<_whitenotifier> [nmigen] whitequark edited a comment on issue #560: Rounding errors in vcd simulator time -
nfbraun has joined #nmigen
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
Bertl_zZ is now known as Bertl
emeb has joined #nmigen
<_whitenotifier> [nmigen] nturley commented on issue #560: Rounding errors in vcd simulator time -
<_whitenotifier> [nmigen] nturley closed issue #560: Rounding errors in vcd simulator time -
nelgau has joined #nmigen
<_whitenotifier> [nmigen/nmigen] whitequark pushed 1 commit to master [+0/-0/±1]
<_whitenotifier> [nmigen/nmigen] whitequark 7dde2aa - hdl.ast: formatting. NFC.
<_whitenotifier> [nmigen/nmigen] github-actions[bot] pushed 1 commit to gh-pages [+0/-0/±13]
<_whitenotifier> [nmigen/nmigen] whitequark d18e185 - Deploying to gh-pages from @ 7dde2aac7c77e3f196280d612ef4635bbb64d576 🚀
<_whitenotifier> [nmigen/nmigen] whitequark pushed 1 commit to pysim-display [+0/-0/±6]
<_whitenotifier> [nmigen/nmigen] whitequark fef97ae - [WIP] hdl.ast: add Display statement, a mixture of print() and format().
Bertl is now known as Bertl_oO
<_whitenotifier> [nmigen/nmigen] whitequark pushed 1 commit to pysim-display [+0/-0/±8]
<_whitenotifier> [nmigen/nmigen] whitequark dc6a805 - [WIP] hdl.ast: add Display statement, a mixture of print() and format().
mmsjRxd5 has quit [Quit: WeeChat 2.9]
<_whitenotifier> [nmigen-boards] nfbraun opened pull request #135: Add ZedBoard. -
FFY00 has quit [Remote host closed the connection]
<_whitenotifier> [nmigen-boards] nfbraun synchronize pull request #135: Add ZedBoard. -
FFY00 has joined #nmigen
FFY00 has quit [Read error: Connection reset by peer]
<_whitenotifier> [nmigen] nfbraun commented on issue #558: write_cfgmem is writing to arty board when not specified to in the code -
FFY00 has joined #nmigen
<_whitenotifier> [nmigen-boards] rroohhh commented on pull request #135: Add ZedBoard. -
jeanthom has joined #nmigen
<cesar[m]> whitequark: I've identified a couple of tests where CxxSim and PySim still disagree. I'll investigate.
<cesar[m]> Otherwise, CxxSim reproduces PySim results well.
<whitequark> cesar[m]: excellent
<whitequark> looking forward to your testcases
<whitequark> i was initially planning to do randomized testing, but that turned out to be more difficult than i anticipated, and on top of it there is still significant missing functionality
<whitequark> testing it on a large real-world project is the next best thing
<whitequark> cesar[m]: how's the performance so far?
<_whitenotifier> [nmigen] whitequark commented on issue #558: write_cfgmem is writing to arty board when not specified to in the code -
emeb_mac has joined #nmigen
<cesar[m]> Sorry, any potential gain in run time is being completely offset by C++ compilation time.
<whitequark> cesar[m]: are you able to reuse a simulator by repeatedly resetting it?
<cesar[m]> Sure.
<whitequark> still compiles for too long?
<cesar[m]> What I mean is, it sure would be faster if we repeatedly reset the simulator instead of recreating it as we do today.
<whitequark> ah.
<whitequark> yes. if you recreate it then the unfortunate result you are observing is expected
<whitequark> in principle it would not be too hard to add caching, but the knob you would have to use for that is not currently exposed.
<cesar[m]> Also, I guess we could leverage the greater performance by increasing the number of iterations and test vectors.
<whitequark> the main reason I'm asking is that ctypes can have an.. unfortunate effect on performance
<whitequark> and the exact steps I will have to take to tackle that should, I think, mostly reflect real-world use
<cesar[m]> Maybe there could be a way to measure the simulation run time, excluding compile time, in both CxxSim and PySim?
<cesar[m]> It could be done in a Python process, I guess.
<whitequark> that could be done, but I'm worried about Amdahl's law
<whitequark> what you ultimately care about is how long the tests run, not how quickly CXXRTL itself runs
<cesar[m]> Indeed.
<whitequark> CXXRTL is fast enough that it's competitive with single threaded Verilator, which in practice means that even relatively small inefficiencies in Python have a dramatically higher effect on runtime
<whitequark> at one point, I measured a prototype of CXXSim running *slower* than PySim, even though the same design ran in pure C++ at over 1 MCPS
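The Amdahl's law concern can be made concrete with a short calculation. The numbers below are illustrative assumptions, not measurements from CXXSim or PySim.

```python
def overall_speedup(accelerated_fraction, factor):
    """Amdahl's law: whole-program speedup when only a fraction gets faster."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


# Assumed split: 90% of a test's run time is spent evaluating the design and
# that part becomes 100x faster. The remaining 10% (Python-side overhead)
# caps the overall gain below 10x.
print(round(overall_speedup(0.90, 100.0), 2))  # prints: 9.17
```

The faster the compiled part gets, the more the total is dominated by whatever stays in Python.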
<whitequark> I know there's multiple places in the current version of CXXSim where I have great opportunities for optimizing the interface, but I'd like to actually do that with a benchmark in hand and not blindly
<cesar[m]> Unfortunately, I cannot try CXXSim on the full design at the moment, because it uses Litex peripherals, so it must be simulated in Litex / Verilator, not in nMigen.
<whitequark> ah I see
<_whitenotifier> [nmigen] davidlattimore commented on issue #558: write_cfgmem is writing to arty board when not specified to in the code -
nfbraun has quit [Ping timeout: 256 seconds]
nfbraun has joined #nmigen
<cesar[m]> So, I do the next best thing, which is to simulate (in nMigen) the big integration test.
<lkcl> whitequark: some background there - nmigen-soc is still in its infancy so we had to use something that's well-established
<whitequark> yep, that's perfectly sensible
<lkcl> also, the initialisation of even 64k of "memory" (loading a BIOS that would allow further extensive testing) into pysim is... awful.
<whitequark> pysim memories are about to get a whole lot faster
<lkcl> microwatt's "random instructions" unit tests are around... 128k in size?
<lkcl> ah brilliant. that would be superb
<whitequark> I'm kind of forced to optimize them because otherwise I could not integrate cxxrtl properly
<whitequark> because of some tricky internal interface issues
<lkcl> :)
<whitequark> the reason memories are so painful right now is they are O(n)
nfbraun_ has joined #nmigen
<lkcl> ouch.
<whitequark> which as you have just discovered turns into O(n^2) the moment you load the entire thing
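How an O(n) access turns into an O(n^2) load can be shown by counting operations in a toy model. The functions below are hypothetical and do not reproduce the code pysim generates; they only assume that a single write touches every cell.

```python
def write_via_select_chain(cells, addr, value):
    """Toy write that compares `addr` against every cell index, as a memory
    lowered to one select per cell would. Returns the comparison count."""
    comparisons = 0
    for index in range(len(cells)):
        comparisons += 1
        if index == addr:
            cells[index] = value
    return comparisons


def load_image(image):
    """Write every word of `image`; return the cells and the total cost."""
    cells = [0] * len(image)
    total = 0
    for addr, word in enumerate(image):
        total += write_via_select_chain(cells, addr, word)
    return cells, total


cells, cost = load_image(list(range(256)))
print(cost)  # prints: 65536
```

Each of the n writes costs n comparisons, so loading the whole image costs n * n.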
<whitequark> (before someone submits this to the "accidentally quadratic" tumblr blog: I did them this way on purpose, it was the right decision at the time)
<lkcl> i had to write directly to the _init internal data structure, btw, to load in BIOSes in a reasonable time. didn't tell you about it so as not to freak you out :)
<whitequark> wait, it was Memory.init setter that was this slow?
<whitequark> interesting
<whitequark> *that* is not supposed to be O(n)
<whitequark> just the pysim generated code
<lkcl> i bypassed it - long story.
<whitequark> yes, it's reasonable that you did
<whitequark> I'm just surprised that it's so slow
<lkcl> also... *sigh*... OpenPOWER ISA has bigendian / little-endian plus the data was in 32-bit format and needed to be in a 64-bit Memory... *sigh*
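The repacking chore mentioned here can be sketched as follows. This is not the loader lkcl describes; the function names and the `first_word_is_low` switch are hypothetical, and the sketch assumes every input word fits in 32 bits.

```python
def words_from_bytes(data, byteorder):
    """Split a byte string into 32-bit words using the given byte order."""
    return [int.from_bytes(data[i:i + 4], byteorder)
            for i in range(0, len(data), 4)]


def pack_words_32_to_64(words32, first_word_is_low=True):
    """Combine consecutive pairs of 32-bit words into 64-bit memory words."""
    words32 = list(words32)
    if len(words32) % 2:
        words32.append(0)  # pad an odd-length image with a zero word
    packed = []
    for i in range(0, len(words32), 2):
        lo, hi = words32[i], words32[i + 1]
        if not first_word_is_low:
            lo, hi = hi, lo
        packed.append((hi << 32) | lo)
    return packed


data = bytes(range(8))
little = pack_words_32_to_64(words_from_bytes(data, "little"))
big = pack_words_32_to_64(words_from_bytes(data, "big"), first_word_is_low=False)
print(hex(little[0]), hex(big[0]))  # prints: 0x706050403020100 0x1020304050607
```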
<lkcl> if that can be sped up, and there's a UART in nmigen that is even remotely "semi-compatible" with 16550 (even for "read"), then we can try - pretty much straight away - to run e.g. the microwatt "helloworld" example
<whitequark> i'm fairly certain that can be sped up
<whitequark> the whole "Memory is secretly an Array" thing is going to be in the past
<lkcl> it's only 1500 bytes
<lkcl> hooray :)
<whitequark> yet another decision I unthinkingly pulled from Migen where I should have really known better
<whitequark> ah well
<lkcl> :)
<whitequark> the whole fragment transformer thing makes compilation probably an order of magnitude slower than it should be, too
<whitequark> not to mention more buggy
<lkcl> cesar[m]: thank you for doing the extensive report on the libresoc unit tests. you saw i went through them?
<lkcl> fragment transformer? before handing to cxxsim?
<_whitenotifier> [nmigen] nfbraun commented on issue #558: write_cfgmem is writing to arty board when not specified to in the code -
<lkcl> you mean the node-walker? (i took a look a couple days ago at
<whitequark> DomainRenamer, DomainLowerer, etc work by term rewriting essentially
<whitequark> it's something i kept like it was in Migen because it looked reasonable at first glance
<whitequark> but if you think about it, you'll notice that virtually no other compiler uses that approach
<lkcl> hmm
<whitequark> well, it's because it is not only massively wasteful, but also hard to reason about
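To illustrate what term rewriting costs, here is a toy transformer over an immutable tree. It is not nMigen's DomainRenamer: the `Node` type and `rename_domain` function are hypothetical, and the sketch only shows that a rewriting pass rebuilds every node even when a single field changes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    kind: str
    domain: str = ""
    children: Tuple["Node", ...] = ()


def rename_domain(node, mapping):
    """Return a copy of the tree with domains renamed per `mapping`."""
    return Node(
        kind=node.kind,
        domain=mapping.get(node.domain, node.domain),
        children=tuple(rename_domain(child, mapping) for child in node.children),
    )


tree = Node("fragment", children=(
    Node("sync_assign", domain="sync"),
    Node("comb_assign", domain="comb"),
))
renamed = rename_domain(tree, {"sync": "pix"})
print([child.domain for child in renamed.children])  # prints: ['pix', 'comb']
```

Every pass of this kind allocates a fresh copy of the whole tree, so chaining several of them multiplies the work.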
<lkcl> luckily they do actually "work", which keeps them from being high-priority
<whitequark> to be perhaps more fair to Migen, Migen's version was a lot less wasteful, but on the other hand it was even harder to use correctly because of all the mutation going on
<whitequark> really none of this stuff should exist in first place
<lkcl> i know litex is a fantastic accumulation of incredibly valuable expertise and recipes
<lkcl> but... gaah, it's just impossible to consider committing any resources to it because migen gives zero warnings.
<lkcl> recently we got as far as *P&R* in coriolis2 before discovering an error! that's 4 hours compilation time!
<whitequark> ouch!
nfbraun_ has quit [Quit: leaving]
<lkcl> it's to do with netlists that should have been amalgamated. the assignment is detected by coriolis2 (an input to an input) and converted to an *output*
<whitequark> sounds kinda cursed
<lkcl> you're supposed to "fix" this in verilog by having a register that's assigned to both inputs
<_whitenotifier> [nmigen] whitequark commented on issue #558: write_cfgmem is writing to arty board when not specified to in the code -
<_whitenotifier> [nmigen/nmigen] whitequark pushed 1 commit to master [+0/-0/±1]
<_whitenotifier> [nmigen/nmigen] whitequark b466b72 - Revert "vendor.xilinx_7series: byte swap generated bitstream"
<lkcl> but if you don't follow that pattern *in migen* you get zero warnings.
<lkcl> nmigen on the other hand gets this right
<_whitenotifier> [nmigen/nmigen] github-actions[bot] pushed 1 commit to gh-pages [+0/-0/±13]
<_whitenotifier> [nmigen/nmigen] whitequark e2044e2 - Deploying to gh-pages from @ b466b724fe9f62140062afc9ecde9a920a261487 🚀
chipmuenk has quit [Quit: chipmuenk]
Lord_Nightmare has quit [Remote host closed the connection]
Lord_Nightmare has joined #nmigen
emeb has left #nmigen [#nmigen]