<whitequark>
mh, guess not, let me finish describing this stuff anyway
<awygle>
oh sorry
<awygle>
i'm back
<awygle>
drifted away
<whitequark>
awygle: alright
<whitequark>
so, i explained the state of cxxrtl, where the inelegance of triggers matters relatively little
<awygle>
mhm
<whitequark>
ah, one thing that cxxrtl would greatly benefit from is if you could trigger on clock domains directly rather than on clock signals. with the current edge detector architecture that can't work as well.
<awygle>
ah
<whitequark>
i haven't worked on that goal specifically because almost everyone who tried cxxrtl said that it's more than fast enough
<whitequark>
anyway. let's look at cxxsim now.
<whitequark>
cxxsim performs cosimulation of python processes with the cxxrtl process (singular), which adds a lot of moving parts
<whitequark>
for example, cxxrtl inputs (of the toplevel module) are value<>, which is fine for cxxrtl, which never writes to them. however, python processes most certainly can both read and write. what do? making those toplevel inputs wire<> would double the number of delta cycles in the best case.
<whitequark>
well, the solution i came up with is that cxxsim (the python module) creates a virtual wire<> whose `curr` part is a c++-owned value<> that's part of the netlist, and whose `next` part is a python-owned pseudo-value<> that only exists to make python processes deterministic.
<whitequark>
which is lowkey cursed, but it works as long as multiple processes only ever arise python-side
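A minimal sketch of the "virtual wire<>" idea above, modeled in Python (the class and method names here are made up for illustration, not cxxsim's actual code). `curr` stands in for the C++-owned value<> that lives in the netlist; `next` is the Python-owned shadow that only exists so python processes stay deterministic:

```python
# Hedged sketch: a Python model of the virtual wire<> described above.
class VirtualWire:
    def __init__(self, width):
        self.width = width
        self.curr = 0   # read side: what the netlist (and python code) observes
        self.next = 0   # write side: what python processes mutate

    def set(self, value):
        # python processes write here; the netlist never sees it until commit
        self.next = value

    def commit(self):
        # returns True if the value changed, i.e. another delta cycle is needed
        changed = self.curr != self.next
        self.curr = self.next
        return changed

clk = VirtualWire(1)
clk.set(1)
assert clk.curr == 0   # the netlist still sees the old value
assert clk.commit()    # commit propagates the write and reports a change
assert clk.curr == 1
```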
<whitequark>
but that's not the problem. the problem is triggering
<whitequark>
see, if a python process waits on some signal (currently, that's pretty much always a clock), and the cxxrtl process has registers that trigger on that same signal, they *must* be evaluated concurrently, or you'll get a race.
<whitequark>
okay, i explained enough context to explain the actual issue
<whitequark>
so, the issue arises when python processes wait on signals. with the cxxrtl process it's easy: it simply polls the async input on every call to eval(). this works because the simulator can advance the simulated time once the cxxrtl process converges.
<whitequark>
python processes can't busy wait, so instead they register a trigger and go to sleep.
<whitequark>
right now, the way i process triggers is that during commit i check every one of them in sequence and compare curr/next
<whitequark>
(similar to what cxxrtl does when the clock is a wire<>)
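The linear trigger scan described above can be sketched like this (all names are hypothetical, not the actual cxxsim internals): during commit, every registered trigger is checked in sequence by comparing a signal's curr/next pair, and any python process sleeping on a changed signal is woken.

```python
# Hedged sketch of the per-commit trigger scan (made-up names).
class Signal:
    def __init__(self):
        self.curr = 0
        self.next = 0

def commit_and_wake(signals, triggers):
    """triggers: list of (signal, waiter) pairs; returns the woken waiters."""
    woken = []
    for sig, waiter in triggers:      # O(n) scan, repeated on every commit
        if sig.curr != sig.next:
            woken.append(waiter)
    for sig in signals:
        sig.curr = sig.next           # the actual commit
    return woken

clk, data = Signal(), Signal()
clk.next = 1
woken = commit_and_wake([clk, data], [(clk, "proc_a"), (data, "proc_b")])
assert woken == ["proc_a"]   # only the process waiting on the changed signal wakes
```

Note this only works when the waited-on signal actually has a curr/next pair, which is exactly the limitation discussed next.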
<whitequark>
this works for inputs and registers, but it doesn't work for comb outputs, aliases, or anything else
<whitequark>
since comb outputs in cxxrtl netlists don't have curr/next, to fix this, i would have to save the old value somewhere else
<whitequark>
so basically, python processes right now only work if you wait on a wire
<whitequark>
conversely, cxxrtl processes only work if their clock is *not* a wire, because cxxsim does an equivalent of this c++ code: top.p_clk.set(true); top.commit(); top.eval();
<whitequark>
and in that sequence, commit() makes curr == next, and then eval() doesn't actually trigger any sync logic
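A small model of why that sequence loses the edge when the clock is a wire<> (simplified, hypothetical names): if the edge detector compares the wire's curr against its next, then committing first makes them equal, and the eval() that follows sees no transition.

```python
# Hedged sketch: commit-then-eval erases the posedge on a wire<> clock.
class Wire:
    def __init__(self):
        self.curr = 0
        self.next = 0

    def set(self, v):
        self.next = v

    def commit(self):
        self.curr = self.next

def posedge(clk):
    # simplified edge detector over the wire's curr/next pair
    return clk.curr == 0 and clk.next == 1

clk = Wire()
clk.set(1)
assert posedge(clk)        # eval() before commit() would see the posedge
clk.commit()
assert not posedge(clk)    # after commit(), curr == next: the edge is gone
```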
<whitequark>
you might ask: but if p_clk is a value<>, then prev_p_clk is set during commit, why doesn't that break in the same way?
<whitequark>
and the answer is that it works more or less by accident. specifically, because i commit signals driven by the cxxrtl process first, and signals driven by python processes second
<whitequark>
to make this even worse, currently every single cxxrtl-owned signal you use is registered as a trigger for the cxxrtl process
<whitequark>
and is linearly searched multiple times for every simulation instant
<whitequark>
i don't know how coherent what i just wrote is (i'm pretty sure it's impossible to understand), but it's ok if it isn't. i'm only really trying to convey the amount of ad-hoc hacks related to edge triggered logic here
<whitequark>
and it's not just that they're ad-hoc, it's that they violate basic invariants that both pysim and cxxsim are built around
<whitequark>
- order of eval is unimportant
<whitequark>
- order of commit is unimportant
<whitequark>
i want something that would:
<whitequark>
- trigger cxxrtl processes whether the clock is wire<> or value<>, and without doing unsound reads from .next
<whitequark>
- trigger python processes in exactly the same way as cxxrtl processes are
<whitequark>
- free python code from the need to do O(n) operations on large amounts of signals
<whitequark>
- ideally, greatly reduce the cost of eval on the inactive edge of the clock
<cesar[m]>
Regarding the last point, what if you split the eval function into eval_level_sensitive and eval_edge_sensitive?
<cesar[m]>
eval_edge_sensitive would only run if it's triggered by an edge
<cesar[m]>
if it's the wrong edge, it would not be run, and neither would eval_level_sensitive
<cesar[m]>
if eval_edge_sensitive does run, eval_level_sensitive would be run afterwards.
<cesar[m]>
eval_level_sensitive would not have any clocks in its sensitivity list.
<whitequark>
yes, that's exactly what i'm thinking about here
<whitequark>
the devil is in the details, really; i'd like to preserve the beautifully simple `top.step()` interface, yet also enable this
<whitequark>
well, it's not really "edge sensitive" and "level sensitive"
<whitequark>
both comb and sync processes are edge sensitive, because comb processes are iterated to fixpoint
<whitequark>
unlike in verilog, a process that reads its own output will self-trigger
<whitequark>
so, perhaps something like eval_comb() and eval_sync(), where eval_sync() would perhaps further delegate its job to eval_posedge_p_clk_negedge_p_rst() or something like that
<whitequark>
and then eval() would just be eval_comb();eval_sync()
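The split being discussed could look roughly like this in Python (every name here is hypothetical, not cxxrtl's actual API). eval_comb() iterates the comb processes to a fixpoint, so a process reading its own output self-triggers until it settles; eval_sync() delegates to per-edge functions; eval() keeps the one-call interface by composing the two.

```python
# Hedged sketch of an eval_comb()/eval_sync() split (all names made up).
def make_top():
    state = {"clk_prev": 0, "clk": 0, "d": 0, "q": 0, "out": 0}

    def eval_comb():
        # iterate comb logic to fixpoint: rerun while any output changes
        changed = True
        while changed:
            new_out = state["q"] ^ 1          # comb logic: out = ~q
            changed = new_out != state["out"]
            state["out"] = new_out

    def eval_sync():
        # could further delegate to e.g. eval_posedge_clk()
        if state["clk_prev"] == 0 and state["clk"] == 1:
            state["q"] = state["d"]           # posedge-triggered register
        state["clk_prev"] = state["clk"]

    def eval():
        eval_comb()
        eval_sync()
        eval_comb()   # let comb logic settle after the sync update; in the
                      # real thing the outer step() loop would handle this

    return state, eval

top, top_eval = make_top()
top["d"] = 1
top["clk"] = 1
top_eval()
assert top["q"] == 1 and top["out"] == 0   # register captured d, comb re-settled
```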
<_whitenotifier>
[nmigen/nmigen] whitequark pushed 2 commits to cxxsim [+0/-0/±2] https://git.io/JIiLi
<_whitenotifier>
[nmigen/nmigen] whitequark 0807149 - sim.cxxsim: dump simulation-only signals to VCD, when possible.
<_whitenotifier>
[nmigen] whitequark commented on issue #556: cxxsim: simulator-only signals not included in VCD and GTKWave files - https://git.io/JIiLy
<_whitenotifier>
[nmigen] whitequark closed issue #556: cxxsim: simulator-only signals not included in VCD and GTKWave files - https://git.io/JIRpP
<_whitenotifier>
[nmigen] whitequark edited a comment on issue #556: cxxsim: simulator-only signals not included in VCD and GTKWave files - https://git.io/JIiLy
<_whitenotifier>
[nmigen] nfbraun commented on issue #558: write_cfgmem is writing to arty board when not specified to in the code - https://git.io/JIipq
<cesar[m]>
whitequark: I've identified a couple of tests where CxxSim and PySim still disagree. I'll investigate.
<cesar[m]>
Otherwise, CxxSim is well reproducing PySim results.
<whitequark>
cesar[m]: excellent
<whitequark>
looking forward to your testcases
<whitequark>
i was initially planning to do randomized testing, but that turned out to be more difficult than i anticipated, and on top of it there is still significant missing functionality
<whitequark>
testing it on a large real-world project is the next best thing
<whitequark>
cesar[m]: how's the performance so far?
<_whitenotifier>
[nmigen] whitequark commented on issue #558: write_cfgmem is writing to arty board when not specified to in the code - https://git.io/JIPkI
<cesar[m]>
Sorry, any potential gain in run time is being completely offset by C++ compilation time.
<whitequark>
cesar[m]: are you able to reuse a simulator by repeatedly resetting it?
<cesar[m]>
Sure.
<whitequark>
still compiles for too long?
<cesar[m]>
What I mean is, it sure would be faster if we repeatedly reset the simulator instead of recreating it as we do today.
<whitequark>
ah.
<whitequark>
yes. if you recreate it then the unfortunate result you are observing is expected
<whitequark>
in principle it would not be too hard to add caching, but the knob you would have to use for that is not currently exposed.
<cesar[m]>
Also, I guess we could leverage the greater performance by increasing the number of iterations and test vectors.
<whitequark>
the main reason I'm asking is that ctypes can have an... unfortunate effect on performance
<whitequark>
and the exact steps I will have to take to tackle that should, I think, mostly reflect real-world use
<cesar[m]>
Maybe there could be a way to measure the simulation run time, excluding compile time, in both CxxSim and PySim?
<cesar[m]>
It could be done in a Python process, I guess.
<whitequark>
that could be done, but I'm worried about Amdahl's law
<whitequark>
what you ultimately care about is how long the tests run, not how quickly CXXRTL itself runs
<cesar[m]>
Indeed.
<whitequark>
CXXRTL is fast enough that it's competitive with single threaded Verilator, which in practice means that even relatively small inefficiencies in Python have a dramatically higher effect on runtime
<whitequark>
at one point, I measured a prototype of CXXSim running *slower* than PySim, even though the same design ran in pure C++ at over 1 MCPS
<whitequark>
I know there's multiple places in the current version of CXXSim where I have great opportunities for optimizing the interface, but I'd like to actually do that with a benchmark in hand and not blindly
<cesar[m]>
Unfortunately, I cannot try CXXSim on the full design at the moment, because it uses Litex peripherals, so it must be simulated in Litex / Verilator, not in nMigen.
<whitequark>
ah I see
<_whitenotifier>
[nmigen] davidlattimore commented on issue #558: write_cfgmem is writing to arty board when not specified to in the code - https://git.io/JIPY9
<cesar[m]>
So, I do the next best thing, which is to simulate (in nMigen) the big integration test.
<lkcl>
whitequark: some background there - nmigen-soc is still in its infancy so we had to use something that's well-established
<whitequark>
yep, that's perfectly sensible
<lkcl>
also, the initialisation of even 64k of "memory" (loading a BIOS that would allow further extensive testing) into pysim is... awful.
<whitequark>
pysim memories are about to get a whole lot faster
<lkcl>
microwatt's "random instructions" unit tests are around... 128k in size?
<lkcl>
ah brilliant. that would be superb
<whitequark>
I'm kind of forced to optimize them because otherwise I could not integrate cxxrtl properly
<whitequark>
because of some tricky internal interface issues
<lkcl>
:)
<whitequark>
the reason memories are so painful right now is they are O(n)
<lkcl>
ouch.
<whitequark>
which as you have just discovered turns into O(n^2) the moment you load the entire thing
<whitequark>
(before someone submits this to the "accidentally quadratic" tumblr blog: I did them this way on purpose, it was the right decision at the time)
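A toy illustration of that O(n) to O(n^2) blowup (my own model, not pysim's actual generated code): if the memory is compiled so that every write evaluates a mux per row, then a single write is O(n) work, and initializing all n rows costs O(n^2).

```python
# Hedged illustration: per-write O(n) row scan makes a full load O(n^2).
def write_scan(mem, addr, data):
    ops = 0
    for row in range(len(mem)):   # every row's mux is evaluated on each write
        ops += 1
        if row == addr:
            mem[row] = data
    return ops

def load_all(n):
    # load every row, as when initializing a memory from a BIOS image
    mem = [0] * n
    return sum(write_scan(mem, a, a) for a in range(n))

assert load_all(10) == 100        # n writes * n rows = n^2 mux evaluations
assert load_all(100) == 10_000
```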
<lkcl>
i had to write directly to the _init internal data structure, btw, to load in BIOSes in a reasonable time. didn't tell you about it so as not to freak you out :)
<whitequark>
wait, it was Memory.init setter that was this slow?
<whitequark>
interesting
<whitequark>
*that* is not supposed to be O(n)
<whitequark>
just the pysim generated code
<lkcl>
i bypassed it - long story.
<whitequark>
yes, it's reasonable that you did
<whitequark>
I'm just surprised that it's so slow
<lkcl>
also... *sigh*... OpenPOWER ISA has big-endian / little-endian, plus the data was in 32-bit format and needed to be in a 64-bit Memory... *sigh*
<lkcl>
if that can be sped up
<whitequark>
i'm fairly certain that can be sped up
<lkcl>
and there's a UART in nmigen that is even remotely "semi-compatible" with 16550 (even for "read"), then we can try - pretty much straight away - to run e.g. the microwatt "helloworld" example
<whitequark>
the whole "Memory is secretly an Array" thing is going to be in the past
<lkcl>
it's only 1500 bytes
<lkcl>
hooray :)
<whitequark>
yet another decision I unthinkingly pulled from Migen where I should have really known better
<whitequark>
ah well
<lkcl>
:)
<whitequark>
the whole fragment transformer thing makes compilation probably an order of magnitude slower than it should be, too
<whitequark>
not to mention more buggy
<lkcl>
cesar[m]: thank you for doing the extensive report on the libresoc unit tests. you saw i went through them?
<lkcl>
fragment transformer? before handing to cxxsim?
<_whitenotifier>
[nmigen] nfbraun commented on issue #558: write_cfgmem is writing to arty board when not specified to in the code - https://git.io/JIP30
<lkcl>
you mean the node-walker? (i took a look a couple days ago at xfrm.py)
<whitequark>
DomainRenamer, DomainLowerer, etc work by term rewriting essentially
<whitequark>
it's something i kept like it was in Migen because it looked reasonable at first glance
<whitequark>
but if you think about it, you'll notice that virtually no other compiler uses that approach
<lkcl>
hmm
<whitequark>
well, it's because it is not only massively wasteful, but also hard to reason about
<lkcl>
luckily they do actually "work", putting them off from being high-priority
<whitequark>
to be perhaps more fair to Migen, Migen's version was a lot less wasteful, but on the other hand it was even harder to use correctly because of all the mutation going on
<whitequark>
really, none of this stuff should exist in the first place
<lkcl>
i know litex is a fantastic accumulation of incredibly valuable expertise and recipes
<lkcl>
but... gaah, it's just impossible to consider committing any resources to it because migen gives zero warnings.
<lkcl>
recently we got as far as *P&R* in coriolis2 before discovering an error! that's 4 hours compilation time!
<whitequark>
ouch!
<lkcl>
it's to do with netlists that should have been amalgamated. the assignment (an input connected to an input) is detected by coriolis2 and converted to an *output*
<whitequark>
sounds kinda cursed
<lkcl>
you're supposed to "fix" this in verilog by having a register that's assigned to both inputs
<_whitenotifier>
[nmigen] whitequark commented on issue #558: write_cfgmem is writing to arty board when not specified to in the code - https://git.io/JIP31
<_whitenotifier>
[nmigen/nmigen] whitequark pushed 1 commit to master [+0/-0/±1] https://git.io/JIP3M