ChanServ changed the topic of #nmigen to: nMigen hardware description language · code at https://github.com/nmigen · logs at https://freenode.irclog.whitequark.org/nmigen · IRC meetings each Monday at 1800 UTC · next meeting September 7th
peeps[zen] has joined #nmigen
cr1901_modern has quit [Ping timeout: 240 seconds]
peeps has quit [Ping timeout: 240 seconds]
cr1901_modern has joined #nmigen
electronic_eel has quit [Ping timeout: 240 seconds]
electronic_eel has joined #nmigen
<_whitenotifier-3> [YoWASP/yosys] whitequark pushed 1 commit to develop [+0/-0/±1] https://git.io/JUYOG
<_whitenotifier-3> [YoWASP/yosys] whitequark ae00ea8 - Update dependencies.
awe00 has quit [Ping timeout: 256 seconds]
lkcl_ has quit [Ping timeout: 240 seconds]
lkcl has joined #nmigen
C-Elegans has joined #nmigen
Yehowshua has joined #nmigen
<C-Elegans> @whitequark I ran into an issue with cxxsim earlier today, would you mind taking a look at it?
<C-Elegans>
<C-Elegans> Basically, during simulation with cxxsim, yielding values from a 64 bit signal gives me a value with the 32 bit halves reversed
<whitequark> C-Elegans: oops.
<whitequark> i've never tested that particular code path
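A minimal sketch of how the reported symptom can arise, assuming (this is not stated in the log) that a wide value is stored as an array of 32-bit chunks with the least significant chunk first. The helper names are hypothetical and do not come from cxxsim.

```python
# Assumption: wide values are stored as 32-bit chunks, least significant first.
CHUNK_BITS = 32
CHUNK_MASK = (1 << CHUNK_BITS) - 1

def to_chunks(value, width):
    """Split an integer into 32-bit chunks, least significant chunk first."""
    count = (width + CHUNK_BITS - 1) // CHUNK_BITS
    return [(value >> (CHUNK_BITS * i)) & CHUNK_MASK for i in range(count)]

def from_chunks(chunks):
    """Reassemble chunks that are stored least significant first."""
    value = 0
    for index, chunk in enumerate(chunks):
        value |= chunk << (CHUNK_BITS * index)
    return value

def from_chunks_reversed(chunks):
    """Buggy variant: treats the first chunk as the most significant one."""
    return from_chunks(list(reversed(chunks)))

original = 0x1122334455667788
chunks = to_chunks(original, 64)
```

Reading the chunks back in the wrong order leaves values of up to 32 bits untouched, which would explain why only wider signals show the problem.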
<C-Elegans> thanks! Do you need me to file a github issue or...?
<whitequark> let me just fix it
<C-Elegans> ok. Thanks!
Yehowshua has quit [Ping timeout: 245 seconds]
jaseg has quit [Ping timeout: 260 seconds]
jaseg has joined #nmigen
<_whitenotifier-3> [nmigen/nmigen] whitequark pushed 1 commit to cxxsim [+3/-0/±4] https://git.io/JUYZx
<_whitenotifier-3> [nmigen/nmigen] whitequark c08c30c - [WIP] sim: add cxxsim engine.
<_whitenotifier-3> [nmigen/nmigen] whitequark pushed 1 commit to cxxsim [+3/-0/±4] https://git.io/JUYZj
<_whitenotifier-3> [nmigen/nmigen] whitequark 9cbdff0 - [WIP] sim: add cxxsim engine.
<whitequark> C-Elegans: should be fixed
<whitequark> awygle: poke
<awygle> whitequark: peek
<whitequark> awygle: up to discuss cxxsim stuff?
<awygle> yup
<whitequark> great! hm. how much do you know about the way pysim works?
<awygle> not a lot lol
<whitequark> alright, let me explain all of it, then
<whitequark> hm. how much do you know about the way verilog simulation semantics works?
<awygle> i read your post on why VHDL's are better
<whitequark> er, my post?
<awygle> oh maybe you just linked it actually
<whitequark> yeah
<whitequark> so to recap: in verilog, you have processes executing in parallel. all these processes work on the same global state, so the order of execution of processes, in general, significantly affects the result. to work around that, nonblocking assignments are used to split the eval and commit phases and bring back determinism
<awygle> mhm
<C-Elegans> @whitequark sorry, went to do something else. Can confirm it works now!
<C-Elegans> Thanks!
<whitequark> in pysim, you have processes executing in parallel. all these processes work on *two* copies of the global state. they read the "curr" instance and write to the "next" instance.
<whitequark> since a process never* reads from "next", it doesn't matter which order you execute them in at all
<awygle> ok
<whitequark> * it reads from "next" during read-modify-writes, which is why it's important that RMWs must completely overwrite the bits they touch and don't modify the others at all
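The difference between one shared state and the curr/next pair can be sketched in a few lines. This is an illustration only: the two processes and the dictionary-based state are hypothetical, not pysim internals.

```python
def proc_a(read, write):
    """a takes the value of b."""
    write["a"] = read["b"]

def proc_b(read, write):
    """b takes the value of a."""
    write["b"] = read["a"]

def run_shared(order):
    """Single global state: each process sees the writes of earlier ones."""
    state = {"a": 1, "b": 2}
    for process in order:
        process(state, state)
    return state

def run_two_copy(order):
    """Processes read curr and write next; commit happens after all ran."""
    curr = {"a": 1, "b": 2}
    nxt = dict(curr)
    for process in order:
        process(curr, nxt)
    return nxt  # commit: next becomes the new curr
```

With the shared state the result depends on which process runs first; with two copies both orders swap the values.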
<whitequark> in cxxrtl, you have somewhat of a hybrid approach. at -O0 it works exactly the same as pysim, and is also very slow. this is because each time you change the input to a combinatorial function, you have to finish evaluating the current process, let the change trigger reevaluation of the comb function, then let the change of the output trigger more reevaluations
<whitequark> there's a pair of subtle tradeoffs in the design of this sort of simulator
<whitequark> - do you detect changes in inputs and reschedule a process reading those inputs when that happens, or do you just repeatedly run the process until the outputs stop changing?
<whitequark> - do you use many fine-grained processes (scheduling overhead, sequencing issues) or few coarse-grained processes (evaluation overhead: most of what you evaluate will be values that haven't changed)?
<whitequark> pyrtl and cxxrtl take exactly opposite approaches here
<whitequark> pyrtl generates many processes per design (one per clock domain per fragment, to be specific) and schedules them when inputs change, collecting the set of updated signals during commit
PyroPeter_ has joined #nmigen
<whitequark> cxxrtl generates one process (it's not even explicitly called that, it's just p_top.step()) which always reevaluates everything
<awygle> ok
<awygle> why the difference?
PyroPeter has quit [Ping timeout: 240 seconds]
PyroPeter_ is now known as PyroPeter
<whitequark> pyrtl tries to have low startup latency. this means it does a fairly straightforward translation of the nmigen IR. it can barely afford to translate it to python code
<whitequark> I experimented with different approaches. smaller processes make it slower, larger processes make it slower, not tracking sets of updated signals makes it slower
<whitequark> the current design seems to be in a local optimum
<whitequark> cxxrtl, on the other hand, doesn't care about startup latency at all as long as it's not literally too long to wait for completion
<whitequark> which means that it can afford to flatten and statically schedule the entire netlist
<whitequark> which means that it doesn't need to track updated signal sets: most of your design is supposed to not have feedback arcs, and for the few feedback arcs it does encounter, it's basically fine to iterate a few times
<whitequark> still, since having feedback arcs slows down your design by at least 2x, it goes to great lengths to eliminate those. for example, blackboxes exist in large part to let you do comb feedback in behavioral code and still have single-pass statically scheduled evaluation
<whitequark> anyway, so far so good
<whitequark> now here's where things get really screwy.
<whitequark> clocks.
<whitequark> so far we had:
<whitequark> - synchronous processes, which are inert unless scheduled by a clock. so, mostly easy.
<whitequark> - combinatorial processes, which are scheduled by every change in inputs, and cannot (by decree) have feedback loops in them, in nmigen at least. so, easy, they're just pure.
<whitequark> the problem with clocks (and async resets, which are the same thing for the purposes of this discussion) is that they're the only true async part of a simulator for synchronous logic
<whitequark> so suddenly you have to start caring about things like "which delta cycle do I evaluate this clock-related thing?"
<awygle> mhm
<whitequark> this causes both potential and immediate issues
<whitequark> the potential issue is that each clk1.eq(clk2) statement, in pysim (and vhdl), introduces a delta cycle. which means that logic triggered by clk2 will get evaluated infinitesimally after logic triggered by clk1
<whitequark> in cxxrtl, what happens is... well, it differs based on -O level. which is kind of evil.
<whitequark> the immediate issue is that right now, pysim and cxxsim can't agree on exactly when (and how) clock ticks happen, with cxxsim being completely wrong in practice
Degi has quit [Ping timeout: 258 seconds]
<whitequark> let's back away a bit. how *exactly* do pysim and cxxsim trigger synchronous logic?
Degi has joined #nmigen
<whitequark> pysim does something like this:
<whitequark> def eval(): for process in processes: if process.runnable: process.run()
<whitequark> def commit(): for signal in signals: if signal.commit(): for waiter in signal.waiters: waiter.runnable = True
<whitequark> so in pysim, if you're driving a clock via add_clock(), then in 1st delta cycle just the clock changes, in 2nd delta cycle all sync logic triggers, in 3rd and further delta cycles comb logic settles on a final value
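The eval/commit pair above can be expanded into a small runnable model that records which process runs in which delta cycle. The Signal and Process classes, the settle() helper and the three example processes are hypothetical stand-ins, not the real pysim classes.

```python
class Signal:
    def __init__(self, name, value=0):
        self.name, self.curr, self.next = name, value, value
        self.waiters = []

    def commit(self):
        """Make next visible as curr; report whether the value changed."""
        changed = self.curr != self.next
        self.curr = self.next
        return changed

class Process:
    def __init__(self, name, body, runnable=False):
        self.name, self.body, self.runnable = name, body, runnable

    def run(self):
        self.runnable = False
        self.body()

def settle(processes, signals):
    """Run delta cycles until nothing is runnable; return who ran when."""
    trace = []
    while any(p.runnable for p in processes):
        ran = []
        for process in processes:          # eval phase
            if process.runnable:
                ran.append(process.name)
                process.run()
        for signal in signals:             # commit phase
            if signal.commit():
                for waiter in signal.waiters:
                    waiter.runnable = True
        trace.append(ran)
    return trace

clk, d, q, y = Signal("clk"), Signal("d", 5), Signal("q"), Signal("y", 1)

def clock_body():
    clk.next = 1                           # rising edge

def sync_body():
    if clk.curr == 1:
        q.next = d.curr                    # q <= d on the edge

def comb_body():
    y.next = q.curr + 1                    # y = q + 1

clock = Process("clock", clock_body, runnable=True)
sync = Process("sync", sync_body)
comb = Process("comb", comb_body)
clk.waiters.append(sync)
q.waiters.append(comb)

trace = settle([comb, sync, clock], [clk, d, q, y])
```

The trace comes out as one list per delta cycle: the clock alone, then the synchronous logic, then the combinatorial logic, regardless of the order in which the processes are listed.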
<awygle> mk
<whitequark> cxxsim does something completely different
<whitequark> value<1> p_usb__clk, prev_p_usb__clk;
<whitequark> bool posedge_p_usb__clk() const { return !prev_p_usb__clk.slice<0>().val() && p_usb__clk.slice<0>().val(); }
<whitequark> then in eval() { ... if(posedge_p_usb__clk()) { /* run synchronous logic in this domain */ } }
<whitequark> why does it compare prev and curr values in eval, instead of comparing curr and next values in commit? well, it used to do the latter.
<whitequark> that is, until I tried to make it run on par with Verilator
<whitequark> there are two differences here
<whitequark> first, p_usb__clk is a value<1>, not wire<1>. why? well, if inputs were wire<X> rather than value<X>, then you'd have a spurious delta cycle each time you modify the inputs where nothing changes except for inputs assuming their next value as their curr value
<whitequark> second, the posedge condition on p_usb__clk is checked in eval, not commit. it's basically caused by the same problem: if you check it in commit, then your main loop has to look something like: <modify inputs>; commit(); eval(); commit();, even if your netlist can be statically scheduled and immediately converges
C-Elegans has quit [Ping timeout: 244 seconds]
<awygle> for clarity, how much of this is in cxxrtl and how much in cxxsim?
<whitequark> so far everything I've described lives in cxxrtl
<whitequark> actually, let me explain what cxxsim does exactly, since it's relevant
<whitequark> there's a generic simulator core that defines a few interfaces in sim._base
<whitequark> among them is BaseSignalState. that's a piece of state the simulator keeps per-signal, and it must provide .curr, .next, and .set()
<whitequark> for pysim, .curr and .next are just bigints, and .set() pokes some logic that keeps track of updated signal sets
<whitequark> for cxxsim, this entire structure is a wrapper for cxxrtl_object. .curr and .next operate on the respective fields (through some painfully expensive ctypes calls...), and .set() simply updates .next.
<whitequark> so, suppose you have a clock, and you .set() its state from 0 to 1. what happens?
<whitequark> in pysim: all of the other processes finish eval()ing, then commit() wakes up every waiter, then on the next delta cycle they eval() the synchronous logic they have
<whitequark> in cxxsim things are way more complicated! first, python code doesn't even keep python-side objects for every signal in the simulation, because that'd take forever to just pull out, nor can it generally trigger anything on commit() because... well, it doesn't control commit(), cxxrtl does
<whitequark> second, clocks are inputs, so they're all value<1>, not wire<1>, so their curr is the same as their next.
<whitequark> third, in pysim, the simulator knows about every single signal, and they're all the same
_whitenotifier-3 has quit [Ping timeout: 260 seconds]
<whitequark> in cxxsim, not all signals are the same: you have wire<>s in the compiled code, which are commit()ted by the compiled code, you have inputs in the compiled code, which are value<>s and can't be commit()ted, and you also have signals that exist purely between Python processes
<whitequark> ... but those Python-only values still have to be available on the C++ side because the VCD writer lives there
<whitequark> so they're still backed by an artificial cxxrtl_object
<whitequark> we're almost at the point where I can explain the actual problem I'm trying to solve!
<whitequark> suppose there are no Python-only signals (those are not implemented at the moment). then the cxxsim simulation loop looks like this:
<whitequark> def eval(): for process in processes: if process.runnable: process.run(); cxxrtl_top.eval()
<whitequark> def commit(): for waiter, signal, trigger in waiters: if prev[signal] != signal.curr and signal.curr == trigger: waiter.runnable = True; cxxrtl_top.commit()
<whitequark> let's say there's some synchronous logic in the generated code, plus a testbench that does `yield Tick()`
<whitequark> 1st delta cycle: the process created by add_clock() updates the clock. immediately after, cxxrtl_top.eval() evaluates the synchronous logic. then, commit() wakes up the testbench
<whitequark> 2nd delta cycle: testbench runs, reads a signal, discovers a value after the posedge instead of before the posedge, everything breaks.
<whitequark> as a bonus, it breaks in a similar albeit different way if you use -O3 instead of -Og and the clock input becomes a wire<1> instead of value<1>!
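The two delta cycles described above can be written out as a straight-line trace and compared with the pysim ordering. This is a deliberately simplified model with a single register q <= d and one testbench read; the function names are hypothetical and none of this is actual cxxsim code.

```python
def pysim_style(d):
    """The clock change is committed first. Sync logic and the testbench
    both wake on the next delta cycle and both read curr."""
    q_curr = q_next = 0
    # delta 1: only the clock changes; commit wakes sync logic and testbench
    # delta 2, eval: both run against the same curr state
    q_next = d                  # sync logic computes the new q
    seen = q_curr               # testbench samples q
    # delta 2, commit
    q_curr = q_next
    return seen, q_curr

def cxxsim_style(d):
    """The clock is a plain value, so the compiled eval() sees the edge in
    the same delta cycle; the testbench is only woken afterwards."""
    q_curr = q_next = 0
    prev_clk, clk = 0, 0
    # delta 1, eval: the clock process sets clk, then compiled eval() runs
    clk = 1
    if not prev_clk and clk:    # posedge check performed inside eval
        q_next = d
    prev_clk = clk
    # delta 1, commit: q updates and the testbench is woken
    q_curr = q_next
    # delta 2, eval: testbench samples q
    seen = q_curr
    return seen, q_curr
```

Both variants end with the same register contents, but the testbench samples the pre-edge value in the first and the post-edge value in the second.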
<awygle> oof
<whitequark> so far my effort to fix this has been limited to fiddling with the order of evaluation and triggering
<whitequark> but... as the description above should probably communicate, that can't fundamentally fix all issues. it happens to make #455 work but it can't make every example from #439 work simultaneously
<awygle> yep
<whitequark> my first semi-viable approach was to move the trigger check code out of commit() and into eval(); basically, by checking all triggers once eval() is done iterating through the processes, then, if anything changed, doing that again
<whitequark> which is (a) slow--which I tried to address by moving it to C++, and (b) as my description should hopefully make clear, a complete duplicate of eval/commit logic, nested inside eval, and also worse
Yehowshua has joined #nmigen
<whitequark> so overall i'm trying to achieve two things
<whitequark> i want cxxsim to work. i also want cxxsim to have a relatively small (ideally, zero) performance penalty compared to using cxxrtl manually.
<awygle> mhm
<whitequark> the former requires a redesign of cxxsim for reasons i just explained. the latter almost certainly requires a redesign of cxxrtl because right now cxxrtl is built around the assumption that, essentially, it is cosimulated with exactly one bench process (your loop in main()), and this makes single-pass evaluation actually possible
<awygle> and, pretending for the moment that you wouldn't have to rewrite cxxrtl to make this happen, the queued approach pysim uses is unacceptable performance-wise?
<whitequark> ... i mean it's not really possible to say without having a particular implementation in mind
<whitequark> but
<whitequark> cxxsim simulations, by virtue of being made largely from a big chunk of cxxrtl logic + a single testbench, inherently have few waiters (usually exactly one), so you really want to pay the cost of O(waiters) rather than O(signals)
Yehowshua has quit [Remote host closed the connection]
<whitequark> it's worse than just performance though
<whitequark> if we keep inputs as value<1> and not wire<1> then the simulation just becomes unsound in general
<whitequark> like, if you do clk1.eq(clk2) and translate it to C++ code you get different results than if you translate it to a Python process
<whitequark> what i think is that cxxsim should, ideally, take the existing pysim simulation loop (which is totally fine on its own), and push it down into C++ as much as possible
<awygle> sounds reasonable
<whitequark> in fact that was the insight i got, uh, two days ago
<whitequark> when i originally suggested we discuss C++ stuff i had a completely different, far more trivial set of questions :)
<whitequark> well, the problem is: how to actually do that?
Yehowshua has joined #nmigen
<whitequark> even worse: how to actually do that without completely breaking the current cxxsim interface, which people already depend on?
<whitequark> (sure, i could add a flag, but now it's even worse: there are two interfaces which do basically the same thing but in weirdly different ways)
<whitequark> oh and here's a bonus question: can we do that while making cxxrtl behave deterministically when you assign or gate clocks?
<whitequark> *without completely breaking the current cxxrtl interface, sorry
<awygle> ah yeah ok
<awygle> that's what i was thinking
<awygle> it _sounds_ to me, from this conversation and pretty much only this conversation, that cxxrtl is sort of... making assumptions about execution models
<awygle> and that ideally it would not do that
<whitequark> correct
<awygle> but that un-doing that would basically be a cxxrtl rewrite
<whitequark> well, it can't not make assumptions about execution models at all without getting a lot slower
<whitequark> not quite
<whitequark> the vast majority of cxxrtl deals with arbitrary precision arithmetic
<whitequark> the only thing that really makes any assumptions about execution models is eval and commit
<awygle> mm, k
<awygle> > it can't not make assumptions about execution models at all without getting a lot slower
<awygle> why is that, and how much is "a lot"?
<whitequark> let's see
<whitequark> so if it always emits inputs as wire<>s, you need, at least, one more commit() at the front (after your testbench changes the values)
<whitequark> but since that commit() triggers another eval(), it can no longer assume that a single pass is all it takes, so after that it does another commit() and eval()
<whitequark> which means you have a perf decrease of over 2x
<whitequark> basically, you can compare -O3 with -Og
<whitequark> lemme get some numbers in fact
Yehowshua has quit [Ping timeout: 260 seconds]
<whitequark> ... okay, so -O3 vs -Og is a slowdown of 20x
<awygle> wow that's a lot
<whitequark> that's because -O3 isn't actually equivalent to what I mentioned earlier
<whitequark> give me a sec
<awygle> it's getting towards bedtime here, so let me just kinda say where i'm at at you, and who knows if there's anything valuable in it, and then i can drop out when i need to sleep
<awygle> it feels like cxxrtl is trying to do too much at once, and there might be a cut line somewhere where you can break out the "turn verilog into c++" from the "execute a c++ simulation" part, and then you can choose an execution engine appropriate to what you're trying to do (which in cxxsim's case would be "the python loop but in c++" as discussed earlier)
<awygle> i don't know how real that perception is
<whitequark> well, yeah
<whitequark> that's what the solution is
<whitequark> the problem is actually finding that cut line
<awygle> it also sounds like probably that would break existing cxxrtl consumers
<whitequark> i think it might be possible to provide a shim or a default stub-y engine
<whitequark> but that requires first figuring out how to cut cxxrtl
<awygle> but it feels like "get it working, then solve that problem" is the way to go
<awygle> yes
<awygle> the thing you said that feels like "one step too far" to me is this:
<awygle> > [cxxrtl] flatten[s] and statically schedule[s] the entire netlist
<whitequark> wait. that's the entire point of cxxrtl
<whitequark> that's why it lives in yosys and i go through all the trouble of packaging that as wasm etc
<whitequark> instead of just... generating c++ from nmigen directly
<awygle> maybe i don't understand what you mean by "statically schedule" then
<whitequark> if you don't have feedback arcs, cxxrtl evaluates every cell exactly once
<awygle> exactly once per event, you mean
<whitequark> yeah
<whitequark> to do that, it needs to evaluate them in the dataflow graph preorder
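A sketch of what static scheduling buys, assuming the order in question amounts to a dependency (topological) order of the dataflow graph. The three-cell netlist and the helper names are hypothetical; the real backend operates on a Yosys netlist and emits C++.

```python
# Hypothetical netlist: cell output name -> (function, input names).
# Deliberately listed out of dependency order.
cells = {
    "out": (lambda s, d: s ^ d, ("sum", "dbl")),
    "dbl": (lambda s: s * 2, ("sum",)),
    "sum": (lambda a, b: a + b, ("a", "b")),
}

def schedule(cells):
    """Order cells so that each comes after the cells driving its inputs."""
    order, done = [], set()

    def visit(name, stack=()):
        if name in done or name not in cells:
            return                      # already scheduled, or a top-level input
        if name in stack:
            raise ValueError("feedback arc through " + name)
        for dep in cells[name][1]:
            visit(dep, stack + (name,))
        done.add(name)
        order.append(name)

    for name in cells:
        visit(name)
    return order

def evaluate(cells, order, inputs):
    """Single pass over the schedule: every cell is computed exactly once."""
    values = dict(inputs)
    evaluations = 0
    for name in order:
        func, args = cells[name]
        values[name] = func(*(values[arg] for arg in args))
        evaluations += 1
    return values, evaluations

order = schedule(cells)
values, evaluations = evaluate(cells, order, {"a": 3, "b": 4})
```

Because the order is fixed ahead of time, no change tracking is needed at run time; a feedback arc makes such an order impossible, which is where iteration comes in.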
<awygle> why is cxxrtl evaluating cells, might be where my confusion lies
<whitequark> i, uh
<whitequark> what else would it evaluate?
<awygle> my mental model has cxxrtl creating cells
<whitequark> creating cells?
<whitequark> i have no idea what that would even mean
<awygle> generating c++ descriptions of cells, which are driven by some kind of execution engine
<whitequark> well
<whitequark> right, when i say "cxxrtl evaluates" i mean "cxxrtl generates an eval() function that..."
<whitequark> are you suggesting that cxxrtl generate, say, one process per cell or something like that?
<whitequark> awygle: fwiw, I measured the impact of turning clocks into wire<>s. ~210% runtime compared to baseline, which is basically what i'd expect
hitomi2507 has joined #nmigen
<whitequark> awygle: mh. let's continue once you wake up
<whitequark> this all was just the preface for the actually interesting discussion on how to cut cxxrtl
<awygle> yup
<awygle> gnight
<whitequark> night!
<awygle> and > "are you suggesting that cxxrtl generate, say, one process per cell or something like that?" i think so yeah, kinda
<awygle> anyway, tomorrow
jeanthom has joined #nmigen
ianloic__ has joined #nmigen
_florent__ has joined #nmigen
ianloic_ has quit [Ping timeout: 272 seconds]
ianloic__ is now known as ianloic_
_florent_ has quit [Ping timeout: 260 seconds]
_florent__ is now known as _florent_
sorear_ has joined #nmigen
sorear has quit [Ping timeout: 272 seconds]
sorear_ is now known as sorear
sorear has quit [Ping timeout: 244 seconds]
sorear has joined #nmigen
jeanthom has quit [Ping timeout: 265 seconds]
jeanthom has joined #nmigen
_whitelogger has joined #nmigen
chipmuenk has joined #nmigen
futarisIRCcloud has joined #nmigen
jeanthom has quit [Ping timeout: 246 seconds]
SpaceCoaster_ has joined #nmigen
SpaceCoaster has quit [Read error: Connection reset by peer]
Asu has joined #nmigen
lkcl_ has joined #nmigen
awe00 has joined #nmigen
lkcl has quit [Ping timeout: 265 seconds]
chipmuenk1 has joined #nmigen
chipmuenk has quit [Ping timeout: 264 seconds]
chipmuenk1 is now known as chipmuenk
awe00_ has joined #nmigen
awe00__ has joined #nmigen
awe00 has quit [Ping timeout: 246 seconds]
awe00_ has quit [Ping timeout: 264 seconds]
<lkcl_> whitequark: that sounds like a lot more work. NLNet donations can go up to EUR 3000 if you need to do more than you originally thought. https://bugs.libre-soc.org/show_bug.cgi?id=475
Asu has quit [Remote host closed the connection]
yehowshua has joined #nmigen
Asu has joined #nmigen
Asuu has joined #nmigen
Asu has quit [Ping timeout: 256 seconds]
ademski has joined #nmigen
<yehowshua> whitequark: I was thinking about what mithro said yesterday about making emitted verilog look like the original nmigen
<yehowshua> This is mostly a thought experiment
<yehowshua> But I don't see any reason why there couldn't be a crude lowerer that enumerates all signals in an elaboratable class
<yehowshua> and then connects them directly, or inserts latches between them
<yehowshua> with verilog syntax of course
<yehowshua> This of course is not the point of nmigen at all, but i remembered you said making emitted verilog more human readable would be very hard
<yehowshua> with the approach outlined above, is that still the case?
<yehowshua> Actually, I'm not so sure that would be significantly more human readable than current yosys emitted verilog
<yehowshua> synchronous/combinational are natural to nmigen, but require curr and next signal in verilog
<whitequark> lkcl_: i've yet to decide on a specific approach here. it seems increasingly likely that i can implement something relatively simple to get correctness + acceptable performance, then iterate on it
<whitequark> yehowshua: i don't understand, sorry. how would that work?
<whitequark> are you basically suggesting that i lower `x.eq(a + b + c)` into, well, similar Verilog, as opposed to something like `x.eq(a + tmp0); tmp0.eq(b + c)`?
<whitequark> or something else?
<yehowshua> yes mostly
<yehowshua> yes actually
<yehowshua> for comb statements, that becomes assign
<yehowshua> for sync statements, you need a next signal
<whitequark> the short but unsatisfying answer is that i'm already doing approximately that
<DaKnig> yehowshua: actually, in many cases $next is not required actually; it just simplifies the logic.
<yehowshua> yes thats what I was getting at
<yehowshua> it wouldn't be more readable than what we currently get
<whitequark> there is room for improvement
<DaKnig> I really hope that PR for yosys would get accepted
<whitequark> a significant amount of temporary signals can be eliminated
<DaKnig> where it removes temp signals that are used only once
<yehowshua> is it really worth even bothering with emitted verilog?
<whitequark> yehowshua: in theory, the yosys PR https://github.com/YosysHQ/yosys/pull/726 removes many such temporary signals
<yehowshua> Although, I sometimes read through it as a sanity check
<whitequark> check out the before/after there
<yehowshua> quite the difference!
<whitequark> see, the problem is, that PR is unsound
<DaKnig> whitequark: is this PR only removing temp signals?
<DaKnig> sometimes I use named signals as intermediates; would be a shame to remove them - makes code harder to read
<whitequark> yehowshua: Verilog has extremely complex integer promotion rules where the result of an operation is affected, among other things, by the *context* where the operation appears
<whitequark> i.e. you cannot translate nmigen "a + b" to verilog, well, "a + b" and then translate whichever node this "a + b" appears in
<whitequark> you have to consider *where* this operation would live *in order to correctly translate it*
<DaKnig> whitequark: can you show an example you had trouble with?
<DaKnig> about a + b
<whitequark> DaKnig: https://imgur.com/a/KtCEpIL
<whitequark> this is from the actual spec
chipmuenk1 has joined #nmigen
<DaKnig> well ofc; the width is different. or am I missing the point?
chipmuenk has quit [Ping timeout: 260 seconds]
chipmuenk1 is now known as chipmuenk
<whitequark> DaKnig: https://imgur.com/a/w1yRHsE
<whitequark> so, suppose you want to translate nmigen's `(a + b) >> 1` into verilog correctly
<whitequark> you have two problems. first, verilog-2001 does not have a cast operator. so in order to have `a + b` evaluate to 17 bits of precision, you *have* to use a temporary wire
<yehowshua> does nmigen consider the carry bit?
<DaKnig> in nmigen, the width of the result is always 1+max(len(a),len(b))
<DaKnig> which is then expanded to fit lhs
<whitequark> second, you can only decide whether `a + b` should be translated through a temporary wire or not by considering the context in which it appears
<whitequark> so `x.eq(a + b)` is fine: you can translate that to `assign x = a + b;`.
<whitequark> but `x.eq((a + b) >> 1)` is not fine: you have to translate it to `wire [16:0] tmp = a + b; assign x = tmp >> 1;`
<DaKnig> oh no ; this sounds extra dumb.
<whitequark> thank you. this is my feelings exactly
<whitequark> it gets worse!
<DaKnig> so if you assign (a+b)>>1 to a signal that is as wide as a,b then the result wouldnt consider the carry
<whitequark> no
<whitequark> if you write `(a + b) >> 1` in verilog-2001 without using an intermediate wire, there is no way to make addition evaluate 17 bits instead of 16
<DaKnig> yes I meant the carry from bit 15 to bit 16 of the addition
<whitequark> oh, yeah
<DaKnig> which a sane language would expose after the >>
<whitequark> it actually doesn't matter what's the width of the signal you assign it to
<whitequark> you can assign (a+b)>>1 to a 32-bit signal
<whitequark> the result will still have 15 useful bits
<DaKnig> but then it would eval the right side as 32 bits?
<whitequark> nope!
<DaKnig> according to what you sent
<DaKnig> wot
<DaKnig> no?
<DaKnig> but max(32,16) = 32 so it'd eval rhs as 32bit ops? no?
<Lofty> whitequark: stupid idea for that case: a + b => a + b + N'b0
<DaKnig> ...disgusting solution. I love it.
<whitequark> let me double-check the standard
<whitequark> Lofty: this is actually what the spec recommends
<whitequark> except they don't tell you to use N'b0
<Lofty> Yeah, I remembered it from the spec
<whitequark> they tell you to use just 0, because 0 is "at least 32 bits wide" and a and b are both 16-bit
<whitequark> unfortunately, they don't explain *why* that works, so if you use that solution as-is with a 64+64 bit addition and expect a carry bit... oops
<whitequark> whether you get a carry bit is implementation-defined.
<whitequark> (generally, no)
<yehowshua> this is getting hairy
<Lofty> In other words, write_verilog always expands `a + b` to `a + b + N'b0`. Of course, you're still fucked when it comes to the ?: operator, but
<whitequark> DaKnig: oh yeah sorry, you are right and i was misreading the spec before
<whitequark> what you said here: < DaKnig> so if you assign (a+b)>>1 to a signal that is as wide as a,b then the result wouldnt consider the carry
<yehowshua> I heard similar arguments when it came to firrtl
<whitequark> is indeed correct
<Lofty> Well, hmm. Sometimes you could elide the + N'b0, but
<whitequark> and it will evaluate the whole thing in 32 bits if you assign to a 32-bit signal
<whitequark> this is actually somehow even dumber than what I thought before
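The corrected reading can be modelled numerically. This sketch assumes the context-determined rule as summarized in the conversation, with `(a + b) >> 1` evaluated at the larger of the operand width and the assignment target width, and compares it with an nMigen-like rule where the sum is always one bit wider than its operands. The function names are hypothetical.

```python
def mask(width):
    return (1 << width) - 1

def nmigen_like_shifted_sum(a, b, width=16):
    """a + b is width + 1 bits wide regardless of where it is used."""
    return ((a + b) & mask(width + 1)) >> 1

def verilog_like_shifted_sum(a, b, lhs_width, width=16):
    """(a + b) >> 1 is evaluated at max(operand width, target width),
    then truncated to the target."""
    context = max(width, lhs_width)
    return (((a + b) & mask(context)) >> 1) & mask(lhs_width)

a, b = 0xFFFF, 0x0001   # the sum carries out of bit 15
```

With these inputs the nMigen-like rule yields 0x8000, the Verilog-like rule yields 0 for a 16-bit target and 0x8000 for a 32-bit target, which is why the translation depends on where the expression appears.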
<whitequark> DaKnig: anyway, none of this is *insurmountable* per se
<whitequark> the problem is that it's a ton of work and it's extremely hard to test
<whitequark> it's not just bit width but also signedness
<whitequark> in verilog, if either of the operands is unsigned (for something like addition), the result is unsigned as well
<Lofty> Also where the standard overrides the user in how something is extended, like ?: always zero-extending its operands
<yehowshua> uhhh whitequark? looking at some cxxrtl...
<whitequark> yehowshua: sure?
<yehowshua> the memory is a large array of value8
<whitequark> yup
<yehowshua> hmm
<yehowshua> well, they're explicitly listed
<whitequark> yup. nmigen initializes the entire memory, then cxxrtl translates that to an explicit initializer
<whitequark> is that an issue?
<yehowshua> ah
<yehowshua> no no, just wondering
<yehowshua> Tryna understand more internals after last night's convo on cxxsim
<whitequark> Lofty: yep, which, when combined with no explicit type conversion operator in verilog-2001, makes it really hard
yehowshua is now known as Yehowshua
* Lofty waits for the inevitable write_systemverilog
<Lofty> /s
<whitequark> Lofty: `write_verilog -sv`
<whitequark> i added it recently
Yehowshua is now known as BracketMaster
<Lofty> That I know of
<whitequark> nmigen's workaround for not having always_comb interacted badly with the way it used yosys, so i had to stuff the workaround into yosys
<whitequark> i mean
<whitequark> what would write_systemverilog do if not that?
<Lofty> Well, the idea is that you can more naturally use SV constructs where available
<DaKnig> BracketMaster: I think you miss the point; having firrtl is cool, but we're talking about when the requirement is having Verilog.
<whitequark> iirc firrtl has similar issues in some places
<BracketMaster> firrtl also has firrtl sim, i wonder if its any similar to cxxsim
<DaKnig> > in verilog, if either of the operands is unsigned (for something like addition), the result is unsigned as well
<DaKnig> that's ...
<DaKnig> a very interesting choice.
<whitequark> ... the opposite of what everyone wants?
<DaKnig> if anything, signed+unsigned should be signed
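A numeric sketch of the two signedness rules being contrasted, assuming that in the Verilog-like case a signed operand in an unsigned expression is zero-extended to the result width. The function names are hypothetical and the operands are given as raw bit patterns.

```python
def mask(width):
    return (1 << width) - 1

def to_signed(bits, width):
    """Interpret a width-bit pattern as a two's complement number."""
    bits &= mask(width)
    return bits - (1 << width) if bits >> (width - 1) else bits

def operand(bits, width, signed):
    return to_signed(bits, width) if signed else bits & mask(width)

def add_unsigned_wins(a_bits, a_signed, b_bits, b_signed, width, out_width):
    """Verilog-like: if either operand is unsigned, both are treated as unsigned."""
    signed = a_signed and b_signed
    total = operand(a_bits, width, signed) + operand(b_bits, width, signed)
    return total & mask(out_width)

def add_mixed_is_signed(a_bits, a_signed, b_bits, b_signed, width, out_width):
    """The alternative suggested above: each operand keeps its own signedness."""
    total = operand(a_bits, width, a_signed) + operand(b_bits, width, b_signed)
    return total & mask(out_width)

# 4-bit signed -1 (0b1111) plus 4-bit unsigned 1, stored into 8 bits
unsigned_wins = add_unsigned_wins(0b1111, True, 0b0001, False, 4, 8)
mixed_signed = add_mixed_is_signed(0b1111, True, 0b0001, False, 4, 8)
```

Under the first rule the signed operand is read as 15 and the sum is 16; under the second it stays -1 and the sum is 0.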
<whitequark> anyway, now you can see why that PR isn't merged
<BracketMaster> hah yup
<whitequark> i'll get that working *someday*
<BracketMaster> hopefully you won't have to
<whitequark> but probably not when i need to deliver cxxsim yesterday
<BracketMaster> verilog will be a memory
<BracketMaster> a bad memory
<BracketMaster> **verilog will be but a memory
<Lofty> Well, so far all the "verilog replacements" haven't been
<BracketMaster> chisel->firrtl->yosys has seen some use
<whitequark> verilog will be with us forever, but what's worse, verilog will be with us for a very long time as a netlist interchange format
<Lofty> ^
<BracketMaster> especially at UC Berkeley and SiFive
<whitequark> nmigen would be irrelevant if it could not be used with vivado and quartus
<Lofty> whitequark: you can always dump EDIF with LPM instead /s
<whitequark> Lofty: which of the five or so things vendors call EDIF? :p
<Lofty> Well, LPM seems to mostly be a Quartus thing
<Lofty> Or more specifically
<whitequark> ah
<Lofty> Quartus still uses LPM for some reason
<DaKnig> if Verilog is bad as a netlist lang, why not use VHDL
<DaKnig> ?
<BracketMaster> VHDL synth tools seem to be few and far between
<Lofty> ...I think the main reason is a lack of a write_vhdl
<DaKnig> it's just as popular (in some places its more popular, in others it is less)
<BracketMaster> and that
<DaKnig> BracketMaster: that's just nonsense.
<DaKnig> all the big players support VHDL
<BracketMaster> ok, ghdlsynth is recent
<DaKnig> Yosys is recent
<DaKnig> if you look at big companies and their tools, they support VHDL just as much as Verilog
<BracketMaster> oh yeah, i've only been doing HDL since 2017 TBH
<Lofty> Verilog is an easier language to implement, I think, but it's not by any means "good"
<whitequark> DaKnig: there are no downsides to supporting VHDL as an option besides the added work
<whitequark> it's not much better as a netlist exchange format though
<Lofty> Apparently the GHDL project wants to implement a write_vhdl
<DaKnig> Lofty: how is it easier to implement exactly?
<whitequark> like, yes, it's a less bad *language*, but i don't care so much how it is as a language, i care about it as a netlist file format
<DaKnig> do you mean making a frontend for it? or spitting it?
<Lofty> A frontend
<DaKnig> the second would be just as easy as in Verilog
<DaKnig> ah.
<DaKnig> well, there's some truth to this
<Lofty> VHDL files are meaningless if you don't have a frontend
<whitequark> it's not just as easy
<whitequark> if you look at yosys' write_verilog, it has been heavily tested with vloghammer
<DaKnig> whitequark: in VHDL iirc when you add two `unsigned` values it is extended by one bit
<DaKnig> so that should already solve the issue
<DaKnig> or well, it does what nmigen does
<whitequark> i can't really *not* support verilog
<whitequark> if i support only vhdl then i exclude everyone who'd like to use iverilog or verilator on the nmigen netlists
<Lofty> DaKnig: remember that nMigen does not directly output Verilog
<whitequark> sure, this is less relevant now that cxxrtl exists, but it's still very much relevant
<DaKnig> Lofty: when you spit the output as a "netlist" thing, you are probably doing that because you are using a commercial tool; all of them have VHDL frontends
<whitequark> you could also be using an open-source tool without a VHDL frontend
<whitequark> e.g. vtr
<whitequark> so vhdl-only nmigen would exclude the prjxray flow
<whitequark> or quicklogic
<DaKnig> I dont get it; why does the open source community focus so much on Verilog?
<whitequark> the same reason it focused on C and not Ada
<DaKnig> C is more popular
<DaKnig> Verilog is not
<whitequark> for a long time VHDL didn't even have an equivalent of SVA
<whitequark> so you had to write formal properties for your VHDL design in SV
<whitequark> i think it does now
<DaKnig> SVA?
BracketMaster has quit [Quit: Leaving]
<whitequark> systemverilog assertions
<whitequark> `assert property ...`
yehowshuaimmanue has joined #nmigen
<DaKnig> I am not sure about how it has been historically but I did get to use assertions with tools that dont support VHDL 2008 properly
<DaKnig> https://www.ics.uci.edu/~jmoorkan/vhdlref/assert.html <- looks like this is pretty old
<whitequark> those are just normal assertions
<whitequark> i'm talking about formal verification
<whitequark> what symbiyosys does
<DaKnig> what's the difference
<yehowshuaimmanue> I think the reason that Verilog has so much FOSS support is because verilator was the first FOSS verilog simulator
<yehowshuaimmanue> early 2000s
<yehowshuaimmanue> Everything kinda just followed
<DaKnig> I know what formal verification is
<DaKnig> I assumed that it uses the same construct for asserting stuff as in sim asserts
<DaKnig> just reading them differently
<whitequark> nop, lots of new syntax
<DaKnig> attributing different meaning to them I mean
yehowshuaimmanue has left #nmigen [#nmigen]
yehowshuaimmanue has joined #nmigen
<whitequark> DaKnig: VHDL's version of SVA is called PSL
<whitequark> take a look at what that does
<whitequark> anyway, i don't really care *why* VHDL has so little FOSS support; that's not relevant to me as someone shipping nmigen
<whitequark> the language is more elegant, sure, but also more limited, and still based on the same flawed model of inference from code written for an event-driven simulator
<whitequark> sorry, "more limited" isn't accurate there. what i meant is that while it's more elegant, that sometimes comes at a cost
<whitequark> e.g.: no blocking assignments? good. manual clock tree balancing? not good
<DaKnig> ok fair enough :)
<whitequark> i definitely draw inspiration from VHDL more than from Verilog
<whitequark> i mean, pysim and cxxrtl are both based on VHDL simulation semantics!
<DaKnig> ... wait, does Verilog not have equivalents to wait statements?
<whitequark> sure does
<yehowshuaimmanue> whitequark, just read it
<yehowshuaimmanue> VHDL events are ordered?
<whitequark> not all events, but you have two phases in a delta cycle
<yehowshuaimmanue> so you're guaranteed to only have to evaluate one process after changing a signal
<whitequark> first updates are queued while processes execute. then, updates are all applied at once, with no user code running in between
<whitequark> hm
<yehowshuaimmanue> very interesting
<whitequark> i don't think there's any guarantee about one process?
<whitequark> two processes can wait on a signal just fine, i think
<yehowshuaimmanue> sorry, i meant only once
<yehowshuaimmanue> let me reread, I think i'm conflating two different things
<yehowshuaimmanue> ok. I think I've got it. signal updates in VHDL always come before processes, whereas in verilog, you can have [signal, process, signal, process]
<whitequark> yep
<whitequark> this is both good and bad
<whitequark> good because you, the HDL author, are freed from the responsibility of ensuring determinism regardless of scheduling
<whitequark> bad because it makes clock gating circuits (and similar stuff; e.g. DFF-based clock divider) much more annoying to express
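The two-phase delta cycle described above (processes queue updates, then all updates are applied at once) can be sketched in a few lines of plain Python. This is a toy model with hypothetical names (`Signal`, `delta_cycle`, `run_swap`), not the pysim or cxxrtl implementation; it only illustrates why the result does not depend on the order in which processes run.

```python
class Signal:
    """Toy signal with a current value and a queued next value."""
    def __init__(self, value=0):
        self.curr = value
        self.next = value


def delta_cycle(signals, processes):
    """Run one delta cycle; return True if any signal changed."""
    # Phase 1: processes read .curr and queue writes into .next.
    for process in processes:
        process()
    # Phase 2: commit all queued updates, with no process code in between.
    changed = False
    for signal in signals:
        if signal.next != signal.curr:
            signal.curr = signal.next
            changed = True
    return changed


def run_swap(process_order):
    """Two processes that swap a and b, run in the given order."""
    a, b = Signal(1), Signal(2)

    def a_from_b():
        a.next = b.curr

    def b_from_a():
        b.next = a.curr

    procs = {"a_from_b": a_from_b, "b_from_a": b_from_a}
    delta_cycle([a, b], [procs[name] for name in process_order])
    return a.curr, b.curr


forward = run_swap(["a_from_b", "b_from_a"])
backward = run_swap(["b_from_a", "a_from_b"])
```

Both orders leave `a` and `b` swapped, because every process only ever sees the values from before the commit.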
<yehowshuaimmanue> I think I'm willing to sacrifice expressibility for guarantees, then again, I always hated writing VHDL
<DaKnig> > with no user code running in between
<DaKnig> well that's not quite true
<whitequark> oh?
<DaKnig> your variables update in the order they are written
<DaKnig> many times per process
<DaKnig> unlike signals they are not queued
<DaKnig> so if you print some debug stuff via vars, it would print it in between queuing the things for the next cycle
<whitequark> oh, yeah, i was only talking about signals
<DaKnig> also prints I think
<DaKnig> yeah, that's true about signals.
yehowshuaimmanue is now known as yehowshua
<yehowshua> I'm guessing the fact that verilog has blocking and non-blocking are the cause for unordered [signal, process] behavior during simulation?
<whitequark> yeah. specifically that it has blocking assignments
<yehowshua> But with nMigen's sync/comb, pysim signal and process can be ordered?
<whitequark> yeah
<yehowshua> nMigen is definitely more natural to write, so it seems like we get the best of both worlds.
<whitequark> ideally!
<yehowshua> any tradeoffs I'm missing?
<whitequark> clock gating
<whitequark> m.d.comb += ClockSignal("a").eq(ClockSignal("b") & en) # this introduces a delta cycle
<yehowshua> ah yes
<yehowshua> That doesn't seem like idiomatic nMigen though
<whitequark> it's how you write an architecture-independent clock gate
<yehowshua> Ah
<yehowshua> Also, something like that probably shows up in AFIFO
<yehowshua> or really anywhere you cross clocks
<whitequark> hm, not really
<whitequark> it's fine if you treat a and b as unrelated domains
<whitequark> well
<whitequark> it does show up if you try to treat a and b as in-phase and don't add CDC
<yehowshua> whats CDC? clock detection
<DaKnig> wdym by manual clock tree balancing?
<whitequark> yehowshua: clock domain crossing
<whitequark> 2FF synchronizers and such
<whitequark> DaKnig: if you drive one clock signal from another with e.g. a gate in between, you have to manually add dummy delta cycles or their posedges won't actually be in-phase
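The extra delta cycle mentioned above can be made visible with the same kind of toy two-phase model. This is a hedged sketch in plain Python with hypothetical names (`Signal`, `commit`, `gate`); it is not how pysim is implemented, it only shows that a combinationally gated clock rises one delta after the clock that drives it.

```python
class Signal:
    """Toy signal with a current value and a queued next value."""
    def __init__(self, value=0):
        self.curr = value
        self.next = value


def commit(signals):
    """Apply all queued updates at once."""
    for signal in signals:
        signal.curr = signal.next


clk, en, gated = Signal(0), Signal(1), Signal(0)
signals = [clk, en, gated]


def gate():
    # Stand-in for a combinational assignment gated = clk & en.
    gated.next = clk.curr & en.curr


trace = []
clk.next = 1  # the testbench queues a rising edge on clk
for delta in range(3):
    commit(signals)
    trace.append((delta, clk.curr, gated.curr))
    gate()
```

The trace shows `clk` high in delta 0 while `gated` only goes high in delta 1, so logic clocked by `gated` runs one delta later than logic clocked by `clk`.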
<yehowshua> oh!
<yehowshua> i follow now
hitomi2507 has quit [Quit: Nettalk6 - www.ntalk.de]
<yehowshua> Is there an RTLIL spec somewhere? I follow its general syntax, but i'd like to know the nitty-gritty as I start to take a closer look into nMigen's guts.
<whitequark> take a look at the yosys manual
<yehowshua> I saw chapter 4.
<yehowshua> Its only 10 pages
<yehowshua> Maybe RTLIL is really that simple
<whitequark> well
<whitequark> it used to be shorter before i started documenting the more obscure parts of it
<whitequark> you can guess the general situation here
<yehowshua> hah, sure
<yehowshua> If I have obscure questions, I know who to ask
<whitequark> yup
<whitequark> and I can update the manual then
yehowshua has left #nmigen [#nmigen]
<d1b2> <marble> I'm currently trying to write a board definition script for the colorlight 5A-75B. The ~CS pin is always connected to GND according to https://github.com/q3k/chubby75/blob/master/5a-75b/hardware_V7.0.md#sdram-u29 When I leave out the cs parameter in SDRAMResource or set it to None, the script throws an exception
<whitequark> SDRAMResource needs to be modified to take this case into account
<d1b2> <marble> my quick fix was diff diff --git a/nmigen_boards/resources/memory.py b/nmigen_boards/resources/memory.py index b2be757..9e8b1a0 100644 --- a/nmigen_boards/resources/memory.py +++ b/nmigen_boards/resources/memory.py @@ -103,13 +103,14 @@ def SRAMResource(*args, cs, oe=None, we, a, d, dm=None, return Resource.family(*args, default_name="sram", ios=io) -def SDRAMResource(*args, clk, cke=None, cs, we, ras, cas, ba, a, dq, dqm=None, +def
<d1b2> SDRAMResource(*args, clk, cke=None, cs=None, we, ras, cas, ba, a, dq, dqm=None, conn=None, attrs=None): io = [] io.append(Subsignal("clk", Pins(clk, dir="o", conn=conn, assert_width=1))) if cke is not None: io.append(Subsignal("clk_en", Pins(cke, dir="o", conn=conn, assert_width=1))) - io.append(Subsignal("cs", PinsN(cs, dir="o", conn=conn, assert_width=1))) + if cs is not None: + io.append(Subsignal("cs",
<d1b2> PinsN(cs, dir="o", conn=conn, assert_width=1))) io.append(Subsignal("we", PinsN(we, dir="o", conn=conn, assert_width=1))) io.append(Subsignal("ras", PinsN(ras, dir="o", conn=conn, assert_width=1))) io.append(Subsignal("cas", PinsN(cas, dir="o", conn=conn, assert_width=1)))
<whitequark> please use a pastebin
<d1b2> <marble> ah, sorry
<d1b2> <marble> because this channel is mirrored to IRC?
<whitequark> seems about right, please send a PR
<whitequark> yeah
yehowshua has joined #nmigen
<pepijndevos> mirrored to IRC... from where?
<whitequark> 1bitsquared discord
emeb has joined #nmigen
yehowshua has left #nmigen [#nmigen]
awe00__ has quit [Ping timeout: 240 seconds]
awe00__ has joined #nmigen
<d1b2> <marble> hm, would is also have been valid to set cs="-"
<d1b2> <marble> ?
<whitequark> marble: we don't have such a syntax
<d1b2> <marble> Ah. I saw it in the connector definition of one board
<d1b2> <marble> As part of the pin list
<whitequark> it does exist in connectors
<whitequark> but not in resources
<d1b2> <marble> Ok. Thanks 🙂
chipmuenk has quit [Ping timeout: 240 seconds]
<DaKnig> should I write testbenches in python classes/functions or as Elaboratables?
<whitequark> depends on the kind of testbench you want, really
<whitequark> usually it'd be the former
<d1b2> <marble> Is it ill-advised to use -ignore_error is the svf command?
<d1b2> <marble> *in
<daveshah> Yes
<daveshah> How is it failing?
<d1b2> <marble> The first TDO, that seems to check the IdCode, mismatches and reads almost all f
<daveshah> Can you post the full OpenOCD output?
<d1b2> <marble> great. now it worked without the ignore 😄 ... sometimes it works. I used an STM running versaloon atm. maybe that's a bit unstable. i run until it fails. one sec
<d1b2> <marble> http://ix.io/2vYc
<daveshah> That's a CRC error according to the decode_status_reg tool (https://github.com/Spritetm/hadbadge2019_fpgasoc/blob/master/decode_status_reg.c)
<d1b2> <marble> ah, wait. I also had the IdCode thing. I didn't pay attention to all error log ^^'
<daveshah> Sometimes, I find a pulldown resistor or very small capacitor (10pF ish) on TCK can improve marginal JTAG links
<d1b2> <marble> Ok. I'll try that 🙂
jeanthom has joined #nmigen
jeanthom has quit [Ping timeout: 256 seconds]
jeanthom has joined #nmigen
smkz has quit [Quit: reboot@]
smkz has joined #nmigen
chipmuenk has joined #nmigen
Asuu has quit [Remote host closed the connection]
<DaKnig> lkcl_: I remember you linked me to a good tutorial for simulation; I cant find it now. mind sending that again?
<DaKnig> (lkcl or anybody else)
<DaKnig> what does `yield signal` return? a python int?
jeanthom has quit [Ping timeout: 240 seconds]
chipmuenk has quit [Quit: chipmuenk]
<cr1901_modern> whitequark: Random thought that may or may not have been brought up before: do you think it's possible as part of, maybe nmigen-stdio to have a "sanity test" gateware for testing that I/O works properly on new platforms _quickly_ before putting them into nmigen-boards?
<lkcl_> DaKnig, ehh.... it *might* have been Robert Baruch's one?
<cr1901_modern> Something like testing DRAM quickly, since leds and switches and buttons are easy
<lkcl_> DaKnig: yes, it'll return whatever value is in that nmigen Signal, at the precise moment in time in the Simulation.
<DaKnig> as a Python int?
<lkcl_> DaKnig: yes. although i am not sure what happens (what is returned) if you create a Signal(Enum)
<lkcl_> you'd have to experiment.
<DaKnig> thankfully I prefer magic numbers :)
<DaKnig> (not really please dont kill me)
<lkcl_> :)
<lkcl_> i keep creating magic constant classes, forgetting about Enums, too :)
<TiltMeSenpai> lkcl_: I'm pretty sure when you create a Signal(Enum), it uses the Enum's int mapping
<TiltMeSenpai> it only adds like the String decoding in the VCD files/Sim
<DaKnig> when it's already a Signal, it only tracks the width of it and signedness I think
<DaKnig> ah really? it shows the strings?
<DaKnig> that's cool
<TiltMeSenpai> in VCD's, yep
<TiltMeSenpai> Python's Enums low key suck
ademski has quit [Ping timeout: 240 seconds]
<awygle> i think it would be great to have models in nmigen-stdio for testing, cr1901_modern
<lkcl_> TiltMeSenpai: ahh yes that makes sense
<lkcl_> DaKnig: except if you run vcd2fst, then it doesn't preserve the string mappings
<lkcl_> i found that out when doing "repairs" to litex vcd files using vcd2fst
<cr1901_modern> Like, I want to add Cora Z7 to nmigen. And I'd like to eventually do my own DRAM controller for fun (for some metric of fun). But I don't want to write one for testing
<cr1901_modern> Just one that verifies that I created a reasonable UCF
<DaKnig> lkcl_: vcd2fst the commandline tool you mean, right?
<lkcl_> DaKnig: yes. it's a binary alternative to vcd ascii files, and takes up a *lot* less space
<DaKnig> yeah ik
<DaKnig> I used that
<lkcl_> oh cool
<DaKnig> that's a bummer
<DaKnig> does fst not have this construct?
<lkcl_> i ran into difficulties with litex sim vcd and fst output. converting "solved" the problem
<DaKnig> or is that a limitation of the tool?
<lkcl_> honestly don't know. it could well have just been that it was because i had to use verilog (to get into litex) that "lost" the string information
<lkcl_> i didn't investigate: i just put up with it :)
<DaKnig> I should do this more :)
<lkcl_> lol
* lkcl_ HA! finally, after *three weeks* i have parity with a microwatt "randomly-generated" unit test in libresoc
<lkcl_> unnnbelievable. 3 weeks of examining side-by-side diffs of running 32 *THOUSAND* POWER9 instructions
<DaKnig> I think it would be cool to have vcd output starting at a certain point
<DaKnig> so in cases like this, you would write a program that checks the 1000 cycles before the first diff
<lkcl_> yyyeah litex sim.py allows that.
<DaKnig> did you really need all the 32k inst. to get the bug?
<lkcl_> it's one of the command-line options. yeah it would be really useful
<DaKnig> for some reason I doubt
<lkcl_> anton blanchard wrote a POWER9 "random instruction generator". some of them are illegal instructions
<lkcl_> by running it i've found.... about.... 15 separate and distinct bugs, minimum
<lkcl_> or, no, that's not true. they're not illegal instructions: they're instructions where you're required to ignore certain flags in some circumstances, such as mulhw
<lkcl_> mulhw does *not* have an "overflow" variant (mulhwo) in POWER9
<DaKnig> I work with the principle "solve the first issue; rerun tests"
esden has quit [Ping timeout: 272 seconds]
<lkcl_> well in this case, that's absolutely critical, because at the very first error, the register files are, obviously, corrupted
<DaKnig> I would have given up looking at those traces and would have generated a few new tests after fixing the first issue
esden has joined #nmigen
<lkcl_> oh i definitely gave up looking at the traces :)
<lkcl_> instead i got litex sim.py to create log files dumping the regfiles
<lkcl_> used that to identify the instruction that was wrong, created a unit test for it and then specifically debugged the traces for that (small, one) unit test
<DaKnig> what'd I get if I `yield` a Record in sim?
<lkcl_> repeat that for 3 weeks solid and you create a *lot* of unit tests :)
<lkcl_> it will concatenate the constituents as if it was a contiguous block of bits
<lkcl_> and return that as an int for you
<DaKnig> :(
<DaKnig> getting a class that was generated on the spot would have been cooler (but more expensive ofc)
<lkcl_> so a Record containing a 4-bit value and a 1-bit value will return an int which, if you do bin(value), 0bNMMMM, M will be the 1st item of the record, N will be the 2nd
<lkcl_> you can just do "yield therecord.thememberoftherecord"
<lkcl_> you're not forced into a situation where you have to yield the record then *manually* unpack the bits using python int mask-manipulation if that's what you're thinking
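For the case where only the packed integer is at hand, the bit layout described above (first field in the lowest bits) can be unpacked with a small helper. This is a minimal sketch in plain Python; `unpack_record` and the example field names `data` and `valid` are hypothetical and not part of nMigen.

```python
def unpack_record(value, layout):
    """Split a packed integer into fields.

    `layout` is a list of (name, width) pairs, first field in the lowest bits.
    """
    fields = {}
    offset = 0
    for name, width in layout:
        fields[name] = (value >> offset) & ((1 << width) - 1)
        offset += width
    return fields


# A 4-bit field followed by a 1-bit field, as in the 0bNMMMM example.
layout = [("data", 4), ("valid", 1)]
fields = unpack_record(0b10101, layout)
```

Here `fields["data"]` is the low four bits (0b0101) and `fields["valid"]` is the top bit.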
<DaKnig> yeah I know that
<lkcl_> ok :)
<lkcl_> just checking
<DaKnig> that would require a few yields toh
<DaKnig> tho
<DaKnig> obv that's quite slow
<DaKnig> I am making a VGA tb
<DaKnig> (a very simple one)
<lkcl_> yes. i don't have a problem with that, but i am not counting on pysim being "very fast"
<lkcl_> nice
<DaKnig> I wanted to minimize the amount of control flow
<awygle> lkcl_: vcd2fst is based on the same gtkwave code, right? afaik that code is the only fst implementation
<lkcl_> sounds like a case of "premature optimisation" :)
<lkcl_> awygle: 1 sec let me run dpkg -l gtkwave
<DaKnig> pysim, without any process attached, under pypy3 finishes a frame in a reasonable amount of time
<lkcl_> awygle: answer yes it's part of gtkwave
<DaKnig> by that I mean "a few minutes"
<DaKnig> with a process that ought to be much slower
<DaKnig> esp since I would have to run that many many times
<DaKnig> might even use Cython on that problem if things get too dire
<DaKnig> well... that wouldnt help much.
<lkcl_> DaKnig: well, this is what cxxrtl is for, to give that speedup without sacrificing readability
<DaKnig> guess cxxrtl would be better
<DaKnig> ah you said it already :)
<DaKnig> I read that it has some issues though
<DaKnig> differences from pysim?
<lkcl_> DaKnig: mmm.... cython... yyyeah, i vaguely recall a conversation with whitequark about it, 2 years ago. the conclusion was - i think - "cython wouldn't help"
<lkcl_> cxxrtl's goals are to be *exactly* a drop-in replacement for pysim. 100% fully compatible.
<DaKnig> yes cython would not help here
<DaKnig> well goals are one thing; I have reality facing me right now.
<lkcl_> as in: if cxxrtl doesn't generate the same results as pysim, that's a show-stopping bug.
<lkcl_> well... you could always try cocotb.
<DaKnig> cython falls back to the "python way" way too often
<lkcl_> it compiles code under verilator then annotates it sufficiently to interact with it from python. it's... conceptually similar to cxxrtl.
<lkcl_> staf from chips4makers uses it (with nmigen).... 1sec...
<DaKnig> I dont wanna waste time learning a new tool. I dont have that extra time right now.
<lkcl_> apologies, _one_ of those repos has...
<DaKnig> no amount of "but its simple!" would help there.
<lkcl_> ok that answer saves me some time :)
<DaKnig> thanks for the suggestion
<DaKnig> I appreciate that
<lkcl_> i was going to say that staf's code would help show you how to use cocotb with nmigen.
<lkcl_> i'm in a similar situation with the microwatt-libresoc side-by-side comparisons
<lkcl_> not all registers are accessible in microwatt simulations
<lkcl_> so i have to *learn VHDL* in order to add the code needed to access those registers so that i can see them and compare against...
<lkcl_> urrr :)
<DaKnig> does GHDL not show internal signals?
<DaKnig> in the VCD
<lkcl_> there's a bit of a problem with GHDL. records are converted to flattened bit-arrays.
<Chips4Makers> Main reason why I needed to use cocotb is that I used VHDL/Verilog code wrapped in nmigen using Instance. pysim can handle only pure nmigen. Also was already using cocotb with omigen before going to nmigen; have to admit pysim did not click with me yet although I used it in some tests I did for nmigen-soc code.
<lkcl_> after learning that i didn't even bother looking at vcd files generated from GHDL. or, i did, once, to confirm it.
* lkcl_ waves to Staf
<lkcl_> Chips4Makers: do you happen to know, is it faster than pysim?
<lkcl_> obviously, it is c code (because it's verilator), but does the interaction through python result in a slowdown that makes it effectively no different in speed from pysim?
<Chips4Makers> @lklc_ It will depend on simulator you use. I did use it with GHDL for VHDL code, iverilog for Verilog code or (32bit) ModelSim for mixed RTL code. I did not do big runs with it, so was not so worried about speed, otherwise would certainly have looked at verilator for Verilog simulations.
<DaKnig> I have a testbench process; how can I give it arguments?
<lkcl_> Chips4Makers: thank you
<lkcl_> DaKnig: do you have some code online anywhere? there's a couple of techniques, i can point you at some example code, 1 sec
<Chips4Makers> @lklc_ Yes, read on gitter that main thing for optimizing speed is to limit round trips from simulator to python and back.
<Chips4Makers> For example not too wise to generate your clock in cocotb.
<lkcl_> DaKnig: https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/test/test_inout_mux_pipe.py;hb=HEAD
<lkcl_> Chips4Makers: that makes sense
<lkcl_> DaKnig: lol. _that_ level of TODO :)
<DaKnig> ?
<DaKnig> no
<DaKnig> I have the code
<DaKnig> it *should* work
<DaKnig> I think
<DaKnig> I just dont think pasting it would help much
<lkcl_> ok so the example link above, you can see i created a class, called InputTest
<lkcl_> it takes the "dut" as an argument in its constructor, storing it for later use
<lkcl_> however the key here is the fact that i pass in python arguments into the processes - inputtest.send() and inputtest.recv()
<lkcl_> which *change* the behaviour of the test.
<lkcl_> probably the most important thing for you is the creation of the class itself
<lkcl_> you can pass in vga_mode, hsync, vsync, color, as arguments in the constructor of the class (along with the dut)
<DaKnig> what's dut?
<ktemkin> Device Under Test
<lkcl_> convention "device under ..." lol beat me to it, ktemkin :)
<DaKnig> I was thinking about just passing it the signals it should care about
<lkcl_> there's another technique which Jacob used
<DaKnig> wait; because it's a generator, doing `a=foo()` doesnt run it, but just returns an iterator which I can pass to nmigen..?
<lkcl_> ahh you have to watch out for that. i can't remember the exact details. it's something like... if you try to do "comb +=" or "sync +=" *after* you've created the Simulation object, you hose the internal state.
<lkcl_> basically, yes
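The Python side of this can be shown without any simulator. The sketch below uses only plain Python; `make_testbench`, `log` and `frames` are hypothetical names. It shows that calling a generator function runs none of its body, and that a closure is one way to bind arguments into a testbench. Whether a given nMigen version expects a generator object or a no-argument generator function is not settled in this discussion, so check the simulator you are using.

```python
def make_testbench(log, frames):
    """Return a no-argument generator function with `log` and `frames` bound."""
    def testbench():
        log.append("started")
        for frame in range(frames):
            # A real testbench would yield a simulator command here;
            # this toy version just yields the frame number.
            received = yield frame
            log.append((frame, received))
    return testbench


log = []
tb = make_testbench(log, frames=2)

gen = tb()                  # creates a generator object; the body has not run
log_before = list(log)      # still empty at this point

first = next(gen)           # runs up to the first yield
second = gen.send("pixel")  # resumes, logs the reply, runs to the next yield
```

Nothing is appended to `log` until `next(gen)` is called, which is why the object can be handed to a driver that decides when to advance it.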
<DaKnig> I dont do either
<DaKnig> that testbench doesnt really care about the nmigen side besides reading the values and putting them on screen
<lkcl_> look again at the file https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/test/test_inout_mux_pipe.py;hb=HEAD
<lkcl_> line 217 declares an instance of the dut (device under test)
<lkcl_> line 223 declares the object that *tests* that file
<lkcl_> sorry, tests that dut
<lkcl_> i need to find you a better example
<DaKnig> I saw that
<DaKnig> I think this is a perfect example. in 224:228 you send the functions with the right arguments you wanted
<lkcl_> try this one https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/experiment/test/test_compalu_multi.py;hb=HEAD
<DaKnig> with ()
<lkcl_> yes but unfortunately it's not been updated in... a year, and run_simulation() is deprecated
<lkcl_> cesar[m] wrote the test_compalu_multi.py much more recently and it uses the latest nmigen simulation API
<lkcl_> yeah, that's a better example. you can see several different techniques, there.
<DaKnig> what I said is on the python level; not on nmigen's side.
<DaKnig> lemme test that.
<lkcl_> simple "function" based simulation tests, like scoreboard_sim_fsm()
<lkcl_> but if you *really* want to go for multiple processes, you want the CompUnitParallelTest() class
<lkcl_> i leave it with you, it's midnight here now
<DaKnig> thanks
cr1901_modern has quit [Quit: Leaving.]
<DaKnig> add_sync_process allows reading multiple values between clocks, right?
<DaKnig> asking because it looks like its `wrapper` `yield`s a `Tick`
<lkcl_> DaKnig: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/experiment/test/test_compalu_multi.py;hb=HEAD#l418
lkcl__ has joined #nmigen
lkcl_ has quit [Ping timeout: 246 seconds]
<DaKnig> I get this error when running my design : https://paste.debian.net/1162369/
<DaKnig> what'd I do wrong?
<DaKnig> (that's just partial code ofc)
cr1901_modern has joined #nmigen
<DaKnig> nvm; really dumb mistake. using the same name for diff things.
emeb has quit [Quit: Leaving.]