#nmigen on 2020-09-02 — irc logs at freenode.irclog.whitequark.org

2020-09-01 08:18 ChanServ changed the topic of #nmigen to: nMigen hardware description language · code at https://github.com/nmigen · logs at https://freenode.irclog.whitequark.org/nmigen · IRC meetings each Monday at 1800 UTC · next meeting September 7th

00:01 peeps[zen] has joined #nmigen

00:02 cr1901_modern has quit [Ping timeout: 240 seconds]

00:03 peeps has quit [Ping timeout: 240 seconds]

00:04 cr1901_modern has joined #nmigen

00:22 electronic_eel has quit [Ping timeout: 240 seconds]

00:22 electronic_eel has joined #nmigen

00:36 <_whitenotifier-3> [YoWASP/yosys] whitequark pushed 1 commit to develop [+0/-0/±1] https://git.io/JUYOG

00:37 <_whitenotifier-3> [YoWASP/yosys] whitequark ae00ea8 - Update dependencies.

01:09 awe00 has quit [Ping timeout: 256 seconds]

01:35 lkcl_ has quit [Ping timeout: 240 seconds]

02:03 lkcl has joined #nmigen

02:13 C-Elegans has joined #nmigen

02:17 Yehowshua has joined #nmigen

02:19 <C-Elegans> @whitequark I ran into an issue with cxxsim earlier today, would you mind taking a look at it?

02:20 <C-Elegans> https://gist.github.com/C-Elegans/dcff8ae8ef31a27081d534facebefee3

02:21 <C-Elegans> Basically, during simulation with cxxsim, yielding values from a 64 bit signal gives me a value with the 32 bit halves reversed

02:30 <whitequark> C-Elegans: oops.

02:30 <whitequark> i've never tested that particular code path

02:33 <C-Elegans> thanks! Do you need me to file a github issue or...?

02:33 <whitequark> let me just fix it

02:33 <C-Elegans> ok. Thanks!

02:35 Yehowshua has quit [Ping timeout: 245 seconds]

02:55 jaseg has quit [Ping timeout: 260 seconds]

02:57 jaseg has joined #nmigen

03:02 <_whitenotifier-3> [nmigen/nmigen] whitequark pushed 1 commit to cxxsim [+3/-0/±4] https://git.io/JUYZx

03:02 <_whitenotifier-3> [nmigen/nmigen] whitequark c08c30c - [WIP] sim: add cxxsim engine.

03:02 <_whitenotifier-3> [nmigen/nmigen] whitequark pushed 1 commit to cxxsim [+3/-0/±4] https://git.io/JUYZj

03:02 <_whitenotifier-3> [nmigen/nmigen] whitequark 9cbdff0 - [WIP] sim: add cxxsim engine.

03:02 <whitequark> C-Elegans: should be fixed

03:05 <whitequark> awygle: poke

03:06 <awygle> whitequark: peek

03:06 <whitequark> awygle: up to discuss cxxsim stuff?

03:06 <awygle> yup

03:07 <whitequark> great! hm. how much do you know about the way pysim works?

03:08 <awygle> not a lot lol

03:10 <whitequark> alright, let me explain all of it, then

03:11 <whitequark> hm. how much do you know about the way verilog simulation semantics works?

03:12 <awygle> i read your post on why VHDL's are better

03:12 <whitequark> er, my post?

03:12 <awygle> oh maybe you just linked it actually

03:13 <whitequark> yeah

03:14 <whitequark> so to recap: in verilog, you have processes executing in parallel. all these processes work on the same global state, so the order of execution of processes, in general, significantly affects the result. to work around that, nonblocking assignments are used to split the eval and commit phases and bring back determinism

03:14 <awygle> mhm

03:15 <C-Elegans> @whitequark sorry, went to do something else. Can confirm it works now!

03:15 <C-Elegans> Thanks!

03:15 <whitequark> in pysim, you have processes executing in parallel. all these processes work on *two* copies of the global state. they read the "curr" instance and write to the "next" instance.

03:15 <whitequark> since a process never* reads from "next", it doesn't matter which order you execute them in at all

03:16 <awygle> ok

03:16 <whitequark> * it reads from "next" during read-modify-writes, which is why it's important that RMWs must completely overwrite the bits they touch and don't modify the others at all
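The two-copy scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not pysim's actual code: `SignalState`, `write_bits` and `run_delta` are hypothetical names, and the only assumption is that processes read `curr` and write `next`.

```python
class SignalState:
    """Hypothetical per-signal state: processes read curr and write next."""

    def __init__(self, value=0):
        self.curr = value
        self.next = value

    def commit(self):
        """Make next visible as curr; report whether the value changed."""
        changed = self.curr != self.next
        self.curr = self.next
        return changed


def write_bits(state, mask, value):
    """Read-modify-write on next: overwrite the masked bits, keep the rest."""
    state.next = (state.next & ~mask) | (value & mask)


def run_delta(signals, processes):
    """Eval every process once, then commit every signal."""
    for process in processes:
        process(signals)
    for state in signals.values():
        state.commit()


def copy_b_to_a(signals):
    signals["a"].next = signals["b"].curr


def copy_a_to_b(signals):
    signals["b"].next = signals["a"].curr


def set_low_nibble(signals):
    write_bits(signals["flags"], 0x0F, 0x05)


def set_high_nibble(signals):
    write_bits(signals["flags"], 0xF0, 0xA0)


def simulate(processes):
    signals = {"a": SignalState(1), "b": SignalState(2), "flags": SignalState(0)}
    run_delta(signals, processes)
    return {name: state.curr for name, state in signals.items()}


processes = [copy_b_to_a, copy_a_to_b, set_low_nibble, set_high_nibble]
forward = simulate(processes)
backward = simulate(list(reversed(processes)))
```

Both orders give the same state: `a` and `b` are swapped, and the two read-modify-writes compose because each one overwrites only its own bits.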

03:18 <whitequark> in cxxrtl, you have somewhat of a hybrid approach. at -O0 it works exactly the same as pysim, and is also very slow. this is because each time you change the input to a combinatorial function, you have to finish evaluating the current process, let the change trigger reevaluation of the comb function, then let the change of the output trigger more reevaluations

03:19 <whitequark> there's a pair of subtle tradeoffs in the design of this sort of simulator

03:21 <whitequark> - do you detect changes in inputs and reschedule a process reading those inputs when that happens, or do you just repeatedly run the process until the outputs stop changing?

03:22 <whitequark> - do you use many fine-grained processes (scheduling overhead, sequencing issues) or few coarse-grained processes (evaluation overhead: most of what you evaluate will be values that haven't changed)?

03:23 <whitequark> pyrtl and cxxrtl take exactly opposite approaches here

03:25 <whitequark> pyrtl generates many processes per design (one per clock domain per fragment, to be specific) and schedules them when inputs change, collecting the set of updated signals during commit

03:25 PyroPeter_ has joined #nmigen

03:25 <whitequark> cxxrtl generates one process (it's not even explicitly called that, it's just p_top.step()) which always reevaluates everything

03:28 <awygle> ok

03:28 <awygle> why the difference?

03:28 PyroPeter has quit [Ping timeout: 240 seconds]

03:28 PyroPeter_ is now known as PyroPeter

03:29 <whitequark> pyrtl tries to have low startup latency. this means it does a fairly straightforward translation of the nmigen IR. it can barely afford to translate it to python code

03:30 <whitequark> I experimented with different approaches. smaller processes make it slower, larger processes make it slower, not tracking sets of updated signals makes it slower

03:30 <whitequark> the current design seems to be in a local optimum

03:31 <whitequark> cxxrtl, on the other hand, doesn't care about startup latency at all as long as it's not literally too long to wait for completion

03:31 <whitequark> which means that it can afford to flatten and statically schedule the entire netlist

03:32 <whitequark> which means that it doesn't need to track updated signal sets: most of your design is supposed to not have feedback arcs, and for the few feedback arcs it does encounter, it's basically fine to iterate a few times

03:33 <whitequark> still, since having feedback arcs slows down your design by at least 2x, it goes to great lengths to eliminate those. for example, blackboxes exist in large part to let you do comb feedback in behavioral code and still have single-pass statically scheduled evaluation

03:35 <whitequark> anyway, so far so good

03:35 <whitequark> now here's where things get really screwy.

03:35 <whitequark> clocks.

03:36 <whitequark> so far we had:

03:37 <whitequark> - synchronous processes, which are inert unless scheduled by a clock. so, mostly easy.

03:37 <whitequark> - combinatorial processes, which are scheduled by every change in inputs, and cannot (by decree) have feedback loops in them, in nmigen at least. so, easy, they're just pure.

03:38 <whitequark> the problem with clocks (and async resets, which are the same thing for the purposes of this discussion) is that they're the only true async part of a simulator for synchronous logic

03:39 <whitequark> so suddenly you have to start caring about things like "which delta cycle do I evaluate this clock-related thing?"

03:39 <awygle> mhm

03:39 <whitequark> this causes both potential and immediate issues

03:40 <whitequark> the potential issue is that each clk1.eq(clk2) statement, in pysim (and vhdl), introduces a delta cycle. which means that logic triggered by clk2 will get evaluated infinitesimally after logic triggered by clk1

03:41 <whitequark> in cxxrtl, what happens is... well, it differs based on -O level. which is kind of evil.

03:43 <whitequark> the immediate issue is that right now, pysim and cxxsim can't agree on exactly when (and how) clock ticks happen, with cxxsim being completely wrong in practice

03:43 Degi has quit [Ping timeout: 258 seconds]

03:43 <whitequark> let's back away a bit. how *exactly* do pysim and cxxsim trigger synchronous logic?

03:44 Degi has joined #nmigen

03:44 <whitequark> pysim does something like this:

03:44 <whitequark> def eval(): for process in processes: if process.runnable: process.run()

03:45 <whitequark> def commit(): for signal in signals: if signal.commit(): for waiter in signal.waiters: waiter.runnable = True

03:46 <whitequark> so in pysim, if you're driving a clock via add_clock(), then in 1st delta cycle just the clock changes, in 2nd delta cycle all sync logic triggers, in 3rd and further delta cycles comb logic settles on a final value

03:46 <awygle> mk
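The eval/commit pseudocode and the delta-cycle sequence above can be expanded into a runnable sketch. It is an illustration under stated assumptions, not pysim's implementation: `Signal`, `Process` and `settle` are hypothetical names, the clock driver stands in for add_clock(), and the sync process checks the clock level rather than a true edge.

```python
class Signal:
    def __init__(self, value=0):
        self.curr = value
        self.next = value
        self.waiters = []  # processes to wake when this signal changes

    def commit(self):
        changed = self.curr != self.next
        self.curr = self.next
        return changed


class Process:
    def __init__(self, name, body):
        self.name = name
        self.body = body
        self.runnable = False

    def run(self):
        self.runnable = False
        self.body()


def settle(signals, processes):
    """Repeat eval then commit until nothing is runnable; log who ran per delta."""
    log = []
    while any(p.runnable for p in processes):
        ran = []
        for p in processes:              # eval phase
            if p.runnable:
                p.run()
                ran.append(p.name)
        for s in signals:                # commit phase
            if s.commit():
                for waiter in s.waiters:
                    waiter.runnable = True
        log.append(ran)
    return log


clk, count, doubled = Signal(), Signal(), Signal()


def drive_clock():
    clk.next = 1


def sync_logic():
    if clk.curr == 1:                    # simplified posedge check
        count.next = count.curr + 1


def comb_logic():
    doubled.next = count.curr * 2


clock = Process("clock", drive_clock)
sync = Process("sync", sync_logic)
comb = Process("comb", comb_logic)
clk.waiters.append(sync)
count.waiters.append(comb)

clock.runnable = True
log = settle([clk, count, doubled], [clock, sync, comb])
```

Here `log` records one list per delta cycle: only the clock driver runs in the first, the sync logic in the second, and the comb logic in the third.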

03:47 <whitequark> cxxsim does something completely different

03:48 <whitequark> value<1> p_usb__clk, prev_p_usb__clk;

03:48 <whitequark> bool posedge_p_usb__clk() const { return !prev_p_usb__clk.slice<0>().val() && p_usb__clk.slice<0>().val(); }

03:48 <whitequark> then in eval() { ... if(posedge_p_usb__clk()) { /* run synchronous logic in this domain */ } }

03:49 <whitequark> why does it compare prev and curr values in eval, instead of comparing curr and next values in commit? well, it used to do the latter.

03:50 <whitequark> that is, until I tried to make it run on par with Verilator

03:50 <whitequark> there are two differences here

03:51 <whitequark> first, p_usb__clk is a value<1>, not wire<1>. why? well, if inputs were wire<X> rather than value<X>, then you'd have a spurious delta cycle each time you modify the inputs where nothing changes except for inputs assuming their next value as their curr value

03:53 <whitequark> second, the posedge condition on p_usb__clk is checked in eval, not commit. it's basically caused by the same problem: if you check it in commit, then your main loop has to look something like: <modify inputs>; commit(); eval(); commit();, even if your netlist can be statically scheduled and immediately converges
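A rough Python model of the scheme described above, where the clock is a single-copy input and the edge is detected inside eval by comparing a saved previous value with the current one. `Top` and `step` are hypothetical names, and the point at which `prev_clk` is refreshed (here, in commit) is an assumption of this sketch rather than something stated in the discussion.

```python
class Top:
    """Hypothetical stand-in for a generated top-level module with one counter."""

    def __init__(self):
        self.clk = 0         # input, modelled like value<1>: a single copy
        self.prev_clk = 0
        self.count_curr = 0  # register, modelled like wire<>: curr and next
        self.count_next = 0

    def posedge_clk(self):
        return not self.prev_clk and bool(self.clk)

    def eval(self):
        if self.posedge_clk():
            # synchronous logic in this clock domain
            self.count_next = self.count_curr + 1

    def commit(self):
        # Assumption of this sketch: prev_clk is refreshed during commit.
        self.prev_clk = self.clk
        changed = self.count_curr != self.count_next
        self.count_curr = self.count_next
        return changed


def step(top):
    """Main loop body: one eval and one commit after the inputs are modified."""
    top.eval()
    top.commit()


top = Top()
for _ in range(3):
    top.clk = 1
    step(top)
    top.clk = 0
    step(top)
```

After three full clock cycles the counter holds 3, and each input change costs a single eval/commit pair rather than the `commit(); eval(); commit();` sequence mentioned above.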

03:55 C-Elegans has quit [Ping timeout: 244 seconds]

03:55 <awygle> for clarity, how much of this is in cxxrtl and how much in cxxsim?

03:55 <whitequark> so far everything i described lives in cxxrtl

03:55 <whitequark> actually, let me explain what cxxsim does exactly, since it's relevant

03:56 <whitequark> there's a generic simulator core that defines a few interfaces in sim._base

03:57 <whitequark> among them is BaseSignalState. that's a piece of state the simulator keeps per-signal, and it must provide .curr, .next, and .set()

03:57 <whitequark> for pysim, .curr and .next are just bigints, and .set() pokes some logic that keeps track of updated signal sets

03:59 <whitequark> for cxxsim, this entire structure is a wrapper for cxxrtl_object. .curr and .next operate on the respective fields (through some painfully expensive ctypes calls...), and .set() simply updates .next.

04:00 <whitequark> so, suppose you have a clock, and you .set() its state from 0 to 1. what happens?

04:00 <whitequark> in pysim: all of the other processes finish eval()ing, then commit() wakes up every waiter, then on the next delta cycle they eval() the synchronous logic they have

04:02 <whitequark> in cxxsim things are way more complicated! first, python code doesn't even keep python-side objects for every signal in the simulation, because that'd take forever to just pull out, nor can it generally trigger anything on commit() because... well, it doesn't control commit(), cxxrtl does

04:06 <whitequark> second, clocks are inputs, so they're all value<1>, not wire<1>, so their curr is the same as their next.

04:06 <whitequark> third, in pysim, the simulator knows about every single signal, and they're all the same

04:08 _whitenotifier-3 has quit [Ping timeout: 260 seconds]

04:08 <whitequark> in cxxsim, not all signals are the same: you have wire<>s in the compiled code, which are commit()ted by the compiled code, you have inputs in the compiled code, which are value<>s and can't be commit()ted, and you also have signals that exist purely between Python processes

04:09 <whitequark> ... but those Python-only values still have to be available on the C++ side because the VCD writer lives there

04:10 <whitequark> so they're still backed by an artificial cxxrtl_object

04:11 <whitequark> we're almost at the point where I can explain the actual problem I'm trying to solve!

04:12 <whitequark> suppose there are no Python-only signals (those are at the moment not implemented). then the cxxsim simulation loop looks like this:

04:12 <whitequark> def eval(): for process in processes: if process.runnable: process.run(); cxxrtl_top.eval()

04:14 <whitequark> def commit(): for waiter, signal, trigger in waiters: if prev[signal] != signal.curr and signal.curr == trigger: waiter.runnable = True; cxxrtl_top.commit()

04:15 <whitequark> let's say there's some synchronous logic in the generated code, plus a testbench that does `yield Tick()`

04:16 <whitequark> 1st delta cycle: the process created by add_clock() updates the clock. immediately after, cxxrtl_top.eval() evaluates the synchronous logic. then, commit() wakes up the testbench

04:17 <whitequark> 2nd delta cycle: testbench runs, reads a signal, discovers a value after the posedge instead of before the posedge, everything breaks.

04:21 <whitequark> as a bonus, it breaks in a similar albeit different way if you use -O3 instead of -Og and the clock input becomes a wire<1> instead of value<1>!

04:23 <awygle> oof
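The ordering problem described above can be reproduced with a deliberately tiny model. This is a sketch under simplifying assumptions (one clock, one register that increments on each posedge, one testbench that reads the register after a tick); the function names and the flat boolean flags are hypothetical, not how either simulator is written.

```python
def pysim_like():
    """Clock is a curr/next signal; sync logic and testbench wake on the same delta."""
    clk = {"curr": 0, "next": 0}
    reg = {"curr": 0, "next": 0}
    clock_runnable, sync_runnable, bench_runnable = True, False, False
    seen = []
    for _ in range(2):                       # two delta cycles
        # eval
        if clock_runnable:
            clock_runnable = False
            clk["next"] = 1
        if sync_runnable:
            sync_runnable = False
            reg["next"] = reg["curr"] + 1
        if bench_runnable:
            bench_runnable = False
            seen.append(reg["curr"])
        # commit
        if clk["curr"] != clk["next"]:
            clk["curr"] = clk["next"]
            sync_runnable = bench_runnable = True
        reg["curr"] = reg["next"]
    return seen


def cxxsim_like():
    """Clock is a single-copy input; compiled logic runs in the same eval as the driver."""
    clk, prev_clk = 0, 0
    reg = {"curr": 0, "next": 0}
    clock_runnable, bench_runnable = True, False
    seen = []
    for _ in range(2):                       # two delta cycles
        # eval: Python processes first, then the compiled logic
        if bench_runnable:
            bench_runnable = False
            seen.append(reg["curr"])
        if clock_runnable:
            clock_runnable = False
            clk = 1                          # visible immediately, no next copy
        if not prev_clk and clk:             # compiled eval sees the posedge at once
            reg["next"] = reg["curr"] + 1
        # commit: trigger check, then the compiled commit
        if prev_clk != clk and clk == 1:
            bench_runnable = True
        prev_clk = clk
        reg["curr"] = reg["next"]
    return seen


before_edge = pysim_like()
after_edge = cxxsim_like()
```

In the first model the testbench reads the register value from before the posedge; in the second it reads the value from after it, which is the mismatch described above.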

04:24 <whitequark> so far my effort to fix this has been limited to fiddling with the order of evaluation and triggering

04:25 <whitequark> but... as the description above should probably communicate, that can't fundamentally fix all issues. it happens to make #455 work but it can't make every example from #439 work simultaneously

04:25 <awygle> yep

04:27 <whitequark> my first semi-viable approach was to move the trigger check code out of commit() and into eval(); basically, by checking all triggers once eval() is done iterating through the processes, then, if anything changed, doing that again

04:28 <whitequark> which is (a) slow--which I tried to address by moving it to C++, and (b) as my description should hopefully make clear, a complete duplicate of eval/commit logic, nested inside eval, and also worse

04:29 Yehowshua has joined #nmigen

04:34 <whitequark> so overall i'm trying to achieve two things

04:35 <whitequark> i want cxxsim to work. i also want cxxsim to have a relatively small (ideally, zero) performance penalty compared to using cxxrtl manually.

04:36 <awygle> mhm

04:38 <whitequark> the former requires a redesign of cxxsim for reasons i just explained. the latter almost certainly requires a redesign of cxxrtl because right now cxxrtl is built around the assumption that, essentially, it is cosimulated with exactly one bench process (your loop in main()), and this makes single-pass evaluation actually possible

04:39 <awygle> and, pretending for the moment that you wouldn't have to rewrite cxxrtl to make this happen, the queued approach pysim uses is unacceptable performance-wise?

04:39 <whitequark> ... i mean it's not really possible to say without having a particular implementation in mind

04:40 <whitequark> but

04:41 <whitequark> cxxsim simulations, by virtue of being made largely from a big chunk of cxxrtl logic + a single testbench, inherently have few waiters (usually exactly one), so you really want to pay the cost of O(waiters) rather than O(signals)

04:41 Yehowshua has quit [Remote host closed the connection]

04:41 <whitequark> it's worse than just performance though

04:43 <whitequark> if we keep inputs as value<1> and not wire<1> then the simulation just becomes unsound in general

04:44 <whitequark> like, if you do clk1.eq(clk2) and translate it to C++ code you get different results than if you translate it to a Python process

04:50 <whitequark> what i think is that cxxsim should, ideally, take the existing pysim simulation loop (which is totally fine on its own), and push it down into C++ as much as possible

04:50 <awygle> sounds reasonable

04:50 <whitequark> in fact that was the insight i got, uh, two days ago

04:50 <whitequark> when i originally suggested we discuss C++ stuff i had a completely different, far more trivial set of questions :)

04:51 <whitequark> well, the problem is: how to actually do that?

04:51 Yehowshua has joined #nmigen

04:51 <whitequark> even worse: how to actually do that without completely breaking the current cxxsim interface, which people already depend on?

04:52 <whitequark> (sure, i could add a flag, but now it's even worse: there are two interfaces which do basically the same thing but in weirdly different ways)

04:54 <whitequark> oh and here's a bonus question: can we do that while making cxxrtl behave deterministically when you assign or gate clocks?

04:58 <whitequark> *without completely breaking the current cxxrtl interface, sorry

05:00 <awygle> ah yeah ok

05:00 <awygle> that's what i was thinking

05:00 <awygle> it _sounds_ to me, from this conversation and pretty much only this conversation, that cxxrtl is sort of... making assumptions about execution models

05:00 <awygle> and that ideally it would not do that

05:00 <whitequark> correct

05:00 <awygle> but that un-doing that would basically be a cxxrtl rewrite

05:00 <whitequark> well, it can't not make assumptions about execution models at all without getting a lot slower

05:01 <whitequark> not quite

05:01 <whitequark> the vast majority of cxxrtl deals with arbitrary precision arithmetic

05:01 <whitequark> the only thing that really makes any assumptions about execution models is eval and commit

05:01 <awygle> mm, k

05:02 <awygle> > it can't not make assumptions about execution models at all without getting a lot slower

05:02 <awygle> why is that, and how much is "a lot"?

05:04 <whitequark> let's see

05:04 <whitequark> so if it always emits inputs as wire<>s, you need, at least, one more commit() at the front (after your testbench changes the values)

05:06 <whitequark> but since that commit() triggers another eval(), it can no longer assume that a single pass is all it takes, so after that it does another commit() and eval()

05:06 <whitequark> which means you have a perf decrease of over 2x

05:06 <whitequark> basically, you can compare -O3 with -Og

05:06 <whitequark> lemme get some numbers in fact

05:06 Yehowshua has quit [Ping timeout: 260 seconds]

05:22 <whitequark> ... okay, so -O3 vs -Og is a slowdown of 20x

05:23 <awygle> wow that's a lot

05:23 <whitequark> that's because -O3 isn't actually equivalent to what I mentioned earlier

05:23 <whitequark> give me a sec

05:23 <awygle> it's getting towards bedtime here, so let me just kinda say where i'm at at you, and who knows if there's anything valuable in it, and then i can drop out when i need to sleep

05:25 <awygle> it feels like cxxrtl is trying to do too much at once, and there might be a cut line somewhere where you can break out the "turn verilog into c++" from the "execute a c++ simulation" part, and then you can choose an execution engine appropriate to what you're trying to do (which in cxxsim's case would be "the python loop but in c++" as discussed earlier)

05:25 <awygle> i don't know how real that perception is

05:25 <whitequark> well, yeah

05:26 <whitequark> that's what the solution is

05:26 <whitequark> the problem is actually finding that cut line

05:26 <awygle> it also sounds like probably that would break existing cxxrtl consumers

05:27 <whitequark> i think it might be possible to provide a shim or a default stub-y engine

05:27 <whitequark> but that requires first figuring out how to cut cxxrtl

05:27 <awygle> but it feels like "get it working, then solve that problem" is the way to go

05:27 <awygle> yes

05:28 <awygle> the thing you said that feels like "one step too far" to me is this:

05:28 <awygle> > [cxxrtl] flatten[s] and statically schedule[s] the entire netlist

05:28 <whitequark> wait. that's the entire point of cxxrtl

05:29 <whitequark> that's why it lives in yosys and i go through all the trouble of packaging that as wasm etc

05:29 <whitequark> instead of just... generating c++ from nmigen directly

05:29 <awygle> maybe i don't understand what you mean by "statically schedule" then

05:29 <whitequark> if you don't have feedback arcs, cxxrtl evaluates every cell exactly once

05:30 <awygle> exactly once per event, you mean

05:30 <whitequark> yeah

05:30 <whitequark> to do that, it needs to evaluate them in the dataflow graph preorder
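As an illustration of evaluating every cell exactly once, here is a small Python sketch that orders a feedback-free netlist so that producers come before consumers and then evaluates it in a single pass. The netlist encoding and the names `static_schedule` and `evaluate` are assumptions made for this example; it uses the standard library's `graphlib` (Python 3.9+) and says nothing about how cxxrtl itself computes its schedule.

```python
from graphlib import TopologicalSorter


def static_schedule(cells):
    """Order cells so that each one comes after the cells feeding it.

    cells maps a cell name to (function, list of input names). Inputs that
    are not cell names are treated as primary inputs.
    """
    graph = {
        name: [i for i in inputs if i in cells]
        for name, (_, inputs) in cells.items()
    }
    # static_order() raises graphlib.CycleError if there is a feedback arc.
    return list(TopologicalSorter(graph).static_order())


def evaluate(cells, order, primary_inputs):
    """Single pass over the schedule: every cell is evaluated exactly once."""
    values = dict(primary_inputs)
    evaluations = 0
    for name in order:
        func, inputs = cells[name]
        values[name] = func(*(values[i] for i in inputs))
        evaluations += 1
    return values, evaluations


# A toy netlist, deliberately listed out of order.
cells = {
    "out": (lambda s, c: s ^ c, ["sum", "carry"]),
    "sum": (lambda a, b: a ^ b, ["a", "b"]),
    "carry": (lambda a, b: a & b, ["a", "b"]),
}

order = static_schedule(cells)
values, evaluations = evaluate(cells, order, {"a": 1, "b": 1})
```

With a feedback arc in the netlist the sorter refuses to produce an order, which corresponds to the case where a single pass is not enough and the evaluation has to iterate.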

05:31 <awygle> why is cxxrtl evaluating cells, might be where my confusion lies

05:32 <whitequark> i, uh

05:32 <whitequark> what else would it evaluate?

05:32 <awygle> my mental model has cxxrtl creating cells

05:32 <whitequark> creating cells?

05:33 <whitequark> i have no idea what that would even mean

05:33 <awygle> generating c++ descriptions of cells, which are driven by some kind of execution engine

05:33 <whitequark> well

05:33 <whitequark> right, when i say "cxxrtl evaluates" i mean "cxxrtl generates an eval() function that..."

05:34 <whitequark> are you suggesting that cxxrtl generate, say, one process per cell or something like that?

05:37 <whitequark> awygle: fwiw, I measured the impact of turning clocks into wire<>s. ~210% runtime compared to baseline, which is basically what i'd expect

05:38 hitomi2507 has joined #nmigen

05:43 <whitequark> awygle: mh. let's continue once you wake up

05:43 <whitequark> this all was just the preface for the actually interesting discussion on how to cut cxxrtl

05:46 <awygle> yup

05:46 <awygle> gnight

05:46 <whitequark> night!

05:46 <awygle> and > "are you suggesting that cxxrtl generate, say, one process per cell or something like that?" i think so yeah, kinda

05:46 <awygle> anyway, tomorrow

05:50 jeanthom has joined #nmigen

06:02 ianloic__ has joined #nmigen

06:02 _florent__ has joined #nmigen

06:03 ianloic_ has quit [Ping timeout: 272 seconds]

06:03 ianloic__ is now known as ianloic_

06:03 _florent_ has quit [Ping timeout: 260 seconds]

06:03 _florent__ is now known as _florent_

06:03 sorear_ has joined #nmigen

06:03 sorear has quit [Ping timeout: 272 seconds]

06:03 sorear_ is now known as sorear

06:16 sorear has quit [Ping timeout: 244 seconds]

06:20 sorear has joined #nmigen

06:44 jeanthom has quit [Ping timeout: 265 seconds]

06:49 jeanthom has joined #nmigen

06:56 _whitelogger has joined #nmigen

07:04 chipmuenk has joined #nmigen

07:09 futarisIRCcloud has joined #nmigen

07:27 jeanthom has quit [Ping timeout: 246 seconds]

07:32 SpaceCoaster_ has joined #nmigen

07:33 SpaceCoaster has quit [Read error: Connection reset by peer]

08:04 Asu has joined #nmigen

08:27 lkcl_ has joined #nmigen

08:28 awe00 has joined #nmigen

08:30 lkcl has quit [Ping timeout: 265 seconds]

08:39 chipmuenk1 has joined #nmigen

08:41 chipmuenk has quit [Ping timeout: 264 seconds]

08:41 chipmuenk1 is now known as chipmuenk

08:41 awe00_ has joined #nmigen

08:43 awe00__ has joined #nmigen

08:45 awe00 has quit [Ping timeout: 246 seconds]

08:48 awe00_ has quit [Ping timeout: 264 seconds]

11:26 <lkcl_> whitequark: that sounds like a lot more work. NLNet donations can go up to EUR 3000 if you need to do more than you originally thought. https://bugs.libre-soc.org/show_bug.cgi?id=475

11:26 Asu has quit [Remote host closed the connection]

11:39 yehowshua has joined #nmigen

12:13 Asu has joined #nmigen

12:25 Asuu has joined #nmigen

12:29 Asu has quit [Ping timeout: 256 seconds]

12:40 ademski has joined #nmigen

12:49 <yehowshua> whitequark: I was thinking about what mithro said yesterday about making emitted verilog look like the original nmigen

12:49 <yehowshua> This is mostly a thought experiment

12:51 <yehowshua> But I don't see any reason why there couldn't be a crude lowerer that enumerates all signals in an elaboratable class

12:51 <yehowshua> and then connects them directly, or inserts latches between them

12:51 <yehowshua> with verilog syntax of course

12:52 <yehowshua> This of course is not the point of nmigen at all, but i remembered you said making emitted verilog more human readable would be very hard

12:52 <yehowshua> with the approach outlined above, is that still the case?

12:58 <yehowshua> Actually, I'm not so sure that would be significantly more human readable than current yosys emitted verilog

12:59 <yehowshua> synchronous/combinational are natural to nmigen, but require curr and next signal in verilog

13:11 <whitequark> lkcl_: i've yet to decide on a specific approach here. it seems increasingly likely that i can implement something relatively simple to get correctness + acceptable performance, then iterate on it

13:12 <whitequark> yehowshua: i don't understand, sorry. how would that work?

13:13 <whitequark> are you basically suggesting that i lower `x.eq(a + b + c)` into, well, similar Verilog, as opposed to something like `x.eq(a + tmp0); tmp0.eq(b + c)`?

13:13 <whitequark> or something else?

13:13 <yehowshua> yes mostly

13:14 <yehowshua> yes actually

13:14 <yehowshua> for comb statements, that becomes assign

13:14 <yehowshua> for sync statements, you need a next signal

13:15 <whitequark> the short but unsatisfying answer is that i'm already doing approximately that

13:15 <DaKnig> yehowshua: actually, in many cases $next is not required; it just simplifies the logic.

13:15 <yehowshua> yes thats what I was getting at

13:15 <yehowshua> it wouldn't be more readable than what we currently get

13:15 <whitequark> there is room for improvement

13:15 <DaKnig> I really hope that PR for yosys would get accepted

13:15 <whitequark> a significant amount of temporary signals can be eliminated

13:16 <DaKnig> where it removes temp signals that are used only once

13:16 <yehowshua> is it really worth even bothering with emitted verilog?

13:16 <whitequark> yehowshua: in theory, the yosys PR https://github.com/YosysHQ/yosys/pull/726 removes many such temporary signals

13:16 <yehowshua> Although, I sometimes read through it as a sanity check

13:16 <whitequark> check out the before/after there

13:17 <yehowshua> quite the difference!

13:17 <whitequark> see, the problem is, that PR is unsound

13:18 <DaKnig> whitequark: is this PR only removing temp signals?

13:18 <DaKnig> sometimes I use named signals as intermediates; would be a shame to remove them - makes code harder to read

13:19 <whitequark> yehowshua: Verilog has extremely complex integer promotion rules where the result of an operation is affected, among other things, by the *context* where the operation appears

13:19 <whitequark> i.e. you cannot translate nmigen "a + b" to verilog, well, "a + b" and then translate whichever node this "a + b" appears in

13:20 <whitequark> you have to consider *where* this operation would live *in order to correctly translate it*

13:20 <DaKnig> whitequark: can you show an example you had trouble with?

13:20 <DaKnig> about a + b

13:22 <whitequark> DaKnig: https://imgur.com/a/KtCEpIL

13:22 <whitequark> this is from the actual spec

13:22 chipmuenk1 has joined #nmigen

13:23 <DaKnig> well ofc; the width is different. or am I missing the point?

13:24 chipmuenk has quit [Ping timeout: 260 seconds]

13:24 chipmuenk1 is now known as chipmuenk

13:24 <whitequark> DaKnig: https://imgur.com/a/w1yRHsE

13:25 <whitequark> so, suppose you want to translate nmigen's `(a + b) >> 1` into verilog correctly

13:25 <whitequark> you have two problems. first, verilog-2001 does not have a cast operator. so in order to have `a + b` evaluate to 17 bits of precision, you *have* to use a temporary wire

13:25 <yehowshua> does nmigen consider the carry bit?

13:26 <DaKnig> in nmigen , the width of the result is always 1+max(len(a),len(b))

13:26 <DaKnig> which is then expanded to fit lhs

13:26 <whitequark> second, you can only decide whether `a + b` should be translated through a temporary wire or not by considering the context in which it appears

13:26 <whitequark> so `x.eq(a + b)` is fine: you can translate that to `assign x = a + b;`.

13:27 <whitequark> but `x.eq((a + b) >> 1)` is not fine: you have to translate it to `wire [16:0] tmp = a + b; assign x = tmp >> 1;`
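The difference between the two translations can be checked numerically with a small Python model. It assumes 16-bit `a`, `b` and `x`, takes the width of `a + b` in the direct form to be the widest of the operands and the assignment target, and takes the nMigen width to be one bit more than the operands, as stated above. The function names are hypothetical.

```python
def mask(width):
    return (1 << width) - 1


def verilog_direct(a, b, operand_width, lhs_width):
    """Model `assign x = (a + b) >> 1;` with a + b sized by its context."""
    context_width = max(operand_width, lhs_width)
    total = (a + b) & mask(context_width)
    return (total >> 1) & mask(lhs_width)


def verilog_with_tmp(a, b, operand_width, lhs_width):
    """Model `wire [W:0] tmp = a + b; assign x = tmp >> 1;` with W = operand_width."""
    tmp = (a + b) & mask(operand_width + 1)
    return (tmp >> 1) & mask(lhs_width)


a = b = 0xFFFF
direct_16 = verilog_direct(a, b, operand_width=16, lhs_width=16)
with_tmp_16 = verilog_with_tmp(a, b, operand_width=16, lhs_width=16)
direct_32 = verilog_direct(a, b, operand_width=16, lhs_width=32)
```

With a 16-bit target the direct form drops the carry out of the addition before shifting, while the temporary wire keeps it. With a 32-bit target the direct form happens to keep the carry as well, because the addition is then carried out in 32 bits.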

13:27 <DaKnig> oh no ; this sounds extra dumb.

13:27 <whitequark> thank you. this is my feelings exactly

13:27 <whitequark> it gets worse!

13:27 <DaKnig> so if you assign (a+b)>>1 to a signal that is as wide as a,b then the result wouldnt consider the carry

13:27 <whitequark> no

13:28 <whitequark> if you write `(a + b) >> 1` in verilog-2001 without using an intermediate wire, there is no way to make addition evaluate 17 bits instead of 16

13:29 <DaKnig> yes I meant the carry from bit 15 to bit 16 of the addition

13:29 <whitequark> oh, yeah

13:29 <DaKnig> which a sane language would expose after the >>

13:29 <whitequark> it actually doesn't matter what's the width of the signal you assign it to

13:29 <whitequark> you can assign (a+b)>>1 to a 32-bit signal

13:29 <whitequark> the result will still have 15 useful bits

13:29 <DaKnig> but then it would eval the right side as 32 bits?

13:30 <whitequark> nope!

13:30 <DaKnig> according to what you sent

13:30 <DaKnig> wot

13:30 <DaKnig> no?

13:30 <DaKnig> but max(32,16) = 32 so it'd eval rhs as 32bit ops? no?

13:30 <Lofty> whitequark: stupid idea for that case: a + b => a + b + N'b0

13:31 <DaKnig> ...disgusting solution. I love it.

13:31 <whitequark> let me double-check the standard

13:32 <whitequark> Lofty: this is actually what the spec recommends

13:32 <whitequark> except they don't tell you to use N'b0

13:32 <Lofty> Yeah, I remembered it from the spec

13:32 <whitequark> they tell you to use just 0, because 0 is "at least 32 bits wide" and a and b are both 16-bit

13:33 <whitequark> unfortunately, they don't explain *why* that works, so if you use that solution as-is with a 64+64 bit addition and expect a carry bit... oops

13:33 <whitequark> whether you get a carry bit is implementation-defined.

13:33 <whitequark> (generally, no)
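A short Python model of why adding a plain `0` helps for 16-bit operands but not for 64-bit ones. It assumes the idiom is written as `(a + b + 0) >> 1`, that the unsized literal is exactly 32 bits wide (the minimum mentioned above; a wider implementation would behave differently), and that the expression is evaluated at the widest width among its operands and the target. All names are hypothetical.

```python
UNSIZED_LITERAL_WIDTH = 32  # assumption: the implementation uses the minimum


def mask(width):
    return (1 << width) - 1


def sum_plus_zero_shifted(a, b, operand_width, lhs_width):
    """Model `assign x = (a + b + 0) >> 1;`."""
    context_width = max(operand_width, UNSIZED_LITERAL_WIDTH, lhs_width)
    total = (a + b + 0) & mask(context_width)
    return (total >> 1) & mask(lhs_width)


def exact_shifted(a, b, lhs_width):
    """Reference result computed with unbounded integers."""
    return ((a + b) >> 1) & mask(lhs_width)


small = mask(16)
large = mask(64)

small_ok = sum_plus_zero_shifted(small, small, 16, 16) == exact_shifted(small, small, 16)
large_ok = sum_plus_zero_shifted(large, large, 64, 64) == exact_shifted(large, large, 64)
```

For 16-bit operands the literal widens the addition to 32 bits, so the carry survives. For 64-bit operands a 32-bit literal widens nothing, and the carry is lost again.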

13:33 <yehowshua> this is getting hairy

13:34 <Lofty> In other words, write_verilog always expands `a + b` to `a + b + N'b0`. Of course, you're still fucked when it comes to the ?: operator, but

13:34 <whitequark> DaKnig: oh yeah sorry, you are right and i was misreading the spec before

13:34 <whitequark> what you said here: < DaKnig> so if you assign (a+b)>>1 to a signal that is as wide as a,b then the result wouldnt consider the carry

13:34 <yehowshua> I heard similar arguments when it came to firrtl

13:34 <whitequark> is indeed correct

13:34 <Lofty> Well, hmm. Sometimes you could elide the + N'b0, but

13:35 <whitequark> and it will evaluate the whole thing in 32 bits if you assign to a 32-bit signal

13:35 <whitequark> this is actually somehow even dumber than what I thought before

13:36 <whitequark> DaKnig: anyway, none of this is *insurmountable* per se

13:36 <whitequark> the problem is that it's a ton of work and it's extremely hard to test

13:36 <whitequark> it's not just bit width but also signedness

13:36 <whitequark> in verilog, if either of the operands is unsigned (for something like addition), the result is unsigned as well
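The signedness rule stated above can be illustrated with a small Python model that compares it against a value-preserving addition. The sketch assumes 4-bit operands assigned to an 8-bit target and operates on raw bit patterns; `extend`, `verilog_add` and `value_preserving_add` are hypothetical helper names.

```python
def mask(width):
    return (1 << width) - 1


def extend(bits, width, signed, to_width):
    """Widen a bit pattern, sign-extending only if it is treated as signed."""
    if signed and (bits >> (width - 1)) & 1:
        bits -= 1 << width  # reinterpret the pattern as a negative number
    return bits & mask(to_width)


def verilog_add(a_bits, a_signed, b_bits, b_signed, width, lhs_width):
    """If either operand is unsigned, both are treated as unsigned."""
    signed = a_signed and b_signed
    w = max(width, lhs_width)
    total = extend(a_bits, width, signed, w) + extend(b_bits, width, signed, w)
    return total & mask(lhs_width)


def value_preserving_add(a_bits, a_signed, b_bits, b_signed, width, lhs_width):
    """Each operand is extended according to its own signedness."""
    w = max(width, lhs_width)
    total = extend(a_bits, width, a_signed, w) + extend(b_bits, width, b_signed, w)
    return total & mask(lhs_width)


# signed -1 (0b1111) plus unsigned 1 (0b0001), assigned to an 8-bit target
mixed_verilog = verilog_add(0b1111, True, 0b0001, False, width=4, lhs_width=8)
mixed_exact = value_preserving_add(0b1111, True, 0b0001, False, width=4, lhs_width=8)
```

Under the first rule the signed operand is zero-extended and the sum is 16. Under the value-preserving rule it is 0, which is what -1 + 1 should give.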

13:37 <Lofty> Also where the standard overrides the user in how something is extended, like ?: always zero-extending its operands

13:37 <yehowshua> uhhh whitequark? looking at some cxxrtl...

13:37 <whitequark> yehowshua: sure?

13:37 <yehowshua> the memory is a large array of value8

13:38 <whitequark> yup

13:38 <yehowshua> hmm

13:38 <yehowshua> well, they're explicitly listed

13:38 <whitequark> yup. nmigen initializes the entire memory, then cxxrtl translates that to an explicit initializer

13:39 <whitequark> is that an issue?

13:39 <yehowshua> ah

13:39 <yehowshua> no no, just wondering

13:39 <yehowshua> Tryna understand more internals after last night's convo on cxxsim

13:39 <whitequark> Lofty: yep, which, when combined with no explicit type conversion operator in verilog-2001, makes it really hard

13:39 yehowshua is now known as Yehowshua

13:39 * Lofty waits for the inevitable write_systemverilog

13:39 <Lofty> /s

13:40 <whitequark> Lofty: `write_verilog -sv`

13:40 <whitequark> i added it recently

13:40 Yehowshua is now known as BracketMaster

13:40 <Lofty> That I know of

13:40 <whitequark> nmigen's workaround for not having always_comb interacted badly with the way it used yosys, so i had to stuff the workaround into yosys

13:40 <whitequark> i mean

13:41 <whitequark> what would write_systemverilog do if not that?

13:42 <Lofty> Well, the idea is that you can more naturally use SV constructs where available

13:42 <DaKnig> BracketMaster: I think you miss the point; having firrtl is cool, but we're talking about when the requirement is having Verilog.

13:42 <whitequark> iirc firrtl has similar issues in some places

13:43 <BracketMaster> firrtl also has firrtl sim, i wonder if its any similar to cxxsim

13:43 <DaKnig> > in verilog, if either of the operands is unsigned (for something like addition), the result is unsigned as well

13:43 <DaKnig> that's ...

13:43 <DaKnig> a very interesting choice.

13:44 <whitequark> ... the opposite of what everyone wants?

13:44 <DaKnig> if anything, signed+unsigned should be signed

13:44 <whitequark> anyway, now you can see why that PR isn't merged

13:45 <BracketMaster> hah yup

13:45 <whitequark> i'll get that working *someday*

13:45 <BracketMaster> hopefully you won't have to

13:45 <whitequark> but probably not when i need to deliver cxxsim yesterday

13:45 <BracketMaster> verilog will be a memory

13:45 <BracketMaster> a bad memory

13:45 <BracketMaster> **verilog will be but a memory

13:45 <Lofty> Well, so far all the "verilog replacements" haven't been

13:46 <BracketMaster> chisel->firrtl->yosys has seen some use

13:46 <whitequark> verilog will be with us forever, but what's worse, verilog will be with us for a very long time as a netlist interchange format

13:46 <Lofty> ^

13:46 <BracketMaster> especially at UC Berkeley and SiFive

13:46 <whitequark> nmigen would be irrelevant if it could not be used with vivado and quartus

13:46 <Lofty> whitequark: you can always dump EDIF with LPM instead /s

13:47 <whitequark> Lofty: which of the five or so things vendors call EDIF? :p

13:47 <Lofty> Well, LPM seems to mostly be a Quartus thing

13:47 <Lofty> Or more specifically

13:47 <whitequark> ah

13:48 <Lofty> Quartus still uses LPM for some reason

13:48 <DaKnig> if Verilog is bad as a netlist lang, why not use VHDL

13:48 <DaKnig> ?

13:48 <BracketMaster> VHDL synth tools seem to be few and far between

13:48 <Lofty> ...I think the main reason is a lack of a write_vhdl

13:48 <DaKnig> it's just as popular (in some places its more popular, in others it is less)

13:48 <BracketMaster> and that

13:48 <DaKnig> BracketMaster: that's just nonsense.

13:48 <DaKnig> all the big players support VHDL

13:48 <BracketMaster> ok, ghdlsynth is recent

13:49 <DaKnig> Yosys is recent

13:49 <DaKnig> if you look at big companies and their tools, they support VHDL just as much as Verilog

13:49 <BracketMaster> oh yeah, i've only been doing HDL since 2017 TBH

13:50 <Lofty> Verilog is an easier language to implement, I think, but it's not by any means "good"

13:50 <whitequark> DaKnig: there are no downsides to support VHDL as an option besides the added work

13:50 <whitequark> it's not much better as a netlist exchange format though

13:50 <Lofty> Apparently the GHDL project wants to implement a write_vhdl

13:51 <DaKnig> Lofty: how is it easier to implement exactly?

13:51 <whitequark> like, yes, it's a less bad *language*, but i don't care so much how it is as a language, i care about it as a netlist file format

13:51 <DaKnig> do you mean making a frontend for it? or spitting it?

13:51 <Lofty> A frontend

13:51 <DaKnig> the second would be just as easy as in Verilog

13:51 <DaKnig> ah.

13:52 <DaKnig> well, there's some truth to this

13:52 <Lofty> VHDL files are meaningless if you don't have a frontend

13:52 <whitequark> it's not just as easy

13:52 <whitequark> if you look at yosys' write_verilog, it has been heavily tested with vloghammer

13:52 <DaKnig> whitequark: in VHDL iirc when you add two `unsigned` values it is extended by one bit

13:52 <DaKnig> so that should already solve the issue

13:53 <DaKnig> or well, it does what nmigen does

13:53 <whitequark> i can't really *not* support verilog

13:53 <whitequark> if i support only vhdl then i exclude everyone who'd like to use iverilog or verilator on the nmigen netlists

13:53 <Lofty> DaKnig: remember that nMigen does not directly output Verilog

13:53 <whitequark> sure, this is less relevant now that cxxrtl exists, but it's still very much relevant

13:54 <DaKnig> Lofty: when you spit the output as a "netlist" thing, you are probably doing that because you are using a commercial tool; all of them have VHDL frontends

13:54 <whitequark> you could also be using an open-source tool without a VHDL frontend

13:54 <whitequark> e.g. vtr

13:55 <whitequark> so vhdl-only nmigen would exclude the prjxray flow

13:55 <whitequark> or quicklogic

13:55 <DaKnig> I dont get it; why does the open source community focus so much on Verilog?

13:56 <whitequark> the same reason it focused on C and not Ada

13:56 <DaKnig> C is more popular

13:56 <DaKnig> Verilog is not

13:57 <whitequark> for a long time VHDL didn't even have an equivalent of SVA

13:57 <whitequark> so you had to write formal properties for your VHDL design in SV

13:57 <whitequark> i think it does now

13:57 <DaKnig> SVA?

13:57 BracketMaster has quit [Quit: Leaving]

13:57 <whitequark> systemverilog assertions

13:57 <whitequark> `assert property ...`

13:59 yehowshuaimmanue has joined #nmigen

13:59 <DaKnig> I am not sure about how it has been historically but I did get to use assertions with tools that dont support VHDL 2008 properly

13:59 <DaKnig> https://www.ics.uci.edu/~jmoorkan/vhdlref/assert.html <- looks like this is pretty old

13:59 <whitequark> those are just normal assertions

14:00 <whitequark> i'm talking about formal verification

14:00 <whitequark> what symbiyosys does

14:00 <DaKnig> what's the difference

14:00 <yehowshuaimmanue> I think the reason that Verilog has so much FOSS support is because verilator was the first FOSS verilog simulator

14:00 <yehowshuaimmanue> early 2000s

14:00 <yehowshuaimmanue> Everything kinda just followed

14:00 <DaKnig> I know what formal verification is

14:00 <DaKnig> I assumed that it uses the same construct for asserting stuff as in sim asserts

14:01 <DaKnig> just reading them differently

14:01 <whitequark> nop, lots of new syntax

14:01 <DaKnig> attributing different meaning to them I mean

14:02 yehowshuaimmanue has left #nmigen [#nmigen]

14:02 yehowshuaimmanue has joined #nmigen

14:03 <whitequark> DaKnig: VHDL's version of SVA is called PSL

14:03 <whitequark> take a look at what that does

14:04 <whitequark> anyway, i don't really care *why* VHDL has so little FOSS support; that's not relevant to me as someone shipping nmigen

14:07 <whitequark> the language is more elegant, sure, but also more limited, and still based on the same flawed model of inference from code written for an event-driven simulator

14:08 <whitequark> sorry, "more limited" isn't accurate there. what i meant is that while it's more elegant, that sometimes comes at a cost

14:09 <whitequark> e.g.: no blocking assignments? good. manual clock tree balancing? not good

14:10 <DaKnig> ok fair enough :)

14:10 <whitequark> i definitely draw inspiration from VHDL more than from Verilog

14:10 <whitequark> i mean, pysim and cxxrtl are both based on VHDL simulation semantics!

14:12 <DaKnig> ... wait, does Verilog not have equivalents to wait statements?

14:12 <whitequark> sure does

14:13 <whitequark> i'm talking about this https://insights.sigasi.com/opinion/jan/vhdls-crown-jewel/

14:16 <yehowshuaimmanue> whitequark, just read it

14:16 <yehowshuaimmanue> VHDL events are ordered?

14:16 <whitequark> not all events, but you have two phases in a delta cycle

14:17 <yehowshuaimmanue> so you're guaranteed to only have to evaluate one process after changing a signal

14:17 <whitequark> first updates are queued while processes execute. then, updates are all applied at once, with no user code running in between
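
The two-phase scheme described here can be sketched with a toy scheduler in Python. This is not how any real simulator is implemented; `run_delta` and `run_immediate` are hypothetical names used only to contrast queued updates with immediate ones.

```python
def run_delta(signals, processes):
    """Two-phase step: every process reads old values, updates apply together."""
    snapshot = dict(signals)
    pending = {}
    for process in processes:
        # Phase 1: processes execute and only queue their updates.
        pending.update(process(snapshot))
    # Phase 2: all queued updates are applied at once.
    signals.update(pending)
    return signals


def run_immediate(signals, processes):
    """Single-phase step: each update is visible to the next process."""
    for process in processes:
        signals.update(process(signals))
    return signals


def copy_b_to_a(s):
    return {"a": s["b"]}


def copy_a_to_b(s):
    return {"b": s["a"]}


forward = [copy_b_to_a, copy_a_to_b]
backward = [copy_a_to_b, copy_b_to_a]

delta_fwd = run_delta({"a": 1, "b": 2}, forward)
delta_bwd = run_delta({"a": 1, "b": 2}, backward)
imm_fwd = run_immediate({"a": 1, "b": 2}, forward)
imm_bwd = run_immediate({"a": 1, "b": 2}, backward)
print(delta_fwd, delta_bwd, imm_fwd, imm_bwd)
```

With queued updates the two values swap regardless of process order. With immediate updates the result depends on which process happens to run first.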

14:17 <whitequark> hm

14:17 <yehowshuaimmanue> very interesting

14:17 <whitequark> i don't think there's any guarantee about one process?

14:18 <whitequark> two processes can wait on a signal just fine, i think

14:18 <yehowshuaimmanue> sorry, i meant only once

14:18 <yehowshuaimmanue> let me reread, I think i'm conflating two different things

14:20 <yehowshuaimmanue> ok. I think I've got it. signal updates in VHDL always come before processes, whereas in verilog, you can have [signal, process, signal, process]

14:21 <whitequark> yep

14:21 <whitequark> this is both good and bad

14:21 <whitequark> good because you, the HDL author, are freed from the responsibility of ensuring determinism regardless of scheduling

14:22 <whitequark> bad because it makes clock gating circuits (and similar stuff; e.g. DFF-based clock divider) much more annoying to express

14:23 <yehowshuaimmanue> I think I'm willing to sacrifice expressibility for guarantees, then again, I always hated writing VHDL

14:23 <DaKnig> > with no user code running in between

14:23 <DaKnig> well that's not quite true

14:24 <whitequark> oh?

14:24 <DaKnig> your variables update in the order they are written

14:24 <DaKnig> many times per process

14:24 <DaKnig> unlike signals they are not queued

14:24 <DaKnig> so if you print some debug stuff via vars, it would print it in between queuing the things for the next cycle

14:24 <whitequark> oh, yeah, i was only talking about signals

14:25 <DaKnig> also prints I think

14:25 <DaKnig> yeah, that's true about signals.

14:29 yehowshuaimmanue is now known as yehowshua

14:34 <yehowshua> I'm guessing the fact that verilog has blocking and non-blocking is the cause for unordered [signal, process] behavior during simulation?

14:35 <whitequark> yeah. specifically that it has blocking assignments

14:36 <yehowshua> But with nMigen's sync/comb, pysim signal and process can be ordered?

14:36 <whitequark> yeah

14:37 <yehowshua> nMigen is definitely more natural to write, so it seems like we get the best of both worlds.

14:37 <whitequark> ideally!

14:37 <yehowshua> any tradeoffs I'm missing?

14:37 <whitequark> clock gating

14:38 <whitequark> m.d.comb += ClockSignal("a").eq(ClockSignal("b") & en) # this introduces a delta cycle

14:38 <yehowshua> ah yes

14:38 <yehowshua> That doesn't seem like idiomatic nMigen though

14:38 <whitequark> it's how you write an architecture-independent clock gate

14:39 <yehowshua> Ah

14:39 <yehowshua> Also, something like that probably shows up in AFIFO

14:39 <yehowshua> or really anywhere you cross clocks

14:39 <whitequark> hm, not really

14:39 <whitequark> it's fine if you treat a and b as unrelated domains

14:39 <whitequark> well

14:40 <whitequark> it does show up if you try to treat a and b as in-phase and don't add CDC

14:40 <yehowshua> whats CDC? clock detection

14:40 <DaKnig> wdym by manual clock tree balancing?

14:40 <whitequark> yehowshua: clock domain crossing

14:41 <whitequark> 2FF synchronizers and such

14:41 <whitequark> DaKnig: if you drive one clock signal from another with e.g. a gate in between, you have to manually add dummy delta cycles or their posedges won't actually be in-phase
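
A toy delta-cycle model in Python can show why the gated clock ends up one delta late. Everything here is a hypothetical sketch: two flops form a shift register (`q_b` feeds `q_a`), `clk_a` is derived from `clk_b` through a combinational gate, and the scheduler is a minimal stand-in for a VHDL-style simulator.

```python
def simulate_rising_edge(gated):
    """Return (q_a, delta_count) after one rising edge of clk_b."""
    sig = {"clk_b": 0, "clk_a": 0, "en": 1, "d": 1, "q_b": 0, "q_a": 0}
    clk_for_a = "clk_a" if gated else "clk_b"

    def flop_b(old, new):
        # Flop B captures d on the rising edge of clk_b.
        if old["clk_b"] == 0 and new["clk_b"] == 1:
            return {"q_b": new["d"]}
        return {}

    def flop_a(old, new):
        # Flop A captures q_b on the rising edge of its own clock.
        if old[clk_for_a] == 0 and new[clk_for_a] == 1:
            return {"q_a": new["q_b"]}
        return {}

    def clock_gate(old, new):
        # Combinational clock gate: its output changes one delta later.
        return {"clk_a": new["clk_b"] & new["en"]}

    pending = {"clk_b": 1}  # the testbench drives the rising edge
    deltas = 0
    while pending:
        old = dict(sig)
        sig.update(pending)  # apply all queued updates at once
        deltas += 1
        pending = {}
        for process in (flop_b, flop_a, clock_gate):
            pending.update(process(old, sig))
        # Only real value changes schedule another delta cycle.
        pending = {n: v for n, v in pending.items() if sig[n] != v}
    return sig["q_a"], deltas


ungated = simulate_rising_edge(gated=False)
gated = simulate_rising_edge(gated=True)
print(ungated, gated)
```

When both flops share `clk_b`, flop A captures the old `q_b`, as a shift register should. When flop A runs from the gated clock, its edge arrives after `q_b` has already been updated, so it captures the new value. This is the situation that dummy delta cycles on the other clock path are meant to balance.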

14:42 <yehowshua> oh!

14:42 <yehowshua> i follow now

14:42 hitomi2507 has quit [Quit: Nettalk6 - www.ntalk.de]

14:48 <yehowshua> Is there an RTLIL spec somewhere? I follow its general syntax, but i'd like to know the nitty-gritty as I start to take a closer look into nMigen's guts.

14:49 <whitequark> take a look at the yosys manual

14:50 <yehowshua> I saw chapter 4.

14:50 <yehowshua> Its only 10 pages

14:50 <yehowshua> Maybe RTLIL is really that simple

14:50 <whitequark> well

14:50 <whitequark> it used to be shorter before i started documenting the more obscure parts of it

14:51 <whitequark> you can guess the general situation here

14:51 <yehowshua> hah, sure

14:51 <yehowshua> If I have obscure questions, I know who to ask

14:51 <whitequark> yup

14:51 <whitequark> and I can update the manual then

14:53 yehowshua has left #nmigen [#nmigen]

14:59 <d1b2> <marble> I'm currently trying to write a board definition script for the colorlight 5A-75B. The ~CS pin is always connected to GND according to https://github.com/q3k/chubby75/blob/master/5a-75b/hardware_V7.0.md#sdram-u29 When I leave out the cs parameter in SDRAMResource or set it to None, the script throws an exception

14:59 <whitequark> SDRAMResource needs to be modified to take this case into account

15:00 <d1b2> <marble> my quick fix was:
diff --git a/nmigen_boards/resources/memory.py b/nmigen_boards/resources/memory.py
index b2be757..9e8b1a0 100644
--- a/nmigen_boards/resources/memory.py
+++ b/nmigen_boards/resources/memory.py
@@ -103,13 +103,14 @@ def SRAMResource(*args, cs, oe=None, we, a, d, dm=None,
     return Resource.family(*args, default_name="sram", ios=io)
 
 
-def SDRAMResource(*args, clk, cke=None, cs, we, ras, cas, ba, a, dq, dqm=None,
+def SDRAMResource(*args, clk, cke=None, cs=None, we, ras, cas, ba, a, dq, dqm=None,
                   conn=None, attrs=None):
     io = []
     io.append(Subsignal("clk", Pins(clk, dir="o", conn=conn, assert_width=1)))
     if cke is not None:
         io.append(Subsignal("clk_en", Pins(cke, dir="o", conn=conn, assert_width=1)))
-    io.append(Subsignal("cs", PinsN(cs, dir="o", conn=conn, assert_width=1)))
+    if cs is not None:
+        io.append(Subsignal("cs", PinsN(cs, dir="o", conn=conn, assert_width=1)))
     io.append(Subsignal("we", PinsN(we, dir="o", conn=conn, assert_width=1)))
     io.append(Subsignal("ras", PinsN(ras, dir="o", conn=conn, assert_width=1)))
     io.append(Subsignal("cas", PinsN(cas, dir="o", conn=conn, assert_width=1)))

15:00 <whitequark> please use a pastebin

15:00 <d1b2> <marble> ah, sorry

15:01 <d1b2> <marble> https://pastebin.com/G2Ur43Ka

15:01 <d1b2> <marble> because this channel is mirrored to IRC?

15:01 <whitequark> seems about right, please send a PR

15:01 <whitequark> yeah

15:04 yehowshua has joined #nmigen

15:04 <pepijndevos> mirrored to IRC... from where?

15:05 <whitequark> 1bitsquared discord

15:05 emeb has joined #nmigen

15:06 yehowshua has left #nmigen [#nmigen]

15:07 awe00__ has quit [Ping timeout: 240 seconds]

15:20 awe00__ has joined #nmigen

15:59 <d1b2> <marble> hm, would it also have been valid to set cs="-"

15:59 <d1b2> <marble> ?

16:01 <whitequark> marble: we don't have such a syntax

16:02 <d1b2> <marble> Ah. I saw it in the connector definition of one board

16:02 <d1b2> <marble> As part of the pin list

16:02 <whitequark> it does exist in connectors

16:02 <whitequark> but not in resources

16:03 <d1b2> <marble> Ok. Thanks 🙂

16:29 chipmuenk has quit [Ping timeout: 240 seconds]

16:31 <DaKnig> should I write testbenches in python classes/functions or as Elaboratables?

16:35 <whitequark> depends on the kind of testbench you want, really

16:35 <whitequark> usually it'd be the former

17:03 <d1b2> <marble> Is it ill-advised to use -ignore_error is the svf command?

17:04 <d1b2> <marble> *in

17:04 <daveshah> Yes

17:04 <daveshah> How is it failing?

17:05 <d1b2> <marble> The first TDO, that seems to check the IdCode, mismatches and reads almost all f

17:06 <daveshah> Can you post the full OpenOCD output?

17:08 <d1b2> <marble> great. now it worked without the ignore 😄 ... sometimes it works. I used an STM running versaloon atm. maybe that's a bit unstable. i run until it fails. one sec

17:08 <d1b2> <marble> http://ix.io/2vYc

17:10 <daveshah> That's a CRC error according to the decode_status_reg tool (https://github.com/Spritetm/hadbadge2019_fpgasoc/blob/master/decode_status_reg.c)

17:11 <d1b2> <marble> ah, wait. I also had the IdCode thing. I didn't pay attention to all error log ^^'

17:11 <daveshah> Sometimes, I find a pulldown resistor or very small capacitor (10pF ish) on TCK can improve marginal JTAG links

17:12 <d1b2> <marble> Ok. I'll try that 🙂

17:12 jeanthom has joined #nmigen

17:40 jeanthom has quit [Ping timeout: 256 seconds]

19:14 jeanthom has joined #nmigen

19:38 smkz has quit [Quit: reboot@]

19:39 smkz has joined #nmigen

20:35 chipmuenk has joined #nmigen

20:56 Asuu has quit [Remote host closed the connection]

21:02 <DaKnig> lkcl_: I remember you linked me to a good tutorial for simulation; I cant find it now. mind sending that again?

21:02 <DaKnig> (lkcl or anybody else)

21:03 <DaKnig> what does `yield signal` return? a python int?

21:14 jeanthom has quit [Ping timeout: 240 seconds]

21:29 chipmuenk has quit [Quit: chipmuenk]

21:37 <cr1901_modern> whitequark: Random thought that may or may have not been brought up before: do you think it's possible as part of, maybe nmigen-stdio to have a "sanity test" gateware for testing that I/O works properly on new platforms _quickly_ before putting them into nmigen-boards?

21:37 <lkcl_> DaKnig, ehh.... it *might* have been Robert Baruch's one?

21:37 <cr1901_modern> Something like testing DRAM quickly, since leds and switches and buttons are easy

21:38 <lkcl_> DaKnig: yes, it'll return whatever value is in that nmigen Signal, at the precise moment in time in the Simulation.

21:40 <DaKnig> as a Python int?

21:40 <lkcl_> DaKnig: yes. although i am not sure what happens (what is returned) if you create a Signal(Enum)

21:41 <lkcl_> you'd have to experiment.
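
How a value comes back from `yield signal` can be illustrated with plain Python generators. The sketch below is a toy stand-in and makes no claim about nMigen's internals: `ToySignal` and `run_process` are hypothetical names, and the only point is that the driver answers each `yield` by sending back a plain `int`.

```python
class ToySignal:
    """Minimal stand-in for a simulated signal holding an integer value."""

    def __init__(self, name, value=0):
        self.name = name
        self.value = value


def run_process(process):
    """Drive a generator, replying to each yielded signal with its value."""
    generator = process()
    try:
        request = next(generator)
        while True:
            # The reply to `yield some_signal` is that signal's current value.
            request = generator.send(int(request.value))
    except StopIteration:
        pass


counter = ToySignal("counter", value=42)
observed = []


def testbench():
    value = yield counter  # receives whatever the driver sends back
    observed.append(value)


run_process(testbench)
print(observed)
```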

21:42 <DaKnig> thankfully I prefer magic numbers :)

21:42 <DaKnig> (not really please dont kill me)

21:42 <lkcl_> :)

21:43 <lkcl_> i keep creating magic constant classes, forgetting about Enums, too :)

21:58 <TiltMeSenpai> lkcl_: I'm pretty sure when you create a Signal(Enum), it uses the Enum's int mapping

21:58 <TiltMeSenpai> it only adds like the String decoding in the VCD files/Sim
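
A short standard-library sketch of that idea, assuming (as stated above) that the signal stores the enum's integer value and the member name is used only for display. `State`, `bits_needed` and `decode` are hypothetical names for illustration.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    BUSY = 1
    DONE = 2


def bits_needed(enum_cls):
    """Number of bits needed to hold the largest member value."""
    largest = max(member.value for member in enum_cls)
    return max(largest.bit_length(), 1)


def decode(raw):
    """Map a raw integer back to the member name, as a waveform viewer might."""
    return State(raw).name


raw = State.BUSY.value  # the plain int a testbench would compare against
print(bits_needed(State), raw, decode(raw))
```

A testbench written this way compares against `State.BUSY.value` rather than a magic number.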

21:59 <DaKnig> when it's already a Signal, it only tracks the width of it and signedness I think

21:59 <DaKnig> ah really? it shows the strings?

21:59 <DaKnig> that's cool

21:59 <TiltMeSenpai> in VCD's, yep

21:59 <TiltMeSenpai> Python's Enums low key suck

22:00 ademski has quit [Ping timeout: 240 seconds]

22:02 <awygle> i think it would be great to have models in nmigen-stdio for testing, cr1901_modern

22:06 <lkcl_> TiltMeSenpai: ahh yes that makes sense

22:06 <lkcl_> DaKnig: except if you run vcd2fst, then it doesn't preserve the string mappings

22:07 <lkcl_> i found that out when doing "repairs" to litex vcd files using vcd2fst

22:07 <cr1901_modern> Like, I want to add Cora Z7 to nmigen. And I'd like to eventually do my own DRAM controller for fun (for some metric of fun). But I don't want to write one for testing

22:08 <cr1901_modern> Just one that verifies that I created a reasonable UCF

22:08 <DaKnig> lkcl_: vcd2fst the commandline tool you mean, right?

22:09 <lkcl_> DaKnig: yes. it's a binary alternative to vcd ascii files, and takes up a *lot* less space

22:09 <DaKnig> yeah ik

22:09 <DaKnig> I used that

22:09 <lkcl_> oh cool

22:09 <DaKnig> that's a bummer

22:10 <DaKnig> does fst not have this construct?

22:10 <lkcl_> i ran into difficulties with litex sim vcd and fst output. converting "solved" the problem

22:10 <DaKnig> or is that a limitation of the tool?

22:10 <lkcl_> honestly don't know. it could well have just been that it was because i had to use verilog (to get into litex) that "lost" the string information

22:11 <lkcl_> i didn't investigate: i just put up with it :)

22:11 <DaKnig> I should do this more :)

22:11 <lkcl_> lol

22:12 * lkcl_ HA! finally, after *three weeks* i have parity with a microwatt "randomly-generated" unit test in libresoc

22:13 <lkcl_> unnnbelievable. 3 weeks of examining side-by-side diffs of running 32 *THOUSAND* POWER9 instructions

22:14 <DaKnig> I think it would be cool to have vcd output starting at a certain point

22:15 <DaKnig> so in cases like this, you would write a program that checks the 1000 cycles before the first diff

22:15 <lkcl_> yyyeah litex sim.py allows that.

22:15 <DaKnig> did you really need all the 32k inst. to get the bug?

22:15 <lkcl_> it's one of the command-line options. yeah it would be really useful

22:16 <DaKnig> for some reason I doubt

22:16 <lkcl_> anton blanchard wrote a POWER9 "random instruction generator". some of them are illegal instructions

22:16 <lkcl_> by running it i've found.... about.... 15 separate and distinct bugs, minimum

22:17 <lkcl_> or, no, that's not true. they're not illegal instructions: they're instructions where you're required to ignore certain flags in some circumstances, such as mulhw

22:18 <lkcl_> mulhw does *not* have an "overflow" variant (mulhwo) in POWER9

22:18 <DaKnig> I work with the principle "solve the first issue; rerun tests"

22:19 esden has quit [Ping timeout: 272 seconds]

22:19 <lkcl_> well in this case, that's absolutely critical, because at the very first error, the register files are, obviously, corrupted

22:19 <DaKnig> I would have given up looking at those traces and would have generated a few new tests after fixing the first issue

22:19 esden has joined #nmigen

22:19 <lkcl_> oh i definitely gave up looking at the traces :)

22:20 <lkcl_> instead i got litex sim.py to create log files dumping the regfiles

22:20 <lkcl_> used that to identify the instruction that was wrong, created a unit test for it and then specifically debugged the traces for that (small, one) unit test

22:21 <DaKnig> what'd I get if I `yield` a Record in sim?

22:21 <lkcl_> repeat that for 3 weeks solid and you create a *lot* of unit tests :)

22:21 <lkcl_> it will concatenate the constituents as if it was a contiguous block of bits

22:21 <lkcl_> and return that as an int for you

22:21 <DaKnig> :(

22:22 <DaKnig> getting a class that was generated on the spot would have been cooler (but more expensive ofc)

22:22 <lkcl_> so a Record containing a 4-bit value and a 1-bit value will return an int which, if you do bin(value), 0bNMMMM, M will be the 1st item of the record, N will be the 2nd

22:22 <lkcl_> you can just do "yield therecord.thememberoftherecord"

22:23 <lkcl_> you're not forced into a situation where you have to yield the record then *manually* unpack the bits using python int mask-manipulation if that's what you're thinking
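
The bit layout described above (first field in the least significant bits) can be sketched in plain Python. `pack_record` and `unpack_record` are hypothetical helpers, not part of nMigen.

```python
def pack_record(fields):
    """Concatenate (name, width, value) fields, first field at the LSB end."""
    value = 0
    offset = 0
    for _name, width, field_value in fields:
        value |= (field_value & ((1 << width) - 1)) << offset
        offset += width
    return value


def unpack_record(value, layout):
    """Split an integer back into fields given a (name, width) layout."""
    result = {}
    offset = 0
    for name, width in layout:
        result[name] = (value >> offset) & ((1 << width) - 1)
        offset += width
    return result


# A 4-bit field "m" followed by a 1-bit field "n", as in 0bNMMMM.
packed = pack_record([("m", 4, 0b1010), ("n", 1, 1)])
fields = unpack_record(packed, [("m", 4), ("n", 1)])
print(bin(packed), fields)
```

Yielding the individual members, as suggested above, avoids the manual unpacking at the cost of one yield per field.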

22:23 <DaKnig> yeah I know that

22:23 <lkcl_> ok :)

22:24 <lkcl_> just checking

22:24 <DaKnig> that would require a few yields toh

22:24 <DaKnig> tho

22:24 <DaKnig> obv that's quite slow

22:24 <DaKnig> I am making a VGA tb

22:24 <DaKnig> (a very simple one)

22:24 <lkcl_> yes. i don't have a problem with that, but i am not counting on pysim being "very fast"

22:24 <lkcl_> nice

22:24 <DaKnig> I wanted to minimize the amount of control flow

22:25 <awygle> lkcl_: vcd2fst is based on the same gtkwave code, right? afaik that code is the only fst implementation

22:25 <lkcl_> sounds like a case of "premature optimisation" :)

22:25 <lkcl_> awygle: 1 sec let me run dpkg -l gtkwave

22:25 <DaKnig> pysim, without any process attached, under pypy3 finishes a frame in a reasonable amount of time

22:25 <lkcl_> awygle: answer yes it's part of gtkwave

22:25 <DaKnig> by that I mean "a few minutes"

22:26 <DaKnig> with a process that ought to be much slower

22:26 <DaKnig> esp since I would have to run that many many times

22:26 <DaKnig> might even use Cython on that problem if things get too dire

22:27 <DaKnig> well... that wouldnt help much.

22:27 <lkcl_> DaKnig: well, this is what cxxrtl is for, to give that speedup without sacrificing readability

22:27 <DaKnig> guess cxxrtl would be better

22:27 <DaKnig> ah you said it already :)

22:27 <DaKnig> I read that it has some issues though

22:27 <DaKnig> differences from pysim?

22:28 <lkcl_> DaKnig: mmm.... cython... yyyeah, i vaguely recall a conversation with whitequark about it, 2 years ago. the conclusion was - i think - "cython wouldn't help"

22:28 <lkcl_> cxxrtl's goals are to be *exactly* a drop-in replacement for pysim. 100% fully compatible.

22:29 <DaKnig> yes cython would not help here

22:29 <DaKnig> well goals are one thing; I have reality facing me right now.

22:29 <lkcl_> as in: if cxxrtl doesn't generate the same results as pysim, that's a show-stopping bug.

22:29 <lkcl_> well... you could always try cocotb.

22:29 <DaKnig> cython falls back to the "python way" way too often

22:30 <lkcl_> it compiles code under verilator then annotates it sufficiently to interact with it from python. it's... conceptually similar to cxxrtl.

22:30 <lkcl_> staf from chips4makers uses it (with nmigen).... 1sec...

22:30 <lkcl_> https://gitlab.com/Chips4Makers

22:30 <DaKnig> I dont wanna waste time learning a new tool. I dont have that extra time right now.

22:31 <lkcl_> apologies, _one_ of those repos has...

22:31 <DaKnig> no amount of "but its simple!" would help there.

22:31 <lkcl_> ok that answer saves me some time :)

22:31 <DaKnig> thanks for the suggestion

22:31 <DaKnig> I appreciate that

22:32 <lkcl_> i was going to say that staf's code would help show you how to use cocotb with nmigen.

22:32 <lkcl_> i'm in a similar situation with the microwatt-libresoc side-by-side comparisons

22:32 <lkcl_> not all registers are accessible in microwatt simulations

22:33 <lkcl_> so i have to *learn VHDL* in order to add the code needed to access those registers so that i can see them and compare against...

22:33 <lkcl_> urrr :)

22:37 <DaKnig> does GHDL not show internal signals?

22:37 <DaKnig> in the VCD

22:40 <lkcl_> there's a bit of a problem with GHDL. records are converted to flattened bit-arrays.

22:41 <Chips4Makers> Main reason why I needed to use cocotb is that I used VHDL/Verilog code wrapped in nmigen using Instance. pysim can handle only pure nmigen. Also was already using cocotb with omigen before going to nmigen; have to admit pysim did not click with me yet although I used it in some tests I did for nmigen-soc code.

22:41 <lkcl_> after learning that i didn't even bother looking at vcd files generated from GHDL. or, i did, once, to confirm it.

22:41 * lkcl_ waves to Staf

22:41 <lkcl_> Chips4Makers: do you happen to know, is it faster than pysim?

22:42 <lkcl_> obviously, it is c code (because it's verilator), but does the interaction through python result in a slowdown that makes it effectively no different in speed from pysim?

22:45 <Chips4Makers> @lkcl_ It will depend on simulator you use. I did use it with GHDL for VHDL code, iverilog for Verilog code or (32bit) ModelSim for mixed RTL code. I did not do big runs with it, so was not so worried about speed, otherwise would certainly have looked at verilator for Verilog simulations.

22:46 <DaKnig> I have a testbench process; how can I give it arguments?

22:46 <lkcl_> Chips4Makers: thank you

22:47 <lkcl_> DaKnig: do you have some code online anywhere? there's a couple of techniques, i can point you at some example code, 1 sec

22:47 <Chips4Makers> @lkcl_ Yes, read on gitter that main thing for optimizing speed is to limit round trips from simulator to python and back.

22:47 <Chips4Makers> For example not too wise to generate your clock in cocotb.

22:48 <lkcl_> DaKnig: https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/test/test_inout_mux_pipe.py;hb=HEAD

22:48 <DaKnig> https://paste.debian.net/1162366/

22:48 <lkcl_> Chips4Makers: that makes sense

22:48 <lkcl_> DaKnig: lol. _that_ level of TODO :)

22:48 <DaKnig> ?

22:48 <DaKnig> no

22:48 <DaKnig> I have the code

22:49 <DaKnig> it *should* work

22:49 <DaKnig> I think

22:49 <DaKnig> I just dont think pasting it would help much

22:49 <lkcl_> ok so the example link above, you can see i created a class, called InputTest

22:50 <lkcl_> it takes the "dut" as an argument in its constructor, storing it for later use

22:50 <lkcl_> however the key here is the fact that i pass in python arguments into the processes - inputtest.send() and inputtest.recv()

22:51 <lkcl_> which *change* the behaviour of the test.

22:51 <lkcl_> probably the most important thing for you is the creation of the class itself

22:52 <lkcl_> you can pass in vga_mode, hsync, vsync, color, as arguments in the constructor of the class (along with the dut)
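
The class-based pattern described above can be sketched without nMigen at all. In this toy version the "dut" is just a dict and `run_round_robin` is a hypothetical stand-in for a simulator; the real linked test differs in detail. The point is only that the constructor carries the Python-side arguments and the bound methods serve as the processes.

```python
class InputTest:
    """Holds the device under test plus the Python-side test parameters."""

    def __init__(self, dut, values):
        self.dut = dut
        self.values = values
        self.received = []

    def send(self):
        for value in self.values:
            self.dut["data"] = value  # drive an input
            yield                     # hand control back to the scheduler

    def recv(self):
        for _ in self.values:
            self.received.append(self.dut["data"])  # sample the output
            yield


def run_round_robin(*processes):
    """Step each generator in turn until all of them have finished."""
    active = [process() for process in processes]
    while active:
        for generator in list(active):
            try:
                next(generator)
            except StopIteration:
                active.remove(generator)


test = InputTest(dut={"data": None}, values=[3, 1, 4])
run_round_robin(test.send, test.recv)
print(test.received)
```

Arguments such as a VGA mode or sync polarities would go into the constructor in the same way as `values` does here.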

22:52 <DaKnig> what's dut?

22:53 <ktemkin> Device Under Test

22:53 <lkcl_> convention "device under ..." lol beat me to it, ktemkin :)

22:53 <DaKnig> I was thinking about just passing it the signals it should care about

22:53 <lkcl_> there's another technique which Jacob used

22:54 <DaKnig> wait; because it's a generator, doing `a=foo()` doesnt run it, but just returns an iterator which I can pass to nmigen..?

22:54 <lkcl_> ahh you have to watch out for that. i can't remember the exact details. it's something like... if you try to do "comb +=" or "sync +=" *after* you've created the Simulation object, you hose the internal state.

22:55 <lkcl_> basically, yes
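
The Python-level behaviour in question can be checked in isolation. The sketch below uses hypothetical names (`vga_testbench`, `make_testbench`) and no nMigen code. Whether a given simulator API wants the generator object or a zero-argument generator function is an assumption to verify against the nMigen version in use; the closure covers the second case.

```python
import inspect

log = []


def vga_testbench(hsync, vsync, frames):
    log.append("body started")
    for frame in range(frames):
        yield (hsync, vsync, frame)


# Calling a generator function runs none of its body yet.
pending = vga_testbench("hs", "vs", frames=2)
started_early = bool(log)

# The body only runs once something iterates over the generator.
steps = list(pending)


def make_testbench(hsync, vsync, frames):
    """Bind arguments and return a zero-argument generator function."""
    def process():
        yield from vga_testbench(hsync, vsync, frames)
    return process


process = make_testbench("hs", "vs", frames=1)
is_generator_function = inspect.isgeneratorfunction(process)
print(started_early, steps, is_generator_function)
```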

22:55 <DaKnig> I dont do either

22:55 <DaKnig> that testbench doesnt really care about the nmigen side besides reading the values and putting them on screen

22:56 <lkcl_> look again at the file https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/test/test_inout_mux_pipe.py;hb=HEAD

22:56 <lkcl_> line 217 declares an instance of the dut (device under test)

22:56 <lkcl_> line 223 declares the object that *tests* that file

22:56 <lkcl_> sorry, tests that dut

22:57 <lkcl_> i need to find you a better example

22:57 <DaKnig> I saw that

22:58 <DaKnig> I think this is a perfect example. in 224:228 you send the functions with the right arguments you wanted

22:58 <lkcl_> try this one https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/experiment/test/test_compalu_multi.py;hb=HEAD

22:58 <DaKnig> with ()

22:58 <lkcl_> yes but unfortunately it's not been updated in... a year, and run_simulation() is deprecated

22:59 <lkcl_> cesar[m] wrote the test_compalu_multi.py much more recently and it uses the latest nmigen simulation API

23:00 <lkcl_> yeah, that's a better example. you can see several different techniques, there.

23:00 <DaKnig> what I said is on the python level; not on nmigen's side.

23:00 <DaKnig> lemme test that.

23:00 <lkcl_> simple "function" based simulation tests, like scoreboard_sim_fsm()

23:01 <lkcl_> but if you *really* want to go for multiple processes, you want the CompUnitParallelTest() class

23:02 <lkcl_> i leave it with you, it's midnight here now

23:02 <DaKnig> thanks

23:09 cr1901_modern has quit [Quit: Leaving.]

23:10 <DaKnig> add_sync_process allows reading multiple values between clocks, right?

23:11 <DaKnig> asking because it looks like its `wrapper` `yield`s a `Tick`

23:12 <lkcl_> DaKnig: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/experiment/test/test_compalu_multi.py;hb=HEAD#l418

23:18 lkcl__ has joined #nmigen

23:21 lkcl_ has quit [Ping timeout: 246 seconds]

23:26 <DaKnig> I get this error when running my design : https://paste.debian.net/1162369/

23:26 <DaKnig> what'd I do wrong?

23:26 <DaKnig> (that's just partial code ofc)

23:32 cr1901_modern has joined #nmigen

23:38 <DaKnig> nvm; really dumb mistake. using the same name for diff things.

23:58 emeb has quit [Quit: Leaving.]