ChanServ changed the topic of #nmigen to: nMigen hardware description language · code at · logs at
smkz has quit [Quit: smkz]
smkz has joined #nmigen
smkz has quit [Client Quit]
smkz has joined #nmigen
Degi has quit [Ping timeout: 272 seconds]
Degi has joined #nmigen
zignig_ is now known as zignig
jeanthom has joined #nmigen
Asu has joined #nmigen
hitomi2500 has joined #nmigen
<_whitenotifier-f> [nmigen-boards] igrr opened issue #63: UARTResource flow control signals direction mismatch for ICEStickPlatform -
<_whitenotifier-f> [nmigen-boards] whitequark commented on issue #63: UARTResource flow control signals direction mismatch for ICEStickPlatform -
<_whitenotifier-f> [nmigen-boards] whitequark commented on issue #63: UARTResource flow control signals direction mismatch for ICEStickPlatform -
<_whitenotifier-f> [nmigen-boards] igrr commented on issue #63: UARTResource flow control signals direction mismatch for ICEStickPlatform -
hitomi2500 has quit [Quit: Nettalk6 -]
<_whitenotifier-f> [nmigen-boards] whitequark commented on issue #63: UARTResource flow control signals direction mismatch for ICEStickPlatform -
hitomi2500 has joined #nmigen
hitomi2500 has quit [Read error: Connection reset by peer]
hitomi2500 has joined #nmigen
<_whitenotifier-f> [nmigen-boards] igrr commented on issue #38: Add Digilent Genesys2 board. -
<_whitenotifier-f> [nmigen-boards] igrr commented on issue #63: UARTResource flow control signals direction mismatch for ICEStickPlatform -
<_whitenotifier-f> [nmigen-boards] whitequark closed issue #38: Add Digilent Genesys2 board. -
<agg> whitequark: re adding probes in sim: I've pretty much always dumped all state because it's generally been much quicker to do that than work out what I need, run sim, find out I was wrong, add more state, re-run, repeat
<agg> in general I've also only cared about build+run times for sim, since mostly I'm building it once then running it once then re-building and re-running
<agg> so total build+run time for dumping all state could be 10x worse than dumping just a few bits of state and I'd still take it for not having to iterate a few times, at least in the scale of design I've been testing so far
<whitequark> agg: but if the sim runs at near 100%, simulating a real CPU at half a MCPS...
<whitequark> compared with something like 30-50 times slower when writing all state
<agg> all my testing so far is of large functional blocks rather than the entire system and none of it has included significant soft cores, so that will colour my results
<whitequark> do you need cxxsim for those?
<agg> I imagine there are people who build the sim of their whole system then run the same stuff multiple times with different runtime inputs?
<agg> yea
<whitequark> is pysim too slow but cxxsim "too fast"?
<agg> or rather I've needed verilator for them in the past
<whitequark> right
<whitequark> by the way, can verilator dump all state?
<agg> specific use case that comes to mind was a large order decimating DSP filter
<agg> need to run 10000s of cycles to even flush the FIRs
<agg> so it's not all that much logic but i want to run it for a long time
<whitequark> oh i see
<whitequark> btw can you try that on cxxrtl?
<agg> I did, it worked great, that's where i had some comparison numbers a while back
<whitequark> ohh
<agg> re regilator, I've only ever used it in "dump all state" mode, not sure it does anything else
<agg> verilator*
<whitequark> I mean, is it actually all state in the sense of all public wires?
<agg> e.g. VerilatedVcdC* tfp = new VerilatedVcdC; top->trace(tfp, 99);
<agg> tfp->open("trace.vcd"); for(;;) { top->eval(); tfp->dump(t++); } tfp->close();
<agg> that sort of thing
<whitequark> or just all of the state it can offer to you for dumping?
<agg> hmm, it's every signal I've written in nmigen
<agg> I don't select any state for it to dump, I just tell it to "dump" at every timestep and it writes a vcd with all the signals in
<agg> where "all the signals" means every wire and reg, I suppose
<whitequark> fascinating
<whitequark> how much faster is it in that mode compared to a mode where you don't do that?
<agg> with no dumping at all it's 42ms, with dumping all state it's 682ms, for this particular setup I had loaded
<whitequark> so 16x slower, more or less
<whitequark> can you compare it with cxxrtl?
<whitequark> master should let you dump some state but not all state, so the difference should be smaller if i'm doing things right
<whitequark> i'm also working on an approach closer to what verilator does
<whitequark> what i *think* it does, anyway
<agg> I have some old testing results from 2020-04-22 where cxxrtl took 1.8x the execution time of verilator with neither doing any vcd
<agg> over 5M clocks in this case
<whitequark> was that using any hacks? like assigning posedge_clk
<agg> I've still got that cxxrtl test rig but i'm in the office today and this pc doesn't have yosys built
<whitequark> hey, i can build a wasm binary for you :p
<agg> I think that was without void_my_warranty
<agg> I can build yosys master here. what do I need to change in my c++ to dump a vcd?
<whitequark> actually, hold on, for 5Mcycles that's a bit annoying
<whitequark> this might work better
plaes_ has quit [Quit: Reconnecting]
plaes has joined #nmigen
plaes has joined #nmigen
plaes has quit [Changing host]
<whitequark> sorry, no, that's broken
<agg> it looks like "make install" in yosys is not copying cxxrtl_capi.h into PREFIX/yosys/include/backends/cxxrtl ?
<whitequark> ugh i forgot about that
<whitequark> i'll fix it in the next PR
<whitequark> please copy it manually for now
<agg> ah, also need to copy cxxrtl_vcd.h and include it
<agg> atm the autogenerated cpp seems to include the other files, should it also include vcd?
<whitequark> but you could also just compile it as a separate TU and link
<whitequark> it's a thin wrapper
<agg> ok, with same 5M cycles and same stimulus, verilator dumping all state takes 8.37s and yosys latest dumping whatever it's dumping takes 5.47s
<agg> verilator's vcd is 1.5G while yosys' is 533M
<whitequark> yep, that sounds very much about right to me
<whitequark> maybe on the impressive side, since verilator has vcd tightly integrated into the core AFAIK
<whitequark> and you could just write an FST dumper using nothing other than public interfaces and stable(ish) ABI
<agg> and with neither dumping any state but otherwise same setup, verilator 323ms, yosys 709ms
<whitequark> so it regressed a bit
<whitequark> curious
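The figures agg quotes for the 5M-cycle run can be put side by side with a few lines of Python. This is only a sketch of the arithmetic: the numbers are copied from the log, "1.5G" is assumed to mean 1500 MB, and the helper names are made up for illustration.

```python
# Timings quoted in the log for the same 5M-cycle stimulus.
CYCLES = 5_000_000
runs = {
    # "1.5G" in the log; assumed here to mean 1500 MB.
    "verilator": {"dump_s": 8.37, "nodump_s": 0.323, "vcd_mb": 1500},
    "cxxrtl": {"dump_s": 5.47, "nodump_s": 0.709, "vcd_mb": 533},
}

def dump_overhead(run):
    """How many times slower a run gets once VCD dumping is enabled."""
    return run["dump_s"] / run["nodump_s"]

def ns_per_cycle(run):
    """Average wall-clock time per simulated cycle without dumping."""
    return run["nodump_s"] / CYCLES * 1e9

for name, run in runs.items():
    print(f"{name}: dumping is {dump_overhead(run):.1f}x slower, "
          f"{ns_per_cycle(run):.1f} ns/cycle without dumping")

# Ratios of cxxrtl to verilator.
nodump_ratio = runs["cxxrtl"]["nodump_s"] / runs["verilator"]["nodump_s"]
vcd_ratio = runs["cxxrtl"]["vcd_mb"] / runs["verilator"]["vcd_mb"]
print(f"cxxrtl/verilator, no dumping: {nodump_ratio:.2f}x")
print(f"cxxrtl/verilator, VCD size: {vcd_ratio:.2f}x")
```

The no-dumping ratio comes out higher than the 1.8x from the 2020-04-22 results, which is what "regressed a bit" refers to.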
<agg> there was quite a bit of variation depending on compiler flags and clang vs gcc when i was testing before
<whitequark> i think cxxrtl works best with clang
<agg> this yosys result is clang++ -flto -O3
<whitequark> no idea if -O3 is doing anything besides slowing down compiles
<whitequark> it's possible there's no improvement over -O2
<agg> O2 gives 747ms
<agg> (vs 709)
<whitequark> huh
<whitequark> by "not dumping any state" you mean you commented out all the vcd bits right?
<agg> yes
hitomi2500 has quit [Quit: Nettalk6 -]
<agg> and the debug bits and fstream bits
<whitequark> okay cool. any warnings in the yosys log?
<agg> none
<agg> vcd seems to contain all the signals I ever want to look at, I suspect verilator is being penalised by not flattening so outputs of some modules are also showing up as inputs of others?
<agg> verilator vcd does have the module hierarchy since it's not flattened, though
<whitequark> ok so cxxrtl vcd will have the module hierarchy after flattening
<whitequark> you can merge into your local branch
<whitequark> and then hierarchy will magically appear in the VCD
<whitequark> *but* when you have a signal present exactly the same on multiple hierarchy levels you'd just see it on one, often not what you expect
<whitequark> my plan is to first fix that
<whitequark> since it's truly zero cost
<agg> heh, this poor desktop was not made for building yosys
<agg> sweet, just as promised the vcd magically has all the hierarchy
<agg> whitequark: hmm, it looks like it suffers from
<whitequark> oh
<whitequark> i discovered that's not actually an issue
<whitequark> oh hm
<whitequark> so there's an easy fix
<whitequark> instead of: top.debug_info(debug);
<whitequark> write: top.debug_info(debug, "\\top ");
<agg> yep, that fixes it
<agg> perhaps it could be the default? it's pretty annoying to not have a 'top' module in the gtkwave tree, makes it quite non-obvious how to get at top-level signals
<agg> (I know after you pointed it out in #23 there that i can click a module again to unselect it)
<whitequark> i feel like this is more of a gtkwave issue but i'm not strongly opposed
<whitequark> so, do you find the hierarchical output sufficient, or do you think it's unusable without alias tracking?
<agg> it seems completely fine to me
<agg> if anything i prefer not having the duplicates
<whitequark> interesting
<agg> this is a very hierarchical design where it's pretty obvious where all the inputs come and go, it might be more annoying in designs where that stuff is less apparent
<agg> like i have lots of repeats of lots of units making up a big tree
<whitequark> okay, that tells me it might be useful to have it as a separate -g level
<whitequark> like -g1 is just the non-optimized public wires, -g2 is non-optimized public wires plus aliases
<agg> sure, makes sense. I'm not too precious either way and don't know which is a more useful default
<agg> but i imagine most people will just use the default behaviour anyway?
<whitequark> let me try to add aliases and see what you think about it
<whitequark> on Minerva I found the lack of alias tracking fairly limiting, though I should get actual stats on that
<agg> the vcd being 1/3 the size compared to verilator sure helps with opening it in gtkwave
<agg> but compare, yosys vs verilator:
<agg> perhaps i see now why verilator's vcd is 3x the size, hah
<whitequark> ohhhhhh
<whitequark> so one problem with verilator is that it even dumps the $-wires
<whitequark> which cxxrtl won't ever consider, not even at highest possible debug level
<agg> I'm pretty sure I never want to look at them either, so...
<agg> updated with new screenshot of \seq which contains a hundred signals
<agg> basically all useless
<whitequark> yeah so this is actually an artifact of how nmigen works
<whitequark> or perhaps nmigen's verilog workflow
<whitequark> it's not fair to verilator, in the sense that handwritten verilog wouldn't result in that
<agg> sure
<agg> right, I have to go hit some things with spanners for a bit, thanks for the vcd help! happy to try out branch with aliases later or give other performance numbers
<whitequark> thanks! will have it in a bit
hitomi2500 has joined #nmigen
<whitequark> awygle: lays the groundwork for using hierconn for probe insertion
<whitequark> like, in a robust way
<Sarayan> wq: if I want to play with your latest and greatest, which branch of which repo should I use?
<whitequark> uhhhhhhhhh
<Sarayan> Or should I wait a little for things to land?
<Sarayan> I have a big enough todo list to wait
<whitequark> flatten-hdlname of whitequark/yosys is good
<whitequark> but the most interesting stuff is still not pushed
<whitequark> well
<whitequark> the basic VCD support is working nicely
<Sarayan> the c++ api is there?
<whitequark> yep, and the c api too
<whitequark> (i do c++ first)
<Sarayan> FUCK
<whitequark> eh?
<Sarayan> ah good, thanks git
<Sarayan> managed to "undelete" the slang stuff I was working on for a while
<Sarayan> until I realized that I really need to understand verilog better before I can actually do it
jeanthom has quit [Remote host closed the connection]
jeanthom has joined #nmigen
<_whitenotifier-f> [nmigen-boards] igrr commented on issue #63: UARTResource flow control signals direction mismatch for ICEStickPlatform -
<_whitenotifier-f> [nmigen-boards] igrr commented on issue #63: UARTResource flow control signals direction mismatch for ICEStickPlatform -
<_whitenotifier-f> [nmigen-boards] whitequark commented on issue #63: UARTResource flow control signals direction mismatch for ICEStickPlatform -
<Degi> How can I tell nextpnr which frequencies clock domains run at?
<whitequark> via nmigen or do you want to know specifically nextpnr syntax?
<Degi> Via nmigen
<whitequark> do you create a clock domain via SERDES or PLL or sth?
<Degi> Hm yes via serdes, though I know the frequency
<Degi> nextpnr seems to think it's 12 MHz
<whitequark> platform.add_clock_constraint(clk_signal, 125e6)
<Degi> Thanks!
<awygle> morning
<Degi> Heya
SpaceCoaster_ has joined #nmigen
SpaceCoaster_ has quit [Remote host closed the connection]
SpaceCoaster has quit [Quit: ZNC 1.6.5+deb1+deb9u2 -]
SpaceCoaster has joined #nmigen
SpaceCoaster_ has joined #nmigen
SpaceCoaster_ has quit [Client Quit]
<whitequark> run yosys with -g so that cxxrtl shows some statistics about debug info
<whitequark> also don't forget to check vcd size before and after
hitomi2500 has quit [Quit: Nettalk6 -]
<agg> whitequark: before and after both 533M
<agg> Debug information statistics for module top:
<agg> Member wires: 47
<agg> Alias wires: 57
<agg> Other wires: 97 (no debug information)
<whitequark> yup
<agg> looks nice
<whitequark> so you got 57 new items in hierarchy for free
<whitequark> they're actually aliased in the vcd
<agg> actually very manageable number of aliases, no complaint about this
<whitequark> check it
<agg> yea, that's very neat
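One way a VCD file can carry aliases at no extra cost is to declare several names with the same identifier code, so a single value change updates all of them. The Python sketch below only illustrates that file-format feature; it is not how cxxrtl_vcd is implemented. The function name is hypothetical, the two signal names are borrowed from agg's later example, and the 8-bit width is an assumption.

```python
def write_vcd_with_alias(changes):
    """Return VCD text in which two names share one identifier code ("!").

    `changes` is a list of (time, value) pairs for the shared 8-bit value.
    """
    lines = [
        "$timescale 1 ns $end",
        "$scope module top $end",
        # Both declarations use the identifier code "!", so a viewer shows
        # two traces backed by the same value changes.
        "$var wire 8 ! coeff_mem_r_data [7:0] $end",
        "$var wire 8 ! mac_b [7:0] $end",
        "$upscope $end",
        "$enddefinitions $end",
    ]
    for time, value in changes:
        lines.append(f"#{time}")
        lines.append(f"b{value:08b} !")  # one change record serves both names
    return "\n".join(lines) + "\n"

vcd_text = write_vcd_with_alias([(0, 1), (10, 255)])
print(vcd_text)
```

Because the alias adds only a declaration line, the file size barely changes, which fits the "before and after both 533M" observation.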
<whitequark> another thing i'm going to do is to handle constants
<whitequark> ie wires which are just hardwired to a constant
<agg> this looks great as is, I don't think I'd actually want to not have the aliased signals
<whitequark> sweet :D
<agg> yea, it's nice that now the mem port is available as both coeff_mem_r_data and mac_b or whatever, before you only saw it under one name
<agg> looks good!
<whitequark> :)
<whitequark> constants should be even cheaper, let me quickly whip something up
<whitequark> it's handy because gtkwave will remove traces that disappear
<whitequark> so if you tie something to a constant for testing... that trace will be gone
<whitequark> after you refresh
<awygle> wq, have you explored any of the non-vcd waveform formats?
<whitequark> nope
<whitequark> i just thought, well, i can't avoid vcd
<whitequark> might as well get it done with
<whitequark> can always add other formats later
<awygle> mhm vcd is the obvious starting place
<whitequark> cxxrtl has a fully modular dumper
<whitequark> cxxrtl_vcd doesn't do anything that your code can't do
<whitequark> holy shit
<whitequark> Const wires: 952
<whitequark> Alias wires: 527
<whitequark> Member wires: 237
<whitequark> Other wires: 331 (no debug information)
<whitequark> minerva
<Lofty> ?
<Lofty> I just got back and am missing context
<whitequark> most wires in minerva trivially reduce to constants or other wires
<whitequark> so generating debug information for them is not exactly hard
<Lofty> Wow, yeah
jeanthom has quit [Ping timeout: 272 seconds]
<whitequark> wait
<whitequark> i think that's a bug
<whitequark> Const wires: 97
<whitequark> Alias wires: 527
<whitequark> Other wires: 1186 (no debug information)
<whitequark> Member wires: 237
<whitequark> ok that's a lot closer to reality
<whitequark> still, a pretty nice amount of wires
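The two sets of Minerva statistics can be cross-checked with a short calculation. This sketch uses only the counts pasted above; the dictionary keys and variable names are invented for illustration.

```python
# Wire counts pasted in the log for Minerva, before and after the bug fix.
before_fix = {"const": 952, "alias": 527, "member": 237, "other": 331}
after_fix = {"const": 97, "alias": 527, "member": 237, "other": 1186}

total_before = sum(before_fix.values())
total_after = sum(after_fix.values())

# Wires that were wrongly classified as constants and moved back to "other".
moved = before_fix["const"] - after_fix["const"]

# Share of wires that still get debug information after the fix.
with_debug_info = total_after - after_fix["other"]
share = with_debug_info / total_after

print(f"total wires: {total_before} before, {total_after} after")
print(f"reclassified from const to other: {moved}")
print(f"wires with debug info after the fix: {with_debug_info} ({share:.0%})")
```

The totals match, so the fix only moved wires between the "const" and "other" categories.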
alexhw has joined #nmigen
<whitequark> agg: can you recheck pls
<awygle> whitequark: where did you land on this debuginfo thing? are you still pursuing the zero cost approach, or going with deopt, or something else?
<whitequark> awygle: going to have to evaluate various solutions
<whitequark> reevaluate zero cost solution with removed alias/const wires
<whitequark> see whether zero cost solution for only elided wires (but not more complex cases) will work
<whitequark> see if i can take advantage of additional assumptions valid in debug info generation but not normal evaluation
<whitequark> see if i can stuff it all into a separate class to reduce the amount of necessary lambdas
<whitequark> see just how expensive deopt is
<awygle> i see
<awygle> is there a reason i'm missing that the lambdas have to be compiled at the same time as the general sim? or can they be compiled only when the signal they represent is probed?
<whitequark> the thing is i don't want to have to do `raise SignalOptimizedOut` in cxxsim
<awygle> mm, ok
<whitequark> if you can afford rebuilding to insert probes you should just use (*keep*)
<awygle> well that's not exactly what i meant
<whitequark> what is the difference wrt your solution?
<whitequark> the keep might be inserted by cxxsim
<awygle> in the solution i'm picturing the whole sim is built (with LTO or whatever) except the rematerialization lambdas, which are emitted (someplace), and then when you insert probes you compile only the used lambdas and (re-)link them with the sim (but not recompile the sim)
<awygle> that may not be possible in the cxxsim case, idk
<whitequark> good luck getting this to work on windows
<whitequark> well
<whitequark> the problem is that right now the design knows how to rematerialize everything within itself
<whitequark> with your solution it... doesn't
<whitequark> you need a ton of infrastructure to coordinate everything and frankly i have no idea how that infrastructure would look or work
<whitequark> do you emit 1000s of C++ sources with a single lambda in each every time you make a sim?
<awygle> that does seem problematic. and c++ doesn't have sub-file compilation units.
<whitequark> the answer is probably to just not make that many lambdas
<awygle> sure
<awygle> my next step is "basically reinvent dwarf, so you can create used lambdas after the sim has been compiled already from some kind of serialization format"
<awygle> which is getting to be a stretch (he says with irony)
<whitequark> but ... that defeats the point of using lambdas
<whitequark> the only reason i'm using lambdas is so that i can erase the type of value<N> with std::function
<whitequark> other than that i could just use a, like, function
<whitequark> it's not even really clear what the problem with lambdas *is*
<whitequark> i think it might be "too many functions" in which case it probably makes sense to emit like... one lambda.
<awygle> mhm
<awygle> i getcha
* whitequark shrugs
<whitequark> i'll decide it when i have info to decide
<whitequark> don't want to speculate
<awygle> yup
<agg> whitequark: what branch?
<agg> oh, yosys master now?
<whitequark> yeah
<agg> one of my "other wires" has become a "const wire" :p
<whitequark> heh
<whitequark> awygle: well, i think the way to go is deopt
<whitequark> baseline: 625 kCPS, deopt localized: 460 kCPS, deopt elided: 310 kCPS
<whitequark> (without probes)
<whitequark> and good compile time too
<whitequark> this very clearly beats everything else i can do
<_whitenotifier-f> [nmigen-boards] igrr opened pull request #64: resources: distinguish "dte"/"dce" roles of UARTResource -
<_whitenotifier-f> [nmigen-boards] igrr opened pull request #65: boards: fix UART flow control pin inconsistencies -
<_whitenotifier-f> [nmigen-boards] igrr commented on pull request #64: resources: distinguish "dte"/"dce" roles of UARTResource -
Asu has quit [Ping timeout: 256 seconds]
<awygle> that does seem pretty good
mwk has quit [Ping timeout: 272 seconds]
jeanthom has joined #nmigen
jeanthom has quit [Ping timeout: 260 seconds]
<TD-Linux> is the wishbone interface in nmigen_soc ready for use? or should I copy it into my project?