<lkcl>
that'll need a cache to properly resolve, if tracing is to be kept. normally, tracing is something that's only enabled explicitly
<lkcl>
it reduces the strace completion time by 40%.
<whitequark>
this is completely irrelevant. i'm well aware that tracing is causing the slowdown, that's in the issue.
<lkcl>
why would you say "it doesn't"?
<whitequark>
the question is not whether it should be enabled; the question is how to make it fast
<lkcl>
i don't understand.
<lkcl>
i just said: cache the lookups.
<lkcl>
"lkcl> that'll need a cache to properly resolve"
<whitequark>
so what's the cache key?
<whitequark>
(aside from that: i don't see why looking up file*name* should issue any syscalls in the first place)
<lkcl>
both those will involve looking at the source for the traceback module
<lkcl>
1 sec
<whitequark>
i see
<whitequark>
it's linecache
<whitequark>
ironically, the cache is slowing things down...
<lkcl>
that's interesting. there's something called "StackSummary" and it has a cache already
<whitequark>
ok, i know how to fix them
<lkcl>
aw doh
<whitequark>
give me a sec
<lkcl>
cool. glad you were able to spot it, fast.
<whitequark>
what worries me is you say it's only 40%
<lkcl>
it was only a short program
<lkcl>
1 sec let me run a longer one
mauz555 has joined #m-labs
<mtrbot-ml>
[mattermost] <sb10q> @astro thermostat no longer compiles after updating rust (https://nixbld.m-labs.hk/build/11182/nixlog/1). I could pin the rust compiler version, but it might be better to just update thermostat...
<lkcl>
tracer hacked to return "dummy" real 0m5.559s
<whitequark>
sb: cargo update smoltcp
<whitequark>
will fix it
<whitequark>
that was a bug in smoltcp that earlier versions of rust did not find
<whitequark>
because of a bug in the pre-MIR borrowck
<lkcl>
without hacks: real 0m6.174s
<lkcl>
hmmm...
<whitequark>
yes. still very slow. i suspect some fundamental rework will have to be applied to FragmentTransformer, but i'm not sure yet
mauz555 has quit [Ping timeout: 252 seconds]
<lkcl>
just did a cProfile check: 71466/70834 0.049 0.000 0.111 0.000 ast.py:1230(__hash__)
<lkcl>
*wow* that's a lot of calls.
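For reference, a call count like the one quoted above can be pulled out of `cProfile` and sorted with `pstats`. The sketch below is only an illustration: `Key` and `workload` are hypothetical stand-ins, not nmigen code.

```python
import cProfile
import io
import pstats


class Key:
    """Hypothetical dict key with a Python-level __hash__, so calls show up in the profile."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        return self.value == other.value


def workload():
    # Every dict store hashes the key, so __hash__ runs once per iteration.
    table = {}
    for i in range(1000):
        table[Key(i % 10)] = i
    return table


def profile_report():
    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()
    out = io.StringIO()
    # Sort by number of calls and print only the first few rows.
    pstats.Stats(profiler, stream=out).sort_stats("ncalls").print_stats(5)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

Sorting by `ncalls` rather than by time is what makes a cheap but very frequently called function such as `__hash__` stand out.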
<whitequark>
oh. yeah. ValueKey should do hash-consing.
<whitequark>
i can fix that easily.
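As a rough illustration of hash-consing in general (not the actual ValueKey fix), the sketch below interns structurally equal nodes so that each distinct structure exists once and the default identity-based hash is enough. The `Node` class and its fields are hypothetical.

```python
class Node:
    """Hypothetical expression node; structurally equal nodes are the same object."""

    __slots__ = ("op", "args")
    # Intern table. It holds strong references, so nodes are never freed;
    # a real implementation would probably want weak references.
    _table = {}

    def __new__(cls, op, *args):
        # Child nodes are already interned, so hashing this key only uses
        # their identity hashes instead of walking the whole subtree.
        key = (op, args)
        node = cls._table.get(key)
        if node is None:
            node = super().__new__(cls)
            node.op = op
            node.args = args
            cls._table[key] = node
        return node


if __name__ == "__main__":
    x = Node("add", Node("sig", "a"), Node("const", 1))
    y = Node("add", Node("sig", "a"), Node("const", 1))
    print(x is y)
```

Because equal structures collapse to one object, using a node as a dict key costs one identity hash rather than a recursive `__hash__` call per lookup.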
<mtrbot-ml>
[mattermost] <sb10q> whitequark: that did nothing (i.e. cargo did not update the lockfile)
<mtrbot-ml>
[mattermost] <sb10q> and I suppose you meant ``cargo update -p smoltcp``
<whitequark>
sb: ah right. hm. needs a git dependency then
<whitequark>
lkcl: what are you doing that you have 10 million calls to normalize?
<whitequark>
are you simulating?
<lkcl>
in 6 seconds, almost 10 *million* calls to Const.normalize! wow, that's deeply impressive
<lkcl>
whitequark: yes
<whitequark>
right. the simulator needs to be rewritten to generate python code
<whitequark>
instead of ... whatever it is doing right now
<whitequark>
it's unnecessarily extremely slow.
<whitequark>
pypy copes well with it, but short simulations would benefit from codegen even on pypy, i think
<lkcl>
ooo, a language translator: that would be really neat.
<whitequark>
it's not exactly a language translator
<whitequark>
it's more of a compiler from nmigen to python
<lkcl>
ah yes good point
<whitequark>
the thing is that compiling to python does a lot of really good things for performance, mostly:
<whitequark>
1) eliminating name lookups
<whitequark>
2) splitting inline caches in code objects
<lkcl>
that would be extremely helpful, as we're running tens to hundreds of thousands of IEEE754FPU conformance tests: if we ran all of them it would require... estimated... three days to complete (on high-end hardware)
<whitequark>
afaik python has one inline cache per call or access site
<whitequark>
so the pattern currently emitted by pysim pretty much has 100% cache miss ratio
<lkcl>
urk
<whitequark>
tracing like pypy does is well equipped to deal with it, but only after warmup
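To make the "compile to python" idea concrete, here is a minimal sketch that contrasts walking an expression tree on every evaluation with generating Python source once and calling the resulting function. The tuple-based expression format and all function names are assumptions made for this example; this is not how pysim represents designs.

```python
def evaluate(expr, env):
    """Interpret the tree: every call re-dispatches on node kinds and looks up names in env."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "sig":
        return env[expr[1]]
    lhs = evaluate(expr[1], env)
    rhs = evaluate(expr[2], env)
    if kind == "add":
        return lhs + rhs
    if kind == "and":
        return lhs & rhs
    raise ValueError(f"unknown node kind: {kind}")


def emit(expr):
    """Translate the tree into a Python expression string."""
    kind = expr[0]
    if kind == "const":
        return repr(expr[1])
    if kind == "sig":
        return expr[1]
    op = {"add": "+", "and": "&"}[kind]
    return f"({emit(expr[1])} {op} {emit(expr[2])})"


def compile_expr(expr, signal_names):
    """Generate a function whose body is the translated expression."""
    source = f"def step({', '.join(signal_names)}):\n    return {emit(expr)}\n"
    namespace = {}
    exec(source, namespace)
    return namespace["step"]


if __name__ == "__main__":
    expr = ("and", ("add", ("sig", "a"), ("const", 3)), ("sig", "b"))
    step = compile_expr(expr, ["a", "b"])
    print(evaluate(expr, {"a": 5, "b": 12}), step(5, 12))
```

In the generated function the signals are local variables and each operation sits at its own site in the bytecode, which is the kind of code the interpreter handles best; the tree walker funnels every node through the same few call sites.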
<mtrbot-ml>
[mattermost] <sb10q> whitequark: I have trouble understanding how this error can lead to a linking failure, and also why the other project which is also using the same smoltcp version with the same features is not affected
<whitequark>
sb: oh. I misread the error.
<mtrbot-ml>
[mattermost] <sb10q> the error is "rust-lld: error: no memory region specified for section '.ARM.exidx'"
<whitequark>
yes. that's unrelated. you don't need to fix the smoltcp issue then.
<whitequark>
(well, it's technically unsound, but it's benign in that specific case; i checked)
<lkcl>
i wonder... is it even worth doing? i mean, there's a trail which gets you to c++ yet still keeps within python: cocotb. the trail goes: nmigen -> yosys -> verilog -> cocotb -> cocotb-driven-verilator-compilation -> cocotb-python-bindings-to-the-compiled-executable
<lkcl>
so you still have unit tests written in python, and you still get to "interact" with the nmigen program, albeit very indirectly.
<lkcl>
which is the downside
<lkcl>
debugging would be a pain as it's so indirect
<lkcl>
but, as the RTL is ultimately compiled to c++, it'll be *fast*
<whitequark>
lkcl: there's currently no way to correlate Signal objects to Verilog names
<whitequark>
it would be easy to add a way to correlate Signal objects to *RTLIL* names.
<lkcl>
urk.
<whitequark>
unfortunately, going to Verilog is not trivial at all, although certainly possible with enough effort
<lkcl>
... but verilator doesn't understand RTLIL, it only understands verilog. *sigh*
<whitequark>
key2 is interested in translation of RTLIL to C++ that could be simulated, sort of like verilator
<whitequark>
except without verilog
<whitequark>
so that's something i will look into
<lkcl>
now that *would* be interesting.
<whitequark>
that's medium term plans though and it's not really funded yet by anyone
<whitequark>
also, pysim still needs improvement because it's the most reliable way to do cosimulation on Windows
<lkcl>
well, if you have the time and the inclination, and have someone on the team that has an address within the EU, i'll be happy to help fill in a Grant Application with NLNet
<lkcl>
that would get up to EUR 50,000 within about... 3-4 months.
<whitequark>
it might be an option.
<_whitenotifier-3>
[m-labs/nmigen] whitequark pushed 1 commit to master [+0/-0/±1] https://git.io/fjQpu
<_whitenotifier-3>
[m-labs/nmigen] whitequark 4ee82c9 - tracer: use sys._getframe directly.
<whitequark>
i see about 30% speedup on Glasgow
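A minimal sketch of the difference the commit title points at, assuming the goal is only to learn where a call came from: `traceback.extract_stack()` builds full stack summaries and, by default, consults `linecache` for source lines (which can touch the filesystem), while `sys._getframe()` hands back the frame object directly. The helper names are hypothetical and this is not the nmigen tracer itself.

```python
import sys
import traceback


def caller_via_traceback():
    # limit=2 keeps this function's frame and its caller's; after the
    # oldest-first ordering, entry 0 is the caller.
    summary = traceback.extract_stack(limit=2)[0]
    return summary.filename, summary.lineno


def caller_via_getframe():
    # Depth 1 is the caller's frame; no source lines are read.
    frame = sys._getframe(1)
    return frame.f_code.co_filename, frame.f_lineno


def demo():
    slow = caller_via_traceback()
    fast = caller_via_getframe()
    return slow, fast


if __name__ == "__main__":
    print(demo())
```

Both helpers report the same file and adjacent line numbers; `sys._getframe` is a CPython implementation detail, so relying on it trades portability for speed.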
<lkcl>
the "spin" is to make sure it relates to "Privacy and Enhanced Trust". as nmigen is such a critical dependency for the Libre RISC-V SoC, anything that impacts our project in a big way has "significance"
<lkcl>
even on its own, the fact that nmigen is a libre tool for the development of Open ASICs is "significant".
<whitequark>
nmigen in its current form is actually not very well suited for ASICs
<whitequark>
because of register initialization values.
<whitequark>
oMigen had an "ASIC mode" but no one used it so it bitrotted and got removed...
<lkcl>
rats.
<whitequark>
well, adding basic support is trivial. it's about one line in back.verilog.
<whitequark>
the problem is accurately reflecting semantics of uninitialized registers.
<whitequark>
Verilog has 'x, but 'x has... issues
<lkcl>
well, we'll definitely need it. we've 6+ months worth of development now
<whitequark>
the main reason it's not already in nmigen is that i have no hands on experience with ASIC development.
<lkcl>
much of the design that we're doing doesn't need resets, at all.
<whitequark>
for rather obvious cost reasons.
<lkcl>
neither do i.
<lkcl>
:)
<lkcl>
it's not stopping me :)
<whitequark>
well, as you probably know, nmigen does not *just* add language features. it does so in a way that these features are easy to use and hard to misuse.
<whitequark>
this *requires* hands-on experience.
<whitequark>
i suspect that a proper "ASIC mode" would involve some explicit modeling of power/clock/reset domains.
<whitequark>
which is necessary anyway for accurate modeling of CDCs.
<lkcl>
btw, we're doing an out-of-order design, where there are signals that say whether the data is valid.
<lkcl>
yehyeh.
<mtrbot-ml>
[mattermost] <sb10q> okay, I fixed it by updating the more related crates r0 and cortex-m-rt
<whitequark>
but you still need to reset the "data valid" flops to zero on startup.
<lkcl>
we'll need to do I/O clock domains, and separate processor domains
<mtrbot-ml>
[mattermost] <sb10q> the rate of breakage in embedded rust is pretty amazing..
<lkcl>
yes, the "data valid" ones clearly need to zero on reset, however the pipelines behind them definitely don't.
<whitequark>
sb: ~all of the breakage is related to getting embedded rust to work on stable
<mtrbot-ml>
[mattermost] <sb10q> I might actually just pin the rustc version for each project and be done with that
<whitequark>
i.e. the point of breakage is to prevent breakage later.
<lkcl>
we'll be putting in a 2nd NLNet funding proposal when it comes to doing layout.
<whitequark>
pinning rustc is quite silly. i fully intend to require 2018 revision in smoltcp sooner rather than later
<lkcl>
at that point, we'll have someone on-board who's done ASICs before
<whitequark>
lkcl: it doesn't really matter if you reset the entire pipeline or only data valid flops
<lkcl>
i forget his name: sb0, you know him. or, he knows you :)
<mtrbot-ml>
[mattermost] <sb10q> well then unpin when you actually want to upgrade, not when you're in the middle of something else
<whitequark>
i mean, it matters for the physical area
<whitequark>
but not for something like nmigen
<lkcl>
whitequark: ohh it matters. 12 stage DIV pipelines, 170-bit registers (for the FP64 DIV pipeline)...
<lkcl>
yes, for area.
mauz555 has joined #m-labs
<whitequark>
right, so nmigen is concerned primarily with semantics, with area minimization being handled at a broad language design stage
<lkcl>
the registers in the pipelined FPDIV algorithm need to be 3x the mantissa bitlength, which in FP64 is 53 bits.
<whitequark>
that means ensuring uninitialized values do not get used *somehow*.
<whitequark>
i don't really know how
<whitequark>
undef tracking could be a part of that solution, but not the entire solution
<lkcl>
honestly: it's fine. i mean, it's not "fine" as far as power-spikes at startup are concerned.
<lkcl>
i'll ask my contact. he's done ASIC layout, worked with Foundries.
<whitequark>
but this isn't an ASIC design problem, it's a language design problem.
<whitequark>
i know how it's done in the industry; verilog 'x
<lkcl>
all of the Computation Units (the CDC 6600 term for "ALUs"), they have a strange 3-way revolving door of signals: Read, Issue, Write. only 2 out of three of those are permitted to be active at any one time.
<lkcl>
ohh... hum... hmm hmm ok so you're saying that if nmigen doesn't have support for "'x", it's difficult to express (properly) when reset is or is not required?
<whitequark>
not exactly, but you're thinking in the right direction
<whitequark>
it's actually a problem even for FPGAs, for designs with multiple clock domains
<lkcl>
[vaguely-]understood :)
<whitequark>
because the FPGA global FF reset is asynchronous to user clocks
<lkcl>
sigh i wish that Bluespec wasn't proprietary. they *really* got this right.
<lkcl>
(the cross-domain clock synchronisation).
<whitequark>
but, most FPGA designs feature a clock tree with one root, so that can be just gated.
<whitequark>
no, you didn't get it
<whitequark>
adding support for CDC in nmigen is ~easy, i know how to do it, roughly speaking
<whitequark>
ergonomics is not entirely trivial, but it's certainly doable
mauz555 has quit [Ping timeout: 264 seconds]
<whitequark>
*if* your CDC is correct, *then* you never get indeterminate values or 'x
<lkcl>
okaay.
<whitequark>
the problem is startup.
<whitequark>
on ASICs, on startup, the entire design is in an indeterminate state.
<whitequark>
on FPGAs, on startup, global FF reset is deasserted asynchronously to user clocks... so if you don't gate user clocks, the entire design will be in an indeterminate state...
<whitequark>
... except in practice it is not typically a major issue. think 1 of 1000 power-ons will fail. not a problem for many designs.
<lkcl>
my experience with different clock domains is limited to watching some extremely experienced engineers (at IIT Madras) solve this with the correct application of some Bluespec classes. unfortunately, the "magic" - what was actually going on - would have been hidden behind those classes. that said: it *does* output verilog, and i know where to find some of the converted code, if that would help
<whitequark>
essentially, nmigen needs to gain a part that's a bit like the FPGA timing analyzer
<lkcl>
wow.
<whitequark>
it wouldn't be quantitative, but it would qualitatively know which signals are potentially asynchronous to each other
<whitequark>
partitioning the design into control domains.
<lkcl>
yehyeh. i've done event-driven simulation. several times, now.
<whitequark>
but even that doesn't quite help with ASICs because ASIC designs also need to somehow communicate to nmigen the *sequence* in which startup happens
<whitequark>
i suspect the only right answer here may be tighter integration with Yosys formal...
<whitequark>
btw, generated bluespec code would not help at all. types are erased.
<lkcl>
BSV is pretty awesome. if it compiles, it's *ONE HUNDRED PERCENT* guaranteed synthesisable. this is because BSV is written in haskell, and they've got formal mathematical correctness proofs of the resultant verilog
<lkcl>
utterly cool.
<_whitenotifier-3>
[nmigen] whitequark commented on issue #170: Tracer is issuing excessive amounts of stat() syscalls - https://git.io/fjQpX
<_whitenotifier-3>
[nmigen] whitequark closed issue #170: Tracer is issuing excessive amounts of stat() syscalls - https://git.io/fjQbz
<whitequark>
i mean, nmigen guarantees synthesizable verilog too.
<lkcl>
IIT Madras, a team with zero prior experience, did an ASIC that worked *first time* with it.
<lkcl>
superrrrb. that's really good to know.
<mtrbot-ml>
[mattermost] <sb10q> so it went from "extremely experienced engineers" to "zero prior experience"?
<whitequark>
actually, no, that's not entirely true yet.
<whitequark>
it's designed to do that, but there are a few corner cases where behavior would depend on the specific synthesizer
<whitequark>
e.g. logic loops
<lkcl>
sb0: Professor Kamakoti worked for Intel. he was the engineer who wrote all of Intel's test suites back in... i think... the 80s / early 90s.
<whitequark>
one major problem is that "synthesizable verilog" is actually ill-defined
<whitequark>
not only is the spec ambiguous, but vendors also frequently depart from it when they find it convenient
<whitequark>
for example: 1800-2017 doesn't define the semantics of "always_ff".
<whitequark>
and it's not binding in any case.
<lkcl>
yuck. sigh, yes, the ariane source code relies on one specific compiler. they use a proprietary vendor's SV compiler. nobody else can compile the code, unless you have a $1m+ license. grrrr
<whitequark>
you're basically trying to generate some sane subset of verilog and hope no one will misinterpret it
<whitequark>
there's also memories.
<whitequark>
for example, nmigen allows you to express a three write port memory
<whitequark>
but whether that will synthesize is doubtful. i mean, usually it will not.
<lkcl>
i have *no* idea what to do, here. ultimately we may actually need to end up patching yosys *and* nmigen, together, to support whatever specific cells we need to add
<whitequark>
no? why the hell would you need that
<lkcl>
going from the Cell Library (which we may need to license, or may actually need to design)
<whitequark>
you make an Instance.
<lkcl>
then... ahh ok.
<lkcl>
okaay.
<whitequark>
then you stuff that Instance full of logic that simulates its behavior in pysim.
<whitequark>
that's actually how memories *already* work.
<lkcl>
and.. ah that was going to be my next question :)
<lkcl>
one of the issues we have is: we need a very specific cell - a SR NAND latch.
<lkcl>
if we don't (if we do it as DFFs), we end up with one of the blocks in the design having a QUARTER OF A MILLION gates.
<lkcl>
an SR NAND latch "cell" would cut that down to [only] 50,000.
<whitequark>
that sounds like a sign of a problem elsewhere in the toolchain
<whitequark>
but. sure. you can instantiate it.
<whitequark>
i *think* current pysim code should allow you to express an SR latch as a logic loop
<whitequark>
although it would be far more elegantly expressed as a python process
<lkcl>
no, it's that we need the behaviour of an SR Latch (the *exact* behaviour).
<lkcl>
ok, great.
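As a plain-Python sketch of what "an SR latch as a logic loop" means behaviourally, the model below re-evaluates two cross-coupled NAND gates until the outputs stop changing. It deliberately uses no nmigen or pysim API; the function names and the iteration limit are assumptions for illustration.

```python
def nand(a, b):
    return 0 if (a and b) else 1


def settle(s_n, r_n, q, q_n, max_iterations=8):
    """Settle an SR NAND latch with active-low inputs s_n (set) and r_n (reset).

    The gates are evaluated in a fixed order, so this idealised model always
    resolves deterministically; real hardware can be metastable when both
    inputs are released at once.
    """
    for _ in range(max_iterations):
        new_q = nand(s_n, q_n)
        new_q_n = nand(r_n, new_q)
        if (new_q, new_q_n) == (q, q_n):
            return q, q_n
        q, q_n = new_q, new_q_n
    raise RuntimeError("latch did not settle")


if __name__ == "__main__":
    state = (0, 1)
    state = settle(0, 1, *state)   # pulse set
    state = settle(1, 1, *state)   # hold
    print(state)
    state = settle(1, 0, *state)   # pulse reset
    print(state)
```

A simulator process would run something like `settle` whenever either input changes and keep the returned pair as the latch state.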
<whitequark>
well, 250k vs 50k does sound more reasonable
<lkcl>
:)
<lkcl>
we'll still need to simulate it (and probably a SPICE model too). ngggh, so much to doooo
<_whitenotifier-3>
[m-labs/nmigen] whitequark pushed 1 commit to master [+0/-0/±1] https://git.io/fjQp9