az0re has quit [Remote host closed the connection]
Degi_ has joined #yosys
Degi has quit [Ping timeout: 240 seconds]
Degi_ is now known as Degi
janrinze has quit [Remote host closed the connection]
emeb_mac has joined #yosys
az0re has joined #yosys
futarisIRCcloud has joined #yosys
citypw has joined #yosys
emeb has quit [Quit: Leaving.]
Cerpin has quit [Quit: leaving]
Stary has quit [Ping timeout: 246 seconds]
npe has joined #yosys
Stary has joined #yosys
Cerpin has joined #yosys
npe has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
npe has joined #yosys
Cerpin has quit [Remote host closed the connection]
Cerpin has joined #yosys
vidbina_ has joined #yosys
az0re has quit [Remote host closed the connection]
npe has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
craigo has joined #yosys
_whitelogger has joined #yosys
Vinalon has quit [Ping timeout: 250 seconds]
Vinalon has joined #yosys
emeb_mac has quit [Quit: Leaving.]
jakobwenzel has joined #yosys
ZipCPU has quit [Excess Flood]
ZipCPU has joined #yosys
Asu has joined #yosys
mirage335 has quit [Ping timeout: 265 seconds]
mirage335 has joined #yosys
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
agg has joined #yosys
Ekho has quit [Quit: An alternate universe was just created where I didn't leave. But here, I left you. I'm sorry.]
Vinalon has quit [Ping timeout: 264 seconds]
Ekho has joined #yosys
<agg>
whitequark: as a data point, cxxrtl generates 1/7th the C++ code but takes 2.3x the execution time of verilator on my dsp design (and produces identical output, but also caught an out of bounds bram read)
<agg>
can it/are there plans for it to output a vcd?
<agg>
i would also rate it as "easier to get working" although it took a little while to figure out the api for getting/setting port values
<ZirconiumX>
agg: Is this before or after the go-faster branch?
<agg>
sorry yes this is on the current pr branch
<agg>
yosys 1d5b6ac2
<agg>
and write_cxxrtl at default -O6, g++ at -O2, I'll see if clang makes an appreciable difference
<agg>
clang actually gets it down to 1.83x though a slightly unfair comparison since verilator times are using gcc :p
<ZirconiumX>
What are you calling before write_cxxrtl?
<agg>
read_ilang; write_cxxrtl
<agg>
(I thought write_cxxrtl with -O6 takes care of any other passes that are useful?)
<agg>
I believe write_cxxrtl -O6 does proc;flatten though, and -O6 is the default
<ZirconiumX>
Okay, yeah, sorry
<agg>
to be clear i'm not complaining about the current speed, it's still like 600ms cpu time to simulate 100ms of design clock time, which is more than enough
<ZirconiumX>
The copy of Yosys I have in front of me only has -O5
<agg>
though the verilator vcd dump is very useful :p mostly i am very much looking forward to this being integrated in nmigen
<ZirconiumX>
Oh yeah, definitely
<agg>
this module needs to do around 5M clocks to settle, so it's one of the only things in my design that pysim won't really cope with, and my current test strategy for it is "check the verilator vcd in gtkwave"
<agg>
mostly because i haven't faced writing c++ to properly test the outputs for it :P
<Sarayan>
What do you put in the vcd?
<Sarayan>
with verilator, that is
<ZirconiumX>
Also have you tried voiding your warranty?
<Sarayan>
everything, like pysim? Everything starting at a point? A selection of signals or modules?
enigma has joined #yosys
<agg>
I have verilator dump all signals to the vcd
<agg>
(not in the time benchmark above, I took out vcd writing for that)
<agg>
i find that's the most useful way of looking at it in gtkwave to see what's going on, rather than realising i wanna see something else and having to re-add that, rebuild, rerun, etc. doesn't take that long or produce especially huge vcds in this instance
<agg>
this design is annoying because i need to run for many clocks but it's not got that many moving parts
<agg>
(very high order decimator)
<whitequark>
agg: try using g++/clang++ -O3; also, clang++ -flto seems to produce better results depending on your exact use case
<whitequark>
there is also one thing i realized i did that might not be entirely optimal, i'll try to fix it soon
<agg>
with clang++ -O3 -march=native -flto I get to 1.6x verilator
<Sarayan>
whitequark: is there enough information left at the cxxrtl level to make every signal visible to a potential vcd writer?
<Sarayan>
lto can help even if everything is single-file?
<whitequark>
Sarayan: (every signal visible to vcd) not sure if "every", but the majority of them, yes. i will see exactly how well i can get it to work
<whitequark>
the current plan is to add debug info to cxxrtl output without sacrificing optimization, sort of like DWARF is supposed to work (but it usually just gives you <value optimized out> instead)
<whitequark>
(lto) cxxrtl has black boxes now
<whitequark>
among other things
<whitequark>
agg: so, there is one more thing you could do if you wanted every last drop of performance
<whitequark>
run yosys with CXXRTL_VOID_MY_WARRANTY=1 set in the environment, and replace your *two* calls to .step() with *one* line like this: top.prev_p_clk = value<1>{0u}; top.p_clk = value<1>{1u}; top.step();
<agg>
whitequark: using clang for the verilator build with the same flags actually makes verilator slightly slower :p
<whitequark>
agg: i find that with minerva sram soc, g++ works a bit better than clang++
<agg>
so with clang++ -O3 -march=native -flto, verilator is 455ms and cxxrtl is 383ms
<agg>
but verilator with g++ -O3 gets 343ms
<agg>
whereas cxxrtl gets 648ms with g++ -O3, go figure
<agg>
(cxxrtl numbers there using VOID_MY_WARRANTY)
<agg>
with the best settings so far for each, verilator is 1.1x faster, but still generating 7x the amount of c++ faff and requiring its own makefile
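The "1.1x" figure can be checked against the four timings quoted just above. This is only a back-of-the-envelope sketch: the numbers are copied from the chat, the cxxrtl ones were measured with VOID_MY_WARRANTY enabled, and the helper name `best` is made up for illustration.

```python
# CPU time in milliseconds for the same simulation run, as quoted in the chat.
# The cxxrtl figures were taken with CXXRTL_VOID_MY_WARRANTY enabled.
timings_ms = {
    ("verilator", "g++ -O3"): 343,
    ("verilator", "clang++ -O3 -march=native -flto"): 455,
    ("cxxrtl", "g++ -O3"): 648,
    ("cxxrtl", "clang++ -O3 -march=native -flto"): 383,
}


def best(tool):
    """Return (time_ms, flags) for the fastest build of `tool`."""
    return min((ms, flags) for (name, flags), ms in timings_ms.items() if name == tool)


verilator_ms, verilator_flags = best("verilator")
cxxrtl_ms, cxxrtl_flags = best("cxxrtl")
ratio = cxxrtl_ms / verilator_ms

print(f"verilator best: {verilator_ms} ms with {verilator_flags}")
print(f"cxxrtl best:    {cxxrtl_ms} ms with {cxxrtl_flags}")
print(f"cxxrtl takes {ratio:.2f}x the time of verilator")
```

Each tool is compared at its own best compiler settings, which is why the ratio is smaller than the same-compiler comparisons earlier in the conversation.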
<Sarayan>
pure sync design without comb loops?
<agg>
no deliberate comb loops :p
<whitequark>
Sarayan: the backend would complain if it would find comb loops
<whitequark>
also, it now automatically optimizes out useless delta cycles if it can schedule the entire thing statically
<whitequark>
so you can remove all the "optimized" versions of the code i gave you before, not necessary
vidbina_ has quit [Ping timeout: 256 seconds]
vidbina_ has joined #yosys
enigma has quit [Ping timeout: 256 seconds]
strongsaxophone has joined #yosys
enigma has joined #yosys
<Asu>
whitequark: so i've been messing a bit with compile flags of the generated c++ for that SoC from that archive on the gas gas gas PR
<Asu>
> i find that with minerva sram soc, g++ works a bit better than clang++
<Asu>
so that is interesting because i get absolutely the opposite
<whitequark>
agg: please try the latest commit from the PR again
<Asu>
all -O3, clang-9 w/ lto+pgo 0.8s, clang-9 w/ lto 0.9s, clang-9 without lto or pgo 1.2s, g++ 9.3 both with and without lto are around 1.35s~1.45s
<whitequark>
Asu: sorry, i misspoke
<Asu>
oh, right
<whitequark>
g++ works a bit better than clang++ *without LTO*
<whitequark>
with LTO, clang++ is absolutely much better
<Asu>
clang -O3 slightly outperforms gcc -O3 for me
<Sarayan>
which clang, 9, 10?
<Asu>
9.0.1
<Sarayan>
oh, you said before
<Asu>
i think wq tests with clang-7 so that might be it? that or a cpu difference
<whitequark>
yup
<agg>
whitequark: with g++ -O3, new commit 648ms->478ms, clang++ -flto -march=native -O3 is same as before at 385ms
<agg>
without -march=native actually a bit of a speedup to 376ms
<whitequark>
agg: oh fascinating, so clang could do my optimization on its own
<agg>
seems like it
<whitequark>
does -march=native use avx?
<whitequark>
that could bump down your multiplier, causing the slowdown
<ZirconiumX>
x86 vendors for the past 30 years: SIMD is great
<ZirconiumX>
x86 vendors now: actually don't use SIMD
<agg>
i think march=native will use all available features including avx
<agg>
this is a ryzen 7 2700x
<whitequark>
oh, not sure if ryzen has avx offset
<Sarayan>
power management is complicated
<ZirconiumX>
XFR will probably throttle things down with AVX, but that's mostly heat and power management
<agg>
still more than fast enough to replace verilator for me though, even without the warranty-voiding
<whitequark>
fun fact: i developed cxxrtl with a focus on correctness and flexibility, not performance
<agg>
and you started it what, a couple months ago?
<whitequark>
that it runs almost as fast as verilator is more of a side effect
<agg>
and now it's the same speed as verilator, more correct, easier to use, generates less code
<whitequark>
i think about a month of development time total went into it
<whitequark>
maybe two
<agg>
poor verilator
<whitequark>
no, less than a month, judging by commit logs
Cerpin has quit [Read error: Connection reset by peer]
Cerpin has joined #yosys
npe has joined #yosys
Thorn has joined #yosys
yosys-questions has joined #yosys
<Asu>
whitequark: for this SoC i found merging most of the if (posedge_p_clk()) manually in the generated code improves run times by ~13% for me and the compile time of the cxxrtl by ~10% (results with clang -O3 -flto)
<whitequark>
Asu: that's already upstream in cxxrtl
<Asu>
not sure if that's helpful, i'm kind of curious why this improves performance though (is it unable to deduce that the expression remains constant or just doesn't know how to merge the branches?)
<Asu>
oh
<whitequark>
i also got 13-15% improvement
<yosys-questions>
Hi, quick q: When I look at the output of the write_json command, I don't see the "signed/unsigned"-ness of the ports/nets preserved. I understand that once RTL is fully understood, signed/unsigned doesn't matter, but at what stage is it possible to recover this information? I would like to have it for some automation that I am doing
vidbina_ has joined #yosys
<ZirconiumX>
yosys-questions: you shouldn't use JSON as a Yosys parsing format, really
<mwk>
yosys-questions: the wire/register signedness is already lost by the time read_verilog is done
emeb has joined #yosys
<ZirconiumX>
It should be in the cells though?
<mwk>
however, cell input/output signedness is preserved until actual synthesis
<mwk>
so you could look at the A_SIGNED/B_SIGNED/Y_SIGNED/... parameters of cells
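A rough sketch of what that lookup might look like on write_json output. In the JSON these flags sit under each cell's "parameters" object. The sample netlist below is hand-written for illustration (the module and cell names are hypothetical), and because the encoding of parameter values has differed between Yosys versions, the decoder accepts both plain integers and strings of binary digits.

```python
import json

# Hand-written fragment in the general shape of write_json output.
# Module and cell names are made up for this example.
SAMPLE = json.loads("""
{
  "modules": {
    "top": {
      "cells": {
        "mul0": {
          "type": "$mul",
          "parameters": {
            "A_SIGNED": "00000000000000000000000000000001",
            "B_SIGNED": "00000000000000000000000000000000",
            "A_WIDTH": "00000000000000000000000000001000"
          }
        },
        "ff0": {"type": "$dff", "parameters": {"WIDTH": 8}}
      }
    }
  }
}
""")


def param_to_int(value):
    """Decode a parameter given as an int or as a string of binary digits."""
    if isinstance(value, int):
        return value
    if isinstance(value, str) and value and set(value) <= {"0", "1"}:
        return int(value, 2)
    return None  # anything else (x/z bits, text) is not a plain number


def cell_signedness(netlist):
    """Map (module, cell) to its *_SIGNED flags, skipping cells without any."""
    result = {}
    for mod_name, mod in netlist.get("modules", {}).items():
        for cell_name, cell in mod.get("cells", {}).items():
            flags = {
                key: bool(param_to_int(val))
                for key, val in cell.get("parameters", {}).items()
                if key.endswith("_SIGNED")
            }
            if flags:
                result[(mod_name, cell_name)] = flags
    return result


print(cell_signedness(SAMPLE))
```

This only recovers signedness where an arithmetic cell consumes a signal; as noted in the conversation, the wires themselves carry no such information after the frontend.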
<yosys-questions>
@mwk: Let me look
<ZirconiumX>
So basically "this wire is signed" becomes "this cell input is signed", right?
<mwk>
yeah
<mwk>
except, well, casts can happen
<yosys-questions>
ZirconiumX: what stage should I tap in to get access to the parse tree?
<mwk>
ast
<ZirconiumX>
^
<daveshah>
But there may well be better frameworks than Yosys if you just want the parse tree
<mwk>
read_verilog has a "dump ast" option, but it's not really intended for machine consumption, so perhaps your best bet would be to throw in some C++ code that walks the AST tree
<yosys-questions>
Ok. Let me take a look at that. It's a bummer for me, as I pretty much have EVERYthing else that I need :)
<mwk>
also I kind of wonder if we should just add preserving wire signedness information in yosys
<daveshah>
It might be useful during vcd generation
<mwk>
I mean, we already have the "upto" and "start_offset" attributes which are basically semanticless HDL leftovers
<daveshah>
as much as anything else
<mwk>
and yeah
<mwk>
vcd
<daveshah>
They are needed for correct pin constraints
<yosys-questions>
Let me know if you all are interested, I wouldn't mind making the modification and sending a PR
<mwk>
I'd say it sounds reasonable
<yosys-questions>
at least taking a stab at it
<yosys-questions>
I have used write_json for a little bit of automation in the past, and it is really powerful I think
<ZirconiumX>
It is, but it's not really a stable interface AIUI
<daveshah>
Most of the changes haven't actually been breaking if you language-lawyer the original spec
<daveshah>
The bigger problem is the spec has been fairly poor
citypw has joined #yosys
Vinalon has joined #yosys
cr1901_modern has quit [Read error: Connection reset by peer]
<yosys-questions>
ZirconiumX What you said worries me a little bit. Do you mind elaborating in what ways? Is it that the write_json "API" isn't always updated and is missing some tests?
citypw has quit [Ping timeout: 240 seconds]
<daveshah>
It has seen some changes fairly recently that have broken some tools using it, partly correcting some original bad design decisions
<daveshah>
It should never be broken itself for long, given it is used for nextpnr
<ZirconiumX>
yosys-questions: daveshah says there's a poor specification for it, but my own understanding of write_json is that it's subject to change
<ZirconiumX>
Not stable as in "API stability" rather than "bugs"
anticw has quit [Remote host closed the connection]
<yosys-questions>
Ok, thanks!
anticw has joined #yosys
<yosys-questions>
To summarize here: API will exist, its form may change, use with caution (?)
<daveshah>
Yes
<daveshah>
The most likely changes are new fields being added
<daveshah>
which it is important your application can gracefully ignore
<daveshah>
Changes in the format used to encode parameter/attribute values are still possible
<daveshah>
but hopefully have settled down
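One way to follow that advice is to pull out only the fields a script depends on and let everything else pass through unread. The sketch below assumes the "direction" and "bits" fields of a port entry; the extra keys in the sample are invented stand-ins for fields a later Yosys version might add.

```python
# A module entry in the general shape of write_json output. The keys
# "some_future_field" and "some_future_section" are invented to stand in
# for additions the consumer does not know about.
MODULE = {
    "ports": {
        "clk": {"direction": "input", "bits": [2]},
        "data": {"direction": "output", "bits": [3, 4, 5, 6], "some_future_field": 1},
    },
    "some_future_section": {},
}


def read_ports(module):
    """Extract only the fields we rely on and ignore everything else."""
    ports = {}
    for name, port in module.get("ports", {}).items():
        ports[name] = {
            # Fall back to a placeholder instead of raising on a missing key.
            "direction": port.get("direction", "unknown"),
            "width": len(port.get("bits", [])),
        }
    return ports


print(read_ports(MODULE))
```

The point of the design is that no code path iterates over keys and rejects unexpected ones, so added fields cannot break the script.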
* mwk
mutters something about floats
<daveshah>
Oh yeah
<whitequark>
personally i would parse RTLIL
<whitequark>
but it's true that parsing JSON is easier for most people
<yosys-questions>
It really depends on the application: I mean for 200 lines of python that automates a register bus, my question was whether it is worth it to parse RTLIL. But if it's worth parsing RTLIL for future benefits I will look into it
<daveshah>
Parsing RTLIL has the big advantage that you can support unelaborated processes
X-Scale` has joined #yosys
<daveshah>
(ie always blocks before they become latches, FFs or mux trees, more or less)
<mwk>
... RTLIL is also not entirely stable, and harder to extend in backwards-compatible way than JSON
X-Scale has quit [Ping timeout: 258 seconds]
X-Scale` is now known as X-Scale
<whitequark>
yosys-questions: well, personally, i find writing parsers easy and relaxing, but that's just me
<whitequark>
daveshah: speaking of which, any idea if we can extend write_verilog so it'd stop complaining about unelaborated processes?
<whitequark>
it whines because some processes are inexpressible in verilog, but i doubt that we can't just detect which
<whitequark>
mwk: the syntax or the cells?
<daveshah>
Yeah, if you can detect the broken cases then that seems the best option
<mwk>
whitequark: syntax
<daveshah>
I don't know the Verilog backend well though
<tpb>
Title: Add parameter positional and default value information. by mwkmwkmwk · Pull Request #1945 · YosysHQ/yosys · GitHub (at github.com)
<mwk>
(also the same PR subtly changes RTLIL semantics by making parameter order actually meaningful)
<whitequark>
oh i see
<yosys-questions>
whitequark after I did a dump_rtilil, I notice that port signed information is already lost :-/
<whitequark>
yosys-questions: do you mean on module ports?
<whitequark>
yosys does not preserve any signedness information for nets after the frontend; it only has signedness information in the netlist when it is required for arithmetics and that's it
<whitequark>
it sounds like you want a tooling grade parser rather than a compiler frontend
Vinalon has quit [Remote host closed the connection]
Vinalon has joined #yosys
jakobwenzel has quit [Remote host closed the connection]
N2TOH_ has joined #yosys
N2TOH has quit [Ping timeout: 260 seconds]
npe has quit [Ping timeout: 256 seconds]
npe has joined #yosys
<yosys-questions>
daveshah whitequark I am looking into adding signedness as a field to RTLIL and make it a part of exports. Let me know if you think it's a bad idea!
<mwk>
tbh I think it shouldn't be a field
<yosys-questions>
mwk ZirconiumX ^
<mwk>
it'd be perfectly serviceable and much less work to just slap it as an attribute on the wire
<ZirconiumX>
^
<mwk>
the whole patch would probably be just a few lines of code in ast.cc
<yosys-questions>
that works as well ..
N2TOH has joined #yosys
<Asu>
whitequark: might have found a ~+7% performance opportunity after a bit of profiling. i need to double check that my implementation is correct and make my benchmarking more robust but it seems that inserting the memory write in ::update() at the right position in the queue to keep the vector sorted is quite a bit more efficient than performing a sort in ::commit(). do you want me to do a PR?
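The idea of keeping the queue sorted on insertion rather than sorting at commit time can be sketched in a few lines. This is not cxxrtl's code: the class names, the `priority` key, and the sequence counter are all assumptions made for illustration, and the relative cost of the two strategies in Python says nothing about a C++ vector.

```python
import bisect
import itertools


class SortOnCommit:
    """Append writes in arrival order, sort once when committing."""

    def __init__(self):
        self.queue = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def update(self, priority, index, value):
        self.queue.append((priority, next(self._seq), index, value))

    def commit(self, memory):
        self.queue.sort()
        self._apply(memory)

    def _apply(self, memory):
        for _, _, index, value in self.queue:
            memory[index] = value
        self.queue.clear()


class SortedInsert(SortOnCommit):
    """Insert each write at its sorted position, so commit needs no sort."""

    def update(self, priority, index, value):
        bisect.insort(self.queue, (priority, next(self._seq), index, value))

    def commit(self, memory):
        self._apply(memory)


writes = [(2, 0, "a"), (1, 0, "b"), (1, 1, "c"), (2, 1, "d")]
results = []
for queue_type in (SortOnCommit, SortedInsert):
    queue, memory = queue_type(), [None, None]
    for write in writes:
        queue.update(*write)
    queue.commit(memory)
    results.append(memory)

print(results)
```

Both variants must apply writes in the same order for the change to be safe, which is the correctness check mentioned in the message above.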
N2TOH_ has quit [Ping timeout: 250 seconds]
emeb has quit [Quit: Leaving.]
emeb has joined #yosys
cr1901_modern has joined #yosys
<whitequark>
Asu: absolutely
<whitequark>
I only wrote that memory code to be correct
futarisIRCcloud has joined #yosys
<Asu>
alright
adjtm has quit [Remote host closed the connection]
adjtm has joined #yosys
<anuejn>
is there a way to disable the execution of the SHARE pass in synth_xilinx?
<anuejn>
it takes a long time for me (hasn't finished yet; started 5m ago) and I would rather get a less optimal synthesis result than wait ;)
<ZirconiumX>
anuejn: That sounds like a bug, but I don't know how big your input design is
<anuejn>
it shouldnt be that big
<ZirconiumX>
How many lines of input source?
<ZirconiumX>
I'm inclined to assume you might have hit a bug in that pass
<anuejn>
about 20k lines of rtlil
<ZirconiumX>
That should be fine, then
<daveshah>
I've never seen that bad a case so this should definitely be looked at, but I think there is a general issue in Yosys that some of the SAT-based stuff can get stuck on difficult problems with minimal reward
<daveshah>
because there is no timeout at present, I think
<daveshah>
timeouts are slightly tricky to implement in a deterministic way
<anuejn>
maybe the problem is that i run the expose pass before?
<daveshah>
it would need to be based on some kind of iteration count inside the SAT solver, not a wall-clock timeout
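A toy illustration of that point: a solver that gives up after a fixed number of steps returns the same answer on every machine, whereas a wall-clock limit would depend on how fast the host happens to be. The brute-force search below is only a stand-in (real solvers typically count conflicts or propagations), and `solve`, `max_steps`, and `BudgetExceeded` are names invented for this sketch.

```python
from itertools import product


class BudgetExceeded(Exception):
    """Raised when the search uses up its step budget."""


def solve(clauses, num_vars, max_steps):
    """Brute-force SAT over DIMACS-style clauses with a deterministic budget.

    Each clause is a list of nonzero ints: 3 means x3, -3 means not x3.
    Returns a satisfying tuple of bools, or None if unsatisfiable.
    """
    steps = 0
    for assignment in product([False, True], repeat=num_vars):
        steps += 1
        if steps > max_steps:
            # Same input and same budget always stop at the same point.
            raise BudgetExceeded(f"gave up after {max_steps} steps")
        if all(
            any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            return assignment
    return None


problem = [[1, 2], [-1], [2]]  # (x1 or x2) and (not x1) and x2
print(solve(problem, num_vars=2, max_steps=10))

try:
    solve(problem, num_vars=2, max_steps=1)
except BudgetExceeded as exc:
    print("budget hit:", exc)
```

A pass using such a budget could then skip the optimization when the budget is hit, giving a reproducible result at the cost of a possibly less optimal netlist.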
<daveshah>
That shouldn't cause a major performance change afaik
<daveshah>
if it does, then it definitely sounds like a bug
<anuejn>
it works if i use expose correctly and let it only expose signals from the toplevel :)
<anuejn>
so it was mostly my dumbness
zkms has quit [Quit: zkms]
zkms has joined #yosys
<ZirconiumX>
5000. Executing Verilog backend.
<ZirconiumX>
I managed it
Cerpin has quit [Quit: leaving]
Cerpin has joined #yosys
jfcaron has quit [Ping timeout: 256 seconds]
Asu has quit [Quit: Konversation terminated!]
somlo has quit [Ping timeout: 265 seconds]
vidbina_ has quit [Ping timeout: 265 seconds]
vidbina_ has joined #yosys
craigo has quit [Ping timeout: 260 seconds]
<ZirconiumX>
mwk: you around
<ZirconiumX>
?
<mwk>
yes
<ZirconiumX>
I have spent literally all day trying to bugpoint a Quartus ICE
<ZirconiumX>
I think I found the cause
<ZirconiumX>
wire [31:0] \cpu.cpuregs.regs[10] ;
<ZirconiumX>
And then later
<ZirconiumX>
.q(\cpu.cpuregs.regs[10] [31]),
<ZirconiumX>
And I'm *sure* that's not going to cause problems
<mwk>
so what, you're saying that their verilog^Wvqm parser is so shit, it cannot deal with [ in backslashed name?
<ZirconiumX>
Both, actually
<ZirconiumX>
I checked: it's not a parser bug, but an elaboration bug, I think
<mwk>
alright, but... is it about the name?
<ZirconiumX>
If I rename the variable I get a syntax error. Which I'm not sure if it's a good thing or bad thing
<ZirconiumX>
Equally I'm not sure if this is actually trying to represent a memory or something
* mwk
is still not quite sure what you're saying
<ZirconiumX>
"Quartus ICEs and I think it's because of the variable name, but I don't know Verilog enough to tell if this is a literal [10] or referring to a memory in this context"
<ZirconiumX>
Because for example, `reg [31:0] foo[10]` is a memory, right?
<mwk>
should be literal [10]
<mwk>
this is post-yosys-synthesis, right?
<ZirconiumX>
Yes
<mwk>
then I'd guess the wire is a result of yosys implementing a memory by flops
<mwk>
and it *was* a memory at some point, but is now just a plain wire
<mwk>
(with a strange-ish name)
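The "literal [10]" reading follows from how Verilog escaped identifiers work: one starts at the backslash and runs up to the next whitespace, so everything before the space is the name and only what follows can be a bit select. A small sketch of that rule, with a hypothetical helper `parse_ref` and a deliberately simplified regular expression that handles only this case:

```python
import re

# Backslash, then the name up to the first whitespace, then an optional
# [bit] or [msb:lsb] select. Simplified for illustration only.
ESCAPED_REF = re.compile(r"\\(\S+)\s+(?:\[(\d+)(?::(\d+))?\])?")


def parse_ref(text):
    """Split an escaped-identifier reference into (name, select).

    `select` is (msb, lsb), or None when there is no bit select.
    """
    match = ESCAPED_REF.match(text)
    if match is None:
        raise ValueError(f"not an escaped identifier: {text!r}")
    name, msb, lsb = match.groups()
    if msb is None:
        return name, None
    return name, (int(msb), int(lsb if lsb is not None else msb))


# The two lines quoted in the chat: the declaration and the later use.
print(parse_ref(r"\cpu.cpuregs.regs[10] ;"))
print(parse_ref(r"\cpu.cpuregs.regs[10] [31]"))
```

Under this rule the "[10]" is part of the wire's name and "[31]" selects bit 31 of that 32-bit wire.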
<whitequark>
mwk: fun fact: yosys has similar issues, and i hate that it does
<ZirconiumX>
<mwk> then I'd guess the wire is a result of yosys implementing a memory by flops <-- yes, this is a case of memory inference being disabled
<whitequark>
actually, the exact same issue, i think
<mwk>
I mean, uhhh
<mwk>
isn't that just a problem with flatten having to call the flattened wires *something*?
<whitequark>
the problem in both cases is that the names are treated as having internal structure
<whitequark>
i.e. in-band signaling
<whitequark>
whereas in fact there is no way for them to have any self-consistent internal structure
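A small illustration of why such names cannot carry reliable structure: since an escaped identifier may itself contain dots and brackets, different hierarchies can flatten to the same string, and nothing in the string says which one was meant. The `flatten` helper and the hierarchy paths here are made up for the example and do not reflect how any particular tool builds its names.

```python
def flatten(path):
    """Join hierarchy components with dots, as a flattening pass might."""
    return ".".join(path)


# Three levels: instance "cpu", instance "cpuregs", wire "regs[10]".
three_levels = flatten(["cpu", "cpuregs", "regs[10]"])

# Two levels: one instance literally named "cpu.cpuregs", wire "regs[10]".
two_levels = flatten(["cpu.cpuregs", "regs[10]"])

print(three_levels)
print(two_levels)
print("same flattened name:", three_levels == two_levels)

# Splitting on dots guesses the first hierarchy and cannot tell that the
# second one was ever possible; the trailing "[10]" is equally ambiguous
# between a memory index and plain characters in a wire name.
print(three_levels.split("."))
```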
<mwk>
.. right
<ZirconiumX>
Can we blacklist [] in variable names for write_verilog then?
<mwk>
ZirconiumX: if you want a quick check whether it's names that cause problems for quartus, go to backends/verilog/verilog_backend.cc line 50 and change "if (*str == '$' && may_rename && !norename)" to "if (may_rename)"
<mwk>
... not quite sure if that's correct yet though, let's see
<whitequark>
ZirconiumX: or try `rename -hide *[* *]*`