Thorn has quit [Read error: Connection reset by peer]
AlexDaniel has joined #yosys
janrinze has joined #yosys
<janrinze>
hi. after the relut issues performance was restored but now the latest master seems to have a significant performance reduction again. Apparently relut has been removed and a new strategy has replaced it. Anyone else seeing this too?
<janrinze>
doing a rebuild of yosys now to see if it was caused by a build issue. (sometimes changes don't propagate fully and a rebuild is required after git pull.)
<tnt>
relut was not removed. the bug was just fixed. (two things were merged at once and they conflicted to create a perf issue)
pepijndevos[m] has joined #yosys
rrika has quit [Ping timeout: 245 seconds]
rohitksingh has joined #yosys
rrika has joined #yosys
X-Scale has quit [Ping timeout: 248 seconds]
rohitksingh has quit [Ping timeout: 245 seconds]
<janrinze>
tnt: did you see commit d5e8c0e6d33de71493855eca72fcc454a67a6140 ?
rohitksingh has joined #yosys
<tnt>
Arf, no I had.
<tnt>
not
<janrinze>
tnt: care to try out the latest master? I'm curious if you see similar results
<tnt>
yeah, already building it ...
<tnt>
but my laptop is like 5y old, it takes a while :p
<janrinze>
yeah, i can relate.
<janrinze>
tnt: i have a simple design that ran at 100MHz and now can only reach 68MHz. that's quite a setback.
<tnt>
I can top that.
<tnt>
ERROR: timing analysis failed due to presence of combinatorial loops, incomplete specification of timing ports, etc.
<tnt>
It went from working to ... not working.
<tnt>
so yeah, I'd say yosys master is broken ATM
<janrinze>
commit ea8ac8fd7484cc7c3b8929ae339f9aeb49403c36 too
<janrinze>
oh, the presence of combinatorial loops messages have been bugging me too recently.
<janrinze>
Thought it was my design but clearly it's not just me :-)
<janrinze>
70% speed is quite annoying.
<janrinze>
My cpu went back from 52MHz to 34.5MHz!
<janrinze>
in all quite a bit of regression i.r.t. speed
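A quick check of the "70% speed" remark, using only the Fmax figures quoted above (100 MHz to 68 MHz for the simple design, 52 MHz to 34.5 MHz for the cpu). This is just a sketch of the arithmetic; the helper name `speed_ratio` is made up for illustration.

```python
def speed_ratio(before_mhz, after_mhz):
    """Return the new Fmax as a fraction of the old Fmax."""
    return after_mhz / before_mhz


# Figures as quoted in the log.
simple_design = speed_ratio(100.0, 68.0)
cpu = speed_ratio(52.0, 34.5)

print(f"simple design: {simple_design:.1%} of previous Fmax")
print(f"cpu:           {cpu:.1%} of previous Fmax")
```

Both designs land a little under the quoted 70%.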
<janrinze>
If it has been replaced then I wonder if there is a new flag to be used for optimization.
<janrinze>
tnt: with a build of the last commit on aug 7 everything is okay?
<tnt>
So I found a way to make it pass ... but the design is now 800 LCs larger (2700 -> 3500) and ~ 10% slower.
<tnt>
which commit do you want me to test ?
<janrinze>
commit f69410daaf68cd3cef5e365df9b27c623ce589a7 should be the last one of aug 7
<janrinze>
tnt: i'm building that one now too and hopefully will see what the differences are.
pepijndevos[m] has left #yosys ["User left"]
pepijndevos[m] has joined #yosys
AlexDaniel has quit [Remote host closed the connection]
<tnt>
janrinze: seems to work better ... fmax is a bit lower but that's probably just randomness ..
<tnt>
LCs at least are back to "normal" (just 3 LCs more)
<tnt>
nextpnr still crashes at the end with "terminate called after throwing an instance of 'std::out_of_range'"
<tnt>
maybe there was some incompatible change between yosys / nextpnr ...
emeb has joined #yosys
<daveshah>
There was a breaking change in the JSON, new nextpnr with old JSON (where there is a problem, mostly ecp5) will give a sensible error but it wasn't possible to do anything with new JSON and old nextpnr
<daveshah>
Although annoying it finally means we have unambiguous parameters in all cases
<tnt>
Rebuilding nextpnr now. But yeah, looks like f69410daaf68cd3cef5e365df9b27c623ce589a7 is fine. But master is not.
develonepi3 has joined #yosys
<janrinze>
tnt: just tested commit f69410daaf68cd3cef5e365df9b27c623ce589a7 with my simple design and it gives 100 MHz again.
<tnt>
ac2fc3a144fe1094bedcc6b3fda8a498ad43ae76 is what screws it up for me.
X-Scale has joined #yosys
<tnt>
janrinze: feel free to open an issue on github
emeb has quit [Quit: Leaving.]
<janrinze>
tnt: looking for an example that is small enough to show in an issue.
emeb_mac has joined #yosys
Thorn has joined #yosys
<tnt>
given the merge branch name that screws it up is ice40_full_adder, I'd say anything with an adder would be bad :p
rohitksingh has quit [Ping timeout: 245 seconds]
<janrinze>
tnt: the unlut part was intended to undo lut allocation by abc and allow further optimization of carry and lut. It seems that abc9 now supports full adder and will emit those. Unfortunately it seems it's not very smart about it.
<daveshah>
I don't think abc9 is emitting full adders, just perhaps optimising around them
<daveshah>
I'm not even sure what that PR was about, I suspect it might be best to revert it
<tpb>
Title: Wrap SB_LUT+SB_CARRY into $__ICE40_CARRY_WRAPPER by eddiehung · Pull Request #1266 · YosysHQ/yosys · GitHub (at github.com)
<daveshah>
With the somewhat intricate carry structure in the iCE40, it's easy to trigger edge cases that result in ridiculous amounts of feed throughs being generated in an attempt to legalise them
<tpb>
Title: Revert "Wrap SB_LUT+SB_CARRY into $__ICE40_CARRY_WRAPPER" by daveshah1 · Pull Request #1280 · YosysHQ/yosys · GitHub (at github.com)
citypw has quit [Ping timeout: 244 seconds]
<pepijndevos>
What's the most gate-efficient way to write synchronous logic expressions? I currently have a big ball of nested if and case statements.
emeb_mac has quit [Ping timeout: 245 seconds]
<tnt>
pepijndevos: my best results have come from either describing it as close as possible to the exact logic I want (i.e. do the synthesis in my head and describe that), or describing it as bare logic equations (OR of ANDs) that I minimized externally.
rohitksingh has quit [Ping timeout: 272 seconds]
AlexDaniel has joined #yosys
<janrinze>
daveshah: i'm building yosys from branch revert-1266-eddie/ice40_full_adder now to see if the performance regression is fixed with that too.
rohitksingh has joined #yosys
<pepijndevos>
tnt, so far the pieces of logic I managed to extract as asynchronous assignments were definitely smaller than what I had.
<tnt>
pepijndevos: I'm not surprised :)
<tnt>
pepijndevos: is the code you're working on somewhere public btw ?
<pepijndevos>
So basically I want to reach a situation where my sequential process is JUST unconditional assignments.
<tnt>
you want to minimize them. and especially you want to minimize the dependencies on signals that don't matter.
<tnt>
with nested if and conditions it's easy to hardcode a 'priority' or force a signal in a state in some case where in fact in that case the actual value doesn't matter.
<pepijndevos>
Right
<pepijndevos>
What do you mean by "minimize" though?
<pepijndevos>
And also... for example the next state of b is quite complicated, so it's not trivial to turn all of them into an async assignment
<tnt>
I mean do as best you can :)
<tnt>
Something that can help as well (especially if you're shooting for area), is to extract the wide muxes and only generate the control signal in the large switch.
<tnt>
For instance if B can only take 4 different values, but which one is complex, you manually create the mux and you only generate the 'selection' signal in the large case.
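The mux-extraction idea tnt describes can be sketched as a small behavioral model. This is Python rather than HDL, and the states, the `flag` input and the four candidate values are all hypothetical, chosen only to show the shape of the refactoring: the large case produces a 2-bit selection, and a separate 4-way mux picks the wide value.

```python
# Hypothetical candidate values that B can take (not from the original design).
VALUES = (0x00, 0x3C, 0xA5, 0xFF)


def next_b_nested(state, flag):
    """Style 1: the wide value is assigned directly inside the big case."""
    if state == 0:
        if flag:
            return VALUES[1]
        return VALUES[0]
    elif state == 1:
        return VALUES[2]
    elif state == 2:
        if flag:
            return VALUES[3]
        return VALUES[2]
    return VALUES[0]


def select_b(state, flag):
    """Style 2, step 1: the big case only produces a narrow select signal."""
    if state == 0:
        return 1 if flag else 0
    if state == 1:
        return 2
    if state == 2:
        return 3 if flag else 2
    return 0


def mux4(sel, values):
    """Style 2, step 2: one explicit 4-way mux on the wide data."""
    return values[sel]


def next_b_muxed(state, flag):
    return mux4(select_b(state, flag), VALUES)


# Exhaustive check that both styles describe the same function.
equivalent = all(
    next_b_nested(s, f) == next_b_muxed(s, f)
    for s in range(4)
    for f in (False, True)
)
print("equivalent:", equivalent)
```

In HDL the second style keeps the wide data path out of the control logic; whether that actually saves area depends on the design and the synthesis flow.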
<pepijndevos>
Ah I see
<pepijndevos>
Making progress...
<pepijndevos>
If someone just made a "sufficiently smart" compiler...
<janrinze>
pepijndevos: do you use vhdl with yosys?
<tnt>
janrinze: did it work ?
<tnt>
janrinze: (and the commercial version of yosys has a vhdl frontend)
<janrinze>
daveshah: f.w.i.w. the branch revert-1266-eddie/ice40_full_adder produces 100 MHz results again for my design.
<pepijndevos>
janrinze, YES! I'm using GHDL for everything except formal at the moment.
<pepijndevos>
Formal verification with GHDL is the next thing on my todo list :)
<janrinze>
pepijndevos: GHDL? I should take a look at that. Does it translate to verilog in the backend?
s_frit has quit [Remote host closed the connection]
rohitksingh has quit [Ping timeout: 272 seconds]
s_frit has joined #yosys
<tnt>
pepijndevos: oh interesting, I thought you were using verific.
<janrinze>
pepijndevos: GHDL seems to be only a simulator. How do you use that with yosys?
<daveshah>
janrinze: if your design is public, could you add a link to it and a comment about the Fmax details on the revert PR?
<janrinze>
daveshah: it's a tta cpu, I'm still working on it. No real stuff to test or verify yet.
<daveshah>
No worries
<janrinze>
daveshah: TTA is very attractive for small FPGAs. It's a 16 bit cpu with 8 KB program and 8 KB data memory, and a 32 bit ALU.
<janrinze>
daveshah: the core of the cpu is one instruction : mov Rd,Ra
<janrinze>
daveshah: Rd and Ra are 8 bit references to a register in any of the 16 module slots.
<janrinze>
daveshah: so each module has 16 write and 16 read registers.
rohitksingh has joined #yosys
<janrinze>
daveshah: It's an experiment in TTA cpus that i wanted to try. With the HX8K it can run at 100 MHz!
<tnt>
janrinze: did you try an up5k ? (just to see fmax)
<janrinze>
tnt: my 16bit RISC runs on both hx8k and up5k. the up5k being about 50% the speed of the hx8k
<janrinze>
tnt: I'll check the timings for up5k
<tnt>
How's the LC count ?
<janrinze>
tnt: for the TTA or the RISC16?
<tnt>
Both I guess :p
<janrinze>
tnt: the TTA is ICESTORM_LC 1693/ 7680 22% , using 1 16x16 registerfile , 1 load/store module and 1 ALU (32 bit).
<janrinze>
tnt: on the up5k that is 1693/5280 or 32%
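The two utilization figures follow directly from the LC counts quoted above (1693 LCs used, 7680 available on the HX8K, 5280 on the up5k). A minimal sketch of the arithmetic; the function and constant names are made up for illustration.

```python
def utilization(used, available):
    """Fraction of logic cells used."""
    return used / available


# Counts as quoted in the log.
HX8K_LCS = 7680
UP5K_LCS = 5280
tta_lcs = 1693

hx8k = utilization(tta_lcs, HX8K_LCS)
up5k = utilization(tta_lcs, UP5K_LCS)

print(f"HX8K: {hx8k:.0%}, up5k: {up5k:.0%}")
```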
<janrinze>
tnt: up5k has 128KB SPRAM but only 15 KB Block RAM.
rohitksingh has quit [Remote host closed the connection]
<janrinze>
tnt: the up5k has a disappointing 40MHz result for the TTA.
<tnt>
ok.
<janrinze>
tnt: on the icoboard I have not been able to get sram to work reliably above 35 MHz. Something to do with the pin delays i think.
<daveshah>
40MHz is pretty good going for a up5k, tbh
<janrinze>
daveshah: yes, i think it's about the limit for the up5k with anything practical
<tnt>
janrinze: as soon as you go outside bidirectionally you need to take all the IO delays into account and AFAIK nextpnr doesn't provide output clock-to-out or input setup/hold numbers.
<janrinze>
tnt: the RISC16 SoC takes ICESTORM_LC 3776/ 7680 49%
<tnt>
That's a bit large for my taste :p
<janrinze>
tnt: unfortunately there is no way to tell the toolchain that the address to data delay is 10ns
<janrinze>
tnt: that SoC has VGA 512x384 2bit , HW MUL, HW DIV, SPI access to SDcard etc..
<janrinze>
tnt: in only 49% of the HX8K I think that is a respectable 'small' size.
<tnt>
Ah oki that's not just the cpu.
<janrinze>
tnt: it's about the same as a home computer from the 80's
<tnt>
also hw mul/div ... I guess you're not using the dsp ?
<janrinze>
On the up5k I do use the DSP. Not on the HX8K
<tnt>
obviously :p
<janrinze>
daveshah: will we see some tooling for defining I/O delay in the near future?
<daveshah>
Probably not, it's nowhere near the top of the todo list
<tnt>
janrinze: you can't really _define_ IO delays btw, the tool would just report them mostly.
<tnt>
It's up to you to actually 'by design' make it so it works for you.
<daveshah>
I'm not even sure if we have enough data in icebox, tbh
<daveshah>
We have some slightly questionable numbers for the IO primitives but ime the vendor tools use numbers depending on voltage and load capacitance
<tnt>
first thing is to always use IO registers for in/out, this way those delays are constant.
<janrinze>
tnt: the SoC can run at 58MHz if no external RAM is used. Unfortunately 16KB is way too small for a SoC with a VGA framebuffer.
<daveshah>
Yeah as tnt says I don't think the RAM issue is one the tool would be able to fix itself
<tnt>
That's a "text mode" (with user definable glyphs) core for the up5k using the SPRAMs.
<daveshah>
But better timing analysis would at least report issues
<janrinze>
tnt: text mode would work, I guess. Still the code from the ROM is already 16KB..
<janrinze>
I'll see if i can hack up a rom that is 8KB and put the rest (vga txt, stack etc.) in the other 8k.
<tnt>
janrinze: use one spram for the code :) (that's what I do. I have a minimal boot code in ROM that loads the rest from SPI to the spram)
<daveshah>
Could you add wait states to run the external RAM slower than the CPU core?
<janrinze>
tnt: on the up5k it runs at 22MHz and has 128KB SPRAM available. no external chips
<daveshah>
And then put speed critical stuff available in internal ram
<tnt>
janrinze: ah oki.
<janrinze>
daveshah: waitstates could work but i think cache has a better chance of getting higher speeds to work.
<pepijndevos>
janrinze, look at ghdlsynth-beta
<janrinze>
pepijndevos: yes, already found that but no time yet to go through it.
<pepijndevos>
It's extremely beta
<janrinze>
I noticed it said that in the README too :-D
<pepijndevos>
hrm... I rewrote all of my code as async with just unconditional assignments, and it turns out that yosys is smarter than me on these last bits.
<janrinze>
pepijndevos: there is a lot of vhdl around that would be interesting to test with yosys.
<pepijndevos>
Certainly
<pepijndevos>
I know a few people who'd like to run popual retro computing simulators on yosys :)))
<pepijndevos>
*popular
<janrinze>
pepijndevos: I tried to write 'efficient' verilog in the past but noticed that yosys is usually very smart at compiling high level verilog. So usually no need to worry about the code.
<pepijndevos>
Yea... usually, but a few bits that I moved outside of the process saved a lot.
<janrinze>
pepijndevos: I've been doing retro computing for 15 years now. Specifically 6502 based and ARM3
<janrinze>
pepijndevos: I designed a few CPUs to do some alternative computing platforms. Something between an Apple II GS and an Acorn Archimedes.
<janrinze>
tnt: any experience with ecp5?
<janrinze>
tnt: The amount of BRAM seems very adequate for my SoC
<tnt>
janrinze: nope, not yet ... I plan to get acquainted with it during CCC camp in a couple of weeks.
<tnt>
daveshah: will you be at camp btw ?
* pepijndevos
jealous
<daveshah>
No, I'm not
<pepijndevos>
I completely missed out on camp ticket sale
<janrinze>
daveshah: any quick pointers for using nextpnr-ecp5 instead of nextpnr-ice40 ? I'd like to get a feel for what the ecp5 can do