genii has quit [Remote host closed the connection]
AndresNavarro has joined ##openfpga
futarisIRCcloud has joined ##openfpga
rohitksingh_work has joined ##openfpga
Bike has quit [Quit: leaving]
AndresNavarro has quit [Quit: rcirc on GNU Emacs 25.2.1]
pie__ has joined ##openfpga
AndresNavarro has joined ##openfpga
pie___ has quit [Ping timeout: 245 seconds]
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
emeb_mac has quit [Quit: Leaving.]
m4ssi has joined ##openfpga
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
if anyone has any ecp5 boards and some time, I could use some help testing out multiboot. my go-to test is just two bitstreams that blink different LEDs
Miyu has joined ##openfpga
in my tests so far, it works as well as the official tool, but that means not on everything, so it works ok on 1 of the 2 boards I have
wasn't sure if the database was updated after the merge
ayjay_t has quit [Read error: Connection reset by peer]
it works great on my ecp5-evn but the second bitstream doesn't work at all on another board I have. but that's no different from the lattice tools
ayjay_t has joined ##openfpga
rohitksingh_work has quit [Ping timeout: 250 seconds]
rohitksingh_work has joined ##openfpga
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
rohitksingh_wor1 has joined ##openfpga
rohitksingh_wor1 has quit [Client Quit]
mumptai has joined ##openfpga
rohitksingh_work has quit [Ping timeout: 245 seconds]
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
Asu has joined ##openfpga
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
indy has quit [Ping timeout: 250 seconds]
emily has quit [Remote host closed the connection]
emily has joined ##openfpga
AndresNavarro has quit [Ping timeout: 255 seconds]
Flea86 has quit [Quit: Goodbye and thanks for all the dirty sand ;-)]
rohitksingh has joined ##openfpga
daveshah: so
i am thinking about a soft PLL on ice40
using the LUT cascade output as a delay line
and using the metastable state of DFFs to sample at a higher resolution than can be afforded by the delay line
this hinges on the duration of metastable state being comparable to delay in the LUT
This is not something I really know anything about though
iCE40 LUTs are fairly slow
somehow they manage 60MHz or so
sxpert: you can get pipelined single lut layer logic at way higher than 60M even on a up5k.
yeah, though you need many pipeline stages I suppose
would you need to write this in a rather specific way ?
I always keep in mind the number of LUT layers between two FFs when writing logic.
how does one count that ?
But of course depends on what you're doing. Like if I'm doing a DSP algo with no feedback, I can pretty much put a FF after each lut layer "for free" since the FF is there in the LCs anyway.
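A minimal sketch of the "FF after each LUT layer" idea, assuming a feed-forward design with no feedback. The module name `parity16_pipelined` and its ports are hypothetical and not taken from anyone's code in this channel; each registered bit is a function of at most 4 inputs, so it would be expected to fit one iCE40 LC (LUT4 plus its FF).

```verilog
// Hypothetical example: 16-input parity split into two LUT4 layers,
// with a register after each layer (two clocks of latency).
module parity16_pipelined (
    input  wire        clk,
    input  wire [15:0] din,
    output reg         parity
);
    reg [3:0] stage1;  // four partial parities, each a 4-input function

    always @(posedge clk) begin
        // Layer 1: each bit depends on exactly 4 inputs.
        stage1[0] <= ^din[3:0];
        stage1[1] <= ^din[7:4];
        stage1[2] <= ^din[11:8];
        stage1[3] <= ^din[15:12];
        // Layer 2: one more 4-input function of the registered partials.
        parity    <= ^stage1;
    end
endmodule
```

With this structure there is only one LUT layer between any two FFs, at the cost of the result arriving two clocks after `din`.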
sxpert: when I write logic, I just 'map it' in my head to LUT4.
I know what can fit in an iCE40 LC and what doesn't.
I see
usually I always have a paper drawing of the exact logic I want. Then I write verilog that describes that.
Only time I went away from that recently was the decode logic for a custom soft core ... and that went pretty bad. I tried describing it behaviorally with verilog 'case' and let the synthesis figure it out. Result was awful. In the end, I ended up writing an external tool to convert that behavioral description into a huge truth table, then ran that through an external logic optimizer and fed the basic logic equations to yosys, and that worked way better.
* sxpert wonders how to optimise his stuff
tnt: flowmap produces optimal LUT depth for any given logic
this is polynomial time for LUTs, unlike AIGs etc
it produces optimal LUT depth for a given logic netlist
if I understand tnt, yosys has issues with optimizing large 'case' statements ?
right, that's what i said, no?
afaik there are transformations to the netlist (like balancing) that reduce depth without changing function
whitequark: the main issue was that yosys was not taking good advantage of the lot of 'don't cares'
ohh, i see what you mean
tnt: hmmm, that might be possible to improve
That was a very simple test case to illustrate what I meant.
whitequark: is your flowmap work merged in yosys btw ? Or do I have to use a branch ? Curious to try if it helps the mapping of the base logic equations.
flowmap is upstream
all of my code that is ready to use, anyway
m4ssi has quit [Ping timeout: 240 seconds]
m4ssi has joined ##openfpga
rohitksingh has quit [Remote host closed the connection]
rohitksingh has joined ##openfpga
whitequark: any 'howto' to integrate it in an ice40 flow ? It's not in the default synth_ice40 'macro' right ?
it is
wait, hm
that part's not upstream, it looks like
tnt: ok. stop just before map_gates in synth_ice40. run simplemap; flowmap -maxlut 4
emeb has joined ##openfpga
rohitksingh has quit [Ping timeout: 246 seconds]
bibor has quit [Quit: WeeChat 1.6]
bibor has joined ##openfpga
_florent_: thanks!
whitequark: tx, that works. Results just for that comb block are pretty similar to the default synth_ice40 in # LUTs (input verilog is just a bunch of logic equations expanding a 16 bit opcode into 58 control signals). Not sure if there is an easy way to measure the depth of generated logic.
tnt: -debug will tell you
for flowmap that is
tnt: the main advantage of flowmap over abc is that it preserves all signal names
Oh wait, I just inserted simplemap; flowmap -maxlut 4 right before map_gates, but I left everything, so AFAICT it's still running abc.
so you could run flowmap instead of ABC ?
flowmap is a replacement for abc
and it's supposedly better ?
it produces less space efficient designs that are approximately as fast
in fact i have a partial implementation...
tnt: you do not need any abc commands at all
yeah, I just tried doing the final techmap after flow map (since it's only comb logic, no need for the rest).
it works, but it uses about twice as many LUTs.
genii has joined ##openfpga
that's expected
flowmap is not optimizing for LUT count at all
could there be a later stage where those luts are factorized ?
yes, FlowMap-Area
whitequark: That's useful, most of the time synthesizers over optimise logic, causing heavy logic congestion. It's nice being able to fine tune what gets optimised and what might benefit from not being optimised.
ylamarre: even better
flowmap-r has a tunable tradeoff between area and depth
so you can trade one logic level for less area
or two
abc can't do this!
Since signal names are conserved, we don't have to fight the tools to put the constraints.
"over optimize" ... "mis optimize". My experience is they _think_ they do something good ... but end up doing something that's worse than just doing the obvious.
also sounds easier to debug
btw, flowmap is a very very early ancestor of the synthesis algorithm that's now in vivado
the team that worked on flowmap was sponsored by, and produced results for, xilinx
whitequark: Ah, makes sense!
and their algorithms are often an improvement over their predecessor that are something like "we took a superexponential algo and made it polynomial"
or "we took a polynomial algo and reduced the constant by a few orders of magnitude"
I remember using such feature in Vivado and it was deeply appreciated.
in fact i think flowmap is the basis of *all* modern LUT synthesis
tnt: It's not so obvious to estimate routing congestion during synthesis. Especially in what I'd call "outlying design".
hm, not sure about abc
Richard_Simmons has joined ##openfpga
Bob_Dole has quit [Ping timeout: 264 seconds]
sxpert: A good way to estimate logic is to check for the "worst case". If well coded/thought out, there's not much to optimise (others should correct me here).
Especially on something like an ice40 where you don't have a lot of specialty logic.
If you have a wide comparison, you might be able to take advantage of the carry propagation logic, but otherwise it all goes in the LUTs.
IMO, if your design is going (and fitting) on a smallish FPGA, you have a good enough idea on how it maps...
That's what I love about the ice40, it's small enough I can pretty much fit it in my head :)
this is what people say about c and microcontrollers too
but i'm not very convinced
sure, if i am doing an ALU that will be replicated 16 or 32 times, i will hand optimize it
(but i still rely on yosys to infer correct carry chains and such)
It's more about estimating logic usage, than getting accurate results, knowing if you should add a layer of pipeline somewhere or does it still fit in your LUT.
Obviously you have to pick your battles. That's why the decode logic, I mostly did it using existing logic optimizers and I didn't go and hand code every LUT4.
But the ALU/execution unit, I knew it'd be the critical path and there, I went way lower to make sure things would be exactly as designed.
am recoding my alu to what I've learned...
ylamarre: I wonder if actual coding style has any bearing on the generated logic
depends what you mean by 'style' ... but yeah, in general, synthesis tool can infer different things depending how it's written.
getting them to reliably do what you want is an art form.
Now, o_bus_load_dp should just be assigned its logic...
It's not a reset thing...
it's a control line to the bus controller
Now there are "2 schools of thought": reset condition at the beginning or at the end of your always block.
sxpert: My point still stands.
One problem with the reset at the beginning and the rest of the logic in an `else` is that if you decide not to put reset on a signal (e.g. for datapaths), then !reset becomes a clock enable for that signal, which can waste routing/resources
yep. this is why nmigen puts reset at the end of the block in an `if`
I'm on two threads here, so I'm a little slow on the reply, sorry.
I'll finish why o_bus should just be its own line, while people can explain the reset thing.
Reset at the beginning is legacy style and its only advantage is you see your "common" reset at the beginning of your block.
ylamarre: ok, just tried, o_bus_load_dp can't be a wire, as start_load_dp is on for a very short time
Not a wire, but you can/should/will put logic on your reg assignation line!
ah, so put that as a similar logic in a always @(*) block ?
Give me a few minutes to trace this mess... I'll do it for this signal only, but it'll be a great explanation of what we mean by coding style drives/helps the tools...
always @(*) begin
if (start_load_dp) o_bus_load_dp = 1;
Then hopefully someone can put the explanation in a nice tweet post, thanking @gatin00b
if (xfr_copy_done) o_bus_load_dp = 0;
end
this makes it work
sxpert: please, give me a few minutes... it'll be very nice when all compact and sexy.
* sxpert waits ;)
Ok, there's just so much stuff to say it'll take more than a few minutes actually... so I'll go with something else that'll explain what I wanna convey here...
So one good practice I'd give you is to keep resets or presets buffered. Meaning they should come from a FF
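A minimal sketch of what "buffered" could look like, assuming the raw reset request is already synchronous to `clk`. The names `reset_buffer` and `rst_in` are hypothetical; a reset coming from an asynchronous source (e.g. a button) would additionally need a proper synchronizer, which this sketch does not show.

```verilog
// Hypothetical example: the reset seen by the rest of the design is
// driven directly by a FF, not by combinational logic.
module reset_buffer (
    input  wire clk,
    input  wire rst_in,  // raw reset request (may come out of some logic)
    output reg  rst      // registered reset, one clock later
);
    always @(posedge clk)
        rst <= rst_in;
endmodule
```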
So you have this module that is preset or reset at some point by a signal 'rst'
something like "blah <= ~blah;" ?
* sxpert missed something there
So you want something like (reset at the bottom style):
always @(posedge clk) begin
    my_reg <= selector ? reg_b : reg_a;
    if (rst == 1'b1) begin
        my_reg <= 1'b0;
    end
end
In this case, reset was probably useless since it's a mux, but I didn't have a good example.
You need more code...
lemme expand a bit..
i'll include reg_a and reg_b and we'll have something to work with...
and it'll give us something to compare legacy reset and at-bottom reset.
always @(posedge clk) begin
    reg_a  <= reg_a + 4'b1;
    my_reg <= selector ? reg_b : reg_a;
    reg_b  <= 4'hA;
    if (rst == 1'b1) begin
        reg_a <= 4'b0;
        reg_b <= 4'h5;
    end
end
always @(posedge clk) begin
    if (rst == 1'b1) begin
        reg_a <= 4'b0;
        reg_b <= 4'h5;
    end
    else begin
        reg_a  <= reg_a + 4'b1;
        reg_b  <= 4'hA;
        my_reg <= selector ? reg_b : reg_a; // "Wrong" as it introduces a CE
    end
end
Ok, here we go...
So the first case shows what we mean by reset-at-bottom.
The top part of the always block is our logic and the bottom part is handling our control signal logic.
By control signal I mean: PRESET, RESET and CE (Clock Enable)
In the second case, rst has to go through an inverter and then to the CE pins of my_reg's FFs
This adds unnecessary logic and routing.
sxpert: you're not targeting the ice40 right ?
my_reg is not reset because it doesn't need to be reset since its value won't be used when reset is deasserted (somewhere else in the design)
It doesn't matter which FPGA is targeted, crap coding style is crap coding style ;)
ylamarre: I'm raising it because the arch of the ice40 makes async reset much preferable imho. For others (ecp/xilinx/...) not so much.
Notice, I also provide the length of all my signals. This ensures there are no surprises if register width should expand later on.
tnt: why?
ylamarre: in the ice40, sync reset is gated by CE. (so you need CE=1 for the RST line to work).
Oh! good catch!
m4ssi has quit [Remote host closed the connection]
I heard something like that for some Intel parts, but never bothered with them so, I never really cared, but good to know thanks.
* ylamarre has only ever really used Xilinx parts
We'll modify later to accommodate that.
I think sxpert is working on ECP5 IIRC ?
ylamarre: I worked on xilinx pretty exclusively as well until I started doing ice40 stuff ~ 1 y ago.
Still need to get into ECP5 ... got the hw, just no time yet to dive into it.
I think both...
sxpert: Ok, so in your code, alu_active is a reset and CE merged together... avoid this!
sxpert: You provide me with a good example here: wire [1:0] phase;
assign phase = i_clk_ph + 3;
well, alu_active is indeed reset and module_enable
unsized verilog integer literals are (at least) 32 bits, signed
so that should be "2'b11" and not "3" ?
or maybe
If that's what you want...
"- 2'b01" ?
that's better.
'cause that's really the intent.
I thought the tool would cast the integer value to whatever the destination size is
I see
It's in the standard somewhere
Don't assume!
I'd also favor "- 2'd1"
Sign extension and subtle bit truncation will bite you back...
That's guaranteed by Murphy!
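A minimal sketch of the two forms discussed above, side by side. The module name `phase_offset` and the two output names are hypothetical; `i_clk_ph` is assumed to be 2 bits wide, as the `wire [1:0] phase` declaration suggests. Both outputs end up with the same value (adding 3 and subtracting 1 agree modulo 4), but only the second one states the intent and the width explicitly.

```verilog
// Hypothetical example: unsized vs sized literal in a 2-bit expression.
module phase_offset (
    input  wire [1:0] i_clk_ph,
    output wire [1:0] phase_unsized,
    output wire [1:0] phase_sized
);
    // The unsized literal 3 widens the expression to 32 bits; the result
    // is then silently truncated to the 2 bits of the destination.
    assign phase_unsized = i_clk_ph + 3;

    // Sized literal: everything stays 2 bits wide, and "previous phase"
    // is what the code actually says.
    assign phase_sized   = i_clk_ph - 2'd1;
endmodule
```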
zem has quit [Ping timeout: 240 seconds]
also, you have unbuffered subtraction followed by a comparison going through some other logic which eventually goes into reset and preset...
Your optimiser might save you, but I wouldn't rely on that.
zem has joined ##openfpga
Reversing the logic, my understanding here is you have logic that is only active one fourth of the time depending on the phase...
phase_0 is prepare commands onto the bus
phase_1 is when the other device on the bus does things
phase_2 is when the instruction decoder happens
phase_3 is when instruction execution starts or happens
ylamarre: I have updated the thing, does it look any better ?
Ok, and with all the comparisons and all that, instead of "saving" some flops on the adders and messing up your decoding logic, you might want to have a shift register, a one-hot state machine if you will
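A minimal sketch of such a one-hot phase generator, assuming four phases and a registered synchronous reset. The module name `phase_onehot` and its ports are hypothetical and are not sxpert's actual code; the reset is written in the at-the-bottom style discussed earlier.

```verilog
// Hypothetical example: 4-phase one-hot ring counter.
// phase[n] is high during phase n, so no comparators are needed to decode it.
module phase_onehot (
    input  wire       clk,
    input  wire       rst,    // synchronous reset, assumed to come from a FF
    output reg  [3:0] phase
);
    always @(posedge clk) begin
        phase <= {phase[2:0], phase[3]};  // rotate the single hot bit left
        if (rst == 1'b1)
            phase <= 4'b0001;             // start in phase 0
    end
endmodule
```

Logic that is only active in one phase can then use the matching `phase[n]` bit directly as its enable, at the cost of four flops instead of two.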
Code still looks the same on github...
hadn't pushed, sorry ;)
Not exactly what I'm trying to show you...
pointfree has quit [Excess Flood]
pointfree has joined ##openfpga
I'll rework the code on my way back home, I think it's a good example of how coding style "drives" the tool, but it's taking too much time and I need to work.
ok, no problem
I'll continue adding functionality, and rewrite according to the new set of rules
oh yay, my order of amber 3ml syringes from amazon is out for delivery
Now i can play with my new UV cure epoxy
ylamarre: changed the clock generation to a shift register as you said
xdeller__ has quit [Read error: Connection reset by peer]
xdeller__ has joined ##openfpga
azonenberg_work: the amber color is to prevent the thing from curing in the daylight ?
they're basically a LPF with a cutoff in the mid-yellow range
so green/blue/UV are blocked
azonenberg: Like those sleeping glasses?