emeb has left ##openfpga [##openfpga]
Flea86 has joined ##openfpga
emeb_mac has joined ##openfpga
gsi__ has joined ##openfpga
gsi_ has quit [Ping timeout: 250 seconds]
unixb0y has quit [Ping timeout: 255 seconds]
unixb0y has joined ##openfpga
Bob_Dole has quit [Ping timeout: 264 seconds]
GenTooMan has quit [Quit: Leaving]
Bob_Dole has joined ##openfpga
genii has quit [Remote host closed the connection]
AndresNavarro has joined ##openfpga
futarisIRCcloud has joined ##openfpga
rohitksingh_work has joined ##openfpga
Bike has quit [Quit: leaving]
AndresNavarro has quit [Quit: rcirc on GNU Emacs 25.2.1]
pie__ has joined ##openfpga
AndresNavarro has joined ##openfpga
pie___ has quit [Ping timeout: 245 seconds]
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
emeb_mac has quit [Quit: Leaving.]
m4ssi has joined ##openfpga
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
<RaYmAn> if anyone has any ecp5 boards and some time, I could use some help testing out multiboot: https://github.com/SymbiFlow/prjtrellis/pull/68 (my go-to test is just two bitstreams that blink different LEDs)
Miyu has joined ##openfpga
<RaYmAn> in my tests so far, it works as well as the official tool, but that means not on everything, so it works on 1/2 boards I have
<whitequark> i have a versa 5g
<RaYmAn> it needs a database with https://github.com/SymbiFlow/prjtrellis/pull/67 in
<daveshah> The config bits are upstream now
<RaYmAn> awesome
<RaYmAn> wasn't sure if the database was updated after the merge
ayjay_t has quit [Read error: Connection reset by peer]
<RaYmAn> it works great on my ecp5-evn but the second bitstream doesn't work at all on another board I have. but that's no different from with lattice tools
ayjay_t has joined ##openfpga
rohitksingh_work has quit [Ping timeout: 250 seconds]
rohitksingh_work has joined ##openfpga
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
rohitksingh_wor1 has joined ##openfpga
rohitksingh_wor1 has quit [Client Quit]
mumptai has joined ##openfpga
rohitksingh_work has quit [Ping timeout: 245 seconds]
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
Asu has joined ##openfpga
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
indy has quit [Ping timeout: 250 seconds]
emily has quit [Remote host closed the connection]
emily has joined ##openfpga
AndresNavarro has quit [Ping timeout: 255 seconds]
Flea86 has quit [Quit: Goodbye and thanks for all the dirty sand ;-)]
rohitksingh has joined ##openfpga
<whitequark> daveshah: so
<whitequark> i am thinking about a soft PLL on ice40
<whitequark> using the LUT cascade output as a delay line
<whitequark> and using the metastable state of DFFs to sample at a higher resolution than can be afforded by the delay line
<whitequark> this hinges on the duration of metastable state being comparable to delay in the LUT
<daveshah> interesting
<daveshah> This is not something I really know anything about though
<daveshah> iCE40 LUTs are fairly slow
<sxpert> somehow they manage 60MHz or so
<tnt> sxpert: you can get pipelined single lut layer logic at way higher than 60M even on a up5k.
<sxpert> yeah, though you need many pipeline stages I suppose
<sxpert> would you need to write this in a rather specific way ?
<tnt> I always keep in mind the number of LUT layers between two FFs when writing logic.
<sxpert> hmm,
<sxpert> how does one count that ?
<tnt> But of course depends on what you're doing. Like if I'm doing a DSP algo with no feedback, I can pretty much put a FF after each lut layer "for free" since the FF is there in the LCs anyway.
<tnt> sxpert: when I write logic, I just 'map it' in my head to LUT4.
<tnt> I know what can fit in an iCE40 LC and what doesn't.
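What tnt describes above (one LUT layer between two FFs, with the FF coming "for free" in the logic cell) can be sketched in Verilog roughly as follows. This is only an illustrative sketch: the module name, the port names and the choice of an 8-input AND are assumptions made for the example and do not come from anyone's design in the channel.

```verilog
// Hypothetical example: an 8-input AND reduction split into two LUT layers,
// with a register after each layer so that every FF is fed by a single LUT4.
module and8_pipelined (
    input  wire       clk,
    input  wire [7:0] d,
    output reg        q
);
    // Stage 1: two independent 4-input functions, each small enough for one
    // LUT4, each followed by the FF of the same logic cell.
    reg lo, hi;
    always @(posedge clk) begin
        lo <= &d[3:0];
        hi <= &d[7:4];
    end

    // Stage 2: a 2-input function of the stage-1 registers, again one LUT
    // followed by one FF.
    always @(posedge clk)
        q <= lo & hi;
endmodule
```

Written without the intermediate registers, the same function would need two LUT layers between FFs. The pipelined form costs one extra cycle of latency, which is acceptable for feed-forward logic such as the DSP case tnt mentions, but not necessarily for logic inside a feedback loop.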
<sxpert> I see
<tnt> usually I always have a paper drawing of the exact logic I want. Then I write verilog that describes that.
<sxpert> hah
<tnt> Only time I went away from that recently was the decode logic for a custom soft core ... and that went pretty bad. I tried describing it behaviorally with verilog 'case' and let the synthesis figure it out. Result was awful. In the end, I ended up writing an external tool to convert that behavioral description into a huge truth table, then ran that through an external logic optimizer and fed the basic logic equations to yosys, and that worked way better.
* sxpert wonders how to optimise his stuff
<sxpert> hmm
<whitequark> tnt: flowmap produces optimal LUT depth for any given logic
<whitequark> this is polynomial time for LUTs, unlike AIGs etc
<daveshah> it produces optimal LUT depth for a given logic netlist
<sxpert> if I understand tnt, yosys has issues with optimizing large 'case' statements ?
<whitequark> right, that's what i said, no?
<daveshah> afaik there are transformations to the netlist (like balancing) that reduce depth without changing function
<tnt> whitequark: the main issue was that yosys was not taking good advantage of the lot of 'don't cares'
<whitequark> ohh, i see what you mean
<sxpert> hmm
<whitequark> tnt: hmmm, that might be possible to improve
<tnt> That was a very simple test case to illustrate what I meant.
<tnt> whitequark: is your flowmap work merged in yosys btw ? Or do I have to use a branch ? Curious to try if it helps the mapping of the base logic equations.
<whitequark> flowmap is upstream
<whitequark> all of my code that is ready to use, anyway
m4ssi has quit [Ping timeout: 240 seconds]
m4ssi has joined ##openfpga
rohitksingh has quit [Remote host closed the connection]
rohitksingh has joined ##openfpga
<tnt> whitequark: any 'howto' to integrate it in an ice40 flow ? It's not in the default synth_ice40 'macro' right ?
<whitequark> it is
<whitequark> wait, hm
<whitequark> that part's not upstream it looks
<whitequark> tnt: ok. stop just before map_gates in synth_ice40. run simplemap; flowmap -maxlut 4
emeb has joined ##openfpga
rohitksingh has quit [Ping timeout: 246 seconds]
bibor has quit [Quit: WeeChat 1.6]
bibor has joined ##openfpga
<somlo> _florent_: thanks!
<tnt> whitequark: tx, that works. Results just for that comb block are pretty similar to the default synth_ice40 in # LUTs (input verilog is just a bunch of logic equations expanding a 16 bit opcode into 58 control signals). Not sure if there is an easy way to measure the depth of generated logic.
<whitequark> tnt: -debug will tell you
<whitequark> for flowmap that is
<whitequark> tnt: the main advantage of flowmap over abc is that it preserves all signal names
<tnt> Oh wait, I just inserted simplemap; flowmap -maxlut 4 right before map_gates, but I left everything, so AFAICT it's still running abc.
<sxpert> so you could run flowmap instead of ABC ?
<whitequark> yes
<whitequark> flowmap is a replacement for abc
<sxpert> ah
<sxpert> and it's supposedly better ?
<whitequark> it produces less space efficient designs that are approximately as fast
<whitequark> but it preserves debuggability
<tnt> That's the script I'm running: https://pastebin.com/ELgjyTjb
<sxpert> hmm. compromises, compromises ;-)
<whitequark> sxpert: yes. i am aware of flowsyn
<whitequark> in fact i have a partial implementation...
<whitequark> tnt: you do not need any abc commands at all
<tnt> yeah, I just tried doing the final techmap after flow map (since it's only comb logic, no need for the rest).
<tnt> it works, but it uses about twice as many LUTs.
genii has joined ##openfpga
<whitequark> that's expected
<whitequark> flowmap is not optimizing for LUT count at all
<sxpert> could there be a later stage where those luts are factorized ?
<whitequark> yes, FlowMap-Area
<ylamarre> whitequark: That's useful, most of the time, synthesizers over-optimise logic, causing heavy logic congestion. It's nice being able to fine tune which gets optimised and which might benefit from not doing so.
<whitequark> ylamarre: even better
<whitequark> flowmap-r has a tunable tradeoff between area and depth
<whitequark> so you can trade one logic level for less area
<whitequark> or two
<whitequark> abc can't do this!
<ylamarre> Since signal names are conserved, we don't have to fight the tools to put the constraints.
<tnt> "over optimize" ... "mis optimize". My experience is they _think_ they do something good ... but end up doing something that's worse than just doing the obvious.
<sxpert> also sounds easier to debug
<whitequark> btw, flowmap is a very very early ancestor of the synthesis algorithm that's now in vivado
<whitequark> the team that worked on flowmap was sponsored by, and produced results for, xilinx
<ylamarre> whitequark: Ah, makes sense!
<whitequark> and their algorithms are often an improvement over their predecessors, something like "we took a superexponential algo and made it polynomial"
<whitequark> or "we took a polynomial algo and reduced the constant by a few orders of magnitude"
<ylamarre> I remember using such feature in Vivado and it was deeply appreciated.
<whitequark> in fact i think flowmap is the basis of *all* modern LUT synthesis
<whitequark> indirectly
<ylamarre> tnt: It's not so obvious to estimate routing congestion during synthesis. Especially in what I'd call "outlying design".
<whitequark> hm, not sure about abc
Richard_Simmons has joined ##openfpga
Bob_Dole has quit [Ping timeout: 264 seconds]
<ylamarre> sxpert: A good way to estimate logic is to check for the "worst case". If well coded/thought, there's not much to optimise (others should correct me here).
<ylamarre> Especially on something like an ice40 where you don't have a lot of specialty logic.
<ylamarre> If you have a wide comparison, you might be able to take advantage of the carry propagation logic, but otherwise it all goes in the LUTs.
<ylamarre> IMO, if your design is going (and fitting) on a smallish FPGA, you have a good enough idea on how it maps...
<tnt> That's what I love about the ice40, it's small enough I can pretty much fit it in my head :)
<whitequark> this is what people say about c and microcontrollers too
<whitequark> but i'm not very convinced
<whitequark> sure, if i am doing an ALU that will be replicated 16 or 32 times, i will hand optimize it
<whitequark> (but i still rely on yosys to infer correct carry chains and such)
<ylamarre> It's more about estimating logic usage, than getting accurate results, knowing if you should add a layer of pipeline somewhere or does it still fit in your LUT.
<tnt> Obviously you have to pick your battles. That's why the decode logic, I mostly did it using existing logic optimizers and I didn't go and hand code every LUT4.
<tnt> But the ALU/execution unit, I knew it'd be the critical path and there, I went way lower to make sure things would be exactly as designed.
<sxpert> am recoding my alu to what I've learned...
<sxpert> ylamarre: I wonder if actual coding style has any bearing on the generated logic
<tnt> depends what you mean by 'style' ... but yeah, in general, synthesis tool can infer different things depending how it's written.
<tnt> getting them to reliably do what you want is an art form.
<tnt> (at which I suck most of the time)
<sxpert> then I'll take constructive criticism on the latest incarnation https://github.com/sxpert/hp-saturn/blob/master/saturn_alu.v
<whitequark> that's an enormous module
<whitequark> what's the core doing?
<whitequark> which operations?
<tnt> what's with [0:0] btw ?
<sxpert> this is only the alu module
<whitequark> the core of an alu is a programmable adder/subtractor/etc
<whitequark> what is that? no bus fluff
<sxpert> well the arch is somewhat special
<whitequark> my ALSRU is about 130 lines for comparison
<sxpert> the instruction list is mostly alu oriented operations
<sxpert> and memory reads and writes go to either A or C
<sxpert> so you have all those regs, and the alu is a loop that goes around nibbles and does things with them
<sxpert> from reg to reg
<sxpert> whitequark: the thing's a 64 bit register arch, with a 4 bit bus...
<whitequark> oh
<whitequark> oh so it's mostly control logic then
<sxpert> yeah
<whitequark> the actual alu isn't as important
<sxpert> not really
<sxpert> the actual alu will be one "always @(*)" in the middle somewhere
<sxpert> I need to find out how to implement BCD ops too...
<ylamarre> Ok, so there are a few places where you have something like:
<ylamarre> if (signal_a) begin
<ylamarre> reg_a <= 1;
<ylamarre> end
<ylamarre> and probably some other code to set it back to zero somewhere else...
<sxpert> yeah
<ylamarre> So, what I'd do is:
<sxpert> usually not far below ;)
<ylamarre> reg_a <= reset_cond ? 1'b0 : signal_a;
<ylamarre> Then all the logic on a single line.
<whitequark> separating resets from other logic is typically good practice
<whitequark> because it lets you quickly verify that resets are indeed happening
<whitequark> (of course, resets shouldn't be opt-in, but it's verilog....)
<ylamarre> whitequark: Yes, agreed, but only in cases they are actual resets.
<sxpert> well, I did it that way in order to trace where exactly things got reset properly
<sxpert> otherwise the FSM goes into lalaland
<ylamarre> Ok, here's an example of "why the f am I tracing those signals".
<ylamarre> if (start_load_dp) begin
<ylamarre> o_bus_load_dp <= 1;
<ylamarre> end
<ylamarre> [4 conditions later]
<ylamarre> if (xfr_copy_done) begin
<ylamarre> $display("ALU %0d: [%d] xfr_copy_done %h %b %b",phase, i_cycle_ctr, data_counter, xfr_init_done, xfr_data_done);
<ylamarre> xfr_init_done <= 0;
<ylamarre> xfr_data_done <= 1;
<ylamarre> o_bus_load_dp <= 0;
<ylamarre> // right on time to start the actual transfer
<ylamarre> o_bus_dp_write <= i_xfr_dir_out;
<ylamarre> o_bus_dp_read <= !i_xfr_dir_out;
<ylamarre> o_bus_xfr_cnt <= (i_field_last - i_field_start);
<ylamarre> end
<ylamarre> Now, o_bus_load_dp should just be assigned its logic...
<ylamarre> It's not a reset thing...
<sxpert> it's a control line to the bus controller
<ylamarre> Now there are "2 schools of thought": reset condition at the beginning or at the end of your always block.
<ylamarre> sxpert: My point still stands.
<daveshah> One problem is with the reset at beginning and the rest of the logic in an `else` is that if you decide not to put reset on a signal (e.g. for datapaths), then !reset becomes a clock enable for that signal which can waste routing/resources
<whitequark> yep. this is why nmigen puts reset at the end of the block in an `if`
<ylamarre> Exactly
<ylamarre> I'm on two thread here, so I'm a little slow on the reply, sorry.
<ylamarre> I'll finish why o_bus should just be its own line, while people can explain the reset thing.
<ylamarre> Reset at the beginning is legacy style and its only advantage is you see your "common" reset at the beginning of your block.
<sxpert> ylamarre: ok, just tried, o_bus_load_dp can't be a wire, as start_load_dp is on for a very short time
<ylamarre> Not a wire, but you can/should/will put logic on your reg assignation line!
<sxpert> ah, so put that as a similar logic in a always @(*) block ?
<ylamarre> Give me a few minutes to trace this mess... I'll do it for this signal only, but it'll be a great explanation of what we mean by coding style drives/helps the tools...
<sxpert> always @(*) begin
<sxpert> if (start_load_dp) o_bus_load_dp = 1;
<ylamarre> Then hopefully someone can put the explanation in a nice tweet post, thanking @gatin00b
<sxpert> if (xfr_copy_done) o_bus_load_dp = 0;
<sxpert> end
<sxpert> this makes it work
<ylamarre> sxpert: please, let me a few minutes... it'll be very nice when all compact and sexy.
<sxpert> ok
* sxpert waits ;)
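A side note on the `always @(*)` block pasted above: because `o_bus_load_dp` is not assigned when neither condition is true, Verilog semantics make it hold its previous value, which synthesizes as a latch rather than as combinational logic. A clocked set/clear flag is one common alternative. The sketch below is only an illustration under stated assumptions: the signal names follow the chat, but the module name, the clock name and the initial value are hypothetical, and this is not the rewrite ylamarre had in mind.

```verilog
// Hypothetical sketch of a registered set/clear flag.
// Signal names follow the chat; module name, clk and the initial value
// are assumptions made for this example.
module load_dp_flag (
    input  wire clk,
    input  wire start_load_dp,   // short pulse that sets the flag
    input  wire xfr_copy_done,   // clears the flag
    output reg  o_bus_load_dp
);
    // Start cleared (used by simulation; assumed acceptable for the target).
    initial o_bus_load_dp = 1'b0;

    // Set on start_load_dp, clear on xfr_copy_done, hold otherwise.
    // If both are high in the same cycle the clear wins, matching the
    // order of the two "if" statements in the pasted block.
    always @(posedge clk)
        o_bus_load_dp <= (o_bus_load_dp | start_load_dp) & ~xfr_copy_done;
endmodule
```

Note that the registered flag changes one clock edge after its inputs, so it is not a cycle-exact drop-in for the latch version. Whether that extra cycle is acceptable depends on how the bus controller samples the line.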
<ylamarre> Ok, there's just so much stuff to say it'll take more than a few minutes actually... so I'll go with something else that'll explain what I wanna convey here...
<ylamarre> So one good practice I'd give you is to keep resets or presets buffered. Meaning they should come from a FF
<ylamarre> So you have this module that is preset or reset at some point by a signal 'rst'
<sxpert> something like "blah <= ~blah;" ?
* sxpert missed something there
<ylamarre> So you want something like (reset at the bottom style):
<ylamarre> always @(posedge clk) begin
<ylamarre> my_reg <= selector ? reg_b : reg_a;
<ylamarre> if ( rst == 1'b1 ) begin
<ylamarre> my_reg <= 1'b0;
<ylamarre> end
<ylamarre> end
<ylamarre> In this case, reset was probably useless since it's a mux, but I didn't have a good example.
<ylamarre> You need more code...
<ylamarre> lemme expand a bit..
<ylamarre> i'll include reg_a and reg_b and we'll have something to work with...
<ylamarre> and it'll give us something to compare legacy reset and at-bottom reset.
<ylamarre> always @(posedge clk) begin
<ylamarre> reg_a <= reg_a + 4'b1;
<ylamarre> my_reg <= selector ? reg_b : reg_a;
<ylamarre> reg_b <= 4'hA;
<ylamarre> if (rst == 1'b1) begin
<ylamarre> reg_a <= 4'b0;
<ylamarre> reg_b <= 4'h5;
<ylamarre> end
<ylamarre> end
<ylamarre> always @(posedge clk) begin
<ylamarre> if (rst == 1'b1) begin
<ylamarre> reg_a <= 4'b0;
<ylamarre> reg_b <= 4'h5;
<ylamarre> end
<ylamarre> else begin
<ylamarre> reg_a <= reg_a + 4'b1;
<ylamarre> reg_b <= 4'hA;
<ylamarre> my_reg <= selector ? reg_b : reg_a; //"Wrong" as it introduces a CE
<ylamarre> end
<ylamarre> end
<ylamarre> Ok, here we go...
<ylamarre> So the first case shows what we mean by reset-at-bottom.
<ylamarre> The top part of the always block is our logic and the bottom part is handling our control signal logic.
<ylamarre> By control signal I mean: PRESET, RESET and CE (Clock Enable)
<ylamarre> In the second case, rst has to go through an inverter and then to the CE pins of my_reg's FF
<ylamarre> This adds unnecessary logic and routing.
<tnt> sxpert: you're not targeting the ice40 right ?
<ylamarre> my_reg is not reset because it doesn't need to be reset, since its value won't be used when reset is deasserted (somewhere else in the design)
<ylamarre> It doesn't matter which FPGA is targeted, crap coding style is crap coding style ;)
<tnt> ylamarre: I'm raising it because the arch of the ice40 makes async reset much preferable imho. For others (ecp/xilinx/...) not so much.
<ylamarre> Notice, I also provide the length of all my signals. This ensures there are no surprises if register width should expand later on.
<ylamarre> tnt: why?
<tnt> ylamarre: in the ice40, sync reset is gated by CE. (so you need CE=1 for RST line to work).
<ylamarre> Oh! good catch!
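To make tnt's point concrete, here is a behavioural sketch of the two flip-flop styles involved. The first module models a flop whose synchronous reset only takes effect while the clock enable is high, as tnt describes for the iCE40; the second is what RTL usually asks for, a reset that works regardless of the enable. Both module names are hypothetical and the models are illustrations, not vendor primitives.

```verilog
// Hypothetical model: synchronous reset gated by clock enable.
// With ce low, srst is ignored and q holds its value.
module dff_ce_gated_srst (
    input  wire clk,
    input  wire ce,
    input  wire srst,
    input  wire d,
    output reg  q
);
    always @(posedge clk)
        if (ce) begin
            if (srst) q <= 1'b0;
            else      q <= d;
        end
endmodule

// Hypothetical model: synchronous reset with priority over clock enable.
// srst clears q even while ce is low.
module dff_srst_priority (
    input  wire clk,
    input  wire ce,
    input  wire srst,
    input  wire d,
    output reg  q
);
    always @(posedge clk)
        if (srst)    q <= 1'b0;
        else if (ce) q <= d;
endmodule
```

The second behaviour can be built from the first by driving the enable with `ce | srst`, which is extra logic in front of the CE pin. That cost is presumably part of why an asynchronous reset looks more attractive on this architecture.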
m4ssi has quit [Remote host closed the connection]
<ylamarre> I heard something like that for some Intel parts, but never bothered with them so, I never really cared, but good to know thanks.
* ylamarre has only ever really used Xilinx parts
<ylamarre> We'll modify later to accommodate that.
<tnt> I think sxpert is working on ECP5 IIRC ?
<tnt> ylamarre: I worked on xilinx pretty exclusively as well until I started doing ice40 stuff ~ 1 y ago.
<tnt> Still need to get into ECP5 ... got the hw, just no time yet to dive into it.
<ylamarre> I think both...
<ylamarre> sxpert: Ok, so in your code, alu_active is a reset and CE merged together... avoid this!
<ylamarre> sxpert: You provide me with a good example here: wire [1:0] phase;
<ylamarre> assign phase = i_clk_ph + 3;
<sxpert> well, alu_active is indeed reset and module_enable
<ylamarre> verilog integers are 32bits signed
<sxpert> so that should be "2'b11" and not "3" ?
<sxpert> or maybe
<ylamarre> If that's what you want...
<sxpert> "- 2'b01" ?
<ylamarre> Yes!
<ylamarre> that's better.
<ylamarre> 'cause that's really the intent.
<sxpert> right
<sxpert> I thought the tool would cast the integer value to whatever the destination size is
<ylamarre> Nope
<sxpert> I see
<ylamarre> It's in the standard somewhere
<ylamarre> Don't assume!
<sxpert> ok
<ylamarre> I'd also favor "- 2'd1"
<ylamarre> Sign extension and subtle bit truncation will bite you back...
<ylamarre> That's guaranteed by Murphy!
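The sizing discussion above can be checked with a tiny simulation. In the sketch below, `i_clk_ph` is taken from the chat; the testbench module name and the two wire names are made up for the example. As far as the language rules go, the unsized `3` is evaluated at integer width (at least 32 bits) and the result is truncated when assigned to the 2-bit wire, so both forms should produce the same two bits here. The sized form mainly states the intent and avoids width warnings.

```verilog
// Hypothetical testbench comparing the unsized and sized forms.
module phase_width_tb;
    reg  [1:0] i_clk_ph;

    // Unsized literal: arithmetic at integer width, truncated to 2 bits.
    wire [1:0] phase_unsized = i_clk_ph + 3;
    // Sized literal: 2-bit arithmetic, wraps modulo 4.
    wire [1:0] phase_sized   = i_clk_ph - 2'd1;

    integer i;
    initial begin
        for (i = 0; i < 4; i = i + 1) begin
            i_clk_ph = i[1:0];
            #1;  // let the continuous assignments settle
            $display("i_clk_ph=%0d  (+ 3)=%0d  (- 2'd1)=%0d",
                     i_clk_ph, phase_unsized, phase_sized);
            if (phase_unsized !== phase_sized)
                $display("MISMATCH at i_clk_ph=%0d", i_clk_ph);
        end
        $finish;
    end
endmodule
```

The truncation stops being harmless as soon as the result feeds a wider expression or a comparison, which is where the sign-extension and truncation surprises ylamarre warns about tend to show up.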
zem has quit [Ping timeout: 240 seconds]
<ylamarre> also, you have unbuffered subtraction followed by comparison going through some other logic which eventually goes into reset and preset...
<ylamarre> Your optimiser might save you, but I wouldn't rely on that.
<sxpert> hmm
zem has joined ##openfpga
<ylamarre> Reversing the logic, my understanding here is you have logic that are only active one fourth of the time depending on the phase...
<sxpert> yeah
<sxpert> phase_0 is prepare commands onto the bus
<sxpert> phase_1 is when the other device on the bus does things
<sxpert> phase_2 is when the instruction decoder happens
<sxpert> phase_3 is when instruction execution starts or happens
<sxpert> ylamarre: I have updated the thing, does it look any better ?
<ylamarre> Ok, and with all the comparisons and all that, instead of "saving" some flops on the adders and messing up your decoding logic, you might want to have a shift register, a one-hot state machine if you will
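A minimal sketch of the shift-register (one-hot) phase generator suggested above, assuming four phases and a synchronous reset. The module and port names are hypothetical and this is not the code that ended up in the repository.

```verilog
// Hypothetical one-hot phase generator: exactly one bit of "phase" is set,
// and the set bit rotates by one position on every clock.
module phase_onehot (
    input  wire       clk,
    input  wire       rst,     // synchronous, assumed to come from a FF
    output reg  [3:0] phase    // phase[0] = phase_0 ... phase[3] = phase_3
);
    always @(posedge clk) begin
        phase <= {phase[2:0], phase[3]};   // rotate left by one
        if (rst)
            phase <= 4'b0001;              // reset-at-bottom style
    end
endmodule
```

Logic that used to compare a 2-bit counter against a constant can then test a single bit, for example `if (phase[2])` for the decode phase, at the cost of two extra flip-flops.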
<ylamarre> Code still looks the same on github...
<sxpert> hadn't pushed, sorry ;)
<sxpert> done
<ylamarre> Not exactly what I'm trying to show you...
<sxpert> ah
pointfree has quit [Excess Flood]
pointfree has joined ##openfpga
<ylamarre> I'll rework the code on my way back home, I think it's a good example of how coding style "drives" the tool, but it's taking too much time and I need to work.
<sxpert> ok, no problem
<sxpert> I'll continue adding functionality, and rewrite according to the new set of rules
<azonenberg_work> oh yay, my order of amber 3ml syringes from amazon is out for delivery
<azonenberg_work> Now i can play with my new UV cure epoxy
<sxpert> ylamarre: changed the clock generation to a shift register as you said
xdeller__ has quit [Read error: Connection reset by peer]
xdeller__ has joined ##openfpga
<sxpert> azonenberg_work: the amber color is to prevent the thing from curing in the daylight ?
<azonenberg> yeah
<azonenberg> they're basically a LPF with a cutoff in the mid-yellow range
<azonenberg> so green/blue/UV are blocked
<ylamarre> azonenberg: Like those sleeping glasses?
<azonenberg> Specifically labeled as for UV-sensitive materials
eigenzer_ has joined ##openfpga
eigenzer_ is now known as hliou
hliou is now known as hailang
* sxpert notes the optics dudes @work using similar glass based materials for filters in front of their IR cameras they send to space
<fseidel> have any of you gotten yosys to emit a blif file that vpr can consume?
<fseidel> specifically, how can I make it emit DFFs that VPR doesn't choke on?
hailang has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
<azonenberg_work> sxpert: that might be a visible light cutoff filter too though
<azonenberg_work> if they only want to see IR
<azonenberg_work> a lot of IR lenses i see are covered with a filter that reflects most visible light and transmits IR
<azonenberg_work> (absorption is frowned upon as it heats up the lens and can distort it)
<sxpert> yeah, that's pretty much what they use
<sxpert> they want to prevent anything besides IR from reaching the sensor and messing up measurements
<azonenberg_work> Yeah
<azonenberg_work> Because most of the CCD or similar sensors are pretty wideband
<sxpert> a photon is a photon...
<azonenberg_work> and will gladly eat any photons that hit them from like 300 to 1500nm or even wider
<sxpert> yeah, those sensors are basically full spectrum
<sxpert> any photon touching it is go for counting
<kc8apf> fseidel: I thought @mithro was doing work on that in https://github.com/symbiflow/yosys
<mithro> fseidel: We have a number of scripts which use Yosys with VPR
<mithro> fseidel: there is however a pretty tight binding between how to generate the output and the architecture files you are using
m_w has joined ##openfpga
futarisIRCcloud has joined ##openfpga
genii has quit [Read error: Connection reset by peer]
m4ssi has joined ##openfpga
Bike has joined ##openfpga
davidc___ is now known as davidc__
renze has quit [Quit: Spaceserver reboot?!]
renze has joined ##openfpga