kuldeep has quit [Remote host closed the connection]
m_w has quit [Ping timeout: 244 seconds]
zkms has joined ##openfpga
unixb0y has quit [Ping timeout: 246 seconds]
unixb0y has joined ##openfpga
mmicko has quit [Quit: leaving]
mmicko has joined ##openfpga
jevinski_ has joined ##openfpga
jevinskie has quit [Ping timeout: 268 seconds]
calle__ has joined ##openfpga
mumptai_ has quit [Ping timeout: 250 seconds]
genii has joined ##openfpga
_whitelogger has joined ##openfpga
genii has quit [Remote host closed the connection]
Miyu has quit [Ping timeout: 244 seconds]
rohitksingh has joined ##openfpga
<zkms>
is there a way to ask nextpnr to tell me which delay is the limiting one for the frequency it could reach?
<zkms>
i have a multiply and i tried to loosen the constraints on it by assigning it to a temporary variable (so it could be split up over two clock cycles and not have to fit in just one) and for some reason the report from icetime gives as the critical path that multiply >_<
<sorear>
i'm not aware of FPGA DSPs that can do a multiply with >1 cycle latency
<sorear>
they usually have registers before and after the multiplier but not in the middle of it
<zkms>
this is on ice40 8k which has no hardware DSPs.
<sorear>
if you're doing a big pile of logic, then two registers, you need retiming to make that happy, which I'm not sure how yosys/nextpnr handles
<zkms>
also is there a way to print out what's the contents/names of the inferred block RAMs?
catplant has quit [Quit: zwooop]
<zkms>
does anyone know how to do retiming with yosys?
catplant has joined ##openfpga
azonenberg_work has joined ##openfpga
jcarpenter2 has quit [Read error: Connection reset by peer]
rohitksingh has quit [Ping timeout: 272 seconds]
rohitksingh has joined ##openfpga
<tnt>
zkms: -retime but I doubt it's going to really work.
<zkms>
ok
<tnt>
You might be better off splitting the multiplication yourself.
<zkms>
ah. so i do have to write my own multiplier for my own bitwidth and split it up so it takes two clock cycles :\
<zkms>
i was dreading that ;;
<tnt>
well you don't have to do it from scratch. What's the bitwidth ?
<sorear>
what's the frequency target and how much are you missing it by? (personal interest only)
<zkms>
tnt: right now 9 bit x 9 bit but it's parametrisable and i think i'll change it later, i'm unsure really
<tnt>
and what's the frequency target ?
<tnt>
9x9 is really not that big :/
<zkms>
unsure yet what target needs to be :\
<zkms>
idk maybe 80ish MHz isn't too low :\
<zkms>
but what would you suggest for dealing with the multiply with yosys in an FPGA without hardware multipliers, if i can afford it to take more than one clock cycle? just handwrite a multiplier module and invoke it? or is there a way to use -retime to get it to automatically split the thing into two cycles
<tnt>
well you could try just doing partial products yourself.
<tnt>
not ideal, but at least it's not going down to writing the full multiplier from scratch.
<tnt>
although, that's pipelined. if you have multiple cycles to do it and don't have to accept new data every cycle, there are other ways to save resources.
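A rough Python model of the partial-product split suggested here, assuming a 9x9 unsigned multiply and a 4-bit split point. The split point and the names `mul_two_stage`, `p_lo` and `p_hi` are illustrative and not from the discussion. In hardware the two narrower products would be registered at the end of the first cycle and summed in the second.

```python
def mul_two_stage(a, b, width=9, split=4):
    """Model a multiply split into two partial products (hypothetical helper).

    a, b  : unsigned operands, each 'width' bits wide
    split : number of low bits of b handled by the first partial product
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # Cycle 1: two narrower multiplies; their results would go into registers.
    b_lo = b & ((1 << split) - 1)
    b_hi = b >> split
    p_lo = a * b_lo
    p_hi = a * b_hi

    # Cycle 2: align the high partial product and add.
    return p_lo + (p_hi << split)


print(mul_two_stage(300, 400))  # 120000
```

Each partial product is a shorter combinational path than the full multiply, which is the point of the split; whether it actually meets a given frequency target still has to be checked with the tools.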
<tnt>
What's the target fpga ? lp / hx ?
<zkms>
hx8k
<daveshah>
sorear: loads of FPGA DSPs have pipeline registers inside
<daveshah>
I know even the ice40 ultraplus ones have input, pipeline and output registers
<daveshah>
As do the ecp5 ones
jcarpenter2 has joined ##openfpga
Miyu has joined ##openfpga
Miyu has quit [Ping timeout: 245 seconds]
oeuf has quit [Read error: Connection reset by peer]
oeuf has joined ##openfpga
m_w has joined ##openfpga
m_w has quit [Ping timeout: 245 seconds]
rohitksingh has quit [Ping timeout: 240 seconds]
pie___ has joined ##openfpga
pie__ has quit [Remote host closed the connection]
<ZipCPU>
tnt: If you don't necessarily need speed, I do have a multiply core that runs nicely on the iCE40 without using too many LUTs
<ZipCPU>
Only problem is it takes about N+2 clocks to do an NxN bit multiply
<ZipCPU>
Sorry, that was zkms who was asking, not tnt --- my bad
<zkms>
ZipCPU: would you mind posting a link to it? It might end up useful for me.
<ZipCPU>
It's a traditional shift and add multiply, but with a twist that halves the logic
<ZipCPU>
The twist is that the results are shifted to the right, so that the add can always be applied to the same bits
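A minimal Python sketch of a shift-and-add multiply with the right-shift arrangement described here, assuming unsigned n-bit operands. This is not ZipCPU's code; the function name `shift_add_mul` and the variable names are made up for illustration. The multiplicand is always added into the same upper bits of the accumulator, and the accumulator is shifted right by one bit per step.

```python
def shift_add_mul(a, b, n):
    """Unsigned n x n multiply, one multiplier bit per step (illustrative model)."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    acc = 0  # needs 2n+1 bits right after the add, 2n bits after the shift

    for _ in range(n):
        if b & 1:
            acc += a << n  # the add always lands on the same (upper) bit positions
        acc >>= 1          # shift the running result right instead of shifting a left
        b >>= 1            # move on to the next multiplier bit

    return acc


print(shift_add_mul(3, 5, 4))  # 15
```

Because the adder only ever touches the upper n bits (plus a carry), it can stay n bits wide rather than 2n bits wide, which is one plausible reading of how the twist reduces the logic.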
<zkms>
thanks!
<tnt>
Mm, yeah, you can pack a 8x8 unsigned mul in just 16 LCs for the datapath (and probably 4 LCs or so for control/bit count)
<tnt>
(ok, maybe 17 ... need to handle the overflow bit)
<ZipCPU>
As I recall, the 32x32 mpy used closer to 150 LC's, whereas it was using about 320 before
<ZipCPU>
I'd need to remeasure it to be exact
<tnt>
yeah but looking at your code: (1) for p_b you can actually use the LSBs of the partial product register ... you need fewer and fewer bits for it as you advance, and those bits get replaced by the valid bits of the partial product.
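A small Python model of the register-sharing idea in point (1), assuming unsigned n-bit operands. The name `shared_reg_mul` is hypothetical, and this only models the arithmetic, not the actual core. The multiplier starts in the low half of a single register and is shifted out one bit per step while the partial product shifts in from the top.

```python
def shared_reg_mul(a, b, n):
    """Unsigned n x n multiply using one shared register (illustrative model).

    The low n bits initially hold the multiplier b; the high bits hold the
    partial product. After n steps the register holds the full 2n-bit product.
    """
    mask = (1 << n) - 1
    a &= mask
    reg = b & mask  # low half: multiplier, high half: partial product (starts at 0)

    for _ in range(n):
        if reg & 1:        # the current multiplier bit sits in the LSB
            reg += a << n  # add the multiplicand into the upper half
        reg >>= 1          # drops the used multiplier bit, makes room for a product bit

    return reg


print(shared_reg_mul(3, 5, 4))  # 15
```

Right after the add the register briefly needs 2n+1 bits, which lines up with the overflow bit mentioned earlier for the 8x8 case.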
<tnt>
(2) if you're careful you can make sure the adder and mux can be merged into the same cell using the -relut option of yosys.
<tnt>
doing a quick build of your code, OPT_SIGNED is like 50% of the size :/ signs suck.
calle__ has quit [Quit: Verlassend]
mumptai has joined ##openfpga
pie__ has joined ##openfpga
pie___ has quit [Remote host closed the connection]
Flea86 has quit [Quit: Goodbye and thanks for all the dirty sand ;-)]
Miyu has joined ##openfpga
<_whitenotifier>
[whitequark/Glasgow] whitequark pushed 1 commit to master [+0/-0/±1] https://git.io/fp9WY
<_whitenotifier>
[whitequark/Glasgow] whitequark 1ac9087 - gateware.boneless: oops, accidentally swapped operands of SUB and CMP.