GenTooMan has joined ##openfpga
<whitequark> tnt: yes. i have used it a lot.
<whitequark> tnt: but regarding the incoming 0xff with large writes, that's interesting
<whitequark> i recall seeing some similar bug. not sure the context.
<whitequark> daveshah: poke
pie__ has quit [Remote host closed the connection]
pie__ has joined ##openfpga
unixb0y has quit [Ping timeout: 244 seconds]
m_w has quit [Ping timeout: 246 seconds]
unixb0y has joined ##openfpga
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
dj_pi has joined ##openfpga
dj_pi has quit [Ping timeout: 246 seconds]
m_w has joined ##openfpga
unixb0y has quit [Ping timeout: 272 seconds]
unixb0y has joined ##openfpga
Miyu has quit [Ping timeout: 244 seconds]
GenTooMan has quit [Quit: Leaving]
pie__ has quit [Remote host closed the connection]
pie__ has joined ##openfpga
dj_pi has joined ##openfpga
catplant has quit [Quit: WeeChat 2.2]
catplant has joined ##openfpga
rohitksingh_work has joined ##openfpga
sgstair has quit [Remote host closed the connection]
sgstair has joined ##openfpga
Bike has quit [Quit: Lost terminal]
dj_pi has quit [Quit: Leaving]
rohitksingh_work has quit [Ping timeout: 250 seconds]
Flea86 has quit [Quit: Goodbye and thanks for all the dirty sand ;-)]
rohitksingh_work has joined ##openfpga
rohitksingh_work has quit [Client Quit]
rohitksingh_work has joined ##openfpga
rohitksingh_wor1 has joined ##openfpga
rohitksingh_work has quit [Ping timeout: 240 seconds]
emeb has quit [Quit: Leaving.]
rohitksingh_wor1 has quit [Read error: Connection reset by peer]
pie__ has quit [Remote host closed the connection]
pie__ has joined ##openfpga
rohitksingh_work has joined ##openfpga
_whitelogger_ has joined ##openfpga
_whitelogger has quit [Ping timeout: 250 seconds]
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
Sailo has joined ##openfpga
Sailo has quit [Client Quit]
<daveshah> whitequark: morning
<daveshah> How goes?
<whitequark> daveshah: going to actually write the techmapping part now
<whitequark> the graph manipulation part is mostly done. i think there's one more bug
m4ssi has joined ##openfpga
<daveshah> Very nice
<whitequark> ah, i haven't read the paper very carefully.
pie__ has quit [Ping timeout: 252 seconds]
<whitequark> or rather, carefully enough.
<whitequark> yep. works now.
m4ssi has quit [Quit: Leaving]
gardintrapp has joined ##openfpga
jcreus has joined ##openfpga
<whitequark> you can play with it and see if you can break it... should be ready
<whitequark> everything other than actual mapping
<daveshah> Let me see, building now
<daveshah> whitequark: getting the error
<daveshah> > ERROR: Multiple drivers found for wire $techmap$techmap$sub$addsub.v:2$1.$auto$$48.$and$<techmap.v>:260$64_Y.
<daveshah> with opt_flowmap -lut 4 on this test design
m4ssi has joined ##openfpga
<whitequark> interesting
<daveshah> Running splitnets fixes it
<whitequark> yeah, gimme a sec
<whitequark> splitnets should not be required
<whitequark> daveshah: oh oh i realized something
<whitequark> i can only handle multibit output cells if the outputs are *independent*
<whitequark> however, trying to consteval will probably catch this
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
Flea86 has joined ##openfpga
rohitksingh_work has quit [Ping timeout: 246 seconds]
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
edmund has joined ##openfpga
rohitksingh_work has joined ##openfpga
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
<whitequark> daveshah: interesting.
<whitequark> oh, nvm
<whitequark> daveshah: it works!!
<whitequark> Mapped 62 LUTs.
<whitequark> Packed 143 cells 180 times.
<daveshah> \o/
<whitequark> Equivalence successfully proven!
<whitequark> lemme push
<plaes> o_O :)
<whitequark> abc does it in... 32 cells
<whitequark> this is unsurprising considering two things
<daveshah> what happens with opt_lut afterwards?
<whitequark> first, opt_flowmap only does packing, it does absolutely no logic optimization
<whitequark> second, opt_flowmap in its current state doesn't even try to optimize for area.
<whitequark> so, well, it's no wonder that area sucks.
<daveshah> indeed
<whitequark> gate2lut followed by opt-lut gives 124 LUTs
<whitequark> so it's twice as good as my old work :D
<whitequark> should publish a paper, clearly. "we report up to 2x improvement" etc
<daveshah> :D
<whitequark> opt_lut cannot do anything after opt_flowmap
<whitequark> predictably
<whitequark> opt_flowmap *is* optimal
<daveshah> that's over what, 2 weeks?
<whitequark> 2 weeks doing what?
<daveshah> so by the end of the year you'll be >2^20 times better
<daveshah> at the current rate of improvement
<whitequark> lmao
<daveshah> imagine, picorv32 fitting in a small fraction of a LUT running at terahertz
<whitequark> ahahaha
<whitequark> daveshah: pushed.
<whitequark> please torture it.
<daveshah> so on a simple "ALU" I wrote for some synthesis experiments a while ago I see 125 LUTs compared to 85 for ABC
<daveshah> not too bad
<whitequark> is that for ice40?
<daveshah> without carry, just mapping to LUT4s
<whitequark> ah ok
<daveshah> looking at delays now
<daveshah> seems it is actually marginally faster than ABC!
<daveshah> 9.3ns for ABC, 8.9ns for flowmap, post pnr
<whitequark> holy shit.
<daveshah> well, next seed for flowmap gives 9.3ns
<daveshah> so I think basically identical
<whitequark> they're both optimal then
<whitequark> unsurprising
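[editor's note] "Optimal" above refers to LUT depth. A minimal sketch of how depth-optimal labels can be computed for a small gate network is given here. It uses brute-force cut enumeration rather than the max-flow min-cut computation that FlowMap itself uses, so it only illustrates the definition of the labels and is not the opt_flowmap implementation. The function name, the netlist encoding and the example chain are assumptions made for illustration, and every gate is assumed to have at least one fanin.

```python
from itertools import product

def depth_optimal_labels(fanins, order, k):
    """Label each gate with the minimum LUT depth reachable with k-input LUTs.

    fanins: dict mapping gate name -> list of fanin names. Names that are not
            keys of this dict are treated as primary inputs (label 0).
    order:  gate names in topological order (fanins before fanouts).
    k:      maximum number of LUT inputs.
    """
    cuts = {}    # gate -> set of k-feasible cuts, each a frozenset of node names
    labels = {}  # gate -> depth label; primary inputs default to 0

    def cuts_of(node):
        # A primary input has only the trivial cut consisting of itself.
        return cuts[node] if node in fanins else {frozenset([node])}

    for gate in order:
        merged = set()
        # Pick one cut per fanin and take the union; keep it if it fits in a LUT.
        for combo in product(*(cuts_of(f) for f in fanins[gate])):
            cut = frozenset().union(*combo)
            if len(cut) <= k:
                merged.add(cut)
        if not merged:
            raise ValueError(f"{gate} has more than {k} fanins")
        # The label is one more than the best (lowest) cut height.
        labels[gate] = 1 + min(
            max(labels.get(node, 0) for node in cut) for cut in merged
        )
        # The trivial cut {gate} is what fanouts see when this gate is a LUT root.
        cuts[gate] = merged | {frozenset([gate])}
    return labels

# Hypothetical example: a chain of four 2-input gates over inputs a..e.
chain = {
    "g1": ["a", "b"],
    "g2": ["g1", "c"],
    "g3": ["g2", "d"],
    "g4": ["g3", "e"],
}
labels = depth_optimal_labels(chain, ["g1", "g2", "g3", "g4"], k=4)
print(labels)
```

In this example g1, g2 and g3 each fit in a single LUT4 fed directly by primary inputs, while g4 depends on five inputs and therefore needs a second LUT level.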
inquisitiv3 has quit [Remote host closed the connection]
<daveshah> So replacing `abc -lut 4` with `opt_flowmap -maxlut 4` in synth_ice40 and enabling carries I see a few problems
<daveshah> With this design I see a stray $_NOT_ in the output
<whitequark> oh interesting
<whitequark> I think I know why that happens... it's introduced during techmapping probably
<daveshah> With picorv32.v and picorv32_top.v from nextpnr/ice40 I see an assert failure even with -nocarry
<daveshah> `ERROR: Assert `!x[sink] && xi[sink]' failed in passes/opt/`
<whitequark> iiiinteresting
<daveshah> I am seeing the $_NOT_s even without carry on another design (rs232demo), fwiw
<whitequark> uhm.
<whitequark> cell $_NOT_ $auto$$449
<whitequark> wonder why this happens
<whitequark> i'm confused, why does yosys call my pass twice?..
<whitequark> hm, it doesn't anymore. what.
<whitequark> daveshah: figured it out
<whitequark> i don't consider inputs to hard logic when packing
<whitequark> so, it packed that NOT into something else
<whitequark> but it was feeding a carry
<daveshah> aha
<daveshah> that makes sense
<whitequark> easy enough to fix
<daveshah> I see it without carries too, I guess the problem there is the DFF
<whitequark> it can appear in a number of cases
<whitequark> any time a NOT feeds something that isn't packed into a LUT.
<whitequark> probably applies to module ports too
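[editor's note] The rule described above (a gate that feeds anything not packed into a LUT has to keep its own output) can be sketched as a simple set computation. This is an illustration under assumed names and an assumed netlist encoding, not the code in opt_flowmap.

```python
def required_lut_roots(fanouts, packable):
    """Return the packable gates whose output must remain visible.

    fanouts:  dict mapping a node name -> list of nodes it drives.
    packable: set of gate names that may be absorbed into LUTs.

    A gate must stay a LUT root if at least one of its sinks is not
    packable (hard logic such as a carry cell, a DFF, or a module port).
    """
    return {
        gate
        for gate in packable
        if any(sink not in packable for sink in fanouts.get(gate, ()))
    }

# Hypothetical netlist: not1 feeds both a carry cell and an AND gate.
fanouts = {
    "not1": ["carry0", "and1"],
    "xor1": ["and1"],
    "and1": ["or1"],
    "or1": ["dff0"],
}
packable = {"not1", "xor1", "and1", "or1"}
roots = required_lut_roots(fanouts, packable)
print(sorted(roots))
```

Here not1 may still be duplicated into the LUT that implements and1, but it also has to exist as its own LUT because carry0 consumes its output directly.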
<whitequark> hm, wtf
<whitequark> I get that assert too... but *only* if i don't do write_ilang/read_ilang
<whitequark> oh a segfault... looks like ub
<Flea86> "<daveshah> imagine, picorv32 fitting in a small fraction of a LUT running at terahertz" Will it be using a bit-serial CPU design? ;D
<whitequark> daveshah: fixed some of the bugs
<whitequark> there are no stray inverters anymore
<daveshah> yep, looks good now
<daveshah> fwiw, the assert happens for me even going through ilang
<daveshah> both on the old code and again now
<whitequark> yeah, looking at it
<whitequark> i think it's an iteration order thing, maybe
<daveshah> sounds likely
m4ssi has quit [Quit: Leaving]
Flea86 has quit [Quit: Goodbye and thanks for all the dirty sand ;-)]
<whitequark> daveshah: ha.
<whitequark> there is a FIXME in code that says "I'm probably missing something".
<whitequark> well.
<whitequark> I am missing something.
<whitequark> ... and it requires me to rewrite half the pass :S
<whitequark> well not exactly
rohitksingh_work has quit [Read error: Connection reset by peer]
gardintrapp has quit [Ping timeout: 246 seconds]
<whitequark> daveshah: ahah wow
<whitequark> this deficiency was seriously pessimizing density
<whitequark> ah nvm, i misread
<daveshah> easy to do...
<whitequark> daveshah: okay, pull
<whitequark> this should fix all of the issues? i think?
<whitequark> gonna do a small cleanup and then look at cut volume maximization
<daveshah> picorv32 builds now
<whitequark> whooo
<whitequark> what about fmax?
<whitequark> and density?
<daveshah> 64.22MHz Fmax
<whitequark> what about abc?
<daveshah> running now
<daveshah> 70.58MHz for abc
<whitequark> huh.
<whitequark> so
<daveshah> big part of the crit path is a carry
<whitequark> if i improve it a bit
<whitequark> we can make it the default synth_ice40 flow?
<daveshah> yes
<whitequark> cumulative time to replace abc: O(48 hours)
<whitequark> ...
* swetland applauds
<whitequark> daveshah: can you make me a few netlists i can use to check density?
<whitequark> i have your logic.v
<sorear> does synth_ice40 currently default to abc or the old yosys techmapper?
<whitequark> there is no "old yosys techmapper" :D
<whitequark> i wrote it like 2 weeks ago
<jcreus> whitequark: just curious, compared to abc (which should be open source) what opts is it missing?
<daveshah> a fake ALU
<whitequark> jcreus: (abc being open source) i invite you to read the source code
<whitequark> as for which opts it's missing, well, right now it completely disregards area
<daveshah> afaik this isn't doing any balancing yet either
<daveshah> which might be why ABC manages to produce a lower delay
<whitequark> yeah, the only thing it does is pack gates into luts
<jcreus> whitequark: whooops sorry yeah, I should do that
<jcreus> asking in case you already knew
<sorear> I believe that as of a week ago it was possible to use yosys without abc, but the QoR was consistently 2-3x worse than abc
<whitequark> jcreus: yes. i made it possible.
<sorear> and your work has gotten the difference down to 1.1x-1.2x and still shrinking
<swetland> is there a library of testcase designs for exercising/verifying synthesis, pnr, etc?
<whitequark> it is "new" yosys techmapper, and the one i'm finishing right now is "very new"
<whitequark> i wrote both of them
<whitequark> swetland: the papers refer to some kind of benchmarks
<whitequark> i have no idea how representative it is
<sorear> I've only been following for a year or so, if you wrote it in 2016 it's "old" to me
* swetland nods
<whitequark> swetland: i added -noabc switch a few weeks ago
<whitequark> to synth_ice40
<whitequark> if we're talking about fpgas, there was no way to do lut techmapping without abc before that
<whitequark> er
<whitequark> sorear: ^
<sorear> oh >_>
<sorear> so why isn't there literally SPEC for FPGAs. they do GPUs now
<sorear> while trying to figure out (yesterday) what the benchmark set flowmap was using, I found, which appears to overlap but not be it
<daveshah> there are definitely academic benchmark suites
<daveshah> whether they are actually a useful optimisation target compared to just hunting down Verilog from the internet, I don't know
<whitequark> daveshah: bleh
<whitequark> so i tried to make a "clever" representation of Nt''
<whitequark> and it caused me no end of issues
<whitequark> all to traverse 2x fewer nodes
<whitequark> i should've just represented those nodes explicitly right away
<whitequark> daveshah: pushed again
<whitequark> i think this is the final version of the basic algo
<whitequark> okay, now to figure out how cut volume maximization even works
<jcreus> whitequark: haven't read literature so I could be totally off the mark
<jcreus> but check out the largest eigenvector of the graph laplacian
<jcreus> if you're talking about a standard graph
<jcreus> scratch that I should probably read about it first
<whitequark> lol no it's something like "traverse these specific subnodes and add them to the cut"
<whitequark> i just don't understand which subnodes exactly
<jcreus> ah okay nvm, sorry
<sorear> are there other important dev channels besides #yosys, r/yosys, and the tracker?
<whitequark> apparently these guys are in some kind of hall of fame
<whitequark> well it *is* a really good algorithm
<whitequark> oh
<whitequark> oh it's this specific paper in fact
<daveshah> Makes sense
<whitequark> i like how they are actually targeting real xilinx devices in all their papers
<whitequark> as opposed to some made up academic arch
<daveshah> Yeah, this is a big problem with the VPR stuff
<daveshah> It's hard to really evaluate algorithms on fake architectures with fake routing and constraints
<whitequark> daveshah: thought: "spherical fpga in vacuum"
<daveshah> This is exactly why we threw away vpr
<Ultrasauce> routing on a spherical fpga would be interesting
<whitequark> daveshah: ohhh
<whitequark> it *already* maximizes cut volume
<whitequark> that part of the paper was proving that it optimizes for area
<whitequark> it's just somewhat opaquely written
<whitequark> ok, so it's basically finished then?
Miyu has joined ##openfpga
<whitequark> daveshah: so, i am looking at FlowSYN
<whitequark> this looks like it would bring us on par with abc easily.
<whitequark> hmm
<whitequark> I wonder what's more interesting to implement, CutMap or FlowSYN
<whitequark> maybe both?
<whitequark> CutMap is actually optimal [in more terms than FlowMap] but it doesn't do resynthesis
rohitksingh has joined ##openfpga
<whitequark> so, CutMap would preserve debug information well
<whitequark> on the other hand FlowSyn will give better synthesis results
gardintrapp has joined ##openfpga
<daveshah> So maybe CutMap ~O0, FlowSyn O1/O2 and abc_aig with retiming O3
<whitequark> something like that
gardintrapp has quit [Ping timeout: 250 seconds]
<whitequark> also, is there a reason we cannot add retiming to yosys?
<whitequark> is this another one of those things that everyone thinks is so hard only abc can do it, but there's a paper from '94 that does it well?
<daveshah> no idwa
<daveshah> *idea
<daveshah> I know there are some fancy sequential optimisations that we don't play with
<daveshah> I'm sure we can have some kind of retiming in Yosys though
<whitequark> daveshah: so i googled "jason cong retiming"
<whitequark> "An Efficient Algorithm for Performance-Optimal
<whitequark> FPGA Technology Mapping with Retiming"
<daveshah> heh
<daveshah> I did simple retiming-type transformations in a HLS tool, it wasn't too hard
<whitequark> lmao it's another one of his *map series algorithms
<whitequark> and based on the same fundamental concepts
<whitequark> did he look at the state of synthesis in 1991, decide that it's all shit, and start over properly?
<whitequark> because it sure looks like it
<daveshah> does seem so
<whitequark> using retiming to hide *global interconnect delays*
<whitequark> this seems like it'd need nextpnr support
<daveshah> I think that might be more for ASIC than FPGA
<daveshah> from memory ASIC flows have much larger routing to logic delay ratios
<whitequark> oh interesting
<daveshah> logic delay only based retiming should be good enough for most FPGA cases
<daveshah> not to say interconnect delay retiming won't be useful at all
<whitequark> so i looked through the slides
<whitequark> and
<whitequark> it's literally the same basic graph manipulation as in FlowMap
<whitequark> i can definitely implement it
<whitequark> this guy is a genius. no normal person can do so many optimal algorithms *and* make them easy to implement
<swetland> retiming feels like one of those things you want to have individual control over
<whitequark> well either that or everyone else just sucks a lot
<daveshah> ad retiming - the current Yosys/ABC retiming is badly broken, definitely need to think about improving things (this was going to be fixed in the abc/xaig stuff)
<daveshah> right now it gives a separate netlist to ABC for each clock domain
<daveshah> which seems sensible except "clock domain" includes different ce/sr too
<whitequark> oh
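[editor's note] A small sketch of why that partitioning fragments the design: if flip-flops are grouped by the full (clock, enable, set/reset) triple rather than by clock alone, flops on the same clock end up in different groups. The data layout and names below are assumptions for illustration and do not reflect how Yosys represents this internally.

```python
from collections import defaultdict

def partition_by_control(flops):
    """Group flip-flops by their full control signature.

    flops: dict mapping flop name -> (clock, clock_enable, set_reset),
           where an unused control signal is None.
    """
    groups = defaultdict(list)
    for name, controls in flops.items():
        groups[controls].append(name)
    return dict(groups)

# Hypothetical design: four flops, all on the same clock.
flops = {
    "ff0": ("clk", None, None),
    "ff1": ("clk", "en_a", None),
    "ff2": ("clk", "en_a", None),
    "ff3": ("clk", "en_b", "rst"),
}
groups = partition_by_control(flops)
clocks = {clk for clk, _, _ in flops.values()}
print(len(clocks), "clock,", len(groups), "groups")
```

With one clock but three distinct control signatures, logic between ff0 and ff1 would fall across a group boundary, which limits what retiming within a single group can move.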
<swetland> as optimizations go, shuffling the relative positions of logic vs registers feels like one of those "makes it harder to understand the final results and has opportunity for subtle effects" ones
<daveshah> yes, that's why almost no flows - even vendor - have significant retiming enabled by default
<whitequark> >h. By gradually increasing the depth bounds, we
<whitequark> are able to produce a set of mapping solutions with smooth
<whitequark> area and depth trade-off for a given design.
<whitequark> oooh.
<sorear> even asic flows you opt in retiming on a module by module basis
<daveshah> that makes sense
<daveshah> guess you might want it in say a filter pipeline, but definitely not in your control mcu
<sorear> rocket does this for the FPU, much nicer to do a combinatorial FMA unit and stick four stages after it than to try to figure out where to put pipeline regs *inside* an FMA
<swetland> yeah one thing I would love to see better support for is setting boundaries (at the module level) for certain classes of optimizations. just from a making-it-easier-to-reason-about-results standpoint. sure optimize the cpu and the peripherals, but don't start combining logic/registers or pushing bits of one into the other
<whitequark> so, hierarchical synthesis?
<daveshah> afaik at the moment that is in the realm of supported, but not well used enough to be bug free
<swetland> I think so.
* swetland will happily add it to his list of things to experiment with.
<daveshah> I think some of the people doing ASIC stuff with Yosys have done fully hierarchical stuff
<daveshah> should just be a case of adding -noflatten to synth_ice40; then flatten and opt_clean at the end to make a flat netlist for nextpnr
<swetland> maybe more subtle if you want some hierarchies flattened
<swetland> eg, cpu may be composed of 4-5 modules (regfile, decoder, etc) and I don't mind *them* being optimized together, just want to keep the top level components (cpu, peripherals, etc) from bleeding into each other
<whitequark> probably a (*flatten="always"*) attribute or something
<daveshah> There is (* keep_hierarchy *) already
<daveshah> > The keep_hierarchy attribute on cells and modules keeps the flatten command from flattening the indicated cells and modules.
<swetland> but that makes nextpnr cranky, no? need to do some kind of force flatten as a final step perhaps?
<daveshah> ah no, that's not quite the same thing
<daveshah> yes, need `opt_clean; flatten` before netlist export
<daveshah> nextpnr will support hierarchical netlists at some point, but imo we should get rid of the hand-written JSON parser first
<whitequark> handwritten json parser....
<daveshah> badly ripped out of Yosys too (don't @ me, not my code)
<whitequark> daveshah: oh what the hell...
<whitequark> FlowMap-r is ... trivial to implement
<swetland> at least it's not xml ^^
m4ssi has joined ##openfpga
gardintrapp has joined ##openfpga
<whitequark> and it also preserves all original network properties
<daveshah> very nice
<whitequark> ahaha
<whitequark> >Experimental results show that the run time of SeqMapII for
<whitequark> computing the optimal solutions is too long in practice (e.g.,
<whitequark> more than 12 h of CPU time for a design of 134 gates on a
<whitequark> SPARC5 workstation).
<whitequark> shots fired.
<gruetzkopf> sparc5?
<whitequark> it's from the retiming paper
<gruetzkopf> do they mean sparcstation 5
<gruetzkopf> or an ultra 5
<gruetzkopf> important difference
<qu1j0t3> 1998. they mean an SS5
<edmund> mwk: can you please send me your email address
<mwk> edmund: and I thought you had it already?
<gruetzkopf> qu1j0t3: 1998-01 was the first month of ultra5 availability
<edmund> mwk: Lost in translation :-)
<edmund> You should have gotten an invite by email.
<qu1j0t3> gruetzkopf: hm :)
<tnt> whitequark: in glasgow, are unused pins forced to 0 somewhere ?
knielsen has quit [Ping timeout: 240 seconds]
<tnt> Mmm, looks like in the verilog io[n] (n being the unused IO) is not connected to anything ... and then yosys or nextpnr decides to make that 0 which is rather inconvenient.
emeb has joined ##openfpga
X-Scale has quit [Ping timeout: 244 seconds]
knielsen has joined ##openfpga
X-Scale has joined ##openfpga
GuzTech has quit [Ping timeout: 246 seconds]
GuzTech has joined ##openfpga
GuzTech has quit [Remote host closed the connection]
pie__ has joined ##openfpga
m4ssi has quit [Quit: Leaving]
<tnt> mithro: btw, not sure if that's helpful at this stage but I wrote some hack to run sigrok PD inside gtkwave. Useful to debug USB probably.
<mithro> tnt: that would be useful
<tnt> mithro: do you have a .vcd for me to try it on to make sure it works with the usb pd first ? Then I'll send it along with how to run it. (gtkwave tends to be a bit picky and just plain freezes if you don't do things exactly right)
<mithro> tnt: hrm, not at this very moment
<mithro> tnt: should generate them
edmund has quit [Quit: Ex-Chat]
m_w has quit [Quit: Leaving]
_whitelogger has joined ##openfpga
rohitksingh has quit [Ping timeout: 250 seconds]
mumptai has joined ##openfpga
edmund has joined ##openfpga
gardintrapp has quit [Remote host closed the connection]
pie__ has quit [Ping timeout: 250 seconds]
<tnt> Mmm, so how does one try out that flowmap stuff exactly ?
<tnt> (beside building it obviously ...)
futarisIRCcloud has joined ##openfpga
pie__ has joined ##openfpga
egg|anbo|egg is now known as egg|egg
oter has joined ##openfpga
Flea86 has joined ##openfpga