Bike has joined ##openfpga
genii has quit [Ping timeout: 268 seconds]
pie___ is now known as pie__
genii has joined ##openfpga
azonenberg_work has quit [Ping timeout: 245 seconds]
emeb has quit [Quit: Leaving.]
pie__ has quit [Remote host closed the connection]
pie___ has joined ##openfpga
jevinskie has quit [Ping timeout: 250 seconds]
Richard_Simmons has joined ##openfpga
Bob_Dole has quit [Ping timeout: 268 seconds]
catplant has quit [Quit: travels]
jevinskie has joined ##openfpga
catplant has joined ##openfpga
catplant has quit [Quit: WeeChat 2.2]
catplant has joined ##openfpga
genii has quit [Read error: Connection reset by peer]
unixb0y has quit [Ping timeout: 250 seconds]
unixb0y has joined ##openfpga
catplant has left ##openfpga ["WeeChat 2.2"]
catdemon has joined ##openfpga
Richard_Simmons has quit [Ping timeout: 268 seconds]
<_whitenotifier> [whitequark/Glasgow] marcan pushed 1 commit to master [+0/-0/±1] https://git.io/fpi0a
<_whitenotifier> [whitequark/Glasgow] marcan 2acbf94 - revC: remove spurious square in F.Paste
Bob_Dole has joined ##openfpga
catdemon has quit [Ping timeout: 246 seconds]
Miyu has quit [Ping timeout: 250 seconds]
ZipCPU|Laptop has joined ##openfpga
azonenberg_work has joined ##openfpga
<_whitenotifier> [whitequark/Glasgow] marcan pushed 1 commit to master [+0/-0/±1] https://git.io/fpizT
<_whitenotifier> [whitequark/Glasgow] marcan 2ec74e4 - revC: minor beauty tweaks to routing, redo USB 2.0 diffpairs
Bob_Dole has quit [Ping timeout: 268 seconds]
rohitksingh_work has joined ##openfpga
catdemon has joined ##openfpga
Flea86 has joined ##openfpga
Bob_Dole has joined ##openfpga
Bike has quit [Quit: Lost terminal]
Bob_Dole has quit [Ping timeout: 268 seconds]
Bob_Dole has joined ##openfpga
rofl_ has quit [Read error: Connection reset by peer]
rofl_ has joined ##openfpga
<_whitenotifier> [whitequark/Glasgow] whitequark pushed 2 commits to master [+0/-0/±4] https://git.io/fpiV9
<_whitenotifier> [whitequark/Glasgow] whitequark 4d2cae5 - gateware.boneless: add tests for C-class opcodes.
<_whitenotifier> [whitequark/Glasgow] whitequark 6fd5253 - gateware.boneless: add Verilog export.
<whitequark> Number of cells: 2007
<whitequark> ok wtf
<whitequark> oh
<whitequark> oh nvm
SolraBizna has quit [Ping timeout: 252 seconds]
SolraBizna has joined ##openfpga
<whitequark> daveshah: Info: promoting $abc$5528$n24 [reset] (fanout 1)
<whitequark> *sigh*
rofl__ has joined ##openfpga
pie__ has joined ##openfpga
Bob_Dole has joined ##openfpga
Bob_Dole has quit [Read error: Connection reset by peer]
rofl_ has quit [Read error: Connection reset by peer]
pie___ has quit [Read error: Connection reset by peer]
<_whitenotifier> [whitequark/Glasgow] whitequark pushed 1 commit to master [+0/-0/±2] https://git.io/fpiri
<_whitenotifier> [whitequark/Glasgow] whitequark f4836da - gateware.boneless: allow selecting external bus or GPOs for export.
Richard_Simmons has joined ##openfpga
Bob_Dole has quit [Remote host closed the connection]
jevinski_ has joined ##openfpga
jevinskie has quit [Ping timeout: 268 seconds]
<_whitenotifier> [whitequark/Glasgow] whitequark pushed 2 commits to master [+0/-0/±2] https://git.io/fpiKq
<_whitenotifier> [whitequark/Glasgow] whitequark b8c4939 - gateware.boneless: use split memory ports, if any (-20 LUT).
<_whitenotifier> [whitequark/Glasgow] whitequark cb4b55b - gateware.boneless: use non-transparent read port (-47 LUT, -17 DFF).
<tnt> whitequark: so what's the current usage & f_max ?
<whitequark> tnt: 670 LUT, 25 MHz on UP5K
<whitequark> this kind of sucks
<tnt> :/ yeah 25M is a bit slow. The ice40 LUTs are really not all that efficient; even a subtract needs two layers of logic.
<whitequark> 25 MHz on UP5K is not that bad
<whitequark> I get 60 MHz on HX8K
GuzTech has joined ##openfpga
<azonenberg_work> whitequark: and how fast on a 7 series? :p
<whitequark> azonenberg_work: we'll learn once it works in nextpnr :P
<Flea86> heh
<_whitenotifier> [whitequark/Glasgow] whitequark pushed 1 commit to master [+0/-0/±1] https://git.io/fpiyx
<_whitenotifier> [whitequark/Glasgow] whitequark 8f88f5b - gateware.boneless: ensure sensible names in Verilog output (-2 LUT).
<_whitenotifier> [whitequark/Glasgow] whitequark created branch master https://git.io/fpiSL
<marcan> I'm trying to wrap my head around how the PLLs interact with pins on HX8K
<marcan> this is very confusing
<marcan> so each PLL has two outputs, and depending on the PLL mode it either uses one or two. and an in-use PLL output eats the pin's *input* buffer, right?
<marcan> (because I'm guessing it uses that path to get into a gbuf?)
<marcan> what about the PLL *input*? I see REFERENCECLK comes from `fabout` on some other IO tile, but the SB_PLL40_*_PAD primitives seem to have a PACKAGEPIN input instead and I don't really understand where that comes from (fixed pin? something else?)
<marcan> it seems one PLL maps PLLOUT_A to the GBIN5 pin, and PLLOUT_B to the GBIN4 pin
<marcan> and the other maps to GBIN0 and GBIN1 respectively
<marcan> so using one PLL eats GBIN0's input buffer (and possibly GBIN1 in dual modes) but those pins can still be used as outputs (including tristate), right?
<marcan> and same for the other PLL of course
<marcan> and the _PAD variants will take over the pins entirely, using them as outputs, right?
<marcan> or is that for inputs?
<_whitenotifier> [whitequark/Glasgow] whitequark pushed 1 commit to master [+0/-0/±1] https://git.io/fpiS7
<_whitenotifier> [whitequark/Glasgow] whitequark b6ef81b - gateware.boneless: ensure sensible names in Verilog output.
<daveshah> Yes, packagepin is a fixed pin
<daveshah> Some Lattice docs should explain that to you
<marcan> I'm failing to find any docs whatsoever describing the actual two PLLs in HX8K and what resources they consume/map to
<daveshah> For the love of all that is holy, PnR a design with the pins you are using and PLL(s) before taping out
<marcan> clearly.
<daveshah> I'm sure I've found something before
<marcan> I did not expect this to be so confusing
<daveshah> Yeah, the PLLs in the ice40 are clearly a bodge
<daveshah> Management: we need two PLLs
<marcan> part of the problem here is this is Glasgow... so it's intended to be flexible, so there isn't "one true design" that needs to work
<daveshah> I think you've worked this bit out already but each PLL makes two IO pins output only; and one of those becomes a dedicated PLL input with a PAD PLL
<daveshah> The Lattice docs really don't describe that anywhere though
<marcan> yeah, that's what I basically arrived at
<marcan> but just haven't been able to find anything to confirm it
<marcan> specifically, it seems PLLOUT_A is PACKAGEPIN, right?
<daveshah> Yes
<marcan> and then I assume the PLL outputs go on to drive the associated GBINs
<marcan> so effectively PAD PLLs insert themselves between a GBIN and its GBUF
<marcan> (and optionally drive a second output to the neighboring GBUF)
<swetland> yup
<daveshah> The PLL outputs don't have to drive a global buffer
<marcan> right, they have two outputs
<marcan> one for GBIN and one for random logic
<daveshah> They can also drive fabric, then the associated GB can still be driven by fabric
<marcan> (per _A/_B)
<marcan> ah
<marcan> so the GB is still usable in that case
<daveshah> BTW I wouldn't worry too much about using GBIN pins overall
<marcan> (just not the pin)
<daveshah> Yes
<daveshah> Routing globals through a small amount of fabric on the ice40 has never been a problem
<daveshah> When you don't use the global output of the PLL, the PLL output is treated the same as an input buffer output fwiw
<marcan> oh also just to confirm, "output-only" pins doesn't mean they always drive, right?
<daveshah> The PLL is really in the input path
<marcan> so it can still be tristated
<marcan> (just not read)
<daveshah> I've never actually tested that
<marcan> hm, ok
<daveshah> But from the PnR stuff I've done there's no reason it shouldn't work
<daveshah> I'm certain the tristating itself would work
<daveshah> Can't guarantee it doesn't break the PLL in some weird way
<marcan> hm
<marcan> I mean the question is basically can I drive a pin externally and if I need the PLL just not use that pin (but without causing a bus conflict)
<daveshah> Yes, sure
<daveshah> You don't have to have an output IO there at all
<daveshah> What you really can't do is use the input buffer
<marcan> cool
<daveshah> The only thing not tested, but certainly should work, is on the fly tristating
<marcan> yeah we don't need that
<daveshah> The _2_PAD variant, unlike the _2F_PAD variant, passes through the PLL input to one PLL output
<daveshah> That is the only way to use the original input
<marcan> yeah, I saw that
<daveshah> But it has to also be used as the PLL input in this case
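The pin bookkeeping worked out above can be sketched in Python. This is a minimal model, assuming the GBIN assignments marcan read out of the tools (GBIN5/GBIN4 for one PLL, GBIN0/GBIN1 for the other) and daveshah's rules: an in-use PLL takes both pins' input buffers, and a *_PAD variant also turns the PLLOUT_A pin into the PACKAGEPIN reference input. The labels `pll_a`/`pll_b` and the function name are hypothetical. Per daveshah, static tristating of an output-only pin should be fine; on-the-fly tristating is untested.

```python
# Hypothetical PLL labels; GBIN assignments as observed in the discussion above,
# not checked against a Lattice datasheet.
PLL_PINS = {
    "pll_a": {"PLLOUT_A": "GBIN5", "PLLOUT_B": "GBIN4"},
    "pll_b": {"PLLOUT_A": "GBIN0", "PLLOUT_B": "GBIN1"},
}


def pin_restrictions(plls):
    """Map each affected pin to how it may still be used.

    `plls` maps a PLL label to True if it is instantiated as a *_PAD variant.
    Every PLL in use takes over the input buffers of both of its pins, so
    they become output-only. With a *_PAD variant, the PLLOUT_A pin is also
    the PLL's dedicated reference input (PACKAGEPIN).
    """
    restrictions = {}
    for pll, is_pad in plls.items():
        pins = PLL_PINS[pll]
        for pin in pins.values():
            restrictions[pin] = "output-only"
        if is_pad:
            restrictions[pins["PLLOUT_A"]] = "PLL reference input (PACKAGEPIN)"
    return restrictions
```

For example, `pin_restrictions({"pll_b": False})` marks GBIN0 and GBIN1 output-only. The global buffers behind those pins can still be driven from fabric when the PLL's global output is not used.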
<whitequark> daveshah: is it me or does yosys 'kind of' suck at synthesis optimization
<whitequark> like
<whitequark> i'm tweaking the design and yosys appears to miss completely obvious optimization opportunities...
<whitequark> unless i'm missing something or migen generates bad verilog
<whitequark> by kind of i mean it seems to do only what i'd call "peephole optimization" in synthesis
<daveshah> Yosys' QoR is quite poor, yes
<whitequark> ok, thanks
<whitequark> not just me
<daveshah> I have seen ~30% worse QoR than Vivado for synthesis alone in one paper, although that was a while ago
<daveshah> We are hoping to get some help from Alan in making better use of ABC in the near future
<whitequark> more ABC?!
<whitequark> that sounds horrifying
<whitequark> ABC is bad enough as it is
<daveshah> Write your own logic optimisation tool then
<whitequark> ABC is the single worst usability issue with Yosys and you can't deny that
<whitequark> you can ignore that but it's user hostile
<swetland> yeah I'm seeing 62% more LUTs, 36% more CARRYs, 6% more DFFs in my project vs icecube/synplify
<daveshah> whitequark: sure, a much simpler optimisation pass that preserved way more names might be possible
<daveshah> but your QoR is going to collapse even further
<whitequark> daveshah: well... synplify seems to not be as bad with names as abc, doesn't it?
<whitequark> and it has a better QoR at the same time
<whitequark> so it's clearly possible in theory
<swetland> ideally we'd optimize and keep track of where everything came from, but one thing at a time, I suppose. even many of the vendor tools leave you wondering what went where after they chew on your design enough
<daveshah> Yes, if we were a billion dollar company I'm sure we could get all this stuff working
<daveshah> But we're not
<daveshah> So you've got to pick and choose
<whitequark> I'm wondering if there are missed opportunities *before* abc
<daveshah> Definitely
<whitequark> because I would probably not work on abc but I would probably write a Yosys pass
<daveshah> After ABC too
<whitequark> right, I might look into it some time later
<swetland> may well be. I get the impression the vendor tools use a lot of really high level heuristics to map common patterns to part-specific layouts and/or hard blocks
<daveshah> Sometimes they even do heuristics at the AST level, particularly for BRAM inference
<swetland> the rtl elaboration schematic view in vivado kinda gives some hints about what it's thinking
<daveshah> I'm not convinced this is the best solution though
<whitequark> daveshah: let's say I have a mux
<whitequark> n:1 mux specifically
<whitequark> is it better to control it with one-hots or is it better to control it with an encoded signal?
<whitequark> it looks like yosys cannot convert between those representations
<whitequark> this specific mux is 5:1. do you have any advice on what's the best way to lay out verilog for it for LUT4s?
<daveshah> I don't know
<daveshah> Depends where your control signals come from partly
<whitequark> an FSM
<whitequark> Yosys refuses to optimize the FSM for two reasons
<whitequark> one, it doesn't think it's efficient
<whitequark> second, there's an issue (bug, imo) in Yosys where it cannot handle FSMs that use reg init values
<swetland> I need to do some experiments. curious where the various tools infer muxes vs enables, etc
<whitequark> which I keep meaning to fix
<swetland> ah I see that everywhere.
<swetland> I am naughty and like to have my initial state be 0 (conveniently the initial state of lattice DFFs)
<whitequark> swetland: yes
<whitequark> that's what migen sets it to in this case
<whitequark> and at *least* i think yosys should recognize that...
<daveshah> I doubt there's that much difference either way for only 5 inputs
<swetland> there's a lot of "thou shalt always use resets" in various style guides
<swetland> but if you're specifically targeting an fpga it seems odd not to play to its advantages
<whitequark> swetland: yyyyyep
<swetland> provided you know that you're doing so
<daveshah> whitequark: fwiw both onehot and encoded mux5s end up as 3 LUT4s for me
<daveshah> with Yosys
<whitequark> per bit?
<daveshah> In total
<whitequark> mmm, okay, thanks
<whitequark> daveshah: onehot seems marginally more efficient in my case
<daveshah> Yes, seems quite possible with an FSM
<whitequark> but it's also a weird mux
<whitequark> it's 16 bit, and the bottom 3 bits are more muxed than top 13 bits
<whitequark> let me try and write that down explicitly
<whitequark> hrm
<whitequark> that... made it worse
<whitequark> but also, at best... my microoptimizations saved 2 LUTs
<whitequark> and made code way more gross
<_whitenotifier> [whitequark/Glasgow] whitequark pushed 1 commit to master [+0/-0/±1] https://git.io/fpid7
<_whitenotifier> [whitequark/Glasgow] whitequark 157f613 - gateware.boneless: fix indentation. NFC.
pie___ has joined ##openfpga
pie__ has quit [Remote host closed the connection]
<tnt> whitequark: just instantiate the LUT4 manually :p works every time!
<tnt> daveshah: Ah you didn't find any reason not to uncomment that code in the end ? Nice, tx :)
<daveshah> There is another issue to do with feedout that might be causing serious QoR issues in some places
<daveshah> I'm trying to fix that too, but it's a bit more tricky
<whitequark> daveshah: is it expected that using -abc2 would result in *less* logic density?
<daveshah> No, but it's also not impossible either
<daveshah> AIUI -abc2 was a hack to keep the swapforth people happy
<whitequark> swapforth?
<daveshah> They were trying to get a forth processor to fit in the 1k
<whitequark> oh
<whitequark> well my CPU definitely fits into the 1k
<whitequark> :D
<whitequark> actually you could fit *two* of them
<daveshah> sweet
<daveshah> you should have put a risc-v emulator on it and submitted it to the contest :P
<whitequark> daveshah: you've seen my new ISA right?
<daveshah> yes
<whitequark> I get ~25 MHz on UP5K
<whitequark> opinion?
<daveshah> I'm not enough of an ISA expert to comment
<daveshah> but 25MHz is pretty good
<whitequark> I'm wondering if I can shrink LUT count by like a factor of two
<daveshah> can you put the json somewhere? I want to see how a few nextpnr improvements change things
<whitequark> would you rather have a JSON with external bus on the FPGA pins, or with a few GPOs?
<whitequark> latter fits on UP5K
<daveshah> latter would be best
<whitequark> this is without any sensible program in it, I could put a blinky there if you wanted
<daveshah> Doesn't matter
<swetland> is there a reliable directive for yosys to ask it to assume all IOs from a module are actually used
<swetland> (e.g. so it won't trim out high address bits, etc, if I don't have a full memory bus wired up)
<cr1901_modern> (* keep = 1 *) ?
<daveshah> whitequark:
<daveshah> master: min = 22.39 MHz, avg = 23.24 MHz, max = 24.73 MHz
<daveshah> carry improvements: min = 23.16 MHz, avg = 24.47 MHz, max = 25.97 MHz
<daveshah> carry improvements & opt-timing pass: min = 23.95 MHz, avg = 25.73 MHz, max = 28.00 MHz
<daveshah> 16 runs of nextpnr in each case
<marcan> nice.
<cr1901_modern> daveshah: cr1901cc -pedantic, should you provide stddev?
<daveshah> probably
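Reporting the spread cr1901_modern asks for is cheap if the per-seed Fmax values are kept. A small Python helper, assuming the input is simply the list of Fmax values (in MHz) from repeated nextpnr runs; the function name is illustrative, and no numbers from the runs above are reproduced here.

```python
import statistics


def summarize_fmax(fmax_mhz):
    """Summarize Fmax (MHz) over repeated place-and-route runs with different seeds."""
    return {
        "min": min(fmax_mhz),
        "avg": statistics.mean(fmax_mhz),
        "max": max(fmax_mhz),
        # Sample standard deviation; requires at least two runs.
        "stddev": statistics.stdev(fmax_mhz),
    }
```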
<whitequark> daveshah: that's quite nice
<whitequark> daveshah: how would you try to optimize this design, assuming free tools?
<daveshah> Can't offer more advice than just trying to keep the critical path as short as possible
<whitequark> what about size?
<whitequark> frequency is easier
<whitequark> I think I know how to make it faster, already
<daveshah> No idea, I'm not really an FPGA developer remember
<whitequark> right, sure
<cr1901_modern> whitequark: Are you using sync reset?
<daveshah> I've never optimised something for size before...
rohitksingh_work has quit [Read error: Connection reset by peer]
<whitequark> cr1901_modern: yes
<cr1901_modern> I wonder how many LUTs are being used just for resetting all the signals (since a MUX will need to choose between reset and new value for sync reset)?
<whitequark> hrmmm
<daveshah> ice40 has sync reset in hw
<whitequark> yeah, it makes no difference
<cr1901_modern> well that's my contribution :P. I'm all out of ideas.
<daveshah> whitequark: Tried building boneless myself (I think I'm doing it correctly)
<daveshah> with tnt's dffe_min_ce_use = 4
<daveshah> min = 26.74 MHz, avg = 27.82 MHz, max = 29.35 MHz
<daveshah> (plus the other nextpnr improvements as before)
<tnt> daveshah: ice40 has sync reset in HW but it doesn't override the CE pin ... so if you use sync reset, it sometimes needs to combine the reset signal with the CE signal :/ Using async reset solves that.
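The CE-versus-reset precedence tnt describes can be modelled in a few lines of Python. This sketch assumes his reading of the Lattice docs: on ice40, sync reset only acts while CE is high. To get the "reset always wins" behaviour most HDL assumes, the enable fed to the cell has to become `ce | sr`, and that OR is the extra logic that ends up in a LUT. The function names are illustrative.

```python
def dff_ce_priority(q, d, ce, sr):
    """One clock edge of a sync-reset DFF where CE has precedence
    (the ice40 behaviour described above): reset only acts when CE is high."""
    if not ce:
        return q
    return 0 if sr else d


def dff_reset_priority(q, d, ce, sr):
    """Behaviour most HDL code assumes: sync reset wins over CE."""
    if sr:
        return 0
    return d if ce else q


def emulate_reset_priority(q, d, ce, sr):
    """Reset-wins behaviour built from the CE-priority cell; the `ce | sr`
    term is the extra logic that has to be folded into a LUT."""
    return dff_ce_priority(q, d, ce | sr, sr)
```

The two cells disagree only when reset is asserted with CE low, for example `dff_ce_priority(1, 0, 0, 1)` holds 1 while `dff_reset_priority(1, 0, 0, 1)` clears to 0.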
<daveshah> tnt: interestingly, ecp5 is the opposite here
<tnt> yeah, xilinx as well ... that really surprised me when I found out ...
<daveshah> 🚨 yosys bug 🚨 running boneless with -retime seems to cause it to dispose of the entire design
<tnt> at first I thought that yosys was doing something wrong ... but re-checking the lattice docs ... nope, ce has precedence.
<tnt> lol
<whitequark> daveshah: niiiiice
<whitequark> regarding -retime
<whitequark> yes, I have hit that
<daveshah> I don't think it would make a difference on this kind of design anyway
<tnt> it moved all the registers ... out of the fpga :P
<whitequark> daveshah: also I tried synplify and synplify seems to consistently discard the entire design
<whitequark> no matter what I'm trying to do
<whitequark> it thinks sys_clk is unused.
<whitequark> this.... makes no sense
<daveshah> lol
<whitequark> see, i might dislike yosys at times, but i really, really hate opaque vendor tools all the fucking time
<daveshah> Now I know nextpnr assertion failures are shit
<daveshah> But synplify assertion failures are 100 times worse
<whitequark> nextpnr gave me very little trouble fwiw
<daveshah> and I've hit them before now
<q3k> yeah, i was doing some synthesis with xst recently
<whitequark> most of my complaints with yosys boil down to "no systemverilog" (and even barely a complaint) and "fucking abc"
<q3k> it also removed nearly all of my design
<whitequark> is there really no alternative to abc? at all? anywhere?
<q3k> although that was a pebkac, i wasn't driving ODDR2 correctly
<q3k> whitequark: not really no
<daveshah> I had LSE once remove *just the deceleration* part of a motor controller state machine
<whitequark> so it's like opencascade ;_;
<whitequark> daveshah: incredible
<daveshah> That could have killed someone or destroyed millions of pounds of machinery lol
<daveshah> Yeah ABC is sadly irreplaceable rn
<q3k> that's somewhat terrifying
<daveshah> I did test that code in several sims and other synth tools and nothing else had any issues
<daveshah> So I'm pretty sure it was an LSE bug
<daveshah> Then again, Yosys removing one read port of a TLB wasn't fun either
<q3k> Final remarks:
<q3k> Unfortunately, there is no comprehensive regression test. Good luck!
<q3k> every fucking time
<daveshah> re: net naming - I think there might be some functionality in ABC that we are not taking full advantage of
<whitequark> daveshah: also... yosys does -flatten right?
<daveshah> Yes, pretty much straight away in the ice40 flow I think
<whitequark> so even if I had a hierarchical design like azonenberg told me
<whitequark> it would not have helped me here at all
<daveshah> You can run without flattening
<whitequark> but it breaks somehow, right?
<daveshah> You need a flatten after synthesis
<whitequark> I remember trying it some time ago
<daveshah> nextpnr doesn't support hierarchy yet
<whitequark> and it broke in bizarre, impossible to debug ways
<whitequark> something about tristates? I think?
<q3k> daveshah: what does yosys use abc for right now? only techmapping?
<daveshah> LUT techmapping and optimisation
<daveshah> plus ASIC techmapping
<q3k> so does the 'abc' command do both the techmap and an optimization pass by abc?
<daveshah> Yes
<daveshah> by default
<whitequark> can I tell abc to only techmap?
<daveshah> Maybe, I don't really understand ABC well enough
<daveshah> I suspect that techmapping is also optimisation to some extent
<whitequark> ... can't argue with that
<daveshah> :P
<q3k> whitequark: maybe fucking about with the -script parameter in `abc`?
<q3k> whitequark: (that's an abc script)
<whitequark> hmmm, I wonder if signed integers are broken horribly somewhere in migen or yosys
Miyu has joined ##openfpga
<swetland> I wish I could remember which toolchain it was (quartus or vivado), but long ago I had a small cpu design running out of rom that was readmemh'd in, and it optimized away parts of the cpu that executed instructions that did not exist in the rom
<swetland> which I found simultaneously quite impressive and really annoying
<whitequark> ha
<whitequark> compilers dot jpeg
<whitequark> simultaneously quite impressive and really annoying
<swetland> the solution to and cause of all our problems...
<q3k> dress – This command transfers the node names from an external network to the current network. For this purpose FRAIG package is used to detect functional equivalences between the nodes of the two networks. If a node in the current network has the same functionality as a node in the external network, the name is transferred. If the nodes are equivalent up to complementation, the name is transferred and
<q3k> suffix _inv is appended to it. This is useful because ABC (while preserving PI, PO, and latch output names) does not preserve the internal node names when processing logic networks and AIGs. Using this command, if the internal names are needed, they can be restored after the network processing is over, just before writing the resulting network into a file. Note that if, after calling command dress, the
<q3k> network is not immediately written into a file, the internal node names may be lost again.
<q3k> whitequark, daveshah ^
<q3k> seems yosys is not calling that.
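What `dress` is described as doing can be illustrated with a toy Python model. This is a simplification: it assumes networks small enough that exhaustive simulation can stand in for the FRAIG-based equivalence detection ABC actually uses, and the function and node names are made up for illustration. Names are copied to functionally identical nodes, and to complemented nodes with an `_inv` suffix, as in the quoted description.

```python
from itertools import product


def truth_table(fn, n_inputs):
    """Exhaustive truth table of a node, indexed by input pattern."""
    return tuple(fn(*bits) for bits in product((0, 1), repeat=n_inputs))


def dress(named_nodes, anonymous_nodes, n_inputs):
    """Transfer names from named_nodes ({name: fn}) to equivalent anonymous_nodes ({id: fn}).

    Exact matches take the name as-is; matches up to complementation take
    the name with an "_inv" suffix. Unmatched nodes stay unnamed.
    """
    exact, complement = {}, {}
    for name, fn in named_nodes.items():
        tt = truth_table(fn, n_inputs)
        exact.setdefault(tt, name)
        complement.setdefault(tuple(1 - b for b in tt), name)
    names = {}
    for node_id, fn in anonymous_nodes.items():
        tt = truth_table(fn, n_inputs)
        if tt in exact:
            names[node_id] = exact[tt]
        elif tt in complement:
            names[node_id] = complement[tt] + "_inv"
    return names
```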
<daveshah> whitequark: didn't we discuss this ~6months ago
<daveshah> or was it with rqou
<whitequark> daveshah: which part?
<whitequark> abc? not with me
<daveshah> there's definitely some deja vu here
<daveshah> must have been rqou
<daveshah> lemme check the logs. I'm sure there was something nasty here
<q3k> i'm sorry if i'm rehashing some old discussion, this is all new to me :)
<daveshah> no problem
<whitequark> I've never heard of dress
<q3k> i just saw it in some random introductory paper to abc
<daveshah> Hmm, seemed that previous discussion was pretty much just "abc is a mess"
rohitksingh has joined ##openfpga
<_whitenotifier> [whitequark/Glasgow] whitequark pushed 3 commits to master [+0/-0/±3] https://git.io/fpPJW
<_whitenotifier> [whitequark/Glasgow] whitequark 2d0123c - gateware.boneless: split JAL and JR handling (-1 LUT).
<_whitenotifier> [whitequark/Glasgow] whitequark b922f7e - gateware.boneless: nicer naming for memory signals. NFC.
<_whitenotifier> [whitequark/Glasgow] whitequark 9ac4020 - arch.boneless: clarify when C/O flags end up in undefined state.
<_whitenotifier> [whitequark/Glasgow] whitequark pushed 1 commit to master [+0/-0/±1] https://git.io/fpPTY
<_whitenotifier> [whitequark/Glasgow] whitequark 226f6cb - arch.boneless: add ROL alias and ROR pseudo.
genii has joined ##openfpga
emeb has joined ##openfpga
rohitksingh has quit [Ping timeout: 250 seconds]
<_whitenotifier> [whitequark/Glasgow] marcan pushed 5 commits to master [+0/-0/±6] https://git.io/fpPt3
<_whitenotifier> [whitequark/Glasgow] marcan 6423d6e - revC: update incorrect comment re: Vbus threshold
<_whitenotifier> [whitequark/Glasgow] marcan 3489eb2 - revC: connect Port A 4,6 to extra FPGA I/Os as well as GBUFs
<_whitenotifier> [whitequark/Glasgow] marcan aefc5ce - revC: schematic style fix (add wire for net label)
<_whitenotifier> [whitequark/Glasgow] ... and 2 more commits.
<whitequark> daveshah: so, i'm hand-optimizing the ALU
<whitequark> and yosys is definitely really bad at inferring good muxes
<daveshah> interesting
<whitequark> daveshah: http://bleyer.org/pacoblaze/picoblaze.pdf page 19
<whitequark> I am implementing this exact operation
<whitequark> except, instead of Sel0 Sel1 inputs to the mux
<whitequark> I have three 1-hot
<whitequark> yosys infers precisely twice as many 4-LUTs as it should
<daveshah> is there some code I can look at?
<whitequark> sec
<daveshah> anyway, have tried `dress` with ABC. It does preserve at least a few more net names
<daveshah> Although in some cases it just preserves Yosys-internal netnames from earlier on which is almost as unhelpful...
<whitequark> daveshah: https://paste.debian.net/1054318/
<whitequark> try running synth_ice40 on this
<whitequark> I would *expect* a synthesis tool to turn one-hot into encoded, then use encoded to drill further down
<whitequark> so, overhead of +2 LUTs, not x2 LUTs
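The one-hot to encoded conversion whitequark expected is just an OR tree. A Python sketch, assuming a valid one-hot select (exactly one line high); the function names are illustrative, and nothing here says how yosys or ABC would be taught to do it. For the three-line case it even degenerates to wiring `sel[1]` and `sel[2]` straight through, since `sel[0]` high implies index 0.

```python
def onehot_to_encoded(sel):
    """OR-tree encoder: bit k of the index is the OR of sel[i] for every i with bit k set."""
    width = max(1, (len(sel) - 1).bit_length())
    index = 0
    for k in range(width):
        bit = 0
        for i, s in enumerate(sel):
            if (i >> k) & 1:
                bit |= s
        index |= bit << k
    return index


def mux_onehot(sel, inputs):
    """AND-OR mux: each input is gated by its select line, then all are ORed."""
    out = 0
    for s, x in zip(sel, inputs):
        out |= x if s else 0
    return out


def mux_encoded(index, inputs):
    """Binary-select mux."""
    return inputs[index]
```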
<daveshah> For a circuit this small I would expect ABC to just optimise it outright
<daveshah> let me check something
<daveshah> A parallel_case doesn't change things
<whitequark> i feel what's happening here is that abc is committing to muxing the outputs
<whitequark> or yosys
<whitequark> too early
<whitequark> this almost feels like I should make a bit-slice ALU
<daveshah> I just don't think abc works at that level
<whitequark> :/
<daveshah> For something like this I would pretty much just expect abc to turn it into a big logic function and map it back to LUTs
<whitequark> so not only there is nothing better than abc, but abc is also not very good in the first place?
<daveshah> It is more likely that Yosys is using abc wrong
<whitequark> oh, okay
<daveshah> Alan makes his money telling people how to use abc
<daveshah> That's all you need to know to realise why abc is as it is....
<whitequark> consulting-driven development?
<whitequark> right...
<whitequark> that also likely means that it's completely pointless to try and improve abc
<whitequark> just like with opencascade
<daveshah> Yes
<whitequark> actually, the more i think about it, the more the analogy makes sense
<whitequark> this is *really* depressing.
<whitequark> if i didn't suck really badly at the exact kind of puzzle that abc is, i would have already been on call with edmund.
<daveshah> hmm
<daveshah> synplify makes 13 LUTs out of that design
<whitequark> 13?..
<daveshah> LSE makes 7, as expected
<whitequark> huh.
<whitequark> daveshah: so I'm wondering
<whitequark> there's Souper
<whitequark> I wonder if something similar can be made for FPGAs?
<daveshah> I have no idea
<whitequark> yeah, let me ping john
<daveshah> I dare say I feel similar to ABC now as I did to VPR this time last year. And we all know how it turned out with place-and-route :P
<whitequark> hahaha
<_whitenotifier> [whitequark/Glasgow] marcan pushed 1 commit to master [+0/-0/±1] https://git.io/fpPmK
<_whitenotifier> [whitequark/Glasgow] marcan b603f66 - revC: simplify VUSB routing, move input cap to top layer, double vias
<_whitenotifier> [whitequark/Glasgow] marcan pushed 1 commit to master [+0/-0/±1] https://git.io/fpPm9
<_whitenotifier> [whitequark/Glasgow] marcan 6e30c43 - revC: simplify VUSB routing, move input cap to top layer, double vias
<_whitenotifier> [Glasgow] Error. The Travis CI build could not complete due to an error - https://travis-ci.org/whitequark/Glasgow/builds/463374574?utm_source=github_status&utm_medium=notification
<_whitenotifier> [Glasgow] Error. The Travis CI build could not complete due to an error - https://travis-ci.org/whitequark/Glasgow/builds/463375150?utm_source=github_status&utm_medium=notification
<RaYmAn> hm, my Glasgow rev B has some issues (think one of the level shifters is dead), and I'm trying to run the selftest. It seems to fail here because it generates ".PACKAGE_PIN(io[0][0])," for the io SB_IO definitions. I have latest yosys/migen (from master). Any ideas?
Flea86 has quit [Quit: Goodbye and thanks for all the dirty sand ;-)]
rohitksingh has joined ##openfpga
<_whitenotifier> [whitequark/Glasgow] marcan pushed 1 commit to master [+0/-0/±1] https://git.io/fpP3z
<_whitenotifier> [whitequark/Glasgow] marcan 3533e74 - revC: reposition fiducials, add a third corner, add more for FX2, FPGA
<whitequark> RaYmAn: hmm, weird
<whitequark> I think I've seen that bug
<whitequark> yeah, I can reproduce it.
<RaYmAn> cool, so not just me then :)
<_whitenotifier> [whitequark/Glasgow] marcan pushed 1 commit to master [+0/-0/±1] https://git.io/fpPn6
<_whitenotifier> [whitequark/Glasgow] marcan 6ffb5e5 - revC: minor nudging and tweaking to improve silkscreen around reg area
<whitequark> RaYmAn: fixed, update your migen
<RaYmAn> oO
<GuzTech> Damn, that was fast :P
<RaYmAn> cool, works :D
<RaYmAn> whitequark: thanks :)
<whitequark> RaYmAn: and now with a synthesis test so this doesn't happen again.
<_whitenotifier> [whitequark/Glasgow] whitequark pushed 2 commits to master [+0/-0/±2] https://git.io/fpPcZ
<_whitenotifier> [whitequark/Glasgow] whitequark dd80825 - gateware.boneless: formatting. NFC.
<_whitenotifier> [whitequark/Glasgow] whitequark 28afc4b - applet.selftest: test synthesis.
<RaYmAn> well, that's interesting. pins-loop test passes.
<whitequark> daveshah: so... yosys... pattern matches $alu to SB_CARRY plus SB_LUT4?
<daveshah> Yes
<whitequark> this is more boneless than i expected
<whitequark> and it looks like to get a good ALU I'd have to lay it out manually?
<whitequark> I assume abc doesn't understand SB_CARRY
<whitequark> or does it?
<daveshah> It can only do LUTs
<whitequark> SB_CARRY is a kind of LUT you can't modify or reconnect. I guess.
<daveshah> I think ABC does kinda sorta support these sort of things
<daveshah> I was talking to Eddie about extracting muxes (as in Xilinx and ECP5 logic cells) with ABC
<daveshah> It seems like it might be doable, but just not documented
<daveshah> The ice40 hardware also limits things - a subtractor requires O(2n) logic cells iirc
<whitequark> daveshah: hm
<whitequark> does it really?
<whitequark> I mean I've seen this stated
<daveshah> This is what the vendor tools infer too
<whitequark> hm I see
<whitequark> interesting
ZipCPU|Laptop has quit [Ping timeout: 246 seconds]
<tnt> whitequark: yeah, the carry logic is fixed, so you need to 'pre invert' the 'B' side.
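tnt's point about the fixed carry logic can be checked with a bit-level Python model. It assumes the SB_CARRY function CO = I0&I1 | (I0|I1)&CI and computes a - b as a + ~b + 1. Because the carry cell cannot invert its inputs, ~b is produced by a separate "inverter LUT" per bit in this model, which is where the roughly 2n cell count comes from. The function names and the cell counting are illustrative, not how yosys or the vendor tools actually map it.

```python
def sb_carry(i0, i1, ci):
    """Fixed carry function of the ice40 SB_CARRY cell (majority of three)."""
    return (i0 & i1) | ((i0 | i1) & ci)


def subtract_ice40_style(a, b, width):
    """Return (a - b mod 2**width, cells used) via a + ~b + 1 on a fixed carry chain."""
    cells = 0
    inv_b = []
    for i in range(width):
        inv_b.append(1 - ((b >> i) & 1))  # one "inverter LUT" per bit
        cells += 1
    carry = 1  # the "+1" of two's complement enters as carry-in
    result = 0
    for i in range(width):
        ai = (a >> i) & 1
        result |= (ai ^ inv_b[i] ^ carry) << i  # one "sum LUT" per bit
        cells += 1
        carry = sb_carry(ai, inv_b[i], carry)
    return result, cells
```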
<tnt> I never tried Radiant because you can't even instantiate the SB_xxx primitives ... they changed all that stuff, which means most of my designs don't work out of the box :/
<whitequark> tnt: riiiight.
<_whitenotifier> [whitequark/Glasgow] marcan pushed 1 commit to master [+0/-0/±1] https://git.io/fpPlg
<_whitenotifier> [whitequark/Glasgow] marcan ecd4f6a - revC: another silkscreen tweak near USB connector
<_whitenotifier> [whitequark/Glasgow] marcan pushed 1 commit to master [+0/-0/±1] https://git.io/fpP8w
<_whitenotifier> [whitequark/Glasgow] marcan 8ae10d2 - revC: line up some traces better
GuzTech has quit [Quit: Leaving]
<whitequark> daveshah: so, I'm thinking
<whitequark> daveshah: self.comb += self.o.eq(Mux(self.c, self.a, self.a - self.b))
<whitequark> this should take the same amount of LUTs as the width
<daveshah> Yes
<whitequark> synth_ice40 infers three times as many.
<daveshah> Also not surprising
<whitequark> Number of cells: 31
<whitequark> SB_CARRY 7
<whitequark> SB_LUT4 24
<whitequark> daveshah: so I'm thinking.
<whitequark> can I teach yosys to fold muxes into ALUs?
<daveshah> I see no reason why not
<daveshah> But I have never touched that code either
<whitequark> ok sure
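whitequark's one-LUT-per-bit estimate comes from counting inputs. Apart from the carry chain, each bit of `Mux(self.c, self.a, self.a - self.b)` depends only on `c`, `a[i]`, the inverted `b[i]`, and the incoming carry. That is four signals, so it fits a single 4-input LUT. Below is a minimal Python model of that structure. It deliberately ignores how the inverted B operand reaches the fixed carry logic, and `mux_sub_per_bit` is a hypothetical name.

```python
def mux_sub_per_bit(c: int, a: int, b: int, width: int) -> int:
    """Model o = Mux(c, a, a - b) with one 4-input function per output bit.

    The carry chain never depends on c; only the per-bit output
    function looks at the select signal.
    """
    carry = 1  # carry-in of 1 for a + ~b + 1
    out = 0
    for i in range(width):
        a_i = (a >> i) & 1
        nb_i = ((b >> i) & 1) ^ 1  # inverted B bit
        # The single merged LUT: inputs (c, a_i, nb_i, carry).
        o_i = a_i if c else (a_i ^ nb_i ^ carry)
        out |= o_i << i
        # Majority function, as computed by the carry chain.
        carry = (a_i & nb_i) | (carry & (a_i | nb_i))
    return out
```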
<q3k> mwk: are other old spartans just as cursed when it comes to clock output as S6?
<mwk> ... clock output?
<q3k> mwk: in the S6 you basically have to hack around an ODDR2 if you want to output a clock buffer to a pin
<q3k> mwk: there is no dedicated global-clock-buffer-to-io structure from what I can tell
<mwk> as in, route BUFGMUX out?
<q3k> yes
<mwk> yeah, no such thing
<q3k> mwk: so you just do ODDR2(.CLK(clk),.CCLK(~clk),.O1(1),.O2(0))
<q3k> absolute garbage
<q3k> i mean, at least XST could insert those fuckers
<mwk> well
<q3k> instead the warning that comes up is 'lol demote your clock to not be globally routerd'
<q3k> *routed
<RaYmAn> whitequark: hm, is glasgow revB actually supposed to be able to go > 3.3v VIO?
<mwk> yeah
<RaYmAn> whitequark: also, your selftest is amazing :D
<mwk> you can also just fabric-route it from BUFGMUX to io pin
<mwk> but that's... not exactly pretty
<whitequark> RaYmAn: yes, definitely
<RaYmAn> mine seems to fail at > 3.3v VIO
<RaYmAn> well, that gets me closer to figuring out the issue at least :)
<mwk> q3k: do any Xilinx FPGAs even have dedicated clock output pins?
<q3k> mwk: i have no idea
<whitequark> RaYmAn: fascinating, I have never seen this fail
<mwk> AFAICT that was never a thing
<RaYmAn> whitequark: bizarrely, it seems to be _status() returning ST_ERROR when doing the write_voltage with value over 3.3. Probably something with my board though :(
<whitequark> RaYmAn: hm
<whitequark> what are you doing exactly? can you post a log with `glasgow -vv` ?
<RaYmAn> hrm, looks a bit odd https://pastebin.com/ZytaXRvW - pretty sure the serial is not that.
<whitequark> oh yeah it didn't boot
<whitequark> can you flash it with firmware?
<whitequark> `glasgow flash`
<whitequark> actually, hmm
<whitequark> what even is happening?
carl0s has joined ##openfpga
<RaYmAn> yeaah..I'm certain this used to work, not sure what happened
<RaYmAn> it seems to be totally broken now. Boots up as ez-usb dev kit id, running glasgow factory puts it into glasgow vid/pid, but flash fails.
<kbeckmann> RaYmAn: I think there's an issue with your board.. works for me on both ports (except at 5V where it senses 4.48).
<RaYmAn> well, yeah. there seems to be a lot of issues now :D it's totally broken
<RaYmAn> I think the eeprom might be broken. or at least the contents
<kbeckmann> sounds like an opportunity to upgrade to RevC :D
<RaYmAn> I tried doing voltage on an in-memory loaded glasgow firmware, and now voltage seems to work
<RaYmAn> but it won't let me flash it again
carl0s has quit [Quit: Page closed]
rohitksingh has quit [Remote host closed the connection]
catdemon has quit [Quit: WeeChat 2.2]
catplant has joined ##openfpga
<sorear> Who are Eddie, Edmund, Alan, and John?
<daveshah> Eddie -> Eddie Hung
<daveshah> Edmund -> Edmund Humenberger, CEO of Symbiotic
<daveshah> Alan -> Alan Mishchenko, author of ABC
<daveshah> John -> John Regehr
<daveshah> Yes, he's now working on nextpnr
Hamilton has joined ##openfpga
<whitequark> any comment?
<whitequark> RaYmAn: hmmmm, very weird
<whitequark> what happens during flashing?
<daveshah> whitequark: I think a generic solution after mapping in ice40_opt might be another option
<daveshah> Basically trying to merge another LUT into the carry LUT
<daveshah> Any LUT2 on the output or any LUT3 on the output sharing a LUT input can be merged in
<whitequark> hmmmm
<whitequark> what about constants?
<whitequark> oh, nvm
<whitequark> figured it out
<daveshah> That is the LUT2 case
<whitequark> good point
<daveshah> Yes, just checked and that's exactly how it ends up
<whitequark> ok this definitely seems nicer *and* more generic
<daveshah> hmm, I remember I wrote a somewhat embarrassing techmapper-type-thing about 3 years ago and had a function that did something like this
<whitequark> actually, is this really ice40 specific?
<whitequark> well, I guess it sort of is
<RaYmAn> whitequark: with -vv, it seems to claim to succeed flashing, but fail during verification
<whitequark> RaYmAn: ok yeah your board's fucked alright
<whitequark> check power
<daveshah> whitequark: it could well be generic, you'd just have to tell it what your LUT primitive was
<whitequark> with a multimeter
<whitequark> daveshah: hmmm
<whitequark> I take it this could be the beginning of a Yosys techmapper... right?
<whitequark> then I can write it, I think.
<daveshah> Yes
<whitequark> sweet.
<RaYmAn> I checked 3.3v on the eeproms and some other power points. looked good. I'll try flashing the eeprom externally with another Glasgow on Thursday
<daveshah> whitequark: Only possible issue is making sure you preserve the "carry" LUT
<tnt> daveshah: LUT4 that share both inputs should also be mergeable right ?
<daveshah> Yes, that case works too
<whitequark> daveshah: so I'd just need to merge into the larger one.
<whitequark> which seems like a good idea in general.
<daveshah> whitequark: tnt's case breaks that (and you have two LUT3s in the a/a+b mux case too)
<daveshah> Setting an attribute on the LUT during arith_map might be an option
<daveshah> to keep the optimisation pass generic
<whitequark> mm, okay
<daveshah> The (not Yosys) LUT merge code I wrote a while ago is intended to merge a LUT connected to the input of the "anchor" LUT, not the output as wanted here
<daveshah> but just in case it is somehow useful
<RaYmAn> whitequark: is there any way to get glasgow to totally wipe the eeproms? If I try to run factory programming while it's in the ezusb dev mode, it says it's already programmed.
<daveshah> Heh, might as well put the whole project on GitHub
<daveshah> This was written in a couple of weeks when I was still pretty inexperienced with this stuff (and used VHDL :P) but is probably as simple as techmapping gets
<whitequark> RaYmAn: uh, not exactly
<whitequark> RaYmAn: there is now, pass --force
<_whitenotifier> [whitequark/Glasgow] whitequark pushed 1 commit to master [+0/-0/±1] https://git.io/fpPD9
<_whitenotifier> [whitequark/Glasgow] whitequark a601d51 - cli: implement `factory --force`.
<TD-Linux> oh naisu
<whitequark> daveshah: ok, current plan: add a pass to rip up ice40 luts to $lut
<whitequark> oh
<daveshah> whitequark: I think there is one already
<whitequark> oh this already exists
<whitequark> daveshah: not exactly https://github.com/YosysHQ/yosys/pull/716
<whitequark> it exists *now* though
<whitequark> existing one would vomit muxes and gates all over the design
<tnt> daveshah: I wish your opt pass stuff had an 'effort level'. Because I always get better timing when asking for unrealistically high frequencies :p
<zkms> wait is that in nextpnr?
<tnt> yes.
<daveshah> tnt: interesting
<zkms> nice
<daveshah> It doesn't really work harder if it fails timing
<daveshah> It will just change some of the numbers around a bit
<tnt> Like I ask for 58 Mhz ... it fails at 56 MHz. I ask for 100 MHz, it will give me a 59 MHz one :p
<daveshah> How many seeds are you running it across?
<daveshah> That is still within randomness bounds.
<daveshah> I have tended to see slightly worse results with vastly high frequency targets
<whitequark> daveshah: I've also observed that
<whitequark> but it could be randomness
<daveshah> The timing heuristic in the current placer isn't great
<daveshah> I plan to use the criticality stuff developed in the opt pass to improve the main placer too
<tnt> MM, let me run that on 10 seeds.
catplant has quit [Ping timeout: 252 seconds]
<tnt> Mmm, yeah, you might be right that this was just randomness on both the designs I tested with. Trying more seeds doesn't seem to show much difference.
<daveshah> It would be great to have this built in to nextpnr
<daveshah> Just running 8 threads and picking the best should give a 10% improvement in QoR
<daveshah> Although defining best becomes harder once you have multiple clock domains...
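daveshah's suggestion is to run several seeds in parallel and keep the best result. The sketch below rests on some assumptions. `run` is a hypothetical callable that performs one place-and-route with the given seed (for example by wrapping a nextpnr invocation) and returns the achieved Fmax in MHz. "Best" simply means highest Fmax, which, as noted, stops being well defined once there are several clock domains. Threads are enough here because each run would spend its time in an external process.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Tuple

def best_of_seeds(run: Callable[[int], float], seeds: Iterable[int],
                  workers: int = 8) -> Tuple[int, float]:
    """Call run(seed) for every seed in parallel; return (best_seed, best_fmax)."""
    seeds = list(seeds)  # materialize so seeds can be paired with results
    with ThreadPoolExecutor(max_workers=workers) as pool:
        fmax = list(pool.map(run, seeds))  # results come back in seed order
    best = max(range(len(seeds)), key=lambda i: fmax[i])
    return seeds[best], fmax[best]
```

With a stand-in `run` such as `lambda s: {1: 56.0, 2: 59.0, 3: 57.5}[s]`, `best_of_seeds(run, [1, 2, 3])` returns `(2, 59.0)`.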
<whitequark> daveshah: ok, so, thinking out loud
<whitequark> for any pair of n-LUT and k-LUT sharing m inputs, they can be re-encoded into a (n+k-m)-LUT
<whitequark> right?
<whitequark> *where n-LUT feeds k-LUT.
<tnt> looks right
<daveshah> I think there should be a -1 there too?
<whitequark> right
<tnt> ah yeah, one input of k-lut is the n-LUT out ...
<daveshah> Yeap
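The input-count rule just worked out fits in a couple of lines. It includes daveshah's -1 for the k-LUT input that is driven by the n-LUT and disappears after merging. The Python sketch below uses hypothetical helper names and checks the rule against daveshah's earlier statement: a LUT2, or a LUT3 sharing one input, on the output of a 3-input ALU LUT can be merged into a 4-LUT.

```python
def merged_lut_size(n: int, k: int, m: int) -> int:
    """Inputs of the LUT obtained by merging an n-LUT into the k-LUT it feeds.

    m is the number of inputs the two LUTs share. The k-LUT input driven
    by the n-LUT becomes internal after merging, hence the -1.
    """
    return n + k - m - 1

def can_merge(n: int, k: int, m: int, lut_size: int = 4) -> bool:
    return merged_lut_size(n, k, m) <= lut_size

# 3-input ALU LUT feeding a LUT2 (e.g. a mux with a constant): 3 + 2 - 0 - 1 = 4
assert can_merge(3, 2, 0)
# 3-input ALU LUT feeding a LUT3 that shares one input: 3 + 3 - 1 - 1 = 4
assert can_merge(3, 3, 1)
# ...but a LUT3 sharing nothing would need 5 inputs
assert not can_merge(3, 3, 0)
```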
<sorear> maybe I can motivate myself to try to do a better logic optimizer at some point
<whitequark> i wish rtlil was... documented
<whitequark> at all
<whitequark> how the hell do i find out what a port is connected to?..
<tnt> whitequark: I ended up doing a SigMap sigmap(mod); to create a map of all connections.
<tnt> Then sigmap(cell->getPort("\\EN")) == sigmap(cell_other->getPort("\\EN")) to test if ports are connected.
<whitequark> hm
<daveshah> This is somewhat briefly documented in CodingReadme
<whitequark> tnt: what does that do exactly?
<tnt> Reading https://github.com/YosysHQ/yosys/blob/master/CodingReadme and it's mention of SigMap that seemed to be the thing to do.
<tnt> whitequark: apparently nets can be connected and have plenty of names ... this maps all the names to some unique id to be able to track what's connected to what.
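tnt describes mapping every alias of a net to one canonical representative so that two ports can be compared directly. That is essentially a union-find structure. Here is a toy Python version over plain string names. It only illustrates the concept and is not Yosys's C++ `SigMap` API; `NetMap`, its methods, and the signal names are hypothetical.

```python
from typing import Dict

class NetMap:
    """Toy union-find mapping every alias of a net to one representative."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, name: str) -> str:
        self.parent.setdefault(name, name)
        root = name
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point everything on the path straight at the root.
        while self.parent[name] != root:
            nxt = self.parent[name]
            self.parent[name] = root
            name = nxt
        return root

    def connect(self, x: str, y: str) -> None:
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[rx] = ry

    def same_net(self, x: str, y: str) -> bool:
        return self.find(x) == self.find(y)


nets = NetMap()
nets.connect("ff1.EN", "wire_en")   # hypothetical names
nets.connect("wire_en", "ff2.EN")
print(nets.same_net("ff1.EN", "ff2.EN"))  # True
```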
<daveshah> Yes, unfortunately netlists in Yosys are (and probably have to be) much more complicated than in nextpnr
<daveshah> This is also because Yosys supports coarse grain synthesis and thus word wide ports and nets
<tnt> daveshah: yeah, I can understand that. What surprised me a bit was that yosys wasn't keeping such a "map" up-to-date permanently and you have to build it each time.
<daveshah> Yeah, I'm not sure for the reason behind that either
<sorear> the polarfire soc thing is A Development, no idea what the GA timeline is
<daveshah> Guess a year or so?
<daveshah> That's the usual timescale for FPGAs
<daveshah> Xilinx it's often near enough a year to dev kits, two years to actually regularly stocking at Digikey
<sorear> starting from "email us for limited availability dev kit"?
<daveshah> More like if you're important enough for a dev kit your FAE will drive round with one...
<daveshah> Incidentally, I'm curious if Lattice will release FPGAs with hard RISC-Vs at some point
<daveshah> I think they joined a while ago
Hamilton2 has joined ##openfpga
catplant has joined ##openfpga
Hamilton2 has quit [Client Quit]
Hamilton2 has joined ##openfpga
Hamilton2 has quit [Remote host closed the connection]
Hamilton has quit [Ping timeout: 252 seconds]
Hamilton has joined ##openfpga
catplant has quit [Ping timeout: 250 seconds]
Hamilton has quit [Quit: Leaving]
pie___ has quit [Remote host closed the connection]
pie___ has joined ##openfpga
<whitequark> Found top.$auto$alumacc.cc:474:replace_alu$6.slice[0].adder (cell A) feeding top.$abc$102$auto$blifparse.cc:492:parse_blif$103 (cell B).
<whitequark> Cell A is a 2-LUT. Cell B is a 2-LUT. Cells share 0 inputs and can be merged into one 3-LUT.
<whitequark> Found top.$auto$alumacc.cc:474:replace_alu$9.slice[2].adder (cell A) feeding top.$abc$102$auto$blifparse.cc:492:parse_blif$108 (cell B).
<whitequark> Cell A is a 3-LUT. Cell B is a 3-LUT. Cells share 1 inputs and can be merged into one 4-LUT.
<whitequark> daveshah: ^
<daveshah> oooh
<daveshah> Nice
<tnt> Looking fwd to trying that :)
<zkms> nice
<tnt> So much improvements on the ice40 chain lately, nice to see !
catplant has joined ##openfpga
<whitequark> daveshah: doesn't merge yet
<whitequark> working on that
<whitequark> RTLIL is kind of a pain, though I am warming up to it
catplant has quit [Ping timeout: 250 seconds]
<whitequark> well... it matches my experience from clifford's awesome ehsm talk exactly
<whitequark> it is a good IR, and a pain
<whitequark> like most IRs
<daveshah> Overall I think Yosys as a framework is fine
<whitequark> yeah
<daveshah> It's just the passes need a lot of work
<whitequark> it's not perfect, but it has an immense amount of good stuff in it
<daveshah> I dare say new fpga re has exposed weaknesses
<daveshah> Lack of true dp ram inference and dsp inference
<whitequark> I wouldn't make Yosys as it is currently done, but I can definitely work with it
<whitequark> reminds me of LLVM to an extent
<daveshah> I did speak to Eddie (who has worked with abc a lot) about replacing abc
<daveshah> He suggested that using it better would be a better approach
<daveshah> Apparently even Vivado uses it...
<whitequark> what.
<whitequark> Vivado?!
<daveshah> Verified too
<daveshah> with strings
<daveshah> That certainly suggests we could be using ABC a lot better than we do at the moment
<daveshah> I think the net naming is definitely fixable
<whitequark> I'll keep working on getting non-fucked logic optimization support in yosys though
Flea86 has joined ##openfpga
<daveshah> Definitely
<whitequark> daveshah: think someone will optimize the lut pass?
<whitequark> I'm coding it in uhhhhhh pythonic way
<whitequark> lots of dicts
<whitequark> temporary anyway
<whitequark> someone who isn't me can make bit fuckery out of that but I *loathe* this
<whitequark> I just want maps
<daveshah> Yeah don't worry
<daveshah> I believe dicts are Clifford's magic fast implementation anyway
<daveshah> Chances are for ice40 it will be more than fast enough
<whitequark> I mean my evaluate_lut function takes a dict from SigBit to int
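whitequark describes an `evaluate_lut` that takes a dict from signal to value. A Python sketch of that might look like the following. Signals are plain strings standing in for Yosys `SigBit`s. Input k contributes bit k of the table index, which is assumed here to be the ordering SB_LUT4 uses for `LUT_INIT`.

```python
from typing import Dict, Sequence

def evaluate_lut(init: int, inputs: Sequence[str], values: Dict[str, int]) -> int:
    """Return the LUT output for the given input values.

    inputs[k] names the signal on LUT input k; input k contributes bit k
    of the index into the truth table `init`.
    """
    index = 0
    for k, sig in enumerate(inputs):
        index |= (values[sig] & 1) << k
    return (init >> index) & 1

# A 2-input AND (table 0b1000) and XOR (table 0b0110):
print(evaluate_lut(0b1000, ["x", "y"], {"x": 1, "y": 1}))  # 1
print(evaluate_lut(0b0110, ["x", "y"], {"x": 1, "y": 1}))  # 0
```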
<whitequark> tnt: btw this isn't just for ice40
<whitequark> ecp5 should be able to do this with minor changes
<daveshah> It won't because ecp5 carries are fucky
<daveshah> You have a special primitive that combines 2 bit carry and 2 LUTs
<whitequark> oh.
<daveshah> And an XOR between the LUT and the sum output too
<tnt> whitequark: yeah, I know, I was just speaking in general :P
<whitequark> daveshah: but it could still help if there are no carries?
<tnt> I know some of the improvements that happened were more generic as well.
<daveshah> whitequark: that is probably only a transient issue while we sort out how we call abc
<daveshah> I think optimising around carry is the only thing that can't be fixed easily with the current state of abc
<whitequark> hrm, okay
<daveshah> There are some other passes that might be interesting for ECP5
<daveshah> Like trying to make better use of the MUX2s in the logic cell
<daveshah> Right now we only use them to build large LUTs (by mapping up to LUT7s in ABC with increasing cost)
<daveshah> But some structures like large muxes probably have a better implementation
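daveshah describes building larger LUTs from smaller ones plus the logic cell's MUX2. This is Shannon expansion on the extra input: a LUT5 is two LUT4s holding the two halves of its table, selected by the fifth input. On ECP5 that mux is, as far as we understand, the slice's PFUMX. A Python sketch with hypothetical helper names:

```python
from typing import Sequence, Tuple

def split_lut5(init5: int) -> Tuple[int, int]:
    """Shannon-expand a 32-entry LUT5 table on input 4.

    Returns (table for input4 == 0, table for input4 == 1), 16 entries each.
    """
    return init5 & 0xFFFF, (init5 >> 16) & 0xFFFF

def lut5_via_mux(init5: int, bits: Sequence[int]) -> int:
    """Evaluate a LUT5 as MUX2(LUT4_lo, LUT4_hi) selected by bits[4]."""
    lo, hi = split_lut5(init5)
    index = sum(bit << k for k, bit in enumerate(bits[:4]))
    out_lo = (lo >> index) & 1  # first LUT4
    out_hi = (hi >> index) & 1  # second LUT4
    return out_hi if bits[4] else out_lo  # the MUX2
```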
grantsmith has quit [Quit: ZNC - http://znc.in]
<zkms> i dont know what exactly is going on but anything that makes yosys/nextpnr give better results makes me happy!
grantsmith has joined ##openfpga
grantsmith has quit [Changing host]
grantsmith has joined ##openfpga
<mithro> whitequark: In your comments on https://github.com/YosysHQ/yosys/issues/715 -- Your suggestion was that input I0 could be a MUX select for overriding the output of the ALU with one of the inputs?
<whitequark> yes
<whitequark> or a constant
<mithro> whitequark: Okay, great I understood what you were asking for then :-)
<mithro> whitequark: What was the conclusion you came to with daveshah?
<daveshah> That a generic LUT merge pass was more useful than something mux specific
<daveshah> This is really just merging a LUT3 (mux with input) or LUT2 (mux with const) into a LUT3 (alu out)
<daveshah> *alu lut
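mithro's reading of issue 715 has LUT input I0 act as a select that overrides the ALU result with an input or a constant. That can be turned into a concrete 16-entry LUT table. In this Python sketch the pin assignment is an assumption: I0 = select, I1 = a[i], I2 = the already-inverted b[i], I3 = carry-in. With I0 low, the LUT computes the subtractor's sum bit. `merged_alu_init` is a hypothetical helper.

```python
from typing import Union

def merged_alu_init(override: Union[str, int]) -> int:
    """16-entry table for a LUT4 computing I0 ? override : (I1 ^ I2 ^ I3).

    override is "I1" to pass a[i] through (the LUT3 "mux with input" case)
    or the constant 0 or 1 (the LUT2 "mux with const" case).
    Table index = I0 + 2*I1 + 4*I2 + 8*I3.
    """
    if override not in ("I1", 0, 1):
        raise ValueError("override must be 'I1', 0 or 1")
    init = 0
    for idx in range(16):
        i0, i1, i2, i3 = ((idx >> k) & 1 for k in range(4))
        if i0:
            out = i1 if override == "I1" else override
        else:
            out = i1 ^ i2 ^ i3  # sum bit of a + ~b + carry
        init |= out << idx
    return init

print(f"{merged_alu_init('I1'):04x}")  # c99c
```

With I0 low this agrees with a 4-input XOR table (0x6996 = 0110100110010110), since `0x6996 & 0x5555 == merged_alu_init('I1') & 0x5555 == 0x4114`.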
<mithro> daveshah: It is unclear to me if this should be done at techmap or pnr time?
<whitequark> techmap, definitely
<whitequark> PNR might do lut *re*coding but for other reasons
<whitequark> such as avoiding the need for a VCC net
<mithro> whitequark: I definitely think it needs to be done at techmap with nextpnr but I think with vpr it could be pretty easily done at the clustering/packing stage?
<whitequark> huh?
<whitequark> but this means less accurate synthesis results
<whitequark> and this also means that other backends will not benefit
<whitequark> like e.g. greenpak...
<daveshah> Can VPR really merge two LUTs?
<daveshah> Into a single LUT4
<mithro> I'm very open to the idea that synthesis is the right place
<daveshah> We already do too much at PnR, imo
<mithro> daveshah: I know it can do /some/ transforms like that - but I can also see how I could write an arch model which would enable vpr to do that
<daveshah> I'm of the opinion that IO buffer insertion should be synthesis
<whitequark> yes.
<whitequark> definitely.
<daveshah> mithro: that sounds very fragile and design specific
<daveshah> Compared to a generic LUT merge optimisation at synth time
<daveshah> There's no reason you couldn't do such a thing in nextpnr either
<daveshah> It's just much nicer and more generic in Yosys
<daveshah> Ironically, it would probably be easier to implement in nextpnr than Yosys
<daveshah> Because nextpnr has a much simpler netlist representation due to not needing to support coarse grain or hierarchical netlists unlike Yosys
<mithro> daveshah: I don't think I have enough information to make such a definite claim
<mithro> daveshah: I do think I agree that iobuf insertion should probably be a synthesis
<daveshah> This should be doable already with iopadmap
<daveshah> Although I think that might have some tristate bugs
<tnt> \o/ clifford merged the min_ce_use stuff :)
<whitequark> daveshah: yes
<whitequark> I hit them
<whitequark> and I had to work around them in arachne, iirc
<daveshah> Although tristate is somewhat horrible overall I think we're still behind the vendor tools
<whitequark> yes
<whitequark> quite a bit behind
<whitequark> though, not all that behind the synthesizable verilog spec
<whitequark> synopsys goes above and beyond what I would consider sane
<whitequark> propagating tristate through hierarchy is nice though...
<daveshah> Yes, definitely
<whitequark> oh right
<daveshah> At least tristate buffers inside fpga died a death
<whitequark> *that* was what prevented me from using -noflatten
<whitequark> wait what?
<whitequark> this was a thing?
<daveshah> Virtex II was the last part to have them I think
<whitequark> huh.
<daveshah> Lattice ORCA parts definitely had them
<tnt> ah yeah, good old days :)
<whitequark> what is the reason they are killed?
<whitequark> not that I think they're a good idea, just wondering what was the final nail
<daveshah> Timing I think
<whitequark> ah
<daveshah> Things moved to internally unidirectional buffers
<daveshah> These have the big benefit of redriving the signal and work against capacitance
<whitequark> oh I see
<whitequark> it had bidirectional buffers
<whitequark> yeah... good riddance
<daveshah> And tristate buffers
<whitequark> yeah
<daveshah> I guess the possibility of internal shorts wasn't great either
<whitequark> Found top._27_ (cell A) feeding top._15_ (cell B).
<whitequark> Cell A is a 3-LUT. Cell B is a 3-LUT. Cells share 1 input(s) and can be merged into one 4-LUT.
<whitequark> Combining LUTs into cell A.
<whitequark> Connecting input 0 as top.\c
<whitequark> Leaving input 1 as top.\a [1].
<whitequark> Leaving input 2 as top.\_4_.
<whitequark> Leaving input 3 as top.\_10_ [1].
<whitequark> that seems to do exactly what I want
<daveshah> Looks good!
<whitequark> now, actually reencoding them...
<tnt> You need to check that the Cell B is not a carry chain as well.
<whitequark> that'll be an attribute
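The input-assignment step in the output above combines the inputs of the two cells into a single list. A Python sketch of that step follows, with hypothetical names. It does not model daveshah's constraint that a carry LUT's pins are tied to the carry chain, nor tnt's check that cell B is not itself part of a carry chain.

```python
from typing import List, Optional, Sequence

def merge_inputs(a_inputs: Sequence[str], a_output: str,
                 b_inputs: Sequence[str],
                 lut_size: int = 4) -> Optional[List[str]]:
    """Input list for one LUT replacing LUT A that feeds LUT B.

    A's output becomes internal, so it is dropped from B's inputs; signals
    used by both LUTs appear once. Returns None if the result needs more
    than lut_size inputs.
    """
    merged = list(a_inputs)
    for sig in b_inputs:
        if sig != a_output and sig not in merged:
            merged.append(sig)
    return merged if len(merged) <= lut_size else None

# ALU LUT (a, nb, ci) -> s feeding a mux LUT (c, a, s): 3 + 3 - 1 - 1 = 4 inputs
print(merge_inputs(["a", "nb", "ci"], "s", ["c", "a", "s"]))  # ['a', 'nb', 'ci', 'c']
```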
<q3k> daveshah: hm, replaying an ecppack svf on the versa-5g using openocd doesn't seem to trigger a reset after it's done
<q3k> daveshah: ie i have to manually press the FPGA GSRN button to get my design to run
<daveshah> This is because initial values are broken in ecp5 synthesis at the moment
<daveshah> I'm working on a fix, but it's a bit ugly
<q3k> i'm actually not using trellis/nextpnr :P
<q3k> i mean, i want to program a diamond .bit using openocd
<q3k> so i am doing uh ecpunpack -> ecppack -svf
<daveshah> Does a Diamond svf trigger a reset?
<q3k> not sure about an svf, but pgrcmd certainly does
<q3k> i'll generate an svf from diamond and see
<daveshah> I followed the template that Diamond generates
<q3k> hmhm
<q3k> nope, the lattice deployment tool generated svf has the same issue
* whitequark sighs
<whitequark> ok whatever, you get your optimized evaluate_lut
<whitequark> easier to write that way anyway
<daveshah> q3k: guess it is a case of sniffing the pgrcmd jtag then
<whitequark> i... think?
<q3k> daveshah: although hm
<q3k> daveshah: it might not be due to an svf difference
<q3k> daveshah: but maybe the difference between touching the ispclock and bypassing it in the chain?
<q3k> daveshah: can the ispclock reset the ecp5?
<daveshah> The ecp5 has no general external reset pin
<daveshah> Any ispclock reset would be design specific
<daveshah> I'm not even sure if that's hooked up
<q3k> right, but diamond has magic to automatically detect the largest fanout reset net and connect that to GSR
<q3k> and that's what the button on the ecp5 is connected to
<daveshah> No it's not
<daveshah> There is no GSR pin!
<daveshah> There is a GSR primitive internally that can be connected to anything
<q3k> oh no wait.
<daveshah> GSR doesn't even have to be on a pin
<q3k> my design does actually have a normal reset, i forgot this migen target/platform does that.
<q3k> i guess i'll just fix my reset then.
<q3k> still, i was running this earlier and somehow it didn't need a button press.
<q3k> i am slightly confused.
<whitequark> yeah that sounds hella weird
<daveshah> Is there a PLL in there?
pie__ has joined ##openfpga
Maylay has quit [Quit: Pipe Terminated]
pie___ has quit [Remote host closed the connection]
<q3k> daveshah: yes, although its lock output is not used
<daveshah> Could just be issues with glitching during PLL startup
<whitequark> hmmm
<whitequark> Old truth table: 0110100110010110.
<whitequark> New truth table: 1100100110011100.
<whitequark> this looks... dangerously sane.
<whitequark> now need to disconnect the cell
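Re-encoding a merged cell means recomputing its truth table. For every combination of the merged inputs, evaluate LUT A, feed its result into LUT B, and record B's output. The following is a Python sketch with hypothetical names and example tables. A is a 3-input XOR (the sum bit) and B is a mux `c ? a : s`. The merged input order is assumed to be `[c, a, nb, ci]`. With these assumptions the printed table, most significant entry first, is 1100100110011100, the same bit string as the "New truth table" above.

```python
from typing import Dict, Sequence

def lut_eval(init: int, inputs: Sequence[str], values: Dict[str, int]) -> int:
    """Input k of the LUT contributes bit k of the table index."""
    index = sum((values[sig] & 1) << k for k, sig in enumerate(inputs))
    return (init >> index) & 1

def compose(a_init: int, a_inputs: Sequence[str], a_output: str,
            b_init: int, b_inputs: Sequence[str],
            merged_inputs: Sequence[str]) -> int:
    """Truth table of LUT B with LUT A (driving a_output) folded into it."""
    init = 0
    for idx in range(1 << len(merged_inputs)):
        values = {sig: (idx >> k) & 1 for k, sig in enumerate(merged_inputs)}
        values[a_output] = lut_eval(a_init, a_inputs, values)
        init |= lut_eval(b_init, b_inputs, values) << idx
    return init

XOR3 = 0b10010110  # s = a ^ nb ^ ci over inputs [a, nb, ci]
MUX = 0b11011000   # c ? a : s over inputs [c, a, s]
new = compose(XOR3, ["a", "nb", "ci"], "s", MUX, ["c", "a", "s"],
              ["c", "a", "nb", "ci"])
print(format(new, "016b"))  # 1100100110011100
```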
Bike has joined ##openfpga
Maylay has joined ##openfpga
<whitequark> daveshah: Equivalence successfully proven!
<whitequark> actually, wait, it does something weird.
<whitequark> ok, it works now.
<whitequark> still proven.
<whitequark> ok, one bug left...
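The log doesn't say how the equivalence was proven. For a single merged LUT4, brute force over all 16 input combinations is enough. This Python sketch uses hypothetical names. It compares a two-LUT network (sum LUT feeding a mux LUT) with the single table 0xC99C under the pin-order assumption I0 = c, I1 = a, I2 = inverted b, I3 = carry-in.

```python
import itertools
from typing import Callable, Sequence

Bits = Sequence[int]

def equivalent(f: Callable[[Bits], int], g: Callable[[Bits], int],
               n_inputs: int) -> bool:
    """Exhaustively compare two Boolean functions over all 2**n_inputs vectors."""
    return all(f(bits) == g(bits)
               for bits in itertools.product((0, 1), repeat=n_inputs))

def two_lut_network(bits: Bits) -> int:
    c, a, nb, ci = bits
    s = a ^ nb ^ ci         # cell A: the ALU sum LUT
    return a if c else s    # cell B: the mux LUT

def lut4(init: int) -> Callable[[Bits], int]:
    """Single LUT4; input k contributes bit k of the table index."""
    def f(bits: Bits) -> int:
        index = sum(bit << k for k, bit in enumerate(bits))
        return (init >> index) & 1
    return f

print(equivalent(two_lut_network, lut4(0xC99C), 4))  # True
```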
* qu1j0t3 read that as one hug left
* qu1j0t3 squints at font