clifford changed the topic of #yosys to: Yosys Open SYnthesis Suite: http://www.clifford.at/yosys/ -- Channel Logs: https://irclog.whitequark.org/yosys
<ZipCPU> Let me take a read of it. I might have some time to put into it.
* ZipCPU wants to add multiple clock support as well.
<awygle> to nextpnr??
<ZipCPU> Absolutely!
* awygle is more startled by the implication that nextpnr somehow "doesn't support" multiple clocks than the desire to add it
<ZipCPU> Sigh. Yes.
<awygle> why is nextpnr even capable of not supporting multiple clocks?
<ZipCPU> Ahh, you misunderstand
<ZipCPU> You can only give it one clock speed currently.
<ZipCPU> Hence, if you have a design that runs at 200MHz for some parts, and 10MHz for others, the whole design will be optimized as though it were all for the same clock.
<ZipCPU> The 200MHz logic might be so poorly placed that it can only run at 50MHz, when (if you hadn't worried so hard about the 10MHz clock nodes) you might have been able to rearrange the design to meet that criteria
<awygle> ah. so you want to tell it to weight the 10 MHz clock nets 20x less than the 200 MHz clock nets
<ZipCPU> Exactly!
<ZipCPU> That's what I would like to do.
<awygle> that seems like a reasonable goal
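A minimal sketch of the weighting idea discussed above, assuming each net carries an estimated length and the target frequency of its clock domain. The `Net` struct and `weighted_cost` function are hypothetical names for illustration and are not part of nextpnr. Each net is scaled by its domain's frequency relative to the fastest domain, so a 10 MHz net counts 20x less than a 200 MHz net of the same length.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical net record for illustration; not a nextpnr type.
struct Net {
    double wirelength; // estimated length of the net, arbitrary units
    double clock_mhz;  // target frequency of the clock domain driving the net
};

// Sum of net lengths, each scaled by (domain frequency / fastest frequency).
double weighted_cost(const std::vector<Net> &nets)
{
    double fastest = 0.0;
    for (const Net &n : nets)
        fastest = std::max(fastest, n.clock_mhz);
    if (fastest <= 0.0)
        return 0.0; // no clocked nets, nothing to weight

    double cost = 0.0;
    for (const Net &n : nets)
        cost += (n.clock_mhz / fastest) * n.wirelength;
    return cost;
}
```

With one 200 MHz net and one 10 MHz net of equal length 10, the fast net contributes 10 to the cost and the slow net only 0.5, so a placer minimizing this cost would spend most of its effort on the fast domain.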
<awygle> i find it interesting that much of the literature on placement seems to focus on global minimization of net length or timing, when that's really not what you want for fpgas
<awygle> you want to minimize the *longest* path
<awygle> it makes sense for ASICs since you pay for every path there, i wonder if that bleedover is why the literature is the way it is
<ZipCPU> Fascinating!
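A small sketch of the distinction awygle draws between minimizing the total and minimizing the longest path. The function names are hypothetical, and path delays are assumed to be given in nanoseconds. Two candidate placements are compared: one wins on total delay, the other on the worst path, and only the worst path sets the achievable clock rate.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Sum of all path delays: the "global minimization" objective.
double total_delay(const std::vector<double> &delays_ns)
{
    return std::accumulate(delays_ns.begin(), delays_ns.end(), 0.0);
}

// Delay of the slowest path: the quantity that limits the clock.
double worst_delay(const std::vector<double> &delays_ns)
{
    return delays_ns.empty() ? 0.0 : *std::max_element(delays_ns.begin(), delays_ns.end());
}

// Maximum clock frequency in MHz implied by the slowest path.
// Returns 0 as a sentinel when there are no paths.
double fmax_mhz(const std::vector<double> &delays_ns)
{
    double worst = worst_delay(delays_ns);
    return worst > 0.0 ? 1000.0 / worst : 0.0;
}
```

For path delays {1, 1, 10} versus {5, 5, 5}, the first placement has the smaller total (12 against 15) but only reaches 100 MHz, while the second reaches 200 MHz.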
* awygle reluctantly goes back to cleaning the apartment
<sorear> because of low threshold transistors?
<ZipCPU> sorear: Are you asking awygle?
<sorear> Yes, “pay for every path”
<ZipCPU> I understood that to mean because you pay by the area used, and paths consume area, however I'm not an ASIC designer.
<cr1901_modern> >awygle: want to patch yosys' makefile so it stops trying to pass -fPIC on cygwin?
<cr1901_modern> Honestly, not right now :).
<awygle> sorear: ZipCPU: in hindsight that was sloppy language. i *assume* you pay for ASIC tracks somehow (above and beyond just area, although total length is probably a decent proxy for area), but i don't actually know.
<awygle> i would be surprised if transistor threshold was the mechanism, as i imagine that's more-or-less standard for a process, but again, i don't actually _know_ that
<awygle> cr1901_modern: cool :) you're just the only one i know that might know how off the top of their head :)
<awygle> i'll do it myself...eventually :p
<cr1901_modern> Make a new target in the Makefile: https://github.com/YosysHQ/yosys/pull/573/files
<cr1901_modern> (and use the cygwin-provided libpthread)
<awygle> ah yeah i could do that i suppose. i was hoping for magic instead of setting CONFIG manually :p
* awygle is greedy
<sorear> So FPGAs also have a certain total length of tracks available and if you use more than that, routing is impossible
<sorear> But we were talking about timing, not area
<cr1901_modern> why would the limitation be length and not "number of physical tracks left"?
<awygle> yeah it's basically "is this routable" and "what's the longest net (i.e. the net setting the clock rate)"
<awygle> i feel like most placers don't even try to answer "is this routable" until, well, routing
<awygle> which is also a somewhat problematic approach
<awygle> so on question 1 we punt and on question 2 we solve a (potentially quite poor) proxy
<awygle> at least that's the impression i have
digshadow has joined #yosys
emeb_mac has quit [Quit: Leaving.]
<ZipCPU> Wow ... one simple change and the analytic placer is running 10x faster ... cool. Still don't know if it'll produce a worthwhile result (yet).
NB0X-Matt-CA has quit [Excess Flood]
NB0X-Matt-CA has joined #yosys
emeb_mac has joined #yosys
<awygle> is there a flag to pass to icestorm so it installs timings as well?
<awygle> answer - the flag is 'git pull' :p
dxld has quit [Quit: Bye]
dxld has joined #yosys
<awygle> wow making these chipdb c++ files takes _forever_, and apparently happens serially
<awygle> why are we even doing that instead of just accessing the data files?
<sorear> i have the same question
<awygle> especially since it's literally just a huge binary blob?
<awygle> apparently this codebase don't give a _fuck_ about strict aliasing either lol
<awygle> daveshah: why the gigantic string, any idea? or q3k maybe?
<sorear> it's a gigantic string because that uses less memory at compile time than {0,1,2}, that part was explained on twitter
<sorear> not explained on twitter: why not .incbin or just load a file at runtime
<awygle> okay but uh, mmap uses zero memory
<awygle> at compile time
<awygle> lol
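A rough sketch of the "gigantic string" approach sorear describes, assuming a made-up blob of two little-endian 32-bit words. The names `chipdb_blob` and `read_u32_le` are hypothetical and do not reflect the real chipdb layout. The data is written as a string literal rather than a `{0,1,2}` initializer list, and it is decoded byte by byte, which also sidesteps the strict aliasing concern raised above.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for an embedded database: the words 7 and 42, little-endian.
// A string literal is used instead of a brace-enclosed initializer list.
static const char chipdb_blob[] = "\x07\x00\x00\x00\x2a\x00\x00\x00";

// Decode the index-th 32-bit little-endian word from the blob.
// Reading through unsigned char avoids type-punning the buffer.
uint32_t read_u32_le(const char *blob, std::size_t index)
{
    const unsigned char *p = reinterpret_cast<const unsigned char *>(blob) + index * 4;
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}
```

The same `read_u32_le` would work unchanged on a buffer obtained from `mmap` or a file read at runtime, which is the alternative being asked about; only the source of the pointer differs.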
<awygle> ugh. what versions of boost is this supposed to work with?
* cr1901_modern is gonna go ahead and run the example on ZipCPU's slide and see if he's right or wrong. ZipCPU: Can we go by the honor system that I'll tell the truth if I was right or wrong :)? I don't feel like being made fun of tonight.
<cr1901_modern> If not I'll give my answer and I'll prob have missed something obvious
<awygle> this is the fifth time i've run make -j8, my laptop is melting
<awygle> cr1901_modern: don't read the backlog, i spoiled the solution earlier
<awygle> (after being quite wrong twice)
<cr1901_modern> I haven't seen it.
<cr1901_modern> in any case, my guess is: Yes it'll pass if you have induction length of 2 or more (and BMC of 2 or more)
<cr1901_modern> Nope I was wrong, let's see why
<awygle> oh hey i managed to build and run nextpnr, that's good
<awygle> whoops nope, segfault lol
<awygle> hm it's a gui problem, the cmd line works
<awygle> sorta irritating to still have to use icepack... wonder if the gui calls that for me
<cr1901_modern> https://gist.github.com/cr1901/6f6edb6fb02b914eee48ae5b586ad6be Okay, BMC fails on first timestep and returns a trace that I have no idea why it's failing...
<cr1901_modern> k-induction succeeds on the third timestep, and given the CEX for the second timestep, I would expect an induction length of 25 before k-induction passes
<awygle> CEX?
<cr1901_modern> counterexample
<awygle> oh
<awygle> would you like an explanation of the BMC failure?
<cr1901_modern> Probably having to do w/ the semantics of $past?
<awygle> correct, $past is.. I guess "undefined", for negative time
<awygle> kind of a dirty trick imo :-P that's the problem with abbreviated examples, how you complete them matters
<cr1901_modern> If you see the line "dummy <= $past", I was trying to get it to actually show me the value of $past for each timestep, but to no avail
<awygle> hm yeah idk if that's supposed to work or not
<awygle> I made that same mistake, if it can be called that, and I was also wrong about induction (I said it would fail)
<awygle> I was working induction the wrong way around, so I didn't correctly interpret the behavior of $past in that context
<cr1901_modern> Wonder why BMC didn't fail on timestep 0 then. At least w/ timestep 1 I would expect $past to "just use the initial value, if one exists"
<cr1901_modern> And also, doesn't help explain why t=3 works for induction. Hrm...
azonenberg_work has quit [Ping timeout: 272 seconds]
dys has joined #yosys
azonenberg_work has joined #yosys
emeb_mac has quit [Quit: Leaving.]
X-Scale has quit [Ping timeout: 252 seconds]
digshadow has quit [Ping timeout: 272 seconds]
azonenberg_work has quit [Read error: Connection reset by peer]
azonenberg_work has joined #yosys
Alistair has left #yosys ["WeeChat 1.6"]
dys has quit [Ping timeout: 268 seconds]
_whitelogger has joined #yosys
m_t has joined #yosys
<q3k> awygle | wow making these chipdb c++ files takes _forever_, and apparently happens serially
<q3k> awygle | why are we even doing that instead of just accessing the data files?
<q3k> that was, at the time of design, the easiest way to access pre-calculated data using plain C structures
<q3k> awygle: mmap would work as well, but then nextpnr stops being one big binary that can run from anywhere, and instead we have to start managing installation prefixes and whatnot
<q3k> awygle: if you have a gui segfault, please file a bug
<daveshah> clifford was also concerned that mmap creates portability issues and issues on embedded systems
<daveshah> with Python and the UI disabled, nextpnr should have no real dependencies
<q3k> daveshah: not sure what embedded systems we're talking about, but if you mean embedded linux then mmap should work just fine (tm)
<daveshah> no, as in rtos or no-os systems
<q3k> personally i think support for these platforms is a non-goal
<q3k> but eh
<q3k> considering we use exceptions and the heap 'here and there', accessing the data via mmap will be the least of our troubles
<q3k> awygle | wow making these chipdb c++ files takes _forever_, and apparently happens serially
<q3k> what platform are you using? cmake+makefiles here end up in parallel building
<q3k> just make -j $(nproc)
AlexDaniel has joined #yosys
m_t has quit [Quit: Leaving]
X-Scale has joined #yosys
ssb has joined #yosys
emeb_mac has joined #yosys
_whitelogger has joined #yosys
<awygle> q3k: cygwin, which is also a reasonable non goal
<awygle> Does the "one big binary" goal also mean you'd be uninterested if I tried to implement dynamic libraries for placers?
<q3k> you mean different .so's for different placer algorithms?
<awygle> I was hoping to use that for rapid iteration
<awygle> Yep
<q3k> i'm not sure how this makes your iteration faster
<awygle> with hot-reload
<q3k> uhm
<q3k> we were more thinking along the lines of dumping the current state of arch into a file and being able to reload it
<q3k> so that any number of processes can be re-used - then you'd pack, dump; and to iterate you'd just run the new placer algorithm with the previous packed state loaded
<awygle> ? why/when? I don't understand how that helps
<awygle> oh
<q3k> having placers loaded from .so's with hotswap capability would entail an ABI compatibility layer that would have to be designed along the arch/placer API, and that's really not stable yet
<awygle> sure that's also a good thing, but not really what I'm talking about
<awygle> yeah I get that
<q3k> state dumping only needs to dump the arch internal state, that means that api doesn't have to be frozen
<q3k> startup time for the tool isn't long enough imo to warrant hot reloading of plugins to make iteration faster
<awygle> this is just a pattern I typically deploy for complex applications
<q3k> i understand, I just don't personally see the use for it here (yet)
<awygle> Possibly a bit "if you have a hammer"
<q3k> if you'd like to explain your exact reasoning, feel free to file a bug and we'll discuss there further after paging in the rest of the team
<awygle> okay I'll keep that in mind. I need to play with it further anyway to really understand if that's useful
<q3k> generally you should be able to just add a placer2.cc or placer-awygle.cc, and iterate by `make -j8; ./nextpnr-ice40 --json foo.json --pcf foo.pcf --asc foo.asc`
<q3k> that's how the current place/route was developed, too :)
<daveshah> Yeah, link time isn't that bad at all
<daveshah> Also the placer and arch are effectively bound at compile, not link, time at present
<daveshah> This allows the arch to provide its data structures as needed and inlining to be done
<awygle> right, I noticed that. seems an odd design. I guess I get why if you're targeting no-os environments
<q3k> i agree that it is an odd design
<daveshah> No, it's nothing to do with no-os
<daveshah> It is for those two reasons primarily
<daveshah> Virtual functions would add a very non-negligible overhead for some of the most commonly called functions
<daveshah> Also if you don't use a feature you can e.g. constant return zero or true and it has no cost at all
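A sketch of what binding the placer to the arch at compile time can look like. It uses a template parameter as a stand-in for compiling the common code against one arch header per binary; this is an assumption for illustration, and `ToyArch`, `is_location_valid`, `estimate_delay` and `move_cost` are hypothetical names rather than nextpnr's API. Because the concrete type is known to the compiler, the small arch functions can be inlined, and a feature the arch does not use collapses to a constant.

```cpp
#include <cstdint>

// Hypothetical architecture with its own id type.
struct ToyArch {
    struct BelId {
        int32_t index;
    };

    // Feature this arch does not need: a constant the compiler can fold away.
    bool is_location_valid(BelId) const { return true; }

    // Small, frequently called helper that benefits from inlining.
    int estimate_delay(BelId a, BelId b) const
    {
        int d = a.index - b.index;
        return d < 0 ? -d : d;
    }
};

// Placer-side code: Arch is resolved at compile time, no virtual dispatch.
template <typename Arch>
int move_cost(const Arch &arch, typename Arch::BelId from, typename Arch::BelId to)
{
    if (!arch.is_location_valid(to))
        return -1; // dead code for ToyArch once inlined
    return arch.estimate_delay(from, to);
}
```

Calling `move_cost(arch, ToyArch::BelId{3}, ToyArch::BelId{10})` yields 7. The trade-off matches the one stated above: each arch needs its own instantiation, which in the one-binary-per-arch model means its own executable.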
* awygle wants to see a citation for "very non-negligible"...
<awygle> I mean I did zero of this work, I'm not criticizing
<awygle> It's just Different Than I'd Have Done :-P
<daveshah> The SA placer certainly calls some of the small arch functions a lot
<daveshah> It seems like a pretty unnecessary overhead to use virtual funcs
<q3k> anything in try_swap_position is basically the hot path
<sorear> you could also make the chipdb smaller by exploiting regularity of the architecture. Remember, our goal is to remain smaller than vivado even when we have more targets
<daveshah> Yes, that's exactly what we do for ecp5 and will do for 7 series
<daveshah> Just we cba for ice40 for now
<awygle> dynamically loaded functions cost one pointer deref more than static ones. Inlining is probably a win.
<daveshah> Letting the Arch specify its own BelId etc structures also helps what sorear just mentioned
<daveshah> Eg for ecp5 Bel/Wire/PipId structures are a location and an index
<q3k> dynamically loaded functions on linux aren't even a deref
<q3k> it's call -> {jmp to resolver; jump to function} the first time, jump to function directly after that
<awygle> hm, not even a function pointer? cool
<q3k> anyway, it would be nice to have a benchmark of what would happen if arch.h/common.h/etc got turned into 'normal' link-time compatible code
<q3k> just forking, quickly hacking on the arch/packer/placer/router/bitgen code for ice40 and seeing the performance impact would be interesting
<awygle> I was just thinking of no-inline-ing them
<q3k> i had a quick hack where I made all of arch.h be available via virtual handle classes, and that resulted in a 10% slowdown of the placer/router
<daveshah> That doesn't solve the issue of Arches being allowed to return structure/range types of their choosing though
<awygle> But I need coffee and a bagel first
<q3k> daveshah: indeed
<awygle> What does that mean? Is the placer littered with ifdef ice40?
<awygle> And if not how does it know how to access the structs?
<daveshah> It doesn't access the structs
<awygle> Then why does it matter? Return an opaque handle, do whatever the hell you want
<q3k> awygle: no, anything that consumes the arch api uses types that the current arch defined
<q3k> awygle: WireId/BelId/PipId/DecalId/...
<q3k> awygle: what those types are is different across arches
* awygle needs to get out of bed and actually _look at the code_
<q3k> awygle: for instance, on ice40 a WireId is an integer index into the chipdb wire list
<daveshah> You also have ranges and iterators that may have non trivial implementations and differences for different functions in some arches (ecp5) without flat dbs
<q3k> awygle: while on generic (and iirc ecp5?) it's a IdString
<q3k> awygle: both of these implementations have their own copy/move constructors/assignment operators
<daveshah> They only have to implement begin and end (ranges) or ++, * and != (iterators)
<q3k> yep
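A sketch of an arch-defined id type and range of the kind described above, assuming a regular grid where every tile has the same number of bels. `TiledArch`, `bels()` and `count_bels` are hypothetical names, not nextpnr's. The id is just a location plus an index, the range implements only `begin` and `end`, and the iterator implements only `++`, `*` and `!=`, so no per-bel objects are kept in memory.

```cpp
#include <cstdint>

// Hypothetical arch: width x height tiles, bels_per_tile bels in each.
// Assumes all three dimensions are at least 1.
struct TiledArch {
    struct BelId {
        int16_t x, y;  // tile location
        int32_t index; // bel index within the tile type
    };

    struct BelIterator {
        int16_t x, y, width;
        int32_t index, bels_per_tile;

        BelId operator*() const { return {x, y, index}; }

        BelIterator &operator++()
        {
            // Advance within the tile, then across the row, then to the next row.
            if (++index == bels_per_tile) {
                index = 0;
                if (++x == width) {
                    x = 0;
                    ++y;
                }
            }
            return *this;
        }

        bool operator!=(const BelIterator &o) const
        {
            return x != o.x || y != o.y || index != o.index;
        }
    };

    struct BelRange {
        BelIterator b, e;
        BelIterator begin() const { return b; }
        BelIterator end() const { return e; }
    };

    int16_t width, height;
    int32_t bels_per_tile;

    // The end iterator sits at the first position past the last row.
    BelRange bels() const
    {
        return {{0, 0, width, 0, bels_per_tile}, {0, height, width, 0, bels_per_tile}};
    }
};

// Arch-agnostic consumer: only relies on the range-for protocol.
template <typename Arch>
int count_bels(const Arch &arch)
{
    int n = 0;
    for (auto bel : arch.bels()) {
        (void)bel;
        ++n;
    }
    return n;
}
```

For `TiledArch{2, 3, 4}`, `count_bels` visits 24 ids while the arch itself stores only three integers. Making this work across a stable ABI would require the iterator and id types to be hidden behind a common interface, which is the cost being debated here.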
<q3k> these would have to be made virtual classes if this was to be abi-compatible
<awygle> I still don't understand why that's a problem, they - yes, that
<q3k> fairly sure that would be quite a performance impact
<q3k> but we'd have to find out by just testing it
<daveshah> And implementing PipId as a handle is non trivial
<daveshah> Because it would mean you would have to keep each pip in memory for a large device
<daveshah> Whereas for ecp5 it is just location and index
<daveshah> So you only store each location type once
<awygle> I mean I guess there's nothing inherently wrong with choosing speed over usability. Again, just not the choice I would have made.
<daveshah> I am not convinced it massively affects usability - in many ways it makes things easier
<daveshah> The only cost is having to make one binary per architecture
<awygle> I am specifically referring to having multiple binaries for multiple - yes that
<daveshah> But most people only care about one or two arches, and they may also have arch specific link dependencies
<awygle> That is a huge usability loss to a large segment of the population
<awygle> Imo
<daveshah> What is the usability loss?
<awygle> The increase in complexity. It's a non-trivial fraction of the cost of having icecube, diamond, and radiant
<awygle> I get that the UI is at least the same here but still, it comes across as fiddly bits for no good reason, especially if you're not a software dev
<awygle> I suspect this is another mismatched goals thing which, again, is fair. I just want to establish those points of difference so I know where not to bother having the conversation again (as with licensing for example)
<q3k> ... what's your issue with licensing?
<daveshah> awygle is a GPL guy :P
<awygle> q3k: tl;dr I like gpl not bsd/mit/isc
<q3k> ah
<q3k> then yeah, that's just like, your opinion, man
<daveshah> imo changing the suffix of an executable in a makefile is much the same as changing the arch flag passed to an executable
<q3k> ^^
<awygle> because "quartus is now backed by nextpnr" is a loss not a win for me
<daveshah> given you've also got to change device type, package, constraints, etc
<q3k> also creating a single super binary for all arches is definitely doable, but I honestly don't see the point
<q3k> you'd be calling 'nextpnr ice40' vs 'nextpnr_ice40'
<awygle> I argue it's worse but also that neither should be needed. But it's not like a *big* deal.
<awygle> The difference is primarily in gui mode, not cli
<awygle> Imo
* awygle has said his piece, drops it
<daveshah> ad licensing, imo the main exploitation of Yosys has been people using it in projects with loads of research funding without proper support to us/Yosys
<awygle> I am surprised though that the symbiotic people aren't going gpl given the difficulties they seem to be having getting people to pay them
<daveshah> we wouldn't go for gpl
<daveshah> for the above reason, it would be a more targeted copyleft licensing (prohibiting use by DARPA funded projects for example)
<daveshah> it would also revert to permissive after k years
<q3k> that's not gonna happen either
<awygle> Clifford is real mad about the darpa thing huh
<daveshah> not for Yosys etc
<daveshah> but we have discussed this for smaller glue type projects
<q3k> awygle: nobody so far has been able to prove that switching to gpl would suddenly make symbiotic money :P
<cr1901_modern> Is it bad that I'm waiting for Andreas Olofsson to milkshake duck?
<awygle> q3k: because they haven't... Done it? Lol
digshadow has joined #yosys
<q3k> awygle: how does someone not using yosys/nextpnr make money?
<awygle> Eh idk if it would make them money or not to do gpl+commercial but liberal licenses are permission to be eaten
<q3k> awygle: additionally, it's much more difficult to then actually make any revenue off of GPL code, because you can't even sell consulting services for integration with other software
<q3k> awygle: you'd have to go gpl+commercial
<q3k> a.k.a. gpl+cla
<q3k> a.k.a. a dick move towards any external contributors
* cr1901_modern doesn't care about cla, fwiw :P
<awygle> This is such a longer conversation lol. Let's have it over drinks someday
digshadow has left #yosys [#yosys]
digshadow has joined #yosys
m_t has joined #yosys
emeb_mac has quit [Quit: Leaving.]
dxld has quit [Quit: Bye]
dxld has joined #yosys
ZipCPU has quit [Ping timeout: 260 seconds]
ZipCPU has joined #yosys
ZipCPU has quit [Ping timeout: 244 seconds]
smarter has quit [Ping timeout: 260 seconds]
smarter has joined #yosys
dys has joined #yosys
ZipCPU has joined #yosys
qu1j0t3 has quit [Quit: WeeChat 0.4.3]
qu1j0t3 has joined #yosys
X-Scale has quit [Ping timeout: 268 seconds]
X-Scale has joined #yosys
m_t has quit [Quit: Leaving]
[X-Scale] has joined #yosys
X-Scale has quit [Ping timeout: 244 seconds]
[X-Scale] is now known as X-Scale
danieljabailey has quit [Quit: ZNC 1.6.5+deb2build2 - http://znc.in]
danieljabailey has joined #yosys
<awygle> does nextpnr run a single SA schedule or does it do restarts?
<ZipCPU> As I understand it, it 1) places everything randomly, and then 2) randomly picks cells to try to replace. It will both lower and raise the placement "temperature" during this process.
<awygle> lower _and_ raise? okay, that sounds like a restart
<awygle> do you know how it decides to stop?
<awygle> (also it's sunday, didn't expect to see you here)
<ZipCPU> Nah, I'm just bored.
<ZipCPU> You are right, though.
<ZipCPU> The code is all within one file, so it shouldn't be too hard to figure out.
<ZipCPU> I should be at church tonight. Instead, I'm stuck on sick-kid duty. ;)
<awygle> looks like it ends when the temperature is below 1e-3 and there's been no improvement for 5 moves
<awygle> ahh, that's unfortunate :( hope they feel better soon
<ZipCPU> This was one where the first antibiotic regimen wasn't "good enough", so ... after the fever returned, the Dr. gave him another.
<ZipCPU> Should clean up soon enough.
<awygle> er, not 5 moves. 5 counts of 15 attempts to move every single cell.
<awygle> i've been there unfortunately
<awygle> it doesn't look like temperature ever increases, nor does it look like we do SA restarts
<ZipCPU> There 'ya go.
* awygle supposes he should have just read the code in the first place
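A toy sketch of a single simulated annealing schedule with the stopping rule awygle reads out of the code: stop once the temperature is below 1e-3 and there have been 5 consecutive rounds without improvement, where each round makes 15 swap attempts per cell. Everything else is an assumption for illustration: the 1-D line of cells, the two-pin `Net` alias, the starting temperature of 10, the geometric cooling factor of 0.9, and the function names. The temperature only ever decreases and there are no restarts.

```cpp
#include <cmath>
#include <cstdlib>
#include <random>
#include <utility>
#include <vector>

using Net = std::pair<int, int>; // a net connecting two cell indices

// Cost: sum of distances between the two cells of every net.
int total_length(const std::vector<int> &pos, const std::vector<Net> &nets)
{
    int sum = 0;
    for (const Net &n : nets)
        sum += std::abs(pos[n.first] - pos[n.second]);
    return sum;
}

// Anneal cell positions in place; pos must be non-empty.
// Returns the best cost found and leaves pos at that placement.
int anneal(std::vector<int> &pos, const std::vector<Net> &nets, unsigned seed)
{
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick(0, static_cast<int>(pos.size()) - 1);
    std::uniform_real_distribution<double> unit(0.0, 1.0);

    double temp = 10.0; // assumed starting temperature
    int cost = total_length(pos, nets);
    int best = cost;
    std::vector<int> best_pos = pos;
    int stale_rounds = 0;

    while (!(temp < 1e-3 && stale_rounds >= 5)) {
        bool improved = false;
        const int attempts = 15 * static_cast<int>(pos.size());
        for (int i = 0; i < attempts; ++i) {
            int a = pick(rng), b = pick(rng);
            std::swap(pos[a], pos[b]);
            int next = total_length(pos, nets);
            int delta = next - cost;
            // Always accept improvements; accept worse moves with
            // probability exp(-delta / temp).
            if (delta <= 0 || unit(rng) < std::exp(-delta / temp)) {
                cost = next;
                if (cost < best) {
                    best = cost;
                    best_pos = pos;
                    improved = true;
                }
            } else {
                std::swap(pos[a], pos[b]); // reject: undo the swap
            }
        }
        stale_rounds = improved ? 0 : stale_rounds + 1;
        temp *= 0.9; // assumed cooling factor; temperature never rises
    }
    pos = best_pos;
    return best;
}
```

Because the cost is a non-negative integer and the temperature shrinks every round, both halves of the stopping condition are eventually met, so the loop terminates.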
dxld has quit [Quit: Bye]
dxld has joined #yosys