ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
Zorix has quit [Quit: Leaving]
Zorix has joined ##openfpga
mumptai has quit [Quit: Verlassend]
catdemon has joined ##openfpga
catplant has quit [Quit: WeeChat 2.2]
jcreus has quit [Ping timeout: 246 seconds]
catdemon is now known as catplant
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
Bike has joined ##openfpga
Vincenttl has joined ##openfpga
pie__ has quit [Remote host closed the connection]
pie__ has joined ##openfpga
<whitequark> tnt: what would you prefer instead of pins being forced to 0?
<whitequark> tristate?
balrog has quit [Quit: Bye]
jevinskie has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<sorear> probably excessively clever idea: since the ice40 bitstream is documented, we can use gateware to do processing/validation of it, which could be done on tinyfpga/tomu type boards to make them much more difficult to brick - a bootloader could refuse to overwrite itself and refuse to run a bitstream which enables the SPI I/O
balrog has joined ##openfpga
jevinskie has joined ##openfpga
<sorear> obvious problem 1: langsec really does not like it when you run the same bitstream through two parsers, one of which is closed source
<sorear> obvious problem 2: it's unclear how much of a problem "you can write a bitstream that kills the flash" is in practice, and "you can overwrite the bootloader using the bootloader" has a much simpler fix in isolation, which I believe is at least partially implemented
<whitequark> sorear: you can't actually do that
<whitequark> the bitstream is not a bitstream
<whitequark> it's a sequence of packets that address CRAM in blocks
<whitequark> you can't parse it in gateware
<sorear> I am aware that the format has an annoying amount of order flexibility. We only need to handle the packet orderings which are actually generated by nextpnr and icecube, although the general case doesn't seem impossible to me
<sorear> to handle the general case you'd have to reimplement the state machine and check each CRAM block against both address and content
<sorear> to NOT handle the general case you'd look for the expected commands in the expected order
<sorear> (trivially, the chip itself is a digital circuit which turns a bitstream into an initialized config memory, so "you can't parse it in a digital circuit" is absurd; the trivial solution is not feasible because it requires more RAM than the total BRAM, but a nontrivial solution is)
<sorear> (I'm not sure whether you include "synthesized hard logic" in the definition of "gateware")
<whitequark> hrm
<whitequark> fine
<ZipCPU> sorear: So ... you want to validate a stream of data with a configuration that has less information than the data stream, and you want to prevent writing the flash on a bad bitstream ... I don't think the numbers add up for this. In order to know if it's a good or bad bitstream, you'd have to process the entire bitstream before any flash erase command
<ZipCPU> You'd also need a place to store the bitstream you were in the middle of processing
<sorear> ZipCPU: for tinyfpga, you'd write normally but disable SB_WARMBOOT until the bitstream looks OK
<ZipCPU> Unless you have some external memory, I can't imagine that the chip would have enough room to contain within it the copy of its configuration
<ZipCPU> So ... the plan would be to write the bitstream but just hold off the SB_WARMBOOT until you know it is valid? But then removing power when done undoes what you are protecting against
<sorear> the ice40 uses a different SPI flash address for power-on versus SB_WARMBOOT, power-on always goes to the bootloader, not the user-provided design
<ZipCPU> So you aren't discussing writing the primary/bootloader partition at all?
<sorear> correct
<whitequark> you could potentially do this
<whitequark> do two writes
<whitequark> on the first write, check for validity and remember CRC
<whitequark> ah no this doesn't work
<sorear> you do one write, checking validity, to the non-default boot address
<sorear> if it's valid, switch to it
<whitequark> oh hm
<sorear> since it's the non-default address, it won't be used unless explicitly switched to
<ZipCPU> The power shutoff will still get you, since the bootloader defaults to loading the second address after a short period of time
<sorear> that's a gateware function, not a chip function, it can be modified to check validity (using SPI reads) before chainloading
<ZipCPU> Yes, I suppose it could
<swetland> you could conceivably have a tiny initial image which decides based on gpio state and serialno of flash images, which to boot
<swetland> A. miniloader B. bootloader-1 C. bootloader-2 D. image
<swetland> miniloader loads image unless bootloader is requested, prefers "newest" of -1/-2 bootloader unless previous is requested
<swetland> bootloader only ever writes to the opposing bootloader image when updating the bootloader, to avoid clobbering itself (clearly working) with an unknown image
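The A/B bootloader scheme swetland describes above can be sketched as a small decision function. This is only an illustration of the selection rules as stated in the log; the `Slot` type, its `serial` and `valid` fields, and the fallback to a bootloader when the user image is invalid are assumptions made for the example, not part of any existing bootloader.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    name: str
    serial: int   # hypothetical: higher serial number means newer image
    valid: bool   # hypothetical: result of some header/CRC check


def choose_boot_target(image: Slot, boot1: Slot, boot2: Slot,
                       bootloader_requested: bool,
                       previous_requested: bool) -> Optional[str]:
    """Pick which flash image the miniloader should warm-boot into."""
    # The miniloader loads the user image unless a bootloader is requested.
    # Falling back to a bootloader on an invalid image is an assumption.
    if not bootloader_requested and image.valid:
        return image.name
    # Prefer the newest valid bootloader, unless the previous one is requested.
    candidates = sorted((s for s in (boot1, boot2) if s.valid),
                        key=lambda s: s.serial, reverse=True)
    if not candidates:
        return None
    if previous_requested and len(candidates) > 1:
        return candidates[1].name
    return candidates[0].name


def update_target(running: str, boot1: Slot, boot2: Slot) -> str:
    """A running bootloader only ever overwrites the opposing slot."""
    return boot2.name if running == boot1.name else boot1.name
```

Because the running bootloader never writes to its own slot, a power loss during an update leaves at least one known-working bootloader in flash.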
<sorear> you're proposing something even more complicated than what I said and I'm not sure why
<sorear> and there isn't really such a thing as a "tiny image", we don't have compression support in the open tools for any fpga aiui
<swetland> If the goal is to prevent bricking of the bootloader, the most reliable approach is never overwrite a working bootloader with an unknown one (and you can't "know" it works until you actually boot it), and having the first stage be extremely simple and wp-locked (ideally never needing an update post-production). This also covers the loss-of-power-while-writing case.
<swetland> but yeah, if every image must be full-size in flash that gets punitive for larger parts
<sorear> swetland: I may have failed to make clear that in my proposal the bootloader (there is only one) is absolutely never overwritten?
<sorear> I'm talking about ways to prevent the user design from touching the bootloader
<sorear> so I don't think what you're saying is relevant to what I'm saying?
<swetland> ah, sorry, yes I missed that
<sorear> I feel like I'm coming across as aggressive here :/ I'd like to not do that
<swetland> no worries -- I misread the ongoing conversation and thought a different problem was being discussed. it happens
<swetland> probably due to past projects I've been involved in *always* wanting to be able to update bootloaders in the field, no matter how terrifying ^^
unixb0y has quit [Ping timeout: 246 seconds]
unixb0y has joined ##openfpga
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
futarisIRCcloud has joined ##openfpga
Miyu has quit [Ping timeout: 272 seconds]
pie__ has quit [Remote host closed the connection]
pie__ has joined ##openfpga
rohitksingh_work has joined ##openfpga
hl has quit [Ping timeout: 246 seconds]
dj_pi has joined ##openfpga
<whitequark> daveshah: also thinking about how to integrate carry logic into flowmap
<whitequark> i think i can have a step that like
<whitequark> looks at $alu cells
<whitequark> then, if the entire alu chain is packed into a lut, leaves it that way
<whitequark> but if a $alu is outside of a fanout free cone of another $alu cell, it takes both and un-packs them
<whitequark> so that the later techmap pass can pack them using dedicated carry logic
<whitequark> a really nice thing about flowmap is that it works entirely in terms of source cells
<whitequark> you don't have to painfully reconstruct cell boundaries from AIGs
m4ssi has joined ##openfpga
<whitequark> this way my cmp2lut techmap becomes just... completely redundant
emeb has left ##openfpga [##openfpga]
m4ssi has quit [Remote host closed the connection]
<catplant> nice
<azonenberg> whitequark: sooo
<azonenberg> i dont know whether this will play well with what you're doing or not
<azonenberg> but on architectures with higher-order luts, say lut6
<azonenberg> it's possible to do an adder plus some boolean operations in one lut
<azonenberg> say, (a ^ 0xdead) + (b ^ 0xbeef) should be one lut per bit plus carry chains
<azonenberg> xst isnt good at folding this, idk about vivado
<azonenberg> But it's an optimization to consider
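The folding azonenberg mentions can be checked with a bit-level model: each bit's LUT absorbs the XOR with the constant and produces the propagate signal, and the carry chain does the rest. The carry model below (a mux selected by propagate, plus an XOR for the sum bit) is an assumption modelled loosely on Xilinx-style carry chains, and `add_with_folded_xor` is a hypothetical name.

```python
def add_with_folded_xor(a, b, ka=0xdead, kb=0xbeef, width=16):
    """Compute ((a ^ ka) + (b ^ kb)) mod 2**width, one 'LUT' per bit plus a carry chain."""
    carry = 0
    result = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        # Per-bit LUT: the constant bits are folded into the truth table,
        # so the XORs cost no extra logic levels.
        x = ai ^ ((ka >> i) & 1)
        y = bi ^ ((kb >> i) & 1)
        propagate = x ^ y
        generate = x  # carry-out value when propagate == 0 (then x == y)
        # Dedicated carry logic: XOR for the sum bit, mux for the carry.
        result |= (propagate ^ carry) << i
        carry = carry if propagate else generate
    return result
```

With variable rather than constant XOR operands, the per-bit function would need four LUT inputs instead of two, which still fits in a LUT6.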
<whitequark> azonenberg: this will probably end up as custom-ish logic in flowmap
<whitequark> however, flowmap is really easily adaptable to this kind of stuff
<whitequark> in part because it doesnt throw away information about adders in the original design
<whitequark> moreover
<azonenberg> Also, what about 3-input adders?
<azonenberg> iirc xilinx-land should be able to do those
<whitequark> probably needs a preliminary packing step in yosys that would produce eg $alu3 cells
<azonenberg> would that require any changes to the rtlil core? or would this be considered a temporary techmap-only cell type?
<whitequark> that would work the same as a $alu cell
<whitequark> just with 3 inputs
<whitequark> it would be yet another internal cell type, like $lut, $aoi, whatever
<whitequark> for any internal cell, yosys may choose to generate or not generate any of them depending on context
<whitequark> probably, there would be an option to $alumacc pass
<whitequark> whether to generate $alu3
pie__ has quit [Remote host closed the connection]
pie__ has joined ##openfpga
Bike has quit [Quit: Lost terminal]
<tnt> whitequark: yeah, tristate would be better I think. Basically I have plenty of pins connected to a test target and when running different applets, I use different pins. But then I need the unused one not to interfere ... so tristate, possibly weak pullup if oscillation is a concern.
<whitequark> tnt: good point
<whitequark> can you open an issue
<tnt> sure.
_whitelogger_ has joined ##openfpga
_whitelogger has quit [Ping timeout: 250 seconds]
<tnt> sorear: huh ... reading the backlog I don't get how you'd determine if a bitstream is "bad" ? I mean, accessing the flash to load/store user data is a perfectly valid thing for a bitstream to do, so how exactly do you plan to guarantee a bitstream at no point will access the bootloader area ?
<tnt> kind of looks like you'd need to do formal verification of the bitstream on the fpga itself, that seems ... unrealistic.
_whitelogger has joined ##openfpga
<sorear> > and refuse to run a bitstream which enables the SPI I/O
<sorear> in the example given, you can't use the flash to load/store user data
<whitequark> well that's kind of shit actually
<sorear> a substantially more complex instantiation could require a filter module to syntactically exist at a specific place (vaguely similar to the partial reconfig approach used by f1)
<whitequark> sorear: i have an easier solution
<whitequark> just write-protect the flash
<tnt> yeah, I was going to say the same ... much easier.
<whitequark> it literally has a pin for this exact purpose
<catplant> sorear: another idea, lock the boot sectors?
<sorear> I don't think this chip has sector-level locking
<catplant> most do?
<tnt> which flash chip is it ?
<sorear> AT25SF161, appears to have the ability to lock ranges (which is good enough) but not random sectors
<catplant> yeah
<catplant> that's what we meant
<tnt> Is the goal to protect against accidental or adversarial erasure of the bootloader ?
<catplant> alternatively
<catplant> can you add more spi flash?
<catplant> also ice40 question
<catplant> if you nvcm flash it, can you SB_WARMBOOT to spi flash?
<whitequark> yes
<catplant> nvcm flash a recovery bootloader???
<whitequark> yes
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
<daveshah> I don't know if anyone has actually tested warm booting to flash?
<daveshah> I understand that it might be possible, but no one really knows
<daveshah> ad logic mapping. The plan for carries seems sensible
<daveshah> I'd also like to see support for optimisation around and/or mapping to the dedicated mux2s in Xilinx and Lattice
<daveshah> These can be used both to implement larger LUTs and large multiplexers (latter probably with some dedicated techmap rule)
<whitequark> hmmm
<whitequark> or what about rebalancing multiplexer chains?
<daveshah> Yes, that would make sense too
<daveshah> I'd like to see generic balancing at some point too
<whitequark> daveshah: can you take a look at https://sci-hub.tw/10.1109/92.285741 ?
<whitequark> they introduce a variable Rim but I don't understand how it's computed
<whitequark> Rex and Rslk are simple enough
jcreus has joined ##openfpga
<daveshah> I think it's basically a case of attempting simple packing and seeing how it reduces the LUT count and then looking for any trivially redundant LUTs after that
<whitequark> yeah, i don't understand how exactly they do that
<whitequark> which is my problem
<whitequark> what is "simple packing"
<daveshah> My feeling is a transformation along the lines of opt_lut
<whitequark> hmmmm.
<whitequark> daveshah: any specific proposals for calculating Rim?
<daveshah> No, not really
<whitequark> I guess I could try, hm,
<whitequark> daveshah: yeah. you are right. that is what they are considering.
ayjay_t has quit [Read error: Connection reset by peer]
<whitequark> so, hm, i can compute the number of possible LUT combine operations
ayjay_t has joined ##openfpga
<whitequark> and use that as the value of Rim?
asante has joined ##openfpga
m4ssi has joined ##openfpga
<mithro> tnt: the USB decoder in gtkwave seems to be working
<mithro> whitequark / catplant: It's still an active investigation topic
<mithro> catplant: nvcm flash a recovery bootloader is exactly what I'm investigating it for
Miyu has joined ##openfpga
<daveshah> whitequark: yes, that seems sensible to me
<daveshah> btw, a nice extension to that paper would be to use a slightly more advanced model of slack
<daveshah> instead of just LUT depth, also considering the delay of blackboxes
<mithro> Sooo coool....
<daveshah> this is something we were going to implement in abc xaig
<whitequark> daveshah: i mean
<whitequark> if yosys gains some way to provide timing models
<whitequark> i can probably use that
<daveshah> it'll probably just be a simple text file
<whitequark> yeah i dont wanna write that infrastructure
Miyu has quit [Ping timeout: 250 seconds]
<daveshah> sure, we'll be writing something like that for the XAIG stuff
<tnt> mithro: good :)
<tnt> whitequark: atm flowmap only does packing right ? It doesn't attempt to modify the logic "tree" to find an equivalent tree that packs better ?
<whitequark> it doesn't, that's FlowSYN
<mithro> tnt: I made the color choices deterministic and deleted some of the worst choices
<tnt> mithro: hehe, yeah, I picked the whole list from the gtkwave source without paying attention to what works/doesn't :) Like ... white on white isn't ideal.
<mithro> Any idea how to make gtkwave load them automatically?
<mithro> tnt: the decoder also seems to be nondeterministic in some way...
<tnt> mithro: I think you can save your 'workspace' in gtkwave ?
<tnt> I noticed the non-determinism as well ... but mostly on 'invalid' signals. Must be something in the decoder itself because I don't touch the signals in any way. I might try to save the vcd that gtkwave exports and make sure it's always consistent.
<mithro> It seems to miss the first edge sometimes
<daveshah> whitequark: aiui even flowsyn still isn't designed to do general combinational optimisations.
<daveshah> I think a nice way to approach a functional reduction type optimisation might be a random simulation plus SAT (I think ABC has something along these lines)
<daveshah> use random simulation to find signals that might be identical (e.g. with a hash table of sim traces), then use SAT to check whether they really are
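A minimal sketch of the idea daveshah outlines: simulate the network on random bit-parallel vectors, bucket signals by their simulation trace, then confirm each candidate pair. The netlist format (a dict of gates in topological order) is an assumption for the example, and the confirmation step here is a brute-force exhaustive check standing in for the SAT query; it only scales to a handful of inputs.

```python
import random
from collections import defaultdict
from itertools import product

OPS = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


def simulate(gates, values, mask):
    """Evaluate gates (dict in topological order) on bit-parallel input values."""
    sig = dict(values)
    for name, (op, args) in gates.items():
        if op == "not":
            sig[name] = ~sig[args[0]] & mask
        else:
            sig[name] = OPS[op](sig[args[0]], sig[args[1]]) & mask
    return sig


def candidate_classes(inputs, gates, nbits=64, seed=0):
    """Group gate outputs whose random simulation traces are identical."""
    rng = random.Random(seed)
    mask = (1 << nbits) - 1
    sig = simulate(gates, {i: rng.getrandbits(nbits) for i in inputs}, mask)
    buckets = defaultdict(list)  # hash table keyed by simulation trace
    for name in gates:
        buckets[sig[name]].append(name)
    return [names for names in buckets.values() if len(names) > 1]


def proven_equal(inputs, gates, x, y):
    """Exhaustive equivalence check; a real flow would ask a SAT solver instead."""
    for bits in product((0, 1), repeat=len(inputs)):
        sig = simulate(gates, dict(zip(inputs, bits)), 1)
        if sig[x] != sig[y]:
            return False
    return True
```

Random simulation can only suggest that two signals might be identical; signals that merely happen to agree on the sampled vectors are filtered out by the second step.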
<whitequark> hmmm
<daveshah> this way you could optimise around greybox cells like carry chains, etc,
<daveshah> maybe I'll have a play with this at some point
octycs has joined ##openfpga
emka has joined ##openfpga
<whitequark> daveshah: question
<whitequark> i have an internal representation where all luts are simultaneously mapped
<whitequark> this is done by having a graph where each node is mapped to some (best, depth wise) lut
<whitequark> has edges to each other possible lut
<whitequark> and then there's a set of nodes that is actually implemented
<whitequark> daveshah: can you take a look at the flowmap_area branch in my repo and help me out with updating this IR for LUT splitting?
<whitequark> specifically, there is "lut_path_lengths" and "lut_trans_outputs"
<whitequark> which i need to update
<whitequark> but i'm having trouble convincing myself that what i want to do is correct
<whitequark> so let's say i have a lutv and split lutw out of it
<whitequark> really this means two things
<whitequark> first, lutw is added into the list of implemented luts
<whitequark> it already has the set of gates as well as all the necessary edges
<whitequark> so that's easy
<whitequark> second, lutv is being reduced. this is the hard part.
<whitequark> it's hard because if w is not a gate connected directly to fanin of v, w can internally depend on other gates in v
<whitequark> i currently traverse the cone of all gates implementing v but excluding w
<whitequark> and noting the list of inputs that can be eliminated by breaking out w
<whitequark> i am thinking this can be used later in two ways
<whitequark> first, all the edges corresponding to these inputs may be broken
<whitequark> second, all the gates with fanin comprising only these inputs may be removed from v
<whitequark> (and recursively, all gates whose fanin is those inputs and the gates we just also removed)
<whitequark> now, i think lut_path_length is easier
<whitequark> because only the path length in input cone of w may change (is this correct?)
<whitequark> so, i restart the process as written for w
<whitequark> regarding lut_trans_outputs, i am not so sure
<whitequark> i think it will actually never change
<whitequark> but i can't seem to prove it
<whitequark> ah, no, it will change
_whitelogger has joined ##openfpga
<whitequark> Solution has -25.0% area overhead.
<daveshah> sorry, was afk
<daveshah> what exactly is lut_trans_outputs?
<daveshah> the assumption for path lengths seems fine
<edmund> tnt: we spoke about the SX1257 PMOD.
<whitequark> daveshah: lut_trans_outputs, for each lut, is the set of POs that will be affected by this lut becoming one level deeper
<tnt> edmund: true
<tnt> edmund: kbeckmann is working on it already it seems.
<daveshah> whitequark: ahhh i see
<daveshah> the code makes sense now
<edmund> tnt: you considered at ccc to send me a proposal for making the PMOD and a supporting design.
<kbeckmann> edmund: I'm soon done with the schematic, will start with the pcb layout after that. I'll probably be done in a few days.
<edmund> esden also looks into suggesting an ice40UP5 badge for Maker conference in Nov 2019 Los Angeles , including the SX1257.
<edmund> kbeckmann: awesome
<daveshah> there were two badge discussions, one radio and one face detection I think?
<edmund> kbeckmann: did you go with the suggestions by tnt?
<tnt> daveshah: no reason it can't be both :D
<daveshah> :D
<tnt> identify everyone and report to big brother ... killer app for a maker conf.
<kbeckmann> yeah currently we're thinking of using the sx1257 + i2c<->spi bridge (SC18IS602B). so it will only use up one dual pmod slot with 8 data pins.
<kbeckmann> io pins sorry.
<edmund> daveshah: It will be all at one badge. The sensor might be a Himax HM01B0
<daveshah> I see
<miek> another rf ic worth looking at is the AT86RF215
<daveshah> sounds scary, as tnt says
<tnt> need to find a non evil demo app for that combination of peripherals.
<tnt> miek: that chip looks interesting, but quite a bit more complex.
<tnt> I kind of like the sx1257 for its simplicity and sort of matching the ice40 'theme'.
<cr1901_modern> What would an SX1257 PMOD do?
<daveshah> yeah, there would be little point using the AT chip with an FPGA
<daveshah> cr1901_modern: the SX1257 is a low bandwidth RF frontend & up/downconverter
<daveshah> with a 32Msps delta-sigma IQ baseband interface
<daveshah> so you can implement your own radio protocol in an FPGA
<cr1901_modern> So it's basically the fronent to an SDR?
<cr1901_modern> frontend*
<tnt> yes
<daveshah> frontend, LO, mixer, 1-bit ADC/DAC
<cr1901_modern> How do you use a 1 bit ADC?
<daveshah> it's oversampled delta-sigma
<cr1901_modern> I know there's some weird proof that if you sample a 1-bit ADC biased w/ small noise, you can get the actual signal value to arbitrary precision
<daveshah> 32 or 36 MSPS for 500kHz bandwidth
<daveshah> for 1MHz of radio bandwidth, because it's an IQ interface
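To illustrate how an oversampled 1-bit stream carries a multi-bit signal, here is a toy first-order delta-sigma modulator with a boxcar decimator. The modulator order and filtering inside the SX1257 are not stated in this log, so the first-order loop, the decimation ratio of 64, and both function names are assumptions for the sketch.

```python
def delta_sigma(samples):
    """First-order delta-sigma modulator: inputs in [-1, 1], outputs +1/-1 'bits'."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        # Integrate the error between the input and the previous 1-bit output.
        integrator += x - feedback
        feedback = 1.0 if integrator >= 0 else -1.0
        bits.append(feedback)
    return bits


def decimate(bits, ratio):
    """Average non-overlapping windows of the bitstream (a crude low-pass filter)."""
    return [sum(bits[i:i + ratio]) / ratio
            for i in range(0, len(bits) - ratio + 1, ratio)]


# Usage: a constant input of 0.25 is recovered approximately from the 1-bit stream.
recovered = decimate(delta_sigma([0.25] * 6400), 64)
```

Each window average lands close to the input value; a longer window or a better decimation filter trades bandwidth for resolution, which is why a 32 Msps 1-bit stream is enough for a few hundred kHz of baseband.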
<miek> the AT chip has an IQ interface too, for some reason they leave it out of the overview
<daveshah> interesting
<daveshah> looks like that uses 14-bit ADCs and serialises over LVDS
<tnt> miek: yes it does. But it also has full MAC on board, so not using it is a bit meh. Also if we want people to play 'easily' with it, a 133 Mbps lvds interface on a up5k is not the easiest.
<daveshah> the 32Msps delta-sig is certainly more qt
<tnt> Mmm, the ATRF isn't full duplex ?
<miek> tnt: yeah, fair enough - might be something to play with on the ecp5 instead. they do have a slightly cheaper variant with just the IQ interface too
<tnt> yeah, it's half duplex only AFAICT. The SX1257 is full duplex.
<daveshah> I wonder if full duplex is actually practical to use though?
<daveshah> ie if you can get enough spacing in any of the ISM bands it supports?
<tnt> 915 band is 26 MHz wide, that's plenty.
<daveshah> ah, I was thinking about 868/9
<daveshah> yeah, that sounds fine
<tnt> The EU 868 band is definitely a whole lot narrower. I have no idea how good the OOB rejection is on that device, but it's something to test ...
<daveshah> there is 91x in the UK too (and probably most of the EU), but with strict duty cycle limits
<tnt> wtf ... there is an upduino shield for the HM01B0 : https://www.digikey.be/product-detail/en/HM01B0-UPD-EVN/220-2226-ND/9759580
<miek> looks like they've got a pin-compatible variant for 400-510MHz too, the SX1255
<daveshah> tnt: and a total dearth of example code
<tnt> miek: yup and a new variant for the 700 MHz band as well.
<daveshah> just a sample bitstream, which is almost useless
<edmund> tnt: detecting a face is hard enough with 1 Mbit SPRAM, Identifying people is impossible.
octycs has quit [Ping timeout: 250 seconds]
<daveshah> well, you could send a compressed image over radio once you've detected a face
<daveshah> and do the actual identification on the backend (where you need to compare against a large database anyway)
<daveshah> that's how I would architect such a system, anyway
<edmund> daveshah: 0,3mW TX power is not that useful to transfer images to a base station :-)
<daveshah> I was assuming there would be a PA on there
<edmund> daveshah: I would rather go without a PA to limit long range interference in large crowds and motivate the development of meshed solutions.
<daveshah> The advantage of a PA is that it tends to lead to shorter, higher power transmit bursts than longer, lower power transmissions. This means your transceiver and logic are running for a shorter time
<edmund> tnt: yes https://www.digikey.be/product-detail/en/HM01B0-UPD-EVN/220-2226-ND/9759580 is the easiest and fastest way to get a HM01B0
<edmund> tnt: @GregDavill already did a PMOD board for the HM01B0
<tnt> edmund: I think that image is not public :)
<edmund> tnt: I also already got the Datasheet of the sensor, but Greg did not yet find time to write the Firmware for it.
octycs has joined ##openfpga
Miyu has joined ##openfpga
<whitequark> daveshah: ok, i think i have the breaking heuristic working
<daveshah> nice
<whitequark> Potential for breaking node $techmap$techmap$add$logic.v:9$6.$auto$alumacc.cc:474:replace_alu$95.$and$<techmap.v>:260$220_Y [4]: 300 (Rex=0, Rim=1, Rslk=0).
<whitequark> these names are way out of hand
rohitksingh_work has quit [Read error: Connection reset by peer]
<daveshah> lol
Flea86 has quit [Quit: Goodbye and thanks for all the dirty sand ;-)]
jcreus has quit [Ping timeout: 250 seconds]
rohitksingh has joined ##openfpga
genii has joined ##openfpga
<kbeckmann> esden: is the pinout finalized on the icebreaker? Asking because there are only 2 global pins exposed on the 3 pmods and they are on different pins. Would be nice to keep them on the same pin in case you want to use a clock input from a PMOD.
<esden> Sorry it is the way it is. We will not shuffle any pins any more. That ship has sailed many months ago. :(
<kbeckmann> i fully understand!
<esden> The part itself has a very annoying bondout
<esden> The PMOD basically reflect that
<esden> Let’s hope we will do better in the future iCEBreakers using other FPGA ;)
<kbeckmann> alright :). planning on building an ECP5 board?
<daveshah> Using a pin adjacent to a global pin shouldn't be too bad global wise
* zkms nods
<daveshah> You might want to manually constrain the global buffer though, as it won't be locked and could end up elsewhere to satisfy reset/CE constraints of other GBs
<esden> kbeckmann: maaaybe? ;) :P
pie__ has quit [Remote host closed the connection]
pie__ has joined ##openfpga
<whitequark> daveshah: so
<whitequark> regarding lut_trans_outputs
<whitequark> (which i've renamed to lut_critical_outputs which is slightly less opaque)
<whitequark> any ideas on updating it correctly?
<daveshah> Let me have a look, my mental model of this stuff is still not very good
<daveshah> thanks for renaming btw
<whitequark> i spent like 10 minutes trying to come up with a descriptive name for that..
X-Scale has quit [Ping timeout: 240 seconds]
<daveshah> whitequark: so first thoughts, if a LUT split increases the critical path for a PO, that could remove that PO from other lut_critical_outputs
<daveshah> is my understanding correct?
<whitequark> daveshah: hmmm
<whitequark> i'm not entirely sure actually
<sorear> i have a … strong association for "PO"
<daveshah> we should probably be using the term CO (combinational output) rather than PO in fact
<daveshah> because it also includes the set of register and blackbox inputs
<whitequark> daveshah: so. hm. labels never change.
<whitequark> so that part of condition stays the same.
<whitequark> but, a new node may be introduced
m4ssi has quit [Remote host closed the connection]
<daveshah> yeah, it's the new node that might be the problem
<whitequark> daveshah: so, hm
<daveshah> if the cut never increases the depth (which I presume is implied by labels not changing?) then my fear of other unrelated lut_critical_outputs changing because of the crit path for a PO increasing shouldn't be a problem
<whitequark> the cut never increases the depth
<daveshah> ok, that makes things much easier
<whitequark> wait, hm
<whitequark> increases the depth of what?
<daveshah> the critical path
<daveshah> for any CO
<whitequark> it can increase depth after all
<whitequark> actually my use of labels there might be wrong, even
<daveshah> the problem is that that then takes other nodes off the critical path too
<whitequark> mmm, you are right
<whitequark> i see it now
<whitequark> i think i need to invalidate entire cones
<whitequark> that's ok
<daveshah> yeah
<whitequark> in real circuits, the graph is very wide and very disjoint
<whitequark> so invalidating a cone is cheap
<whitequark> however
<whitequark> do you understand the precise cone that needs to be invalidated
<daveshah> not without further thought
<daveshah> I there are optimisations depending on the nature of the split - many probably won't change anything at all?
<whitequark> I think you accidentally a word there
<daveshah> did I?
<whitequark> "I there are"
<whitequark> i'm not sure which verb you meant
<daveshah> ah I think there are
<daveshah> sorry
<whitequark> oh yeah
<whitequark> this is why there is the potential heuristic
<whitequark> it tries to choose the most promising splits in an ad-hoc way
<whitequark> cutmap works much more reliably, but i don't understand it yet *and* the output of cutmap still becomes better after flowmap-r
<whitequark> so i think i'll have all three
<daveshah> seems reasonable
<whitequark> i also think flowmap-r is the only one of these that lets you choose an area-depth tradeoff
<daveshah> yeah, that's really nice
<whitequark> i'm not sure, cutmap might be able to do it too, the paper isn't super clear
<whitequark> but it did say that flowmap-r starts to really improve on cutmap results with -optarea of 2..
<whitequark> so i assume cutmap cannot do it
<whitequark> probably it can only compute the optimal solution?
<whitequark> for some value of optimal
<daveshah> yeah
<daveshah> > Afterwards, FlowMap-r [6] is able to trade the depths of nodes on non-critical paths or even the depth of the entire network for a smaller area
<whitequark> aha
<whitequark> so yeah i need all of them
<whitequark> flowmap-r enables flow-pack
<whitequark> and is further enabled by cutmap
<whitequark> i didn't start with implementing flow-pack because they use some weird boolean decomposition there
<whitequark> before flowmap-r
<whitequark> that's mostly redundant with flowmap-r
<whitequark> daveshah: ok, thinking out loud. if we split lutw off lutv, lutw is a predecessor of lutv
<whitequark> therefore, the output cone of lutv should be safe
<daveshah> in terms of lut_critical_outputs or labels?
<whitequark> lut_critical_outputs
<daveshah> yes
<daveshah> that seems correct to me
<whitequark> so... i invalidate lutw and its input cone, right?
<whitequark> now, labels
<daveshah> I don't think this works if overall depth increases?
<whitequark> labels were fine for the initial solution but not fine after the first breaking
<whitequark> because depths now dont correspond to labels
<daveshah> there might be other paths that were critical, unrelated to lutw
<whitequark> so i have to *first* recompute depths
<whitequark> hmm
<whitequark> so, for depths, the entire output cone of lutw is affected
<daveshah> yeap
<whitequark> and everything that has a successor whose depth changed has critical outputs affected
<whitequark> and everything that has anything in fanout whose critical outputs changed is affected
<whitequark> does this seem enough?
<daveshah> yes
<daveshah> I think so
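The invalidation rules agreed on above can be written as two graph closures. This is a conservative sketch, not the flowmap code: it treats every node in the output cone of lutw as having a changed depth, and the `fanin` dict format and function names are assumptions.

```python
from collections import deque


def closure(start, edges):
    """All nodes reachable from `start` (inclusive) by following `edges`."""
    seen = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invalidation_sets(fanin, w):
    """Return (nodes needing depth recomputation, nodes needing critical-output recomputation)."""
    fanout = {}
    for v, preds in fanin.items():
        for u in preds:
            fanout.setdefault(u, []).append(v)
    # Depths: the entire output cone of w is affected.
    depth_dirty = closure({w}, fanout)
    # Critical outputs: anything with a successor whose depth may have changed,
    # and then anything with such a node somewhere in its fanout.
    seeds = {u for v in depth_dirty for u in fanin.get(v, ())}
    critical_dirty = closure(seeds, fanin)
    return depth_dirty, critical_dirty
```

On wide, mostly disjoint graphs both sets stay small, which matches the earlier remark that invalidating a cone is cheap.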
<whitequark> i have debug asserts that verify that this is valid
<whitequark> so let's see if this works or crashes
X-Scale has joined ##openfpga
<whitequark> oh
<whitequark> depths are the same as path lengths, just from the other side of the graph
<daveshah> are path lengths not constant from PI to PO on a given path, whereas depths increase from PI to PO?
<whitequark> so a path length is the longest path from node to PO
<whitequark> and depth is the longest path from PI to node
<whitequark> right?
emeb has joined ##openfpga
<daveshah> whitequark: At least in typical PnR papers, path length is the length of the longest path that a node is involved in
<whitequark> ohh
<whitequark> Given a depth bound D, the slack on node v is defined as follows: If v is not a PI or PO, the slack of v is D - (L_v + P_v), where L_v is the level of v in the network, and P_v is the length of the longest path from v to any PO node. If v is a PI or PO, the slack of v is zero.
<whitequark> I'm following this.
<daveshah> In this case, it does seem like P is max length from v to output
<daveshah> L + P is what I would normally think of as path length
<whitequark> ah I see
<whitequark> what would you call P?
<daveshah> I don't know
<daveshah> Never heard of a term for it
jcreus has joined ##openfpga
<whitequark> altitude? :D
<whitequark> gonna go with that, to avoid aliasing the term "path length"
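The slack definition quoted above, with P called "altitude" as just proposed, can be computed directly on a DAG. This is a sketch under the assumption that the network is given as a dict from each node to its fanin list, with empty fanin marking a PI; `compute_slack` is a hypothetical name.

```python
from functools import lru_cache


def compute_slack(fanin, primary_outputs, depth_bound):
    """Return {node: slack} where slack = D - (level + altitude) for internal nodes."""
    fanout = {v: [] for v in fanin}
    for v, preds in fanin.items():
        for u in preds:
            fanout[u].append(v)

    @lru_cache(maxsize=None)
    def level(v):
        # Longest path from any PI to v.
        return 1 + max(map(level, fanin[v])) if fanin[v] else 0

    @lru_cache(maxsize=None)
    def altitude(v):
        # Longest path from v to any PO (the paper's P).
        return 1 + max(map(altitude, fanout[v])) if fanout[v] else 0

    slack = {}
    for v in fanin:
        if not fanin[v] or v in primary_outputs:
            slack[v] = 0  # PIs and POs have zero slack by definition
        else:
            slack[v] = depth_bound - (level(v) + altitude(v))
    return slack
```

Level plus altitude is what daveshah calls the path length through a node, so a node with zero slack sits on a path that exactly meets the depth bound.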
<whitequark> oh *facedesk*
<whitequark> the paper said LEVELS not LABELS
<whitequark> but i constantly mix up these terms
<whitequark> so i used labels and it sort of worked by accident
<whitequark> mystery solved
<whitequark> hm
<whitequark> my critical output update function doesn't work properly :S
<whitequark> or invalidation, maybe
<whitequark> oh god
<whitequark> i did it again
<whitequark> i forgot a &
<whitequark> i forgot *another* & what the fuck
<whitequark> i hate c++
<shapr> I think it's too large a language
<shapr> we have several C++ codebases at work and they're entirely unlike each other.
<whitequark> none of the c++ codebases i work with are anything like each other
<whitequark> including the ones i wrote myself
<qu1j0t3> :)
<qu1j0t3> yes, one's style evolves
<qu1j0t3> i watched my scala evolve from java-in-scala to pure FP scala
<whitequark> it's backwards with c++
<whitequark> you learn to use fewer and fewer c++ features
<whitequark> but do more
<qu1j0t3> yes
<qu1j0t3> that's been my trajectory in scala too, though
<adamgreig> is there a small good language trapped inside c++? maybe it's just c with namespaces
<qu1j0t3> sane subsets are a thing in many langs
<qu1j0t3> adamgreig: yeah it's called C
* qu1j0t3 runz
<whitequark> nothing about c is good
* qu1j0t3 isn't a particular fan of C but the joke was irresistible
<whitequark> or sane.
<qu1j0t3> whitequark: I won't argue!
<adamgreig> a strong foundation for c++
<whitequark> yes
<swetland> I really think the problem is that C++ is more of a language toolkit than a language ^^
<whitequark> the problem is that everyone making C++ is smart but misguide
<whitequark> d
<swetland> so you have to start by deciding how you're going to use it (though this ends up being a totally adhoc process in most small projects)
<shapr> We have one codebase that religiously follows herb sutter
* shapr shrugs
<shapr> I hope rust takes over or C++ loses weight.
<qu1j0t3> Herb is not unlike Pike in his disdain for taking input
rohitksingh has quit [Ping timeout: 258 seconds]
<IanMalcolm> I feel like there are a couple of subsets of C++ which are both safe and nice to work with, and they are all rust
mumptai has joined ##openfpga
octycs has quit [Quit: No Ping reply in 180 seconds.]
octycs has joined ##openfpga
octycs has quit [Client Quit]
jcreus has quit [Read error: Connection reset by peer]
ayjay_t has quit [Read error: Connection reset by peer]
ayjay_t has joined ##openfpga
pie__ has quit [Remote host closed the connection]
Bike has joined ##openfpga
pie_ has joined ##openfpga
Miyu has quit [Ping timeout: 250 seconds]
<azonenberg> whitequark: yeah, i find that I write most of my projects in "C+" :p