<eddyb>
assuming `val[0:2]` is valid bit slicing, it goes to:
<eddyb>
> Info: ICESTORM_LC: 861/ 1280 67%
<eddyb>
wait that's a decrease o_O?
<eddyb>
am I getting something similar to aggressive inlining in software optimizers :P?
<whitequark>
you could
m_w has quit [Ping timeout: 250 seconds]
<whitequark>
hm, are you using abc9?
<eddyb>
abc shows up in the reports but not abc9 literally. I'm just using `ICEStickPlatform().build` with master nmigen so I assume I'm getting defaults for everything
<eddyb>
"Yosys 0.9+932 (git sha1 3c41599ee1, g++ 8.3.0 -fPIC -Os)" / " -- Next Generation Place and Route (git sha1 c365dd1cabc)" (both latest in nixpkgs, I think)
<whitequark>
NMIGEN_synth_opts=-abc9
<whitequark>
set this as an environment variable
<eddyb>
oh wow it went down to 45%
<eddyb>
and 41% for the `// 3` version
<whitequark>
right
<eddyb>
thanks :D
<eddyb>
58% for 6-digit, and the answer is correct :D
<whitequark>
what's the fmax
<eddyb>
22MHz IIRC (I already changed it to 7-digit which uhh results in a core dump)
<eddyb>
I may need to download more RAM or come up with a way to run the compilation on the server
<whitequark>
or not build gigantic dividers?
<eddyb>
oh yeah that too I was just curious if I could get it to fit now
<whitequark>
i'm not sure why you get a core dump
<whitequark>
you should be able to fit way more cells than even 10 times the largest ice40
<eddyb>
I don't actually see a RAM spike so it might not be that
<eddyb>
whitequark: welp it's an assertion failure and it was getting hidden, I can just run the command because the tmp file doesn't get deleted
<eddyb>
aaand the assertion was in the rpt file but I kept looking at the tim one
<eddyb>
I was going to open an issue but the template helpfully mentions trying that and it would be nice to avoid (but I guess it's not that hard, I just have to override the git revision in the NixOS package)
Richard_Simmons has quit [Ping timeout: 250 seconds]
Richard_Simmons has joined ##openfpga
Richard_Simmons2 has quit [Ping timeout: 250 seconds]
OmniMancer has joined ##openfpga
IanMalcolm has quit [Read error: Connection reset by peer]
IanMalcolm has joined ##openfpga
freemint has quit [Remote host closed the connection]
<eddyb>
whitequark: so I'm not sure this is expected, but I was trying to see what I could do with 32 bits, so I had `Array(10**i for i in range(10))` from before, and indexing into it. however, building a chain of `Mux`'s gets me from 95% down to 43% (this is all w/ abc9), which is way better than I expected
<eddyb>
wait I re-enabled the division by 3 and it only went up to 55% - it failed timing, but that's another thing
<eddyb>
so I wonder if moving from Array to Mux fixed the assertion failure I was seeing
<whitequark>
what kind of Mux chain
<whitequark>
you should post the .il output
<whitequark>
for both cases
<eddyb>
`pow10_cur = C(0, 32); for i in range(10): pow10_cur = Mux(out_idx == i, 10**i, pow10_cur)`
<eddyb>
whitequark: you mean for the assertion?
<eddyb>
or for the large difference in utilization
<eddyb>
the 75% utilization one is I'm pretty sure the same as the assertion failure one except it's 8-digit instead of 7-digit so 27-bit instead of 24-bit
<eddyb>
yuuuupp
<eddyb>
whitequark: so I could've finished my ridiculous experiments hours ago if I just *gave it more bits*
<whitequark>
i think the cause could be that out of bounds accesses to an Array map to the last element
<whitequark>
but with your mux chain, they'll always be zero
<whitequark>
try adding zero to the end of the array, maybe
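The difference whitequark describes can be shown with a toy model in plain Python, where integers stand in for hardware signals. This is only a sketch of the behaviour as stated in the log (out-of-range `Array` indices select the last element, while the `Mux` chain falls back to its initial 0); the helper names are made up for illustration and are not nMigen API.

```python
# Toy model only: plain Python integers stand in for hardware signals.
POW10 = [10**i for i in range(10)]  # ten entries, so the index needs 4 bits


def array_select(elems, idx):
    """Model of Array indexing as described in the log:
    an out-of-range index selects the last element."""
    return elems[min(idx, len(elems) - 1)]


def mux_chain(elems, idx):
    """Model of the Mux chain from the log: start at 0, override on a match."""
    cur = 0
    for i, value in enumerate(elems):
        cur = value if idx == i else cur
    return cur


# The two models agree for every in-range index...
in_range_agree = all(
    array_select(POW10, i) == mux_chain(POW10, i) for i in range(10)
)
# ...but differ for the values 10..15 that a 4-bit index can also hold.
out_of_range = [
    (i, array_select(POW10, i), mux_chain(POW10, i)) for i in range(10, 16)
]
```

Because the two forms are not equivalent over the full index width, a synthesizer cannot simply treat them as the same logic, which is one possible source of a utilization difference.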
<eddyb>
huuh
<eddyb>
`out_idx` is always in `0..self.digits`, so I don't see how it could go out of bounds
<whitequark>
every pointer in my program is always valid, so i don't see how a use-after-free could happen
<eddyb>
or, wait, am I being silly about this and the entire range of that bitwidth is relevant
<whitequark>
yeah, it has no special insight about ranges
<eddyb>
well, making it 8 long instead of 7, when digits=7, doesn't fix the assert
<whitequark>
Signal(some_range) is wide enough to represent every element of the range, but that's it. usually the actual representable range would be much larger
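A small arithmetic sketch of that point: the number of bits needed for `range(n)` usually leaves some encodings unused. `bits_for_range` and `unused_encodings` are hypothetical helpers written for this illustration (valid for n of 2 or more), not nMigen functions.

```python
def bits_for_range(n):
    """Bits needed to hold every value in range(n), i.e. 0..n-1 (for n >= 2)."""
    return (n - 1).bit_length()


def unused_encodings(n):
    """Values the signal can physically hold that lie outside range(n)."""
    width = bits_for_range(n)
    return list(range(n, 2**width))


# range(10) needs 4 bits, which can also represent 10..15.
width_10 = bits_for_range(10)
extra_10 = unused_encodings(10)
# range(8) needs 3 bits and uses every encoding.
extra_8 = unused_encodings(8)
```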
<whitequark>
i'm talking about utilization issues
<whitequark>
i have no idea wtf that assert means and i aggressively don't care about issues in abc
<eddyb>
sorry
<eddyb>
for utilization, the 47% vs 75% were with digits=8
<eddyb>
so I would expect then out_idx to be 3 bits and the Array to fully cover all possibilities, no?
<whitequark>
hmm
<whitequark>
oh wait, i see it now
<whitequark>
you're using `pow10` twice
<whitequark>
try assigning it to an intermediate signal, something like `pow10i = Signal.like(pow10); m.d.comb += pow10i.eq(pow10)`
<eddyb>
hah I even had two indexing expressions, moved them into the variable thinking that might help, but this makes more sense
<whitequark>
the IR I'm lowering to doesn't have arrays, and arrays are valid on LHS as well
<whitequark>
so I legalize them into essentially if/else chains
<eddyb>
whitequark: 45% w/ the array and pow10i :D
<whitequark>
it should be possible to add an optimization for this case, essentially cache array accesses on RHS
<whitequark>
you would probably call it GVN
<zignig>
whitequark: cxxrtl looks useful, sounds like it was an epic C++ mindbender.
<eddyb>
I was actually wondering, is there a nicer way to combine the subtract and the comparison, maybe do it on an integer with an extra bit, and extract that to get the borrow?
<eddyb>
s/it/the subtract
<zignig>
is the intent to have it as a standalone simulator or fold back into python.
<zignig>
?
<whitequark>
zignig: both
<whitequark>
eddyb: are you asking how your code could be improved, or are you asking about that specific microoptimization?
<zignig>
cool, having it re-wrapped in python, and even doing board simulations would be excellent.
<zignig>
you could develop with nmigen and not have an _actual_ board.
<whitequark>
seems unlikely that anyone would go through all the trouble of simulating a board exactly
<eddyb>
whitequark: I was kind of hoping there's an easy way to do both operations at once and get the single-use-of-array for free
nrossi has joined ##openfpga
<whitequark>
eddyb: sure, you can subtract and look at the borrow
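A minimal sketch of that idea, modelled on plain Python integers: do the subtraction one bit wider than the operands, and the extra top bit is the borrow, so a single subtractor yields both the difference and the `a < b` comparison. The function name and the 32-bit default are assumptions for illustration.

```python
WIDTH = 32


def sub_with_borrow(a, b, width=WIDTH):
    """Subtract two unsigned width-bit values using one extra result bit.

    Returns (difference modulo 2**width, borrow); borrow is 1 exactly
    when a < b.
    """
    mask = (1 << width) - 1
    # Keep width+1 bits, like a subtractor widened by one bit.
    wide = (a - b) & ((1 << (width + 1)) - 1)
    return wide & mask, wide >> width
```

For example, `sub_with_borrow(5, 3)` gives a zero borrow, while `sub_with_borrow(3, 5)` gives a borrow of 1 and the wrapped difference.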
<zignig>
whitequark: indeed, but a representative simulation would be great for debugging.
<whitequark>
eddyb: but the way to improve your code would be to... well... rm and start over, really
<eddyb>
haha
<eddyb>
I would certainly try to make it more modular
* zignig
hands eddyb the modulator.
<eddyb>
had some really dumb moments getting this to work, and also I've been a bit stingy with the state variables, trying to get as much mileage as possible out of what was already there
<whitequark>
the main problem with it is that it's written like you'd write software
<eddyb>
whitequark: tbh I am still amazed I got an FPGA to do decimal nonsense
<whitequark>
and as you have noticed, this approach produces extremely large and slow designs, and so doesn't scale beyond toy problems
<eddyb>
and at this bitwidth (I think I can do 32 bits I just need to reduce the frequency)
<eddyb>
if I were to do this seriously I'd add an input FIFO and make it easy to have things take arbitrary many cycles (e.g. the divisions)
<whitequark>
a multicycle divider by 3 at 32 bits shouldn't, morally, take more than 100 LUTs
<whitequark>
32 for the dividend, 32 for the divisor, a few more for a counter and a comparator
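One common way to build such a multicycle divider is restoring shift-and-subtract division, which produces one quotient bit per clock cycle. The sketch below models that in plain Python, with each loop iteration standing in for one cycle; it is an illustration of the general technique, not a reconstruction of anyone's actual design, and it makes no claim about LUT counts.

```python
def divide_multicycle(dividend, divisor, width=32):
    """Restoring shift-and-subtract division.

    Each loop iteration models one clock cycle and yields one quotient bit.
    Returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    remainder = 0
    quotient = 0
    for cycle in range(width):
        # Shift the next dividend bit (MSB first) into the remainder.
        bit = (dividend >> (width - 1 - cycle)) & 1
        remainder = (remainder << 1) | bit
        # Trial subtraction: keep it only if it does not underflow.
        if remainder >= divisor:
            remainder -= divisor
            quotient = (quotient << 1) | 1
        else:
            quotient <<= 1
    return quotient, remainder
```

The state is just the dividend/quotient shift register, the divisor, the remainder, and a cycle counter, which is roughly the register budget described above.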
<whitequark>
the decimal/binary conversion is much more interesting
<whitequark>
i... think if you really wanted to have decimal input and output, you should do the entire thing in BCD.
<eddyb>
I considered BCD but I wasn't sure how you would do anything other than addition and subtraction in BCD
<eddyb>
now that I am already thinking of many-cycle dividers... it makes more sense again
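As one possible reading of how a many-cycle divider fits BCD, here is a sketch of digit-serial long division by a small constant: each step consumes one decimal digit, most significant first, and the per-step work is small enough to be a lookup table in hardware. The function name and interface are assumptions for illustration.

```python
def bcd_divide_small(digits_msd_first, divisor=3):
    """Digit-serial long division of a BCD number by a constant in 1..9.

    Each loop iteration models one cycle and produces one quotient digit.
    Returns (quotient digits, most significant first, and the remainder).
    """
    if not 1 <= divisor <= 9:
        raise ValueError("divisor must be a single nonzero decimal digit")
    quotient = []
    remainder = 0
    for digit in digits_msd_first:
        # At most (divisor - 1) * 10 + 9, so the quotient digit stays 0..9.
        value = remainder * 10 + digit
        quotient.append(value // divisor)
        remainder = value % divisor
    return quotient, remainder
```

For example, dividing the digits `[1, 0, 0]` by 3 yields the digits `[0, 3, 3]` with remainder 1.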
<OmniMancer1>
daveshah: what does the "BelId() must construct a unique null-value." mean exactly?
<daveshah>
the default constructor for BelId must create a BelId that doesn't conflict with any real bels
<OmniMancer1>
but it is the same null value each time?
<daveshah>
Yes
<daveshah>
If you use flat indices, then usually -1 would be null and 0+ would be real bels
<daveshah>
iow, if you do `a = BelId();` then `a == BelId()` must hold but
<daveshah>
`a != any of ctx->getBels()`
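The requirement can be modelled in a few lines of Python. This is only a stand-in for the C++ type: it assumes the flat-index scheme daveshah mentions, with -1 reserved as null, and `get_bels` is a hypothetical substitute for `ctx->getBels()`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BelId:
    """Python stand-in for a flat-index BelId; -1 is reserved as null."""
    index: int = -1


def get_bels(count):
    """Hypothetical stand-in for ctx->getBels(): real bels use indices 0 and up."""
    return [BelId(i) for i in range(count)]


a = BelId()
null_is_stable = a == BelId()                       # same null value each time
null_is_unique = all(a != bel for bel in get_bels(8))  # never equals a real bel
```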
<OmniMancer1>
indeed
Bike has joined ##openfpga
<ZirconiumX>
I am installing a second copy of Quartus purely for the Cyclone 10GX
<ZirconiumX>
Gotta love vendor tools being sucky
<OmniMancer1>
yaaaay /s
<ZirconiumX>
Quartus Pro is so big it comes in two parts
<ZirconiumX>
Plus a device support file you have to download separately
<OmniMancer1>
oh my
<OmniMancer1>
how many gigabytes must you sacrifice to the Quartus gods?
<ZirconiumX>
2.8 + 1.3 + 3.6 = 7.7
<OmniMancer1>
oof
<ZirconiumX>
*for the download alone*
<OmniMancer1>
oooof
<OmniMancer1>
In cases where you have LUTs that are fracturable, what part of the toolchain process would be responsible for working out what to do with that?
<ZirconiumX>
Because you need Quartus Lite for the NRND chips
<ZirconiumX>
Packing as part of P&R, I'd imagine
<daveshah>
I would probably do it as part of placement
<daveshah>
perhaps with some pre packing
<OmniMancer1>
so you would try to find two LUTs which share inputs and pack them into one shared LUT with two outputs?
<ZirconiumX>
Yep
<daveshah>
Yes although you have to be careful not to combine things that don't belong nearby
<daveshah>
the other option is to represent each LUT as two bels
<daveshah>
and have a validity check for shared inputs
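A sketch of what such a validity check might look like, under the assumption that the two halves of a fracturable LUT see the same small set of input pins, so two cells can only share the LUT if the union of their input nets fits on those pins. Real architectures have more detailed rules; the function name and the default of 4 pins are hypothetical.

```python
def can_share_lut(inputs_a, inputs_b, shared_pins=4):
    """Return True if two LUT cells could occupy the two halves of one
    fracturable LUT, assuming both halves see the same `shared_pins` pins.

    inputs_a / inputs_b are the net names driving each cell's inputs.
    """
    return len(set(inputs_a) | set(inputs_b)) <= shared_pins


# Three nets each, two of them common: four distinct nets in total.
fits = can_share_lut(["a", "b", "c"], ["b", "c", "d"])
# Five distinct nets cannot fit on four shared pins.
does_not_fit = can_share_lut(["a", "b", "c", "d"], ["e"])
```

A placer using the two-bels-per-LUT representation would run a check like this whenever it considers placing a cell next to an occupied half.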
<ZirconiumX>
The ALM in the CV (and presumably 10GX) has four outputs
<OmniMancer1>
does the synthesis backend need cells for the carry chain mode?
<OmniMancer1>
synthesis flow I guess
<daveshah>
Yes, it does
<ZirconiumX>
Depends how smart the P&R is
<ZirconiumX>
Quartus will figure it out on its own
<daveshah>
It depends on the arch too
<daveshah>
if the synthesis tool doesn't know about carry chains then it is likely that it will result in a pattern of LUTs that pnr couldn't easily map to a carry chain
<OmniMancer1>
it certainly seems easier on the pnr if the synthesis tool gives you a set of cells that represent the carry chain already
<daveshah>
Yes, this is what almost everything does
<OmniMancer1>
also I suspect the 2bit adder per lslice thing probably only works that way
<OmniMancer1>
actually it might be 4 bits per lslice
<OmniMancer1>
daveshah: in the ecp5 backend for nextpnr what does the database actually provide in terms of information? the routing pips that exist?
<daveshah>
Yes, the bels pips and wires
<daveshah>
it's a flattened but deduplicated database, so locations that are the same (from a relative coordinate point of view) are only stored once
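The deduplication idea can be sketched as follows: describe each location using relative coordinates, store each distinct description once, and have every location point at its shared entry. This is a toy illustration of the principle, not the actual nextpnr ECP5 database format; all names are made up.

```python
def build_database(tiles):
    """Deduplicate per-location descriptions.

    tiles maps (x, y) to a hashable description written in relative
    coordinates. Returns (unique descriptions, map from location to the
    index of its description).
    """
    unique = []         # each distinct description is stored once
    index_of = {}       # description -> position in `unique`
    location_type = {}  # (x, y) -> index into `unique`
    for loc, desc in tiles.items():
        if desc not in index_of:
            index_of[desc] = len(unique)
            unique.append(desc)
        location_type[loc] = index_of[desc]
    return unique, location_type


# Two interior tiles look identical relative to themselves; the edge one differs.
example_tiles = {
    (1, 1): (("wire_e", (1, 0)), ("wire_w", (-1, 0))),
    (2, 1): (("wire_e", (1, 0)), ("wire_w", (-1, 0))),
    (0, 1): (("wire_e", (1, 0)),),
}
unique, location_type = build_database(example_tiles)
```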
<OmniMancer1>
are these significantly different between chip sizes?
<daveshah>
No, but I never got round to sharing them
freemint has joined ##openfpga
freeemint has joined ##openfpga
freeemint has quit [Remote host closed the connection]
<ZirconiumX>
...Why would a vendor decide to invert *one* input of a full-adder?
<ZirconiumX>
Wish I could formally verify my stuff against the vendor model, but it's filled with all sorts of fun "not for synthesis" stuff like a `specify` block
<mwk>
"specify" blocks are easily fixed though
<mwk>
what else is there?
freemint has quit [Ping timeout: 240 seconds]
<ZirconiumX>
`buf`s which I can't tell if they're an actual Verilog thing or not
<daveshah>
Yosys should even ignore them (or parse them)
<daveshah>
They are
<daveshah>
and Yosys should support them
<daveshah>
it's `table` which usually kills vendor models from being supported by Yosys, ime
<ZirconiumX>
The file itself also has a primitive table
<mwk>
also `assign` and `deassign`
<ZirconiumX>
Okay, it does parse the vendor LUT model if I paste it into a file by itself
<ZirconiumX>
...Now to hack at my Yosys tests as necessary
<OmniMancer1>
The placement checks can see what nets inputs/outputs for a Bel are right?
<daveshah>
Yes, although you might want to use ArchCellInfo to speed it up
<ZirconiumX>
- tests pass with my MISTRAL_* cells
<ZirconiumX>
- tests pass with vendor cyclonev_lcell_comb cells
<ZirconiumX>
- tests fail with Yosys cyclonev_lcell_comb cells
<daveshah>
Heh
<ZirconiumX>
I'll admit I've never seen a "proof inherently diverges" warning from equiv_induct before though
<OmniMancer1>
when a tile is used as distributed ram in ecp5 is the 4th slice still available as logic?
<daveshah>
Yes
OmniMancer1 has quit [Quit: Leaving.]
Bird|otherbox has quit [Ping timeout: 252 seconds]
<SpaceCoaster>
OmniMancer I read the eagle Datasheet v2.8 and EG4S20 pdfs. Interesting stuff. I looked through your fuzzers. How can I run them?
X-Scale has quit [Ping timeout: 276 seconds]
X-Scale has joined ##openfpga
X-Scale has quit [Ping timeout: 265 seconds]
X-Scale` has joined ##openfpga
X-Scale` is now known as X-Scale
<SpaceCoaster>
The Linux install of the software for anlogic fpga didn’t suck at all. 55MB download, 145MB install in one directory. No onerous dependencies. One executable for both GUI and CLI.
<ZirconiumX>
17.2 GiB of Quartus Pro on my machine, and you need to invoke a bunch of different exes for the compilation process
parataxis has joined ##openfpga
<hackerfoo>
I doubt they have the same functionality, though.
Jybz has quit [Ping timeout: 276 seconds]
Jybz has joined ##openfpga
<GenTooMan>
probably not
<hackerfoo>
It's like comparing DOOM to DOOM 3.
<hackerfoo>
BFG Edition.
mumptai has joined ##openfpga
rombik_su has joined ##openfpga
Bird|otherbox has joined ##openfpga
<ZirconiumX>
For laughs I ran quartus_cdb (for Lite 18.1) on a database generated by Pro 19.3. I have a *brand new* ICE.
<ZirconiumX>
...Quartus is using 16% CPU *idle*
<ZirconiumX>
What the fuck.
<tpw_rules>
ZirconiumX: somewhere i have a makefile for quartus that works in the soc eds shell
<ZirconiumX>
This is purely Pro Edition; Lite Edition sits idle at 0% CPU
<tpw_rules>
pros have enough cpus that that isn't noticeable
<ZirconiumX>
Are Intel subsidising the Lite version by making the Pro version cryptomine on your computer or something? :P
<tpw_rules>
i bet an employee could sneak in a cryptominer and nobody would be able to find it
<OK_b00m3r>
:(
<tpw_rules>
not even anyone else at intel
Asu has quit [Remote host closed the connection]
<TD-Linux>
I have a 15khz signal that I would like to have a PLL lock on to. in particular I need a narrow pull in range so that if the input is messed up or not present that it freeruns
<TD-Linux>
I'm currently doing it in software using a timer, but that limits best case jitter to one clock cycle, requiring a very high clock
<TD-Linux>
is there any way I can cleverly use a fpga pll for this?
Asu has joined ##openfpga
<cr1901_modern>
Xilinx DCMs go down to 1MHz; Lattice PLL down to 3MHz
<cr1901_modern>
I guess you could clock that into a counter and divide down to 15kHz
Asu has quit [Remote host closed the connection]
Asu has joined ##openfpga
<TD-Linux>
yeah multiplying up to a higher value is fine and in fact useful. but the input is still only 15khz
<cr1901_modern>
oh I think you're SOL then... b/c those numbers I gave _are_ input freqs :/
<cr1901_modern>
for 15kHz, I might consider making the PLL manually- VCO and phase detectors could be ICs, but the filter would be discrete
<cr1901_modern>
dunno if they still make phase detector ICs tho
<cr1901_modern>
TD-Linux: fun fact: According to Gardner's Phaselock Techniques, the very first application of PLLs was for exactly what you're trying to accomplish (hsync)
Asu has quit [Remote host closed the connection]
Asu has joined ##openfpga
<TD-Linux>
cr1901_modern, yeah I could assemble it out of a 4046, just feels like there should be something better™ in 2019
<TD-Linux>
there's the TI LMH1982 but it's $30 lol
<sorear>
this seems like something where the best solutions might be from the nominally GPSDO space
<sorear>
"how to generate a low-jitter 10MHz time reference from a 1Hz input"
<sorear>
since you're trying to get jitter to a _very_ small fraction of the input 15khz some of the same ideas might be applicable
<TD-Linux>
this seems to imply the xilinx pll can happily take 15khz with an enormous multiplication ratio, as long as you provide it with a much higher system clock?
<mwk>
sounds like bullshit
<mwk>
but let's take a look
<cr1901_modern>
Is there a multiplier before the PLL input?
Jybz has quit [Quit: Konversation terminated!]
balrog has quit [Quit: Bye]
m_w has joined ##openfpga
balrog has joined ##openfpga
<omnitechnomancer>
SpaceCoaster I need to add some instructions to the readme at some point
<omnitechnomancer>
SpaceCoaster either place the tools in a td subdirectory of the prjtang repo or symlink them there, you want the td directory to contain the bin and so on dirs.
<omnitechnomancer>
Then source the environment.sh file in the repo root
<omnitechnomancer>
Compile libtang, if it doesn't have instructions the process should be the same as libtrellis
<SpaceCoaster>
Ok up to that point ... compiling
<SpaceCoaster>
No rule to make tangbit.dir :-(
<SpaceCoaster>
That was in the fuzzing branch, trying libtang-dev
<omnitechnomancer>
The fuzzing branch is on top of libtang-dev so I don't expect a difference, though try running it again, I remember there was something screwy with the generated files in cmake and clean builds
nrossi has quit [Quit: Connection closed for inactivity]
lutsabound has quit [Quit: Connection closed for inactivity]
<SpaceCoaster>
Screwiness confirmed. Built branch master first, then fuzzing built.
ym has joined ##openfpga
<SpaceCoaster>
It built tangbit in libtang
<SpaceCoaster>
And a libtang.so, pytang.so
<omnitechnomancer>
You actually only need pytang.so for the fuzzers
<SpaceCoaster>
Ok, got it.
<omnitechnomancer>
You should symlink devices.json into the database directory
<SpaceCoaster>
Is that in the repo or outside?
<omnitechnomancer>
And run tools/get_tilegrid_all.py followed by tools/annotate_tilegrid.py and then tools/create_empty_bitdbs.py
<omnitechnomancer>
In the repo
<SpaceCoaster>
Ok, database/eagle/eagle_20/tilegrid.json created
<omnitechnomancer>
Then you should be able to run fuzzer.py in any of the fuzzer directories
<SpaceCoaster>
Import bitstream, not found
<SpaceCoaster>
Also nonrouting.py:64 is indented in a way that makes python cry.
<SpaceCoaster>
iOS is auto-correcting every word!
<omnitechnomancer>
Oh, delete that import it is left over from past experiments before I had pytang
<omnitechnomancer>
I will fix that tonight
<omnitechnomancer>
I'll check the indentation but it was working for me
<omnitechnomancer>
That should have the same indentation as the line above, it may have been disturbed when I was removing some other stuff from that function before pushing it
<SpaceCoaster>
Np, mslice fuzzer is running, has run!
<SpaceCoaster>
Does that get me stuff that I can view in html files or something?
<omnitechnomancer>
To generate the html run tools/html_all.py and give it a path to put the docs in
<omnitechnomancer>
The first two fuzzers determine the LUT init bits for the two types of slices present
lutsabound has joined ##openfpga
<SpaceCoaster>
I read them, they made sense. 1 LUT per slice tested.
<omnitechnomancer>
There are 2 LUTs per mslice
<SpaceCoaster>
Got it, the fuzzer tests them one at a time.
rombik_su has quit [Read error: Connection reset by peer]
<SpaceCoaster>
Fuzzing the routing takes a while. This all works on one tile and then replicates across all of them?
mumptai has quit [Quit: Verlassend]
<omnitechnomancer>
Depending on how many cores you have for the VM you can set the TANG_JOBS env var to a number to get more than 4 way parallelism for the fuzzing
<omnitechnomancer>
The routing actually does both the plb and pib types at once, but yes it just checks one of each and the assumption is that the routing bits are the same in all such tiles
<omnitechnomancer>
I had to move the tiles it was using in a bit since the edge tiles seem to have some special cases
<SpaceCoaster>
Nice, does that hold for edge tiles?
<SpaceCoaster>
Typing overlap
<SpaceCoaster>
Thanks for talking me through it
Asu` has joined ##openfpga
Asu has quit [Ping timeout: 250 seconds]
Asu` has quit [Quit: Konversation terminated!]
<omnitechnomancer>
As far as I can tell all the connections that are valid for the more central tiles are still valid for the edge ones, I think it is just that some of the inter tile wires have special cases at the edge since they don't have neighbors in some directions
<SpaceCoaster>
So the configuration bits exist but no neighbors to connect to.
freemint has joined ##openfpga
<SpaceCoaster>
What is the e input
<SpaceCoaster>
I see the rest a b c d mi sr ce but no e
<SpaceCoaster>
Oops lslice has e
<omnitechnomancer>
See where?
<omnitechnomancer>
It gets checked last because it was easier to do the starting at 4 that way
<SpaceCoaster>
Datasheet structure diagram. These are the lut inputs
<omnitechnomancer>
The lslices have the LUT able to be used as a 5 input LUT, with the extra mux driven by e or as two 4 input LUTs that share inputs
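The structure described here, two 4-input LUTs sharing inputs with an extra mux driven by `e`, can be modelled in plain Python. The init bit ordering (with `a` as the least significant address bit) and which half `e` selects are assumptions made for this sketch, not facts taken from the Anlogic datasheet.

```python
def lut4(init, a, b, c, d):
    """Look up one bit of a 16-bit init value.

    Assumed ordering: a is the least significant address bit.
    """
    return (init >> (a | (b << 1) | (c << 2) | (d << 3))) & 1


def lut5(init_lo, init_hi, a, b, c, d, e):
    """Two LUT4s sharing a..d, with a 2:1 mux on their outputs driven by e.

    Assumed polarity: e = 0 selects the init_lo half, e = 1 the init_hi half.
    """
    return lut4(init_hi, a, b, c, d) if e else lut4(init_lo, a, b, c, d)


# Example: a 5-input XOR is a 4-input XOR (0x6996) when e = 0
# and its complement (0x9669) when e = 1.
XOR4_INIT = 0x6996
XNOR4_INIT = 0x9669
```

Used as two independent LUT4s, the same hardware would simply expose both `lut4` outputs instead of muxing them.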