freemint has quit [Read error: Connection reset by peer]
genii has quit [Remote host closed the connection]
freeemint has quit [Ping timeout: 250 seconds]
freemint has joined ##openfpga
<mwk>
does anyone have an idea what the "CTLReg" configuration mode / Persist option setting is for Xilinx FPGAs?
<mwk>
I see in UG908 it's one of the possibilities for the BITSTREAM.CONFIG.PERSIST option in Vivado, and it's also accepted by ISE for many FPGA families
<mwk>
oh
<mwk>
guess I figured it out; it's an option that just sets the CTL register bit that enables persist, but doesn't set up the necessary I/O pads
freemint has quit [Ping timeout: 248 seconds]
<mwk>
how boring
<mwk>
also, wtf, why does spartan 3a actually accept X16 setting in there, the thing's not supposed to have a config interface that wide
<mwk>
it... it actually configures 8 more pads than X8 mode as bidirectional, did I just stumble over a secret 16-bit-wide config mode
freemint has joined ##openfpga
freemint has quit [Ping timeout: 250 seconds]
_whitelogger has joined ##openfpga
mumptai_ has joined ##openfpga
mumptai has quit [Ping timeout: 245 seconds]
Bike has quit [Quit: Lost terminal]
MarcelineVQ has quit [Read error: Connection reset by peer]
MarcelineVQ has joined ##openfpga
MarcelineVQ has quit [Read error: Connection reset by peer]
MarcelineVQ has joined ##openfpga
_whitelogger has joined ##openfpga
pie_ has joined ##openfpga
Flea86 has joined ##openfpga
<sensille>
as the artix 7 35T and 50T have the same bitstream lengths, does anyone know if it's possible to use a 35T as a 50T?
<whitequark>
mwk: is it horribly broken somehow?
<whitequark>
sensille: yes
<sensille>
do i need to manipulate the bitstream manually, like changing the device id or something?
<sensille>
oh, and is even the 15T the same die?
emeb_mac has quit [Ping timeout: 272 seconds]
<sensille>
also interesting question, if i implement in vivado for a 15T, will it consider all resources for floorplanning/routing and only restrict the total amount or will it just ignore certain areas of the die?
<whitequark>
it will consider them all for routing
<whitequark>
which can lead to unexpected results if your design is very routing heavy and you scale up
<sensille>
:)
Asu has joined ##openfpga
<sensille>
ok, at least vivado doesn't let me program the stream for a different device as-is. not quite unexpected
<tnt>
the fpga will also reject the bitstream, idcode is burned in there somehow along with checksums.
<sensille>
hm, maybe generate_bitstream doesn't check the resource usage from the previous stage, so i can just implement for 50T and generate a bitstream for 35T from that
<whitequark>
you can edit the bitstream
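[editor's note] A sketch of what "editing the bitstream" could involve for a 7-series image, assuming the UG470 packet format: the expected IDCODE sits right after the type-1 write header 0x30018001 (a one-word write to the IDCODE register, 0x0C). CRC handling is deliberately glossed over; the offsets and register encoding here are assumptions from UG470, not tested against real silicon.

```python
import struct

# Type-1 write to the IDCODE register (reg 0x0C, 1 word) in a Xilinx
# 7-series bitstream is the 32-bit word 0x30018001 (per UG470); the
# following word is the expected IDCODE. This sketch patches that word.
# The stream's CRC words would also need fixing (or replacing the CRC
# check with an RCRC command) -- glossed over here.
IDCODE_WRITE = 0x30018001

def patch_idcode(bitstream: bytes, new_idcode: int) -> bytes:
    data = bytearray(bitstream)
    # scan every byte offset, since .bit headers are variable-length
    for off in range(len(data) - 7):
        (word,) = struct.unpack_from(">I", data, off)
        if word == IDCODE_WRITE:
            struct.pack_into(">I", data, off + 4, new_idcode)
            return bytes(data)
    raise ValueError("IDCODE write packet not found")
```

The device will still reject the result unless the CRC is handled, as tnt points out below.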
<sensille>
xilinx could sell upgrade-codes for already sold chips ...
<sorear>
upgrade codes for the P4 were a thing because P4 money is mostly made by selling to individuals
<sorear>
xilinx does not care about making money from individuals. they make all their money on bulk orders
<sorear>
all qty 1 is effectively engineering samples
<sensille>
yeah, makes sense
<sensille>
and on the upper virtex end all dies are different
Asu has quit [Ping timeout: 246 seconds]
Asu has joined ##openfpga
<sorear>
(that was a SNB era thing, not P4, oops)
<azonenberg>
sensille: not exactly
<azonenberg>
on the bigger chips they have a few virtex dies, then different interposers to mount 2, 3, 4 etc of them
<azonenberg>
sensille: and yes, the 15T and 35T can be loaded to ~100% and still make timing
<azonenberg>
since you have space to spread out
<azonenberg>
when i first saw identical power and bitstream lengths i was all excited
<azonenberg>
i didnt think they were soft-crippling them like this
<azonenberg>
i thought they had actually come up with a way to fuse off a column of CLBs or something to do yield enhancement on large dies
<azonenberg>
so you'd declare say one out of every four columns bad and turn them into bypasses
<sensille>
wouldn't that require different bitstreams per individual chip?
<azonenberg>
no, because the fusing would be done in hardware and you'd have the bitstream remapped as you loaded it
<azonenberg>
so a given LUT config might go to column 3 or 4 depending on the fuse setting
<sensille>
well, yes, wouldn't that change timing? :)
<sorear>
if you did it at a column granularity you could use the same PnR BUT your timing analyses would be pessimistic because you'd have to assume every path was longer
<azonenberg>
Yes, the fused chips would have to be pessimistic timing
<azonenberg>
conjecture: the fused chips would only come in -1 and -2
<azonenberg>
this sort of remapping is done in SRAMs for yield enhancement already
<sorear>
but it turns out they have an, um, different way to dispose of rejects
<azonenberg>
you have a 2:1 mux on every column
<sensille>
oh, now you tell me -1, -2, -3 are also the same dies?
<azonenberg>
so if column X is bad, columns 0...X-1 map to 0...X-1 and columns X...top map to X+1...top+1
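[editor's note] The skip-one-column remap azonenberg describes (a hypothetical yield-enhancement scheme, not an actual Xilinx mechanism) can be sketched as:

```python
def remap_column(logical_col: int, bad_col: int) -> int:
    """2:1-mux style remap: columns below the fused-off bad column pass
    straight through; columns at or above it shift up by one, so the bad
    physical column is never addressed. Hypothetical sketch of the scheme
    discussed in the conversation."""
    return logical_col if logical_col < bad_col else logical_col + 1
```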
<azonenberg>
sensille: well duh
<azonenberg>
that's all just process variation
<sorear>
you run tests on the chip at various speeds, then laser-mark it with the highest speed it works at
<sorear>
standard practice "binning"
<sensille>
and it might be that just a single CLB is too slow
<azonenberg>
Yeah, or one GTP, or something
<sensille>
well, hard to make use of that anyway
<azonenberg>
and it might only be too slow at extremes of the temperature ranges anyway
<azonenberg>
So you can very often push a chip a fair bit past the timing limits if you have tight voltage specs, are running at controlled room temperature, and are willing to live a bit dangerously
<azonenberg>
i wouldnt ever want to put such a chip into an important application but for testing while you optimize, you can usually assume correct behavior means you got lucky
<azonenberg>
incorrect behavior could be an rtl bug or your timing problem so there's no way to know
<sensille>
like for a cryptocurrency miner or something
<azonenberg>
thats a horrible example because those run super hot ;p
<azonenberg>
i'm more thinking if your SoC has one path 20ps over the limit
<sorear>
it's also a bad example because you can verify the result after the fact
<azonenberg>
99.9% likely it will work fine on a lab bench
<sensille>
but a glitch from time to time wouldn't matter
<sorear>
so you can and should run a miner at PVT/frequency levels where 1% of the results are bogus
<azonenberg>
sorear: wont pools kick you out if you have too many false positives?
<whitequark>
you don't let that go to the pool duh
<azonenberg>
or does the control software double-check and reject those before they go upstream?
<azonenberg>
i guess if you confirm before submission that works
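[editor's note] The confirm-before-submission idea is cheap to do on the host: re-check each candidate share on the (reliable) CPU so glitched results from an overclocked FPGA never reach the pool. A minimal sketch assuming a Bitcoin-style double SHA-256 proof of work; the function name and target convention are illustrative.

```python
import hashlib

def share_is_valid(header: bytes, target: int) -> bool:
    # Recompute the double SHA-256 of the block header on the CPU and
    # compare it to the target; only shares that pass get submitted.
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return int.from_bytes(digest, "little") <= target
```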
<sensille>
anyway, PoW currencies are dumb
<sorear>
yes
* azonenberg
is still waiting for someone to make a PoW that does useful work
<whitequark>
that exists
<azonenberg>
bruteforcing sha256 is stupid, but if we could get all the buttcoin miners to do protein folding or something...
<sorear>
I've pitched nfscoin to you haven't I
<azonenberg>
sorear: i think me and rqou called it nsacoin
<sensille>
hehe
<azonenberg>
but same idea
<azonenberg>
GNFS relations as a pow?
<sorear>
yes
<sorear>
hint: modern number field "sieves" don't actually sieve, they find smooth numbers by trial factorization (using ECM), so the relations can be tested at random
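[editor's note] sorear's point, that candidate relations can be tested independently at random, comes down to checking whether a value is B-smooth. A toy version using trial division (real implementations use ECM, as noted above):

```python
def is_smooth(n: int, bound: int) -> bool:
    """Return True if every prime factor of n is <= bound, i.e. n is
    'bound-smooth'. Trial-division stand-in for the ECM-based test
    mentioned above; only sensible at toy sizes."""
    d = 2
    while d * d <= n and d <= bound:
        while n % d == 0:
            n //= d
        d += 1
    # whatever survives is either 1 or a prime factor; it must be small
    return n <= bound
```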
<azonenberg>
@_@
<azonenberg>
Modern factorization algorithms are far beyond my comprehension
<azonenberg>
i grok the basics of RSA
<azonenberg>
but even elliptic curve stuff i have a hard time understanding, and i say this as someone in the middle of implementing curve25519
<sorear>
I could implement QS, for which the same claim applies
<sorear>
not NFS :(
Jybz has joined ##openfpga
<pie_>
azonenberg: i suppose its a good thing 25519 is supposed to be easy to implement? :)
<sensille>
easy to implement, hard to understand
<pie_>
sure
_whitelogger has joined ##openfpga
<azonenberg>
sensille/pie: i'm porting the nacl C "ref" implementation to FPGA
<azonenberg>
it was the least optimized one i could find from a trustworthy source, which made it the easiest to grok
<azonenberg>
and undo a lot of the bignum stuff that's better handled with large integers on an FPGA
pepijndevos has joined ##openfpga
<sorear>
you probably still want to do field multiplications over multiple cycles?
<sorear>
a 255x255 multiplier is Kinda Big
<sensille>
azonenberg likes it Big
<whitequark>
lmao
<whitequark>
i like big MULs and i cannot lie
<azonenberg>
yes the field multiplication is being done multicycle
<azonenberg>
i'm still figuring out exact details of how much parallelism vs area to do
<azonenberg>
i will probably end up with some sort of microcode then a bunch of 255-bit mul/add/sub cores
<azonenberg>
and a script showing how to sequence it all
<azonenberg>
but details are TBD
<azonenberg>
right now i'm focusing on doing all the primitives, then i'll worry about how to hook them up
<sorear>
nice thing about mul/add/sub is that you can do all of them without leaving a redundant-carry representation
<sorear>
not sure what vivado does with a 255-wide adder, or if you're trying to do modular reduction at every step
<azonenberg>
I do reductions periodically to keep it from getting too big
<azonenberg>
i actually have two different representations
<azonenberg>
for add/sub and a few other things i have a 264-bit integer that allows room for a carry out that hasn't yet been reduced
<azonenberg>
for mul i have an array of 32 8/16/32 bit integers (depending on where i am in the process)
<azonenberg>
which are sized to fit the fpga multiplier blocks
<azonenberg>
optimizing that for efficiency and a better fit is a TODO as well, right now its a pretty literal port of the C version
<azonenberg>
but i figure once i have a hdl ref implementation i can produce something more complex then try to do equivalency checks or something
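[editor's note] The multi-limb layout azonenberg describes (an array of small integers sized to the DSP blocks, folded back into the field periodically) looks roughly like this in software terms. This is a sketch using 32 radix-2^8 limbs as in the nacl "ref" layout, not his actual HDL; the final carry/reduce step cheats with Python bignums where hardware would use a carry chain.

```python
P = 2**255 - 19  # the curve25519 field prime

def limbs(x: int) -> list[int]:
    # radix-2^8 limbs, least significant first (nacl "ref"-style layout)
    return [(x >> (8 * i)) & 0xFF for i in range(32)]

def field_mul(a: list[int], b: list[int]) -> list[int]:
    # schoolbook multiply into a 64-limb double-width accumulator;
    # each t[k] holds an unreduced partial sum (the redundant form
    # sorear mentions, where carries are deferred)
    t = [0] * 64
    for i in range(32):
        for j in range(32):
            t[i + j] += a[i] * b[j]
    # fold the high half back down using 2^256 = 38 (mod 2^255 - 19)
    for k in range(32, 64):
        t[k - 32] += 38 * t[k]
    # final carry propagation and modular reduction, done with Python
    # bignums for brevity
    val = sum(t[i] << (8 * i) for i in range(32)) % P
    return limbs(val)
```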
_whitelogger has joined ##openfpga
mumptai has joined ##openfpga
freemint has joined ##openfpga
<freemint>
How many times slower than real time is your GHDL compared to your FPGA?
<freemint>
Are there environments which run GHDL much faster?
<ZirconiumX>
I mean, GHDL is a simulator, right?
<freemint>
Yes
<ZirconiumX>
FPGAs are difficult to simulate well in software, even with something like Verilator
<freemint>
Is GHDL's internal time being 4000 times slower than wall clock time good or bad?
<ZirconiumX>
Unfortunately I don't know simulators well enough to compare that with things like Icarus or Verilator
<whitequark>
4000 seems really fast
<freemint>
I am currently not sure what exactly the CPU executes so it might just be idling
<freemint>
turns out when i simulate short timeframes the slowdown spikes to 30000-40000x
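[editor's note] That spike at short timeframes is what you'd expect if a fixed startup cost (elaboration, waveform setup, etc.) dominates: measured slowdown is (overhead + k * t_sim) / t_sim, which blows up as t_sim shrinks. A toy model with made-up overhead numbers:

```python
def measured_slowdown(sim_time_s: float, startup_s: float,
                      per_sim_second_s: float) -> float:
    # wall-clock time = fixed startup cost + cost proportional to
    # simulated time; dividing by simulated time gives the apparent
    # slowdown factor, which is inflated for short simulations
    return (startup_s + per_sim_second_s * sim_time_s) / sim_time_s
```

With a hypothetical 3.6 s startup and a true 4000x simulation rate, a 100 us run already reports ~40000x.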
<ZirconiumX>
Actually, if GHDL can simulate it, then GHDLsynth *should* be able to convert it to RTLIL
<ZirconiumX>
AKA Yosys internal language
<freemint>
I only got an LX9 board, which runs j2, which i will not try to flash rn.
Bike has joined ##openfpga
<pepijndevos>
Note that GHDL has several backends, for speed you probably want to use LLVM or GCC backends
<pepijndevos>
And in particular *not* the interpreter backend. Not sure how fast mcode is compared to those
<pepijndevos>
>Actually, if GHDL can simulate it, then GHDLsynth *should* be able to convert it to RTLIL
<pepijndevos>
This is not true
<pepijndevos>
It will eventually be true, but there are a lot of GHDL IIR types that are not synthesizable yet.
<freemint>
I am not sure which back-end is used, but there are some .o files generated during the first steps of the script and then i have some binary i execute.
freemint has quit [Remote host closed the connection]