freemint has quit [Read error: Connection reset by peer]
genii has quit [Remote host closed the connection]
freeemint has quit [Ping timeout: 250 seconds]
freemint has joined ##openfpga
<mwk>
does anyone have an idea what the "CTLReg" configuration mode / Persist option setting is for Xilinx FPGAs?
<mwk>
I see in UG908 it's one of the possibilities for the BITSTREAM.CONFIG.PERSIST option in Vivado, and it's also accepted by ISE for many FPGA families
<mwk>
oh
<mwk>
guess I figured it out; it's an option that just sets the CTL register bit that enables persist, but doesn't set up the necessary I/O pads
freemint has quit [Ping timeout: 248 seconds]
<mwk>
how boring
<mwk>
also, wtf, why does spartan 3a actually accept X16 setting in there, the thing's not supposed to have a config interface that wide
<mwk>
it... it actually configures 8 more pads than X8 mode as bidirectional, did I just stumble over a secret 16-bit-wide config mode
freemint has joined ##openfpga
freemint has quit [Ping timeout: 250 seconds]
_whitelogger has joined ##openfpga
mumptai_ has joined ##openfpga
mumptai has quit [Ping timeout: 245 seconds]
Bike has quit [Quit: Lost terminal]
MarcelineVQ has quit [Read error: Connection reset by peer]
MarcelineVQ has joined ##openfpga
MarcelineVQ has quit [Read error: Connection reset by peer]
MarcelineVQ has joined ##openfpga
_whitelogger has joined ##openfpga
pie_ has joined ##openfpga
Flea86 has joined ##openfpga
<sensille>
as the artix 7 35T and 50T have the same bitstream lengths, does anyone know if it's possible to use a 35T as a 50T?
<whitequark>
mwk: is it horribly broken somehow?
<whitequark>
sensille: yes
<sensille>
do i need to manipulate the bitstream manually, like changing the device id or something?
<sensille>
oh, and is even the 15T the same die?
emeb_mac has quit [Ping timeout: 272 seconds]
<sensille>
also interesting question, if i implement in vivado for a 15T, will it consider all resources for floorplanning/routing and only restrict the total amount or will it just ignore certain areas of the die?
<whitequark>
it will consider them all for routing
<whitequark>
which can lead to unexpected results if your design is very routing heavy and you scale up
<sensille>
:)
Asu has joined ##openfpga
<sensille>
ok, at least vivado doesn't let me program the stream for a different device as-is. not quite unexpected
<tnt>
the fpga will also reject the bitstream, idcode is burned in there somehow along with checksums.
<sensille>
hm, maybe generate_bitstream doesn't check the resource usage from the previous stage, so i can just implement for 50T and generate a bitstream for 35T from that
<whitequark>
you can edit the bitstream
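[editor's note] A sketch of what "editing the bitstream" could involve for a 7-series image, assuming the UG470 packet format: the expected IDCODE sits right after the type-1 write header 0x30018001 (a one-word write to the IDCODE register, 0x0C). CRC handling is deliberately glossed over; the offsets and register encoding here are assumptions from UG470, not tested against real silicon.

```python
import struct

# Type-1 write to the IDCODE register (reg 0x0C, 1 word) in a Xilinx
# 7-series bitstream is the 32-bit word 0x30018001 (per UG470); the
# following word is the expected IDCODE. This sketch patches that word.
# The stream's CRC words would also need fixing (or replacing the CRC
# check with an RCRC command) -- glossed over here.
IDCODE_WRITE = 0x30018001

def patch_idcode(bitstream: bytes, new_idcode: int) -> bytes:
    data = bytearray(bitstream)
    # scan every byte offset, since .bit headers are variable-length
    for off in range(len(data) - 7):
        (word,) = struct.unpack_from(">I", data, off)
        if word == IDCODE_WRITE:
            struct.pack_into(">I", data, off + 4, new_idcode)
            return bytes(data)
    raise ValueError("IDCODE write packet not found")
```

The device will still reject the result unless the CRC is handled, as tnt points out below.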
<sensille>
xilinx could sell upgrade-codes for already sold chips ...
<sorear>
upgrade codes for the P4 were a thing because P4 money is mostly made by selling to individuals
<sorear>
xilinx does not care about making money from individuals. they make all their money on bulk orders
<sorear>
all qty 1 is effectively engineering samples
<sensille>
yeah, makes sense
<sensille>
and on the upper virtex end all dies are different
Asu has quit [Ping timeout: 246 seconds]
Asu has joined ##openfpga
<sorear>
(that was a SNB era thing, not P4, oops)
<azonenberg>
sensille: not exactly
<azonenberg>
on the bigger chips they have a few virtex dies, then different interposers to mount 2, 3, 4 etc of them
<azonenberg>
sensille: and yes, the 15T and 35T can be loaded to ~100% and still make timing
<azonenberg>
since you have space to spread out
<azonenberg>
when i first saw identical power and bitstream lengths i was all excited
<azonenberg>
i didnt think they were soft-crippling them like this
<azonenberg>
i thought they had actually come up with a way to fuse off a column of CLBs or something to do yield enhancement on large dies
<azonenberg>
so you'd declare say one out of every four columns bad and turn them into bypasses
<sensille>
wouldn't that require different bitstreams per individual chip?
<azonenberg>
no, because the fusing would be done in hardware and you'd have the bitstream remapped as you loaded it
<azonenberg>
so a given LUT config might go to column 3 or 4 depending on the fuse setting
<sensille>
well, yes, wouldn't that change timing? :)
<sorear>
if you did it at a column granularity you could use the same PnR BUT your timing analyses would be pessimistic because you'd have to assume every path was longer
<azonenberg>
Yes, the fused chips would have to be pessimistic timing
<azonenberg>
conjecture: the fused chips would only come in -1 and -2
<azonenberg>
this sort of remapping is done in SRAMs for yield enhancement already
<sorear>
but it turns out they have an, um, different way to dispose of rejects
<azonenberg>
you have a 2:1 mux on every column
<sensille>
oh, now you tell me -1, -2, -3 are also the same dies?
<azonenberg>
so if column X is bad, columns 0...X-1 map to 0...X-1 and columns X...top map to X+1...top+1
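[editor's note] The skip-one-column remap azonenberg describes (a hypothetical yield-enhancement scheme, not an actual Xilinx mechanism) can be sketched as:

```python
def remap_column(logical_col: int, bad_col: int) -> int:
    """2:1-mux style remap: columns below the fused-off bad column pass
    straight through; columns at or above it shift up by one, so the bad
    physical column is never addressed. Hypothetical sketch of the scheme
    discussed in the conversation."""
    return logical_col if logical_col < bad_col else logical_col + 1
```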
<azonenberg>
sensille: well duh
<azonenberg>
that's all just process variation
<sorear>
you run tests on the chip at various speeds, then laser-mark it with the highest speed it works at
<sorear>
standard practice "binning"
<sensille>
and it might be that just a single CLB is too slow
<azonenberg>
Yeah, or one GTP, or something
<sensille>
well, hard to make use of that anyway
<azonenberg>
and it might only be too slow at extremes of the temperature ranges anyway
<azonenberg>
So you can very often push a chip a fair bit past the timing limits if you have tight voltage specs, are running at controlled room temperature, and are willing to live a bit dangerously
<azonenberg>
i wouldnt ever want to put such a chip into an important application but for testing while you optimize, you can usually assume correct behavior means you got lucky
<azonenberg>
incorrect behavior could be an rtl bug or your timing problem so there's no way to know
<sensille>
like for a cryptocurrency miner or something
<azonenberg>
thats a horrible example because those run super hot ;p
<azonenberg>
i'm more thinking if your SoC has one path 20ps over the limit
<sorear>
it's also a bad example because you can verify the result after the fact
<azonenberg>
99.9% likely it will work fine on a lab bench
<sensille>
but a glitch from time to time wouldn't matter
<sorear>
so you can and should run a miner at PVT/frequency levels where 1% of the results are bogus
<azonenberg>
sorear: wont pools kick you out if you have too many false positives?
<whitequark>
you don't let that go to the pool duh
<azonenberg>
or does the control software double-check and reject those before they go upstream?
<azonenberg>
i guess if you confirm before submission that works
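[editor's note] The confirm-before-submission idea is cheap to do on the host: re-check each candidate share on the (reliable) CPU so glitched results from an overclocked FPGA never reach the pool. A minimal sketch assuming a Bitcoin-style double SHA-256 proof of work; the function name and target convention are illustrative.

```python
import hashlib

def share_is_valid(header: bytes, target: int) -> bool:
    # Recompute the double SHA-256 of the block header on the CPU and
    # compare it to the target; only shares that pass get submitted.
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return int.from_bytes(digest, "little") <= target
```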
<sensille>
anyway, PoW currencies are dumb
<sorear>
yes
* azonenberg
is still waiting for someone to make a PoW that does useful work
<whitequark>
that exists
<azonenberg>
bruteforcing sha256 is stupid, but if we could get all the buttcoin miners to do protein folding or something...
<sorear>
I've pitched nfscoin to you haven't I
<azonenberg>
sorear: i think me and rqou called it nsacoin
<sensille>
hehe
<azonenberg>
but same idea
<azonenberg>
GNFS relations as a pow?
<sorear>
yes
<sorear>
hint: modern number field "sieves" don't actually sieve, they find smooth numbers by trial factorization (using ECM), so the relations can be tested at random
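[editor's note] sorear's point, that candidate relations can be tested independently at random, comes down to checking whether a value is B-smooth. A toy version using trial division (real implementations use ECM, as noted above):

```python
def is_smooth(n: int, bound: int) -> bool:
    """Return True if every prime factor of n is <= bound, i.e. n is
    'bound-smooth'. Trial-division stand-in for the ECM-based test
    mentioned above; only sensible at toy sizes."""
    d = 2
    while d * d <= n and d <= bound:
        while n % d == 0:
            n //= d
        d += 1
    # whatever survives is either 1 or a prime factor; it must be small
    return n <= bound
```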
<azonenberg>
@_@
<azonenberg>
Modern factorization algorithms are far beyond my comprehension
<azonenberg>
i grok the basics of RSA
<azonenberg>
but even elliptic curve stuff i have a hard time understanding, and i say this as someone in the middle of implementing curve25519
<sorear>
I could implement QS, for which the same claim applies
<sorear>
not NFS :(
Jybz has joined ##openfpga
<pie_>
azonenberg: i suppose its a good thing 25519 is supposed to be easy to implement? :)
<sensille>
easy to implement, hard to understand
<pie_>
sure
_whitelogger has joined ##openfpga
<azonenberg>
sensille/pie: i'm porting the nacl C "ref" implementation to FPGA
<azonenberg>
it was the least optimized one i could find from a trustworthy source, which made it the easiest to grok
<azonenberg>
and undo a lot of the bignum stuff that's better handled with large integers on an FPGA
pepijndevos has joined ##openfpga
<sorear>
you probably still want to do field multiplications over multiple cycles?
<sorear>
a 255x255 multiplier is Kinda Big
<sensille>
azonenberg likes it Big
<whitequark>
lmao
<whitequark>
i like big MULs and i cannot lie
<azonenberg>
yes the field multiplication is being done multicycle
<azonenberg>
i'm still figuring out exact details of how much parallelism vs area to do
<azonenberg>
i will probably end up with some sort of microcode then a bunch of 255-bit mul/add/sub cores
<azonenberg>
and a script showing how to sequence it all
<azonenberg>
but details are TBD
<azonenberg>
right now i'm focusing on doing all the primitives, then i'll worry about how to hook them up
<sorear>
nice thing about mul/add/sub is that you can do all of them without leaving a redundant-carry representation
<sorear>
not sure what vivado does with a 255-wide adder, or if you're trying to do modular reduction at every step
<azonenberg>
I do reductions periodically to keep it from getting too big
<azonenberg>
i actually have two different representations
<azonenberg>
for add/sub and a few other things i have a 264-bit integer that allows room for a carry out that hasn't yet been reduced
<azonenberg>
for mul i have an array of 32 8/16/32 bit integers (depending on where i am in the process)
<azonenberg>
which are sized to fit the fpga multiplier blocks
<azonenberg>
optimizing that for efficiency and a better fit is a TODO as well, right now its a pretty literal port of the C version
<azonenberg>
but i figure once i have a hdl ref implementation i can produce something more complex then try to do equivalency checks or something
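[editor's note] The multi-limb layout azonenberg describes (an array of small integers sized to the DSP blocks, folded back into the field periodically) looks roughly like this in software terms. This is a sketch using 32 radix-2^8 limbs as in the nacl "ref" layout, not his actual HDL; the final carry/reduce step cheats with Python bignums where hardware would use a carry chain.

```python
P = 2**255 - 19  # the curve25519 field prime

def limbs(x: int) -> list[int]:
    # radix-2^8 limbs, least significant first (nacl "ref"-style layout)
    return [(x >> (8 * i)) & 0xFF for i in range(32)]

def field_mul(a: list[int], b: list[int]) -> list[int]:
    # schoolbook multiply into a 64-limb double-width accumulator;
    # each t[k] holds an unreduced partial sum (the redundant form
    # sorear mentions, where carries are deferred)
    t = [0] * 64
    for i in range(32):
        for j in range(32):
            t[i + j] += a[i] * b[j]
    # fold the high half back down using 2^256 = 38 (mod 2^255 - 19)
    for k in range(32, 64):
        t[k - 32] += 38 * t[k]
    # final carry propagation and modular reduction, done with Python
    # bignums for brevity
    val = sum(t[i] << (8 * i) for i in range(32)) % P
    return limbs(val)
```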
_whitelogger has joined ##openfpga
mumptai has joined ##openfpga
freemint has joined ##openfpga
<freemint>
How many times slower than real time is your GHDL compared to your FPGA?
<freemint>
Are there environments which run GHDL much faster?
<ZirconiumX>
I mean, GHDL is a simulator, right?
<freemint>
Yes
<ZirconiumX>
FPGAs are difficult to simulate well in software, even with something like Verilator
<freemint>
Is GHDL's internal time being 4000 times slower than wall clock time good or bad?
<ZirconiumX>
Unfortunately I don't know simulators well enough to compare that with things like Icarus or Verilator
<whitequark>
4000 seems really fast
<freemint>
I am currently not sure what exactly the CPU executes so it might just be idling
<freemint>
turns out when i simulate short timeframes the slowdown spikes to 30000-40000x
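[editor's note] That spike at short timeframes is what you'd expect if a fixed startup cost (elaboration, waveform setup, etc.) dominates: measured slowdown is (overhead + k * t_sim) / t_sim, which blows up as t_sim shrinks. A toy model with made-up overhead numbers:

```python
def measured_slowdown(sim_time_s: float, startup_s: float,
                      per_sim_second_s: float) -> float:
    # wall-clock time = fixed startup cost + cost proportional to
    # simulated time; dividing by simulated time gives the apparent
    # slowdown factor, which is inflated for short simulations
    return (startup_s + per_sim_second_s * sim_time_s) / sim_time_s
```

With a hypothetical 3.6 s startup and a true 4000x simulation rate, a 100 us run already reports ~40000x.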
<ZirconiumX>
Actually, if GHDL can simulate it, then GHDLsynth *should* be able to convert it to RTLIL
<ZirconiumX>
AKA Yosys internal language
<freemint>
I only got an LX9 board, which runs j2, which i will not try to flash rn.
Bike has joined ##openfpga
<pepijndevos>
Note that GHDL has several backends, for speed you probably want to use LLVM or GCC backends
<pepijndevos>
And in particular *not* the interpreter backend. Not sure how fast mcode is compared to those
<pepijndevos>
>Actually, if GHDL can simulate it, then GHDLsynth *should* be able to convert it to RTLIL
<pepijndevos>
This is not true
<pepijndevos>
It will eventually be true, but there are a lot of GHDL IIR types that are not synthesizable yet.
<freemint>
I am not sure which back-end is used, but there are some .o files generated during the first steps of the script and then i have some binary i execute.
freemint has quit [Remote host closed the connection]