##openfpga on 2019-04-21 — irc logs at freenode.irclog.whitequark.org

00:32 _whitelogger has joined ##openfpga

00:57 futarisIRCcloud has joined ##openfpga

01:23 _whitelogger has joined ##openfpga

01:46 dj_pi has quit [Ping timeout: 246 seconds]

01:52 unixb0y has quit [Ping timeout: 245 seconds]

01:56 unixb0y has joined ##openfpga

02:11 balrog has quit [Quit: Bye]

02:16 balrog has joined ##openfpga

02:41 dj_pi has joined ##openfpga

02:48 ZombieChicken has quit [Ping timeout: 256 seconds]

02:50 <whitequark> eddyb: you can just use a PCI-to-PCIe bridge chip

02:50 ZombieChicken has joined ##openfpga

02:54 gsi__ has joined ##openfpga

02:57 gsi_ has quit [Ping timeout: 244 seconds]

03:01 dj_pi has quit [Ping timeout: 246 seconds]

03:08 <TD-Linux> eddyb, I would start with booting it tbh

03:09 <TD-Linux> tbh I would start with a 3.3v 386sx. they are slow enough that you can sigrok them with a cheap logic analyzer

03:16 <gruetzkopf> ?

03:16 <gruetzkopf> the highlight on pci maaay have been a stupid idea

03:17 <sorear> we *could* be discussing pci-dss

03:29 <whitequark> lol

03:31 genii has joined ##openfpga

03:41 genii has quit [Remote host closed the connection]

04:52 _whitelogger has joined ##openfpga

05:03 ZombieChicken has quit [Remote host closed the connection]

05:03 ZombieChicken has joined ##openfpga

05:08 flea86 has quit [Quit: Goodbye and thanks for all the dirty sand ;-)]

05:23 <eddyb> TD-Linux: yeah the logic levels on the Pentium FSBs seem obnoxious, but 66 or 100MHz aren't that hard to reach, right? anyway, I agree, modulo not having a 386 on hand already (but if they're cheap, eh)

05:24 <TD-Linux> yeah you should be able to reach that.

05:25 <eddyb> whitequark: heh, I suppose that is easier than spending who knows how long building an opensource version of such a bridge in an FPGA

05:32 <whitequark> 66 or 100 MHz isn't that easy to get reliably working on a parallel bus

05:32 <whitequark> remember how many months it took me to get stuff working on glasgow?

05:34 <eddyb> TD-Linux: oh wow there aren't that many steps between 386 and NetBurst. anyway, 486 seems like it's slow enough for Glasgow (although you wouldn't be able to connect it to everything)

05:35 <TD-Linux> er I thought you were using ecp5

05:35 <eddyb> whitequark: oops I was scrolled up. ugh reality of digital signals strikes again

05:35 <eddyb> TD-Linux: depends, I'd much rather trust a Glasgow to do initial poking (esp to watch an already running system) than my own thing

05:36 <whitequark> you don't have enough pins on a glasgow to watch a huge parallel bus

05:36 <whitequark> you have sixteen.

05:37 <whitequark> you need what, 96 at a minimum?

05:37 <whitequark> that's a fuckton of pins and at 66 MHz it is uhhh

05:37 <eddyb> ah, sure, I meant more the control lines

05:37 <whitequark> 6 Gbps?

05:37 <whitequark> not even Glasgow revE will be able to dump that straight into a PC
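
A back-of-the-envelope check of the figure above. This is only a sketch: it assumes 96 pins, each sampled once per 66 MHz bus cycle, with no compression or framing overhead. The function name is made up for illustration.

```python
# Raw capture bandwidth for a parallel bus sampled once per bus clock.
# Assumption: 1 bit per pin per cycle, no compression, no protocol overhead.

def raw_capture_bps(pins: int, clock_hz: float) -> float:
    """Return the uncompressed sample rate in bits per second."""
    return pins * clock_hz

bps = raw_capture_bps(pins=96, clock_hz=66e6)
print(f"{bps / 1e9:.2f} Gbit/s")  # 6.34 Gbit/s
```

That lands on the "6 Gbps" ballpark quoted in the channel; watching only 16 lines at the same clock would be one sixth of that.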

05:38 <TD-Linux> for a 386 I was mostly debugging it running off a boot ROM so watching just the address lines was enough as I could trace along with a ROM dump

05:38 <whitequark> yeah, you could do compression and all sorts of stuff

05:38 <TD-Linux> (I only saw 16 lines so even that only worked for early boot)

05:38 <TD-Linux> 386sx has only 16 data lines which helps

05:38 <whitequark> ahhh

05:39 <eddyb> TD-Linux: ahh right that is a neat trick. I was thinking some address lines and some of the control lines, but forgot instruction memory

05:39 <whitequark> so it might be possible to run several glasgows in parallel

05:39 <whitequark> using the sync port

05:40 <eddyb> TD-Linux: oh, that's a benefit to using 386 then, 486 has 32 data lines

05:41 <eddyb> oh hey https://blog.lse.epita.fr/articles/77-lsepc-intro.html

05:42 <eddyb> TD-Linux: oops, I misunderstood what the SX meant

05:44 <eddyb> or rather, 486SX doesn't get the bus width downgrade, just no integrated FPU AFAICT

05:46 <TD-Linux> do you mean the Processor Formerly Known as the i486?

05:47 <TD-Linux> eddyb, certainly faster processors would be cooler. but I think 386->486->pentium has only incremental bus changes so you could scale up over time

05:47 <sorear> [34]86, or, "back in my day we did multiprocessing *without* anything equivalent to CAS"

05:51 <TD-Linux> sorear, but how do you modern folks do multitasking without a TSS???

05:52 <sorear> tss is mostly ignorable? it never did much of anything you couldn't do otherwise on other architectures

05:52 <sorear> it's of no theoretical impact

05:54 <whitequark> sorear: the iopl map is mildly interesting

05:54 <whitequark> it's like a shitty version of paging though

05:54 <whitequark> with better granularity

05:54 <sorear> the iopl map is sorta weirdly taped on to the tss

05:54 <fseidel> only part of TSS that really matters is ESP0, and you can

05:54 <fseidel> usually get a register for that on most ISAs

05:54 <sorear> i don't really understand the gotcha td-linux is trying to pull on me

05:55 <TD-Linux> I was being ironic™

05:55 <eddyb> oh hey why do the datasheets talk about address lines also being inputs for cache invalidation?

05:55 <fseidel> ah okay

05:56 <whitequark> eddyb: dma by peripherals into main ram, i think

05:56 <eddyb> can you do anything close to ccNUMA with 486s?

05:56 <TD-Linux> up until recent windows on 32 bit you could get direct io port access from a process. I dunno if that was via TSS or some other mechanism though

05:57 <eddyb> whitequark: oh, that makes far more sense than anything I could think of

05:57 <TD-Linux> this is relevant for my windows xp vm with a pci express parallel port iommu'd into it

05:57 * eddyb blinks

06:01 <eddyb> TD-Linux: I'm trying to figure out if 386SX could do PCI, is it at all possible with the 16-bit bus?

06:40 rohitksingh has joined ##openfpga

06:41 <TD-Linux> the pci bus is generally not directly attached to the fsb, so yes

06:41 <TD-Linux> though I don't know of any 386 machines that had pci

06:54 Jybz has joined ##openfpga

06:56 emeb_mac has quit [Ping timeout: 246 seconds]

07:10 rohitksingh has quit [Ping timeout: 246 seconds]

07:15 <eddyb> TD-Linux: hmm, can you regroup the 16-bit access? or am I silly and they are 32-bit over several clock cycles? or does PCI not require 32-bit?

07:17 <TD-Linux> it's just 32 bit in two halves

07:36 <eddyb> TD-Linux: cursed idea: run qemu on a RISC-V CPU on the FPGA

07:36 <whitequark> what's cursed about it

07:36 <whitequark> it's just slow

07:38 <eddyb> writing your own emulation, sure, but running a software emulator inside a hardware emulator... well, it would be tricky to provide e.g. ethernet to that, as opposed to a PCI ethernet board (or one implemented in the FPGA)

07:41 <whitequark> an fpga is not an emulator

07:42 <whitequark> it implements that circuit, it doesn't implement a different circuit with a behavior that is equivalent in some way

07:43 Jybz has quit [Ping timeout: 255 seconds]

07:43 <daveshah> Synopsys would disagree...

07:43 <daveshah> https://www.synopsys.com/verification/emulation/zebu-server.html

07:43 Jybz has joined ##openfpga

07:43 <whitequark> hrm

07:44 <whitequark> "Xilinx UltraScale FPGA, the industry’s most advanced emulation chip"

07:44 * whitequark stares

07:45 <whitequark> eddyb: anyway it's not much of a problem, you use the OS network driver in qemu and then implement the driver for whatever emulated ethernet you have on the FPGA

07:45 <whitequark> it's very doable

07:45 <whitequark> you bring it up step by step

07:46 <eddyb> yeah but isn't it easier to just skip the qemu and RISC-V and hook up the ethernet to the 386 directly?

07:47 <whitequark> yes? you were the one who suggested qemu in the first place

07:47 <whitequark> i mean, it's not always easier

07:48 <sorear> it's an emulator, just look at the number of I/O standards it can support

07:48 <sorear> /notentirelyserious

07:48 <whitequark> if you already have linux running on risc-v in an fpga, it's easier to run qemu on it than to connect a 386 to it

07:48 <whitequark> in no small part because well

07:48 <whitequark> what kind of stuff would you run on that 386? DOS? have you seen how much of a nightmare DOS networking is?

07:48 <eddyb> either way, by "hardware emulator" I meant the chipset/motherboard emulation by gateware/uC... wait, is this one of those "emulator" vs "simulator" things?

07:49 <eddyb> I barely memorized concurrency vs parallelism :P

07:49 wren6991 has joined ##openfpga

07:50 <eddyb> whitequark: ah, yes, then I agree. either by using someone else's SoC or building it completely without the 386 connected to it

07:52 <eddyb> at that point you might be able to skip the FPGA if you can get RAM working without the uC being in the loop for that. but what's the fun in that?

07:53 <whitequark> what

08:01 Jybz has quit [Quit: Konversation terminated!]

08:29 <xobs> Hey all. Does anyone here have any familiarity with some of the features of the ICE40 SB_WARMBOOT?

08:30 <xobs> In particular, I'm looking for a way to pass state between reboots.

08:30 <xobs> I know the Lattice documentation says that you can leave BRAM uninitialized, but I haven't seen /how/ to do that. Except maybe to leave that part out of the bitstream.

08:31 <sorear> I'm pretty sure that's exactly what that means

08:32 <sorear> the bitstream is a list of commands, either it contains an "initialize this BRAM with this data" command or it doesn't

08:38 <xobs> sorear: do you have more information? nextpnr says that I'm using 29/30 "ICESTORM_RAM" blocks, but when I look at the bitstream I only see it setting 4 BRAM banks.

08:39 <xobs> And is it possible to only partially initialize a BRAM bank? Or is it an all-or-nothing kind of deal?

08:39 <sorear> i would assume, that the former is all RAMs that are inferred while the latter is just those with an `initial` value

08:40 <sorear> I don't know, but if a HDL memory larger than 256x16 is split among multiple physical BRAMs, you could partially initialize the HDL-level memory

08:42 <xobs> I guess the question then becomes: how can I ensure a BRAM is uninitialized, and fixed to a given ICESTORM_RAM unit, such that multiple bitstreams share the same uninitialized memory?

08:46 <daveshah> You can manually lock a BRAM's position by inferring a SB_RAM40_4K and setting the "BEL" attribute to the name of a BRAM bel (see the list of bels in the GUI)

08:46 <daveshah> nextpnr always initialises all four BRAM quadrants currently

08:47 <daveshah> You'd have to modify it to not initialise BRAM quadrants containing no initialised BRAMs

08:48 <xobs> daveshah: ooh, cool! that's very promising.

08:48 <xobs> and it sure beats the approach I was considering taking, which was to write something into the external SPI flash indicating state.

08:49 <daveshah> If Yosys' inferred BRAM naming is stable enough (I think it is but you'd have to check), you could also set placement constraints using a Python pre-pack script

08:49 Asu has joined ##openfpga

08:49 <daveshah> eg ctx.cells["name"].attrs["BEL"] = "BEL name"

08:50 <xobs> I'll have to see how well that works, but it's good to have an approach I can take.

08:50 <xobs> I'll probably need to optimize my design somewhat. I'm still super impressed it routes and meets timing.

08:51 <xobs> "ICESTORM_LC: 5252/ 5280 99%" (for those of you who aren't in #tomu)

08:51 <daveshah> Not bad!

08:52 <daveshah> I think the iCE40 has a high routing to logic ratio compared to many FPGAs

08:52 <xobs> Ah, that would explain it then. That makes me feel much better about it working.

08:54 <sorear> if you have a large random graph of LUTs, the amount of routing resources you need (total wire length) scales as the third power of the chip size, because the wires get longer AND you need more of them

08:55 <sorear> real designs are not random graphs (cf Rent's rule) but the scaling laws still give small FPGAs an easier time having enough routing
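
One way to read the scaling argument above, as a rough sketch rather than a derivation from the log. It assumes "chip size" means the side length $L$ of a square array of LUTs, a fixed number of connections per LUT, and uniformly random endpoints:

```latex
\begin{align*}
N &= L^2 && \text{(LUTs in an } L \times L \text{ array)} \\
\bar{\ell} &\propto L && \text{(mean length of a random connection)} \\
W_{\text{total}} &\propto N \cdot \bar{\ell} \;\propto\; L^2 \cdot L = L^3 = N^{3/2}
\end{align*}
```

Under those assumptions the routing needed per LUT grows like $L = \sqrt{N}$, which is consistent with small parts having an easier time providing enough routing.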

08:57 <xobs> Along those lines, the critical path slowing my 12 MHz domain down looks suspicious. I see lots of references to "adder". Is there a good way to trace down the inefficiencies?

08:57 <xobs> Unfortunately, after scala -> python -> yosys -> json -> nextpnr, things get a bit lost in translation. "Sink $abc$53477$auto$blifparse.cc:492:parse_blif$53784_LC.I1" for example.

09:02 <sorear> so, of all the pipeline steps ABC (logic optimizer) is the worst offender in terms of mangling otherwise useful node names

09:04 <sorear> there's been some work recently on alternatives (whitequark's "synth_ice40 -noabc -relut" IIRC) but if you're at 99% ICESTORM_LC *with* ABC you may not be able to test this

09:07 <daveshah> ABC does preserve about 50-70% of net names and src attributes with dress now

09:07 <daveshah> But it doesn't copy these onto the cells at the moment

09:07 <xobs> That ends up at 118%, but might be worth looking at if I pull out some of the logic.

09:07 <whitequark> wow, only 118%?

09:07 <whitequark> that's actually a really good result

09:08 <whitequark> I'd expect more like 150%

09:08 <whitequark> since -noabc doesn't even try to optimize for area

09:10 <daveshah> If you get rid of any UltraPlus IP you could look at the critical path on LP8K

09:10 <daveshah> It should be more or less proportionally slower

09:11 <daveshah> *up5k is proportionally slower than lp8k

09:14 <xobs> daveshah: that's a good idea. the UP5K IP I have is the RAM. Which can be mocked around.

09:14 <daveshah> For the adders, BTW, Yosys tends to discard useful names but preserves the src attribute

09:15 <daveshah> Running 'rename -src' followed by 'write_json' might help (although will only go as far as the Verilog)

09:26 <whitequark> I think the naming of adders in Yosys is a bug that needs to be fixed

09:28 <wren6991> daveshah: Oooh that's really helpful, alumacc is the next worst offender for me after abc, this seems to help

09:30 <wren6991> Oh I just noticed that setup on SB_IO CLOCK_ENABLE is marked as a clk -> async path. That doesn't seem right?

09:32 <wren6991> Although I guess it's tricky as there are two clocks in the IO tile which can sample it

09:32 <whitequark> wren6991: i've implemented rename -src after being frustrated with alumacc, specifically

09:32 <wren6991> whitequark: yay :) it's awesome, thank you

09:33 <wren6991> For SB_IO: it seems like your design could be reported as meeting timing, but actually fail to meet setup on CLOCK_ENABLE? Luckily I have a little bit of slack on it

09:34 <daveshah> It should also count as a setup path for any used clocks

09:35 <daveshah> The async path will probably occur if either input or output clock is unused

09:35 <wren6991> Thank you, and you're right, in this case it's just an output clock

09:36 <wren6991> Although maybe that logic doesn't make sense, because if you have two independent clocks then you can't really drive CLOCK_ENABLE synchronously, whereas if you're only using one clock, it will be sampled synchronous to that clock if it's driven

09:38 <daveshah> Yes, the timing analysis might be better off ignoring disconnected clock ports than treating them as async

09:40 <wren6991> Yes, if you aren't driving a clock, you are most likely using the nonregistered path for that direction (in/out) anyway, so CLOCK_ENABLE wouldn't be pertinent to that path

09:42 <whitequark> if you're not driving a clock and using the registered path it's just a constant 0

09:42 <whitequark> which is safe albeit meaningless

09:44 <wren6991> Ooh did we confirm that those registers are tied to FPGA reset?

09:45 <wren6991> and it's useful for saving a bit of power without spending a LUT to drive the latch enable :)

09:46 <wren6991> (on input paths anyway)

09:54 wren6991 has quit [Quit: Page closed]

10:15 rohitksingh has joined ##openfpga

10:24 ZombieChicken has quit [Remote host closed the connection]

11:02 _whitelogger has joined ##openfpga

11:15 <tnt> I'm thinking that giving the timing histogram independent bin sizes for the positive and negative sides would be informative (and also always having a bin boundary at 0). Does that appeal to anyone else?

11:19 <whitequark> agree

11:27 Laksen has joined ##openfpga

11:52 <whitequark> daveshah: I need some help

11:52 <whitequark> are you familiar with the SAME_EDGE mode of ODDR/ODDR2?

11:57 <daveshah> Touched that stuff about 3 years ago, not that familiar though

11:57 <daveshah> xc7 I guess?

11:59 <whitequark> yes

11:59 <whitequark> well, if you use the same clock for C0 and C1 (inverted) on series 6 and set DDR_ALIGNMENT to C0 then it will be the same behavior

12:20 <whitequark> daveshah: so the question i have is

12:20 <whitequark> can i emulate that behavior on ice40?

12:20 <whitequark> i think i'll need a posedge flop for DDR output and negedge flop for DDR input

12:20 <whitequark> but i'm confused as to how exactly i would instantiate them

12:20 <daveshah> Yes, that is what I would assume

12:23 <daveshah> I think for output a posedge flop on the D_OUT_1 path should do it

12:23 <daveshah> Not so sure about input

12:23 <whitequark> yeah, output is easy

12:24 <whitequark> looking through https://www.xilinx.com/support/documentation/user_guides/ug471_7Series_SelectIO.pdf

12:24 <whitequark> there is SAME_EDGE and SAME_EDGE_PIPELINED

12:24 <tnt> I'm pretty sure all you need to do to get SAME_EDGE is to put a fabric posedge FF in front of D_OUT_1

12:24 <whitequark> i think SAME_EDGE is one posedge flop on D_IN_1 and SAME_EDGE_PIPELINED is a posedge flop on D_IN_0 *and* D_IN_1

12:24 <whitequark> but I'm not sure

12:25 <daveshah> That makes sense

12:27 <tnt> yes +1

12:27 <daveshah> If Icarus accepts the Xilinx sim models, I'd do a side by side sim to be sure

12:35 gnufan_home1 has joined ##openfpga

12:35 gnufan_home has quit [Ping timeout: 244 seconds]

12:46 cr1901_modern has quit [Ping timeout: 246 seconds]

12:49 cr1901_modern has joined ##openfpga

12:50 <whitequark> daveshah: tnt: ack

12:50 <whitequark> I am thinking about providing basic DDR primitives in nmigen

12:50 <whitequark> universally supported on every architecture

12:51 <whitequark> I am thinking the output should be SAME_EDGE and input should be SAME_EDGE_PIPELINED to avoid timing horrors

12:51 <tnt> whitequark: sometimes you can't tolerate the latency ...

12:52 <whitequark> hmmm

12:52 <whitequark> of pipeline?

12:52 <daveshah> I really like the idea of generic DDR primitives

12:53 <whitequark> i have half of that code in glasgow in a very ad-hoc way

12:53 <whitequark> and i see people reimplement them constantly

12:53 <daveshah> Yes, I've been there with CSI stuff

12:53 <daveshah> Although that had deserialisation needs too

12:54 cr1901_modern1 has joined ##openfpga

12:55 cr1901_modern has quit [Ping timeout: 246 seconds]

12:56 <whitequark> daveshah: do you think SAME_EDGE_PIPELINED is useful enough or would i need to provide an option to use SAME_EDGE too?

12:56 <whitequark> imo if you want your code to be generic and portable you really have to accommodate 1 cycle of pipelining

12:56 <whitequark> anything less and you're free to use the primitives yourself, because it will probably be more than just that

12:56 <whitequark> but i may be wrong

12:59 <eddyb> whitequark: speaking of which, would you recommend nmigen, or bare RTLIL (ILANG?), as the target for some busywork-reducing DSL (I'd rather not touch Verilog)? nmigen has the obvious advantage of being able to implement the DSL in Python and construct objects instead of emitting text, and I'd probably want to avoid reimplementing some of its features too

12:59 <eddyb> but I'm less experienced/comfortable with Python atm

12:59 <whitequark> RTLIL is the name

13:00 <whitequark> i have no idea what your objective is

13:00 <daveshah> whitequark: yeah, I'd go for SAME_EDGE_PIPELINED

13:00 <eddyb> fair

13:01 <whitequark> daveshah: ack

13:02 <whitequark> the general nmigen design does not just do DDR, it allows arbitrary gearing

13:02 <whitequark> so the ecp5 4:1 primitives are fine, too, just provide ECLK

13:02 <whitequark> on xc7 this should probably actually use xSERDES, not xDDR

13:02 <daveshah> Very nice

13:03 <whitequark> ^_^

13:03 <daveshah> The only remaining thing I see are input/output delays

13:03 <whitequark> yes.

13:03 <whitequark> I was thinking about those.

13:03 <whitequark> I feel like I will inevitably need some form of delays

13:05 <whitequark> any ideas on how to expose them best?

13:06 <daveshah> I think a reasonably standard option would be a fixed delay in ps or variable delay with inc/reset inputs

13:06 <whitequark> hmm

13:06 <daveshah> ECP5 doesn't officially give the mapping from delay value to picosecond

13:06 <whitequark> i thought ecp5 only exposes edge-aligned/center-aligned?

13:06 <tnt> DDR is non-ambiguous but for higher ratio you need someway to sync.

13:07 <daveshah> You can work it out looking at SDF files

13:07 <daveshah> ECP5 also has a manual delay option

13:07 <whitequark> oh interesting

13:07 <daveshah> FYI, ECP5 IDDRX1F is SAME_EDGE_PIPELINED only

13:07 <daveshah> Looking at the vendor model

13:08 <whitequark> ooh I see

13:08 <daveshah> DEL_MODE set to USER_DEFINED and provide a DEL_VALUE between 0 and 127

13:08 <daveshah> It also has load/direction/increment inputs

13:09 <whitequark> wait, how does it even do edge-aligned/center-aligned?

13:10 <whitequark> does it just use the clock constraint to work out picoseconds or something?

13:11 <daveshah> It's not even based on clock constraint from my experience

13:11 <daveshah> Just a fixed value

13:11 <whitequark> what.

13:11 <whitequark> how does that even work??

13:11 <daveshah> See https://github.com/YosysHQ/nextpnr/blob/master/ecp5/pack.cc#L1658

13:12 <daveshah> It's to compensate for internal clock network delays

13:13 <whitequark> oh wow

13:14 <daveshah> Yeah, I expected it to be based on clock constraint or at least speed grade too

13:17 <whitequark> hmm, so there would be additional fields in the primitive for delays then, right?

13:17 <whitequark> right now there's i0,i1,...iN for gearing 1:N

13:18 <whitequark> and o0,o1...oN as well as oe

13:18 <whitequark> all depending on pin configuration

13:20 <daveshah> So I'd have a fixed delay value parameter, eg in ps, and also increment and reset/load inputs for variable delay

13:20 <whitequark> and then there'd be something like .delay.rst, .delay.stb, .delay.dir?

13:21 <daveshah> Xilinx doesn't have dir

13:21 <whitequark> oh?

13:21 <whitequark> s7 has...

13:22 <whitequark> and xc6 has too

13:22 <whitequark> incdec and inc, respectively

13:22 <whitequark> unless i'm missing something

13:24 <eddyb> whitequark: so, my objective is more or less "make some state machines easier to write". I guess it would make more sense if I would take some example, involving a simple memory bus with arbitrary wait times, maybe stick "add together two vectors, one element at a time, over that bus" to it (to avoid having a full CPU), and implemented it in a few different HDLs

13:24 <daveshah> whitequark: not seeing INCDEC?

13:24 <daveshah> https://github.com/YosysHQ/yosys/blob/master/techlibs/xilinx/cells_xtra.v#L2131

13:24 <daveshah> This is extracted from the Xilinx libs so should be accurate

13:25 <daveshah> You can dynamically load an arbitrary value too, which could emulate direction, but not sure if this has glitch issues compared to incrementing

13:25 <whitequark> that has INC

13:25 <daveshah> Right but no direction control

13:25 <whitequark> i think INC is direction, CE is strobe, no?

13:26 <whitequark> according to the doc i see

13:26 <whitequark> As long as CE remains High, IDELAY will increment or decrement by TIDELAYRESOLUTION

13:26 <whitequark> every clock (C) cycle. The state of INC determines whether IDELAY will increment or

13:26 <daveshah> Oh, I see

13:26 <whitequark> decrement; INC = 1 increments, INC = 0 decrements, synchronously to the clock (C). If CE

13:26 <whitequark> is Low the delay through IDELAY will not change regardless of the state of INC.

13:26 <daveshah> That makes sense
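
The CE/INC behaviour quoted above can be sketched as a tiny behavioural model. This is an illustration only, not vendor code: the class name, the 32-tap count, and the wrap-around at the ends are assumptions that do not come from the log. In terms of the field names floated earlier, CE would play the role of `.delay.stb` and INC the role of `.delay.dir`.

```python
# Behavioural sketch of a variable input delay controlled by CE and INC.
# Assumptions (not from the log): 32 taps, tap value wraps at both ends.

class DelayTapModel:
    def __init__(self, taps: int = 32, value: int = 0):
        self.taps = taps
        self.value = value % taps

    def clock(self, ce: bool, inc: bool) -> int:
        """Advance one clock (C) cycle and return the resulting tap value."""
        if ce:
            # INC = 1 increments, INC = 0 decrements, one tap per cycle.
            step = 1 if inc else -1
            self.value = (self.value + step) % self.taps
        # With CE low the delay holds, whatever INC is doing.
        return self.value

d = DelayTapModel()
d.clock(ce=True, inc=True)    # tap 1
d.clock(ce=True, inc=True)    # tap 2
d.clock(ce=False, inc=False)  # held at tap 2
d.clock(ce=True, inc=False)   # tap 1
print(d.value)  # 1
```

Treating INC as a direction input and CE as the strobe is what lets a single generic interface cover parts that expose direction separately and parts that fold it into the increment pin.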

13:26 <eddyb> (and then come up with my own DSL-based version, that I would prefer to write. obviously it would probably suck for many things, but if it can do one thing at all I'd be glad. it could likely be just some Python classes/functions on top of nmigen, tbh, I should probably try that out first before going full DSL)

13:26 <whitequark> eddyb: do you mean like you want to simplify writing parsers or inverse parsers in gateware?

13:27 <eddyb> whitequark: I did see your tweet about the parser thing :P

13:27 <eddyb> this *might* be inspired by that

13:27 <whitequark> so what do you actually want to do

13:31 <eddyb> make it easier for myself to experiment with gateware, tbh. I find some/most of the existing solutions quite tedious and error-prone, but that is probably 99% inexperience talking

13:34 <whitequark> have you actually used nmigen

13:36 <eddyb> I've only read, not written, sorry :(

13:38 <eddyb> ugh, nevermind the DSL thing, it made more sense in my head before I asked it, I just need to make something small enough so I can focus on it, but non-trivial enough to better articulate the abstraction gaps I want to fill in

13:45 <tnt> :/ ... I've been searching for 30 min wtf my design stopped working ... turns out I accidentally switched minicom to Odd parity.

14:04 rohitksingh has quit [Ping timeout: 246 seconds]

14:10 <whitequark> daveshah: is there any difference between DELAYF and DELAYG?

14:11 <whitequark> they look basically the same other than the pins

14:12 X-Scale has quit [Ping timeout: 246 seconds]

14:15 <daveshah> yeah the only difference is one has dynamic control inputs (DELAYF I think) and the other doesn't

14:16 cr1901_modern1 has quit [Quit: Leaving.]

14:16 cr1901_modern has joined ##openfpga

14:18 <whitequark> i mean, you can use any combination of DELAYF and DELAYG anywhere in the design right?

14:18 <whitequark> or are there some restrictions?

14:25 X-Scale has joined ##openfpga

14:25 emeb has joined ##openfpga

14:26 rohitksingh has joined ##openfpga

14:34 <daveshah> whitequark: yeah, there is one DELAY block per IO with separate control set from all other DELAYs

14:34 <daveshah> The only limitation is you can't have both an input and an output delay on the same pin

14:35 <whitequark> but this applies to both DELAYF and DELAYG, right?

14:35 <daveshah> Both types can be mixed freely and there's no control block, unlike Xilinx

14:35 <daveshah> Yes

14:35 <whitequark> ack

14:35 <whitequark> xilinx has a control block?

14:39 <tnt> IDELAYCTRL

14:45 <whitequark> how does it work? i can't quite grasp it

14:46 <tnt> you just feed it a 200 MHz clock that some internal logic will use to calibrate the delay taps.

14:46 <G33KatWork> instantiate one per IO bank, supply refclk, use ODELAY and IDELAY on that bank. The delay taps are tied to the reference clock of course and there are some restrictions on what reference clocks you can supply based on the speed grade. at least that was the case for me when I used it on a zynq

14:47 <tnt> you need a proper rst to it when the clock is stable and it will assert rdy when it's all ready to process the IO delay commands.

14:47 <whitequark> how do you associate it to IO bank?

14:47 <tnt> location constraint in the UCF for instance

14:48 <whitequark> so you have to propagate its name to UCF? gross

14:49 <tnt> well I think you can also use attributes on the instance.

14:49 <tnt> (* LOC="..." *) IIRC for Xilinx.

14:50 <whitequark> ah

14:50 <whitequark> hmm

14:51 <tnt> you might need a big table with io->pin to IODELAYCTRL location for every xilinx device though ...

14:51 <whitequark> that's kinda really annoying

14:51 <whitequark> but i guess i can always punt to the user

14:51 <whitequark> the clock needs to be specified manually anyway...

14:52 <whitequark> does IDELAYCTRL clock need to relate to IO clock?

14:56 <G33KatWork> seems not to be the case. At least I can't find anything in the documentation

14:56 <G33KatWork> https://www.xilinx.com/support/documentation/user_guides/ug471_7Series_SelectIO.pdf

14:57 <whitequark> wait, the clock freq is fixed?

14:57 <whitequark> so why can't i just instantiate all of them?

14:57 <whitequark> power?

14:58 <tnt> you can. I'm pretty sure I've done it in the past.

15:29 rohitksingh has quit [Ping timeout: 244 seconds]

15:32 <emeb> tnt: hilarious how much up5k can be overclocked - mangling your riscv-usb project to learn and got PLL feedback divider wrong for my board. Was running it @ 1.5x faster than timing expected and it still worked fine.

15:33 <tnt> emeb: yeah, something I still have to measure is how much the timing estimate from nextpnr match the real max freq. Not sure how to do that meaningfully though.

15:34 <whitequark> does nextpnr do pvt corners?

15:34 <emeb> tnt: was it you who posted analysis of speed vs Vcore recently?

15:34 <emeb> showed nice linear relation IIRC.

15:34 <tnt> yes

15:35 <tnt> but that doesn't really tell anything wrt to nextpnr timing model vs reality.

15:35 <daveshah> whitequark: it doesn't support adjustment of pvt

15:35 <daveshah> Its data structures do support min and max delays

15:35 <daveshah> These will be used once we have hold time analysis done

15:36 <daveshah> It would be easy enough to add voltage/temp options

15:36 <daveshah> Speed grades for ECP5 are effectively a case of this and are supported already

15:36 <sorear> in principle, you can use the nextpnr timing model to calculate a frequency for a ring oscillator and then compare that to reality

15:36 <daveshah> (I guess the 5G ECP5 variant is actually a different Vcore too)

15:37 <sorear> but neither nextpnr nor icetime supports that, so you'd need to find someone who understands the timing model well enough to calculate by hand (not me)

15:39 <whitequark> daveshah: so i'm thinking, the timing fuzzer (?) y'all have built

15:39 <tnt> sorear: yeah, I "tried" sort of. By breaking the ring oscillator and putting a register in the middle and use the path len as a guide for the delay.

15:39 <whitequark> can it be used for qualifying the real device too?

15:39 <daveshah> The iCE40 one not, that's basically just an sdf parser

15:39 <daveshah> The ecp5 one could be

15:40 <sorear> I'd feel comfortable doing that if you had a few hundred stage ring oscillator

15:40 <sorear> if it's just 3 or 5 stages, the register will add a lot of delay

15:40 <daveshah> It assigns classes to the different pip types, then builds a system of linear equations between pip class delays and vendor timing analysis value for that net

15:40 <daveshah> Knowing the routed path of the net

15:40 <whitequark> yeah that one

15:40 <daveshah> Yeah, that would work

15:41 <daveshah> You'd need a way of measuring the delays

15:43 <daveshah> I know someone who spent many years on this problem

15:43 <daveshah> http://cas.ee.ic.ac.uk/people/gac1/pubs/JoshFPL13.pdf

15:44 <daveshah> (ignore the cursed name of the technique)

15:47 <tnt> sorear: the timing report details the estimated time for each segment, so I'm just ignoring the reg setup time and clock-to-out estimate.

15:48 <sorear> this is probably obvious but one-way delays don't matter (except for I/O), only delays summed over a loop that returns to the same point on the chip…

15:59 gsi__ is now known as gsi_

16:13 Jybz has joined ##openfpga

16:37 <tnt> Ok, so according to nextpnr, the path delay should be 45 ns. That should be the half period of the ring oscillator, giving a 90 ns period and a 11.1 MHz frequency. The real frequency I get is 17.44 MHz.
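
The arithmetic in tnt's measurement can be replayed as follows. A minimal sketch: the 45 ns and 17.44 MHz figures are from the log, while the function name is invented here, and the model assumes the loop delay is exactly half the oscillation period.

```python
# Ring oscillator frequency from a one-way loop delay.
# Assumption: one trip around the loop is half a period, because the
# signal inverts once per trip.

def ring_osc_freq_hz(path_delay_s: float) -> float:
    return 1.0 / (2.0 * path_delay_s)

predicted = ring_osc_freq_hz(45e-9)   # nextpnr path delay from the log
measured = 17.44e6                    # measured frequency from the log

print(f"predicted: {predicted / 1e6:.1f} MHz")              # predicted: 11.1 MHz
print(f"measured / predicted: {measured / predicted:.2f}")  # measured / predicted: 1.57
print(f"implied loop delay: {1e9 / (2 * measured):.1f} ns")  # implied loop delay: 28.7 ns
```

On these numbers the real silicon is about 1.57x faster than the estimate for this one path, in the same region as the 1.5x accidental overclock emeb mentioned earlier; a single measurement says nothing about other paths, voltages, or temperatures.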

16:44 <tnt> Can't see a difference using I0/I1/I2/I3.

16:45 <tnt> (it's probably swamped in all the other delays)

17:42 rohitksingh has joined ##openfpga

18:26 rohitksingh has quit [Ping timeout: 246 seconds]

18:51 <emeb> hmm... for some reason my SPI flash erase/write stuff has stopped working.

18:51 <emeb> read of existing data still works fine, but it acts like protection is preventing erase & write.

18:51 <emeb> yet status register reads 0x00.

19:00 <tnt> emeb: strange it would enable by itself ...

19:02 <emeb> tnt: what do you mean "enable by itself" ?

19:03 <emeb> I preface all my erase & write operations with a write enable cmd.

19:04 <emeb> and check the status after completing those commands and see that status reads 0x03 for a bit until those operations finish.

19:05 <emeb> but prior to issuing WEN + erase or WEN + page write I see no protection

19:05 <tnt> oh, nm, I misread, I thought that you had checked a protection register and it was reading as protected.

19:06 <emeb> tnt: I also issue a cmd 0x98 global unlock but that doesn't seem to do anything.

19:06 <whitequark> emeb: what does the status register look like after a failed write or erase?

19:06 <emeb> whitequark: it reads 0x00

19:07 <emeb> well, it reads 0x03 for a bit, then 0x00

19:07 <emeb> so it acts like it's doing something, but I don't see a change in the memory contents

19:07 <whitequark> that doesn't look like protection then

19:07 <whitequark> yeah

19:07 <whitequark> what flash?

19:07 <emeb> W25Q32JV

19:07 <emeb> (same winbond that lots of folk use)

19:08 <whitequark> that's super interesting

19:08 <whitequark> any LA trace?

19:09 <emeb> whitequark: not ATM - I'm just banging on it w/ firmware in the FPGA right now. Need to hook up the scope and see what the SPI bus is doing next.

19:09 <whitequark> can you read S8 and S9 registers?

19:10 <emeb> sure - need to write some code for that. just a sec...

19:13 <emeb> whitequark: a bit confused about what S8/S9 are - don't see those mentioned in the datasheet.

19:14 <whitequark> emeb: 35h/15h

19:17 <emeb> kk

19:23 <emeb> ah - got it. those are bits in status reg 2

19:24 <emeb> s8 (SRL) is 0

19:24 <emeb> s9 (QE) is 1 - that's odd.

19:24 <emeb> didn't turn on QSPI mode.

19:28 <emeb> and there's the warning about QE being enabled while the /wp and /hold pins are strapped to the rails....
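
For reference, the status values in this exchange decode as below. This is a hedged sketch for a W25Q32JV-style part: only the bits that come up in the discussion are named, the helper is made up for illustration, and the 0x02 value for status register 2 is a stand-in that sets just the two bits emeb reported (SRL = 0, QE = 1).

```python
# Decode the status register bits discussed above.
# Only the bits mentioned in the log are listed; others are ignored.

SR1_BITS = {0: "BUSY", 1: "WEL"}   # S0, S1 in status register 1
SR2_BITS = {0: "SRL", 1: "QE"}     # S8, S9 in status register 2 (read with 35h)

def decode(value: int, names: dict) -> list:
    """Return the names of the listed bits that are set in value."""
    return [name for bit, name in sorted(names.items()) if (value >> bit) & 1]

print(decode(0x03, SR1_BITS))  # ['BUSY', 'WEL']
print(decode(0x00, SR1_BITS))  # []
print(decode(0x02, SR2_BITS))  # ['QE']
```

Read this way, the 0x03 seen right after an erase or page program is "busy, write enable latched", which is why it looked like the chip was doing something even though the contents did not change.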

19:40 GuzTech has joined ##openfpga

19:57 Jybz has quit [Read error: Connection reset by peer]

19:58 Jybz has joined ##openfpga

20:08 ZombieChicken has joined ##openfpga

20:17 GuzTech_ has joined ##openfpga

20:17 <emeb> weird - can't seem to clear that QE bit either. always comes up set even after a WEN and write status reg 2 + 00

20:18 <emeb> like the chip is permanently write protected.

20:20 GuzTech has quit [Ping timeout: 246 seconds]

20:20 <daveshah> Have you checked Vcc is good?

20:23 <emeb> Yeah - solid 3.3V

20:25 GuzTech_ has quit [Remote host closed the connection]

20:26 GuzTech has joined ##openfpga

20:35 emeb has quit [Quit: Leaving.]

20:37 emeb_mac has joined ##openfpga

20:52 emeb_mac has quit [Ping timeout: 246 seconds]

21:28 Asu has quit [Read error: Connection reset by peer]

21:31 Asu has joined ##openfpga

21:35 Jybz has quit [Quit: Konversation terminated!]

22:05 GuzTech_ has joined ##openfpga

22:06 GuzTech_ has quit [Remote host closed the connection]

22:08 GuzTech has quit [Ping timeout: 240 seconds]

23:03 Asu has quit [Remote host closed the connection]