#nmigen on 2021-03-23 — irc logs at freenode.irclog.whitequark.org

00:08 <_whitenotifier-4> [YoWASP/nextpnr] whitequark pushed 1 commit to develop [+0/-0/±1] https://git.io/Jmxyl

00:08 <_whitenotifier-4> [YoWASP/nextpnr] whitequark f076e35 - Update dependencies.

00:45 <_whitenotifier-4> [YoWASP/yosys] whitequark pushed 1 commit to develop [+0/-0/±1] https://git.io/JmxQF

00:45 <_whitenotifier-4> [YoWASP/yosys] whitequark acf4de9 - Update dependencies.

03:22 <d1b2> <4o> oh. ok. i thought python's with statement creates a local namespace
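
The question this message answers is not in this day's log, so the following is only a minimal plain-Python sketch of the point being conceded: a `with` block does not open a new scope, and names bound inside it stay bound afterwards. The variable names are illustrative and not taken from the conversation.

```python
import io

# A `with` block binds names in the enclosing scope, just like `if` or `for`.
with io.StringIO("hello") as stream:
    first_word = stream.read()

# Both names are still bound here; only the managed resource was closed.
stream_closed = stream.closed
print(first_word, stream_closed)
```

The context manager controls the lifetime of the resource, not the lifetime of the names.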

03:22 <d1b2> <4o> thanks

16:32 <d1b2> <4o> https://github.com/personal-army-of-4o/nyanMigen/issues/95 how should i interpret the error message?

16:33 <d1b2> <4o> why do i see errors only when i ask for help...:)

16:35 <d1b2> <4o> yeah. cause i create an array of types, not an array of instances
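
A small sketch of the mistake described here, in plain Python. The class `Port` is a hypothetical stand-in; the actual code in the linked issue is not shown in this log.

```python
class Port:
    """Hypothetical stand-in for the class involved in the issue."""

    def __init__(self):
        self.addr = 0  # instance attribute, exists only after construction


# Forgetting the call parentheses yields a list of the class object itself.
types = [Port for _ in range(2)]
# Calling the class yields two distinct instances.
instances = [Port() for _ in range(2)]

types_are_classes = all(isinstance(t, type) for t in types)
has_addr = [hasattr(types[0], "addr"), hasattr(instances[0], "addr")]
print(types_are_classes, has_addr)
```

Accessing an instance attribute on the class is a typical way such a mix-up surfaces as a confusing error.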

16:38 <slan> Hmm... still trying to figure out how to make my interconnect work. I think I've hit a wall.

16:38 <slan> Context: a cpu, 2 read ports. Goal: fetching and reading memory should happen in 1 cycle if read ports are different, 2 cycles otherwise.

16:38 <slan> My problem: simulation works fine, synthesis is reporting 10 combinatorial loops.

16:38 <slan> At this point, I'm not sure if 1. this is doable 2. this is doable with nmigen 3. there is a bug in the nmigen -> yosys -> vivado chain 4. my implementation is broken

16:39 <slan> Since I'm a beginner, I'm focusing on 4 but after rewriting and simplifying my design many times I'm getting similar results and I'm running out of ideas on how to investigate further.

16:39 <slan> I would be extremely grateful if somebody could point me in the right direction. The (standalone) code is there: https://github.com/slan/hartysoc/blob/repro/repro.py

16:43 <d1b2> <4o> which section of it defines latency?

16:47 <slan> not sure what you mean... the read ports are combinatorial, cycles are triggered by update pc (sync)

16:49 <d1b2> <4o> what does "2 read ports" refer to?

16:50 <slan> rp0 and rp1, the 2 read ports from mem0 and mem1

16:50 <d1b2> <4o> cycle == clock cycle?

16:50 <slan> yes

16:52 <slan> simulation is working as expected: https://imgur.com/a/04d3A6K

16:53 <d1b2> <4o> line 81+ should implement this, right?

16:53 <d1b2> <4o> this = different read latency

16:55 <slan> what do you mean?

16:56 <slan> yes, lines 81+ are where the sync logic is

16:56 <slan> it's actually not about latency, more about port collision between fetching the instruction and executing it (reading from memory)

16:57 <d1b2> <4o> https://github.com/slan/hartysoc/blob/61a30e039b6b6a228beaf8b05675f2a5b87c74a0/repro.py#L82-L88 looks like a latch. i read it as "if condition is true, assign values to signals, otherwise keep current values". this may produce latches cause you do it combinatorially

16:58 <agg> I don't think you can generate latches in nmigen without instantiating one explicitly/using verilog

16:58 <d1b2> <4o> challenge accepted

16:58 <agg> L82-88 will make the comb assignment if the condition is true, otherwise those signals will be zero/any previous comb assignment

16:58 <slan> Hmm, I don't think I want/need a latch there...

16:58 <agg> it's not really a challenge, it's a design of nmigen

16:59 <d1b2> <4o> "previous comb assignment" is what looks like a latch

16:59 <agg> I mean previous in the code, not in time

16:59 <slan> Actually in case of a collision I will have dport.addr set by L61, so I don't want to override it

16:59 <agg> it won't latch, dport.addr and ddata and dack will all be 0 unless that condition is true

17:00 <slan> that is fine, this is the case where next cycle will use the cached instruction (and the port will be free)

17:00 <d1b2> <4o> i don't see any other assignments to dport.addr

17:00 <slan> in case dport_id is the same as iport_id, L69 will set it

17:01 <slan> (because dport will be the same as iport)

17:01 <d1b2> <4o> well it's great if nmigen does zero out signals in implied else branch

17:05 <agg> slan: oh right I see the dport/iport thing

17:06 <d1b2> <4o> true. no latch https://paste.debian.net/1190679/
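
The rule agg describes can be mimicked with a toy model in plain Python. This is not the nmigen API: `eval_comb` is a hypothetical helper, and the value 0x40 is arbitrary. It only illustrates that each combinatorial evaluation starts from the reset value and that "previous" means earlier in the code, not earlier in time.

```python
def eval_comb(condition, reset=0):
    """Toy model of one combinatorial evaluation (not nmigen itself)."""
    addr = reset        # every evaluation starts from the reset value
    if condition:       # stands in for `with m.If(condition):`
        addr = 0x40     # stands in for `m.d.comb += addr.eq(0x40)`
    return addr         # nothing is remembered between evaluations


# The condition toggles over time; the output never holds its old value.
trace = [eval_comb(c) for c in (True, False, True, False)]
print(trace)
```

A latch would show the old value persisting while the condition is false; in this model the output falls back to the reset value instead.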

17:21 <agg> slan: your design does have a combinatorial loop, and that's "not allowed"

17:21 <agg> e.g. memory output -> iport data -> insn -> daddr -> iport addr -> memory input

17:22 <agg> assuming I'm understanding your design correctly, anyway

17:22 <agg> something like this https://imgur.com/a/QMriTvO

17:22 <slan> daddr -> iport addr should be gated by the "collision" detection

17:31 <agg> yes, I see

17:31 <agg> you sure are making your life difficult :p

17:32 <agg> the muxes i drew on the memory address inputs are wrong too, actually

17:43 <slan> The reasoning behind this "exercise" is that I would like my CPU to work with distributed RAM (as L1 cache) backed by BRAM/SDRAM

17:44 <slan> So I'm aiming at 1 CPI if possible

17:46 <slan> And the reason I'm thinking of a bug in the nmigen -> yosys -> vivado chain is that if I change L57 with a constant value, everything works as expected

17:46 <slan> but I'm losing the ability to run code from different memories

17:47 <agg> I still expect there's a combinatorial feedback loop in here, and they're explicitly UB in nmigen

17:48 <agg> but maybe not, in which case maybe there's a bug creeping in somewhere instead

17:48 <agg> e.g. https://nmigen.info/nmigen/latest/lang.html#combinatorial-evaluation

17:48 <agg> (the warning box in particular)

17:49 <slan> Yes, that's the reason I'm also suspecting my implementation is broken: I think my idea should not create combinatorial loops. Maybe I failed to express my intention to nmigen

17:50 <slan> but I've tried to approach the problem from as many angles as I could think of

17:50 <slan> without much success...

18:03 <agg> slan: it might help clarify the design if you're more explicit about what drives each read port's addr signal

18:03 <agg> which rp sources each data signal is very clear, but it's much less obvious what the control signal to the mux on the memory address is

18:04 <agg> just a vague thought though

18:05 <slan> Maybe I'll try. I'm not used to thinking in terms of mux/circuit since I have no formal education in this. Just played a bit with logisim... that could help clarify

18:06 <slan> good idea, thanks

18:06 <slan> (agree that the mux driving memory access is tricky)

19:05 <agg> slan: hm, thinking about the mem addr muxes, I think you have a problem where insn (and thus daddr) is fed combinatorially from the two read port datas, and both read port addresses combinatorially depend on daddr

19:05 <agg> even though it should be a _stable_ feedback loop because eventually your conditions are exclusive, it's still a combinatorial loop, I think

19:07 <agg> rp0.addr depends on daddr which depends on insn which depends on rp0.data which depends on rp0.addr, basically, even though once it settles down it will find a stable value

19:11 <slan> How can insn be fed by the 2 read ports? Isn't it either from a readport or from the cache?

19:11 <agg> it's from iport which might be rp0 or rp1 combinatorially

19:12 <agg> only depending on pc and cache_addr, but the rp0/rp1.addr signal depends on both pc and daddr to select whether it's iport or dport

19:16 <slan> I don't see it... let's take my example of booting from mem1, reading from rp0 gives: pc -> rp1.addr -> insn -> rp0.addr

19:17 <slan> reading from rp1 gives: pc -> rp1.addr -> insn -> cache / next cycle cache -> insn -> rp1.addr

19:18 <slan> assuming cache_addr cache_insn are registers there should be no dependency, right?

19:19 <agg> if you look at it the other way, rp0.addr is fed from a combinatorial circuit with inputs pc, cache_addr, daddr, dreq

19:19 <agg> so is rp1

19:20 <agg> and daddr is a wire that is generated from a combinatorial circuit fed by cache_insn and rp0.data and rp1.data

19:20 <agg> so the logic generating the rp0.addr signal has the rp0.data signal as an input
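
The dependency chain agg lists can be written down as a graph and searched for a cycle. This is a sketch in plain Python, not a tool from the nmigen toolchain: `find_loop` is a hypothetical helper, the edges are transcribed from the messages above, and the `rpN.data <- rpN.addr` edges assume asynchronous (combinatorial) read ports.

```python
# Each key is driven combinatorially by the signals listed for it.
DRIVERS = {
    "rp0.addr": ["pc", "cache_addr", "daddr", "dreq"],
    "rp1.addr": ["pc", "cache_addr", "daddr", "dreq"],
    "daddr": ["cache_insn", "rp0.data", "rp1.data"],
    "rp0.data": ["rp0.addr"],  # assumption: asynchronous read port
    "rp1.data": ["rp1.addr"],  # assumption: asynchronous read port
}


def find_loop(drivers, start):
    """Return a path start -> ... -> start through drivers, or None."""
    def visit(signal, path):
        for source in drivers.get(signal, []):
            if source == start:
                return path + [source]
            if source not in path:
                found = visit(source, path + [source])
                if found:
                    return found
        return None
    return visit(start, [start])


loop = find_loop(DRIVERS, "rp0.addr")
print(loop)
```

The search is purely structural: it never looks at the mux conditions, which is why a loop that would settle to a stable value is still reported.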

19:22 <slan> I think I see. Although I agree it should be stable.

19:22 <agg> unfortunately it being stable doesn't matter

19:22 <agg> in terms of nmigen, anyway

19:24 <slan> Hmm... do you think I can turn that stable loop into something nmigen would be happy with?

19:25 <slan> Drawing more (not fully there yet) makes me feel I could have a priority encoder in front of the ports to decide if fetch or mem should have access

19:26 <slan> In practice turning the problem upside down and starting (as you hinted) with the "mux" to mem

19:34 <slan> AH but no... I need the fetch in any case... maybe I'll finish my drawing and look into instancing a lath (as per nmigen doc)

19:34 <slan> ^latch

19:37 <slan> I could also have the interconnect as a module with a 2x clock (and make all mem instructions execute in 2 cycles). I'm wondering, in general, how people go about harvard -> modified harvard architecture (which, I believe, is what I'm aiming for)

19:59 <agg> slan: yea, it's tricky, I guess most "serious" designs are pipelined so it stops being a problem

20:00 <agg> if you can make the memory sync that certainly solves it
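
A sketch of why a synchronous memory removes the loop, in plain Python. The graph is a reduced, single-port version of the dependencies discussed earlier, and `has_comb_loop` is a hypothetical helper. A synchronous read port is modelled, as an assumption, by treating its data output as a register: it only changes on a clock edge, so combinatorial paths stop there. Registering `daddr` is shown as a second possible cut.

```python
def has_comb_loop(drivers, registers=frozenset()):
    """True if some signal depends combinatorially on itself.

    Signals in `registers` change only on a clock edge, so paths end there.
    """
    def reaches(signal, target, seen):
        for source in drivers.get(signal, []):
            if source in registers or source in seen:
                continue
            if source == target or reaches(source, target, seen | {source}):
                return True
        return False

    return any(
        reaches(s, s, frozenset()) for s in drivers if s not in registers
    )


DRIVERS = {
    "rp0.addr": ["pc", "daddr"],
    "daddr": ["insn"],
    "insn": ["rp0.data"],
    "rp0.data": ["rp0.addr"],
}

loop_async = has_comb_loop(DRIVERS)
loop_sync_mem = has_comb_loop(DRIVERS, registers={"rp0.data"})
loop_reg_addr = has_comb_loop(DRIVERS, registers={"daddr"})
print(loop_async, loop_sync_mem, loop_reg_addr)
```

Either cut costs a clock cycle on that path, which is the trade-off against the single-cycle goal stated at the start of the discussion.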

20:03 <slan> How come a pipeline solves it? You will have one fetch and 1 mem read per cycle anyways, no?

20:08 <agg> i assume the pipeline means the memory becomes synchronous, so there's no longer a combinatorial feedback path through the memory at all

20:08 <agg> and/or the address input comes from a register from the previous pipeline stage, again breaking the loop

20:08 <agg> you still get 1 cpi, but with a latency equal to the number of stages
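
The throughput versus latency remark can be made concrete with a little arithmetic, assuming an ideal pipeline with no stalls or hazards. The function name and the numbers (5 stages, 1000 instructions) are illustrative only.

```python
def pipeline_cycles(instructions, stages):
    """Cycles to retire `instructions` in an ideal, stall-free pipeline."""
    # The first instruction needs `stages` cycles; each later one adds one.
    return stages + instructions - 1


single = pipeline_cycles(1, 5)      # latency of one instruction
cycles = pipeline_cycles(1000, 5)   # total for a long run
cpi = cycles / 1000                 # average cycles per instruction
print(single, cycles, cpi)
```

The average approaches 1 cycle per instruction as the run gets longer, while each individual instruction still takes as many cycles as there are stages.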