whitequark[m] has quit [Ping timeout: 268 seconds]
whitequark[m] has joined #nmigen
emily has joined #nmigen
ZirconiumX is now known as Lofty
chipmuenk has quit [Quit: chipmuenk]
lkcl has quit [Ping timeout: 240 seconds]
lkcl has joined #nmigen
electronic_eel has quit [Remote host closed the connection]
electronic_eel has joined #nmigen
jeanthom has joined #nmigen
cr1901_modern1 has joined #nmigen
cr1901_modern1 has quit [Read error: No route to host]
cr1901_modern has quit [Ping timeout: 265 seconds]
emeb_mac has joined #nmigen
cr1901_modern has joined #nmigen
<slan>
agg/modwizcode: I spent more time drawing my comb interconnect idea and got a working model in logisim (evolution). The diagram is here if you're interested: https://github.com/slan/hartysoc/blob/toysoc/docs/toysoc.svg and the circuit file is alongside the drawings. Many thanks for your feedback/ideas. I believe I made some progress and understand things better now
<slan>
Next step is to make it work in nmigen, and synthesize!
<modwizcode>
yay
<modwizcode>
nice diagram
<slan>
ty
<modwizcode>
I'm still wondering if my interconnect has problems or if it was the thing you connected it to; I got distracted before I could test in vivado myself.
<modwizcode>
one thing I realized with your other interconnects is that they don't actually take advantage of your dual ported memory. In fact the dual ported ram needs special treatment because if you treat it as two devices you might violate the typical invariant of not accessing the same address simultaneously (the interconnect solves the same issue essentially), but if you treat it as one device you don't get the benefits of two ports.
<modwizcode>
whoops that was longer than I thought
<slan>
Your interconnect had exactly the same problem, but I based my further investigation on your much more readable code
<slan>
re: dual port mem, it's something I actually want to get rid of (my CPU works fine with it). I'm thinking "beyond" L1/L2 cache where I'd have a bunch of devices (including SDRAM) where I have one read port (and one write port) in practice
revolve has quit [Read error: Connection reset by peer]
<slan>
My idea is L1 with separate address spaces for instruction and data (distributed RAM), L2 with unified address space (distributed or block RAM) and beyond that, well, latency-dependent
revolve has joined #nmigen
cr1901_modern1 has joined #nmigen
cr1901_modern has quit [Ping timeout: 264 seconds]
bvernoux has quit [Quit: Leaving]
chipmuenk has joined #nmigen
chipmuenk has quit [Quit: chipmuenk]
cr1901_modern1 has quit [Ping timeout: 260 seconds]
<modwizcode>
Yeah I think the common thing is an L1 cache to do the data, instruction fetches independently (with the cache giving you stalls if it needs to fill) and an L2 cache on the interconnect to the main bus
jeanthom has quit [Ping timeout: 256 seconds]
<slan>
Cool
<modwizcode>
anyway I think I missed it, what was the specific issue? It sounded like it was the iaddr being driven from the instruction which is driven from the iaddr? But that didn't seem right
<agg>
modwizcode: the rp0 addr is driven from a combinatorial circuit that's driven by rp0 data
<agg>
even though ultimately the circuit would never settle on driving whichever was the iport addr with the iport data, there's still a comb feedback path, aiui
<slan>
The thing is there shouldn't be
<slan>
(I believe)
<agg>
what do you mean?
<agg>
my (maybe wrong) understanding is you have a combinatorial loop any time there's a direct combinatorial connection from output back to input, without any registers in-between
<slan>
There shouldn't be a loop... there's a register when collision is detected
<agg>
i.e. the output signal only goes through routing and LUTs before coming back to the input of that same LUT
<agg>
but there's also a combinatorial path that bypasses the register, right?
<slan>
when ibus_id and dbus_id are different, yes
<slan>
but that doesn't create a loop
<slan>
(it's more like, a spiral ;)
<slan>
I mean: pc -> rom1 -> insn -> rom0 -> ddata (spiral)
<slan>
loop should be broken by: pc -> rom1 -> insn -> cache
<slan>
next cycle: cache -> rom1 -> ddata
<agg>
the loop is part of the hardware design, though, irrespective of what the current state is, it can't be broken dynamically
<agg>
i want someone more confident in what counts as a 'combinatorial loop' to step in and explain why i'm wrong, i'm really just going off my understanding of what counts here
<agg>
I think the problem is you can follow a path through only luts and routing that runs from the mem output right back to the mem addr and thus output, even if some of the inputs to that path come from registers it doesn't matter
<agg>
at no point is the path actually broken by a register, it never has to go 'through' a register
<whitequark[m]>
yep
<agg>
it's just some of the luts along the way have registers feeding their inputs
<agg>
(since your memory is static anyway, you can think of the data output as a strict and static function of the address, too)
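[Editor's note] agg's rule above (a loop exists whenever a path runs from an output back to the same input through only LUTs and routing, no matter which registers feed into it) can be sketched as a graph search. This is a minimal plain-Python illustration, not the check Vivado runs. The netlist and all node names (`addr_mux`, `rom`, `insn_mux`, `decode`, `pc`, `cache`) are hypothetical, loosely modeled on the design discussed here.

```python
from typing import Dict, List, Set


def find_comb_loop(fanout: Dict[str, List[str]], registers: Set[str]) -> List[str]:
    """Return one cycle that passes only through combinational nodes, or []."""
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[str, int] = {}
    stack: List[str] = []

    def visit(node: str) -> List[str]:
        color[node] = GREY
        stack.append(node)
        for nxt in fanout.get(node, []):
            if nxt in registers:
                continue  # a path that goes *through* a register is broken
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: combinational loop
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return []

    for node in fanout:
        if node in registers or color.get(node, WHITE) != WHITE:
            continue
        found = visit(node)
        if found:
            return found
    return []


# Hypothetical netlist: an asynchronous ROM whose output is decoded
# and fed back into its own address mux.
fanout = {
    "pc": ["addr_mux"],        # register feeding the address mux
    "cache": ["insn_mux"],     # register feeding the instruction mux
    "addr_mux": ["rom"],
    "rom": ["insn_mux", "cache"],
    "insn_mux": ["decode"],
    "decode": ["addr_mux"],
}
registers = {"pc", "cache"}

loop = find_comb_loop(fanout, registers)
# Treating the ROM as synchronous (registered output) breaks the path.
no_loop = find_comb_loop(fanout, registers | {"rom"})
```

Note that `pc` and `cache` only feed inputs of nodes on the loop; the loop itself never passes through them, so they do not break it. The mux choosing between `cache` and `rom` is itself a combinational node and is part of the cycle.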
<slan>
I understand the combinatorial aspect of the memory, just can't figure out what the loop is
<slan>
(at least in my design... I see the LUT loop in Vivado but don't understand where it's synthesized from)
<slan>
I thought about the "static" aspect of memory and tried to add a write port but it didn't help
<agg>
adding a write port doesn't really make a difference: you can think of the memory as a bunch of registers (that the write port updates) feeding a mux which is selected by the read address pins and which outputs the read data
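[Editor's note] The mental model agg describes (storage registers updated by the write port, feeding a mux selected by the read address) can be written out as a small behavioral sketch. This is plain Python rather than nMigen, and the class and function names (`RegMuxMemory`, `mux2`) are made up for illustration.

```python
def mux2(sel: int, a: int, b: int) -> int:
    """2:1 multiplexer: purely combinational, returns b when sel is 1, else a."""
    return b if sel else a


class RegMuxMemory:
    """Asynchronous-read memory: registers plus a read mux tree."""

    def __init__(self, depth: int) -> None:
        # Depth must be a power of two so the mux tree is balanced.
        assert depth > 0 and depth & (depth - 1) == 0
        self.regs = [0] * depth  # storage registers

    def write(self, addr: int, data: int) -> None:
        # Models the clocked write port: only this method changes the registers.
        self.regs[addr] = data

    def read(self, addr: int) -> int:
        # Combinational read: each address bit drives one level of 2:1 muxes.
        # There is no register between addr and the returned data.
        level = list(self.regs)
        bit = 0
        while len(level) > 1:
            sel = (addr >> bit) & 1
            level = [mux2(sel, level[i], level[i + 1])
                     for i in range(0, len(level), 2)]
            bit += 1
        return level[0]


mem = RegMuxMemory(4)
mem.write(2, 0xAB)
mem.write(3, 0xCD)
values = [mem.read(a) for a in range(4)]
```

The write port only changes what the registers hold. The path from `addr` to the read data is still nothing but muxes, which is why adding a write port leaves the address-to-data feedback path combinational.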
DaKnig has quit [Ping timeout: 265 seconds]
<slan>
thanks, I see a red loop now :) Yes, I thought that ROM synthesis could generate only LUTs
<slan>
Thus the write port, to "force" regs
<agg>
there's a second loop too, added it in green to that page
<slan>
Actually the red loop is broken by the top MUX feeding insn from either the cache or the ROM
<agg>
(picked the second memory arbitrarily, there's a loop through each)
<agg>
it doesn't matter, the mux is still a combinatorial circuit
<slan>
Oh
<slan>
There is a loop, but it's stable. Right. Got it
<slan>
And it's "broken" next cycle to give the result I'm expecting
<agg>
i would usually think of 'breaking' a combinatorial path to mean inserting a register
<agg>
rather than anything to do with its runtime state
<slan>
Hmm, I guess I have to conclude that it's not possible to get CPI=1 when mem access is needed, even if accessing different ROMs
* slan
is a bit sad
<slan>
Note: the current implementation in my repo (FSM-based) is working fine and I get 1 CPI if accessing different ROMs. But the FETCH is forced to a specific memory (1 in my case)
<agg>
if you had a separate read port and data port you'd be fine
<agg>
even if both were a small cache that read the same larger synchronous memory or whatever
<slan>
agree
<agg>
more typically you'd pipeline it, in which case you're also fine and can still get 1cpi
<slan>
but I would have to solve the "unified address space" at some point, right
<agg>
sure but that's easier to solve with synchronous memories
<slan>
Not sure how pipelining helps, I'll have 1 fetch and 1 MEM access at the same time anyways, no?
<slan>
I see: keep it comb with separate (!=) address spaces for cache, and after that, since I have to deal with latency anyway, it's easy to serialize
<agg>
but they can both go to a synchronous memory that returns its result one cycle later
<agg>
(with pipelining)
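[Editor's note] A rough behavioral sketch of the point about synchronous memories: the read data is registered, so it appears one cycle after the address, but a new address can still be issued every cycle. This is plain Python, not nMigen, and `SyncMemory` and its `tick` method are hypothetical names used only for this illustration.

```python
from typing import Iterable, List


class SyncMemory:
    """Read-only memory with a registered output (one cycle of read latency)."""

    def __init__(self, contents: Iterable[int]) -> None:
        self.contents: List[int] = list(contents)
        self.rdata = 0  # output register, assumed to reset to 0

    def tick(self, addr: int) -> int:
        """Model one clock edge.

        Returns the data that was visible on the output before the edge,
        then latches contents[addr] so it becomes visible in the next cycle.
        """
        visible = self.rdata
        self.rdata = self.contents[addr]
        return visible


rom = SyncMemory([10, 11, 12, 13])

# Issue one address per cycle, back to back.
seen = [rom.tick(addr) for addr in [0, 1, 2, 3, 0]]
```

After the first cycle (which shows the reset value), one result arrives per cycle, each belonging to the address issued in the previous cycle. Because the output register sits between the address and the data, a feedback path from data to address now goes through a register, and the throughput of one access per cycle is preserved once the pipeline is full.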
<slan>
ah yes indeed
<slan>
Well then, here goes my next adventure :)
<slan>
Really appreciate the help and discussion. Thank you for your patience, I've learned a lot... and it's only the beginning!