<cr1901_modern>
Thinking about this again: https://irclog.whitequark.org/m-labs/2016-10-10#1476058862-1476065341; I may still buy that RISC-V micro dev board. Quite frankly, I don't want to use closed-arch ARM for micro projects anymore. But, I must admit, 32-bit micros have given me the easiest time I've had w/ embedded projects.
<rqou>
AVR is quite sane for an 8-bit micro
<rqou>
but yes, 32-bit micros tend to be much easier
fengling has joined #m-labs
fengling has quit [Ping timeout: 268 seconds]
kuldeep has joined #m-labs
sandeepkr has quit [Quit: Leaving]
<sb0>
whitequark, bidirectional drtio appears to work (with the si5324 clock recovery and switching)
<sb0>
i can read out the remote fifo levels
<rjo>
sb0: your attr changes in migen broke the simulation. before MultiRegs worked fine. now they fail because of missing attr_translate entries.
<rjo>
sb0: also, FullMemoryWE() does not play nice with simulation because it leaves around _MemoryPort, or its transform_fragment() is called either too early or too late w.r.t. MemoryToArray().
fengling has joined #m-labs
<sb0>
what simulation? I've just run many drtio simulations that use MultiReg intensively without issues
<sb0>
you mean verilog exports?
fengling has quit [Ping timeout: 268 seconds]
<sb0>
whitequark, so we are going to have one rtio core and several drtio cores sharing the same kernel CPU interface
<sb0>
should we connect them traditionally, i.e. one base address for each core, or multiplex in gateware based on the channel number?
<sb0>
#1 requires nice and fast software support
<rjo>
sb0: yes. attr broke verilog convert
<rjo>
sb0: please don't design it like that. drtio should be transparent.
<rjo>
sb0: mux by channel number is "the drtio switch".
<rjo>
the root switch should also be just a switch. i don't see why it needs to be special.
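A minimal sketch of what "mux by channel number" could look like, written in software rather than gateware for readability. The `select_port` helper and the `port_bases` layout are hypothetical and not taken from ARTIQ or migen; the assumption is that each port serves one contiguous range of channel numbers.

```python
from bisect import bisect_right


def select_port(channel, port_bases):
    """Return the index of the port whose channel range contains `channel`.

    `port_bases` is a sorted list holding the first channel number served by
    each port (assumed layout: port i serves port_bases[i] up to, but not
    including, port_bases[i + 1]; the last port has no upper bound here).
    """
    if not port_bases or channel < port_bases[0]:
        raise ValueError("channel is below the first port's range")
    # bisect_right counts how many bases are <= channel; the last of those
    # bases belongs to the port that owns the channel.
    return bisect_right(port_bases, channel) - 1


# Assumed example: local rtio core at base 0, two drtio links at 16 and 48.
bases = [0, 16, 48]
for ch in (5, 16, 100):
    print(ch, "-> port", select_port(ch, bases))
```

With this arrangement the kernel CPU only ever sees channel numbers; whether a channel is local or remote is decided by the lookup, which is the sense in which the root would be "just a switch".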
<sb0>
design it like what?
<sb0>
multiple base addresses?
<sb0>
I wonder how important switches actually are. a well-designed switch will still add 100-200ns of latency, and no one seems to be interested in them
<GitHub26>
[migen] sbourdeauducq pushed 2 new commits to master: https://git.io/vXn1H
<GitHub26>
migen/master 749704b Sebastien Bourdeauducq: genlib/cdc: use no_retiming in BusSynchronizer
<GitHub26>
migen/master 6f0be15 Sebastien Bourdeauducq: fhdl/verilog: dummy translation of signal attributes by default
<sb0>
rjo, so the default verilog.convert will now spit out unrecognized attributes as is instead of erroring
<rjo>
sb0: to me the switches are extremely important. they are the ones supporting the bulk of the enumeration of the tree. every metlino needs one (root or satellite). every sayma needs one if we want to use the sfp or sata ports there.
<rjo>
sb0: they add one transceiver round trip of latency.
<rjo>
sb0: why do you think 100-200ns of latency is relevant?
<sb0>
rjo, see that email about JESD204 for SC qubits
<sb0>
plus twice the full latency slows down the CPU when a FIFO level request is made
<rjo>
sb0: that's not relevant here.
<sb0>
enumeration of the tree - is there a tree?
<rjo>
sb0: 200ns << 1µs
<sb0>
it all adds up
<rjo>
sb0: why is there not a tree? you have the root metlino, its local channels, then multiple satellite metlinos connected, they have local channels and multiple saymas connected. they have local channels plus various other things connected to the saymas over drtio.
<sb0>
yes, but who's going to have multiple metlinos like this?
<rjo>
sb0: yes. but that latency is given. you can't beat it.
<sb0>
it's certainly a cute idea, but is it practical?
<rjo>
sb0: sure. otherwise nobody would have wanted the sfp and sata links on sayma.
<rjo>
sb0: it's not like writing that switch would be terribly hard. it's pretty easy.
<rjo>
also for distributed dma i see the switch as crucial.
fengling has quit [Ping timeout: 268 seconds]
<sb0>
what does distributed dma have to do with switches?
<rjo>
it seems natural to link the two. i would support ddma at every switch.
<rjo>
sb0: what about that FullMemoryWE() issue?
<sb0>
how do you propose linking them?
<sb0>
ddma is an upper-layer protocol I'd think
<sb0>
which doesn't know about switches
<rjo>
it needs to know about the hierarchy. a remote dma engine can only inject/receive to channels further down.
<rjo>
it also nicely ties in with the enumeration. since the channels are only at fixed offset from each switch, the channel numbers in the dma data can be generated on the fly.
<rjo>
basically, what i would do is: you enter a dma context where every switch intercepts the events and stores them unless it knows that it can pass them to a dma capable switch further down.
<sb0>
oh you actually want to send from the DRAM of one card to the PHYs of another
<sb0>
why not just support local DRAM?
<rjo>
you want to require that DMA is either a) at the root metlino or b) local to the (remote) channel?
<sb0>
yes
<rjo>
i would make it independent of where it is. seems like a no-brainer to me. the arbitration needs to happen anyway. i would just trickle the dma data down the fabric until it hits the last dma capable switch.
<sb0>
considering the price of DRAM, is there any real-world advantage to supporting those configurations?
<rjo>
do we want to require dram on kasli as well?
<sb0>
I'm still not convinced about switches, so those kasli would just go to the metlino and it's dma a)
<sb0>
otherwise, potentially
<rjo>
i'd be worried about handling all that channel metadata in software. where its dma engine sits, which drtio port it is on, ...
<sb0>
it's easier than in gateware. how do you know if it's the last DMA capable switch?
<rjo>
you determine whether a port on a dma capable switch is dma capable by itself during enumeration.
<sb0>
so the enumeration needs to be done in gateware then?
<rjo>
the switches need (AFAICT now) to have two pieces of information per port. the channel number offset they are supposed to apply, and the dma capability.
<rjo>
i would let the software traverse the tree by just building the channel offset list simultaneously with doing aux packets on every port for enumeration.
<rjo>
so you ping the first port and you get a bit of information back: how many child ports "it" has. if it's a switch, you ping its first port and ask again.
<sb0>
well the comparisons can be done in parallel
<sb0>
so it's not too bad
<rjo>
yes.
<rjo>
... and by doing that you know the channel number offset of the "second" port. and you write that to the switch so that the switch can cut-through and subtract that offset from the outgoing packets.
<rjo>
... there must be some CS lingo/terminology for this kind of algorithm to describe it better.
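The traversal rjo describes resembles a depth-first, pre-order numbering of the tree. Below is a rough sketch of that idea under several assumptions: the topology is an in-memory `Node` tree instead of being discovered with aux packets, and the names `Node`, `enumerate_tree`, `route`, and the board names in the example are all hypothetical. Each switch stores one offset per port, relative to its own channel numbering, and subtracts it when forwarding.

```python
class Node:
    """A board with some local channels and zero or more downstream ports."""

    def __init__(self, name, local_channels, children=()):
        self.name = name
        self.local_channels = local_channels
        self.children = list(children)
        self.port_offsets = []  # filled in by enumerate_tree()


def enumerate_tree(node):
    """Depth-first walk that assigns a channel offset to every port.

    Returns the total number of channels at and below `node`. Offsets are
    relative to `node`, so a subtree never needs to know where it sits.
    """
    total = node.local_channels  # local channels come first
    node.port_offsets = []
    for child in node.children:
        node.port_offsets.append(total)  # first channel behind this port
        total += enumerate_tree(child)
    return total


def route(node, channel):
    """Follow `channel` down the tree, subtracting the offset at each hop."""
    # Scan from the highest port base downwards.
    for i in reversed(range(len(node.children))):
        if channel >= node.port_offsets[i]:
            return route(node.children[i], channel - node.port_offsets[i])
    if channel >= node.local_channels:
        raise ValueError("channel number outside the enumerated tree")
    return node.name, channel


# Assumed example topology, loosely following the metlino/sayma discussion.
sayma_a = Node("sayma_a", 8)
sayma_b = Node("sayma_b", 8)
satellite = Node("metlino_sat", 2, [sayma_b])
root = Node("metlino_root", 4, [sayma_a, satellite])

print(enumerate_tree(root))   # total channel count
print(root.port_offsets)      # offsets written to the root switch
print(route(root, 14))        # (board name, local channel on that board)
```

Because every offset is relative to its own switch, moving a subtree to a different port only changes the offset table of the switch it plugs into.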
fengling has joined #m-labs
fengling has quit [Ping timeout: 268 seconds]
fengling has joined #m-labs
fengling has quit [Ping timeout: 268 seconds]
<GitHub71>
[artiq] sbourdeauducq pushed 1 new commit to master: https://git.io/vXcmV