sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
whitequark has joined #m-labs
fengling has joined #m-labs
fengling_ has joined #m-labs
fengling has quit [Ping timeout: 268 seconds]
<GitHub85> [artiq] klickverbot opened pull request #606: compiler: Fix break/continue targets in loop else blocks (master...loop-else-continue) https://git.io/vX8q5
mumptai has quit [Ping timeout: 250 seconds]
mumptai has joined #m-labs
<sb0> whitequark, that pulls the distro rustc as a dependency. are you recommending that two rustcs be installed?
<sb0> wow, the online microsoft onenote and drive apps are remarkably unusable
<sb0> slow, crashy, buggy
<sb0> was the Windows ME team on the job?
<sb0> rjo, if we don't have a "transparent root switch" but independent DRTIO cores on a crossbar bus instead, then DMA can use several backplane links at once
<sb0> rjo, seen this? https://github.com/nasa/openmct
<cr1901_modern> sb0: whitequark got K-lined again
<sb0> he's in the channel right now
<cr1901_modern> Oh... yes, they are. I hope I wake up soon
<sb0> one thing that could be interesting for DMA is to reorder the buffer a bit, to make sure we don't block on a full FIFO with timestamps far in the future only to get underflows on other channels
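A toy illustration of that reordering idea (not ARTIQ code; the tuple layout and function name are assumptions): emitting recorded events in global timestamp order keeps one channel's far-future events from holding up near-term events on other channels.

    # toy sketch: events recorded as (timestamp, channel, address, data)
    def reorder_dma_buffer(events):
        # emit events in global timestamp order so a channel whose FIFO is
        # full of far-future events does not delay other channels' events
        return sorted(events, key=lambda ev: ev[0])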
rohitksingh has joined #m-labs
rohitksingh has quit [Quit: Leaving.]
<GitHub1> [migen] sbourdeauducq pushed 1 new commit to master: https://git.io/vX8lq
<GitHub1> migen/master b94d1f5 Sebastien Bourdeauducq: fhdl/simplify: remove stale MemoryPorts. Closes #49
<bb-m-labs> build #111 of migen is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/migen/builds/111
<bb-m-labs> build #167 of misoc is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/misoc/builds/167
<GitHub12> [migen] sbourdeauducq pushed 1 new commit to master: https://git.io/vX8lZ
<GitHub12> migen/master 9228a74 Sebastien Bourdeauducq: build: replace mkdir_noerror with os.makedirs. Closes #47
<bb-m-labs> build #112 of migen is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/migen/builds/112
<bb-m-labs> build #168 of misoc is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/misoc/builds/168
<sb0> _florent_, when you have a bunch of transceivers with their TX buffer disabled, do you know how much skew you get on the TXOUTCLK's?
<sb0> transceivers all clocked from the same source of course
<sb0> is it the same stupid design as the receiver where they simply divide the bit clock and you cannot control the divider? (well on 7-series they tried, but failed)
<whitequark> sb0: yes, it is perfectly fine to have two rustcs installed
<whitequark> there's no harm in it and it saves writing installation instructions for cargo
<sb0> oh, do I read that right that Xilinx did things correctly for once and the "phase alignment circuit" will align TXOUTCLK with the reference clock?
<bb-m-labs> build #1059 of artiq is complete: Failure [failed python_unittest] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/1059
<sb0> _florent_, if that's the case, you probably don't actually need elastic buffers at all for JESD, though they absorb unspecified xilinx timing variations
<GitHub163> [artiq] whitequark pushed 1 new commit to master: https://git.io/vX883
<GitHub112> [artiq] whitequark closed pull request #606: compiler: Fix break/continue targets in loop else blocks (master...loop-else-continue) https://git.io/vX8q5
<GitHub163> artiq/master 7dcc987 David Nadlinger: compiler: Fix break/continue targets in loop else blocks...
<whitequark> sb0: I'm going to use the 1st kc705
<bb-m-labs> build #162 of artiq-board is complete: Exception [exception interrupted] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/162
<bb-m-labs> build #1060 of artiq is complete: Failure [failed] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/1060
<sb0> ok
<GitHub142> [artiq] sbourdeauducq pushed 1 new commit to release-2: https://git.io/vX88K
<GitHub142> artiq/release-2 4c2e921 David Nadlinger: compiler: Fix break/continue targets in loop else blocks...
mumptai_ has joined #m-labs
<whitequark> sb0: you were right about test_pulse_rate_dds
<whitequark> 3.0 no longer hoists everything it should out of the loop
<whitequark> well, this should be an easy fix at least. though i'm not yet sure why this happened...
mumptai_ has quit [Quit: Verlassend]
<sb0> _florent_, yeah, you shouldn't need elastic buffers. the somewhat obscure "Using TX Buffer Bypass in Multi-Lane Manual Mode" procedure should fix you up.
_whitelogger has joined #m-labs
rohitksingh has joined #m-labs
<sb0> whitequark, why is rtio.c not in rust?
<sb0> are there issues with rtio_log?
<sb0> whitequark, why is "now" a global symbol since all kernel API calls take a timestamp parameter?
<sb0> couldn't you save load/stores by trying to keep it in a register when calling e.g. rtio_output()?
<sb0> (repeatedly, with delay()'s)
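An illustrative kernel showing the pattern in question (device names and counts are made up): each pulse()/delay() advances the implicit "now" cursor, and the question is whether the compiler can keep it in a register across the repeated rtio_output calls instead of reloading the global each time.

    from artiq.experiment import *

    class PulseTrain(EnvExperiment):
        def build(self):
            self.setattr_device("core")
            self.setattr_device("ttl0")

        @kernel
        def run(self):
            self.core.break_realtime()
            for _ in range(1000):
                # each pulse()/delay() pair loads and stores the "now"
                # timeline cursor around the underlying rtio_output call
                self.ttl0.pulse(1*us)
                delay(1*us)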
rohitksingh has quit [Quit: Leaving.]
<rjo> sb0: that would be a minor argument to me. it's irrelevant with distributed dma.
<rjo> sb0: i had seen that a while back when browsing for timeseries interfaces. looks nice!
<rjo> sb0: that reordering should be done by the yet-to-come true "with parallel" afaict.
<sb0> doing it dynamically is slow, though it is more acceptable when filling a dma buffer
<sb0> how is it a minor argument? 60+ Gbps DRAM bandwidth, 4 Gbps per link
rjo has quit [Read error: Connection reset by peer]
<sb0> how does your switch-based scheme handle the local regular-RTIO TTLs on the metlino?
<sb0> also, I'm not particularly satisfied with the artiq speed/latency, and those switches make it worse
rjo has joined #m-labs
<rjo> sb0: the data needs to be generated anyway. why not generate it in a smart way. that's also required for non-dma operation. and i don't see how you want to reorder a dma segment while it is being executed. possibly with multiple dma segments executing. possibly with arbitration of non-dma traffic.
<rjo> it is a minor argument because the drtio links in question will have dma at the downstream end. there is no need to do dma at the upstream side as well.
<rjo> the current rtio latency is your design, right? why not fix that now?
<rjo> the switches are cut through. they don't make it worse. i explained that last time.
<rjo> how do you want to handle the tedious amounts of to-be-hardcoded metadata about where the channels are, how many ports there are?
<rjo> how do you want to handle trees that are more than one level deep?
<sb0> they make it worse by at least 100ns-200ns, even with cut-through
<sb0> just with the transceiver, 8b10b etc.
<rjo> your plan seems to make the sfp and sata ports on sayma unusable for drtio. it also precludes multi-rack and more than one metlino.
<sb0> yes and? we can do that in due time
<rjo> the switches don't make it worse. the hardware does. they make optimal use of the hardware.
<sb0> if that's ever needed, that is
<sb0> bah, that's a moot point
<sb0> a switch will add 100-200ns to the latency seen by the user
<sb0> period
<rjo> it's not. if you need to get from the root metlino to another metlino and then to a sayma, you need a switch. the latency is already there.
<rjo> compared to what? no switch means the entire thing doesn't work.
<rjo> and how do you think that latency affects stuff? it's only relevant if the CPU is in the loop.
<rjo> making drtio non-transparent will mean a shitload of compile-time constants and configuration files (to use your style of complaining). everybody using this will complain about not having a configuration file for their hardware and being unable to derive one.
<rjo> i am fine with having the dma access the rtio fifos and drtio links "behind the local switch" so that it can fully saturate them. for ddma on sayma that would also be needed.
<rjo> so basically, on a metlino or sayma, the dma and the switch would compete for every rtio channel or drtio port. the dma engines and the switch would be masters on the rtio crossbar and the rtio channels and drtio ports would be slaves.
<whitequark> sb0: yes, rtio_log cannot be written in rust
<whitequark> I can rewrite the codegen that invokes it though
<whitequark> just not done yet
<whitequark> re now: it doesn't necessarily make things better. it consumes one register that has to be saved and restored, and inhibits some optimizations
rjo has quit [Read error: Connection reset by peer]
<sb0> it works fine with a single crate and no switch
<sb0> the configuration for multi-drtio-core is just a port# to add in device_db. which corresponds to the sayma board number. how is that worse than some constant added to channel numbers to select a sayma?
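Two hypothetical device_db entries contrasting the schemes being argued about (the field names and numbers are illustrative, not the actual ARTIQ schema):

    device_db = {
        # per-core scheme: an explicit port argument selects the Sayma
        "ttl0": {"type": "local", "module": "artiq.coredevice.ttl",
                 "class": "TTLOut",
                 "arguments": {"channel": 3, "drtio_port": 1}},
        # additive scheme: the Sayma is selected by an offset folded into
        # the channel number itself
        "ttl0_additive": {"type": "local", "module": "artiq.coredevice.ttl",
                          "class": "TTLOut",
                          "arguments": {"channel": 64 + 3}},
    }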
rjo has joined #m-labs
rjo has quit [Read error: Connection reset by peer]
rjo has joined #m-labs
<rjo> sb0: i don't think it's wise to specifically design for and insist on such a limited interpretation of drtio where its shortcomings are already visible and where its re-write and replacement are programmed in.
<rjo> adding the drtio port number to the configuration is worse because it adds to the list of arguments for the rtio API. and it doesn't scale.
<sb0> the rewrite is what, 10%?
<sb0> there is also nothing in common between routing an RTIO command from the CPU to a transceiver link or a local RTIO core, and routing an RTIO packet
<sb0> the "transparent root switch" is totally different than the cut-through switch
<sb0> not even 10% actually, all the transceiver, link layer, etc. code stays the same, only the packet processing needs some adaptation
<sb0> a lot of things become more complicated with switches, e.g. rtio counter synchronization and return channels, but that's more adding code than rewriting
<rjo> the rewrite would touch so much all over the stack from the configuration, gui, moninj, core device api, gateware. i would be careful with saying it's just 10%. somebody might take your word for it.
<rjo> rtio counter synchronization does not become more complicated with switches.
<sb0> it does, you have to compute switch latency to determine the required underflow margin
<sb0> or measure it
<rjo> is that a rtio counter synchronization issue?
<rjo> the difference between the transparent root switch and the downstream switch is only in how it is driven. one is fed by the rtio api, the other by an upstream drtio link.
<sb0> it's linked to it, yes
<sb0> if you do it that way then it's not cut-through
<sb0> alternatively, the cpu could write the packet itself...
<sb0> but this doesn't solve the "request fifo space" issue, which is better done in gateware as the cpu is slow enough already
<sb0> anyway, I'm fine with encoding the port number into the channel number as you propose
<sb0> this packs the channel state BRAM more efficiently than separate cores
<rjo> maybe its better to break the terminology up: on the root metlino there is the rtio kernel api and the dma engines as masters to the rtio ports as slaves (a local rtio port for the local channels and remote drtio ports). downstream metlinos have a drtio switch receiving packets and cut-through switching them to downstream drtio ports or unpacking them to the local rtio port.
<sb0> the local RTIO (TTL on root/sole Metlino) could be selected with the channel number MSB or something like that
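A hypothetical sketch of the packing agreed on above (the offsets, widths and flag position are assumptions): each DRTIO port contributes a base offset to the channel number, and the local RTIO core on the root/sole Metlino is selected with a flag in the MSB.

    LOCAL_RTIO_FLAG = 1 << 24           # assumed MSB flag for the local core
    port_base = [0, 64, 128, 192]       # assumed per-port channel offsets

    def pack_channel(port, local_channel, local_rtio=False):
        if local_rtio:
            return LOCAL_RTIO_FLAG | local_channel
        return port_base[port] + local_channel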
<rjo> i wouldn't want to build the drtio packets on the cpu. that sounds inefficient.
<rjo> i fully agree that the status handling should be offloaded into the gateware as much as possible.
sandeepkr has quit [Remote host closed the connection]
<sb0> we'd be using one transceiver link at a time though
<rjo> ack the port/channel number packing. we could even consider also packing the address into that value...
<rjo> one link at a time in which case?
sandeepkr has joined #m-labs
<sb0> if you have a Metlino and 12 Saymas, only one of the 12 links at most will be active at any given time
<rjo> ... i mean that rtio address that is a few bits currently.
<rjo> when fed by the cpu. yes.
<sb0> when fed by dma too, in this case
<rjo> why?
<sb0> shared channel state BRAM
<sb0> among other things. pretty much everything is shared.
<rjo> hmm. i don't follow. you mean the bram of local rtio channels would be shared? or the bram in front of an egress drtio port is shared by all drtio ports?
<rjo> i do see the benefit of enabling the dma engine to dispatch more than one rtio event per cycle.
<sb0> the master DRTIO core has some BRAM that for each channel tracks the last timestamp (to detect sequence errors and fully empty FIFOs) and a pessimistic estimate of the FIFO level
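A minimal Migen sketch of that tracking memory (widths and field layout are assumptions): one word per packed channel number, holding the last timestamp and the pessimistic FIFO-space estimate.

    from migen import *

    class ChannelStateTracker(Module):
        def __init__(self, nchannels):
            self.channel = Signal(max=nchannels)
            self.last_timestamp = Signal(64)    # detects sequence errors
            self.fifo_space = Signal(16)        # pessimistic FIFO level estimate

            # one BRAM word per channel: {last_timestamp, fifo_space}
            mem = Memory(64 + 16, nchannels)
            port = mem.get_port(write_capable=True)
            self.specials += mem, port
            self.comb += [
                port.adr.eq(self.channel),
                self.fifo_space.eq(port.dat_r[:16]),
                self.last_timestamp.eq(port.dat_r[16:]),
            ]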
<rjo> right. that's only at the root, right? it would not be needed for d-dma.
<rjo> assuming we don't support the case of a satellite metlino doing the dma into downstream saymas/kaslis.
<rjo> i.e. a dma engine that feeds remote rtio channels would also need to track that stuff.
<sb0> yes
fengling_ has quit [Ping timeout: 268 seconds]
<rjo> i would be ok with that restriction (no intermediate dma and one-link-at-a-time) because i think that ddma (which can feed local rtio channels in parallel) will prevent it from becoming a bandwidth problem.
<rjo> otoh: can't that tracking bram still be in front of every egress drtio link? then the root metlino dma engine could blast multiple links.
<sb0> it can, but then you need one BRAM per link, and you lose the efficient packing
<sb0> right now the channel number (that contains the port number, with the additive scheme) can be an index into the BRAM
<sb0> with multiple BRAMs, each BRAM must have a fixed size that corresponds to the maximum number of channels on each port
<rjo> right. but the "single bram" would also have the same constraint. just in one point now.
<rjo> and less efficient packing.
<sb0> the single bram has more efficient packing
<rjo> yes.
<rjo> ack. i am undecided what's better here. a single tracking bram sounds good enough given ddma.
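Illustrative arithmetic for the packing trade-off above (the per-port channel counts are made up): with one shared BRAM the additive channel number indexes it directly, so its depth is just the total channel count; with one BRAM per link, each must be sized for the largest port.

    channels_per_port = [8, 64, 16, 4]   # assumed per-port channel counts

    single_bram_depth = sum(channels_per_port)                             # 92 entries
    per_link_bram_depth = len(channels_per_port) * max(channels_per_port)  # 4*64 = 256 entries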
sandeepkr has quit [Remote host closed the connection]
sandeepkr has joined #m-labs
rohitksingh has joined #m-labs
sandeepkr has quit [Remote host closed the connection]
sandeepkr has joined #m-labs
fengling_ has joined #m-labs
fengling_ has quit [Ping timeout: 268 seconds]
fengling_ has joined #m-labs
fengling_ has quit [Ping timeout: 268 seconds]
rohitksingh has quit [Quit: Leaving.]
fengling_ has joined #m-labs
fengling_ has quit [Ping timeout: 268 seconds]