sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
whitequark has joined #m-labs
fengling has joined #m-labs
fengling_ has joined #m-labs
fengling has quit [Ping timeout: 268 seconds]
<GitHub85> [artiq] klickverbot opened pull request #606: compiler: Fix break/continue targets in loop else blocks (master...loop-else-continue) https://git.io/vX8q5
mumptai has quit [Ping timeout: 250 seconds]
mumptai has joined #m-labs
<sb0> whitequark, that pulls the distro rustc as a dependency. are you recommending that two rustcs be installed?
<sb0> wow, the online microsoft onenote and drive apps are remarkably unusable
<sb0> slow, crashy, buggy
<sb0> was the Windows ME team on the job?
<sb0> rjo, if we don't have a "transparent root switch" but independent DRTIO cores on a crossbar bus instead, then DMA can use several backplane links at once
<sb0> rjo, seen this? https://github.com/nasa/openmct
<cr1901_modern> sb0: whitequark got K-lined again
<sb0> he's in the channel right now
<cr1901_modern> Oh... yes, they are. I hope I wake up soon
<sb0> one thing that could be interesting for DMA is to reorder the buffer a bit, to make sure we don't block on a full FIFO with timestamps far in the future only to get underflows on other channels
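A toy illustration of that reordering idea (not ARTIQ code; the tuple layout and function name are assumptions): emitting recorded events in global timestamp order keeps one channel's far-future events from holding up near-term events on other channels.

    # toy sketch: events recorded as (timestamp, channel, address, data)
    def reorder_dma_buffer(events):
        # emit events in global timestamp order so a channel whose FIFO is
        # full of far-future events does not delay other channels' events
        return sorted(events, key=lambda ev: ev[0])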
rohitksingh has joined #m-labs
rohitksingh has quit [Quit: Leaving.]
<GitHub1> [migen] sbourdeauducq pushed 1 new commit to master: https://git.io/vX8lq
<GitHub1> migen/master b94d1f5 Sebastien Bourdeauducq: fhdl/simplify: remove stale MemoryPorts. Closes #49
<bb-m-labs> build #111 of migen is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/migen/builds/111
<bb-m-labs> build #167 of misoc is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/misoc/builds/167
<GitHub12> [migen] sbourdeauducq pushed 1 new commit to master: https://git.io/vX8lZ
<GitHub12> migen/master 9228a74 Sebastien Bourdeauducq: build: replace mkdir_noerror with os.makedirs. Closes #47
<bb-m-labs> build #112 of migen is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/migen/builds/112
<bb-m-labs> build #168 of misoc is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/misoc/builds/168
<sb0> _florent_, when you have a bunch of transceivers with their TX buffer disabled, do you know how much skew you get on the TXOUTCLK's?
<sb0> transceivers all clocked from the same source of course
<sb0> is it the same stupid design as the receiver where they simply divide the bit clock and you cannot control the divider? (well on 7-series they tried, but failed)
<whitequark> sb0: yes, it is perfectly fine to have two rustcs installed
<whitequark> there's no harm in it and it saves writing installation instructions for cargo
<sb0> oh, do I read that right that Xilinx did things correctly for once and the "phase alignment circuit" will align TXOUTCLK with the reference clock?
<bb-m-labs> build #1059 of artiq is complete: Failure [failed python_unittest] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/1059
<sb0> _florent_, if that's the case, you probably don't actually need elastic buffers at all for JESD, though they absorb unspecified xilinx timing variations
<GitHub163> [artiq] whitequark pushed 1 new commit to master: https://git.io/vX883
<GitHub112> [artiq] whitequark closed pull request #606: compiler: Fix break/continue targets in loop else blocks (master...loop-else-continue) https://git.io/vX8q5
<GitHub163> artiq/master 7dcc987 David Nadlinger: compiler: Fix break/continue targets in loop else blocks...
<whitequark> sb0: I'm going to use the 1st kc705
<bb-m-labs> build #162 of artiq-board is complete: Exception [exception interrupted] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/162
<bb-m-labs> build #1060 of artiq is complete: Failure [failed] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/1060
<sb0> ok
<GitHub142> [artiq] sbourdeauducq pushed 1 new commit to release-2: https://git.io/vX88K
<GitHub142> artiq/release-2 4c2e921 David Nadlinger: compiler: Fix break/continue targets in loop else blocks...
mumptai_ has joined #m-labs
<whitequark> sb0: you were right about test_pulse_rate_dds
<whitequark> 3.0 no longer hoists everything it should out of the loop
<whitequark> well, this should be an easy fix at least. though i'm not yet sure why this happened...
mumptai_ has quit [Quit: Verlassend]
<sb0> _florent_, yeah, you shouldn't need elastic buffers. the somewhat obscure "Using TX Buffer Bypass in Multi-Lane Manual Mode" procedure should fix you up.
_whitelogger has joined #m-labs
rohitksingh has joined #m-labs
<sb0> whitequark, why is rtio.c not in rust?
<sb0> are there issues with rtio_log?
<sb0> whitequark, why is "now" a global symbol since all kernel API calls take a timestamp parameter?
<sb0> couldn't you save load/stores by trying to keep it in a register when calling e.g. rtio_output()?
<sb0> (repeatedly, with delay()'s)
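An illustrative kernel showing the pattern in question (device names and counts are made up): each pulse()/delay() advances the implicit "now" cursor, and the question is whether the compiler can keep it in a register across the repeated rtio_output calls instead of reloading the global each time.

    from artiq.experiment import *

    class PulseTrain(EnvExperiment):
        def build(self):
            self.setattr_device("core")
            self.setattr_device("ttl0")

        @kernel
        def run(self):
            self.core.break_realtime()
            for _ in range(1000):
                # each pulse()/delay() pair loads and stores the "now"
                # timeline cursor around the underlying rtio_output call
                self.ttl0.pulse(1*us)
                delay(1*us)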
rohitksingh has quit [Quit: Leaving.]
<rjo> sb0: that would be a minor argument to me. it's irrelevant with distributed dma.
<rjo> sb0: i had seen that a while back when browsing for timeseries interfaces. looks nice!
<rjo> sb0: that reordering should be done by the yet-to-come true "with parallel" afaict.
<sb0> doing it dynamically is slow, though it is more acceptable when filling a dma buffer
<sb0> how is it a minor argument? 60+ Gbps DRAM bandwidth, 4 Gbps per link
rjo has quit [Read error: Connection reset by peer]
<sb0> how does your switch-based scheme handle the local regular-RTIO TTLs on the metlino?
<sb0> also, I'm not particularly satisfied with the artiq speed/latency, and those switches make it worse
rjo has joined #m-labs
<rjo> sb0: the data needs to be generated anyway. why not generate it in a smart way. that's also required for non-dma operation. and i don't see how you want to reorder a dma segment while it is being executed. possibly with multiple dma segments executing. possibly with arbitration of non-dma traffic.
<rjo> it is a minor argument because the drtio links in question will have dma at the downstream end. there is no need to do dma at the upstream side as well.
<rjo> the current rtio latency is your design, right? why not fix that now?
<rjo> the switches are cut through. they don't make it worse. i explained that last time.
<rjo> how do you want to handle the tedious amounts of to-be-hardcoded metadata about where the channels are, how many ports there are?
<rjo> how do you want to handle trees that are more than one level deep?
<sb0> they make it worse by at least 100ns-200ns, even with cut-through
<sb0> just with the transceiver, 8b10b etc.
<rjo> your plan seems to make the sfp and sata ports on sayma unusable for drtio. it also precludes multi-rack and more than one metlino.
<sb0> yes and? we can do that in due time
<rjo> the switches don't make it worse. the hardware does. they make optimal use of the hardware.
<sb0> if that's ever needed, that is
<sb0> bah, that's a moot point
<sb0> a switch will add 100-200ns to the latency seen by the user
<sb0> period
<rjo> it's not. if you need to get from the root metlino to another metlino and then to a sayma, you need a switch. the latency is already there.
<rjo> compared to what? no switch means the entire thing doesn't work.
<rjo> and how do you think that latency affects stuff? it's only relevant if the CPU is in the loop.
<rjo> making drtio non-transparent will mean a shitload of compile-time constants and configuration files (to use your style of complaining). everybody using this will complain about not having a configuration file for their hardware and being unable to derive one.
<rjo> i am fine with having the dma access the rtio fifos and drtio links "behind the local switch" so that it can fully saturate them. for ddma on sayma that would also be needed.
<rjo> so basically, on a metlino or sayma, the dma and the switch would compete for every rtio channel or drtio port. the dma engines and the switch would be masters on the rtio crossbar and the rtio channels and drtio ports would be slaves.
<whitequark> sb0: yes, rtio_log cannot be written in rust
<whitequark> I can rewrite the codegen that invokes it though
<whitequark> just not done yet
<whitequark> re now: it doesn't necessarily make things better. it consumes one register that has to be saved and restored, and inhibits some optimizations
rjo has quit [Read error: Connection reset by peer]
<sb0> it works fine with a single crate and no switch
<sb0> the configuration for multi-drtio-core is just a port# to add in device_db. which corresponds to the sayma board number. how is that worse than some constant added to channel numbers to select a sayma?
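Two hypothetical device_db entries contrasting the schemes being argued about (the field names and numbers are illustrative, not the actual ARTIQ schema):

    device_db = {
        # per-core scheme: an explicit port argument selects the Sayma
        "ttl0": {"type": "local", "module": "artiq.coredevice.ttl",
                 "class": "TTLOut",
                 "arguments": {"channel": 3, "drtio_port": 1}},
        # additive scheme: the Sayma is selected by an offset folded into
        # the channel number itself
        "ttl0_additive": {"type": "local", "module": "artiq.coredevice.ttl",
                          "class": "TTLOut",
                          "arguments": {"channel": 64 + 3}},
    }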
rjo has joined #m-labs
rjo has quit [Read error: Connection reset by peer]
rjo has joined #m-labs
<rjo> sb0: i don't think it's wise to specifically design for and insist on such a limited interpretation of drtio where its shortcomings are already visible and where its re-write and replacement are programmed in.
<rjo> adding the drtio port number to the configuration is worse because it adds to the list of arguments for the rtio API. and it doesn't scale.
<sb0> the rewrite is what, 10%?
<sb0> there is also nothing in common between routing an RTIO command from the CPU to a transceiver link or a local RTIO core, and routing an RTIO packet
<sb0> the "transparent root switch" is totally different than the cut-through switch
<sb0> not even 10% actually, all the transceiver, link layer, etc. code stays the same, only the packet processing needs some adaptation
<sb0> a lot of things become more complicated with switches, e.g. rtio counter synchronization and return channels, but that's more adding code than rewriting
<rjo> the rewrite would touch so much all over the stack from the configuration, gui, moninj, core device api, gateware. i would be careful with saying it's just 10%. somebody might take your word for it.
<rjo> rtio counter synchronization does not become more complicated with switches.
<sb0> it does, you have to compute switch latency to determine the required underflow margin
<sb0> or measure it
<rjo> is that a rtio counter synchronization issue?
<rjo> the difference between the transparent root switch and the downstream switch is only in how it is driven. one is fed by the rtio api, the other by an upstream drtio link.
<sb0> it's linked to it, yes
<sb0> if you do it that way then it's not cut-through
<sb0> alternatively, the cpu could write the packet itself...
<sb0> but this doesn't solve the "request fifo space" issue, which is better done in gateware as the cpu is slow enough already
<sb0> anyway, I'm fine with encoding the port number into the channel number as you propose
<sb0> this packs the channel state BRAM more efficiently than separate cores
<rjo> maybe its better to break the terminology up: on the root metlino there is the rtio kernel api and the dma engines as masters to the rtio ports as slaves (a local rtio port for the local channels and remote drtio ports). downstream metlinos have a drtio switch receiving packets and cut-through switching them to downstream drtio ports or unpacking them to the local rtio port.
<sb0> the local RTIO (TTL on root/sole Metlino) could be selected with the channel number MSB or something like that
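A hypothetical sketch of the packing agreed on above (the offsets, widths and flag position are assumptions): each DRTIO port contributes a base offset to the channel number, and the local RTIO core on the root/sole Metlino is selected with a flag in the MSB.

    LOCAL_RTIO_FLAG = 1 << 24           # assumed MSB flag for the local core
    port_base = [0, 64, 128, 192]       # assumed per-port channel offsets

    def pack_channel(port, local_channel, local_rtio=False):
        if local_rtio:
            return LOCAL_RTIO_FLAG | local_channel
        return port_base[port] + local_channel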
<rjo> i wouldn't want to build the drtio packets on the cpu. that sounds inefficient.
<rjo> i fully agree that the status handling should be offloaded into the gateware as much as possible.
sandeepkr has quit [Remote host closed the connection]
<sb0> we'd be using one transceiver link at a time though
<rjo> ack the port/channel number packing. we could even consider also packing the address into that value...
<rjo> one link at a time in which case?
sandeepkr has joined #m-labs
<sb0> if you have a Metlino and 12 Saymas, only one of the 12 links at most will be active at any given time
<rjo> ... i mean that rtio address that is a few bits currently.
<rjo> when fed by the cpu. yes.
<sb0> when fed by dma too, in this case
<rjo> why?
<sb0> shared channel state BRAM
<sb0> among other things. pretty much everything is shared.
<rjo> hmm. i don't follow. you mean the bram of local rtio channels would be shared? or the bram in front of an egress drtio port is shared by all drtio ports?
<rjo> i do see the benefit of enabling the dma engine to dispatch more than one rtio event per cycle.
<sb0> the master DRTIO core has some BRAM that for each channel tracks the last timestamp (to detect sequence errors and fully empty FIFOs) and a pessimistic estimate of the FIFO level
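A minimal Migen sketch of that tracking memory (widths and field layout are assumptions): one word per packed channel number, holding the last timestamp and the pessimistic FIFO-space estimate.

    from migen import *

    class ChannelStateTracker(Module):
        def __init__(self, nchannels):
            self.channel = Signal(max=nchannels)
            self.last_timestamp = Signal(64)    # detects sequence errors
            self.fifo_space = Signal(16)        # pessimistic FIFO level estimate

            # one BRAM word per channel: {last_timestamp, fifo_space}
            mem = Memory(64 + 16, nchannels)
            port = mem.get_port(write_capable=True)
            self.specials += mem, port
            self.comb += [
                port.adr.eq(self.channel),
                self.fifo_space.eq(port.dat_r[:16]),
                self.last_timestamp.eq(port.dat_r[16:]),
            ]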
<rjo> right. that's only at the root, right? it would not be needed for d-dma.
<rjo> assuming we don't support the case of a satellite metlino doing the dma into downstream saymas/kaslis.
<rjo> i.e. a dma engine that feeds remote rtio channels would also need to track that stuff.
<sb0> yes
fengling_ has quit [Ping timeout: 268 seconds]
<rjo> i would be ok with that restriction (no intermediate dma and one-link-at-a-time) because i think that ddma (which can feed local rtio channels in parallel) will prevent it from becoming a bandwidth problem.
<rjo> otoh: can't that tracking bram still be in front of every egress drtio link? then the root metlino dma engine could blast multiple links.
<sb0> it can, but then you need one BRAM per link, and you lose the efficient packing
<sb0> right now the channel number (that contains the port number, with the additive scheme) can be an index into the BRAM
<sb0> with multiple BRAMs, each BRAM must have a fixed size that corresponds to the maximum number of channels on each port
<rjo> right. but the "single bram" would also have the same constraint. just in one point now.
<rjo> and less efficient packing.
<sb0> the single bram has more efficient packing
<rjo> yes.
<rjo> ack. i am undecided what's better here. a single tracking bram sounds good enough given ddma.
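Illustrative arithmetic for the packing trade-off above (the per-port channel counts are made up): with one shared BRAM the additive channel number indexes it directly, so its depth is just the total channel count; with one BRAM per link, each must be sized for the largest port.

    channels_per_port = [8, 64, 16, 4]   # assumed per-port channel counts

    single_bram_depth = sum(channels_per_port)                             # 92 entries
    per_link_bram_depth = len(channels_per_port) * max(channels_per_port)  # 4*64 = 256 entries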
sandeepkr has quit [Remote host closed the connection]
sandeepkr has joined #m-labs
rohitksingh has joined #m-labs
sandeepkr has quit [Remote host closed the connection]
sandeepkr has joined #m-labs
fengling_ has joined #m-labs
fengling_ has quit [Ping timeout: 268 seconds]
fengling_ has joined #m-labs
fengling_ has quit [Ping timeout: 268 seconds]
rohitksingh has quit [Quit: Leaving.]
fengling_ has joined #m-labs
fengling_ has quit [Ping timeout: 268 seconds]