<GitHub-m-labs>
[artiq] whitequark commented on issue #943: This is exceptionally (no pun intended) annoying to fix. We already take a copy of the exception itself but it's not easily possible to take a copy of data without an allocation of some sort. Maybe put the data into runtime (not kernel) memory... https://github.com/m-labs/artiq/issues/943#issuecomment-370110743
<sb0>
so the kernel module is there to provide a NIH socket interface?
<davidc__>
sb0: usually the GIGE kernel modules are to work around shitty prioritization / crappy userspace code
<davidc__>
er, GIGE vision
<davidc__>
basically, GIGE vision just dumps the frame to you over UDP. If you get all the packets, good for you. If not, your loss
<davidc__>
so for shitty devices, or programmers who write shitty code, it's easier to provide a kernel module that just grabs the frame when requested
<davidc__>
no idea if that's what this particular GIGE vision driver does, but I've seen similar for other machine vision cameras
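The "get all the packets or lose the frame" behaviour described above can be illustrated with a minimal reassembly sketch. It assumes a simplified, hypothetical packet model of `(block_id, packet_id, payload)` tuples with a known packet count per frame; the real GVSP leader/trailer packets and header layout are not modelled.

```python
from collections import defaultdict


def reassemble_frames(packets, packets_per_frame):
    """Group packets by block_id and keep only frames that arrived complete.

    packets: iterable of (block_id, packet_id, payload) tuples in a
    hypothetical simplified format; packet_id runs 0..packets_per_frame-1.
    Returns (complete, dropped): a dict block_id -> frame bytes, and a
    sorted list of block_ids that were missing at least one packet.
    """
    pending = defaultdict(dict)
    for block_id, packet_id, payload in packets:
        pending[block_id][packet_id] = payload  # arrival order does not matter

    complete, dropped = {}, []
    for block_id, parts in sorted(pending.items()):
        if all(i in parts for i in range(packets_per_frame)):
            complete[block_id] = b"".join(parts[i] for i in range(packets_per_frame))
        else:
            # Plain UDP: nothing retransmits the gap, so the whole frame is lost.
            dropped.append(block_id)
    return complete, dropped
```

For example, with packet `(2, 1, ...)` lost, `reassemble_frames([(1, 0, b"ab"), (1, 1, b"cd"), (2, 0, b"ef")], 2)` keeps frame 1 and reports frame 2 as dropped. A kernel module does not change this logic; it only reduces how often the receiving side is too slow to pick packets up.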
<sb0>
huh. why not a shared library, or subprocess?
<davidc__>
sb0: there are many questions with no good answers :)
<davidc__>
sb0: in all seriousness, probably because somebody wrote that library $X years ago for $Y shitty oversubscribed hardware
<davidc__>
and that's the way they've always done it. Or, it guarantees that valid frames are grabbed even under worst case loading conditions, regardless of whether the user tunes their code/system right
<davidc__>
so they get fewer support calls
<davidc__>
(really, I got no idea why they'd do it that way in particular, all I know is that it's not particularly uncommon)
<davidc__>
sb0: FWIW, I have a GIGE vision camera kicking around the lab if you need another GIGE vision device to test against, but it's probably not useful unless you are writing a universal GIGE vision driver
<whitequark>
pretty sure kernel can still drop packets with a module
<davidc__>
whitequark: sure it can, but depending on how you hook it in and on the system loading conditions (and on how well the userspace code is written), it might drop packets less
<davidc__>
to be clear, I'm not saying it's a good or valid design decision
<GitHub-m-labs>
[artiq] sbourdeauducq commented on issue #942: Planning to introduce a ``RTIOLinkError`` exception when attempting a RTIO operation that involves a link that is down. It cannot be precise (since we usually don't wait for feedback from the satellite devices for latency/throughput reasons) but it should catch most cases. https://github.com/m-labs/artiq/issues/942#issuecomment-370120046
<GitHub-m-labs>
[artiq] sbourdeauducq commented on issue #941: Yes, maybe the startup kernel can explicitly wait for all relevant links to be up. I propose introducing a separate API call to check for link status, which will be cleaner than attempting RTIO operations until ``RTIOLinkError`` (#942) is no longer raised. https://github.com/m-labs/artiq/issues/941#issuecomment-370120134
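A startup kernel that waits for its links before issuing RTIO operations could follow the polling loop below. This is a host-side Python sketch only: `link_is_up` is a hypothetical stand-in for the separate link-status API call proposed in #941, not an existing ARTIQ function, and the timeout behaviour is an assumption.

```python
import time


def wait_for_links(link_is_up, links, timeout=10.0, poll_interval=0.01):
    """Poll until every link in `links` reports up, or raise TimeoutError.

    link_is_up: hypothetical callable taking a link number, returning bool.
    Links are assumed to stay up once seen up; they are not re-checked.
    Returns the elapsed time in seconds.
    """
    start = time.monotonic()
    remaining = set(links)
    while remaining:
        # Keep only the links that are still reported down.
        remaining = {link for link in remaining if not link_is_up(link)}
        if not remaining:
            break
        if time.monotonic() - start > timeout:
            raise TimeoutError("links still down: {}".format(sorted(remaining)))
        time.sleep(poll_interval)
    return time.monotonic() - start
```

Checking status explicitly like this makes the "links not up yet" case a clear timeout instead of a series of failed RTIO operations.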
<GitHub19>
[smoltcp] whitequark commented on issue #174: I think the implementation you propose (with `Iterator<Ipv4Cidr>`) is too niche, I don't see a lot of uses for it. Since it can be freely implemented outside of smoltcp I don't think it should be included in smoltcp. https://github.com/m-labs/smoltcp/issues/174#issuecomment-370128369
<sb0>
whitequark, i didn't touch DMA. cache effect?
<rjo>
sb0: could you look at slave_fpga when you get a chance?
<sb0>
rjo, let me finish the two drtio issues that chris reported first
<rjo>
whitequark: was a firewall misconfig
<rjo>
sb0: ack. the positive slack with external clock thing is also mysterious.
<rjo>
i'll be afk today.
<rjo>
sb0: on slave_fpga, i also tested various pullups/downs, slew=fast on cclk, drive=16, driving done high (and checking serwb afterwards), various speed changes, slower cclk, traced the individual bits being sent, confirmed sync word bit order, tested just for fun byte-swapping the rtm bitstream, and compared with the u-boot and linux kernel xilinx slave serial code.
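The sync-word and byte-order checks listed above can be scripted against a bitstream image. The sketch assumes a raw binary image and the 7-series configuration sync word `0xAA995566`; it looks for the word as-is, with each byte bit-reversed, and with 16- and 32-bit byte swapping, to show which ordering a file actually uses. The function names are hypothetical.

```python
SYNC_WORD = bytes.fromhex("AA995566")  # 7-series configuration sync word


def reverse_bits(byte):
    """Reverse the bit order within one byte (MSB-first <-> LSB-first)."""
    return int("{:08b}".format(byte)[::-1], 2)


def sync_word_variants():
    """Byte patterns the sync word would appear as under common orderings."""
    w = SYNC_WORD
    return {
        "as-is": w,
        "bit-reversed bytes": bytes(reverse_bits(b) for b in w),
        "16-bit swapped": bytes([w[1], w[0], w[3], w[2]]),
        "32-bit swapped": w[::-1],
    }


def find_sync_word(image):
    """Return {ordering: first offset} for each ordering found in image."""
    hits = {}
    for name, pattern in sync_word_variants().items():
        offset = image.find(pattern)
        if offset >= 0:
            hits[name] = offset
    return hits
```

With a real file, `find_sync_word(open(path, "rb").read())` reports which ordering matches; a hit under anything other than "as-is" suggests the image needs converting before it is shifted out.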
<cjbe>
sb0, a thought: rather than reworking the board for a si5326, could we just turn up the si5324 bandwidth, then use an MMCM to phase shift the recovered clock that feeds the si5324
<cjbe>
as long as the si5324 lock b/w to the input clock, it will
<cjbe>
worst case requires a 150 MHz reference crystal bodged onto the clkin2 so we can use a higher PFD frequency and hence higher lock b/w
<cjbe>
So minimal hardware rework in the short term
<sb0>
if there is higher bandwidth (which is also what the 5326 provides), then there is no need for phase lock: just measure the skew and compensate for it with either an MMCM or (for the 5326) the skew adjust feature
<cjbe>
Also, if we decide to go for a TCVCXO and DAC, like White Rabbit, the 'only' gateware we change is replacing the interface to the MMCM with a DAC SPI interface
<sb0>
my understanding is the wander is due to a combination of low bandwidth + instability of the reference. input corrections to the wander would be outside the bandwidth.
<cjbe>
sb0: as the input-output phase stability is not specced on the si5326, I expect this to drift over time, hence the 'measure and correct the skew' operation will need to be repeated frequently - I would call this a phase lock
<sb0>
cjbe, what kind of wander do you get with the 5324 and a good quality 150MHz clkin2?
<cjbe>
sb0, yes - the input correction would be attenuated by the si bandwidth, but we could turn up the gain of the feedback and precompensate for this
<cjbe>
sb0, I have not measured that carefully - will measure that today
<sb0>
I'm afraid turning up the gain too much will lead to oscillations of the loop, or maybe problems with the MMCM
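The gain concern can be illustrated with the simplest possible model: a discrete-time loop that measures the residual phase error each step and applies a correction of `gain` times it. This is a toy model with no filter bandwidth and no measurement delay. It is not a model of the Si5324 or the MMCM, and it only shows why the correction gain is bounded. Since the error evolves as e[n+1] = (1 - gain) * e[n], the loop converges monotonically for 0 < gain < 1, overshoots and rings for 1 < gain < 2, and diverges for gain > 2.

```python
def simulate_phase_loop(initial_error, gain, steps):
    """Return the phase-error trajectory of a toy proportional correction loop.

    Each step measures the error e and applies a correction of -gain * e,
    so e[n+1] = (1 - gain) * e[n]. No bandwidth limit or delay is modelled.
    """
    errors = [initial_error]
    for _ in range(steps):
        e = errors[-1]
        errors.append(e - gain * e)
    return errors
```

For example, `simulate_phase_loop(100.0, 1.5, 3)` alternates in sign while shrinking, and a gain of 2.5 makes each correction overshoot by more than the previous error. Precompensating for in-band attenuation raises the effective gain, which moves the loop toward that region.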
<cjbe>
another possibility is to go straight to the White Rabbit TCVCXO + DAC solution - we could bodge this on pretty easily onto the clkin1 of the si chip, then run that in bypass mode
<cjbe>
sb0, possibly - I am not sure entirely what is going on inside the si to cause this in the first place
<cjbe>
sb0: just had a look at the si5324 phase offsets
<cjbe>
using a good clock on clkin2, looking at phase shift between this clock and the si5324 output (MMCX)
<cjbe>
using the default ARTIQ settings (PFD at 16 kHz, BWsel=3) I see phase wander at the ~10 Hz timescale with pk-pk of 4ns over a minute. Touching the si reference crystal gives many many cycles of phase shift
<cjbe>
using the PFD at 2 MHz with 540 Hz bandwidth (BWsel=4) I see a jitter stddev of 7ps, and a pk-pk over a minute of 75ps. Touching the reference crystal gives a pk-pk of ~150ps
<cjbe>
using the si in bypass mode, I see a jitter stddev of 8ps, and a pk-pk of 70ps
<cjbe>
and the jitter of my nice clock against itself (from a power splitter) I see a jitter of 8ps stddev and pk-pk 72ps
<cjbe>
(this is all using the stock Kasli, without a nicer si reference crystal)
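The figures above are quoted as a standard deviation and a peak-to-peak value over a capture. Given a series of phase-offset samples (for example exported from a scope's skew measurement; the export format is an assumption), both can be computed as below. Any sample values used with it here are illustrative, not cjbe's data.

```python
import statistics


def jitter_stats(phase_offsets_ps):
    """Return (stddev, peak_to_peak) of phase offsets given in picoseconds."""
    if len(phase_offsets_ps) < 2:
        raise ValueError("need at least two samples")
    # Population standard deviation over the whole capture.
    stddev = statistics.pstdev(phase_offsets_ps)
    pk_pk = max(phase_offsets_ps) - min(phase_offsets_ps)
    return stddev, pk_pk
```

Peak-to-peak grows with the number of samples, so pk-pk numbers are only comparable between captures of similar length (about a minute in the measurements above). The stddev is the more robust figure for comparing configurations.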
<cjbe>
so this is not consistent with my earlier measurements, where I stated that even at this higher bandwidth the si was not locking to the recovered clock properly
<sb0>
cjbe, okay. turns out I was using a 2MHz PFD and BWSEL=4 in the initial kc705 tests (at 62.5MHz)
<cjbe>
so with the master and satellite running at 2 MHz PFD and BWSEL=4 (using an external 150 MHz clock on both), the satellite si output has a good phase lock to the master si output - (6ps stddev, 57ps pk-pk)
<cjbe>
once everything is up, I can disconnect the external clock from both master and satellite, and everything still is phaselocked nicely (so no funny business going on)
<sb0>
alright! so it was basically a si5324 config error. what about the skew between power-ups?
<sb0>
in the kc705 tests we did that was constant, even though the DS says it's not
<cjbe>
If I disconnect and reconnect the fiber (so reset the DRTIO link) I see the skew varying
<cjbe>
it may be quantised, but it appears to vary over a full turn
<sb0>
okay. so we can add some simple FPGA calibration
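One possible form for that calibration assumes the skew can be observed by sampling one clock with a phase-shifted copy of the other (for example using MMCM dynamic phase shift). The phase is stepped through a full turn, the sampled bit is recorded at each step, and the 0 -> 1 transition is taken as the skew. `sample_at` is a hypothetical hook for the hardware measurement, and `steps_per_turn` depends on the actual MMCM configuration, which is not specified here.

```python
def find_skew(sample_at, steps_per_turn):
    """Locate the 0 -> 1 edge of the sampled clock over one full phase turn.

    sample_at: hypothetical callable mapping a phase step index to the
    sampled bit (0 or 1). The search is circular (samples[-1] precedes
    samples[0]), so an edge at the wrap-around point is also found.
    Returns the step index of the first 1 following a 0.
    """
    samples = [sample_at(step) for step in range(steps_per_turn)]
    for step in range(steps_per_turn):
        if samples[step - 1] == 0 and samples[step] == 1:
            return step
    raise RuntimeError("no 0->1 edge found: clock not toggling or sampling failed")
```

Near the edge a single sample is noisy (metastability, jitter), so a real version would take several samples per step and use a majority vote before searching for the transition. Repeating the calibration after every link reset would absorb the power-up skew variation reported above.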
<cjbe>
indeed
<sb0>
what were your 150MHz sources?
<cjbe>
the only remaining issue is how to get the PFD frequency up - we cannot do this using the si in free run mode with a ~114.3 MHz crystal.
<sb0>
OCXO-grade?
<cjbe>
But we could generate a 150 MHz / 125 MHz clock from a MMCM on the FPGA and switch it into the si input (instead of rtio_rx0)
<cjbe>
I am using a synth and a splitter to generate the two 150 MHz external references
<sb0>
well if the references are from the same oscillator, it's cheating
<cjbe>
Or we could replace the ~114.3 MHz crystal with (say) 125 MHz to make nice numbers
<cjbe>
sb0, I need the external clock to start up the si - I can start up the master, then disconnect the external clock, and start up the slave and everything still works
<sb0>
oh ok, I see
<sb0>
it still has the original crystal as reference
<cjbe>
yep
<sb0>
okay, good. there's hope we can get the hardware to <<1ns with just gateware
<cjbe>
indeed - hallelujah
<GitHub125>
[smoltcp] dlrobertson opened pull request #175: Add has_solicited_node to EthernetInterface (master...solicited_node) https://github.com/m-labs/smoltcp/pull/175
<GitHub100>
[smoltcp] dlrobertson commented on issue #175: Adding IPv6 address resolution to `EthernetInterface` will take quite a bit of work. I'll try my best to break it down into small bite-sized chunks like this, when possible. https://github.com/m-labs/smoltcp/pull/175#issuecomment-370162840
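For reference, the solicited-node multicast address of an IPv6 unicast address is the prefix `ff02::1:ff00:0/104` followed by the low 24 bits of that address (RFC 4291). The Python sketch below computes it and checks whether a destination address is the solicited-node address of any configured address. Interpreting `has_solicited_node` as "is this one of ours" is an assumption about the PR's intent, and the Python names are illustrative, not smoltcp's API.

```python
import ipaddress

# ff02::1:ff00:0 has its low 24 bits clear, so it can be OR-ed with them.
SOLICITED_NODE_PREFIX = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node(addr):
    """Return the solicited-node multicast address for an IPv6 unicast address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_PREFIX | low24)


def has_solicited_node(configured_addrs, dst):
    """True if dst is the solicited-node address of one of configured_addrs."""
    dst = ipaddress.IPv6Address(dst)
    return any(solicited_node(a) == dst for a in configured_addrs)
```

For example, `fe80::2aa:ff:fe28:9c5a` maps to `ff02::1:ff28:9c5a`. An interface needs this check before it can accept Neighbor Solicitations for its own addresses, which is one of the pieces of IPv6 address resolution.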