sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
<GitHub168>
[smoltcp] jhwgh1968 commented on issue #106: I've been thinking about using smoltcp for a project of mine. If no one else has started on this task yet, I could give it a try over the next week or two. https://github.com/m-labs/smoltcp/issues/106#issuecomment-394203335
<GitHub-m-labs>
[artiq] hartytp commented on issue #1043: @sbourdeauducq my question was addressed to @jbqubit (see the text I quoted) as he is still using 2017.4, where Sayma didn't meet timing for me... https://github.com/m-labs/artiq/issues/1043#issuecomment-394269914
<hartytp>
maybe even better would be to just leave it shutdown?
<_florent_>
change that to write(0x1, 0x48);
<GitHub-m-labs>
[artiq] cjbe opened pull request #1047: correct documented siphaser VCO frequency [NFC] (master...siphaserdoc) https://github.com/m-labs/artiq/pull/1047
<_florent_>
if it stops crashing, maybe something we can try next is to enable the outputs only when the rest of the configuration has been done
<hartytp>
yes, I did implement that before (change the shutdown function to mute and then only call unmute after the init)
<hartytp>
it didn't help then, but we've fixed a few things since, so maybe now it will do something
<hartytp>
I'll try
<_florent_>
ok thanks
<hartytp>
but, might be shutting the proverbial stable door after the horse has kicked the shit out of our FPGA
<hartytp>
i.e. we can't mute the 7043 until after boot has been completed, so maybe enough time for it to cause memory corruption, etc that only shows up later on?
<_florent_>
ah yes indeed...
<hartytp>
hmmm what about using the reset line
<_florent_>
now that you are able to see the broadband noise, do you see if we only have it on the first start after power on, or if we have it at each restart?
<hartytp>
each restart
<hartytp>
(each time I call artiq_flash ... load)
<_florent_>
ok interesting, i was thinking it was only at the first power on
<hartytp>
nope
<hartytp>
I guess that loading the RTM FPGA resets things
<_florent_>
ok
<hartytp>
(regulators?)
<hartytp>
do you remember how the resets work
<hartytp>
?
<hartytp>
looking on schematic sheet 9, it looks like the Si5324 and HMC7043 reset lines are tied together
<hartytp>
I guess we don't need the SI5324 atm, so I can hold both in reset and see what happens
<_florent_>
no, but i'm going to look too.
<hartytp>
yes, it resets both chips
<hartytp>
okay, I'll add a CSR to control the HMC7043 reset and see what happens if I keep it disabled until after HMC830 boot...
<_florent_>
yes we can do that
<_florent_>
do you want me to add this?
<hartytp>
I think I'm fine doing it
<_florent_>
ok good
<hartytp>
remind me: on the AMC, where do you disable the inputs from the HMC7043 during boot?
<_florent_>
ah, i was also thinking about disabling this feature to be sure we really eliminate the broadband noise :)
<GitHub-m-labs>
artiq/master 07d4145 Chris Ballance: correct documented siphaser VCO frequency [NFC]
<bb-m-labs>
build #2418 of artiq is complete: Failure [failed python_unittest_2] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/2418 blamelist: Florent Kermarrec <florent@enjoy-digital.fr>
<GitHub-m-labs>
[artiq] jbqubit commented on issue #1043: > FWIW, with 2018.1 I've run two different Sayma boards (after the various fixes for bugs like SDRAM, HMC7043 noise, 1V8, etc.) continuously for days without any bug of this sort.... https://github.com/m-labs/artiq/issues/1043#issuecomment-394346575
<hartytp>
(building without sawg because life is too short)
<_florent_>
hartytp: yes, if rtm is already built, you can probably also start doing some tests without the input buffers always enabled on the AMC.
<GitHub-m-labs>
[artiq] sbourdeauducq commented on issue #1043: Always running the same kernel that uses SAWG, but there was Ethernet traffic processed by the comms CPU due to TCP keepalive and network broadcasts.... https://github.com/m-labs/artiq/issues/1043#issuecomment-394348143
<hartytp>
okay either serwb init fails or hmc830 acks 0
<bb-m-labs>
build #2419 of artiq is complete: Failure [failed python_unittest_2] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/2419 blamelist: Chris Ballance <chris.ballance@physics.ox.ac.uk>
<_florent_>
and the hmc7043 rst is floating or connected to the trace?
<hartytp>
was grounded
<hartytp>
now tied to 3v3
<hartytp>
will tell you what happens when I get JTAG to stop playing silly buggers
<hartytp>
okay, tying the reset high does stop the noise
<_florent_>
ok, and are you able to connect it to the trace or is it complicated?
<hartytp>
hmc830 is just not acking
<hartytp>
I saw this once before where it stopped responding
<hartytp>
I left it for an hour and it started again
<hartytp>
may be some thermal thing...
<hartytp>
**shudder**
<_florent_>
ok, so maybe you should power off the board, try to connect the rst to the trace, and continue the test in one hour
hartytp has quit [Ping timeout: 260 seconds]
hartytp has joined #m-labs
<hartytp>
I'll have another quick go in the morning.
<hartytp>
but, I'm beginning to think that it's better to just focus on the boards that work
<GitHub-m-labs>
[artiq] jbqubit commented on issue #1043: Using latest from master 20180604 with SAWG vivado 2018.1 07d4145a35c739. Meets timing. I've run 25 scripts involving SAWG via Ethernet. I don't see any errors on UART. https://github.com/m-labs/artiq/issues/1043#issuecomment-394375956
<hartytp>
the board to board variation could well be some piece of rework that's failed
<GitHub-m-labs>
[artiq] jbqubit commented on issue #1026: Using latest from master 20180604 with SAWG vivado 2018.1 07d4145a35c739. Meets timing. I've run 25 scripts involving SAWG via Ethernet. No panics. https://github.com/m-labs/artiq/issues/1026#issuecomment-394379327
<hartytp>
_florent_: did the JTAG rework that Greg suggested (short pins 11 and 13 with a solder blob)
<hartytp>
board seems much happier now
<hartytp>
with the HMC7043 enable tied high, 5 out of 5 times I get to SERDES PLL lock timeout (now expected since there is no output from the HMC7043)
<hartytp>
scope verifies that there is no noise on the HMC7043 output during boot
<hartytp>
spoke too soon, now I have 2/2 serwb init failed, so that's definitely not connected to the 7043
<_florent_>
hartytp: i would not focus too much on serwb for now
<_florent_>
hartytp: for now let's try to get rid of the crashes
<_florent_>
hartytp: if you are able to do the 7043 rework and see that crashes stop, then i would recommend regenerating serwb with 1gbps linerate
<_florent_>
hartytp: also if you no longer have noise due to hmc7043, there is no reason to use low speed serwb, we should be able to use the 1gbps version on all boards
rohitksingh has quit [Read error: Connection reset by peer]
<hartytp>
how do I enable the 1gbps line rate?
<hartytp>
okay, that seems to have worked!
<rjo>
hartytp: github annoyingly put me as the author of that commit of yours. sorry about that.
<hartytp>
np
<hartytp>
thanks again for all the work on the servo
<sb0>
there is definitely jesd breakage but I'm not sure if that explains everything
<hartytp>
sb0: probably not, but shall we try to fight one fire at a time?
<sb0>
I've done one test where I set a small amplitude in the SAWG, but the generated signal would still be full-range; samples getting swapped all over the place would not explain that
rohitksingh has joined #m-labs
<sb0>
whitequark, ping
<rjo>
jesd (at least the core) splits it up into nibbles.
<rjo>
i'd debug sawg right now if i knew where to look. imho the proper way to approach this is with prbs (check), stpl, and then the ramp generator.
<GitHub-m-labs>
[artiq] hartytp commented on issue #794: To look at this, I changed the FPGA_CLOCK divider to 12 (100MHz output) and looked at J61 on a fast scope triggered from my 100MHz reference. I can confirm that the HMC7043 configuration currently used in ARTIQ master does not provide deterministic latency. I'll apply the patch I proposed above and recheck.... https://github.com/m-labs/artiq/issues/794#issuecomment-3
<_florent_>
hartytp: ok, now that you no longer have crashes, can you use this patch to use 1gbps serwb?:
<rjo>
_florent_: thanks. just a quick q: what is the smallest granularity that jesd could end up doing wrong ordering or misalignment at? octets? nibbles?
<_florent_>
rjo: so i would say octets, but i have to have a closer look
<rjo>
_florent_: ack.
<larsc>
rjo: what do you mean with wrong ordering?
<larsc>
if you have problems with amplitude, maybe MS octet and LS octet are swapped?
<larsc>
although that should not be random
<larsc>
when I look at your broken waveform I'd say offset binary vs two's complement problem, no idea how that would happen though
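(editor's note) larsc's offset binary vs two's complement guess would account for a small programmed amplitude showing up as a full-range signal. A minimal sketch of that effect, assuming 16-bit samples; the helper names are hypothetical and nothing here is taken from the ARTIQ or JESD core sources:

```python
def twos_to_code(x, bits=16):
    """Encode a signed integer as its two's-complement bit pattern."""
    return x & ((1 << bits) - 1)


def offset_binary_value(code, bits=16):
    """Interpret a bit pattern as offset binary (mid-scale code means zero)."""
    return code - (1 << (bits - 1))


# A tiny signal around zero, as a small SAWG amplitude would produce.
small = [-3, -2, -1, 0, 1, 2, 3]

# What a converter expecting offset binary would make of those patterns.
seen = [offset_binary_value(twos_to_code(x)) for x in small]
print(seen)
# [32765, 32766, 32767, -32768, -32767, -32766, -32765]
```

Every zero crossing of the small signal becomes a jump between the two rails, so the output spans the whole range regardless of the programmed amplitude. This only illustrates the hypothesis; it does not show that this is what happens on Sayma.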
<rjo>
larsc: well. i am not certain i'm asking the right question. lmfc alignment granularity is frames, frame alignment granularity is octets, right?
<larsc>
I don't think lmfc matters here, lmfc is just for deterministic latency
<larsc>
lane alignment is done based on the first non /K/ character that is received
<larsc>
so unless you have an underflow/overflow in the transceiver after the link has been established, things should stay aligned
<rjo>
right. underflows is one thing.
<rjo>
but https://pasteboard.co/HnSYE20.jpg that's from a counter that wraps around and outputs the same sample 4 times into the jesd core.
<rjo>
that is a sample ordering issue.
<larsc>
yes
mumptai has joined #m-labs
<larsc>
but that kind of reordering would only happen in the FPGA
<larsc>
never seen this in a DAC
<larsc>
any CDC FIFOs?
<larsc>
that almost looks like a gray counter
<rjo>
larsc: hmm. yes.
<_florent_>
rjo: could it be related to the elastic buffers?
<larsc>
any 3 bit gray counters in your system?
<rjo>
larsc: sure. EB depth comes to mind.
<rjo>
larsc: why 3 bit?
<larsc>
looks like 3-bit, I don't know
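(editor's note) For reference, the 3-bit Gray sequence larsc is alluding to can be generated with the standard binary-to-Gray mapping `n ^ (n >> 1)`. This is a generic sketch, not code from any of the cores discussed here:

```python
def gray(n):
    """Standard binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


# All eight 3-bit codes, in counting order.
seq = [format(gray(i), "03b") for i in range(8)]
print(seq)
# ['000', '001', '011', '010', '110', '111', '101', '100']
```

Consecutive codes differ in exactly one bit, which is why Gray counters are the usual choice for pointers that cross clock domains in CDC FIFOs.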
<rjo>
_florent_: i don't know whether any of the sc1 changes are now "actually making use" of the EB.
<rjo>
larsc: there is the 4-periodicity. that matches both the EB depth and the "samples-per-fabric-clock" number.
<_florent_>
rjo: do you mean we should remove the EB?
<rjo>
_florent_: i have no idea. in general: if the two sides of the EB always have the same phase then it can be removed.
<rjo>
but i'd probably debug this with stpl first (assuming the EB is after the STPL gen).
<_florent_>
rjo: no, the EB are not used for STPL
<rjo>
ok. then let's still do stpl to ddx between upstream/downstream of the stpl injection point.
<swivel>
win 11
<larsc>
your data path width is 4? you always process 4 samples in 1 clock cycle?
<swivel>
oops
<larsc>
do you have different elastic buffers for different samples?
<larsc>
or all samples through the same elastic buffer?
<rjo>
iirc data path is 4 samples (certainly in the fabric up to the jesd core). i forget whether the jesd core continues then at 4 samples or at 2 samples. and then i don't know how it goes. _florent_ is the man.
<larsc>
it looks like half the samples are one clock cycle late/early, which makes no sense if they are always processed 4 at a time
<rjo>
but i think it is 4 throughout including the EBs
<larsc>
even if the order gets messed up, with a data path width of 4 and 4 consecutive samples with the same value there should at least be 2 consecutive samples in the output that have the same value
<rjo>
sorry, 2 sample wide EB. from the looks of it.
<rjo>
2 EBs per channel. 1 per lane.
<rjo>
and the eb is 4 entries deep.
<larsc>
but 4 consecutive samples would be 1 entry in the EB
<rjo>
yes.
<larsc>
and if you generate 4 samples with the same value the only patterns we'd see are 0000 0001 0011 0111
mumptai_ has joined #m-labs
<larsc>
so at least 2 samples with the same value
<larsc>
even if the order in the EB is messed up
cjbe_ has joined #m-labs
mumptai_ has quit [Remote host closed the connection]
<larsc>
but the pattern we see in the scope is 0101
<larsc>
or rather 1010
<rjo>
yes. what we are seeing is 1010 2121 3232...
cjbe has quit [Ping timeout: 256 seconds]
<larsc>
makes no sense :)
<rjo>
well. i don't know about the octets. that's a binary counter that has the lowest octet 0.
<rjo>
and i don't know if the sequence assignment between the EBs is 02/13 or 01/23.
<rjo>
larsc: the way our EB is implemented is also without any flow control other than reset. it assumes that after a reset the phase won't make excursions beyond depth/2
<larsc>
top is when it works
<larsc>
bottom is when it doesn't
<larsc>
and you can see in the bottom half lane 1 is 1 clock cycle behind lane 0
<larsc>
samples are read in the order A, B, C, D
<rjo>
yes. one lane+EB being one sample deeper than the other
<rjo>
well beyond depth/2 - 1 = 1
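(editor's note) The "one lane+EB one sample deeper" explanation can be checked with a toy model. The sketch below assumes 4 samples per fabric clock, two lanes each carrying 2 of those samples, and a whole-clock extra latency on one lane. The function names and the two candidate slot assignments (02/13 and 01/23, which rjo says he is unsure about) are illustrative assumptions, not the actual core layout:

```python
def frames(n_frames, per_clock=4):
    """Counter that emits the same value per_clock times each fabric clock."""
    return [[n] * per_clock for n in range(n_frames)]


def through_lanes(data, lane_slots, lane_delay):
    """Split each frame across lanes, delay each lane, then re-merge.

    lane_slots[i] lists the sample positions carried by lane i;
    lane_delay[i] is that lane's extra latency in fabric clocks.
    """
    out = []
    for t in range(max(lane_delay), len(data)):
        frame = [None] * len(data[t])
        for slots, delay in zip(lane_slots, lane_delay):
            for s in slots:
                frame[s] = data[t - delay][s]
        out.append(frame)
    return out


data = frames(5)

# Lanes aligned: every output frame is four identical samples.
aligned = through_lanes(data, [(0, 2), (1, 3)], [0, 0])

# Lane 1 one clock behind, samples interleaved 02/13 across the lanes.
skewed_02_13 = through_lanes(data, [(0, 2), (1, 3)], [0, 1])

# Same skew, but samples assigned 01/23.
skewed_01_23 = through_lanes(data, [(0, 1), (2, 3)], [0, 1])

print(skewed_02_13[:3])  # [[1, 0, 1, 0], [2, 1, 2, 1], [3, 2, 3, 2]]
print(skewed_01_23[:3])  # [[1, 1, 0, 0], [2, 2, 1, 1], [3, 3, 2, 2]]
```

Under these assumptions only the 02/13 assignment with a one-clock skew reproduces the 1010 2121 3232 pattern from the scope. The 01/23 assignment would still leave pairs of equal consecutive samples, which is what larsc expected to see.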
<GitHub-m-labs>
[artiq] hartytp commented on issue #794: Joe I'm being daft and measuring the wrong thing. That was never going to work as I was measuring ref to hmc output phase which we can't control. Should have measured phase between hmc7043 outputs! https://github.com/m-labs/artiq/issues/794#issuecomment-394439986
<larsc>
I'd assume the EB doesn't get its reset properly or the reset has an asynchronous de-assert or something like that
<GitHub-m-labs>
[artiq] whitequark commented on issue #1007: Yes. I tried to fix this purely in the ARTIQ compiler, and it didn't work. Specifically, hoisting invariant loads requires inlining, which requires devirtualization, which is quite hard to implement due to Python's semantics. (We had devirtualization to support compiler-assisted interleaving, but it broke a while ago, and I wasn't successful in fixing it).... htt
<larsc>
does each of your EBs have its own reset synchronizer?