<sb0>
the issue with the hmc7043 is quite simple: we need the "infinite" digital delay (=slip) to be able to reach all possible SYSREF phases
<sb0>
triggering a slip sometimes works, sometimes not. the effect of this bug on the scope traces is quite obvious: the SYSREFs to the two DACs go completely out of phase, even though the ARTIQ firmware applied the same slips to them
<sb0>
that's in addition to the long list of other problems, but it's the only one we have not managed to work around
<sb0>
hartytp_: have you checked that the hmc830 doesn't have similar phase uncertainty as the ADF?
<sb0>
measuring the PLL phase with the Artix-7 FPGA should be ok, but we need to make sure we are not driving the input with frequencies above spec, which potentially has negative effects (like hmc7043 noise)
<sb0>
so the signal needs to be gated in hardware and disabled at high rates (where the hmc830 should be deterministic anyway)
<sb0>
maybe add a discrete DFF that samples the DAC clock, sends it to the FPGA, and is clocked by the FPGA?
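A minimal sketch of how a discrete DFF could recover the DAC clock phase. It assumes the FPGA can step the phase of the clock driving the DFF across one DAC clock period (equivalent-time sampling) and reads back the DFF output at each step. The flip-flop is idealized (no metastability, no jitter), and `dff_sample`/`measure_phase` are hypothetical names, not existing ARTIQ code.

```python
import math


def dff_sample(clk_freq_hz, sample_time_s, clk_phase_rad=0.0):
    """Ideal D flip-flop sampling a 50% duty-cycle square wave.

    Returns 1 if the square wave is high at the sampling instant, else 0.
    """
    phase = (2 * math.pi * clk_freq_hz * sample_time_s + clk_phase_rad) % (2 * math.pi)
    return 1 if phase < math.pi else 0


def measure_phase(clk_freq_hz, clk_phase_rad, steps=64):
    """Sweep the sampling instant over one period of the measured clock,
    find the 0 -> 1 transition of the DFF output and convert its position
    back into an estimate of the clock phase (radians).

    The estimate is quantized to 2*pi/steps.
    """
    period = 1.0 / clk_freq_hz
    samples = [dff_sample(clk_freq_hz, k * period / steps, clk_phase_rad)
               for k in range(steps)]
    for k in range(steps):
        # samples[k - 1] wraps around for k == 0, so the sweep is circular
        if samples[k - 1] == 0 and samples[k] == 1:
            edge_time = k * period / steps
            return (-2 * math.pi * edge_time / period) % (2 * math.pi)
    raise ValueError("no rising edge found in the sweep")
```

For example, `measure_phase(600e6, 1.0)` returns a value within 2π/64 of 1.0 rad. The resolution is set by the phase step of the sampling clock, and the scheme can be checked on a scope, which is the main argument made for it in the discussion.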
<sb0>
rjo: my understanding was, we were going to use the clock backplane with RFSYNCIN, the HMC7043 would work properly, and we would sync RTIO to the HMC7043
<sb0>
but there is no clock backplane and nothing to drive it
<sb0>
hartytp_: agreed, stripping it to simpler and debuggable discrete components sounds good
<sb0>
also the PLL phase measurement is, in theory, only needed for 600MHz, so if it breaks, another way out is to use 1.2GHz and turn on DAC interpolation
<sb0>
hartytp_: if you're testing the hmc830, also get some stats on what output phases it produces at 600MHz, i.e. if we can actually reset it to change the phase reliably
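One way to gather these statistics is to reset the PLL many times, measure the 600 MHz output phase against a fixed reference on the scope after each reset, and count how many distinct phase states show up. The hypothetical helper below groups such measurements (in ps, taken modulo the output period) into clusters; `tol_ps` is an assumed measurement tolerance, and the sample data in the usage note is made up for illustration.

```python
def cluster_phases(phases_ps, period_ps, tol_ps):
    """Group phase measurements, wrapped modulo period_ps, into clusters in
    which neighbouring members are at most tol_ps apart.

    Handles wrap-around: a cluster straddling 0/period is merged into one.
    Returns a list of clusters, each a list of wrapped phases.
    """
    wrapped = sorted(p % period_ps for p in phases_ps)
    if not wrapped:
        return []
    clusters = [[wrapped[0]]]
    for p in wrapped[1:]:
        if p - clusters[-1][-1] <= tol_ps:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    # merge the last cluster into the first if they touch across the wrap point
    if len(clusters) > 1 and (clusters[0][0] + period_ps) - clusters[-1][-1] <= tol_ps:
        clusters[0] = clusters.pop() + clusters[0]
    return clusters
```

With made-up readings `[10, 12, 1660, 840, 845, 838]` ps, a 600 MHz period of `1e12 / 600e6` ps and a 20 ps tolerance, this yields two clusters of three readings each, i.e. two distinct output phases after reset.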
<sb0>
rjo: are the windows returned by tune_sync_delay() supposed to be 0?
<rjo>
hartytp_: i consider the custom_sync approach a bit more risky than you do. and i appear to have higher expectations about the hmc7043.
<rjo>
sb0: have you considered that you don't strictly need the infinite slip? you only need to align sysref to rtio, not to a submultiple (i.e. a max delay of 16 cycles at 2 GHz using the coarse delay), and the rest of the phase can be done in gateware (just on the sysref handling path).
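The split proposed here can be sketched as plain arithmetic: the analog coarse delay only has to cover one RTIO period (16 DAC cycles at 2 GHz, i.e. 8 ns), and whole RTIO cycles are absorbed in gateware on the SYSREF path. `split_sysref_delay` is a hypothetical helper, and the 16 cycles per RTIO period is taken from the message rather than from any datasheet.

```python
def split_sysref_delay(total_dac_cycles, dac_cycles_per_rtio=16):
    """Split a desired SYSREF delay, in DAC clock cycles, into:

    - whole RTIO cycles, to be handled in gateware on the SYSREF path, and
    - a residual coarse delay (0 .. dac_cycles_per_rtio - 1) in DAC cycles.

    dac_cycles_per_rtio=16 assumes 2 GHz DAC cycles within one RTIO period.
    """
    if total_dac_cycles < 0:
        raise ValueError("delay must be non-negative")
    rtio_cycles, coarse = divmod(total_dac_cycles, dac_cycles_per_rtio)
    return rtio_cycles, coarse
```

For example, a 37-cycle delay becomes 2 RTIO cycles in gateware plus 5 DAC cycles of coarse delay, so no slip beyond one RTIO period is ever needed.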
<rjo>
sb0: and there are some comments that you added that leave questions on my side.
<rjo>
sb0: if you, hartytp_, and joe are set on ditching the hmc7043, i am not going to stand in the way. there seems to be so much personal frustration with the hmc7043 that it looks untenable from a non-technical perspective.
<rjo>
hartytp_: i don't have the doc anymore. closed that tab yesterday night.
<rjo>
sb0: and i don't think it would be easy with the rf backplane. rfsync would have to be delay-tuned as well.
<rjo>
not having to rely on brute forcing the pll output phase is another argument.
<rjo>
sb0: in theory the sync windows should be more than 4.
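A zero-length window means no delay setting produced a stable result. The sketch below shows the kind of check implied here. It assumes the tuning routine yields one pass/fail flag per delay tap, finds the longest passing run, and rejects it unless it is more than 4 taps wide. It is not the actual `tune_sync_delay()` implementation; `longest_window` and `pick_delay` are hypothetical names.

```python
def longest_window(ok):
    """Return (start, length) of the longest run of True values in ok."""
    best_start, best_len = 0, 0
    start = None
    for i, good in enumerate(list(ok) + [False]):  # sentinel closes the last run
        if good and start is None:
            start = i
        elif not good and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    return best_start, best_len


def pick_delay(ok, min_window=5):
    """Pick the centre of the longest passing window, or fail if it is too
    narrow (min_window=5 encodes 'more than 4')."""
    start, length = longest_window(ok)
    if length < min_window:
        raise RuntimeError(f"sync window too small: {length} taps")
    return start + length // 2
```

With flags `[F, F, T, T, T, T, T, T, F, T]` the longest window starts at tap 2 and is 6 taps wide, so tap 5 is chosen; an all-fail scan raises instead of silently picking tap 0.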
<GitHub-m-labs>
artiq/master 390f05f Sebastien Bourdeauducq: firmware: use smoltcp release
<hartytp_>
sb0: "have you checked that the hmc830 doesn't have similar phase uncertainty as the ADF?"
<hartytp_>
short answer: "no". this is something we haven't done for any PLL. Effects like this will be there to some extent for any choice of PLL (they're not perfect devices), and we need to characterise whichever PLL we choose
<hartytp_>
but, it's an independent issue to the SYSREF/SYNC so let's put it to one side for now
<sb0>
rjo: okay, the zero window is reproducible on kasli-1
<sb0>
rjo: yes, rfsync would have to be tuned. as I said: "nothing to drive the RF backplane"
<sb0>
rjo: getting stable phase delta out of the PLL at low frequency is a much smaller mess than the pile of HMC7043 issues and workarounds
<sb0>
it's also something that can be measured on an oscilloscope, unlike obscure hmc7043 internal state
<sb0>
rjo: so your proposal to use the hmc7043 with DAC/RTIO sync is to have SAWG reorder samples depending on the phase between the RTIO counter and the SYSREF generated by the HMC7043?
<sb0>
rjo: this would cause DAC latencies that vary between reboots, no?
<sb0>
hartytp_: if the HMC830 isn't better than the ADF for this particular characteristic, then there is no reason to use it
<hartytp_>
sb0: "if the HMC830 isn't better than the ADF for this particular characteristic, then there is no reason to use it"
<hartytp_>
I don't see a strong argument for either PLL
<hartytp_>
both lock reliably when eval boards are hooked up to kasli
<hartytp_>
both have somewhat poor documentation and quirks/bugs
<sb0>
hartytp_: well, the hmc830 needs a power cycler
<hartytp_>
the ADF has the option of divided feedback, but that has side effects (low PFD frequency, hence worse noise and potentially worse phase drift)
<sb0>
and there is this divider issue
<hartytp_>
the ADF is generally worse noise than the HMC830
<hartytp_>
and the HMC830 has been better tested for us, so we have seen more of the bugs
<hartytp_>
the power cycler is a non-issue as far as I'm concerned
<sb0>
sunk costs fallacy? :)
<sb0>
anyway, the 830 should be tested for this nasty effect as well
<hartytp_>
and, in any case, we can't say that a power cycler is definitely not needed for the ADF. I haven't tested it enough to know that the SPI machine can't be bricked by abuse. And, we know that this kind of thing affects other chips like the AD9910 as well. It's not only the HMC830
<sb0>
yep, modern silicon is a race to the bottom
<hartytp_>
with correct init sequence and no aborted SPI transactions I don't believe I've had any issues with the HMC830, so I'd put it on a par with similar chips
<sb0>
ok
<hartytp_>
I'd argue that we shouldn't make changes unless we have a strong reason to feel that they will offer an improvement, otherwise we're just wasting time with fresh bug finding. Having tested it and started to understand its quirks, I don't see a strong argument for the ADF
<hartytp_>
re output divider: (1) for the HMC830 we shouldn't need it since interpolation is plan A. It's just risk mitigation for DAC bugs
<sb0>
so, we can keep the 830 and add hardware to measure the output phase in the FPGA
<sb0>
DFF clocked by FPGA, as I explained above?
<hartytp_>
(with the ADF that's not true since the VCO min freq is 3.4GHz so the divider is always needed. For the HMC830 it's 1.5GHz. That also means that manually resetting the ADF PLL is harder since the phase steps are smaller)
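The divider argument can be made concrete with a small calculation: the output divider N must bring the VCO into range, and if a PLL reset can leave the divider in any of its N states, the reachable output phases are spaced 360°/N apart. Only the VCO minimums (3.4 GHz for the ADF, 1.5 GHz for the HMC830) come from the discussion; the maximums (taken as 2× the minimum) and the power-of-two divider set are assumptions for illustration, so check the datasheets for the real values. The function names are hypothetical.

```python
# Assumed divider set, for illustration only; real parts differ.
POW2 = [1, 2, 4, 8, 16, 32, 64]


def min_output_divider(f_out_hz, vco_min_hz, vco_max_hz, dividers):
    """Smallest divider N such that N * f_out lies within the VCO range."""
    for n in sorted(dividers):
        if vco_min_hz <= n * f_out_hz <= vco_max_hz:
            return n
    raise ValueError("no divider puts the VCO in range")


def phase_step_deg(n):
    """Spacing of possible output phases if the divider can start in any of n states."""
    return 360.0 / n
```

Under these assumptions, 600 MHz needs N = 8 on the ADF (VCO at 4.8 GHz, 45° steps) but N = 4 on the HMC830 (VCO at 2.4 GHz, 90° steps), which illustrates why resetting the ADF into a desired phase is harder.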
<hartytp_>
sb0: sure, let's add a DFF. However, the next thing on my to do list is to demo that I can use the DAC to measure this phase and synchronise the HMC830
<hartytp_>
so, the DFF is just risk mitigation
<hartytp_>
well, it gives us other options, but it's not strictly required
<sb0>
the DFF should be plan A, it's a scheme that's simpler to understand, simpler to debug, and keeps problems compartmentalized
<sb0>
otherwise, proper operation of the PLL depends on proper operation of the DAC. annoying...
<hartytp_>
well, it's cheap to add and we can play with it when the hw arrives. I'd still like to test using the DAC to do this with the current hw, so we at least know that one approach works before committing
<hartytp_>
rjo: "hartytp_: i consider the custom_sync approach a bit more risky than you do. and i appear to have higher expectations about the hmc7043"
<hartytp_>
what risks in particular do you have in mind? AFAICT most risks for the custom sync approach apply equally to the HMC7043
<hartytp_>
I've already kind of demonstrated the custom sync approach (see the GitHub issue)
<hartytp_>
I got rid of phase indeterminism at the level of 1 DAC clock
<hartytp_>
there was something odd going on at the level of multiple rtio clock cycles
<hartytp_>
but, afaict, that's not something that can be caused by or fixed by the SYSREF signal
<hartytp_>
also, the ramp gen put out garbage at that point, so there could have been a connection
<hartytp_>
anyway, AFAICT, the DFF + delay line/HMC7043 are pretty comparable in terms of complexity and risk. The approach I've advocated for puts the complexity in places I'm most comfortable debugging (and can directly access with a scope)
<sb0>
rjo: have you tested urukul 9912 recently? it doesn't produce any RF output with latest master
<sb0>
RF switch LEDs are on
<sb0>
using kasli_tester
<sb0>
ffs my kcu105 no longer powers up
<sb0>
If the Power ON LEDs are not lit at power on, you may need to reprogram the Maxim Integrated Power Controllers on your KCU105.
<sb0>
This can be done using the Maxim Integrated PowerTool software package, and the Maxim Integrated Dongle.
<sb0>
is there some association between ultrascale and annoying power ICs or what...
<sb0>
The Maxim Integrated Power Controllers on the KCU105 can be reprogrammed. This is the first debug that should be taken if power issues are discovered on your KCU105.
<sb0>
this sounds like this thing is breaking regularly
<sb0>
"The MAX20751E device can be reprogrammed a limited number of times (4)."
<sb0>
even worse than the sayma stuff
<sb0>
yep, all voltages controlled by this maxim thing are dead...
<sb0>
whitequark: if the artiq firmware moves to the new rust async/await, can we get rid of the modified libfringe?
<sb0>
whitequark: would there be other hacks needed like that libfringe modification? how hard would it be in general?
<whitequark>
sb0: yes, we can get rid of libfringe entirely
<whitequark>
it would not be very hard, about as complex as the time when I redid IO error handling
<whitequark>
and this is something I was always planning to do
<GitHub-m-labs>
[artiq] whitequark commented on issue #1125: It's a bit hard to say from the capture. Can you acquire a core device log? It should clearly state why the connection was reset, at least. https://github.com/m-labs/artiq/issues/1125#issuecomment-456805225
<GitHub-m-labs>
[artiq] whitequark commented on issue #1125: Aside: I wonder if it would make sense to broadcast core device logs via UDP so they are associated with captures. This would make tracking down TCP issues much easier. https://github.com/m-labs/artiq/issues/1125#issuecomment-456805589
<sb0>
rjo: okay, I see your workaround for the hmc7043's broken phase slip. you have an 8-level shift register at the output of the SAWG, and you select one of the entries depending on the 3 LSBs of the RTIO counter when the SYSREF pulse arrives
<sb0>
this adds the right amount of latency to the DAC output data
<sb0>
it's 64ns of extra latency, but I guess that works
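A behavioral model of the workaround as described above: an 8-deep shift register on the SAWG output, with the read tap chosen from the 3 LSBs of the RTIO counter captured at the SYSREF pulse. The direct mapping from counter LSBs to tap, and the 8 ns RTIO period (which makes 8 × 8 ns = 64 ns), are assumptions; the real gateware may invert the mapping or register differently. `SysrefAlignedDelay` and `max_extra_latency_ns` are hypothetical names.

```python
RTIO_PERIOD_NS = 8.0  # assumed 125 MHz RTIO clock


class SysrefAlignedDelay:
    """8-deep delay line whose tap is set by the RTIO counter at SYSREF."""

    DEPTH = 8

    def __init__(self):
        self.regs = [0] * self.DEPTH
        self.tap = self.DEPTH - 1  # until a SYSREF is seen, use the longest delay

    def on_sysref(self, rtio_counter):
        # only the position of SYSREF within an 8-cycle window matters
        self.tap = rtio_counter & (self.DEPTH - 1)

    def clock(self, sample):
        # read first, then shift: regs[k] holds the sample from k + 1 cycles ago,
        # so the output is delayed by tap + 1 RTIO cycles (1..8)
        out = self.regs[self.tap]
        self.regs = [sample] + self.regs[:-1]
        return out


def max_extra_latency_ns():
    """Worst-case added latency: the full depth of the delay line."""
    return SysrefAlignedDelay.DEPTH * RTIO_PERIOD_NS
```

For example, a SYSREF captured at counter value `0b10110` selects tap 6, so each sample appears 7 RTIO cycles later; the worst case is 8 cycles, i.e. the 64 ns quoted above.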
<hartytp_>
sb0: haven't tried with the new vivado yet.
<hartytp_>
how confident are you that it's a vivado bug and not some other intermittent code issue?
<rjo>
hartytp_: is it clear how greg is going to test jesd204b with the xilinx cores? just no sysref?
<rjo>
hartytp_: how much chance of using the xilinx cores does this preclude? the custom sync also makes using Sayma very ARTIQ/RTIO specific.
<rjo>
hartytp_: additional differential risks over other solutions (beyond those mentioned): unstable delays between the deviceclk path and the sysref path (due to PVT or chip quirks). drive level issues/impedance/coupling. brute forcing pll phase may not be reliable. other chip quirks in the delay lines or even the flip flops.
<rjo>
sb0: iirc it is not necessary to compensate for the overall LMFC phase w.r.t. RTIO (meaning compensate more latency than the fpga-deviceclk to RTIO phase). just make sure that (during SYNC) the jesd core is able to react to sysref deterministically. then the latency is deterministic.
<rjo>
sb0: ad9912 works on 4.0.
<rjo>
hartytp_: i feel we have discussed all the risks of the solutions. i just weigh them differently and assign different impact/likelihoods.
<sb0>
rjo: 4.0 release? okay, i'll bisect tomorrow. could be the compiler change that broke ftw
<rjo>
hartytp_: i am not trying to convince you. that's not my job. you have to do that yourself based on the input and data you have ;)
<sb0>
rjo: the HMC sync is also a "custom" sync, and a quirky and buggy one.
<sb0>
programming the HMC chip isn't easier than programming the delay lines + generating a square wave
<sb0>
hartytp_: as I said, it's hard to tell, so give it a try - we need more data...
<rjo>
no. it's not custom because it doesn't use a custom/discrete implementation of sysref generation. instead it uses a standard one that you can use with the xilinx cores. the sync-to-rtio with the frac-pll style measurement would be "custom" but it is just custom gateware and software.
<sb0>
rjo: I don't see any xilinx core for programming the hmc7043
<sb0>
anyway we can give this technique a try, it doesn't seem very hard to implement on v1.0 and it reduces the hw risk if it works
<sb0>
it'll be annoying to rewrite the kernel code with the various workarounds to program that piece of junk though
<sb0>
speaking of junk silicon, the ultrascale bidirectional LVDS issue is even worse than what the documentation says
<sb0>
"There are bias lines in the SelectIO banks that control the true-differential buffer output's common mode and differential voltage levels. When an IOBUFDS or OBUFTDS toggles their tristate control they can cause a disruption to these bias lines and for up to 1us. You can see a disruption to all IOBUFDS and OBUFTDS buffers in the same bank."
<whitequark>
where's that from?
<sb0>
xilinx tech support
<sb0>
i don't even know why the only mentions of that bug are completely hopeless STA parameters for individual IOBUFDS components, and a similarly large value in the datasheet
<sb0>
it's clearly not just a normal slow timing path
<sb0>
all ultrascale and ultrascale+ chips are affected btw
<sb0>
so, basically just don't do bidirectional LVDS with them
<rjo>
afaict in ice40 it's even worse. there is no runtime lvds direction switching at all.
<whitequark>
ice40 mostly does not natively support lvds
<whitequark>
all LVDS outputs are constructed using single-ended outputs and resistors
<rjo>
no. there is a differential lvds input. that's all that's needed if it could be combined with standard output.
<whitequark>
there's lvds input in bank 3, yes
<whitequark>
but it requires a different termination network than lvds outputs
<rjo>
exactly. there is native lvds support. the termination is standard and is in-out compatible.
<rjo>
there is a termination network that works for both input and output.
<whitequark>
ah, i was missing that
<whitequark>
i'm actually not sure if SB_IO can have IO_STANDARD SB_LVDS_INPUT and still have OE
<whitequark>
documentation never explicitly prohibits this
<rjo>
the second input is implicit. for output it would need to be explicit.
<whitequark>
yes. but is this a silicon limitation or tool limitation?
<rjo>
it's currently definitely a tool limitation in all tools. don't know about silicon.
<rjo>
it would be a very stupid tool limitation if it's not mandated by silicon.
<whitequark>
well, i'm curious now if it *is* a silicon limitation.
<whitequark>
speaking of documentation, i like how ice40 refers to certain properties of their IOBs in terms of their physical location on the chip
<whitequark>
and there is not a single document mapping those to pins
<whitequark>
i looked through them all and had to go through the RE database
<whitequark>
not sure how their customers manage