<kc5tja>
whitequark: No worries; when you can. I'm snowed with day-job work myself. Just got back from 12 hours, and got maybe another 7 hours left on the project, but running into intermittent hardware failures that must be resolved by Monday evening or else we lose a major sale guaranteed.
<kc5tja>
No pressure on me at all. Why do you ask? >:O :(
proteusguy has joined #m-labs
<sb0>
hartytp: that gives deterministic timing, but then timing can't be met at the transceiver data pins, which are synchronous to TXOUTCLK
proteusguy has quit [Ping timeout: 246 seconds]
<sb0>
hartytp: the clock running through the CPLL and then through an unsynchronized divider is just the way the Xilinx silicon is wired up
<sb0>
hartytp: for DDMTD skew variations on Ultrascale, it's a bit scary indeed. Xilinx said it is possibly affected by local VCCINT loading. probably the poor VCCINT layout on the current PCB does not help.
<sb0>
hartytp: and FWIW, I did not observe such variations when DDMTD was connected between TXOUTCLK and IBUFDS_GTE3 ODIV2. things looked stable then (random phase at FPGA startup, but stays stable until you reboot it)
<sb0>
hartytp: the timing FPGA / the WR PLL can use the recovered clock from the transceiver block directly, using OBUFDS_GTE3 (a new Ultrascale feature), and not go through the fabric. that may help.
<sb0>
hartytp: having clocks that go to transceivers also go to GC pins via a deterministic-latency clock distribution (e.g. NOT the two Si5324 outputs which are not sync'd) can be used to mitigate DDMTD problems. you could keep the critical clock networks isolated from the noisier ones (also suggested by Xilinx, and my results confirm this can improve things, but they said it was hard to achieve reliably).
<sb0>
lkcl: tbh the asics.ws cores are not very good. i tried using several of them for milkymist a long time ago and systematically ended up throwing them away, swearing, and implementing my own.
proteusguy has joined #m-labs
m4ssi has joined #m-labs
proteusguy has quit [Remote host closed the connection]
<lkcl>
sb0: weell... i get the impression they were released as a way to promote the existence of the business, all those years ago
<hartytp_>
sb0: thanks. So, re TXOUTCLK, the situation is that the transceiver interface logic is clocked from TXOUTCLK, and we want to avoid CDCs between that and the RTIO, so we clock the RTIO logic from it as well?
<sb0>
yes
<sb0>
d_n|a_: do you have an idea why that test is failing from time to time?
<hartytp_>
okay, well that's not a problem -- as you say, the RTIO clock is noisy anyway and will never be ps stable, so siphaser is fine
<hartytp_>
doesn't that drive TXOUTCLK directly from the CPLL refclk, which is what we want anyway?
<hartytp_>
anyway, unless I'm missing something, either approach should work fine in principle, so this is just a matter of code complexity...
<hartytp_>
(see UG576 fig 3-29)
<hartytp_>
also, again out of curiosity, if you were really worried about WR timing performance on a loaded FPGA, couldn't you also use an external CDR chip like https://www.analog.com/en/parametricsearch/10771#/p4670=6000000000|11300000000 and a discrete dual DFF to do the WR
early has quit [Quit: Leaving]
early has joined #m-labs
<sb0>
hartytp_: then you need to enable the transceiver's TX elastic buffer, which doesn't have deterministic latency
<sb0>
hartytp_: siphaser is well, for si chips. the phase adjustment mechanism needs to be changed to the transceiver's TXPI, but the idea remains the same.
<hartytp_>
sb0: "siphaser is well, for si chips. the phase adjustment mechanism needs to be changed to the transceiver's TXPI, but the idea remains the same." right
<sb0>
hartytp_: those external CDRs are all "not recommended for new designs" and I don't think they have any advantage since the transceiver block seems quite isolated from the rest of the fabric on xilinx chips
<hartytp_>
"hartytp_: then you need to enable the transceiver's TX elastic buffer, which doesn't have deterministic latency" okay
<hartytp_>
that wasn't obvious to me. But, I haven't read the UG in too much detail
<hartytp_>
"hartytp_: those external CDRs are all "not recommended for new designs" and I don't think they have any advantage since the transceiver block seems quite isolated from the rest of the fabric on xilinx chips"
<hartytp_>
so, the observation you have is that the quality of the recovered clock seems to be fine on the ultrascale
<hartytp_>
you're just having some issues with the DDMTD depending on how you route the clocks internally to the fpga
<hartytp_>
is that correct?
<hartytp_>
if so (a) why a timing FPGA for WR rather than just a discrete dual DFF for the DDMTD? (this would be easy to add to sayma/metlino if we thought it was worth it as a fallback plan)
<hartytp_>
(b) the case where you saw high jitter/skew variations on the DDMTD was when you routed the recovered clock output to the DDMTD DFF internally to the FPGA, right?
<hartytp_>
and the other DDMTD input was a LVDS input
<hartytp_>
so, the interpretation here is that we have to be careful with how we route the recovered clock to the DDMTD FF, right?
rohitksingh has joined #m-labs
<sb0>
hartytp_: IIRC WR has more time-critical parts than just DDMTD, no?
<hartytp_>
does it?
<hartytp_>
maybe we're talking about different things, but once you've measured the phase error, the rest is just dsp
<sb0>
the idea is to move all the time-critical stuff away from the ultrascale core fabric, and provide flexibility to implement time-critical circuits via the timing FPGA. Ultrascale provides OBUFDS_GTE3 output, and receives cleaned clock on IBUFDS_GTE3
<sb0>
I don't know if one can get away with just a few discrete elements.
<hartytp_>
maybe I'm missing something, but I think of WR as being a phase detector (DDMTD), a loop filter and a DCXO
<sb0>
that's a lot more inflexible, requires significant prior thought, and leaves little margin for error, that's for sure
<hartytp_>
only the phase detector is timing critical
<sb0>
look at what the RTM FPGA does now, compared to the initial idea of using channellink chips
<hartytp_>
I think it's a little different
<hartytp_>
i.e. not talking about doing the loop filter or oscillator control in fixed hardware
<hartytp_>
there are only a few timing critical paths. the recovered clock. If the CDC is done by the ultrascale FPGA then we need to keep that path as short as possible
<hartytp_>
(the recovered clock to the DDMTD DFF that is)
<hartytp_>
then the DCXO to the DDMTD
<hartytp_>
finally, the jitter for the DDMTD DFFs themselves
<sb0>
so there are two hardware PLLs, right?
<hartytp_>
you mean the helper PLL
<hartytp_>
?
<sb0>
one PLL for CDR jitter reduction + the DDMTD helper PLL?
<hartytp_>
yes, but the helper PLL jitter is not particularly important (see the CERN WR papers)
<hartytp_>
since it is common mode to the two arms of the DDMTD
<hartytp_>
so, AFAICT, you can keep that on the main FPGA without having to worry. It's really just those two DFFs and the routing of the clocks to them that matters
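[Editor's sketch] The DDMTD arithmetic behind the points above can be illustrated roughly as follows. This is a minimal sketch, assuming the usual DDMTD arrangement in which the helper clock runs at f_clk * N / (N + 1); the function names and the choice N = 2**14 are hypothetical and do not come from the log. The phase result is a difference of two beat-edge timestamps, which is why a timing offset common to both arms (such as helper PLL jitter) drops out.

```python
# Illustrative DDMTD arithmetic. All names and numeric values here are
# assumptions for the sake of the example, not taken from the discussion.

def ddmtd_parameters(f_clk_hz: float, n: int):
    """Return (helper frequency, beat frequency, time resolution per helper count).

    Assumes the helper clock is offset as f_helper = f_clk * n / (n + 1).
    """
    f_helper = f_clk_hz * n / (n + 1)
    f_beat = f_clk_hz - f_helper           # equals f_clk / (n + 1)
    resolution_s = 1.0 / (n * f_clk_hz)    # input time shift per helper-clock count
    return f_helper, f_beat, resolution_s


def phase_difference_s(beat_edge_a: int, beat_edge_b: int,
                       f_clk_hz: float, n: int) -> float:
    """Convert two beat-edge timestamps (in helper-clock counts) to an input time offset.

    Only the difference of the timestamps is used, so any offset that is
    common to both arms cancels.
    """
    _, _, resolution_s = ddmtd_parameters(f_clk_hz, n)
    return (beat_edge_b - beat_edge_a) * resolution_s


if __name__ == "__main__":
    f_helper, f_beat, res = ddmtd_parameters(125e6, 2**14)
    print(f"helper {f_helper:.1f} Hz, beat {f_beat:.1f} Hz, "
          f"resolution {res * 1e12:.3f} ps per count")
```

With these assumed numbers each helper-clock count corresponds to well under a picosecond of input time shift, so the measurement is limited by the jitter of the two DFFs and the clock routing to them rather than by the counter.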
<sb0>
hmm maybe, but that really cuts avenues for hacking the board.
<sb0>
if things work then fine, if they don't, well
<hartytp_>
performance will always be better with a discrete DFF than an FPGA
<sb0>
things like what I've been doing with RTM DDMTD won't be possible
<hartytp_>
and, all the real complexity is still implemented on the FPGA
<hartytp_>
you mean implementing DDMTD on things other than the main recovered clock?
<hartytp_>
sure, having a complete FPGA with a bunch of signals attached to it gives more options
<hartytp_>
but, I'm really only thinking about clock recovery for the main reference clock right now. and, for that, I don't see the need for an entire FPGA rather than a dual DFF (which would give better performance anyway)
<sb0>
I'm also worried about the stability of Ultrascale I/Os to the FFs
<hartytp_>
you mean the IOs between the DDMTD FFs and the FPGA? That doesn't matter, since after you've made your phase measurement you have a digital signal
<sb0>
maybe it's because of VCCINT problems, or crossing I/O banks between GC and data pins, but RGMII timing has been a PITA
<sb0>
yes but this still has to meet s/h at the helper PLL frequency
<sb0>
and actually, how will we know if it meets s/h or not?
<hartytp_>
okay, you're worried about the IOs being so bad that we'll have issues meeting S/H on a 125MHz clock
<hartytp_>
?
<sb0>
in any case we need a way to check if s/h is met
<sb0>
it can be done by oversampling with ISERDES, but then one needs a circuit to align that BUFGE_DIV due to the TPWS bug
<sb0>
I suppose your way can be done, but it's riskier and inflexible and we have to work around a bunch of ultrascale crap
<hartytp_>
I assumed that at 125MHz we could do that with trace length matching and proper timing constraints at the FPGA
<hartytp_>
but, I agree it needs some thought
<hartytp_>
anyway, taking a step back, do we actually have evidence that this is needed? Your results suggest that there is no issue with DDMTD when both inputs are from IOs in the same bank
<sb0>
one of the inputs is IBUFDS_GTE3 and there is an issue then
<sb0>
you can replicate on a GC with an external clock buffer
<sb0>
but again. it's risky. and xilinx said it was risky.
<hartytp_>
"you can replicate on a GC with an external clock buffer"
<hartytp_>
why GC? It's only used as a data input for an IOB FF
<sb0>
at least one needs to be a GC for clocking the helper PLL
<hartytp_>
in the current Sayma schematics, both DCXOs go to IOs in the same bank (schematic page 20)
<hartytp_>
for exactly this reason
<hartytp_>
the main DCXO also drives a MGTREF pin
<sb0>
the question is, do we want a neat and optimized but risky design, or an overkill but hackable FPGA
<hartytp_>
"but again. it's risky. and xilinx said it was risky." why is that risky?
<sb0>
can your FF solution be prototyped on existing hardware?
<hartytp_>
one thing I'm not clear about is how best to get the recovered clock out of the FPGA and to the DDMTD FF
<sb0>
OBUFDS_GTE3
<sb0>
it can be used on Sayma uFLs
<hartytp_>
does that come to a MGTREF pin?
<sb0>
yes, on US the MGTREFs can be used as outputs
<sb0>
the transceiver is actually the only US I/O feature that was improved
<hartytp_>
okay, so the suggestion is to tie one of the MGTREF IOs to an IO in bank 66 (where the DCXOs go)?
<hartytp_>
should I open an issue about that?
<sb0>
don't you want to send it to the FFs?
<hartytp_>
The best design is the simplest
<hartytp_>
my preferences are
<hartytp_>
1. do everything on the AMC FPGA
<hartytp_>
2. if and only if that doesn't work, add an external DFF
<hartytp_>
3. consider adding a separate "timing FPGA"
<sb0>
but it's not really the simplest. you'll have to test e.g. with a lot of loading in the US FPGA
<sb0>
it has the fewest external components, sure, but there are many tricks under the hood
<hartytp_>
WR isn't on the critical path for Sayma, so I'm okay with using Sayma v2.0 as a prototype environment for it
<hartytp_>
okay, we can ask Greg if he's able to add the DFF and use solder jumpers to allow us to try both options
<hartytp_>
then we can have a play with it
<hartytp_>
yes, testing under load will be an essential part of this and is something we're planning to do
<hartytp_>
or, if we want something we can hack, we could add solder jumpers + UFLs to allow us to hack in a small external PCB with either a FPGA or FFs
<sb0>
it's totally fine if we can prototype it and test it properly, I think
<hartytp_>
basically, just make sure that Sayma v2.0 has sufficient connectivity to allow us to play around with this
<hartytp_>
it's hard to prototype with the current hardware, since we don't have any boards with the right connectivity
<hartytp_>
well, I guess I could try with our KC705 setup
<hartytp_>
but, that will take some time to get the parts in place
<hartytp_>
if we're serious about having Sayma v2.0 off in a week or two then I don't think that's an option
<sb0>
but if the hardware has to be rushed then the timing fpga is the most flexible option
<sb0>
on Sayma we have the RTM FPGA, and there's the option of routing a backplane link to it
<sb0>
or maybe even the DRTIO recovered clock from OBUFDS_GTE3, actually
<sb0>
if we do that we might not even need WR on the AMC
<hartytp_>
sb0: what is the timing issue with ultrascale IOs?
<sb0>
which timing issue?
<hartytp_>
is there an issue if the IOs go straight to an IOB FF clocked from a signal in the same bank?
<hartytp_>
what were you worried about with getting the external FF inputs to meet timing?
<sb0>
1. timing variations between P&R runs (might be OK with vivado constraints + using same bank, but NEEDS PROTOTYPING) 2. checking that s/h is met, which can be done with ISERDES oversampling but runs into TPWS bug that needs some annoying hacks to work around
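[Editor's sketch] The second point, checking s/h by oversampling, can be modelled in software roughly as below. This is a sketch under stated assumptions: it takes an already-captured list of oversampled bits (as an ISERDES-style capture might provide), and the function names, the 8x oversampling ratio and the guard width are all hypothetical. It does not address the BUFGE_DIV alignment or the TPWS workaround mentioned above.

```python
from collections import Counter


def transition_bins(samples, oversampling):
    """Histogram of the oversampling phase (0..oversampling-1) where the data changes.

    `samples` is a flat sequence of 0/1 values captured at `oversampling`
    samples per clock period. A change between sample i-1 and sample i is
    attributed to phase i % oversampling.
    """
    hist = Counter()
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            hist[i % oversampling] += 1
    return hist


def sample_hold_ok(samples, oversampling, sample_bin, guard_bins):
    """Return True if no observed transition lies within `guard_bins` of `sample_bin`.

    Distance is measured circularly, since the phases wrap every clock period.
    """
    for b in transition_bins(samples, oversampling):
        d = abs(b - sample_bin) % oversampling
        d = min(d, oversampling - d)
        if d <= guard_bins:
            return False
    return True


if __name__ == "__main__":
    # Data toggling once per period, with the edge landing on phase 0.
    capture = [0] * 8 + [1] * 8 + [0] * 8
    print(dict(transition_bins(capture, 8)))
    print(sample_hold_ok(capture, 8, sample_bin=4, guard_bins=1))
```

A diagnostic of this shape could run at startup and report which phase the data edges fall on, rather than leaving a marginal board to fail without explanation.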
<hartytp_>
adding a new FPGA at this stage of the design is a big ask. particularly given that WR is not on the critical path for sayma. I don't think it's likely to happen
<hartytp_>
well, my preference would still be to start implementing WR inside the AMC FPGA.
<hartytp_>
test carefully under load (we plan to do that anyway)
<hartytp_>
then make sure that Sayma is hackable (e.g. has relevant signals easily accessible on coax connectors) so that if we find performance issues, we can spin up a small PCB with FFs/an FPGA and hack that in
<sb0>
sure. but all those tests also take time. do they take less time than adding a small fpga to metlino?
<hartytp_>
I have no interest in Metlino for the time being
<sb0>
okay, so why are you asking about the timing FPGA? on Sayma we have the RTM FPGA that can be used for that
<hartytp_>
so, I'm not going to comment on that, apart from expressing the view that we would be better off getting Sayma to work really well before spending time on that
<hartytp_>
"okay, so why are you asking about the timing FPGA? on Sayma we have the RTM FPGA that can be used for that"
<hartytp_>
is your point here that you want to not worry about WR on the AMC FPGA, but only on the RTM?
<sb0>
yep. and the RTM WR could even clock the AMC if needed.
<hartytp_>
since the AMC FPGA sits between the RTM and the master, aren't you worried about it messing up the recovered clock?
<sb0>
OBUFDS_GTE3 direct to RTM
<hartytp_>
i.e. isn't the transceiver that goes to the RTM clocked from the AMC recovered clock
<sb0>
and/or backplane link direct to RTM
<hartytp_>
"OBUFDS_GTE3 direct to RTM" LVDS over the AMC to RTM connector?
<sb0>
it is, but when we have det-lat and flexible phase, we can retime stuff
<hartytp_>
not sure about that due to cross-talk etc
<hartytp_>
(one nice thing about clock recovery using MGTs is that you're not sensitive to pick up at the clock frequency)
<sb0>
OBUFDS_GTE3 outputs CML
<sb0>
and why is a clock more of a problem than a transceiver link?
<hartytp_>
any pick up near the clock freq will kill you. working at higher frequencies makes that less of an issue
<sb0>
isn't that just a problem of CDR/PLL bandwidth?
<hartytp_>
I'd consider trying to route the recovered clock directly really quite high risk
<hartytp_>
sb0: yes, but if you have things like LVDS links operating at the same frequency then you have effects which are basically DC so aren't removed by the loop
<hartytp_>
scrambled transceivers are nice from that POV (at least, that's my current understanding)
<sb0>
then can we avoid such LVDS links?
<hartytp_>
nope
<hartytp_>
"and/or backplane link direct to RTM" on the AMC backplane? My interest is using sayma with SFPs, so I won't have that BP
<sb0>
what about running them out of phase with the clock?
<sb0>
it's for those ADCs, right?
<hartytp_>
there is probably something we can do, but it will need testing and thought. I wouldn't consider it clearly lower risk than anything else
<hartytp_>
anyway, I'm concerned that this is trying to fix a problem that doesn't exist
<sb0>
the worst thing that can happen is WR and ADCs can't be used at the same time
<hartytp_>
if we do the AMC DDMTD with two IOs in the same bank, that will probably work
<sb0>
Xilinx said it was risky and unreliable.
<hartytp_>
why?
<hartytp_>
what's risky?
<hartytp_>
timing of IOB FFs? If that's not rock solid then I agree we're in a bad place
<sb0>
due to things like crosstalk within the fabric (which depends on vivado routing paths) and coupling VCCINT loading
<sb0>
with external FFs that should not be a problem, but it's not all easy either
<sb0>
I would still route a OBUFDS_GTE3 to the RTM, then we have the option of using it or not
m4ssi has quit [Ping timeout: 240 seconds]
m4ssi has joined #m-labs
rohitksingh has quit [Remote host closed the connection]
rohitksingh has joined #m-labs
<hartytp_>
"I would still route a OBUFDS_GTE3 to the RTM, then we have the option of using it or not" sounds like a good idea
<hartytp_>
but, I don't see an issue with an external DFF
<hartytp_>
and add a timing constraint for the IOs
<hartytp_>
125MHz source-synchronous interface, pretty standard isn't it?
<sb0>
how do you check if it meets s/h or not?
<hartytp_>
verify the DFF timings with a scope
<sb0>
after each place and route?
<hartytp_>
we can verify timings at the FPGA inputs with a scope
<sb0>
on each board to check for process/temp variation?
<hartytp_>
then add a timing constraint
<hartytp_>
so, we add constraints based on the timings at the FPGA inputs
<hartytp_>
so the only PVT variation you'd be worried about is the DFF
<sb0>
this will fail in an obscure manner if s/h isn't met. this definitely needs a clear software diagnostic.
<hartytp_>
propagation delay
rohitksingh has quit [Ping timeout: 246 seconds]
<hartytp_>
sb0: just to be clear, what is your concern here?
<hartytp_>
are you worried that the timing of the external FF will vary significantly?
<hartytp_>
or, are you concerned that even if we specify a proper timing constraint at the FPGA input we will still get S/H violations?
<sb0>
"what is your concern here?" << a board fails sync and we have absolutely no idea what is going on, followed by days wasted looking around with a scope
<sb0>
complex stuff like sayma needs BISTs on everything
<hartytp_>
okay, well, we can definitely add a sw/gw diagnostic for S/H violation verification
<hartytp_>
more work, but doable
<hartytp_>
or, if you really feel that ultrascale is so much more reliable than other FPGAs then I don't object to adding a separate FPGA
<hartytp_>
if you can convince Greg to do that
<hartytp_>
maybe with a sufficiently simple FPGA it's a quick/cheap thing to do
<sb0>
if we can route time-critical paths through the RTM FPGA alone, I think we have enough options
hartytp has quit [Ping timeout: 256 seconds]
<hartytp_>
but, we can't do that right now apart from using the RTM<->AMC connector, which may well add too much noise
<hartytp_>
okay, I think I've written up notes on this.
<hartytp_>
it would be good if you could double check you're happy with the AMC/RTM schematics for v2.0
<sb0>
I remain skeptical that this connector would produce unacceptable and impossible to mitigate noise levels on clock signals, but transceiver links would be fine
<hartytp_>
the thing about the transceivers is that there isn't much at 10G apart from white noise, so it's not such a problem
<hartytp_>
anyway, this is the kind of thing that's best settled with data
<hartytp_>
maybe Greg will be happy adding a cheap timing FPGA, in which case none of this should be an issue...
<sb0>
the timing fpga sounds like a bit too much on sayma
<hartytp_>
well, if we assume that (a) WR on ultrascale is a no-go (b) the pickup/cross-talk in the AMC<->RTM connector is too bad to make WR useful on the RTM without WR on the AMC as well (c) we want WR on Sayma
<hartytp_>
then we don't have a huge number of other options
<hartytp_>
FWIW, the LVDS ADCs are a high priority for us, so there will be lots of traffic on the AMC<->RTM connector around f_rtio
<hartytp_>
(unless we do the servo on the RTM FPGA and then implement a scrambled link to the AMC, but that seems like a lot of work)
<hartytp_>
"and the worst that can happen is WR+ADC has degraded clocking performance" well, if it gets significantly degraded then there isn't really any point to it
<hartytp_>
we could just stick with the Si5324. WR is only interesting if it's very high performance
<hartytp_>
it might be okay, we just need to test carefully. I'm sceptical because I've been bitten by this kind of thing plenty of times before
<hartytp_>
shifting our clock to higher frequencies gives us a lot of noise immunity, and we've already demoed that it can work with <ps stability
<hartytp_>
using a KC705. Anything else needs testing
<hartytp_>
"those signals are rather slow, put them 180 degree out of phase with the clock and then there shouldn't be crosstalk?" hmm...might work, but you have to be careful since there is a source clock from the FPGA and a returned clock from the ADC
<hartytp_>
and, in any case, if you start playing those kinds of tricks, it's not at all clear to me that this is less work than adding a dual FF to Sayma AMC as well as some logic to sort out S/H when the inputs are latched
<hartytp_>
but, I still don't really understand why you are so worried about the external FF approach
<hartytp_>
e.g. it's quite like SUServo
<hartytp_>
we have a source-synchronous interface with well-defined timing and low PVT variation (in this case, it's just the DFF timing that matters, which can be very well specified)
<hartytp_>
if we can't get that to work for WR then why would SUServo work? We don't currently have complex S/H violation logic for SUServo, when it's exactly the same thing.
<hartytp_>
e.g. fanout the helper PLL clock with one of those ADCLK clock buffers. Worst-case propagation delay skew between outputs over PVT is 10s of ps (max total propagation delay is only 125ps)
<hartytp_>
and for the FF, use an MC100EP29
<hartytp_>
again, worst-case propagation delay through the FF is 500ps. So the timings are all rock solid compared with the 8ns period of the rtio clock. So, if that doesn't work easily then we're in all kinds of trouble
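[Editor's sketch] The timing budget argued above can be written out as simple arithmetic. This is a pessimistic sketch: it treats the full maximum delays quoted in the discussion (125 ps for the clock buffer, 500 ps for the FF) as uncertainty against the 8 ns period. The FPGA setup and hold figures are placeholders chosen for illustration only, not datasheet values, and the function names are hypothetical.

```python
def worst_case_window_ps(buffer_delay_max_ps: float, ff_delay_max_ps: float) -> float:
    """Upper bound on external delay uncertainty: buffer plus FF maximum delays."""
    return buffer_delay_max_ps + ff_delay_max_ps


def margin_fraction(period_ps: float, window_ps: float,
                    fpga_setup_ps: float, fpga_hold_ps: float) -> float:
    """Fraction of the clock period left after external uncertainty and FPGA setup/hold."""
    return (period_ps - window_ps - fpga_setup_ps - fpga_hold_ps) / period_ps


if __name__ == "__main__":
    window = worst_case_window_ps(125.0, 500.0)   # figures quoted in the discussion
    # 500 ps setup and 500 ps hold are assumed placeholder values.
    margin = margin_fraction(8000.0, window, fpga_setup_ps=500.0, fpga_hold_ps=500.0)
    print(f"uncertainty window: {window:.0f} ps, remaining margin: {margin:.1%}")
```

Even with these deliberately pessimistic assumptions most of the period remains as margin, which is the sense in which the external FF timing is described as easy to meet.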
<hartytp_>
"those signals are rather slow, put them 180 degree out of phase with the clock and then there shouldn't be crosstalk?" oh, and there are also reflections, etc to worry about
cedric has quit [Ping timeout: 255 seconds]
cedric has joined #m-labs
m4ssi has quit [Remote host closed the connection]
mauz555 has joined #m-labs
proteusguy has joined #m-labs
mauz555 has quit []
hartytp_ has quit [Quit: Page closed]
lkcl has quit [Read error: Connection reset by peer]