sb0_ changed the topic of #m-labs to: https://m-labs.hk :: Logs http://irclog.whitequark.org/m-labs
X-Scale has quit [Ping timeout: 245 seconds]
X-Scale has joined #m-labs
mauz555 has joined #m-labs
mauz555 has quit [Client Quit]
_whitelogger has joined #m-labs
ohsix has quit [Ping timeout: 268 seconds]
ohsix has joined #m-labs
<bb-m-labs> build #2212 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/2212
kc5tja has joined #m-labs
<bb-m-labs> build #2213 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/2213
<sb0> how the fuck does an ultrascale FPGA take almost a microsecond to turn around a LVDS I/O buffer direction?
<sb0> what a piece of junk
<bb-m-labs> build #986 of artiq-win64-test is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-win64-test/builds/986
<bb-m-labs> build #2800 of artiq is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/2800
rohitksingh_work has joined #m-labs
X-Scale` has joined #m-labs
X-Scale has quit [*.net *.split]
shuffle2 has quit [*.net *.split]
X-Scale` is now known as X-Scale
shuffle2 has joined #m-labs
<bb-m-labs> build #2214 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/2214
<bb-m-labs> build #2215 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/2215
<bb-m-labs> build #2801 of artiq is complete: Failure [failed python_unittest_2] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/2801 blamelist: Drew <drewrisinger@users.noreply.github.com>, David Nadlinger <code@klickverbot.at>
<bb-m-labs> build #2216 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/2216
<bb-m-labs> build #2217 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/2217
<bb-m-labs> build #2802 of artiq is complete: Failure [failed python_unittest_2] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/2802 blamelist: Sebastien Bourdeauducq <sb@m-labs.hk>
futarisIRCcloud has joined #m-labs
<sb0> hartytp: months ago you wanted to remove the LOC constraints for Ethernet on Sayma and proposed alternative RGMII designs. have you made any progress on that?
m4ssi has joined #m-labs
Hartytp has joined #m-labs
<Hartytp> Sb0: wow that is crap
<Hartytp> No, no work on Ethernet. Too many higher priority bugs :(
<sb0> it seems there are some bitstream builds where ethernet does not work (a small proportion)
<Hartytp> Ethernet always worked, unlike the rest of sayma, so I didn’t prioritise poking it
<sb0> could be because of the large non-reproducibility of I/O timing on US (several ns)
<Hartytp> Okay I didn’t know that.
<sb0> unfortunately RGMII does not have any auto-calibration mechanism...
<sb0> LOC, BUFG and I/O registers work fine on 7series and other FPGAs
<sb0> on US, you're getting nanoseconds of variations :(
<whitequark> sb0: can you floorplan it?
kc5tja has quit [Ping timeout: 244 seconds]
<sb0> I'm not sure what is the proper way to fix that - some vivado incantations to set timing constraints, or have only IOSERDES like what you propose, though there may still be large clock skews between those
<whitequark> sb0: maybe if we can run artiq on nmigen, that could go in a predefined rectangle on the fpg?
<whitequark> *fpga
<sb0> whitequark: the PLL is locked to a certain site and then it feeds a global clock buffer
<sb0> on other FPGAs there are only small skew variations on the global clock buffer and they are the same between P&R runs for a given I/O site
<sb0> but not on ultrascale
<whitequark> so where is the problem exactly?
<whitequark> how can there be skew variation between global buffer and I/O site?
<whitequark> that sounds absurd, this is the first thing you wouldn't want your IO pads to have
<sb0> ultrascale routing
<sb0> yeah sure, but it's ultrascale
<whitequark> so why do people use ultrascale? just because it's got more cells?
<sb0> the FPGA that takes 900ns to switch IO directions and cannot synchronize clock dividers
Hartytp has quit [Ping timeout: 256 seconds]
<whitequark> because i listen to you and conclude that it barely works at all
<sb0> yeah, ultrascale chips are larger, and a bit faster
<sb0> we can also avoid RGMII and use the transceiver only
<sb0> then it's just the regular GT crap
<sb0> other interfaces are either self-calibrating (DDR) or slow enough that a few ns do not matter
<whitequark> rgmii is 125 MHz DDR, right?
<whitequark> does that even need a transceiver? the selectio (iirc) thing? i thought xilinx has one of those simple gearboxes in each pin, like ecp5?
<sb0> I mean, connect the transceiver directly to the 1000BASE-X interface without a PHY
<whitequark> ah
<sb0> yes, it's 125MHz DDR, so 1UI = 4ns
Hartytp has joined #m-labs
<whitequark> and there is single ns of skew variation between pnr runs?
<sb0> add manual calibration that is probably not optimal, jitter, and then a few ns of ultrascale bullshit, and then it's possible that some bitstream builds do not work
<sb0> it could be other bugs in the PHY chip too, I haven't investigated this issue much yet
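The RGMII timing figure quoted above (125MHz DDR, so 1 UI = 4ns) can be checked with a short calculation. This is a minimal Python sketch; the function names and the 2 ns skew value are illustrative assumptions, not measurements from Sayma.

```python
# Sketch: unit interval (UI) of a DDR interface and the share of it
# consumed by a given timing uncertainty.

def unit_interval_ns(clock_mhz: float, ddr: bool = True) -> float:
    """Return the duration of one data bit (UI) in nanoseconds."""
    period_ns = 1000.0 / clock_mhz
    # With DDR, data is transferred on both clock edges, so one bit
    # lasts half a clock period.
    return period_ns / 2 if ddr else period_ns


def fraction_of_ui(skew_ns: float, ui_ns: float) -> float:
    """Return the fraction of one UI taken up by skew_ns of uncertainty."""
    return skew_ns / ui_ns


ui = unit_interval_ns(125.0)   # RGMII: 125 MHz clock, DDR data
assumed_skew_ns = 2.0          # hypothetical value, for illustration only
print(f"UI = {ui} ns")
print(f"skew share of UI = {fraction_of_ui(assumed_skew_ns, ui):.0%}")
```

With a 4 ns UI, even a couple of nanoseconds of build-to-build variation takes up a large part of the sampling window, which is consistent with only some bitstream builds failing.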
<Hartytp> Could we have used a good old k7 for sayma originally or did we need us?
<sb0> we could have used k7. joe and greg pushed for us.
<Hartytp> Sb0: fwiw I’d like to prioritise the ramp gen issue if possible
<sb0> tbh I didn't expect it to be that bad
<Hartytp> It’s plausible that the sync issues I had are related to whatever is causing the ramp gen issues
<Hartytp> Would be nice to retest my sync demo on a design which produces correct waveforms
<sb0> Hartytp: try the DRTIO satellite design
<sb0> that one produces correct waveforms, for some reason
<Hartytp> Oh it does? Didn’t realise that.
<Hartytp> Do you think this is some cdc/reset/whatever issue in how jesd is hooked up with artiq?
<sb0> no idea
<sb0> maybe that, maybe PI/SI
<Hartytp> maybe but my money is not on pi/si but I could be wrong like last time!
<sb0> the glitches seem to appear at the 150/125MHz beat
<Hartytp> I would be surprised to have pi/si issues that are so repeatable and deterministic
<sb0> changing the 125MHz seems to change the glitch pattern
<Hartytp> And don’t cause other issues like code crashes
<sb0> though more measurements would be good, and also look at the 125 and 150 clocks on the scope simultaneously
<sb0> if this is trashing the transceiver only, it won't crash any code
<Hartytp> Remind me: the 125MHz is only used for the cpu and not at all for the fabric or anything else, right?
<sb0> cpu, sdram, part of the rtio code
<Hartytp> Which part of rtio?
<sb0> the end that talks to the cpu and the dma core
<Hartytp> Ack so pretty decoupled from sawg
<sb0> oh, another difference between satellite and master is (ha) ethernet
<Hartytp> And jesd
<sb0> also running things at 125MHz
<sb0> and using several FPGA clocking resources
<Hartytp> If there were si or pi I’d naively expect some data corruption
<Hartytp> What we have looks more like samples in the wrong order
<sb0> again, not if that affects the transceiver only
<sb0> on xilinx fpgas the transceiver is quite decoupled from the rest of the chip
<sb0> has its own supplies etc.
<Hartytp> Not just data corruption in the fw but in the ramp gen data
<sb0> and look. every single component of sayma has had some bug except the transceiver. that part has been less reviewed...
<Hartytp> Might expect to see noise around the glitches or some non determinism
<Hartytp> Ack I’m not ruling it out, just thinking about what seems most likely
<Hartytp> Anyway as always this needs tracking down so we’ll need to come up with a plan at some point
<Hartytp> We can ask creotech to review the transceivers carefully now
<sb0> ah btw, it should be possible to kill all 125MHz clocks while the ramp gen keeps running
<sb0> once jesd is initialized it doesn't need the system anymore
<Hartytp> Okay that would be interesting to do
<Hartytp> Nice that would rule out some things
<sb0> I also changed the 125M frequency and that seemed to have some impact on the glitches (see issue) but better measurements would help
Hartytp has quit [Ping timeout: 256 seconds]
Hartytp has joined #m-labs
<Hartytp> ?
<Hartytp> it's not clear to me how significant that is as I'm not sure what we're looking for, but I see what you mean
<Hartytp> what would a better measurement be? More careful measurement of the temporal separation between the glitches?
<sb0> hartytp: More careful measurement of the temporal separation between the glitches? << yes
<sb0> at 125M the glitches seem to be spaced by 40ns, which is the period of the 25MHz beat between the 125M and 150M clocks
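The 40ns spacing mentioned above corresponds to the period of the 25MHz difference between the 150MHz and 125MHz clocks. A minimal Python sketch of that arithmetic follows; the function name and the alternative 100 MHz value are assumptions for illustration, since the log does not say which frequency the 125M clock was changed to.

```python
# Sketch: expected glitch spacing if glitches occur at the beat between
# two clocks. The beat frequency is the absolute difference of the two
# clock frequencies; the spacing is its period.

def beat_period_ns(f1_mhz: float, f2_mhz: float) -> float:
    """Return the beat period in nanoseconds for two clock frequencies in MHz."""
    beat_mhz = abs(f1_mhz - f2_mhz)
    if beat_mhz == 0:
        raise ValueError("identical frequencies produce no beat")
    return 1000.0 / beat_mhz


# Clocks named in the discussion: 150 MHz (JESD/RTIO) and 125 MHz (system).
print(beat_period_ns(150.0, 125.0))

# Hypothetical alternative system clock, to show how the expected spacing
# would move if the glitches really track the beat.
print(beat_period_ns(150.0, 100.0))
```

Comparing the measured glitch spacing against this prediction for more than one system clock frequency is one way to test the beat hypothesis.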
<Hartytp> what's limiting the accuracy of your measurements?
<Hartytp> (I can do this carefully, but will take some time to set up as my hw is in pieces right now, so if you can take the data, that will be the quickest path)
cr1901_modern has quit [Read error: Connection reset by peer]
<sb0> same as you: will take some time...
<sb0> and my hw is also in pieces as I'm constantly fighting 3.3V and JTAG bugs
<Hartytp> sb0: ack
<sb0> e.g. one of the boards just developed a cool new bug, "Error: Cannot enable write to flash. Status=0x00000000"
<sb0> accompanied by random reboots. looks like a power supply failure of some sort
<Hartytp> too much start-of-year paperwork keeping me out of the lab :( will look at this once I've tested the PLL (wired up and code written, just need to plug into Kasli and test)
<Hartytp> sigh
<sb0> yes, that's definitely a power supply failure. when the flashing breaks, all the power LEDs turn off for a brief time
<Hartytp> would be interesting at some point to sketch out what Sayma as an EEM would look like. 4 DAC channels on a 220mm Eurocard?
<Hartytp> 2 channels?
<Hartytp> K7 FPGA
<Hartytp> make it a DRTIO slave and implement DRTIO via SERDES on the EEM connector
<Hartytp> no AFE mezzanine, just fixed urukul output so Sayma becomes Urukul but with pulse shaping
<sb0> xD
<Hartytp> I wonder what the power consumption/cost per channel would be and whether it would actually make sense
<Hartytp> anyway, at this stage we should finish building Sayma on uTCA even if it ends up being a prototype for the EEM version...
<sb0> the rfsoc stuff could probably fit in an EEM and actually have better performance than sayma (assuming a reasonable amount of xilinx silicon bugs...)
<Hartytp> maybe. I don't know enough about that (analog performance, latency, digital issues) and it would require more development, but might be the way to go. I just don't know
<sb0> okay, RTM DRTIO also works over the SATA cable now (after disabling other channels), the only bug is #1230
<sb0> between AMC and RTM
<Hartytp> FWIW though, it's not clear to me what the advantage of an RFSOC on an EEM would be v JESD.
<Hartytp> AFAICT it's not cheaper or significantly lower power consumption
<sb0> it's not cheaper, but it has fewer components, high channel density, and power consumption is lower
<Hartytp> allows higher density, but we probably wouldn't be trying to push loads of channels on the EEM anyway
<Hartytp> how much lower is power dissipation? Lower because it cuts out the transceivers? Or are there other reasons?
<Hartytp> anyway, not trying to start a long conversation about this now, just musing about the future :)
<sb0> mostly the JESD transceivers
<sb0> those use a lot of power on both FPGA and DAC sides
<Hartytp> ack
<Hartytp> well, integration is always nice when it works. But, black boxes can be hard to probe and debug and documentation is almost always crap. Plus phase noise needs checking. But that's the usual list of worries, maybe if we test them some day we'll have a nice surprise
<sb0> i really worry about the xilinx silicon though. but that's something that can be rather easily tested using a dev kit
<Hartytp> yes
<sb0> yeah, and synchronization between channels can be buggy, too
<Hartytp> I think that moving Sayma to a eurocard might be a good first step. Once it's working on uTCA the risk in porting to an EEM should be quite low. If we can sort out racks with good BPs, PSUs, cooling etc then EEMs start to look more appealing for this kind of project
<Hartytp> once that's working we could look into RF socs, but testing them properly wouldn't be a small job
<sb0> if they're actually well-designed, it's a rather small job
<sb0> after all, we're just compiling portable code pushing a stream of synchronous data into DACs
<sb0> every other thing is due to frustrating vendor cruft
<sb0> sadly, large FPGAs tend to be loaded with it
<Hartytp> yes
<Hartytp> anyway, one thing at a time
marmelada has joined #m-labs
<marmelada> rjo: my comment wasn't to undermine wr on amc, just to point out that I think it's needed at least as much on rtm
<rjo> marmelada: ack. i agree.
<Hartytp> rjo: yes, I may have overstated the case. There isn't no point in WR on the AMC, it's just not particularly useful for the applications i have in mind.
<Hartytp> although, I'm not sure what the non-ARTIQ user base of Sayma AMC will be. At one point I had the impression that Sayma AMC v2.0 is now non-compatible with other projects like the RFSoc RTM, so this is less of a concern
<Hartytp> but I might be wrong
<rjo> i feel that it would be much better to have those hypothetical other users of Sayma_AMC and/or Sayma_RTM involved. otherwise this is all just, well, hypothetical ;)
<Hartytp> yes
<sb0> marmelada: what comment?
<Hartytp> I tend to work on the assumption that any "potential" users who don't care enough to contribute on GitHub are unlikely to ever turn into actual users
<sb0> hartytp: I don't think that's a good assumption
<Hartytp> either that, or they don't actually know what their requirements are, so we can't design for them anyway
<sb0> there are many kasli users that never comment on github or anything
<Hartytp> (a) how do we design for their needs then
<Hartytp> (b) in practice, they often contribute by proxy
<Hartytp> but, yes, it's a problem.
<sb0> if there is no WR on RTM, how to clock the DACs?
<Hartytp> external SMA
<sb0> the alternative is to send the clock on the RTM connector
<sb0> ah yes, those SMA cables that go from AMC to RTM?
<Hartytp> the user is responsible for ensuring that this is phase stable w.r.t the clock reaching the AMC
<marmelada> rtm connector is not suitable for this clock afaik
<Hartytp> in the current design an SMA from the AMC to the RTM isn't an option
<sb0> is the rtm connector really much worse than a SMA and coax?
<sb0> hartytp: then having a phase-stable clock is a problem
<sb0> hartytp: and I'm using a coax like that all the time
<Hartytp> the FP SMAs can't provide a high-quality ref clk output
<sb0> it's very useful at least for development
<Hartytp> I'd need to pull up the PDFs to double check, but right now I don't think the AMC WR PLL output goes to a FP SMA directly
<Hartytp> the plan would be for the users to use an external passive power splitter to route the clock to both the AMC and RTM
<marmelada> it does
<sb0> if we don't clock the DACs with the WR clock, then what's the point of WR?
<Hartytp> sb0: that's my point
<marmelada> sb0: exactly!
<Hartytp> that's why I'm confused with joe saying he only wants it on the AMC
<Hartytp> makes very limited sense
<marmelada> well, we can still synchronize operations on fpgas
<Hartytp> marmelada (I don't have PDFs right now) so currently the DCXO goes into a fanout (ADCLK) and one output of that fanout goes to an MCX/SMA on the FP (not an internal one)?
<marmelada> so I guess it could help with delays between master and slaves?
<sb0> actually, we can remove WR on the *AMC* if we connect the backplane link to the RTM FPGA directly
<Hartytp> sb0: do you mean making the RTM the DRTIO master and having the AMC as the slave?
<sb0> yes
<marmelada> hartytp: correct
<sb0> that doesn't work well if we use the SFPs though
<Hartytp> marmelada: okay, good. We designed this better than I'd remembered :)
<Hartytp> sb0: in any case, this needs to work without WR since we don't have that working in Artiq yet...
<marmelada> I guess that Greg connected it that way, since it made sense
<Hartytp> I'm just surprised there is enough room on the FP
<sb0> if we have a spare backplane link (I think we do) and spare AMC-RTM pins, it may make sense to connect them
<marmelada> it's the same sma clock output as in v1.0
<Hartytp> with the 2 SMAs in V1.0 it was a PITA to tighten the cables since they interfered with the black insertion handle
<sb0> then put WR on both sides
<sb0> then maybe DNP the AMC side later
<sb0> hartytp: they also interfere with the USB connector
<Hartytp> marmelada: ?
<Hartytp> In v1.0 there were two GPIO lines that connected to TTL buffers
<Hartytp> for a DAC clock, we would need a LVPECL buffer driven directly from the DCXO (v1.0 did not have that)
<Hartytp> sb0: correction. The AMC clock does not need to be phase stable w.r.t. the RTM clock in the new design
<Hartytp> at least not if we do it the way I'm planning
<Hartytp> okay, no, scratch that
<Hartytp> nope, sorry, that's right
<marmelada> hartytp: sma clock output on amc is most probably in the same spot as in v1.0
<Hartytp> the RTM sys ref is sampled from the RTM refclk at 150MHz
<Hartytp> so we just need to ensure that the FPGA clock does not drift w.r.t. the ref clock by enough to cause s/h violations in that
<Hartytp> marmelada: okay, so do we now have three SMAs on the FP? or, did we scrap one of the GPIO SMAs?
<marmelada> in schematics there are 3 smas
<sb0> hartytp: no more glitches with vivado 2018.3, it seems.
<sb0> can you test?
<Hartytp> ?
<Hartytp> wtf?
<Hartytp> yes, but not today
<sb0> I'll run a compilation with 2018.2 and flash it without touching the hardware, to be sure...
<sb0> might have been someone else I touched
cr1901_modern has joined #m-labs
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
<Hartytp> sb0: who are you touching that can fix bitstreams and how do I get one?
Hartytp has left #m-labs [#m-labs]
<sb0> hahaha
<sb0> oops
monicaleung has joined #m-labs
<sb0> whitequark: did you disable systemd on the lab machines?
<sb0> yep, glitches are gone with 2018.3
<sb0> and present with 2018.2
<sb0> same code, same hw setup
<sb0> only the vivado version differs
<cr1901_modern> how many old versions do you keep around?
monicaleung has quit [Ping timeout: 245 seconds]
monicaleung has joined #m-labs
monicaleung has quit [Quit: This computer has gone to sleep]
marmelada has quit [Quit: Page closed]
<rjo> i have nine.
<cr1901_modern> ouch, that poor hard drive
rohitksingh_work has quit [Read error: Connection reset by peer]
<d_n|a> sb0: After upgrading a Kasli master/satellite setup to latest Git master (i.e. with all the switching changes), I'm seeing some spurious DRTIO-related errors in the core logs: https://gist.github.com/klickverbot/226de6dc6f494f9c51e443efde561af2
<d_n|a> Any idea what could be going on here? Everything *seems* to work fine. (Those didn't appear with gateware/firmware from before switching support landed.)
<d_n|a> Could this be related to resetting RTIO (as in, CoreDevice.reset())?
<d_n|a> I'm not sure where to start looking, really, as I haven't found what triggers the messages, nor any smoking guns regarding broken functionality
key2 has quit [Quit: Connection closed for inactivity]
<d_n|a> There doesn't seem to be anything related in the satellite log
rohitksingh has joined #m-labs
<bb-m-labs> build #2218 of artiq-board is complete: Failure [failed conda_build] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/2218 blamelist: Sebastien Bourdeauducq <sb@m-labs.hk>
<bb-m-labs> build #2803 of artiq is complete: Failure [failed] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/2803 blamelist: Sebastien Bourdeauducq <sb@m-labs.hk>
kc5tja has joined #m-labs
<bb-m-labs> build #2219 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/2219
zng has quit [Ping timeout: 244 seconds]
<bb-m-labs> build #2220 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/2220
<bb-m-labs> build #2804 of artiq is complete: Failure [failed python_unittest_2] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/2804 blamelist: Robert Jördens <rj@quartiq.de>
zng has joined #m-labs
zng has quit [Ping timeout: 250 seconds]
zng_ has joined #m-labs
cedric has joined #m-labs
cedric has quit [Changing host]
cedric has joined #m-labs
dlrobertson has joined #m-labs
cedric has quit [Client Quit]
cedric has joined #m-labs
cedric has quit [Changing host]
cedric has joined #m-labs
zng_ has quit [Ping timeout: 268 seconds]
<bb-m-labs> build #2221 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/2221
<bb-m-labs> build #2222 of artiq-board is complete: Failure [failed conda_build] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/2222 blamelist: Robert Jördens <rj@quartiq.de>
<bb-m-labs> build #2805 of artiq is complete: Failure [failed] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/2805 blamelist: Robert Jördens <rj@quartiq.de>
rohitksingh has quit [Ping timeout: 250 seconds]
zng has joined #m-labs
m4ssi has quit [Remote host closed the connection]
Gurty has joined #m-labs
<bb-m-labs> build #2223 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/2223
kc5tja has quit [Ping timeout: 250 seconds]
kc5tja has joined #m-labs
kc5tja has quit [Ping timeout: 244 seconds]
<bb-m-labs> build #2224 of artiq-board is complete: Success [build successful] Build details are at http://buildbot.m-labs.hk/builders/artiq-board/builds/2224
<bb-m-labs> build #2806 of artiq is complete: Failure [failed python_unittest_3] Build details are at http://buildbot.m-labs.hk/builders/artiq/builds/2806 blamelist: Robert Jördens <rj@quartiq.de>
rohitksingh has joined #m-labs
rohitksingh has quit [Ping timeout: 260 seconds]
rohitksingh has joined #m-labs
X-Scale has quit [Ping timeout: 240 seconds]
X-Scale has joined #m-labs
rohitksingh has quit [Ping timeout: 272 seconds]