lekernel changed the topic of #m-labs to: Mixxeo, Migen, MiSoC & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
<GitHub76> [artiq] sbourdeauducq pushed 1 new commit to master: http://git.io/nFV0ZQ
<GitHub76> artiq/master 055942e Sebastien Bourdeauducq: manual/installing: fix paths
<rjo> sb0: correct. once you have synchronized synclk at the fpga by shifting resets and with the knowledge of the synclk-to-fud round trip you get valid fud timing and ddses synchronized (at the fpga that is).
<rjo> if you want to synchronize them at their outputs, we need matched synclks-to-fpga.
<sb0> yes, the synclks do need to be matched. and how well they are matched will determine how well we can sync those DDS
<sb0> the matching needs to be tight if we want to sync them within one 3.5GHz clock cycle
<rjo> i remember getting flash errors before i relaxed the spi timing (in the soc) and vaguely remember seeing very rare flash errors with xc3sprog but they were never reproducible and rare.
<rjo> s/and rare//
<sb0> I've emailed Jack about it
<sb0> maybe he's aware of the problem
<sb0> we don't care about the reset length though, as the FPGA would scan the timings anyway
<sb0> until the synclks are matched.
<sb0> the way I imagine this is similar to DDR3 write leveling
<rjo> the usb forwarding on these vms is quite a bit slower than native. i would imagine it is an interaction between the proxy bitstream's notion of spi and the spi timings of the flash triggered by data hiccups..
<sb0> the FPGA has an internal reference SYNC_CLK that is phase-locked to the 3.5GHz SYS_CLK
<sb0> and the reference SYNC_CLK is used to sample the incoming per-DDS SYNC_CLK
<sb0> the FPGA then sends resets with a different delay wrt the internal SYNC_CLK
<sb0> and stops right when it hits the first e.g. 0->1 transition in the sampled SYNC_CLK
<sb0> we don't even need a TDC for this
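The scan sb0 describes can be sketched in plain Python. This is a toy model, not ARTIQ code: `sample_at_delay` is a hypothetical callback standing in for "send the reset at this delay, then sample the per-DDS SYNC_CLK with the internal reference SYNC_CLK", and the 24-tap square wave used in the example is an assumption chosen only to exercise the loop.

```python
from typing import Callable, Optional


def level_sync_clk(sample_at_delay: Callable[[int], int],
                   max_delay: int) -> Optional[int]:
    """Step the reset delay until the sampled SYNC_CLK goes from 0 to 1.

    sample_at_delay(d) is a hypothetical hook: issue the DDS reset d delay
    taps after the internal reference SYNC_CLK edge, then return the value
    (0 or 1) of the per-DDS SYNC_CLK as sampled by the reference clock.
    Returns the first delay at which a 0->1 transition is seen, or None.
    """
    previous = sample_at_delay(0)
    for delay in range(1, max_delay + 1):
        current = sample_at_delay(delay)
        if previous == 0 and current == 1:
            return delay  # first rising edge: stop scanning here
        previous = current
    return None


def toy_dds(phase: int, period: int = 24) -> Callable[[int], int]:
    """Toy model: the sampled SYNC_CLK is a square wave shifted by `phase` taps."""
    return lambda d: 1 if (d + phase) % period >= period // 2 else 0


print(level_sync_clk(toy_dds(phase=5), max_delay=48))
```

Each DDS would get its own delay. As with DDR3 write leveling, the stopping criterion is an edge in sampled data rather than a time measurement, which is why no TDC is needed.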
<rjo> sb0: yes. i think we are just expressing the same constraints in different variables.
<rjo> you always stress that you don't want to sample on an edge ;)
<sb0> by the way, what is going to be the RTIO clock with this new DDS system?
<sb0> with the ad9858 stuff, it was easy - 125MHz for SoC, and 1GHz (exactly 8x) for DDS and RTIO SERDES (and DDR3 data)
<sb0> we could keep things synchronous
<rjo> yes. 125MHz for the soc would be nice.
<rjo> you mean the 9914 dds clock?
<rjo> it has sync_clk*24=sys_clk, right?
<sb0> yes... it's nice to have those clocks phase-locked and with a power-of-2 frequency ratio
<sb0> but I guess it won't be the case anymore with the 9914
<rjo> so far we used the new dds only because of its wider ftw not because of the faster clock.
<sb0> if we use 3GHz for SYS_CLK, we can keep 125MHz SoC and SYNC_CLK
<rjo> at ETHZ we overclocked them to 2**30 Hz because then granularity is just 0.25 Hz...
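For reference, the 0.25 Hz figure follows from the usual DDS tuning relation, step = f_sys / 2^N. The sketch below assumes a 32-bit frequency tuning word (N = 32), which is what makes 2**30 Hz give exactly 0.25 Hz; the helper names are illustrative only.

```python
from fractions import Fraction

FTW_BITS = 32  # assumption: 32-bit frequency tuning word


def resolution_hz(f_sys_hz: int) -> Fraction:
    """Smallest frequency step of the DDS: f_sys / 2**FTW_BITS."""
    return Fraction(f_sys_hz, 2 ** FTW_BITS)


def ftw(f_out_hz, f_sys_hz: int) -> int:
    """Nearest tuning word for the requested output frequency."""
    return round(Fraction(f_out_hz) * 2 ** FTW_BITS / f_sys_hz)


print(resolution_hz(2 ** 30))     # 1/4
print(ftw(100_000_000, 2 ** 30))  # 400000000
```

With f_sys = 2**30 Hz, every multiple of 0.25 Hz maps to an exact tuning word, so no rounding error accumulates.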
<rjo> yes. i think 2 or 3 GHz will be the dds sys_clk.
<sb0> and there will be 3 samples per RTIO cycle
<sb0> 2GHz will result in a 83 1/3MHz frequency for SYNC_CLK...
<sb0> it would probably make sense to go asynchronous RTIO instead of trying to get the DDR3 to work on a multiple of that frequency
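The clock ratios being weighed here can be checked quickly. This sketch takes SYNC_CLK = SYS_CLK / 24 from the log (rjo's sync_clk*24=sys_clk) and the 125 MHz SoC clock as given; `is_power_of_two_ratio` is a hypothetical helper expressing sb0's preference for phase-locked clocks with a power-of-2 ratio.

```python
from fractions import Fraction

SYNC_CLK_DIVIDER = 24      # SYS_CLK = 24 * SYNC_CLK, as stated in the log
SOC_CLK_HZ = 125_000_000   # SoC clock discussed in the log


def sync_clk_hz(sys_clk_hz: int) -> Fraction:
    """SYNC_CLK frequency for a given DDS SYS_CLK, kept as an exact fraction."""
    return Fraction(sys_clk_hz, SYNC_CLK_DIVIDER)


def is_power_of_two_ratio(a, b) -> bool:
    """True if a/b or b/a is an integer power of two (including 1)."""
    ratio = Fraction(a) / Fraction(b)
    if ratio < 1:
        ratio = 1 / ratio
    return ratio.denominator == 1 and (ratio.numerator & (ratio.numerator - 1)) == 0


for sys_clk in (3_000_000_000, 2_000_000_000, 1_000_000_000):
    sync = sync_clk_hz(sys_clk)
    print(sys_clk, float(sync), is_power_of_two_ratio(sync, SOC_CLK_HZ))
```

At 3 GHz, SYNC_CLK lands exactly on 125 MHz, so the SoC, SYNC_CLK and RTIO can stay synchronous; at 2 GHz the 83 1/3 MHz SYNC_CLK is in a 2/3 ratio to the SoC clock, which is what pushes toward asynchronous RTIO.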
<rjo> sb0: have the rtio fifos be asyncfifos.
<sb0> I'd try 3GHz + sync. if the sync fails, the result will be that the DDSes are off by a couple of samples wrt each other... which may not be a big problem
<rjo> yeah. looks like 2ghz will be experimentally inconvenient. too close to qubit frequencies. so probably 1 or 3ghz
<sb0> it's not that simple, we need to support the "replace last FIFO entry" operation to implement pulse merging
<rjo> ah. yes.
<sb0> additionally, we need instant feedback on underflows so that exceptions can be precisely raised. having the counter in another clock domain complicates that.
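To make the two requirements concrete, here is a behavioural sketch of a single-clock-domain RTIO output FIFO. Everything in it is hypothetical and it is not ARTIQ's implementation: it assumes "replace" means a new event with the same timestamp as the last queued one overwrites it (the pulse-merging case), and that underflow means the event's timestamp is already behind the RTIO counter `now`. Because counter and FIFO share one domain, both checks happen at write time.

```python
from collections import deque


class RTIOUnderflow(Exception):
    """Raised when an event's timestamp is already behind the RTIO counter."""


class SyncRTIOFifo:
    """Toy single-clock-domain RTIO output FIFO (hypothetical model)."""

    def __init__(self, depth: int):
        self.depth = depth
        self.entries = deque()  # (timestamp, value) pairs, oldest first

    def write(self, timestamp: int, value: int, now: int) -> None:
        # Instant underflow detection: `now` is the RTIO counter, read in the
        # same clock domain, so the comparison is exact at write time.
        if timestamp < now:
            raise RTIOUnderflow(f"timestamp {timestamp} < counter {now}")
        # "Replace last FIFO entry": an event at the same timestamp as the
        # last queued one overwrites it instead of taking a new slot.
        if self.entries and self.entries[-1][0] == timestamp:
            self.entries[-1] = (timestamp, value)
            return
        if len(self.entries) >= self.depth:
            raise OverflowError("RTIO FIFO full")
        self.entries.append((timestamp, value))


fifo = SyncRTIOFifo(depth=4)
fifo.write(100, 1, now=0)  # pulse on at t=100
fifo.write(200, 0, now=0)  # pulse off at t=200
fifo.write(200, 1, now=0)  # next pulse starts at t=200: merged, not queued
print(list(fifo.entries))  # [(100, 1), (200, 1)]
```

With an asynchronous FIFO, `now` would live in another clock domain, so both the replace decision and the underflow comparison would need a synchronized copy of the counter or some kind of lockout window.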
<rjo> sb0: there could be a lockout mechanism that runs at cpu freq, buffers that "last FIFO entry" and commits to FIFO a few cycles before the deadline.
<sb0> yes, or keep the FIFO synchronous and use some other clock crossing mechanism
<rjo> a lockout window for pulse merging.
<rjo> ack
<sb0> either way, replace support and instant underflow detection are not going to be straightforward with multiple clock domains
<rjo> no i think 3ghz is in fact experimentally very convenient if there is not too much feedthrough of the clock itself. so let's just assume and push for 2ghz.
<rjo> s/3ghz/2ghz/ and then reverse my statement. push for 3ghz.
<rjo> 2ghz would mean a particularly large and useful 1st nyquist image at 2ghz - f.
<rjo> so let's go with 3ghz: the image at 3ghz - f, around 2ghz, should still be very useful.
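A quick way to see where the images rjo mentions fall: an NRZ DAC clocked at f_sys produces images at k*f_sys - f and k*f_sys + f, weighted by the sin(x)/x envelope of the zero-order hold. The sketch below uses only that textbook model (it ignores any output filtering and the clock feedthrough rjo worries about), and the example frequencies are illustrative.

```python
import math


def dac_images(f_out_hz: float, f_sys_hz: float, k_max: int = 1) -> list:
    """Image frequencies k*f_sys - f and k*f_sys + f for k = 1..k_max."""
    images = []
    for k in range(1, k_max + 1):
        images += [k * f_sys_hz - f_out_hz, k * f_sys_hz + f_out_hz]
    return sorted(images)


def sinc_gain_db(f_hz: float, f_sys_hz: float) -> float:
    """Zero-order-hold (sin(x)/x) amplitude at f, relative to DC, in dB."""
    x = math.pi * f_hz / f_sys_hz
    return 20 * math.log10(abs(math.sin(x) / x))


# Output at 1 GHz with a 3 GHz SYS_CLK: the first image sits at 2 GHz.
print(dac_images(1e9, 3e9))
print(round(sinc_gain_db(2e9, 3e9), 1))
```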
<sb0> ah, sync_clk is single ended
<sb0> we may want to put a differential buffer close to it
<sb0> then send that to the backplane, and mux it on the backplane with matched traces to each DDS
<rjo> yep
<rjo> man. your refactoring means i have to re-discover the design each time i look at it ;)
<sb0> the inline transform rewrite?
<rjo> $ git log | egrep 're(factor|write)' | wc -l
<rjo> 5
<rjo> ;)
rjo is now known as rjo_
rjo_ is now known as rjo
<GitHub54> [misoc] sbourdeauducq pushed 1 new commit to master: http://git.io/Djejhg
<GitHub54> misoc/master 09773df Sebastien Bourdeauducq: software: make compiler-rt a submodule
<GitHub86> [artiq] sbourdeauducq pushed 1 new commit to master: http://git.io/6D_DQA
<GitHub86> artiq/master 42accd5 Sebastien Bourdeauducq: manual/installing: remove compiler-rt download instructions
<GitHub44> [misoc] sbourdeauducq pushed 1 new commit to master: http://git.io/F2Zu2A
<GitHub44> misoc/master f4d6ac8 Sebastien Bourdeauducq: README: remove compiler-rt download instructions
<GitHub166> [migen] sbourdeauducq pushed 1 new commit to master: http://git.io/jFy8BA
<GitHub166> migen/master 9f2f8d2 Guy Hutchison: add hamming-code gen/check lib
xiangfu has joined #m-labs
fengling has joined #m-labs
fengling_ has joined #m-labs
fengling has quit [Ping timeout: 265 seconds]
sb0 has quit [Quit: Leaving]
xiangfu has quit [Ping timeout: 245 seconds]
xiangfu has joined #m-labs
mumptai has joined #m-labs
xiangfu has quit [Ping timeout: 255 seconds]
xiangfu has joined #m-labs
mumptai has quit [Ping timeout: 264 seconds]
siruf has quit [Ping timeout: 272 seconds]
_florent_ has joined #m-labs
siruf has joined #m-labs
kyak has quit [Ping timeout: 244 seconds]
kyak has joined #m-labs
xiangfu has quit [Remote host closed the connection]
fengling_ has quit [Quit: WeeChat 1.0]
sj_mackenzie has quit [Remote host closed the connection]
<ysionneau> _florent_ hi!
<_florent_> hi ysionneau
<ysionneau> my simple sdram controller works well for sdram, however it has trouble with DDR
<ysionneau> first I don't exactly understand how to handle DM signal
<ysionneau> do you know how DM should be handled with DDR?
<_florent_> I'm not sure DM is really used
<_florent_> it's similar to byte_enable signals
<ysionneau> that's what I understood yes
<ysionneau> with SDRAM I just tie DQM to 0 and it works
<ysionneau> at least for 32 bits access pattern
<ysionneau> with DDR if I do this I have strange results from the memtest at the BIOS bootup
<_florent_> and why do you think it's related to DQM?
<ysionneau> not sure, because that's something I don't yet understand very well :p
<ysionneau> actually now I do something like wrdata_mask = ~wishbone.sel upon Write command
<ysionneau> here is what I get from the BIOS with some debug prints in the memtest_silent() http://pastebin.com/smeunuhD
<ysionneau> first weird thing is I get everything I read doubled with same data
<ysionneau> and usually it corresponds to some other place in the column rather than the one I want to read
<ysionneau> well, I don't actually know if the bug is in the read or the write part in fact :)
<ysionneau> it seems the first 2 bytes are correct for addresses like 4*k+3
<_florent_> yes that's what I see, I'm pretty sure it's not related to DQM but something else
<_florent_> have you simulated it?
<ysionneau> yes, I have a working simulation for the SDRAM on papilio pro, but it works fine, like when running on the fpga
<ysionneau> I don't have a working simulation with DDR, since the DDR PHY uses some Xilinx elements like OSERDES2 etc
<ysionneau> I am working on a fuse simulation at the moment, but it's not working yet
<ysionneau> it's quite a pain to setup :/
<_florent_> since we know the DDR PHY is working, you can probably do the simulation without the PHY and just look at the DFI interface
<_florent_> (or create a simple DDR PHY model)
<ysionneau> yes I could simulate without the PHY
<ysionneau> but then I need at least to drive the rddata_valid and wrdata_valid to ack the controller commands
<ysionneau> sounds doable, even if I can't be sure it behaves the same as the PHY would
<ysionneau> if I make a mistake in the PHY "model" then all the simulation is pointless and will not help me to understand my issue :/
<ysionneau> but I can try
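A minimal behavioural stand-in for the PHY read path could just echo the controller's read enables back as rddata_valid after a fixed latency. This is a hypothetical plain-Python model (the class name and the `read_latency` value are assumptions; the real latency would have to be taken from the PHY's settings), and, as ysionneau says, it is only as good as that latency assumption.

```python
from collections import deque


class FakeDFIReadPath:
    """Toy PHY model: rddata_valid is rddata_en delayed by read_latency cycles."""

    def __init__(self, read_latency: int):
        # One slot per cycle of latency, initially idle.
        self.pipe = deque([False] * read_latency)

    def tick(self, rddata_en: bool) -> bool:
        """Advance one sys_clk cycle; return this cycle's rddata_valid."""
        self.pipe.append(rddata_en)
        return self.pipe.popleft()


phy = FakeDFIReadPath(read_latency=3)
print([phy.tick(en) for en in (True, False, False, False, False)])
```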
<_florent_> can you share your code?
<ysionneau> sure
<ysionneau> let me check if I have something un-pushed
<ysionneau> in my simplesdramcon branch on my misoc fork
<_florent_> thanks
<ysionneau> to run the testbench you need to download the model (as it will be printed) and also export PYTHONPATH=path/to/misoc
<ysionneau> and then just python3 simplesdramcontb.py
<ysionneau> (thank you for looking into this, I'm losing my hair over this :p)
<_florent_> OK I'm looking at the code
<_florent_> on this line:
<_florent_> I think you have data_width = 128 (32 x 2 (DDR) x 2 (2 phases)), is that what you are expecting?
<ysionneau> ah sorry
<ysionneau> I modified migen/mibuild/platforms/m1.py to only keep 8 dq lines
<ysionneau> 1 dqm 1 dqs
<ysionneau> I'm using target mlabs_video, platform m1, subtarget BaseSoC (cpu or1k)
<ysionneau> but I forgot to tell I modified the m1 platform to reduce the pins
<ysionneau> so that I end up with a 32 bit wishbone bus slave in the ddr controller
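The widths in this exchange follow _florent_'s 32 x 2 (DDR) x 2 (phases) breakdown. A sketch of that product, with a hypothetical helper and assuming one DM bit per 8 data bits:

```python
def dfi_widths(dq_pins: int, nphases: int = 2, beats_per_phase: int = 2):
    """Return (total DFI data width, total DM mask width) for a DDR PHY."""
    data_width = dq_pins * beats_per_phase * nphases
    return data_width, data_width // 8  # one DM bit per byte


print(dfi_widths(32))  # full M1 bus: (128, 16)
print(dfi_widths(8))   # DQ[7:0] only: (32, 4)
```

With only DQ[7:0], the controller's data port is 32 bits with 4 mask bits, matching the 4 `sel` bits of a 32-bit Wishbone bus.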
<_florent_> OK :)
<ysionneau> here is the migen diff http://pastebin.com/85ph1MfL
<ysionneau> according to the M1 RC3 schematics it corresponds to DQ[7:0]
<ysionneau> and dm[0] dqs[0]
<ysionneau> ok I've got some ISim (/fuse) simulation working, but I don't inject the wishbone reads yet
<ysionneau> that's when you regret the cool Migen+iverilog combo doing your simulation for you
<ysionneau> now I need to write the wishbone transactions by hand ... :(
<_florent_> you can also simulate OSERDES2 with Migen+iverilog
<_florent_> you just have to compile OSERDES2 model like you have done for the Micron model
<ysionneau> ah right
<_florent_> it's only on Kintex7 that it's not possible, since OSERDESE2 uses secure-ip...
<ysionneau> first I thought it was ciphered
<ysionneau> but then I saw it was not but I was already on track for the ISim stuff :p
<ysionneau> let's go back to iverilog then!
<_florent_> I'm not able to find the issue just by looking at the code
<ysionneau> ok, thanks for having looked at it :)
mumptai has joined #m-labs
<ysionneau> pfew at last I have some simulation working with the DDR model + xilinx components
<ysionneau> it was a bit weird to generate a non-"sys" clock at 50 MHz from the top level to then feed the mxcrg, which generates the sys_clk (83.3 MHz) with some PLL stuff
<ysionneau> plugging everything together was not that simple for me
sb0 has joined #m-labs
<ysionneau> sb0 hi!
<sb0> ysionneau, you really can't tie DM to 0. otherwise any byte write from the CPU will write two bytes instead of one.
<ysionneau> sure
<ysionneau> now I don't do that anymore
<sb0> why don't you copy the code from lasmicon?
<ysionneau> I didn't understand every bit of lasmicon yet
<ysionneau> it's a lot bigger and more complex than my small controller
<ysionneau> for wrdata_mask it takes the lasmic bus we signal
<sb0> grep <data mask signal name>
<ysionneau> yes I did that
<sb0> yes, take the wishbone sel signal instead
<sb0> don't forget to invert it
<ysionneau> that's exactly what I do
<ysionneau> and it's not any better :/
<sb0> I think that driving DQM during reads as well doesn't cause any problem (iirc)
<ysionneau> if you drive it high you get nothing out of the dram I think
<ysionneau> it sets dq to hi-z
<ysionneau> confirmed by sdram simulation which then stops working
<ysionneau> well sorry I just understood what you meant
<ysionneau> indeed I guess just driving it to ~bus.sel all the time should work
<ysionneau> let's try that
<ysionneau> last time I tried I must have done something like .eq(bus.sel) without the ~ which would explain my hi-z issues
<sb0> bah, you need to slice bus.sel obviously
<sb0> like for data
<sb0> ah, sorry, it's sliced
<sb0> and you need to send it at the same time as the data (at least). it's ignored during the write command (unless a previous burst was already going on, but your ctl doesn't do that)
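At the bit level, the mask handling discussed here amounts to inverting Wishbone `sel` (1 = byte enabled) into DM (1 = byte masked) and slicing it across the DFI phases like the data. The sketch below is a plain-Python model, not Migen code, and it assumes phase 0 takes the least significant slice; the real ordering must match however the controller slices the write data.

```python
def dfi_wrdata_masks(sel: int, sel_width: int = 4, nphases: int = 2) -> list:
    """Invert a Wishbone byte select into per-phase DFI wrdata_mask values."""
    mask = ~sel & ((1 << sel_width) - 1)  # DM is active high: 1 = do not write
    bits = sel_width // nphases
    return [(mask >> (i * bits)) & ((1 << bits) - 1) for i in range(nphases)]


print(dfi_wrdata_masks(0b1111))  # full 32-bit write, nothing masked: [0, 0]
print(dfi_wrdata_masks(0b0001))  # single byte write, other 3 masked: [2, 3]
```

Tying DM to 0 is equivalent to passing sel=0b1111 for every access, which is why byte writes clobber neighbouring bytes; the mask also has to be presented in the same cycle as the data it qualifies.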
* sb0 is sick today :(
<ysionneau> arg :(
<ysionneau> that's why you're up so early?
<sb0> no, I'm in SF right now, and enjoy a combination of jetlag and some sort of cold that manifested itself shortly after arriving
<ysionneau> :/
<ysionneau> what are you doing in SF?
<sb0> visiting folks and trying to find someone to help design an excellent artiq gui. not much luck with the latter so far...
<ysionneau> hope you will find some talented UI guy!
<ysionneau> by doing wr_data_mask = ~bus.sel combinatorially I get this http://pastebin.com/cjW3G6KH :/
<ysionneau> maybe it has nothing to do with DM pin after all...
<sb0> is the downconverter working correctly?
<ysionneau> I'm not using it
<ysionneau> since I get 32 bits wide wishbone directly
<sb0> huh?
<sb0> that's on m1?
<ysionneau> I am only using dq[7:0]
<ysionneau> yes
<sb0> oh, probably you are not sending the data at the right time. iirc with ddr you need to send it 1 cycle after the write command, as opposed to simultaneously with sdr.
<ysionneau> to send what 1 cycle after the write command?
<sb0> I don't remember if the phy already aligns the write data with the write command, but I'd check that
<ysionneau> ah ok
<ysionneau> I thought the phy would take care of that
<sb0> I don't remember, I wrote it in 2012
<ysionneau> indeed in sdram you put write command + data_in on DQ at the same rising edge of clk
<ysionneau> and indeed on ddr you need to wait 1 clock cycle
<ysionneau> (to present data_in on dq)
<sb0> just insert a register on dq_w/dm when ddr
<ysionneau> ah I think I get it
<sb0> but check the phy first
<ysionneau> I need to put cmd at wrcmdphase for DDR
<ysionneau> not at wrphase
<ysionneau> and indeed wrcmdphase is 0 and wrphase is 1
<ysionneau> so 1 cycle latency
* ysionneau resynthesizing
<ysionneau> if that was the mistake, then I should bang my head against the wall :'
<sb0> you need to send the write command on wrcmdphase on all PHYs
<sb0> not just DDR. it only worked by accident on that SDR PHY.
<ysionneau> sure
<ysionneau> ok now I get this ... http://pastebin.com/5nLCkitL
<GitHub139> [artiq] sbourdeauducq pushed 2 new commits to master: http://git.io/FJsRzw
<GitHub139> artiq/master 391ff10 Sebastien Bourdeauducq: test/full_stack: style and add note about loopback test connections
<GitHub139> artiq/master 62677ed Robert Jordens: test.full_stack: add ARTIQ_NO_HARDWARE environment variable
<ysionneau> not sure if I can call this better ... but heading to bed anyway
<ysionneau> gn8!
* ysionneau pushed his changes anyway
sb0 has quit [Ping timeout: 244 seconds]
<_florent_> ysionneau: I think you have an issue on the writes
<_florent_> s6ddrphy uses CAS Latency of 3 with wrcmdphase=1 and wrphase=0
<_florent_> it seems you are asserting cmd and data in the same sys_clk cycle, which is not correct
<_florent_> you have to do:
<_florent_> - first sys_clk cycle: assert cmd (full_clk = 2xsys_clk, since wrcmdphase=1 it will be sent on the last full_clk cycle)
<_florent_> - second sys_clk cycle: do nothing (2 full_clk cycles)
<_florent_> - third sys_clk cycle: assert wrdata_en (which will be sent on the first full_clk cycle since wrphase is configured to 0)
<_florent_> thus you have your 3 full_clk cycles between cmd and data which is CAS Latency.
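_florent_'s cycle count can be reproduced by numbering full_clk cycles as sys_cycle * nphases + phase. The sketch below uses the values quoted in his messages (nphases = 2, wrcmdphase = 1, wrphase = 0, 3 full_clk cycles between command and data); it is a hypothetical helper for checking the arithmetic, not code from s6ddrphy, and the PHY source remains the authority for the actual phase settings.

```python
def full_clk_index(sys_cycle: int, phase: int, nphases: int) -> int:
    """Position of a DFI phase on the full-rate clock."""
    return sys_cycle * nphases + phase


def wrdata_en_cycle(cmd_cycle: int, wrcmdphase: int, wrphase: int,
                    latency: int, nphases: int = 2) -> int:
    """sys_clk cycle in which to assert wrdata_en so the data lands
    `latency` full_clk cycles after the write command."""
    target = full_clk_index(cmd_cycle, wrcmdphase, nphases) + latency
    data_cycle, remainder = divmod(target - wrphase, nphases)
    if remainder or data_cycle < cmd_cycle:
        raise ValueError("latency not reachable with this wrphase")
    return data_cycle


# Command in sys_clk cycle 0: wrdata_en goes two sys_clk cycles later.
print(wrdata_en_cycle(0, wrcmdphase=1, wrphase=0, latency=3))  # 2
```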
_florent_ has quit [Quit: Leaving]
nicksydney has joined #m-labs