lekernel changed the topic of #m-labs to: Mixxeo, Migen, MiSoC & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
[florian1 has joined #m-labs
ohama has quit [Disconnected by services]
ohama has joined #m-labs
[florian] has quit [Ping timeout: 250 seconds]
Hawk777_ has quit [Quit: ZNC - http://znc.in]
Hawk777 has joined #m-labs
early has quit [Remote host closed the connection]
early has joined #m-labs
early has quit [Remote host closed the connection]
early has joined #m-labs
early has quit [Remote host closed the connection]
early has joined #m-labs
fengling has joined #m-labs
fengling_ has joined #m-labs
fengling has quit [Ping timeout: 260 seconds]
pablojavier has joined #m-labs
pablojavier has left #m-labs [#m-labs]
fengling_ is now known as fengling
fengling has quit [Ping timeout: 240 seconds]
fengling has joined #m-labs
fengling has quit [Ping timeout: 240 seconds]
fengling_ has joined #m-labs
<sb0> wow, gcc does static array bound checking now
<GitHub121> [misoc] sbourdeauducq pushed 2 new commits to master: http://git.io/4G9hqw
<GitHub121> misoc/master 6decb35 Sebastien Bourdeauducq: bios: add sdrrderr
<GitHub121> misoc/master 57335bd Sebastien Bourdeauducq: bios: add DQ filtering to sdrrd, add sdrrdbuf command
_florent_ has joined #m-labs
<sb0> _florent_, hi
<_florent_> hi
<sb0> I've been trying to get DQ0-7 working
<sb0> reads are fine, but writes show ~15% errors
<sb0> (error being defined as a 32-bit word with at least one flipped bit)
<sb0> I wonder if the spurious DQS pulses could be responsible for that
<_florent_> hmm ok for the result, what's your test?
<sb0> the bios 'memtest' command
<sb0> and I have added a sdrrderr command to test for bits that toggle (but should not) on reads
<sb0> after a successful write with sdrwr, sdrrd (as tested ~1 million times with sdrrderr) consistently returns the same data
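The check sb0 describes, looking for bits that toggle between repeated reads of the same location, can be sketched roughly as follows. This is only an illustration in Python, not the bios code: the function name toggled_bits and the sample read values are hypothetical.

```python
def toggled_bits(words):
    """Return a mask of the bit positions that differ between any two reads."""
    if not words:
        return 0
    reference = words[0]
    mask = 0
    for word in words[1:]:
        # XOR exposes the bits that changed relative to the first read
        mask |= word ^ reference
    return mask


# Hypothetical repeated reads of one 32-bit word; one read has a flipped bit.
reads = [0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEB, 0xDEADBEEF]
print(hex(toggled_bits(reads)))
```

A mask of zero means every read returned the same data; any set bit marks a DQ line that is not being sampled reliably.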
<_florent_> ok, maybe my test over uart was not representative or too slow to show some errors
<sb0> you should see 15%, it's a lot
<sb0> interestingly, it seems I can't reproduce the problem when using sdrwr
<sb0> it could be a problem with the memory controller (e.g. timing violations) too
<_florent_> yes maybe
<_florent_> I can try to simulate the whole SoC with the ddr3 model and see if there are some warnings during memtest
<sb0> can we try first to get the spurious DQS pulses out of the way?
<sb0> I think they should be rather easy to remove by changing dqs_serdes_pattern
<sb0> by the way I have fixed the timing closure problem, it was due to the SERDES sampling the tristate control with the 4x clock
<sb0> now it's simply buffering it
<_florent_> I can look at DQS pulses
<sb0> yeah, since you have the simulator running that looks like something you could test efficiently
<_florent_> by the way, I've looked at the issue with XC3SPROG when loading the bitstream, and now I have the same issue you have...
<_florent_> when you load the bitstream with iMPACT it's OK...
<sb0> so according to the micron datasheet p.174, you need a 'preamble' high-to-low DQ transition before the one that registers the first data word
<sb0> and keep DQS low for at least one 'postamble' bit time after the final transition has registered the last data word
<_florent_> but iMPACT seems to change some fields in the bitstream before loading it
<sb0> and when writes are packed, DQS just keeps toggling
<sb0> you mean broken XIP?
<_florent_> yes
<sb0> ah, yes. we have the same bug on the papilio pro, and it works with urjtag, not with xc3sprog
<sb0> right now I'm loading the bitstream to flash, and then press the reconfiguration button as xc3sprog -R is also broken (does nothing)
<sb0> I guess this last bug is easier to fix
<_florent_> hmm ok, maybe I should have a look at the urjtag code to compare
<_florent_> I've fixed xc3sprog -R
<sb0> oh, great
<_florent_> (at least added support for FAMILY_XC7)
<_florent_> progalgxc3s.cpp:
<sb0> well, the XIP bug is a low-priority one; the boards will boot from the flash in most real-use situations...
<sb0> have you tested it?
<sb0> if it works, please send a patch to the ML so that Uwe can merge it :)
<_florent_> IIRC it was working with non-XIP bitstreams, but not with XIP bitstreams, but I'll retry that today
<sb0> okay
<sb0> I wonder if the K7 delays really allow several transitions in the buffer...
<sb0> things are acting pretty weird when trying to compensate for DDR3 skew
<_florent_> maybe I should also implement the fly-by delay in the simulation...
<sb0> by the way, do you have experience with dynamic reconfiguration of the k7 PLLs, or clock rerouting?
<sb0> perhaps it could help to have some mode with the DLL disabled and a very low frequency, so we can actually know and change what is in the memory array
<_florent_> not really, I've only reused code for dynamic reconfiguration
<sb0> but if the k7 is anything like the slowtan6, this will go horribly wrong...
<sb0> yeah, the fly-by delay should be in the simulation. it's large enough to cause a lot of trouble ...
<sb0> interestingly enough, I get consistent data on DQ0-7 when removing the other pins in the platform file, and one bit flips sometimes when having all the pins
<sb0> (read data)
<sb0> DQ8-31 are consistent
<_florent_> yes interesting
<_florent_> for my test I removed all pins related to modules 1 to 7 (DQ8-63) and only keep pins related to module 0 (DQ0-7)
<sb0> yes, same
<_florent_> for info, the upstream kc705 target is broken, ddrphy is missing csr_map
<sb0> fixed - forgot to commit it
<GitHub73> [misoc] sbourdeauducq pushed 1 new commit to master: http://git.io/c93qOg
<GitHub73> misoc/master 0eeb0ad Sebastien Bourdeauducq: targets/kc705: add ddrphy to CSR map
<sb0> p179 of the micron datasheet says clearly that DQS is don't care before the preamble - but removing spurious pulses could help with making sure the SDRAM is receiving DQS when we think it is ...
<_florent_> yes I'm running the simulation first with the upstream code and only 1 module and will see if there is a warning about DQS, then I'll try removing the pulses
fengling_ is now known as fengling
<sb0> _florent_, does the bitslip fetch data before or after the word that exists initially (before any bitslip operation)?
<sb0> eg you have the data AB1234XY
<sb0> before bitslip you have 1234
<sb0> what do you get in the different bitslip states?
<sb0> there is _no_ specification of that in the xilinx datasheet afaict :/
<_florent_> with bitslip=0 you have the max latency
<_florent_> by increasing bitslip you reduce latency
<sb0> and they encrypted the verilog model for the serdes. very clever ...
<sb0> so you get A/B, and not X/Y?
<_florent_> it's explained in UG471 P157
<_florent_> what's the temporal order of AB1234XY?
<sb0> A first, then B, then 1, ...
<sb0> UG471 says that the bitslip does a vague "reordering" of the words extracted from those sequences that are periodic with a period equal to the SERDES width...
<_florent_> ok, so if 1234 is output with bitslip=0, by increasing it you will be able to output 234X, or 34XY, but you have to look at UG471 P157 for the exact details
masal has joined #m-labs
masal has left #m-labs [#m-labs]
<sb0> ok, so it shifts the sampling window *later* in time, which should be equivalent to *increasing* the IDELAY by a discrete amount of bit times, correct?
<_florent_> no, for me it's the opposite...
<_florent_> with ISERDESE2 in NETWORKING, you add 1 full sys_clk of latency, and thus when it's outputting data, new data are already sampled inside ISERDES
<_florent_> bitslip only allows you to select between old and new data on the output
<sb0> ah yes, you're right
<sb0> it shifts the sampling window *later* in time, which should be equivalent to *decreasing* the IDELAY by a discrete amount of bit times
<sb0> let me draw it ...
<sb0> is that how it works?
<_florent_> for the bitslip part it seems ok, but in DDR mode you have to look at Figure 3-11 P157 for the output pattern
fengling has quit [Quit: WeeChat 1.0]
<sb0> yes, but the difference is only in the number of bitslip pulses you need to send to reach a certain state, correct?
<_florent_> yes
<sb0> ok
<sb0> one thing that does not make sense is why some DQs *always* have inconsistent values when reading, no matter what the IDELAY setting is
<_florent_> I don't understand either...
<_florent_> do you calibrate IDELAY on groups of 8 bits, or on each bit?
<sb0> on groups of 8 bits
<sb0> maybe there's a lot of jitter... I hope not :/
<sb0> or the IDELAY is not working correctly
<sb0> of course, ug471 had to use a different initial pattern in order to make it more difficult to search for the DDR bitslip state that corresponds to a given SDR bitslip state
<sb0> well I'm too forgiving here... why did they have that difference to start with
<sb0> %§&/! this DDR bitslip state reordering is a pure annoyance
<sb0> totally useless, just causes problems
<_florent_> "In DDR mode, every Bitslip operation causes the output pattern to alternate between a shift right by one and shift left by three"
<sb0> yeah. WHY?
<_florent_> don't know, but it seems "easier" to understand than the Figure 3-11...
<sb0> yeah, fig 3-11 is useless
<sb0> so, number of shifts wanted -> number of pulses needed in this idiotic DDR mode
<sb0> 0 -> 0
<sb0> 1 -> 3
<sb0> 2 -> 2
<sb0> 3 -> 5
<sb0> 4 -> 4
<sb0> 5 -> 7
<sb0> 6 -> 6
<sb0> 7 -> 1
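The table above can be reproduced from the UG471 sentence _florent_ quoted. The following is a rough sketch, assuming an 8-bit word, that the first pulse is the shift right by one, and that the "shifts wanted" are counted as left shifts; the function names are made up for illustration.

```python
WIDTH = 8  # assumed SERDES word width, matching the 8-entry table


def net_left_shift(pulses, width=WIDTH):
    """Net left rotation after a number of DDR-mode bitslip pulses."""
    shift = 0
    for i in range(pulses):
        if i % 2 == 0:
            shift -= 1  # 1st, 3rd, ... pulse: shift right by one
        else:
            shift += 3  # 2nd, 4th, ... pulse: shift left by three
    return shift % width


def pulses_for_shift(width=WIDTH):
    """Map each wanted left shift to the number of pulses that produces it."""
    table = {}
    for pulses in range(width):
        table.setdefault(net_left_shift(pulses, width), pulses)
    return table


for wanted, pulses in sorted(pulses_for_shift().items()):
    print(wanted, "->", pulses)
```

Under those assumptions, the printed mapping agrees with the one listed by sb0.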
<_florent_> so you will have to reset delay between each test
<sb0> ok. things start to make sense. on the higher DQ group (with >1ns skew from the write leveling), after swallowing one bit of skew using the bitslip and then adding some delay, I don't see read errors anymore
<_florent_> idelay
<sb0> yes
<sb0> I'm not writing yet... just reading whatever garbage is in the memory array (which conveniently is pretty random)
<_florent_> sorry, so it's like idelay does not support multiple transitions?
<sb0> idelay would only *increase* the delay. for the higher DQ group which receives its read command later and therefore outputs its read data later, we need to *decrease* it.
<_florent_> indeed...
<sb0> it could be that the persistent read errors I was seeing were due to the SERDES sampling outside the burst, and it looked like data due to electrical noise ...
<sb0> I have added a hack to sdrrderr to see where the errors occur within the burst
<_florent_> ok, I have to go, will be back after lunch
<_florent_> BTW the DQS pulses do not trigger warnings in simulations
<_florent_> and I don't see other warnings/errors related to timing
<sb0> yeah. they just make it harder to see if the DRAM is sampling the correct burst
<sb0> as it would still be a correct write (as there are more DQS pulses) if the DRAM sampling window is shifted by a few cycles
<sb0> the datasheet does say that DQS is don't care before the preamble and after the postamble ...
<GitHub141> [misoc] sbourdeauducq pushed 1 new commit to master: http://git.io/jHo9BA
<GitHub141> misoc/master 2234f50 Sebastien Bourdeauducq: k7ddrphy: add bitslip control for incoming DQ
<sb0> alternatively, we can set DQ to zero when the data is invalid. this way, the DRAM behaves in a specified manner when its burst sampling window is off
<sb0> that could make it easier to debug, as we'd see zeros in the data read back, instead of some unspecified (and probably messy) behaviour
_florent_ has quit [Ping timeout: 240 seconds]
_florent_ has joined #m-labs
nicksydney_ has joined #m-labs
nicksydney has quit [Ping timeout: 252 seconds]
<_florent_> sb0, DQ is already set to 0 by LASMICON when it is not writing, so with memtest during preamble sys_clk and postamble sys_clk the PHY sends zeros to the DRAM
<sb0> hmm, ok, dfii should do the same then
mumptai has joined #m-labs
<sb0> since we are doing this sort of PHY tweaking using dfii ...
<sb0> memory controllers do not _have_ to do it, since issues like burst window alignment should be sorted out at that stage ...
<_florent_> I've tried to change dqs_pattern on preamble and postamble but wasn't able to have something working due to OSERDESE2 strange behaviour...
nicksydney has joined #m-labs
nicksydney_ has quit [Ping timeout: 252 seconds]
<sb0> interesting. and what makes you think that the OSERDESE2 doesn't similarly act up when dealing with the data?
<_florent_> I don't know exactly, but when asserting T1 (oe) and applying the preamble pattern in the same sys_clk cycle, the new pattern does not seem to be used...
<sb0> ok I managed to get consistent data with all DQ groups, over 1.6 million reads
<sb0> I used one bitslip (3 pulses with the stupid DDR mode) on the three upper (high skew) DQ groups, plus some delay
<sb0> and some delay on the lower DQ group
<sb0> and by changing the lower 3 address bits, I get the expected burst reordering. so it sounds like I'm definitely reading the DRAM array correctly.
<_florent_> great
<sb0> with lasmicon: 881/1048576 failed 32-bit words
<sb0> sdrwr seems to write the pattern correctly, though I have only made one attempt which statistically is not very significant.
<sb0> I've never had this low error rate though :-)
<_florent_> yes it's better than what I had
<sb0> but there can be non-PHY sources (eg bad timing in the controller), so the next test is to exercise writes to the page buffer with dfii
<sb0> I wonder how to automate this calibration.. right now it's just a lot of manual guesswork and it might not work on another PCB/SODIMM...
<_florent_> at least memtest with size of 2048 bytes and l2_size of 128 bytes does not trigger timing issues in the controller
<_florent_> for read leveling you can also use the DRAM pattern to find the center of the sampling window
<sb0> I guess you can assume that there is less than 2 bit times of skew across the module
<sb0> so during write leveling, if you are already sampling CK high at a DQS transition with no delay, then this DQ group has between 1 and 2 bit times of skew
<sb0> in this case, 1) move the DQS edge in the CK=0 zone before continuing leveling 2) add one bitslip on the read path
<sb0> finally, increase read delays for each DQ group until read data is consistent
<sb0> before the last step, you may want to fill the DRAM array with random data
<sb0> I guess this algo should work for simple DDR3 systems...
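The per-group procedure sb0 outlines might be sketched like this. It is a toy model under stated assumptions, not MiSoC code: calibrate_group, reads_consistent, make_probe and the 32-tap limit are all hypothetical, and the write-side step (moving the DQS edge into the CK=0 zone) is not modelled.

```python
def calibrate_group(ck_high_at_zero_delay, reads_consistent, max_taps=32):
    """Return (bitslips, read_delay_taps) for one DQ group, or None."""
    # A group that already samples CK high with no DQS delay is assumed to
    # have between 1 and 2 bit times of skew, so it gets one read bitslip.
    bitslips = 1 if ck_high_at_zero_delay else 0
    # Final step: raise the read delay until repeated reads agree.
    for taps in range(max_taps):
        if reads_consistent(bitslips, taps):
            return bitslips, taps
    return None


def make_probe(needed_bitslips, first_good_tap):
    """Fake hardware probe: reads are consistent from first_good_tap upward."""
    return lambda bitslips, taps: (bitslips == needed_bitslips
                                   and taps >= first_good_tap)


# Invented example groups: (needed bitslips, first tap with consistent reads).
groups = {"low": (0, 5), "high": (1, 9)}
for name, (slips, tap) in groups.items():
    print(name, calibrate_group(slips == 1, make_probe(slips, tap)))
```

On real hardware the reads_consistent callback would have to issue many reads of random data and compare them, as the sdrrderr command does.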
<sb0> well, "simple"
<sb0> DDR3 is a mess
<_florent_> IIRC LASMICON does not handle ODT
<sb0> isn't dynamic ODT only needed for multi-rank systems?
<_florent_> maybe the last errors we have come from here...
<sb0> and on single-rank systems, you can just drive ODT=1 all the time
mithro has joined #m-labs
<_florent_> ... yes sorry, I'm reading TN4104.pdf and it seems you are right
<_florent_> it seems we are also not using DCI termination on DQ/DQS
<_florent_> but it should impact only reads and not writes
<sb0> do we need that?
<_florent_> if reads are working without, maybe not
<_florent_> can you give me the configuration you are using on idelays to do some tests?
<sb0> "mw 0xe0005010 1" is delay increase
<sb0> "mw 0xe0005008 128" selects the 7th DQ group (2**7=128) for delay manipulation
<_florent_> ok thanks
<sb0> the bitslip numbers are the DQ groups with one bitslip (3 pulses)
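As a small illustration of the two bios commands above, the sequence for adding delay to one DQ group could be generated as below. The addresses are the ones sb0 quotes and are presumably specific to his build; the one-hot group select is inferred from the "2**7=128" remark, and delay_commands is a made-up helper.

```python
SEL_ADDR = 0xe0005008  # DQ group select register, as quoted in the chat
INC_ADDR = 0xe0005010  # delay increase register, as quoted in the chat


def delay_commands(group, increments):
    """Return the bios 'mw' commands that add delay taps to one DQ group."""
    # Select the group with a one-hot mask (assumed from 2**7 = 128).
    cmds = ["mw 0x{:08x} {}".format(SEL_ADDR, 1 << group)]
    # Each write of 1 to the increase register adds one delay step.
    cmds += ["mw 0x{:08x} 1".format(INC_ADDR)] * increments
    return cmds


for cmd in delay_commands(7, 2):
    print(cmd)
```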
<_florent_> with only first module:
mumptai has quit [Ping timeout: 255 seconds]
<_florent_> in fact if we want to drive ODT=1 all the time, we have to use Dynamic ODT: cf micron datasheet P192: Dynamic ODT Special Use Case
<_florent_> I'm going to change initsequence.py with that
<_florent_> got it!!! we _NEED_ DCI :)
<_florent_> with the 2 first modules (thus 16 bits) and only write leveling:
<_florent_> > 30000 on memtest without DCI
<_florent_> 30000 errors
<_florent_> 60 errors with DCI...
<_florent_> 0 errors after read leveling
<_florent_> sb0: I'll clean up my code and send you a patch
Alain has joined #m-labs
gric_ has joined #m-labs
Alain has quit [Quit: ChatZilla 0.9.90.1 [Firefox 31.0/20140716183446]]
_florent_ has quit [Read error: Connection reset by peer]