#m-labs on 2014-09-01 — irc logs at freenode.irclog.whitequark.org

2013-12-11 12:34 lekernel changed the topic of #m-labs to: Mixxeo, Migen, MiSoC & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs

00:02 [florian1 has joined #m-labs

00:02 ohama has quit [Disconnected by services]

00:03 ohama has joined #m-labs

00:11 [florian] has quit [Ping timeout: 250 seconds]

00:30 Hawk777_ has quit [Quit: ZNC - http://znc.in]

00:32 Hawk777 has joined #m-labs

00:57 early has quit [Remote host closed the connection]

01:02 early has joined #m-labs

01:05 early has quit [Remote host closed the connection]

01:07 early has joined #m-labs

01:17 early has quit [Remote host closed the connection]

01:21 early has joined #m-labs

01:53 fengling has joined #m-labs

02:17 fengling_ has joined #m-labs

02:17 fengling has quit [Ping timeout: 260 seconds]

03:20 pablojavier has joined #m-labs

03:21 pablojavier has left #m-labs [#m-labs]

03:37 fengling_ is now known as fengling

05:07 fengling has quit [Ping timeout: 240 seconds]

05:51 fengling has joined #m-labs

06:01 fengling has quit [Ping timeout: 240 seconds]

06:01 fengling_ has joined #m-labs

07:14 <sb0> wow, gcc does static array bound checking now

07:25 <GitHub121> [misoc] sbourdeauducq pushed 2 new commits to master: http://git.io/4G9hqw

07:25 <GitHub121> misoc/master 6decb35 Sebastien Bourdeauducq: bios: add sdrrderr

07:25 <GitHub121> misoc/master 57335bd Sebastien Bourdeauducq: bios: add DQ filtering to sdrrd, add sdrrdbuf command

07:47 _florent_ has joined #m-labs

07:50 <sb0> _florent_, hi

07:51 <_florent_> hi

07:52 <sb0> I've been trying to get DQ0-7 working

07:52 <sb0> reads are fine, but writes show ~15% errors

07:53 <sb0> (error being defined as a 32-bit word with at least one flipped bit)
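A minimal Python sketch of the error metric defined above (not the BIOS's actual C code): a word counts as failed if any of its 32 bits differs from the expected value. The function name and the sample data are illustrative assumptions.

```python
def count_failed_words(expected, actual):
    """Count 32-bit words with at least one flipped bit."""
    # XOR exposes the differing bits; the mask keeps the comparison to 32 bits.
    return sum(1 for e, a in zip(expected, actual) if (e ^ a) & 0xFFFFFFFF)

# Hypothetical sample data: two of the four words have one flipped bit each.
expected = [0x00000000, 0xFFFFFFFF, 0x12345678, 0xDEADBEEF]
actual = [0x00000000, 0xFFFFFFFE, 0x12345678, 0x5EADBEEF]
failed = count_failed_words(expected, actual)
print(f"{failed}/{len(expected)} failed 32-bit words")
```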

07:53 <sb0> I wonder if the spurious DQS pulses could be responsible for that

07:54 <_florent_> hmm ok for the result, what's your test?

07:55 <sb0> the bios 'memtest' command

07:55 <sb0> and I have added a sdrrderr command to test for bits that toggle (but should not) on reads

07:56 <sb0> after a successful write with sdrwr, sdrrd (as tested ~1 million times with sdrrderr) consistently returns the same data

07:56 <_florent_> ok, maybe my test over uart was not representative or too slow to show some errors

07:56 <sb0> you should see 15%, it's a lot

07:57 <sb0> interestingly, it seems I can't reproduce the problem when using sdrwr

07:57 <sb0> it could be a problem with the memory controller (e.g. timing violations) too

07:58 <_florent_> yes maybe

07:59 <_florent_> I can try to simulate the whole SoC with the ddr3 model and see if there are some warnings during memtest

07:59 <sb0> can we try first to get the spurious DQS pulses out of the way?

08:00 <sb0> I think they should be rather easy to remove by changing dqs_serdes_pattern

08:01 <sb0> by the way I have fixed the timing closure problem, it was due to the SERDES sampling the tristate control with the 4x clock

08:01 <sb0> now it's simply buffering it

08:01 <_florent_> I can look at DQS pulses

08:02 <sb0> yeah, since you have the simulator running that looks like something you could test efficiently

08:03 <_florent_> by the way, I've looked at the issue with XC3SPROG when loading the bitstream, and now I have the same issue you have...

08:04 <_florent_> when you load the bitstream with iMPACT it's OK...

08:04 <sb0> so according to the micron datasheet p.174, you need a 'preamble' high-to-low DQS transition before the one that registers the first data word

08:05 <sb0> and keep DQS low for at least one 'postamble' bit time after the final transition has registered the last data word

08:05 <_florent_> but iMPACT seems to change some fields in the bitstream before loading it

08:05 <sb0> and when writes are packed, DQS just keeps toggling

08:06 <sb0> you mean broken XIP?

08:06 <_florent_> yes

08:07 <sb0> ah, yes. we have the same bug on the papilio pro, and it works with urjtag, not with xc3sprog

08:08 <sb0> right now I'm loading the bitstream to flash, and then press the reconfiguration button as xc3sprog -R is also broken (does nothing)

08:09 <sb0> I guess this last bug is easier to fix

08:09 <_florent_> hmm ok, maybe I should have a look at the urjtag code to compare

08:09 <_florent_> I've fixed xc3sprog -R

08:09 <sb0> oh, great

08:09 <_florent_> (at least added support for FAMILY_XC7)

08:11 <_florent_> progalgxc3s.cpp:

08:11 <_florent_> http://pastie.org/9518513

08:11 <sb0> well, the XIP bug is a low-priority one; the boards will boot from the flash in most real-use situations...

08:12 <sb0> have you tested it?

08:13 <sb0> if it works, please send a patch to the ML so that Uwe can merge it :)

08:14 <_florent_> IIRC it was working with non-XIP bitstreams, but not with XIP bitstreams, but I'll retry that today

08:14 <sb0> okay

08:16 <sb0> I wonder if the K7 delays really allow several transitions in the buffer...

08:17 <sb0> things are acting pretty weird when trying to compensate for DDR3 skew

08:19 <_florent_> maybe I should also implement the fly-by delay in the simulation...

08:20 <sb0> by the way, do you have experience with dynamic reconfiguration of the k7 PLLs, or clock rerouting?

08:20 <sb0> perhaps it could help to have some mode with the DLL disabled and a very low frequency, so we can actually know and change what is in the memory array

08:21 <_florent_> not really, I've only reused code for dynamic reconfiguration

08:21 <sb0> but if the k7 is anything like the slowtan6, this will go horribly wrong...

08:23 <sb0> yeah, the fly-by delay should be in the simulation. it's large enough to cause a lot of trouble ...

08:32 <sb0> interestingly enough, I get consistent data on DQ0-7 when removing the other pins in the platform file, and one bit flips sometimes when having all the pins

08:32 <sb0> (read data)

08:32 <sb0> DQ8-31 are consistent

08:34 <_florent_> yes interesting

08:36 <_florent_> for my test I removed all pins related to modules 1 to 7 (DQ8-63) and only keep pins related to module 0 (DQ0-7)

08:36 <sb0> yes, same

08:38 <_florent_> for info, the upstream kc705 target is broken, ddrphy is missing csr_map

08:41 <sb0> fixed - forgot to commit it

08:41 <GitHub73> [misoc] sbourdeauducq pushed 1 new commit to master: http://git.io/c93qOg

08:41 <GitHub73> misoc/master 0eeb0ad Sebastien Bourdeauducq: targets/kc705: add ddrphy to CSR map

08:50 <sb0> p179 of the micron datasheet says clearly that DQS is don't care before the preamble - but removing spurious pulses could help with making sure the SDRAM is receiving DQS when we think it is ...

08:57 <_florent_> yes I'm running the simulation first with the upstream code and only 1 module and will see if there is a warning about DQS, then I'll try to remove the pulses

09:15 fengling_ is now known as fengling

09:27 <sb0> _florent_, does the bitslip fetch data before or after the word that exists initially (before any bitslip operation)?

09:30 <sb0> eg you have the data AB1234XY

09:30 <sb0> before bitslip you have 1234

09:30 <sb0> what do you get in the different bitslip states?

09:31 <sb0> there is _no_ specification of that in the xilinx datasheet afaict :/

09:31 <_florent_> with bitslip=0 you have the max latency

09:31 <_florent_> by increasing bitslip you reduce latency

09:31 <sb0> and they encrypted the verilog model for the serdes. very clever ...

09:32 <sb0> so you get A/B, and not X/Y?

09:34 <_florent_> it's explained in UG471 P157

09:34 <_florent_> what's the temporal order of AB1234XY?

09:36 <sb0> A first, then B, then 1, ...

09:39 <sb0> UG471 says that the bitslip does a vague "reordering" of the words extracted from those sequences that are periodic with a period equal to the SERDES width...

09:39 <_florent_> ok, so if 1234 is output with bitslip=0, by increasing it you will be able to output 234X, or 34XY, but you have to look at UG471 P157 for the exact details
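The example above can be pictured as a window sliding over the incoming symbol stream. This is an idealized sketch that assumes each bitslip step moves the window by exactly one position; the function name and its parameters are hypothetical, and the real ISERDESE2 in DDR mode does not map pulses to positions this directly.

```python
def bitslip_window(stream, base, width, slips):
    """Return the `width` symbols visible after `slips` idealized bitslip steps."""
    start = base + slips  # each step moves the window one symbol later in time
    return stream[start:start + width]

stream = "AB1234XY"  # temporal order: A first, then B, then 1, ...
windows = [bitslip_window(stream, base=2, width=4, slips=n) for n in range(3)]
print(windows)
```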

09:46 masal has joined #m-labs

09:46 masal has left #m-labs [#m-labs]

09:48 <sb0> ok, so it shifts the sampling window *later* in time, which should be equivalent to *increasing* the IDELAY by a discrete amount of bit times, correct?

09:52 <_florent_> no, for me it's the opposite...

09:54 <_florent_> with ISERDESE2 in NETWORKING, you add 1 full sys_clk of latency, and thus when it's outputting data, new data is already sampled inside ISERDES

09:54 <_florent_> bitslip only allows you to select between old and new data on the output

09:56 <sb0> ah yes, you're right

09:56 <sb0> it shifts the sampling window *later* in time, which should be equivalent to *decreasing* the IDELAY by a discrete amount of bit times

09:57 <sb0> let me draw it ...

10:05 <sb0> _florent_, http://m-labs.hk/serdes.jpg

10:06 <sb0> is that how it works?

10:13 <_florent_> for the bitslip part it seems ok, but in DDR mode you have to look at Figure 3-11 P157 for the output pattern

10:13 fengling has quit [Quit: WeeChat 1.0]

10:16 <sb0> yes, but the difference is only in the number of bitslip pulses you need to send to reach a certain state, correct?

10:16 <_florent_> yes

10:18 <sb0> ok

10:19 <sb0> one thing that does not make sense is why some DQs *always* have inconsistent values when reading, no matter what the IDELAY setting is

10:25 <_florent_> I don't understand either...

10:26 <_florent_> do you calibrate IDELAY on groups of 8 bits, or on each bit?

10:27 <sb0> on groups of 8 bits

10:27 <sb0> maybe there's a lot of jitter... I hope not :/

10:28 <sb0> or the IDELAY is not working correctly

10:33 <sb0> of course, ug471 had to use a different initial pattern in order to make it more difficult to search for the DDR bitslip state that corresponds to a given SDR bitslip state

10:34 <sb0> well I'm too forgiving here... why did they have that difference to start with

10:41 <sb0> %§&/! this DDR bitslip state reordering is a pure annoyance

10:42 <sb0> totally useless, just causes problems

10:52 <_florent_> "In DDR mode, every Bitslip operation causes the output pattern to alternate between a shift right by one and shift left by three"

10:52 <sb0> yeah. WHY?

10:53 <_florent_> don't know, but it seems "easier" to understand than the Figure 3-11...

10:53 <sb0> yeah, fig 3-11 is useless

10:59 <sb0> so, number of shifts wanted -> number of pulses needed in this idiotic DDR mode

10:59 <sb0> 0 -> 0

10:59 <sb0> 1 -> 3

10:59 <sb0> 2 -> 2

10:59 <sb0> 3 -> 5

10:59 <sb0> 4 -> 4

10:59 <sb0> 5 -> 7

10:59 <sb0> 6 -> 6

11:00 <sb0> 7 -> 1
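The table above can be reproduced from the UG471 sentence quoted earlier ("alternate between a shift right by one and shift left by three"). The sketch below assumes an 8-bit SERDES width, that the first pulse is the right shift, and that "shifts wanted" counts left shifts modulo the width. These conventions are inferred from the table, not taken from the datasheet.

```python
def pulses_for_shift(width=8):
    """Map each wanted shift (mod width) to the number of bitslip pulses needed."""
    table = {0: 0}
    net_left = 0  # accumulated shift, counted in the "left" direction
    for pulse in range(1, width):
        if pulse % 2 == 1:
            net_left -= 1  # odd pulse: shift right by one
        else:
            net_left += 3  # even pulse: shift left by three
        table.setdefault(net_left % width, pulse)
    return dict(sorted(table.items()))

table = pulses_for_shift()
for shift, pulses in table.items():
    print(f"{shift} -> {pulses}")
```

Even shift counts need the same number of pulses, while odd ones are offset by two, which is why a single bitslip costs three pulses.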

11:02 <_florent_> so you will have to reset delay between each test

11:05 <sb0> ok. things start to make sense. on the higher DQ group (with >1ns skew from the write leveling), after swallowing one bit of skew using the bitslip and then adding some delay, I don't see read errors anymore

11:08 <_florent_> idelay

11:08 <sb0> yes

11:09 <sb0> I'm not writing yet... just reading whatever garbage is in the memory array (which conveniently is pretty random)

11:09 <_florent_> sorry, so it's like idelay does not support multiple transitions?

11:10 <sb0> idelay would only *increase* the delay. for the higher DQ group which receives its read command later and therefore outputs its read data later, we need to *decrease* it.

11:11 <_florent_> indeed...

11:16 <sb0> it could be that the persistent read errors I was seeing were due to the SERDES sampling outside the burst, and it looked like data due to electrical noise ...

11:17 <sb0> I have added a hack to sdrrderr to see where the errors occur within the burst

11:19 <_florent_> ok, I have to go, will be back after lunch

11:19 <_florent_> BTW the DQS pulses do not trigger warnings in simulations

11:21 <_florent_> and I don't see other warnings/errors related to timings

11:54 <sb0> yeah. they just make it harder to see if the DRAM is sampling the correct burst

11:54 <sb0> as it would still be a correct write (as there are more DQS pulses) if the DRAM sampling window is shifted by a few cycles

11:55 <sb0> the datasheet does say that DQS is don't care before the preamble and after the postamble ...

11:56 <GitHub141> [misoc] sbourdeauducq pushed 1 new commit to master: http://git.io/jHo9BA

11:56 <GitHub141> misoc/master 2234f50 Sebastien Bourdeauducq: k7ddrphy: add bitslip control for incoming DQ

11:58 <sb0> alternatively, we can set DQ to zero when the data is invalid. this way, the DRAM behaves in a specified manner when its burst sampling window is off

11:58 <sb0> that could make it easier to debug, as we'd see zeros in the data read back, instead of some unspecified (and probably messy) behaviour

12:00 _florent_ has quit [Ping timeout: 240 seconds]

12:31 _florent_ has joined #m-labs

14:13 nicksydney_ has joined #m-labs

14:14 nicksydney has quit [Ping timeout: 252 seconds]

14:18 <_florent_> sb0, DQ is already set to 0 by LASMICON when it is not writing, so with memtest during preamble sys_clk and postamble sys_clk the PHY sends zeros to the DRAM

14:20 <sb0> hmm, ok, dfii should do the same then

14:20 mumptai has joined #m-labs

14:20 <sb0> since we are doing this sort of PHY tweaking using dfii ...

14:21 <sb0> memory controllers do not _have_ to do it, since issues like burst window alignment should be sorted out at that stage ...

14:25 <_florent_> I've tried to change dqs_pattern on preamble and postamble but wasn't able to have something working due to OSERDESE2 strange behaviour...

14:28 nicksydney has joined #m-labs

14:28 nicksydney_ has quit [Ping timeout: 252 seconds]

14:38 <sb0> interesting. and what makes you think that the OSERDESE2 doesn't similarly act up when dealing with the data?

14:50 <_florent_> I don't know exactly, but when asserting T1 (oe) and applying the preamble pattern in the same sys_clk cycle, the new pattern does not seem to be used...

14:56 <sb0> ok I managed to get consistent data with all DQ groups, over 1.6 million reads

14:57 <sb0> I used one bitslip (3 pulses with the stupid DDR mode) on the three upper (high skew) DQ groups, plus some delay

14:57 <sb0> and some delay on the lower DQ group

14:59 <sb0> and by changing the lower 3 address bits, I get the expected burst reordering. so it sounds like I'm definitely reading the DRAM array correctly.

15:01 <_florent_> great

15:01 <sb0> with lasmicon: 881/1048576 failed 32-bit words

15:01 <sb0> sdrwr seems to write the pattern correctly, though I have only made one attempt which statistically is not very significant.

15:02 <sb0> I've never had this low error rate though :-)

15:02 <_florent_> yes it's better than what I had

15:03 <sb0> but there can be non-PHY sources (eg bad timing in the controller), so the next test is to exercise writes to the page buffer with dfii

15:05 <sb0> I wonder how to automate this calibration.. right now it's just a lot of manual guesswork and it might not work on another PCB/SODIMM...

15:06 <_florent_> at least memtest with size of 2048 bytes and l2_size of 128 bytes does not trigger timings issues in the controller

15:08 <_florent_> for read leveling you can also use the DRAM pattern to find the center of the sampling window

15:08 <sb0> I guess you can assume that there is less than 2 bit time of skew across the module

15:09 <sb0> so during write leveling, if you are already sampling CK high at a DQS transition with no delay, then this DQ group has between 1 and 2 bit times of skew

15:10 <sb0> in this case, 1) move the DQS edge in the CK=0 zone before continuing leveling 2) add one bitslip on the read path

15:11 <sb0> finally, increase read delays for each DQ group until read data is consistent

15:11 <sb0> before the last step, you may want to fill the DRAM array with random data

15:11 <sb0> I guess this algo should work for simple DDR3 systems...

15:12 <sb0> well, "simple"

15:12 <sb0> DDR3 is a mess
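The write-leveling part of the algorithm described above can be sketched against a toy timing model. Everything here is an assumption made for illustration: the tap resolution `BIT_TAPS`, the function names, and the model itself, in which CK has a period of two bit times and reaches each DQ group `skew` taps late. It is not MiSoC code and it ignores the final read-delay search.

```python
BIT_TAPS = 32  # hypothetical: number of delay taps per bit time

def ck_at_dqs_edge(skew, delay):
    """Toy model: CK level the DRAM samples at the DQS edge.

    CK is high during the first bit time of each 2-bit-time period and
    arrives `skew` taps late; DQS is launched `delay` taps late.
    """
    return 1 if (delay - skew) % (2 * BIT_TAPS) < BIT_TAPS else 0

def write_level(skew):
    """Return (dqs_delay, read_bitslips) for one DQ group."""
    delay, bitslips = 0, 0
    if ck_at_dqs_edge(skew, delay):
        # CK already high with no delay: the group has 1..2 bit times of skew.
        bitslips = 1  # 2) add one bitslip on the read path
        while ck_at_dqs_edge(skew, delay):
            delay += 1  # 1) move the DQS edge into the CK=0 zone
    # Ordinary leveling: increase the delay until CK is sampled high.
    while not ck_at_dqs_edge(skew, delay):
        delay += 1
    return delay, bitslips

low_skew = write_level(skew=10)   # less than one bit time of skew
high_skew = write_level(skew=40)  # between one and two bit times
print(low_skew, high_skew)
```

In this model the final DQS delay equals the skew in both cases, and only the high-skew group gets the extra read bitslip. Real hardware has a finite number of delay taps, which the sketch does not check.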

15:14 <_florent_> IIRC LASMICON does not handle ODT

15:15 <sb0> isn't dynamic ODT only needed for multi-rank systems?

15:15 <_florent_> maybe the last errors we have come from here...

15:15 <sb0> and on single-rank systems, you can just drive ODT=1 all the time

15:16 mithro has joined #m-labs

15:19 <_florent_> ... yes sorry, I'm reading TN4104.pdf and it seems you are right

15:36 <_florent_> it seems we are also not using DCI termination on DQ/DQS

15:37 <_florent_> https://github.com/Elphel/eddr3/blob/master/phy/dq_single.v#L94

15:37 <_florent_> but it should impact only reads and not writes

15:54 <sb0> do we need that?

15:57 <_florent_> if reads are working without, maybe not

15:59 <_florent_> can you give me the configuration you are using on idelays to do some tests?

16:05 <sb0> http://pastebin.com/edL0SS68

16:05 <sb0> "mw 0xe0005010 1" is delay increase

16:06 <sb0> "mw 0xe0005008 128" selects the 7th DQ group (2**7=128) for delay manipulation
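As a rough illustration of the two commands above, the following sketch generates the BIOS command strings for a given DQ group. The two addresses come from the messages above; the helper name, the one-hot select mask (inferred from "2**7=128"), and the idea that each write of 1 steps the delay by one tap are assumptions.

```python
SEL_ADDR = 0xe0005008  # DQ group select, from the log
INC_ADDR = 0xe0005010  # delay increase, from the log

def delay_commands(group, taps):
    """Build BIOS 'mw' commands to select one DQ group and step its delay."""
    cmds = [f"mw 0x{SEL_ADDR:08x} {1 << group}"]  # one-hot select (assumed)
    cmds += [f"mw 0x{INC_ADDR:08x} 1"] * taps     # one write per tap (assumed)
    return cmds

commands = delay_commands(group=7, taps=2)
print("\n".join(commands))
```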

16:08 <_florent_> ok thanks

16:12 <sb0> the bitslip numbers are the DQ groups with one bitslip (3 pulses)

16:42 <_florent_> with only first module:

16:42 <_florent_> http://pastie.org/9519516

16:58 mumptai has quit [Ping timeout: 255 seconds]

18:01 <_florent_> in fact if we want to drive ODT=1 all the time, we have to use Dynamic ODT: cf micron datasheet P192: Dynamic ODT Special Use Case

18:02 <_florent_> I'm going to change initsequence.py with that

19:34 <_florent_> got it!!! we _NEED_ DCI :)

19:34 <_florent_> with the 2 first modules (thus 16 bits) and only write leveling:

19:35 <_florent_> > 30000 on memtest without DCI

19:35 <_florent_> 30000 errors

19:35 <_florent_> 60 errors with DCI...

19:35 <_florent_> 0 errors after read leveling

19:36 <_florent_> sb0: I clean up my code and send you a patch

19:39 Alain has joined #m-labs

19:57 gric_ has joined #m-labs

20:15 Alain has quit [Quit: ChatZilla 0.9.90.1 [Firefox 31.0/20140716183446]]

21:22 _florent_ has quit [Read error: Connection reset by peer]