#milkymist on 2011-08-17 — irc logs at freenode.irclog.whitequark.org

01:33 <aw> 0x32: fix2b, stopped @ 'Bitstream length: 1484404' while reflashing...

01:35 <aw> 0x32: tp36 - 690mV, tp37 - 793mV

01:36 <aw> the voltage is not at correct Low or High, i am going to power off

01:39 <aw> voltage of tp36, tp37 is the same. if first flash was not successful before, it seems to keep stopping at 'length: 1484404'...go for another board.

01:40 <wolfspraul> hmm

01:41 <GitHub120> [scripts] xiangfu pushed 1 new commit to master: http://bit.ly/q2vsz7

01:41 <GitHub120> [scripts/master] add debug all to jtag - Xiangfu Liu

01:41 <xiangfu> aw, Hi

01:41 <aw> xiangfu, hi, any news?

01:42 <wolfspraul> aw: let's look at one more board

01:42 <wolfspraul> with fix2b

01:42 <wolfspraul> although I think we already know it's not the magic solution yet

01:43 <wolfspraul> if wpwrak is here we can look into 0x32 more, otherwise fix other bugs first, don't apply fix2b to a lot of boards until we know more

01:43 <wolfspraul> aw: on 0x32, have you checked D16?

01:43 <xiangfu> aw, I just update the reflash_m1.sh under for-rc3 folder, here: http://milkymist.org/updates/2011-07-13/for-rc3/reflash_m1.sh

01:44 <GitHub21> [scripts] xiangfu force-pushed master from aa59a1a to dbd0372: http://bit.ly/nGHAhd

01:44 <GitHub21> [scripts/master] add debug all to jtag - Xiangfu Liu

01:44 <xiangfu> aw, it just enable the 'debug all' option for output more info for us to debug.

01:47 <aw> wolfspraul, yes, checked on D16...i found some interesting difference, second...will know soon to compare to the good one (0x39). ;-)

01:48 <aw> xiangfu, so with that 'debug all' with default ? , or i need to enable it?

01:48 <wolfspraul> we don't need that right now [debug all]

01:48 <aw> wolfspraul, wait

01:49 <aw> xiangfu, if this is just add 'debug all' option with default while I run it, i think that i can use it, why not?

01:49 <xiangfu> aw, http://milkymist.org/updates/2011-07-13/for-rc3/reflash_m1.sh just updated, it is default.

01:50 <wolfspraul> it will not help with our problems

01:54 <aw> 0x39: D16 (in-circuit); forwarding voltage - 152mV, reversing voltage - 1545mV

01:54 <aw> 0x32: D16 (in-circuit); forwarding voltage - 153mV, reversing voltage - 1114mV

01:55 <aw> so I am going to replace a new D16 firstly to see if this problem

02:11 <wolfspraul> aw: I just noticed 0x32 is a board that never rendered before

02:11 <wolfspraul> that could be a different problem...

02:11 <aw> wolfspraul, yes

02:12 <aw> it seems that different category failure

02:12 <wolfspraul> possible

02:12 <wolfspraul> what's your latest results with 0x32 now?

02:13 <aw> btw: a good new diode(off-circuit): forwarding voltage - 153mV, reversing voltage - no voltage can measured

02:15 <wolfspraul> you replace D16 on 0x32 with a new one?

02:15 <aw> now 0x32: D16 (in-circuit) reversing voltage is 1114mV, but I replaced a new diode on board, it got 886mV, it must be the program_b loop that lets reversing voltage down a bit.

02:16 <wolfspraul> does flashing work?

02:16 <aw> also the replaced new D16, I tried to take apart and measure its reversing voltage is still good

02:16 <aw> wolfspraul, i didn't do reflashing

02:17 <aw> try again...if still can't reflashing ...leave it apart then

02:17 <wolfspraul> ok

02:17 <wolfspraul> let's look at 0x34 now

02:18 <aw> maybe just another failure classification

02:18 <wolfspraul> that one rendered before

02:18 <wolfspraul> what is TP36/TP37 on 0x32 now?

02:19 <aw> tp36 - 770mV, tp37 - 838mV , wrong

02:20 <wolfspraul> hmm

02:20 <wolfspraul> ok

02:20 <wolfspraul> try 0x34

02:20 <aw> yes, of course it can't reflashing and stop at 1484404

02:20 <aw> right

02:21 <wolfspraul> no need to test flashing with those tp36/tp37 values

02:22 <aw> no

02:22 <aw> i think this is good evidence. ;-)

02:22 <wolfspraul> no

02:23 <wolfspraul> makes no sense. I trust the tp36/tp37 values we measure.

02:23 <wolfspraul> 100%

02:23 <aw> 0x34: D16 (in circuit) forwarding - 154mV , rev. V - 1547 mV

02:23 <wolfspraul> once we have hard data, let's use it

02:23 <wolfspraul> well

02:23 <wolfspraul> aw: one by one

02:23 <wolfspraul> can we measure meaningful data in-circuit or not?

02:24 <wolfspraul> if not, let's stop doing it

02:24 <aw> wait wait

02:24 <wolfspraul> if yes, those values mean the diode is damaged and needs to be replaced?

02:24 <aw> let's test more boards and we see how reasonable For. & Rev. voltage they would be.

02:24 <wolfspraul> wait

02:25 <wolfspraul> I don't want all sorts of random data

02:25 <wolfspraul> that's a bad time waste

02:25 <aw> well

02:25 <wolfspraul> aw: is the data meaningful?

02:25 <aw> now 0x34 can reconfigure surely

02:25 <wolfspraul> did you apply fix2b to 0x34 already?

02:25 <aw> yes.

02:26 <aw> you are like baby-watching though. ;-) no problems

02:26 <wolfspraul> yes, sorry. try to understand the test data ;-)

02:26 <aw> it must somewhere let in-circuit voltage gets low (without power on)

02:26 <GitHub15> [scripts] xiangfu force-pushed master from dbd0372 to b9585d9: http://bit.ly/nGHAhd

02:26 <GitHub15> [scripts/master] add debug all to jtag - Xiangfu Liu

02:27 <aw> so later we consult with Werner, he may provide more details to us maybe. ;-)

02:27 <wolfspraul> aw: you talk about measuring D16 performance in-circuit?

02:27 <aw> yes

02:28 <aw> For. & Rev. voltage measured before power-ed -on but in-circuit.

02:28 <wolfspraul> ok

02:28 <wolfspraul> alright, back to 0x34

02:28 <wolfspraul> so it is booting now?

02:29 <wolfspraul> I think you should reflash (reflash_m1.sh), and re-run all tests and rendering cycles (10)

02:29 <aw> now 0x32 has worse Rev. voltage (below 1545mV), this means somewhere others influence D16's specification/behavior

02:29 <aw> now to reflashing. ;-)

02:30 <wolfspraul> see how it goes...

02:30 <wolfspraul> aw: maybe the reset ic on 0x32 has a problem?

02:30 <wolfspraul> (guessing)

02:31 <aw> xiangfu, wow..man! your debug log msg is many..let's see...reflashing now...;-)

02:31 <xiangfu> aw, for disable is. just remove the whole "debug all" line, just fyi

02:32 <wolfspraul> I was worried about that. I hope your terminal history is enough. You may have to increase it so that we don't lose data.

02:32 <wolfspraul> xiangfu: maybe it should be disabled by default

02:32 <wolfspraul> we are currently (as of right now) not aware of any problem that 'debug all' may help us with

02:32 <wolfspraul> so we can enable it when we run into such a problem

02:32 <aw> wolfspraul, not enough to show history. ;-)

02:32 <wolfspraul> yeah, well

02:33 <xiangfu> this commit disable it by default: GitHub15> [scripts] xiangfu force-pushed master from dbd0372 to b9585d9: http://bit.ly/nGHAhd

02:33 <aw> no problem, just try first one. ;-)

02:34 <aw> then i remove "debug all" line. :)

02:38 <aw> msg log stops at http://pastebin.com/M1ezi1AG

02:39 <aw> but m1 led still flashs, so it still in flashing i think...let's wait one more minutes

02:40 <aw> xiangfu, if its log is wrong, directly tell me.

02:41 <aw> @ full speed, it seems needs more much time...good now led2 and led3 are still flashing...let's see

02:42 <wolfspraul> ok

02:43 <wolfspraul> first of all, we turn off debug all

02:43 <wolfspraul> it's a bad idea to turn it on all the time

02:43 <aw> yup

02:43 <wolfspraul> full-speed should not matter much but I'm guessing, I can try here to set a baseline if that helps

02:44 <wolfspraul> if your log is screwed up now, you should redo the flashing

02:44 <wolfspraul> not "wait for some minutes"

02:44 <wolfspraul> that sounds wrong

02:44 <aw> leds are still flashing...;-)

02:44 <wolfspraul> what does urjtag do on your notebook now?

02:44 <wolfspraul> still logging something?

02:45 <aw> yes still logging output

02:45 <aw> stop it anyway?

02:45 <wolfspraul> the crazy 'debug all' output?

02:45 <aw> :-) don't know then

02:46 <wolfspraul> yes, stop and redo

02:46 <wolfspraul> without 'debug all'

02:47 <aw> redo now...

02:49 <aw> 0x34 good now...at least not stop at crazy 'length: 1484404' ;-)

02:51 <aw> good news: finished reflashing successfully.

02:51 <wolfspraul> :-)

02:52 <wolfspraul> xiangfu: can reflash_m1.sh log stdout and stderr into a file?

02:53 <wolfspraul> Adam could run it with redirection too like > urjtag_0x32.log 2>&1

02:53 <aw> crc checked okay

02:53 <aw> now go to rendering for 10 times

02:53 <wolfspraul> aw: wait

02:53 <wolfspraul> one idea for when you run reflash_m1.sh

02:53 <wolfspraul> so right now you just execute "./reflash_m1.sh", right?

02:54 <aw> sudo ./reflash_m1.sh 00 34

02:54 <wolfspraul> but you can run "sudo ./reflash_m1.sh 00 34 >> urjtag_0x32.log 2>&1"

02:54 <aw> listening..and standby

02:54 <wolfspraul> the >> should append to that log file, so even if you run multiple times it will be added to the end of the log file

02:55 <wolfspraul> and the 2>&1 will redirect error messages into the log file as well

02:55 <xiangfu> wolfspraul, yes.

02:55 <wolfspraul> xiangfu: does this work? can you try?

02:55 <xiangfu> yes

02:55 <wolfspraul> sometimes there are issues with redirecting bash scripts and sub-processes...

02:55 <xiangfu> works just fine.

02:55 <wolfspraul> well, then we should tell Adam

02:55 <wolfspraul> it helps him

02:56 <wolfspraul> aw: next time you reflash a board, try that

02:56 <wolfspraul> for example for 0x34, it would be:

02:56 <wolfspraul> sudo ./reflash_m1.sh 00 34 >> urjtag_0x34.log 2>&1

02:56 <aw> wolfspraul, do you mean that if everytime I run the same commands above, the message will be "added" increasingly to .log file?

02:56 <wolfspraul> yes correct

02:57 <wolfspraul> so you only need to watch the board number

02:57 <wolfspraul> so you don't write into the wrong log file

02:57 <wolfspraul> you can collect all log files on your disk, and upload to the downloads server later

02:57 <wolfspraul> also saves time

02:57 <wolfspraul> just remember two things:

02:57 <aw> okay...good so that i dont need to do such stupid work "copy" and "paste" from terminal. ;-)

02:57 <wolfspraul> 1. use >> (two characters, not one)

02:57 <wolfspraul> 2. always use the same board number in the reflash_m1.sh parameter and the name of the log file
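
The two redirection rules discussed above can be tried without a board. This is only a sketch: `fake_run` is a hypothetical stand-in for `reflash_m1.sh`, and the log names are made up for the demonstration.

```shell
#!/bin/bash
# Hypothetical stand-in for one flashing run: one line on stdout, one on stderr.
fake_run() {
    echo "run $1: output"
    echo "run $1: error" >&2
}

dir=$(mktemp -d)

# ">>" appends, so both runs survive in the log; "2>&1" sends stderr there too.
fake_run 1 >> "$dir/append.log" 2>&1
fake_run 2 >> "$dir/append.log" 2>&1

# ">" truncates the file first, so only the last run survives.
fake_run 1 > "$dir/overwrite.log" 2>&1
fake_run 2 > "$dir/overwrite.log" 2>&1

# Without "2>&1" the error line never reaches the log. It is discarded here
# only to keep the demo quiet; normally it would go to the terminal.
fake_run 3 > "$dir/stdout_only.log" 2>/dev/null

echo "append.log:      $(grep -c '' "$dir/append.log") lines"
echo "overwrite.log:   $(grep -c '' "$dir/overwrite.log") lines"
echo "stdout_only.log: $(grep -c '' "$dir/stdout_only.log") lines"
```

The appended log keeps all four lines, the overwritten one keeps two, and the stdout-only one keeps a single line.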

02:58 <wolfspraul> yes

02:58 <aw> try now...second

02:58 <wolfspraul> xiangfu: can you try that this really works?

02:58 <wolfspraul> I don't have my m1 here right now

02:59 <xiangfu> wolfspraul, yes. I am running that now. only the output will be a little confuse since there are ^M when eraseflash. but it's ok

02:59 <wolfspraul> don't understand

03:00 <wolfspraul> where does the ^M come from, and where is it written to?

03:00 <wolfspraul> the log file?

03:02 <xiangfu> when you open the log file with VIM or Emacs there will be a little confuse, but open with 'gedit' will ok.

03:02 <wolfspraul> ok so it goes into the log file - good

03:02 <wolfspraul> and where does it come from?

03:02 <wolfspraul> and why?

03:02 <aw> after I types that commands above, the terminal doesn't show msg log, it should write directly into .log file

03:02 <wolfspraul> maybe remove it?

03:02 <wolfspraul> aw: yes, the terminal will show nothing

03:03 <wolfspraul> that's a little unfortunate if you run into an error

03:03 <wolfspraul> but eventually reflash_m1.sh will stop

03:03 <wolfspraul> and then you can look in the log file

03:03 <aw> can it show also msg log in the terminal? so i can see how it goes on..

03:03 <xiangfu> wolfspraul, when eraseflash it output like: (0% Completed) FLASH Block 0 : Unlocking ... Erasing ... Ok.

03:03 <xiangfu> wolfspraul, then ^M not '\n' %1 .. %2

03:04 <wolfspraul> alright, don't know whether that's the best/right but no time now :-)

03:04 <wolfspraul> xiangfu: one thing we could do is this:

03:04 <xiangfu> wolfspraul, so when you open with VIM there will be a BIG line. 0% --> 100% but this is ok in gedit
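
The ^M that xiangfu describes is a carriage return: the progress updates end in `\r` rather than `\n`, so a terminal overwrites the same line while vim shows one long line. A minimal sketch of cleaning such a log with `tr` follows. The sample text is simulated (only the 0% line appears in the chat) and the temporary file names are assumptions.

```shell
#!/bin/bash
# Simulated progress output: each update ends in a carriage return (\r, shown
# as ^M in vim) instead of a newline.
raw=$(mktemp)
printf '(0%% Completed) FLASH Block 0 : Unlocking ... Erasing ... Ok.\r' > "$raw"
printf '(1%% Completed) FLASH Block 1 : Unlocking ... Erasing ... Ok.\r' >> "$raw"
printf 'done\n' >> "$raw"

# Turn every carriage return into a newline so each update gets its own line.
clean=$(mktemp)
tr '\r' '\n' < "$raw" > "$clean"
cat "$clean"
```

The original log stays untouched; the cleaned copy is only for reading in an editor.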

03:04 <xiangfu> aw, you can use : /reflash_m1_rc3.sh 00 2a 2>&1 | tee >> log

03:04 <wolfspraul> the reflash_m1.sh makes the stdout/stderr redirection inside the script, into the file, and also shows it on its own stdout/stderr

03:04 <xiangfu> aw, then you will get output both under terminal and log

03:05 <wolfspraul> ah

03:05 <wolfspraul> good idea

03:05 <wolfspraul> but let's be more precise please

03:05 <wolfspraul> sudo ./reflash_m1.sh 00 34 2>&1 | tee >> urjtag_0x34.log

03:06 <wolfspraul> why is there a special _rc3.sh btw?

03:06 <wolfspraul> aw: can you try that new line?

03:07 <wolfspraul> just reflash again with that line: sudo ./reflash_m1.sh 00 34 2>&1 | tee >> urjtag_0x34.log

03:07 <aw> xiangfu, is it the newest one: http://milkymist.org/updates/2011-07-13/for-rc3/reflash_m1.sh

03:07 <wolfspraul> man I hope we have only one script

03:07 <wolfspraul> :-)

03:08 <wolfspraul> aw: don't change anything with your script now, it worked before

03:08 <wolfspraul> don't touch it

03:08 <wolfspraul> just try the new line and add: 2>&1 | tee >> urjtag_0x34.log

03:09 <aw> i am asking that script if it's with default settings and newest?

03:09 <wolfspraul> don't touch your script

03:09 <aw> i want to download it again. ;-)

03:09 <wolfspraul> it worked before, it works now

03:09 <wolfspraul> NO!

03:09 <wolfspraul> you can only get new bugs :-)

03:10 <wolfspraul> aw: let's try the new line, and add: 2>&1 | tee >> urjtag_0x34.log

03:10 <aw> okay

03:13 <aw> terminal doesn't show up msg. :(

03:14 <aw> not parallel , so that i don't see anything in time.

03:14 <wolfspraul> hmm

03:14 <wolfspraul> maybe a side-effect of sudo?

03:14 <wolfspraul> of course it's not properly tested before, sorry about that

03:15 <aw> don't know

03:15 <aw> forget about this now. ;-)

03:15 <wolfspraul> wait

03:15 <aw> just copy and paste

03:15 <wolfspraul> wait one moment

03:16 <wolfspraul> hmm

03:16 <wolfspraul> aw: so you flashed the board?

03:16 <wolfspraul> and there was no output in the terminal?

03:16 <wolfspraul> or all at the end?

03:17 <aw> one point: a good command can let me stop anytime and it can still write into .log file and also shows up them in terminal though. I hope . ;-)

03:17 <wolfspraul> yes sure it's easy. just needs to be properly tested and done.

03:17 <aw> yes, no any msg shows up in terminal with commands above

03:17 <aw> now reflashed is done

03:18 <wolfspraul> hmm

03:18 <wolfspraul> wait

03:18 <aw> used gedit to open .log file, it's okay

03:19 <wolfspraul> let's try one more random idea

03:19 <wolfspraul> if this doesn't work, then we need to get this right first, then talk to you :-)

03:19 <wolfspraul> but one more, here it is:

03:20 <wolfspraul> ah wait

03:21 <aw> m

03:21 <wolfspraul> xiangfu's line was wrong

03:21 <wolfspraul> :-)

03:21 <wolfspraul> try this:

03:22 <wolfspraul> sudo ./reflash_m1.sh 00 34 2>&1 | tee -a urjtag_0x34.log

03:22 <aw> okay

03:22 <wolfspraul> xiangfu: don't you think tee >> log is wrong? Adam needs tee -a log

03:23 <xiangfu> wolfspraul, both are ok. I have tested. with >>

03:23 <aw> mm..now terminal shows msg. ;-)

03:23 <xiangfu> but yes. sounds like -a is better

03:24 <wolfspraul> Adam wants to see the output

03:24 <aw> btw. can i use "ctrl + C" while reflashing..if I see reflashing stops

03:24 <wolfspraul> so if he uses >>, then the tee output is gone (no file parameter for tee)
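
The difference between `tee >> log` and `tee -a log` can be checked without hardware. In this sketch `fake_reflash` is a hypothetical stand-in for the flashing script, and command substitution stands in for "what the terminal would show".

```shell
#!/bin/bash
# Hypothetical stand-in for the flashing script: one stdout and one stderr line.
fake_reflash() {
    echo "Bitstream length: 1484404"
    echo "warning on stderr" >&2
}

log=$(mktemp)

# Variant 1: tee has no file argument and the shell redirects tee's stdout
# into the log. The data reaches the log, but nothing reaches the terminal.
seen_1=$(fake_reflash 2>&1 | tee >> "$log")

# Variant 2: tee -a appends to the log itself and still copies to stdout.
seen_2=$(fake_reflash 2>&1 | tee -a "$log")

echo "variant 1 showed: '${seen_1}'"
echo "variant 2 showed: '${seen_2}'"
echo "log lines: $(grep -c '' "$log")"
```

Both variants write to the log, which is why xiangfu's test passed, but only the `-a` form also lets Adam watch the output.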

03:24 <wolfspraul> aw: yes you can use ctrl-c, no hesitation

03:24 <aw> ans still can write into log file, which won't interrupt by my CTRL + C?

03:25 <wolfspraul> sure, it will all interrupt, like before

03:25 <wolfspraul> but the log is written

03:25 <aw> s/ans/and

03:25 <aw> hmm..okay..good

03:25 <aw> thanks

03:25 <wolfspraul> the log is always safe, no worries

03:25 <wolfspraul> you cannot lose anything in the log

03:25 <aw> good

03:25 <wolfspraul> just remember the syntax of the line

03:25 <wolfspraul> 2>&1 | tee -a urjtag_0x34.log

03:25 <wolfspraul> that will always append to the log, perfect for our use

03:26 <aw> yes, i recorded into my file already. ;-)

03:26 <wolfspraul> of course you need to make sure the filename has the correct board number

03:26 <wolfspraul> so whenever you work on a particular board, you add to the log file for that board

03:26 <wolfspraul> then upload all log files to the downloads server

03:26 <aw> okay

03:26 <wolfspraul> so...

03:27 <wolfspraul> back to 0x34 :-)

03:27 <wolfspraul> keep us posted

03:27 <aw> sure

03:27 <aw> you can go to the server folder to see log file when you back. :)

03:27 <wolfspraul> he

03:27 <wolfspraul> the bigger problem is what we saw on 0x32

03:28 <wolfspraul> but let's finish 0x34 now

03:28 <aw> reflashed done again

03:28 <aw> let's test it

03:28 <wolfspraul> after that is 0x39, also good (rendered before)

03:28 <wolfspraul> but 0x3A did not render before

03:28 <wolfspraul> anyway one by one

03:29 <wolfspraul> it's tough to mix design uncertainties with production surprises...

03:29 <wolfspraul> but we get through it

03:31 <aw> how about "flterm --port /dev/ttyUSB0 --kernel boot.bin"?

03:31 <aw> can it be added ">>" to log file too?

03:32 <aw> i still use stupid copy/paste method. :)

03:33 <wolfspraul> wait

03:33 <wolfspraul> you can add 2>&1 | tee -a log_file

03:34 <wolfspraul> I think we should write the urjtag and flterm into the same log file

03:34 <wolfspraul> so let's give it another name

03:34 <wolfspraul> for example rc3_0x34.log

03:34 <wolfspraul> so that would be:

03:34 <wolfspraul> 1. sudo ./reflash_m1.sh 00 34 2>&1 | tee -a rc3_0x34.log

03:35 <aw> since I'll test 10 times, so the log will be longer

03:35 <aw> okay

03:35 <wolfspraul> 2. flterm --port /dev/ttyUSB0 --kernel boot.bin 2>&1 | tee -a rc3_0x34.log

03:35 <aw> try now

03:36 <wolfspraul> even if you run another script like read_flash_m1.sh, you can append to the same log file

03:36 <xiangfu> flterm is different

03:36 <wolfspraul> read_flash_m1.sh 2>&1 | tee -a rc3_0x34.log

03:36 <wolfspraul> oops

03:36 <wolfspraul> :-)

03:36 <wolfspraul> xiangfu: alright, what works?

03:37 <aw> hmm...seems 'flterm' doesn't accept other parameters .:(

03:40 <wolfspraul> do copy/paste for now

03:40 <aw> it wrote logs as: http://pastebin.com/R0uFzSN9

03:40 <xiangfu> wolfspraul, it needs modify the flterm source code for log

03:40 <wolfspraul> but you can already use the name rc3_0x34.log when running reflash_m1.sh

03:40 <wolfspraul> it's a better name

03:41 <aw> not fully all msg saved into log file while using 'flterm'

03:41 <aw> sure sure

03:41 <aw> done

03:41 <wolfspraul> xiangfu: or we need to find a terminal program that supports logging/stdout somehow

03:41 <wolfspraul> adam needs practical solutions now. which is copy/paste for flterm

03:41 <wolfspraul> and the tee thing for reflash_m1.sh

03:41 <wolfspraul> xiangfu: if you can find an easy solution for terminal logging, tell us :-)

03:43 <xiangfu> yes

03:45 <xiangfu> aw, you can wrap the reflash_m1.sh to another script file like:

03:45 <xiangfu> #!/bin/bash

03:45 <xiangfu> mkdir -p log

03:45 <xiangfu> ./reflash_m1.sh $1 $2 2>&1 | tee -a log/urjtag_$2.log

03:45 <xiangfu> then you will not worry about the log name.
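
A slightly hardened variant of that wrapper, as a sketch only. The function name `reflash_logged` and the `REFLASH` override variable are hypothetical additions, and the meaning of the first argument is not stated in the chat, so it is passed through unchanged.

```shell
#!/bin/bash
# REFLASH is an assumed override hook; it defaults to the script named in the chat.
REFLASH=${REFLASH:-./reflash_m1.sh}

# reflash_logged <first-arg> <board-number>
reflash_logged() {
    if [ "$#" -ne 2 ]; then
        echo "usage: reflash_logged <first-arg> <board-number>" >&2
        return 2
    fi
    mkdir -p log
    # Show the output on the terminal and append it to the per-board log.
    "$REFLASH" "$1" "$2" 2>&1 | tee -a "log/urjtag_$2.log"
    # Report the flashing script's exit status rather than tee's.
    return "${PIPESTATUS[0]}"
}
```

Deriving the log name from the board number avoids writing into the wrong file, and returning the script's own status lets a caller notice a failed run even though the pipeline ends in `tee`.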

03:46 <aw> good solutions! thanks.

04:24 <aw> 0x34 rendering pass

04:34 <wolfspraul> nice

04:35 <wolfspraul> 0x39 now?

04:41 <aw> 0x39 I wrote reflash log again. will upload

04:42 <aw> rework 0x3a now

04:46 <wolfspraul> aw: what happened on 0x39 ?

04:46 <aw> 0x39: this was successfully yesterday . ;-)

04:46 <wolfspraul> ah ok, all pass

04:46 <wolfspraul> oh, forgot

04:46 <wolfspraul> confused

04:46 <wolfspraul> so 0x3A now, got it

04:47 <aw> yes

04:47 <wolfspraul> ok - 0x3A now, then 0x3C

04:47 <wolfspraul> 0x3A did never render before, 0x3C did

04:47 <wolfspraul> let's see...

04:48 <aw> 0x3A: D16(in circuit) For.V.=153mV, Rev.V.=1120mV, can reconfigure.

04:48 <aw> mm this is not the same 0x32. ;-)

04:48 <aw> measure tp36, tp37 for records first

04:49 <aw> 0x3A histories: never reflashed successfully before

04:52 <aw> 0x3A: tp36 - 2.66V, tp37 - 2.91V, no good; it must be reached to rough 3.3V

04:52 <aw> try to reflash now

04:54 <aw> mm..yes ...stop at 'Bitstream length: 1484404'

04:55 <wolfspraul> hmm

04:55 <aw> so once the tp36, tp37 voltage is not high enough, reflashing must be unsuccessful

04:55 <wolfspraul> interesting

04:55 <wolfspraul> oh sure

04:55 <aw> i leave 0x3A apart now.

04:55 <wolfspraul> wait

04:55 <wolfspraul> thinking

04:55 <aw> mm

04:56 <aw> or I replace a new diode . ;-)

04:56 <aw> let's do it. ;-)

04:56 <wolfspraul> wait

04:57 <wolfspraul> you mean replace D16 ?

04:57 <aw> yes

04:57 <wolfspraul> no I'm against that

04:57 <wolfspraul> I don't want to make random experiments

04:57 <wolfspraul> what is the theory behind that?

04:57 <wolfspraul> there is none

04:57 <wolfspraul> so - no

04:57 <wolfspraul> let me think for a moment

04:58 <aw> since the tp37, tp36 is directly connected to diode

04:58 <aw> mmm

04:58 <wolfspraul> ok but I want to think more, not randomly switch parts

04:58 <aw> if diode (in-circuit) is not fully acted as 0x39

04:58 <aw> alright..just discuss first

04:58 <wolfspraul> I still don't know whether those numbers are meaningful, when measure in-circuit

04:59 <wolfspraul> so it's just noise

04:59 <wolfspraul> let's see. so far we applied fix2b to 4 boards: 0x32 0x34 0x39 0x3A

05:00 <wolfspraul> 0x39 was the first one, and where we built the fix2b theory.

05:00 <wolfspraul> 0x34 works

05:00 <aw> 0x39, 0x34 with good diode(in-circuit) also tp36, tp37 are all good

05:00 <wolfspraul> 0x32 and 0x3A do not work. both never rendered before, and now they show bad tp36/tp37 values

05:00 <wolfspraul> so far all correct?

05:01 <aw> 0x32: no good on D16(in-circuit): For.V. = 153mV, Rev.V. = 1114mV

05:01 <aw> 0x3A: relatively D16(in circuit) For.V.=153mV, Rev.V.=1120mV, can reconfigure.

05:02 <aw> 0x3A: tp36, tp37 voltage is not pull high enough

05:02 <wolfspraul> what do you mean with "relatively"?

05:03 <wolfspraul> ok - let's measure forward and reverse voltage of D16 on 0x34. what is it there?

05:03 <aw> 0x32: tp36 - 900mV, tp37 - 1.1V

05:04 <aw> 0x34: D16(in-circuit), For. V. = 154mV, Rev. V. = 1547mV

05:04 <wolfspraul> I found it. "0x34: D16 (in circuit) forwarding - 154mV , rev. V - 1547 mV"

05:04 <aw> i noticed if good diode(in-circuit) the Rev.V needs to be 1545mV

05:05 <aw> For.V is almost ~153mV

05:05 <wolfspraul> and you think the difference between ca. 1120mV and ca. 1545mV is the difference between bad and good?

05:06 <aw> if Rev. V is lower. means that could be have few current leakage

05:06 <aw> well..i just noticed but no theory to prove it

05:06 <wolfspraul> well since nobody else is awake, just try

05:07 <wolfspraul> random is fun ;-)

05:07 <wolfspraul> so you put a new D16 on 0x3A ?

05:07 <wolfspraul> and measure the old one after it's removed...

05:08 <aw> so probably only both in-circuit and tp36 tp37 are all correct. then d2/d3 is fully off and can reflash successfully

05:08 <aw> let's try to replace now. ;-)

05:09 <aw> mmm...this made me think C238

05:10 <wolfspraul> check C238

05:10 <aw> since program_b is one of the terminal of D16, also connected to C238, if my soldering is no good, thus C238 may be also not good a little.

05:11 <wolfspraul> what do you do now?

05:11 <wolfspraul> you put a new D16 on 0x3A?

05:11 <aw> mm try now

05:14 <aw> D16 I took apart is perfect: For.V = 154mV, Rev.V.=no value.

05:14 <wolfspraul> ok, so the problem was elsewhere

05:14 <aw> yes

05:15 <wolfspraul> but we still know good values for D16 when measured in-circuit, which seems to be ca. 150mV and ca. 1550 mV

05:15 <wolfspraul> so put the new one on, and measure

05:15 <wolfspraul> maybe the problem is C238, or somewhere else?

05:16 <aw> 0x3A: tp37 - 3.29V perfect, tp36 - 900mV

05:16 <aw> thinking

05:19 <wolfspraul> what voltages do you measure on the new D16 (in-circuit) now?

05:20 <aw> mm...possible points: 1. C238 is not good quality after soldering 2. reset out

05:20 <aw> no

05:20 <aw> i haven't soldered new diode on boards

05:20 <wolfspraul> ah

05:20 <aw> somewhere is wrong to let tp36 not pull high enough

05:21 <wolfspraul> replace C238? replace reset ic? (I'm just guessing)

05:21 <aw> i am going to replace a new c238 first

05:21 <wpwrak> aha ! more tests :) lemme catch up ...

05:22 <aw> welcome!

05:23 <aw> i need your help. ;-)

05:23 <wpwrak> (fpga) oh, a bit of techno-mumble never hurts :)

05:23 <wolfspraul> just the right time for the savior, and I have to run to a meeting with Jon...

05:24 <wolfspraul> (in a little bit)

05:31 <aw> good

05:31 <aw> after replace a new C238

05:31 <wpwrak> still catching up .. lots of stuff :)

05:31 <aw> d2/d3 is fully OFF. man!

05:31 <aw> i hate myself though

05:32 <wolfspraul> focus

05:32 <wolfspraul> tp36/tp37 good now?

05:32 <wolfspraul> reflashing?

05:32 <aw> tp36 and tp37 go back to good 3.3V

05:32 <aw> now to solder diode back again

05:32 <wolfspraul> ok

05:33 <wpwrak_> let's parallelize this :)

05:33 <aw> from now on ...soldering back diode I always use a new one. ;-)

05:34 <wpwrak_> lots of bad diodes ?

05:35 <wolfspraul> aw: of course!

05:35 <wolfspraul> come on we don't need to slow ourselves down for trying to save 20 cent items

05:36 <wolfspraul> every chance that a diode is unsoldered, of course is a chance to put a new one there

05:36 <aw> good now

05:36 <wpwrak_> (0x32) that's all after removing the INIT_B diode ?

05:36 <wolfspraul> where are you reading now?

05:36 <wolfspraul> we are a bit ahead already

05:36 <aw> 0x3A: D16(in-circuit): For.V. = 153mV, Rev.V = 1547mV

05:36 <wolfspraul> yes perfect

05:36 <aw> 0x3A: tp36 and tp37 are all 3.29V

05:37 <wolfspraul> wpwrak_: I don't think a lot of diode problems

05:37 <aw> so there's big FACTs now:

05:37 <wpwrak_> (replace parts) in general, i would try to discard anything that got unsoldered (unless really really difficult to replace)

05:37 <wolfspraul> correct, fully agree

05:37 <wpwrak_> wolfspraul: i'm around "let's look at 0x34 now". just started catching up

05:37 <wolfspraul> wpwrak_: basically we have a reference value for D16 now when measured in-circuit - ca. 150mV forward, 1545mV reverse

05:38 <wolfspraul> when we see those numbers, we can assume D16 and C238 to be correct

05:38 <wpwrak_> sounds reasonable. those 1.5 V are some obscure path, but that's the price of measuring in-circuit

05:38 <wolfspraul> wpwrak_: ok, read top to bottom first...

05:38 <aw> 1. before I go to test these boards, just go for measure in-circuit voltage of D16, if not right. must be some other area is wrong, typical C238 and diode itself

05:39 <wolfspraul> aw: let's try to fix 0x32 now

05:39 <wpwrak_> ah, C238 acts up too ? interesting :) reading

05:39 <aw> 2. measure tp36 tp37 to confirm if 3.3V high enough
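
Adam's two pre-flash checks can be written down as a small helper. This is a sketch under stated assumptions: the reference readings (about 1545 mV in-circuit reverse voltage on D16, about 3.3 V on tp36/tp37) come from the chat, but the acceptance limits and the function name `check_board` are made up for illustration.

```shell
#!/bin/bash
# Acceptance limits are assumptions, not values agreed in the chat.
MIN_REV_MV=1500   # good boards read about 1545 mV in-circuit
MIN_TP_MV=3200    # good boards read about 3.3 V

# check_board <d16_reverse_mV> <tp36_mV> <tp37_mV>
check_board() {
    local rev=$1 tp36=$2 tp37=$3
    if [ "$rev" -lt "$MIN_REV_MV" ]; then
        echo "D16 reverse ${rev} mV too low: check D16 and C238"
        return 1
    fi
    if [ "$tp36" -lt "$MIN_TP_MV" ] || [ "$tp37" -lt "$MIN_TP_MV" ]; then
        echo "tp36/tp37 (${tp36}/${tp37} mV) not pulled high: do not reflash yet"
        return 1
    fi
    echo "ok to reflash"
}

# 0x3A before the rework, then after C238 and D16 were replaced.
check_board 1120 2660 2910 || true
check_board 1547 3290 3290
```

The forward voltage is left out because it read about 153 mV on good and bad boards alike, so it does not separate the two cases.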

05:40 <aw> good now is reflashing.....this won't stop at 1484404 there. ;-)

05:40 <aw> now we have clear direction to fix these kinds of bugs. ;-)

05:41 <aw> but bugs belongs to me Adam...;-)

05:41 <aw> oah...man!

05:43 <aw> after reflash 0x3A, let's back to 0x32. ;-)

05:44 <aw> oah~ no. 0x3A is d2/d3 dimly lit after reflash. :(

05:45 <wolfspraul> no problem

05:45 <wolfspraul> actually that's good

05:45 <wolfspraul> aw: measure TP36/TP37

05:45 <aw> tp36, tp37 is still 3.3V. good

05:45 <wolfspraul> D16 forward/reverse (in-circuit)

05:46 <aw> need to power off to measure

05:46 <wolfspraul> wait

05:46 <wolfspraul> d2/d3 is dimly lit right now?

05:46 <aw> yes

05:46 <wolfspraul> what was the process?

05:46 <wolfspraul> 1. you ran reflash_m1.sh

05:46 <wolfspraul> 2. it succeeded

05:46 <wolfspraul> then what?

05:46 <aw> wait

05:46 <wolfspraul> you power cycled?

05:46 <wolfspraul> or press middle button?

05:47 <aw> 1. I ran reflash_m1.sh

05:47 <wolfspraul> wpwrak_: caught up?

05:47 <aw> 2. do nothing....until it terminal log shows finished and saw d2/d3 dimly lit

05:48 <aw> i did nothing though. ;-)

05:48 <wolfspraul> huh? did it finish flashing?

05:48 <aw> no power off

05:48 <wolfspraul> can you upload the log?

05:48 <aw> yes, this failure was few cases in first round of tests though

05:48 <aw> okay

05:48 <wpwrak_> not yet. currently at the i/o redirection. maybe consider using "script"
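
One way the "script" suggestion could look for an interactive tool such as flterm, which did not log fully through a plain pipe. This sketch assumes the util-linux `script(1)` with its `-a`, `-q` and `-c` options, has not been tried against flterm, and uses hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helper: build the per-board log name used in the chat (rc3_0x34.log).
board_log() {
    printf 'rc3_0x%s.log\n' "$1"
}

# Run a command inside script(1) so that everything shown on the terminal is
# also appended (-a) to the board's log file. -q suppresses script's banners.
log_session() {
    local board=$1
    shift
    script -a -q -c "$*" "$(board_log "$board")"
}

# Example, not executed here:
#   log_session 34 flterm --port /dev/ttyUSB0 --kernel boot.bin
```

Because `script` records the whole terminal session, the log will also contain control characters such as carriage returns.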

05:48 <wolfspraul> it may be a software problem only

05:49 <wolfspraul> wpwrak_: ok so when you make it here :-)...

05:49 <wolfspraul> basically fix2b worked well for 0x39 (yesterday) and 0x34

05:49 <wolfspraul> it did not work for 0x32 and 0x3a (values see above)

05:50 <wolfspraul> on 0x3A, it turned out that replacing D16 and C238 made it work (well, not 100% sure yet, see the dimly lit story just unfolding)

05:50 <wpwrak_> btw, does reflashing still use "debug all" ?

05:50 <wolfspraul> no

05:50 <aw> http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/log/urjtag_3A.log

05:50 <wolfspraul> I killed that :-)

05:50 <wpwrak_> (debug all killed) good :)

05:51 <aw> when you saw log, there's stop 1484404 there, after that I replaced C238 and diode. then can reflashed. ;-)

05:51 <aw> but do nothing once reflashed done

05:51 <wolfspraul> looks good

05:51 <aw> yes

05:51 <wolfspraul> still dimly lit now?

05:51 <aw> sure

05:52 <wolfspraul> press the middle button

05:52 <aw> no any flash on leds

05:52 <aw> no boot up

05:53 <wolfspraul> ok

05:53 <wolfspraul> now - power cycle

05:53 <aw> now tp37 tp36 is still good 3.3V

05:53 <wolfspraul> ah wait

05:53 <wpwrak_> what's the voltage on INIT_B ?

05:53 <wolfspraul> no power cycle

05:53 <aw> can't reconfigure after power cycle

05:53 <aw> wpwrak, bad..

05:54 <aw> i powered

05:54 <wolfspraul> before we do measurements, I suggest to disconnect/reconnect the jtag-serial board, and flash again (remember to check that you flash in usb full-speed)

05:55 <wolfspraul> this board was just flashed for the very first time, so it could be related to that

05:56 <wpwrak_> a virgin board. maybe it's a little shy :)

05:56 <aw> moment...the init_b is now at bottom side..phew~

05:56 <wpwrak_> aw: ;-)

05:56 <wolfspraul> aw: I suggest - reseat jtag-serial board, flash again

05:56 <wolfspraul> maybe there was a problem writing into nor, whatever problem

05:56 <wpwrak_> an item for the shopping list: lab at zero gravity ;-)

05:57 <wolfspraul> and this was the first flashing. so it may be something totally different from our 'permanent reset' issue before.

05:57 <aw> wpwrak, init_b = 3.3V while d2/d3 dimly lit

05:58 <wpwrak_> that means that the FPGA is happy

05:58 <aw> so now power off and replug jtag board and reflash again?

05:58 <wpwrak_> maybe see if you can load the test program ?

05:59 <wolfspraul> won't work, no reconfig

05:59 <aw> wpwrak, once d2/d3 dimly lit, the middle btn is no action so that can not enter test s/w

05:59 <wpwrak_> ah, i see

05:59 <wolfspraul> aw: disconnect/reconnect jtag-serial, reflash

05:59 <aw> okay

05:59 <wpwrak_> INIT_B = 3.3 V means either that the FPGA didn't even begin to reconfigure, or that it succeeded

06:00 <wolfspraul> wpwrak_: theoretically a boot path entirely over jtag/fpga/sdram could be written, but a number of pieces are missing now

06:00 <wolfspraul> I think we can load the bitstream over jtag, but then the bios has to come from nor

06:00 <wolfspraul> but even for that we have no scripts ready now, right now

06:01 <wpwrak_> wolfspraul: you need a devirginator ;-)

06:01 <wpwrak_> (like we had at openmoko)

06:01 <wolfspraul> yes I know

06:02 <wolfspraul> people complained to me about inappropriate naming of technology by some rogue staff...

06:02 <wpwrak_> ;-))))

06:02 <wolfspraul> to which I said it's beyond my control :-)

06:02 <wpwrak_> so somebody noticed. i was wondering ;-)

06:03 <wolfspraul> oh sure. this is actually not so pleasant to talk through with Taiwanese staff, female staff, etc.

06:03 <wolfspraul> but we are all for free speech etc.

06:03 <wolfspraul> in the US you would be in big trouble

06:04 <wpwrak_> yeah. i never expected the name to stay around for long. so i'm quite surprised it did :)

06:05 <wolfspraul> the problem is they take it serious, look it up in a dictionary etc.

06:05 <wolfspraul> not so good

06:05 <wolfspraul> :-)

06:05 <wpwrak_> oh dear :)

06:05 <wolfspraul> here you go. devirginator "A person who consistently sleeps with virgins i.e. removes their virginity or pops their cherry. Can be male or female."

06:06 <wolfspraul> want me to discuss this with Taiwanese staff? no! please not!

06:06 <wpwrak_> i think i got the idea from someone calling fresh-from-the-fab boards "virgin" boards

06:06 <wpwrak_> heh :)

06:06 <wolfspraul> well. they look it up.

06:06 <wolfspraul> and that's what they find

06:06 <wpwrak_> duly noted. need to find more obscure names

06:06 <wolfspraul> nicely explained in Chinese maybe even

06:07 <wpwrak_> the depravity of us westeners

06:08 <wolfspraul> I should have suggested they schedule it to be added as an 'new words seen in the office' for the weekly English class

06:08 <wolfspraul> move the problem to that teacher, so they earn their money...

06:09 <wpwrak_> ;-))

06:10 <wpwrak_> make sure they all use it in daily conversation with other people :)

06:12 <wpwrak_> our current board is 0x32, right ?

06:14 <wpwrak_> to see what's happening, maybe monitor TP35 (DONE) with a scope when power cycling

06:14 <wpwrak_> even better: monitor INIT_B too

06:15 <wolfspraul> no it's 0x3A now

06:16 <wolfspraul> but same case as 0x32 in that before fix2b, it never flashed or rendered

06:16 <wolfspraul> aw: any update on 0x3A ?

06:16 <wolfspraul> Adam is a little silent :-)

06:16 <wolfspraul> wpwrak_: I vaguely remember one case in the US where a developer did something similar, naming some internal little tool in an 'inappropriate' way

06:17 <wolfspraul> well, he had a nice little chat with general counsel or CEO or so, and then it got 'cleaned up' :-)

06:17 <wolfspraul> all fine with his job etc. but that kind of stuff will just not be tolerated in the corporate US world

06:18 <wolfspraul> so he ran around frantically trying to erase all traces of his neat little tool :-)

06:18 <wolfspraul> the pussies are in control

06:18 <wolfspraul> :-)

06:19 <wolfspraul> ah Adam just told me he got interrupted, back soon. and I'm out to meet Jon. crossing my fingers...

06:19 <wpwrak_> (us) yeah, that was of course part of the fun. knowing that this would never fly over there :)

06:20 <wpwrak_> 0x32 is in limbo, too ?

06:20 <wolfspraul> put aside

06:20 <wolfspraul> at that point we wanted to see some more fix2b results first

06:20 <wolfspraul> because 0x32 never rendered before

06:21 <wpwrak_> (more fix2b) sounds fair

06:21 <wolfspraul> then we did 0x34 (which rendered before and fix2b turned it all good)

06:21 <wolfspraul> and then 0x3A (which initially behaved same as 0x32 but then with C238 it got a little further, eventual resolution pending)

06:21 <wolfspraul> that's where it stands now

06:22 <wolfspraul> Adam thought 0x3A is a done deal, and he wanted to go back to 0x32, but then of course a problem still did show up on 0x3A

06:22 <wpwrak_> in murphy we trust

06:23 <aw> alright

06:23 <aw> i am back

06:23 <wolfspraul> ah, but I need to run. l8 and good luck!

06:23 <aw> wolfspraul, sure

06:23 <aw> wpwrak, so you got all histories of this moring test. ?

06:23 <aw> wpwrak, hehe..

06:24 <wpwrak_> still working on the backlog

06:24 <aw> alright

06:24 <aw> i think now i leave 0x3A apart firstly and back to see 0x32

06:24 <aw> ;-)

06:25 <aw> but before this, i need to record first

06:25 <wpwrak_> but i think if a board has okay voltages (after fix2b) but still has dim LEDs, the things to look at (with a scope) would be DONE and INIT_B. if INIT_B is inconvenient, use PROGRAM_B instead.

06:26 <aw> i see. now mine is 0x3A

06:26 <wpwrak_> DONE = TP35. at least that's easy :)

06:26 <aw> so okay...that's scope TP35 to trigger with program_b?

06:27 <wpwrak_> hmm, okay, trigger on PROGRAM_B rising

06:27 <aw> man! 0x3A now is dimly lit again after power cycle

06:28 <aw> let's see tp36, tp37 normal voltage first again

06:28 <wpwrak_> let's say 100 ms/div, peak, ~3 div before, ~7 div after the trigger

06:28 <aw> tp36 tp37 is still 3.29V good enough

06:28 <aw> okay

06:30 <aw> need to solder wire on TPs...second

06:39 <aw> 0x3a: http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x3a_ch-program_b_ch2-done.JPG

06:39 <aw> ch1-TP36, ch2-TP35

06:40 <aw> wpwrak, i think i need to scope init_b though trigger with program_b .;-)

06:41 <wpwrak_> hmm, never finished configuration

06:41 <wpwrak_> yes, INIT_B would be interesting then

06:41 <aw> wpwrak, wait

06:41 <aw> not sure

06:41 <wpwrak_> pity you have only two channels

06:43 <aw> fro rc2 the waveforms I scoped , i should set to more 250 ms/div and to see if done has been pulled high?

06:43 <aw> wpwrak, aggreed?

06:43 <wpwrak_> dunno. in rc2, DONE should rise within ~300 ms. here, you have ~700 ms

06:44 <wpwrak_> but you can try. maybe the speed is variable / has gotten slower

06:44 <aw> let me try..hope not miss more important info.

06:44 <wpwrak_> another feature for your next scope: MEMORY :)

06:46 <aw> wpwrak, yes, ch2 is over 8 div, and still no pull high...so even fpga didn't enter reconfigure stage

06:46 <aw> wpwrak, ha..you can push Wolfspraul though..

06:46 <aw> phew~ try init_b now

06:47 <wpwrak_> (push wolfgang: yeah, i have a few ideas what needs to get bought if we should ever come across significant money. better scopes it pretty high on that list ;-)

06:48 <wpwrak_> (alas, good scopes aren't cheap. the ones i have my eyes set on are all in the USD ~10k+ segment)

06:53 <aw> yes, i remembered when i at OM, Wolfgang and Ruby tried to gather those info for you. ;-)

06:53 <aw> init_b captured.

06:53 <wpwrak> okay, caught up :)

06:54 <wpwrak> let's see what it shows :)

06:55 <aw> http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x3a_ch1-program_b_ch2-init_b.JPG

06:56 <aw> from rc2 waveform, init_b should stay at 1.2V roughly once program_b goes high, right?

06:56 <aw> lemme check, not sure though.

06:57 <wpwrak> seems that it should be around 3.3 V

06:57 <wpwrak> so it appears to hit a CRC error immediately

06:58 <wpwrak> does the script that reads the NOR via USB-JTAG work ? (in general)

06:59 <aw> http://en.qi-hardware.com/w/images/e/e3/M1rc2_powerOnOff_sequences_manuscript.jpg

06:59 <aw> lemme try it now

07:00 <aw> once fails on reconfigure, it may not read back...not sure...try this 0x3A first

07:00 <aw> if not , I go back to read 0x34

07:02 <aw> hey...wait

07:03 <aw> new discovery, i triggered again. and see init_b waveform is different though. ;) but d2/d3 still dimly lit.

07:03 <wpwrak> show me ! :)

07:05 <aw> http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x3a_ch1-program_b_ch2-init_b_goHigh.JPG

07:05 <aw> goes high!~

07:06 <aw> totally different act from previous one. ;-)

07:07 <wpwrak> hmm, that's a weird one. i don't like the drops at t = +75 ms and t = +145 ms

07:07 <wpwrak> but maybe that's retries

07:07 <aw> this may show init_b can be output also input as an indicator

07:08 <aw> could be?

07:09 <wpwrak> sure. but we can't always tell when init_b is an input. input looks the same as output high :)

07:09 <aw> wpwrak, sorry that i don't understand you don't like the drops at ....?

07:11 <wpwrak> i wonder what they mean. but let's assume they're CRC errors.

07:11 <aw> so init_b indicator should be High once fpga inside finished CRC checked and show High syncronized to PROGRAM_B?

07:12 <wpwrak> so FPGA comes out of reset at t = 0, tries to load from NOR, gets a CRC error at t = +75 ms, tries again, gets another CRC error at t = +145 ms, tries again, ... and then seems to succeed (?)

07:12 <wpwrak> INIT_B should be high while the CRC is okay

07:12 <aw> mm...so this needs to consult with lekernel to confirm?

07:13 <aw> yup~reasonable from rc2 waveforms. got it

07:13 <wpwrak> (consult) naw, i think we don't need to bother him with this. yet :)

07:14 <aw> oah~okay

07:14 <wpwrak> now .. why would the NOR have troubles. hmm.

07:15 <wpwrak> next test: CH2 stays in INIT_B, move CH1 to TP37 (FLASH_RESET_N). then trigger on CH2 rising. move trigger to -200 ms so that we get the same time window as the last time

07:15 <aw> hmm..let's trigger other pin of flash chip, to see if flash is in correct assertion?

07:15 <aw> hmm...okay...

07:16 <wpwrak> if the reset looks good, then we'd have to test the other NOR pins, yes. this will be fun :)

07:18 <wpwrak> but i think if the reset is fine, then 0x32 should go to the "try to fix this when you have plenty of time" queue. because that can easily keep you busy for a whole day.

07:22 <aw> oah...yup..whole day or directly replace flash chip...but too bad now is out of stock here. :(

07:30 <wpwrak> so ... INIT_B + TP37 ?

07:31 <aw> http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x3a_ch1-FLASH_RESET_N_ch2-INIT_B.JPG

07:31 <aw> phew~ :(

07:32 <wpwrak> ;-)

07:33 <wpwrak> reset is good.

07:33 <wpwrak> regarding the NOR reading script, have you ever used this script successfully ? (with a board that works okay)

07:33 <wpwrak> if yes, maybe you can try it here too

07:34 <aw> never used but now lemme try 0x39

07:36 <aw> since i doubt fpga will let jtag access with flash chip if unsuccessful on reconfigure. but worthy to read though that i 've never read before. ;-)

07:38 <aw> reading from 0x39. :)

07:38 <aw> xiangfu, reading flash image will be slow?

07:39 <xiangfu> aw, yes.

07:39 <aw> hours?

07:39 <wpwrak> i think it will allow jtag access :) before fix2b, failure to reconfigure meant reset trouble, which also blocked NOR access via jtag. now, failure to reconfigure means something else. so NOR access via jtag should work.

07:40 <wpwrak> aw: planning a five-course dinner ? :)

07:40 <wpwrak> ah nice. finally got M1rc2_powerOnOff_sequences_manuscript.jpg printed. now i no longer need a screen just for this :)

07:41 <xiangfu> aw, you want read whole 32MB flash? that needs ~4 hours.

07:41 <wpwrak> ouch :)

07:41 <aw> wpwrak, i'll always be beaten when do this with PHD. :)

07:41 <wpwrak> xiangfu: what does the reading script read by default ? everything ?

07:41 <xiangfu> wpwrak, no. it only read first 640KB.

07:42 <aw> xiangfu, what's the image from 640K?

07:42 <wpwrak> (640 kB) hmm, so that's a bit less than half the bitstream ?

07:43 <aw> xiangfu, man! i stop reading

07:43 <wpwrak> aw: 640 kB should take about 5 minutes
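
The "about 5 minutes" figure follows from xiangfu's estimate of ~4 hours for the whole 32 MB flash, if the read rate is assumed to be constant. A minimal sketch of that arithmetic (the function name and the constant-rate assumption are illustrative, not taken from the readback script):

```python
def estimated_read_seconds(size_kb, ref_size_kb=32 * 1024, ref_seconds=4 * 3600):
    """Scale a known read time linearly to another size.

    Assumes a constant JTAG read rate, using the quoted reference point
    of 32 MB in roughly 4 hours.
    """
    return size_kb * ref_seconds / ref_size_kb


# Standby partition: 640 KB
minutes = estimated_read_seconds(640) / 60
print(f"{minutes:.1f} min")  # a little under 5 minutes
```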

07:43 <xiangfu> wpwrak, it only read standby.

07:43 <wpwrak> oh, is see

07:43 <wpwrak> s/is/I/

07:43 <aw> wpwrak, hmm? so keep reading?

07:43 <xiangfu> whole standby partition, I mean.

07:44 <wpwrak> aw: yeah, let it finish. should be soon.

07:44 <aw> alright

07:44 <xiangfu> aw, if you want read soc bitstream. you can modify the script file a little.

07:45 <aw> no no...i think we just need standby

07:45 <xiangfu> ok

07:45 <aw> so do i need to modify script file or itself is for standby already?

07:46 <aw> xiangfu, so 5 minutes only?

07:46 <xiangfu> aw, no. by default only read standby . yes about 5 minutes

07:46 <aw> alright read again now. thanks

07:47 <aw> wpwrak, so how do we go next?

07:48 <wpwrak> aw: let's label 0x32 with "possible NOR instability" and put it on the pile of boards that need deeper analysis later

07:48 <wpwrak> aw: the, the next would be 0x3a, right ?

07:48 <aw> wpwrak, no , swapped them though

07:49 <wpwrak> swapped ?

07:49 <aw> so 0x3a is possible NOR instability

07:49 <aw> next is 0x32. ;-)

07:49 <wpwrak> ah, you were working on 0x3a ?

07:49 <wpwrak> i see

07:49 <aw> yup

07:49 <wpwrak> okay, let's see what 0x32 can do :)

07:49 <aw> now just reading 0x39 flash chip back. ;-)

07:50 <aw> after this, i go back to read 0x3a since you said it could be read though. ;-)

07:51 <aw> xiangfu, so Files is under /home/adam/.qi/milkymist/readback/20110817-1544

07:51 <aw> will always be the same name file? or everytime is different

07:52 <aw> hm...seems that you used system time. ;-)

07:52 <xiangfu> aw, filename always the same. the folder is changed.

07:52 <xiangfu> 20110817-1544 <-- is the data time

07:52 <xiangfu> yes

07:52 <aw> xiangfu, oah ..got it

07:54 <wpwrak> xiangfu: in the tests adam is doing, which bitstream does his FPGA load ? the "standby" bitsream or the "regular" bitstream ?

07:58 <aw> xiangfu, i would like the saved/readback file name is related to mac address, is it possible? ;-)

07:59 <aw> 0x39 read back is done

07:59 <aw> now back to try to read 0x3a

07:59 <wpwrak> (name by MAC) mv readback/2011<Tab> 0x39-standby.bit  :-)

08:00 <wpwrak> or, rather  mv readback/2011<Tab> readback/0x39-standby.bit

08:00 <aw> wpwrak, oah~ sweety, you know i poor on cmd. ;-)

08:01 <aw> 0x39: http://pastebin.com/RWrwtPyg

08:03 <wpwrak> mv /home/adam/.qi/milkymist/readback/20110817-1546 /home/adam/.qi/milkymist/readback/0x39-standby.bit  (or wherever you want it)

08:03 <aw> wpwrak, yes. you are right. 0x3a is reading now. ;-)

08:03 <wpwrak> see :) we're winning !

08:04 <aw> wpwrak, that's because we've seen program_b/init_b/rp# are all correct, so that's why you wanted to me to buy you a dinner! .;-)

08:05 <aw> wpwrak, so later how we compare those two bitstream files?

08:06 <wpwrak> (dinner) ah no, i was asking there, whether you were planning to take a long break (e.g., for a lavish dinner) while a very slow download is happening

08:06 <aw> wpwrak, oah...misunderstood though...

08:06 <aw> ha

08:07 <aw> wpwrak, can we from those two (0x39 and 0x3a) bitstream files to discover secrets behind?

08:07 <wpwrak> to compare:  diff -u <(hexdump first-file) <(hexdump second-file)

08:07 <wpwrak> (if you're using the bash shell)

08:08 <aw> wait..so we need to go deeply 0x3a or work for 0x32(next board) next after read from 0x3a?

08:09 <wpwrak> once you've downloaded the standby bitstream from 0x3a, please download it a second time. that way, we can see if it changes (e.g., if there is noise on the bus)

08:09 <wpwrak> xiangfu: in the tests adam is doing, which bitstream does his FPGA load ? the "standby" bitsream or the "regular" bitstream ?

08:09 <aw> wpwrak, aha~ good idea.

08:12 <aw> seems he's not here. :)

08:16 <aw> 0x3a: read again now.

08:16 <wpwrak> yeah, seems that we lost him :-(

08:17 <wpwrak> can you upload the bitstreams you got so far (0x39 and 0x3a) somewhere ?

08:19 <lekernel> morning

08:19 <xiangfu_> my last message is : wpwrak, the test is standby --> soc bitstream --> BIOS --> test bin

08:19 <lekernel> imagines talking to someone in business suit: "- How do we program the boards? - Well you have to take the..., ahem,... the Devirginator"

08:19 <lekernel> hahaha

08:20 <wpwrak> xiangfu_: thanks !

08:20 <xiangfu_> morning

08:20 <wpwrak> lekernel: it gets better: at the factory, the girls working there were running ./devirginate from the command line :)

08:21 <wpwrak> (i didn't expect that, though :)

08:23 <aw> wpwrak, yes...lemme upload it first

08:26 <wpwrak> lekernel: we have some new interesting behaviour: http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x3a_ch1-FLASH_RESET_N_ch2-INIT_B.JPG

08:26 <wpwrak> lekernel: top is FLASH_RESET_N (all is fine there), bottom is INIT_B

08:27 <wpwrak> lekernel: looks as if it hits a CRC error at +75 ms, retries, hits another CRC error at +145 ms, and then succeeds

08:28 <wpwrak> lekernel: now, i wonder if that CRC error would be in the standby or the "regular" bitstream. does the "regular" bitstream also use a load mechanism involving INIT_B, DONE, etc., as the initial hardwired loader ?

08:33 <wpwrak> afk for ~20 min

08:34 <aw> wpwrak, bitstream files under: http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/

08:35 <aw> 0x3a has two read back files

08:44 <lekernel> if you only powered up, it only reads the standby bitstream

08:44 <lekernel> the regular bitstream is only read after middle pushbutton is pressed

08:45 <xiangfu_> aw, read the mac address, I can try.

08:47 <wpwrak> lekernel: ah, excellent

08:50 <wpwrak> the two 0x3a bitstreams differ from each other.

08:51 <wpwrak> here are the first few differences: http://pastebin.com/kLcGuu9a

08:52 <wpwrak> here's a better section: http://pastebin.com/d9nXTPbY

08:53 <aw_> wow..bad with each other

08:53 <wpwrak> DQ7 or DQ15 seems to have trouble

08:53 <aw_> that was done by 'diff -u <(hexdump first-file) <(hexdump second-file)'?

08:54 <lekernel> can we read the bitstreams several times?

08:54 <wpwrak> almost :)

08:54 <wpwrak> lekernel: that's already from two successive reads of the same NOR (no reflash in between)

08:55 <wpwrak> aw: this would be the command I used: diff -u <(hexdump -C 0x3a-1.bit) <(hexdump -C 0x3a-2.bit)

08:55 <wpwrak> aw: the -C adds the ASCII column on the right side

08:55 <aw_> wpwrak, okay..thanks, i record cmd first. ;-)

08:56 <lekernel> maybe that's just a urjtag bug?

08:56 <aw_> wpwrak, okay

08:56 <lekernel> urjtag won't use the same timings as the configuration system

08:56 <lekernel> so if you get intermittent read failures, it doesn't mean much

08:56 <wpwrak> lekernel: hmm, could be. would you expect urjtag to always have such issues ? or just with that usb-jtag board ?

08:58 <wpwrak> aw_: when you do your experiments, does each M1 has its own usb-tag board or do you use the same usb-jtag board for all the M1s ?

08:58 <wpwrak> s/has/have/

08:58 <lekernel> also, if it's a problem on a data line, why don't we get problems when writing too?

08:58 <lekernel> and why is the software CRC in the test tool passing most of the time?

08:58 <aw_> wpwrak, each board has its own usb-jtag board

08:58 <lekernel> this simply looks like urjtag bugs to me

08:59 <wpwrak> aw_: can you please read the 0x39 bitstream a second time ?

08:59 <wpwrak> lekernel: let's find out :)

08:59 <aw_> wpwrak, okay

09:00 <wpwrak> lekernel: this board hasn't booted in its life so far, so we haven't made it to the software CRC yet

09:00 <lekernel> it always failed to boot?

09:00 <wpwrak> yes

09:00 <lekernel> ah

09:00 <lekernel> ok

09:00 <lekernel> then maybe this is the problem

09:00 <wpwrak> now fix2b has been applied and it seems to work a little better. but still not okay

09:00 <wpwrak> meanwhile, fix2b has "cured" two boards (i think)

09:01 <wpwrak> so this is a new/different problem

09:01 <lekernel> what is fix2b? disconnect INIT_B?

09:01 <wpwrak> yes

09:01 <lekernel> this should not have any influence

09:01 <wpwrak> and also check D16 and replace if it looks suspicious

09:01 <lekernel> except if we use crappy diodes

09:01 <wpwrak> we do :)

09:03 <wpwrak> adam's current procedure is to disconnect INIT_B on the boards "in the cluster", then check TP36 and TP37 voltages. also measure D16 in-circuit, which seems to work more or less reliably. (he has removed a few good diodes, though)

09:03 <wpwrak> ah, and C238 once had an issue too

09:03 <wpwrak> so the whole fix2 rework is a bit fragile

09:04 <lekernel> argh....

09:04 <wpwrak> the joy of hardware ;-)

09:04 <wpwrak> lekernel: if you think this is bad, you should have seen how things went at openmoko :)

09:06 <wpwrak> lekernel: response times measured in days, unexplained departures from the procedure you asked them to perform, quick nonsensical ad hoc fixes thrown into the mix, and so on. pure chaos.

09:06 <aw_> wpwrak, hehe..at least all these are done myself though. not OM everyone could involved then you couldn't find out the root cause. ;-)

09:07 <wpwrak> lekernel: it once took me about half a year just to figure out whether they had fixed a missing resistor on the base of a transistor ...

09:07 <aw_> wpwrak, there's a fact: since i have to improve my soldering , but seems hard a bit. :)

09:08 <lekernel> here we get intermittent and weird problems that redefine peskiness

09:08 <lekernel> that compensates

09:09 <wpwrak> aw_: (everyone could play) yeah, the "chain of command" was a little ... strange over there :)

09:10 <wpwrak> it got much better if you were physically present, though. shorten the loop and catch suspicious activities quickly :)

09:10 <aw_> wpwrak, and i tried to openly as possible as i can. ha ;-)

09:10 <aw_> wpwrak, so how's next after 0x39's dump?

09:10 <wpwrak> lekernel: some of the component issues are indeed a bit surprising to me

09:11 <aw_> if no err with each other?

09:11 <wpwrak> aw_: upload the dump and then we'll see if 0x39 also changes from dump to dump. if yes, then the dumps are worthless. if the 0x39 dumps are the same, then we can try 0x3a with the usb-jtag board of 0x39

09:13 <aw_> wpwrak, i see

09:14 <wpwrak> hmm, in the dumps, is the first byte DQ0-DQ7 or DQ8-DQ15 ?

09:15 <wpwrak> lekernel: ah, here's a way to test whether it's the DQx bus or usb-jtag: if is's always the same bit that changes, then the bus is the likely suspect. else, it's something else.

09:16 <wpwrak> writes a tester

09:16 <aw_> wpwrak, http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x39-standby1.bit/

09:17 <wpwrak> identical to the first 0x39 dump

09:18 <aw_> good..so now use 0x3a with the usb-jtag board of 0x39

09:18 <wpwrak> yup, let's try that

09:20 <aw_> so let's read twice or just one time ?

09:20 <wpwrak> hmm, let's do it twice

09:21 <wpwrak> since it will almost certainly differ from 0x3a-1 and 0x3a-2

09:21 <aw_> okay..so i name both as -3 and -4

09:34 <aw_> wpwrak, http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x3a-standby3.bit/

09:36 <aw_> wpwrak, if both -3 and -4 are identical, this'll be a trouble for me. :)

09:36 <wpwrak> aw_: you can stop

09:36 <wpwrak> 0x3a-3 is identical to 0x39

09:36 <aw_> wpwrak, big trouble now. :(

09:36 <wpwrak> throw away the usb-jtag that was in 0x39 ;-)

09:36 <wpwrak> okay, now: reflash

09:37 <wpwrak> (reflash 0x3a)

09:37 <aw_> okay. ;-)

09:38 <aw_> reflashing...

09:38 <lekernel> wpwrak: from what we have seen so far, it seems to be bits 7 and 15

09:38 <lekernel> first byte is DQ8-5

09:38 <lekernel> DQ8-15

09:41 <wpwrak> adds big-endian mode to the bit comparer
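
The idea behind the bit comparer can be sketched as follows: treat both dumps as 16-bit words, XOR them, and count which data line (DQ0..DQ15) each differing bit falls on. If one line dominates, the bus is the likely suspect. This is only an illustration, not Werner's bitcmp utility; the function name and the `first_byte_is_high` parameter are hypothetical, with the default following lekernel's statement that the first byte carries DQ8-15.

```python
from collections import Counter


def dq_error_histogram(a, b, first_byte_is_high=True):
    """Count, per data line DQ0..DQ15, how often two dumps disagree.

    a, b: bytes objects holding the two dumps.
    first_byte_is_high: if True, the first byte of each pair is DQ8-DQ15
    (big-endian word order); otherwise it is DQ0-DQ7.
    """
    n = min(len(a), len(b)) & ~1  # compare whole 16-bit words only
    hist = Counter()
    for i in range(0, n, 2):
        if first_byte_is_high:
            wa = (a[i] << 8) | a[i + 1]
            wb = (b[i] << 8) | b[i + 1]
        else:
            wa = a[i] | (a[i + 1] << 8)
            wb = b[i] | (b[i + 1] << 8)
        diff = wa ^ wb  # set bits mark the data lines that differ
        bit = 0
        while diff:
            if diff & 1:
                hist[bit] += 1
            diff >>= 1
            bit += 1
    return hist


# Example use (file names as in the log):
# with open("0x3a-1.bit", "rb") as f1, open("0x3a-2.bit", "rb") as f2:
#     for dq, count in sorted(dq_error_histogram(f1.read(), f2.read()).items()):
#         print(f"DQ{dq}: {count}")
```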

09:42 <aw_> wpwrak, well...here I have usb-jtag boards with rc1 and rc2 version. I have to know if they are different. 0x39 and 0x3a used the same usb-jtag rc1 vesion.

09:43 <wpwrak> aw_: heh, no idea what the differences are ;-) maybe it was also just a bad connection. we can find out later.

09:43 <aw_> 0x3a: reflashed done but d2/d3 dimly lit still there.

09:43 <aw_> sure we find out later.

09:44 <aw_> i didn't power off

09:44 <aw_> so let's quickly measure some TPs.

09:47 <aw_> tp36/tp37 stay well 3.3V

09:48 <wpwrak> the bit errors aren't uniformly distributed but affect most bits: http://pastebin.com/KfWwu3vb

09:49 <aw_> init_b keeps zero.

09:49 <wpwrak> aw_: now let's read back the bitstream

09:50 <aw_> okay

09:50 <wpwrak> the board doesn't know it yet, but it *will* boot today ;-)

09:51 <aw_> wow~ i am expecting that boot. :)

09:51 <wpwrak> resistance is futile :)

09:55 <wpwrak> just on the radio: there's some marihuana plantation burning (somewhere in buenos aires, it seems). and they say "some of the firefighters are affected by the smoke" ;-))

09:55 <aw_> wpwrak, so how's best and quick way that i can know if usb-jtag board is bad while testing m1?

09:56 <aw_> via 'diff" cmd to know?

09:57 <wpwrak> for now, that seems to be the best test, yes. actually, it could also be the M1

09:57 <wpwrak> but we'll find out soon :)

10:00 <wpwrak> bit comparison utility is here: http://projects.qi-hardware.com/index.php/p/wernermisc/source/tree/master/bitcmp/

10:02 <wpwrak> oh, wait. there's a bug :)

10:03 <aw_> wpwrak, http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x3a-standby4.bit/

10:04 <wpwrak> hmm, it's corrupt

10:04 <aw_> this is used 'good' usb-jtag of 0x39 and dump from just reflashed

10:05 <aw_> umm..so 0x3a-4 is not identical to 0x39?

10:05 <wpwrak> can you download again ?

10:05 <aw_> sure

10:06 <wpwrak> 0x3a-4 is very different from 0x39

10:06 <wpwrak> here's the beginning: http://pastebin.com/TfyV0f7W

10:06 <wpwrak> and then it gets much worse

10:08 <aw_> hmm

10:10 <wpwrak> strange patterns: http://pastebin.com/vdTjuDcy

10:13 <wpwrak> what's interesting is that 0x3a-2 was correct

10:14 <wpwrak> so the errors some to come and go

10:14 <aw_> you had have suspected before about usb speed transmission effect, will this related to that? or start to think if it might be a flash chip problem itself?

10:14 <wpwrak> s/some/seem/

10:14 <lekernel> wpwrak, have you tried a board that works? that might just be stupid urjtag bugs

10:14 <lekernel> let's not spend any time on those

10:14 <wpwrak> lekernel: yes, 0x39 is a "good" board

10:14 <lekernel> ok, and if it had so many failures, the software CRC wouldn't work so well

10:14 <wpwrak> lekernel: and two dumps from 0x39 were identical

10:15 <lekernel> mh

10:15 <lekernel> so the flash really behaves like crap on that 3a board which doesn't work...

10:16 <wpwrak> what's odd is that the bit position changes

10:16 <lekernel> maybe we simply have sourced broken flash chips

10:17 <lekernel> is the pattern reproducible with GDB read memory command?

10:17 <wpwrak> doesn't gdb need the BIOS ?

10:17 <lekernel> no

10:17 <lekernel> only you won't be able to access the SDRAM

10:18 <lekernel> you can simply 'pld load' the SoC and GDB will work no matter what

10:18 <wpwrak> ah, great. maybe you can walk adam through gdb use then (i'll be watching, since i don't know the process either)

10:18 <aw_> the flash chip this time i ordered from authorized here Taipei (WPI), it should be okay though i think.

10:18 <wpwrak> aw_: should i ask where the other NORs came from ? ;-)

10:18 <aw_> WPI taipei

10:19 <aw_> http://www.wpi-group.com/

10:19 <wpwrak> so the NOR in the rc3 boards is from WPI ?

10:19 <aw_> in one batch of order

10:20 <aw_> no splitted shipments

10:20 <wpwrak> better than "Flash soup kitchen" in wolfgang's backyard ;-)

10:21 <aw_> ordered 96pcs in one batch

10:21 <aw_> but not sure if i have stock now

10:21 <aw_> i hope smt sent me all back.

10:21 <wpwrak> yeah, let's hope the NOR is good in general. there could still be the occasional bad chip, of course. either factory-bad or didn't like SMT or whatever

10:22 <aw_> well..but do you have any idea on 0x3a? or let's still name it as 'possible NOR instability'? ;-)

10:23 <aw_> then we back to see 0x32. :)

10:23 <wpwrak> lekernel: have you heard of "baking" ? that's also a fun thing: components absorb water. some only very little others more. when you SMT them, the water evaporates and the steam pressure may crack the plastic ... somewhere. lots of fun to debug :)

10:24 <wpwrak> did you do the 2nd download ?

10:24 <aw_> oah..yes...second

10:24 <aw_> finished..upload

10:26 <aw_> http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x3a-standby5.bit/

10:26 <wpwrak> interesting. it's identical to the previous one

10:27 <wpwrak> so the corruption occurred when writing, not when reading

10:27 <wpwrak> so .. can you please reflash ? :)

10:27 <aw_> wpwrak, yes, the 'baking' thing is a normal process in smt factory. i saw them do this thing.

10:28 <wpwrak> oh, and please upload the file you're using for the flashing, too

10:28 <aw_> wpwrak, you meant the script file or the results of reflashing log?

10:29 <wpwrak> the binary file that's the input of the script

10:29 <wpwrak> or if you don't know which file this is, the script first

10:31 <aw_> i upload script file first then we ask xiangfu which is the exactly bin as the input under my folder.

10:31 <aw_> reflashing...

10:32 <aw_> http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/tool/

10:34 <aw_> two scripts but now i am using 'reflash.sh', so cmd like this: ./reflash.sh 00 3A

10:35 <wpwrak> the file seems to be  ../standby.fpg  (?)

10:36 <aw_> okay..found it. lemme upload it

10:37 <aw_> http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/tool/standby.fpg

10:37 <wpwrak> excellent. thanks !

10:38 <wpwrak> the data read back from 0x39 is indeed correct

10:38 <aw_> now reflashed is done: log is here: http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/log/urjtag_3A.log

10:39 <aw_> but scrolling down to the bottom. :-)

10:39 <aw_> it will be added into file everytime I reflashed

10:40 <wpwrak> i see. okay, let's try to boot. maybe it works ;-)

10:40 <aw_> wait...you said 0x3a-5 is identical original file standby.fpg?

10:41 <aw_> but now d2/d3 are still dimly lit. did i understand correctly?

10:41 <wpwrak> no, 0x3a-3 was identical

10:41 <aw_> okay

10:42 <wpwrak> and 0x3a-2

10:42 <aw_> 0x3a-5 was not, wasn't it?

10:42 <wpwrak> 0x3a-4 = 0x3a-5, but different from 0x39

10:43 <aw_> got it

10:43 <wpwrak> did you try to boot ?

10:43 <aw_> since d2/d3 is dimly lit now. but let's try to press middle btn first

10:43 <aw_> if not

10:43 <aw_> i go for power off to see if d2/d3 is fully off.

10:43 <wpwrak> alright

10:44 <aw_> can't boot surely while dimly lit

10:45 <aw_> d2/d3 dimly lit after power - cycle. :(

10:45 <wpwrak> so still no go. hmm.

10:45 <wpwrak> okay, can you please take two more dumps ?

10:46 <aw_> yup

10:46 <aw_> okay...

10:46 <wpwrak> and then 0x3a goes back to the queue, "NOR mystery corruption"

10:46 <aw_> hehe :)

10:49 <wpwrak> after that, i'd like three more dumps from 0x39. to make sure that the reading does indeed work. that way, we can be sure we can use this tool to analyze future NOR issues. (or, if the reads of 0x39 also yield inconsistencies, then we know that we don't have a reliable tool :)

10:49 <wpwrak> but first the two from 0x3a
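
The "reliable tool" check boils down to this: several readbacks of the same known-good board must be byte-for-byte identical. One way to sketch it is to hash each dump and group the files by digest. The helper names here are hypothetical and the file names in the usage comment are placeholders.

```python
import hashlib
from pathlib import Path


def digest(path):
    """SHA-256 of a dump file, as a hex string."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def group_by_digest(paths):
    """Map each distinct digest to the list of files that produced it."""
    groups = {}
    for p in paths:
        groups.setdefault(digest(p), []).append(str(p))
    return groups


def all_identical(paths):
    """True if every dump in paths has the same content."""
    return len(group_by_digest(paths)) <= 1


# Example use with placeholder names:
# print(all_identical(["0x39-a.bit", "0x39-b.bit", "0x39-c.bit"]))
```

If the dumps fall into more than one group, the readback path itself cannot be trusted and differences seen on other boards say little about their NOR.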

10:51 <aw_> hmm...good idea about 'reliable tool' preparation indeed.

10:51 <aw_> okay

10:53 <aw_> so I'll use current usb-jtag (i.e the original good usb-jtag we guessed) on 0x3a to 0x39 too.

10:53 <wpwrak> no, keep them as they are

10:54 <wpwrak> the original 0x3a usb-jtag is now in M1 0x39 and the original usb-jtag from 0x39 is now in M1 0x3a, correct ?

10:54 <aw_> correct

10:55 <wpwrak> since M1 0x3a is acting weird with both, the usb jtag may be okay. we'll test this implicitly when taking the dumps from 0x39

10:55 <aw_> so later we dump 0x39, will we use the 0x39's original jtag board? is that you wanted?

10:56 <wpwrak> no, M1 0x39 with usb-jtag 0x3a

10:56 <wpwrak> i.e., don't swap the usb-jtag. use them as they are now

10:56 <aw_> got it

11:00 <aw_> http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x3a-standby6.bit/

11:01 <wpwrak> *hmm*

11:01 <wpwrak> this is identical to the one you had before reflashing

11:02 <aw_> man..so it's not identical to the latest reflash. :(

11:03 <aw_> i still reading another one...

11:03 <wpwrak> very very strange ...

11:09 <wpwrak> maybe DQ15 now simply fails consistently

11:09 <aw_> i hope this is not radiation problem though..but can be more clear after three dumps that worked 0x39. :-) exciting to know the results about 0x39 later.

11:10 <wpwrak> the problem strikes on average every 33.5875 bytes

11:10 <wpwrak> it seems way to reproducible to be just something random

11:14 <aw_> http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x3a-standby7.bit/

11:14 <wpwrak> the differences are only in the first third of the NOR. then, suddenly, all is good

11:15 <aw_> hmm...so let's start 0x39 workable board to see secret though. ;-)

11:15 <wpwrak> hah, 0x3a-7 is different ;-)

11:15 <aw_> okay?

11:15 <wpwrak> yes, on to 0x39 !

11:16 <wpwrak> maybe put 0x3a in the fridge :)

11:16 <aw_> ha

11:17 <aw_> i think before i read 0x39 back , to boot up again and to see CRC check ?

11:17 <aw_> or no need though. ;-)

11:18 <wpwrak> ah no. i had made a mistake. 0x3a-7 is the same as 0x37-6. okay, that's what i expected.

11:18 <wpwrak> well, why not :)

11:18 <GitHub25> [rtems] sbourdeauducq pushed 1 new commit to mmstaging: https://github.com/milkymist/rtems/commit/8d6bc82d5a56faaae02ec9e1b25a2da4a19714b6

11:18 <GitHub25> [rtems/mmstaging] Merge branch 'master' into mmstaging - Sebastien Bourdeauducq

11:18 <wpwrak> s/0x37-6/0x3a-6/

11:18 <aw_> wpwrak, well...good that -6 & -7 at least they are reflashed again

11:19 <wpwrak> 0x3a is consistently wrong now. at least we've achieved that much ;-)

11:20 <aw_> alright: CRC pass and rendering too

11:21 <aw_> 0x39 reading...

11:23 <wpwrak> here's the error distribution in 0x3a: http://downloads.qi-hardware.com/people/werner/m1/tmp/errors-3A.png

11:23 <wpwrak> almost all in the first third
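[Editor's note] The kind of distribution check wpwrak describes can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the script actually used for errors-3A.png; the function names `mismatch_offsets` and `histogram` and the three-bucket split are assumptions made for the example.

```python
# Hypothetical sketch: locate byte mismatches between a reference image and a
# readback dump, then count how many fall into each third of the image.
# Names and bucket count are illustrative assumptions, not the original tooling.

def mismatch_offsets(expected: bytes, actual: bytes) -> list:
    """Return the byte offsets at which the two images differ."""
    return [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]

def histogram(offsets, total_len: int, buckets: int = 3) -> list:
    """Count offsets per equally sized bucket of the image."""
    counts = [0] * buckets
    for off in offsets:
        counts[min(off * buckets // total_len, buckets - 1)] += 1
    return counts

# Synthetic stand-ins for the reference bitstream and a readback dump.
expected = bytes(range(256)) * 12          # 3072 bytes
actual = bytearray(expected)
for i in range(0, 1000, 50):               # 20 corrupted bytes, all in the first third
    actual[i] ^= 0x80
actual[3000] ^= 0x80                       # one stray error near the end

offsets = mismatch_offsets(expected, bytes(actual))
counts = histogram(offsets, len(expected))
print(counts)
```

With real dumps, `expected` and `actual` would be read from the reference file and the readback file; a larger `buckets` value gives a finer view of where the errors cluster.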

11:25 <aw_> wow!

11:26 <wpwrak> and yes, 20000 of them. no surprise it doesn't boot :)

11:27 <aw_> is there theory about this curve you did distribution?

11:28 <wpwrak> i don't see anything revealing there

11:28 <wpwrak> maybe i'll find something later :)

11:28 <aw_> or is a statistics?

11:28 <wpwrak> for now it's just "bad" and "weird" :)

11:29 <wpwrak> let's try the fridge approach. or maybe freezer. i guess the board should be okay with that too.

11:29 <lekernel> this rather looks like a bad flash chip, no?

11:29 <lekernel> those are _read_ errors, right?

11:30 <wpwrak> if there's anything even remotely temperature-related in the behaviour, the fridge/freezer will uncover it :)

11:30 <wpwrak> no, write errors

11:30 <wpwrak> all on DQ15. it read back several times perfectly

11:30 <lekernel> so read always works reliably now?

11:30 <wpwrak> also, the exact same errors occurred in two independent writes

11:30 <wpwrak> so it seems

11:31 <lekernel> ok

11:31 <lekernel> then we might blame urjtag too

11:31 <lekernel> bad write timing, maybe

11:31 <wpwrak> but the exact same pattern ?

11:31 <lekernel> imo the next thing to try is xilinx impact

11:31 <wpwrak> "same" as in "bitwise identical"
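[Editor's note] A rough sketch of how one might confirm that two dumps are bitwise identical and that all errors sit on a single data line. It assumes the images are treated as little-endian 16-bit words with bit 15 corresponding to DQ15; the real .fpg word ordering may differ, and `bit_error_counts` is a hypothetical helper name.

```python
import struct

# Hypothetical sketch: count mismatching bits per data line over 16-bit words.
# Assumption: little-endian words, bit index 15 stands for DQ15.

def bit_error_counts(expected: bytes, actual: bytes) -> list:
    """Return a list of 16 counters, one per bit position (0..15)."""
    n = min(len(expected), len(actual)) // 2
    fmt = "<%dH" % n
    exp_words = struct.unpack(fmt, expected[:2 * n])
    act_words = struct.unpack(fmt, actual[:2 * n])
    counts = [0] * 16
    for e, a in zip(exp_words, act_words):
        diff = e ^ a
        for bit in range(16):
            if (diff >> bit) & 1:
                counts[bit] += 1
    return counts

# Synthetic data: the reference image and two readbacks after separate writes.
reference = struct.pack("<4H", 0xFFFF, 0x8000, 0x1234, 0x8001)
dump_a = struct.pack("<4H", 0x7FFF, 0x8000, 0x1234, 0x0001)  # bit 15 lost twice
dump_b = bytes(dump_a)

identical = dump_a == dump_b               # "same" as in bitwise identical
counts = bit_error_counts(reference, dump_a)
print(identical, counts)
```

If every counter except index 15 is zero, the corruption is confined to one data line, which is the pattern reported here for 0x3a.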

11:32 <lekernel> that's the standby bitstream right?

11:32 <wpwrak> (impact) ah right, we have that too

11:32 <wpwrak> yes, standby

11:32 <lekernel> aw_, do you still have your xilinx jtag cable?

11:32 <lekernel> ok i'm preparing a .mcs

11:32 <aw_> lekernel, yes, i have that

11:32 <lekernel> aw_, ok, wake up your ISE installation

11:33 <lekernel> we will reflash that problem bitstream with impact

11:33 <wpwrak> aw_: please take 0x3a out of the fridge again. we need it in the torture chamber ;-)

11:33 <wpwrak> lekernel: that somehow sounds as if it involved a hammer :)

11:34 <lekernel> yes, tough problems need tough solutions

11:34 <lekernel> ok, digging out the git history, the mcs generation command is: promgen -w -p mcs -o standby.mcs -s 32768 -u 0x00000000 ../standby/build/standby.bit -bpi_dc parallel -data_width 16

11:35 <lekernel> i'm resynthesizing the .bit atm ...

11:35 <wpwrak> we have the bit somewhere ...

11:35 <lekernel> it should be quick

11:35 <lekernel> it's a small design

11:35 <wpwrak> http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x3a-standby7.bit/standby.fpg

11:35 <lekernel> that's .fpg

11:36 <wpwrak> ah, yet another format ?

11:36 <aw_> wpwrak, http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x39-standby2.bit/

11:36 <lekernel> .bit is the xilinx "standard" format with header

11:36 <aw_> wpwrak, still need other two, right?

11:36 <wpwrak> ah, wait

11:36 <lekernel> .fpg is raw flash content, with the words reversed to meet the idiosyncrasies of the way the fpga reads the flash, i.e. LSB first

11:37 <lekernel> ok I have the .mcs, emailing it to aw_

11:37 <aw_> lekernel, okay

11:38 <wpwrak> lekernel: this is the "original" bitstream adam used (fpg, though): http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/tool/standby.fpg

11:38 <wpwrak> aw_: (other two) lemme check first ...

11:39 <aw_> wpwrak, it still be good to know how 0x39 will be reliable or not though. pls check it. thanks. ;-)

11:39 <wpwrak> aw_: yes, please keep them coming

11:39 <wpwrak> aw_: 0x39-2 is good

11:39 <aw_> wpwrak, great

11:39 <aw_> keep reading

11:41 <aw_> lekernel, received standby.mcs, tks

11:41 <aw_> wpwrak, after these three dumps, i go for dinner first. ;-) sorry

11:41 <togi> i was at ccc last week, but i missed the milkymist talk :/ anyone know if it's available somewhere?

11:43 <aw_> and when I'm back. let's to see using xilinx tool. yup..long time not use it :-)

11:45 <wpwrak> aw_: we should put you on a sushi diet. maki, to be precise. then you can quickly eat a bit each time you have to wait for some up- or download :)

11:46 <aw_> oah~yup...i do really sorry on this.

11:47 <wpwrak> thinking of it, that would work for me too. i have a sushi restaurant just around the corner. and they do delivery :)

11:49 <wpwrak> well, actually japanese restaurant. but it seems their non-sushi stuff isn't so great.

11:49 <aw_> seems i have to buy more foods in preparation.

11:49 <lekernel> togi, http://media.ccc.de/browse/conferences/camp2011/cccamp11-4412-latest_developments_around_the_milkymist_system_on_chip-en.html

11:53 <aw_> wpwrak, http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x39-standby3.bit/

11:54 <wpwrak> also good

11:58 <aw_> so this means usb-jtag boards are not the problem source at least, right?

11:59 <lekernel> aw_, let's see how it goes with impact

11:59 <lekernel> flash the .mcs and see if the standby bitstream works now (ie. good readback + LEDs go fully off when power is applied)

12:01 <aw_> lekernel, sure but lemme go out for foods first ;-)

12:01 <lekernel> enjoy :)

12:01 <aw_> the third one hasn't been finished though.:)

12:05 <togi> lekernel: thanks!

12:07 <aw_> wpwrak, http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x39-standby4.bit/

12:08 <wpwrak> aw_: also identical. thanks !

12:08 <aw_> wpwrak, great! so this shows up that we currently no need to worry usb-jtag board. i'll be back soon

12:09 <wpwrak> yup, usb-jtag looks good. enjoy your meal !

12:09 <aw_> k

12:15 <lekernel> wpwrak, thanks so much for your help.

12:16 <wpwrak> no problem. it's fun ;-)

12:16 <lekernel> wpwrak, to sum up now, it may seem that we have a combination of a) unreliable writes b) intermittent reset circuit fuckup that causes boards to fail in the field?

12:17 <wpwrak> i think b) is starting to disappear. we may not have found all the critters there, but at least some.

12:18 <wpwrak> what causes a) is still a mystery. oh, and we also had changes between reads. so it's all still foggy.

12:18 <lekernel> you said that reads were working reliably?

12:18 <wpwrak> now they are. on 0x3a, there were successive reads with differences.

12:18 <lekernel> let's always use impact on problem boards now

12:19 <lekernel> it received more testing than urjtag and the usb-jtag board

12:19 <wpwrak> let's first see what impact impact makes :)

13:22 <scrts2> who is the one here coded ethernet? I wonder if the milkymist is connected to a bigger network through a few switches, is there a packet queue for packets, which are received in different time or are duplicated? e.g. ip header identification field shows, that the later packet has identification value smaller than the previous packet, which means that this particular packet must be processed before the other

13:23 <wpwrak> now let's find out what impact "impact" has :)

13:54 <aw_> wpwrak, hi yes, i am looking for my rc2's previous script~ phew

14:01 <aw_> lekernel, i tried to reflash 0x39 firstly via xilinx jtag

14:02 <aw_> lekernel, http://pastebin.com/81cix6fb

14:04 <aw_> but i have question: while reflashing standby.mcs file , do the d2/d3 flash? i forgot that if they must be flashed via xilinx tool.

14:06 <aw_> http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/tool/flash-mcs-adam.sh

14:07 <aw_> this is i currently use for xilinx jtag to reflash standby.mcs

14:07 <lekernel> aw_, all you should do is 1. load the standby.mcs I sent you using the impact gui (not the script) 2. test if the flash was correctly written

14:07 <lekernel> period

14:08 <aw_> hmm...okay..i go to open impact gui first

14:08 <lekernel> actually that script you pointed might work as well

14:09 <lekernel> just don't forget the template.cmd file

14:10 <aw_> hmm....no included template.cmd under folder...try again

14:12 <aw_> mm..no template.cmd already there.

14:19 <aw_> i opened impact gui. so far i only used it to read device status/device id etc... I've not loaded standby.mcs with iMPACT before, only used the script. :(

14:19 <aw_> how to load standby.mcs via iMPACT gui? i need to set many parameters?

14:20 <lekernel> .....

14:20 <lekernel> no

14:21 <lekernel> create new project, select "autodetect devices with boundary scan", then when it asks whether you want to program a flash attached to the fpga say yes and select the .mcs

14:21 <lekernel> it's completely trivial

14:21 <aw_> okay

14:22 <lekernel> i cannot give you step by step instructions, I lost the ribbon cable of my xilinx jtag cable

14:22 <Fallenou> I confirm, it's trivial

14:24 <Fallenou> even for non fpga-expert like me :)

14:32 <aw_> man! created a project as autodetect with boundary scan. now "Identify Succeeded", but which item that I go for selecting my *.mcs?

14:33 <aw_> Fallenou, he..seems not trivial for me. :)

14:34 <Fallenou> well you want to put the bitstream in the flash ? or just program the fpga ?

14:36 <aw_> load standby.mcs file in fpga

14:36 <Fallenou> right click on the FPGA

14:36 <Fallenou> and there should be a menu element that says "load bitstream" or something like this

14:37 <aw_> oah~ i see it, thanks

14:37 <Fallenou> assign configuration file

14:38 <aw_> i assigned done with 16bit data bus/BPI/Flash chip

14:38 <aw_> and ?

14:39 <Fallenou> when you have assigned the configuration file to the proper device

14:39 <Fallenou> then you can do things like right click, configure or something like that

14:40 <Fallenou> (I don't have impact on my computer, sorry)

14:40 <wolfspraul> wow, need to read the backlog...

14:41 <aw_> oaw~ i see...must right click on "flash" icon then program it. :)

14:41 <aw_> now it's programming...

14:41 <Fallenou> if you right click on the "flash device", then you are programming the flash

14:41 <Fallenou> not the fpga directly

14:41 <Fallenou> I don't know what you are trying to do exactly though

14:42 <aw_> oah~ man! but good now my d2/d3 is fully OFF now..

14:42 <Fallenou> oh ok reading backlog I understand

14:42 <Fallenou> you should be ok

14:43 <aw_> Fallenou, i saw the console with the most likely message same as script though. :)

14:44 <aw_> lekernel, i right clicked on 'Flash' icon not fpga itself, is that right?

14:44 <Fallenou> ok good

14:45 <aw_> hm...i lost my self though

14:45 <lekernel> yes, click the flash icon

14:45 <lekernel> we do not care about what the leds are doing while you are in impact

14:45 <Fallenou> aw_: if you just want to reflash the board, so that the board would be able to boot without being plugged to a computer, then yes

14:45 <aw_> hm...good

14:46 <aw_> so now 0x39 boot up and rendering well

14:46 <aw_> so now let's go for 0x3a :)

14:47 <lekernel> ok, so it simply seems urjtag has some bugs that make writing unreliable particularly at the beginning of the flash

14:47 <lekernel> if you have boards that do not configure at all, give them the impact treatment

14:47 <aw_> just noticed that d2/d3 is fully off after xilinx tool finished programming

14:47 <lekernel> ah, hm

14:47 <lekernel> no, power cycle the board

14:48 <aw_> yes, 0x39 i powered cycle . it works well now. :)

14:48 <aw_> boot up and rendering

14:49 <aw_> so let's see 0x3a via xilinx tool next :)

14:49 <wolfspraul> as usual. it's not great but if the xilinx tool is more reliable, we should probably always use the xilinx tool.

14:50 <wolfspraul> lekernel: what do you mean with "writing unreliable particularly at the beginning of the flash"?

14:50 <wolfspraul> how is that possible?

14:50 <lekernel> wolfspraul, just fix the boards that did not pass with impact

14:51 <lekernel> if the CRC check is good, then writing was ok...

14:51 <lekernel> someone needs to fix this annoying bug in urjtag, but later...

14:52 <aw_> one question first: will i need to "identify" fpga every time when i am going to program a new board?

14:54 <wolfspraul> lekernel: ah yes, of course I agree. But what is this bug?

14:55 <lekernel> some pesky and mundane time sink

14:55 <lekernel> nothing very interesting I think

14:56 <aw_> hmm...i answered my question, just directly click 'flash' icon to program though. :)

14:57 <aw_> 0x3a: http://pastebin.com/mnV0US5W

14:58 <aw_> copied them from xilinx iMPACT's console. :)

15:00 <aw_> recorded first though.

15:01 <wolfspraul> lekernel: I see. bug dismissed I guess :-)

15:02 <wolfspraul> we can definitely use Impact for the rc3 run, but then I will try to find at least a workaround for the bug.

15:02 <aw_> good that xilinx iMPACT have readback function, but it read failed

15:03 <wolfspraul> I guess what it does is that when we write into nor, what arrives is not what we wrote?

15:03 <wolfspraul> now that we are on Impact, we can fix Impact issues :-)

15:04 <wolfspraul> I haven't completed the backlog yet, but is it possible that a wire to the nor chip is bad? do you want to try resoldering the pins?

15:05 <wolfspraul> I'm still reading backlog though, do what you think is right...

15:05 <kristianpaul> i just receiver (at work) a network appliance it said something interesting, Memory test 4hr. System Stress test 1hr

15:05 <kristianpaul> received***

15:06 <kristianpaul> do we have memory test in milkymist?

15:06 <aw_> i programmed 0x3a again, still failed while "Reading device contents..." ~ phew~

15:07 <aw_> wolfspraul, not plan to soldering pins now.

15:07 <aw_> i'd rather tomorrow morning go for other pieces to keep on fix2b rework

15:08 <aw_> now...just back to 0x32 to see if i can fix like we did this morning...it's a long day story though.

15:12 <aw_> 0x39 example: good standby.mcs program log - http://pastebin.com/QuCz5fZk

15:13 <wolfspraul> yes

15:14 <wolfspraul> I lost overview with 0x32 0x39 0x3A

15:14 <wolfspraul> the backlog is scary, I cannot follow all details :-)

15:14 <wolfspraul> I think we should definitely move forward to other boards

15:14 <wolfspraul> not get stuck

15:14 <wolfspraul> I just need realibility that we are able to produce 100% stable and tested boards, so we can start selling.

15:14 <wolfspraul> reliability

15:15 <wolfspraul> it seems fix2b is good

15:15 <wolfspraul> right?

15:15 <wolfspraul> I mean I find no evidence in today's long work that there is any problem with fix2b.

15:15 <wolfspraul> so I think we should continue with more boards from the 19 and fix2b.

15:15 <wolfspraul> and if there is a problem, just move to the next board.

15:16 <wolfspraul> aw_: do you agree?

15:16 <wolfspraul> if you feel better, always use Xilinx Impact. Impact or reflash_m1.sh - your choice.

15:16 <wolfspraul> but pick one and stick to it

15:18 <wolfspraul> ah, finally finished

15:18 <aw_> wolfspraul, yes, agreed, Werner & me just tried to discover others we may pretty not sure. even for if usb-jtag is the problem source, but now this consideration is gone

15:18 <wolfspraul> but it seems Xilinx Impact did not help :-)

15:18 <wolfspraul> Xilinx Impact only showed right away that the read failed

15:19 <wolfspraul> in that case I would continue to use the jtag-serial board and reflash_m1.sh

15:19 <aw_> wolfspraul, no no...the xilinx tool i have only standby.mcs file from lekernel, with this only. i can't rely on xilinx for reflash all other boards

15:20 <wolfspraul> ok

15:20 <wolfspraul> and Xilinx Impact did not improve anything if I understood the backlog correctly

15:20 <wolfspraul> so just use reflash_m1.sh

15:20 <lekernel> wolfspraul, it did fix urjtag write problems with one board

15:20 <lekernel> no?

15:20 <aw_> just like lekernel said if I have some trouble with NOR problems, this xilinx tool with standby.mcs could be helpful.

15:20 <wolfspraul> I'm overwhelmed with the details of the backlog.

15:20 <wolfspraul> I thought no

15:21 <wolfspraul> it just said 'failed' by itself

15:21 <wolfspraul> aw_: I think tomorrow we need to go to full speed mode. not get stuck on a few boards.

15:21 <wolfspraul> just power through the whole batch of 19...

15:22 <wolfspraul> if anything doesn't work or is unclear, just take a note and move to the next one

15:22 <wolfspraul> I feel pretty good about fix2b now

15:22 <wpwrak> aw_: hmm, but 0x39 worked before. and 0x3a fails with impact as well. so it seems with 0x39 both work and with 0x3a neither.

15:22 <wolfspraul> yes

15:23 <wolfspraul> wpwrak: did we find any evidence for problems with fix2b today? doesn't look like to me...

15:23 <wpwrak> okay, all agree :)

15:23 <wpwrak> wolfspraul: in fix2b we (still) trust :)

15:23 <wolfspraul> good

15:24 <aw_> agreed though...We only reworked 4 boards this morning, and 2 of them failed for unknown reasons. Werner & me tried to figure this out hopefully..just don't want more boards like this...surely need to speed up...but tough decision though..

15:24 <lekernel> <aw_> yes, 0x39 i powered cycle . it works well now. :)

15:24 <wpwrak> i'd suggest putting 0x3a in the fridge. see if temperature changes it. we've had it work better and worse over the course of these experiments. very confusing.

15:24 <wpwrak> lekernel: 0x39 worked before :)

15:24 <lekernel> so why the hell did we flash a board with impact that worked before ?!?

15:24 <wpwrak> lekernel: adam tried a good board first. only then the problem board.

15:25 <wpwrak> :)

15:25 <aw_> 0x39 : both usb-jtag & iMPACT all works well

15:26 <wpwrak> wolfspraul: we verified that urjtag can read back the NOR quite reliably. so we can use it in the future for verifications, if necessary

15:27 <wpwrak> wolfspraul: what's a bit troubling is that there doesn't seem to be a proper verification of what gets written. at least we once got completely bogus content flashed. maybe the ... "verify skipped" (?) in the logs is a hint :)

15:39 <wpwrak> aw_: trying any more boards today ? or entering suspend mode ?

15:41 <aw_> wpwrak, yup..i gotta entering suspend mode to myself to start another day.

15:41 <lekernel> aw_, did we have similar impact flashing problems in run 2?

15:42 <lekernel> this sounds like a brand new problem, no?

15:42 <aw_> lekernel, in rc2, we finally got 35/40 pcs done

15:42 <lekernel> (and like crappy flash chips, too)

15:43 <kristianpaul> why are crappy the flash chips? is that a new discovering on rc3?

15:43 <lekernel> right now there are 51 working boards?

15:43 <kristianpaul> sorry i missed all backlog..

15:43 <lekernel> aw_, did the missing 5 run2 boards have similar flashing problems?

15:43 <aw_> lekernel, and the remaining 4 pcs mostly had, yes, d2/d3 dimly lit problems, but at that time we guessed they were damaged by "fast power-cycling"

15:44 <lekernel> there are no damages by fast power cycling

15:44 <wpwrak> lekernel: not sure if it's the NOR. could also be the FPGA. or soldering on either.

15:44 <aw_> and eventually those 4 boards are finally "dead" though... so whether they actually belong to the NOR flash problems is a really good question!

15:44 <lekernel> what do you mean, "finally dead"?

15:46 <wpwrak> lekernel: what we've seen with board 0x3a were 1) good NOR content but (variable) errors on read and 2a) bad NOR writing with 100% reliable read (of the bad data) or 2b) good NOR writing (and unrelated failure to configure) with 100% reproducible corruption on NOR read

15:46 <aw_> thus cant reconfigure, but at that time we thought it was an unnormal production process on switch fast power-cycling then.

15:47 <wpwrak> lekernel: so, a bit scary that one. a moving target. but i think we did enough tests to be reasonably sure of these results.

15:47 <lekernel> try replacing the flash chip

15:48 <lekernel> wolfspraul, can we move forward with the other boards?

15:48 <aw_> well...these failure boards I'll leave them apart firstly

15:49 <aw_> lekernel, tomorrow i go directly for other boards with fix2b circuit

15:49 <lekernel> aw_, what is your next target?

15:49 <lekernel> what 'other' boards? the 51 working ones?

15:50 <aw_> lekernel, no the first 19pcs boards (including today's 4 boards already) and see what they move.

15:50 <wpwrak> lekernel: there are some more in the fix2b "cluster"

15:51 <lekernel> aw_, you are not touching the 51 working/available ones, right?

15:51 <aw_> so I'll go for rest 14pcs cluster tomorrow firstly

15:51 <aw_> lekernel, right

15:52 <aw_> well...time to go

15:53 <wpwrak> lekernel: the 51 "available" ones should at least be checked. some may also need fix2b and are just at the edge of not working. some of the boards in the cluster have worked a little once and then went worse, so the fix2b problem isn't just black and white

15:53 <aw_> I'll work on 1st 14 rest boards.

15:53 <aw_> good night

15:53 <wpwrak> lekernel: but it would be good to be able to test them in a non-intrusive way, to avoid more rework

15:54 <wpwrak> aw_: sweet dreams ! :)

16:02 <lekernel> wpwrak, should we apply fix2b on all the working boards?

16:02 <lekernel> they look nicer after that (no messy cable)

16:16 <roh> hey. how is it going?

16:22 <wpwrak> lekernel: i'm slightly in favour of applying it everywhere, yes

16:22 <wpwrak> lekernel: seems to be low-risk enough

16:23 <wpwrak> lekernel: and yes, we get rid of the cable. all evidence of human fallibility destroyed ;-)

16:24 <wpwrak> roh: today we had one with problems somewhere between FPGA core, NOR, and back. or, rather, it had us.

16:25 <wpwrak> roh: to make things more interesting, the problem pattern shifted. first it looked merely like a usb-jtag problem, but then that part turned out to be quite reliable but NOR reads or writes caused trouble.

16:26 <roh> oh. pcb routing problems?

16:26 <wpwrak> roh: fix2b is still looking good, though. we're getting further than we used to.

16:27 <wpwrak> roh: hard to say. could be bus, could be I/O pad drivers dying, could be a bad NOR bank, ...

16:27 <roh> wpwrak: whats fix2b?

16:28 <roh> i just learnt about burning streets in london etc. 10 days of camping take their toll (there was ip and power in my tent but i was too drunk and met too many interesting people to care)

16:31 <wpwrak> roh: fix2b = remove the diode between INIT_B and PROGRAM_B (and the wire going around the board). also, check that diode D16 is okay. some aren't, and let FLASH_RESET_N get pulled low or into an undefined state

16:32 <wpwrak> roh: fix2b solves: the problems with usb-jtag flashing stopping at "bit stream length = 14xxxxxxx" and failure to (re)configure on some boards

16:33 <wpwrak> roh: success rate about 50% so far on those afflicted by such problems (i.e., 2 out of 4)

16:33 <roh> hm

16:35 <wpwrak> interesting detail: when NOR reading on 0x3a was a problem, the bit flips were all 0 -> 1. when the reading stabilized, the bit flips were all 1 -> 0
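[Editor's note] The flip-direction observation can be checked with a small sketch like the one below. It assumes `expected` is the image that was meant to be in the NOR and `actual` is a readback dump; the helper name `flip_directions` and the sample bytes are hypothetical.

```python
# Hypothetical sketch: classify differing bits by direction.
# "0 -> 1" means the expected bit was 0 and the dump shows 1, and vice versa.

def flip_directions(expected: bytes, actual: bytes) -> tuple:
    """Return (zero_to_one, one_to_zero) bit-flip counts."""
    zero_to_one = 0
    one_to_zero = 0
    for e, a in zip(expected, actual):
        zero_to_one += bin(~e & a & 0xFF).count("1")
        one_to_zero += bin(e & ~a & 0xFF).count("1")
    return zero_to_one, one_to_zero

# Synthetic example: one bit set that should be clear, one bit cleared that should be set.
expected = bytes([0x00, 0xFF, 0x0F])
actual = bytes([0x80, 0x7F, 0x0F])

result = flip_directions(expected, actual)
print(result)
```

A dump where one of the two counters is zero would match the one-directional behaviour described for 0x3a in each phase.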

18:24 <kristianpaul> lekernel: mm_i2l.pdf, thanks for publishing it !

18:25 <kristianpaul> the one about plasma looks worth a look too, nice :)

18:27 <kristianpaul> if you have more slides about HDL specific and milkymist, please share :)

18:27 <lekernel> have you tried the demo binary on your board?

18:30 <kristianpaul> milkymist demo?

18:30 <kristianpaul> no never, i saw wolfgang to used at cparty no more

18:30 <lekernel> no, the demo bin from masteri2l plasma

18:30 <kristianpaul> no no

18:31 <kristianpaul> i'm at work now, and just reading rss now

18:34 <kristianpaul> nice, that tp_files tarball is a hello world at the milkymist style :)