<aw> 0x32: fix2b, stopped @ 'Bitstream length: 1484404' while reflashing...
<aw> 0x32: tp36 - 690mV, tp37 - 793mV
<aw> the voltage is not at correct Low or High, i am going to power off
<aw> voltage of tp36, tp37 is the same. if the first flash never succeeded before, it seems to keep stopping at 'length: 1484404'... going for another board.
<wolfspraul> hmm
<GitHub120> [scripts] xiangfu pushed 1 new commit to master: http://bit.ly/q2vsz7
<GitHub120> [scripts/master] add debug all to jtag - Xiangfu Liu
<xiangfu> aw, Hi
<aw> xiangfu, hi, any news?
<wolfspraul> aw: let's look at one more board
<wolfspraul> with fix2b
<wolfspraul> although I think we already know it's not the magic solution yet
<wolfspraul> if wpwrak is here we can look into 0x32 more, otherwise fix other bugs first, don't apply fix2b to a lot of boards until we know more
<wolfspraul> aw: on 0x32, have you checked D16?
<xiangfu> aw, I just updated the reflash_m1.sh under the for-rc3 folder, here: http://milkymist.org/updates/2011-07-13/for-rc3/reflash_m1.sh
<GitHub21> [scripts] xiangfu force-pushed master from aa59a1a to dbd0372: http://bit.ly/nGHAhd
<GitHub21> [scripts/master] add debug all to jtag - Xiangfu Liu
<xiangfu> aw, it just enables the 'debug all' option to output more info for us to debug.
<aw> wolfspraul, yes, checked on D16...i found some interesting difference, second...will know soon to compare to the good one (0x39). ;-)
<aw> xiangfu, so is 'debug all' on by default, or do I need to enable it?
<wolfspraul> we don't need that right now [debug all]
<aw> wolfspraul, wait
<aw> xiangfu, if this just adds the 'debug all' option by default when I run it, I think I can use it, why not?
<xiangfu> aw, http://milkymist.org/updates/2011-07-13/for-rc3/reflash_m1.sh was just updated, 'debug all' is on by default.
<wolfspraul> it will not help with our problems
<aw> 0x39: D16 (in-circuit); forward voltage - 152mV, reverse voltage - 1545mV
<aw> 0x32: D16 (in-circuit); forward voltage - 153mV, reverse voltage - 1114mV
<aw> so I am going to fit a new D16 first to see if this is the problem
<wolfspraul> aw: I just noticed 0x32 is a board that never rendered before
<wolfspraul> that could be a different problem...
<aw> wolfspraul, yes
<aw> it seems to be a different category of failure
<wolfspraul> possible
<wolfspraul> what's your latest results with 0x32 now?
<aw> btw: a good new diode (off-circuit): forward voltage - 153mV, reverse voltage - no reading can be measured
<wolfspraul> you replace D16 on 0x32 with a new one?
<aw> 0x32: D16 (in-circuit) reverse voltage was 1114mV, but after I fitted a new diode on the board it reads 886mV. the program_b loop must be pulling the reverse voltage down a bit.
<wolfspraul> does flashing work?
<aw> also, I took the replaced D16 off and measured it: its reverse voltage is still good
<aw> wolfspraul, I haven't reflashed yet
<aw> trying again... if it still can't reflash... I'll set it aside then
<wolfspraul> ok
<wolfspraul> let's look at 0x34 now
<aw> maybe just another failure classification
<wolfspraul> that one rendered before
<wolfspraul> what is TP36/TP37 on 0x32 now?
<aw> tp36 - 770mV, tp37 - 838mV , wrong
<wolfspraul> hmm
<wolfspraul> ok
<wolfspraul> try 0x34
<aw> yes, of course it can't reflash and stops at 1484404
<aw> right
<wolfspraul> no need to test flashing with those tp36/tp37 values
<aw> no
<aw> i think this is good evidence. ;-)
<wolfspraul> no
<wolfspraul> makes no sense. I trust the tp36/tp37 values we measure.
<wolfspraul> 100%
<aw> 0x34: D16 (in circuit) forwarding - 154mV , rev. V - 1547 mV
<wolfspraul> once we have hard data, let's use it
<wolfspraul> well
<wolfspraul> aw: one by one
<wolfspraul> can we measure meaningful data in-circuit or not?
<wolfspraul> if not, let's stop doing it
<aw> wait wait
<wolfspraul> if yes, those values mean the diode is damaged and needs to be replaced?
<aw> let's test more boards and we see how reasonable For. & Rev. voltage they would be.
<wolfspraul> wait
<wolfspraul> I don't want all sorts of random data
<wolfspraul> that's a bad time waste
<aw> well
<wolfspraul> aw: is the data meaningful?
<aw> now 0x34 can reconfigure surely
<wolfspraul> did you apply fix2b to 0x34 already?
<aw> yes.
<aw> you are like baby-watching though. ;-) no problems
<wolfspraul> yes, sorry. try to understand the test data ;-)
<aw> something in the circuit must be pulling the in-circuit voltage low (without power on)
<GitHub15> [scripts] xiangfu force-pushed master from dbd0372 to b9585d9: http://bit.ly/nGHAhd
<GitHub15> [scripts/master] add debug all to jtag - Xiangfu Liu
<aw> so later we consult with Werner, he may provide more details to us maybe. ;-)
<wolfspraul> aw: you talk about measuring D16 performance in-circuit?
<aw> yes
<aw> For. & Rev. voltage measured in-circuit, before powering on.
<wolfspraul> ok
<wolfspraul> alright, back to 0x34
<wolfspraul> so it is booting now?
<wolfspraul> I think you should reflash (reflash_m1.sh), and re-run all tests and rendering cycles (10)
<aw> now 0x32 has a worse Rev. voltage (below 1545mV), which means something else is influencing D16's behavior
<aw> now to reflashing. ;-)
<wolfspraul> see how it goes...
<wolfspraul> aw: maybe the reset ic on 0x32 has a problem?
<wolfspraul> (guessing)
<aw> xiangfu, wow..man! your debug log has a lot of messages..let's see...reflashing now...;-)
<xiangfu> aw, to disable it, just remove the whole "debug all" line, just fyi
<wolfspraul> I was worried about that. I hope your terminal history is enough. You may have to increase it so that we don't lose data.
<wolfspraul> xiangfu: maybe it should be disabled by default
<wolfspraul> we are currently (as of right now) not aware of any problem that 'debug all' may help us with
<wolfspraul> so we can enable it when we run into such a problem
<aw> wolfspraul, not enough to show history. ;-)
<wolfspraul> yeah, well
<xiangfu> this commit disables it by default: GitHub15> [scripts] xiangfu force-pushed master from dbd0372 to b9585d9: http://bit.ly/nGHAhd
<aw> no problem, just try first one. ;-)
<aw> then i remove "debug all" line. :)
<aw> msg log stops at http://pastebin.com/M1ezi1AG
<aw> but the m1 LEDs still flash, so I think it is still flashing...let's wait one more minute
<aw> xiangfu, if its log is wrong, tell me directly.
<aw> @ full speed, it seems to need much more time...good, led2 and led3 are still flashing now...let's see
<wolfspraul> ok
<wolfspraul> first of all, we turn off debug all
<wolfspraul> it's a bad idea to turn it on all the time
<aw> yup
<wolfspraul> full-speed should not matter much but I'm guessing, I can try here to set a baseline if that helps
<wolfspraul> if your log is screwed up now, you should redo the flashing
<wolfspraul> not "wait for some minutes"
<wolfspraul> that sounds wrong
<aw> leds are still flashing...;-)
<wolfspraul> what does urjtag do on your notebook now?
<wolfspraul> still logging something?
<aw> yes still logging output
<aw> stop it anyway?
<wolfspraul> the crazy 'debug all' output?
<aw> :-) don't know then
<wolfspraul> yes, stop and redo
<wolfspraul> without 'debug all'
<aw> redo now...
<aw> 0x34 good now...at least not stop at crazy 'length: 1484404' ;-)
<aw> good news: finished reflashing successfully.
<wolfspraul> :-)
<wolfspraul> xiangfu: can reflash_m1.sh log stdout and stderr into a file?
<wolfspraul> Adam could run it with redirection too like > urjtag_0x32.log 2>&1
<aw> crc checked okay
<aw> now go to rendering for 10 times
<wolfspraul> aw: wait
<wolfspraul> one idea for when you run reflash_m1.sh
<wolfspraul> so right now you just execute "./reflash_m1.sh", right?
<aw> sudo ./reflash_m1.sh 00 34
<wolfspraul> but you can run "sudo ./reflash_m1.sh 00 34 >> urjtag_0x32.log 2>&1"
<aw> listening..and standby
<wolfspraul> the >> should append to that log file, so even if you run multiple times it will be added to the end of the log file
<wolfspraul> and the 2>&1 will redirect error messages into the log file as well
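A minimal sketch of the redirection described above, assuming a stand-in function in place of reflash_m1.sh (fake_reflash and the demo log path are hypothetical, chosen so the example runs without a board attached). It only illustrates that >> appends across runs and that 2>&1 sends error messages into the same file.

```shell
#!/bin/sh
# Hypothetical stand-in for reflash_m1.sh: one line on stdout, one on stderr.
fake_reflash() {
    echo "flashing board $2"
    echo "example error message" >&2
}

# Demo log file (assumed name), emptied first so the demo is repeatable.
log="${TMPDIR:-/tmp}/urjtag_demo.log"
: > "$log"

# Two runs: >> appends each run to the end of the file,
# and 2>&1 redirects stderr to wherever stdout is going.
fake_reflash 00 34 >> "$log" 2>&1
fake_reflash 00 34 >> "$log" 2>&1

# Both runs and both streams end up in the log.
cat "$log"
```

With a single > instead of >>, the second run would overwrite the first, which is why the two-character form matters here.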
<xiangfu> wolfspraul, yes.
<wolfspraul> xiangfu: does this work? can you try?
<xiangfu> yes
<wolfspraul> sometimes there are issues with redirecting bash scripts and sub-processes...
<xiangfu> works just fine.
<wolfspraul> well, then we should tell Adam
<wolfspraul> it helps him
<wolfspraul> aw: next time you reflash a board, try that
<wolfspraul> for example for 0x34, it would be:
<wolfspraul> sudo ./reflash_m1.sh 00 34 >> urjtag_0x34.log 2>&1
<aw> wolfspraul, do you mean that every time I run the same command above, the messages will be "added" incrementally to the .log file?
<wolfspraul> yes correct
<wolfspraul> so you only need to watch the board number
<wolfspraul> so you don't write into the wrong log file
<wolfspraul> you can collect all log files on your disk, and upload to the downloads server later
<wolfspraul> also saves time
<wolfspraul> just remember two things:
<aw> okay...good, so I don't need to do such stupid work as "copy" and "paste" from the terminal. ;-)
<wolfspraul> 1. use >> (two characters, not one)
<wolfspraul> 2. always use the same board number in the reflash_m1.sh parameter and the name of the log file
<wolfspraul> yes
<aw> try now...second
<wolfspraul> xiangfu: can you try that this really works?
<wolfspraul> I don't have my m1 here right now
<xiangfu> wolfspraul, yes. I am running that now. only the output will be a little confusing since there are ^M when eraseflash runs. but it's ok
<wolfspraul> don't understand
<wolfspraul> where does the ^M come from, and where is it written to?
<wolfspraul> the log file?
<xiangfu> when you open the log file with VIM or Emacs it will be a little confusing, but opening it with 'gedit' is ok.
<wolfspraul> ok so it goes into the log file - good
<wolfspraul> and where does it come from?
<wolfspraul> and why?
<aw> after I typed the command above, the terminal doesn't show the log messages; they should be written directly into the .log file
<wolfspraul> maybe remove it?
<wolfspraul> aw: yes, the terminal will show nothing
<wolfspraul> that's a little unfortunate if you run into an error
<wolfspraul> but eventually reflash_m1.sh will stop
<wolfspraul> and then you can look in the log file
<aw> can it show also msg log in the terminal? so i can see how it goes on..
<xiangfu> wolfspraul, during eraseflash the output looks like: (0% Completed) FLASH Block 0 : Unlocking ... Erasing ... Ok.
<xiangfu> wolfspraul, then ^M, not '\n', then 1% .. 2%
<wolfspraul> alright, don't know whether that's the best/right but no time now :-)
<wolfspraul> xiangfu: one thing we could do is this:
<xiangfu> wolfspraul, so when you open with VIM there will be a BIG line. 0% --> 100% but this is ok in gedit
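One way to make such a log readable in any editor is to turn the carriage returns into newlines after the fact. The sketch below simulates the progress output with printf (the sample text and both file names are made up for illustration) and converts it with tr; it does not change what urjtag itself writes.

```shell
#!/bin/sh
raw="${TMPDIR:-/tmp}/cr_demo_raw.log"      # assumed demo file names
clean="${TMPDIR:-/tmp}/cr_demo_clean.log"

# Simulated progress output: updates separated by carriage returns (^M),
# with a single newline only at the very end.
printf '(0%% Completed)\r(50%% Completed)\r(100%% Completed)\n' > "$raw"

# Replace every carriage return with a newline so each update
# lands on its own line.
tr '\r' '\n' < "$raw" > "$clean"

cat "$clean"
```

The raw file is left untouched, so the original log is still available if the exact byte stream ever matters.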
<xiangfu> aw, you can use : /reflash_m1_rc3.sh 00 2a 2>&1 | tee >> log
<wolfspraul> the reflash_m1.sh makes the stdout/stderr redirection inside the script, into the file, and also shows it on its own stdout/stderr
<xiangfu> aw, then you will get output both under terminal and log
<wolfspraul> ah
<wolfspraul> good idea
<wolfspraul> but let's be more precise please
<wolfspraul> sudo ./reflash_m1.sh 00 34 2>&1 | tee >> urjtag_0x34.log
<wolfspraul> why is there a special _rc3.sh btw?
<wolfspraul> aw: can you try that new line?
<wolfspraul> just reflash again with that line: sudo ./reflash_m1.sh 00 34 2>&1 | tee >> urjtag_0x34.log
<wolfspraul> man I hope we have only one script
<wolfspraul> :-)
<wolfspraul> aw: don't change anything with your script now, it worked before
<wolfspraul> don't touch it
<wolfspraul> just try the new line and add: 2>&1 | tee >> urjtag_0x34.log
<aw> i am asking that script if it's with default settings and newest?
<wolfspraul> don't touch your script
<aw> i want to download it again. ;-)
<wolfspraul> it worked before, it works now
<wolfspraul> NO!
<wolfspraul> you can only get new bugs :-)
<wolfspraul> aw: let's try the new line, and add: 2>&1 | tee >> urjtag_0x34.log
<aw> okay
<aw> terminal doesn't show up msg. :(
<aw> not parallel , so that i don't see anything in time.
<wolfspraul> hmm
<wolfspraul> maybe a side-effect of sudo?
<wolfspraul> of course it's not properly tested before, sorry about that
<aw> don't know
<aw> forget about this now. ;-)
<wolfspraul> wait
<aw> just copy and paste
<wolfspraul> wait one moment
<wolfspraul> hmm
<wolfspraul> aw: so you flashed the board?
<wolfspraul> and there was no output in the terminal?
<wolfspraul> or all at the end?
<aw> one point: a good command would let me stop anytime, still write into the .log file, and also show the output in the terminal. I hope. ;-)
<wolfspraul> yes sure it's easy. just needs to be properly tested and done.
<aw> yes, no msg at all shows up in the terminal with the command above
<aw> the reflash is done now
<wolfspraul> hmm
<wolfspraul> wait
<aw> used gedit to open .log file, it's okay
<wolfspraul> let's try one more random idea
<wolfspraul> if this doesn't work, then we need to get this right first, then talk to you :-)
<wolfspraul> but one more, here it is:
<wolfspraul> ah wait
<aw> m
<wolfspraul> xiangfu's line was wrong
<wolfspraul> :-)
<wolfspraul> try this:
<wolfspraul> sudo ./reflash_m1.sh 00 34 2>&1 | tee -a urjtag_0x34.log
<aw> okay
<wolfspraul> xiangfu: don't you think tee >> log is wrong? Adam needs tee -a log
<xiangfu> wolfspraul, both are ok. I have tested. with >>
<aw> mm..now terminal shows msg. ;-)
<xiangfu> but yes. sounds like -a is better
<wolfspraul> Adam wants to see the output
<aw> btw. can i use "ctrl + C" while reflashing..if I see reflashing stops
<wolfspraul> so if he uses >>, then the tee output is gone (no file parameter for tee)
<wolfspraul> aw: yes you can use ctrl-c, no hesitation
<aw> ans still can write into log file, which won't interrupt by my CTRL + C?
<wolfspraul> sure, it will all interrupt, like before
<wolfspraul> but the log is written
<aw> s/ans/and
<aw> hmm..okay..good
<aw> thanks
<wolfspraul> the log is always safe, no worries
<wolfspraul> you cannot lose anything in the log
<aw> good
<wolfspraul> just remember the syntax of the line
<wolfspraul> 2>&1 | tee -a urjtag_0x34.log
<wolfspraul> that will always append to the log, perfect for our use
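The difference between the two forms can be checked without a board. In this sketch, command substitution stands in for "what the terminal would see", and the two demo log paths are assumptions for illustration.

```shell
#!/bin/sh
log_a="${TMPDIR:-/tmp}/tee_demo_a.log"   # assumed demo file names
log_b="${TMPDIR:-/tmp}/tee_demo_b.log"
: > "$log_a"
: > "$log_b"

# tee with no file argument only copies stdin to stdout, and >> then
# sends that stdout into the file, so nothing reaches the terminal.
shown_a=$(echo "Bitstream length: 1484404" | tee >> "$log_a")

# tee -a appends to the named file and still copies the text to stdout.
shown_b=$(echo "Bitstream length: 1484404" | tee -a "$log_b")

echo "with tee >> log: terminal saw '${shown_a}'"
echo "with tee -a log: terminal saw '${shown_b}'"
```

Both forms write the line into their log file; only the -a form also shows it on screen, which matches what was observed above.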
<aw> yes, i recorded into my file already. ;-)
<wolfspraul> of course you need to make sure the filename has the correct board number
<wolfspraul> so whenever you work on a particular board, you add to the log file for that board
<wolfspraul> then upload all log files to the downloads server
<aw> okay
<wolfspraul> so...
<wolfspraul> back to 0x34 :-)
<wolfspraul> keep us posted
<aw> sure
<aw> you can go to the server folder to see log file when you back. :)
<wolfspraul> he
<wolfspraul> the bigger problem is what we saw on 0x32
<wolfspraul> but let's finish 0x34 now
<aw> reflashed done again
<aw> let's test it
<wolfspraul> after that is 0x39, also good (rendered before)
<wolfspraul> but 0x3A did not render before
<wolfspraul> anyway one by one
<wolfspraul> it's tough to mix design uncertainties with production surprises...
<wolfspraul> but we get through it
<aw> how about "flterm --port /dev/ttyUSB0 --kernel boot.bin"?
<aw> can it be added ">>" to log file too?
<aw> i still use stupid copy/paste method. :)
<wolfspraul> wait
<wolfspraul> you can add 2>&1 | tee -a log_file
<wolfspraul> I think we should write the urjtag and flterm into the same log file
<wolfspraul> so let's give it another name
<wolfspraul> for example rc3_0x34.log
<wolfspraul> so that would be:
<wolfspraul> 1. sudo ./reflash_m1.sh 00 34 2>&1 | tee -a rc3_0x34.log
<aw> since I'll test 10 times, so the log will be longer
<aw> okay
<wolfspraul> 2. flterm --port /dev/ttyUSB0 --kernel boot.bin 2>&1 | tee -a rc3_0x34.log
<aw> try now
<wolfspraul> even if you run another script like read_flash_m1.sh, you can append to the same log file
<xiangfu> flterm is different
<wolfspraul> read_flash_m1.sh 2>&1 | tee -a rc3_0x34.log
<wolfspraul> oops
<wolfspraul> :-)
<wolfspraul> xiangfu: alright, what works?
<aw> hmm...seems 'flterm' doesn't accept other parameters .:(
<wolfspraul> do copy/paste for now
<aw> it wrote logs as: http://pastebin.com/R0uFzSN9
<xiangfu> wolfspraul, it needs modify the flterm source code for log
<wolfspraul> but you can already use the name rc3_0x34.log when running reflash_m1.sh
<wolfspraul> it's a better name
<aw> not all msgs are saved into the log file when using 'flterm'
<aw> sure sure
<aw> done
<wolfspraul> xiangfu: or we need to find a terminal program that supports logging/stdout somehow
<wolfspraul> adam needs practical solutions now. which is copy/paste for flterm
<wolfspraul> and the tee thing for reflash_m1.sh
<wolfspraul> xiangfu: if you can find an easy solution for terminal logging, tell us :-)
<xiangfu> yes
<xiangfu> aw, you can wrap the reflash_m1.sh to another script file like:
<xiangfu> #!/bin/bash
<xiangfu> mkdir -p log
<xiangfu> ./reflash_m1.sh $1 $2 2>&1 | tee -a log/urjtag_$2.log
<xiangfu> then you will not worry about the log name.
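A slightly more general variant of that wrapper, as a sketch only: log_run is a hypothetical helper name, and the rc3_0x<board>.log naming follows the file name suggested earlier in the conversation. The board number is given once and the log name is derived from it, so the parameter and the file name cannot get out of sync. A harmless echo stands in for the real script so the example runs anywhere.

```shell
#!/bin/sh
# log_run <board> <command> [args...]
# Runs the command, shows its output on the terminal, and appends
# stdout and stderr to log/rc3_0x<board>.log.
log_run() {
    board=$1
    shift
    mkdir -p log
    "$@" 2>&1 | tee -a "log/rc3_0x${board}.log"
}

# Stand-in command for the demo; on the bench this might instead be:
#   log_run 34 sudo ./reflash_m1.sh 00 34
log_run 34 echo "crc checked okay"
```

Because the command is passed as arguments, the same helper could also wrap other scripts such as read_flash_m1.sh and keep everything for one board in one file.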
<aw> good solutions! thanks.
<aw> 0x34 rendering pass
<wolfspraul> nice
<wolfspraul> 0x39 now?
<aw> 0x39 I wrote reflash log again. will upload
<aw> rework 0x3a now
<wolfspraul> aw: what happened on 0x39 ?
<aw> 0x39: this was successful yesterday. ;-)
<wolfspraul> ah ok, all pass
<wolfspraul> oh, forgot
<wolfspraul> confused
<wolfspraul> so 0x3A now, got it
<aw> yes
<wolfspraul> ok - 0x3A now, then 0x3C
<wolfspraul> 0x3A never rendered before, 0x3C did
<wolfspraul> let's see...
<aw> 0x3A: D16(in circuit) For.V.=153mV, Rev.V.=1120mV, can reconfigure.
<aw> mm this is not the same as 0x32. ;-)
<aw> measure tp36, tp37 for the record first
<aw> 0x3A history: never reflashed successfully before
<aw> 0x3A: tp36 - 2.66V, tp37 - 2.91V, no good; they must reach roughly 3.3V
<aw> try to reflash now
<aw> mm..yes ...stop at 'Bitstream length: 1484404'
<wolfspraul> hmm
<aw> so when the tp36, tp37 voltage is not high enough, reflashing fails
<wolfspraul> interesting
<wolfspraul> oh sure
<aw> i leave 0x3A apart now.
<wolfspraul> wait
<wolfspraul> thinking
<aw> mm
<aw> or I replace a new diode . ;-)
<aw> let's do it. ;-)
<wolfspraul> wait
<wolfspraul> you mean replace D16 ?
<aw> yes
<wolfspraul> no I'm against that
<wolfspraul> I don't want to make random experiments
<wolfspraul> what is the theory behind that?
<wolfspraul> there is none
<wolfspraul> so - no
<wolfspraul> let me think for a moment
<aw> since the tp37, tp36 is directly connected to diode
<aw> mmm
<wolfspraul> ok but I want to think more, not randomly switch parts
<aw> if diode (in-circuit) is not fully acted as 0x39
<aw> alright..just discuss first
<wolfspraul> I still don't know whether those numbers are meaningful, when measure in-circuit
<wolfspraul> so it's just noise
<wolfspraul> let's see. so far we applied fix2b to 4 boards: 0x32 0x34 0x39 0x3A
<wolfspraul> 0x39 was the first one, and where we built the fix2b theory.
<wolfspraul> 0x34 works
<aw> 0x39, 0x34 with good diode(in-circuit) also tp36, tp37 are all good
<wolfspraul> 0x32 and 0x3A do not work. both never rendered before, and now they show bad tp36/tp37 values
<wolfspraul> so far all correct?
<aw> 0x32: no good on D16(in-circuit): For.V. = 153mV, Rev.V. = 1114mV
<aw> 0x3A: relatively D16(in circuit) For.V.=153mV, Rev.V.=1120mV, can reconfigure.
<aw> 0x3A: tp36, tp37 voltage is not pull high enough
<wolfspraul> what do you mean with "relatively"?
<wolfspraul> ok - let's measure forward and reverse voltage of D16 on 0x34. what is it there?
<aw> 0x32: tp36 - 900mV, tp37 - 1.1V
<aw> 0x34:  D16(in-circuit), For. V. = 154mV, Rev. V. = 1547mV
<wolfspraul> I found it. "0x34: D16 (in circuit) forwarding - 154mV , rev. V - 1547 mV"
<aw> i noticed that for a good diode (in-circuit) the Rev.V needs to be about 1545mV
<aw> For.V is about 153mV
<wolfspraul> and you think the difference between ca. 1120mV and ca. 1545mV is the difference between bad and good?
<aw> if Rev. V is lower, that could mean there is some current leakage
<aw> well..I just noticed it, but I have no theory to prove it
<wolfspraul> well since nobody else is awake, just try
<wolfspraul> random is fun ;-)
<wolfspraul> so you put a new D16 on 0x3A ?
<wolfspraul> and measure the old one after it's removed...
<aw> so probably only when both the in-circuit D16 values and tp36/tp37 are correct is d2/d3 fully off and reflashing can succeed
<aw> let's try to replace now. ;-)
<aw> mmm...this made me think C238
<wolfspraul> check C238
<aw> since program_b is on one of the terminals of D16 and is also connected to C238, if my soldering was not good, C238 may also be slightly damaged.
<wolfspraul> what do you do now?
<wolfspraul> you put a new D16 on 0x3A?
<aw> mm try now
<aw> the D16 I took off is perfect: For.V = 154mV, Rev.V. = no value.
<wolfspraul> ok, so the problem was elsewhere
<aw> yes
<wolfspraul> but we still know good values for D16 when measured in-circuit, which seems to be ca. 150mV and ca. 1550 mV
<wolfspraul> so put the new one on, and measure
<wolfspraul> maybe the problem is C238, or somewhere else?
<aw> 0x3A: tp37 - 3.29V perfect, tp36 - 900mV
<aw> thinking
<wolfspraul> what voltages do you measure on the new D16 (in-circuit) now?
<aw> mm...possible points: 1. C238 is not good quality after soldering 2. reset out
<aw> no
<aw> i haven't soldered new diode on boards
<wolfspraul> ah
<aw> something is wrong that keeps tp36 from pulling high enough
<wolfspraul> replace C238? replace reset ic? (I'm just guessing)
<aw> i am going to fit a new C238 first
<wpwrak> aha ! more tests :) lemme catch up ...
<aw> welcome!
<aw> i need your help. ;-)
<wpwrak> (fpga) oh, a bit of techno-mumble never hurts :)
<wolfspraul> just the right time for the savior, and I have to run to a meeting with Jon...
<wolfspraul> (in a little bit)
<aw> good
<aw> after replace a new C238
<wpwrak> still catching up .. lots of stuff :)
<aw> d2/d3 is fully OFF. man!
<aw> i hate myself though
<wolfspraul> focus
<wolfspraul> tp36/tp37 good now?
<wolfspraul> reflashing?
<aw> tp36 and tp37 go back to good 3.3V
<aw> now to solder diode back again
<wolfspraul> ok
<wpwrak_> let's parallelize this :)
<aw> from now on ...soldering back diode I always use a new one. ;-)
<wpwrak_> lots of bad diodes ?
<wolfspraul> aw: of course!
<wolfspraul> come on we don't need to slow ourselves down for trying to save 20 cent items
<wolfspraul> every chance that a diode is unsoldered, of course is a chance to put a new one there
<aw> good now
<wpwrak_> (0x32) that's all after removing the INIT_B diode ?
<wolfspraul> where are you reading now?
<wolfspraul> we are a bit ahead already
<aw> 0x3A: D16(in-circuit): For.V. = 153mV, Rev.V = 1547mV
<wolfspraul> yes perfect
<aw> 0x3A: tp36 and tp37 are all 3.29V
<wolfspraul> wpwrak_: I don't think a lot of diode problems
<aw> so there's big FACTs now:
<wpwrak_> (replace parts) in general, i would try to discard anything that got unsoldered (unless really really difficult to replace)
<wolfspraul> correct, fully agree
<wpwrak_> wolfspraul: i'm around "let's look at 0x34 now". just started catching up
<wolfspraul> wpwrak_: basically we have a reference value for D16 now when measured in-circuit - ca. 150mV forward, 1545mV reverse
<wolfspraul> when we see those numbers, we can assume D16 and C238 to be correct
<wpwrak_> sounds reasonable. those 1.5 V are some obscure path, but that's the price of measuring in-circuit
<wolfspraul> wpwrak_: ok, read top to bottom first...
<aw> 1. before testing these boards, first measure the in-circuit voltages of D16; if they are not right, some other area must be wrong, typically C238 or the diode itself
<wolfspraul> aw: let's try to fix 0x32 now
<wpwrak_> ah, C238 acts up too ? interesting :) reading
<aw> 2. measure tp36 tp37 to confirm if 3.3V high enough
<aw> good now is reflashing.....this won't stop at 1484404 there. ;-)
<aw> now we have clear direction to fix these kinds of bugs. ;-)
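The two checks could be written down as a small helper so that each board is judged the same way. This is only a sketch: check_board is a hypothetical name, and the thresholds are illustrative assumptions picked between the good readings (about 1545 mV reverse, about 3.29 V on the test points) and the bad ones (about 1115 mV reverse, 2.66 V and 2.91 V) reported above; nobody in the conversation agreed on pass/fail limits.

```shell
#!/bin/sh
# check_board <d16_reverse_mV> <tp36_mV> <tp37_mV>
# Thresholds below are assumptions for illustration, not agreed limits.
REV_MIN_MV=1500   # good boards read about 1545 mV, bad ones about 1115 mV
TP_MIN_MV=3200    # good boards read about 3290 mV on TP36/TP37

check_board() {
    if [ "$1" -lt "$REV_MIN_MV" ]; then
        # Step 1: in-circuit D16 reverse voltage too low.
        echo "suspect D16/C238"
    elif [ "$2" -lt "$TP_MIN_MV" ] || [ "$3" -lt "$TP_MIN_MV" ]; then
        # Step 2: test points not pulled up to roughly 3.3 V.
        echo "TP36/TP37 too low"
    else
        echo "ok to reflash"
    fi
}

check_board 1547 3290 3290   # 0x3A readings after the rework
check_board 1120 2660 2910   # 0x3A readings before the rework
```

Integer millivolts are used so the comparison works in plain sh without floating-point tools.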
<aw> but bugs belongs to me Adam...;-)
<aw> oah...man!
<aw> after reflashing 0x3A, let's go back to 0x32. ;-)
<aw> oah~ no. 0x3A has d2/d3 dimly lit after reflash. :(
<wolfspraul> no problem
<wolfspraul> actually that's good
<wolfspraul> aw: measure TP36/TP37
<aw> tp36, tp37 is still 3.3V. good
<wolfspraul> D16 forward/reverse (in-circuit)
<aw> need to power off to measure
<wolfspraul> wait
<wolfspraul> d2/d3 is dimly lit right now?
<aw> yes
<wolfspraul> what was the process?
<wolfspraul> 1. you ran reflash_m1.sh
<wolfspraul> 2. it succeeded
<wolfspraul> then what?
<aw> wait
<wolfspraul> you power cycled?
<wolfspraul> or press middle button?
<aw> 1. I ran reflash_m1.sh
<wolfspraul> wpwrak_: caught up?
<aw> 2. did nothing....until the terminal log showed it finished, then saw d2/d3 dimly lit
<aw> i did nothing though. ;-)
<wolfspraul> huh? did it finish flashing?
<aw> no power off
<wolfspraul> can you upload the log?
<aw> yes, this failure showed up in a few cases in the first round of tests though
<aw> okay
<wpwrak_> not yet. currently at the i/o redirection. maybe consider using "script"
<wolfspraul> it may be a software problem only
<wolfspraul> wpwrak_: ok so when you make it here :-)...
<wolfspraul> basically fix2b worked well for 0x39 (yesterday) and 0x34
<wolfspraul> it did not work for 0x32 and 0x3a (values see above)
<wolfspraul> on 0x3A, it turned out that replacing D16 and C238 made it work (well, not 100% sure yet, see the dimly lit story just unfolding)
<wpwrak_> btw, does reflashing still use "debug all" ?
<wolfspraul> no
<wolfspraul> I killed that :-)
<wpwrak_> (debug all killed) good :)
<aw> when you saw log, there's stop 1484404 there, after that I replaced C238 and diode. then can reflashed. ;-)
<aw> but do nothing once reflashed done
<wolfspraul> looks good
<aw> yes
<wolfspraul> still dimly lit now?
<aw> sure
<wolfspraul> press the middle button
<aw> no any flash on leds
<aw> no boot up
<wolfspraul> ok
<wolfspraul> now - power cycle
<aw> now tp37 tp36 is stll good 3.3V
<wolfspraul> ah wait
<wpwrak_> what's the voltage on INIT_B ?
<wolfspraul> no power cycle
<aw> can't reconfigure after power cycle
<aw> wpwrak, bad..
<aw> i powered
<wolfspraul> before we do measurements, I suggest to disconnect/reconnect the jtag-serial board, and flash again (remember to check that you flash in usb full-speed)
<wolfspraul> this board was just flashed for the very first time, so it could be related to that
<wpwrak_> a virgin board. maybe it's a little shy :)
<aw> moment...the init_b is now at bottom side..phew~
<wpwrak_> aw: ;-)
<wolfspraul> aw: I suggest - reseat jtag-serial board, flash again
<wolfspraul> maybe there was a problem writing into nor, whatever problem
<wpwrak_> an item for the shopping list: lab at zero gravity ;-)
<wolfspraul> and this was the first flashing. so it may be something totally different from our 'permanent reset' issue before.
<aw> wpwrak, init_b = 3.3V while d2/d3 dimly lit
<wpwrak_> that means that the FPGA is happy
<aw> so now power off and replug jtag board and reflash again?
<wpwrak_> maybe see if you can load the test program ?
<wolfspraul> won't work, no reconfig
<aw> wpwrak, once d2/d3 are dimly lit, the middle btn does nothing, so I cannot enter the test s/w
<wpwrak_> ah, i see
<wolfspraul> aw: disconnect/reconnect jtag-serial, reflash
<aw> okay
<wpwrak_> INIT_B = 3.3 V means either that the FPGA didn't even begin to reconfigure, or that it succeeded
<wolfspraul> wpwrak_: theoretically a boot path entirely over jtag/fpga/sdram could be written, but a number of pieces are missing now
<wolfspraul> I think we can load the bitstream over jtag, but then the bios has to come from nor
<wolfspraul> but even for that we have no scripts ready now, right now
<wpwrak_> wolfspraul: you need a devirginator ;-)
<wpwrak_> (like we had at openmoko)
<wolfspraul> yes I know
<wolfspraul> people complained to me about inappropriate naming of technology by some rogue staff...
<wpwrak_> ;-))))
<wolfspraul> to which I said it's beyond my control :-)
<wpwrak_> so somebody noticed. i was wondering ;-)
<wolfspraul> oh sure. this is actually not so pleasant to talk through with Taiwanese staff, female staff, etc.
<wolfspraul> but we are all for free speech etc.
<wolfspraul> in the US you would be in big trouble
<wpwrak_> yeah. i never expected the name to stay around for long. so i'm quite surprised it did :)
<wolfspraul> the problem is they take it serious, look it up in a dictionary etc.
<wolfspraul> not so good
<wolfspraul> :-)
<wpwrak_> oh dear :)
<wolfspraul> here you go. devirginator "A person who consistently sleeps with virgins i.e. removes their virginity or pops their cherry. Can be male or female."
<wolfspraul> want me to discuss this with Taiwanese staff? no! please not!
<wpwrak_> i think i got the idea from someone calling fresh-from-the-fab boards "virgin" boards
<wpwrak_> heh :)
<wolfspraul> well. they look it up.
<wolfspraul> and that's what they find
<wpwrak_> duly noted. need to find more obscure names
<wolfspraul> nicely explained in Chinese maybe even
<wpwrak_> the depravity of us westeners
<wolfspraul> I should have suggested they schedule it to be added as an 'new words seen in the office' for the weekly English class
<wolfspraul> move the problem to that teacher, so they earn their money...
<wpwrak_> ;-))
<wpwrak_> make sure they all use it in daily conversation with other people :)
<wpwrak_> our current board is 0x32, right ?
<wpwrak_> to see what's happening, maybe monitor TP35 (DONE) with a scope when power cycling
<wpwrak_> even better: monitor INIT_B too
<wolfspraul> no it's 0x3A now
<wolfspraul> but same case as 0x32 in that before fix2b, it never flashed or rendered
<wolfspraul> aw: any update on 0x3A ?
<wolfspraul> Adam is a little silent :-)
<wolfspraul> wpwrak_: I vaguely remember one case in the US where a developer did something similar, naming some internal little tool in an 'inappropriate' way
<wolfspraul> well, he had a nice little chat with general counsel or CEO or so, and then it got 'cleaned up' :-)
<wolfspraul> all fine with his job etc. but that kind of stuff will just not be tolerated in the corporate US world
<wolfspraul> so he ran around frantically trying to erase all traces of his neat little tool :-)
<wolfspraul> the pussies are in control
<wolfspraul> :-)
<wolfspraul> ah Adam just told me he got interrupted, back soon. and I'm out to meet Jon. crossing my fingers...
<wpwrak_> (us) yeah, that was of course part of the fun. knowing that this would never fly over there :)
<wpwrak_> 0x32 is in limbo, too ?
<wolfspraul> put aside
<wolfspraul> at that point we wanted to see some more fix2b results first
<wolfspraul> because 0x32 never rendered before
<wpwrak_> (more fix2b) sounds fair
<wolfspraul> then we did 0x34 (which rendered before and fix2b turned it all good)
<wolfspraul> and then 0x3A (which initially behaved same as 0x32 but then with C238 it got a little further, eventual resolution pending)
<wolfspraul> that's where it stands now
<wolfspraul> Adam thought 0x3A is a done deal, and he wanted to go back to 0x32, but then of course a problem still did show up on 0x3A
<wpwrak_> in murphy we trust
<aw> alright
<aw> i am back
<wolfspraul> ah, but I need to run. l8 and good luck!
<aw> wolfspraul, sure
<aw> wpwrak, so you got all the history of this morning's tests?
<aw> wpwrak, hehe..
<wpwrak_> still working on the backlog
<aw> alright
<aw> i think I'll set 0x3A aside for now and go back to 0x32
<aw> ;-)
<aw> but before this, i need to record first
<wpwrak_> but i think if a board has okay voltages (after fix2b) but still has dim LEDs, the things to look at (with a scope) would be DONE and INIT_B. if INIT_B is inconvenient, use PROGRAM_B instead.
<aw> i see. now mine is  0x3A
<wpwrak_> DONE = TP35. at least that's easy :)
<aw> so okay...that's scope TP35 to trigger with program_b?
<wpwrak_> hmm, okay, trigger on PROGRAM_B rising
<aw> man! 0x3A now is dimly lit again after power cycle
<aw> let's see tp36, tp37 normal voltage first again
<wpwrak_> let's say 100 ms/div, peak, ~3 div before, ~7 div after the trigger
<aw> tp36 tp37 is still 3.29V good enough
<aw> okay
<aw> need to solder wire on TPs...second
<aw> ch1-TP36, ch2-TP35
<aw> wpwrak, i think i need to scope init_b though trigger with program_b .;-)
<wpwrak_> hmm, never finished configuration
<wpwrak_> yes, INIT_B would be interesting then
<aw> wpwrak, wait
<aw> not sure
<wpwrak_> pity you have only two channels
<aw> from rc2 the waveforms I scoped , i should set to more 250 ms/div and to see if done has been pulled high?
<aw> wpwrak, agreed?
<wpwrak_> dunno. in rc2, DONE should rise within ~300 ms. here, you have ~700 ms
<wpwrak_> but you can try. maybe the speed is variable / has gotten slower
<aw> let me try..hope not miss more important info.
<wpwrak_> another feature for your next scope: MEMORY :)
<aw> wpwrak, yes, ch2 is over 8 div, and still no pull high...so even fpga didn't enter reconfigure stage
<aw> wpwrak, ha..you can push Wolfspraul though..
<aw> phew~ try init_b now
<wpwrak_> (push wolfgang: yeah, i have a few ideas what needs to get bought if we should ever come across significant money. better scopes is pretty high on that list ;-)
<wpwrak_> (alas, good scopes aren't cheap. the ones i have my eyes set on are all in the USD ~10k+ segment)
<aw> yes, i remembered when i at OM, Wolfgang and Ruby tried to gather those info for you. ;-)
<aw> init_b captured.
<wpwrak> okay, caught up :)
<wpwrak> let's see what it shows :)
<aw> from rc2 waveform, init_b should stay at 1.2V roughly once program_b goes high, right?
<aw> lemme check, not sure though.
<wpwrak> seems that it should be around 3.3 V
<wpwrak> so it appears to hit a CRC error immediately
<wpwrak> does the script that reads the NOR via USB-JTAG work ? (in general)
<aw> lemme try it now
<aw> once fails on reconfigure, it may not read back...not sure...try this 0x3A first
<aw> if not , I go back to read 0x34
<aw> hey...wait
<aw> new discovery, i triggered again. and see init_b waveform is different though. ;) but d2/d3 still dimly lit.
<wpwrak> show me ! :)
<aw> goes high!~
<aw> totally different act from previous one. ;-)
<wpwrak> hmm, that's a weird one. i don't like the drops at t = +75 ms and t = +145 ms
<wpwrak> but maybe that's retries
<aw> this may show init_b can be output also input as an indicator
<aw> could be?
<wpwrak> sure. but we can't always tell when init_b is an input. input looks the same as output high :)
<aw> wpwrak, sorry that i don't understand you don't like the drops at ....?
<wpwrak> i wonder what they mean. but let's assume they're CRC errors.
<aw> so init_b indicator should be High once fpga inside finished CRC checked and show High synchronized to PROGRAM_B?
<wpwrak> so FPGA comes out of reset at t = 0, tries to load from NOR, gets a CRC error at t = +75 ms, tries again, gets another CRC error at t = +145 ms, tries again, ... and then seems to succeed (?)
<wpwrak> INIT_B should be high while the CRC is okay
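[editor's sketch] The reading of the INIT_B trace above (reset released at t = 0, a drop at each suspected CRC error, then a retry) can be illustrated with a small falling-edge detector. This is only a sketch under assumptions: the scope in the log was read by eye, so the sample list, the `falling_edges` name and the 1.65 V threshold (half of 3.3 V) are all hypothetical.

```python
def falling_edges(samples, threshold=1.65):
    """Return the times at which the signal drops below `threshold`.

    `samples` is a list of (time_ms, volts) pairs in time order.
    The threshold is an assumed mid-rail value for a 3.3 V signal.
    """
    edges = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        # A falling edge: previous sample at or above threshold, this one below.
        if v0 >= threshold and v1 < threshold:
            edges.append(t1)
    return edges


# Made-up samples shaped like the trace described above.
trace = [(0, 3.3), (50, 3.3), (75, 0.1), (80, 3.3), (145, 0.2), (150, 3.3)]
print(falling_edges(trace))
```

Each reported time would be one candidate CRC failure; whether the drops really are CRC errors is still only the working assumption from the discussion.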
<aw> mm...so this needs to consult with lekernel  to confirm?
<aw> yup~reasonable from rc2 waveforms. got it
<wpwrak> (consult) naw, i think we don't need to bother him with this. yet :)
<aw> oah~okay
<wpwrak> now .. why would the NOR have troubles. hmm.
<wpwrak> next test: CH2 stays in INIT_B, move CH1 to TP37 (FLASH_RESET_N). then trigger on CH2 rising. move trigger to -200 ms so that we get the same time window as the last time
<aw> hmm..let's trigger other pin of flash chip, to see if flash is in correct assertion?
<aw> hmm...okay...
<wpwrak> if the reset looks good, then we'd have to test the other NOR pins, yes. this will be fun :)
<wpwrak> but i think if the reset is fine, then 0x32 should go to the "try to fix this when you have plenty of time" queue. because that can easily keep you busy for a whole day.
<aw> oah...yup..whole day or directly replace flash chip...but too bad now is out of stock here. :(
<wpwrak> so ... INIT_B + TP37 ?
<aw> phew~ :(
<wpwrak> ;-)
<wpwrak> reset is good.
<wpwrak> regarding the NOR reading script, have you ever used this script successfully ? (with a board that works okay)
<wpwrak> if yes, maybe you can try it here too
<aw> never used but now lemme try 0x39
<aw> since i doubt fpga will let jtag access with flash chip if unsuccessful on reconfigure. but worthy to read though that i 've never read before. ;-)
<aw> reading from 0x39. :)
<aw> xiangfu, reading flash image will be slow?
<xiangfu> aw, yes.
<aw> hours?
<wpwrak> i think it will allow jtag access :) before fix2b, failure to reconfigure meant reset trouble, which also blocked NOR access via jtag. now, failure to reconfigure means something else. so NOR access via jtag should work.
<wpwrak> aw: planning a five-course dinner ? :)
<wpwrak> ah nice. finally got M1rc2_powerOnOff_sequences_manuscript.jpg printed. now i no longer need a screen just for this :)
<xiangfu> aw, you want read whole 32MB flash? that needs ~4 hours.
<wpwrak> ouch :)
<aw> wpwrak, i'll always be beaten when do this with PHD. :)
<wpwrak> xiangfu: what does the reading script read by default ? everything ?
<xiangfu> wpwrak, no. it only read first 640KB.
<aw> xiangfu, what's the image from 640K?
<wpwrak> (640 kB) hmm, so that's a bit less than half the bitstream ?
<aw> xiangfu, man! i stop reading
<wpwrak> aw: 640 kB should take about 5 minutes
<xiangfu> wpwrak, it only read standby.
<wpwrak> oh, is see
<wpwrak> s/is/I/
<aw> wpwrak, hmm? so keep reading?
<xiangfu> whole standby partition, I mean.
<wpwrak> aw: yeah, let it finish. should be soon.
<aw> alright
<xiangfu> aw, if you want read soc bitstream. you can modify the script file a little.
<aw> no no...i think we just need standby
<xiangfu> ok
<aw> so do i need to modify script file or itself is for standby already?
<aw> xiangfu, so 5 minutes only?
<xiangfu> aw, no. by default only read standby . yes about 5 minutes
<aw> alright read again now. thanks
<aw> wpwrak, so how do we go next?
<wpwrak> aw: let's label 0x32 with "possible NOR instability" and put it on the pile of boards that need deeper analysis later
<wpwrak> aw: then, the next would be 0x3a, right ?
<aw> wpwrak, no , swapped them though
<wpwrak> swapped ?
<aw> so 0x3a is possible NOR instability
<aw> next is 0x32. ;-)
<wpwrak> ah, you were working on 0x3a ?
<wpwrak> i see
<aw> yup
<wpwrak> okay, let's see what 0x32 can do :)
<aw> now just reading 0x39 flash chip back. ;-)
<aw> after this, i go back to read 0x3a since you said it could be read though. ;-)
<aw> xiangfu, so Files is under /home/adam/.qi/milkymist/readback/20110817-1544
<aw> will always be the same name file? or everytime is different
<aw> hm...seems that you used system time. ;-)
<xiangfu> aw, filename always the same. the folder is changed.
<xiangfu> 20110817-1544 <-- is the date time
<xiangfu> yes
<aw> xiangfu, oah ..got it
<wpwrak> xiangfu: in the tests adam is doing, which bitstream does his FPGA load ? the "standby" bitstream or the "regular" bitstream ?
<aw> xiangfu, i would like the saved/readback file name is related to mac address, is it possible? ;-)
<aw> 0x39 read back is done
<aw> now back to try to read 0x3a
<wpwrak> (name by MAC) mv readback/2011<Tab> 0x39-standby.bit   :-)
<wpwrak> or, rather  mv readback/2011<Tab> readback/0x39-standby.bit
<aw> wpwrak, oah~ sweety, you know i poor on cmd. ;-)
<wpwrak> mv /home/adam/.qi/milkymist/readback/20110817-1546 /home/adam/.qi/milkymist/readback/0x39-standby.bit          (or wherever you want it)
<aw> wpwrak, yes. you are right. 0x3a is reading now. ;-)
<wpwrak> see :) we're winning !
<aw> wpwrak, that's because we've seen program_b/init_b/rp# are all correct, so that's why you wanted to me to buy you a dinner! .;-)
<aw> wpwrak, so later how we compare those two bitstream files?
<wpwrak> (dinner) ah no, i was asking there, whether you were planning to take a long break (e.g., for a lavish dinner) while a very slow download is happening
<aw> wpwrak, oah...misunderstood though...
<aw> ha
<aw> wpwrak, can we from those two (0x39 and 0x3a) bitstream files to discover secrets behind?
<wpwrak> to compare:  diff -u <(hexdump first-file) <(hexdump second-file)
<wpwrak> (if you're using the bash shell)
<aw> wait..so we need to go deeply 0x3a or work for 0x32(next board) next after read from 0x3a?
<wpwrak> once you've downloaded the standby bitstream from 0x3a, please download it a second time. that way, we can see if it changes (e.g., if there is noise on the bus)
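[editor's sketch] The "download it a second time" check can also be done programmatically: two reads of the same NOR should be byte-identical, and any differing offsets hint at noise on the bus or in the reader. A minimal sketch; the function names are hypothetical, and the file names in the usage note are the ones used later in this log.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 of a dump, handy for a quick same/different answer."""
    return hashlib.sha256(data).hexdigest()


def differing_offsets(a: bytes, b: bytes, limit: int = 10) -> list:
    """Return up to `limit` byte offsets at which the two dumps differ."""
    offsets = []
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            offsets.append(offset)
            if len(offsets) >= limit:
                break
    return offsets


def compare_dump_files(path_a, path_b):
    """Read two dump files and report (identical?, first differing offsets)."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    return digest(a) == digest(b), differing_offsets(a, b)
```

For example, `compare_dump_files("0x3a-1.bit", "0x3a-2.bit")` would return `(True, [])` for two consistent reads. Note that `zip` stops at the shorter dump, so a length mismatch shows up only in the digest comparison.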
<wpwrak> xiangfu: in the tests adam is doing, which bitstream does his FPGA load ? the "standby" bitstream or the "regular" bitstream ?
<aw> wpwrak, aha~ good idea.
<aw> seems he's not here. :)
<aw> 0x3a: read again now.
<wpwrak> yeah, seems that we lost him :-(
<wpwrak> can you upload the bitstreams you got so far (0x39 and 0x3a) somewhere ?
<lekernel> morning
<xiangfu_> my last message is : wpwrak, the test is standby --> soc bitstream --> BIOS --> test bin
<lekernel> imagines talking to someone in business suit: "- How do we program the boards? - Well you have to take the..., ahem,... the Devirginator"
<lekernel> hahaha
<wpwrak> xiangfu_: thanks !
<xiangfu_> morning
<wpwrak> lekernel: it gets better: at the factory, the girls working there were running ./devirginate from the command line :)
<wpwrak> (i didn't expect that, though :)
<aw> wpwrak, yes...lemme upload it first
<wpwrak> lekernel: top is FLASH_RESET_N (all is fine there), bottom is INIT_B
<wpwrak> lekernel: looks as if it hits a CRC error at +75 ms, retries, hits another CRC error at +145 ms, and then succeeds
<wpwrak> lekernel: now, i wonder if that CRC error would be in the standby or the "regular" bitstream. does the "regular" bitstream also use a load mechanism involving INIT_B, DONE, etc., as the initial hardwired loader ?
<wpwrak> afk for ~20 min
<aw> 0x3a has two read back files
<lekernel> if you only powered up, it only reads the standby bitstream
<lekernel> the regular bitstream is only read after middle pushbutton is pressed
<xiangfu_> aw, read the mac address, I can try.
<wpwrak> lekernel: ah, excellent
<wpwrak> the two 0x3a bitstreams differ from each other.
<wpwrak> here are the first few differences: http://pastebin.com/kLcGuu9a
<wpwrak> here's a better section: http://pastebin.com/d9nXTPbY
<aw_> wow..bad with each other
<wpwrak> DQ7 or DQ15 seems to have trouble
<aw_> that was done by 'diff -u <(hexdump first-file) <(hexdump second-file)'?
<lekernel> can we read the bitstreams several times?
<wpwrak> almost :)
<wpwrak> lekernel: that's already from two successive reads of the same NOR (no reflash in between)
<wpwrak> aw: this would be the command I used: diff -u <(hexdump -C 0x3a-1.bit) <(hexdump -C 0x3a-2.bit)
<wpwrak> aw: the -C adds the ASCII column on the right side
<aw_> wpwrak, okay..thanks, i record cmd first. ;-)
<lekernel> maybe that's just a urjtag bug?
<aw_> wpwrak, okay
<lekernel> urjtag won't use the same timings as the configuration system
<lekernel> so if you get intermittent read failures, it doesn't mean much
<wpwrak> lekernel: hmm, could be. would you expect urjtag to always have such issues ? or just with that usb-jtag board ?
<wpwrak> aw_: when you do your experiments, does each M1 has its own usb-jtag board or do you use the same usb-jtag board for all the M1s ?
<wpwrak> s/has/have/
<lekernel> also, if it's a problem on a data line, why don't we get problems when writing too?
<lekernel> and why is the software CRC in the test tool passing most of the time?
<aw_> wpwrak, each board has its own usb-jtag board
<lekernel> this simply looks like urjtag bugs to me
<wpwrak> aw_: can you please read the 0x39 bitstream a second time ?
<wpwrak> lekernel: let's find out :)
<aw_> wpwrak, okay
<wpwrak> lekernel: this board hasn't booted in its life so far, so we haven't made it to the software CRC yet
<lekernel> it always failed to boot?
<wpwrak> yes
<lekernel> ah
<lekernel> ok
<lekernel> then maybe this is the problem
<wpwrak> now fix2b has been applied and it seems to work a little better. but still not okay
<wpwrak> meanwhile, fix2b has "cured" two boards (i think)
<wpwrak> so this is a new/different problem
<lekernel> what is fix2b? disconnect INIT_B?
<wpwrak> yes
<lekernel> this should not have any influence
<wpwrak> and also check D16 and replace if it looks suspicious
<lekernel> except if we use crappy diodes
<wpwrak> we do :)
<wpwrak> adam's current procedure is to disconnect INIT_B on the boards "in the cluster", then check TP36 and TP37 voltages. also measure D16 in-circuit, which seems to work more or less reliably. (he has removed a few good diodes, though)
<wpwrak> ah, and C238 once had an issue too
<wpwrak> so the whole fix2 rework is a bit fragile
<lekernel> argh....
<wpwrak> the joy of hardware ;-)
<wpwrak> lekernel: if you think this is bad, you should have seen how things went at openmoko :)
<wpwrak> lekernel: response times measured in days, unexplained departures from the procedure you asked them to perform, quick nonsensical ad hoc fixes thrown into the mix, and so on. pure chaos.
<aw_> wpwrak, hehe..at least all these are done myself though. not OM everyone could involved then you couldn't find out the root cause. ;-)
<wpwrak> lekernel: it once took me about half a year just to figure out whether they had fixed a missing resistor on the base of a transistor ...
<aw_> wpwrak, there's a fact: since i have to improve my soldering , but seems hard a bit. :)
<lekernel> here we get intermittent and weird problems that redefine peskiness
<lekernel> that compensates
<wpwrak> aw_: (everyone could play) yeah, the "chain of command" was a little ... strange over there :)
<wpwrak> it got much better if you were physically present, though. shorten the loop and catch suspicious activities quickly :)
<aw_> wpwrak, and i tried to openly as possible as i can. ha ;-)
<aw_> wpwrak, so how's next after 0x39's dump?
<wpwrak> lekernel: some of the component issues are indeed a bit surprising to me
<aw_> if no err with each other?
<wpwrak> aw_: upload the dump and then we'll see if 0x39 also changes from dump to dump. if yes, then the dumps are worthless. if the 0x39 dumps are the same, then we can try 0x3a with the usb-jtag board of 0x39
<aw_> wpwrak, i see
<wpwrak> hmm, in the dumps, is the first byte DQ0-DQ7 or DQ8-DQ15 ?
<wpwrak> lekernel: ah, here's a way to test whether it's the DQx bus or usb-jtag: if it's always the same bit that changes, then the bus is the likely suspect. else, it's something else.
<wpwrak> writes a tester
<wpwrak> identical to the first 0x39 dump
<aw_> good..so now use 0x3a with the usb-jtag board of 0x39
<wpwrak> yup, let's try that
<aw_> so let's read twice or just one time ?
<wpwrak> hmm, let's do it twice
<wpwrak> since it will almost certainly differ from 0x3a-1 and 0x3a-2
<aw_> okay..so i name both as -3 and -4
<aw_> wpwrak, if both -3 and -4 are identical, this'll be a trouble for me. :)
<wpwrak> aw_: you can stop
<wpwrak> 0x3a-3 is identical to 0x39
<aw_> wpwrak, big trouble now. :(
<wpwrak> throw away the usb-jtag that was in 0x39 ;-)
<wpwrak> okay, now: reflash
<wpwrak> (reflash 0x3a)
<aw_> okay. ;-)
<aw_> reflashing...
<lekernel> wpwrak: from what we have seen so far, it seems to be bits 7 and 15
<lekernel> first byte is DQ8-5
<lekernel> DQ8-15
<wpwrak> adds big-endian mode to the bit comparer
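[editor's sketch] The tester wpwrak wrote is not shown in the log. The idea (count which DQ line flips between two dumps, with a big-endian mode because the first byte of each 16-bit word carries DQ8-15) might look roughly like this. The function name and interface are assumptions, not his actual tool.

```python
from collections import Counter


def flipped_bits(a: bytes, b: bytes, big_endian: bool = True) -> Counter:
    """Count, per data line DQ0..DQ15, how often that bit differs.

    The dumps are treated as a sequence of 16-bit words. With
    big_endian=True the first byte of each word is DQ8-DQ15, which
    matches what lekernel states above.
    """
    order = "big" if big_endian else "little"
    counts = Counter()
    usable = min(len(a), len(b)) & ~1  # whole 16-bit words only
    for i in range(0, usable, 2):
        diff = int.from_bytes(a[i:i + 2], order) ^ int.from_bytes(b[i:i + 2], order)
        bit = 0
        while diff:
            if diff & 1:
                counts[bit] += 1
            diff >>= 1
            bit += 1
    return counts
```

If nearly all counts land on one or two bit numbers, a single data line is the likely suspect; a spread over many bits points elsewhere, such as the reader or timing.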
<aw_> wpwrak, well...here I have usb-jtag boards with rc1 and rc2 version.  I have to know if they are different. 0x39 and 0x3a used the same usb-jtag rc1 version.
<wpwrak> aw_: heh, no idea what the differences are ;-) maybe it was also just a bad connection. we can find out later.
<aw_> 0x3a: reflashed done but d2/d3 dimly lit still there.
<aw_> sure we find out later.
<aw_> i didn't power off
<aw_> so let's quickly measure some TPs.
<aw_> tp36/tp37 stay well 3.3V
<wpwrak> the bit errors aren't uniformly distributed but affect most bits: http://pastebin.com/KfWwu3vb
<aw_> init_b keeps zero.
<wpwrak> aw_: now let's read back the bitstream
<aw_> okay
<wpwrak> the board doesn't know it yet, but it *will* boot today ;-)
<aw_> wow~ i am expecting that boot. :)
<wpwrak> resistance is futile :)
<wpwrak> just on the radio: there's some marihuana plantation burning (somewhere in buenos aires, it seems). and they say "some of the firefighters are affected by the smoke" ;-))
<aw_> wpwrak, so how's best and quick way that i can know if usb-jtag board is bad while testing m1?
<aw_> via 'diff" cmd to know?
<wpwrak> for now, that seems to be the best test, yes. actually, it could also be the M1
<wpwrak> but we'll find out soon :)
<wpwrak> oh, wait. there's a bug :)
<wpwrak> hmm, it's corrupt
<aw_> this is used 'good' usb-jtag of 0x39 and dump from just reflashed
<aw_> umm..so 0x3a-4 is not identical to 0x39?
<wpwrak> can you download again ?
<aw_> sure
<wpwrak> 0x3a-4 is very different from 0x39
<wpwrak> here's the beginning: http://pastebin.com/TfyV0f7W
<wpwrak> and then it gets much worse
<aw_> hmm
<wpwrak> strange patterns: http://pastebin.com/vdTjuDcy
<wpwrak> what's interesting is that 0x3a-2 was correct
<wpwrak> so the errors some to come and go
<aw_> you had have suspected before about usb speed transmission effect, will this related to that? or start to think if it might be a flash chip problem itself?
<wpwrak> s/some/seem/
<lekernel> wpwrak, have you tried a board that works? that might just be stupid urjtag bugs
<lekernel> let's not spend any time on those
<wpwrak> lekernel: yes, 0x39 is a "good" board
<lekernel> ok, and if it had so many failures, the software CRC wouldn't work so well
<wpwrak> lekernel: and two dumps from 0x39 were identical
<lekernel> mh
<lekernel> so the flash really behaves like crap on that 3a board which doesn't work...
<wpwrak> what's odd is that the bit position changes
<lekernel> maybe we simply have sourced broken flash chips
<lekernel> is the pattern reproducible with GDB read memory command?
<wpwrak> doesn't gdb need the BIOS ?
<lekernel> no
<lekernel> only you won't be able to access the SDRAM
<lekernel> you can simply 'pld load' the SoC and GDB will work no matter what
<wpwrak> ah, great. maybe you can walk adam through gdb use then (i'll be watching, since i don't know the process either)
<aw_> the flash chip this time i ordered from authorized here Taipei (WPI), it should be okay though i think.
<wpwrak> aw_: should i ask where the other NORs came from ? ;-)
<aw_> WPI taipei
<wpwrak> so the NOR in the rc3 boards is from WPI ?
<aw_> in one batch of order
<aw_> no splitted shipments
<wpwrak> better than "Flash soup kitchen" in wolfgang's backyard ;-)
<aw_> ordered 96pcs in one batch
<aw_> but not sure if i have stock now
<aw_> i hope smt sent me all back.
<wpwrak> yeah, let's hope the NOR is good in general. there could still be the occasional bad chip, of course. either factory-bad or didn't like SMT or whatever
<aw_> well..but do you have any idea on 0x3a? or let's still name it as 'possible NOR instability'? ;-)
<aw_> then we back to see 0x32. :)
<wpwrak> lekernel: have you heard of "baking" ? that's also a fun thing: components absorb water. some only very little others more. when you SMT them, the water evaporates and the steam pressure may crack the plastic ... somewhere. lots of fun to debug :)
<wpwrak> did you do the 2nd download ?
<aw_> oah..yes...second
<aw_> finished..upload
<wpwrak> interesting. it's identical to the previous one
<wpwrak> so the corruption occurred when writing, not when reading
<wpwrak> so .. can you please reflash ? :)
<aw_> wpwrak, yes, the 'baking' thing is a normal process in smt factory. i saw them do this thing.
<wpwrak> oh, and please upload the file you're using for the flashing, too
<aw_> wpwrak, you meant the script file or the results of reflashing log?
<wpwrak> the binary file that's the input of the script
<wpwrak> or if you don't know which file this is, the script first
<aw_> i upload script file first then we ask xiangfu which is the exactly bin as the input under my folder.
<aw_> reflashing...
<aw_> two scripts but now i am using 'reflash.sh', so cmd like this: ./reflash.sh 00 3A
<wpwrak> the file seems to be   ../standby.fpg   (?)
<aw_> okay..found it. lemme upload it
<wpwrak> excellent. thanks !
<wpwrak> the data read back from 0x39 is indeed correct
<aw_> but scrolling down to the bottom. :-)
<aw_> it will be added into file everytime I reflashed
<wpwrak> i see. okay, let's try to boot. maybe it works ;-)
<aw_> wait...you said 0x3a-5 is identical original file standby.fpg?
<aw_> but now d2/d3 are still dimly lit. did i understand correctly?
<wpwrak> no, 0x3a-3 was identical
<aw_> okay
<wpwrak> and 0x3a-2
<aw_> 0x3a-5 was not, wasn't it?
<wpwrak> 0x3a-4 = 0x3a-5, but different from 0x39
<aw_> got it
<wpwrak> did you try to boot ?
<aw_> since d2/d3 is dimly lit now. but let's try to press middle btn first
<aw_> if not
<aw_> i go for power off to see if d2/d3 is fully off.
<wpwrak> alright
<aw_> can't boot surely while dimly lit
<aw_> d2/d3 dimly lit after power - cycle. :(
<wpwrak> so still no go. hmm.
<wpwrak> okay, can you please take two more dumps ?
<aw_> yup
<aw_> okay...
<wpwrak> and then 0x3a goes back to the queue, "NOR mystery corruption"
<aw_> hehe :)
<wpwrak> after that, i'd like three more dumps from 0x39. to make sure that the reading does indeed work. that way, we can be sure we can use this tool to analyze future NOR issues. (or, if the reads of 0x39 also yield inconsistencies, then we know that we don't have a reliable tool :)
<wpwrak> but first the two from 0x3a
<aw_> hmm...good idea about 'reliable tool' preparation indeed.
<aw_> okay
<aw_> so I'll use current usb-jtag (i.e the original good usb-jtag we guessed) on 0x3a to 0x39 too.
<wpwrak> no, keep them as they are
<wpwrak> the original 0x3a usb-jtag is now in M1 0x39 and the original usb-jtag from 0x39 is now in M1 0x3a, correct ?
<aw_> correct
<wpwrak> since M1 0x3a is acting weird with both, the usb jtag may be okay. we'll test this implicitly when taking the dumps from 0x39
<aw_> so later we dump 0x39, will we use the 0x39's original jtag board? is that you wanted?
<wpwrak> no, M1 0x39 with usb-jtag 0x3a
<wpwrak> i.e., don't swap the usb-jtag. use them as they are now
<aw_> got it
<wpwrak> *hmm*
<wpwrak> this is identical to the one you had before reflashing
<aw_> man..so it's not identical to the latest reflash. :(
<aw_> i still reading another one...
<wpwrak> very very strange ...
<wpwrak> maybe DQ15 now simply fails consistently
<aw_> i hope this is not radiation problem though..but can be more clear after three dumps that worked 0x39. :-) exciting to know the results about 0x39 later.
<wpwrak> the problem strikes on average every 33.5875 bytes
<wpwrak> it seems way too reproducible to be just something random
<wpwrak> the differences are only in the first third of the NOR. then, suddenly, all is good
<aw_> hmm...so let's start 0x39 workable board to see secret though. ;-)
<wpwrak> hah, 0x3a-7 is different ;-)
<aw_> okay?
<wpwrak> yes, on to 0x39 !
<wpwrak> maybe put 0x3a in the fridge :)
<aw_> ha
<aw_> i think before i read 0x39 back , to boot up again and to see CRC check ?
<aw_> or no need though. ;-)
<wpwrak> ah no. i had made a mistake. 0x3a-7 is the same as 0x37-6. okay, that's what i expected.
<wpwrak> well, why not :)
<GitHub25> [rtems] sbourdeauducq pushed 1 new commit to mmstaging: https://github.com/milkymist/rtems/commit/8d6bc82d5a56faaae02ec9e1b25a2da4a19714b6
<GitHub25> [rtems/mmstaging] Merge branch 'master' into mmstaging - Sebastien Bourdeauducq
<wpwrak> s/0x37-6/0x3a-6/
<aw_> wpwrak, well...good that -6 & -7 at least they are reflashed again
<wpwrak> 0x3a is consistently wrong now. at least we've achieved that much ;-)
<aw_> alright: CRC pass and rendering too
<aw_> 0x39 reading...
<wpwrak> here's the error distribution in 0x3a: http://downloads.qi-hardware.com/people/werner/m1/tmp/errors-3A.png
<wpwrak> almost all in the first third
<aw_> wow!
<wpwrak> and yes, 20000 of them. no surprise it doesn't boot :)
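[editor's sketch] How the error plot and the "every N bytes" figure were produced is not shown in the log. One plausible way to get comparable numbers is to collect the offsets where a bad dump differs from a known-good one, bin them by position, and take the average gap. All names here are hypothetical.

```python
def error_offsets(good: bytes, bad: bytes) -> list:
    """Byte offsets at which `bad` differs from the known-good dump."""
    return [i for i, (g, b) in enumerate(zip(good, bad)) if g != b]


def histogram(offsets, total_size: int, bins: int = 30) -> list:
    """Number of errors in each of `bins` equal slices of the dump."""
    counts = [0] * bins
    for off in offsets:
        counts[min(off * bins // total_size, bins - 1)] += 1
    return counts


def mean_spacing(offsets):
    """Average distance in bytes between consecutive errors, or None."""
    if len(offsets) < 2:
        return None
    return (offsets[-1] - offsets[0]) / (len(offsets) - 1)
```

With `bins=3`, a result like `[n, 0, 0]` would correspond to the "almost all in the first third" observation.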
<aw_> is there theory about this curve you did distribution?
<wpwrak> i don't see anything revealing there
<wpwrak> maybe i'll find something later :)
<aw_> or is a statistics?
<wpwrak> for now it's just "bad" and "weird" :)
<wpwrak> let's try the fridge approach. or maybe freezer. i guess the board should be okay with that too.
<lekernel> this rather looks like a bad flash chip, no?
<lekernel> those are _read_ errors, right?
<wpwrak> if there's anything even remotely temperature-related in the behaviour, the fridge/freezer will uncover it :)
<wpwrak> no, write errors
<wpwrak> all on DQ15. it read back several times perfectly
<lekernel> so read always works reliably now?
<wpwrak> also, the exact same errors occurred in two independent writes
<wpwrak> so it seems
<lekernel> ok
<lekernel> then we might blame urjtag too
<lekernel> bad write timing, maybe
<wpwrak> but the exact same pattern ?
<lekernel> imo the next thing to try is xilinx impact
<wpwrak> "same" as in "bitwise identical"
<lekernel> that's the standby bitstream right?
<wpwrak> (impact) ah right, we have that too
<wpwrak> yes, standby
<lekernel> aw_, do you still have your xilinx jtag cable?
<lekernel> ok i'm preparing a .mcs
<aw_> lekernel, yes, i have that
<lekernel> aw_, ok, wake up your ISE installation
<lekernel> we will reflash that problem bitstream with impact
<wpwrak> aw_: please take 0x3a out of the fridge again. we need it in the torture chamber ;-)
<wpwrak> lekernel: that somehow sounds as if it involved a hammer :)
<lekernel> yes, tough problems need tough solutions
<lekernel> ok, digging out the git history, the mcs generation command is: promgen -w -p mcs -o standby.mcs -s 32768 -u 0x00000000 ../standby/build/standby.bit -bpi_dc parallel -data_width 16
<lekernel> i'm resynthesizing the .bit atm ...
<wpwrak> we have the bit somewhere ...
<lekernel> it should be quick
<lekernel> it's a small design
<lekernel> that's .fpg
<wpwrak> ah, yet another format ?
<lekernel> .bit is the xilinx "standard" format with header
<aw_> wpwrak, still need other two, right?
<wpwrak> ah, wait
<lekernel> .fpg is raw flash content, with the words reversed to meet the idiosyncrasies of the way the fpga reads the flash, i.e. LSB first
<lekernel> ok I have the .mcs, emailing it to aw_
<aw_> lekernel, okay
<wpwrak> lekernel: this is the "original" bitstream adam used (fpg, though): http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/tool/standby.fpg
<wpwrak> aw_: (other two) lemme check first ...
<aw_> wpwrak, it still be good to know how 0x39 will be reliable or not though. pls check it. thanks. ;-)
<wpwrak> aw_: yes, please keep them coming
<wpwrak> aw_: 0x39-2 is good
<aw_> wpwrak, great
<aw_> keep reading
<aw_> lekernel, received standby.mcs, tks
<aw_> wpwrak, after these three dumps, i go for dinner first. ;-) sorry
<togi> i was at ccc last week, but i missed the milkymist talk :/ anyone know if it's available somewhere?
<aw_> and when I'm back. let's to see using xilinx tool. yup..long time not use it :-)
<wpwrak> aw_: we should put you on a sushi diet. maki, to be precise. then you can quickly eat a bit each time you have to wait for some up- or download :)
<aw_> oah~yup...i do really sorry on this.
<wpwrak> thinking of it, that would work for me too. i have a sushi restaurant just around the corner. and they do delivery :)
<wpwrak> well, actually japanese restaurant. but it seems their non-sushi stuff isn't so great.
<aw_> seems i have to buy more foods in preparation.
<wpwrak> also good
<aw_> so this means usb-jtag boards are not the problem source at least, right?
<lekernel> aw_, let's see how it goes with impact
<lekernel> flash the .mcs and see if the standby bitstream works now (ie. good readback + LEDs go fully off when power is applied)
<aw_> lekernel, sure but lemme go out for foods first ;-)
<lekernel> enjoy :)
<aw_> the third one hasn't been finished though.:)
<togi> lekernel: thanks!
<wpwrak> aw_: also identical. thanks !
<aw_> wpwrak, great! so this shows up that we currently no need to worry usb-jtag board. i'll be back soon
<wpwrak> yup, usb-jtag looks good. enjoy your meal !
<aw_> k
<lekernel> wpwrak, thanks so much for your help.
<wpwrak> no problem. it's fun ;-)
<lekernel> wpwrak, to sum up now, it may seem that we have a combination of a) unreliable writes b) intermittent reset circuit fuckup that causes boards to fail in the field?
<wpwrak> i think b) is starting to disappear. we may not have found all the critters there, but at least some.
<wpwrak> what causes a) is still a mystery. oh, and we also had changes between reads. so it's all still foggy.
<lekernel> you said that reads were working reliably?
<wpwrak> now they are. on 0x3a, there were successive reads with differences.
<lekernel> let's always use impact on problem boards now
<lekernel> it received more testing than urjtag and the usb-jtag board
<wpwrak> let's first see what impact impact makes :)
<scrts2> who is the one here coded ethernet? I wonder if the milkymist is connected to a bigger network through a few switches, is there a packet queue for packets, which are received in different time or are duplicated? e.g. ip header identification field shows, that the later packet has identification value smaller than the previous packet, which means that this particular packet must be processed before the other
<wpwrak> now let's find out what impact "impact" has :)
<aw_> wpwrak, hi yes, i am looking for my rc2's previous script~ phew
<aw_> lekernel, i tried to reflash 0x39 firstly via xilinx jtag
<aw_> but i have question: while reflashing standby.mcs file , do the d2/d3 flash? i forgot that if they must be flashed via xilinx tool.
<aw_> this is i currently use for xilinx jtag to reflash standby.mcs
<lekernel> aw_, all you should do is 1. load the standby.mcs I sent you using the impact gui (not the script) 2. test if the flash was correctly written
<lekernel> period
<aw_> hmm...okay..i go to open impact gui first
<lekernel> actually that script you pointed might work as well
<lekernel> just don't forget the template.cmd file
<aw_> hmm....no included template.cmd under folder...try again
<aw_> mm..no template.cmd already there.
<aw_> i opened impact gui. as i knew before: only used this to do read device status/device id etc... I 've not loaded into standby.mcs with this iMPACT before. only used script. :(
<aw_> how to load standby.mcs via iMPACT gui? i need to set many parameters?
<lekernel> .....
<lekernel> no
<lekernel> create new project, select "autodetect devices with boundary scan", then when it asks whether you want to program a flash attached to the fpga say yes and select the .mcs
<lekernel> it's completely trivial
<aw_> okay
<lekernel> i cannot give you step by step instructions, I lost the ribbon cable of my xilinx jtag cable
<Fallenou> I confirm, it's trivial
<Fallenou> even for non fpga-expert like me :)
<aw_> man! created a project as autodetect with boundary scan. now "Identify Succeeded", but which item that I go for selecting my *.mcs?
<aw_> Fallenou, he..seems not trivial for me. :)
<Fallenou> well you want to put the bitstream in the flash ? or just program the fpga ?
<aw_> load standby.mcs file in fpga
<Fallenou> right click on the FPGA
<Fallenou> and there should be a menu element that says "load bitstream" or something like this
<aw_> oah~ i see it, thanks
<Fallenou> assigner configuration file
<aw_> i assigned done with 16bit data bus/BPI/Flash chip
<aw_> and ?
<Fallenou> when you have assigned the configuration file to the proper device
<Fallenou> then you can do thing like right click, configure or something like that
<Fallenou> (I don't have impact on my computer, sorry)
<wolfspraul> wow, need to read the backlog...
<aw_> oaw~ i see...must right click on "flash" icon then program it. :)
<aw_> now it's programming...
<Fallenou> if you right click on the "flash device", then you are programming the flash
<Fallenou> not the fpga directly
<Fallenou> I don't know what you are trying to do exactly though
<aw_> oah~ man! but good now my d2/d3 is fully OFF now..
<Fallenou> oh ok reading backlog I understand
<Fallenou> you should be ok
<aw_> Fallenou, i saw the console with the most likely message same as script though. :)
<aw_> lekernel, i right clicked on 'Flash' icon not fpga itself, is that right?
<Fallenou> ok good
<aw_> hm...i lost my self though
<lekernel> yes, click the flash icon
<lekernel> we do not care about what the leds are doing while you are in impact
<Fallenou> aw_: if you just want to reflash the board, so that the board would be able to boot without being plugged to a computer, then yes
<aw_> hm...good
<aw_> so now 0x39 boot up and rendering well
<aw_> so now let's go for 0x3a :)
<lekernel> ok, so it simply seems urjtag has some bugs that make writing unreliable particularly at the beginning of the flash
<lekernel> if you have boards that do not configure at all, give them the impact treatment
<aw_> just noticed that d2/d3 is fully off after xilinx tool finished programming
<lekernel> ah, hm
<lekernel> no, power cycle the board
<aw_> yes, 0x39 i powered cycle . it works well now. :)
<aw_> boot up and rendering
<aw_> so let's see 0x3a via xilinx tool next :)
<wolfspraul> as usual. it's not great but if the xilinx tool is more reliable, we should probably always use the xilinx tool.
<wolfspraul> lekernel: what do you mean with "writing unreliable particularly at the beginning of the flash"?
<wolfspraul> how is that possible?
<lekernel> wolfspraul, just fix the boards that did not pass with impact
<lekernel> if the CRC check is good, then writing was ok...
<lekernel> someone needs to fix this annoying bug in urjtag, but later...
<aw_> one question first: will i need to "identify" the fpga every time i am going to program a new board?
<wolfspraul> lekernel: ah yes, of course I agree. But what is this bug?
<lekernel> some pesky and mundane time sink
<lekernel> nothing very interesting I think
<aw_> hmm...i answered my question, just directly click 'flash' icon to program though. :)
<aw_> copied them from xilinx iMPACT's console. :)
<aw_> recorded first though.
<wolfspraul> lekernel: I see. bug dismissed I guess :-)
<wolfspraul> we can definitely use Impact for the rc3 run, but then I will try to find at least a workaround for the bug.
<aw_> good that xilinx iMPACT has a readback function, but the read failed
<wolfspraul> I guess what it does is that when we write into nor, what arrives is not what we wrote?
<wolfspraul> now that we are on Impact, we can fix Impact issues :-)
<wolfspraul> I haven't completed the backlog yet, but is it possible that a wire to the nor chip is bad? do you want to try resoldering the pins?
<wolfspraul> I'm still reading backlog though, do what you think is right...
<kristianpaul> i just received (at work) a network appliance, it said something interesting: Memory test 4hr, System Stress test 1hr
<kristianpaul> do we have memory test in milkymist?
<aw_> i programmed 0x3a again, still failed while "Reading device contents..." ~ phew~
<aw_> wolfspraul, no plan to solder pins now.
<aw_> i'd rather move on to other pieces tomorrow morning and keep going with the fix2b rework
<aw_> now...just back to 0x32 to see if i can fix it like we did this morning...it's a long day story though.
<aw_> 0x39 example: good standby.mcs program log - http://pastebin.com/QuCz5fZk
<wolfspraul> yes
<wolfspraul> I lost overview with 0x32 0x39 0x3A
<wolfspraul> the backlog is scary, I cannot follow all details :-)
<wolfspraul> I think we should definitely move forward to other boards
<wolfspraul> not get stuck
<wolfspraul> I just need reliability that we are able to produce 100% stable and tested boards, so we can start selling.
<wolfspraul> it seems fix2b is good
<wolfspraul> right?
<wolfspraul> I mean I find no evidence in today's long work that there is any problem with fix2b.
<wolfspraul> so I think we should continue with more boards from the 19 and fix2b.
<wolfspraul> and if there is a problem, just move to the next board.
<wolfspraul> aw_: do you agree?
<wolfspraul> if you feel better, always use Xilinx Impact. Impact or reflash_m1.sh - your choice.
<wolfspraul> but pick one and stick to it
<wolfspraul> ah, finally finished
<aw_> wolfspraul, yes, agreed. Werner & me were just trying to track down other causes we weren't sure about, even whether usb-jtag was the problem source, but that suspicion is gone now
<wolfspraul> but it seems Xilinx Impact did not help :-)
<wolfspraul> Xilinx Impact only showed right away that the read failed
<wolfspraul> in that case I would continue to use the jtag-serial board and reflash_m1.sh
<aw_> wolfspraul, no no...for the xilinx tool i only have the standby.mcs file from lekernel. with only that, i can't rely on xilinx to reflash all the other boards
<wolfspraul> ok
<wolfspraul> and Xilinx Impact did not improve anything if I understood the backlog correctly
<wolfspraul> so just use reflash_m1.sh
<lekernel> wolfspraul, it did fix urjtag write problems with one board
<lekernel> no?
<aw_> just like lekernel said, if I have some trouble with NOR problems, this xilinx tool with standby.mcs could be helpful.
<wolfspraul> I'm overwhelmed with the details of the backlog.
<wolfspraul> I thought no
<wolfspraul> it just said 'failed' by itself
<wolfspraul> aw_: I think tomorrow we need to go to full speed mode. not get stuck on a few boards.
<wolfspraul> just power through the whole batch of 19...
<wolfspraul> if anything doesn't work or is unclear, just take a note and move to the next one
<wolfspraul> I feel pretty good about fix2b now
<wpwrak> aw_: hmm, but 0x39 worked before. and 0x3a fails with impact as well. so it seems with 0x39 both work and with 0x3a neither.
<wolfspraul> yes
<wolfspraul> wpwrak: did we find any evidence for problems with fix2b today? doesn't look like to me...
<wpwrak> okay, all agree :)
<wpwrak> wolfspraul: in fix2b we (still) trust :)
<wolfspraul> good
<aw_> agreed though...We only reworked 4 boards this morning, and 2 of them failed for unknown reasons. Werner & me tried to figure this out, hopefully..just don't want more boards like this...surely need to speed up...but tough decision though..
<lekernel> <aw_> yes, 0x39 i powered cycle . it works well now. :)
<wpwrak> i'd suggest putting 0x3a in the fridge. see if temperature changes it. we've had it work better and worse over the course of these experiments. very confusing.
<wpwrak> lekernel: 0x39 worked before :)
<lekernel> so why the hell did we flash a board with impact that worked before ?!?
<wpwrak> lekernel: adam tried a good board first. only then the problem board.
<wpwrak> :)
<aw_> 0x39: both usb-jtag & iMPACT work well
<wpwrak> wolfspraul: we verified that urjtag can read back the NOR quite reliably. so we can use it in the future for verifications, if necessary
<wpwrak> wolfspraul: what's a bit troubling is that there doesn't seem to be a proper verification of what gets written. at least we once got completely bogus content flashed. maybe the ... "verify skipped" (?) in the logs is a hint :)
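The missing verification step mentioned above can be sketched as a plain byte-for-byte comparison between the image that was meant to be written and a NOR readback. This is only an illustration: `first_mismatches` is a hypothetical helper, and it assumes both the image and the readback are already available as raw bytes (how urjtag or iMPACT dumps them is not shown in this log).

```python
def first_mismatches(expected: bytes, readback: bytes, limit: int = 8):
    """Return up to `limit` (offset, expected_byte, readback_byte) tuples.

    Hypothetical helper: `expected` is the image that should be in NOR,
    `readback` is what was read back from the chip, both as raw bytes.
    """
    if len(readback) < len(expected):
        raise ValueError("readback is shorter than the image")
    mismatches = []
    # zip() stops at the end of `expected`, so trailing readback bytes are ignored.
    for offset, (e, r) in enumerate(zip(expected, readback)):
        if e != r:
            mismatches.append((offset, e, r))
            if len(mismatches) >= limit:
                break
    return mismatches
```

An empty result means the readback matches the image over its whole length. Reading back twice and comparing the two readbacks the same way would help separate unstable reads from bad writes.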
<wpwrak> aw_: trying any more boards today ? or entering suspend mode ?
<aw_> wpwrak, yup..i gotta enter suspend mode myself to start another day.
<lekernel> aw_, did we have similar impact flashing problems in run 2?
<lekernel> this sounds like a brand new problem, no?
<aw_> lekernel, in rc2, we finally got 35/40 pcs done
<lekernel> (and like crappy flash chips, too)
<kristianpaul> why are the flash chips crappy? is that a new discovery on rc3?
<lekernel> right now there are 51 working boards?
<kristianpaul> sorry i missed all backlog..
<lekernel> aw_, did the missing 5 run2 boards have similar flashing problems?
<aw_> lekernel, and the remaining 4 pcs mostly did have d2/d3 dimly lit problems, yes, but at that time we guessed they were damaged by "fast power-cycling"
<lekernel> there are no damages by fast power cycling
<wpwrak> lekernel: not sure if it's the NOR. could also be the FPGA. or soldering on either.
<aw_> and eventually those 4 boards are finally "dead" though... so whether any of them actually had NOR flash problems is a really good question!
<lekernel> what do you mean, "finally dead"?
<wpwrak> lekernel: what we've seen with board 0x3a were 1) good NOR content but (variable) errors on read and 2a) bad NOR writing with 100% reliable read (of the bad data) or 2b) good NOR writing (and unrelated failure to configure) with 100% reproducible corruption on NOR read
<aw_> thus they can't reconfigure, but at that time we thought it was caused by abnormal fast power-cycling with the switch during production.
<wpwrak> lekernel: so, a bit scary that one. a moving target. but i think we did enough tests to be reasonably sure of these results.
<lekernel> try replacing the flash chip
<lekernel> wolfspraul, can we move forward with the other boards?
<aw_> well...i'll set these failed boards aside for now
<aw_> lekernel, tomorrow i go directly for other boards with fix2b circuit
<lekernel> aw_, what is your next target?
<lekernel> what 'other' boards? the 51 working ones?
<aw_> lekernel, no, the first 19 boards (already including today's 4 boards) and see how they go.
<wpwrak> lekernel: there are some more in the fix2b "cluster"
<lekernel> aw_, you are not touching the 51 working/available ones, right?
<aw_> so I'll go for the remaining 14 pcs of the cluster first thing tomorrow
<aw_> lekernel, right
<aw_> well...time to go
<wpwrak> lekernel: the 51 "available" ones should at least be checked. some may also need fix2b and are just at the edge of not working. some of the boards in the cluster have worked a little once and then went worse, so the fix2b problem isn't just black and white
<aw_> I'll work on the remaining 14 boards first.
<aw_> good night
<wpwrak> lekernel: but it would be good to be able to test them in a non-intrusive way, to avoid more rework
<wpwrak> aw_: sweet dreams ! :)
<lekernel> wpwrak, should we apply fix2b on all the working boards?
<lekernel> they look nicer after that (no messy cable)
<roh> hey. how is it going?
<wpwrak> lekernel: i'm slightly in favour of applying it everywhere, yes
<wpwrak> lekernel: seems to be low-risk enough
<wpwrak> lekernel: and yes, we get rid of the cable. all evidence of human fallibility destroyed ;-)
<wpwrak> roh: today we had one with problems somewhere between FPGA core, NOR, and back. or, rather, it had us.
<wpwrak> roh: to make things more interesting, the problem pattern shifted. first it looked merely like a usb-jtag problem, but then that part turned out to be quite reliable but NOR reads or writes caused trouble.
<roh> oh. pcb routing problems?
<wpwrak> roh: fix2b is still looking good, though. we're getting further than we used to.
<wpwrak> roh: hard to say. could be bus, could be I/O pad drivers dying, could be a bad NOR bank, ...
<roh> wpwrak: whats fix2b?
<roh> i just learnt about burning streets in london etc. 10 days of camping take their toll (there was ip and power in my tent but i was too drunk and met too many interesting people to care)
<wpwrak> roh: fix2b = remove the diode between INIT_B and PROGRAM_B (and the wire going around the board). also, check that diode D16 is okay. some aren't, and let FLASH_RESET_N get pulled low or into an undefined state
<wpwrak> roh: fix2b solves: the problems with usb-jtag flashing stopping at "bit stream length = 14xxxxxxx" and failure to (re)configure on some boards
<wpwrak> roh: success rate about 50% so far on those afflicted by such problems (i.e., 2 out of 4)
<roh> hm
<wpwrak> interesting detail: when NOR reading on 0x3a was a problem, the bit flips were all 0 -> 1. when the reading stabilized, the bit flips were all 1 -> 0
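The bit-flip observation above can be reproduced with a small comparison of a known-good image against a readback. A minimal sketch, assuming both are available as raw bytes of equal length; `count_flips` is a hypothetical helper, not part of urjtag or iMPACT.

```python
def count_flips(expected: bytes, readback: bytes):
    """Count bit flips by direction between `expected` and `readback`.

    Returns (zero_to_one, one_to_zero). Hypothetical helper for
    illustration; both inputs are assumed to have the same length.
    """
    if len(expected) != len(readback):
        raise ValueError("inputs must have the same length")
    zero_to_one = 0
    one_to_zero = 0
    for e, r in zip(expected, readback):
        # Bits that are 0 in the image but 1 in the readback.
        zero_to_one += bin(~e & r & 0xFF).count("1")
        # Bits that are 1 in the image but 0 in the readback.
        one_to_zero += bin(e & ~r & 0xFF).count("1")
    return zero_to_one, one_to_zero
```

A result where one of the two counts is zero matches the pattern described for 0x3a, where all flips in a given phase went in a single direction.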
<kristianpaul> lekernel: mm_i2l.pdf, thanks for publishing it !
<kristianpaul> the one about plasma looks nice, worth a look :)
<kristianpaul> if you have more slides about HDL specific and milkymist, please share :)
<lekernel> have you tried the demo binary on your board?
<kristianpaul> milkymist demo?
<kristianpaul> no, never. i only saw wolfgang use it at cparty, nothing more
<lekernel> no, the demo bin from masteri2l plasma
<kristianpaul> no no
<kristianpaul> i'm at work now, and just reading rss now
<kristianpaul> nice, that tp_files tarball is a hello world at the milkymist style :)