<wolfspraul> wpwrak: those sound like really interesting ideas (remove D16, second reset ic for FLASH_RESET_N), but the problem I see is in how to test the result
<wolfspraul> I think that's also the reason why this bug has not been fixed yet. We first need to find a way to reproduce some 'bad' thing consistently, then we fix the 'bad' thing, then we verify that it's gone.
<wolfspraul> but how is this possible now?
<wolfspraul> aw: good morning :-) you are early!
<aw> wolfspraul, good morning ;-)
<wpwrak> wolfspraul: yes, let's first get some statistical data on 0x39. that one seems to be very good at generating the problem.
<wolfspraul> wpwrak: should we do 0x39 tests now?
<wpwrak> aw: and maybe you can check the reset circuit rework on 0x39 to see if there are any obvious issues (such as a reversed diode, capacitor not properly soldered, etc.)
<wolfspraul> aw: can we do some 0x39 work now?
<wpwrak> wolfspraul: first, the last status of 0x39 was that it didn't reconfigure, correct ? so the next test should be to see if it does now
<wpwrak> wolfspraul: then repeat the power-cycling with CRC loop until it stops again (which should be soon, probably less than 10 tries, if past results are any indication)
<wpwrak> wolfspraul: then try to retrieve the NOR content via jtag. check that it's okay. (that is, if we get this far. if the NOR chip is just messed up, e.g., held in reset, then jtag won't work either)
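A rough way to plan the power-cycling run above: if each power cycle is treated as an independent trial with a fixed chance of ending in the stuck state (an assumption, nothing in the log establishes it), the number of cycles needed to see the problem can be estimated. The per-cycle failure rate used in the usage line is purely hypothetical, as are the function names.

```python
import math


def prob_failure_within(p_fail: float, n_cycles: int) -> float:
    """Probability of at least one failure in n_cycles independent power cycles."""
    return 1.0 - (1.0 - p_fail) ** n_cycles


def cycles_for_confidence(p_fail: float, confidence: float) -> int:
    """Smallest cycle count at which a failure shows up with the given probability."""
    if not 0.0 < p_fail < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("p_fail and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    p = 0.2  # hypothetical per-cycle failure rate, not a measured value
    print(f"P(failure within 10 cycles) = {prob_failure_within(p, 10):.3f}")
    print(f"cycles for 95% confidence   = {cycles_for_confidence(p, 0.95)}")
```

If a board survives many more cycles than this estimate suggests, either the assumed rate is too high or the cycles are not independent (for example, temperature or time-since-power-off matters).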
<wpwrak> lekernel: btw, does the FPGA assert the pull-up on FLASH_RESET_N during its built-in load process ?
<wolfspraul> wpwrak: no we will not be able to read nor then (same as yesterday)
<wolfspraul> of course we have to try
<wpwrak> wolfspraul: wasn't the failure yesterday a problem with the script ?
<wolfspraul> no
<wolfspraul> adam may have gone out, maybe pickup roh's second package...
<aw> okay...let's power on to see if 0x39 can reconfigure now.
<aw> answer is NO, even after being powered on the whole night
<aw> so now going to read_flash_m1.sh
<aw> still stops at 'Bitstream length: 1484404'
<aw> what next steps would you suggest? I would like to power it off first. ;-)
<wpwrak> maybe try a few times if you can read the bitstream (power on and off as you see fit :)
<wolfspraul> aw: can you try to use Xilinx Impact and Xilinx cable? can you use Xilinx Impact to read nor? Or just detect the nor chip?
<wolfspraul> I'm wondering whether xilinx impact would give us any new clues...
<wpwrak> ah yes, good idea
<aw> okay
<wpwrak> "xilinx cable" = also replaces the usb-jtag board ?
<aw> that's the way with the xilinx tool...but i'm not sure if it can work
<aw> i only followed the instructions from rc2
<aw> don't know if they are suitable.
<wolfspraul> yes remove usb-jtag
<aw> wpwrak, yes, the 'xilinx cable' can be used instead of the usb-jtag board
<wolfspraul> wpwrak: if we had a spare nor chip, I would suggest switching the nor chip to a new one, just to get another data point
<wpwrak> i'd save that option until the very end :)
<wolfspraul> but we don't have one, and taking one from another board would insert more variables into an already questionable fact finding
<wolfspraul> well, it could be interesting
<wpwrak> add two to the next digi-key/mouser order ?
<wolfspraul> for example if it then first worked, but after a few cycles fails again
<wolfspraul> 5 already on the way
<wolfspraul> let's say it first works, then fails
<wpwrak> yes, if it works at first ... :)
<wolfspraul> well
<wpwrak> it's kinda major rework :)
<wolfspraul> :-)
<wolfspraul> data point
<wolfspraul> I think it's not too hard, no?
<wpwrak> 56 pins .. depends ...
<wpwrak> i think my maximum is ~28. doable with chip-quick ;-)
<wpwrak> but maybe adam has better tools
<wolfspraul> nah maybe not
<wpwrak> or techniques :)
<wolfspraul> but in the factory they are swapping such chips within 30 seconds or so
<wolfspraul> would need to take a video and watch in slow-motion to see how they do it :-)
<wolfspraul> but a) we have no spare chip b) Adam may not be able to do the rework that easily
<wpwrak> in any case, i'd try swapping the nor last. too much can go wrong there. also, it may destroy evidence. e.g., if there's some short
<wpwrak> or near-short
<kristianpaul> xray ? :)
<wpwrak> gamma-ray laser ? :)
<kristianpaul> hum, why so fancy?
<wpwrak> kristianpaul: more threatening :)
<wolfspraul> sorry, got disconnected
<wolfspraul> aw: any news? trying xilinx impact?
<aw> wolfspraul, this is not easy for me now. but have to try
<wolfspraul> aw: wait
<aw> 0x39 via sudo jtag: http://pastebin.com/HafSjUGL
<wolfspraul> if it's not easy, don't do it
<aw> i am going to see the instructions in rc1
<wolfspraul> or describe the steps one by one here, either we can make it work or not
<wolfspraul> no
<wolfspraul> that sounds like you will disappear for a long time
<wolfspraul> if the xilinx impact test cannot be done in 5 minutes, don't do it
<wolfspraul> :-)
<aw> well...
<aw> meanwhile should we just send one board to someone right away?
<wolfspraul> no
<wolfspraul> I am looking at pastebin
<wolfspraul> why did you do 'quit' after 'detect'?
<wolfspraul> ping
<wolfspraul> aw: u there?
<aw> I only know these commands though
<wolfspraul> ok
<wolfspraul> let's clarify first
<wolfspraul> you are currently working on x039
<wolfspraul> 0x39
<wolfspraul> 0x39 cannot boot, when you plug in the DC cable D2/D3 become dimly lit?
<aw> unless someone else can tell me which other commands i can use directly to dump the standby bitstream into a file. ;-)
<wolfspraul> is that correct?
<aw> yes, right now 0x39 can't reconfigure when powered on
<wolfspraul> d2/d3 are dimly lit?
<aw> it's still in this d2/d3 dimly lit state now
<kristianpaul> (pastebin) yes, what happened with the detect command ? :)
<wolfspraul> aw: try this: turn off power
<wolfspraul> remove jtag-serial daughterboard
<wolfspraul> power on again
<wolfspraul> what happens?
<aw> same d2/d3 dimly lit
<aw> if someone can lead me through some commands in the UrJTAG tool, there must be data that can be dumped from flash. not sure if xiangfu knows this.
<wolfspraul> wait
<kristianpaul> yes :)
<wolfspraul> we can walk through, but it won't work of course, since the script also doesn't work
<aw> no, not the script, i mean using UrJTAG directly
<aw> it doesn't work, and I don't know whether the script fails somewhere or UrJTAG itself does
<aw> but if i enter UrJTAG, there are more commands that can be used.
<kristianpaul> detect is easy to remember
<aw> good, but now how can i dump the standby bitstream?
<aw> from which address, with which commands?
<wolfspraul> waste of time imo
<kristianpaul> yeah
<aw> sorry that I'm very poor at this
<kristianpaul> may be a voltage verification
<wolfspraul> ah good idea
<kristianpaul> where?, i dont know :)
<wolfspraul> aw: let's measure some signals :-)
<aw> so that's why i asked whether we should send one rc3 board to someone who can dump it?
<wolfspraul> NO!
<wolfspraul> we are not covering up our incompetence by spreading the problem so thin that we can eventually claim it doesn't exist
<kristianpaul> aw: as i understand it, the problem right now is not related to NOR content, yet :)
<wolfspraul> we don't know
<kristianpaul> sure
<kristianpaul> with this dimly lit state the fpga is detected, i guess, but the problem arises when loading the bitstream?
<aw> wpwrak, which part's pin or signal do you think I should measure? or should I directly rework the diode and C238 again?
<wolfspraul> not rework
<wpwrak> maybe do this: without power-cycling, put scope probe on TP37 (FLASH_RESET_N), scope set to AUTO, check the voltage and look for any noise
<wpwrak> then, keep the probe pressed to TP37 and reset or power cycle
<wpwrak> see if it reconfigures then
<kristianpaul> don't we have a list somewhere of known expected voltages for TPs (that apply to power supply and such)?
<wolfspraul> you can start making one with your rc2 :-)
<aw> wpwrak, stay tuned
<kristianpaul> good idea !
<wolfspraul> actually seriously you could provide some reference measurements for wires into or out of the nor flash as well, if you want to help
<kristianpaul> yes why not, let me check the rc2 datasheet for available testpoints
<wolfspraul> we are a bit asymmetric here. Werner has the clearest mind, but no board. I have a board, but no electrical capabilities. Xiangfu has a board but is hard to reach, Sebastien is sleeping. and so on :-)
<kristianpaul> yeah..
<wolfspraul> Adam has a lot of boards but is always worried he will damage something when running this or that software :-)
<wpwrak> (clearest mind) still with a cold, though :-(
<wpwrak> next project: an internet-attached alarm clock ;-)
<wolfspraul> ouch
<wolfspraul> kristianpaul: here's what werner said "maybe set trigger on OE#, then start with RP#, WE#, DQ0, A0, then do the rest of DQx and Ax"
<wolfspraul> those are reference measurements around the nor you could do
<wpwrak> /msg qi-bot wake lekernel    ;-)
<wolfspraul> it's on rc2, but those datapoints may help
<wolfspraul> well, I actually think Sebastien is thinking a lot about what the root cause could be, but has no striking idea.
<kristianpaul> ok, as long as i can measure without soldering cables it's okay
<wolfspraul> this thing is really difficult because we can't pin it down
<wolfspraul> cannot really reproduce in a controlled way
<wolfspraul> problem appears and disappears without us understanding why it did that
<wpwrak> yup
<wolfspraul> it affects > 20 % of boards, at least. maybe with tougher testing even more. We don't know.
<wolfspraul> we don't know whether some boards have 'genes' that will make them never show the problem
<wolfspraul> and so on
<wpwrak> for all we know, it could affect all of them
<wolfspraul> yes, definitely
<aw> wpwrak, the TP37 while (d2/d3 dimly lit) is 259mV now
<wpwrak> aw: by the way, did you do the visual inspection of the reset rework ("fix2") on 0x39 ?
<wpwrak> that's a reset !
<aw> yes
<aw> reset status
<wpwrak> does it constantly stay at ~260 mV ? or does it change
<aw> i need to power-cycle to see it
<aw> but I bet it will pull high once d2/d3 dimly lit is gone for sure. ;-)
<wpwrak> do we know at what point in time urjtag loads fjmem.bit ? i.e., is there a specific step in the script ? or does it just do it automatically ?
<aw> wpwrak, TP36 is 120mV now
<kristianpaul> looks for TP37
<wpwrak> and TP37 ?
<aw> TP36 is program_b
<wpwrak> so we just have a permanent reset. interesting.
<aw> please note one fact that I know:
<wpwrak> (manuscript) yes, PROGRAM_B_2 should be high, not low
<aw> the DONE pin goes from low to high to show that the fpga has finished reconfiguration.
<aw> wpwrak, wait
<wpwrak> DONE shouldn't matter. it doesn't connect anywhere near the NOR.
<wpwrak> (unless we have some interesting shorts :)
<aw> as i said, a xilinx guy told me this before, and i checked the DONE pin: it signals "done" once the fpga has first accessed the flash. ;-)
<wpwrak> okay, but DONE is TP35
<aw> wpwrak, wait
<aw> INIT_B starts with a short LOW period, and it acts in sync with the DONE pin, but inverted.
<aw> can you see that?
<aw> so my question is:
<wolfspraul> 'permanent reset' may be a much better description of the problem we see on rc3
<wolfspraul> at least it fits with the vast majority of test behavior I can think of right now
<wpwrak> aye. now on to the "why" ..
<wpwrak> aw:  can you touch TP37 with a 1-10 kOhm resistor to 3V3 ? and see how the voltage changes ?
<aw> wpwrak, could the flash RP# pin act wrongly at startup because of the reset IC's output? and during that time, could the standby bitstream get corrupted somewhere if that "start" access doesn't go well?
<aw> wpwrak, okay
<wpwrak> as far as i understand things, PROGRAM_B_2 low should also keep the FPGA in reset. so it shouldn't try to access the NOR at that time
<aw> wpwrak, it's R60 placeholder.
<wpwrak> (r60) yes :)
<aw> wpwrak, you want me to attach a 10K while power is ON
<aw> or solder it after power off
<wpwrak> just see how the voltage on TP37 changes
<wpwrak> with/without R60 "placeholder"
<aw> wpwrak, TP37 is 318mV, TP36 156mV, d2/d3 dimly lit
<aw> after attached R60 10K
<wpwrak> okay, so that's not it. thanks.
<wpwrak> did you check that the diodes have the correct orientation ?
<aw> yes, the two diodes are correct. this board 0x39 surely did reconfigure when it was kept for days.
<aw> so i don't know whether directly resoldering new parts there would solve it.
<wpwrak> hmm, tricky
<wolfspraul> no resoldering
<wolfspraul> what is the sequence the board goes through now from the moment power is applied on the DC jack?
<wpwrak> someone is pulling FLASH_RESET_N down. but who ? could be INIT_B_2, PROGRAM_B_2, the reset chip, the FPGA, or something that's not visible from the schematics
<wolfspraul> does the fpga ever start its configuration sequence?
<wolfspraul> or it goes into permanent reset immediately
<wolfspraul> who is in control, in which order?
<wolfspraul> is the fpga in control at some point? or always forced down from outside?
<wolfspraul> wpwrak: ah yes, you think in the same direction :-)
<wolfspraul> who is in control
<wolfspraul> can't we just measure backwards?
<kristianpaul> sorry, i don't have all the TPs needed to be useful to you now..
<wolfspraul> flash_reset_n is pulled down
<wolfspraul> which timespan are we talking about between power-on and permanent reset?
<wolfspraul> just a few hundred milliseconds?
<wolfspraul> then we could scope the voltage of FLASH_RESET_N, INIT_B_2, PROGRAM_B_2 and even more and compare them side by side?
<wpwrak> wolfspraul: (measure backwards) maybe adam can make a little "power probe" :)
<aw> kristianpaul, yes, the rc2 doesn't have them. sorry that I should have told you first.
<aw> kristianpaul, thanks though. :-)
<wolfspraul> wpwrak: which timespan are we looking at?
<wolfspraul> from power-on to permanent reset
<wpwrak> aw: do you have through-hole resistors around 100 Ohm ?
<aw> wpwrak, i don't have any, but tell me your idea first. ;-)
<wpwrak> wolfspraul: right now, i'm interested in the permanent reset. i think NOR shouldn't be reset in this state. but i'm not 100% sure
<aw> then I try to get it done
<aw> wpwrak, tell me your idea first. ;-)
<wpwrak> aw: (idea) solder a wire to 3V3. solder the other end to a ~100 Ohm. connect voltmeter (or scope) to the open end of the R
<wpwrak> aw: then touch things with the open end. this should do two things: 1) pull them relatively strongly to 3V3. 2) show the voltage
<wolfspraul> oh totally. with permanent reset we are onto something.
<wpwrak> or, maybe easier:
<wolfspraul> I just hope the board shows it long enough :-)
<aw> wpwrak, go on
<wpwrak> connect multimeter in DC current measuring mode to 3V3. then touch things with the other probe. e.g., check how much current TP37 can sink
<wpwrak> points of interest: TP37 (RP#), TP36 (reset chip out), the INIT_B_2 side of R157
<aw> wpwrak, okay..let's do a surgical operation on 0x39. ;-) stay tuned. :)
<wpwrak> no surgery. just measurements :)
<wolfspraul> kristianpaul: can you post your d2 dimly lit situation here?
<wolfspraul> so you had your rc2 board in a state where d1 was dimly lit and it wasn't detected by jtag?
<kristianpaul> yes, correct
<wolfspraul> there may be multiple bugs in this area, and that one may have already been independently fixed on rc3
<wolfspraul> a lot of 'may', sorry
<wolfspraul> :-)
<kristianpaul> sure np
<kristianpaul> FYI
<wolfspraul> is your board back to life now?
<kristianpaul> yes
<kristianpaul> phew.. :)
<wolfspraul> ok good
<wolfspraul> no don't worry
<wolfspraul> if it breaks, you will get a new one. rc3 even :-) but please don't be reckless because of that :-)
<kristianpaul> oh no
<wolfspraul> but also please don't worry
<wpwrak> wolfspraul: you just gave him a lot of reason to be reckless ;-)))
<wolfspraul> I am the manufacturer, and I support my stuff.
<kristianpaul> i'm still using it as always
<wolfspraul> that's why I'm so keen on getting rc3 to a higher level...
<kristianpaul> actually i just was going to reflash as always and then... omg :)
<wolfspraul> even this guy hadez and others, I may just end up giving them new rc3. but one by one, first we need to make rc3 at a controlled quality level...
<wpwrak> kristianpaul: maybe as a warm-up, unsolder the FPGA, re-ball, then solder again ;-))
<kristianpaul> wpwrak: nah
<kristianpaul> wolfspraul: i'm still trusting my rc2 a LOT for now ;)
<wolfspraul> you can
<wolfspraul> it's a good board and we worked hard even then. of course we also learnt a lot since :-)
<wolfspraul> wpwrak: your idea with a second reset ic for FLASH_RESET_N - is it the same 4.4V reset ic as we have now?
<wolfspraul> I'm just asking in case I should get more parts :-)
<wolfspraul> with 'have now' I meant 'will have in a few days'
<wpwrak> wolfspraul: yes, the same
<wolfspraul> that second reset ic would replace the need for logic gates?
<wolfspraul> sounds like we have to do a few more experiments before settling on the reset circuit for rc4...
<wpwrak> it would replace one. not sure about the other
<wolfspraul> (which I wouldn't mix in with the permanent reset research on rc3 we are doing now)
<wolfspraul> ok, got it
<aw> wpwrak, take R60 apart, right?
<aw> then use 100 ohm to pull high on 3V3 then measure dc current.
<wpwrak> aw: probably doesn't matter
<aw> wpwrak, ?
<aw> still let R60 (10k) on it.
<wpwrak> aw: naw, let's do it the simple way: just multimeter in DC current mode
<wpwrak> aw: measure current between TP36/TP37/INIT_B_2 and 3V3
<wpwrak> R60 probably has no effect
<aw> wpwrak, we still need the 100 ohm as a current-limiting resistor though, right?
<wpwrak> if you can, it would be nice to have
<aw> okay
<wpwrak> else, just avoid touching GND ;-)
<aw> wpwrak, R60 still 10K there, TP37 (RP#, 16mA), TP36(reset pin out, 19mA), INIT_B_2 (24.7mA)
<aw> with 100 ohm to 3V3 measured. ;-)
<aw> TP 37 now 18.9mA
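The currents above can be turned into rough node voltages and pull-down strengths with Ohm's law. This is only a sketch: it assumes an ideal 3.3 V rail, exactly 100 Ohm in series, and no meter burden resistance, and it models whatever is pulling the net low as a plain resistor, which a real output driver is not. The function names are made up for illustration.

```python
V_RAIL = 3.3       # volts, nominal 3V3 rail (assumed ideal)
R_SERIES = 100.0   # ohms, the current-limiting resistor in series with the meter


def node_voltage(current_a: float, v_rail: float = V_RAIL,
                 r_series: float = R_SERIES) -> float:
    """Voltage left on the probed node after the drop across the series resistor."""
    return v_rail - current_a * r_series


def effective_pulldown_ohms(current_a: float, v_rail: float = V_RAIL,
                            r_series: float = R_SERIES) -> float:
    """Equivalent resistance to GND of whatever is holding the node low."""
    return node_voltage(current_a, v_rail, r_series) / current_a


if __name__ == "__main__":
    # Currents reported in the log for 0x39, in amperes.
    readings = {"TP37 (RP#)": 0.016, "TP36 (reset out)": 0.019, "INIT_B_2": 0.0247}
    print(f"dead-short ceiling: {V_RAIL / R_SERIES * 1000:.1f} mA")
    for name, amps in readings.items():
        print(f"{name}: node at {node_voltage(amps):.2f} V, "
              f"pull-down ~{effective_pulldown_ohms(amps):.0f} Ohm")
```

Under these assumptions a dead short to GND would read 33 mA, so readings of 16 to 25 mA mean each net is held low by something on the order of tens of ohms to about 100 Ohm, far stronger than a 10K resistor like R60 could overcome.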
<wpwrak> hmm, they're all very strong
<wpwrak> what do you get on the 3V3 pin of the reset IC ?
<aw> voltage on TP36?
<aw> don't understand your question.
<wpwrak> current between 3V3+100R and pin 3 of U24 (pin 3 is the one on the side that has only one pin)
<wpwrak> expected value: 0.0000000 A ;-)
<aw> i measured pin 3V3 of reset ic is good @ 3.3V. ;-)
<wpwrak> okay
<wpwrak> this is weird. could there be any shorts ?
<wpwrak> high current on TP36 should only exist if:
<aw> i got -0.001 mA while attached pin 3v3 of reset ic
<aw> so no leakage current though. ;-)
<wpwrak> - PROGRAM_B_2 is actively pulling low (which, afaik, it never does)
<wpwrak> - the reset chip is pulling low (which is has no reason to do)
<wpwrak> - something is shorted into that net
<aw> yes, TP36 (program_b) is 67mV now
<wpwrak> and it pulls low with ~20 mA
<aw> yes
<aw> but i don't know where the current comes from
<aw> as for whether a short exists somewhere: this is really weird
<wpwrak> next: connect scope to TP36, acquisition: peak, trigger: auto, slow timebase (maybe 100-200 ms/div)
<aw> okay
<wpwrak> then power-cycle the board. see if it ever comes out of reset
<aw> i see a pulse from low -> high -> low, triggering on the rising edge
<aw> trying to catch it ;-)
<wpwrak> for how long does it stay high ?
<aw> yeah...wait
<wpwrak> (roughly) picoseconds / milliseconds / days :)
<kristianpaul> oops
<kristianpaul> sorry
<aw> yes
<wolfspraul> I cannot imagine a short, unless it's a short coming from (programmed) inside the fpga
<aw> wpwrak, i may use two channels to compare
<wolfspraul> that's because the board worked before, and got into this state without any hardware action (manual hardware action, like soldering)
<aw> wpwrak, maybe scope DONE pin? RP#?
<aw> in channel 2. ;-)
<wolfspraul> aw: if you think that might be helpful, just do it until Werner is back...
<aw> wpwrak, forget about my last picture, see this new: http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x39_ch1-tp36_ch2-tp37.JPG
<aw> ch1-tp36, ch2-tp37
<aw> they are synced actually.
<aw> i'm going to scope the done pin as ch2 to see the difference
<aw> the duration from the first pull-high pulse to the second pulse is thus ~180ms (which is the reset delay time)
<aw> alright..i think i need to test others after lunch
<aw> leave 0x39 aside temporarily. ;-)
<wolfspraul> aw: yes, let's wait for Werner's feedback and continue with other boards first
<wolfspraul> 'permanent reset' is an interesting new angle, maybe we are lucky and find something there...
<aw> hopefully we get the secrets behind it.
<kristianpaul> xiangfu: you can load a bitstream using jtag pload
<kristianpaul> xiangfu: _but_ your mm1 soc doesn't support booting from anything besides NOR as it is today.. maybe the debug ROM is linked to jtag and you can boot from there.. i don't know that far...
<kristianpaul> i hope i'm wrong on that and i missed that part of the HDL :)
<wolfspraul> now that we know that at least some boards are held in a permanent reset state, some of those earlier ideas lost value (for now)
<kristianpaul> yes,
<wolfspraul> because even if we could load and boot everything without nor on a functioning board, it wouldn't work on 0x39
<kristianpaul> correct
<wolfspraul> same for trying Xilinx Impact (which we skipped)
<wolfspraul> actually - is there still the chance that even on the 0x39 we have now, the fpga first reads a corrupted bitstream and then ends up forcing itself into permanent reset?
<kristianpaul> can we reset fpga from jtag?
<wolfspraul> probably not because then the access path via jtag-serial should still work, which it doesn't
<wolfspraul> good idea [reset fpga from jtag]
<kristianpaul> ah yes !!
<kristianpaul> mom
<kristianpaul> pld reconfigure
<wolfspraul> but that would end up doing the same thing, no?
<wolfspraul> it would read something from nor... ?
<kristianpaul> ah, yes :)
<kristianpaul> well bitstream must be loaded from somwhere
<wolfspraul> can we tell it to reconfigure from elsewhere?
<wolfspraul> like from what we supply over jtag
<kristianpaul> yeah, thinking the same..
<wolfspraul> but first maybe reset, then reconfigure from jtag
<kristianpaul> if the problem that fires the permanent reset is in the power-cycling
<kristianpaul> a pld reconfigure should succeed
<kristianpaul> as the board is already powered,
<kristianpaul> my guess
<wolfspraul> sure we can try, but let's assume there is a nor corruption
<wolfspraul> then it would hang itself again
<kristianpaul> yeap
<wolfspraul> if that nor corruption triggers the permanent reset
<wolfspraul> but how can we flash the board when nor is still empty?
<kristianpaul> _if_ what hangs it is nor corruption and not some wrong timings with the reset IC perhaps?
<kristianpaul> ha, just wipe a board that is known to boot and see what happens
<wolfspraul> does the fpga 'give up' when there's no bitstream in nor, but later it finds a corrupted bitstream and hangs itself?
<kristianpaul> should be a similar state as with no corruption, as at the end..
<kristianpaul> hum no, maybe it starts okay, then errors pop up and it just gives up...
<wpwrak> hmm ...
<wolfspraul> ah :-)
<kristianpaul> or the fpga didn't catch the bitstream corruption and locked itself up because of that?
<kristianpaul> that's even trickier ;)
<kristianpaul> agree :)
<kristianpaul> malfunctioning diode or reset ic?
<wolfspraul> we can definitely try a pld reconfigure on 0x39 when Adam is back, and measure FLASH_RESET_N etc. then
<wolfspraul> wpwrak: Adam uploaded 2 more later and said "forget the first one"
<wpwrak> it's hard to forget what looks like a 6 V+ pulse on a 3.3 V line :)
<kristianpaul> no no
<kristianpaul> horrible :)
<wpwrak> the others look better. some amplitudes seem too small, but maybe that's a limitation of the scope
<wpwrak> what is weird is that TP36 (PROGRAM_B_2) comes down. so either there's contamination from INIT_B_2 or FLASH_RESET_N, or the reset chip triggers for some unfathomable reason, or PROGRAM_B_2 becomes an output.
<wolfspraul> wpwrak: hmm. but what's wrong with the first picture then?
<wolfspraul> bad measurement?
<wolfspraul> or there was a 6V pulse on a 3.3V line?
<wpwrak> i'm curious what adam thinks he measured there :)
<wpwrak> for now, 0x36 looks more like a case of "multiple organ failure"
<wpwrak> err 0x39
<wpwrak> maybe put it aside and proceed with the next from the list we made yesterday
<wolfspraul> hmm
<wolfspraul> no other ideas for 0x39 ?
<wolfspraul> at least we can try "pld reconfigure" over urjtag
<wolfspraul> we can take another board and see whether we find the same permanent reset state
<kristianpaul> I'm out..
<kristianpaul> gn8
<wpwrak> for 0x39, the next thing i would try with the information we currently have is to remove the diode between reset chip and FLASH_RESET_N
<wpwrak> decouple the two systems, at the risk of now getting real NOR corruption
<wolfspraul> sure let's do that then
<wolfspraul> remove the diode and it may boot again?
<wpwrak> but i'd be curious about what lekernel thinks of the PROGRAM_B_2 net going low
<wpwrak> as i understand things, that can only happen if the reset chip pulls low, which it has no reason to do
<wpwrak> but my understanding may be incomplete
<wolfspraul> I'm curious why Adam said we should 'forget' the first tp36 picture
<wpwrak> if there's a condition in which PROGRAM_B_2 could become an output and pull low, that would be interesting to know
<wpwrak> yes, me too :)
<wolfspraul> we could replace the reset ic
<wolfspraul> but I hesitate to do these kinds of things while we are in analysis mode
<wpwrak> "do not look at the elephant" ;-)
<wolfspraul> well just replace with a new one
<wolfspraul> but maybe then our beautiful study object will work again and not tell us any more interesting stories
<wpwrak> or maybe just leave it out for testing. afaik, the FPGA shouldn't need it
<wpwrak> that could happen :)
<wolfspraul> yes
<wolfspraul> ok so
<wolfspraul> 1) find out why Adam said to ignore the first tp36 scope picture
<wolfspraul> 2) try 'pld reconfigure' from urjtag and see whether it stays in permanent reset
<wolfspraul> 3) remove the diode between reset ic and FLASH_RESET_N, see whether it boots
<wolfspraul> 4) remove the reset IC, see whether it boots
<wolfspraul> correct?
<wpwrak> but as i said, there are several things that look wrong on 0x39. the 6+ V spike is worrying, if it's real
<wolfspraul> and then maybe, take another board and check whether we find a similar permanent reset condition there
<wpwrak> what does "pld reconfigure" do ? is this a reset ?
<wolfspraul> well
<wolfspraul> yes
<wolfspraul> seems like
<wpwrak> agreed on 1)
<wpwrak> 2) also seems reasonable
<wolfspraul> unfortunately I don't know the exact behavior of reset
<wolfspraul> will it automatically load the standby from nor?
<wolfspraul> is it possible that it loads a corrupted bitstream from nor which then locks itself (the fpga) up in permanent reset?
<kristianpaul> yes it will wolfspraul
<wpwrak> before 3), i'd like to have lekernel's opinion on PROGRAM_B_2 being driven low at ~20 mA while the board is powered
<wolfspraul> how come when we flash a board for the first time (nor empty), it will not load anything from nor (it's impossible because there is nothing there yet)
<wpwrak> it probably tries to load NOR but fails (or maybe the CRC is correct and it just loads garbage :)
<wolfspraul> well
<wolfspraul> can it hang itself up?
<wolfspraul> can the fpga itself be stuck in a loop that always ends in a permanent reset?
<wpwrak> can PROGRAM_B_2 become an output ? :)
<wolfspraul> ok, so no #3 or #4 until we hear from Sebastien
<wolfspraul> but in the meantime we can look at another board, with the new permanent reset focus
<wpwrak> yup
<wpwrak> if the same pattern appears on other boards, that would be good to know
<wolfspraul> ok we have 0x3C
<wolfspraul> but that one may not yet be in this state
<wpwrak> for all we know, 0x39 may have been ESD-fried ;-)
<wolfspraul> unlikely
<wolfspraul> if we would be hunting a rare problem (like 1 out of 100), and always poking around on the same board, then after some time I would agree and say "let's forget it until we have more boards"
<wpwrak> we do have photographic evidence of a 6+ V spike in a system that runs at 3.3 V and is supplied from a ~5 V supply. the supernatural is already there, on digital film :)
<Thihi> Dunno if anyone of you saw this, since I pasted this during the night. Anyway: http://kukka.siilo.fi/~kuutio/11-08-13-kissastuskausi.mkv - you guys might be interested in this. A small sample of what I do with a projector and a camera. Music has been ripped off from Boards of Canada.
<wolfspraul> well that may be cleared up fast
<wolfspraul> but anyway, we have enough boards and a problem cluster now to be sure it's not caused by ESD or other one-off phenomena
<wolfspraul> that's why we couldn't effectively dig in on the rc2 run (in addition to making mistakes how to handle it there)
<wpwrak> wolfspraul: oh, i think the cluster is real. just don't know what's up with 0x39
<wpwrak> 0x39 exhibits at least two phenomena that contradict my understanding of things: 1) the spike, 2) PROGRAM_B_2 being driven low (for more than 200 ms)
<wolfspraul> we could take 0x3C, 0x7F, 0x61, 0x40
<wolfspraul> I think 0x40 is erroneously set to 'available'
<wolfspraul> let's look at 0x40 first, then we can clear that up as well
<wolfspraul> if 0x40 is really good now, we can take 0x61 ?
<wolfspraul> well we have plenty
<wolfspraul> we just try to find a second one to support the permanent reset theory
<wolfspraul> wpwrak: the spike supports my idea that some 'bad' event is happening that may sometimes cause lasting damage
<wolfspraul> and for program_b_2 being driven low, I would think we find more instances of that now that we look for it, on 0x61 and others
<wolfspraul> let's see
<wpwrak> my list of boards that look as if they belonged to the cluster: 0x36, 0x3a, 0x55, 0x67, 0x6d, 0x6f, 0x70, 0x77, maybe 0x7a
<wolfspraul> he
<wolfspraul> all different from mine
<wolfspraul> ok let me look at your list...
<wpwrak> yeah :)
<wpwrak> we can pick one from each list ;-)
<wolfspraul> ah ok
<wolfspraul> I stay away from boards that have never rendered before
<wolfspraul> such as 0x36, 0x3A
<wpwrak> i see
<wolfspraul> your whole list :-)
<wolfspraul> of course it could be the same thing
<wolfspraul> maybe on those boards right from the beginning
<wolfspraul> but we risk running into one that simply has bad flash soldering or so
<wpwrak> yes, could be
<wolfspraul> aw: there you are :-)
<wolfspraul> so...
<wolfspraul> we have plenty of new ideas :-)
<wolfspraul> ready?
<wpwrak> aw: and what do you think was what looks like a >= 6 V spike ?
<aw> roh, back from post office and picked 2nd shipment up, tks.
<aw> wpwrak, when you read that one, please divide by 10; i forgot to set the setting to X1. ;-)
<aw> so forget it
<wpwrak> aah ! :)
<aw> and just use the second picture with two channels. sorry about that.
<aw> that second is exactly correct. ;-)
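The correction Adam describes is simple scaling: when the scope channel is set to a different attenuation than the probe actually has, every displayed voltage is off by the ratio of the two. A minimal sketch, assuming the probe was physically at X1 while the channel was configured for X10 (the function name and parameters are illustrative only):

```python
def corrected_voltage(displayed_v: float, probe_factor: float,
                      scope_setting: float) -> float:
    """Rescale a displayed voltage when the scope's attenuation setting
    does not match the probe's real attenuation factor."""
    if probe_factor <= 0 or scope_setting <= 0:
        raise ValueError("attenuation factors must be positive")
    return displayed_v * probe_factor / scope_setting


if __name__ == "__main__":
    # The apparent 6 V pulse with an X1 probe on a channel set to X10.
    print(corrected_voltage(6.0, probe_factor=1.0, scope_setting=10.0))
```

With those assumed settings the apparent 6 V pulse corresponds to roughly 0.6 V at the test point, which no longer exceeds the 3.3 V rail.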
<wpwrak> okay, makes a lot more sense then :)
<aw> alright, so what are the new ideas?
<wpwrak> waiting for lekernel to tell us if PROGRAM_B_2 can be an output
<wolfspraul> aw: next idea is this:
<wolfspraul> take 0x39, power it with jtag-serial connected to your computer
<wolfspraul> usb full-speed as always
<wolfspraul> then run 'jtag' manually (not a script)
<aw> wpwrak, aha...if PROGRAM_B_2 has been pulled low while powered-up? just guess, right?
<wolfspraul> then "cable milkymist" then "detect" then "pld reconfigure"
<aw> wolfspraul, okay
<wolfspraul> I don't know about the two 'instruction' lines from the script
<wolfspraul> maybe we add those too?
<wolfspraul> so it's
<wolfspraul> 1. cable milkymist
<wolfspraul> 2. detect
<wolfspraul> 3. instruction CFG_OUT 000100 BYPASS
<wolfspraul> 4. instruction CFG_IN 000101 BYPASS
<wolfspraul> 5. pld reconfigure
<wolfspraul> type those commands manually
<wolfspraul> then check whether the board is still in permanent reset
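For repeating the five commands above on several boards without retyping them, they could be fed to UrJTAG from a small wrapper. This is a sketch under assumptions: it presumes the UrJTAG binary is called `jtag` and accepts commands on standard input, neither of which is confirmed in this log, and the helper names are hypothetical. The command strings themselves are copied from the list above.

```python
import subprocess

# Command sequence exactly as listed in the chat.
RECONFIGURE_COMMANDS = [
    "cable milkymist",
    "detect",
    "instruction CFG_OUT 000100 BYPASS",
    "instruction CFG_IN 000101 BYPASS",
    "pld reconfigure",
]


def build_script(commands=RECONFIGURE_COMMANDS) -> str:
    """Join the commands into one newline-terminated script."""
    return "\n".join(commands) + "\n"


def run_jtag(script: str, jtag_binary: str = "jtag") -> str:
    """Pipe the script into the jtag binary and return its output.
    Assumes the binary reads commands from stdin (not verified here)."""
    result = subprocess.run([jtag_binary], input=script, text=True,
                            capture_output=True, check=False)
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Print the script only; call run_jtag(build_script()) with a board attached.
    print(build_script(), end="")
```

After the run, TP37 still has to be measured by hand to see whether the board left the permanent reset state.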
<kristianpaul> try middle button after just in case :)
<wolfspraul> no confusion. that's later I think.
<kristianpaul> s/after/later
<wolfspraul> you want to try whether it boots? it first needs to survive reconfiguration...
<kristianpaul> yeap
<wolfspraul> kristianpaul: are the two 'instruction' lines necessary?
<kristianpaul> dont know..
<kristianpaul> i don't think so, but i can't argue a reason now
<wolfspraul> we just leave them in
<aw> after those cmd, TP37(RP#) is 236mV
<aw> d2/d3 dimly lit surely :)
<wolfspraul> ok
<kristianpaul> argh..
<wolfspraul> so 'pld reconfigure' does not do any magic
<wolfspraul> no problem
<wolfspraul> aw: I have a question about 0x40
<wolfspraul> why is it set to 'available'? (I look at the wiki test results)
<aw> wolfspraul, mm..good catch. I must have powered it on again and the rendering test passed, so I marked it as 'available'; this was before you told me not to mark a board 'available' once it has had d2/d3 dimly lit.
<wolfspraul> hmm
<aw> delete it now.
<wolfspraul> so the 'notes' are not complete?
<wolfspraul> ok I got it already, so the board does work now
<wolfspraul> alright, let's look at 0x3C then
<aw> no. all rendering tests must have passed, so i marked it available but forgot to fill in some notes.
<wolfspraul> got it
<wolfspraul> let's put 0x39 aside, and look at 0x3C
<wolfspraul> power it, see whether it boots...
<wolfspraul> I want to find another board that stops with d2/d3 dimly lit, or cannot reconfigure, or cannot reflash
<aw> okay
<aw> 0x3c: can reconfigure
<aw> so do same manual cmd like above?
<aw> 0x3c: TP37 surely is 3.3V now
<wolfspraul> wait
<wolfspraul> try a few power cycles with test software (up to crc check only)
<wolfspraul> 10
<aw> ok
<aw> mm....1st press middle btn then d2/d3 dimly lit now
<wolfspraul> well nice
<wolfspraul> tp37
<aw> wait
<aw> TP37: the level is now very messy, from 1.2V to 3.3V. messy pulses!
<aw> mm...now is 3v3 and d2/d3 dimly lit is GONE.
<wolfspraul> he
<wolfspraul> but I think what you saw initially "messy from 1.2 to 3.3" and then to 3.3 and then dimly lit is gone all confirms our work so far
<aw> too bad that I can't take TP3 pictures while the pulses are unstable.
<wolfspraul> no problem, we believe you and it's in the chat :-)
<aw> yes
<aw> so what do we need next?
<wolfspraul> let's see whether wpwrak is still around
<aw> press middle btn again?
<wolfspraul> sure, try
<wolfspraul> I would think it boots
<aw> same now
<aw> i stop the scope
<wolfspraul> same what?
<aw> let me take pictures
<wolfspraul> you pressed the middle button and then?
<wolfspraul> oh, more 'pulses' maybe :-)
<aw> when d2/d3 are dimly lit, TP37 has messy pulses from 1.2V to 3.3V, many pulses varying within this range
<wolfspraul> did you press the middle button? what happened?
<wolfspraul> if you can keep it in this state of 'messy pulses', can you measure the other test points as well?
<wolfspraul> aw: did you press the middle button? what happened?
<aw> actually its low level reached down to at least 1.2V
<wolfspraul> please describe the sequence of events there, then it's easier to come up with theories
<aw> it goes into d2/d3 dimly lit after I press middle btn again
<wolfspraul> interesting
<wolfspraul> so it did not boot
<wolfspraul> instead, d2/d3 dimly lit again?
<aw> now it's in dimly lit
<wolfspraul> ok
<aw> maybe my probe let TP37 recover and then rise to 3V3
<wolfspraul> well, we have to wait for more thoughts from Werner or Sebastien. I suggest you continue with regular testing and fixing across the batch.
<aw> so reset on flash chip is asserted then d2/d3 became fully OFF
<aw> then it goes dimly lit after I press middle btn again.
<wolfspraul> with us finding a second board right away that may very well be in a similar or same 'permanent reset' state, I think it confirms what we found on 0x39
<wolfspraul> right now it's dimly lit?
<wolfspraul> can you measure the other test points?
<wolfspraul> TP36, INIT_B_2
<wolfspraul> I think those, right?
<aw> now 0x3c: TP37 keeps stable low (209mV, surely dimly lit)
<wolfspraul> measure TP36, INIT_B_2
<aw> TP37(flash chip reset pin), TP36 (PROGRAM_B_2)
<wolfspraul> yes just to compare with 0x39, measure TP36 and INIT_B_2
<aw> 0x3c: program_b_2 TP36 is stable 3.3V
<aw> we need to know also from lekernel : what fpga does after pressing middle btn?
<aw> now...i am blind though
<wolfspraul> did you measure init_b_2 ?
<aw> init_b_2 is good low
<wolfspraul> ok
<wolfspraul> I suggest you continue with regular testing and fixing now
<aw> tp36 and tp37 are now pulsing unstably together!
<wolfspraul> ok
<wolfspraul> we have enough data about 0x3C (I do)
<aw> unstable (something like high impedance from the fpga), maybe. I don't know.
<wolfspraul> I suggest you go back to the regular testing and fixing
<aw> mm
<wolfspraul> the more 'other' bugs we can fix across all boards, the less likely we are to later be confused when investigating the 'permanent reset' problem more
<wolfspraul> keep the 'notes' column updated with anything unusual or suspicious you see with a board
<wolfspraul> also, from now on, I suggest if you run into any of these problems: 1) d2/d3 dimly lit, 2) cannot reconfigure, 3) cannot reflash, you measure TP36 and write the value you see in the notes column
<wolfspraul> or maybe TP36 and TP37 - both? don't know
<aw> ok
<wolfspraul> maybe both :-)
<aw> well...too many
<wolfspraul> ok, then only TP36
<aw> 0x39 and 0x3c is good data now
<wolfspraul> no I mean for new boards
<wolfspraul> when you test them
<aw> well....umm...ok
<wolfspraul> and if you run into dimly lit/reconfig/reflash problem
<wolfspraul> normally you would stop there
<aw> once i run into failure, i write notes. ;-)
<wolfspraul> but now you measure TP36, and write it into the 'notes' column
<wolfspraul> yes
<wolfspraul> aw: I think that's a good idea, no?
<aw> well...write firstly though
<aw> i am a manual operator for the tests now...a little afraid my memory will lose track of so many boards. not a bad idea...just slow. ;-)
<wolfspraul> why forget. just go through the batch one by one, let's fix everything we know we understand.
<wolfspraul> starting from the easiest fixes, to the more difficult ones
<wolfspraul> in parallel when Werner or Sebastien are back we continue with the permanent reset investigation
<wolfspraul> but now back to regular testing...
<aw> 0x3c & 0x39 & 0x40 I updated results
<wolfspraul> nice
<aw> yes, I go for others now
<wolfspraul> good
<wolfspraul> let's fix all easy and simple things we already know about clearly
<aw> scope pictures linked there.
<aw> yes
<lekernel> "still stopped 'Bitstream length: 1484404'"
<lekernel> please do NOT make such reports again. instead, input the urjtag commands manually and do not use the batch file. better, enable some debug output.
<wolfspraul> lekernel: any idea how it can end up in the permanent reset state as observed earlier?
<lekernel> so it's confirmed? what we thought was "NOR corruption" on RC3 is just permanent reset?
<wolfspraul> well. making the judgment is the hard part.
<lekernel> hard? wtf
<lekernel> is the voltage on TP37 high or low when the board fails?
<wolfspraul> no, making the judgment of whether we are looking at a "nor corruption" or a "permanent reset"
<wolfspraul> because we think about a lot of boards and even more testing data. there may be multiple bugs, or different problems on different boards.
<lekernel> ok, whatever
<wolfspraul> earlier we did some tests on 0x39, did you see that in the backlog?
<lekernel> on the board you are debugging right now
<lekernel> what is the voltage on TP37 when it fails?
<lekernel> and I just mean after initial power up , no booting etc.
<wolfspraul> on 0x3C, we had pulses between 1.2V and 3.3V on TP37
<wolfspraul> on 0x39, it was around 200mV I think, checking backlog...
<lekernel> it should never be 1.2V
<lekernel> and never pulse
<wolfspraul> yes
<wolfspraul> TP37 318mV on 0x39
<lekernel> 200mV is not correct either, that would permanently reset the flash
<wolfspraul> correct
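The TP37 readings discussed above can be sorted into three cases. The following is only a minimal sketch in Python: the function name `classify_tp37` and the thresholds `low_max` and `high_min` are assumptions chosen for illustration, not values from the schematics. The only facts taken from the log are that TP37 should sit at 3.3V, that roughly 200-300mV keeps the flash in reset, and that 1.2V levels or pulses should never occur.

```python
def classify_tp37(samples_v, low_max=0.4, high_min=3.0):
    """Classify FLASH_RESET_N (TP37) from voltage samples given in volts.

    low_max and high_min are hypothetical thresholds, not datasheet values.
    """
    if not samples_v:
        raise ValueError("need at least one sample")
    lowest, highest = min(samples_v), max(samples_v)
    if lowest >= high_min:
        return "released"       # steady near 3.3 V: flash is out of reset
    if highest <= low_max:
        return "held in reset"  # steady near 0 V: flash permanently reset
    return "contention"         # mid-level or pulsing: should never happen


# Readings quoted in the log (converted to volts).
print(classify_tp37([0.318]))          # 0x39
print(classify_tp37([1.2, 3.3, 2.0]))  # 0x3C while pulsing
```

A single multimeter reading cannot reveal pulses, so the "contention" case really needs several scope samples.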
<wolfspraul> are you reading the backlog at all?
<lekernel> yes but it's not quite clear
<lekernel> wasn't Adam supposed to measure what drives that TP37 low?
<wolfspraul> hmm
<wolfspraul> ok, seems we are stuck
<wolfspraul> did we answer some of Werner's questions?
<wolfspraul> lemme see...
<wolfspraul> "waiting for lekernel to tell us if PROGRAM_B_2 can be an output"
<wolfspraul> the answer seems to be: no
<lekernel> yes, it is "no"
<wolfspraul> good, thanks :-)
<wolfspraul> now, there was another one
<wolfspraul> "high current on TP36 should only exist if: - PROGRAM_B_2 is actively pulling low (which, afaik, it never does) - the reset chip is pulling low (which it has no reason to do) - something is shorted into that net
<wolfspraul> and it pulls low with ~20 mA
<wolfspraul> so PROGRAM_B_2 is not the culprit
<wolfspraul> that leaves the reset ic or some short
<wolfspraul> Werner's next idea was to remove diode and reset ic and see what happens
<lekernel> ok, sounds good
<zumbi> lekernel: hello! someone pointed me to talk to you about some question I had
<zumbi> lekernel: basically I was looking into converting bitstream to netlist
<zumbi> lekernel: I found a tool called 'debit' at ulogic.org, but it does not seem to be online anymore
<zumbi> lekernel: do you know where I can find such a tool? or any idea on how to reverse engineer a bitstream?
<zumbi> lekernel: oh! wow, thanks
<kristianpaul> lekernel: what's the behavior of the fpga when it never gets a bitstream from nor?
<kristianpaul> of our fpga in rc3 of course
<wpwrak> booting ... (sorry for my erratic napping pattern. bloody cold is messing with me :-( )
<wolfspraul> wpwrak: wow, take good care of yourself
<wolfspraul> your relentless support for Milkymist One is amazing anyway, I'm flattered and feel bad that I cannot debug those bloody circuits better myself :-)
<kristianpaul> indeed
<wolfspraul> wpwrak: Sebastien just confirmed that PROGRAM_B_2 cannot be an output, I guess that means next on 0x39 we remove the diode?
<wolfspraul> aw: can we do another 0x39 session?
<wpwrak> (reading backlog ... INIT_B_2 has no test point. but can be measured on R157)
<aw> yes, go on
<wolfspraul> wpwrak: oh we had some interesting results on 0x3C, but I think they confirm what we saw on 0x39
<wolfspraul> aw_: go back to 0x39, turn it on, tell us what happens
<aw_> are you sure 0x39 or 0x3c(this has messy pulses)
<wolfspraul> 0x39
<aw_> mm
<wpwrak> yeah, let's remove the diode. untangle the knot a bit :)
<wolfspraul> is it clear which one?
<wpwrak> (the diode between reset out and FLASH_RESET_N, keep the one between INIT_B_2 and PROGRAM_B_2 for now)
<wpwrak> (configuration document) ah, a godsend ;-) just wished it was 1/10 the size :)
<wolfspraul> wpwrak: did you see the 3C results? voltage between 1.2 and 3.3...
<wpwrak> (0x3c results) yeah, says "something's weird" :)
<wolfspraul> looks similar to what we see on 0x39, no?
<wpwrak> i wonder if we have some unintended connections between things
<wpwrak> maybe the "fix2" rework left some trouble
<wolfspraul> keep in mind boards first working and then falling into this state
<aw_> wpwrak, remove the diode which is between reset out and FLASH_RESET_N?
<wolfspraul> aw_: wait, first turn 0x39 on and tell us what you see
<wpwrak> aw: yes
<wolfspraul> d2/d3 still dimly lit? voltage on tp36 ?
<wolfspraul> wpwrak: so at least for the part of the problem that we see on a board that first works and then falls into permanent reset, it cannot be caused by some permanent short/connection on the board
<wolfspraul> maybe current flows the wrong way somewhere and slowly damages a part?
<wpwrak> voltages around 1.2-1.3 V look like some things working against each other
<aw_> 0x39: d2/d3 dimly lit, tp36- 78mV, tp37 - 238mV, init_b - 35mV
<wpwrak> i'm not sure we're seeing actual damage
<wolfspraul> ok perfect
<wpwrak> aw_: that's after diode removal ?
<wolfspraul> aw_: now power off, remove the diode
<wolfspraul> wpwrak: no, before (pretty sure)
<wpwrak> ah :)
<aw_> still have diode
<wpwrak> the executioner's axe don't swing so swiftly ;-)
<wolfspraul> wpwrak: one difficulty is that the problem is known to have spontaneously disappeared before
<wolfspraul> so even if the removal of the diode makes it go away, it may not be because of the removal of the diode
<wpwrak> indeed. but if it doesn't come back on, say, 0x39, that's an indication
<wolfspraul> yes sure
<wpwrak> besides, we should still observe anomalies
<wolfspraul> just saying why I wanted the baseline
<wpwrak> but now the anomalies will be in separate analog domains. easier to tame (i hope) :)
<aw_> so now, remove diode which between reset out and FLASH_RESET_N ?
<wpwrak> yup
<wolfspraul> have we ruled out some stupid mistake on the affected boards like diodes in wrong polarity? some mistake to the circuit on those particular boards?
<wolfspraul> since some steps were done manually (fix2), those would need to be double-checked
<wpwrak> i think so. i've asked adam to do a visual inspection and also to check the diodes
<wolfspraul> or would a wrong polarity diode lead to different behavior anyway?
<wpwrak> but in any case, a systematic search will turn up such things too
<wpwrak> lekernel: is the internal pull-up on P22 (FLASH_RESET_N) asserted during reconfiguration ?
<aw_> 0x39: d2/d3 dimly lit after removed diode(D16), tp36 - 45mV, tp37 - stable 3.3V, init_b  - 26mV
<lekernel> yes, it should be
<lekernel> but if it's not, that might well be the problem
<wpwrak> aw_: very good. FLASH_RESET_N is off the hook for now
<aw_> wpwrak, yup
<wpwrak> now, who's driving PROGRAM_B low ?
<wpwrak> we have three candidates: 1) the reset chip, 2) INIT_B, 3) divine intervention
<wpwrak> aw_: can you please power down and test whether the diode between PROGRAM_B and INIT_B does indeed work like a diode ? (if your multimeter has a diode test, that would be the easiest)
<aw_> wpwrak, do we need to trigger program_b again to see power on sequence?
<aw_> wpwrak, ok
<wpwrak> (trigger PROGRAM_B) heh, that seems to happen even without us doing anything. little gremlins are at work here ;-)
<aw_> wpwrak, you got it right, it's not behaving like a diode now
<aw_> wpwrak, but let me confirm this again
<lekernel> crappy diodes breaking down?
<wpwrak> lekernel: maybe bad soldering
<wpwrak> wolfspraul: or fake diodes ? ;-))
<wolfspraul> a ghost
<aw_> wpwrak, forward voltage is 11.8mV, reverse voltage is -9mV
<lekernel> lol
<wpwrak> the small voltage difference between INIT_B and PROGRAM_B suggests that they don't act much like diodes ...
<wpwrak> hehe ;-)
<aw_> now I am going to take apart this diode and measure it again
<wpwrak> they're 0R ! ;-))
<lekernel> ah, you did not take it apart?
<wolfspraul> is there a chance we are wearing out diodes, or is that impossible?
<lekernel> you should not measure mounted diodes, this gives wrong measurements in most of the cases
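The diode-test readings above (about 11.8mV forward and -9mV reverse) look like a short rather than a junction. As a rough sketch of that reasoning in Python: the function `classify_diode` and its thresholds are hypothetical and only reflect the general rule that a working diode shows a forward drop of a few hundred millivolts and reads open in reverse. As noted, in-circuit readings can mislead, so this only makes sense for a part measured out of circuit.

```python
def classify_diode(forward_v, reverse_v,
                   short_max=0.05, forward_min=0.15, forward_max=1.0):
    """Interpret multimeter diode-test readings (volts).

    None stands for an over-range (open) reading.
    All thresholds are illustrative assumptions, not datasheet values.
    """
    def is_short(v):
        return v is not None and abs(v) <= short_max

    if is_short(forward_v) and is_short(reverse_v):
        return "short"          # near 0 V both ways: behaves like 0R
    if forward_v is None and reverse_v is None:
        return "open"           # no conduction in either direction
    if (forward_v is not None and reverse_v is None
            and forward_min <= forward_v <= forward_max):
        return "diode"          # conducts one way only, plausible drop
    return "inconclusive"


# Readings reported for the PROGRAM_B / INIT_B diode on 0x39.
print(classify_diode(0.0118, -0.009))
```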
<wpwrak> wolfspraul: highly unlikely
<wolfspraul> ok, scratched off
<lekernel> wolfspraul, with the small currents and voltages they are supposed to handle here, if they do wear out, they're probably counterfeit pieces of crap
<wpwrak> wolfspraul: and i think these are quite sturdy
<wpwrak> wolfspraul: you could probably wear them out with thermal abuse, though. should still take quite an effort, though
<wolfspraul> can we check the one we took off as well (between reset and FLASH_RESET_N)?
<lekernel> we had something weird happening on rc1. Adam reworked two video chips (reconnecting the pixel bus output correctly), which died with an internal power supply short some ~20s after power applied
<lekernel> exact same problem on the two boards
<lekernel> it was never explained
<lekernel> mwalle did the exact same rework, and it went fine on his board
<lekernel> on rc2 we have applied the change in the pcb layout, and it also went fine
<aw_> wpwrak, yes, your analysis is right, my diode doesn't act as a diode.
<lekernel> I don't know what went on there
<wpwrak> aw_: you removed it and tested it out of circuit ?
<wolfspraul> aw_: can you check the one you took off as well?
<lekernel> but this looks vaguely similar
<aw_> wpwrak, thank you for catching this. Super! so going to solder a new one. :(
<wolfspraul> lekernel: where is the similarity?
<wpwrak> lekernel: by the way, where did you find that INIT_B needs to be pulled together with PROGRAM_B in order to have an effect ?
<wpwrak> aw_: (solder new one) wait a minute ... try to boot without diodes first
<wolfspraul> aw_: wait. shouldn't we check the diode we removed as well?
<aw_> wolfspraul, the D16 I took off, I just measured it too, it's okay. ;-)
<lekernel> that some semiconductor device shorted itself some time after a rework
<lekernel> wpwrak, we shouldn't need to pull PROGRAM_B low, but the initial PCB layout has the trace, and it's additional work to cut it
<lekernel> plus it should not hurt
<Fallenou> lekernel: are you using gcc 4.5.3 or gcc 4.5.2 ?
<wpwrak> lekernel: i was more thinking of pulling only PROGRAM_B, without INIT_B
<wolfspraul> aw_: boot without diodes first now (see werner's msg)
<Fallenou> will try a diff on our git repo (for rtems) and their cvs head, to try to understand why zlib compiles in their cvs head and not in our git
<lekernel> wpwrak, the xilinx doc says you should use INIT_B to delay configuration
<lekernel> but it's not very clear
<wpwrak> lekernel: they seem to say the same about PROGRAM_B
<aw_> since I had to solder the diode's two terminals, yes, it was not acting as a diode. bad...i should have measured its forward voltage.
<wpwrak> lekernel: e.g., page 51 (picking a random one), before the section title
<lekernel> "Before the Mode pins are sampled, INIT_B is an input that can be held Low to delay configuration. "
<wpwrak> oh, that's actually the place that talks about NOR. lucky coincidence ;-)
<lekernel> ah, yes, page 51 says both approaches are correct
<wpwrak> so maybe we can leave INIT_B out of the mess. that would help in general
<aw_> wpwrak, yes, without (two diodes), now it boot up and rendering.
<wpwrak> then we only need to coordinate PROGRAM_B and FLASH_RESET_N
<wpwrak> does a happy little dance
<lekernel> is sick of those broken components
<wpwrak> welcome to the wonderful world of hardware ;-)
<lekernel> i've never seen anything this bad
<wolfspraul> lekernel: yes. maybe you should do more software and less hardware :-)
<wolfspraul> aw_: I think next step is to put 2 good diodes back on, and see whether it boots.
<wolfspraul> wpwrak: agreed?
<aw_> wpwrak, init_b = 1.2V, tp37 = 3.3V, tp36 = 3.3V
<wolfspraul> lekernel: if you build something one day, Werner and I will volunteer to help you. no worries :-)
<lekernel> I will admit I have a relatively limited experience with manufacturing, but from what I've heard and seen, this project is by far the one which is hit the hardest by broken/counterfeit components
<wpwrak> wolfspraul: no. i'd get rid of the diode connecting INIT_B. lekernel: agreed ?
<wolfspraul> lekernel: no it's not.
<wolfspraul> the one thing we could do better is to throw more money and people at the problem.
<wolfspraul> in hardware parallelism works quite well, unlike in software (MMM)
<wpwrak> aw_: lovely
<wolfspraul> that we cannot do, it is beyond my capabilities
<aw_> wpwrak, you did a happy little dance now? ;-)
<aw_> but really sorry that still this was my fault on diode soldering. :(
<wolfspraul> lekernel: yes, just wanted to say. why broken components?
<wolfspraul> maybe soldering
<wolfspraul> plus we have over 20 boards
<wolfspraul> let's see
<wolfspraul> so what is the solution now? only 1 diode now?
<wpwrak> (parallelism) indeed. adam is our bottleneck here. and with all the workload, he probably doesn't even have time to think about those problems himself, so we're wasting another analyst
<lekernel> yeah, and make sure that one diode will not go bad
<wpwrak> aw_: (dance) well, figuratively. i'm too lazy to get out of my chair :)
<wolfspraul> ok only one diode now
<wolfspraul> aw_: have you put that one (and working) diode back on
<wolfspraul> does the board boot?
<lekernel> but didn't we add INIT_B to fix some intermittent no-configuration problems initially?
<wpwrak> aw_: was it soldering or is the component (diode) bad ?
<aw_> wpwrak, yup..good question. I can't tell whether the short came from a bad diode or from my soldering.
<wolfspraul> wait, slow down
<wolfspraul> I still have my question open
<wolfspraul> aw_: did you put 1 diode back on? it boots?
<wpwrak> lekernel: as far as i remember, when we ran into the first set of reset troubles, you said that you had discovered in the xilinx docs that PROGRAM_B alone was not enough, and then the change was made. but i don't remember any observation in the real circuit triggering this.
<aw_> wolfspraul, i haven't put 1 diode back on
<wolfspraul> and also what lekernel just said - didn't we add INIT_B to fix something?
<wolfspraul> aw_: ok, put that back on
<wolfspraul> bring the board into the state that we believe it is perfect
<lekernel> actually, if INIT_B is not needed, it becomes a very simple rework
<lekernel> compared to the initial RC3 schematics, basically install C238 and change two resistors
<wpwrak> btw, i only realized yesterday that the "flash length = 14xxxx" or such message meant that the download hadn't even started. all the time, i had somehow imagined that it had stopped somewhere in the middle.
<wolfspraul> hmm
<aw_> and those two diodes are taken off now and it's rendering
<aw_> so later, if I need to solder a diode, i can measure that it's good before soldering and then measure it again after soldering.
<aw_> from now on, any diode i take off, I won't solder back. will use a new one.
<wolfspraul> of course
<wpwrak> (simple rework) after doing the complicated one ;-))
<wolfspraul> so wait
<wolfspraul> should Adam put the 1 diode back on?
<wolfspraul> what is the best design we have in mind now?
<wpwrak> the one between PROGRAM_B and FLASH_RESET_N, yes
<wolfspraul> and now lekernel wants to install C238 and change two resistors?
<wpwrak> C238 should already be there
<wpwrak> the resistors should already have been changed
<wolfspraul> ah ok
<wolfspraul> sorry too many details I lost track
<wpwrak> now it's really just a matter of removing the INIT_B diode. and the wire, if you want
<wolfspraul> so... aw_ one diode back on and let's see whether it boots
<wolfspraul> we should approach this carefully, we have over 20 boards with reconfigure/flash/leds dimly lit problems
<wolfspraul> if they all go down to a non-working diode, well, great
<wpwrak> lekernel: how confident are you about not needing R60 ? (RP# pull-up)
<aw_> wpwrak, wait you said removing INIT_B diode?
<wpwrak> aw_: yes. the one that you found to be bad
<wpwrak> aw_: so instead of two diodes, we now only use one
<aw_> wpwrak, so keep D16
<lekernel> the Xilinx docs says clearly the FPGA has pull up resistors, and we observe them with the dimly lit LEDs
<lekernel> so I'm pretty confident about that
<lekernel> as long as there are no glitches, though
<aw_> wpwrak, just keep D16? how about R60?
<wpwrak> lekernel: i don't question that the FPGA has them. just whether we're sure it uses them all the time :)
<aw_> wpwrak, the current 0x39 has R60 10K
<wpwrak> ah yes, that's from an experiment earlier tonight
<aw_> yes
<wpwrak> aw_: let's wait for lekernel's verdict
<aw_> so keep R60 or remove it then boot again?
<aw_> wpwrak, hmm..okay
<lekernel> it should use them all the time, yes
<wpwrak> lekernel: so away with R60 ?
<lekernel> yes, that should not be needed
<lekernel> and it increases the load on the reset IC, which has limited current capability
<wpwrak> aw_: so, no R60 then
<aw_> so i go for: 1. solder D16 back 2. remove R60 3. remove diode between program_b and init_b
<wpwrak> yes
<wolfspraul> only on 0x39 for now
<wpwrak> adam needs a nocturnal twin brother who could then take over power-cycling 0x39 a gazillion times to see if it really survives :)
<wpwrak> but i think "we got him"
<wpwrak> wolfspraul: btw, good cluster analysis connecting reconfig failure with usb-jtag
<wolfspraul> well. I'm not so sure about this yet.
<wolfspraul> all of these problems and troubles just because of bad soldering?
<wpwrak> could be. we don't know yet what exactly happened with those diodes
<wpwrak> it's unusual for diodes to fail this way
<wolfspraul> let's see
<wolfspraul> let's assume 0x39 boots now
<wolfspraul> then what? we do 20 render cycles (without CRC checks in between, just the 30 second rendering and power cycle)
<wolfspraul> assume that works well too
<wpwrak> yeah, do a few cycles, check the crc at the end. do a reflash with usb-jtag, just to confirm this is fine, too
<wolfspraul> on all boards with flash/dimly lit/reconfig problems, we remove the diode between program_b and init_b? and check the other diode (onboard, as much as that is possible)?
<wolfspraul> because now we say we don't even want the program_b/init_b diode anymore?
<wolfspraul> so instead of checking it, we just remove it
<wolfspraul> right?
<wolfspraul> and if that fixes those boards, or a large number of them, then we conclude this to be a design improvement and remove the diode between program_b and init_b on all 90 boards?
<wolfspraul> do I roughly understand this right?
<wpwrak> sounds good to me
<wolfspraul> and we also check (again, if practical onboard) the correct functioning of the remaining diode on all boards
<wolfspraul> before lekernel said they cannot be checked while mounted
<wpwrak> if the diode itself has issues, we may also need to check the other one
<wolfspraul> oh sure
<wolfspraul> so - can it be checked onboard or not?
<wpwrak> depends :)
<wolfspraul> or can a check at least give some indication?
<wpwrak> you can inject a little probe current and see what happens
<wolfspraul> I somehow doubt that all problems will magically go away by removing and checking diodes.
<wolfspraul> but ok, maybe they will :-)
<wpwrak> sometimes the diode is more or less isolated, in which case you can test it in-circuit. sometimes other things in the system will happily act in its stead, and all you get is confusion.
<wpwrak> the cluster may go away :)
<wolfspraul> oh I'm sure some isolated cases will pop up
<wpwrak> do we know what specifically caused the high current consumption of some boards ?
<wolfspraul> but my main concern is to finally have a known-good design and test I can 100% trust, so that the board won't fail a few tests after I stopped testing
<wpwrak> yeah
<wolfspraul> ok I'm out for about 30 min, reading backlog when back
<wpwrak> me too :)
<wolfspraul> take enough rest there, and thank you so much for all your help!!
<wolfspraul> aw_: I'm back in 30 minutes
<aw_> wolfspraul, okay
<aw_> wpwrak, shall we change back to high speed though?
<lekernel> aw_, no, stay in full speed
<lekernel> we don't want to take care of any USB/JTAG issues right now
<aw_> mmm..good
<aw_> now i start to count again
<aw_> 5 already
<aw_> crc check between rendering 30 seconds
<wpwrak> back
<wpwrak> wolfspraul: np :) it's fun to finally kill those gremlins :)
<aw_> i didn't reflash 0x39 again
<wpwrak> aw_: if the CRC is right, it's good :)
<aw_> now 10 times already with crc checks between rendering.
<wpwrak> aw_: at the end, we can do some reflash tests, to confirm that this works, too
<aw_> yes i watched it
<aw_> wpwrak, okay
<wpwrak> one mystery remains: why did the board ever work, with that evil diode misbehaviour ? points to a somewhat scary failure model. but we'll see ...
<wpwrak> lekernel: when the M1 is fully up and running, is there some easy way to force a flash reset ?
<aw_> yes, i also didn't realize why it worked before. :(
<aw_> wpwrak, gui has a "reboot" btn which can let flash reset. ;-)
<aw_> wpwrak, i tried to capture the tp37 reset waveform for you before. ;-)
<aw_> 15
<wpwrak> hmm, what i'm looking for is a flash reset without system reset. to test whether the diode between FLASH_RESET_N and PROGRAM_B is okay. after all, it's the same component ...
<wpwrak> (test in boards that seem "okay")
<aw_> 20
<wolfspraul> I think Adam can remove the crc check between each render cycle
<wolfspraul> that wasn't part of the testing before, and doesn't imitate user behavior either
<wolfspraul> we can do the crc test after 100 render cycles
<wolfspraul> wpwrak: what do you think?
<wpwrak> it's probably safe to do so. doesn't hurt to have it either, though. you never know what you may find ;-)
<wolfspraul> it costs time
<wpwrak> not that i'd expect to find anything
<wolfspraul> yes
<wolfspraul> so remove
<wpwrak> yes, it does :)
<wolfspraul> I don't believe in the NOR corruption story anyway
<wolfspraul> so one test after 100 pure render cycles is enough
<wpwrak> seems that was rc2 only
<wolfspraul> if that really shows a crc problem I eat my words :-)
<wpwrak> rc3 has new excitement to offer :)
<wolfspraul> aw_: you can remove the crc check in between render cycles
<wolfspraul> do the render cycles only
<wolfspraul> and you can do _ONE FINAL_ crc check after the last render cycle
<aw_> okay...final crc check at last
<aw_> so now those values: D16 still there, R30 = R157 = 10K, C238 = 220pF, removed R60 and program_b/init_b diode
<aw_> 25
<wpwrak> aw_: do you have the link to your fix2 schematics ? (with the component numbers)
<aw_> need to modify this later if we're sure about this
<wpwrak> thanks !
<wolfspraul> aw_: wpwrak please let's give this a new name then, like fix2b, or anything, but it must be a new name
<wolfspraul> not fix3 either because we had that already
<wpwrak> i think we went up to "fix4" ;-)
<wolfspraul> I propose fix2b
<wolfspraul> or fix2a, but somehow I like fix2b better
<wpwrak> 2a, 2b seem to be available
<wolfspraul> ok so the new one is fix2b?
<wpwrak> fine with me
<wolfspraul> ok
<wolfspraul> I think we can already say that 0x39 is fine now (with fix2b applied)
<wolfspraul> as a next step, I propose that adam applies fix2b to a number of other boards and we look at the results
<wolfspraul> for testing, if he can make it to the render cycles, he should do 10 full render cycles, but without crc checks in between (maybe one at the end is not bad)
<aw_> 30
<wolfspraul> so I try to select a list of boards now for fix2b, my proposal
<wpwrak> (fix2b to other board) yes, do the cluster (or part of it)
<wpwrak> we still need to have an idea of what went wrong with the diode
<aw_> mm...i need to note fix2b in .ods now. phew~
<Fallenou> lekernel: our rtems git repo has a different cpukit/zlib/zconf.h.in than the one in their CVS head, we don't have the definition of z_off64_t but they have it
<wpwrak> could be: 1) bad soldering (short is outside the diode), 2) component in a constant bad state (which may itself have variations), 3) component degenerating
<wpwrak> 3) would be a problem, because we then can't be sure D16 won't act up in the field
<Fallenou> lekernel: I guess we just need to sync our zconf.h.in with theirs, will try to do a patch for that
<lekernel> Fallenou, if it was changed recently, it should just be a matter of git-cvs update then
<Fallenou> lekernel: well you can try, or we can just cherry pick this file
<wpwrak> 2) has two branches, 2.1) component experienced excessive stresses in rework, 2.2) component arrived at rework in a bad state (fake, production error, box left in the sun, etc.)
<wpwrak> lekernel: when the M1 is fully up and running, is there some easy way to force a flash reset ? (without resetting the whole system)
<lekernel> not at the moment
<wolfspraul> 0x32 0x34 0x39 0x3A 0x3C 0x40 0x48 0x54 0x55 0x5C 0x61 0x63 0x6B 0x6C 0x77 0x7A 0x7D 0x7F 0x85
<wolfspraul> 19 boards
<wolfspraul> they all have a history of d2/d3 dimly lit, cannot reflash or cannot reconfigure. some of them worked before, some not. none have passed all tests.
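For bookkeeping, the board list above could be kept in a small table so the count and the per-board state stay consistent. This is only a sketch in Python: the helper `new_notes_table` and the status string "pending fix2b" are made up for illustration; the board IDs are copied from the list in the log.

```python
# Board IDs copied from the list in the log.
FIX2B_CANDIDATES = (
    "0x32 0x34 0x39 0x3A 0x3C 0x40 0x48 0x54 0x55 0x5C "
    "0x61 0x63 0x6B 0x6C 0x77 0x7A 0x7D 0x7F 0x85"
).split()


def new_notes_table(boards):
    """Map each board ID to a hypothetical initial status string."""
    ids = [int(b, 16) for b in boards]  # also validates the hex IDs
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate board ID")
    return {b: "pending fix2b" for b in boards}


notes = new_notes_table(FIX2B_CANDIDATES)
print(len(notes), "boards")
```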
<wpwrak> lekernel: P22 (FLASH_RESET_N) is driven high when the M1 is up ? or just pull-up ?
<wolfspraul> I think we should apply fix2b to those 19 boards, then look at the results
<wpwrak> wolfspraul: another thing to look for: boards that never had d2/d3 dim but that failed USB JTAG. (if there are any with this combination)
<wolfspraul> somehow I still cannot imagine all this going back to 'bad' diodes
<wolfspraul> wpwrak: what do you mean with "failed usb jtag"?
<wpwrak> that flashing the NOR got stuck
<wolfspraul> yes they are in this group
<wolfspraul> I threw them together with the ones that worked before and then failed now
<aw_> wolfspraul, so that's the next step on 19 boards firstly to apply fix2b?
<wpwrak> ok, perfect
<wolfspraul> aw_: ok let's be precise
<wolfspraul> first we maintain a production focus
<lekernel> driven high
<wolfspraul> aw_: you are testing 0x39 now, if 100% is fine and pass, it goes to 'available' state
<wolfspraul> to be safe, you can write "avail - fix2b" :-)
<wolfspraul> so we remember that it has a fix2b applied
<wpwrak> lekernel: hmm, then we either need a way to command a NOR reset. or see if the diode can be tested in-circuit.
<wolfspraul> aw_: have you finished 0x39 ?
<aw_> wolfspraul, 0x39 was tested by test image successfully just rendering failed then
<wolfspraul> rendering is part of the full test program
<aw_> wolfspraul, yes. so now I fill it as 'avail - fix2b'
<wpwrak> lekernel: that is, if we even care to separate FLASH_RESET_N from PROGRAM_B :)
<wolfspraul> great
<wolfspraul> how many cycles did you do?
<aw_> 30 times only
<wolfspraul> ah ok
<wolfspraul> yes that's enough
<wolfspraul> so yes, I propose to work on those 19 boards I listed
<lekernel> we have to separate flash_reset from program_b; the fpga does reset the flash on a software reset
<wolfspraul> in this way:
<wpwrak> wait ... 0x39: final crc check and then reflash via USB-JTAG
<wpwrak> just to confirm that all is well
<wolfspraul> 1) first, make sure there are no other known bugs on the boards (like usb)
<wolfspraul> 2) apply fix2b, and also check that the remaining diode is good (if possible)
<wolfspraul> 3) reflash and run test software and run 10 render cycles (only 1 crc at the end)
<aw_> wpwrak, 0x39 30 times crc is okay
<wolfspraul> 4) hopefully set them all to "avail - fix2b"
<wolfspraul> :-)
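A minimal sketch of the four-step plan above as a per-board checklist, in Python. Only the board IDs and the "avail - fix2b" status string come from the channel. The step labels, the "blocked at" status and the `board_status` helper are hypothetical names chosen for illustration and are not part of any Milkymist script.

```python
# Board IDs exactly as listed in the channel for the fix2b batch.
BOARD_IDS = (
    "0x32 0x34 0x39 0x3A 0x3C 0x40 0x48 0x54 0x55 0x5C "
    "0x61 0x63 0x6B 0x6C 0x77 0x7A 0x7D 0x7F 0x85"
).split()

# Steps 1-3 of the plan, in order. The labels are hypothetical.
STEPS = (
    "no_other_known_bugs",         # 1) no other known bugs (like usb)
    "fix2b_applied_diode_ok",      # 2) fix2b applied, remaining diode checked
    "reflash_test_render_crc_ok",  # 3) reflash, test software, 10 renders, 1 crc
)


def board_status(results):
    """Return a status string for one board from {step_label: passed}."""
    for step in STEPS:
        # A missing entry counts as "not done yet", so the board stays blocked.
        if not results.get(step, False):
            return "blocked at " + step
    # 4) every step passed
    return "avail - fix2b"


if __name__ == "__main__":
    print(len(BOARD_IDS), "boards")
    print(board_status({step: True for step in STEPS}))
```

The steps are checked in order, so a board is reported at the first step it has not passed, which matches the idea of pausing to think after the first few boards if something is not right.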
<wpwrak> lekernel: and it wouldn't be happy if the sw reset also causes a reconfig (?) ... or at least we don't want to tempt fate
<wolfspraul> aw_: no wait
<lekernel> no, software reset shouldn't reconfig
<wolfspraul> Werner also wants to reflash again
<wolfspraul> 0x39
<wolfspraul> so just run reflash_m1.sh
<wolfspraul> then boot once to test that it renders, then done
<aw_> wolfspraul, okay, let's reflash it again. and check crc again and rendering. ;-)
<wolfspraul> ok, perfect
<wpwrak> lekernel: okay. maybe we can just probe sw reset. that should be clear enough evidence about the diode's health.
<lekernel> sw/flash reset stays asserted when the 3 pushbuttons are held, btw
<aw_> 0x39: reflashing...
<wpwrak> lekernel: if sw reset did cause a reconfig as well, would this be easy to notice from the outside ? (without scope, just looking at the M1)
<wolfspraul> btw, do we still want to do the 4.4V reset ic rework?
<lekernel> yes, it would turn off
<wpwrak> lekernel: ah, excellent
<lekernel> wolfspraul, if the current solution works, then no because it takes time
<wpwrak> wolfspraul: i think it makes sense, because the current reset solution does not guarantee that all the rails are good
<wolfspraul> I think if extensive testing shows that everything is stable, at least for myself I don't need the 4.4V reset ic rework only because that makes the circuit better conform to the datasheet voltages.
<wolfspraul> :-)
<wolfspraul> 2:1
<wpwrak> wolfspraul: although i don't know if you want to rework or just rc4 :)
<wolfspraul> well
<wolfspraul> those are separate things
<wolfspraul> first I want to make an rc3 of good quality that I can support
<lekernel> after those boards are out, you can try. but please, not before.
<lekernel> enough delays
<wpwrak> i concur
<wolfspraul> if the only reason is that we noticed an 'out of spec' situation, that's not enough
<wolfspraul> sorry but the chip has to handle that :-)
<wolfspraul> since our testing did not show problems
<wolfspraul> testing results win here, imho
<wpwrak> i would also prefer not to have it as a rc3 rework, because it introduces the risk of bridging 3V3 and 5V
<wpwrak> s/as a/as a general/
<wolfspraul> ok
<wolfspraul> so I am still cautious about this whole fix2b and diode magic
<wolfspraul> but we see
<lekernel> yeah, that too
<wolfspraul> I need solid test results then we can start selling
<wpwrak> so my proposal would be to rework one board with 4.4 V at a suitable time, confirm that this doesn't wake any gremlins, and then make it an rc4 feature
<wolfspraul> oh you bet
<wolfspraul> we need to look at the gates and second reset ic anyway, for rc4
<wolfspraul> but that is separate from finishing rc3
<wpwrak> yes :)
<aw_> reflashed successfully
<wpwrak> about the diodes .. where do they come from ? friends in shenzhen ? :)
<wpwrak> aw_: champagne time ! ;-)
<aw_> CRC is okay
<wolfspraul> ask Adam about source, I'm not sure whether they are in the bom/wiki
<wolfspraul> but don't always blame the source, you know how many reasons for problems there can be (you listed some above yourself)
<wolfspraul> and amazingly, hello murphy, it's always the unexpected one that hits you, no?
<aw_> rendering done...
<wolfspraul> ok, enough, 0x39 is 'avail - fix2b'
<wpwrak> yes, but it still seems odd. adam's visual inspection didn't show any solder bridges. and he measured after unsoldering, which would further remove any bridges (or make existing ones easier to spot)
<aw_> wpwrak, this diode is the one that BEN used, BEN was produced in China
<wolfspraul> aw_: did you see the plan #1 - #4 for the 19 boards I selected?
<wolfspraul> ouch
<wpwrak> and diodes don't overcook easily. or degrade just like that.
<wpwrak> heh ;-)
<wolfspraul> aw_: maybe we get new diodes one of these days ;-)
<wpwrak> do we have any diode problems in the ben ? :)
<aw_> but this part was original designed by Taiwan company though...so i got them while i producing AVT2. ;-)
<wolfspraul> no
<wpwrak> makes me think of charging problems ...
<wolfspraul> anyway pinpointing the real root cause is difficult
<wolfspraul> aw_: plan! :-)
<wolfspraul> did you see my steps #1 - #4 above?
<wpwrak> i'm worried about D16
<aw_> wolfspraul, yes saw #1 - #4
<wolfspraul> I propose this for the 19 boards I selected
<wpwrak> well, if all that happens when D16 transmutates into a 0R is that a "sw reset" powers down, that won't be a catastrophic failure. so this could be considered an acceptable risk
<aw_> okay
<wolfspraul> 0x32 0x34 0x39 0x3A 0x3C 0x40 0x48 0x54 0x55 0x5C 0x61 0x63 0x6B 0x6C 0x77 0x7A 0x7D 0x7F 0x85
<wolfspraul> and Werner is right - how do we make sure D16 works well?
<wolfspraul> aw_: any ideas?
<wpwrak> so all that would need to be done about D16 is to test whether it works now (procedure TBD), and if it does, go ahead. else, replace, etc.
<wolfspraul> can we order new diodes locally in Taipei? (i.e. tomorrow)
<aw_> wpwrak, how about I measure D16's forward / reverse voltage first while I test those 19pcs
<wolfspraul> it seems Werner and Sebastien think that is not possible or worthless
<wpwrak> aw_: can you measure D16 in-circuit ?
<aw_> wpwrak, yes. i just checked in-circuit with D16 on 0x39
<wpwrak> okay. then that's probably good enough.
<wpwrak> we can do fancier tests, but they also have more moving parts.
<aw_> wpwrak, but I really don't know why 0x39 has passed the power-on sequence, since I measured them before I reflashed after the first-time reworks
<wolfspraul> you measured both diodes earlier?
<wpwrak> aw_: yes, the whole thing is very strange
<aw_> wolfspraul, but from your analysis above, i could quite probably have made the diode act like a short while soldering
<aw_> sorry to wpwrak
<wpwrak> aw_: could the diode have experienced mechanical stress from the wire going around the board ?
<wolfspraul> well
<wolfspraul> I think the next step is fix2b on those 19 boards
<wpwrak> will the wire also be removed ? (as part of fix2b)
<wolfspraul> of course we are not suicidal. if after 3-4-5 we find out it's not right, we pause to think.
<wpwrak> hehe :)
<wpwrak> i hear a "not yet" :)
<wolfspraul> wpwrak: wire removed? that's not clear?
<wolfspraul> aw_: will the wire be removed or not?
<wolfspraul> :-)
<aw_> wpwrak, so i would think it's a component degenerating from my soldering of its two terminals (one is the program_b soldering, the other is the init_b soldering), so TWICE soldering on the diode. ;-)
<wolfspraul> I thought that would be so clear it's not worth mentioning, now Werner is asking :-)
<aw_> wolfspraul, I'll remove wire too.
<aw_> wpwrak, diode is in reel I have on hand now
<wpwrak> (wire) alright. no FM antenna ;-)
<wpwrak> aw_: did you solder all those diodes ? or did they do some of them at the SMT fab ?
<wolfspraul> I think slowly I can start as a daredevil PE, maybe in China.
<wolfspraul> I wouldn't hesitate to keep that line running...
<wolfspraul> he he
<wolfspraul> and over time I might even find out a bit about all this strange soldering and circuit stuff
<aw_> wpwrak, D16 was mounted by SMT factory, I soldered all program_b/init_b diode. ;-)
<wolfspraul> after some years in China maybe I can upgrade to Taiwan
<wpwrak> aw_: maybe your soldering iron is running too hot ?
<wolfspraul> don't mention that
<wolfspraul> it's probably on max
<aw_> wpwrak, set 325 degree
<wpwrak> pheeew ....
<wolfspraul> :-)
<aw_> my max can be 425
<wpwrak> yeah, i sometimes go a lot higher too. 370 C if a component is really acting up
<aw_> but this you know , soldering in even less than 1 second on diode terminal. ;-)
<wolfspraul> my 2 hands cannot count the number of foreigners that came into our Taipei labs and eventually to me complaining that they ruined their boards because they use the irons with 'crazy hot' settings that the locals had flying around there...
<wpwrak> yeah, if you're quick, then hot should be fine
<wpwrak> ;-))))
<aw_> wpwrak, well...not to explain though...it's really great that you caught this, so tomorrow, i'll still measure its forward/reverse voltage in-circuit after soldering.
<wolfspraul> aw_: I think we have a solid plan for tomorrow
<wolfspraul> after the first few boards, we double-check the results
<aw_> wolfspraul, we may classify them tomorrow later, you know their failures are different, but we can get results tomorrow. ;-)
<wolfspraul> yes but I grouped carefully
<wolfspraul> those 19 should be interesting
<wolfspraul> I do not expect all 19 to work
<wolfspraul> but I want to see how far this fix2b can take us
<wolfspraul> since we are planning to apply it to all 90 boards (!)
<aw_> alright
<wolfspraul> aw_: if you see anything wrong with the plan, correct it
<wolfspraul> you are much closer to the real problem
<wolfspraul> for example if you want to finish the usb fixes first, do it
<wolfspraul> you must keep a calm head and overview...
<wolfspraul> btw, I find it amazing that we first thought we need this diode (and long wire), but now it seems we don't?
<wolfspraul> how is that possible? are we sure we don't need it? :-)
<aw_> to apply fix2b, I'll select them to do whole all items again to make sure my removal of diode and wire are good
<aw_> also need to clean after reworks though.
<wolfspraul> ok
<aw_> so far I have found nothing wrong in your steps
<aw_> but I will re-enter those results across one whole row. hope these boards bring good news tomorrow.
<wolfspraul> ok
<wpwrak> wolfspraul: i think the long wire was lekernel getting lost in the twisty little maze of xilinx documentation, and the place in the docs that most specifically refers to this function states reasonably clearly that we don't need to worry about init_b. but let's see if sebastien changes his mind :) if xilinx docs are inconsistent about this issue, we may still need to do something. but for now, it seems that init_b (as of fix2) doesn't need to be connected.
<wolfspraul> ok got it - thanks!
<wolfspraul> well then, tomorrow is another interesting day in rc3 history
<aw_> wiki 0x39 notes updated
<aw_> wpwrak, thanks for your great helps tonight. ;-)
<wolfspraul> btw - ALL PARTS are now in Taipei!
<wpwrak> aw_: thanks for doing all those experiments ! :)
<wolfspraul> everything
<wpwrak> whee ! :)
<wolfspraul> all accessories, box, labels, cases, leaflet, stickers, everything
<wolfspraul> doesn't make Adam's life easier unfortunately
<wolfspraul> :-)
<wpwrak> henceforth, August 16 shall be celebrated as convergence day in the empire of Qi :)
<wolfspraul> wait wait
<wolfspraul> I want to see this in the test results
<wolfspraul> what I see there now is still a big mess, and some hope
<aw_> go on
<wolfspraul> no that's all
<wolfspraul> 'wait wait' for Werner's celebration
<wolfspraul> :-)
<aw_> okay. ;-)
<aw_> thanks again and night!. ;-)
<wpwrak> naw, celebrate convergence day today, maybe diode day tomorrow :)
<Fallenou> lekernel: I copied CVS HEAD cpukit/zlib/zconf.h.in to a fresh git clone of milkymist rtems, it builds properly, do I commit & push ?
<lekernel> no, I will try a proper CVS upgrade before
<Fallenou> ok, should solve the issue
<lekernel> if it does not I will use your patch
<lekernel> is done writing a milkymist article for xcell
<lekernel> since it seems open source people do not care about/are afraid of fpga's, let's see if fpga people care about open source *g*
<roh> lekernel: the opensource people care about fpgas but are not motivated to fight against windmills
<kristianpaul> they just care about their fancier IDEs and writing in vhdl ;)
<kristianpaul> (biased comment from my side of course)
<lekernel> roh, ?
<lekernel> you mean the proprietary tools, right?
<kristianpaul> also i wonder what they are actually scared of, i mean, i had heard floss related people talking about opencores and opensparc as the path for "fpga freedom"
<lekernel> just fucking do them
<lekernel> GCC, for all its faults, was still great work for its time
<roh> lekernel: opensource means open toolchains. without that you need open docs to write some. thats why opensource is great on basically all cpus/soc with 'available' (not necessarily fully open) documentation and not on stuff you need to reverse first
<lekernel> oh, altera published tons of stuff lately
<roh> see nvidia drivers. same problem. without open docs/specs support sucks and isnt anywhere near 'production grade'
<kristianpaul> roh: but thats a mental barrier you dont need a floss compiler to start coding or doing something
<lekernel> and for xilinx you have xsl
<lekernel> xdl
<roh> kristianpaul: wrong.
<kristianpaul> why? and in which part
<kristianpaul> look mm1 soc
<kristianpaul> yes it uses XST, but you can see testbench uses cver and iverilog
<roh> kristianpaul: sure its kinda mental, but as somebody who worked with commercial environments and or dependent on parts of it, i can tell you: never ever again. not worth my lifetime.
<kristianpaul> step by step
<kristianpaul> roh: my first words were about IDEs remember? :)
<lekernel> can we stop here? if you want free FPGA tools, then write them. period.
<roh> kristianpaul: so my point is: people WILL and DO opensource in anything which makes them able to solve their problems. using binary toolchains is a showstopper. ide's do not count.
<kristianpaul> fpga = hardware, that scares more than one for sure :)
<roh> lekernel: sure. give me all the needed specs and docs and a warranty that i do not need to start over when xilinx has a bad morning and does a new chip with everything different.
<kristianpaul> no roh, thats just coding practices
<roh> kristianpaul: no. i think thats false. people are not scared by hardware at all.
<kristianpaul> practices, as when you mix your code with dark/proprietary libs
<kristianpaul> could be--..
<kristianpaul> may be they need a kick start?
<kristianpaul> more tutorials, friendly people and such
<kristianpaul> as Fedora in software side i mean
<kristianpaul> well, just another guess..
<lekernel> roh, http://rapidsmith.sourceforge.net/, altera quip, debit, etc.
<roh> kristianpaul: my point is: in MY (and probably most other opensource people's) perspective (which comes from experience) its not helping but slowing down development if your tools are either broken, closed, costly or badly documented.
<roh> lekernel: and thats completely free and production grade? can you build a mm1 with these tools?
<kristianpaul> well i always heard people saying bad words about gcc and still a sucess :)
<kristianpaul> s/sucess/been used
<kristianpaul> :p
<lekernel> roh, you asked about info about fpga internals. so here they are.
<kristianpaul> i think zumbi too ;)
<roh> my point is: opensource people USE opensource tools to create more sw/hw . they do fix bugs in tools here and there but they usually are not motivated enough to do N projects but ONE. means developing a toolchain is not their interest or something which they find interesting.
<kristianpaul> Was a small discussion last day about the difference between bitstreams from different vendors
<roh> lekernel: i did ask for DOCUMENTATION. not some weird java tool.
<kristianpaul> documentation about?
<lekernel> then they should stop using x86 CPUs, Intel DRAM controllers and what not :-)
<roh> dont get me wrong. i try to explain the whys and not the 'could be done's
<lekernel> well I already have had that discussion. it's boring. free tools = jfdi.
<roh> lekernel: x86 is actually quite well documented and understood (compared to the different fpga archs)
<roh> lekernel: yes. and you need to understand that most people are NOT willing to waste their lifetimes writing compilers and such. thats a VERY small number of people who find that interesting at all.
<kristianpaul> jfdi = just find who can do it ;)
<roh> its boring technology which is necessary but not something to use your time on for most. its a tool. like a wrench. its there. use it. if you need to buy a complicated one or expensive one you will use a screw which can use the free and well known tool and not the expensive one.
<kristianpaul> roh: it's really unfair to compare an asic (x86) with an FPGA
<roh> kristianpaul: its not about fair. its about reality.
<roh> kristianpaul: if you can solve your problem with an fpga or some soc with gcc support, people WILL choose the latter. even if the soc itself is blackbox.
<roh> as long as there are interface specs/docs people are fine with that.
<kristianpaul> yes, of course (solve problems)
<lekernel> there are interface specs: the fpga does what the standard verilog code tells it to do :-)
<roh> lekernel: well.. only with things outside the hw (binary tools)
<kristianpaul> nocks lekernel
<lekernel> that's not fundamentally different from a CPU scheduler or a DRAM control algorithm
<kristianpaul> i disagree with your last words roh, as for example you can have a basic platform to start with
<kristianpaul> is my point about comparing fpga with asics
<kristianpaul> at the end you need a hardware that works
<kristianpaul> like coming mm1 rc3 it seems :)
<roh> also its a question of complexity. an fpga costs you not only a lot of money and extra (complicated) tools and code/thinking it also needs quite some overhead to work.
<kristianpaul> and yes as wpwrak pointed out before, a floss synthesis may blow away lots of barriers
<kristianpaul> but i think there's still a lot to do around verilog, test benches, automated soc build scripts?
<kristianpaul> and lots of other fields
<lekernel> in this project, the fpga costs less than half of the case
<kristianpaul> roh: lazy! ;)
<roh> on a typical mcu nowadays you need mostly caps and a powersource. they can even run without crystals and such. have internal flash, easy to use isp/debug possibilities and some decent amount of ram. fpgas still feel like the 8051 times of mcu.
<kristianpaul> roh: just kidding of course :)
<kristianpaul> it all takes time, and yes the fpga world has its own learning curve :)
<roh> kristianpaul: heh.. i am just not interested enough by stuff i find boring details when there are so many more interesting real problems to solve out there.
<kristianpaul> sure, that's a respectable position
<roh> lekernel: the chipcost itself is negligible. the cost for development of code and support is so much more than for a mcu that whoever can, WILL avoid using one.
<roh> negligible at least for our amounts of sales. can change if you sell 5 digits or more.
<kristianpaul> world is already done, why care to do more, when there is a big sea to navigate :)
<wpwrak> interesting discussion. is there a topic/direction or is this just the IRC equivalent of a bar brawl ? ;-)
<kristianpaul> i think last :)
<kristianpaul> and is monday :)
<wpwrak> lekernel: (jfdi) i hope that doesn't mean you've lost interest in continuing with llhdl
<roh> wpwrak: *g* .. i am not trying to discuss something. i am trying to explain a pov i seem to share with loads of other devels from the mcu and opensource side.
<kristianpaul> or event interesting in continuing milkymist project?? :-(
<kristianpaul> interest**
<lekernel> fuck all those developers, that's why I'm stopping hacker conferences now and write for xcell instead
<lekernel> (among other things)
<roh> so its not 'the opensource hackers are not interested in fpgas'. they are just annoyed enough by devices with nonfree toolchains by experience to avoid them at ALL cost.
<kristianpaul> be friendly and they will come :)
<kristianpaul> friendly and skilled is powerfull combination
<roh> lekernel: maybe you can explain what happened that you are annoyed?
<kristianpaul> if somebody can say jfdi it's because they at least know how to do it but dont want to, so you can provide guidelines for others to do it, even if you dont do a single line of code
<kristianpaul> not just "fuck" then away... :/
<lekernel> roh, as I said: many hacker/open source people are afraid of this stuff. they obviously prefer blinky-LED arduino gadgets instead. that's why i'm slightly annoyed. that's all.
<wpwrak> lekernel: so what are your plans with llhdl ? was that an affirmative silence, before, i.e., have you lost interest in it ?
<roh> lekernel: maybe you need to differentiate between the different levels of development. you do the basics on another level
<lekernel> wpwrak, no, just try to speak to different people about it.
<roh> most want to solve their problem when developing something as cheap and simple as possible. not as free as possible. and in the end it doesnt matter if the chip is from xilinx or whatever (nxp? atmel? whoever builds soc), you WILL buy a 'chip' which is closed.
<wpwrak> lekernel: ah, good :)
<roh> then the equation is 'free tools' or 'complex closed tools' atm. and THATS what matters.
<wpwrak> roh: i think you're barking up the wrong tree
<wpwrak> roh: you should complain to xilinx, altera, lattice, etc. convince them that they could sell more fpgas if they open their tools
<roh> wpwrak: i know. i am trying to explain. i also want free fpga tools. but i also do not want to develop one myself if i can solve my problem in MUCH less work.
<wpwrak> roh: lekernel is already working in the "right" direction
<kristianpaul> good point
<roh> wpwrak: i am not complaining. i am describing the reasoning of developers about which tools to learn to use and where to use their (often quite limited) time
<kristianpaul> well lattice already moved forward a bit, time to push xilinx.. but how?
<wpwrak> roh: well yes, that much is pretty obvious, isn't it ? :)
<roh> wpwrak: sure its the right direction. dont get me wrong. i fully support the way we are going.
<roh> wpwrak: i learnt that every obvious thing which needs a transformation in thinking is already something which needs explaining from time to time.
<wpwrak> roh: i think you're addressing the wrong audience ;-) do you really think anyone here _likes_ closed tools ?
<roh> opensource devels see chip vendors just as 'producers of something which needs tools too expensive to buy yourself' and are happy that the numbers sold make them cheap
<lekernel> kristianpaul, (guidelines) you can see that i'm doing exactly that http://www.ohwr.org/projects/ohr-meta/wiki/OHWorkshop
<roh> chipvendors sometimes understood that (nxp, atmel) and sometimes not (xilinx)
<wpwrak> roh: i think FPGA jargon gives a pretty strong hint of how people in that biz think. they don't have "software", they have "intellectual property" ;-)
<roh> wpwrak: well.. duh. you see their fault? :)
<wpwrak> roh: the typical FPGA customer wants to be closed. we're the exception. the typical EE is happy with closed tools on windows. and so on
<roh> wpwrak: thats because fpga live from a 'no other way to do' market not from a broad spectrum of possible users.
<wpwrak> roh: i wouldn't look at them for open tools. at the moment, there's little motivation for them. and it would probably very difficult for them to open their tools, because they may not even own all the necessary rights to do this.
<roh> also commercial devels choose a mcu if possible. simply because the amount of money they need to pay their devs to 'make it work' is much less than for a fpga based project.
<wpwrak> roh: fpgas target a higher end market, yes. if an MCU will do, you don't need an FPGA.
<roh> wpwrak: every usecase done by a fpga would be done by a specialized mcu if there would be the number of users to make it worth doing the 'chip' for it. and that also happens from time to time. see high end routers.
<kristianpaul> lekernel: (ohwr) oh, i havnt noticed it :), looks evry interesting, pleaser record your talk for the far away people :)
<kristianpaul> s/pleaser/please
<roh> stuff cisco did in fpgas 10 years ago is now done by a 3$ silicon from realtek. cisco still uses fpgas.. for stuff where there are no specialized chips for (e.g. routing engines)
<kristianpaul> roh: are you planning to start that copyleft layer 3 network switch? :-)
<roh> kristianpaul: no. just using that example to show why people use fpga and where in commercial projects. it seems to be a 'no soc buyable. last resort to make a product build-able at all' case
<lekernel> roh, I never said we never will manufacture a milkymist asic. in fact, a large part of the current code should be portable to asics.
<wpwrak> roh: you're overlooking the possibility of using an FPGA for more than some form of ASIC prototyping. i see great potential in partial reconfiguration, adapt the hw for your code. that's a domain that's still pretty much untouched. once synthesis is out in the open (cf. llhdl), work can start in this direction
<lekernel> (i mean verilog)
<zumbi> wpwrak: I was involved in a project like that, reconfigurable FPGA for SDR
<roh> wpwrak: nobody cares about reconfigurable hw outside of lab equipment or military use to be fair. at least nobody is willing to pay the extra money that 'feature' does cost.
<zumbi> I wish now I could attend OHR conf
<wpwrak> zumbi: how far did you get ?
<lekernel> everything is possible, that, and free FPGA tools, we just need to get down to it (which also involves generating lots of sales for the asic thing) :p
<roh> wpwrak: maybe that can change with free tools, yes. i sure hope so.
<zumbi> wpwrak: there is an open project, let me search the link
<wpwrak> roh: (nobody cares) well, there's a good amount of basic research that needs doing first ;-)
<roh> also its a question of the power budget. correct me if i am wrong, but afaik an fpga doing the exactly same as an asic based on the same design will eat more watt
<wpwrak> roh: step 1: unlock the secret. step 2: learn. and so on ;-)
<zumbi> wpwrak: I am looking forward for newer Zynq7000 devices
<wpwrak> zumbi: (flexnets) so you design "IP blocks" in the traditional way and then connect them to each other ?
<zumbi> wpwrak: right
<wpwrak> zumbi: what i have in mind would go a little further: generate code and hardware description from the same source
<zumbi> wpwrak: it adapts resources to users, lets say you got a dual BTS with 3G/WiMAX, depending on the users you've got, you reconfigure the BTS to allocate more resources to the network with more users
<wpwrak> zumbi: e.g., you could write a - maybe C - program that implements some feature at a very low level, bit-banging and so on. then the "compiler" would identify functions that can be synthesized in hardware.
<wpwrak> okay, but it's still at the level of modules
<wpwrak> of course, because you need the heavy proprietary synthesis software to make your bitstreams :)
<zumbi> sure, while free tools sounds attractive, isn't there a free HDL synthesizer done by one of the fellows here
<wpwrak> zumbi: maybe you mean lekernel's llhdl ?
<zumbi> yep
<wpwrak> i think llhdl is a great start. even if it will be relatively primitive, once the whole process is implemented with free tools, it will be much easier to improve the tools.
<kristianpaul> nice, Makefile-driven HDL flow (Pawel Szostek).
<zumbi> wpwrak: were you trying to hint a compiler/synthesizer?
<wpwrak> the pioneering work is always the hardest.
<wpwrak> zumbi: "hint" ?
<zumbi> wpwrak: does such tool exist?
<wpwrak> kristianpaul: death to all IDEs ! ;-)
<zumbi> I have tried Makefile-driven HDL but failed :/
<zumbi> once they upgrade IDE
<wpwrak> zumbi: i don't know. at least nothing widely known. maybe some research projects under NDA, etc. but such secret things usually don't go very far
<wpwrak> we saw this in operating system research. before the Free unices, there were some projects that implemented kernel changes as binary modules for SunOS. Sun were "nice" to academia and let them have the sources, under NDA, of course. and they allowed them to distribute their binaries.
<wpwrak> but that was still difficult to use, and the sources were still closed. so such things weren't really useful.
<wpwrak> now, fast-forward a few years. no kernel research would have much credibility in the days of open source unix if it didn't come with a patch.
<wpwrak> and every once in a while, good work does find its way from academia into real life rather quickly
<wpwrak> e.g., things like RCU and various TCP and scheduling improvements were integrated into Linux fairly quickly. and they're substantial improvements of the art.
<wpwrak> of course, not every linux patch that tweaks the scheduler or TCP is worth a PhD, but i think it's safe to say that research that seeks applicability is in a considerably better shape today than in the dark age of only closed source operating systems (omitting "research" operating systems that had very little scope)
<wpwrak> i hope very much to see the same happen when it comes to FPGAs
<wpwrak> anyway, past 7pm, high time for breakfast :)
<mw|mobile> has anyone received my mail to the ml?
<mw|mobile> two times.. ok gn8
<wolfspraul> [reading the backlog] I just checked the m1 box whether it still says 'fpga' outside, and yes - it does. Sebastien told me a few weeks ago that he thought we can remove it, but for some reason I didn't even though now I thought that we had... :-)
<wolfspraul> Sebastien was totally right I think now, we should have removed it. Next batch...
<wolfspraul> fpga is a divisive term, too many people attach too many different experiences and feelings to it. Has nothing to do on the outside of a video synthesizer box.