#milkymist on 2011-08-16 — irc logs at freenode.irclog.whitequark.org

00:35 <wolfspraul> wpwrak: those sound like really interesting ideas (remove D16, second reset ic for FLASH_RESET_N), but the problem I see is in how to test the result

00:36 <wolfspraul> I think that's also the reason why this bug has not been fixed yet. We first need to find a way to reproduce some 'bad' thing consistently, then we fix the 'bad' thing, then we verify that it's gone.

00:36 <wolfspraul> but how is this possible now?

00:39 <wolfspraul> aw: good morning :-) you are early!

00:40 <aw> wolfspraul, good morning ;-)

00:51 <wpwrak> wolfspraul: yes, let's first get some statistical data on 0x39. that one seems to be very good at generating the problem.

00:53 <wolfspraul> wpwrak: should we do 0x39 tests now?

00:54 <wpwrak> aw: and maybe you can check the reset circuit rework on 0x39 to see if there are any obvious issues (such as a reversed diode, capacitor not properly soldered, etc.)

00:54 <wolfspraul> aw: can we do some 0x39 work now?

00:55 <wpwrak> wolfspraul: first, the last status of 0x39 was that it didn't reconfigure, correct ? so the next test should be to see if it does now

00:55 <wpwrak> wolfspraul: then repeat the power-cycling with CRC loop until it stops again (which should be soon, probably less than 10 tries, if past results are any indication)

00:56 <wpwrak> wolfspraul: then try to retrieve the NOR content via jtag. check that it's okay. (that is, if we get this far. if the NOR chip is just messed up, e.g., held in reset, then jtag won't work either)

00:58 <wpwrak> lekernel: btw, does the FPGA assert the pull-up on FLASH_RESET_N during its built-in load process ?

01:00 <wolfspraul> wpwrak: no we will not be able to read nor then (same as yesterday)

01:00 <wolfspraul> of course we have to try

01:00 <wpwrak> wolfspraul: wasn't the failure yesterday a problem with the script ?

01:01 <wolfspraul> no

01:06 <wolfspraul> adam may have gone out, maybe pickup roh's second package...

01:17 <aw> okay...let's try to power on to see if 0x39 can reconfigure now.

01:17 <aw> answer is NO after powered -on with whole night

01:18 <aw> so now going to read_flash_m1.sh

01:19 <aw> still stopped 'Bitstream length: 1484404'

01:20 <aw> what's next steps will you suggested? I would like to power off it firstly. ;-)

01:20 <wpwrak> maybe try a few times if you can read the bitstream (power on and off as you see fit :)

01:21 <wolfspraul> aw: can you try to use Xilinx Impact and Xilinx cable? can you use Xilinx Impact to read nor? Or just detect the nor chip?

01:22 <wolfspraul> I'm wondering whether xilinx impact would give us any new clues...

01:22 <wpwrak> ah yes, good idea

01:22 <aw> okay

01:22 <wpwrak> "xilinx cable" = also replaces the usb-jtag board ?

01:22 <aw> although this way from xilinx tool...but not sure if it can work

01:22 <aw> i do only follow the instructions from rc2

01:23 <aw> don't know if they are suitable.

01:23 <wolfspraul> yes remove usb-jtag

01:24 <aw> wpwrak, yes, use 'xilinx cable' can instead of usb-jtag boards

01:25 <wolfspraul> wpwrak: if we had a spare nor chip, I would suggest switching the nor chip to a new one, just to get another data point

01:25 <wpwrak> i'd save that option until the very end :)

01:25 <wolfspraul> but we don't have one, and taking one from another board would insert more variables into an already questionable fact finding

01:26 <wolfspraul> well, it could be interesting

01:26 <wpwrak> add two to the next digi-key/mouser order ?

01:26 <wolfspraul> for example if it then first worked, but after a few cycles fails again

01:26 <wolfspraul> 5 already on the way

01:26 <wolfspraul> let's say it first works, then fails

01:27 <wpwrak> yes, if it works at first ... :)

01:27 <wolfspraul> well

01:27 <wpwrak> it's kinda major rework :)

01:27 <wolfspraul> :-)

01:27 <wolfspraul> data point

01:27 <wolfspraul> I think it's not too hard, no?

01:27 <wpwrak> 56 pins .. depends ...

01:28 <wpwrak> i think my maximum is ~28. doable with chip-quick ;-)

01:28 <wpwrak> but maybe adam has better tools

01:28 <wolfspraul> nah maybe not

01:28 <wpwrak> or techniques :)

01:28 <wolfspraul> but in the factory they are swapping such chips within 30 seconds or so

01:28 <wolfspraul> would need to take a video and watch in slow-motion to see how they do it :-)

01:31 <wolfspraul> but a) we have no spare chip b) Adam may not be able to do the rework that easily

01:33 <wpwrak> in any case, i'd try swapping the nor last. too much can go wrong there. also, it may destroy evidence. e.g., if there's some short

01:33 <wpwrak> or near-short

01:34 <kristianpaul> xray ? :)

01:35 <wpwrak> gamma-ray laser ? :)

01:37 <kristianpaul> hum, why so fancy?

01:37 <wpwrak> kristianpaul: more threatening :)

01:46 <wolfspraul> sorry, got disconnected

01:46 <wolfspraul> aw: any news? trying xilinx impact?

01:47 <aw> wolfspraul, this is not easy for me now. but have to try

01:50 <wolfspraul> aw: wait

01:50 <aw> 0x39 via sudo jtag: http://pastebin.com/HafSjUGL

01:50 <wolfspraul> if it's not easy, don't do it

01:50 <aw> i am going to see the instructions in rc1

01:50 <wolfspraul> or describe the steps one by one here, either we can make it work or not

01:50 <wolfspraul> no

01:50 <wolfspraul> that sounds like you will disappear for a long time

01:51 <wolfspraul> if the xilinx impact test cannot be done in 5 minutes, don't do it

01:51 <wolfspraul> :-)

01:51 <aw> well...

01:51 <aw> meanwhile should we just send one boards to someone right away?

01:52 <wolfspraul> no

01:52 <wolfspraul> I am looking at pastebin

01:52 <wolfspraul> why did you do 'quit' after 'detect'?

01:54 <wolfspraul> ping

01:54 <wolfspraul> aw: u there?

01:55 <aw> I only know these commands though

01:55 <wolfspraul> ok

01:55 <wolfspraul> let's clarify first

01:55 <wolfspraul> you are currently working on x039

01:55 <wolfspraul> 0x39

01:56 <wolfspraul> 0x39 cannot boot, when you plug in the DC cable D2/D3 become dimly lit?

01:56 <aw> unless else can tell me that i can directly use other commands to dump into file from standby bitstream. ;-)

01:56 <wolfspraul> is that correct?

01:56 <aw> yes, now 0x39, can't reconfiguration when power-ed on

01:56 <wolfspraul> d2/d3 are dimly lit?

01:56 <aw> now it's in this d2/d3 dimly lit still

01:57 <kristianpaul> (pastebin) yes, what happened with detect command ? :)

01:58 <wolfspraul> aw: try this: turn off power

01:58 <wolfspraul> remove jtag-serial daughterboard

01:58 <wolfspraul> power on again

01:58 <wolfspraul> what happens?

01:58 <aw> same d2/d3 dimly lit

01:59 <aw> if someone can lead me to use some commands in UrJTAG tool, there's must data can be dump from flash. not sure if xiangfu know this.

02:00 <wolfspraul> wait

02:00 <kristianpaul> yes :)

02:00 <wolfspraul> of course the commands are here https://raw.github.com/milkymist/scripts/master/scripts/read_flash_m1.sh

02:00 <wolfspraul> we can walk through, but it won't work of course, since the script also doesn't work

02:01 <aw> no not use script, i mean directly used UrJTAG

02:01 <aw> it doesn't work I don't know if the script doesn't work somewhere or UrJTAG itself

02:02 <kristianpaul> http://milkymist.org/wiki/index.php?title=Flashing_the_Milkymist_One#flash_system

02:02 <aw> but if i enter UrJTAG, there's more commands can be used.

02:02 <kristianpaul> detect is easy to remember

02:03 <aw> good, but now how i can dump standby bitstream?

02:03 <aw> from which address which commands?

02:03 <wolfspraul> waste of time imo

02:03 <kristianpaul> yeah

02:03 <aw> sorry that I very much poor on this

02:03 <kristianpaul> may be a voltage verification

02:03 <wolfspraul> ah good idea

02:04 <kristianpaul> where?, i dont know :)

02:04 <wolfspraul> aw: let's measure some signals :-)

02:04 <aw> so that's why i said that should we send one rc3 board to whom can dump it?

02:04 <wolfspraul> NO!

02:04 <wolfspraul> we are not covering up our incompetence by spreading the problem so thin that we can eventually claim it doesn't exist

02:05 <kristianpaul> aw: what i understand problem right now is not related to NOR content, yet :)

02:05 <wolfspraul> we don't know

02:05 <kristianpaul> sure

02:05 <kristianpaul> with this dimly lit fpga is detected i guess but problem rais when loading bitstream?

02:06 <kristianpaul> raise*

02:06 <aw> wpwrak, do you think that where/or which parts pin's signal I should measure? or I directly rework diode and C238 again?

02:07 <wolfspraul> not rework

02:07 <wpwrak> maybe do this: without power-cycling, put scope probe on TP37 (FLASH_RESET_N), scope set to AUTO, check the voltage and look for any noise

02:07 <wpwrak> then, keep the probe pressed to TP37 and reset or power cycle

02:07 <wpwrak> see if it reconfigures then

02:08 <kristianpaul> don't we have a list somewhere of known voltages expected for TPs (that apply to power supply and such)?

02:08 <wolfspraul> you can start making one with your rc2 :-)

02:08 <aw> wpwrak, stay tuned

02:09 <kristianpaul> good idea !

02:09 <wolfspraul> actually seriously you could provide some reference measurements for wires into or out of the nor flash as well, if you want to help

02:10 <kristianpaul> yes why not, let me check rc2 datasheet for available testpoints

02:10 <wolfspraul> we are a bit asymmetric here. Werner has the clearest mind, but no board. I have a board, but no electrical capabilities. Xiangfu has a board but is hard to reach, Sebastien is sleeping. and so on :-)

02:10 <kristianpaul> yeah..

02:10 <wolfspraul> Adam has a lot of boards but is always worried he will damage something when running this or that software :-)

02:10 <wpwrak> (clearest mind) still with a cold, though :-(

02:11 <wpwrak> next project: an internet-attached alarm clock ;-)

02:11 <wolfspraul> ouch

02:11 <wolfspraul> kristianpaul: here's what werner said "maybe set trigger on OE#, then start with RP#, WE#, DQ0, A0, then do the rest of DQx and Ax"

02:12 <wolfspraul> those are reference measurements around the nor you could do

02:12 <wpwrak> /msg qi-bot wake lekernel ;-)

02:12 <wolfspraul> it's on rc2, but those datapoints may help

02:12 <wolfspraul> well, I actually think Sebastien is thinking a lot about what the root cause could be, but has no striking idea.

02:12 <kristianpaul> ok as soon i can measure with no soldering cables is okay

02:12 <wolfspraul> this thing is really difficult because we can't pin it down

02:13 <wolfspraul> cannot really reproduce in a controlled way

02:13 <wolfspraul> problem appears and disappears without us understanding why it did that

02:13 <wpwrak> yup

02:13 <wolfspraul> it affects > 20 % of boards, at least. maybe with tougher testing even more. We don't know.

02:13 <wolfspraul> we don't know whether some boards have 'genes' that will make them never show the problem

02:13 <wolfspraul> and so on

02:13 <wpwrak> for all we know, it could affect all of them

02:14 <wolfspraul> yes, definitely

02:14 <aw> wpwrak, the TP37 while (d2/d3 dimly lit) is 259mV now

02:14 <wpwrak> aw: by the way, did you do the visual inspection of the reset rework ("fix2") on 0x39 ?

02:14 <wpwrak> that's a reset !

02:14 <aw> yes

02:15 <aw> reset status

02:15 <wpwrak> does it constantly stay at ~260 mV ? or does it change

02:15 <aw> i need to power-cycle to see it

02:15 <aw> but I bet it will pull high once d2/d3 dimly lit is gone for sure. ;-)

02:16 <wpwrak> do we know at what point in time urjtag load fjmem.bit ? i.e., is there a specific step in the script ? or does it just do it automatically ?

02:16 <aw> wpwrak, TP36 is 120mV now

02:17 <kristianpaul> looks for TP37

02:17 <wpwrak> and TP37 ?

02:17 <aw> TP36 is program_b

02:17 <wpwrak> so we just have a permanent reset. interesting.

02:18 <aw> wpwrak, http://en.qi-hardware.com/wiki/File:M1rc2_powerOnOff_sequences_manuscript.jpg

02:19 <aw> please be noticed that I knew a fact is:

02:19 <wpwrak> (manuscript) yes, PROGRAM_B_2 should be high, not low

02:19 <aw> the DONE pin will be from low to hi to show up fpga finish reconfiguration.

02:19 <aw> wpwrak, wait

02:20 <wpwrak> DONE shouldn't matter. it doesn't connect anywhere near the NOR.

02:20 <wpwrak> (unless we have some interesting shorts :)

02:20 <aw> i said xilinx guy told me before and i checked the DONE pin which described the duration is "done" once fpga firstly access with flash. ;-)

02:21 <wpwrak> okay, but DONE is TP35

02:21 <aw> wpwrak, wait

02:22 <aw> the INIT_B will start a short duration of LOW and it acts synchronized with DONE pin reversely.

02:22 <aw> can you see that?

02:22 <aw> so my question is:

02:22 <wolfspraul> 'permanent reset' may be a much better description of the problem we see on rc3

02:23 <wolfspraul> at least it fits with the vast majority of test behavior I can think of right now

02:23 <wpwrak> aye. now on to the "why" ..

02:24 <wpwrak> aw: can you touch TP37 with a 1-10 kOhm resistor to 3V3 ? and see how the voltage changes ?

02:25 <aw> wpwrak, will flash RP# pin acts wrongly while the start situation from reset's IC's output? meanwhile this duration, will standby bitstream acts wrongly if the "start" doesn't access well then corrupted somewhere?

02:25 <aw> wpwrak, okay

02:26 <wpwrak> as far as i understand things, PROGRAM_B_2 low should also keep the FPGA in reset. so it shouldn't try to access the NOR at that time

02:27 <aw> wpwrak, it's R60 placeholder.

02:28 <wpwrak> (r60) yes :)

02:28 <aw> wpwrak, you want me attach a 10K while power is ON

02:28 <aw> or solder it after power off

02:28 <wpwrak> just see how the voltage on TP37 changes

02:28 <wpwrak> with/without R60 "placeholder"

02:33 <aw> wpwrak, TP37 is 318mV, TP36 156mV, d2/d3 dimly lit

02:34 <aw> after attached R60 10K

02:34 <wpwrak> okay, so that's not it. thanks.

02:34 <wpwrak> did you check that the diodes have the correct orientation ?

02:35 <aw> yes, two diodes are correct. this board 0x39 surely had have reconfigured if keep it days long.

02:36 <aw> so i don't know if i directly resoldering new parts of them can solve.

02:39 <wpwrak> hmm, tricky

02:40 <wolfspraul> no resoldering

02:40 <wolfspraul> what is the sequence the board goes through now from the moment power is applied on the DC jack?

02:40 <wpwrak> someone is pulling FLASH_RESET_N down. but who ? could be INIT_B_2, PROGRAM_B_2, the reset chip, the FPGA, or something that's not visible from the schematics

02:40 <wolfspraul> does the fpga ever start its configuration sequence?

02:40 <wolfspraul> or it goes into permanent reset immediately

02:40 <wolfspraul> who is in control, in which order?

02:41 <wolfspraul> is the fpga in control at some point? or always forced down from outside?

02:41 <wolfspraul> wpwrak: ah yes, you think in the same direction :-)

02:41 <wolfspraul> who is in control

02:42 <wolfspraul> can't we just measure backwards?

02:42 <kristianpaul> sorry i don't have all TPs needed to be useful to you now..

02:42 <wolfspraul> flash_reset_n is pulled down

02:42 <wolfspraul> which timespan are we talking about between power-on and permanent reset?

02:42 <wolfspraul> just a few hundred milliseconds?

02:43 <wolfspraul> then we could scope the voltage of FLASH_RESET_N, INIT_B_2, PROGRAM_B_2 and even more and compare them side by side?

02:43 <wpwrak> wolfspraul: (measure backwards) maybe adam can make a little "power probe" :)

02:44 <aw> kristianpaul, yes, the rc2 doesn't have them. sorry that I should have told you first.

02:44 <aw> kristianpaul, thanks though. :-)

02:44 <wolfspraul> wpwrak: which timespan are we looking at?

02:44 <wolfspraul> from power-on to permanent reset

02:44 <wpwrak> aw: do you have through-hole resistors around 100 Ohm ?

02:45 <aw> wpwrak, i don't have but tell me your idea firstly. ;-)

02:45 <wpwrak> wolfspraul: right now, i'm interested in the permanent reset. i think NOR shouldn't be reset in this state. but i'm not 100% sure

02:45 <aw> then I try to get it done

02:45 <aw> wpwrak, tell me firstly your idea. ;-)

02:46 <wpwrak> aw: (idea) solder a wire to 3V3. solder the other end to a ~100 Ohm. connect voltmeter (or scope) to the open end of the R

02:46 <wpwrak> aw: then touch things with the open end. this should do two things: 1) pull them relatively strongly to 3V3. 2) show the voltage

02:46 <wolfspraul> oh totally. with permanent reset we are onto something.

02:46 <wpwrak> or, maybe easier:

02:47 <wolfspraul> I just hope the board shows it long enough :-)

02:47 <aw> wpwrak, go on

02:47 <wpwrak> connect multimeter in DC current measuring mode to 3V3. then touch things with the other probe. e.g., check how much current TP37 can sink

02:48 <wpwrak> points of interest: TP37 (RP#), TP36 (reset chip out), the INIT_B_2 side of R157

02:49 <aw> wpwrak, okay..let's do a surgical operation on 0x39. ;-) stay tuned. :)

02:49 <wpwrak> no surgery. just measurements :)

02:51 <wolfspraul> kristianpaul: can you post your d2 dimly lit situation here?

02:52 <wolfspraul> so you had your rc2 board in a state where d1 was dimly lit and it wasn't detected by jtag?

02:52 <kristianpaul> yes, correct

02:53 <wolfspraul> there may be multiple bugs in this area, and that one may have already been independently fixed on rc3

02:53 <wolfspraul> a lot of 'may', sorry

02:53 <wolfspraul> :-)

02:53 <kristianpaul> sure np

02:53 <kristianpaul> FYI

02:53 <wolfspraul> is your board back to life now?

02:53 <kristianpaul> yes

02:53 <kristianpaul> phew.. :)

02:53 <wolfspraul> ok good

02:53 <wolfspraul> no don't worry

02:54 <wolfspraul> if it breaks, you will get a new one. rc3 even :-) but please don't be reckless because of that :-)

02:54 <kristianpaul> oh no

02:54 <wolfspraul> but also please don't worry

02:54 <wpwrak> wolfspraul: you just gave him a lot of reason to be reckless ;-)))

02:54 <wolfspraul> I am the manufacturer, and I support my stuff.

02:54 <kristianpaul> i still using it as always

02:54 <wolfspraul> that's why I'm so keen on getting rc3 to a higher level...

02:54 <kristianpaul> actually i just was going to reflash as always and then... omg :)

02:55 <wolfspraul> even this guy hadez and others, I may just end up giving them new rc3. but one by one, first we need to make rc3 at a controlled quality level...

02:55 <wpwrak> kristianpaul: maybe as a warm-up, unsolder the FPGA, re-ball, then solder again ;-))

02:55 <kristianpaul> wpwrak: nah

02:56 <kristianpaul> wolfspraul: i still trusting my rc2 a LOT for now ;)

02:56 <wolfspraul> you can

02:56 <wolfspraul> it's a good board and we worked hard even then. of course we also learnt a lot since :-)

02:57 <wolfspraul> wpwrak: your idea with a second reset ic for FLASH_RESET_N - is it the same 4.4V reset ic as we have now?

02:57 <wolfspraul> I'm just asking in case I should get more parts :-)

02:58 <wolfspraul> with 'have now' I meant 'will have in a few days'

02:58 <wpwrak> wolfspraul: yes, the same

02:58 <wolfspraul> that second reset ic would replace the need for logic gates?

02:59 <wolfspraul> sounds like we have to do a few more experiments before settling on the reset circuit for rc4...

02:59 <wpwrak> it would replace one. not sure about the other

02:59 <wolfspraul> (which I wouldn't mix in with the permanent reset research on rc3 we are doing now)

02:59 <wolfspraul> ok, got it

03:03 <aw> wpwrak, take R60 apart, right?

03:04 <aw> then use 100 ohm to pull high on 3V3 then measure dc current.

03:04 <wpwrak> aw: probably doesn't matter

03:04 <aw> wpwrak, ?

03:04 <aw> still let R60 (10k) on it.

03:05 <wpwrak> aw: naw, let's do it the simple way: just multimeter in DC current mode

03:05 <wpwrak> aw: measure current between TP36/TP37/INIT_B_2 and 3V3

03:05 <wpwrak> R60 probably has no effect

03:06 <aw> wpwrak, still need 100 ohm as a limited resistor though, right?

03:06 <wpwrak> if you can, it would be nice to hace

03:06 <wpwrak> haVe

03:06 <aw> okay

03:06 <wpwrak> else, just avoid touching GND ;-)

03:16 <aw> wpwrak, R60 still 10K there, TP37 (RP#, 16mA), TP36(reset pin out, 19mA), INIT_B_2 (24.7mA)

03:16 <aw> with 100 ohm to 3V3 measured. ;-)

03:17 <aw> TP 37 now 18.9mA

03:18 <wpwrak> hmm, they're all very strong
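
To put numbers on "very strong": a rough sketch that turns the currents reported above (measured from 3V3 through the 100 ohm series resistor) into an approximate node voltage and an equivalent pull-down resistance. It assumes the rail sits at exactly 3.3 V, ignores the multimeter's own burden resistance, and treats whatever sinks the current as a plain resistor, so the results are only order-of-magnitude estimates. The function and variable names are made up for this illustration.

```python
# Order-of-magnitude estimate only: assumes an ideal 3.3 V rail, an ideal
# ammeter (no burden resistance) and a purely resistive pull-down.
SUPPLY_V = 3.3        # nominal 3V3 rail
SERIES_R_OHM = 100.0  # series resistor used for the measurement

# Currents reported in the log, in mA.
measured_ma = {
    "TP37 (RP#), first reading": 16.0,
    "TP37 (RP#), second reading": 18.9,
    "TP36 (reset chip out)": 19.0,
    "INIT_B_2": 24.7,
}


def pull_down_estimate(current_ma, supply_v=SUPPLY_V, series_r=SERIES_R_OHM):
    """Return (node voltage in V, equivalent pull-down resistance in ohm)."""
    current_a = current_ma / 1000.0
    node_v = supply_v - current_a * series_r  # what is left after the resistor
    return node_v, node_v / current_a


for name, ma in measured_ma.items():
    node_v, r_ohm = pull_down_estimate(ma)
    print(f"{name}: node ~{node_v:.2f} V, pull-down ~{r_ohm:.0f} ohm")
```

Under these assumptions every node comes out at a few tens of ohms up to roughly 100 ohm to ground. Weak pull-down resistors are normally in the kilohm range or higher, so figures like these point at an actively driven low or a short rather than a passive pull-down.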

03:19 <wpwrak> what do you get on the 3V3 pin of the reset IC ?

03:20 <aw> voltage on TP36?

03:20 <aw> don't understand your question.

03:20 <wpwrak> current between 3V3+100R and pin 3 of U24 (pin 3 is the one on the side that has only one pin)

03:21 <wpwrak> expected value: 0.0000000 A ;-)

03:21 <aw> i measured pin 3V3 of reset ic is good @ 3.3V. ;-)

03:21 <wpwrak> okay

03:22 <wpwrak> this is weird. could there be any shorts ?

03:22 <wpwrak> high current on TP36 should only exist if:

03:22 <aw> i got -0.001 mA while attached pin 3v3 of reset ic

03:23 <aw> so no leakage current though. ;-)

03:23 <wpwrak> - PROGRAM_B_2 is actively pulling low (which, afaik, it never does)

03:23 <wpwrak> - the reset chip is pulling low (which it has no reason to do)

03:23 <wpwrak> - something is shorted into that net

03:23 <aw> yes, TP36 (program_b) is 67mV now

03:24 <wpwrak> and it pulls low with ~20 mA

03:24 <aw> yes

03:24 <aw> but don't know where surged

03:26 <aw> regards to if somewhere is short existed. this is really weird

03:26 <wpwrak> next: connect scope to TP36, acquisition: peak, trigger: auto, slow timebase (maybe 100-200 ms/div)

03:26 <aw> okay

03:26 <wpwrak> then power-cycle the board. see if it ever comes out of reset

03:31 <aw> i see a pulse from low -> high -> low using rising edge

03:32 <aw> trying to catch it ;-)

03:32 <wpwrak> for how long does it stay high ?

03:32 <aw> yeah...wait

03:34 <wpwrak> (roughly) picoseconds / milliseconds / days :)

03:47 <aw> wpwrak, http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x39_tp36.JPG

03:50 <kristianpaul> http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x39_tp36.JPG ?

03:50 <kristianpaul> oops

03:50 <kristianpaul> sorry

03:51 <aw> yes

03:53 <wolfspraul> I cannot imagine a short, unless it's a short coming from (programmed) inside the fpga

03:53 <aw> wpwrak, i may use two channels to compare

03:53 <wolfspraul> that's because the board worked before, and got into this state without any hardware action (manual hardware action, like soldering)

03:53 <aw> wpwrak, maybe scope DONE pin? RP#?

03:54 <aw> in channel 2. ;-)

03:54 <wolfspraul> aw: if you think that might be helpful, just do it until Werner is back...

04:11 <aw> wpwrak, forget about my last picture, see this new: http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x39_ch1-tp36_ch2-tp37.JPG

04:11 <aw> ch1-tp36, ch2-tp37

04:11 <aw> they are synced actually.

04:12 <aw> i'm going to scope done pin as ch2 to see different

04:23 <aw> wpwrak, ch1-tp36, ch2-tp35(done pin): http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x39_ch1-tp36_ch2-done-tp35.JPG

04:25 <aw> the duration from the first pull high pulse to second pulse is thus ~180ms (which is the reset delay time)

04:27 <aw> alright..i think i need to test others after lunch

04:27 <aw> leave 0x39 aside temporarily. ;-)

04:30 <wolfspraul> aw: yes, let's wait for Werner's feedback and continue with other boards first

04:30 <wolfspraul> 'permanent reset' is an interesting new angle, maybe we are lucky and find something there...

04:31 <aw> hopefully get secrets behind it.

04:46 <kristianpaul> xiangfu: you can load bitstream using jtag pload

04:47 <kristianpaul> xiangfu: _but_ your mm1 soc doesn't support boot from anything besides NOR as it is today.. maybe the debug ROM is linked to jtag and you can boot from there.. i don't know up to there...

04:48 <kristianpaul> i hope i'm wrong on that and i missed that part of the HDL :)

04:49 <wolfspraul> now that we know that at least some boards are held in a permanent reset state, some of those earlier ideas lost value (for now)

04:49 <kristianpaul> yes,

04:49 <wolfspraul> because even if we could load and boot everything without nor on a functioning board, it wouldn't work on 0x39

04:49 <kristianpaul> correct

04:50 <wolfspraul> same for trying Xilinx Impact (which we skipped)

04:51 <wolfspraul> actually - is there still the chance that even on the 0x39 we have now, the fpga first reads a corrupted bitstream and then ends up forcing itself into permanent reset?

04:51 <kristianpaul> can we reset fpga from jtag?

04:51 <wolfspraul> probably not because then the access path via jtag-serial should still work, which it doesn't

04:52 <wolfspraul> good idea [reset fpga from jtag]

04:52 <kristianpaul> ah yes !!

04:52 <kristianpaul> mom

04:52 <kristianpaul> pld reconfigure

04:53 <wolfspraul> but that would end up doing the same thing, no?

04:53 <wolfspraul> it would read something from nor... ?

04:53 <kristianpaul> ah, yes :)

04:53 <kristianpaul> well bitstream must be loaded from somwhere

04:53 <wolfspraul> can we tell it to reconfigure from elsewhere?

04:53 <wolfspraul> like from what we supply over jtag

04:54 <kristianpaul> yeah, thinking same..

04:54 <wolfspraul> but first maybe reset, then reconfigure from jtag

04:54 <kristianpaul> if the problem that fires permanent reset is on the power-cycling

04:54 <kristianpaul> a pld reconfigure should succeed

04:55 <kristianpaul> as the board is already powered,

04:55 <kristianpaul> my guess

04:55 <wolfspraul> sure we can try, but let's assume there is a nor corruption

04:55 <wolfspraul> then it would hang itself again

04:55 <kristianpaul> yeap

04:55 <wolfspraul> if that nor corruption triggers the permanent reset

04:55 <wolfspraul> but how can we flash the board when nor is still empty?

04:56 <kristianpaul> _if_ what hangs is nor corruption and not some wrong timings with reset IC perhaps?

04:56 <kristianpaul> ha, just wipe a board that is known to boot and see what happens

04:56 <wolfspraul> does the fpga 'give up' when there's no bitstream in nor, but later it finds a corrupted bitstream and hangs itself?

04:56 <kristianpaul> should be a similar state as the no corruption, as at the end..

04:57 <kristianpaul> hum no may be it starts okay then errors pop up and it just give up...

04:57 <wpwrak> hmm ...

04:57 <wolfspraul> ah :-)

04:58 <kristianpaul> or fpga didn't catch bitstream corruption and locked itself up because of that?

04:58 <kristianpaul> thats even more trickier ;)

05:00 <wpwrak> the pulse looks scary in http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x39_tp36.JPG

05:00 <kristianpaul> agree :)

05:01 <kristianpaul> malfunctioning diode or reset ic?

05:01 <wolfspraul> we can definitely try a pld reconfigure on 0x39 when Adam is back, and measure FLASH_RESET_N etc. then

05:02 <wolfspraul> wpwrak: Adam uploaded 2 more later and said "forget the first one"

05:08 <wpwrak> it's hard to forget what looks like a 6 V+ pulse on a 3.3 V line :)

05:08 <kristianpaul> no no

05:08 <kristianpaul> horrible :)

05:10 <wpwrak> the others look better. some amplitudes seem too small, but maybe that's a limitation of the scope

05:12 <wpwrak> what is weird is that TP36 (PROGRAM_B_2) comes down. so either there's contamination from INIT_B_2 or FLASH_RESET_N, or the reset chip triggers for some unfathomable reason, or PROGRAM_B_2 becomes an output.

05:13 <wolfspraul> wpwrak: hmm. but what's wrong with the first picture then?

05:13 <wolfspraul> bad measurement?

05:14 <wolfspraul> or there was a 6V pulse on a 3.3V line?

05:14 <wpwrak> i'm curious what adam thinks he measured there :)

05:16 <wpwrak> for now, 0x36 looks more like a case of "multiple organ failure"

05:16 <wpwrak> err 0x39

05:17 <wpwrak> maybe put it aside and proceed with the next from the list we made yesterday

05:21 <wolfspraul> hmm

05:21 <wolfspraul> no other ideas for 0x39 ?

05:21 <wolfspraul> at least we can try "pld reconfigure" over urjtag

05:23 <wolfspraul> we can take another board and see whether we find the same permanent reset state

05:24 <kristianpaul> I'm out..

05:24 <kristianpaul> gn8

05:24 <wpwrak> for 0x39, the next thing i would try with the information we currently have is to remove the diode between reset chip and FLASH_RESET_N

05:25 <wpwrak> decouple the two systems, at the risk of now getting real NOR corruption

05:25 <wolfspraul> sure let's do that then

05:25 <wolfspraul> remove the diode and it may boot again?

05:25 <wpwrak> but i'd be curious about what lekernel thinks of the PROGRAM_B_2 net going low

05:26 <wpwrak> as i understand things, that can only happen if the reset chip pulls low, which it has no reason to do

05:26 <wpwrak> but my understanding may be incomplete

05:26 <wolfspraul> I'm curious why Adam said we should 'forget' the first tp36 picture

05:27 <wpwrak> if there's a condition in which PROGRAM_B_2 could become an output and pull low, that would be interesting to know

05:27 <wpwrak> yes, me too :)

05:27 <wolfspraul> we could replace the reset ic

05:27 <wolfspraul> but I hesitate to do these kinds of things while we are in analysis mode

05:27 <wpwrak> "do not look at the elephant" ;-)

05:27 <wolfspraul> well just replace with a new one

05:28 <wolfspraul> but maybe then our beautiful study object will work again and not tell us any more interesting stories

05:28 <wpwrak> or maybe just leave it out for testing. afaik, the FPGA shouldn't need it

05:28 <wpwrak> that could happen :)

05:28 <wolfspraul> yes

05:29 <wolfspraul> ok so

05:29 <wolfspraul> 1) find out why Adam said to ignore the first tp36 scope picture

05:29 <wolfspraul> 2) try 'pld reconfigure' from urjtag and see whether it stays in permanent reset

05:30 <wolfspraul> 3) remove the diode between reset ic and FLASH_RESET_N, see whether it boots

05:30 <wolfspraul> 4) remove the reset IC, see whether it boots

05:30 <wolfspraul> correct?

05:30 <wpwrak> but as i said, there are several things that look wrong on 0x39. the 6+ V spike is worrying, if it's real

05:30 <wolfspraul> and then maybe, take another board and check whether we find a similar permanent reset condition there

05:30 <wpwrak> what does "pld reconfigure" do ? is this a reset ?

05:30 <wolfspraul> well

05:30 <wolfspraul> yes

05:30 <wolfspraul> seems like

05:30 <wpwrak> agreed on 1)

05:31 <wpwrak> 2) also seems reasonable

05:31 <wolfspraul> unfortunately I don't know the exact behavior of reset

05:31 <wolfspraul> will it automatically load the standby from nor?

05:31 <wolfspraul> is it possible that it loads a corrupted bitstream from nor which then locks itself (the fpga) up in permanent reset?

05:31 <kristianpaul> yes it will wolfspraul

05:31 <wpwrak> before 3), i'd like to have lekernel's opinion on PROGRAM_B_2 being driven low at ~20 mA while the board is powered

05:32 <wolfspraul> how come when we flash a board for the first time (nor empty), it will not load anything from nor (it's impossible because there is nothing there yet)

05:32 <wpwrak> it probably tries to load NOR but fails (or maybe the CRC is correct and it just loads garbage :)

05:32 <wolfspraul> well

05:33 <wolfspraul> can it hang itself up?

05:33 <wolfspraul> can the fpga itself be stuck in a loop that always ends in a permanent reset?

05:33 <wpwrak> can PROGRAM_B_2 become an output ? :)

05:34 <wolfspraul> ok, so no #3 or #4 until we hear from Sebastien

05:34 <wolfspraul> but in the meantime we can look at another board, with the new permanent reset focus

05:34 <wpwrak> yup

05:34 <wpwrak> if the same pattern appears on other boards, that would be good to know

05:35 <wolfspraul> ok we have 0x3C

05:35 <wolfspraul> but that one may not yet be in this state

05:35 <wpwrak> for all we know, 0x39 may have been ESD-fried ;-)

05:35 <wolfspraul> unlikely

05:36 <wolfspraul> if we would be hunting a rare problem (like 1 out of 100), and always poking around on the same board, then after some time I would agree and say "let's forget it until we have more boards"

05:36 <wpwrak> we do have photographic evidence of a 6+ V spike in a system that runs at 3.3 V and is supplied from a ~5 V supply. the supernatural is already there, on digital film :)

05:36 <Thihi> Dunno if anyone of you saw this, since I pasted this during the night. Anyway: http://kukka.siilo.fi/~kuutio/11-08-13-kissastuskausi.mkv - you guys might be interested in this. A small sample of what I do with a projector and a camera. Music has been ripped off from Boards of Canada.

05:36 <wolfspraul> well that may be cleared up fast

05:37 <wolfspraul> but anyway, we have enough boards and a problem cluster now to be sure it's not caused by ESD or other one-off phenomena

05:37 <wolfspraul> that's why we couldn't effectively dig in on the rc2 run (in addition to making mistakes how to handle it there)

05:39 <wpwrak> wolfspraul: oh, i think the cluster is real. just don't know what's up with 0x39

05:40 <wpwrak> 0x39 exhibits at least two phenomena that contradict my understanding of things: 1) the spike, 2) PROGRAM_B_2 being driven low (for more than 200 ms)

05:42 <wolfspraul> we could take 0x3C, 0x7F, 0x61, 0x40

05:42 <wolfspraul> I think 0x40 is erroneously set to 'available'

05:43 <wolfspraul> let's look at 0x40 first, then we can clear that up as well

05:43 <wolfspraul> if 0x40 is really good now, we can take 0x61 ?

05:43 <wolfspraul> well we have plenty

05:43 <wolfspraul> we just try to find a second one to support the permanent reset theory

05:44 <wolfspraul> wpwrak: the spike supports my idea that some 'bad' event is happening that may sometimes cause lasting damage

05:44 <wolfspraul> and for program_b_2 being driven low, I would think we find more instances of that now that we look for it, on 0x61 and others

05:45 <wolfspraul> let's see

05:45 <wpwrak> my list of boards that look as if they belonged to the cluster: 0x36, 0x3a, 0x55, 0x67, 0x6d, 0x6f, 0x70, 0x77, maybe 0x7a

05:46 <wolfspraul> he

05:46 <wolfspraul> all different from mine

05:46 <wolfspraul> ok let me look at your list...

05:46 <wpwrak> yeah :)

05:46 <wpwrak> we can pick one from each list ;-)

05:46 <wolfspraul> ah ok

05:46 <wolfspraul> I stay away from boards that have never rendered before

05:47 <wolfspraul> such as 0x36, 0x3A

05:47 <wpwrak> i see

05:47 <wolfspraul> your whole list :-)

05:47 <wolfspraul> of course it could be the same thing

05:47 <wolfspraul> maybe on those boards right from the beginning

05:48 <wolfspraul> but we risk running into one that simply has bad flash soldering or so

05:48 <wpwrak> yes, could be

05:48 <wolfspraul> aw: there you are :-)

05:48 <wolfspraul> so...

05:49 <wolfspraul> we have plenty of new ideas :-)

05:49 <wolfspraul> ready?

05:49 <wpwrak> aw: why did you want us to ignore http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x39_tp36.JPG ?

05:49 <wpwrak> aw: and what do you think was what looks like a >= 6 V spike ?

05:50 <aw> roh, back from post office and picked 2nd shipment up, tks.

05:50 <aw> wpwrak, that one when you read it, please use divide 10X, i forgot to set the setting to X1. ;-)

05:51 <aw> so forget it

05:51 <wpwrak> aah ! :)

05:51 <aw> and just use the second picture with two channels. sorry that.

05:51 <aw> that second is exactly correct. ;-)

05:51 <wpwrak> okay, makes a lot more sense then :)

05:52 <aw> alright, so how was new ideas?

05:53 <wpwrak> waiting for lekernel to tell us if PROGRAM_B_2 can be an output

05:53 <wolfspraul> aw: next idea is this:

05:53 <wolfspraul> take 0x39, power it with jtag-serial connected to your computer

05:53 <wolfspraul> usb full-speed as always

05:53 <wolfspraul> then run 'jtag' manually (not a script)

05:54 <aw> wpwrak, aha...if PROGRAM_B_2 has been pulled low while powered-up? just guess, right?

05:54 <wolfspraul> then "cable milkymist" then "detect" then "pld reconfigure"

05:54 <aw> wolfspraul, okay

05:55 <wolfspraul> I don't know about the two 'instruction' lines from the script

05:55 <wolfspraul> maybe we add those too?

05:55 <wolfspraul> so it's

05:55 <wolfspraul> 1. cable milkymist

05:55 <wolfspraul> 2. detect

05:55 <wolfspraul> 3. instruction CFG_OUT 000100 BYPASS

05:55 <wolfspraul> 4. instruction CFG_IN 000101 BYPASS

05:55 <wolfspraul> 5. pld reconfigure

05:56 <wolfspraul> type those commands manually

05:56 <wolfspraul> then check whether the board is still in permanent reset

05:57 <kristianpaul> try middle button after just in case :)

05:57 <wolfspraul> no confusion. that's later I think.

05:57 <kristianpaul> s/after/later

05:58 <wolfspraul> you want to try whether it boots? it first needs to survive reconfiguration...

05:58 <kristianpaul> yeap

05:59 <wolfspraul> kristianpaul: are the two 'instruction' lines necessary?

05:59 <kristianpaul> dont know..

06:00 <kristianpaul> i dont think, but i can arge a reason now

06:01 <wolfspraul> we just leave them in

06:01 <aw> after those cmd, TP37(RP#) is 236mV

06:02 <aw> http://pastebin.com/qQMgy2a7

06:02 <aw> d2/d3 dimly lit surely :)

06:03 <wolfspraul> ok

06:03 <kristianpaul> argh..

06:03 <wolfspraul> so 'pld reconfigure' does not do any magic

06:03 <wolfspraul> no problem

06:03 <wolfspraul> aw: I have a question about 0x40

06:03 <wolfspraul> why is it set to 'available'? (I look at the wiki test results)

06:07 <aw> wolfspraul, mm..good catch. it must be I powered -on again and test rendering pass, so I marked as 'available'; this was done before you told me that don't put it like as 'available' once it has been haven d2/d3 dimly lit before.

06:07 <wolfspraul> hmm

06:07 <aw> delete it now.

06:07 <wolfspraul> so the 'notes' are not complete?

06:08 <wolfspraul> ok I got it already, so the board does work now

06:08 <wolfspraul> alright, let's look at 0x3C then

06:08 <aw> no. it must be all rendering pass, so i marked available but forgot to fill some notes.

06:08 <wolfspraul> got it

06:08 <wolfspraul> let's put 0x39 aside, and look at 0x3C

06:09 <wolfspraul> power it, see whether it boots...

06:09 <wolfspraul> I want to find another board that stops with d2/d3 dimly lit, or cannot reconfigure, or cannot reflash

06:10 <aw> okay

06:11 <aw> 0x3c: can reconfigure

06:11 <aw> so do same manual cmd like above?

06:12 <aw> 0x3c: TP37 surely is 3.3V now

06:13 <wolfspraul> wait

06:13 <wolfspraul> try a few power cycles with test software (up to crc check only)

06:13 <wolfspraul> 10

06:15 <aw> ok

06:16 <aw> mm....1st press middle btn then d2/d3 dimly lit now

06:16 <wolfspraul> well nice

06:16 <wolfspraul> tp37

06:17 <aw> wait

06:18 <aw> TP37: now is much messy level from 1.2V to 3.3V. messy pulses!

06:19 <aw> mm...now is 3v3 and d2/d3 dimly lit is GONE.

06:19 <wolfspraul> he

06:20 <wolfspraul> but I think what you saw initially "messy from 1.2 to 3.3" and then to 3.3 and then dimly lit is gone all confirms our work so far

06:20 <aw> bad that I cant took TP3 pictures when it's pulse in unstable.

06:20 <wolfspraul> no problem, we believe you and it's in the chat :-)

06:20 <aw> yes

06:20 <aw> so how's the next we need?

06:20 <wolfspraul> let's see whether wpwrak is still around

06:20 <aw> press middle btn again?

06:21 <wolfspraul> sure, try

06:21 <wolfspraul> I would think it boots

06:21 <aw> same now

06:22 <aw> i stop the scope

06:22 <wolfspraul> same what?

06:22 <aw> let me take pictures

06:22 <wolfspraul> you pressed the middle button and then?

06:23 <wolfspraul> oh, more 'pulses' maybe :-)

06:23 <aw> when d2/d3 dimly lit, TP37 has messy pulse from 1.2V to 3.3V, many pulses variance on stays this area

06:25 <wolfspraul> did you press the middle button? what happened?

06:26 <wolfspraul> if you can keep it in this state of 'messy pulses', can you measure the other test points as well?

06:27 <aw> http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x3c_ch1-tp37.JPG

06:27 <wolfspraul> aw: did you press the middle button? what happened?

06:28 <aw> actually it's low level reached at least down to 1.2V

06:28 <wolfspraul> please describe the sequence of events there, then it's easier to come up theories

06:28 <aw> it goes into d2/d3 dimly lit after I press middle btn again

06:28 <wolfspraul> interesting

06:28 <wolfspraul> so it did not boot

06:28 <wolfspraul> instead, d2/d3 dimly lit again?

06:29 <aw> now its' in dimly lit

06:29 <wolfspraul> ok

06:29 <aw> maybe my prober let TP37 recovered then rised to 3V3

06:30 <wolfspraul> well, we have to wait for more thoughts from Werner or Sebastien. I suggest you continue with regular testing and fixing across the batch.

06:30 <aw> so reset on flash chip is asserted then d2/d3 became fully OFF

06:30 <aw> then it goes dimly lit after I press middle btn again.

06:30 <wolfspraul> with us finding a second board right away that may very well be in a similar or same 'permanent reset' state, I think it confirms what we found on 0x39

06:31 <wolfspraul> right now it's dimly lit?

06:31 <wolfspraul> can you measure the other test points?

06:31 <wolfspraul> TP36, INIT_B_@

06:31 <wolfspraul> INIT_B_2

06:31 <wolfspraul> :-)

06:31 <wolfspraul> I think those, right?

06:33 <aw> now 0x3c: TP37 keeps stable low (209mV, surely dimly lit)

06:33 <wolfspraul> measure TP36, INIT_B_2

06:33 <aw> TP37(flash chip reset pin), TP36 (PROGRAM_B_2)

06:34 <wolfspraul> yes just to compare with 0x39, measure TP36 and INIT_B_2

06:35 <aw> 0x3c: program_b_2 TP36 is stable 3.3V

06:36 <aw> we need to know also from lekernel : what fpga does after pressing middle btn?

06:36 <aw> now...i am blind though

06:37 <wolfspraul> did you measure init_b_2 ?

06:38 <aw> init_b_2 is good low

06:38 <wolfspraul> ok

06:38 <wolfspraul> I suggest you continue with regular testing and fixing now

06:38 <aw> tp36 and tp37 is now unstable pulse together!

06:38 <wolfspraul> ok

06:38 <wolfspraul> we have enough data about 0x3C (I do)

06:38 <aw> unstable (some sort like high impedance from fpga), maybe I don't know.

06:38 <wolfspraul> I suggest you go back to the regular testing and fixing

06:39 <aw> mm

06:39 <wolfspraul> the more 'other' bugs we can fix across all boards, the less likely we are to later be confused when investigating the 'permanent reset' problem more

06:39 <wolfspraul> keep the 'notes' column updated with anything unusual or suspicious you see with a board

06:40 <wolfspraul> also, from now on, I suggest if you run into any of these problems: 1) d2/d3 dimly lit, 2) cannot reconfigure, 3) cannot reflash, you measure TP36 and write the value you see in the notes column

06:40 <wolfspraul> or maybe TP36 and TP37 - both? don't know

06:40 <aw> ok

06:40 <wolfspraul> maybe both :-)

06:41 <aw> well...too many

06:41 <wolfspraul> ok, then only TP36

06:41 <aw> 0x39 and 0x3c is good data now

06:41 <wolfspraul> no I mean for new boards

06:41 <wolfspraul> when you test them

06:41 <aw> well....umm...ok

06:41 <wolfspraul> and if you run into dimly lit/reconfig/reflash problem

06:41 <wolfspraul> normally you would stop there

06:41 <aw> once i run into failure, i write notes. ;-)

06:41 <wolfspraul> but now you measure TP36, and write it into the 'notes' column
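
The logging rule agreed above (on any of the three symptoms, measure TP36 and put the value in the 'notes' column) could be sketched roughly as follows. This is only an illustration: the function name `notes_entry` and the note format are hypothetical and not part of the actual wiki test table.

```python
# Symptoms that trigger a TP36 measurement, as listed in the chat.
TRIGGER_SYMPTOMS = {"d2/d3 dimly lit", "cannot reconfigure", "cannot reflash"}


def notes_entry(board, symptoms, tp36_volts=None):
    """Build a 'notes' line for a board, or return None if no trigger symptom.

    The note format is an assumption made for this sketch.
    """
    hits = sorted(set(symptoms) & TRIGGER_SYMPTOMS)
    if not hits:
        return None  # nothing unusual in the sense of the rule: no TP36 note needed
    if tp36_volts is None:
        raise ValueError("measure TP36 before writing the note")
    return f"{board}: {', '.join(hits)}; TP36 = {tp36_volts:.3f} V"


if __name__ == "__main__":
    # 0x3C was dimly lit with TP36 at a stable 3.3 V (reported at 06:35).
    print(notes_entry("0x3C", ["d2/d3 dimly lit"], 3.3))
```

Keeping the trigger list in one place makes it easy to add TP37 later if both test points turn out to be worth recording.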

06:41 <wolfspraul> yes

06:42 <wolfspraul> aw: I think that's a good idea, no?

06:42 <aw> well...write firstly though

06:43 <aw> i am manually operator now to test...a little afraid of my memory to forget many boards though, not bad idea...just slow only. ;-)

06:51 <wolfspraul> why forget. just go through the batch one by one, let's fix everything we know we understand.

06:51 <wolfspraul> starting from the easiest fixes, to the more difficult ones

06:51 <wolfspraul> in parallel when Werner or Sebastien are back we continue with the permanent reset investigatio

06:51 <wolfspraul> investigation

06:52 <wolfspraul> but now back to regular testing...

06:58 <aw> 0x3c & 0x39 & 0x40 I updated results

06:58 <wolfspraul> nice

06:58 <aw> yes, I go for others now

06:58 <wolfspraul> good

06:58 <wolfspraul> let's fix all easy and simple things we already know about clearly

06:59 <aw> scope pictures linked there.

06:59 <aw> yes

08:37 <lekernel> "still stopped 'Bitstream length: 1484404'"

08:37 <lekernel> please do NOT make such reports again. instead, input the urjtag commands manually and do not use the batch file. better, enable some debug output.

08:43 <lekernel> PROGRAM_B is not driven http://www.xilinx.com/support/documentation/user_guides/ug380.pdf

08:48 <wolfspraul> lekernel: any idea how it can end up in the permanent reset state as observed earlier?

08:48 <lekernel> so it's confirmed? what we thought was "NOR corruption" on RC3 is just permanent reset?

08:49 <wolfspraul> well. making the judgment is the hard part.

08:49 <lekernel> hard? wtf

08:49 <lekernel> is the voltage on TP37 high or low when the board fails?

08:50 <wolfspraul> no making the judgment of whether we look at a "nor corruption" or "permanent reset"

08:50 <wolfspraul> because we think about a lot of boards and even more testing data. there may be multiple bugs, or different problems on different boards.

08:51 <lekernel> ok, whatever

08:51 <wolfspraul> earlier we did some tests on 0x39, did you see that in the backlog?

08:51 <lekernel> on the board you are debugging right now

08:51 <lekernel> what is the voltage on TP37 when it fails?

08:51 <lekernel> and I just mean after initial power up , no booting etc.

08:52 <wolfspraul> on 0x3C, we had pulses between 1.2V and 3.3V on TP37

08:52 <wolfspraul> on 0x39, it was around 200mV I think, checking backlog...

08:53 <lekernel> it should never be 1.2V

08:53 <lekernel> and never pulse

08:53 <wolfspraul> yes

08:54 <wolfspraul> TP37 318mV on 0x39

08:54 <lekernel> 200mV is not correct either, that would permanently reset the flash

08:54 <wolfspraul> correct

08:54 <wolfspraul> are you reading the backlog at all?

08:58 <lekernel> yes but it's quite not clear

08:59 <lekernel> wasn't Adam supposed to measure what drives that TP37 low?

09:16 <wolfspraul> hmm

09:16 <wolfspraul> ok, seems we are stuck

09:16 <wolfspraul> did we answer some of Werner's questions?

09:16 <wolfspraul> lemme see...

09:16 <wolfspraul> "waiting for lekernel to tell us if PROGRAM_B_2 can be an output"

09:16 <wolfspraul> the answer seems to be: no

09:17 <lekernel> yes, it is "no"

09:17 <wolfspraul> good, thanks :-)

09:17 <wolfspraul> now, there was another one

09:19 <wolfspraul> "high current on TP36 should only exist if: - PROGRAM_B_2 is actively pulling low (which, afaik, it never does) - the reset chip is pulling low (which it has no reason to do) - something is shorted into that net"

09:19 <wolfspraul> and it pulls low with ~20 mA

09:20 <wolfspraul> so PROGRAM_B_2 is not the culprit

09:20 <wolfspraul> that leaves the reset ic or some short

09:21 <wolfspraul> Werner's next idea was to remove diode and reset ic and see what happens

10:18 <lekernel> ok, sounds good

10:45 <zumbi> lekernel: hello! someone pointed me to talk to you about some question I had

10:45 <zumbi> lekernel: basically I was looking into converting bitstream to netlist

10:46 <zumbi> lekernel: I found a tool called 'debit' at ulogic.org, but it does not seem to be online anymore

10:47 <zumbi> lekernel: do you know where can I find such tool? or any idea on how to reverse engineer bitstream?

10:47 <lekernel> http://www.milkymist.org/3rdparty/

10:47 <zumbi> lekernel: oh! wow, thanks

11:24 <kristianpaul> lekernel: what's the behavior of fpga when never get a bitstream from nor?

11:24 <kristianpaul> of our fpga in rc3 of course

11:26 <wpwrak> booting ... (sorry for my erratic napping pattern. bloody cold is messing with me :-( )

11:27 <wolfspraul> wpwrak: wow, take good care of yourself

11:28 <wolfspraul> your relentless support for Milkymist One is amazing anyway, I'm flattered and feel bad that I cannot debug those bloody circuits better myself :-)

11:28 <kristianpaul> indeed

11:29 <wolfspraul> wpwrak: Sebastien just confirmed that PROGRAM_B_2 cannot be an output, I guess that means next on 0x39 we remove the diode?

11:30 <wolfspraul> aw: can we do another 0x39 session?

11:30 <wpwrak> (reading backlog ... INIT_B_2 has no test point. but can be measured on R157)

11:30 <aw> yes, go on

11:30 <wolfspraul> wpwrak: oh we had some interesting results on 0x3C, but I think they confirm what we saw on 0x39

11:32 <wolfspraul> aw_: go back to 0x39, turn it on, tell us what happens

11:33 <aw_> are you sure 0x39 or 0x3c(this has messy pulses)

11:33 <wolfspraul> 0x39

11:33 <aw_> mm

11:36 <wpwrak> yeah, let's remove the diode. untangle the knot a bit :)

11:36 <wolfspraul> is it clear which one?

11:36 <wpwrak> (the diode between reset out and FLASH_RESET_N, keep the one between INIT_B_2 and PROGRAM_B_2 for now)

11:37 <wpwrak> (configuration document) ah, a godsent ;-) just wished it was 1/10 the size :)

11:38 <wolfspraul> wpwrak: did you see the 3C results? voltage between 1.2 and 3.3...

11:38 <wpwrak> (0x3c results) yeah, says "something's weird" :)

11:38 <wolfspraul> looks similar to what we see on 0x39, no?

11:38 <wpwrak> i wonder if we have some unintended connections between things

11:39 <wpwrak> maybe the "fix2" rework left some trouble

11:39 <wolfspraul> keep in mind boards first working and then falling into this state

11:39 <aw_> wpwrak, remove the diode which is between reset out and FLASH_RESET_N?

11:39 <wolfspraul> aw_: wait, first turn 0x39 on and tell us what you see

11:40 <wpwrak> aw: yes

11:40 <wolfspraul> d2/d3 still dimly lit? voltage on tp36 ?

11:40 <wolfspraul> wpwrak: so at least for the part of the problem that we see on a board that first works and then falls into permanent reset, it cannot be caused by some permanent short/connection on the board

11:41 <wolfspraul> maybe current flows the wrong way somewhere and slowly damages a part?

11:41 <wpwrak> voltages around 1.2-1.3 V look like some things working against each other

11:41 <aw_> 0x39: d2/d3 dimly lit, tp36- 78mV, tp37 - 238mV, init_b - 35mV

11:41 <wpwrak> i'm not sure we're seeing actual damage

11:42 <wolfspraul> ok perfect

11:42 <wpwrak> aw_: that's after diode removal ?

11:42 <wolfspraul> aw_: now power off, remove the diode

11:42 <wolfspraul> wpwrak: no, before (pretty sure)

11:42 <wpwrak> ah :)

11:42 <aw_> still have diode

11:42 <wpwrak> the executioner's axe don't swing so swiftly ;-)

11:42 <wolfspraul> wpwrak: one difficulty is that the problem is known to have spontaneously disappeared before

11:43 <wolfspraul> so even if the removal of the diode makes it go away, it may not be because of the removal of the diode

11:43 <wpwrak> indeed. but if it doesn't come back on, say, 0x39, that's an indication

11:43 <wolfspraul> yes sure

11:43 <wpwrak> besides, we should still observe anomalies

11:43 <wolfspraul> just saying why I wanted the baseline

11:44 <wpwrak> but now the anomalies will be in separate analog domains. easier to tame (i hope) :)

11:46 <aw_> so now, remove diode which between reset out and FLASH_RESET_N ?

11:46 <wpwrak> yup

11:51 <wolfspraul> have we ruled out some stupid mistake on the affected boards like diodes in wrong polarity? some mistake to the circuit on those particular boards?

11:52 <wolfspraul> since some steps were done manually (fix2), those would need to be double-checked

11:52 <wpwrak> i think so. i've asked adam to do a visual inspection and also to check the diodes

11:52 <wolfspraul> or would a wrong polarity diode lead to different behavior anyway?

11:52 <wpwrak> but in any case, a systematic search will turn up such things too

11:54 <wpwrak> lekernel: is the internal pull-up on P22 (FLASH_RESET_N) asserted during reconfiguration ?

11:54 <aw_> 0x39: d2/d3 dimly lit after removed diode(D16), tp36 - 45mV, tp37 - stable 3.3V, init_b - 26mV

11:54 <lekernel> yes, it should be

11:54 <lekernel> but if it's not, that might well be the problem

11:55 <wpwrak> aw_: very good. FLASH_RESET_N is off the hook for now

11:55 <aw_> wpwrak, yup

11:55 <wpwrak> now, who's driving PROGRAM_B low ?

11:56 <wpwrak> we have three candidates: 1) the reset chip, 2) INIT_B, 3) divine intervention
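
The reasoning above (TP37 recovers once D16 is gone, while TP36 and INIT_B stay low) can be sketched as a small comparison of the two 0x39 readings reported at 11:41 and 11:54. The thresholds `LOW_MAX_V` and `HIGH_MIN_V` are assumptions chosen for this sketch, not values from a datasheet.

```python
from dataclasses import dataclass

# Assumed thresholds for a 3.3 V rail; illustration only.
LOW_MAX_V = 0.4   # at or below this, call the net "low"
HIGH_MIN_V = 2.9  # at or above this, call the net "high"


@dataclass
class Reading:
    label: str
    tp36: float    # PROGRAM_B_2, volts
    tp37: float    # flash RP# (FLASH_RESET_N), volts
    init_b: float  # INIT_B_2, volts


def level(volts):
    """Classify a voltage as low, high, or intermediate."""
    if volts <= LOW_MAX_V:
        return "low"
    if volts >= HIGH_MIN_V:
        return "high"
    return "intermediate"


def summarize(reading):
    """Map each measured net to its logic level."""
    return {
        "PROGRAM_B (TP36)": level(reading.tp36),
        "FLASH_RESET_N (TP37)": level(reading.tp37),
        "INIT_B": level(reading.init_b),
    }


# Values as reported by aw_ in the chat, converted to volts.
READINGS = [
    Reading("0x39, D16 mounted", tp36=0.078, tp37=0.238, init_b=0.035),
    Reading("0x39, D16 removed", tp36=0.045, tp37=3.3, init_b=0.026),
]

if __name__ == "__main__":
    for r in READINGS:
        print(r.label, summarize(r))
```

With D16 removed only TP37 changes level, which matches the conclusion that FLASH_RESET_N was merely following PROGRAM_B through the diode.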

11:57 <wpwrak> aw_: can you please power down and test whether the diode between PROGRAM_B and INIT_B does indeed work like a diode ? (if your multimeter has a diode test, that would be the easiest)

11:57 <aw_> wpwrak, do we need to trigger program_b again to see power on sequence?

11:58 <aw_> wpwrak, ok

11:59 <wpwrak> (trigger PROGRAM_B) heh, that seems to happen even without us doing anything. little gremlins are at work here ;-)

11:59 <aw_> wpwrak, you got right, now it's not activated as a diode behavior

12:00 <aw_> wpwrak, but let me confirm this again

12:00 <lekernel> crappy diodes breaking down?

12:01 <wpwrak> lekernel: maybe bad soldering

12:01 <wpwrak> wolfspraul: or fake diodes ? ;-))

12:02 <wolfspraul> a ghost

12:03 <aw_> wpwrak, forwarding voltage is 11.8mV, reversing voltage is -9mV

12:03 <lekernel> lol

12:03 <wpwrak> the small voltage difference between INIT_B and PROGRAM_B suggests that they don't act much like diodes ...

12:03 <wpwrak> hehe ;-)

12:03 <aw_> now I am going to take apart this diode and measure it again

12:03 <wpwrak> they're 0R ! ;-))

12:03 <lekernel> ah, you did not take it apart?

12:04 <wolfspraul> is there a chance we are wearing out diodes, or is that impossible?

12:04 <lekernel> you should not measure mounted diodes, this gives wrong measurements in most of the cases
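
A rough sketch of the diode sanity check being discussed: a working diode should show a clear forward drop and block in reverse, whereas the reported 11.8 mV / -9 mV looks more like a short. The acceptance window below is an assumption made for illustration, and as noted above an in-circuit reading can be misleading, so this only makes sense for a part measured out of circuit.

```python
# Assumed acceptance window for the forward drop in a multimeter diode test.
FORWARD_MIN_V = 0.15
FORWARD_MAX_V = 1.0


def looks_like_diode(forward_v, reverse_v):
    """Return True if the two readings resemble a working diode.

    reverse_v is None when the meter shows open circuit in reverse,
    which is the healthy case for a diode test.
    """
    forward_ok = FORWARD_MIN_V <= forward_v <= FORWARD_MAX_V
    reverse_blocks = reverse_v is None or abs(reverse_v) > FORWARD_MAX_V
    return forward_ok and reverse_blocks


if __name__ == "__main__":
    # Readings reported at 12:03: forward 11.8 mV, reverse -9 mV.
    print(looks_like_diode(0.0118, -0.009))
```

Nearly identical, near-zero readings in both directions are what a 0R link or a solder bridge across the pads would give.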

12:04 <wpwrak> wolfspraul: highly unlikely

12:04 <wolfspraul> ok, scratched off

12:04 <lekernel> wolfspraul, with the small currents and voltages they are supposed to handle here, if they do wear out, they're probably counterfeit pieces of crap

12:04 <wpwrak> wolfspraul: and i think these are quite sturdy

12:05 <wpwrak> wolfspraul: you could probably wear them out with thermal abuse, though. should still take quite an effort, though

12:06 <wolfspraul> can we check the one we took off as well (between reset and FLASH_RESET_N)?

12:06 <lekernel> we had something weird happening on rc1. Adam reworked two video chips (reconnecting the pixel bus output correctly), which died with an internal power supply short some ~20s after power applied

12:07 <lekernel> exact same problem on the two boards

12:07 <lekernel> it was never explained

12:07 <lekernel> mwalle did the exact same rework, and it went fine on his board

12:07 <lekernel> on rc2 we have applied the change in the pcb layout, and it also went fine

12:07 <aw_> wpwrak, yes, you analysis is right, my diode doesn't act as a diode though.

12:07 <lekernel> I don't know what went on there

12:08 <wpwrak> aw_: you removed it and tested it out of circuit ?

12:08 <wolfspraul> aw_: can you check the one you took off as well?

12:08 <lekernel> but this looks vaguely similar

12:08 <aw_> wpwrak, thanks you catching this. Super! so going to soldering a new one. :(

12:08 <wolfspraul> lekernel: where is the similarity?

12:08 <wpwrak> lekernel: by the way, where did you find that INIT_B needs to be pulled together with PROGRAM_B in order to have an effect ?

12:09 <wpwrak> aw_: (solder new one) wait a minute ... try to boot without diodes first

12:09 <wolfspraul> aw_: wait. shouldn't we check the diode we removed as well?

12:09 <aw_> wolfspraul, the D16 I took just measure too, it's okay. ;-)

12:09 <lekernel> that some semiconductor device shorted itself some time after a rework

12:09 <lekernel> wpwrak, we shouldn't need to pull PROGRAM_B low, but the initial PCB layout has the trace, and it's additional work to cut it

12:09 <lekernel> plus it should not hurt

12:10 <Fallenou> lekernel: are you using gcc 4.5.3 or gcc 4.5.2 ?

12:10 <wpwrak> lekernel: i was more thinking of pulling only PROGRAM_B, without INIT_B

12:10 <wolfspraul> aw_: boot without diodes first now (see werner's msg)

12:10 <Fallenou> will try a diff a out git repo (for rtems) and their cvs head, to try to understand why zlib compiles in their cvs head and not in our git

12:10 <lekernel> wpwrak, the xilinx doc says you should use INIT_B to delay configuration

12:10 <Fallenou> -a out+on our

12:11 <lekernel> but it's not very clear

12:11 <wpwrak> lekernel: they seem to say the same about PROGRAM_B

12:11 <aw_> since the diode I have to solder its two terminals, so yes, it was not acted. bad...i should have measured its forwarding voltage.

12:11 <lekernel> http://www.xilinx.com/support/documentation/user_guides/ug380.pdf

12:11 <wpwrak> lekernel: e.g., page 51 (picking a random one), before the section title

12:11 <lekernel> "Before the Mode pins are sampled, INIT_B is an input that can be held Low to delay configuration. "

12:12 <wpwrak> oh, that's actually the place that talk about NOR. lucky coincidence ;-)

12:12 <lekernel> ah, yes, page 51 says both approaches are correct

12:13 <wpwrak> so maybe we can leave INIT_B out of the mess. that would help in general

12:13 <aw_> wpwrak, yes, without (two diodes), now it boot up and rendering.

12:13 <wpwrak> then we only need to coordinate PROGRAM_B and FLASH_RESET_N

12:14 <wpwrak> does a happy little dance

12:14 <lekernel> is sick of those broken components

12:14 <wpwrak> welcome to the wonderful world of hardware ;-)

12:15 <lekernel> i've never seen anything this bad

12:15 <wolfspraul> lekernel: yes. maybe you should do more software and less hardware :-)

12:15 <wolfspraul> aw_: I think next step is to put 2 good diodes back on, and see whether it boots.

12:15 <wolfspraul> wpwrak: agreed?

12:15 <aw_> wpwrak, init_b = 1.2V, tp37 = 3.3V, tp36 = 3.3V

12:15 <wolfspraul> lekernel: if you build something one day, Werner and I will volunteer to help you. no worries :-)

12:15 <lekernel> I will admit I have a relatively limited experience with manufacturing, but from what I've heard and seen, this project is by far the one which is hit the hardest by broken/counterfeit components

12:15 <wpwrak> wolfspraul: no. i'd get rid of the diode connecting INIT_B. lekernel: agreed ?

12:16 <wolfspraul> lekernel: no it's not.

12:16 <wolfspraul> the one thing we could do better is to throw more money and people at the problem.

12:16 <wolfspraul> in hardware parallelism works quite well, unlike in software (MMM)

12:16 <wpwrak> aw_: lovely

12:16 <wolfspraul> that we cannot do, it is beyond my capabilities

12:17 <aw_> wpwrak, you did a happy little dance now? ;-)

12:17 <aw_> but really sorry that still this was my fault on diode soldering. :(

12:17 <wolfspraul> lekernel: yes, just wanted to say. why broken components?

12:17 <wolfspraul> maybe soldering

12:17 <wolfspraul> plus we have over 20 boards

12:17 <wolfspraul> let's see

12:18 <wolfspraul> so what is the solution now? only 1 diode now?

12:18 <wpwrak> (parallelism) indeed. adam is our bottleneck here. and with all the workload, he probably doesn't even have time to think about those problems himself, so we're wasting another analyst

12:18 <lekernel> yeah, and make sure that one diode will not go bad

12:18 <wpwrak> aw_: (dance) well, figuratively. i'm too lazy to get out of my chair :)

12:18 <wolfspraul> ok only one diode now

12:19 <wolfspraul> aw_: have you put that one (and working) diode back on

12:19 <wolfspraul> does the board boot?

12:19 <lekernel> but didn't we add INIT_B to fix some intermittent no-configuration problems initially?

12:19 <wpwrak> aw_: was it soldering or is the component (diode) bad ?

12:20 <aw_> wpwrak, yup..good question. I can't realized this is shorten by bad diode or my soldering though.

12:20 <wolfspraul> wait, slow down

12:20 <wolfspraul> I still have my question open

12:20 <wolfspraul> aw_: did you put 1 diode back on? it boots?

12:20 <wpwrak> lekernel: as far as i remember, when we ran into the first set of reset troubles, you said that you had discovered in the xilinx docs that PROGRAM_B alone was not enough, and then the change was made. but i don't remember any observation in the real circuit triggering this.

12:20 <aw_> wolfspraul, i haven't put 1 diode back on

12:20 <wolfspraul> and also what lekernel just said - didn't we add INIT_B to fix something?

12:21 <wolfspraul> aw_: ok, put that back on

12:21 <wolfspraul> bring the board into the state that we believe it is perfect

12:22 <lekernel> actually, if INIT_B is not needed, it becomes a very simple rework

12:22 <lekernel> compared to the initial RC3 schematics, basically install C238 and change two resistors

12:22 <wpwrak> btw, i only realized yesterday that the "flash length = 14xxxx" or such message meant that the download hadn't even started. all the time, i had somehow imagined that it had stopped somewhere in the middle.

12:22 <wolfspraul> hmm

12:23 <aw_> and those two diodes are took apart now and rendering

12:23 <aw_> so later if need to soldering diode, i can do measure if it's good before soldering then measure it again after soldering.

12:23 <aw_> from now on, any diode i took apart, I won't soldering back. will use a new one.

12:23 <wolfspraul> of course

12:23 <wpwrak> (simple rework) after doing the complicated one ;-))

12:23 <wolfspraul> so wait

12:23 <wolfspraul> should Adam put the 1 diode back on?

12:24 <wolfspraul> what is the best design we have in mind now?

12:24 <wpwrak> the one between PROGRAM_B and FLASH_RESET_N, yes

12:24 <wolfspraul> and now lekernel wants to install C238 and change two resistors?

12:24 <wpwrak> C238 should already be there

12:24 <wpwrak> the resistors should already have been changed

12:25 <wolfspraul> ah ok

12:25 <wolfspraul> sorry too many details I lost track

12:25 <wpwrak> now it's really just a matter of removing the INIT_B diode. and the wire, if you want

12:25 <wolfspraul> so... aw_ one diode back on and let's see whether it boots

12:26 <wolfspraul> we should approach this carefully, we have over 20 boards with reconfigure/flash/leds dimly lit problems

12:26 <wolfspraul> if they all go down to a non-working diode, well, great

12:26 <wpwrak> lekernel: how confident are you about not needing R60 ? (RP# pull-up)

12:26 <aw_> wpwrak, wait you said removing INIT_B diode?

12:26 <wpwrak> aw_: yes. the one that you found to be bad

12:27 <wpwrak> aw_: so instead of two diodes, we now only use one

12:27 <aw_> wpwrak, so keep D16

12:27 <lekernel> the Xilinx docs says clearly the FPGA has pull up resistors, and we observe them with the dimly lit LEDs

12:27 <lekernel> so I'm pretty confident about that

12:27 <lekernel> as long as there are no glitches, though

12:28 <aw_> wpwrak, just keep D16? how about R60?

12:28 <wpwrak> lekernel: i don't question that the FPGA has them. just whether we're sure it uses them all the time :)

12:28 <aw_> wpwrak, the current 0x39 has R60 10K

12:29 <wpwrak> ah yes, that's from an experiment earlier tonight

12:29 <aw_> yes

12:29 <wpwrak> aw_: let's wait for lekernel's verdict

12:29 <aw_> so keep R60 or remove it then boot again?

12:29 <aw_> wpwrak, hmm..okay

12:30 <lekernel> it should use them all the time, yes

12:30 <wpwrak> lekernel: so away with R60 ?

12:30 <lekernel> yes, that should not be needed

12:31 <lekernel> and it increases the load on the reset IC, which has limited current capability

12:32 <wpwrak> aw_: so, no R60 then

12:32 <aw_> so i go for: 1. solder D16 back 2. remove R60 3. remove diode between program_b and init_b

12:33 <wpwrak> yes

12:33 <wolfspraul> only on 0x39 for now

12:34 <wpwrak> adam needs a nocturnal twin brother who could then take over power-cycling 0x39 a gazillion times to see if it really survives :)

12:35 <wpwrak> but i think "we got him"

12:36 <wpwrak> wolfspraul: btw, good cluster analysis connecting reconfig failure with usb-jtag

12:37 <wolfspraul> well. I'm not so sure about this yet.

12:37 <wolfspraul> all of these problems and troubles just because of bad soldering?

12:37 <wpwrak> could be. we don't know yet what exactly happened with those diodes

12:38 <wpwrak> it's unusual for diodes to fail this way

12:38 <wolfspraul> let's see

12:38 <wolfspraul> let's assume 0x39 boots now

12:39 <wolfspraul> then what? we do 20 render cycles (without CRC checks in between, just the 30 second rendering and power cycle)

12:39 <wolfspraul> assume that works well too

12:39 <wpwrak> yeah, do a few cycles, check the crc at the end. do a reflash with usb-jtag, just to confirm this is fine, too

12:39 <wolfspraul> on all boards with flash/dimly lit/reconfig problems, we remove the diode between program_b and init_b? and check the other diode (onboard, as much as that is possible)?

12:40 <wolfspraul> because now we say we don't even want the program_b/init_b diode anymore?

12:40 <wolfspraul> so instead of checking it, we just remove it

12:40 <wolfspraul> right?

12:40 <wolfspraul> and if that fixes those boards, or a large number of them, then we conclude this to be a design improvement and remove the diode between program_b and init_b on all 90 boards?

12:40 <wolfspraul> do I roughly understand this right?

12:41 <wpwrak> sounds good to me

12:41 <wolfspraul> and we also check (again, if practical onboard) the correct functioning of the remaining diode on all boards

12:41 <wolfspraul> before lekernel said they cannot be checked while mounted

12:41 <wpwrak> if the diode itself has issues, we may also need to check the other one

12:41 <wolfspraul> oh sure

12:41 <wolfspraul> so - can it be checked onboard or not?

12:41 <wpwrak> depends :)

12:42 <wolfspraul> or can a check at least give some indication?

12:42 <wpwrak> you can inject a little probe current and see what happens

12:42 <wolfspraul> I somehow doubt that all problems will magically go away by removing and checking diodes.

12:42 <wolfspraul> but ok, maybe they will :-)

12:42 <wpwrak> sometimes the diode is more or less isolated, in which case you can test it in-circuit. sometimes other things in the system will happily act in its stead, and all you get is confusion.

12:43 <wpwrak> the cluster may go away :)

12:43 <wolfspraul> oh I'm sure some isolated cases will pop up

12:43 <wpwrak> do we know what specifically caused the high current consumption of some boards ?

12:43 <wolfspraul> but my main concern is to finally have a known-good design and test I can 100% trust, so that the board won't fail a few tests after I stopped testing

12:43 <wpwrak> yeah

12:44 <wolfspraul> ok I'm out for about 30 min, reading backlog when back

12:44 <wpwrak> me too :)

12:45 <wolfspraul> take enough rest there, and thank you so much for all your help!!

12:45 <wolfspraul> aw_: I'm back in 30 minutes

12:46 <aw_> wolfspraul, okay

12:57 <aw_> wpwrak, shall we change back to high speed though?

12:59 <lekernel> aw_, no, stay in full speed

12:59 <lekernel> we don't want to take care of any USB/JTAG issues right now

13:00 <aw_> mmm..good

13:00 <aw_> now i start to count again

13:00 <aw_> 5 already

13:02 <aw_> crc check between rendering 30 seconds

13:08 <wpwrak> back

13:09 <wpwrak> wolfspraul: np :) it's fun to finally kill those gremlins :)

13:10 <aw_> i didn't reflash 0x39 again

13:10 <wpwrak> aw_: if the CRC is right, it's good :)

13:11 <aw_> now 10 times already with crc checks between rendering.

13:11 <wpwrak> aw_: at the end, we can do some reflash tests, to confirm that this works, too

13:11 <aw_> yes i watched it

13:11 <aw_> wpwrak, okay

13:14 <wpwrak> one mystery remains: why did the board ever work, with that evil diode misbehaviour ? points to a somewhat scary failure model. but we'll see ...

13:14 <wpwrak> lekernel: when the M1 is fully up and running, is there some easy way to force a flash reset ?

13:15 <aw_> yes, i also didn't realized though it's in success before. :(

13:18 <aw_> wpwrak, gui has a "reboot" btn which can let flash reset. ;-)

13:19 <aw_> wpwrak, i tried to capture tp37 about reset waveform to you before. ;-)

13:20 <aw_> 15

13:20 <wpwrak> hmm, what i'm looking for is a flash reset without system reset. to test whether the diode between FLASH_RESET_N and PROGRAM_B is okay. after all, it's the same component ...

13:21 <wpwrak> (test in boards that seem "okay")

13:29 <aw_> 20

13:29 <wolfspraul> I think Adam can remove the crc check between each render cycle

13:30 <wolfspraul> that wasn't part of the testing before, and doesn't imitate user behavior either

13:30 <wolfspraul> we can do the crc test after 100 render cycles

13:30 <wolfspraul> wpwrak: what do you think?

13:30 <wpwrak> it's probably safe to do so. doesn't hurt to have it either, though. you never know what you may find ;-)

13:30 <wolfspraul> it costs time

13:30 <wpwrak> not that i'd expect to find anything

13:30 <wolfspraul> yes

13:30 <wolfspraul> so remove

13:30 <wpwrak> yes, it does :)

13:30 <wolfspraul> I don't believe in the NOR corruption story anyway

13:31 <wolfspraul> so one test after 100 pure render cycles is enough

13:31 <wpwrak> seems that was rc2 only

13:31 <wolfspraul> if that really shows a crc problem I eat my words :-)

13:31 <wpwrak> rc3 has new excitement to offer :)

13:31 <wolfspraul> aw_: you can remove the crc check in between render cycles

13:31 <wolfspraul> do the render cycles only

13:32 <wolfspraul> and you can do _ONE FINAL_ crc check after the last render cycle

13:32 <aw_> okay...final crc check at last
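
[Editor's sketch] The test shape agreed above (render cycles only, then one final CRC check) could look roughly like this in Python. This is a minimal sketch, not the actual M1 test software: `run_render_test`, `power_cycle_and_render` and `read_flash` are hypothetical names, and using CRC-32 from `zlib` is an assumption, since the log does not say which CRC the test image uses.

```python
import zlib
from typing import Callable, Tuple


def crc_ok(image: bytes, expected_crc: int) -> bool:
    """Compare the CRC-32 of a flash image against a known-good value."""
    return (zlib.crc32(image) & 0xFFFFFFFF) == expected_crc


def run_render_test(
    cycles: int,
    power_cycle_and_render: Callable[[], bool],  # hypothetical: True if the board rendered
    read_flash: Callable[[], bytes],             # hypothetical: returns the NOR content
    expected_crc: int,
) -> Tuple[int, bool]:
    """Run `cycles` render cycles, then do one CRC check at the very end.

    Returns (completed cycles, overall pass/fail). No CRC check is done
    between cycles, matching the shortened procedure discussed in the log.
    """
    completed = 0
    for _ in range(cycles):
        if not power_cycle_and_render():
            return completed, False  # stop at the first failed boot/render
        completed += 1
    return completed, crc_ok(read_flash(), expected_crc)
```

On real hardware the two callables would wrap whatever the operator does by hand today (power switch, watching the rendering, reading back the NOR); here they are only placeholders.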

13:35 <aw_> so now those values: D16 still there, R30 = R157 = 10K, C238 = 220pF, removed R60 and program_b/init_b diode

13:37 <aw_> 25

13:37 <wpwrak> aw_: do you have the link to your fix2 schematics ? (with the component numbers)

13:38 <aw_> wpwrak, http://en.qi-hardware.com/wiki/File:M1_rc3_hw_fix2.png

13:38 <aw_> need to modify this later if we surely this

13:39 <wpwrak> thanks !

13:40 <wolfspraul> aw_: wpwrak please let's give this a new name then, like fix2b, or anything, but it must be a new name

13:40 <wolfspraul> not fix3 either because we had that already

13:41 <wpwrak> i think we went up to "fix4" ;-)

13:41 <wolfspraul> I propose fix2b

13:41 <wolfspraul> or fix2a, but somehow I like fix2b better

13:41 <wpwrak> 2a, 2b seem to be available

13:42 <wolfspraul> ok so the new one is fix2b?

13:43 <wpwrak> fine with me

13:43 <wolfspraul> ok

13:43 <wolfspraul> I think we can already say that 0x39 is fine now (with fix2b applied)

13:43 <wolfspraul> as a next step, I propose that adam applies fix2b to a number of other boards and we look at the results

13:44 <wolfspraul> for testing, if he can make it to the render cycles, he should do 10 full render cycles, but without crc checks in between (maybe one at the end is not bad)

13:44 <aw_> 30

13:44 <wolfspraul> so I try to select a list of boards now for fix2b, my proposal

13:44 <wpwrak> (fix2b to other board) yes, do the cluster (or part of it)

13:44 <wpwrak> we still need to have an idea of what went wrong with the diode

13:45 <aw_> mm...i need to note fix2b in .ods now. phew~

13:46 <Fallenou> lekernel: our rtems git repo has a different cpukit/zlib/zconf.h.in than the one in their CVS head, we don't have the definition of z_off64_t but they have it

13:46 <wpwrak> could be: 1) bad soldering (short is outside the diode), 2) component in a constant bad state (which may itself have variations), 3) component degenerating

13:47 <wpwrak> 3) would be a problem, because we then can't be sure D16 won't act up in the firld

13:47 <Fallenou> lekernel: I guess we just need to sync our zconf.h.in with theirs, will try to do a patch for that

13:47 <wpwrak> s/firld/field/

13:47 <lekernel> Fallenou, if it was changed recently, it should just be a matter of git-cvs update then

13:48 <Fallenou> lekernel: well you can try, or we can just cherry pick this file

13:48 <wpwrak> 2) has two branches, 2.1) component experienced excessive stresses in rework, 2.2) component arrived at rework in a bad state (fake, production error, box left in the sun, etc.)

13:49 <wpwrak> lekernel: when the M1 is fully up and running, is there some easy way to force a flash reset ? (without resetting the whole system)

13:49 <lekernel> not at the moment

13:49 <wolfspraul> 0x32 0x34 0x39 0x3A 0x3C 0x40 0x48 0x54 0x55 0x5C 0x61 0x63 0x6B 0x6C 0x77 0x7A 0x7D 0x7F 0x85

13:50 <wolfspraul> 19 boards

13:51 <wolfspraul> they all have a history of d2/d3 dimly lit, cannot reflash or cannot reconfigure. some of them worked before, some not. none have passed all tests.

13:51 <wpwrak> lekernel: P22 (FLASH_RESET_N) is driven high when the M1 is up ? or just pull-up ?

13:51 <wolfspraul> I think we should apply fix2b to those 19 boards, then look at the results

13:52 <wpwrak> wolfspraul: another thing to look for: boards that never had d2/d3 dim but that failed USB JTAG. (if there are any with this combination)

13:52 <wolfspraul> somehow I still cannot imagine all this going back to 'bad' diodes

13:52 <wolfspraul> wpwrak: what do you mean with "failed usb jtag"?

13:53 <wpwrak> that flashing the NOR got stuck

13:53 <wolfspraul> yes they are in this group

13:53 <wolfspraul> I threw them together with the ones that worked before and then failed now

13:53 <aw_> wolfspraul, so that's the next step on 19 boards firstly to apply fix2b?

13:53 <wpwrak> ok, perfect

13:54 <wolfspraul> aw_: ok let's be precise

13:54 <wolfspraul> first we maintain a production focus

13:54 <lekernel> driven high

13:54 <wolfspraul> aw_: you are testing 0x39 now, if 100% is fine and pass, it goes to 'available' state

13:55 <wolfspraul> to be safe, you can write "avail - fix2b" :-)

13:55 <wolfspraul> so we remember that it has a fix2b applied

13:55 <wpwrak> lekernel: hmm, then we either need a way to command a NOR reset. or see if the diode can be tested in-circuit.

13:55 <wolfspraul> aw_: have you finished 0x39 ?

13:55 <aw_> wolfspraul, 0x39 was tested by test image successfully just rendering failed then

13:55 <wolfspraul> rendering is part of the full test program

13:56 <aw_> wolfspraul, yes. so now I fill it as 'avail - fix2b'

13:56 <wpwrak> lekernel: that is, if we even care to separate FLASH_RESET_N from PROGRAM_B :)

13:56 <wolfspraul> great

13:56 <wolfspraul> how many cycles did you do?

13:56 <aw_> 30 times only

13:56 <wolfspraul> ah ok

13:56 <wolfspraul> yes that's enough

13:56 <wolfspraul> so yes, I propose to work on those 19 boards I listed

13:56 <lekernel> we have to separate flash_reset from program_b; the fpga does reset the flash on a software reset

13:56 <wolfspraul> in this way:

13:57 <wpwrak> wait ... 0x39: final crc check and then reflash via USB-JTAG

13:57 <wpwrak> just to confirm that all is well

13:57 <wolfspraul> 1) first, make sure there are no other known bugs on the boards (like usb)

13:57 <wolfspraul> 2) apply fix2b, and also check that the remaining diode is good (if possible)

13:58 <wolfspraul> 3) reflash and run test software and run 10 render cycles (only 1 crc at the end)

13:58 <aw_> wpwrak, 0x39 30 times crc is okay

13:58 <wolfspraul> 4) hopefully set them all to "avail - fix2b"

13:58 <wolfspraul> :-)
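
[Editor's sketch] Steps 1) to 4) above can be tracked per board with a few lines of Python. This is only an illustration under stated assumptions: the serial list and the "avail - fix2b" label come from the log, while `parse_serials`, `board_status` and the "failed"/"pending" labels are hypothetical and not taken from the project's .ods sheet.

```python
from typing import List

# Board serials exactly as listed in the log.
BOARDS_TEXT = (
    "0x32 0x34 0x39 0x3A 0x3C 0x40 0x48 0x54 0x55 0x5C "
    "0x61 0x63 0x6B 0x6C 0x77 0x7A 0x7D 0x7F 0x85"
)

# Steps 1) to 3) of the plan; step 4) is the resulting status.
PLAN = [
    "no other known bugs (like usb)",
    "fix2b applied, remaining diode checked",
    "reflash, test software, 10 render cycles, 1 crc at the end",
]


def parse_serials(text: str) -> List[int]:
    """Turn the space-separated hex serials into integers."""
    return [int(token, 16) for token in text.split()]


def board_status(step_results: List[bool]) -> str:
    """Map the results of the steps done so far to a status label."""
    for name, ok in zip(PLAN, step_results):
        if not ok:
            return f"failed: {name}"
    if len(step_results) == len(PLAN):
        return "avail - fix2b"
    return "pending"


if __name__ == "__main__":
    serials = parse_serials(BOARDS_TEXT)
    print(len(serials), "boards selected")
    print(hex(serials[2]), board_status([True, True, True]))
```

Stopping at the first failed step keeps the record honest about where a board dropped out, which is useful when the failures in the group differ from board to board.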

13:58 <wpwrak> lekernel: and it wouldn't be happy if the sw reset also causes a reconfig (?) ... or at least we don't want to tempt fate

13:58 <wolfspraul> aw_: no wait

13:58 <lekernel> no, software reset shouldn't reconfig

13:58 <wolfspraul> Werner also wants to reflash again

13:58 <wolfspraul> 0x39

13:58 <wolfspraul> so just run reflash_m1.sh

13:58 <wolfspraul> then boot once to test that it renders, then done

13:59 <aw_> wolfspraul, okay, let's reflash it again. and check crc again and rendering. ;-)

13:59 <wolfspraul> ok, perfect

13:59 <wpwrak> lekernel: okay. maybe we can just probe sw reset. that should be clear enough evidence about the diode's health.

14:00 <lekernel> sw/flash reset stays asserted when the 3 pushbuttons are held, btw

14:00 <aw_> 0x39: reflashing...

14:00 <wpwrak> lekernel: if sw reset did cause a reconfig as well, would this be easy to notice from the outside ? (without scope, just looking at the M1)

14:00 <wolfspraul> btw, do we still want to do the 4.4V reset ic rework?

14:00 <lekernel> yes, it would turn off

14:00 <wpwrak> lekernel: ah, excellent

14:01 <lekernel> wolfspraul, if the current solution works, then no because it takes time

14:01 <wpwrak> wolfspraul: i think it makes sense, because the current reset solution does not guarantee that all the rails are good

14:01 <wolfspraul> I think if extensive testing shows that everything is stable, at least for myself I don't need the 4.4V reset ic rework only because that makes the circuit better conform to the datasheet voltages.

14:01 <wolfspraul> :-)

14:01 <wolfspraul> 2:1

14:02 <wpwrak> wolfspraul: although i don't know if you want to rework or just rc4 :)

14:02 <wolfspraul> well

14:02 <wolfspraul> those are separate things

14:02 <wolfspraul> first I want to make an rc3 of good quality that I can support

14:02 <lekernel> after those boards are out, you can try. but please, not before.

14:02 <lekernel> enough delays

14:02 <wpwrak> i concur

14:02 <wolfspraul> if the only reason is that we noticed an 'out of spec' situation, that's not enough

14:02 <wolfspraul> sorry but the chip has to handle that :-)

14:03 <wolfspraul> since our testing did not show problems

14:03 <wolfspraul> testing results win here, imho

14:03 <wpwrak> i would also prefer not to have it as a rc3 rework, because it introduces the risk of bridging 3V3 and 5V

14:03 <wpwrak> s/as a/as a general/

14:03 <wolfspraul> ok

14:03 <wolfspraul> so I am still cautious about this whole fix2b and diode magic

14:03 <wolfspraul> but we see

14:04 <lekernel> yeah, that too

14:04 <wolfspraul> I need solid test results then we can start selling

14:04 <wpwrak> so my proposal would be to rework one board with 4.4 V at a suitable time, confirm that this doesn't wake any gremlins, and then make it an rc4 feature

14:04 <wolfspraul> oh you bet

14:04 <wolfspraul> we need to look at the gates and second reset ic anyway, for rc4

14:04 <wolfspraul> but that is separate from finishing rc3

14:05 <wpwrak> yes :)

14:05 <aw_> reflashed successfully

14:05 <wpwrak> about the diodes .. where do they come from ? friends in shenzhen ? :)

14:05 <wpwrak> aw_: champagne time ! ;-)

14:05 <aw_> CRC is okay

14:05 <wolfspraul> ask Adam about source, I'm not sure whether they are in the bom/wiki

14:06 <wolfspraul> but don't always blame the source, you know how many reasons for problems there can be (you listed some above yourself)

14:06 <wolfspraul> and amazingly, hello murphy, it's always the unexpected one that hits you, no?

14:06 <aw_> rendering done...

14:07 <wolfspraul> ok, enough, 0x39 is 'avail - fix2b'

14:07 <wpwrak> yes, but it still seems odd. adam's visual inspection didn't show any solder bridges. and he measured after unsoldering, which would further remove any bridges (or make existing ones easier to spot)

14:07 <aw_> wpwrak, this diode is the one that BEN used, BEN was produced in China

14:07 <wolfspraul> aw_: did you see the plan #1 - #4 for the 19 boards I selected?

14:07 <wolfspraul> ouch

14:07 <wpwrak> and diodes don't overcook easily. or degrade just like that.

14:07 <wpwrak> heh ;-)

14:08 <wolfspraul> aw_: maybe we get new diodes one of these days ;-)

14:08 <wpwrak> do we have any diode problems in the ben ? :)

14:08 <aw_> but this part was original designed by Taiwan company though...so i got them while i producing AVT2. ;-)

14:08 <wolfspraul> no

14:08 <wpwrak> makes me think of charging problems ...

14:08 <wolfspraul> anyway pinpointing the real root cause is difficult

14:08 <wolfspraul> aw_: plan! :-)

14:08 <wolfspraul> did you see my steps #1 - #4 above?

14:09 <wpwrak> i'm worried about D16

14:09 <aw_> wolfspraul, yes saw #1 - #4

14:09 <wolfspraul> I propose this for the 19 boards I selected

14:10 <wpwrak> well, if what happens if D16 transmutates into a 0R is just that a "sw reset" powers down, that won't be a catastrophic failure. so this could be considered an acceptable risk

14:10 <aw_> okay

14:10 <wolfspraul> 0x32 0x34 0x39 0x3A 0x3C 0x40 0x48 0x54 0x55 0x5C 0x61 0x63 0x6B 0x6C 0x77 0x7A 0x7D 0x7F 0x85

14:10 <wolfspraul> and Werner is right - how do we make sure D16 works well?

14:10 <wolfspraul> aw_: any ideas?

14:10 <wpwrak> so all that would need to be done about D16 is to test whether it works now (procedure TBD), and if it does, go ahead. else, replace, etc.

14:10 <wolfspraul> can we order new diodes locally in Taipei? (i.e. tomorrow)

14:11 <aw_> wpwrak, how about I measure D16's forwarding / reversing voltage while I test those 19pcs firstly

14:11 <wolfspraul> it seems Werner and Sebastien think that is not possible or worthless

14:11 <wpwrak> aw_: can you measure D16 in-circuit ?

14:11 <aw_> wpwrak, yes. i just checked in-circuit with D16 on 0x39

14:12 <wpwrak> okay. then that's probably good enough.
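
[Editor's sketch] An in-circuit forward/reverse diode check like the one aw_ describes could be interpreted along these lines. The thresholds are assumptions chosen for illustration and are not taken from the D16 datasheet, `classify_diode` is a hypothetical helper, and `None` stands for an over-limit ("OL") reading on the meter. As noted earlier in the log, other parts of the circuit can conduct in the diode's stead, so anything but a clean reading is reported as inconclusive.

```python
from typing import Optional


def classify_diode(
    forward_v: Optional[float],
    reverse_v: Optional[float],
    short_below: float = 0.05,  # assumed: below this in both directions looks like a short
    forward_max: float = 1.0,   # assumed: upper bound for a plausible forward drop
) -> str:
    """Classify a diode-test reading as 'ok', 'short', 'open' or 'inconclusive'."""
    if forward_v is None and reverse_v is None:
        return "open"  # no conduction in either direction
    if (
        forward_v is not None
        and reverse_v is not None
        and forward_v < short_below
        and reverse_v < short_below
    ):
        return "short"  # conducts freely both ways, like a 0R
    if (
        forward_v is not None
        and reverse_v is None
        and short_below <= forward_v <= forward_max
    ):
        return "ok"  # conducts one way only, with a plausible drop
    return "inconclusive"  # e.g. a parallel path in the circuit


if __name__ == "__main__":
    print(classify_diode(0.3, None))
    print(classify_diode(0.0, 0.0))
```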

14:12 <wpwrak> we can do fancier tests, but they also have more moving parts.

14:13 <aw_> wpwrak, but I do really don't know why 0x39 have passed in power-on sequence, since I measured them before I reflashed after first time reworks

14:13 <wolfspraul> you measured both diodes earlier?

14:13 <wpwrak> aw_: yes, the whole thing is very strange

14:14 <aw_> wolfspraul, but fro your analysis above is that i could much probably let diode like as short enough while soldering

14:14 <aw_> sorry to wpwrak

14:14 <wpwrak> aw_: could the diode have experienced mechanical stress from the wire going around the board ?

14:14 <wolfspraul> well

14:14 <wolfspraul> I think the next step is fix2b on those 19 boards

14:15 <wpwrak> will the wire also be removed ? (as part of fix2b)

14:15 <wolfspraul> of course we are not suicidal. if after 3-4-5 we find out it's not right, we pause to think.

14:15 <wpwrak> hehe :)

14:15 <wpwrak> i hear a "not yet" :)

14:15 <wolfspraul> wpwrak: wire removed? that's not clear?

14:15 <wolfspraul> aw_: will the wire be removed or not?

14:15 <wolfspraul> :-)

14:16 <aw_> wpwrak, so i would think it's a component degenerating by my soldering its two terminals(one is program_b soldering, the other is init_b soldering), so TWICE soldering on diode. ;-)

14:16 <wolfspraul> I thought that would be so clear it's not worth mentioning, now Werner is asking :-)

14:16 <aw_> wolfspraul, I'll remove wire too.

14:17 <aw_> wpwrak, diode is in reel I have on hand now

14:17 <wpwrak> (wire) alright. no FM antenna ;-)

14:17 <wpwrak> aw_: did you solder all those diodes ? or did they do some of them at the SMT fab ?

14:18 <wolfspraul> I think slowly I can start as a daredevil PE, maybe in China.

14:18 <wolfspraul> I wouldn't hesitate to keep that line running...

14:18 <wolfspraul> he he

14:18 <wolfspraul> and over time I might even find out a bit about all this strange soldering and circuit stuff

14:18 <aw_> wpwrak, D16 was mounted by SMT factory, I soldered all program_b/init_b diode. ;-)

14:18 <wolfspraul> after some years in China maybe I can upgrade to Taiwan

14:18 <wpwrak> aw_: maybe your soldering iron is running too hot ?

14:18 <wolfspraul> don't mention that

14:19 <wolfspraul> it's probably on max

14:19 <aw_> wpwrak, set 325 degree

14:19 <wpwrak> pheeew ....

14:19 <wolfspraul> :-)

14:19 <aw_> my max can be 425

14:19 <wpwrak> yeah, i sometimes go a lot higher too. 370 C if a component is really acting up

14:19 <aw_> but this you know , soldering in even less than 1 second on diode terminal. ;-)

14:19 <wolfspraul> my 2 hands cannot count the number of foreigners that came into our Taipei labs and eventually to me complaining that they ruined their boards because they use the irons with 'crazy hot' settings that the locals had flying around there...

14:20 <wpwrak> yeah, if you're quick, then hot should be fine

14:20 <wpwrak> ;-))))

14:21 <aw_> wpwrak, well...not to explain though...it's real great that you caught this, so tomorrow, i'll still measure it's forwarding/reversng voltage in-circuit after soldering.

14:21 <wolfspraul> aw_: I think we have a solid plan for tomorrow

14:22 <wolfspraul> after the first few boards, we double-check the results

14:23 <aw_> wolfspraul, we may classify them tomorrow later, you know their failures are different, but we can get results tomorrow. ;-)

14:23 <wolfspraul> yes but I grouped carefully

14:23 <wolfspraul> those 19 should be interesting

14:23 <wolfspraul> I do not expect all 19 to work

14:23 <wolfspraul> but I want to see how far this fix2b can take us

14:23 <wolfspraul> since we are planning to apply it to all 90 boards (!)

14:24 <aw_> alright

14:26 <wolfspraul> aw_: if you see anything wrong with the plan, correct it

14:26 <wolfspraul> you are much closer to the real problem

14:26 <wolfspraul> for example if you want to finish the usb fixes first, do it

14:26 <wolfspraul> you must keep a calm head and overview...

14:29 <wolfspraul> btw, I find it amazing that we first thought we need this diode (and long wire), but now it seems we don't?

14:30 <wolfspraul> how is that possible? are we sure we don't need it? :-)

14:30 <aw_> to apply fix2b, I'll select them to do whole all items again to make sure my removal of diode and wire are good

14:30 <aw_> also need to clean after reworks though.

14:30 <wolfspraul> ok

14:31 <aw_> so far now no found steps wrongly in your steps

14:32 <aw_> but I would refill those results in whole one row. hope these boards are good news tomorrow.

14:34 <wolfspraul> ok

14:36 <wpwrak> wolfspraul: i think the long wire was lekernel getting lost in the twisty little maze of xilinx documentation, and the place in the docs that most specifically refers to this function states reasonably clearly that we don't need to worry about init_b. but let's see if sebastien changes his mind :) if xilinx docs are inconsistent about this issue, we may still need to do something. but for now, it seems that init_b (as of fix2) doesn't need to be connected.

14:37 <wolfspraul> ok got it - thanks!

14:37 <wolfspraul> well then, tomorrow is another interesting day in rc3 history

14:38 <aw_> wiki 0x39 notes updated

14:38 <aw_> wpwrak, thanks for your great helps tonight. ;-)

14:39 <wolfspraul> btw - ALL PARTS are now in Taipei!

14:39 <wpwrak> aw_: thanks for doing all those experiments ! :)

14:39 <wolfspraul> everything

14:39 <wpwrak> whee ! :)

14:39 <wolfspraul> all accessories, box, labels, cases, leaflet, stickers, everything

14:39 <wolfspraul> doesn't make Adam's life easier unfortunately

14:39 <wolfspraul> :-)

14:39 <wpwrak> henceforth, August 16 shall be celebrated as convergence day in the empire of Qi :)

14:40 <wolfspraul> wait wait

14:40 <wolfspraul> I want to see this in the test results

14:40 <wolfspraul> what I see there now is still a big mess, and some hope

14:41 <aw_> go on

14:41 <wolfspraul> no that's all

14:41 <wolfspraul> 'wait wait' for Werner's celebration

14:41 <wolfspraul> :-)

14:41 <aw_> okay. ;-)

14:42 <aw_> thanks again and night!. ;-)

14:43 <wpwrak> naw, celebrate convergence day today, maybe diode day tomorrow :)

15:14 <Fallenou> lekernel: I copied CVS HEAD cpukit/zlib/zconf.h.in to a fresh git clone of milkymist rtems, it builds properly, do I commit & push ?

15:14 <lekernel> no, I will try a proper CVS upgrade before

15:14 <Fallenou> ok, should solve the issue

15:14 <lekernel> if it does not I will use your patch

20:54 <lekernel> is done writing a milkymist article for xcell

20:57 <lekernel> since it seems open source people do not care about/are afraid of fpga's, let's see if fpga people care about open source *g*

20:57 <roh> lekernel: the opensource people care about fpgas but are not motivated to fight against windmills

20:58 <kristianpaul> they just care abou their fancier IDEs and writing in vhdl ;)

20:58 <kristianpaul> (commented biased from my side of course)

20:59 <lekernel> roh, ?

21:00 <lekernel> you mean the proprietary tools, right?

21:00 <kristianpaul> also i wonder what they actually scared of, i mean, i had heard floss related people talking about opencores and openscarc as the path for "fpga freedom"

21:00 <lekernel> just fucking do them

21:00 <lekernel> GCC, for all its faults, was still great work for its time

21:00 <roh> lekernel: opensource means open toolchains. without that you need open docs to write some. thats why opensource is great on basically all cpus/soc with 'available' (not neccessary fully open) documentation and not on stuff you need to reverse first

21:01 <lekernel> oh, altera published tons of stuff lately

21:01 <roh> see nvidia drivers. same problem. without open docs/specs supports sucks and isnt anywhere near 'production grade'

21:01 <kristianpaul> roh: but thats a mental barrier you dont need a floss compiler to start coding or doing something

21:01 <lekernel> and for xilinx you have xsl

21:01 <lekernel> xdl

21:01 <roh> kristianpaul: wrong.

21:01 <kristianpaul> why? and in wich part

21:01 <kristianpaul> look mm1 soc

21:02 <kristianpaul> yes it uses XST, but you can see testbench uses cver and iverilog

21:02 <roh> kristianpaul: sure its kinda mental, but as somebody who worked with commercial environments and or dependent on parts of it, i can tell you: never ever again. not worth my lifetime.

21:02 <kristianpaul> step by step

21:02 <kristianpaul> roh: my first words were about IDEs remenber? :)

21:03 <lekernel> can we stop here? if you want free FPGA tools, then write them. period.

21:03 <roh> kristianpaul: so my point is: people WILL and DO opensource in anything which makes them able to solve their problems. using binary toolchains is a showstopper. ide's do not count.

21:03 <kristianpaul> fpga = hardware thats scare more than one for sure :)

21:03 <roh> lekernel: sure. give me all the needed specs and docs and a warranty that i do not need to start over when xilinx has a bad morning and does a new chip with everything different.

21:04 <kristianpaul> no roh , thats jsut conding practives

21:04 <roh> kristianpaul: no. i think thats false. people are not scared by hardware at all.

21:04 <kristianpaul> pracices, as whe you mix you code with dark/propietary libs

21:04 <kristianpaul> could be--..

21:04 <kristianpaul> may be they need a kick start?

21:05 <kristianpaul> more tutorials, friendly people and such

21:05 <kristianpaul> as Fedora in software side i mean

21:05 <kristianpaul> well, just another guess..

21:05 <lekernel> roh, http://rapidsmith.sourceforge.net/, altera quip, debit, etc.

21:06 <roh> kristianpaul: my point is: in MY (and propably most other opensource peoples) perspective (which comes from experience) its not helping but slowing down development if your tools are either broken, closed, costly or badly documented.

21:06 <roh> lekernel: and thats completely free and production grade? can you build a mm1 with these tools?

21:07 <kristianpaul> well i always heard people saying bad words about gcc and still a sucess :)

21:07 <kristianpaul> s/sucess/been used

21:07 <kristianpaul> :p

21:07 <lekernel> roh, you asked about info about fpga internals. so here they are.

21:08 <kristianpaul> i think zumbi too ;)

21:08 <roh> my point is: opensource people USE opensource tools to create more sw/hw . they do fix bugs in tools here and there but they usually are not motivated enough to do N projects but ONE. means developing a toolchains is not their interrest or something which they find interresting.

21:08 <kristianpaul> Was a small discusion last day about difference bitween bitstream from different vendors

21:08 <roh> lekernel: i did ask for DOCUMENTATION. not some weird java tool.

21:08 <kristianpaul> documentation about?

21:08 <lekernel> then they should stop using x86 CPUs, Intel DRAM controllers and what not :-)

21:09 <roh> dont get me wrong. i try to explain the whys and not the 'could be done's

21:09 <lekernel> well I already have had that discussion. it's boring. free tools = jfdi.

21:09 <roh> lekernel: x86 is actually quite well documented and understood (compared to the different fpga archs)

21:10 <roh> lekernel: yes. and you need to understand that most b

21:10 <roh> people are NOT willing to waste their lifetimes writing compilers and such. thats a VERY small number of people who find that interresting at all.

21:11 <kristianpaul> jfdi = just find who can do it ;)

21:11 <roh> its boring technology which is neccessary but not somehting to use your time on for most. its a tool. like a wrench. its there. use it. if you need to buy a complicated one or expensive one you will use a screw which can use the free and well known tool and not the expensive one.

21:12 <kristianpaul> roh: is really unfair compar and asic (x86) with a FPGA

21:13 <roh> kristianpaul: its not about fair. its about reality.

21:13 <roh> kristianpaul: if you can solve your problem with an fpga or some soc with gcc support, people WILL choose the latter. even if the soc itself is blackbox.

21:14 <roh> as long as there are interface specs/docs people are fine with that.

21:14 <kristianpaul> yes, of course (solve problems)

21:14 <lekernel> there are interface specs: the fpga does what the standard verilog code tells it to do :-)

21:14 <roh> lekernel: well.. only with things outside the hw (binary tools)

21:15 <kristianpaul> nocks lekernel

21:15 <lekernel> that's not fundamentally different from a CPU scheduler or a DRAM control algorithm

21:15 <kristianpaul> i disagree last words from you roh , as for example you can have basis plaform to start with

21:15 <kristianpaul> is my point about comparing fpga with asics

21:16 <kristianpaul> at the end you need a hardware that works

21:16 <kristianpaul> like coming mm1 rc3 it seems :)

21:16 <roh> also its a question of complexity. an fpga costs you not only a lot of money and extra (complicated) tools and code/thinking it also needs quite some overhead to work.

21:17 <kristianpaul> an yes as wpwrak pointed before, and floss sinthesis may blow out lots of barriers

21:17 <kristianpaul> but i think still way to do around verilog, tests benches, automated soc buils scripts?

21:17 <kristianpaul> and lots of other fields

21:18 <lekernel> in this project, the fpga costs less than half of the case

21:18 <kristianpaul> roh: lazy! ;)

21:18 <roh> on a typical mcu nowadays you need mostly caps and a powersource. they can even run without crystals and such. have internal flash, easy to use isp/debug possibilies and some decent amount of ram. fpgas still feel like the 8051 times of mcu.

21:18 <kristianpaul> roh: just kiding of course :)

21:18 <kristianpaul> all take time, and yes fpga world have its own learning curve :)

21:18 <roh> kristianpaul: heh.. i am just not interrested enough by stuff i find boring details when there is so much more interresting real problems to solve out there.

21:19 <kristianpaul> sure, is a respectfull position

21:19 <roh> lekernel: the chipcost itself is negligable. the cost for development of code and support is so much more than for a mcu that whoever can, WILL avoid using one.

21:20 <roh> negligable atleast for our amounts of sales. can change if you sell 5 digits or more.

21:20 <kristianpaul> wolrd is already done why cares to do more, when there is a big sea to navigate :)

21:20 <wpwrak> interesting discussion. is there a topic/direction or is this just the IRC equivalent of a bar brawl ? ;-)

21:20 <kristianpaul> i think last :)

21:21 <kristianpaul> and is monday :)

21:21 <wpwrak> lekernel: (jfdi) i hope that doesn't mean you've lost interest in continuing with llhdl

21:21 <roh> wpwrak: *g* .. i am not trying to dicuss something. i am trying to explain a pov i seem to share with loads of other devels from the mcu and opensource side.

21:21 <kristianpaul> or event interesting in continuing milkymist project?? :-(

21:21 <kristianpaul> interest**

21:21 <lekernel> fuck all those developers, that's why I'm stopping hacker conferences now and write for xcell instead

21:22 <lekernel> (among other things)

21:22 <roh> so its not 'the opensource hackers are not interrested in fpgas'. they are just annoyed enough by devices with nonfree toolchains by experience to avoid them at ALL cost.

21:22 <kristianpaul> be friendly and they will come :)

21:22 <kristianpaul> friendly and skilled is powerfull combination

21:25 <roh> lekernel: maybe you can explain what happend that you are annoyed?

21:26 <kristianpaul> if somebody can say jfdi it's because they at least know how to do it but don't want to, so you can provide guidelines for others to do it, even if you don't do a single line of code

21:26 <kristianpaul> no just "fuck" then away... :/

21:28 <lekernel> roh, as I said: many hacker/open source people are afraid with this stuff. they obviously prefer blinky-LED arduino gadgets instead. that's why i'm slightly annoyed. it's all.

21:30 <wpwrak> lekernel: so what are your plans with llhdl ? was that an affirmative silence, before, i.e., have you lost interest in it ?

21:30 <roh> lekernel: maybe you need to differentiate between the different levels of development. you do the basics on another level

21:31 <lekernel> wpwrak, no, just try to speak to different people about it.

21:31 <roh> most want to solve their problem when developing something as cheap and simple as possible. not as free as possible. and in the end it doesnt matter if the chip is from xilinx or whatever (nxp? atmel? whoever builds soc), you WILL buy a 'chip' which is closed.

21:31 <wpwrak> lekernel: ah, good :)

21:32 <roh> then the equation is 'free tools' or 'complex closed tools' atm. and THATS what matters.

21:32 <wpwrak> roh: i think you're barking up the wrong tree

21:33 <wpwrak> roh: you should complain to xilinx, alteros, lattice, etc. convince them that they could sell more fpgas if they open their tools

21:33 <roh> wpwrak: i know. i am trying to explain. i also want free fpga tools. but i also do not want to develop one myself if i can solve my problem in MUCH less work.

21:33 <wpwrak> roh: lekernel is already working in the "right" direction

21:34 <kristianpaul> good point

21:34 <roh> wpwrak: i am not complaining. i am describing what the reasoning of developers is what tools to learn to use and where to use their (often quite limited) time

21:34 <kristianpaul> well lattice already moved forward a bit, time to push xilinx.. but how?

21:34 <wpwrak> roh: well yes, that much is pretty obvious, isn't it ? :)

21:34 <roh> wpwrak: sure its the right direction. dont get me wrong. i fully support the way we are going.

21:35 <roh> wpwrak: i learnt that every obvious thing which needs a transformation in thinking is already something which needs explaining from time to time.

21:36 <wpwrak> roh: i think you're addressing the wrong audience ;-) do you really think anyone here _likes_ closed tools ?

21:36 <roh> opensource devels see chip vendors just as 'producers of something which needs tools too expensive to buy yourself' and are happy that the numbers sold make them cheap

21:37 <lekernel> kristianpaul, (guidelines) you can see that i'm doing exactly that http://www.ohwr.org/projects/ohr-meta/wiki/OHWorkshop

21:37 <roh> chipvendors sometimes understood that (nxp, atmel) and sometimes not (xilinx)

21:38 <wpwrak> roh: i think FPGA jargon gives a pretty strong hint of how people in that biz think. they don't have "software", they have "intellectual property" ;-)

21:38 <roh> wpwrak: well.. duh. you see their fault? :)

21:38 <wpwrak> roh: the typical FPGA customer wants to be closed. we're the exception. the typical EE is happy with closed tools on windows. and so on

21:39 <roh> wpwrak: thats because fpga live from a 'no other way to do' market not from a broad spectrum of possible users.

21:39 <wpwrak> roh: i wouldn't look at them for open tools. at the moment, there's little motivation for them. and it would probably very difficult for them to open their tools, because they may not even own all the necessary rights to do this.

21:40 <roh> also commercial devels choose a mcu if possible. simply because the amount of money they need to pay their devs to 'make it work' is much less than for a fpga based project.

21:41 <wpwrak> roh: fpgas target a higher end market, yes. if an MCU will do, you don't need an FPGA.

21:42 <roh> wpwrak: every usecase done by a fpga would be done by a specialized mcu if there would be the number of users to make it worth doing the 'chip' for it. and that also happens from time to time. see high end routers.

21:42 <kristianpaul> lekernel: (ohwr) oh, i havnt noticed it :), looks evry interesting, pleaser record your talk for the far away people :)

21:42 <kristianpaul> s/pleaser/please

21:42 <roh> stuff cisco did in fpgas 10 years ago is now done by a 3$ silicon from realtek. cisco still uses fpgas.. for stuff where there are no specialized chips for (e.g. routing engines)

21:44 <kristianpaul> roh: are you planning to start that copyleft layer 3 network switch? :-)

21:45 <roh> kristianpaul: no. just using that example to show why people use fpga and where in commercial projects. it seems to be a 'no soc buyable. last resort to make a product build-able at all' case

21:46 <lekernel> roh, I never said we never will manufacture a milkymist asic. in fact, a large part of the current code should be portable to asics.

21:46 <wpwrak> roh: you're overlooking the possibility of using an FPGA for more than some form of ASIC prototyping. i see great potential in partial reconfiguration, adapt the hw for your code. that's a domain that's still pretty much untouched. once synthesis is out in the open (cf. llhdl), work can start in this direction

21:46 <lekernel> (i mean verilog)

21:47 <zumbi> wpwrak: I was involved in a project like that, reconfigurable FPGA for SDR

21:47 <roh> wpwrak: nobody cares about reconfigurable hw outside of lab equipment or military use to be fair. at least nobody is willing to pay the extra money that 'feature' does cost.

21:47 <zumbi> I wish now I could assist OHR conf

21:47 <wpwrak> zumbi: how far did you get ?

21:48 <lekernel> everything is possible, that, and free FPGA tools, we just need to get down to it (which also involves generating lots of sales for the asic thing) :p

21:48 <roh> wpwrak: maybe that can change with free tools, yes. i sure hope so.

21:48 <zumbi> wpwrak: there is an open project, let me search the link

21:48 <zumbi> wpwrak: http://flexnets.upc.edu/trac/

21:48 <wpwrak> roh: (nobody cares) well, there's a good amount of basic research that needs doing first ;-)

21:48 <roh> also its a question of the power budget. correct me if i am wrong, but afaik an fpga doing the exactly same as an asic based on the same design will eat more watt

21:49 <wpwrak> roh: step 1: unlock the secret. step 2: learn. and so on ;-)

21:49 <zumbi> wpwrak: I am looking forward for newer Zynq7000 devices

21:50 <wpwrak> zumbi: (flexnets) so you design "IP blocks" in the traditional way and then connect them to each other ?

21:50 <zumbi> wpwrak: right

21:51 <wpwrak> zumbi: what i have in mind would go a little further: generate code and hardware description from the same source

21:52 <zumbi> wpwrak: it adapts resources to users, lets say you got a dual BTS with 3G/WiMAX, depending they users you got, you reconfigure the BTS to allocate more resources to the network with more users

21:52 <wpwrak> zumbi: e.g., you could write a - maybe C - program that implements some feature at a very low level, bit-banging and so on. then the "compiler" would identify functions that can be synthesized in hardware.

21:53 <wpwrak> okay, but it's still at the level of modules

21:53 <wpwrak> of course, because you need the heavy proprietary synthesis software to make your bitstreams :)

21:55 <zumbi> sure, while free tools sounds attractive, isn't there a free HDL synthesizer done by one of the fellows here

21:55 <wpwrak> zumbi: maybe you mean lekernel's llhdl ?

21:56 <zumbi> yep

21:56 <wpwrak> i think llhdl is a great start. even if it will be relatively primitive, once the whole process is implemented with free tools, it will be much easier to improve the tools.

21:57 <kristianpaul> nice, Makefile-driven HDL flow (Pawel Szostek).

21:57 <zumbi> wpwrak: were you trying to hint a compiler/synthesizer?

21:57 <wpwrak> the pioneering work is always the hardest.

21:57 <wpwrak> zumbi: "hint' ?

21:57 <zumbi> wpwrak: does such tool exist?

21:57 <wpwrak> kristianpaul: death to all IDEs ! ;-)

21:58 <zumbi> I have tried Makefile-driven HDL but failed :/

21:58 <zumbi> once they upgrade IDE

21:58 <wpwrak> zumbi: i don't know. at least nothing widely known. maybe some research projects under NDA, etc. but such secret things usually don't go very far

22:00 <wpwrak> we saw this in operating system research. before the Free unices, there were some projects that implemented kernel changes as binary modules for SunOS. Sun were "nice" to academia and let them have the sources, under NDA, of course. and they allowed them to distribute their binaries.

22:00 <wpwrak> but that was still difficult to use, and the sources were still closed. so such things weren't really useful.

22:01 <wpwrak> now, fast-forward a few years. no kernel research would have much credibility in the days of open source unix if it didn't come with a patch.

22:02 <wpwrak> and every once in a while, good work does find its way from academia into real life rather quickly

22:02 <wpwrak> e.g., things like RCU and various TCP and scheduling improvements were integrated into Linux fairly quickly. and they're substantial improvements of the art.

22:04 <wpwrak> of course, not every linux patch that tweaks the scheduler or TCP is worth a PhD, but i think it's safe to say that research that seeks applicability is in a considerably better shape today than in the dark age of only closed source operating systems (omitting "research" operating systems that had very little scope)

22:06 <wpwrak> i hope very much to see the same happen when it comes to FPGAs

22:19 <wpwrak> anyway, past 7pm, high time for breakfast :)

23:06 <mw|mobile> has anyone received my mail to the ml?

23:14 <mw|mobile> two times.. ok gn8

23:30 <wolfspraul> [reading the backlog] I just checked the m1 box whether it still says 'fpga' outside, and yes - it does. Sebastien told me a few weeks ago that he thought we can remove it, but for some reason I didn't even though now I thought that we had... :-)

23:31 <wolfspraul> Sebastien was totally right I think now, we should have removed it. Next batch...

23:32 <wolfspraul> fpga is a divisive term, too many people attach too many different experiences and feelings to it. Has nothing to do on the outside of a video synthesizer box.