#milkymist on 2011-08-15 — irc logs at freenode.irclog.whitequark.org

00:09 <Fallenou> I just discovered it's easy to generate a .pkg or a .dmg once you've written a Portfile (MacPorts)

00:09 <Fallenou> just a one-line command

00:09 <Fallenou> So I generated lm32-rtems-binutils.dmg and lm32-rtems-gcc.dmg

00:09 <Fallenou> will upload it somewhere on the wiki I gues

00:09 <Fallenou> guess

00:11 <zumbi> is it possible to reverse engineer an FPGA bitstream?

00:11 <Fallenou> yes it is

00:11 <Fallenou> some companies are generating netlist from bitstreams

00:11 <zumbi> yes, I heard that

00:12 <zumbi> but which tools are needed?

00:12 <Fallenou> that's why usually bitstreams are encrypted

00:12 <Fallenou> to prevent you from reverse engineering it

00:12 <Fallenou> I have no idea

00:12 <Fallenou> I never tried such a process

00:12 <Fallenou> But I guess you won't find tools to do that easily out there

00:12 <zumbi> I thought so... :/

00:13 <Fallenou> if such tools exist, they must be jealously kept by those who wrote them

00:13 <Fallenou> but I don't know, never really searched for it

00:13 <Fallenou> maybe lekernel knows such software

00:13 <Fallenou> ask him when he will be back :)

00:13 <zumbi> ok, thanks

00:13 <Fallenou> he is the FPGA expert

00:18 <wolfspraul> how close in format are the bitstreams between different fpga vendors?

00:44 <kristianpaul> zumbi: in the meantime you can check http://lekernel.net/blog/2010/04/fpga-reverse-engineering-challenge-hackito-ergo-sum/

00:44 <kristianpaul> just in case

00:44 <kristianpaul> ulogic website had this information, dunno what happened..

01:03 <zumbi> kristianpaul: hey! yes I found ulogic site, but nothing there.. but it looks like it had something in the past

01:03 <zumbi> http://web.archive.org/web/20080313185423/http://www.ulogic.org/cgi-bin/gitweb.cgi?p=debit.git;a=summary

01:04 <zumbi> wolfspraul: each vendor has different bitstream afaik

01:06 <wolfspraul> yes different, but I'm wondering whether the fundamentals are different, or more 'how' different they are

01:06 <zumbi> i don't really know

01:07 <zumbi> but i suspect those differ quite a bit

04:41 <aw> wpwrak, about the A4809E3R-440DN, 4.312-4.488 V; bad that we need to search compatible part in Digi-Key or Mouser for easier sample orders.

06:23 <wolfspraul> aw: in the future, we choose components preferably from standard digi-key parts unless there is a very good reason to not do so

06:24 <aw> wolfspraul, okay

06:25 <aw> including Mouser? or NO?

06:25 <wolfspraul> also OK. _COMMON_ part, that's the key

06:26 <wolfspraul> the choice of the AIC reset part looks wrong to me. in hindsight we are always smarter but I see nothing that's good about it.

06:26 <wolfspraul> we even had to buy a whole reel of 3000 parts for 270 USD. all wrong ;-)

06:26 <wolfspraul> that alone costs 3 USD / board for a run of 90, and 2910 parts forever in our 'archive of bad sourcing decisions'
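
The reel arithmetic above can be checked with a minimal sketch. It assumes one reset IC per board (the log does not state this explicitly), and the function name `sourcing_cost` is made up for illustration.

```python
def sourcing_cost(reel_cost_usd, reel_parts, boards, parts_per_board=1):
    """Return (cost per board in USD, parts left over) when a whole reel
    has to be bought for a small run.

    parts_per_board=1 is an assumption, not something the log confirms.
    """
    used = boards * parts_per_board
    cost_per_board = reel_cost_usd / boards
    leftover = reel_parts - used
    return cost_per_board, leftover


# Figures quoted in the log: 3000 parts for 270 USD, run of 90 boards.
cost, leftover = sourcing_cost(270, 3000, 90)
print(f"{cost:.2f} USD per board, {leftover} parts left over")
```

With the quoted figures this gives 3 USD per board and 2910 unused parts, matching the numbers in the conversation.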

06:27 <wolfspraul> if we are lucky, we find a matching part from another manufacturer, but I won't hold my breath

06:28 <wolfspraul> there's a lot of reset ics, but once you go through the exact requirements we have here it shrinks fast (I did a little digikey searching...)

07:21 <wpwrak> hah, i was wondering how that part ended up in M1 :) and i'd say the best parts come from digi-key and at least one other source :)

07:22 <wpwrak> wouldn't do if some shiny new parts was previously on digi-key's archive of bad sourcing decisions ;-) well, they tell you when it becomes non-stocked, so i guess that's a warning

07:38 <wpwrak> hmm, what's the maximum "5V" voltage the chip needs to survive ? are 6 V enough ?

07:38 <wpwrak> (the A4809 goes up to 12 V)

07:40 <wolfspraul> 6V sounds enough (a bit more would probably be better though, I assume this is coming directly from the power adapter?)

07:41 <wpwrak> directly after L10, so after the protection circuit, if that one is still around (not sure what the status is there, i remember you have some problems with it)

07:41 <wpwrak> s/have/had/

07:43 <wolfspraul> don't understand

07:43 <wolfspraul> what are you getting at?

07:44 <wpwrak> weren't there some issues with the protection circuit causing troubles ? or are they resolved now ?

07:44 <wpwrak> something like bad beads

07:44 <wpwrak> or a bad fuse or such

07:45 <wpwrak> i don't remember the details. only that some parts were removed. but i don't know if this applies to rc3.

07:46 <wolfspraul> all problems turned out to be faulty measurements

07:47 <wpwrak> oh, cool :) very good. then 6 V should be plenty :)

07:51 <wpwrak> how about this guy ? http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=APX803-44SAG-7DICT-ND

07:51 <wpwrak> a bit power-hungry in comparison, but who cares

07:53 <wolfspraul> oh wow, that is the same one I found earlier, I guess my skills do slowly raise a bit up from zero...

07:53 <wolfspraul> it says 140ms 'minimum' reset timeout

07:53 <wolfspraul> what does that mean? the current one has 200 ms

07:54 <wolfspraul> also I wasn't sure whether the pins are at the right places, can it be dropped on the existing rc3 footprint?

07:54 <wpwrak> look at the range. seems to be a very fuzzy parameter. it's nominally 200 ms, too

07:55 <wpwrak> (pin-compatible) as far as i can tell, yes. same size, same pin assignment.

07:55 <wpwrak> this one looks like a second source: http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=576-3834-1-ND

07:55 <wpwrak> also exists in a slower variant ;-) (1.12 s)

07:57 <wolfspraul> we can just buy several at once to try some theories, if that helps

07:58 <wpwrak> that may not be a bad idea. something for the R&D lab :)

07:58 <wpwrak> SOT-23 for such a part seems mighty big, though

07:58 <wpwrak> let me run a package comparison ...

08:02 <lekernel> sot-23 is fine... that's what being used atm

08:02 <wpwrak> hmm no, sot-23 seems to be the most common choice

08:02 <lekernel> so there is space for it

08:02 <lekernel> and it's easier to rework in case of yet another fuckup

08:03 <wpwrak> lekernel: yes, i was looking for what makes sense to stock for future R&D

08:03 <wpwrak> of course :)

08:06 <wolfspraul> aw: you there? before you reflash your next board, can you ping us here? then we can try to force USB into full-speed mode as Werner described

08:07 <wolfspraul> just wait until the next time you need to reflash, then we do it...

08:08 <aw> wolfspraul, alright.

08:09 <wpwrak> 3rd candidate: http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=MAX6348UR44%2BTCT-ND

08:09 <wolfspraul> wow is that expensive

08:09 <wolfspraul> the first one is best

08:10 <wpwrak> well, 3rd source :)

08:10 <wolfspraul> of course for samples we can buy a few

08:10 <wolfspraul> I'm wondering how you can sell something that's 5 or 10 times more expensive than a competitor that can be used as a drop-in replacement

08:10 <wpwrak> (buy maxim) i wouldn't bother. they're for the "due diligence" appendix ;-)

08:11 <wolfspraul> maybe they have some very outstanding performance parameters that some customers need?

08:11 <wolfspraul> or tolerances? or some customers just totally trust their brand?

08:11 <wpwrak> maybe the military likes them ? :)

08:12 <wolfspraul> ok, some old government contracts or other large bureaucratic customers keeping those parts alive? another option

08:13 <wolfspraul> the diodes inc. one is ca. 14 cents / 1k, the Maxim one 1.35 USD / 1k

08:13 <wolfspraul> almost 10 times more

08:13 <wolfspraul> interesting

08:15 <wpwrak> maybe it's because they have such a large choice of parameter and output configurations

08:16 <wpwrak> of course, for all we know, AIT may have a lot more. that data sheet alone could "generate" something like 700 different parts.

08:17 <wolfspraul> AIT parts were used on Ben/AVT

08:17 <wpwrak> (if all specified part number combinations really exist, which seems unlikely)

08:17 <wpwrak> aah, that's where it comes from :)

08:17 <wolfspraul> yes, I also wondered :-)

08:17 <wpwrak> it had that "friends from taiwan" feeling to it :)

08:18 <wpwrak> like so many of those parts we had in openmoko. without data sheets, no second source in the known universe, etc. :)

08:18 <wolfspraul> we can't even say much about the part or manufacturer, but us being such a little guy with so much design verification and changes all the time, it's a difficult source

08:18 <wpwrak> and of course, the company dead before openmoko :)

08:19 <wolfspraul> once you are making large quantities of whatever all the time, they may be the best source of all

08:19 <wolfspraul> who knows

08:19 <wpwrak> you can always switch back once you're sure

08:19 <wolfspraul> in large quantities datasheet availability doesn't matter

08:19 <wolfspraul> oh yes, definitely

08:19 <wpwrak> there's probably great potential in penny-pinching parts

08:20 <wpwrak> each cent you save is a million dollars once you reach 100M+ quantities :)
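
As a quick check of the "one cent is a million dollars" remark, here is a small sketch using integer cents to avoid rounding. The function name `savings_usd` is hypothetical.

```python
def savings_usd(cents_saved_per_unit, quantity):
    """Total savings in USD for a per-unit saving given in whole cents."""
    return cents_saved_per_unit * quantity // 100


# One cent saved per part at 100 million units.
print(savings_usd(1, 100_000_000))
```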

08:20 <wolfspraul> what matters is that your source can follow your forecast flexibly, that the quality of their parts is stable, that you have a good sales contact for problems, etc.

08:20 <wolfspraul> but at our quantities and level of uncertainty, that's all pretty much the last thing we worry about :-)

08:21 <wpwrak> that's what we dream of worrying about ;-)

08:23 <wolfspraul> ok so those 3 reset parts are all the same idea, should we buy a few of each? anything else to add?

08:23 <wolfspraul> I understand that this fix is surely a fix, since with the 2.6v reset ic we are out of spec. so the fix is correct in any case. the unknown is whether it fixes the flash corruption.

08:24 <wolfspraul> if it does - nothing else to worry about. if it does not - then what?

08:25 <wpwrak> if it doesn't, then it may be a sw or fpga problem. e.g., sending out spurious transactions

08:26 <aw> new steps:1. insert DC jack

08:26 <aw> 2. middle button

08:26 <aw> 3. wait for booting, wait for render, let it render 30 seconds

08:26 <aw> 4. unplug DC jack

08:26 <aw> 5. insert DC jack

08:26 <aw> 6. press middle button but then run the test software over jtag serial

08:26 <aw> 7. run the test software only until the CRC check is finished, and record the results

08:26 <aw> 8. if the CRC check fails, abort the render cycles here

08:26 <aw> 9. if the CRC check passes, unplug DC jack

08:26 <aw> 10. go back to step #1
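
The record-keeping part of the ten steps above (steps 7 to 10) could be sketched as follows. This is only an illustration: the class `CycleLog` and the action strings are invented here, and the power cycling and CRC check themselves remain manual operations on the bench.

```python
from dataclasses import dataclass, field


@dataclass
class CycleLog:
    """Per-board log of CRC results, one entry per power cycle."""
    board: str
    results: list = field(default_factory=list)  # True means the CRC check passed

    def record(self, crc_ok):
        """Store one cycle's CRC result (step 7) and return the next action."""
        self.results.append(crc_ok)
        # Step 8: a failed CRC check aborts the cycles.
        # Steps 9-10: a passing check means power-cycle and start over.
        return "repeat" if crc_ok else "abort"


log = CycleLog("0x7c")
print(log.record(True))   # operator entered a passing CRC result
print(len(log.results))   # number of cycles recorded so far
```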

08:26 <wpwrak> i would only get the ~200 ms from diodes and the 1.12 s from micrel

08:26 <wolfspraul> 1.1s ?

08:26 <aw> now 0x7c: is available. hope that we can run into a flash problem occurred soon

08:27 <wolfspraul> aw: you ran 10 render cycles with crc checks on 0x7c?

08:27 <aw> yes

08:27 <wolfspraul> ok

08:27 <wolfspraul> remember when you do the next flashing, ping us here

08:27 <aw> hope from now on can catch flash problem then dig into

08:27 <wolfspraul> for the usb full-speed thing

08:28 <wpwrak> aw: (new steps) sounds good. i wouldn't call it a "render cycle", though :)

08:28 <aw> okay. ping guys. ;-)

08:28 <wolfspraul> ah ok

08:28 <wolfspraul> :-)

08:28 <wolfspraul> let's see (opening werner's instructions :-))

08:28 <wolfspraul> aw: get the board ready, plug usb cable into your notebook as usual

08:29 <wolfspraul> after connecting the cable, run 'dmesg'

08:29 <wolfspraul> in the last few lines, you should see something like "usb 2-1: new high speed USB device [...]"

08:29 <wolfspraul> do you see that?

08:30 <wpwrak> (nor corruption analysis) i think we'll know more about this when we get better data from the crc experiment. e.g., whether there are patterns in where and when it strikes.

08:30 <aw> wolfspraul, what does this mean? for each board or when meet "next flashing"?

08:30 <aw> yes, i just saw Werner's email and marked firstly

08:30 <wolfspraul> let's try now

08:30 <wolfspraul> if it works, we will probably do it for each board

08:30 <wolfspraul> but let's try

08:30 <wolfspraul> you ready?

08:30 <aw> second

08:30 <wolfspraul> 1. plug in usb cable, like you normally flash

08:31 <wolfspraul> 2. run 'dmesg'

08:31 <wpwrak> (analysis) so far, we only have very spurious results, and many have causal dependencies in them, which further twist the probabilities. so it's hard to tell anything from the existing data, except that bad things happen.

08:31 <wolfspraul> wpwrak: how would you call it [instead of render cycle]

08:31 <wolfspraul> 'render cycle' because it's a full cycle from power on to rendering back to power off

08:32 <wpwrak> (cycle) does the cycle even involve rendering anything ? i thought it was now just power up -> CRC -> power down

08:32 <wolfspraul> did you read the list #1 - #10

08:33 <wpwrak> or does the test sw render ?

08:33 <wpwrak> yes

08:33 <aw> [15332.010338] ftdi_sio 2-3:1.1: device disconnected

08:33 <aw> [15491.956073] usb 2-3: new high speed USB device using ehci_hcd and address 8

08:33 <aw> [15492.093767] usb 2-3: configuration #1 chosen from 1 choice

08:33 <aw> [15492.096349] usb 2-3: Ignoring serial port reserved for JTAG

08:33 <aw> [15492.099598] ftdi_sio 2-3:1.1: FTDI USB Serial Device converter detected

08:33 <aw> [15492.099664] usb 2-3: Detected FT2232H

08:33 <aw> [15492.099670] usb 2-3: Number of endpoints 2

08:33 <aw> [15492.099676] usb 2-3: Endpoint 1 MaxPacketSize 512

08:33 <aw> [15492.099682] usb 2-3: Endpoint 2 MaxPacketSize 512

08:33 <wolfspraul> boot, render (30 seconds), cycle, test software (crc)

08:33 <aw> [15492.099687] usb 2-3: Setting MaxPacketSize 512

08:33 <aw> [15492.099989] usb 2-3: FTDI USB Serial Device converter now attached to ttyUSB0

08:33 <wolfspraul> ok enough

08:33 <wolfspraul> :-)

08:33 <wolfspraul> aw: do you see the "usb 2-3:"

08:33 <wolfspraul> two-three:

08:33 <wpwrak> ah, step 3 !

08:34 <wpwrak> right, i skipped some steps :) i wouldn't power cycle twice per loop

08:34 <wolfspraul> so that means the m1 board was connected to your notebook _BUS_ 2, and _PORT_ 3

08:34 <wolfspraul> aw: ok?

08:34 <wpwrak> drop steps 2-5

08:34 <wolfspraul> wpwrak: wait let me do the full-speed thing first

08:35 <wpwrak> nod :)

08:35 <aw> wolfspraul, yes, saw "usb 2-3"

08:35 <wolfspraul> I can do multiple threads in parallel, but Adam probably cannot

08:35 <wpwrak> yes yes :)

08:35 <wolfspraul> aw: ok, that means 'bus' 2, 'port' 3

08:35 <aw> i can't. now do full speed first

08:35 <aw> ;-)

08:36 <wolfspraul> now: echo 3 >/sys/bus/usb/drivers/usb/usb2/../companion

08:36 <wolfspraul> the '3' and '2' are coming from your dmesg output

08:37 <wolfspraul> note that there needs to be a space after the '3' in "echo 3 >/sys/bus/..."

08:38 <wolfspraul> after executing the 'echo' line, run 'dmesg' again and paste some lines from the end here

08:38 <aw> adam@adam-laptop:~/m1_adam/snapshots/2011-07-13/for-rc3$ echo 3 >/sys/bus/usb/drivers/usb/usb2/../companion

08:38 <aw> bash: /sys/bus/usb/drivers/usb/usb2/../companion: Permission denied

08:38 <aw> sudo?

08:40 <aw> ping? you there?

08:40 <wpwrak> sudo, yes

08:41 <wpwrak> sudo /bin/bash

08:41 <wpwrak> then run the command

08:41 <xiangfu> echo 3 | sudo tee /sys/bus/usb/drivers/usb/usb2/../companion

08:42 <wpwrak> wow :)
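
The commands above can be derived from the dmesg output with a short sketch. It assumes the device sits directly on a root port, so the kernel prints an identifier of the form `usb BUS-PORT:`; the helper names `parse_bus_port` and `companion_command` are made up for illustration, and the sysfs path is simply the one quoted in the log.

```python
import re


def parse_bus_port(dmesg_line):
    """Extract (bus, port) from a line such as 'usb 2-3: new high speed ...'.

    Returns None if the line has no 'usb BUS-PORT:' identifier. Devices
    behind a hub (e.g. 'usb 2-1.4:') are deliberately not matched.
    """
    m = re.search(r"usb (\d+)-(\d+):", dmesg_line)
    return (int(m.group(1)), int(m.group(2))) if m else None


def companion_command(bus, port):
    """Build the shell command that hands the port to the companion controller."""
    return f"echo {port} | sudo tee /sys/bus/usb/drivers/usb/usb{bus}/../companion"


line = "[15491.956073] usb 2-3: new high speed USB device using ehci_hcd and address 8"
bus, port = parse_bus_port(line)
print(companion_command(bus, port))
```

The `tee` form is used because in `sudo echo 3 >file` the redirection is performed by the unprivileged calling shell, so it fails with the same "Permission denied"; piping into `sudo tee` or running the command from a root shell avoids that.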

08:43 <aw> [15492.099989] usb 2-3: FTDI USB Serial Device converter now attached to ttyUSB0

08:43 <aw> [16147.368219] usb 2-3: USB disconnect, address 8

08:43 <aw> [16147.368829] ftdi_sio ttyUSB0: FTDI USB Serial Device converter now disconnected from ttyUSB0

08:43 <aw> [16147.368870] ftdi_sio 2-3:1.1: device disconnected

08:43 <aw> [16147.624074] usb 6-1: new full speed USB device using uhci_hcd and address 2

08:43 <aw> [16147.767106] usb 6-1: not running at top speed; connect to a high speed hub

08:43 <aw> [16147.795229] usb 6-1: configuration #1 chosen from 1 choice

08:43 <aw> [16147.803204] usb 6-1: Ignoring serial port reserved for JTAG

08:43 <aw> [16147.807510] ftdi_sio 6-1:1.1: FTDI USB Serial Device converter detected

08:43 <aw> [16147.807554] usb 6-1: Detected FT2232H

08:43 <aw> [16147.807557] usb 6-1: Number of endpoints 2

08:43 <aw> [16147.807559] usb 6-1: Endpoint 1 MaxPacketSize 64

08:43 <aw> [16147.807562] usb 6-1: Endpoint 2 MaxPacketSize 64

08:43 <wpwrak> yes ! :)

08:43 <aw> [16147.807564] usb 6-1: Setting MaxPacketSize 64

08:43 <aw> [16147.808882] usb 6-1: FTDI USB Serial Device converter now attached to ttyUSB0

08:43 <aw> mm...full speed device now.

08:43 <wpwrak> triumph ! :)

08:44 <aw> then? does this mean that I have to enter commands everytime when test each board?

08:44 <wpwrak> make sure you use the longest cable you have ;-)

08:44 <wpwrak> the port configuration should be permanent (until you reboot the PC)

08:44 <aw> oah..sorry that i used a shorter cable..okay...change to long cable

08:44 <wpwrak> but you can check with dmesg. unplug and replug, then see if it still comes up as full-speed

08:45 <wolfspraul> argh

08:45 <aw> umm..sounds good (until reboot the PC)

08:45 <wolfspraul> why long cable?

08:45 <aw> i see

08:45 <wolfspraul> we are not trying to fix every bug on the planet

08:45 <wpwrak> worst case: you need to run the command each time you re-plug the usb-jtag

08:45 <wpwrak> wolfspraul: opportunistic testing :)

08:45 <wolfspraul> wait, let's be clear and precise

08:45 <aw> hmm...sounds different idea..i standby and listening firstly. ;-)

08:46 <wolfspraul> yes

08:46 <wolfspraul> I am focusing on the run of 90 boards, already badly delayed

08:46 <wolfspraul> we can postpone discoveries of all kinds until after sales have started

08:46 <wolfspraul> now...

08:46 <wolfspraul> full-speed is good

08:46 <wolfspraul> Adam can switch to 100% full-speed for the rest of the run now

08:46 <wolfspraul> but I would say the same thing about the short cable

08:46 <wolfspraul> we are trying to fix rc3 bugs, not make sure Adam's entire lab is bug free

08:47 <wolfspraul> my opinion

08:47 <wpwrak> (postpone) well, as you wish. confirmation that full-speed is the cure may create an action item before shipping, though.

08:47 <wolfspraul> cure of which bug?

08:47 <wolfspraul> libusb bug?

08:47 <wolfspraul> we don't even know which bug :-)

08:47 <wpwrak> cure of the reflash failures

08:48 <wolfspraul> hmm

08:48 <wpwrak> well, there's that, yes

08:48 <wolfspraul> aw: which m1 board do you have attached now?

08:48 <wpwrak> of course, are we sure there's even a bug in libusb ? :)

08:48 <wolfspraul> :-)

08:48 <wolfspraul> that's exactly what I want to avoid getting into now

08:48 <aw> wolfspraul, 0x7c

08:49 <wpwrak> that's the fun bit with stochastic bugs - it happens, then you change X and it doesn't happen. but are you sure it went away because you changed X or just because you didn't test often enough ? :)
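
The "didn't test often enough" worry can be put in numbers with a rough sketch. It assumes each test is independent and fails with the same probability `p`, which is a simplification; the value of `p` used below is purely hypothetical, not a measured failure rate.

```python
import math


def prob_no_failure(p, n):
    """Chance of seeing zero failures in n independent tests when each
    test fails with probability p."""
    return (1.0 - p) ** n


def cycles_needed(p, confidence=0.95):
    """Smallest n such that at least one failure would show up with the
    given confidence if the bug were still present at rate p."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


# Hypothetical example: a bug that strikes in 5% of the attempts.
p = 0.05
n = cycles_needed(p)
print(n, prob_no_failure(p, n))
```

Under these assumptions, a rare bug needs many clean runs before "it went away" is a safe conclusion; a handful of passes after changing X says little.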

08:49 <wpwrak> anyway, we can deal with this later, okay

08:49 <wolfspraul> aw: above you said 0x7C is available (testing finished)

08:49 <wolfspraul> are you planning to reflash 0x7C now?

08:49 <wpwrak> i think a fully tested and okay board is a good start

08:49 <wpwrak> no need to reflash until CRC errors happen

08:50 <wolfspraul> yes but I don't understand whether or why Adam wants to reflash 0x7C now, if he just said it's 100% pass

08:50 <wolfspraul> probably a misunderstanding somewhere...

08:50 <aw> wolfspraul, yes 0x7c was done successfully with "new steps" for rendering.

08:51 <wolfspraul> aw: ok, so that sounds like 0x7C is finished.

08:51 <wolfspraul> let's make a little test with our new full-speed happiness

08:51 <aw> but 0x7c not ready for reflashing with "full speed" reflash. i just tried to learn commands. ;-)

08:52 <aw> so what's next step here though?

08:52 <aw> or just when I meet d2/d3 dimly list again? then ping here?

08:53 <wolfspraul> aw: you don't need to reflash anything just because the USB speed is full-speed now

08:53 <wolfspraul> the idea is that for new boards that you reflash from now on, you make sure they are flashed in full-speed

08:53 <aw> so i keep using shorter usb cable and fix usb failure boards first. ;-)

08:53 <wolfspraul> aw: should we try a test on 0x32 ?

08:54 <wolfspraul> those things are unrelated

08:54 <wolfspraul> yes, keep using the short cable

08:54 <wpwrak> aw: you should reflash after each CRC failure. we assume that "d2/d3 dim" would also be a CRC failure. but there can be other CRC failures that do not cause "d2/d3 dim"

08:54 <wolfspraul> aw: I just told him earlier to not reflash after crc failure to not remove evidence.

08:54 <wolfspraul> I meant wpwrak : I just told adam earlier ...

08:54 <wolfspraul> phew

08:55 <wolfspraul> that's the hard part now, avoiding confusion

08:55 <aw> umm...confused me now

08:55 <wolfspraul> aw: wait one second

08:55 <wolfspraul> yes of course

08:55 <aw> okay

08:55 <wpwrak> aw: well, detect CRC error -> analyze -> reflash :)

08:55 <wolfspraul> aw: long usb cable versus short usb cable

08:55 <wolfspraul> that one first

08:55 <wolfspraul> aw: please do _NEVER_ use the long cable

08:55 <wolfspraul> NEVER

08:55 <wolfspraul> always use the short cable

08:55 <wolfspraul> ok?

08:55 <aw> okay

08:55 <wolfspraul> next

08:55 <wolfspraul> USB full-speed versus high-speed

08:55 <wpwrak> wolfspraul: i would use it later :) it's still the cable you ship, no ?

08:56 <wolfspraul> no confusion

08:56 <wolfspraul> I want to finish this run.

08:56 <wolfspraul> aw: USB full-speed versus high-speed

08:56 <wolfspraul> from now on, please always set your USB speed to full-speed on your notebook, before running reflash_m1.sh

08:57 <aw> okay: 1. NEVER use long cable 2. from now on use full speed commands

08:57 <wolfspraul> yes

08:57 <wolfspraul> you can check in dmesg

08:57 <wolfspraul> when you see "new high-speed device detected", that's wrong

08:57 <wolfspraul> you want to see "full-speed"

08:57 <wolfspraul> if it says 'high-speed', you need to use the echo command to force it to full-speed
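
The dmesg check described above could be automated along these lines. This is a sketch under assumptions: the pattern is based on the kernel messages pasted earlier in this log ("new high speed USB device", "new full speed USB device") and also accepts a hyphenated spelling; the function name `last_usb_speed` is made up.

```python
import re

# Matches the enumeration message seen in the pasted dmesg output,
# with either a space or a hyphen before "speed".
SPEED_RE = re.compile(r"new (high|full|low)[ -]speed USB device")


def last_usb_speed(dmesg_lines):
    """Return 'high', 'full' or 'low' for the most recent enumeration
    message in the given lines, or None if there is none."""
    speed = None
    for line in dmesg_lines:
        m = SPEED_RE.search(line)
        if m:
            speed = m.group(1)  # later lines override earlier ones
    return speed


lines = [
    "[15491.956073] usb 2-3: new high speed USB device using ehci_hcd and address 8",
    "[16147.624074] usb 6-1: new full speed USB device using uhci_hcd and address 2",
]
if last_usb_speed(lines) != "full":
    print("force full-speed before running reflash_m1.sh")
else:
    print("full-speed, ok to reflash")
```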

08:58 <aw> alright to use echo commands

08:58 <wolfspraul> you can check it every time after you plug in a new m1, before running reflash_m1.sh

08:58 <wolfspraul> aw: do you think it's clear how you check full-speed, and force to full-speed ?

08:58 <wolfspraul> if you are not clear, before running reflash_m1.sh, just paste the last lines from dmesg here

08:59 <aw> wolfspraul, clear...from now on always use full-speed even i don't meet d2/d3 dimly or else flash problems.

08:59 <wolfspraul> sure

08:59 <wolfspraul> because you have to set full-speed _BEFORE_ running reflash_m1.sh

09:00 <wolfspraul> so you cannot know at that point whether you run into problems or not

09:00 <aw> surely if I meet flash problem again, i ping here in parallel

09:00 <wolfspraul> the full-speed thing is _BEFORE_ running reflash_m1.sh

09:00 <wolfspraul> every time before you run reflash_m1.sh, you check the full-speed thing

09:00 <wolfspraul> ok?

09:00 <aw> alright..okay

09:00 <wolfspraul> next

09:00 <wolfspraul> about reflashing

09:01 <wolfspraul> I think after you have flashed some m1 board once, and it can boot (and render), after that you should _NEVER_ reflash it a second time.

09:01 <aw> okay.you just told me. ;-)

09:01 <wolfspraul> if you run into any error after a successful rendering, leave the board untouched. just note the error, and put the board aside.

09:02 <wolfspraul> we can then study the test results and think about which step to take on which board.

09:02 <wolfspraul> that's all from me :-)

09:02 <wolfspraul> 3 easy items

09:02 <wolfspraul> 1) always use short cable

09:02 <aw> okay...and keep testing another board first

09:02 <wolfspraul> 2) always make sure USB is in full-speed before running reflash_m1.sh

09:02 <wolfspraul> 3) do not reflash a board again after it has already rendered

09:02 <wolfspraul> :-)

09:03 <wolfspraul> aw: wait, one more thing

09:03 <wolfspraul> I want to make one test with 0x32

09:03 <wolfspraul> aw: can we make one special test with 0x32 ?

09:04 <aw> now? why not?

09:04 <wolfspraul> well, just asking :-)

09:04 <wolfspraul> so yes, please get 0x32, and plug it in, and paste the last lines of dmesg here

09:04 <wpwrak> hmm, i'd rather focus on one board at a time. so 0x7c. cycle until CRC, analyze CRC, then reflash and test 0x7c some more

09:04 <wolfspraul> "plug it in", I mean usb jtag

09:05 <wpwrak> wolfspraul: 0x32 is with usb problems ?

09:05 <wolfspraul> you said all flashing problems would go away with full-speed

09:05 <wpwrak> aye

09:05 <wolfspraul> I'm trying to pick one where that may actually happen, maybe 0x32 http://en.qi-hardware.com/wiki/Milkymist_One_run_3_schedule#Test_Results

09:05 <wolfspraul> 0x34 was strange, so maybe not

09:06 <wolfspraul> it's just a quick test

09:06 <wolfspraul> in 2 minutes we know that full-speed will make no difference on 0x32 :-)

09:06 <wolfspraul> he he

09:06 <wolfspraul> grant me that joy

09:07 <wpwrak> sure. let's get the small things out of the way first :)

09:07 <wolfspraul> if this works, I'll get my first afternoon beer

09:07 <wpwrak> 0x3a too, no ?

09:07 <aw> 0x32: [17567.832765] ftdi_sio 6-1:1.1: device disconnected

09:07 <aw> [17656.533069] usb 6-1: new full speed USB device using uhci_hcd and address 3

09:07 <aw> [17656.673148] usb 6-1: not running at top speed; connect to a high speed hub

09:07 <aw> [17656.700375] usb 6-1: configuration #1 chosen from 1 choice

09:07 <aw> [17656.707317] usb 6-1: Ignoring serial port reserved for JTAG

09:07 <aw> [17656.712410] ftdi_sio 6-1:1.1: FTDI USB Serial Device converter detected

09:07 <aw> [17656.712563] usb 6-1: Detected FT2232H

09:07 <aw> [17656.712570] usb 6-1: Number of endpoints 2

09:07 <aw> [17656.712576] usb 6-1: Endpoint 1 MaxPacketSize 64

09:07 <aw> [17656.712582] usb 6-1: Endpoint 2 MaxPacketSize 64

09:07 <aw> [17656.712587] usb 6-1: Setting MaxPacketSize 64

09:08 <aw> [17656.717182] usb 6-1: FTDI USB Serial Device converter now attached to ttyUSB0

09:08 <aw> now is full speed, so what steps you want to try?

09:08 <wolfspraul> aw: perfect. just run reflash_m1.sh

09:08 <wolfspraul> yes 0x3A is nice too! thanks. I didn't see it...

09:08 <aw> be noticed that now it stays d2/d3 dimly lit.

09:08 <aw> okay

09:09 <aw> second

09:09 <aw> wait...use xiangfu's last 'erase' version, right?

09:09 <wolfspraul> sure why not

09:09 <aw> okay

09:09 <wolfspraul> always use the new reflash_m1.sh with erase now, I see no reason why not

09:09 <wpwrak> seems we have more: 0x55, 0x67, 0x6d, 0x6f, 0x70, 0x77, ...

09:10 <wpwrak> 0x7a is a bit weird, but may also be the same

09:11 <wolfspraul> well you are brave

09:11 <aw> hmm...stops at 'Bitstream length: 1484404'

09:11 <aw> standby next analysis step now..he he ;-)

09:12 <wpwrak> GRRRR

09:12 <aw> what's meaning of "GRRRR"? ;-)

09:12 <wpwrak> seems that my "full speed" theory is wrong :-(

09:13 <wpwrak> ah well, in any case it shouldn't make things worse ...

09:13 <aw> wpwrak, not bad that a way would be came out from you. :-) never sad though..we hear you.

09:13 <wolfspraul> aw: let's try the same quick test on 0x3A

09:13 <aw> okay

09:13 <aw> second

09:14 <wolfspraul> I do believe full-speed is good, we should always use it and it will help eliminate a few strange flashing problems. But I don't believe it has any impact on the physical/electrical condition of a particular m1 board.

09:15 <wolfspraul> I trust the little jtag board and the ftdi chip. once the nor is written it's written. the strangeness must come from the m1 boards themselves.

09:16 <aw> 0x3a: good still detect with full speed and stays d2/d3 dimly lit after powered -on. now to reflash. ;-)

09:16 <aw> mm...same stopped at 'Bitstream length: 1484404'

09:16 <wolfspraul> aw: try to disconnect/reconnect the jtag-serial board too

09:17 <wolfspraul> aw: now try to disconnect/reconnect the jtag-serial board

09:17 <wolfspraul> (power off everything first)

09:17 <aw> hmm...need to power off

09:17 <wolfspraul> then reflash_m1.sh in full-speed again

09:18 <aw> same stopped there. :-(

09:18 <wolfspraul> ok

09:18 <wolfspraul> one sec

09:19 <wolfspraul> can you try reflashing with Xilinx Impact?

09:19 <wolfspraul> and the xilinx cable

09:19 <aw> hmm...seems different image i quite don't know this.

09:19 <aw> need to ask xiangfu before do this. :-)

09:20 <wolfspraul> ok

09:20 <wolfspraul> we can do that later

09:20 <aw> last time rc2 I used Lekernel's image

09:20 <wolfspraul> on all boards with flashing problems, we can try Xilinx Impact and Xilinx cable later

09:20 <wolfspraul> aw: ok, let's stop the full-speed tests right now

09:20 <wolfspraul> Werner had another idea I like

09:20 <aw> mmm..okay.

09:20 <aw> wait

09:21 <wolfspraul> aw: you just finished 0x7C, right?

09:21 <aw> so from now on i still use full-speed to continue tests?

09:21 <wolfspraul> absolutely

09:21 <aw> yes, finished 0x7c

09:21 <wolfspraul> always full-speed

09:21 <aw> alright full-speed now

09:21 <wolfspraul> so werner wants to make a special test on 0x7c

09:21 <wolfspraul> like this:

09:21 <aw> now?

09:21 <wolfspraul> yes

09:21 <wolfspraul> wait I write first

09:22 <wolfspraul> 1. plug DC jack in

09:22 <wolfspraul> 2. middle button, escape to test software, run test software until CRC checks

09:22 <wolfspraul> 3. unplug DC jack

09:22 <wolfspraul> 4. go back to step #1

09:22 <wolfspraul> just that

09:23 <aw> okay

09:23 <wolfspraul> We are hoping that after some cycles, the CRC checks will find a corruption

09:23 <wolfspraul> the cycles should be fast, so you can try 100 or 200

09:23 <wolfspraul> start with 100 :-)

09:23 <wpwrak> and please count the cycles

09:23 <wpwrak> err, i'd stop at the first CRC error

09:23 <wolfspraul> oh sure

09:23 <wpwrak> then analyze

09:24 <wolfspraul> sorry that wasn't clear

09:24 <wolfspraul> aw: of course you stop at the first CRC error
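
The shortened loop, with the cycle count and the stop at the first CRC error, can be sketched like this. The function `cycle_until_crc_error` and its `crc_check` callback are hypothetical: on the bench the callback would stand for the operator power-cycling the board, escaping to the test software and reporting whether the CRC checks passed.

```python
from typing import Callable, Optional


def cycle_until_crc_error(crc_check: Callable[[], bool],
                          max_cycles: int = 100) -> Optional[int]:
    """Run up to max_cycles power cycles.

    Returns the 1-based number of the cycle in which the first CRC
    failure was seen, or None if every cycle passed.
    """
    for cycle in range(1, max_cycles + 1):
        # Steps 1-2: DC jack in, escape to test software, run CRC checks.
        if not crc_check():
            return cycle  # stop here and analyze before doing anything else
        # Step 3: unplug DC jack; step 4: back to step 1.
    return None


# Simulated run: the third cycle reports a CRC failure.
simulated = iter([True, True, False, True])
print(cycle_until_crc_error(lambda: next(simulated)))
```

Returning the cycle number keeps the count that was asked for, so the result can go straight into the test notes for the board.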

09:24 <aw> alright

09:25 <wolfspraul> wpwrak: be warned (well, I warn myself). I believe this kind of testing may damage the nor chip or more, and turn a board unflashable for days or forever. :-)

09:25 <wolfspraul> aw: no worries, I just explain my theory to Werner... You can have fun :-) We have enough boards now to ruin some :-)

09:25 <aw> wolfspraul, ha...yes, from last rc2 experiences. ;-)

09:26 <wolfspraul> yes

09:26 <wolfspraul> we should have taken it much more seriously on rc2

09:26 <wolfspraul> I learnt a lot

09:26 <wolfspraul> but that's another story, now we try to rescue rc3 and make good boards

09:26 <wpwrak> wolfspraul: the chip should be good for a few kcycles

09:26 <wolfspraul> no

09:26 <wolfspraul> you will see soon

09:26 <wolfspraul> it's a bug somewhere, an electrical problem

09:27 <wolfspraul> some kind of shock, over-current, over-voltage, whatever

09:27 <wolfspraul> you saw Adam's reaction just now when I wrote this :-)

09:27 <aw> 2 times

09:27 <wpwrak> wolfspraul: hmm, let's hope it's not overvoltage or such. the reset chip replacement couldn't fix that.

09:28 <wolfspraul> correct

09:28 <wolfspraul> I know

09:28 <wolfspraul> so I keep asking "how comfortable are we" :-)

09:28 <wolfspraul> because I'm not :-)

09:28 <wolfspraul> I made some big mistakes in rc2, like I said - already learning...

09:29 <aw> 5

09:29 <wolfspraul> but that analysis doesn't help now, so let's make the best out of the rc3 situation we have right in front of us

09:29 <wolfspraul> sometimes all you have left is that some luck happens

09:29 <wolfspraul> a lucky day!

09:29 <wolfspraul> maybe today?

09:29 <wolfspraul> :-)

09:29 <wolfspraul> let's look for signs!!

09:30 <wpwrak> signs and portents :)

09:31 <aw> 10

09:34 <wolfspraul> wpwrak: do we have any theory what kind of damage or impact may turn the nor chip, or something else, unflashable for several days, but then flashable again?

09:34 <wolfspraul> because Adam has seen that so many times now that we can rule out it just being some sort of noise

09:34 <aw> 15

09:35 <wolfspraul> Adam will regularly let an unflashable board 'rest' for several days, and then try again, because we have seen a lot come back alive after such a resting period

09:35 <xiangfu> wpwrak, you may already saw my patches on urjtag 'lockflash' 'unlockflash'. I have some question about how this urjtag works.

09:35 <wolfspraul> it's not 5 minutes, or an hour, the effect is noticeable after 1 day or 2 days or so

09:35 <wpwrak> hmm no, no idea

09:36 <wpwrak> could be some temperature dependency

09:36 <wolfspraul> not sure

09:36 <xiangfu> wpwrak, http://dpaste.com/594592/ line 20 ~ 28 is read back the lock bit and check.

09:36 <wolfspraul> because the boards worked fine before, including reflashing

09:36 <wpwrak> one test could be like this: if board X magically recovers, try all other boards with reflash problems at that time too. if it's temperature, some of them may also come back

09:37 <wolfspraul> you mean room temperature?

09:37 <wpwrak> yes

09:37 <wolfspraul> definitely not. it's a time based phenomenon.

09:37 <xiangfu> wpwrak, I want know how urjtag know the 'cfi_array->address' ? for now I understand: 1. upload the fjmem.bit 2. then nor flash working 3. how urjtag know what is the address of nor flash data port?

09:37 <aw> wolfspraul, last in rc2 we damaged our boards by "fast-powered cycling" though..not keep 5 seconds between power-on like this time

09:38 <wolfspraul> yes sure, and the reset circuit is also there. let's just focus on trying to reproduce the flash bug now, I'm only saying if it falls into an unflashable state, I wouldn't be surprised.

09:38 <wolfspraul> like 0x32 or 0x3A we just looked at

09:38 <aw> 20

09:39 <wpwrak> xiangfu: (address of flash) isn't this configured somewhere ?

09:39 <aw> wpwrak, why no use high speed to capture the tests I am doing?

09:39 <wpwrak> wolfspraul: (time) hard to distinguish the two

09:39 <aw> i felt this test if use full-speed?

09:40 <wolfspraul> it shouldn't matter. you think it's slower now?

09:40 <wolfspraul> I think you should always use full-speed, even for this test.

09:40 <aw> since last week i met CRC err by high speed. :-)

09:41 <wolfspraul> don't say that otherwise Werner will jump up and hurt his head

09:41 <aw> so i just wanted to clarify what purpose you wanted to catch?

09:41 <aw> oah...sorry ;-)

09:41 <wolfspraul> no no, just joking

09:41 <wolfspraul> I am just joking

09:41 <wolfspraul> :-)

09:41 <wolfspraul> aw: I think always use full-speed

09:41 <wolfspraul> for everything

09:41 <aw> alright .;-)

09:42 <wpwrak> xiangfu: are you sure about URJ_BUS_WRITE (bus, adr + 0x02, CFI_INTEL_CMD_READ_IDENTIFIER); ?

09:42 <wpwrak> xiangfu: the data sheet seems to want 0x1a (table 8, page 19)

09:43 <aw> 25

09:44 <wpwrak> xiangfu: ah, sorry, misread it. it's not 0x1A but IA :)

09:45 <wpwrak> xiangfu: so, if i understand things right: URJ_BUS_WRITE (bus, adr, CFI_INTEL_CMD_READ_IDENTIFIER);

09:46 <wpwrak> xiangfu: and then sr = URJ_BUS_READ (bus, cfi_array->address+2);

09:46 <wpwrak> hmm, vanished :(

09:47 <aw> 30

09:47 <wpwrak> aw: you're running the CRC check each time ?

09:48 <aw> sure

09:48 <aw> haven't spotted CRC err though. ;-)

09:49 <aw> 35

09:50 <wolfspraul> it could take 100-200 cycles

09:51 <wpwrak> wasn't 100-200 the rate of "dim LEDs" ?

09:51 <wolfspraul> unfortunately we know so little. it could be that some boards will never exhibit the problem.

09:51 <wpwrak> with the CRC check, we should hit it ~10-20 times more often, assuming uniform distribution

09:51 <wolfspraul> maybe it is caused by some unfortunate part tolerances coming together

09:51 <wpwrak> that could be the case, too

09:51 <wolfspraul> I don't believe that, but let's see

09:51 <aw> 40

09:52 <wpwrak> maybe it's also a question of giving the board enough time to discharge

09:52 <wolfspraul> if we know for sure that some boards are safe, they are good to go

09:52 <wolfspraul> the bad thing is that we currently do 10 render cycles (30 seconds each) in our testing

09:52 <wolfspraul> and we had boards failing on cycle #2 #6 #9 etc.

09:52 <wolfspraul> not good

09:52 <wolfspraul> why should '10' be the magic number to determine that the board is stable?

09:52 <wpwrak> yeah

09:53 <wpwrak> if we have the baseline probability, we can calculate how many tests you need to be, say, 99% sure the problem doesn't appear
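
A rough sketch of the calculation wpwrak describes. It assumes each power cycle fails independently with a fixed baseline probability p, which the log does not establish and gives no measured value for. Under that assumption, the number of clean cycles needed to be 99% sure the fault would have shown up is the smallest n with 1 - (1 - p)^n >= 0.99. The function name and the example probabilities are hypothetical.

```python
import math


def cycles_needed(p_fail_per_cycle: float, confidence: float = 0.99) -> int:
    """Smallest number of cycles n such that at least one failure would be
    observed with probability >= confidence, i.e. 1 - (1 - p)^n >= confidence.

    Assumes independent cycles with a constant failure probability p
    (an illustrative assumption, not something measured in the log).
    """
    if not 0.0 < p_fail_per_cycle < 1.0:
        raise ValueError("p_fail_per_cycle must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - p)^n <= 1 - confidence for n and round up.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail_per_cycle))


if __name__ == "__main__":
    # Hypothetical baseline rates, for illustration only.
    for p in (0.5, 0.1, 0.01):
        print(f"p = {p}: {cycles_needed(p)} clean cycles for 99% confidence")
```

If the fault really appeared only once per 100 cycles on average, this model would ask for 459 clean cycles, far more than the 10 render cycles currently used as the pass criterion.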

09:53 <wolfspraul> we don't need to look at or find root causes for all sorts of strange flash/dim lit/reconfig/whatever boards. we have enough time for that once we have cleared 40, 50, 60 or more to go out

09:54 <wolfspraul> I think what helps is if we can more clearly see the different bugs separately that are probably overlapping here

09:54 <wolfspraul> which is why I like the full-speed stuff, short cable, crc checks, etc.

09:54 <wolfspraul> also the reset ic idea

09:54 <aw> 45

09:55 <wolfspraul> not just idea, that seems to be a clean fix/improvement that is good no matter what other things we find

09:56 <wolfspraul> wpwrak: speaking about that. you really want the 1.12s delay ic?

09:56 <wpwrak> yeah, if the reset chip does anything useful at all, then this is an improvement

09:56 <wolfspraul> I mean - can that work at all?

09:57 <wolfspraul> yes I think the reset ic is fine, helps

09:57 <wolfspraul> pretty sure about that

09:57 <aw> 50

09:57 <wolfspraul> so we order 100 of this: http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=APX803-44SAG-7DICT-ND

09:57 <wpwrak> (1.12 s) for R&D, it may be good to have. not sure it would be desirable in M1

09:57 <wolfspraul> and 3 or 4 of this: http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=576-3835-1-ND

09:58 <wpwrak> i usually get at least 10, unless the item is very expensive :)

09:59 <wolfspraul> I'm half Chinese

09:59 <wolfspraul> so 5

09:59 <wolfspraul> :-)

09:59 <wolfspraul> for the cycle testing adam is doing on 0x7A now, I propose we stop that at 100 successful cycles

09:59 <aw> 55

10:00 <wolfspraul> and let Adam continue to go through the whole batch as planned

10:00 <aw> no , ox7c

10:00 <wolfspraul> sorry 0x7C

10:00 <wolfspraul> that's because we already have several improvements now (short cable, full-speed, crc checks in test software which is logged), and then we have more testing data to look at and thing about

10:00 <wolfspraul> think about

10:01 <wolfspraul> then we can zoom in on clusters, or try to find clusters, or try to find boards where it is easy to reproduce some particularly interesting behavior

10:01 <wpwrak> yeah, 100 should be plenty. i would have expected to see an error much earlier. maybe we've removed the step that actually causes the problem. but let's try a few more boards first.

10:01 <wolfspraul> yes we may have removed the step

10:01 <wpwrak> (zoom in) yes

10:01 <wolfspraul> or the problem is only showing on particular boards

10:02 <aw> so remove new steps?

10:02 <wpwrak> that could also be

10:02 <wolfspraul> aw: no, just continue

10:02 <wolfspraul> Werner and I are discussing the next steps

10:02 <aw> :-)

10:02 <wpwrak> maybe, when 0x7c is done, pick one that has had NOR corruption before

10:02 <wolfspraul> :-)

10:02 <wolfspraul> Werner cannot wait looking at the interesting stuff NOW ;-)

10:03 <wolfspraul> also from now on, Adam will run crc checks between the 10 render cycles

10:03 <wolfspraul> that may show something (or not)

10:03 <wolfspraul> we could increase the 10 render cycles to 15 ?

10:03 <wolfspraul> they are time consuming though

10:03 <wolfspraul> that's 30 minutes testing for each board, easily

10:04 <aw> 60

10:05 <wolfspraul> nah let's only do 10 now

10:05 <wolfspraul> I don't need more evidence that boards fail at #12 or #14

10:05 <wolfspraul> I need to find the root cause

10:06 <wolfspraul> in that thinking we could even reduce the cycles to 5 :-)

10:06 <wpwrak> lekernel: in adam's usual test, boot and render for some minutes, do NOR access (read or write) occur after the rendering starts ?

10:06 <wolfspraul> 30 seconds render

10:06 <wpwrak> wolfspraul: wait wait .. for now, we don't have rendering in the loop

10:07 <wolfspraul> yes

10:07 <wolfspraul> correct

10:07 <wolfspraul> I am thinking once he's back to going through the batch

10:07 <wolfspraul> I think we can reduce to 5 cycles

10:07 <wpwrak> wolfspraul: let's keep the simplified loop and apply it to a board that's known not to be immune

10:07 <wolfspraul> but with crc checks in between

10:07 <wolfspraul> which one?

10:07 <aw> 65

10:07 <wolfspraul> (looking at list)

10:08 <wolfspraul> wpwrak: how about 0x39 ?

10:08 <wolfspraul> clean and simple

10:08 <wolfspraul> one cycle - and out :-)

10:08 <wolfspraul> maybe too simple, maybe a little later...

10:09 <wpwrak> 0x39 sounds excellent :)

10:09 <wolfspraul> 0x54, also nice

10:09 <wolfspraul> 8th cycle

10:09 <wolfspraul> ok, 0x39

10:09 <wpwrak> another up to 100 tries with 0x39

10:10 <wolfspraul> oh my

10:10 <wolfspraul> dinner time for Adam :-)

10:10 <wpwrak> if that still doesn't do anything, add rendering to the loop

10:10 <wpwrak> can you go from test to render ? or do you have to reset in between ?

10:10 <aw> 70

10:10 <wolfspraul> like I said, instead of doing time consuming tests on single boards now, we can also proceed going through the batch with the process that we improved in details

10:11 <wolfspraul> wpwrak: maybe he could go from test to render over software reset (press three buttons), instead of pulling the DC cable

10:11 <wpwrak> we need a larger number of tests for now. statistical baseline.

10:11 <wolfspraul> yes but on which boards?

10:11 <wpwrak> (sw reset) yes, that's an option

10:11 <wolfspraul> you may be hitting on a board that may never show the problem, just wasting time

10:11 <wpwrak> 0x39 looks promising :) it did it once. we know it can ;-)

10:12 <wpwrak> if it all of a sudden doesn't do it, that's interesting, too

10:13 <aw> 75

10:13 <wolfspraul> wpwrak: what do you want to do on 0x39 actually? reflash it?

10:13 <wolfspraul> first try whether it boots now

10:14 <wolfspraul> boards have come back after X days, though I am not sure exactly from which of the multiple failure conditions we may actually be looking at

10:14 <wpwrak> yeah, would be fun if the NOR corruption would somehow have healed itself ;-)

10:14 <wolfspraul> so first try to boot 0x39, see what happens. if no reconfigure -> reflash_m1.sh with erase and full-speed

10:14 <wolfspraul> I am telling you we have seen enough such cases now

10:15 <wpwrak> try to boot and if it boots, run the CRC check

10:15 <wolfspraul> good idea

10:15 <aw> 80

10:17 <aw> 85

10:18 <wpwrak> i have that mental image of a guy in a prison cell counting the days with scratch marks on the wall. adam must be doing something similar, counting the tries until he can lay the board to rest :)

10:20 <aw> 90

10:22 <aw> 95

10:23 <wolfspraul> wpwrak: I do think he should continue going through the batch first, before 0x39

10:23 <wpwrak> aw: ah, and please paste (to pastebin.com, or similar) the console output of the 100th run

10:23 <wolfspraul> but if you want him to do 0x39 next, ok with me

10:24 <wpwrak> i'd prefer 0x39. let's make the thing happen before changing an unknown set of variables

10:25 <aw> 100

10:26 <wolfspraul> yay!

10:26 <wolfspraul> aw: thanks a lot!

10:26 <wolfspraul> can you post the console output of the last run to pastebin.com ?

10:26 <wolfspraul> have you used pastebin.com before?

10:26 <aw> http://pastebin.com/QBPNQEcV

10:26 <wolfspraul> nice

10:26 <aw> my logs are over though

10:26 <aw> won't show 100 times for you!

10:27 <aw> but you can only trust my log though. :-)

10:27 <wpwrak> perfect. thanks !

10:27 <aw> or tell me how my terminal log can save longer message it can. ha ;-)

10:27 <wpwrak> naw, just wanted to see one :)

10:28 <wolfspraul> aw: what terminal program do you use?

10:28 <wpwrak> nothing unusual there, so the tests appear to be good

10:28 <aw> I need to wrap my rubbish for preparation pm 7:00 and dinner

10:28 <wpwrak> now, 0x39. this will be fun :)

10:28 <wolfspraul> aw: want to go to dinner first, or next test?

10:29 <aw> wpwrak, I'll back soon with 0x39 though following your idea. ;-)

10:29 <aw> is okay? ;-)

10:29 <wolfspraul> yes perfect

10:30 <wolfspraul> aw: you prefer mouser instead of digi-key, right?

10:30 <wolfspraul> is mouser faster, or what is the reason?

10:30 <aw> wolfspraul, yes. Mouser won't charge me extra business fee when over NTD3000 which including shipping fee. ;-)

10:31 <wolfspraul> ok, so mouser is cheaper

10:31 <wolfspraul> alright, I will lookup the reset parts in mouser, I want to get the order out asap

10:31 <aw> the digikey will always charge me an extra business tax 5% of whole order price of batch.

10:31 <aw> that's why i used to order in Mouser. ;-)

10:32 <wolfspraul> 5%, ok

10:32 <aw> i gotta go though

10:32 <wolfspraul> doesn't sound like a big drama

10:32 <wolfspraul> ok, later

10:32 <aw> k

10:32 <wpwrak> (5% charge) interesting. no such charge here, it seems. at least i see nothing that looks like it in the invoice

11:33 <aw> i'm back. so 0x39 for next. ;-)

11:34 <aw> so use same steps like 0x7c's? 100 times, right?

11:41 <wpwrak> first, let's see if it boots in its present state (without reflashing)

11:43 <wpwrak> if it does, please run the CRC test

11:43 <wpwrak> if it doesn't boot, do you know how to read out the NOR via gdb ?

11:47 <aw> 1. it can boot > rendering now

11:49 <aw> 2. CRC is okay while testing

11:49 <aw> i don't know how to read out of NOR via gdb

11:50 <aw> so now I go for 10 times of rendering -> power cycle -> middle btn -> test program -> CRC -> power-cycle -> rendering ?

11:50 <aw> or 100 times?

11:51 <aw> or i learn to read NOR firstly?

11:55 <wolfspraul> it's rendering now?

11:56 <aw> it's in test program now

11:56 <wolfspraul> aw: what is 0x39 doing now? (sorry, just back)

11:56 <wolfspraul> crc was ok?

11:56 <aw> it was okay

11:56 <wolfspraul> alright

11:57 <wolfspraul> that confirms my suspicion that the 'cannot reconfigure' bug we see is not always a nor corruption

11:57 <wolfspraul> in fact the only time we saw a corruption for now was on an rc2, which may not be comparable

11:57 <wolfspraul> aw: maybe let's do 20 cycles of the same style as 0x7c before

11:58 <aw> in rc2? was xiangfu show it us, right?

11:58 <wolfspraul> yes

11:58 <aw> alright

11:58 <wolfspraul> I think we should ignore that result until we have rc3 data

11:58 <aw> um...got it

11:58 <wolfspraul> let's do 20 cycles like before

11:58 <wolfspraul> my prediction: there will be no problem

11:58 <aw> alright, so i start to count

11:58 <wolfspraul> but who knows :-)

11:59 <aw> oah~~man! d2/d3 dimly lit now. :(

11:59 <wolfspraul> well great

11:59 <aw> so next?

12:00 <wolfspraul> hmm

12:00 <wolfspraul> that was the first power cycle?

12:00 <wolfspraul> you just unplug, replug -> d2/d3 dimly lit?

12:01 <aw> http://pastebin.com/8au5g8CG

12:01 <aw> yes

12:01 <aw> belongs to SECOND power on

12:01 <wolfspraul> do you know how to read nor via jtag?

12:02 <aw> so it's second powered -on

12:02 <aw> don't know

12:02 <aw> i see xiangfu seem have read flash script

12:02 <wolfspraul> yes

12:03 <wolfspraul> I wish we could rework the reset ic to 5v/4.4v on 0x39

12:03 <wolfspraul> I think you should continue with the rest of the batch now, I have no further questions about 0x39 right now

12:04 <wolfspraul> unless xiangfu shows up and tells us how to read nor quickly, or unless wpwrak has other questions

12:04 <aw> this https://github.com/milkymist/scripts/blob/master/scripts/read_flash_m1.sh ?

12:04 <wolfspraul> we can go back to 0x39 later

12:04 <wolfspraul> sure, you can try :-)

12:04 <wolfspraul> read the standby bitstream first

12:05 <wolfspraul> ah yes, you can just run it

12:05 <aw> how the command's syntax?

12:05 <wolfspraul> just run ./read_flash_m1.sh

12:05 <wolfspraul> :-)

12:06 <aw> and which file it will write in?

12:07 <wolfspraul> in your home dir ~/.qi/milkymist/readback/_date_/standby.fpg

12:07 <wolfspraul> if it works

12:07 <wolfspraul> just be brave, run and see what happens

12:07 <wolfspraul> connect jtag with usb full-speed as always

12:11 <aw> adam@adam-laptop:~/m1_adam/snapshots/2011-07-13/for-rc3$ sudo ./read_flash_m1.sh

12:11 <aw> ./read_flash_m1.sh: 6: Syntax error: newline unexpected

12:11 <aw> after chmod +x read_flash_m1.sh

12:12 <aw> xiangfu, how to use https://github.com/milkymist/scripts/blob/master/scripts/read_flash_m1.sh ?

12:14 <wolfspraul> strange how did you download it? try this url https://raw.github.com/milkymist/scripts/master/scripts/read_flash_m1.sh

12:14 <wolfspraul> maybe something wrong with newlines?

12:15 <aw> i used : wget --no-check-certificate https://github.com/milkymist/scripts/blob/master/scripts/read_flash_m1.sh

12:16 <wolfspraul> try my url

12:16 <aw> okay

12:16 <xiangfu> yes. it should be "https://raw.github.com/milkymist/scripts/master/scripts/read_flash_m1.sh"

12:16 <wolfspraul> maybe you got the entire web page :-)
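
The "Syntax error: newline unexpected" is consistent with wolfspraul's guess: the /blob/ URL serves GitHub's HTML viewer page, not the script. A minimal sketch of the URL rewrite used above, plus a cheap check that a download is not HTML. Both function names are hypothetical, and the default host raw.github.com is the one quoted in this log (GitHub later moved raw files to raw.githubusercontent.com).

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url: str, raw_host: str = "raw.github.com") -> str:
    """Turn https://github.com/<owner>/<repo>/blob/<ref>/<path> into the
    matching raw-file URL by dropping the 'blob' segment and swapping the host."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _blob, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunsplit((parts.scheme, raw_host, raw_path, "", ""))


def looks_like_html(data: bytes) -> bool:
    """Heuristic: True if the downloaded bytes start like an HTML page."""
    head = data.lstrip()[:64].lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


if __name__ == "__main__":
    blob = "https://github.com/milkymist/scripts/blob/master/scripts/read_flash_m1.sh"
    print(github_blob_to_raw(blob))
```

Running a check like looks_like_html on the first bytes of the saved file before chmod +x would have flagged the bad download immediately.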

12:18 <aw> oah :-) poor adam

12:19 <wolfspraul> what? it worked?

12:20 <aw> not work. it stops the same . wait ...i copy msg log

12:21 <wolfspraul> I think we stop work on 0x39 now. wait a day and maybe it boots again :-)

12:21 <wolfspraul> right?

12:21 <aw> http://pastebin.com/bjZMiZNr

12:21 <aw> yes, let's stop 0x39 now

12:22 <wolfspraul> I suggest you go back to the normal procedure, continue with all boards and all known fixes

12:22 <wolfspraul> I propose a change to the render cycles, we already said that you run the crc test software after each render cycle

12:22 <wolfspraul> I also think you should reduce the number from 10 to 5

12:22 <wolfspraul> so here's the list:

12:22 <wolfspraul> 1. only use short cable (as before)

12:22 <aw> mm..so your normal procedure is now becoming:

12:22 <aw> go on

12:23 <wolfspraul> 2. always run reflash_m1.sh in usb full-speed mode

12:23 <wolfspraul> 3. run the test software (crc part) after each render cycle

12:23 <wolfspraul> 4. reduce the number of render cycles from 10 to 5

12:23 <wolfspraul> that's all

12:23 <aw> got it

12:37 <wolfspraul> aw: so what you've found is that if a board is in d2/d3 dimly lit status, it cannot be reflashed over jtag-serial, and the nor can also not be read over jtag-serial

12:37 <wolfspraul> we could try to reseat (disconnect/reconnect) the jtag-serial board, and we could try to reflash with Xilinx Impact

12:38 <wolfspraul> but I suggest to do that later

12:38 <wolfspraul> the real showstopper is to find the reason why a board can go from seemingly normal to this state. we have to fix that before boards can go out.

12:38 <wolfspraul> and the only idea right now seems to be the new reset ic

12:43 <wolfspraul> I think whether it's from the fpga, software or electrical, the m1 is doing something really bad to the nor chip under some circumstances

12:43 <aw> wolfspraul, after it's in d2/d3 dimly lit, cannot be read over jtag-serial, but bad that I forgot to reflash it again. but from previous other boards' histories, once board is in d2/d3 dimly lit, it seems always stopped at "Bitstream length: 1484404"

12:43 <aw> but we can try reflash 0x39 tomorrow

12:43 <wolfspraul> :-)

12:43 <wolfspraul> the famous "let's wait 1 day"

12:43 <aw> oah..yeah...

12:43 <wolfspraul> I think let's continue with all boards first

12:44 <wolfspraul> more fixes, more data

12:44 <wolfspraul> I need complete overview over the failure clusters

12:44 <wolfspraul> in parallel the new reset ics are ordered

12:44 <aw> okay..i continue tests

12:44 <wolfspraul> aw: do you know how to _READ_ the nor chip with Xilinx Impact?

12:45 <wolfspraul> you could try to read the nor from 0x39 with Xilinx Impact

12:45 <wolfspraul> but yeah, I suggest - do that later

12:45 <aw> hmm...need to do this later though ;-)

13:57 <wpwrak> hmm, interesting ..

13:57 <wpwrak> (sorry, fell asleep and missed part of the fun)

13:58 <wpwrak> so maybe we don't have a NOR corruption after all. that would be good :)

13:59 <lekernel> wpwrak, any other ideas about what is happening, then?

13:59 <lekernel> temperature dependent timing failures?

13:59 <wpwrak> could be some analog domain weirdness of the diode-based reset circuit ... but i don't have any clear error path for that

14:00 <wpwrak> what's puzzling me is that JTAG and normal operation run into trouble with the NOR

14:01 <wpwrak> otherwise, i would have suspected problems with the timing of NOR bus access cycles

14:01 <wpwrak> maybe some of the signals are just too weak ? a voltage check could help to clarify this

14:03 <wolfspraul> the sample set was smaller (run of 40 instead of 90), and there was a lot less testing in rc2 than rc3, but I am pretty sure this same 'kind' of bug already existed in rc2

14:03 <wolfspraul> so I think that rules out anything new that got introduced by the reset ic or diode

14:03 <wolfspraul> big guess though, just from thinking about what cases I saw or remember

14:04 <wpwrak> wolfspraul: you think it's the same as the NOR corruption ?

14:04 <wolfspraul> well, the best data I have now are the rc3 test results

14:05 <wpwrak> (which could of course just be invalid data showing up, without the NOR itself being compromised)

14:05 <wolfspraul> so I scan them, top to bottom and back up, on the 'notes' column

14:05 <wolfspraul> what I see now, even though Adam is not finished yet, is easily 20-30 boards that all fall into one 'group'

14:06 <wolfspraul> d2/d3 dimly lit, cannot reconfigure, cannot reflash

14:08 <wolfspraul> just counted: 17

14:09 <wolfspraul> 46 have passed, 1 adam, 17 in that 'group', 26 in other failure states currently

14:09 <wolfspraul> that 26 will come down more
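
The tally above can be checked with a few lines. The four counts are the ones wolfspraul quotes. Reading them as a complete breakdown of the rc3 run is an assumption, though they do sum to the 90 boards he mentioned earlier. The dictionary keys and function name are hypothetical labels.

```python
def group_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each group's share of the total as a fraction between 0 and 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no boards counted")
    return {name: n / total for name, n in counts.items()}


# Counts as quoted in the log at 14:09.
tally = {
    "passed": 46,
    "adam": 1,
    "d2/d3 dim, no reconfigure, no reflash": 17,
    "other failure states": 26,
}

if __name__ == "__main__":
    print("total boards:", sum(tally.values()))
    for name, share in group_shares(tally).items():
        print(f"{name}: {share:.1%}")
```

On these numbers the dim-LED group alone is a bit under one fifth of the run, in line with the "20% or more" rate mentioned later in the discussion.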

14:09 <wpwrak> plus, 0x39 seems to be able to enter this state, whatever it is, relatively easily. let's make this our preferred candidate for now.

14:09 <wpwrak> and if it is in this state, it doesn't seem to get out without a power cycle. but maybe this is just a lack of time

14:10 <wolfspraul> so that's a big group already (17, counted conservatively), and growing

14:10 <wpwrak> has a board with dim LEDs been left running for a long time, say, overnight ?

14:10 <wolfspraul> you mean in dim LED state?

14:10 <wolfspraul> afaik it's not running then

14:10 <wolfspraul> dim LED means no boot

14:11 <wpwrak> lekernel: on CRC failure, will the FPGA just keep on trying forever ? or does it eventually give up ?

14:11 <lekernel> iirc it tries 3 times or something like that

14:11 <wpwrak> wolfspraul: i mean leave it on, see if it eventually succeeds

14:11 <lekernel> but i'm not sure

14:11 <wpwrak> lekernel: and then ?

14:12 <lekernel> stays in unconfigured state

14:12 <wpwrak> bleh :-(

14:12 <lekernel> in any case, loading fjmem.bit will stop all other configuration attempts

14:12 <wpwrak> what's fjmem.bit ?

14:14 <wolfspraul> wpwrak: you want to do voltage check on which wires?

14:14 <wpwrak> the "three button salute" triggers a reset, right ? does it also work if unconfigured ?

14:14 <lekernel> wpwrak, the bitstream that is loaded to give urjtag a "fast" jtag access to the flash

14:15 <wpwrak> wolfspraul: basically all the NOR signals. pick a convenient line, e.g., OE, do, say, a read cycle, then see how they behave

14:17 <wolfspraul> wpwrak: 0x39 does not just 'enter' this state easily, more importantly it is 'in' this state right now and we don't know how to get it out

14:17 <wpwrak> wolfspraul: if one can't quite decide whether it should be 0 or 3.3 V, we may have found our problem. maybe set trigger on OE#, then start with RP#, WE#, DQ0, A0, then do the rest of DQx and Ax

14:17 <wpwrak> wolfspraul: it does seem to get out of the state sometimes.

14:18 <wpwrak> wolfspraul: ah, before DQ0, also CE0

14:18 <wolfspraul> yes, but next time we have a board in a state we may zoom in then

14:18 <wolfspraul> there's two different things I think

14:18 <wpwrak> lekernel: maybe something to check out would be whether all the FPGA I/O cells of NOR pins are properly configured

14:18 <wolfspraul> some event that gets it into this state, and some situation or effect that holds it there

14:19 <wpwrak> wolfspraul: yes. could be temperature plus tolerances. the tolerances enable. the temperature makes it happen.

14:19 <wpwrak> or maybe humidity, phase of the moon, ... ;-)

14:19 <wolfspraul> I doubt it's room temperature. parts temperature - yes, possible.

14:20 <wpwrak> part temp. starts at room temp. :) then you do a bit of testing, it fails, keeps on failing, you give up, put it away, and then it works, until ...

14:21 <wpwrak> of course, if we're unlucky, probing the signals "fixes" it

14:22 <wolfspraul> pah, tough

14:22 <wolfspraul> can't seem to be able to pin it down

14:22 <wolfspraul> I already ordered some more nor flash, just in case :-)

14:22 <wolfspraul> have to ramp up the efforts a bit

14:22 <wpwrak> you suspect the NOR could simply be bad ?

14:23 <wolfspraul> unfortunately adam doesn't have a tsop-56 or whatever package it is tester that could test and scan the entire nor chip at once :-)

14:23 <wolfspraul> no

14:23 <wolfspraul> well

14:23 <wolfspraul> I don't know

14:23 <wolfspraul> 'bad' as in what?

14:23 <wpwrak> where's a cheap 56 channel analog scope with active probes when you need one ? ;-)

14:23 <wolfspraul> maybe we are operating it outside of spec?

14:23 <wolfspraul> not 'bad' as in broken parts or so, no

14:24 <wolfspraul> not at this rate of 20% or more

14:24 <wpwrak> you got it from a reputable source ?

14:24 <wolfspraul> ahh :-)

14:24 <wolfspraul> yes I think so

14:24 <wpwrak> (-:C

14:25 <wolfspraul> and no, there is no indication that that's the problem

14:25 <wolfspraul> this chip is made on a 65nm process

14:25 <wolfspraul> afaik nobody in China can do it yet

14:25 <wolfspraul> anyway, no, the parts are good

14:25 <wolfspraul> although if replacing some 'fixes' the bug of course I'd do that for now

14:26 <wpwrak> lekernel: is there also a "slow" jtag access to flash ? i.e., just good old bit-banging ?

14:26 <wolfspraul> but since we don't even know how to test whether the bug 'exists' on a particular board or not (if it is even board dependent), that wouldn't help either

14:27 <wpwrak> wolfspraul: yeah, let's consider signal integrity for now

14:27 <lekernel> afaik fjmem.bit is just bit banging, but you don't need to scan the 450+ pins of the BGA every time

14:27 <wolfspraul> lekernel: can you image images of the same sources we have now for Xilinx Impact?

14:28 <wolfspraul> or are they the same?

14:28 <wpwrak> lekernel: so fjmem.bit is different from the regular NOR access algorithm ? i.e., much slower bus cycles ?

14:28 <lekernel> but this has nothing to do with failure of the flash _after_ it has been written

14:28 <lekernel> you need to convert to .mcs to use xilinx impact

14:28 <wolfspraul> that's something we can try to bypass any libusb/urjtag/jtag-serial issue, although I don't think that's the root cause of the problem

14:28 <lekernel> with srecord for example

14:29 <wpwrak> lekernel: and when the FPGA boots from NOR, does it always use a built-in bus protocol or does it, say, load a bit from NOR, then switches ?

14:29 <lekernel> it uses the hardwired configuration system

14:30 <lekernel> it seems you can send commands from the flash to change a few things while it's running, but I don't know

14:30 <wpwrak> okay, so we have 2-3 entirely different bus protocol implementations. seems unlikely that all of them would just be wrong.

14:30 <wolfspraul> wpwrak: hey, you will like this

14:31 <wpwrak> ducks

14:31 <wolfspraul> I followed the wiki to find our flash source, and it is the World Peace Industrial Group!!!

14:31 <wolfspraul> if that's not trustworthy, then sorry, I cannot help you

14:31 <wolfspraul> http://www.wpi-group.com/

14:31 <wolfspraul> I'm serious

14:31 <wolfspraul> world peace

14:31 <wolfspraul> that's where we buy from

14:32 <wpwrak> haven't we met them before ?

14:32 <Fallenou> they cannot sell wrong parts then :-)

14:32 <wpwrak> Fallenou: well, osama got the nobel peace prize, so ...

14:32 <wpwrak> err, obama. damn.

14:32 <Fallenou> lol

14:32 <wolfspraul> no but they are fine, really

14:33 <Fallenou> i wtf'ed a few seconds

14:33 <wolfspraul> also in this kind of part you rarely have problems, unless you really buy returned/used parts or so, and who does that...

14:33 <wolfspraul> this part is too high-end

14:33 <wpwrak> well, could be rejects

14:34 <wolfspraul> no

14:34 <wolfspraul> it's not

14:34 <Fallenou> well you would not be the first to have troubles with flash parts

14:34 <wpwrak> but let's assume for now it's a bus problem

14:34 <wpwrak> Fallenou: understatement of the year ;-) you should work as a nuclear power spokesperson :)

14:35 <wpwrak> Fallenou: of course, you admitted that a problem exists at all. so maybe not :)

14:35 <Fallenou> hehe

14:36 <wolfspraul> is it possible that the problem is in the fpga not the nor chip?

14:36 <wpwrak> of course. same story there.

14:37 <wpwrak> it doesn't seem to be a configuration problem, though, since the hardwired bus protocol also trips

14:37 <Fallenou> i meant i heard a few people complaining about flash parts behaving strangely , even just soldered brand new ones

14:37 <Fallenou> bad blocks problems and so on

14:37 <wolfspraul> no I don't mean in terms of bad parts or so, that's not the case for sure. I'm just wondering what kind of problem it might be, theoretically.

14:38 <wpwrak> Fallenou: NOR or NAND ?

14:38 <wolfspraul> we could unsolder the nor of 0x39 and put it on a good board to see what happens there

14:38 <Fallenou> was nand i think

14:38 <wolfspraul> oh no

14:38 <wolfspraul> now Werner will be busy for a while

14:38 <wolfspraul> :-)

14:38 <Fallenou> maybe it does not apply here at all

14:38 <wpwrak> Fallenou: bad blocks are normal in NAND. and they have a very subtle definition of what constitutes a "good" block, too :)

14:39 <wolfspraul> the problem of reseating a nor chip to another board is that it's quite intrusive and may create or mask problems

14:39 <wpwrak> Fallenou: a "good" block is one with 0 or 1 error, i.e., few enough errors that the ECC can still fix it

14:39 <wolfspraul> so we may just get noise back

14:40 <wpwrak> wolfspraul: i'd look at the signal first. if we trust both FPGA and NOR, the problem must be on the bus :)

14:40 <Fallenou> well ok nevermind sorry for the noise ;)

14:40 <wpwrak> wolfspraul: first step: do something that exercises the bus and see if there's an anomaly

14:41 <wolfspraul> we could take 0x39 and try to read the standby bitstream

14:41 <wolfspraul> and compare with a board where that works

14:42 <wolfspraul> what happens on 0x39 now - the ftdi chip loads a small bitstream into the fpga, and then it tries to read from nor via fpga

14:42 <wolfspraul> but that fails/hangs ?

14:42 <wolfspraul> any visibility into that?

14:42 <wolfspraul> can the fpga 'log' all bus activity? :-)

14:42 <wolfspraul> he he

14:43 <wolfspraul> just thinking, maybe nonsense

14:43 <lekernel> urjtag might have some debug mode

14:43 <lekernel> also, inputting the commands manually one by one instead of using the batch script would already help

14:44 <lekernel> and you can 'pld load' directly the soc design

14:44 <lekernel> and run the test program

14:44 <wpwrak> lekernel: does the FPGA take its master clock from the video codec ? or is there some other crystal ?

14:45 <wpwrak> ah, Y2 .. so there must be a Y1 ...

14:45 <lekernel> for configuration, it uses an internal oscillator

14:46 <wpwrak> (found Y1, it's audio)

14:46 <wolfspraul> pld load bitstream is interesting, we should have a little script for that too

14:46 <wolfspraul> just in case

14:46 <wolfspraul> but most likely the test program would then fail accessing the nor, no?

14:46 <wpwrak> (internal) okay, so no risk of weird clock due to an unconfigured oscillator

14:46 <wolfspraul> definitely something to try though

14:47 <wpwrak> maybe crosstalk, reflections, ...

14:47 <lekernel> why would they happen suddenly?

14:48 <lekernel> also the jtag interface has a very slow clock

14:48 <wpwrak> maybe it happens all of the time, just barely below the threshold

14:49 <wpwrak> (jtag) so fjmem.bit is clocked by jtag, not the internal osc ?

14:50 <wolfspraul> xiangfu: can you make a script that uses urjtag to pld load the soc and then runs the test program, all without accessing the nor chip?

14:52 <lekernel> apparently there's some clock in fjmem too

14:52 <lekernel> https://github.com/milkymist/fjmem-m1/blob/master/boards/milkymist-one/rtl/system.v

14:53 <lekernel> but failure of the clock would not impede configuration

14:53 <lekernel> the dim LEDs would go off at power up even if the 50MHz oscillator has failed

14:56 <wpwrak> ah, by mwalle. just when he left for vacation !

14:56 <lekernel> i don't see fjmem related to the board failure problem

14:57 <wpwrak> just a place where one could put diagnostic things. after all, it's one of the areas where we do experience the problem

14:57 <wpwrak> let me give the NOR data sheet a careful read ...

14:58 <lekernel> the whole thing looks as if the NOR chip stops responding now

14:58 <lekernel> maybe its reset pin is held active by a crappy reset circuit?

15:01 <wpwrak> that's one possibility

15:01 <wpwrak> what's surprising then is that rc2 would suffer too. but of course, if we're seeing two different bugs, that would explain it

15:01 <wpwrak> e.g., rc2 corrupts NOR due to absent reset circuit, while rc3 fails to read NOR due to present but faulty reset circuit

15:02 <wpwrak> read or write, maybe

15:02 <lekernel> I have not had or heard of a non-reflashable rc2 board ...

15:02 <lekernel> wolfspraul, do you have rc2 boards that can't be reflashed?

15:02 <lekernel> with a fully non responsive flash chip?

15:03 <wolfspraul> could be, wait (checking wiki)

15:03 <wpwrak> lekernel: do we know the chip is unresponsive ? or does flashing just give up on the first offense ?

15:03 <lekernel> it seems urjtag doesn't even detect the flash chip with CFI. of course this has to be confirmed.

15:05 <wolfspraul> we have to ask Adam tomorrow, there are several marked 'hold' http://en.qi-hardware.com/wiki/Milkymist_One_RC2_Test_Plan#Report_of_Milkymist_One_RC2_Board

15:06 <wolfspraul> 0x1A, 0x2C

15:06 <wolfspraul> but the problem with rc2 is that it's a smaller run

15:06 <wolfspraul> and most boards are sent out, with the tougher rc3 checks more might have shown problems

15:07 <wolfspraul> I don't think Adam has any single functioning rc2 board left

15:07 <wolfspraul> but there are 3-4 as 'hold', so they must have some issues

15:07 <wpwrak> there's always an internal pull-up on STS ? (replacing DNP R61)

15:07 <wolfspraul> but I would hesitate to put any one of them into the same rc3 'flash problem' category we can see quite clearly now

15:07 <wolfspraul> that's why I'm so eager to focus on this _group_ we can see clearly on rc3 now

15:08 <wolfspraul> so we don't get lost in some rare exotic cases

15:08 <lekernel> I'm not sure STS is used anywhere

15:08 <wpwrak> okay, let's ignore rc2 then. this means that we still have 0 confirmed NOR corruptions in rc3.

15:08 <lekernel> https://github.com/milkymist/fjmem-m1/blob/master/boards/milkymist-one/synthesis/system.ucf#L56

15:08 <lekernel> fjmem has it

15:09 <wolfspraul> wpwrak: yes, possible! [rc2 corrupts nor but rc3 has a different problem]

15:10 <wolfspraul> there is also still the chance that the 4.4v reset ic will do the magic

15:10 <wolfspraul> although verifying that will be hard

15:16 <wolfspraul> Sebastien said "urjtag doesn't even detect the flash chip with CFI"

15:16 <wolfspraul> he was probably looking at some log, but which one?

15:16 <wolfspraul> and what does that mean? the detection is probably also a longer sequence of signals going back and forth

15:16 <wolfspraul> could still be on the fpga or nor side, or in between :-)

15:21 <xiangfu> there is 'debug all' in jtag. which will output a loooot of message :D

15:22 <wolfspraul> add it as a commented-out line to all scripts

15:22 <wolfspraul> that's a good start :-)

15:24 <xiangfu> yes.

15:25 <kristianpaul> hum, a common thing is also the clock that feeds the fpga main clock (clk50), but i guess this is unlikely to be a problem..

15:28 <kristianpaul> yeah, pld load will confirm at least fpga is okay, if you can get full soc to load, afaik bios still need to be read from NOR

15:34 <xiangfu> kristianpaul, yes. I will try to load the full soc tomorrow. I may need to build a loadable soc bitstream tomorrow.

15:34 <Fallenou> hum I have trouble compiling rtems, it fails in the zlib part

15:35 <Fallenou> http://pastebin.com/DKddf0SZ

15:35 <kristianpaul> zlib magically solves by recompiling it again i think

15:35 <kristianpaul> compile and install

15:35 <kristianpaul> well that was time agoooo

15:35 <Fallenou> I guess now flickernoise is using rtems' zlib, since it's no longer in the requirements on the wiki

15:36 <Fallenou> too bad rtems zlib does not compile :o

15:36 <Fallenou> at least on my mac os

15:38 <wpwrak> maybe it's just an analog domain problem on the signals we can't tell by looking at the schematics. or maybe it's a chain of events that sets off the trouble.

15:39 <wpwrak> anyway, next step: try to boot 0x39. while it does reconfigure, reboot. when it fails to reconfigure, try to read back the NOR. that should clarify the NOR corruption theory.

15:40 <wpwrak> (at least a little :)
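
The read-back step proposed above could be checked with a comparison along these lines. This is only a sketch: it assumes the NOR contents have already been dumped to bytes by some other tool, and the function name `diff_regions`, the 256-byte block size, and the sample data are all hypothetical.

```python
def diff_regions(reference: bytes, readback: bytes, block_size: int = 256):
    """Return (offset, mismatching_byte_count) for every block that differs.

    An empty result means the read-back image matches the reference, which
    would argue against the NOR corruption theory for that board.
    """
    regions = []
    length = max(len(reference), len(readback))
    for start in range(0, length, block_size):
        ref = reference[start:start + block_size]
        got = readback[start:start + block_size]
        if ref != got:
            # Count differing bytes, plus any bytes missing from a short image.
            mismatches = sum(1 for a, b in zip(ref, got) if a != b)
            mismatches += abs(len(ref) - len(got))
            regions.append((start, mismatches))
    return regions


# Illustrative data only: a 1 KiB "known good" image and a damaged copy.
reference_image = bytes(range(256)) * 4
damaged = bytearray(reference_image)
damaged[300] ^= 0x01
damaged[301] ^= 0xFF
damaged[900] ^= 0x10

print(diff_regions(reference_image, reference_image))  # []
print(diff_regions(reference_image, bytes(damaged)))   # [(256, 2), (768, 1)]
```

Reporting per-block offsets rather than a single pass/fail makes it easier to see whether damage clusters in one region (for example, the standby bitstream) or is spread across the chip.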

16:41 <lekernel> Fallenou, (zlib issue) please post that to the RTEMS mailing list; I have it, JP Bonn has it, and my friend Ralf is denying any problem exists

16:41 <kristianpaul> (denying) that used to happen :-)

16:51 <Fallenou> lekernel: ahah ok

16:52 <Fallenou> lekernel: you opened a PR ?

16:52 <lekernel> no I posted on the ML and all I got was stupid replies from Ralf

16:53 <Fallenou> ok will post and then open a PR

16:53 <Fallenou> lekernel: how do you workaround it ?

16:53 <lekernel> typedef long z_off64_t in zconf.h

16:53 <Fallenou> ok

17:00 <Fallenou> ok found your e-mail

17:20 <Fallenou> bootstrap is damn slow

17:20 <Fallenou> is testing with their cvs head

17:30 <Fallenou> lekernel: it does work with their lm32_evr cvs head : http://pastebin.com/5JHfDyLj

17:30 <Fallenou> strange

17:30 <Fallenou> I did the same steps as in your last e-mail about this problem

17:30 <Fallenou> it built all lm32_evr bsp without any error

17:31 <Fallenou> will try milkymist bsp from their cvs head

17:35 <lekernel> lm32_evr didn't work for me either

17:36 <Fallenou> for me it did work

17:36 <Fallenou> gotta go bbl

20:01 <Thihi> http://kukka.siilo.fi/~kuutio/11-08-13-kissastuskausi.mkv - you guys might be interested in this. A small sample of what I do with a projector and a camera. Music has been ripped off from Boards of Canada.

22:08 <wpwrak> wonders if the NOR problem could still be INIT_B -> FLASH_RESET contamination

22:09 <wpwrak> e.g., if "fix2" has a design flaw or if it frequently gets implemented in the wrong way

23:13 <wpwrak> one test could be to remove D16 (FLASH_RESET_N to reset out). this should then remove any contamination, but may bring back the NOR corruption.

23:13 <wpwrak> oh, and an alternative to using logic gates instead of the diodes in rc4 could be to have a second reset chip, dedicated to FLASH_RESET_N.

23:21 <wpwrak> that could also be used to test whether properly separating FLASH_RESET_N from PROGRAM_B_2 and INIT_B would solve all the NOR problems. i.e., remove D16, add a reset chip in parallel to the existing one, and let it drive exclusively FLASH_RESET_N