<aw_>
if i want to dump from flash, i use "readmem ADR LEN FILENAME". my question is: for the "No boot medium found" problem i ran into, what ADR do i need to fill in?
<aw_>
"readmem 0 ??? filename"
<kristianpaul>
wait, i'll give you the right instructions
<aw_>
how much memory length should I copy?
<kristianpaul>
whole nand? :-)
<aw_>
yeah, but i don't think that i need to dump all. shall i?
<kristianpaul>
i think, you should
<aw_>
also I don't know where the current images are located.
<aw_>
when you have time, could you help look for secrets that may be hidden in the two dumps?
<lekernel>
you dumped at 0x02920000 ?
<aw_>
readmem 0 0x02920000
<lekernel>
that's the filesystem
<aw_>
LEN = 0x02920000
<lekernel>
ah, 0!
<aw_>
yeah.. my background color is different from the normal one.
<lekernel>
so you dumped the complete flash, right?
<aw_>
yeah..from 0x0
<aw_>
i should say from 0 to 0x02920000
<aw_>
did i get the steps wrong?
<lekernel>
obviously yes: the two files do not have the same size
<aw_>
yeah.... 0x14's dump is a huge 24.3MB, 0x2c's is only 5.4MB
<lekernel>
just dump the complete thing (32M)
<lekernel>
btw you can use vbindiff to directly compare the binary data
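A minimal sketch of the checks discussed above, assuming both dumps were saved on the host as raw binary files and that a complete dump covers the whole 32 MiB flash (LEN = 0x2000000 bytes). `cmp -l` is a non-interactive alternative to vbindiff: it prints one line per differing byte, with a 1-based decimal offset and the two byte values in octal. The function names are illustrative, not part of any Milkymist tool.

```bash
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names) for comparing two raw flash dumps.

# A complete dump of the 32 MiB flash: 0x2000000 = 33554432 bytes.
FLASH_SIZE=$((0x2000000))

# Fail, and say why, if a dump does not cover the whole flash.
check_dump_size() {
  local size
  size=$(( $(wc -c < "$1") ))
  if [ "$size" -ne "$FLASH_SIZE" ]; then
    echo "$1: $size bytes, expected $FLASH_SIZE" >&2
    return 1
  fi
}

# Number of bytes that differ between two dumps of equal size.
count_diff_bytes() {
  echo $(( $(cmp -l "$1" "$2" | wc -l) ))
}

# First N differences (default 20): zero-based hex offset, then the
# differing byte values (octal, as printed by cmp) in file 1 -> file 2.
list_diffs() {
  cmp -l "$1" "$2" | head -n "${3:-20}" |
    awk '{ printf "0x%08x: %s -> %s\n", $1 - 1, $2, $3 }'
}
```

For example, `check_dump_size 0x14.bin && check_dump_size 0x2c.bin && list_diffs 0x14.bin 0x2c.bin` (file names hypothetical) shows whether the differences are confined to one region, such as the standby bitstream.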
<aw_>
hm.. trying now..
<aw_>
sorry, i misunderstood, vbindiff? I didn't know this.
<aw_>
hmm...good i just googled.
<aw_>
I won't replace 0x2c's chip right away unless i have hard evidence that it has totally failed. So you think it has already failed?
<kristianpaul>
ah, vbindiff, nice :-)
<lekernel>
no, it seems fine
<lekernel>
don't replace anything
<aw_>
lekernel, seems fine? hmm... that's good. how do i verify that it's fine? is there any obvious sign? can you teach me? if it's hard, it doesn't matter. :-)
<kristianpaul>
yeah
<lekernel>
well, if detectflash finds it, and if the data is not complete garbage, the flash chip is probably OK
<lekernel>
have you read the complete 32MB already?
<aw_>
with 0x2000000(LEN)?
<kristianpaul>
aw_: why did you dump from a different ADR in the two samples?
<aw_>
different ADR? No, i used the same "readmem 0 0x2920000" on both boards (0x2c, 0x14).
<kristianpaul>
ah, but the dump size is dramatically different between the two boards, why?
<aw_>
yes, that totally confused me!
<aw_>
i don't know.
<kristianpaul>
okay let me see here
<aw_>
that's why i thought the flash chip might have already failed.
<scrts`>
hi
<kristianpaul>
aw_: the smaller one is a "bad" board?
<aw_>
kristianpaul, exactly
<scrts`>
hm, who made milkymist pcb? lekernel?
<kristianpaul>
scrts`: hu
<kristianpaul>
hi*
<aw_>
scrts`, hi, we at sharism.cc made it.
<scrts`>
I can't find how to add fiducials in altium :\
<scrts`>
maybe someone knows? :)
<aw_>
scrts`, do you mean adding optical fiducials on the board edge as references for automated SMT placement?
<scrts`>
yes
<aw_>
hmm... we let the pcb maker add those fiducials.
<scrts`>
hmm... :\
<aw_>
for those fiducials, you should first ask your SMT manufacturer what criteria their SMT machine carrier/conveyor can handle, then add them to your own design, or just let the pcb maker help you.
<aw_>
pcb maker will serve you. :-)
<aw_>
and build a suitable panel gerber for you. :)
<scrts`>
nah, he told me to do the fiducials
<scrts`>
since the company has a machine, the fiducials can be placed as I want
<scrts`>
I can't find the fiducial placement function
<scrts`>
it could be a simple pad, but there has to be certain layers/masks..
<kristianpaul>
aw_: the 0x14 board is the one with "No boot medium found", isn't it?
<aw_>
No, the 0x2c board is the one with "No boot medium found".
<kristianpaul>
hehe
<kristianpaul>
ah yes sorry
<kristianpaul>
aw_: does the 0x2c board have a boot splash screen which says rescue at the bottom, inside a red square?
<aw_>
kristianpaul, no, "rescue" didn't appear on the screen. :(
<kristianpaul>
ok ignore that question
<kristianpaul>
anyway, it doesn't make sense for the 0x2c dump to be smaller if it is the good one
<aw_>
kristianpaul, maybe i just reflash 0x2c & 0x14 then compare them again?
<kristianpaul>
aw_: :-)
<aw_>
kristianpaul, does your smile mean ok? or not? i know it's a bad idea.
<aw_>
but why not? :-)
<kristianpaul>
aw_: i think you should read 0x2c again
<kristianpaul>
The only diff i can see now is some bytes in the standby bitstream
<kristianpaul>
I dunno if this is a serial-like number included in the bitstream, maybe lekernel can add something?
<aw_>
scrts`, i don't know how to add them, sorry.
<kristianpaul>
aw_: btw, what do you think is the goal of dumping the nand from the two boards and comparing them?
<lekernel>
aw_: what is the current status of 0x2c and 0x14 boards?
<lekernel>
do you have one which no longer boots at all?
<lekernel>
or did it "revive"?
<aw_>
since i can't make sense of either dump myself, maybe someone here can.
<aw_>
yeah... 0x2c has revived now.
<lekernel>
urgh...
<lekernel>
so you never have the "No boot medium found" error again?
<aw_>
the day before yesterday, it could not boot at all.
<lekernel>
ok, and now it always boots?
<lekernel>
and what is the problem on the 0x14 board?
<aw_>
no... after applying Sch. 3 it could boot and even showed Flickernoise.... then after 9~10 fast power cycles... it showed "No boot medium found"..
<aw_>
and under ">BIOS"..
<lekernel>
what is the problem on the 0x14 board?
<kristianpaul>
lekernel: does that flashing process with impact include some filesystem with a wallpaper and patches on it?
<aw_>
then i kept fast power cycling... and finally it could not boot anymore.
<aw_>
the 0x14 board has the RC2 schematic (Sch. 1)
<lekernel>
kristianpaul: no we don't flash a filesystem atm
<aw_>
and I booted it to dump the flash for comparison.
<lekernel>
we should, but as I highlighted yesterday making the images is rather messy
<lekernel>
ok
<lekernel>
and, on the 0x2c board, have you experienced the "no configuration" bug?
<aw_>
so i'm asking here whether you can see any obvious error on either board.
<lekernel>
imo the "no boot medium" error has nothing to do with the flash
<aw_>
lekernel, yes, the 0x2c has experienced the unbooted condition.
<lekernel>
have you experienced "no configuration" (ie LEDs weakly lit and no serial output) on 0x2c?
<lekernel>
there are many unbooted conditions :)
<aw_>
yes. 0x2c has surely had weakly lit LEDs before
<lekernel>
even with Sch. 3?
<aw_>
also no serial output
<aw_>
on 0x2c it has happened with all of Sch. 1, 2 and 3.
<lekernel>
so, even when you applied the "sch. 3" modification, you still have the intermittent no configuration problem
<lekernel>
whose symptom is LEDs weakly lit after power up and no boot at all (no serial output)?
<lekernel>
right now we are tracking down the _configuration_ problems, so anything that can yield a serial output is a PASS
<aw_>
sometimes no SPLASH screen is shown when there's no configuration, but not every time.
<lekernel>
when there's no configuration, there is NEVER a splash screen...
<Fallenou>
lekernel: ubuntu may soon have a replacement for the X Window System :) it's called Wayland
<Fallenou>
it's in 11.04 repositories
<lekernel>
aw_: the reset IC is ONLY meant to track down "no configuration" issues
<aw_>
lekernel, hm.. i should be more precise, sorry
<lekernel>
anything else, like that "no boot medium found" error, is irrelevant
<aw_>
surely the reset IC is useful
<lekernel>
so, after you have put that reset IC and the diode connected as in Sch. 3, the FPGA always configures reliably?
<aw_>
when there's no splash shown, D2/D3 are either weakly lit or fully ON. that confused me so much!
<lekernel>
weakly lit is bad
<lekernel>
but the configuration problem happens even before you boot
<lekernel>
just apply power, and see if the LEDs are weakly lit or not
<aw_>
No, for the first 9~10 times with Sch. 3 it worked well, even with the logo/splash shown...
<lekernel>
ok, we don't care about high level things like logo/splash for now
<aw_>
for the first 9~10 times, D2/D3 behaved well.
<lekernel>
we just care about whether the fpga always correctly loads its STANDBY bitstream (ie that waits for the middle pushbutton press) after power is applied or not
<lekernel>
which is what the reset IC is meant to fix
<aw_>
lekernel, one second.. connecting 0x2c again.
<lekernel>
you can tell if it works or not by power cycling and checking that the FPGA is immediately configured or not
<lekernel>
checking for a configured FPGA can be done by checking that the LEDs are off (not weakly lit) and/or that the DONE pin is high
<lekernel>
do not press any pushbutton
<lekernel>
just apply power and check the FPGA configures itself
<kristianpaul>
Fallenou: overflow !!
<Fallenou>
=(
<Fallenou>
damn it
<kristianpaul>
yeap
<Fallenou>
anyway the "memory allocated on stack" is indeed a bug, but it shouldn't trigger the overflow
<Fallenou>
so I had little hope to fix the overflow with that
<aw_>
ok... now it configures well and the DONE pin is finally pulled high.
<Fallenou>
the only thing it can fix is that we have less chances to send garbage now :)
<lekernel>
aw_: does it do that all the time and very reliably or not?
<Fallenou>
thank you for the test kristianpaul; now, when the overflow does not happen, it should at least be more reliable (fewer garbage packets sent)
<lekernel>
aw_: may I remind you that the problem we had before, which the reset IC was meant to fix, was that in 1% of the power-ups the FPGA did not load the standby bitstream from flash
<lekernel>
if we can power cycle the board, say, 1000 times, and the standby bitstream load always works, we can conclude the reset IC fixed it
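A back-of-the-envelope check on why a long clean run is convincing, assuming each power-up would still fail independently at the old rate p = 1% if the fix did nothing (independence is an idealization for real hardware):

```latex
P(\text{$n$ clean power-ups in a row}) = (1-p)^{n} \approx e^{-np}, \qquad p = 0.01

0.99^{300} \approx 0.049, \qquad 0.99^{500} \approx 0.0066, \qquad 0.99^{1000} \approx 4.3 \times 10^{-5}
```

So with the fault still present, even a few hundred consecutive clean starts would be unlikely, and 1000 would be very unlikely.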
<lekernel>
that's the first thing to try...
<kristianpaul>
Fallenou: (send garbage) oh, let me test that too
<aw_>
lekernel, the condition was tested with normal power-on. if you mean testing 1000 times with just a 200ms delay (not fast power cycling), i agree with this approach to confirm the reset ic is really needed.
<Fallenou>
kristianpaul: yes, with the dma on the stack, the chances were high to send garbage, since the stack gets changed as soon as the send() function returns
<Fallenou>
kristianpaul: so now it does not depend on the stack state anymore
<kristianpaul>
Fallenou: i don't remember, you did test ttcp from rtems to host, didn't you?
<Fallenou>
humm yes I think so
<Fallenou>
I tried both
<kristianpaul>
still not working here :-/
<Fallenou>
:/
<Fallenou>
I will try again but I am pretty sure I did
<lekernel>
aw_: so, have you successfully loaded the standby bitstream from flash in 1000 successive power cycle attempts?
<Fallenou>
kristianpaul: btw yesterday I tried sending *and* receiving small files (like 400 kB) via FTP
<lekernel>
with the reset ic connected?
<Fallenou>
kristianpaul: with md5sum correct in the end
<aw_>
lekernel, not yet... starting to count now. (:
<Fallenou>
kristianpaul: I first send a file to rtems, then receive it back and test its integrity, it worked, but I must say with a 9 MB file I had a failure on the md5sum
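A host-side sketch of the round-trip integrity check described above (send a file to the board, fetch it back, compare MD5 sums). It assumes GNU `md5sum` on the host and uses curl for the FTP transfers; `BOARD` and `DIR` are placeholders, and credentials (curl's `-u user:password`) are omitted.

```bash
#!/usr/bin/env bash
# Host-side helpers for an FTP round-trip integrity test (sketch).

# Create FILE containing SIZE_MIB mebibytes of random data.
make_test_file() {
  head -c $(( $2 * 1048576 )) /dev/urandom > "$1"
}

# Succeed only if both files have the same MD5 sum (GNU md5sum).
same_md5() {
  local a b
  a=$(md5sum < "$1") || return 2
  b=$(md5sum < "$2") || return 2
  [ "$a" = "$b" ]
}

# Example round trip; BOARD and DIR are placeholders for the board's
# address and a writable directory on it:
#   make_test_file test.bin 9
#   curl -T test.bin "ftp://$BOARD/$DIR/"
#   curl -o back.bin "ftp://$BOARD/$DIR/test.bin"
#   same_md5 test.bin back.bin && echo "OK" || echo "md5 mismatch"
```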
<Fallenou>
kristianpaul: Will do more tests with big files
<kristianpaul>
Fallenou: oh, even on qemu...
<Fallenou>
yes
<aw_>
lekernel, sure with Sch. 3 on 0x2c board.
<Fallenou>
Will re-do the test to confirm
<Fallenou>
maybe it was a problem related to the filesystem
<lekernel>
aw_: if yes, then we always put the reset IC and we're done with that
<Fallenou>
since I used the memory card
<lekernel>
great, one less problem :)
<aw_>
lekernel, don't know yet..stay tuned..
<Fallenou>
how are you going to do 1000 power cycles and test DONE pin state ? :o
<Fallenou>
will take hours
<Fallenou>
is not familiar with tests in PCB factories
<kristianpaul>
i hope he has some hardware to automate that ;-)
<kristianpaul>
wow 1000, i thought china's general rule was 50 ;-)
<aw_>
kristianpaul, i agree with 1000, why not?
<kristianpaul>
aw_: sure go ahead ! :-)
<kristianpaul>
will be fun
<aw_>
if not, we try other..(:
<lekernel>
Fallenou: if it weren't hard, it wouldn't be called hardware :p
<aw_>
lekernel, hey.. wait. you said it loads the standby bitstream from flash?
<lekernel>
and maybe not 1000, perhaps say 300-400 would suffice ;o)
<kristianpaul>
500 ! ;-)
<lekernel>
aw_: yes, the FPGA loads a standby bitstream from the flash immediately after power up
<kristianpaul>
so less bugs will appear
<lekernel>
it is this bitstream that waits for the middle pushbutton to be pressed and triggers the boot process
<aw_>
are you saying i should just watch that the splash displays fine every time?
<aw_>
hmm.. got it, that's what i always do!
<lekernel>
no, NO SPLASH
<lekernel>
the splash screen is displayed WAY AFTER the standby bitstream...
<aw_>
i did press the middle pushbutton
<Fallenou>
just check the DONE pin
<aw_>
i also check the DONE pin on the scope!
<Fallenou>
oh ok
<lekernel>
when you see the splash screen, not only the standby bitstream was loaded, but also the milkymist soc bitstream, which ran the BIOS from flash, which carried out DRAM memory tests, enabled the video controller, and finally loaded the splash screen from a flash partition
<lekernel>
lots of stuff
<lekernel>
right now we test something very simple, the standby bitstream loading
<aw_>
ok
<Fallenou>
unit testing in hardware :)
<lekernel>
that's what we had problems with, not the rest
<lekernel>
do not touch the pushbuttons. if you are thinking of a test for the reset ic that involves pressing the pushbutton, it is wrong
<lekernel>
you can either check the DONE pin or the LEDs
<aw_>
hmm.. just watch either the DONE pin or the LEDs, i see.
<aw_>
21 times..now
<lekernel>
"LEDs weakly lit" is equivalent to "DONE pin low"
<aw_>
yes.... i noticed that. :-)
<aw_>
lekernel, just a reminder that i'm powering on normally, not fast power cycling.
<aw_>
man! if i had a power relay i could switch programmatically, that would be wonderful.
<lekernel>
yeah, that's definitely something very useful for this kind of test
<lekernel>
along with automonitoring of the done pin ...
<lekernel>
it'd make an interesting project: scriptable relay and measurement system
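A minimal sketch of such a scriptable relay and measurement setup, following the procedure described above: apply power, press no pushbutton, and check that the FPGA configures (DONE high, LEDs not weakly lit). The hooks `power_off`, `power_on` and `read_done_pin` are hypothetical placeholders for whatever relay card and probe are actually used, and both delays are assumptions: power stays off long enough to avoid fast power cycling, and the LEDs are expected to be weakly lit for less than a second while the standby bitstream loads.

```bash
#!/usr/bin/env bash
# Automated power-up test for the standby bitstream (sketch).
# The three hooks are HYPOTHETICAL placeholders: replace their bodies
# with the commands for your own relay card and DONE-pin probe.
power_off()     { echo "power_off: not implemented" >&2; return 1; }
power_on()      { echo "power_on: not implemented" >&2; return 1; }
read_done_pin() { echo "read_done_pin: not implemented" >&2; return 1; }  # print 1 = high, 0 = low

OFF_DELAY=${OFF_DELAY:-1}        # seconds without power (no fast power cycling)
SETTLE_DELAY=${SETTLE_DELAY:-1}  # time allowed for the standby bitstream to load

# Power-cycle the board N times without touching any pushbutton and print
# how many cycles left DONE low ("no configuration"). Returns 2 if a relay
# hook fails, so a broken setup is never reported as a pass.
power_cycle_test() {
  local n=$1 i=1 failures=0
  while [ "$i" -le "$n" ]; do
    power_off || return 2
    sleep "$OFF_DELAY"
    power_on || return 2
    sleep "$SETTLE_DELAY"
    if [ "$(read_done_pin)" != 1 ]; then
      failures=$((failures + 1))
      echo "cycle $i: DONE low (no configuration)" >&2
    fi
    i=$((i + 1))
  done
  echo "$failures"
}
```

Once the hooks are filled in, `power_cycle_test 1000` prints the number of failed power-ups; 0 means the FPGA configured every time.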
<aw_>
btw, do you think you could help later with some simple main C code, while I first look for a relay card on the web?
<wolfspraul>
Fallenou: in hardware and especially manufacturing nobody is afraid of 1000 times anything
<wolfspraul>
how about 100k whatever step, or 1kk, or more? :-)
<wolfspraul>
1000 times, whatever it is, is almost zero, right Adam? :-)
<wolfspraul>
(just kidding, but partially it's true. 1000 times some test - no big deal) in a factory you easily have a lot of workers, so after you did the first 50 yourself, you introduce some kid to the procedure, and go out for lunch :-)
<wolfspraul>
I've personally reflashed 2000 Ben NanoNote by now, although on the second thousand I cheated by offloading some steps to nice and helpful worker hands...
<aw_>
wolfspraul, wait.. if this passes over 500 times, we can surely be confident about the reset ic idea. i agree, since it's a stupid test but an extreme one.
<wolfspraul>
that was a lot more relaxing, I could drink a coffee and watch them flash
<wolfspraul>
and yes, you can automate everything, but that costs time and money again so the question is whether the 1k becomes 10k or not, or when, and then you decide whether it's worth to automate it or not
<kristianpaul>
Fallenou: i can't confirm this right now, but i think transferring (sending) data from mm1 is now faster and works with ~3Mb files so far
<aw_>
lekernel, btw, could you also think about what steps we should take if it passes...
<lekernel>
if it's passed, we put the reset IC, period!
<aw_>
because I'm sure that I ran into the unbooted status when fast power cycling..
<lekernel>
the only unbooted status that matters here is "no configuration"
<lekernel>
all other bugs are irrelevant to this case
<wolfspraul>
aw_: was that with weakly lit leds?
<wolfspraul>
if I understand lekernel correctly, the best criterion for testing the reset IC now is to watch for weakly lit leds
<lekernel>
to watch for weakly lit leds *immediately after power up*
<wolfspraul>
if you don't see that, it's not the problem we are trying to fix
<lekernel>
after power up, actually the LEDs should be weakly lit for less than a second (while the fpga reads the standby bitstream from flash) and then go completely off
<aw_>
wolfspraul, right, like lekernel just said. the leds (D2/D3) come on immediately after power up.
<wolfspraul>
lekernel: can they go back to weakly lit later?
<lekernel>
yeah, if the standby bitstream fails to reload the final bitstream for example
<lekernel>
when you push the middle button
<aw_>
sorry, to be precise: "weakly" lit for 200ms
<lekernel>
but that would be a different bug
<aw_>
then off...this is OK.
<wolfspraul>
ok got it
<lekernel>
aw_: yes, that's perfect, and the behaviour that the boards with the reset IC should always have
<aw_>
it must behave like that, because the configuration datasheet says that PROGRAM_B must be held low until the flash has powered up and is ready!!!
<aw_>
we all just missed this important info during design. :-)
<lekernel>
actually, the fpga only reads the flash after some 5 ms after power up
<aw_>
it surely didn't say to use a reset ic.
<lekernel>
while the flash is ready in a few hundred microseconds
<lekernel>
so it was supposed to work
<aw_>
but the reset ic itself surely has this behaviour.
<aw_>
yeah.. the flash chip also recommends P3 (300us).. well... we learnt this important stuff this time too.
<zumbi>
lekernel: how did you get spartan6 chips? are those available to buy?
<aw_>
130 times
<lekernel>
zumbi: yeah, iirc you can find them at digikey
<lekernel>
otherwise just go to some xilinx trade show and you'll find plenty of distributors that will sell you some (and ask lots of questions)
<aw_>
200 times
<lekernel>
aw_: looks good
<aw_>
300 times
<lekernel>
good
<aw_>
400 times, good :-)
<aw_>
i'm happy now, we passed 500 times. :-)
<kristianpaul>
checks deadairspace
<lekernel>
good, so this reset ic definitely fixed that startup problem we had
<wolfspraul>
lekernel: do we want to increment the hardware revision counter for rc3?
<lekernel>
yeah, let's increment it all the time
<lekernel>
even for the slightest pcb change
<wolfspraul>
ok
<aw_>
600 times
<kristianpaul>
oops "452 Error writing file."
<aw_>
700 times, good.
<aw_>
kristianpaul, what's that?
<kristianpaul>
Fallenou: wow, 500Kbytes/sec sending data from MM1, Thanks !!! i can start some work now that at least i can transfer data and faster
<kristianpaul>
aw_: ah, an error when transferring a ~9Mb file to rtems running on MM1 :-)
<Fallenou>
kristianpaul: oh it improved the speed ? really ?
<kristianpaul>
s/adn/an
<kristianpaul>
Fallenou: YEAH
<Fallenou>
oh oh :) nice ! but I don't understand why :p
<kristianpaul>
lol
<kristianpaul>
Actually before that i wasn't able to transmit big files (> 1Mb) from the mm1
<kristianpaul>
Fallenou: how can i create a big dummy file with the rtems shell? ie dd ... but no dd, so..
<kristianpaul>
ah yes dd is there !
<Fallenou>
:)
<Fallenou>
yes
<Fallenou>
but you can create the file on your computer
<Fallenou>
and send it via ftp
<Fallenou>
in /ramdisk
<kristianpaul>
no
<kristianpaul>
i mean yes, but i want a 20MB file :D
<kristianpaul>
that crashes the board..
<kristianpaul>
ergg rtems
<Fallenou>
ok
<kristianpaul>
what is /ramdisk for?
<Fallenou>
it's a ramdisk :)
<kristianpaul>
hmm
<Fallenou>
it's RAM, mounted as a disque
<Fallenou>
disk
<kristianpaul>
ah i see
<Fallenou>
it's faster and more reliable than a filesystem
<kristianpaul>
hm interesting
<kristianpaul>
oh
<Fallenou>
to do benchmarking of ethernet
<Fallenou>
you should use the ramdisk
<kristianpaul>
oh, yes
<kristianpaul>
i didn't know that
<kristianpaul>
thaks
<kristianpaul>
thanks*
<kristianpaul>
Fallenou: whats the /dev/zero equivalent in rtems?
<Fallenou>
don't know
<Fallenou>
maybe there isn't one
<Fallenou>
but you can easily code a driver that adds a /dev/zero
<Fallenou>
take the gpio code, remove all useless code
<Fallenou>
and put return 0 somewhere
<Fallenou>
and you got it
<aw_>
800 times
<Fallenou>
aw_: :)
<kristianpaul>
damn, rtems froze when trying an rm..
<Fallenou>
oh
<Fallenou>
where did you rm ?
<kristianpaul>
in /ramdisk
<kristianpaul>
bad?
<Fallenou>
don't know, shouldn't freeze :/
<Fallenou>
check in the code what is mounted in /ramdisk
<Fallenou>
I didn't actually check if it is really ram
<Fallenou>
but it should be
<kristianpaul>
heh, i filled it
<kristianpaul>
4328960 bytes and no more space left :p
<Fallenou>
oh ok :)
<Fallenou>
maybe that's the reason
<Fallenou>
you can maybe increase the amount of ram mounted in the source code
<kristianpaul>
yes
<kristianpaul>
4Mb is too small :/
<Fallenou>
it's the same in linux
<kristianpaul>
ah
<Fallenou>
default ramdisk size is 4 MB
<Fallenou>
but you can increase it
<kristianpaul>
sure
<Fallenou>
it's useful to test hard disk/sd card speed
<Fallenou>
or ssd
<kristianpaul>
yeah
<Fallenou>
instead of copying from a disk to another
<kristianpaul>
nice, i learned a new thing
<Fallenou>
you do a big ramdisk and then copy it to the disk you wanna test
<kristianpaul>
yeah, IO is a pain when benchmarking these things
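A rough sketch of the benchmarking trick described above: keep the source file in RAM so that only the disk under test limits the copy, then time the copy. It assumes GNU `date` (for the `%N` nanoseconds field); the paths are placeholders, and on most Linux distributions `/dev/shm` is a RAM-backed tmpfs that can stand in for a ramdisk.

```bash
#!/usr/bin/env bash
# Copy SRC (ideally on a ramdisk/tmpfs) to DST on the disk under test
# and print the throughput in KiB/s. Requires GNU date for %N.
copy_rate_kibs() {
  local bytes start end ns
  bytes=$(( $(wc -c < "$1") ))
  start=$(date +%s%N)          # nanoseconds since the epoch
  cp "$1" "$2" || return 1
  sync                         # include flushing the written data in the timing
  end=$(date +%s%N)
  ns=$(( end - start ))
  [ "$ns" -gt 0 ] || ns=1      # guard against a zero interval
  echo $(( bytes * 1000000000 / ns / 1024 ))
}
```

For example, `copy_rate_kibs /dev/shm/big.bin /media/sdcard/big.bin` (hypothetical paths). The integer arithmetic is fine for files up to a few gigabytes before `bytes * 1000000000` overflows.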
<aw_>
900 times
<kristianpaul>
:D
<aw_>
YES! 1000 times. MADE IT! Although it was done manually. :-) Call it the "manual power start-up reliability" test item. :-)
<wolfspraul>
lekernel: let's assume the diode/reset ic fixed the original bootup bug completely.
<wolfspraul>
and let's further assume that there are more bugs at some later stage in the boot process
<wolfspraul>
what is the likelihood that any of those further bugs require hardware modifications to be fixable?
<wolfspraul>
as opposed to them being fixed by modifications on the software side
<aw_>
I'll try to think about a fast power cycling test. ;-) Although I'm still studying python..
<wolfspraul>
calling it a day, I'll see the answer to my question later :-)
<wolfspraul>
good news now, seems one real bug is fully fixed
<lekernel>
I haven't noticed other hardware bugs so far
<aw_>
wolfspraul, the no-configuration problem at start-up is definitely fixed. ;-)
<wolfspraul>
lekernel: so those other things adam described there are all things that can be fixed in software?
<wolfspraul>
crc error, gray screen, no boot medium, no background picture, etc.
<wolfspraul>
he didn't just 'describe' them, he actually took pictures :-)
<lekernel>
those sound like DRAM issues
<wolfspraul>
fixable how? do you want Adam to do more testing, to try to reproduce something?
<lekernel>
but PCB-wise the DRAM is fine...I did a lot of testing last summer
<wolfspraul>
that reminds me that one time, on one board, only one of the DRAM tests in your test program failed
<wolfspraul>
but the others passed
<lekernel>
it could be solder issues, semiconductor process variation issues (software fixable) or I/O timing calibration bugs (also software)
<wolfspraul>
maybe it was hammer or crosstalk? need to check the records.
<lekernel>
yeah, we had that too
<lekernel>
this one is weird
<wolfspraul>
we saw this once, definitely (it's in the log file), but then it went away
<lekernel>
that and the video chips that mysteriously burnt themselves...
<lekernel>
on the reworked boards
<lekernel>
(rc1)
<wolfspraul>
board 0x24
<aw_>
no, i still have that dram problem (0x24)
<aw_>
rught
<aw_>
right
<wolfspraul>
crosstalk fails in about 1 of 10 attempts
<wolfspraul>
also tricky to keep this in mind to improve the testing software for the next run
<lekernel>
did the PCBs go through extensive e-test?
<wolfspraul>
we already searched around that 0x24 with x-ray for >30 minutes
<wolfspraul>
tons of pictures, nothing found
<wolfspraul>
(not to answer your e-test, just soldering issues and other x-ray identifiable problems)
<wolfspraul>
well once Adam settles the diode/reset ic fully, he will turn to some of those other boards and issues we found
<wolfspraul>
like the 0x24 hammer/crosstalk case
<wolfspraul>
or others - Adam has best overview
<lekernel>
otoh I have run tests involving several dozens of terabytes transferred on RC1 on random addresses, and it all passed to the slightest bit
<aw_>
lekernel, what do you mean by e-test? an impedance test on the dram traces?
<wolfspraul>
sure sure, but once you produce more and more units you find more and more irregularities. that's hardware.
<lekernel>
no, test that PCB traces do not touch one another and are not broken
<wolfspraul>
we have this 0x24 and it's on hold (won't be sold), and maybe we can learn something or introduce some fix, whatever fix, even if just to the testing software
<wolfspraul>
that would be immediately visible in x-ray, I would think
<lekernel>
dunno, could be subtle and in one small location
<wolfspraul>
anyway I'm jumping around.
<wolfspraul>
diode+reset ic first
<lekernel>
and there are hundreds of DRAM traces
<lekernel>
i'd say go for rc3
<aw_>
hmm.. rc1 & rc2 surely went through a test called open-short test during pcb production. i always ask them to do this on demand and pay for it. :-)
<wolfspraul>
yes but we had a big x-ray session with that particular board, trying to hunt down dram issues specifically.
<wolfspraul>
no, Adam will definitely try to dig into some of the other cases we found
<wolfspraul>
not indefinitely, but now that the bootup bug is fixed finally he will have a clear mind to look a little into those other things - they were immediately sidelined at the beginning
<aw_>
the open-short test is the one that probes both ends of the traces. ;-)
<wolfspraul>
I think that 0x24 is one of the more interesting open cases.
<wolfspraul>
maybe others too
<wolfspraul>
that's Adam's call
<wolfspraul>
the reason I'm worried (and happy) about 0x24 is that if a test only fails in 1 out of 10 runs, that means 90% of this type of bug will slip through unnoticed
<wolfspraul>
so we don't even know whether other boards, even ones already sold, might have the same problem. we certainly didn't run the memory tests 10 times on all 40 boards.
<wolfspraul>
cannot go into rc3 like that, imho
<lekernel>
I can modify the test program to run the crosstalk test 100 or more times
<lekernel>
it's easy and that test is fast
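The arithmetic behind the 1-in-10 worry, assuming each run of the crosstalk test independently catches the fault on an affected board with probability 0.1 (the observed rate):

```latex
P(\text{fault missed in all } k \text{ runs}) = 0.9^{k}: \qquad 0.9^{1} = 0.9, \qquad 0.9^{10} \approx 0.35, \qquad 0.9^{100} \approx 2.7 \times 10^{-5}

0.9^{k} \le 10^{-3} \iff k \ge \frac{\ln 10^{-3}}{\ln 0.9} \approx 65.6 \quad\Rightarrow\quad k \ge 66
```

So running the test 100 times per board would leave only a tiny chance of an affected board passing.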
<wolfspraul>
yes
<wolfspraul>
that's a good first step
<wolfspraul>
I think
<wolfspraul>
aw_ - up to you really, not me.
<lekernel>
btw, how do you test the nanonote sdram?
<wolfspraul>
don't know, those are proprietary tools and test stations that weren't 'freed' yet :-)
<wolfspraul>
not that there are big secrets there, but just nobody had the time yet to dig this up
<wolfspraul>
NAND testing is quite sophisticated though, they have special fixtures and software from Samsung
<aw_>
lekernel, I'll spend some time on 0x24; meanwhile, could you provide me a s/w that just tests the dram 1000 times? :-)
<lekernel>
ok
<aw_>
because I didn't replace the two drams on 0x24. At the beginning of rc2, I did too much rework.
<lekernel>
running git bisect for gcc... found bad/good commits for the lm32 breakage
<aw_>
this one may just be some easy production bug rather than a design bug, but i don't know yet. so thanks if there can be routine s/w to test the dram. :-)
<wolfspraul>
the notes of 0x24 say U14/U15 was resoldered, but crosstalk test still fails in 1 out of 10 times
<wolfspraul>
maybe we could also try to not just resolder, but replace U14/U15 with new dram chips? maybe it's a problem in one of the two chips?
<aw_>
wolfspraul, yes, i know, but the reworked board is in a very unknown condition. if I replace them, I know you would surely say it's probably a 'match' problem, potentially between the fpga and the dram.
<wolfspraul>
if the crosstalk test is fast, it's always a good idea that Sebastien lets it run 500 or 1000 times, whatever is still bearable (say a few seconds or so)
<aw_>
so first, once routine s/w (a reliable test) is provided, things will become clear.
<wolfspraul>
alright I'm out, late
<wolfspraul>
n8
<wolfspraul>
good news on the bootup bug - thanks a lot for the solid work!
<aw_>
thanks all here...eastern world is sleepy...;-)
<Fallenou>
what is this crosstalk bug ? (I don't understand what crosstalk means)
<lekernel>
Fallenou: crosstalk is when current or voltage in one signal induces an unwanted voltage in a nearby track because of high frequency effects
<lekernel>
one of the MM1 boards from the RC2 batch appears to have such a problem on the DDR data lines - or, at least, the test detects it as such
<lekernel>
it could also be DC "crosstalk" like a short circuit :)
<lekernel>
hmm... it seems Jon's patch fixed the dwarf issue, then we run into another one
<lekernel>
is it possible to tell git-bisect not to touch some files?
<lekernel>
maybe I can hack that into the git bisect run script... rewrite the files, run the test, restore the files :)
<lekernel>
what a mess...
<Fallenou>
so this is the patch git bisect points out lekernel ?
<Fallenou>
the one you pasted
<Fallenou>
s/patch/commit/
<lekernel>
dwarf issue that Jon fixed (confirming that atm)
<lekernel>
how can you test the return value of the last command in bash?
<lekernel>
ie if return value != 0 do something; exit
<lekernel>
yeah, Jon's patch fixed the dwarf related crash, then someone else broke LM32 again
<lekernel>
so I need git bisect to apply that patch everytime now
<larsc>
lekernel: $?
<larsc>
is last return value
<lekernel>
if $?; do
<lekernel>
echo prout;
<lekernel>
done
<lekernel>
this doesn't work
<larsc>
if [$? -ne 0]; then echo foo; fi
<larsc>
with extra space after the [ and before the ]
<lekernel>
phew, thanks :)
<lekernel>
I always spend 10 min figuring out that damn syntax every time I need a bash script
<larsc>
hehe
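For reference, a small sketch of the pattern being worked out here: `$?` holds the exit status of the most recent command, so it has to be saved right away, and `[` is itself a command, which is why it needs spaces around its arguments. The function name is illustrative.

```bash
#!/usr/bin/env bash
# Run a command; if it fails, report its exit status and pass it on.
run_checked() {
  "$@"
  local status=$?                # save $? before any other command overwrites it
  if [ "$status" -ne 0 ]; then   # spaces required: "[" is a command and "]" its last argument
    echo "'$*' failed with status $status" >&2
    return "$status"
  fi
}

# Equivalent idioms that never mention $? explicitly:
#   if ! make; then echo "build failed" >&2; exit 1; fi
#   make || { echo "build failed" >&2; exit 1; }
```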
<lekernel>
Bisecting: 2444 revisions left to test after this (roughly 11 steps)
<lekernel>
here we go again...
<lekernel>
we should set up a server that tests lm32 weekly... this would prevent the lm32-breaking commits from accumulating
<lekernel>
people like Ulrich Drepper probably love to break LM32, since this annoys "minorities" that don't help him in his "fight against Microsoft and Apple"
<larsc>
doing automated regular testing sounds like a good idea, but i guess you need good testcases then
<wpwrak>
lekernel: (testing $?) how about    command || exit    ?
<lekernel>
I have two commands
<lekernel>
command || exit can also be replaced with set -e
<wpwrak>
command1 && command2 || exit
<lekernel>
but I need to do something (in this case, revert the patch so git bisect doesn't stop) before exiting
<kristianpaul>
tee?
<wpwrak>
ah, command || { cleanup; exit; }
<lekernel>
ok
<kristianpaul>
ah
<lekernel>
maybe next time :)
<wpwrak>
or  set -e  combined  with trap 0 ...
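A sketch of the `git bisect run` helper discussed above: apply an out-of-tree fix (here, the DWARF patch) to each revision, run the real test, and always revert the fix so bisect can check out the next commit. It uses the explicit-cleanup style (`cmd || { cleanup; exit; }`) rather than `set -e` with a trap, and relies on the documented `git bisect run` exit-code convention: 0 means good, 125 means skip, 1 to 127 other than 125 mean bad, and any other code aborts the bisection. The patch path and test script are placeholders.

```bash
#!/usr/bin/env bash
# bisect_step PATCH CMD [ARGS...]
# Apply PATCH, run CMD, revert PATCH, and map CMD's status for git bisect run.
bisect_step() {
  local patch status
  patch=$1; shift
  # The fix does not apply to this revision: skip it rather than call it bad.
  git apply "$patch" || return 125
  "$@"
  status=$?
  # Revert whatever happened; if the tree cannot be restored, abort (>= 128).
  git apply -R "$patch" || return 128
  case $status in
    0)   return 0 ;;    # good
    125) return 125 ;;  # the test asked to skip this revision
    *)   return 1 ;;    # any failure, including crashes (status >= 128), is bad
  esac
}

# Example (placeholders), with this file saved as bisect-step.sh:
#   git bisect run sh -c '. ./bisect-step.sh && bisect_step "$@"' sh \
#       /path/to/dwarf-fix.patch ./build-and-test-lm32.sh
```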
<lekernel>
anyone ever tried xynth?
<lekernel>
looks interesting
<lekernel>
(the windowing system)
<larsc>
never heard of it before
<kristianpaul>
same here
<Fallenou>
and wayland ? :p
<lekernel>
wayland sounds like a big mess to get to work on rtems (or even uclinux)
<kristianpaul>
:-)
<wpwrak>
in the end, X will win. it always does ;-) it's kinda like street gangs fighting the highlander :)
<kristianpaul>
xynth is written in C, nice
<larsc>
wpwrak: even keithp says wayland is the future