<aw>
0x32: fix2b, stopped @ 'Bitstream length: 1484404' while reflashing...
<aw>
0x32: tp36 - 690mV, tp37 - 793mV
<aw>
the voltage is not at correct Low or High, i am going to power off
<aw>
voltage of tp36, tp37 is the same. if the first flash never succeeded, it seems to keep stopping at 'length: 1484404'...go for another board.
<wolfspraul>
hmm
<GitHub120>
[scripts] xiangfu pushed 1 new commit to master: http://bit.ly/q2vsz7
<GitHub120>
[scripts/master] add debug all to jtag - Xiangfu Liu
<xiangfu>
aw, Hi
<aw>
xiangfu, hi, any news?
<wolfspraul>
aw: let's look at one more board
<wolfspraul>
with fix2b
<wolfspraul>
although I think we already know it's not the magic solution yet
<wolfspraul>
if wpwrak is here we can look into 0x32 more, otherwise fix other bugs first, don't apply fix2b to a lot of boards until we know more
<aw>
0x39: D16 (in-circuit); forward voltage - 152mV, reverse voltage - 1545mV
<aw>
0x32: D16 (in-circuit); forward voltage - 153mV, reverse voltage - 1114mV
<aw>
so I am going to replace D16 with a new one first to see if this is the problem
<wolfspraul>
aw: I just noticed 0x32 is a board that never rendered before
<wolfspraul>
that could be a different problem...
<aw>
wolfspraul, yes
<aw>
it seems to be a different category of failure
<wolfspraul>
possible
<wolfspraul>
what's your latest results with 0x32 now?
<aw>
btw: a good new diode (off-circuit): forward voltage - 153mV, reverse voltage - no voltage can be measured
<wolfspraul>
you replace D16 on 0x32 with a new one?
<aw>
now 0x32: D16 (in-circuit) reverse voltage was 1114mV, but after I put a new diode on the board it reads 886mV. it must be the program_b loop that pulls the reverse voltage down a bit.
<wolfspraul>
does flashing work?
<aw>
also the replaced new D16, I tried to take apart and measure its reversing voltage is still good
<aw>
wolfspraul, i didn't do reflashing
<aw>
try again...if still can't reflashing ...leave it apart then
<wolfspraul>
ok
<wolfspraul>
let's look at 0x34 now
<aw>
maybe just another failure classification
<wolfspraul>
that one rendered before
<wolfspraul>
what is TP36/TP37 on 0x32 now?
<aw>
tp36 - 770mV, tp37 - 838mV , wrong
<wolfspraul>
hmm
<wolfspraul>
ok
<wolfspraul>
try 0x34
<aw>
yes, of course it can't reflashing and stop at 1484404
<aw>
right
<wolfspraul>
no need to test flashing with those tp36/tp37 values
<aw>
no
<aw>
i think this is good evidence. ;-)
<wolfspraul>
no
<wolfspraul>
makes no sense. I trust the tp36/tp37 values we measure.
<wolfspraul>
100%
<aw>
0x34: D16 (in circuit) forwarding - 154mV , rev. V - 1547 mV
<wolfspraul>
once we have hard data, let's use it
<wolfspraul>
well
<wolfspraul>
aw: one by one
<wolfspraul>
can we measure meaningful data in-circuit or not?
<wolfspraul>
if not, let's stop doing it
<aw>
wait wait
<wolfspraul>
if yes, those values mean the diode is damaged and needs to be replaced?
<aw>
let's test more boards and we see how reasonable For. & Rev. voltage they would be.
<wolfspraul>
wait
<wolfspraul>
I don't want all sorts of random data
<wolfspraul>
that's a bad time waste
<aw>
well
<wolfspraul>
aw: is the data meaningful?
<aw>
now 0x34 can reconfigure surely
<wolfspraul>
did you apply fix2b to 0x34 already?
<aw>
yes.
<aw>
you are like baby-watching though. ;-) no problems
<wolfspraul>
yes, sorry. try to understand the test data ;-)
<aw>
it must somewhere let in-circuit voltage gets low (without power on)
<GitHub15>
[scripts] xiangfu force-pushed master from dbd0372 to b9585d9: http://bit.ly/nGHAhd
<GitHub15>
[scripts/master] add debug all to jtag - Xiangfu Liu
<aw>
so later we consult with Werner, he may provide more details to us maybe. ;-)
<wolfspraul>
aw: you talk about measuring D16 performance in-circuit?
<aw>
yes
<aw>
For. & Rev. voltage measured before power-ed -on but in-circuit.
<wolfspraul>
ok
<wolfspraul>
alright, back to 0x34
<wolfspraul>
so it is booting now?
<wolfspraul>
I think you should reflash (reflash_m1.sh), and re-run all tests and rendering cycles (10)
<aw>
now 0x32 has worse Rev. voltage (below 1545mV), this means somewhere others influence D16's specification/behavior
<aw>
now to reflashing. ;-)
<wolfspraul>
see how it goes...
<wolfspraul>
aw: maybe the reset ic on 0x32 has a problem?
<wolfspraul>
(guessing)
<aw>
xiangfu, wow..man! your debug log msg is many..let's see...reflashing now...;-)
<xiangfu>
aw, to disable it, just remove the whole "debug all" line, just fyi
<wolfspraul>
I was worried about that. I hope your terminal history is enough. You may have to increase it so that we don't lose data.
<wolfspraul>
xiangfu: maybe it should be disabled by default
<wolfspraul>
we are currently (as of right now) not aware of any problem that 'debug all' may help us with
<wolfspraul>
so we can enable it when we run into such a problem
<aw>
wolfspraul, not enough to show history. ;-)
<wolfspraul>
yeah, well
<xiangfu>
this commit disable it by default: GitHub15> [scripts] xiangfu force-pushed master from dbd0372 to b9585d9: http://bit.ly/nGHAhd
<wolfspraul>
wpwrak_: I don't think a lot of diode problems
<aw>
so there's big FACTs now:
<wpwrak_>
(replace parts) in general, i would try to discard anything that got unsoldered (unless really really difficult to replace)
<wolfspraul>
correct, fully agree
<wpwrak_>
wolfspraul: i'm around "let's look at 0x34 now". just started catching up
<wolfspraul>
wpwrak_: basically we have a reference value for D16 now when measured in-circuit - ca. 150mV forward, 1545mV reverse
<wolfspraul>
when we see those numbers, we can assume D16 and C238 to be correct
<wpwrak_>
sounds reasonable. those 1.5 V are some obscure path, but that's the price of measuring in-circuit
<wolfspraul>
wpwrak_: ok, read top to bottom first...
<aw>
1. before I test these boards, first measure the in-circuit voltage of D16. if it is not right, something in that area must be wrong, typically C238 or the diode itself
<wolfspraul>
aw: let's try to fix 0x32 now
<wpwrak_>
ah, C238 acts up too ? interesting :) reading
<aw>
2. measure tp36 tp37 to confirm if 3.3V high enough
<aw>
good now is reflashing.....this won't stop at 1484404 there. ;-)
<aw>
now we have clear direction to fix these kinds of bugs. ;-)
<aw>
but bugs belongs to me Adam...;-)
<aw>
oah...man!
<aw>
after reflash 0x3A, let's back to 0x32. ;-)
<aw>
oah~ no. 0x3A is d2/d3 dimly lit after reflash. :(
<wolfspraul>
no problem
<wolfspraul>
actually that's good
<wolfspraul>
aw: measure TP36/TP37
<aw>
tp36, tp37 is still 3.3V. good
<wolfspraul>
D16 forward/reverse (in-circuit)
<aw>
need to power off to measure
<wolfspraul>
wait
<wolfspraul>
d2/d3 is dimly lit right now?
<aw>
yes
<wolfspraul>
what was the process?
<wolfspraul>
1. you ran reflash_m1.sh
<wolfspraul>
2. it succeeded
<wolfspraul>
then what?
<aw>
wait
<wolfspraul>
you power cycled?
<wolfspraul>
or press middle button?
<aw>
1. I ran reflash_m1.sh
<wolfspraul>
wpwrak_: caught up?
<aw>
2. do nothing....until it terminal log shows finished and saw d2/d3 dimly lit
<aw>
i did nothing though. ;-)
<wolfspraul>
huh? did it finish flashing?
<aw>
no power off
<wolfspraul>
can you upload the log?
<aw>
yes, this failure was few cases in first round of tests though
<aw>
okay
<wpwrak_>
not yet. currently at the i/o redirection. maybe consider using "script"
<wolfspraul>
it may be a software problem only
<wolfspraul>
wpwrak_: ok so when you make it here :-)...
<wolfspraul>
basically fix2b worked well for 0x39 (yesterday) and 0x34
<wolfspraul>
it did not work for 0x32 and 0x3a (values see above)
<wolfspraul>
on 0x3A, it turned out that replacing D16 and C238 made it work (well, not 100% sure yet, see the dimly lit story just unfolding)
<wpwrak_>
btw, does reflashing still use "debug all" ?
<aw>
when you saw log, there's stop 1484404 there, after that I replaced C238 and diode. then can reflashed. ;-)
<aw>
but do nothing once reflashed done
<wolfspraul>
looks good
<aw>
yes
<wolfspraul>
still dimly lit now?
<aw>
sure
<wolfspraul>
press the middle button
<aw>
no any flash on leds
<aw>
no boot up
<wolfspraul>
ok
<wolfspraul>
now - power cycle
<aw>
now tp37 tp36 is still good 3.3V
<wolfspraul>
ah wait
<wpwrak_>
what's the voltage on INIT_B ?
<wolfspraul>
no power cycle
<aw>
can't reconfigure after power cycle
<aw>
wpwrak, bad..
<aw>
i powered
<wolfspraul>
before we do measurements, I suggest to disconnect/reconnect the jtag-serial board, and flash again (remember to check that you flash in usb full-speed)
<wolfspraul>
this board was just flashed for the very first time, so it could be related to that
<wpwrak_>
a virgin board. maybe it's a little shy :)
<aw>
moment...the init_b is now at bottom side..phew~
<wpwrak_>
aw: ;-)
<wolfspraul>
aw: I suggest - reseat jtag-serial board, flash again
<wolfspraul>
maybe there was a problem writing into nor, whatever problem
<wpwrak_>
an item for the shopping list: lab at zero gravity ;-)
<wolfspraul>
and this was the first flashing. so it may be something totally different from our 'permanent reset' issue before.
<aw>
wpwrak, init_b = 3.3V while d2/d3 dimly lit
<wpwrak_>
that means that the FPGA is happy
<aw>
so now power off and replug jtag board and reflash again?
<wpwrak_>
maybe see if you can load the test program ?
<wolfspraul>
won't work, no reconfig
<aw>
wpwrak, once d2/d3 dimly lit, the middle btn is no action so that can not enter test s/w
<wpwrak_>
INIT_B = 3.3 V means either that the FPGA didn't even begin to reconfigure, or that it succeeded
<wolfspraul>
wpwrak_: theoretically a boot path entirely over jtag/fpga/sdram could be written, but a number of pieces are missing now
<wolfspraul>
I think we can load the bitstream over jtag, but then the bios has to come from nor
<wolfspraul>
but even for that we have no scripts ready now, right now
<wpwrak_>
wolfspraul: you need a devirginator ;-)
<wpwrak_>
(like we had at openmoko)
<wolfspraul>
yes I know
<wolfspraul>
people complained to me about inappropriate naming of technology by some rogue staff...
<wpwrak_>
;-))))
<wolfspraul>
to which I said it's beyond my control :-)
<wpwrak_>
so somebody noticed. i was wondering ;-)
<wolfspraul>
oh sure. this is actually not so pleasant to talk through with Taiwanese staff, female staff, etc.
<wolfspraul>
but we are all for free speech etc.
<wolfspraul>
in the US you would be in big trouble
<wpwrak_>
yeah. i never expected the name to stay around for long. so i'm quite surprised it did :)
<wolfspraul>
the problem is they take it serious, look it up in a dictionary etc.
<wolfspraul>
not so good
<wolfspraul>
:-)
<wpwrak_>
oh dear :)
<wolfspraul>
here you go. devirginator "A person who consistently sleeps with virgins i.e. removes their virginity or pops their cherry. Can be male or female."
<wolfspraul>
want me to discuss this with Taiwanese staff? no! please not!
<wpwrak_>
i think i got the idea from someone calling fresh-from-the-fab boards "virgin" boards
<wpwrak_>
heh :)
<wolfspraul>
well. they look it up.
<wolfspraul>
and that's what they find
<wpwrak_>
duly noted. need to find more obscure names
<wolfspraul>
nicely explained in Chinese maybe even
<wpwrak_>
the depravity of us westerners
<wolfspraul>
I should have suggested they schedule it to be added as an 'new words seen in the office' for the weekly English class
<wolfspraul>
move the problem to that teacher, so they earn their money...
<wpwrak_>
;-))
<wpwrak_>
make sure they all use it in daily conversation with other people :)
<wpwrak_>
our current board is 0x32, right ?
<wpwrak_>
to see what's happening, maybe monitor TP35 (DONE) with a scope when power cycling
<wpwrak_>
even better: monitor INIT_B too
<wolfspraul>
no it's 0x3A now
<wolfspraul>
but same case as 0x32 in that before fix2b, it never flashed or rendered
<wolfspraul>
aw: any update on 0x3A ?
<wolfspraul>
Adam is a little silent :-)
<wolfspraul>
wpwrak_: I vaguely remember one case in the US where a developer did something similar, naming some internal little tool in an 'inappropriate' way
<wolfspraul>
well, he had a nice little chat with general counsel or CEO or so, and then it got 'cleaned up' :-)
<wolfspraul>
all fine with his job etc. but that kind of stuff will just not be tolerated in the corporate US world
<wolfspraul>
so he ran around frantically trying to erase all traces of his neat little tool :-)
<wolfspraul>
the pussies are in control
<wolfspraul>
:-)
<wolfspraul>
ah Adam just told me he got interrupted, back soon. and I'm out to meet Jon. crossing my fingers...
<wpwrak_>
(us) yeah, that was of course part of the fun. knowing that this would never fly over there :)
<wpwrak_>
0x32 is in limbo, too ?
<wolfspraul>
put aside
<wolfspraul>
at that point we wanted to see some more fix2b results first
<wolfspraul>
because 0x32 never rendered before
<wpwrak_>
(more fix2b) sounds fair
<wolfspraul>
then we did 0x34 (which rendered before and fix2b turned it all good)
<wolfspraul>
and then 0x3A (which initially behaved same as 0x32 but then with C238 it got a little further, eventual resolution pending)
<wolfspraul>
that's where it stands now
<wolfspraul>
Adam thought 0x3A is a done deal, and he wanted to go back to 0x32, but then of course a problem still did show up on 0x3A
<wpwrak_>
in murphy we trust
<aw>
alright
<aw>
i am back
<wolfspraul>
ah, but I need to run. l8 and good luck!
<aw>
wolfspraul, sure
<aw>
wpwrak, so you got all the history of this morning's tests?
<aw>
wpwrak, hehe..
<wpwrak_>
still working on the backlog
<aw>
alright
<aw>
i think now i leave 0x3A apart firstly and back to see 0x32
<aw>
;-)
<aw>
but before this, i need to record first
<wpwrak_>
but i think if a board has okay voltages (after fix2b) but still has dim LEDs, the things to look at (with a scope) would be DONE and INIT_B. if INIT_B is inconvenient, use PROGRAM_B instead.
<aw>
i see. now mine is  0x3A
<wpwrak_>
DONE = TP35. at least that's easy :)
<aw>
so okay...that's scope TP35 to trigger with program_b?
<wpwrak_>
hmm, okay, trigger on PROGRAM_B rising
<aw>
man! 0x3A now is dimly lit again after power cycle
<aw>
let's see tp36, tp37 normal voltage first again
<wpwrak_>
let's say 100 ms/div, peak, ~3 div before, ~7 div after the trigger
<aw>
wpwrak, i think i need to scope init_b though trigger with program_b .;-)
<wpwrak_>
hmm, never finished configuration
<wpwrak_>
yes, INIT_B would be interesting then
<aw>
wpwrak, wait
<aw>
not sure
<wpwrak_>
pity you have only two channels
<aw>
for the rc2 waveforms I scoped, should i set it to more like 250 ms/div to see if done has been pulled high?
<aw>
wpwrak, agreed?
<wpwrak_>
dunno. in rc2, DONE should rise within ~300 ms. here, you have ~700 ms
<wpwrak_>
but you can try. maybe the speed is variable / has gotten slower
<aw>
let me try..hope not miss more important info.
<wpwrak_>
another feature for your next scope: MEMORY :)
<aw>
wpwrak, yes, ch2 is over 8 div, and still no pull high...so even fpga didn't enter reconfigure stage
<aw>
wpwrak, ha..you can push Wolfspraul though..
<aw>
phew~ try init_b now
<wpwrak_>
(push wolfgang: yeah, i have a few ideas what needs to get bought if we should ever come across significant money. better scopes is pretty high on that list ;-)
<wpwrak_>
(alas, good scopes aren't cheap. the ones i have my eyes set on are all in the USD ~10k+ segment)
<aw>
yes, i remembered when i at OM, Wolfgang and Ruby tried to gather those info for you. ;-)
<wpwrak>
hmm, that's a weird one. i don't like the drops at t = +75 ms and t = +145 ms
<wpwrak>
but maybe that's retries
<aw>
this may show init_b can be output also input as an indicator
<aw>
could be?
<wpwrak>
sure. but we can't always tell when init_b is an input. input looks the same as output high :)
<aw>
wpwrak, sorry that i don't understand you don't like the drops at ....?
<wpwrak>
i wonder what they mean. but let's assume they're CRC errors.
<aw>
so the init_b indicator should go High once the fpga has finished its internal CRC check, and show High synchronized to PROGRAM_B?
<wpwrak>
so FPGA comes out of reset at t = 0, tries to load from NOR, gets a CRC error at t = +75 ms, tries again, gets another CRC error at t = +145 ms, tries again, ... and then seems to succeed (?)
<wpwrak>
INIT_B should be high while the CRC is okay
<aw>
mm...so this needs to consult with lekernel  to confirm?
<aw>
yup~reasonable from rc2 waveforms. got it
<wpwrak>
(consult) naw, i think we don't need to bother him with this. yet :)
<aw>
oah~okay
<wpwrak>
now .. why would the NOR have troubles. hmm.
<wpwrak>
next test: CH2 stays in INIT_B, move CH1 to TP37 (FLASH_RESET_N). then trigger on CH2 rising. move trigger to -200 ms so that we get the same time window as the last time
<aw>
hmm..let's trigger other pin of flash chip, to see if flash is in correct assertion?
<aw>
hmm...okay...
<wpwrak>
if the reset looks good, then we'd have to test the other NOR pins, yes. this will be fun :)
<wpwrak>
but i think if the reset is fine, then 0x32 should go to the "try to fix this when you have plenty of time" queue. because that can easily keep you busy for a whole day.
<aw>
oah...yup..whole day or directly replace flash chip...but too bad now is out of stock here. :(
<wpwrak>
regarding the NOR reading script, have you ever used this script successfully ? (with a board that works okay)
<wpwrak>
if yes, maybe you can try it here too
<aw>
never used but now lemme try 0x39
<aw>
since i doubt fpga will let jtag access with flash chip if unsuccessful on reconfigure. but worthy to read though that i 've never read before. ;-)
<aw>
reading from 0x39. :)
<aw>
xiangfu, reading flash image will be slow?
<xiangfu>
aw, yes.
<aw>
hours?
<wpwrak>
i think it will allow jtag access :) before fix2b, failure to reconfigure meant reset trouble, which also blocked NOR access via jtag. now, failure to reconfigure means something else. so NOR access via jtag should work.
<wpwrak>
aw: planning a five-course dinner ? :)
<wpwrak>
ah nice. finally got M1rc2_powerOnOff_sequences_manuscript.jpg printed. now i no longer need a screen just for this :)
<xiangfu>
aw, you want read whole 32MB flash? that needs ~4 hours.
<wpwrak>
ouch :)
<aw>
wpwrak, i'll always be beaten when do this with PHD. :)
<wpwrak>
xiangfu: what does the reading script read by default ? everything ?
<xiangfu>
wpwrak, no. it only read first 640KB.
<aw>
xiangfu, what's the image from 640K?
<wpwrak>
(640 kB) hmm, so that's a bit less than half the bitstream ?
<aw>
xiangfu, man! i stop reading
<wpwrak>
aw: 640 kB should take about 5 minutes
<xiangfu>
wpwrak, it only read standby.
<wpwrak>
oh, is see
<wpwrak>
s/is/I/
<aw>
wpwrak, hmm? so keep reading?
<xiangfu>
whole standby partition, I mean.
<wpwrak>
aw: yeah, let it finish. should be soon.
<aw>
alright
<xiangfu>
aw, if you want read soc bitstream. you can modify the script file a little.
<aw>
no no...i think we just need standby
<xiangfu>
ok
<aw>
so do i need to modify script file or itself is for standby already?
<aw>
xiangfu, so 5 minutes only?
<xiangfu>
aw, no. by default only read standby . yes about 5 minutes
<aw>
alright read again now. thanks
<aw>
wpwrak, so how do we go next?
<wpwrak>
aw: let's label 0x32 with "possible NOR instability" and put it on the pile of boards that need deeper analysis later
<wpwrak>
aw: the, the next would be 0x3a, right ?
<aw>
wpwrak, no , swapped them though
<wpwrak>
swapped ?
<aw>
so 0x3a is possible NOR instability
<aw>
next is 0x32. ;-)
<wpwrak>
ah, you were working on 0x3a ?
<wpwrak>
i see
<aw>
yup
<wpwrak>
okay, let's see what 0x32 can do :)
<aw>
now just reading 0x39 flash chip back. ;-)
<aw>
after this, i go back to read 0x3a since you said it could be read though. ;-)
<aw>
xiangfu, so Files is under /home/adam/.qi/milkymist/readback/20110817-1544
<aw>
will always be the same name file? or everytime is different
<aw>
hm...seems that you used system time. ;-)
<xiangfu>
aw, filename always the same. the folder is changed.
<xiangfu>
20110817-1544 <-- is the date and time
<xiangfu>
yes
<aw>
xiangfu, oah ..got it
<wpwrak>
xiangfu: in the tests adam is doing, which bitstream does his FPGA load ? the "standby" bitstream or the "regular" bitstream ?
<aw>
xiangfu, i would like the saved/readback file name is related to mac address, is it possible? ;-)
<aw>
0x39 read back is done
<aw>
now back to try to read 0x3a
<wpwrak>
(name by MAC) mv readback/2011<Tab> 0x39-standby.bit  :-)
<wpwrak>
or, rather  mv readback/2011<Tab> readback/0x39-standby.bit
<aw>
wpwrak, oah~ sweety, you know i poor on cmd. ;-)
<wpwrak>
mv /home/adam/.qi/milkymist/readback/20110817-1546 /home/adam/.qi/milkymist/readback/0x39-standby.bit          (or wherever you want it)
<aw>
wpwrak, yes. you are right. 0x3a is reading now. ;-)
<wpwrak>
see :) we're winning !
<aw>
wpwrak, that's because we've seen program_b/init_b/rp# are all correct, so that's why you wanted to me to buy you a dinner! .;-)
<aw>
wpwrak, so later how we compare those two bitstream files?
<wpwrak>
(dinner) ah no, i was asking there, whether you were planning to take a long break (e.g., for a lavish dinner) while a very slow download is happening
<aw>
wpwrak, oah...misunderstood though...
<aw>
ha
<aw>
wpwrak, can we from those two (0x39 and 0x3a) bitstream files to discover secrets behind?
<wpwrak>
to compare:  diff -u <(hexdump first-file) <(hexdump second-file)
<wpwrak>
(if you're using the bash shell)
<aw>
wait..so we need to go deeply 0x3a or work for 0x32(next board) next after read from 0x3a?
<wpwrak>
once you've downloaded the standby bitstream from 0x3a, please download it a second time. that way, we can see if it changes (e.g., if there is noise on the bus)
<wpwrak>
xiangfu: in the tests adam is doing, which bitstream does his FPGA load ? the "standby" bitstream or the "regular" bitstream ?
<aw>
wpwrak, aha~ good idea.
<aw>
seems he's not here. :)
<aw>
0x3a: read again now.
<wpwrak>
yeah, seems that we lost him :-(
<wpwrak>
can you upload the bitstreams you got so far (0x39 and 0x3a) somewhere ?
<lekernel>
morning
<xiangfu_>
my last message is : wpwrak, the test is standby --> soc bitstream --> BIOS --> test bin
<lekernel>
imagines talking to someone in business suit: "- How do we program the boards? - Well you have to take the..., ahem,... the Devirginator"
<lekernel>
hahaha
<wpwrak>
xiangfu_: thanks !
<xiangfu_>
morning
<wpwrak>
lekernel: it gets better: at the factory, the girls working there were running ./devirginate from the command line :)
<wpwrak>
lekernel: top is FLASH_RESET_N (all is fine there), bottom is INIT_B
<wpwrak>
lekernel: looks as if it hits a CRC error at +75 ms, retries, hits another CRC error at +145 ms, and then succeeds
<wpwrak>
lekernel: now, i wonder if that CRC error would be in the standby or the "regular" bitstream. does the "regular" bitstream also use a load mechanism involving INIT_B, DONE, etc., as the initial hardwired loader ?
<aw_>
that was done by 'diff -u <(hexdump first-file) <(hexdump second-file)'?
<lekernel>
can we read the bitstreams several times?
<wpwrak>
almost :)
<wpwrak>
lekernel: that's already from two successive reads of the same NOR (no reflash in between)
<wpwrak>
aw: this would be the command I used: diff -u <(hexdump -C 0x3a-1.bit) <(hexdump -C 0x3a-2.bit)
<wpwrak>
aw: the -C adds the ASCII column on the right side
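The diff/hexdump pipeline above shows where two dumps differ as text. A minimal Python sketch of the same check, assuming both dumps are raw binary files of equal length; `differing_offsets` and `compare_files` are hypothetical helper names, not part of the readback scripts:

```python
from pathlib import Path
from typing import List


def differing_offsets(a: bytes, b: bytes) -> List[int]:
    """Return the byte offsets at which two equal-length dumps differ."""
    if len(a) != len(b):
        raise ValueError("dumps differ in length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def compare_files(first: str, second: str) -> List[int]:
    """Read two dump files and return the offsets where they differ."""
    return differing_offsets(Path(first).read_bytes(),
                             Path(second).read_bytes())
```

For example, compare_files("0x3a-1.bit", "0x3a-2.bit") would return an empty list if the two reads agree.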
<aw_>
wpwrak, okay..thanks, i record cmd first. ;-)
<lekernel>
maybe that's just a urjtag bug?
<aw_>
wpwrak, okay
<lekernel>
urjtag won't use the same timings as the configuration system
<lekernel>
so if you get intermittent read failures, it doesn't mean much
<wpwrak>
lekernel: hmm, could be. would you expect urjtag to always have such issues ? or just with that usb-jtag board ?
<wpwrak>
aw_: when you do your experiments, does each M1 has its own usb-jtag board or do you use the same usb-jtag board for all the M1s ?
<wpwrak>
s/has/have/
<lekernel>
also, if it's a problem on a data line, why don't we get problems when writing too?
<lekernel>
and why is the software CRC in the test tool passing most of the time?
<aw_>
wpwrak, each board has its own usb-jtag board
<lekernel>
this simply looks like urjtag bugs to me
<wpwrak>
aw_: can you please read the 0x39 bitstream a second time ?
<wpwrak>
lekernel: let's find out :)
<aw_>
wpwrak, okay
<wpwrak>
lekernel: this board hasn't booted in its life so far, so we haven't made it to the software CRC yet
<lekernel>
it always failed to boot?
<wpwrak>
yes
<lekernel>
ah
<lekernel>
ok
<lekernel>
then maybe this is the problem
<wpwrak>
now fix2b has been applied and it seems to work a little better. but still not okay
<wpwrak>
meanwhile, fix2b has "cured" two boards (i think)
<wpwrak>
so this is a new/different problem
<lekernel>
what is fix2b? disconnect INIT_B?
<wpwrak>
yes
<lekernel>
this should not have any influence
<wpwrak>
and also check D16 and replace if it looks suspicious
<lekernel>
except if we use crappy diodes
<wpwrak>
we do :)
<wpwrak>
adam's current procedure is to disconnect INIT_B on the boards "in the cluster", then check TP36 and TP37 voltages. also measure D16 in-circuit, which seems to work more or less reliably. (he has removed a few good diodes, though)
<wpwrak>
ah, and C238 once had an issue too
<wpwrak>
so the whole fix2 rework is a bit fragile
<lekernel>
argh....
<wpwrak>
the joy of hardware ;-)
<wpwrak>
lekernel: if you think this is bad, you should have seen how things went at openmoko :)
<wpwrak>
lekernel: response times measured in days, unexplained departures from the procedure you asked them to perform, quick nonsensical ad hoc fixes thrown into the mix, and so on. pure chaos.
<aw_>
wpwrak, hehe..at least all these are done myself though. not OM everyone could involved then you couldn't find out the root cause. ;-)
<wpwrak>
lekernel: it once took me about half a year just to figure out whether they had fixed a missing resistor on the base of a transistor ...
<aw_>
wpwrak, there's a fact: since i have to improve my soldering , but seems hard a bit. :)
<lekernel>
here we get intermittent and weird problems that redefine peskiness
<lekernel>
that compensates
<wpwrak>
aw_: (everyone could play) yeah, the "chain of command" was a little ... strange over there :)
<wpwrak>
it got much better if you were physically present, though. shorten the loop and catch suspicious activities quickly :)
<aw_>
wpwrak, and i tried to openly as possible as i can. ha ;-)
<aw_>
wpwrak, so how's next after 0x39's dump?
<wpwrak>
lekernel: some of the component issues are indeed a bit surprising to me
<aw_>
if no err with each other?
<wpwrak>
aw_: upload the dump and then we'll see if 0x39 also changes from dump to dump. if yes, then the dumps are worthless. if the 0x39 dumps are the same, then we can try 0x3a with the usb-jtag board of 0x39
<aw_>
wpwrak, i see
<wpwrak>
hmm, in the dumps, is the first byte DQ0-DQ7 or DQ8-DQ15 ?
<wpwrak>
lekernel: ah, here's a way to test whether it's the DQx bus or usb-jtag: if it's always the same bit that changes, then the bus is the likely suspect. else, it's something else.
<aw_>
wpwrak, if both -3 and -4 are identical, this'll be a trouble for me. :)
<wpwrak>
aw_: you can stop
<wpwrak>
0x3a-3 is identical to 0x39
<aw_>
wpwrak, big trouble now. :(
<wpwrak>
throw away the usb-jtag that was in 0x39 ;-)
<wpwrak>
okay, now: reflash
<wpwrak>
(reflash 0x3a)
<aw_>
okay. ;-)
<aw_>
reflashing...
<lekernel>
wpwrak: from what we have seen so far, it seems to be bits 7 and 15
<lekernel>
first byte is DQ8-5
<lekernel>
DQ8-15
<wpwrak>
adds big-endian mode to the bit comparer
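The same-bit test can be sketched as a per-line tally of flipped bits. This assumes a 16-bit data bus and, as stated above, that the first byte of each word carries DQ8-15; the function name and the `big_endian` switch are illustrative and not the actual bit comparer used in this session:

```python
from collections import Counter


def dq_flip_counts(a: bytes, b: bytes, big_endian: bool = True) -> Counter:
    """Count, per DQ line, how many bits differ between two dumps.

    With big_endian=True the byte at an even offset is treated as DQ8-15
    and the byte at an odd offset as DQ0-7; big_endian=False swaps them.
    """
    counts = Counter()
    for offset, (x, y) in enumerate(zip(a, b)):
        diff = x ^ y  # set bits mark the positions that differ
        if not diff:
            continue
        is_even = offset % 2 == 0
        high_byte = is_even if big_endian else not is_even
        base = 8 if high_byte else 0
        for bit in range(8):
            if diff & (1 << bit):
                counts["DQ%d" % (base + bit)] += 1
    return counts
```

If one or two DQ lines dominate the tally, the data bus is the likely suspect; an even spread over all lines points elsewhere.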
<aw_>
wpwrak, well...here I have usb-jtag boards with rc1 and rc2 version.  I have to know if they are different. 0x39 and 0x3a used the same usb-jtag rc1 version.
<wpwrak>
aw_: heh, no idea what the differences are ;-) maybe it was also just a bad connection. we can find out later.
<aw_>
0x3a: reflashed done but d2/d3 dimly lit still there.
<wpwrak>
the board doesn't know it yet, but it *will* boot today ;-)
<aw_>
wow~ i am expecting that boot. :)
<wpwrak>
resistance is futile :)
<wpwrak>
just on the radio: there's some marihuana plantation burning (somewhere in buenos aires, it seems). and they say "some of the firefighters are affected by the smoke" ;-))
<aw_>
wpwrak, so how's best and quick way that i can know if usb-jtag board is bad while testing m1?
<aw_>
via 'diff" cmd to know?
<wpwrak>
for now, that seems to be the best test, yes. actually, it could also be the M1
<wpwrak>
what's interesting is that 0x3a-2 was correct
<wpwrak>
so the errors some to come and go
<aw_>
you suspected an effect of the usb transmission speed before, could this be related to that? or should we start to think it might be a problem with the flash chip itself?
<wpwrak>
s/some/seem/
<lekernel>
wpwrak, have you tried a board that works? that might just be stupid urjtag bugs
<lekernel>
let's not spend any time on those
<wpwrak>
lekernel: yes, 0x39 is a "good" board
<lekernel>
ok, and if it had so many failures, the software CRC wouldn't work so well
<wpwrak>
lekernel: and two dumps from 0x39 were identical
<lekernel>
mh
<lekernel>
so the flash really behaves like crap on that 3a board which doesn't work...
<wpwrak>
what's odd is that the bit position changes
<lekernel>
maybe we simply have sourced broken flash chips
<lekernel>
is the pattern reproducible with GDB read memory command?
<wpwrak>
doesn't gdb need the BIOS ?
<lekernel>
no
<lekernel>
only you won't be able to access the SDRAM
<lekernel>
you can simply 'pld load' the SoC and GDB will work no matter what
<wpwrak>
ah, great. maybe you can walk adam through gdb use then (i'll be watching, since i don't know the process either)
<aw_>
the flash chip this time i ordered from authorized here Taipei (WPI), it should be okay though i think.
<wpwrak>
aw_: should i ask where the other NORs came from ? ;-)
<wpwrak>
so the NOR in the rc3 boards is from WPI ?
<aw_>
in one batch of order
<aw_>
no splitted shipments
<wpwrak>
better than "Flash soup kitchen" in wolfgang's backyard ;-)
<aw_>
ordered 96pcs in one batch
<aw_>
but not sure if i have stock now
<aw_>
i hope smt sent me all back.
<wpwrak>
yeah, let's hope the NOR is good in general. there could still be the occasional bad chip, of course. either factory-bad or didn't like SMT or whatever
<aw_>
well..but do you have any idea on 0x3a? or let's still name it as 'possible NOR instability'? ;-)
<aw_>
then we back to see 0x32. :)
<wpwrak>
lekernel: have you heard of "baking" ? that's also a fun thing: components absorb water. some only very little others more. when you SMT them, the water evaporates and the steam pressure may crack the plastic ... somewhere. lots of fun to debug :)
<aw_>
it will be added into file everytime I reflashed
<wpwrak>
i see. okay, let's try to boot. maybe it works ;-)
<aw_>
wait...you said 0x3a-5 is identical original file standby.fpg?
<aw_>
but now d2/d3 are still dimly lit. did i understand correctly?
<wpwrak>
no, 0x3a-3 was identical
<aw_>
okay
<wpwrak>
and 0x3a-2
<aw_>
0x3a-5 was not, wasn't it?
<wpwrak>
0x3a-4 = 0x3a-5, but different from 0x39
<aw_>
got it
<wpwrak>
did you try to boot ?
<aw_>
since d2/d3 is dimly lit now. but let's try to press middle btn first
<aw_>
if not
<aw_>
i go for power off to see if d2/d3 is fully off.
<wpwrak>
alright
<aw_>
can't boot surely while dimly lit
<aw_>
d2/d3 dimly lit after power - cycle. :(
<wpwrak>
so still no go. hmm.
<wpwrak>
okay, can you please take two more dumps ?
<aw_>
yup
<aw_>
okay...
<wpwrak>
and then 0x3a goes back to the queue, "NOR mystery corruption"
<aw_>
hehe :)
<wpwrak>
after that, i'd like three more dumps from 0x39. to make sure that the reading does indeed work. that way, we can be sure we can use this tool to analyze future NOR issues. (or, if the reads of 0x39 also yield inconsistencies, then we know that we don't have a reliable tool :)
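The reliability check described here (several reads of the same NOR should be byte-identical) can be sketched by grouping dumps by digest. This is only an illustration; `group_by_digest` and `reads_are_consistent` are hypothetical names, and the dumps are assumed to be already loaded as bytes:

```python
import hashlib
from typing import Dict, List


def group_by_digest(dumps: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Group dump names by the SHA-256 digest of their contents."""
    groups: Dict[str, List[str]] = {}
    for name, data in dumps.items():
        digest = hashlib.sha256(data).hexdigest()
        groups.setdefault(digest, []).append(name)
    return groups


def reads_are_consistent(dumps: Dict[str, bytes]) -> bool:
    """True if every dump has the same contents (at most one group)."""
    return len(group_by_digest(dumps)) <= 1
```

If repeated reads of a known-good board fall into more than one group, the readback path itself is unreliable and differences seen on a suspect board prove nothing.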
<wpwrak>
but first the two from 0x3a
<aw_>
hmm...good idea about 'reliable tool' preparation indeed.
<aw_>
okay
<aw_>
so I'll use current usb-jtag (i.e the original good usb-jtag we guessed) on 0x3a to 0x39 too.
<wpwrak>
no, keep them as they are
<wpwrak>
the original 0x3a usb-jtag is now in M1 0x39 and the original usb-jtag from 0x39 is now in M1 0x3a, correct ?
<aw_>
correct
<wpwrak>
since M1 0x3a is acting weird with both, the usb jtag may be okay. we'll test this implicitly when taking the dumps from 0x39
<aw_>
so later when we dump 0x39, will we use 0x39's original jtag board? is that what you wanted?
<wpwrak>
no, M1 0x39 with usb-jtag 0x3a
<wpwrak>
i.e., don't swap the usb-jtag. use them as they are now
<wpwrak>
this is identical to the one you had before reflashing
<aw_>
man..so it's not identical to the latest reflash. :(
<aw_>
i'm still reading another one...
<wpwrak>
very very strange ...
<wpwrak>
maybe DQ15 now simply fails consistently
<aw_>
i hope this is not a radiation problem though..but it will be clearer after three dumps from the working 0x39. :-) exciting to know the results about 0x39 later.
<wpwrak>
the problem strikes on average every 33.5875 bytes
<wpwrak>
it seems way too reproducible to be just something random
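Both observations above (a single data line such as DQ15 failing, and errors recurring at a steady average spacing) can be checked against a reference image. The sketch below is only an illustration under stated assumptions: the dumps are raw byte images, the 16-bit NOR words are stored little-endian in the file (so DQ15 is the top bit of every second byte), and the spacing is taken as dump length divided by the number of differing bytes. How the 33.5875 figure was actually computed is not stated in the log.

```python
from typing import List, Optional


def mean_error_spacing(reference: bytes, dump: bytes) -> Optional[float]:
    """Average number of bytes per differing byte, or None if no byte differs."""
    bad = sum(1 for a, b in zip(reference, dump) if a != b)
    return len(reference) / bad if bad else None


def errors_per_data_line(reference: bytes, dump: bytes) -> List[int]:
    """Count bit errors on each of DQ0..DQ15, assuming little-endian 16-bit words."""
    counts = [0] * 16
    usable = min(len(reference), len(dump)) & ~1  # whole 16-bit words only
    for i in range(0, usable, 2):
        ref_word = int.from_bytes(reference[i:i + 2], "little")
        dump_word = int.from_bytes(dump[i:i + 2], "little")
        diff = ref_word ^ dump_word
        for line in range(16):
            if (diff >> line) & 1:
                counts[line] += 1
    return counts
```

If all errors pile up in one entry of the returned list, a single data line (or its pad driver or solder joint) is the natural suspect; errors spread across many lines point elsewhere.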
<aw_>
wpwrak, it would still be good to know whether 0x39 is reliable or not though. pls check it. thanks. ;-)
<wpwrak>
aw_: yes, please keep them coming
<wpwrak>
aw_: 0x39-2 is good
<aw_>
wpwrak, great
<aw_>
keep reading
<aw_>
lekernel, received standby.mcs, tks
<aw_>
wpwrak, after these three dumps, i go for dinner first. ;-) sorry
<togi>
i was at ccc last week, but i missed the milkymist talk :/ anyone know if it's available somewhere?
<aw_>
and when I'm back, let's try the xilinx tool. yup..haven't used it for a long time :-)
<wpwrak>
aw_: we should put you on a sushi diet. maki, to be precise. then you can quickly eat a bit each time you have to wait for some up- or download :)
<aw_>
oah~yup...i'm really sorry about this.
<wpwrak>
thinking of it, that would work for me too. i have a sushi restaurant just around the corner. and they do delivery :)
<wpwrak>
well, actually japanese restaurant. but it seems their non-sushi stuff isn't so great.
<aw_>
seems i have to buy more foods in preparation.
<aw_>
wpwrak, great! so this shows that we currently don't need to worry about the usb-jtag board. i'll be back soon
<wpwrak>
yup, usb-jtag looks good. enjoy your meal !
<aw_>
k
<lekernel>
wpwrak, thanks so much for your help.
<wpwrak>
no problem. it's fun ;-)
<lekernel>
wpwrak, to sum up now, it may seem that we have a combination of a) unreliable writes b) intermittent reset circuit fuckup that causes boards to fail in the field?
<wpwrak>
i think b) is starting to disappear. we may not have found all the critters there, but at least some.
<wpwrak>
what causes a) is still a mystery. oh, and we also had changes between reads. so it's all still foggy.
<lekernel>
you said that reads were working reliably?
<wpwrak>
now they are. on 0x3a, there were successive reads with differences.
<lekernel>
let's always use impact on problem boards now
<lekernel>
it received more testing than urjtag and the usb-jtag board
<wpwrak>
let's first see what impact impact makes :)
<scrts2>
who is the one here coded ethernet? I wonder if the milkymist is connected to a bigger network through a few switches, is there a packet queue for packets, which are received in different time or are duplicated? e.g. ip header identification field shows, that the later packet has identification value smaller than the previous packet, which means that this particular packet must be processed before the other
<wpwrak>
now let's find out what impact "impact" has :)
<aw_>
wpwrak, hi yes, i am looking for my rc2's previous script~ phew
<aw_>
lekernel, i tried to reflash 0x39 firstly via xilinx jtag
<aw_>
this is what i currently use with xilinx jtag to reflash standby.mcs
<lekernel>
aw_, all you should do is 1. load the standby.mcs I sent you using the impact gui (not the script) 2. test if the flash was correctly written
<lekernel>
period
<aw_>
hmm...okay..i go to open impact gui first
<lekernel>
actually that script you pointed might work as well
<lekernel>
just don't forget the template.cmd file
<aw_>
hmm....no included template.cmd under folder...try again
<aw_>
mm..no template.cmd already there.
<aw_>
i opened impact gui. so far i've only used it to read device status/device id etc... I've not loaded standby.mcs with iMPACT before. only used the script. :(
<aw_>
how to load standby.mcs via iMPACT gui? i need to set many parameters?
<lekernel>
.....
<lekernel>
no
<lekernel>
create new project, select "autodetect devices with boundary scan", then when it asks whether you want to program a flash attached to the fpga say yes and select the .mcs
<lekernel>
it's completely trivial
<aw_>
okay
<lekernel>
i cannot give you step by step instructions, I lost the ribbon cable of my xilinx jtag cable
<Fallenou>
I confirm, it's trivial
<Fallenou>
even for non fpga-expert like me :)
<aw_>
man! created a project with autodetect with boundary scan. now "Identify Succeeded", but which item do I go to for selecting my *.mcs?
<aw_>
Fallenou, he..seems not trivial for me. :)
<Fallenou>
well you want to put the bitstream in the flash ? or just program the fpga ?
<aw_>
load standby.mcs file in fpga
<Fallenou>
right click on the FPGA
<Fallenou>
and there should be a menu element that says "load bitstream" or something like this
<aw_>
oah~ i see it, thanks
<Fallenou>
assign configuration file
<aw_>
i've done the assignment with 16bit data bus/BPI/Flash chip
<aw_>
and ?
<Fallenou>
when you have assigned the configuration file to the proper device
<Fallenou>
then you can do things like right click, configure or something like that
<Fallenou>
(I don't have impact on my computer, sorry)
<wolfspraul>
wow, need to read the backlog...
<aw_>
oaw~ i see...must right click on "flash" icon then program it. :)
<aw_>
now it's programming...
<Fallenou>
if you right click on the "flash device", then you are programming the flash
<Fallenou>
not the fpga directly
<Fallenou>
I don't know what you are trying to do exactly though
<aw_>
oah~ man! but good now my d2/d3 is fully OFF now..
<Fallenou>
oh ok reading backlog I understand
<Fallenou>
you should be ok
<aw_>
Fallenou, i saw the console showing mostly the same messages as the script though. :)
<aw_>
lekernel, i right clicked on 'Flash' icon not fpga itself, is that right?
<Fallenou>
ok good
<aw_>
hm...i lost my self though
<lekernel>
yes, click the flash icon
<lekernel>
we do not care about what the leds are doing while you are in impact
<Fallenou>
aw_: if you just want to reflash the board, so that the board would be able to boot without being plugged to a computer, then yes
<aw_>
hm...good
<aw_>
so now 0x39 boot up and rendering well
<aw_>
so now let's go for 0x3a :)
<lekernel>
ok, so it simply seems urjtag has some bugs that make writing unreliable particularly at the beginning of the flash
<lekernel>
if you have boards that do not configure at all, give them the impact treatment
<aw_>
just noticed that d2/d3 is fully off after xilinx tool finished programming
<lekernel>
ah, hm
<lekernel>
no, power cycle the board
<aw_>
yes, 0x39 i powered cycle . it works well now. :)
<aw_>
boot up and rendering
<aw_>
so let's see 0x3a via xilinx tool next :)
<wolfspraul>
as usual. it's not great but if the xilinx tool is more reliable, we should probably always use the xilinx tool.
<wolfspraul>
lekernel: what do you mean with "writing unreliable particularly at the beginning of the flash"?
<wolfspraul>
how is that possible?
<lekernel>
wolfspraul, just fix the boards that did not pass with impact
<lekernel>
if the CRC check is good, then writing was ok...
<lekernel>
someone needs to fix this annoying bug in urjtag, but later...
<aw_>
one question first: will i need to "identify" the fpga every time i am going to program a new board?
<wolfspraul>
lekernel: ah yes, of course I agree. But what is this bug?
<lekernel>
some pesky and mundane time sink
<lekernel>
nothing very interesting I think
<aw_>
hmm...i answered my question, just directly click 'flash' icon to program though. :)
<wolfspraul>
the backlog is scary, I cannot follow all details :-)
<wolfspraul>
I think we should definitely move forward to other boards
<wolfspraul>
not get stuck
<wolfspraul>
I just need realibility that we are able to produce 100% stable and tested boards, so we can start selling.
<wolfspraul>
reliability
<wolfspraul>
it seems fix2b is good
<wolfspraul>
right?
<wolfspraul>
I mean I find no evidence in today's long work that there is any problem with fix2b.
<wolfspraul>
so I think we should continue with more boards from the 19 and fix2b.
<wolfspraul>
and if there is a problem, just move to the next board.
<wolfspraul>
aw_: do you agree?
<wolfspraul>
if you feel better, always use Xilinx Impact. Impact or reflash_m1.sh - your choice.
<wolfspraul>
but pick one and stick to it
<wolfspraul>
ah, finally finished
<aw_>
wolfspraul, yes, agreed, Werner & me just tried to track down other things we weren't sure about, even whether usb-jtag is the problem source, but now that concern is gone
<wolfspraul>
but it seems Xilinx Impact did not help :-)
<wolfspraul>
Xilinx Impact only showed right away that the read failed
<wolfspraul>
in that case I would continue to use the jtag-serial board and reflash_m1.sh
<aw_>
wolfspraul, no no...for the xilinx tool i only have the standby.mcs file from lekernel. with only this, i can't rely on xilinx to reflash all the other boards
<wolfspraul>
ok
<wolfspraul>
and Xilinx Impact did not improve anything if I understood the backlog correctly
<wolfspraul>
so just use reflash_m1.sh
<lekernel>
wolfspraul, it did fix urjtag write problems with one board
<lekernel>
no?
<aw_>
just like lekernel said, if I have some trouble with NOR problems, this xilinx tool with standby.mcs could be helpful.
<wolfspraul>
I'm overwhelmed with the details of the backlog.
<wolfspraul>
I thought no
<wolfspraul>
it just said 'failed' by itself
<wolfspraul>
aw_: I think tomorrow we need to go to full speed mode. not get stuck on a few boards.
<wolfspraul>
just power through the whole batch of 19...
<wolfspraul>
if anything doesn't work or is unclear, just take a note and move to the next one
<wolfspraul>
I feel pretty good about fix2b now
<wpwrak>
aw_: hmm, but 0x39 worked before. and 0x3a fails with impact as well. so it seems with 0x39 both work and with 0x3a neither.
<wolfspraul>
yes
<wolfspraul>
wpwrak: did we find any evidence for problems with fix2b today? doesn't look like it to me...
<wpwrak>
okay, all agree :)
<wpwrak>
wolfspraul: in fix2b we (still) trust :)
<wolfspraul>
good
<aw_>
agreed though...We only reworked 4 boards this morning, and got 2 failures with unknown causes. Werner & me tried to figure this out hopefully..just don't want more boards like this...surely need to speed up...but tough decision though..
<lekernel>
<aw_> yes, 0x39 i powered cycle . it works well now. :)
<wpwrak>
i'd suggest putting 0x3a in the fridge. see if temperature changes it. we've had it work better and worse over the course of these experiments. very confusing.
<wpwrak>
lekernel: 0x39 worked before :)
<lekernel>
so why the hell did we flash a board with impact that worked before ?!?
<wpwrak>
lekernel: adam tried a good board first. only then the problem board.
<wpwrak>
:)
<aw_>
0x39 : both usb-jtag & iMPACT work well
<wpwrak>
wolfspraul: we verified that urjtag can read back the NOR quite reliably. so we can use it in the future for verifications, if necessary
<wpwrak>
wolfspraul: what's a bit troubling is that there doesn't seem to be a proper verification of what gets written. at least we once got completely bogus content flashed. maybe the ... "verify skipped" (?) in the logs is a hint :)
<wpwrak>
aw_: trying any more boards today ? or entering suspend mode ?
<aw_>
wpwrak, yup..i gotta enter suspend mode myself to start another day.
<lekernel>
aw_, did we have similar impact flashing problems in run 2?
<lekernel>
this sounds like a brand new problem, no?
<aw_>
lekernel, in rc2, we finally got 35/40 pcs done
<lekernel>
(and like crappy flash chips, too)
<kristianpaul>
why are the flash chips crappy? is that a new discovery on rc3?
<lekernel>
right now there are 51 working boards?
<kristianpaul>
sorry i missed all backlog..
<lekernel>
aw_, did the missing 5 run2 boards have similar flashing problems?
<aw_>
lekernel, and the remaining 4 pcs mostly had, yes, d2/d3 dimly lit problems, but at that time we guessed they were damaged by "fast power-cycling"
<lekernel>
there are no damages by fast power cycling
<wpwrak>
lekernel: not sure if it's the NOR. could also be the FPGA. or soldering on either.
<aw_>
and eventually those 4 boards ended up "dead" though... so whether they actually belong to the NOR flash problems is a really good question!
<lekernel>
what do you mean, "finally dead"?
<wpwrak>
lekernel: what we've seen with board 0x3a were 1) good NOR content but (variable) errors on read and 2a) bad NOR writing with 100% reliable read (of the bad data) or 2b) good NOR writing (and unrelated failure to configure) with 100% reproducible corruption on NOR read
<aw_>
thus can't reconfigure, but at that time we thought it came from abnormal handling in production: fast power-cycling with the switch.
<wpwrak>
lekernel: so, a bit scary that one. a moving target. but i think we did enough tests to be reasonably sure of these results.
<lekernel>
try replacing the flash chip
<lekernel>
wolfspraul, can we move forward with the other boards?
<aw_>
well...I'll set these failing boards aside for now
<aw_>
lekernel, tomorrow i go directly for other boards with fix2b circuit
<lekernel>
aw_, what is your next target?
<lekernel>
what 'other' boards? the 51 working ones?
<aw_>
lekernel, no, the first 19 boards (including today's 4 boards) and see how they go.
<wpwrak>
lekernel: there are some more in the fix2b "cluster"
<lekernel>
aw_, you are not touching the 51 working/available ones, right?
<aw_>
so I'll go for the remaining 14 boards of the cluster first tomorrow
<aw_>
lekernel, right
<aw_>
well...time to go
<wpwrak>
lekernel: the 51 "available" ones should at least be checked. some may also need fix2b and are just at the edge of not working. some of the boards in the cluster have worked a little once and then went worse, so the fix2b problem isn't just black and white
<aw_>
I'll work on the remaining 14 boards first.
<aw_>
good night
<wpwrak>
lekernel: but it would be good to be able to test them in a non-intrusive way, to avoid more rework
<wpwrak>
aw_: sweet dreams ! :)
<lekernel>
wpwrak, should we apply fix2b on all the working boards?
<lekernel>
they look nicer after that (no messy cable)
<roh>
hey. how is it going?
<wpwrak>
lekernel: i'm slightly in favour of applying it everywhere, yes
<wpwrak>
lekernel: seems to be low-risk enough
<wpwrak>
lekernel: and yes, we get rid of the cable. all evidence of human fallibility destroyed ;-)
<wpwrak>
roh: today we had one with problems somewhere between FPGA core, NOR, and back. or, rather, it had us.
<wpwrak>
roh: to make things more interesting, the problem pattern shifted. first it looked merely like a usb-jtag problem, but then that part turned out to be quite reliable but NOR reads or writes caused trouble.
<roh>
oh. pcb routing problems?
<wpwrak>
roh: fix2b is still looking good, though. we're getting further than we used to.
<wpwrak>
roh: hard to say. could be bus, could be I/O pad drivers dying, could be a bad NOR bank, ...
<roh>
wpwrak: whats fix2b?
<roh>
i just learnt about burning streets in london etc. 10 days of camping takes its toll (there was ip and power in my tent but i was too drunk and met too many interesting people to care)
<wpwrak>
roh: fix2b = remove the diode between INIT_B and PROGRAM_B (and the wire going around the board). also, check that diode D16 is okay. some aren't, and let FLASH_RESET_N get pulled low or into an undefined state
<wpwrak>
roh: fix2b solves: the problems with usb-jtag flashing stopping at "bit stream length = 14xxxxxxx" and failure to (re)configure on some boards
<wpwrak>
roh: success rate about 50% so far on those afflicted by such problems (i.e., 2 out of 4)
<roh>
hm
<wpwrak>
interesting detail: when NOR reading on 0x3a was a problem, the bit flips were all 0 -> 1. when the reading stabilized, the bit flips were all 1 -> 0
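The direction of the bit flips is a useful clue because NOR flash erases to all ones and programming can only clear bits to zero. A small sketch for tallying the two directions between a reference image and a dump, assuming both are raw byte strings (the function name is hypothetical):

```python
from typing import Tuple


def flip_directions(reference: bytes, dump: bytes) -> Tuple[int, int]:
    """Return (count of 0->1 flips, count of 1->0 flips) from reference to dump."""
    zero_to_one = 0
    one_to_zero = 0
    for ref_byte, dump_byte in zip(reference, dump):
        # Bits clear in the reference but set in the dump.
        zero_to_one += bin(~ref_byte & dump_byte & 0xFF).count("1")
        # Bits set in the reference but clear in the dump.
        one_to_zero += bin(ref_byte & ~dump_byte & 0xFF).count("1")
    return zero_to_one, one_to_zero
```

A result with only one of the two counts non-zero matches the pattern described here; a mix of both directions would suggest a less systematic fault.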
<kristianpaul>
lekernel: mm_i2l.pdf, thanks for publishing it !
<kristianpaul>
the one about plasma looks worth a look too, nice :)
<kristianpaul>
if you have more slides about HDL specific and milkymist, please share :)
<lekernel>
have you tried the demo binary on your board?
<kristianpaul>
milkymist demo?
<kristianpaul>
no never, i only saw wolfgang use it at cparty, no more
<lekernel>
no, the demo bin from masteri2l plasma
<kristianpaul>
no no
<kristianpaul>
i'm at work now, and just reading rss now
<kristianpaul>
nice, that tp_files tarball is a hello world at the milkymist style :)