<aw>
0x32: 1. stopped at 'Bitstream length: 1484404' while flashing. 2. unplugged and replugged the jtag board, then could reflash... but d2/d3 dimly lit after flashing. 3. resoldered the u9 flash chip, still the same. 4. replaced u7/u19/u20 with new parts. 5. applied fix2b circuit. 6. stopped @ 'Bitstream length: 1484404' while reflashing. 6. tp36 - 690mV, tp37 - 793mV. 7. D16 (in-circuit): For.V. = 153mV, Rev.V. = 1426mV down to 836mV (not constant).
<aw>
8. tp36/tp37 - random pulses (stable 0.76V, then random pulses up to 2.2V). 9. removed C238, tp36/tp37 pull high correctly at 3.29V. 10. fitted a new C238, D16 (in-circuit): For.V. = 158mV, Rev.V. = 1545mV. 11. reflashed successfully. 12. board auto-boots (d2 is ON), the monitor screen stops at the Milkymist One logo after power-on, and the board shuts down intermittently without the middle btn being pressed; not reproducible every time. sometimes after a power cycle it can't even reconfigure (tp36/tp37 level not pulled high enough).
<aw>
13. D16 (in-circuit): For.V. = 152mV, Rev.V. = 758mV and continually dropping (not constant). 14. took the reset ic off: measured impedance (over 20 Mohm) between Out and Gnd, same as a new (unmounted) part; 8.3 Mohm between Vcc and Out, same as a new (unmounted) part. too bad I didn't measure the impedance from TP36 to gnd. 15. checked another, good board: impedance TP36 to gnd: 10.23 k ohm, 18.12 k ohm @TP37 to gnd.
<aw>
16. soldered a new D16 back on. 17. 10.05k ohm @TP37 to gnd, 17.48k ohm @TP36 to gnd. 18. observed TP36 for at least 1 minute after power-on: good 3.3V (d2/d3 fully off). kept probing, then it started pulsing and d2/d3 became dimly lit. TP36/TP37 voltage dropped to a stable 2.4V.
<aw>
weird
<aw>
surely can't read back from flash, stopped @ 'Bitstream length: 1484404'
<aw>
if tp36/tp37 stays at 3.3V, dump can be done, i think.
<wolfspraul>
aw: I think the results yesterday looked good, no?
<wolfspraul>
in the bottom line. So I see no reason why we cannot proceed with fix2b for all 90 boards.
<wolfspraul>
of course it's good to study 0x32/0x3C a little bit.
<aw>
yes, i felt fix2b was good yesterday... although we still have unstable/intermittent pulses at TP36 which we don't understand yet.
<aw>
so next Monday I think I could first continue the fix2b rework on the others
<aw>
but meanwhile, if werner has any new ideas and wants me to study 0x32/0x3c/0x77, I can be interrupted for a while
<aw>
so reworks via fix2b will be continued next week
<wolfspraul>
yes I think that's a good plan
<aw>
i'm dumping 0x77 now for the records first
<wolfspraul>
definitely - clean records, very important
<aw>
later I'll update it all in the wiki notes
<aw>
finished the 0x77 dump.. I'll go out... so we continue next week, okay?
<aw>
at least now we know that 'writing' to NOR zeroed out one word on 0x48, and that reading via the usb-jtag board is actually quite reliable. Regarding 'writing': we don't know yet whether it's a urjtag or a bitstream problem though. ;-)
<aw>
wolfspraul, given the 0x48 history with one word zeroed out, can we say that the 0x85 CRC fail-then-pass i saw with the test program is reasonable?
<wolfspraul>
hmm
<wolfspraul>
yes probably good [0x85]
<wolfspraul>
but we keep it in a special 'hold' condition same as 0x48
<aw>
yup... i also think 0x85 is good, but for now i didn't put it in 'avail -fix2b', to be safe
<aw>
that could just be chance, though it happened during power-down... well.. we don't know yet.
<aw>
alright, the wiki ods file is up-to-date.
<wolfspraul>
well. if power-down really corrupts the nor, we have another big rework in the pipeline.
<wolfspraul>
after reading the 0x85 notes more carefully, I don't see a big problem like with 0x48 because the crc error showed up immediately after the test program ran, if I understand the notes correctly
<wolfspraul>
then all 10 rendering cycles passed without incident
<wolfspraul>
we find out more details on Monday, but so far I'm not worried. no indication that we may need the 4.4v reset ic rework...
<wpwrak__>
good morning ! :) lemme catch up ...
<wpwrak__>
(0x85) was there a reflash between the bad and the good CRC check ? or did just rebooting "fix" it ?
<wpwrak>
let's use this window ... fewer underscores in my nick :)
<wpwrak>
the iliad seems short compared to the epic journey of 0x32 ;-)
<wolfspraul>
I'm not 100% sure about 0x85 either and Adam went into his well deserved weekend
<wolfspraul>
but in either way I think the notes indicate the crc problem was very early, before rendering cycles
<wolfspraul>
it may even be the exact same software bug as 0x48 (guess in both cases, no evidence)
<wolfspraul>
but the important thing is - we have no evidence of power-down related nor corruption
<wolfspraul>
and we have no boards that fail after the first render cycle
<wolfspraul>
that's what I'm watching for. so I think Monday Adam continues with fix2b across the entire batch, and we follow the results.
<wolfspraul>
after 30 or so are 100% good I think we can stop and assemble/package them
<wpwrak>
what i wonder is whether the NOR in 0x85 changed without being rewritten. in 0x48, a reflash made the error go away, which is consistent with a write having gone awry for some reason (sw bug, stray corruption)
<wpwrak>
if 0x85 changed spontaneously, it's more like old friend 0x3a
<wolfspraul>
like I said many times. as long as the 100% pass boards are all stable, we can look at these things later.
<wolfspraul>
the main question is whether something like 0x34 can ever fail again :-)
<wolfspraul>
if it can, we have a problem
<wolfspraul>
if it cannot, it doesn't matter that there are some trouble-makers in the run, even if they cause more design improvements
<wpwrak>
i'll think of some nice torture tests for 0x34 :)
<wolfspraul>
so as long as 0x34, 0x39, 0x40, ... are stable, all is fine
<wolfspraul>
and I am watching for evidence that they may not be stable, in other words watching for evidence that our test process is leaky
<wolfspraul>
but so far in hundreds of render cycles, not one bit of such evidence has come up
<wolfspraul>
they all either lead to 'cleanly failing' or to 'cleanly passing' boards, very binary
<wpwrak>
what bothers me with things like 0x3a is that i still can't point a finger to a specific location. e.g., FPGA content, FPGA I/O, bus, NOR I/O, NOR memory cells, maybe power, maybe reset. we have seen only high-level effects, not the isolated phenomenon (with the understanding that there are limits to how far we can get)
<wolfspraul>
understood. but you also understand my logic. I am not aiming for perfection. I am aiming for 100% pass boards that I can sell and support with a straight face.
<wpwrak>
we have a few that aren't quite clean, possibly including 0x85
<wolfspraul>
we see on Monday. I think it never made it to the first rendering.
<wpwrak>
yes, i understand. what i'm saying is that for me, these odd boards may still point to a real reliability problem
<wolfspraul>
oh sure
<wolfspraul>
but that's a yield improvement
<wolfspraul>
for this run or future runs or derived products
<wpwrak>
not, even a real problem wouldn't have to be devastating, but i'd at least like to know it a bit better. e.g., to be able to predict when or how it might strike
<wolfspraul>
but the customer who buys a 100% pass rc3 board now will never notice the improvement
<wpwrak>
e.g., if all the M1 will fail after 20 hours of operation above 30 C, that wouldn't be so good :)
<wolfspraul>
true
<wpwrak>
i also wouldn't aim for "no board left behind". but at least i'd like to know what to put on the epitaph :)
<wolfspraul>
like I say. two different problems.
<wolfspraul>
and raising the bar of the test process, even design verification, to higher levels, is yet another separate problem
<wolfspraul>
I am aiming for 100% understanding of all 90 boards, definitely
<wolfspraul>
but realistically we will write some off 'unexplained'
<wolfspraul>
for economic reasons, and because at some point the history including reworks introduces too many unknowns/variables
<wolfspraul>
better to stop and try again in the next run
<wpwrak>
yes, the risk of working adam too hard must be considered
<wpwrak>
but i think, so far, he enjoys the occasional break from bulk rework :)
<wolfspraul>
I think Adam will test and fix rc2 for another 1-2 months.
<wolfspraul>
sorry rc3
<wolfspraul>
we see, too early to estimate now
<wolfspraul>
even though there were some bumps, we have to do a proper job with all boards
<wolfspraul>
the bumps were great and helped us to understand and improve a lot of things
<wpwrak>
looking at the results ... 0x32: wow, i'm surprised he even got a dump. that NOR is certainly a mess :)
<wpwrak>
ah, wait .. picked the wrong one
<wpwrak>
no, 0x32 looks more orderly
<wpwrak>
ten 0->1 errors in bit 7
<wpwrak>
all in the address range (hex) 7660-82c0
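A tally like the one above (ten 0->1 errors, all in bit 7, within one address range) can be produced by diffing the readback against the image that was written. The following is only a minimal sketch, not the tool used here: the function names are hypothetical, and the byte-wise comparison is an assumption, since the log does not say whether "bit 7" refers to a byte or to a wider flash word.

```python
from collections import Counter


def bit_errors(reference: bytes, readback: bytes):
    """List (address, bit, direction) for every bit that differs.

    Compares byte by byte (an assumption); only the common length
    of the two images is compared.
    """
    errors = []
    for addr, (ref, got) in enumerate(zip(reference, readback)):
        diff = ref ^ got
        for bit in range(8):
            if (diff >> bit) & 1:
                # Direction is seen from the reference: what was written
                # versus what came back.
                direction = "0->1" if (got >> bit) & 1 else "1->0"
                errors.append((addr, bit, direction))
    return errors


def summarize(errors):
    """Return (counts per (bit, direction), lowest addr, highest addr)."""
    if not errors:
        return None
    counts = Counter((bit, direction) for _, bit, direction in errors)
    addrs = [addr for addr, _, _ in errors]
    return counts, min(addrs), max(addrs)
```

If every error lands in the same (bit, direction) bucket and in a narrow address window, as reported for 0x32, that is a hint toward a systematic cause rather than random corruption.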
<wpwrak>
0x3c readback was perfect
<wpwrak>
maybe 0x3c is a good start for further analysis. there, we already know the NOR is good at the moment
<wpwrak>
the relevant test would be, when the board is in the "messy voltage" state, to inject a limited current towards 3.3 V into TP36 (PROGRAM_B) and TP37 (FLASH_RESET_N)
<wpwrak>
then see how much is needed to bring it up. (limited current) e.g., 100R from the 3V3 rail in series with an ammeter
<wpwrak>
if the problem is a short at or near C238, TP36 and TP37 should be roughly the same, and the current may be quite large (> 10 mA)
<wpwrak>
if it's a failure to pull up TP37 alone, due to lack of FPGA pull, the current should be small (microamps) on TP36 and even less - possibly even negative - on TP37.
<wpwrak>
if it's a short to ground on FPGA or NOR, TP37 should pull all it can, but TP36 should be zero (as above)
<wpwrak>
(short on FPGA, i mean P22, the one that drives FLASH_RESET_N)
<wpwrak>
if it's just a little on both TP36 and TP37, maybe R30 has an issue
<wpwrak>
so these tests will tell us a lot
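The decision rules in this test plan can be written down as a small lookup, which may help when recording the Monday measurements. This is a sketch under stated assumptions, not part of the plan itself: the 3.3 V rail, the 100R resistor and the 10 mA figure come from the discussion, while the 0.1 mA boundary for "microamps", the 20% tolerance for "roughly the same", and all function names are made up for illustration.

```python
V_RAIL = 3.3       # volts, from the discussion
R_SERIES = 100.0   # ohms, the "100R" from the discussion
LARGE_MA = 10.0    # "> 10 mA" from the discussion
SMALL_MA = 0.1     # assumed boundary for "microamps"
SIMILAR = 0.2      # assumed tolerance for "roughly the same"


def injected_current_ma(v_node: float) -> float:
    """Current through the series resistor while the node sits at v_node."""
    return (V_RAIL - v_node) / R_SERIES * 1000.0


def classify(i_tp36_ma: float, i_tp37_ma: float) -> str:
    """Map the two measured currents to the suspects named in the plan."""
    biggest = max(abs(i_tp36_ma), abs(i_tp37_ma))
    similar = abs(i_tp36_ma - i_tp37_ma) <= SIMILAR * biggest

    if i_tp36_ma > LARGE_MA and i_tp37_ma > LARGE_MA and similar:
        return "short at or near C238"
    if i_tp37_ma > LARGE_MA and abs(i_tp36_ma) < SMALL_MA:
        return "short to ground on FPGA (P22) or NOR"
    if abs(i_tp36_ma) < SMALL_MA and i_tp37_ma <= i_tp36_ma:
        return "TP37 not pulled up (lack of FPGA pull)"
    if SMALL_MA <= i_tp36_ma <= LARGE_MA and SMALL_MA <= i_tp37_ma <= LARGE_MA:
        return "a little on both: maybe R30"
    return "inconclusive"
```

For scale: with a node stuck at 0.76 V, the 100R resistor lets at most about 25 mA flow, so a reading above 10 mA is well within what this setup can show.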
<wpwrak>
what sets 0x3c (and a few others) apart from the tested and found to be good boards is that they still show suspicious voltages on TP36/37
<wpwrak>
so they may just form a "fix2+fix2b needs more prodding" cluster. but maybe we can spot the origin of that cluster. e.g., a visually detectable soldering problem
<wpwrak>
0x77's NOR is also perfect
<wpwrak>
also has the voltages problem. there's definitely a cluster.
<wpwrak>
these are tricky, because they did survive a number of things. such as writing the NOR and reading it back. they also show that the in-circuit test of D16 doesn't catch this issue, so it's somewhere else
<wpwrak>
you saw the last bits of my ramblings ? up to and including "these are tricky. [...]" ?
<wolfspraul>
I think I got it ... "visually detectable soldering problem"
<wolfspraul>
I'll read the weblog in a bit
<wolfspraul>
I already mentally bookmarked "follow Werner's test plan on Monday" :-)
<wolfspraul>
the details I will lookup then :-)
<wpwrak>
hehe, good :)
<wpwrak>
in any case, with fix2b and the testing, M1rc3 made a lot of progress. i think that fixed 75% or 80% of the "cluster", didn't it ?
<wpwrak>
maybe you should brag a bit about it on the list :)
<wolfspraul>
ah, I don't know. my todo is exploding everywhere and I need to keep my energy for the launch and marketing.
<wolfspraul>
plus there are still unknowns, I just wait for more data first
<wpwrak>
just to let people know what's happening and that something is happening. doesn't have to be a novel :)
<roh>
wolfspraul: heh. regret being a pioneer, holding only a knife and a folding shovel, standing in dirty boots somewhere in no-man's-land?
<wpwrak>
roh: there be no regrets ! ;-)
<lekernel>
1-2 months ?????
<wolfspraul>
:-) sorry about that. you took that out of context.
<wolfspraul>
of course everything we do the last weeks is prioritized to bring in the first day of sales.
<wolfspraul>
I meant that I can see Adam continuing on the 'long tail' of rc3 1-2 months _after_ the first day of sales, yes.
<wolfspraul>
and I hope that's a conservative estimate
<wolfspraul>
I hope it's 2 weeks. but I cannot always create more pressure on Adam, he is already working 70+ hours / week for over a month
<wolfspraul>
Adam even told me he takes a 1 week vacation in late September. The first in 2 years :-)
<wolfspraul>
I think he deserves it...
<wpwrak>
damn. vacation ! we should never have abolished slavery !
<wolfspraul>
we have a lot of rc4 verification parts in Taipei already. adv7181c, gates, 4.4v reset ics
<wolfspraul>
everything there
<wolfspraul>
just hours of the day still limited to 24... :-)
<wolfspraul>
Sebastien had a nice idea about the rtc. we can build a little daughterboard for the expansion header, and put a cheap rtc chip + cr2016 battery or so on it
<wolfspraul>
that could even be retrofitted by rc1/rc2/rc3 users who want it
<wpwrak>
so those ~2 months are what you predict will be the time until rc4 gerber out ?
<wolfspraul>
hmm
<wolfspraul>
too many unknowns now
<wpwrak>
(daughterboard) ah yes, why not. if it works well, you can then merge it into rc5 or so
<wpwrak>
okay, what's your "internal schedule" ? :)
<wolfspraul>
also we could make a few hundred UBB-style 'base-boards' for people to experiment with
<wolfspraul>
i have no internal schedule, all full power forward
<wolfspraul>
I think it depends on the launch, a lot
<wpwrak>
full speed ahead in no particular direction ? aw ... :)
<wolfspraul>
who carries the news? do we have a good shop landing page
<wolfspraul>
oh no
<wolfspraul>
of course towards the world's best video synthesizer
<wolfspraul>
so the variables
<wpwrak>
heh :)
<wolfspraul>
1. speed of sales
<wolfspraul>
2. discoveries on rc3 or rc4 design verification after we start selling rc3
<wolfspraul>
3. customer feedback from rc3
<wolfspraul>
4. potential larger customers who want to put in a big order
<wolfspraul>
If sales are super slow and nobody is interested in buying more, then we have more time for rc4 :-)
<wolfspraul>
then I will spend my time marketing the box, even traveling around to potential customers maybe
<wolfspraul>
systematically, the rc4 run should probably be about twice the size of rc3, so let's say 160 units
<wpwrak>
(slow sales = more time) very good ! :)
<wolfspraul>
but it could be less or more depending on the unknowns that slowly come in
<wolfspraul>
so how can I answer your question now: not at all
<wpwrak>
hmm, you'll need an assistant for adam then
<wolfspraul>
I should work on a great shop landing page
<wpwrak>
yeah, the shop is important, too
<wolfspraul>
Jon had some ideas, have to follow up etc and then also do it
<wpwrak>
lemme see what milkymist looks like now ...
<wpwrak>
sharism -> "Product not found!" :(
<wolfspraul>
it's impossible to say more now
<wolfspraul>
yeah sure ;-)
<wolfspraul>
even if I think about extreme cases, anything is possible. say a customer shows up who wants to buy 100 right away, and pre-pays 50%. how fast can we ship?
<wpwrak>
you shouldn't un-list out of stock items. give people something more useful. e.g., send them to a different distributor, tell them when you expect more, or how to get notified
<wolfspraul>
well, of course if we feel good about rc3, we can also use the exact same gerber and produce that
<wolfspraul>
why not
<wpwrak>
at least now we know more or less how to fish the bugs out of rc3 ;-)
<wolfspraul>
it's a bit additional risk, but if a customer wants to order and we feel we can produce, then that's the fastest way
<wolfspraul>
but this case is unlikely. in that case we could do the next smt as early as 3 weeks later. better not tell adam.
<wolfspraul>
gerber out monday. :-)
<wpwrak>
yes, sure. you wouldn't let a big order slip if you can help it
<wpwrak>
;-)
<wolfspraul>
so anyway, way too many unknowns now.
<wolfspraul>
if gerber out is Monday and it's the same pcb, who knows maybe smt the following Monday :-)
<wpwrak>
"sorry, your vacation just got canceled. and about those 70 hours week, i think you should also put in the weekends" ;-)
<wolfspraul>
then ready to ship Wednesday?
<wolfspraul>
I hope Adam doesn't read this out of context, everybody will get their shocks...
<wolfspraul>
@adam: WE ARE JOKING!
<wpwrak>
;-)
<wolfspraul>
I like the daughterboard idea
<wolfspraul>
we should make little breakout boards for people to experiment
<wpwrak>
what are the headers ? 0.1" ?
<wolfspraul>
maybe first just the naked board that exposes the pins to the next level, free for the soldering attack
<lekernel>
wpwrak: 2.54mm yes
<lekernel>
standard stuff
<wpwrak>
very good
<wpwrak>
wolfspraul: 0.1" shouldn't really need anything additional
<lekernel>
what can make sense is a optoisolated breakout
<wpwrak>
wolfspraul: it's all standard items from there on
<wpwrak>
lekernel: yes, something that adds circuit
<wolfspraul>
standard or not a little board is helpful
<wolfspraul>
otherwise everybody has to do that first
<wpwrak>
lekernel: but just plastic, copper, and FR4 ? naw.
<wolfspraul>
and optoisolation is a good idea too
<lekernel>
inexperienced people do all sort of wrong stuff with electronics (inductive loads without flywheel diodes, short circuits, overvoltages, ...) which could easily damage the fpga when connected directly
<wpwrak>
wolfspraul: why do you need the board ? you just get a standard connector, add a ribbon cable, and connect it to whatever you want
<wolfspraul>
a board provides space
<wolfspraul>
optoisolation argument it strong imho
<wolfspraul>
is
<wpwrak>
wolfspraul: most people who are at a point where they would want to attach their own stuff to an M1 are probably at the point where they have solved the basic prototype board problem in some way
<wpwrak>
wolfspraul: breadboard, pre-patterned PCB, DIY PCB, there's a lot of options
<wpwrak>
and 80% will probably just connect to their beloved arduino ;)
<wpwrak>
lekernel: how's 5V compatibility ? (-:C
<wolfspraul>
there's a big note on the silkscreen about that :-)
<wolfspraul>
I still think a safe starting point would be great
<wpwrak>
wolfspraul: arduino in circle, with a line across it ? ;-)
<wolfspraul>
since this will interact with the ic design running in the fpga, there will necessarily be a lot of 'live' development with the chips wired up to the fpga
<wolfspraul>
a cheap standard expansion daughterboard may help
<wpwrak>
i don't quite see the use case. you already have a perfect standard connector. probably the first thing people learn to connect to :)
<wpwrak>
and a cable that goes out of the M1 to your circuit is much safer than a PCB that hangs somewhere inside, over the rest
<wolfspraul>
ok, maybe we document a few approaches, together with digi-key part numbers etc.
<wolfspraul>
that may provide a similar effect
<wpwrak>
yeah. it's really basic.
<wolfspraul>
there are 2 headers there, 2*8 and 2*9
<wolfspraul>
I never know which is which. I think one is related to vga, the other one goes to the fpga.
<wpwrak>
if you want to make little extra boards, make them do something useful. e.g., galvanic isolation (btw, there's more interesting stuff than just opto. i'll try to dig it out after breakfast)
<wpwrak>
or if you want, an arduino interface board ;-) complete with example code that lets the M1 blink a LED on the arduino and that lets the arduino send a "hello world" to the M1, for rendering ;-)
<wpwrak>
maybe you can get tuxbrain interested :)
<lekernel>
wpwrak, there's midi to talk to arduinos
<lekernel>
no need to open the case, just connect the midi port to the arduino serial pin with a resistor
<wolfspraul>
J3 is 2*9 and connected to some audio/video codec wires
<wpwrak>
lekernel: both ways ?
<lekernel>
yes
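For the MIDI route, the wire format is plain asynchronous serial, so the Arduino side only needs its serial port set to the MIDI rate. The sketch below is an illustration only, with hypothetical helper names; it relies on two facts from the MIDI 1.0 standard (31250 baud with one start and one stop bit per byte, and a three-byte Note On message) and says nothing about how the M1 software actually drives its MIDI port.

```python
MIDI_BAUD = 31250  # bits per second, per the MIDI 1.0 standard


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Build a MIDI Note On message (status byte 0x90 plus channel)."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0..15")
    if not 0 <= note <= 127 or not 0 <= velocity <= 127:
        raise ValueError("note and velocity must be 0..127")
    return bytes([0x90 | channel, note, velocity])


def byte_time_us(baud: int = MIDI_BAUD) -> float:
    """Time on the wire for one byte: 1 start + 8 data + 1 stop bit."""
    return 10 * 1_000_000 / baud
```

At this rate a three-byte message takes just under a millisecond, which is fine for blinking a LED or passing short text but not for bulk data.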
<wolfspraul>
ah no both are 2*9
<wpwrak>
lekernel: perfect
<wolfspraul>
J21 goes to the fpga
<wolfspraul>
it says 'not 5V tolerant' but it seems J21 provides both 5V and 3.3V
<wpwrak>
maybe add a jumper in rc4, to enable 5 V only if you're really sure you know what you're doing  ?
<lekernel>
yes, the 5V pins are here to drive the sync signal of a hypothetical dual screen output
<lekernel>
no, if you touch this connector either know what you are doing or use an optoisolated adapter
<wolfspraul>
I don't see what's wrong with providing some power. people have to read the schematics anyway. :-)
<wpwrak>
nothing wrong with providing power. but mistakes happen .. :)
<wolfspraul>
another thing is how the pins going to the fpga are chosen. I understand they are not all the same. different banks or so? voltage domains? not sure. I think we will hear over time "too bad that wire XXX is not on the expansion header..."
<wolfspraul>
or I hope we hear because it means people build expansions :-)
<wpwrak>
yeah. if everyone is quiet, that doesn't always mean they're happy :)
<wpwrak>
lekernel: no optics. and they also provide power :)
<wpwrak>
lekernel: also exist with 4 data channels
<lekernel>
yes, I received some AD spam about it already
<lekernel>
they're pretty cool
<wolfspraul>
wpwrak: what's special about this and what can you build with it?
<kristianpaul>
ribbon cable :)
<kristianpaul>
BUT, in my case for example a PCB is nice if it can hold something and i can finally put that acrylic thing on the top again..
<wpwrak>
wolfspraul: (special) high speed and they also transfer power
<wolfspraul>
which applications does that translate to?
<wpwrak>
anything that needs galvanic separation plus a bit of circuit on the other end. e.g., for protocol processing. for example USB to some other serial protocol. one side has a USB device chip, the other some other MCU (or whatever). the USB side powers both.
<wpwrak>
(via this isolator)
<kristianpaul>
[12483.356146] usb 1-3: usbfs: usb_submit_urb returned -121
<kristianpaul>
bad fedora ;)
<kristianpaul>
hum, from what i can see it is related to libusb and kernel indeed
<kristianpaul>
but seems kinda old issue...
<kristianpaul>
kinda annoying bug indeed
<kristianpaul>
the cost of trying the fancy gnome3, xD
<kristianpaul>
argggh
<kristianpaul>
even at full speed the same message