#milkymist on 2011-07-16 — irc logs at freenode.irclog.whitequark.org

04:01 <aw_> updated: applied xiangfu's new reflash_m1.sh, now..rendering...:-) data partitions exists there. :-)

04:03 <aw_> thanks xiangfu!

04:04 <wolfspraul> nice

04:04 <aw_> note that this script enables the 'verify' setting, although verifying takes too long when you want to speed things up.

04:05 <aw_> but xiangfu and sebastien were discussing whether to do the 'verify' step in the test program instead, to reduce time. this is a good idea from sebastien. :-)

04:06 <aw_> so well...i can enable or disable while testing rc3 board when i need to speed up on testing the same board to investigate. :-)

04:07 <aw_> so with 'verify' setting is good to reflash a fully NEW mounted board. :-)

04:08 <aw_> now re-test in test program to see if still all works well too.

04:08 <nixfreak> what would be a good way to have an embedded device rip, transcode, and play video but still operate very fast

04:09 <nixfreak> but then execute very fast especially ripping and transcoding

04:10 <wolfspraul> I don't understand what you wrote about verify...

04:11 <wolfspraul> nixfreak: transcode in which way?

04:11 <wolfspraul> sounds like the Milkymist One could do it, no? video-in, transcode, vga-out? but depending on what you want to do exactly a lot of programming may be needed :-)

04:13 <aw_> wolfspraul, just forwarded their discussions in email. :-) you will know.

04:13 <kristianpaul> wolfspraul: i thought the same, that's why i pointed nixfreak to join the channel and confirm it himself :)

04:15 <aw_> new reflash_m1.sh script: http://pastebin.com/tstnwy9E

04:15 <wolfspraul> nixfreak: get a Milkymist One, start hacking :-)

04:18 <wolfspraul> aw_: remember the audio_white_noise ogv you created yesterday? Will you upload it into the wiki or should I do it?

04:18 <wolfspraul> we should collect the various testing documents and materials, and that's a nice one...

04:18 <nixfreak> basically wanna create a device that can rip optical media and then encode to say x264 with a very simple UI

04:19 <wolfspraul> can you be more precise? what do you mean with 'rip optical medium'?

04:19 <wolfspraul> what do you want to do with the encoded x264 stream? save somewhere? stream over the internet?

04:20 <nixfreak> for consumers that are sick of dvd players but also want to archive the video

04:20 <nixfreak> and be able to view the video in what ever resolution the tv / monitor is set to

04:20 <aw_> wolfspraul, hi you can, pls. actually i wanted to make a page to show those two ogv that can show up how difference they are. :-)

04:21 <wolfspraul> is there a wiki page already that collects rc3 tests?

04:21 <wolfspraul> I will add it there

04:21 <aw_> i am working others... :-)

04:21 <aw_> second

04:22 <wolfspraul> nixfreak: sounds possible, but I still don't fully understand what you want. You want to buy such a device? You want to manufacture one? You want to hack Milkymist One to be such a device?

04:22 <wolfspraul> or you just want to tell us that you think you have a cool idea? :-)

04:22 <aw_> wolfspraul, you could create a subtitle under http://en.qi-hardware.com/wiki/Milkymist_One_run_3_schedule

04:22 <wolfspraul> I'll check

04:22 <aw_> then later I link them all. :-)

04:23 <wolfspraul> ah yes, I'll just throw it in there for now

04:23 <wolfspraul> let's not create too many pages, rather a few and longer pages

04:23 <nixfreak> I was thinking of FPGA to design/create this project then was told to try this channel

04:24 <wolfspraul> yes you are definitely in the right place

04:24 <wolfspraul> but what do you want to do now?

04:24 <nixfreak> I guess ask some advice what would be good way to implement this

04:25 <wolfspraul> what do you mean with 'implement'?

04:25 <wolfspraul> one way to set this up would be just to take a notebook and some software, no?

04:25 <nixfreak> right

04:26 <wolfspraul> are you trying to do this for yourself at your home, and then you are done. or do you want to learn about FPGA hacking? or are you trying to manufacture and sell a dedicated embedded device to perform this function?

04:26 <nixfreak> but in the end it should be embedded and small as possible

04:26 <wolfspraul> 'should'?

04:26 <wolfspraul> :-)

04:26 <wolfspraul> can you just get to the bottom line

04:26 <nixfreak> yes learn and create

04:26 <wolfspraul> many things 'should' be this or that way

04:26 <wolfspraul> tell kristianpaul :-)

04:26 <wolfspraul> GPS 'should' work by now, no?

04:27 <wolfspraul> what is your background?

04:27 <wolfspraul> student?

04:27 <wolfspraul> learning? what?

04:27 <nixfreak> not a student ,self taught on many things

04:27 <wolfspraul> have you done FPGA and Verilog programming before?

04:27 <nixfreak> no

04:27 <wolfspraul> how much time do you want to invest in this?

04:28 <nixfreak> as much as it takes

04:28 <wolfspraul> it sounds like get a notebook, set it up, and enjoy :-)

04:28 <wolfspraul> or... get a Milkymist One and start hacking

04:28 <wolfspraul> but those two paths are very different

04:29 <nixfreak> I like hacking code to learn

04:29 <wolfspraul> Milkymist One costs 499 USD + shipping, you can buy it in a few weeks, if you like

04:29 <wolfspraul> but then you will need _a lot_ of hacking, months, maybe years

04:29 <wolfspraul> just saying...

04:30 <nixfreak> did say I was going to create this over night (:

04:30 <nixfreak> didn't

04:30 <wolfspraul> why do you want to embard on this project?

04:30 <wolfspraul> embark

04:32 <nixfreak> cause I would like to see a cheap price for media implementation especially for older folk that get frustrated with complex controls

04:32 <kristianpaul> (GPS) yes it will

04:33 <wolfspraul> "cheap price", ok. slowly you are telling us more about your motivations.

04:33 <wolfspraul> cheap price will require big investment

04:33 <wolfspraul> you can make such a device for 20 USD, I'm sure. but only if someone invests millions of USD or more.

04:34 <wolfspraul> that's why most users tend to use standardized mass-market hardware, because even though it is overpowered, it is still cheaper and quicker to apply to a particular problem.

04:34 <wolfspraul> at least someone has invested billions of USD to drive performance up in those devices

04:35 <wolfspraul> so if you look for "cheap price", that's a big question you have to answer first. cheap for whom? for yourself? for the world? who invests? why? why not use existing devices? etc.

04:36 <nixfreak> cheap as in $100.00

04:36 <wolfspraul> so far my feeling is milkymist one is not the right thing for you, but I could be wrong. and of course I will happily sell you one :-)

04:36 <wolfspraul> yes, someone has to invest millions of USD

04:36 <wolfspraul> 100 USD retail price itself is no problem

04:36 <nixfreak> just looking at my options

04:36 <wolfspraul> sure

04:36 <wolfspraul> and I try to give you free consulting :-)

04:37 <wolfspraul> if it's just for yourself, get a notebook and try to set it up that way. problem solved.

04:37 <nixfreak> i appreciate it

04:37 <wolfspraul> if you are serious about learning Verilog and FPGA hacking, fine, get a Milkymist One and start

04:37 <wolfspraul> the 500 USD will be nothing compared to the thousands of hours you will sink into that over the next years

04:38 <nixfreak> yep I understand

04:38 <aw_> lunch time here, back soon.

04:38 <wolfspraul> if you are hoping that you somehow can contribute to such a device on the market for 100 USD, you still need to find financially strong partners at some point that invest millions in this, and don't mind doing that (i.e. they have no better alternative for their money, called 'opportunity costs')

04:38 <wolfspraul> investments in technology are hard because there is such tough competition

04:39 <wolfspraul> so, given what I currently understand about you, I would think you should categorize that option under "not very likely to actually happen"

04:40 <wolfspraul> you need a very thorough market understanding (sales trends etc) to make such an investment decision

04:40 <wolfspraul> unless you have a lot of industry experience already, and speak as an insider (which you are clearly not :-)), I'd say don't do it, forget it, it's not an option

04:41 <wolfspraul> you will never get that 100 USD device unless Samsung or LG or whoever one day decide to make one, for whatever reason

04:43 <nixfreak> only if I don't try

04:43 <wolfspraul> :-)

04:43 <nixfreak> have a good one thx for the advice

04:43 <wolfspraul> oops

04:44 <wolfspraul> too much realidad maybe. I was just looking up a nice Wikipedia link...

04:44 <wolfspraul> there is something about hardware that attracts people to do stupid things.

04:44 <wolfspraul> I'm wondering which percentage of sparkfun sales (for example) ends up in failed projects.

04:45 <wolfspraul> 'failed' is hard to define probably, maybe the learning experience from the failure is all that matters

04:47 <wolfspraul> kristianpaul: this was the link I wanted to send him :-) http://en.wikipedia.org/wiki/Sysiphus

04:47 <wolfspraul> kristianpaul: sorry in case I wasn't friendly enough to this guy, since you brought him here...

04:48 <kristianpaul> no no, i just pointed to milkymist because he asked about video related stuff with fpgas in #fpga

04:48 <wolfspraul> I doubt he will buy an m1, I doubt he will ever contribute 1 line of code anywhere, and I doubt he will ever get that 100 USD x.264 encoder made. Now he can prove me wrong...

04:48 <kristianpaul> :-)

04:48 <wolfspraul> I know people hate it when someone says "you will never" :-)

04:48 <wolfspraul> he he. being nasty.

04:51 <kristianpaul> realidad, is nasty anyway

04:53 <wolfspraul> I think it's fun. don't fight mother nature.

04:54 <wolfspraul> without reality problems, thinking would be easy, we could all just get drunk and dream.

04:54 <wolfspraul> but once reality hits, and you still want your idea to come true, that's hard

04:54 <wolfspraul> 1000 people want to build an airplane, 1 makes one that can actually fly. no?

05:19 <kristianpaul> yes

05:34 <kristianpaul> 404 http://www.milkymist.org/snapshots/latest/reflash_m1.sh ...

05:37 <wolfspraul> the whole snapshots is gone, maybe they cleaned up

05:37 <wolfspraul> 'snapshots' is not a good name

05:37 <wolfspraul> we need one name, and a clear testing and release process

05:37 <wolfspraul> so we don't push out broken updates to users

05:37 <aw> update: http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/2f-results

05:40 <wolfspraul> looks good

05:40 <wolfspraul> what's next?

05:41 <aw> btw we can cancel the test about Press [0] in test program, since the new remote control doesn't have [0] push key. I'll let xiangfu know this.

05:42 <aw> next to capture full screen

05:43 <wolfspraul> no '0' on remote?

05:44 <aw> at the left up corner of remote control is hex code '0'

05:44 <wolfspraul> http://en.qi-hardware.com/wiki/Milkymist_One_accessories#T003

05:44 <kristianpaul> arghh, second time something happen with rtems, and all get stucj in I: Booting...

05:44 <kristianpaul> stuck*

05:44 <wolfspraul> I don't think in the test we need to press that many buttons anyway.

05:44 <aw> so i used my old remote to push '0'

05:45 <wolfspraul> two or three is enough. If those work I cannot imagine what other things we test, unless we press each button on the remote (then we test the remote), but that's a waste imo.

05:45 <wolfspraul> ok, we need to fix that '0' test then, something is wrong there

05:45 <aw> yeah...can reduce some to 5 keys, well... i pressed quickly though. no big deal.

05:45 <wolfspraul> but it's still stupid and unfocused

05:46 <wolfspraul> two or three is enough, then we know the numbers are correctly getting across and we test the wires, transceiver, even rc-5 implementation in the fpga

05:46 <wolfspraul> if we want to test the remote control itself (which I think we don't need to), then we need to press each button on the remote

05:47 <wolfspraul> anyway, small detail
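
To make concrete what "the numbers are correctly getting across" means for an RC-5 receiver, here is a minimal sketch of how a 14-bit RC-5 frame splits into its fields (two start bits, one toggle bit, five address bits, six command bits, most significant bit first). It assumes the frame has already been Manchester-decoded into an integer. The function names and the key codes in the loop are hypothetical and are not taken from the Milkymist test program or the actual remote.

```python
def encode_rc5(address, command, toggle=0):
    """Pack fields into a 14-bit RC-5 frame (both start bits set)."""
    return (0x3 << 12) | ((toggle & 0x1) << 11) | ((address & 0x1F) << 6) | (command & 0x3F)


def decode_rc5(frame):
    """Split a 14-bit, already Manchester-decoded RC-5 frame into its fields."""
    if not 0 <= frame < (1 << 14):
        raise ValueError("RC-5 frames are 14 bits wide")
    return {
        "start": (frame >> 12) & 0x3,     # two start bits
        "toggle": (frame >> 11) & 0x1,    # flips on every new key press
        "address": (frame >> 6) & 0x1F,   # 5-bit device address
        "command": frame & 0x3F,          # 6-bit key code
    }


# Hypothetical key codes: checking a handful of keys already exercises the
# wires, the receiver, and the frame decoding path.
for key in (1, 3, 5):
    fields = decode_rc5(encode_rc5(address=0, command=key))
    print(key, fields)
```

A test that sees two or three distinct command values arrive intact has covered every bit position in the command field that those values differ in, which is why pressing every button adds little beyond testing the remote itself.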

05:47 <wolfspraul> ok, full screen test now. and then D16, the one I'm waiting for :-)

05:53 <kristianpaul> lekernel: the right way to sync a core with a lower clock than the m1 soc using wishbone is to re-use the wishbone handshake but sync the control signals, right? like on page 5 of this pdf http://www.edn.com/file/17561-310388.pdf?force=true

05:57 <aw> do i need to always type IP address/Netmask/Gateway/DNS ?

05:58 <aw> btw, i entered m1 by 'ftp' now. :-)

06:04 <aw> http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_2f_Screenshot-00.png

06:05 <aw> background is my living room. :-)

06:36 <wolfspraul> screenshot-02 reminds me that it would be cool if we had better brightness defaults, or auto-brightness...

06:36 <wolfspraul> but I like that we have brightness and contrast adjustment on the keyboard now, I think I saw that somewhere. maybe remote next...

06:39 <wolfspraul> aw: I uploaded the audio noise videos, see http://en.qi-hardware.com/wiki/Milkymist_One_run_3_schedule#Audio_Noise

06:40 <wolfspraul> under the rc3 video, I document it was measured between C21 and R18. Was that how you measured the rc2 video as well? I have no information about that under the RC2 video. If you don't remember where you measured, no big deal... I think the key point is documented.

06:40 <aw> wolfspraul, yeah...good video in one column. tks.

06:40 <wolfspraul> one row

06:40 <wolfspraul> do you remember the test points for the rc2 video?

06:41 <aw> yeah..i should set my brightness firstly...

06:41 <aw> you can write the same

06:42 <aw> the audio signals acts alternatively so that it doesn't matter on which C21 pad though. :-)

06:44 <aw> yes, good that i just checked the video I took on rc2 is the same on C21. :-)

06:48 <aw> i modified. :-)

06:49 <wolfspraul> ok great

06:49 <GitHub171> [extras-m1] yizhangsh pushed 1 new commit to master: http://bit.ly/ozVVMY

06:49 <GitHub171> [extras-m1/master] modified box die-cutting files and added size markers - Yi Zhang

07:35 <aw> soldered three wires relevantly on TP36 (PROGRAM_B), TP35 (Done), TP37(RP#) to get ready to scope.

08:34 <aw_> now I uploaded two waveforms without D16:

08:35 <aw_> http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_2f_ch1-PROGRAM-B_ch2-DONE_no_D16.JPG

08:35 <aw_> channels: what is what as file name.

08:36 <aw_> http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_2f_ch1-RP_ch2-DONE_no_D16.JPG

08:36 <aw_> next steps: I am going to scope with D16. moments...

09:01 <aw_> i found it: yes. the diode D16 was reversed when I was at the factory. I must have been nervous and not calm yesterday. hehe...stupid adam & smart werner, yes, there's exactly a bar marking the cathode on the diode's body. and I also trained the factory to read the footprint's bar mark as anode. that is how we got this result.

09:02 <aw_> but...the bad thing now is that I tested 10 times and got 5 random NO-reconfigurations too. now going to scope again though.

09:03 <aw_> didn't keep calm yesterday. :(

09:03 <wolfspraul> so wait. now you are saying D16 'works', but now you still get the boot error it was supposed to fix?

09:03 <wolfspraul> so it doesn't work?

09:05 <aw_> sorry that again , i changed my words about NO-reconfigurations. keep calm again.

09:05 <wolfspraul> huh?

09:06 <wolfspraul> I'm calm, just trying to understand :-)

09:06 <wolfspraul> for importance - the "no-reconfiguration" bug is a critical bug. Basically it means that m1 will not boot, right?

09:06 <wolfspraul> that is very important to be properly fixed, 100% of the time. So when you see this bug, that's a serious issue.

09:06 <aw_> now: D16 soldered correctly. D2 ON slightly when power-on and OFF.

09:07 <wolfspraul> yes but that's wrong

09:07 <aw_> no no...wait ..let me finish my words. :-)

09:07 <wolfspraul> ok :-)

09:07 <wolfspraul> that's one of the most important bugs fixed with rc3... (together with audio noise)

09:08 <aw_> 1. D16 soldered correctly. D2 ON slightly when power-on and OFF.

09:09 <aw_> 2. I powered-on ten times, D2 ON slightly and goes OFF well.

09:09 <wolfspraul> I don't understand

09:09 <wolfspraul> can we focus on product behavior, either correct product behavior, or incorrect product behavior

09:09 <wolfspraul> do you see incorrect product behavior?

09:11 <aw_> 3. among those ten times, I pressed SW2 to try to boot and enter gui, then 5 / 10 times D2 didn't ON.

09:11 <wolfspraul> that's a very serious bug

09:11 <wolfspraul> :-)

09:12 <wolfspraul> so D16 has the right polarity now, but you say that basically the reset IC + diode does not fix the boot bug?

09:12 <aw_> let me scope finished again firstly. :-)

09:12 <wolfspraul> ok but focus on understanding product behavior: 1) correct 2) incorrect

09:12 <wolfspraul> otherwise we collect data points that are meaningless

09:13 <wolfspraul> it sounds like you have D16 back on now, and you believe it's correct (polarity), but the product behavior is still INCORRECT

09:13 <wolfspraul> that is, the boot bug is not fixed

09:13 <xiangfu> upload them there: http://milkymist.org/updates/2011-07-13/for-rc3/, include the new IR test. only needs 1,3,5

09:14 <wolfspraul> but with our rc2 when we tested the reset IC+diode, the bug was fixed. What is the difference now?

09:14 <wolfspraul> that's my thinking...

09:15 <aw_> wait...later, from the new waveforms, I can tell if the "forward voltage" of D16 is too high or too marginal.

09:15 <wolfspraul> from your #1-#3 above, you seem to say that in 5/10 times, you cannot boot even though the D2 is not lit dimly before?

09:16 <wolfspraul> from before I only know that basically whenever the D2 was dimly lit after power-on, it would not boot. but whenever it was clearly off, it would boot.

09:16 <wolfspraul> if your description is accurate and I understand it correctly, we have a third case now?

09:16 <wolfspraul> D2 goes off entirely, but m1 still doesn't boot?

09:16 <GitHub109> [autotest-m1] xiangfu pushed 1 new commit to master: http://bit.ly/rcNar1

09:16 <GitHub109> [autotest-m1/master] tests_ir: reduce test buttons to 3 - Xiangfu Liu

09:17 <wolfspraul> aw_: to guide your work, focus on correct/incorrect behavior of rc3. rc3 must always boot into the GUI after power-on and button press, 100% of the time.

09:17 <wolfspraul> not 99% or anything, 100% of the time

09:17 <wolfspraul> so the moment you have an rc3 not booting once, something is wrong

09:19 <wolfspraul> so yesterday in the factory it didn't boot at all because the D16 polarity was wrong? will D16 have the correct polarity on the other 89 boards?

09:26 <aw_> xiangfu, thanks.

10:32 <aw_> first sum up a little from yesterday until now: (noticed that reset ic is always there on rc3)

10:34 <aw_> 1. without D16, m1 rc3 can boot up successfully at least more than 20 times

10:36 <aw_> 1.1 but not intensively to power up

10:38 <aw_> 2. without D16, after I scoped with a prober stuck on PROGRAM_B, it still can boot up 10 times.

10:41 <aw_> 3. with D16 (soldered), I didn't scope a prober stuck on PROGRAM_B, it got 5 / 10 times failed on boot up, so i still

10:44 <aw_> 4. with D16, scoped with a prober stuck on PROGRAM_B again, it can still boot up until my 26th power up, then got D2 keeps ON.

10:46 <aw_> now m1 can't reconfigure. :(

10:47 <aw_> item 3 reminded me that last time I reconfigured 1000 times successfully but forgot to also test booting up 1000 times.

10:49 <aw_> we indeed didn't do boot up 1000 times. now I have no idea...later to see if m1 can restore.

10:53 <aw_> i 'feel' an equivalent capacitance in scope's prober (which connected to PROGRAM_B) will easily let m1 get into NO reconfiguration status. But not proved.

11:01 <lekernel> if the initial configuration works, this is certainly something which is fixable in software

11:02 <lekernel> so that's extremely pesky, but not a big worry

11:15 <wolfspraul> ok that's messy, we need to clear this up

11:15 <wolfspraul> if we are unsure whether the way we tested this on rc2 actually was good, then we need to repeat the test on the rc2 board, including full bootup

11:15 <wolfspraul> maybe not 1000 times, but 100 or 200 should be enough
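
As a rough guide to how many clean boots are "enough", here is a minimal sketch of the standard zero-failure bound: if n boots in a row succeed, the true failure rate is below 1 - alpha^(1/n) with confidence 1 - alpha. It assumes every power-up is an independent trial with the same failure probability, which may not hold for a bug that depends on the power ramp. The function name and the 95% confidence level are illustrative choices, not something from this discussion.

```python
def max_failure_rate(n_success, confidence=0.95):
    """Upper bound on the per-boot failure rate after n_success boots with zero failures.

    Solves (1 - p) ** n_success = 1 - confidence for p.
    """
    if n_success <= 0:
        raise ValueError("need at least one successful boot")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_success)


for n in (100, 200, 1000):
    bound = max_failure_rate(n)
    print(f"{n} clean boots: failure rate below {bound:.2%} (95% confidence)")
```

Under those assumptions, 100 or 200 clean boots can only rule out failure rates of a few percent, so a run of that length catches a frequent failure quickly but says little about a rare one.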

11:16 <wolfspraul> then I'm surprised that I guess rc3 booted fine without D16. that makes me wonder why we added D16 in the first place and whether the problem is fixed if we simply remove it.

11:16 <wolfspraul> which is something we can also verify on the rc2 board, after the full bootup test, by removing the diode there...

11:16 <wolfspraul> after that, we need to stop testing irrelevant cases on rc3, we should only focus on testing the one and only design we plan to manufacture and sell

11:17 <wolfspraul> that means we don't need to find out that the behavior is different when we keep the scope connected, because our users will not keep a scope connected :-)

11:17 <wolfspraul> but if we don't even know which design we actually think will fix the bug, then that introduces some uncertainty that makes us keep the scope connected all the time, I guess

11:18 <wolfspraul> finally, it's surprising that the rc3 board is now in a 'dead' state, hopefully we can find out which state this exactly is and recover

11:18 <wolfspraul> aw_: does this all make sense?

11:22 <aw_> wolfspraul, yeah...i stop and slow a bit. need to think steps on rc2 test again.

11:23 <wolfspraul> good. and also see my other points - why do we need D16 at all? why did rc3 boot without D16? stop testing irrelevant cases. how to recover rc3 board now.

11:24 <wolfspraul> seems those are all valid questions, at least to me... and without answering them we are just producing something we don't really understand.

11:31 <wpwrak_> aw_: (D16 reversed) that was an easy guess ;-)

11:34 <lekernel> if it's the same problem as I had twice, reflash

11:36 <aw_> wpwrak_, yeah :(

11:37 <aw_> lekernel, what did you mean about last sentence? :-)

11:37 <lekernel> to recover your board that no longer boots, reflash it

11:38 <wolfspraul> lekernel: why did we add D16 in the first place? we believe the circuit is incorrect without it? How come Adam had no booting problems while he had D16 removed?

11:38 <lekernel> a funny thing I noticed is that reflashing only the standby bitstream doesn't seem to help

11:38 <aw_> lekernel, yeah

11:38 <lekernel> I suspect some weird behaviour of the xilinx silicon

11:38 <lekernel> try reflashing standby only first, to confirm

11:39 <lekernel> then if it still doesn't boot (D2 dimly lit) reflash the rest

11:39 <lekernel> I'll use altera next time, I'm tired of this kind of things

11:40 <wpwrak_> wolfspraul: documenting behaviour with and without scope can be quite relevant, particularly while hunting for a problem that's not fully understood yet

11:40 <wolfspraul> I'm not against it, but we will run into Altera specific bugs for sure.

11:41 <lekernel> i've heard a lot less negative comments and errata entries about any altera chip than about the spartan6...

11:41 <wolfspraul> wpwrak_: yes, that's why we first need to define what we are working on here. do we have doubts about the design/schematic? or the particular rc3 board under test?

11:43 <wpwrak_> wolfspraul: does adam have enough boards yet to even make this distinction ?

11:44 <wolfspraul> (that's unrelated to this particular run here) I am open minded about Altera, we just need to keep in mind that between Altera and Xilinx, over time the 'leadership position' will probably bounce back and forth between them. So we may end up with bad timing and always regret our switch later...

11:44 <wolfspraul> another thing to consider is that once we sold a board, we have to support it for good, in software updates. So our build and test/release process will get more difficult, if we build up a trail of chip changes in our product history.

11:45 <lekernel> with D16 present and with the correct polarity, does the initial configuration always work? (i.e. D2 is totally off)

11:45 <wolfspraul> other than that I'm cool about Altera, it proves the portability point, may lower barriers for some contributors, and gets us better chips

11:46 <wpwrak_> lekernel: the art of reading errata ;-) here's one i came across recently: http://www.cypress.com/?docID=27429 lovely items, particularly the very last one (#14) should warm the heart of any connoisseur of USB. and the "we won't fix any of these" isn't very encouraging either. particularly in a chip that has been something like 8 years on the market. (context: i heard of that one in a discussion on how to bring USB host to the ben)

11:46 <wolfspraul> lekernel: I think adam's tests are inconclusive to answer that

11:47 <lekernel> that's the one and only thing to test about the reset IC and D16

11:47 <wolfspraul> yes I know

11:47 <wolfspraul> like I said - lots of irrelevant tests

11:47 <wolfspraul> without D16, with probe connected

11:48 <wolfspraul> but what would help is if we build upon solid assumptions, for example it turns out when we did the test on rc2, we forgot to actually boot

11:48 <wolfspraul> you say even if there is a problem there it can be fixed in software, but that's speculative at this point

11:48 <lekernel> then there might be other layers of peskiness that randomly and rarely corrupt the flash, prevent booting after pushbutton press, etc. but those are highly unlikely to be related to the reset IC

11:49 <wolfspraul> ok, but let's collect hard data points, then we are not swimming in an ocean of uncertainty

11:49 <wolfspraul> maybe it just becomes a little lake of uncertainty :-)

11:50 <wolfspraul> lekernel: did you see my question above - why did we put D16 in in the first place? why did rc3 boot fine without it?

11:51 <lekernel> see mailing list archives

11:51 <lekernel> it's to make sure the flash is in reset at power up and doesn't register wrong commands

11:52 <lekernel> and all boards boot fine without it in 99% of the cases ...

11:53 <Vaati_> oh whats this?

11:54 <wolfspraul> lekernel: you mean boot fine without the entire reset_ic+diode fix?

11:54 <lekernel> yes

11:54 <wolfspraul> totally wrong.

11:54 <wolfspraul> you have seen how many boards?

11:55 <wolfspraul> 2?

11:55 <wolfspraul> I saw the other 38, each one of them

11:55 <wolfspraul> if you want I can make a little survey of the few active users :-)

11:55 <wolfspraul> I have this problem all the time

11:55 <wolfspraul> xiangfu has it, Jon has it

11:55 <wolfspraul> to turn on my m1, I always need to try to plug the power supply in several times, it's normal for me

11:55 <lekernel> yes, I have it as well, but it's still rare - on my board at least

11:56 <wolfspraul> fine but please don't say "all boards fine in 99%" if you actually have such little visibility

11:56 <wolfspraul> just trust me as your manufacturer telling you that it's a serious problem and I'm very happy and optimistic that we will fully fix it in rc3

11:56 <lekernel> ok, then some boards boot fine without it in 99% of the cases ...

11:56 <wpwrak_> Vaati_: at the moment, it's a bunch of people trying to figure out what's wrong with the ~90 boards that are just going through SMT :) boards with FPGA, video, and such.

11:56 <wolfspraul> maybe you got a lucky 2 of the 40 :-)

11:57 <wolfspraul> if there is any value I can provide, it's solid testing across the entire run

11:57 <Vaati_> hmmm milkymist sounds familiar -- I think I came across it once while googling for some stuff

11:57 <Vaati_> but I havent really ever looked into it

11:57 <wolfspraul> kristianpaul: do you have this problem sometimes? that you need to plug in the power of your m1 multiple times before you can boot?

11:57 <Vaati_> oh wow

11:57 <wpwrak_> wolfspraul: it's not luck. the guy who's in the best position to fix a problem always gets the boards that don't have it. one of the many corollaries of murphy's law :)

11:59 <wpwrak_> wolfspraul: isn't reliable boot more the domain of the reset chip ? D16 sounds more like preventing flash corruption (?)

12:00 <lekernel> one of the problems we can have if the flash reset isn't asserted at power up is that the flash registers a command and sends status info instead of data

12:00 <lekernel> this would make the fpga unable to read its bitstream, so no configuration, d2 dimly lit, etc.

12:01 <lekernel> d16 is here to assert the flash reset at power up

12:01 <lekernel> is that clear?

12:01 <wolfspraul> wpwrak_: I am pretty sure we are very close to fixing this bug once and for all. But I guess the bug doesn't want to go without some drama... a nasty bug indeed.

12:02 <wolfspraul> Adam's testing was a little unfortunate/unfocused, but in the next round we'll narrow it down and then it's done, I'm sure.

12:02 <wolfspraul> adam has only 1 rc3 right now, the only one that is fully soldered

12:02 <wpwrak_> lekernel: okay, makes sense. is this also possibly connected to the obscure flash corruption problem you had a while ago ? or is that one already off the table

12:03 <lekernel> yes, if the flash happens to register a write or erase command at power up (unlikely but who knows...), that might well corrupt it.

12:04 <lekernel> d16 is meant to prevent that as well ...

12:07 <wolfspraul> ok so we should focus on getting things to work with D16, because we believe D16 is what fixes the underlying bug

12:07 <wolfspraul> together with the reset ic

12:08 <lekernel> yes

12:08 <lekernel> we also connected the reset IC to PROGRAM_B to make sure the FPGA doesn't attempt to read the flash while it's still in reset

12:08 <wolfspraul> maybe Adam's next quick test should be the old rc2 board, unmodified (with reset+diode), and test the complete bootup, 100 times or so

12:08 <lekernel> that's all this reset circuit is about.

12:08 <lekernel> do you get it now?

12:09 <lekernel> I have explained it several times already :p

12:09 <wpwrak_> lekernel: perfect. now the question is just why it doesn't do that. pity you used a diode and not a 74xxx1G logic gate. "be wary of diodes" is one of the lessons we learned in the early GTA02 days at openmoko. there we also had such a reset "mixer" with a diode that caused all sorts of fun. (there, the problem was the reverse current)

12:09 <lekernel> reverse current? mh

12:10 <lekernel> impedance is some 10k there

12:11 <lekernel> yeah, 10K

12:11 <lekernel> the diode pulls down a 10K pull-up resistor to 3.3V

12:11 <wpwrak_> lekernel: the diode we had back then was a grotesquely ill-fitting choice. so the chance that you fell into that trap is small :)

12:11 <wpwrak_> lemme find the schematics ....

12:11 <lekernel> and on the other side it's 4.7K

12:12 <wpwrak_> gaah. too many pictures on that page :)

12:12 <lekernel> leakage current should make a totally negligible voltage drop across 4.7K

12:12 <lekernel> unless that's a very, very bad diode

12:14 <wpwrak_> PROGRAM_B_2 some sort of a nWAIT signal ?

12:15 <lekernel> then forward voltage is 0.33V at 0.1A for that diode

12:15 <lekernel> wpwrak_, no, it's like a reset signal that clears FPGA configuration and prevents configuration attempts when asserted

12:16 <lekernel> and for the flash, any voltage below 0.6V is treated as logic low... so no problem there either

12:17 <lekernel> pfff ...

12:18 <lekernel> there's a note in the flash datasheet that says "Sampled, not 100% tested." for the logic low voltage specification, whatever that means

12:19 <wolfspraul> that means there is a small change that they will later withdraw from claiming this feature

12:19 <wolfspraul> either the documentation hasn't been updated saying that it was 100% tested, or the feature is not reliable and should be removed from documentation

12:19 <wpwrak_> hmm, FLASH_RESET_N comes from the FPGA and goes back into it again (via PROGRAM_B_2)

12:19 <wolfspraul> s/small change/small chance/

12:19 <lekernel> wpwrak_, no, the diode D16 blocks that path

12:20 <lekernel> so when the FPGA asserts the flash reset, it only resets the flash (and not itself)

12:20 <lekernel> both are active low signals
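The one-way behaviour lekernel describes can be written down as a small truth-table sketch. It assumes, from the discussion above, that the reset IC output drives PROGRAM_B directly and reaches FLASH_RESET_N through D16, and it treats D16 as an ideal diode (no forward drop, no leakage). The function and argument names are illustrative only:

```python
# Ideal-diode model of the reset "mixer": both nets are active low.
# Topology is an assumption read from the log: reset IC -> PROGRAM_B directly,
# reset IC -> D16 -> FLASH_RESET_N, and the FPGA also drives FLASH_RESET_N.

def reset_nets(reset_ic_asserted, fpga_flash_reset_asserted):
    """Return (program_b_low, flash_reset_low) for an ideal D16."""
    # Only the reset IC can pull PROGRAM_B low; D16 blocks the FPGA's pull.
    program_b_low = reset_ic_asserted
    # Either source can pull the flash reset low (wired-AND of active-low signals).
    flash_reset_low = reset_ic_asserted or fpga_flash_reset_asserted
    return program_b_low, flash_reset_low

for ic in (False, True):
    for fpga in (False, True):
        print(ic, fpga, reset_nets(ic, fpga))
```

With a real diode the forward voltage and reverse leakage discussed elsewhere in the log break this ideal picture, which is what the single-gate suggestion is meant to avoid.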

12:20 <wpwrak_> ah yes, got it.

12:22 <wpwrak_> is the FLASH_RESET_N output open-drain ? (IO_L48N_MIDQ9_1)

12:24 <lekernel> before FPGA configuration it is high impedance with weak pull up resistor (inside the fpga)

12:25 <wpwrak_> wolfspraul: (sampled and other weasel words) data sheets usually keep a lot of things rather vague. nothing new there at all :)

12:25 <lekernel> after FPGA configuration it is push pull... can be made open drain, but shouldn't matter

12:28 <lekernel> theoretically, we should be able to remove R60, since it is already present in the FPGA

12:45 <wpwrak_> i'm not sure if i'm reading the A4809 data sheet correctly, but the output current seems to be extremely low

12:47 <wpwrak_> well, in some cases. now looking at the graphs on page 7, and things look fairly normal there (i.e. several mA instead of uA)

12:48 <lekernel> ah, indeed...

12:49 <lekernel> that could well be the problem

12:54 <lekernel> yeah 0.05mA

12:55 <lekernel> that has to pull low a parallel combination of 4.7k and 10k resistor (neglecting the diode impedance)

12:55 <wpwrak_> at low Vdd, though

12:55 <lekernel> we get low Vdd during the power ramp that is causing our headaches

12:56 <wpwrak_> ah, i see. yes, then that would be bad

12:56 <lekernel> that's an equivalent resistor of roughly 3.2k... 0.05mA drops 160mV (!) there
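The arithmetic behind the 3.2k and 160 mV figures can be checked with a minimal sketch. It neglects the diode impedance, as lekernel does, and the helper name is illustrative only:

```python
# Sanity check of the figures quoted above: 4.7k and 10k pull-ups in parallel,
# and the voltage a 0.05 mA sink current develops across that combination.
# The diode is treated as ideal (its impedance is neglected).

def parallel(*resistors_ohm):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors_ohm)

r_eq = parallel(4_700, 10_000)   # ohms
i_sink = 0.05e-3                 # amperes (0.05 mA)
v_drop = i_sink * r_eq           # volts

print(f"equivalent resistance: {r_eq:.0f} ohm")
print(f"voltage across it at 0.05 mA: {v_drop * 1000:.0f} mV")
```

This gives roughly 3.2k and 160 mV, matching the numbers in the log.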

12:57 <lekernel> ok we need larger resistor values

12:57 <lekernel> though the fpga's built-in resistor might cause us some trouble... argh

12:58 <lekernel> or, there's a P-channel version of the reset IC, with 2mA output

12:58 <wpwrak_> would be good to see the signals on a scope. let's hope adam's has at least three working channels :)

13:00 <lekernel> maybe when Vdd is low enough the flash wouldn't register any command at all anyway

13:00 <lekernel> and if we are lucky we could get away by removing R60 and using a higher R30 value

13:02 <lekernel> from the reset IC datasheet: "The value of R0 need to be selected in different application, typical value is 470KΩ"

13:02 <lekernel> we are 100 times below that

13:03 <lekernel> meh

13:03 <lekernel> nice catch ...

13:04 <wpwrak_> oopsie. 2 orders of magnitude off isn't so nice

13:05 <lekernel> the FPGA pull up resistor sources 200 to 500 uamp according to datasheet

13:05 <lekernel> there's still a chance we could get those boards to work :)

13:05 <lekernel> remove R60, use a large R30... might do the trick

13:06 <wpwrak_> worst case, replace U24

13:06 <lekernel> assuming we could find a footprint compatible replacement with the right characteristics

13:07 <wpwrak_> the resistor swapping game is a bit complicated by having a diode there. for rc4, you may want to consider some 74xxx1G logic gate. they're nice and clean :)

13:07 <lekernel> there's also the option of cutting the FPGA trace to insert a diode to block the on-chip pull-up, but that would be messy

13:07 <wpwrak_> oh there's tons of them ...

13:07 <lekernel> the logic gate may not work when the power supply voltage is too low

13:08 <wpwrak_> you can pick one that goes pretty low. lower than a D+R circuit :)

13:09 <lekernel> won't they have output current problems too?

13:09 <lekernel> also, the FPGA pull up resistor only gives 12 to 100 uamp at 1.2V

13:10 <lekernel> there's a good chance this could work, heh

13:10 <lekernel> aw, remove R60 and use 470K for R30

13:10 <lekernel> this should hopefully fix all power up issues

13:10 <lekernel> (see IRC log)

13:11 <aw> sounds have good news. :-) oaky...let me read them completely first. :-)

13:13 <wpwrak_> lekernel: not so quick ... what's the input leakage current of the FPGA ? (PROGRAM_B_2, to be precise)

13:14 <wpwrak_> (let's hope they did something sane there. not like samsung ...)

13:22 <lekernel> oh, it has a pull up too

13:22 <lekernel> so you can remove R30

13:22 <wpwrak_> for similar chips, on digi-key, search for reset, then pick category "PMIC - Supervisors", then type "Simple Reset/Power-On Reset", then narrow down by package and voltage

13:22 <wpwrak_> kewl. it's getting simpler all the time :)

13:23 <lekernel> it's the same pull up as the other pins +/- 20 uamp

13:24 <wpwrak_> and the NOR has some input leakage as well

13:24 <lekernel> aw, so just remove R30 + R60

13:25 <lekernel> NOR has 1 uamp leakage

13:25 <wpwrak_> good. that won't cause problems. neither up nor down.

13:25 <aw> lekernel, second..i read very slow. sorry. not completely finished read. :-)

13:33 <wpwrak_> lekernel: and for single gates, the 74AUP1G family is quite nice. works from 0.8 V to 3.6 V

13:33 <wolfspraul> lekernel: the R30/R60 change is in addition to keeping the original reset ic + D16 diode, right?

13:36 <lekernel> yes

13:41 <wolfspraul> aw: one more thing we are sure about now (lekernel and wpwrak_ please speak up if I'm wrong): we do not need to test any case without D16

13:42 <wolfspraul> D16 is an essential part of the circuit we have in mind, removing it makes no sense at all. we do not need to test that.

13:42 <wolfspraul> always keep D16 there (in correct polarity of course)

13:42 <aw> good catchs on current driven between fpga inside and reset ic's output analysis. As well as the parallel equivalent resistors(R60//R30), nice analysis!

13:45 <wolfspraul> if the R30/R60 removal works now (for rc3), do we still want another (cleaner?) solution for rc4, or can we keep this solution?

13:45 <aw> firstly, do i just reflash firstly standby image to restore after I replace a R30(470K) and remove(R60)?

13:46 <wpwrak_> wolfspraul: (keeping D16) yes, you want to keep D16 or the rc2 gremlins come back.

13:47 <lekernel> aw, yes, you can test the reflashing first

13:47 <aw> or just replace R30 and remove R60 to directly check if rc3 works back well?

13:48 <wpwrak_> wolfspraul: (rc4) i would recommend considering a 74AUP1G08 or 09 instead of the diode-based "wired and". that would provide a cleaner barrier than the diode does.

13:48 <lekernel> 1) reflash standby image, check if it works

13:48 <lekernel> 2) if it didn't work reflash all the rest

13:48 <lekernel> 3) remove R30/60 and check it works _reliably_, i.e. power cycle a few hundred times

13:48 <wolfspraul> and boot

13:49 <wpwrak_> wolfspraul: (rc4) of course, if things work perfectly now, you may not want to take the risk.

13:49 <wolfspraul> and put D16 back in, and no scope

13:51 <kristianpaul> wolfspraul: (cant boot rtems) is not too often, but so far happened two times, at least that i'm aware of

13:51 <wolfspraul> kristianpaul: ok so from your feeling that's in 5% of power on attempts? (that it won't power on)

13:52 <wolfspraul> or 10% or 1%? (just roughly)

13:52 <wolfspraul> in my board maybe 30%, I have a feeling it's a bit higher when the board is warm/hot

13:53 <wolfspraul> I'm not worried if I go somewhere that I won't be able to power it up, but I'm fully prepared that I may have to replug the power a few times. With that in mind it's bearable for me.

13:54 <aw> lekernel, wait. you wrote 'remove R30/60'. but i read back above is to 'remove R60 and use 470K for R30'. :-)

13:54 <lekernel> no, remove R30 and R60, do not put 470K back in

13:54 <lekernel> we figured out later that R30 is already included in the FPGA

13:55 <aw> okay. got it. an equivalent resistance as to be R30 role.

13:57 <wpwrak_> lekernel: oh, speaking of the diode's reverse current: it should be about 2 uA at 25 C, 2 mA at 100 C (fig. 4). so on a hot day, you could in fact replace it with a 0R ;-)
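To see why the 100 C figure amounts to replacing the diode with a 0R, here is a rough sketch of the voltage each leakage current would develop across a pull-up. It assumes the 4.7K value mentioned earlier in the log and a 3.3 V rail; the names are illustrative only:

```python
# Voltage that the diode's reverse leakage would develop across a pull-up.
# Assumptions: 4.7k pull-up and 3.3 V rail, both taken from earlier in the log.

R_PULLUP = 4_700   # ohms
V_RAIL = 3.3       # volts

def leakage_drop(i_rev_a, r_ohm=R_PULLUP):
    """Uncapped I*R drop produced by a reverse leakage current."""
    return i_rev_a * r_ohm

drop_25c = leakage_drop(2e-6)    # 2 uA at 25 C
drop_100c = leakage_drop(2e-3)   # 2 mA at 100 C

print(f"25 C:  {drop_25c * 1000:.1f} mV")
print(f"100 C: {drop_100c:.1f} V (exceeds the {V_RAIL} V rail)")
```

At 25 C the drop is only a few millivolts, while at 100 C the computed drop is larger than the supply itself, meaning the diode no longer isolates the two nets.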

14:00 <lekernel> mh, crap

14:00 <wpwrak_> lekernel: since you already have a wired-AND, even without the diode, it may not be much trouble

14:02 <aw> lekernel, i just modified xiangfu's script, let me if it's okay for reflash standby only >> http://pastebin.com/J0DPhHxk

14:06 <lekernel> yeah should be ok

14:06 <aw> alright

14:07 <kristianpaul> wolfspraul: not power on, thats a different issue, i meant it loads the bitstream but wont load rtems.. or stays loading, as if flash were corrupted,

14:07 <wolfspraul> how about power on?

14:09 <kristianpaul> its hard to tell, a guess will be unfair, but yes i remenber having this issue, but the no the last month..

14:11 <wolfspraul> ok, makes sense

14:11 <kristianpaul> may be not since jtag let me on a state in wich M1 is electrically power on, is not needed, as just fpga get in a standby state after i flash somthing new on it

14:11 <wolfspraul> I think every board will show it, percentage I am not clear (and never cared much because whether it's 1% or 80%, it needed to be fixed fully anyway)

14:12 <kristianpaul> i think there are two issues here, for making product behavior incorrect

14:12 <wolfspraul> yes

14:12 <wolfspraul> correct

14:12 <wolfspraul> we hope both are 100% fixed in rc3 :-)

14:12 <kristianpaul> the electrically power on problem, that you're tryin to fix delaying fpga

14:12 <wolfspraul> but please keep describing

14:12 <kristianpaul> and the booting in to rtems problem that also happen !

14:13 <kristianpaul> and for end user will be just a booting issue as well

14:13 <wolfspraul> from now on Adam will test with a complete boot all the way to rendering

14:13 <kristianpaul> yes, please,

14:14 <kristianpaul> i'm sorry i didn't describe the rtems booting issue before, so far i thought it was a normal error percentage after flashing m1 more than 10 times a day...

14:15 <kristianpaul> or may be because of a partial reflash of nor? so if you dont reflash the whole thing it may lead to corruption somewhere?..

14:15 <kristianpaul> well, just a few guesses

14:16 <aw> step 1): good, now m1 reconfigure normally, let me see if boot up. :-)

14:16 <aw> yeah...can't boot up now...go to remove R30/R60. :-)

14:20 <wolfspraul> kristianpaul: there may be multiple bugs, that's why we need to calmly fix them one by one

14:20 <wolfspraul> otherwise we are a remote group of people, communication is never perfect, and we are all confused by different reports and different things we mean when we report something

14:24 <wolfspraul> for example in lekernel's list earlier he wrote about 'check that it works', but now Adam is writing about "cannot boot". Do they mean the same thing? works == boots? not sure.

14:25 <wolfspraul> let's see what Adam finds next :-)

14:25 <wolfspraul> with 'boot' I mean all the way to gui or rendering

14:26 <wolfspraul> but others may mean different things, or even different things depending on context

14:27 <wolfspraul> what do we have? power-on -> fpga reconfigure (?) -> D2 goes off -> middle button -> D2 goes on -> boot -> gui/render

14:27 <wolfspraul> right or wrong?

14:27 <wolfspraul> kristianpaul: the first step after power-on is called 'reconfigure'?

14:28 <wolfspraul> I'm a bit confused about the 're' since it's the first thing after power-on...

14:28 <aw> when i said 'reconfigure' means D2 dimly lit short time then OFF, when i said that 'boot up' yes i meant the way to gui and D2/D3 is all ON. :-)

14:28 <wolfspraul> ok so we use roughly the same terminology

14:29 <wolfspraul> wpwrak_: will R30/R60 impact anything after reconfigure?

14:30 <wolfspraul> my understanding was that any impact of the R30/R60 change is for reconfigure itself, not after it

14:32 <kristianpaul> yeah, well reconfigure is okay, at least you sort the bitstream is loaded :)

14:33 <aw> okay..removed. let's check if reconfigure normally first after power up.

14:33 <kristianpaul> lekernel: standy bitstream reconfigure fpga with soc bistream after middle button pressed right?

14:35 <aw> good on reconfiguration stage, but D2/D3 only keeps ON when I pressing SW2. then both D2/D3 is OFF. :(

14:36 <aw> the go both LEDs OFF.

14:36 <wolfspraul> aw: did you reflash only the standby image, or everything?

14:36 <aw> i stop now. :-)

14:36 <wolfspraul> I think you should reflash everything.

14:37 <aw> i have to reflash everything?

14:37 <wolfspraul> sure I would do that

14:37 <aw> hmm..second.

14:37 <wolfspraul> no point in trying to fix 3 bugs at once

14:39 <wolfspraul> get everything back to the best state we can imagine now

14:39 <wolfspraul> and then test. first power-on, then middle-button, then boot, then render.

14:40 <aw> reflashing...

14:41 <aw> i set 'NOVERIFY="noverify"' , so speed up. :-)

14:43 <wolfspraul> after reflash, unplug the power from the board entirely, so we start from a known state

14:45 <aw> reflash done. power off > power on > can reconfigure > press SW2 > D2 doesn't keep ON....not success on boot

14:45 <aw> yes, i plugged off adapter power.

14:45 <wolfspraul> interesting

14:45 <wolfspraul> :-)

14:46 <wolfspraul> D16 is there and with correct polarity?

14:46 <aw> yes, soldered there

14:46 <aw> polarity correctly.

14:47 <wolfspraul> so that's exactly the same as you reported at the beginning

14:48 <wolfspraul> but this may not be related to R30/R60 anyway, because it's a problem after reconfigure

14:48 <wolfspraul> it's strange though that booting doesn't work because of the diode? maybe it loaded a corrupt bitstream?

14:49 <wolfspraul> if the bitstream is correct, then I wouldn't know how the diode could affect the chances of booting

14:49 <wolfspraul> we need to wait for lekernel for more input and ideas :-)

14:49 <aw> yes

14:49 <wpwrak_> (r30/r60 after reconfigure) dunno. i wouldn't think it should, but then i don't know these things too well

14:50 <wpwrak_> i wonder if the flash reset output could glitch during configuration

14:50 <wolfspraul> maybe looking at the serial console could give us clues?

14:51 <aw> wpwrak_, but indeedly your discussion above with lekernel was good and reasonable in tech knowledge well...

14:51 <wpwrak_> aw: you should be able to see glitches on FLASH_RESET_N by probing TP37, with a falling or rising edge trigger

14:51 <wolfspraul> aw: can you press the middle button multiple times?

14:51 <wpwrak_> aw: (see glitches) that is, if there are any :)

14:51 <wolfspraul> basically now the D2 goes off, and pressing the middle button does nothing, right?

14:52 <aw> wait...

14:53 <aw> D2 will show ON for a short time when I press the middle button (SW2), then D2 goes OFF

14:53 <wolfspraul> maybe it actually boots? how long do you wait?

14:54 <wolfspraul> ah no, I think D2 should stay on during the boot, and finally D3 will go on as well

14:54 <wolfspraul> aw: when you press the middle button again (second time), will D2 go on again? or stay off

14:55 <aw> D2 will always go ON then OFF after I press SW2.

14:55 <aw> it's not right. :-)

14:56 <wolfspraul> hmm

14:56 <wolfspraul> but it does that every time

14:56 <wolfspraul> I think that means it's still running

14:57 <wolfspraul> ok, last test

14:57 <aw> wpwrak_, yeah..maybe the D16's forwarding voltage doesn't make a good low enough.

14:57 <wolfspraul> try to power cycle, reconfigure, press middle button

14:58 <wolfspraul> say 5 times, should be enough

14:58 <wolfspraul> I want to see whether it ever boots to gui/rendering

14:59 <aw> hmm? no. i stop. :-) it should not like this. :-)

14:59 <aw> bad adam. :-)

14:59 <wolfspraul> ok

14:59 <wolfspraul> but I think the behavior sounds stable now

15:00 <wolfspraul> of course we still don't have a solution

15:00 <aw> i need to see like werner's said on glitches later. :-)

15:00 <wolfspraul> ok

15:00 <wolfspraul> maybe also compare to our earlier rc2 results?

15:01 <wolfspraul> we tested this circuit before and found it working, or our test back then was completely wrong?

15:01 <aw> TP37 should have something to discover. :-)

15:01 <wolfspraul> then there is werners 74AUP1G08/09 idea, I don't know how hard it is to try that. sounds like that is not an option for rc3.

15:02 <wolfspraul> or a different D16 diode with other specs?

15:03 <lekernel> aw, you completely reflashed your board?

15:03 <wolfspraul> yes

15:04 <wolfspraul> what he sees now is that it seems to reconfigure fine (D2 goes off), but then when pressing the middle-button D2 will go on briefly then back off

15:04 <wolfspraul> repeatedly, so if he presses the middle button again, D2 will come on again briefly and go off again

15:04 <lekernel> yes, it should do that

15:04 <aw> wolfspraul, i feel discussions from werner and lekernel is good, just don't know once a corrupt occurred, does flash's rest pin internal or fpga itself doesn't nver restore back more? don't know

15:04 <lekernel> then turn hard on

15:04 <wolfspraul> no, that doesn't happen

15:04 <wolfspraul> it stays off

15:04 <lekernel> ok, leakage current of the diode causing problems i'd guess

15:05 <wolfspraul> aw: if the circuit is stable, there should be no corruptions ever, I'm sure.

15:05 <wolfspraul> so we don't need to worry that much how to recover from a corrupted nor, I think. because that just won't happen in normal use.

15:05 <lekernel> aw, can you try with a 470k R30?

15:06 <aw> i can try reflashing all image again to see. second

15:06 <lekernel> no, don't waste time on reflashing

15:06 <aw> before I try 470K 30, try again. :-)

15:06 <lekernel> do you have anything on the serial console btw?

15:06 <aw> hmm..long time no use serial console. please let me know the baud rate setting.

15:11 <aw> sorry that, long time no use. :-)

15:18 <wolfspraul> got disconnected

15:21 <lekernel> 115200

15:22 <aw> flow control? 8 bits 1 bit stop

15:23 <lekernel> no flow control

15:23 <aw> okay, no parity?

15:24 <lekernel> no

15:24 <lekernel> have you soldered r30 already?

15:24 <lekernel> serial console is unimportant

15:25 <aw> want to at the same time if you want to see. :-)

15:25 <aw> moment...

15:32 <wpwrak_> lekernel: btw, could FLASH_RESET_N glitch between the start of (re)configuration and when the system should start running ? or is there maybe even an intentional flash reset at the end of configuration ?

15:35 <wpwrak_> lekernel: also, are there any checksums or such in configuration ? i.e., when configuration completes, do we have any knowledge of whether things were loaded correctly ?

15:43 <aw> lekernel, can reconfigure but no boot after 470K R30.

15:46 <wolfspraul> aw: I think this is a stable condition now, not bad.

15:46 <wolfspraul> here is my current understanding:

15:46 <aw> mmm

15:46 <wolfspraul> our best bet circuit right now is with reset_ic, diode, both R30 and R60 removed

15:47 <wolfspraul> but in this case, for some reason after reconfiguration (successful reconfiguration?), the m1 doesn't boot

15:47 <wolfspraul> from here we could go in a number of directions - we could go back to the rc2 and understand why it worked there

15:47 <wolfspraul> you could dig into suspected glitches

15:48 <wolfspraul> we could come up with changes beyond r30/r60

15:48 <wolfspraul> we can go back and remove d16 (which makes it boot), but probably that will expose us to the old reconfigure problem

15:48 <wpwrak_> aw: when will you get more boards ?

15:49 <wolfspraul> we could see whether the serial console holds any clues

15:49 <wolfspraul> or lekernel or wpwrak_ could come up with any other idea :-)

15:49 <aw> guess i need to go there to solder myself to get 2 ~ 3 pcs next Monday

15:49 <wolfspraul> which way to go?

15:50 <wolfspraul> aw: I think you should definitely go back to the old rc2 you had with reset ic + diode, and see whether that one fully boots

15:50 <aw> mrt or scooter

15:50 <wpwrak_> wolfspraul: first, i'd like to understand a little better how all the signals are supposed to behave

15:50 <wolfspraul> no no, not 'which way to go to the factory' :-)

15:50 <wpwrak_> hehe ;-))

15:50 <wolfspraul> which way to go in our analysis & fix

15:50 <aw> ;-O

15:50 <wolfspraul> I definitely want to know whether the old rc2+reset_ic+diode boots or not

15:51 <wolfspraul> that was the basis for our design decision, but we may have overlooked several issues there, I guess

15:51 <wolfspraul> then we should also connect the serial console on the rc3 adam has now, just to see if we are lucky and anything comes up there

15:51 <lekernel> aw, record flash reset and fpga program_b with a 2-channel scope, at 1) power up 2) boot time. use 1:10 probes if you are worried about capacitance.

15:52 <lekernel> wpwrak_, there is a flash reset after configuration with the soc bitstream

15:53 <lekernel> and after each soft reboot

15:53 <wpwrak_> lekernel: at such low currents, i'd be VERY worried about capacitance :)

15:53 <aw> lekernel, alright...I'll do this tomorrow morning and links given here.

15:53 <wpwrak_> lekernel: (reset) perfect

15:53 <wpwrak_> lekernel: does PROGRAM_B_2 do anything after the configuration ?

15:53 <lekernel> no, it should not

15:54 <wpwrak_> lekernel: also good. will it become push-pull after configuration ?

15:55 <aw> lekernel, is this descibe more? http://en.qi-hardware.com/wiki/File:Configuration_sequence.png

15:55 <lekernel> program_b is a dedicated input with no change when the fpga is configured

15:55 <lekernel> no this has nothing to do here

15:55 <wpwrak_> lekernel: (program_b_2 input only) hmm, then the reverse leakage current shouldn't matter

15:56 <lekernel> wpwrak_, what can happen is that when the soc boots, it resets the flash, then the leakage diode current pulls program_b low and clears the fpga

15:56 <lekernel> aw, scope traces first

15:56 <lekernel> then we will know

15:56 <lekernel> instead of just guessing ...

15:57 <wpwrak_> agrees. scope next :)

15:58 <wpwrak_> lekernel: (clear the fpga) so there are no checksums or such in configuration ?

15:58 <lekernel> wpwrak_, what does it have to do with checksums?

15:59 <lekernel> when program_b is pulsed, the fpga is cleared, period

15:59 <wpwrak_> oh, like that. i see.

15:59 <lekernel> this has absolutely nothing to do with checksums

15:59 <aw> alright...let me measure now...then i sleep. :-) :-)

15:59 <lekernel> aw, how many channels does your scope have?

15:59 <aw> so before I scope, should i remove 470K R30?

15:59 <lekernel> can you measure flash_reset_n, program_b and 3v3 at the same time?

15:59 <aw> two only

15:59 <lekernel> no, leave it there

16:00 <wpwrak_> lekernel: yes, then we would have a reverse current scenario indeed

16:00 <lekernel> ok then skip 3v3

16:00 <kristianpaul> why serial console is unimportant?, it could tell if flickernoise loaded or not

16:00 <wpwrak_> wolfspraul: when you come across some money, get adam a 4 channel scope :)

16:01 <wolfspraul> the other 2 channels are in Buga :-)

16:01 <kristianpaul> ;-)

16:01 <wpwrak_> wolfspraul: ideally, one with lots of memory. alas, the good ones are expensive. 10+ kUSD

16:02 <wpwrak_> kristianpaul: seems that lekernel is pretty sure there's nothing alive in the fpga. if the reverse current scenario is what's happening, that would indeed be the case.

16:03 <kristianpaul> okay,so... lets asume that :)

16:05 <wpwrak_> hmm, 26 C in taipei. leakage should be around 2-4 uA then. still sub-critical, at least in theory. one thing to keep in mind: if it suddenly starts to work, that may be because it gets colder for the next ~4-5 hours.
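One way to arrive at an estimate in that range is to interpolate between the two datasheet points quoted earlier (2 uA at 25 C, 2 mA at 100 C). The sketch below assumes leakage grows exponentially with temperature, i.e. a straight line on a log scale. That model is an assumption made here for illustration, not something taken from the diode datasheet:

```python
# Log-linear interpolation of diode reverse leakage between two quoted points.
# ASSUMPTION: leakage is exponential in temperature between 25 C and 100 C.

def leakage_estimate(temp_c, i25=2e-6, i100=2e-3):
    """Estimated reverse current in amperes at temp_c (valid roughly 25..100 C)."""
    fraction = (temp_c - 25.0) / (100.0 - 25.0)
    return i25 * (i100 / i25) ** fraction

for t in (25, 26, 45):
    print(f"{t} C: {leakage_estimate(t) * 1e6:.1f} uA")
```

At 26 C this lands just above 2 uA, inside the 2-4 uA range given in the log.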

16:07 <wolfspraul> what if m1 runs at a nightclub in 45 degree Bangalore?

16:07 <wolfspraul> (just joking... we don't need to discuss this now :-))

16:08 <wpwrak_> wolfspraul: you should let an M1 roast out in the sun for a bit and then see how well it works ;)

16:15 <aw> hi, interesting things happens again. once my prober connected to TP then m1 get boot up!

16:16 <wpwrak_> i was afraid something like that would happen ...

16:16 <wpwrak_> aw: can you identify if a single probe already makes a difference ?

16:17 <aw> from my views seeing flash reset and program_b, they are both synchronized at the same rising pulse @ 2.5ms :-)

16:17 <wpwrak_> aw: note: please try a few times. i'd say at least 5 times, better 10. this can easily be a statistical problem now, and we'll need a reasonably large sample size.

16:17 <aw> yeah...yes..try program_b or flash reset pin caused that..second. :-)

16:18 <wpwrak_> (same pulse) @2.5 ms = the pulse duration ? or the time after powering up ?

16:19 <wpwrak_> if both show a relatively fast pulse, that would be the diode's evil work

16:19 <aw> the time after powering up. they both are the same. :-)

16:19 <wpwrak_> how long is the pulse ?

16:20 <wpwrak_> maybe just post a screenshot

16:20 <aw> http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_2f_ch1-PROGRAM-B_ch2-DONE_no_D16.JPG

16:20 <aw> please see this firstly , the CH1 is program_b

16:20 <lekernel> aw, are you using a 10:1 probe?

16:21 <aw> after a reset ic's delay time (~= 200ms) , program_b goes from LOW to HIGH.

16:21 <wpwrak_> kristianpaul: do you have a script that downloads a screenshot from the scope and puts it on downloads.qi-hardware.com/people ? might be useful for adam

16:21 <lekernel> aw, why is there a glitch on PROGRAM_B in the beginning?

16:21 <wpwrak_> aw: ah, you're already there. great :)

16:21 <kristianpaul> wpwrak_: nope i dont, i just visit by web

16:22 <lekernel> the first pulse

16:22 <lekernel> where you put the cursor btw

16:23 <aw> wpwrak_, that DONE(CH2) is initalize after power on then goes LOW...then once fpga reconfigure works done, that DONE pins goes high to tell it's done. :-)

16:23 <wpwrak_> kristianpaul: ah, he's got a TDS1012. that one may not even have ethernet. i thought he had a scope similar to yours

16:24 <aw> lekernel, yeah..moment...let me change. ;-)

16:25 <wpwrak_> interesting ... 200 ms load time (to DONE), and 200 ms reset delay by the reset chip

16:25 <aw> yes

16:26 <wpwrak_> lekernel: do PROGRAM_B_2 is edge-triggered, not level-triggered ?

16:26 <wpwrak_> s/do/so/

16:26 <lekernel> it's level triggered

16:27 <aw> do you want me to level triggered? guess 200ms, they should be the same.

16:27 <wpwrak_> then i don't understand what i'm seeing. looks as if PROGRAM_B_2 was low all the time, thus constantly resetting, yet the FPGA thinks it finishes configuration (indicated by DONE)

16:27 <wpwrak_> something doesn't compute :)

16:27 <wpwrak_> aw: (level triggered) that was about how the FPGA works, not your scope setup

16:28 <lekernel> aw, on your scope, the first is PROGRAM_B and the other one DONE?

16:28 <aw> lekernel, yes. sorry that ...now i changed to 1:10, then can't boot more. :(

16:28 <lekernel> aw, ok, no, this is good!

16:29 <aw> lekernel, yes, CH1 is program_b, CH2 is DONE pin.

16:29 <lekernel> at least the measurement is not making that ultrapesky bug disappear

16:29 <lekernel> ok, then wpwrak_ is right, DONE shouldn't go high when PROGRAM_B is low

16:29 <lekernel> wtf

16:29 <wpwrak_> aw: just to check, can you please tell us the TP numbers you connected to ?

16:30 <aw> wpwrak_, program_b(TP36), DONE(TP35), flash Reset(TP37)

16:31 <aw> be noticed this http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_2f_ch1-PROGRAM-B_ch2-DONE_no_D16.JPG was scoped this afternoon. not now. :-)

16:31 <wpwrak_> so CH1 is on TP36 and CH2 is on TP35 ? trigger is ... on CH1

16:32 <wpwrak_> ah, no D16 !

16:32 <wpwrak_> now, please do all this again, but with D16 in place :)

16:32 <aw> wpwrak_, exactly

16:32 <aw> yes, no D16.

16:32 <lekernel> no more tests without d16 please

16:32 <aw> i show this just let you know program_b & done relationship.

16:32 <aw> yeah...moments

16:34 <wpwrak_> lekernel: of course, even without D16, the pattern looks weird, with PROGRAM_B_2 low and things still (apparently) completing

16:34 <lekernel> rhaa... apparently the s6 needs INIT_B to be driven low as well to delay configuration

16:35 <lekernel> the amount of peskiness that lies into this configuration process is incredible

16:36 <wpwrak_> lekernel: so the whole contraption around D16 won't work ?

16:36 <lekernel> it can fail if the fpga attempts to read the flash while it's still in reset

16:37 <lekernel> aw, add a second diode between the output of the reset IC and INIT_B (accessible through R157)

16:37 <wpwrak_> searches for INIT_B ...

16:38 <lekernel> oh, and it'd be much better if we could get a reset IC with more current sink capabilities and/or diodes with less leakage current

16:39 <wpwrak_> lekernel: reset ic should be no problem. for the diodes, you'd likely have to trade Vf for Irev. the only "clean" way out of this is probably a 1G gate

16:43 <lekernel> btw, that init_b vs. program_b discovery explains why adding a capacitor (instead of the reset ic) didn't work either...

16:43 <lekernel> it's amazing the time we spend on small issues like that

16:44 <lekernel> and extremely frustrating

16:44 <aw> http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_2f_CH1-programB_CH2-DONE_D16.JPG

16:44 <lekernel> aw, all further tests must be done with a diode to INIT_B

16:45 <wpwrak_> lekernel: 74AUP1G have nice features like an input leakage of max. +/- 0.75 uA, and even at Vcc = 1.1 V, they can sink some 1.1 mA. they're nice chips to know. they lack the coolness of just solving a tricky problem with a few passive elements, but they do a lot to make things more predictable.

16:45 <lekernel> as your scope trace shows, holding PROGRAM_B low does nothing ...

16:46 <wpwrak_> (trace) good. same result as without D16. so D16 may be off the hook for now.

16:46 <lekernel> wpwrak_, if that stupid reset IC was able to sink more than a ridiculous 500 uamp, the diodes (with additional external pull ups) would work just fine

16:47 <lekernel> s/500/50

16:47 <wpwrak_> lekernel: you could still run into trouble with the reverse current

16:48 <aw> http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_2f_CH1-programB_CH2-flashreset_D16.JPG

16:48 <lekernel> on a ~5K impedance, the reverse current wouldn't be able to cause much trouble, would it?

16:49 <lekernel> aw, so 1st one is program_b and the other flash reset?

16:49 <lekernel> or is it the other way around?

16:49 <aw> yes

16:50 <lekernel> ok

16:50 <aw> scoped after power up

16:50 <wpwrak_> lekernel: okay, with a 5 k pull-up, you can probably kill it :)

16:50 <aw> so you can use program_b as reference base

16:51 <wpwrak_> aw: btw, it may be good to set the scope's acquisition to peak detect

16:51 <lekernel> aw, now one problem. program_b does nothing. we must use init_b instead

16:51 <lekernel> and another problem - init_b becomes active after configuration, so we must use another diode

16:52 <aw> all you both talks is like this http://en.qi-hardware.com/wiki/Milkymist_One_Power_On_Off_Sequence#Power-On_Sequence_Precautions_for_FPGA[5].

16:53 <aw> ?

16:53 <lekernel> wpwrak_, I also have heard horror stories about FPGAs not configuring correctly because of rise time too slow on their external control signals. I don't know if Xilinx improved this, but because of that I'm for small pullup values.

16:53 <wpwrak_> lekernel: still haven't found INIT_B or R157 :-( on which sheet are they ?

16:53 <lekernel> with the fpga schematics, on bank 2

16:54 <lekernel> at the top right of bank 2

16:55 <aw> well...i get to sleep though, guys ;-)

16:55 <wolfspraul> sure, 'night

16:55 <lekernel> wpwrak_, the xilinx reference designs have 4.7k external pullups on program_b and init_b

16:55 <wpwrak_> aah, got it. thanks ! nicely hidden :)

16:55 <lekernel> gn8

16:55 <aw> let me know if somethings i can help tomorrow. cu

16:55 <kristianpaul> n8

16:56 <lekernel> I'm for trying to keep those pullups, and having a reset IC that has some serious current sink capability

16:56 <wpwrak_> sounds reasonable

16:57 <lekernel> any reset ic you could recommend off the top of your head?

16:57 <lekernel> it's in sot-23

16:57 <wpwrak_> you'll need at least 1 mA, right ?

16:58 <wpwrak_> let's first check if the one you have isn't good enough after all

16:58 <lekernel> so, this thing is going to drive: flash reset (10k), program_b (4.7k) unless we manage to cut the trace when reworking the board and init_b (4.7k)

16:58 <wpwrak_> i haven't used reset chips yet, so there's none i "like from experience". but there's a ton of choice at digi-key. so i'm not worried about finding something decent, if necessary

17:00 <wpwrak_> if i interpret the A4809 data sheet correctly, the sink current is very low at low Vdd, but gets reasonable when Vdd increases

17:00 <lekernel> that's 1.9k equivalent resistance, which needs at least 1.7mA at 3.3V

17:01 <wpwrak_> okay, with tolerances that's 2 mA absolute minimum

17:01 <lekernel> yeah and we must also account for the fpga internal pullups btw

17:02 <lekernel> those will add 0.2 to 0.5 mA/pin at 3.3V

17:02 <lekernel> so worst case 1.5mA total

17:02 <wpwrak_> 4 mA then

17:02 <lekernel> which brings that minimum current for the reset ic to 3.5mA

17:03 <lekernel> or 4 if you want extra safety

17:03 <wpwrak_> i totally love extra safety ;-)
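
The current budget worked out above can be sketched as follows. This is a minimal sketch using only the figures quoted in the chat (3.3 V rail, 10k on flash reset, 4.7k on PROGRAM_B and INIT_B, up to 0.5 mA of internal pull-up per pin); the names `parallel` and `sink_budget` are made up for illustration, and applying the internal pull-up figure to all three nets simply follows the "1.5mA total" estimate. It yields nominal values, so the 2 mA and 3.5 mA quoted above, which include tolerances, sit slightly higher.

```python
# Sketch of the pull-up current budget (all values taken from the chat).
V_RAIL = 3.3  # volts

# External pull-ups the reset IC has to pull low, in ohms.
external_pullups = {
    "flash reset": 10_000,
    "PROGRAM_B": 4_700,
    "INIT_B": 4_700,
}

# Worst-case FPGA internal pull-up current per pin, in amps (0.2 to 0.5 mA quoted).
INTERNAL_WORST_PER_PIN = 0.5e-3


def parallel(resistances):
    """Equivalent resistance of resistors in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def sink_budget(pullups, v_rail=V_RAIL, internal_per_pin=INTERNAL_WORST_PER_PIN):
    """Return (equivalent ohms, external amps, total amps) for the given pull-ups."""
    r_eq = parallel(pullups.values())
    i_ext = v_rail / r_eq
    # Assumption: one internal pull-up per listed net, as in the chat's estimate.
    i_total = i_ext + internal_per_pin * len(pullups)
    return r_eq, i_ext, i_total


r_eq, i_ext, i_total = sink_budget(external_pullups)
print(f"equivalent resistance: {r_eq:.0f} ohm")
print(f"external pull-up current: {i_ext * 1e3:.2f} mA")
print(f"with internal pull-ups: {i_total * 1e3:.2f} mA")
```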

17:03 <lekernel> worst case we will use a relay instead lol

17:04 <wpwrak_> i'm not sure we really need to worry about the early ramp. as long as we get a proper reset when we approach 3.3 V, we should still be fine, right ? maybe with the possible exception of freak flash corruption

17:04 <wpwrak_> (relay) naw, they bounce :)

17:06 <wpwrak_> the flash corruption scenario would be that we have Vdd high enough for the FPGA to try to talk to the flash, and Vdd high enough for the flash to be able to change its content, and Vdd too low for the reset chip to pull enough

17:06 <lekernel> yeah, the output current of our reset ic approaches 10mA

17:06 <lekernel> though what the heck is VDS?

17:06 <wpwrak_> figure 10 ?

17:07 <wpwrak_> VDS is the voltage on the output

17:07 <wpwrak_> oh wait. relative to Vdd :)

17:07 <wpwrak_> err no, to ground

17:07 <wpwrak_> page 11, figure 3

17:08 <wpwrak_> hmm. still looks suckish.

17:08 <lekernel> I'm looking at this atm: http://www.technorise.ne.jp/doc/ait/A4809-v10.pdf

17:09 <lekernel> ah, page 12 :)

17:10 <wpwrak_> i have rev 1.3 (found by google)

17:13 <wpwrak_> http://www.ait-ic.com/uploads//2009-10/21/_1256089836_7ol2c.pdf

17:14 <wpwrak_> for Vdd = 3.3 V, the closest approximation seems to be figure 12 on page 7 (of rev 1.3): Vdd = 3.0 V

17:15 <wpwrak_> if we assume Vds = 0.5 V (you said Vil(max) = 0.6 V, right ?), then we get about ... 8 mA

17:16 <lekernel> Vds should be lower than that

17:16 <lekernel> we have to take into account the drop of the diode

17:16 <wpwrak_> urgh. right.

17:16 <lekernel> also 0.6V is for the flash, I'm trying to find what the FPGA needs

17:16 <lekernel> http://www.xilinx.com/support/documentation/data_sheets/ds162.pdf

17:16 <lekernel> should be somewhere in that

17:17 <lekernel> for PROGRAM_B and INIT_B

17:18 <wpwrak_> hmm, what diode current shall we assume ? 2-3 mA ?

17:18 <wpwrak_> let's say 3 mA. then Vf should be < 200 mV

17:19 <wpwrak_> so we get Vds = 0.3 V

17:19 <lekernel> have you found the maximum low level voltage for the fpga?

17:19 <lekernel> i'm still searching through this big datasheet

17:19 <wpwrak_> no, i hope you're quicker than me ;-)

17:20 <wpwrak_> besides, i don't think FPGA.Vil(max) will be higher than NOR.Vil(max)-200 mV. but i'm sure you'll set me straight if my guess wasn't right ;)

17:21 <lekernel> wpwrak_, INIT_B will need to be connected through a diode

17:21 <wpwrak_> okay, then it matters. darn.

17:21 <lekernel> ok, Vil is either 0.8V or 0.7V... can't figure out, but the flash is the limiting factor anyway

17:22 <lekernel> those are the numbers for the LVCMOS33 and LVCMOS25 I/O standards

17:23 <lekernel> so let's assume Vds = 0.3V

17:23 <wpwrak_> good. i was just about to ask which of the gazillion I/O standards it was ;-))

17:23 <lekernel> I'm not actually sure

17:23 <wpwrak_> okay. 0.3 V ... A4809 data sheet rev 1.3, page 7

17:23 <lekernel> those are for the configurable I/O pins, not for the fixed function pins

17:24 <lekernel> but it'd make sense to assume the fixed function pins use either LVCMOS33 or LVCMOS25

17:24 <wpwrak_> figure 12 says that, with VDD = 3.0 V, we can expect something like 5-6 mA

17:24 <lekernel> no, we are using Detector Threshold=2.7V

17:24 <lekernel> so figure 10 (not 12) is relevant

17:25 <wpwrak_> fig. 10 only goes up to VDD = 2.0 V

17:26 <wpwrak_> for VDD = 2.0 V, fig 10 and fig 12 are similar. so i would assume the characteristics of the output transistor are comparable

17:26 <lekernel> maybe...

17:26 <wpwrak_> i.e., i'm currently looking at the performance in the 200 ms after crossing the threshold

17:28 <wpwrak_> so i think the chip should be barely adequate. you don't have a lot of headroom, but it should be sufficient.

17:28 <lekernel> if we used 10k pullups (instead of 4.7k) the minimum current for the reset IC would be 2.5mA ... this should give more margin on the reset IC side
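
To put "barely adequate" into numbers, here is a rough headroom comparison. It assumes the sink capability of "something like 5-6 mA" read off the data sheet figure at Vds = 0.3 V, and the required currents of 3.5 mA (4.7k pull-ups) and 2.5 mA (10k pull-ups) quoted above. None of these are measured values, and `headroom` is a hypothetical helper.

```python
# Headroom sketch: sink capability read off the data sheet vs. required current.
# All numbers are the rough figures quoted in the chat, not measurements.
SINK_MA = (5.0, 6.0)  # estimated sink current range at Vds = 0.3 V, in mA

required_ma = {
    "4.7k pullups on INIT_B/PROGRAM_B": 3.5,
    "10k pullups on INIT_B/PROGRAM_B": 2.5,
}


def headroom(sink_range, need):
    """Return (worst, best) ratio of available sink current to required current."""
    low, high = sink_range
    return low / need, high / need


for name, need in required_ma.items():
    worst, best = headroom(SINK_MA, need)
    print(f"{name}: need {need} mA, headroom {worst:.1f}x to {best:.1f}x")
```

Under these assumptions the 10k pull-ups move the worst case from well under 2x to about 2x.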

17:29 <lekernel> so what would you think of:

17:29 <lekernel> 1) we keep the current reset IC

17:29 <wpwrak_> right now we have 4.7 k plus 10 k, so it's already a bit friendlier

17:29 <lekernel> 2) we use a 10k pullup on INIT_B and PROGRAM_B (instead of 4.7k)

17:30 <wpwrak_> ah, you mean R157, okay

17:30 <lekernel> 3) we add a diode between the output of the reset IC and INIT_B

17:30 <lekernel> 4) we remove R60 (the pullup on the flash reset) since diode leakage and fast rise time do not matter here

17:31 <wpwrak_> what are the functions of PROGRAM_B_2 and INIT_B ? you said INIT_B becomes an output after configuration ?

17:31 <lekernel> yes, it becomes an open drain output

17:31 <lekernel> which might pull low

17:31 <wpwrak_> under what conditions does it pull low ?

17:31 <lekernel> Before the Mode pins are sampled, INIT_B is an

17:31 <lekernel> input that can be held Low to delay configuration.

17:31 <lekernel> After the Mode pins are sampled, INIT_B is an

17:31 <lekernel> open-drain active-Low output indicating whether

17:31 <lekernel> a CRC error occurred during configuration:

17:31 <lekernel> 0 = CRC error

17:32 <lekernel> 1 = No CRC error

17:32 <wpwrak_> aah !

17:33 <lekernel> also this doesn't say what happens while the configuration take place. are there glitches on INIT_B?

17:33 <wpwrak_> what would happen if you connected INIT_B to PROGRAM_B_2 ?

17:33 <lekernel> I don't really want to know :) diode is safer, no?

17:33 <wpwrak_> ;-))

17:33 <lekernel> plus it'd hold the flash in reset

17:34 <lekernel> and I'm not sure we could later deassert INIT_B e.g. from JTAG

17:34 <lekernel> so not using the diode looks like murphy bait

17:35 <Vaati_> what chip are you discussing? is it something in the milkymist itself ?

17:35 <lekernel> Vaati_, fpga, flash and reset ic

17:35 <Vaati_> lekernel: whats the manufacturer of the fpga

17:35 <Vaati_> ?

17:35 <lekernel> and yes, those are giving us inordinate amounts of trouble to get the milkymist devices to work reliably

17:35 <kristianpaul> xilinx

17:36 <Vaati_> ah

17:36 <kristianpaul> Vaati_: you can see http://en.qi-hardware.com/wiki/Milkymist_One_RC3_BOM

17:36 <wpwrak_> (JTAG) yeah, that's the big question. PROGRAM_B_2 = INIT_B looks vaguely useful for recovering from glitches causing a CRC error. but of course, if you then can't fix the flash via jtag, that would suck more than anything else.

17:36 <lekernel> wpwrak_, for rework, I think it should be easy to use a non-SMD diode between the output of the reset IC and the R157 pins

17:37 <lekernel> wpwrak_, I have read somewhere (it seems) the FPGA should already contain logic to retry configuration after failed CRCs

17:37 <lekernel> I don't really want to mess with it

17:39 <lekernel> wpwrak_, what would you think of keeping R60 and adding a capacitor between flash reset and ground?

17:40 <lekernel> or not keeping R60 and still having the capacitor, which would charge through the fpga

17:40 <lekernel> it should keep reset low during the very early stages of power up, before the reset IC takes over

17:40 <wpwrak_> (retry logic) very good !

17:41 <lekernel> but at the same time it should be small enough not to delay the flash too much, otherwise the fpga will attempt to read from it when it's not ready ...

17:41 <wpwrak_> (no mess with it) yeah, feels unsafe

17:41 <lekernel> but maybe that's overkill and causes more problems than it solves

17:42 <wpwrak_> le't check the diode .. at 25 C, Irev would be about 2.5 uA. at 100 C more like 2.5 mA. now, how to interpolate ?

17:42 <wpwrak_> s/le't/let's/

17:44 <wpwrak_> the cap would also make the reset voltage crawl very slowly from a clean low to a clean high.

17:45 <lekernel> yeah, let's try without first

17:45 <lekernel> and without R60

17:47 <roh>

17:47 <lekernel> roh, ?

17:47 <wolfspraul> so we have a new plan?

17:47 <wpwrak_> hmm, for Irev, we'd mainly work against R30 = 10 kOhm. to stay on the safe side of Vih(min), we shouldn't drop more than 100 uA

17:48 <wolfspraul> I am curious about one thing - why did we not notice this in our rc2 tests? I mean the need for those additional improvements.

17:48 <wpwrak_> that's 40 x the T = 25 C and 1/25 x the T = 100 C value. tricky.
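
One way to approach the "how to interpolate" question is a log-linear fit through the two quoted points. This is only a sketch under the assumption that reverse current grows exponentially with temperature between 25 C and 100 C; a real Irev vs. T curve for the actual diode would supersede it. The 100 uA limit and the 10 kOhm pull-up are the figures from the chat, and `irev` and `temp_at` are illustrative names.

```python
import math

# Two reverse-current points quoted in the chat for the diode.
T1, I1 = 25.0, 2.5e-6    # deg C, amps
T2, I2 = 100.0, 2.5e-3   # deg C, amps

I_LIMIT = 100e-6         # amps, leakage budget against the 10 kOhm pull-up
R_PULLUP = 10_000        # ohms

# Assumed model: ln(Irev) is linear in temperature between the two points.
SLOPE = math.log(I2 / I1) / (T2 - T1)


def irev(temp_c):
    """Interpolated reverse current in amps at temp_c (deg C)."""
    return I1 * math.exp(SLOPE * (temp_c - T1))


def temp_at(current):
    """Temperature in deg C at which the interpolated Irev equals current."""
    return T1 + math.log(current / I1) / SLOPE


print(f"drop across the pull-up at the limit: {I_LIMIT * R_PULLUP:.1f} V")
print(f"Irev doubles roughly every {math.log(2) / SLOPE:.1f} C")
print(f"Irev reaches the limit near {temp_at(I_LIMIT):.0f} C")
```

A two-point fit like this is crude, so the crossing temperature it prints should be read as an order-of-magnitude check rather than a rating.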

17:49 <wpwrak_> goes looking for a Irev vs. T curve

17:49 <lekernel> wolfspraul, I guess because a consequence of Murphy's law was that the flash we used for the test was fast to come out of reset and the FPGA was slow to begin reading from it

17:49 <wolfspraul> ok

17:49 <wolfspraul> just want to make sure we have at least a theory for everything we see :-)

17:49 <lekernel> or... did we use the exact same reset IC?

17:49 <lekernel> or something with less delay?

17:50 <wolfspraul> I don't want to do too much history digging, I just asked to make sure our new theories are still in line with old discoveries.

17:50 <wolfspraul> but it seems you are not worried about that, so whatever we saw in rc2 is still in sync with the new realizations

17:50 <wolfspraul> that's good then!

17:51 <wolfspraul> so reset ic stays, D16 stays, and now a few more things on top

17:51 <wpwrak_> (T vs. Irev) we're probably good up to 75 C (according to The Circuit Designer's Companion, giving 1N4148 characteristics)

17:52 <lekernel> 1N4148 is a PN pure silicon junction, does such a curve also apply to a Schottky junction?

17:52 <wpwrak_> lekernel: i checked the schottky section and he didn't warn of any perversions there

17:56 <wolfspraul> ok I try to summarize, see whether I followed the discussion correctly

17:56 <wolfspraul> 1) 10k pullup on init_b and program_b (instead of 4.7k now)

17:56 <wolfspraul> 2) add diode between output of reset_ic and init_b

17:57 <wolfspraul> 3) remove r60

17:57 <lekernel> (2) with negative terminal towards the reset IC

17:57 <wolfspraul> what about R30?

17:57 <wpwrak_> (list) that's what i have too

17:58 <lekernel> R30 is the pullup on PROGRAM_B, so you already mentioned what should be done to it

17:58 <wpwrak_> R30 needs to stay. else D16.Irev may cause FLASH_RESET_N to PROGRAM_B_2 contamination

17:59 <wolfspraul> ok

17:59 <lekernel> wolfspraul, ok for your 3 points. do you mail Adam and the list?

17:59 <wpwrak_> (stay) changed a little, to 10 kOhm. but not higher. and not removed.

17:59 <wolfspraul> Adam will check the backlog

17:59 <wpwrak_> better write a mail with the final verdict :)

18:01 <wolfspraul> lekernel: you mentioned earlier that it is frustrating and depressing we spend so much time on these little things

18:01 <wolfspraul> but in my experience it is totally normal, not much at all, and not a hopeless case or anything

18:02 <wolfspraul> I'm not making gloom or doom predictions, but these things pop up, and they need to be addressed. it should not be frustrating.

18:02 <wpwrak_> plus, i wouldn't call ~1 day "so much time" ;-)

18:02 <wolfspraul> it's a complex board and nearly impossible to get the hundreds (maybe thousands) of details right immediately

18:02 <wolfspraul> no not at all

18:02 <wpwrak_> wolfspraul: we've seen worse, haven't we ? (-:C

18:03 <wolfspraul> I'm just speaking from my experience and comparing with successful and failed projects.

18:03 <wolfspraul> rc1 set the bar very high, it was an _excellent_ first shot

18:03 <wolfspraul> from there on it continued very well, with rc2 (I believe 0 regressions from rc1), now rc3 (again so far it seems no regressions, and many improvements)

18:04 <wpwrak_> (complex board) indeed. it's a little PC. i also like that the resolution makes sense. it's not just a shot in the dark.

18:04 <wolfspraul> I still do expect some more issues to pop up on rc3, I have to say

18:04 <wolfspraul> I'm not saying this to annoy people or to be the wise guy doing nothing.

18:04 <wolfspraul> it's just realistic to expect that, next week when we test all 90 boards

18:04 <wolfspraul> there will be something

18:04 <wolfspraul> :-)

18:05 <wolfspraul> so... with the latest great of very good ideas for the bootup problem, let's see what Adam reports tomorrow

18:05 <wpwrak_> have things like DMX and MIDI seen any major testing yet ?

18:05 <wolfspraul> latest round

18:05 <wolfspraul> 'major' is hard to define

18:06 <wolfspraul> Adam did a number of electrical tests, got some very long cables, got loopback cables, etc.

18:06 <wolfspraul> Sebastien has been using DMX for performances

18:06 <wolfspraul> 'major' as in hundreds of people having used it with hundreds of devices - no

18:06 <wpwrak_> (performances) okay, very good

18:06 <wolfspraul> 'major' as in we tried to do a good job internally - yes

18:07 <wolfspraul> so yes, there could be surprises in midi and dmx, but right now I'd say what we have is not bad

18:07 <wolfspraul> we need more customer feedback before committing resources on tracking something specific down

18:08 <wpwrak_> one customer issue you'll likely hit is MIDI-over-USB. this seems to be quite popular these days, also with people attaching things to their iGadgets.

18:08 <wolfspraul> don't mention USB in the presence of Sebastien

18:08 <wpwrak_> ;-)

18:08 <wolfspraul> realistically those things have to wait

18:08 <wolfspraul> I'm just being realistic

18:09 <wolfspraul> I will throw myself behind marketing and selling the product on what it can do today, its strengths

18:09 <wolfspraul> and then we invest every penny back to make it better

18:09 <wolfspraul> so realistically - do not expect midi over usb to work in the next few months

18:09 <wolfspraul> Sebastien will correct me if I'm wrong...

18:10 <wpwrak_> so what's the plan when those things come up ? just an excuse ? an ETA for a solution ? a work-around ?

18:10 <wolfspraul> that's why I keep jtag-serial in every unit, I am hoping to attract some serious new contributors, at least I will try

18:10 <wolfspraul> we are collecting feature challenges here http://en.qi-hardware.com/wiki/Milkymist_One_marketing#Feature_Challenges

18:11 <wolfspraul> I can add midi over usb :-)

18:11 <wpwrak_> (plan) ah, or hope for someone else to come up with an answer :)

18:11 <wolfspraul> no it's fine, I think we need to communicate effectively

18:12 <roh> has a plan. barbecue. (actually i am getting pulled away to one. bbl)

18:12 <wolfspraul> there is no point in getting stuck on things that don't work

18:12 <wolfspraul> so I'm very frank in the "does not work list"

18:12 <wolfspraul> instead, I want to focus on what works

18:12 <wolfspraul> roh: enjoy barbecue!

18:12 <wpwrak_> i think something along the lines of "we don't support MIDI-over-USB yet, but you could use the following low-cost/widely-available true MIDI keyboard/whatever ..." would help

18:13 <wpwrak_> at least it would remove a blocker for people who are serious

18:14 <wolfspraul> refresh the wiki :-)

18:14 <wpwrak_> *grin*

18:15 <wolfspraul> my main concern on a run like rc3 are regressions

18:15 <wolfspraul> and it seems so far we have zero, which is how it should be but which is also great

18:16 <wolfspraul> I'm not so worried whether all our improvements are a hit, or whether we discover new problems

18:16 <wolfspraul> this is my success gauge

18:16 <wolfspraul> when a project starts to accumulate regressions, then it's really serious

18:16 <wpwrak_> ;-)

18:17 <wpwrak_> yeah. that's a sign of loss of control.

18:17 <wolfspraul> at least then I may quickly not know how to continue, because obviously the foundations of the engineering work are flawed somehow

18:20 <wolfspraul> wpwrak_: thanks a lot for lending us a helping hand on the bootup problems! very appreciated!

18:20 <wolfspraul> I'm anxious to see Adam's new reports tomorrow :-)

18:20 <wpwrak_> you're welcome. always fun to stick my nose into something tricky :)

18:21 <wolfspraul> unfortunately Adam cannot follow at this speed on understanding the thought process and reasoning for the changes, but that's OK

18:21 <wolfspraul> so he will just try it all tomorrow and give us new input data points...

18:22 <wolfspraul> we are a team after all, so the most important for me again is that he is in an excellent position to do a good testing job for the 89 boards that will soon flood his apartment

18:22 <wolfspraul> that's going to be a mess :-)

18:22 <wpwrak_> let's hope the inputs are all along the lines of "it works now" :) else, it's back to the drawing board

18:23 <wpwrak_> (89 boards) yeah, i don't envy him ;-)

18:23 <wolfspraul> Jon is in Taipei soon, and I suggested to Adam I send him for help

18:23 <wolfspraul> ha!

18:23 <wolfspraul> that was kindly rejected :-)

18:23 <wpwrak_> and tuxbrain can probably share the sentiment :)

18:23 <wpwrak_> hehe ;-)))

18:23 <wolfspraul> which I can understand

18:24 <wolfspraul> I wouldn't want to send myself for help either

18:24 <wpwrak_> who rejected ? adam or rejon ?

18:24 <wolfspraul> at 40 it was still OK, but with 90 boards that's going to be quite some stress

18:24 <wolfspraul> no Adam

18:24 <wolfspraul> because of course Jon will not be a great help

18:24 <wolfspraul> just sitting around piles of stuff. then Adam has two problems. the boards & Jon.

18:25 <wpwrak_> yeah. by the time he'd be properly trained, most of the boards would be tested already

18:25 <wolfspraul> 90 is really tough already. if this is all successful and sells well, and the next run is 160, then we may have to rethink his home office setup.

18:26 <wpwrak_> he knows his M1 reasonably well, though. also managed to solve all his flashing issues at fisl. all he really needed from me was my mouse ;-)

18:26 <wolfspraul> but step by step, now is this run of 90/80 first

18:26 <wolfspraul> I don't know exactly when Jon arrives in Taipei

18:26 <wolfspraul> if it is after Adam has most of the chaos under control, the visit may still make sense

18:27 <wolfspraul> it depends on how many reworks are needed, and how the tests are going

18:27 <wolfspraul> the problem is not to think through the process if everything goes smooth

18:27 <wpwrak_> he could finally get that L19 (?) rework, too

18:27 <wolfspraul> the problem is to think through the process if there are massive problems with the first 20 boards

18:27 <wolfspraul> :-)

18:27 <wolfspraul> and different problems, so there is one pile here, one pile there, etc.

18:27 <wolfspraul> :-)

18:28 <wolfspraul> so when things go bad, that's when you know whether your testing setup was robust or not

18:28 <wolfspraul> I spare everyone the stories from the famous Openmoko production lines...

18:28 <wolfspraul> :-)

18:28 <wpwrak_> does adam live alone ? or will he have to declare some restricted areas ? :)

18:29 <wolfspraul> don't know exactly, last time he had a sub-tenant renting a room, his apartment is quite big actually

18:29 <wolfspraul> there are enough options

18:29 <wolfspraul> let's hope things go smooth

18:29 <wolfspraul> I'm sure Adam hopes so too :-)

18:29 <wpwrak_> oh, openmoko production. so much fun. first, the suspense ... when will they produce ? have they already ? who knows the results ?

18:29 <wolfspraul> otherwise his apartment will quickly turn into a moon landscape

18:30 <wpwrak_> then the sherlock-holmes-like discovery of what exactly happened

18:30 <wolfspraul> this rework so far still sounds manageable, let's hope it works

18:30 <wolfspraul> I mean the new diode & pullup

18:31 <wolfspraul> if he can completely verify it tomorrow he may even tell the factory to do it on the other 89 boards

18:31 <wpwrak_> then the chaotic struggle with somehow patching up the bugs. some with sequels and sequels of sequels. remember how long it took to finally find out what LED and transistor configuration our freerunners have ? was it half a year ? a year ? longer ? :)

18:31 <wolfspraul> so that's an important thing actually

18:31 <wolfspraul> well the project was out of control

18:31 <wolfspraul> that's what I try to avoid here

18:32 <wolfspraul> once you are in that situation you can only pray that one day you will be back in control

18:32 <wolfspraul> not fun

18:32 <wolfspraul> so, back to us. in a perfect world adam can confirm the final solution tomorrow, and enlist the smt factory's help in the rework on the other 89 boards.

18:32 <wolfspraul> that would be ideal

18:33 <wolfspraul> if we are not that fast, the boards go back to his place and then most likely he will manually rework whatever we eventually come up with

18:33 <wolfspraul> unless the rework is so difficult that it would better be done at the factory, which means ship back and forth, etc.

18:33 <wpwrak_> (diode & pullup) so far, everything very civilized, yes. we had one wrong try, but then sebastien found INIT_B, and we now have a good consistent theory for the next try. and all fairly quickly and with swift, useful feedback. not days of puzzlement.

18:34 <wolfspraul> those kids at the factory are also unbeatably fast and precise, of course. another thing to consider.

18:34 <wolfspraul> so getting this settled tomorrow would be awesome

18:35 <wpwrak_> dunno how hard it is to find a diode. if adam has some of the D16, an option would be to just add that, plus a wire

18:37 <wpwrak_> else, he needs to find some suitable replacement on short notice (i.e., go to the electronics mall and just buy a few diodes). shouldn't be horribly difficult, but needs review to make sure they don't add funny surprises.

18:37 <wolfspraul> sure. Taipei is not Shenzhen, but it should be no problem.

18:38 <wpwrak_> (funny surprises) like that monster we had in GTA02. i think you were already there when that happened, weren't you ? the XXXL diode for the USB reset.

18:38 <wolfspraul> don't remember

18:38 <wpwrak_> (taipei) i think i could even find the shops ;-)

18:39 <wpwrak_> wolfspraul: (gta02 diode) it was a similar circuit to the one we have in M1: the system reset (shared by various components) was fed with a diode to the USB side, to make sure the pull-up was disabled while the CPU was in reset.

18:41 <wpwrak_> wolfspraul: now, the diode was something to behold. it was also a schottky. but designed for high-current use. i think it could handle something like 10 A. what we really needed was maybe 1 mA or even less. that power diode was HUGE. plus, it has a completely insane reverse current. several mA even under favourable conditions.

18:42 <wpwrak_> wolfspraul: so what happened was that the system reset was pulled down via the reverse current going though this diode and the board never made it out of reset. that is, unless you remove the monster.

18:44 <wpwrak_> wolfspraul: so a lot of finicky rework was done. this story has a sequel. we then redesigned the whole mess. i proposed a solution with a pair of 74xxx1G gates.

18:45 <wpwrak_> wolfspraul: that solution worked beautifully. shortly thereafter, i moved to HXD8. there, we found the need for something similar (not entirely sure if it was really necessary or whether we just thought it was). so we just copied the solution from GTA02, which we already knew was good. so far, so good.

18:48 <wpwrak_> wolfspraul: then the day of making our first prototype run after the death of fiwin and the repatriation of HXD8 to FIC+Openmoko approached. a few days before, i thought i'd ask our hw team for the BOM. not quite sure why ... maybe i needed some information, or maybe i just thought i hadn't heard enough mentioning of the BOM.

18:48 <kristianpaul> ah, Dash Express, now i see from where some ideas came from :-)

18:49 <wpwrak_> wolfspraul: the answer was "just a moment". then the team developed a level of activity reminiscent of a termite hill being peed upon.

18:50 <wpwrak_> wolfspraul: after politely inquiring after a while how things were going, i got the truth: so far, no BOM had existed. but there were bits and pieces scattered all over the place. much later that night, i got a first draft of a consolidated bom.

18:51 <wpwrak_> wolfspraul: i think i also asked then whether we had the parts :) well, to make a long story short, soon thereafter, the help of FIC sourcing was enlisted, to get us the things we didn't have yet

18:52 <wolfspraul> now we know what a well run organization we have today...

18:53 <wpwrak_> wolfspraul: we then had two parts of the BOM - the one where we had the parts or were certain we'd have them, and the one with the parts that were still unknown

18:53 <wpwrak_> wolfspraul: then, i think less than a week before SMT, the bomb dropped.

18:54 <wpwrak_> wolfspraul: there were several components with a lead time measured in weeks if not months. they included some filters, the LCMs, and ... those 74xxx1G chips.

18:55 <wpwrak_> wolfspraul: now, back when i had designed that reset circuit, rookie that i was, i had used digi-key stock as my guide for what are "safe" components (in terms of sourcing)

18:55 <wpwrak_> wolfspraul: so i found it a little surprising that these things would all of a sudden be so terribly hard to get. had i made a grave mistake ?

18:57 <wpwrak_> wolfspraul: well, i went back to my cubicle, chased the bugs away (FIC HQ was crawling with vermin), and checked at digi-key. lo and behold, they had thousands of these chips in stock. i also found some of the other "impossible" items, or very similar replacements.

18:59 <wolfspraul> you can discard that sourcing input (as you know by now)

18:59 <wpwrak_> wolfspraul: in the end, i got sean's credit card and did a bit of shopping. two days later we had those items with a supposed lead time of months. we also got the lcms. from mouser, at the cost of a small car. all on liane's credit card :)

18:59 <wolfspraul> it was more due to incompetencies within the company and parent company, not a real outside problem

18:59 <wolfspraul> digikey is a good guide, first of all

19:00 <wolfspraul> but sourcing is a long story, as you know modules are very tough, some things like LCM are tough, rf baseband chips, etc. etc.

19:00 <wolfspraul> I feel pretty good now about the simple 'availability' part of sourcing, I am more worried about iqc nowadays (incoming quality control)

19:00 <wpwrak_> wolfspraul: yeah, eventually i found out what had happened: FIC sourcing had been specifically ordered not to go to digi-key, because they were too expensive, but to try to get things from the official distributors, if possible, free samples

19:01 <wpwrak_> wolfspraul: that little stunt with bypassing FIC sourcing apparently caused some bad blood inside FIC :) so we weren't supposed to repeat this

19:01 <wpwrak_> wolfspraul: but sourcing also got a bit more agile afterwards :)

19:03 <wolfspraul> calling it a day, 3 am here

19:03 <wolfspraul> I'm getting old, I guess :-)

19:04 <wpwrak_> wolfspraul: (lcm) that was the Sony PSP display. so there were actually lots of them at distributors. i don't quite sure what had happened to our supply there - either we forgot to order or the supplier had let us down. hence the (super-expensive

19:04 <wpwrak_> oops. s/quite sure/quite remember/

19:04 <wpwrak_> s/$/) order from mouser/

19:06 <wpwrak_> wolfspraul: untroubled dreams then ! ;-)

19:07 <wolfspraul> I will digest on diodes and forward voltages

19:07 <wolfspraul> n8

20:35 <kristianpaul> hey, this project has the same spartan6 and also a flash prom,, ah

20:35 <kristianpaul> flash PROM for multiboot FPGA po

20:35 <kristianpaul> http://www.ohwr.org/projects/spec/wiki

20:37 <kristianpaul> well, SPI flash..

20:40 <kristianpaul> afaik, i don't have altium to check schematics