#milkymist on 2011-07-28 — irc logs at freenode.irclog.whitequark.org

00:22 <GitHub9> [openwrt-milkymist] larsclausen pushed 4 new commits to master: https://github.com/milkymist/openwrt-milkymist/compare/b0ed756...5a05105

00:22 <GitHub9> [openwrt-milkymist/master] Revert "Clone git repositories with '--depth 1'" - Lars-Peter Clausen

00:22 <GitHub9> [openwrt-milkymist/master] [package] busybox: Properly pass CFLAGS etc. - Lars-Peter Clausen

00:22 <GitHub9> [openwrt-milkymist/master] lm32: Select ext2 by default instead of ramdisk - Lars-Peter Clausen

00:34 <larsc> mwalle: not the most recent version, but it did work before your device tree patch

00:34 <larsc> kristianpaul: i'm afraid i don't have it anymore

00:38 <wpwrak> deleting config files is like burning books ;-)

01:40 <kristianpaul> hehe knew this reply, and i must confess i did it on purpose ;-)

02:16 <qi-bot> test

02:34 <qi-bot> kristianpaul speaking too soon is good :)

02:37 <wolfspraul> kristianpaul: who is writing from the qi-bot console?

02:37 <xiangfu> me.

02:38 <wolfspraul> ah ok. spooky :-)

02:38 <kristianpaul> hehe i was scared for a moment :-)

02:38 <wolfspraul> xiangfu: yes don't do that too much, it may confuse people...

02:39 <xiangfu> wolfspraul, sure. ok.

02:39 <kristianpaul> or identify first ;)

02:41 <xiangfu> the auto build will start within the next 10 hours, after the nanonote build has finished. then we'll see if we got some images.

02:42 <xiangfu> there are some folder names that still need to be confirmed (like bin/milkymist/***). for now I just guessed them. after the first build I will have the correct names.

02:43 <wolfspraul> nice

03:40 <GitHub95> [autotest-m1] xiangfu pushed 2 new commits to master: https://github.com/milkymist/autotest-m1/compare/7822c25...f2c5182

03:40 <GitHub95> [autotest-m1/master] add empty tests_images.c - Xiangfu Liu

03:40 <GitHub95> [autotest-m1/master] cleanup the Makefile - Xiangfu Liu

06:22 <wpwrak> (m1 rework) seems that R157 rework is a bit unreliable

06:23 <wpwrak> aw: did you change those R157 etc. because things didn't work with 4.7 k ? or simply because you measured they were wrong ?

06:25 <aw> I made sure all R157 now is 10K. ;-)

06:26 <aw> some of them weren't replaced with 10K in the factory, but now they are all 10K after my first "Impedance" step.

06:26 <aw> factory missed some R157 though. bad!

06:27 <wpwrak> these R157 they had missed, did they cause tests to fail ?

06:33 <aw> wpwrak, no, not so far. I caught the R157 that hadn't been replaced with 10K during my "Impedance" stage.

06:34 <wpwrak> good. it would have been worrisome if that relatively small change had already caused malfunctions

06:35 <wpwrak> (i.e., we never experimentally established the range of permissible values. so any sign of parameter instability would be bad.)

06:35 <aw> currently the boards where R157 was missed are clearly unrelated to the failed ones. that's what the data shows me now. ;-)

06:35 <wpwrak> very good :)

06:36 <aw> but i do have some boards with d2/d3 dimly lit after the reflash finished.

06:37 <aw> well...i'll keep testing first ...no more chat for now. ;-)

06:37 <aw> of course, if you have any new ideas, let me know. ;-)

06:38 <wpwrak> (testing) good luck ! may the yield be with you ! :)

06:55 <GitHub95> [extras-m1] yizhangsh pushed 1 new commit to master: https://github.com/milkymist/extras-m1/commit/1b6e25f5fc69f85be1ec54dbc2d7ab5615882b1d

06:55 <GitHub95> [extras-m1/master] removed white background in graphics - Yi Zhang

07:48 <GitHub128> [extras-m1] yizhangsh pushed 1 new commit to master: https://github.com/milkymist/extras-m1/commit/59e8a8208d42beada5ae036c1fea8e23abb66991

07:48 <GitHub128> [extras-m1/master] modified m1 graphic - Yi Zhang

08:36 <GitHub135> [linux-milkymist] larsclausen pushed 2 new commits to master: https://github.com/milkymist/linux-milkymist/compare/e94ece6...28b907d

08:36 <GitHub135> [linux-milkymist/master] lm32: Fix led gpio pin numbers - Lars-Peter Clausen

08:36 <GitHub135> [linux-milkymist/master] Merge remote branch 'lm32/master' into lm32 - Lars-Peter Clausen

10:06 <aw> wolfspraul, the file I can upload it as: http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/m1_rc3_test_report.ods

10:07 <aw> wolfspraul, but now Firefox can't show the newest file i uploaded in our wiki. after I restart Firefox it's still the same. what else could cause this?

10:08 <wolfspraul> I'll check

10:21 <kristianpaul> noticed the "No vga screen" comment, kinda often

10:21 <wolfspraul> yes, we will track it down

10:31 <aw> note that 0x7A is interesting: 1. d2/d3 are OFF after a successful reflash, but after the next power up d2/d3 are dimly lit and it can't be reflashed

10:31 <aw> 2. http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/7A-reflash-results

10:32 <aw> 3. after a couple of minutes it can power on/boot up/render (only D3 is ON, no VGA screen)

10:32 <wolfspraul> hmm. I think focus on fully testing all boards first.

10:32 <aw> well..i keep testing

10:33 <wolfspraul> then we need to start fixing, hopefully a lot more boards can be turned to 100% pass status then

10:33 <wolfspraul> seems something is not right with vga on quite a few boards...

10:34 <Alarm> When I run "lm32-rtems4.11-objcopy -Obinary hello hello.bin"

10:34 <Alarm> I have the following error:lm32-rtems4.11-objcopy:hello: File format not recognized

10:38 <lekernel> Alarm, your "hello" file is most probably not OK
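
One quick way to follow up on lekernel's hint is to check whether "hello" is an ELF object at all before handing it to objcopy. The sketch below is only an illustration: the helper names `looks_like_elf` and `file_looks_like_elf` are made up for this example, and it tests nothing beyond the 4-byte ELF magic number, so a file that passes can still be broken in other ways.

```python
# Every ELF file starts with these four bytes: 0x7f followed by "ELF".
ELF_MAGIC = b"\x7fELF"


def looks_like_elf(header: bytes) -> bool:
    """Return True if the given leading bytes carry the ELF magic number."""
    return header[:4] == ELF_MAGIC


def file_looks_like_elf(path: str) -> bool:
    """Hypothetical helper: read the first four bytes of a file and test them."""
    with open(path, "rb") as handle:
        return looks_like_elf(handle.read(4))
```

If `file_looks_like_elf("hello")` returns False, the build step that produced the file is the place to look, not objcopy.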

10:39 <lekernel> wolfspraul, I'd check the SOT23 gates which are used for buffering the sync signals (and cause signal detection pass/fail on the monitor). there was a failed one already on the MIDI of one board.

10:40 <lekernel> each batch, new broken components. after IR sensors and beads, now gates...

10:40 <wolfspraul> that's why you do testing

10:40 <wolfspraul> what leaves Taipei is 100% working

10:41 <wolfspraul> or the test routine is not good enough yet :-)

10:42 <wolfspraul> anyway, one by one. first test all boards. looks messy now, but it will clear up eventually :-)

10:42 <wolfspraul> it has to...

10:42 <wolfspraul> lekernel: btw, you cannot just say "broken components", the reality is in many cases you don't know what happened.

10:42 <wolfspraul> but it doesn't matter as long as our testing is rock solid

10:42 <lekernel> IR and beads definitely were broken

10:43 <wolfspraul> you mean the 1.40 USD beads?

10:43 <lekernel> yes, and the 6 IR sensors on the run1 boards. none of them worked.

10:43 <wolfspraul> ok probably we mean different things with 'broken components'

10:44 <wolfspraul> those beads were most likely not 'broken'

10:44 <lekernel> for me, more than one order of magnitude out of specs qualifies as "broken"

10:44 <wolfspraul> and if all 6 IR sensors on the first run of 6 boards don't work, that's a quite strong indication that they are not all six 'broken' either (in my use of the word)

10:45 <wolfspraul> they are the wrong ones maybe

10:45 <wolfspraul> anyway

10:45 <wolfspraul> it's just different meanings of 'broken'

10:45 <wolfspraul> so far we have 9 or 10 100% pass boards, it's going up :-)

10:46 <lekernel> either way, those boards were assembled with components that did not perform as specified

10:46 <wolfspraul> :-)

10:46 <wolfspraul> yes!

10:49 <lekernel> omg there are 1557 performances at this belarusian festival

10:49 <lekernel> http://flxer.net/performances/

10:54 <lekernel> ok, we should print tons of brochures :)

12:10 <GitHub142> [milkymist] sbourdeauducq created eack (+1 new commit): https://github.com/milkymist/milkymist/compare/e6356d1^...e6356d1

12:10 <GitHub142> [milkymist/eack] TMU: early ack - Sebastien Bourdeauducq

13:10 <Alarm> There it is: my "Hello world" appears on the console on my PC, but not on the screen connected to the M1

13:10 <lekernel> there's no screen console with rtems

13:13 <Alarm> so this is normal, all is well!

13:23 <aw> i tried BEN's original usb cable (the shorter one) instead of the current longer 1.5M Fukang upward 90 degree USB cable (for jtag/serial) to reflash the failed boards that had d2/d3 dimly lit after reflashing. THEN d2/d3 were NOT dimly lit.

13:24 <wolfspraul> interesting discovery!

13:24 <wolfspraul> all sorts of things you see when you work with a lot of boards in sequence, no? :-)

13:25 <wolfspraul> aw: in your milkymist reflash script, there is a line "frequency 6000000" somewhere

13:25 <wolfspraul> do you see that?

13:26 <aw> yes, i saw it. moment

13:26 <wolfspraul> can you try to reduce that to a lower value, and still use the longer 1.5m cable?

13:26 <wolfspraul> which value should we try?

13:26 <aw> see http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/48-reflash-results

13:26 <wolfspraul> maybe try 3000000 first
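
The change wolfspraul asks for amounts to editing one line of the reflash script. As a minimal sketch, assuming the script contains a line of the form `frequency <Hz>` as quoted above, the following rewrites that value in a script's text. The function name `set_jtag_frequency` is hypothetical, and nothing here is taken from the actual Milkymist reflash script.

```python
import re

# Matches a whole line such as "frequency 6000000", keeping any indentation.
_FREQUENCY_LINE = re.compile(r"^([ \t]*frequency[ \t]+)\d+([ \t]*)$", re.MULTILINE)


def set_jtag_frequency(script_text: str, new_hz: int) -> str:
    """Return script_text with every 'frequency <Hz>' line set to new_hz."""
    if new_hz <= 0:
        raise ValueError("frequency must be a positive number of Hz")
    new_text, count = _FREQUENCY_LINE.subn(
        lambda match: f"{match.group(1)}{new_hz}{match.group(2)}", script_text
    )
    if count == 0:
        # Refuse to silently do nothing if the expected line is missing.
        raise ValueError("no 'frequency <Hz>' line found in script")
    return new_text
```

Raising an error when no line matches is a deliberate choice here: a script that was not changed would otherwise look like a successful test at the lower frequency.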

13:26 <wolfspraul> is the problem very reproducible with the longer USB cable (on those boards)?

13:27 <aw> wait...let me see script

13:27 <wolfspraul> I would prefer if you don't switch to the shorter cable now, at least not yet.

13:27 <wolfspraul> the reason is that we include the 1.5m cable in the box, and we are just hiding the problem from our eyes, and pushing it to our users.

13:28 <wolfspraul> let's try lower values and see what happens

13:29 <aw> alright...let me change the script to 3000000 and still use longer 1.5M cable.

13:29 <wolfspraul> can you reproduce the problem well?

13:32 <aw> not sure...i can only take the failed ones and reflash them again. let me try 0x48 again, then try the older failed one. ;-)

13:32 <wolfspraul> you can also go lower, 1000000, even less, 500000

13:32 <wolfspraul> it may become very slow though ;-)

13:33 <wolfspraul> but we need to find out what is a safe value with the cable we include in the box

13:33 <wolfspraul> otherwise our users will run into this and suffer much worse than we suffer now in finding a safe value

13:34 <wolfspraul> I'm not even sure this is the right idea with the frequency setting, but maybe it is...

13:35 <aw> 0x48 now has d2/d3 fully OFF at power on. well..i am now going to reflash @3000000 to see if it reproduces. maybe not, don't know.

13:36 <lekernel> don't bother with that... those JTAG adapters will and should remain mostly unused anyway

13:36 <lekernel> they're just a _developer_ thing, and developers should be able to handle JTAG frequency problems

13:37 <wolfspraul> no not good. I like to understand what I sell, and to know that it works and how it works.

13:37 <lekernel> just get those boards flashed and working in as little time as possible

13:38 <wolfspraul> maybe we should default to a lower frequency value in the reflash script we publish, and then developers who understand things well can increase that value

13:38 <lekernel> no because it's additional delays on us to determine that frequency

13:39 <wolfspraul> argh :-)

13:39 <wolfspraul> you sometimes insist quite hard on causing yourself a big headache later :-) I don't want to support devs who run into this type of problem the first time they fiddle with jtag...

13:40 <wolfspraul> took me 2 hours already and some additional grey hair to narrowly avoid that rejon had no usable m1 at all for his talk...

13:40 <wolfspraul> so if Adam tells me he has a _workaround_ for himself, that's not good enough for me as a manufacturer

13:40 <lekernel> this should not happen with the current software

13:40 <wolfspraul> you mean the web update?

13:41 <lekernel> there is 1) web update for the main images 2) rescue mode in case of problems

13:41 <wolfspraul> so - I always see the positive side. I think Adam's discovery is great, very good observation!

13:41 <lekernel> all the rest is developer (1/1000 users) and unsupported

13:41 <wolfspraul> :-)

13:42 <wolfspraul> at least we have a bar, like a test, developers have to pass :-)

13:42 <aw> done...seems 0x48 is good with the 1.5M cable at 3000000 Hz. btw, from now on, let's use this to reflash the rest and see how easily d2/d3 dimly lit happens after reflash. ;-)

13:42 <wolfspraul> aw: ok, let's do this

13:42 <lekernel> the JTAG cable is only for _FPGA_ development. you can use netboot for all software.

13:42 <wolfspraul> please continue to use the cable we will include

13:42 <lekernel> if you can't fix a JTAG connection, you probably can't program FPGAs either

13:42 <wolfspraul> lower the value to 3000000 for now and let's see whether you run into more cases

13:43 <wolfspraul> if this looks more stable to you, we will change the value in the published script

13:43 <wolfspraul> I'd rather err on the side of robustness, especially out-of-the-box.

13:44 <lekernel> also, this lower value makes flashing boards slower on our side. if they can be flashed correctly at 6MHz with another USB cable, just do it.

13:45 <wolfspraul> but that's a separate reason.

13:51 <lekernel> right now, the major blocker in this project is run3 delays (followed shortly by lack of publicity). it'd make more sense to optimize those than to track down a rare and developer-only JTAG problem.

13:54 <wpwrak> aw, wolfspraul: (cable testing methodology) first, it would be good to do, say, 10 tests with the long cable and the original value. otherwise you don't even know what failure probability to look for.

13:55 <wpwrak> wolfspraul: (grey hair) have you started to let it grow ? :)

13:56 <aw> wpwrak, good reminder. I'll now run the test 10 separate times to see the results. ;-)

14:01 <wpwrak> lekernel: (1557) i was wondering what leet "ISST" was supposed to mean ;-) btw, they're at 1558 now. are you going there ?

14:01 <lekernel> yes

14:02 <wolfspraul> lekernel: come on. we have to lower the barriers of entry. seriously there is no 'delay' because of this. the biggest delay is that Adam ran into this in the first place because we have been sloppy about taking reflashing issues seriously _before_ the run.

14:02 <wpwrak> roh: on some pictures the S looks a bit strange. does it also look odd in real life ?

14:03 <roh> not really

14:03 <wolfspraul> it's a minor hiccup, and good discovery from Adam

14:03 <wolfspraul> let's get back to professional and fast work now, no worries

14:03 <wolfspraul> I do not want Adam to silently switch to a lab-only workaround, and put a cable into the box that he saw problems with but didn't tell anybody.

14:03 <wolfspraul> our users/developers/whoever _WILL_ run into these issues

14:04 <wolfspraul> guaranteed

14:04 <roh> the first 2 pix have the protective foil on the squares not removed

14:04 <wolfspraul> lekernel: http://yamato.hyte.de/tmp/logotest/

14:05 <wpwrak> wolfspraul: yeah, shipping known to be broken stuff is generally not a good idea. the least you can do is find out if lowering the frequency is a suitable workaround. say, if you have 50% failure at 6 MHz and 0% at 3 MHz, that's a good indication.
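
wpwrak's point about knowing the failure probability before comparing frequencies can be put into numbers. The sketch below assumes each reflash attempt fails independently with a fixed probability, which is an idealisation and not something established in the discussion. The function names are invented for this example.

```python
import math


def prob_all_pass(failure_rate: float, trials: int) -> float:
    """Probability that `trials` independent attempts all pass."""
    return (1.0 - failure_rate) ** trials


def trials_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of attempts that shows at least one failure with the
    given confidence, if the true per-attempt failure rate is failure_rate."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))
```

Under that assumption, a cable that fails half the time would pass 10 runs in a row with probability `prob_all_pass(0.5, 10)`, which is 1 in 1024, so 10 clean runs at the lower frequency would be strong evidence. A 10% failure rate is much harder to rule out: `trials_needed(0.1)` gives 29 runs for 95% confidence.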

14:05 <wpwrak> roh: aah ! that's why :)

14:05 <wolfspraul> wpwrak: in a perfect world, the software would make more checks and automatically fall back to the highest 'safe' frequency. but meanwhile I do need to ship with a robust baseline.
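
The automatic fallback wolfspraul describes could look roughly like the sketch below. It is an assumption-laden illustration, not existing Milkymist or UrJTAG code: `attempt` stands for a hypothetical callable that reflashes and verifies at a given frequency and returns True on success, and `highest_safe_frequency` is a name made up here.

```python
from typing import Callable, Iterable, Optional


def highest_safe_frequency(
    candidates_hz: Iterable[int],
    attempt: Callable[[int], bool],
    repeats: int = 3,
) -> Optional[int]:
    """Return the highest candidate frequency at which `attempt` succeeds
    `repeats` times in a row, or None if no candidate is reliable."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    # Try the fastest setting first and step down only when it fails.
    for freq in sorted(set(candidates_hz), reverse=True):
        if all(attempt(freq) for _ in range(repeats)):
            return freq
    return None
```

Requiring several consecutive successes per frequency is a design choice to avoid settling on a value that only worked by luck. It makes the search slower, which matters at the low frequencies mentioned above.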

14:06 <wpwrak> roh: it still sticks out a little on SANY0028.jpg, but not as much as on the previous ones

14:06 <lekernel> since I was against shipping this JTAG stuff in each box, I'm a bit annoyed that it incurs delays now. but well...

14:06 <wolfspraul> roh: looks nice to me. what about size and location?

14:07 <lekernel> excellent logo though :)

14:07 <roh> wolfspraul: location: i would simply center it

14:07 <wolfspraul> lekernel: are you ok with this logo, centered? which size?

14:07 <lekernel> how is this done? http://yamato.hyte.de/tmp/logotest/SANY0021.jpg

14:07 <wolfspraul> lekernel: no you need to look at the later ones

14:07 <lekernel> the insides are de-polished?

14:07 <wolfspraul> that one is misleading

14:07 <wolfspraul> no it has the film on it still :-)

14:07 <lekernel> ah, that's just the film

14:07 <wolfspraul> if I understand things correctly

14:07 <lekernel> ok

14:08 <wolfspraul> look at the last 2

14:08 <lekernel> yeah it's perfect :)

14:08 <wolfspraul> roh wants to skip a full surface scan this time

14:08 <wolfspraul> roh: there you go, that's the green-light :-)

14:08 <wolfspraul> thanks a lot, this is great!!!

14:08 <wolfspraul> wonderful that we got it this far...

14:08 <aw> hmm...seems it's hard to recover from that failure once it has happened. the reflash just stopped on the 5th run with 1.5M & 3MHz.

14:09 <wolfspraul> maybe you will also see it with the shorter cable, if you try often enough

14:09 <aw> when it stops, it stays at "Bitstream length: 1484404"

14:09 <lekernel> aw, this looks like the libusb problem that Jon and I had

14:09 <wolfspraul> I'm a little worried that we don't have full CRC all the time, as per my last understanding at least.

14:09 <wpwrak> aw: maybe it's not the FTDI data speed but a USB signal integrity issue

14:10 <wpwrak> lekernel: or software. always good if you have software to blame ;-)

14:10 <aw> hmm..i feel using 0x48 for the 10 test runs is not a good idea though

14:10 <wolfspraul> wait so we all settled on the logo, right?

14:10 <wolfspraul> seems yes :-)

14:10 <lekernel> yeah, logo is perfect

14:10 <aw> i should use a good board to test cable

14:10 <lekernel> go ahead

14:10 <roh> lekernel: needed to rework the stuff i got from jon.. somehow it wasnt squares etc

14:11 <roh> wolfspraul: ok. will hack up a centering rig now ;)

14:12 <wolfspraul> aw: wait, let's not stray away too far now.

14:12 <wpwrak> roh: on SANY0029.jpg, are there still remains of the film in the grooves ? or why do they look rough ?

14:12 <roh> wpwrak: i guess so.

14:12 <wolfspraul> aw: don't do tests with many different frequencies and cables.

14:12 <wolfspraul> not worth it

14:13 <aw> are you sure?

14:13 <wolfspraul> yes. there are too many combinations and it will create little value. we've been there before, and haven't implemented anything more robust yet.

14:13 <kristianpaul> nice logo !!

14:13 <kristianpaul> (SANY0028.)

14:13 <wolfspraul> if you try with 90 boards it will add more harm than good.

14:14 <wolfspraul> aw: before you tried to reflash 0x48 with the shorter cable, how many times did you try with the 1.5m cable?

14:15 <aw> two times with 1.5m & 6MHz, then i just used the shorter usb one and d2/d3 were no longer dimly lit. isn't the difference obvious?

14:16 <wolfspraul> hmm

14:16 <aw> with the shorter usb one I still used 6MHz

14:17 <wolfspraul> I reluctantly force myself to agree with lekernel :-)

14:18 <wolfspraul> aw: that means: 1) use the shorter one @6mhz for all reflashing now

14:18 <wolfspraul> 2) we still include the 1.5m cable in the box and hope that we can later fix this issue in software

14:19 <wolfspraul> what does everyone think?

14:19 <wolfspraul> cheap Chinese crap manufacturer cutting corners? :-)

14:19 <kristianpaul> hum..

14:19 <kristianpaul> yeah, i guess if a developer has issues they will join here, and we tell them the story :-)

14:19 <wpwrak> wolfspraul: how about doing a proper test but postponing it until a less busy time ? (hoping such a time will come :)

14:19 <lekernel> wpwrak, +1

14:20 <kristianpaul> :-)

14:20 <wolfspraul> the problem is that there may be too many actual root causes now, and Adam is in a tough spot with 90 boards around him and he is focusing on manufacturing yield, i.e. producing as many 100% pass boards as possible, in the least amount of time

14:20 <lekernel> and yes, include the cable

14:20 <kristianpaul> wpwrak: after 1.5m usb cables shipped?

14:20 <wolfspraul> wpwrak: yes correct. same idea different wording.

14:20 <kristianpaul> well as soon as orders come

14:20 <wolfspraul> Adam is not in the right position now with so many boards around him and yield pressure.

14:21 <wolfspraul> I don't want him to get lost in an ocean of cable length & frequency test data now...

14:21 <kristianpaul> yeah, thats messy

14:21 <wolfspraul> aw: did you understand? we all agree now :-)

14:21 <wolfspraul> it's easy: use the short cable for reflashing now, and include the long one in the box later :-)

14:21 <aw> wolfspraul, i am reading all your discussion now and thinking.

14:21 <wpwrak> wolfspraul: in general, if you find this sort of issue, you want to understand them. otherwise, you're quickly juggling too many unknowns. but if you have a procedure that always works, even if very different from the regular procedure, then you can defer solving the issue

14:22 <wpwrak> kristianpaul: (after cable shipped) preferably not ;)

14:22 <wolfspraul> well you've been with rejon, you've seen the issue before...

14:22 <wpwrak> i'm not sure what i saw ;-)

14:22 <wolfspraul> someone just needs to sit down and spend serious time on it, with the many priorities we have that's not going to happen easily

14:22 <kristianpaul> when adam has a little time later, providing info about the libusb version would be nice

14:23 <wolfspraul> so someone has to test with different cables, different frequencies, find the root cause, make the software more robust probably in multiple ways, etc. etc.

14:23 <wolfspraul> but that's not a good thing for Adam to take on now

14:23 <wolfspraul> not at all

14:23 <wpwrak> at the moment, it seems that we have three theories: 1) it's data frequency dependent, 2) it's USB signal integrity, 3) it's libusb

14:24 <kristianpaul> and you miss the hardware!

14:24 <kristianpaul> well, at least the usb cable itself is OK

14:24 <kristianpaul> now i think i understand lekernel's love for USB ;)

14:25 <wpwrak> assuming 3) is a clear bug (and not a case of "uh, this random number seems to be luckier than the previous one"), then 3) should be checked first. then, try the long cable at 6 MHz. if the problem persists, try a lower frequency, 3 MHz or maybe even 1 MHz (assuming there are no known timing constraints on the lower end)

14:26 <lekernel> 3 is a clear bug

14:26 <lekernel> I always failed to reflash the board correctly with the new libusb, like rejon did

14:26 <lekernel> a complete reflash always failed

14:27 <wpwrak> if the long cable still fails at 1 MHz, then it could be either the cable, the PC, or the JTAG board. if the long cable works perfectly at 1 MHz, then you still don't know what exactly is the problem, but you have a very promising work-around.
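
wpwrak's test order can be written down as a small decision function. This is only a sketch of the reasoning in the two messages above: the function name, the parameter names and the wording of the returned advice are invented for illustration, and None stands for "not tested yet".

```python
from typing import Optional


def next_step(
    libusb_known_good: bool,
    fails_at_6mhz: Optional[bool] = None,
    fails_at_1mhz: Optional[bool] = None,
) -> str:
    """Suggest the next test or conclusion for the long-cable reflash problem."""
    # Theory 3 (libusb) is checked first because it is a known bug.
    if not libusb_known_good:
        return "sort out libusb first, then retest"
    if fails_at_6mhz is None:
        return "test the long cable at 6 MHz"
    if not fails_at_6mhz:
        return "problem no longer reproduces at 6 MHz"
    if fails_at_1mhz is None:
        return "retest the long cable at a lower frequency (3 MHz, then 1 MHz)"
    if fails_at_1mhz:
        return "suspect the cable, the PC, or the JTAG board"
    return "root cause still unknown, but a lower frequency is a promising workaround"
```

Each branch corresponds to one test, so a result always narrows the set of remaining theories instead of adding another unexplained data point.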

14:27 <wpwrak> lekernel: oh, so it's a regression. that's bad.

14:28 <kristianpaul> or may be the bug is in urjtag..

14:28 <kristianpaul> for not following last libusb changes :-)

14:28 <wpwrak> lekernel: is that libusb 0.1 vs. 1.0 ? or something within each line ?

14:28 <wolfspraul> I don't think it's a cable issue

14:28 <wolfspraul> guess of course

14:28 <wpwrak> wolfspraul: think or fervently hope ? ;-)

14:28 <lekernel> I don't know

14:29 <wolfspraul> so for me Adam can bypass it now, get the boards reflashed with any cable that works, and still throw the 1.5m one into the box...

14:29 <wolfspraul> guess

14:29 <lekernel> I just downgraded both. that problem had used enough of my time already.

14:29 <wolfspraul> just guess

14:29 <wolfspraul> at some point I agree with lekernel about the importance of focus, so... bypass, throw cable into box, move forward, hope that things will get better over time :-)

14:30 <wolfspraul> also we need to keep in mind that the USB cable itself comes from a very respected vendor, has already undergone testing by that vendor, etc.

14:30 <kristianpaul> yeah

14:30 <wolfspraul> it's not a 'cheap crap' cable sourced at a street corner in Shenzhen

14:30 <kristianpaul> and force users to downgrade libs :-)

14:31 <wpwrak> kristianpaul: not a good idea :)

14:31 <kristianpaul> wpwrak: sure not :-)

14:31 <wpwrak> wolfspraul: could be just an issue on the JTAG side. bad impedance match or such. the thing is high-speed, not just full-speed, isn't it ?

14:32 <wolfspraul> high-speed yes

14:32 <wpwrak> (JTAG side) i mean the board

14:32 <wpwrak> then i can offer a 4th parameter: downgrade to full-speed ;-)

14:34 <wpwrak> if you have poor but not hopeless signal integrity at high-speed, going to full-speed is pretty much guaranteed to solve this ;-)

14:35 <kristianpaul> oh, dear..

14:35 <wpwrak> (not sure how you'd accomplish the downgrade, though. change a bit in the FTDI's EEPROM ?)

14:35 <wpwrak> kristianpaul: USB is great fun :)

14:36 <aw> well...i continue to test with shorter usb cable & 6MHz. :)

14:38 <kristianpaul> wpwrak: not just USB, too many variables here, such as why it worked well on some boards and not on others..

14:39 <wpwrak> kristianpaul: and you wouldn't believe what correct USB signals look like when you measure them along the path. USB is designed to take reflections into account to compensate for other transmission effects.

14:40 <kristianpaul> wpwrak: also that dimly lit still sounds like a power leakage issue to me

14:40 <kristianpaul> wpwrak: (compensate), smart way to avoid bugs :-) and create more fun as you said :)

14:41 <wpwrak> (too many variables) oh, that's why you make a tree :) think of potential causes, then split your tests such that they tell you something useful. branch at each test.

14:41 <kristianpaul> yes

14:41 <wpwrak> (compensate) oh, the electrical side is perfectly sound. it's just extremely confusing until you understand what's going on :)

14:42 <wpwrak> (usb signal) lemme see if i still have my simulation from the happy ghost chase in HXD8 ...

14:42 <wpwrak> (dimly lit) yeah, don't know what that means. only that adam doesn't seem to like it :)

14:43 <kristianpaul> nobody :-|

14:51 <wpwrak> kristianpaul: http://downloads.qi-hardware.com/people/werner/tmp/usb-signal-sim.ps

14:51 <wpwrak> kristianpaul: in real life it actually looks worse

14:52 <wpwrak> kristianpaul: the signal travels from the right to the left. you start with a clean square. at the end you have a bit of overshoot but still good edges. in the middle, you have something a lot scarier ...

14:54 <wpwrak> kristianpaul: in HXD8, we ran into USB stability issues. well, rather, they had already been an old issue in HXD8 when i ran into that project. the hardware folks were quite convinced they had done everything right. so this was presented to me as a software problem.

14:56 <wpwrak> kristianpaul: so i spent a few days sifting through the kernel. i found a couple of small things, but nothing that really looked as if it had enough potential to cause trouble. (the trouble was that ethernet-over-usb would stall after some time, often around 10-30 minutes)

14:58 <wpwrak> kristianpaul: then we thought of examining signal integrity. the problem: where to find the equipment to do this ? well, at FIC, there was one lab where they had a big scope with the USB test software. that was so exclusive that you had to ask for turns. so we got our turn the next day and walked down with our troubled board.

15:00 <wpwrak> kristianpaul: the expert then hooked the board up and showed us the eye diagram (that's a setting where you trigger on both edges of the signal, so you see a pattern that looks like a hexagon)

15:01 <wpwrak> kristianpaul: the eye diagram looked HORRIBLE. not at all like a hexagon. instead, we saw the signal crawl up to a plateau at about half the level, stay there for a bit, then go up some more, etc. basically what you see in the middle of the simulation.

15:03 <wpwrak> kristianpaul: so we said our thanks and went to work on that signal integrity. countless reworks later, we had something like 100 pF of extra capacitance scattered all over the board, the signals looked a bit "cleaner" on the scope ... and the problems were just as bad as before

15:04 <wpwrak> kristianpaul: while the hw team was doing reworks, i went to my office and made this simulation. i was a bit surprised that it also showed the "bad" signal. even though it was supposed to be "perfect". eventually, i realized that we (and the USB expert) had been looking at the wrong end of the cable.

15:06 <wpwrak> kristianpaul: as a little detail, one night, i needed to check something on the scope. alas, i didn't have a good enough instrument at hand. but i remembered we had some really fancy 1 GHz or more beast stored in some forgotten corner. i didn't know which group it belonged to, but hey, who's there to complain at 1 am ? ;-)

15:07 <wpwrak> kristianpaul: so i dragged the thing over and did my things. while playing around, i found that it also had some USB test software installed. turned out that we could have done all the fancy testing at our leisure with that scope, without having to rely on the other lab.

15:09 <wpwrak> kristianpaul: fun fact #2: eventually, our head of EE did a little investigation and found out that this scope (at the value of a decent car) actually belonged to our group ;-)

15:10 <wpwrak> kristianpaul: well, the story continues. i then suggested that we may have a clock instability that may originate from poor power routing or other power contamination. (power went around the CPU in a rather peculiar pattern)

15:13 <wpwrak> kristianpaul: one theory was that some other component may contaminate power. e.g., the GSM modem on the same board. so we tried to remove all other chips, one by one, to see if the problem would stop. that rework was actually amazing. the EE folks removed one BGA after the other, without damaging the board.

15:13 <wpwrak> kristianpaul: alas, by the time when there was little left besides the CPU itself, the USB bug was still alive.

15:14 <aw_> hi i am going to sleep now. let's continue tomorrow. :)

15:14 <wpwrak> kristianpaul: we also added beads all across the power tree, to contain possible sources of contamination, to no avail.

15:14 <aw_> the newest file: http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/m1_rc3_test_report.ods

15:15 <wpwrak> kristianpaul: finally, we started to run out of time. so we put in all our best guesses and hoped for the best, without really being convinced that we had nailed the problem.

15:15 <kristianpaul> wait wait, just reading i was away :)

15:15 <aw_> i added a column to note when the shorter usb cable was used from now on (marked "V" in the rightmost column)

15:16 <wpwrak> kristianpaul: more by accident, i then wandered into the final review meeting of EE. i thought that could be interesting, also because i knew there were some other changes i didn't like, and i was hoping for a chance to kill them.

15:16 <aw_> night

15:16 <kristianpaul> n8

15:17 <wpwrak> kristianpaul: well, at some point, they discussed some of the power changes and showed that region of the PCB. that was the first time i had a good look at the layout. (they used PADS, so access to all those things was difficult)

15:19 <wpwrak> kristianpaul: there, i noticed something rather strange. four large pads from which two traces meandered towards the CPU, crossing a large set of parallel signals, to vanish in some vias, and supposedly to continue from there.

15:19 <wpwrak> kristianpaul: when i asked what that was, they told me it was the crystal. when i asked where those signals would come out again, they pointed to the opposite side of the chip.

15:20 <wpwrak> kristianpaul: the parallel signals i had seen were data and address lines for the RAM.

15:22 <wpwrak> kristianpaul: so the traces between CPU and crystal went from the crystal, right underneath the RAM lines, then tunneled underneath the CPU to its opposite corner, burrowing all their way to the other side of that 6 (i think) layer board and back again, until they finally reached the CPU.

15:22 <wpwrak> kristianpaul: needless to say, there wasn't much ground around these traces either, not even at the same layer

15:26 <wpwrak> kristianpaul: that was the great moment of revelation ;-) it took a bit of discussion until i had the hw team convinced that we could indeed improve this even without having to do a complete re-layout (which, understandably, everyone was afraid of). but then they went at it with gusto. when the revised board was finally made, the USB instability was gone for good :)

15:29 <wolfspraul> kristianpaul: wpwrak [dimly lit] in conjunction with the reflashing problems that sounds like we write corrupted data and then the s-6 hangs on reconfiguration or shortly thereafter

15:29 <wolfspraul> just a guess of course but if the problem goes away with better flashing, I'd say that points away from power problems

15:30 <wpwrak> may just be a separate problem. one being bad flash, the other something with power

15:31 <kristianpaul> wpwrak: remove BGA, nice to watch :)

15:33 <kristianpaul> wpwrak: so how you improved?

15:34 <wpwrak> (bga) that was totally amazing. i expected that a board wouldn't survive more than 1 maybe 2 such changes. also because we didn't have optimal equipment for all this. yet they did this with disdainful ease. one chip after the other, maybe ten of them in total, several of them BGAs. and the board just kept on working.

15:36 <wpwrak> (improved) oh, we moved the crystal traces away from the RAM traces. that was probably the main issue. of course, the whole design regarding the crystal was deeply flawed.

15:37 <kristianpaul> ah, i thought moving traces was going to be another big problem, hopefully not then :-)

15:37 <wpwrak> (improved) then we also shortened the traces a little, made sure they had some shielding above and below them, didn't cross any other high-interference signals, put ground around them and around the crystal.

15:38 <kristianpaul> wolfspraul: (hang) yeah that could make sense

15:38 <wpwrak> (improved) so we basically went from three mortal sins (as far as crystal design is concerned) to only one :)

15:40 <wpwrak> (sins) 1) keep traces short. 2) surround them with ground. 3) keep them away from high-speed signals.

15:41 <wpwrak> of course, running them straight under the RAM signals, which are the fastest and busiest in the whole design, was just golden. that's bordering on sabotage :)

15:42 <wpwrak> oh, and i should mention that the layout had been outsourced. so our hw team didn't commit all those sins themselves. but of course, they should have spotted such things on their own.

15:43 <kristianpaul> sabotage, including the forgotten fancy scope :-)

15:43 <wpwrak> i should also mention that this was long before adam joined :)

15:43 <kristianpaul> (outsourced), yeah blame the third party! ;)

15:48 <wpwrak> (fancy scope) of course, we had only one probe. the others have somehow "wandered off". the fun thing is that FIC was very strict about inventories, even assigning people personal responsibility for purchases they had handled. (so our secretary was personally responsible for some 100+ kUSD of equipment)

15:49 <wpwrak> so it's rather odd that such a valuable item would just completely fall through the cracks

15:49 <kristianpaul> (spot), well, may be a lack of common sense for this kind of design, also that explains the outsourcing itself

15:52 <larsc> hmpf, stupid me, remove mmap support and wonder why nothing works anymore...

15:52 <wpwrak> (spot) c'mon. probably all of them went to university and studied EE (actually, i don't know their biography. that's something wolfgang would know.)

15:53 <wpwrak> larsc: did you replace it with something that fails silently, in a seemingly plausible way ? :)

15:53 <larsc> wpwrak: -ENOSYS

15:53 <kristianpaul> larsc: you're talking of milkymist related stuff? :)

15:53 <wpwrak> larsc: hmm, bad. better return, say malloc(1234);

15:53 <kristianpaul> btw i noticed you derived milkymist openwrt from some *linaro stuff isnt?

15:54 <larsc> wpwrak: it will work once i recompile userspace

15:54 <larsc> libc will use mmap if it is available otherwise mmap2

15:54 <larsc> and since our mmap is just a wrapper around mmap2 we can drop it

15:54 <wpwrak> ah, so it's a migration, not a total removal. now i get it :)

15:56 <larsc> removal of sys_mmap
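
As a rough illustration of why an mmap entry point can be a thin wrapper around mmap2: the two calls differ mainly in how the file offset is expressed, bytes for mmap and 4096-byte units for mmap2 (per the mmap2(2) man page). The sketch below shows only that offset conversion. The helper name `mmap_offset_to_pgoff` and the constant `MMAP2_UNIT_SHIFT` are hypothetical and are not taken from the lm32 kernel sources.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: mmap2 offsets are counted in 4096-byte units. */
#define MMAP2_UNIT_SHIFT 12

/*
 * Hypothetical helper: convert a byte offset, as passed to mmap, into
 * the unit offset that mmap2 expects. Returns 0 on success, or -EINVAL
 * if the byte offset is not a multiple of the unit size.
 */
static long mmap_offset_to_pgoff(uint64_t byte_offset, unsigned long *pgoff)
{
    const uint64_t unit_mask = (UINT64_C(1) << MMAP2_UNIT_SHIFT) - 1;

    if (byte_offset & unit_mask)
        return -EINVAL;

    *pgoff = (unsigned long)(byte_offset >> MMAP2_UNIT_SHIFT);
    return 0;
}
```

With the conversion this small, a libc that can call mmap2 directly has no need for a separate mmap syscall, which matches the "migration, not a total removal" reading above.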

15:58 <GitHub86> [linux-milkymist] larsclausen pushed 3 new commits to master: https://github.com/milkymist/linux-milkymist/compare/28b907d...889c214

15:58 <GitHub86> [linux-milkymist/master] lm32: Drop sys_mmap support - Lars-Peter Clausen

15:58 <GitHub86> [linux-milkymist/master] lm32: Cleanup show_regs a bit - Lars-Peter Clausen

15:58 <GitHub86> [linux-milkymist/master] lm32: Cleanup signal handling - Lars-Peter Clausen

16:04 <larsc> it would be interesting to see if we could squeeze lm32 support into less than 1 kloc

16:08 <wpwrak> larsc: well, what's the maximum line length gcc can handle ? :)

16:09 <larsc> hehe

16:09 <larsc> if we strip all the gpl headers we are actually not far from it

16:25 <GitHub109> [linux-milkymist] larsclausen pushed 3 new commits to master: https://github.com/milkymist/linux-milkymist/compare/889c214...2b719af

16:25 <GitHub109> [linux-milkymist/master] lm32: Do not set USER_DS in flush_thread - Lars-Peter Clausen

16:25 <GitHub109> [linux-milkymist/master] modules: add default loader hook implementations - Jonas Bonn

16:25 <GitHub109> [linux-milkymist/master] lm32: Cleanup module loading - Lars-Peter Clausen

16:27 <larsc> and another 100 lines gone

16:29 <wpwrak> let's see how long until you have a two-liner :)

16:30 <wpwrak> or maybe even a one-liner, if you can find a convenient spot in some makefile

17:14 <kristianpaul> hum i wasn't aware lekernel used twitter to post frequently mm1 related progress

17:15 <kristianpaul> so often

17:16 <kristianpaul> what? there is not rss support in twitter anymore?.. :(

17:20 <lekernel> there is, but they hid it

17:20 <lekernel> check my blog/mailing list

17:24 <kristianpaul> he, spartan3 faster than s6?, just because hold/setup time

17:32 <kristianpaul> now i wonder a s3 milkymist one?

17:34 <kristianpaul> hum price close to s6

17:38 <kristianpaul> what? XC3S2000-4FGG456I 40600 LE is 48.7USD and XC6SLX45-2FGG484C still 39USD

17:38 <kristianpaul> wow

17:40 <wpwrak> at what quantity ?

17:41 <kristianpaul> ah, good point

17:42 <wpwrak> besides, the XC3 seems to have a bit more logic while the XC6 seems to have a bit more RAM. so it's not trivial to compare them. dunno about speed grades.

17:42 <lekernel> s3 is slower and smaller

17:42 <lekernel> and older, more expensive and obsolete sooner

17:42 <lekernel> period

17:43 <lekernel> if we ever change the fpga it will be a 7 or altera

17:44 <kristianpaul> sure, i wasn't pushing you to do it, just intellectual curiosity

17:46 <roh> yay. lasering done.

18:00 <larsc> nah, i'll start moving code to the generic section of the kernel ;)

18:21 <mwalle> mh either my rework is not working or usb/mouse support is not working in the latest snapshot

18:26 <mwalle> mh test tool works

18:27 <mwalle> lekernel: btw was the phy changed? i get unexpected phy id 0045 with the test tool

18:34 <mwalle> wolfspraul: were there any mac addresses assigned to the rc1 boards?

18:36 <mwalle> wolfspraul: found it :)

18:44 <mwalle> cool everything is working :)

18:44 <mwalle> lekernel: thx for the wolfson codec :)

18:44 <mwalle> wpwrak: so i have the second working rework of the ac97 codec :)

18:47 <wpwrak> mwalle: whee ! congratulations !

18:47 <kristianpaul> :-O

18:47 <kristianpaul> kudos indeed, mwalle !

18:48 <kristianpaul> some additional comments for those whose rc2 still has an unfixed ac97 and who may want to do it some day?

18:51 <mwalle> well just remove it with hot air and solder a new one :)

18:51 <mwalle> i'll take a picture later

18:52 <kristianpaul> heh

18:52 <kristianpaul> seems i definitely need a hot air station..

20:38 <lekernel> mwalle, the mdio bit banging code has bugs at times; 0045 = (0022 << 1) | 1 ....
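
The arithmetic in that line can be reproduced with a small sketch. It assumes, purely for illustration, that the bit-banged read started sampling one bit too late and that the MDIO line reads as 1 once the PHY stops driving it (the line is normally pulled up). The function name `mdio_collect16` and the sampled-bit array are hypothetical and are not the actual test tool code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: assemble a 16-bit register value, MSB first,
 * from an array of sampled MDIO bits, starting at index `start`.
 */
static uint16_t mdio_collect16(const uint8_t *bits, size_t start)
{
    uint16_t value = 0;

    for (size_t i = 0; i < 16; i++)
        value = (uint16_t)((value << 1) | (bits[start + i] & 1));

    return value;
}
```

If the sampled bits hold the 16 bits of 0x0022 followed by one idle-high bit, collecting from index 0 gives 0x0022, while collecting from index 1 drops the leading bit and pulls in the trailing 1, giving (0x0022 << 1) | 1 = 0x0045.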

20:45 <lekernel> http://www.reddit.com/r/funny/comments/j26j0/for_engineers/

20:58 <wpwrak> nice ;-)

21:00 <kristianpaul> haha

21:08 <mwalle> lekernel: oh ok, and btw is there sth wrong with bios and vga out? i only get some picture when flickernoise is started

21:09 <lekernel> nope, the BIOS disables video out unless you press the power button long enough or ESC/F8 on the keyboard

21:09 <mwalle> http://walle.cc/mmone/IMG_1265.JPG http://walle.cc/mmone/IMG_1270.JPG

21:09 <lekernel> (or if it can't boot)

21:09 <mwalle> lekernel: ah ok :) so no more splash screen?

21:09 <lekernel> technically yes, but it's not that useful

21:11 <mwalle> btw dunno the voltage rating for the capacitors for the codec, mine were rated 6V3

21:11 <lekernel> for the USB resistors, you should be able to stack them

21:11 <lekernel> (ie mount them on top of the existing varistors)

21:11 <mwalle> see second picture ;)

21:12 <lekernel> yup. but you mounted them close to the varistors, not on top

21:12 <lekernel> seems easier to mount them on top, for me at least :)

21:13 <mwalle> pushed them together with tweezers

21:14 <mwalle> next thing will be a working ir receiver ;)

21:15 <mwalle> btw i noticed a lot of freezes, right after flickernoise has started (and started a video in patch)

21:16 <lekernel> have you shorted L19?

21:17 <mwalle> no

21:17 <mwalle> should i?

21:22 <mwalle> lekernel: but will a non working video input freeze the whole board?

21:26 <lekernel> it should not, but in practice I have seen such things. it could be that the video chip sends some broken data to the video input core, which then DMA's crap all over the address space and crashes the board.

21:27 <lekernel> in a perfect world, the video input core should be robust enough not to do that, but ...

21:27 <mwalle> i'll short it tomorrow ;)

21:28 <lekernel> it's the big ferrite bead close to the video in chip, it's easy to short except that the ground plane sucks a lot of heat from the iron

21:28 <lekernel> do you have spare IR receivers?

21:28 <mwalle> would it make sense to suppress automatic switch to video patches when no valid input signal is detected?

21:28 <mwalle> (ir) nope

21:29 <lekernel> yeah, that's something that should be done

21:29 <lekernel> along with caching the compiled patches

21:29 <lekernel> maybe for flickernoise 1.1 :)

21:29 <mwalle> larsc: cool more generic stuff (modules) :)

22:30 <larsc> mwalle: came with the openrisc linux port

22:31 <GitHub96> [linux-milkymist] mwalle pushed 2 new commits to master: https://github.com/milkymist/linux-milkymist/compare/2b719af...8c38f72

22:31 <GitHub96> [linux-milkymist/master] lm32: syntax fixes - Michael Walle

22:31 <GitHub96> [linux-milkymist/master] lm32: redefine sys_mmap to prevent undef reference - Michael Walle

22:31 <mwalle> larsc: please review these two commits

22:33 <larsc> looks good

22:37 <mwalle> why we undef NR_mmap but not NR_vfork?

22:40 <larsc> because we define our own vfork function in uclibc

22:40 <larsc> but the generic mmap will use NR_mmap if defined otherwise NR_mmap2

22:42 <larsc> hm, i guess my module cleanup was a bit to abious. missed that one function was using Elf32_Rel and the other Elf32_Rela

22:43 <larsc> ambitious
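
For readers unfamiliar with the Rel/Rela distinction mentioned above: an ELF Rel entry keeps its addend in the location being relocated, while a Rela entry carries the addend in the entry itself. The sketch below applies a generic "symbol + addend" 32-bit relocation both ways. The struct and function names are hypothetical, the field layout only mirrors Elf32_Rel and Elf32_Rela, and no lm32-specific relocation type is modelled.

```c
#include <stdint.h>
#include <string.h>

/* Local stand-ins mirroring the field layout of Elf32_Rel / Elf32_Rela. */
struct rel32  { uint32_t r_offset; uint32_t r_info; };
struct rela32 { uint32_t r_offset; uint32_t r_info; int32_t r_addend; };

/* Rel: the addend is read from the section contents at r_offset. */
static void apply_abs32_rel(uint8_t *section, const struct rel32 *r,
                            uint32_t sym_value)
{
    uint32_t addend;
    uint32_t value;

    memcpy(&addend, section + r->r_offset, sizeof addend);
    value = sym_value + addend;
    memcpy(section + r->r_offset, &value, sizeof value);
}

/* Rela: the addend comes from the entry; old contents are ignored. */
static void apply_abs32_rela(uint8_t *section, const struct rela32 *r,
                             uint32_t sym_value)
{
    uint32_t value = sym_value + (uint32_t)r->r_addend;

    memcpy(section + r->r_offset, &value, sizeof value);
}
```

Code written for one entry type cannot simply be reused for the other: the entries differ in size, and a Rel handler reads the target location before writing it.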

22:49 <larsc> i've been wondering whether we should treat scall like a normal function call and not save/restore r0-r10. Since for most functions it will be a tail call they won't use the restored regs anyway