<GitHub9>
[openwrt-milkymist/master] lm32: Select ext2 by default instead of ramdisk - Lars-Peter Clausen
<larsc>
mwalle: not the most recent version, but it did work before your device tree patch
<larsc>
kristianpaul: i'm afraid i don't have it anymore
<wpwrak>
deleting config files is like burning books ;-)
<kristianpaul>
hehe knew this reply, and i must confess i did it on purpose ;-)
<qi-bot>
test
<qi-bot>
kristianpaul speaking too soon is good :)
<wolfspraul>
kristianpaul: who is writing from the qi-bot console?
<xiangfu>
me.
<wolfspraul>
ah ok. spooky :-)
<kristianpaul>
hehe i was scared for a moment :-)
<wolfspraul>
xiangfu: yes don't do that too much, it may confuse people...
<xiangfu>
wolfspraul, sure. ok.
<kristianpaul>
or identify first ;)
<xiangfu>
the auto build will start within the next 10 hours, after the nanonote build has finished. then we'll see if we get some images.
<xiangfu>
there are some folder names I still need to confirm (like bin/milkymist/***). for now I just guess them. after the first build I will have the correct names.
<aw>
wolfspraul, but now Firefox can't show the newest file i uploaded to our wiki. after I restart Firefox, still the same. what else could cause this?
<wolfspraul>
I'll check
<kristianpaul>
noticed the "No vga screen" comment, kinda often
<wolfspraul>
yes, we will track it down
<aw>
note that 0x7A is interesting: 1. d2/d3 OFF after reflashing successfully, but after power up d2/d3 are dimly lit and then it can't be reflashed
<aw>
3. after a couple of minutes it can then power on/boot up/render (only D3 is ON, no VGA screen)
<wolfspraul>
hmm. I think focus on fully testing all boards first.
<aw>
well..i keep testing
<wolfspraul>
then we need to start fixing, hopefully a lot more boards can be turned to 100% pass status then
<wolfspraul>
seems something is not right with vga on quite a few boards...
<Alarm>
When I run "lm32-rtems4.11-objcopy -Obinary hello hello.bin"
<Alarm>
I have the following error:lm32-rtems4.11-objcopy:hello: File format not recognized
<lekernel>
Alarm, your "hello" file is most probably not OK
<lekernel>
wolfspraul, I'd check the SOT23 gates which are used for buffering the sync signals (and cause signal detection pass/fail on the monitor). there was a failed one already on the MIDI of one board.
<lekernel>
each batch, new broken components. after IR sensors and beads, now gates...
<wolfspraul>
that's why you do testing
<wolfspraul>
what leaves Taipei is 100% working
<wolfspraul>
or the test routine is not good enough yet :-)
<wolfspraul>
anyway, one by one. first test all boards. looks messy now, but it will clear up eventually :-)
<wolfspraul>
it has to...
<wolfspraul>
lekernel: btw, you cannot just say "broken components", the reality is in many cases you don't know what happened.
<wolfspraul>
but it doesn't matter as long as our testing is rock solid
<lekernel>
IR and beads definitely were broken
<wolfspraul>
you mean the 1.40 USD beads?
<lekernel>
yes, and the 6 IR sensors on the run1 boards. none of them worked.
<wolfspraul>
ok probably we mean different things with 'broken components'
<wolfspraul>
those beads were most likely not 'broken'
<lekernel>
for me, more than one order of magnitude out of specs qualifies as "broken"
<wolfspraul>
and if all 6 IR sensors on the first run of 6 boards don't work, that's a quite strong indication that they are not all six 'broken' either (in my use of the word)
<wolfspraul>
they are the wrong ones maybe
<wolfspraul>
anyway
<wolfspraul>
it's just different meanings of 'broken'
<wolfspraul>
so far we have 9 or 10 100% pass boards, it's going up :-)
<lekernel>
either way, those boards were assembled with components that did not perform as specified
<wolfspraul>
:-)
<wolfspraul>
yes!
<lekernel>
omg there are 1557 performances at this belarusian festival
<GitHub142>
[milkymist/eack] TMU: early ack - Sebastien Bourdeauducq
<Alarm>
There it is: my "Hello world" appears on the console on my PC but not on the screen connected to the M1
<lekernel>
there's no screen console with rtems
<Alarm>
so this is normal, all is well!
<aw>
i tried BEN's original usb cable (the shorter one) instead of the current longer 1.5M Fukang upward 90 degree USB cable (for jtag/serial) to flash the failed ones that had d2/d3 dimly lit after reflashing. THEN NO d2/d3 dimly lit.
<wolfspraul>
interesting discovery!
<wolfspraul>
all sorts of things you see when you work with a lot of boards in sequence, no? :-)
<wolfspraul>
aw: in your milkymist reflash script, there is a line "frequency 6000000" somewhere
<wolfspraul>
do you see that?
<aw>
yes, i saw it. moment
<wolfspraul>
can you try to reduce that to a lower value, and still use the longer 1.5m cable?
<wolfspraul>
is the problem very reproducible with the longer USB cable (on those boards)?
<aw>
wait...let me see script
<wolfspraul>
I would prefer if you don't switch to the shorter cable now, at least not yet.
<wolfspraul>
the reason is that we include the 1.5m cable in the box, and we are just hiding the problem from our eyes, and pushing it to our users.
<wolfspraul>
let's try lower values and see what happens
<aw>
alright...let me change the script to 3000000 and still use longer 1.5M cable.
<wolfspraul>
can you reproduce the problem well?
<aw>
not sure...but i can only take those failed ones and reflash them again. let me try 0x48 again. then try the old failed one. ;-)
<wolfspraul>
you can also go lower, 1000000, even less, 500000
<wolfspraul>
it may become very slow though ;-)
<wolfspraul>
but we need to find out what is a safe value with the cable we include in the box
<wolfspraul>
otherwise our users will run into this and suffer much worse than we suffer now in finding a safe value
<wolfspraul>
I'm not even sure this is the right idea with the frequency setting, but maybe it is...
<aw>
0x48 now has d2/d3 fully OFF at power on. well..i am now going to reflash @3000000 to see if it will reproduce again. maybe not, don't know.
<lekernel>
don't bother with that... those JTAG adapters will and should remain mostly unused anyway
<lekernel>
they're just a _developer_ thing, and developers should be able to handle JTAG frequency problems
<wolfspraul>
no not good. I like to understand what I sell, and to know that it works and how it works.
<lekernel>
just get those boards flashed and working in as little time as possible
<wolfspraul>
maybe we should default to a lower frequency value in the reflash script we publish, and then developers who understand things well can increase that value
<lekernel>
no because it's additional delays on us to determine that frequency
<wolfspraul>
argh :-)
<wolfspraul>
you are sometimes quite insistent on causing yourself a big headache later :-) I don't want to support devs who run into this type of problem the first time they fiddle with jtag...
<wolfspraul>
took me 2 hours already and some additional grey hair to narrowly avoid that rejon had no usable m1 at all for his talk...
<wolfspraul>
so if Adam tells me he has a _workaround_ for himself, that's not good enough for me as a manufacturer
<lekernel>
this should not happen with the current software
<wolfspraul>
you mean the web update?
<lekernel>
there is 1) web update for the main images 2) rescue mode in case of problems
<wolfspraul>
so - I always see the positive side. I think Adam's discovery is great, very good observation!
<lekernel>
all the rest is developer (1/1000 users) and unsupported
<wolfspraul>
:-)
<wolfspraul>
at least we have a bar, like a test, developers have to pass :-)
<aw>
done...seems 0x48 is good with the 1.5M cable at 3000000 Hz. btw, from now on, let's use this to reflash the rest to see if d2/d3 dimly lit easily happens after reflash. ;-)
<wolfspraul>
aw: ok, let's do this
<lekernel>
the JTAG cable is only for _FPGA_ development. you can use netboot for all software.
<wolfspraul>
please continue to use the cable we will include
<lekernel>
if you can't fix a JTAG connection, you probably can't program FPGAs either
<wolfspraul>
lower the value to 3000000 for now and let's see whether you run into more cases
<wolfspraul>
if this looks more stable to you, we will change the value in the published script
<wolfspraul>
I rather err on the side of robustness, especially out-of-the-box.
<lekernel>
also, this lower value makes flashing boards slower on our side. if they can be flashed correctly at 6MHz with another USB cable, just do it.
<wolfspraul>
but that's a separate reason.
<lekernel>
right now, the major blocker in this project is run3 delays (followed shortly by lack of publicity). it'd rather make sense to optimize those rather than track down a rare and developer-only JTAG problem.
<wpwrak>
aw, wolfspraul: (cable testing methodology) first, it would be good to do, say, 10 tests with the long cable and the original value. otherwise you don't even know what failure probability to look for.
<wpwrak>
wolfspraul: (grey hair) have you started to let it grow ? :)
<aw>
wpwrak, good reminder, I'll now test 10 times and note each result individually. ;-)
<wpwrak>
lekernel: (1557) i was wondering what leet "ISST" was supposed to mean ;-) btw, they're at 1558 now. are you going there ?
<lekernel>
yes
<wolfspraul>
lekernel: come on. we have to lower the barriers of entry. seriously there is no 'delay' because of this. the biggest delay is that Adam ran into this in the first place because we have been sloppy about taking reflashing issues seriously _before_ the run.
<wpwrak>
roh: on some pictures the S looks a bit strange. does it also look odd in real life ?
<roh>
not really
<wolfspraul>
it's a minor hiccup, and good discovery from Adam
<wolfspraul>
let's get back to professional and fast work now, no worries
<wolfspraul>
I do not want Adam to silently switch to a lab-only workaround, and put a cable into the box that he saw problems with but didn't tell anybody.
<wolfspraul>
our users/developers/whoever _WILL_ run into these issues
<wolfspraul>
guaranteed
<roh>
the first 2 pix have the protective foil on the squares not removed
<wpwrak>
wolfspraul: yeah, shipping known to be broken stuff is generally not a good idea. the least you can do is find out if lowering the frequency is a suitable workaround. say, if you have 50% failure at 6 MHz and 0% at 3 MHz, that's a good indication.
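wpwrak's point about first knowing the failure probability can be put into numbers. A minimal sketch, assuming every reflash attempt fails independently with the same probability (an idealization); the function names are made up for illustration and are not part of any reflash tooling:

```python
import math


def prob_all_pass(p_fail: float, attempts: int) -> float:
    """Chance that `attempts` independent reflashes all succeed
    when each one fails with probability `p_fail`."""
    return (1.0 - p_fail) ** attempts


def attempts_needed(p_fail: float, confidence: float = 0.95) -> int:
    """Smallest number of attempts after which at least one failure
    shows up with probability >= `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


# At a 50% failure rate, 10 clean runs in a row would be a 1-in-1024 fluke.
print(prob_all_pass(0.5, 10))
# A rarer failure (10%) needs many more attempts before a clean streak means much.
print(attempts_needed(0.5), attempts_needed(0.1))
```

So ten runs per configuration is plenty to tell "50% failures" from "0% failures", but a clean streak of ten says little about a problem that only hits one run in ten.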
<wpwrak>
roh: aah ! that's why :)
<wolfspraul>
wpwrak: in a perfect world, the software would make more checks and automatically fall back to the highest 'safe' frequency. but meanwhile I do need to ship with a robust baseline.
<wpwrak>
roh: it still sticks out a little on SANY0028.jpg, but not as much as on the previous ones
<lekernel>
since I was against shipping this JTAG stuff in each box, I'm a bit annoyed that it incurs delays now. but well...
<wolfspraul>
roh: looks nice to me. what about size and location?
<lekernel>
excellent logo though :)
<roh>
wolfspraul: location: i would simply center it
<wolfspraul>
lekernel: are you ok with this logo, centered? which size?
<wolfspraul>
lekernel: no you need to look at the later ones
<lekernel>
the insides are de-polished?
<wolfspraul>
that one is misleading
<wolfspraul>
no it has the film on it still :-)
<lekernel>
ah, that's just the film
<wolfspraul>
if I understand things correctly
<lekernel>
ok
<wolfspraul>
look at the last 2
<lekernel>
yeah it's perfect :)
<wolfspraul>
roh wants to skip a full surface scan this time
<wolfspraul>
roh: there you go, that's the green-light :-)
<wolfspraul>
thanks a lot, this is great!!!
<wolfspraul>
wonderful that we got it this far...
<aw>
hmm...seems it's hard to recover from that failure once it has happened. i just got a reflash that stopped at the 5th time with 1.5M & 3MHz.
<wolfspraul>
maybe you will also see it with the shorter cable, if you try often enough
<aw>
when it stops, it stays at "Bitstream length: 1484404"
<lekernel>
aw, this looks like the libusb problem that Jon and I had
<wolfspraul>
I'm a little worried that we don't have full CRC all the time, as per my last understanding at least.
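One way to get the kind of check wolfspraul is asking for is to read the data back after writing and compare checksums. A minimal sketch using Python's standard `zlib.crc32`; this is not how the reflash script or the JTAG tool works, and `verify_readback` plus the byte strings are hypothetical stand-ins for the image that was written and the data read back from flash:

```python
import zlib


def image_crc(data: bytes) -> int:
    """CRC-32 of an image, as an unsigned 32-bit value."""
    return zlib.crc32(data) & 0xFFFFFFFF


def verify_readback(written: bytes, readback: bytes) -> bool:
    """True if the read-back data has the same length and CRC-32 as the image."""
    return len(written) == len(readback) and image_crc(written) == image_crc(readback)


# Hypothetical image and a copy with a single flipped bit.
image = bytes(range(256)) * 4
corrupted = bytearray(image)
corrupted[100] ^= 0x01

print(verify_readback(image, image))
print(verify_readback(image, bytes(corrupted)))
```

A check like this would at least turn silent corruption into a visible failure, whatever the root cause turns out to be.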
<wpwrak>
aw: maybe it's not the FTDI data speed but a USB signal integrity issue
<wpwrak>
lekernel: or software. always good if you have software to blame ;-)
<aw>
hmm..i felt using 0x48 to test 10 times is not a good idea though
<wolfspraul>
wait so we all settled on the logo, right?
<wolfspraul>
seems yes :-)
<lekernel>
yeah, logo is perfect
<aw>
i should use a good board to test cable
<lekernel>
go ahead
<roh>
lekernel: needed to rework the stuff i got from jon.. somehow it wasnt squares etc
<roh>
wolfspraul: ok. will hack up a centering rig now ;)
<wolfspraul>
aw: wait, let's not stray away too far now.
<wpwrak>
roh: on SANY0029.jpg, are there still remains of the film in the grooves ? or why do they look rough ?
<roh>
wpwrak: i guess so.
<wolfspraul>
aw: don't do tests with many different frequencies and cables.
<wolfspraul>
not worth it
<aw>
are you sure?
<wolfspraul>
yes. there are too many combinations and it will create little value. we've been there before, and haven't implemented anything more robust yet.
<kristianpaul>
nice logo !!
<kristianpaul>
(SANY0028.)
<wolfspraul>
if you try with 90 boards it will add more harm than good.
<wolfspraul>
aw: before you tried to reflash 0x48 with the shorter cable, how many times did you try with the 1.5m cable?
<aw>
two times with 1.5m & 6MHz, then i just used the shorter usb one and there was no d2/d3 dimly lit. is the difference clear enough?
<wolfspraul>
hmm
<aw>
the shorter usb one I still used 6MHz
<wolfspraul>
I reluctantly force myself to agree with lekernel :-)
<wolfspraul>
aw: that means: 1) use the shorter one @6mhz for all reflashing now
<wolfspraul>
2) we still include the 1.5m cable in the box and hope that we can later fix this issue in software
<wolfspraul>
what does everyone think?
<wolfspraul>
cheap Chinese crap manufacturer cutting corners? :-)
<kristianpaul>
hum..
<kristianpaul>
yeah, i guess if a developer has issues they will join here, and we'll tell the story :-)
<wpwrak>
wolfspraul: how about doing a proper test but postponing it until a less busy time ? (hoping such a time will come :)
<lekernel>
wpwrak, +1
<kristianpaul>
:-)
<wolfspraul>
the problem is that there may be too many actual root causes now, and Adam is in a tough spot with 90 boards around him and he is focusing on manufacturing yield, i.e. producing as many 100% pass boards as possible, in the least amount of time
<lekernel>
and yes, include the cable
<kristianpaul>
wpwrak: after 1.5m usb cables shipped?
<wolfspraul>
wpwrak: yes correct. same idea different wording.
<kristianpaul>
well as soon as orders come
<wolfspraul>
Adam is not in the right position now with so many boards around him and yield pressure.
<wolfspraul>
I don't want him to get lost in an ocean of cable length & frequency test data now...
<kristianpaul>
yeah, thats messy
<wolfspraul>
aw: did you understand? we all agree now :-)
<wolfspraul>
it's easy: use the short cable for reflashing now, and include the long one in the box later :-)
<aw>
wolfspraul, i am reading all your discussion now and thinking.
<wpwrak>
wolfspraul: in general, if you find this sort of issue, you want to understand them. otherwise, you're quickly juggling too many unknowns. but if you have a procedure that always works, even if very different from the regular procedure, then you can defer solving the issue
<wpwrak>
kristianpaul: (after cable shipped) preferably not ;)
<wolfspraul>
well you've been with rejon, you've seen the issue before...
<wpwrak>
i'm not sure what i saw ;-)
<wolfspraul>
someone just needs to sit down and spend serious time on it, with the many priorities we have that's not going to happen easily
<kristianpaul>
when adam has a little time later, providing info about the libusb version would be nice
<wolfspraul>
so someone has to test with different cables, different frequencies, find the root cause, make the software more robust probably in multiple ways, etc. etc.
<wolfspraul>
but that's not a good thing for Adam to take on now
<wolfspraul>
not at all
<wpwrak>
at the moment, it seems that we have three theories: 1) it's data frequency dependent, 2) it's USB signal integrity, 3) it's libusb
<kristianpaul>
and you miss the hardware!
<kristianpaul>
well, at least the usb cable itself is OK
<kristianpaul>
now i think i understand lekernel's love for USB ;)
<wpwrak>
assuming 3) is a clear bug (and not a case of "uh, this random number seems to be luckier than the previous one"), then 3) should be checked first. then, try the long cable at 6 MHz. if the problem persists, try a lower frequency, 3 MHz or maybe even 1 MHz (assuming there are no known timing constraints on the lower end)
<lekernel>
3 is a clear bug
<lekernel>
I always failed to reflash the board correctly with the new libusb, like rejon did
<lekernel>
a complete reflash always failed
<wpwrak>
if the long cable still fails at 1 MHz, then it could be either the cable, the PC, or the JTAG board. if the long cable works perfectly at 1 MHz, then you still don't know what exactly is the problem, but you have a very promising work-around.
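wpwrak's test order for the long cable can be sketched as a small decision procedure. This assumes the libusb theory has already been dealt with separately, and `flash_ok` is a hypothetical callback standing in for "reflash at this JTAG frequency and report whether it worked"; nothing here talks to real hardware:

```python
from typing import Callable, Sequence


def diagnose(flash_ok: Callable[[int], bool],
             frequencies_hz: Sequence[int] = (6_000_000, 3_000_000, 1_000_000)) -> str:
    """Try the long cable from the highest frequency down and
    return the conclusion the chat draws for each outcome."""
    for index, freq in enumerate(frequencies_hz):
        if flash_ok(freq):
            if index == 0:
                return f"no problem reproduced at {freq} Hz"
            return f"work-around: flash at {freq} Hz (root cause still unknown)"
    return "fails even at the lowest frequency: suspect cable, PC, or JTAG board"


# Example: a setup that only flashes reliably at 3 MHz and below.
print(diagnose(lambda freq: freq <= 3_000_000))
```

In practice each `flash_ok` call would have to cover several attempts, since a single pass at a given frequency proves little.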
<wpwrak>
lekernel: oh, so it's a regression. that's bad.
<kristianpaul>
or may be the bug is in urjtag..
<kristianpaul>
for not following last libusb changes :-)
<wpwrak>
lekernel: is that libusb 0.1 vs. 1.0 ? or something within each line ?
<wolfspraul>
I don't think it's a cable issue
<wolfspraul>
guess of course
<wpwrak>
wolfspraul: think or fervently hope ? ;-)
<lekernel>
I don't know
<wolfspraul>
so for me Adam can bypass it now, get the boards reflashed with any cable that works, and still throw the 1.5m one into the box...
<wolfspraul>
guess
<lekernel>
I just downgraded both. that problem had used enough of my time already.
<wolfspraul>
just guess
<wolfspraul>
at some point I agree with lekernel about the importance of focus, so... bypass, throw cable into box, move forward, hope that things will get better over time :-)
<wolfspraul>
also we need to keep in mind that the USB cable itself comes from a very respected vendor, has already undergone testing by that vendor, etc.
<kristianpaul>
yeah
<wolfspraul>
it's not a 'cheap crap' cable sourced at a street corner in Shenzhen
<kristianpaul>
and force users to downgrade libs :-)
<wpwrak>
kristianpaul: not a good idea :)
<kristianpaul>
wpwrak: sure not :-)
<wpwrak>
wolfspraul: could be just an issue on the JTAG side. bad impedance match or such. the thing is high-speed, not just full-speed, isn't it ?
<wolfspraul>
high-speed yes
<wpwrak>
(JTAG side) i mean the board
<wpwrak>
then i can offer a 4th parameter: downgrade to full-speed ;-)
<wpwrak>
if you have poor but not hopeless signal integrity at high-speed, going to full-speed is pretty much guaranteed to solve this ;-)
<kristianpaul>
oh, dear..
<wpwrak>
(not sure how you'd accomplish the downgrade, though. change a bit in the FTDI's EEPROM ?)
<wpwrak>
kristianpaul: USB is great fun :)
<aw>
well...i continue to test with shorter usb cable & 6MHz. :)
<kristianpaul>
wpwrak: not just USB, too many variables here, like why it worked well on some boards and not on others..
<wpwrak>
kristianpaul: and you wouldn't believe what correct USB signals look like when you measure them along the path. USB is designed to take into account reflections to compensate for other transmission effects.
<kristianpaul>
wpwrak: also that dimly lit sounds like leaking power issue for me still
<kristianpaul>
wpwrak: (compensate), smart way to avoid bugs :-) and create more fun as you said :)
<wpwrak>
(too many variables) oh, that's why you make a tree :) think of potential causes, then split your tests such that they tell you something useful. branch at each test.
<kristianpaul>
yes
<wpwrak>
(compensate) oh, the electrical side is perfectly sound. it's just extremely confusing until you understand what's going on :)
<wpwrak>
(usb signal) lemme see if i still have my simulation from the happy ghost chase in HXD8 ...
<wpwrak>
(dimly lit) yeah, don't know what that means. only that adam doesn't seem to like it :)
<wpwrak>
kristianpaul: in real life it actually looks worse
<wpwrak>
kristianpaul: the signal travels from the right to the left. you start with a clean square. at the end you have a bit of overshoot but still good edges. in the middle, you have something a lot scarier ...
<wpwrak>
kristianpaul: in HXD8, we ran into USB stability issues. well, rather, they had already been an old issue in HXD8 when i ran into that project. the hardware folks were quite convinced they had done everything right. so this was presented to me as a software problem.
<wpwrak>
kristianpaul: so i spent a few days sifting through the kernel. i found a couple of small things, but nothing that really looked as if it had enough potential to cause trouble. (the trouble was that ethernet-over-usb would stall after some time, often around 10-30 minutes)
<wpwrak>
kristianpaul: then we thought of examining signal integrity. the problem: where to find the equipment to do this ? well, at FIC, there was one lab where they had a big scope with the USB test software. that was so exclusive that you had to ask for turns. so we got our turn the next day and walked down with our troubled board.
<wpwrak>
kristianpaul: the expert then hooked the board up and showed us the eye diagram (that's a setting where you trigger on both edges of the signal, so you see a pattern that looks like a hexagon)
<wpwrak>
kristianpaul: the eye diagram looked HORRIBLE. not at all like a hexagon. instead, we saw the signal crawl up to a plateau at about half the level, stay there for a bit, then go up some more, etc. basically what you see in the middle of the simulation.
<wpwrak>
kristianpaul: so we said our thanks and went to work on that signal integrity. countless reworks later, we had something like 100 pF of extra capacitance scattered all over the board, the signals looked a bit "cleaner" on the scope ... and the problems were just as bad as before
<wpwrak>
kristianpaul: while the hw team was doing reworks, i went to my office and made this simulation. i was a bit surprised that it also showed the "bad" signal. even though it was supposed to be "perfect". eventually, i realized that we (and the USB expert) had been looking at the wrong end of the cable.
<wpwrak>
kristianpaul: as a little detail, one night, i needed to check something on the scope. alas, i didn't have a good enough instrument at hand. but i remembered we had some really fancy 1 GHz or more beast stored in some forgotten corner. i didn't know which group it belonged to, but hey, who's there to complain at 1 am ? ;-)
<wpwrak>
kristianpaul: so i dragged the thing over and did my things. while playing around, i found that it also had some USB test software installed. turned out that we could have done all the fancy testing at our leisure with that scope, without having to rely on the other lab.
<wpwrak>
kristianpaul: fun fact #2: eventually, our head of EE did a little investigation and found out that this scope (at the value of a decent car) actually belonged to our group ;-)
<wpwrak>
kristianpaul: well, the story continues. i then suggested that we may have a clock instability that may originate from poor power routing or other power contamination. (power went around the CPU in a rather peculiar pattern)
<wpwrak>
kristianpaul: one theory was that some other component may contaminate power. e.g., the GSM modem on the same board. so we tried to remove all other chips, one by one, to see if the problem would stop. that rework was actually amazing. the EE folks removed one BGA after the other, without damaging the board.
<wpwrak>
kristianpaul: alas, by the time when there was little left besides the CPU itself, the USB bug was still alive.
<aw_>
hi i am going to sleep now. let's continue tomorrow. :)
<wpwrak>
kristianpaul: we also added beads all across the power tree, to contain possible sources of contamination, to no avail.
<wpwrak>
kristianpaul: finally, we started to run out of time. so we put in all our best guesses and hoped for the best, without really being convinced that we had nailed the problem.
<kristianpaul>
wait wait, just reading i was away :)
<aw_>
i made a column to note a shorter usb cable from now on (marked "V" at the most right column)
<wpwrak>
kristianpaul: more by accident, i then wandered into the final review meeting of EE. i thought that could be interesting, also because i knew there were some other changes i didn't like, and i was hoping for a chance to kill them.
<aw_>
night
<kristianpaul>
n8
<wpwrak>
kristianpaul: well, at some point, they discussed some of the power changes and showed that region of the PCB. that was the first time i had a good look at the layout. (they used PADS, so access to all those things was difficult)
<wpwrak>
kristianpaul: there, i noticed something rather strange. four large pads from which two traces meandered towards the CPU, crossing a large set of parallel signals, to vanish in some vias, and supposedly to continue from there.
<wpwrak>
kristianpaul: when i asked what that was, they told me it was the crystal. when i asked where those signals would come out again, they pointed to the opposite side of the chip.
<wpwrak>
kristianpaul: the parallel signals i had seen were data and address lines for the RAM.
<wpwrak>
kristianpaul: so the traces between CPU and crystal went from the crystal, right underneath the RAM lines, then tunneled underneath the CPU to its opposite corner, burrowing all their way to the other side of that 6 (i think) layer board and back again, until they finally reached the CPU.
<wpwrak>
kristianpaul: needless to say, there wasn't much ground around these traces either, not even at the same layer
<wpwrak>
kristianpaul: that was the great moment of revelation ;-) it took a bit of discussion until i had the hw team convinced that we could indeed improve this even without having to do a complete re-layout (which, understandably, everyone was afraid of). but then they went at it with gusto. when the revised board was finally made, the USB instability was gone for good :)
<wolfspraul>
kristianpaul: wpwrak [dimly lit] in conjunction with the reflashing problems that sounds like we write corrupted data and then the s-6 hangs on reconfiguration or shortly thereafter
<wolfspraul>
just a guess of course but if the problem goes away with better flashing, I'd say that points away from power problems
<wpwrak>
may just be a separate problem. one being bad flash, the other something with power
<kristianpaul>
wpwrak: remove BGA, nice to watch :)
<kristianpaul>
wpwrak: so how you improved?
<wpwrak>
(bga) that was totally amazing. i expected that a board wouldn't survive more than 1 maybe 2 such changes. also because we didn't have optimal equipment for all this. yet they did this with disdainful ease. one chip after the other, maybe ten of them in total, several of them BGAs. and the board just kept on working.
<wpwrak>
(improved) oh, we moved the crystal traces away from the RAM traces. that was probably the main issue. of course, the whole design regarding the crystal was deeply flawed.
<kristianpaul>
ah, i thought moving traces was going to be another big problem, hopefully not then :-)
<wpwrak>
(improved) then we also shortened the traces a little, made sure they had some shielding above and below them, didn't cross any other high-interference signals, put ground around them and around the crystal.
<kristianpaul>
wolfspraul: (hang) yeah that could make sense
<wpwrak>
(improved) so we basically went from three mortal sins (as far as crystal design is concerned) to only one :)
<wpwrak>
(sins) 1) keep traces short. 2) surround them with ground. 3) keep them away from high-speed signals.
<wpwrak>
of course, running them straight under the RAM signals, which are the fastest and busiest in the whole design, was just golden. that's bordering on sabotage :)
<wpwrak>
oh, and i should mention that the layout had been outsourced. so our hw team didn't commit all those sins themselves. but of course, they should have spotted such things on their own.
<kristianpaul>
sabotage, including the forgotten fancy scope :-)
<wpwrak>
i should also mention that this was long before adam joined :)
<kristianpaul>
(outsourced), yeah blame the third party! ;)
<wpwrak>
(fancy scope) of course, we had only one probe. the others have somehow "wandered off". the fun thing is that FIC was very strict about inventories, even assigning people personal responsibility for purchases they had handled. (so our secretary was personally responsible for some 100+ kUSD of equipment)
<wpwrak>
so it's rather odd that such a valuable item would just completely fall through the cracks
<kristianpaul>
(spot), well, maybe a lack of common sense for this kind of design, which would also explain the outsourcing itself
<larsc>
hmpf, stupid me, remove mmap support and wonder why nothing works anymore...
<wpwrak>
(spot) c'mon. probably all of them went to university and studied EE (actually, i don't know their biography. that's something wolfgang would know.)
<wpwrak>
larsc: did you replace it with something that fails silently, in a seemingly plausible way ? :)
<larsc>
wpwrak: -ENOSYS
<kristianpaul>
larsc: you're talking about milkymist related stuff? :)
<wpwrak>
larsc: hmm, bad. better return, say malloc(1234);
<kristianpaul>
btw i noticed you derived milkymist openwrt from some *linaro stuff, didn't you?
<larsc>
wpwrak: it will work once i recompile userspace
<larsc>
libc will use mmap if it is available otherwise mmap2
<larsc>
and since our mmap is just a wrapper around mmap2 we can drop it
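What larsc describes can be modelled in a few lines. This is a Python model of the idea only, not the lm32 kernel or uClibc code: per the Linux man page, mmap2 takes the file offset in 4096-byte units instead of bytes, so a legacy mmap entry point reduces to an alignment check plus a shift. The function names, the single-argument signatures and the `pick_syscall` helper are simplifications made up for illustration:

```python
import errno

PAGE_SHIFT = 12                 # assumed 4096-byte units, as for mmap2
PAGE_SIZE = 1 << PAGE_SHIFT


def mmap2(pgoff: int) -> int:
    """Stand-in for the mmap2 entry point: just echoes the page offset."""
    return pgoff


def mmap(offset: int) -> int:
    """Legacy byte-offset entry point expressed as a thin wrapper around mmap2."""
    if offset % PAGE_SIZE:
        return -errno.EINVAL    # unaligned offsets cannot be expressed in pages
    return mmap2(offset >> PAGE_SHIFT)


def pick_syscall(available: set) -> str:
    """The libc rule from the chat: use mmap if it exists, otherwise mmap2."""
    return "mmap" if "mmap" in available else "mmap2"


print(mmap(8192))                       # byte offset 8192 -> page offset 2
print(pick_syscall({"mmap2"}))          # wrapper dropped: libc falls back to mmap2
```

This also shows why an already-built userspace breaks: it was compiled against the mmap number and only picks up mmap2 after a rebuild.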
<wpwrak>
ah, so it's a migration, not a total removal. now i get it :)
<wpwrak>
let's see how long until you have a two-liner :)
<wpwrak>
or maybe even a one-liner, if you can find a convenient spot in some makefile
<kristianpaul>
hum i wasn't aware lekernel used twitter to frequently post mm1 related progress
<kristianpaul>
so often
<kristianpaul>
what? there is no rss support in twitter anymore?.. :(
<lekernel>
there is, but they hid it
<lekernel>
check my blog/mailing list
<kristianpaul>
he, spartan3 faster than s6? just because of hold/setup time
<kristianpaul>
now i wonder, a s3 milkymist one?
<kristianpaul>
hum price close to s6
<kristianpaul>
what? XC3S2000-4FGG456I 40600 LE is 48.7USD and  XC6SLX45-2FGG484C still 39USD
<kristianpaul>
wow
<wpwrak>
at what quantity ?
<kristianpaul>
ah, good point
<wpwrak>
besides, the XC3 seems to have a bit more logic while the XC6 seems to have a bit more RAM. so it's not trivial to compare them. dunno about speed grades.
<lekernel>
s3 is slower and smaller
<lekernel>
and older, more expensive and obsolete sooner
<lekernel>
period
<lekernel>
if we ever change the fpga it will be a 7 or altera
<kristianpaul>
sure, i wasn't pushing you to do it, just intellectual curiosity
<roh>
yay. lasering done.
<larsc>
nah, i'll start moving code to the generic section of the kernel ;)
<mwalle>
mh either my rework is not working or usb/mouse support is not working in the latest snapshot
<mwalle>
mh test tool works
<mwalle>
lekernel: btw was the phy changed? i get unexpected phy id 0045 with the test tool
<mwalle>
wolfspraul: were there any mac addresses assigned to the rc1 boards?
<mwalle>
wolfspraul: found it :)
<mwalle>
cool everything is working :)
<mwalle>
lekernel: thx for the wolfson codec :)
<mwalle>
wpwrak: so i have the second working rework of the ac97 codec :)
<wpwrak>
mwalle: whee ! congratulations !
<kristianpaul>
:-O
<kristianpaul>
kudos indeed, mwalle !
<kristianpaul>
any additional comments for those with an rc2 whose ac97 is still not fixed, who may want to do it some day?
<mwalle>
well just remove it with hot air and solder a new one :)
<mwalle>
i'll take a picture later
<kristianpaul>
heh
<kristianpaul>
seems i definitely need a hot air station..
<lekernel>
mwalle, the mdio bit-banging code has bugs at times; 0045 = (0022 << 1) | 1 ....
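lekernel's arithmetic is easy to check. A minimal sketch: the 0022 and 0045 values come from the chat, while the interpretation (the bit-banged read slipping by one bit and picking up a stray 1 at the end) is only one plausible reading, and the helper name is made up:

```python
def read_with_one_bit_slip(value: int, stray_bit: int = 1, width: int = 16) -> int:
    """Model a register read that is shifted left by one bit position,
    with `stray_bit` ending up in the least significant bit."""
    mask = (1 << width) - 1
    return ((value << 1) | stray_bit) & mask


expected_id = 0x0022   # value lekernel treats as the correct PHY ID word
observed_id = 0x0045   # value the test tool reported to mwalle

print(hex(read_with_one_bit_slip(expected_id)))  # 0x45
print(read_with_one_bit_slip(expected_id) == observed_id)
```

So the "unexpected" ID is the expected one misread by a single bit, which points at the read routine rather than at a changed PHY.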
<mwalle>
lekernel: ah ok :) so no more splash screen?
<lekernel>
technically yes, but it's not that useful
<mwalle>
btw dunno the voltage rating for the capacitors for the codec, mine were rated 6V3
<lekernel>
for the USB resistors, you should be able to stack them
<lekernel>
(ie mount them on top of the existing varistors)
<mwalle>
see second picture ;)
<lekernel>
yup. but you mounted them close to the varistors, not on top
<lekernel>
seems easier to mount them on top, for me at least :)
<mwalle>
pushed them together with tweezers
<mwalle>
next thing will be a working ir receiver ;)
<mwalle>
btw i noticed a lot of freezes right after flickernoise has started (and started a video-in patch)
<lekernel>
have you shorted L19?
<mwalle>
no
<mwalle>
should i?
<mwalle>
lekernel: but will a non working video input freeze the whole board?
<lekernel>
it should not, but in practice I have seen such things. it could be that the video chip sends some broken data to the video input core, which then DMA's crap all over the address space and crashes the board.
<lekernel>
in a perfect world, the video input core should be robust enough not to do that, but ...
<mwalle>
i'll short it tomorrow ;)
<lekernel>
it's the big ferrite bead close to the video in chip, it's easy to short except that the ground plane sucks a lot of heat from the iron
<lekernel>
do you have spare IR receivers?
<mwalle>
would it make sense to suppress automatic switching to video patches when no valid input signal is detected?
<mwalle>
(ir) nope
<lekernel>
yeah, that's something that should be done
<lekernel>
along with caching the compiled patches
<lekernel>
maybe for flickernoise 1.1 :)
<mwalle>
larsc: cool more generic stuff (modules) :)
<GitHub96>
[linux-milkymist/master] lm32: syntax fixes - Michael Walle
<GitHub96>
[linux-milkymist/master] lm32: redefine sys_mmap to prevent undef reference - Michael Walle
<mwalle>
larsc: please review these two commits
<larsc>
looks good
<mwalle>
why do we undef NR_mmap but not NR_vfork?
<larsc>
because we define our own vfork function in uclibc
<larsc>
but the generic mmap will use NR_mmap if defined otherwise NR_mmap2
<larsc>
hm, i guess my module cleanup was a bit to abious. missed that one function was using Elf32_Rel and the other Elf32_Rela
<larsc>
ambitious
<larsc>
i've been wondering whether we should treat scall like a normal function call and not save/restore r0-r10. Since for most functions it will be a tail call they won't use the restored regs anyway