<wpwrak>
Q: does reading the NOR via JTAG-USB go directly to the NOR, or is there an intermediate copy of the data made in any RAM on the M1 ?
<mwalle>
wpwrak: you directly control and readback the io pins
<wpwrak>
excellent !
<mwalle>
mh? :)
<wpwrak>
mwalle: you wouldn't have by any chance experienced those supposed NOR corruptions / CRC errors on your own M1 ?
<mwalle>
i dont think so
<mwalle>
although i havent flashed it very often
<wpwrak>
i'm trying to write up a test procedure to figure out what's really going on, but i'm a little afraid it may get so complex that we then also have to spend quite some time debugging the bug reports :-(
<wpwrak>
no, it's supposed to happen to the content in flash, without actually attempting to change it
<mwalle>
wpwrak: only temporarily or permanently corrupted?
<wpwrak>
that's one of the questions we're not quite sure about :)
<wpwrak>
on rc2, from what i've heard, it appears to be more permanent than temporary, on rc3 we don't know yet. it's also not clear if rc2 and rc3 show the same problem, if there's only one problem or multiple overlapping ones, etc.
<mwalle>
and are all sectors equally affected?
<mwalle>
because i'm just using the bitstream and bios from the flash most of the time
<mwalle>
+only
<mwalle>
so i have a rc1
<wpwrak>
again, insufficient data :-(
<wpwrak>
what exactly is that BIOS ? is it some sort of monitor ?
<mwalle>
mh? the 'normal' mm bios
<mwalle>
btw urjtag has some python bindings, (dunno if its already merged, but there were some patches on the ML)
<mwalle>
if my bitstream image was corrupted, i would know it
<wpwrak>
i don't know what the "normal MM BIOS" is :) all i know about the M1 is from reading about it. well, i briefly held one in my hands in porto alegre. even managed to pry loose the uSD holder's lid :)
<mwalle>
ah  ok:)
<wpwrak>
do you power cycle often ?
<mwalle>
mh, not really, im using the reset button most of the time
<wpwrak>
one theory is that it's related to power ramp up/down issues
<wpwrak>
(reset button) thought so :) maybe this is just what saves you :)
<mwalle>
the bios is the first thing which is loaded and has some basic support for loading binaries from net/sd/flash
<wpwrak>
does the BIOS implement any user access ? such as NOR retrieval by whatever means (ether or such) ?
<wpwrak>
ah, and do you know if NOR content is stored inverted by any chance ?
<wpwrak>
does the "Images CRC" come from the BIOS ? or is it something else ?
<mwalle>
wpwrak: no user access to the internal bus, but you can use the gdb rom for that
<mwalle>
wpwrak: and i dont know the physical implementation of a nor cell
<mwalle>
thats a test program running there, which has the crcs somehow embedded
<mwalle>
but the bios has a crc for itself too
<wpwrak>
(test program) ah, i see. suspected that this may be an "unusual" setup
<mwalle>
and maybe we could add a crc32 command to the gdb stub, so you wont have to download the memory over a slow connection
<wpwrak>
(nor cell) i mean, do you know if a byte that reads, say, 0x55 in the file and that RTEMS would see as 0x55 is actually stored in the NOR as 0x55 or is it inverted for some obscure reason ? (i'm asking because a 0->1 single-bit corruption appears to have been observed, which is something that should be "impossible")
<wpwrak>
that could be useful. most of the time, i think you want all the memory to do proper analysis. and even if it's good, you want to exercise the communication channel
<mwalle>
wpwrak: no its stored non inverted
<wpwrak>
e.g., we know that USB-JTAG has gremlins. can we be sure that Ether doesn't have some, too ? :)
<wpwrak>
darn :)
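The reason a 0->1 flip looks "impossible" is that programming a NOR cell can only clear bits (1->0); getting a bit back to 1 normally takes a sector erase. A minimal sketch for sorting the differences between a reference image and a readback by flip direction follows. The function name and the sample bytes are illustrative assumptions, not anything from the M1 tools.

```python
def classify_bit_flips(reference: bytes, readback: bytes):
    """Return two lists of (offset, bit) pairs: 1->0 flips and 0->1 flips."""
    if len(reference) != len(readback):
        raise ValueError("buffers must have the same length")
    one_to_zero, zero_to_one = [], []
    for offset, (ref, got) in enumerate(zip(reference, readback)):
        diff = ref ^ got  # bits that differ between the two images
        for bit in range(8):
            mask = 1 << bit
            if diff & mask:
                if ref & mask:
                    one_to_zero.append((offset, bit))  # plausible for NOR
                else:
                    zero_to_one.append((offset, bit))  # should need an erase
    return one_to_zero, zero_to_one


# Made-up sample data: byte 0 gains a bit, byte 1 loses one.
reference = bytes([0x55, 0xFF, 0x00])
readback = bytes([0x57, 0xFE, 0x00])
cleared, set_bits = classify_bit_flips(reference, readback)
print("1->0:", cleared)
print("0->1:", set_bits)
```

Any entry in the 0->1 list would point away from a plain "cell lost charge" explanation and towards the read path, an erase, or the comparison data itself.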
<mwalle>
gdbrom uses the serial line, with some checksumming, so i think you can be pretty sure its reliable :)
<mwalle>
and it wont need any external ram or flash
<wpwrak>
what happens if you get a checksum error ?
<mwalle>
command is retried
<wpwrak>
that's serial via USB-JTAG, correct ?
<mwalle>
serial via ft2232, yes
<mwalle>
that usb-jtag module
<wpwrak>
okay
<wpwrak>
does yours do full-speed or is it the "fixed" version that does high-speed ?
<mwalle>
i have an early hand-assembled one, which wasnt affected by the highspeed bug, iirc
<wpwrak>
hmm. tricky :)
<wpwrak>
so yours gets enumerated as a high-speed device ? (on Linux, i suppose)
<mwalle>
[    1.556009] usb 2-3: new high speed USB device using ehci_hcd and address 4
<mwalle>
[    1.691945] usb 2-3: New USB device found, idVendor=20b7, idProduct=0713
<mwalle>
20b7:0713 was the id, right? :)
<wpwrak>
and when gdb gets a checksum error and retries, does it leave any record of this ? such as a message, log entry, statistics counter, ... ? i.e., is there a way to tell how often your communication incurs a checksum error ?
<wpwrak>
yup :)
<mwalle>
wpwrak: sorry, dunno, maybe gdb has some counters, but i dont think you see any command checksum errors
<mwalle>
before that you would see usb message errors imho
<mwalle>
unless the gdb stub is doing sth wrong
<wpwrak>
the thing is that USB also retries bad packets. i need to see if this leaves any traces. you can actually get lots of bit errors on your USB connection with everything looking normal.
<wpwrak>
and every once in a while, one of those errors (must be multi-bit, CRC catches all single-bit errors) will slip through USB's guard
<wpwrak>
in your case, gdb should then see a checksum failure, and retry
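The claim that a CRC catches every single-bit error can be checked by brute force. The sketch below uses a simplified CRC-16 with generator 0x8005 (x^16 + x^15 + x^2 + 1), zero initial value and MSB-first processing. Real USB packets use their own init value and bit ordering, so this is an illustration of the property, not a USB implementation.

```python
def crc16(data: bytes, poly: int = 0x8005) -> int:
    """Simplified bitwise CRC-16: MSB first, init 0, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def undetected_single_bit_errors(message: bytes) -> int:
    """Flip each bit of the message in turn; count flips the CRC misses."""
    good = crc16(message)
    missed = 0
    for i in range(len(message) * 8):
        corrupted = bytearray(message)
        corrupted[i // 8] ^= 1 << (i % 8)
        if crc16(bytes(corrupted)) == good:
            missed += 1
    return missed


message = bytes(range(64))  # arbitrary test payload
print(undetected_single_bit_errors(message))
```

Multi-bit errors are a different matter: some patterns produce the same CRC as the original data, which is how a bad packet can occasionally slip through.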
<mwalle>
and how is that connected to the suddenly corrupted flash sector? :)
<wpwrak>
i hope not :) but it's an additional factor in the analysis. since usb-jtag seems to be the lowest-level access we have, be it with jtag or gdb, it would make sense to use that to read back the NOR content when a CRC error is reported
<wpwrak>
now, if USB-JTAG itself is unreliable, we need to take this into account too, and not mistake USB-JTAG errors for NOR errors, or vice versa
<mwalle>
i see, so you could download the image via gdb (fjmem, the jtag access is slow as hell), and do a crc on both the board and your downloaded image
<mwalle>
to be sure they match
<wpwrak>
yes
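The host side of that comparison could look roughly like the sketch below. It only covers the downloaded file; the CRC computed on the board would come from whatever the gdb stub or test program reports. The file names are placeholders, and zlib's CRC-32 is an assumption that may or may not match the polynomial the board-side code uses.

```python
import zlib


def crc32_of_file(path: str, chunk_size: int = 65536) -> int:
    """CRC-32 of a file, read in chunks so large dumps need little memory."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)  # continue from the running value
    return crc & 0xFFFFFFFF


def images_match(dump_path: str, reference_path: str) -> bool:
    """True if the downloaded dump and the reference image have equal CRCs."""
    return crc32_of_file(dump_path) == crc32_of_file(reference_path)


# Example use (placeholder file names):
# print(images_match("nor-dump.bin", "reference.bin"))
```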
<mwalle>
assuming that the nor errors are sticky :)
<wpwrak>
repeat this a few times, to be sure to have a baseline estimate for an upper bound of the error rate on usb-jtag
<wpwrak>
then use the board until the CRC error happens
<wpwrak>
no, first you'd do all this with a NOR that still thinks it's okay
<wpwrak>
sort of a calibration :)
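A calibration run of this kind can be turned into a number. If n independent reads all come back clean, the per-read error probability p is bounded with a given confidence by solving (1 - p)^n = 1 - confidence. The sketch below assumes a hypothetical `read_image` callable standing in for the real usb-jtag or gdb readback, and uses a fake reader so it runs on its own.

```python
import zlib


def count_bad_reads(read_image, reference_crc: int, runs: int) -> int:
    """Read the image `runs` times; count reads whose CRC-32 is wrong."""
    bad = 0
    for _ in range(runs):
        if zlib.crc32(read_image()) != reference_crc:
            bad += 1
    return bad


def upper_bound_zero_failures(runs: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-read error rate when all runs were clean."""
    # Solve (1 - p) ** runs == 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / runs)


# Fake reader for illustration: always returns the same "NOR content".
image = bytes([0x55]) * 4096
reference_crc = zlib.crc32(image)

runs = 20
bad = count_bad_reads(lambda: image, reference_crc, runs)
print("bad reads:", bad)
if bad == 0:
    print("95% upper bound on per-read error rate:",
          round(upper_bound_zero_failures(runs), 3))
```

With only a handful of clean runs the bound stays loose, so a later CRC mismatch can only be blamed on the NOR with some confidence if the calibration used enough reads.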
<mwalle>
i bet there wont be any bad runs :)
<wpwrak>
in the calibration or once you start systematic testing ? :)
<mwalle>
calibration
<wpwrak>
that's what i'd hope for, too :)
<wpwrak>
but if you don't measure it, you don't know
<mwalle>
and of course if you try to reproduce an error there wont be any
<wpwrak>
yeah, and you hear that faint giggling from murphy's general direction :)
<mwalle>
hehe
<wpwrak>
for all we know, it could actually be a RAM corruption, too. wouldn't this be fun ? :)
<mwalle>
have to go, last day at work tomorrow :)
<wpwrak>
retiring ? :)
<mwalle>
if thats true, gdb should be able to dump the right image :)
<wpwrak>
(gdb) yes ! does xiangfu know how to do these things with gdb ?
<mwalle>
holiday :)
<wpwrak>
(holiday) great ! so you'll have more time for M1 ;-)
<mwalle>
yeah thinking about it on the beach :)
<wpwrak>
have a few drinks. and sometimes the best ideas pop up in the morning, with a light hangover :)
<mwalle>
dunno if xiangfu knows it, but theres something in the wiki how to connect to the board using gdb, and then it should be some normal gdb commands to download a memory region
<mwalle>
so gn8 :)
<wpwrak>
sweet dreams ! :)
<wpwrak>
and thanks for all the help !
<mwalle>
np, btw does the crc fail for other flash partitions, too?
<mwalle>
not only backup bios?
<mwalle>
oh forget it
<mwalle>
its backup flickernoise :)
<mwalle>
time to sleep
<wpwrak>
it seems to fail in the bitstream as well. recovery may simply be the only one where the problem can be reported in this way. bitstream failure yields more mysterious results :)