<wpwrak>
interesting. now it's easier to trigger the hang. am i just quicker in the morning ? :)
<wpwrak>
hmpf. nice. I: Booting...  assertion "!the_message" failed:
<wpwrak>
that's even before i get to grab the thing with gdb ...
<wpwrak>
attaching when it's there worked, though. phew :)
<wpwrak>
mwalle: how do you see the possibility of overcoming gdb's apparently very small transfer size ?
<wpwrak>
mwalle: (see also my posting with subject "gdb data transfer limitation ?")
<wpwrak>
i can work around it reasonably well in the case of data, but there doesn't seem to be a sane way for processing stack frames piecewise (sane = without prying it apart manually)
<wpwrak>
hmm, is it safe to increase the size of the memory regions in milkymist/software/gdbstub/linker.ld ?
<wpwrak>
i.e., where does the rom+ram = 8 kB limit come from ?
<lekernel>
it's on-chip memory, which is scarce
<lekernel>
also, if you change the gdbstub, you'll need to rebuild the bitstream
<wpwrak>
shit :-(
<lekernel>
(you can increase a bit the amount of memory at the same occasion, though. but try not to - it's an expensive resource)
<wpwrak>
hmm. let's see if i can fix the problem on the gdb side instead then
<lekernel>
well... the main problem with bitstream builds is the 10GB of bloatware you need to set up. once you have ISE installed, it's just one command.
<wpwrak>
and years of processing etc.
<wpwrak>
how is llhdl coming along anyway ? ;-)
<lekernel>
only 15-30 min on a modern computer
<wpwrak>
gdb-7.3.1 should be good, right ? i.e., supporting lm32 out of the box ?
<lekernel>
yes
<lekernel>
if you're using the flterm pass-through, any GDB version with lm32 support should be OK
<wpwrak>
kewl, thanks
<lekernel>
the gdb 7.3+ requirement is only when NOT using flterm
<wpwrak>
hehe, sometimes just reading the source helps :)
<wpwrak>
the magic word is  set remote memory-read-packet-size 500
<wpwrak>
(or something like this)
<wpwrak>
hmm .. maybe
<wpwrak>
this lets me read big variables. no luck with the stack, though
<wpwrak>
seems that gdb doesn't understand lm32 stack frames. hmm.
<wpwrak>
i guess things are compiled with omit-frame-pointer ?
<lekernel>
FN isn't, unless it's the compiler default
<lekernel>
RTEMS I don't know
<wpwrak>
i don't see it there either
<wpwrak>
and we do have a very nice frame pointer. hmm.
<wpwrak>
down
<wpwrak>
oops
<mwalle>
wpwrak: yeah there is a qSupported query
<mwalle>
which returns PacketSize=..
<mwalle>
#define BUFMAX 800
<mwalle>
#define BUFMAX_HEX 320  /* keep this in sync with BUFMAX */
<mwalle>
i fixed this some time ago, and iirc a quick test showed that everything was working ;)
<mwalle>
commit id 02cdac90da6
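[Editor's sketch, not from the log: the two defines above have to agree because the PacketSize value in a qSupported reply is written in hex, and 0x320 is 800. One way to avoid keeping them in sync by hand is to derive the text from BUFMAX. The helper name format_packet_size is hypothetical and is not taken from the gdbstub source.]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BUFMAX 800  /* packet buffer size quoted in the log */

/*
 * Hypothetical helper (an assumption, not the actual gdbstub code):
 * write the "PacketSize=..." feature of a qSupported reply into out.
 * The value is formatted in hex straight from BUFMAX, so no separate
 * BUFMAX_HEX constant is needed.
 * Returns the string length, or -1 if out is too small.
 */
static int format_packet_size(char *out, size_t out_len)
{
    int n;

    assert(out != NULL);
    n = snprintf(out, out_len, "PacketSize=%x", (unsigned) BUFMAX);

    /* snprintf reports the length it wanted; treat truncation as failure */
    return (n > 0 && (size_t) n < out_len) ? n : -1;
}
```

[On a small on-chip stub, pulling in snprintf may cost more code space than a hand-maintained constant, so this is a trade-off rather than a clear improvement.]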
<wpwrak>
hmm, that's old enough that it should be in the SDK
<wpwrak>
a test would be something like this (on flickernoise):  p *( struct yaffs_dev *) 0x408d85c0
<wpwrak>
(any reasonably valid address will do)
<mwalle>
could you please show remote memory-read-packet-size before setting it?
<wpwrak>
The memory-read-packet-size is 0. Packets are limited to 800 bytes.
<mwalle>
mh is 750 working?
<mwalle>
799?
<mwalle>
798? :)
<wpwrak>
up to 799 yes ;-)
<wpwrak>
length < (sizeof(remcom_out_buffer) / 2))  maybe s/</<=/ ?
<wpwrak>
and then find a way to sneak in the \0 ? :)
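[Editor's sketch, not from the log: the arithmetic behind the "<" versus "<=" question, assuming an 800-byte reply buffer (BUFMAX) and assuming the stub stores the hex reply NUL-terminated in that same buffer. All function names are hypothetical. A read of 400 bytes encodes to exactly 800 hex characters: the strict check rejects it, the relaxed check accepts it, and the terminating \0 would then need an 801st byte.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUFMAX 800  /* reply buffer size quoted in the log */

/* Bytes needed for `length` memory bytes as hex text plus a NUL. */
static size_t hex_reply_size(size_t length)
{
    assert(length <= (SIZE_MAX - 1) / 2);  /* guard against overflow */
    return 2 * length + 1;
}

/* The check as quoted in the log: strict '<' against half the buffer. */
static int accepts_strict(size_t length)
{
    return length < BUFMAX / 2;
}

/* The proposed s/</<=/ variant. */
static int accepts_relaxed(size_t length)
{
    return length <= BUFMAX / 2;
}

/* Does the NUL-terminated hex reply fit into a BUFMAX-byte buffer? */
static int reply_fits(size_t length)
{
    return hex_reply_size(length) <= BUFMAX;
}
```

[Under these assumptions, switching to "<=" is only safe if the buffer grows by one byte or the reply is sent without storing a terminator.]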
<mwalle>
btw do you have binary downloads enabled?
<wpwrak>
ah, that's already there
<wpwrak>
no, it's all text
<mwalle>
mh theres a checksum at the end of the packet
<wpwrak>
aah, right, the checksum
<mwalle>
but this is not written to the buffer
<wpwrak>
yup. generated on the fly
<mwalle>
are you getting E22?
<wpwrak>
so maybe it's really just < vs. <=
<wpwrak>
yes
<wpwrak>
btw, you don't need parentheses around sizeof(...)/2
<wpwrak>
wonders how many bugs are overlooked in reviews due to excessive parentheses
<wpwrak>
maybe we need a -Wredundant-parentheses ideally enabled by default, along with -Werror :)
<mwalle>
wpwrak: mh nice compiler switch :)
<mwalle>
wpwrak: thats not my code ;)
<wpwrak>
heh :) many possible culprits
<wpwrak>
now, what's really troubling me at the moment is that stack traces don't work
<mwalle>
yeah <= makes sense
<mwalle>
whats the last stack frame you see?
<mwalle>
wpwrak: do you submit a patch for gdbstub? rebuild it, copy gdbstub.rom to the proper location and submit that patch too? :)
<wpwrak>
i don't have the bitstream build process set up, so i couldn't test changes to gdbstub
<wpwrak>
*snivel* all i wanted are some nice colorful effects. and now i'm there, fixing RTEMS, poking around in the depths of gdb, ...
<wpwrak>
(fixing RTEMS) as in, i already have two more bugs in the queue. one fixed, the other one in need of better debugging support (hence the stack traces). the 2nd bug may be in FN/MM, though
<wpwrak>
(it's something trying to read from an empty queue right after the system says "Booting". sometimes, a small assert can work wonders ...)
<mwalle>
wpwrak: it almost never works like that :b
<wpwrak>
yeah, but usually the ride is a little less scenic :)
<lekernel>
wpwrak, and you're doing an outstanding job at it :) keep up the good stuff *g*
<mwalle>
wpwrak: lekernel: ok so i'll fix and test it
<wpwrak>
in fact, the only thing that saves the system from total collapse is that most of the RTEMS infrastructure is used very lightly. put a bit more pressure on it, and all those little oversights will go up like new year's eve fireworks
<wpwrak>
lekernel: thanks for commiserating ;-)
<wpwrak>
mwalle: thanks !
<mwalle>
i like wpwrak pictured metaphors ;)
<wpwrak>
what scares me a bit is that some of the more dubious constructs (like the doubly linked list structure) can have subtle failure modes. e.g., if you just use the list in the "old-fashioned" way, i.e., for (p = list_first(); p; p = p->next) ..., things will almost work. the only problem will be that you get what looks like a corrupt last list element. but the pointer will still be valid, so you wouldn't even trip over that.
<wpwrak>
and there's a pair of examples of mixing up the paradigms right in chain.h, so the issue is less hypothetical than it may seem ...
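[Editor's sketch, not from the log: a simplified doubly linked list with separate head and tail sentinel nodes, to illustrate the failure mode described above. This is an assumption-laden stand-in and not the layout used in chain.h; all names are hypothetical. The tail sentinel's next pointer is NULL, so a NULL-terminated loop visits every real element and then the sentinel, which looks like one extra, corrupt element behind a perfectly valid pointer.]

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *next;
    struct node *prev;
    int value;              /* meaningless in the sentinels */
};

/* Simplified list with explicit head and tail sentinel nodes. */
struct list {
    struct node head;
    struct node tail;
};

static void list_init(struct list *l)
{
    assert(l != NULL);
    l->head.prev = NULL;
    l->head.next = &l->tail;
    l->tail.prev = &l->head;
    l->tail.next = NULL;
}

static void list_append(struct list *l, struct node *n)
{
    n->prev = l->tail.prev;
    n->next = &l->tail;
    l->tail.prev->next = n;
    l->tail.prev = n;
}

/* Sentinel-aware traversal: stop when the tail node is reached. */
static int count_until_tail(struct list *l)
{
    int n = 0;

    for (struct node *p = l->head.next; p != &l->tail; p = p->next)
        n++;
    return n;
}

/* "Old-fashioned" traversal: stop at NULL. Also visits the tail sentinel. */
static int count_until_null(struct list *l)
{
    int n = 0;

    for (struct node *p = l->head.next; p; p = p->next)
        n++;
    return n;
}
```

[With two appended nodes, the sentinel-aware loop counts two elements while the NULL-terminated loop counts three. Nothing crashes, which is what makes the mix-up easy to miss.]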
<Fallenou>
(reading rtems ML) Ralf really is a pain.
<wpwrak>
"I don't see what add a cast to void* would fix." ? :)
<Fallenou>
I was looking at the chat between Joel and Ralf about RTEMS being dependent on GNU C compiler
<wpwrak>
and, towards what are they leaning ?
<Fallenou>
Joel is just saying it would be cool to reduce step by step dependencies on gcc in order to someday be able to compile with something else
<Fallenou>
Ralf is just answering like an ass as usual
<Fallenou>
sarcasm etc
<wpwrak>
hmm. moderate gcc dependencies shouldn't be a big issue. you need a major compiler for such a project anyway. and the likely candidates seem to try to be compatible with gcc anyway.
<wpwrak>
ah, very good. at least two patches are in.
<Fallenou>
sdcc pcc clang were quoted
<Fallenou>
Metrowerks and GreenHill as well
<wpwrak>
doesn't clang track gcc ?
<kristianpaul>
sdcc :)
<wpwrak>
sdcc ... i wouldn't try that ;-)
<kristianpaul>
i will hold my hope for a while
<kristianpaul>
:_)
<Fallenou>
very nice catch about rtems linked lists
<Fallenou>
really crazy that such errors are still in rtems code base ...
<Fallenou>
and that nobody spotted it
<wpwrak>
yes :)
<Fallenou>
crazy crazy
<kristianpaul>
linux linux ? :)
<wpwrak>
also all the macro use issues
<wpwrak>
kristianpaul: indeed. most of this wouldn't survive first review :)
<Fallenou>
Is VxWorks better than rtems ? (someone knows ?)
<Fallenou>
better in terms of code quality, memory footprint, features etc ?
<lekernel>
it's proprietary, no?
<Fallenou>
I think yes
<kristianpaul>
yes it is
<Fallenou>
but I mean, NASA uses RTEMS, I don't think NASA hates proprietary software
<Fallenou>
So I wonder why they use RTEMS and not some proprietary OS
<kristianpaul>
maybe it's not NASA itself, maybe a third-party contract? but yes, it's cool to see rtems in NASA
<Fallenou>
is hoping a linked list related crash won't make a missile crash into his home
<lekernel>
wpwrak, btw, I've had a good share of linux problems, and I think mwalle and lars_ too. don't imagine linux is a little paradise :)
<kristianpaul>
Fallenou: ;)
<Fallenou>
lekernel: you mean linux in general ? or lm32-linux or some-arch-linux ?
<lekernel>
lm32
<Fallenou>
ok ok
<wpwrak>
lekernel: dunno. it's usually been kind to me the last ~20 years :)
<wpwrak>
not that i hadn't had my moments of despair searching for flaws in linked lists. didn't find any, though :)
<lars_>
thats because your linux code wasn't written by theobroma
<wpwrak>
who's that ?
<Fallenou>
guys who wrote lm32 port of linux
<lars_>
the initial lm32 port ;)
<wpwrak>
ah, i see
<Fallenou>
ah yes initial
<wpwrak>
lars_: i suppose you and your pitchfork had some fun there ?
<lekernel>
how's the current code btw? "only" missing 90% of the drivers?
<wpwrak>
and the mmu :)
<lars_>
wpwrak: took me a while to get fork() stable
<wpwrak>
that sounds ugly
<lekernel>
no other niceties like shared libs not working, random crashes, linker problems etc.?
<lars_>
lekernel: basically yes. there hasn't been much work done lately. starting a job probably wasn't one of my brightest ideas ;)
<lars_>
except for the random crashes, that should be fixed by now
<wpwrak>
an mmu also helps a bit with these things :) particularly the random crashes - by catching these NULL pointers before they mess up things at weird places
<lekernel>
he, catching NULL pointers should be about 2 lines of verilog
<lekernel>
assuming the gdb system handles bus errors correctly
<wpwrak>
show us ! (-:C
<lars_>
hehe. adding a watchpoint for 0x00 is usually the first thing i do when i want to debug weird problems on lm32
<wpwrak>
right now, my "null pointer catcher" consists of awatch *(uint32_t *) 0 ... awatch *(uint32_t *) 12
<lekernel>
we can easily generate a bus error on the first 256kb or so
<wpwrak>
this gets me at least things those pesky linked lists. it's not good for much else, though
<wpwrak>
s/things/things with/
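[Editor's sketch, not from the log: why four 4-byte access watchpoints at addresses 0, 4, 8 and 12 are enough for linked-list bugs but little else. Dereferencing a NULL struct pointer touches the address equal to the field's offset, so only fields within the first 16 bytes are caught. The struct and function names are hypothetical.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WATCH_COUNT 4                  /* four watchpoints: 0, 4, 8, 12 */
#define WATCH_SIZE  sizeof(uint32_t)   /* each covers one 32-bit word */

/* Hypothetical list node: the link pointers sit at the very start. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/*
 * Would an access `offset` bytes past a NULL base pointer land inside
 * one of the watched words?
 */
static int null_access_is_watched(size_t offset)
{
    return offset < WATCH_COUNT * WATCH_SIZE;
}
```

[A field at offset 16 or beyond in a larger structure would slip past these watchpoints, which is where a bus error on a larger low-memory range would help.]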
<wpwrak>
lars_: i just wish watchpoints could be larger than 4 bytes. or there could be more than 4 of them.
<lars_>
i think you can use expressions like <= 0x1024, but gdb will fall back to single-stepping when you use it
<wpwrak>
lekernel: even the first 4 kB would be great. or special-case a watchpoint at address 0 to cover a larger area. whatever.
<lars_>
but, like lekernel said, generating a bus error would probably be easier
<wpwrak>
single-stepping is very un-fun :)
<wpwrak>
of course, it means i don't have to hurry to get my MIDI/mouse hang. but getting to the overflow, with even the most moderate amount of single-stepping (conditional breakpoint), took something like half an hour ...
<wpwrak>
mwalle: btw, any ideas about those stack traces ? :)
<mwalle>
wpwrak: mom just had dinner ;)
<mwalle>
(gdb) p *( struct yaffs_dev *) 0x408d85c0
<mwalle>
No struct type named yaffs_dev.
<mwalle>
with the flickernoise binary
<mwalle>
arent there the structures embedded into the binary with debug symbols?
<wpwrak>
hmm. checking ...
<wpwrak>
hah, indeed
<mwalle>
lars_: the fork bug is fixed?
<wpwrak>
mwalle: seems that yaffs was built without debug symbols :-(
<wpwrak>
let me find something else ...
<mwalle>
ok :)
<lars_>
mwalle: at least for me ;)
<wpwrak>
struct compiler_sc should do nicely. 87036 bytes :)
<mwalle>
yep
<wpwrak>
oh. stupid me :) reversed the assertion. freud must have been whispering BUG_ON() ;-)
<GitHub179>
[rtems-yaffs2/master] Build with debug symbols - Sebastien Bourdeauducq
<wpwrak>
ah, thanks
<wpwrak>
phew. two more stack frames.
<wpwrak>
let's switch to code review. this has worked in the past ...
<mwalle>
naa timing not met
<mwalle>
lekernel: is there any other milkymist branch for flickernoise stable?
<mwalle>
other than master, which has the new uart
<lekernel>
you mean soc stable?
<lekernel>
not yet, but i'll probably create one soonish with the bus error on null pointers
<wpwrak>
we don't use _CORE_message_queue_Flush_support (or any of its gazillion aliases) in any way, do we ?
<wpwrak>
or _CORE_message_queue_Broadcast
<lekernel>
no, I don't think so
<lekernel>
at least FN and the MM driver code do not use flush/broadcast
<wpwrak>
good. i suspect they may all have races. but i feel way too lazy to fix these, too :)
<lekernel>
haha
<wpwrak>
and they may actually be hard to fix. because the code assumes you can do those things atomically. of course, you CAN but then, you need to do a few other things with interrupts off. such as copying messages.
<wpwrak>
well, all this in the absence of anything ensuring exclusive access
<wpwrak>
let's see if there's anything ...
<mwalle>
mh i guess gdb is completely broken on the master branch :(
<wpwrak>
nice :)
<wpwrak>
hmm, are you even supposed to use things like rtems_message_queue_send from an interrupt ?
<wpwrak>
ah, and you actually do use rtems_message_queue_flush ;-)
<wpwrak>
but it's in midi_open. should be harmless
<mwalle>
wpwrak: mh instead of submitting a patch i filed a bug ;)
<wpwrak>
or, rather, if there's trouble with it, it will affect the system for its entire lifetime
<mwalle>
maybe i find more time to look into this this weekend or next week