fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
hpt has joined #systemtap
sapatel_ has joined #systemtap
sapatel has quit [Ping timeout: 245 seconds]
irker133 has quit [Quit: transmission timeout]
agentzh has quit [Remote host closed the connection]
hpt has quit [Ping timeout: 244 seconds]
orivej has quit [Ping timeout: 245 seconds]
hpt has joined #systemtap
mjw has joined #systemtap
orivej has joined #systemtap
hpt has quit [Ping timeout: 246 seconds]
khaled has joined #systemtap
sscox has quit [Ping timeout: 276 seconds]
orivej_ has joined #systemtap
orivej has quit [Ping timeout: 244 seconds]
orivej has joined #systemtap
orivej__ has joined #systemtap
orivej_ has quit [Ping timeout: 246 seconds]
orivej has quit [Ping timeout: 244 seconds]
orivej__ has quit [Ping timeout: 258 seconds]
orivej has joined #systemtap
khaled has quit [Ping timeout: 264 seconds]
wcohen has quit [Ping timeout: 245 seconds]
hpt has joined #systemtap
khaled has joined #systemtap
orivej has quit [Ping timeout: 245 seconds]
orivej has joined #systemtap
hpt has quit [Quit: Lost terminal]
hpt has joined #systemtap
wcohen has joined #systemtap
sscox has joined #systemtap
hpt has quit [Ping timeout: 264 seconds]
khaled_ has joined #systemtap
khaled has quit [Ping timeout: 264 seconds]
gromero has quit [Quit: Leaving]
sapatel__ has joined #systemtap
sapatel_ has quit [Ping timeout: 246 seconds]
RP has joined #systemtap
<RP>
I'm trying to get the 5.2 kernel working with systemtap in the yocto project. We're seeing errors on 32 bit mips and arm.
<RP>
The line runtime/linux/access_process_vm.h:#if defined(STAPCONF_LINUX_SCHED_HEADERS) looks odd, as I can get past that issue if I remove the ifdef.
<RP>
It suggests something about the definitions isn't right. Sadly I know little about systemtap other than that the tests that used to work no longer do.
<fche>
RP, yeah these kinds of things happen as kernels / architectures change
<fche>
so we have a mechanism in the translator (using buildrun.cxx and runtime/linux/autoconf-* files) to adapt to slight changes in APIs
<fche>
maybe the autoconf* test tied to that STAPCONF* macro could be tweaked
<fche>
or add a new test that is finely tuned to this particular case
<RP>
fche: any tips on how I'd debug that?
<fche>
it's trial-and-error really -- pick a particular declaration-error you'd try to fix
<fche>
see what minimal changes to the runtime or such code would be required to make it work again
<RP>
fche: I guess I'm struggling to get something simpler to debug than "stap --disable-cache -DSTP_NO_VERREL_CHECK /tmp/hello.stp" (where hello.stp is a really simple hello world test)
<fche>
add a declaration
<fche>
stap -p4 /tmp/hello.stp goes through to the build stage
<fche>
if a simpler command line is all you're after
<RP>
fche: I'm debugging under emulation for mips so anything to speed this up helps a lot
<RP>
fche: thanks!
<fche>
yeah unfortunately the build autoconf process is pretty intensive (the first time; results are cached)
<RP>
fche: is there a way to make it a bit more verbose?
<fche>
ah yes
<fche>
the usual way :)
<fche>
stap -v or in this case specifically stap --vp 0004 ish to bump up the pass-4 verbosity only
<fche>
--vp 0004 == -v -v -v -v for pass 4 only
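A rough sketch of how the --vp string above can be read, assuming each digit is the extra verbosity for the pass at that position (pass 1 first) and that there are five passes. The function name and the `base` parameter are hypothetical, for illustration only, and are not part of stap.

```python
from typing import List


def per_pass_verbosity(vp: str, base: int = 0, passes: int = 5) -> List[int]:
    """Toy model: digit i of the --vp string adds verbosity to pass i+1."""
    if not (vp.isascii() and vp.isdigit()) or len(vp) > passes:
        raise ValueError("expected at most %d ASCII digits, got %r" % (passes, vp))
    levels = [base] * passes          # `base` stands in for the number of plain -v flags
    for index, digit in enumerate(vp):
        levels[index] += int(digit)   # position 0 is pass 1, position 3 is pass 4
    return levels


print(per_pass_verbosity("0004"))  # [0, 0, 0, 4, 0]
```

Under this reading, "0004" leaves passes 1-3 and 5 alone and raises pass 4 by four units, which matches the "-v -v -v -v for pass 4 only" description.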
<RP>
fche: thanks, that last bit is the kind of thing it's hard to pick up easily, hence the question! :)
<fche>
aha. well, [man error::pass4] gives other relevant hints
<RP>
fche: on a limited embedded target that isn't as easy as it sounds
<fche>
the man page? you can run that anywhere else too
<RP>
fche: fair enough
<fche>
"fair enough" is my middle name
<RP>
fche: it's a long story but this is basically blocking our kernel upgrades in Yocto which in turn blocks feature freeze for release and causes a load of other problems
<fche>
ouch
<fche>
stap in the critical path for something else is ... tricky
<RP>
fche: just giving context for why I'm trying to avoid going too deeply into learning all about stap! :)
<fche>
haha yea
<RP>
(I would actually love to but there is a time/place)
<fche>
32-bit mips & arm ... those are pretty far from our mainstream platforms here unfortunately
<RP>
fche: which could be why they're not working? I assume they're not in any kind of testing setup?
<fche>
yeah, not that I know of
<RP>
(I'm trying to figure out if this is something we did or a real "just doesn't work" scenario)
<fche>
we'd be glad to take patches that fix this stuff
<RP>
I'd be happy to send them if I can figure them out
<fche>
e.g. if the code also breaks 32-bit i686, it becomes much easier to test / fix here
<fche>
are you using git master systemtap btw?
<RP>
fche: we test mips, arm, x86 and powerpc in 32 and 64 bit. It's only the two I mention that break
<wcohen>
fche, the 32-bit i686 is working. I have a guest vm running that and haven't seen problems with it.
<fche>
consider configuring your builds with --enable-dejazilla so they can upload their test result runs to our public server
<fche>
(that's for "make check" or "make installcheck" results)
<RP>
fche: we don't actually run that as we're cross compiling so this test is an emulated target test of a simple stap command
<fche>
hm, a cross stap ...
<RP>
we have been adding support for make check where it has cross support, we've not looked at stap yet though
<fche>
be sure to run it with proper -a FOO type parameters
<fche>
stap cross testing is to some extent possible with stap --remote ... but we don't have a lot of suchly configured tests
<RP>
It's on our nice-to-have list but probably a way off yet. Patches very welcome though if anyone is interested!
<fche>
but yeah if you are running a cross-architecture/version/whatever stap, you'll need stap -a ARCH -r /path/to/kernel/cross/build kinds of flags
<RP>
the failing case is simpler as it's on target under emulation
<RP>
fche: got a handle on what is happening now, looks like a header search path problem
<RP>
./arch/mips/include/asm/addrspace.h:13:10: fatal error: spaces.h: No such file or directory
<RP>
that file is in ./arch/mips/include/asm/mach-XXX
<fche>
adding extra -I paths to the kbuild is another buildrun.cxx job
amerey has quit [Remote host closed the connection]
<fche>
can your local toolchain compile routine out-of-tree modules for your target already?
<RP>
fche: yes, we have tests for that
amerey has joined #systemtap
<RP>
seems to be missing an -I./arch/mips/include/asm/mach-generic
<RP>
if I add that, the autoconf-linux-sched_headers.c define works
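The missing-header symptom above can be illustrated with a toy model of an include search: a header is only found if one of the -I directories actually contains it. This is a sketch under simplifying assumptions (a plain first-match directory walk, and a throwaway temp tree standing in for the kernel source). `find_header` and `demo` are hypothetical names, not anything from stap or kbuild.

```python
import os
import tempfile
from typing import List, Optional, Tuple


def find_header(name: str, include_dirs: List[str]) -> Optional[str]:
    """Return the first path under include_dirs that holds `name`, else None."""
    for directory in include_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def demo() -> Tuple[Optional[str], Optional[str]]:
    """Look up spaces.h without and with the mach-generic directory on the path."""
    with tempfile.TemporaryDirectory() as root:
        asm = os.path.join(root, "arch", "mips", "include", "asm")
        mach = os.path.join(asm, "mach-generic")
        os.makedirs(mach)
        open(os.path.join(mach, "spaces.h"), "w").close()  # empty stand-in header

        without_flag = find_header("spaces.h", [asm])
        with_flag = find_header("spaces.h", [asm, mach])
        if with_flag is not None:
            with_flag = os.path.relpath(with_flag, root).replace(os.sep, "/")
        return without_flag, with_flag


print(demo())  # (None, 'arch/mips/include/asm/mach-generic/spaces.h')
```

With only the asm directory on the search path the lookup fails; adding the mach-generic directory makes it succeed.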
khaled_ has quit [Remote host closed the connection]
<RP>
fche: confirmed that even if I dump autoconf-linux-sched_headers.c into a module, it still builds so stap isn't picking up the right module compilation flags somehow
khaled has joined #systemtap
khaled has quit [Remote host closed the connection]
<fche>
some of those flags should come from the arch-specific kernel-side makefiles
<fche>
are you sure you're running stap in a cross-compiling mode, as above (-a / -r ) ?
<fche>
similarly to how you'd cross-compile an out-of-tree kernel module ?
<RP>
fche: I'm running it on target so that isn't an issue?
<fche>
ok then.
<RP>
fche: ah, it finds the right flags for building the stap module, just not for running the tests
khaled has joined #systemtap
<fche>
the autoconf tests?
<RP>
fche: yes. Somehow the flag in KBUILD_CFLAGS is getting lost
<RP>
fche: I'm wondering what _KBUILD_CFLAGS := $(call flags,KBUILD_CFLAGS) does with regard to that
<fche>
stap --vp 0004 -k .... should let you see exactly how the subsidiary gcc's are run, and -k should let you inspect the makefiles/etc.
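A small sketch of assembling that debugging command line. Combining --vp 0004 and -k with the earlier -p4 suggestion is an assumption made here for illustration, and `stap_debug_argv` is a hypothetical helper name. The script only builds and prints the argument list; it does not run stap.

```python
import shlex
from typing import List


def stap_debug_argv(script: str, keep_tmpdir: bool = True) -> List[str]:
    """Build an argv that raises pass-4 verbosity and stops after pass 4."""
    argv = ["stap", "--vp", "0004"]   # four units of extra verbosity for pass 4 only
    if keep_tmpdir:
        argv.append("-k")             # keep the temporary build directory for inspection
    argv += ["-p4", script]           # stop after the build stage
    return argv


argv = stap_debug_argv("/tmp/hello.stp")
print(" ".join(shlex.quote(arg) for arg in argv))
# stap --vp 0004 -k -p4 /tmp/hello.stp
```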
<RP>
fche: _KBUILD_CFLAGS is empty
<RP>
KBUILD_CFLAGS has the flags we need in it
<fche>
so I'd play with the Makefile that stap -k leaves behind
<fche>
see how the KBUILD_CFLAGS can be propagated properly
<RP>
fche: I think I need to understand what that "call flags" filter was trying to do
<RP>
fche: I wonder if the kernel removed that filter?
<fche>
perhaps
<fche>
this indirection was brought in by 2008 stap commit e5976ba0af9b828dcc76b3937b5a98fe9c0f6cb8
khaled has quit [Remote host closed the connection]
<fche>
RP, been looking through kernel code history, and don't see where the 'flags' make macro was or isn't :)
<RP>
fche: right, I'm also struggling to locate it
<RP>
fche: I'm trying a build with raw KBUILD_CFLAGS, see what it does. It's slowly working its way through the autoconf tests...
<fche>
if the _KBUILD_CFLAGS ... call flags, bit is suspect, take that out of the generated makefile - adjust
<fche>
if that works, then let's fix buildrun.cxx to do same (patch welcome if you'd like your name in the super amazing AUTHORS list)
khaled has joined #systemtap
<RP>
fche: I will see how this test goes. If that works I can run it on our wider infrastructure (will be an overnight job) and then if that works out I can send a patch :)
<RP>
fche: I'm fairly confident this should at least fix the failures we were seeing though
<fche>
yeah but it should've broken long ago and elsewhere
<RP>
fche: right. I also feel we're missing something
<fche>
would like to credit you as a Reported-By: what's your name / email (if you like)?
irker903 has joined #systemtap
<irker903>
systemtap: fche systemtap.git:refs/heads/master * release-4.1-80-g7cfac6c / buildrun.cxx: buildrun: adapt to loss of "flags" filter in linux scripts/Kbuild.include http://tinyurl.com/y28eay7r
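The commit notice above attributes the problem to the loss of the "flags" filter. A toy Python model of why _KBUILD_CFLAGS := $(call flags,KBUILD_CFLAGS) would then come out empty: in GNU make, calling a variable that is not defined expands to nothing. Everything here is an illustrative stand-in. `make_call` is a hypothetical helper, the pass-through `flags` macro is a simplification, and the -O2 in the sample value is made up.

```python
from typing import Callable, Dict


def make_call(macros: Dict[str, Callable[[str], str]], name: str, arg: str) -> str:
    """Toy stand-in for $(call name,arg): an undefined macro expands to ''."""
    macro = macros.get(name)
    return macro(arg) if macro is not None else ""


# Sample value only; the -I path is the one discussed in the log.
variables = {"KBUILD_CFLAGS": "-O2 -I./arch/mips/include/asm/mach-generic"}

# Hypothetical older setup: a `flags` macro exists and passes the value through.
old_macros = {"flags": lambda var: variables[var]}
# Newer setup: no `flags` macro is defined at all.
new_macros: Dict[str, Callable[[str], str]] = {}

old_result = make_call(old_macros, "flags", "KBUILD_CFLAGS")
new_result = make_call(new_macros, "flags", "KBUILD_CFLAGS")
print(repr(old_result))  # '-O2 -I./arch/mips/include/asm/mach-generic'
print(repr(new_result))  # ''
```

In this model the flags are still present in KBUILD_CFLAGS, but the derived value is empty once the macro is gone, which matches what was observed in the generated makefile.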
khaled has quit [Remote host closed the connection]
khaled has joined #systemtap
khaled has quit [Remote host closed the connection]
khaled has joined #systemtap
<RP>
fche: Sounds good, Richard Purdie <richard.purdie@linuxfoundation.org>
<fche>
argh, too late
<RP>
fche: never mind :)
<fche>
next time :)
<fche>
anyway see if that fix helps
<fche>
we may be able to do something different / sneakier if needed
<RP>
fche: I will set a test of that away...
mjw has quit [Quit: Leaving]
<fche>
righto
<RP>
fche: thanks for the help! I've set a test running, will be around 30 mins to see if it worked
<fche>
30 mins at one 32 bit mips is 57,600,000,000 bits
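One reading of the arithmetic behind that figure, assuming "one mips" is taken as one million instructions per second and each instruction as 32 bits wide:

```python
seconds = 30 * 60                       # 30 minutes
instructions_per_second = 1_000_000     # assumption: "one MIPS" as a rate
bits_per_instruction = 32               # assumption: 32-bit instruction words

total_bits = seconds * instructions_per_second * bits_per_instruction
print(f"{total_bits:,}")  # 57,600,000,000
```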
sapatel_ has joined #systemtap
sapatel__ has quit [Ping timeout: 245 seconds]
orivej has quit [Ping timeout: 245 seconds]
<RP>
fche: worked :)
<RP>
thanks again
<fche>
very good
<fche>
thanks for the report, and for egging me on to find out the real problem :-)
<RP>
fche: no problem, glad I could help figure it out :)
<fche>
yup, thanks!
<fche>
wouldn't be surprised if we end up having to revisit it, and maybe put in a fake version of the 'flags' filter into our own makefiles
<fche>
but so far the new code is surviving fine on our platforms' CI bots too, so probably good enough for now.
<RP>
fche: I wondered about that. The kernel implies it's no longer needed so time will tell I guess
<RP>
fche: do you have any idea on the timescale of the next release?
<fche>
just entre-nous, expecting early november
<RP>
fche: ok, after us. We'll run with a git version then, thanks :)
<fche>
sure!
<fche>
when's your deadline?
<RP>
fche: we release mid October but we're entering a freeze once we get things working
<fche>
aha
sscox has quit [Ping timeout: 244 seconds]
<RP>
fche: FWIW there looks to be one 32 bit arm issue left but the failure is much more concise and looks like a headers issue with 5.2