fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
sscox has quit [Ping timeout: 250 seconds]
orivej has quit [Ping timeout: 252 seconds]
sscox has joined #systemtap
hpt has joined #systemtap
orivej has joined #systemtap
hpt has quit [Ping timeout: 265 seconds]
tonyj has quit [Remote host closed the connection]
dmalcolm__ has joined #systemtap
dmalcolm_ has quit [Ping timeout: 245 seconds]
dmalcolm__ has quit [Ping timeout: 240 seconds]
mjw has joined #systemtap
orivej has quit [Ping timeout: 240 seconds]
orivej has joined #systemtap
gromero has joined #systemtap
wcohen has quit [Ping timeout: 240 seconds]
tromey has joined #systemtap
hpt has joined #systemtap
wcohen has joined #systemtap
hpt has quit [Ping timeout: 250 seconds]
orivej has quit [Ping timeout: 245 seconds]
khaled has joined #systemtap
tonyj has joined #systemtap
simon__ has joined #systemtap
simon__ has quit [Client Quit]
simon__ has joined #systemtap
sscox has quit [Ping timeout: 240 seconds]
sscox has joined #systemtap
orivej has joined #systemtap
<simon__>
hello! I'm trying my very first systemtap script on some of my own code... and it's almost working! can anybody kindly help me debug it?
<simon__>
it says: semantic error: while resolving probe point: identifier 'foo*' at foo_tap_all.stp:1:7
<simon__>
and: semantic error: probe point mismatch: didn't find any wildcard matches (similar: nfs, nfsd, vfs, vm, _nfs): identifier 'foo*' at :1:7
<simon__>
hi!
<simon__>
what am I missing here? why doesn't it work?
<fche>
hm, probe foo* - does that expand to something via another stap script here?
<fche>
probe process("foo").mark("*") is how you'd refer to them
<fche>
just how the -L operation lists them
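A minimal sketch of the direct form fche describes, assuming the instrumented binary is ./foo in the current directory. The path and the file name foo_marks.stp are illustrative and not taken from the wiki page. It writes a one-probe script that prints the name of every marker that fires:

```shell
# Assumption: ./foo is the binary built with the dtrace-style markers.
# foo_marks.stp is a hypothetical file name.
cat > foo_marks.stp <<'EOF'
probe process("./foo").mark("*") {
  printf("%s\n", $$name)   # $$name is the name of the marker that fired
}
EOF

# List the markers stap can see, then run the script (needs root):
#   stap -L 'process("./foo").mark("*")'
#   sudo stap foo_marks.stp -c ./foo
```

No tapset is involved here; the script names the probe points itself.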
<simon__>
thanks... I was just following the syntax on the wiki page posted above... but your syntax makes more sense :-)
<fche>
ah let's fix the wiki page :)
<simon__>
:-)
<fche>
ah I see what's going on there
<fche>
see the 'adding a tapset' subsection
<simon__>
If I change foo_tap_all.stp to: probe process("foo").mark("*") { printf("%s\n", probestr); }
<fche>
you skipped that part
<simon__>
ahhh...
<fche>
it's not mandatory by any means, but if your script is going to refer to those symbols, you need it
<simon__>
ahhh... so I need a tapset *and* the .stp file...
<fche>
or have your .stp file use process().mark() directly.
<simon__>
hmmm... the example uses "probestr" which I guess is in the tapset...
<fche>
would have been
<fche>
if you put it there
<fche>
search the wiki page for "probestr = sprintf ("....')
<simon__>
It says "The tapsets are typically placed in /usr/share/systemtap/tapsets" ... is there also a way to have the tapsets locally?
<fche>
yes, you can put some in any directory you like, and name it with stap -I/path/../
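A sketch of the local-tapset layout, under a few assumptions: the binary is ./foo, it has a marker named bar_enter with two integer arguments, and the directory name ./tapset is arbitrary. The alias lives in the tapset file and the short script only refers to it:

```shell
# Assumptions: ./foo, the marker bar_enter, and the ./tapset directory name
# are placeholders for whatever your project actually uses.
mkdir -p tapset
cat > tapset/foo.stp <<'EOF'
probe foo_bar_enter = process("./foo").mark("bar_enter") {
  a = $arg1
  b = $arg2
  probestr = sprintf("%s(a=%d, b=%d)", $$name, a, b)
}
EOF

cat > foo_tap_all.stp <<'EOF'
probe foo* { printf("%s\n", probestr) }
EOF

# Run with the extra tapset directory on the search path (needs root):
#   sudo stap -I ./tapset foo_tap_all.stp -c ./foo
```

Without the -I option (or an alias in the script itself), stap has nothing for foo* to match, which is the "didn't find any wildcard matches" error quoted earlier.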
<simon__>
Yep, 'probestr' is in the tapset and referenced by the .stp file in the wiki page example...
<simon__>
thanks! I'll try that...
<fche>
if you just want one-off scripts, don't bother with tapset files
<fche>
you could put those probe aliases right into your final .stp file, if it provides value
<simon__>
hmmm.. okay
<fche>
btw, what was the attraction of the dtrace style trace macros for your purposes? just checking why debuginfo-based instrumentation is not sufficient
<fche>
for your case
<simon__>
So I now have a 2 line foo_tap_all.stp file which it seems to like:
<simon__>
line 1: probe foo_bar_enter = process("foo").mark("bar_enter") { a = $arg1; b = $arg2; probestr = sprintf("%s(a=%d, b=%d)", $$name, a, b); }
<simon__>
line 2: probe foo* { printf("%s\n", probestr); }
<simon__>
however, when running this command there are other errors: probe foo* { printf("%s\n", probestr); }
<fche>
what are the errors and the command invocation?
<simon__>
error line 1: /usr/share/systemtap/runtime/linux/access_process_vm.h: In function ‘__access_process_vm_’:
<simon__>
error line 2: /usr/share/systemtap/runtime/linux/access_process_vm.h:35:29: error: passing argument 1 of ‘get_user_pages’ makes integer from pointer without a cast [-Werror=int-conversion]
<fche>
ok, pass-4 errors are described in [man error::pass4]
<simon__>
error line 3: ret = get_user_pages (tsk, mm, addr, 1, write, 1, &page, &vma);
<fche>
usually means that your copy of systemtap is much older than your kernel
<fche>
a recent stap version should work with any older kernel
<sapatel>
fche, sure but, when I ran the script to update the HTML docs, it was producing an error
<fche>
yup, this is a separate thing
<fche>
this is manual
<sapatel>
ohh ok
<sapatel>
fche, alright I'll update them
<fche>
hm you'd need to create a wiki account first
<simon__>
"what was the attraction of the dtrace style trace macros for your purposes?" I would like to instrument a larger C/C++ code base with about 1,000 source files to trace function calls at run-time...
<fche>
simon__, you don't need the dtrace probe macros for that - you certainly wouldn't want thousands of them
<simon__>
I could auto instrument with or without dtrace style trace macros, but it seems to me that if I used dtrace then I could potentially include the instrumentation in the release build and then be able to pick and choose which functions to trace?
<fche>
you can do that regardless
<fche>
release build - if that build lacks all debuginfo, then yeah dtrace style markers are your remaining choice
<simon__>
thanks! so I tried para-callgraph.stp on ./foo but got this error in addition to the version based errors mentioned above...
<fche>
the version stuff has to be dealt with independently, no matter what
<simon__>
Now make all is progressing... :-) fingers crossed...
<simon__>
hmmmm... error: "Making all in python: ImportError: No module named setuptools"
<simon__>
seems to be fixed with the command: sudo apt-get install python-setuptools
<simon__>
hmmm... now it's saying that "make all" has finished :-)
<simon__>
is there any way to test it without installing it? :-)
<simon__>
hmmm... after I built up some courage I tried: sudo make install
<simon__>
but it failed :-(
<simon__>
error seems to be: cp: cannot stat '/home/simon/systemtap-4.2/doc/SystemTap_Tapset_Reference/tapsets.pdf': No such file or directory
<simon__>
indeed... that PDF file does not exist :-(
<simon__>
this command finds 4 other PDFs... but not that one... any ideas? find . -type f | egrep -i pdf
<fche>
hmm
<fche>
well a 'make -k install' or even -i should be okay, just skip the docs
<fche>
you don't have to use sudo make install btw - configure with a personal directory as --prefix= and then you just need sudo $prefix/bin/stap but nothing else
<simon__>
kk thanks! trying... :-)
mjw has joined #systemtap
<simon__>
I'm trying: $ ./configure --prefix=/home/simon/systemtap-4.2-19055 && make all
<simon__>
But ./configure also said: Running systemtap uninstalled, entirely out of the build tree, configure: is not supported.
<simon__>
so the prefix thing is a local install and not the build tree?
<fche>
yeah, a --prefix makes it easier to run (fewer environment variable complications)
<fche>
build tree is the place you run configure/make from
<fche>
the prefix is the directory under which 'make install' will copy the results, and from where you can run most easily
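The build-tree versus prefix distinction can be sketched as a small helper. The function name and both arguments are hypothetical; pass your own source tree and a prefix directory you own:

```shell
# Hypothetical helper; both arguments are placeholders you choose.
#   $1 = unpacked systemtap source tree (the build tree)
#   $2 = install prefix, a directory you can write to without sudo
build_stap() {
  src=$1
  prefix=$2
  (
    cd "$src" &&
    ./configure --prefix="$prefix" &&
    make &&
    make -k install   # -k keeps going past failures such as a missing docs PDF
  )
}

# Example, followed by running the installed binary as root:
#   build_stap "$HOME/systemtap-4.2" "$HOME/stap-install"
#   sudo "$HOME/stap-install/bin/stap" -V
```

Only the final stap invocation needs root; the configure, make and install steps run as your own user because the prefix is in your home directory.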
<simon__>
do I need to set up any env vars etc or can I just run it like that?
<fche>
just run it like that
<simon__>
also, is there anything like make test?
<fche>
make check
<fche>
sudo make installcheck
tromey has quit [Quit: ERC (IRC client for Emacs 26.1)]
wcohen has quit [Ping timeout: 250 seconds]
<simon__>
Makefile:513: recipe for target 'installcheck' failed
<simon__>
There's no obvious error message etc on the terminal :-(
<simon__>
this is lower down: ./execrc: 1: eval: runtest: not found
<simon__>
anyway regardless, now this command from earlier works except for the warning: sudo ~/systemtap-4.2-19055/bin/stap para-callgraph.stp 'process("./foo").function("*")' 'process("./foo").function("main")' -c ./foo
<fche>
the testsuite relies on a package called dejagnu
<simon__>
hmmm... except there is no final line for: TRACE(FOO_MAIN_LEAVE());
<fche>
not sure why that'd be but note that this para-callgraph script doesn't use the dtrace-flavoured macros at all
mjw has quit [Quit: Leaving]
<simon__>
sudo make installcheck appears to be running now :-)
<fche>
it can take several hours to complete
<fche>
so don't sit there waiting for it
<fche>
-and- depending on machine (kernel etc.), several thousand PASSes and several hundred FAILs are about normal
<simon__>
hehe now I know why it's not advertised in the build instructions :-)
<fche>
ah it's there
<simon__>
another thing I'm confused about: ./foo outputs its "- r=55" at the end... but the callgraph is output even after all that, i.e. later... why?
<fche>
To run the full test suite from the build tree, install dejagnu,
<fche>
then run with root privileges:
<fche>
# make installcheck
<simon__>
kk :-)
<simon__>
how would I run the para-callgraph example on ./foo but ask it to only output for main() and not bar() ?
<fche>
if you have only two functions, but want to trace only one of them, it's not a callgraph any more :)
<simon__>
i.e. all functions but exclude bar()
<fche>
couple of ways
<simon__>
haha :-)
<fche>
there is a {foo,bar} syntax supported in function name strings
<fche>
or
<fche>
trace them all with a broad wildcard, but skip ones you don't want via a runtime test (if (ppfunc() =~ "main") next; ...)
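The runtime-test approach could look roughly like this, assuming ./foo was built with debuginfo and that bar is the function to leave out. The file name trace_except_bar.stp is illustrative, and this is a plain entry trace rather than the para-callgraph script:

```shell
# Assumptions: ./foo has debuginfo; "bar" is the one function to exclude.
cat > trace_except_bar.stp <<'EOF'
probe process("./foo").function("*").call {
  if (ppfunc() == "bar") next   # skip the function we do not want traced
  printf("-> %s\n", ppfunc())
}
EOF

# Run it against the program (needs root):
#   sudo stap trace_except_bar.stp -c ./foo
```

An exact string comparison is used here so that only bar is skipped; a regex test with =~ would also skip any other function whose name matches the pattern.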
<simon__>
another thing I'm confused about: ./foo outputs its "- r=55" at the end... but the callgraph is output even after all that, i.e. later... why?
<fche>
the object code can have more bits in it after the printf
<simon__>
how do you mean?
<fche>
after the printf runs within foo proper
<fche>
stap can still continue running monitoring the rest of foo; plus stap's own i/o buffering can delay its reports from earlier in time
<simon__>
ahhhh... makes sense... is there a way to make stap have line buffered output?
<fche>
it's not a granularity issue but a timing issue really -- it's best not to intermingle the output streams
<fche>
you wouldn't want dogs & cats living together etc.
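One way to keep the two streams apart is to send stap's output to a file with -o while the traced program keeps writing to the terminal. A sketch with a hypothetical wrapper name:

```shell
# Hypothetical helper:
#   $1 = file that receives stap's output
#   $2 = the .stp script
#   remaining arguments = the target command to run under stap
trace_to_file() {
  out=$1
  script=$2
  shift 2
  sudo stap -o "$out" "$script" -c "$*"
}

# Example: the program's own output stays on the terminal, the trace goes to trace.log
#   trace_to_file trace.log foo_tap_all.stp ./foo
```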
<simon__>
hehe kk
<simon__>
when I run time ./foo on its own then it takes "real 0m0.002s" but with stap time says "real 0m0.935s" ... why the huge overhead, and what is that time spent on?
<fche>
stap's own processing time - especially the first time you run a script - it needs to do a bunch of work to analyze your program, translate the script to C and compile THAT etc etc
<fche>
there's a lot going on behind the scenes
<simon__>
is that before foo starts?
<fche>
if you're running stap -e 'probe ...' -c foo then mostly yes
<simon__>
I've noticed that the first time I run it with a new config -- i.e. .function("*") changed to .function("bar") -- then it can take even longer, e.g. 3.2 seconds... but running it again causes it to revert to the 1 second again... is something cached in the background or why even longer on the first try?
<fche>
yes, cached
<fche>
add a stap -v to ask it to report a little about that
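A small sketch for watching the passes without running anything: -v prints a line per pass, and -p4 stops after the compile pass. The wrapper name is hypothetical:

```shell
# Hypothetical wrapper: translate and compile a script verbosely, but do not run it.
stap_compile_only() {
  stap -v -p4 "$@"
}

# Example; a repeat of the same command should be quicker if the cache is used:
#   stap_compile_only -e 'probe process("./foo").function("*").call { println(ppfunc()) }'
```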
<simon__>
thanks... now I see 5 passes :-)
wcohen has joined #systemtap
<simon__>
I also now got the foo_tap_all.stp file to work as expected :-)
<simon__>
it seems similar results and execution time to the para-callgraph.stp example, except you need to insert all the macros...
<simon__>
I have seen it take up to 10 seconds on a non-cached run the first time on this little foo.c example...
<simon__>
If I had an enormous 1,000 object file executable, would it take 1,000 * 10 seconds on the first run?