fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
sscox has quit [Ping timeout: 250 seconds]
orivej has quit [Ping timeout: 252 seconds]
sscox has joined #systemtap
hpt has joined #systemtap
orivej has joined #systemtap
hpt has quit [Ping timeout: 265 seconds]
tonyj has quit [Remote host closed the connection]
dmalcolm__ has joined #systemtap
dmalcolm_ has quit [Ping timeout: 245 seconds]
dmalcolm__ has quit [Ping timeout: 240 seconds]
mjw has joined #systemtap
orivej has quit [Ping timeout: 240 seconds]
orivej has joined #systemtap
gromero has joined #systemtap
wcohen has quit [Ping timeout: 240 seconds]
tromey has joined #systemtap
hpt has joined #systemtap
wcohen has joined #systemtap
hpt has quit [Ping timeout: 250 seconds]
orivej has quit [Ping timeout: 245 seconds]
khaled has joined #systemtap
tonyj has joined #systemtap
simon__ has joined #systemtap
simon__ has quit [Client Quit]
simon__ has joined #systemtap
sscox has quit [Ping timeout: 240 seconds]
sscox has joined #systemtap
orivej has joined #systemtap
<simon__> hello! I'm trying my very first systemtap script on some of my own code... and it's almost working! can anybody kindly help me debug it?
<simon__> I'm following the instructions here: https://sourceware.org/systemtap/wiki/AddingUserSpaceProbingToApps
<simon__> here's my code:
<simon__> seems to compile and run as normal, and I can even list the instrumentation points :-)
<simon__> however... when I try this command it fails: sudo stap foo_tap_all.stp -c ./foo
<simon__> cat foo_tap_all.stp
<simon__> probe foo* { printf("%s\n", probestr); }
<fche> hi
<simon__> it says: semantic error: while resolving probe point: identifier 'foo*' at foo_tap_all.stp:1:7
<simon__> and: semantic error: probe point mismatch: didn't find any wildcard matches (similar: nfs, nfsd, vfs, vm, _nfs): identifier 'foo*' at :1:7
<simon__> hi!
<simon__> what am I missing here? why doesn't it work?
<fche> hm, probe foo* - does that expand to something via another stap script here?
<fche> probe process("foo").mark("*") is how you'd refer to them
<fche> just how the -L operation lists them
<simon__> thanks... I was just following the syntax on the wiki page posted above... but your syntax makes more sense :-)
<fche> ah let's fix the wiki page :)
<simon__> :-)
<fche> ah I see what's going on there
<fche> see the 'adding a tapset' subsection
<simon__> If I change foo_tap_all.stp to: probe process("foo").mark("*") { printf("%s\n", probestr); }
<fche> you skipped that part
<simon__> ahhh...
<fche> it's not mandatory by any means, but if your script is going to refer to those symbols, you need it
<simon__> ahhh... so I need a tapset *and* the .stp file...
<fche> or have your .stp file use process().mark() directly.
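A minimal sketch of the direct form fche describes, with no tapset or alias involved. The file name foo_mark_all.stp is made up for illustration, and "./foo" is assumed to be the instrumented binary in the current directory; $$name is the marker name, as used later in this log.

```shell
# Write a one-probe script that prints the name of every marker hit in ./foo.
cat > foo_mark_all.stp <<'EOF'
// Probe every dtrace-style marker in ./foo directly (assumed binary path).
probe process("./foo").mark("*")
{
  printf("%s\n", $$name)
}
EOF

# List the markers first, then run (stap needs root to load the module):
#   stap -L 'process("./foo").mark("*")'
#   sudo stap foo_mark_all.stp -c ./foo
```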
<simon__> hmmm... the example uses "probestr" which I guess is in the tapset...
<fche> would have been
<fche> if you put it there
<fche> search the wiki page for "probestr = sprintf ("....')
<simon__> It says "The tapsets are typically placed in /usr/share/systemtap/tapsets" ... is there also a way to have the tapsets locally?
<fche> yes, you can put some in any directory you like, and name it with stap -I/path/../
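A sketch of the local-tapset route, assuming a directory named ./tapset (any directory works) and the bar_enter marker with two integer arguments that appears later in this log. The alias body follows the wiki example's pattern; the names a, b and the path "./foo" are illustrative.

```shell
# Put the probe alias in a local tapset directory instead of /usr/share/systemtap/tapsets.
mkdir -p tapset
cat > tapset/foo.stp <<'EOF'
// Alias for one marker in ./foo; probestr is set here and read by the script.
probe foo_bar_enter = process("./foo").mark("bar_enter")
{
  a = $arg1
  b = $arg2
  probestr = sprintf("%s(a=%d, b=%d)", $$name, a, b)
}
EOF

# The user script only refers to the alias names via a wildcard.
cat > foo_tap_all.stp <<'EOF'
probe foo* { printf("%s\n", probestr) }
EOF

# Point stap at the local tapset directory with -I:
#   sudo stap -I ./tapset foo_tap_all.stp -c ./foo
```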
<simon__> Yep, 'probestr' is in the tapset and referenced by the .stp file in the wiki page example...
<simon__> thanks! I'll try that...
<fche> if you just want one-off scripts, don't bother with tapset files
<fche> you could put those probe aliases right into your final .stp file, if it provides value
<simon__> hmmm.. okay
<fche> btw, what was the attraction of the dtrace style trace macros for your purposes? just checking why debuginfo-based instrumentation is not sufficient
<fche> for your case
<simon__> So I now have a 2 line foo_tap_all.stp file which it seems to like:
<simon__> line 1: probe foo_bar_enter = process("foo").mark("bar_enter") { a = $arg1; b = $arg2; probestr = sprintf("%s(a=%d, b=%d)", $$name, a, b); }
<simon__> line 2: probe foo* { printf("%s\n", probestr); }
<simon__> however, when running this command there are other errors: probe foo* { printf("%s\n", probestr); }
<fche> what are the errors and the command invocation?
<simon__> error line 1: /usr/share/systemtap/runtime/linux/access_process_vm.h: In function ‘__access_process_vm_’:
<simon__> error line 2: /usr/share/systemtap/runtime/linux/access_process_vm.h:35:29: error: passing argument 1 of ‘get_user_pages’ makes integer from pointer without a cast [-Werror=int-conversion]
<fche> ok, pass-4 errors are described in [man error::pass4]
<simon__> error line 3: ret = get_user_pages (tsk, mm, addr, 1, write, 1, &page, &vma);
<fche> usually means that your copy of systemtap is much older than your kernel
<fche> a recent stap version should work with any older kernel
<simon__> stap --version says: Systemtap translator/driver (version 2.9/0.165, Debian version 2.9-2ubuntu2 (xenial))
<simon__> so getting a newer version of stap should fix that error?
<fche> yes.
<fche> 2.9 is 2015-10-08, so four years old
<simon__> kk thanks! I'll see if I can do that...
<fche> if your kernel is much newer than that, you need
<simon__> yeah, I'm running Ubuntu 16.04 LTS ...
<fche> sapatel, hey btw, were you going to update https://sourceware.org/systemtap/wiki/ etc. with the new version numbers?
<sapatel> fche, sure but, when I ran the script to update the HTML docs, it was producing an error
<fche> yup, this is a separate thing
<fche> this is manual
<sapatel> ohh ok
<sapatel> fche, alright I'll update them
<fche> hm you'd need to create a wiki account first
<simon__> "what was the attraction of the dtrace style trace macros for your purposes?" I would like to instrument a larger C/C++ code base with about 1,000 source files to trace function calls at run-time...
<fche> simon__, you don't need the dtrace probe macros for that - you certainly wouldn't want thousands of them
<fche> sapatel, use wiki account name SagarPatel
<sapatel> gotcha
<simon__> I could auto instrument with or without dtrace style trace macros, but it seems to me that if I used dtrace then I could potentially include the instrumentation in the release build and then be able to pick and choose which functions to trace?
<fche> you can do that regardless
<fche> release build - if that build lacks all debuginfo, then yeah dtrace style markers are your remaining choice
<simon__> thanks! so I tried para-callgraph.stp on ./foo but got this error in addition to the version based errors mentioned above...
<fche> the version stuff has to be dealt with independently, no matter what
<simon__> command: sudo stap para-callgraph.stp 'process("./foo").function("*")' 'process("./foo").function("main")' -c ./foo
<fche> that old stap can't work with this new kernel
<simon__> error line 1: WARNING: function _start return probe is blacklisted: keyword at para-callgraph.stp:24:1
<simon__> error line 2: source: probe $1.return { trace(-1, $$return) }
<simon__> I'm looking into how to upgrade systemtap... do you think this latest error is also to do with the systemtap version?
<fche> I don't see an error in what you quoted, just a warning
<simon__> ahhhh... good point... :-)
<simon__> hmmm... systemtap seems so specialized that it's difficult to google for instructions on how to get a newer version for
<simon__> Ubuntu... :-(
<fche> you can build it yourself, or file a bug with the ubuntu maintainers to update their version
<fche> is 16.04 lts still in supported category?
<simon__> yep... LTS stands for Long Term Support: https://wiki.ubuntu.com/Releases
<simon__> until 2021 :-)
<fche> ok, so yeah, I'd file a bug with them about getting a newer version built
<fche> some of the ubuntu releases keep track of upstream stap quite well
<fche> dunno why that particular one would be 4 years out of date, but alas
<fche> anyway - building it for yourself is not that bad
<simon__> do you happen to have a link for instructions for that?
<simon__> thanks! I'll have a go :-)
<fche> yes
<simon__> kk thanks!
CME_ has joined #systemtap
CME has quit [Remote host closed the connection]
ema_ has joined #systemtap
CME_ is now known as CME
mjw has quit [Ping timeout: 240 seconds]
ema has quit [Ping timeout: 240 seconds]
mjw has joined #systemtap
mjw has quit [Quit: Leaving]
mjw has joined #systemtap
mjw has quit [Quit: Leaving]
khaled has quit [Quit: Konversation terminated!]
<simon__> fche, I managed to ./configure ... :-)
<simon__> Now make all is progressing... :-) fingers crossed...
<simon__> hmmmm... error: "Making all in python: ImportError: No module named setuptools"
<simon__> seems to be fixed with the command: sudo apt-get install python-setuptools
<simon__> hmmm... now it's saying that "make all" has finished :-)
<simon__> is there any way to test it without installing it? :-)
<simon__> hmmm... after I built up some courage I tried: sudo make install
<simon__> but it failed :-(
<simon__> error seems to be: cp: cannot stat '/home/simon/systemtap-4.2/doc/SystemTap_Tapset_Reference/tapsets.pdf': No such file or directory
<simon__> indeed... that PDF file does not exist :-(
<simon__> this command finds 4 other PDFs... but not that one... any ideas? find . -type f | egrep -i pdf
<fche> hmm
<fche> well a 'make -k install' or even -i should be okay, just skip the docs
<fche> you don't have to use sudo make install btw - configure with a personal directory as --prefix= and then you just need sudo $prefix/bin/stap but nothing else
<simon__> kk thanks! trying... :-)
mjw has joined #systemtap
<simon__> I'm trying: $ ./configure --prefix=/home/simon/systemtap-4.2-19055 && make all
<simon__> But ./configure also said: "configure: Running systemtap uninstalled, entirely out of the build tree, is not supported."
<simon__> so the prefix thing is a local install and not the build tree?
<fche> yeah, a --prefix makes it work easier (fewer environment variable complications)
<fche> build tree is the place you run configure/make from
<fche> the prefix is the directory under which 'make install' will copy the results, and from where you can run most easily
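The build-tree versus prefix distinction can be sketched as a small shell helper. The function name build_stap_local and both example paths are hypothetical; the configure, make and make -k install steps are the ones discussed above, with -k used to get past the missing docs PDF.

```shell
# Hypothetical helper: build SystemTap in its source (build) tree and
# install the result under a personal prefix, so no system-wide install is needed.
build_stap_local() {
    src_dir=$1   # build tree: where configure/make are run
    prefix=$2    # install prefix: where 'make install' copies the results
    (
        cd "$src_dir" &&
        ./configure --prefix="$prefix" &&
        make all &&
        make -k install   # -k: keep going if the docs step fails
    )
}

# Example (paths are assumptions):
#   build_stap_local "$HOME/systemtap-4.2" "$HOME/systemtap-4.2-install"
#   sudo "$HOME/systemtap-4.2-install/bin/stap" --version
```

Only the final stap invocation needs sudo here, since the prefix lives in the user's home directory.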
<simon__> kk
<simon__> $ ~/systemtap-4.2-19055/bin/stap --version
<simon__> Systemtap translator/driver (version 4.2/0.165, non-git sources)
<simon__> do I need to set up any env vars etc or can I just run it like that?
<fche> just run it like that
<simon__> also, is there anything like make test?
<fche> make check
<fche> sudo make installcheck
tromey has quit [Quit: ERC (IRC client for Emacs 26.1)]
wcohen has quit [Ping timeout: 250 seconds]
<simon__> Makefile:513: recipe for target 'installcheck' failed
<simon__> There's no obvious error message etc on the terminal :-(
<simon__> this is lower down: ./execrc: 1: eval: runtest: not found
<simon__> anyway regardless, now this command from earlier works except for the warning: sudo ~/systemtap-4.2-19055/bin/stap para-callgraph.stp 'process("./foo").function("*")' 'process("./foo").function("main")' -c ./foo
<fche> the testsuite relies on a package called dejagnu
<simon__> hmmm... except there is no final line for: TRACE(FOO_MAIN_LEAVE());
<fche> not sure why that'd be but note that this para-callgraph script doesn't use the dtrace-flavoured macros at all
mjw has quit [Quit: Leaving]
<simon__> sudo make installcheck appears to be running now :-)
<fche> it can take several hours to complete
<fche> so don't sit there waiting for it
<fche> -and- depending on machine (kernel etc.), several thousand PASSes and several hundred FAILs are about normal
<simon__> hehe now I know why it's not advertised in the build instructions :-)
<fche> ah it's there
<fche> To run the full test suite from the build tree, install dejagnu,
<fche> then run with root privileges:
<fche> # make installcheck
<simon__> kk :-)
<simon__> how would I run the para-callgraph example on ./foo but ask it to only output for main() and not bar() ?
<fche> if you have only two functions, but want to trace only one of them, it's not a callgraph any more :)
<simon__> i.e. all functions but exclude bar()
<fche> couple of ways
<simon__> haha :-)
<fche> there is a {foo,bar} syntax supported in function name strings
<fche> or
<fche> trace them all with a broad wildcard, but skip ones you don't want via a runtime test ( if(ppfunc() =~ "main") next; ...
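A rough sketch of the runtime-test approach, assuming ./foo was built with debuginfo and that bar is the function to leave out. This is a cut-down stand-in written for illustration, not the para-callgraph.stp example itself; the script file name is made up. ppfunc() and thread_indent() are standard tapset functions.

```shell
# Trace every function in ./foo except bar(), filtering at run time.
cat > foo_callgraph_filtered.stp <<'EOF'
// Entry probes: skip bar, print everything else with call-depth indentation.
probe process("./foo").function("*").call
{
  if (ppfunc() == "bar") next
  printf("%s -> %s\n", thread_indent(1), ppfunc())
}

// Return probes: apply the same filter so the indentation stays balanced.
probe process("./foo").function("*").return
{
  if (ppfunc() == "bar") next
  printf("%s <- %s\n", thread_indent(-1), ppfunc())
}
EOF

# Run as root:
#   sudo stap foo_callgraph_filtered.stp -c ./foo
```

The filter has to appear in both probes; skipping only the entry probe would leave an unmatched thread_indent(-1) for bar.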
<simon__> another thing I'm confused about: ./foo outputs its "- r=55" at the end... but the callgraph is output even after all that, i.e. later... why?
<fche> the object code can have more bits in it after the printf
<simon__> how do you mean?
<fche> after the printf runs within foo proper
<fche> stap can still continue running monitoring the rest of foo; plus stap's own i/o buffering can delay its reports from earlier in time
<simon__> ahhhh... makes sense... is there a way to make stap have line buffered output?
<fche> it's not a granularity issue but a timing issue really -- it's best not to intermingle the output streams
<fche> you wouldn't want dogs & cats living together etc.
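One way to keep the two output streams apart, sketched under the assumption that para-callgraph.stp and ./foo are in the current directory: stap's -o option sends the script's output to a file, while ./foo keeps writing to the terminal. The function name and the default log name are made up for illustration.

```shell
# Hypothetical wrapper: send the stap trace to a file instead of the terminal.
trace_foo_to_file() {
    out=${1:-callgraph.log}   # destination for stap's own output
    sudo stap -o "$out" para-callgraph.stp \
        'process("./foo").function("*")' \
        'process("./foo").function("main")' \
        -c ./foo
}

# Example:
#   trace_foo_to_file callgraph.log   # ./foo prints to the terminal as usual
#   cat callgraph.log                 # the call graph, read separately
```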
<simon__> hehe kk
<simon__> when I run time ./foo on its own it takes "real 0m0.002s" but with stap, time says "real 0m0.935s" ... why the huge overhead and where is that time spent?
<fche> stap's own processing time - especially the first time you run a script - it needs to do a bunch of work to analyze your program, translate the script to C and compile THAT etc etc
<fche> there's a lot going on behind the scenes
<simon__> is that before foo starts?
<fche> if you're running stap -e 'probe ...' -c foo then mostly yes
<simon__> I've noticed that the first time I run it with a new config -- i.e. .function("*") changed to .function("bar") -- then it can take even longer, e.g. 3.2 seconds... but running it again causes it to revert to the 1 second again... is something cached in the background or why even longer on the first try?
<fche> yes, cached
<fche> add a stap -v to ask it to report a little about that
<simon__> thanks... now I see 5 passes :-)
wcohen has joined #systemtap
<simon__> I also now got the foo_tap_all.stp file to work as expected :-)
<simon__> it seems to give similar results and execution time to the para-callgraph.stp example, except you need to insert all the macros...
<simon__> I have seen it take up to 10 seconds on a non-cached run the first time on this little foo.c example...
<simon__> If I had an enormous 1,000 object file executable, would it take 1,000 * 10 seconds on the first run?
simon__ has quit [Ping timeout: 245 seconds]