fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
simon__ has joined #systemtap
<simon__>
fche I'm back! :-)
<simon__>
I tried out the callgraph on the big executable and it said:
<simon__>
Pass 2: analyzed script: 93757 probes, 79948 functions, 1 embed, 3 globals using 1307148virt/1277392res/7012shr/1272896data kb, in 21310usr/270sys/21594real ms.
<simon__>
Also, my laptop seemed to completely lock up too while those 36 minutes were going on... :-(
<simon__>
Does this mean that neither the macro method nor the callgraph method is going to help me instrument and trace this C/C++ project with ~ 80k functions? Because even if it did not run out of memory, who'd want to wait 30+ minutes to start the executable?
simon__ has quit [Ping timeout: 246 seconds]
simon__ has joined #systemtap
<simon__>
hi fche!
<fche>
yo
<simon__>
hello again!
<simon__>
can you see my messages and question from 16:33? Or about 1.5 hours ago?
<fche>
hm, 90000 probes is a huge number, yea
<fche>
just got back
<simon__>
kk nice :-)
<fche>
(9pm here)
<fche>
shouldn't be 30min, unless perhaps the machine was really short of ram and was paging to death
<simon__>
kk :-)
<fche>
on a larger ram box, that should be manageable, I suspect, but it's larger than usual
<fche>
thousands, ok, hundred thousand ... wow!
<simon__>
laptop has 32GB of RAM...
<fche>
this could be one of those times where agentzh's separate-compilation-of-stap-modules could be necessary
<fche>
for a probe job of that magnitude
<fche>
could you try smaller, like say one tenth the size?
<simon__>
yep, that's what I was thinking too... so I'm just making a script which manufactures a source file with n functions inside it :-)
<simon__>
does the linux kernel do things differently, because surely it also has as many probe points, or more?
<fche>
(stap can probe numerically many more places than function calls, in principle)
<fche>
(in practice, memory etc. constraints kick in)
<fche>
my poor gcc has been running for five minutes, chugging a stap script for 'probe kernel.function("*") {}' ... shouldn't be that long
<fche>
but memory consumption is okay for now
<simon__>
I created a script which generates n functions which all call each other and finally the last one returns
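A generator along those lines could be sketched in shell. This is a hypothetical reconstruction, not the actual script from the conversation; the `bar*` naming follows the function names mentioned later, and the chained-call shape (each function calls the next, the last one returns) follows the description above:

```shell
#!/bin/sh
# Generate zoo.c with N functions bar1..barN, where bar1 calls bar2,
# bar2 calls bar3, and so on; barN finally returns.
# N and the output filename are illustrative defaults.
N=${1:-100}
OUT=${2:-zoo.c}
{
  echo '#include <stdio.h>'
  # Emit the deepest function first so every callee is defined before use.
  echo "int bar$N(int i) { return i + 1; }"
  i=$((N - 1))
  while [ "$i" -ge 1 ]; do
    next=$((i + 1))
    echo "int bar$i(int i) { return bar$next(i + 1); }"
    i=$((i - 1))
  done
  echo 'int main(void) { printf("%d\n", bar1(0)); return 0; }'
} > "$OUT"
```

Running it with no arguments writes a 100-function `zoo.c` that can then be compiled with `gcc -g zoo.c -o zoo` and probed.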
<fche>
try using not para-callgraph.stp but a simpler one - that doesn't print $$vars type things (which are so inherently context-specific and result in a lot of extra code being generated, to pull out and pretty-print all those local variables)
<simon__>
however, para-callgraph.stp appears to fail if more than 63 functions deep are called... :-(
<simon__>
this is where function bar100() gets called: 1403 zoo(24173): ->bar100 i=0x63
<simon__>
but the very next line is: 1413 zoo(24173): <-bar63 return=0x64
<simon__>
return lines for bar64() through bar100() are mysteriously missing...
<simon__>
Do you know a way to increase the call depth? Or do I need to write a simpler auto-generated C program... :-)
<fche>
hm, I can't think of a reason that shouldn't work
<simon__>
I came up with a workaround script which generates n functions which are called one at a time instead...
<simon__>
possibly a bug in stap? is there a test to test recursion depth bigger than 63?
<fche>
stap doesn't experience recursion
<fche>
it would experience a sequence of calls at deeper and deeper nesting levels, but stap itself doesn't know or care
<fche>
again try a stap script that does not process the $$parms; it should drastically reduce resource requirements
<simon__>
do you have such a script handy?
<simon__>
so with 10k functions then this source file was generated: -rw-r--r-- 1 root root 69547889 Nov 21 18:32 stap_d59bb5f6406a3529e5d18985a6051fe1_10337070_src.c
<simon__>
it's taking a long time to compile...
<simon__>
looks like cc1 is using 2.5GB RAM so far...
<simon__>
so 1,000 functions took real 0m18.938s, but 10,000 functions took real 5m22.487s
<simon__>
so I'm wondering what's in that file because 69,547,889 / 10,000 is 6,954 bytes per function? Seems a bit on the steep side, or?
<fche>
if it involves grabbing a bunch of different context variables and pretty-printing, that could be about right
<fche>
if you grab the para-callgraph.stp script file, and replace $$parms with "" and $$return with "" then it won't mess with that
<simon__>
I found out where it caches the .c file... :-)
<fche>
sssh, tell no one, it's a secret
<simon__>
So I guess it's not / 10,000 but really / 20,000 because it's the enter and leave points...
<fche>
or you could run stap -k ...
<fche>
yup, things add up
<simon__>
so that's approx. 3,500 bytes per probe thingy...
<simon__>
and there's lots AND LOTS of duplicated lines, e.g.: 20,000 * #define STAP_RETURN(v) do { STAP_RETVALUE = (int64_t) (v); goto out; } while(0)
<simon__>
and that's just one of many examples...
<fche>
those are harmless tho
<simon__>
yy but the compile time starts to go through the roof...
<fche>
not because of those
<fche>
macros
<simon__>
how can I get the actual compile command line for the monster C file?
<fche>
stap --vp 02 ish .. stap -k will keep the tmp directory so you can run it for yourself later
<simon__>
zoo.c with 10k functions takes 3.1 seconds for gcc to compile... but it looks like the much bigger .c file takes over 5 minutes... admittedly it is much bigger...
<fche>
sorry stap --vp 0002 (pass 4 verbosity 2)
<simon__>
thanks!
* fche
must sign off shortly
<fche>
a comfy pillow beckons
<fche>
good luck dude and we can talk again tomorrow
<simon__>
thanks for your help! and greetings from Vancouver, Canada! :-)
<fche>
ah it's just late dinner time for you then
<fche>
say hi to the whales in the fraser
<simon__>
wow... you know your geography :-)
simon__ has quit [Ping timeout: 240 seconds]
simon__ has joined #systemtap
<simon__>
hi again fche!
tromey has joined #systemtap
<fche>
hey simon__ -morning
<simon__>
hello! yes, morning 9:17 AM for me... but 12:17 PM for you?
<simon__>
:-)
<fche>
so what's new today
<simon__>
fche, may I ask you some more questions about systemtap?
<simon__>
yesterday I discovered that for the 10,000 function example, systemtap spends over 5 minutes creating and compiling its .c file which gets compiled to a .ko kernel object file... where can I find more info about the architecture of systemtap and how the .ko file fits into the big picture?
<fche>
the docs include an introduction/architecture paper
<simon__>
"Original architecture paper (July 2005)." is this the best one?
<fche>
it's a good one to start
<fche>
the concepts are the same
<simon__>
thanks!
<simon__>
another question: Yesterday we talked briefly about including and excluding functions in a run-time call-tree trace. What about if I managed to instrument a large executable with ~ 80k functions and wanted to do something more complicated like: Have more control over the verbosity? Give some functions a higher or lower verbosity so that they are included or excluded in the trace depending upon the verbosity level? And also, let's say
<simon__>
I would like to run the executable with verbosity switched off, but switch it on when a particular function is first executed? What are ways that you would approach these types of challenges with systemtap?
<fche>
'verbosity' within a script is entirely under your control
<fche>
it's a programming language, eh?
<fche>
so you print when you want to - you can track nesting levels, function names, time of day, whatever you want
<fche>
and you decide when something should be printed
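One way to gate tracing on a trigger function, in the spirit of fche's answer, is a global flag that the probes consult. The script body below is a sketch: `./zoo`, `trigger_fn`, and the on/off scheme are assumptions for illustration, not from the conversation:

```shell
# Write a stap script that prints nothing until trigger_fn() first fires,
# then traces every function call. All names here are hypothetical.
cat > gated-trace.stp <<'EOF'
global tracing = 0

# Flip the switch the first time the trigger function executes.
probe process("./zoo").function("trigger_fn").call { tracing = 1 }

# Ordinary probes print only while the switch is on.
probe process("./zoo").function("*").call {
  if (tracing) printf("-> %s\n", probefunc())
}
EOF
echo "wrote gated-trace.stp; run with: stap gated-trace.stp -c ./zoo"
```

The same pattern extends to per-function verbosity: keep a level per function (e.g. in a global array keyed by probefunc()) and compare it against a threshold before printing.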