fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
hpt has joined #systemtap
drsmith has joined #systemtap
drsmith has left #systemtap [#systemtap]
hpt has quit [Ping timeout: 272 seconds]
hpt has joined #systemtap
hpt has quit [Quit: Lost terminal]
ego has joined #systemtap
srikar_away is now known as srikar
srikar is now known as srikar_away
hkshaw has quit [Quit: Leaving.]
hpt has joined #systemtap
darvon has quit [Ping timeout: 246 seconds]
darvon has joined #systemtap
hkshaw has joined #systemtap
ananth has joined #systemtap
ravi has joined #systemtap
ravi has quit [Remote host closed the connection]
ravi has joined #systemtap
naveen1 has joined #systemtap
gila has joined #systemtap
srikar_away is now known as srikar
pfallenop has quit [Ping timeout: 272 seconds]
pfallenop has joined #systemtap
fche has quit [Read error: Connection reset by peer]
fche has joined #systemtap
hpt_ has joined #systemtap
nkambo has joined #systemtap
nkambo has quit [Ping timeout: 276 seconds]
hpt has quit [Ping timeout: 244 seconds]
nkambo has joined #systemtap
hpt has joined #systemtap
hpt has quit [Quit: Lost terminal]
nkambo has quit [Read error: Connection reset by peer]
hpt_ has quit [Ping timeout: 264 seconds]
nkambo has joined #systemtap
hpt has joined #systemtap
hpt has quit [Client Quit]
ego has quit [Ping timeout: 250 seconds]
gila_ has joined #systemtap
gila has quit [Ping timeout: 272 seconds]
ego has joined #systemtap
ph7 has joined #systemtap
ph7 has quit [Client Quit]
ph7 has joined #systemtap
gila_ has quit [Quit: My Mac Pro has gone to sleep. ZZZzzz…]
<lukas`>
Hello to the community. I am quite new to systemtap and I am going through a tutorial, trying to trace a statement from a source file. It works if the statement comes from the program's own code, but I need it from a dynamic library, and it tends to end with a semantic error: 'semantic error: while resolving probe point: identifier 'process' at <input>:1:7'. The only difference seems to be that the source file comes from the library. Must
<lukas`>
the library file be mentioned in the stap line?
<lukas`>
Or is there another condition to be fulfilled?
<lukas`>
Ok, I have my answer now, thanks
<fche>
hi lukas`
<fche>
what was your solution?
<lukas`>
fche: Frank?
<fche>
today, my secret name is Franck
<fche>
don't tell anyone
<lukas`>
fche: I think I read one of your messages in some related thread ... anyway the solution was to mention the libc in the process argument.
<lukas`>
fche: sure :-)
<lukas`>
fche: anyway, thanks for previous help with redis PMDA
<fche>
yup, probes are relative to an executable or shared library
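A minimal sketch of what "relative to an executable or shared library" means for the probe point text: the path given to process() names the binary that actually contains the function, which for malloc and free is libc rather than the program. The libc path, the helper names and the pp() handler body are assumptions for illustration; the Python code only builds the script text and does not run stap.

```python
def probe_point(target, function):
    """Build a user-space function probe point for an executable or shared library."""
    return f'process("{target}").function("{function}")'


def trace_script(target, functions):
    """Build a one-line script that prints the probe point each time a function is hit."""
    probes = ", ".join(probe_point(target, fn) for fn in functions)
    return f"probe {probes} {{ println(pp()) }}"


# Assumed libc location; the real path depends on the distribution.
LIBC_PATH = "/lib64/libc.so.6"

print(trace_script(LIBC_PATH, ["malloc", "free"]))
```

The printed line could then be passed to stap with -e, assuming debuginfo or symbol data for the library is available on the system.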
<fche>
np
<fche>
can you think of a place in docs or in the diagnostics where we could make this clearer?
<lukas`>
fche: well, I am trying to write a script that would trace *alloc and free to detect an annoying memory leak that the usual tools have a hard time finding.
ph7 has joined #systemtap
<lukas`>
fche: this is hard to tell; for me SystemTap was hard to understand and even to try on Gentoo, and before I spoke to Lukas Berk in person, I did not even believe that I could use it.
<lukas`>
fche: it was harder than experimenting with dtrace
<fche>
aha. we'll be putting in some serious effort into easing this stuff more, soon
<fche>
can you enumerate briefly the problems?
<fche>
(thanks lberk for helping the fellow lukas!)
<lberk>
np :)
<lukas`>
fche: well, lberk gave a presentation at a Red Hat conference and I just asked him: "I tried SystemTap a few times and found it fairly complicated to use, maybe convenient for kernel experts. Should it be so?" And he said, "no, it should be fairly easy ..."
<lberk>
devconf last year?
<lukas`>
lberk: yes
* lberk
looks up his notes in case there are more details he scribbled away
<lukas`>
fche: okay, imagine a server connected to a load generator. Processing a request leads to further requests to other servers, and several protocol stacks are involved. Now, occasionally, the memory utilization of the process in a load/stability test starts to grow. The tricky thing is that it may or may not start to grow. The usual tools like gperftools or the valgrind tools did not show any real leaks, so the memory is probably
<lukas`>
just not being freed under some special conditions.
* fche
has been thinking of pcp proc memory metrics growing as one way to track that sort of thing
<lukas`>
fche: now, the tricky things are - it happens only sometimes, e.g. on three of four machines loaded with the same throughput, and we do not know the trigger.
<lukas`>
fche: we know it is heap ... I checked the pmap output.
<lukas`>
now, I've been thinking of tracing ?alloc and free with userspace stacks, matching them up, and comparing the newly allocated amount of memory with RSS increases.
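The matching step described above can be sketched offline, assuming the trace has already been collected as (kind, address, size, stack) records. That record layout, the function name and the sample stacks are hypothetical and not taken from any SystemTap output format. Every allocation that is never freed stays in a table keyed by address, and the leftover bytes are summed per call stack.

```python
from collections import Counter


def outstanding_by_stack(events):
    """Sum the bytes still allocated at the end of the trace, grouped by call stack.

    events: iterable of (kind, address, size, stack), where kind is "alloc" or "free".
    """
    live = {}
    for kind, address, size, stack in events:
        if kind == "alloc":
            live[address] = (size, stack)
        elif kind == "free":
            # A free of an address we never saw allocated is ignored.
            live.pop(address, None)

    totals = Counter()
    for size, stack in live.values():
        totals[stack] += size
    return totals


# Hypothetical sample trace: one block is freed, two are still live.
sample = [
    ("alloc", 0x1000, 64, "handle_request>parse"),
    ("alloc", 0x2000, 128, "handle_request>reply"),
    ("free", 0x1000, 0, ""),
    ("alloc", 0x3000, 128, "handle_request>reply"),
]

for stack, nbytes in outstanding_by_stack(sample).most_common():
    print(stack, nbytes)
```

Over a long run the per-stack totals act as the aggregation: only the stacks whose outstanding bytes keep growing between snapshots are worth comparing against the RSS increase.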
<fche>
righto
<fche>
glibc has some sys/sdt.h markers in it that may help trace the major events
<fche>
(again that's probably not going to work as is on rhel6.8)
<lukas`>
Here is where I lack some knowledge and have weak points - malloc de facto does not allocate the memory, it just reserves it. The memory increases the RSS at the time the page (or group of pages) starts to be used. Therefore I am not even sure about my approach.
<fche>
you'd see the effects of posix level malloc/free traffic
<fche>
new arenas being allocated (via sbrk or mmap) over time
<lukas`>
fche: I already have this post open ... try to understand it
<lukas`>
fche: but I guess that since the problem appears just sometimes a longer run will be necessary and therefore some kind of aggregation - that is yet for me to understand.
<fche>
my guess (!) is that if you monitor arena creation long after the program has initialized and reached steady state of operation,
<fche>
then any -new- arena allocations are suspicious
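The heuristic above can be sketched as a filter over recorded heap growth events, assuming each event is a (timestamp_seconds, kind, nbytes) tuple and that the warm-up period is chosen by hand. The record layout, the kind labels and the 60 second cutoff are illustrative assumptions, not glibc or SystemTap output.

```python
def suspicious_growth(events, warmup_seconds):
    """Return the heap growth events that happen after the warm-up period.

    events: iterable of (timestamp_seconds, kind, nbytes), with kind such as
    "sbrk" or "mmap" and nbytes > 0 meaning the heap grew by that amount.
    """
    return [
        (ts, kind, nbytes)
        for ts, kind, nbytes in events
        if ts > warmup_seconds and nbytes > 0
    ]


# Hypothetical sample: two growth events during start-up, one much later.
sample_events = [
    (0.5, "sbrk", 135168),
    (2.0, "mmap", 1048576),
    (3600.0, "mmap", 1048576),
]

for ts, kind, nbytes in suspicious_growth(sample_events, warmup_seconds=60):
    print(f"{ts:.1f}s {kind} +{nbytes} bytes")
```

Anything this prints is only a candidate: a legitimate burst of work can also grow the heap after start-up, so the timestamps still need to be compared with the load pattern.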
<fche>
(btw have you tried running things under valgrind?)
<lukas`>
You asked what you should add to the docs - I think that is the main problem with systemtap. I think this tool could be tremendously useful for many people. The problem is that you cannot learn it without examples. This is not like a language where you can attach a debugger or add trace lines, as not many people know the internals and can judge what they see.
<fche>
we're thinking hard about that problem right now, and could use any & all advice about how we could help such learning
<lukas`>
fche: sure, valgrind was our second choice (as the problem happens only when the system is loaded to a certain level - another tricky part of the problem)
<fche>
yup, been there done that
ravi has quit [Ping timeout: 244 seconds]
<lukas`>
fche: Well, as systemtap requires some effort from the user, I would start with something highly motivating, like showing examples of medium to hard bugs, e.g. leaks or crashes where looking at the code is not enough. Show developers, or even testers or admins, how they can get details about the problem or focus on the right area in the code, and they will be okay with spending some time (e.g. imagine performance
<lukas`>
testers who wish to get rid of bottlenecks that their developers cannot cope with)
<fche>
yeah, that's a good description of the motivation
<fche>
one needs to leap to a technical implementation though :)
<lukas`>
... then, advance to the next level - describe some simple behaviour of kernel so that the SystemTap users could study it - e.g. memory allocator
<lukas`>
fche: well, start with a problem, demonstrate it on software that a lot of people use and that is easily configurable, or share it in a buggy configuration via Docker.
<lukas`>
buggy configuration or buggy implementation ... simply as an example of a bug that can be traced
<lukas`>
show it on Apache, Bind, Samba, some FTP server or so
<lukas`>
... then you may show something cool like tracing KVM (wow, a lot of cloud users would love to know why their machine sometimes behaves as it does)
<lukas`>
fche: thanks for hints, I have to leave now. But I guess I will be back soon. BTW, I would be grateful for some code review and notes about my Redis PMDA and if anyone would be interested, I work on NutCracker PMDA and more detailed Bind server PMDAs now.
<fche>
sure, pls consider sending a (reminder?) email over to the mailing list; others are also keen to review new folks' code
mjw has quit [Quit: Leaving]
drsmith has quit [Ping timeout: 250 seconds]
nkambo has quit [Quit: Good day !]
hchiramm has joined #systemtap
drsmith has joined #systemtap
hchiramm is now known as Humble
drsmith has left #systemtap [#systemtap]
drsmith has joined #systemtap
csanting has joined #systemtap
ph7 has quit [Quit: Leaving.]
gila has quit [Quit: My Mac Pro has gone to sleep. ZZZzzz…]
<irker390>
systemtap: flu systemtap.git:refs/heads/master * release-3.0-117-ge7540f5 / elaborate.cxx parse.cxx staptree.cxx staptree.h translate.cxx: Fix pausing of probes in monitor mode http://tinyurl.com/zjr7jds
mjw has joined #systemtap
ph7 has joined #systemtap
gila has joined #systemtap
<irker390>
systemtap: rth systemtap.git:refs/heads/rth/bpf * release-3.0-118-g3a8a8ae / : Initial commit of stap bpf support http://tinyurl.com/gmapc77
<jistone>
rth, that's quite a low-key commit message for such big functionality
<fche>
we'll make a big fuss in the next release notes
<jistone>
start planning now what you will wear in the parade
<fche>
there better be a chorus line
rth has joined #systemtap
naveen1 has quit [Ping timeout: 260 seconds]
ph7 has quit [Quit: Leaving.]
<lukas`>
what goodie is coming?
<jistone>
lukas`, rth will send an email announcement, but that commit is a big hint
hkshaw has quit [Ping timeout: 276 seconds]
ph7 has joined #systemtap
naveen has joined #systemtap
wcohen has quit [Ping timeout: 246 seconds]
hkshaw has joined #systemtap
wcohen has joined #systemtap
ph7 has quit [Quit: Leaving.]
wcohen_ has joined #systemtap
ph7 has joined #systemtap
rawplayer has quit [Ping timeout: 244 seconds]
rawplayer has joined #systemtap
dtatulea has quit [Ping timeout: 260 seconds]
dtatulea has joined #systemtap
ego has quit [Ping timeout: 272 seconds]
mbenitez has quit [Quit: Leaving]
wcohen_ has quit [Ping timeout: 246 seconds]
wcohen has quit [Ping timeout: 250 seconds]
brolley has left #systemtap [#systemtap]
<irker390>
systemtap: fche systemtap.git:refs/heads/rth/bpf * release-3.0-119-g87d6733 / bpf-translate.cxx: bpf-translate: tolerate script with no kprobes http://tinyurl.com/hg974a4
mjw has quit [Quit: Leaving]
ph7 has quit [Quit: Leaving.]
<irker390>
systemtap: fche systemtap.git:refs/heads/rth/bpf * release-3.0-120-g516ee2b / bpf-translate.cxx: bpf-translate: same as above, for {begin,end}_probes http://tinyurl.com/h8lsalq