00:25
_whitelogger has joined #systemtap
00:31
<
agentzh >
fche: i added more debugging output and it seems like the uprobe indeed registers successfully even in bad runs.
00:32
<
agentzh >
and i compared the inode and offset numbers for the uprobe specifications and they are exactly the same as in good runs.
00:32
<
agentzh >
is there any more direct way to verify that the uprobes are indeed in effect?
00:32
<
agentzh >
like inspecting the mmapped regions of the libc .so in kernel space, maybe?
00:39
khaled_ has quit [Quit: Konversation terminated!]
00:39
orivej has quit [Ping timeout: 245 seconds]
01:12
hpt has joined #systemtap
01:40
<
fche >
dunno, not sure
01:40
<
fche >
if the kernel has registered the inode-uprobes, and is not hitting them .... that's probably a kernel uprobes bug :(
01:41
<
fche >
does this happen with a newer-than-rhel7 (3.10 level) one?
01:50
<
agentzh >
fche: haven't tried newer kernels yet. will do.
01:51
<
agentzh >
will try on fedora and/or ubuntu
01:51
<
agentzh >
yeah, it might be a kernel bug.
01:51
<
fche >
there have been lots in this general area over the years
01:52
* fche
mopes that systemtap has gotten a bad rap in some quarters from kernel bugs that it helped expose :(
01:54
<
agentzh >
heh, indeed.
02:09
zodbot has quit [Disconnected by services]
02:10
<
fche >
would like to check out your debugging output patches if you think they're committable
02:18
<
agentzh >
sure, will prepare that patch.
02:25
zodbot has joined #systemtap
02:29
<
agentzh >
fche: ah, it turns out that the uprobe handlers are indeed fired. but i have a pid whitelist in the stap script myself, and for some reason process.begin sometimes misses the already-running target process.
02:29
<
fche >
stap -t ftw then
02:29
<
agentzh >
so it's really a random issue in the process(EXE).begin probe.
02:29
<
agentzh >
ok, will try
02:30
<
agentzh >
-t is indeed handy. i was not aware of this.
02:30
<
agentzh >
thanks for the tip
02:31
<
fche >
might also try stap --monitor FOO.stp just for kicks
02:33
<
agentzh >
process.begin is indeed not firing.
02:33
<
agentzh >
for good runs, we have "process("/usr/local/openresty/nginx/sbin/nginx").begin, (./libc_usleep.stp:12:1), hits: 1, cycles: 10528min/10528avg/10528max, variance: 0, from: process("/usr/local/openresty/nginx/sbin/nginx").begin, index: 0" in the -t report.
02:34
<
agentzh >
now trying --kicks :)
02:34
<
agentzh >
sorry, --monitor
02:35
<
agentzh >
"WARNING: Monitor mode is not supported by this version of systemtap"
02:35
<
agentzh >
missing deps?
02:35
<
fche >
yeah presumably on your build ... a json library probably
02:36
<
fche >
json-c and ncurses
02:40
sscox has joined #systemtap
03:16
zodbot has quit [Ping timeout: 258 seconds]
03:25
<
agentzh >
i see. thanks
03:26
irker334 has joined #systemtap
03:26
<
irker334 >
systemtap: fche systemtap.git:refs/heads/master * release-4.2-10-g1427836 / session.cxx: session.cxx: Print MONITOR_LIBS in -V (version) feature list.
http://tinyurl.com/yfrquwd6
03:43
zodbot has joined #systemtap
03:47
orivej has joined #systemtap
03:52
agentzh has quit [Ping timeout: 245 seconds]
03:54
agentzh has joined #systemtap
04:02
sapatel_ has joined #systemtap
04:05
sapatel has quit [Ping timeout: 246 seconds]
04:09
<
agentzh >
fche: the monitor mode is really awesome.
04:23
sscox has quit [Ping timeout: 245 seconds]
06:26
irker334 has quit [Quit: transmission timeout]
06:35
eichiro has quit [Ping timeout: 250 seconds]
06:36
eichiro has joined #systemtap
06:42
<
agentzh >
fche: where is process(EXE).begin probe implemented please? i grepped through the source tree and the git log history and got lost. alas.
06:44
<
agentzh >
i found that for the same stap version, centos 7's 3.10 kernels and ubuntu 18.04's 4.15.0 kernel both have this issue (process(EXE).begin occasionally fails to fire). but kernel 5.0.16 shipped with fedora 28 works fine using exactly the same test case.
06:44
<
agentzh >
wondering how to narrow it down to a kernel patch between 4.15 and 5.0.
06:44
<
agentzh >
(quickly)
06:46
<
agentzh >
my bad, 5.0 kernel still has the same issue, just much rarer.
06:47
<
agentzh >
so it still might be an issue in stap itself (like a race condition or something)?
07:20
khaled has joined #systemtap
07:40
<
agentzh >
fche: just filed a PR for this problem with a standalone and minimal test case.
07:40
<
agentzh >
end of day for me &
10:00
hpt has quit [Ping timeout: 258 seconds]
10:06
gromero has joined #systemtap
10:10
agentzh has quit [Ping timeout: 245 seconds]
10:11
agentzh has joined #systemtap
11:29
mjw has joined #systemtap
12:05
yog_ has joined #systemtap
12:45
rmilkowski has joined #systemtap
12:46
<
rmilkowski >
probefunc() returns nfs4_setup_state_renewal.part.18, what does the .part.18 mean?
12:47
<
rmilkowski >
for a probe module("nfs*").function("nfs4_setup_state_renewal")
12:47
<
rmilkowski >
sometimes it returns just the function name, sometimes it adds the .part.NN suffix
12:48
<
lindi- >
rmilkowski: the compiler has decided to split the function
12:49
<
rmilkowski >
right... how can I see how the function got split?
12:52
<
lindi- >
I don't know if the compiler documents this
12:53
<
rmilkowski >
so for a single probe: probe module("nfs*").function("nfs4_setup_state_renewal") with a single printf printing probefunc() if I get:
12:54
<
rmilkowski >
if I get:
12:54
<
rmilkowski >
nfs4_setup_state_renewal cl_last_renewal: 0
12:55
<
rmilkowski >
ahh, the last one can only come from another probe: probe module("nfs*").function("nfs4_proc_get_lease_time").return{
12:56
<
rmilkowski >
ok, so it split the function and calls the other one from the nfs4_proc_get_lease_time() function
12:56
<
rmilkowski >
ok, I think I understand it
12:56
<
rmilkowski >
thanks!
13:20
<
fche >
.part. = partially inlined copy of a function
13:20
<
fche >
stap doesn't completely understand these; we should skip their entry points but don't
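A rough C++ illustration of what fche describes, offered as a sketch rather than a guaranteed reproduction: GCC's partial inlining (`-fpartial-inlining`, enabled at `-O2` and above) may outline the heavier tail of a function into a separate local symbol with a `.part.N` suffix, so that the cheap head can be inlined into callers. The function name below is hypothetical, and whether a `.part` copy is actually emitted depends on the compiler version and its heuristics. That variability may be why a probe on one function name can report both the plain and the suffixed symbol.

```cpp
#include <vector>

// Illustrative only: a function shaped the way partial inlining tends to
// favor, with a cheap early exit followed by a heavier body.
int sum_squares_if_enabled(bool enabled, const std::vector<int>& values) {
  // Cheap "head": a candidate for being inlined into callers.
  if (!enabled)
    return 0;

  // Heavier "tail": a candidate for being outlined into a separate symbol
  // such as sum_squares_if_enabled.part.0 (not guaranteed).
  int total = 0;
  for (int v : values)
    total += v * v;
  return total;
}
```

Compiling with `g++ -O2 -c` and listing the object's symbols with `nm` is one way to check whether a given build produced a `.part` copy.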
13:23
bendlas has quit [Ping timeout: 252 seconds]
13:38
sscox has joined #systemtap
13:44
<
rmilkowski >
another issue, I booted a different kernel version and now pass 5 fails with:
13:44
<
rmilkowski >
ERROR: module release mismatch (5.5.0-rc2p3+ vs 5.5.0-rc2)
13:44
<
rmilkowski >
deleting cache doesn't help
13:45
<
fche >
do you have the appropriate new kernel's -devel and (if needed) -debuginfo installed?
13:45
<
rmilkowski >
kernel compiled by myself with debug
13:45
<
rmilkowski >
stap was working on this version until I rebooted to another version and then rebooted back
13:46
<
fche >
ok so you're running stap -r /path/to/build/tree ?
13:47
<
fche >
and is /path/to/build/tree exactly the old content from where the running kernel was built?
13:49
<
rmilkowski >
right, I did use -r in the past and forgot about it, let me try
13:55
orivej has quit [Ping timeout: 246 seconds]
14:27
<
rmilkowski >
fyu - it works fine now with -r, thank you
14:28
<
rmilkowski >
fyu=fyi
15:05
tromey has joined #systemtap
15:27
orivej has joined #systemtap
15:52
yog_ has quit [Ping timeout: 258 seconds]
15:58
sapatel_ has quit [Quit: Leaving]
16:30
sapatel has joined #systemtap
17:37
amerey has quit [Quit: Leaving]
17:41
bendlas has joined #systemtap
18:26
orivej has quit [Ping timeout: 248 seconds]
19:06
<
fche >
the _otf was for the on-the-fly arming/disarming facility
19:06
<
fche >
don't mind reusing that, but you can also have some new macro to control this
19:07
<
fche >
or -DDEBUG_PROBES
19:07
<
fche >
(and then use it later also for kprobes etc.)
19:07
<
agentzh >
i was thinking of making it controlled by the existing -DDEBUG_UPROBES
19:07
<
fche >
the stp-warn to stp-error change is probably not wise
19:08
<
fche >
DEBUG_UPROBES would be fine
19:08
<
agentzh >
okay, will change it back.
19:08
<
fche >
just there isn't a dbug* macro thing in runtime/debug.h for it yet
19:08
<
agentzh >
i'll add that
19:08
<
fche >
but yeah registration errors can be transient and should not be errors
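A user-space sketch of the pattern under discussion, with hypothetical names. It is not the actual contents of runtime/linux/debug.h, which targets kernel code. The sketch shows a debug macro that compiles to nothing unless the build passes a flag such as `-DDEBUG_UPROBES`. It also shows a registration step that reports a failure as a warning and carries on, since such failures may be transient.

```cpp
#include <cstdio>

// Hypothetical per-subsystem debug macro: active only when the build passes
// -DDEBUG_UPROBES; otherwise it expands to an empty statement and its
// arguments are never evaluated.
#ifdef DEBUG_UPROBES
#define dbug_uprobes_sketch(...)        \
  do {                                  \
    std::fprintf(stderr, "uprobes: ");  \
    std::fprintf(stderr, __VA_ARGS__);  \
  } while (0)
#else
#define dbug_uprobes_sketch(...) \
  do {                           \
  } while (0)
#endif

// Hypothetical registration step: detailed tracing goes through the debug
// macro, while a failure is always reported, but only as a warning, and the
// return code is handed back to the caller rather than aborting anything.
int register_probe_sketch(unsigned long inode, unsigned long offset, int rc) {
  dbug_uprobes_sketch("register inode=%lu offset=%#lx rc=%d\n", inode, offset, rc);
  if (rc != 0)
    std::fprintf(stderr, "WARNING: probe registration failed (rc=%d)\n", rc);
  return rc;
}
```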
19:09
<
agentzh >
should i convert the existing otf macros too?
19:09
<
agentzh >
still not sure about the on-the-fly arming/disarming thing.
19:10
<
agentzh >
i'd also make them visible in -DDEBUG_UPROBES.
19:10
<
agentzh >
ok, i'll convert them.
19:23
<
agentzh >
does it look better now?
19:26
<
fche >
you have commit privs, right?
19:34
irker705 has joined #systemtap
19:34
<
irker705 >
systemtap: yichun systemtap.git:refs/heads/master * release-4.2-11-gf9b978d / runtime/linux/debug.h runtime/linux/uprobes-inode.c: uprobes-inode: Add more debugging logs enabled by -DDEBUG_UPROBES.
http://tinyurl.com/yx2djq7z
19:34
<
agentzh >
fche: just committed to the master branch.
19:38
<
agentzh >
since you're around, i'd quickly show you the patch for parallelizing the stapconf.h generation part.
19:41
<
agentzh >
a small patch.
19:41
<
agentzh >
you said previously that you would be interested in merging this.
19:42
<
agentzh >
how does it look?
19:44
<
fche >
hm, why the autoconf_cs inside the session object?
19:45
<
fche >
output_autoconf() could just take a vector<>& to append to
19:46
<
fche >
would just prefer less state, esp. such short-lived
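A minimal sketch of fche's suggestion, with hypothetical names and a hypothetical file-naming pattern. Rather than keeping a short-lived list as a member of the session object, the function appends to a vector owned by the caller. The list then exists only as long as the caller needs it.

```cpp
#include <string>
#include <vector>

// Hypothetical, trimmed-down session: note there is no member holding the
// temporary list of generated outputs.
struct session_sketch {
  std::vector<std::string> autoconf_tests{"foo", "bar"};
};

// Appends one entry per autoconf test to the caller's vector instead of
// storing the results in the long-lived session object.
void output_autoconf_sketch(const session_sketch& s,
                            std::vector<std::string>& outputs) {
  for (const auto& t : s.autoconf_tests)
    outputs.push_back("stapconf_" + t + ".h");
}
```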
19:47
orivej has joined #systemtap
19:50
<
agentzh >
yeah, that makes sense. i just meant to reduce the amount of changes. i'll change that.
19:51
<
fche >
but sure otherwise go ahead
19:51
<
agentzh >
k, thanks
19:51
<
agentzh >
i'll let you have a final look just to make sure.
20:05
<
agentzh >
already tested on my side (Pass-4 time drops from 12.4s to 4.3s for a simple stap script).
20:06
<
agentzh >
k, thanks
20:16
<
irker705 >
systemtap: yichun systemtap.git:refs/heads/master * release-4.2-12-gefb03a3 / buildrun.cxx: buildrun.cxx: make the stapconf_xxx.h file generation process parallellable.
http://tinyurl.com/ubqngvd
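One way to picture why the stapconf generation can be parallelized, as a hedged sketch rather than the actual buildrun.cxx change. Suppose each autoconf check produces its own fragment. The checks then no longer depend on each other and can run concurrently, and concatenating the fragments in a fixed order keeps the final header deterministic. `check_sketch`, `run_check()` and the check names are hypothetical stand-ins for compiling small test programs against the kernel headers.

```cpp
#include <future>
#include <string>
#include <vector>

// Hypothetical description of one autoconf check and its (simulated) result.
struct check_sketch {
  std::string name;
  bool passes;
};

// Stand-in for one autoconf probe. A real check would compile a tiny test
// file and emit the #define only if compilation succeeded.
std::string run_check(const check_sketch& c) {
  return c.passes ? "#define STAPCONF_" + c.name + " 1\n" : std::string();
}

// Launches every check concurrently; each produces only its own fragment, so
// there is no shared mutable state. Collecting results in submission order
// makes the combined header identical no matter which check finishes first.
std::string build_stapconf_sketch(const std::vector<check_sketch>& checks) {
  std::vector<std::future<std::string>> jobs;
  jobs.reserve(checks.size());
  for (const auto& c : checks)
    jobs.push_back(std::async(std::launch::async, run_check, c));
  std::string header;
  for (auto& job : jobs)
    header += job.get();
  return header;
}
```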
20:16
<
agentzh >
fche: committed.
20:17
<
agentzh >
fche: there's another related patch to make stap-symbols.h a separate CU. are you interested in taking a look at it?
20:17
<
agentzh >
it saves 100ms ~ 200ms, not as dramatic as the previous patch but we're working on other CU-ization.
20:17
<
agentzh >
so every bit counts.
20:17
<
fche >
yeah wouldn't expect to make much difference
20:18
<
fche >
not super interested in that small a savings but wouldn't reject it either if it's clean
20:18
<
agentzh >
yeah, i'll show you it anyway.
20:19
<
agentzh >
the patch is small.
20:21
<
agentzh >
fche: please ignore s.use_user_stapconf stuff for now. i'll remove them in the final version of the patch.
20:22
<
fche >
yeah was wondering
20:22
<
agentzh >
it's for another thing that you showed no interest in accepting.
20:22
<
fche >
yeah I remember talking about that part
20:23
<
agentzh >
the patch was from our own tree so it would need cleanup when rebasing to the mainstream repo.
20:23
<
agentzh >
*need some cleanup
20:25
<
agentzh >
just to make sure i'm on the right path before doing the cleanup.
20:25
<
fche >
curious why s.comm_hdr needs to be sort of so ugly to use (*s.comm_hdr)
20:26
<
agentzh >
i'm open to suggestions :)
20:26
<
agentzh >
i don't like it either.
20:32
<
fche >
maybe generalize translator_output, or have a second such object?
20:37
<
agentzh >
or s.op_h?
20:37
<
agentzh >
i don't mind s.op2 :)
20:38
<
agentzh >
it sounds a bigger change. i'll think about it.
20:38
<
agentzh >
*sounds like
20:39
<
agentzh >
but it would indeed be better when we further split the xxx_src.c CU.
20:40
amerey has joined #systemtap
20:40
<
agentzh >
in that case, op2 might not be a very good name.
20:40
<
agentzh >
since we would have op3, op4, and etc at that point...
20:41
<
agentzh >
i'll try the translator_output generalization first.
20:42
<
fche >
then it'd have to be s.op->newline("file.h") or somesuch other parametrization to tell the destinations apart
20:43
<
fche >
s.op->header->newline() maybe ?
20:43
<
fche >
or s.op["foo.h"]->newline() etc.
20:43
<
fche >
one can c++ it a couple of different not-too-ugly ways
20:43
<
agentzh >
oh i like this.
20:44
<
agentzh >
to make a single class applicable to multiple output streams.
20:44
<
agentzh >
how about s.op->hdr->newline() ?
20:44
<
agentzh >
to make it shorter :)
20:54
<
agentzh >
fche: hmm, can we do s.op->switch(filename) instead?
20:54
<
agentzh >
so that we don't need to touch existing code.
20:54
<
agentzh >
just switching back and forth.
20:55
<
agentzh >
and it's also easier to implement.
20:55
<
agentzh >
and also can be made quite efficient.
20:55
<
fche >
well, that hides some state from the programmer ... I don't mind change size per se
20:57
<
agentzh >
s.op->hdr->newline() would need some intermediate temp objects in the middle to pass the state, which i don't like either.
20:57
<
agentzh >
or we just make it persistent in the s.op object.
20:57
<
fche >
yeah no question some other objects (ostreams) are hidden in there
20:58
<
fche >
but with a switch(filename), the programmer needs to keep track of which file is being written to by any particular s.op()
20:58
<
fche >
now if these switches are very short-lived .. not like a screenful of text away from the writes, that could be ok
20:58
<
agentzh >
yeah, the code would look confusing.
20:58
<
agentzh >
if taken out of the context.
20:59
<
agentzh >
i'll do the op->hdr way then.
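A much-simplified sketch of the `s.op->hdr->newline()` style, using hypothetical class and member names rather than systemtap's real translator_output. The main output owns a second output object for the header. Every write therefore names its destination explicitly instead of depending on a hidden "current file" set by something like `switch(filename)`.

```cpp
#include <memory>
#include <sstream>
#include <string>

// Hypothetical stand-in for translator_output: each instance buffers text
// for one destination file. An optional second instance, reached through
// `hdr`, collects the text destined for a header.
class output_sketch {
public:
  std::unique_ptr<output_sketch> hdr;  // null unless a header is attached

  // Starts a new line, mirroring the newline()-then-write calling style.
  output_sketch& newline() {
    buf_ << '\n';
    return *this;
  }

  output_sketch& operator<<(const std::string& text) {
    buf_ << text;
    return *this;
  }

  std::string str() const { return buf_.str(); }

private:
  std::ostringstream buf_;
};
```

Here the header output is simply a persistent member of the main object, the option agentzh mentions, so no intermediate temporary objects are needed to carry the destination.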
21:25
yog_ has joined #systemtap
21:26
<
agentzh >
fche: ah, the saving is bigger than i expected when doing stap --ldd and print_ubacktrace() in the stp script.
21:26
<
agentzh >
a small script's Pass-4 latency drops from 5.5s to 4.8s on my side consistently.
21:26
<
agentzh >
that's almost a second :)
21:27
<
agentzh >
the larger the symbols.h, the larger the saving.
21:27
<
agentzh >
i've just done the refactoring of this patch. i'll paste it somewhere. a sec...
21:27
<
agentzh >
also rebased to the current mainline master.
21:35
agentzh has quit [*.net *.split]
21:35
agentzh has joined #systemtap
21:35
<
agentzh >
hopefully it's better now.
21:38
<
agentzh >
rebased to the master already.
22:07
sscox has quit [Ping timeout: 258 seconds]
22:39
<
fche >
agentzh, be sure the code works with non-kernel too
22:39
<
fche >
that kallsyms_out ... bit makes me concerned that maybe it presumed --runtime=lkm
22:40
<
agentzh >
good point. i'll test it.
22:40
<
fche >
as a c++ism, I believe it's safe to delete a 0 pointer so the if () checks aren't needed in the translator_output functions
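fche's point is standard C++: applying `delete` to a null pointer is defined to do nothing, so cleanup code can drop its `if (p)` guards. The following sketch uses a hypothetical class that may or may not have allocated an extra stream:

```cpp
#include <sstream>

// Hypothetical owner of an optionally allocated stream.
class owner_sketch {
public:
  owner_sketch() = default;
  owner_sketch(const owner_sketch&) = delete;             // avoid double delete
  owner_sketch& operator=(const owner_sketch&) = delete;

  ~owner_sketch() { delete extra_; }  // no `if (extra_)` needed: null is fine

  void open() {
    if (!extra_)  // guard only to avoid leaking an existing stream
      extra_ = new std::ostringstream;
  }

  // Safe to call repeatedly: the second call deletes a null pointer.
  void close() {
    delete extra_;
    extra_ = nullptr;
  }

  bool is_open() const { return extra_ != nullptr; }

private:
  std::ostringstream* extra_ = nullptr;
};
```

In new code, `std::unique_ptr` would make the ownership explicit and remove the manual `delete` altogether.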
22:42
<
fche >
otherwise looks good
22:42
<
agentzh >
okay, i'll remove the if check.
22:51
<
fche >
but yea also check --runtime=dyninst on some small program
22:52
<
agentzh >
checking now.
22:55
<
agentzh >
tested --dyninst on the usleep C program and stp script and it works fine.
22:56
tromey has quit [Quit: ERC (IRC client for Emacs 26.2)]
22:58
<
fche >
ok, not sure it would use the symbol data but that's okay,
23:00
<
agentzh >
"stap_10835_aux_0.c stap_10835.so* stap_10835_src.c stap_common.h stap_symbols.c"
23:00
<
agentzh >
it seems so.
23:01
<
agentzh >
files in dyninst mode's tmp dir.
23:01
<
agentzh >
bpf is too limited to run this example.
23:02
<
agentzh >
but it compiles fine too.
23:02
<
fche >
well, bpf mode doesn't suck in c code at all
23:02
<
agentzh >
oh right.
23:02
<
fche >
there isn't a makefile etc
23:02
<
agentzh >
it emits bitcode directly.
23:12
mjw has quit [Quit: Leaving]
23:18
rmilkowski has quit [Remote host closed the connection]
23:35
khaled has quit [Quit: Konversation terminated!]
23:52
<
agentzh >
fche: okay, my bad. the stap_symbols.c was never compiled into the .so file in dyninst mode.
23:52
<
agentzh >
fortunately i haven't committed the patch yet.
23:52
<
agentzh >
what's the best way to test symbols in the dyninst mode?
23:53
<
agentzh >
print_ubacktrace(), probefunc(), and symname() are all absent in the dyninst mode.
23:54
<
fche >
yeah, I think we just haven't ported this functionality over there yet, so
23:54
<
fche >
breaking stap-symbols.* this way is not actually a problem
23:54
<
fche >
(if it's even a breakage :)
23:58
amerey has quit [Quit: Leaving]