fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
_whitelogger has joined #systemtap
<agentzh> fche: i added more debugging output and it seems like the uprobe indeed registers successfully even in bad runs.
<agentzh> and i compared the inode and offset numbers for the uprobe specifications and they are exactly the same as good runs.
<agentzh> is there any more direct way to verify that the uprobes are indeed in effect?
<agentzh> like inspecting the mmapped regions of the libc .so library in kernel space, maybe?
khaled_ has quit [Quit: Konversation terminated!]
orivej has quit [Ping timeout: 245 seconds]
hpt has joined #systemtap
<fche> dunno, not sure
<fche> if the kernel has registered the inode-uprobes, and is not hitting them .... that's probably a kernel uprobes bug :(
<fche> does this happen with a kernel newer than rhel7's (3.10 level)?
<agentzh> fche: haven't tried newer kernels yet. will do.
<agentzh> will try on fedora and/or ubuntu
<agentzh> yeah, it might be a kernel bug.
<fche> there have been lots in this general area over the years
* fche mopes that systemtap has gotten a bad rap in some quarters from kernel bugs that it helped expose :(
<agentzh> heh, indeed.
zodbot has quit [Disconnected by services]
<fche> would like to check out your debugging output patches if you think they're committable
<agentzh> sure, will prepare that patch.
zodbot has joined #systemtap
<agentzh> fche: ah, it turns out that the uprobe handlers are indeed fired. but i have a pid whitelist in the stap script myself, and for some reason process.begin sometimes misses the already-running target process.
<fche> haha
<fche> stap -t ftw then
<agentzh> so it's really a random issue in the process(EXE).begin probe.
<agentzh> ok, will try
<agentzh> thanks
<fche> hm could be
<agentzh> -t is indeed handy. i was not aware of this.
<agentzh> thanks for the tip
<fche> yup
<fche> might also try stap --monitor FOO.stp just for kicks
<agentzh> process.begin is indeed not firing.
<agentzh> for good runs, we have "process("/usr/local/openresty/nginx/sbin/nginx").begin, (./libc_usleep.stp:12:1), hits: 1, cycles: 10528min/10528avg/10528max, variance: 0, from: process("/usr/local/openresty/nginx/sbin/nginx").begin, index: 0" in the -t report.
<agentzh> now trying --kicks :)
<agentzh> sorry, --monitor
<agentzh> "WARNING: Monitor mode is not supported by this version of systemtap"
<agentzh> missing deps?
<fche> yeah presumably on your build ... a json library probably
<fche> json-c and ncurses
sscox has joined #systemtap
zodbot has quit [Ping timeout: 258 seconds]
<agentzh> i see. thanks
irker334 has joined #systemtap
<irker334> systemtap: fche systemtap.git:refs/heads/master * release-4.2-10-g1427836 / session.cxx: session.cxx: Print MONITOR_LIBS in -V (version) feature list. http://tinyurl.com/yfrquwd6
zodbot has joined #systemtap
orivej has joined #systemtap
agentzh has quit [Ping timeout: 245 seconds]
agentzh has joined #systemtap
sapatel_ has joined #systemtap
sapatel has quit [Ping timeout: 246 seconds]
<agentzh> fche: the monitor mode is really awesome.
sscox has quit [Ping timeout: 245 seconds]
irker334 has quit [Quit: transmission timeout]
eichiro has quit [Ping timeout: 250 seconds]
eichiro has joined #systemtap
<agentzh> fche: where is the process(EXE).begin probe implemented, please? i grepped through the source tree and the git log history and got lost. alas.
<agentzh> i found that for the same stap version, centos 7's 3.10 kernels and ubuntu 18.04's 4.15.0 kernel both have this issue (the process(EXE).begin probe occasionally fails to fire). but kernel 5.0.16 shipped with fedora 28 works fine using exactly the same test case.
<agentzh> wondering how to narrow it down to a kernel patch between 4.15 and 5.0.
<agentzh> (quickly)
<agentzh> my bad, 5.0 kernel still has the same issue, just much rarer.
<agentzh> so it still might be an issue in stap itself (like a race condition or something)?
khaled has joined #systemtap
<agentzh> fche: just filed a PR for this problem with a standalone and minimal test case.
<agentzh> end of day for me &
hpt has quit [Ping timeout: 258 seconds]
gromero has joined #systemtap
agentzh has quit [Ping timeout: 245 seconds]
agentzh has joined #systemtap
mjw has joined #systemtap
yog_ has joined #systemtap
rmilkowski has joined #systemtap
<rmilkowski> hi
<rmilkowski> probefunc() returns nfs4_setup_state_renewal.part.18, what does the .part.18 mean?
<rmilkowski> for a probe module("nfs*").function("nfs4_setup_state_renewal")
<rmilkowski> sometimes it returns just the function name, sometimes it adds the .part.NN suffix
<lindi-> rmilkowski: the compiler has decided to split the function
<rmilkowski> right.... how can I display how the function got split?
<lindi-> I don't know if the compiler documents this
<rmilkowski> so for a single probe: probe module("nfs*").function("nfs4_setup_state_renewal") with a single printf printing probefunc(), if I get:
<rmilkowski> nfs4_setup_state_renewal cl_last_renewal: 0
<rmilkowski> ahh, the last one can only come from another probe: probe module("nfs*").function("nfs4_proc_get_lease_time").return{
<rmilkowski> ok, so it split the function and calls the other one from the nfs4_proc_get_lease_time() function
<rmilkowski> ok, I think I understand it
<rmilkowski> thanks!
<fche> yeah
<fche> .part. = partially inlined copy of a function
<fche> stap doesn't completely understand these; we should skip their entry points but don't
bendlas has quit [Ping timeout: 252 seconds]
sscox has joined #systemtap
<rmilkowski> another issue, I booted a different kernel version and now pass 5 fails with:
<rmilkowski> ERROR: module release mismatch (5.5.0-rc2p3+ vs 5.5.0-rc2)
<rmilkowski> deleting cache doesn't help
<fche> do you have the appropriate new kernel's -devel and (if needed) -debuginfo installed?
<rmilkowski> kernel compiled by myself with debug
<rmilkowski> stap was working on this version until I rebooted to another version and then rebooted back
<fche> ok so you're running stap -r /path/to/build/tree ?
<fche> and is /path/to/build/tree exactly the old content from where the running kernel was built?
<rmilkowski> right, I did use -r in the past and forgot about it, let me try
orivej has quit [Ping timeout: 246 seconds]
<rmilkowski> fyi - it works fine now with -r, thank you
tromey has joined #systemtap
orivej has joined #systemtap
yog_ has quit [Ping timeout: 258 seconds]
sapatel_ has quit [Quit: Leaving]
sapatel has joined #systemtap
amerey has quit [Quit: Leaving]
bendlas has joined #systemtap
orivej has quit [Ping timeout: 248 seconds]
<agentzh> fche: does this patch look good? https://pastebin.com/NMyCZr3F (for more debug logging in uprobes-inode)
<fche> the _otf was for the on-the-fly arming/disarming facility
<fche> don't mind reusing that, but you can also have some new macro to control this
<agentzh> dbg_ui ?
<fche> or -DDEBUG_PROBES
<fche> (and then use it later also for kprobes etc.)
<agentzh> i was thinking making it controlled by the existing -DDEBUG_UPROBES
<fche> the stp-warn to stp-error change is probably not wise
<fche> DEBUG_UPROBES would be fine
<agentzh> okay, will change it back.
<fche> just there isn't a dbug* macro thing in runtime/debug.h for it yet
<agentzh> yep
<agentzh> i'll add that
<fche> but yeah registration errors can be transient and should not be errors
<agentzh> a sec
<agentzh> should i convert the existing otf macros too?
<agentzh> still not sure about the on-the-fly arming/disarming thing.
<agentzh> i'd also make them visible in -DDEBUG_UPROBES.
<fche> sounds good
<agentzh> ok, i'll convert them.
<agentzh> does it look better now?
<fche> ship it
<agentzh> k
<agentzh> thanks
<fche> you have commit privs, right?
<agentzh> yep
irker705 has joined #systemtap
<irker705> systemtap: yichun systemtap.git:refs/heads/master * release-4.2-11-gf9b978d / runtime/linux/debug.h runtime/linux/uprobes-inode.c: uprobes-inode: Add more debugging logs enabled by -DDEBUG_UPROBES. http://tinyurl.com/yx2djq7z
<agentzh> fche: just committed to the master branch.
<fche> thanks dude
<agentzh> sure
<agentzh> since you're around, i'd quickly show you the patch for parallelizing the stapconf.h generation part.
<agentzh> a sec
<agentzh> a small patch.
<agentzh> you said previously that you would be interested in merging this.
<agentzh> how does it look?
<fche> looking
<agentzh> thanks
<fche> hm, why the autoconf_cs inside the session object?
<fche> output_autoconf() could just take a vector<>& to push at the end of
<fche> would just prefer less state, esp. such short-lived
orivej has joined #systemtap
<agentzh> yeah, that makes sense. i just meant to reduce the amount of changes. i'll change that.
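
The "less state" suggestion above can be sketched roughly as follows. This is only an illustration, not the code in buildrun.cxx: the names `output_autoconf_sketch` and `collect_autoconf_commands`, the header file names, and the idea that each entry is a plain string are all assumptions made for the example. The point is that the list lives in a local of the caller and is passed by reference, rather than being stored as a member of the session object.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: appends one generated command to a vector owned by
// the caller, instead of to a member of a long-lived session object.
void output_autoconf_sketch(const std::string& header_name,
                            std::vector<std::string>& out)
{
  assert(!header_name.empty());
  out.push_back("generate " + header_name);
}

// The caller keeps the list as a short-lived local; nothing outlives this call.
std::vector<std::string> collect_autoconf_commands()
{
  std::vector<std::string> cmds;
  output_autoconf_sketch("stapconf_a.h", cmds);  // file names are made up
  output_autoconf_sketch("stapconf_b.h", cmds);
  return cmds;
}
```

With this shape the session object gains no new field, and the lifetime of the list is visible at the call site.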
<agentzh> a sec
<fche> but sure otherwise go ahead
<agentzh> k, thanks
<agentzh> i'll let you have a final look just to make sure.
<agentzh> already tested on my side (Pass-4 time drops from 12.4s to 4.3s for a simple stap script).
<fche> looks good
<fche> ship it
<agentzh> k, thanks
<irker705> systemtap: yichun systemtap.git:refs/heads/master * release-4.2-12-gefb03a3 / buildrun.cxx: buildrun.cxx: make the stapconf_xxx.h file generation process parallellable. http://tinyurl.com/ubqngvd
<agentzh> fche: committed.
<fche> thanks
<agentzh> fche: there's another related patch to make stap-symbols.h a separate CU. are you interested in taking a look at it?
<agentzh> it saves 100ms ~ 200ms, not as dramatic as the previous patch but we're working on other CU-ization.
<agentzh> so every bit counts.
<fche> yeah wouldn't expect to make much difference
<fche> not super interested in that small a savings but wouldn't reject it either if it's clean
<agentzh> yeah, i'll show it to you anyway.
<agentzh> the patch is small.
<agentzh> a sec
<agentzh> fche: here we go: https://pastebin.com/nHczvUgF
<agentzh> fche: please ignore s.use_user_stapconf stuff for now. i'll remove them in the final version of the patch.
<fche> yeah was wondering
<agentzh> it's for another thing that you show no interest in accepting.
<fche> yeah I remember talking about that part
<agentzh> right
<agentzh> the patch was from our own tree so it would need some cleanup when rebasing to the mainstream repo.
<agentzh> just to make sure i'm on the right path before doing the cleanup.
<fche> aha
<fche> curious why s.comm_hdr needs to be sort of so ugly to use (*s.comm_hdr )
<agentzh> i'm open to suggestions :)
<agentzh> i don't like it either.
<fche> maybe generalize translator_output, or have a second such object?
<fche> s.op2 ?
<agentzh> or s.op_h?
<agentzh> i don't mind s.op2 :)
<agentzh> it sounds like a bigger change. i'll think about it.
<agentzh> but it would indeed be better when we further split the xxx_src.c CU.
amerey has joined #systemtap
<agentzh> in that case, op2 might not be a very good name.
<agentzh> since we would have op3, op4, etc. at that point...
<agentzh> i'll try the translator_output generalization first.
<fche> then it'd have to be s.op->newline("file.h") or somesuch other parametrization to tell the destinations apart
<fche> s.op->header->newline() maybe ?
<fche> or s.op["foo.h"]->newline() etc.
<fche> one can c++ it a couple of different not-too-ugly ways
<agentzh> oh i like this.
<agentzh> to make a single class applicable to multiple output streams.
<agentzh> how about s.op->hdr->newline() ?
<agentzh> to make it shorter :)
<fche> sure
<agentzh> fche: hmm, can we do s.op->switch(filename) instead?
<agentzh> so that we don't need to touch existing code.
<agentzh> just switching back and forth.
<agentzh> and it's also easier to implement.
<agentzh> and also can be made quite efficient.
<fche> well, that hides some state from the programmer ... I don't mind change size per se
<agentzh> got it.
<agentzh> s.op->hdr->newline() would need some intermediate temp objects in the middle to pass the state, which i don't like either.
<agentzh> or we just make it persistent in the s.op object.
<fche> yeah no question some other objects (ostreams) are hidden in there
<agentzh> *nod*
<fche> but with a switch(filename), the programmer needs to keep track of which file is being written to by any particular s.op()
<fche> now if these switches are very short-lived .. not like a screenful of text away from the writes, that could be ok
<agentzh> yeah, the code would look confusing.
<agentzh> if taken out of the context.
<agentzh> i'll do the op->hdr way then.
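
A minimal sketch of the `s.op->hdr->newline()` shape discussed above, under stated assumptions: `output_sketch` is a hypothetical, much-simplified stand-in for stap's translator_output, it writes into in-memory string streams rather than files, and the emitted lines are invented. It only illustrates why the destination is visible at every write site, in contrast to a mode switch that the reader has to track. (As an aside, `switch` is a reserved word in C++, so a switching method would need a different name anyway.)

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical stand-in for translator_output: one object per destination,
// each owning its own stream; the main output also owns a header output.
class output_sketch
{
public:
  explicit output_sketch(bool with_header = true)
  {
    if (with_header)
      hdr.reset(new output_sketch(false));  // the header has no nested header
  }

  // Start a new line in this destination and return its stream for chaining.
  std::ostream& newline() { return buf << "\n"; }

  std::string str() const { return buf.str(); }

  // Secondary destination, e.g. a shared header; null on the header itself.
  std::unique_ptr<output_sketch> hdr;

private:
  std::ostringstream buf;
};

struct demo_result { std::string source; std::string header; };

demo_result demo()
{
  output_sketch op;
  assert(op.hdr != nullptr);
  op.newline() << "#include \"stap_common.h\"";   // goes to the main source
  op.hdr->newline() << "struct context;";         // goes to the header
  op.newline() << "static int probe_0 (void);";   // main source again
  return { op.str(), op.hdr->str() };
}
```

Each write names its target explicitly, so interleaved source and header output stays readable even when the lines are far apart.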
yog_ has joined #systemtap
<agentzh> fche: ah, the saving is bigger than i expected when doing stap --ldd and print_ubacktrace() in the stp script.
<agentzh> a small script's Pass-4 latency drops from 5.5s to 4.8s on my side consistently.
<agentzh> that's almost a second :)
<agentzh> the larger the symbols.h, the larger the saving.
<agentzh> i've just done the refactoring of this patch. i'll paste it somewhere. a sec...
<agentzh> also rebased to the current mainline master.
agentzh has quit [*.net *.split]
agentzh has joined #systemtap
<agentzh> hopefully it's better now.
<agentzh> rebased to the master already.
<agentzh> just in case my messages were not delivered (i was suffering network issues): https://pastebin.com/6TbxJA2r
sscox has quit [Ping timeout: 258 seconds]
<agentzh> fche: also sent the patch to the mailing list: https://sourceware.org/ml/systemtap/2019-q4/msg00078.html
<fche> agentzh, be sure the code works with non-kernel too
<fche> that kallsyms_out ... bit makes me concerned that maybe it presumed --runtime=lkm
<agentzh> good point. i'll test it.
<fche> as a c++ism, I believe it's safe to delete a 0 pointer so the if () checks aren't needed in the translator_output functions
<fche> otherwise looks good
<agentzh> okay, i'll remove the if check.
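
The C++ point above is standard behaviour: applying `delete` to a null pointer does nothing. A small sketch follows; `holder_sketch` and its members are hypothetical names for illustration, not the actual translator_output code.

```cpp
#include <cassert>
#include <sstream>

// Hypothetical holder of an optional secondary stream.
class holder_sketch
{
public:
  holder_sketch() : extra(nullptr) {}
  ~holder_sketch() { delete extra; }  // no "if (extra)" guard needed

  holder_sketch(const holder_sketch&) = delete;
  holder_sketch& operator=(const holder_sketch&) = delete;

  void open_extra() { if (!extra) extra = new std::ostringstream; }
  void close_extra() { delete extra; extra = nullptr; }  // fine if never opened
  bool has_extra() const { return extra != nullptr; }

private:
  std::ostringstream* extra;
};

bool demo_close_twice()
{
  holder_sketch h;
  h.close_extra();        // nothing allocated yet: delete on null is a no-op
  h.open_extra();
  assert(h.has_extra());
  h.close_extra();
  h.close_extra();        // second call deletes a null pointer, also a no-op
  return !h.has_extra();
}
```

Resetting the pointer to null after `delete` is what makes the repeated close safe; the guard before `delete` itself adds nothing.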
<fche> but yea also check --runtime=dyninst on some small program
<agentzh> sure
<agentzh> checking now.
<agentzh> tested --dyninst on the usleep C program and stp script and it works fine.
tromey has quit [Quit: ERC (IRC client for Emacs 26.2)]
<fche> ok, not sure it would use the symbol data but that's okay,
<fche> we can try
<agentzh> "stap_10835_aux_0.c stap_10835.so* stap_10835_src.c stap_common.h stap_symbols.c"
<agentzh> it seems so.
<agentzh> files in dyninst mode's tmp dir.
<fche> ok, ship it
<agentzh> bpf is too limited to run this example.
<fche> yup
<agentzh> but it compiles fine too.
<fche> well, bpf mode doesn't suck in c code at all
<agentzh> oh right.
<fche> there isn't a makefile etc
<agentzh> it emits bitcode directly.
mjw has quit [Quit: Leaving]
rmilkowski has quit [Remote host closed the connection]
khaled has quit [Quit: Konversation terminated!]
<agentzh> fche: okay, my bad. the stap_symbols.c was never compiled into the .so file in dyninst mode.
<agentzh> fortunately i haven't committed the patch yet.
<agentzh> what's the best way to test symbols in the dyninst mode?
<agentzh> print_ubacktrace(), probefunc(), and symname() are all absent in the dyninst mode.
<fche> yeah, I think we just haven't ported this functionality over there yet, so
<fche> breaking stap-symbols.* this way is not actually a problem
<fche> (if it's even a breakage :)
amerey has quit [Quit: Leaving]