fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
<fche>
hm, must find the part of the runtime that does the initial scan of the task list
<fche>
in order to populate that process().begin callback
<fche>
runtime/linux/task_finder2.c - drsmith's baby from way back - is closely related
<agentzh>
cool
<fche>
stap_start_task_finder()
<agentzh>
k
<fche>
we could use some diagnostic magic over in that function
<agentzh>
okay
<agentzh>
dbug_xxx huh?
<fche>
worth a shot
<agentzh>
will do.
<fche>
there is a dbug_task_vma already in use around there
<fche>
just not enough - for the initial traversal
<agentzh>
got it
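[editor's note] The kind of extra tracing discussed above for the initial task-list traversal could look roughly like the following. This is a minimal user-space sketch, not the code in runtime/linux/task_finder2.c: the `demo_dbug` macro, `struct demo_task`, and `demo_initial_scan()` are all hypothetical names made up for illustration, and the real dbug_* macros and task iteration in the runtime differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical compile-time switch; the real runtime uses its own debug flags. */
#ifndef DEMO_DEBUG_TASK_FINDER
#define DEMO_DEBUG_TASK_FINDER 1
#endif

#if DEMO_DEBUG_TASK_FINDER
/* Prefix every message with the calling function and line number. */
#define demo_dbug(out, ...)                                  \
    do {                                                     \
        fprintf((out), "%s:%d: ", __func__, __LINE__);       \
        fprintf((out), __VA_ARGS__);                         \
    } while (0)
#else
#define demo_dbug(out, ...) do { } while (0)
#endif

/* Hypothetical stand-in for a task-list entry. */
struct demo_task {
    int pid;
    const char *comm;
    int matched;   /* nonzero if some process() probe targets this task */
};

/* Walk the (pretend) task list once, tracing every decision, and
   return how many tasks would get a begin callback. */
static int demo_initial_scan(const struct demo_task *tasks, size_t n, FILE *out)
{
    assert(tasks != NULL || n == 0);
    int hits = 0;
    for (size_t i = 0; i < n; i++) {
        demo_dbug(out, "visiting pid=%d comm=%s\n", tasks[i].pid, tasks[i].comm);
        if (!tasks[i].matched) {
            demo_dbug(out, "pid=%d skipped: no matching target\n", tasks[i].pid);
            continue;
        }
        demo_dbug(out, "pid=%d matched: begin callback would be queued\n", tasks[i].pid);
        hits++;
    }
    return hits;
}
```

Logging both the "skipped" and the "matched" branches matters here: a task that is visited but never reaches the callback shows up as a gap in the trace.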
<agentzh>
fche: btw, we do have a patch to add a --include FILE option to stap which loads the specified user tapset files ONLY. will you be interested?
<agentzh>
the idea is to load tapset lib and macro files only on demand in this mode.
<agentzh>
can save up to 700ms in Pass 1 and Pass 2.
<agentzh>
kinda similar to gcc's -include FILE option in some way.
<agentzh>
currently stap always loads all the tapset files which is quite wasteful.
<fche>
it does a fair bit less work now than it used to (4.1 and 4.2 methinks)
<fche>
particularly in not pass-2-processing functions/probes that are not actually referenced
<fche>
another way to accomplish such fine-grained control is to have a whole separate env SYSTEMTAP_TAPSET=/path/ directory, which has only the subset you know you need
<agentzh>
yeah, i noticed that change but it is still quite slow. the SYSTEMTAP_TAPSET approach works but is not very flexible; we still need to copy files around.
<fche>
or symlink
<agentzh>
yeah, symlinks are better.
<fche>
finding the right --include=foo --include=bar flags would require about as much work as actually creating an alternate tapset directory
<agentzh>
fche: we've already done that part ourselves and pre-build a database for it with all the dependency relationships. but yeah, it may be beyond the scope of the stap project itself.
<agentzh>
similar to --use-user-stapconf FILE
<agentzh>
fche: we're sponsoring Linaro's developers to work on some patches for stap.
<agentzh>
yeah, maybe. are there any other spots i should check?
<irker265>
systemtap: sapatel systemtap.git:refs/heads/master * release-4.2-14-gfbf9a32 / bpf-opt.cxx: PR25298: stapbpf unused blocks may cause segfault http://tinyurl.com/s42kokk
<agentzh>
also, where can i find docs and code for those functions like utrace_set_events, utrace_control, and utrace_attach_task? i thought utrace is already replaced by uprobes in modern kernels.
<agentzh>
or they are emulated somehow by ptrace or uprobes?
<agentzh>
they look like magic to me.
<agentzh>
more aligned with gdb's "attach" way it seems.
<fche>
utrace is not replaced by uprobes
<fche>
utrace was a proposed kernel-side api for building uprobes and other things
<fche>
kernel later grew uprobes equivalent apis without utrace
<fche>
so stap implements a baby utrace based on tracepoints etc., for those 'other things' that we needed still
<fche>
yes very much more of a gdb debugging level
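[editor's note] To make the "baby utrace" idea concrete: an engine attached to a task carries an event mask and a report callback, and some kernel hook (tracepoints, per the explanation above) dispatches only the events the mask asks for. The sketch below is a user-space model of that shape under those assumptions. Every name in it (`demo_engine`, `demo_set_events`, `demo_dispatch`) is hypothetical and does not reproduce the signatures of utrace_attach_task, utrace_set_events, or utrace_control.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event bits; the real runtime defines its own event set. */
enum demo_event {
    DEMO_EV_EXEC  = 1 << 0,
    DEMO_EV_EXIT  = 1 << 1,
    DEMO_EV_CLONE = 1 << 2
};

/* One engine per attached task: which events it wants and whom to call. */
struct demo_engine {
    unsigned events;
    void (*report)(int pid, enum demo_event ev, void *data);
    void *data;
};

/* Choose which events this engine should be told about. */
static void demo_set_events(struct demo_engine *e, unsigned mask)
{
    assert(e != NULL);
    e->events = mask;
}

/* Called from the hook when something happens to the task.
   Returns 1 if the callback ran, 0 if the event was filtered out. */
static int demo_dispatch(struct demo_engine *e, int pid, enum demo_event ev)
{
    assert(e != NULL);
    if (!(e->events & (unsigned)ev) || e->report == NULL)
        return 0;
    e->report(pid, ev, e->data);
    return 1;
}

/* Example callback: count how many events were reported. */
static void demo_count(int pid, enum demo_event ev, void *data)
{
    (void)pid;
    (void)ev;
    ++*(int *)data;
}
```

In this model, successful registration only means the engine and mask are in place; whether a callback fires still depends on every check on the dispatch path.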
<agentzh>
fche: ah, that's interesting.
<agentzh>
thanks for the info.
<agentzh>
now it seems that the utrace apis do register the probe handler callbacks successfully, but for some reason that i don't know, the callback is never fired in some runs.
<agentzh>
am i missing anything here?
<agentzh>
any suggestions for further debugging would be much appreciated :)
<fche>
would put tracing into __stp_call_callbacks()
<agentzh>
k, will do
<agentzh>
thanks
<fche>
were you able to test it on newer kernels btw?
<agentzh>
yes sure
<agentzh>
like 5.0
<fche>
aha and same behaviour?
<agentzh>
same behavior
<agentzh>
just less frequently.
<agentzh>
also tried 4.15 kernel from ubuntu. same thing.
<agentzh>
kernel 5.0 is from fedora 28
<agentzh>
this issue has been bothering me for quite a while. alas.
<agentzh>
these new dbug_task calls make the problem much easier to reproduce.
<agentzh>
fche: okay, i further narrowed it down to the stap_utrace_probe_handler() function.
<agentzh>
it skipped calling '(*p->probe->ph) (c);' somehow.
<agentzh>
the stap_utrace_probe_handler() function is indeed entered.
<agentzh>
fche: okay, i nailed it down. it is because the condition in 'if (atomic_read (session_state()) != STAP_SESSION_RUNNING)' is true, thus skipping the probe handler altogether.
<fche>
hm that sounds like a race condition
<fche>
we may set that flag a little too late
<fche>
maybe the condition needs to accept STAP_SESSION_STARTING too
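[editor's note] The suspected race can be modelled in a few lines: if an event arrives while the session is still in its starting state, a gate that only accepts the running state drops it silently, while a gate that also accepts the starting state lets it through. This is a user-space sketch under that assumption, not a patch. The `DEMO_SESSION_*` constants, `demo_session`, and both gate functions are hypothetical stand-ins for the runtime's session_state() and STAP_SESSION_* values, and whether accepting the starting state is actually safe in the runtime is not established here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical session states; values are illustrative only. */
enum { DEMO_SESSION_STARTING = 0, DEMO_SESSION_RUNNING = 1, DEMO_SESSION_STOPPING = 2 };

static atomic_int demo_session = DEMO_SESSION_STARTING;
static int handler_calls = 0;

/* Strict gate, shaped like the check quoted above: only RUNNING passes. */
static bool gate_strict(void)
{
    return atomic_load(&demo_session) == DEMO_SESSION_RUNNING;
}

/* Relaxed gate, shaped like the suggestion: STARTING is accepted too. */
static bool gate_relaxed(void)
{
    int s = atomic_load(&demo_session);
    return s == DEMO_SESSION_RUNNING || s == DEMO_SESSION_STARTING;
}

/* Stand-in for the probe handler: bail out early unless the gate allows it. */
static void probe_handler(bool (*gate)(void))
{
    assert(gate != NULL);
    if (!gate())
        return;            /* event dropped without any trace */
    handler_calls++;       /* the real handler would run the probe body here */
}
```

With the session still in the starting state, `probe_handler(gate_strict)` leaves `handler_calls` unchanged while `probe_handler(gate_relaxed)` increments it, which matches the symptom of a registered callback that sometimes never fires.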