fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
khaled has quit [Remote host closed the connection]
lindi- has joined #systemtap
khaled has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
khaled has quit [Quit: Konversation terminated!]
amerey has quit [Ping timeout: 256 seconds]
_whitelogger has joined #systemtap
sscox has quit [Ping timeout: 240 seconds]
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
kerneltoast has quit [Ping timeout: 260 seconds]
kerneltoast has joined #systemtap
derek0883 has quit [Remote host closed the connection]
orivej has quit [Ping timeout: 256 seconds]
irker878 has quit [Quit: transmission timeout]
lindi- has quit [*.net *.split]
kerneltoast_ has joined #systemtap
kerneltoast has quit [Ping timeout: 240 seconds]
derek0883 has joined #systemtap
lindi- has joined #systemtap
sscox has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 240 seconds]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
khaled has joined #systemtap
orivej has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
mjw has joined #systemtap
derek0883 has quit [Remote host closed the connection]
orivej has quit [Ping timeout: 240 seconds]
orivej has joined #systemtap
mrtimdog has joined #systemtap
orivej has quit [Ping timeout: 265 seconds]
orivej_ has joined #systemtap
mrtimdog has left #systemtap [#systemtap]
orivej_ has quit [Ping timeout: 260 seconds]
tromey has joined #systemtap
amerey has joined #systemtap
derek0883 has joined #systemtap
lindi- has quit [*.net *.split]
DUKENUKEM has quit [Ping timeout: 240 seconds]
lindi- has joined #systemtap
DUKENUKE1 has joined #systemtap
tromey has quit [Quit: ERC (IRC client for Emacs 27.1)]
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
przemoc has joined #systemtap
orivej has joined #systemtap
<irker982>
systemtap: scox systemtap.git:master * release-4.3-84-geffa3dbe1 / tapset/linux/dentry.stp: Use module_container_of to find kernel header.
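The idea behind module_container_of is the kernel's container_of idiom: given a pointer to a struct member, you get back a pointer to the struct that contains it. The plain userspace C below is a minimal sketch of that idiom only. The struct names are hypothetical, and the tapset macro's exact arguments and its kernel-module type lookup are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Classic container_of idiom (as used in the Linux kernel): subtract the
 * member's offset from the member pointer to recover the enclosing struct. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical structs, loosely shaped like a name embedded in a larger object. */
struct qstr_like { const char *name; unsigned len; };
struct dentry_like { int flags; struct qstr_like d_name; };

/* Given only a pointer to the embedded name, find the owning object. */
static struct dentry_like *dentry_from_name(struct qstr_like *q)
{
    return container_of(q, struct dentry_like, d_name);
}
```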
<agentzh>
fche: got some time to continue yesterday's conversation regarding that patch?
<fche>
hi
<fche>
time yes, presence of mind, questionable :)
derek0883 has quit [Remote host closed the connection]
<kerneltoast>
fche, so it appears that there is no way for processes to exit utrace_stop() once they are placed in the traced state
<kerneltoast>
once a process reaches utrace_stop() it's deadbeef
<kerneltoast>
the UTRACE_RESUME patch from yesterday attempted to address that, but it didn't resolve the issue completely
<kerneltoast>
so now the question is: why is utrace_stop() used when it can never be exited? the only way to currently get a process out of there is with a SIGKILL
<fche>
I would be surprised if it worked like that
<kerneltoast>
utrace_stop() is entered quite rarely from my logging
<kerneltoast>
it wouldn't surprise me if this were just something that went unnoticed
derek0883 has joined #systemtap
<fche>
I'm looking at it and it doesn't look like the only exit is a sigkill - how do you know that?
<kerneltoast>
fche, because i put a printk after the schedule() and it's never reached
<kerneltoast>
with the patch from yesterday, the code after the schedule() gets reached sometimes, but not always
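You can make the printk-after-schedule() check more systematic by counting how often the stop path is entered and how often it is exited. The difference between the two is the number of tasks still parked. The code is a hypothetical userspace sketch: CHECKPOINT, struct stop_counters and simulate() are illustrative names, and fprintf stands in for printk.

```c
#include <assert.h>
#include <stdio.h>

/* Counters for the two instrumentation points around schedule(). */
struct stop_counters {
    unsigned long entered;  /* reached the line just before schedule() */
    unsigned long exited;   /* reached the line just after schedule() */
};

/* Bump a counter and log where it was hit (printk stand-in). */
#define CHECKPOINT(field, c) \
    do { (c)->field++; \
         fprintf(stderr, "%s:%d reached %s\n", __FILE__, __LINE__, #field); } while (0)

/* Tasks that went to sleep but never came back out. */
static unsigned long stuck_tasks(const struct stop_counters *c)
{
    return c->entered - c->exited;
}

/* Simulate n tasks entering the stop path, of which only `resumed` wake up. */
static unsigned long simulate(unsigned n, unsigned resumed)
{
    struct stop_counters c = {0, 0};
    for (unsigned i = 0; i < n; i++) {
        CHECKPOINT(entered, &c);
        if (i < resumed)
            CHECKPOINT(exited, &c);
    }
    return stuck_tasks(&c);
}
```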
<fche>
utrace_wakeup e.g. never called for such tasks?
<kerneltoast>
correct
<fche>
utrace_reset() calls it ... (am really just glancing at code, not at the grok level)
<kerneltoast>
fche, yeah. the plumbing is there but it doesn't work
<fche>
any idea why?
<kerneltoast>
nope :)
<kerneltoast>
perhaps ENGINE_STOP gets set and then unset before the reset can occur?
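A task can end up parked with no wakeup coming through a lost-wakeup race. The task decides to sleep, but before it actually sleeps, the waker runs and finds nothing to wake. It is only a hypothesis that this is what happens in utrace here, though it is close to the ENGINE_STOP guess above. The model below is single-threaded and deterministic and replays one interleaving by hand. struct handshake and its fields are hypothetical and are not utrace's data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the fields only loosely mirror utrace concepts
 * (an ENGINE_STOP-style request, a task parked in utrace_stop()). */
struct handshake {
    bool stop_requested;  /* some engine wants the task stopped */
    bool parked;          /* task is asleep and needs an explicit wakeup */
};

/* Waker: withdraw the stop request, then wake the task only if it is
 * already parked (a wakeup aimed at a running task has no effect). */
static void withdraw_and_wake(struct handshake *h)
{
    h->stop_requested = false;
    if (h->parked)
        h->parked = false;
}

/* Buggy ordering: check the flag, let the waker run in the gap, then park
 * on the stale decision. Returns true if the task is left stuck. */
static bool check_then_park(void)
{
    struct handshake h = { .stop_requested = true, .parked = false };
    bool should_stop = h.stop_requested;  /* task: check */
    withdraw_and_wake(&h);                /* waker: nothing parked yet, wakeup lost */
    if (should_stop)
        h.parked = true;                  /* task: sleeps with nobody left to wake it */
    return h.parked;
}

/* Safe ordering: mark parked first (like set_current_state()), then recheck.
 * Here the waker sees the parked task; had it run earlier, the recheck
 * would see the cleared flag instead. */
static bool park_then_recheck(void)
{
    struct handshake h = { .stop_requested = true, .parked = false };
    h.parked = true;                      /* task: announce intent to sleep */
    withdraw_and_wake(&h);                /* waker: sees the parked task */
    if (!h.stop_requested)
        h.parked = false;                 /* task: recheck, do not sleep */
    return h.parked;
}
```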
<fche>
maybe time for more STP_TF_DEBUG printk's
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
tromey has quit [Quit: ERC (IRC client for Emacs 27.1.50)]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
amerey_ has joined #systemtap
amerey has quit [Ping timeout: 272 seconds]
<kerneltoast>
fche, how about an easier question? :)
<fche>
so it's silencing those errors and pretending success to the caller? is this harmless?
<kerneltoast>
fche, the existing code already silences the error
<kerneltoast>
in other parts of that function the error is both silenced and not passed to the caller
<kerneltoast>
but in this part, the error is only silenced
<fche>
so basically we want to silence just those two cases
<kerneltoast>
yeah, otherwise they can cause an stp_error to be hit. but if they were really fatal, why ignore the stp_error to begin with?
<kerneltoast>
so it seems like the intended behavior is to ignore these errors
<fche>
well, they're not fatal so they should be warnings -- or maybe they are fatal with respect to actual probe placement?
<kerneltoast>
i'm not entirely certain, but i haven't noticed any nasty side effects of this patch, so i'm leaning towards nonfatal
<kerneltoast>
for some users of __stp_utrace_attach, the error is definitely nonfatal
<kerneltoast>
such as in __stp_utrace_attach_match_filename
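There are two ways to handle an error here: silence it only, or silence it and also hide it from the caller. The second can be sketched as a small return-code filter. Which errors count as benign is an assumption: this sketch uses -ESRCH, meaning the target task no longer exists. The log does not say which two cases the patch covers, and filter_attach_rc() is a hypothetical name, not part of __stp_utrace_attach.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Assumption: treat "target task no longer exists" as the benign case. */
static int is_benign_attach_error(int rc)
{
    return rc == -ESRCH;
}

/* Hypothetical post-processing of an attach return code. Benign failures
 * are logged as a warning and then reported to the caller as success, so a
 * caller that treats any nonzero rc as fatal is not tripped. Every other
 * failure is passed through unchanged. */
static int filter_attach_rc(int rc)
{
    if (rc != 0 && is_benign_attach_error(rc)) {
        fprintf(stderr, "WARNING: attach skipped, rc=%d\n", rc);
        return 0;
    }
    return rc;
}
```

One cost of returning 0 is that the caller can no longer tell a skipped process from an attached one. That is the same concern as whether the probes still hit the subject processes.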
<fche>
are the probes still hitting the subject processes?
<kerneltoast>
good question. maybe agentzh knows how to check, but i'm new here so i'm not sure how
mjw has quit [Quit: Leaving]
<agentzh>
kerneltoast: you need to check if the stap tool indeed generates any output when this error/warning happens. or you can pass the -t option to stap (not staprun).
<agentzh>
the -t option generates probe hitting stats automatically at runtime.
<fche>
it may successfully attach probes to -some- but not all processes, so may register hits
<fche>
even if the errors are legitimate, and even if they are silenced
<agentzh>
i wonder what the root cause of such errors is.
<agentzh>
race conditions?
<agentzh>
is there any way to retry?
<agentzh>
automatically?
<fche>
I believe these errors could relate to processes already dead, so retrying is futile
<fche>
or at least one of the errors
<fche>
the other may relate to some other condition :(
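If some attach failures were transient, the usual approach would be a bounded retry that stops immediately on a permanent error. This sketch assumes -EAGAIN is transient and -ESRCH (process gone) is permanent, in line with fche's point that retrying a dead process is futile. attach_with_retry() is hypothetical and replays scripted return codes instead of calling anything real.

```c
#include <assert.h>
#include <errno.h>

/* Replays codes[0..len-1] as the outcome of successive attempts (the last
 * code repeats if attempts run past the end). Only -EAGAIN is retried; any
 * other result, success or a permanent error, ends the loop at once.
 * Returns the final rc and stores the number of attempts in *tries. */
static int attach_with_retry(const int *codes, int len, int max_tries, int *tries)
{
    int rc = -EAGAIN;
    *tries = 0;
    while (*tries < max_tries) {
        rc = codes[*tries < len ? *tries : len - 1];
        (*tries)++;
        if (rc != -EAGAIN)
            break;  /* success, or an error that retrying will not fix */
    }
    return rc;
}
```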
amerey_ has quit [Ping timeout: 265 seconds]
<fche>
agentzh, I suppose in general this class of error messages is not actionable for a user, so there's not much point even printing them
<fche>
we could tabulate a failure-to-register count, or just add them to the "nmissed" type counters already being collected, and report on them at the end
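Here is a sketch of the tally idea, under assumptions. struct attach_stats, record_attach() and report_attach_stats() are hypothetical, and the runtime's existing nmissed-style counters are not modeled. Each failure is counted silently, and a single summary line is produced at the end of the session.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-session tallies kept instead of per-error messages. */
struct attach_stats {
    unsigned long attached;
    unsigned long failed;   /* failure-to-register count */
};

/* Record one attach outcome without printing anything. */
static void record_attach(struct attach_stats *st, int rc)
{
    if (rc == 0)
        st->attached++;
    else
        st->failed++;
}

/* Called once at shutdown; formats the summary into buf. */
static int report_attach_stats(const struct attach_stats *st, char *buf, size_t len)
{
    return snprintf(buf, len, "attached to %lu processes, failed to attach to %lu",
                    st->attached, st->failed);
}
```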