fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
khaled has quit [Remote host closed the connection]
lindi- has joined #systemtap
khaled has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
khaled has quit [Quit: Konversation terminated!]
amerey has quit [Ping timeout: 256 seconds]
_whitelogger has joined #systemtap
sscox has quit [Ping timeout: 240 seconds]
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
kerneltoast has quit [Ping timeout: 260 seconds]
kerneltoast has joined #systemtap
derek0883 has quit [Remote host closed the connection]
orivej has quit [Ping timeout: 256 seconds]
irker878 has quit [Quit: transmission timeout]
lindi- has quit [*.net *.split]
kerneltoast_ has joined #systemtap
kerneltoast has quit [Ping timeout: 240 seconds]
derek0883 has joined #systemtap
lindi- has joined #systemtap
sscox has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 240 seconds]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
khaled has joined #systemtap
orivej has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
mjw has joined #systemtap
derek0883 has quit [Remote host closed the connection]
orivej has quit [Ping timeout: 240 seconds]
orivej has joined #systemtap
mrtimdog has joined #systemtap
orivej has quit [Ping timeout: 265 seconds]
orivej_ has joined #systemtap
mrtimdog has left #systemtap [#systemtap]
orivej_ has quit [Ping timeout: 260 seconds]
tromey has joined #systemtap
amerey has joined #systemtap
derek0883 has joined #systemtap
lindi- has quit [*.net *.split]
DUKENUKEM has quit [Ping timeout: 240 seconds]
lindi- has joined #systemtap
DUKENUKE1 has joined #systemtap
tromey has quit [Quit: ERC (IRC client for Emacs 27.1)]
tromey has joined #systemtap
derek0883 has quit [Ping timeout: 260 seconds]
derek0883 has joined #systemtap
irker982 has joined #systemtap
<irker982> systemtap: fche systemtap.git:master * release-4.3-83-g3d9ba2409 / runtime/procfs.c tapset-procfs.cxx: procfs tapset: compute STP_MAX_PROCFS_FILES
kerneltoast_ is now known as kerneltoast
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
przemoc has joined #systemtap
orivej has joined #systemtap
<irker982> systemtap: scox systemtap.git:master * release-4.3-84-geffa3dbe1 / tapset/linux/dentry.stp: Use module_container_of to find kernel header.
<agentzh> fche: got some time to continue yesterday's conversation regarding that patch?
<fche> hi
<fche> time yes, presence of mind, questionable :)
derek0883 has quit [Remote host closed the connection]
<kerneltoast> fche, so it appears that there is no way for processes to exit utrace_stop() once they are placed in the traced state
<kerneltoast> once a process reaches utrace_stop() it's deadbeef
<kerneltoast> the UTRACE_RESUME patch from yesterday attempted to address that, but it didn't resolve the issue completely
<kerneltoast> so now the question is: why is utrace_stop() used when it can never be exited? the only way to currently get a process out of there is with a SIGKILL
<fche> I would be surprised if it worked like that
<kerneltoast> utrace_stop() is entered quite rarely from my logging
<kerneltoast> it wouldn't surprise me if this were just something that went unnoticed
derek0883 has joined #systemtap
<fche> I'm looking at it and it doesn't look like the only exit is a sigkill - how do you know that?
<kerneltoast> fche, because i put a printk after the schedule() and it's never reached
<kerneltoast> with the patch from yesterday, the code after the schedule() gets reached sometimes, but not always
<fche> utrace_wakeup e.g. never called for such tasks?
<kerneltoast> correct
<fche> utrace_reset() calls it ... (am really just glancing at code, not at the grok level)
<kerneltoast> fche, yeah. the plumbing is there but it doesn't work
<fche> any idea why?
<kerneltoast> nope :)
<kerneltoast> perhaps ENGINE_STOP gets set and then unset before the reset can occur?
<fche> maybe time for more STP_TF_DEBUG printk's
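The interleaving kerneltoast suspects (the stop request is set, then cleared before the reset runs, so nothing ever wakes the parked task) can be sketched as a toy, single-threaded state model. This is NOT the real utrace code. The struct, its fields and all toy_* functions are hypothetical and only mimic the roles of ENGINE_STOP, utrace_stop() and utrace_reset()/utrace_wakeup() as described in the log:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a traced task; not a utrace structure. */
struct toy_task {
    bool engine_stop; /* some engine currently wants the task stopped */
    bool sleeping;    /* task is parked in the toy "utrace_stop" */
};

/* Tracee side: park the task if a stop was requested. */
static void toy_stop(struct toy_task *t)
{
    if (t->engine_stop)
        t->sleeping = true;
}

/* Engine side: withdraw the stop request without waking anyone. */
static void toy_clear_stop(struct toy_task *t)
{
    t->engine_stop = false;
}

/* Suspected behaviour: reset only wakes the task when it is the one
 * clearing the stop request, so a request that was already cleared
 * elsewhere leaves the task parked forever. */
static void toy_reset_buggy(struct toy_task *t)
{
    if (t->engine_stop) {
        t->engine_stop = false;
        t->sleeping = false;
    }
}

/* Alternative: decide from the task's state, waking any parked task
 * once no engine wants it stopped any more. */
static void toy_reset_fixed(struct toy_task *t)
{
    if (t->sleeping && !t->engine_stop)
        t->sleeping = false;
}

/* Replay the suspected ordering and report whether the task is stuck. */
static bool toy_is_stranded(void (*reset)(struct toy_task *))
{
    struct toy_task t = { .engine_stop = false, .sleeping = false };
    t.engine_stop = true; /* 1. an engine requests a stop        */
    toy_stop(&t);         /* 2. the tracee parks itself          */
    toy_clear_stop(&t);   /* 3. request withdrawn, no wakeup     */
    reset(&t);            /* 4. the reset runs only afterwards   */
    return t.sleeping;    /* still parked => only SIGKILL helps  */
}
```

With this model, toy_is_stranded(toy_reset_buggy) is true and toy_is_stranded(toy_reset_fixed) is false. Whether the real code follows this ordering is exactly what the extra STP_TF_DEBUG printk's would need to confirm.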
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
tromey has quit [Quit: ERC (IRC client for Emacs 27.1.50)]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
amerey_ has joined #systemtap
amerey has quit [Ping timeout: 272 seconds]
<kerneltoast> fche, how about an easier question? :)
derek0883 has joined #systemtap
<fche> hmmm
<fche> so it's silencing those errors and pretending success to the caller? is this harmless?
<kerneltoast> fche, the existing code already silences the error
<kerneltoast> in other parts of that function the error is both silenced and not passed to the caller
<kerneltoast> but in this part, the error is only silenced
<fche> so basically we want to silence just those two cases
<kerneltoast> yeah, otherwise they can cause an stp_error to be hit. but if they were really fatal, why ignore the stp_error to begin with?
<kerneltoast> so it seems like the intended behavior is to ignore these errors
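The difference between "silenced" and "silenced and not passed to the caller" can be sketched as below. This is a hypothetical model, not the actual __stp_utrace_attach code. The log does not name the two error cases, so -ESRCH (an already-exited task, matching fche's later guess) is only an assumed example, and toy_stp_error_hits merely counts where the real caller would call stp_error:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Counts how often the toy caller would have hit stp_error(). */
static int toy_stp_error_hits;

/* Hypothetical attach step. The "expected" failure is never printed
 * here; swallow_expected decides whether it is still returned. */
static int toy_attach_step(int rc, bool swallow_expected)
{
    if (rc == -ESRCH) /* assumption: task already gone */
        return swallow_expected ? 0 : rc;
    return rc;        /* anything else is still propagated */
}

/* Toy caller: any nonzero return is treated as an error. */
static void toy_attach_caller(int rc)
{
    if (rc != 0)
        toy_stp_error_hits++;
}
```

A silenced -ESRCH that is still returned bumps toy_stp_error_hits, which is the situation kerneltoast describes; swallowing it leaves the counter alone, while an unexpected code such as -ENOMEM is still reported either way.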
<fche> well, they're not fatal so they should be warnings -- or maybe they are fatal with respect to actual probe placement?
<kerneltoast> i'm not entirely certain, but i haven't noticed any nasty side effects of this patch, so i'm leaning towards nonfatal
<kerneltoast> for some users of __stp_utrace_attach, the error is definitely nonfatal
<kerneltoast> such as in __stp_utrace_attach_match_filename
<fche> are the probes still hitting the subject processes?
<kerneltoast> good question. maybe agentzh knows how to check, but i'm new here so i'm not sure how
mjw has quit [Quit: Leaving]
<agentzh> kerneltoast: you need to check if the stap tool indeed generates any output when this error/warning happens. or you can pass -t option to stap (not staprun).
<agentzh> the -t option generates probe hitting stats automatically at runtime.
<fche> it may successfully attach probes to -some- but not all processes, so may register hits
<fche> even if the errors are legitimate, and even if they are silenced
<agentzh> i wonder if that's the root cause for such errors.
<agentzh> race conditions?
<agentzh> is there any way to retry?
<agentzh> automatically?
<fche> I believe these errors could relate to processes already dead, so retrying is futile
<fche> or at least one of the errors
<fche> the other may relate to some other condition :(
amerey_ has quit [Ping timeout: 265 seconds]
<fche> agentzh, I suppose in general this class of error messages is not actionable to a user, so there's not much even printing them
<fche> we could tabulate a failure-to-register count, or just add them to the "nmissed" type counters already being collected, and report on them at the end
<fche> there's not much POINT even printing them
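fche's suggestion to tabulate a failure-to-register count and report it once at the end, instead of printing each error, could look roughly like the sketch below. All names are hypothetical and are not SystemTap's actual nmissed counters; a C11 atomic is used on the assumption that attach attempts can run concurrently:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Static storage: zero-initialized, which is a valid atomic state. */
static atomic_long toy_register_failures;

/* Record an attach result; failures are counted, not printed. */
static void toy_on_attach_result(int rc)
{
    if (rc != 0)
        atomic_fetch_add_explicit(&toy_register_failures, 1,
                                  memory_order_relaxed);
}

/* Called once at shutdown: a single summary line instead of one
 * non-actionable message per failed process. Returns the count. */
static long toy_report_failures(FILE *out)
{
    long n = atomic_load(&toy_register_failures);
    if (n > 0)
        fprintf(out, "WARNING: %ld process(es) could not be attached\n", n);
    return n;
}
```

Relaxed ordering is enough here because the counter is only read after all attach attempts are finished; the summary keeps the information available without flooding the user with messages they cannot act on.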