fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
khaled has quit [Quit: Konversation terminated!]
_whitelogger has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 246 seconds]
mjw has quit [Read error: Connection reset by peer]
mjw has joined #systemtap
mjw has quit [Remote host closed the connection]
hpt has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 240 seconds]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
wcohen has quit [Ping timeout: 244 seconds]
wcohen has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
tonyj has quit [Remote host closed the connection]
derek0883 has joined #systemtap
orivej has quit [Ping timeout: 265 seconds]
orivej has joined #systemtap
khaled has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 244 seconds]
orivej has quit [Ping timeout: 260 seconds]
lijunlong has quit [Ping timeout: 265 seconds]
lijunlong has joined #systemtap
fLiPr3VeRsE has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 240 seconds]
hpt has quit [Ping timeout: 260 seconds]
orivej has joined #systemtap
fLiPr3VeRsE has joined #systemtap
amerey has joined #systemtap
orivej has quit [Ping timeout: 264 seconds]
tonyj has joined #systemtap
derek0883 has joined #systemtap
irker972 has joined #systemtap
<irker972> systemtap: fche systemtap.git:azhang/pr13838 * release-4.3-89-g995846f21 / tapset/floatingpoint.stp: PR13838: drop unneeded formatting #defines from floatingpoint.stp
<irker972> systemtap: fche systemtap.git:azhang/pr13838 * release-4.3-90-gc37ca85dd / testsuite/buildok/floatingpoint.stp: buildok/floatingpoint.stp: make runnable
derek0883 has quit [Ping timeout: 260 seconds]
<irker972> systemtap: fche systemtap.git:azhang/pr13838 * release-4.3-91-g047fb1ab6 / doc/SystemTap_Tapset_Reference/Makefile.am doc/SystemTap_Tapset_Reference/Makefile.in doc/SystemTap_Tapset_Reference/tapsets.tmpl java/Makefile.in python/Makefile.in stap-exporter/Makefile.in tapset/floatingpoint.stp: tapset/floatingpoint.stp: add to tapset reference guide
orivej has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
tromey has joined #systemtap
<kerneltoast> good morning fche. i think i've got to the bottom of some of the issues agentzh and i are facing, and i wanted to ask you a bit about what we've found
<fche> wow
<kerneltoast> it only took a few weeks :D
<kerneltoast> so, i noticed that in the reporting pass everywhere, every engine attached to a process is iterated through, and then the callback for each engine is run via start_callback()
<kerneltoast> start_callback() takes a pointer to a report struct
<kerneltoast> everywhere start_callback() is used in a loop on every engine, it is passed a pointer to a single on-stack report struct
<kerneltoast> a prime example is in utrace_resume()
<kerneltoast> where we have the following code:
<kerneltoast> list_for_each_entry(engine, &utrace->attached, entry)
<kerneltoast> start_callback(utrace, &report, engine, task, 0);
<kerneltoast> report.action = UTRACE_RESUME;
<kerneltoast> a single report struct allocated on the stack is used for every engine's callback, but this causes some weird behavior
<kerneltoast> each callback modifies report->action to indicate what it wants the target to do
<kerneltoast> so what ends up happening is that each subsequent engine callback overwrites report->action with what it wants
<kerneltoast> and the engine at the tail of the list gets the final say on what report->action is
<kerneltoast> the problem is that not all engines agree on what they want the target to do
<kerneltoast> some engines may want UTRACE_INTERRUPT, others may want UTRACE_STOP
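
A minimal sketch of the overwrite described above, not the stp_utrace.c code itself. Every name here (toy_action, toy_report, toy_engine, toy_callback, toy_reporting_pass) is a hypothetical stand-in, and the toy callback is assumed to simply store its engine's request into the one shared report:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the UTRACE_* resume actions. */
enum toy_action { TOY_STOP, TOY_INTERRUPT, TOY_REPORT, TOY_RESUME };

/* One report struct shared by every callback in a pass. */
struct toy_report { enum toy_action action; };

/* A toy engine only remembers which action it wants. */
struct toy_engine { enum toy_action wants; };

/* Assumed callback behaviour: write the request into the shared report
 * unconditionally, without looking at what is already there. */
static void toy_callback(struct toy_report *report, const struct toy_engine *e)
{
	report->action = e->wants;
}

/* One reporting pass: a single on-stack report is handed to each engine
 * in list order, so the last engine's write is the one that survives. */
static enum toy_action toy_reporting_pass(const struct toy_engine *engines, size_t n)
{
	struct toy_report report = { TOY_RESUME };

	for (size_t i = 0; i < n; i++)
		toy_callback(&report, &engines[i]);
	return report.action;
}
```

With engines wanting { TOY_INTERRUPT, TOY_STOP } in that order the pass returns TOY_STOP; with the same two engines reversed it returns TOY_INTERRUPT, so in this toy model the outcome depends only on list position.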
<fche> (keep going: dude, we should attach this whole observation set as a block comment into stp_utrace.c!!! please)
<kerneltoast> when an engine is first attached in stap_start_task_finder(), the action it requests by default is UTRACE_STOP
<kerneltoast> the engine will then request UTRACE_INTERRUPT, but things get hairy when a reporting pass occurs before that
<kerneltoast> when a brand new engine is attached, and a reporting pass occurs before it can request UTRACE_INTERRUPT, it will end up requesting UTRACE_STOP to the target process during the reporting pass
<kerneltoast> if this new engine is not at the tail of the attached list, it will just be ignored
<kerneltoast> but when such an engine is at the tail of the list, its UTRACE_STOP request will go through
<fche> (ISTR the kernel utrace had some sort of algebra system to combine the various engines' UTRACE_* judgements)
<kerneltoast> and the target process will enter utrace_stop(), upon which it will never exit until it receives a SIGKILL
<fche> (maybe the stp_utrace emulation doesn't do that part the same way as the original)
<kerneltoast> so there are three issues here i think
<kerneltoast> 1. the last engine in the attached list decides what action the target process will take, because report->action gets overwritten
<kerneltoast> 2. the default action for an attached engine is UTRACE_STOP. i don't think we want the target process to stop when an engine gets attached to it
<kerneltoast> 3. the UTRACE_STOP state currently has no natural way of being exited. I think UTRACE_STOP as a whole is an incomplete feature, and i'm not sure what needs it
<fche> UTRACE_INTERRUPT instead?
<kerneltoast> either that, or we have the engine do nothing in the reporting passes until it actually requests something
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
<fche> kerneltoast,
<fche> er we just kind of stopped there
<kerneltoast> i'm not sure what to do
<fche> aha
<kerneltoast> there are a lot of different ways to address these
<kerneltoast> and i have no idea which would be preferred :)
<fche> aha
<fche> re. 2, yeah I'm pretty sure a default of UTRACE_STOP is counterproductive
<fche> lemme check ancient rhel6 original-utrace code
<kerneltoast> UTRACE_STOP is also used in __stp_utrace_attach_match_filename()
<kerneltoast> not sure why
<kerneltoast> i also tried going through the git history to find incentives for these things, but all the old commits are just big code drops
<fche> this is from the original utrace (well, 10 years old code :)
<fche> but it was well documented
<kerneltoast> oh wow
<fche> the stp_utrace.c code was a reimplementation of the same thing roughly
<fche> and it's possible that some edge cases weren't thought through (maybe not in the old code either)
<fche> but at least there's more background info & another reference code base
<kerneltoast> yeah this sheds light on the intent behind a lot of things
<fche> nearby one can find various generations of the original utrace implementation too, patches etc.
<kerneltoast> from that original patch: "An attached engine does nothing by default."
<kerneltoast> so we should ignore newly attached engines in the reporting passes, until they request something, at least to follow the original intention
<kerneltoast> that can just be achieved with another bit to the flags field of the engine
<fche> they may not be able to request something unless they are reported to, not sure
<kerneltoast> ah
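
A rough sketch of the "another bit in the flags field" idea, purely illustrative. TOY_ENGINE_HAS_REQUEST, toy_engine, toy_request and toy_reporting_pass are all hypothetical names, not anything in stp_utrace.c, and the sketch does not address fche's caveat that an engine may need to be reported to before it can request anything:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the UTRACE_* resume actions. */
enum toy_action { TOY_STOP, TOY_INTERRUPT, TOY_REPORT, TOY_RESUME };

/* Hypothetical flag bit: set once the engine has explicitly asked for something. */
#define TOY_ENGINE_HAS_REQUEST 0x1UL

struct toy_engine {
	unsigned long flags;
	enum toy_action wants;	/* may hold a default such as TOY_STOP */
};

/* Record an explicit request and mark the engine as having one. */
static void toy_request(struct toy_engine *e, enum toy_action action)
{
	e->wants = action;
	e->flags |= TOY_ENGINE_HAS_REQUEST;
}

/* Reporting pass that skips engines which have not requested anything yet,
 * so a freshly attached engine's default cannot stop the target. */
static enum toy_action toy_reporting_pass(const struct toy_engine *engines, size_t n)
{
	enum toy_action result = TOY_RESUME;

	for (size_t i = 0; i < n; i++) {
		if (!(engines[i].flags & TOY_ENGINE_HAS_REQUEST))
			continue;
		if (engines[i].wants < result)
			result = engines[i].wants;
	}
	return result;
}
```

In this model a freshly attached engine whose default is TOY_STOP contributes nothing until toy_request() is called on it.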
<fche> anyway
<fche> it may be worth reading through that for more background info before deciding on a suggestion
<kerneltoast> yeah. any idea where to find the old utrace flag algebra you mentioned?
<fche> been trying to find that with a quick glance, lemme try harder :)
<fche> it might be as simple as ... "the lowest number enum-value wins"
<kerneltoast> if (action < report->action)
<kerneltoast> report->action = action;
<kerneltoast> that's in finish_callback_report()
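
The "lowest enum value wins" rule from the two lines above can be sketched in isolation. The enum below is a hypothetical stand-in: only the fact that the stop action sorts lowest is taken from this conversation, and the relative order of the other values is an assumption. toy_merge and toy_combine are made-up names:

```c
#include <stddef.h>

/* Hypothetical stand-ins; lower value means it takes precedence. */
enum toy_action { TOY_STOP, TOY_INTERRUPT, TOY_REPORT, TOY_RESUME };

/* Same shape as the quoted check: keep the lower of the two values. */
static void toy_merge(enum toy_action *acc, enum toy_action request)
{
	if (request < *acc)
		*acc = request;
}

/* Fold every engine's request into one result, starting from "resume". */
static enum toy_action toy_combine(const enum toy_action *requests, size_t n)
{
	enum toy_action acc = TOY_RESUME;

	for (size_t i = 0; i < n; i++)
		toy_merge(&acc, requests[i]);
	return acc;
}
```

Because the merge is a minimum, the result no longer depends on the order of the engines: { TOY_INTERRUPT, TOY_STOP } and { TOY_STOP, TOY_INTERRUPT } both combine to TOY_STOP.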
<fche> hm and in original utrace.patch:
<fche> +static void finish_report(struct task_struct *task, struct utrace *utrace,
<fche> + struct utrace_report *report, bool will_not_stop)
<fche> +{
<fche> + enum utrace_resume_action resume = report->action;
<fche> +
<fche> + if (resume == UTRACE_STOP)
<fche> + resume = will_not_stop ? UTRACE_REPORT : UTRACE_RESUME;
<fche> +
<fche> + if (resume < utrace->resume) {
<fche> + spin_lock(&utrace->lock);
<fche> + utrace->resume = resume;
<kerneltoast> ok so that behavior is implemented correctly. UTRACE_STOP wins as expected
<fche> yeah but it shouldn't persist
<fche> maybe some engine is not getting called again as it should be to release the stop?
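
A simplified model of the visible part of the finish_report() excerpt above, to show why a stop request is not meant to persist in the stored resume state. The types and names (toy_action, toy_utrace, toy_finish_report) are hypothetical, the enum ordering beyond "stop is lowest" is an assumption, and the locking plus everything after the truncated excerpt is left out:

```c
#include <stdbool.h>

/* Hypothetical stand-ins; lower value means it takes precedence. */
enum toy_action { TOY_STOP, TOY_INTERRUPT, TOY_REPORT, TOY_RESUME };

/* Only the field the excerpt touches. */
struct toy_utrace { enum toy_action resume; };

static void toy_finish_report(struct toy_utrace *utrace,
			      enum toy_action action, bool will_not_stop)
{
	enum toy_action resume = action;

	/* A stop is never stored as-is: it becomes either a request for
	 * another report or a plain resume. */
	if (resume == TOY_STOP)
		resume = will_not_stop ? TOY_REPORT : TOY_RESUME;

	/* Keep the lower value (the excerpt takes utrace->lock around this store). */
	if (resume < utrace->resume)
		utrace->resume = resume;
}
```

Starting from a stored state of TOY_RESUME, a TOY_STOP action leaves TOY_REPORT behind when will_not_stop is true and leaves TOY_RESUME unchanged otherwise.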
gregwork has quit [Quit: Connection closed for inactivity]
<kerneltoast> it persists partially because UTRACE_RESUME gets filtered out
<kerneltoast> in __stp_utrace_attach():
<kerneltoast> if (action != UTRACE_RESUME) {
<kerneltoast> rc = utrace_control(tsk, engine, action);
<kerneltoast> i've shown you this before iirc
<kerneltoast> removing that UTRACE_RESUME filter got the stopped processes to resume again, but not always
<kerneltoast> so from what i can tell, UTRACE_STOP is missing a lot of stuff
<kerneltoast> and it's only used as the default request when an engine is attached
<kerneltoast> perhaps it should be removed?
<kerneltoast> the goal of it seems to be to stop the target process in order to safely inspect it. but with the task work api, that's not an issue
<kerneltoast> since we are always working from within the context of the target process
khaled has quit [Remote host closed the connection]
tromey has quit [Quit: ERC (IRC client for Emacs 27.1.50)]
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
wcohen has quit [Remote host closed the connection]
derek0883 has quit [Remote host closed the connection]
khaled has joined #systemtap
derek0883 has joined #systemtap
wcohen has joined #systemtap
orivej has quit [Ping timeout: 264 seconds]
amerey has quit [Quit: Leaving]