fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
<kerneltoast>
no i am sure
<kerneltoast>
because of my spaghetti explanation above
<kerneltoast>
i think i can tldr my explanation
<fche>
well can look at a fuller patch / explanation later on, i assume you can run it through ksan etc.
<kerneltoast>
utrace structs are used outside of an rcu_read_lock()
<kerneltoast>
so they can already be freed while they're "still in use"
<kerneltoast>
but it turns out they're not still in use
<kerneltoast>
that means the existing code doesn't touch utrace structs once we reach kfree_rcu
<kerneltoast>
because if it did we'd see crashes by now
<kerneltoast>
and the code's been like this for a decade
<kerneltoast>
how's that?
<fche>
'because it'd have crashed by now ......' um not strong
<kerneltoast>
"it hasn't crashed for 10 years even with our recent round of fuzzing thousands of kernels"
<fche>
can you make an argument based on code inspection?
<kerneltoast>
this *is* based on code inspection, leaning on old assumptions in stap
<kerneltoast>
if the old assumption is wrong then basically all of stp_utrace needs to be rewritten
<kerneltoast>
let's clarify the potential issue with reusing those utrace struct members:
<kerneltoast>
if it is possible for code to pull a dead utrace struct and use those struct members under an RCU read lock, then we can't trash them
<kerneltoast>
but take a look at task_utrace_struct()
<kerneltoast>
task_utrace_struct() makes it possible to keep using a utrace struct *outside* of an RCU read lock
<kerneltoast>
that means that the utrace struct can be freed by RCU while it is still in use, since it's accessed outside an RCU read lock
<kerneltoast>
but in reality that does not happen
<kerneltoast>
and it doesn't happen because the utrace code already ensures that utrace structs cannot still be in use when they get freed
<kerneltoast>
if that assurance were not there, stap would've been exploding for the past 10 years, even before i RCUified the utrace structs
<kerneltoast>
i don't fully understand how this is done, but it must be done
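The hazard kerneltoast describes can be illustrated with a toy, single-threaded userspace model. This is only a sketch under stated assumptions, not the stp_utrace code: every `toy_` name is hypothetical, the "grace period" is modelled as "no reader is inside a read-side section", and the idea that the lookup takes the read lock internally is an assumption made for illustration. It shows that a deferred free waits for readers inside a read-side section, but offers no protection to a pointer that is used after the section has ended, so some other guarantee has to keep such a pointer from being used after the free.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of RCU-deferred freeing; NOT kernel code, all names hypothetical. */
struct toy_utrace {
    bool attached;  /* still reachable through the lookup */
    bool freed;     /* set when the deferred free actually runs */
};

static int toy_readers;                /* depth of read-side critical sections */
static struct toy_utrace *toy_pending; /* object waiting for readers to leave */

static void toy_rcu_read_lock(void) { toy_readers++; }

static void toy_rcu_read_unlock(void)
{
    /* The last reader leaving ends the modelled grace period. */
    if (--toy_readers == 0 && toy_pending != NULL) {
        toy_pending->freed = true;
        toy_pending = NULL;
    }
}

/* Stand-in for kfree_rcu(): unpublish now, free once no reader is active. */
static void toy_kfree_rcu(struct toy_utrace *u)
{
    u->attached = false;
    if (toy_readers == 0)
        u->freed = true;
    else
        toy_pending = u;
}

/* Stand-in for a lookup whose result is used outside the read lock
 * (assumption: the lock is held only for the lookup itself). */
static struct toy_utrace *toy_lookup(struct toy_utrace *slot)
{
    struct toy_utrace *u;

    toy_rcu_read_lock();
    u = slot->attached ? slot : NULL;
    toy_rcu_read_unlock();
    return u;
}

/* Case 1: the reader is inside a read-side section, so the free is deferred. */
static bool toy_protected_inside_read_section(void)
{
    struct toy_utrace u = { .attached = true, .freed = false };
    bool alive_during;

    toy_rcu_read_lock();
    toy_kfree_rcu(&u);
    alive_during = !u.freed;      /* still valid while the section is open */
    toy_rcu_read_unlock();
    return alive_during && u.freed;
}

/* Case 2: the pointer escapes the read-side section, so nothing defers the free. */
static bool toy_unprotected_outside_read_section(void)
{
    struct toy_utrace u = { .attached = true, .freed = false };
    struct toy_utrace *p = toy_lookup(&u);

    toy_kfree_rcu(&u);            /* no reader active: freed immediately */
    return p != NULL && p->freed; /* p now refers to a freed object */
}
```

Both helper functions return true in this model. The second case is the situation under discussion: RCU by itself does not make the escaped pointer safe, so the argument rests on the utrace code guaranteeing that no such pointer is still in use by the time the free is requested.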
<kerneltoast>
fche, convinced yet?
<kerneltoast>
i can convince moar
<fche>
sorry my main focus is somewhere else
<kerneltoast>
logger bug?
<fche>
can you post the explanation in the patch comment form?
<fche>
nah, something else - that one (the subbuf timer thing) comes next, later on
<kerneltoast>
k I'll go take a walk and post a nice and coherent explanation there
<fche>
ok
<irker979>
systemtap: sultan systemtap.git:master * release-4.4-67-gd7ea535c6 / runtime/stp_utrace.c: stp_utrace: remove unneeded RCU-freed field from struct utrace
<kerneltoast>
now that's done with, i can go back to the stp_print issue
<kerneltoast>
i'm going to try a solution that modifies staprun
<fche>
I'd really like to see a per-cpu timer approach to compare/first
<kerneltoast>
hmm, ok if i can whip it up quickly enough
<agentzh>
fche: is it expected that -x PID also probes all the PID process's children? it seems like the current stap master has this behavior, which is a surprise.
<agentzh>
we're implementing the -x -PID for pgid filtering but cosmin found this "pgid" behavior is already there.
<fche>
yes, children expected
<fche>
[man stapprobes] documents it
<agentzh>
hmm, this is unfortunate.
<agentzh>
so we'll have to add an if statement on the stap level to skip children?
<agentzh>
that sounds quite inefficient.
<agentzh>
given the current stap probe handler boilerplate overhead.
<fche>
what kind of children are we talking about, and to skip what?
<agentzh>
take apache httpd or nginx as an example, sometimes we just want to trace the master process and skip all its child (worker) processes.
<agentzh>
and currently stap will spend time to process the vma map and everything for all the children even if we only want to trace a single process.
<agentzh>
there can be many children.
<fche>
agentzh, a mode that doesn't follow across a fork would be fine as an option
<agentzh>
fche: cool
<agentzh>
thanks
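The difference between the documented behaviour (children are followed) and the proposed no-follow mode can be sketched as a single attach-time decision. This is a hypothetical illustration, not systemtap's implementation: `toy_task`, `toy_is_target`, and the `follow_children` flag are invented names, and the ancestry walk is only one plausible way to model "follows across a fork".

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified task record: a pid plus a link to the parent. */
struct toy_task {
    pid_t pid;
    const struct toy_task *parent;
};

/*
 * Decide whether a task should be traced for a given -x target.
 * follow_children == true  models the behaviour discussed above
 *                          (descendants of the target are traced too);
 * follow_children == false models the proposed mode that does not
 *                          follow across a fork.
 */
static bool toy_is_target(const struct toy_task *t, pid_t target,
                          bool follow_children)
{
    for (; t != NULL; t = t->parent) {
        if (t->pid == target)
            return true;
        if (!follow_children)
            break;  /* only the task itself may match */
    }
    return false;
}
```

With a master at pid 100 and a worker at pid 101 whose parent is the master, `toy_is_target(&worker, 100, true)` is true while `toy_is_target(&worker, 100, false)` is false. Making this decision once, when a task is first considered, is what would avoid the per-child vma processing and the per-hit filtering cost agentzh is concerned about.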