fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
<kerneltoast> no i am sure
<kerneltoast> because of my spaghetti explanation above
<kerneltoast> i think i can tldr my explanation
<fche> well, can look at a fuller patch / explanation later on; i assume you can run it through KASAN etc.
<kerneltoast> utrace structs are used outside of an rcu_read_lock()
<kerneltoast> so they can already be freed while they're "still in use"
<kerneltoast> but it turns out they're not still in use
<kerneltoast> that means the existing code doesn't touch utrace structs once we reach kfree_rcu
<kerneltoast> because if it did we'd see crashes by now
<kerneltoast> and the code's been like this for a decade
<kerneltoast> how's that?
<fche> 'because it'd have crashed by now...' um, not a strong argument
<kerneltoast> "it hasn't crashed for 10 years even with our recent round of fuzzing thousands of kernels"
<fche> can you make an argument based on code inspection?
<kerneltoast> this *is* based on code inspection, leaning on old assumptions in stap
<kerneltoast> if the old assumption is wrong then basically all of stp_utrace needs to be rewritten
<kerneltoast> let's clarify the potential issue with reusing those utrace struct members:
<kerneltoast> if it is possible for code to pull a dead utrace struct and use those struct members under an RCU read lock, then we can't trash them
<kerneltoast> but take a look at task_utrace_struct()
<kerneltoast> task_utrace_struct() makes it possible to keep using a utrace struct *outside* of an RCU read lock
<kerneltoast> that means that the utrace struct can be freed by RCU while it is still in use, since it's accessed outside an RCU read lock
<kerneltoast> but in reality that does not happen
<kerneltoast> and it doesn't happen because the utrace code already ensures that utrace structs cannot still be in use when they get freed
<kerneltoast> if that assurance were not there, stap would've been exploding for the past 10 years, even before i RCUified the utrace structs
<kerneltoast> i don't fully understand how this is done, but it must be done
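The argument above rests on one invariant: nothing still uses a utrace struct once it is handed to kfree_rcu(), even though task_utrace_struct() lets callers keep the pointer after rcu_read_unlock(). The log does not say how stp_utrace.c enforces this. One common way to provide such a guarantee is a reference count taken inside the read-side section, and the userspace C sketch below models only that generic pattern. `struct obj`, `obj_tryget()` and `obj_put()` are hypothetical names, not stap functions, and free() stands in for kfree_rcu().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct utrace; NOT the real stp_utrace.c layout. */
struct obj {
    atomic_int refs; /* owner's reference plus one per out-of-lock user */
};

static int frees; /* how many objects have actually been released */

static struct obj *obj_new(void)
{
    struct obj *o = malloc(sizeof(*o));
    if (o)
        atomic_init(&o->refs, 1); /* the owner's (publisher's) reference */
    return o;
}

/* A reader that found the object under the read lock calls this *before*
 * unlocking if it wants to keep using the pointer afterwards. It fails if
 * the count already hit zero, i.e. the object is on its way to being freed. */
static bool obj_tryget(struct obj *o)
{
    int r = atomic_load(&o->refs);
    while (r > 0) {
        /* on failure, r is reloaded with the current count and we retry */
        if (atomic_compare_exchange_weak(&o->refs, &r, r + 1))
            return true;
    }
    return false;
}

/* Drop a reference; whoever drops the last one releases the object.
 * In the kernel this is where kfree_rcu() would go, so readers still
 * inside a read-side section can finish looking at it safely. */
static void obj_put(struct obj *o)
{
    int old = atomic_fetch_sub(&o->refs, 1);
    assert(old > 0); /* catches an unbalanced put */
    if (old == 1) {
        free(o); /* stand-in for kfree_rcu() */
        frees++;
    }
}
```

Without something playing the role of `obj_tryget()`, a pointer kept past the unlock is protected only by whatever other lifetime guarantee the surrounding code provides, which is the assumption being relied on above.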
<kerneltoast> fche, convinced yet?
<kerneltoast> i can convince moar
<fche> sorry my main focus is somewhere else
<kerneltoast> logger bug?
<fche> can you post the explanation in the patch comment form?
<fche> nah, something else; that one (the subbuf timer thing) is next, later on
<kerneltoast> k I'll go take a walk and post a nice and coherent explanation there
<fche> ok
<kerneltoast> fche, yo
<kerneltoast> fche, this small cleanup is also possible thanks to the realization i made: https://gist.github.com/kerneltoast/9808dd94155cd1b3a38fd0ca6b4bd59a
<fche> how much testing did those survive?
<kerneltoast> yes
<kerneltoast> i just ran it once through our testsuite
<kerneltoast> i remember when writing that code that i thought utrace->free was flimsy
<kerneltoast> and it turns out my hunch was correct
<fche> ok, can give it a shot
<kerneltoast> give it a shot == push?
<fche> yeah
<kerneltoast> noice
irker979 has joined #systemtap
<irker979> systemtap: mcermak systemtap.git:master * release-4.4-65-g734f5acf6 / initscript/systemtap.in: systemtap-service onboot: Skip updating the bootloader
<irker979> systemtap: sultan systemtap.git:master * release-4.4-66-g5c6c7a994 / runtime/stp_utrace.c: stp_utrace: remove kmem cache usage
<irker979> systemtap: sultan systemtap.git:master * release-4.4-67-gd7ea535c6 / runtime/stp_utrace.c: stp_utrace: remove unneeded RCU-freed field from struct utrace
<kerneltoast> now that's done with, i can go back to the stp_print issue
<kerneltoast> i'm going to try a solution that modifies staprun
<fche> I'd really like to see a per-cpu timer approach to compare/first
<kerneltoast> hmm, ok if i can whip it up quickly enough
<agentzh> fche: is it expected that -x PID also probes all the PID process's children? it seems like the current stap master has this behavior, which is a surprise.
<agentzh> we're implementing -x -PID for pgid filtering, but cosmin found this "pgid"-like behavior is already there.
<fche> yes, children expected
<fche> [man stapprobes] documents it
<agentzh> hmm, this is unfortunate.
<agentzh> so we'll have to add an if statement on the stap level to skip children?
<agentzh> that sounds quite inefficient.
<agentzh> given the current stap probe handler boilerplate overhead.
<fche> what kind of children are we talking about, and to skip what?
<agentzh> take apache httpd or nginx as an example, sometimes we just want to trace the master process and skip all its child (worker) processes.
<agentzh> and currently stap spends time processing the vma map and everything else for all the children, even if we only want to trace a single process.
<agentzh> there can be many children.
<fche> agentzh, a mode that doesn't follow across a fork would be fine as an option
<agentzh> fche: cool
<agentzh> thanks