fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
hpt has joined #systemtap
derek0883 has quit [Ping timeout: 240 seconds]
hpt has quit [Ping timeout: 240 seconds]
derek0883 has joined #systemtap
derek088_ has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek088_ has quit [Read error: Connection reset by peer]
derek0883 has joined #systemtap
agentzh has quit [Ping timeout: 265 seconds]
derek0883 has quit [Ping timeout: 240 seconds]
derek0883 has joined #systemtap
_whitelogger has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
khaled has quit [Quit: Konversation terminated!]
agentzh has joined #systemtap
agentzh has quit [Changing host]
agentzh has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 264 seconds]
_whitelogger has joined #systemtap
orivej has joined #systemtap
zhuizhuhaomeng has joined #systemtap
orivej has quit [Ping timeout: 265 seconds]
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 260 seconds]
orivej has joined #systemtap
orivej has quit [Ping timeout: 240 seconds]
khaled has joined #systemtap
orivej has joined #systemtap
zhuizhuhaomeng has quit [Ping timeout: 240 seconds]
orivej has quit [Ping timeout: 265 seconds]
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 260 seconds]
orivej has joined #systemtap
derek0883 has joined #systemtap
<agentzh> fche: do you know what remaining work is needed to make STP_UTRACE_USE_TASK_WORK_QUEUE turn on by default?
<agentzh> the vma engine callbacks get deferred and reordered when they are triggered in atomic contexts, so process.begin probes fire before the vma map callbacks.
<agentzh> it can be reproduced easily on a kernel-debug kernel where everything is slowed down.
<agentzh> STP_UTRACE_USE_TASK_WORK_QUEUE seems to fix it.
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
<fche> probably a runtime/autoconf issue
<agentzh> yeah i also saw "Need to decide correct CONFIG_* to check for" in the code comments. just wondering about the details.
<agentzh> maybe just for ancient kernels?
amerey_ has quit [Remote host closed the connection]
amerey_ has joined #systemtap
<fche> yeah, work queues are probably a decade old addition to the kernel
<fche> don't have fresh memory about exactly when it came in tho
<agentzh> shall we just enable it by default in master and see what would break for the community?
<fche> I'd prefer to find why it's not working in your case
<fche> not make a broad change that we don't understand
<agentzh> for my case, it is that the vma callback is called in atomic contexts and it is deferred by the same mechanism anyway.
<agentzh> the work queue mechanism.
<agentzh> and then process begin runs before the vma maps callback.
<agentzh> leaving the process.begin probe to run without the vma tracker initialized.
<agentzh> see func __stp_utrace_task_finder_target_quiesce
<agentzh> it calls stp_task_work_add() when the context is atomic.
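The reordering agentzh describes can be illustrated with a toy model. This is only a sketch under stated assumptions, not SystemTap runtime code: `dispatch`, `drain`, `first_callback`, and both structs are hypothetical names, and the `defer_all` flag stands in for one plausible reading of what STP_UTRACE_USE_TASK_WORK_QUEUE achieves (routing every callback through the same FIFO so that arrival order is preserved).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model, not SystemTap code: a tiny FIFO of deferred callbacks. */
#define MAX_EVENTS 8

struct event_log {
    const char *names[MAX_EVENTS];  /* callbacks in the order they ran */
    int count;
};

struct work_queue {
    const char *pending[MAX_EVENTS];  /* callbacks waiting for task context */
    int count;
};

static void log_event(struct event_log *log, const char *name)
{
    if (log->count < MAX_EVENTS)
        log->names[log->count++] = name;
}

/* Run a callback immediately, or queue it when it has to be deferred. */
static void dispatch(struct event_log *log, struct work_queue *wq,
                     const char *name, int defer)
{
    if (defer) {
        if (wq->count < MAX_EVENTS)
            wq->pending[wq->count++] = name;
    } else {
        log_event(log, name);
    }
}

/* Drain deferred callbacks in FIFO order (models the return to task context). */
static void drain(struct event_log *log, struct work_queue *wq)
{
    for (int i = 0; i < wq->count; i++)
        log_event(log, wq->pending[i]);
    wq->count = 0;
}

/*
 * Simulate one process start. The vma callback arrives in atomic context
 * and is always deferred. If defer_all is nonzero, the process.begin
 * callback goes through the same queue as well.
 * Returns the name of the first callback that actually ran.
 */
const char *first_callback(int defer_all)
{
    struct event_log log = { {0}, 0 };
    struct work_queue wq = { {0}, 0 };

    dispatch(&log, &wq, "vma_map", 1);
    dispatch(&log, &wq, "process.begin", defer_all);
    drain(&log, &wq);
    return log.names[0];
}
```

In this model, `first_callback(0)` reports that process.begin ran first, which is the symptom described above, while `first_callback(1)` reports vma_map first because both callbacks share one queue.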
orivej has quit [Ping timeout: 256 seconds]
<fche> hm
<fche> ok I finally glanced at runtime/stp_utrace.c where this macro & your quotes are from
<fche> we do test this stuff on _RT_ kernels already, so STP_UTRACE_USE_TASK_WORK_QUEUE cannot be blatantly unsafe
<fche> does it survive on a lockdep/rawhide kernel okay?
<agentzh> fche: we've run a lot of tests on kernel-debug with lockdep/kmemleak/etc enabled and did not find anything. actually this flag is exactly how our tests can pass on lockdep/debug kernels.
<fche> ok
<fche> I'm leaning to go with your suggestion of turning it on more broadly then.
<fche> how about you turn it on, we'll run our buildbots, and in about 24 hours we should have a good idea whether they hurt anything
<agentzh> great
<agentzh> i'll prepare a patch for your review.
<agentzh> in the meantime we'll run more tests on older kernels of centos 6, both lockdep/debug kind and production ones.
<fche> heh, just removing the #if / #endif and tweaking the comments ?
<fche> righto, go ahead.
<agentzh> yep
derek0883 has quit [Remote host closed the connection]
<fche> yeah interesting wording re. conservative choice ... ISTM that deferring into task context would be the conservative choice
<fche> just was probably written later in time
<agentzh> i agree :) the work queue case is more conservative.
orivej has joined #systemtap
derek0883 has joined #systemtap
<fche> (but is more costly; it involves a memcpy of pt_regs)
<fche> hmmmmmmm
<fche> and may preclude MODIFICATION
<fche> commit 9827e1858
<fche> need to think about this further
<fche> hm if it applies only to -tracepoint-invoked- probes, not uprobes, then it's probably fine, even with the loss of pt_regs modification capability
<fche> yeah uprobes don't go down this path; they come in via uprobes-inode.c .... which /me was working on for so long just recently
<fche> but syscall entries do come this way ... and those regs we want to be able to modify in situ
<fche> so yeah need to think about this further
<fche> utrace_report_syscall_entry specifically
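fche's concern about the memcpy of pt_regs can be sketched in isolation. The code below is a hedged illustration, not the SystemTap runtime: `struct fake_regs` is a hypothetical stand-in for the kernel's `struct pt_regs`, and `handler`, `run_in_situ`, and `run_deferred` are made-up names. It assumes the deferred path hands the probe a copied snapshot of the registers and never writes that snapshot back.

```c
#include <string.h>

/* Hypothetical stand-in for the kernel's struct pt_regs. */
struct fake_regs {
    long arg0;  /* first syscall argument */
    long ret;   /* return value slot */
};

/* A probe handler that rewrites the first syscall argument. */
static void handler(struct fake_regs *regs)
{
    regs->arg0 = 42;
}

/* In situ: the handler works on the live registers, so the change sticks. */
long run_in_situ(long arg0)
{
    struct fake_regs live = { arg0, 0 };

    handler(&live);
    return live.arg0;
}

/*
 * Deferred: the handler only sees a memcpy'd snapshot taken when the
 * event fired. Its write lands in the snapshot, and the live registers
 * keep their original value.
 */
long run_deferred(long arg0)
{
    struct fake_regs live = { arg0, 0 };
    struct fake_regs snapshot;

    memcpy(&snapshot, &live, sizeof(snapshot));
    handler(&snapshot);
    return live.arg0;
}
```

Under these assumptions, `run_in_situ(7)` yields the modified value while `run_deferred(7)` still yields 7, which is why deferring syscall-entry handling could cost the ability to modify registers in place.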
orivej has quit [Ping timeout: 264 seconds]
orivej has joined #systemtap
<agentzh> fche: okay
orivej has quit [Ping timeout: 265 seconds]
orivej has joined #systemtap
derek0883 has quit [Ping timeout: 240 seconds]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek088_ has joined #systemtap
derek088_ has quit [Ping timeout: 260 seconds]