fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
hpt has joined #systemtap
derek0883 has quit [Ping timeout: 240 seconds]
hpt has quit [Ping timeout: 240 seconds]
derek0883 has joined #systemtap
derek088_ has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek088_ has quit [Read error: Connection reset by peer]
derek0883 has joined #systemtap
agentzh has quit [Ping timeout: 265 seconds]
derek0883 has quit [Ping timeout: 240 seconds]
derek0883 has joined #systemtap
_whitelogger has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
khaled has quit [Quit: Konversation terminated!]
agentzh has joined #systemtap
agentzh has quit [Changing host]
agentzh has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 264 seconds]
_whitelogger has joined #systemtap
orivej has joined #systemtap
zhuizhuhaomeng has joined #systemtap
orivej has quit [Ping timeout: 265 seconds]
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 260 seconds]
orivej has joined #systemtap
orivej has quit [Ping timeout: 240 seconds]
khaled has joined #systemtap
orivej has joined #systemtap
zhuizhuhaomeng has quit [Ping timeout: 240 seconds]
orivej has quit [Ping timeout: 265 seconds]
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 260 seconds]
orivej has joined #systemtap
derek0883 has joined #systemtap
<agentzh>
fche: do you know what remaining work is needed to make STP_UTRACE_USE_TASK_WORK_QUEUE turn on by default?
<agentzh>
the vma engine callbacks would get deferred and reordered when the callbacks are triggered in atomic contexts, leading to process.begin probes firing before the vma map callbacks.
<agentzh>
it can be reproduced easily on a kernel-debug kernel where everything is slowed down.
<agentzh>
STP_UTRACE_USE_TASK_WORK_QUEUE seems to fix it.
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
<fche>
probably a runtime/autoconf issue
<agentzh>
yeah i also saw "Need to decide correct CONFIG_* to check for" in the code comments. just wondering about the details.
<agentzh>
maybe just for ancient kernels?
amerey_ has quit [Remote host closed the connection]
amerey_ has joined #systemtap
<fche>
yeah, work queues are probably a decade old addition to the kernel
<fche>
don't have fresh memory about exactly when it came in tho
<agentzh>
shall we just enable it by default in master and see what would break for the community?
<fche>
I'd prefer to find why it's not working in your case
<fche>
not make a broad change that we don't understand
<agentzh>
for my case, it is that the vma callback is called in atomic contexts and it is deferred by the same mechanism anyway.
<agentzh>
the work queue mechanism.
<agentzh>
and then process begin runs before the vma maps callback.
<agentzh>
leaving the process.begin probe to run without the vma tracker initialized.
<agentzh>
see func __stp_utrace_task_finder_target_quiesce
<agentzh>
it calls stp_task_work_add() when the context is in atomic.
orivej has quit [Ping timeout: 256 seconds]
<fche>
hm
<fche>
ok I finally glanced at runtime/stp_utrace.c where this macro & your quotes are from
<fche>
we do test this stuff on _RT_ kernels already, so STP_UTRACE_USE_TASK_WORK_QUEUE cannot be blatantly unsafe
<fche>
does it survive on a lockdep/rawhide kernel okay?
<agentzh>
fche: we've run a lot of tests on kernel-debug with lockdep/kmemleak/etc enabled and did not find anything. actually this flag is exactly how our tests can pass on lockdep/debug kernels.
<fche>
ok
<fche>
I'm leaning to go with your suggestion of turning it on more broadly then.
<fche>
how about you turn it on, we'll run our buildbots, and in about 24 hours we should have a good idea whether they hurt anything
<agentzh>
great
<agentzh>
i'll prepare a patch for your review.
<agentzh>
in the meantime we'll run more tests on older kernels of centos 6, both lockdep/debug kind and production ones.
<fche>
heh, just removing the #if / #endif and tweaking the comments ?
<fche>
righto, go ahead.
<agentzh>
yep
derek0883 has quit [Remote host closed the connection]
<fche>
yeah interesting wording re. conservative choice ... ISTM that deferring into task context would be the conservative choice
<fche>
just was probably written later in time
<agentzh>
i agree :) the work queue case is more conservative.
orivej has joined #systemtap
derek0883 has joined #systemtap
<fche>
(but is more costly; it involves a memcpy of pt_regs)
<fche>
hmmmmmmm
<fche>
and may preclude MODIFICATION
<fche>
commit 9827e1858
<fche>
need to think about this further
<fche>
hm if it applies only to -tracepoint-invoked- probes, not uprobes, then it's probably fine, even with the loss of pt_regs modification capability
<fche>
yeah uprobes don't go down this path; they come in via uprobes-inode.c .... which /me was working on for so long just recently
<fche>
but syscall entries do come this way ... and those regs we want to be able to modify in situ
<fche>
so yeah need to think about this further
<fche>
utrace_report_syscall_entry specifically
orivej has quit [Ping timeout: 264 seconds]
orivej has joined #systemtap
<agentzh>
fche: okay
orivej has quit [Ping timeout: 265 seconds]
orivej has joined #systemtap
derek0883 has quit [Ping timeout: 240 seconds]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]