fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
<kerneltoast>
the commit message is intentionally sparse at the moment (read: nonexistent)
<kerneltoast>
the patch passes the test suite as it is but i'm going to run the test suite with a lockdep kernel too to be extra certain
<fche>
surprised that all the _rcu work is doable from all the contexts this code is invoked from, but maybe it's only a few
<kerneltoast>
what context were you expecting this to not work in?
<fche>
from within e.g. funky tracepoints or interrupts
<kerneltoast>
just because we can't sleep?
<fche>
yeah, probably beyond that
<fche>
I think many of the kernel tracepoints are "rated" for a fairly minimal type of workload being done from within their callbacks
<fche>
certainly not the full generality of stap probe handling & accessories
<fche>
(serhei is chasing down one such interaction with the on-the-fly arming/disarming logic just now)
<kerneltoast>
hmm, well the intention of RCU was to make the work as minimal as possible in these paths
<fche>
yeah
<kerneltoast>
it's quite lean. no RCU barriers or synchronizations needed
<fche>
btw, would not worry about one aspect mentioned in the patch: an rcu_barrier in the case of unloading the stap module. at least we've been always assuming this is a rare / possibly-heavy event
<kerneltoast>
i suspected it would be exceptionally rare, but i sleep better at night knowing it's covered
<fche>
sleeping is important
<fche>
at night, even more important
<kerneltoast>
but not in interrupt context
<kerneltoast>
that's why i only use doctor-recommended RCU to help me sleep
<fche>
take two RCUs and good night
<fche>
three is perfect
<fche>
FOUR IS RIGHT OUT
<kerneltoast>
9 out of 10 dentists concur that three RCUs a day is the perfect balance
<kerneltoast>
the tenth dentist has insomnia
<fche>
and the eleventh is crazy
<fche>
okay, please do run the testsuite
<fche>
and see if you can do it on a machine with some good heavy background process churn
<kerneltoast>
testsuite with lockdep, you mean?
<kerneltoast>
i've already run the testsuite normally a few times and it's been knock on wood
<kerneltoast>
but that wood may be balsa
<agentzh>
kerneltoast: i think fche means the full systemtap test suite, not just our lean test suite.
<kerneltoast>
ahh
<agentzh>
the former is horrible.
<agentzh>
can take many hours to run...
<fche>
in a good way
<kerneltoast>
oh boy
<fche>
in a very good way
<fche>
hey if you guys force me to think about this old code
<kerneltoast>
i guess if it passes the full test suite it's golden?
<agentzh>
sometimes leading to softlockup according to my experience.
<fche>
I'll force you back to run the dejagnu test suite :)
<agentzh>
and also many "expected" test failures.
<fche>
if the test results are "reasonable". there is no "pass the full testsuite" in the sense of 0 FAILs.
<agentzh>
but not marked as expected...depending on the kernel you are using.
<fche>
and on compiler and on .... too many parameters :(
<agentzh>
yep, i used to do diffs of the failures.
<agentzh>
before and after my own changes.
<kerneltoast>
hey this test suite sounds like what we used to test kernels at canonical
<kerneltoast>
we had an entire system of categorizing the known failures
<fche>
before very very long, we hope to have an online system to report testsuite results to, and let it report regressions etc. back to you
<kerneltoast>
and it was all done manually
<kerneltoast>
good times..
<fche>
I'm sure serhei would be interested to hear about that categorization gadget
<kerneltoast>
it's called a human
<agentzh>
i believe the hash table is definitely read from interrupt contexts like uprobes and timer probes.
<agentzh>
is that okay for RCU locks?
<agentzh>
maybe it's easier than running the full stap test suite.
<kerneltoast>
yes that should be fine for RCU locks
<kerneltoast>
they're not really even locks
<kerneltoast>
they do not block at all
<agentzh>
do we need to worry about preemption here?
<agentzh>
even though they don't sleep.
<kerneltoast>
nope
<kerneltoast>
rcu_read_lock disables preemption
<agentzh>
okay
<kerneltoast>
you can't sleep inside the lock
<kerneltoast>
that's all
<agentzh>
makes sense.
<agentzh>
fche: any particular concerns here?
<agentzh>
we do have our own lean version of a test suite that can be run in parallel.
<agentzh>
which indeed caught a bug in kerneltoast's first version of the rcu patch.
<agentzh>
will that be enough for pushing to master? ;)
<fche>
please try the whole suite
<agentzh>
or maybe we can push it to a branch so that we can reuse your build bot?
<fche>
this is low level enough that tricky workloads could be required to trip it up
<agentzh>
kerneltoast: maybe you can run it overnight.
<agentzh>
see testsuite/README for details.
<agentzh>
it can run in parallel to save some time.
<agentzh>
a lot of time on an SMP system.
<fche>
sudo make installcheck and walk away for the night :)
<agentzh>
but i guess we still need to run twice to compare the diffs.
<agentzh>
the diffs of the test report.
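The before/after comparison agentzh describes can be sketched roughly as follows. This is only a minimal sketch, assuming each run leaves a DejaGnu-style summary file whose result lines look like `FAIL: test name`. The helper names `parse_sum` and `diff_failures` are hypothetical and are not part of systemtap or DejaGnu.

```python
import re
import sys

# Assumption: result lines follow the usual DejaGnu "STATUS: test name" form.
RESULT_RE = re.compile(
    r"^(PASS|FAIL|XPASS|XFAIL|KPASS|KFAIL|UNRESOLVED|UNTESTED|UNSUPPORTED): (.*)$"
)


def parse_sum(text):
    """Map each test name to the last status reported for it."""
    results = {}
    for line in text.splitlines():
        m = RESULT_RE.match(line.strip())
        if m:
            results[m.group(2)] = m.group(1)
    return results


def diff_failures(before_text, after_text):
    """Return (newly failing tests, tests that stopped failing)."""
    before = parse_sum(before_text)
    after = parse_sum(after_text)
    # FAIL in the second run but not FAIL (or absent) in the first run.
    new_fail = sorted(
        t for t, s in after.items() if s == "FAIL" and before.get(t) != "FAIL"
    )
    # FAIL in the first run, still present and no longer FAIL in the second.
    fixed = sorted(
        t for t, s in before.items()
        if s == "FAIL" and after.get(t) not in (None, "FAIL")
    )
    return new_fail, fixed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python sumdiff.py before.sum after.sum
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        new_fail, fixed = diff_failures(f_before.read(), f_after.read())
    for name in new_fail:
        print("new FAIL:", name)
    for name in fixed:
        print("no longer FAIL:", name)
```

Since there is no run with 0 FAILs, only the difference between the two runs is meaningful, and both runs should come from the same host, kernel and compiler.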
<kerneltoast>
yeah that'll probably vary by the host machine...
<agentzh>
of course.
<agentzh>
also, you need to use the open source repo's master to test.
<agentzh>
not our private branch.
<kerneltoast>
yeah
<agentzh>
assuming the tests would complete.
<agentzh>
sometimes they don't...
<kerneltoast>
oof
<agentzh>
like hitting some bugs.
<fche>
your private branch doesn't carry the testsuite?