fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
mjw has quit [Quit: Leaving]
derek0883 has quit [Ping timeout: 240 seconds]
derek0883 has joined #systemtap
_whitelogger has joined #systemtap
orivej has quit [Quit: orivej]
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
orivej has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
khaled has quit [Quit: Konversation terminated!]
hpt has joined #systemtap
_whitelogger has joined #systemtap
derek088_ has joined #systemtap
derek0883 has quit [Ping timeout: 264 seconds]
derek088_ has quit [Ping timeout: 240 seconds]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
orivej has quit [Ping timeout: 264 seconds]
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 260 seconds]
hpt has quit [Ping timeout: 260 seconds]
orivej has joined #systemtap
mjw has joined #systemtap
khaled has joined #systemtap
orivej has quit [Ping timeout: 240 seconds]
orivej has joined #systemtap
amerey has joined #systemtap
amerey has quit [Quit: Leaving]
amerey has joined #systemtap
derek0883 has joined #systemtap
<kerneltoast> fche, so it wasn't just a fluke: the bulkmode patch makes the testsuite take 2.5x longer to run
<fche> interesting
<fche> maybe a startup or shutdown weird-delay thing?
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
<kerneltoast> there are a lot of timeouts like this: FAIL: tcptest startup (timeout)
<kerneltoast> guess i'll see why that happens
<fche> ok, that's a good thing
<fche> things are not just slower to run but something is temporarily stuck
<kerneltoast> so for something like this: FAIL: add startup (timeout)
<kerneltoast> how does the testsuite separate the startup from the test? what part of add.exp is "startup"?
<fche> look at add.exp
<fche> it calls some other dejagnu (tcl) procedure
<fche> and that bad boy (probably in testsuite/lib/*) prints that "startup (timeout)" message
<kerneltoast> set test "add"
<kerneltoast> foreach runtime [get_runtime_list] {
<kerneltoast> if {$runtime != ""} {
<kerneltoast> stap_run $test no_load $all_pass_string \
<kerneltoast> --runtime=$runtime $srcdir/$subdir/$test.stp
<kerneltoast> } else {
<kerneltoast> stap_run $test no_load $all_pass_string $srcdir/$subdir/$test.stp
<kerneltoast> }
<kerneltoast> }
<fche> so see stap_run
<fche> this flavour of tests prints a "systemtap test started" kind of message in probe-begin
<kerneltoast> ah
<kerneltoast> stap_run.exp
<kerneltoast> fche, how can i tell where it's timing out?
<kerneltoast> is there a way to see verbose execution of the .exp?
<fche> the .log file should print the stap command line being attempted
<fche> and the "startup (timeout)" message comes if the "systemtap starting probe\r\n" line is not seen quickly
<fche> one can also make installcheck RUNTESTFLAGS="foo.exp -v" <<< note the added -v for more verbosity
<fche> but the .log file is about as complete
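The "startup (timeout)" check fche describes can be approximated outside dejagnu. The sketch below is not the code in stap_run.exp; it is a minimal Python stand-in that assumes the only thing being checked is whether a marker line shows up on the command's stdout within a deadline. The function name `wait_for_startup`, the result strings, and the 5-second default are hypothetical, chosen for illustration.

```python
import subprocess
import threading


def wait_for_startup(cmd, marker="systemtap starting probe", timeout=5.0):
    """Run cmd and report whether `marker` appears on stdout in time.

    Returns "started" if the marker line was seen, "eof" if the command
    closed stdout without printing it, and "timeout" if neither happened
    within `timeout` seconds. These labels are illustrative only.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    found = threading.Event()
    finished = threading.Event()

    def reader():
        # Read line by line until the marker shows up or stdout closes.
        for line in proc.stdout:
            if marker in line:
                found.set()
                break
        finished.set()

    threading.Thread(target=reader, daemon=True).start()

    # Block until the reader is done or the deadline passes.
    finished.wait(timeout)
    if found.is_set():
        result = "started"
    elif finished.is_set():
        result = "eof"
    else:
        result = "timeout"

    # Stop the child either way; a real harness would go on to run the test.
    proc.kill()
    proc.wait()
    return result
```

Pointing this at the stap command line copied from the .log file would show the same symptom by hand: if the probe-begin message is queued but never delivered, the call returns "timeout" even though the script itself is running.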
<kerneltoast> FAIL: add startup (timeout)
<kerneltoast> Pass 5: starting run.
<kerneltoast> ^ that's what the .log has
<kerneltoast> hmm -v doesn't show what's going on inside add.exp
<fche> look up a few lines
<fche> it should show the stap command line
<fche> then I'd run that same command line by hand
<fche> perhaps with more verbosity
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
<kerneltoast> fche, hmm probe end doesn't happen until i ^C add.stp
<fche> ok lemme help from this side, which test case are you looking at?
<fche> tcptest ?
<kerneltoast> add.exp
<fche> ok
<kerneltoast> running add.stp just hangs
<fche> ok so in the .log file let's find out how the actual stap test case is run
<kerneltoast> add.stp is really simple
<kerneltoast> no arguments
<kerneltoast> it's the most beautiful test i've ever seen
<fche> that good!
<kerneltoast> add two numbers together, if they're not equal to the hardcoded sum, error
<kerneltoast> simple
<fche> math class is hard
<kerneltoast> i remember a dmesg entry i saw on my galaxy s2 some years ago
<kerneltoast> <4>[ 149.662139] ld9040 IElvss : 28+6=29
<kerneltoast> even the pros at samsung have trouble with math
<fche> wow
<fche> kerneltoast,
<fche> ok running that test here
<fche> when you hit ^C, the "starting probe" etc. messages do appear
<fche> so they're in the queue
<fche> but I think userspace threads might just not have been woken up about them, so they didn't get the message till the ^C signal
irker129 has joined #systemtap
<irker129> systemtap: fche systemtap.git:master * release-4.4-25-gcd6399e62 / runtime/dyninst/print.c runtime/dyninst/runtime_defines.h: dyninst transport: add _stp_print_*lock_irq* stubs
<fche> looking at relay_wakeup_readers and related state
<fche> also can strace -f staprun/stapio that is running add.stp
<kerneltoast> stapio is just polling
<fche> yup, but not taking the data from that probe begin
<fche> ./stap -v -e 'probe begin {log("hi") }' <<< visible there too
<fche> so some wakeup is missing
<kerneltoast> hmmmmmmmmmmm
<kerneltoast> seems like this code really wants __stp_relay_switch_subbuf() to be called
<fche> every now and then
<fche> we have that wakeup timer thing, wonder why it's not enough / doing its job
<kerneltoast> the timer does go off
<kerneltoast> _stp_relay_data.wakeup isn't getting set
<fche> __stp_relay_switch_subbuf is the only place that can do that
<kerneltoast> right but i have no idea what "switching the subbuffer" means
<kerneltoast> or when we should do it
<kerneltoast> any ideas about that?
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
<fche> just comparing master vs your branch
derek0883 has quit [Ping timeout: 264 seconds]
<fche> stap -v -e 'probe timer.ms(1) {log("hi") }' <<< interesting: no output for a while, then (once whatever buffer fills), staprun does finally get the hint and prints it out
<kerneltoast> yeah it looks like the _stp_get_rchan_subbuf() logic is configured to flush when the buffer is full
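The behaviour observed so far (the timer fires, the wakeup flag is never set, and output only appears once a buffer fills) can be illustrated with a toy model. This is not SystemTap or relayfs code: `ToyRelayChannel` and its methods are hypothetical names, and the model assumes only what the conversation states, namely that the flag is set solely on a sub-buffer switch and that the timer does nothing unless the flag is set.

```python
class ToyRelayChannel:
    """Toy model of a sub-buffered channel with a timer-driven wakeup.

    Hypothetical illustration only; it does not mirror the real runtime.
    """

    def __init__(self, subbuf_size):
        self.subbuf_size = subbuf_size
        self.current = []      # sub-buffer currently being filled
        self.ready = []        # switched-out sub-buffers the reader may consume
        self.wakeup = False    # set only when a sub-buffer switch happens
        self.delivered = []    # what the userspace reader has actually seen

    def switch_subbuf(self):
        # Hand the current sub-buffer to the reader side and flag a wakeup.
        self.ready.append(self.current)
        self.current = []
        self.wakeup = True

    def write(self, msg):
        # Writers only trigger a switch when the sub-buffer becomes full.
        self.current.append(msg)
        if len(self.current) >= self.subbuf_size:
            self.switch_subbuf()

    def timer_tick(self, force_switch=False):
        # Assumed fix direction: optionally switch a partly filled
        # sub-buffer on the timer so small outputs do not sit forever.
        if force_switch and self.current:
            self.switch_subbuf()
        # The timer wakes the reader only if the flag was set.
        if self.wakeup:
            self.wakeup = False
            for subbuf in self.ready:
                self.delivered.extend(subbuf)
            self.ready.clear()
```

With a sub-buffer size of 4, a single `write("hi")` followed by any number of plain `timer_tick()` calls leaves `delivered` empty, which matches the hung `probe begin {log("hi")}` case. Writing three more messages fills the sub-buffer and the next tick delivers all four, which matches the delayed burst from the `timer.ms(1)` script. The `force_switch=True` path is only a guess at what a periodic switch might look like.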
derek0883 has joined #systemtap
<kerneltoast> it's gotta be this subbuf switch stuff
<kerneltoast> it's black magic to me
<kerneltoast> i don't understand what it's doing
<fche> yeah
_whitelogger has joined #systemtap
_whitelogger has joined #systemtap
<fche> https://paste.centos.org/view/f0df3377 <<<< this doesn't help
<fche> (my theory was that print_flush being invoked all the time, incl. at every probe-handler, would set that wakeup flag for the timer to run)
<fche> but yeah I think it's a periodic subbuf-switch thing that's called for, just not finding the path for that in the current code
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
mjw has quit [Ping timeout: 258 seconds]
mjw has joined #systemtap
<kerneltoast> funny i did the same thing you did in your paste
amerey has quit [Quit: Leaving]
irker129 has quit [Quit: transmission timeout]