#systemtap on 2020-12-08 — irc logs at freenode.irclog.whitequark.org

2015-11-12 23:18 fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged

00:52 mjw has quit [Quit: Leaving]

02:13 derek0883 has quit [Ping timeout: 240 seconds]

02:41 derek0883 has joined #systemtap

02:52 _whitelogger has joined #systemtap

03:06 orivej has quit [Quit: orivej]

03:17 derek0883 has quit [Remote host closed the connection]

03:17 derek0883 has joined #systemtap

03:18 orivej has joined #systemtap

03:19 derek0883 has quit [Remote host closed the connection]

03:19 derek0883 has joined #systemtap

03:51 khaled has quit [Quit: Konversation terminated!]

04:38 hpt has joined #systemtap

04:43 _whitelogger has joined #systemtap

05:22 derek088_ has joined #systemtap

05:24 derek0883 has quit [Ping timeout: 264 seconds]

05:37 derek088_ has quit [Ping timeout: 240 seconds]

05:47 derek0883 has joined #systemtap

06:14 derek0883 has quit [Remote host closed the connection]

06:57 derek0883 has joined #systemtap

07:44 orivej has quit [Ping timeout: 264 seconds]

08:20 derek0883 has quit [Remote host closed the connection]

08:59 derek0883 has joined #systemtap

09:04 derek0883 has quit [Ping timeout: 260 seconds]

09:28 hpt has quit [Ping timeout: 260 seconds]

10:24 orivej has joined #systemtap

10:29 mjw has joined #systemtap

11:19 khaled has joined #systemtap

12:06 orivej has quit [Ping timeout: 240 seconds]

13:37 orivej has joined #systemtap

15:07 amerey has joined #systemtap

15:58 amerey has quit [Quit: Leaving]

16:59 amerey has joined #systemtap

17:01 derek0883 has joined #systemtap

17:48 <kerneltoast> fche, so it wasn't just a fluke: the bulkmode patch makes the testsuite take 2.5x longer to run

17:50 <fche> interesting

17:51 <fche> maybe a startup or shutdown weird-delay thing?

17:51 derek0883 has quit [Remote host closed the connection]

17:51 derek0883 has joined #systemtap

17:52 <kerneltoast> there are a lot of timeouts like this: FAIL: tcptest startup (timeout)

17:52 <kerneltoast> guess i'll see why that happens

17:52 <fche> ok, that's a good thing

17:53 <fche> things are not just slower to run but something is temporarily stuck

17:57 <kerneltoast> so for something like this: FAIL: add startup (timeout)

17:57 <kerneltoast> how does the testsuite separate the startup from the test? what part of add.exp is "startup"?

17:58 <fche> look at add.exp

17:58 <fche> it calls some other dejagnu (tcl) procedure

17:58 <fche> and that bad boy (probably in testsuite/lib/*) prints that "startup (timeout)" message

17:58 <kerneltoast> set test "add"

17:58 <kerneltoast> foreach runtime [get_runtime_list] {

17:58 <kerneltoast> if {$runtime != ""} {

17:58 <kerneltoast> stap_run $test no_load $all_pass_string \

17:58 <kerneltoast> --runtime=$runtime $srcdir/$subdir/$test.stp

17:58 <kerneltoast> } else {

17:58 <kerneltoast> stap_run $test no_load $all_pass_string $srcdir/$subdir/$test.stp

17:58 <kerneltoast> }

17:58 <fche> so see stap_run

17:59 <fche> this flavour of tests prints a "systemtap test started" kind of message in probe-begin

17:59 <kerneltoast> ah

17:59 <kerneltoast> stap_run.exp

18:38 <kerneltoast> fche, how can i tell where it's timing out?

18:39 <kerneltoast> is there a way to see verbose execution of the .exp?

18:39 <fche> the .log file should print the stap command line being attempted

18:39 <fche> and the "startup (timeout)" message comes if the "systemtap starting probe\r\n" line is not seen quickly

18:40 <fche> one can also make installcheck RUNTESTFLAGS="foo.exp -v" <<< note the added -v for more verbosity

18:40 <fche> but the .log file is about as complete

18:42 <kerneltoast> FAIL: add startup (timeout)

18:42 <kerneltoast> Pass 5: starting run.

18:42 <kerneltoast> ^ that's what the .log has

18:43 <kerneltoast> hmm -v doesn't show what's going on inside add.exp

18:49 <fche> look up a few lines

18:49 <fche> it should show the stap command line

18:49 <fche> then I'd run that same command line by hand

18:49 <fche> perhaps with more verbosity

18:51 derek0883 has quit [Remote host closed the connection]

18:52 derek0883 has joined #systemtap

18:54 <kerneltoast> fche, hmm probe end doesn't happen until i ^C add.stp

18:58 <fche> ok lemme help from this side, which test case are you looking at?

18:58 <fche> tcptest ?

18:59 <kerneltoast> add.exp

18:59 <fche> ok

18:59 <kerneltoast> running add.stp just hangs

19:00 <fche> ok so in the .log file let's find out how the actual stap test case is run

19:00 <kerneltoast> add.stp is really simple

19:00 <kerneltoast> no arguments

19:01 <kerneltoast> it's the most beautiful test i've ever seen

19:02 <fche> that good!

19:02 <kerneltoast> add two numbers together, if they're not equal to the hardcoded sum, error

19:02 <kerneltoast> simple

19:02 <fche> math class is hard

19:04 <kerneltoast> i remember a dmesg entry i saw on my galaxy s2 some years ago

19:04 <kerneltoast> <4>[ 149.662139] ld9040 IElvss : 28+6=29

19:04 <kerneltoast> even the pros at samsung have trouble with math

19:05 <fche> wow

19:24 <fche> kerneltoast,

19:24 <fche> ok running that test here

19:24 <fche> when you hit ^C, the "starting probe" etc. messages do appear

19:24 <fche> so they're in the queue

19:24 <fche> but I think userspace threads might just not have been woken up about them, so they didn't get the message till the ^C signal

19:28 irker129 has joined #systemtap

19:28 <irker129> systemtap: fche systemtap.git:master * release-4.4-25-gcd6399e62 / runtime/dyninst/print.c runtime/dyninst/runtime_defines.h: dyninst transport: add _stp_print_*lock_irq* stubs

19:31 <fche> looking at relay_wakeup_readers and related state

19:34 <fche> also can strace -f staprun/stapio that is running add.stp

19:36 <kerneltoast> stapio is just polling

19:46 <fche> yup, but not taking the data from that probe begin

19:47 <fche> ./stap -v -e 'probe begin {log("hi") }' <<< visible there too

19:47 <fche> so some wakeup is missing

19:53 <kerneltoast> hmmmmmmmmmmm

20:06 <kerneltoast> seems like this code really wants __stp_relay_switch_subbuf() to be called

20:18 <fche> every now and then

20:18 <fche> we have that wakeup timer thing, wonder why it's not enough / doing its job

20:25 <kerneltoast> the timer does go off

20:32 <kerneltoast> _stp_relay_data.wakeup isn't getting set

20:33 <fche> __stp_relay_switch_subbuf is the only place that can do that

20:33 <kerneltoast> right but i have no idea what "switching the subbuffer" means

20:33 <kerneltoast> or when we should do it

20:34 <kerneltoast> any ideas about that?

20:34 derek0883 has quit [Remote host closed the connection]

20:35 derek0883 has joined #systemtap

20:37 <fche> just comparing master vs your branch

20:40 derek0883 has quit [Ping timeout: 264 seconds]

20:45 <fche> stap -v -e 'probe timer.ms(1) {log("hi") }' <<< interesting: no output for a while, then (once whatever buffer fills), staprun does finally get the hint and prints it out

20:47 <kerneltoast> yeah it looks like the _stp_get_rchan_subbuf() logic is configured to flush when the buffer is full

20:48 derek0883 has joined #systemtap

20:52 <kerneltoast> it's gotta be this subbuf switch stuff

20:52 <kerneltoast> it's black magic to me

20:52 <kerneltoast> i don't understand what it's doing

20:53 <fche> yeah

20:59 _whitelogger has joined #systemtap

21:05 _whitelogger has joined #systemtap

21:10 <fche> https://paste.centos.org/view/f0df3377 <<<< this doesn't help

21:11 <fche> (my theory was that the print_flush being invoked all the time, incl. at every probe-handler, would set that wakeup flag for the timer to run)

21:13 <fche> but yeah I think it's a periodic subbuf-switch thing that's called for, just not finding the path for that in the current code

21:19 derek0883 has quit [Remote host closed the connection]

21:42 derek0883 has joined #systemtap

21:53 mjw has quit [Ping timeout: 258 seconds]

21:56 mjw has joined #systemtap

23:20 <kerneltoast> funny i did the same thing you did in your paste

23:24 amerey has quit [Quit: Leaving]

23:33 irker129 has quit [Quit: transmission timeout]