#systemtap on 2017-02-10 — irc logs at freenode.irclog.whitequark.org

2015-11-12 23:18 fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged

00:15 scox has joined #systemtap

00:33 adgud has joined #systemtap

00:35 <adgud> it's me again :) why does a timer.profile probe counting userspace tasks with user_mode() report more kernel tasks than user tasks, while vmstat CPU utilization shows more %us than %sy?

00:37 <adgud> it's like the complete inverse: in stap 10% us, 60% sy; in vmstat 40% us, 6% sy

00:50 pwithnall_ has joined #systemtap

00:53 pwithnall has quit [Ping timeout: 258 seconds]

00:55 pwithnall_ has quit [Client Quit]

00:56 zodbot has quit [Disconnected by services]

01:08 zodbot has joined #systemtap

01:10 hpt has joined #systemtap

01:18 hkshaw has joined #systemtap

01:25 flos has joined #systemtap

02:20 Humble has quit [Ping timeout: 255 seconds]

02:52 <fche> the kernel accounting logic accounts for time only at particular scheduling points; it -might- (not sure) credit some types of kernel activities as something other than sys% for example

02:53 <fche> the stap script counts more directly, but is limited to sampling intermittently. if the user/sys transitions occur much more frequently than the stap timer.profile sampling rate, both sets of numbers can be "right" but still be different

02:55 hkshaw has quit [Ping timeout: 240 seconds]

03:01 irker205 has quit [Quit: transmission timeout]

03:06 hkshaw has joined #systemtap

03:06 <adgud> but documentation says that "Profiling timers are available to provide probes that execute on all CPUs at each system tick"

03:06 <adgud> that would make timer.profile more accurate than other tools, wouldn't it?

03:07 <adgud> I thought that probing at every system tick is as accurate as it gets

03:08 <fche> system tick ~= 100 Hz

03:09 <adgud> well, then context switches may occur much more often

03:09 <fche> so sampling the system state at that frequency is accurate -if- there are no sampling artifacts, e.g. if system behavior does not correlate with system ticks

03:09 <fche> exactly
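
The sampling artifact discussed above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not SystemTap code and not measured data: the workload (a 10 ms cycle with 2 ms of kernel work at the start of each cycle) and the names `in_user_mode` and `sampled_user_fraction` are hypothetical. It shows how a 100 Hz sampler that fires in phase with such a workload can miss the user-mode time entirely.

```python
# Hypothetical workload, not measured data: every 10 ms period starts with
# 2 ms of kernel work (say, a timer-driven wakeup) followed by 8 ms in user mode.
PERIOD_US = 10_000
KERNEL_US = 2_000


def in_user_mode(t_us):
    """Return True if the simulated task is in user mode at time t_us."""
    return (t_us % PERIOD_US) >= KERNEL_US


def sampled_user_fraction(sample_period_us, duration_us=1_000_000):
    """Fraction of periodic samples that land in user mode."""
    hits = [in_user_mode(t) for t in range(0, duration_us, sample_period_us)]
    return sum(hits) / len(hits)


# Exact share of each period spent in user mode.
true_fraction = (PERIOD_US - KERNEL_US) / PERIOD_US
# 100 Hz sampling that fires exactly when each period begins.
tick_fraction = sampled_user_fraction(10_000)

print(f"true user fraction:      {true_fraction:.2f}")
print(f"100 Hz sampled fraction: {tick_fraction:.2f}")
```

Every sample lands in the kernel part of the cycle, so the sampled view and the true time split disagree even though neither is computed incorrectly.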

03:11 CME has quit [Ping timeout: 256 seconds]

03:12 CME has joined #systemtap

03:14 <adgud> would running the timer at 1 nanosecond intervals alleviate this and make the results more accurate?

03:14 <adgud> or is it overkill and other problems would arise?

03:16 <fche> definitely other problems

03:17 <fche> btw see also [man tapset::task_time]

03:20 <adgud> yeah, timer.ms(1) gives me no user task time at all...

03:21 <fche> that's more because the .ms() timers are invoked from kernel space software timers, so by definition show up as kernel-space events

03:23 <adgud> oh well then my whole approach is completely flawed

03:24 <fche> wouldn't go that far ... but it's not right to expect it to match the differently-measured numbers

03:24 <fche> timer.profile is a good choice (or a finer-resolution perf.* type event)

03:25 <adgud> but I can't collect CPU time distribution with it, and that was my goal

03:27 <adgud> (I wanted to compare vmstat with stap and see if the results are different, that's why I've been asking those questions for the last couple of days)

03:28 <fche> understood

03:28 <fche> one thing to keep in mind is that vmstat etc. aren't gospel either - they also estimate / define those times with their own idiosyncrasies

03:28 <fche> stap's profile/sample based one is also pretty well defined

03:29 <fche> for processes that are mostly cpu bound, they should roughly match

03:29 <fche> for processes that are highly kernel-interactive, they may not match. and actually those are interesting cases. the stap sampling one may even give more meaningful results

03:30 <fche> assuming the sampling times are not correlated with the program behaviour

03:31 <adgud> oh I see, I expected differences but not that big; this is quite an explanation

03:37 <adgud> that would totally explain (I think so, at least) why qemu+kvm shows like 90% kernel ticks, even though qemu runs in userspace, am I correct?

03:38 <adgud> qemu relays most of the work to kvm, which operates at kernel level

03:38 hkshaw has quit [Ping timeout: 276 seconds]

03:42 <fche> kvm kernel side intercepts only certain privileged operations, like i/o and paging and some cpu debugging stuff

03:42 <fche> most of it runs in userspace ideally

03:43 <fche> but qemu/kvm may have events that correlate tightly with the host's normal profiling timer (such as having its own profiling timer)

03:43 <fche> so sampling could lead to misleading results

03:43 <fche> should probably try with different .hz type profiling/perf event sources
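
The suggestion to try different sampling frequencies can be sketched with a simulation. It is a rough illustration under stated assumptions, not SystemTap code: the 10 ms activity cycle, the 2 ms kernel portion, the three sampling periods, and the name `user_fraction` are all hypothetical. A sampler in step with the cycle sees a skewed picture, while samplers that drift relative to it land close to the real split.

```python
PERIOD_US = 10_000   # hypothetical activity repeating every 10 ms
KERNEL_US = 2_000    # hypothetical: first 2 ms of each period spent in the kernel


def user_fraction(sample_period_us, duration_us=1_000_000):
    """Sample a simulated task periodically; return the fraction seen in user mode."""
    total = user = 0
    for t_us in range(0, duration_us, sample_period_us):
        total += 1
        if (t_us % PERIOD_US) >= KERNEL_US:
            user += 1
    return user / total


# Sampling periods in microseconds: 10_000 is in step with the workload,
# the other two drift relative to it and so visit many different phases.
results = {p: user_fraction(p) for p in (10_000, 9_900, 9_700)}
for period_us, frac in results.items():
    print(f"sample every {period_us} us -> user fraction {frac:.3f}")
```

In this model the true user share is 80%. If results from several unrelated sampling rates agree with each other, correlation with the tick is less likely to be the cause of a discrepancy.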

03:44 * fche must head off now, good luck, see you tomorrow

03:44 <adgud> see you, have a good day

03:50 ravi_ has joined #systemtap

03:56 flos has quit [Ping timeout: 255 seconds]

03:57 adgud has quit [Ping timeout: 240 seconds]

04:00 flos has joined #systemtap

05:15 ravi_ has quit [Ping timeout: 260 seconds]

05:26 ravi_ has joined #systemtap

05:30 Humble has joined #systemtap

05:53 hkshaw has joined #systemtap

06:17 drsmith_away has quit [Ping timeout: 255 seconds]

06:17 drsmith_away has joined #systemtap

09:17 pwithnall has joined #systemtap

09:20 hpt has quit [Quit: Lost terminal]

09:33 pwithnall_ has joined #systemtap

09:33 pwithnall_ has quit [Client Quit]

10:07 flos has quit [Quit: Those who know don't tell.]

10:47 ravi_ has quit [Ping timeout: 240 seconds]

10:49 adgud has joined #systemtap

11:05 hkshaw has quit [Ping timeout: 255 seconds]

11:17 adgud has quit [Ping timeout: 255 seconds]

11:39 adgud has joined #systemtap

11:53 adgud has quit [Ping timeout: 240 seconds]

12:07 hkshaw has joined #systemtap

12:45 groleo has quit [Ping timeout: 276 seconds]

12:57 mjw has joined #systemtap

13:12 Humble has quit [Quit: Leaving]

13:12 Humble has joined #systemtap

13:28 hkshaw has quit [Ping timeout: 240 seconds]

13:58 flos has joined #systemtap

14:02 wcohen has quit [Remote host closed the connection]

14:07 drsmith_away is now known as drsmith

14:14 mbenitez has joined #systemtap

14:39 ppetraki has joined #systemtap

14:42 adgud has joined #systemtap

14:44 brolley has joined #systemtap

14:46 tromey has joined #systemtap

14:46 wcohen has joined #systemtap

14:51 wcohen has quit [Ping timeout: 260 seconds]

14:52 wcohen has joined #systemtap

15:09 tromey has quit [Ping timeout: 240 seconds]

15:50 irker012 has joined #systemtap

15:50 <irker012> systemtap: dsmith systemtap.git:refs/heads/master * release-3.0-924-g976cccc / testsuite/systemtap.base/procfs_maxsize.exp: Add gcc 7 fixes to procfs_maxsize.exp. http://tinyurl.com/zpodzps

16:06 flos has quit [Quit: Those who know don't tell.]

16:26 <irker012> systemtap: dsmith systemtap.git:refs/heads/master * release-3.0-925-gb622421 / testsuite/systemtap.base/utf_user_trunc.c testsuite/systemtap.base/utf_user_trunc.exp: Update systemtap.base/utf_user_trunc.exp test case for gcc 7. http://tinyurl.com/ht8nxzo

17:01 fche is now known as fche2

17:17 fche2 is now known as fche

17:36 <irker012> systemtap: dsmith systemtap.git:refs/heads/master * release-3.0-926-ge239d05 / testsuite/systemtap.base/utf_user_trunc.c: Updated error message in testsuite/systemtap.base/utf_user_trunc.c. http://tinyurl.com/jhwphpg

17:52 tromey has joined #systemtap

17:57 pwithnall has quit [Quit: pwithnall]

20:05 <irker012> systemtap: dsmith systemtap.git:refs/heads/master * release-3.0-927-g19eace0 / systemtap.spec: Fix BZ1421105 by requiring 'kernel-devel-uname-r' in the spec file. http://tinyurl.com/j5f5p7b

21:50 wcohen has quit [Ping timeout: 276 seconds]

22:07 mbenitez has quit [Quit: Leaving]

22:19 tromey has quit [Quit: ERC (IRC client for Emacs 25.2.1)]

22:24 drsmith is now known as drsmith_away

22:31 brolley has left #systemtap [#systemtap]

22:53 wcohen has joined #systemtap

23:11 mjw has quit [Quit: Leaving]