#systemtap on 2019-04-16 — irc logs at freenode.irclog.whitequark.org

2015-11-12 23:18 fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged

01:22 hpt has joined #systemtap

01:30 changcheng has quit [Quit: WeeChat 1.9.1]

01:36 sscox has joined #systemtap

01:56 hpt has quit [Ping timeout: 252 seconds]

01:57 hpt has joined #systemtap

02:23 yog_ has joined #systemtap

03:04 yog_ has quit [Ping timeout: 252 seconds]

03:16 _whitelogger has joined #systemtap

03:42 orivej has quit [Ping timeout: 250 seconds]

03:49 sscox has quit [Ping timeout: 268 seconds]

03:52 sscox has joined #systemtap

04:36 yog_ has joined #systemtap

04:39 yog_ has quit [Remote host closed the connection]

04:39 yog_ has joined #systemtap

04:40 slowfranklin has joined #systemtap

04:41 yog_ has quit [Remote host closed the connection]

04:41 yog_ has joined #systemtap

04:51 sscox has quit [Ping timeout: 252 seconds]

05:01 yog_ has quit [Remote host closed the connection]

05:01 yog_ has joined #systemtap

05:11 khaled has joined #systemtap

06:16 slowfranklin has quit [Quit: slowfranklin]

06:25 gregwork has quit [Ping timeout: 268 seconds]

06:28 gregwork has joined #systemtap

06:35 gregwork has quit [Max SendQ exceeded]

06:43 gregwork has joined #systemtap

06:45 gregwork has quit [Max SendQ exceeded]

06:53 gregwork has joined #systemtap

07:08 mjw has joined #systemtap

07:49 slowfranklin has joined #systemtap

07:55 slowfranklin has quit [Read error: Connection reset by peer]

08:03 slowfranklin has joined #systemtap

08:05 gregwork has quit [Ping timeout: 255 seconds]

08:05 gregwork has joined #systemtap

08:09 slowfranklin has quit [Read error: Connection reset by peer]

08:32 slowfranklin has joined #systemtap

08:35 slowfranklin has quit [Read error: Connection reset by peer]

08:53 slowfranklin has joined #systemtap

08:55 orivej has joined #systemtap

08:57 goyda has joined #systemtap

08:59 <goyda> Hi, I am new here. I am trying to figure out a problem that is bugging me a bit. I am not sure if it can be solved by systemtap, but I will give it a shot anyway.

08:59 slowfranklin has left #systemtap [#systemtap]

09:01 <goyda> I have 2 RHEL vms installed on 2 cloud instances which have the exact same hardware configuration and the same OS version, glibc, OS patch level etc. I am running the exact same binary on both machines with the same input load. On one of the vms, the process consumes 20% more CPU than on the other

09:03 <goyda> I am wondering if there is any way to find out why one is consuming more CPU than the other

09:07 <goyda> lscpu command output on both looks the same except for a couple of flags. The one where the CPU consumption is lower has 2 extra flags, arat and tsc_adjust. Will that make any difference?

09:12 hpt has quit [Ping timeout: 264 seconds]

09:23 changcheng has joined #systemtap

09:34 orivej has quit [Ping timeout: 268 seconds]

09:39 orivej has joined #systemtap

10:17 orivej has quit [Ping timeout: 245 seconds]

10:24 orivej has joined #systemtap

10:33 orivej has quit [Ping timeout: 250 seconds]

10:34 orivej has joined #systemtap

10:40 _whitelogger has joined #systemtap

10:43 orivej has quit [Ping timeout: 264 seconds]

11:01 <fche> goyda, hi

11:01 <fche> both those different cpu flags are timekeeping related, and that could have some impact though I'd be surprised at 20%

11:02 <fche> is it possible that the vms are running with different levels of spectre/meltdown protection?

11:24 orivej has joined #systemtap

11:36 khaled has quit [Ping timeout: 252 seconds]

11:38 <goyda> thanks. I did check that, neither is patched for meltdown/spectre. Also, I wrote a test program that does not make any system calls. It basically increments an integer in a loop. I ran the program as 'time a.out' on both vms. On one vm it consistently takes more time than on the other

11:45 khaled has joined #systemtap

11:57 <fche> goyda, is the time in the vms accurate? it's unlikely, but one could be running its clock slower than the other, making the computation seem faster

12:04 <goyda> Khaled, How do I check that?

12:07 <goyda> Will the top command also be affected by the timer? Because on one of the machines top always shows more CPU consumption for the same process than on the other vm

12:21 orivej has quit [Ping timeout: 264 seconds]

12:22 <fche> it's a long shot -- if only one process in the vm is affected, then I would not suspect this sort of thing

12:23 <fche> I might move on to other things like caching behaviour, which perhaps stap/perf stats can help with. Maybe one of the vms runs on a core that has a cache-hogging sibling

12:29 <goyda> Ok, the test program I wrote simply increments an integer, not sure if caching can cause any problem.

12:29 <goyda>
    #include <iostream>
    #include <unistd.h>
    void func() { volatile size_t i = 0; while(i < 10000) { ++i; } }
    int main() { for(int i = 0; i < 1000000; ++i) { func(); } }

12:30 <fche> volatile ... so the compiler is probably pushing it into ram rather than register, so yeah caching could play a role

12:30 <fche> if there were contention

12:31 <goyda> Looks interesting. Is there any way I can confirm it?

12:32 <goyda> perf stat a.out does not seem to print very much.

12:32 <fche> compare it running on the two machines

12:35 <goyda> Unfortunately it is printing "not supported" for everything. Looks like something is missing on these vms.

12:35 <goyda> [root@fastVM ~]# perf stat -d ./a.out
    Performance counter stats for './a.out':
       18559.135247  task-clock (msec)   #  1.000 CPUs utilized
                 29  context-switches    #  0.002 K/sec
                  0  cpu-migrations      #  0.000 K/sec
                305  page-faults         #  0.016 K/sec
    <not supported>  cycles
    <not supported>  instructions
    <not su

12:37 <goyda> [root@slowvm ~]# perf stat -d ./a.out
    Performance counter stats for './a.out':
       21625.409915  task-clock (msec)   #  1.000 CPUs utilized
                 25  context-switches    #  0.001 K/sec
                  0  cpu-migrations      #  0.000 K/sec
                305  page-faults         #  0.014 K/sec
    <not supported>  cycles
    <not supported>  instructions
    <not su

12:37 sscox has joined #systemtap

12:38 <goyda> Let me try to remove volatile and try again. Hopefully it does not get optimized out

12:38 <fche> hm can you reconfigure these vms for the hypervisor to pass through hardware perfcounters?

12:38 wcohen has quit [Ping timeout: 246 seconds]

12:39 <fche> with kvm it's something like cpu-model=host-passthru

12:44 <goyda> thanks. Unfortunately I don't have permission to reconfigure it :-(. I do have root access on the vms, can I do something with that?

12:44 sscox has quit [Ping timeout: 255 seconds]

12:45 sscox has joined #systemtap

12:51 orivej has joined #systemtap

12:51 <goyda> Anything else I can check?

12:52 <fche> do you know what other workload is running on the hosts supporting these vms?

12:52 <fche> it isn't one host for both, is it?

12:57 <goyda> I am not sure about the workload on these hosts.

12:58 <goyda> No, these vms are located at 2 different sites, so they are not on the same host

13:00 orivej has quit [Ping timeout: 246 seconds]

13:02 <fche> ok. yeah with the limited tooling available, not sure what else to suggest

13:02 <fche> I'd want to look at the hosts & kernel & the hypervisor configuration

13:05 <goyda> Do you want me to paste any command's output from both vms?

13:05 <fche> not sure you can get enough info. the /proc/cpuinfo part you already collected, but kvm can/will lie

13:06 <fche> that's why I would start -at the host-

13:06 <fche> a vm is a prison in more ways than one :)

13:10 <goyda> :-). Let me try to get some info on the host side (it is a bit of a bureaucratic process to get these details, so fingers crossed :-( )

13:12 <goyda> One question on "cpu-model=host-passthru" option, will it affect the performance?

13:12 <goyda> or anything else of the vms?

13:12 <fche> I don't think so, most likely, except that the host cpu's perfctr facilities would be available to the guest

13:13 <fche> so you have at least a chance at checking up on those caching stats

13:24 <goyda> Ok, let me give it a shot then. Thanks a lot. It was nice talking to you guys

13:24 <fche> no problem, let us know what you find

13:27 wcohen has joined #systemtap

13:34 orivej has joined #systemtap

13:35 goyda has quit [Ping timeout: 256 seconds]

13:47 amneg_ has quit [Quit: Leaving]

13:56 orivej has quit [Ping timeout: 246 seconds]

14:28 pviktori has quit [Ping timeout: 246 seconds]

14:30 pviktori has joined #systemtap

14:52 goyda has joined #systemtap

14:57 goyda has quit [Ping timeout: 256 seconds]

14:57 tromey has joined #systemtap

16:56 slowfranklin has joined #systemtap

17:39 slowfranklin has quit [Quit: slowfranklin]

18:29 mjw has quit [Quit: Leaving]

18:49 slowfranklin has joined #systemtap

19:02 wcohen has quit [Ping timeout: 246 seconds]

19:03 orivej has joined #systemtap

19:15 wcohen has joined #systemtap

19:31 sscox has quit [Ping timeout: 246 seconds]

19:32 wcohen has quit [Ping timeout: 246 seconds]

19:33 wcohen has joined #systemtap

19:42 orivej has quit [Ping timeout: 245 seconds]

19:44 sscox has joined #systemtap

19:58 slowfranklin has quit [Quit: slowfranklin]

20:07 khaled has quit [Remote host closed the connection]

20:21 mjw has joined #systemtap

20:22 irker256 has joined #systemtap

20:22 <irker256> systemtap: juddin systemtap.git:refs/heads/jafeer/pr23074 * release-4.0-186-gb73d653 / dwflpp.cxx dwflpp.h loc2stap.cxx scripts/dump-parameter-ref.sh tapsets.cxx: Some code clean up http://tinyurl.com/y5kg67ed

20:23 tromey has quit [Quit: ERC (IRC client for Emacs 26.1)]

20:48 orivej has joined #systemtap

21:00 orivej has quit [Ping timeout: 245 seconds]

21:10 sscox has quit [Ping timeout: 255 seconds]

21:34 wcohen has quit [Ping timeout: 246 seconds]

22:27 wcohen has joined #systemtap

22:37 mjw has quit [Quit: Leaving]

23:22 irker256 has quit [Quit: transmission timeout]