fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
hpt has joined #systemtap
changcheng has quit [Quit: WeeChat 1.9.1]
sscox has joined #systemtap
hpt has quit [Ping timeout: 252 seconds]
hpt has joined #systemtap
yog_ has joined #systemtap
yog_ has quit [Ping timeout: 252 seconds]
_whitelogger has joined #systemtap
orivej has quit [Ping timeout: 250 seconds]
sscox has quit [Ping timeout: 268 seconds]
sscox has joined #systemtap
yog_ has joined #systemtap
yog_ has quit [Remote host closed the connection]
yog_ has joined #systemtap
slowfranklin has joined #systemtap
yog_ has quit [Remote host closed the connection]
yog_ has joined #systemtap
sscox has quit [Ping timeout: 252 seconds]
yog_ has quit [Remote host closed the connection]
yog_ has joined #systemtap
khaled has joined #systemtap
slowfranklin has quit [Quit: slowfranklin]
gregwork has quit [Ping timeout: 268 seconds]
gregwork has joined #systemtap
gregwork has quit [Max SendQ exceeded]
gregwork has joined #systemtap
gregwork has quit [Max SendQ exceeded]
gregwork has joined #systemtap
mjw has joined #systemtap
slowfranklin has joined #systemtap
slowfranklin has quit [Read error: Connection reset by peer]
slowfranklin has joined #systemtap
gregwork has quit [Ping timeout: 255 seconds]
gregwork has joined #systemtap
slowfranklin has quit [Read error: Connection reset by peer]
slowfranklin has joined #systemtap
slowfranklin has quit [Read error: Connection reset by peer]
slowfranklin has joined #systemtap
orivej has joined #systemtap
goyda has joined #systemtap
<goyda> Hi, I am new here. I am trying to figure out a problem that has been bugging me a bit. I am not sure if it can be solved by systemtap, but I will give it a shot anyway.
slowfranklin has left #systemtap [#systemtap]
<goyda> I have 2 RHEL vms installed on 2 cloud instances which have identical hardware configurations and the same OS version, glibc, OS patch level etc. I am running the exact same binary on both machines with the same input load. On one of the vms, the process consumes 20% more CPU than on the other.
<goyda> I am wondering if there is any way to find out why one is consuming more CPU than the other.
<goyda> The lscpu output on both looks the same except for a couple of flags. The one where the CPU consumption is lower has 2 extra flags, arat and tsc_adjust. Will that make any difference?
hpt has quit [Ping timeout: 264 seconds]
changcheng has joined #systemtap
orivej has quit [Ping timeout: 268 seconds]
orivej has joined #systemtap
orivej has quit [Ping timeout: 245 seconds]
orivej has joined #systemtap
orivej has quit [Ping timeout: 250 seconds]
orivej has joined #systemtap
_whitelogger has joined #systemtap
orivej has quit [Ping timeout: 264 seconds]
<fche> goyda, hi
<fche> both those different cpu flags are timekeeping related, and that could have some impact though I'd be surprised at 20%
<fche> is it possible that the vms are running with different levels of spectre/meltdown protection?
orivej has joined #systemtap
khaled has quit [Ping timeout: 252 seconds]
<goyda> thanks. I did check that, neither is patched for meltdown/spectre. Also, I wrote a test program that does not make any system calls. It basically increments an integer in a loop. I ran the program as 'time a.out' on both vms. On one vm it consistently takes more time than on the other.
khaled has joined #systemtap
<fche> goyda, is the time in the vms accurate? it's unlikely, but one could be running its clock slower than the other, making the computation seem faster
<goyda> Khaled, How do I check that?
<goyda> Will the top command also be affected by the timer? Because on one of the machines top always shows more CPU consumption for the same process than on the other vm.
orivej has quit [Ping timeout: 264 seconds]
<fche> it's a long shot -- if only one process in the vm is affected, then I would not suspect this sort of thing
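Since both differing CPU flags relate to timekeeping, one thing that can be compared from inside each guest is which kernel clocksource is active. The following is a minimal sketch that reads the standard Linux sysfs entry; the helper names `read_first_line` and `current_clocksource` are made up for illustration, and a matching clocksource on both guests would not by itself prove that either clock is accurate.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: stores the first line of `path` in `out`.
// Returns false if the file cannot be opened or read.
bool read_first_line(const std::string& path, std::string& out) {
    std::ifstream in(path);
    if (!in) return false;
    return static_cast<bool>(std::getline(in, out));
}

// Fetches the active clocksource name (for example "tsc" or "kvm-clock"),
// if the kernel exposes it at the usual sysfs location.
bool current_clocksource(std::string& out) {
    return read_first_line(
        "/sys/devices/system/clocksource/clocksource0/current_clocksource", out);
}
```

Calling `current_clocksource` on each vm and comparing the two strings is a cheap first check; judging actual clock drift would still need an external time reference.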
<fche> I might move on to other things like caching behaviour, which perhaps stap/perf stats can help with. Maybe one of the vms runs on a core that has a cache-hogging sibling
<goyda> Ok, the test program I wrote simply increments an integer, not sure if caching can cause any problem.
<goyda> #include <iostream>
<goyda> #include <unistd.h>
<goyda> void func() { volatile size_t i = 0; while (i < 10000) { ++i; } }
<goyda> int main() { for (int i = 0; i < 1000000; ++i) { func(); } }
<fche> volatile ... so the compiler is probably pushing it into ram rather than register, so yeah caching could play a role
<fche> if there were contention
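To illustrate the point about volatile, here is a small sketch with two counting loops. The function names are hypothetical. With a volatile counter every read and write has to be performed, which on typical builds means a load and a store to a stack slot on each iteration (usually served from the L1 cache rather than from RAM proper). Without volatile, an optimizing compiler is free to keep the counter in a register or collapse the loop entirely.

```cpp
#include <cstddef>

// Every access to a volatile object must actually happen, so the
// counter cannot live purely in a register or be folded away.
std::size_t count_volatile(std::size_t limit) {
    volatile std::size_t i = 0;
    while (i < limit) {
        i = i + 1;
    }
    return i;
}

// Same loop without volatile: at -O2 a compiler may reduce this
// to the equivalent of "return limit".
std::size_t count_plain(std::size_t limit) {
    std::size_t i = 0;
    while (i < limit) {
        ++i;
    }
    return i;
}
```

Timing both variants on each vm would show whether the gap follows the memory traffic of the volatile loop or is there regardless.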
<goyda> Looks interesting. Is there any way I can confirm it?
<goyda> perf stat a.out does not seem to print very much.
<fche> compare it running on the two machines
<goyda> Unfortunately it prints <not supported> for everything. Looks like something is not enabled on these vms.
<goyda> [root@fastVM ~]# perf stat -d ./a.out
<goyda> Performance counter stats for './a.out':
<goyda>   18559.135247 task-clock (msec) # 1.000 CPUs utilized
<goyda>   29 context-switches # 0.002 K/sec
<goyda>   0 cpu-migrations # 0.000 K/sec
<goyda>   305 page-faults # 0.016 K/sec
<goyda>   <not supported> cycles
<goyda>   <not supported> instructions
<goyda>   <not su
<goyda> [root@slowvm ~]# perf stat -d ./a.out
<goyda> Performance counter stats for './a.out':
<goyda>   21625.409915 task-clock (msec) # 1.000 CPUs utilized
<goyda>   25 context-switches # 0.001 K/sec
<goyda>   0 cpu-migrations # 0.000 K/sec
<goyda>   305 page-faults # 0.014 K/sec
<goyda>   <not supported> cycles
<goyda>   <not supported> instructions
<goyda>   <not su
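The two task-clock figures can be turned into a relative slowdown with one line of arithmetic. The sketch below does that; the function and constant names are illustrative only, and the two values are copied from the perf stat runs above.

```cpp
// Relative slowdown of `slow_ms` compared to `fast_ms`, in percent.
double percent_slower(double fast_ms, double slow_ms) {
    return (slow_ms - fast_ms) / fast_ms * 100.0;
}

// task-clock values (msec) from the fastVM and slowvm runs.
const double kFastVmMs = 18559.135247;
const double kSlowVmMs = 21625.409915;
```

`percent_slower(kFastVmMs, kSlowVmMs)` comes out at roughly 16.5, so the pure-CPU test loop shows a gap of the same order as the 20% seen with the real workload.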
sscox has joined #systemtap
<goyda> Let me try to remove volatile and try again. Hopefully it does not get optimized out
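One way to drop volatile without letting the loop be optimized away is an empty inline-asm statement that claims to modify the counter. This is a sketch assuming GCC or Clang (extended asm is a compiler extension, not standard C++), and the function name is hypothetical.

```cpp
#include <cstddef>

// Assumes GCC or Clang. The empty asm statement is declared to read and
// write `i` ("+r"), so the compiler must carry out every increment, but
// it may still keep `i` in a register instead of memory.
std::size_t count_with_barrier(std::size_t limit) {
    std::size_t i = 0;
    while (i < limit) {
        ++i;
        asm volatile("" : "+r"(i));
    }
    return i;
}
```

If this register-only loop runs at the same speed on both vms while the volatile version does not, that would point toward memory or cache effects; if the gap persists, it points elsewhere.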
<fche> hm can you reconfigure these vms for the hypervisor to pass through hardware perfcounters?
wcohen has quit [Ping timeout: 246 seconds]
<fche> with kvm it's something like cpu-model=host-passthru
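Whether the hypervisor exposes hardware counters to a guest can also be probed directly with the Linux `perf_event_open` system call, which is what perf uses underneath. The following is a minimal sketch; the function name is made up, and the exact errno seen in a guest without a virtual PMU is an assumption (it is often ENOENT, but that may vary by kernel and hypervisor).

```cpp
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>

// Tries to open a hardware cycle counter for the calling thread.
// Returns 0 on success, otherwise the errno value reported by the kernel.
int try_open_cycle_counter() {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;        // only probing, never started
    attr.exclude_kernel = 1;  // user-space cycles only
    attr.exclude_hv = 1;

    // pid = 0 (this thread), cpu = -1 (any), group_fd = -1, flags = 0
    long fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0) {
        return errno;
    }
    close(static_cast<int>(fd));
    return 0;
}
```

A nonzero result on both vms would be consistent with perf stat reporting cycles and instructions as not supported.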
<goyda> thanks, unfortunately I don't have permission to reconfigure it :-(. I do have root on the vms, can I do something with that?
sscox has quit [Ping timeout: 255 seconds]
sscox has joined #systemtap
orivej has joined #systemtap
<goyda> Anything else I can check?
<fche> do you know what other workload is running on the hosts supporting these vms?
<fche> it isn't one host for both, is it?
<goyda> I am not sure about the workload on these hosts.
<goyda> No these vms are located in 2 different sites, so they are not same hosts
orivej has quit [Ping timeout: 246 seconds]
<fche> ok. yeah with the limited tooling available, not sure what else to suggest
<fche> I'd want to look at the hosts & kernel & the hypervisor configuration
<goyda> Do you want me to paste any command's output from both vms?
<fche> not sure you can get enough info. the /proc/cpuinfo part you already collected, but kvm can/will lie
<fche> that's why I would start -at the host-
<fche> a vm is a prison in more ways than one :)
<goyda> :-). Let me try to get some info on the host side (it is a bit of a bureaucratic process to get these details, so fingers crossed :-( )
<goyda> One question on "cpu-model=host-passthru" option, will it affect the performance?
<goyda> or anything else of the vms?
<fche> I don't think so, most likely, except that the host cpu's perfctr facilities would be available to the guest
<fche> so you have at least a chance at checking up on those caching stats
<goyda> Ok, let me give it a shot then. Thanks a lot. It was nice talking to you guys
<fche> no problem, let us know what you find
wcohen has joined #systemtap
orivej has joined #systemtap
goyda has quit [Ping timeout: 256 seconds]
amneg_ has quit [Quit: Leaving]
orivej has quit [Ping timeout: 246 seconds]
pviktori has quit [Ping timeout: 246 seconds]
pviktori has joined #systemtap
goyda has joined #systemtap
goyda has quit [Ping timeout: 256 seconds]
tromey has joined #systemtap
slowfranklin has joined #systemtap
slowfranklin has quit [Quit: slowfranklin]
mjw has quit [Quit: Leaving]
slowfranklin has joined #systemtap
wcohen has quit [Ping timeout: 246 seconds]
orivej has joined #systemtap
wcohen has joined #systemtap
sscox has quit [Ping timeout: 246 seconds]
wcohen has quit [Ping timeout: 246 seconds]
wcohen has joined #systemtap
orivej has quit [Ping timeout: 245 seconds]
sscox has joined #systemtap
slowfranklin has quit [Quit: slowfranklin]
khaled has quit [Remote host closed the connection]
mjw has joined #systemtap
irker256 has joined #systemtap
<irker256> systemtap: juddin systemtap.git:refs/heads/jafeer/pr23074 * release-4.0-186-gb73d653 / dwflpp.cxx dwflpp.h loc2stap.cxx scripts/dump-parameter-ref.sh tapsets.cxx: Some code clean up http://tinyurl.com/y5kg67ed
tromey has quit [Quit: ERC (IRC client for Emacs 26.1)]
orivej has joined #systemtap
orivej has quit [Ping timeout: 245 seconds]
sscox has quit [Ping timeout: 255 seconds]
wcohen has quit [Ping timeout: 246 seconds]
wcohen has joined #systemtap
mjw has quit [Quit: Leaving]
irker256 has quit [Quit: transmission timeout]