fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
<goyda>
Hi, I am new here. I am trying to figure out a problem that has been itching me a bit. I am not sure if it can be solved by systemtap, but I will give it a shot anyway.
<goyda>
I have 2 RHEL VMs installed on 2 cloud instances with identical hardware configurations and the same OS version, glibc, OS patch level, etc. I am running the exact same binary on both machines with the same input load. On one of the VMs, the process consumes 20% more CPU than on the other.
<goyda>
I am wondering if there is any way to find out why one is consuming more CPU than the other.
<goyda>
The lscpu command output on both looks the same except for a couple of flags. The one where the CPU consumption is lower has 2 extra flags, arat and tsc_adjust. Will that make any difference?
<fche>
goyda, hi
<fche>
both of those differing cpu flags are timekeeping related, and that could have some impact, though I'd be surprised at 20%
<fche>
is it possible that the vms are running with different levels of spectre/meltdown protection?
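One way to check that, sketched below as a small C++ helper (the sysfs path is real; the program itself is just an illustration): dump the kernel's reported mitigation state on each VM and diff the output. It assumes the running kernel is new enough to expose /sys/devices/system/cpu/vulnerabilities; if the directory is missing, the kernel predates the meltdown/spectre reporting patches and this check is inconclusive.

    // Sketch: print the kernel's reported state for each known CPU
    // vulnerability so the two VMs can be compared line by line.
    // Assumes /sys/devices/system/cpu/vulnerabilities exists (newer or
    // patched kernels); otherwise it just reports that it is missing.
    #include <dirent.h>
    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
        const std::string dir = "/sys/devices/system/cpu/vulnerabilities";
        DIR *d = opendir(dir.c_str());
        if (!d) {
            std::cerr << dir << " not found; kernel does not report mitigations\n";
            return 1;
        }
        while (dirent *e = readdir(d)) {
            std::string name = e->d_name;
            if (name == "." || name == "..")
                continue;
            std::ifstream f(dir + "/" + name);
            std::string line;
            std::getline(f, line);
            std::cout << name << ": " << line << "\n";
        }
        closedir(d);
        return 0;
    }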
<goyda>
Thanks. I did check that; neither is patched for meltdown/spectre. Also, I wrote a test program that does not make any system calls; it basically increments an integer in a loop. I ran the program as 'time a.out' on both VMs. On one VM it consistently takes more time than on the other.
<fche>
goyda, is the time in the vms accurate? it's unlikely, but one could be running its clock slower than the other, making the computation seem faster
<goyda>
fche, how do I check that?
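One crude check, sketched here (the sysfs path is standard; the 30-second interval and the idea of timing it against an outside reference are this sketch's own choices): print the active clocksource, which can differ between otherwise identical guests (e.g. kvm-clock vs tsc), and see whether a fixed guest-measured interval matches a stopwatch or a second host.

    // Sketch: report the guest's active clocksource and how much guest time
    // elapses across a fixed sleep, to be compared against an external
    // reference (a stopwatch, or the same run timed from another machine).
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <time.h>
    #include <unistd.h>

    int main() {
        std::ifstream cs("/sys/devices/system/clocksource/clocksource0/current_clocksource");
        std::string src;
        std::getline(cs, src);
        std::cout << "clocksource: " << src << "\n";

        timespec a, b;
        clock_gettime(CLOCK_MONOTONIC, &a);
        sleep(30);  // time this interval externally as well
        clock_gettime(CLOCK_MONOTONIC, &b);
        double elapsed = (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
        std::cout << "guest-visible elapsed: " << elapsed << " s\n";
        return 0;
    }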
<goyda>
Would the top command also be affected by the timer? Because on one of the machines, top always shows higher CPU consumption for the same process than on the other VM.
<fche>
it's a long shot -- if only one process in the vm is affected, then I would not suspect this sort of thing
<fche>
I might move on to other things like caching behaviour, which perhaps stap/perf stats can help with. Maybe one of the vms runs on a core that has a cache-hogging sibling
<goyda>
OK, the test program I wrote simply increments an integer; not sure if caching can cause any problem.
<goyda>
#include <iostream>
#include <unistd.h>

void func() {
    volatile size_t i = 0;
    while (i < 10000) {
        ++i;
    }
}

int main() {
    for (int i = 0; i < 1000000; ++i) {
        func();
    }
}
<fche>
volatile ... so the compiler is probably pushing it out to RAM rather than keeping it in a register, so yeah, caching could play a role
<fche>
if there were contention
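To make that concrete, here is a small side-by-side sketch; the timing harness and the non-volatile variant are additions for comparison only, and the loop bounds are copied from the pasted program. The volatile counter forces a load and a store to memory on every iteration, while the plain counter can live in a register or be folded away entirely by the optimizer.

    // Sketch: time the volatile busy loop against a plain-counter version.
    // Build with e.g. g++ -O2; the plain variant will typically collapse to
    // almost nothing, which is exactly the difference volatile makes.
    #include <chrono>
    #include <iostream>

    static void loop_volatile() {
        volatile size_t i = 0;
        while (i < 10000) ++i;      // real memory traffic each iteration
    }

    static size_t loop_plain() {
        size_t i = 0;
        while (i < 10000) ++i;      // may stay in a register or be folded away
        return i;                    // returned so the loop is not dead code
    }

    int main() {
        using clk = std::chrono::steady_clock;

        auto t0 = clk::now();
        for (int n = 0; n < 1000000; ++n) loop_volatile();
        auto t1 = clk::now();

        size_t sink = 0;
        for (int n = 0; n < 1000000; ++n) sink += loop_plain();
        auto t2 = clk::now();

        std::cout << "volatile: "
                  << std::chrono::duration<double>(t1 - t0).count() << " s\n"
                  << "plain:    "
                  << std::chrono::duration<double>(t2 - t1).count() << " s"
                  << " (sink=" << sink << ")\n";
        return 0;
    }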
<goyda>
Looks interesting. Is there any way I can confirm it?
<goyda>
'perf stat a.out' does not seem to be printing very many things.
<fche>
compare it running on the two machines
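If perf(1) itself is not showing much, one alternative sketch is to read a hardware counter directly around the hot loop with the perf_event_open(2) syscall; the chosen event, iteration counts, and error handling here are illustrative, and if the hypervisor does not expose a PMU to the guest the syscall will simply fail, which is a data point in itself.

    // Sketch: count cache misses around the hot loop from the pasted test
    // program, as a way to compare the two VMs without perf(1).
    #include <cerrno>
    #include <cstdint>
    #include <cstring>
    #include <iostream>
    #include <linux/perf_event.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static void func() {
        volatile size_t i = 0;
        while (i < 10000) ++i;
    }

    int main() {
        perf_event_attr attr;
        std::memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CACHE_MISSES;   // try HW_CPU_CYCLES too
        attr.disabled = 1;
        attr.exclude_kernel = 1;

        int fd = syscall(SYS_perf_event_open, &attr, 0 /*this process*/,
                         -1 /*any cpu*/, -1 /*no group*/, 0);
        if (fd < 0) {
            std::cerr << "perf_event_open failed: " << std::strerror(errno)
                      << " (PMU probably not exposed to this guest)\n";
            return 1;
        }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        for (int n = 0; n < 1000000; ++n) func();
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        uint64_t count = 0;
        if (read(fd, &count, sizeof(count)) == sizeof(count))
            std::cout << "cache misses: " << count << "\n";
        close(fd);
        return 0;
    }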
<goyda>
Unfortunately it is printing <not supported> for everything. Looks like something is not enabled on these VMs.