fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
hpt has quit [Ping timeout: 276 seconds]
hpt has joined #systemtap
scox has quit [Ping timeout: 252 seconds]
scox has joined #systemtap
irker451 has quit [Quit: transmission timeout]
hchiramm has joined #systemtap
hkshaw has joined #systemtap
ananth has joined #systemtap
ravi has joined #systemtap
ego_ has joined #systemtap
scox has quit [Ping timeout: 244 seconds]
srikar_away is now known as srikar
nkambo has quit [Ping timeout: 258 seconds]
nkambo has joined #systemtap
ego_ has quit [Ping timeout: 244 seconds]
ego_ has joined #systemtap
hpt has quit [Quit: Lost terminal]
ego_ has quit [Ping timeout: 258 seconds]
ego_ has joined #systemtap
ego_ has quit [Ping timeout: 250 seconds]
ego_ has joined #systemtap
ananth has quit [Ping timeout: 240 seconds]
hkshaw has quit [Ping timeout: 252 seconds]
ravi has quit [Quit: Leaving]
srikar is now known as srikar_away
ericlee has quit [Ping timeout: 264 seconds]
ericlee has joined #systemtap
nkambo has quit [Ping timeout: 240 seconds]
ananth has joined #systemtap
hkshaw has joined #systemtap
flu has quit [Remote host closed the connection]
fche_ has joined #systemtap
nkambo has joined #systemtap
nkambo has quit [Client Quit]
brolley has joined #systemtap
nkambo has joined #systemtap
drsmith has joined #systemtap
drsmith has left #systemtap [#systemtap]
drsmith has joined #systemtap
scox has joined #systemtap
hpt has joined #systemtap
srikar_away is now known as srikar
tromey has joined #systemtap
wcohen has quit [Ping timeout: 276 seconds]
ananth has quit [Quit: Leaving]
ego_ has quit [Ping timeout: 252 seconds]
wcohen has joined #systemtap
hpt has quit [Quit: leaving]
fche_ has quit [Ping timeout: 264 seconds]
ego_ has joined #systemtap
irker988 has joined #systemtap
<irker988>
systemtap: dsmith systemtap.git:refs/heads/master * release-3.0-212-g543563e / testsuite/systemtap.examples/process/procmod_watcher.stp: Update procmod_watcher.stp example for more modern kernels. http://tinyurl.com/hxls6fk
<drsmith>
pwithnall: it looks like we've got a bug in the '--rlimit-as=' handling, I'll try to fix it up today
<pwithnall>
drsmith: yeah, although that is probably unrelated to the underlying failure, which I guess is a massive allocation somewhere, although I haven’t debugged
<pwithnall>
Let me know if you need me to reproduce and investigate; I should have time today and tomorrow
<drsmith>
at least in my shell, the default rlimit-as value is unlimited, so you really should just be able to leave it alone
<pwithnall>
if I run without --rlimit-as then I get a std::bad_alloc and systemtap gracefully exits
<pwithnall>
if I run with --rlimit-as=$big_number I also get an abort, presumably due to hitting the rlimit
<drsmith>
your script (on the surface at least) really isn't that big, I'm not sure why it is trying to do that massive allocation
<drsmith>
perhaps the symbol tables for all of your libraries are what is causing the abort
<pwithnall>
Maybe, although I can reproduce the same problem with -c "echo hi"
<pwithnall>
Does stap-server load all the symbol tables for all the libraries on the system, or just those mapped in by the target process?
<drsmith>
the --ldd option causes stap to load all the libraries that ldd thinks are necessary to run your program
<pwithnall>
I just tried, and it aborts without --ldd too
<drsmith>
in the case of -c "echo hi", that is really bash, and it just loads in 5 libraries
<drsmith>
I wonder if there isn't a client/server bug here
<drsmith>
why did you decide to go the client/server route?
<pwithnall>
When I started this project, --dyninst didn’t work for me
<pwithnall>
It does seem to work now though, so I’m experimenting with it at the moment
<pwithnall>
(I’m on Fedora 24)
<drsmith>
if I were you, I'd remove the --unprivileged option and try a local compile first
<pwithnall>
Using that GLib tapset and dunfell-record.stp with --dyninst does work, which is interesting
<drsmith>
pwithnall: I'll take a look at that tapset after I finish eating lunch
<pwithnall>
drsmith: thanks, no rush :)
<pwithnall>
enjoy your lunch
<jistone>
pwithnall, I'm glad to hear dyninst is working for you. I haven't been able to see whether people are actually using that mode much
<pwithnall>
jistone: nobody’s filing bugs against it?
ravi has joined #systemtap
nkambo has quit [Ping timeout: 260 seconds]
<drsmith>
pwithnall: Let's try something fairly easy, if you've got the time. Replace all the 'usymname' calls in your script with something else ("USYMNAME" perhaps?) and try to compile your script (without --ldd). If that works, then the problem is certainly in the symbol reading code.
nkambo has joined #systemtap
ravi has quit [Quit: Leaving]
<jistone>
pwithnall, I see bugs from our QA folks, but I'm not sure that counts as real users. :)
<pwithnall>
drsmith: I replaced usymname() by glib_usymname() which returns "" unconditionally, and:
<pwithnall>
Pass 1: parsed user script and 188 library scripts using 595132virt/396692res/7744shr/388756data kb, in 1470usr/120sys/1594real ms.
<pwithnall>
std::bad_alloc
<drsmith>
interesting
<jistone>
it may just be that our default server limits are too low. we used to get by with much smaller limits, but glib/qemu/etc. started adding large tapsets, and we grew as a result
<pwithnall>
drsmith: I’ve trimmed dunfell-record.stp down to the following and I still get std::bad_alloc:
<pwithnall>
probe glib.main_context_new {
<pwithnall>
printdln (",", "g_main_context_new", gettimeofday_us (), tid (), context);
<pwithnall>
}
<pwithnall>
The same happens if I use "probe begin{}"
<drsmith>
wait, what?
<pwithnall>
If I have an empty script, stap detects that and exits early with no bad_alloc
<drsmith>
stap -ve 'probe begin { exit() }' gives you a bad_alloc?
<pwithnall>
$ stap -ve 'probe begin { exit() }'
<pwithnall>
Using a compile server.
<pwithnall>
Pass 1: parsed user script and 188 library scripts using 595132virt/396300res/7616shr/388756data kb, in 1550usr/80sys/1637real ms.
<pwithnall>
std::bad_alloc
<pwithnall>
Yes, apparently it does!
<drsmith>
ok, something wacky is going on
<drsmith>
try running that basic stap command as root (using sudo is fine) - that should run it locally
<pwithnall>
It’s not the GLib stp scripts either; I just removed them from my INCLUDES and probe begin still fails
<pwithnall>
Running as root works
<drsmith>
ok, is your server the same machine or a different machine?
<pwithnall>
Same machine, though I’ll check the logs to make sure there’s nothing funky on the network
<pwithnall>
Has anybody been secretly working on this?
<jistone>
pwithnall, nope -- most of my "stapdyn" work for a while has just been improving dyninst itself
ton31337 has quit [Ping timeout: 265 seconds]
<jistone>
pwithnall, dyninst's stackwalkerAPI can do first-party unwinding (i.e. in-process)
<jistone>
I'm not sure how well it interacts with dyninstAPI instrumentation
<pwithnall>
jistone: Do I have to call that from the process being probed?
<jistone>
(I don't think this stackwalker was part of the dyninst distribution when I filed 14703)
<jistone>
pwithnall, I think it can be used either way, within the process or from a ptracer
<jistone>
but running from a probe handler would be within the process
<pwithnall>
Right. Any way to include this in a user stap script?
<pwithnall>
(Sorry if my questions are stupid; still learning everything)
<drsmith>
fche/jistone: got a question about the '--rlimit-FOO' options
<drsmith>
pwithnall figured out that they don't quite work correctly
<drsmith>
but, it could depend on how you read the man page
<drsmith>
Here's the man page section:
<drsmith>
--rlimit-as=NUM
<drsmith>
Specify the maximum size of the process's virtual memory (address space), in bytes. If nothing is specified, no limits are imposed.
<drsmith>
That last sentence "if nothing is specified, no limits are imposed" is wrong.
<drsmith>
Right now, if you say '--rlimit-as=', that translates to '--rlimit-as=0' (meaning no virtual memory allowed)
<drsmith>
so does "no limits are imposed" mean the option doesn't do anything, or does it mean set it to RLIM_INFINITY?
<drsmith>
I believe it means the latter, but I thought I'd ask
<drsmith>
Actually, the man page for setrlimit tends to support the RLIM_INFINITY viewpoint, since it says:
<drsmith>
The value RLIM_INFINITY denotes no limit on a resource (both in the structure returned by getrlimit() and in the structure passed to setrlimit()).
<jistone>
drsmith, they are working correctly - it's just that stap-server has limits in its rc file
<drsmith>
they aren't working correctly if you specify '--rlimit-as=' with no value
<jistone>
pwithnall, no worries, your questions are not stupid
<drsmith>
that value ends up being 0
<jistone>
well, perhaps we could improve that, but you still wouldn't be able to increase beyond what the server already set
<jistone>
I believe "nothing specified" really means if you didn't have the "--rlimit-as" option at all
<drsmith>
what I'm talking about isn't a client/server problem, just a problem with '--rlimit-as' in general
<drsmith>
if what you believe is true, then a '0' should mean 'do nothing'
<pwithnall>
jistone: if you are unable to increase the limit beyond what the server already set, it would be useful to print a warning
<jistone>
IMO we shouldn't accept blank "--rlimit-as=" at all. that should be a number-parsing error
<drsmith>
one "fix" for this problem would be to remove that language from the man page and error if there isn't a value specified
<drsmith>
(but I really kind of think whoever wrote that meant for nothing to mean RLIM_INFINITY)
<jistone>
or fix/clarify that language that it's referring to the total lack of the option
<drsmith>
yeah, I'm not sure I see the point of being able to specify an option that doesn't do anything
<jistone>
there's *nothing* in the code that even tries to make it RLIM_INFINITY, so it's hard to claim that was even the intent
<jistone>
anyway
<jistone>
the rlimit options are using unchecked strtoul
<drsmith>
right, that part is easy to fix
<drsmith>
that's what the man page sounds like to me
<drsmith>
I wonder if RLIM_INFINITY support got removed at some point along the way
ton31337 has joined #systemtap
<jistone>
I think the manpage is just poorly expressed. "If nothing is specified" to me implies "if you don't use --rlimit-as at all"
<jistone>
in other words, it's saying this option will "specify the max", so unspecified means no option
<jistone>
whatever the original intent, we can decide and clarify it now
<drsmith>
see to me it means RLIM_INFINITY, especially since the setrlimit man page calls RLIM_INFINITY "no limit on a resource"
<drsmith>
true, we can certainly decide/clarify now, we just need a plan
<jistone>
since this is also setting the hard limit, RLIM_INFINITY doesn't even make sense to me. only root can increase hard limits
<jistone>
so you can only "set" the hard limit to infinity if it was already there
ton31337 has quit [Ping timeout: 260 seconds]
<jistone>
actually we do sort of check the strtoul, looking at *num_endptr to make sure we parsed to the end
<jistone>
but that misses an empty option
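The parsing gap discussed above can be illustrated with a small C sketch. This is not systemtap's actual code: `parse_limit` and `parse_limit_end_only` are hypothetical helpers written only to show why a check on the end pointer alone lets an empty `--rlimit-as=` value through as 0.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not systemtap's code): mimics a check that only
 * looks at *end. For an empty argument strtoul() converts nothing,
 * returns 0 and leaves end == arg, so *end is '\0' and the argument
 * is accepted as the value 0. */
static bool parse_limit_end_only(const char *arg, unsigned long *out)
{
    char *end = NULL;

    *out = strtoul(arg, &end, 10);
    return *end == '\0';
}

/* Hypothetical stricter helper: also rejects arguments where no digits
 * were consumed and values that overflow unsigned long.
 * Note: strtoul() itself still accepts leading whitespace and a sign;
 * an even stricter parser would reject those as well. */
static bool parse_limit(const char *arg, unsigned long *out)
{
    char *end = NULL;
    unsigned long value;

    errno = 0;
    value = strtoul(arg, &end, 10);

    if (end == arg)          /* nothing parsed: "" or "abc" */
        return false;
    if (*end != '\0')        /* trailing junk such as "12x" */
        return false;
    if (errno == ERANGE)     /* out of range for unsigned long */
        return false;

    *out = value;
    return true;
}
```

With these helpers, `parse_limit_end_only("", &v)` succeeds and leaves `v` at 0, while `parse_limit("", &v)` reports a failure that a caller could turn into a number-parsing error.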
<drsmith>
right, that's the only checking we do
<drsmith>
(but then the error message is wrong)
<jistone>
the actual call to setrlimit is unchecked
<jistone>
we should probably check that and have a warning, as pwithnall suggests
<drsmith>
you are looking at the wrong setrlimit() call
<drsmith>
there are 2 in rlimit-as
<jistone>
oh, I am, that's the core
<drsmith>
right
<drsmith>
the 1st is checked
<drsmith>
I'm ok with making --rlimit-as (and the others) require an argument and updating the man page
<drsmith>
sound like a plan?
<jistone>
yeah
<drsmith>
ok, I'll see what I can do
<jistone>
pwithnall, fwiw you *do* get a warning when unable to increase. it's just that the blank/0 option didn't actually mean infinity
<pwithnall>
right
<jistone>
$ stap --rlimit-as=1000000000 --rlimit-as=2000000000 -l begin
<jistone>
Unable to set resource limits for rlimit_as : Operation not permitted
<jistone>
begin
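The "Operation not permitted" message in that output follows from how hard limits work: once a process has lowered a hard limit, it needs extra privilege to raise it again. A minimal C sketch of that sequence is below. `lower_then_raise` is a hypothetical helper, and setting the soft and hard limit to the same value is an assumption based on the remark above that the option also sets the hard limit.

```c
#include <errno.h>
#include <sys/resource.h>

/* Hypothetical helper: set both the soft and hard limit of `resource`
 * to `low`, then try to raise both to `high`.
 * Returns 0 if the raise succeeded, the errno value if the raise
 * failed, or -1 if the initial lowering itself failed.
 * Caution: for an unprivileged caller the lowered hard limit stays in
 * effect for the rest of the process. */
static int lower_then_raise(int resource, rlim_t low, rlim_t high)
{
    struct rlimit lim;

    lim.rlim_cur = low;
    lim.rlim_max = low;
    if (setrlimit(resource, &lim) != 0)
        return -1;

    lim.rlim_cur = high;
    lim.rlim_max = high;
    if (setrlimit(resource, &lim) != 0)
        return errno;        /* typically EPERM without privilege */

    return 0;
}
```

Calling `lower_then_raise(RLIMIT_CORE, 0, 1)` as an ordinary user is expected to return `EPERM`; a sufficiently privileged process may get 0 instead. The sketch uses `RLIMIT_CORE` rather than `RLIMIT_AS` only so that experimenting with it does not starve the calling process of address space.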
<drsmith>
ok, next question - if the --rlimit-as option isn't present, doesn't convert properly, or setrlimit() fails, do we keep going (as we do now) or exit?
<jistone>
"isn't present" is the basic unconstrained case (which I thought the manpage is alluding to, or at least that's how *I* would clarify it)
<jistone>
(though it's still constrained from outside ulimit/rlimits as usual)
<jistone>
"doesn't convert" should be an error
<jistone>
if setrlimit fails -- that's more of a question
<jistone>
perhaps EPERM is fine to warn and continue, but error out on others
<drsmith>
ok, sounds reasonable
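The policy proposed here (warn and continue on EPERM, error out on anything else) could be sketched in C roughly as follows. The enum and function names are hypothetical and are not taken from the systemtap sources.

```c
#include <errno.h>
#include <sys/resource.h>

/* Hypothetical outcome codes for applying an --rlimit-* value. */
enum limit_action { LIMIT_OK, LIMIT_WARN, LIMIT_FATAL };

/* Map a setrlimit() return code and the errno it left behind onto the
 * policy from the discussion: success is fine, EPERM is a warning,
 * every other failure is treated as fatal. */
static enum limit_action classify_setrlimit_result(int rc, int err)
{
    if (rc == 0)
        return LIMIT_OK;
    if (err == EPERM)
        return LIMIT_WARN;
    return LIMIT_FATAL;
}

/* Apply `value` as both soft and hard limit (an assumption about how
 * the option behaves) and report which action the caller should take. */
static enum limit_action apply_limit(int resource, rlim_t value)
{
    struct rlimit lim;
    int rc;

    lim.rlim_cur = value;
    lim.rlim_max = value;
    rc = setrlimit(resource, &lim);
    return classify_setrlimit_result(rc, rc == 0 ? 0 : errno);
}
```

A caller would print a warning and keep going on `LIMIT_WARN`, and exit with an error on `LIMIT_FATAL`. Keeping the classification separate from the `setrlimit()` call makes the policy easy to test without changing any real limits.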
<jistone>
to be extra clear, by "isn't present" I don't mean "--rlimit-as=", that should be an error. I mean no "--rlimit-as" at all
<drsmith>
as far as the man page goes, I'm just going to delete that "if nothing is specified" sentence
<drsmith>
by definition, if you don't do anything, nothing should change
<drsmith>
(e.g. on the '-v' option, we don't say 'if -v isn't specified, stap isn't verbose')
<jistone>
sure
ton31337 has joined #systemtap
ton31337 has quit [Ping timeout: 252 seconds]
ton31337 has joined #systemtap
<irker988>
systemtap: dsmith systemtap.git:refs/heads/master * release-3.0-213-gcc72c6a / : Fix a '--rlimit-*' option problem identified by BZ1368188. http://tinyurl.com/hqz5opo
pwithnall has quit [Quit: pwithnall]
wcohen has quit [Ping timeout: 265 seconds]
ton31337 has quit [Ping timeout: 250 seconds]
tromey has quit [Quit: ERC (IRC client for Emacs 25.1.3)]