fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
hpt has quit [Ping timeout: 276 seconds]
hpt has joined #systemtap
scox has quit [Ping timeout: 252 seconds]
scox has joined #systemtap
irker451 has quit [Quit: transmission timeout]
hchiramm has joined #systemtap
hkshaw has joined #systemtap
ananth has joined #systemtap
ravi has joined #systemtap
ego_ has joined #systemtap
scox has quit [Ping timeout: 244 seconds]
srikar_away is now known as srikar
nkambo has quit [Ping timeout: 258 seconds]
nkambo has joined #systemtap
ego_ has quit [Ping timeout: 244 seconds]
ego_ has joined #systemtap
hpt has quit [Quit: Lost terminal]
ego_ has quit [Ping timeout: 258 seconds]
ego_ has joined #systemtap
ego_ has quit [Ping timeout: 250 seconds]
ego_ has joined #systemtap
ananth has quit [Ping timeout: 240 seconds]
hkshaw has quit [Ping timeout: 252 seconds]
ravi has quit [Quit: Leaving]
srikar is now known as srikar_away
ericlee has quit [Ping timeout: 264 seconds]
ericlee has joined #systemtap
nkambo has quit [Ping timeout: 240 seconds]
ananth has joined #systemtap
hkshaw has joined #systemtap
flu has quit [Remote host closed the connection]
fche_ has joined #systemtap
nkambo has joined #systemtap
nkambo has quit [Client Quit]
brolley has joined #systemtap
nkambo has joined #systemtap
drsmith has joined #systemtap
drsmith has left #systemtap [#systemtap]
drsmith has joined #systemtap
scox has joined #systemtap
hpt has joined #systemtap
srikar_away is now known as srikar
tromey has joined #systemtap
wcohen has quit [Ping timeout: 276 seconds]
ananth has quit [Quit: Leaving]
ego_ has quit [Ping timeout: 252 seconds]
wcohen has joined #systemtap
hpt has quit [Quit: leaving]
fche_ has quit [Ping timeout: 264 seconds]
ego_ has joined #systemtap
irker988 has joined #systemtap
<irker988> systemtap: dsmith systemtap.git:refs/heads/master * release-3.0-212-g543563e / testsuite/systemtap.examples/process/procmod_watcher.stp: Update procmod_watcher.stp example for more modern kernels. http://tinyurl.com/hxls6fk
pwithnall has joined #systemtap
<pwithnall> fche: hi, I filed https://bugzilla.redhat.com/show_bug.cgi?id=1368188 and was directed here
<pwithnall> Got time to talk about my use case?
<pwithnall> Basically, the use case is this: https://gitlab.com/pwithnall/dunfell
<pwithnall> I’m using a set of probe points in GLib to extract timing information about its main loop, to be presented as a graph against time
<pwithnall> Hence it’s an unprivileged user-space use of systemtap
ton31337 has quit [Ping timeout: 276 seconds]
<pwithnall> I’m currently playing with --dyninst and it’s no longer crashing, which is good, but it would also be good to fix the crash/bad_alloc
<pwithnall> The major thing on my wishlist at the moment is usymname() and backtrace support for --dyninst, which is https://sourceware.org/bugzilla/show_bug.cgi?id=14703
<drsmith> pwithnall: it looks like we've got a bug in the '--rlimit-as=' handling, I'll try to fix it up today
<pwithnall> drsmith: yeah, although that is probably unrelated to the underlying failure, which I guess is a massive allocation somewhere, although I haven’t debugged
<pwithnall> Let me know if you need me to reproduce and investigate; I should have time today and tomorrow
<drsmith> at least in my shell, the default rlimit-as value is unlimited, so you really should just be able to leave it alone
<pwithnall> if I run without --rlimit-as then I get a std::bad_alloc and systemtap gracefully exits
<pwithnall> if I run with --rlimit-as=$big_number I also get an abort, presumably due to hitting the rlimit
<drsmith> your script (on the surface at least) really isn't that big, I'm not sure why it is trying to do that massive allocation
<drsmith> perhaps the symbol tables for all of your libraries are what is causing the abort
<pwithnall> Maybe, although I can reproduce the same problem with -c "echo hi"
<pwithnall> Does stap-server load all the symbol tables for all the libraries on the system, or just those mapped in by the target process?
<drsmith> the --ldd option causes stap to load all the libraries that ldd thinks are necessary to run your program
<pwithnall> I just tried, and it aborts without --ldd too
<drsmith> in the case of -c "echo hi", that is really bash, and it just loads in 5 libraries
<drsmith> I wonder if there isn't a client/server bug here
<drsmith> why did you decide to go the client/server route?
<pwithnall> When I started this project, --dyninst didn’t work for me
<pwithnall> It does seem to work now though, so I’m experimenting with it at the moment
<pwithnall> (I’m on Fedora 24)
<drsmith> if I were you, I'd remove the --unprivileged option and try a local compile first
<pwithnall> as in: stap -vvvv -o "some-file.log" -c "echo hi" record/dunfell-record.stp
<pwithnall> ?
<pwithnall> That gives:
<pwithnall> Pass 1: parsed user script and 188 library scripts using 595148virt/396808res/7848shr/388772data kb, in 1620usr/130sys/2187real ms.
<pwithnall> std::bad_alloc
<pwithnall> (amongst other output)
<drsmith> interesting
<drsmith> I guess I need to look into the glib tapset and see what it is doing
<pwithnall> you will need the tapset from GLib master
<pwithnall> Using that GLib tapset and dunfell-record.stp with --dyninst does work, which is interesting
<drsmith> pwithnall: I'll take a look at that tapset after I finish eating lunch
<pwithnall> drsmith: thanks, no rush :)
<pwithnall> enjoy your lunch
<jistone> pwithnall, I'm glad to hear dyninst is working for you. I haven't been able to see whether people are actually using that mode much
<pwithnall> jistone: nobody’s filing bugs against it?
ravi has joined #systemtap
nkambo has quit [Ping timeout: 260 seconds]
<drsmith> pwithnall: Let's try something fairly easy, if you've got the time. Replace all the 'usymname' calls in your script with something else ("USYMNAME" perhaps?) and try to compile your script (without --ldd). If that works, then the problem is certainly in the symbol reading code.
nkambo has joined #systemtap
ravi has quit [Quit: Leaving]
<jistone> pwithnall, I see bugs from our QA folks, but I'm not sure that counts as real users. :)
<pwithnall> drsmith: I replaced usymname() by glib_usymname() which returns "" unconditionally, and:
<pwithnall> $ stap -v -c "echo hi" record/dunfell-record.stp
<pwithnall> Using a compile server.
<pwithnall> Pass 1: parsed user script and 188 library scripts using 595132virt/396692res/7744shr/388756data kb, in 1470usr/120sys/1594real ms.
<pwithnall> std::bad_alloc
<drsmith> interesting
<jistone> it may just be that our default server limits are too low. we used to get by with much smaller limits, but glib/qemu/etc. started adding large tapsets, and our memory use grew as a result
<pwithnall> drsmith: I’ve trimmed dunfell-record.stp down to the following and I still get std::bad_alloc:
<pwithnall> probe glib.main_context_new {
<pwithnall> printdln (",", "g_main_context_new", gettimeofday_us (), tid (), context);
<pwithnall> }
<pwithnall> The same happens if I use "probe begin{}"
<drsmith> wait, what?
<pwithnall> If I have an empty script, stap detects that and exits early with no bad_alloc
<drsmith> stap -ve 'probe begin { exit() }' gives you a bad_alloc?
<pwithnall> $ stap -ve 'probe begin { exit() }'
<pwithnall> Using a compile server.
<pwithnall> Pass 1: parsed user script and 188 library scripts using 595132virt/396300res/7616shr/388756data kb, in 1550usr/80sys/1637real ms.
<pwithnall> std::bad_alloc
<pwithnall> Yes, apparently it does!
<drsmith> ok, something wacky is going on
<drsmith> try running that basic stap command as root (using sudo is fine) - that should run it locally
<pwithnall> It’s not the GLib stp scripts either; I just removed them from my INCLUDES and probe begin still fails
<pwithnall> Running as root works
<drsmith> ok, is your server the same machine or a different machine?
<pwithnall> Same machine, though I’ll check the logs to make sure there’s nothing funky on the network
<pwithnall> Yup, same machine
<pwithnall> Attempting SSL connection with host=philip-work-laptop.local address=192.168.122.1 port=43535 sysinfo="4.6.4-301.fc24.x86_64 x86_64" version=3.0 certinfo="00:a4:cf:8e:da"
<pwithnall> using certificates from the database in /etc/systemtap/ssl/client
<pwithnall> although it does show 3 failed connections to my machine prior to that, which failed due to “issuer certificate is invalid”
<pwithnall> sorry, 4 failed connections
<jistone> are limits set in 'systemctl show stap-server' ?
<pwithnall> erk, that’s incomplete
<jistone> looks unlimited
<pwithnall> yeah
<jistone> ah, but systemtap.spec sets rlimits in ~stap-server/.systemtap/rc
<jistone> that's what I was looking for
<jistone> rlimit-as is about 614MB
<jistone> your pass-1 just parsing all the tapsets took 595MB
<jistone> so maybe it's no surprise that it runs out soon after
<pwithnall> indeed
srikar is now known as srikar_away
<pwithnall> although it’s weird that setting --rlimit-as=$big_number doesn’t help
<pwithnall> Unless --rlimit-as is completely broken as an argument to `stap`?
<jistone> because the server's options come first, and set both the hard and soft limits. so your later option can't increase it again
<jistone> (remember, we're extending only limited trust to the clients here)
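The rule jistone describes can be sketched as a small Python model. This is not systemtap's code: the function names are hypothetical, `INFINITY` stands in for RLIM_INFINITY, and the model assumes each `--rlimit-as` option sets the soft and hard limit to the same value, as described above. It encodes only the setrlimit(2) permission rules: the soft limit may not exceed the hard limit, and an unprivileged process may not raise its hard limit.

```python
INFINITY = float("inf")  # stands in for RLIM_INFINITY in this model


def check_setrlimit(new_soft, new_hard, cur_hard, privileged=False):
    """Return None if the change is allowed, else the errno name it would fail with."""
    if new_soft > new_hard:
        return "EINVAL"  # soft limit may never exceed the hard limit
    if new_hard > cur_hard and not privileged:
        return "EPERM"   # only a privileged process may raise the hard limit
    return None


def apply_options(values, start_hard=INFINITY, privileged=False):
    """Apply each option value in order, setting soft = hard = value each time."""
    hard = start_hard
    results = []
    for value in values:
        error = check_setrlimit(value, value, hard, privileged)
        if error is None:
            hard = value  # the limit only changes when the call succeeds
        results.append(error)
    return hard, results


if __name__ == "__main__":
    # A first option lowers the hard limit; a later, larger one cannot raise it.
    print(apply_options([1_000_000_000, 2_000_000_000]))
    # prints (1000000000, [None, 'EPERM'])
```

In this model the server's rc file plays the role of the first value and the client's `--rlimit-as=$big_number` the second, which is why only editing the server-side value helps.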
<pwithnall> aha
<pwithnall> Doubling the --rlimit-as in /var/lib/stap-server/.systemtap/rc works
<jistone> yay!
<pwithnall> everything is doing its job fine then :)
<drsmith> good deal
<pwithnall> Thanks for your help
ton31337 has joined #systemtap
<pwithnall> Has anybody been secretly working on this?
<jistone> pwithnall, nope -- most of my "stapdyn" work for a while has just been improving dyninst itself
ton31337 has quit [Ping timeout: 265 seconds]
<jistone> pwithnall, dyninst's stackwalkerAPI can do first-party unwinding (i.e. in-process)
<jistone> I'm not sure how well it interacts with dyninstAPI instrumentation
<pwithnall> jistone: Do I have to call that from the process being probed?
<jistone> (I don't think this stackwalker was part of the dyninst distribution when I filed 14703)
<jistone> pwithnall, I think it can be used either way, within the process or from a ptracer
<jistone> but running from a probe handler would be within the process
<pwithnall> Right. Any way to include this in a user stap script?
<pwithnall> (Sorry if my questions are stupid; still learning everything)
<drsmith> fche/jistone: got a question about the '--rlimit-FOO' options
<drsmith> pwithnall figured out that they don't quite work correctly
<drsmith> but, it could depend on how you read the man page
<drsmith> Here's the man page section:
<drsmith> --rlimit-as=NUM
<drsmith> Specify the maximum size of the process's virtual memory (address
<drsmith> space), in bytes. If nothing is specified, no limits are
<drsmith> imposed.
<drsmith> That last sentence "if nothing is specified, no limits are imposed" is wrong.
<drsmith> Right now, if you say '--rlimit-as=', that translates to '--rlimit-as=0' (meaning no virtual memory allowed)
<drsmith> so does "no limits are imposed" mean the option doesn't do anything, or does it mean set it to RLIM_INFINITY?
<drsmith> I believe it means the latter, but I thought I'd ask
<drsmith> Actually, the man page for setrlimit tends to support the RLIM_INFINITY viewpoint, since it says:
<drsmith> The value RLIM_INFINITY denotes no limit on a resource (both in the
<drsmith> structure returned by getrlimit() and in the structure passed to
<drsmith> setrlimit()).
<jistone> drsmith, they are working correctly - it's just that stap-server has limits in its rc file
<drsmith> they aren't working correctly if you specify '--rlimit-as=' with no value
<jistone> pwithnall, no worries, your questions are not stupid
<drsmith> that value ends up being 0
<jistone> well, perhaps we could improve that, but you still wouldn't be able to increase beyond what the server already set
<jistone> I believe "nothing specified" really means if you didn't have the "--rlimit-as" option at all
<drsmith> what I'm talking about isn't a client/server problem, just a problem with '--rlimit-as' in general
<drsmith> if what you believe is true, then a '0' should mean 'do nothing'
<pwithnall> jistone: if you are unable to increase the limit beyond what the server already set, it would be useful to print a warning
<jistone> IMO we shouldn't accept blank "--rlimit-as=" at all. that should be a number-parsing error
<drsmith> one "fix" for this problem would be to remove that language from the man page and error if there isn't a value specified
<drsmith> (but I really kind of think whoever wrote that mean for nothing to mean RLIM_INFINITY)
<drsmith> s/mean/meant/
<jistone> or fix/clarify that language that it's referring to the total lack of the option
<drsmith> yeah, I'm not sure I see the point of being able to specify an option that doesn't do anything
<jistone> there's *nothing* in the code that even tries to make it RLIM_INFINITY, so it's hard to claim that was even the intent
<jistone> anyway
<jistone> the rlimit options are using unchecked strtoul
<drsmith> right, that part is easy to fix
<drsmith> that's what the man page sounds like to me
<drsmith> I wonder if RLIM_INFINITY support got removed at some point along the way
ton31337 has joined #systemtap
<jistone> I think the manpage is just poorly expressed. "If nothing is specified" to me implies "if you don't use --rlimit-as at all"
<jistone> in other words, it's saying this option will "specify the max", so unspecified means no option
<jistone> whatever the original intent, we can decide and clarify it now
<drsmith> see, to me it means RLIM_INFINITY, especially since the setrlimit man page calls RLIM_INFINITY "no limit on a resource"
<drsmith> true, we can certainly decide/clarify now, we just need a plan
<jistone> since this is also setting the hard limit, RLIM_INFINITY doesn't even make sense to me. only root can increase hard limits
<jistone> so you can only "set" the hard limit to infinity if it was already there
ton31337 has quit [Ping timeout: 260 seconds]
<jistone> actually we do sort of check the strtoul, looking at *num_endptr to make sure we parsed to the end
<jistone> but that misses an empty option
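Why an end-pointer check alone lets an empty value through can be shown with a small Python model. This is a sketch, not systemtap's parser: `strtoul_like` is a hypothetical stand-in that only handles plain decimal digits (the real strtoul also skips whitespace and accepts a sign and other bases). The point it illustrates is that for an empty string the parse "ends" at the end of the string without consuming anything, so the value silently becomes 0.

```python
DIGITS = "0123456789"


def strtoul_like(text):
    """Consume leading decimal digits; return (value, end_index) like strtoul's endptr."""
    end = 0
    while end < len(text) and text[end] in DIGITS:
        end += 1
    value = int(text[:end]) if end > 0 else 0  # nothing consumed -> 0, like strtoul
    return value, end


def lenient_parse(text):
    """Only checks that parsing reached the end of the string."""
    value, end = strtoul_like(text)
    if end != len(text):
        raise ValueError(f"invalid number: {text!r}")
    return value


def strict_parse(text):
    """Also requires that at least one digit was consumed."""
    value, end = strtoul_like(text)
    if end == 0 or end != len(text):
        raise ValueError(f"invalid number: {text!r}")
    return value


if __name__ == "__main__":
    print(lenient_parse(""))  # prints 0: the empty value is accepted as zero
    try:
        strict_parse("")
    except ValueError as err:
        print("rejected:", err)
```

Both checks reject trailing garbage such as "12abc"; only the strict one rejects the empty string.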
<drsmith> right, that's the only checking we do
<drsmith> (but then the error message is wrong)
<jistone> the actual call to setrlimit is unchecked
<jistone> we should probably check that and have a warning, as pwithnall suggests
<drsmith> you are looking at the wrong setrlimit() call
<drsmith> there are 2 in rlimit-as
<jistone> oh, I am, that's the core
<drsmith> right
<drsmith> the 1st is checked
<drsmith> I'm ok with making --rlimit-as (and the others) require an argument and updating the man page
<drsmith> sound like a plan?
<jistone> yeah
<drsmith> ok, I'll see what I can do
<jistone> pwithnall, fwiw you *do* get a warning when unable to increase. it's just that the blank/0 option didn't actually mean infinity
<pwithnall> right
<jistone> $ stap --rlimit-as=1000000000 --rlimit-as=2000000000 -l begin
<jistone> Unable to set resource limits for rlimit_as : Operation not permitted
<jistone> begin
<drsmith> ok, next question - if the --rlimit-as option isn't present, doesn't convert properly, or setrlimit() fails, do we keep going (as we do now) or exit?
<jistone> "isn't present" is the basic unconstrained case (which I thought the manpage is alluding to, or at least that's how *I* would clarify it)
<jistone> (though it's still constrained from outside ulimit/rlimits as usual)
<jistone> "doesn't convert" should be an error
<jistone> if setrlimit fails -- that's more of a question
<jistone> perhaps EPERM is fine to warn and continue, but error out on others
<drsmith> ok, sounds reasonable
<jistone> to be extra clear, by "isn't present" I don't mean "--rlimit-as=", that should be an error. I mean no "--rlimit-as" at all
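The policy being discussed can be summarized as a small decision function. This is a hypothetical Python sketch of the tentative plan, not the committed fix: the function name and the "continue"/"warn"/"error" labels are invented for illustration, and `None` is assumed to mean the option was not given at all.

```python
import errno


def rlimit_action(option_text, setrlimit_errno=0):
    """Decide what to do for one --rlimit-* option.

    option_text: None if the option is absent, else the text after '='.
    setrlimit_errno: 0 if setrlimit() succeeded, else the errno it failed with.
    """
    if option_text is None:
        return "continue"  # option absent: nothing changes, limits stay as inherited
    if not (option_text.isascii() and option_text.isdigit()):
        return "error"     # blank or non-numeric value does not convert
    if setrlimit_errno == 0:
        return "continue"  # limit applied successfully
    if setrlimit_errno == errno.EPERM:
        return "warn"      # not permitted to raise the limit: warn and keep going
    return "error"         # any other setrlimit failure


if __name__ == "__main__":
    print(rlimit_action(None))                       # prints continue
    print(rlimit_action(""))                         # prints error
    print(rlimit_action("2000000000", errno.EPERM))  # prints warn
```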
<drsmith> as far as the man page goes, I'm just going to delete that "if nothing is specified" sentence
<drsmith> by definition, if you don't do anything, nothing should change
<drsmith> (i.e. on the '-v' option, we don't say 'if -v isn't specified, stap isn't verbose')
<drsmith> s/i.e./e.g./
<jistone> sure
ton31337 has joined #systemtap
ton31337 has quit [Ping timeout: 252 seconds]
ton31337 has joined #systemtap
<irker988> systemtap: dsmith systemtap.git:refs/heads/master * release-3.0-213-gcc72c6a / : Fix a '--rlimit-*' option problem identified by BZ1368188. http://tinyurl.com/hqz5opo
pwithnall has quit [Quit: pwithnall]
wcohen has quit [Ping timeout: 265 seconds]
ton31337 has quit [Ping timeout: 250 seconds]
tromey has quit [Quit: ERC (IRC client for Emacs 25.1.3)]
drsmith has left #systemtap [#systemtap]
brolley has left #systemtap [#systemtap]
ton31337 has joined #systemtap
wcohen has joined #systemtap
ton31337 has quit [Ping timeout: 244 seconds]
ton31337 has joined #systemtap
ego_ has quit [Quit: Leaving]
ton31337 has quit [Ping timeout: 244 seconds]
nkambo has quit [Ping timeout: 250 seconds]
hkshaw has quit [Quit: Leaving.]
nkambo has joined #systemtap
ton31337 has joined #systemtap