fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
orivej has quit [Ping timeout: 240 seconds]
fche has quit [Ping timeout: 244 seconds]
<invano>
fche: ok, I'll try again with the incredible -v combo. Yeah, I'm still inspecting that @cast. It works on my x64 machine but fails when I cross-compile and pass the external kernel tree. Also, I get @cast errors only on specific fields of the struct. Anyway, I'll handle this.
mjw has joined #systemtap
orivej has joined #systemtap
orivej has quit [Ping timeout: 260 seconds]
pfallenop has quit [Ping timeout: 240 seconds]
fche has joined #systemtap
orivej has joined #systemtap
<invano>
hitting a similar @cast problem with func pid2task because it calls @task that calls @cast on task_struct. Problem is that every time @cast has a module declared e.g. "kernel<HEADER.h>" I get "type definition X not found in 'kernel<Y>'". The problem does not arise if I don't use the module arg in @cast. Also, this only happens when I cross-compile with a different build tree (-r). I'm going to attach a full log showing the error.
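[editor's note] For readers following along, the two @cast forms being compared look roughly like the sketch below. The probe point and the printed field are illustrative choices, not taken from invano's script, and the header name simply mirrors the linux/sched.h mentioned later in the log.

```systemtap
# Sketch only: probe point and field are assumptions made for illustration.
probe kernel.function("do_exit") {
  t = task_current()

  # Form 1: no module argument. The type is looked up in the debuginfo
  # of the probed module (here, the kernel itself).
  printf("pid=%d\n", @cast(t, "task_struct")->pid)

  # Form 2: module argument naming a header. stap compiles a small
  # "typequery" module from that header and reads the type from it.
  printf("pid=%d\n", @cast(t, "task_struct", "kernel<linux/sched.h>")->pid)

  exit()
}
```

Only the second form depends on the typequery .ko that gets built against the tree passed with -r, which is consistent with the failure showing up only when the module argument is present.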
<fche>
Copying /tmp/stap22u6zF/typequery_kmod_1/typequery_kmod_1.ko to /home/user/.systemtap/cache/3e/typequery_3e4e72f4faeacf6129c8e0c94896cc4f_1051.ko
<fche>
that is the .ko file built from the @cast() expression
<fche>
it's in there that stap ought to look for the task_struct declaration
<invano>
I might leave soon but I'll be back in a couple of hours
<fche>
righto
<fche>
the eu-readelf -w of a similar typequery.*ko file on native x86-64 is much different
<fche>
it's as though the arm linux/sched.h may have just a forward declaration?
<fche>
e.g. here:
<fche>
[ f4b] structure_type
<fche>
decl_file (data1) 24
<fche>
alignment (data1) 64
<fche>
byte_size (data2) 11200
<fche>
name (strp) "task_struct"
<fche>
decl_line (data2) 524
<fche>
sibling (ref4) [ 1a47]
<fche>
[ f5a] member
<fche>
name (strp) "thread_info"
<fche>
decl_file (data1) 24
<fche>
decl_line (data2) 530
<fche>
type (ref4) [ 5684]
<fche>
data_member_location (data1) 0
<fche>
etc. etc.
<fche>
I don't see any "member" tags in your dump
<agentzh>
fche: do you know if there is any particular reason to make the keys of session.stat_decls strings? git blame shows that that line of code dates back to 2005 :)
<agentzh>
fche: i'm currently working on supporting stat values in user-defined stap function parameters. and using names in that map makes it impossible to keep track of stat-typed function parameter variables without introducing a separate map.
<agentzh>
do you think it's feasible to simply use vardecl* as the keys to that stat_decls map?
<fche>
agentzh, looking
<fche>
yes clearly some of that structure needs to change if the scope of these objects changes
<fche>
(similar as the case of arrays)
<fche>
by the way, have you considered using macros instead of functions for your purposes?
<agentzh>
fche: thanks for the suggestion. i'm afraid macros are not powerful enough since i have some statements that must be used on the expression level.
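[editor's note] A minimal sketch of the macro route fche suggests, assuming the goal is to reuse one piece of code on different stat globals. The macro name `record`, its parameter names, and the global `latency` are all hypothetical.

```systemtap
# Hypothetical macro: expands textually, so the stat global is named
# directly at the call site instead of being passed as a function argument.
@define record(agg, val) %( @agg <<< @val %)

global latency

probe begin {
  @record(latency, 5)
  @record(latency, 7)
  printf("count=%d avg=%d\n", @count(latency), @avg(latency))
  exit()
}
```

Because the expansion is purely textual, this avoids the stat-typed-parameter problem, but it also shows the limitation agentzh raises: a macro body that contains statements cannot be dropped into the middle of an expression.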
* agentzh
is missing gnu C's ({...}) construct in the stap language.
<fche>
that one's a possibility; we added %{ %} expressions at one point
<agentzh>
fche: i thought about using return/next statements right after exit(), but it won't work for nested function calls.
<agentzh>
we want to unwind the whole call chain.
<fche>
exit() error("") then?
<agentzh>
and there's no way for the caller function to know there is an "exit" on the stap language level.
<agentzh>
fche: i tried error("") too but it produces an error which is a false alarm :)
<agentzh>
it generates a big red ERROR to stderr ;)
<fche>
error("THIS IS A NORMAL EXIT, IGNORE THIS RED THING") :-)
<agentzh>
fche: oh yeah, sorry for nitpicking.
<fche>
heh, yeah, I do understand. just don't think we can practically change that now
mjw has joined #systemtap
<agentzh>
fche: so you won't accept a patch to abort the execution flow immediately in exit()?
<agentzh>
how about a build-time option?
<fche>
we could make error("") more quiet e.g.
<agentzh>
but in the general case, it is really important to unwind the call chain for nested function calls.
<fche>
... especially if you start contemplating creating function-local arrays that need cleanup
<fche>
but yeah, I'd suggest pursuing the error("") case - to be prettier - or a new exit() type function that does the same thing, but not changing the old exit()
<agentzh>
fche: yeah, i understand the requirement of being backward compatible.
<agentzh>
i thought about treating error("") specially too, just thought it was a bit too hacky for you guys to accept.
<agentzh>
how about exit_now() ?
<agentzh>
i haven't got as far as designing function-local arrays but i got your point :)
<agentzh>
error("") always looks like a hack to me :)
<fche>
well, at least the error mechanism is designed to unwind cleanly and set an exit-equivalent flag too
<fche>
hey, you could always use error("") as a Very Clean exit, and wrap your probes in a try {} catch { exit() }
<agentzh>
yeah, i'm reusing the same mechanism for error() for my exit() patch.
<fche>
see also stap --suppress-handler-errors
<agentzh>
fche: yeah, that's one way to make it work, though still not very beautiful.
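[editor's note] The try/catch idea from fche can be sketched as follows. The function names `inner` and `outer` are made up for illustration; the point is only that error("") unwinds the nested calls, while the catch block turns it into a normal exit.

```systemtap
# Hypothetical helper functions, used only to show a nested call chain.
function inner() {
  error("")              # raises an error; unwinds out of inner() and outer()
}

function outer() {
  inner()
  println("not reached") # skipped because the error propagates to the caller
}

probe begin {
  try {
    outer()
  } catch {
    exit()               # treat the unwound error as an ordinary exit request
  }
}
```

The cost, as agentzh notes, is that every probe handler has to be wrapped this way, which is what motivates a dedicated function that unwinds immediately.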
<agentzh>
fche: i'll submit a patch to implement exit_now(), is that okay?
<fche>
will be interested how it looks / works
<agentzh>
okay, great, it would also be a great chance for me to learn new things from you guys :)
<fche>
could call it abort() perhaps
<fche>
WE DON'T LIKE TEACHING
<fche>
ok well, maybe we do
<agentzh>
okay, abort() then :)
<agentzh>
well, giving feedback on patches is already the best teaching :)
<agentzh>
it does not have to be like formal teaching at all.
<fche>
and please ping regularly if we don't get to it quickly enough. I know there's a backlog :(
<agentzh>
sure, thanks.
<agentzh>
fche: i also have another patch to implement a new builtin, @vma(addr, module).
<agentzh>
for converting relative addresses in ELF to real VM address in the specified module.
<agentzh>
it invokes the vma tracker to do the runtime conversion if the module is PIC/PIE.
<agentzh>
do you like it?
<agentzh>
the motivation is to do dwarf-less probing for @var(...).
<fche>
aha, neat.
<agentzh>
and also for obtaining userland addrs in the target process which cannot be achieved by @var(...)
<agentzh>
like C's literal string data in the .rodata section.
<agentzh>
fche: glad you like it, i'll submit the patch to the mailing list then.
<fche>
hm, those addresses aren't easy to handle by a normal user
<agentzh>
well, we already have dwarf-less probing support in the probe spec.
<agentzh>
where we just specify the relative addresses.
<agentzh>
like probe process.statement(0xdeadbeef) { ... }
<agentzh>
so already along the same line?
<agentzh>
but yeah, it's not meant for normal stap users, just advanced ones.
* agentzh
is using stap as a VM anyway.
<agentzh>
fche: i also have patches to add support for comma expressions (or compound expression) and allowing use of parentheses right after the unary '&' operator to the stap language. will send them soon too.
<agentzh>
fche: since you are around, are you open to adding native floating-point number support to the stap language and runtime? we're also working on it.
<agentzh>
do you already have some ideas or plans for it? i saw you replied to a stackoverflow question related to floating point number handling in stap.
<agentzh>
i guess we'll have to do software floating-point arithmetic in the kernel module.
<fche>
yes, almost certainly we'd have to emulate it
<fche>
the -fno-var-tracking-assignments -fno-var-tracking stuff should come out of your kbuild generally, it's a goofy artifact of a poor linus decision several years ago
<fche>
their presence degrades debuginfo quality generally within functions / inlines
<fche>
the more immediate problem is: -femit-struct-debug-baseonly
<agentzh>
running the command "make -j6 installcheck-parallel" gives the error "ERROR: Couldn't find library file site.exp". it seems that running "touch testsuite/lib/site.exp" manually fixes it. is that the right way to work around it?
<agentzh>
i'm seeing it on both fedora 26 and fedora 27.
<fche>
hm, over here, the first thing installcheck-parallel does is create site.exp
<agentzh>
the paste is the file testsuite/artifacts/systemtap.unprivileged/unprivileged_myproc/systemtap.log
* fche
is missing irker
<fche>
agentzh, I do believe you're right, that set of tests has gone a bit crazy
<agentzh>
oh, good to know it's not my env going crazy :)
<fche>
maybe we are both crazy, pal
<agentzh>
lol
<agentzh>
do we have a CI service to keep track of the test suite pass rate etc. for the master branch?
<fche>
yes, but not a really public one
<agentzh>
oh, pity.
<agentzh>
as a contributor, i just want to know which failures are expected and which are not.
<agentzh>
ideally, all the tests are always passing on master.
<fche>
yeah, we're actually starting to investigate a more formal way of classifying testsuite results in general
<fche>
ideally, yeah ... but in practice systemtap is so environment-sensitive (gcc, kernel versions, configurations, architectures) that a lot of things are out of our hands
<agentzh>
i wonder how the other backends, dyninst and bpf, run the current test suite.
<agentzh>
do they run at all by default?
<agentzh>
are they enabled by some fancy system environments?
<fche>
some of the .exp files explicitly iterate over runtimes
<agentzh>
or do they do their own (sub) test suite?
<fche>
foreach runtime [get_runtime_list] { ... }
<agentzh>
fche: okay, i see. it needs to be explicitly turned on in the .exp files.
<fche>
yeah. the runtimes are not interchangeable in power unfortunately
<agentzh>
yeah, i understand.
<agentzh>
the bpf runtime always hangs on my side for userland probing.
<agentzh>
and the dyninst runtime lacks vma tracking. alas.
<fche>
bpf hanging would be good to know of
<agentzh>
fche: will dig it up with more details when i have the tuits.
<fche>
haha yeah. bugzilla PR appreciated when you have time.
<fche>
we should create a tuit dispenser
<agentzh>
sure.
<fche>
hmmmmm a stap script that monitors for idle keyboard
<agentzh>
lol
<fche>
and after a few minutes of idleness, gives you a tuit
<fche>
to use for good, not evil
<agentzh>
we're very interested in the bpf backend, so don't worry.
<agentzh>
it's great to see that stap emits bpf bytecode directly without involving llvm/gcc toolchains.
<agentzh>
bcc is much more heavyweight in that regard.
<fche>
oh yeah
<fche>
we're learning a lot from bcc/etc., a bit amused at its limitations sometimes :)
<fche>
but yeah we'll try to do a good job ... and darn it's fast when it works!
<agentzh>
good to know. bcc lacks debuginfo support so it's more like dwarf-less probing :)
<agentzh>
yeah, stap --bpf is so fast when it works.
<agentzh>
reminds me of the good old dtrace days :)
<fche>
heh, and for good reason, similar tech
<agentzh>
yep
<fche>
still bummed out about the kernel-side limitations of bpf, but who knows, over time we may be able to nudge them toward some higher capability/power
<agentzh>
fche: or maybe with a separate stap-bpf kernel module residing in the kernel space for good :)
<fche>
well, if we could have gotten one in 10 years ago :)