#systemtap on 2018-08-08 — irc logs at freenode.irclog.whitequark.org

2015-11-12 23:18 fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged

00:36 orivej has quit [Ping timeout: 244 seconds]

00:45 wmealing has joined #systemtap

00:45 * wmealing waves

02:31 fche has joined #systemtap

06:19 aryehw has quit [Quit: Leaving]

07:00 ema has quit [Remote host closed the connection]

08:19 CME has quit [Ping timeout: 265 seconds]

09:47 orivej has joined #systemtap

10:40 mjw has joined #systemtap

11:07 orivej has quit [Ping timeout: 256 seconds]

11:40 wmealing has quit [Remote host closed the connection]

13:30 <invano> hey fche it's me, again

13:31 <invano> I'm seeing a kernel hang on mips that I never saw back when I was using systemtap 2.7 and an older kernel.

13:31 <invano> I have a simple script: probe nd_syscall.*, probe nd_syscall.*.return

13:32 <invano> It appears there is a problem with kretprobes and I'm debugging the kernel right now

13:33 <invano> basically the kernel enters the kretprobe trampoline but the return address is not updated so the trampoline jumps over itself

13:33 <invano> global symbol "kretprobe_trampoline_holder" in kernel

13:34 <invano> This happens if I include all syscalls and, instead, it's not triggered if I probe only a bunch of them

13:36 <invano> I'm checking the systemtap/kernel code and debugging right now; I was only wondering if something like this has ever happened in the past and/or if you have a feeling for what could cause this

13:44 <fche> hi

13:44 <fche> doesn't sound too familiar, but the recent meltdown/spectre workarounds did impact k*probe operation at some point, causing all kinds of fun crashes/failures

13:45 <fche> so maybe worth looking over the date range of the kernel and avoid anything from ... dunno ... january-july 2018 ? (being paranoid)

13:46 <fche> if the failures are limited to kretprobes, that's more likely to localize to a piece of kernel or perhaps stap code

13:46 <invano> but I'm on mips

13:46 <fche> understood, I'm saying I wouldn't be surprised if those bugs also hit that

13:46 <invano> ahh ok

13:46 <invano> sorry

13:47 <fche> (but that's just speculation on my part)

13:47 <invano> yeah got it

14:08 brolley has joined #systemtap

14:18 orivej has joined #systemtap

15:02 tromey has joined #systemtap

16:44 <agentzh> fche: committed. thanks!

16:44 <fche> thanks

16:45 <agentzh> fche: what do you think of this small patch? https://sourceware.org/ml/systemtap/2018-q3/msg00055.html

16:45 <agentzh> is it okay to commit it too?

16:46 <fche> nope, that's not related

16:46 <fche> we need -Werror for the on-the-fly-compiled .c bits that stap itself generates

16:46 <fche> (like that comment block says)

16:47 <agentzh> it's still using -Werror by default.

16:47 <fche> a build-tree configure option is too blunt an instrument

16:47 <agentzh> i just add an option to disable it at compile time of stap.

16:47 <fche> that'll also nuke the stapconf* bits that we need

16:48 <fche> would be interested in seeing the a == a cases and the accompanying warnings/errors

16:48 <fche> maybe we can cure them with a more finely tuned -W option, or by changing our generated code

16:49 <agentzh> fche: see https://pastebin.com/468bLyLY

16:50 <agentzh> that's our test cases to cover that patch.

16:50 <fche> wouldn't mind adding '-Wno-tautological-compare' to the buildrun.cxx-generated makefiles

16:51 <fche> to suppress that warning, as opposed to suppressing warnings-as-errors

16:51 <agentzh> that sounds good to me.

16:51 <agentzh> oh, btw, will you be open to adding an alternative test scaffold to stap? like the one i just showed? it would be much easier to write new tests or debug test failures in existing tests.

16:52 <agentzh> it's more declarative and data driven.

16:52 <fche> that's a tough one

16:52 <agentzh> we can keep both in the official source tree.

16:52 <fche> for that particular case, you could just have added a testsuite/buildok/FOOBAR.stp file with that one-liner in it, that's all

16:53 <agentzh> right now, it requires writing several small files for the same test case.

16:54 <agentzh> it would be nice to put all the small pieces together, as in this test case for our @vma() patch: https://pastebin.com/PxSNdmFk

16:55 <agentzh> it will encourage us to write way more tests for our patches :)

16:56 <agentzh> and it supports parallel testing too, just run the command "prove -j8 t/*.t" where t/*.t are the test files.

16:56 <agentzh> multiple backends are supported too; by default both kernel and dyninst runtimes are run, and each test case can explicitly turn a particular runtime on or off by specifying "--- bpf", "--- no_kernel", etc.

16:58 <agentzh> the test output is also much less verbose: https://pastebin.com/iJCL8dHv

17:00 <agentzh> and test failures are much easier to see as well: https://pastebin.com/cCLefpGS

17:00 <agentzh> no need to dig up separate systemtap.log files for failure details...

17:02 <fche> there's normally a single (big) systemtap.log

17:02 <fche> anyway I see kind of what you mean - some things could be simplified -- but with a bit of dejagnu/tcl work, one could automate the multi-runtime thing too

17:04 <agentzh> the most painful bit is that we now have to write several separate small files to write a single test case, like a .stp file, a .c file and a .exp file.

17:04 <fche> would have a hard time justifying a second test framework (with new prereqs, incompatible reporting)

17:04 <fche> agentzh, let's try simplifying that further; as I mentioned we have done that for the -p4 cases, and also for syscalls

17:05 <agentzh> and even worse, to see what's going on with a test failure, we have to dig through a big and separate systemtap.log file instead of simply having a quick glance at the test run output on the terminal.

17:05 <fche> if you can characterize a new family of tests that would benefit from abbreviation, please describe them

17:05 <fche> one can run a single test case with dejagnu (make installcheck RUNTESTFLAGS=foobar.exp)

17:05 <agentzh> yeah, i know that RUNTESTFLAGS thing.

17:06 <agentzh> in the new test scaffold, it's as easy as adding a --- ONLY line to the test block in question.

17:06 <agentzh> or a --- SKIP line to skip it.
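
The "--- ONLY" and "--- SKIP" markers described above can be pictured with a small sketch. This is neither Test::Base itself (which is Perl) nor agentzh's scaffold; it is a hypothetical Python parser for a Test::Base-like layout, assuming blocks open with "=== name" and sections open with "--- key". The section names "stp" and "out", the sample tests, and the function names are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TestBlock:
    """One declarative test: a name plus named sections (hypothetical model)."""
    name: str
    sections: dict = field(default_factory=dict)


def parse_blocks(text):
    """Split a Test::Base-like document into blocks of sections."""
    blocks = []
    current = None   # block currently being filled
    section = None   # name of the section currently being filled
    for line in text.splitlines():
        if line.startswith("=== "):
            current = TestBlock(name=line[4:].strip())
            blocks.append(current)
            section = None
        elif line.startswith("--- ") and current is not None:
            section = line[4:].strip()
            current.sections[section] = []
        elif current is not None and section is not None:
            current.sections[section].append(line)
    return blocks


def select_blocks(blocks):
    """If any block carries ONLY, run just those; otherwise drop SKIP blocks."""
    only = [b for b in blocks if "ONLY" in b.sections]
    if only:
        return only
    return [b for b in blocks if "SKIP" not in b.sections]


# Invented sample; "stp" and "out" are assumed section names.
SAMPLE = """\
=== TEST 1: plain arithmetic
--- stp
probe begin { println(1 + 2) exit() }
--- out
3
=== TEST 2: skipped for now
--- SKIP
--- stp
probe begin { exit() }
=== TEST 3: string output
--- stp
probe begin { println("hi") exit() }
--- out
hi
"""

if __name__ == "__main__":
    for block in select_blocks(parse_blocks(SAMPLE)):
        print(block.name)
```

With the sample as written, the second block is dropped; changing its "--- SKIP" line to "--- ONLY" would select that block alone. Keeping the tests as data is also what would make it feasible to emit .exp/.stp/.c files from the same source.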

17:08 <agentzh> re tests that would benefit from abbreviation: https://pastebin.com/HV2ua2He these are our test cases for the @vma(addr, module) feature we just did.

17:08 <agentzh> they are like documentation.

17:09 <fche> we also have a bunch of .exp files that carry .stp / .c parts within them

17:10 <fche> I am not a fan of that style, but that's easily done there too

17:10 <agentzh> and this is the test file for the stat-typed function parameters feature: https://pastebin.com/w7HPUJDv

17:10 <agentzh> all these tests are already passing completely on my side, btw.

17:12 <agentzh> fche: re .exp files that carry .stp / .c parts: that's still tcl coding though, but sure it could be hacked up.

17:12 <fche> yeah, much like how these .t files are just python coding :)

17:15 <fche> anyway ... I'm not going to ask that all that .t work be redone as .exp - that's not your burden :). I don't mind pulling in the .t files, but understand that we're not really in a position to run them

17:15 * fche doesn't actually recognize the testsuite framework here; is it a perl thing?

17:29 <agentzh> yeah, it's a perl thing.

17:29 <agentzh> it's based on perl's Test::Base framework.

17:29 <agentzh> but the perl code is just 2 lines at the beginning of each .t file.

17:29 <agentzh> and they are always the same :)

17:30 <agentzh> but yeah, i do understand your concerns. they are all valid points. i think we'll just make the test scaffold emit .exp/.c/.stp files targeting stap's current test scaffold. that's the beauty of being data driven and declarative. it would be much much harder the other way around, if not impossible.

17:31 * agentzh used to write a tool to convert gnu make's perl 4 testing code into the Test::Base data driven syntax.

17:31 * agentzh remembers the pain of parsing perl 4 code.

17:32 <fche> that sounds like a plan. emitting a single .exp file from your .t is also possible, if it helps

17:33 <agentzh> fche: yeah, that would be definitely easier.

17:33 <agentzh> do you have such .exp files for our reference? would be great to have some samples :)

17:37 <fche> systemtap.base/pr18649.exp e.g.

17:37 <fche> (I don't much like this model because it tends to create temporary files, which are gone by the time one may want to hand-rerun the test)

17:38 <agentzh> thanks! as long as you would accept our patches with such tests :)

17:39 <fche> would be glad to take a look

17:39 <agentzh> okay, cool. we could always change the model of the emitted tests anyway. they are auto-generated :)

17:39 <fche> and I wouldn't be surprised if we can make the .exp system more declarative for our use cases

17:40 <agentzh> yeah, sure, that'll definitely be possible, though it needs quite some work.

17:40 * agentzh lacks the motivation to hack tcl/expect/dejagnu

17:40 <fche> hehe yeah

17:40 <fche> understood

17:41 <agentzh> i'll try to get some generated .exp files to show you soon.

17:42 <fche> ok

17:42 <agentzh> thanks for your feedback.

17:42 <agentzh> i've already got the stat-func-arg feature working fully. i'll also continue working on the array-func-arg feature today.

17:43 * agentzh likes to move fast.

17:44 <fche> is the idea there to pass aliases of the entire array-of-whatever or singleton-stat to a function?

17:44 <fche> i.e., pass by reference? that's different from the normal pass-by-value approach

17:45 <agentzh> yeah, it's passing by references.

17:45 <agentzh> and i had to change the optimizers and analyzers in elaborate.cxx to follow references.

17:46 <agentzh> like the varuse collector, the stat decl collector, etc.

17:47 <agentzh> so that a function can be shared among different stat vars and different arrays.

17:47 <fche> and what if two different signature arrays are passed

17:47 <agentzh> then an arity mismatch error would be emitted at Pass 2.

17:48 <agentzh> or Pass 3?

17:48 <agentzh> the same applies to incompatible stat types.

17:48 <agentzh> like a hist log and a hist linear

17:48 <agentzh> but count/avg/min/sum stat ops would be merged and collected across the whole ref graph.

17:49 <fche> hm, we probably talked about this, but are you sure that macros are not sufficient to express this?

17:49 <agentzh> nope, macros lack control flow and statement support.

17:50 <agentzh> and they also lack code sharing; it's inlining per se :)

17:50 <agentzh> oh, btw, i'm thinking about adding backtrace info to stap's runtime errors.

17:50 <fche> https://pastebin.com/w7HPUJDv <-- from here, which TEST would be the best demonstration why a macro isn't enough?

17:50 <agentzh> it would be much easier to debug huge .stp files.

17:51 <fche> that could be useful

17:51 <agentzh> so when stap unwinds the function calling stack with c->last_error, it can also append the current frame info.

17:51 <fche> or just store the function name in a context->locals[] array slot

17:52 <fche> and if an error is detected, record the nesting depth, then traverse that array to the noted depth at reporting time
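
The idea fche outlines here (one name slot per nesting level, a depth noted when the error is detected, and a walk over the slots at reporting time) can be sketched as follows. This is a toy model under stated assumptions, not systemtap's runtime: the Context class, MAX_DEPTH, and the call/fail/backtrace methods are all hypothetical names chosen for illustration.

```python
MAX_DEPTH = 8  # hypothetical nesting limit, not a systemtap constant


class Context:
    """Toy stand-in for a per-probe context (assumed, not the real struct)."""

    def __init__(self):
        self.frames = [None] * MAX_DEPTH  # one function-name slot per level
        self.depth = 0                    # current nesting depth
        self.last_error = None            # first error message, if any
        self.error_depth = 0              # depth noted when the error occurred

    def call(self, name, body):
        """Run body(ctx) as a nested 'function' called name."""
        if self.last_error is not None:
            return  # already unwinding: do not overwrite recorded slots
        if self.depth >= MAX_DEPTH:
            self.fail("nesting too deep")
            return
        self.frames[self.depth] = name
        self.depth += 1
        try:
            body(self)
        finally:
            self.depth -= 1  # the slot keeps its name for later reporting

    def fail(self, message):
        """Record only the first error and the depth at which it happened."""
        if self.last_error is None:
            self.last_error = message
            self.error_depth = self.depth

    def backtrace(self):
        """Walk the slots up to the noted depth, innermost frame first."""
        return list(reversed(self.frames[:self.error_depth]))


def inner(ctx):
    ctx.fail("division by zero")


def middle(ctx):
    ctx.call("inner", inner)


def outer(ctx):
    ctx.call("middle", middle)


if __name__ == "__main__":
    ctx = Context()
    ctx.call("outer", outer)
    print(ctx.last_error, "in", " <- ".join(ctx.backtrace()))
```

Nothing is formatted on the success path; the cost is one slot write per call, and the trace is only assembled once an error has been recorded. Storing a (name, line) pair in each slot instead of a bare name would be a straightforward variation.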

17:52 <agentzh> re which TEST would be the best demonstration why a macro isn't enough? not in those tests, i'll write you a more realistic one.

17:53 <fche> ok, curious

17:56 <agentzh> fche: https://pastebin.com/q0tunypb

17:56 <agentzh> not exactly correct, but it shows the idea.

17:57 <agentzh> for this particular example, we could also make delete statement work on the expression level.

17:57 <agentzh> so that a macro would also work.

17:57 <agentzh> but now, it can't.

17:57 <fche> so a plain statement-expression extension would do though?

17:57 <fche> the equivalent of the gnu-c ({ }) ?

17:58 <agentzh> for this very example, yes.

17:58 <agentzh> but because i'm working on a general c-to-stap compiler, it needs something much more general.

17:59 <fche> are you sure? the compiler could emit ({}) stuff too, if that existed

17:59 <fche> IOW I wonder whether that single general facility in the scripting language would make unnecessary other more intricate & policy changes (w.r.t. passing arrays by reference)

18:06 <agentzh> there can be complicated control flow inside the functions.

18:07 <fche> macros expand to anything; ({ }) hypothetically can do loops etc. too

18:07 <agentzh> fche: the changes are not big. the patch for stat-func-arg is minimal.

18:08 <agentzh> fche: it cannot do return, can it?

18:09 <fche> ({ }) doesn't exist yet, so indeed can't ... an early exit from the stmt block seems reasonable though

18:09 <agentzh> the function may want to return something early when some condition is met.

18:09 <agentzh> sorry for the confusion, i was talking about gnu C's ({...}).

18:10 <fche> aha; we could do more than they, as long as the concept is simple

18:10 <agentzh> one of the biggest hurdles is the macro expansion's resulting code size.

18:10 <agentzh> our stap functions can be quite large and have many call sites.

18:10 <agentzh> and those functions may call other functions further.

18:11 <agentzh> and it would also be tricky to get runtime backtraces for macros in case of runtime errors.

18:11 <fche> yeah ...

18:12 <agentzh> but i do agree ({...}) has its own merits.

18:12 <fche> wonder how seriously the c-to-stap case should be taken ... we take a lot of algorithmic shortcuts in the optimization / etc. passes, with the presumption that stap scripts just aren't that large

18:12 <agentzh> it's very handy for code emitters for many cases where functions are not needed.

18:12 <fche> but if you think you want to generate a ton of code - where inlined code size starts to matter - then I wonder if other parts of stap will bog you down at least as badly

18:13 <agentzh> fche: not yet, we used to manually port tons of C code to stap and they work pretty well in production.

18:13 <agentzh> now we decide to stop doing that manually, it's really painful :)

18:14 <agentzh> and once we hit another hurdle, we can always try fixing it :)

18:14 <agentzh> right now, the biggest hurdle is the function arg thing.

18:14 <agentzh> it's a showstopper for us.

18:14 <fche> would've been good to hear it before a lot of work was done

18:15 <fche> just to see if all the options were explored okay

18:15 <fche> how does your c-to-stap translator need passing arrays to functions ?

18:15 <agentzh> understood. we just have a lot of pressure from the business side. so we'll try our best.

18:15 <fche> what sorts of c constructs map to that?

18:15 orivej has quit [Ping timeout: 240 seconds]

18:15 <agentzh> it is used to emulate C's output arguments.

18:16 <agentzh> for example.

18:17 <agentzh> so ideally stap could support scalar's references in function arguments too.

18:17 <agentzh> now we use single-element arrays a lot, which is wasteful.

18:18 <agentzh> but working though.
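
The single-element-array workaround agentzh describes can be pictured in Python, the language he says the compiler's gdb backend emits. This is only a sketch of the general pattern, assuming a made-up function parse_uint; it is not output from, or code belonging to, the c-to-stap compiler.

```python
def parse_uint(text, out_value):
    """Emulate a C signature like `int parse_uint(const char *s, unsigned *out)`.

    Returns 0 on success and -1 on failure. On success the parsed number is
    stored in out_value[0]; the one-element list stands in for the pointer.
    (parse_uint is a hypothetical example function.)
    """
    if not (text.isascii() and text.isdigit()):
        return -1              # leave the caller's slot untouched on failure
    out_value[0] = int(text)
    return 0


if __name__ == "__main__":
    value = [0]                # one-element container acting as the "out" slot
    rc = parse_uint("42", value)
    print(rc, value[0])
```

The container is shared between caller and callee, so the write to out_value[0] is visible after the call returns. That sharing is exactly what pass-by-value function arguments cannot provide, which is why an array has to be allocated even when only one scalar comes back.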

18:19 <fche> could instead use one big array, and use indexes?

18:20 <agentzh> we could, but right now we expose a "builtin array" type to the language level. the source language is a superset of C11.

18:20 <agentzh> our compiler emits python arrays for such constructs in its gdb/python backend.

18:20 <agentzh> it has multiple backends, not just targeting stap.

18:22 <agentzh> so aggressive optimizations would require aggressive semantic analyses in our compiler.

18:25 <agentzh> my current stat-func-arg patch totals 400 lines, not much. most of the code is just small refactoring of existing code. true additions are much less.

18:25 <agentzh> i can show you what i already have.

18:25 <agentzh> a sec.

18:27 <fche> I'm concerned about the pass-by-reference change to the model too. not sure about that at all. maybe some new syntax for that?

18:34 <agentzh> i think arrays and stats should just be references.

18:34 <agentzh> that's natural.

18:34 <agentzh> it makes little sense to do C/C++'s copy-by-value by default.

18:35 mjw has quit [Quit: Leaving]

18:36 <agentzh> here we go: https://pastebin.com/sewLdTz4

18:44 <agentzh> it's still a bit messy due to the debugging code. not ready for formal review.

18:49 <fche> would have to think hard about making sure e.g. alias-detection algorithms work with this sort of thing, everywhere

18:51 <agentzh> *nod*

18:51 * agentzh has been thinking hard himself these days.

18:52 <agentzh> and that's also why i really want to get the official stap test suite running on my machines :)

18:54 <fche> yeah, definitely

19:56 tromey has quit [Quit: ERC (IRC client for Emacs 26.1.50)]

20:05 orivej has joined #systemtap

20:55 <agentzh> fche: how about adding a quit() tapset func instead of abort()? Because abort() in C also results in a core dump and an erroneous exit code for the current process, which is not what we need here.

20:55 <fche> sure

20:56 <agentzh> ok, thanks

20:58 <agentzh> re "or just store the function name in a context->locals[] array slot": yeah, i've been thinking along the same lines, though i also need the line numbers, not just the function names.

20:59 <agentzh> but we could also store all the info in an array which we can quickly look up at runtime in case of errors.

20:59 <agentzh> that would sound fun.

21:00 <agentzh> we'll definitely give it a shot at some point.

21:39 brolley has left #systemtap [#systemtap]

22:53 zodbot has quit [Disconnected by services]

23:02 agentzh has quit [*.net *.split]

23:04 zodbot has joined #systemtap

23:48 fche2 has joined #systemtap

23:49 fche has quit [Ping timeout: 240 seconds]