fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
<fche> hey
<fche> agentzh, if we need grammar changes like that, I don't oppose them strongly, but they need to be noted in NEWS and made conditional on systemtap_v, so older scripts can run with --compatible=3.2
<agentzh> fche: will do that. thanks. it makes sense.
<fche> yeah, and similarly the quit/abort function is a definite user-relevant change that should get a blurb in the NEWS
<agentzh> fche: gotcha :)
<agentzh> btw, you mean --compatible=3.3, right? i see the last release is 3.3.
<agentzh> fche: a quick question: does the item order in NEWS matter? or should i just append to the end of the existing list for 4.0?
<agentzh> it does not seem to be follow some apparent ordering or arrangement to me.
<agentzh> *following
<fche> that part's kind of a random order
<fche> t
<fche> the release notes unscramble it
<agentzh> okay, i see.
<agentzh> fche: i've seen that existing code uses something like has_version("2.3") to do the conditionals. what version number should i use? 3.3 or 4.0?
<agentzh> should it be the next release or the last release number?
<agentzh> scanning the git history for existing instances of has_version(), it seems to be the former.
<agentzh> sorry, the latter, a previous release number.
<fche> you can compare for <4.0 as opposed to <=3.2
<agentzh> i'll try reusing the parser class's input.has_version("") API.
<fche> yup
<fche> do we have any instances of this ?: ?: nesting already? I bet we do somewhere in the tapset
<agentzh> nesting ?: ?: should work as before.
<agentzh> it's just a ? b : c = 3 now requires to be rewritten as a ? b : (c = 3)
<agentzh> but yeah, i'll add some more tests to cover nested ?:
<agentzh> and also for various --compaitble xxx cases.
<fche> yeah. I think the parser has some code to detect version-sensitive constructs; might want to look for that and check that it triggers for these constructs
<agentzh> does --compatible specifies a version inclusive or exclusive? in other words, for --compatible 3.3, does it mean the incompatibility is introduced *after* 3.3, excluding 3.3?
<fche> from the stap man page, it is the version of stap release that the given script is designed to run on
<agentzh> okay, thanks. i got confused by the word "beyond" in the usage text.
<fche> (by the way, in your tests ... probe begin
<fche> { ... exit () }
<fche> may be expressed as probe oneshot { ... }
<agentzh> ah, that's a nice trick.
<agentzh> i'll update newly written tests.
<agentzh> do you want me to update tests in existing patches too ?'
<fche> doesn't matter
<agentzh> okay, gotcha.
<agentzh> thanks for the reminder.
<agentzh> *tip
<fche> hm, that parser warning I was looking for is not quite what this would need (to help us find any preexisting code that needs those extra parens)
<fche> it was elaborate.cxx systemtap_v_check-related code line 1935ish
<agentzh> fche: by running the full test suite, i only found one existing test case requiring extra parens in ?:
<agentzh> and i'll fix that in my v2 patch.
<agentzh> it was under systemtap.examples/, ansci_colors2.stp
<fche> ok. would be curious how you found that only one needed changing
<fche> wonder if there is syntax that would be accepted with both variants of the grammar, but resulting in a different AST and thus different semantics
<fche> that's the scary part of changing grammar
<fche> how do you know that you're done adapting code?
<fche> it could be simply safer to leave the code as is, and note in the docs the slightly different precedence than in C, and then have your generator tool emit extra parens
<agentzh> fche: i wrote a script to do auto comparison and filters out true new test failures (which is a bit tricky due to random test failures in some test files).
<fche> an 'stap -v -p1' would pretty-print parse trees
<agentzh> i had to run the full test suite against the current master and my own branch with the patch applied for several times. and use my own script to analyze the results.
<fche> comparing with & without --compatible=foo could also be okay
<agentzh> fche: in the case of lacking parentheses, there will be an error. the current stap error is very similar to gcc's. so i think we're good?
<agentzh> it's not a warning.
<agentzh> so we would not end up with different ASTs anyway.
<agentzh> use of binary assigment operators directly after ? : 's 3rd operand will always lead to a compile-time error.
<agentzh> so i believe we're safe here.
<agentzh> in the case of gcc for similar code patterns, a similar error is given as well.
<fche> good night dude
<agentzh> night
orivej has quit [Ping timeout: 252 seconds]
orivej has joined #systemtap
orivej has quit [Ping timeout: 272 seconds]
orivej has joined #systemtap
wcohen has joined #systemtap
ericlee has joined #systemtap
<ericlee> how aggregation works in systemtap? so what exactly "1" does in this: reads[execname(),pid()] <<< 1 ?
<ericlee> thanks
ericlee has quit [Remote host closed the connection]
ericlee has joined #systemtap
<ericlee> Running under fedora 27, and saw following errors:
<ericlee> Missing separate debuginfos, use: debuginfo-install kernel-core-4.17.3-100.fc27.x86_64
<ericlee> but the package exists already, any fix? Thanks
<agentzh> ericlee: version mismatch?
<agentzh> it has to be of the exact version number as `uname -r`.
<agentzh> "1" is a data item. it can be other integer values.
<agentzh> or a data point.
<agentzh> in your example, maybe that script only cares abut the "count", so the exact data point value does not really matter (1 is just convenient, which can also be any other number too).
<agentzh> but for "sum", "avg", "hist_log", and etc, the data point values definitely make huge difference.
<ericlee> thanks agentzh but I still don't understand the usage of <<<, can u give me some small example?
orivej has quit [Ping timeout: 252 seconds]
ericlee has quit [Remote host closed the connection]
orivej has joined #systemtap
slowfranklin has joined #systemtap
pwithnall has joined #systemtap
orivej has quit [Ping timeout: 244 seconds]
orivej has joined #systemtap
orivej has quit [Ping timeout: 276 seconds]
wcohen has quit [Ping timeout: 252 seconds]
naveen1 has joined #systemtap
naveen has quit [Ping timeout: 264 seconds]
<fche> agentzh, btw [man stap] does discuss <<<, too bad ericlee's not here
orivej has joined #systemtap
orivej has quit [Ping timeout: 272 seconds]
brolley has joined #systemtap
tromey has joined #systemtap
<agentzh> fche: *nod*
<agentzh> fche: i sent quit v2 patch last night: https://sourceware.org/ml/systemtap/2018-q3/msg00130.html
<agentzh> please let me know if i should do any further changes. thanks!
slowfranklin has quit [Quit: slowfranklin]
orivej has joined #systemtap
orivej has quit [Ping timeout: 245 seconds]
orivej has joined #systemtap
slowfranklin has joined #systemtap
slowfranklin has quit [Quit: slowfranklin]
slowfranklin has joined #systemtap
<agentzh> fche: also sent the ternary op v2 patch with changes you suggested: https://sourceware.org/ml/systemtap/2018-q3/msg00131.html
<fche> re. ternary, would suggest parse_expression() as the false branch rather than parse_assignment, to match the earlier behaviour
<fche> otherwise looks good, thanks!
slowfranklin has left #systemtap [#systemtap]
<agentzh> fche: okay, will make that change and commit :) thanks!
pwithnall has quit [Ping timeout: 268 seconds]
<agentzh> fche: btw, parse_assignment() is currently exactly the same as parse_expression() and i figured that it would be safer to be more specific in case we introduce operators with looser precedence above assignment. but anyway, i'll use parse_expression () as you wish :)
<fche> yeah, to guarantee compatibility
<agentzh> fche: changed and committed. thanks.
<fche> re. quit()
<fche> silly toolshed, but consider renaming quit -> abort throughout?
<fche> re. the bpf case, instead of printf("..."), you could just run error("aborting")
<fche> it shouldn't need both _set_exit_status() and exit()
<fche> a new test would be nice which exercises something other than boring old probe begin for abort purposes
<fche> maybe timer.profile .... which can easily run concurrently on different cpus
orivej has quit [Ping timeout: 268 seconds]
<agentzh> fche: re quit -> abort renaming, your call. i'll make that change if you want me to :)
<agentzh> re timer.profile tests, sure, i'll add some.
<fche> anyone else with opinions on that name?
<agentzh> fche: if we name it abort(), we still keep the stap exit code zero, right?
<agentzh> otherwise it won't work for our purposes.
<fche> sure. those who want a nonzero exit code can say error("something") abort()
<fche> hm, actually they can't :)
<agentzh> *nod*
<agentzh> oh, they can't. indeed.
<fche> because error() will already unroll the stack, the abort() is unreachable
<agentzh> *nod*
<agentzh> error() never returns.
<fche> so what we need here is fork()
<fche> :)
<fche> oh no got it
<fche> try{ error("")} catch {} abort()
<fche> just sort of rolls off the tongue :-)
<agentzh> yeah, this works.
<fche> the abort() documentation should point out that it cannot be try/catch caught
<agentzh> okay, noted.
<agentzh> will add that.
<agentzh> btw, for timer.profile tests, is it good enough to simply probe the current whole system? or do i need to invent a silly hot-looping userland C program with multiple threads?
<fche> shouldn't need any special workload
<agentzh> okay, got it. thanks.
<agentzh> fche: is it okay to commit this patch to fix my previously committed test files? https://sourceware.org/ml/systemtap/2018-q3/msg00133.html
<fche> will leave those to your discretion
<agentzh> fche: okay, thanks!
<agentzh> fche: i'm considering adding some bpf tests. it seems that it's common practice to copy tests over to the systemtap.bpf/ folder?
<agentzh> instead of using the get_runtime_list tcl proc?
<fche> bpf-only tests don't need the get_runtime_list
<agentzh> get_runtime_list does not include bpf yet.
<agentzh> so i wonder what's the way to test existing tests with bpf in the meantime time (before bpf is mature enough to go into get_runtime_list).
<agentzh> *the recommended way
<fche> follow the pattern of the existing bpf tests
<agentzh> okay
<agentzh> the current pattern is not suitable for collaborations IMHO since it's all in a giant bpf.exp test file.
<agentzh> it's easy to get conflicts.
<agentzh> and also hard to do auto-test-generation.
<agentzh> can we allow adding new .exp test files under systemtap.bpf/ ?
<agentzh> just like the other subdirectories (eg systemta.base)?
<fche> certainly
<agentzh> okay, cool.
<agentzh> then no problems for me :)
orivej has joined #systemtap
<agentzh> fche: the bpf runtime's error() tapset function has bugs which cannot be used for abort(): https://sourceware.org/bugzilla/show_bug.cgi?id=23580
<fche> that's ok
<agentzh> i'll leave it for now then.
<agentzh> fche: btw, is this timer.profile + abort() test case good enough? https://pastebin.com/6Daj4JHC
<agentzh> or do you have more tricky ones in your mind?
<fche> that one has the drawback that the global rest variable incurs mutex locks amongst the timer.profile probe handlers
<fche> I'd just abort() in the plain handler, plus add that printf("fire") message afterwards (to show that it is NOTREACHED)
<fche> that way they can run concurrently
<agentzh> fche: i'll try.
<agentzh> fche: like this? https://pastebin.com/pBHt8K2X
<fche> sure
<agentzh> great. thanks.
<agentzh> fche: btw, Zexuan Luo is also on our team. i asked him to hack on stap with me :)
<fche> great
<fche> all the work is a bit overwhelming :)
<agentzh> heh
<agentzh> we want to move fast :)
<agentzh> fche: abort v3 patch is sent: https://sourceware.org/ml/systemtap/2018-q3/msg00137.html
<agentzh> with all your suggestions implemented (hopefully)
<fche> yup. I don't know if it matters, but given that _stp_abort_flag is going to be set & read possibly by multiple cpus concurrently, maybe it should be an atomic_t like the session_state()
<agentzh> fche: i was thinking about that too, but i convinced myself that setting a flag in only one direction is always safe anyway?
<agentzh> or maybe i'm wrong here?
<fche> it's probably 99% safe
<fche> and yeah, an atomic read every place would probably suck for purposes of performance
<agentzh> your call :)
<agentzh> i don't mind rolling out a v4 patch.
<agentzh> the performance concern is indeed valid.
<fche> yeah, the reading part is done many times within typical probe handlers
<agentzh> i think the worst case is that one cpu reads a zero when it is setting to one.
<agentzh> this should be fine behavior, i guess?
<agentzh> checking the flag is not at real time anyway.
<fche> -if- that were the worst case, yes, but am not convinced that with zero concurrency control lower level insns, that is the case
<agentzh> cannot imagine cases worse than that.
<agentzh> setting is only to 1, from either 0 or 1.
<fche> heh, that's not quite the standard of proof for concurrency software :-)
<fche> it's the cross-probe fine-granularity aborting which is the difficulty here
<agentzh> not trying to offer a proof, but trying to learn something new :)
<agentzh> if this will blow up the cpu on the instruction level, then i would learn something new :)
<fche> I am humble enough not to be sure that nothing else can go wrong with a non-concurrency-controlled global flag
<agentzh> that's respectable :)
<agentzh> so what's the next move? waiting for more feedback?
<fche> hm, a couple of possibilities
<fche> could benchmark stap with an atomic_t alternative (both in the reading & writing cases), see if the effect is measurable
<fche> 2) could reconsider whether the systemwide instant stoppage of all probe handlers is actually necessary, or whether you can let the other handlers finish peacably
<fche> 3) could look for another mechanism, maybe lighter than atomic_t, just for simple flags
<agentzh> any candidates for 3)?
<agentzh> 2) does not work for us.
<fche> can you elaborate why?
<agentzh> 1) looks tricky to do fair benchmarks.
<agentzh> other handlers can 1) have side effects which clobber the tool's output, 2) make the exiting take much longer than desired.
<fche> much longer? how? probe handlers run fast (-DMAXACTION statements total)
<agentzh> fche: a sec. i need to think a bit more about it :)
<agentzh> so you are suggesting making abort() just abort the current probe handler instead of all the running handlers in 2)?
<agentzh> so we just turn abort() into a special error()?
<agentzh> and simply set and check c->aborted, for exmaple?
<agentzh> is that what you are saying?
<agentzh> i'm good with this change if that's what your 2) suggestion is :)
<agentzh> you want me to update the patch?
tromey has quit [Quit: ERC (IRC client for Emacs 26.1.50)]
<agentzh> fche: i wonder whether the existing _stp_exit_flag global has the same concurrency problem.
<fche> there is a chance
<fche> the main flag though is that genuine atomic_t session_state() value
<agentzh> okay
<agentzh> fche: btw, what do you think of this patch please? https://sourceware.org/ml/systemtap/2018-q3/msg00138.html
<agentzh> for bare return statements.
<fche> the parser part is more tricky than that; we do not have compulsory ; statement terminators
<fche> so return a = 4 would be parsed differently before & after
<fche> (the fact that return is a special statement that precludes execution of subsequent statements is a separate issue)
<fche> is there something difficult about return 0 for such cases?
<agentzh> no, return a = 4 should parse like before.
<agentzh> right now the parser only checks if the next token is ';' or '}'.
<agentzh> if so, then a bare return is recognized. otherwise it parses just like before.
<fche> ok, so a strict extension
<agentzh> i'm not sure how far we should go with omitting ';', since it is ambiguous per se. maybe we could also check for statement prefixes like `for`, `if`, `while` and etc?
<agentzh> return 0 looks like a hack and will change the current function's return type, which is not desired.
<fche> if it were for any other statement type (one that didn't abort control flow), yeah the other statement keywords following would have to be dealt with
<fche> change current function's return type ... if it was unknown, it changes it to long ... but nothing else. after all, nothing must be reading that value. so why is it a problem?
<agentzh> because certain function calls should never be used in expressions with the pseudo values. such use cases should lead to compilation errors.
<agentzh> it's just like why C should have void functions ;)
<agentzh> it's for better code and also docs.
<agentzh> and better compile-time error messages.
<fche> ok, not sure of the value, but there isn't any serious cost, so why not
<fche> one thing though: in the pretty-printer, it better print return; instead of return
<fche> unless you can be sure the parent structure will add that semicolon
<fche> the NEWS & stap man page would have to identify that this void return is only accepted at the end of a block or with a ;
<fche> (language/grammar changes also need to be mentioned in the man page)
<agentzh> fche: sure, will do. thanks!
<agentzh> i'll add more tests to cover those.
<agentzh> fche: re it better print return; i've just added a test case and it seems that the pretty printer already adds a semicolon at the end.
<agentzh> so we're already good :)
<agentzh> i'll add support for "return if ..." and etc.
<fche> no need really; just as long as docs are clear
<agentzh> ah, i already did.
<fche> and it's not that simple anyway, you may need to deal with symbols and other stuff too, not just the few statement keywords.
<agentzh> *nod* so i should remove it?
<fche> re. the pretty-printer, check in different contexts like if foo return while(1) return
<agentzh> okay
<agentzh> tested your example, still fine.
<agentzh> i'll add it to my test suite.
<agentzh> anyway
brolley has left #systemtap [#systemtap]
<agentzh> fche: just sent return stmt v3 patch according to your feedback: https://sourceware.org/ml/systemtap/2018-q3/msg00139.html
<agentzh> fche: just sent abort v4 patch: https://sourceware.org/ml/systemtap/2018-q3/msg00140.html
<agentzh> now it's using c->aborted just like the existing c->last_error flag.
<agentzh> please let me know if it's good enough now :)