fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
irker303 has joined #systemtap
<irker303> systemtap: wcohen systemtap.git:refs/heads/master * release-4.0-138-gc7232ec / testsuite/systemtap.examples/process/syscalls_by_pid.stp: Use statistical aggregates to reduce overhead and contention for global array http://tinyurl.com/y5qqhzj2
hpt has joined #systemtap
gromero has quit [Ping timeout: 252 seconds]
irker303 has quit [Quit: transmission timeout]
jistone has quit [Ping timeout: 250 seconds]
jistone has joined #systemtap
jistone has quit [Ping timeout: 245 seconds]
jistone has joined #systemtap
slowfranklin has joined #systemtap
chappar has joined #systemtap
slowfranklin has quit [Quit: slowfranklin]
slowfranklin has joined #systemtap
chappar has quit [Ping timeout: 256 seconds]
gila has joined #systemtap
DUKENUKE1 has joined #systemtap
DUKENUKEM has quit [Ping timeout: 252 seconds]
slowfranklin has quit [Quit: slowfranklin]
slowfranklin has joined #systemtap
hpt has quit [Ping timeout: 250 seconds]
mjw has joined #systemtap
wcohen has quit [Remote host closed the connection]
sscox has quit [Ping timeout: 246 seconds]
wcohen has joined #systemtap
orivej has quit [Ping timeout: 272 seconds]
sscox has joined #systemtap
mjw has quit [Ping timeout: 250 seconds]
mjw has joined #systemtap
orivej has joined #systemtap
irker481 has joined #systemtap
<irker481> systemtap: smakarov systemtap.git:refs/heads/master * release-4.0-138-g57d177c / bpf-translate.cxx: stapbpf PR22330 :: cleanup round 3 of n (sprintf typo fix) http://tinyurl.com/y55yurxd
<irker481> systemtap: smakarov systemtap.git:refs/heads/master * release-4.0-139-gc497891 / testsuite/systemtap.bpf/bpf_tests/string3.stp testsuite/systemtap.bpf/bpf_tests/string4.stp: stapbpf PR22330,PR23816 :: add testcases http://tinyurl.com/y5a6nxu7
<irker481> systemtap: smakarov systemtap.git:refs/heads/master * release-4.0-140-gf1e1d05 / stapbpf/stapbpf.cxx: stapbpf PR22330 fix :: support for non-contiguous active cpus http://tinyurl.com/y5nud8ao
<irker481> systemtap: smakarov systemtap.git:refs/heads/master * release-4.0-141-g3b3f974 / stapbpf/bpfinterp.cxx: stapbpf PR22330 cleanup and fixes :: note PR24358 may affect code in bpfinterp.cxx http://tinyurl.com/y3nx24hx
<irker481> systemtap: smakarov systemtap.git:refs/heads/master * release-4.0-156-gab7f8b7 / : Merge branch 'stapbpf/pr22330': generate printf via event tuples, userspace formatting postprocessing http://tinyurl.com/yyqw56fs
tromey has joined #systemtap
orivej has quit [Ping timeout: 245 seconds]
fche2 is now known as fche
khaled has joined #systemtap
orivej has joined #systemtap
<irker481> systemtap: wcohen systemtap.git:refs/heads/master * release-4.0-157-g0664080 / testsuite/systemtap.examples/io/iodevstats.stp testsuite/systemtap.examples/io/iostats.stp testsuite/systemtap.examples/io/iotop.stp: Use statistical aggregates in iodevstats.stp, iostats.stp, and iotop.stp http://tinyurl.com/yxtk2orb
slowfranklin has quit [Quit: slowfranklin]
<irker481> systemtap: wcohen systemtap.git:refs/heads/master * release-4.0-158-g2282f2c / testsuite/systemtap.examples/process/sig_by_pid.stp testsuite/systemtap.examples/process/sig_by_proc.stp testsuite/systemtap.examples/process/syscalls_by_proc.stp: Optimize sig_by_pid.stp, sig_by_proc.stp, and syscalls_by_proc.stp http://tinyurl.com/y24mjnjo
wcohen has quit [Ping timeout: 246 seconds]
<agentzh> fche: is that patch good enough?
slowfranklin has joined #systemtap
wcohen has joined #systemtap
<agentzh> ok, just saw your email reply.
<fche> righto
<agentzh> fche: what you suggested makes sense but i'm worried that it still has to iterate through all the global vars over and over again
<agentzh> my patch can skip that loop as well.
<agentzh> our biggest tool also has a lot of globals.
<agentzh> and that visit_embeddedcode method is called extremely frequently.
<agentzh> string_find_memoized() should be made a hash table in its own right.
<agentzh> i agree with that.
<fche> suggest changing the global to a hash table first as the quickest test
<agentzh> but that thing is also a brute force cache which trades a *lot* of memory for CPU speed.
<fche> second best would be moving the tables to a member inside the embedded* objects
<fche> it shouldn't be a lot of memory, with interned strings
<agentzh> large stp scripts do have a lot of distinct names and code.
<agentzh> yeah, i'll do some measurement.
<agentzh> for the hash table thing.
<fche> thanks. a sample large generated stp file for others to test too would be good
<fche> maybe a new PR for the artifacts / conversation ?
wcohen has quit [Ping timeout: 246 seconds]
wcohen has joined #systemtap
<agentzh> a standalone script might be a bit tricky since our script also uses some stap features which haven't been merged into the mainline stap. but i'll see what we can do better.
<agentzh> *how we can do better
<agentzh> i'm trying both of your suggestions regarding the tagged_p thing.
<fche> ok
<agentzh> separately
<agentzh> is STL's unordered_map good enough?
<fche> yes
<agentzh> okay, thanks
<fche> back a month ago when I worked in this area, the globals we worked with there were all synthetic
<fche> so some of the underlying algorithmic matters could be bypassed
<fche> but not the case for your scripts
<agentzh> ah, i need to prepare my own hash functions for the pair...
<agentzh> fche: yeah, i was aware of that a bit.
<fche> hash left ^ hash right
<agentzh> aye
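A minimal sketch of the pair hash fche describes ("hash left ^ hash right"), for use with std::unordered_map. The names tag_key, tag_key_hash, tag_cache and tagged_p_cached are hypothetical and are not taken from the systemtap sources; the key is assumed to be (embedded code text, tag).

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical key type: (embedded code text, tag). Not systemtap's real types.
using tag_key = std::pair<std::string, std::string>;

// std::hash has no specialization for std::pair, so one has to be supplied.
struct tag_key_hash {
  std::size_t operator()(const tag_key& k) const {
    std::hash<std::string> h;
    // "hash left ^ hash right": simple, but symmetric, so (a, b) and (b, a)
    // produce the same hash value.
    return h(k.first) ^ h(k.second);
  }
};

using tag_cache = std::unordered_map<tag_key, bool, tag_key_hash>;

// Memoized substring test: look up (code, tag) first, search only on a miss.
bool tagged_p_cached(tag_cache& cache, const std::string& code,
                     const std::string& tag) {
  tag_key key(code, tag);
  auto it = cache.find(key);
  if (it != cache.end())
    return it->second;
  bool found = code.find(tag) != std::string::npos;
  cache.emplace(std::move(key), found);
  return found;
}
```

Note that every lookup has to hash the full code string, which typically costs time proportional to its length.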
<agentzh> does it look alright?
<agentzh> is it what you suggested?
<agentzh> with this patch, it's much slower :(
<agentzh> -p2 time increases from 32s to 53s. ouch.
<fche> weird, any theory why?
slowfranklin has quit [Quit: slowfranklin]
<agentzh> i have proof :)
<agentzh> it's the hash function being the bottleneck.
<agentzh> so a localized hash table won't help much here either.
<fche> a localized hash table doesn't need to hash the embeddedcode c string
<fche> only the /* tag */
<fche> and there are only like six of them so the table will be small
<agentzh> okay, that makes sense.
<agentzh> then we'll need to clear the cache every time the ->code value changes.
<agentzh> i'll try this.
<fche> yup, but I believe it never actually changes after creation
<agentzh> the only places i found are the python2/3 code.
<fche> (we don't enforce that at the moment, but could in some ways)
<agentzh> as handled in my original patch.
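A rough sketch of the "localized hash table" idea discussed above: each embedded-code object keeps its own small cache keyed only by the tag, and the cache is cleared whenever the code text is replaced. The class embedded_node_sketch and its members are hypothetical stand-ins, not systemtap's actual embeddedcode/embedded_expr classes.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an embedded-code AST node.
class embedded_node_sketch {
public:
  explicit embedded_node_sketch(std::string code) : code_(std::move(code)) {}

  // Replacing the code text invalidates every cached answer.
  void set_code(std::string code) {
    code_ = std::move(code);
    tag_cache_.clear();
  }

  // Only the short tag is hashed; the long code string is searched at most
  // once per distinct tag.
  bool tagged_p(const std::string& tag) const {
    auto it = tag_cache_.find(tag);
    if (it != tag_cache_.end())
      return it->second;
    bool found = code_.find(tag) != std::string::npos;
    tag_cache_.emplace(tag, found);
    return found;
  }

  std::size_t cached_tags() const { return tag_cache_.size(); }

private:
  std::string code_;
  // mutable so that the logically-const query can fill the cache.
  mutable std::unordered_map<std::string, bool> tag_cache_;
};
```

With only a handful of distinct tags, each per-object table stays tiny.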
slowfranklin has joined #systemtap
<irker481> systemtap: wcohen systemtap.git:refs/heads/master * release-4.0-159-g45b9dfe / testsuite/systemtap.examples/network/nettop.stp: Optimize nettop.stp example http://tinyurl.com/y6q8jaxc
<fche> I don't see the python embeddedcode usage -change- the text string after creation (almost immediately after object ctor)
<fche> that should be ok too
<fche> btw
<fche> my little firefox copy is driving itself mad trying to render your svg
<agentzh> fche: okay, i didn't look closer.
<agentzh> yeah, that svg is huge.
slowfranklin has quit [Quit: slowfranklin]
<agentzh> i have a strong box.
<agentzh> intel core i9-9900k + 64g ram.
<fche> mine's ok too, but firefox is not well equipped to process those things
<fche> gets into uninterruptible loops
khaled has quit [Quit: Konversation terminated!]
<agentzh> fche: 2nd attempt: https://pastebin.com/GZ2kKBEL
<agentzh> using embedded* object level caches.
<agentzh> does it look sane?
<agentzh> seems like we should also cache the interned_string keys otherwise interned_string's constructor is the bottleneck...
<agentzh> we always pass in const char * strings in those tagged_p() method calls..
<agentzh> currently with that patch, the p2 time only reduces by 20%. not much.
<fche> looking
<fche> wouldn't bother making the tag an interned string in this case
<fche> that'd generate extra traffic to the boost intern widgetry for what are basically temporary values
<fche> so suggest using only std::string and perhaps char*, not interned-string as the tag parameter type
<agentzh> yeah, i'm thinking about char * as well.
<agentzh> just a bit dangerous.
<fche> std::string it is
<agentzh> okay
<agentzh> trying
<agentzh> fche: tried std::string
<agentzh> p2 time has reduced by 55%. better, still far from my original patch's 93%.
<fche> would be interested in looking deeper into why, but for that some more repro stp scripts would be good
<agentzh> i sampled a new flame graph
<agentzh> now it's mostly hashtable::find() and std::string's constructor and destructor calls.
<agentzh> all derived from tagged_p()
<agentzh> i'll show you my latest patch just in case i could have done better.
<agentzh> here it is: https://pastebin.com/kygW58jN
<agentzh> do i need to try char *?
<fche> would ditch the tagged_p interned_string variant completely
<fche> no need for char*, would ditch that too
orivej has quit [Ping timeout: 268 seconds]
<agentzh> okay
<fche> and then I'd start looking at the number of calls to this thing and why there are so many, whether one can pipe down that traffic
<agentzh> ditched. similar.
<agentzh> 18s
<agentzh> maybe a little bit better.
<agentzh> it was 20s or something.
<agentzh> latest patch: https://pastebin.com/vCbxcwdf
<agentzh> fche: the varuse_collecting_visitor calls are initiated mostly from opt5's statement reducer and opt2's unused func/global analyzer.
<agentzh> see this rendered flame graph's screenshot: https://ibb.co/dPzWsMS
<agentzh> it won't block your firefox ;)
<fche> the patch looks good, imho let's go with that for the moment and then think more on how to reduce the number of those calls
<fche> AW MAN, getting spoonfed gif's now :-)
<agentzh> heh
<fche> it's ok, I can switch browsers, I have like five
<agentzh> lol
<fche> here in canada, it's lawful to have up to four separate web browsers attached
<fche> if I install the fifth, I have to pay a special license to fund canadian content web programs
<fche> the sixth requires ministerial permission
<fche> NO ONE HAS SEVEN
<agentzh> btw, my .stp script is 382KB and has 3387 lines (including empty lines and comment lines).
<agentzh> so it's of some size.
<fche> that's ok, police will permit text files up to 400 kb
<fche> so we are good
<agentzh> regarding the 4 separate web browsers requirement in canada, that's a very interesting law.
<agentzh> no one has seven...lol
<agentzh> is that real?
<fche> hard to be sure
<agentzh> or just a joke?
<fche> there are rumours of browser criminals
<agentzh> okay...
<fche> who have never been heard since they put a thumb drive
<fche> with windows 3.1 copies of netscape
<agentzh> one interesting feature of my .stp script is that it has 578 @cast() invocations.
<fche> aha
<agentzh> so it might be easy to reproduce just with that many @cast().
<agentzh> and it has ~40 global vars defined.
<agentzh> not all of them are used though.
<agentzh> and it has 79 functions defined by itself.
<agentzh> and it just uses the standard tapset (mostly).
<agentzh> basic stats for my .stp script :)
<fche> ok. stap --vp 04 should give some meaningful info on how the rewriting / analysis process goes
<fche> 'course then the resulting log file might be CONTRABAND in this nation
<agentzh> oh, i gathered so, just could not remember. thanks.
<agentzh> i'll try
<fche> unless one reads it backward
<agentzh> fche: i just tried another 154KB .stp script, which was even slower (48s for pass 2).
<agentzh> it uses no @cast().
<agentzh> but 485 user_intXX() and user_uintXX() calls.
<agentzh> the flame graphs are similar, lots of varuse_collecting_visitor::visit_embeddedcode() overhead.
<agentzh> both of them have many embeddedcode nodes in the parsetree or AST.
<agentzh> fche: i didn't know you are in canada.
<agentzh> thought you were on the US east coast.
wcohen has quit [Ping timeout: 245 seconds]
<fche> pretty much the same thing
tromey has quit [Quit: ERC (IRC client for Emacs 26.1)]
<agentzh> heh
<agentzh> my patch makes the 2nd script take only 3.3s. so it's 48s vs 3.3s. 93% of the time is gone :)
<agentzh> *my original patch using boolean flags.
<fche> I'd be concerned about some cases not working right, in the form of mismatches between the boolean flags and the text content
<fche> would like to reproduce the run against a script running on git-master stap
<fche> before digging much deeper
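One way to picture the boolean-flag approach and the mismatch concern, as a sketch only: this is not agentzh's actual patch. The flags are computed once from the text at construction, and the text is made immutable so the two cannot drift apart. The class flagged_code_sketch, the tag_bit enum and the exact tag strings are assumptions for illustration.

```cpp
#include <string>
#include <utility>

// Hypothetical tag set; the tag strings checked below are assumptions.
enum tag_bit : unsigned {
  TAG_PURE = 1u << 0,
  TAG_UNPRIVILEGED = 1u << 1,
  TAG_GURU = 1u << 2,
};

class flagged_code_sketch {
public:
  // code_ is declared before flags_, so it is initialized first and scan()
  // sees the final text.
  explicit flagged_code_sketch(std::string code)
      : code_(std::move(code)), flags_(scan(code_)) {}

  // Constant-time query: no hashing and no string search.
  bool has(tag_bit b) const { return (flags_ & b) != 0; }
  const std::string& code() const { return code_; }

private:
  // Scan the text once for each known tag.
  static unsigned scan(const std::string& c) {
    unsigned f = 0;
    if (c.find("/* pure */") != std::string::npos) f |= TAG_PURE;
    if (c.find("/* unprivileged */") != std::string::npos) f |= TAG_UNPRIVILEGED;
    if (c.find("/* guru */") != std::string::npos) f |= TAG_GURU;
    return f;
  }

  // Both members are const: the flags can never disagree with the text.
  const std::string code_;
  const unsigned flags_;
};
```

If the text had to stay mutable, every writer would need to rescan it, which is where flags and content could fall out of sync.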
gila has quit [Quit: My Mac Pro has gone to sleep. ZZZzzz…]
<agentzh> created a PR to summarize everything so far.
<agentzh> also attached an artificial sample script generated by a perl script to demonstrate the problem.
<agentzh> hopefully you now have everything you need :)
<fche> thanks
<fche> will take a peek at it tomorrow
<agentzh> thanks
<agentzh> i've verified the flame graphs of this generated script.
<agentzh> it's indeed the same thing.
<agentzh> and the flame graphs show that this script is much easier to play with than my original script.
<agentzh> much less noise.
<agentzh> flame graph is also now much smaller and won't push your browsers to the limit: http://openresty.org/misc/flamegraph/stap-unordered-map-generated-script-2019-03-19.svg
<agentzh> the inverted one is much larger but still bearable.
<agentzh> posted the graphs to the PR as well.
wcohen has joined #systemtap
przemoc has joined #systemtap
orivej has joined #systemtap
irker481 has quit [Quit: transmission timeout]