fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
khaled has quit [Quit: Konversation terminated!]
_whitelogger has joined #systemtap
ppetraki has quit [Ping timeout: 268 seconds]
Guest65867 has quit [Remote host closed the connection]
Guest65867 has joined #systemtap
Guest65867 has quit [Read error: Connection reset by peer]
Guest65867 has joined #systemtap
Guest65867 has quit [Read error: Connection reset by peer]
Guest65867 has joined #systemtap
KDr2 has joined #systemtap
KDr2 has quit [Quit: Connection closed for inactivity]
_whitelogger has joined #systemtap
sscox has quit [Ping timeout: 245 seconds]
_whitelogger has joined #systemtap
Guest65867 has quit [Ping timeout: 268 seconds]
Guest65867 has joined #systemtap
khaled has joined #systemtap
_whitelogger has joined #systemtap
orivej has joined #systemtap
khaled_ has joined #systemtap
khaled has quit [Ping timeout: 265 seconds]
orivej has quit [Ping timeout: 240 seconds]
orivej has joined #systemtap
khaled_ has quit [Remote host closed the connection]
khaled has joined #systemtap
khaled has quit [Remote host closed the connection]
khaled has joined #systemtap
khaled has quit [Remote host closed the connection]
khaled has joined #systemtap
orivej has quit [Ping timeout: 240 seconds]
khaled has quit [Ping timeout: 268 seconds]
khaled has joined #systemtap
sscox has joined #systemtap
<agentzh> fche: i just tried a simple patch to parallelise the "autoconf" phase for generating stapconf.h. the total Pass-3 time for a simple stap script reduces from 7.7s to 2.1s on my 8c/16t machine when the cache is cold.
<agentzh> also, tried to make stap-symbols.h a separate CU and knocked out 100ms ~ 300ms from Pass-3 for cases with nontrivial unwind/symbol data as well.
<agentzh> the latter is not very impressive since stap-symbols.h is already compiling fast anyway.
<fche> yeah not much in there
<fche> hm surprised the stapconf autoconf stuff wasn't already parallel
<fche> interested in your findings!
<agentzh> also played with gcc's .h.gch thing. got ~200ms knocked off for simple stap scripts from Pass-3.
<agentzh> fche: it uses serial make commands and >> autoconf.h redirects.
<fche> ah
<agentzh> i changed it to separate sub-files and cat everything together.
<fche> yup, or cat | sort for reproducibility
<agentzh> and also write the stapexport macros directly to file instead of using many echo xxx >> stapconf.h in makefile.
<agentzh> i already use $^ make variable to make the order the same as in the buildrun.cxx source file.
<agentzh> so already reproducible.
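The approach agentzh describes here (each check writes its own sub-file, then everything is concatenated in a fixed order) can be sketched roughly as follows. This is only an illustration, not the actual patch: the check functions, the macro names, and the file layout are all hypothetical, and the real checks compile small test programs against the kernel headers rather than returning fixed strings.

```python
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


# Hypothetical feature checks. Each one returns the text of one header fragment.
def check_a() -> str:
    time.sleep(0.05)  # deliberately the slowest, so it finishes last
    return "#define STAPCONF_EXAMPLE_A 1\n"


def check_b() -> str:
    return "#define STAPCONF_EXAMPLE_B 1\n"


def check_c() -> str:
    return "#define STAPCONF_EXAMPLE_C 1\n"


# The list order plays the role of the prerequisite order ($^) in the makefile.
CHECKS = [("a", check_a), ("b", check_b), ("c", check_c)]


def write_fragment(outdir: Path, name: str, check) -> Path:
    """Run one check and write its result to its own sub-file."""
    path = outdir / f"{name}.h"
    path.write_text(check())
    return path


def generate(outdir: Path, checks) -> str:
    """Run all checks in parallel, then concatenate fragments in list order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(write_fragment, outdir, n, c) for n, c in checks]
        # Results are collected in submission order, not completion order,
        # so the combined file is identical from run to run.
        paths = [f.result() for f in futures]
    combined = "".join(p.read_text() for p in paths)
    (outdir / "stapconf.h").write_text(combined)
    return combined


def demo() -> str:
    with tempfile.TemporaryDirectory() as d:
        return generate(Path(d), CHECKS)
```

Because no two workers append to the same file, there is no interleaving to worry about, and the output order is fixed by the list rather than by which check happens to finish first.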
<fche> nice
<agentzh> so interested in a patch?
<fche> definitely
<agentzh> great. i'll submit it soon.
<agentzh> to the ml
<agentzh> i mean
<agentzh> i'm also thinking about making the map.c, addrs.c separate CUs.
<agentzh> as well as splitting up the main xxx_src.c files by the amount of code for bunches of stap global/private functions.
<agentzh> not easy there.
<fche> we haven't paid much attention to CU issues, so header file ordering & organizing stuff for the runtime has been unnecessary
<agentzh> yeah, it takes time to clean those macros up.
<fche> experimentation ok there, I'm not as optimistic
<agentzh> some macros do matter for memory layout and branched code.
<agentzh> i gather there would be gain for large .stp scripts (as with our cases).
<fche> could be
<agentzh> for small scripts, it's already fast enough (1 ~ 1.3s on my machine).
<agentzh> for Pass-3
<fche> pass-4
<agentzh> Sorry, Pass-4.
<agentzh> i referred to Pass-4 everywhere above. sorry.
<fche> heh ;)
<agentzh> Pass-3 is for C code generation.
<agentzh> oh btw, i also implemented a separate --gen-stapconf FILE and a --use-stapconf FILE option for external stapconf file caching and management.
<agentzh> i found stap's own cache not flexible enough.
<fche> ok, interested in learning more re. motive etc.
<agentzh> like encoding the kernel build tree dir name into the cache key and blind cache cleanup without LRU policies, etc.
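For readers unfamiliar with the term, an LRU policy evicts whichever entry was used least recently instead of cleaning up blindly. A minimal sketch, assuming a small in-memory cache with string keys; the class and its methods are hypothetical and say nothing about how stap's own cache is implemented:

```python
from collections import OrderedDict


class LRUCache:
    """Keep at most `capacity` entries, evicting the least recently used one."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first, newest last

    def get(self, key: str):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # a hit makes the entry "recent" again
        return self._entries[key]

    def put(self, key: str, value: str) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used
```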
<agentzh> furthermore we could pre-compile stapconf.h with the kernel header packages, just like we could pre-compile .ko module files for each kernel package.
<agentzh> for the latter, we cannot always do that due to userland process changes.
<agentzh> but stapconf.h only depends on the kernel header and the stap version.
<agentzh> so the cache hit rate is much higher.
<agentzh> then when building .ko, we do not need to pay for the stapconf generation phase at all.
<agentzh> it's gone.
<agentzh> that part is like 600ms for an 8c/16t CPU.
<agentzh> or 700ms
<agentzh> even fully parallelized.
<agentzh> that's a lot.
<fche> if the caching were to work properly, precompiling would not be necessary
<fche> just stap -p4 -e 'probe oneshot{}'
<fche> to 'precompile'
<agentzh> working with stap's own caching is tricky and also we have a lot of different kernels and a lot of build machines, the first time penalty is still a thing.
<fche> well, different kernels mean different stapconf* regardless
<agentzh> fche: yeah, i've already been using that for my own dev box, it works. but for distributed env, no, that's painful...
<agentzh> yeah, but we can iterate all the kernel packages in a linux distro beforehand.
<agentzh> and index the resulting stapconf.h in a database.
<agentzh> so for build boxes generating ko files, they just provide the stapconf.h readily.
<agentzh> by looking up the db.
<agentzh> simple and reliable.
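The database idea above could look something like the following sketch. It assumes the index is keyed on kernel release plus stap version, following agentzh's remark that stapconf.h depends only on the kernel headers and the stap version. The schema, function names, and the use of SQLite are all assumptions made for illustration; the log does not say what database is actually used.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical index of pre-generated stapconf.h files."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS stapconf (
               kernel_release TEXT NOT NULL,
               stap_version   TEXT NOT NULL,
               header         TEXT NOT NULL,
               PRIMARY KEY (kernel_release, stap_version)
           )"""
    )
    return db


def store(db: sqlite3.Connection, kernel_release: str, stap_version: str,
          header: str) -> None:
    """Record the stapconf.h text generated for one kernel package."""
    db.execute(
        "INSERT OR REPLACE INTO stapconf VALUES (?, ?, ?)",
        (kernel_release, stap_version, header),
    )
    db.commit()


def lookup(db: sqlite3.Connection, kernel_release: str, stap_version: str):
    """Return the stored header text, or None if this pair was never indexed."""
    row = db.execute(
        "SELECT header FROM stapconf WHERE kernel_release = ? AND stap_version = ?",
        (kernel_release, stap_version),
    ).fetchone()
    return row[0] if row else None
```

A build box would call `lookup` with its target kernel and stap version, write the result to a file, and hand that file to something like the `--use-stapconf FILE` option agentzh mentions, falling back to normal generation when the lookup returns None.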
<fche> well, requires someone to manage the database etc.
<fche> but sure, I wouldn't veto that feature, even if it's only rarely useful
<agentzh> but i agree it's a special use case and 99.9% of the stap users do not bother.
<agentzh> we can keep this thing to ourselves.
<agentzh> just sharing.
<fche> re. stapconf parallel generation, definite interest & usefulness there.
<agentzh> sure, will do.
<agentzh> they are separate patches.
* fche must divert to dinner-making, have a good $time_of_day !
<agentzh> thanks! and have a good day!
khaled has quit [Quit: Konversation terminated!]
orivej has joined #systemtap
orivej has quit [Ping timeout: 240 seconds]