fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
<agentzh>
fche: i just tried a simple patch to parallelise the "autoconf" phase for generating stapconf.h. the total Pass-3 time for a simple stap script reduces from 7.7s to 2.1s on my 8c/16t machine when the cache is cold.
<agentzh>
also, tried to make stap-symbols.h a separate CU and knocked out 100ms ~ 300ms from Pass-3 for cases with nontrivial unwind/symbol data as well.
<agentzh>
the latter is not very impressive since stap-symbols.h is already compiling fast anyway.
<fche>
yeah not much in there
<fche>
hm surprised the stapconf autoconf stuff wasn't already parallel
<fche>
interested in your findings!
<agentzh>
also played with gcc's .h.gch thing. got ~200ms knocked off for simple stap scripts from Pass-3.
<agentzh>
fche: it uses serial make commands and >> autoconf.h redirects.
<fche>
ah
<agentzh>
i changed it to separate sub-files and cat everything together.
<fche>
yup, or cat | sort for reproducibility
<agentzh>
and also write the stapexport macros directly to file instead of using many echo xxx >> stapconf.h in makefile.
<agentzh>
i already use $^ make variable to make the order the same as in the buildrun.cxx source file.
<agentzh>
so already reproducible.
<fche>
nice
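The approach agentzh describes (one output file per check, concatenated afterwards in a fixed order) can be illustrated with a minimal Python sketch. This is not the actual patch, which works at the make level. The probe names and the `run_probe` stand-in are hypothetical; the real checks compile small test programs against the kernel headers.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical probe names; the real list is generated from buildrun.cxx.
PROBES = ["STAPCONF_A", "STAPCONF_B", "STAPCONF_C"]


def run_probe(name: str, outdir: Path) -> Path:
    """Stand-in for one compile test: write the result to a private fragment."""
    fragment = outdir / f"{name}.h"
    fragment.write_text(f"#define {name} 1\n")
    return fragment


def build_stapconf(probes, outdir: Path) -> str:
    # Each probe writes its own file, so workers never append to a shared
    # output (the serial ">> autoconf.h" pattern is what prevented parallelism).
    with ThreadPoolExecutor() as pool:
        fragments = list(pool.map(lambda p: run_probe(p, outdir), probes))
    # pool.map returns results in input order, so the concatenated header is
    # identical from run to run regardless of which probe finished first.
    return "".join(f.read_text() for f in fragments)


with tempfile.TemporaryDirectory() as d:
    header = build_stapconf(PROBES, Path(d))
```

Concatenating in input order plays the same role as relying on `$^` in the makefile: the result is reproducible without needing a sort step.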
<agentzh>
so interested in a patch?
<fche>
definitely
<agentzh>
great. i'll submit it soon.
<agentzh>
to the ml
<agentzh>
i mean
<agentzh>
i'm also thinking about making the map.c, addrs.c separate CUs.
<agentzh>
as well as splitting up the main xxx_src.c files by the amount of code for bunches of stap global/private functions.
<agentzh>
not easy there.
<fche>
we haven't paid much attention to CU issues, so header file ordering & organizing stuff for the runtime has been unnecessary
<agentzh>
yeah, it takes time to clean those macros up.
<fche>
experimentation ok there, I'm not as optimistic
<agentzh>
some macros do matter for memory layout and branched code.
<agentzh>
i gather there would be gain for large .stp scripts (as with our cases).
<fche>
could be
<agentzh>
for small scripts, it's already fast enough (1 ~ 1.3s on my machine).
<agentzh>
for Pass-3
<fche>
pass-4
<agentzh>
Sorry, Pass-4.
<agentzh>
i referred to Pass-4 everywhere above. sorry.
<fche>
heh ;)
<agentzh>
Pass-3 is for C code generation.
<agentzh>
oh btw, i also implemented a separate --gen-stapconf FILE and a --use-stapconf FILE option for external stapconf file caching and management.
<agentzh>
i found stap's own cache not flexible enough.
<fche>
ok, interested in learning more re. motive etc.
<agentzh>
like encoding the kernel build tree dir name into the cache key and blind cache cleanup without LRU policies, etc.
<agentzh>
furthermore we could pre-compile stapconf.h with the kernel header packages, just like we could pre-compile .ko module files for each kernel package.
<agentzh>
for the latter, we cannot always do that due to userland process changes.
<agentzh>
but stapconf.h only depends on the kernel header and the stap version.
<agentzh>
so the cache hit rate is much higher.
<agentzh>
then when building .ko, we do not need to pay for the stapconf generation phase at all.
<agentzh>
it's gone.
<agentzh>
that part is like 600ms for an 8c/16t CPU.
<agentzh>
or 700ms
<agentzh>
even fully parallelized.
<agentzh>
that's a lot.
<fche>
if the caching were to work properly, precompiling would not be necessary
<fche>
just stap -p4 -e 'probe oneshot{}'
<fche>
to 'precompile'
<agentzh>
working with stap's own caching is tricky and also we have a lot of different kernels and a lot of build machines, the first time penalty is still a thing.
<fche>
well, different kernels mean different stapconf* regardless
<agentzh>
fche: yeah, i've already been using that for my own dev box, it works. but for distributed env, no, that's painful...
<agentzh>
yeah, but we can iterate all the kernel packages in a linux distro beforehand.
<agentzh>
and index the resulting stapconf.h in a database.
<agentzh>
so for build boxes generating ko files, they just provide the stapconf.h readily.
<agentzh>
by looking up the db.
<agentzh>
simple and reliable.
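A rough sketch of the lookup idea, assuming the index is keyed on the kernel release string and the stap version (the two things agentzh says stapconf.h depends on). The table layout, function names, and the example release and version strings are all hypothetical; nothing here is part of systemtap itself.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical index of pre-generated stapconf.h files."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS stapconf ("
        " kernel_release TEXT NOT NULL,"
        " stap_version TEXT NOT NULL,"
        " header TEXT NOT NULL,"
        " PRIMARY KEY (kernel_release, stap_version))"
    )
    return db


def store(db, kernel_release: str, stap_version: str, header: str) -> None:
    # Re-indexing the same kernel/stap pair overwrites the previous entry.
    db.execute(
        "INSERT OR REPLACE INTO stapconf VALUES (?, ?, ?)",
        (kernel_release, stap_version, header),
    )
    db.commit()


def lookup(db, kernel_release: str, stap_version: str):
    """Return the stored header text, or None when this pair was never indexed."""
    row = db.execute(
        "SELECT header FROM stapconf"
        " WHERE kernel_release = ? AND stap_version = ?",
        (kernel_release, stap_version),
    ).fetchone()
    return row[0] if row else None


db = open_index()
store(db, "5.14.0-example", "4.9", "#define STAPCONF_A 1\n")
hit = lookup(db, "5.14.0-example", "4.9")
miss = lookup(db, "6.1.0-example", "4.9")
```

On a hit, a build box would write the header to a file and hand it to stap (for example via the `--use-stapconf FILE` option agentzh mentions from his own patch). On a miss it would fall back to generating stapconf.h as usual.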
<fche>
well, requires someone to manage the database etc.
<fche>
but sure, I wouldn't veto that feature, even if it's only rarely useful
<agentzh>
but i agree it's a special use case and 99.9% of the stap users do not bother.