fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
<agentzh>
fche: okay, thanks.
<agentzh>
but loading ko module is way *before* the runtime kicks in?
<agentzh>
or are you suggesting using the runtime allocator to replace the statically allocated map and stats in the ko image?
<fche>
there is not much memory allocation at module load time
<fche>
most of it is at staprun-starting-to-talk-with-it, later on
<fche>
if even the module load phase is rejected, then I dunno if there's anything stap could do about that
<agentzh>
interesting.
<agentzh>
then there might be something deeper here.
<fche>
hm this reminds me .... methinks we've talked/thought about this a long time ago in passing
<agentzh>
yeah, maybe.
<fche>
staprun could automagically do a drop-caches kernel memory freeup effort if the module-load fails, then retry
<agentzh>
it's been biting us.
<agentzh>
but that can be very expensive.
<agentzh>
especially for production boxes.
<agentzh>
big impact on the production apps.
<fche>
yeah, it could be more graduated, maybe drop -some- sorts of kernel memory
<fche>
but then what alternative is there -- if the kernel itself is refusing to load us, ... what?
<agentzh>
i was wondering if the .ko image is just too big to load.
<agentzh>
maybe the .ko itself is small, but the data section can inflate to a big size.
<fche>
or the kernel must have a particular type of unfragmented memory into which to load
<agentzh>
maybe.
<fche>
(not .data -- that's included -- maybe .bss would inflate, but I don't think we use much of that)
<agentzh>
yeah, .bss.
<fche>
yeah just took a readelf -S look at a random big stap .ko
<fche>
000000000166a7f8 0000000000000000 WA 0 0 32
<fche>
[37] .bss NOBITS 0000000000000000 01970040
<fche>
0000000000015c54 0000000000000000 WA 0 0 32
<fche>
(this big .data is for a fileline-profile.c so a lot of debugging data)
<agentzh>
aye, i just looked at it too with size -A.
<agentzh>
.bss is not very big.
<agentzh>
20K in one typical .stp script of ours.
<agentzh>
it seems like kernel needs contiguous physical address space for the kernel module.
<fche>
yeah sounds familiar
<agentzh>
so maybe 100KBish may be hard to find for a long running box.
<agentzh>
due to fragmentation.
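
One way to look into the fragmentation hypothesis above is to count how many free blocks of a given order the buddy allocator still has. This is a minimal sketch, not something from the conversation: the helper name `free_blocks_at_least` is hypothetical, and it assumes the usual `/proc/buddyinfo` layout (zone name in the fourth field, then one free-block count per order starting at order 0). Whether module loading really needs physically contiguous memory is still only the suspicion voiced in the chat.

```shell
# Sum the free blocks of at least the given order across all zones.
# Usage: free_blocks_at_least ORDER [FILE]   (FILE defaults to /proc/buddyinfo)
free_blocks_at_least() {
    awk -v order="$1" '
        # Fields: "Node", "N,", "zone", NAME, then free-block counts for order 0, 1, 2, ...
        { for (i = 5 + order; i <= NF; i++) total += $i }
        END { print total + 0 }
    ' "${2:-/proc/buddyinfo}"
}
```

With 4 KiB pages, an order-5 block is 128 KiB, so `free_blocks_at_least 5` gives a rough idea of how many contiguous regions in the "100KBish" range remain on a long-running box.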
fche has quit [Read error: Connection reset by peer]
fche has joined #systemtap
<fche>
so staprun could grow code that responds to module-load failures with some sequence of
<fche>
sync(2)
<fche>
echo 2 > /proc/sys/vm/drop_caches
<fche>
echo 3 > /proc/sys/vm/drop_caches
<fche>
or other such emergency hacks
<fche>
followed by a retry
<agentzh>
well, we are already doing this from outside.
<agentzh>
maybe staprun should not be too clever or too aggressive?
<fche>
could be a flag
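
The retry sequence fche describes can be sketched as an external wrapper, similar in spirit to what agentzh says is already being done from outside. This is only an illustration under stated assumptions: `load_with_retry` and `DROP_CACHES_FILE` are hypothetical names, not staprun options, and the escalation order (sync, then level 2, then level 3) simply follows the order listed in the chat.

```shell
# Hypothetical sketch: retry a module load after freeing kernel memory.
# DROP_CACHES_FILE is an illustrative override, mainly useful for dry runs.
DROP_CACHES_FILE="${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}"

# Usage: load_with_retry CMD [ARGS...]
# Runs CMD; on failure, escalates through drop_caches levels 2 and 3,
# retrying CMD after each. Returns 0 on the first success, 1 otherwise.
load_with_retry() {
    "$@" && return 0
    sync    # flush dirty data first so more cache becomes reclaimable
    for level in 2 3; do
        # 2 = dentries and inodes, 3 = page cache plus dentries and inodes
        echo "$level" > "$DROP_CACHES_FILE" || return 1
        "$@" && return 0
    done
    return 1
}
```

As root, something like `load_with_retry staprun ./mymodule.ko` would only touch the caches when the first load attempt fails, which keeps the expensive step off the common path. That matters given the concern about the impact on production boxes.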
sscox has joined #systemtap
irker148 has quit [Quit: transmission timeout]
_whitelogger has joined #systemtap
slowfranklin has joined #systemtap
_whitelogger has joined #systemtap
orivej has quit [Ping timeout: 250 seconds]
orivej has joined #systemtap
orivej has quit [Ping timeout: 250 seconds]
sscox has quit [Ping timeout: 255 seconds]
orivej has joined #systemtap
mjw has joined #systemtap
orivej has quit [Quit: No Ping reply in 180 seconds.]
orivej has joined #systemtap
wcohen has quit [Ping timeout: 246 seconds]
sscox has joined #systemtap
gromero has quit [Ping timeout: 258 seconds]
tromey has joined #systemtap
gromero has joined #systemtap
wcohen has joined #systemtap
sscox has quit [Quit: sscox]
sscox has joined #systemtap
sscox has quit [Ping timeout: 250 seconds]
sscox has joined #systemtap
<ema>
I'm getting "struct blah is being accessed instead of a member" when trying to define a variable x to later actually access the members of struct blah
<ema>
is there a way around it? :)
<ema>
basically I want to do something like: `x = foo->bar->blah ; println(x->field1) ; println(x->field2)`
<ema>
those are just the scripts that became advanced enough that I didn't want to lose them, basically, but there are many more (ugly ones) I'm writing ad-hoc to see what's going on
<fche>
neat
<fche>
do you have any particular pain points or pains in the butt using the widget, something we should help with / work on ?
<fche>
have you by any chance ever written up those efforts for public consumption on a blog or such?
<ema>
no actual pain point I think, but I'll let you know if anything comes to mind!
<ema>
and unfortunately no blog post either, our bug tracking system is public though so searching for 'systemtap' might be interesting for you folks?
<fche>
please feel free to reach out any time, Emanuele
fche has quit [Remote host closed the connection]
khaled has quit [Remote host closed the connection]
khaled has joined #systemtap
orivej has quit [Ping timeout: 246 seconds]
gila has joined #systemtap
eck has quit [Ping timeout: 246 seconds]
orivej has joined #systemtap
khaled has quit [Ping timeout: 245 seconds]
khaled has joined #systemtap
khaled has quit [Client Quit]
khaled has joined #systemtap
slowfranklin has quit [Quit: slowfranklin]
eck has joined #systemtap
slowfranklin has joined #systemtap
orivej has quit [Ping timeout: 245 seconds]
slowfranklin has quit [Quit: slowfranklin]
LeoBras has joined #systemtap
<LeoBras>
Hello!
<LeoBras>
semantic error: while resolving probe point: identifier 'kernel' at test.stp:11:7, source: probe kernel.function("faultin_page").call { count++; } , semantic error: no match (similar functions: faultin_page, lock_page, put_page, sg_page, split_page)
<LeoBras>
what am I doing wrong?
<LeoBras>
I am staring at it, but the first 'similar function' is not the same function I am trying to use?
khaled has quit [Ping timeout: 245 seconds]
<LeoBras>
using probe kernel.function("sys_mkdir").call works fine , and sudo stap -l 'kernel.function( "faultin_page" )' returns : kernel.function("faultin_page@mm/gup.c:480")
slowfranklin has joined #systemtap
<wcohen>
LeoBras, that particular function might be inlined. you can see what variants there are with: stap -L 'kernel.function("faultin_page").*'
<wcohen>
I suspect that the function may be inlined.
<wcohen>
to instrument the possible inlined versions you might try: probe kernel.function("faultin_page") {count++}
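
wcohen's suggestion can be tried with a small helper that generates the probe script. This is a sketch under assumptions: `count_hits_script` is a hypothetical name, and the 10-second timer and the output format are arbitrary illustration choices. The key point from the chat is that the probe carries no `.call` suffix, so it is not restricted to non-inlined instances.

```shell
# Print a SystemTap script that counts hits on the named kernel function.
# Omitting the .call suffix lets the probe match inlined instances too.
# Usage: count_hits_script FUNCTION_NAME
count_hits_script() {
    cat <<EOF
global count
probe kernel.function("$1") { count++ }
probe timer.s(10) { exit() }
probe end { printf("%s hits: %d\n", "$1", count) }
EOF
}
```

Running `sudo stap -e "$(count_hits_script faultin_page)"` should then count hits for ten seconds, and `stap -L 'kernel.function("faultin_page").*'` remains the way to see which variants actually exist on a given kernel.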