fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
<agentzh> fche: okay, thanks.
<agentzh> but loading ko module is way *before* the runtime kicks in?
<agentzh> or are you suggesting using the runtime allocator to replace the statically allocated map and stats in the ko image?
<fche> there is not much memory allocation at module load time
<fche> most of it is at staprun-starting-to-talk-with-it, later on
<fche> if even the module load phase is rejected, then I dunno if there's anything stap could do about that
<agentzh> interesting.
<agentzh> then there might be something deeper here.
<fche> hm this reminds me .... methinks we've talked/thought about this a long time ago in passing
<agentzh> yeah, maybe.
<fche> staprun could automagically do a drop-caches kernel memory freeup effort if the module-load fails, then retry
<agentzh> it's been biting us.
<agentzh> but that can be very expensive.
<agentzh> especially for production boxes.
<agentzh> big impact on the production apps.
<fche> yeah, it could be more graduated, maybe drop -some- sorts of kernel memory
<fche> but then what alternative is there -- if the kernel itself is refusing to load us, ... what?
<agentzh> i was wondering if the .ko image is just too big to load.
<agentzh> maybe the .ko itself is small, but the data section can inflate to a big size.
<fche> or the kernel must have a particular type of unfragmented memory into which to load
<agentzh> maybe.
<fche> (not .data -- that's included -- maybe .bss would inflate, but I don't think we use much of that)
<agentzh> yeah, .bss.
<fche> yeah just took a readelf -S look at a random big stap .ko
<fche> [ 3] .text PROGBITS 0000000000000000 00000090
<fche> 000000000000ebdb 0000000000000000 AX 0 0 16
<fche> [30] .data PROGBITS 0000000000000000 00305480
<fche> 000000000166a7f8 0000000000000000 WA 0 0 32
<fche> [37] .bss NOBITS 0000000000000000 01970040
<fche> 0000000000015c54 0000000000000000 WA 0 0 32
<fche> (this big .data is for a fileline-profile.c so a lot of debugging data)
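[editor's note] The sizes in the readelf -S output above are hexadecimal byte counts, so they are easy to misread. A minimal sketch that converts them to decimal; the helper name hex_to_bytes is made up for illustration, and the three values are copied from the lines above:

```shell
#!/bin/sh
# hex_to_bytes is a hypothetical helper: print a bare hex size as decimal bytes.
hex_to_bytes() {
    echo $((0x$1))
}

# Section sizes as pasted from readelf -S above (name:hex-size).
for entry in .text:ebdb .data:166a7f8 .bss:15c54; do
    name=${entry%%:*}
    hex=${entry#*:}
    bytes=$(hex_to_bytes "$hex")
    echo "$name $bytes bytes ($((bytes / 1024)) KiB)"
done
```

For this particular module that works out to roughly 59 KiB of .text, a bit over 22 MiB of .data, and about 87 KiB of .bss.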
<agentzh> aye, i just looked at it too with size -A.
<agentzh> .bss is not very big.
<agentzh> 20K in one typical .stp script of ours.
<agentzh> it seems like kernel needs contiguous physical address space for the kernel module.
<fche> yeah sounds familiar
<agentzh> so maybe 100KBish may be hard to find for a long running box.
<agentzh> due to fragmentation.
fche has quit [Read error: Connection reset by peer]
fche has joined #systemtap
<fche> so staprun could grow code that responds to module-load failures with some sequence of
<fche> sync(2)
<fche> echo 2 > /proc/sys/vm/drop_caches
<fche> echo 3 > /proc/sys/vm/drop_caches
<fche> or other such emergency hacks
<fche> followed by a retry
<agentzh> well, we are already doing this from outside.
<agentzh> maybe staprun should not be too clever or too aggressive?
<fche> could be a flag
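[editor's note] The retry idea fche describes can be sketched from outside staprun as a small wrapper. This is only an illustration under assumptions, not something staprun does today: the function name retry_with_cache_drop and the DROP_CACHES_FILE override are made up here, and retrying after each drop level is one possible reading of "some sequence of".

```shell
#!/bin/sh
# Path is overridable (assumption, for dry runs); the real file needs root.
DROP_CACHES_FILE="${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}"

# retry_with_cache_drop CMD [ARGS...]
# Run CMD once. If it fails, flush dirty pages with sync, then drop
# caches at level 2 and level 3, retrying CMD after each step.
retry_with_cache_drop() {
    "$@" && return 0
    sync
    for level in 2 3; do
        echo "$level" > "$DROP_CACHES_FILE" || return 1
        "$@" && return 0
    done
    return 1
}

# Example (hypothetical module name, must run as root):
#   retry_with_cache_drop staprun mymodule.ko
```

Trying level 2 before level 3 keeps the page cache intact when freeing dentries and inodes alone is enough, which speaks to agentzh's concern about the cost on production boxes.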
sscox has joined #systemtap
irker148 has quit [Quit: transmission timeout]
_whitelogger has joined #systemtap
slowfranklin has joined #systemtap
_whitelogger has joined #systemtap
orivej has quit [Ping timeout: 250 seconds]
orivej has joined #systemtap
orivej has quit [Ping timeout: 250 seconds]
sscox has quit [Ping timeout: 255 seconds]
orivej has joined #systemtap
mjw has joined #systemtap
orivej has quit [Quit: No Ping reply in 180 seconds.]
orivej has joined #systemtap
wcohen has quit [Ping timeout: 246 seconds]
sscox has joined #systemtap
gromero has quit [Ping timeout: 258 seconds]
tromey has joined #systemtap
gromero has joined #systemtap
wcohen has joined #systemtap
sscox has quit [Quit: sscox]
sscox has joined #systemtap
sscox has quit [Ping timeout: 250 seconds]
sscox has joined #systemtap
<ema> I'm getting "struct blah is being accessed instead of a member" when trying to define a variable x to later actually access the members of struct blah
<ema> is there a way around it? :)
<ema> basically I want to do something like: `x = foo->bar->blah ; println(x->field1) ; println(x->field2)`
<fche> hm that should work
<fche> try x = & ...blah
<fche> 'cause it's a pointer right?
khaled has joined #systemtap
<ema> fche: ah yes, that worked! Thanks :)
<ema> the & did the trick ^
<fche> very good
<fche> would love to hear more about your usage scenario & any results (good or bad)
<ema> so many things! Back when we had to choose to drop SPDY and move to http2 we used this to get an idea of how many of our clients would have been impacted: https://github.com/wikimedia/puppet/blob/production/modules/tlsproxy/files/utils/h2_spdy_stats.stp
<ema> now we're moving from varnish to trafficserver for part of our CDN and I wrote these to debug things https://github.com/wikimedia/puppet/tree/production/modules/trafficserver/files
<ema> those are just the scripts that became advanced enough that I didn't want to lose them, basically, but there are many more (ugly ones) I'm writing ad-hoc to see what's going on
<fche> neat
<fche> do you have any particular pain points or pains in the butt using the widget, something we should help with / work on ?
<fche> have you by any chance ever written up those efforts for public consumption on a blog or such?
<ema> no actual pain point I think, but I'll let you know if anything comes to mind!
<ema> and unfortunately no blog post either, our bug tracking system is public though so searching for 'systemtap' might be interesting for you folks?
<fche> will do, thanks
<ema> thank you!
<fche> odd question but do you happen to know if there is a persistent url we could post to our wiki that has that phabricator search?
<fche> am worried that the 7P2fu* etc. synthetic strings would time out if reused later
<ema> ha, good question
<ema> nice one :)
<fche> please feel free to reach out any time, Emanuele
fche has quit [Remote host closed the connection]
khaled has quit [Remote host closed the connection]
khaled has joined #systemtap
orivej has quit [Ping timeout: 246 seconds]
gila has joined #systemtap
eck has quit [Ping timeout: 246 seconds]
orivej has joined #systemtap
khaled has quit [Ping timeout: 245 seconds]
khaled has joined #systemtap
khaled has quit [Client Quit]
khaled has joined #systemtap
slowfranklin has quit [Quit: slowfranklin]
eck has joined #systemtap
slowfranklin has joined #systemtap
orivej has quit [Ping timeout: 245 seconds]
slowfranklin has quit [Quit: slowfranklin]
LeoBras has joined #systemtap
<LeoBras> Hello!
<LeoBras> semantic error: while resolving probe point: identifier 'kernel' at test.stp:11:7, source: probe kernel.function("faultin_page").call { count++; } , semantic error: no match (similar functions: faultin_page, lock_page, put_page, sg_page, split_page)
<LeoBras> what am I doing wrong?
<LeoBras> I am staring at it, but the first 'similar function' is not the same function I am trying to use?
khaled has quit [Ping timeout: 245 seconds]
<LeoBras> using probe kernel.function("sys_mkdir").call works fine , and sudo stap -l 'kernel.function( "faultin_page" )' returns : kernel.function("faultin_page@mm/gup.c:480")
slowfranklin has joined #systemtap
<wcohen> LeoBras, that particular function might be inlined. you can see what variants there are with: stap -L 'kernel.function("faultin_page").*'
<wcohen> I suspect that the function may be inlined.
<wcohen> to instrument the possible inlined versions you might try: probe kernel.function("faultin_page") {count++}
<wcohen> looking at the kernel code it is likely getting inlined as it is a static function only used in one place in the kernel: https://elixir.bootlin.com/linux/latest/source/mm/gup.c#L738
slowfranklin has quit [Quit: slowfranklin]
<wcohen> the sys_mkdir function is an entry for a system call, so it isn't going to be inlined. Thus, the kernel.function("sys_mkdir").call is fine.
<wcohen> there has to be a function that the system call mechanism can call like: https://elixir.bootlin.com/linux/v5.0.5/source/fs/namei.c#L3853
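[editor's note] Putting wcohen's suggestion together: dropping the .call suffix lets the probe match inlined instances as well. A rough sketch that writes out such a counting script; the helper name gen_count_script, the file name count.stp, and the end-probe report are assumptions added for illustration, not part of LeoBras's original test.stp:

```shell
#!/bin/sh
# gen_count_script FUNC: print a SystemTap script that counts hits on FUNC.
# The probe has no .call suffix, so inlined instances can match too.
gen_count_script() {
    cat <<EOF
global count
probe kernel.function("$1") { count++ }
probe end { printf("$1 hits: %d\n", count) }
EOF
}

# Example (needs systemtap, kernel debuginfo and root):
#   gen_count_script faultin_page > count.stp
#   sudo stap count.stp -c 'sleep 5'
```

Running stap -L 'kernel.function("faultin_page").*' first, as suggested above, shows which variants (.call, .inline, .return) exist on a given kernel.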
sscox has quit [Ping timeout: 250 seconds]
slowfranklin has joined #systemtap
tromey has quit [Ping timeout: 250 seconds]
wcohen has quit [Ping timeout: 245 seconds]
slowfranklin has quit [Quit: slowfranklin]
mjw has quit [Quit: Leaving]
gila has quit [Quit: My Mac Pro has gone to sleep. ZZZzzz…]
LeoBras has quit [Ping timeout: 250 seconds]
wcohen has joined #systemtap