#systemtap on 2020-10-18 — irc logs at freenode.irclog.whitequark.org

2015-11-12 23:18 fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged

00:03 <fche> agentzh, re. that hash table, sure

00:04 <fche> not sure all our uses of the table are rcu-compatible, so the spinlock might be unavoidable, but will consider

00:09 <agentzh> fche: okay, cool, will proceed accordingly and propose patches here soon.

00:09 <fche> great

00:10 <agentzh> and there's another thing, what do you think of adding other kinds of filters to the task finder?

00:10 <fche> such as?

00:10 <agentzh> currently there's only the pid filter and the exe path filter.

00:10 <agentzh> such as the gpid filter and mount file system filters.

00:10 <agentzh> sorry, i mean mount namespace id filter

00:11 <agentzh> there seems to be a pid namespace filter already?

00:11 <fche> interesting, so processes within a given container?

00:11 <agentzh> yep

00:11 <agentzh> mount namespace is better than pid for containers.

00:11 <fche> plausible

00:11 <agentzh> especially when the file path is relative to the container.

00:12 <agentzh> gpid is for process group id.

00:12 <agentzh> it's also common for apache and nginx.

00:12 <agentzh> and postgresql

00:12 <agentzh> many multi-process apps.

00:12 <fche> would you couple this with new variants of stap -x PID ?

00:12 <agentzh> yes, i'm also thinking about this line.

00:13 <agentzh> adding new options to stap and staprun.

00:13 <agentzh> so you're good with that?

00:13 <agentzh> we can propose patches for these too.
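
To make the proposed task-finder filters concrete, here is a minimal userspace sketch in Python that selects processes by process group id (what agentzh calls "gpid", usually abbreviated pgid) or by mount namespace, using Linux's `/proc/<pid>/ns/mnt` symlink. This is an assumption-laden illustration: the real task finder runs inside the kernel module and is not written this way, and the helper names `pgid_of`, `mnt_ns_of` and `matching_pids` are hypothetical.

```python
import os


def pgid_of(pid: int) -> int:
    # os.getpgid returns the process group id of the given pid.
    return os.getpgid(pid)


def mnt_ns_of(pid: int) -> str:
    # The symlink target looks like "mnt:[4026531840]"; the number is the
    # namespace's inode, so two processes in the same container share it.
    return os.readlink(f"/proc/{pid}/ns/mnt")


def matching_pids(pgid=None, mnt_ns=None):
    """Hypothetical filter: yield pids matching the given pgid and/or mount ns."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        pid = int(entry)
        try:
            if pgid is not None and pgid_of(pid) != pgid:
                continue
            if mnt_ns is not None and mnt_ns_of(pid) != mnt_ns:
                continue
        except OSError:
            # The process exited meanwhile, or we lack permission to inspect it.
            continue
        yield pid
```

For example, `list(matching_pids(pgid=os.getpgid(master_pid)))` (with `master_pid` a hypothetical nginx master pid) would list the master and its workers if they share its process group, which depends on how the service was started.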

00:14 <agentzh> currently stap's vma tracker is tracking too many processes at the same time, which leads to high CPU contention on that vma hash table's spinlocks.

00:14 <agentzh> it's crazy.

00:14 <agentzh> very obvious cpu spikes when running the stap tool.

00:14 <agentzh> the kernel cpu flame graph shows 80% of the CPU is spent there.

00:15 <agentzh> more than 80%

00:15 <fche> that's crazy, is there a lot of fork/mmap activity going on?

00:15 <agentzh> just a lot of live processes and threads.

00:15 <agentzh> thousands

00:15 <agentzh> in production servers.

00:15 <agentzh> not many forks or mmap activities though.

00:16 <fche> hm, then surprised there's much ongoing vma tracker activity

00:16 <agentzh> i can show you the flame graph

00:16 <agentzh> a sec

00:16 <fche> (by the way, over here we were talking about possibly exposing the vma tracker's contents to stap tapsets,

00:16 <fche> so we could query the dso's mapped into processes, etc.)

00:16 <agentzh> is it different from @vma and @cast?

00:17 <fche> yeah to get an actual list of shared libraries / base addresses methinks

00:17 <fche> not sure it's needed but may be
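
fche's idea of listing the shared libraries and base addresses known to the vma tracker has a userspace counterpart in `/proc/<pid>/maps`. The sketch below is an illustration only, not an existing tapset function (`mapped_dsos` is a hypothetical name): it parses that format and takes the lowest mapped address of each file as its base.

```python
def mapped_dsos(maps_text: str) -> dict:
    """Return {path: base_address} from /proc/<pid>/maps-formatted text."""
    bases = {}
    for line in maps_text.splitlines():
        # Fields: range, perms, offset, dev, inode, [pathname]; maxsplit keeps
        # paths containing spaces intact in the last field.
        fields = line.split(maxsplit=5)
        if len(fields) < 6 or not fields[5].startswith("/"):
            continue  # anonymous mappings, [heap], [stack], [vdso], ...
        start = int(fields[0].split("-")[0], 16)
        path = fields[5].rstrip()
        # A file is usually mapped several times (r--p, r-xp, ...); keep the lowest.
        if path not in bases or start < bases[path]:
            bases[path] = start
    return bases
```

Reading `/proc/<pid>/maps` for another process typically needs the same privileges as ptrace-attaching to it.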

00:17 <agentzh> i used to propose the @vma() operator patch to you.

00:17 <agentzh> but it seems i never got the chance to commit it...

00:18 <agentzh> @vma(addr, module)

00:18 <agentzh> it converts relative address in a module to the absolute address.

00:18 <agentzh> when there are no symbols or the dwarf is incorrect.
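
The arithmetic behind the proposed `@vma(addr, module)` operator is a single addition: absolute address = module load base + module-relative address. This Python analogue is not agentzh's patch; it assumes the module's relative addresses are offsets from the start of its lowest mapping (true for typical shared libraries and PIE executables whose first loadable segment sits at virtual address 0), and the `vma` helper and its `bases` dictionary are hypothetical.

```python
def vma(rel_addr: int, module: str, bases: dict) -> int:
    """Hypothetical analogue of @vma(addr, module): relative -> absolute address."""
    try:
        base = bases[module]  # run-time load address of the module
    except KeyError:
        raise LookupError(f"module {module!r} is not mapped") from None
    return base + rel_addr
```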

00:18 <fche> aha yeah

00:18 <agentzh> if you're still open to it, we are happy to commit.

00:18 <agentzh> with some docs and tests too.

00:19 <fche> how about posting it, let others take a look

00:19 <agentzh> sure, will post it here

00:19 <agentzh> with a link

00:19 <agentzh> email is more troublesome.

00:19 <fche> or you could push the code to a new branch, and discuss it in email

00:20 <fche> that's how some of us do feature work here

00:20 <agentzh> yeah, that's fine too.

00:20 <agentzh> will do.

00:20 <agentzh> we don't really want to maintain these patches in our own branch.

00:20 <agentzh> every time we want to sync with the upstream repo, it's painful :)

00:20 <agentzh> especially when you like to do big code refactoring from time to time.

00:20 <agentzh> it's a nightmare for us ;)

00:21 <fche> understood

00:21 <agentzh> btw, kerneltoast is preparing a Dl stapio hang bugfix patch these days.

00:22 <agentzh> will be ready very soon as well.

00:22 <agentzh> we'll definitely need your review.

00:22 <fche> nice, you guys are doing good helpful work, thank you

00:22 <agentzh> it's tricky.

00:22 <agentzh> of course. and we're hiring more kernel developers to do more work :)

00:23 <agentzh> we'd like to move faster here.

00:27 <agentzh> fche: the flame graph: https://pasteboard.co/Jw8xb0A.png

00:28 <agentzh> the middle 3 frames are:

00:28 <agentzh> 7a88b0: _raw_spin_lock[0]

00:28 <agentzh> 7277: adjustStartLoc[15]

00:29 <agentzh> because there's a chicken & egg issue with -d MODULE for the current ko, we don't have symbols for the current ko.

00:29 <agentzh> so i also wonder if it's possible to inject dwarf-derived data into the ko after the ko is generated...

00:29 <agentzh> right now we have to do symbolization via post-processing and it's not always possible.
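
Post-processing symbolization, as described here, boils down to mapping each raw backtrace address to the nearest preceding symbol. A minimal sketch, assuming you already have symbol start addresses (for example from the .ko's symbol table) relocated into the same address space as the backtrace; `make_symbolizer` is a hypothetical helper, not part of systemtap.

```python
import bisect


def make_symbolizer(symbols):
    """symbols: iterable of (start_address, name). Returns addr -> 'name+0xoff'."""
    table = sorted(symbols)
    starts = [addr for addr, _ in table]

    def symbolize(addr: int) -> str:
        # Index of the last symbol starting at or below addr.
        i = bisect.bisect_right(starts, addr) - 1
        if i < 0:
            return hex(addr)  # below the first known symbol: leave it raw
        start, name = table[i]
        return f"{name}+{hex(addr - start)}"

    return symbolize
```

Without symbol sizes, an address past the end of the last function is still attributed to it, so a real tool should also check sizes.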

00:30 <agentzh> ah, the image paste service is slow in the US.

00:31 <agentzh> hopefully it's not that slow for you.

00:31 <fche> this is for backtracing ?

00:31 <agentzh> yes, sprint_backtrace()

00:31 <agentzh> just that.

00:31 <agentzh> in timer.profile probe

00:31 <agentzh> very simple script.

00:31 <fche> yeah, this too has come up in the past as ideas in the queue

00:32 <fche> so like we can pass STP_RELOCATION messages containing run-time addresses from staprun to the kernel module

00:32 <agentzh> we'll verify the effectiveness of our patches by sampling new flame graphs for comparison.

00:32 <fche> we could in principle send over giant messages containing .eh_frame* / .symtab* extracts of relevant processes

00:32 <agentzh> oh wow

00:32 <fche> (assuming we can identify relevancy, and the traffic rate is reasonable)

00:32 <agentzh> that's...bloody :)

00:33 <fche> well, it's taking the concept of in-situ backtracing etc. to its limits: try to keep the kernel module up with unforeseen userspace

00:33 <agentzh> i'm always worried about sending large data over the channels.

00:34 <agentzh> yeah, the direction is definitely good.

00:34 <agentzh> now every time the userland changes, we have to regenerate the ko.

00:34 <agentzh> maybe eventually we just make stap's kernel runtime a reusable ko.

00:35 <agentzh> like ebpf.

00:35 <agentzh> and everything is sent over to the ko to run.

00:35 <agentzh> that would be the ultimate extreme :)

00:36 <agentzh> for the meantime, it's definitely useful to have some middle ground.

00:36 <agentzh> like sending over the unwinding data.

00:36 <agentzh> that would already be useful for bootstrapping the ko itself.

00:37 <agentzh> i was thinking about a less dynamic approach like attaching a special elf section to the ko and etc...

00:37 <agentzh> so it's still static, just after the ko is generated.

00:37 <agentzh> maybe this is easier...

00:38 <agentzh> not sure.
