#systemtap on 2019-08-02 — irc logs at freenode.irclog.whitequark.org

2015-11-12 23:18 fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged

00:15 sscox has quit [Ping timeout: 276 seconds]

00:36 sscox has joined #systemtap

00:48 orivej has quit [Ping timeout: 258 seconds]

01:12 sscox has quit [Ping timeout: 276 seconds]

01:23 sscox has joined #systemtap

02:16 hpt has joined #systemtap

02:21 yogananth has joined #systemtap

03:13 _whitelogger has joined #systemtap

03:37 yogananth has quit [Ping timeout: 246 seconds]

03:41 yogananth has joined #systemtap

04:51 yogananth_ has joined #systemtap

04:52 yogananth has quit [Read error: Connection reset by peer]

05:42 yogananth_ has quit [Ping timeout: 258 seconds]

07:26 orivej has joined #systemtap

07:47 orivej has quit [Ping timeout: 245 seconds]

07:52 orivej has joined #systemtap

07:58 _whitelogger has joined #systemtap

08:42 orivej has quit [Ping timeout: 245 seconds]

08:47 gromero has quit [Ping timeout: 276 seconds]

09:31 hpt has quit [Ping timeout: 244 seconds]

12:14 yogananth has joined #systemtap

13:03 tromey has joined #systemtap

13:50 gromero has joined #systemtap

13:54 orivej has joined #systemtap

14:36 amerey has joined #systemtap

14:44 orivej has quit [Ping timeout: 258 seconds]

14:57 orivej has joined #systemtap

15:10 orivej has quit [Ping timeout: 268 seconds]

15:16 orivej has joined #systemtap

15:37 orivej has quit [Ping timeout: 245 seconds]

15:51 orivej has joined #systemtap

15:56 orivej has quit [Read error: Connection reset by peer]

15:58 orivej has joined #systemtap

16:04 orivej has quit [Ping timeout: 245 seconds]

16:26 yakiza has joined #systemtap

16:26 <yakiza> Hello people, i am trying to build systemtap from source and i am getting the following error: https://defuse.ca/b/9lBICWA6MvpdMp4ttSLXd5

16:30 agentzh has joined #systemtap

16:34 khaled has joined #systemtap

16:35 <fche> yakiza,

16:35 <fche> I don't see an error in there

16:35 <fche> maybe your paste is incomplete?

16:35 gromero has quit [Ping timeout: 250 seconds]

16:37 orivej has joined #systemtap

16:38 yogananth has quit [Remote host closed the connection]

16:44 orivej has quit [Ping timeout: 246 seconds]

16:44 yakiza has quit [Quit: WeeChat 2.4]

17:08 orivej has joined #systemtap

17:19 <agentzh> hi folks, i'm having a strange issue with calling __stp_get_mm_path() via a tapset function in the context of timer.profile: https://pastebin.com/zWBbSysT

17:20 <agentzh> when running the stap script repeatedly, a CPU lockup happens.

17:20 <agentzh> do i need any kind of locking here? like task_lock()/task_unlock()?

17:22 <agentzh> adding task_lock/task_unlock around that snippet seems to make it deadlock much more quickly.

17:23 <agentzh> any hints on this please? many thanks!

17:29 <fche> from a timer.profile, you won't be able to take locks legally

17:29 <fche> so no wonder __stp_get_mm_path() causes you problems

17:30 <fche> it is generally best to assume it is *unsafe* to call any kernel or stap runtime C function unless you argue/prove it safe first

17:30 <fche> consider gathering that info from other places, like an execve probe or such

17:32 <agentzh> ah, interesting. but it seems it is safe to call execname() in timer.profile? because it is simple enough?

17:33 <agentzh> thanks for the info!

17:33 <agentzh> i've been fighting against this for hours. alas.

17:36 <agentzh> fche: by execve probes, you mean something like probe kprocess.exec?

17:37 <agentzh> maybe one way of doing this is to record the pids for the task mm paths, and then simply checking pids in timer.profile?

17:38 <agentzh> *record the pids in probe kprocess.exec

17:41 <fche> yes, something like that

17:42 <fche> tapset functions are designed to be safe from all contexts; those that we know aren't are marked with /* guru */ or something like that

17:42 <fche> so yes absolutely, go use execname() anywhere you like

17:42 <fche> (we rely on kernel guarantees to make this safe)

17:42 <fche> but: don't improvise with custom kernel or stap runtime calls!

17:42 <fche> or else you'll be fighting against this for hours, alas. :-) :-)

17:44 <agentzh> heh, indeed.

17:44 <agentzh> i wonder if you guys have wonderful ways to debug such deadlocks or cpu lockups. they are scary :)

17:45 <agentzh> or tools?

17:45 <fche> closest thing is to run with a lockdep-enabled kernel like fedora rawhide

17:46 <fche> but ideally avoid all this by not using custom embedded-C and/or not calling unknown-safety kernel or runtime code from there

17:47 <agentzh> i see. thanks!
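
The workaround discussed above (record pids in a kprocess.exec probe, then only check pids in timer.profile) could look roughly like the sketch below. It is an illustration, not agentzh's actual script: the "myapp" substring and the global names `tracked` and `samples` are made up, and the cleanup in kprocess.exit is only an approximation.

```systemtap
# Sketch only: "myapp" is a placeholder substring, not from the discussion.
global tracked    # pids whose exec path matched
global samples    # per-pid count of profile ticks

probe kprocess.exec {
    # filename is the path passed to exec. The tapset may quote it, so
    # match by substring rather than equality. The exec can still fail,
    # in which case this entry is a false positive.
    if (isinstr(filename, "myapp"))
        tracked[pid()] = 1
}

probe kprocess.exit {
    # Approximate cleanup: drop the entry when the thread-group leader exits.
    if (tid() == pid())
        delete tracked[pid()]
}

probe timer.profile {
    # Only plain tapset functions and a map lookup here:
    # no embedded C, no kernel locks.
    if (pid() in tracked)
        samples[pid()]++
}

probe end {
    # Print pids sorted by sample count, highest first.
    foreach (p in samples-)
        printf("%d %d\n", p, samples[p])
}
```

The point of the split is that any path lookup happens in process context at exec time, while the profiling handler touches nothing beyond script globals and tapset functions such as pid().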

17:58 <agentzh> fche: oh btw, i've just noticed that @vma() always returns 0 on kernel 5.1 or 5.0. haven't investigated myself yet. is that a known problem?

18:01 <agentzh> sorry, not always returning zero, it just seems to return a wrong address on PIE.

18:01 <agentzh> i'll try create a proper PR.

18:01 <agentzh> *creating

18:04 wcohen has joined #systemtap

19:20 lindi- has quit [Ping timeout: 250 seconds]

19:22 tromey has quit [Quit: ERC (IRC client for Emacs 26.1)]

20:24 lindi- has joined #systemtap

20:42 sscox has quit [Ping timeout: 276 seconds]

20:49 wakatana has joined #systemtap

21:07 wakatana has quit [Quit: leaving]

21:10 sscox has joined #systemtap

21:19 dmalcolm has quit [Quit: Leaving]

21:39 khaled_ has joined #systemtap

21:40 wakatana has joined #systemtap

21:41 khaled has quit [Ping timeout: 245 seconds]

21:48 wakatana has quit [Quit: leaving]

21:51 amerey has quit [Quit: Leaving]

22:25 pfallenop has quit [Ping timeout: 245 seconds]

22:32 pfallenop has joined #systemtap

23:35 <agentzh> fche: OK, the vma tracker is indeed broken on fedora 29: https://sourceware.org/bugzilla/show_bug.cgi?id=24875

23:36 <agentzh> always getting address 0x0.

23:36 <agentzh> not sure what the real cause is. help needed :)

23:37 <fche> will need to wait till next week, I'm afraid

23:37 <fche> I'd compare a successful run's trace on f28 to this one on f29

23:37 <fche> (possibly kernel version dependent, so an older f29 kernel may work fine too)

23:37 <fche> so yeah, diffing the traces is where I'd start

23:38 <fche> sounds a little bit familiar from a vaguely similar problem we had a few years (?) ago, whereby changes in ld.so loading policy or precise executable segment flags & layout caused a change in the way the kernel loaded the parts of the program

23:39 <fche> and the stap runtime wouldn't recognize the later-than-usual loaded areas as part

23:39 <agentzh> i tried earlier kernel on fedora 29, same thing. so it might not be a kernel issue. more likely to be a toolchain issue like ld.so.

23:39 <agentzh> *older kernels

23:40 <agentzh> no worries. i can surely wait :)

23:40 <agentzh> also tried the latest master of elfutils with stap, still the same thing.

23:41 <agentzh> i compared the traces between fed29 and centos7. task->user makes the map lookup fail.

23:42 <agentzh> pid matches though.

23:42 <agentzh> more details are in that PR already :)

23:46 <fche> yeah I'd focus on the order & way in which the a.out binary is mapped piecemeal into the address space

23:51 <agentzh> interesting, i'll try playing with it.

23:51 <agentzh> thanks for the hint.