#systemtap on 2020-11-13 — irc logs at freenode.irclog.whitequark.org

2015-11-12 23:18 fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged

01:03 lijunlong has quit [Read error: Connection reset by peer]

01:06 lijunlong has joined #systemtap

01:20 derek0883 has joined #systemtap

01:33 derek0883 has quit [Remote host closed the connection]

01:39 derek0883 has joined #systemtap

02:00 derek0883 has quit [Remote host closed the connection]

02:06 derek0883 has joined #systemtap

02:11 khaled__ has quit [Quit: Konversation terminated!]

02:50 derek0883 has quit [Remote host closed the connection]

02:54 derek0883 has joined #systemtap

03:30 <kerneltoast> fche, you have the misfortune of reviewing this: https://gist.github.com/kerneltoast/947715b30d2d184f288f9d7d06873089

03:31 <kerneltoast> i'm not confident i formatted the translate.cxx changes correctly

03:31 <kerneltoast> i'll run the big ol testsuite on this tomorrow

03:31 <kerneltoast> or maybe overnight i guess

03:47 derek088_ has joined #systemtap

03:47 derek088_ has quit [Remote host closed the connection]

03:48 derek088_ has joined #systemtap

03:49 derek0883 has quit [Ping timeout: 260 seconds]

03:50 <agentzh> kerneltoast: it fixes the remaining panics/deadlocks?

03:51 <agentzh> maybe not the probe lock one?

03:51 <kerneltoast> it adds some new ones too, as i've just found out

03:51 <kerneltoast> can't confirm it fixes anything until i fix the patch itself...

03:52 <agentzh> okay

03:53 <kerneltoast> let's see if i can fix it in the next 5 minutes...

03:53 <agentzh> heh

03:58 * kerneltoast crosses his fingers that it's fixed

04:05 <kerneltoast> i fixed one problem with it and another came up

04:05 <kerneltoast> :)

04:22 derek0883 has joined #systemtap

04:22 derek088_ has quit [Ping timeout: 260 seconds]

04:24 irker921 has quit [Quit: transmission timeout]

04:26 sscox has quit [Ping timeout: 264 seconds]

04:47 derek0883 has quit [Remote host closed the connection]

04:47 orivej has quit [Ping timeout: 260 seconds]

04:52 derek0883 has joined #systemtap

05:03 derek0883 has quit [Remote host closed the connection]

05:05 derek0883 has joined #systemtap

05:23 derek0883 has quit [Remote host closed the connection]

05:57 derek0883 has joined #systemtap

06:09 derek0883 has quit [Remote host closed the connection]

06:34 sscox has joined #systemtap

07:01 _whitelogger has joined #systemtap

07:45 <kerneltoast> fche, agentzh, okay my brain is melting after trying to fix this for 4 hours

07:46 <kerneltoast> i'm hitting some tracing quirk that's causing my patch to deadlock

07:46 <kerneltoast> i'm gonna leave all the info here and pray one of you knows what's wrong with it by the time i wake up

07:48 <kerneltoast> https://gist.github.com/kerneltoast/29d042ff18e155e536e69a471af808ae

07:49 <kerneltoast> that gist has 3 files, in order: one showing the deadlock detected by lockdep, one with the C file for the stap script which caused the bug, and one for my broken print patch

07:50 <kerneltoast> the problem has something to do with my use of schedule_work_on() in a tracepoint

07:50 <kerneltoast> where i suspect the tracepoint is called while holding a lock that schedule_work_on() needs

07:50 <kerneltoast> ok i sleep

08:01 khaled__ has joined #systemtap

08:43 orivej has joined #systemtap

09:09 orivej has quit [Ping timeout: 272 seconds]

09:22 orivej has joined #systemtap

12:08 orivej has quit [Ping timeout: 260 seconds]

16:17 derek0883 has joined #systemtap

16:18 derek0883 has quit [Remote host closed the connection]

16:49 sscox has quit [Ping timeout: 272 seconds]

16:50 sscox has joined #systemtap

17:04 tromey has joined #systemtap

17:08 derek0883 has joined #systemtap

17:37 irker207 has joined #systemtap

17:37 <irker207> systemtap: fche systemtap.git:master * release-4.4-6-g83cb271b3 / runtime/stp_utrace.c: RHBZ1892179: double default UTRACE_TASK_WORKPOOL

18:05 <kerneltoast> fche, yo

18:05 <fche> hi

18:06 <fche> we may have to blacklist the tracepoint in question; I think serhei did something similar recently

18:07 <kerneltoast> it was sched_switch

18:07 <fche> grr, that's an important one :(

18:08 <fche> could write a small kernel module that calls task_work* from a tracepoint callback for just that, to reproduce the problem

18:08 <fche> the kernel folks may be amenable to fixing it on their side

18:08 <kerneltoast> there are still all the old kernels

18:08 <fche> yes there are

18:08 <kerneltoast> we can't even use wake_up from that tracepoint

18:08 <kerneltoast> it sucks

18:09 <fche> wonder if the lock in this case is a lockdep-only timestamp-collector thing

18:09 <fche> so maybe it's lockdep + sched tracepoint + task_work_* all together

18:09 <kerneltoast> hmm

18:09 <fche> seeing the sched_clock() in the call stack

18:12 <kerneltoast> hm no

18:13 <kerneltoast> on that centos7 kernel, the sched_switch tracepoint is called inside prepare_task_switch()

18:13 <kerneltoast> which is called from context_switch()

18:13 <kerneltoast> and context_switch() is called with the runqueue locked

18:14 <kerneltoast> what a nightmare
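
The call chain kerneltoast describes (context_switch() takes the runqueue lock, then the sched_switch probe runs underneath it) can be modelled in userspace. This is only a sketch: every `model_*` name is hypothetical, and it assumes, as the log suggests, that queuing work ends in a wakeup that needs the same runqueue lock. A real spinlock would spin forever where this model reports failure.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a per-CPU runqueue with a non-recursive lock. */
struct model_rq {
    bool locked;   /* true while the modelled runqueue lock is held */
    int wakeups;   /* wakeups that were queued successfully */
};

/* Try to take the lock. The kernel would spin here instead of failing. */
static bool model_rq_trylock(struct model_rq *rq)
{
    if (rq->locked)
        return false;
    rq->locked = true;
    return true;
}

static void model_rq_unlock(struct model_rq *rq)
{
    rq->locked = false;
}

/* Models the wakeup behind work queuing: it needs the runqueue lock. */
static bool model_wake_worker(struct model_rq *rq)
{
    if (!model_rq_trylock(rq))
        return false;   /* this is where the kernel would self-deadlock */
    rq->wakeups++;
    model_rq_unlock(rq);
    return true;
}

/* Models context_switch(): the probe runs with the lock already held. */
static bool model_context_switch(struct model_rq *rq,
                                 bool (*probe)(struct model_rq *))
{
    bool ok;

    rq->locked = true;    /* runqueue lock taken by the scheduler */
    ok = probe(rq);       /* sched_switch probe fires under the lock */
    model_rq_unlock(rq);
    return ok;
}
```

Calling `model_wake_worker()` on an unlocked runqueue succeeds, while passing it as the probe to `model_context_switch()` fails because the lock is already held by the caller.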

18:16 <kerneltoast> one way to get around this would involve polling

18:17 <kerneltoast> but nothx

18:25 orivej has joined #systemtap

19:41 <kerneltoast> fche, wanna fix this? https://gist.github.com/kerneltoast/c96d9ff25d6f7506e9b7bc4353601c82

20:09 <kerneltoast> I'm gonna try abusing hrtimers to make this work

20:09 <kerneltoast> I couldn't use an IPI

20:10 <kerneltoast> Because the kernel's IPI queuing mechanism likes to be smart and checks if you're trying to send a function call IPI to the CPU you're running on

20:11 <kerneltoast> In which case it decides to execute the callback function directly

20:11 <kerneltoast> Instead of delegating it to an IPI
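
The self-IPI shortcut described above can be illustrated with a small userspace model. This is a sketch, not kernel code: the `model_*` names, the fixed CPU count, and the one-slot-per-CPU queue are all assumptions made for illustration. The point is only that a call aimed at the current CPU runs inline, in the caller's context, rather than being deferred.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4   /* hypothetical CPU count */

typedef void (*model_ipi_fn)(void *info);

/* One pending function-call slot per modelled CPU. */
struct model_ipi_queue {
    model_ipi_fn fn[MODEL_NR_CPUS];
    void *info[MODEL_NR_CPUS];
};

static void model_ipi_queue_init(struct model_ipi_queue *q)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        q->fn[cpu] = NULL;
        q->info[cpu] = NULL;
    }
}

/* Returns true if the call was deferred, false if it ran inline. */
static bool model_call_function_single(struct model_ipi_queue *q,
                                       int this_cpu, int target_cpu,
                                       model_ipi_fn fn, void *info)
{
    if (target_cpu == this_cpu) {
        fn(info);   /* runs right now, with whatever locks the caller holds */
        return false;
    }
    q->fn[target_cpu] = fn;
    q->info[target_cpu] = info;
    return true;
}

/* Models the target CPU taking the interrupt some time later. */
static void model_handle_ipi(struct model_ipi_queue *q, int cpu)
{
    model_ipi_fn fn = q->fn[cpu];

    if (fn != NULL) {
        q->fn[cpu] = NULL;
        fn(q->info[cpu]);
    }
}

/* Example callback: counts how many times it has run. */
static void model_count_call(void *info)
{
    (*(int *)info)++;
}
```

In this model a call from CPU 1 to CPU 1 runs the callback before returning, so it gives no escape from the caller's locking context; only a call to a different CPU is actually deferred until `model_handle_ipi()` runs.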

20:11 <fche> yeah

20:12 <kerneltoast> i am in physical pain

20:13 <kerneltoast> gosh darn log messages

20:15 <agentzh> oh this is brutal and bloody.

20:16 <agentzh> hopefully we can find a way out.

20:17 <agentzh> or at least we can prevent the user from shooting himself in the foot.

20:18 <agentzh> kerneltoast: take care.

20:18 <kerneltoast> yeah trying to make this work in any context is quite difficult

20:19 <kerneltoast> we can't call schedule_work_on from anywhere

20:19 <kerneltoast> so we need to have something else to do that for us

20:19 <agentzh> fche: never-giveup-man has any advice? ;)

20:20 <agentzh> yeah, that looks like a deadend.

20:20 <kerneltoast> we can't delegate it to an IPI because the IPI queue mechanism shoots us in the foot

20:20 <agentzh> *nod*

20:21 <kerneltoast> we can't just check if irqs_disabled() is true inside the print flush because although that will cover the case where there's a print from inside an irq, it will also kill the sched_switch tracepoint

20:21 <fche> is that the only tracepoint with this problem? does the problem exist on more modern kernels? we can just bite the bullet and block that tp on that generation kernel only

20:22 <kerneltoast> because the sched_switch tracepoint is called with irqs disabled (they are disabled when the runqueue lock is acquired)

20:23 <kerneltoast> fche, it is still a problem on new kernels

20:23 <kerneltoast> the sched_switch tracepoint is always called with the runqueue lock held and irqs disabled

20:27 <kerneltoast> agentzh's original suggestion of passing down context info from stap would be the cleanest solution

20:28 <fche> how would this help?

20:29 <kerneltoast> if (stap_in_irq()) schedule_work_on() else print_flush_directly()

20:30 <kerneltoast> we can't print flush inside an irq, but we can't schedule_work_on in every context
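
The difference between the irqs_disabled() check rejected at 20:21 and the context-passing rule from 20:29 can be sketched as two decision functions. Everything here is hypothetical (`model_probe_ctx`, the enum, both function names); `in_irq` stands in for the context information that stap would pass down, which is what `stap_in_irq()` denotes in the log.

```c
#include <stdbool.h>

/* Hypothetical description of the context a probe handler runs in. */
struct model_probe_ctx {
    bool in_irq;          /* handler runs inside an interrupt */
    bool irqs_disabled;   /* interrupts are off, e.g. under the runqueue lock */
};

enum model_flush_path {
    MODEL_FLUSH_DIRECT,    /* flush the print buffer right away */
    MODEL_FLUSH_DEFERRED   /* hand the flush to a work item */
};

/* Heuristic from 20:21: true in an irq, but also true in sched_switch. */
static enum model_flush_path
model_pick_by_irqs_disabled(const struct model_probe_ctx *ctx)
{
    return ctx->irqs_disabled ? MODEL_FLUSH_DEFERRED : MODEL_FLUSH_DIRECT;
}

/* Rule from 20:29: defer only when the probe really is in an irq. */
static enum model_flush_path
model_pick_by_probe_context(const struct model_probe_ctx *ctx)
{
    return ctx->in_irq ? MODEL_FLUSH_DEFERRED : MODEL_FLUSH_DIRECT;
}
```

For a sched_switch probe (`in_irq` false, `irqs_disabled` true) the first function picks the deferred path, which is the one that deadlocks under the runqueue lock, while the second picks the direct path. For a probe in an interrupt both agree on deferring.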

20:30 <fche> what happens if we disable print_flush

20:30 <kerneltoast> messages get dropped

20:30 <kerneltoast> and/or truncated

20:31 <fche> ok but only if userspace doesn't get them out fast enough, right?

20:31 <fche> so something fixable with a larger bufsize

20:31 <kerneltoast> we still need to write them to the inode

20:31 <kerneltoast> and that's the crux of the problem

20:31 <kerneltoast> the inode is locked with a mutex

20:31 <kerneltoast> we can't lock a mutex inside an irq

20:32 <kerneltoast> we can't even mutex_trylock it

20:32 <fche> would be surprised if reserve/committing a block of data to the percpu relay buffer should really require inode level ops, hmmm

20:33 <kerneltoast> take a look at print_flush.c to see if that can be torched

20:37 <agentzh> fche: the pointers may get messed up for multiple writers.

20:37 <agentzh> leading to garbled data.

20:37 <agentzh> that was the issue my original patch tried to fix.

20:37 <agentzh> or even just a writer and a reader.

20:37 <agentzh> unless we can make it lockfree...
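
One common way to let several writers share a buffer without a lock is to have each writer claim a disjoint byte range with a compare-and-swap before copying into it. The sketch below uses C11 atomics in userspace; it is not the relay or print_flush.c code, the `model_*` names and the 64-byte buffer are hypothetical, and it leaves out what a reader would need to tell when a claimed range has been fully written.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define MODEL_BUF_SIZE 64   /* hypothetical buffer size */

struct model_log_buf {
    atomic_size_t reserved;   /* next free offset, only ever grows */
    atomic_size_t dropped;    /* messages that did not fit */
    char data[MODEL_BUF_SIZE];
};

static void model_log_init(struct model_log_buf *b)
{
    atomic_init(&b->reserved, 0);
    atomic_init(&b->dropped, 0);
    memset(b->data, 0, sizeof(b->data));
}

/* Claim len bytes. Returns the start offset, or -1 if there is no room. */
static long model_log_reserve(struct model_log_buf *b, size_t len)
{
    size_t old = atomic_load(&b->reserved);

    do {
        if (len > MODEL_BUF_SIZE - old) {
            atomic_fetch_add(&b->dropped, 1);   /* message is dropped */
            return -1;
        }
        /* On failure, old is reloaded and the size check is repeated. */
    } while (!atomic_compare_exchange_weak(&b->reserved, &old, old + len));

    return (long)old;
}

/* Reserve a range, then copy into it; ranges never overlap. */
static int model_log_write(struct model_log_buf *b, const char *msg, size_t len)
{
    long off = model_log_reserve(b, len);

    if (off < 0)
        return -1;
    memcpy(b->data + off, msg, len);
    return 0;
}
```

Because each successful reservation hands out a distinct range, concurrent writers cannot interleave bytes inside one message. A full buffer still drops messages, which is the case a larger bufsize only postpones.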

20:38 <agentzh> but it would require a custom version of overlayfs?

20:39 <fche> eww

20:44 irker207 has quit [Quit: transmission timeout]

21:18 derek0883 has quit [Remote host closed the connection]

21:18 khaled__ has quit [Remote host closed the connection]

21:20 khaled has joined #systemtap

21:20 derek0883 has joined #systemtap

21:32 tromey has quit [Quit: ERC (IRC client for Emacs 27.1.50)]

21:55 sscox has quit [Ping timeout: 265 seconds]

21:55 sscox has joined #systemtap