fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
lijunlong has quit [Read error: Connection reset by peer]
lijunlong has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
khaled__ has quit [Quit: Konversation terminated!]
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
<kerneltoast> fche, you have the misfortune of reviewing this: https://gist.github.com/kerneltoast/947715b30d2d184f288f9d7d06873089
<kerneltoast> i'm not confident i formatted the translate.cxx changes correctly
<kerneltoast> i'll run the big ol testsuite on this tomorrow
<kerneltoast> or maybe overnight i guess
derek088_ has joined #systemtap
derek088_ has quit [Remote host closed the connection]
derek088_ has joined #systemtap
derek0883 has quit [Ping timeout: 260 seconds]
<agentzh> kerneltoast: it fixes the remaining panics/deadlocks?
<agentzh> maybe not the probe lock one?
<kerneltoast> it adds some new ones too, as i've just found out
<kerneltoast> can't confirm it fixes anything until i fix the patch itself...
<agentzh> okay
<kerneltoast> let's see if i can fix it in the next 5 minutes...
<agentzh> heh
* kerneltoast crosses his fingers that it's fixed
<kerneltoast> i fixed one problem with it and another came up
<kerneltoast> :)
derek0883 has joined #systemtap
derek088_ has quit [Ping timeout: 260 seconds]
irker921 has quit [Quit: transmission timeout]
sscox has quit [Ping timeout: 264 seconds]
derek0883 has quit [Remote host closed the connection]
orivej has quit [Ping timeout: 260 seconds]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
sscox has joined #systemtap
_whitelogger has joined #systemtap
<kerneltoast> fche, agentzh, okay my brain is melting after trying to fix this for 4 hours
<kerneltoast> i'm hitting some tracing quirk that's causing my patch to deadlock
<kerneltoast> i'm gonna leave all the info here and pray one of you knows what's wrong with it by the time i wake up
<kerneltoast> that gist has 3 files, in order: one showing the deadlock detected by lockdep, one with the C file for the stap script which caused the bug, and one for my broken print patch
<kerneltoast> the problem has something to do with my use of schedule_work_on() in a tracepoint
<kerneltoast> where i suspect the tracepoint is called while holding a lock that schedule_work_on() needs
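Spelled out, the pattern kerneltoast suspects would look roughly like the sketch below (illustrative only, not the actual patch; the work item name and probe prototype are placeholders, and the sched_switch prototype differs between kernel versions):

    #include <linux/workqueue.h>
    #include <linux/sched.h>
    #include <linux/smp.h>

    static void stp_print_flush_fn(struct work_struct *work)
    {
            /* drain the per-cpu print buffer in process context */
    }
    static DECLARE_WORK(stp_print_flush_work, stp_print_flush_fn);

    /* tracepoint callback registered on sched_switch */
    static void probe_sched_switch(void *data, bool preempt,
                                   struct task_struct *prev,
                                   struct task_struct *next)
    {
            /*
             * If the tracepoint fires while its caller already holds a
             * lock that schedule_work_on() also needs (e.g. to wake an
             * idle kworker via try_to_wake_up()), lockdep reports a
             * deadlock.
             */
            schedule_work_on(smp_processor_id(), &stp_print_flush_work);
    }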
<kerneltoast> ok i sleep
khaled__ has joined #systemtap
orivej has joined #systemtap
orivej has quit [Ping timeout: 272 seconds]
orivej has joined #systemtap
orivej has quit [Ping timeout: 260 seconds]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
sscox has quit [Ping timeout: 272 seconds]
sscox has joined #systemtap
tromey has joined #systemtap
derek0883 has joined #systemtap
irker207 has joined #systemtap
<irker207> systemtap: fche systemtap.git:master * release-4.4-6-g83cb271b3 / runtime/stp_utrace.c: RHBZ1892179: double default UTRACE_TASK_WORKPOOL
<kerneltoast> fche, yo
<fche> hi
<fche> we may have to blacklist the tracepoint in question; I think serhei did something similar recently
<kerneltoast> it was sched_switch
<fche> grr, that's an important one :(
<fche> could write a small kernel module that calls task_work* from a tracepoint callback, just to reproduce the problem
<fche> the kernel folks may be amenable to fixing it on their side
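A throwaway reproducer along those lines might look like the sketch below (untested; it assumes sched_switch is exported to modules on the target kernel, and the probe prototype matches recent kernels only). fche mentions task_work*, but the call the patch actually trips over is schedule_work_on(), so that is what this queues:

    #include <linux/module.h>
    #include <linux/workqueue.h>
    #include <linux/sched.h>
    #include <linux/smp.h>
    #include <trace/events/sched.h>

    static void repro_work_fn(struct work_struct *work) { }
    static DECLARE_WORK(repro_work, repro_work_fn);

    static void repro_probe(void *data, bool preempt,
                            struct task_struct *prev,
                            struct task_struct *next)
    {
            /* queue work from the tracepoint, as the stap patch does */
            schedule_work_on(smp_processor_id(), &repro_work);
    }

    static int __init repro_init(void)
    {
            return register_trace_sched_switch(repro_probe, NULL);
    }

    static void __exit repro_exit(void)
    {
            unregister_trace_sched_switch(repro_probe, NULL);
            tracepoint_synchronize_unregister();
            flush_work(&repro_work);
    }

    module_init(repro_init);
    module_exit(repro_exit);
    MODULE_LICENSE("GPL");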
<kerneltoast> there are still all the old kernels
<fche> yes there are
<kerneltoast> we can't even use wake_up from that tracepoint
<kerneltoast> it sucks
<fche> wonder if the lock in this case is a lockdep-only timestamp-collector thing
<fche> so maybe it's lockdep + sched tracepoint + task_work_* all together
<kerneltoast> hmm
<fche> seeing the sched_clock() in the call stack
<kerneltoast> hm no
<kerneltoast> on that centos7 kernel, the sched_switch tracepoint is called inside prepare_task_switch()
<kerneltoast> which is called from context_switch()
<kerneltoast> and context_switch() is called with the runqueue locked
<kerneltoast> what a nightmare
<kerneltoast> one way to get around this would involve polling
<kerneltoast> but nothx
orivej has joined #systemtap
<kerneltoast> I'm gonna try abusing hrtimers to make this work
<kerneltoast> I couldn't use an IPI
<kerneltoast> Because the kernel's IPI queuing mechanism likes to be smart and checks if you're trying to send a function call IPI to the CPU you're running on
<kerneltoast> In which case it decides to execute the callback function directly
<kerneltoast> Instead of delegating it to an IPI
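A rough sketch of the hrtimer route (untested; all names invented): the tracepoint only arms a pinned hrtimer, and the expiry handler, which runs in hardirq context but without the runqueue lock held, does the schedule_work_on() call:

    #include <linux/hrtimer.h>
    #include <linux/ktime.h>
    #include <linux/workqueue.h>
    #include <linux/smp.h>

    static struct hrtimer stp_flush_timer;
    static struct work_struct stp_flush_work;   /* does the real flush */

    static enum hrtimer_restart stp_flush_timer_fn(struct hrtimer *t)
    {
            /* hardirq context, but rq->lock is not held here, so it is
             * safe to wake a kworker */
            schedule_work_on(smp_processor_id(), &stp_flush_work);
            return HRTIMER_NORESTART;
    }

    static void stp_flush_timer_setup(work_func_t flush_fn)
    {
            INIT_WORK(&stp_flush_work, flush_fn);
            hrtimer_init(&stp_flush_timer, CLOCK_MONOTONIC,
                         HRTIMER_MODE_REL_PINNED);
            stp_flush_timer.function = stp_flush_timer_fn;
    }

    /* called from the tracepoint probe instead of schedule_work_on();
     * hrtimer_start() only takes the hrtimer base lock, which the
     * scheduler itself nests inside rq->lock (see hrtick_start()), so
     * arming the timer under rq->lock should not invert any lock order */
    static void stp_request_flush(void)
    {
            if (!hrtimer_active(&stp_flush_timer))
                    hrtimer_start(&stp_flush_timer, ns_to_ktime(0),
                                  HRTIMER_MODE_REL_PINNED);
    }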
<fche> yeah
<kerneltoast> i am in physical pain
<kerneltoast> gosh darn log messages
<agentzh> oh this is brutal and bloody.
<agentzh> hopefully we can find a way out.
<agentzh> or at least we can prevent the user from shooting himself in the foot.
<agentzh> kerneltoast: take care.
<kerneltoast> yeah trying to make this work in any context is quite difficult
<kerneltoast> we can't call schedule_work_on from just anywhere
<kerneltoast> so we need to have something else to do that for us
<agentzh> fche: does the never-give-up man have any advice? ;)
<agentzh> yeah, that looks like a dead end.
<kerneltoast> we can't delegate it to an IPI because the IPI queue mechanism shoots us in the foot
<agentzh> *nod*
<kerneltoast> we can't just check if irqs_disabled() is true inside the print flush; that would cover the case where there's a print from inside an irq, but it would also kill the sched_switch tracepoint
<fche> is that the only tracepoint with this problem? does the problem exist on more modern kernels? we can just bite the bullet and block that tp on that generation kernel only
<kerneltoast> because the sched_switch tracepoint is called with irqs disabled (they are disabled when the runqueue lock is acquired)
<kerneltoast> fche, it is still a problem on new kernels
<kerneltoast> the sched_switch tracepoint is always called with the runqueue lock held and irqs disabled
<kerneltoast> agentzh's original suggestion of passing down context info from stap would be the cleanest solution
<fche> how would this help?
<kerneltoast> if (stap_in_irq()) schedule_work_on() else print_flush_directly()
<kerneltoast> we can't print flush inside an irq, but we can't schedule_work_on in every context
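Fleshed out, that dispatch might look like the sketch below. stap_in_irq() is hypothetical: it stands for whatever check gets built on the context info agentzh suggested passing down from the translator, since a plain irqs_disabled() test cannot tell a real hardirq apart from sched_switch running with IRQs off. The other names are placeholders too:

    #include <linux/workqueue.h>
    #include <linux/smp.h>

    static struct work_struct stp_print_flush_work;  /* INIT_WORK()'d at startup */

    bool stap_in_irq(void);       /* hypothetical, driven by stap context info */
    void _stp_print_flush(void);  /* stands in for the direct path in print_flush.c */

    static void stp_print_flush_dispatch(void)
    {
            if (stap_in_irq()) {
                    /* in irq: flushing directly is not safe, so defer the
                     * flush to process context */
                    schedule_work_on(smp_processor_id(), &stp_print_flush_work);
            } else {
                    /* normal context: flush right away */
                    _stp_print_flush();
            }
    }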
<fche> what happens if we disable print_flush
<kerneltoast> messages get dropped
<kerneltoast> and or truncated
<fche> ok but only if userspace doesn't get them out fast enough, right?
<fche> so something fixable with a larger bufsize
<kerneltoast> we still need to write them to the inode
<kerneltoast> and that's the crux of the problem
<kerneltoast> the inode is locked with a mutex
<kerneltoast> we can't lock a mutex inside an irq
<kerneltoast> we can't even mutex_trylock it
<fche> would be surprised if reserving/committing a block of data to the percpu relay buffer really required inode-level ops, hmmm
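For reference, the stock relay API in include/linux/relay.h does let a writer put data into the current CPU's sub-buffer without any inode-level locking; a hedged sketch to illustrate fche's point (the stap runtime has its own transport wrappers, so this is not what print_flush.c actually calls):

    #include <linux/relay.h>
    #include <linux/string.h>

    /* relay_write() disables local IRQs, reserves room in the current
     * CPU's sub-buffer and memcpy()s the payload; no mutex or inode
     * operation is involved, and reader wakeups on sub-buffer switch are
     * deferred internally by the relay core. */
    static void emit_record(struct rchan *chan, const void *data, size_t len)
    {
            relay_write(chan, data, len);
    }

    /* Or, matching the reserve/commit phrasing: carve out a slot, fill it
     * in place, and the data becomes visible when the sub-buffer is
     * switched or flushed.  The caller keeps preemption (or IRQs)
     * disabled around this, as with any per-cpu buffer. */
    static void emit_record_reserved(struct rchan *chan, const void *data,
                                     size_t len)
    {
            void *slot = relay_reserve(chan, len);

            if (slot)
                    memcpy(slot, data, len);
    }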
<kerneltoast> take a look at print_flush.c to see if that can be torched
<agentzh> fche: the pointers may get messed up for multiple writers.
<agentzh> leading to garbled data.
<agentzh> that was the issue my original patch tried to fix.
<agentzh> or even just a writer and a reader.
<agentzh> unless we can make it lockfree...
<agentzh> but it would require a custom version of overlayfs?
<fche> eww
irker207 has quit [Quit: transmission timeout]
derek0883 has quit [Remote host closed the connection]
khaled__ has quit [Remote host closed the connection]
khaled has joined #systemtap
derek0883 has joined #systemtap
tromey has quit [Quit: ERC (IRC client for Emacs 27.1.50)]
sscox has quit [Ping timeout: 265 seconds]
sscox has joined #systemtap