fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
lijunlong has quit [Read error: Connection reset by peer]
lijunlong has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
khaled__ has quit [Quit: Konversation terminated!]
derek0883 has quit [Remote host closed the connection]
<kerneltoast>
that gist has 3 files, in order: one showing the deadlock detected by lockdep, one with the C file for the stap script which caused the bug, and one for my broken print patch
<kerneltoast>
the problem has something to do with my use of schedule_work_on() in a tracepoint
<kerneltoast>
where i suspect the tracepoint is called while holding a lock that schedule_work_on() needs
<kerneltoast>
ok i sleep
khaled__ has joined #systemtap
orivej has joined #systemtap
orivej has quit [Ping timeout: 272 seconds]
orivej has joined #systemtap
orivej has quit [Ping timeout: 260 seconds]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
<kerneltoast>
I'm gonna try abusing hrtimers to make this work
<kerneltoast>
I couldn't use an IPI
<kerneltoast>
Because the kernel's IPI queuing mechanism likes to be smart and checks if you're trying to send a function call IPI to the CPU you're running on
<kerneltoast>
In which case it decides to execute the callback function directly
<kerneltoast>
Instead of delegating it to an IPI
<fche>
yeah
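A rough userspace model of the self-targeting short-circuit described above. This is not kernel code. Every name in it (model_call_function_single, model_cpus, model_current_cpu, model_count_call) is hypothetical and invented for illustration. The only thing taken from the discussion is the behavior: a function-call request aimed at the CPU you are already running on executes the callback immediately, in the caller's context, instead of being deferred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

typedef void (*model_fn)(void *info);

/* One pending-callback slot per simulated CPU (hypothetical model state). */
struct model_cpu {
    model_fn pending_fn;
    void *pending_info;
};

static struct model_cpu model_cpus[MODEL_NR_CPUS];
static int model_current_cpu; /* the CPU the "caller" is running on */

/*
 * Model of a single-CPU function call request.
 * Returns true if the callback ran synchronously in the caller's context,
 * false if it was queued for the target CPU to run later.
 */
static bool model_call_function_single(int cpu, model_fn fn, void *info)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

    if (cpu == model_current_cpu) {
        /* Self-targeted: no deferral, the callback runs right here. */
        fn(info);
        return true;
    }

    /* Remote target: record the request; the target would run it later. */
    model_cpus[cpu].pending_fn = fn;
    model_cpus[cpu].pending_info = info;
    return false;
}

/* Example callback: counts how many times it has actually executed. */
static void model_count_call(void *info)
{
    (*(int *)info)++;
}
```

In this model, a request aimed at the current CPU returns only after the callback has already run, so from a tracepoint the callback would still execute with whatever the caller is holding at that moment.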
<kerneltoast>
i am in physical pain
<kerneltoast>
gosh darn log messages
<agentzh>
oh this is brutal and bloody.
<agentzh>
hopefully we can find a way out.
<agentzh>
or at least we can prevent the user from shooting himself in the foot.
<agentzh>
kerneltoast: take care.
<kerneltoast>
yeah trying to make this work in any context is quite difficult
<kerneltoast>
we can't call schedule_work_on from just any context
<kerneltoast>
so we need to have something else to do that for us
<agentzh>
fche: never-giveup-man has any advice? ;)
<agentzh>
yeah, that looks like a dead end.
<kerneltoast>
we can't delegate it to an IPI because the IPI queue mechanism shoots us in the foot
<agentzh>
*nod*
<kerneltoast>
we can't just check if irqs_disabled() is true inside the print flush because although that will cover the case where there's a print from inside an irq, it will also kill the sched_switch tracepoint
<fche>
is that the only tracepoint with this problem? does the problem exist on more modern kernels? we can just bite the bullet and block that tp on that generation kernel only
<kerneltoast>
because the sched_switch tracepoint is called with irqs disabled (they are disabled when the runqueue lock is acquired)
<kerneltoast>
fche, it is still a problem on new kernels
<kerneltoast>
the sched_switch tracepoint is always called with the runqueue lock held and irqs disabled
<kerneltoast>
agentzh's original suggestion of passing down context info from stap would be the cleanest solution
<fche>
how would this help?
<kerneltoast>
if (stap_in_irq()) schedule_work_on() else print_flush_directly()
<kerneltoast>
we can't print flush inside an irq, but we can't schedule_work_on in every context
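A minimal userspace sketch of the dispatch idea above, assuming the caller can be told which context it is in. The names (model_ctx, model_printer, model_print, model_run_deferred) are hypothetical, and stap_in_irq() from the line above is only a proposed helper in this conversation, not something confirmed to exist. The sketch takes the context as an explicit argument, which mirrors the suggestion of passing context info down from stap rather than inferring it from irqs_disabled().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Context supplied by the caller (hypothetical; stands in for stap's knowledge). */
enum model_ctx { MODEL_CTX_PROCESS, MODEL_CTX_IRQ };

struct model_printer {
    char out[64];      /* stands in for the mutex-protected output */
    char deferred[64]; /* stands in for work queued to run later */
    bool has_deferred;
};

/* Append msg to dst without overflowing a buffer of cap bytes. */
static void model_append(char *dst, size_t cap, const char *msg)
{
    size_t used = strlen(dst);
    size_t room = cap - used - 1; /* keep space for the terminator */
    strncat(dst, msg, room);
}

/* IRQ context: defer the flush. Any other context: flush directly. */
static void model_print(struct model_printer *p, enum model_ctx ctx,
                        const char *msg)
{
    if (ctx == MODEL_CTX_IRQ) {
        model_append(p->deferred, sizeof(p->deferred), msg);
        p->has_deferred = true;
    } else {
        model_append(p->out, sizeof(p->out), msg);
    }
}

/* Runs later from a safe context and drains whatever was deferred. */
static void model_run_deferred(struct model_printer *p)
{
    if (!p->has_deferred)
        return;
    model_append(p->out, sizeof(p->out), p->deferred);
    p->deferred[0] = '\0';
    p->has_deferred = false;
}
```

Because the context is an argument rather than a check on interrupt state, a caller like the sched_switch tracepoint (irqs disabled, but not inside an irq) could still take the direct path in this model.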
<fche>
what happens if we disable print_flush
<kerneltoast>
messages get dropped
<kerneltoast>
and/or truncated
<fche>
ok but only if userspace doesn't get them out fast enough, right?
<fche>
so something fixable with a larger bufsize
<kerneltoast>
we still need to write them to the inode
<kerneltoast>
and that's the crux of the problem
<kerneltoast>
the inode is locked with a mutex
<kerneltoast>
we can't lock a mutex inside an irq
<kerneltoast>
we can't even mutex_trylock it
<fche>
would be surprised if reserving/committing a block of data in the percpu relay buffer really required inode-level ops, hmmm
<kerneltoast>
take a look at print_flush.c to see if that can be torched
<agentzh>
fche: the pointers may get messed up for multiple writers.
<agentzh>
leading to garbled data.
<agentzh>
that was the issue my original patch tried to fix.
<agentzh>
or even just a writer and a reader.
<agentzh>
unless we can make it lockfree...
<agentzh>
but it would require a custom version of overlayfs?
<fche>
eww
irker207 has quit [Quit: transmission timeout]
derek0883 has quit [Remote host closed the connection]
khaled__ has quit [Remote host closed the connection]
khaled has joined #systemtap
derek0883 has joined #systemtap
tromey has quit [Quit: ERC (IRC client for Emacs 27.1.50)]