fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
<agentzh>
okay, thanks
<agentzh>
then i should introduce a last resort...
<fche>
but a signal to the threads should interrupt them
<agentzh>
as long as sa_restart is not in place?
<agentzh>
seems like the stap process sets it.
<agentzh>
staprun/stapio doesn't.
<agentzh>
stap exits in its signal handler, so it should be fine anyway.
<agentzh>
so i don't need to worry about it then :) thanks for the info
<fche>
could be that sa-restart is the wrong thing to do
<agentzh>
okay
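[editor's note] The SA_RESTART question above can be illustrated with a small C sketch. This is not staprun or stap code; the helper names (on_alarm, read_was_interrupted) and the use of a pipe and SIGALRM are assumptions made only for the demonstration. The handler writes one byte to the pipe, so if the kernel restarts read() it completes normally instead of hanging.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Write end of the demo pipe, used by the handler (hypothetical name). */
static volatile sig_atomic_t wake_fd = -1;

static void on_alarm(int signo)
{
    (void)signo;
    int saved_errno = errno;
    if (wake_fd >= 0) {
        /* write() is async-signal-safe; the byte lets a restarted read() finish. */
        ssize_t r = write(wake_fd, "x", 1);
        (void)r;
    }
    errno = saved_errno;
}

/* Returns 1 if the blocking read() failed with EINTR, 0 if it completed. */
int read_was_interrupted(int use_sa_restart)
{
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);
    wake_fd = fds[1];

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = use_sa_restart ? SA_RESTART : 0;
    rc = sigaction(SIGALRM, &sa, &old);
    assert(rc == 0);

    /* Fire SIGALRM once, 50 ms from now, while read() below is blocked. */
    struct itimerval tv;
    memset(&tv, 0, sizeof tv);
    tv.it_value.tv_usec = 50000;
    rc = setitimer(ITIMER_REAL, &tv, NULL);
    assert(rc == 0);

    char c;
    ssize_t n = read(fds[0], &c, 1);
    int interrupted = (n == -1 && errno == EINTR);

    sigaction(SIGALRM, &old, NULL);
    wake_fd = -1;
    close(fds[0]);
    close(fds[1]);
    return interrupted;
}
```

Without SA_RESTART the blocked read() is expected to return -1 with errno set to EINTR; with SA_RESTART the kernel restarts it after the handler returns, so the caller never sees the interruption.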
<agentzh>
fche: i tried forcibly removing the kernel module in staprun, but the call on staprun/staprun.c:266 fails with the error "'xxx' is not a zombie systemtap module.".
<agentzh>
how to work around that?
<agentzh>
manually running "rmmod xxx" can successfully remove the kernel module though.
<agentzh>
fche: can we remove the main thread SIGURG blocking code at staprun/mainloop.c:666?
<agentzh>
the comment says the only time staprun is sleeping is "pselect", but actually it may also "sleep" in write() when the write buffers are full.
<agentzh>
i'm trying to leverage SIGURG to make the signal handler interrupt and notify the main loop thread. that piece of code defeats my fix.
<agentzh>
indeed as you said, setting nonblocking on fds won't affect in-flight fds in other threads.
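[editor's note] A minimal C sketch of the pattern under discussion: a signal kept blocked in a thread and unblocked only for the duration of pselect(). It is not the staprun code; the names on_sigurg and sigurg_delivered_only_in_pselect are hypothetical, and sigprocmask() is used on the assumption of a single-threaded demo (a threaded program would use pthread_sigmask()).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t got_sigurg = 0;

static void on_sigurg(int signo)
{
    (void)signo;
    got_sigurg = 1;
}

/* Returns 1 if SIGURG stayed pending while blocked and was delivered
 * only once pselect() swapped in a mask that allows it. */
int sigurg_delivered_only_in_pselect(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigurg;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGURG, &sa, NULL);
    assert(rc == 0);

    /* Block SIGURG for normal execution. */
    sigset_t block, orig;
    sigemptyset(&block);
    sigaddset(&block, SIGURG);
    rc = sigprocmask(SIG_BLOCK, &block, &orig);
    assert(rc == 0);

    /* The signal arrives while blocked: it is queued, not delivered. */
    got_sigurg = 0;
    raise(SIGURG);
    int seen_while_blocked = got_sigurg;

    /* pselect() atomically installs 'during', which lets SIGURG through. */
    sigset_t during = orig;
    sigdelset(&during, SIGURG);
    struct timespec timeout = { 1, 0 };
    rc = pselect(0, NULL, NULL, NULL, &timeout, &during);
    int interrupted = (rc == -1 && errno == EINTR);

    sigprocmask(SIG_SETMASK, &orig, NULL);
    return !seen_while_blocked && interrupted && got_sigurg;
}
```

The same mask that defers the signal until pselect() also defers it during any other blocking call made by that thread, such as a write() to a full buffer, which is the case raised above.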
<fche>
the dwarf segfault -vvv one is great, thanks
<fche>
and the fcntl nonblock one is okay too. TBH the code doesn't really need to check the previous flags first - just set O_NONBLOCK unconditionally
<fche>
but this is okay too
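[editor's note] A hedged C sketch of setting O_NONBLOCK without testing for it first. This is not the patch being reviewed; set_nonblocking and full_pipe_write_returns_eagain are hypothetical names. The sketch still reads the current flags with F_GETFL so that other status flags are preserved, which is an assumption about what "unconditionally" would look like in practice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* OR in O_NONBLOCK regardless of whether it is already set.
 * Returns 0 on success, -1 on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Fill a pipe that nobody reads. Returns 1 if write() eventually fails
 * with EAGAIN instead of putting the caller to sleep. */
int full_pipe_write_returns_eagain(void)
{
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);

    rc = set_nonblocking(fds[1]);
    assert(rc == 0);
    rc = set_nonblocking(fds[1]);   /* calling it again is harmless */
    assert(rc == 0);

    char chunk[4096];
    memset(chunk, 'x', sizeof chunk);
    ssize_t n;
    do {
        n = write(fds[1], chunk, sizeof chunk);
    } while (n > 0);

    int eagain = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
    close(fds[0]);
    close(fds[1]);
    return eagain;
}
```

Because the OR is idempotent, a prior check of the flag only matters when the caller wants to use it as a "work already done" marker.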
<agentzh>
fche: okay, i'll remove the O_NONBLOCK check. thanks for your feedback!
<fche>
it's harmless, and in one case you're using it as a flag to avoid repeating some work
<fche>
so you can leave it as is if you like
<agentzh>
okay, got it.
<agentzh>
i'll leave it then.
<agentzh>
i've just fixed a bug in main.cxx in that stap hang patch: i incorrectly hard-coded the fd number to 2 in the loop body. will address that in the final committed version.
<agentzh>
Zexuan Luo found it in our in-house code review.
<fche>
STDERR_FILENO is always 2
<agentzh>
i know, i was setting both stderr and stdout there.
<agentzh>
and my tests did not catch it since they merge stdout and stderr into a single stream.
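[editor's note] The class of bug described above (a literal 2 inside a loop meant to cover both stdout and stderr) can be sketched in C as follows. The actual code in main.cxx is not shown in this log, so make_nonblocking_all and both_descriptors_updated are hypothetical, and two pipes stand in for the two standard streams.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Set O_NONBLOCK on every descriptor in fds[]; returns how many were updated. */
size_t make_nonblocking_all(const int *fds, size_t count)
{
    size_t done = 0;
    for (size_t i = 0; i < count; i++) {
        int fd = fds[i];   /* use the loop's descriptor, not a literal 2 */
        int flags = fcntl(fd, F_GETFL);
        if (flags == -1)
            continue;
        if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0)
            done++;
    }
    return done;
}

/* Self-check: two independent pipes play the roles of stdout and stderr. */
int both_descriptors_updated(void)
{
    int a[2], b[2];
    int rc = pipe(a);
    assert(rc == 0);
    rc = pipe(b);
    assert(rc == 0);

    int fds[2] = { a[1], b[1] };
    size_t n = make_nonblocking_all(fds, 2);
    int ok = n == 2
          && (fcntl(a[1], F_GETFL) & O_NONBLOCK)
          && (fcntl(b[1], F_GETFL) & O_NONBLOCK);

    close(a[0]); close(a[1]);
    close(b[0]); close(b[1]);
    return ok;
}
```

In real use the array would be { STDOUT_FILENO, STDERR_FILENO }. One plausible reason a merged-stream test misses the bug: O_NONBLOCK is stored on the open file description, so when stderr is a duplicate of stdout (as with 2>&1), setting the flag through fd 2 also sets it for fd 1.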
<agentzh>
the reader thread in staprun actually blocks writing to stdout.