fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
<agentzh> okay, thanks
<agentzh> then i should introduce a last resort...
<fche> but a signal to the threads should interrupt them
<agentzh> as long as SA_RESTART is not in place?
<agentzh> seems like the stap process sets it.
<agentzh> staprun/stapio doesn't.
<agentzh> stap exits in its signal handler, so it should be fine anyway.
<agentzh> so i don't need to worry about it :) thanks for the info
<fche> could be that SA_RESTART is the wrong thing to do
<agentzh> okay
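A minimal sketch of the point discussed above, assuming a POSIX system: when a handler is installed without SA_RESTART, a signal that arrives during a blocking read() makes the call fail with EINTR instead of being transparently restarted. The function and handler names are hypothetical and this is not code from stap or staprun.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

/* Hypothetical handler: only records that the signal arrived. */
static void on_alarm(int signo) { (void)signo; got_signal = 1; }

/* Returns 1 if a blocking read() on an empty pipe was interrupted
 * with EINTR by a handler installed without SA_RESTART, else 0. */
int read_is_interrupted_without_sa_restart(void)
{
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);

    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* deliberately no SA_RESTART */
    rc = sigaction(SIGALRM, &sa, &old_sa);
    assert(rc == 0);

    /* Repeating 20 ms timer, so a signal that fires before read()
     * is entered cannot leave the call blocked forever. */
    struct itimerval on, off;
    memset(&on, 0, sizeof on);
    memset(&off, 0, sizeof off);
    on.it_value.tv_usec = 20000;
    on.it_interval.tv_usec = 20000;
    rc = setitimer(ITIMER_REAL, &on, NULL);
    assert(rc == 0);

    char byte;
    ssize_t n = read(fds[0], &byte, 1);  /* nothing is ever written */
    int saved_errno = errno;

    /* Stop the timer and restore the previous disposition. */
    setitimer(ITIMER_REAL, &off, NULL);
    sigaction(SIGALRM, &old_sa, NULL);
    close(fds[0]);
    close(fds[1]);

    return n == -1 && saved_errno == EINTR && got_signal;
}
```

With SA_RESTART set in sa_flags, the same read() would be restarted after the handler returns and would keep blocking, which is the case the question above is worried about.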
sscox has joined #systemtap
<agentzh> fche: i tried forcibly removing the kernel module in staprun, but the call on staprun/staprun.c:266 fails with the error "'xxx' is not a zombie systemtap module.".
<agentzh> how to work around that?
<agentzh> manually running "rmmod xxx" can successfully remove the kernel module though.
dmalcolm has joined #systemtap
dmalcolm has quit [Excess Flood]
dmalcolm has joined #systemtap
dmalcolm has quit [Ping timeout: 268 seconds]
<agentzh> fche: can we remove the main thread SIGURG blocking code at staprun/mainloop.c:666?
<agentzh> the comment says the only time staprun is sleeping is "pselect", but actually it may also "sleep" in write() when the write buffers are full.
<agentzh> i'm trying to leverage SIGURG to make the signal handler interrupt and notify the main loop thread. that piece of code defeats my fix.
<agentzh> indeed as you said, setting nonblocking on fds won't affect in-flight fds in other threads.
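A hedged sketch of the general pattern being described, not the actual staprun/mainloop.c code: the thread keeps SIGURG blocked and only unblocks it atomically for the duration of pselect(). Under that scheme a signal sent while the thread is elsewhere (for example stuck in a blocking write()) stays pending and cannot interrupt that call. The function and handler names are hypothetical, and sigprocmask() is used for brevity where a multithreaded program would use pthread_sigmask().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t urg_seen = 0;

/* Hypothetical handler: only records that SIGURG was delivered. */
static void on_urg(int signo) { (void)signo; urg_seen = 1; }

/* Returns 1 if a SIGURG raised while blocked stays pending and is
 * delivered only once pselect() temporarily unblocks it. */
int sigurg_delivered_only_in_pselect(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_urg;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGURG, &sa, NULL);
    assert(rc == 0);

    /* Block SIGURG for normal execution. */
    sigset_t block, orig;
    sigemptyset(&block);
    sigaddset(&block, SIGURG);
    rc = sigprocmask(SIG_BLOCK, &block, &orig);
    assert(rc == 0);

    /* The signal is generated but stays pending: a blocking call made
     * at this point would not be interrupted by it. */
    raise(SIGURG);
    int seen_while_blocked = urg_seen;

    /* pselect() swaps in a mask without SIGURG for the wait only, so
     * the pending signal is delivered and the call fails with EINTR. */
    sigset_t wait_mask = orig;
    sigdelset(&wait_mask, SIGURG);
    struct timespec timeout = { .tv_sec = 1, .tv_nsec = 0 };
    rc = pselect(0, NULL, NULL, NULL, &timeout, &wait_mask);
    int interrupted = (rc == -1 && errno == EINTR);

    sigprocmask(SIG_SETMASK, &orig, NULL);
    return !seen_while_blocked && interrupted && urg_seen;
}
```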
orivej has quit [Ping timeout: 268 seconds]
<agentzh> fche: submitted a patch for the stap/staprun hanging issue: https://sourceware.org/ml/systemtap/2018-q4/msg00112.html
<agentzh> fche: also, please let me know if it's okay to commit this patch: https://sourceware.org/ml/systemtap/2018-q4/msg00113.html
orivej has joined #systemtap
orivej has quit [Ping timeout: 240 seconds]
orivej has joined #systemtap
wcohen has joined #systemtap
<fche> agentzh,
<fche> the @var/@entry one is great, thanks
<fche> the dwarf segfault -vvv one is great, thanks
<fche> and the fcntl nonblock one is okay too. TBH the code doesn't really need to check the previous flags first - just set O_NONBLOCK unconditionally
<fche> but this is okay too
brolley has joined #systemtap
orivej has quit [Ping timeout: 268 seconds]
vbernat has quit [Read error: Connection reset by peer]
orivej has joined #systemtap
<agentzh> fche: okay, i'll remove the O_NONBLOCK check. thanks for your feedback!
<fche> it's harmless, and in one case you're using it as a flag to avoid repeating some work
<fche> so you can leave it as is if you like
<agentzh> okay, got it.
<agentzh> i'll leave it then.
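A small sketch of the "just set O_NONBLOCK unconditionally" variant mentioned above, assuming POSIX fcntl(). The helper name set_nonblocking is hypothetical and is not taken from the patch. The current flags are still fetched so that other status flags are preserved, but there is no test for whether O_NONBLOCK is already set, since OR-ing it in a second time is harmless.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: turn on O_NONBLOCK without checking whether
 * it is already set. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Returns 1 if reading an empty pipe fails with EAGAIN (instead of
 * blocking) once the read end is nonblocking. */
int nonblocking_read_returns_eagain(void)
{
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);

    rc = set_nonblocking(fds[0]);
    assert(rc == 0);
    rc = set_nonblocking(fds[0]);    /* repeating the call is harmless */
    assert(rc == 0);

    char byte;
    ssize_t n = read(fds[0], &byte, 1);
    int ok = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));

    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

Keeping an explicit check, as in the patch under discussion, still makes sense when the result doubles as a flag for skipping other work.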
<agentzh> i've just fixed a bug in main.cxx in that stap hang patch. i incorrectly hard-coded the fd number to 2 in the loop body. will address that in the final version committed.
<agentzh> Zexuan Luo found it in our in-house code review.
<fche> STDERR_FILENO is always 2
<agentzh> i know, i was setting both stderr and stdout there.
<agentzh> and my tests did not catch it since they merge stdout and stderr into a single stream.
<agentzh> the reader thread in staprun actually blocks writing to stdout.
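One plausible reason a merged stdout/stderr can hide a hard-coded fd 2 (the log does not spell this out, so treat it as an assumption): O_NONBLOCK is a file status flag stored on the open file description, and descriptors created by dup() or a shell's 2>&1 share that description. Setting the flag through one descriptor therefore also sets it for the other. The sketch below uses a pipe and hypothetical names to show the effect.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if O_NONBLOCK set through one descriptor is visible
 * through its dup(), while an unrelated descriptor is unaffected. */
int nonblock_flag_is_shared_by_dup(void)
{
    int fds[2], other[2];
    int rc = pipe(fds);
    assert(rc == 0);
    rc = pipe(other);
    assert(rc == 0);

    int out_like = fds[1];          /* stands in for stdout */
    int err_like = dup(out_like);   /* stands in for stderr after 2>&1 */
    assert(err_like != -1);

    /* Set the flag only through the "stderr" descriptor. */
    int flags = fcntl(err_like, F_GETFL);
    assert(flags != -1);
    rc = fcntl(err_like, F_SETFL, flags | O_NONBLOCK);
    assert(rc == 0);

    /* Same open file description: the flag shows up here too. */
    int shared = (fcntl(out_like, F_GETFL) & O_NONBLOCK) != 0;
    /* Different open file description: still blocking. */
    int separate = (fcntl(other[1], F_GETFL) & O_NONBLOCK) == 0;

    close(err_like);
    close(fds[0]);
    close(fds[1]);
    close(other[0]);
    close(other[1]);
    return shared && separate;
}
```

When stdout and stderr are separate files or pipes, only the descriptor actually passed to fcntl() changes, which is when a hard-coded 2 in the loop body would leave stdout blocking.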
orivej has quit [Ping timeout: 240 seconds]
brolley has left #systemtap [#systemtap]
invano_ has joined #systemtap
invano__ has joined #systemtap
invano__ has quit [Client Quit]
pfallenop has quit [Ping timeout: 252 seconds]
pfallenop has joined #systemtap