fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
<kerneltoast> ✓ r/w lock comment?
khaled has quit [Quit: Konversation terminated!]
derek0883 has quit [Remote host closed the connection]
<fche> nice
<fche> 🐄 moo
derek0883 has joined #systemtap
<fche> is there no race possibility between the time of read-unlock and then print-flush
<fche> AND the write-unlock and shutdown ?
hpt has joined #systemtap
mjw has quit [Quit: Leaving]
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
modem has quit [Ping timeout: 240 seconds]
modem has joined #systemtap
<kerneltoast> you're looking at the part where i call the print flush outside the lock?
<kerneltoast> _stp_print_flush() does all those checks again
<kerneltoast> so no race potential there
<kerneltoast> no race possibility after the write-unlock because _stp_print_stop is updated to 1 before the lock and then the lock executes a full memory barrier
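[editor's note] The reasoning in the last three messages can be sketched in userspace C. This is only an illustration under stated assumptions, not the actual runtime code: the toy spinning reader/writer lock, the single shared buffer, and every `sketch_*` name are hypothetical. Only `_stp_print_stop` and `_stp_print_flush()` come from the conversation. The sketch also assumes a single printer per buffer, so updating the length under the read lock is safe here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum { BUF_SIZE = 64, FLUSH_AT = 48 };

/* Toy spinning reader/writer lock: >0 = readers, 0 = free, -1 = writer. */
static atomic_int lock_state;

static void read_lock(void)
{
    for (;;) {
        int v = atomic_load(&lock_state);
        if (v >= 0 && atomic_compare_exchange_weak(&lock_state, &v, v + 1))
            return;
    }
}
static void read_unlock(void) { atomic_fetch_sub(&lock_state, 1); }
static void write_lock(void)
{
    int expected = 0;
    while (!atomic_compare_exchange_weak(&lock_state, &expected, -1))
        expected = 0;
}
static void write_unlock(void) { atomic_store(&lock_state, 0); }

static atomic_bool print_stop; /* stands in for _stp_print_stop */
static char *print_buf;        /* one buffer only, an assumption for brevity */
static size_t print_len;

static bool sketch_init(void)
{
    print_buf = malloc(BUF_SIZE);
    print_len = 0;
    atomic_store(&print_stop, false);
    return print_buf != NULL;
}

/* Returns the number of bytes flushed, or -1 once shutdown has begun. */
static int sketch_flush(void)
{
    int flushed = -1;
    read_lock();
    if (!atomic_load(&print_stop) && print_buf != NULL) {
        flushed = (int)print_len; /* a real flush would hand the data off */
        print_len = 0;
    }
    read_unlock();
    return flushed;
}

/* Returns false if printing has been stopped or the text does not fit. */
static bool sketch_print(const char *s)
{
    size_t n = strlen(s);
    bool ok = false, want_flush = false;

    read_lock();
    if (!atomic_load(&print_stop) && print_buf != NULL &&
        print_len + n <= BUF_SIZE) {
        memcpy(print_buf + print_len, s, n);
        print_len += n;
        ok = true;
        want_flush = print_len >= FLUSH_AT;
    }
    read_unlock();

    if (want_flush)
        sketch_flush(); /* outside the lock; it repeats all the checks */
    return ok;
}

static void sketch_shutdown(void)
{
    atomic_store(&print_stop, true); /* set before taking the lock */
    write_lock();                    /* waits for in-flight readers */
    free(print_buf);
    print_buf = NULL;
    print_len = 0;
    write_unlock();
}
```

After `sketch_shutdown()` returns, both `sketch_print()` and `sketch_flush()` see the stop flag and back out without touching the freed buffer, which is the property being argued for above.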
<kerneltoast> 🐄 my alma mater is known for smelling like cow poop 🐄
<agentzh> so we're finally seeing the light at the end of the tunnel?
<agentzh> iirc, it's not related to the probe lock deadlock reproduced by stap's test suite in parallel?
<kerneltoast> yeah it's not related to the probe lock deadlock
<agentzh> k
<kerneltoast> it was responsible for one panic that led to the probe lock deadlock though
<kerneltoast> that was interesting
<agentzh> huh
<kerneltoast> [10:37:15 pm] <kerneltoast> fche, a debug kernel caught the soft lockup and left a beautiful backtrace: https://gist.github.com/kerneltoast/d0661fa02773a2f2e30b8697261587f6
<agentzh> yeah i also saw that one.
<kerneltoast> oh, no all the backtraces are different
<kerneltoast> nvm
<kerneltoast> it is unrelated to the probe lock stuff
<kerneltoast> i just got lucky
<kerneltoast> and hit a new panic
<kerneltoast> which led us here :)
<agentzh> k
<kerneltoast> agentzh, fche, just tested the print patch on centos6 and all is well with the lean testsuite
<agentzh> nice
<kerneltoast> time to run the serial testsuite on centos7
<agentzh> cool
<agentzh> sad we have to stick with the serial mode atm.
<kerneltoast> yeah...
<agentzh> eyes on your next probe lock patch :)
<kerneltoast> parallel is so explosive
<agentzh> to end this suffering.
<kerneltoast> i'm sure there are more bugs in parallel mode
<agentzh> there could be :)
<agentzh> it's a good stress test it seems.
<kerneltoast> yeah, and none of our tests use prints inside a timer iirc
<kerneltoast> maybe we should add that
<agentzh> so far it was all bugs in the stap runtime, fortunately.
<agentzh> kerneltoast: yeah we should add some
<agentzh> patches welcome :)
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
irker157 has quit [Quit: transmission timeout]
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
orivej has quit [Ping timeout: 272 seconds]
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
lijunlong has quit [Ping timeout: 256 seconds]
lijunlong has joined #systemtap
orivej has joined #systemtap
orivej has quit [Ping timeout: 240 seconds]
khaled has joined #systemtap
_whitelogger has joined #systemtap
beauty1 has quit [Ping timeout: 244 seconds]
mjw has joined #systemtap
beauty1 has joined #systemtap
hpt has quit [Ping timeout: 256 seconds]
derek0883 has joined #systemtap
orivej has joined #systemtap
derek0883 has quit [Ping timeout: 260 seconds]
tromey has joined #systemtap
tonyj has quit [Ping timeout: 272 seconds]
orivej has quit [Ping timeout: 246 seconds]
amerey has joined #systemtap
khaled has quit [Quit: Konversation terminated!]
khaled has joined #systemtap
orivej has joined #systemtap
amerey has quit [Remote host closed the connection]
amerey has joined #systemtap
amerey has quit [Quit: Leaving]
amerey has joined #systemtap
derek0883 has joined #systemtap
orivej has quit [Ping timeout: 265 seconds]
tonyj has joined #systemtap
orivej has joined #systemtap
khaled_ has joined #systemtap
khaled has quit [Ping timeout: 264 seconds]
<kerneltoast> well, the print patch died while running the full testsuite in serial mode, without any info left behind in dmesg
derek0883 has quit [Remote host closed the connection]
<kerneltoast> :)))
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
<fche> i am vomit
<kerneltoast> nothing in dmesg even with a debug kernel
derek0883 has quit [Remote host closed the connection]
<kerneltoast> fche, gimme some help
<kerneltoast> these are the last 3 messages in dmesg:
<kerneltoast> [ 1536.384112] stap_bc686be7cc530ab83427ab6d1f6fac72_25819 (pr14546.stp): systemtap: 4.4/0.177, base: ffffffffc083b000, memory: 2931data/76text/103ctx/2150net/135alloc kb, probes: 113
<kerneltoast> [ 1543.221295] stap_49ae8bf507aba76016fa80e5c8096ae_26689 (<input>): systemtap: 4.4/0.177, base: ffffffffc05b7000, memory: 223data/28text/12ctx/2150net/134alloc kb, probes: 1
<kerneltoast> [ 1541.385432] stap_93d8a595f433dc44db80412e903958f_26367 (<input>): systemtap: 4.4/0.177, base: ffffffffc0574000, memory: 228data/32text/12ctx/2150net/134alloc kb, probes: 1
<kerneltoast> this was done in serial mode
<kerneltoast> which test should i look at?
tromey has quit [Quit: ERC (IRC client for Emacs 27.1.50)]
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 264 seconds]
<fche> pr14546.stp + 2 ?
<kerneltoast> yeah but i have no idea what that is
<kerneltoast> idk how to find the order of the tests
<fche> .exp files are executed alphabetically
<fche> so find whichever test ran pr14546.stp
<fche> and then check that one or the next one
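[editor's note] A small sketch of the lookup fche describes: sort the `.exp` names and take the entry after the one that ran last. It assumes "alphabetically" means plain `strcmp` order, which may not match the harness exactly, and the helper name `next_exp` and any file names passed to it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Sorts `names` in place and returns the entry that follows `current`,
 * or NULL if `current` is last or not present.
 */
static const char *next_exp(const char **names, size_t n, const char *current)
{
    qsort(names, n, sizeof *names, cmp_names);
    for (size_t i = 0; i + 1 < n; i++)
        if (strcmp(names[i], current) == 0)
            return names[i + 1];
    return NULL;
}
```

Given the list of `.exp` files in the testsuite directory and the one that ran `pr14546.stp`, the return value is the next candidate to inspect.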
<kerneltoast> and how do i run a specific test? the readme says that TESTS= is only for parallel mode
<fche> RUNTESTFLAGS=foobar.exp
derek0883 has joined #systemtap
<kerneltoast> ah shoot
<kerneltoast> fche, tasklet_schedule calls wakeup_softirqd
<kerneltoast> which leads to the same deadlock as calling schedule_work
khaled_ has quit [Quit: Konversation terminated!]
khaled has joined #systemtap
<fche> pity
_whitelogger has joined #systemtap
<kerneltoast> fche, the option to poll remains
<kerneltoast> needing to poll to make print statements work is sad though
<fche> this is only for cases where the buffers are about to overflow, right?
<kerneltoast> we can't code it like that
<kerneltoast> this will have to be for cases where irqs are disabled
<kerneltoast> but there will always be a worker polling for print flush requests
<fche> yes, understood, but that worker would only have to do work if the buffers were about to overflow from an unfriendly context
<fche> AIUI
<kerneltoast> not about to overflow, just if there is a flush request
<kerneltoast> checking for an "about to overflow" condition could cause an actual overflow depending on how prints are used
<kerneltoast> we have to keep up the regular print flush maintenance
<fche> aha
orivej has quit [Ping timeout: 272 seconds]
orivej_ has joined #systemtap
<kerneltoast> the polling would be done with a worker
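[editor's note] The polling idea discussed above can be sketched as a flag that the printing side sets and a worker that periodically checks it. This is a userspace illustration only: `request_flush`, `poll_once`, and `flush_count` are hypothetical names, and a real worker would flush the print buffers where the counter is incremented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool flush_requested;
static int flush_count; /* counts flushes performed, for illustration only */

/*
 * Callable from any context in this sketch: it only sets a flag and never
 * wakes another thread, so it cannot hit the wakeup path discussed above.
 */
static void request_flush(void)
{
    atomic_store(&flush_requested, true);
}

/*
 * One polling iteration; a worker would call this periodically.
 * Returns true if a pending request was found and handled.
 */
static bool poll_once(void)
{
    if (!atomic_exchange(&flush_requested, false))
        return false; /* nothing requested since the last poll */
    flush_count++;    /* a real worker would flush the buffers here */
    return true;
}
```

The worker acts on any flush request rather than on an "about to overflow" estimate, matching kerneltoast's point that regular flush maintenance has to continue. Several requests between two polls collapse into a single flush.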
<kerneltoast> we may need to watch the poll worker if a user runs stap in Pennsylvania, or we could get sued
<fche> ouch
<kerneltoast> too soon?
amerey has quit [Quit: Leaving]
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap