00:08
<
kerneltoast >
✓ r/w lock comment?
00:42
khaled has quit [Quit: Konversation terminated!]
00:46
derek0883 has quit [Remote host closed the connection]
00:59
derek0883 has joined #systemtap
01:01
<
fche >
is there no race possibility between the time of read-unlock and then print-flush
01:01
<
fche >
AND the write-unlock and shutdown ?
01:08
hpt has joined #systemtap
01:14
mjw has quit [Quit: Leaving]
01:29
derek0883 has quit [Remote host closed the connection]
01:34
derek0883 has joined #systemtap
01:44
derek0883 has quit [Remote host closed the connection]
01:45
derek0883 has joined #systemtap
01:51
modem has quit [Ping timeout: 240 seconds]
01:53
modem has joined #systemtap
01:59
<
kerneltoast >
you're looking at the part where i call the print flush outside the lock?
01:59
<
kerneltoast >
_stp_print_flush() does all those checks again
01:59
<
kerneltoast >
so no race potential there
02:01
<
kerneltoast >
no race possibility after the write-unlock because _stp_print_stop is updated to 1 before the lock and then the lock executes a full memory barrier
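[editor's note] The argument in the last few messages can be modeled in user space. This is a toy Python sketch, not the SystemTap runtime code: `PrintChannel`, `write`, `print_flush`, and `shutdown` are hypothetical names, a plain `threading.Lock` stands in for the r/w lock, and acquiring that lock is assumed to give the same ordering guarantee that the kernel lock's full memory barrier gives. Only the idea of a `_stp_print_stop`-style flag that is set before the lock is taken comes from the discussion.

```python
import threading


class PrintChannel:
    """Toy model: a stop flag that is re-checked under a lock before flushing."""

    def __init__(self):
        self._lock = threading.Lock()   # stand-in for the r/w lock (assumption)
        self._stop = False              # models a _stp_print_stop-style flag
        self._buffer = []
        self.flushed = []               # stand-in for the real transport

    def write(self, msg):
        with self._lock:
            if self._stop:
                return False
            self._buffer.append(msg)
            return True

    def print_flush(self):
        """Callable without holding the lock: it repeats the stop check itself."""
        with self._lock:
            if self._stop:
                return False
            self.flushed.extend(self._buffer)
            self._buffer.clear()
            return True

    def shutdown(self):
        # Set the flag first, then take the lock. Once the lock is acquired,
        # every flush that started earlier has finished, and every later one
        # will observe the flag and back off.
        self._stop = True
        with self._lock:
            self._buffer.clear()
```

In this model a flush that races with shutdown either completes before the teardown or returns `False` without touching the buffer, which is the property being claimed above.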
02:02
<
kerneltoast >
🐄 my alma mater is known for smelling like cow poop 🐄
02:16
<
agentzh >
so we're finally seeing the light at the end of the tunnel?
02:17
<
agentzh >
iirc, it's not related to the probe lock deadlock reproduced by stap's test suite in parallel?
02:27
<
kerneltoast >
yeah it's not related to the probe lock deadlock
02:28
<
kerneltoast >
it was responsible for one panic that led to the probe lock deadlock though
02:28
<
kerneltoast >
that was interesting
02:28
<
agentzh >
yeah i also saw that one.
02:29
<
kerneltoast >
oh, no all the backtraces are different
02:29
<
kerneltoast >
it is unrelated to the probe lock stuff
02:30
<
kerneltoast >
i just got lucky
02:30
<
kerneltoast >
and hit a new panic
02:30
<
kerneltoast >
which led us here :)
02:30
<
kerneltoast >
agentzh, fche, just tested the print patch on centos6 and all is well with the lean testsuite
02:31
<
kerneltoast >
time to run the serial testsuite on centos7
02:31
<
agentzh >
sad we have to stick with the serial mode atm.
02:31
<
kerneltoast >
yeah...
02:31
<
agentzh >
eyes on your next probe lock patch :)
02:32
<
kerneltoast >
parallel is so explosive
02:32
<
agentzh >
to end this suffering.
02:32
<
kerneltoast >
i'm sure there are more bugs in parallel mode
02:32
<
agentzh >
there could be :)
02:32
<
agentzh >
it's a good stress test it seems.
02:33
<
kerneltoast >
yeah, and none of our tests use prints inside a timer iirc
02:33
<
kerneltoast >
maybe we should add that
02:34
<
agentzh >
so far it was all bugs in the stap runtime, fortunately.
02:34
<
agentzh >
kerneltoast: yeah we should add some
02:35
<
agentzh >
patches welcome :)
02:36
derek0883 has quit [Remote host closed the connection]
02:46
derek0883 has joined #systemtap
03:01
irker157 has quit [Quit: transmission timeout]
04:28
derek0883 has quit [Remote host closed the connection]
04:37
derek0883 has joined #systemtap
04:49
derek0883 has quit [Remote host closed the connection]
04:58
derek0883 has joined #systemtap
06:12
orivej has quit [Ping timeout: 272 seconds]
06:23
derek0883 has quit [Remote host closed the connection]
06:25
derek0883 has joined #systemtap
06:27
derek0883 has quit [Remote host closed the connection]
06:28
derek0883 has joined #systemtap
06:28
derek0883 has quit [Remote host closed the connection]
07:12
lijunlong has quit [Ping timeout: 256 seconds]
07:13
lijunlong has joined #systemtap
07:45
orivej has joined #systemtap
07:52
orivej has quit [Ping timeout: 240 seconds]
08:02
khaled has joined #systemtap
09:55
_whitelogger has joined #systemtap
10:20
beauty1 has quit [Ping timeout: 244 seconds]
10:42
mjw has joined #systemtap
11:13
beauty1 has joined #systemtap
11:14
hpt has quit [Ping timeout: 256 seconds]
12:47
derek0883 has joined #systemtap
12:55
orivej has joined #systemtap
13:50
derek0883 has quit [Ping timeout: 260 seconds]
14:09
tromey has joined #systemtap
14:14
tonyj has quit [Ping timeout: 272 seconds]
14:16
orivej has quit [Ping timeout: 246 seconds]
14:57
amerey has joined #systemtap
15:14
khaled has quit [Quit: Konversation terminated!]
15:15
khaled has joined #systemtap
16:42
orivej has joined #systemtap
16:48
amerey has quit [Remote host closed the connection]
16:48
amerey has joined #systemtap
16:57
amerey has quit [Quit: Leaving]
16:59
amerey has joined #systemtap
18:03
derek0883 has joined #systemtap
18:32
orivej has quit [Ping timeout: 265 seconds]
18:42
tonyj has joined #systemtap
19:05
orivej has joined #systemtap
19:06
khaled_ has joined #systemtap
19:06
khaled has quit [Ping timeout: 264 seconds]
19:53
<
kerneltoast >
well, the print patch died while running the full testsuite in serial mode, without any info left behind in dmesg
19:53
derek0883 has quit [Remote host closed the connection]
19:54
derek0883 has joined #systemtap
19:57
derek0883 has quit [Remote host closed the connection]
19:57
derek0883 has joined #systemtap
20:34
<
kerneltoast >
nothing in dmesg even with a debug kernel
20:36
derek0883 has quit [Remote host closed the connection]
20:48
<
kerneltoast >
fche, gimme some help
20:48
<
kerneltoast >
these are the last 3 messages in dmesg:
20:48
<
kerneltoast >
[ 1536.384112] stap_bc686be7cc530ab83427ab6d1f6fac72_25819 (pr14546.stp): systemtap: 4.4/0.177, base: ffffffffc083b000, memory: 2931data/76text/103ctx/2150net/135alloc kb, probes: 113
20:48
<
kerneltoast >
[ 1543.221295] stap_49ae8bf507aba76016fa80e5c8096ae_26689 (<input>): systemtap: 4.4/0.177, base: ffffffffc05b7000, memory: 223data/28text/12ctx/2150net/134alloc kb, probes: 1
20:48
<
kerneltoast >
[ 1541.385432] stap_93d8a595f433dc44db80412e903958f_26367 (<input>): systemtap: 4.4/0.177, base: ffffffffc0574000, memory: 228data/32text/12ctx/2150net/134alloc kb, probes: 1
20:48
<
kerneltoast >
this was done in serial mode
20:48
<
kerneltoast >
which test should i look at?
21:02
tromey has quit [Quit: ERC (IRC client for Emacs 27.1.50)]
21:18
derek0883 has joined #systemtap
21:25
derek0883 has quit [Ping timeout: 264 seconds]
21:32
<
fche >
pr14546.stp + 2 ?
21:35
<
kerneltoast >
yeah but i have no idea what that is
21:35
<
kerneltoast >
idk how to find the order of the tests
21:35
<
fche >
.exp files are executed alphabetically
21:35
<
fche >
so find whichever test ran pr14546.stp
21:36
<
fche >
and then check that one or the next one
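[editor's note] The lookup described here can be scripted. This is a rough Python sketch under stated assumptions: `find_test_and_next` is a hypothetical helper, all `.exp` files are assumed to sit in a single directory, "alphabetically" is approximated by Python's default string sort on the file name, and a test is assumed to mention the script by its stem (for example `pr14546`) somewhere in its text.

```python
from pathlib import Path


def find_test_and_next(exp_dir, script_stem):
    """Return (the .exp file mentioning script_stem, the .exp file sorted after it)."""
    exp_files = sorted(Path(exp_dir).glob("*.exp"), key=lambda p: p.name)
    for i, exp in enumerate(exp_files):
        # Ignore undecodable bytes; only a substring match is needed.
        text = exp.read_text(errors="ignore")
        if script_stem in text:
            nxt = exp_files[i + 1] if i + 1 < len(exp_files) else None
            return exp, nxt
    return None, None
```

Calling `find_test_and_next(some_dir, "pr14546")` would give the two candidate tests to inspect, or `(None, None)` if no `.exp` file mentions the script.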
21:37
<
kerneltoast >
and how do i run a specific test? the readme says that TESTS= is only for parallel mode
21:37
<
fche >
RUNTESTFLAGS=foobar.exp
21:47
derek0883 has joined #systemtap
21:54
<
kerneltoast >
ah shoot
21:55
<
kerneltoast >
fche, tasklet_schedule calls wakeup_softirqd
21:55
<
kerneltoast >
which leads to the same deadlock as calling schedule_work
22:00
khaled_ has quit [Quit: Konversation terminated!]
22:02
khaled has joined #systemtap
22:29
_whitelogger has joined #systemtap
22:53
<
kerneltoast >
fche, the option to poll remains
22:54
<
kerneltoast >
needing to poll to make print statements work is sad though
22:55
<
fche >
this is only for cases where the buffers are about to overflow, right?
22:56
<
kerneltoast >
we can't code it like that
22:56
<
kerneltoast >
this will have to be for cases where irqs are disabled
22:57
<
kerneltoast >
but there will always be a worker polling for print flush requests
22:57
<
fche >
yes, understood, but that worker would only have to do work if the buffers were about to overflow from an unfriendly context
22:58
<
kerneltoast >
not about to overflow, just if there is a flush request
23:00
<
kerneltoast >
checking for an "about to overflow" condition could cause an actual overflow depending on how prints are used
23:00
<
kerneltoast >
we have to keep up the regular print flush maintenance
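[editor's note] The polling design being discussed can be sketched as a toy user-space model in Python. It is not the SystemTap implementation: `FlushPoller` and its method names are hypothetical, a `threading.Lock` stands in for whatever protects the real buffers, and the poll interval is arbitrary. It shows the point made above: the worker acts on a pending flush request, not on an "about to overflow" check.

```python
import threading


class FlushPoller:
    """Toy model: contexts that cannot flush only set a flag; a worker polls for it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._buffer = []
        self._flush_requested = False
        self.output = []                 # stand-in for the real transport

    def print_from_unfriendly_context(self, msg):
        # Models a probe running with IRQs disabled: it must not wake anything
        # up, so it only buffers the message and records that a flush is wanted.
        with self._lock:
            self._buffer.append(msg)
            self._flush_requested = True

    def poll_once(self):
        """One poll iteration: flush only if a request is pending."""
        with self._lock:
            if not self._flush_requested:
                return False
            self._flush_requested = False
            pending, self._buffer = self._buffer, []
        self.output.extend(pending)
        return True

    def run(self, stop_event, interval=0.01):
        # Worker loop: poll at a fixed interval until asked to stop.
        while not stop_event.wait(interval):
            self.poll_once()
        self.poll_once()                 # final drain on shutdown
```

The worker would be started with something like `threading.Thread(target=poller.run, args=(stop_event,))`. When no request is pending, each poll is a cheap flag check, but the worker still has to wake up on every interval, which is the cost lamented in the conversation.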
23:12
orivej has quit [Ping timeout: 272 seconds]
23:12
orivej_ has joined #systemtap
23:14
<
kerneltoast >
the polling would be done with a worker
23:14
<
kerneltoast >
we may need to watch the poll worker if a user runs stap in Pennsylvania, or we could get sued
23:15
<
kerneltoast >
too soon?
23:28
amerey has quit [Quit: Leaving]
23:39
derek0883 has quit [Remote host closed the connection]
23:41
derek0883 has joined #systemtap