fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
orivej has quit [Ping timeout: 258 seconds]
khaled has quit [Remote host closed the connection]
<agentzh>
the last time fche replied to my patch, it was 5am in my morning.
<kerneltoast>
i guess we'll wake up to a bunch of angry messages from fche
<agentzh>
i hope not.
derek0883 has quit [Remote host closed the connection]
irker825 has quit [Quit: transmission timeout]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
khaled has joined #systemtap
tonyj has quit [Remote host closed the connection]
hpt has quit [Ping timeout: 256 seconds]
orivej has joined #systemtap
_whitelogger has joined #systemtap
<fche>
hi guys
<fche>
I see why you're thinking in this area, but we have two related mechanisms already, and I'm concerned they're not working:
<fche>
the -DINTERRUPTIBLE conditional in the probe prologues/epilogues which normally wraps a similar irq_save() gadget around the bulk of the code
<fche>
and
<fche>
the _stp_runtime_entryfn_get_context() gadget which is our primary reentrancy-prevention mechanism
<fche>
the latter aims to prevent just this sort of thing, and should have rejected giving the nested probe a context* at all, without which it would never have tried the nested stp_probe_lock()
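[editor's note] To make the reentrancy argument above concrete, here is a minimal user-space sketch of a per-CPU "busy flag" guard. It is not SystemTap's actual code: the names sketch_get_context, sketch_put_context, sketch_probe_handler, struct sketch_context and SKETCH_NCPU are all hypothetical, and the real _stp_runtime_entryfn_get_context() may work differently. The only point is that a nested probe which is refused a context returns early and so never reaches the locking step.

```c
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_NCPU 4            /* hypothetical CPU count for this sketch */

struct sketch_context {
    atomic_int busy;             /* nonzero while a handler owns this context */
    int skipped;                 /* nested probe hits that were refused */
};

/* One context per (simulated) CPU; static storage starts zeroed. */
static struct sketch_context contexts[SKETCH_NCPU];

/* Hand out the CPU's context, or NULL if a handler already holds it. */
static struct sketch_context *sketch_get_context(int cpu)
{
    struct sketch_context *c = &contexts[cpu];
    int expected = 0;

    if (!atomic_compare_exchange_strong(&c->busy, &expected, 1)) {
        c->skipped++;            /* reentrant hit: refuse the context */
        return NULL;
    }
    return c;
}

/* Release the context so the next probe hit on this CPU can run. */
static void sketch_put_context(struct sketch_context *c)
{
    atomic_store(&c->busy, 0);
}

/*
 * Simulated probe handler. A depth > 0 models another probe firing on
 * the same CPU while this handler is still running. Returns the number
 * of handler bodies that actually ran.
 */
static int sketch_probe_handler(int cpu, int depth)
{
    struct sketch_context *c = sketch_get_context(cpu);
    int ran;

    if (c == NULL)
        return 0;                /* no context: never reach any locking */

    ran = 1;                     /* handler body (locks, script code) here */
    if (depth > 0)
        ran += sketch_probe_handler(cpu, depth - 1);

    sketch_put_context(c);
    return ran;
}
```

In this sketch, calling sketch_probe_handler(0, 3) runs only the outermost handler body; the first nested hit is refused and counted in contexts[0].skipped. A handler that skipped the get-context step would have no such protection, which is the failure mode the discussion is trying to rule out.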
mjw has joined #systemtap
orivej has quit [Ping timeout: 260 seconds]
orivej has joined #systemtap
tromey has joined #systemtap
pviktori has quit [Ping timeout: 260 seconds]
orivej has quit [Ping timeout: 264 seconds]
pviktori has joined #systemtap
pviktori has quit [Ping timeout: 264 seconds]
pviktori has joined #systemtap
amerey has joined #systemtap
sscox has quit [Ping timeout: 264 seconds]
pviktori has quit [Ping timeout: 256 seconds]
orivej has joined #systemtap
<fche>
agentzh, morning morning
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
<fche>
it should be easy
<fche>
but you have to see the previous few lines to see how sdt.c.exe.0 was built
derek0883 has joined #systemtap
<kerneltoast>
ok i got it
<agentzh>
fche: hey, i'm late today. customer meetings in the morning.
<agentzh>
glad kerneltoast was already talking to you :)
<agentzh>
so the current theory is that some special probe handlers lack _stp_runtime_entryfn_get_context() calls?
<agentzh>
or the _stp_runtime_entryfn_get_context() call itself is buggy?
<agentzh>
and the next step is to scan all the C code for the .stp files in the stap test suite? it sounds like a daunting task given the complexity of the generated C code.
<agentzh>
and maybe one stp script uses 3 probes but only 1 lacks _stp_runtime_entryfn_get_context().
<agentzh>
it may be easier if we can know which .stp file is at fault when the soft lockup happens.
orivej has quit [Ping timeout: 240 seconds]
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
<fche>
that get_context should be called by every possible code path
<fche>
the soft lockup ... the dmesg gives a hint of the stap source file name
<irker782>
systemtap: alizhang systemtap.git:master * release-4.3-124-gc80f1453e / testsuite/systemtap.examples/general/floatingpoint.meta testsuite/systemtap.examples/general/floatingpoint.stp testsuite/systemtap.examples/general/floatingpoint.txt: PR13838: add floating point to systemtap.examples