fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
mjw has quit [Quit: Leaving]
khaled_ has quit [Quit: Konversation terminated!]
derek0883 has quit [Remote host closed the connection]
hpt has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 264 seconds]
hpt has quit [Ping timeout: 246 seconds]
hpt has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
irker464 has quit [Quit: transmission timeout]
<agentzh> fche: how about emitting a warning when !_stp_target in sultan's patch?
<agentzh> oh i mean kerneltoast :)
orivej has joined #systemtap
sscox has quit [Ping timeout: 240 seconds]
orivej has quit [Ping timeout: 240 seconds]
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
hpt has quit [Ping timeout: 240 seconds]
hpt has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 264 seconds]
tonyj has quit [Remote host closed the connection]
orivej has joined #systemtap
khaled_ has joined #systemtap
lijunlong has quit [Ping timeout: 264 seconds]
lijunlong has joined #systemtap
<fche> agentzh, if there is an error message already, why an extra warning?
<agentzh> fche: the current error is only for _stp_target != 0.
<agentzh> or should we just make it an error regardless of _stp_target?
<fche> _stp_target != 0 makes sense to treat more severely
<agentzh> agreed.
<agentzh> should a warning be issued when _stp_target == 0?
<fche> there isn't one already, at some level?
<agentzh> you mean debugging logs?
* agentzh starts to wonder what time zone fche is in.
<fche> well not sure, without an added warning you're talking about, is it a silent failure?
<fche> couldn't sleep tonight :)
<agentzh> yes, it's a silent failure atm.
<fche> ah ok, yeah sure, warn
* agentzh and kerneltoast have prepared several important patches tonight.
<fche> we already warn on failed registrations in other cases
<agentzh> right
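The severity rule agreed on above (hard error when a specific target was requested, warning instead of a silent failure otherwise) can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `report_attach_failure`, `enum attach_outcome`, and the message texts are hypothetical names invented here, and `target_pid` merely stands in for `_stp_target`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical outcome codes; not taken from the systemtap runtime. */
enum attach_outcome { ATTACH_WARNED = 0, ATTACH_FATAL = -1 };

/*
 * Report a failed attach to process failed_pid.
 * target_pid models _stp_target: nonzero means the user asked for one
 * specific target, so failing to attach is treated as an error; zero
 * means no specific target, so the failure is only warned about
 * rather than passing silently.
 */
enum attach_outcome report_attach_failure(long target_pid, long failed_pid,
                                          FILE *log)
{
    assert(log != NULL);

    if (target_pid != 0) {
        fprintf(log, "ERROR: cannot attach to target pid %ld\n", target_pid);
        return ATTACH_FATAL;
    }

    fprintf(log, "WARNING: cannot attach to pid %ld, continuing\n", failed_pid);
    return ATTACH_WARNED;
}
```

A caller would abort the session on `ATTACH_FATAL` and keep going on `ATTACH_WARNED`, e.g. `report_attach_failure(0, 42, stderr)` only logs a warning.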
<agentzh> we finally got our fuzzer to run without hangs or panics.
<agentzh> it's a big night tonight
<fche> merry christmas etc.
<agentzh> with several patches from kerneltoast
<agentzh> he's our hero :)
<agentzh> and his earlier RCU patch seems to also fix a panic by accident...
<agentzh> his earlier RCU lock patch for utrace task lock seems to make this panic go away: https://gist.github.com/agentzh/fc9c1adfd72a07eca0456a8e99ec6d96
<agentzh> at least we can no longer reproduce it after the RCU patch is applied...
<agentzh> this spinlock is gone now.
<fche> nice
<agentzh> yeah, merry christmas :)
<agentzh> kerneltoast had a very tough time since september.
<agentzh> he's been fighting with all the stapio hangs, kernel panics, and cpu soft lockups caught by our kernel fuzzer...
<agentzh> now the fuzzer is finally passing.
<fche> geez it would have been less trouble to just stop the fuzzer
<agentzh> good lord...
<fche> come on, next time come to me for advice early :)
<agentzh> lol
<agentzh> i already started to worry about his mood and health...
<agentzh> task finder is really a pain in the ass...
<agentzh> with no offense...
<agentzh> and all due respect...
<agentzh> :)
<fche> <newt> well, it's getting better </newt>
<agentzh> yep
<agentzh> fche: this patch is now good to push? https://gist.github.com/agentzh/93f1624ecb4f2e178b4db776fc4e0dcc
<agentzh> i just added a warning :)
<agentzh> based on kerneltoast's earlier patch.
<fche> ok
<agentzh> cool, thanks
<agentzh> fche: what do you think of this patch? https://gist.github.com/agentzh/3ebde2612b260eb40afd4992954ecd4d (please ignore the patch commit log, kerneltoast will prepare something serious there).
<agentzh> this fixes the cpu soft lockup bug like this one: https://gist.github.com/agentzh/68d4ef9574f69595c5d19da3688b8981
irker927 has joined #systemtap
<irker927> systemtap: sultan systemtap.git:master * release-4.3-108-gfeb0327b6 / runtime/linux/task_finder.c runtime/linux/task_finder2.c: task_finder: error out when we cannot attach to _stp_target
<agentzh> this bug is in stap -DDEBUG_MEM
* agentzh wonders if fche goes back to bed.
<fche> someday
<agentzh> heh
<agentzh> i think this patch is safe to push?
<agentzh> it just saves the irq state.
<fche> ah yes, good stuff.
<agentzh> cool, thanks
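The patch itself is not quoted in the log, so the following is only a userspace model of why kernel locking code commonly saves and restores the interrupt state (in the style of the kernel's `spin_lock_irqsave()` / `spin_unlock_irqrestore()`) instead of disabling and re-enabling interrupts unconditionally. Every name below (`model_lock`, `irqs_enabled`, `outer_section_stays_protected`, and so on) is hypothetical and exists only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated per-CPU "interrupts enabled" flag (a model, not kernel code). */
static bool irqs_enabled = true;

typedef struct { bool locked; } model_lock;

/* Unconditional variant: always turns interrupts off on lock... */
void model_lock_irq(model_lock *l)
{
    assert(!l->locked);
    irqs_enabled = false;
    l->locked = true;
}

/* ...and always turns them back on at unlock, whatever the caller had. */
void model_unlock_irq(model_lock *l)
{
    l->locked = false;
    irqs_enabled = true;
}

/* Saving variant: remember the previous state and hand it to the caller. */
bool model_lock_irqsave(model_lock *l)
{
    bool flags = irqs_enabled;
    assert(!l->locked);
    irqs_enabled = false;
    l->locked = true;
    return flags;
}

/* Restore exactly the state that was saved at lock time. */
void model_unlock_irqrestore(model_lock *l, bool flags)
{
    l->locked = false;
    irqs_enabled = flags;
}

/*
 * Model a caller that already runs with interrupts off and then takes
 * the lock. Returns true if interrupts are still off after unlocking.
 */
bool outer_section_stays_protected(bool use_irqsave)
{
    model_lock l = { false };
    bool still_off;

    irqs_enabled = false;            /* caller's own critical section */
    if (use_irqsave) {
        bool flags = model_lock_irqsave(&l);
        model_unlock_irqrestore(&l, flags);
    } else {
        model_lock_irq(&l);
        model_unlock_irq(&l);
    }
    still_off = !irqs_enabled;
    irqs_enabled = true;             /* reset the model for the next call */
    return still_off;
}
```

In this model only the saving variant leaves the caller's interrupts-off section intact; the unconditional variant re-enables interrupts behind the caller's back, which is the kind of window in which an interrupt handler could try to take a lock that is already held.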
<agentzh> fche: okay, the last patch for tonight: https://gist.github.com/agentzh/cb093f410ef2740efa10d65656806e3c :)
<agentzh> this fixed the Dl hang in stapio.
<fche> aha, this looks like a tiny fix compared to the utrace-stop-state algebra thing we were talking about a month ago
<agentzh> yeah, we are essentially abandoning the UTRACE_STOP state here.
<agentzh> at least not for the initial state.
<agentzh> it's the source of pain once we get into the rabbit hole of UTRACE_STOP.
<fche> so it makes the intermittent D hang go away, with no other effects?
<agentzh> no other effects.
<agentzh> kerneltoast tried more complex approaches like handling reporting and initial request. but it failed.
<agentzh> this simple approach works.
<fche> I <3 simple
<fche> ok
<fche> can you give it a better commit msg please
<agentzh> sure
<fche> or else I'll have to stay awake TOMORROW NIGHT too
<agentzh> that one was just a placeholder.
<agentzh> i'll leave it to kerneltoast since he usually writes very long commit log messages :)
<agentzh> i'm just the guy throwing out patches at you ;)
<agentzh> we can finally sleep well after tonight :)
<agentzh> and we will make the fuzzer more evil tomorrow.
<agentzh> in the hope of finding deeper issues.
<agentzh> and will also regression-test a lot more kernels too.
<agentzh> deeper issues in both the kernel and the stap runtime.
<agentzh> The stapio Dl hang issue could only be reproduced on the crazy 32c/64t amd threadripper CPU...
<agentzh> it was never reproduced on the 8c/16t intel core i9 cpu for example.
<agentzh> maybe we'll need even bigger machines for the fuzzer in the future :)
<agentzh> kerneltoast will also be able to have a closer look at the kernel freeze issue when running the stap official test suite on a lockdep/debug kernel.
<irker927> systemtap: sultan systemtap.git:master * release-4.3-109-g55de61efb / runtime/linux/alloc.c: Bug: deadlocks might happen in the spinlocks when -DDEBUG_MEM is specified
hpt has quit [Quit: Lost terminal]
lkthomas has quit [Remote host closed the connection]
lkthomas has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 264 seconds]
mjw has joined #systemtap
orivej has quit [Ping timeout: 260 seconds]
orivej_ has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 264 seconds]
tromey has joined #systemtap
amerey has joined #systemtap
tonyj has joined #systemtap
tonyj has quit [Client Quit]
tonyj has joined #systemtap
irker927 has quit [Quit: transmission timeout]
sscox has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 264 seconds]
derek0883 has joined #systemtap
amerey has quit [Quit: Leaving]
amerey has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
<kerneltoast> fche, bonjour
<fche> yessir
<fche> hello again
<kerneltoast> does anything use task_finder.c?
<kerneltoast> should i duplicate changes from task_finder2 to task_finder?
<fche> rhel6 era code
<kerneltoast> i guess task_finder should only get bugfixes then, and no fancy perf improvements?
<fche> sure
<fche> https://www.centos.org/download/#tab-3 <<< btw to get a centos6 rhel6-ish gadget
<kerneltoast> it'll be abandonware in december anyway right ;)
<fche> oh I doubt it :)
<kerneltoast> redhat gives out extended support contracts?
<fche> I would not be surprised.
<fche> I'm not too familiar.
<fche> kerneltoast, I think that makes sense, this is a pure optimization right?
<kerneltoast> yes, just an optimization
<kerneltoast> i noticed it during the Sl+ hang bug hunt
<fche> ok
<kerneltoast> fche, how does this commit message look? https://gist.github.com/kerneltoast/f143ef88da408449bbdfbd13b4f50c01
<fche> looks good
<kerneltoast> time to shippit
<fche> thanks
<agentzh> kerneltoast: yeah, i can confirm in my centos 6 vm, task_finder.c is used.
<agentzh> so it makes sense to run tests there when we make changes to that file.
<fche> yup
<kerneltoast> agentzh, think we should give task_finder.c the Dl fix? the old task finder has a lot of UTRACE_STOP code that task_finder2 is missing, so it might actually work in the old code
<agentzh> kerneltoast: yeah, that might be different.
<agentzh> we never ran our fuzzer on centos 6 though.
<kerneltoast> i don't want to think about how many 2.6.32 bugs we'd hit running the fuzzer on centos 6...
<fche> well, I wouldn't consider it a high priority platform
<fche> rhel7 more interesting actually
<fche> that too would use the new code
<fche> but the kernel is more.... baroque ?
<kerneltoast> yeah we've been primarily using centos 7 to fuzz
<fche> aha
<fche> there's also centos8
<kerneltoast> with its franken 3.10 kernel
<fche> i resemble that remark
irker043 has joined #systemtap
<irker043> systemtap: yichun systemtap.git:master * release-4.3-110-g5f74d0db4 / runtime/linux/task_finder2.c: task_finder2: don't attach to forked children when the target PID is specified
<irker043> systemtap: sultan systemtap.git:master * release-4.3-111-g72f1927c9 / runtime/linux/task_finder2.c: task_finder2: change the default engine action to UTRACE_INTERRUPT
derek0883 has quit [Remote host closed the connection]
<irker043> systemtap: sultan systemtap.git:azhang/pr13838 * release-4.3-109-g55de61efb / runtime/linux/alloc.c: Bug: deadlocks might happen in the spinlocks when -DDEBUG_MEM is specified
<irker043> systemtap: yichun systemtap.git:azhang/pr13838 * release-4.3-110-g5f74d0db4 / runtime/linux/task_finder2.c: task_finder2: don't attach to forked children when the target PID is specified
<irker043> systemtap: sultan systemtap.git:azhang/pr13838 * release-4.3-111-g72f1927c9 / runtime/linux/task_finder2.c: task_finder2: change the default engine action to UTRACE_INTERRUPT
<irker043> systemtap: alizhang systemtap.git:azhang/pr13838 * release-4.3-112-g4cf128741 / : PR13838: Added basic floating point support to systemtap
<irker043> systemtap: alizhang systemtap.git:azhang/pr13838 * release-4.3-168-g38f1d2d26 / : fixed conflicts
derek0883 has joined #systemtap
derek088_ has joined #systemtap
derek0883 has quit [Ping timeout: 240 seconds]
derek088_ has quit [Remote host closed the connection]
derek0883 has joined #systemtap
tromey has quit [Quit: ERC (IRC client for Emacs 27.1.50)]
<irker043> systemtap: alizhang systemtap.git:azhang/pr13838 * release-4.3-112-g07ad63ee6 / : PR13838: Added basic floating point support to systemtap
<irker043> systemtap: alizhang systemtap.git:azhang/pr13838 * release-4.3-170-gd7238b874 / : PR13838: Fix formatting & white space
orivej_ has quit [Ping timeout: 268 seconds]
<irker043> systemtap: alizhang systemtap.git:azhang/pr13838 * release-4.3-112-ge16dd56e8 / : author Alice Zhang <alizhang@redhat.com> committer Alice Zhang <alizhang@redhat.com>
<irker043> systemtap: alizhang systemtap.git:azhang/pr13838 * release-4.3-172-g2b71618db / : PR13838: fix white space in runtime/softfloat.c
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 264 seconds]
derek0883 has joined #systemtap
<irker043> systemtap: amerey systemtap.git:PR26015 * release-4.3-53-g1980ba3e9 / man/stapprobes.3stap tapset/linux/sysc_kcmp.stp tapset/linux/sysc_mknod.stp tapset/linux/sysc_mq_unlink.stp tapset/linux/sysc_preadv.stp tapset/linux/sysc_setresuid.stp tapset/linux/sysc_sigaction.stp: Mention unquoted string args in man page, fix typos
<irker043> systemtap: amerey systemtap.git:PR26015 * release-4.3-54-g59adcad36 / : Rename _IS_SREG_ARCH to _IS_SREG_KERNEL
<irker043> systemtap: amerey systemtap.git:PR26015 * release-4.3-55-g78fa3634d / testsuite/systemtap.base/sysarg_write.c testsuite/systemtap.base/sysarg_write.exp testsuite/systemtap.base/sysarg_write.stp testsuite/systemtap.syscall/arg_write.exp: Delete old testcase, add check for proper arch/kernel in arg_write.exp
<irker043> systemtap: fche systemtap.git:azhang/pr13838 * release-4.3-128-g3e9b25a03 / runtime/softfloat.c tapset/floatingpoint.stp testsuite/buildok/floatingpoint.stp: string_to_fp parse error handling
<irker043> systemtap: alizhang systemtap.git:azhang/pr13838 * release-4.3-129-g46cd19895 / testsuite/buildok/floatingpoint.stp: use try catch to handle NaN error
<irker043> systemtap: alizhang systemtap.git:azhang/pr13838 * release-4.3-130-ge924b3277 / main.cxx testsuite/buildok/floatingpoint.stp: added extracting floating point demo
<irker043> systemtap: alizhang systemtap.git:azhang/pr13838 * release-4.3-131-gc80b600ce / NEWS: update news for fp
<irker043> systemtap: alizhang systemtap.git:azhang/pr13838 * release-4.3-197-g1796d16cf / : fix white spaces
<agentzh> fche: we're now running the fuzzer for much longer and can still reproduce this kernel panic: https://gist.github.com/agentzh/578ba18edc6c8f5a20606062c126af05
<agentzh> we saw very similar panics before so it should not be due to our recent patches.
<agentzh> if you recognize it or can shed some light on this, it'll be great.
<agentzh> kerneltoast thinks it may be that unloaded modules fail to unregister the utrace/tracepoint hooks.
<agentzh> and then the kernel follows invalid pointers to nonexecutable memory regions.
<agentzh> the backtraces and error messages are consistent across our current panic samples from the fuzzer.
<agentzh> very consistent.
<agentzh> and we need much larger load in the fuzzer to reproduce it relatively reliably.
<agentzh> it's still very rare though.
<irker043> systemtap: amerey systemtap.git:PR26015 * release-4.3-56-g96a606e17 / NEWS man/stap.1.in: Update NEWS, stap.1
<irker043> systemtap: yichun systemtap.git:master * release-4.3-112-gbc86fc8fe / NEWS: NEWS: mentioned the utrace task hash table optimization
<irker043> systemtap: amerey systemtap.git:master * release-4.3-113-ge3529b3e6 / elaborate.cxx parse.cxx staptree.cxx staptree.h: Allow individual probes to have both a prologue and epilogue.
<irker043> systemtap: sapatel systemtap.git:master * release-4.3-114-g8a7e5b0a3 / NEWS elaborate.cxx elaborate.h man/stap.1.in man/stapprobes.3stap man/stapref.1 parse.cxx staptree.cxx staptree.h testsuite/systemtap.base/probewrite.exp testsuite/systemtap.base/probewrite_1.stp testsuite/systemtap.base/probewrite_2.stp testsuite/systemtap.base/probewrite_3.c testsuite/systemtap.base/probewrite_3.stp translate.cxx: PR26015: Add @probewrite predi
<irker043> systemtap: amerey systemtap.git:master * release-4.3-115-g90f9123bb / : PR26015: Make syscall arguments writable again