fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
khaled has quit [Quit: Konversation terminated!]
hpt has joined #systemtap
irker615 has quit [Quit: transmission timeout]
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 258 seconds]
orivej has joined #systemtap
orivej has quit [Ping timeout: 272 seconds]
khaled has joined #systemtap
hpt has quit [Ping timeout: 240 seconds]
orivej has joined #systemtap
derek0883 has joined #systemtap
Ultrasauce has joined #systemtap
tromey has joined #systemtap
derek0883 has quit [Ping timeout: 264 seconds]
amerey has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
<agentzh> fche: we've noted that the delete_module() syscall can be interrupted often under load. retrying that syscall upon EINTR errno fixes the issue.
<agentzh> otherwise stap ko modules would be left in the system.
<agentzh> should we fix it this way?
<agentzh> this is in the staprun process btw.
<agentzh> the error is like this: ERROR: Couldn't remove module 'stap_b1bafbefd24d1b7c7b5ab1464e82c2f_40418': Interrupted system call.
<agentzh> the leftover ko modules are like this:
<agentzh> $ sudo lsmod|grep stap
<agentzh> stap_2eb3039808b647e990825b99fb1f9b6_18830 221184 0
<agentzh> stap_eee9b566240fbeb1194347a571ccd68_17219 208896 0
<fche> staprun could repeat a few times on EINTR
<agentzh> how about 5 times?
<fche> sold
<agentzh> cool
<agentzh> tested on my side.
<fche> pl
<fche> i mean
<fche> ok
<agentzh> seems like it needs to try at least 3 times to succeed with -j64
<agentzh> cool, will push.
<fche> could jam a little sleep at the bottom of the loop too perhaps
<agentzh> good idea
<agentzh> will add a 1ms sleep
derek0883 has quit [Remote host closed the connection]
<agentzh> 100us should be enough?
<fche> and nuke the /* XXX: maybe we should jsut accept ...>" comment, since the errors are interesting
<fche> whatever less than human-bothering total - I'd even say 100ms per try
tromey has quit [Quit: ERC (IRC client for Emacs 27.1)]
<agentzh> okay
derek0883 has joined #systemtap
irker521 has joined #systemtap
<irker521> systemtap: yichun systemtap.git:master * release-4.4-84-ga399ff28c / staprun/staprun.c: Bug: delete_module() syscall might get interrupted under load.
<agentzh> fche: committed the patch with a growing sleep interval, which is `100 * i`.
<agentzh> from 100us to 500us at most.
<agentzh> working well for my tests.
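A minimal C sketch of the retry idea discussed above. It assumes a limit of five attempts and a back-off of 100 * i microseconds, and it retries only on EINTR. The helper names (should_retry_unload, unload_delay_us, remove_module_with_retry), the MAX_UNLOAD_TRIES constant, and the O_NONBLOCK flag are hypothetical choices for illustration. This is not the code from the committed staprun/staprun.c patch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* O_NONBLOCK */
#include <sys/syscall.h>  /* SYS_delete_module */
#include <time.h>         /* nanosleep, struct timespec */
#include <unistd.h>       /* syscall */

/* Hypothetical limit: five attempts in total. */
#define MAX_UNLOAD_TRIES 5

/* Should a failed attempt (1-based) be retried? Only EINTR is treated as
   transient; EBUSY, ENOENT, etc. are reported immediately. */
int should_retry_unload(int err, int attempt)
{
    return err == EINTR && attempt < MAX_UNLOAD_TRIES;
}

/* Growing back-off: 100us after the first failure, 200us after the second, ... */
long unload_delay_us(int attempt)
{
    return 100L * attempt;
}

/* Unload kernel module `name`, retrying if delete_module() is interrupted.
   Returns 0 on success, or -1 with errno set from the last attempt. */
int remove_module_with_retry(const char *name)
{
    for (int attempt = 1; ; attempt++) {
        /* glibc has no delete_module() wrapper, so go through syscall(2).
           O_NONBLOCK is an assumed flag choice for this sketch. */
        if (syscall(SYS_delete_module, name, O_NONBLOCK) == 0)
            return 0;

        int err = errno;  /* save before nanosleep() can touch errno */
        if (!should_retry_unload(err, attempt)) {
            errno = err;
            return -1;
        }

        struct timespec ts = { 0, unload_delay_us(attempt) * 1000L };
        nanosleep(&ts, NULL);
    }
}
```

In this sketch the sleeps range from 100us to 400us because nothing sleeps after the fifth attempt. The committed patch's exact loop bounds may differ. Non-EINTR errors are returned right away, so real failures stay visible rather than being retried away.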
<agentzh> stap seems to work well on aarch64 linux now. tested amazon linux 2 and fedora 32 myself.
<agentzh> we still need to run our kernel fuzzer there though.
<agentzh> tried eMAG and Graviton2 CPUs so far.
<fche> thanks, nice
<kerneltoast> i feel left out since only agentzh is bothering you
<kerneltoast> fche, wanna review a patch?
<kerneltoast> it's not final yet but is getting close
<fche> dinner calls real soon here, but shoot
<kerneltoast> it can probably be split into 3 patches
<kerneltoast> maybe 4
<fche> my eyes glazed over around the 30% mark
<fche> yeah please try to split
mjw has quit [Quit: Leaving]
<kerneltoast> hah ok
<fche> I would be especially interested in statements of what problems existed with the previous version, and how the fixes can be tested
<kerneltoast> fuzzing an -rt kernel exposes these issues
<kerneltoast> we've seen them on non rt but rarely
<fche> maybe avoid nearly-renamed function renaming (get/task/find_utrace_bucket/struct)
<fche> is it closer to operating on an -rt kernel?
<kerneltoast> i renamed them to make it clear that a reference is being acquired
<fche> ok, that's something we can reproduce by testing them
<fche> yeah
<fche> just makes the diffs bigger, but yeah I see
<kerneltoast> yes this gets -rt kernels closer to working, maybe even fixes all the big -rt issues
<kerneltoast> these are the only issues we have with -rt right now
<kerneltoast> I'm sure there are more
<fche> ok, that could be a big testable win here
<kerneltoast> still haven't confirmed if that latest patch i sent fixes the current utrace bugs
<kerneltoast> this is pretty gnarly stuff
<fche> ok
<kerneltoast> I'll let you know once our fuzzer man tests it
<kerneltoast> i have high hopes for it
<kerneltoast> especially now that i fixed the task work add recursion
amerey has quit [Quit: Leaving]
<agentzh> huh, this patch is HUGE :)
<kerneltoast> agentzh, it keeps growing
<kerneltoast> it used to be a baby and now it's a teenager
orivej has quit [Ping timeout: 246 seconds]