fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
lickingball has quit [Remote host closed the connection]
<kerneltoast> fche, agentzh, i believe i spotted the cause of the probe lock lockup
<fche> egg selent
<kerneltoast> it looks like an out of order lock scenario
<kerneltoast> where the probe lock and zone->lock inside mm are acquired and released out of order
<kerneltoast> so they deadlock
<fche> mmmm don't think we should be -taking- any mm locks during probe handler execution
khaled has quit [Quit: Konversation terminated!]
<fche> seems like the syscall.exit tracepoint probe handler being run on cpu0
<kerneltoast> that's the thing
<kerneltoast> it happens in an irq
<fche> hm and on cpu5 it's some other tracepoint being hit
<fche> don't see the role for mm lock
<kerneltoast> fche, CPU0 is stuck inside free_pcppages_bulk, trying to acquire zone->lock
<kerneltoast> CPU5 is further along in free_pcppages_bulk, stuck inside a tracepoint, while having zone->lock held
<fche> hmmmmm!
<kerneltoast> CPU0 has the probe lock held, and is waiting to get zone->lock
<kerneltoast> CPU5 has zone->lock held, and is waiting to get the probe lock
<fche> now the stap side locks (stp_probe_lock) are all designed to have timeouts
<kerneltoast> yes unless you have the timeouts turned off
<fche> I think this may be a case of "don't turn timeouts off"
<kerneltoast> that's not a very elegant solution either
<kerneltoast> we'll just trylock for a while when in reality we can't hold the lock
<fche> it'd prevent the (apparent) deadlock at least
<kerneltoast> disabling irqs while holding the lock would do it too
<fche> I'm curious how we get into that apic_timer_interrupt on cpu0 .... we normally block interrupts during probe handler execution
<kerneltoast> i made a patch for this before but you said it was redundant because of the irq disable in the probe handlers, plus context reentrancy protection
<kerneltoast> yeah i'm curious as well
<kerneltoast> STP_INTERRUPT lets you toggle the interrupt blocking iirc
<fche> wonder if that apic_timer_interrupt thing is like the "nmi: cpu stuck in spinlock"
<kerneltoast> oh i guess the context reentrancy protection didn't help because this is happening on different CPUs
<fche> we don't use/document that
<fche> nothing suspicious seems to happen on cpu5: it's just a stap probe handler that happens to run associated with an interrupt handler
<fche> anything interesting going on at the other cpus?
<kerneltoast> not really, just some stuckage on the probe lock
<fche> cpu11
<kerneltoast> cpu11 is spinning on trying to acquire the lock
<fche> and 14
<kerneltoast> yeah same deal
<fche> and 15
<kerneltoast> yeeep
<fche> yeah what gives all those ought to time out etc
<kerneltoast> unless the test in question has timeouts disabled
<fche> I don't see that in the tests
<kerneltoast> it's also possible the backtrace just happened to show them while they were spinning
<kerneltoast> the real deadlock is that zone->lock is a normal spin_lock
<kerneltoast> so two cpus are deadbeefed
<fche> can see how 0 is waiting for 1
<fche> waiting for 5
<fche> not seeing how 5 is waiting for anyone
<kerneltoast> oh you mean if it had the timeout then 5 shoulda given up eventually
<fche> maybe the machine is not really hung just spinning very very busily
<fche> yes
<kerneltoast> and then it spins long enough that everything explodes i suppose
<kerneltoast> spinning for a while in irqs is bad m'kay
<fche> not sure I see explosions here, but rather maybe super very crazy slow progress
<kerneltoast> i couldn't ssh to the machine
<fche> well ya
<kerneltoast> re: bulkmode, still getting transport failures. only possible cause is that __stp_relay_subbuf_start_callback() returns 0
<kerneltoast> seems like we need moar subbuffers
<kerneltoast> and subwoofers
<fche> or make them bigger
<kerneltoast> yeah either way
<kerneltoast> i don't understand why this subbuffer thingy exists
<kerneltoast> maybe each subbuffer is kmalloced or something
<kerneltoast> and making it too big is bad
<kerneltoast> oh the subbuffers are vmapped
<kerneltoast> i guess you need multiple subbuffers for flushing purposes
<kerneltoast> if it takes too long for the reader to flush out a filled subbuffer, you're screwed
<fche> at least one for the probe to write into
<fche> and others for userspace to read from
<fche> screwed = missing some output, at worst, not the worst thing
<kerneltoast> yeah but i think increasing the subbuffer count is the better option
<kerneltoast> oh i think i see what's wrong
<kerneltoast> the subbuffers are way too big, and we only have 8 of them
<kerneltoast> if stp_print_flush needs to flush out a buffer that isn't full, access to that big unfilled buffer is lost until userspace reads out the data
<kerneltoast> static unsigned _stp_nsubbufs = 8;
<kerneltoast> static unsigned _stp_subbuf_size = 65536*4;
<kerneltoast> those subbuffers are bigly
<kerneltoast> let's make them match the log buffer size
<kerneltoast> ok i'm trying this:
<kerneltoast> static unsigned _stp_subbuf_size = STP_BUFFER_SIZE;
<kerneltoast> static unsigned _stp_nsubbufs = 256;
<kerneltoast> same amount of memory allocated as before
<kerneltoast> but divided among moar sub-buffers
<agentzh> fche kerneltoast: so the conclusion is to let the probe lock spin for a while in irq context? that sounds unacceptable to me too...
<agentzh> it can literally be called soft lockup...
<agentzh> though it may not be forever.
<fche> it should be on the order of milliseconds, not tens of seconds
<fche> oohhhhhhhh.....
<fche> ummmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
<fche> where is my brown paper bag
<fche> do you have one for me?
<fche> wait
orivej has quit [Ping timeout: 258 seconds]
<fche> do you want me to fix it or show you what I figured out so you can laugh at me some more
<fche> due to bad code generation (boo me), there's an infinite loop - the stp_lock_probe that's called just before evaluating dependent probe conditional expressions
<fche> it does a 'goto out;' in case of a locking error
<fche> unfortunately
<fche> there is an out: label just above
<fche> so ... infinite loop
<fche> the intent was to jump FORWARD to a subsequent out: label
<fche> and that explains why it's this particular test that triggers it
<fche> anyway will fix it
<agentzh> ah, xmas for me...
<agentzh> we lost 3 customers due to this!!!
<agentzh> :D
<kerneltoast> ahahahaha w o w
<kerneltoast> i wonder why i never noticed that
<kerneltoast> I looked at that generated stap code a lot when hunting this bug
<kerneltoast> agentzh, time to convince our customers to come back?
<kerneltoast> hmm I've got the testsuite burning through agentzh's box right now to get results for the bulkmode stuff
<fche> kill it restart it with this :)
<kerneltoast> hah
<fche> it's surviving my onthefly testing here but I'd love confirmation
<kerneltoast> I'll kill it and restart in parallel mode with that
<fche> BRAVE
<fche> no wait
derek0883 has joined #systemtap
<fche> <yes, minister> that is a courageous choice </yes, minister>
<kerneltoast> what does the sign on your door say
<kerneltoast> something about a lair
<kerneltoast> caution, there's a man writing infinite loops inside?
<fche> The Frank's Banshee's Lair! The Abode Of Audial Awfulness
<kerneltoast> wow, introduced in august
<kerneltoast> pretty recent
<fche> yeah, stap 4.3 should not be affected
<kerneltoast> agentzh, our 3 customers ran away pretty fast huh
<fche> ALL THREE ? :)
<kerneltoast> "we've made hundreds in profits! HUNDREDS!"
<kerneltoast> "there are dozens of us!" <-- linux users
<kerneltoast> full steam parallel testing engaged
<kerneltoast> the future is now
<fche> I cannot guarantee all related problems are solved
<fche> I can only guarantee
<fche> well
<fche> nothing
<kerneltoast> is there a warranty guaranteed?
<fche> only to our three paying customers
<kerneltoast> wow look at you with your three paying customers, braggart
<fche> give or take
<kerneltoast> i'll take please
<kerneltoast> will all these fixes warrant cutting a new release?
<fche> dunno, distros can/do backport important fixes between releases
<fche> ya know my testing here is looking good, I'll push that to master now
<fche> let the buildbots confirm while I sleep like a happy pudgy little baby
irker960 has joined #systemtap
<irker960> systemtap: fche systemtap.git:master * release-4.4-27-ge3287bddc / translate.cxx: PR27044: fix lock loop for conditional probes
<kerneltoast> the distros can have fun porting my print stuff
* kerneltoast laughs in code refactoring
<fche> umyeah
derek0883 has quit [Remote host closed the connection]
<kerneltoast> fche, it didn't die
<kerneltoast> also does the parallel testing run fewer tests?
derek0883 has joined #systemtap
<fche> not as far as I know
* fche says confidently, as he's about to go zzzzzzzzzzz
<fche> dreaming of starships blowing up
<kerneltoast> hah i'm staying awake for the new star trek episode tonight
<kerneltoast> so i really will be dreaming of starships going boom
<fche> my lawyers have advised me not to comment on political subjects
<kerneltoast> the same lawyers who told you to provide warranty to your 3 customers?
<fche> my lips are sealed
<kerneltoast> fche, it looks like parallel mode runs fewer tests
<kerneltoast> by about 3000
<kerneltoast> here's the latest bulkmode patch with parallel testsuite diff at the bottom: https://gist.github.com/kerneltoast/f6a1190b3b0916fd63cb3282ba9a2ba7
<agentzh> kerneltoast: re "time to convince our customers to come back?" that's exactly what i'm gonna do! begging them to come back :D
<agentzh> kerneltoast: re 3k less tests in parallel mode, wow, that's new to me.
<agentzh> kerneltoast: so fche's probe lock infinite loop bug is not in our private branch?
<agentzh> it seems?
<agentzh> if that's the case, then the probe lock deadlock is not the reason for our 3 customers.
<agentzh> but something else.
<agentzh> like earlier panics/freezes you fixed.
<kerneltoast> agentzh, yeah i tried to diff a parallel run to a serial run and the parallel run had 5000 tests completed, while serial had 8000
<kerneltoast> i think our private branch was too old to have the probe lock lockup. I only ran into it running the full testsuite on stap master. I think I've run into mutex_trylock issues on our private branch though
<kerneltoast> our 3 former customers must've faced one of the other issues i fixed (the in_atomic() panic was most common on our private branch for me), or one of the print bugs I've fixed
<kerneltoast> we're lucky fche caught the probe lock lockup right now, or we might've been screwed after merging stap master into our own branch :P
<agentzh> yes, indeed!
<agentzh> so fche made a point of forcing you to run the fat test suite ;)
<agentzh> just to catch other's mistakes.
<agentzh> in this case, fche's own.
<kerneltoast> fche, yo
<fche> yo
<fche> code looks good, curious why the test results lack the tracepoint_onthefly test
<kerneltoast> yeah i dunno
<kerneltoast> that's parallel
<fche> could do an auxiliary run just with make installcheck RUNTESTFLAGS=tracepoint_onthefly.exp just for completeness
<kerneltoast> you want to see if it freezes or if the test runs successfully?
<fche> yes
<fche> :-)(
<kerneltoast> i see you like inclusive or
<kerneltoast> okay tracepoint_unzippedfly.exp is running
sscox has quit [Quit: sscox]
<fche> ok
<fche> nice job dude
<fche> lgm
<fche> lgtm
<fche> mlg
<fche> me looks good yeah I like that
<kerneltoast> yes you looks good with brown paper bag
<fche> try it, could be a new fashion trend
<kerneltoast> budget PPE
irker820 has joined #systemtap
<irker820> systemtap: sultan systemtap.git:master * release-4.4-28-g8819e2a04 / runtime/print_flush.c runtime/transport/relay_v2.c runtime/transport/transport.c runtime/transport/transport.h staprun/relay.c: always use per-cpu bulkmode relayfs files to communicate with userspace
<irker820> systemtap: sultan systemtap.git:master * release-4.4-29-gd86b64029 / tapset-timers.cxx: Revert "REVERTME: tapset-timers: work around on-the-fly deadlocks caused by mutex_trylock"
<kerneltoast> 🚢🚢🚢
* serhei considers potential proliferation of %( runtime != bpf conditionals in the main tapset code
<serhei> since I'll be making many more tapset functions, I think what I'll do instead of creating a ton of functions like
<serhei> function foo () %( runtime != bpf %? nonbpf %: bpf %)
<serhei> in the toplevel tapset/ folder
<serhei> instead have the toplevel tapsets contains
<serhei> %( runtime != bpf %? function foo() {} function bar() {} ... %)
<serhei> and put the bpf implementation into tapset/bpf/
<serhei> which previously contained only functions mirroring the ones under tapset/linux/
<serhei> giving ample notice to allow people (primarly fche) to bikeshed
<fche> do we need the toplevel ones at all/
<serhei> hmm
<serhei> the way this started was that toplevel existed
<serhei> then jistone moved the tapset functions with backend-specific implementations into linux/ and dyninst/
<serhei> and kept all other ones in place
<fche> the rascal
<serhei> then when bpf backend was written, the functions that were in linux/ grew counterparts in bpf/ and the functions that stayed at the top level grew little %( runtime != bpf %? branches
<serhei> the categories of tapset functions that exist now are
<serhei> - same implementation everywhere
<serhei> - same implementation on linux and dyninst, but not bpf
<serhei> - different implementation per each backend
<fche> ok
<fche> it seems as though category 2 examples could devolve into category 3 (with symlinks or some other trickery to make cross-references)
<fche> or if few in number, do the %( runtime %) trick
<fche> do you have a sample function name for each category?
<serhei> ah. symlinks would be some new feature whereby in e.g. dyninst or bpf uconversions.stp you would have
<serhei> function foo() %same_as_linux
<serhei> ?
<serhei> fche: category1 is user_string; category2 is user_string_n; category3 is kernel_string_n
<fche> symlinks could be physical symlinks
<serhei> ahh
<serhei> git supports symlinks
<fche> yeah ln -s ../linux/foo.stp bpf/foo.stp
* serhei just assumed function level rather than file level
<serhei> that also works
<fche> yeah depends on the granularity
* serhei looks forward to people editing a symlinked file by mistake
<fche> not too interested in a new parser level construct like that %same_as_linux
<serhei> and then realizing the mistake before they commit of course :)
<serhei> fche, me neither
<fche> ok.
<fche> so anyway we have a range of tools (conditionals, symlinks, wrapper functions ...) -- it's just a matter of tastefully choosing among them
<fche> go for it
<serhei> will do
<agentzh> serhei: always glad to see you're working on the bpf runtime :)
<agentzh> kerneltoast fche: so it's already xmas?
<agentzh> no known deadlocks and panics right now?
<kerneltoast> yep
<agentzh> hooray!
<agentzh> it took us so long to get here.
<agentzh> but finally.
<kerneltoast> only 3 months
<agentzh> so happy.
<agentzh> lol
<fche> well ... "known" is contingent
<fche> but yeah real nice progress!
<kerneltoast> we do know that the unknown panics and deadlocks are rare
<kerneltoast> because we haven't seen them yet :P
<agentzh> so we have to run those tracepoint_onthefly.exp tests separately after running the whole thing in parallel?
<agentzh> that sounds like a bug in the test scaffold? fche?
<kerneltoast> agreed, parallel should be running all the tests that serial does
<agentzh> aye
<agentzh> kerneltoast: so next we should merge the latest master into our private branch and then i'll go begging our lost customers.
<kerneltoast> agentzh, sounds like a plan
<agentzh> great
<kerneltoast> time to bust out the exotic drinks and celebrate
<fche> agentzh, they should be the same set of tests; haven't looked into why it might have been skipped
<agentzh> kerneltoast: yeah, indeed. sadly i can't buy them for you personally ;)
derek0883 has quit [Remote host closed the connection]
<agentzh> fche: looking forward to your buildbots running tests in parallel, so maybe you can reproduce it.
<fche> we'll look into it
<agentzh> thanks
<fche> though our buildbots are smallish VMs rather than beefy tens-of-cores hardware
<agentzh> time to upgrade! :D
<agentzh> we're running intel core i9-9900k and amd ryzen threadripper 3970x :D
<fche> must be nice to have budget :)
<agentzh> heh
<agentzh> amd is willing to provide latest chips for free if we write about them in public
<fche> DUDE
<kerneltoast> wow
<kerneltoast> i didn't know amd's latest chips existed
<kerneltoast> SOLDOUTSOLDOUTSOLDOUT
<kerneltoast> everywhere you go
demon000__ has quit [Ping timeout: 258 seconds]
demon000_ has joined #systemtap
<demon000_> @kerneltoast, you didn't know they released 5th gen?
<kerneltoast> demon000_, you can't buy it in the US, they got immediately sold out to scalpers
<demon000_> @kerneltoast, well seems like it's sold out in Romania too
<demon000_> it was available a few weeks ago
<kerneltoast> hah
<demon000_> and that was the CPU i was gonna build my new PC around :(