fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
lickingball has quit [Remote host closed the connection]
<kerneltoast>
fche, agentzh, i believe i spotted the cause of the probe lock lockup
<kerneltoast>
it looks like an out of order lock scenario
<kerneltoast>
where the probe lock and zone->lock inside mm are acquired and released out of order
<kerneltoast>
so they deadlock
<fche>
mmmm don't think we should be -taking- any mm locks during probe handler execution
khaled has quit [Quit: Konversation terminated!]
<fche>
seems like the syscall.exit tracepoint probe handler is being run on cpu0
<kerneltoast>
that's the thing
<kerneltoast>
it happens in an irq
<fche>
hm and on cpu5 it's some other tracepoint being hit
<fche>
don't see the role for mm lock
<kerneltoast>
fche, CPU0 is stuck inside free_pcppages_bulk, trying to acquire zone->lock
<kerneltoast>
CPU5 is further along in free_pcppages_bulk, stuck inside a tracepoint, while having zone->lock held
<fche>
hmmmmm!
<kerneltoast>
CPU0 has the probe lock held, and is waiting to get zone->lock
<kerneltoast>
CPU5 has zone->lock held, and is waiting to get the probe lock
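For illustration, a minimal C sketch of the lock inversion just described; the lock variables are placeholders standing in for the stap probe lock and mm's zone->lock, not the actual symbols:

    /* Hypothetical sketch of the ABBA inversion described above. */
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(probe_lock);  /* stands in for the stap probe lock */
    static DEFINE_SPINLOCK(zone_lock);   /* stands in for mm's zone->lock */

    static void cpu0_path(void)          /* probe handler side */
    {
            spin_lock(&probe_lock);      /* holds the probe lock ... */
            spin_lock(&zone_lock);       /* ... and blocks here on zone->lock */
            spin_unlock(&zone_lock);
            spin_unlock(&probe_lock);
    }

    static void cpu5_path(void)          /* free_pcppages_bulk hitting a tracepoint */
    {
            spin_lock(&zone_lock);       /* holds zone->lock ... */
            spin_lock(&probe_lock);      /* ... and blocks here on the probe lock */
            spin_unlock(&probe_lock);
            spin_unlock(&zone_lock);
    }
    /* Each CPU waits for the lock the other already holds, so neither
     * makes progress: a textbook ABBA deadlock. */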
<fche>
now the stap side locks (stp_probe_lock) are all designed to have timeouts
<kerneltoast>
yes unless you have the timeouts turned off
<fche>
I think this may be a case of "don't turn timeouts off"
<kerneltoast>
that's not a very elegant solution either
<kerneltoast>
we'll just trylock for a while when in reality we can't hold the lock
<fche>
it'd prevent the (apparent) deadlock at least
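For context, a sketch of the bounded-trylock shape being referred to; the loop bound and delay are made-up values, not the runtime's actual timeout:

    /* Hypothetical bounded trylock; constants are illustrative only. */
    #include <linux/spinlock.h>
    #include <linux/delay.h>

    static DEFINE_SPINLOCK(probe_lock);  /* placeholder */

    static int probe_lock_with_timeout(void)
    {
            unsigned int tries = 1000;   /* hypothetical bound */

            while (tries--) {
                    if (spin_trylock(&probe_lock))
                            return 1;    /* lock acquired */
                    udelay(1);
            }
            return 0;                    /* give up rather than spin forever */
    }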
<kerneltoast>
disabling irqs while holding the lock would do it too
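A minimal sketch of that idea, assuming a plain spinlock standing in for the probe lock (not the actual runtime code): with interrupts disabled for the duration of the hold, the timer interrupt can't nest a tracepoint inside the critical section on the same CPU.

    /* Hypothetical: take the probe lock with local interrupts disabled. */
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(probe_lock);  /* placeholder */

    static void locked_section(void)
    {
            unsigned long flags;

            spin_lock_irqsave(&probe_lock, flags);
            /* ... touch probe-shared state ... */
            spin_unlock_irqrestore(&probe_lock, flags);
            /* While irqs are off, the apic_timer_interrupt ->
             * free_pcppages_bulk -> tracepoint path can't run here. */
    }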
<fche>
I'm curious how we get into that apic_timer_interrupt on cpu0 .... we normally block interrupts during probe handlers
<kerneltoast>
i made a patch for this before but you said it was redundant because of the irq disable in the probe handlers, plus context reentrancy protection
<kerneltoast>
yeah i'm curious as well
<kerneltoast>
STP_INTERRUPT lets you toggle the interrupt blocking iirc
<fche>
wonder if that apic_timer_interrupt thing is like the "nmi: cpu stuck in spinlock"
<kerneltoast>
oh i guess the context reentrancy protection didn't help because this is happening on different CPUs
<fche>
we don't use/document that
<fche>
nothing suspicious seems to happen on cpu5: it's just a stap probe handler that happens to run associated with an interrupt handler
<fche>
anything interesting going on at the other cpus?
<kerneltoast>
not really, just some stuckage on the probe lock
<fche>
cpu11
<kerneltoast>
cpu11 is spinning on trying to acquire the lock
<fche>
and 14
<kerneltoast>
yeah same deal
<fche>
and 15
<kerneltoast>
yeeep
<fche>
yeah what gives all those ought to time out etc
<kerneltoast>
unless the test in question has timeouts disabled
<fche>
I don't see that in the tests
<kerneltoast>
it's also possible the backtrace just happened to show them while they were spinning
<kerneltoast>
the real deadlock is that zone->lock is a normal spin_lock
<kerneltoast>
so two cpus are deadbeefed
<fche>
can see how 0 is waiting for 1
<fche>
waiting for 5
<fche>
not seeing how 5 is waiting for anyone
<kerneltoast>
oh you mean if it had the timeout then 5 shoulda given up eventually
<fche>
maybe the machine is not really hung just spinning very very busily
<fche>
yes
<kerneltoast>
and then it spins long enough that everything explodes i suppose
<kerneltoast>
spinning for a while in irqs is bad m'kay
<fche>
not sure I see explosions here, but rather maybe super very crazy slow progress
<kerneltoast>
i couldn't ssh to the machine
<fche>
well ya
<kerneltoast>
re: bulkmode, still getting transport failures. only possible cause is that __stp_relay_subbuf_start_callback() returns 0
<kerneltoast>
seems like we need moar subbuffers
<kerneltoast>
and subwoofers
<fche>
or make them bigger
<kerneltoast>
yeah either way
<kerneltoast>
i don't understand why this subbuffer thingy exists
<kerneltoast>
maybe each subbuffer is kmalloced or something
<kerneltoast>
and making it too big is bad
<kerneltoast>
oh the subbuffers are vmapped
<kerneltoast>
i guess you need multiple subbuffers for flushing purposes
<kerneltoast>
if it takes too long for the reader to flush out a filled subbuffer, you're screwed
<fche>
at least one for the probe to write into
<fche>
and others for userspace to read from
<fche>
screwed = missing some output, at worst, not the worst thing
<kerneltoast>
yeah but i think increasing the subbuffer count is the better option
<kerneltoast>
oh i think i see what's wrong
<kerneltoast>
the subbuffers are way too big, and we only have 8 of them
<kerneltoast>
if stp_print_flush needs to flush out a buffer that isn't full, access to that big unfilled buffer is lost until userspace reads out the data
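For reference, the channel geometry is fixed when the relay channel is opened; a hedged sketch of the count-versus-size tradeoff, with made-up numbers (the stap transport's actual callback and sizes differ):

    /* Hypothetical relay channel setup; sizes are illustrative only. */
    #include <linux/relay.h>

    #define SUBBUF_SIZE     (256 * 1024)    /* made-up size */
    #define N_SUBBUFS       8               /* made-up count */

    /* subbuf_start returns 0 to refuse switching to the next subbuffer,
     * i.e. the write is dropped -- the "returns 0" failure seen above. */
    static int example_subbuf_start(struct rchan_buf *buf, void *subbuf,
                                    void *prev_subbuf, size_t prev_padding)
    {
            return !relay_buf_full(buf);    /* 1 = switch ok, 0 = drop */
    }

    static struct rchan_callbacks example_cb = {
            .subbuf_start = example_subbuf_start,
    };

    static struct rchan *example_open(struct dentry *dir)
    {
            /* per-cpu buffer = SUBBUF_SIZE * N_SUBBUFS bytes; a fixed
             * budget trades fewer/larger against more/smaller subbuffers */
            return relay_open("trace", dir, SUBBUF_SIZE, N_SUBBUFS,
                              &example_cb, NULL);
    }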
<fche>
due to bad code generation (boo me), there's an infinite loop - the stp_lock_probe that's called just before evaluating dependent probe conditional expressions
<fche>
it does a 'goto out;' in case of a locking error
<fche>
unfortunately
<fche>
there is an out: label just above
<fche>
so ... infinite loop
<fche>
the intent was to jump FORWARD to a subsequent out: label
<fche>
and that explains why it's this particular test that triggers it
<fche>
anyway will fix it
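A hypothetical, simplified C illustration of the label-placement bug just described; the function and helper names are placeholders, not the real generated code:

    /* The only 'out:' label was emitted just ABOVE the lock attempt, so
     * the locking-error path jumps backward and loops forever. */
    #include <stdbool.h>

    static bool try_take_probe_lock(void)
    {
            return false;            /* pretend the timed trylock keeps failing */
    }

    static void evaluate_probe_conditions(void)
    {
    out:
            if (!try_take_probe_lock())
                    goto out;        /* meant to jump FORWARD, but binds to the
                                      * label above instead -> infinite loop */

            /* ... evaluate dependent probe conditional expressions ... */

            /* intended target: an out: label emitted down here */
    }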
<agentzh>
ah, xmas for me...
<agentzh>
we lost 3 customers due to this!!!
<agentzh>
:D
irker760 has quit [Quit: transmission timeout]
derek0883 has quit [Remote host closed the connection]
<kerneltoast>
ahahahaha w o w
<kerneltoast>
i wonder why i never noticed that
<kerneltoast>
I looked at that generated stap code a lot when hunting this bug
<kerneltoast>
agentzh, time to convince our customers to come back?
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
<agentzh>
kerneltoast: re "time to convince our customers to come back?" that's exactly what i'm gonna do! begging them to come back :D
<agentzh>
kerneltoast: re 3k less tests in parallel mode, wow, that's new to me.
<agentzh>
kerneltoast: so fche's probe lock infinite loop bug is not in our private branch?
<agentzh>
it seems?
<agentzh>
if that's the case, then the probe lock deadlock is not the reason for our 3 customers.
<agentzh>
but something else.
<agentzh>
like earlier panics/freezes you fixed.
demon000_ has joined #systemtap
tonyj has quit [Remote host closed the connection]
irker960 has quit [Quit: transmission timeout]
<kerneltoast>
agentzh, yeah i tried to diff a parallel run to a serial run and the parallel run had 5000 tests completed, while serial had 8000
<kerneltoast>
i think our private branch was too old to have the probe lock lockup. I only ran into it running the full testsuite on stap master. I think I've run into mutex_trylock issues on our private branch though
<kerneltoast>
our 3 former customers must've faced one of the other issues i fixed (the in_atomic() panic was most common on our private branch for me), or one of the print bugs I've fixed
<kerneltoast>
we're lucky fche caught the probe lock lockup right now, or we might've been screwed after merging stap master into our own branch :P
<agentzh>
yes, indeed!
<agentzh>
so fche had a point in forcing you to run the fat test suite ;)
<agentzh>
just to catch others' mistakes.
<agentzh>
in this case, fche's own.
tux3_ has joined #systemtap
tux3 has quit [Read error: Connection reset by peer]
tux3_ has joined #systemtap
tux3_ has quit [Changing host]
demon000_ has quit [Ping timeout: 272 seconds]
khaled has joined #systemtap
derek0883 has quit [Remote host closed the connection]
<kerneltoast>
yes, you'd look good with a brown paper bag
<fche>
try it, could be a new fashion trend
<kerneltoast>
budget PPE
irker820 has joined #systemtap
<irker820>
systemtap: sultan systemtap.git:master * release-4.4-28-g8819e2a04 / runtime/print_flush.c runtime/transport/relay_v2.c runtime/transport/transport.c runtime/transport/transport.h staprun/relay.c: always use per-cpu bulkmode relayfs files to communicate with userspace
<irker820>
systemtap: sultan systemtap.git:master * release-4.4-29-gd86b64029 / tapset-timers.cxx: Revert "REVERTME: tapset-timers: work around on-the-fly deadlocks caused by mutex_trylock"
<kerneltoast>
🚢🚢🚢
derek0883 has quit [Remote host closed the connection]
sscox has joined #systemtap
derek0883 has joined #systemtap
* serhei
considers potential proliferation of %( runtime != bpf conditionals in the main tapset code
<serhei>
since I'll be making many more tapset functions, I think what I'll do instead of creating a ton of functions like
<serhei>
instead have the toplevel tapsets contain
<serhei>
%( runtime != bpf %? function foo() {} function bar() {} ... %)
<serhei>
and put the bpf implementation into tapset/bpf/
<serhei>
which previously contained only functions mirroring the ones under tapset/linux/
<serhei>
giving ample notice to allow people (primarily fche) to bikeshed
<fche>
do we need the toplevel ones at all/
<serhei>
hmm
<serhei>
the way this started was that toplevel existed
<serhei>
then jistone moved the tapset functions with backend-specific implementations into linux/ and dyninst/
<serhei>
and kept all other ones in place
<fche>
the rascal
<serhei>
then when bpf backend was written, the functions that were in linux/ grew counterparts in bpf/ and the functions that stayed at the top level grew little %( runtime != bpf %? branches
<serhei>
the categories of tapset functions that exist now are
<serhei>
- same implementation everywhere
<serhei>
- same implementation on linux and dyninst, but not bpf
<serhei>
- different implementation per each backend
<fche>
ok
<fche>
it seems as though category 2 examples could devolve into category 3 (with symlinks or some other trickery to make cross-references)
<fche>
or if few in number, do the %( runtime %) trick
<fche>
do you have a sample function name for each category?
<serhei>
ah. symlinks would be some new feature whereby in e.g. dyninst or bpf uconversions.stp you would have
<serhei>
function foo() %same_as_linux
<serhei>
?
<serhei>
fche: category1 is user_string; category2 is user_string_n; category3 is kernel_string_n
<fche>
symlinks could be physical symlinks
<serhei>
ahh
<serhei>
git supports symlinks
<fche>
yeah ln -s ../linux/foo.stp bpf/foo.stp
* serhei
just assumed function level rather than file level
<serhei>
that also works
<fche>
yeah depends on the granularity
* serhei
looks forward to people editing a symlinked file by mistake
<fche>
not too interested in a new parser level construct like that %same_as_linux
<serhei>
and then realizing the mistake before they commit of course :)
<serhei>
fche, me neither
<fche>
ok.
<fche>
so anyway we have a range of tools (conditionals, symlinks, wrapper functions ...) -- it's just a matter of tastefully choosing among them
<fche>
go for it
<serhei>
will do
orivej has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
<agentzh>
serhei: always glad to see you're working on the bpf runtime :)
<agentzh>
kerneltoast fche: so it's already xmas?
<agentzh>
no known deadlocks and panics right now?
<kerneltoast>
yep
<agentzh>
hooray!
<agentzh>
it took us so long to get here.
<agentzh>
but finally.
<kerneltoast>
only 3 months
<agentzh>
so happy.
<agentzh>
lol
<fche>
well ... "known" is contingent
<fche>
but yeah real nice progress!
<kerneltoast>
we do know that the unknown panics and deadlocks are rare
<kerneltoast>
because we haven't seen them yet :P
<agentzh>
so we have to run those tracepoint_onthefly.exp tests separately after running the whole thing in parallel?
<agentzh>
that sounds like a bug in the test scaffold? fche?
<kerneltoast>
agreed, parallel should be running all the tests that serial does
<agentzh>
aye
<agentzh>
kerneltoast: so next we should merge the latest master into our private branch and then i'll go begging our lost customers.
<kerneltoast>
agentzh, sounds like a plan
<agentzh>
great
<kerneltoast>
time to bust out the exotic drinks and celebrate
<fche>
agentzh, they should be the same set of tests; haven't looked into why it might have been skipped
<agentzh>
kerneltoast: yeah, indeed. sadly i can't buy them for you personally ;)
derek0883 has quit [Remote host closed the connection]
<agentzh>
fche: looking forward to your buildbots running tests in parallel, so maybe you can reproduce it.
<fche>
we'll look into it
<agentzh>
thanks
<fche>
though our buildbots are smallish VMs rather than beefy tens-of-cores hardware