#systemtap on 2021-01-13 — irc logs at freenode.irclog.whitequark.org

2015-11-12 23:18 fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged

00:11 amerey has quit [Quit: Leaving]

00:33 <fche> hey kerneltoast buddy, that patch makes this test case very happy, now running a larger suite

00:34 <kerneltoast> ooh

00:34 <kerneltoast> noice

00:34 <kerneltoast> still need to check that it won't drop messages upon module unload

00:35 <kerneltoast> via a partially filled subbuf

00:35 <fche> well, if a module is being unloaded, nothing is listening for messages, so dropping them is fine

00:36 <kerneltoast> the scenario i was thinking was filling the subbuf to the brim while the module is running, and then it's unloaded

00:36 <kerneltoast> an easy test case would be a single small printf actually

00:37 <kerneltoast> I'm not at my laptop right now but you can try that out

00:37 <kerneltoast> the printf needs to print something smaller than the size of a subbuf

00:39 <kerneltoast> basically, just see if a hello world stap module works

00:40 <fche> it does

00:41 <kerneltoast> noice

00:41 <kerneltoast> to confirm, you're using my second paste, right?

00:41 <fche> https://paste.centos.org/view/raw/7e738827

00:41 <kerneltoast> noice

00:42 <kerneltoast> that went better than expected

01:16 <fche> though this version appears to result in content being stuck in subbufs or whatnot, without waking up the userspace until later

01:16 <kerneltoast> yeah that's hard to deal with

01:17 <fche> so probe begin {log("hi")} ... doesn't print anything until a ^C or later

01:19 <fche> we dealt with that before

01:20 <kerneltoast> gotta deal with it differently now

01:20 <kerneltoast> maybe with a timer

01:21 <fche> isn't that _stp_relay_wakeup_timer ?

01:23 <kerneltoast> (still not at laptop)

01:23 <kerneltoast> a different approach would be to scan for empty subbufs

01:23 <kerneltoast> instead of quitting after a single swap

01:23 <kerneltoast> that would get rid of the pesky fudge aspect that'd come with having a timer enforce timely printing

01:26 <kerneltoast> I'm not sure how that would affect cross-subbuf ordering though

01:26 <fche> meaning cross-cpu absolute time ordering?

01:27 <kerneltoast> no, i mean if you have a single print that gets fragmented across different subbufs

01:27 <fche> those would still be sequential, surely

01:28 <kerneltoast> i think relay keeps track of this by having the subbufs ordered

01:28 <fche> yes

01:28 <kerneltoast> so if we just printed half a message in one subbuf, then skipped a subbuf and printed the rest in yet another subbuf, something would go funky i suspect

01:29 <kerneltoast> i think relay makes a subbuf unavailable after you swap it

01:29 <kerneltoast> and frees it back up once userspace consumes it

01:30 <kerneltoast> that must've exacerbated your test case

01:30 <kerneltoast> because userspace has to catch up with the log spam
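
The availability rule described above (a swapped subbuf stays unavailable until userspace consumes it) can be sketched as a toy Python model. This is not relay or SystemTap code: the class name `SubbufRing`, its fields, and the sizes used are assumptions for illustration only, loosely following relay's produced/consumed counters.

```python
class SubbufRing:
    """Toy model of one per-CPU relay buffer split into fixed-size subbufs."""

    def __init__(self, n_subbufs, subbuf_size):
        self.n_subbufs = n_subbufs
        self.subbuf_size = subbuf_size
        self.produced = 0  # subbufs swapped out and handed to the reader
        self.consumed = 0  # subbufs the reader has finished with
        self.offset = 0    # bytes used in the current subbuf
        self.dropped = 0   # bytes dropped because no subbuf was free

    def full(self):
        # Every subbuf is still waiting for the reader.
        return self.produced - self.consumed >= self.n_subbufs

    def switch(self):
        # Swap out the current subbuf; it is unavailable until consumed.
        self.produced += 1
        self.offset = 0

    def write(self, nbytes):
        """Try to store nbytes; return False if the data had to be dropped."""
        if self.full():
            self.dropped += nbytes
            return False
        if self.offset + nbytes > self.subbuf_size:
            self.switch()
            if self.full():
                self.dropped += nbytes
                return False
        self.offset += nbytes
        return True

    def consume(self):
        # The reader frees one pending subbuf, if there is one.
        if self.consumed < self.produced:
            self.consumed += 1


ring = SubbufRing(n_subbufs=4, subbuf_size=100)
results = [ring.write(60) for _ in range(6)]
```

With no reader running, the fifth and sixth writes in this run are dropped; a single `ring.consume()` is enough for writes to succeed again, which is the "userspace has to catch up" effect in miniature.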

01:35 <kerneltoast> fche, can you test the first patch i sent? this one: https://paste.centos.org/view/8da5796d

01:36 <kerneltoast> it doesn't suffer from the delay

01:36 <kerneltoast> I'm curious if it's good enough to pass your test

01:36 <fche> ok stand by

01:37 * kerneltoast gets behind blast shield

01:37 <fche> wonder why curl $URL | git apply is b0rked

01:39 <kerneltoast> might've been because i omitted those index lines git puts at the top

01:39 <fche> bad toast

01:39 <kerneltoast> otherwise i may reveal which old revision I'm using

01:39 <kerneltoast> and get exposed as a phony

01:40 <fche> ummm

01:40 <fche> ok that one works for 'probe begin {log("hi") } '

01:40 <fche> but

01:41 <fche> it seems to lose data or loop or something with the ruby bulk test

01:41 <fche> yeah no good my friend, this time stapio userspace is unkillable, looping

01:41 <fche> oh my

01:42 <fche> I'm going to have to do something

01:42 <kerneltoast> wowza

01:42 <kerneltoast> where did it all go wrong

01:43 hpt has joined #systemtap

01:44 <fche> maybe the twice-modified size_request?

01:44 <fche> dunno

01:44 <kerneltoast> i think the first modification will never happen

01:45 <kerneltoast> wanna check if that if-statement gets hit?

01:45 <kerneltoast> if (unlikely(buf->offset == buf->chan->subbuf_size)) {

01:45 <kerneltoast> ^ that one

01:46 <fche> that sounds as though it will hit only if a subbuf magically turns out to be Exactly packed, not over?

01:46 <kerneltoast> yeah because the second modification makes sure we only ever exactly pack the subbuf

01:46 <kerneltoast> but i think it'll never happen because of the write commit function

01:47 <kerneltoast> subbuf is always swapped out after it's written to

01:47 <kerneltoast> so there's no way a lingering full subbuf could remain

01:48 <kerneltoast> in fact, each available subbuf is empty because of this

01:48 <kerneltoast> we never let a subbuf sit with data inside it

01:50 khaled has quit [Quit: Konversation terminated!]

02:02 <fche> alas, will think about it more tomorrow

02:02 <fche> the first patch was nicer, except for the not-enough-wakeups problem

02:02 <fche> it was stable & didn't lose data on that test

02:09 <kerneltoast> relaaaaayyyyyyy

14:46 <fche> kerneltoast, hey

14:46 <fche> when you're back, wondering if you had any more thoughts about a hybrid between patch1 & 2

14:47 <kerneltoast> still thonking about it

14:47 <kerneltoast> it's a dilemma for sure

14:48 <fche> is it the __stp_relay_switch_subbuf() that flags the timer to wake up userspace?

14:48 <kerneltoast> yeah

14:53 <fche> can the timer invoke that itself, for a 'dirty' subbuf?

14:54 <kerneltoast> the timer needs to be on the same cpu to do so

14:57 <fche> hm, so why not have one there

14:57 <fche> a timer per cpu

14:57 <fche> running just infrequently, say once per second

14:57 <kerneltoast> it's fudgy

14:58 <kerneltoast> it would work but it's not pretty

14:58 <fche> hey I have to look at the mirror every morning

14:58 <fche> and I work too

14:58 <fche> and I'm not pretty

14:58 <kerneltoast> whether or not you work is still under heavy academic debate

14:58 <fche> OOF

14:58 <fche> big oof

14:59 <kerneltoast> :P

14:59 derek0883 has joined #systemtap

15:05 derek0883 has quit [Ping timeout: 264 seconds]

15:05 amerey has joined #systemtap

15:09 <kerneltoast> we could let userspace handle subbuf swapping for subbufs which aren't full

15:10 <kerneltoast> if subbuf is full, we swap ourselves

15:10 <kerneltoast> if subbuf is partially full, we wake userspace and have userspace do the swap

15:10 <fche> whether userspace wakes up via its own timers

15:10 <fche> or whether a kernel timer wakes up periodically and wakes up userspace

15:10 <fche> doesn't seem to make much difference

15:11 <fche> still need one per cpu

15:11 <kerneltoast> yeah but it's less ghetto than waking up once per time quantum

15:11 <kerneltoast> since userspace needs to be woken up anyway

15:12 <fche> if the kernel timer checks frequentlyish, it does not need to wake up the userspace

15:12 <fche> if userspace timer needs to check frequentlyish, it's more costly in terms of context switches

15:28 <kerneltoast> lemme clarify: we still kick userspace from the kernel

15:28 <kerneltoast> but what we do is

15:28 <kerneltoast> always kick userspace from the write commit function

15:28 <kerneltoast> and only swap in the write commit function if the subbuf is full

15:30 <fche> so ISTM we should swap subbufs & kick userspace if EITHER (a) subbuf is full OR (b) on a timed basis even if the subbuf is not full

15:31 <kerneltoast> instead of a timed basis we can just use write commit

15:32 <fche> -every- write commit? don't we get the same problem from yesterday then? lost traffic due to excessive subbuf switching etc. ?

15:32 <kerneltoast> no because we'll only swap from write commit if the subbuf is full

15:32 <kerneltoast> if the subbuf isn't full, we leave it but wake up userspace

15:33 <fche> what can userspace do with a non-swapped non-full subbuf?

15:33 <kerneltoast> it can flush it

15:33 <kerneltoast> all printing is protected by disabled irqs

15:34 <fche> then we'd be storming userspace with wakeups -> flush requests

15:34 <kerneltoast> userspace won't read any incomplete prints

15:34 <kerneltoast> no it would be just as many wakeups as stap currently does

15:35 <kerneltoast> but we don't swap out the subbuf when we wakeup unless it is full

15:35 <fche> ok so again

15:35 <fche> if we have a stap probe that produces 10000 small prints per second per cpu

15:35 <fche> how many wakeups to userspace would that cause

15:35 <fche> how would userspace react

15:35 <kerneltoast> it would cause exactly the same number of wakeups as git stap

15:36 <kerneltoast> right now we have wakeups bonded together with subbuf swapping

15:36 <kerneltoast> i want to decouple them
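
The difference between coupled and decoupled wakeups can be sketched with a small counting model in Python. It assumes, as the conversation suggests but does not confirm, that the coupled policy swaps the subbuf on every write commit; the function name `count_events` and the sizes are made up for illustration, and "wakeup requests" here are just flags raised per commit, not actual userspace wakeups.

```python
def count_events(n_prints, print_size, subbuf_size, decoupled):
    """Return (swaps, wakeup_requests) for equal-sized prints on one CPU.

    Assumes print_size <= subbuf_size.
    """
    offset = 0
    swaps = 0
    wakeup_requests = 0
    for _ in range(n_prints):
        offset += print_size      # the print lands in the current subbuf
        wakeup_requests += 1      # both policies request a wakeup per commit
        no_room_for_next = offset + print_size > subbuf_size
        # Coupled: swap on every commit. Decoupled: swap only when full.
        if not decoupled or no_room_for_next:
            swaps += 1
            offset = 0
    return swaps, wakeup_requests


coupled = count_events(10000, 64, 4096, decoupled=False)
split = count_events(10000, 64, 4096, decoupled=True)
```

In this model both policies raise the same number of wakeup requests, but 10000 prints of 64 bytes into 4096-byte subbufs cost 10000 swaps when coupled and 156 when decoupled.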

15:38 <kerneltoast> lemme see if i can open up my laptop without waking the missus

15:38 <fche> WAKEY WAKEY

15:39 <kerneltoast> the cat won't be screaming her into consciousness for another 50 min

15:40 <fche> don't tell me you're married to a cat

15:40 <fche> none of my business but

15:41 <kerneltoast> not married, i'm still a free range toast

15:42 <kerneltoast> our cat is an asshole and wakes us up at ~8 every day for food

15:45 <kerneltoast> fche, https://paste.centos.org/view/981f5665

15:47 <kerneltoast> this would also let us get rid of __stp_relay_wakeup_timer

15:55 <fche> I thought the wake_up_interruptible* goo was just not safe to invoke from _write_commit (arbitrary probe context)

15:55 <fche> that's why we bothered to have timers

15:57 <kerneltoast> ah crap

15:57 <kerneltoast> i fell for the classic blunder

15:58 <kerneltoast> okay we can punt it onto __stp_relay_wakeup_timer

15:58 <kerneltoast> and do the same thing to avoid per-cpu timers

15:59 <kerneltoast> this will work better than per-cpu timers because it takes an arbitrary amount of time after the wakeup from the timer before the logger thread in userspace starts running

16:00 <kerneltoast> with per-cpu timers if you fire too frequently, you might exhaust the subbufs again

16:01 <kerneltoast> and how frequent is too frequent depends on the environment

16:01 <kerneltoast> what if I'm using stap on my amd geode

18:31 irker485 has joined #systemtap

18:31 <irker485> systemtap: fche systemtap.git:master * release-4.4-56-g263f980e2 / NEWS: NEWS: correct arch names for recent tls code

18:34 derek0883 has quit [Read error: Connection reset by peer]

18:35 derek0883 has joined #systemtap

19:32 <irker485> systemtap: smakarov systemtap.git:master * release-4.4-57-g8a62fac08 / bpf-translate.cxx: stapbpf (for PR27030): bugfix the b71d20af bugfix

20:18 derek0883 has quit [Remote host closed the connection]

20:21 derek0883 has joined #systemtap

20:27 <fche> kerneltoast, still around? still wondering about the timers

20:28 <kerneltoast> yeah

20:28 <kerneltoast> wazzap

20:28 <fche> I'm trying to process your objection to per-cpu wakeup timers

20:30 <kerneltoast> yes

20:31 <kerneltoast> what's got you confused?

20:31 <kerneltoast> other than me

20:31 <fche> istm we'd want per-cpu wakeup timers

20:31 <fche> ditch the central one

20:31 <fche> just make one per cpu

20:32 <fche> and it could implement the policy decision of how frequently to wake up userspace, at what levels of subbuf filledness

20:32 <fche> i.e., rapidly if there are full subbufs (in the case of a data dump)

20:32 <fche> or slower if there are nonempty nonfull subbufs (in the case of probe-begin dribble)

20:33 <kerneltoast> that might work but it'd be easy to break

20:33 <fche> how?

20:33 <kerneltoast> because the subbufs can be exhausted between the moment the timer decides to wake staprun to consume and the moment staprun actually wakes up and consumes

20:35 <kerneltoast> the scenario i'm thinking of:

20:35 <kerneltoast> 1. your timer does the wake_up_interruptible() to tell staprun to start consuming

20:35 <kerneltoast> 2. some milliseconds go by and the subbufs get filled

20:35 <kerneltoast> 3. staprun is now awake and starts consuming

20:35 <kerneltoast> and the "some milliseconds" varies depending on hardware speed

20:35 <fche> yes, ok, that as opposed to what?

20:37 <kerneltoast> my alternative:

20:37 <kerneltoast> 1. there's a print flush. a subbuf is partially filled but has lots of empty space. we don't swap the subbuf, but we still do wake_up_interruptible()

20:37 <kerneltoast> 2. the partially filled subbuf can keep getting filled until staprun wakes up and consumes

20:37 <kerneltoast> 3. staprun is now awake and starts consuming
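
The two scenarios differ in how much room is left while staprun is still waking up. A back-of-the-envelope Python sketch, assuming one CPU, no subbufs pending for the reader at wakeup time, and a hypothetical helper name `capacity_until_reader_runs`:

```python
def capacity_until_reader_runs(n_subbufs, subbuf_size, already_used, swap_at_wakeup):
    """Bytes that can still be written before the reader consumes anything."""
    # The other subbufs are empty and usable under either policy.
    spare = (n_subbufs - 1) * subbuf_size
    if swap_at_wakeup:
        # The timer swaps out the partial subbuf, giving up its free space.
        return spare
    # The partial subbuf lingers and keeps accepting data.
    return spare + (subbuf_size - already_used)


timer_swap = capacity_until_reader_runs(4, 4096, 16, swap_at_wakeup=True)
linger = capacity_until_reader_runs(4, 4096, 16, swap_at_wakeup=False)
```

With four 4096-byte subbufs and 16 bytes already written, swapping at wakeup leaves 12288 bytes of headroom, while letting the partial subbuf linger leaves 16368. How much that gap matters depends on how long the wakeup latency is, which the model does not try to estimate.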

20:37 derek0883 has quit [Remote host closed the connection]

20:38 <kerneltoast> this will require some userspace cooperation though

20:38 <fche> I thought we can't do a wake_up* from a print_flush for the same reason (general probe context)

20:38 <kerneltoast> because when staprun wakes up, it will need to tell the kernel module "hey i'm here now, swap your partially filled subbuf"

20:39 <kerneltoast> yes so instead we do the wakeup from the relay timer

20:39 <fche> yes - so how is that substantially different?

20:39 <fche> except there being one timer vs. one per cpu ?

20:40 <kerneltoast> the main difference: the kernel module is not swapping a partially filled subbuf on its own. instead it waits for staprun to tell it to swap a partially filled subbuf

20:40 <kerneltoast> what you proposed was having per-cpu timers swap out the subbuf every X amount of time

20:40 <kerneltoast> so that data won't linger in the subbuf

20:41 <fche> I don't care specifically about the swapping aspect - if userspace threads can cause that once they wake up, fine with me

20:42 <kerneltoast> then you don't need percpu timers

20:42 <kerneltoast> because you're not using them to swap

20:42 <kerneltoast> swapping must occur on the cpu that owns the subbuf

20:42 <kerneltoast> if all you're doing is waking the userspace threads then there's no reason to have percpu timers

20:43 <fche> no locality advantage?

20:43 <kerneltoast> locality advantage for what?

20:43 <fche> instead of one thread that must scan N buffers and notify N userspace threads/fds

20:43 <fche> (and must fault all that stuff across cpus)

20:43 <fche> we could have N threads, one per cpu, looking at local data only

20:45 <kerneltoast> i don't see how that's helpful when there's already shared data used in the print path

20:45 <kerneltoast> and you don't need to scan N buffers

20:45 <fche> is it? thought we were per-cpu quite a lot

20:45 <kerneltoast> we still have the global lock

20:45 <kerneltoast> to avoid racing with print unregister

20:46 <kerneltoast> in the write commit function we can do this: cpumask_set_cpu(cpu, subbuf_flush_mask);

20:47 <kerneltoast> and then have __stp_relay_wakeup_timer go through every cpu in the mask
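
The mask idea can be sketched in Python with an integer standing in for the kernel cpumask. Everything here is illustrative: `FlushMask`, `write_commit`, and `wakeup_timer` are hypothetical names, and a real kernel version would need atomic bit operations rather than plain integer updates.

```python
class FlushMask:
    """Stand-in for a cpumask: one bit per CPU with data waiting."""

    def __init__(self, n_cpus):
        self.n_cpus = n_cpus
        self.bits = 0

    def set_cpu(self, cpu):
        self.bits |= 1 << cpu

    def drain(self):
        # Return the flagged CPUs in ascending order and clear the mask.
        pending = [cpu for cpu in range(self.n_cpus) if (self.bits >> cpu) & 1]
        self.bits = 0
        return pending


def write_commit(mask, cpu):
    # Probe context: only mark this CPU as having data, never wake anyone.
    mask.set_cpu(cpu)


def wakeup_timer(mask, wake):
    # Timer context: wake the reader of each flagged CPU only.
    for cpu in mask.drain():
        wake(cpu)


mask = FlushMask(n_cpus=8)
for cpu in (2, 5, 2):
    write_commit(mask, cpu)
woken = []
wakeup_timer(mask, woken.append)
```

The timer touches only the CPUs that were flagged since its last run, so it does not have to scan every buffer.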

20:48 orivej has joined #systemtap

20:51 <fche> yeah and in case of pending data, fetch all that control stuff across numa/cpu

20:53 <kerneltoast> but that already happens when printing

20:54 <kerneltoast> via _stp_print_ctr

20:55 <fche> ok that's one more global, as opposed to all the subbuf counter/etc. stuff

20:55 <fche> anyway, I'm not saying the effect is bound to be large, but maybe some.

20:55 <fche> ok

20:55 <fche> so if we were to go your way,

20:55 <fche> what would we have to do

20:59 <kerneltoast> https://paste.centos.org/view/cf86171e

20:59 <kerneltoast> that should cover it

20:59 <kerneltoast> high quality design document right there

20:59 <fche> needs more punctuation

21:00 <fche> and at least one capital letter

21:00 <kerneltoast> and needs to end in .doc

21:01 <fche> that's just too far

21:01 <fche> xls

21:01 <kerneltoast> we need a flowchart too

21:01 <kerneltoast> one of my professors in college didn't let us write any code until we made a flowchart

21:01 <kerneltoast> it was for the intro to C programming class

21:01 <kerneltoast> i suffered

21:02 <fche> wait, is this hypothetical in the sense that it requires More changes to staprun or whatnot?

21:02 <fche> "send a command ... swap ?"

21:02 <kerneltoast> yes it does

21:02 <kerneltoast> staprun is going to wake up and there won't be any subbufs to consume

21:02 <kerneltoast> so it needs to tell the kernel to give it something to consume

21:03 <kerneltoast> staprun needs to be the one to invoke __stp_relay_switch_subbuf

21:03 <fche> not crazy about having to extend the staprun|kernel abi

21:04 <fche> (we've had, for quite a long time, mutual version compatibility)

21:04 <kerneltoast> hmm

21:05 <fche> how about a per-cpu quasi timer thing

21:05 <fche> namely:

21:05 <fche> - whenever a write_commit is complete, write a local timestamp into a local var

21:06 <fche> - if the subbuf has become full, and we're switching, set the cpumask bit

21:06 <fche> - if the subbuf is NOT full, but it has not been switched (prev timestamp too old), switch anyway and set the cpumask bit
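
The three rules above can be sketched in Python as follows. This is a guess at the intent, not an implementation: it assumes the per-CPU timestamp records the last switch, uses a plain set in place of the cpumask, and the names `CpuState`, `max_age`, and `write_commit` are invented for the example.

```python
class CpuState:
    """Per-CPU bookkeeping for the quasi-timer policy."""

    def __init__(self, subbuf_size, max_age):
        self.subbuf_size = subbuf_size
        self.max_age = max_age    # seconds a partial subbuf may linger
        self.offset = 0           # bytes in the current subbuf
        self.last_switch = 0.0    # local timestamp of the last switch


def write_commit(state, nbytes, now, cpu, flush_mask):
    """Account for a completed write; return True if the subbuf was switched."""
    state.offset += nbytes
    full = state.offset >= state.subbuf_size
    stale = now - state.last_switch > state.max_age
    if full or stale:
        # Switch the subbuf and flag this CPU for the single timer thread.
        state.offset = 0
        state.last_switch = now
        flush_mask.add(cpu)
        return True
    return False


state = CpuState(subbuf_size=100, max_age=1.0)
flush_mask = set()
first = write_commit(state, 10, 0.5, 3, flush_mask)   # neither full nor stale
second = write_commit(state, 90, 0.6, 3, flush_mask)  # subbuf became full
```

Note that in this sketch the age check only runs when another commit arrives on that CPU, so a lone print followed by silence would still sit in its subbuf.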

21:08 tromey has quit [Quit: ERC (IRC client for Emacs 27.1)]

21:08 <kerneltoast> that's no different than what you were thinking of before

21:09 <fche> well, it is, still only one timer Thread

21:09 <fche> and no staprun change

21:09 <kerneltoast> that still wastes a subbuf that can take more data in the time between the write commit and staprun waking up

21:10 <fche> that's ok, that second clause should trigger literally rarely

21:10 <kerneltoast> cooperation from staprun would allow maximum utilization of the subbufs

21:11 <fche> I can accept less than Maximum utilization

21:11 <fche> just the current git code appears to have much less than maximum

21:11 <kerneltoast> i dunno if that'll make your ruby test case pass

21:11 <kerneltoast> the patch you tested which lets non-full subbufs linger is really max utilization

21:12 <kerneltoast> can't get any better than that

21:13 <kerneltoast> another option could be tacking on a task worker to staprun

21:13 <kerneltoast> so that staprun code itself doesn't need to handle the cooperation

21:13 <kerneltoast> and instead the task worker can do it

21:14 derek0883 has joined #systemtap

21:15 <kerneltoast> the subbuf would get swapped right before the context switch to staprun code

21:16 <kerneltoast> and then staprun would stapconsume our stapbufs

21:16 <fche> can see why that could be a way of eking out every last bit of storage, but don't think that's necessary

21:16 <kerneltoast> can you reduce the subbuf count a bit with my max utilization patch and see where the ruby tester breaks?

21:17 <kerneltoast> cut subbuf count in half, see what happens

21:18 <kerneltoast> if you can't reduce the subbuf count very much then we'll need moar subbufs to cope with *not* eking out every last bit of storage

21:24 derek0883 has quit [Remote host closed the connection]

22:13 derek0883 has joined #systemtap

22:40 amerey has quit [Remote host closed the connection]