fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
amerey has quit [Quit: Leaving]
agentzh has quit [Remote host closed the connection]
hpt has joined #systemtap
KDr2 has joined #systemtap
khaled has joined #systemtap
khaled has quit [Quit: Konversation terminated!]
hpt has quit [Ping timeout: 260 seconds]
mjw has joined #systemtap
KDr2 has quit []
wcohen has quit [Ping timeout: 258 seconds]
tromey has joined #systemtap
wcohen has joined #systemtap
gromero has joined #systemtap
amerey has joined #systemtap
orivej has joined #systemtap
_whitelogger has quit [Ping timeout: 248 seconds]
_whitelogger has joined #systemtap
dalehamel has joined #systemtap
<dalehamel> howdy folks, I discovered a workaround for an issue with USDT probes using the headers provided by systemtap-sdt-dev package on many distros
<dalehamel> i documented it and proposed a patch on https://sourceware.org/bugzilla/show_bug.cgi?id=25581
<dalehamel> i've never contributed to systemtap before though, so I'm wondering what the best way to go about getting this reviewed would be
<dalehamel> the gist of it is that on X86 at least, you can bind the label for the probe point to a function declaration, and use that function pointer as a handle to determine if a uprobe is attached or no
<fche> dalehamel, hi
<fche> haven't looked into it in super detail but the idea is familiar
<dalehamel> howdy!
<dalehamel> yeah in my case it was ingenuity from necessity, we are running on google container OS on GKE, and it blocks writes to /proc/[pid/mem
<fche> wondering -- what conditions exist whereby a userspace tracer can attach to the probes (overwrite the nop)
<fche> but not also overwrite the semaphore enable counts ?
<dalehamel> so the attach to the probe is done by the kernel no problem
<dalehamel> but incrementing the semaphore is done in userspace by default in BCC, i'm working on a patch to use the new api added in 4.20 to do this through the kernel
<dalehamel> but we're still running a 4.14 kernel
<dalehamel> so basically if you try to attach to a USDT probe it works, but the probe is never fired because nothing can increment the semaphore
<fche> so it's a tool limitation, not an sdt.h one per se
<dalehamel> well a bit of both, with sdt.h there is no fallback mechanism for semaphores
sapatel__ is now known as sapatel
<dalehamel> like if semaphores aren't supported, probes have to be all enabled
<fche> one can compile out semaphores with an sdt.h control macro
<fche> (map foo_ENABLED to 1)
<dalehamel> right, but that's impractical because it tanks performance
<dalehamel> like with ruby for instance, it raises the response time fro a request from 30ms to 130ms
<fche> yeah, depending on what you trace, right.
<dalehamel> as all probes are shown as enabled, and lots of unecessary processing is done
<dalehamel> whereas with this check, you can see that the specific probe you want is attached to
<dalehamel> so much less overhead
<dalehamel> and you only need uprobes - no external semaphore incrementing / management
<dalehamel> i only tested it for X86, but it should be possible to port to other architectures easily enough by examining their uprobe implementations
<fche> remember that kernel uprobes are not the only consumer of sdt.h macros
<fche> userspace like gdb can also use it
<dalehamel> oh i wasn't aware, i'm biased by my own experience
<dalehamel> i think the approach should still be valid for them though, doesn't gdb still use INT3?
<dalehamel> like if you don't have semaphores, and gdb can still set int3, i think it will still work
<fche> generally, for x86, unless some other mechanism like hardware debug breakpoints
<dalehamel> in my patch I made this the default behavior if there is no SDT semaphore support
<fche> and we have other consumers that aren't trap based at all
<dalehamel> but if that's a concern it could be gated by another flag
<dalehamel> like #define _USE_SDT_UPROBE_FALLBACK 1
<dalehamel> perhaps a better name, now that I know its' not just uprobes
<fche> well the gist of the suggestion AIUI is to have the _ENABLED() predicate check whether the nop has been overwritten
<dalehamel> yes exactly
<fche> that as you know is arch-centric, so the general case would have to be like a memcmp between the nop slot and some copy of the SDT_NOP
<dalehamel> i'm not sure i follow on the memcpy
<fche> memcmp
<dalehamel> woops misread
<fche> by the way -- how does perf deal with this?
<dalehamel> the resent implementation dereferences the function pointer address and checks the insn
<dalehamel> that's a good question, I think they also do it in userspace iirc
<dalehamel> so i think they would also have this issue if the ref_ctr_offset api isn't available
<fche> would be good to check how. ptrace?
<fche> I think perf-events in the kernel have learned how to set these semaphores
<fche> hehe
<fche> it parses /sys/kernel/debug/tracing/README for the substring ref_ctr_offset
<fche> and if it's there, considers the semaphore feature supported by the kernel
<dalehamel> yeah i know, i am working on support for that in bcc
<dalehamel> but that API is only available in 4.20+, still lots of systems on older kernels than that in the wild
<fche> aha
<dalehamel> oops
<dalehamel> meant to quote
<dalehamel> > it parses /sys/kernel/debug/tracing/README
<dalehamel> really? that's what perf does?
<fche> would lie to anyone but you :-)
<dalehamel> the better way is to check /sys/bus/event_source/devices/uprobe/format/ref_ctr_offset lol
<dalehamel> smh
<fche> anywya
<fche> would seriously consider giving an alternate _ENABLED() implementation a try if it were
<fche> - conditional on some other macro
<fche> - it were implemented in an arch-neutral way if at all possible, such as with that memcmp()
<fche> re. condition: _SDT_HAS_AUTO_SEMAPHORES ?
<dalehamel> also it's not just sdt.h that has to be patched, but also dtrace.in of course
<dalehamel> > _SDT_HAS_AUTO_SEMAPHORES
<dalehamel> cool i'll revise my patch
<dalehamel> i'll also modify it to use memcmp, that looks like it shouldn't be too bad
<dalehamel> thanks for the feedback fche
<fche> the tricky part would be the memcmp -- choosing the other operand, passing the operand size ... probably not too easy, and not too hard
<fche> np
<fche> could imagine this being a default if we (you! :)
<fche> get it working
<dalehamel> also re the bugzilla link
<dalehamel> did i do that right?
<fche> sure tha tworks
<dalehamel> sorry i'm used to github -_-
<fche> well especially for bugs proper
<fche> in this case we're talking about a new feature
<fche> but whatever
<dalehamel> oh i'm just wondering if i did it right on https://sourceware.org/bugzilla/show_bug.cgi?id=25581
<fche> yeah that'll work
<dalehamel> sorry i've updated the patch like 7 times as i've found / fixed bugs
<fche> we are not too picky.
<dalehamel> there is a lot of variety of experiences in open source, so good to hear :)
<dalehamel> looks like memcmp would introduce a dep on string.h vs my current approach
<dalehamel> right now it's this
<dalehamel> + if platform.machine() in ["x86", "x86_64"]:
<dalehamel> + fdesc.write("#define __PLATFORM_UPROBE_ENABLED(provider, name)\\\n")
<dalehamel> + fdesc.write(" ((*(char *) provider##_##name##_asm_check) & 0x90) != 0x90\n\n") fdesc.write("\n#include <sys/sdt.h>\n\n")
<dalehamel> can you explain how memcmp is more portable?
<dalehamel> (versus different cases for different arches)
<fche> 0x90 is not portable AIUI.
<fche> you don't know for sure what the byte pattern the assembler chooses for "nop" is going to be
<fche> so you can't hard code that into the c header file
<dalehamel> it's not in the header file
<dalehamel> the change to the header file is very small, just adding a label
<dalehamel> err wait, sorry, which header file? The generated header file, or the sys/sdt.h
<dalehamel> it isa dded to the generated file
<fche> well, I guess either
<fche> but in either case you can't hardcode too much a portable check for whether an assembly "nop" has changed
<dalehamel> so if you generate the header on x86, then go to use it on arm for instance?
<dalehamel> in that case i think the generator would need to put an if ladder on different arches in the generated header, if the generated header needs to be portable also
<dalehamel> my current assumption was you generate the header on the arch you plan to run / build on
<fche> even then you need machine-specific content
<dalehamel> wdym? How is that different than the existing arch-specific stuff in sdt.h for instance
<fche> there's not that much in there
<fche> anyway, I'm not trying to hold you back :) just what I'd like to see is no magic arch- (and assembler-) sensitive literals like 0x90
<dalehamel> so a way to detect what the nop insn is for instance?
<dalehamel> ah so try and have things at this level maybe:
<dalehamel> #if defined(__ia64__) || defined(__s390__) || defined(__s390x__)
<dalehamel> #define _SDT_NOP nop 0
<dalehamel> #define _SDT_NOP nop
<dalehamel> #else
wcohen has quit [Ping timeout: 240 seconds]
<dalehamel> ok thanks i'll mull this over :)
<fche> but note that those are assembler directives; they could hypothetically map to different byte sequences in the object code
<fche> I mean they definitely do for different arch's
<dalehamel> even with the asm volatile?
<fche> and could perhaps vary within an arch too
<fche> asm volatile just means that gcc will truly send the string to the assembler
<fche> what it produces from that is up to it
<fche> 1-byte nop? 5-byte nop? who knows
<dalehamel> hmm interesting, hadn't thought of that
<dalehamel> good starting point is this table anyways lol https://en.wikipedia.org/wiki/NOP_(code)
<dalehamel> i guess the trick is getting what a NOP translates to in a dynamic way
<dalehamel> perhaps the same trick can be used twice, and we can just have a block that creates a nop with a label but not at a point where the uprobe will attach
<dalehamel> so that at runtime we compare against this reference
<fche> yeah something like that
<dalehamel> then the trick is knowing the number of bytes to compare, that might have to be from a lookup table by arch.... anyways this is great food for thought
<dalehamel> thanks i don't think i would have thought of this otherwise
<fche> ok good luck!
<fche> note we could change the sys/sdt.h NOP copy to be literal bytes rather than mnemonics for the assembler to play with
<dalehamel> you mean the NOP we already inject?
<fche> yeah
<dalehamel> i was hoping to be as unintrusive as possible heh ;)
<fche> yea
<dalehamel> i did just have a thought about a way to get how many bytes though, if we put another label after the nop, we could subtract the labels and store that difference somewhere. Alignment might be an issue, and I'd need to test on a couple arches...
<dalehamel> sorry just thinking out loud here lol
<fche> yeah
gromero has quit [Ping timeout: 272 seconds]
tromey has quit [Quit: ERC (IRC client for Emacs 28.0.50)]
wcohen has joined #systemtap
sscox has quit [Ping timeout: 268 seconds]
amerey has quit [Quit: Leaving]
sscox has joined #systemtap
drsmith has left #systemtap [#systemtap]