fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
amerey has quit [Quit: Leaving]
agentzh has quit [Remote host closed the connection]
hpt has joined #systemtap
KDr2 has joined #systemtap
khaled has joined #systemtap
khaled has quit [Quit: Konversation terminated!]
hpt has quit [Ping timeout: 260 seconds]
mjw has joined #systemtap
KDr2 has quit []
wcohen has quit [Ping timeout: 258 seconds]
tromey has joined #systemtap
wcohen has joined #systemtap
gromero has joined #systemtap
amerey has joined #systemtap
orivej has joined #systemtap
_whitelogger has quit [Ping timeout: 248 seconds]
_whitelogger has joined #systemtap
dalehamel has joined #systemtap
<dalehamel>
howdy folks, I discovered a workaround for an issue with USDT probes using the headers provided by systemtap-sdt-dev package on many distros
<dalehamel>
i've never contributed to systemtap before though, so I'm wondering what the best way to go about getting this reviewed would be
<dalehamel>
the gist of it is that on X86 at least, you can bind the label for the probe point to a function declaration, and use that function pointer as a handle to determine if a uprobe is attached or no
<fche>
dalehamel, hi
<fche>
haven't looked into it in super detail but the idea is familiar
<dalehamel>
howdy!
<dalehamel>
yeah in my case it was ingenuity from necessity, we are running on google container OS on GKE, and it blocks writes to /proc/[pid/mem
<fche>
wondering -- what conditions exist whereby a userspace tracer can attach to the probes (overwrite the nop)
<fche>
but not also overwrite the semaphore enable counts ?
<dalehamel>
so the attach to the probe is done by the kernel no problem
<dalehamel>
but incrementing the semaphore is done in userspace by default in BCC, i'm working on a patch to use the new api added in 4.20 to do this through the kernel
<dalehamel>
but we're still running a 4.14 kernel
<dalehamel>
so basically if you try to attach to a USDT probe it works, but the probe is never fired because nothing can increment the semaphore
<fche>
so it's a tool limitation, not an sdt.h one per se
<dalehamel>
well a bit of both, with sdt.h there is no fallback mechanism for semaphores
sapatel__ is now known as sapatel
<dalehamel>
like if semaphores aren't supported, probes have to be all enabled
<fche>
one can compile out semaphores with an sdt.h control macro
<fche>
(map foo_ENABLED to 1)
<dalehamel>
right, but that's impractical because it tanks performance
<dalehamel>
like with ruby for instance, it raises the response time fro a request from 30ms to 130ms
<fche>
yeah, depending on what you trace, right.
<dalehamel>
as all probes are shown as enabled, and lots of unecessary processing is done
<dalehamel>
whereas with this check, you can see that the specific probe you want is attached to
<dalehamel>
so much less overhead
<dalehamel>
and you only need uprobes - no external semaphore incrementing / management
<dalehamel>
i only tested it for X86, but it should be possible to port to other architectures easily enough by examining their uprobe implementations
<fche>
remember that kernel uprobes are not the only consumer of sdt.h macros
<fche>
userspace like gdb can also use it
<dalehamel>
oh i wasn't aware, i'm biased by my own experience
<dalehamel>
i think the approach should still be valid for them though, doesn't gdb still use INT3?
<dalehamel>
like if you don't have semaphores, and gdb can still set int3, i think it will still work
<fche>
generally, for x86, unless some other mechanism like hardware debug breakpoints
<dalehamel>
in my patch I made this the default behavior if there is no SDT semaphore support
<fche>
and we have other consumers that aren't trap based at all
<dalehamel>
but if that's a concern it could be gated by another flag
<dalehamel>
like #define _USE_SDT_UPROBE_FALLBACK 1
<dalehamel>
perhaps a better name, now that I know its' not just uprobes
<fche>
well the gist of the suggestion AIUI is to have the _ENABLED() predicate check whether the nop has been overwritten
<dalehamel>
yes exactly
<fche>
that as you know is arch-centric, so the general case would have to be like a memcmp between the nop slot and some copy of the SDT_NOP
<dalehamel>
i'm not sure i follow on the memcpy
<fche>
memcmp
<dalehamel>
woops misread
<fche>
by the way -- how does perf deal with this?
<dalehamel>
the resent implementation dereferences the function pointer address and checks the insn
<dalehamel>
that's a good question, I think they also do it in userspace iirc
<dalehamel>
so i think they would also have this issue if the ref_ctr_offset api isn't available
<fche>
would be good to check how. ptrace?
<fche>
I think perf-events in the kernel have learned how to set these semaphores
<fche>
hehe
<fche>
it parses /sys/kernel/debug/tracing/README for the substring ref_ctr_offset
<fche>
and if it's there, considers the semaphore feature supported by the kernel
<dalehamel>
yeah i know, i am working on support for that in bcc
<dalehamel>
can you explain how memcmp is more portable?
<dalehamel>
(versus different cases for different arches)
<fche>
0x90 is not portable AIUI.
<fche>
you don't know for sure what the byte pattern the assembler chooses for "nop" is going to be
<fche>
so you can't hard code that into the c header file
<dalehamel>
it's not in the header file
<dalehamel>
the change to the header file is very small, just adding a label
<dalehamel>
err wait, sorry, which header file? The generated header file, or the sys/sdt.h
<dalehamel>
it isa dded to the generated file
<fche>
well, I guess either
<fche>
but in either case you can't hardcode too much a portable check for whether an assembly "nop" has changed
<dalehamel>
so if you generate the header on x86, then go to use it on arm for instance?
<dalehamel>
in that case i think the generator would need to put an if ladder on different arches in the generated header, if the generated header needs to be portable also
<dalehamel>
my current assumption was you generate the header on the arch you plan to run / build on
<fche>
even then you need machine-specific content
<dalehamel>
wdym? How is that different than the existing arch-specific stuff in sdt.h for instance
<fche>
there's not that much in there
<fche>
anyway, I'm not trying to hold you back :) just what I'd like to see is no magic arch- (and assembler-) sensitive literals like 0x90
<dalehamel>
so a way to detect what the nop insn is for instance?
<dalehamel>
ah so try and have things at this level maybe:
<dalehamel>
i guess the trick is getting what a NOP translates to in a dynamic way
<dalehamel>
perhaps the same trick can be used twice, and we can just have a block that creates a nop with a label but not at a point where the uprobe will attach
<dalehamel>
so that at runtime we compare against this reference
<fche>
yeah something like that
<dalehamel>
then the trick is knowing the number of bytes to compare, that might have to be from a lookup table by arch.... anyways this is great food for thought
<dalehamel>
thanks i don't think i would have thought of this otherwise
<fche>
ok good luck!
<fche>
note we could change the sys/sdt.h NOP copy to be literal bytes rather than mnemonics for the assembler to play with
<dalehamel>
you mean the NOP we already inject?
<fche>
yeah
<dalehamel>
i was hoping to be as unintrusive as possible heh ;)
<fche>
yea
<dalehamel>
i did just have a thought about a way to get how many bytes though, if we put another label after the nop, we could subtract the labels and store that difference somewhere. Alignment might be an issue, and I'd need to test on a couple arches...
<dalehamel>
sorry just thinking out loud here lol
<fche>
yeah
gromero has quit [Ping timeout: 272 seconds]
tromey has quit [Quit: ERC (IRC client for Emacs 28.0.50)]