fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
<agentzh>
serhei: i'm thinking a bit more about the abort() function. it seems tricky to do exceptions or longjmp in the ebpf context? i remember stap's kernel runtime uses horrible emulation hacks to do those. very inefficient hacks. not sure we have anything more clever in ebpf...
orivej has joined #systemtap
fdalleau_away is now known as fdalleau
khaled has joined #systemtap
orivej has quit [Ping timeout: 260 seconds]
ema has quit [Remote host closed the connection]
amerey has quit [Ping timeout: 240 seconds]
amerey has joined #systemtap
irker573 has quit [Quit: transmission timeout]
mjw has joined #systemtap
orivej has joined #systemtap
orivej has quit [Ping timeout: 252 seconds]
<serhei>
the main thing abort does is immediately exit the probe -- not hard to do when we're building the CFG for the entire probe
<serhei>
the second thing it does is set session_state to STAP_SESSION_STOPPING -- that may be more fiddly to replicate exactly in terms of how soon we stop other probes from running
<fche>
serhei, bpf has try/catch now, that's all abort does in the lkm case, throw an exception right?
<serhei>
in bpf, try/catch is just a fancy term for "find the enclosing catch block and insert a branch to it" fwiw
<fche>
yes
<fche>
same as in lkm
<serhei>
yeah
<serhei>
abort() is the same thing, but exits the probe unconditionally, skipping the catch block
<serhei>
ah, don't even need to fiddle with CFG for that. Just call exit()
<fche>
ah yes using that other context flag
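A minimal sketch of the distinction discussed above, in plain C rather than BPF bytecode. It assumes a probe body that has already been lowered to forward branches: an error jumps to the enclosing catch block, while an abort sets a session-stopping flag and jumps straight to the probe exit, skipping the catch block. The names `probe_body`, `session_stopping`, and the `outcome` enum are hypothetical and not taken from the stap sources.

```c
#include <assert.h>

/* Hypothetical result codes, used only to observe which path was taken. */
enum outcome { RAN_TO_END, CAUGHT, ABORTED };

/* Models: try { if (do_abort) abort(); if (do_error) error("..."); } catch { } */
static enum outcome probe_body(int do_error, int do_abort, int *session_stopping)
{
    assert(session_stopping != 0);

    /* --- try block --- */
    if (do_abort) {
        *session_stopping = 1;   /* stands in for STAP_SESSION_STOPPING */
        goto probe_exit;         /* unconditional exit, catch is skipped */
    }
    if (do_error)
        goto catch_block;        /* forward branch to the enclosing catch */
    return RAN_TO_END;

catch_block:
    return CAUGHT;               /* the session keeps running */

probe_exit:
    return ABORTED;
}
```

Both targets are known when the probe is compiled, so neither path needs a runtime unwinder.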
sscox has quit [Quit: sscox]
sscox has joined #systemtap
sscox has quit [Quit: sscox]
sscox has joined #systemtap
tromey has joined #systemtap
irker216 has joined #systemtap
<irker216>
systemtap: wcohen systemtap.git:master * release-4.4-149-g3ca13a712 / tapset/linux/netfilter.stp: Use the skb_frag_size accessor function rather than directly reading field
orivej has joined #systemtap
khaled_ has joined #systemtap
khaled has quit [Ping timeout: 268 seconds]
mjw has quit [Quit: Leaving]
<agentzh>
fche: wow, ebpf supports try/catch already? that's news to me. would you mind sharing the lkm thread's link? google is not smart enough to find it, it seems. thanks!
<fche>
lkm = linux kernel module, not lkml = mailing list
<agentzh>
serhei: oh, i didn't know stap's bpf backend builds a full CFG. my concern here is it would be tricky if we want to print out a backtrace for exceptions in the future?
<agentzh>
fche: okay...it means the kernel runtime of stap.
<fche>
'course it will be tricky, everything is tricky with bpf :)
<agentzh>
fche mentions "bpf has try/catch now". is that a builtin support or just in stap's bpf variant?
<fche>
try/catch is a language level thing
<fche>
not a bpf bytecode level thing
<fche>
so .... 'no' ?
<agentzh>
any pointer to that feature?
<agentzh>
that's exciting!
<fche>
well, .... stap script language try/catch is in multiple docs
<serhei>
iirc try { error("woops") } catch {}
<fche>
the new bit is that more of that works on the bpf backend than used to.
<agentzh>
oh, you mean the stap language...
<serhei>
yeah
<agentzh>
i thought you mean the ebpf C language feature.
<fche>
there is no bpf c language :-) it's just c
<agentzh>
yeah, bcc's dialect does not really count.
<agentzh>
so it seems like bpf native would still need stap's kernel module's way to emulate exceptions...like checking a flag after every func call and shortcut the current func's execution flow and then propagate upwards the calling chain?
<fche>
sure
<agentzh>
that would be slow though. longjmp would be nice for bpf.
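The flag-checking emulation described above can be sketched roughly as follows, in plain C. This is an illustration of the general technique, not the stap kernel runtime's actual code. The context struct, its fields, and all function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-probe context. */
struct probe_ctx {
    const char *last_error;  /* non-NULL means an "exception" is in flight */
    int steps;               /* counts statements that actually ran */
};

/* "Throw": only records the error; callers must notice it and unwind. */
static void raise_error(struct probe_ctx *c, const char *msg)
{
    assert(c != NULL);
    c->last_error = msg;
}

static int inner(struct probe_ctx *c, int x)
{
    c->steps++;
    if (x < 0) {
        raise_error(c, "negative input");
        return 0;            /* return value is meaningless once the flag is set */
    }
    return x * 2;
}

static int middle(struct probe_ctx *c, int x)
{
    int v = inner(c, x);
    if (c->last_error)       /* check the flag after every call ... */
        return 0;            /* ... and propagate up the calling chain */
    c->steps++;
    return v + 1;
}

/* Top level plays the role of try/catch; returns -1 when an error was caught. */
static int probe_body(struct probe_ctx *c, int x)
{
    int v = middle(c, x);
    if (c->last_error) {     /* "catch" block */
        c->last_error = NULL;
        return -1;
    }
    return v;
}
```

The cost is one extra test and branch after every call site, which is the overhead the longjmp wish is about.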
<serhei>
bpf native doesn't really have a calling chain
<serhei>
just a blob of bytecode that calls out to a restricted list of kernel helper functions
<agentzh>
i mean the bpf functions, not bpf helpers.
<agentzh>
bpf already supports functions.
<agentzh>
user-defined functions in the same bpf file.
<fche>
yes, we have to manage the control flow through bpf (forward) jump insns
<fche>
like a normal compiler
<fche>
speed schpeed, if you wanted fast you wouldn't be using bpf :)
<agentzh>
i see. so this is the secret weapon of stap to tackle the bpf verifier?
<agentzh>
bpf verifier makes bpf programming a nightmare.
<fche>
don't think this affects that at all
<agentzh>
at least using the std tool chain.
<fche>
it is still a nightmare
<agentzh>
heh
<serhei>
ah, found a patch thread for bpf's function feature (call bytecode pointing to other bpf code). We don't really use it
<agentzh>
yep, we're already using it through the clang tool chain.
<agentzh>
we're limited by the 5-arg limit in func calls though.
<agentzh>
thinking about workarounds like pushing extra args to bpf maps or something.
<serhei>
hmm, another thing to evaluate
<agentzh>
that won't be fast either.
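One way the extra-argument workaround could look, sketched in plain C. It assumes a scratch area that the caller fills before making a five-argument call. In a real BPF program that role might be played by a one-element per-CPU array map; here it is only a static struct, and `extra_args`, `scratch`, `sum7`, and `sum7_callee` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical overflow area for arguments beyond the fifth. */
struct extra_args {
    long a6;
    long a7;
};
static struct extra_args scratch;

/* Callee stays within five arguments and reads the rest from scratch. */
static long sum7_callee(long a1, long a2, long a3, long a4, long a5)
{
    return a1 + a2 + a3 + a4 + a5 + scratch.a6 + scratch.a7;
}

/* Caller-side wrapper: spill arguments six and seven, then make the call. */
static long sum7(long a1, long a2, long a3, long a4, long a5,
                 long a6, long a7)
{
    scratch.a6 = a6;
    scratch.a7 = a7;
    return sum7_callee(a1, a2, a3, a4, a5);
}
```

Every spilled argument costs a store in the caller and a load in the callee, and with a real map also a lookup, so the slowdown mentioned in the chat is expected.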
<agentzh>
but at least we can have backtraces (emulated ones at least).
<serhei>
wonder if exiting the bpf 'function' by branching would work as a longjmp
<agentzh>
like a direct goto?
<agentzh>
across function boundaries?
<serhei>
make different copies of the function if they appear inside different catch blocks
<serhei>
it would just save some duplication for programs that call tapset functions repeatedly
<agentzh>
hmm, i'm a bit lost here. will you elaborate?
<agentzh>
what do you mean by exiting by branching?
<serhei>
currently all stap function calls are inlined
<agentzh>
yep
<serhei>
(for the bpf backend)
<agentzh>
aye
<serhei>
but that means that if we call a complicated tapset function 15 times
<serhei>
we have 15 copies of the code
<agentzh>
right, that's exactly what inlines do.
<serhei>
and probably hit the insn limit
<agentzh>
true
<serhei>
that makes handling try/catch very easy since we know the location of the catch block statically
<agentzh>
indeed
<serhei>
if we make one copy of the tapset function and call it with the bpf call opcode, we don't know which catch block to branch to in case of an error
<agentzh>
makes sense
<serhei>
if we did know, we could probably fudge a longjmp by just doing a goto to the catch block
<agentzh>
i see.
<serhei>
at least in the case where the catch block is at the top level and isn't followed by a return
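A rough C illustration of why full inlining makes try/catch easy. The tapset function is modeled as a macro so that each use expands in place, and the catch target is a label fixed at the expansion site. `CHECKED_DIV`, `probe_body`, and `catch_block` are hypothetical names. The real backend emits BPF jump instructions rather than C gotos.

```c
#include <assert.h>

/* Hypothetical tapset function body, written as a macro to mimic inlining.
 * On error it branches forward to a label chosen at the expansion site. */
#define CHECKED_DIV(result, num, den, catch_label) \
    do {                                           \
        if ((den) == 0)                            \
            goto catch_label;                      \
        (result) = (num) / (den);                  \
    } while (0)

/* Models: try { a = n / d1; b = a / d2 } catch { return -1 } */
static int probe_body(int n, int d1, int d2)
{
    int a, b;

    CHECKED_DIV(a, n, d1, catch_block);   /* first inlined copy */
    CHECKED_DIV(b, a, d2, catch_block);   /* second inlined copy */
    return b;

catch_block:
    return -1;
}
```

Each expansion duplicates the body, which is the instruction-count cost raised above. A single shared copy reached through a call would not know which `catch_block` to jump to.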
<agentzh>
so bpf functions are still involved.
<agentzh>
*not involved
<serhei>
in any case
<serhei>
the scheme I just came up with in my head is rather ugly
<agentzh>
but it should work.
<agentzh>
another concern i have with the all-inlining scheme is that the bpf verifier is very bad at verifying large functions.
<agentzh>
we had to introduce functions to help the verifier.
<serhei>
the all-inlining scheme is what we have right now
<agentzh>
otherwise it would frequently give up saying the control flow is too complex...
<agentzh>
every time i see a verifier error, i'll start throwing things on my desk...
<serhei>
I don't think switching to functions would help much because handling try/catch will require us to be very judicious about what to inline and what to put into a function
<serhei>
but I've never seen a 'control flow too complex' error with stap generated code
<agentzh>
yeah, throw/catch would be much harder if we put bpf functions into the mix.
<agentzh>
i'm just thinking along another line.
<serhei>
I usually see an 'out of stack space' error
<agentzh>
yeah, stack space is also common.
<agentzh>
for us.
<agentzh>
gotta run for a therapy. brb.
<serhei>
thanks for the brainstorming
mjw has joined #systemtap
irker216 has quit [Quit: transmission timeout]
irker697 has joined #systemtap
<irker697>
systemtap: amerey systemtap.git:master * release-4.4-150-g439fb4cc4 / dwflpp.h loc2stap.h session.h: Make declarations consistent with corresponding definitions
<agentzh>
serhei: another quick question: how does stap handle string creations like "a" . "b" please? it's not obvious by reading the disassembly code of the .bo files emitted. i'm trying to understand the string length limits as mentioned by fche previously.
<agentzh>
if we use a bpf map or a bpf ringbuf for string allocations, we no longer suffer from the current 64-byte or 128-byte limits?
<agentzh>
is stapbpf allocating strings on the kernel C stack right now?
<fche>
kernel c stack is not accessible to bpf
<fche>
bpf bytecodes must alloc from bpf stack
<agentzh>
sorry, i mean bpf stack.
<agentzh>
the bpf stack seems to only allow static allocations like 'const char buf[] = "xxxx"'.
<agentzh>
not much here.
<agentzh>
no fancy toys like alloca().
<agentzh>
afaik
<fche>
correct
<agentzh>
so stap bpf is allocating strings on bpf stack's static buffers?
tromey has quit [Quit: ERC (IRC client for Emacs 27.1)]
<agentzh>
fche: is that a yes?
<fche>
stack != static but yes
<fche>
the stack is AIUI the only place where one can allocate things in bpf land
<fche>
(other than the kernel-side map/etc. data structures)
<agentzh>
gotcha
<agentzh>
thanks for the info
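To make the fixed-size constraint concrete, here is a small C sketch of concatenation into a buffer whose size is fixed at compile time, as a stack slot would be. The 64-byte size is borrowed from the limit mentioned in the chat purely for illustration. `STR_MAX` and `str_concat` are hypothetical names, and this is not how stapbpf is known to implement the `.` operator.

```c
#include <assert.h>
#include <string.h>

#define STR_MAX 64   /* hypothetical fixed string size, including the NUL */

/* Concatenate a and b into out, silently truncating to STR_MAX - 1 bytes.
 * out must have room for STR_MAX bytes. Returns the resulting length. */
static size_t str_concat(char out[STR_MAX], const char *a, const char *b)
{
    size_t n = 0;

    assert(out != NULL && a != NULL && b != NULL);
    while (*a != '\0' && n < STR_MAX - 1)
        out[n++] = *a++;
    while (*b != '\0' && n < STR_MAX - 1)
        out[n++] = *b++;
    out[n] = '\0';
    return n;
}
```

Because the buffer cannot grow, any result longer than `STR_MAX - 1` bytes is cut off. Moving the storage into a map value would change where the bytes live, but the value size would still have to be chosen up front.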
<agentzh>
we're trying to achieve most of the stap kernel runtime capabilities in the ebpf route. it's a very bumpy road :)
<agentzh>
as i said, the ebpf world is still in stone age as compared to the stap flagship runtime.
<agentzh>
but i still have faith in ebpf just like serhei. since it shows a lot of promise.
<agentzh>
the stock kernel ebpf is lame though, with all due respect :)
<agentzh>
*stock kernel ebpf implementation
<fche>
you might not believe it, but in another thread, .... ummm.... kernel ebpf is held up as the magical standard-bearer of capability
<agentzh>
lol, i know there's a lot of hype around ebpf nowadays.
<agentzh>
i wonder if they ever go deep enough.
<agentzh>
or just happy enough with trivial things.
<fche>
trivial things are still useful
<fche>
but still.
<agentzh>
true.
<agentzh>
but our use cases definitely go way beyond trivial.
<agentzh>
even beyond what stap's flagship runtime can do.
<fche>
yup
<agentzh>
:)
<fche>
IMPOSSIBLE
<fche>
inconceivable
<agentzh>
:D
<fche>
un not impossible
<agentzh>
well, we already got rid of the ebpf verifier mostly in our own kernel.
<agentzh>
we only keep the necessary info collection work in the verifier. the info is needed by jit compiler and interpreter.
<agentzh>
so we cannot kill the whole verifier.
<agentzh>
and we're adopting the same actions counter mechanism for safety.
<agentzh>
as the stap kernel runtime.
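An actions counter of the kind mentioned above can be sketched like this in plain C. Each executed statement or loop iteration charges a per-probe budget, and the probe bails out once the budget is spent. The budget value, the struct, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIONS 1000   /* hypothetical budget per probe hit */

struct probe_ctx {
    int actions_left;      /* remaining budget */
    int aborted;           /* set once the budget is exhausted */
};

/* Charge n actions; returns 0 (and marks the probe aborted) when over budget. */
static int charge(struct probe_ctx *c, int n)
{
    assert(c != NULL && n > 0);
    if (c->actions_left < n) {
        c->aborted = 1;
        return 0;
    }
    c->actions_left -= n;
    return 1;
}

/* A loop whose trip count is not known statically; the counter bounds it. */
static long count_up(struct probe_ctx *c, long limit)
{
    long i = 0;

    while (i < limit) {
        if (!charge(c, 1))
            break;         /* out of budget: stop instead of spinning */
        i++;
    }
    return i;
}
```

This bounds run time dynamically, whereas a verifier tries to prove a bound before the program is ever loaded.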
<agentzh>
we shall see how far we can go down this route :)
<agentzh>
the stock ebpf verifier is to be damned...
<agentzh>
that's all the source of troubles and pains.
<fche>
ehhehhehehe
<agentzh>
i can finally stop throwing things from my desk :D
<fche>
brings back memories from, what, 2004, when stap had to choose a direction based on assumptions of what the kernel folks could live with
<fche>
little virtual machines were on the table back then too, but looked hopeless
<agentzh>
dtrace uses little in-kernel VM.
<fche>
yes.
<agentzh>
i believe in the in-kernel VM.
<agentzh>
i don't like dynamic .ko loading and unloading...
<agentzh>
especially when our customers require ko signing...
<fche>
plenty of reasons not to Prefer it, but you might Need it anyway
<fche>
you know stap has some signature capability, right?
<fche>
dynamic signing, yes, but with a key loaded into the mok/efi
<agentzh>
they require manual auditing and signing process.
<agentzh>
manual code auditing
<agentzh>
ah
<fche>
but they don't mind a frankenkernel with disabled bpf verifier? neato :)
<agentzh>
no, they don't and they don't care.
<agentzh>
just process.
<agentzh>
you know.
<agentzh>
*grin*
<fche>
many such cases
<agentzh>
most of our customers are not tech gurus like you guys.
<agentzh>
they just want transparency but never really understand it.
<agentzh>
it's the reality we live in.
<fche>
yeah, we all make/design compromises
<agentzh>
back to in-kernel VMs. do they look hopeful *nowadays*?
<fche>
well dunno, there is one now, so obviously
<agentzh>
in your opinion?
<agentzh>
okay, but they hook it up with a hopeless verifier.
<fche>
but you see what's done with the verifier
<agentzh>
right
fdalleau is now known as fdalleau_away
<agentzh>
well, we will still embrace stap's kernel runtime for the foreseeable future. we're just trying something new for use cases the existing stap runtime does not handle very well.
<agentzh>
like android, like kernel module (static) signing.
irker697 has quit [Quit: transmission timeout]
orivej has quit [Ping timeout: 240 seconds]
<serhei>
wait, I have faith in ebpf?
<serhei>
I just do things with it :/ it goes relatively smoothly because, on the contrary, I expect nothing from it
<serhei>
so I can only be surprised in a positive direction