fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
mjw has quit [Quit: Leaving]
<kerneltoast> fche, small change to my patch to accommodate for nasty NMIs: https://gist.github.com/kerneltoast/3171a233f88d574e9712e49e926a59cf
derek0883 has joined #systemtap
hpt has joined #systemtap
derek0883 has quit [Remote host closed the connection]
<fche> kerneltoast, will think about it more tomorrow
<fche> I don't HATE it but I don't like it
<fche> ISTM this really should be driven from the kernel side
<kerneltoast> agreed but our hands are tied
<kerneltoast> file ops hacking sounds possible at least
<fche> well, the per-cpu timer stuff is not a dead idea yet
<kerneltoast> ehhhhhhh
<kerneltoast> i would take file ops hacking over per-cpu timer
<kerneltoast> for our usecase, dropping messages is intolerable, so i'm aiming for maximum subbuf utilization to avoid that
<kerneltoast> file op hacking would allow even better utilization than my staprun patch
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
orivej has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 272 seconds]
derek0883 has joined #systemtap
fdalleau_away has quit [Quit: Coyote finally caught me]
khaled has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
fdalleau has joined #systemtap
derek0883 has quit [Ping timeout: 265 seconds]
wmealing has quit [Ping timeout: 256 seconds]
hassan64 has joined #systemtap
hassan64 has left #systemtap [#systemtap]
hassan64 has joined #systemtap
hassan64 has quit [Client Quit]
mjw has joined #systemtap
hpt has quit [Ping timeout: 240 seconds]
orivej has quit [Ping timeout: 265 seconds]
jhg_ has joined #systemtap
tromey has joined #systemtap
amerey has joined #systemtap
derek0883 has joined #systemtap
orivej has joined #systemtap
<kerneltoast> fche, bonjour
<fche> oui
<kerneltoast> omelette du fromage
<fche> comme d'habitude
<kerneltoast> oui oui
<kerneltoast> (gf speaks french, i'll stop while i'm ahead)
<fche> I snort french
<kerneltoast> so i asked agentzh to stress test my staprun patch yesterday
<kerneltoast> it's a big improvement but not good enough
<kerneltoast> it'll still drop messages for our heaviest usecase
<fche> stap -s XXXX ?
* kerneltoast runs man stap
<fche> oh btw weren't we going to switch to STAP_TRANS_PROCFS by default?
<kerneltoast> yes we were
<kerneltoast> but you were busy :P
<fche> mon dieu
<kerneltoast> dior givenchy
<fche> haute couture
<kerneltoast> yeah i have no idea how to make STAP_TRANS_PROCFS the default
<kerneltoast> is that some autoconf thing?
<fche> nah, simpler
<kerneltoast> removing the ifdefs?
<kerneltoast> inverting the ifdefs?
<kerneltoast> reverting the ifdefs?
<kerneltoast> anyway, re: printing, i'm going to try hacking file ops
<fche> re: that, ok
<fche> re. defaulting to procfs ....
<fche> could be as simple as
<fche> procfs_p = 1; // default to procfs as the new first line of _stp_transport_fs_init
<kerneltoast> ah it's hardcoded
<fche> sure
<kerneltoast> noice
<kerneltoast> shiptime
<fche> bon
irker366 has joined #systemtap
<irker366> systemtap: sultan systemtap.git:master * release-4.4-68-g4706ab3ca / runtime/transport/transport.c: runtime: default to using procfs for the transport
jhg_ has quit [Quit: adde parvum parvo magnus acerrus erit]
orivej has quit [Ping timeout: 264 seconds]
<kerneltoast> fche, holy crap we can pass our own file ops struct to relay
<kerneltoast> i thought relay decided what file ops struct you got
<kerneltoast> you wat
<kerneltoast> anyway this is great, i can do some beautiful hackering now
<fche> now now try not to get too excited
<fche> with great power comes great resistivity
<fche> oh yeah
fdalleau is now known as fdalleau_away
<kerneltoast> ah the days of going to staples and admiring the new centrino 2 laptops
derek088_ has joined #systemtap
derek0883 has quit [Ping timeout: 256 seconds]
derek088_ has quit [Ping timeout: 260 seconds]
derek0883 has joined #systemtap
tromey has quit [Quit: ERC (IRC client for Emacs 27.1)]
<fche> does it work okay?
<fche> in __stp_relay_file_read, do you need to do that subbufs_produced/consumed check? isn't it something done by relay_file...read() ?
<fche> I'm generally liking it tho
<kerneltoast> yes that check is needed
<kerneltoast> and yes it works okay
<fche> is it just okay
<kerneltoast> passes the stress test, our testsuite, and just printing a single message without exiting
<fche> or is it AMAZING AWESOME
<kerneltoast> it is AMAZEING
<fche> well then
<fche> ship it
<kerneltoast> gonna take it for a spin yourself first?
<fche> hell it's friday, let's let the buildbots have it
<kerneltoast> i forgot to eat lunch while writing this so now I'm in a parking lot sipping my jamba
<fche> if the *produced/consumed check is needed, can you add a blurb why?
<fche> what is a jamba
<kerneltoast> jamba = smoothie chain store
<fche> you are sipping a chain store?
<fche> are you some kind of giant alien blob ?
<kerneltoast> yes
<fche> i'm okay with that
<kerneltoast> i am *the* blob in fact
<kerneltoast> 1950's movie i think
<kerneltoast> I'm trying to think of why the check is needed and nothing's coming to mind
<kerneltoast> i printk'd it earlier and it was needed
<kerneltoast> i wasn't entirely sure why
<kerneltoast> i can spin it as an optimization
<kerneltoast> but yeah i dunno
<fche> yeah I think the same checks are subsumed in relayfs code proper
<fche> relay_file_read_avail() e.g
<fche> but behind inode_locks and other such guff so yeah I buy optimization
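A rough way to picture the "optimization" reading of that check: compare the produced and consumed counters first, and only go down the locked read path when they differ. The sketch below is a toy user-space model, not the kernel's relay code or the actual patch; the class `ToyRelayBuf`, its methods, and the `lock_acquisitions` counter are all hypothetical names made up for illustration. Only the counter names `subbufs_produced` and `subbufs_consumed` come from the discussion.

```python
import threading


class ToyRelayBuf:
    """Toy stand-in for a relay buffer (hypothetical, not the kernel struct)."""

    def __init__(self):
        self.subbufs_produced = 0
        self.subbufs_consumed = 0
        self.lock = threading.Lock()
        self.lock_acquisitions = 0  # counts how often the slow path ran

    def produce(self):
        # A writer finished filling one sub-buffer.
        self.subbufs_produced += 1

    def _locked_read(self):
        # Slow path: repeats the same emptiness check, but under the lock.
        with self.lock:
            self.lock_acquisitions += 1
            if self.subbufs_produced == self.subbufs_consumed:
                return 0
            self.subbufs_consumed += 1
            return 1

    def read(self):
        # Fast path: nothing pending, so skip the lock entirely.
        if self.subbufs_produced == self.subbufs_consumed:
            return 0
        return self._locked_read()


buf = ToyRelayBuf()
empty_result = buf.read()   # no data yet, lock never taken
buf.produce()
full_result = buf.read()    # one sub-buffer consumed via the locked path
```

In this model, dropping the early check changes nothing about the results; it only means every empty read pays for the lock, which matches the "it's just an optimization" conclusion reached later in the log.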
<kerneltoast> s0weet
<kerneltoast> it'll be a somewhat deceptive comment
<kerneltoast> because without that check things explode
<fche> well .... in that case the somewhat deceptive comment should not say that :)
<fche> things explode? really?
<agentzh> fche: ah, just remember i have a bugfix patch for you to review too...
<agentzh> a sec
<agentzh> this bug broke most of the asm routines' unwinding in openssl's libcrypto, for example.
<fche> wow, um how could this bug be there so long, lemme see
<agentzh> it's been bothering us for months if not years...
<agentzh> i had to dig it up by reading a lot of stuff about dwarf myself...
<agentzh> alas.
<agentzh> fortunately the fix is simple.
<agentzh> once i know what it's doing.
<fche> mjw is here but probably asleep
<fche> but yeah that looks more than plausible to me
<agentzh> we're so happy that we can finally unwind openssl's stacks...
<fche> we must not have many DW_OP_deref* ops in the unwind code of other code, wonder what makes openssl different
<fche> inline asm maybe or something else super optimized
<fche> anyway ship it thanks
<agentzh> yeah openssl uses dwarf expressions with DW_OP_deref a lot for its asm routines.
<agentzh> in DW_CFA_def_cfa_expression.
<fche> neat
<agentzh> like this: cfa={[0] breg7 -8; [2] deref; [3] plus_uconst 8}
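For readers not used to DWARF expressions, the CFA rule quoted above is a tiny stack program: push (register 7 minus 8), replace that address with the value stored there, then add 8. The sketch below is a toy evaluator for just those three operations, to show why each step has to leave its result on the expression stack. It is not the stap unwinder's code; the function `eval_cfa`, the tuple encoding of the ops, and the register and memory values are all made up for illustration.

```python
def eval_cfa(ops, regs, memory):
    """Evaluate a tiny subset of DWARF expression ops on a value stack."""
    stack = []
    for op, arg in ops:
        if op == "breg":
            regno, offset = arg
            stack.append(regs[regno] + offset)
        elif op == "deref":
            addr = stack.pop()
            # The loaded value must go back on the stack; if this push were
            # skipped, the next op would find nothing to operate on.
            stack.append(memory[addr])
        elif op == "plus_uconst":
            stack.append(stack.pop() + arg)
        else:
            raise ValueError(f"unsupported op: {op}")
    return stack[-1]


# Illustration values only. DWARF register 7 is rsp on x86-64.
regs = {7: 0x1000}
memory = {0x0FF8: 0x2000}  # pretend an earlier stack pointer was saved at rsp-8

cfa_ops = [("breg", (7, -8)), ("deref", None), ("plus_uconst", 8)]
cfa = eval_cfa(cfa_ops, regs, memory)
print(hex(cfa))  # 0x2008
```

With these made-up values the evaluation goes 0x1000 - 8 = 0xff8, then the load yields 0x2000, then adding 8 gives 0x2008.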
<agentzh> cool, will push it.
<agentzh> will also do more code review for those "cold" dwarf instructions' handling in the stap unwinder.
<fche> we probably won't see a case in the testsuite for this
<fche> thanks dude
<agentzh> yeah it's rare.
orivej has joined #systemtap
<agentzh> not seen anything like that with gcc/clang's emitted code.
<agentzh> the openssl asm routines also like to use rax as the frame pointer...
<agentzh> and move rsp across the asm routine's boundary like crazy.
<agentzh> oh man.
<agentzh> it's bloody.
<fche> there oughtta be a law
<kerneltoast> fche, just tested it and removing the check is okay. it was necessary on an old patch version. so it's just an optimization
amerey_ has joined #systemtap
<fche> kerneltoast, ok, thanks for checking
* fche makes a note to doubt kerneltoast twice as much from now on :)
* kerneltoast was hungry and confused from all this relay code melting my brain
<fche> go sip on something more substantial than a corner store
<fche> maybe try a train station
<agentzh> kerneltoast: late lunch?
<agentzh> take care, man
<kerneltoast> agentzh, forgot to eat lunch while writing that patch lol
<agentzh> lunch time then :)
<kerneltoast> just had a smoothie yeah
amerey has quit [Ping timeout: 240 seconds]
<kerneltoast> now my brain is fresh and ready for new abuse from fche
<fche> COMING UP
<agentzh> fche: yeah there must be some law there. it must be some weird one.
<irker366> systemtap: sultan systemtap.git:master * release-4.4-69-g175f2b068 / runtime/transport/procfs.c runtime/transport/relay_v2.c: runtime: utilize relay subbufs as much as possible
<kerneltoast> fche, agentzh, print patch shipped :)
<kerneltoast> \o/
amerey has joined #systemtap
amerey_ has quit [Ping timeout: 240 seconds]
<irker366> systemtap: yichun systemtap.git:master * release-4.4-70-ge8ac3e296 / runtime/unwind.c: bugfix: unwinder: expr: DW_OP_push*: we forgot to push the result to the dwarf stack.
<agentzh> kerneltoast: hooray!
amerey has quit [Remote host closed the connection]
<agentzh> kerneltoast: oy, the latest master breaks our lean test suite horrible.
<agentzh> *horribly
<kerneltoast> huh? i've been running it though
<agentzh> kerneltoast: the first test failure: https://gist.github.com/agentzh/bff39bed69d6582eec3413eed5a4be90
<kerneltoast> ah i wasn't using procfs
<fche> agentzh, do you have an odd sparse cpu# smp box?
<agentzh> nope
<kerneltoast> agentzh, using -DSTAP_TRANS_PROCFS=1 even before all of my patches from this week results in the same testsuite failures
<kerneltoast> agentzh, i checked out to before "transport: procfs: fix transposed procfs removal ordering" and still have the same error
<kerneltoast> so it's not anything i did :\
<fche> agentzh, can you fpaste your /proc/cpuinfo -- I'm curious about the apparent mismatch
<fche> anything in dmesg about a different processor enumeration situation?
<kerneltoast> no
<kerneltoast> unless you mean on boot?
<kerneltoast> what message am i looking for
<fche> can also run with -DDEBUG_TRANS=3 or something like that to get a little more info
<fche> interesting, note the create_buf_file_callback i=15 case
<fche> that should be the trace15 file
<fche> one way to debug this a bit is to insmod the stap .ko file
<fche> and to then spy on the /proc/systemtap/*/ directory
<fche> there should be a .cmd and traceNNN [0..#cpus-1] files
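The check described above can be scripted. The sketch below builds the list of names one would expect (a `.cmd` file plus one trace file per CPU, numbered from 0) and reports which are absent from a directory. The helper names are hypothetical, and the exact `trace<N>` naming is an assumption based on the "trace15" remark in this log rather than on the runtime source.

```python
import os


def expected_transport_files(num_cpus):
    """Names expected in the module's procfs directory (assumed naming)."""
    return [".cmd"] + [f"trace{i}" for i in range(num_cpus)]


def missing_transport_files(module_dir, num_cpus):
    """Return the expected names that are not present in module_dir."""
    present = set(os.listdir(module_dir))
    return [name for name in expected_transport_files(num_cpus)
            if name not in present]
```

One would call it as `missing_transport_files(path, os.cpu_count())`, with `path` pointing at the module's directory under /proc/systemtap/; an empty list means every expected file is there.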
<kerneltoast> that's in /proc/systemtap/stap_245a2bbf3a9ea733c33e6bc23763f8cb_1274
<fche> that looks lovely