fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
derek0883 has quit [Remote host closed the connection]
mjw has quit [Quit: Leaving]
orivej has quit [Ping timeout: 246 seconds]
lijunlong has quit [Read error: Connection reset by peer]
derek0883 has joined #systemtap
lijunlong has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
<irker300> systemtap: fche systemtap.git:master * release-4.4-1-g1f608d213 / runtime/procfs.c runtime/transport/procfs.c: PR26665: relayfs-on-procfs megapatch, rhel6 tweaks
<irker300> systemtap: fche systemtap.git:master * release-4.4-2-g931e0870a / po/cs.po po/en.po po/fr.po po/pl.po po/systemtap.pot: releng: update-po
hpt has joined #systemtap
orivej has joined #systemtap
orivej has quit [Ping timeout: 240 seconds]
<kerneltoast> fche, hiya
<fche> too late
<fche> going to bed
<kerneltoast> hey it was a quick thing
<fche> NOT WANTING TO HEAR OF ANOTHER BUG
<fche> nope
<fche> NO SIR
<fche> NOOOO SIR.
<kerneltoast> *is a quick thing
<fche> ok go ahead
<kerneltoast> turns out checking preempt_count() was nonsense because the centos 7 kernel doesn't have CONFIG_PREEMPT_COUNT enabled
<kerneltoast> so disregard any of my findings involving preempt_count()
<irker300> systemtap: fche systemtap.git:master * release-4.4-3-g34e62f15d / runtime/stp_utrace.c: RHBZ1892179: handle exhausted stp_task_work structs
<kerneltoast> and lend me a shoulder to cry on
<kerneltoast> as i start debugging this back from step 1
<kerneltoast> :))))))))))))
<fche> haha ship it
<kerneltoast> kILL mE
<fche> ummmm no thanks
<kerneltoast> sigkill me?
<kerneltoast> how about sigsegv
<kerneltoast> anyway feel free to resume your canadian things
<fche> those things consist of ... well, sleeping.
khaled__ has quit [Quit: Konversation terminated!]
derek0883 has quit [Remote host closed the connection]
<agentzh> fche: the lockdep kernel found an old utrace deadlock bug, for which kerneltoast has prepared a patch. we're currently testing it. will share it here for your review. just FYI. sleep well :)
derek0883 has joined #systemtap
<kerneltoast> fche, check out this lockdep warning i got while trying to smoke out the lockup on a debug kernel: https://gist.github.com/kerneltoast/eb823dac163412193d7e283eb0845987
<kerneltoast> mutex_trylock used inside an interrupt
<kerneltoast> * This function must not be used in interrupt context. The
<kerneltoast> * mutex must be released by the same task that acquired it.
_whitelogger has joined #systemtap
orivej has joined #systemtap
derek0883 has quit [Remote host closed the connection]
<agentzh> kerneltoast fche: okay, it was introduced by one of my recent patches, like a few months ago?
<agentzh> i didn't know mutex_trylock() is unsafe in interrupt contexts.
<agentzh> according to kerneltoast, maybe we can simply disable preempts around that lock?
<agentzh> otherwise we would have to avoid print flushing in all those interrupt contexts.
<kerneltoast> disable interrupts, you mean
<agentzh> i'm fine with either approach. the 2nd approach would require a much larger changeset. we currently have no reliable way to know if we are in an interrupt context.
<agentzh> the probe handler caller has to pass down such info.
<agentzh> oh yeah, disable interrupts. sorry.
beauty1 has quit [Ping timeout: 260 seconds]
<agentzh> this is a fundamental problem we have to address cleanly.
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 244 seconds]
beauty1 has joined #systemtap
<kerneltoast> fche, a debug kernel caught the soft lockup and left a beautiful backtrace: https://gist.github.com/kerneltoast/d0661fa02773a2f2e30b8697261587f6
orivej has quit [Ping timeout: 246 seconds]
<kerneltoast> the .c file for the faulting stap module is in that gist as well (scroll down)
<agentzh> kerneltoast: seems like the first error is "BUG: unable to handle kernel paging request at ffffe8ffff621000"?
<agentzh> instead of the soft lockup?
orivej has joined #systemtap
<agentzh> it's in kretprobes, it seems.
<agentzh> and also in stp_lock_probe().
hpt has quit [Ping timeout: 265 seconds]
khaled__ has joined #systemtap
mjw has joined #systemtap
<fche> agentzh, kerneltoast, 'morning
<fche> both those tracebacks appear to relate to printing
<fche> yeah that inode trylock stuff sounds agentzh-ish familiar, doesn't it
<fche> and yeah we should be strictly atomic down in this part of the code, non-mutexy
orivej has quit [Ping timeout: 265 seconds]
<fche> agentzh, kerneltoast, guys, I feel so let down with you not being here all night with me
<fche> it's just not the same
tromey has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Ping timeout: 260 seconds]
orivej has joined #systemtap
xlei has quit [Quit: ZNC - https://znc.in]
xlei has joined #systemtap
<kerneltoast> fche, as much as I'd love to debug stap at 3am, i do enjoy sleeping :)
derek0883 has joined #systemtap
<fche> it's a tossup for me
<fche> sleep or debugging
<fche> tossing a coin every night
<kerneltoast> true, I've made major breakthroughs at 4am
<kerneltoast> and then when i read my code the next morning and realize i pushed it, i start feeling a knot in my stomach
<kerneltoast> I have a no-push rule after midnight
<fche> every moment is after -some- midnight tho
<kerneltoast> oh boy, does that mean i've been pushing garbage this whole time?
<fche> has always been . gif
<kerneltoast> i thought that was a static meme
<kerneltoast> not a gif
<fche> close enough!
* fche is old enough to remember gifs used as static images
<fche> the Before Times of <1992 when JPEG came on the scene
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
<kerneltoast> wow, before i was born
<kerneltoast> that must've been right after task finder was written
<fche> no, no, task finder comes from the paleozoic era
<fche> :)
<kerneltoast> ah, of course. i meant task finder2
<fche> :)
<fche> now now anyway it's fine code for its day and its context, and glad we're making it better
<kerneltoast> i wonder how often the original stap authors thought about giving up
<fche> NEVER
<fche> never give up, never surrender
<kerneltoast> there's just so much duct tape
<kerneltoast> and then duct tape to reinforce the duct tape
<kerneltoast> and then a kernel update breaks one layer of duct tape
<kerneltoast> so you gotta duct tape that
<fche> well, such is life with the (lack of) constraints of the kernel api evolution
<fche> but that's ok
<fche> so anyway
<fche> did y'all have any good news re. the bugs being chased down?
<kerneltoast> i'm going through https://gist.github.com/kerneltoast/d0661fa02773a2f2e30b8697261587f6 in ghidra right now...
<kerneltoast> the crash happened here because `str` was invalid:
<kerneltoast> for (i = 0; i < len && str <= end; ++i)
<kerneltoast>     *str++ = *ptr++;
<kerneltoast> else /* %s format */
<kerneltoast> and that snippet is from _stp_vsprint_memory, which was called from stp_printf_2
<kerneltoast> stp_printf_2 generates `str` as follows:
<kerneltoast> str = (char*)_stp_reserve_bytes(num_bytes);
<kerneltoast> so the pointer from _stp_reserve_bytes was poop
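[Editor's sketch] The copy loop quoted above only stays in bounds if both `str` and `end` already point into valid memory; the `str <= end` test cannot detect a reservation that was bad to begin with. The following is a minimal userspace model of that loop, not the SystemTap source. The function name `model_copy_bounded` and its signature are hypothetical, and `end` is assumed to be inclusive, as the `<=` in the quoted snippet suggests.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model of the quoted loop (hypothetical name and signature).
 * Copies at most `len` bytes from `ptr` to `str`, stopping once `str`
 * passes `end`, which is the last writable byte (inclusive).
 * Returns the position after the last byte written.
 * The caller must guarantee that [str, end] is valid memory: the loop
 * itself has no way to check that.
 */
static char *model_copy_bounded(char *str, char *end,
                                const char *ptr, size_t len)
{
    size_t i;

    assert(str != NULL && end != NULL && ptr != NULL);
    for (i = 0; i < len && str <= end; ++i)
        *str++ = *ptr++;
    return str;
}
```

With a 4-byte destination and a 6-byte source, the model writes exactly 4 bytes and stops. If `end` pointed past the real allocation, the same loop would keep writing until it faulted, which matches the symptom described in the log.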
<fche> well that's a poopy development
<fche> what was the ptr?
<kerneltoast> BUG: unable to handle kernel paging request at ffffe8ffff621000
<kerneltoast> must be use-after-free i guess
<fche> weird, wonder where that range of values points to
<kerneltoast> you mean that address?
<fche> yes
<kerneltoast> well, dmesg says the faulting process' task struct address is ffff88001f4f0000, and this is a little bit after that
<kerneltoast> so probably just kernel heap
<kerneltoast> (since task structs are allocated from the kernel heap)
<kerneltoast> i would try and find out when the values from _stp_reserve_bytes aren't safe to use
<fche> maybe back up on that inode mutex thingamabob from earlier on, and switch the mindset to "this must be atomic code"
<kerneltoast> not, "this must be atomic code", because it is atomic
<kerneltoast> but a mutex cannot be owned by interrupt context because it's not backed by a task struct
<kerneltoast> so the mutex owner would be nonsense, and that's what lockdep was warning about
<fche> yeah.
irker300 has quit [Quit: transmission timeout]
<kerneltoast> okay, it's an out of bounds issue
<kerneltoast> crash> kmem ffffe8ffff621000
<kerneltoast> ffffe8ffff621000: kernel virtual address not found in mem map
<kerneltoast> crash> kmem ffffe8ffff620fff
<kerneltoast> kmem: WARNING: cannot make virtual-to-physical translation: ffffe8ffff621000
<kerneltoast> VMAP_AREA         VM_STRUCT         ADDRESS RANGE                        SIZE
<kerneltoast> ffff880174af3800  ffff880174aea700  ffffe8fffde00000 - ffffe8ffffe00000  33554432
<kerneltoast> PAGE              PHYSICAL   MAPPING  INDEX             CNT  FLAGS
<kerneltoast> ffffea0005704a40  15c129000  0        ffff880174ae9e00  1    2fffff00000000
<kerneltoast> crash> kmem ffffe8ffff621000
<kerneltoast> kmem: WARNING: cannot make virtual-to-physical translation: ffffe8ffff621000
<kerneltoast> ffffe8ffff621000: kernel virtual address not found in mem map
<kerneltoast> crash>
<kerneltoast> oops i pasted too much
<kerneltoast> ffffe8ffff621000 is the faulting address, ffffe8ffff620fff is 1 byte before the faulting address
<kerneltoast> ffffe8ffff620fff is valid, ffffe8ffff621000 is not
<kerneltoast> so da loopdy loop went too far
<fche> hey, getting used to crash(1), good job
<fche> can you tell whether the loop has in fact overflown the thing, vs. started off at the bad address?
<kerneltoast> lemme see which register has `i` in it...
<kerneltoast> back into ghidra we go
<kerneltoast> `i` got optimized out, nice
<fche> step up out of the function and look there?
<kerneltoast> ok, the `end` pointer is in %r13, which is ffffe8ffff621004
<kerneltoast> so it started off at a bad address
<kerneltoast> _stp_reserve_bytes didn't reserve enough bytez
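[Editor's sketch] The address reasoning above can be checked with plain arithmetic. The constants below are copied from the crash(1) output and the %r13 value quoted in the log; the 4 KiB page size is an assumption (typical for x86_64), and all identifiers are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as is typical on x86_64. */
#define MODEL_PAGE_SIZE 4096ULL

/* Values taken from the log. */
static const uint64_t VMAP_START = 0xffffe8fffde00000ULL; /* vmap area start  */
static const uint64_t VMAP_END   = 0xffffe8ffffe00000ULL; /* vmap area end    */
static const uint64_t FAULT_ADDR = 0xffffe8ffff621000ULL; /* faulting address */
static const uint64_t END_PTR    = 0xffffe8ffff621004ULL; /* `end` in %r13    */

/* Size of the vmap area in bytes. */
static uint64_t model_vmap_size(void)
{
    return VMAP_END - VMAP_START;
}

/* True if addr lies inside the vmap address range reported by kmem. */
static bool model_in_vmap_area(uint64_t addr)
{
    return addr >= VMAP_START && addr < VMAP_END;
}

/* True if addr is the first byte of a page. */
static bool model_page_aligned(uint64_t addr)
{
    return (addr % MODEL_PAGE_SIZE) == 0;
}
```

The range size works out to the 33554432 bytes that crash(1) reports, the faulting address is inside that range and is the first byte of a page, and `end` sits 4 bytes beyond it. That is consistent with the conclusion in the log that the reserved region itself extended into an unmapped page, rather than the loop running past a valid `end`.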
<kerneltoast> _stp_reserve_bytes relies on _stp_print_flush to free up space
<kerneltoast> but _stp_print_flush can return prematurely due to lock contention
<kerneltoast> in which case pb->len will not be adjusted
<kerneltoast> err no, pb->len is just set to 0
<kerneltoast> in _stp_print_flush
<kerneltoast> before anything else is done
<kerneltoast> i suspect the issue is that the prints can be called from inside an interrupt, which ties back to the other bug
<kerneltoast> so pb->len is unreliable. we need to disable interrupts when reading pb->len inside _stp_reserve_bytes
<fche> lock contention -> dropped data is an acceptable price
<kerneltoast> yeah
<fche> but yeah when I say "atomic" it means "callable from any context, incl. interrupts etc."
<kerneltoast> supermegaatomic
<kerneltoast> so there are a couple things we can do
<kerneltoast> we could forbid printing inside interrupt probes at the translation layer
<kerneltoast> which would take care of the mutex bug as well
<kerneltoast> but i dunno how to catch every probe that's in an interrupt
<kerneltoast> so that might not be feasible
<kerneltoast> we can still do printing inside an interrupt, but we can only add those messages to the buffer. flushing isn't allowed
<kerneltoast> in that case, we'd just need to sprinkle in some local_irq_save into _stp_reserve_bytes and _stp_unreserve_bytes
<kerneltoast> and then add some machinery to forbid _stp_print_flush from running in an interrupt
<kerneltoast> this is a pain to fix :)
<fche> suppressing the flush is probably a manageable thing
<fche> heck it's probably a job for another task_work kind of callback maybe
<kerneltoast> no easy way to check if we're in an interrupt though
<kerneltoast> yeah we can dump it onto a regular worker but then we'll have no way to clear up the buffer from _stp_reserve_bytes
<fche> ok, like for the task-finder, delegating things to a task_work by default sounds fine to me
<kerneltoast> so we're gonna be dropping messages like crazy
<fche> well let's hope not
<kerneltoast> this check inside _stp_reserve_bytes will be converted to a message drop:
<kerneltoast> if (unlikely(numbytes > size))
<kerneltoast>     _stp_print_flush();
<fche> yeah ok
<fche> (there is a 'dropped' counter)
<fche> and we could use this opportunity to bump up default buffer sizes if indeed this is hit much more
<kerneltoast> i don't see any dropped counter being used
<kerneltoast> there's no such thing in the parent function calling _stp_reserve_bytes
<fche> hmmmmmmmmmmmmmmm
<fche> atomic_inc(&_stp_relay_data.dropped); <<< there's that but I thought we had another
<fche> (that's relay_v2.c)
<kerneltoast> if str is null it just silently drops the message
<fche> intelesting
<fche> well
<fche> let's add an atomic_t counter of those events happening and then figure out when/how to best print them
<fche> probably near shutdown, an stp_warn() thing
<fche> my my I'm surprised at a call to stp_print_flush from deep down atomic code like that, ... does seem fishy!
<kerneltoast> the atomic counter would need to be added inside translate.cxx and i think my brain will explode if i try doing it
<kerneltoast> unless you want it straight inside _stp_reserve_bytes
<fche> in whatever file has that function, yes
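[Editor's sketch] The idea discussed above, turning the "flush when full" path into "drop and count", can be modelled in userspace as follows. This is an illustration under stated assumptions, not the code in runtime/linux/print.c: the struct, the buffer size, and every `model_*` name are hypothetical, C11 atomics stand in for the kernel's `atomic_t`, and the interrupt masking the log calls for around `pb->len` is deliberately left out because it has no userspace equivalent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical size, kept tiny so that drops are easy to trigger. */
#define MODEL_BUF_SIZE 64

/* Hypothetical stand-in for a per-CPU print buffer. */
struct model_pbuf {
    size_t len;                /* bytes reserved so far */
    char buf[MODEL_BUF_SIZE];
};

/* Count of reservations refused for lack of space (static, so starts at 0). */
static atomic_ulong model_dropped;

/*
 * Reserve numbytes in the buffer. If there is not enough room, do NOT
 * flush: bump the drop counter and return NULL so the caller can skip
 * the message.
 */
static void *model_reserve_bytes(struct model_pbuf *pb, size_t numbytes)
{
    size_t room;
    void *ret;

    assert(pb != NULL && pb->len <= MODEL_BUF_SIZE);
    room = MODEL_BUF_SIZE - pb->len;
    if (numbytes > room) {
        atomic_fetch_add(&model_dropped, 1);
        return NULL;
    }
    ret = pb->buf + pb->len;
    pb->len += numbytes;
    return ret;
}

/* Read the drop counter, e.g. to report it once at shutdown. */
static unsigned long model_dropped_count(void)
{
    return atomic_load(&model_dropped);
}
```

Reserving 40 bytes twice in a 64-byte buffer succeeds once and is refused once, leaving the counter at 1. Reporting the counter in a single warning at shutdown, as suggested in the log, keeps the hot path free of any printing of its own.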
<kerneltoast> time to rewrite runtime/linux/print.c
<fche> on the bright side it's not big
<fche> but rewrite? er probably unnecessary.
<kerneltoast> the problem goes much deeper than _stp_reserve_bytes
<kerneltoast> there's also _stp_print and _stp_print_char
<kerneltoast> and _stp_vlog in runtime/linux/io.c
<kerneltoast> which uses a pointer allocated inside print.c (Stp_lbuf)
<kerneltoast> and also calls print_flush
<kerneltoast> though at least it's not relying on _stp_print_flush to free up space in the buffer
<fche> yeah all this code should be atomic, so print_flush needs to become atomic (and not assumed capable of freeing up space)
<kerneltoast> yep and that's why it's gonna look like a rewrite
<fche> hm there we have that _stp_transport_failure atomic in print_flush.c
<kerneltoast> that should be moved into print.c i guess
<fche> yeah probably
<fche> ok
<fche> wdyt, would have it done by tomorrow morning, say? who needs to sleep??!?!?!
<fche> :-)
<kerneltoast> does your irc client support utf?
<kerneltoast> 😊🔫
<fche> dunno if that's a steak or texas on the right
<kerneltoast> squint harder
<fche> oh god .... "watergun" emoji .... is that the PC version of "gun" or is that a separate thing?
<fche> Unicode Character 'PISTOL' (U+1F52B) sigh
<kerneltoast> you can thank apple for initially ruining it
<kerneltoast> then everyone else followed suit
<fche> thanks apple for initially ruining it etc. etc. etc.
<fche> ok
<fche> ok so thanks for digging into this area
<kerneltoast> i'd like to thank my friend ghidra for making this possible
<kerneltoast> along with objdump and crash(1)
orivej has quit [Ping timeout: 246 seconds]
<fche> wait
<kerneltoast> i'm waiting
<fche> I'm not giving you a trophy or something, and this is not a stage
<fche> there is no fancy-dressup party
<fche> so save your "thanks to your agent and your publicist" for later :)
<kerneltoast> can i thank my agentzh?
<fche> please
khaled__ has quit [Quit: Konversation terminated!]
tromey has quit [Quit: ERC (IRC client for Emacs 27.1.50)]
zodbot has quit [Read error: Connection reset by peer]
derek0883 has quit [Remote host closed the connection]
<agentzh> seems like we need more kernel geeks in EU to cover 24x7 :D
<agentzh> oh, aisa too.
<agentzh> *Asia
<agentzh> fche kerneltoast: i think we need a way to pass down the info about whether it is an interrupt context probe.
<agentzh> we lack such info atm.
<agentzh> fed by the probe handler caller.
<agentzh> in_atomic() and in_interrupt() are unreliable as we've seen.
<agentzh> fche: you mentioned you were working on the new transport impl. how was that going?
<kerneltoast> agentzh, that seems more difficult to orchestrate
<agentzh> do we need to coordinate?
<fche> it's not that new or big
<fche> and it's in
<agentzh> okay
<fche> agentzh, can't really pass in ... we can't generally know
<fche> so assume YES
<agentzh> so we just assume the worst?
<fche> the best :)
<agentzh> then we would never do print_flush() in probe handler?
<fche> yes
<agentzh> that's...sad...
<agentzh> we may flood the work queue?
derek0883 has joined #systemtap
<kerneltoast> flooding a workqueue is not a problem
<agentzh> okay
<agentzh> kerneltoast: you mentioned that we could simply disable interrupts around the mutex trylock thing?
<agentzh> that's already off the table now?
<kerneltoast> agentzh, yeah that's not correct
<agentzh> okay
<kerneltoast> i was mistaken
<agentzh> then it'll be a relatively large patch.
<kerneltoast> we can't acquire the mutex at all inside an interrupt, as we discussed
<kerneltoast> yep big patch
<agentzh> okay, i saw the messages now.
<kerneltoast> the print_flush will be delegated to schedule_work_on()
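[Editor's sketch] The plan above splits printing into two halves: the probe handler only appends to the buffer and requests a flush, and a worker running later in process context does the actual flush. The userspace model below illustrates that split only. It does not use the kernel workqueue API, every `model_*` name is hypothetical, and it is single-threaded, so it says nothing about the locking or interrupt masking a real implementation would need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one print channel. */
struct model_chan {
    size_t len;                /* bytes buffered and not yet flushed */
    size_t flushed_total;      /* bytes handed to the "transport"    */
    atomic_bool flush_pending; /* a flush has been requested         */
    char buf[128];
};

/* "Probe context": append only, never flush. Returns false on drop. */
static bool model_print(struct model_chan *c, const char *s)
{
    size_t n = strlen(s);

    if (n > sizeof(c->buf) - c->len)
        return false;          /* no room: drop the message */
    memcpy(c->buf + c->len, s, n);
    c->len += n;
    return true;
}

/*
 * "Probe context": ask for a deferred flush. Returns true only if this
 * call created the request, so repeated calls do not pile up.
 */
static bool model_request_flush(struct model_chan *c)
{
    return !atomic_exchange(&c->flush_pending, true);
}

/* "Worker context": perform the flush if one was requested. */
static size_t model_flush_worker(struct model_chan *c)
{
    size_t n;

    if (!atomic_exchange(&c->flush_pending, false))
        return 0;              /* nothing was requested */
    n = c->len;
    c->flushed_total += n;
    c->len = 0;
    return n;
}
```

Because a second request made while one is already pending is a no-op in this model, a burst of prints produces at most one outstanding flush per channel. The cost of deferring is the one raised in the log: until the worker runs, the buffer cannot be emptied, so a handler that fills it has to drop messages.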
<agentzh> gotcha
<kerneltoast> and we might need to increase the buffer size if we end up dropping a lot of messages
<agentzh> so many fundamental changes lately :)
<kerneltoast> yep, no rest for me
<kerneltoast> fix one bug and another comes crawling out :)
<agentzh> i'm worried about large buffers.
<agentzh> since we need to run it in embedded systems....
<agentzh> where ram is very limited.
<kerneltoast> stap is already quite liberal with its memory usage though
<agentzh> i know.
<agentzh> just don't want it to be a lot worse :)
<kerneltoast> how small of a system are you thinking of?
<agentzh> 512m
<agentzh> 1g at most...
<agentzh> some servers at also in this range.
<agentzh> *are
<agentzh> like the cheapest cloud servers.
<kerneltoast> stap currently works fine on these systems?
<agentzh> quirky on 512m, fine on 1g.
<agentzh> we have a lot of 1g servers ourselves in our mini CDN network.
<agentzh> it's been fine.
<agentzh> also we may have to run stap to debug large memory usage when the system is already swapping, etc...
<agentzh> so the smaller the better...
<agentzh> oh well fortunately we only output stuff in probe end {} and probe timer.s() ourselves.
<agentzh> not very frequently.
<agentzh> it'll be crazy to print() inside probe timer.profile.
zodbot has joined #systemtap
<agentzh> some stap tests do that for the sake of testing.
<agentzh> but that's it.
<kerneltoast> and that makes it all explode. i hope that's the cause for the lockup, and we'll get all three of these issues knocked out in one patch
<agentzh> the kernel space programming is a horror world due to all those interrupts, preempts, and locks.
<agentzh> i had a lot of sleepless nights due to those things.
<kerneltoast> hopefully you're sleeping better now :)
<agentzh> lockdep makes my life easier.
<kerneltoast> though maybe not while i'm running stuff on your machines
<agentzh> yeah, thanks to kerneltoast :)
<agentzh> i sleep a lot better now :)
<agentzh> don't worry about machines, we now shut the doors of the bedroom :)
<agentzh> so all yours.
<kerneltoast> awesome
derek0883 has quit [Remote host closed the connection]
<agentzh> yeah, fche doesn't mind big patches.
<agentzh> as long as you run his giant test suite :)
<fche> run it twice for good luck ;)
<agentzh> or even more times...
* agentzh has already offered his big machines for kerneltoast to burn.
* agentzh has solar panels on his roof fortunately.
wcohen has quit [*.net *.split]
CME has quit [*.net *.split]
wcohen has joined #systemtap
CME has joined #systemtap
mjw has quit [Quit: Leaving]
fLiPr3VeRsE has quit [*.net *.split]
modem has quit [*.net *.split]
xlei has quit [*.net *.split]
sscox has quit [*.net *.split]
eichiro has quit [*.net *.split]
eichiro has joined #systemtap
xlei has joined #systemtap
sscox has joined #systemtap
zodbot has quit [*.net *.split]
jistone has quit [*.net *.split]
modem has joined #systemtap
fLiPr3VeRsE has joined #systemtap
DTEIT has quit [*.net *.split]
DTEIT has joined #systemtap
zodbot has joined #systemtap
jistone has joined #systemtap
DUKENUKEM has quit [*.net *.split]
darvon has quit [*.net *.split]
serhei has quit [*.net *.split]
tux3 has quit [*.net *.split]
thibaultcha has quit [*.net *.split]
serhei has joined #systemtap
darvon has joined #systemtap
DUKENUKEM has joined #systemtap
* fche has bad news about the coming season
thibaultcha has joined #systemtap
tux3 has joined #systemtap
ema has quit [*.net *.split]
zamba has quit [*.net *.split]
zamba has joined #systemtap
ema has joined #systemtap
gavinguo___ has quit [*.net *.split]
gavinguo___ has joined #systemtap
lijunlong has quit [*.net *.split]
pviktori has quit [*.net *.split]
przemoc has quit [*.net *.split]
kerneltoast has quit [*.net *.split]
pviktori has joined #systemtap
przemoc has joined #systemtap
lijunlong has joined #systemtap
kerneltoast has joined #systemtap
ggherdov has quit [*.net *.split]
xar- has quit [*.net *.split]
lindi- has quit [*.net *.split]
tonyj has quit [*.net *.split]
agentzh has quit [*.net *.split]
fche has quit [*.net *.split]
xar- has joined #systemtap
ggherdov has joined #systemtap
lindi- has joined #systemtap
tonyj has joined #systemtap
agentzh has joined #systemtap
fche has joined #systemtap
ggherdov has quit [Ping timeout: 254 seconds]
derek0883 has joined #systemtap
khaled__ has joined #systemtap
ggherdov has joined #systemtap
derek0883 has quit [Ping timeout: 246 seconds]
derek0883 has joined #systemtap
orivej has joined #systemtap