fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
hpt has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
hpt has quit [Ping timeout: 256 seconds]
hpt has joined #systemtap
hpt has quit [Ping timeout: 256 seconds]
khaled_ has quit [Quit: Konversation terminated!]
orivej has joined #systemtap
hpt has joined #systemtap
orivej has quit [Ping timeout: 246 seconds]
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
hpt has quit [Ping timeout: 240 seconds]
khaled has joined #systemtap
orivej has joined #systemtap
orivej has quit [Ping timeout: 260 seconds]
orivej has joined #systemtap
orivej has quit [Ping timeout: 264 seconds]
orivej has joined #systemtap
orivej has quit [Ping timeout: 260 seconds]
orivej has joined #systemtap
orivej has quit [Ping timeout: 240 seconds]
orivej has joined #systemtap
derek0883 has joined #systemtap
derek0883 has quit [Read error: Connection reset by peer]
derek0883 has joined #systemtap
derek0883 has quit [Remote host closed the connection]
derek0883 has joined #systemtap
khaled has quit [Quit: Konversation terminated!]
khaled has joined #systemtap
<kerneltoast>
fche, bonjour
<kerneltoast>
out fishing i guess?
khaled has quit [Remote host closed the connection]
irker718 has joined #systemtap
<irker718>
systemtap: sultan systemtap.git:master * release-4.4-47-gb26b4e2c2 / runtime/linux/task_finder2.c: task_finder2: fix panics due to broken task work cancellation
khaled has joined #systemtap
orivej has quit [Ping timeout: 256 seconds]
<kerneltoast>
fche, that patch i just pushed didn't fix the issue entirely. now there's a new, rarer bug when `tf_work == tmp`
<kerneltoast>
because list_for_each_entry_safe doesn't expect you to free the next element of the list
<fche>
maybe a while(list-not-empty) { work = .... stp_kfree (tf_work) ; } instead of that list_for_each_entry_safe ?
<fche>
where node = list head every time?
<kerneltoast>
can't because list head could be an active task worker that we always fail to cancel
<fche>
aha. is that the only case of concern for list_for_each .. ?\
<kerneltoast>
yeah, if we free a node way later into the list or if we free the current node, it's no problem
<kerneltoast>
but if we free the next node, everything combusts
<fche>
can't go backward can it?
<kerneltoast>
nope
<fche>
ok, please add a comment to the code to explain why
<kerneltoast>
of course
orivej has joined #systemtap
<kerneltoast>
fche, i have a theory that there could be another issue in here
<kerneltoast>
so, when __stp_tf_cancel_all_task_work() finishes, there can still be taskfinder task workers running
<kerneltoast>
when that happens, we rely on the belt and suspenders in the _stp_task_work API to wait until all task workers are freed before unloading the module
<kerneltoast>
the potential problem i see is that the instructions to notify the _stp_task_work API of a task worker finishing are not the last ones in the actual task worker function
<kerneltoast>
take a look at __stp_tf_task_worker_fn()
<kerneltoast>
tf_work->func(&tf_work->work); <--- this is where the _stp_task_work API will be notified of the final task worker being freed
<fche>
can we not wait until all are done before we let cancel_all_task_work return?
<kerneltoast>
any ideas for how to do that?
<kerneltoast>
we could also move some code around so that tf_work->func() is not responsible for notifying the _stp_task_work API
<fche>
how to wait until all are done? wait until we have no "fail to cancel" in __stp_tf_cancel_all_task_work ?
<kerneltoast>
but we have to release the lock for that to occur
<irker718>
systemtap: sultan systemtap.git:master * release-4.4-48-g96470399a / runtime/linux/task_finder2.c: task_finder2: fix list corruption in __stp_tf_cancel_all_task_work()
<irker718>
systemtap: sultan systemtap.git:master * release-4.4-49-g6cb54128e / runtime/linux/task_finder2.c: task_finder2: fix task worker race on module unload