Monday, March 28, 2011

How to terminate script's process tree in Cygwin bash from bash script

I have a Cygwin bash script that I need to watch and terminate under certain conditions - specifically, after a certain file has been created. However, I'm having difficulty figuring out exactly how to terminate the script as completely as Ctrl+C does.

Here's a simple script (called test1) that does little more than wait around to be terminated.

#!/bin/bash

test -f kill_me && rm kill_me

touch kill_me
tail -f kill_me

If this script is run in the foreground, Ctrl+C will terminate both the tail and the script itself. If the script is run in the background, a kill %1 (assuming it is job 1) will also terminate both tail and the script.

However, when I try to do the same thing from a script, I'm finding that only the bash process running the script is terminated, while tail hangs around disconnected from its parent. Here's one way I tried (test2):

#!/bin/bash

test -f kill_me && rm kill_me

(
    touch kill_me
    tail -f kill_me
) &

while true; do
    sleep 1
    test -f kill_me && {
        kill %1
        exit
    }
done

If this is run, the bash subshell running in the background is terminated OK, but tail still hangs around.

If I use an explicitly separate script, like this, it still doesn't work (test3):

#!/bin/bash

test -f kill_me && rm kill_me

# assuming test1 above is included in the same directory
./test1 &

while true; do
    sleep 1
    test -f kill_me && {
        kill %1
        exit
    }
done

tail is still hanging around after this script is run.

In my actual case, the process creating files is not particularly instrumentable, so I can't get it to terminate of its own accord; by finding out when it has created a particular file, however, I can at that point know that it's OK to terminate it. Unfortunately, I can't use a simple killall or equivalent, as there may be multiple instances running, and I only want to kill the specific instance.

From stackoverflow
  • This script looks like it'll do the job:

    #!/bin/bash
    # Author: Sunil Alankar
    
    ##
    # recursive kill. kills the process tree down from the specified pid
    #
    
    # foreach child of pid, recursive call dokill
    dokill() {
        local pid=$1
        local itsparent=""
        local aprocess=""
        local x=""
        # ps -f with the header removed, reduced to "pid ppid" pairs
        for x in `/bin/ps -f | sed -e '/UID/d;s/[a-zA-Z0-9_-]\{1,\} \{1,\}\([0-9]\{1,\}\) \{1,\}\([0-9]\{1,\}\) .*/\1 \2/g'`
        do
            if [ "$aprocess" = "" ]; then
                aprocess=$x
                itsparent=""
                continue
            else
                itsparent=$x
                if [ "$itsparent" = "$pid" ]; then
                    dokill $aprocess
                fi
                aprocess=""
            fi
        done
        echo "killing $1"
        kill -9 $1 > /dev/null 2>&1
    }
    
    case $# in
    1) PID=$1
            ;;
    *) echo "usage: rekill <top pid to kill>";
            exit 1;
            ;;
    esac
    
    dokill $PID
    
    Barry Kelly : The script doesn't work unmodified in Cygwin, but it was a starting point. Upvoted, but with a working script in my own answer.
  • Adam's link pointed me in a direction that solves the problem, albeit with some minor caveats.

    The script doesn't work unmodified under Cygwin, so I rewrote it and added a couple more options. Here's my version:

    #!/bin/bash
    
    function usage
    {
        echo "usage: $(basename "$0") [-c] [-<sigspec>] <pid>..."
        echo "Recursively kill the process tree(s) rooted by <pid>."
        echo "Options:"
        echo "  -c        Only kill children; don't kill root"
        echo "  <sigspec> Arbitrary argument to pass to kill, expected to be signal specification"
        exit 1
    }
    
    kill_parent=1
    sig_spec=-9
    
    function do_kill # <pid>...
    {
        kill "$sig_spec" "$@"
    }
    
    function kill_children # pid
    {
        local target=$1
        local pid=
        local ppid=
        local i
        # Returns alternating ids: first is pid, second is parent.
        # tail -n +2 drops the header line; cut keeps the PID and PPID columns.
        for i in $(ps -f | tail -n +2 | cut -b 10-24); do
            if [ -z "$pid" ]; then
                # first in pair
                pid=$i
            else
                # second in pair
                ppid=$i
                (( ppid == target && pid != $$ )) && {
                    kill_children $pid
                    do_kill $pid
                }
                # reset pid for next pair
                pid=
            fi
        done
    
    }
    
    test -n "$1" || usage
    
    while [ -n "$1" ]; do
        case "$1" in
            -c)
                kill_parent=0
                ;;
    
            -*)
                sig_spec="$1"
                ;;
    
            *)
                kill_children $1
                (( kill_parent )) && do_kill $1
                ;;
        esac
        shift
    done
    

    The only real downside is the somewhat ugly message that bash prints out when it receives a fatal signal, namely "Terminated", "Killed" or "Interrupted" (depending on what you send). However, I can live with that in batch scripts.
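
The fixed byte range passed to cut ties the script above to one particular ps -f column layout. As a rough sketch of the same parent/child walk that selects by field instead, assuming PID is field 2 and PPID is field 3 of the listing: the names children_of and descendants_of and the sample snapshot below are made up for illustration and are not part of the script above.

```shell
#!/bin/bash

# children_of <pid>: read `ps -f`-style text on stdin and print the PID of
# each process whose PPID equals <pid>. Assumes a one-line header, PID in
# field 2 and PPID in field 3.
children_of() {
    awk -v target="$1" 'NR > 1 && $3 == target { print $2 }'
}

# descendants_of <pid> <snapshot>: print every descendant of <pid>, deepest
# first, so children can be signalled before their parents.
descendants_of() {
    local child
    for child in $(printf '%s\n' "$2" | children_of "$1"); do
        descendants_of "$child" "$2"
        echo "$child"
    done
}

# Made-up snapshot for illustration; real use would be snapshot=$(ps -f).
snapshot='     UID     PID    PPID  TTY        STIME COMMAND
   barry     100       1 pty0     10:00:00 /usr/bin/bash
   barry     200     100 pty0     10:00:01 /usr/bin/bash
   barry     300     200 pty0     10:00:02 /usr/bin/tail
   barry     400       1 pty1     10:00:03 /usr/bin/bash'

descendants_of 100 "$snapshot"
```

The list could then be handed to kill in one call. Because it is built from a single snapshot, a process that forks after the snapshot was taken would be missed, which is a limitation the script above shares.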

  • /bin/kill (the program, not the bash builtin) interprets a negative PID as “kill the process group”, which will get all the children too.

    Changing

    kill %1
    

    to

    /bin/kill -- -$$
    

    works for me.

    Barry Kelly : Thanks! It's odd that that is not documented in the Cygwin kill manpage. It does, however, work with the Cygwin version of kill.
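
A variation on the same idea, for cases where the watcher script should not signal itself: a minimal sketch, assuming it is acceptable to turn on job control with set -m so that the background job gets its own process group. The workdir variable and the one-second sleep are illustrative only. The sketch calls plain kill, which in bash also accepts a negative PID; substitute /bin/kill if your shell's builtin does not.

```shell
#!/bin/bash

# Job control on: every background job is placed in its own process group.
set -m

workdir=$(mktemp -d)    # scratch directory used only by this sketch

# Stand-in for test1: a subshell whose tail child would otherwise be orphaned.
(
    touch "$workdir/kill_me"
    tail -f "$workdir/kill_me"
) > /dev/null 2>&1 &
pgid=$!                 # the job leader's PID doubles as the process group ID

sleep 1                 # crude wait for the subshell to start tail

# The signal name comes first, so the negative number is read as a process
# group rather than as a signal specification.
kill -TERM "-$pgid"
group_signalled=$?

wait "$pgid" 2>/dev/null
job_status=$?           # 128 + signal number when the job died from a signal

rm -rf "$workdir"
echo "kill returned $group_signalled, job exited with $job_status"
```

Only the job's group is signalled here, so the script that does the watching keeps running and can clean up or exit in its own time.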
