如何在bash脚本中等待从该脚本派生的几个子进程完成,然后在任何子进程以code !=0结束时返回退出代码!=0?

简单的脚本:

#!/bin/bash
for i in `seq 0 9`; do
  doCalculations $i &
done
wait

上面的脚本将等待所有10个子进程,但它总是给出退出状态0(参见help wait)。我如何修改这个脚本,以便它将发现衍生子进程的退出状态,并在任何子进程以code !=0结束时返回退出代码1 ?

有没有比收集子进程的pid、按顺序等待它们并求和退出状态更好的解决方案呢?


当前回答

我看到这里列出了很多很好的例子,我也想把我的举出来。

#! /bin/bash

items="1 2 3 4 5 6"
pids=""

for item in $items; do
    sleep $item &
    pids+="$! "
done

for pid in $pids; do
    wait $pid
    if [ $? -eq 0 ]; then
        echo "SUCCESS - Job $pid exited with a status of $?"
    else
        echo "FAILED - Job $pid exited with a status of $?"
    fi
done

我使用非常类似的方法并行启动/停止服务器/服务,并检查每个退出状态。对我来说很好。希望这能帮助到一些人!

其他回答

我有一个类似的情况,但有各种各样的问题与循环子shell,确保这里的其他解决方案不能工作,所以我让我的循环编写脚本,我将运行,等待结束。有效:

#!/bin/bash
echo > tmpscript.sh
for i in `seq 0 9`; do
    echo "doCalculations $i &" >> tmpscript.sh
done
echo "wait" >> tmpscript.sh
chmod u+x tmpscript.sh
./tmpscript.sh

愚蠢,但简单,并帮助调试一些事后的事情。

如果我有时间,我会更深入地了解GNU并行,但这对我自己的“doCalculations”过程来说很困难。

这里有一个使用等待的简单例子。

运行一些进程:

$ sleep 10 &
$ sleep 10 &
$ sleep 20 &
$ sleep 20 &

然后用wait命令等待他们:

$ wait < <(jobs -p)

或者只是等待(没有争论)。

这将等待后台的所有作业完成。

如果提供了-n选项,则等待下一个作业终止并返回其退出状态。

参见:帮助等待和帮助作业语法。

然而,缺点是这将只返回最后一个ID的状态,因此您需要检查每个子流程的状态并将其存储在变量中。

或者让你的计算函数在失败时创建一些文件(空的或有失败日志的),然后检查该文件是否存在。

$ sleep 20 && true || tee fail &
$ sleep 20 && false || tee fail &
$ wait < <(jobs -p)
$ test -f fail && echo Calculation failed.

这是有效的,应该是一个很好的,如果不是更好的@HoverHell的答案!

#!/usr/bin/env bash

set -m # allow for job control
EXIT_CODE=0;  # exit code of overall script

function foo() {
     echo "CHLD exit code is $1"
     echo "CHLD pid is $2"
     echo $(jobs -l)

     for job in `jobs -p`; do
         echo "PID => ${job}"
         wait ${job} ||  echo "At least one test failed with exit code => $?" ; EXIT_CODE=1
     done
}

trap 'foo $? $$' CHLD

DIRN=$(dirname "$0");

commands=(
    "{ echo "foo" && exit 4; }"
    "{ echo "bar" && exit 3; }"
    "{ echo "baz" && exit 5; }"
)

clen=`expr "${#commands[@]}" - 1` # get length of commands - 1

for i in `seq 0 "$clen"`; do
    (echo "${commands[$i]}" | bash) &   # run the command via bash in subshell
    echo "$i ith command has been issued as a background job"
done

# wait for all to finish
wait;

echo "EXIT_CODE => $EXIT_CODE"
exit "$EXIT_CODE"

# end

当然,我已经在一个NPM项目中保存了这个脚本,它允许你并行运行bash命令,对测试很有用:

https://github.com/ORESoftware/generic-subshell

这里已经有很多答案了,但我很惊讶似乎没有人建议使用数组……这就是我所做的——这可能在将来对一些人有用。

n=10 # run 10 jobs
c=0
PIDS=()

while true

    my_function_or_command &
    PID=$!
    echo "Launched job as PID=$PID"
    PIDS+=($PID)

    (( c+=1 ))

    # required to prevent any exit due to error
    # caused by additional commands run which you
    # may add when modifying this example
    true

do

    if (( c < n ))
    then
        continue
    else
        break
    fi
done 


# collect launched jobs

for pid in "${PIDS[@]}"
do
    wait $pid || echo "failed job PID=$pid"
done

下面是我的版本,适用于多个pid,如果执行时间过长,则记录警告,如果执行时间超过给定值,则停止子进程。

[编辑]我已经在https://github.com/deajan/ofunctions上传了我的WaitForTaskCompletion的新实现,称为ExecTasks。 还有一个用于WaitForTaskCompletion的compat层 (/编辑)

function WaitForTaskCompletion {
    local pids="${1}" # pids to wait for, separated by semi-colon
    local soft_max_time="${2}" # If execution takes longer than $soft_max_time seconds, will log a warning, unless $soft_max_time equals 0.
    local hard_max_time="${3}" # If execution takes longer than $hard_max_time seconds, will stop execution, unless $hard_max_time equals 0.
    local caller_name="${4}" # Who called this function
    local exit_on_error="${5:-false}" # Should the function exit program on subprocess errors       

    Logger "${FUNCNAME[0]} called by [$caller_name]."

    local soft_alert=0 # Does a soft alert need to be triggered, if yes, send an alert once 
    local log_ttime=0 # local time instance for comparaison

    local seconds_begin=$SECONDS # Seconds since the beginning of the script
    local exec_time=0 # Seconds since the beginning of this function

    local retval=0 # return value of monitored pid process
    local errorcount=0 # Number of pids that finished with errors

    local pidCount # number of given pids

    IFS=';' read -a pidsArray <<< "$pids"
    pidCount=${#pidsArray[@]}

    while [ ${#pidsArray[@]} -gt 0 ]; do
        newPidsArray=()
        for pid in "${pidsArray[@]}"; do
            if kill -0 $pid > /dev/null 2>&1; then
                newPidsArray+=($pid)
            else
                wait $pid
                result=$?
                if [ $result -ne 0 ]; then
                    errorcount=$((errorcount+1))
                    Logger "${FUNCNAME[0]} called by [$caller_name] finished monitoring [$pid] with exitcode [$result]."
                fi
            fi
        done

        ## Log a standby message every hour
        exec_time=$(($SECONDS - $seconds_begin))
        if [ $((($exec_time + 1) % 3600)) -eq 0 ]; then
            if [ $log_ttime -ne $exec_time ]; then
                log_ttime=$exec_time
                Logger "Current tasks still running with pids [${pidsArray[@]}]."
            fi
        fi

        if [ $exec_time -gt $soft_max_time ]; then
            if [ $soft_alert -eq 0 ] && [ $soft_max_time -ne 0 ]; then
                Logger "Max soft execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]."
                soft_alert=1
                SendAlert

            fi
            if [ $exec_time -gt $hard_max_time ] && [ $hard_max_time -ne 0 ]; then
                Logger "Max hard execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]. Stopping task execution."
                kill -SIGTERM $pid
                if [ $? == 0 ]; then
                    Logger "Task stopped successfully"
                else
                    errrorcount=$((errorcount+1))
                fi
            fi
        fi

        pidsArray=("${newPidsArray[@]}")
        sleep 1
    done

    Logger "${FUNCNAME[0]} ended for [$caller_name] using [$pidCount] subprocesses with [$errorcount] errors."
    if [ $exit_on_error == true ] && [ $errorcount -gt 0 ]; then
        Logger "Stopping execution."
        exit 1337
    else
        return $errorcount
    fi
}

# Just a plain stupid logging function to be replaced by yours
function Logger {
    local value="${1}"

    echo $value
}

例如,等待所有三个进程完成,如果执行时间超过5秒,则记录警告,如果执行时间超过120秒,则停止所有进程。不要在失败时退出程序。

function something {

    sleep 10 &
    pids="$!"
    sleep 12 &
    pids="$pids;$!"
    sleep 9 &
    pids="$pids;$!"

    WaitForTaskCompletion $pids 5 120 ${FUNCNAME[0]} false
}
# Launch the function
someting