In Bash you can start a new process in the background simply by ending the command with an ampersand (&). The wait builtin waits until all background processes have finished (to wait for one particular process, use wait PID, where PID is its process ID). So here’s a simple sketch of parallel processing:
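NPROC=0
for ARG in "$@"; do
    proc "$ARG" &          # proc is a placeholder for your own program
    NPROC=$((NPROC + 1))
    if [ "$NPROC" -ge 4 ]; then
        wait               # block until the whole batch of 4 has finished
        NPROC=0
    fi
done
wait                       # wait for the last, possibly incomplete, batch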
That is, you run four processes at a time and wait until all of them have finished before starting the next four. This works well if all the processes take about equally long to finish. However, it is suboptimal if their running times vary a lot: each batch takes as long as its slowest process, and the other slots sit idle in the meantime.
A better solution is to track the process IDs, poll whether each of them is still running, and start a new process as soon as one finishes. In Bash, $! expands to the process ID of the most recently started background process. On Linux, a directory /proc/PID exists for as long as the process with that PID exists.
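A minimal sketch of the polling idea, assuming Linux (it relies on /proc). It starts a few background jobs, records each PID from $!, and checks /proc/PID until none is left. The sleep commands are stand-ins for real work of uneven length, and is_running is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch (Linux-specific: relies on /proc): start background jobs,
# record their PIDs via $!, and poll /proc/PID until all have exited.
# "sleep" stands in for real work of uneven length.

is_running() {
    # On Linux, /proc/PID exists as long as the process does.
    [ -d "/proc/$1" ]
}

PIDS=()
for SECS in 0.2 0.4 0.6; do
    sleep "$SECS" &
    PIDS+=("$!")                # $! = PID of the most recent background job
done

while :; do
    RUNNING=0
    for PID in "${PIDS[@]}"; do
        if is_running "$PID"; then
            RUNNING=$((RUNNING + 1))
        fi
    done
    echo "$RUNNING of ${#PIDS[@]} jobs still running"
    [ "$RUNNING" -eq 0 ] && break
    sleep 0.1                   # poll interval
done
```

The count drops as the shorter jobs finish, so a scheduler built on this check can refill a free slot right away instead of waiting for a whole batch.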
Based on ideas from an Ubuntu forum thread and a template for command-line parsing, I wrote a simple script, “parallel”, that lets you run virtually any simple command concurrently.
Assume that you have a program proc and you want to run something like proc *.jpg using three concurrent processes. Then simply do
parallel -j 3 proc *.jpg
The script starts one process per file and keeps at most three of them running at any time; -j 3 sets the number of simultaneous jobs.
If the command needs options of its own, quote the command together with its options so the script can tell it apart from the list of arguments, e.g.
parallel -j 3 "proc -r -A=40" *.jpg
Furthermore, -r allows more sophisticated commands by replacing every asterisk in the command string with the argument:
parallel -j 6 -r "convert -scale 50% * small/small_*" *.jpg
That is, for each .jpg file this executes a command such as convert -scale 50% file1.jpg small/small_file1.jpg. This is a real-life example of scaling images down by 50% (it requires ImageMagick, and the directory small must already exist).
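The -r replacement rests on Bash's ${var//pattern/replacement} expansion, the same expansion the script uses. A small sketch, where expand_template is a hypothetical helper name: quoting "*" inside the expansion makes it a literal asterisk rather than a glob that would match the whole string.

```shell
#!/bin/bash
# Sketch of the -r substitution: every "*" in the command template is
# replaced by the current argument via ${var//pattern/replacement}.
TEMPLATE='convert -scale 50% * small/small_*'

expand_template() {
    local template=$1 arg=$2
    # "*" is quoted, so it matches a literal asterisk, not any string
    printf '%s\n' "${template//"*"/$arg}"
}

for FILE in file1.jpg file2.jpg; do
    expand_template "$TEMPLATE" "$FILE"
done
# prints:
# convert -scale 50% file1.jpg small/small_file1.jpg
# convert -scale 50% file2.jpg small/small_file2.jpg
```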
Finally, here’s the script. It is easy to adapt to other kinds of jobs: just write your command between the # DEFINE COMMAND and # DEFINE COMMAND END comments.
#!/bin/bash
NUM=0
QUEUE=""
MAX_NPROC=2 # default
REPLACE_CMD=0 # no replacement by default
USAGE="A simple wrapper for running processes in parallel.
Usage: `basename $0` [-h] [-r] [-j nb_jobs] command arg_list
-h Shows this help
-r Replace each asterisk * in the command string with the argument
-j nb_jobs Set the number of simultaneous jobs [2]
Examples:
`basename $0` somecommand arg1 arg2 arg3
`basename $0` -j 3 \"somecommand -r -p\" arg1 arg2 arg3
`basename $0` -j 6 -r \"convert -scale 50% * small/small_*\" *.jpg"
# Add a PID to the queue of running processes
function queue {
    QUEUE="$QUEUE $1"
    NUM=$((NUM + 1))
}
# Rebuild the queue, keeping only PIDs whose /proc entry still exists
function regeneratequeue {
    OLDREQUEUE=$QUEUE
    QUEUE=""
    NUM=0
    for PID in $OLDREQUEUE
    do
        if [ -d /proc/$PID ] ; then
            QUEUE="$QUEUE $PID"
            NUM=$((NUM + 1))
        fi
    done
}
# Rebuild the queue as soon as at least one queued process has finished
function checkqueue {
    OLDCHQUEUE=$QUEUE
    for PID in $OLDCHQUEUE
    do
        if [ ! -d /proc/$PID ] ; then
            regeneratequeue # at least one PID has finished
            break
        fi
    done
}
# parse command line
if [ $# -eq 0 ]; then # must be at least one arg
    echo "$USAGE" >&2
    exit 1
fi
while getopts j:rh OPT; do # "j:" takes an argument; "r" and "h" do not
    case $OPT in
        h) echo "$USAGE"
           exit 0 ;;
        j) MAX_NPROC=$OPTARG ;;
        r) REPLACE_CMD=1 ;;
        \?) # getopts has already printed an error message
            echo "$USAGE" >&2
            exit 1 ;;
    esac
done
# Main program
echo "Using $MAX_NPROC parallel jobs"
shift $((OPTIND - 1)) # shift input args, ignore processed options
COMMAND=$1
shift
for INS in "$@" # for the rest of the arguments
do
    # DEFINE COMMAND
    if [ $REPLACE_CMD -eq 1 ]; then
        CMD=${COMMAND//"*"/$INS}
    else
        CMD="$COMMAND $INS" # append args
    fi
    echo "Running $CMD"
    $CMD &
    # DEFINE COMMAND END
    PID=$!
    queue $PID
    while [ $NUM -ge $MAX_NPROC ]; do
        checkqueue
        sleep 0.4
    done
done
wait # wait for all processes to finish before exit
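On Bash 4.3 and later, which is newer than this script, the builtin wait -n blocks until any single background job finishes, so the /proc polling loop can be replaced by a blocking wait. This is a swapped-in technique, not how the script above works; run_job is a hypothetical stand-in for the real command.

```shell
#!/bin/bash
# Sketch (assumes Bash >= 4.3 for "wait -n"): keep at most MAX_NPROC jobs
# running and start a new one as soon as any job finishes, without polling.
MAX_NPROC=3

run_job() {                 # hypothetical placeholder for the real command
    sleep "$1"              # simulate work of varying length
    echo "done: $1"
}

NUM=0
for ARG in 0.3 0.1 0.2 0.1 0.4; do
    run_job "$ARG" &
    NUM=$((NUM + 1))
    if [ "$NUM" -ge "$MAX_NPROC" ]; then
        wait -n             # block until any one background job exits
        NUM=$((NUM - 1))
    fi
done
wait                        # wait for the remaining jobs
```

Because wait -n returns as soon as one job ends, a slot is refilled immediately and no CPU time is spent on sleep-and-check cycles.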
June 3, 2008 at 21:49
Change:
$CMD &
To:
eval "$CMD &"
If you want to do things like:
par.sh -r 'tr -d " " < * > $(basename * .txt)-stripped.txt' *.txt
Without the eval, the shell passes > and $(basename…) to tr as plain arguments.
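A small sketch of the difference, run in a throwaway directory from mktemp -d. Plain expansion of $CMD only splits the string into words, so a redirection inside it is never interpreted; eval re-parses the string as shell code. The plain variant is run in the foreground here only so its output can be captured. Since eval executes whatever the string contains, it should only be used with command strings you trust.

```shell
#!/bin/bash
# Sketch: why eval matters when a command string contains shell syntax.
cd "$(mktemp -d)" || exit 1      # throwaway working directory

CMD='echo hello > out.txt'

# Plain expansion: ">" and "out.txt" reach echo as ordinary arguments,
# so the text is printed and no file is written.
PLAIN=$($CMD)
echo "plain: $PLAIN"             # plain: hello > out.txt

# eval re-parses the string, so ">" becomes a real redirection.
eval "$CMD &"
wait
echo "file:  $(cat out.txt)"     # file:  hello
```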
October 18, 2008 at 13:53
Great script. Curiously, when I use it to batch-compress a folder of .wav files to .mp3, it doesn’t always take the same amount of time: sometimes it finishes in around 1m20s, sometimes in around 1m40s.
October 20, 2008 at 16:30
> Paul
Good point, never thought of that.
> Leon Roy
Hmm, maybe you have some other processes running that occasionally steal your CPU time. There’s a command-line utility called htop that you can use to monitor what your CPUs are actually doing…
December 1, 2008 at 12:03
Thank you, great script! I’m using it to spawn some server instances.
Any idea how to keep track (in Bash) of the spawned processes and kill them after N seconds?
Joe
December 2, 2008 at 9:40
Thanks!
I don’t know, I never tried that. But since you have the PIDs, you could poll the run times in the checkqueue routine and terminate processes if necessary. I suppose there is a way to get run times in Bash.
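One possible approach to killing jobs after N seconds, different from polling run times in checkqueue: pair each job with a small watchdog subshell that sleeps N seconds and then sends SIGTERM to the job's PID. This is a minimal sketch; start_with_timeout, JOB and WATCHDOG are hypothetical names, and sleep 10 stands in for a long-running server. If GNU coreutils is available, timeout N command achieves the same for a single command.

```shell
#!/bin/bash
# Sketch: run a job in the background and kill it if it is still running
# after a time limit (in seconds). Sets the globals JOB and WATCHDOG.
TIMEOUT=1

start_with_timeout() {
    local limit=$1; shift
    "$@" &                          # the actual job
    local pid=$!
    (
        sleep "$limit"
        kill "$pid" 2>/dev/null     # fails harmlessly if the job already exited
    ) &
    WATCHDOG=$!
    JOB=$pid
}

start_with_timeout "$TIMEOUT" sleep 10
wait "$JOB"
STATUS=$?                           # 128 + signal number if killed (143 for SIGTERM)
kill "$WATCHDOG" 2>/dev/null        # stop the watchdog if the job ended early
echo "job exited with status $STATUS"
```

One caveat: if a job exits early and its PID is reused before the watchdog fires, the watchdog could signal an unrelated process, so killing the watchdog as soon as the job ends matters.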