I recently discovered the xargs --max-procs feature.
How can I split the output of the command by process? Should I just add something like mycommand --logfile $LOGFILE, or can I do it from xargs itself?
An example (for womble):
Suppose I have a script, myprocessor.sh, and a list of files. They can be processed in any order, but I want to keep the logging for each one separate. Then:
find "$MY_FILE_TREE" -print0 | xargs --null --max-procs 3 --max-args 1 --no-run-if-empty myprocess.sh
might be the parallel job I want to run. If myprocessor.sh is mouthy, I'd like each invocation to print to a different log. Otherwise all invocations share the same stdout, and the logs get jumbled.
-
You could do this by running your xargs command through a shell - this will let you redirect the output - something like this:
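find blah -type f -print0 | xargs -0 -P 4 -I{} sh -c 'yourcommand --input "$1" > "$1.output"' _ {}
...you'll probably have to tweak it a bit. xargs replaces {} with the item/file it's working on, and the inner shell sees it as "$1", which keeps odd file names from being interpreted as shell code. Don't combine -I with -n; GNU xargs treats them as mutually exclusive.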
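If the inputs are absolute paths, the log ends up next to each input file. Here is a rough sketch of one way to keep the logs in a separate directory instead. LOG_DIR, the sample tree, and the stand-in for the real command are all made up for illustration, and -0 / -P assume GNU (or BSD) xargs:

```shell
#!/bin/sh
# Sketch: one log file per input, with logs kept in a separate directory.
# LOG_DIR, the sample tree, and the stand-in command are hypothetical.
set -eu

work=$(mktemp -d)
mkdir -p "$work/tree/sub" "$work/logs"
echo a > "$work/tree/a.txt"
echo b > "$work/tree/sub/b.txt"

# Exported so the inner shells started by xargs can see it.
export LOG_DIR="$work/logs"

find "$work/tree" -type f -print0 |
  xargs -0 -P 3 -n 1 sh -c '
    # $1 is the path handed over by xargs; turn slashes into underscores
    # so a full path becomes a single flat log file name.
    log="$LOG_DIR/$(printf "%s" "$1" | tr "/" "_").log"
    # Stand-in for the real script: report the file and its line count.
    { echo "processing $1"; wc -l < "$1"; } > "$log" 2>&1
  ' _

ls "$LOG_DIR"
```

Flattening the path this way can collide if a file name already contains underscores in the same positions as another path's slashes, so treat the naming scheme as a starting point.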
Gregg Lind : that replacement is a little hairy when the find output is rooted paths, but it's a good idea!
From James -
You could change your script so that on startup it picks a random number or string and prefixes each output line with it. You can then split the combined log later using grep.
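A minimal sketch of that idea, with a few assumptions: the tag is the worker's process ID rather than a random number, the "job" prefix is invented for the example, and the two echo lines stand in for the real script's output. Short lines appended to one file like this generally stay intact, but I wouldn't rely on that for very long lines.

```shell
#!/bin/sh
# Sketch: tag every output line so one shared log can be split later.
set -eu

log=$(mktemp)

printf '%s\n' alpha beta gamma |
  xargs -P 3 -n 1 sh -c '
    # Tag built from the PID of this worker shell (used here in place
    # of a random number); the "job" prefix is a hypothetical choice.
    tag="job$$"
    # Stand-in for the real work: two lines, each prefixed with the tag.
    { echo "start $1"; echo "done $1"; } | sed "s/^/$tag /"
  ' _ >> "$log"

# Split the shared log into one file per tag using grep.
for tag in $(cut -d " " -f 1 "$log" | sort -u); do
  grep "^$tag " "$log" > "$log.$tag"
done

ls "$log".job*
```

Each resulting file holds only the lines from one invocation, in the order that invocation wrote them.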
From Rory McCann -
GNU Parallel http://www.gnu.org/software/parallel/ seems to be made for you:
find "$MY_FILE_TREE" -print0 | parallel --null --max-procs 3 --max-args 1 --no-run-if-empty myprocess.sh {} ">" {}.output
or shorter:
find "$MY_FILE_TREE" -print0 | parallel -0 -j3 -r myprocess.sh {} ">" {}.output
Watch the intro video: http://www.youtube.com/watch?v=OpaiGYxkSuQ
From Ole Tange