Tuesday, January 25, 2011

xargs --max-procs: split output per process?

I recently discovered the xargs --max-procs feature.

How can I split the output of the command by process? Should I just give my command a --logfile $LOGFILE option, or can I do it from xargs itself?

An example (for womble):

Suppose I have a script, myprocess.sh, and a list of files. The files can be processed in any order, but I want to keep the logging for each one separate. Then:

find $MY_FILE_TREE -print0 | xargs --null --max-procs 3 --max-args 1 --no-run-if-empty myprocess.sh

might be the parallel job I want to run. If myprocess.sh is mouthy, then I'd like each invocation to print to a different log. Otherwise every invocation writes to the same stdout, and the logs get jumbled.

  • You could do this by having xargs run your command through a shell, which lets you redirect the output. Something like this:

    find blah -type f | xargs -I{} -P 4 sh -c 'yourcommand --input {} > {}.output'
    

    You'll probably have to tweak it a bit. xargs replaces {} with the item (file) it's working on.

    Gregg Lind: That replacement is a little hairy when the find output is rooted paths, but it's a good idea!
    From James
  • You could change your script so that on startup it chooses a random number or string and prefixes each output line with it. You can then split the combined output later using grep.

  • GNU Parallel http://www.gnu.org/software/parallel/ seems to be made for you:

    find $MY_FILE_TREE -print0 | parallel --null --max-procs 3 --max-args 1 --no-run-if-empty myprocess.sh ">" {}.output
    

    or shorter:

    find $MY_FILE_TREE -print0 | parallel -0 -j3 -r myprocess.sh ">" {}.output
    

    Watch the intro video: http://www.youtube.com/watch?v=OpaiGYxkSuQ

    From Ole Tange
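
The sh -c idea from the first answer can also be written so that each file name reaches the shell as a positional argument ("$1") instead of being substituted into the command string. That sidesteps the rooted-path problem raised in the comment, because the log name can be derived from the path inside the shell. What follows is only a sketch under assumptions: the LOG_DIR and PROCESSOR variables, the demo file tree, and the stand-in myprocess.sh are all invented here for illustration, and -0, -P and -r are xargs extensions (present in GNU xargs) rather than POSIX options.

```shell
#!/bin/sh
# Sketch: one log file per xargs invocation, usable with rooted paths.
set -eu

work=$(mktemp -d)              # scratch area for this demo (left in place for inspection)
tree="$work/tree"              # stands in for $MY_FILE_TREE
LOG_DIR="$work/logs"           # hypothetical directory that collects the per-file logs
mkdir -p "$tree/sub" "$LOG_DIR"
printf 'alpha\n' > "$tree/a.txt"
printf 'beta\n'  > "$tree/sub/b.txt"

# Stand-in for the "mouthy" myprocess.sh from the question.
cat > "$work/myprocess.sh" <<'EOF'
#!/bin/sh
echo "processing $1"
cat "$1"
EOF
chmod +x "$work/myprocess.sh"

PROCESSOR="$work/myprocess.sh"
export LOG_DIR PROCESSOR       # the child shells started by xargs read these

# xargs appends one file name per invocation; the trailing "sh" fills $0,
# so the file name arrives in the child shell as $1.
find "$tree" -type f -print0 |
  xargs -0 -r -P 3 -n 1 sh -c '
    # Flatten the path into a log name by turning every / into _ .
    # Assumption: no two paths flatten to the same name.
    log="$LOG_DIR/$(printf "%s" "$1" | tr "/" "_").log"
    "$PROCESSOR" "$1" > "$log" 2>&1
  ' sh

ls "$LOG_DIR"
```

Each invocation writes its stdout and stderr to its own file under LOG_DIR, so up to three processes can run at once without their output interleaving. Writing the logs next to the inputs instead (as the {}.output examples above do) only requires changing how the log variable is built.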
