I have a list of files I need to copy on a Linux system - each file ranges from 10 to 100GB in size.
I only want to copy to the local filesystem. Is there a way to do this in parallel - with multiple processes each responsible for copying a file - in a simple manner?
I can easily write a multithreaded program to do this, but I'm interested in finding out if there's a low-level Linux method for doing it.
-
There is no low-level mechanism for this for a very simple reason: doing this will destroy your system performance. With platter drives each write will contend for placement of the head, leading to massive I/O wait. With SSDs, this will end up saturating one or more of your system buses, causing other problems.
Jon : Err, that doesn't seem to be the case with a single cp at present. I'm sure there's a happy medium for multiple parallel "cp's" at which your I/O channel doesn't become completely saturated... -
As mentioned, this is a terrible idea. But I believe everyone should be able to implement their own horrible plans, sooo...
for FILE in *; do cp "$FILE" <destination> & done
The asterisk can be replaced with a glob pattern matching your files, or
$(cat <listfile>)
if you've got them all in a text document (this only works if none of the file names contain whitespace). The ampersand kicks off a command in the background, so the loop will continue, spawning off more copies. As mentioned, this will completely annihilate your IO. So... I really wouldn't recommend doing it.
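If you would rather cap the number of simultaneous copies than launch them all at once, one option is xargs with -P. The following is only a sketch on tiny throwaway files: LIST, DEST, MAX_JOBS and the scratch directory are names made up for illustration, and -P and -0 are extensions found in GNU and BSD xargs rather than part of POSIX.

```shell
#!/bin/sh
# Sketch: copy every file named in a list, at most MAX_JOBS at a time.
# WORK, LIST, DEST and MAX_JOBS are hypothetical names, not from the thread.
set -eu

WORK=$(mktemp -d)            # scratch area so the demo is self-contained
DEST="$WORK/dest"
LIST="$WORK/files.txt"
MAX_JOBS=2

mkdir -p "$WORK/src" "$DEST"
for n in 1 2 3 4; do
    printf 'payload %s\n' "$n" > "$WORK/src/file $n.dat"   # names with spaces
    printf '%s\n' "$WORK/src/file $n.dat" >> "$LIST"
done

# One path per line -> NUL-separated, so spaces in names survive.
# -P caps the number of simultaneous cp processes; -I{} runs one cp per file.
tr '\n' '\0' < "$LIST" | xargs -0 -P "$MAX_JOBS" -I{} cp -- {} "$DEST/"

COPIED=$(ls "$DEST" | wc -l | tr -d ' ')
echo "copied $COPIED files"
```

Raising or lowering MAX_JOBS is how you would look for the "happy medium" on your own hardware; whether any value above 1 actually helps depends on the disks involved.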
--Christopher Karel
From Christopher Karel -
The only answer that will not trash your machine's responsiveness isn't exactly a 'copy', but it is very fast. If you won't be editing the files in the new or old location, then a hard link is effectively like a copy, and (only) if you're on the same filesystem, they are created very very very fast.
Check out
cp -l
and see if it will work for you.
From Slartibartfast -
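To illustrate why the hard-link suggestion above is fast but is not an independent copy, here is a small sketch. It assumes GNU coreutils (for cp -l and stat -c); the scratch directory and file names are invented for the demo.

```shell
#!/bin/sh
# Sketch: a hard link is a second name for the same data, not a second copy.
set -eu

WORK=$(mktemp -d)                 # hypothetical scratch directory
printf 'original\n' > "$WORK/big.dat"

cp -l "$WORK/big.dat" "$WORK/big-link.dat"   # creates a link, copies no data

# Both names refer to the same inode, so the link count is now 2.
LINKS=$(stat -c %h "$WORK/big.dat")
echo "link count: $LINKS"

# A write through one name is visible through the other.
printf 'edited\n' >> "$WORK/big-link.dat"
SAME=no
cmp -s "$WORK/big.dat" "$WORK/big-link.dat" && SAME=yes
echo "contents still identical after edit: $SAME"
```

This is why the approach only suits files that will not be modified in either location: an edit made through one name changes what the other name sees.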
If your system is not trashed by it (e.g. maybe the files are in cache) then GNU Parallel http://www.gnu.org/software/parallel/ may work for you:
find . -type f -print0 | parallel -0 cp {} destdir
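This will run 9 concurrent cp's with the GNU Parallel release current when this was written; newer releases default to one job per CPU core. Pass -j N to set the number of concurrent jobs explicitly.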
Pro: It is simple to read.
Con: GNU Parallel is not standard on most systems - so you probably have to install it.
Watch the intro video for more info: http://www.youtube.com/watch?v=OpaiGYxkSuQ
From Ole Tange