Friday, January 28, 2011

Using DD for disk cloning

There have been a number of questions regarding disk cloning tools (yes, I did search for it) :) and dd has been suggested at least once. I've already considered using dd myself, mainly because of its ease of use and because it's readily available on pretty much all run-of-the-mill bootable Linux distributions.

On to my question: what is the best way to use dd for cloning a disk? I don't have much time to spare and no test hardware to play with, so I need to be prepared; it's pretty much a one-shot chance to get it done.

I did a quick Google search, and the first result was an apparently failed attempt. Is there anything I need to do after using dd, i.e. is there anything that CAN'T be read using dd?

Thanks!

  • To clone a disk, all you really need to do is specify the input and output to dd:

    dd if=/dev/hdb of=/image.img
    

    Of course, make sure that you have proper permissions to read directly from /dev/hdb (I'd recommend running as root), and that /dev/hdb isn't mounted (you don't want to copy while the disk is being changed - mounting as read-only is also acceptable). Once complete, image.img will be a byte-for-byte clone of the entire disk.

    There are a few drawbacks to using dd to clone disks. First, dd will copy your entire disk, even empty space, and if done on a large disk can result in an extremely large image file. Second, dd provides absolutely no progress indications, which can be frustrating because the copy takes a long time. Third, if you copy this image to other drives (again, using dd), they must be as large or larger than the original disk, yet you won't be able to use any additional space you may have on the target disk until you resize your partitions.

    You can also do a direct disk-to-disk copy:

    dd if=/dev/hdb of=/dev/hdc
    

    but you're still subject to the above limitations regarding free space.

    As far as issues or gotchas go, dd, for the most part, does an excellent job. However, a while ago I had a hard drive that was about to die, so I used dd to try to copy what information I could off it before it died completely. I then learned that dd doesn't handle read errors very well: there were several sectors on the disk that dd couldn't read, causing it to give up and stop the copy. At the time I couldn't find a way to tell dd to continue despite encountering a read error (though it appears that it does have that setting), so I spent quite a bit of time manually specifying skip and seek to hop over the unreadable sections.

    I spent some time researching solutions to this problem (after I had completed the task) and found a program called ddrescue which, according to its site, operates like dd but keeps reading even if it encounters an error. I've never actually used the program, but it's worth considering, especially if the disk you're copying from is old; old disks can have bad sectors even when the system appears fine.
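
    Going by its documentation, the basic invocation is just "ddrescue infile outfile mapfile", so a rescue copy might look something like this (the device, image and map file names here are only placeholders):

    ddrescue /dev/hdb /image.img /image.map

    The map (log) file is what lets ddrescue resume an interrupted run and retry the bad areas later.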

    Paul de Vrieze : You can actually also use a read-only mount. A filesystem can be remounted with: mount -o remount,ro /path/to/device
    Kyle Cronin : Good point, I added a note in my answer about that.
    sleske : I used ddrescue to scrape data off a dying hard drive, and can confirm that it's awesome.
  • dd is most certainly the best cloning tool; it will create a 100% replica simply by using the following command. I've never once had any problems with it.

    dd if=/dev/sda of=/dev/sdb
    
    Eddie : Of course, as long as /dev/sdb is at least as large as /dev/sda...
    Tim Williscroft : add a "bs=100M conv=notrunc" and it's much faster in my experience.
    Alnitak : @Eddie - and of course the partition table will be copied too, so if sdb is larger you'll have unused space at the end.
    Svish : @Tim, What does conv=notrunc do exactly?
    Manu : notrunc means (according to the man page): do not truncate the output file. I don't understand how it can be faster. You can follow the progress of the operation with: dd if=/dev/sda of=/dev/sdb & pid=$! and then kill -USR1 $pid whenever you want a status line printed.
    Michael Kohne : Play with bs a little when you start. I've encountered systems where any bs > 4k slowed the process down for some reason. If both drives are internal (ide/sata) it's probably not an issue, but if there's a network share or a USB disk involved, take care on the block size.
    bandi : just be very careful with the 'i' and 'o' letters...
    : Oh yes. I've used dd a great many times and it always gives me shivers to think how it will feel when I eventually get those "i" and "o" the wrong way round.
    Mistiry : Nobody seems to know this trick... dd is an asymmetrical copying program, meaning it will read first, then write, then go back to reading. You can pipe dd to itself and force it to perform the copy symmetrically, like this: `dd if=/dev/sda | dd of=/dev/sdb`. In my tests, running the command without the pipe gave me a throughput of ~112kb/s; with the pipe, I got ~235kb/s. I've never experienced any issues with this method. Good luck!
  • Keep in mind that dd makes an exact copy, including all the blank space.

    That means:

    1. The second drive must be at least as big as the first.
    2. If the second drive is larger, the extra space will be wasted at first (though the filesystem can be expanded; see the sketch below).
    3. If the source drive is not full, dd will waste a lot of time copying blank space.
    4. You can copy either the entire drive or a single partition this way.
    5. If this is a bootable drive, I'm pretty sure you need to install the bootloader after using dd.
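
    For ext2/3/4 filesystems, growing into that extra space after cloning is usually just a matter of enlarging the partition and then expanding the filesystem. A rough sketch, assuming the clone ended up on /dev/sdb1 (the device name is an example; run this with the filesystem unmounted):

    # enlarge the partition with fdisk/parted first, then:
    e2fsck -f /dev/sdb1
    resize2fs /dev/sdb1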

    Hope that is helpful

    Cristian Ciupitu : If you're cloning the whole hard disk, you're also cloning the boot loader.
    From Brent
  • CAUTION: dd'ing a live filesystem can corrupt files. The reason is simple: it has no understanding of the filesystem activity that may be going on, and makes no attempt to mitigate it. If a write is partially underway, you will get a partial write. This is usually not good for things, and generally fatal for databases. Moreover, if you screw up the typo-prone if and of parameters, woe unto you. In most cases, rsync is an equally effective tool written after the advent of multitasking, and will provide consistent views of individual files.

    However, dd should accurately capture the bit state of an unmounted drive: bootloaders, LVM volumes, partition UUIDs and labels, and so on. Just make sure that the destination drive is large enough to mirror the source drive bit for bit.
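
    If you do go the rsync route for a file-level copy, a minimal sketch for mirroring one mounted filesystem onto another could look like this (the mount points are hypothetical, and the trailing slashes matter):

    rsync -aAXHx --numeric-ids /mnt/source/ /mnt/target/

    Here -a keeps permissions and timestamps, -A and -X carry over ACLs and extended attributes, -H preserves hard links, and -x keeps rsync from wandering into other mounted filesystems.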

    LiraNuna : You can always use 'sync' to sync the file system to the hdd before running dd.
    DeletedAccount : I suspect that `sync` is not the answer to file corruption problems. What happens if a daemon or something writes more files after the `sync`, during the `dd` operation?
    Oleksandr Bolotov : It's a good idea to umount the drive first (or remount as read-only) but it's not always possible
    jldugger : In which case, you use rsync and let it do file handle magic to get a consistent file and let Copy On Write semantics handle the incoming writes.
    davr : I'm not sure what having multiple CPUs/cores has to do with using rsync to copy files?
    jldugger : Apologies, I intended to refer to the concept of multitasking.
    From jldugger
  • For future reference it might be of interest to check out ddrescue. It has saved my day a couple of times.

    From Subtwo
  • To save space, you can compress data produced by dd with gzip, e.g.:

    dd if=/dev/hdb | gzip -c  > /image.img.gz
    

    You can restore your disk with:

    gunzip -c /image.img.gz | dd of=/dev/hdb
    

    To save even more space, defragment the drive/partition you wish to clone beforehand (if appropriate), then zero-out all the remaining unused space, making it easier for gzip to compress:

    mkdir /mnt/hdb
    mount /dev/hdb /mnt/hdb
    dd if=/dev/zero of=/mnt/hdb/zero
    

    Wait a bit; dd will eventually fail with a "disk full" message. Then:

    rm /mnt/hdb/zero
    umount /mnt/hdb
    dd if=/dev/hdb | gzip -c  > /image.img.gz
    

    Also, you can get a dd process running in the background to report status by sending it a signal with the kill command, e.g.:

    dd if=/dev/hdb of=/image.img & pid=$!
    kill -USR1 $pid
    

    Check your system first: the above is for Linux; OS X and BSD dd commands differ in the signals they accept (OS X uses "SIGINFO", I understand).

    Steve Schnepp : Does this also work with "modern" fs such a BTRFS, NILFS, [whatever you can dream of] ?
    David Hicks : DD works on block devices, a level of abstraction lower than the file system, so it should, yes. I haven't actually tried it, though. Hmm, NILFS looks interesting, I'll have to take a look at that.
    David Hicks : Sorry, just checked out NILFS' homepage and realised what you might have meant - can you use DD to copy a snapshot from a NILFS filesystem? I don't know, but it'd be interesting to find out.
  • When using dd to clone a disk which may contain bad sectors, use "conv=noerror,sync" to ensure that it doesn't stop when it encounters an error, and fills in the missing sector(s) with null bytes. This is usually the first step I take if trying to recover from a failed or failing disk -- get a copy before doing any recovery attempts, and then do recovery on the good (cloned) disk. I leave it to the recovery tool to cope with any blank sectors that couldn't be copied.

    Also, you may find dd's speed can be affected by the bs (block size) setting. I usually try bs=32768, but you might like to test it on your own systems to see what works the fastest for you. (This assumes that you don't need to use a specific block size for another reason, e.g. if you're writing to a tape.)
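
    Put together, a rescue copy along those lines might look like this (the device and image paths are just placeholders):

    dd if=/dev/sdX of=/path/to/rescue.img bs=32768 conv=noerror,sync

    One caveat: with conv=noerror,sync a failed read causes the whole block (bs) to be padded, so a very large block size can blank out more data around a bad sector than strictly necessary.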

    davr : If you have a disk with bad sectors, you really should be using 'ddrescue' instead of dd. It's much more efficient, and has a much better chance of recovering more data. (Don't get it confused with dd_rescue, which is not as good)
    From TimB
  • dd does provide progress information, at least in most Linux versions. I've seen some that don't, but I can't recall which Unix flavour.

    The man page says: Sending a USR1 signal to a running ‘dd’ process makes it print I/O statistics to standard error and then resume copying.

    I use this feature regularly.
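
    From another terminal, one way to send that signal (assuming only one dd process is running) is:

    kill -USR1 $(pgrep -x dd)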

    From Steven
  • Someone had to say this: give Clonezilla a try (http://clonezilla.org/).

    What do you get? It copies only the used parts of the filesystem. Depending on the options you choose, Clonezilla uses dd, grub, sfdisk, parted, partimage, ntfsclone, and/or partclone.

    Decent documentation can be found at http://clonezilla.org/clonezilla-live/doc/

    jbdavid : I found the documentation a little rough, and cloning a Linux PATA drive to a SATA drive did not leave me with something I could boot (yet). But it was much faster than dd for the same result, and it worked great for my laptop drive upgrades.
  • Another nice thing you can do with dd and rescue disks is copy data over the network:

    remote_machine$ nc -l -p 12345
    
    local_machine$ dd if=/dev/sda | nc remote_machine 12345
    

    You can stick gzip in both these pipelines if the network is not local. For progress, use pv. To make local_machine's netcat quit after it's done copying, you might add -w 5 or something.
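
    Putting those pieces together, one possible end-to-end variant looks like this (the host and device names are examples, and the listener needs to be started first):

    remote_machine$ nc -l -p 12345 | gunzip -c | dd of=/dev/sdb

    local_machine$ dd if=/dev/sda | gzip -c | pv | nc -w 5 remote_machine 12345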

    davr : This is not quite correct. The 'remote_machine' command is missing something, such as `> disk_backup.img` or `|dd of=/dev/sdb` or something else, depending on what you want to do. I'm guessing you don't want to dump a disk image to stdout.
  • Another grand feature is copying MBRs, partition tables and boot records.

    Just

    dd if=/dev/sda of=parttable bs=512 count=1
    

    and the other way around when you're writing it back. Polish with fdisk afterwards.
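
    The restore direction would be roughly the same command with if and of swapped (triple-check the target device before running it):

    dd if=parttable of=/dev/sda bs=512 count=1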

    You feel much safer when you have your partition table backed up.

    Also, it makes migrating to another hard drive (while changing the partition structure) a joy.

    From alamar
  • For some reason, dd fails when imaging CDs with audio tracks. You need to use cdrdao or something similar to get an image + TOC file.

    From Matt
  • The source disk must not have any mounted filesystems. As a user able to read the block device (root works), run 'dd if=/dev/sda ....'

    Now, one of the neat things here is that you're creating a stream of bytes... and you can do a lot with that: compress it, send it over the network, chunk it into smaller blobs, etc.

    For instance:

    dd if=/dev/sda | ssh user@backupserver "cat > backup.img"
    

    But more powerfully:

    dd if=/dev/sda | pv -c | gzip | ssh user@backupserver "split -b 2048m -d - backup-`hostname -s`.img.gz"
    

    The above copies a compressed image of the source harddrive to a remote system, where it stores it in numbered 2G chunks using the source host's name while keeping you updated on progress.

    Note that depending on the size of the disk, the speed of the CPU on the source and destination, the speed of the network, and so on, you may want to skip compression, do the compression on the remote side, or enable ssh's compression.
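
    For completeness, pulling such a chunked image back and restoring it could look something like this ("myhost" and the device name stand in for whatever your setup uses; split's numeric suffixes keep the glob in order):

    ssh user@backupserver "cat backup-myhost.img.gz*" | gunzip -c | dd of=/dev/sda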

    From retracile
  • A note on speed: in my experience dd is twice as fast if you specify bs=1024 instead of the default bs=512. Using an even larger block size gives no noticeable speedup over bs=1024.
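
    If you want to see what works best on a particular machine, a quick read-only test along these lines is one option (run as root; the device and the amount read are examples, and the page cache can flatter repeat runs):

    # read 256 MB from the disk at each block size and let dd report throughput
    for bs in 4096 65536 1048576; do
        echo "bs=$bs"
        dd if=/dev/sda of=/dev/null bs=$bs count=$((256 * 1024 * 1024 / bs))
    done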

    : Disk clusters are generally around 4k now, so using 4096 is probably a good option, or even 8192 if you want to read two clusters at a time. Don't go too big, though, as you can run into fragmented memory problems.
  • To clone a disk, all you really need to do is specify the input and output to dd:

    dd if=/dev/hdb of=hdb.img
    

    Of course, make sure that you have proper permissions to read directly from /dev/hdb (I'd recommend running as root), and that /dev/hdb isn't mounted (you don't want to copy while the disk is being changed). Once complete, hdb.img will be a byte-for-byte clone of the entire disk.

    There are a few drawbacks to using dd to clone disks. First, dd will copy your entire disk, even empty space, and if done on a large disk can result in an extremely large image file. Second, dd provides absolutely no progress indications, which can be frustrating because the copy takes a long time. Third, if you copy this image to other drives (again, using dd), they must be as large or larger than the original disk, yet you won't be able to use any additional space you may have on the target disk until you resize your partitions.

    You can also do a direct disk-to-disk copy:

    dd if=/dev/hdb of=/dev/hdc
    

    but you're still subject to the above limitations regarding free space.

    The first drawback can be resolved by gzipping the data as you make the copy. For example:

    dd if=/dev/hdb | gzip -9 > hdb.img.gz
    

    The second drawback can be resolved by using the pipeview (pv) tool. For example:

    dd if=/dev/hdb | (pv -s `fdisk -l /dev/hdb | grep -o '[0-9]* MB' | awk '{print $1}'`m) | cat > hdb.img
    

    I know of no way to overcome the third drawback.

    Additionally, you can speed up the copy time by telling dd to work with larger chunks of data. For example:

    dd if=/dev/hdb of=hdb.img bs=1024
    
    davr : You already told the way to overcome the third drawback...resize the partitions. Enlarging a partition is generally a safe and fast operation (versus shrinking or moving, which is slow and more dangerous since it's moving data around).
    From jsumners
  • One thing you must be aware of when dd-ing a full disk is that doing so will overwrite the master boot record of the receiving disk, which contains the partition table and other vital information. If the new disk is not the same as the old disk, this can create all sorts of trouble. Copying over individual partitions is generally safer (and swap partitions do not have to be copied over).

  • If the source drive is damaged at all, you'll have more luck using dd_rhelp with dd_rescue (my personal preference) or GNU ddrescue.

    The reason behind this is that, on read errors, dd keeps trying and trying and trying, potentially waiting a long time for timeouts to occur. dd_rescue does smart things like reading up to an error, then picking a spot further along the disk and reading backwards to the last error, and dd_rhelp is basically a dd_rescue session manager, cleverly starting and resuming dd_rescue runs to speed the whole process up.

    The end result of dd_rhelp is maximum data recovered in minimum time. If you leave dd_rhelp running, in the end it does the exact same job as dd in the same time. However, if dd hit read errors early on in your 100 GB disk, you could be waiting a very long time for it to work through the rest of the data, whereas dd_rhelp plus dd_rescue would recover the bulk of it much faster.

  • I've been out of the admin role for many years now, but I know that 'dd' is up to the job. I used this technique regularly in the late 80s on Sun Sparc and 386i computers. I had one client order over 30 386i systems running CAD software that was distributed on multiple QIC tapes.

    We installed on the first computer, configured the app, ran SunOS' sys-unconfig, placed the drive in a shoebox with a different SCSI address and then proceeded to 'dd' to the other 30 drives.

    From pbrooks100
  • For NTFS volumes, I prefer using ntfsclone. It's part of the ntfsprogs package.
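
    A minimal sketch of how it's typically used, saving to and then restoring from its special image format (partition and image file names are examples):

    ntfsclone --save-image -o backup.img /dev/sda1
    ntfsclone --restore-image --overwrite /dev/sdb1 backup.img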

    From Ed Brannin
  • As others have mentioned above, one of the gotchas with cloning a mounted file system is potential data corruption. This obviously won't apply to full drive clones, but if you're using LVM you can snapshot the logical volume and dd from the snapshot to get a consistent image.
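
    A rough sketch of that approach, assuming a volume group called vg0 with a logical volume called root (the names, paths and snapshot size are made up):

    # create a snapshot (needs free space in the volume group), image it, then drop it
    lvcreate --snapshot --size 1G --name rootsnap /dev/vg0/root
    dd if=/dev/vg0/rootsnap of=/backup/root.img bs=1M
    lvremove -f /dev/vg0/rootsnap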

    From Ophidian
  • You can create a compressed image file of the partition (or disk) on the fly using bzip2 or gzip instead of dd. This is nice for storing away images in removable media:

    bzip2 -c /dev/sdaX >imagefile.bz2
    or
    gzip -c /dev/sdaX >imagefile.gz
    

    If the disk has been heavily used before, you can enhance the compression by filling all unused space with zeros before the imaging:

    mkdir /mnt/mymountpoint
    mount /dev/sdaX /mnt/mymountpoint
    cat /dev/zero >/mnt/mymountpoint/dummyfile.bin
    (Wait for it to end with a "disk full" error)
    rm /mnt/mymountpoint/dummyfile.bin
    umount /mnt/mymountpoint
    

    To restore the image into another disk, all you have to do is:

    bzcat imagefile.bz2 >/dev/sdbY
    or
    zcat imagefile.gz >/dev/sdbY
    
    From JCCyC
  • This is kind of a cheap hack, but it's a quick and dirty way to monitor your DD process.

    Run your dd command. Open a new shell and do a ps awx to find your dd process' PID. Now in the new shell type watch -n 10 kill -USR1 {pid of your DD process}

    This will do nothing in the watch output window, but back in the original DD shell, DD will start outputting status reports every 10 seconds. You can change the -n 10 in the watch command to any other time frame of course.


    From Tachyon
  • You could also try something like dd if=/dev/sda2 of=/dev/sdb2 bs=4096 conv=notrunc,noerror to skip over read errors and get an exact clone of a drive or partition.
