Thursday, January 13, 2011

What is the most efficient way to push out/publish large sets of files over a network?

Here's my scenario:

We have a need to push out a set of files (a 'workspace') from a source server to multiple destination servers. The files are kept under version control (SVN) so the first step is to export the latest version of the repository to the source server. I then need to ensure that several regional servers, each on a different continent, have the latest version of that workspace.

I have a script that works using ROBOCOPY, but it's sending the entire workspace to each destination server. Copy times are very long and the transfer can clog up the network to some of the sites that aren't as well wired up. I've already had complaints!

In most cases, the changes will only apply to a small percentage of all the files, so some sort of differential copy would be ideal. I'm reluctant to go the synchronization/mirror route because the files on the remote servers will be used and modified quite often and it's likely that the files on the regional servers will be deemed to be newer by the synchronization tool.

Are there any scripts/tools that you can recommend to make publishing this workspace more efficient? I don't mind if there is a reasonably-priced tool that can help do the job.

OS: Windows Server 2003
Network bandwidth: it's a private company network and connection speeds can vary greatly

Thanks!

  • You mentioned that the source is in SVN. Can you just do svn up?

    Tim Howland : Alternatively, if you use a different tag for each production-ready version, you could simply do an svn switch to the new URL; same idea, but uses tags so you can update everyone to the same release without worrying about developers doing rogue checkins while you're broadcasting.
    Dave Cheney : +1 for deploying from tags rather than trunk
  • This exact scenario is addressed by Microsoft's Distributed File System (DFS). Windows Server 2003 R2 added support for Remote Differential Compression, which only synchronizes the differences. Exactly what you need.

    http://technet.microsoft.com/en-us/library/cc787066.aspx

    Mark Brackett : DFS doesn't work well with one way (or single master) replication: http://blogs.technet.com/filecab/archive/2007/08/16/using-one-way-connections-in-dfs-replication.aspx
    Tim Long : @MarkBrackett - the article you linked to does actually propose a strategy for doing one-way replication. It proposes that the replica is configured as two-way, but that on the readonly replica, the file ACL on the root folder is set to readonly, so the users can't make changes. The end result is the same.
    Mark Brackett : @Tim Long-Take note of the rather large caveats though. "The replicated folder...will never be guaranteed to be an exact copy of...the source server". "[If] a folder...has been deleted on the [replica]...and an update is made to [the master]...the DFSR service is unable to apply such updates." "Scenarios where [DFS] is unable to over-write undesired updates...may arise." "With time it is possible to see substantial divergence in the contents." And, finally - "one way connections is not a configuration supported by Microsoft Product Support Services". Not exactly recommended or designed for.
    Tim Long : @MarkBrackett - That's why you have to make the replica readonly. To my mind, the requirements of a replica that can be edited are fundamentally conflicted. Without knowing more about the problem that needs to be solved, it is hard to make concrete suggestions, but what's the point in making a replica, allowing users to edit the replica, only to overwrite them with a new replica the next day?
    From Tim Long
  • Given that the files are under version control, how about creating a branch in svn for the "production ready" version of your files? Then, instead of doing an svn export on your master server, do a checkout on ALL of the servers, and use "svn update" to update them all from the "Production Ready" branch.

    The idea of having a production ready branch is that you can continue working on the files and, at the point you're ready to "publish" them to all servers, copy the files into the branch. This way you won't have "unfinished" stuff showing up on your production servers.

    Jim.

    PS: This is essentially the same as Dave Cheney's answer, just more detailed.
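
The checkout-once, update-afterwards routine described above could be scripted roughly as follows. This is only a sketch: the function names, the working copy path and the branch URL are hypothetical, and it assumes the svn command-line client is installed and on the PATH of each server.

```python
import subprocess
from typing import List


def svn_publish_command(working_copy: str, branch_url: str, first_time: bool) -> List[str]:
    """Return the svn argv that brings one server's working copy up to date."""
    if first_time:
        # One-off: create the working copy from the production-ready branch.
        return ["svn", "checkout", branch_url, working_copy]
    # Afterwards, svn only transfers what changed since the last update.
    return ["svn", "update", working_copy]


def publish(working_copy: str, branch_url: str, first_time: bool = False) -> None:
    """Run the command; raises CalledProcessError if svn exits non-zero."""
    subprocess.run(svn_publish_command(working_copy, branch_url, first_time), check=True)
```

Each regional server would run publish() against its own working copy, so only the changed files cross the slow links.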

  • In most cases, the changes will only apply to a small percentage of all the files, so some sort of differential copy would be ideal

    Rsync. Yes, for Windows too.

    As mentioned, svn up would work too, but since

    the files on the remote servers will be used and modified quite often

    you'll need to svn revert first.

    alphabeat : I believe you can accept the base changes and overwrite if there are any conflicts using: svn up --accept theirs-full
    Mark Brackett : @alphabeat - yeah, but that only overwrites if there's a conflict (I think it'd auto merge if it can) and the file has been updated in SVN. The OP is currently blowing away local changes, which I assume he wanted to continue to do.
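
For illustration, here is a much simpler form of differential copy than what rsync does (rsync also sends partial-file deltas; this sketch only works at whole-file level). It hashes the exported workspace on the source server and compares it with the manifest saved at the previous publish, so timestamps and edits on the regional servers never enter the comparison. The function names are made up for this example, and deleted files are not handled.

```python
import hashlib
import os
from typing import Dict, List


def build_manifest(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            digest = hashlib.sha256()
            with open(full_path, "rb") as handle:
                # Read in 1 MiB chunks so large files do not fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            manifest[relative] = digest.hexdigest()
    return manifest


def files_to_send(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Files that are new or whose contents changed since the previous publish."""
    return sorted(path for path, digest in current.items() if previous.get(path) != digest)
```

Only the paths returned by files_to_send() would then be copied to each regional server, overwriting whatever is there, which matches the current "blow away local changes" behaviour for those files.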
  • Puppet, the configuration management tool, may be of interest to you.

    It may be a little overkill here, functionality-wise, but it's comparatively lightweight next to the solutions you're considering.

    From jamesh
