Sunday, March 27, 2011

Joining Binary files that have been split via download

Hi, I am trying to join a number of binary files that were split during download. The requirement stems from the ASProxy project (http://asproxy.sourceforge.net/), in which the author lets you download files by providing a URL.

The problem is that my server does not have enough memory to hold a file larger than 20 MB in memory. To work around this, I modified the code so that it does not download more than 10 MB at a time: if the file is larger, the user can download the first 10 MB, then continue the download and hopefully get the next 10 MB. I have all of this working, except that when the user joins the files they downloaded, I end up with a corrupt file. As far as I can tell, something is being either added or removed during the download.

I currently join the files by reading all of them and then writing them to one file. This should work, since I am reading and writing bytes. The code I used to join the files is listed here: http://www.geekpedia.com/tutorial201_Splitting-and-joining-files-using-C.html

I do not have the exact code with me at the moment; as soon as I am home I will post it if anyone is willing to help out.

Please let me know if I am missing anything or if there is a better way to do this, i.e. what could I use as an alternative to a memory stream? The source code for the original project, which I made changes to, can be found here: http://asproxy.sourceforge.net/download.html. Note that I am using version 5.0. The file I modified is WebDataCore.cs; I changed line 606 so that it only loads up to 10 MB of data and then continues execution.

Let me know if there is anything I missed.

Thanks

From stackoverflow
  • You shouldn't split for memory reasons... the reason to split is usually to avoid having to re-download everything in case of failure. If memory is an issue, you are doing it wrong... you shouldn't be buffering in memory, for example.

    The easiest way to download a file is simply:

    using(WebClient client = new WebClient()) {
        client.DownloadFile(remoteUrl, localPath);
    }
    


    Re your split/join code - again, the problem is that you are buffering everything in memory; File.ReadAllBytes is a bad thing unless you know you have small files. What you should have is something like:

    byte[] buffer = new byte[8192]; // why not...
    int read;
    while((read = inStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        outStream.Write(buffer, 0, read);
    }
    

    This uses a moderate buffer to pump data between the two as a stream. A lot more efficient. The loop says:

    • try to read some data (at most, the buffer-size)
    • (this will read at least 1 byte, or we have reached the end of the stream)
    • if we read something, write this many bytes from the buffer to the output
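
    As a minimal sketch of the same loop used as an alternative to a memory stream: the hypothetical `StreamPump` helper below copies any input stream straight into a `FileStream`, so only one 8 KB buffer is ever held in memory. The class and method names are made up for illustration and are not part of ASProxy or the .NET Framework.

```csharp
using System.IO;

// Hypothetical helper for illustration; not part of ASProxy.
public static class StreamPump
{
    // Copies everything from input to output through one fixed-size buffer.
    // Returns the total number of bytes copied.
    public static long Copy(Stream input, Stream output)
    {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        // Read returns 0 only when the end of the stream has been reached.
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Write only the bytes that were actually read this round.
            output.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    // Streams a source (for example a web response stream) to a file on disk.
    public static long SaveToFile(Stream source, string localPath)
    {
        using (FileStream local = new FileStream(localPath, FileMode.Create, FileAccess.Write))
        {
            return Copy(source, local);
        }
    }
}
```

    For example, `StreamPump.SaveToFile(response.GetResponseStream(), localPath)` would write a `WebResponse` body to disk without buffering the whole file. On .NET 4 and later, the built-in `Stream.CopyTo` performs the same kind of buffered copy.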
    RC1140 : Thanks dude, your code looks perfect, except I am not sure how I would implement this in my project, as the place where the stream is read and the place where the data is written out are at opposite ends of the project. Can I ask for a little more help?
    Marc Gravell : Any specific question?
    RC1140 : Hi, I have tested and made the changes as you suggested. It works fine on my desktop PC, but as soon as I deploy to the server it fails. Do you have any other ideas that I could try? I am totally drained :(
    Marc Gravell : Can you define "it fails"...
    RC1140 : Sorry about that, basically it still throws the OutOfMemoryException.
    Marc Gravell : Are you using a `MemoryStream` perhaps? (still in-memory) - or are you transferring between two IO streams?
  • That example loads each entire chunk into memory; instead you could do something like this:

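    // Requires "using System.IO;". OutputFileName and inputFiles are supplied by the caller.
    int bufSize = 1024 * 32;
    byte[] buffer = new byte[bufSize];
    
    // FileMode.Create truncates an existing output file so no stale bytes remain.
    using (FileStream outputFile = new FileStream(OutputFileName, FileMode.Create,
        FileAccess.Write, FileShare.None, bufSize))
    {
        foreach (string inputFileName in inputFiles)
        {
            // Open each part for reading; a stream opened with Append/Write cannot be read.
            using (FileStream inputFile = new FileStream(inputFileName, FileMode.Open,
                FileAccess.Read, FileShare.Read, bufSize))
            {
                int bytesRead;
    
                // Copy one buffer at a time, writing only the bytes actually read.
                while ((bytesRead = inputFile.Read(buffer, 0, buffer.Length)) != 0)
                {
                    outputFile.Write(buffer, 0, bytesRead);
                }
            }
        }
    }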
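
    Since the original symptom was a corrupt joined file, a cheap sanity check after joining is to compare sizes. A minimal sketch, assuming a hypothetical `JoinCheck.LengthsMatch` helper (the name is illustrative only): the output should be exactly as long as all parts combined. Matching lengths do not prove the content is intact, but a mismatch shows that bytes were added or dropped somewhere.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of the original answer.
public static class JoinCheck
{
    // Returns true when the joined file is exactly as long as all parts combined.
    public static bool LengthsMatch(string outputPath, IEnumerable<string> partPaths)
    {
        long expected = 0;
        foreach (string part in partPaths)
        {
            expected += new FileInfo(part).Length;
        }
        return new FileInfo(outputPath).Length == expected;
    }
}
```

    If the check fails for a file joined from complete parts, the parts themselves were probably altered during download rather than during the join.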
    
    Marc Gravell : "buffer.Length" in Write should be "bytesRead" - corruption otherwise; and there is no point initializing "bytesRead" to 0 - it gets reset on the next line anyway
    andynormancx : Doh. I'll fix the write error.
    RC1140 : Thanks for the file stream idea. I have already tried this and got it working, except that I might overshoot my hosting limit :(
  • Hi

    In the end I found that by using an FTP request I was able to get around the memory issue, and the file is saved correctly.

    Thanks for all the help
