Friday, March 4, 2011

What is the best practice for storing uploaded images?

I'm writing an application that allows users to upload images to the server. I expect about 20 images per day, all JPEG and probably not edited or resized. (How to resize the images on the server side before storing them is a separate question. Maybe someone can drop a .NET resource for that in the comments.) I wonder what the best practice for storing uploaded images is.

Is it a) storing each image as a file in the file system and creating a record in a table with the exact path to that image,

or b) storing the image itself in a table using an "image" or "binary data" data type of the database server?

I see advantages and disadvantages in both. I like a) because I can easily relocate the files and just have to change the table entry. On the other hand, I don't like storing business data on the web server, and I don't really want to connect the web server to any other datasource that holds business data (for security reasons). I like b) because all the information is in one place and easily accessible by a query. On the other hand, the database will get very big very soon, and outsourcing that data could be more difficult.

Cheers, Tobias

Edit: http://stackoverflow.com/questions/3748/storing-images-in-db-yea-or-nay

From stackoverflow
  • I generally store files on the file-system, since that's what it's there for, though there are exceptions. For files, the file-system is the most flexible and performant solution (usually).

    There are a few problems with storing files in a database. Files are generally much larger than your average row, so result-sets containing many large files will consume a lot of memory. Also, if you use a storage engine that employs table-locks for writes (ISAM for example), your files table might be locked often, depending on the size and rate of the files you are storing there.

    Regarding security - I usually store the files in a directory that is outside of the document root (not accessible through an http request) and serve them through a script that checks for the proper authorization first.
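The "outside the document root, served through a script" idea from the answer above can be sketched roughly as follows. This is a minimal illustration, not that author's code: `UPLOAD_ROOT`, `resolve_upload`, `read_upload`, and the `user_may_view` flag are hypothetical names, and the real authorization check depends on your application.

```python
from pathlib import Path

# Hypothetical storage location outside the web server's document root.
UPLOAD_ROOT = Path("/var/app-data/uploads")


def resolve_upload(requested_name: str, root: Path = UPLOAD_ROOT) -> Path:
    """Map a requested name to a path inside root, rejecting escapes like '../'."""
    resolved_root = root.resolve()
    candidate = (resolved_root / requested_name).resolve()
    if not candidate.is_relative_to(resolved_root):  # Python 3.9+
        raise PermissionError("path escapes the upload directory")
    return candidate


def read_upload(user_may_view: bool, requested_name: str, root: Path = UPLOAD_ROOT) -> bytes:
    """Return the file's bytes only after the authorization check has passed."""
    if not user_may_view:  # stand-in for the application's real permission check
        raise PermissionError("not authorized")
    return resolve_upload(requested_name, root).read_bytes()
```

The web framework's request handler would call `read_upload` and stream the result, so the files themselves are never reachable by a direct URL.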

  • We use A. I would put it on a shared drive (unless you don't plan on running more than one server).

    If the time comes when this won't scale for you then you can investigate caching mechanisms.

  • Most implementations are option A.

    With option B, you open a whole big can of whoop4ss when you marshal those bits from the database into something that can be displayed in a browser... Also, if the db is down, the images are not available.

    I don't think that space is too much of an issue... Terabyte drives are a couple hundred bucks now.

    We are implementing with option A because we don't have the time or resources to do option B.
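For reference, option A as described in the question (a file on disk plus a table row holding its path) might look something like this. It is only a sketch using Python's built-in `sqlite3`; the `images` table, `open_store`, and `save_image` are assumed names, and storing a path relative to a configurable root is a design choice made here so the files can be relocated without rewriting every row.

```python
import sqlite3
from pathlib import Path


def open_store(db_file: str = ":memory:") -> sqlite3.Connection:
    """Open the database and make sure the (hypothetical) images table exists."""
    conn = sqlite3.connect(db_file)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "id INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE)"
    )
    return conn


def save_image(conn: sqlite3.Connection, root: Path, relative_path: str, data: bytes) -> int:
    """Write the bytes under root, then record the relative path in the table."""
    target = root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    with conn:  # commits on success, rolls back the INSERT on error
        cur = conn.execute("INSERT INTO images (path) VALUES (?)", (relative_path,))
    return cur.lastrowid
```

Note that the file write and the row insert are not one atomic step: if the insert fails, the file is left behind and needs separate cleanup.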

  • Flickr use the filesystem - they discuss the reasons here

  • Absolutely, positively option A. Others have mentioned that databases generally don't deal well with BLOBs, whether they're designed to do so or not. Filesystems, on the other hand, live for this stuff. You have the option of using RAID striping, spreading images across multiple drives, even spreading them across geographically disparate servers.

    Another advantage of the filesystem: with the images stored as BLOBs, your database backups/replication would be monstrous.

  • For auto resizing, try ImageMagick... it is used by many major open source content/photo management systems... and I believe that there are some .NET extensions for it.

  • We have had clients insist on option B a few times on a few different backends, and we always ended up going back to option A eventually.

    Large BLOBs like that just have not been handled well enough even by SQL Server 2005, which is the latest one we tried it on.

    Specifically, we saw serious bloat and I think maybe locking problems.

    One other note: if you are using NTFS-based storage (Windows Server, etc.) you might consider finding a way around putting thousands and thousands of files in one directory. I am not sure why, but sometimes the file system does not cope well with that situation. If anyone knows more about this I would love to hear it.

    But I always try to use subdirectories to break things up a bit. Creation date often works well for this:

    Images/2008/12/17/.jpg

    ...This provides a decent level of separation, and also helps a bit during debugging. Explorer and FTP clients alike can choke a bit when there are truly huge directories.
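A small sketch of the creation-date layout suggested above. The function name `dated_image_path`, the `base` default, and the example file name are illustrative assumptions; the original answer does not say how the file itself is named.

```python
from datetime import date
from pathlib import PurePosixPath


def dated_image_path(created: date, file_name: str, base: str = "Images") -> PurePosixPath:
    """Build base/YYYY/MM/DD/file_name so that no single directory grows huge."""
    return PurePosixPath(
        base, f"{created:%Y}", f"{created:%m}", f"{created:%d}", file_name
    )


# Hypothetical file name, used only for illustration.
print(dated_image_path(date(2008, 12, 17), "photo-0001.jpg"))
# prints Images/2008/12/17/photo-0001.jpg
```

With roughly 20 uploads per day, as in the question, each day's directory stays tiny; a busier site might add an hour level or hash-based buckets instead.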

  • I use uploaded images on my website and I would definitely say option a).

    One other thing I'd highly recommend is immediately changing the file name from whatever the user has named the photo to something more manageable, for example something with the date and time to uniquely identify each picture.

    It also helps to strip the user's file name of any strange characters to avoid future complications.
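The two suggestions above (a generated name based on date and time, and stripping strange characters from the user's name) could be sketched like this. `stored_name` and `clean_name` are hypothetical helpers; the random suffix is an addition of this sketch, since a timestamp alone can collide when two uploads arrive in the same second.

```python
import re
import uuid
from datetime import datetime


def clean_name(user_name: str) -> str:
    """Keep letters, digits, dot, dash and underscore; replace everything else."""
    # Drop any client-side directory part, whichever slash style was used.
    base = user_name.replace("\\", "/").rsplit("/", 1)[-1]
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", base).strip("._")
    return cleaned or "upload"


def stored_name(now: datetime, extension: str = ".jpg") -> str:
    """Timestamp plus a short random suffix, so same-second uploads still differ."""
    return f"{now:%Y%m%d-%H%M%S}-{uuid.uuid4().hex[:8]}{extension}"
```

One possible arrangement is to save the file under `stored_name(datetime.now())` and keep `clean_name(original)` in the database row purely for display.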

  • Definitely resize the image, and check its format if you can. There have been cases of malicious files being uploaded and served by unwitting hosts. For instance, the GIFAR vulnerability allowed you to hide a malicious Java applet in a GIF file, which would then be able to read cookies in the current context and send them to another site for a cross-site scripting attack. Resizing the images usually prevents this, as it munges the embedded code. While this attack has been fixed by JVM patches, naively serving up binary files without scrubbing them opens you up to a whole range of vulnerabilities.

    Remember, most virus scanners can only run against the filesystem. If you store your binaries in the DB, you won't be able to run a scanner against them very easily.
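As a rough illustration of "check its format", here is a shallow header check for JPEG uploads (the format the question expects). `looks_like_jpeg` is a hypothetical helper. It only rejects files that are obviously not JPEG; a GIFAR-style file is a valid image with extra data attached, so it would pass a header check like this, which is why the answer above recommends resizing (re-encoding) as well.

```python
# Every JPEG stream begins with the start-of-image marker FF D8, followed by FF.
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(data: bytes) -> bool:
    """Shallow check: True if the data begins with the JPEG start-of-image bytes."""
    return data.startswith(JPEG_MAGIC)
```

In practice only the first few bytes of the upload need to be read for this, before the file is handed to whatever library does the resizing.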

  • I have recently created a PHP/MySQL app which stores PDFs/Word files in a MySQL table (as big as 40MB per file so far).

    Pros:

    • Uploaded files are replicated to backup server along with everything else, no separate backup strategy is needed (peace of mind).
    • Setting up the web server is slightly simpler because I don't need to have an uploads/ folder and tell all my applications where it is.
    • I get to use transactions for edits to improve data integrity - I don't have to worry about orphaned and missing files

    Cons:

    • mysqldump now takes a looooong time because there is 500MB of file data in one of the tables.
    • Overall not very memory/cpu efficient when compared to filesystem

    I'd call my implementation a success, it takes care of backup requirements and simplifies the layout of the project. The performance is fine for the 20-30 people who use the app.

  • Option A.

    Once the image is loaded you can verify the format and resize it before saving. There are a number of .NET code samples for resizing images on http://www.codeproject.com. For instance: http://www.codeproject.com/KB/cs/Photo_Resize.aspx

  • If they are small files that will not need to be edited then option B is not a bad option. I prefer this to writing logic to store files and deal with crazy directory structure issues. Having a lot of files in one directory is bad. emkay?

    If the files are large or require constant editing, especially from programs like office, then option A is your best bet.

    For most cases, it's a matter of preference, but if you go with option A, just make sure the directories don't have too many files in them. If you choose option B, then put the table with the BLOB data in its own database and/or file group. This will help with maintenance, especially backups/restores. Your regular data is probably fairly small, while your image data will be huge over time.

  • There's sort of a hybrid approach in SQL Server 2008 called FILESTREAM storage that was talked about on RunAs Radio #74, which is sort of like the best of both worlds. Most people don't have the 2008 option, but if you do, this option looks pretty cool.
