Sunday, March 27, 2011

Database: To delete or not to delete records

I don't think I am the only person wondering about this. What do you usually practice about database behavior? Do you prefer to delete a record from the database physically? Or is it better to just flag the record with a "deleted" flag or a boolean column to denote the record is active or inactive?

From stackoverflow
  • If you are concerned about "dormant" records slowing down your database access, you may want to move those rows into another table acting as an "archive" table.

  • I mark them as deleted, and don't really delete. However every once in a while I sweep out all the junk and archive it, so it doesn't kill performance.

  • For user-entered/managed data I've used the flag method you describe and given the user an "empty the trash" interface to actually delete items if they choose to.

  • It definitely depends on the actual content of your database. If you're using it to store session information, then by all means wipe it immediately when the session expires (or is closed), you don't want that garbage lying around. As it cannot really be used again for any practical purposes.

    Basically, what you need to ask yourself, might I need to restore this information? Like deleted questions on SO, they should definitely just be marked 'deleted', as we're actively allowing an undelete. We also have the option to display it to select users as well, without much extra work.

    If you're not actively seeking to fully restore the data, but you'd still like to keep it around for monitoring (or similar) purposes. I would suggest that you figure out (to the extent possible of course) an aggregation scheme, and shove that off to another table. This will keep your primary table clean of 'deleted' data, as well as keep your secondary table optimized for monitoring purposes (or whatever you had in mind).

  • There are also legal issues either way if personal data is involved. I think it greatly depends on where you are (or where the database is), and what the terms of use are.

    In some cases people can ask to be removed from your system, in which case a hard delete is needed (or at least clearing out all of the personal information).

    I would check with your legal department before you adopt a strategy either way if personal information is involved.

  • For removing and archiving "dormant" records or "junk" once in a while you might also consider using the tool Jailer.

  • Pros of using a delete flag:

    1. You can get the data back later if you need it,
    2. Delete operation (updating the flag) is probably quicker than really deleting it

    Cons of using a delete flag:

    1. It is very easy to miss AND DeletedFlag = 'N' somewhere in your SQL
    2. Slower for the database to find the rows that you are interested in amongst all the crap
    3. Eventually, you'll probably want to really delete it anyway (assuming your system is successful. What about when that record is 10 years old and it was "deleted" 4 minutes after originally created)
    4. It can make it impossible to use a natural key. You may have one or more deleted rows with the natural key and a real row wanting to use that same natural key.
  • As a compliment to all posts...

    However, if you plan to mark the record, its good to consider making a view, for active records. This would save you from writing or forgetting the flag in your SQL query. You might consider a view for non-active records too, if you think that also serve a purpose.

  • I am glad to have found this thread. I too was wondering what people thought about this issue. I have implemented the 'marked as deleted' for about 15 years on many systems. Whenever a user would call to say something was accidentally deleted it was certainly a lot easier to mark it un-deleted than recreate it or restore from a backup.

    We are using postgresql and Ruby on rails it looks like we could do this in 1 of two ways, modify rails or add an ondelete trigger and does instead a pl/pgsql function to mark as deleted. I am leaning toward the latter.

    As for performance hits, it will be interesting to see the results of EXPLAIN-ANALYZE on large tables to few deleted items as well as many deleted items.

    In systems used over time I have found, new users tend to do silly things like delete things accidentally. So when people are new in a position they have all the access rights of the person previously in that position except with zero experience. Accidentally deleting something and being able to quickly recover gets everyone back to work quickly.

    But as someone said, sometimes you may need that particular key back for some reason, at that point you would need to really delete it, then re-create the records (on undelete it and modify the record).

0 comments:

Post a Comment