I'm trying to determine how I should store historical transactional data.
Should I store it in a single table where the record just gets reinserted with a new timestamp each time?
Should I break out the historical data into a separate 'history' table and only keep current data in the 'active' table.
If so, how do I best do that? With a trigger that automatically copies the data to the history table? Or with logic in my application?
Update per Welbog's comment:
There will be large amounts of historical data (hundreds of thousands of rows - eventually potentially millions)
Primarily searches and reporting operations will be run on the historical data.
Performance is a concern. The searches shouldn't have to run all night to produce results.
-
If the requirement is solely for reporting, consider building a separate data warehouse. This lets you use data structures like slowly changing dimensions that are much better for historical reporting but don't work well in a transactional system. The resulting combination also moves the historical reporting off your production database which will be a performance and maintenance win.
If you need this history to be available within the application then you should implement some sort of versioning or logical deletion feature or make everything fully contra and restate (i.e. transactions never get deleted, just reversed out and restated). Think very carefully about whether you really need this as it will add a lot of complexity. Making a transactional application that can reconstruct historical state correctly is considerably harder than it looks. Financial software (e.g. insurance underwriting sytems) fails to do this a lot more than you might think.
If you need the history solely for audit logging, make shadow tables and audit logging triggers. This is much simpler and more robust than trying to correctly and comprehensively implement audit logging within the application. The triggers will also pick up changes to the database from sources outside the application.
-
This question goes along the line of Business Logic. Know your business requirements first then start from there. A Data Warehouse is a nice solution for this kind of situation. ETL will give you lots of options in dealing with data flows. Your basic concept of 'History' vs 'Active' is quite correct. Your history data will be more efficient and flexible if kept in a data warehouse with all their dimension and fact tables.
0 comments:
Post a Comment