Friday, March 4, 2011

Should I design the application or model (database) first?

I am getting ready to start building a new web project in my spare time to bring to fruition an idea that has been bouncing around my head for a while.

I have never worked out whether I am better off building the model first and then the consuming application, or the other way around.

What are the best practices? What would you build first and why?

I imagine that the application should generally drive the model; however, the application, like many websites, really doesn't do much without the model.

For some reason I find it easier at times to think in terms of the model since the application is really just actions on the model. Is this a poor way of thinking about things?

What advantages/disadvantages does each option have?

From stackoverflow
  • When you're building the whole application yourself, I would start with the user. What does the user want? What information do they need? That should drive the design of the application and model, not the other way round. When the model is designed first, there is a temptation to expose the user to it directly, which will rarely make sense.

    Jarrett Meyer : +1 this. My experience has taught me that it is best to serve the users' needs, then work backwards to determine the data requirements. I don't necessarily "design" the whole UI, but it is important to know what the user really, really wants.
    le dorfier : +1. But we seem to usually start with a database (probably because that's 100% under our control.) Once you have a schema, it's harder to adjust than the UI, too.
    roryf : This seems to have had a resurgence of views so I think a couple of quick follow-up points are necessary. When I say model I mean the database, not the domain model. And in response to an excellent post below by 'balabaster', the database isn't necessarily just a persistent representation of the View, like you say sometimes there will be important requirements like reporting, but I still think the user is the best place to start otherwise it's too easy to neglect them (especially as a programmer!)
    alchemical : interesting point but doesn't really answer the question
  • Most web applications don't do much more than just processing and presenting different kinds of data.

    I would start with exactly that: What data do I want to process?

    After that, you can start modelling how this data will fit into a database best.

    Then you should also think about how you access this data: does it give you any hints for optimizing the storage?

    I would design the application itself either at the end or even parallel to it. The application design should be independent from the database model. It's only the code itself which will in the end access the database.

    But, web applications also tend to grow. So an evolutionary model, where you add new fields or tables to the database and build new code around it, is very common.

  • Well, to an extent the requirements must come first. The database is the servant of your requirements after all, not the other way around.

    It seems to me that what you're really asking is: should you design the application completely before you start on the database (and then write code after that)?

    My answer is no. It's better to jump in and get moving quickly.

    I would probably design the app in broad strokes and then use an iterative approach, which is an idea from Agile. There is a lot out there on this subject.

    Now if this was a two year project with 20 developers, stakeholders, and a budget, things would be somewhat different... But perhaps not as different as you think! The more complexity you're dealing with, the harder it becomes to create a perfect, monolithic plan up front.

    Some people say there is a point where it actually becomes impossible.

  • You don't want your object model to be constrained by a database design. The database should be a persistence implementation of your object model, which is the one that rules. You can wrap your application around your object model, and also derive the persistence model.

  • You could start by designing the interface between the application and model and writing unit tests for how the interface should behave. I usually take the more agile approach and do only a little upfront design before I jump into the code (see the "Tracer Bullets" concept in The Pragmatic Programmer: From Journeyman to Master).
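
    A minimal sketch of that interface-first idea, in Python. The names NoteStore, InMemoryNoteStore and NoteStoreContract are hypothetical and chosen only for illustration; the point is that the tests describe the boundary before any real database exists.

```python
import unittest
from typing import Optional, Protocol


class NoteStore(Protocol):
    """Hypothetical boundary between the application and the model."""

    def add(self, text: str) -> int: ...

    def get(self, note_id: int) -> Optional[str]: ...


class InMemoryNoteStore:
    """Throwaway implementation used only to exercise the interface."""

    def __init__(self) -> None:
        self._notes = {}
        self._next_id = 1

    def add(self, text: str) -> int:
        note_id = self._next_id
        self._notes[note_id] = text
        self._next_id += 1
        return note_id

    def get(self, note_id: int) -> Optional[str]:
        return self._notes.get(note_id)


class NoteStoreContract(unittest.TestCase):
    """Tests that describe how any NoteStore is expected to behave."""

    def make_store(self) -> NoteStore:
        # Swap in a database-backed store here once one exists.
        return InMemoryNoteStore()

    def test_added_note_can_be_read_back(self):
        store = self.make_store()
        note_id = store.add("buy milk")
        self.assertEqual(store.get(note_id), "buy milk")

    def test_unknown_id_returns_none(self):
        self.assertIsNone(self.make_store().get(999))
```

    Saved as a module, this can be run with python -m unittest. A later database-backed store would only need to override make_store to be held to the same tests.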

  • Personally, after I know the requirements (formal or not), I design the data model to handle the requirements. Then build out from there, with the business layer, persistence, unit testing, and then finally the GUI.

    If your DB is designed properly the first time, everything should just flow.

    EDIT: Please be aware that I'm not implying that your business layer or GUI should be a direct reflection of your DB model. Sometimes it will be similar, sometimes it won't. But your DB model should be able to accommodate all requirements.

    S.Lott : +1: data first, everything else will follow logically from there.
    ConcernedOfTunbridgeWells : Get the data right. A poor data model will cause pain for all eternity. +1
    le dorfier : It comes down to whether you want the database to reflect the user's requirements, or have the GUI reflect your database design. Exactly backwards. Your unit tests will be determined by the UI, and the database needs to facilitate the tests.
    alchemical : wtf was the last comment? Question was db or code first. It's assumed you could do neither very well if you didn't have requirements/ui, etc.
  • How about both?

    Another approach is "feature-based" development - build a vertical slice through the application, just enough at the model, persistence and interface levels to get a feature working completely. This might be something as simple as logging in, or editing a single object.

    This approach means that:

    • you get to build a small chunk of both model and application together up front;
    • you quickly discover any danger areas in the stack of technologies and designs you've chosen;
    • you rapidly have something you can show to people for comments;
    • you avoid the fallacy that anything gets designed right first time...
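
    As a rough illustration of such a vertical slice, here is a Python sketch for "editing a single object". Everything in it (Item, ItemStore, handle_edit, the item table) is a hypothetical name invented for this example, and SQLite stands in for whatever database the project would really use.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Item:
    item_id: int
    title: str

    def rename(self, new_title: str) -> None:
        # Model layer: the one business rule this slice needs.
        cleaned = new_title.strip()
        if not cleaned:
            raise ValueError("title must not be empty")
        self.title = cleaned


class ItemStore:
    """Persistence layer: just enough SQL to load and save one Item."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS item "
            "(id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
        )

    def load(self, item_id: int) -> Item:
        row = self.conn.execute(
            "SELECT id, title FROM item WHERE id = ?", (item_id,)
        ).fetchone()
        if row is None:
            raise KeyError(item_id)
        return Item(row[0], row[1])

    def save(self, item: Item) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO item (id, title) VALUES (?, ?)",
            (item.item_id, item.title),
        )
        self.conn.commit()


def handle_edit(store: ItemStore, form: dict) -> str:
    """Interface layer: turns a submitted form into a response message."""
    item = store.load(int(form["id"]))
    try:
        item.rename(form["title"])
    except ValueError as err:
        return f"error: {err}"
    store.save(item)
    return f"saved: {item.title}"
```

    Each layer holds only what this one feature needs. The next feature would add its own thin strip through the same three layers.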
    Steven A. Lowe : +1 for a sane scalable approach
  • According to Martin Fowler, whom most skilled developers recognize as one of the "authorities" on these questions, you design your OO hierarchy first and then "map" the objects into your database using e.g. the ActiveRecord design pattern...

    alchemical : question authority
  • FWIW, one of the best remembered parts of Brooks's The Mythical Man Month is this:

    Show me your flow charts and conceal your tables and I shall continue to be mystified, show me your tables and I won't usually need your flow charts; they'll be obvious.

    le dorfier : So don't design your requirements with flow charts. We all know that by now, don't we?
    le dorfier : Brooks didn't have use cases; computer systems weren't for users, they were almost always for batch jobs in those days. Any people were just there to do "Data Entry".
  • My tactic is roughly this:

    Read the requirements, and write down all nouns or "players" in the document. These are usually 80% of the things you need to store or interact with.

    With these things on a sheet of paper, read the requirements again and see if you can find that the things you have on paper can actually be used to do the job.

    Then, find their attributes and make a data model. Try to fit that into the database. Build up from there.
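
    To make the steps above concrete, here is a small Python sketch. It assumes a made-up requirement, "a member borrows books and must return them by a due date", so the nouns Member, Book and Loan and all their attributes are hypothetical. The function at the end is the "read the requirements again" check: can the things on paper actually do the job?

```python
from dataclasses import dataclass
from datetime import date
from typing import List


# Nouns pulled from the (hypothetical) requirement, with their attributes.
@dataclass
class Member:
    member_id: int
    name: str


@dataclass
class Book:
    book_id: int
    title: str


@dataclass
class Loan:
    member_id: int
    book_id: int
    due: date


def titles_on_loan(member: Member, loans: List[Loan], books: List[Book]) -> List[str]:
    """Check that the model can answer: what does this member have on loan?"""
    books_by_id = {book.book_id: book for book in books}
    return [
        books_by_id[loan.book_id].title
        for loan in loans
        if loan.member_id == member.member_id
    ]
```

    If a question from the requirements cannot be answered from these records, a noun or attribute is missing, and it is cheap to add at this stage.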

    For web applications, this usually works for me (even for applications of considerable size). As you've noticed, I didn't use terms like UML or ERD. These are just tools for communicating the model in your head with others. Powerpoint can do that, too. It's the quality of end product that counts.

  • First of all, your database is not your Model, it is just the mechanism you use to persist your model. Your model is a set of business objects that encapsulate the state and logic used by the business, and may be used by other applications.

    I have found that most clients don't understand tables, columns, but do understand process and workflow. Therefore, I work with the client to mock up the UI and the page flow for the tasks that need to be addressed in the solution. From this, I create the business objects to hold the required data for the UI.

    The controllers handle the page logic and page flow. I mock up a data repository to handle some sample data. This allows the client and me to iterate the UI and flow until we are satisfied. Usually, we discover better ways of doing things, and that some activities they thought important add no value.
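
    A mocked-up repository of this kind might look like the following Python sketch. The names Customer, SampleCustomerRepository and customer_detail_page, and the canned data, are assumptions made for illustration; nothing here touches a database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Customer:
    """Hypothetical business object shaped by what the UI needs to show."""
    customer_id: int
    name: str
    ship_to_addresses: List[str]


class SampleCustomerRepository:
    """Stand-in repository that serves canned data; no database involved."""

    def __init__(self) -> None:
        self._customers = [
            Customer(1, "Acme Ltd", ["1 Main St", "22 Dock Rd"]),
            Customer(2, "Bolt & Co", ["9 High St"]),
        ]

    def find(self, customer_id: int) -> Optional[Customer]:
        for customer in self._customers:
            if customer.customer_id == customer_id:
                return customer
        return None


def customer_detail_page(repo, customer_id: int) -> dict:
    """Controller-style function: picks the page and the data it displays."""
    customer = repo.find(customer_id)
    if customer is None:
        return {"page": "not_found"}
    return {
        "page": "customer_detail",
        "name": customer.name,
        "addresses": customer.ship_to_addresses,
    }
```

    Because the controller only depends on the repository's find method, the sample repository can later be replaced by a database-backed one without changing the page logic.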

    Now is the time I work on the database and the data access logic. Waiting until this point reduces the need to rework database schema, stored procedures, and DAL code.

    This usually results in less code, a robust application, and a happy client. The triple crown.

    Also, Unit Test everything. You will be making changes, and a good unit test set makes sure that you don't break other parts of your application when you make the changes.

    le dorfier : Yes, yes. How often have you heard, in conversation with your users, something like "Oh? They can have more than one Ship-To Address? Per Sales Office?"
    alchemical : database is the db model, domain objects are the code model--just semantics. I do see some value in what you describe, this is one way to do it. You don't mention much how to deal with collections, etc. in the DB relational integrity...
    Matthew : "just semantics". You must be a DBA. It's my opinion, and that of anyone who uses Domain Driven Design, that the database is irrelevant when designing an application. In most cases, I design/develop using in-memory collections for persistence. Then when everything is working, I, or a DBA, create a robust database implementation for persistence. Yes, the database schema and stored procedures are critical to a successful application. However, the database needs to support the Domain Model, which is attempting to model the Business. I've seen too many failures due to complex database first, app/domain next.
  • For me it depends on previous experience with the problem domain.

    If I have done this sort of app before, I am more likely to take the time to clarify the data model first and then start building code on top of that.

    On a first-time project I am more likely to just jump in with coding, whipping up dummy data as needed and learning about hidden dimensions of the problem domain as I go. Not uncommonly there are data categories that would have been difficult to anticipate. When I discover these, I revise the data model and continue on. Often this approach begins with coding up a script to build the database and populate it. That way, on subsequent iterations, I just modify the db-build script, run it, and I'm back in business.
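
    A db-build script of this kind could be as small as the Python sketch below. It assumes SQLite, and the project and task tables and the dummy rows are hypothetical. Each run drops and recreates everything, so revising the data model means editing the script and running it again.

```python
import sqlite3

# Schema lives in one place; edit it when the data model changes.
SCHEMA = """
DROP TABLE IF EXISTS task;
DROP TABLE IF EXISTS project;
CREATE TABLE project (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE task (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project(id),
    title      TEXT NOT NULL
);
"""

# Dummy data to code against until real data exists.
DUMMY_PROJECTS = [(1, "Website"), (2, "Newsletter")]
DUMMY_TASKS = [
    (1, 1, "Sketch home page"),
    (2, 1, "Pick a host"),
    (3, 2, "Draft issue 1"),
]


def build_database(path: str = ":memory:") -> sqlite3.Connection:
    """Drop and recreate every table, then load the dummy data."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO project (id, name) VALUES (?, ?)", DUMMY_PROJECTS)
    conn.executemany(
        "INSERT INTO task (id, project_id, title) VALUES (?, ?, ?)", DUMMY_TASKS
    )
    conn.commit()
    return conn
```

    Calling build_database with a file path instead of the default in-memory database would give a development database that can be rebuilt from scratch on every iteration.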

  • This is the age old question. The answer, like every CS answer, is that it depends. 90% of the applications that you write are just forms over data. In many of these applications, you will have legacy applications with data that you have to port over and go from there. Therefore, whether you like it or not, the Data/Database is a constricting factor and it drives whatever you do. It’s not just a place to store your domain objects, it is your domain, even though that’s not the “right” way to do things.

    In most cases, I’ve designed my data model first in a way that takes the existing data and organizes it into the relational model. I then do basic screen design. Then I build my anemic Active Record type business objects to wire them up. This is by no means the best way to design software, but in most cases, it’s the way that things will be done or have been done. In these cases, your business objects really are just containers for data with business logic around them to wire them up to the screens and ensure data integrity and screen security. This sucks, but it is what it is.

    If screen interactions are the most important thing, then maybe designing the screen first and then having your other objects depend on that will be your best bet.

    If you are lucky enough to have a Greenfield project where the domain is integral in the application and the database is merely a persistence mechanism for your domain objects, then I would develop the domain objects first using Domain Driven Design in a TDD manner and develop the screens and the database around the domain objects. I would love to code like this more often, but you don’t always get the opportunity in most places.

    Note: Stack Overflow is designed in a Database as the Model way, so it can't be that evil.

  • You know, I think I have to disagree with those that blindly put the design of the GUI ahead of the underlying data model. In a real business environment, running the business is not just about workflow - a huge component of business revolves around data analysis and reporting. After all, how can you make decisions based on data you can't get to or understand? On top of this, when you sit down with a client, 90% of the time, they don't understand what their application needs to do, how it needs to be laid out and half the time, they don't even understand what functionality it requires.

    How do you analyse your data if your whole data model is just a persistence of on-screen data? How do you report on that? If you sit down with a database guru and tell them you want a report built from a data model that basically represents your ViewState they would quit and tell you to do it yourself - at least, if someone told me that I had to build a report based on that type of model, I'd quit and tell them to get someone else to do it.

    The GUI that sits on top of the data model is incidental and allows employees to interact with the data in the simplest most efficient manner. Bear in mind that software users aren't programmers, they don't think the way programmers or database architects do, and they don't do their work the way we work; nor do they want to. They want to be able to enter data easily and in the most logical manner according to their daily workflow. They want to be able to think, how much can I get done today so that when I go home, I don't have to take work with me, they want to go on vacation without worrying if the new guy will be able to keep up with the flow or if they'll be able to understand the software.

    Business owners want to be able to get the data out in the simplest most efficient manner, they want reports written at a moment's notice, and they want that data represented logically, efficiently and representatively of whatever model they choose for the current report. They care little about workflow, they don't need to know how many departments this piece of data flowed through, where it came from, how it got to where it is now. They want to know what the piece of data is, what it represents and what does it mean to the business as a whole.

    To a business owner, the data is far more important than the piece of software. To the end user who has pressure to do ten times more in ten times less time, the software needs to provide them with a means of getting as much data into the database as possible in the shortest amount of time.

    So how do you decide which to design first, the GUI or the data model? How much money is going to be saved in the longer term? Does the company have 500 users entering data into this piece of software and are they doing it in the most efficient manner? Does the company have 500 report writers and can they get at the data quickly and efficiently? How long is a piece of string?

    Design your data model for the data analysts - make it as clean, efficient and simple as possible to get the data out in a comprehensive format.

    Design your GUI for the end users and make it as clean, simple and efficient as it can be for those users to get as much data into your database as quickly and as simply as possible without having to be a rocket scientist. Frequently end users are barely computer literate in comparison to those writing the software and extracting the data.

    From the outset, always keep in mind how you're going to wire the two together because if you don't, you'll end up with two ends and no way to provide a middle and your project will fall to pieces...

    More money is wasted putting data into a system and getting the data out than writing the software that does the wiring between the two ends. A team of developers doesn't cost nearly as much as users who input inaccurate data inefficiently, or as poor quality reports that result when the data analysts can't get at that inaccurate data efficiently and spend a week writing a report that realistically shouldn't take more than an hour or two and, when it is written, is no help anyway.

  • It's one that works either way, but personally I'm leaning toward designing from the UI back more and more.

    The main reason comes down to being able to create supporting automated tests.

    One of the strengths of automated testing is the flexibility to refactor and change your code as you go. However the UI is typically the hardest to change, and the one that often requires the most work to get right.

    So for that reason, I advocate designing the UI, getting it as close as possible to the finished version, then moving backwards, creating your middle and back end to support the operations carried out in that GUI.

    With a relatively stable (and difficult to test) UI in place, you're free to mold the other layers with a lot more flexibility once you've got good test coverage for them.

    If you design from the database up, you'll end up with a stable, easy to test database, and a LOT of messing around getting the GUI just right to match what you've done with the DB - which ends up taking a lot longer as you're making the most changes to the level of the system that is the hardest to test and has the lowest test coverage.

    Plus the fact that DB driven designed apps end up having no personality and are difficult to use. They look like the same MS Access form for each screen, except with different fields.

  • From experience, designing the database first (based on requirements), can lead to things going very smoothly.

    This is especially true if your data does not just relate to data entered in the UI, but may include pre-existing data related to or imported for the project.

    On a mid-size project I may go through 100+ iterations of the DB using an ERD diagramming tool like Erwin or Power SQL. Then, click the forward-engineer button to get the DDL.

    The domain objects will typically look a lot like your main tables, however they will often have collections where the DB has look-up tables, etc. Also your domain objects may have other objects in them for organizational purposes, etc.

    Then wire up a DAL either hand-rolled or ORM of your choice.

    Thing is, none of the tools designed to automate this process seem to do it 100%. In a code utopia, I guess you could just create the DB model and have the perfect domain model created or vice versa, and then get a perfect ORM with a few clicks. In reality, this is a lot harder than it sounds, and subtle issues can arise like performance and flexibility.
