Saturday, February 12, 2011

Are there good reasons not to use an ORM?

During my apprenticeship, I have used NHibernate for some smaller projects which I mostly coded and designed on my own. Now, before starting some bigger project, the discussion arose how to design data access and whether or not to use an ORM layer. As I am still in my apprenticeship and still consider myself a beginner in enterprise programming, I did not really try to push in my opinion, which is that using an object relational mapper to the database can ease development quite a lot. The other coders in the development team are much more experienced than me, so I think I will just do what they say. :-)

However, I do not completely understand two of the main reasons for not using NHibernate or a similar project:

  1. One can just build one’s own data access objects with SQL queries and copy those queries out of Microsoft SQL Server Management Studio.
  2. Debugging an ORM can be hard.

So, of course I could just build my data access layer with a lot of SELECTs etc, but here I miss the advantage of automatic joins, lazy-loading proxy classes and a lower maintenance effort if a table gets a new column or a column gets renamed. (Updating numerous SELECT, INSERT and UPDATE queries vs. updating the mapping config and possibly refactoring the business classes and DTOs.)

Also, using NHibernate you can run into unforeseen problems if you do not know the framework very well. That could be, for example, trusting the Table.hbm.xml where you set a string’s length to be automatically validated. However, I can also imagine similar bugs in a “simple” SqlConnection query based data access layer.

Finally, are those arguments mentioned above really a good reason not to utilise an ORM for a non-trivial database based enterprise application? Are there probably other arguments they/I might have missed?

(I should probably add that I think this is like the first “big” .NET/C# based application which will require teamwork. Good practices, which are seen as pretty normal on Stack Overflow, such as unit testing or continuous integration, are non-existing here up to now.)

  • There's been an explosion of growth with ORMs in recent years and your more experienced coworkers may still be thinking in the "every database call should be through a stored procedure" mentality.

    Why would an ORM make things harder to debug? You'll get the same result whether it comes from a stored proc or from the ORM.

    I guess the only real detriment that I can think of with an ORM is that the security model is a little less flexible.

    EDIT: I just re-read your question and it looks they are copy and pasting the queries into inline sql. This makes the security model the same as an ORM, so there would be absolutely no advantage over this approach over an ORM. If they are using unparametrized queries then it would actually be a security risk.

    hangy : Harder to debug: If an exception is thrown, you probably need to know the framework instead of knowing the SqlConnection’s exceptions etc.
    Giovanni Galbo : Wouldn't the sql exception be part of the ORM exception?
    mattlant : Yes, most wrap the exception, so you would still be able to get the originating exception thorugh .inner
    hangy : As I said, I did not bring up that argument. :) Of course, the real SQL exception should be somewhere down the stack trace. As you might have read from my question, I kind of agree with your answer, but I will wait with accepting it. Maybe someone else comes up with really good reasons against ORM.
  • The short answer is yes, there are really good reasons. As a matter of fact there are cases where you just cannot use an ORM.

    Case in point, I work for a large enterprise financial institution and we have to follow a lot of security guidelines. To meet the rules and regulations that are put upon us, the only way to pass audits is to keep data access within stored procedures. Now some may say that's just plain stupid, but honestly it isn't. Using an ORM tool means the tool/developer can insert, select, update or delete whatever he or she wants. Stored procedures provide a lot more security, especially in environments when dealing with client data. I think this is the biggest reason to consider. Security.

    mattlant : This doesnt fit the OP's situation though. I understand and agree with what you say, but the OP makes it sound like they just copy queries as text into code and use that rather than SP's. This still leaves security and control in the hands of the developers.
    hangy : I fully understand that an ORM may not the appropriate solution in those cases. However, as mattlant pointed out, this is not the case in this situation. :)
    Kyralessa : Can this concern be mitigated by having the O/R mapper access the database through views instead of tables?
    Jason Baker : Triggers also spring to mind as well.
    Mendelt : I think this is more a limitation of current orm libraries that don't support stored procedures than a fundamental limitation in or mapping. There are orm's that can handle stored proc's and views and there's a lot to say for the orm + stored procs solution.
    David Kemp : In the case of a good ORM, you should be able to get it to use SPs anyway (of course, this mitigates a lot of the benefits...)
    Tim Scott : The topic of why SPs do not really provide more security than an ORM is well tilled ground. Here's just one article that addresses it well: http://ayende.com/Blog/archive/2007/11/18/A-false-sense-of-security.aspx
    Tom : @Tim - too bad that's not really accurate. It's quite possible for a database to contain tables or columns that are inaccessible to the web server. Stored procedures and views expose the subset that is necessary for access, and restrict access to the rest of the DB (unless you're using a toy DB in the first place). They also make it much easier to add auditing to the easy-to-hack account without forcing triggers onto the underlying database tables themselves (making back-end procedures much more costly or annoying).
    Min : Well, in Sql Server 2005 can't you just designate a schema in a database just for the ORM layer? Or is that not enough? If the applications are web apps, then one thing is connecting to a db. Security then left to the application layer.
    DoctaJonez : Of course you can restrict what the user can do with an ORM. You just set the user security settings in SQL and give them priviliges to insert/update/delete what you want them to. It would be the same as setting security privelages for stored procedures
  • I worked on one project where not using an ORM was very successfully. It was a project that

    1. Had to be horizontally scalealbe from the start
    2. Had to be developed quickly
    3. Had a relatively simple domain model

    The time that it would have taken to get NHibernate to work in a horizontally partitioned structure would have been much longer than the time that it took to develop a super simple datamapper that was aware of our partitioning scheme...

    So, in 90% of projects that I have worked on an ORM has been an invaluable help. But there are some very specific circumstances where I can see not using an ORM as being best.

    hangy : You gave some good points against using an ORM in some specific cases. However, this project belongs to those 90 % where the ORM could possibly help to reduce costs of development and maintenance.
    Mike : Totally agree - sorry for not answering your question directly! I believe that the specific reasons you mentioned are quite invalid :)
    Mauricio Scheffer : Horizontally scaling NHibernate: http://www.blechie.com/WPierce/archive/2008/06/08/Poor-Mans-Shards-in-NHibernate.aspx, http://darioquintana.com.ar/blogging/?p=25
    From Mike
  • Runtime performance is the only real downside I can think of but I think that's more than a fair trade-off for the time ORM saves you developing/testing/etc. And in most cases you should be able to locate data bottlenecks and alter your object structures to be more efficient.

    I haven't used Hibernate before but one thing I have noticed with a few "off-the-shelf" ORM solutions is a lack of flexibility. I'm sure this depends on which you go with and what you need to do with it.

    hangy : I remember some people saying that the optimised queries and not loading data not necessary in some situations (ie. lazy loading of joins) can actually speed up things. Manually implementing this in a custom DAL might be more work and worth less time.
    From Oli
  • I think that using an ORM is still a good idea. Especially considering the situation you give. It sounds by your post you are the more experienced when it comes to the db access strategies, and I would bring up using an ORM.

    There is no argument for #1 as copying and pasting queries and hardcoding in text gives no flexibility, and for #2 most orm's will wrap the original exception, will allow tracing the queries generated, etc, so debugging isnt rocket science either.

    As for validation, using an ORM will also usually allow much easier time developing validation strategies, on top of any built in validation.

    Writing your own framework can be laborious, and often things get missed.

    EDIT: I wanted to make one more point. If your company adopts an ORM strategy, that further enhances its value, as you will develop guidelines and practices for using and implementing and everyone will further enhance their knowledge of the framework chosen, mitigating one of the issues you brought up. Also, you will learn what works and what doesnt when situations arise, and in the end it will save lots of time and effort.

    From mattlant
  • There are a wide range of common problems for which ORM tools like Hibernate are a god-send, and a few where it is a hindrance. I don't know enough about your project to know which it is.

    One of Hibernate's strong points is that you get to say things only 3 times: every property is mentioned in the class, the .hbm.xml file, and the database. With SQL queries, your properties are in the class, the database, the select statements, the insert statements, the update statements, the delete statements, and all the marshalling and unmarshalling code supporting your SQL queries! This can get messy fast. On the other hand, you know how it works. You can debug it. It's all right there in your own persistence layer, not buried in the bowels of a 3rd party tool.

    Hibernate could be a poster-child for Spolsky's Law of Leaky Abstractions. Get a little bit off the beaten path, and you need to know deep internal workings of the tool. It can be very annoying when you know you could have fixed the SQL in minutes, but instead you are spending hours trying to cajole your dang tool into generating reasonable SQL. Debugging is sometimes a nightmare, but it's hard to convince people who have not been there.

    EDIT: You might want to look into iBatis.NET if they are not going to be turned around about NHibernate and they want control over their SQL queries.

    EDIT 2: Here's the big red flag, though: "Good practices, which are seen as pretty normal on Stack Overflow, such as unit testing or continuous integration, are non-existing here up to now." So, these "experienced" developers, what are they experienced in developing? Their job security? It sounds like you might be among people who are not particularly interested in the field, so don't let them kill your interest. You need to be the balance. Put up a fight.

    jonathan-stafford : The repetition isn't true any longer. If you use JPA annotations, you only need to specify things once. Hibernate will even build your database create statements for you, although that's most useful for determining if your mapping was correct.
  • Let me first say that ORMs can make your development life easier if integrated properly, but there are a handful of problems where the ORM can actually prevent you from achieving your stated requirements and goals.

    I have found that when designing systems that have heavy performance requirements that I am often challenged to find ways to make the system more performant. Many times, I end up with a solution that has a heavy write performance profile (meaning we're writing data a lot more than we're reading data). In these cases, I want to take advantage of the facilities the database platform offers to me in order to reach our performance goals (it's OLTP, not OLAP). So if I'm using SQL Server and I know I have a lot of data to write, why wouldn't I use a bulk insert... well, as you may have already discovered, most ORMS (I don't know if even a single one does) do not have the ability to take advantage of platform specific advantages like bulk insert.

    You should know that you can blend the ORM and non-ORM techniques. I've just found that there are a handful of edge cases where ORMs can not support your requirements and you have to work around them for those cases.

    From Ajaxx
  • There are two aspects of ORMs that are worrisome. First, they are code written by someone else, sometimes closed source, sometimes open source but huge in scope. Second, they copy the data.

    The first problem causes two issues. You are relying on outsiders code. We all do this, but the choice to do so should not be taken lightly. And what if it doesn't do what you need? When will you discover this? You live inside the box that your ORM draws for you.

    The second problem is one of two phase commit. The relational database is being copied to a object model. You change the object model and it is supposed to update the database. This is a two phase commit and not the easiest thing to debug.

    Jason Bunting : Umm...if you are using, let's say .NET, you are relying on their code (which you cannot modify). I don't see how that is any more "ok" than relying on NHibernate's code, for example. Being paranoid of libraries you have not written yourself is relatively ridiculous. Sounds like you suffer from NIH
    Javier : compilers and VMs are a relatively stable part of CS, and not so hard to veryfy with tests. an ORM design has a lot of ways to be 'slightly wrong', or incomplete documentation. all this makes it harder to trust than something so basic as the compiler.
    dacracot : It is a warning... not a do not use statement. And that is only one warning out of three issues.
    Adam Byrtek : @dacracot: You are relying on lots of foreign code all the time... starting from your operating system, so I don't think your point is a valid one. Second point can be mitigated using optimistic locking, moreover it doesn't have much to do with two-phase commit.
    Tom : dacracot is spot-on when speaking of some of the less mature ORMs. I'm sure there are some that rock balls, but most will be imperfect, and it can be an issue in some (not most) environments. Regarding the two-phase commit... there are ways to model the DB transaction itself in code, but why add a layer of indirection that doesn't provide any abstraction?
    From dacracot
  • Personally, i have (until recently) opposed to use an ORM, and used to get by with writing a data access layer encapsulating all the SQL commands. The main objection to ORMs was that I didn't trust the ORM implementation to write exactly the right SQL. And, judging by the ORMs i used to see (mostly PHP libraries), i think i was totally right.

    Now, most of my web development is using Django, and i found the included ORM really convenient, and since the data model is expressed first in their terms, and only later in SQL, it does work perfectly for my needs. I'm sure it wouldn't be too hard to outgrow it and need to supplement with hand-written SQL; but for CRUD access is more than enough.

    I don't know about NHibernate; but i guess it's also "good enough" for most of what you need. But if other coders don't trust it; it will be a prime suspect on every data-related bug, making verification more tedious.

    You could try to introduce it gradually in your workplace, focus first on small 'obvious' applications, like simple data access. After a while, it might be used on prototypes, and it might not be replaced...

    From Javier
  • Hi, I think that maybe when you work on bigger systems you can use a code generator tool like CodeSmith instead of a ORM... I recently found this: Cooperator Framework which generates SQL Server Stored Procedures and also generates your business entities, mappers, gateways, lazyload and all that stuff in C#...check it out...it was written by a team here in Argentina...

    I think it's in the middle between coding the entire data access layer and use a ORM...

    Cheers from Argentina!

    PS. sory my english s@$cks

    sheepsimulator : @palboide86 - Your English in this answer is excellent.
    From pabloide86
  • First off - using an ORM will not make your code any easier to test, nor will it necessarily provide any advantages in a Continuous Integration scenerio.

    In my experience, whilst using an ORM can increase the speed of development, the biggest issues you need to address are:

    1. Testing your code
    2. Maintaining your code

    The solutions to these are:

    1. Make your code testable (using SOLID principles)
    2. Write automated tests for as much of the code as possible
    3. Run the automated tests as often as possible

    Coming to your question, the two objections you list seem more like ignorance than anything else.

    Not being able to write SELECT queries by hand (which, I presume, is why the copy-paste is needed) seems to indicate that there's a urgent need for some SQL training.

    There are two reasons why I'd not use an ORM:

    1. It is strictly forbidden by the company's policy (in which case I'd go work somewhere else)
    2. The project is extremely data intensive and using vendor specific solutions (like BulkInsert) makes more sense.

    The usual rebuffs about ORMs (NHibernate in particular) are:

    1. Speed

      There is no reason why using an ORM would be any slower than hand coded Data Access. In fact, because of the caching and optimisations built into it, it can be quicker. A good ORM will produce a repeatable set of queries for which you can optimise your schema. A good ORM will also allow efficient retrieval of associated data using various fetching strategies.

    2. Complexity

      With regards to complexity, using an ORM means less code, which generally means less complexity. Many people using hand-written (or code generated) data access find themselves writing their own framework over "low-level" data access libraries (like writing helper methods for ADO.Net). These equate to more complexity, and, worse yet, they're rarely well documented, or well tested.
      If you are looking specifically at NHibernate, then tools like Fluent NHibernate and Linq To NHibernate also soften the learning curve.

    The thing that gets me about the whole ORM debate is that the same people who claim that using an ORM will be too hard/slow/whatever are the very same people who are more than happy using Linq To Sql or Typed Datasets. Whilst the Linq To Sql is a big step in the right direction, it's still light years behind where some of the open source ORMs are. However, the frameworks for both Typed Datasets and for Linq To Sql is still hugely complex, and using them to go too far of the (Table=Class) + (basic CRUD) is stupidly difficult.

    My advice is that if, at the end of the day, you can't get an ORM, then make sure that your data access is separated from the rest of the code, and that you you follow the Gang Of Four's advice of coding to an interface. Also, get a Dependancy Injection framework to do the wiring up.

    (How's that for a rant?)

    From David Kemp
  • If it is an OLAP database (e.g. static, read-only data used for reporting/analytics, etc.) then implementing an ORM framework is not appropriate. Instead, using the database's native data access functionality such as stored procedures would be preferable. ORMs are better suited for transactional (OLTP) systems.

    From Ray Vega
  • I suggest this reading for a list of the downsides of ORMs.

    http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

    For my self, I've found ORMs very useful for most applications I've written!

    /Asger

  • The experience I've had with Hibernate is that its semantics are subtle, and when there's problems, it's a bit hard to understand what's going wrong under the hood. I've heard from a friend that often one starts with Criteria, then needs a bit more flexibility and needs HQL, and later notices that after all, raw SQL is needed (for example, Hibernate doesn't have union AFAIK).

    Also with ORM, people easily tend to overuse existing mappings/models, which leads to that there's an object with lots of attributes that aren't initiliazed. So after the query, inside transaction Hibernate makes additional data fetching, which leads to potential slow down. Also sadly, the hibernate model object is sometimes leaked into the view architecture layer, and then we see LazyInitializationExceptions.

    To use ORM, one should really understand it. Unfortunately one gets easily impression that it's easy while it's not.

    Tom : Hibernate doesn't support UNION? That's a pretty fundamental part of set theory! I suppose it just indicates a different way to think about the problem.
    From egaga
  • The sweet spot of ORMs

    ORMs are useful for automating the 95%+ of queries where they are applicable. Their particular strength is where you have an application with a strong object model architecture and a database that plays nicely with that object model. If you're doing a new build and have strong modelling skills on your team then you will probably get good results with an ORM.

    You may well have a handful of queries that are better done by hand. In this case, don't be afraid to write a few stored procedures to handle this. Even if you intend to port your app across multiple DBMS platforms the database dependent code will be in a minority. Bearing in mind that you will need to test your application on any platform on which you intend to support it, a little bit of extra porting effort for some stored procedures isn't going to make a lot of difference to your TCO. For a first approximation, 98% portable is just as good as 100% portable, and far better than convoluted or poorly performing solutions to work around the limits of an ORM.

    I have seen the former approach work well on a very large (100's of staff-years) J2EE project.

    Where an ORM may not be the best fit

    In other cases there may be approaches that suit the application better than an ORM. Fowler's Patterns of Enterprise Application Architecture has a section on data access patterns that does a fairly good job of cataloguing various approaches to this. Some examples I've seen of situations where an ORM may not be applicable are:

    • On an application with a substantial legacy code base of stored procedures you may want to use a functionally oriented (not to be confused with functional languages) data access layer to wrap the incumbent sprocs. This re-uses the existing (and therefore tested and debugged) data access layer and database design, which often represents quite a substantial development and testing effort, and saves on having to migrate data to a new database model. It is often quite a good way wrapping Java layers around legacy PL/SQL code bases, or re-targeting rich client VB, Powerbuilder or Delphi apps with web interfaces.

    • A variation is where you inherit a data model that is not necessarily well suited to O-R mapping. If (for example) you are writing an interface that populates or extracts data from a foreign interface you may be better off working direclty with the database.

    • Financial applications or other types of systems where cross-system data integrity is important, particularly if you're using complex distributed transactions with two-phase commit. You may need to micromanage your transactions better than an ORM is really suited to supporting.

    • High-performance applications where you want to really tune your database access. In this case, it may be preferable to work at a lower level.

    • Situations where you're using an incumbent data access mechanism like ADO.Net that's 'good enough' and playing nicely with the platform is of greater benefit than the ORM brings.

    • Sometimes data is just data - it may be the case (for example) that your application is working with 'transactions' rather than 'objects' and that this is a sensible view of the domain. An example of this might be a financials package where you've got transactions with configurable analysis fields. While the application itself may be built on an O-O platform, it is not tied to a single business domain model and may not be aware of much more than GL codes, accounts, document types and half a dozen analysis fields. In this case the application isn't aware of a business domain model as such and an object model (beyond the ledger structure itself) is not relevant to the application.

    ObiWanKenobi : "or other types of systems where data integrity is important"... Social websites aside, if the integrity of your data is not important, why bother building a system at all?
    ConcernedOfTunbridgeWells : The point specifically refers to distributed transactions where multiple systems must guarantee a commit or rollback synchronously or off a message queue. Not all of these systems will be built for you and therefore will not necessarily support an ORM. You may have to explicitly manage the transactions, which may require that you use a lower level tooklit than an ORM.
  • When you need to update 50000000 records. Set a flag or whatever.

    Try doing this using an ORM without calling a stored procedure or native SQL commands..

    dotjoe : http://docs.jboss.org/hibernate/core/3.3/reference/en/html/batch.html#batch-direct

0 comments:

Post a Comment