Monday, February 7, 2011

How do you manage SQL queries?

At the moment my code (PHP) has too many SQL queries in it, e.g.:

// not a real example, but you get the idea...
$results = $db->GetResults("SELECT * FROM sometable WHERE iUser=$userid");
if ($results) {
    // Do something
}

I am looking into using stored procedures to reduce this and make things a little more robust, but I have some concerns.

I have hundreds of different queries in use around the web site, and many of them are quite similar. How should I manage all these queries when they are removed from their context (the code that uses the results) and placed in a stored procedure on the database?

  • Use an ORM package. Any half-decent package will allow you to:

    1. Get simple result sets
    2. Keep your complex SQL close to the data model

    If you have very complex SQL, views are also a nice way to make it more presentable to the different layers of your application.

  • We were in a similar predicament at one time. We queried a specific table in more than 50 different ways.

    What we ended up doing was creating a single Fetch stored procedure that takes the WhereClause as a parameter. The WhereClause was constructed in a Provider object (we employed the Facade design pattern), where we could scrub it for SQL injection attacks.

    So as far as maintenance goes, it is easy to modify. SQL Server is also quite the chum and caches the execution plans of dynamic queries, so the overall performance is pretty good.

    You'll have to determine the performance drawbacks based on your own system and needs, but all in all, this works very well for us.
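
    A minimal PHP sketch of the criteria-builder idea follows. It is not the original implementation: the class internals, the allowlists and the method names are assumptions, and it swaps the scrubbed WhereClause string for placeholders plus a separate list of values to bind.

    ```php
    <?php
    // Hypothetical sketch: the original FilterCriteria internals are not shown,
    // so the allowlists and method names here are assumptions.
    class FilterCriteria
    {
        private $allowedColumns;
        private $allowedOperators = array('=', '<>', '<', '>', '<=', '>=', 'LIKE');
        private $conditions = array();
        private $params = array();

        public function __construct(array $allowedColumns)
        {
            $this->allowedColumns = $allowedColumns;
        }

        // Add one "column operator value" condition; reject anything not allowlisted.
        public function add($column, $operator, $value)
        {
            if (!in_array($column, $this->allowedColumns, true)) {
                throw new InvalidArgumentException("Unknown column: $column");
            }
            if (!in_array($operator, $this->allowedOperators, true)) {
                throw new InvalidArgumentException("Unsupported operator: $operator");
            }
            $this->conditions[] = "$column $operator ?";
            $this->params[] = $value;
            return $this; // allow chaining
        }

        // The clause text contains only allowlisted names and placeholders.
        public function getWhereClause()
        {
            return $this->conditions ? 'WHERE ' . implode(' AND ', $this->conditions) : '';
        }

        // Values to bind, in the same order as the placeholders.
        public function getParams()
        {
            return $this->params;
        }
    }

    $criteria = new FilterCriteria(array('iUser', 'status'));
    $criteria->add('iUser', '=', 42)->add('status', '<>', 'deleted');

    $where = $criteria->getWhereClause();
    $params = $criteria->getParams();
    ```

    The caller never writes SQL text directly, so the 50+ variations collapse into different sequences of add() calls against one fetch routine.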

    mattruma : We also created an object, called FilterCriteria, that was a *black-box* for building the actual query string.
    From mattruma
  • This other question also has some useful links in it...

    From rikh
  • First up, you should use placeholders in your query instead of interpolating the variables directly. PDO/MySQLi allow you to write your queries like:

    SELECT * FROM sometable WHERE iUser = ?
    

    The API will safely substitute the values into the query.
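
    As a rough sketch of what that looks like with PDO (the in-memory SQLite database and the table contents are stand-ins I made up so the snippet runs on its own; swap in your real connection):

    ```php
    <?php
    // Sketch only: an in-memory SQLite database stands in for the real one,
    // and the table contents are made up for illustration.
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE sometable (iUser INTEGER, note TEXT)');
    $pdo->exec("INSERT INTO sometable VALUES (1, 'first')");
    $pdo->exec("INSERT INTO sometable VALUES (2, 'second')");

    $userid = 2; // would normally come from the request

    // The value is passed separately from the SQL text, never spliced into it.
    $stmt = $pdo->prepare('SELECT * FROM sometable WHERE iUser = ?');
    $stmt->execute(array($userid));
    $results = $stmt->fetchAll(PDO::FETCH_ASSOC);

    if ($results) {
        // Do something
    }
    ```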

    I also prefer to have my queries in the code instead of the database. It's a lot easier to work with an RCS when the queries are with your code.

    I have a rule of thumb when working with ORMs: if I'm working with one entity at a time, I'll use the interface. If I'm reporting/working with records in aggregate, I typically write SQL queries to do it. This means there are very few queries in my code.

  • There are some libraries, such as MDB2 in PEAR that make querying a bit easier and safer.

    Unfortunately, they can be a bit wordy to set up, and you sometimes have to pass them the same info twice. I've used MDB2 in a couple of projects, and I tended to write a thin veneer around it, especially for specifying the types of fields. I generally make an object that knows about a particular table and its columns, and then a helper function in that fills in field types for me when I call an MDB2 query function.

    For instance:

    function MakeTableTypes($TableName, $FieldNames)
    {
        $Types = array();
    
        foreach ($FieldNames as $FieldName => $FieldValue)
        {
            $Types[] = $this->Tables[$TableName]['schema'][$FieldName]['type'];
        }
    
        return $Types;
    }
    

    Obviously this object has a map of table names -> schemas that it knows about, and just extracts the types of the fields you specify, and returns a matching type array suitable for use with an MDB2 query.

    MDB2 (and similar libraries) then handle the parameter substitution for you, so for update/insert queries, you just build a hash/map from column name to value, and use the 'autoExecute' functions to build and execute the relevant query.

    For example:

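    function UpdateArticle($Article)
    {
        // Assumed table name; it must match a key in $this->Tables.
        $table_name = 'article';
        $Types = $this->MakeTableTypes($table_name, $Article);
    
        // Requires $this->MDB2->loadModule('Extended') to have been called.
        $res = $this->MDB2->extended->autoExecute($table_name,
            $Article,
            MDB2_AUTOQUERY_UPDATE,
            'id = '.$this->MDB2->quote($Article['id'], 'integer'),
            $Types);
    
        return $res;
    }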
    

    and MDB2 will build the query, escaping everything properly, etc.

    I'd recommend measuring performance with MDB2 though, as it pulls in a fair bit of code that might cause you problems if you're not running a PHP accelerator.

    As I say, the setup overhead seems daunting at first, but once it's done the queries can be simpler/more symbolic to write and (especially) modify. I think MDB2 should know a bit more about your schema, which would simplify some of the commonly used API calls, but you can reduce the annoyance of this by encapsulating the schema yourself, as I mentioned above, and providing simple accessor functions that generate the arrays MDB2 needs to perform these queries.

    Of course you can just do flat SQL queries as a string using the query() function if you want, so you're not forced to switch over to the full 'MDB2 way' - you can try it out piecemeal, and see if you hate it or not.

    From Slacker
  • I had to clean up a project with many (duplicate/similar) queries riddled with injection vulnerabilities. The first steps I took were using placeholders and labelling every query with the object/method and source line where the query was created. (Insert the PHP constants __METHOD__ and __LINE__ into a SQL comment line.)

    It looked something like this:

    -- @Line:151 UserClass::getuser():

    SELECT * FROM USERS;
    

    Logging all queries for a short time supplied me with some starting points on which queries to merge. (And where!)
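
    Something along these lines would produce that label. The labelQuery() helper is a hypothetical name for this sketch, not something from the project itself:

    ```php
    <?php
    // Hypothetical helper: prefixes a query with a SQL comment naming its origin.
    function labelQuery($sql, $method, $line)
    {
        // Strip line breaks so the label cannot spill out of the "--" comment.
        $method = str_replace(array("\r", "\n"), ' ', $method);
        return sprintf("-- @Line:%d %s():\n%s", $line, $method, $sql);
    }

    class UserClass
    {
        public function getUser()
        {
            // __METHOD__ and __LINE__ are filled in by PHP at the call site.
            return labelQuery('SELECT * FROM USERS WHERE id = ?', __METHOD__, __LINE__);
        }
    }

    $userClass = new UserClass();
    $sql = $userClass->getUser();
    ```

    Because the constants are passed in by the caller, the label points at the place the query was written rather than at the helper.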

    From Wimmer
  • I try to use fairly generic functions and just pass the differences in to them. This way you only have one function to handle most of your database SELECTs. Obviously you can create another function to handle all your INSERTs.

    eg.

    function getFromDB($table, $wherefield=null, $whereval=null, $orderby=null){
        // $table, $wherefield and $orderby must come from trusted code, never from user input
        if($wherefield != null){ $q = "SELECT * FROM $table WHERE $wherefield = '".mysql_real_escape_string($whereval)."'"; }
        else { $q = "SELECT * FROM $table"; }
        if($orderby != null){ $q .= " ORDER BY ".$orderby; }
    
        $result = mysql_query($q) or die("ERROR: ".mysql_error());
        $records = array(); // so an empty result returns an empty array
        while($row = mysql_fetch_assoc($result)){
            $records[] = $row;
        }
        return $records;
    }
    

    This is just off the top of my head, but you get the idea. To use it just pass the function the necessary parameters:

    eg.

    $blogposts = getFromDB('myblog', 'author', 'Lewis', 'date DESC');
    

    In this case $blogposts will be an array of arrays which represent each row of the table. Then you can just use a foreach or refer to the array directly:

    echo $blogposts[0]['title'];
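
    The same idea can be sketched with PDO instead of the mysql_* functions. This is only one possible shape: the fetchRows() name, the $schema allowlist, the separate sort-direction argument and the in-memory SQLite table are all assumptions made so the example runs on its own.

    ```php
    <?php
    // Sketch: a generic SELECT helper using PDO. The $schema allowlist and
    // the in-memory SQLite table are assumptions made for this example.
    function fetchRows(PDO $pdo, array $schema, $table, $whereField = null, $whereVal = null, $orderBy = null, $direction = 'ASC')
    {
        // Identifiers cannot be bound as parameters, so check them against a known list.
        if (!isset($schema[$table])) {
            throw new InvalidArgumentException("Unknown table: $table");
        }
        foreach (array($whereField, $orderBy) as $column) {
            if ($column !== null && !in_array($column, $schema[$table], true)) {
                throw new InvalidArgumentException("Unknown column: $column");
            }
        }

        $q = "SELECT * FROM $table";
        $params = array();
        if ($whereField !== null) {
            $q .= " WHERE $whereField = ?";
            $params[] = $whereVal; // the value is bound, not interpolated
        }
        if ($orderBy !== null) {
            $q .= " ORDER BY $orderBy " . ($direction === 'DESC' ? 'DESC' : 'ASC');
        }

        $stmt = $pdo->prepare($q);
        $stmt->execute($params);
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }

    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE myblog (title TEXT, author TEXT, date TEXT)');
    $pdo->exec("INSERT INTO myblog VALUES ('Older post', 'Lewis', '2011-01-01')");
    $pdo->exec("INSERT INTO myblog VALUES ('Newer post', 'Lewis', '2011-02-01')");

    $schema = array('myblog' => array('title', 'author', 'date'));
    $blogposts = fetchRows($pdo, $schema, 'myblog', 'author', 'Lewis', 'date', 'DESC');
    echo $blogposts[0]['title'], "\n";
    ```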
    
    From lewis
  • The best course of action for you will depend on how you are approaching your data access. There are three approaches you can take:

    • Use stored procedures
    • Keep the queries in the code (but put all your queries into functions and fix everything to use PDO for parameters, as mentioned earlier)
    • Use an ORM tool

    If all you want to do is get the raw SQL out of your PHP code but keep it relatively unchanged, then stored procedures would be the way to go. The stored procedures vs raw SQL debate is a bit of a holy war, but K. Scott Allen makes an excellent point - albeit a throwaway one - in an article about versioning databases:

    Secondly, stored procedures have fallen out of favor in my eyes. I came from the WinDNA school of indoctrination that said stored procedures should be used all the time. Today, I see stored procedures as an API layer for the database. This is good if you need an API layer at the database level, but I see lots of applications incurring the overhead of creating and maintaining an extra API layer they don't need. In those applications stored procedures are more of a burden than a benefit.

    I tend to lean towards not using stored procedures. I've worked on projects where the DB has an API exposed through stored procedures, but stored procedures can impose some limitations of their own, and those projects have all, to varying degrees, used dynamically generated raw SQL in code to access the DB.

    Having an API layer on the DB gives better delineation of responsibilities between the DB team and the Dev team, at the expense of some of the flexibility you'd have if the query was kept in the code. However, PHP projects are less likely to have teams sizable enough to benefit from this delineation.

    Conceptually, you should probably have your database versioned. Practically speaking, however, you're far more likely to have just your code versioned than you are to have your database versioned. You are likely to be changing your queries when you are making changes to your code, but if you are changing the queries in stored procedures stored against the database then you probably won't be checking those in when you check the code in and you lose many of the benefits of versioning for a significant area of your application.

    Regardless of whether or not you elect to use stored procedures, you should at the very least ensure that each database operation is stored in an independent function rather than being embedded into each of your page's scripts - essentially an API layer for your DB which is maintained and versioned with your code. If you're using stored procedures, this will effectively mean you have two API layers for your DB, one with the code and one with the DB, which you may feel unnecessarily complicates things if your project does not have separate teams. I certainly do.

    If the issue is one of code neatness, there are ways to make code with SQL jammed in it more presentable, and the UserManager class shown below is a good way to start - the class only contains queries which relate to the 'user' table, each query has its own method in the class and the queries are indented into the prepare statements and formatted as you would format them in a stored procedure.

    // UserManager.php:
    
    class UserManager
    {
        function getUsers()
        {
            $pdo = new PDO(...);
            // `group` is a reserved word in MySQL, so the table name is quoted
            $stmt = $pdo->prepare('
                SELECT       u.userId as id,
                             u.userName,
                             g.groupId,
                             g.groupName
                FROM         user u
                INNER JOIN   `group` g
                ON           u.groupId = g.groupId
                ORDER BY     u.userName, g.groupName
            ');
            $stmt->execute();
            // return every row as an associative array keyed by column name
            return $stmt->fetchAll(PDO::FETCH_ASSOC);
        }
    
        function getUser($id) {
            // db code here
        }
    }
    
    // index.php:
    require_once("UserManager.php");
    $um = new UserManager;
    $users = $um->getUsers();
    foreach ($users as $user) echo $user['userName'];
    

    However, if your queries are quite similar but you have huge numbers of permutations in your query conditions like complicated paging, sorting, filtering, etc, an Object/Relational mapper tool is probably the way to go, although the process of overhauling your existing code to make use of the tool could be quite complicated.

    If you decide to investigate ORM tools, you should look at Propel, the ActiveRecord and SqlMap components of PRADO, or my new favourite, Doctrine. Each of these gives you the ability to programmatically build queries to your database with all manner of complicated logic. Doctrine is the most fully featured, allowing you to template your database with things like the Nested Set tree pattern out of the box!

    In terms of performance, stored procedures are the fastest, but generally not by much over raw SQL. ORM tools can have a significant performance impact in a number of ways - inefficient or redundant querying, huge file IO while loading the ORM libraries on each request, dynamic SQL generation on each query... all of these things can have an impact, but the use of an ORM tool can drastically increase the power available to you with a much smaller amount of code than creating your own DB layer with manual queries.

    Gary Richardson is absolutely right though: if you're going to continue to use SQL in your code, you should always be using PDO's prepared statements to handle the parameters, regardless of whether you're using a query or a stored procedure. PDO handles the escaping of parameter values for you.

    // optional
    $attrs = array(PDO::ATTR_PERSISTENT => true);
    
    // create the PDO object
    $pdo = new PDO("mysql:host=localhost;dbname=test", "user", "pass", $attrs);
    
    // also optional, but it makes PDO raise exceptions instead of 
    // PHP errors which are far more useful for debugging
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    
    $stmt = $pdo->prepare('INSERT INTO venue(venueName, regionId) VALUES(:venueName, :regionId)');
    $stmt->bindValue(":venueName", "test");
    $stmt->bindValue(":regionId", 1);
    
    $stmt->execute();
    
    $lastInsertId = $pdo->lastInsertId();
    var_dump($lastInsertId);
    

    Caveat: assuming that the ID is 1, the above script will output 'string(1) "1"'. PDO->lastInsertId() returns the ID as a string regardless of whether the actual column is an integer or not. This will probably never be a problem for you as PHP performs casting of strings to integers automatically.

    The following will output 'bool(true)':

    // regular equality test
    var_dump($lastInsertId == 1);
    

    but if you have code that is expecting the value to be an integer, like is_int or PHP's "is really, truly, 100% equal to" operator:

    var_dump(is_int($lastInsertId));
    var_dump($lastInsertId === 1);
    

    you could run into some issues.
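
    One simple way around it is to cast the value at the point where you read it. A small sketch, using an in-memory SQLite table as a stand-in for the venue table (the venueId column is an assumption):

    ```php
    <?php
    // Sketch using an in-memory SQLite table as a stand-in for the venue table.
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE venue (venueId INTEGER PRIMARY KEY, venueName TEXT, regionId INTEGER)');

    $stmt = $pdo->prepare('INSERT INTO venue(venueName, regionId) VALUES(:venueName, :regionId)');
    $stmt->execute(array(':venueName' => 'test', ':regionId' => 1));

    // Cast once, at the point where the ID is read.
    $lastInsertId = (int) $pdo->lastInsertId();

    var_dump(is_int($lastInsertId));
    var_dump($lastInsertId === 1);
    ```

    Both calls now output 'bool(true)'.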

    Edit: Some good discussion on stored procedures here

    rikh : Very comprehensive answer. Thank you.
    From Shabbyrobe
  • Use an ORM framework like QCodo - you can easily map your existing database

    From
  • I'd move all the SQL to a separate Perl module (.pm). Many queries could reuse the same functions, with slightly different parameters.

    A common mistake for developers is to dive into ORM libraries, parametrized queries and stored procedures. We then work for months in a row to make the code "better", but it's only "better" in a development kind of way. You're not making any new features!

    Use complexity in your code only to address customer needs.

    From Andomar
