Sunday, January 23, 2011

Optimizing PHP<>MySQL performance

I am trying to optimize my PHP<>MySQL on this test script:

<?
  for($i=0;$i<100;$i++)//Iterations count
    $res.= var_dump(loadRow("select body_ru from articles where id>$i*50 limit 100"));

  print_r($res);
?>

I have APC, and article table have an index on id. Also, all these queries are hitting query cache, so sole MySQL performance if great.

But when I am using ab -c 10 -t 10 to bench this script, I am getting:

100 iterations: ~100req/sec (~10'000 MySQL queries per second)

5 iteration: ~200req/sec

1 iteration: ~380req/sec

0 itteration: ~580req/sec

I've tried to disable persistent connections in PHP - it made it slower a bit. So, how can I make it work faster, provided that MySQL is not limiting performance here?

So this looks like establishing connection or something else is taking too much time...

  • yes, i know you say that mysql is not limiting factor but anyway try this:

    <?
      for($i=0;$i<100;$i++){
        $down = $i*50;
        $up = ($i+1)*50;
        print_r(var_dump(loadRow("select body_ru from articles where id>$down and $id<=$up "));        
     }
    ?>
    

    mysql handles limit in strange way... it will generate the whole result anyway and then send only first n rows to the client. so filter on id from both upper and lower boundaries.

    slightly more stackoverflow-ish part:

    the code is not 100% equivalent of yours - in your case you've been receiving overlapping chunks.

    what is inside loadRow - does it use mysqli? mysql? some abstraction like adodb? if you tune for performance skip adodb-alike libraries.

    also - i would trust much more innodb_buffer_pool than query cache.

    please let us know how proposed change affected the performance.

    BarsMonster : loadRow is mysql_fetch_assoc(sqlexec($sql))) with some tiny stuff. As I was saying, all queries are in query cache, that means that their complexity does not matter at all, they all are cached. So doing this does not (and could not) improve performance.
    From pQd
  • mysql_pconnect is going to be faster, but, is going away (and I don't believe it is even supported with mysqli). Persistent connections remove the connection/teardown time.

    Is your mysql connecting via TCP or socket? If TCP, is compression or SSL turned on? Compression is good when you expect larger responses, but, on small responses you spend more cpu time compressing/decompressing on both sides. You want to use compression about 99% of the time. SSL connections between php and mysql will be definite performance hinderances.

    If you are connecting via TCP, skip-name-resolve in mysql will speed things up a bit. Using an IP rather than a domain name in your grants would then be required.

    However, your assumption that these queries would likely be in the cache is dependent on how large your cache is and how large your result set is. If your queries average 400k per result set (assuming a 4k article length), you would need at least a 40mb query cache to cache things. If you are sending back any integer fields, remember that the libmysqlclient library converts integers to ascii, and then php must convert it to whatever datatype when it receives the results.

    mysql_fetch_assoc is slightly slower than mysql_fetch_row, but, not enough that you would find it worthwhile to convert your code to use it.

    I believe you're probably running into an issue with the connection to the mysql server. If you're using TCP, you have a few changes you're going to want to make. If you're using a socket connection, then you're probably not going to be able to tweak too much more performance out of it.

    @pQd, in my experience, I have found that the query optimizer on a limit clause on an indexed query will be answered from the index, and using a ranged index wouldn't provide much change in the results. The ranged query also wouldn't take into account article IDs that were later deleted which would throw off pagination. Article 100-150 if 20 of those articles are deleted would return 30 rows, which may not fill your page properly. A limit clause for data presentation is usually the right way. The only time mysql will do the entire query is when you use sql_count_rows which is another useful tool for pagination. You want to make sure you are answering those queries from the index to prevent tablescans.

    pQd : can you point me to some source explaining handling of limit [and as you suggest using index for that] if there is no upper boundary for a key? i was living under the impression that mysql was always doing 'the stupid thing'. i based my answer on watching rows column in explain select ... on very large table where in one case i had open-ended range and limit and in another - range limited from lower and higher end.
    pQd : @cd34 - thanks.
    BarsMonster : mysql_pconnect is indeed a little faster than mysql_connect (~2-3%) I was using MySQL on localhost. After increasing size of query cache and switching to sockets from TCP, I've got nice speed improvement: 100 iter.: 188 rq/s (so I am getting 18800 SQL queries per second) 5: iter.: 377 rq/s 1: iter.: 514 rq/s 0 iter.: 526 rq/s
    BarsMonster : Forgot to turn on persistern connections: 100 iter.: 201 rq/s (so I am getting 20100 SQL queries per second) 5: iter.: 440 rq/s 1: iter.: 619 rq/s 0 iter.: 633 rq/s

0 comments:

Post a Comment