Thursday, March 24, 2011

Database column sizes for character-based data

I've just come across a database where all the character-based columns have their size specified in multiples of 8 (e.g. Username = varchar(32), Forename = nvarchar(128), etc.). Would this have any bearing on the performance of the database, or is it completely pointless?
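
For illustration, the tables look something like this (the table name and the extra columns here are made up, but the sizing pattern is exactly what I'm seeing):

    CREATE TABLE Users (
        Username  varchar(32),     -- 32  = 4  x 8
        Forename  nvarchar(128),   -- 128 = 16 x 8
        Surname   nvarchar(128),
        Notes     nvarchar(256)    -- 256 = 32 x 8
    )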

Thanks.

N.B. This is in a SQL 2000 database.

From stackoverflow
  • Looks like premature optimisation to me.

    I don't think this will make any difference to the performance of the database.

  • Since they are varchars, the actual space used depends on the content. I would start to worry if they were chars. (A quick sketch of the difference is below.)

    -Edoode
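
    A minimal sketch of the difference (throwaway table, but DATALENGTH really does behave this way):

        -- varchar stores only the bytes actually used; char pads to the declared length
        CREATE TABLE #SpaceDemo (
            v varchar(32),
            c char(32)
        )

        INSERT INTO #SpaceDemo (v, c) VALUES ('abc', 'abc')

        SELECT DATALENGTH(v) AS varchar_bytes,  -- 3
               DATALENGTH(c) AS char_bytes      -- 32 (padded out with trailing spaces)
        FROM #SpaceDemo

        DROP TABLE #SpaceDemo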

  • It's probably just a habit of some old-school developer. As has been said, a varchar only takes up as much space as its contents need, so whether the limit is 32 or 33 doesn't matter when the actual string length is, for example, 22.

  • "Would this have any bearing on the performance of the database or is it completely pointless?"

    It has little bearing on performance (not none, but little). It can have an impact on space allocation, since the largest possible row size can be quite large with this scheme. Fewer rows per page can slow down retrievals from large tables. (There's a sketch of how the declared sizes add up at the end of this answer.)

    The difference, however, is usually microscopic compared to improper indexing.

    And getting the sizes "right" is not worth the time. In the olden days, old-school DBAs sweated over every character. Disks used to be expensive, and every byte had a real $$$ impact on cost.

    Now that disk is so cheap, DBA time wasted in fussing over sizes costs more than the disk.
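
    As a rough sketch of what I mean by the largest row size (hypothetical table, and this is SQL 2000 behaviour - 2005 relaxes it with row overflow), the declared maximums are what count against the 8060-byte in-row limit, regardless of what you actually store:

        -- The declared maximums, not the actual data, determine the potential row size.
        -- 4000 + 4000 + 512 bytes already exceeds the 8060-byte limit, so SQL Server 2000
        -- warns at CREATE time that inserts or updates may fail if a row really grows that large.
        CREATE TABLE WideDemo (
            Col1 varchar(4000),
            Col2 varchar(4000),
            Col3 varchar(512)
        )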

  • Would you have the same concern if they were all multiples of 5 or 10, which is what happens in the normal case?

  • I've never seen it make a difference in the table storage, joins or basic operations.

    But I have seen it make a difference where string processing is involved.

    In SQL Server 2005, the only cases I've seen where varchar size makes a significant difference are when things are cast to varchar(MAX) or passed through UDFs. There does appear to be some difference there - for instance, I have a system I'm porting where a number of key columns have to be concatenated into another pseudo key field (until I can refactor out this abomination), and the query performed significantly better when the result was cast to varchar(45) as early as possible in the intermediate CTEs and nested queries.

    I've seen cases where a UDF taking and returning a varchar(MAX) performs significantly worse than one taking and returning, say, varchar(50) - for instance, a trimming or padding function which someone (perhaps me!) was trying to make future-proof. varchar(MAX) has its place, but in my experience it can be dangerous to performance. (There's a sketch of the kind of pair I mean below.)

    I do not think I profiled a difference between varchar(50) and varchar(1000).
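
    As a rough sketch of that kind of pair (the names and the padding logic are just placeholders), the two functions below are identical apart from the declared parameter and return sizes; in my experience the varchar(MAX) version can come out noticeably slower when called across a large result set:

        -- Hypothetical padding UDFs: same logic, different declared sizes.
        CREATE FUNCTION dbo.PadLeft50 (@s varchar(50), @len int)
        RETURNS varchar(50)
        AS
        BEGIN
            RETURN RIGHT(REPLICATE(' ', @len) + @s, @len)
        END
        GO

        CREATE FUNCTION dbo.PadLeftMax (@s varchar(MAX), @len int)
        RETURNS varchar(MAX)
        AS
        BEGIN
            RETURN RIGHT(REPLICATE(' ', @len) + @s, @len)
        END
        GO

        -- Compare, for example:
        -- SELECT dbo.PadLeft50(SomeColumn, 45)  FROM dbo.SomeLargeTable
        -- SELECT dbo.PadLeftMax(SomeColumn, 45) FROM dbo.SomeLargeTable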
