Friday, April 8, 2011

varchar Fields - Is a Power of Two More Efficient?

Is it more efficient to size a varchar field as a power of two rather than as some other number? I'm thinking no, because SQL Server's table designer defaults to varchar(50). However, I've heard (but never confirmed) that sizing fields as a power of 2 is more efficient because such sizes line up evenly with bytes, and computers process data in bits and bytes. So, does a field declared as varchar(32) or varchar(64) have any real benefit over varchar(50)?

From Stack Overflow:
  • I always thought people choose powers of two for varchar fields because we're geeks and that's what we do. At least that's what I've always done.

  • It depends on the specific database implementation, but I wouldn't expect it to. Generally, the database doesn't perform any calculations based on the declared number of characters, so it shouldn't affect performance, only space.

  • The only tangible benefit you will see from using certain maximum lengths is in the storage space required for the VARCHAR. A maximum length over 255 would require an extra byte to store the length of the value in each row (and an extra 2 bytes for lengths of 256^2 or greater, and so on).

  • For SQL in general? No. For a specific implementation, maybe.

    What is more efficient is not determined by a specification (SQL is just a specification), but how it is implemented in a certain DBMS.

    dmckee : One possible place for this to occur would be in the in-memory format for rows; 2^n lengths should reduce the need for alignment offsets. Usually a small effect, I suspect.
    Juliano : Memory alignment happens in multiples, not in powers. On a 32-bit aligned architecture, the best byte offsets are 4×n, for example.
  • No.

    In some other uses, there are advantages to using structures with a power-of-two size, mostly because you can fit a nice (power-of-two) number of them inside another power-of-two-sized structure. But this doesn't apply to a DB field size.

    The only power-of-two sizing related to VARCHARs is about the exact type of varchar (or TEXT/BLOB in some SQL dialects). If the maximum length is less than 256, a single byte can indicate the length. If it's less than 65536 (64KB), two bytes are enough; three bytes cover lengths below 16777216 (16MB), and four bytes cover lengths below 4294967296 (4GB).

    Also, it can be argued that VARCHAR(50) is just as expensive as VARCHAR(255), since both need n+1 bytes of storage for a value of n bytes.

    Of course, that's before thinking of Unicode...
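The length-prefix rule described in the answers can be sketched in a few lines of Python. This is a simplified model, not any engine's actual code. It assumes an unsigned length prefix that is just wide enough for the column's declared maximum byte length. That roughly matches MySQL's documented storage for VARCHAR (1 or 2 length bytes) and for its TINYTEXT/TEXT/MEDIUMTEXT/LONGTEXT family (1 to 4 bytes). The function names `length_prefix_bytes` and `stored_bytes`, and the "up to 4 bytes per character" figure used for a UTF-8 column, are illustrative assumptions.

```python
# Simplified model (an assumption, not any engine's real code): a VARCHAR value is
# stored as its encoded bytes plus an unsigned length prefix whose width is chosen
# from the column's *declared* maximum byte length.


def length_prefix_bytes(max_bytes: int) -> int:
    """Smallest k such that every length 0..max_bytes fits in k unsigned bytes."""
    k = 1
    while max_bytes >= 256 ** k:
        k += 1
    return k


def stored_bytes(value: str, max_chars: int, max_bytes_per_char: int = 1,
                 encoding: str = "latin-1") -> int:
    """Encoded length of `value` plus the prefix implied by the declaration."""
    data = value.encode(encoding)
    # The prefix width depends on the worst case the column could hold,
    # not on the length of this particular value.
    prefix = length_prefix_bytes(max_chars * max_bytes_per_char)
    return len(data) + prefix


if __name__ == "__main__":
    # Thresholds fall at 256, 65536, 16777216 ..., not at 32 or 64.
    for n in (255, 256, 65535, 65536, 16777215, 16777216):
        print(n, length_prefix_bytes(n))
    # A 5-character ASCII value costs n + 1 bytes in both VARCHAR(50) and VARCHAR(255).
    print(stored_bytes("hello", 50), stored_bytes("hello", 255))
    # Assuming up to 4 bytes per character, VARCHAR(64) may need 256 bytes,
    # so it crosses into a 2-byte prefix.
    print(stored_bytes("naïve", 64, 4, "utf-8"))
```

Under this model, VARCHAR(50) and VARCHAR(255) store "hello" in the same 6 bytes, and nothing about 32 or 64 is special. The only thresholds are 256, 65536 and so on, measured in bytes rather than characters. A multi-byte character set can push a modest-looking declaration like VARCHAR(64) past 255 bytes, which is one way Unicode complicates the picture.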
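The alignment point raised in the comments (offsets aligned to multiples, not powers) can be checked with a little arithmetic. This hypothetical sketch assumes each field is padded so the next one starts on a fixed boundary: 4 bytes here, matching the 32-bit example. The names `align_up` and `padding` are illustrative. Real row formats are engine-specific and often pack variable-length data without any padding at all.

```python
# Hypothetical model: each field is padded so the next one starts on a multiple
# of `alignment` bytes. Real row formats are engine-specific.


def align_up(offset: int, alignment: int) -> int:
    """Round `offset` up to the next multiple of `alignment` (any positive int)."""
    return (offset + alignment - 1) // alignment * alignment


def padding(size: int, alignment: int = 4) -> int:
    """Padding bytes needed after a field of `size` bytes."""
    return align_up(size, alignment) - size


if __name__ == "__main__":
    # On a 4-byte boundary, 52 and 60 are just as "aligned" as 64.
    for size in (50, 52, 60, 64):
        print(size, padding(size))
```

A 52- or 60-byte field needs no padding on a 4-byte boundary, just like a 64-byte one. That is why alignment favors multiples of the word size rather than powers of two.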
