Sunday, March 20, 2011

What's the actual file size?

(Only programmers would know)

In Windows XP, under File Properties, what's the actual file size: Size or Size on disk?

I'm looking for the exact size of the file in bytes (counting 1 KB as 1,024 bytes, not 1,000 bytes).


Great! So blame it on the disk! Why not call it "Size Windows takes to store this file"?

From Stack Overflow
  • Size. Size on disk is the number of blocks required to store the file on the drive, multiplied by the block size.

  • Size is the actual number of bytes in the file.

    Size on disk is the number of bytes the file takes up on disk. This is generally a higher number because disk space can only be allocated in blocks, so there will be extra slack space reserved on disk at the end of the file to fill up the last block. But this slack space is not actually part of the file.

    Tony Lambert : see my comment.
    YotaXP : As evidence of this, try setting the 'compress' attribute on a relatively large file. Next time you view the Properties dialog of that file, the Size on disk field will likely show a smaller number than the Size field. This compression is a feature of NTFS itself.
  • Mmmm, define "real". The file will always occupy some integral number of blocks; I forget what the blocksize is in the Windows file system (I seriously haven't used Windows in about 10 years) but I'm sure you can get the number of blocks out of DIR or, worst case, by installing Cygwin and using ls. So, the "real" file size is blocksize×number of blocks in that sense.

    Or, since the file can end in the middle of the block, the "real" file size could be the actual number of bytes, in which case wc -c would give that.

  • Windows has always calculated the file size correctly when showing in larger units (1KB = 1,024B, etc).

    Size is the actual file size, while size on disk is the actual space taken on the hard drive. Since hard drives are broken up into blocks, a file has to consume whole blocks. It would be like having buckets that each hold 4L when you have 5L of water: you would need two buckets to hold the water.

  • Size on disk refers to the fact that the disk is divided into blocks, so on a disk using 4 KB blocks a small file will report a size on disk of 4 KB.

    I made a text file with one character in it, and it correctly reports that the size of the file is 1 byte. Size on disk is 4 KB because that is the smallest space on disk that a file can take.

  • NTFS can store very small files inside the file's Master File Table (MFT) record instead of in separate blocks, so they effectively take no extra space on disk and are very quick to access.

    Tony

  • I don't believe this is a Windows-specific issue. Because disk space is allocated in blocks, a file will take up more space on disk than its actual size whenever the file size is not an exact multiple of the disk's block size.

    Consider:

    File         |------3.4k------|   |-------------4.1k--------| 
    Disk Blocks  |--------4k----------|--------4k----------|--------4k----------|
    

    Files on disk must be aligned to the allocated blocks (they must start where a block starts). For example, if a file's actual size is 3.4k and the disk block size is 4k, the file's size on disk is 4k. Even though there are only 3.4k of data in the file, it effectively takes up 4k on the disk because the remainder of that block can't be used for anything else.

  • Before flaming Windows about wasting that space on the disk, you should understand a bit about how pretty much all filesystems manage space on hard disks.

    The "smallest" addressable lump of space on a HDD is a sector (often 512 bytes), and you cannot read or write smaller than that. If you want to write one byte in a sector, you must read the sector, modify the byte, and write the sector back. That's how the physics/electronics work. A "block" is usually one or more sectors, and represents the smallest chunk of space that the filesystem will allocate for a file. Old DOS "FAT-Based" filesystems increased the block size as disk size increased.

    For many reasons, filesystems prefer NOT to have multiple files sharing the same sector. Imagine if you want to make a file a little longer - that "wasted space" at the end of the sector can be used, before additional sectors are allocated to make the file bigger.

    And, while it's theoretically possible to have a filesystem which can allocate disk space on a per-byte (or even per-bit) basis, the amount of housekeeping storage needed on disk to keep track of this would rocket and probably outweigh any advantages you might have gained. And the performance would likely be very poor...
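
The rounding that several of the answers above describe can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how Windows itself computes the value: the function names `size_on_disk` and `slack_space` are made up for this example, and the 4096-byte default block size is an assumption (real volumes vary). It also ignores the special cases mentioned above, such as NTFS compression and very small files that NTFS stores without allocating a block.

```python
def size_on_disk(size_bytes: int, block_size: int = 4096) -> int:
    """Round a file's byte count up to a whole number of blocks.

    block_size defaults to 4096 purely as an assumption for this sketch.
    """
    if size_bytes < 0 or block_size <= 0:
        raise ValueError("size_bytes must be >= 0 and block_size must be > 0")
    # Ceiling division: a partly filled last block still counts as a full block.
    blocks = (size_bytes + block_size - 1) // block_size
    return blocks * block_size


def slack_space(size_bytes: int, block_size: int = 4096) -> int:
    """Bytes reserved in the last block that are not part of the file."""
    return size_on_disk(size_bytes, block_size) - size_bytes


if __name__ == "__main__":
    # 1 byte, roughly 3.4k, exactly one block, roughly 4.1k
    for size in (1, 3482, 4096, 4198):
        print(size, size_on_disk(size), slack_space(size))
```

With these assumptions, the one-character text file occupies one 4096-byte block, the 3.4k file also fits in one block, and the 4.1k file spills into a second block, which matches the diagram above.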
