Friday, April 30, 2010

PC industry must use IEC standard for calculating storage

This joke stopped being funny a long time ago, so clean up your act, people!

I am talking about vendors of PC secondary storage who calculate drive capacity in multiples of a thousand. Technically, they are correct, because the SI system (http://en.wikipedia.org/wiki/SI_prefix) works that way. But computers do not count in decimal. They are binary and measure data in powers of two, multiples that the IEC finally defined and gave their own abbreviations about 10 years ago (http://en.wikipedia.org/wiki/Binary_prefix).

To illustrate, let us see what a storage vendor claims and what the OS actually sees.
Vendor - 1,000 (1 thousand or kilo) bytes = 1 KB. OS - 1,024 bytes = 1 KB.
Vendor - 1,000,000 (1 million or mega) bytes = 1 MB. OS - 1,048,576 bytes = 1 MB.
Vendor - 1,000,000,000 (1 billion or giga) bytes = 1 GB. OS - 1,073,741,824 bytes = 1 GB.
Now, the storage vendor is scientifically correct, because the SI system is decimal: 10^6, or one million units, qualifies for the "mega" prefix, and 10^9, or one billion units, qualifies for the "giga" prefix. In contrast, the OS looks at storage in binary, so 2^10 or 1,024 bytes is reported as one kilobyte. Similarly, 2^20 or 1,048,576 bytes is reported as one megabyte, 2^30 bytes as 1 GB, and 2^40 bytes as 1 TB.
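
To make the two counting systems concrete, here is a minimal Python sketch that prints the decimal and binary multipliers side by side. The dictionary names are my own, chosen for illustration only.

```python
# Decimal (SI) multipliers, as used on the drive's box.
SI = {"kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

# Binary (IEC) multipliers, which is how the OS actually counts.
IEC = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

# Walk both tables in step and show the byte count behind each unit.
for (si_name, si_bytes), (iec_name, iec_bytes) in zip(SI.items(), IEC.items()):
    print(f"1 {si_name} = {si_bytes:>17,} bytes | 1 {iec_name} = {iec_bytes:>17,} bytes")
```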

But because the vendor uses the SI system and the OS uses the binary system, people who buy a hard disk drive (HDD) of a certain capacity feel cheated when they see the OS reporting less space. For example, here are a few commonly seen HDD capacities and what the OS reports for each:
2.5 GB HDD = Actual formatted capacity of only 2.33 GB
20 GB HDD = Actual formatted capacity of only 18.63 GB
40 GB HDD = Actual formatted capacity of only 37.25 GB
80 GB HDD = Actual formatted capacity of only 74.51 GB
160 GB HDD = Actual formatted capacity of only 149.01 GB
250 GB HDD = Actual formatted capacity of only 232.83 GB
500 GB HDD = Actual formatted capacity of only 465.66 GB
1 TB HDD = Actual formatted capacity of only 931.32 GB (instead of the 1024 GB you would expect)
2 TB HDD = Actual formatted capacity of only 1862.6 GB
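
The figures above come from one division: advertised bytes over 2^30. A small Python sketch of that arithmetic follows. The function name is hypothetical, and it deliberately ignores any space the file system itself takes up.

```python
def advertised_gb_to_gib(advertised_gb: float) -> float:
    """Convert a vendor's decimal gigabytes (10^9 bytes) to binary gibibytes (2^30 bytes)."""
    return advertised_gb * 10**9 / 2**30

# 1 TB and 2 TB are written as 1000 GB and 2000 GB, the way the vendor counts them.
for gb in (2.5, 20, 40, 80, 160, 250, 500, 1000, 2000):
    print(f"{gb:>6} GB on the box -> {advertised_gb_to_gib(gb):8.2f} reported by the OS")
```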

The higher you go up the capacity ladder, the bigger the "perceived" loss of space and the stronger the feeling of being cheated. An immediate example is that of a person with HDDs that add up to a claimed 9.5 terabytes of storage capacity. As far as usable drive space is concerned, though, the OS shows this person only 8.64 TB. So does that mean he has effectively been cheated of almost 900 GB, since that is the difference in numbers?

Not at all. The issue at hand is simply the logical result of the SI-prefix versus IEC-prefix mismatch. In an ideal world, people would use the correct representations, just as you would convert weight in kg (kilograms) to lb (pounds) using a set formula. The example to remember is the "kibibyte" (short for kilo binary byte, which is 1024 bytes), written as KiB to prevent confusion with "KB", which stands for only 1000 bytes. Similarly, you have MiB (mebibyte), GiB (gibibyte), TiB (tebibyte) and so on, which is what the IEC proposes you should use. Under the IEC system, you don't have the confusing situation outlined above, and you (presumably) won't feel cheated, because the units are shown to be distinct from one another. For example:
2.5 GB HDD = 2.33 GiB
20 GB HDD = 18.63 GiB
40 GB HDD = 37.25 GiB
80 GB HDD = 74.51 GiB
160 GB HDD = 149.01 GiB
250 GB HDD = 232.83 GiB
500 GB HDD = 465.66 GiB
1 TB HDD = 931.32 GiB
2 TB HDD = 1862.6 GiB

I was hoping Windows 7 (the latest operating system from Microsoft) would change the prevailing attitude by providing an option to show file and drive sizes in a compatibility format (for those who want the current way), along with options for strict SI format (10^3 bytes per KB) and strict IEC format (2^10 bytes per KiB). Apple's Snow Leopard and Ubuntu Linux 10.04 are already moving in that direction. If you want to convert file/drive sizes yourself without the drudgery of multiple steps, you could use this online calculator instead - http://www.gordonengland.co.uk/conversion/binary.htm
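
The three display options I have in mind could look roughly like the Python sketch below. This is only an illustration under my own assumptions: `format_size` and the mode names "compat", "si" and "iec" are made up here and do not correspond to any real Windows, OS X or Ubuntu setting.

```python
def format_size(num_bytes: int, mode: str = "compat") -> str:
    """Render a byte count in one of three hypothetical display modes."""
    if mode == "si":        # strict SI: powers of 1000, decimal labels
        base, units = 1000, ["B", "kB", "MB", "GB", "TB"]
    elif mode == "iec":     # strict IEC: powers of 1024, binary labels
        base, units = 1024, ["B", "KiB", "MiB", "GiB", "TiB"]
    elif mode == "compat":  # the current way: powers of 1024, decimal-looking labels
        base, units = 1024, ["B", "KB", "MB", "GB", "TB"]
    else:
        raise ValueError(f"unknown mode: {mode}")

    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the number below the base,
        # or at the largest unit we know about.
        if value < base or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= base


drive = 500 * 10**9  # a "500 GB" drive, counted the vendor's way
print(format_size(drive, "si"))      # 500.00 GB
print(format_size(drive, "iec"))     # 465.66 GiB
print(format_size(drive, "compat"))  # 465.66 GB
```

The same number of bytes produces all three strings. Only the "compat" line mixes a binary number with a decimal-looking label, and that is exactly where the confusion comes from.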

The other option is for the storage industry (Seagate, Kingston, etc.) to provide larger drives (:P). When I ask for 2 TB, give me the full 2048 GiB instead of only 1863 GiB, which feels as though almost 10% of the drive (the difference is 185 GiB) has been withheld from the customer.
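
The "almost 10%" figure is not arbitrary: the gap between a decimal prefix and its binary counterpart widens with every step up the ladder. A quick Python sketch of that arithmetic, with a helper name of my own invention:

```python
def shortfall_percent(power: int) -> float:
    """Percentage by which 1000**power falls short of 1024**power."""
    return (1 - 1000**power / 1024**power) * 100

for power, name in enumerate(["kilo", "mega", "giga", "tera"], start=1):
    print(f"{name}: decimal unit is {shortfall_percent(power):.2f}% smaller than the binary one")
```

At the tera level the difference works out to roughly 9%, which is where the 185 GiB on a 2 TB drive comes from.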



Addendum:
After all, secondary storage looks like the only place this sleight-of-hand deception is taking place - a person buying 8 GB of RAM is given 8 GiB (or 8192 MiB) of memory, fair and square. But then, the Internet bandwidth and networking industry has been benefiting from the SI prefix as well for a while now. This comes on top of two other advantages: they express speeds in bits (thus being able to roll off large numbers) instead of actual data-transfer speed in bytes (made of 8 bits), and they do not take data packet transmission overhead into account. To the layman, saying "I have a 256 kbps Internet connection" sounds more appealing than saying "I get transfer speeds of 32 kilobytes/sec" -- 256 is a larger number than 32, right?
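
For the curious, the bits-to-bytes arithmetic can be sketched in a few lines of Python. The function names are hypothetical, and the numbers are a best case that ignores packet overhead entirely.

```python
def kbps_to_kilobytes_per_sec(kbps: float) -> float:
    """Kilobits per second to decimal kilobytes per second (8 bits per byte)."""
    return kbps / 8

def kbps_to_kibibytes_per_sec(kbps: float) -> float:
    """Kilobits per second (1000 bits each) to binary KiB per second."""
    return kbps * 1000 / 8 / 1024

print(kbps_to_kilobytes_per_sec(256))   # 32.0
print(kbps_to_kibibytes_per_sec(256))   # 31.25
```

So a download meter that counts in binary units would show even less than 32 for that 256 kbps line.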
