Why Downloads Fail – Part 3 – 32bit Math and the 2GB Boundary Limit
In parts 1 and 2 of our “Why Downloads Fail” series we discussed end-user behavior and browser limitations as reasons why downloads fail. Today we’re going to try to explain one of the most frustrating reasons why large downloads fail: the math. Specifically, the limits of the integer math used in download computation.
The math used in computation and file storage can limit the size of files that can be downloaded successfully. It’s as true as it is painful for users and service providers alike.
Modern computers use 32bit or 64bit computing architectures. The word “bit” is short for “binary digit”, the 0s and 1s used as the foundation for digital computing. The 32bit or 64bit labels used (carelessly) in common tech parlance today describe how much memory is addressable by CPUs (central processing units) and, just as often, describe software written to take advantage of the addressable memory made available by the CPU architecture. For example, we have 64bit CPUs from Intel and 64bit versions of Solaris 10 and RedHat Linux.
Here comes the math part. The number of bits sets the limit on the computer’s ability to address and store information. So, a machine based on a 32bit architecture can only represent 2^32, or 4,294,967,296, distinct values in a single number, which means whole numbers from 0 up to a limit of 4,294,967,295.
A quick glance suggests that this means we can have a 4GB limit for files used in 32bit architectures. This is partially true. However, a “number” in this context is really an integer. Integers are (we all remember from 6th grade math, right?) the positive and negative whole numbers, including zero. Thus, even though in 32bit systems we have 2^32, or 4,294,967,296, values to play with, we need to include the representation of negative integers when doing computations. Half of those values go to the negative numbers, so the pool of integers we get to use is −2,147,483,648 to +2,147,483,647. This is the root of the 2GB boundary limit: the largest file size or position such a number can hold is 2,147,483,647 bytes, one byte short of 2GB.
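The ranges quoted above can be checked in a few lines. This is a small Python sketch of the arithmetic only; the variable names are our own, chosen for illustration, and nothing here is tied to a particular system.

```python
# Integer ranges available in a 32bit number.
# All names are illustrative assumptions, not part of any real API.
BITS = 32

distinct_values = 2 ** BITS          # how many different bit patterns exist
unsigned_max = distinct_values - 1   # largest value when counting up from 0
signed_min = -(2 ** (BITS - 1))      # half of the patterns go to the negatives
signed_max = 2 ** (BITS - 1) - 1     # the rest cover zero and the positives

print(f"distinct values: {distinct_values:,}")
print(f"unsigned range:  0 to {unsigned_max:,}")
print(f"signed range:    {signed_min:,} to {signed_max:,}")
```

Changing BITS to 64 shows how much more room a 64bit integer gives.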
The difference between the set of integers from zero to 4,294,967,295 and the set from −2,147,483,648 to +2,147,483,647 is the difference between unsigned and signed integers. Software developers use different programming languages and libraries of tools and methods, built by others, to help create the foundation for features and functions in the software we use every day. Inherent in these tools are assumptions about how computations will be made. Consequently, some of the software applications still in use today have built-in limitations on how they can read, write, and address files where the numbers (integers) used to represent the sizes are larger than the limits of 32bit math.
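To make the signed vs. unsigned distinction concrete, here is a hedged Python sketch of what can happen when a file size larger than 2,147,483,647 bytes is stored in a 32bit variable. The 3,000,000,000 byte file is a made-up example. We use the standard ctypes module only because its fixed-width integer types keep the low 32 bits without complaint, which mimics how such a variable behaves in many compiled programs; it is not how any particular browser or download tool is written.

```python
import ctypes

file_size = 3_000_000_000  # hypothetical file of roughly 3GB, in bytes

# ctypes integer types do no overflow checking: they keep only the low
# 32 bits, then read those bits as either unsigned or signed.
as_unsigned = ctypes.c_uint32(file_size).value
as_signed = ctypes.c_int32(file_size).value

print(as_unsigned)  # 3000000000
print(as_signed)    # -1294967296
```

The same 32 bits give the correct size when read as unsigned and a negative size when read as signed. Software that sees a negative file size has no sensible way to continue.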
Think about this for a second. How long ago was it that 250MB hard drives were the biggest you could buy to put in a laptop? It wasn’t long ago that 2GB files might have been considered “way-out-there”, “over-the-horizon” type file sizes, and therefore developers rarely needed to consider this limit.
Practical manifestations of this calculation limit still show up for end-users every day.
1. They can’t copy or save files over 2GB.
2. They may have a 64bit CPU (the integer pool is now 2^64), but their software has been built and compiled for 32bit systems, so they bump into 2GB boundaries.
3. They work in environments where some systems can handle large files and others cannot, so cross-system transfers fail without an obvious reason.
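The cases in the list above come down to one question: does the file size fit in the integer the software uses to hold it? A minimal Python sketch of that check follows. The function name fits_in_signed and the sample sizes are hypothetical, and real software may run into the limit in other places, such as the file system or a network protocol.

```python
def fits_in_signed(size_bytes: int, bits: int) -> bool:
    """Return True if size_bytes fits in a signed integer of the given width."""
    return 0 <= size_bytes <= 2 ** (bits - 1) - 1


GB = 2 ** 30  # bytes in one GB, using the binary definition

# Hypothetical file sizes: 700MB, one byte under 2GB, exactly 2GB, and 5GB.
for size in (700 * 2 ** 20, 2 * GB - 1, 2 * GB, 5 * GB):
    ok_32 = fits_in_signed(size, 32)
    ok_64 = fits_in_signed(size, 64)
    print(f"{size:>13,} bytes  32bit: {ok_32}  64bit: {ok_64}")
```

A file of exactly 2GB is already one byte too large for the 32bit case, while every size in the sample fits comfortably in 64 bits.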
All very real, and all very frustrating. As an experiment, try talking to a customer who has paid $500 for a piece of software they need right now, and explaining that their browser/OS/file-system isn’t compatible with large files. Now try it when they just got done watching a streaming Iron Man movie on the same machine. Not fun!
There are ways to work around this limit (see Long vs Float), some obvious, and others not-so-obvious. Large files are still being downloaded, copied and transferred in great quantities. Specialized tools and architectures are making sure that you can distribute your digital assets, however large they may be, quickly and with high success rates. But we are still seeing this limit as one of the reasons downloads fail to complete successfully.
Sometimes downloads fail because users stop them. Sometimes downloads fail because of browser bugs. Sometimes downloads fail because the math (signed integers) was never going to let them succeed.
Useful links
http://en.wikipedia.org/wiki/Integral_data_type
http://en.wikipedia.org/wiki/Signed_number_representations
http://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits
http://en.wikipedia.org/wiki/Large_file_support
(acknowledgement: this is a much discussed topic in tech circles, and we’ve read everything we can find on the subject. Our thanks go out to everyone who has attempted to explain this issue, in any fashion, to a non-technical audience. It’s not easy. Hopefully this treatment adds something useful to the ongoing discourse).