By Mike Farrar
Big Data is part technological phenomenon and part economic phenomenon. You may have heard of Moore’s Law, which says that the cost of computers’ processing power halves every two years or so. That’s pretty fast, but it’s nothing compared to the dwindling cost of data storage.
Maybe not as famous as Moore’s Law is Kryder’s Law, which points out that the cost of disk space halves every year. In other words, over ten years the cost of processing will drop by a factor of roughly 32, while the cost of storage will drop by a factor of roughly 1,000. As late as 1980 a 26 MB hard drive cost $5,000, implying that a 1 TB hard drive would have cost roughly $200 million. These days you can pick up a 1 TB drive for about fifty bucks.
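For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes clean halving periods (two years for processing, one year for storage) and a decimal terabyte of 1,000,000 MB. The function and variable names are purely illustrative.

```python
def cost_drop_factor(years: float, halving_period_years: float) -> float:
    """Factor by which a cost falls over `years` if it halves every `halving_period_years`."""
    return 2 ** (years / halving_period_years)


# Ten-year drop under each law (idealized halving periods, an assumption)
processing_factor = cost_drop_factor(10, 2)  # 2**5  = 32.0
storage_factor = cost_drop_factor(10, 1)     # 2**10 = 1024.0

# The 1980 price point quoted in the article
PRICE_1980_USD = 5_000
CAPACITY_1980_MB = 26
MB_PER_TB = 1_000_000  # decimal terabyte (assumption)

cost_per_mb_1980 = PRICE_1980_USD / CAPACITY_1980_MB
cost_per_tb_1980 = cost_per_mb_1980 * MB_PER_TB  # about $192 million

print(f"Processing cost drops by a factor of {processing_factor:.0f} in ten years")
print(f"Storage cost drops by a factor of {storage_factor:.0f} in ten years")
print(f"1 TB at 1980 prices: ${cost_per_tb_1980:,.0f}")
```

At 1980 prices a megabyte cost a little over $192, so a terabyte comes to about $192 million.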
Much of the Big Data of today is simply a consequence of storage becoming wildly cheaper. Cheaper storage lets us store more data. It’s not that the data is necessarily more valuable, nor that it carries more information. In fact, cheaper storage lets us economically store data of less and less value.
In the olden days of computing, storage space was incredibly expensive. Engineers invented all sorts of ways to squeeze more data into less and less space. Packed binary, zoned decimal – you name it – anything to cut costs. The Y2K crisis was all due to engineers having long ago cut costs by omitting the century digits from dates.
None of these strategies are surprising. Firms will only store data that’s worth the cost. Expensive storage allows only the most important, vital data to get saved. If costs drop a little, a firm can afford to store a little more data that isn’t quite as important.
Fast-forward to today. Storage prices have collapsed. In response, firms are storing more data. They’re storing not only the crucial, mission-critical data they always have – they now have the latitude to store data that has the barest fraction of the value of that mission-critical stuff.
Processing costs continue to drop, too, but as we’ve seen, they haven’t dropped as quickly as storage costs. Every year, processing gets more expensive relative to storage. Our capacity to store data economically is outstripping our capacity to process it economically. More and more firms are finding that they just can’t keep up. Even as we push back the threshold of Big Data, the undiscovered country beyond the horizon keeps growing, and it grows faster and faster by the day.
How do you cope? How do you extract the most value from these vast and deepening oceans of data?