The amount of data generated daily by people and machines (the “digital universe”) is expanding rapidly, creating substantial Big Data potential, according to a new IDC study sponsored by EMC. The study concludes that 2.8 zettabytes (ZB) of data will be created and replicated this year. (1 ZB = 10^21 bytes.) Of all that data, the researchers suggest that slightly less than one-quarter would be useful were it to be tagged and analyzed. But just 3% is tagged and only 0.5% is analyzed, leaving what the study dubs “the untapped big data gap.”
The types of data that the study suggests are useful for analysis include surveillance footage, data from embedded and medical devices, entertainment and social media, and consumer images. Of course, there’s a lot of data to keep up with: the amount of data in the digital universe has doubled in the past 2 years alone. IDC predicts that by 2020 the digital universe will hold 40ZB of data, of which 33% will be useful. To put that 40ZB figure in perspective, the study offers some comparative numbers:
- “40ZB is equal to 57 times the amount of all the grains of sand on all the beaches on earth”; and
- Saving all 40ZB onto today’s Blu-ray discs would mean that “the weight of those discs (without any sleeves or cases) would be the same as 424 Nimitz-class aircraft carriers.”
In other words, 40ZB is a lot of data – equivalent to 5,247 GB per person worldwide.
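The per-person figure can be sanity-checked with simple division. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 ZB = 10^21 bytes) and uses only the two figures quoted above; the function and variable names are illustrative, and the implied population is derived here, not a number stated in the study.

```python
# Assumption: decimal (SI) units, matching the article's 1 ZB = 10^21 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21

total_zb = 40          # projected size of the digital universe in 2020 (from the study)
per_person_gb = 5_247  # per-person equivalent quoted in the article


def implied_population(total_zb, per_person_gb):
    """Return the head count implied by dividing the total by the per-person share."""
    total_bytes = total_zb * BYTES_PER_ZB
    per_person_bytes = per_person_gb * BYTES_PER_GB
    return total_bytes / per_person_bytes


population = implied_population(total_zb, per_person_gb)
print(f"Implied world population: {population / 1e9:.2f} billion")
# prints "Implied world population: 7.62 billion"
```

In other words, the 5,247 GB figure corresponds to spreading 40ZB across roughly 7.6 billion people.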
US Responsible for One-Third of the Digital Universe
The “Digital Universe” study finds that mature markets continue to account for the largest share of digital data, but that the geographic composition of the digital universe is rapidly changing. Currently, the US accounts for the largest share – 32% – of the digital universe, with Western Europe (19%), China (13%), and India (4%) accounting for another 36%, and the rest of the world the remaining 32%.
But while emerging markets accounted for only 23% of the digital universe in 2010, that share has already grown to 36% this year, and is predicted to reach 62% in 2020. IDC also predicts that in 2020, China alone will generate 22% of the world’s data.
Other Findings
- Between 2012 and 2020, the size of the digital universe will double every 2 years.
- Machine-generated data will be a major driver of that growth, accounting for 40% of the digital universe in 2020, up from 11% this year.
- The volume of data stored in the digital universe about individual users is greater than the amount of data that these users actually create themselves.
- Slightly more than one-third of the data in today’s digital universe requires some form of protection, but less than 20% actually has the necessary protections, presenting a major security threat.
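The "doubling every 2 years" claim can be compared against the study's own endpoints with a short calculation. This is a rough sketch, assuming "this year" is 2012 (the start of the range given in the list above) and taking 2.8 ZB and 40 ZB as the start and end sizes; the function names are hypothetical and the results are back-of-the-envelope figures, not numbers from the study.

```python
import math

# Figures from the article; treating "this year" as 2012 is an assumption.
start_year, start_zb = 2012, 2.8
end_year, forecast_zb = 2020, 40.0


def project(start_zb, years, doubling_period=2.0):
    """Size after `years` if the total doubles every `doubling_period` years."""
    return start_zb * 2 ** (years / doubling_period)


def implied_doubling_period(start_zb, end_zb, years):
    """Doubling period (in years) that takes start_zb to end_zb over `years`."""
    return years * math.log(2) / math.log(end_zb / start_zb)


years = end_year - start_year
projected = project(start_zb, years)
period = implied_doubling_period(start_zb, forecast_zb, years)

print(f"Strict doubling every 2 years: {projected:.1f} ZB in {end_year}")
print(f"Doubling period implied by the 40 ZB forecast: {period:.2f} years")
```

Strict two-year doubling from 2.8 ZB gives about 44.8 ZB by 2020, so the 40ZB forecast implies a doubling period only slightly longer than 2 years.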