One of the classic tropes in science fiction – or, these days, even in television cop shows – is the Magical Database. Every bit of human (or alien) knowledge ever is in this database, and if you just ask it the question in the right way, it uses all that information to come up with all sorts of amazing insights.
These days, we’re starting to get there, with something called “big data” – but there’s also some concern that big data is getting too big and too knowledgeable.
What is Big Data?
The generally accepted definition of Big Data is data that’s too big to work with – which is something of a fallacy on two counts. First, as hardware and software improve, the limit of what’s “too big” is constantly increasing. Second, when people talk about Big Data, it typically isn’t in the context of throwing up their hands in futility, but of finding ways to use existing hardware and software technology to manipulate their data.
This is why Big Data is often just the other side of the coin from data analytics, because analytics uses different ways of slicing, dicing, and otherwise picking through vast amounts of data to find the bits that are interesting and relevant to a particular task.
Not that long ago, organizations that wanted to analyze their data were limited to the size of a floppy disk. But not only is the potential size of a database getting bigger – particularly with new technologies such as the cloud – the industry is figuring out new ways to link together multiple databases into what appears as a single whole. “Logical data warehouses bringing together information from multiple sources as needed will replace the single data warehouse model,” predicted Gartner recently when it named Big Data as one of its Top 10 Strategic Technologies for 2012.
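To make the idea of several databases appearing "as a single whole" concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration, not how Gartner or any vendor defines a logical data warehouse: the database names (crm, web), the orders table, and the sample rows are all hypothetical.

```python
import sqlite3

# One connection acts as the single "logical" entry point.
conn = sqlite3.connect(":memory:")

# Attach two separate (hypothetical) databases; each stands in for an
# independent source system.
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS web")

conn.execute("CREATE TABLE crm.orders (customer TEXT, amount REAL)")
conn.execute("CREATE TABLE web.orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO crm.orders VALUES (?, ?)",
                 [("acme", 120.0), ("globex", 75.5)])
conn.executemany("INSERT INTO web.orders VALUES (?, ?)",
                 [("acme", 30.0)])

def total_by_customer(connection):
    """Query both sources as if they were one table and sum per customer."""
    rows = connection.execute(
        """
        SELECT customer, SUM(amount) FROM (
            SELECT customer, amount FROM crm.orders
            UNION ALL
            SELECT customer, amount FROM web.orders
        )
        GROUP BY customer
        """
    )
    return dict(rows)

totals = total_by_customer(conn)
```

The caller sees one result set and never needs to know that the rows came from two different stores, which is the essence of the "as needed" combination the Gartner quote describes.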
Big Data is also becoming an important problem to solve. As more information is collected, and more things are connected to the Internet, everything is starting to become a Big Data problem – even monitoring the network itself.
While Big Data has traditionally fallen under the purview of research and science, with their large databases of imagery, seismic records, and so on, Big Business is getting interested in Big Data as well. In an era when the economy has been doing badly, the notion of making decisions based on actual information and analysis is appealing.
And the results are bearing that out. In The Deciding Factor: Big Data & Decision Making, a survey of 607 high-level executives commissioned by Capgemini earlier this year, participants estimated that processes where Big Data analytics had been applied saw, on average, a 26% improvement in performance over the past three years, and they expected a 41% improvement over the next three.
“For our respondents, data is now the fourth factor of production, as essential as land, labour, and capital,” Capgemini noted. “It follows that tomorrow’s winners will be the organisations that succeed in exploiting Big Data, for example by applying advanced predictive analytic techniques in real time.”
“‘Big data,’ as it has been dubbed by researchers, has become so valuable that the World Economic Forum, in a report published last year, deemed it a new class of economic asset, like oil,” concurred the Washington Post.
Similarly, a GigaOm Pro survey earlier this year of 304 North American IT decision makers, Deploying big data: 2012 strategies for IT departments, found that 77% said they have a budget allotted for big data projects, and 61% would consider using a service provider for big data projects in the next 12-18 months – though 51% were concerned about security and 34% were concerned about cost.
Other conclusions from the Capgemini survey results included:
- Firms that emphasized decision-making based on data and analytics performed 5-6% better, as measured by output and performance, than firms that relied on intuition and experience for decision-making.
- Two-thirds of survey respondents said that the collection and analysis of data underpins their firm’s business strategy and day-to-day decision-making, particularly in the energy and natural resource, financial services, and healthcare, pharmaceuticals, and biotechnology sectors.
- Just over half of executives surveyed said that management decisions based purely on intuition or experience are increasingly regarded as suspect.
- Two-thirds insisted that management decisions are increasingly based on “hard analytic information.”
- Nine in ten of the executives polled felt that the decisions they’ve made in the past three years would have been better if they’d had all the relevant data to hand.
- The data perceived to add the greatest value was “business activity data” (such as sales, purchases, and costs) and, in consumer goods and retail, point-of-sale data. Other valued data included email and social media records.
At the same time, Big Data is not a panacea; 55% of respondents felt big data management was not viewed strategically at senior levels of their organization. Moreover, two-thirds of executives believed there was not enough of a “big data culture” in their organization, particularly in the manufacturing sector.
Another problem? Lack of a skilled workforce. Noted GigaOm: “A McKinsey report from May 2011 stated that by 2018, the U.S. could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to analyze big data to make effective decisions.”
Federal Government and Big Data
The U.S. government was one of the first peacetime users of the computer (for calculating taxes, naturally), and it developed what became the Internet. Similarly, the U.S. government is taking the lead on big data. A number of efforts this year are intended to improve the government’s work in this area, following the December 2010 report from the President’s Council of Advisors on Science and Technology, Designing a Digital Future, which recommended that “Every Federal agency needs to have a ‘big data’ strategy.”
In March, the White House unveiled a Big Data initiative that will result in $200 million in research and development investments in big data. Agencies committing to the project include the National Science Foundation, the National Institutes of Health, the Department of Defense, the Defense Advanced Research Projects Agency, the Department of Energy, and the U.S. Geological Survey. Projects include $73 million in research grants, direct U.S. research investments, and making big data such as the 200-terabyte 1000 Genomes Project available for free to researchers.
In April, the TechAmerica Foundation, a nonprofit organization that educates industry executives, policy makers and opinion leaders on technological innovation, formed a “Big Data Commission” intended to help the federal government in this area. “The Commission will focus on demystifying the term ‘Big Data,’ understanding the velocity, variety, and complexity of data, and defining the key business outcomes and use cases Big Data will serve,” the group said. “Areas of focus will include: sensor networks; social networks; social media data analysis, astronomy, atmospheric science, genomics, biological, and other complex and/or interdisciplinary scientific research; military surveillance; medical records; photography archives; video archives; and large-scale e-Commerce. Discussions regarding privacy, security, intellectual property, mobile technology, ‘Internet of Things,’ ‘free flow of information,’ and cloud computing are likely. The Commission will explain ‘Big Data,’ the scope of data collection and metadata generation, the ‘art of the possible’ in business analytics and visualization, and help prioritize policy debates.” The group did not say when it expected to produce its report.
How Big is Too Big?
At the same time, some are concerned that Big Data is getting too big and knows too much about us.
The field of traffic analysis, for example, looks at communications between various people and draws conclusions based on that information. While it doesn’t look at the actual content of the messages, it turns out that one can learn a lot just based on who’s talking to whom. A Massachusetts Institute of Technology research project found, for example, that it could reasonably determine people’s sexual orientation based on the orientations of the people they were talking to.
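The general idea, guessing something about a person purely from who they communicate with, can be sketched in a few lines. This is not the MIT project's method, which the article does not describe. It is a deliberately simple majority vote over a person's contacts, and the names, the neutral attribute values ("runner", "cyclist"), and the guess_attribute function are all hypothetical.

```python
from collections import Counter

def guess_attribute(person, contacts, known):
    """Guess a person's attribute by majority vote among their contacts.

    contacts: maps each person to the people they communicate with.
    known:    maps some people to an attribute they have disclosed.
    Returns None when no contact has a known attribute.
    """
    votes = Counter(
        known[c] for c in contacts.get(person, ()) if c in known
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Hypothetical communication metadata: no message content is used,
# only who talks to whom.
contacts = {"pat": ["ann", "bob", "cy", "dee"]}
known = {"ann": "runner", "bob": "runner", "cy": "cyclist"}

guess = guess_attribute("pat", contacts, known)
```

Here "pat" has disclosed nothing, yet the sketch still produces a guess because two of pat's three labeled contacts share an attribute. Real traffic analysis is far more sophisticated, but the privacy concern is the same: the inference needs no cooperation from the person it is about.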
In addition, people might not realize that information they provide to one source could be combined with information from another source to create whole new sets of information. “By triangulating different sets of data (you are suddenly asking lots of people on LinkedIn for endorsements on you as a worker, and on Foursquare you seem to be checking in at midday near a competitor’s location), people can now conclude things about you (you’re probably interviewing for a job there) that are radically different from either set of public information,” notes Quentin Hardy in the New York Times.
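Hardy's job-interview example amounts to joining two unrelated data sets on the same person. A minimal sketch of that triangulation follows. Every name, threshold, and data value in it is made up for illustration, and it does not use or represent any real LinkedIn or Foursquare API.

```python
def likely_interviewing(person, competitor, endorsement_requests,
                        midday_checkins, min_requests=5, min_visits=2):
    """Flag a person only when BOTH independent signals are present.

    endorsement_requests: person -> number of recent endorsement requests.
    midday_checkins:      person -> list of places checked into at midday.
    The thresholds are arbitrary assumptions for this sketch.
    """
    requests = endorsement_requests.get(person, 0)
    visits = midday_checkins.get(person, []).count(competitor)
    return requests >= min_requests and visits >= min_visits

# Two hypothetical public data sets, each fairly innocuous on its own.
endorsement_requests = {"alex": 14, "sam": 1}
midday_checkins = {
    "alex": ["Rival Corp HQ", "Rival Corp HQ", "Cafe"],
    "sam": ["Cafe"],
}

flags = {
    person: likely_interviewing(person, "Rival Corp HQ",
                                endorsement_requests, midday_checkins)
    for person in endorsement_requests
}
```

Neither data set alone says anything about a job search. The conclusion only appears once the two are combined, which is exactly the point of the quote.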
Hardy goes on to quote Danah Boyd, a Microsoft researcher whose seminal work was on the differences between the types of young people who used social media networks such as MySpace and Facebook, and who has been speaking on the privacy implications of Big Data since 2010. She says she has been finding that people are now changing their behavior on social networking sites in an attempt to hide information about themselves (how many people do you know who don’t put their real birthday on Facebook?) and that she expects some level of regulation on how Big Data can be used.
“It terrifies me when those who are passionate about Big Data espouse the right to collect, aggregate, and analyze anything that they can get their hands on,” Boyd wrote in 2010. “In short, if it's accessible, it's fair game. To get here, we've perverted ‘public’ to mean ‘accessible by anyone under any conditions at any time and for any purpose.’ We've stripped content out of context, labeled it data, and justified our actions by the fact that we had access to it in the first place. Alarm bells should be ringing because the cavalier attitudes around accessibility and Big Data raise serious ethical issues. What's at stake is not whether or not something is possible, but what the unintended consequences of doing something are.”