Few tech industry buzzwords have gotten as vigorous a workout as 'big data', but while the hype remains plentiful, it is starting to give way to real-life successes as well as formal methods companies can use to develop big data strategies, according to a number of IDC analysts.
"We're at this period now where there's certainly a lot of hype, a lot of promise," said IDC analyst Dan Vesset during a panel session at the research firm's Directions 2013 conference in Boston in March. "The question is what's reality, and what can and should companies do in the near and short term?"
IDC, which is owned by IDG News Service's parent company International Data Group, defines big data as information-driven, tactical decision-making resulting in new economic value, derived from large data volumes from a wide variety of sources, Vesset said.
The first point is crucial, according to Vesset.
"We can be installing all kinds of technology but the important thing is to improve the decision-making," Vesset said. "You can have the greatest Hadoop deployment in the world but it's not going to be enough," he added, referring to the popular open-source data-processing framework that is synonymous with the big data movement.
Progressive is one example of a company using big data projects to transform its business, said IDC analyst Michael Versace, who presented with Vesset. Using detailed information about customers' driving habits, the insurer has created a usage-based model that defines a policy's price down to the individual, he said.
Progressive gets the data through a device a driver plugs into a car's diagnostic port, according to its website. The device can track how often customers slam on their brakes or drive late at night, along with other potentially risky driving habits. If the data shows a customer is driving safely, that customer can get significant discounts on insurance.
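To make the idea of usage-based pricing concrete, the following is a minimal sketch of how driving signals such as hard braking and late-night driving could be turned into a discount. It is purely illustrative: the `TripSummary` fields, the penalty weights and the 30 per cent ceiling are all hypothetical values chosen for the example, and nothing here reflects Progressive's actual model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TripSummary:
    """Hypothetical per-trip record derived from a plug-in telematics device."""
    miles: float
    hard_brakes: int
    late_night_miles: float


def usage_discount(trips: Iterable[TripSummary], max_discount: float = 0.30) -> float:
    """Return a discount fraction between 0 and max_discount.

    The weights below are invented for illustration only; a real insurer
    would fit them to claims data.
    """
    trips = list(trips)
    total_miles = sum(t.miles for t in trips)
    if total_miles <= 0:
        return 0.0  # no driving data, so no discount can be justified

    # Normalise the raw counts so heavy and light drivers are comparable.
    brakes_per_100_miles = 100 * sum(t.hard_brakes for t in trips) / total_miles
    night_share = sum(t.late_night_miles for t in trips) / total_miles

    # Hypothetical risk score, capped at 1.0 (1.0 means no discount at all).
    risk = min(1.0, 0.1 * brakes_per_100_miles + 1.0 * night_share)
    return round(max_discount * (1.0 - risk), 4)


if __name__ == "__main__":
    careful = [TripSummary(miles=200, hard_brakes=2, late_night_miles=20)]
    print(usage_discount(careful))
```

Under these assumed weights, a driver with no hard braking and no late-night miles would receive the full hypothetical discount, while the discount shrinks toward zero as either signal grows.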
Meanwhile, a number of challenges stand in the way of companies wishing to launch successful big data projects of their own, according to Vesset. These include the dilemma of what business data should be stored and what should be discarded, the cost of acquiring needed technologies and a lack of IT professionals with the necessary skills.
The last problem will likely become more severe in the near future, according to Versace. "That gene pool is growing pretty shallow right now."
Some misperceptions also persist with regard to big data, Vesset said. For one thing, "it's not all about social media," he said. "That's one of the big fallacies out there, that big data is all about clickstream analysis."
Nor is Hadoop the sole solution, since it is oriented toward large-scale batch processing rather than real-time monitoring, such as a rail company tracking the performance of specific components in its train cars as they deliver shipments, he said.
The analyst firm has created a 'maturity model' for big data, which Vesset and Versace described in their session. It spans five subject areas (data, people, process, technology and intent) and five stages of deployment (ad hoc, opportunistic, repeatable, managed and optimised).
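The model's two dimensions can be pictured as a simple grid, with each subject area placed at one of the five stages. The sketch below is one possible representation, not IDC's own tooling: the area and stage names come from the session, while the functions `overall_stage` and `lagging_areas`, and the rule that a company is only as mature as its weakest area, are assumptions made for illustration.

```python
# Names taken from the maturity model as described in the session.
AREAS = ("data", "people", "process", "technology", "intent")
STAGES = ("ad hoc", "opportunistic", "repeatable", "managed", "optimised")


def overall_stage(assessment: dict) -> str:
    """Return the least advanced stage found across all five areas.

    Treating the weakest area as the overall level is an assumption of
    this sketch, not a scoring rule stated by IDC.
    """
    missing = set(AREAS) - set(assessment)
    if missing:
        raise ValueError(f"assessment is missing areas: {sorted(missing)}")
    # STAGES.index raises ValueError for an unknown stage name.
    return min((assessment[area] for area in AREAS), key=STAGES.index)


def lagging_areas(assessment: dict) -> list:
    """List the areas sitting at the overall (lowest) stage."""
    floor = overall_stage(assessment)
    return [area for area in AREAS if assessment[area] == floor]


if __name__ == "__main__":
    # Hypothetical self-assessment for an example company.
    example = {
        "data": "managed",
        "people": "opportunistic",
        "process": "repeatable",
        "technology": "managed",
        "intent": "opportunistic",
    }
    print(overall_stage(example), lagging_areas(example))
```

A representation like this makes it easy to see which areas are holding a company back, for instance strong technology paired with thin skills.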
A first step for companies just starting out with big data is to identify opportunities to use their existing technology and data in new ways, evaluate public cloud and open-source options, and start experimenting with proof-of-concept exercises and prototypes, according to the analysts.
Over the following one to two years, those companies should look to use early successes with big data projects to justify funding for larger efforts. In the same period, it would be wise to also look for sponsors within business departments who will champion big data projects, they said.
Some 80 per cent of what can be considered big data is unstructured or semi-structured information, said IDC analyst David Schubmehl during another presentation at the event. These sources could include anything from clickstream data to patent records, research archives and even video, he said.
This diversity will give rise to what IDC is calling unified information access technology, evidenced by products such as Oracle's Endeca and IBM's Vivisimo, as well as specialised vendors like Attivio.
Big data's challenges will also continue influencing the database industry, with growing prominence for technologies like graph and in-memory database platforms, said IDC analyst Carl Olofson during the presentation.
Traditional relational databases will also change, with capabilities expanding "to the point Ted Codd wouldn't recognise them," Olofson said, referring to the 'father' of the relational model.