Amassing a pile of big data is not sufficient to spur insight, even if you have clever ways of analyzing that data. The savvy data scientist must also find a way to make all this information tell a story so it can be understandable to others.
Exploiting the innate power of the narrative lies at the heart of a newly updated service offered by business intelligence startup ClearStory Data, which unveiled its offering at the Strata + Hadoop World conference in New York this week.
"As data becomes more complex, and more sources of data are coming together, you have to bind it together in a way where people can follow the sequence of events as they happen," said ClearStory Data CEO and co-founder Sharmila Shahani-Mulligan.
The cloud service -- called Collaborative StoryBoards -- presents a series of data visualizations as if they were scenes in a movie storyboard. Each "scene" contains a visualization of data, which can be assembled from multiple sources and updated in real time. Each scene also contains notes explaining the chart and offers advanced chart functionality, such as the ability to play back data from earlier periods.
Shahani-Mulligan described Collaborative StoryBoards as a superior alternative to business intelligence dashboards, offering more information and allowing the user to dig more deeply into the data being presented.
"A storyboard is like a connective tissue for live data that may be all over the place," she said. "As you get more and more data, you need that connective tissue, otherwise it is difficult to find out what is happening."
The service can also be a potentially superior alternative to the venerable desktop PowerPoint presentation, in that it can be linked to live sources of data, and can be easily updated by multiple contributors. A storyboard can also be widely shared, with the owner retaining control of the presentation through a single, canonical copy.
Shahani-Mulligan formed ClearStory Data in 2011 to tackle the thorny issue of combining multiple, disparate data sources in a way that makes them easy to use. In research with potential enterprise customers, the company found that 74 percent of the organizations want to blend data from more than four data sources. Today, when a team collaborates on multiple data sources, its members often must resort to sharing information by email or spreadsheet.
The commercial service, which debuted last year, provides an easy way for users to build charts from numerous sources of data, which can be an arduous process if the data comes in different formats. The product builds on Apache Spark, a data analysis platform that can work with multiple streams of data.
To use the service, the user uploads one or more sources of data -- such as a CSV file, spreadsheet or relational database -- or provides an API link to a live data source. The service then offers a number of ways to combine and visualize the data or do some basic mathematical operations, such as finding averages.
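The workflow described above (load several sources, combine them, compute a basic statistic) can be illustrated with a minimal Python sketch. This is not ClearStory's code: the sample data, column names and the helper functions `read_rows`, `average_by` and `blend` are all hypothetical, chosen only to show the general idea.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data standing in for two uploaded sources.
SALES_CSV = """store,zip,revenue
A,10001,1200
B,10001,800
C,94105,1500
"""
POPULATION_CSV = """zip,population
10001,21000
94105,9000
"""

def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def average_by(rows, key, value):
    """Group rows by `key` and return the mean of `value` for each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(float(row[value]))
    return {k: mean(v) for k, v in groups.items()}

def blend(avg_revenue, population_rows):
    """Join average revenue per ZIP code with the population source."""
    pop = {r["zip"]: int(r["population"]) for r in population_rows}
    return [
        {"zip": z, "avg_revenue": rev, "population": pop[z]}
        for z, rev in sorted(avg_revenue.items())
        if z in pop  # keep only ZIP codes present in both sources
    ]

avg_revenue = average_by(read_rows(SALES_CSV), "zip", "revenue")
blended = blend(avg_revenue, read_rows(POPULATION_CSV))
```

Here `blended` holds one record per ZIP code found in both sources, ready to be charted. A production service would also have to handle mismatched formats and missing values, which this sketch ignores.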
To ease the ingestion of data from multiple sources, the company built what it calls a data inference and profiling engine, which can make many basic assumptions about how a new set of data should best be formatted. The service also builds a set of metadata, covering aspects such as time or location.
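The article does not say how the inference and profiling engine works. As a rough illustration of what inferring a format can mean, the sketch below guesses a type for each column by trying progressively looser parsers. The function names `infer_type` and `profile`, the type labels and the single date format are assumptions made for this example, not details of the ClearStory product.

```python
from datetime import datetime

def infer_type(values):
    """Guess a column type from its string values: integer, float, date or text."""
    def all_parse(parser):
        # True only if every value is accepted by the parser.
        for v in values:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    # Assumption: dates arrive in ISO form (YYYY-MM-DD).
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "text"

def profile(rows):
    """Return a {column: inferred type} mapping for a list of row dictionaries."""
    columns = rows[0].keys() if rows else []
    return {c: infer_type([r[c] for r in rows]) for c in columns}

# Hypothetical sample rows, as they might look straight after a CSV upload.
sample = [
    {"zip": "10001", "day": "2014-10-15", "revenue": "1200.50", "store": "A"},
    {"zip": "94105", "day": "2014-10-16", "revenue": "800", "store": "B"},
]
column_types = profile(sample)
```

Note that this naive approach labels ZIP codes as integers. A real profiler would need semantic rules to recognize them as locations, which is presumably where the time and location metadata mentioned above comes in.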
The service relies on a proprietary bit of code the company calls a harmonization engine, which looks at the metadata for points that different data sets have in common, such as zip codes or time periods. From this work, it produces recommendations: given a data set, it suggests other data sets that could be integrated with it. The resulting graphics, or set of visualizations, can be bookmarked for others to see.
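Because the harmonization engine is proprietary, its logic is not public. One simple way to approximate the idea of finding common points is to look for columns that two data sets share and check how much their values overlap. The sketch below does that; `shared_keys`, `recommend`, the `min_overlap` threshold and the sample catalog are all hypothetical.

```python
def shared_keys(left, right, min_overlap=0.5):
    """Return (column, overlap) pairs for columns both data sets can be joined on."""
    candidates = []
    common = set(left[0]) & set(right[0]) if left and right else set()
    for column in sorted(common):
        left_values = {row[column] for row in left}
        right_values = {row[column] for row in right}
        # Overlap is measured against the smaller set of distinct values.
        smaller = min(len(left_values), len(right_values))
        overlap = len(left_values & right_values) / smaller if smaller else 0.0
        if overlap >= min_overlap:
            candidates.append((column, overlap))
    return candidates

def recommend(target, catalog, min_overlap=0.5):
    """List catalog data sets that share at least one joinable column with target."""
    suggestions = {}
    for name, rows in catalog.items():
        keys = shared_keys(target, rows, min_overlap)
        if keys:
            suggestions[name] = [column for column, _ in keys]
    return suggestions

# Hypothetical data sets; only "census" shares usable ZIP codes with sales.
sales = [{"zip": "10001", "revenue": "1200"}, {"zip": "94105", "revenue": "1500"}]
catalog = {
    "census": [{"zip": "10001", "population": "21000"},
               {"zip": "94105", "population": "9000"}],
    "weather": [{"day": "2014-10-15", "temp_f": "61"}],
    "stores_eu": [{"zip": "75001", "revenue": "900"}],
}
suggestions = recommend(sales, catalog)
```

In this example only the census data is suggested, joined on `zip`. The weather data has no column in common, and the European store data shares column names but none of the same values.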
While the original service was aimed at business analysts, the new Collaborative StoryBoards is marketed to management and senior executives, allowing them to combine multiple contributions into a single presentation.
With storyboarding, a team leader can start a project and multiple contributors can add their own graphs and visualizations. Each member of a team can also comment on the individual stories. When all the data is collected, the work can be posted, used as an extended live dashboard of sorts and shared with others.
Pricing is based on an annual subscription model and depends on the amount of data being processed and the number of storyboard authors. A typical enterprise deployment may start at about US$50,000 per year.
The idea of using stories to interpret large amounts of data has been a theme at the Strata + Hadoop World conference. Mapping startup CartoDB offers a way to visualize geospatially oriented data by overlaying that data on maps. Storytelling has been an essential part of this service, explained CartoDB senior scientist Andrew Hill in a presentation.
"Really, we're trying to help people gain insights and tell stories about their data," Hill said. Storytelling is essential "if you want to make a point, or you want to convince other people about the importance of what you're doing."
Venture capital firms such as Andreessen Horowitz, DAG Ventures, Google Ventures, Khosla Ventures and Kleiner Perkins Caufield & Byers have all invested in ClearStory Data.