ARN

Hadoop creator expects surge in interest to continue

Doug Cutting explains why 'upstart' is making inroads with Microsoft, Oracle, others

Doug Cutting, the creator of the open-source Hadoop framework that allows enterprises to store and analyze petabytes of unstructured data, led the team that built one of the world's largest Hadoop clusters while he was at Yahoo. The former engineer at Excite, Apple and Xerox PARC is also the developer of Lucene and Nutch, two open-source search engine technologies now being managed by the Apache Software Foundation. Cutting is now an architect at Cloudera, which sells and supports a commercial version of Hadoop and which this week will host the Hadoop World conference in New York. In an interview, Cutting talked about the reasons for the surging enterprise interest in Hadoop.

How would you describe Hadoop to a CIO or a CFO? Why should enterprises care?

At a really simple level it lets you affordably save and process vastly more data than you could before. With more data and the ability to process it, companies can see more, they can learn more, they can do more. [With Hadoop] you can start to do all sorts of analyses that just weren't practical before. You can start to look at patterns over years, over seasons, across demographics. You have enough data to fill in patterns and make predictions and decide, 'How should we price things?' and 'What should we be selling now?' and 'How should we advertise?' It is not only about having data for longer periods but also richer data about any given period, as well.

What are Hive and Pig? Why should enterprises know about these projects?

Hive gives you [a way] to query data that is stored in Hadoop. A lot of people are used to using SQL and so, for some applications, it's a very useful tool. Pig is a different language. It is not SQL. It is an imperative data flow language. It is an alternate way to do higher level programming of Hadoop clusters. There is also HBase, if you want to have real time [analysis] as opposed to batch. There is a whole ecosystem of projects that have grown up around Hadoop and that are continuing to grow. Hadoop is the kernel of a distributed operating system and all the other components around the kernel are now arriving on the stage. Pig and Hive are good examples of those kinds of things. Nobody we know of uses just Hadoop. They use several of these other tools on top as well.


