Exploring the data explosion

To put the growth of data in perspective, a University of California study predicts that after taking approximately 300,000 years for humans to generate 12 exabytes (an exabyte is over 1 million terabytes or a million trillion bytes) of information, the next 12 exabytes will be accumulated in just two and a half years.
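The arithmetic behind that comparison can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it uses the figures quoted above, treats an exabyte as a round 1 million terabytes, and the variable names are illustrative.

```python
# Figures quoted in the article; the variable names are illustrative.
EXABYTES = 12
YEARS_FIRST = 300_000   # approximate time taken to generate the first 12 exabytes
YEARS_NEXT = 2.5        # predicted time to accumulate the next 12 exabytes
TERABYTES_PER_EXABYTE = 1_000_000  # assumption: round decimal conversion

rate_first = EXABYTES / YEARS_FIRST   # exabytes per year, historically
rate_next = EXABYTES / YEARS_NEXT     # exabytes per year, predicted
speedup = rate_next / rate_first      # how much faster data now accumulates

print(f"Historical rate: {rate_first * TERABYTES_PER_EXABYTE:.0f} terabytes per year")
print(f"Predicted rate:  {rate_next:.1f} exabytes per year")
print(f"Speed-up:        {speedup:,.0f}x")
```

On those numbers, the historical average is roughly 40 terabytes a year against a predicted 4.8 exabytes a year, an acceleration of about 120,000 times.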

And the sources of data are growing as well. Witness the variety of corporate, personal and industrial devices that not only house data but, more importantly, are becoming enabled to hook into back-end data sources to feed and retrieve data.

Meanwhile, only about 20 per cent of the world's data resides in relational databases; the rest is in a combination of flat files, audio, video, prerelational, and unstructured formats - not to mention the mountains of paper-based data just waiting to be digitised.

The result of incorporating all these different data types and sources is that data management is changing into a broader category of managing content that includes all data types, vendors say.

Data, data everywhere

To keep up with the data explosion, database vendors are working to manage more data types, and in some cases they are doing so from within the core database engine.

"We're attempting to get at all of the data out there," says Jeff Jones, senior program manager of the data management group at IBM. "We want to provide data management to the universe of nonrelational data."

Although several companies once competed in the database space, a number of vendors, such as Sybase, have homed in on specific niches such as financial institutions and telecommunications companies. The field has cleared, so to speak, and there are two general approaches to data management moving forward: Oracle's centralised philosophy and IBM's federated data style.

Recognising that very few customers keep all their relational data in a single vendor's database, Big Blue aims to manage data residing just about anywhere, including competitors' databases, and to extend the functionality of its flagship database, DB2, to other data types and locations.

"Federation is about enabling middleware to reach out and touch data from a variety of sources, then manage it as if it were in one relational database," Jones says.

One of the most important advantages to the federated approach is that companies don't need to migrate data from a variety of sources, such as legacy and nonrelational systems, into a single repository. Migrating small amounts of data is not problematic, but moving a multiterabyte data warehouse is nothing short of Herculean.

Instead, IBM's approach extends the core database engine capabilities to sources outside the database, including non-IBM databases.
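The federated idea can be illustrated with a toy example. The sketch below is not IBM's implementation: it assumes an in-memory SQLite database standing in for a relational source, a small hypothetical flat file, and made-up function names. It shows a middleware layer answering one question across both sources without migrating either into a single repository.

```python
import csv
import io
import sqlite3

# Source 1: a relational database (in-memory SQLite stands in for any RDBMS).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Source 2: a flat file that stays outside the database (hypothetical contents).
FLAT_FILE = "customer_id,amount\n1,250\n2,100\n1,75\n"


def relational_rows(conn):
    """Yield customer rows from the relational source as dictionaries."""
    for cust_id, name in conn.execute("SELECT id, name FROM customers ORDER BY id"):
        yield {"id": cust_id, "name": name}


def flat_file_rows(text):
    """Yield order rows from the flat-file source as dictionaries."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"customer_id": int(row["customer_id"]), "amount": int(row["amount"])}


def federated_totals(conn, text):
    """Join both sources in the middleware layer, leaving the data where it lives."""
    totals = {}
    for order in flat_file_rows(text):
        key = order["customer_id"]
        totals[key] = totals.get(key, 0) + order["amount"]
    return [(c["name"], totals.get(c["id"], 0)) for c in relational_rows(conn)]


result = federated_totals(db, FLAT_FILE)
print(result)
```

The caller sees one combined answer per customer, while the order data never leaves its flat file. A production federation layer would also push filters down to each source and optimise across them, which this sketch does not attempt.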

Big Blue's fiercest rival, Oracle, on the other hand, is pushing the notion of centralised management, where all of a company's data resides in an Oracle database from which it can be easily managed.

"I guess we have a philosophical disagreement with IBM," says Jeremy Burton, Oracle's senior vice president of products and services marketing. "We think the industry has come out of the distributed computing model."

Burton adds that the biggest benefits of centralising data management are lower cost, faster performance and better information, because the data is all in one place.

Oracle is not ignoring the need to access data outside the database engine. Companies that have content residing on the Internet, for instance, can use the database's query engine to index that content, Burton says.

Another perspective

Analysts maintain that IBM's federation provides the best of both worlds.

"IBM probably has the better approach because you can either use federation, or you can bring all the data into the database to centrally manage it because it scales," says Peter Urban, a senior analyst at US-based AMR Research.

Urban says that Oracle's scalability within the forthcoming 9i database will improve considerably, particularly with Real Application Clusters, a feature that enables customers to add or subtract servers from a cluster as need be without taking the server farm down.

Analyst firms Dataquest and International Data Corp (IDC) both list IBM and Oracle as the market's top guns, but Microsoft has been making its own push into the enterprise with each new iteration of SQL Server and has won some household-name accounts. The software giant is expected to gain market share quickly because it offers a database that is considerably less expensive than either Oracle or DB2, yet easier to use.

Microsoft's approach is almost a blend of Oracle's and IBM's.

"Our philosophy is that you really need to have centralised management of metadata to effectively search across different sources and types of data," says Steve Murchie, Microsoft's group product manager for SQL Server.

The engine that could

Although the various approaches differ, all the database vendors are moving away from managing just data and the metadata that describes it toward managing a broader category of content, analysts say.

"If users stretch the definition of what data is, it stretches our definition of what the database has to do," says Pat Selinger, a fellow at IBM.

To that end, as data management morphs into a broader category of content management, the vendors say more and more functionality will be packed into the database. The latest technologies being pulled into the database are data mining and analytic capabilities.

"The database vendors are finally starting to get it and are adding functionality to help end users use the data," says industry analyst Howard Dressner, vice president and research director at research group Gartner.

Without nailing down a specific time frame, IBM says it plans to pull its Content Manager software, currently a standalone product, into the database in the future.

Dressner adds that incorporating more and more functionality into the database engine generally improves performance and makes the specific technology more effective. In the case of business intelligence functionality, for instance, users can get more insight out of the data when they interact directly with the database engine.

A market reborn

"As users are exposed to more functionality over the Internet for e-business purposes, you will have intense demand for data that is immediately accessible online. This in turn causes demand for [database] software that manages that data," says Carl Olofson, a program director at IDC.

In fact, in a survey of IT executives by AMR Research, almost half responded that databases are their top investment area in 2001 and will remain the most important through 2002. And Dataquest reports that by 2004 the database market will reach $US12.7 billion - no small sum.

Analysts also expect competition between vendors to increase as they all vie to manage the most data, with considerable overlap in technology.

With such blurry lines, choosing a database management system is not an easy task, and not all users find the picture black-and-white.

Ed Scannell contributed to this report.

Beyond the horizon

All of the database vendors spend a great deal of energy and time touting the next version that will come to market, typically well before it will be generally available. Behind the scenes, however, at corporate headquarter campuses or in research labs, they are hard at work on features and functionality that won't appear in the immediately forthcoming revision.

In IBM's Silicon Valley labs, IBM Fellow Bruce Lindsay and colleagues are at work on the next generation of database replication technology, Lindsay says. Replication in this case refers to geographically dispersed databases being capable of receiving updates from each other in real time. The first generation of this technology is currently in DB2, but it is not as fast as it should be.

"You trade things like integrity for speed," Lindsay says. "What we're trying to accomplish is a combination of support in the engine and a fairly complex application that works with the database."

IBM is also working on a query optimizer that will learn the optimal route to certain data and be capable of automatically seeking out that path in the future, according to Bernie Schiefer, manager of DB2 performance.
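The article gives no detail on how such an optimizer would work, so the following is purely a hypothetical sketch of the general idea: record what each access plan actually cost, then prefer the cheapest one next time. The class, method and plan names are invented for illustration and do not describe DB2.

```python
from collections import defaultdict


class LearningPlanChooser:
    """Hypothetical sketch: remember observed plan costs and reuse the cheapest."""

    def __init__(self):
        # query -> plan name -> list of observed costs (e.g. elapsed milliseconds)
        self._observations = defaultdict(lambda: defaultdict(list))

    def record(self, query, plan, cost):
        """Store the measured cost of running `query` with `plan`."""
        self._observations[query][plan].append(cost)

    def best_plan(self, query, default=None):
        """Return the plan with the lowest average observed cost, if any."""
        plans = self._observations.get(query)
        if not plans:
            return default
        return min(plans, key=lambda p: sum(plans[p]) / len(plans[p]))


chooser = LearningPlanChooser()
chooser.record("orders_by_customer", "full_scan", 120.0)
chooser.record("orders_by_customer", "index_lookup", 8.0)
chooser.record("orders_by_customer", "index_lookup", 12.0)
print(chooser.best_plan("orders_by_customer"))
```

Here the index lookup wins because its average observed cost is lower than the full scan's. A real optimizer would also have to cope with changing data volumes and statistics, which a simple running average ignores.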

Additionally, IBM is building what it calls a SMART (Self Managing and Resource Tuning) database, designed to reduce the human intervention needed to run and maintain it. "The long-term vision is to at least provide the option that the user not get involved," says Sam Lightstone, a senior technology development manager at IBM.

Sybase Inc. CEO John Chen says his company is headed toward zero-administration and zero-maintenance databases.

"It won't happen overnight, but this whole concept of preventative maintenance is going to start creeping into the database space. I think you will see major headway within [the near future], and not only from Sybase," he says.

Microsoft is working on a temporal database that tracks the time things happen, a time series database that optimises functions occurring on a regular basis, and a unified search mechanism that will provide access to data residing on a variety of back-end sources, according to Steve Murchie, Microsoft's group product manager for SQL Server. He goes on to say that another big push for Microsoft is to help users make sense of the data they have.
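To make the temporal idea concrete, here is a minimal sketch of one common way to track when facts were true: each row carries the period during which it was valid. It uses SQLite purely for illustration, and the table, columns and sample prices are assumptions rather than anything Microsoft has described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price_history ("
    " product TEXT, price REAL, valid_from TEXT, valid_to TEXT)"
)
# valid_to is NULL for the row that is still current (sample data is made up).
conn.executemany(
    "INSERT INTO price_history VALUES (?, ?, ?, ?)",
    [
        ("widget", 9.99, "2001-01-01", "2001-03-01"),
        ("widget", 11.49, "2001-03-01", None),
    ],
)


def price_as_of(conn, product, date):
    """Return the price in effect for `product` on `date` (ISO yyyy-mm-dd)."""
    row = conn.execute(
        "SELECT price FROM price_history"
        " WHERE product = ? AND valid_from <= ?"
        " AND (valid_to IS NULL OR ? < valid_to)",
        (product, date, date),
    ).fetchone()
    return row[0] if row else None


print(price_as_of(conn, "widget", "2001-02-15"))
print(price_as_of(conn, "widget", "2001-06-01"))
```

Because ISO-formatted dates sort correctly as text, plain string comparison is enough here. The first query returns the old price and the second the current one, so the table answers "what was true then?" as well as "what is true now?".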

"Not only does an organisation need to be able to store and manage data, they need to be able to understand it," he adds. "It's more important to understand it than just to manage it."

Oracle, for its part, was mum on what it has cooking in research and development.


IBM also has a new release of DB2 due in the middle of this year, although officials decline to say exactly when or whether it is a point release or a new version worthy of a nice round number. The company says that the next version will continue its push toward incorporating more data sources into the federated approach, as well as furthering scalability.

Microsoft won't say when we can expect the next version of SQL Server.

Analysts say that on the whole such technologies make databases more complete solutions.

"The [database] software business is maturing. It is growing up and coming out of the Wild West era when you had a bunch of companies out there with partial solutions that would offer what they had, and then backfill with whatever stuff they could," says Carl Olofson, a program director at IDC.
