Combine open-source software, distributed storage running on low-cost hardware and the World Wide Web, and what do you get? Storage for as little as 15 cents per gigabyte per month, and another 10 to 20 cents for each gigabyte users upload or download.
That's a pretty good deal, especially when Andrew Reichman, an analyst at Forrester Research Inc., estimates it costs US$15 to $25 per gigabyte just to buy the hardware and software needed for secondary (backup or archival) storage, and $50 and up per gigabyte for the primary storage needed for business-critical applications such as stock trading or airline reservations. Neither of these prices takes into account ongoing management costs.
But don't throw away your Fibre Channel storage-area network (SAN) yet. These Web-based services lack the performance required for online transactional applications or giant database queries. Then there's the question of security, and how much of their data companies will trust to a node somewhere in the Internet "cloud."
Still, if promising new technologies deliver, they could reduce corporate reliance on the proprietary, higher-priced storage hardware and software sold by industry giants such as EMC Corp., IBM and Hitachi Data Systems Inc., not to mention a host of smaller players.
The first technology enabling this new storage platform is open-source storage software (see "Open source software takes the storage stage"). This can take the form of tools for specific storage functions, such as the Amanda open-source backup tool and the Darik's Boot and Nuke (DBAN) disk-wiping utility. It also includes network file systems such as Lustre, OpenAFS and Samba, which can form the foundations of entire storage infrastructures.
The second technology is distributed grid- or cluster-based storage architectures from start-ups such as Cleversafe Inc. and established services such as MozyPro from Berkeley Data Systems Inc.
The third enabling technology is the use of industry-standard servers and disk drives in lieu of high-end storage arrays in these architectures.
Berkeley Data Systems, for example, bases its MozyPro online backup service on its own storage clustering and file-serving software. The software runs on "white box" (unbranded) servers in the Berkeley Data Systems data center, which store data on their internal drives. The price: $4 per month for each desktop or server using the service, plus 50 cents per month for each gigabyte of data stored. Unlike other online storage providers that safeguard customers' data by storing multiple copies, Berkeley's software saves additional data equal to 33 percent of the original, from which it can restore the complete original if needed. This means it must store only 33 percent more data than a customer sends it, compared with other storage providers, which must store 300 percent of the original data, says Vance Checketts, vice president for products.
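To make the 33 percent figure concrete, here is a minimal sketch of one well-known way to protect data with that overhead: three data blocks plus one XOR parity block. This is an illustration only. The article does not say how Berkeley's software works, and the function names `add_parity` and `rebuild` are hypothetical.

```python
def add_parity(blocks):
    """Return the blocks plus one XOR parity block (all blocks equal length)."""
    parity = bytes(len(blocks[0]))  # starts as all zero bytes
    for block in blocks:
        parity = bytes(a ^ b for a, b in zip(parity, block))
    return list(blocks) + [parity]


def rebuild(stored, missing_index):
    """Recompute one missing block by XOR-ing all the surviving blocks."""
    survivors = [b for i, b in enumerate(stored) if i != missing_index]
    result = bytes(len(survivors[0]))
    for block in survivors:
        result = bytes(a ^ b for a, b in zip(result, block))
    return result


# Assumed example data: three equal-sized blocks of customer data.
data = [b"AAAA", b"BBBB", b"CCCC"]
stored = add_parity(data)

# One extra block for every three data blocks: about 33 percent more storage.
overhead = (len(stored) - len(data)) / len(data)

# Pretend the second block was lost and rebuild it from the other three.
recovered = rebuild(stored, missing_index=1)
```

Storing three full copies protects against more failures than a single parity block does, so the two approaches trade storage cost against resilience rather than being strictly equivalent.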
Cleversafe, a 29-person start-up that is alpha-testing software it will offer to other companies to build open-source, Web-based distributed storage architectures, goes further. Its software uses algorithms to split encrypted data into 11 "slices," which are stored on distributed servers and must be combined to yield any usable information. Using the same algorithms, the software can recreate the original data from any six of the 11 slices. By eliminating the backup, archiving and restoration of entire files, Cleversafe reduces the amount of "extra" data a company must store to protect critical information from the current 300 percent or more of actual data to 130 percent, according to CEO Chris Gladwin. He also claims the data slicing is inherently secure because no one storage node contains an entire copy of any file, making it harder to steal or corrupt. Availability is also assured because any five of the 11 nodes can fail, and the software can still recover the data, he says.
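The idea of rebuilding data after node failures can be sketched with a small Reed-Solomon-style code. This is not Cleversafe's algorithm, which the article does not describe; it is a generic textbook technique chosen for illustration. The choice of 6 required slices is inferred from the claim that any five of the 11 nodes can fail, the encryption step is omitted, and all names (`encode`, `decode`, `_interpolate`) are hypothetical.

```python
from itertools import combinations

P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial (mod P) through the (xi, yi) points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n=11, k=6):
    """Split data into n slices; each k-byte group adds one symbol per slice."""
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices = {x: [] for x in range(1, n + 1)}
    for start in range(0, len(padded), k):
        group = padded[start:start + k]
        # The k bytes define a polynomial through (1, b1) ... (k, bk).
        points = [(i + 1, b) for i, b in enumerate(group)]
        for x in slices:
            slices[x].append(_interpolate(points, x))
    return slices, len(data)


def decode(available, length, k=6):
    """Rebuild the original bytes from any k surviving slices."""
    if len(available) < k:
        raise ValueError("not enough slices to recover the data")
    xs = sorted(available)[:k]
    out = bytearray()
    for pos in range(len(available[xs[0]])):
        points = [(x, available[x][pos]) for x in xs]
        # Re-evaluating at x = 1..k yields the original k bytes of the group.
        out.extend(_interpolate(points, x) for x in range(1, k + 1))
    return bytes(out[:length])


message = b"critical business data"
slices, length = encode(message)

# Simulate losing five nodes: keep only slices 2, 4, 7, 8, 10 and 11.
survivors = {x: slices[x] for x in (2, 4, 7, 8, 10, 11)}
restored = decode(survivors, length)
```

In this toy version the first six slices hold raw data bytes, so it offers none of the confidentiality Gladwin describes, and its storage overhead (11 slices for every 6 needed) is not meant to reproduce the percentages quoted above.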
The Planet.com Internet Services Inc., a Houston-based hosting firm, is investigating Cleversafe as a way to use older servers to create low-cost storage grids. "Instead of going for three years or four years, with the proper upgrades in disk drives, we could get five to six years of life out of them, and at the same time, offer storage to our customers," says Chairman and CEO Doug Erwin.
Stelios Valavanis, president and founder of Onshore Networks LLC, a Chicago-based networking consultancy, thinks that the security Cleversafe offers, rather than any cost savings, could make it attractive to his clients. Both he and The Planet.com are waiting for Cleversafe to deliver new features later this year, such as further reducing the amount of "extra" code stored on a Cleversafe grid and allowing users and applications to see the grid as a network drive, before deciding how to proceed.
Perhaps the biggest online player is Amazon.com Inc. (see "Amazon.com Unveils Data Storage Service"). Adam Selipsky, vice president of product management and developer relations for Amazon Web Services, says its S3 service is provided by "multiple arrays of storage servers at multiple locations, storing multiple copies" of customers' data. It is aimed at developers, who can experiment with building innovative applications because of its low cost: 15 cents per month for each gigabyte of data stored, 10 cents for each gigabyte uploaded, and between 13 and 18 cents for each gigabyte downloaded. Selipsky declined to describe the technology used in S3 in more detail, except to say that Amazon "predominantly uses open-source software" throughout its infrastructure.
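As a rough illustration of what those rates add up to, the sketch below estimates a monthly bill from the per-gigabyte prices quoted above. The function name and the sample workload are hypothetical. Because the article gives only a range for downloads, the function returns a low and a high estimate rather than guessing which rate applies.

```python
# Rates quoted in the article, in cents per gigabyte.
STORAGE_CENTS = 15
UPLOAD_CENTS = 10
DOWNLOAD_CENTS_LOW = 13
DOWNLOAD_CENTS_HIGH = 18


def estimate_monthly_cost(stored_gb, uploaded_gb, downloaded_gb):
    """Return (low, high) monthly cost estimates in dollars."""
    fixed = stored_gb * STORAGE_CENTS + uploaded_gb * UPLOAD_CENTS
    low = fixed + downloaded_gb * DOWNLOAD_CENTS_LOW
    high = fixed + downloaded_gb * DOWNLOAD_CENTS_HIGH
    return low / 100, high / 100


# Assumed workload: 100 GB stored, 20 GB uploaded, 50 GB downloaded.
low, high = estimate_monthly_cost(100, 20, 50)
print(f"Estimated monthly cost: ${low:.2f} to ${high:.2f}")
```

For that assumed workload the estimate comes to between $23.50 and $26.00 a month.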