Disaster avoidance -- because it will happen
- 24 September, 1997 14:20
Parts of this article are excerpted from "How to keep your PC trouble free", which will appear in the November 1997 issue of Australian PC World, on sale from 24 October 1997.
They don't make disasters like they used to. Once upon a time plagues of locusts, earthquakes, fires and floods were the only things you really had to worry about. In the information age, a voltage spike here, a software glitch there, even a handy utility you download from the Web can all lead to catastrophes of Old Testament proportions.
No matter how paranoid - I mean, careful - you are, it's likely that something, somehow, will go wrong with your computer at some point. There's no guaranteed way to avoid glitches, but there are things you can do to minimise their impact.
Case in point 1
At the St James chemical plant near Baton Rouge, a hairy crash in September 1995 broke through all the barriers that had been erected to protect data.
The network administrator was on holiday (of course), and the usual backups had not been performed for a week. But for an IS team with open eyes, a failure is seized as an opportunity for intelligent failure analysis.
When the IS team finally cleared the brambles away from its recovery system, it figured out what types of problems were causing crashes. Thanks to a new approach to data protection, the plant's Novell network is now protected by a backup plan that IS site coordinator, Brian Durham, calls "almost too good to be true".
At St James, production engineers monitor the process of making styrene, constantly checking for ways to increase yield and decrease cost. Modifications to the styrene recipe go into the plant's computers, along with storage-gobbling graphics files and AutoCAD drawings. Also on the network are the typical files and software found in any office: the spreadsheets and financial software and the mail and post office used by the plant's approximately 130 employees and various outside contractors.
When the network went down in September 1995, it held all this information hostage for two full working days. At the end of it, after analysing what went wrong, the St James team came up with some sobering answers. Though it had been careful to provide hardware backup - using redundant servers and Cheyenne's ArcServe - the system had crashed anyway because of a faulty software write to the SCSI card.
"It wiped out the Novell partitions on the network drive," Durham says.
In the end, it cost them more than $US100,000 to find out something that's particularly true in today's distributed environments: hardware solutions do not protect against software failures.
Something had to be done. Soon after the crash, Durham happened to look at some promotional literature about LANtegrity, a novel backup solution from Network Integrity. Durham examined the brochure and remembers being sceptical simply because it sounded too perfect. But the LANtegrity solution was so inexpensive, it was worth a gamble.
Aside from the licensing fee, the only real outlay was for an automatic tape loader and compressed tapes. The system did require a server, but because Durham had been mirroring each active server with a backup, he had servers to spare.
Unlike traditional backup systems where a complete copy of all files is usually made weekly with incremental backups during the week, LANtegrity only copies files that have changed. (A complete copy of all files is made during installation.) Once the system has an image of a file, it never has to protect it again.
Fault-resilience is built in - if a server does go down, the LANtegrity server steps in for it and automatically becomes that server.
Another key difference is that backup is performed continuously. The LANtegrity server acts as a single user on the network. When a file's time-date stamp or archive flag indicates that it's changed, that file goes into a queue. NetWare's Storage Management Services processes it just like any other user's request and sends it over to the LANtegrity server when traffic permits.
The LANtegrity server keeps these active files in cache, and it also automatically writes them to tape for archiving and off-site backup purposes.
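The change-detection idea described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not LANtegrity's actual implementation: the function name scan_for_changes and the seen dictionary are hypothetical, and the sketch compares modification time and file size rather than the NetWare archive flag.

```python
import os
from collections import deque


def scan_for_changes(root, seen):
    """Queue every file under root that is new or changed since the last scan.

    seen maps a file path to the (modification time, size) pair recorded
    on the previous scan; it is updated in place.
    """
    queue = deque()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            signature = (st.st_mtime_ns, st.st_size)
            # A file is queued only when its signature differs from the
            # one stored earlier, so unchanged files are never re-copied.
            if seen.get(path) != signature:
                seen[path] = signature
                queue.append(path)
    return queue
```

The first scan of a directory queues everything, which corresponds to the complete copy made at installation. Later scans queue only the files that have changed, so the copying can be spread across the day instead of squeezed into a fixed backup window.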
"It's not doing all the data all at once," explains Tim Millunzi, Network Integrity technical support manager. "So you're no longer trying to artificially compress a backup in some fixed length of time."
The LANtegrity system kept a low profile even as Durham added more and more servers and storage.
"The entire plant had four gigs available when I came here," Durham says.
"Now we have 100, of which we're using about 60. I took spare equipment and made servers out of all of them. I was able to put three systems online for the price of one."
One reason for this increase was a desire to protect valuable data for (and perhaps from) desktop users.
"We took a lot of the things people were using on workstations that were unprotected, and we put it onto the system where it would be backed up and protected," Durham says.
One LANtegrity server (a Compaq ProLiant 2000 P 200 with 512Mb of RAM and 22Gb of drive space) protects the main file and application server (an identical ProLiant 2000), two Compaq ProSignias (66MHz 486es with 256Mb of RAM and 22Gb drive spaces), and a ProLiant 4500 (a 66MHz Pentium with 512Mb of RAM and a 17Gb drive space).
The additional servers are used for storing AutoCAD drawings, an electronic document management systems library, an Oracle database, and .AVI files.
It wasn't long before the new system was put to the test: three failures during the month after installation. For the first one, Durham was on holiday (Murphy's laws in IS are nothing if not consistent), and the LANtegrity system had to be manually told to step in. Now the system is configured to step in automatically.
"Since then, we have had a total of eight failures," Durham says, "and users typically don't even know that we've had a failure. [LANtegrity] instantly takes over. And because we're running Windows 95 on all our workstations, Windows 95 automatically reconnects by itself as soon as that server becomes visible again."
Case in point 2
When the main server in the Brazil office of Young & Rubicam Advertising crashed late one morning in December 1996, it could have been a catastrophe. Instead, it set in motion some well-detailed plans.
A download of the company's Lotus Notes application, which the ad agency depends on for its creative work, media plans and strategy, was immediately initiated from its New York office via its WAN. By the end of business that day, the Brazil system was operational. All the while, the ad agency's data was well protected by four levels of redundant backup.
"From a standpoint of data, we didn't lose anything," said David Gutierrez, Young & Rubicam's vice president/regional technology officer for the Southern Hemisphere, and the man charged with protecting client data in the increasingly competitive market of Latin America.
When operations go international, so do concerns about security. And global companies don't just worry about server crashes and natural disasters. With worldwide threats such as industrial espionage, they need to consider what Kathleen Harvey, senior information security analyst at Datapro Information Services Group, calls "global risk". She said the key is to create, as Young & Rubicam did, a consistent policy across the entire organisation - not an easily achieved goal.
The key word here is consistency. "If you're an attacker, you'll look for the weakest link," said Jackie Hyde, an information security analyst at Datapro in the UK. Datapro conducted a survey on global security, which included 1342 respondents from the US, Canada, Central and South America, Europe and the Asia-Pacific region.
One weak link can lead to hefty losses, especially with the increasing trend among global companies to consolidate data centres from hundreds worldwide to the double and even single digits.
In support of this coordinated single-policy approach, large disaster recovery service providers such as IBM and Comdisco recently announced global business recovery services. That means companies can put their entire organisation under one umbrella policy rather than contracting on a regional basis.
The most progressive organisations, according to Datapro, are setting up a small central security team at headquarters and appointing a person responsible for security within each business unit around the world. The central team, headed by a corporate security manager, conducts a risk analysis for the entire organisation and then selects a methodology to use around the world.
A good illustration of a global policy with local controls is found at Telstra, here in Australia.
"We try to work to a collective security model which is adapted to prevailing local conditions and circumstances," said David Harris, general manager for corporate security at Telstra, which has operations and joint ventures spanning Asia, Europe and North America.
"We have people with skill sets in specialised areas like security who form a centralised resource that can be drawn on, but we also need the experience of the country manager."
Key to this approach is communication between the policy makers and the policy implementers, said John Clark, director of Andersen Consulting's information security practice. "I've seen cases firsthand where companies have a central security group in one country, and they distribute these policies to other countries that have not necessarily bought off on those policies," he said.
Another complication is differing regional attitudes about the importance of security. Even at Telstra, awareness of hacker intrusion is far less than in the US, company officials said. This makes it more difficult to get employees to focus on the problem.
But it's clear that security can't remain on the back burner. "We have not yet seen reports of global disasters - you know, transnational computer breakdowns," author Roche said.
"But with the rise of distributed processing and global telecommunications networking resulting in more and more dependence on international telecommunications circuits, we're bound to see this type of thing occur more."
Every parent knows that when the little ones go off to preschool, they're bound to come home with the flu. The same is true of the Internet. As soon as you're sharing data and programs with millions of users online, your computer's chances of coming down with a virus increase a hundredfold. You need an up-to-date antivirus package, and you need to use it correctly. Here's how:
1) Buy an antivirus program. Look for one that operates in the background, checking files as you work. It should be easy to set up, use, and - very importantly - update. Dr Solomon's AntiVirus Toolkit is highly regarded by virtue of its ability to detect a large number of viruses, and remove 89 per cent of them. Symantec's Norton AntiVirus detects just as many viruses but is able to remove only 77 per cent of them. It is simple to use, versatile, and amazingly easy to update - you just click a button and it updates itself over the Net.
2) Run it in the background. This is the best way to use an antivirus program because it stays out of your way.
3) Update virus definitions at least monthly. New viruses are cropping up all the time. If you want to stay clear of them, you need to get regular updates from the folks who made your software.
Dr Solomon's Software
Tel (03) 9690 0455 Fax (03) 9690 0455
Symantec
Tel (02) 9850 1000 Fax (02) 9850 1001
McAfee (The Paradigm Agency)
Tel (02) 9437 5866 Fax (02) 9439 5166
Solid fix-it software
Windows 95 comes with a handful of serviceable utilities for keeping the hard drive healthy, but you only get what you pay for. To be absolutely sure, users should shell out for a good commercial utility package. There are two top-flight candidates for this job: Symantec's long-established Norton Utilities and Helix's newcomer Nuts & Bolts.
What separates them? Mostly price, with Nuts & Bolts being the dearer of the two, although Helix's package also sports a better user interface, with clear dialogue boxes and excellent tools.
Both packages test and fix a hard drive more thoroughly than does Windows' ScanDisk. For instance, they check the partition table and the boot sector for errors that can render your drive inaccessible - things that ScanDisk ignores.
Symantec's and Helix's defraggers are also faster than Windows' Disk Defragmenter and considerably safer because, to avoid errors, they compare the file fragments to the originals as they move them. They're also true 32-bit programs, while ScanDisk and Disk Defragmenter are old-fashioned, 16-bit tools. A 16-bit program is more prone to crashing, and if there's anything you don't want to crash, it's a disk scanner or defragger.
Price: Norton Utilities 2.0 for Windows 95 $129 RRP. Norton AntiVirus 2.0 $89.99
Symantec
Tel (02) 9850 1000 Fax (02) 9850 1001
INFO: www.symantec.com
Nuts & Bolts solutions
Price: $149 RRP
Light Years Ahead
Tel (02) 9477 6666 Fax (02) 9477 6655
INFO: www.helixsoftware.com
A surge suppressor
Power corrupts, and electric power corrupts electrically. Unless you install a good surge suppressor, a sudden jolt of electricity can wipe out the computer. This is especially true for users in an area with frequent electrical storms or a building with ancient wiring. Sure, it's a small risk, but do you want to bet all of your hard work on it?
A surge suppressor looks and works like a power strip, but it also protects the devices plugged into it from electrical surges that can fry your hardware. If the surge suppressor is hit by a bigger jolt than it can handle, it will self-destruct, shutting off power to the computer and sacrificing itself for the good of more expensive hardware. Best of all, there's quite a range to stock.
A backup drive
End user's guide to backup security
1) Get a tape drive. The easiest backup in the world is one where you click a button and walk away. The cheapest way to do this is to buy a tape backup device with at least the capacity of your hard drive, such as the Iomega Ditto 2Gb or the Ditto Easy 3200. Removable-disk units like Iomega's Zip drive just aren't big enough. And don't even think about floppies.
Internal backup drives are cheaper than external drives but require you to pop the hood and install them; most external drives are slower but simply plug into your PC's parallel port.
2) Test it. Sometimes a tape drive seems to be working fine. Then you try to restore a file (usually the day your big project is due) and realise the drive only looks like it's been backing up your files. To protect yourself, back up and restore a few files when you set up the drive's software. You should run this test about once a month.
3) Set up a schedule. Once you've tested the tape drive, do a full backup. Then at the end of every workday, do an incremental backup, which copies only those files that were created or have changed since the last backup.
Every two weeks, change tapes and do another full backup. With two or three tapes, you'll have a month's worth of data.
4) Store your data off-site. If your computer is stolen or destroyed in a fire, you don't want your data to go with it.
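For readers who script their own backups, steps 2 and 3 above can be sketched roughly as follows in Python. This is a minimal illustration under stated assumptions, not any vendor's tape software: the names backup_type, tape_for and restore_matches are hypothetical, the 14-day cycle and two-tape rotation are taken from the schedule above, and weekends are assumed not to be workdays.

```python
import hashlib
from datetime import date


def backup_type(today, first_full, cycle_days=14):
    """Return 'full' on the first day of each cycle, 'incremental' on
    other workdays, and None on other weekend days."""
    days = (today - first_full).days
    if days < 0:
        raise ValueError("today is before the first full backup")
    if days % cycle_days == 0:
        return "full"
    if today.weekday() >= 5:  # Saturday or Sunday
        return None
    return "incremental"


def tape_for(today, first_full, tapes=2, cycle_days=14):
    """Return the index of the tape in use, rotating at each full backup."""
    return ((today - first_full).days // cycle_days) % tapes


def file_digest(path):
    """Return the SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def restore_matches(original, restored):
    """Return True when a restored copy is byte-for-byte identical to the original."""
    return file_digest(original) == file_digest(restored)
```

Calling backup_type(date.today(), first_full) each evening tells you which kind of backup is due, and tape_for tells you which tape to load. For the monthly test, restore a few files to a scratch folder and run restore_matches on each one against the copy still on the hard drive.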
An uninterruptible power supply
If your work is so critical and time-sensitive that a sudden system crash would be an absolute disaster, install an uninterruptible power supply. This is basically a battery-operated 240V supply with a surge suppressor attached. If a power failure occurs, you'll get enough juice to save your work and close things down gracefully.
Not long ago, UPS was an irrelevant issue for Australian businesses, since State- and Commonwealth-operated electricity companies provided a high quality, rarely interrupted source of power. With the growth in privatisation of these resources, both the quality and reliability of electricity have allegedly declined.
According to John Pignolet, from Australian UPS manufacturer PowerTech, uninterruptible power supplies perform filtering and "power conditioning" on the mains power. This is the act of removing surges, sags, spikes, electrical noise and other impurities from the mains power supply. The effects of these phenomena are most visible to humans in the effects they have on incandescent lights (dimming, flickers, flashing).
The effects they have on computers are very serious, and can result in stress on computer components, causing premature hardware failures, or corruption of RAM, cache or hard drive data.
ARN is featuring UPSes in the November 5 issue.