Disaster Recovery: The Time Is Now
- By Dian Schaffhauser
- 10/21/05
On the heels of Katrina, it's time to get a top-flight disaster recovery plan
into place. Here's how to catch up.
AS THE WATERS RECEDE, Tulane technologists bring systems back on line and look
for new ways to protect, secure, and prepare them for crises.
Within 48 hours of Hurricane Katrina battering the Southeast, only two of the
many four-year universities in New Orleans had Web sites up. The others showed
“failure to connect” messages to students, parents, family members,
friends, faculty, staff, or others trying to learn about the state of the schools
in which they had a vested interest.
Two weeks later, the silent schools showed signs of life online, albeit with
rough edges: links that went nowhere and highly visible content that was dated.
On its home page, one Catholic university mourned the passing of the Pope and
shared the schedule for April 2005 memorial masses. Another university, whose
buildings sustained flood damage up to the second floor, advertised on its home
page for data entry operators to aid in hurricane recovery efforts from its temporary
headquarters in Baton Rouge.
And what of the two schools that remained viable in some form almost throughout
the duration of the storm? One, Tulane University (LA), ran
a blog updated daily by President Scott Cowen, keeping visitors abreast of the
safety of faculty, staff, and students and of conditions at its campuses. It
was hosted on an emergency Web site, to which the university’s normal URL
redirected visitors. After the disaster, Tulane moved temporary operations
to Houston and held online chats with Cowen so that he could respond to fears
and questions from the public (no, emergency power hasn’t been maintained
at the medical school, which means research has been disrupted; and, yes, formal
rush for sororities will take place—in the springtime). The university
has also provided an unbroken chain of updates via its Web site.
The New Orleans campus of Louisiana State University (with its main campus
in Baton Rouge) also remained live online through most of the storm onslaught
and disastrous aftermath. The institution posted an emergency notice to its
Health Sciences Center Web site, lsuhsc.edu, stating that the academic campus
for LSUHSC in downtown New Orleans had suspended operations and that “hospital
personnel should follow Code Gray procedures and instructions from hospital
leadership.” It also advised Blackberry (www.blackberry.com)
subscribers that they could use their devices as a means of e-mail communication,
“in the event we lose our Exchange servers during Hurricane Katrina.”
When Is a Disaster Too Catastrophic?
Although many have argued that the proportions of the Katrina disaster (a Category
4/5 hurricane accompanied by no end of additional and devastating complications)
went far beyond a scale that can be planned for, others insist that the tools
of disaster recovery planning (DRP) aren’t necessarily prohibitive in
terms of dollar or time investment, nor do they need to be highly complex.
Keeping up a blog on an emergency Web site doesn’t require complicated technology,
for instance. And while populating a blog is a far cry from rebuilding research
data that has been destroyed, on the heels of any disaster communication becomes
vital: It is the link that holds survivors and supporters together while they
take emotional and physical stock. In such times, reliable information becomes
gold.
Days after Katrina’s passing, new LSU CIO Brian Voss wrote to Campus
Technology, “Yesterday, I ‘suspended’ normal IT support activities
(with the exception of our administrative systems and computing functions, which
‘run’ the university).” An absence of “normal users”
on campus allowed him to redeploy his staff to help state and federal recovery
agencies set up temporary phone and network operations on campus, and helped
his staff to create “quick and dirty” applications to aid in information
gathering and sharing. The LSU crew also helped colleagues at another school
rebuild their Web site and start to recover e-mail services.
As Voss now observes, “All this means that, as a new CIO with lots of
efforts underway to build the future of IT at LSU, I suddenly found that we
were pulled into a more pressing, vital role to support the relief and response
efforts on campus.” What lessons for CIOs (even for new CIOs) might be
evident here?
“I suppose there are two,” says Voss. “First, if disaster
strikes, be prepared to see your mission make a sudden shift, and hope that
you’ve a capable staff willing to be flexible and adapt to the change—I
am fortunate to have such a staff.” (CIOs might also hire with an eye
to such flexibility.) “Second,” adds Voss, “disaster-recovery
planning efforts in higher ed IT shops need to take on greater urgency in terms
of resources devoted to their planning and preparation. The aftermath of a crisis
is no time to try to recover ad hoc, especially when one considers that ramifications
of a disaster can be far broader than the traditional ‘data center hit
by a tornado.’”
In fact, besides hurricanes, in the last 20 years universities and colleges
have faced deadly ice storms, earthquakes that leveled whole campus buildings,
terrorist attacks, train derailments that required campus evacuation, animal
activists taking over a research building, the potential presence of anthrax
and SARS, and even the more seemingly mundane computer worms and viruses that
haunt any network.
The fact of the matter is: Disasters encompass people and services. According
to Ross Armstrong, senior research analyst for the Info-Tech Research Group
(www.infotech.com), in
the face of disaster, the role of those in charge of IT is to figure out how
to get regular users, including the IT department itself, back up
and running again. That challenge becomes an exercise in risk analysis, he points
out, and the first question a CIO should ask himself is: What are the potential
hazards we face? The second question is: What is to be brought up first, second,
third, and so forth?
Who’s ready, who’s not. Yet what surprises Armstrong
is the number of organizations in general—regardless of industry—with
either no plan in place, or merely the intention to implement such a plan over
the next two to three years. Higher education is not immune to this ill: According
to Info-Tech’s DRP in the Education Sector 2005 Benchmarking Report,
a surprising 47 percent of universities and colleges currently have no disaster
recovery plan in place. These institutions do, however, acknowledge the importance
of having such a plan: 68 percent of them say they are currently in the process
of planning. Unfortunately, it may be some time before many college and university
DRPs see implementation: 32 percent of schools with no current plan concede
it may be up to three years before they have one in place, according to the
report, and that may be because security and end user support are higher IT
priorities than disaster recovery—just a notch above the categories of
network/LAN/WAN, Web site and IT governance. But the good news is that, according
to Info-Tech, among the 53 percent of schools currently with a plan in force,
a whopping 86 percent are improving that plan. (For more DRP data, see “Lessons
in Disaster,” page 14.)
The money factor. Another challenge unique to the university
setting is the fact that faculties and departments receive different levels
of funding, says Info-Tech senior research analyst Curtis Gittens, adding that
this makes a comprehensive campuswide disaster recovery plan “difficult
to implement.”
“Whereas large corporations can focus their resources on implementing
a comprehensive and cohesive strategy,” he says, “universities just
don’t have that luxury. This means that to create an effective disaster
recovery planning strategy, universities should implement a flexible, heterogeneous
[plan] that matches the resources available to the individual units. Well-funded
faculties should implement the complete [plan]; less well-funded faculties should
implement the lower-cost version.”
6 Steps to Develop Your Disaster Recovery Plan
SO YOU’RE ONE OF THE LAGGARD SCHOOLS.
How do you build your disaster recovery plan pronto? What should it include? According to Ross Armstrong, senior research analyst for Info-Tech Research Group (www.infotech.com), there are some basic steps you need to take:
- Build your team. Who should it include?
- The CIO, IT director, or IT manager; whoever is in charge of IT operations.
- The facilities manager or office manager; whoever is most conversant with how the facility is physically laid out.
- A network administrator or database administrator; whoever has his or her finger on the pulse of critical data.
- One administrative and one technical representative from each distinct unit—for example, the dean of the faculty or department chair, plus a senior IT manager from that group.
Info-Tech advises its higher education clients to put a high-level IT administrator in charge of the disaster recovery plan, because, according to Info-Tech Senior Research Analyst Curtis Gittens, that individual “will be able to effectively implement the technology to support business continuity in the event of a disaster.”
- Conduct an operational analysis. Analyze inventory, physical security, data access, and all other critical services. Review the currency of passwords for user accounts, how complete data storage is, and what the status of backup procedures is.
- Perform risk and business-impact analysis. Here’s where you look at critical IT assets, and for each one “determine how long it can go down without the impact being felt by the organization; its bottom line suffering because of that loss,” explains Armstrong. “Whichever of those timeframes is the shortest, that creates your prioritized list.”
- Document all systems and applications. This entails pulling up your network topology, figuring out IP and MAC addresses, taking inventory of all printers and fax machines, creating a vendor contact list, making note of all phone lines, and so on. This stage, says Armstrong, is where most organizations really “start to falter, because this work is time-consuming.” It’s also critical that IT managers get senior administrators to buy into this phase of their work. If the campus is geographically dispersed, investment in an automated inventory tool might be worthwhile to search the network for nodes in hidden corners.
- Create an emergency contact list and distribute it in digital and physical form to administrators, department heads and, of course, every member of the disaster recovery planning team. It shouldn’t be publicly available, but it should be updated regularly. Participants should have access to a copy at home and in the office.
- Hold an annual dress rehearsal for key parts of the plan—everything from pulling the plug on a server (after its data has been replicated elsewhere) to designating a key building “destroyed,” and then working through the steps of recovery.
“Unless something’s likely to happen, senior management is not always willing to invest money in the things that don’t bring that immediate return on investment,” explains Armstrong. “It’s going to be up to the IT professionals to ensure that senior management is routinely aware of what can happen if a disaster recovery plan is not in place and a disaster does in fact occur.”
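The risk and business-impact step in the list above amounts to a sort: rank each critical asset by how long it can stay down before the organization feels it, shortest first. Here is a minimal Python sketch of that idea. The asset names and hour figures are hypothetical, chosen only for illustration; they do not come from Info-Tech or from any school mentioned here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    # How long the asset can be down before the impact is felt (hypothetical values).
    max_tolerable_downtime_hours: float


def recovery_order(assets):
    """Return assets sorted so the shortest tolerable downtime comes first."""
    return sorted(assets, key=lambda a: a.max_tolerable_downtime_hours)


# Hypothetical inventory; names and hours are illustrative assumptions.
inventory = [
    Asset("ERP system", 72),
    Asset("Emergency Web site", 1),
    Asset("E-mail", 4),
    Asset("Phone system", 2),
]

for rank, asset in enumerate(recovery_order(inventory), start=1):
    print(f"{rank}. {asset.name} (tolerates {asset.max_tolerable_downtime_hours} h)")
```

In practice the downtime figures would come out of conversations with administrators and department heads, not from IT alone; the code only fixes the order once those numbers exist.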
Hard Choices
Ben Whorton is a network administrator at Birmingham-Southern College
in Alabama. He and a colleague make up the Network Services Group, which
is in charge of e-mail, server administration, and network security for the
school. (Another IT team handles administrative computing; a third team takes
care of client services.)
While the school as a whole has a disaster plan for the physical campus, and
administrative computing has a disaster recovery plan in place for its pieces
of IT, until recently, the network services group didn’t have its own
plan—that is, aside from a disk backup spanning a couple of weeks—and
that backup resided on a single server. “If something were to happen to
that box, then we had no backup,” says Whorton.
Bad, worse, worst. When he was appointed, he relates, an early
goal was to put a plan in place that included redundant backups: to replace
all single-drive servers with redundant servers equipped with RAID controllers.
Still, by 2004 when Hurricane Ivan was rolling in, the plan hadn’t yet
been implemented. At that time, knowing the storm would soon hit, Whorton saw
only three options. Best case, he remembers, “was leaving our systems
up in the hope that we didn’t lose power.” The next was to “leave
our systems on and if power went out, pray that there was no dirty power that
could fry our systems.” Neither approach was palatable to him. “We
stood to lose a half-million dollars’ worth of equipment, plus all
the data, which you can’t really put a price on.”
A third option, the worst-case scenario, was to go into the server
rooms and proactively power-down all systems. Then, once power was restored,
the IT people would bring them back online. Yet, administrators didn’t
want to shut down systems, since that was how they expected to be able to contact
parents and let them know what was happening in Birmingham with the students.
“We had a quandary,” Whorton admits. In the end, the group shut
down operations for about 12 hours, until the worst of the storm had passed.
Ironically, the college ended up losing power for only about 30 minutes.
Clarifying priorities; weighing resources. That experience
led the team to come up with its disaster scenario. “Since the college
lost power for only a short amount of time, we realized we needed a backup solution
to keep our systems up so parents could visit our Web site to find out what
the college was doing to protect their students, provide a means to communicate
with their students here at college via e-mail, and keep our phone systems up
and running,” says Whorton. “We wanted to make sure there were as
many opportunities as possible for parents and students to communicate.”
As a result, the team has made two primary changes: First, it has decentralized
many of the servers so that, for example, if one building suffers water damage
and floods a computer room, most of that data will be replicated elsewhere on
campus. Whorton admits that even that plan is not failsafe. “If we have
the unfortunate luck of having two buildings that house the equipment suffer
the same flood damage, we’re in trouble. But that’s the chance we
take, given our limited resources.”
Second, the team researched and proposed three backup power solutions. The
least expensive (about $12,000) required purchasing 26 uninterruptible power
supplies (UPSs) for key servers; this solution would buy 30 minutes of uptime
in a power outage, and would last two to three years. The second solution (around
$25,000) was a rack-mounted UPS. A third solution (most expensive, at about
$35,000) involved a gas-powered generator plus UPSs. Challenges included the
need for the space to add a concrete pad to mount the external generator, and
over the long term, increased maintenance capability. As it turned out, the administration
went for the mid-priced solution, and Whorton’s group purchased APC equipment
that would keep operations running in their current state for five to six hours.
Reality test. As it happened, the school had a chance to try
out the new solution when Hurricane Dennis swept through in July 2005.
Recalls Whorton, “The storm was supposed to hit on a Sunday and we started preparations
on that Thursday. By end of day Friday, we were ready to go. By Saturday morning,
Dennis was hitting the coast, and I said, ‘OK, here’s our plan:
Everyone has contact numbers. Let’s just stay in contact.’ If we
needed to make an adjustment, our security team on campus volunteered to pick
us up and take us to different buildings.” In the end, Whorton says, “We
didn’t have to. I stayed up until midnight on Sunday, and once the eye
[of the hurricane] passed, I knew the worst was over. We had suffered no damage
or outages.”
Rather than trying to keep everything up, Whorton’s group focused on
keeping communication links up: the phone, Web, and e-mail. Since no one on
campus was working through the storm, the school simply shut down its ERP system
and storage facilities.
6 Elements Your Disaster Recovery Plan Should Include
A QUICK GOOGLE SEARCH WILL REVEAL publicly available plans for various universities, but Info-Tech recommends these sections:
- Plan for regular backup. Although tape is the typical backup method, some schools have begun using disk-to-disk backup for time-critical data.
- Plan for archiving. This involves creating a system for making copies of your current data, and storing those copies elsewhere. Since disasters frequently have a geographic aspect to them, archiving may be your best chance of recovery from a disaster if the on-site copy isn’t available.
- Thorough risk analysis. This encompasses prioritizing business needs and figuring out what the major risks are for your school. A university with on-campus dorms away from the coast but next to a major airport will derive a different set of disaster scenarios than those of a commuter college that resides on or near an earthquake fault line.
Current Info-Tech information also recommends that you examine common points between your primary site and the archiving or recovery site: “You must make sure that the two sites are geographically dispersed so they won’t be prone to the same risk of natural calamities, utility infrastructure mishap, or civil unrest.”
- Business impact analysis. Here’s where you explicitly determine the bottom-line impact of functions and services, to prioritize the order in which they must be brought back up after an outage.
- Crisis management. This brings in outside agencies (police, fire, and emergency) to help establish safety procedures.
- Recovery planning. When a disaster strikes, this lays out what role individuals play, how the plan should roll out, and what technologies and services should be called on in service of the plan.
Looking at Unique Concerns
Lori Franz, a management professor in the College of Business at the University
of Missouri-Columbia, has explored the topic of disaster recovery planning
for a presentation she delivered at Educause 2003 (www.educause.edu).
According to Franz, one of the DRP challenges unique to a university setting
is the fact that “everyone likes to have control over his or her environment.”
Centralize, centralize. At MU-Columbia, she says, “We’ve
done a lot of work to try to get all of our decentralized file servers into
the facility where there is backup. We try to get everyone to realize that it
isn’t safe just to have these departmentally owned file servers where
everything is contained in a single place, and they’re not taking care
to back up, and tapes aren’t being stored in a different location.”
Changing such a mindset, however, means making participants aware of the value
of a centralized computer effort—and “creating trust in the computing
organization.”
Public or private? Importantly, her school’s disaster
recovery plan must take into account a unique situation, says Franz: “We
have the largest university-run nuclear reactor in the country on this campus.
When we’re thinking about things, we think very broadly about what could
happen.” And, understandably, the institution has struggled with how public
its disaster recovery plans—or any related campus details—should
be. Franz recounts how at one time, the university had placed online maps of
every building on campus, complete with drill-down layers. “You could
see the plans and schematics for every building. You could access them. Now,
we don’t have that,” she says.
Yet, a key element of the plan that has been made public is the existence of
an emergency Web site. “We’re sitting here with 24,000 students,”
she says. “How do you communicate to the outside if something happens
on campus? The Web site is available to everyone—it’s where people
will go for the news if something happens.”
More Mundane Concerns
At the other end of the spectrum, opposite problems of biblical proportion,
are the daily worries of finding the time to keep servers running optimally
and clean of potentially devastating viruses and worms. Michael Kearns, systems
administrator at Syracuse University (NY), knew a better plan
for disaster recovery was necessary after the campus was hit by the MyDoom virus
in 2003, sending IT personnel scurrying to patch and rebuild systems. Although,
fortunately, vital data wasn’t lost, Kearns worried about damage to the
functions performed by the school’s energy servers which automate climate
control and electrical operations campuswide: He knew that if the wrong energy
server had gone down during a winter blast, vital research work could have been
lost due to extreme temperature changes.
Real-time recovery. The solution he and his team settled on
focused on maintaining real-time, disk-based backups using Symantec LiveState
Recovery software (www.symantec.com).
The advantage of going to disk is that recovery can be quick. Kearns says this
has made the IT team more willing to apply security patches; the solution allows
a “snapshot image” of the server to be recalled if a patch job doesn’t
go as planned. (Tape is used for archival purposes.) He reports that the school
had the chance to test out the system in real time when a domain controller
upgrade went awry and the server became inoperable. A quick recovery through
LiveState had the system back up and running in less than 30 minutes.
Mining a Pan-Departmental Team
Of course, disaster recovery goes beyond protecting data on servers. At the
Rhode Island School of Design, Judy Tanzi, one of the school’s
two telecommunications experts, participates in the DRP committee,
which was formed two years ago (in the wake of 9/11) and also includes representatives
from Facilities Management, Housing, Health Services, Residential Life, Environmental
Health and Safety, and Fire and Public Safety. All 12 members of the group have
gone through Community Emergency Response Team (CERT) training, which educates
individuals about disaster preparedness for hazards, and teaches basic disaster
response skills such as fire safety, light search and rescue, team organization,
and disaster medical operations.
Drilling down to essentials. As the committee members developed
more of a “world view” about disaster preparedness and recovery,
they had to sort through what RISD’s plan should encompass.
“That was very difficult,” Tanzi recalls, “because there
were so many thoughts that each individual had from his or her side of the house.
So, we just put all ideas out there, and kept breaking down the list as to what
were the 10 most important things we needed to do.”
Student safety came first; there was no dispute about it.
“We started from that point,” Tanzi says, pointing to the questions
the committee members faced. “What happens in a disaster? How do we get
to the students? What do we do?” After many meetings, she says, a Top
10 list was formed. Then the group had to consider what types of disasters were
likely to happen. One challenge for the school is that the campus is spread
throughout the city of Providence, which meant that committee members also had
to come to agreement on where the main point of contact would be, in the event
of a disaster. Tanzi reports that the committee has met monthly and often includes
outside representatives (such as those from the police and fire departments),
in order to sort through what role the school would play in disaster scenarios.
Informing parents. After student safety, says Tanzi, RISD’s
next step is to put together a plan that can be distributed to parents, so they
will know what steps would take place if an emergency were to occur on campus,
as well as which means to use in order to communicate with the school. From
there, the committee will start pulling in other campus participants to act
as emergency leaders.
Re-examining telecom opportunities. The RISD taskforce has
also recognized that IT and telecom could work together in multiple ways. Now,
to enable unobstructed communications, says Tanzi, 10 campus employees have
cell phones with priority status. That means that in a disaster, when everybody
hits their cell phones, the calls made by those 10 participants will go out
on the same frequency as police and fire calls. Tanzi has also set up certain
phones on campus (running “plain old” copper wire) so that in the
event of an emergency that disrupts regular phone service, the phones can host
the redirected 800 number phone service given out to parents in the school’s
disaster plan.
Tanzi isn’t new to considering the vagaries of disasters. In 2000, she
pushed her school to implement a 911 system that would ensure that calls made
anywhere on campus could be identified by location. The problem with many campus
phone systems is that they use private branch exchange, which feeds all calls
through the same billing number. When an emergency is reported to 911, there’s
no way to identify location from that phone number. Says Tanzi: “[The
911 people] would call back and say, ‘We had a 911 call.’ But the
operator would have no clue where the call came from. So I said, ‘We have
students. It’s too important to know where that call comes from; sometimes
seconds are so important in a life.’”
Tanzi chose the Telident 911 enterprise solution from Teltronics (www.teltronics.com),
a turnkey solution that places a PC in the school’s switch room, running
a database application that identifies a specific phone with its location. When
a 911 call occurs on the PBX, the database feeds location information to the
emergency dispatch centers of local police and fire departments, as well as
to emergency personnel on campus. The system can also perform e-mail notification,
and send text to cell phones. Tanzi says she feeds the initial records into
the database herself, which is then transmitted to Verizon (www.verizon.com),
which populates the data in the Providence 911 system. (Many states have already
passed regulations stipulating that all new PBX installations implement a comparable
location identification system, but Rhode Island isn’t one of them. The
Federal Communications Commission has also considered national regulation in
this area off and on for many years, but nothing has been finalized.)
Finally—data. RISD’s committee is also working
with other colleges to locate their data off-site. It would be an informal arrangement,
says Tanzi, and the plan is only in the “talking stages” at this
point. “We’re on one side of the state, and we would probably work
with a college on the other side of the river,” she explains. “If
the city experienced a disaster, but the other side of the state was still up
and running, we’d be able to work from that facility.”
Data Decisions
For some, data is concern number one in any DRP. Keenan Baker, a storage specialist
with technology products and services provider CDW-G (www.cdwg.com),
considers the focus on data protection the starting point of any disaster recovery
plan. At the most basic level, he says, clients decide to look for a good backup
scheme (typically, tape) where they can easily back up data on a daily basis.
More sophisticated planning calls for failover operations, where an institution
can “flip” or “switch over” operations from one campus
to another, in the event that the systems in the first one are hindered.
Recovery time/recovery point. Developing a plan at an institution
of higher education requires mapping out two objectives, he says: recovery time
(RTO) and recovery point (RPO).
“The recovery time objective is the time
needed to recover from a disaster: how long you can afford to be without your
systems,” says Baker. The recovery point objective “describes the
age of the data you want the ability to restore, in the event of a disaster.
For example, if your RPO is six hours, you want to be able to restore systems
back to the state they were in as of no longer than six hours ago. To achieve
this, you need to be making backups or other data copies at least every six
hours. Any data created or modified inside your recovery point objective will
be either lost or must be recreated during a recovery. If your RPO is that no
data is lost, synchronous remote copy solutions are your only choice.”
If a school, for example, is backing up only to tape, he says, administrators
might expect to have their data up and running within a day. If the facility
is running replication, however, recovery can happen in minutes.
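Baker's recovery point example reduces to simple arithmetic: copies must be made at least as often as the RPO allows, and whatever was written since the last good copy is what stands to be lost. A minimal Python sketch follows. The function names and the sample timestamps are assumptions made for illustration, not part of any vendor's tooling.

```python
from datetime import datetime, timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True when copies are made at least as often as the RPO requires."""
    return backup_interval <= rpo


def data_at_risk(last_copy: datetime, failure: datetime) -> timedelta:
    """Span of work since the last good copy; this is lost or must be recreated."""
    return failure - last_copy


rpo = timedelta(hours=6)  # the six-hour RPO from Baker's example

print(meets_rpo(timedelta(hours=24), rpo))  # nightly tape only -> False
print(meets_rpo(timedelta(hours=4), rpo))   # copies every four hours -> True

# Hypothetical timestamps: last copy at 2:00 a.m., failure at 7:30 a.m.
last_copy = datetime(2005, 9, 1, 2, 0)
failure = datetime(2005, 9, 1, 7, 30)
print(data_at_risk(last_copy, failure))     # -> 5:30:00
```

The recovery time objective is a separate question: it measures how long the restore itself may take, which is why tape-only shops might plan on about a day while replicated sites can plan on minutes.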
Plans for the ‘plans’—and practice. Still,
never underestimate the ability of a disaster to thwart the best-laid DRPs.
Baker recounts that following a talk he gave at a conference, one audience member
confided that his school’s disaster recovery plan was kept online. When
disaster struck, wiping out all power sources, nobody could gain access to the
plan. Now, physical plans are kept in the homes of all vital personnel.
Another mistake common among higher ed institutions: not rehearsing sufficiently.
The plan has to be second nature for the participants, says Baker. In addition,
run-throughs and simulations give the IT people a chance to check out how well
recovery will work. If the tape media has flaws, for instance, they learn that
up front—instead of when it’s too late.
The Latest in Backup
KEENAN BAKER at CDW-G (www.cdwg) says that when it comes to backing up vital data, a couple of trends in backup technologies can help schools keep costs down and performance up.
Tape has come a long way since the days when it could hold only between 20 and 40GB of data. Linear Tape-Open (LTO) is a computer-storage magnetic-tape format first introduced in 1999 by companies such as HP (www.hp.com) and IBM (www.ibm.com). LTO-3, which came out in 2004, can hold 800GB of compressed data and transfer it at up to 160 MBps. The tapes cost about $120 each, which makes this mode one of the most affordable means to retain data. One aspect of tape backup hasn’t changed, however: The predicted archival life is still 30 years. A tape drive to accommodate LTO-3 tapes starts at about $5,000.
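To see why tape counts as affordable, it helps to work the numbers quoted above. The short Python sketch below uses the article's figures for tape price, compressed capacity, and drive price; the 5,000GB data set is a hypothetical example, and real-world compression ratios will vary with the data.

```python
import math

TAPE_PRICE_USD = 120.0     # per tape, figure quoted in the article
TAPE_CAPACITY_GB = 800.0   # compressed capacity, figure quoted in the article
DRIVE_PRICE_USD = 5000.0   # starting drive price quoted in the article


def cost_per_gb() -> float:
    """Media cost per gigabyte, assuming the tape is filled with compressed data."""
    return TAPE_PRICE_USD / TAPE_CAPACITY_GB


def tapes_needed(data_gb: float) -> int:
    """Whole tapes required to hold one full copy of the data."""
    return math.ceil(data_gb / TAPE_CAPACITY_GB)


def first_copy_cost(data_gb: float) -> float:
    """One drive plus enough tapes for a single full copy."""
    return DRIVE_PRICE_USD + tapes_needed(data_gb) * TAPE_PRICE_USD


data_gb = 5000.0  # hypothetical campus data set
print(f"${cost_per_gb():.2f} per GB of media")
print(f"{tapes_needed(data_gb)} tapes, ${first_copy_cost(data_gb):,.0f} including one drive")
```

A rotation scheme with several generations of tapes multiplies the media line but not the drive, so the per-gigabyte figure is the one that matters as the archive grows.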
Replication. Some schools have moved to a replication model of backup. Replication itself has been around for a number of years; but technology improvements have taken place in the last three to five years, says Baker, making replication now a more practical addition to backup operations. If bandwidth is readily available between campuses, replication allows changes to data happening at one site to be copied to disks at another site either synchronously or asynchronously. In the former scenario, the copying of data takes place instantly at the remote site; in the latter, the system waits for an opening in bandwidth usage. If the “pipe” is being flooded, the replication waits—typically a few seconds—until traffic has subsided. Data of a certain age can then be shuttled to tape and stored off-site for access in the event of an emergency.
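The difference between the two replication modes can be shown with a toy model. In the Python sketch below, a synchronous volume updates the remote copy before a write returns, while an asynchronous volume queues changes until the link has room. This is an in-memory illustration only; the class and method names are hypothetical and do not represent any vendor's product.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a primary site that copies its writes to a remote site."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = {}
        self.remote = {}
        self.pending = deque()  # changes not yet shipped (asynchronous mode only)

    def write(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            # Remote copy is updated before the write is considered complete.
            self.remote[key] = value
        else:
            # Change waits in a queue until bandwidth is available.
            self.pending.append((key, value))

    def drain(self):
        """Ship queued changes in order; call when the link has spare bandwidth."""
        while self.pending:
            key, value = self.pending.popleft()
            self.remote[key] = value

    def unreplicated(self) -> int:
        """Number of changes that would be lost if the primary failed right now."""
        return len(self.pending)


sync_vol = ReplicatedVolume(synchronous=True)
sync_vol.write("grades.db", "v2")
print(sync_vol.remote == sync_vol.primary)   # True: nothing waits

async_vol = ReplicatedVolume(synchronous=False)
async_vol.write("grades.db", "v2")
print(async_vol.unreplicated())              # 1 change still queued
async_vol.drain()
print(async_vol.remote == async_vol.primary) # True once the queue empties
```

The queue length in asynchronous mode is, in effect, the recovery point exposure: anything still pending when the primary site fails has not reached the remote disks.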
There Are Backups, and There Are Backups
Ellen Rome, a VP for universal hardware solutions provider STORServer (www.storserver.com),
says that when the disaster recovery doesn’t work (preferably during those
practice runs) it’s always nice to have “one throat to choke.”
The device her company sells is an integrated solution incorporating server,
disks, tape, and software (along with integration services). It’s typically
rackmounted, though the company can ship the goods as a single appliance, too.
Installation takes two to three days. Once it’s up and running, says Rome,
the appliance backs up everything on the network. From there, it performs incremental
backups. Sound like your own standard operating routine?
Put the plan on the tape. What stands out is the software.
It creates, says Rome, a “daily disaster recovery plan.” The operator
can include notes regarding availability of individuals and the shifting order
of priority in bringing downed servers back into operation. (That prioritization,
she pointed out, requires meetings between the IT people and the decision-makers.)
This information gets written to the tape. Should a disaster occur, the off-site
tapes supply vital responder instructions: who should do what, in what order,
and where specific pieces of equipment are located (in order to gain access
to them). That same documentation can also be delivered daily from the system,
in other forms such as e-mail going out to all participants and their various
e-mail addresses.
STORServer founder Bill Smoldt likes to proclaim, “What I sell is boring
restores. Usually a restore is exciting, and that’s not good. A restore
should be boring.”
Where’s the tape? The question he frequently faces in
helping clients set up their backup operations is what to do with the backup
tapes themselves—and how to retrieve them in the event of an outage. He
says sometimes the solutions get creative. One client (not in education) ships
them a couple of states away. Should the tapes be needed, a plane will be sent
to retrieve them. Another client, with a number of remote offices sans broadband
access, has chosen a plan in which servers will be rebuilt and delivered by
car to the remote sites. In most cases, he says, organizations choose the “disaster
recovery zone” that’s appropriate to the types of disasters they
expect to experience. That may mean the backup goes a building away, across
town, or across the state.
Have a Plan and Work It
LSU’s Voss points out
that many regions face situations far more dire than the loss of a single data
center. “In Louisiana, we’ve lost a major US city. While that doesn’t
diminish the impact of the crisis on a given university, it does mean it’s
competing for resources with a much broader range of needs. We all need to think
through this more carefully. I, for one, am going to move smartly, making disaster
recovery a key component of my new organizational structure and mission. Because
next time, it could be LSU.”