Business Continuity: Eureka! -- Campus Technology

Disaster Recovery and Business Continuity

Eureka!

It's a tough economy, but you still have to formulate your disaster recovery plan-- you just have to do it more creatively now.

By Matt Villano
04/01/09

Academic technologists at Lynn University (FL) learned the hard way just how important a disaster recovery plan can be. Back in 2005, when Hurricane Wilma walloped the region with driving rain and 120-mile-per-hour winds, the storm debilitated Lynn's IT department, causing damage and flooding that disrupted the campus network for nearly two weeks.

While the campus never lost its internet connection, it did lose its T1 connection, so the school's VoIP telephones went down. For days on end, CIO Chris Boniforti and his colleagues scrambled to get T1 lines onto emergency backups, and AT&T agreed to put a portable generator in place to power the intermediate exchange that connected the school's main campus to the telecommunication provider's hub.

Ultimately, the problem was resolved, and systems at Lynn resumed ordinary operations. The experience, however, taught the CIO that some sort of formal disaster recovery plan was necessary to avoid similar dramas down the road. "To say Wilma was eye-opening would be an understatement," concedes Boniforti, whose institution sits smack in the middle of Florida's infamous Hurricane Alley. "We had other spending priorities, but we also knew that tackling business continuity was something we had to do in order to guarantee survival the next time a crisis struck."

Lynn administrators weren't the only higher ed officials to experience such a recent epiphany. In past months, despite slow economic times, a growing number of academic technologists have redoubled their focus on business continuity. In particular, schools such as Rice University (TX), Saint Michael's College (VT), and Indiana University-Purdue University Indianapolis have invested in disaster recovery strategies and new processes to make sure their own systems don't succumb. And though these solutions aren't as "sexy" as social networking tools or constituent relationship management (CRM) systems, they are vital investments for the long-term survival of a campus network, and should be carefully considered even if your school isn't ostensibly in harm's way.

Building a Plan

Boniforti notes that in crisis situations, your IT staffers are the key to engineering success, so ensuring their safety is critically important. With this in mind, the Lynn disaster recovery plan is all about people. Boniforti and his colleagues divvied up their IT staff of 30 into three teams: pre-incident, mid-incident, and post-incident. Each team's tasks were then outlined in the 129-page document IT Hurricane Plan 2008.

The pre-team's responsibilities make up the first third of the document, and include tasks such as using traditional tape drives to back up:

All SQL databases, web applications, and Microsoft Exchange
The PBX, phone mail, and call manager
Firewall, switches, and routers

Once these tapes are made, pre-incident responders are charged with checking them to make sure they work, and calling representatives from information and storage protection provider Iron Mountain to come and pick up the tapes to store them in a secure remote location. Boniforti says pre-incident responders also are responsible for breaking out premade "hurricane kits" that contain tools such as wet/dry vacuums and radios, and must hit the local Publix supermarket to stock up on food and supplies for colleagues who will work during and after the storm.
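
The article does not say how Lynn's team checks its tapes. One common approach, sketched here under the assumption that both the source data and the restored backup can be read as byte streams, is to compare checksums. The function names `digest_of` and `backup_matches` are hypothetical, as is the sample data.

```python
import hashlib
import io


def digest_of(stream, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    sha = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        sha.update(chunk)
    return sha.hexdigest()


def backup_matches(source_stream, backup_stream):
    """True when the backup copy is byte-for-byte identical to the source."""
    return digest_of(source_stream) == digest_of(backup_stream)


if __name__ == "__main__":
    # In-memory stand-ins for a source file and its restored backup copies.
    original = io.BytesIO(b"exchange mailbox export")
    good_copy = io.BytesIO(b"exchange mailbox export")
    bad_copy = io.BytesIO(b"exchange mailbox exp0rt")
    print(backup_matches(original, good_copy))
    original.seek(0)  # rewind before reusing the stream
    print(backup_matches(original, bad_copy))
```

Reading in chunks keeps memory use flat even for very large backup files, which is why the sketch avoids loading a whole stream at once.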

"So many continuity planners worry only about the technology," says the CIO. "In our case, we've got some of our people riding out the incident here with our equipment, and we need to make sure that we're giving them all of the supplies and support we can." Once these steps have been fulfilled, the second team takes over. While this team is on hand to cover physical servers with plastic tarps and perform manual shutdowns if necessary, the group's main role during the incident is communication-- both with other school officials and with constituents (such as students, faculty members, and nontechnical staff).

Specifically, these employees are charged with updating the school's website with information about campus closures, and utilizing e-mail and an opt-in text-messaging service to keep constituents informed. If a power outage or related problem disrupts the primary connection to the school's data center, this team also has the authority to work with AT&T to redirect data through a secondary route (a technology the telecom provider calls SmartRing).

Finally, after an incident, Lynn's post-incident squad comes in to get everything up and running again. The plan lays out a list of restoration priorities: mission-critical servers and the database infrastructure come first, library systems are second, and network connections in residence halls are last. Team members are then tasked with perhaps the toughest challenge: assessing damage across campus and lining up repairs.
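
The restoration priorities described above amount to a simple priority sort. The sketch below is illustrative only: the tier numbers follow the order the plan states, but the system labels and the `restoration_order` helper are hypothetical and are not taken from Lynn's document.

```python
# Tiers follow the order described in the plan; lower numbers are restored first.
# The labels themselves are hypothetical examples.
RESTORE_TIERS = {
    "mission-critical servers": 1,
    "database infrastructure": 1,
    "library systems": 2,
    "residence hall network": 3,
}


def restoration_order(systems_down):
    """Return the affected systems sorted by restoration tier, then by name."""
    unknown = [s for s in systems_down if s not in RESTORE_TIERS]
    if unknown:
        raise ValueError(f"no restoration tier defined for: {unknown}")
    return sorted(systems_down, key=lambda s: (RESTORE_TIERS[s], s))


if __name__ == "__main__":
    down = ["residence hall network", "library systems", "database infrastructure"]
    for system in restoration_order(down):
        print(system)
```

Raising an error for a system with no assigned tier reflects Boniforti's point about specificity: anything the plan does not name is likely to be left undone.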

Determining ROI

Admittedly, until your school has to recover from a disaster, it's challenging to determine the return on investment (ROI) in a disaster recovery plan. Remember the 1988 Cinderella song, "Don't Know What You Got (Till It's Gone)"? For academic CIOs, this song could be used to describe the value of data and a reliable network-- a reality that necessitates some sort of insurance policy to keep things working smoothly. Still, how much is too much to spend?

Consider this: Chris Boniforti, CIO at Lynn University (FL), estimates he invested about $100,000 in hardware necessary to actualize his disaster recovery plan, and he spends another $50,000 annually for a service from AT&T that provides a backup connection to the internet should the school's primary connection fail. The CIO also notes that he spends an additional $30 per month for a service from AT&T that automatically reroutes calls to the campus's main number when the school's VoIP goes down. "How much is it worth to you, to put students and their parents at ease?" Boniforti asks. "It's hard to answer that question, but that's really the question you're asking when you start thinking about ROI with this stuff."

Other technologists have found easier ways to quantify their expenditures on disaster recovery-oriented tools. Bill Anderson, CIO at Saint Michael's College (VT), declined to share how much his school has budgeted for its new off-site data backup service, but notes that the cost will represent the extent of disaster recovery expenditures across the board. "You can come up with a basic continuity strategy pretty cheaply," he says of strategizing smart business continuity. But, he points out, "It's how well you leverage that strategy that matters most."
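
Boniforti's figures above can be added up directly. The short sketch below uses only the three amounts he cites. The five-year horizon in the example is an assumption chosen for illustration, and the totals ignore staff time, inflation, and hardware replacement.

```python
HARDWARE_ONE_TIME = 100_000     # approximate hardware investment cited by Boniforti
BACKUP_LINK_PER_YEAR = 50_000   # AT&T backup internet connection, per year
CALL_REROUTE_PER_MONTH = 30     # AT&T call rerouting service, per month


def annual_recurring_cost():
    """Yearly service cost: the backup link plus twelve months of call rerouting."""
    return BACKUP_LINK_PER_YEAR + 12 * CALL_REROUTE_PER_MONTH


def total_cost(years):
    """One-time hardware plus recurring services over a whole number of years."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return HARDWARE_ONE_TIME + years * annual_recurring_cost()


if __name__ == "__main__":
    print(annual_recurring_cost())  # 50360
    print(total_cost(5))            # 351800 (five years is an assumed horizon)
```

On these numbers the recurring services overtake the hardware outlay within two years, so the annual line item, not the initial purchase, is the larger share of the spending.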

"The only way to deal with disaster is to be tactical," says Boniforti. "Without a specific [tactical] plan, inevitably something is left undone."

Emphasis on Communication

While Lynn University has broken down its IT department into three teams, technologists at Rice University outside of Houston have pooled staffers (and representatives from just about every other department at the university) into one team, and dubbed the group the Crisis Management Team, or CMT.

Kamran Khan, the school's vice provost for information technology, says the team meets monthly to hone disaster recovery protocols and, occasionally, to practice. Khan says that every time the group gets together, it reviews the three most important tenets of Rice's plan: risk analysis, data recovery and, of course, communication.

The risk analysis part is important but relatively straightforward. Working off research and prepublished templates from the National Institute of Standards and Technology, the Federal Emergency Management Agency, and Gartner, the CMT regularly evaluates all of its mission-critical systems, and periodically re-prioritizes which systems team members should focus on fixing first, in the event of a catastrophe.
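
A prioritization of this kind can be sketched in a few lines. The example below is a hypothetical illustration, not Rice's actual method: the `SystemRisk` type, the system names, and the hour values are all invented. It ranks systems by how little downtime the campus can tolerate and flags any system whose estimated recovery takes longer than that tolerance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemRisk:
    name: str
    max_downtime_hours: float  # longest outage the campus can tolerate
    recovery_hours: float      # estimated time needed to restore the system

    @property
    def shortfall_hours(self):
        """Hours by which recovery overruns the tolerable outage (0 if none)."""
        return max(0.0, self.recovery_hours - self.max_downtime_hours)


def prioritize(systems):
    """Least tolerance for downtime first; a larger shortfall breaks ties."""
    return sorted(systems, key=lambda s: (s.max_downtime_hours, -s.shortfall_hours))


if __name__ == "__main__":
    # Hypothetical inventory; the hour values are made up for illustration.
    inventory = [
        SystemRisk("e-mail", max_downtime_hours=4, recovery_hours=6),
        SystemRisk("student records", max_downtime_hours=1, recovery_hours=3),
        SystemRisk("course website", max_downtime_hours=24, recovery_hours=8),
    ]
    for system in prioritize(inventory):
        print(system.name, system.shortfall_hours)
```

A nonzero shortfall marks a system whose recovery procedure needs attention before the next incident, since restoring it would take longer than the outage the campus can absorb.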

"We sit down, evaluate everything on our network, and ask ourselves questions such as, 'How long can the system be down?' 'How do we recover it?' and 'How long does it take us to recover it?'" Khan explains. "Through this process we figure out what's most relevant."

Next, the CMT focuses on data recovery. Back in 2006, Rice and another school outside of Houston agreed to mirror data on each other's campuses, so each university automatically backs up critical information off-site. These backups form secondary data centers in the event that the schools' primary data warehouses fail. At Rice, Khan and his colleagues have lined up a third backup as well-- an off-site location even farther away. Every time CMT members meet, they test these systems to make sure they're all working properly.

The third and final piece of Rice's disaster recovery plan is communication, and Khan says this is by far the most important component. The heart of Rice's communication strategy is an emergency notification system the school implemented after Hurricane Rita in 2005. The system, which revolves around technology from MIR3, automatically sends e-mails to all Rice users in the event of an emergency. It also delivers emergency messages via cell phone; users can opt in to receive recorded audio messages from the emergency notification number. Although the school does not require users to sign up for the cell phone alerts, users appear to be interested in doing so: Khan reports that in the two years since the system went live, more than 94 percent of all potential users have signed up.

"This level of participation makes clear exactly how important communicating during a time of crisis really is," he says, noting that while CMT members try not to bombard users with test messages all that often, they will do so from time to time. "This indicates a commitment on the part of our users to being safe, which, quite frankly, makes our jobs easier."

Backing Up

For other institutions, disaster recovery focuses squarely on basic data backup. Because tape backups can be slow and unreliable, two small four-year institutions have turned to different backup solutions: drive backup and off-site backup.

At Indiana University-Purdue University Indianapolis, where each department is in charge of its own disaster recovery strategy, technologists in certain departments have turned to various types of drive backup from Paragon Software Group. This technology creates an exact disk image of every Windows server tabbed for backup, including the operating system, databases, and applications.

According to David Hoffman, IT manager for the university's School of Physical Education and Tourism Management, the drive backup outperforms tape backup because tapes "have a tendency to fail and wear out." He adds that, currently, his department uses the tool to back up files off its file server to a separate storage area network (SAN). "The fact that it's automated makes it really easy-- every time we change a file, a new version of the file is automatically backed up," he says. "Knowing the data are being saved in duplicate puts my mind at ease that we're covered if something goes wrong."

Still, Hoffman admits that if there's a disaster in Indianapolis, both his primary data and the drive backup could be destroyed. For this same reason, technologists at Saint Michael's College have geared up to employ a different type of backup technique: saving data to an off-site location. As of late February, the Saint Michael's strategy was still in development. But, as CIO Bill Anderson explains it, school officials plan to work with a (yet to be determined) vendor and third-party host to set up a business continuity/data recovery (BCDR) site 10 miles from the school's main campus in Colchester, just outside of Burlington. Once this site is established, Anderson says Saint Michael's will equip it with mirrored storage and applications, as well as servers that will test themselves automatically and back up all of the virtualized servers the college currently owns.

"We're looking to back up literally all of our most critical data and applications," he explains, noting that the college also will invest in a dedicated fiber connection to the BCDR site. "This way, if something happens on campus-- plane crash, fire, or anything-- we'll be ready." Anderson says that at this point, the biggest questions he and his colleagues have relate to recovery objectives. He notes that while prioritizing the most important data packets for virtually instantaneous recovery would be ideal, both the process of prioritizing data packets and the option of quick recovery cost big bucks-- a budget line item that might be hard to justify (see Determining ROI). Instead, the college likely will opt for a backup setup that treats all data the same way, and 30-minute recovery times-- two measures that provide reassurance but won't break the bank.

"The shorter the point and time objectives are, the more expensive the solution is going to be," Anderson explains, citing the 30-minute timeframe as a good compromise. "While we don't need things like ERP and VoIP back online immediately, longer than 30 minutes and we'll start to hear about it."
