Disaster Recovery

In Case of Disaster, Plan, Test and Plan Again

Brandeis University's deputy CIO and CISO explains how drafting and testing a disaster recovery plan can help clarify crisis decision-making and foster organizational cohesion.

When it comes to preparing for an IT crisis, it is not enough to merely have a plan. Disaster recovery plans (DRPs) can age quickly and need to be routinely tested – a process that involves not only an institution's IT organization, but senior executives throughout the institution. CIOs may find it challenging to engage leadership about the cost of redundant systems or the benefits of cloud storage, but involving the administration in planning can help later on, according to Michael Corn, deputy CIO and CISO at Brandeis University (MA). Campus Technology asked Corn about the state of disaster recovery planning in higher ed and some best practices for creating and testing a DRP.  

Campus Technology: How sophisticated are most institutions about disaster recovery planning?

Corn: What you'll find is that preparedness covers the entire spectrum from, "Oh my god, what're we going to do about disaster recovery?" to, "We've got this down to a fine art." Educause [recently] sponsored a webinar where we talked about disaster recovery at length, and we polled the couple of hundred [participating] institutions about where they were [on that spectrum]. We found about 60 percent of them felt at best they sort of muddled through a true disaster. There were only about 10 or 15 schools that were actually confident that they were prepared. But I think it's safe to say most places are becoming aware of the criticality and the need to mature their DRP exercises.

As more and more schools are moving some of their core services to the cloud, they gain disaster recovery resilience in that process. For example, if tomorrow we had some sort of disaster at Brandeis that totally eliminated the campus network, most people would still be able to log in to their Gmail accounts because you can do so without using any other Brandeis services. Having said that, not a lot of schools have moved their core administrative functions off site. Whereas you do see services like Workday becoming more common, relatively few schools have done this. So we may be able to e-mail one another if our network [stops working], but it's not clear we have a good way to cut paychecks or any other kind of administrative functions.

CT: What are some key considerations for drafting a plan? Who should be involved in the planning process?

Corn: The key thing is to recognize that a DRP needs to be driven out of the business part of the organization. There are several reasons for that, one of which is that if you had some sort of disaster that eliminated or disrupted many of your services on campus, you would have to make decisions as to which ones to bring back first. Some of those decisions are driven by infrastructure: If the network is down, you cannot bring back any service, so obviously the network has to be brought back. When it comes to making that next-level service decision about what to bring back – payroll, timekeeping, learning management system, network file shares – those have to be driven by the business. In other words, it is not an IT decision to bring back your ERP versus your LMS; it is a business decision that involves the most senior executives in the organization.

It is critical that when you are reviewing a DRP, your president, chancellor, provost agree and sign off on the decisions. What you don't want to have happen is that in the middle of a disaster when you are trying to restore things according to your plan, your provost or president is second-guessing your decisions. In addition, disaster recovery planning can be expensive. If you are talking about making services redundant or making services more resilient, that takes money and resources. When you take those kinds of problems to senior leadership, they will often go, "What, it takes a week to restore that? We think that should happen faster," and you can tell them the cost.
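
As a rough illustration of the restoration ordering Corn describes (infrastructure dependencies first, business priorities second), the following Python sketch may help. It is only a sketch under stated assumptions: the service names, the dependency map and the priority numbers are hypothetical examples, and in practice the ranking would come from the senior leadership sign-off discussed above, not from IT.

```python
import heapq


def restore_order(depends_on, business_rank):
    """Order services so every dependency is restored before its dependents.

    Among services that are ready at the same time, the one with the lowest
    business_rank number goes first. Both inputs are hypothetical examples.
    """
    # Copy the dependency lists into sets so we can cross items off.
    remaining = {svc: set(deps) for svc, deps in depends_on.items()}

    # Services with no outstanding dependencies can be restored right away.
    ready = [(business_rank[svc], svc) for svc, deps in remaining.items() if not deps]
    heapq.heapify(ready)

    order = []
    while ready:
        _, svc = heapq.heappop(ready)
        order.append(svc)
        # Restoring svc may unblock services that were waiting on it.
        for other, deps in remaining.items():
            if svc in deps:
                deps.remove(svc)
                if not deps:
                    heapq.heappush(ready, (business_rank[other], other))

    if len(order) != len(remaining):
        raise ValueError("circular dependency among services")
    return order


# Hypothetical example: every service needs the network; ranks are made up.
depends_on = {
    "network": [],
    "payroll": ["network"],
    "lms": ["network"],
    "file_shares": ["network"],
}
business_rank = {"network": 0, "payroll": 1, "lms": 2, "file_shares": 3}

print(restore_order(depends_on, business_rank))
```

Swapping the made-up ranks for payroll and the LMS changes their relative order, but the network still comes first because everything else depends on it, which mirrors the split between infrastructure-driven and business-driven decisions.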

CT: What are the best ways to rehearse/test the plan? How often should that be done?

Corn: You should conduct a significant tabletop exercise testing your DRP at least once a year. Most institutions, if they do one once a year they'll declare victory because that seems to be as much as they can do. The reason is that to really test a plan effectively involves setting up a scenario that challenges the whole institution – one that requires putting all your executives in one place and having them respond to a set of problems, and having your operational staff in another place and having them respond to the operational details of the scenario – which takes a lot of planning and coordination. When we tested our DRP last August, the president couldn't be there because she was out of town and the provost had to leave in the middle for another meeting. That brought up a conversation about who's making decisions and who is in charge, and those are the conversations that you need to have before a disaster occurs.

Having said that, there is no reason that you cannot have smaller, operational-only tabletop exercises within your IT organization. I am a real believer that these micro-tabletop exercises are healthy for an IT organization.

CT: What are the top priorities for IT when a disaster occurs?

Corn: I think the really critical piece is communications. It is one thing to call the communications office and tell them what is going on, enough that they can then reach out to the press, the president, other executives. But if you have a disruption to your services, people will start calling the help desk. Does the help desk know what to say? If you call an academic unit or business unit, which each have their own points of contact for their internal and external communities, do they know what to say? One of the things you are trying to minimize is rumors that can escalate very quickly. You need to be very careful about how you communicate and you need to communicate broadly. The challenge is that many places, especially smaller shops, are not going to have a communications staff, so you need to actually decide who calls who, who puts together the message – and it can't be the same people who are trying to fix the problem. In some cases, it may mean that you need to designate a senior IT person who has good communication skills to put a message together.

CT: What are some common mistakes institutions make in their disaster recovery planning?

Corn: The number one mistake is not testing enough, and number two is not involving the organization outside of IT. As a consequence of that, there are frequently a lot of assumptions people make about how prepared you are. At my last institution, we had an eight-hour power outage. When I was listening to some of the debriefing after the incident, the facilities people said the number one question they got was, "Why hasn't my emergency generator kicked in yet?" Well, most of those people did not have emergency generators; they assumed they had them. To me that revealed that you really have to practice these things to document assumptions and you have to be really clear about who is involved and who is notified. When you drill through tests, those kinds of questions surface.

CT: What are some resources available to help institutions draft a disaster recovery plan?

Corn: There are a few great resources out there. First, Educause has a huge amount of material online. You can get DRPs, tabletop exercises and guidance there. The webinar we gave was very good – my piece was just at the end of that, but there are two or three hours before it where people are talking about their experiences – and it can be viewed and downloaded online. Another big one is that you can Google other universities' DRP tabletop exercises. Many institutions have a website where they talk about their DRPs. They are not going to give you details as to who they would call at two in the morning, but they will talk about their general planning process.

One of the great things about working in higher ed is if you stumble across someone who is doing a good job, the odds are if you call them, they will be more than happy to spend a couple of hours talking to you. When we revised our DRP over the last year, we reached out to the folks at the University of Connecticut because they're close to us. I know the person who did their DRP and not only did he share their experiences with us, but he drove to our campus and spent a morning going through our plan, acting as an outside reader. That was fantastic for us because they were a year or two ahead of us in their planning maturity and it was a great experience to learn from what they did, and to hear their insights on our plan.
