Refining Disaster Recovery in a Challenging Environment
A clever use of software-defined network virtualization took Mohave Community College from a crippling 72-hour outage to a 45-minute (and still dropping) recovery time.
Category: IT Infrastructure & Systems
Institution: Mohave Community College
Project: Disaster Recovery
Project lead: Mark Van Pelt, chief information officer
Tech lineup: CDW-G, Veeam, VMware
Project team members Brian Massey (left) and Joshua Walters (right) (Photo courtesy of Mohave Community College)
Nestled in the northwest corner of Arizona, Mohave County is the fifth-largest county in the contiguous United States. That makes for some serious challenges for the 16-person IT team at Mohave Community College, which has far-flung campuses in Kingman, Lake Havasu City, Bullhead City and Colorado City. "My closest campus is 45 minutes away and my farthest is four-and-a-half hours away, and I have to drive through two other states to get there," said Mark Van Pelt, who has been MCC's chief information officer for three years.
Van Pelt said he took the position knowing full well that the county's vast size and the college's limited resources would bring unique challenges, especially in terms of network infrastructure and disaster recovery. For instance, fiber is unavailable in Mohave County, and much of the connectivity comes from low-speed microwave connections.
"I came in as the seventh CIO in a five-year period," he added. "When I arrived, we were averaging about two outages per week in terms of major systems. I wouldn't say the infrastructure was collapsing, but we were certainly challenged. The youngest equipment on campus was about the same age as most of our students. In terms of networking equipment and most server infrastructure, I knew I had a big job ahead of me when I got here."
Mohave Community College's Lake Havasu City campus (Photo courtesy of Mohave Community College)
Then three months after his arrival in 2016, disaster struck. A systems engineer made a mistake that ended up corrupting the domain databases, and there were no good backups. "The same person who corrupted them was supposed to be backing them up," Van Pelt said. "One night he was supposed to be doing maintenance. The next morning we came in and nobody could log in or see anything." That was a Wednesday. For the next three days, the college was completely down. "I was still getting a handle on what we were or were not doing correctly, but that led to three sleepless nights for our group. Or when we were sleeping, we were sleeping at our desks. We would run a script, knowing it would take an hour, put our heads down and set an alarm to wake up when the script finishes."
The recovery from that outage led Van Pelt and his team on a journey to find a more reliable, yet affordable, disaster recovery solution. Two MCC IT members, Joshua Walters, VMware administrator/team lead, and Brian Massey, network administrator, worked with CDW-G and VMware to implement a solution based on NSX, VMware's software-defined network virtualization and security platform.
The college engineers realized they could use VMware and NSX to establish a "ghost network" to transfer data to a hidden host. If the main host is lost, recovery consists of materializing the ghost network, propagating the DNS (Domain Name System) change locally, and opening a few ports on the host. The system is operational locally in about 12 minutes, and cloud-wide in 45 minutes.
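The order of those three recovery steps matters: the DNS change only makes sense once the ghost network exists, and the ports are opened last. As a rough illustration only, the sequence can be modeled as an ordered runbook. This is not MCC's tooling. Every name below (Runbook, build_failover_runbook, the site dictionary, the host names and port 443) is hypothetical, and each step is a stub standing in for whatever NSX, DNS and firewall tooling a real site would call.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Runbook:
    """Ordered failover steps; each step is a (name, callable) pair."""
    steps: List[Tuple[str, Callable[[], None]]] = field(default_factory=list)

    def add(self, name: str, action: Callable[[], None]) -> None:
        self.steps.append((name, action))

    def run(self) -> List[Tuple[str, float]]:
        """Run steps in order and return (name, elapsed_seconds) per step.

        An exception in any step propagates, so later steps never run.
        """
        log = []
        for name, action in self.steps:
            started = time.monotonic()
            action()
            log.append((name, time.monotonic() - started))
        return log


def build_failover_runbook(site: Dict) -> Runbook:
    """Build the three-step sequence against a hypothetical site-state dict."""

    def materialize_ghost_network() -> None:
        # Stub: a real site would activate its standby network here.
        site["ghost_network_active"] = True

    def propagate_dns_change() -> None:
        # Guard: repointing DNS before the network exists would strand users.
        if not site["ghost_network_active"]:
            raise RuntimeError("ghost network must be up before DNS is repointed")
        site["dns_target"] = "recovery-host"

    def open_host_ports() -> None:
        # Stub: open only the ports the recovered services need.
        site["open_ports"].update(site["required_ports"])

    runbook = Runbook()
    runbook.add("materialize ghost network", materialize_ghost_network)
    runbook.add("propagate DNS change", propagate_dns_change)
    runbook.add("open host ports", open_host_ports)
    return runbook


# Hypothetical starting state: primary host lost, nothing failed over yet.
site = {
    "ghost_network_active": False,
    "dns_target": "primary-host",
    "open_ports": set(),
    "required_ports": {443},
}
log = build_failover_runbook(site).run()
for name, seconds in log:
    print(f"{name}: {seconds:.3f}s")
```

Recording the elapsed time for each step is a deliberate choice in this sketch: a team trying to shave minutes off its recovery time needs to know which step is the slow one.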
"VMware helped us get started implementing NSX," Walters recalled. "It was a challenge for them to put NSX in place because certain aspects of our environment weren't quite ready for it, but they were brilliant engineers and they helped us get it implemented to work in our environment. Brian and I then figured out a way to use NSX in a way they hadn't thought of yet as a workaround to the way our environment was set up, so we didn't have to take the time to change our whole network to get it right at that moment."
"There are several benefits to NSX, one being that if we needed to spin something up or make a change or push something out, especially within the data center, it makes it more manageable for me," Massey added.
During spring break in 2018, they simulated an outage and got the system back up in 45 minutes. "We use Veeam to do our backups and then copy those backups to the disaster recovery site," Walters said. "We had Brian cut the network connection to our data center. The production servers were still online but nobody could communicate with them. We fired up the copies of backups from Veeam. NSX stretched the network from our production data center to our disaster recovery data center."
"When we brought everything back up, it went so quickly that at first we weren't sure we had done it right," Van Pelt recalled. "Remember, we were coming off 72 hours [recovery time] and with this, we were fully up and running in 45 minutes the first time we tested it. So we ran it again to make sure."
The MCC team said the project could be relevant to other schools in rural areas that have limited infrastructure for server replication and data transfer. In MCC's case, they previously were unable to replicate their student information system over their network for backups every day, because it was too much data for the limited bandwidth available in Mohave County. With the NSX disaster recovery project, they are replicating data every 15 minutes, resulting in a server that can be brought online immediately, with no loss of data.
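Some back-of-the-envelope arithmetic shows why a frequent, small transfer can fit a slow link when a daily full copy cannot. The sketch below assumes each 15-minute cycle sends only the data that changed since the last one; the article does not say how the replication is implemented, so that is an assumption. The link speed and data sizes are likewise made-up illustrative figures, not MCC's numbers.

```python
def transfer_seconds(size_megabytes: float, link_megabits_per_second: float) -> float:
    """Ideal transfer time: megabytes * 8 bits per byte / link speed in Mbit/s."""
    return size_megabytes * 8 / link_megabits_per_second


def fits_interval(changed_megabytes: float,
                  link_megabits_per_second: float,
                  interval_minutes: float) -> bool:
    """True if one cycle's changed data can be sent before the next cycle starts."""
    needed = transfer_seconds(changed_megabytes, link_megabits_per_second)
    return needed <= interval_minutes * 60


# Hypothetical figures, for illustration only.
LINK_MBPS = 50            # assumed usable microwave throughput
FULL_COPY_MB = 500_000    # assumed full database size (500 GB)
DELTA_MB = 200            # assumed data changed in one 15-minute window

full_copy_hours = transfer_seconds(FULL_COPY_MB, LINK_MBPS) / 3600
delta_seconds = transfer_seconds(DELTA_MB, LINK_MBPS)

print(f"full copy: {full_copy_hours:.1f} hours")
print(f"one delta: {delta_seconds:.0f} seconds")
print("delta fits a 15-minute cycle:", fits_interval(DELTA_MB, LINK_MBPS, 15))
```

With these assumed figures, the full copy would tie up the link for most of a day while each delta takes well under a minute. Real links carry other traffic and protocol overhead, so actual times would be longer.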
Since completion of the project, the college has had no unplanned outages. MCC's next step is to refine the whole process, because they believe they can get the recovery time down to 15 minutes or less.
VMware thought MCC's implementation was unique enough that it asked representatives from the college to present at last year's VMworld conference. An executive from a New Zealand university took their approach, refined it, and implemented it in his network. "He gave us all of his notes, so we are using that as we build out our next phase," Van Pelt said. "We are talking with him and trading ideas. That was cool for us that our tiny college in Arizona has affected the network infrastructure for a large university in New Zealand."
Van Pelt stressed that even though MCC is a small college, he has been willing to invest in training for his staff. "I had two really smart employees who got together with two really smart guys from NSX, and between the four of them they came up with a way to address a serious issue for our college," he added. "I mean 72 hours is really not suitable for an academic environment, especially given that so many of our students are online students. We have to be up and running. A three-day recovery is not an option for us. From that perspective, if you are investing in the technology, damn sure invest in the training. I hit the lottery when Brian and Josh were learning about NSX because it really caused them to think about what we are doing and how we can do it better."
MCC also is planning to build an open VMware lab environment so that students can work on servers without worrying about breaking MCC's network. "About half of our staff also teach at the college, including me," Van Pelt said. "For the students it's nice because we are teaching them about stuff we did today at work. As we build it out, they can work with it, and then when they graduate and go to job interviews, they can say they used NSX and the latest version of VMware to build environments."