Automating Data Backups on Campus

The challenges of data integrity and storage on a college campus can be immense. With 30 TB (and growing) of crucial files to keep both backed up and accessible, South Carolina's Furman University is taking on those challenges using a differential process, one that cuts down on the duration of backups and saves on administrative load.

With 30 TB of data (and growing) to keep backed up and accessible, South Carolina's Furman University needed a reliable, easy-to-use backup system. Furman, a top liberal arts college in the South, serves 3,100 students and is the oldest and largest private institution in South Carolina.

The backup challenges on a college campus are considerable, according to Systems Administrator Russell Ensley. "Data grows almost faster than we can keep up with; it's huge."

Not only are new pieces of data created daily, he said, but more so than in a corporate environment, data needs to be stored long-term: "Some things here just never go away. That means tons of storage space is needed.... It's constant, really--an everyday game of providing storage and keeping the integrity of the data."

Rather than backing up all 30 TB nightly and keeping track of hundreds of tapes, as the school used to do, Ensley now uses an automated system whose software selectively backs up only what has changed and keeps track of what data is on which tape. The result is a process that starts automatically every night, selectively backs up only about half a gigabyte, writes to about 100 tapes a night, and manages the tape library completely.

"Put simply, your very first backup is a full backup, and every backup after that is a differential backup," Ensley explained. "You get away from managing sets of tapes." Furthermore, the software keeps track of what is on each tape, how full it is, and when portions of a tape can be reused by writing over old data, a process called data reclamation. Different types of data can be assigned different values that dictate when their space on a tape or disk can be reclaimed. That not only saves Ensley time and reduces the number of tapes needed, but also reduces the size of the tape library itself, which produces additional cost savings.
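The reclamation idea described above can be illustrated with a minimal sketch. It is not how the STORServer appliance or its underlying software is implemented; the retention values, the `Segment` and `Tape` classes, and the 40 percent threshold are all hypothetical, chosen only to show how per-type retention rules could decide when a tape's space becomes reusable.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical retention periods per data type, in days (illustrative only).
RETENTION_DAYS = {"email": 90, "coursework": 365, "records": 3650}


@dataclass
class Segment:
    """A chunk of backed-up data written to a tape on a given date."""
    data_type: str
    size_gb: float
    written: date

    def expired(self, today: date) -> bool:
        # A segment expires once it is older than its type's retention period.
        limit = timedelta(days=RETENTION_DAYS[self.data_type])
        return today - self.written > limit


@dataclass
class Tape:
    label: str
    capacity_gb: float
    segments: list = field(default_factory=list)

    def live_fraction(self, today: date) -> float:
        # Share of the tape's capacity still holding unexpired data.
        live = sum(s.size_gb for s in self.segments if not s.expired(today))
        return live / self.capacity_gb


def reclaimable(tapes, today, threshold=0.4):
    """Return labels of tapes whose live data has dropped below the threshold.

    The threshold is an assumed tuning knob, not a documented setting.
    """
    return [t.label for t in tapes if t.live_fraction(today) < threshold]
```

In a scheme like this, a tape that falls below the threshold would have its remaining live segments copied elsewhere so the whole tape can be rewritten, which is one way a smaller tape library could cover the same data.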

All in all, the system--an appliance from STORServer that includes hardware, software, and a tape library, or jukebox--has steadily proved its worth. Ensley estimated the system saves him several hours a week in administrative time.

The STORServer backup appliance at Furman uses an IBM server and is built around IBM's Tivoli software. It makes use of a T120 Spectra Logic tape library, another well-established brand. In fact, STORServer's use of IBM's industry-standard Tivoli backup software was key in Furman's selection of the product, Ensley said, as was the automated backup process it uses.

The backup process, which used to be much more hands-on, is now almost entirely automated. At 10 p.m. nightly, the software on 35 to 40 node servers kicks off its backup routines, marching methodically across the servers' files to back up selected data to about 100 tapes. That portion of the backup is completed within an hour; the remainder, which entails a linear progression to check on and selectively back up some 5 million to 6 million files nightly on three servers, takes another eight hours. Two sets of backup tapes are made: one goes to a tape library on campus, and the other goes weekly to offsite storage. Ensley said that in four years of using the system, "we have never had a problem with a restore ... knock on wood."

The system's ease of use means that little administrative intervention is required. "[The backup system] has very low overhead as far as labor," Ensley said. "Once it's set up and tuned, for the most part it runs itself, outside of running new nodes for backup. There are [plenty of] buttons and dials that you can manipulate yourself, but it doesn't have to be complicated." Setting up the system initially took about three days, Ensley said, and included help from a STORServer technician who spent those three days on campus. From there, the system has been largely maintenance-free.

Perhaps the best part of the STORServer system, Ensley said, is that the database tracks what needs to be backed up, eliminating the old system: a weekly backup, then incremental backups all week long. Instead, pointers in the database keep track of what to back up when. "You don't spend your time figuring out what tapes have what on them," Ensley said.
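As a rough illustration of how a catalog can replace the weekly-full-plus-incrementals routine, the sketch below compares each file's current state with what a catalog recorded at the last backup. It is an assumption-laden toy, not the product's actual database design: the function names and the use of a `(modified_time, size)` pair as the change signal are hypothetical.

```python
def files_to_back_up(catalog, current):
    """Pick the files that are new or changed since the catalog was updated.

    catalog: dict mapping path -> (modified_time, size) as of the last backup
    current: dict mapping path -> (modified_time, size) as seen tonight
    """
    return sorted(path for path, state in current.items()
                  if catalog.get(path) != state)


def record_backup(catalog, current, backed_up):
    """Update the catalog so the next run skips files that are unchanged."""
    for path in backed_up:
        catalog[path] = current[path]


# Night 1: the catalog is empty, so every file is selected (a full backup).
catalog = {}
night1 = {"/home/a.doc": (100, 10), "/home/b.xls": (100, 20)}
first_run = files_to_back_up(catalog, night1)
record_backup(catalog, night1, first_run)

# Night 2: only the changed file and the new file are selected.
night2 = {"/home/a.doc": (100, 10), "/home/b.xls": (200, 25),
          "/home/c.txt": (150, 5)}
second_run = files_to_back_up(catalog, night2)
```

Because the catalog always holds the latest known state of every file, the first run is a full backup by default and no later full backup needs to be scheduled, which matches the pattern Ensley describes.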

In the future, Ensley said, he envisions eliminating the tape portion of the backup completely and moving to a disk-to-disk backup, which would further reduce the time it takes to restore data when necessary.

About the Author

Linda Briggs is a freelance writer based in San Diego, Calif. She can be reached at [email protected].
