Automating Data Backups on Campus
The challenges of data integrity and storage on a college campus can be immense. With 30 TB (and growing) of crucial files to keep both backed up and accessible, South Carolina's Furman University is taking on those challenges using a differential process, one that cuts down on the duration of backups and saves on administrative load.
- By Linda L. Briggs
With 30 TB of data and growing to keep backed up and accessible, South Carolina's Furman University needed a reliable and easy-to-use backup system. Furman, a top liberal arts college in the South, serves 3,100 students as the oldest and largest private institution in South Carolina.
The backup challenges on a college campus are considerable, according to Systems Administrator Russell Ensley. "Data grows almost faster than we can keep up with; it's huge."
Not only are new pieces of data created daily, he said, but more so than in a corporate environment, data needs to be stored long-term: "Some things here just never go away. That means tons of storage space is needed.... It's constant, really--an everyday game of providing storage and keeping the integrity of the data."
Rather than backing up all 30 TB nightly and keeping track of hundreds of tapes, as the school used to do, Ensley now uses an automated system whose software selectively backs up only what has changed and keeps track of what data is on which tape. The result is software that begins automatically every night and selectively backs up only about half a gigabyte, producing about 100 tapes a night and managing the tape library completely.
"Put simply, your very first backup is a full backup, and every backup after that is a differential backup," Ensley explained. "You get away from managing sets of tapes." Furthermore, the software system keeps track of what is on each tape, how full it is, and when portions of a tape can be reused by writing over old data, a process called data reclamation. Different types of data can be assigned different values that dictate when their space on a tape or disk can be reclaimed. That not only saves Ensley's time and reduces the number of tapes needed, but reduces the size of the tape library itself--which produces additional cost savings.
All in all, the system--an appliance from STORServer that includes hardware, software, and a tape library, or jukebox--has steadily proved its worth. Ensley estimated the system saves him several hours a week in administrative time.
The STORServer backup appliance at Furman uses an IBM server and is built around IBM's Tivoli software. It makes use of a T120 Spectra Logic tape library, another well-established brand. In fact, STORServer's use of IBM's industry-standard Tivoli backup software was key in Furman's selection of the product, Ensley said, as was the automated backup process it uses.
The backup process, which used to be much more hands-on, is now almost entirely automated. At 10 p.m. nightly, the software on 35 to 40 node servers kicks off its backup routines, marching methodically across the servers' files to back up selected data to about 100 tapes. That portion of the backup is completed within an hour; the remainder, which entails a linear progression to check on and selectively back up some 5 million to 6 million files nightly on three servers, takes another eight hours. Two sets of backup tapes are made: one goes to a tape library on campus, and the other goes weekly to offsite storage. Ensley said that in four years of using the system, "we have never had a problem with a restore ... knock on wood."
The system's ease of use means that little administrative intervention is required. "[The backup system] has very low overhead as far as labor," Ensley said. "Once it's set up and tuned, for the most part it runs itself, outside of running new nodes for backup. There are [plenty of] buttons and dials that you can manipulate yourself, but it doesn't have to be complicated." Setting up the system initially took about three days, Ensley said, and included help from a STORServer technician who spent those three days on campus. From there, the system has been largely maintenance-free.
Perhaps the best part of the STORServer system, Ensley said, is that the database tracks what needs to be backed up, eliminating the old system: a weekly backup, then incremental backups all week long. Instead, pointers in the database keep track of what to back up when. "You don't spend your time figuring out what tapes have what on them," Ensley said.
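As a rough illustration of how a catalog can decide what to back up, the Python sketch below compares each file's size and modification time against what was recorded at the previous run. This is an assumption-laden toy, not the Tivoli database: the function name files_to_back_up and the in-memory dictionary standing in for the catalog are hypothetical.

```python
import os


def files_to_back_up(root, catalog):
    """Return paths under `root` that are new or changed since the last run.

    `catalog` maps path -> (size, mtime_ns) as recorded at the previous
    backup. It is updated in place; a real system would record an entry
    only after the file had been copied successfully.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            signature = (st.st_size, st.st_mtime_ns)
            if catalog.get(path) != signature:
                changed.append(path)
                catalog[path] = signature
    return sorted(changed)
```

With an empty catalog, the first call returns every file, which corresponds to the initial full backup; later calls return only files whose size or timestamp has changed, so no weekly full backup is needed to stay current.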
In the future, Ensley said, he envisions eliminating the tape portion of the backup completely and moving to a disk-to-disk backup, which would further reduce the time it takes to restore data when necessary.