Automating Data Backups on Campus

The challenges of data integrity and storage on a college campus can be immense. With 30 TB (and growing) of crucial files to keep both backed up and accessible, South Carolina's Furman University is taking on those challenges using a differential process, one that cuts down on the duration of backups and saves on administrative load.

Furman University needed a reliable, easy-to-use backup system for its growing data store. A top liberal arts college in the South, Furman serves 3,100 students and is the oldest and largest private institution in South Carolina.

The backup challenges on a college campus are considerable, according to Systems Administrator Russell Ensley. "Data grows almost faster than we can keep up with; it's huge."

Not only are new pieces of data created daily, he said, but more so than in a corporate environment, data needs to be stored long-term: "Some things here just never go away. That means tons of storage space is needed.... It's constant, really--an everyday game of providing storage and keeping the integrity of the data."

Rather than backing up all 30 TB nightly and keeping track of hundreds of tapes, as the school used to do, Ensley now uses an automated system whose software backs up only what has changed and keeps track of which data is on which tape. The software starts automatically every night and selectively backs up only about half a gigabyte, writing to about 100 tapes a night and managing the tape library completely.
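
To illustrate the idea of backing up only what has changed, here is a minimal Python sketch. It is not how the STORServer or Tivoli software works internally; it assumes a hypothetical catalog that remembers each file's size and modification time from the previous run, and the names `scan_tree` and `select_changed` are invented for this example.

```python
import os
from typing import Dict, List, Tuple

# Hypothetical catalog: path -> (size in bytes, modification time) at last backup.
Catalog = Dict[str, Tuple[int, float]]

def scan_tree(root: str) -> Catalog:
    """Walk a directory tree and record size and mtime for each file found."""
    snapshot: Catalog = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            snapshot[path] = (st.st_size, st.st_mtime)
    return snapshot

def select_changed(previous: Catalog, current: Catalog) -> List[str]:
    """Return paths that are new, or whose size or mtime differs from last time."""
    return sorted(
        path for path, meta in current.items()
        if previous.get(path) != meta
    )
```

On the first run the previous catalog is empty, so every file is selected, which corresponds to a full backup. On later runs only new or modified files are returned.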

"Put simply, your very first backup is a full backup, and every backup after that is a differential backup," Ensley explained. "You get away from managing sets of tapes." Furthermore, the software system keeps track of what is on each tape, how full it is, and when portions of a tape can be reused by writing over old data, a process called data reclamation. Different types of data can be assigned different values that dictate when their space on a tape or disk can be reclaimed. That not only saves Ensley time and reduces the number of tapes needed, but also reduces the size of the tape library itself, which produces additional cost savings.
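
The reclamation idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the product's actual logic: the data classes, the retention values in `RETENTION_DAYS`, the `age_days` field, and the 60 percent threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical retention values, in days, assigned per type of data.
RETENTION_DAYS: Dict[str, int] = {"email": 30, "coursework": 365, "records": 3650}

@dataclass
class TapeObject:
    data_class: str   # which retention value applies
    size_bytes: int   # space this object occupies on the tape
    age_days: int     # days since this copy was superseded (assumed field)

def reclaimable_fraction(objects: List[TapeObject]) -> float:
    """Fraction of the tape's used space held by objects past their retention."""
    total = sum(o.size_bytes for o in objects)
    if total == 0:
        return 0.0
    expired = sum(
        o.size_bytes for o in objects
        if o.age_days > RETENTION_DAYS[o.data_class]
    )
    return expired / total

def should_reclaim(objects: List[TapeObject], threshold: float = 0.6) -> bool:
    """Flag a tape for reclamation once enough of its space has expired."""
    return reclaimable_fraction(objects) >= threshold
```

With these assumed values, a tape holding mostly 45-day-old email would qualify for reuse, while the same tape full of long-lived records would not.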

All in all, the system, an appliance from STORServer that includes hardware, software, and a tape library (or jukebox), has gradually proved its worth. Ensley estimated the system saves him several hours a week in administrative time.

The STORServer backup appliance at Furman uses an IBM server and is built around IBM's Tivoli software. It makes use of a Spectra Logic T120 tape library, another well-established brand. In fact, STORServer's use of IBM's industry-standard Tivoli backup software was key in Furman's selection of the product, Ensley said, as was the automated backup process it uses.

The backup process, which used to be much more hands-on, is now almost entirely automated. At 10 p.m. nightly, the software on 35 to 40 node servers kicks off its backup routines, marching methodically across the servers' files to back up selected data to about 100 tapes. That portion of the backup is completed within an hour; the remainder, which entails a linear progression to check on and selectively back up some 5 million to 6 million files nightly on three servers, takes another eight hours. Two sets of backup tapes are made, one of which goes to a tape library on campus, the other weekly to offsite storage. Ensley said that in four years of using the system, "we have never had a problem with a restore ... knock on wood."

The system's ease of use means that little administrative intervention is required. "[The backup system] has very low overhead as far as labor," Ensley said. "Once it's set up and tuned, for the most part it runs itself, outside of running new nodes for backup. There are [plenty of] buttons and dials that you can manipulate yourself, but it doesn't have to be complicated." Setting up the system initially took about three days, Ensley said, and included help from a STORServer technician who spent those three days on campus. From there, the system has been largely maintenance-free.

Perhaps the best part of the STORServer system, Ensley said, is that the database tracks what needs to be backed up, eliminating the old system: a weekly backup, then incremental backups all week long. Instead, pointers in the database keep track of what to back up when. "You don't spend your time figuring out what tapes have what on them," Ensley said.
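
A catalog of this kind can be pictured as a small database table that maps each backed-up file to the tape holding it. The following Python sketch uses the standard `sqlite3` module; the table layout, function names, and tape labels are assumptions made for illustration and do not reflect the actual Tivoli database schema.

```python
import sqlite3
from typing import Optional

def open_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog of backup copies (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE copies ("
        " path TEXT NOT NULL,"
        " backed_up_at TEXT NOT NULL,"  # ISO 8601 timestamp, sorts as text
        " tape_label TEXT NOT NULL)"
    )
    return conn

def record_copy(conn: sqlite3.Connection, path: str,
                backed_up_at: str, tape_label: str) -> None:
    """Note that a copy of `path` was written to the given tape."""
    conn.execute(
        "INSERT INTO copies (path, backed_up_at, tape_label) VALUES (?, ?, ?)",
        (path, backed_up_at, tape_label),
    )

def tape_for_restore(conn: sqlite3.Connection, path: str) -> Optional[str]:
    """Return the tape holding the most recent copy of `path`, if any."""
    row = conn.execute(
        "SELECT tape_label FROM copies WHERE path = ?"
        " ORDER BY backed_up_at DESC LIMIT 1",
        (path,),
    ).fetchone()
    return row[0] if row else None
```

With a lookup like `tape_for_restore`, an administrator asks for a file and the catalog answers with a tape, rather than the administrator working out which weekly or incremental set contains it.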

In the future, Ensley said, he envisions eliminating the tape portion of the backup completely and moving to a disk-to-disk backup, which would further reduce the time it takes to restore data when necessary.

About the Author

Linda Briggs is a freelance writer based in San Diego, Calif.
