Researchers Throw Out Digital Waste Scheme

A couple of computer scientists--one from Johns Hopkins University and the other from the University of Alabama at Birmingham--have looked to the science of waste management for guidance on what to do with unwanted or unused data from the digital world. In a paper published on arXiv.org, an open-access repository operated by the Cornell University Library, Ragib Hasan and Randal Burns have suggested familiar "green" solutions to the digital waste data problem: reduce, reuse, recycle, recover, and dispose.

Burns, an associate professor in the Department of Computer Science in the Whiting School of Engineering at Johns Hopkins, directs the Hopkins Storage Systems Lab. Hasan, an assistant professor in the Department of Computer and Information Sciences at the University of Alabama at Birmingham, formerly worked with Burns at the Storage Systems Lab.

There are practical reasons to be concerned about unused data. "Resources in a digital ecosystem are not infinite," the authors wrote. "Storing, transferring, and disposing of data consume these resources." As Hasan explained, "If you have a lot of debris in the street, traffic slows down. And if you have too much waste data in your computer, your applications may slow down because they don't have the space they require."

Typically, the approach for getting rid of unwanted data includes compression and deletion. However, the authors wrote, "These processes come at a price--disposal of waste data consumes resources in the form of energy used to delete data, tying up compute cycles, blocking I/O, etc. Disposal via deletion also causes degradation of performance and reduces the lifetime of storage components (such as Flash storage)."

Even though data storage devices have become less expensive, Hasan added, hard drives can still run out of room. In addition, Flash-based systems, such as memory cards, possess a limited number of write-erase cycles, and frequent deleting of waste data can shorten their lifespan.

How then to curb the clutter of a computer? The computer scientists have created a five-tier pyramid to lay out the viable options, with the most valuable approach at the top.

Illustration of the pyramid of data waste: reduce, reuse, recycle, recover, dispose.

Reduce: The most preferred option is to reduce the amount of data that flows into a computer to begin with, thereby imposing the least overhead on the system. This can be done, the researchers wrote, by encouraging software makers to design their programs to leave fewer unneeded files behind after a program is installed. To coax the software makers to comply, operating systems and file systems could provide incentives to applications that produce less waste and punish those that produce more, for example by throttling their performance.

Reuse: Coders could break their code into smaller modules that could serve double duty. If two programs are found to use identical modules, one copy could be eliminated through data deduplication. Likewise, information from one machine translation could be used to "enrich global translation capabilities," as is done with the "translation memories" in the Google Translator Toolkit. Reuse is the second-best option in the pyramid.

Recycle: This tier, which the researchers acknowledged was a challenge, calls for repurposing files left behind by software that is about to be removed from the computer.

Recover: Some digital leftovers could be anonymized and shared or analyzed for obtaining high-level views, the researchers have suggested. They could also be used to "gather patterns about historical trends."

Dispose: Sitting at the bottom of the pyramid, deletion of files is the least desirable option. "This is costly in terms of time and energy spent deleting data objects," the authors wrote. "So, we opine that deletion should be the absolute last recourse in managing waste data."
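The data deduplication mentioned under the Reuse tier can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names and the choice of SHA-256 content hashing are assumptions made here for illustration, and a real file system would deduplicate at the block or file level with far more care.

```python
import hashlib
from collections import defaultdict


def find_duplicates(blobs):
    """Group names whose byte content is identical (hypothetical helper)."""
    by_digest = defaultdict(list)
    for name, data in blobs.items():
        # Identical content always produces the same SHA-256 digest.
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Only digests shared by two or more names are redundant copies.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def deduplicate(blobs):
    """Keep one stored copy per distinct content and map each name to it."""
    store = {}  # digest -> bytes (the single stored copy)
    index = {}  # name -> digest
    for name, data in blobs.items():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)
        index[name] = digest
    return store, index
```

Given two programs that ship the same module under different paths, `find_duplicates` would report the pair, and `deduplicate` would keep a single stored copy that both names point to.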

Outside of the hierarchy, the researchers pointed out, is the kind of data that has to be physically destroyed--through incineration, degaussing, or destruction of the storage media--for reasons of data security or regulation. However, they wrote, "This has the worst impact on the natural environment as any such disposal would impact the physical ecosystem."

To continue with their metaphor, Hasan and Burns also recommended consideration of a "digital landfill," where unwanted data could be moved "without additional cost associated with deletion." This could be accomplished by the use of a "semi-volatile storage device" on which the data would gradually "fade automatically," allowing the space to be reclaimed for other purposes.
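The fading behavior can be mimicked in software with a toy Python model. This is only an analogy: the researchers describe a storage device, not a program, and the `DigitalLandfill` class, its `fade_after` window, and the explicit `now` clock ticks are all hypothetical names chosen here for illustration.

```python
class DigitalLandfill:
    """Toy model: dumped items are never explicitly deleted; they fade with age."""

    def __init__(self, fade_after):
        self.fade_after = fade_after  # hypothetical retention window, in clock ticks
        self._items = {}              # name -> (data, tick when dumped)

    def dump(self, name, data, now):
        """Move unwanted data into the landfill without deleting anything."""
        self._items[name] = (data, now)

    def read(self, name, now):
        """Return the data if it has not faded yet, otherwise None."""
        entry = self._items.get(name)
        if entry is None:
            return None
        data, dumped_at = entry
        if now - dumped_at >= self.fade_after:
            return None  # faded: treated as gone
        return data

    def reclaimable(self, now):
        """Names whose space could be reused because their data has faded."""
        return sorted(
            name
            for name, (_, dumped_at) in self._items.items()
            if now - dumped_at >= self.fade_after
        )
```

The point of the sketch is that nothing in it performs a delete: data simply stops being readable once it is old enough, and its space is reported as available for reuse.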

Hasan noted that with the abundance of cheap storage, most users haven't given much thought to the clutter piling up in their digital devices. But with the shift to cloud computing, those systems where users send files or store them could eventually be overrun with waste data. "Someday, this could become a problem as we begin using up these storage resources," Hasan said. "Maybe we should start talking about it now."

The research paper, "The Life and Death of Unwanted Bits: Toward Proactive Waste Data Management in Digital Ecosystems," can be read online.

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.
