Setting Storage Standards for Digital Media
Stanford University's Julie Sweetkind-Singer discusses the importance and impact of long-term geospatial data storage.
Photo by Linda A. Cicero / Stanford News Service
Julie Sweetkind-Singer of Stanford University (CA) is a recognized authority on digital preservation and has been honored by the Library of Congress for her work in the field. She currently serves as both the assistant director of Stanford's Geospatial, Cartographic and Scientific Data and Services and as head of the Branner Earth Sciences Library and Map Collections. In a recent interview with CT, Sweetkind-Singer discussed the challenges facing the field of digital preservation.
Campus Technology: What are the primary considerations for archiving and preserving digital information over the long term?
Julie Sweetkind-Singer: From a librarian's point of view, digital data is very different and much more difficult to preserve for extended periods than paper-based data. For example, a book on acid-free paper can be kept on a shelf in a cool, dark place for 100 years. If it is well cared for, you would expect it to remain in pretty good shape.
With digital information, you have to implement a process from the very beginning that will allow you to preserve it well into the future. This includes ensuring that the data is well managed technically; that metadata exists so that someone in the future will understand what the data represents and how it has been stored; and that legal documents are in place indicating how the data may be used.
It's important for digital archivists to develop long-term preservation plans that include both technical and legal stipulations. Unless digital files are correctly preserved and documented, we run the risk of losing the information, leaving it unavailable to future generations.
CT: From an educator's perspective, what are the key reasons to preserve geospatial data?
Sweetkind-Singer: For both educational and research purposes, it is critical that we preserve data for the long term. For example, the opportunity to trace the development of a region using historical maps is useful to researchers who are studying population growth or the change from an agriculture- to industry-based economy. A historian may want to know when the railroad first reached the study area, and what effect it had; what crops formerly grew there; in which direction the area began its expansion; when major roadways were built and which cities they connected.
You can analyze all of this over time by studying geospatial data, but only if you have the content to do so. Preserving historic data and adding to the collection regularly are critical parts of change-detection research.
CT: How did the National Geospatial Digital Archive (NGDA) come about, and what role does it play in preserving geospatial data?
Sweetkind-Singer: The NGDA is a collaborative research effort between Stanford and the University of California, Santa Barbara, to examine the issues surrounding the long-term preservation of geospatial data. It is funded by the Library of Congress through its National Digital Information Infrastructure and Preservation Program (NDIIPP).
One of the goals of the NGDA is to set up the structure for a preservation network and eventually add more partners, including both libraries and state archives, covering a variety of regions around the US. Keeping geospatial data in multiple locations is an important safeguard for its long-term preservation in case of a man-made or natural disaster. It's also important to remember that many organizations produce geospatial data but aren't involved in its collection or preservation. The mandate of libraries and government archives, however, is to preserve valuable documents for the future.
CT: What procedures does the NGDA recommend for the long-term storage of geospatial data?
Sweetkind-Singer: You have to assume that both the software and hardware components that originally created the data will change in the future. Given that, it's important to have metadata for all archived geospatial data, including details about the software used to create it and related white papers. We developed a registry to track information about formats because they will certainly change over time. This information was the basis of the Library of Congress' geospatial content section on its Sustainability of Digital Formats website. Regarding the preservation of remotely sensed imagery, you need to know which sensors were used, when they were updated, and what software was used to interpret the data format.
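As a rough illustration of the kind of metadata Sweetkind-Singer describes, the following sketch records, for each archived dataset, what it represents, the software that created it, sensor details for imagery, and a pointer into a shared format registry. It is a minimal sketch under stated assumptions: the class and field names, the registry key, and the sample values are all hypothetical and do not reflect the NGDA's actual registry or schema.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class FormatEntry:
    """One entry in a hypothetical format registry."""
    name: str
    version: str
    documentation: list[str] = field(default_factory=list)  # specs, white papers, etc.


@dataclass
class ArchivedDataset:
    """Hypothetical preservation metadata for one archived geospatial dataset."""
    identifier: str
    description: str              # what the data represents
    format_key: str               # key into the format registry
    creating_software: str        # software (and version) that produced the data
    sensor: Optional[str] = None  # only for remotely sensed imagery
    sensor_updates: list[str] = field(default_factory=list)  # recorded sensor changes


# Hypothetical registry: format details are recorded once and shared by all datasets.
FORMAT_REGISTRY: dict[str, FormatEntry] = {
    "geotiff-1.0": FormatEntry("GeoTIFF", "1.0", ["GeoTIFF format specification"]),
}


def preservation_record(ds: ArchivedDataset, registry: dict[str, FormatEntry]) -> dict:
    """Merge a dataset's metadata with its registry entry; reject unregistered formats."""
    if ds.format_key not in registry:
        raise KeyError(f"format {ds.format_key!r} is not in the registry")
    record = asdict(ds)
    record["format"] = asdict(registry[ds.format_key])
    return record


if __name__ == "__main__":
    scene = ArchivedDataset(
        identifier="example-scene-001",
        description="Hypothetical satellite scene of a study area",
        format_key="geotiff-1.0",
        creating_software="ExampleGIS 2.1",  # hypothetical product name
        sensor="example-sensor",
        sensor_updates=["example calibration update"],
    )
    print(json.dumps(preservation_record(scene, FORMAT_REGISTRY), indent=2))
```

Keeping format details in one registry rather than copying them into every record mirrors the point above: when a format changes, the description is updated in one place, and a dataset whose format was never registered is flagged at archiving time instead of decades later.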
Legal documents are another important part of the long-term data-storage process. We drafted agreements with participating NGDA members about collection-development policies, specifying what each institution is going to collect and curate. Another contract brokers the relationship between copyrighted or licensed data and the university that wants to archive it.
Data providers want their data preserved, but, as a university, we need assurances that our faculty and students can use that data for research and educational purposes. So we have contracts that specify the acceptable use of the archived data. I think long-term data preservation is a matter of developing a plan that includes technical solutions from the IT department, as well as recommendations from librarians, archivists, and lawyers to make sure geospatial data is properly and legally preserved for the future.
CT: Are standard procedures for the preservation of geospatial data widely implemented in libraries and government archives today?
Sweetkind-Singer: The long-term preservation of data is something that is just emerging as an issue for libraries. While many libraries and state archives are aware of the problem, they don't really know how to tackle it yet. At first, it may seem like an overwhelming task, but breaking the procedure down into its component parts makes the process achievable. One important effort that has emerged over the past few years, also funded by NDIIPP, is the Geospatial Data Preservation Resource Center. This site has been designed specifically to bring together "freely available web-based resources about the preservation of geospatial information." It also gives practitioners a place to start, discover best practices, and get their questions answered.
As we go forward, we will figure out sustainable methods to manage, archive, preserve, and create access to digital information. Relatively speaking, though, we're in the early days. It's a process that we'll develop and refine as we continue to work with this type of content. Long-term data archiving is a very interesting and challenging area for libraries, because we are building the digital collections of the future. Libraries have an important role to play in making sure that we provide proper stewardship and preservation of geospatial data.
An Intro to Geospatial Data
According to the EPA, geospatial data is defined as "information that identifies the geographic location and characteristics of natural or constructed features and boundaries on the earth." To analyze, interpret, and display geospatial data, researchers and planners use a geographic information system (GIS). Typically, a GIS is used to handle maps of one kind or another. A map may be represented as several layers, each holding data about a particular kind of feature. Each feature is linked both to a position on the map image and to a record in an attribute table. GIS can relate otherwise disparate data on the basis of common geography, revealing hidden patterns, relationships, and trends that are not readily apparent in spreadsheets or statistical packages.
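To make the layer-and-attribute idea concrete, here is a minimal sketch in plain Python of relating two layers through shared geography. It assumes planar coordinates, simple (non-self-intersecting) polygons, and two hypothetical layers: district boundaries and incident points, in the spirit of tracking crime patterns. The names `Feature`, `point_in_polygon`, and `count_by_region` are illustrative and not taken from any GIS product; the containment test is the standard ray-casting method.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Feature:
    """A feature: geometry (position on the map) plus its attribute-table record."""
    geometry: object  # (x, y) for a point, or a list of (x, y) vertices for a polygon
    attributes: dict = field(default_factory=dict)


def point_in_polygon(pt: tuple[float, float], polygon: list[tuple[float, float]]) -> bool:
    """Ray casting: a point is inside if a ray to its right crosses the boundary an odd number of times."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y value can cross it (so y1 != y2 here).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_region(points: list[Feature], regions: list[Feature], key: str) -> dict:
    """Relate two layers by location: count point features inside each region feature."""
    counts = {region.attributes[key]: 0 for region in regions}
    for p in points:
        for region in regions:
            if point_in_polygon(p.geometry, region.geometry):
                counts[region.attributes[key]] += 1
                break  # assign each point to the first region that contains it
    return counts


if __name__ == "__main__":
    districts = [
        Feature([(0, 0), (10, 0), (10, 10), (0, 10)], {"district": "West"}),
        Feature([(10, 0), (20, 0), (20, 10), (10, 10)], {"district": "East"}),
    ]
    incidents = [
        Feature((2, 3), {"type": "theft"}),
        Feature((5, 5), {"type": "burglary"}),
        Feature((15, 2), {"type": "theft"}),
    ]
    print(count_by_region(incidents, districts, "district"))  # {'West': 2, 'East': 1}
```

Neither layer knows about the other; the only link is location, which is the sense in which a GIS relates disparate data. Points that fall exactly on a shared boundary are assigned somewhat arbitrarily by this sketch, and a production GIS would also use a spatial index rather than testing every point against every polygon.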
With GIS, researchers and planners can explore the spatial element of data to display soil types, track crime patterns, analyze animal-migration patterns, find the best location for an expanding business, model the path of atmospheric pollution, and make decisions for many types of complicated problems.