A Collaboratory for Classics

Take one part crowdsourcing and one part techno-mastery, mix in classical studies and a world-class research library, and you'll come close to understanding the ingredients that make up the Duke Collaboratory for Classics Computing or DC3, an initiative recently begun at Duke University. The DC3 is a digital classics R&D group with a mission to create new standards, services, and technology for managing and sharing scholarly information and ancient texts, starting with those written on papyrus, "antiquity's paper," and inscribed on stone.

On a grander scale, the initiative will foster global collaboration not only among scholars but also among ordinary people around the world who have an interest in the ancient objects in their local neighborhoods, museums, and schools.

As Director Joshua Sosin explains, people from all over the world visit archaeological sites and museums, taking high-quality photographs of Greek and Latin inscriptions that "get pushed to Flickr or other sites. And while they may feed the fond memories of a vacation or semester abroad, the contents can easily become lost to scholarly use." DC3 hopes to create, among other things, resources to help capture, process, and archive that material and make it available for study.

"It's not about getting ordinary people to do scholars' work. Rather, we want scholars to be able to harness this incredible stream of information and the people who are gathering it to get the chance not just to enjoy it, but also to contribute in a productive way to the progress of knowledge."

The DC3 is an initiative of Duke University Libraries funded jointly by the Andrew W. Mellon Foundation and Duke University. What's also unique is that Sosin and his small team have been "embedded" into the Duke Libraries. That's a first at the university, says Vice Provost for Library Affairs Deborah Jakubs. Typically, the embedding works the other way around -- librarians spend time within academic departments. In this case Sosin, a tenured faculty member in Classical Studies, has also been appointed to the Libraries.

It's in the Libraries that the digitizing of classical material has been going on for decades. In 1982, the Department of Classical Studies and the Duke University Libraries launched the "Duke Databank of Documentary Papyri," which now offers digital transcriptions of some 60,000 Greek and Latin texts written on wood tablets, papyrus, and pottery shards, most discovered in Egypt. Some of those came from Duke's own collection, which consists of 1,500 ancient papyri, but the bulk of these documents, dated mainly from the 3rd century before Christ to the 8th century after, come from many collections around the world. Duke's papyri are housed in the David M. Rubenstein Rare Book & Manuscript Library. Other North American collections -- such as those at the University of Michigan and the University of California, Berkeley -- are larger, but Duke was the first institution to publish its entire collection online for anyone to view without restrictions.

In 1996 Duke was among the founding partners in an initiative that brought together several North American universities interested in combining their digital collections virtually under one site. The Advanced Papyrological Information System now contains nearly 35,000 records and some 20,000 images. It didn't stop there. APIS, as it's known, has now been incorporated into an international virtual collection, papyri.info, which collaborates with partners from around the world.

Papyri.info features a search engine that can retrieve contents from multiple resources and an editing tool that allows people to contribute new or edit existing content. The result is that the curation of core papyrological data now rests much more heavily in the hands of the field, of the practitioners, and is no longer the exclusive responsibility of small numbers of project leads.

The job still isn't done, Sosin explains. "For one thing there are hundreds of thousands of papyri in the world that have barely been cataloged, much less digitized or edited."

He hopes that work won't be done exclusively by established academicians. "Some of our most active users include graduate students who now have a way to contribute to scholarship in a quick and useful and peer-reviewed fashion." But it doesn't stop there. He mentions a retired Dutch high school teacher who became interested in the project "and is doing a ton of work," and a German high school teacher who comes home at the end of the day and "unwinds" by entering Coptic Egyptian documents into the database.

While papyri have been the focus for the digitization efforts, attention is turning now to epigraphy, the study of documents carved on stone. And Sosin predicts that work will be harder.

For one thing, there's scale. "These cover the entire footprint of antiquity from Spain to Afghanistan and are more numerous than the papyri by an order of magnitude," he says. "We have in excess of a million Greek and Latin inscriptions to our 60,000 published Greek and Latin papyri. What we learn by building these new tools is going to help us enhance what we built for the papyri and vice versa. We're dealing with very large amounts of data, covering a very wide geographic range [and] a deep well of linguistic variation."

For another, the technical infrastructure of existing projects is quite varied. For the papyri, the major databases fit together closely and the whole offers "quite full coverage" of the documentary material, explains Sosin. But in the domain of epigraphy, where one research group may be handling inscriptions of Spain, another Italy, and another tackling Asia Minor, each of those projects will have "different leadership and different technological underpinnings."

This brings us up to date with DC3. Its next big goal, says Sosin, is to create a set of services, a kind of "clearinghouse that will be sufficiently 'intelligent' to bring together all of the various pieces that concern any given inscription wherever it resides on the web, and to allow scholars to curate relationships across them in a peer reviewed and permanently archived way." The DC3 calls this project "IDEs, Integrating Digital Epigraphies."

Say a student is working on an inscription of which there is a digital image in one place, a digital edition of the text in another, and hundreds of citations in articles referenced in a shared digital library like JSTOR.

While all of these objects may "live on the web," at the moment there is no trusted and automatic way to aggregate and annotate them via a single interface. "So what we're trying to do now is create a set of services, workflows, tools, and collaborations to support aggregating and archiving and curating epigraphic data scattered across the web," explains Sosin.
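As a rough illustration only, and not a description of how DC3 is building its services, the aggregation idea can be sketched in a few lines of Python. Everything here is an assumption made up for the example: the record fields, the source names, the corpus label "Corp," and the simple rule used to decide that differently formatted citations refer to the same inscription.

```python
import re
from collections import defaultdict


def normalize_citation(citation: str) -> str:
    """Lowercase a citation and collapse punctuation and spacing,
    so that formatting variants of the same reference compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", citation.lower()).strip()


def aggregate(records):
    """Group records from different sources under one normalized citation key.

    Each record is a dict with hypothetical fields:
    'citation', 'kind' (image, edition, article), and 'source'.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for record in records:
        key = normalize_citation(record["citation"])
        grouped[key][record["kind"]].append(record["source"])
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(kinds) for key, kinds in grouped.items()}


# Invented records: one inscription cited three different ways on three sites.
records = [
    {"citation": "CORP I 101", "kind": "image", "source": "photos.example"},
    {"citation": "Corp. I, 101", "kind": "edition", "source": "texts.example"},
    {"citation": "corp i 101", "kind": "article", "source": "journals.example"},
]

combined = aggregate(records)
```

Here `combined` holds a single entry that lists the image, the edition, and the article together. Real citations vary far more than this toy rule can handle, which is part of why Sosin describes the matching as a job for purpose-built algorithms and scholarly review.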

In part to achieve that goal, the Mellon grant has enabled the DC3 project to hire Ryan Baumann and Hugh Cayless, "two absolute geniuses," who have both worked on developing systems used in digital humanities. Right now they're developing algorithms, says Sosin, "that will help match images of inscriptions to edited texts and align hundreds of thousands of epigraphic citations found in print and digital scholarship." The DC3 hopes to have a "small prototype" early next year.

A user will be able to come to the site, search for a particular inscription or related information, and get back both images and text along with related metadata no matter where it's maintained online. But perhaps more importantly, IDEs will provide an "invisible set of services" running in the background that automatically identifies and maintains meaningful connections across the data of collaborating projects.

While the technology in itself will be of value to the specific people using it, Sosin notes, it's also a great use case for linking up other classics endeavors. "We hope that a host of partner projects will see that if they share in the creation of these services, they will both get back extra goodness and be confident that DC3 IDEs is making sure that the connections and interdependencies are being maintained safely in the background for the long term."

The difference between this idea and just plugging a search term into Google is that what's passed through the DC3 services will have gone through a "process of scholarly review." The scholars who contribute the data or build upon them, however, will be producing different kinds of resources instead of writing articles for peer-reviewed publication.

That twist on academic publishing, says Jakubs, "underscores the importance of collaboration and the changes in scholarship that we're seeing. It's not just the solitary scholar who writes the book. It's the pieces that come together through collaboration. The library is the ideal home for a project like this, and especially as we move toward a vision of 'collection development' that is not just about buying books to put on shelves, but increasingly about gaining meaningful intellectual control over the world's vast output of digital information."

Ultimately, the DC3 initiative could change the way classics are studied in the future -- though it won't make everything easier, Sosin insists.

Take the example of a student interested in "changes to naming practices, linguistic phenomena, or politeness strategies over time," he suggests. "In the past, you could either read millions of words of Greek and remember everything, and what you didn't remember, you put on a 3x5 card." That approach could take a lifetime to do with thoroughness. "Now things like this can be done much more quickly, more accurately, with a greater number of variables. But the flip side of this is that there are now more research skills and modalities to master. So the cumulative workload does not decrease!"

Just as computing increased worker productivity and thereby opened up time for more work, DC3 will add room on the plate for more research. Sosin knows how he'd like to fill up the plate: with more practitioners and students studying the classics, drawn by not having to do the tedious match-ups across collections that have occupied so much time in the past.

The ultimate result will be more insight into the civilizations that came before ours. Sosin cites a papyrus he was recently reading, a document from Oxyrhynchus, a city in Upper Egypt. It was written, he says, by a woman to an unknown addressee and describes "at great length and in horrifying detail" the abuses she and members of the household suffered at the hands of her husband.

"We don't know whether this is a literary exercise, a legal document, a letter. It describes in the richest detail we have what life inside an abusive household looked like in ancient Egypt, complete with torturing slaves and locking people in the basement. It's gruesome. By no means was all of antiquity like that! But there are thousands and thousands of documents from the ancient world, and each one is a gem; they all open a window on real life."
