Carnegie Mellon Releases Haitian Creole Data Set for Developers

In response to the humanitarian crisis in Haiti, scientists at Carnegie Mellon University's Language Technologies Institute in the School of Computer Science have released spoken and textual data they've compiled on Haitian Creole to help developers create translation tools needed by relief workers.

A team at Microsoft Research has used the data to help populate an experimental, Web-based system for translating between English and Haitian Creole on the company's Bing Translator. Researchers at Carnegie Mellon have begun working on their own translation system for Haitian Creole.

Following the earthquakes, which struck the island nation in mid-January, the researchers decided to begin work on an updated translation system for Haitian Creole that would incorporate the latest translation technologies. To aid other groups pursuing parallel efforts worldwide, they also released the data publicly with minimal restrictions.

French is the official language of Haiti and is spoken by "elites," but Haitian Creole is the most widely spoken language in the country, according to Robert Frederking, senior systems scientist at the institute. The language is based on French but has evolved substantially since Haitians overthrew the French colonists more than 200 years ago. Word meanings have drifted, and the language incorporates some African syntax.

"French speakers can sort of puzzle through it, but Creole isn't penetrable if you don't know French," Frederking said. Few translation resources are available for the language, he added.

The Carnegie Mellon database for Haitian Creole was created in the late 1990s for Diplomat, a project sponsored by the Defense Advanced Research Projects Agency. The project focused on developing portable, speech-to-speech translation devices that could be deployed rapidly for Haitian Creole and other languages of special interest to the Department of Defense. A prototype Haitian Creole translation system was delivered to the United States Army, but "as far as we know, nobody ever field-tested it," Frederking said. The project ended in the late 1990s, but the institute retained the data compiled and produced for the project.

Given the extreme poverty of Haiti, "nobody is going to make money on a Haitian Creole translator," Frederking said. "But translation systems could be an important tool, both for the relief workers now involved in emergency response and in the long term as rebuilding takes place."

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.
