Carnegie Mellon Releases Haitian Creole Data Set for Developers

In response to the humanitarian crisis in Haiti, scientists at Carnegie Mellon University's Language Technologies Institute in the School of Computer Science have released spoken and textual data they've compiled on Haitian Creole to help developers create translation tools needed by relief workers.

A team at Microsoft Research has used the data to help populate an experimental, Web-based system for translating between English and Haitian Creole on the company's Bing Translator. Researchers at Carnegie Mellon have begun working on their own translation system for Haitian Creole.

Following the earthquakes, which struck the island nation in mid-January, the researchers decided to begin work on an updated translation system for Haitian Creole that would incorporate the latest translation technologies. To aid other groups pursuing parallel efforts worldwide, they also opted to release the data publicly, making it available with minimal restrictions.

Although French is an official language of Haiti and is spoken by "elites," Haitian Creole is the most widely spoken language in the country, according to Robert Frederking, senior systems scientist at the institute. The language is based on French but has evolved substantially since Haitians overthrew the French colonists more than 200 years ago. Word meanings have drifted, and the language incorporates some African syntax.

"French speakers can sort of puzzle through it, but Creole isn't penetrable if you don't know French," Frederking said. Few translation resources are available for the language, he added.

The Carnegie Mellon database for Haitian Creole was created in the late 1990s for Diplomat, a project sponsored by the Defense Advanced Research Projects Agency. The project focused on developing portable, speech-to-speech translation devices that could be deployed rapidly for Haitian Creole and other languages of special interest to the Department of Defense. A prototype Haitian Creole translation system was delivered to the United States Army, but "as far as we know, nobody ever field-tested it," Frederking said. The project ended in the late 1990s, but the institute retained the data compiled and produced for the project.

Given the extreme poverty of Haiti, "nobody is going to make money on a Haitian Creole translator," Frederking said. "But translation systems could be an important tool, both for the relief workers now involved in emergency response and in the long term as rebuilding takes place."

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.
