Cross-post from http://idibon.com/aaas-understanding-speakers-of-7000-languages/
February 21 was International Mother Language Day. We marked it last week with top researchers in language technology, running a symposium at the Annual Meeting of the American Association for the Advancement of Science (AAAS) in Washington, DC. AAAS is one of the biggest cross-discipline conferences, hosted by the publishers of Science.
Language technology has the potential to be diverse: you can reach speakers of about 6,500 of the world's 7,000 languages on the other end of your phone right now. But it is unbalanced. English makes up only about 5% of the world's daily conversations, yet about 95% of resources, like dictionaries, grammars, recordings, speech recognition technologies, search engines, and even spam filtering, are built solely in English. Coupled with linguists' predictions that half or more of the world's languages will disappear within a century, we have only a thin sliver of time to understand the full diversity of human communication.
The speakers each spoke about approaches to building language resources with different technologies:
- Emily Bender, University of Washington
- Ellie Pavlick on behalf of Chris Callison-Burch from the University of Pennsylvania
- Steven Bird, University of Melbourne and UC Berkeley
I have followed their work for many years, and it was a pleasure to bring together researchers who are solving important pieces of how we can use technology to understand and support the full breadth of linguistic diversity.
Emily Bender spoke about how technology can be used to understand the linguistic structures of languages. With smart and efficient data collection, it's possible to identify the full set of syntactic configurations that are possible in a language (e.g., "Subject Verb Object" vs. "Subject Object Verb"), the full set of affixes (e.g., "talk", "talk-ing", "talks", "talked"), and related grammatical features.
Ellie Pavlick spoke about how crowdsourcing technologies are providing access to languages worldwide that previously lacked available and affordable data. Such access now means that we can affordably build Machine Translation technologies for many less-resourced languages.
Steven Bird spoke about putting technology in the hands of the speakers of the least resourced languages, with smartphones enabling the speakers to make voice-translations into more widely spoken languages, leaving behind a Rosetta Stone for hundreds of languages that we expect to disappear.
It was an auspicious time to be at AAAS for science in general: just days earlier, the first detection of gravitational waves had been announced, confirming a prediction Einstein made exactly one century ago.
It is interesting to imagine the language resource problem from the perspective of physics: what if 95% of research into the elements studied only hydrogen, and half of the remaining elements were disappearing within a few generations? Or from any other field: imagine half the world's plants disappearing in the next century, or half the mountains, or half the art.
We are in a thin window of time when globalization is taking away so much of the world’s diversity while also providing recording technology and connectivity to make it possible to record that diversity before it is gone. It was wonderful to share this with the broader science community.
P.S.: A special thank you to Tom Wasow of Stanford, who suggested this session: it wouldn’t have happened without him!