A new vision of artificial intelligence for the people – MIT Technology Review

But few people had enough mastery of the language to manually transcribe the audio. Inspired by voice assistants like Siri, Mahelona began looking into natural-language processing. “Teaching the computer to speak Māori became absolutely necessary,” Jones says.

But Te Hiku faced a chicken-and-egg problem. To build a te reo speech recognition model, it needed an abundance of transcribed audio. To transcribe the audio, it needed the advanced speakers whose small numbers it was trying to compensate for in the first place. There were, however, plenty of beginning and intermediate speakers who could read te reo words aloud better than they could recognize them in a recording.

So Jones and Mahelona, along with Te Hiku COO Suzanne Duncan, devised a clever solution: rather than transcribe existing audio, they would ask people to record themselves reading a series of sentences designed to capture the full range of sounds in the language. To an algorithm, the resulting data set would serve the same function. From those thousands of pairs of spoken and written sentences, it would learn to recognize te reo syllables in audio.

The team announced a competition. Jones, Mahelona, and Duncan contacted every Māori community group they could find, including traditional kapa haka dance troupes and waka ama canoe-racing teams, and revealed that whichever one submitted the most recordings would win a $5,000 grand prize.

The entire community mobilized. Competition got heated. One Māori community member, Te Mihinga Komene, an educator and advocate of using digital technologies to revitalize te reo, recorded 4,000 phrases alone.

Money wasn’t the only motivator. People bought into Te Hiku’s vision and trusted it to safeguard their data. “Te Hiku Media said, ‘What you give us, we’re here as kaitiaki [guardians]. We look after it, but you still own your audio,’” says Te Mihinga. “That’s important. Those values define who we are as Māori.”

Within 10 days, Te Hiku amassed 310 hours of speech-text pairs from some 200,000 recordings made by roughly 2,500 people, an unheard-of level of engagement among researchers in the AI community. “No one could’ve done it except for a Māori organization,” says Caleb Moses, a Māori data scientist who joined the project after learning about it on social media.

The amount of data was still small compared with the thousands of hours typically used to train English language models, but it was enough to get started. Using the data to bootstrap an existing open-source model from the Mozilla Foundation, Te Hiku created its very first te reo speech recognition model with 86% accuracy.
