One man’s beard is another man’s Velcro. There are obviously two confounding problems with that statement. The first is in trying to understand Lola Kutty’s English, especially with her thick Malayali accent. The second is in trying to understand the profound meaning of this “Mallu-Zen” koan. Thankfully, there may be a solution to the first problem with voxforge.org, especially if you’re always vexed by the Indian accents of heavily-scented vixens.
Dragon Zimbly Speaking
Speech recognition is the holy-sandalwood grail of computing. How our ears and minds pick up accents and understand meaning, even in sentences that mix languages, as in “Hinglish” (Hindi-and-English), is a marvel that has so far defeated all the computing power and algorithms in the world. Yet speech recognition is an important coconut to crack open if computing is to reach the next billion people on Mother Earth. One of the most popular speech-recognition programs on desktops is Dragon NaturallySpeaking, but it suffers from being proprietary and commercial. As Lola would concur: Knowledge is not general, but we have to make it muft and mukt. The problem, alas, is not so “zimble”.
On the one hand, we do have a few Free and Open Source Software engines for speech recognition, such as Sphinx, ISIP and Julius. Using these engines, people can spin various software for ordinary users: Interactive Voice Response Systems (IVRS), a multi-billion-dollar industry for mobile phones and telephony; computer interfaces for the visually challenged and the non-literate; or even language-translation tools.
Imagine a Web search engine that works with voice commands and understands Indian accents. In fact, this may be a good time to check out VEDICS, muft and mukt software based on Sphinx that lets you use your computer solely by issuing voice commands. VEDICS is an acronym for Voice Enabled Desktop Interaction and Control System, and is authored by Indian developers. Yet, despite these successes, on the other hand we have no free speech to work with, which is quite an irony.
The Good, The Bad, and the Idlis
The accuracy of the engines mentioned above depends on how much you feed them. They need a large “speech corpus”, which is a database of read as well as spontaneously spoken words and phrases. The larger the corpus, the more accurate the “acoustic model” that can be derived from it.
In fact, they need several large corpora to better handle human speech with all its nuances and variations. Such databases are available, but almost all of them are under restrictive licenses, and so even muft and mukt software projects have to pay heavily for commercial licenses to use them.
Even worse, our mukt software still doesn’t get full access to specific speech components of such corpora. This is a call for you to feed these engines. We are a vast sub-continent; a billion of us speak in hundreds of English accents. So call all your “Usual Sisters” and your friends and family, especially all the non-techie ones, and head out to voxforge.org to raise your voice, be heard, and hopefully, be understood.
Run, Lola, Run!
The Voxforge page allows you to use your browser and your computer’s microphone to submit recordings of your voice. You can also use your mobile phone or MP3 player to record snippets from ordinary people on the street, and submit these files at the same site. Or tuck up your mundu, author a chapter in an audio-book at librivox.org, and then submit the file at Voxforge.
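If you record on a phone or a laptop, it may help to peek inside a file before sending it off. What follows is only a minimal sketch in Python, using the standard-library wave module, that reports a WAV file's channel count, sample rate, bit depth and duration. The function names describe_wav and looks_submittable are made up for illustration, and the thresholds (mono, 16-bit, at least 16 kHz) are assumptions, not Voxforge's stated requirements, so check the site for what it actually asks for. The wave module reads uncompressed WAV only, so an MP3 would have to be converted first.

```python
import wave


def describe_wav(path):
    """Return basic properties of an uncompressed WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
        }


def looks_submittable(info, min_rate_hz=16000, max_channels=1, min_bits=16):
    """Rough sanity check on a recording.

    The default thresholds are assumptions chosen for illustration,
    not requirements published by any speech-corpus project.
    """
    return (
        info["sample_rate_hz"] >= min_rate_hz
        and info["channels"] <= max_channels
        and info["bits_per_sample"] >= min_bits
    )
```

Calling describe_wav("lola.wav") on a recording of your own (the file name is hypothetical) returns a small dictionary that you can pass to looks_submittable before uploading.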
Needless to say, all audio files and associated software will be covered under the GNU General Public License (GPL). Unlocking Indian speech recognition is a huge and formidable task, so here’s an important contribution you can make in bringing freedom back to speech. Along the way, have fun, and spread the word.
Just remember Lola’s word of advice: When you are tying your buffalo in your backyard, make sure he is facing a scenic direction. Now that’s a “Mallu-Zen” koan even Osho in all his wisdom couldn’t have fathomed.
Verbatim copying, publishing and distribution of this article is encouraged in any language and medium, so long as this copyright notice is preserved. In Hindi, muft means “free-of-cost”, and mukt means “with freedom”.
Feature image courtesy: Ian Burt. Reused under the terms of CC-BY 2.0 License.