
Speech Recognition Challenges in Lo Kar Lo Baat

When HUL came to us with the idea of a phone app that would connect migrant workers for free, we took it up as a challenge, because it was for a good cause.

And we are proud of the system we ended up with. When we ran a sample set of voices through different recognizers, we found that our system outperformed even the Google speech recognizer. Our success rate at recognizing phone numbers in Hindi and English (all 10 digits) was almost 2 times that of Google.
(Note: we ran the samples through Google's speech cloud to get their recognition rate.)
Granted, ours was a specialized system and Google's was generic, but hey, we still feel proud :)

Challenges in Lo Kar Lo Baat:

Generally, the accuracy of speech recognition depends on many factors, such as:

1) channel property, speech sampling rate, speech coding

2) deviation in speech patterns due to dialect, speaking rate and speaking style

3) effect of background noise

In Lo Kar Lo Baat we faced many challenges, such as:

1) To improve the user experience and reduce human errors (e.g. typos) and machine errors (e.g. network problems, recording errors, inaccurate transmission of DTMF input, etc.), we proposed an automatic language recognizer that detects the language automatically and routes the application to the language-specific speech recognizer.
This was challenging in its own right because of accented speech, speaking style, etc. The language recognizer (a Hindi and English digit discriminator) is susceptible to the problems mentioned above and works with an accuracy of 96 to 98%.

2) When a speech recognition system is evaluated on word accuracy, it is easy to report 90% or more. The main challenge lies in sentence recognition: for a positive recognition, the sentence has to be 100% correct, meaning every word must be recognized correctly. That was not an easy case. Our speech recognition system was tuned for sentence recognition. This was necessary because a phone number has 10 digits (that is, 1 sentence with 10 words). Many systems give around 90% accuracy, but at the word level, so on average they get 9 of the 10 digits right. That is good for a speech recognizer, but useless for a phone number recognizer.

3) The next challenges in speech recognition are accented pronunciation and the effect of dialect. A country like India has nearly 27 major dialects or languages. People from each of them speak English or Hindi, but with a different accent. Apart from that, our system has been used in many dark areas, where people have not had much exposure to technology, and their accents are quite different from the major dialects and accents mentioned above. Our system is tuned to work in this kind of environment to some extent.

4) Our speech recognition model is trained and fine-tuned for the telephone environment, so it is susceptible to major problems like channel noise, clipping and the effect of noise on speech.

5) Another major issue we faced is speaking rate. When someone's speaking rate is very high, it is very hard to distinguish between speech patterns. Our system is fine-tuned to handle these kinds of patterns to some extent.

6) We used a speech filter that handles unexpected acoustic patterns and helps the speech recognizer improve its accuracy. This filter is 99% accurate. It tells the speech recognizer what to recognize and what not to.
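
To put rough numbers on point 2, here is a minimal Python sketch of how word-level accuracy turns into phone-number (sentence-level) accuracy. It assumes each digit is recognized independently with the same accuracy, which is a simplifying assumption (real errors tend to be correlated), and the function names are only for illustration.

```python
def sentence_accuracy(word_accuracy: float, num_words: int) -> float:
    """Chance that every word in the sentence is right.

    Assumption: each word is recognized independently with the
    same accuracy, so the per-word probabilities simply multiply.
    """
    return word_accuracy ** num_words


def required_word_accuracy(target_sentence_accuracy: float, num_words: int) -> float:
    """Per-word accuracy needed to hit a target sentence accuracy
    (inverse of sentence_accuracy, under the same assumption)."""
    return target_sentence_accuracy ** (1.0 / num_words)


if __name__ == "__main__":
    digits = 10  # a phone number is 1 sentence with 10 words

    # 90% per digit -> whole number correct
    print(f"{sentence_accuracy(0.90, digits):.3f}")       # 0.349

    # per-digit accuracy needed for 90% of numbers to be fully correct
    print(f"{required_word_accuracy(0.90, digits):.4f}")  # 0.9895
```

Under that assumption, a recognizer that is 90% accurate per digit gets the whole 10-digit number right only about 35% of the time, and it would need roughly 99% per-digit accuracy to get 9 out of 10 phone numbers fully right.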
