
Kookoo speech engine versus Google speech engine: Round 1

A couple of months ago, we launched the Kookoo <recognize> tag. As mentioned in that blog post, the tag supports both the Google speech engine and our very own Zena engine.
As it currently stands, the Kookoo speech engine is a drop in the ocean compared to the features of the Google speech engine. But, as they say, a journey of a thousand miles begins with a single step. This post is about our first step.

As of today, we are launching our first model: the Yes/No model, built in-house on a proprietary AI algorithm. You can now design your Kookoo IVRs to say "Please say yes or no" instead of "Please press 1 for yes and 2 for no".
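To illustrate what the switch from keypad to voice means on the application side, here is a hypothetical helper that turns a spoken result into "yes" or "no" while still accepting 1/2 key presses as a fallback. This post does not describe how a Kookoo app receives the <recognize> result, so the function name `interpret_answer` and its inputs are assumptions for illustration only.

```python
from typing import Optional

# Hypothetical fallback for callers who press keys instead of speaking.
DTMF_FALLBACK = {"1": "yes", "2": "no"}


def interpret_answer(speech_text: Optional[str], dtmf_digits: Optional[str] = None) -> Optional[str]:
    """Return "yes", "no", or None (meaning the app should re-prompt).

    speech_text: the word reported by the speech engine, if any (assumed input).
    dtmf_digits: keys pressed, if the app also collects DTMF (assumed input).
    """
    if speech_text:
        # Normalize case, whitespace and trailing punctuation before matching.
        word = speech_text.strip().lower().rstrip(".!?")
        if word in ("yes", "no"):
            return word
    if dtmf_digits:
        # Unmapped keys fall through to None.
        return DTMF_FALLBACK.get(dtmf_digits.strip())
    return None
```

Keeping the DTMF mapping as a fallback lets the same prompt serve callers who ignore the voice instruction.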

In internal tests, our model's accuracy has been good enough that we believe saying yes or no can replace DTMF as the standard interaction mechanism for such prompts.

We also did a side-by-side comparison of our model with Google's speech API. Before we present the results, some disclaimers:

1. Our model is built specifically for yes/no, while Google's is a general-purpose ASR (automatic speech recognition) engine. In most cases, task-specific models outperform generic ones.
2. Google's ASR runs in the cloud. Our model is also in the cloud, but because it is co-hosted with Kookoo, its responses will generally be faster.
3. Our model has not yet been fine-tuned for negative cases (utterances that are neither yes nor no), so it detects only yes or no.

The test data for the experiments consists of the "yes" and "no" samples from the Google Speech Commands dataset. Our yes/no model was not trained on any samples from that dataset.
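A minimal sketch of how an accuracy figure like the ones below can be computed, assuming the dataset's usual layout of one folder per word (`yes/`, `no/`) containing WAV clips. `recognize` is a hypothetical callable wrapping whichever engine is under test and returning the heard word (or None); the helper names are ours, and this is an illustration rather than the harness behind the reported numbers.

```python
from pathlib import Path
from typing import Callable, Iterator, Optional, Tuple

LABELS = ("yes", "no")


def iter_labeled_clips(dataset_root: str) -> Iterator[Tuple[Path, str]]:
    """Yield (wav_path, expected_label) for the yes/ and no/ folders."""
    root = Path(dataset_root)
    for label in LABELS:
        for wav_path in sorted((root / label).glob("*.wav")):
            yield wav_path, label


def accuracy_percent(dataset_root: str, recognize: Callable[[Path], Optional[str]]) -> float:
    """Percentage of clips whose recognized word matches the folder label."""
    total = correct = 0
    for wav_path, expected in iter_labeled_clips(dataset_root):
        total += 1
        # Engines may return "Yes" or None; normalize before comparing.
        heard = (recognize(wav_path) or "").strip().lower()
        if heard == expected:
            correct += 1
    if total == 0:
        raise ValueError("no yes/no clips found under " + dataset_root)
    return 100.0 * correct / total
```

Any answer other than the expected word, including no answer at all, counts as an error here.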

Experiment 1:
The dataset was converted to 8 kHz to match the telephony format.

Google API accuracy: 78.437%
Kookoo Zena accuracy: 96.5%
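The post does not say which tool performed the 8 kHz conversion. As one illustration, assuming 16 kHz, 16-bit mono WAV input (the Speech Commands format), the sketch below applies a windowed-sinc low-pass FIR filter and then keeps every second sample. The function names, the 63-tap length and the 3.6 kHz cutoff are our own assumptions, not the settings used in the experiment.

```python
import array
import math
import sys
import wave


def lowpass_taps(num_taps: int, cutoff: float) -> list:
    """Hamming-windowed sinc low-pass taps; cutoff is a fraction of the input rate (< 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response, sampled at offset x.
        h = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hamming window
        taps.append(h * w)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC


def decimate_by_2(samples, num_taps: int = 63) -> list:
    """Filter below the new 4 kHz Nyquist limit, then keep every second sample."""
    taps = lowpass_taps(num_taps, cutoff=3600 / 16000)
    half = num_taps // 2
    out = []
    for i in range(0, len(samples), 2):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + k - half  # centre the symmetric filter on sample i
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(max(-32768, min(32767, round(acc))))  # clamp to int16
    return out


def convert_wav_16k_to_8k(src_path: str, dst_path: str) -> None:
    with wave.open(src_path, "rb") as src:
        if (src.getframerate(), src.getsampwidth(), src.getnchannels()) != (16000, 2, 1):
            raise ValueError("expected 16 kHz, 16-bit, mono PCM")
        pcm = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV samples are little-endian
    down = array.array("h", decimate_by_2(pcm))
    if sys.byteorder == "big":
        down.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(8000)
        dst.writeframes(down.tobytes())
```

Dropping every second sample without the filter would fold energy above 4 kHz back into the band as aliasing. In practice a library resampler such as sox or `scipy.signal.resample_poly` does the same job much faster.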

We suspected that Google's accuracy dropped because of the 8 kHz format, so we ran a second experiment with the Google Speech API on the original audio.

Experiment 2:
The original Speech Commands audio (16 kHz) was used without conversion.

Google API accuracy: 92%

We also ran the test on several other private datasets. In every case, Kookoo Zena outperformed the Google Speech API.

We have set up a demo number so you can try it out. Please dial

080-4920 2086 (India)

The demo is simple: it asks you to say yes or no and plays back Kookoo's result, then asks you to say yes or no again and plays back Google's result. The demo was built using the Kookoo <recognize> tag.

And this is just the start: we will soon release models for digits, commands, and eventually full transcription. Keep watching :)

The configuration used for the Google Speech API (sample_rate_hertz was 8000 in Experiment 1 and 16000 in Experiment 2):
{
  "config": {
    "profanity_filter": false,
    "encoding": "LINEAR16",
    "speech_contexts": [{"phrases": ["yes", "no"]}],
    "max_alternatives": 1,
    "sample_rate_hertz": 8000,
    "language_code": "en-IN",
    "enable_word_time_offsets": true
  }
}
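For readers who want to reproduce the Google side, a minimal sketch of a request body for the v1 synchronous `speech:recognize` REST method, carrying the same settings with inline base64 audio. The helper names are hypothetical, authentication (API key or OAuth token) and the actual HTTP POST are left out, and the field names follow the post's proto-style snake_case; the REST reference documents lowerCamelCase equivalents such as `sampleRateHertz`, which proto3 JSON parsing generally accepts interchangeably, so switch to those if the endpoint rejects the request.

```python
import base64
import json

# v1 synchronous recognition endpoint; credentials are not handled here.
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(audio_bytes: bytes, sample_rate_hertz: int) -> dict:
    """Request body mirroring the post's config (8000 Hz or 16000 Hz audio)."""
    if sample_rate_hertz not in (8000, 16000):
        raise ValueError("the experiments used 8000 or 16000 Hz audio")
    return {
        "config": {
            "profanity_filter": False,
            "encoding": "LINEAR16",
            # speech_contexts is a repeated field, so it must be a JSON list.
            "speech_contexts": [{"phrases": ["yes", "no"]}],
            "max_alternatives": 1,
            "sample_rate_hertz": sample_rate_hertz,
            "language_code": "en-IN",
            "enable_word_time_offsets": True,
        },
        # Inline audio travels as base64 text inside the JSON body.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def request_json(audio_bytes: bytes, sample_rate_hertz: int) -> str:
    """Serialized body, ready to POST to RECOGNIZE_URL."""
    return json.dumps(build_recognize_request(audio_bytes, sample_rate_hertz))
```

Validating the sample rate up front catches the easy mistake of sending 8 kHz telephony audio labeled as 16 kHz, which would skew the comparison.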
