Kookoo speech engine versus Google speech engine: Round 1

A couple of months back, we launched the Kookoo <recognize> tag. As mentioned in that blog post, it supports both the Google speech engine and our very own Zena engine.
The Kookoo speech engine, as it currently stands, is just a drop in the ocean compared to the features provided by the Google speech engine. But, as they say, a journey of a thousand miles starts with a single step. This post is about our first step.

Today we are launching our first model, the Yes/No model, built in-house and based on a proprietary AI algorithm. You can now design your Kookoo IVRs to say "Please say yes or no" instead of "Please press 1 for yes and 2 for no".

In our internal tests the model has been accurate enough that we believe spoken yes/no can become the standard interaction mechanism in place of DTMF.

We also did a side-by-side comparison of our model with Google's speech API. Before presenting the results below, some disclaimers:

1. Our model is built specifically for Yes/No, while Google's model is a generic ASR. In most cases, specific models work better than generic ones.
2. Google's ASR is in the cloud. Our model is also in the cloud, but since it is co-hosted with Kookoo, Kookoo responses will generally be faster.
3. Our model has not yet been fine-tuned for negative cases (utterances that are neither yes nor no). So, it detects only yes or no.

The data used in the experiments are the yes and no examples from the Google speech commands dataset. Our Yes/No model was not trained on any of the samples in that dataset.

Experiment 1:
Converted the dataset to 8 kHz to fit the telephony format.
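For readers who want to try a similar conversion, here is a minimal sketch using only the Python standard library. It is not the tool we used for the experiment. It assumes the input clips are 16 kHz, mono, 16-bit WAV files, and the function names are made up for illustration. Averaging adjacent samples is only a crude stand-in for a proper resampling filter.

```python
import array
import sys
import wave


def halve_sample_rate(samples):
    """Average each adjacent pair of samples, halving the sample rate.

    Pair averaging is a very crude low-pass filter; a real pipeline
    would normally use a proper resampler instead.
    """
    usable = len(samples) - (len(samples) % 2)  # drop a trailing odd sample
    return [(samples[i] + samples[i + 1]) // 2 for i in range(0, usable, 2)]


def convert_wav_to_8khz(src_path, dst_path):
    """Rewrite a 16 kHz, mono, 16-bit WAV file at 8 kHz."""
    with wave.open(src_path, "rb") as src:
        fmt = (src.getframerate(), src.getnchannels(), src.getsampwidth())
        if fmt != (16000, 1, 2):
            raise ValueError("expected 16 kHz mono 16-bit audio")
        pcm = array.array("h")
        pcm.frombytes(src.readframes(src.getnframes()))

    # WAV samples are little-endian; array uses the machine's byte order.
    if sys.byteorder == "big":
        pcm.byteswap()

    out = array.array("h", halve_sample_rate(pcm))
    if sys.byteorder == "big":
        out.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(8000)
        dst.writeframes(out.tobytes())
```

Calling convert_wav_to_8khz("yes_16k.wav", "yes_8k.wav") (hypothetical file names) would write the 8 kHz version next to the original.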

Google API accuracy: 78.437%
Kookoo Zena accuracy: 96.5%

We felt that Google's accuracy dropped because of the 8 kHz format, so we ran another experiment for the Google speech API.

Experiment 2:
Original speech command dataset
Google API accuracy: 92%
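
The accuracy figures above are percentages of clips recognized correctly. As a rough sketch of that kind of scoring, assuming each engine returns one text result per clip and that a result counts as correct only when it matches the clip's label after trimming and lowercasing (the function name and the matching rule are assumptions, not a description of our test harness):

```python
def accuracy_percent(predictions, labels):
    """Return the percentage of predictions that match their labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one labelled clip")

    def normalize(text):
        # Ignore case and surrounding whitespace when comparing.
        return text.strip().lower()

    correct = sum(
        normalize(pred) == normalize(label)
        for pred, label in zip(predictions, labels)
    )
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    # Toy data for illustration only, not results from the experiments.
    toy_predictions = ["yes", "No ", "yes", "maybe"]
    toy_labels = ["yes", "no", "no", "no"]
    print(accuracy_percent(toy_predictions, toy_labels))
```

With this rule, anything other than an exact yes or no from the engine counts as a miss.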

We also ran the test on several private data sources. In all cases Kookoo Zena outperformed the Google speech API.

We have set up a demo number for you to test it out. Please dial

080-4920 2086 (India)

This is a simple demo. It asks you to say yes or no and plays back the result from Kookoo, then asks you to say yes or no again and plays back the result from Google. The demo was built using the Kookoo <recognize> tag.

And this is just the start. We are going to release models for digits, commands and finally full transcription very soon. Keep watching :)

The configuration used for the Google speech API (sample_rate_hertz was 8000 for Experiment 1 and 16000 for Experiment 2):
{
  "config": {
    "profanity_filter": false,
    "encoding": "LINEAR16",
    "speech_contexts": {"phrases": ["yes", "no"]},
    "max_alternatives": 1,
    "sample_rate_hertz": 8000,
    "language_code": "en-IN",
    "enable_word_time_offsets": true
  }
}
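
Since only the sample rate differs between the two experiments, the request body can be generated rather than edited by hand. The sketch below is one way to do that. The field names and values are copied from the configuration above, while the function name is made up for illustration. The shape of speech_contexts is kept exactly as listed in this post; the official API reference should be checked for the shape your client library expects.

```python
import json


def build_recognition_config(sample_rate_hertz):
    """Build the request config for one experiment's sample rate."""
    if sample_rate_hertz not in (8000, 16000):
        raise ValueError("the experiments used only 8000 or 16000 Hz")
    return {
        "config": {
            "profanity_filter": False,
            "encoding": "LINEAR16",
            # Kept in the same shape as the configuration in this post.
            "speech_contexts": {"phrases": ["yes", "no"]},
            "max_alternatives": 1,
            "sample_rate_hertz": sample_rate_hertz,
            "language_code": "en-IN",
            "enable_word_time_offsets": True,
        }
    }


if __name__ == "__main__":
    # Experiment 1 used 8 kHz audio; Experiment 2 used the original 16 kHz.
    print(json.dumps(build_recognition_config(8000), indent=2))
```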
