
KooKoo speech engine versus Google speech engine - Round 2

TLDR:

Both experiments used the digit examples from the Google Speech Commands dataset.

Experiment 1:
We converted the dataset to 8 kHz to match the telephony format.

Google API accuracy: 74.3%
KooKoo Zena accuracy: 92.4%


We suspected that Google's accuracy dropped because of the 8 kHz format, so we ran a second experiment with the Google speech API.

Experiment 2:
Original Speech Commands dataset, without the 8 kHz conversion.
Google API accuracy: 85.6%

The Long version:
At Ozonetel, we are constantly innovating, and we are back with a new speech model following our Yes/No model.

We all saw Google demo Duplex last week, and it was a pretty cool demo. To put things in perspective, though: cool as the demo was, we think we are still some way from having a system that operates as smoothly as the one shown.
We come to this conclusion mainly from our experiments with the Google speech API. Unless Google is using a model other than the one exposed publicly through the Google speech API, there is still a long way to go, especially because the Google speech API performs poorly on telephony data (8 kHz).


Today, we are launching our second model, the single-digit model, built in-house on a proprietary AI algorithm. You can now design your KooKoo IVRs to say "Please say your order number" instead of "Please enter your order number".

In our internal tests, the model's accuracy has been really good, and we believe that speech can soon become the standard interaction mechanism instead of DTMF.

We also did a side-by-side comparison of our model with Google's speech API. Before presenting the results below, some disclaimers:

1. Our model is a specific model for digits. Google's model is a generic ASR, and in most cases specific models work better than generic ones.
2. Google's ASR is in the cloud. Our model is also in the cloud, but since it is co-hosted with KooKoo, KooKoo responses will generally be faster.


Both experiments used the digit examples from the Google Speech Commands dataset.

Experiment 1:
We converted the dataset to 8 kHz to match the telephony format.

Google API accuracy: 74.3%
KooKoo Zena accuracy: 92.4%
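
For readers who want to reproduce the 8 kHz conversion, here is a minimal sketch in Python. It is not the tooling we used for the experiment; it assumes the source clips are 16 kHz, 16-bit mono WAV files (the format the Speech Commands dataset ships in), and the function names `halve_sample_rate` and `convert_wav_to_8khz` are made up for illustration. Averaging adjacent samples is only a crude low-pass filter, so a proper resampler (sox, ffmpeg, or similar) will give cleaner audio.

```python
import struct
import wave


def halve_sample_rate(samples):
    """Halve the sample rate of a sequence of integer PCM samples.

    Each output sample is the floor of the average of two adjacent input
    samples. This is a crude anti-aliasing step, not a proper resampling
    filter. A trailing odd sample is dropped.
    """
    usable = len(samples) - len(samples) % 2
    return [(samples[i] + samples[i + 1]) // 2 for i in range(0, usable, 2)]


def convert_wav_to_8khz(src_path, dst_path):
    """Read a 16 kHz, 16-bit mono WAV file and write an 8 kHz copy."""
    with wave.open(src_path, "rb") as src:
        if (src.getnchannels(), src.getsampwidth(), src.getframerate()) != (1, 2, 16000):
            raise ValueError("expected 16 kHz, 16-bit mono audio")
        frames = src.readframes(src.getnframes())

    # Unpack little-endian signed 16-bit samples.
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    downsampled = halve_sample_rate(samples)

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(8000)
        dst.writeframes(struct.pack("<%dh" % len(downsampled), *downsampled))
```

Calling `convert_wav_to_8khz("in.wav", "out.wav")` on each digit clip would produce a telephony-rate copy of the dataset.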

We suspected that Google's accuracy dropped because of the 8 kHz format, so we ran a second experiment with the Google speech API.

Experiment 2:
Original Speech Commands dataset, without the 8 kHz conversion.
Google API accuracy: 85.6%
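
The accuracy figures above are percentages of correctly recognized clips. As a rough sketch of how such a score can be computed for single-digit recognition, the Python below compares each transcript against its label after mapping spoken digit words to numerals. The normalization rule and the names `DIGIT_WORDS`, `normalize_digit`, and `accuracy` are assumptions made for illustration, not a description of our exact scoring script.

```python
# Spoken digit words mapped to numerals, so "seven" and "7" count as equal.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def normalize_digit(transcript):
    """Lower-case and trim a transcript, then map a digit word to its numeral."""
    token = transcript.strip().lower()
    return DIGIT_WORDS.get(token, token)


def accuracy(predictions, labels):
    """Return the percentage of predictions that match their labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot score an empty test set")
    correct = sum(
        normalize_digit(p) == normalize_digit(l)
        for p, l in zip(predictions, labels)
    )
    return 100.0 * correct / len(labels)
```

With this rule, a recognizer that returns "to" for a clip labeled "two" is counted as wrong, which is the kind of near-miss a generic ASR can produce on isolated digits.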

We also ran the test on several private data sources. In every case, KooKoo Zena outperformed the Google speech API.


Watch this space for more upcoming speech models.
