TLDR:
The test data in both experiments are the digit examples from the Google Speech Commands dataset.
Experiment 1:
Converted the dataset to 8 kHz to fit the telephony format.
Google API accuracy: 74.3%
KooKoo Zena accuracy: 92.4%
We suspected that Google's accuracy dropped because of the 8 kHz format, so we ran a second experiment for the Google Speech API only.
Experiment 2:
Original Speech Commands dataset, without conversion.
Google API accuracy: 85.6%
The Long version:
At Ozonetel, we are constantly innovating, and we are back with a new speech model following the Yes/No model.
We all saw Google demo Duplex last week, and it was a pretty cool demo. To put things in perspective, though: we think we are still some way from having a system that operates as smoothly as the one shown in the demo.
We came to this conclusion mainly from our experiments with the Google Speech API. Unless Google is using a model other than the one exposed publicly through the Google Speech API, there is still a long way to go, especially because the Google Speech API performs poorly on telephony data (8 kHz).
Today, we are launching our second model, the single-digit model, built in-house and based on a proprietary AI algorithm. You can now design your KooKoo IVRs to say "Please say your order number" instead of "Please enter your order number".
In our internal tests the model's accuracy has been really good, and we believe this can soon become the standard interaction mechanism instead of DTMF.
We also did a side-by-side comparison of our model with Google's Speech API. Before presenting the results below, some disclaimers:
1. Our model is a specific model for digits. Google's model is more of a generic ASR, and in most cases specific models work better than generic ones.
2. Google's ASR is in the cloud. Our model is also in the cloud, but since it is co-hosted with KooKoo, KooKoo responses will generally be faster.
The test data in both experiments are the digit examples from the Google Speech Commands dataset.
Experiment 1:
Converted the dataset to 8 kHz to fit the telephony format.
Google API accuracy: 74.3%
KooKoo Zena accuracy: 92.4%
We suspected that Google's accuracy dropped because of the 8 kHz format, so we ran a second experiment for the Google Speech API only.
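For readers who want to reproduce the 8 kHz conversion, here is a minimal sketch. It assumes the Speech Commands clips are 16 kHz mono, so the conversion is a factor-of-2 downsample, and it is not necessarily the tool we used: the function names `lowpass_taps` and `downsample_by_2` are illustrative, and the filter length and cutoff are arbitrary choices. The idea is to low-pass the signal below the new 4 kHz Nyquist limit and then keep every second sample.

```python
import math


def lowpass_taps(num_taps=31, cutoff=0.22):
    """Windowed-sinc FIR low-pass filter (Hamming window).

    cutoff is a fraction of the INPUT sample rate; 0.22 * 16 kHz is
    about 3.5 kHz, just under the 4 kHz Nyquist limit of 8 kHz audio.
    Both defaults are illustrative assumptions.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # unity gain at 0 Hz


def downsample_by_2(samples, taps=None):
    """Low-pass filter the samples, then keep every second one."""
    taps = taps or lowpass_taps()
    half = len(taps) // 2
    out = []
    for i in range(0, len(samples), 2):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + k - half
            if 0 <= j < len(samples):  # treat samples outside the clip as silence
                acc += t * samples[j]
        out.append(acc)
    return out


# A one-second clip at 16 kHz becomes 8000 samples at 8 kHz.
one_second = [0.0] * 16000
print(len(downsample_by_2(one_second)))
```

In practice a dedicated resampler from an audio library would be faster and more accurate; the sketch only shows what the conversion step does to the audio.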
Experiment 2:
Original Speech Commands dataset, without conversion.
Google API accuracy: 85.6%
We also ran the test on multiple other private data sources. In all cases KooKoo Zena outperformed the Google Speech API.
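The accuracy figures above are the share of clips whose recognised digit matches the label. The sketch below shows one plausible way to score a generic ASR on this task, assuming its transcripts may come back as either a word ("seven") or a numeral ("7"). The helper names `normalise` and `accuracy` and the normalisation rules are assumptions made for illustration, not a description of our exact scoring script.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
WORD_TO_DIGIT = {word: str(i) for i, word in enumerate(DIGIT_WORDS)}


def normalise(transcript):
    """Map an ASR transcript to a digit string '0'..'9', or None if it is not a digit."""
    token = transcript.strip().lower().strip(".,!?")
    if token in WORD_TO_DIGIT:
        return WORD_TO_DIGIT[token]
    if token in WORD_TO_DIGIT.values():
        return token
    return None


def accuracy(transcripts, labels):
    """Percentage of clips whose normalised transcript equals the label."""
    if len(transcripts) != len(labels):
        raise ValueError("transcripts and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = sum(normalise(t) == y for t, y in zip(transcripts, labels))
    return 100.0 * correct / len(labels)


# Two of four clips are recognised correctly: "Seven." and "3".
print(accuracy(["Seven.", "3", "tree", ""], ["7", "3", "3", "0"]))  # 50.0
```

Empty or out-of-vocabulary transcripts count as errors here, which is the stricter choice for a generic ASR that can return any word.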
Watch this space for more upcoming speech models.