TLDR:
The test data for these experiments are the digit utterances from the Google Speech Commands dataset.

Experiment 1: We converted the dataset to 8 kHz to match the telephony format.
Google API accuracy: 74.3%
KooKoo Zena accuracy: 92.4%

We suspected that Google's accuracy dropped because of the 8 kHz format, so we ran a second experiment with the Google Speech API.

Experiment 2: Original Speech Commands dataset
Google API accuracy: 85.6%

The long version:
At Ozonetel, we are constantly innovating, and we are back with a new speech model after our Yes/No model. We all saw Google demo Duplex last week, and it was a pretty cool demo. To put things in perspective, though: cool as the demo was, we think we are still some way from a system that actually operates as smoothly as the one shown. We came to this conclusion mainly from our experiments with the Google Speech API. Unless Google is using some model other than the one exposed publicly ...
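The post does not say how the 8 kHz conversion or the accuracy figures were produced. Below is a minimal pure-Python sketch, assuming 16 kHz, 16-bit mono WAV input (the format the Speech Commands clips ship in). It applies a windowed-sinc low-pass filter that keeps content below roughly 3.6 kHz, then keeps every second sample. The scoring helper assumes accuracy means an exact match between the recognizer's transcript and the clip's digit label. All function names, the filter parameters, and the digit-word normalization are hypothetical illustrations, not Ozonetel's actual pipeline.

```python
import math
import struct
import wave

def lowpass_taps(num_taps: int = 63, cutoff: float = 0.225) -> list[float]:
    """Hamming-windowed sinc low-pass FIR. `cutoff` is a fraction of the input
    rate: 0.225 * 16 kHz = 3.6 kHz, safely below the 4 kHz Nyquist limit of 8 kHz."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response sin(2*pi*fc*x) / (pi*x), equal to 2*fc at x = 0.
        h = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hamming window
        taps.append(h * w)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC

def downsample_by_2(samples: list[int], taps: list[float]) -> list[int]:
    """Anti-alias filter, then keep every second sample (16 kHz -> 8 kHz)."""
    half = len(taps) // 2
    out = []
    for i in range(0, len(samples), 2):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k  # centered convolution, so no delay is introduced
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(max(-32768, min(32767, round(acc))))  # clip to int16 range
    return out

def convert_wav_to_8k(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: rewrite a 16 kHz mono 16-bit WAV as an 8 kHz WAV."""
    with wave.open(src_path, "rb") as src:
        if (src.getframerate(), src.getsampwidth(), src.getnchannels()) != (16000, 2, 1):
            raise ValueError("expected 16 kHz, 16-bit mono PCM")
        raw = src.readframes(src.getnframes())
    samples = list(struct.unpack(f"<{len(raw) // 2}h", raw))
    out = downsample_by_2(samples, lowpass_taps())
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(8000)
        dst.writeframes(struct.pack(f"<{len(out)}h", *out))

# Assumed normalization: a recognizer may return "2" where the label is "two".
DIGIT_WORDS = {str(d): w for d, w in enumerate(
    ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"])}

def normalize(text: str) -> str:
    t = text.strip().lower()
    return DIGIT_WORDS.get(t, t)

def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of clips whose normalized transcript equals the label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("need one prediction per label")
    hits = sum(normalize(p) == normalize(l) for p, l in zip(predictions, labels))
    return hits / len(labels)
```

For example, `accuracy(["two", "3", "Nine"], ["two", "three", "five"])` counts two matches out of three clips. A production pipeline would more likely use a library resampler such as SoX or `scipy.signal.resample_poly`; the hand-rolled filter here just makes the two steps, anti-alias filtering and decimation, explicit.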
Our hits and misses while building a scalable cloud telephony platform KooKoo, Ozonetel.