
Telugu ASR speech data collection

Image Source: IIIT-H

Developing an indigenous automatic speech recognition (ASR) system for Indian languages has been a goal of ours for a long time. To that end, we have been experimenting extensively with various neural network architectures.

During these experiments, we found that there was no good speech dataset for Indian languages. In discussions with professors at IIIT, we learned that the Government of India was also exploring ways to build one. We immediately offered our help and our platform for this endeavor.

So, as a first step, we have launched a few campaigns to encourage users to donate speech data. We wanted to make it fun, so our first campaigns follow the JAM (Just a Minute) format: you are given a topic and you speak on it for one minute.

We have started this campaign with college students. Of course, anyone can participate and contribute their data. The more, the merrier :)

We will be adding many more innovative ways, across more channels, to encourage users to contribute their voice data. Keep checking back.

Once it is transcribed, the government will make this data publicly available to everyone, free of charge.

The following blurb from the IIIT-H website explains the goal much better:


We are a land of many spoken languages. Our people have huge aspirations. We are a growing nation.
"Should language come in the way of our people's aspirations?"

As part of one of the national missions of the Government of India, we are developing technologies specifically for Indian languages to enable speech-to-speech translation.

For that, data quality and quantity are critical. We need high-quality language data to develop systems that can understand and respond to human speech in a variety of environments and contexts.

We at IIIT-Hyderabad have embarked on a nationwide project to collect conversational speech from the open population across multiple languages (100,000 hrs/language).

We are seeking participation from as many people across the country as possible.

