SPRING-INX Data

We are happy to release 2000 hours of manually transcribed speech data in 10 Indian languages for ASR applications.

Manually Transcribed Multilingual Indian Speech Corpus
We are releasing speech data in 10 Indian languages to encourage members of academia and industry to build speech applications for Indian languages. The total data amounts to 2000 hours. You can find more details in the arXiv paper.
Languages    Data_R1     Data_R2
Assamese     Download    Download
Bengali      Download    Download
Gujarati     Download    Download
Hindi        Download    Download
Kannada      Download    -
Malayalam    Download    Download
Marathi      Download    Download
Odia         Download    -
Punjabi      Download    Download
Tamil        Download    Download
Statistics of the Data
All values are in hours.

             Data_R1 (hrs)                          Data_R2 (hrs)
Languages    train     dev      eval    Total_R1    train     dev      eval    Total_R2    Total
Assamese     50.56     5.12     4.97    60.65       35.35     5.09     4.99    45.43       106.08
Bengali      374.73    40.02    5.02    419.77      272.98    30.06    5.04    308.08      727.85
Gujarati     175.49    19.61    4.95    200.05      58.27     7.19     5.09    70.55       270.6
Hindi        316.41    29.68    5.09    351.18      137.11    16.23    5.08    158.42      509.6
Kannada      82.49     9.66     4.79    96.94       -         -        -       0           96.94
Malayalam    214.73    24.72    5.07    244.52      68.8      8.16     5.1     82.06       326.58
Marathi      130.36    14.37    5.16    149.89      208.96    24.69    5.15    238.8       388.69
Odia         82.49     9.26     4.73    96.48       -         -        -       0           96.48
Punjabi      138.96    15.09    5.08    159.13      216.36    24.88    5.03    246.27      405.4
Tamil        200.66    19.97    5.1     225.73      128.11    15.12    5.17    148.4       374.13
Total        1766.88   187.5    49.96   2004.34     1125.94   131.42   40.65   1298.01     3302.35
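The totals in the table follow directly from the per-split hours: each release total is train + dev + eval, and each language total is Total_R1 + Total_R2. The short Python sketch below recomputes them from the figures in the table. The names `HOURS`, `release_total`, `language_total`, and `corpus_totals` are illustrative only and are not part of any released tooling; a language with no R2 data is assumed to contribute 0 hours.

```python
# Hours per split as (train, dev, eval), copied from the statistics table.
# None marks a release with no data for that language (assumed to count as 0).
HOURS = {
    "Assamese":  ((50.56, 5.12, 4.97),   (35.35, 5.09, 4.99)),
    "Bengali":   ((374.73, 40.02, 5.02), (272.98, 30.06, 5.04)),
    "Gujarati":  ((175.49, 19.61, 4.95), (58.27, 7.19, 5.09)),
    "Hindi":     ((316.41, 29.68, 5.09), (137.11, 16.23, 5.08)),
    "Kannada":   ((82.49, 9.66, 4.79),   None),
    "Malayalam": ((214.73, 24.72, 5.07), (68.80, 8.16, 5.10)),
    "Marathi":   ((130.36, 14.37, 5.16), (208.96, 24.69, 5.15)),
    "Odia":      ((82.49, 9.26, 4.73),   None),
    "Punjabi":   ((138.96, 15.09, 5.08), (216.36, 24.88, 5.03)),
    "Tamil":     ((200.66, 19.97, 5.10), (128.11, 15.12, 5.17)),
}


def release_total(split_hours):
    """Sum train + dev + eval hours; a missing release contributes 0."""
    return round(sum(split_hours), 2) if split_hours else 0.0


def language_total(name):
    """Total hours for one language across both releases."""
    r1, r2 = HOURS[name]
    return round(release_total(r1) + release_total(r2), 2)


def corpus_totals():
    """Return (Total_R1, Total_R2, grand total) over all languages."""
    total_r1 = round(sum(release_total(r1) for r1, _ in HOURS.values()), 2)
    total_r2 = round(sum(release_total(r2) for _, r2 in HOURS.values()), 2)
    return total_r1, total_r2, round(total_r1 + total_r2, 2)


if __name__ == "__main__":
    for name in HOURS:
        print(f"{name:<10} {language_total(name):8.2f}")
    print(corpus_totals())
```

Results are rounded to two decimals to match the precision used in the table, so the recomputed values can be compared against the Total_R1, Total_R2, and Total columns.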
Source of Data
This data was collected on a paid basis through the following vendors: Mediscribe India, Desicrew, and Crescendo. Preliminary quality checks of the transcriptions were done by our partners at KL University as well as by SPRING Lab members. The data consists mostly of mock conversations and monologues on different topics.
Indian Language ASR Challenge Data (490 hours): Links
Indian Language Data Balance, excluding the Challenge Data (410 hours): Links
  English (110 hrs): Audio, Transcripts
  Hindi (262 hrs): Audio, Transcripts
  Tamil (38 hrs): Audio, Transcripts
NPTEL Data (1200 hours): Links
  Computer Science (250 hrs): Audio, Transcripts
  Electrical (250 hrs): Audio, Transcripts
  Humanities (250 hrs): Audio, Transcripts
  Mechanical (250 hrs): Audio, Transcripts
  BioChem (200 hrs): Audio, Transcripts
Funding
This data collection effort was funded by the Ministry of Electronics & Information Technology.
License
This data is released under the CC BY 4.0 license.
SPRING Lab
Revolutionizing the way we communicate through innovation and research in speech technologies.
Contacts
Room No. CSD-313, ESB Building,
Dept of Electrical Engineering,
Indian Institute of Technology - Madras,
Chennai 600 036