
Automatic Speech Recognition Models

By LINAGORA - French, English, Arabic

We provide models for only a few languages, but we do it right, achieving beyond-state-of-the-art performance and accuracy for French, Arabic and English.

Tip

These are our most generic models, with the best overall performance. We also maintain specific acoustic models for business use cases such as heavily noisy environments, aeroplanes, phones and call centers, as well as decoding graphs for specialised vocabularies such as medical or banking. Contact us to learn more.

Acoustic model

  • A deep Time Delay Neural Network (TDNN) model, trained on a large corpus of spontaneous speech. Data augmentation was applied to increase the quantity of training data and to artificially simulate some environmental conditions (noise, speaker). The full corpus after data augmentation is approximately 7100 hours.

2.0.0 AM download

  • A deep neural network architecture (~30M parameters). This model is trained on the same data (7100 hours).

2.2.0 AM download
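
The page does not say how the data augmentation for these acoustic models was implemented. As an illustration only, one common way to simulate noisy conditions is to mix a noise recording into clean speech at a chosen signal-to-noise ratio (SNR). The sketch below assumes that approach; the function names `mix_at_snr` and `measure_snr_db` are hypothetical and the signals are toy data, not LINAGORA's pipeline.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus scaled noise so that the mix has the requested SNR in dB.

    Hypothetical helper for illustration; samples are plain floats.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both be non-silent")
    # SNR_dB = 10 * log10(P_speech / P_scaled_noise), solved for the noise gain.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def measure_snr_db(speech, mixed):
    """Measure the SNR of a mix, given the clean speech it was built from."""
    residual = [m - s for s, m in zip(speech, mixed)]
    speech_power = sum(s * s for s in speech)
    noise_power = sum(r * r for r in residual)
    return 10 * math.log10(speech_power / noise_power)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy signals: a 440 Hz tone standing in for speech, Gaussian noise.
    speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
    noisy = mix_at_snr(speech, noise, snr_db=10.0)
    print(f"measured SNR: {measure_snr_db(speech, noisy):.2f} dB")
```

Applying the same clean utterance at several SNR values yields several training examples from one recording, which is one way such augmentation can increase the number of training hours.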

Decoding graph

  • This model is trained on multiple text corpora from different sources. It requires a large amount of memory but provides very accurate transcription.

2.1.0 LM download

  • This model is trained on various large corpora. It should provide the best accuracy but is somewhat more resource-intensive than the other models.

2.2.0 LM download

Community built models & Other languages

| Model | Size | Word error rate / Speed | Notes | License |
|---|---|---|---|---|
| English | | | | |
| vosk-model-small-en-us-0.15 | 40M | 9.85 (librispeech test-clean) 10.38 (tedlium) | Lightweight wideband model for Android and RPi | Apache 2.0 |
| vosk-model-en-us-0.22 | 1.8G | 5.69 (librispeech test-clean) 6.05 (tedlium) 29.78 (callcenter) | Accurate generic US English model | Apache 2.0 |
| vosk-model-en-us-0.22-lgraph | 128M | 7.82 (librispeech) 8.20 (tedlium) | Big US English model with dynamic graph | Apache 2.0 |
| English Other Older Models | | | | |
| vosk-model-en-us-daanzu-20200905 | 1.0G | 7.08 (librispeech test-clean) 8.25 (tedlium) | Wideband model for dictation from Kaldi-active-grammar project | AGPL |
| vosk-model-en-us-daanzu-20200905-lgraph | 129M | 8.20 (librispeech test-clean) 9.28 (tedlium) | Wideband model for dictation from Kaldi-active-grammar project with configurable graph | AGPL |
| vosk-model-en-us-librispeech-0.2 | 845M | TBD | Repackaged Librispeech model from Kaldi, not very accurate | Apache 2.0 |
| vosk-model-small-en-us-zamia-0.5 | 49M | 11.55 (librispeech test-clean) 12.64 (tedlium) | Repackaged Zamia model f_250, mainly for research | LGPL-3.0 |
| vosk-model-en-us-aspire-0.2 | 1.4G | 13.64 (librispeech test-clean) 12.89 (tedlium) 33.82 (callcenter) | Kaldi original ASPIRE model, not very accurate | Apache 2.0 |
| vosk-model-en-us-0.21 | 1.6G | 5.43 (librispeech test-clean) 6.42 (tedlium) 40.63 (callcenter) | Wideband model previous generation | Apache 2.0 |
| Indian English | | | | |
| vosk-model-en-in-0.5 | 1G | 36.12 (NPTEL Pure) | Generic Indian English model for telecom and broadcast | Apache 2.0 |
| vosk-model-small-en-in-0.4 | 36M | 49.05 (NPTEL Pure) | Lightweight Indian English model for mobile applications | Apache 2.0 |
| Chinese | | | | |
| vosk-model-small-cn-0.22 | 42M | 23.54 (SpeechIO-02) 38.29 (SpeechIO-06) 17.15 (THCHS) | Lightweight model for Android and RPi | Apache 2.0 |
| vosk-model-cn-0.22 | 1.3G | 13.98 (SpeechIO-02) 27.30 (SpeechIO-06) 7.43 (THCHS) | Big generic Chinese model for server processing | Apache 2.0 |
| Chinese Other | | | | |
| vosk-model-cn-kaldi-multicn-0.15 | 1.5G | 17.44 (SpeechIO-02) 9.56 (THCHS) | Original Wideband Kaldi multi-cn model from Kaldi with Vosk LM | Apache 2.0 |
| Russian | | | | |
| vosk-model-ru-0.22 | 1.5G | 5.74 (our audiobooks) 13.35 (open_stt audiobooks) 20.73 (open_stt youtube) 37.38 (openstt calls) 8.65 (golos crowd) 19.71 (sova devices) | Big mixed band Russian model for server processing | Apache 2.0 |
| vosk-model-small-ru-0.22 | 45M | 22.71 (openstt audiobooks) 31.97 (openstt youtube) 29.89 (sova devices) 11.79 (golos crowd) | Lightweight wideband model for Android/iOS and RPi | Apache 2.0 |
| Russian Other | | | | |
| vosk-model-ru-0.10 | 2.5G | 5.71 (our audiobooks) 16.26 (open_stt audiobooks) 26.20 (public_youtube_700_val open_stt) 40.15 (asr_calls_2_val open_stt) | Big narrowband Russian model for server processing | Apache 2.0 |
| French | | | | |
| vosk-model-small-fr-0.22 | 41M | 23.95 (cv test) 19.30 (mtedx) 27.25 (podcast) | Lightweight wideband model for Android/iOS and RPi | Apache 2.0 |
| vosk-model-fr-0.22 | 1.4G | 14.72 (cv test) 11.64 (mls) 13.10 (mtedx) 21.61 (podcast) 13.22 (voxpopuli) | Big accurate model for servers | Apache 2.0 |
| French Other | | | | |
| vosk-model-small-fr-pguyot-0.3 | 39M | 37.04 (cv test) 28.72 (mtedx) 37.46 (podcast) | Lightweight wideband model for Android and RPi trained by Paul Guyot | CC-BY-NC-SA 4.0 |
| vosk-model-fr-0.6-linto-2.2.0 | 1.5G | 16.19 (cv test) 16.44 (mtedx) 23.77 (podcast) 0.4xRT | Model from LINTO project | AGPL |
| German | | | | |
| vosk-model-de-0.21 | 1.9G | 9.83 (Tuda-de test) 24.00 (podcast) 12.82 (cv-test) 12.42 (mls) 33.26 (mtedx) | Big German model for telephony and server | Apache 2.0 |
| vosk-model-de-tuda-0.6-900k | 4.4G | 9.48 (Tuda-de test) 25.82 (podcast) 4.97 (cv-test) 11.01 (mls) 35.20 (mtedx) | Latest big wideband model from Tuda-DE project | Apache 2.0 |
| vosk-model-small-de-zamia-0.3 | 49M | 14.81 (Tuda-de test) 37.46 (podcast) | Zamia f_250 small model repackaged (not recommended) | LGPL-3.0 |
| vosk-model-small-de-0.15 | 45M | 13.75 (Tuda-de test) 30.67 (podcast) | Lightweight wideband model for Android and RPi | Apache 2.0 |
| Spanish | | | | |
| vosk-model-small-es-0.42 | 39M | 16.02 (cv test) 16.72 (mtedx test) 11.21 (mls) | Lightweight wideband model for Android and RPi | Apache 2.0 |
| vosk-model-es-0.42 | 1.4G | 7.50 (cv test) 10.05 (mtedx test) 5.84 (mls) | Big model for Spanish | Apache 2.0 |
| Portuguese/Brazilian Portuguese | | | | |
| vosk-model-small-pt-0.3 | 31M | 68.92 (coraa dev) 32.60 (cv test) | Lightweight wideband model for Android and RPi | Apache 2.0 |
| vosk-model-pt-fb-v0.1.1-20220516_2113 | 1.6G | 54.34 (coraa dev) 27.70 (cv test) | Big model from FalaBrazil | GPLv3.0 |
| Greek | | | | |
| vosk-model-el-gr-0.7 | 1.1G | TBD | Big narrowband Greek model for server processing, not extremely accurate though | Apache 2.0 |
| Turkish | | | | |
| vosk-model-small-tr-0.3 | 35M | TBD | Lightweight wideband model for Android and RPi | Apache 2.0 |
| Vietnamese | | | | |
| vosk-model-small-vn-0.3 | 32M | TBD | Lightweight wideband model for Android and RPi | Apache 2.0 |
| Italian | | | | |
| vosk-model-small-it-0.22 | 48M | 16.88 (cv test) 25.87 (mls) 17.01 (mtedx) | Lightweight model for Android and RPi | Apache 2.0 |
| vosk-model-it-0.22 | 1.2G | 8.10 (cv test) 15.68 (mls) 11.23 (mtedx) | Big generic Italian model for servers | Apache 2.0 |
| Dutch | | | | |
| vosk-model-small-nl-0.22 | 39M | 22.45 (cv test) 26.80 (tv) 25.84 (mls) 24.09 (voxpopuli) | Lightweight model for Dutch | Apache 2.0 |
| Dutch Other | | | | |
| vosk-model-nl-spraakherkenning-0.6 | 860M | 20.40 (cv test) 32.64 (tv) 17.73 (mls) 19.96 (voxpopuli) | Medium Dutch model from Kaldi_NL | CC-BY-NC-SA |
| vosk-model-nl-spraakherkenning-0.6-lgraph | 100M | 22.82 (cv test) 34.01 (tv) 18.81 (mls) 21.01 (voxpopuli) | Smaller model with dynamic graph | CC-BY-NC-SA |
| Catalan | | | | |
| vosk-model-small-ca-0.4 | 42M | TBD | Lightweight wideband model for Android and RPi for Catalan | Apache 2.0 |
| Arabic | | | | |
| vosk-model-ar-mgb2-0.4 | 318M | 16.40 (MGB-2 dev set) | Repackaged Arabic model trained on MGB2 dataset from Kaldi | Apache 2.0 |
| Farsi | | | | |
| vosk-model-small-fa-0.4 | 47M | TBD | Lightweight wideband model for Android and RPi for Farsi (Persian) | Apache 2.0 |
| vosk-model-fa-0.5 | 1G | TBD | Model with large vocabulary, not yet accurate but better than before (Persian) | Apache 2.0 |
| vosk-model-small-fa-0.5 | 60M | TBD | Bigger small model for desktop application (Persian) | Apache 2.0 |
| Filipino | | | | |
| vosk-model-tl-ph-generic-0.6 | 320M | TBD | Medium wideband model for Filipino (Tagalog) by feddybear | CC-BY-NC-SA 4.0 |
| Ukrainian | | | | |
| vosk-model-small-uk-v3-nano | 73M | TBD | Nano model from Speech Recognition for Ukrainian | Apache 2.0 |
| vosk-model-small-uk-v3-small | 133M | TBD | Small model from Speech Recognition for Ukrainian | Apache 2.0 |
| vosk-model-uk-v3 | 343M | TBD | Bigger model from Speech Recognition for Ukrainian | Apache 2.0 |
| vosk-model-uk-v3-lgraph | 325M | TBD | Big dynamic model from Speech Recognition for Ukrainian | Apache 2.0 |
| Kazakh | | | | |
| vosk-model-small-kz-0.15 | 42M | 9.60 (dev) 8.32 (test) | Small mobile model from SAIDA_Kazakh | Apache 2.0 |
| vosk-model-kz-0.15 | 378M | 8.06 (dev) 6.81 (test) | Bigger wideband model SAIDA_Kazakh | Apache 2.0 |
| Swedish | | | | |
| vosk-model-small-sv-rhasspy-0.15 | 289M | TBD | Repackaged model from Rhasspy project | MIT |
| Japanese | | | | |
| vosk-model-small-ja-0.22 | 48M | 9.52 (csj CER) 17.07 (ted10k CER) | Lightweight wideband model for Japanese | Apache 2.0 |
| vosk-model-ja-0.22 | 1G | 8.40 (csj CER) 13.91 (ted10k CER) | Big model for Japanese | Apache 2.0 |
| Esperanto | | | | |
| vosk-model-small-eo-0.42 | 42M | 7.24 (CV Test) | Lightweight model for Esperanto | Apache 2.0 |
| Hindi | | | | |
| vosk-model-small-hi-0.22 | 42M | 20.89 (IITM Challenge) 24.72 (MUCS Challenge) | Lightweight model for Hindi | Apache 2.0 |
| vosk-model-hi-0.22 | 1.5G | 14.85 (CV Test) 14.83 (IITM Challenge) 13.11 (MUCS Challenge) | Big accurate model for servers | Apache 2.0 |
| Czech | | | | |
| vosk-model-small-cs-0.4-rhasspy | 44M | 21.29 (CV Test) | Lightweight model for Czech from Rhasspy project | MIT |
| Polish | | | | |
| vosk-model-small-pl-0.22 | 50.5M | 18.36 (CV Test) 16.88 (MLS Test) 11.55 (Voxpopuli Test) | Lightweight model for Polish for Android | Apache 2.0 |
| Speaker identification model | | | | |
| vosk-model-spk-0.4 | 13M | TBD | Model for speaker identification, should work for all languages | Apache 2.0 |
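
The table reports a word error rate (WER) per test set, a character error rate (CER) for Japanese, and one speed figure (0.4xRT). For readers unfamiliar with these metrics, the sketch below shows how they are conventionally computed: WER is the word-level edit distance between the reference and the recognised text, divided by the number of reference words. It assumes the table's figures are percentages and that "xRT" means processing time divided by audio duration; the function names are illustrative, and the text normalisation used for the published scores is not documented here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference token
                            curr[j - 1] + 1,      # insertion of a hypothesis token
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent; tokens are whitespace-separated words."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference, hypothesis):
    """Character error rate in percent; spaces are ignored (an assumption)."""
    ref = list(reference.replace(" ", ""))
    if not ref:
        raise ValueError("reference must contain at least one character")
    hyp = list(hypothesis.replace(" ", ""))
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time relative to audio duration; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    print(f"WER: {wer('the cat sat on the mat', 'the cat sit on mat'):.2f}")
    print(f"CER: {cer('こんにちは', 'こんばんは'):.2f}")
    print(f"RTF: {real_time_factor(4.0, 10.0):.1f}")
```

Lower is better for all three metrics. Scores are only comparable within the same test set, which is why each figure in the table is followed by its test set name.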