Automatic Speech Recognition Models
By LINAGORA - French, English, Arabic
We provide models for only a few languages, but we do them well, achieving state-of-the-art performance and accuracy for French, Arabic, and English.
tip
These models are the most generic ones and achieve the best overall performance. We also maintain specific acoustic models for business use cases such as heavily noisy environments, aeroplanes, phones, and call centers, as well as decoding graphs for specific vocabularies such as medical or banking. Contact us to learn more.
- French v2
- French v1
- English US
- Arabic
French v2
Acoustic model
- A deep Time Delay Neural Network (TDNN) model, trained on a large corpus of spontaneous speech. Data augmentation was applied to increase the quantity of training data and to artificially simulate some environment conditions (noise, speaker variability); a sketch of the noise-mixing step appears after this list. The full corpus after data augmentation is approximately 7100 hours.
- A deep neural network architecture (~30M parameters). This model is trained on the same data (7100 hours).
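To make the augmentation step concrete, here is a minimal sketch of mixing background noise into a clean recording at a chosen signal-to-noise ratio. The file names and the `soundfile`/`numpy` packages are assumptions of this example, not a description of the actual training pipeline.

```python
import numpy as np
import soundfile as sf  # assumed audio I/O library for this sketch

def mix_noise(speech, noise, snr_db):
    """Mix background noise into clean (mono) speech at a target SNR in dB."""
    # Loop the noise if it is shorter than the speech, then trim to length.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[:len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

speech, sr = sf.read("clean_utterance.wav")  # hypothetical clean recording
noise, _ = sf.read("background_noise.wav")   # hypothetical noise recording
sf.write("augmented_utterance.wav", mix_noise(speech, noise, snr_db=10), sr)
```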
Decoding graph
- This model is trained on multiple text corpora from different sources. It requires significant memory but provides very accurate transcription.
- This model is trained on various large corpora. It should provide the best accuracy but is a bit more resource-intensive than the other models.
French v1
Acoustic model
- A deep Time Delay Neural Network (TDNN) model, trained on 1700 hours of spontaneous speech. It is resistant to background noise, and a speaker adaptation model is used to make predictions robust to speaker variability.
Decoding graph
- This model is trained on a small corpus. It is a small model (100 MB) that generates acceptable transcriptions and is well suited to embedded applications.
- This model is trained on a much larger corpus than the small one. It requires significant memory but provides very accurate transcription.
- This model is trained on various large corpora. It should provide the best accuracy but is a bit more resource-intensive than the other models.
English US
Acoustic model
- A chain model based on TDNN-F, trained on 1000 hours of speech with volume and speed perturbation (a sketch of these perturbations appears below).
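As an illustration only, speed perturbation is commonly implemented by resampling the waveform (factors such as 0.9 and 1.1, as in Kaldi recipes) and volume perturbation by scaling the amplitude. A minimal sketch, assuming mono WAV files and the `soundfile` package (both assumptions of this example):

```python
import numpy as np
import soundfile as sf  # assumed audio I/O library for this sketch

def speed_perturb(samples, factor):
    """Resample-based speed change: factor > 1 shortens (speeds up) the audio."""
    positions = np.arange(0, len(samples) - 1, factor)
    return np.interp(positions, np.arange(len(samples)), samples)

def volume_perturb(samples, gain):
    """Scale the amplitude, clipping to the valid [-1, 1] range."""
    return np.clip(samples * gain, -1.0, 1.0)

speech, sr = sf.read("utterance.wav")  # hypothetical mono training utterance
for factor in (0.9, 1.1):              # typical speed-perturbation factors
    sf.write(f"utterance_sp{factor}.wav", speed_perturb(speech, factor), sr)
sf.write("utterance_vol.wav", volume_perturb(speech, 0.7), sr)
```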
Decoding graph
- Two language models are used for decoding: a medium model performs the first decoding pass, and a big model, trained on a large corpus of books, performs the rescoring pass (see the sketch after this list).
- This model is trained on various large corpora. It should provide the best accuracy but is a bit more resource-intensive than the other models.
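To make the two-pass idea concrete, here is a toy re-ranking sketch over an n-best list. It is only an illustration: the real pipeline rescores Kaldi lattices, and `big_lm_score` below is a dummy stand-in for the book-trained language model.

```python
def rescore_nbest(nbest, big_lm_score, lm_weight=0.7):
    """Re-rank first-pass hypotheses by replacing the small-LM score
    with a score from the larger language model."""
    rescored = [(acoustic + lm_weight * big_lm_score(text), text)
                for text, acoustic, _small_lm in nbest]
    return [text for _, text in sorted(rescored, reverse=True)]

def big_lm_score(text):
    # Dummy stand-in: in practice this would query the big LM trained on books.
    return -0.5 * len(text.split())

# (hypothesis, acoustic log-score, first-pass LM log-score)
nbest = [("i scream you scream", -22.4, -6.3),
         ("ice cream you scream", -22.6, -5.1)]
print(rescore_nbest(nbest, big_lm_score))
```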
Community-built models & other languages
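The models below use the standard Vosk (Kaldi) model format and can be loaded with the `vosk` Python package. A minimal transcription sketch, assuming an unpacked model directory and a 16 kHz mono PCM WAV file (both assumptions of this example):

```python
import json
import wave
from vosk import Model, KaldiRecognizer

wf = wave.open("audio_16k_mono.wav", "rb")    # hypothetical input file
model = Model("vosk-model-small-en-us-0.15")  # path to an unpacked model directory
rec = KaldiRecognizer(model, wf.getframerate())
rec.SetWords(True)                            # include word-level timing in results

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        print(json.loads(rec.Result())["text"])   # finalized segment
print(json.loads(rec.FinalResult())["text"])       # whatever audio remains
```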
Model | Size | Word error rate/Speed | Notes | License |
---|---|---|---|---|
English | ||||
vosk-model-small-en-us-0.15 | 40M | 9.85 (librispeech test-clean) 10.38 (tedlium) | Lightweight wideband model for Android and RPi | Apache 2.0 |
vosk-model-en-us-0.22 | 1.8G | 5.69 (librispeech test-clean) 6.05 (tedlium) 29.78 (callcenter) | Accurate generic US English model | Apache 2.0 |
vosk-model-en-us-0.22-lgraph | 128M | 7.82 (librispeech) 8.20 (tedlium) | Big US English model with dynamic graph | Apache 2.0 |
English Other | Older Models | |||
vosk-model-en-us-daanzu-20200905 | 1.0G | 7.08 (librispeech test-clean) 8.25 (tedlium) | Wideband model for dictation from Kaldi-active-grammar project | AGPL |
vosk-model-en-us-daanzu-20200905-lgraph | 129M | 8.20 (librispeech test-clean) 9.28 (tedlium) | Wideband model for dictation from Kaldi-active-grammar project with configurable graph | AGPL |
vosk-model-en-us-librispeech-0.2 | 845M | TBD | Repackaged Librispeech model from Kaldi, not very accurate | Apache 2.0 |
vosk-model-small-en-us-zamia-0.5 | 49M | 11.55 (librispeech test-clean) 12.64 (tedlium) | Repackaged Zamia model f_250, mainly for research | LGPL-3.0 |
vosk-model-en-us-aspire-0.2 | 1.4G | 13.64 (librispeech test-clean) 12.89 (tedlium) 33.82 (callcenter) | Kaldi original ASPIRE model, not very accurate | Apache 2.0 |
vosk-model-en-us-0.21 | 1.6G | 5.43 (librispeech test-clean) 6.42 (tedlium) 40.63 (callcenter) | Wideband model previous generation | Apache 2.0 |
Indian English | ||||
vosk-model-en-in-0.5 | 1G | 36.12 (NPTEL Pure) | Generic Indian English model for telecom and broadcast | Apache 2.0 |
vosk-model-small-en-in-0.4 | 36M | 49.05 (NPTEL Pure) | Lightweight Indian English model for mobile applications | Apache 2.0 |
Chinese | ||||
vosk-model-small-cn-0.22 | 42M | 23.54 (SpeechIO-02) 38.29 (SpeechIO-06) 17.15 (THCHS) | Lightweight model for Android and RPi | Apache 2.0 |
vosk-model-cn-0.22 | 1.3G | 13.98 (SpeechIO-02) 27.30 (SpeechIO-06) 7.43 (THCHS) | Big generic Chinese model for server processing | Apache 2.0 |
Chinese Other | ||||
vosk-model-cn-kaldi-multicn-0.15 | 1.5G | 17.44 (SpeechIO-02) 9.56 (THCHS) | Original Wideband Kaldi multi-cn model from Kaldi with Vosk LM | Apache 2.0 |
Russian | ||||
vosk-model-ru-0.22 | 1.5G | 5.74 (our audiobooks) 13.35 (open_stt audiobooks) 20.73 (open_stt youtube) 37.38 (openstt calls) 8.65 (golos crowd) 19.71 (sova devices) | Big mixed band Russian model for server processing | Apache 2.0 |
vosk-model-small-ru-0.22 | 45M | 22.71 (openstt audiobooks) 31.97 (openstt youtube) 29.89 (sova devices) 11.79 (golos crowd) | Lightweight wideband model for Android/iOS and RPi | Apache 2.0 |
Russian Other | ||||
vosk-model-ru-0.10 | 2.5G | 5.71 (our audiobooks) 16.26 (open_stt audiobooks) 26.20 (public_youtube_700_val open_stt) 40.15 (asr_calls_2_val open_stt) | Big narrowband Russian model for server processing | Apache 2.0 |
French | ||||
vosk-model-small-fr-0.22 | 41M | 23.95 (cv test) 19.30 (mtedx) 27.25 (podcast) | Lightweight wideband model for Android/iOS and RPi | Apache 2.0 |
vosk-model-fr-0.22 | 1.4G | 14.72 (cv test) 11.64 (mls) 13.10 (mtedx) 21.61 (podcast) 13.22 (voxpopuli) | Big accurate model for servers | Apache 2.0 |
French Other | ||||
vosk-model-small-fr-pguyot-0.3 | 39M | 37.04 (cv test) 28.72 (mtedx) 37.46 (podcast) | Lightweight wideband model for Android and RPi trained by Paul Guyot | CC-BY-NC-SA 4.0 |
vosk-model-fr-0.6-linto-2.2.0 | 1.5G | 16.19 (cv test) 16.44 (mtedx) 23.77 (podcast) 0.4xRT | Model from LINTO project | AGPL |
German | ||||
vosk-model-de-0.21 | 1.9G | 9.83 (Tuda-de test), 24.00 (podcast) 12.82 (cv-test) 12.42 (mls) 33.26 (mtedx) | Big German model for telephony and server | Apache 2.0 |
vosk-model-de-tuda-0.6-900k | 4.4G | 9.48 (Tuda-de test), 25.82 (podcast) 4.97 (cv-test) 11.01 (mls) 35.20 (mtedx) | Latest big wideband model from Tuda-DE project | Apache 2.0 |
vosk-model-small-de-zamia-0.3 | 49M | 14.81 (Tuda-de test), 37.46 (podcast) | Zamia f_250 small model repackaged (not recommended) | LGPL-3.0 |
vosk-model-small-de-0.15 | 45M | 13.75 (Tuda-de test), 30.67 (podcast) | Lightweight wideband model for Android and RPi | Apache 2.0 |
Spanish | ||||
vosk-model-small-es-0.42 | 39M | 16.02 (cv test) 16.72 (mtedx test) 11.21 (mls) | Lightweight wideband model for Android and RPi | Apache 2.0 |
vosk-model-es-0.42 | 1.4G | 7.50 (cv test) 10.05 (mtedx test) 5.84 (mls) | Big model for Spanish | Apache 2.0 |
Portuguese/Brazilian Portuguese | ||||
vosk-model-small-pt-0.3 | 31M | 68.92 (coraa dev) 32.60 (cv test) | Lightweight wideband model for Android and RPi | Apache 2.0 |
vosk-model-pt-fb-v0.1.1-20220516_2113 | 1.6G | 54.34 (coraa dev) 27.70 (cv test) | Big model from FalaBrazil | GPLv3.0 |
Greek | ||||
vosk-model-el-gr-0.7 | 1.1G | TBD | Big narrowband Greek model for server processing, not extremely accurate though | Apache 2.0 |
Turkish | ||||
vosk-model-small-tr-0.3 | 35M | TBD | Lightweight wideband model for Android and RPi | Apache 2.0 |
Vietnamese | ||||
vosk-model-small-vn-0.3 | 32M | TBD | Lightweight wideband model for Android and RPi | Apache 2.0 |
Italian | ||||
vosk-model-small-it-0.22 | 48M | 16.88 (cv test) 25.87 (mls) 17.01 (mtedx) | Lightweight model for Android and RPi | Apache 2.0 |
vosk-model-it-0.22 | 1.2G | 8.10 (cv test) 15.68 (mls) 11.23 (mtedx) | Big generic Italian model for servers | Apache 2.0 |
Dutch | ||||
vosk-model-small-nl-0.22 | 39M | 22.45 (cv test) 26.80 (tv) 25.84 (mls) 24.09 (voxpopuli) | Lightweight model for Dutch | Apache 2.0 |
Dutch Other | ||||
vosk-model-nl-spraakherkenning-0.6 | 860M | 20.40 (cv test) 32.64 (tv) 17.73 (mls) 19.96 (voxpopuli) | Medium Dutch model from Kaldi_NL | CC-BY-NC-SA |
vosk-model-nl-spraakherkenning-0.6-lgraph | 100M | 22.82 (cv test) 34.01 (tv) 18.81 (mls) 21.01 (voxpopuli) | Smaller model with dynamic graph | CC-BY-NC-SA |
Catalan | ||||
vosk-model-small-ca-0.4 | 42M | TBD | Lightweight wideband model for Android and RPi for Catalan | Apache 2.0 |
Arabic | ||||
vosk-model-ar-mgb2-0.4 | 318M | 16.40 (MGB-2 dev set) | Repackaged Arabic model trained on MGB2 dataset from Kaldi | Apache 2.0 |
Farsi | ||||
vosk-model-small-fa-0.4 | 47M | TBD | Lightweight wideband model for Android and RPi for Farsi (Persian) | Apache 2.0 |
vosk-model-fa-0.5 | 1G | TBD | Model with large vocabulary, not yet accurate but better than before (Persian) | Apache 2.0 |
vosk-model-small-fa-0.5 | 60M | TBD | Bigger small model for desktop application (Persian) | Apache 2.0 |
Filipino | ||||
vosk-model-tl-ph-generic-0.6 | 320M | TBD | Medium wideband model for Filipino (Tagalog) by feddybear | CC-BY-NC-SA 4.0 |
Ukrainian | ||||
vosk-model-small-uk-v3-nano | 73M | TBD | Nano model from Speech Recognition for Ukrainian | Apache 2.0 |
vosk-model-small-uk-v3-small | 133M | TBD | Small model from Speech Recognition for Ukrainian | Apache 2.0 |
vosk-model-uk-v3 | 343M | TBD | Bigger model from Speech Recognition for Ukrainian | Apache 2.0 |
vosk-model-uk-v3-lgraph | 325M | TBD | Big dynamic model from Speech Recognition for Ukrainian | Apache 2.0 |
Kazakh | ||||
vosk-model-small-kz-0.15 | 42M | 9.60 (dev) 8.32 (test) | Small mobile model from SAIDA_Kazakh | Apache 2.0 |
vosk-model-kz-0.15 | 378M | 8.06 (dev) 6.81 (test) | Bigger wideband model from SAIDA_Kazakh | Apache 2.0 |
Swedish | ||||
vosk-model-small-sv-rhasspy-0.15 | 289M | TBD | Repackaged model from Rhasspy project | MIT |
Japanese | ||||
vosk-model-small-ja-0.22 | 48M | 9.52 (csj CER) 17.07 (ted10k CER) | Lightweight wideband model for Japanese | Apache 2.0 |
vosk-model-ja-0.22 | 1G | 8.40 (csj CER) 13.91 (ted10k CER) | Big model for Japanese | Apache 2.0 |
Esperanto | ||||
vosk-model-small-eo-0.42 | 42M | 7.24 (CV Test) | Lightweight model for Esperanto | Apache 2.0 |
Hindi | ||||
vosk-model-small-hi-0.22 | 42M | 20.89 (IITM Challenge) 24.72 (MUCS Challenge) | Lightweight model for Hindi | Apache 2.0 |
vosk-model-hi-0.22 | 1.5G | 14.85 (CV Test) 14.83 (IITM Challenge) 13.11 (MUCS Challenge) | Big accurate model for servers | Apache 2.0 |
Czech | ||||
vosk-model-small-cs-0.4-rhasspy | 44M | 21.29 (CV Test) | Lightweight model for Czech from Rhasspy project | MIT |
Polish | ||||
vosk-model-small-pl-0.22 | 50.5M | 18.36 (CV Test) 16.88 (MLS Test) 11.55 (Voxpopuli Test) | Lightweight model for Polish for Android | Apache 2.0 |
Speaker identification model | ||||
vosk-model-spk-0.4 | 13M | TBD | Model for speaker identification, should work for all languages | Apache 2.0 |
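The speaker identification model is used alongside a regular recognition model; the recognizer then returns an x-vector per utterance that can be compared across recordings with cosine distance. A minimal sketch, assuming unpacked model directories and a previously enrolled reference vector (assumptions of this example):

```python
import json
import wave
import numpy as np
from vosk import Model, SpkModel, KaldiRecognizer

model = Model("vosk-model-small-en-us-0.15")  # any recognition model
spk_model = SpkModel("vosk-model-spk-0.4")    # the speaker identification model

wf = wave.open("speaker_sample.wav", "rb")    # hypothetical 16 kHz mono recording
rec = KaldiRecognizer(model, wf.getframerate())
rec.SetSpkModel(spk_model)

reference = np.ones(128)  # placeholder; in practice, the "spk" vector of an enrollment utterance

def cosine_dist(x, y):
    """Smaller distance suggests the two utterances share a speaker."""
    x, y = np.asarray(x), np.asarray(y)
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        result = json.loads(rec.Result())
        if "spk" in result:
            print("distance to reference:", cosine_dist(result["spk"], reference))
```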