Dagmawi Babi
Meta open sourced another new AI model today, called Massively Multilingual Speech. It can identify more than 4,000 spoken languages and will make it easier for people to connect and access information in their own language. And apparently it can do that…
The Massively Multilingual Speech (MMS) project scales speech technology to over 1,100 languages for speech-to-text and text-to-speech, and over 4,000 languages for language identification, using self-supervised learning with wav2vec 2.0.
Compared to OpenAI's Whisper, the MMS multilingual ASR model supports 11x more languages yet achieves less than half the average word error rate on the 54 languages of the FLEURS benchmark. It was also trained on a fraction of the labeled data (45K vs. 680K hours).
Paper
• https://research.facebook.com/publications/scaling-speech-technology-to-1000-languages/
Blog
• https://ai.facebook.com/blog/multilingual-model-speech-recognition/
Code/models
• https://github.com/facebookresearch/fairseq/tree/main/examples/mms
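If you want to try the ASR model yourself, here's a minimal sketch using the Hugging Face transformers port of the released checkpoints. The model id facebook/mms-1b-all is the multilingual ASR checkpoint on the Hub; the file name clip.wav and the language choice are my own placeholders, not from the post:
```python
# Minimal MMS ASR sketch via Hugging Face transformers.
import torch
import torchaudio
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "facebook/mms-1b-all"  # multilingual ASR checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# MMS uses per-language adapters, selected by ISO 639-3 code
# (e.g. "eng" for English, "amh" for Amharic).
processor.tokenizer.set_target_lang("eng")
model.load_adapter("eng")

# The model expects 16 kHz mono audio.
waveform, sr = torchaudio.load("clip.wav")  # hypothetical input file
waveform = waveform.mean(dim=0)             # downmix to mono
if sr != 16_000:
    waveform = torchaudio.functional.resample(waveform, sr, 16_000)

inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: best token per frame, then collapse repeats/blanks.
ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(ids))
```
Swapping the adapter is what makes one checkpoint cover 1,100+ languages, so switching language is just another set_target_lang/load_adapter call.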
#MMS #AI #OSS
@Dagmawi_Babi