
Automatic speech recognition (ASR) technology has advanced considerably, yet notable inequalities remain in how accurately it handles different languages. Major ASR systems, such as OpenAI's Whisper, exhibit clear performance gaps when processing Eastern languages compared to their Western counterparts. This discrepancy creates tangible challenges in multilingual regions, particularly those characterized by numerous dialects and linguistic variations, underscoring the need for sophisticated multilingual ASR systems tailored to Eastern languages.
Researchers at Dataocean AI and Tsinghua University have introduced Dolphin, a comprehensive multilingual speech recognition model built on an extended Whisper architecture and optimized to accommodate a broad spectrum of Eastern languages and dialects. Dolphin addresses key limitations identified in current multilingual ASR models by integrating both proprietary and publicly accessible datasets. The model supports 40 Eastern languages from East Asia, South Asia, Southeast Asia, and the Middle East, as well as 22 distinct dialects of Chinese.
Dolphin employs a hybrid ASR approach that combines connectionist temporal classification (CTC) with attention-based mechanisms. Its architecture incorporates an E-Branchformer encoder and a Transformer decoder, substantially enhancing the model's ability to interpret complex linguistic patterns across diverse languages. Dolphin also uses a two-level language token system that pairs a general language token with a region-specific dialect token. This approach improves recognition accuracy and resolution, particularly for dialect-rich languages such as Chinese. Additionally, Dolphin incorporates a 4× subsampling layer to shorten input sequences efficiently, which increases computational speed and training effectiveness without compromising recognition accuracy.
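To make these ideas concrete, the following minimal PyTorch-style sketch illustrates a two-level language/region token prompt, a 4× convolutional subsampling front end, and a hybrid CTC + attention objective. The token names, module shapes, and the CTC weight are illustrative assumptions based on common hybrid CTC/attention recipes, not Dolphin's published implementation.

```python
# Illustrative sketch only; names and hyperparameters are assumptions,
# not Dolphin's actual code.
import torch
import torch.nn as nn

# Two-level prompt: a general language token plus a region/dialect token,
# e.g. a language token like <zh> followed by a region token like <CN>
# (token naming here is a guess).
def build_decoder_prompt(tokenizer: dict, lang: str, region: str) -> torch.Tensor:
    ids = [tokenizer[f"<{lang}>"], tokenizer[f"<{region}>"], tokenizer["<sot>"]]
    return torch.tensor(ids).unsqueeze(0)  # shape: (1, prompt_len)

# 4x temporal subsampling: two stride-2 convolutions shorten the frame
# sequence by a factor of four before it reaches the encoder.
class Subsample4x(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_dim, out_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(out_dim, out_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, frames, mel_bins) -> (batch, frames // 4, out_dim)
        return self.net(feats.transpose(1, 2)).transpose(1, 2)

# Hybrid objective: interpolate the CTC loss on encoder outputs with the
# attention decoder's cross-entropy loss (the 0.3 weight is a common
# default in hybrid CTC/attention recipes, assumed here).
def hybrid_loss(ctc_loss: torch.Tensor, att_loss: torch.Tensor,
                ctc_weight: float = 0.3) -> torch.Tensor:
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss
```

In a full training loop, the CTC branch would score the subsampled encoder outputs directly, while the decoder, conditioned on the two-token prompt, would supply the attention cross-entropy term.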
Experimental evaluations show marked improvements in multilingual speech recognition accuracy for Dolphin relative to the Whisper models. For example, the Dolphin small model reduced the word error rate (WER) by roughly 24.5% compared to the base model, with further incremental gains in the medium and large variants. Notably, the Dolphin base model attained an average WER of 31.8%, significantly outperforming Whisper's large-v3 model, which recorded an average WER of 52.3% on the same evaluation benchmarks. Evaluations on dialect-focused datasets, including KeSpeech, confirmed Dolphin's ability to handle complex linguistic variations consistently, with performance scaling positively with model size.
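As a quick check on how such comparisons are read, the snippet below computes the relative WER reduction implied by the two average figures quoted above; per-benchmark numbers in the paper may differ.

```python
# Relative WER reduction implied by the averages quoted above:
# Dolphin base (31.8%) vs. Whisper large-v3 (52.3%).
dolphin_base_wer = 31.8
whisper_large_v3_wer = 52.3

relative_reduction = (whisper_large_v3_wer - dolphin_base_wer) / whisper_large_v3_wer
print(f"Relative WER reduction: {relative_reduction:.1%}")  # ~39.2%
```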
The research team has publicly released the Dolphin base and small models under the Apache 2.0 license, along with the associated inference code. Dolphin's training drew on an extensive dataset comprising 21.2 million hours of audio recordings, including 7.4 million hours from open datasets such as Common Voice, ReazonSpeech, and GigaSpeech 2, supporting robustness and replicability.
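For readers who want to try the released checkpoints locally, the sketch below fetches a model snapshot with huggingface_hub. The repository ID is an assumed placeholder; consult the official release links for the actual model names and inference instructions.

```python
# Download a released Dolphin checkpoint for local inference.
# NOTE: the repo_id below is an assumed placeholder, not a confirmed name;
# use the model links published with the release.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="DataoceanAI/dolphin-small")
print(f"Model files downloaded to: {local_dir}")
```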
In summary, Dolphin represents a significant advance in multilingual ASR technology, systematically addressing the limitations that persist for Eastern languages through strategic data integration, a sophisticated architecture, and a commitment to open source. This work sets an important benchmark for future development in multilingual ASR research, furthering linguistic inclusivity and system generalization.
Check out the Paper, the Dolphin-small model, and the Dolphin-base model. All credit for this research goes to the researchers of this project.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a broad audience. The platform boasts more than 2 million monthly views, reflecting its popularity among readers.