Alibaba Cloud has open-sourced its Qwen3-ASR and Qwen3-ForcedAligner AI models, delivering state-of-the-art speech recognition and forced alignment performance.Alibaba Cloud has open-sourced its Qwen3-ASR and Qwen3-ForcedAligner AI models, delivering state-of-the-art speech recognition and forced alignment performance.

Qwen Open-Sources Advanced ASR And Forced Alignment Models With Multi-Language Capabilities

2026/01/29 22:30
2 min di lettura
Per feedback o dubbi su questo contenuto, contattateci all'indirizzo crypto.news@mexc.com.
Qwen Open-Sources Advanced ASR And Forced Alignment Models With Multi-Language Capabilities

Alibaba Cloud announced that it has made its Qwen3-ASR and Qwen3-ForcedAligner AI models open-source, offering advanced tools for speech recognition and forced alignment. 

The Qwen3-ASR family includes two all-in-one models, Qwen3-ASR-1.7B and Qwen3-ASR-0.6B, which support language identification and transcription across 52 languages and accents, leveraging large-scale speech data and the Qwen3-Omni foundation model. 

Internal testing indicates that the 1.7B model delivers state-of-the-art accuracy among open-source ASR systems, while the 0.6B version balances performance and efficiency, capable of transcribing 2,000 seconds of speech in one second with high concurrency. 

The Qwen3-ForcedAligner-0.6B model uses a non-autoregressive LLM approach to align text and speech in 11 languages, outperforming leading force-alignment solutions in both speed and accuracy. 

Alibaba Cloud has also released a comprehensive inference framework under the Apache 2.0 license, supporting streaming, batch processing, timestamp prediction, and fine-tuning, aimed at accelerating research and practical applications in audio understanding.

Qwen3-ASR And Qwen3-ForcedAligner Models Demonstrate Leading Accuracy And Efficiency

Alibaba Cloud has released performance results for its Qwen3-ASR and Qwen3-ForcedAligner models, demonstrating leading accuracy and efficiency across diverse speech recognition tasks. 

The Qwen3-ASR-1.7B model achieves state-of-the-art results among open-source systems, outperforming commercial APIs and other open-source models in English, multilingual, and Chinese dialect recognition, including Cantonese and 22 regional variants. 

It maintains reliable accuracy in challenging acoustic conditions, such as low signal-to-noise environments, child or elderly speech, and even singing voice transcription, achieving average word error rates of 13.91% in Chinese and 14.60% in English with background music.

The smaller Qwen3-ASR-0.6B balances accuracy and efficiency, delivering high throughput and low latency under high concurrency, capable of transcribing up to five hours of speech in online asynchronous mode at a concurrency of 128. 

Meanwhile, the Qwen3-ForcedAligner-0.6B outperforms leading end-to-end forced alignment models including Nemo-Forced-Aligner, WhisperX, and Monotonic-Aligner, offering superior language coverage, timestamp accuracy, and support for varied speech and audio lengths.

The post Qwen Open-Sources Advanced ASR And Forced Alignment Models With Multi-Language Capabilities appeared first on Metaverse Post.

Disclaimer: gli articoli ripubblicati su questo sito provengono da piattaforme pubbliche e sono forniti esclusivamente a scopo informativo. Non riflettono necessariamente le opinioni di MEXC. Tutti i diritti rimangono agli autori originali. Se ritieni che un contenuto violi i diritti di terze parti, contatta crypto.news@mexc.com per la rimozione. MEXC non fornisce alcuna garanzia in merito all'accuratezza, completezza o tempestività del contenuto e non è responsabile per eventuali azioni intraprese sulla base delle informazioni fornite. Il contenuto non costituisce consulenza finanziaria, legale o professionale di altro tipo, né deve essere considerato una raccomandazione o un'approvazione da parte di MEXC.

$30,000 in PRL + 15,000 USDT

$30,000 in PRL + 15,000 USDT$30,000 in PRL + 15,000 USDT

Deposit & trade PRL to boost your rewards!