Inception Labs has launched Mercury 2, a diffusion-based reasoning model capable of generating over 1,000 tokens per second, three times faster than comparable Inception Labs has launched Mercury 2, a diffusion-based reasoning model capable of generating over 1,000 tokens per second, three times faster than comparable

Inception Labs Launches Mercury 2, Diffusion-Based Reasoning Model Achieving Over 1,000 Tokens Per Second

2026/02/26 17:38
2 דקת קריאה
למשוב או לפניות בנוגע לתוכן זה, אנא צור קשר איתנו ב crypto.news@mexc.com
Inception Labs Unveils Mercury 2: A Diffusion-Based LLM Delivering Over 1,000 Tokens Per Second For Low-Latency AI Applications

Inception Labs, an AI startup, has launched Mercury 2, a diffusion-based Large Language Model (LLM) designed to significantly accelerate reasoning tasks in production AI applications. 

Unlike traditional autoregressive models that generate text sequentially, Mercury 2 uses a parallel refinement process, producing multiple tokens simultaneously and converging over a small number of steps, enabling speeds of over 1,000 tokens per second on NVIDIA Blackwell GPUs—approximately three times faster than competing models in the same price range.

The model is optimized for real-time responsiveness in complex AI workflows, where latency compounds across multiple inference calls, retrieval pipelines, and agentic loops. Mercury 2 maintains high reasoning quality while reducing latency, allowing developers, voice AI systems, search engines, and other interactive applications to operate at reasoning-grade performance without the delays associated with sequential generation. It supports features such as tunable reasoning, 128K token context windows, schema-aligned JSON output, and native tool integration, providing flexibility for a range of production deployments.

Mercury 2 Enables Low-Latency AI Across Coding, Voice, And Search Workflows 

The report highlights several use cases where low-latency reasoning is critical. In coding and editing workflows, Mercury 2 delivers rapid autocomplete and next-edit suggestions that integrate seamlessly with developers’ thought processes. In agentic workflows, the model allows for more inference steps without exceeding latency budgets, improving the quality and depth of automated decision-making. Voice-based AI and interactive applications benefit from its ability to generate reasoning-quality responses within natural speech cadences, enhancing user experiences in real-time conversation scenarios. Additionally, Mercury 2 supports multi-hop search and retrieval pipelines, enabling rapid summarization, reranking, and reasoning without compromising response times.

Early adopters have noted significant improvements in throughput and user experience. Mercury 2 has been described as at least twice as fast as GPT-5.2 while maintaining competitive quality, with applications spanning real-time transcript cleanup, interactive human-computer interfaces, autonomous advertising optimization, and voice-enabled AI avatars.

The model is compatible with the OpenAI API, allowing integration into existing stacks without extensive modification, and Inception Labs offers support for enterprise evaluations, performance validation, and workload-specific deployment guidance. Mercury 2 represents a step forward in diffusion-based LLMs, redefining the balance between reasoning quality and latency in production AI environments.

The post Inception Labs Launches Mercury 2, Diffusion-Based Reasoning Model Achieving Over 1,000 Tokens Per Second appeared first on Metaverse Post.

הצהרת סיכום: המאמרים המתפרסמים מחדש באתר זה מקורם בפלטפורמות ציבוריות ונמסרים לצרכי מידע בלבד. הם אינם משקפים בהכרח את עמדותיה של MEXC. כל הזכויות שמורות למחברים המקוריים. אם אתה סבור שתוכן כלשהו מפר זכויות צד שלישי, אנא צרו קשר עם crypto.news@mexc.com לבקשת הסרה. MEXC אינה מתחייבת לדיוק, לשלמות או לעדכניות התוכן, ואינה אחראית לכל פעולה שתינקט על סמך המידע המסופק. התוכן אינו מהווה ייעוץ פיננסי, משפטי או מקצועי אחר, ואין לראות בו המלצה או אישור מטעם MEXC.

Starter Gold Rush: Win $2,500!

Starter Gold Rush: Win $2,500!Starter Gold Rush: Win $2,500!

Start your first trade & capture every Alpha move