Together.ai releases Mamba-3, an open-source state space model built for inference that outperforms Mamba-2 and matches Transformer decode speeds at 16K sequences.

Mamba-3 SSM Drops With Inference-First Design Beating Transformers at Decode

2026/03/18 01:48

James Ding Mar 17, 2026 17:48

Together.ai has released Mamba-3, a state space model architecture designed from the ground up for inference workloads rather than training efficiency. The open-source release marks a philosophical shift in how linear architectures are built, arriving as agentic AI workflows have pushed inference demand to unprecedented levels.

At a sequence length of 16,384, Mamba-3's SISO variant completes prefill plus decode in 140.61 seconds, versus 149.02 seconds for Mamba-2 and 976.50 seconds for Llama-3.2-1B running on vLLM. That is about 6% faster than Mamba-2 and nearly 7x faster than the Transformer baseline on the same H100 GPU hardware.
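
The ratios follow directly from the three timings quoted above. A quick check in Python, using only those reported numbers (the variable names are ours):

```python
# Prefill+decode wall-clock times in seconds at sequence length 16,384,
# copied from the figures quoted in the article.
timings = {
    "Mamba-3 SISO": 140.61,
    "Mamba-2": 149.02,
    "Llama-3.2-1B (vLLM)": 976.50,
}

# Express every time as a multiple of the Mamba-3 SISO time.
baseline = timings["Mamba-3 SISO"]
speedups = {name: seconds / baseline for name, seconds in timings.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x the Mamba-3 SISO time")
```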

Why Inference Matters Now

The timing isn't accidental. While Mamba-2 bet big on training speed back in mid-2024—delivering 2-8x faster training than its predecessor—the landscape has shifted dramatically. Reinforcement learning with verifiable rewards for coding and math requires massive rollout generation. Tools like Codex, Claude Code, and OpenClaw have made inference the bottleneck, not pretraining.

Previous linear architectures simplified their underlying mechanisms to accelerate training, leaving the inference step "too simple" and memory-bound. GPUs weren't computing—they were mostly shuffling data around.

Three Core Improvements

Mamba-3 addresses this through changes rooted in classical control theory rather than trendy deep learning interpretations:

Exponential-trapezoidal discretization creates a more expressive recurrence. This eliminates the short causal convolution that plagued Mamba-1 and Mamba-2—a component that had become standard across linear models since H3 and RWKV-4 popularized it.
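
One way to see why a richer discretization can make the short convolution redundant is a scalar toy recurrence. The sketch below is an illustration under our own assumptions, not the paper's exact parameterization: the coefficients, the blend weight `lam`, and the fixed `a` and `dt` are all hypothetical (in a real model they would vary per timestep). It compares a single-input update with a two-input, trapezoid-style update that also looks at the previous input.

```python
import math

def euler_style_scan(xs, a, dt, b=1.0):
    """Single-input update: h_t = exp(dt*a) * h_{t-1} + dt * b * x_t."""
    decay = math.exp(dt * a)
    h, out = 0.0, []
    for x in xs:
        h = decay * h + dt * b * x
        out.append(h)
    return out

def trapezoid_style_scan(xs, a, dt, lam, b=1.0):
    """Two-input update that blends the current and the previous input.

    h_t = decay * h_{t-1}
          + (1 - lam) * dt * decay * b * x_{t-1}
          + lam * dt * b * x_t
    The blend weight lam is a hypothetical parameter; lam = 1 recovers
    the single-input update above.
    """
    decay = math.exp(dt * a)
    h, prev_x, out = 0.0, 0.0, []
    for x in xs:
        h = decay * h + (1.0 - lam) * dt * decay * b * prev_x + lam * dt * b * x
        out.append(h)
        prev_x = x
    return out

# Toy inputs and parameters (assumptions for illustration only).
xs = [1.0, 0.0, -2.0, 0.5]
a, dt, lam = -0.5, 0.1, 0.5
decay = math.exp(dt * a)

# The two-input recurrence equals a width-2 causal mix of the inputs
# followed by the single-input recurrence.
mixed = [lam * x + (1.0 - lam) * decay * p for x, p in zip(xs, [0.0] + xs[:-1])]
trap = trapezoid_style_scan(xs, a, dt, lam)
conv_then_euler = euler_style_scan(mixed, a, dt)
```

In this toy, the two-term update already contains a width-2 causal mix of neighboring inputs, which is the role a separate short convolution would otherwise play.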

Complex-valued SSM systems expand state-tracking capabilities. The model can now handle synthetic tasks like parity and arithmetic reasoning that Mamba-2 couldn't reliably solve.
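
Parity shows why complex values help. A real decay factor between 0 and 1 can only shrink the state, while a complex factor can rotate it. The hand-built toy below (not a trained Mamba-3 layer; the function name is ours) rotates a unit-length state by half a turn for every 1 bit, so the sign of the real part records whether the count of 1s is even or odd.

```python
import cmath

def parity_via_rotation(bits):
    """Track parity with one complex state.

    Each 1 bit multiplies the state by exp(i*pi) = -1 (a half-turn);
    each 0 bit multiplies by exp(0) = 1 and leaves it unchanged.
    """
    h = 1 + 0j
    for bit in bits:
        h *= cmath.exp(1j * cmath.pi * bit)
    # Real part near +1 means an even number of 1s, near -1 means odd.
    return 0 if h.real > 0 else 1

sequence = [1, 0, 1, 1]
result = parity_via_rotation(sequence)
```

The state never grows or shrinks here; all of the information lives in its phase, which a purely real, non-negative transition cannot represent.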

Multi-input, multi-output (MIMO) architecture runs multiple SSMs in parallel. The MIMO variant boosts downstream accuracy by over 1 percentage point at 1B scale compared to standard Mamba-3, with one trade-off: training takes longer, but decode latency stays flat.

That last point deserves emphasis. Training is compute-bound; inference is memory-bound. Adding FLOPs per timestep barely touches inference latency because idle GPU cores simply pick up the work.
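
A simple roofline estimate makes the argument concrete. In the sketch below every number is a hypothetical placeholder chosen for order of magnitude, not a measured Mamba-3 or H100 figure. A decode step is assumed to take as long as the slower of its compute time and its memory-traffic time.

```python
def decode_step_time(state_bytes, flops, peak_flops, mem_bandwidth):
    """Roofline estimate: the step is limited by the slower of compute and memory."""
    compute_time = flops / peak_flops
    memory_time = state_bytes / mem_bandwidth
    return max(compute_time, memory_time)

# Hypothetical accelerator and workload numbers (assumptions, not measurements).
PEAK_FLOPS = 1.0e15             # FLOP/s the chip can sustain
MEM_BANDWIDTH = 3.0e12          # bytes/s between memory and compute units
STATE_BYTES = 64 * 1024 * 1024  # recurrent state moved per decode step
BASE_FLOPS = 2 * STATE_BYTES    # roughly two FLOPs per state byte

# Same state traffic, but the second variant does 4x the arithmetic per step.
siso = decode_step_time(STATE_BYTES, BASE_FLOPS, PEAK_FLOPS, MEM_BANDWIDTH)
mimo = decode_step_time(STATE_BYTES, 4 * BASE_FLOPS, PEAK_FLOPS, MEM_BANDWIDTH)

print(f"1x FLOPs: {siso * 1e6:.2f} us per step")
print(f"4x FLOPs: {mimo * 1e6:.2f} us per step")
```

Under these assumptions both variants sit well below the point where compute becomes the limit, so the estimated step time is set by memory traffic alone and does not change when the FLOPs are quadrupled.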

Benchmark Results

On downstream language modeling evaluations, Mamba-3 outperforms both Mamba-2 and Gated DeltaNet across pretrained model scales. The SISO variant matches Mamba-2's architecture shapes exactly while delivering better accuracy. MIMO pushes further ahead.

Retrieval tasks tell a more nuanced story. Pure linear models naturally underperform Transformers here—that fixed-size state can't match an ever-growing KV cache for exact recall. But Mamba-3 holds its own among sub-quadratic alternatives, and MIMO improves retrieval without increasing state size.
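
The gap between a fixed-size state and a growing KV cache can be put in numbers. The sketch below uses the standard KV-cache size formula and a per-layer state size for the SSM. All model shapes are hypothetical placeholders, not the configurations benchmarked in the article.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Keys and values (the factor of 2) are kept for every past token in every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

def ssm_state_bytes(n_layers, d_inner, d_state, bytes_per_elem=2):
    """One fixed-size state per layer, independent of how many tokens were seen."""
    return n_layers * d_inner * d_state * bytes_per_elem

# Hypothetical shapes for a roughly 1B-parameter model (assumptions).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 16, 8, 64
D_INNER, D_STATE = 4096, 128

state = ssm_state_bytes(N_LAYERS, D_INNER, D_STATE)
for seq_len in (1024, 4096, 16384):
    cache = kv_cache_bytes(seq_len, N_LAYERS, N_KV_HEADS, HEAD_DIM)
    print(f"seq_len={seq_len}: KV cache {cache / 2**20:.0f} MiB, SSM state {state / 2**20:.0f} MiB")
```

The cache grows linearly with context and keeps every token available for exact lookup, while the recurrent state stays the same size and must compress everything it has seen.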

The team predicts hybrid models combining linear layers with global self-attention will dominate language modeling going forward. Their experiments show this combination beats vanilla Transformers on retrieval while maintaining efficiency gains.

Open Source From Day One

Kernels are available at the mamba-ssm repository, built across Triton, TileLang, and CuTe DSL depending on the operation. The stack reflects pragmatic engineering: Triton for standard architecture development, TileLang for fine-grained memory control on MIMO prefill, and CuTe DSL for maximizing Hopper GPU performance during decode.

NVIDIA's recent Nemotron 3 Super release, which uses Mamba-2 layers in a hybrid configuration, suggests enterprise interest in SSM architectures is growing. Mamba-3's inference-first approach could speed adoption in production environments where token generation speed directly affects costs and user experience.

The full paper is available on arXiv, with a second blog post covering the mathematical foundations of the three core improvements expected to follow.
