The post DualPath lifts throughput as RDMA eases KV-cache I/O appeared on BitcoinEthereumNews.com.

DualPath lifts throughput as RDMA eases KV-cache I/O

DualPath inference: dual load paths relieve KV-cache I/O bottlenecks

A new paper introduces DualPath inference, a system that nearly doubles throughput for multi‑round agentic LLM workloads by tackling the KV‑cache I/O bottleneck.

According to DeepSeek’s arXiv paper (https://arxiv.org/abs/2602.21548?utm_source=openai), DualPath adds a second load path: the KV cache is read from storage into the decode engine, which then uses RDMA (Remote Direct Memory Access) to transfer it to the prefill engine. The paper reports that this rebalances bandwidth and relieves the KV-cache I/O bottleneck, delivering up to ~1.87× throughput in offline tests and ~1.96× on average in online serving, without breaching latency SLOs. Peak gains assume very high cache reuse, with KV-cache hit rates around or above 95%.
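To make the bandwidth-rebalancing idea concrete, here is a minimal toy model in Python. It is not the paper's scheduler: the function name, the assumption that the second path is pipelined (so its rate is capped by the slower of the storage read and the RDMA hop), and all bandwidth figures are hypothetical and chosen only for illustration.

```python
def dual_path_split(kv_gb, prefill_gb_per_s, decode_gb_per_s, rdma_gb_per_s):
    """Toy model: split one KV-cache load across two parallel paths.

    Path 1: storage -> prefill engine (direct).
    Path 2: storage -> decode engine -> RDMA -> prefill engine.
    All names and numbers are illustrative assumptions.
    """
    # Assume path 2 is pipelined, so its rate is capped by the slower hop.
    second_path = min(decode_gb_per_s, rdma_gb_per_s)
    total = prefill_gb_per_s + second_path
    # Fraction of bytes routed via the decode engine so both paths
    # finish at the same time.
    via_decode = second_path / total
    single_path_s = kv_gb / prefill_gb_per_s
    dual_path_s = kv_gb / total
    return via_decode, single_path_s, dual_path_s


via_decode, t_single, t_dual = dual_path_split(
    kv_gb=40.0, prefill_gb_per_s=20.0, decode_gb_per_s=20.0, rdma_gb_per_s=50.0
)
print(f"route {via_decode:.0%} via decode; {t_single:.1f}s -> {t_dual:.1f}s")
```

In this idealized setting the load time shrinks in proportion to the bandwidth the second path adds. Real systems also contend with scheduling, contention on the compute network, and request mix, so the model only shows why a second path can help, not how much it will.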

Why it matters: higher throughput without breaking latency SLOs

For online inference, throughput gains are only meaningful if Time to First Token and token-to-token latency remain stable. The evaluations emphasize preserving SLOs while increasing aggregate tokens served.
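As a rough illustration of what checking those two latency metrics involves, the sketch below computes TTFT and the mean token-to-token gap for a single request from timestamps and compares them with thresholds. The function names, the timestamps, and the SLO values are hypothetical; the article does not state the actual SLO targets used in the evaluations.

```python
from statistics import mean


def latency_metrics(request_sent_s, token_times_s):
    """Return (TTFT, mean token-to-token gap) in seconds for one request."""
    if not token_times_s:
        raise ValueError("need at least one generated token")
    # Time to First Token: delay between sending the request and token 1.
    ttft = token_times_s[0] - request_sent_s
    # Gaps between consecutive generated tokens.
    gaps = [b - a for a, b in zip(token_times_s, token_times_s[1:])]
    mean_gap = mean(gaps) if gaps else 0.0
    return ttft, mean_gap


def meets_slo(ttft, mean_gap, ttft_slo_s, gap_slo_s):
    """True when both latency metrics are within their (assumed) limits."""
    return ttft <= ttft_slo_s and mean_gap <= gap_slo_s


# Hypothetical timestamps (seconds) and hypothetical SLO thresholds.
ttft, gap = latency_metrics(0.0, [0.5, 0.75, 1.0, 1.25])
print(ttft, gap, meets_slo(ttft, gap, ttft_slo_s=1.0, gap_slo_s=0.5))
```

A throughput comparison is then only meaningful between configurations for which this kind of check passes on the same workload.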

“Computation is cheap; data movement is expensive,” said Jeff Dean.

As reported by 36Kr (https://eu.36kr.com/en/p/3700922638053255?utm_source=openai), the approach exploits idle decode‑engine network capacity to move KV data via RDMA. The outlet also notes stable TTFT and token‑to‑token behavior under high load.

Agentic, multi‑turn workloads that repeatedly draw from past context benefit most, because DualPath reduces stalls when fetching KV cache from external storage. This shifts the limiting factor away from storage I/O toward better‑balanced compute and network use.

For production deployments, the headline result is higher tokens per second per cluster with no measurable TTFT regression in the reported tests. That combination supports a steadier user experience while raising capacity.

Organizations should still validate under their own mixes, as realized gains depend on cache reuse patterns, sequence lengths, and interconnect quality.

Deployment checklist, hardware needs, and when DualPath helps less

RDMA-capable interconnect, robust storage bandwidth, and ≥95% KV-cache hit rates

A practical rollout expects an RDMA‑capable interconnect, solid storage throughput to feed caches, and very high reuse so KV‑cache hit rates approach the ~95% mark cited in evaluations. Decode‑engine NIC capacity should be provisioned to absorb the added transfer path.
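One way to sanity-check these prerequisites against your own traffic is sketched below. It assumes a token-level definition of KV-cache hit rate (cached prompt tokens divided by total prompt tokens), which the article does not spell out, and the function names, trace format, and sample numbers are hypothetical.

```python
def kv_cache_hit_rate(turns):
    """turns: list of (cached_prompt_tokens, total_prompt_tokens) per turn.

    Assumes a token-level hit-rate definition; other definitions exist.
    """
    cached = sum(c for c, _ in turns)
    total = sum(t for _, t in turns)
    return cached / total if total else 0.0


def rollout_gaps(hit_rate, has_rdma, min_hit_rate=0.95):
    """List the checklist items this deployment does not yet satisfy."""
    gaps = []
    if not has_rdma:
        gaps.append("no RDMA-capable interconnect")
    if hit_rate < min_hit_rate:
        gaps.append(f"hit rate {hit_rate:.1%} below {min_hit_rate:.0%}")
    return gaps


# Hypothetical multi-turn session: the first turn is a cold start,
# later turns reuse most of the accumulated context.
session = [(0, 1000), (1000, 1200), (1200, 1300)]
rate = kv_cache_hit_rate(session)
print(f"{rate:.1%}", rollout_gaps(rate, has_rdma=True))
```

Short sessions like this one are dominated by the cold first turn, which is one reason fragmented workloads fall short of the reuse levels cited in the evaluations.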

Lower cache reuse or weaker networking may reduce realized gains

Workloads with sparse history reuse, fragmented sessions, or weaker networking will see smaller uplift. Absent robust RDMA, added transfers can shift bottlenecks rather than remove them.

At the time of this writing, NVIDIA (NVDA) traded near 186.18 in overnight action after a 5.49% decline to 184.89 at the close, based on data from Nasdaq.

FAQ about DualPath inference

How much throughput improvement does DualPath deliver in online vs offline inference workloads?

Reported gains reached about 1.87× in offline tests and roughly 1.96× on average in online service, while adhering to stated service-level objectives in the paper’s evaluations.

Does DualPath affect Time to First Token (TTFT) and token-to-token latency under real production load?

The reported evaluations indicate TTFT and token-to-token latency remained stable under load, with throughput increasing via DualPath’s second transfer path and RDMA-assisted balancing of bandwidth.

Source: https://coincu.com/news/dualpath-lifts-throughput-as-rdma-eases-kv-cache-i-o/
