
Tether Brings Google’s TurboQuant to Production, Unlocking Long-Context AI on Everyday Devices

2026/06/02 07:46
3 min read
For feedback or concerns regarding this content, please contact us at crypto.news@mexc.com

TLDR:

  • TurboQuant compresses AI KV cache memory by up to five times with minimal impact on model quality.
  • The upgrade enables laptops and phones to run longer AI sessions without cloud dependence.
  • QVAC SDK 0.12.0 integrates TurboQuant into Fabric, expanding local AI development options.
  • Tether aims to advance privacy-focused AI by bringing efficient inference closer to end users.

Tether’s AI Research Group has released an open-source production version of TurboQuant, a memory compression algorithm originally developed by Google Research.

The release is part of QVAC SDK 0.12.0 and targets laptops, phones, edge devices, and decentralized networks. It allows local AI models to handle longer sessions without relying on cloud infrastructure.

This marks a practical shift in how on-device AI manages memory-intensive tasks.

TurboQuant Compresses AI Memory Up to Five Times

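Memory has long been a barrier to running capable AI models on consumer hardware. When an AI assistant processes a long document or conversation, it keeps that context in the key-value (KV) cache, which stores the attention keys and values already computed for every previous token so the model does not have to recompute them.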

At roughly 262,000 tokens, the KV cache for a 4-billion-parameter (4B) model can consume around 8 GB of memory on its own. Four concurrent sessions can push that to 32 GB before the model weights are even counted.
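The arithmetic behind these figures is simple: a transformer's KV cache holds one key vector and one value vector per KV head, per layer, for every token in the context, so its size grows linearly with context length and again with the number of concurrent sessions. The Python sketch below encodes that standard size formula. The function names are hypothetical, and because the source does not give the 4B model's layer or head counts, the usage example starts from the article's own ~8 GB per-session figure rather than guessing them.

```python
GB = 10**9  # decimal gigabytes, matching the article's "GB"


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_elem: int = 2) -> int:
    """Size of one session's uncompressed KV cache, in bytes.

    The leading factor of 2 counts the separate key and value tensors.
    bytes_per_elem=2 assumes 16-bit (fp16/bf16) cache entries.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem


def concurrent_kv_bytes(per_session_bytes: int, sessions: int) -> int:
    """Each session keeps its own cache, so total memory scales linearly."""
    return per_session_bytes * sessions


# Starting from the article's ~8 GB for one ~262k-token session:
print(concurrent_kv_bytes(8 * GB, 4) / GB)  # 32.0
```

Because every term in the formula is a plain multiplier, doubling the context length or the session count doubles the cache, which is why long-context, multi-session use runs out of memory on consumer devices.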

TurboQuant addresses this by compressing the KV cache by up to five times while keeping output quality close to that of the uncompressed model.
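To see why storing a few bits per value can yield a roughly fivefold saving, the sketch below applies plain min-max (uniform) scalar quantization to a group of cache values and computes the effective compression against 16-bit storage. This is a generic illustration, not TurboQuant's algorithm, which the source does not detail; the function names, the group size, and the assumption of one 16-bit offset plus one 16-bit scale stored per group are all illustrative choices.

```python
def quantize_group(values: list[float], bits: int) -> tuple[list[int], float, float]:
    """Map each float to an integer code in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    # Step between adjacent codes; guard against a constant group.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, lo, scale


def dequantize_group(codes: list[int], lo: float, scale: float) -> list[float]:
    """Approximate the original floats from their integer codes."""
    return [lo + c * scale for c in codes]


def compression_ratio(bits: int, group_size: int,
                      baseline_bits: int = 16, meta_bits: int = 32) -> float:
    """Storage saving versus a 16-bit cache.

    meta_bits assumes each group also stores a 16-bit offset and a
    16-bit scale, amortized across its group_size values.
    """
    effective_bits = bits + meta_bits / group_size
    return baseline_bits / effective_bits


group = [0.12, -0.40, 0.95, 0.03, -0.77, 0.51, 0.28, -0.09]
codes, lo, scale = quantize_group(group, bits=3)
approx = dequantize_group(codes, lo, scale)
max_err = max(abs(a - b) for a, b in zip(group, approx))
print(codes, round(max_err, 3))
print(round(compression_ratio(bits=3, group_size=128), 2))  # 4.92
```

With 3-bit codes and groups of 128 values, the per-group metadata adds only a quarter of a bit per value, so the ratio lands just under 5x. Smaller groups track the data's range more tightly but spend more bits on metadata, lowering the ratio (for example, exactly 4x at a group size of 32).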

A user can now ask a laptop-based assistant to analyze a hundred-page legal document without uploading it to a remote server.

Students, developers, journalists, and researchers can all benefit from longer, more context-aware AI sessions on devices they already own.

Speaking on the broader reasoning behind the release, Tether CEO Paolo Ardoino pointed to the gap between research and practical software.

“Google’s research showed that AI memory could be compressed far more efficiently than most people assumed,” he said. “Our work brings that breakthrough into production software that developers, startups, and users can actually build with.”

The production release includes a full quantization pipeline, framework adapters, developer documentation, and workload-tuned profiles.

These components are designed for real environments outside hyperscale data centers, covering constrained memory, mixed hardware, and latency-sensitive deployments.

QVAC SDK 0.12.0 Expands Local AI Development Options

TurboQuant ships as part of QVAC SDK 0.12.0, integrated directly into Fabric, a core component of the QVAC stack.

Fabric began as a llama.cpp fork and has since grown to incorporate multiple research advances. The SDK gives developers a unified set of tools, libraries, and runtime components for building local AI applications.

For startups and independent developers, this challenges the assumption that large AI products require expensive GPU clusters.

Teams can now design for longer context windows, larger file workloads, and flexible deployment across consumer and edge hardware. That opens practical paths for building AI products without cloud-only architecture.

Addressing concerns around data privacy and cloud dependency, Ardoino made the case for keeping AI tasks on local devices.

“People should be able to ask an AI assistant to read a long document or work through private information without every task being forced through a remote data center,” he said. TurboQuant, in that sense, gives local AI more operational room.

Tether’s strategy centers on AI that runs closer to users, across personal devices and decentralized networks. The company sees software efficiency and portability as defining factors in the next phase of AI development, alongside large-scale compute infrastructure.

The post Tether Brings Google’s TurboQuant to Production, Unlocking Long-Context AI on Everyday Devices appeared first on Blockonomi.


Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact crypto.news@mexc.com for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.
