
NVIDIA Run:ai GPU Fractioning Delivers 77% Throughput at Half Allocation

Darius Baruo Feb 18, 2026 18:31

NVIDIA and Nebius benchmarks show GPU fractioning achieves 86% user capacity on 0.5 GPU allocation, enabling 3x more concurrent users for mixed AI workloads.

NVIDIA's Run:ai platform can deliver 77% of full GPU throughput using just half the hardware allocation, according to joint benchmarking with cloud provider Nebius released February 18. The results demonstrate that enterprises running large language model inference can dramatically expand capacity without proportional GPU investment.

The tests, conducted on clusters with 64 NVIDIA H100 NVL GPUs and 32 NVIDIA HGX B200 GPUs, showed fractional GPU scheduling achieving near-linear performance scaling across 0.5, 0.25, and 0.125 allocations.

Hard Numbers from Production Testing

At 0.5 GPU allocation, the system supported 8,768 concurrent users while maintaining time-to-first-token under one second—86% of the 10,200 users supported at full allocation. Token generation hit 152,694 tokens per second, compared to 198,680 at full capacity.
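For readers who want to verify the ratios, a quick back-of-the-envelope calculation against the figures above reproduces the 86% and 77% headline numbers:

```python
# Sanity check of the reported ratios (raw figures from the benchmark summary).
full_users, half_users = 10_200, 8_768    # concurrent users at 1.0 vs 0.5 GPU
full_tps, half_tps = 198_680, 152_694     # tokens/second at 1.0 vs 0.5 GPU

print(f"user capacity retained: {half_users / full_users:.0%}")  # -> 86%
print(f"throughput retained:    {half_tps / full_tps:.0%}")      # -> 77%
```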

Smaller models pushed these gains further. Phi-4-Mini running on 0.25 GPU fractions handled 72% more concurrent users than full-GPU deployment, achieving approximately 450,000 tokens per second with P95 latency under 300 milliseconds on 32 GPUs.

The mixed-workload scenario proved most striking. Running Llama 3.1 8B, Phi-4-Mini, and Qwen-Embeddings simultaneously on fractional allocations tripled the total number of concurrent users the system could serve compared with single-model deployment. Combined throughput exceeded 350,000 tokens per second at full scale, with no cross-model interference.

Why This Matters for GPU Economics

Traditional Kubernetes schedulers allocate whole GPUs to individual models, leaving substantial capacity stranded. The benchmarks noted that even Qwen3-14B, the largest model tested at 14 billion parameters, occupies only 35% of an H100 NVL's 80GB capacity.
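That 35% figure is consistent with simple weight-memory arithmetic: a 14-billion-parameter model stored in 16-bit precision needs roughly 28 GB of weights, as the rough estimate below shows (KV cache and activation memory are ignored here for simplicity).

```python
# Rough weight-memory estimate for Qwen3-14B (KV cache and activations ignored).
params = 14e9          # 14 billion parameters
bytes_per_param = 2    # FP16 / BF16 weights
gpu_memory_gb = 80     # per-GPU capacity cited in the article

weights_gb = params * bytes_per_param / 1e9
print(f"weights ~ {weights_gb:.0f} GB = {weights_gb / gpu_memory_gb:.0%} of {gpu_memory_gb} GB")
# weights ~ 28 GB = 35% of 80 GB
```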

Run:ai's scheduler eliminates this waste through dynamic memory allocation. Users specify requirements directly; the system handles resource distribution without preconfiguration. Memory isolation is enforced at runtime, while compute cycles are shared fairly among active processes.
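In Kubernetes terms, "specifying requirements directly" amounts to attaching a fractional GPU request to the pod and handing placement to the Run:ai scheduler. The sketch below shows the general shape of such a request as a plain manifest dictionary; the gpu-fraction annotation key, scheduler name, and container image are illustrative assumptions, not details drawn from the benchmark.

```python
# Illustrative fractional-GPU pod request (annotation key, scheduler name, and
# image are assumptions for illustration; consult Run:ai docs for the exact schema).
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "llama-3-1-8b-inference",
        "annotations": {"gpu-fraction": "0.5"},  # request half of one GPU
    },
    "spec": {
        "schedulerName": "runai-scheduler",      # defer placement to the Run:ai scheduler
        "containers": [{
            "name": "inference-server",
            "image": "example.com/llm-inference:latest",  # placeholder image
        }],
    },
}
```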

The timing coincides with broader industry moves toward GPU partitioning. SoftBank and AMD announced validation testing on February 16 for similar fractioning capabilities on AMD Instinct GPUs, which allow a single GPU to be split into as many as eight logical devices.

Autoscaling Without Latency Spikes

Nebius tested automatic scaling with Llama 3.1 8B configured to add GPUs when concurrent users exceeded 50. Replicas scaled from 1 to 16 with clean ramp-up, stable utilization during pod warm-up, and negligible HTTP errors.
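The scaling rule itself is simple to state: keep roughly 50 concurrent users per replica, within a 1-to-16 replica range. The toy policy below is written only to make that described behavior concrete; the function and its parameters are not part of any Run:ai or Nebius API.

```python
import math

# Toy version of the scaling policy described in the test: one replica per
# ~50 concurrent users, clamped to the 1-16 replica range Nebius reported.
def target_replicas(concurrent_users: int,
                    users_per_replica: int = 50,
                    min_replicas: int = 1,
                    max_replicas: int = 16) -> int:
    needed = math.ceil(concurrent_users / users_per_replica)
    return max(min_replicas, min(max_replicas, needed))

print(target_replicas(40))    # 1 replica at light load
print(target_replicas(400))   # 8 replicas
print(target_replicas(1200))  # capped at 16 replicas
```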

The practical implication: enterprises can run multiple inference models on existing GPU inventory, scale dynamically during peak demand, and reclaim idle capacity during off-hours for other workloads. For organizations facing fixed GPU budgets, fractioning transforms capacity planning from hardware procurement into software configuration.

Run:ai v2.24 is available now. NVIDIA plans to discuss the Nebius implementation at GTC 2026.

