
NVIDIA Enhances Training Throughput with NeMo-RL’s Megatron-Core



Ted Hisokawa
Aug 20, 2025 16:26

NVIDIA introduces Megatron-Core support in NeMo-RL v0.3, optimizing training throughput for large models with GPU-optimized techniques and enhanced parallelism.




NVIDIA has unveiled the latest iteration of its NeMo-RL framework, version 0.3, which incorporates support for Megatron-Core. This enhancement aims to optimize training throughput for large language models by leveraging GPU-optimized techniques and advanced parallelism strategies, according to NVIDIA’s official blog.

Challenges with Previous Backends

The initial release of NVIDIA NeMo-RL used PyTorch DTensor (FSDP2), offering native integration with the HuggingFace ecosystem and enabling quick experimentation through PyTorch’s native parallelisms. However, as model sizes grew to hundreds of billions of parameters, the DTensor path proved inadequate: significant recompute overhead and a lack of optimized NVIDIA CUDA kernels led to slow step times.

Introducing Megatron-Core

The Megatron-Core library addresses these limitations by providing GPU-optimized kernels and a more efficient path for training very large models. It employs a 6D parallelism strategy to optimize communication and computation patterns and supports a wide range of model architectures. The result is seamless training of massive language models with significantly higher throughput.
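
For orientation, the sketch below lists the six parallelism dimensions that Megatron-Core composes in this 6D scheme — data, tensor, pipeline, context, expert, and sequence parallelism. The key names are illustrative placeholders, not the exact NeMo-RL or Megatron-Core configuration schema.

```yaml
# Illustrative placeholder keys only -- not the exact NeMo-RL/Megatron-Core schema.
# Each entry corresponds to one of the six parallelism dimensions.
parallelism_sketch:
  data_parallel_size: 8             # replicate the model, shard the global batch
  tensor_model_parallel_size: 4     # split individual weight matrices across GPUs
  pipeline_model_parallel_size: 2   # split layers into pipeline stages
  context_parallel_size: 1          # shard very long sequences across GPUs
  expert_model_parallel_size: 1     # distribute MoE experts (MoE models only)
  sequence_parallel: true           # shard activations along the sequence dimension
```

Roughly speaking, the product of these group sizes determines how many GPUs a single model replica spans, with data parallelism replicating that arrangement across the rest of the cluster.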

Getting Started with Megatron-Core

Enabling Megatron-based training requires only a small addition to the training YAML configuration. NeMo-RL streamlines the process by handling the complex low-level tuning automatically and exposing a straightforward set of configuration options, which makes Megatron-Core easier to adopt and lets developers focus on their model training rather than on backend details. A sketch of what such a configuration addition might look like is shown below.
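
As a rough illustration, here is a minimal sketch that switches a policy model onto the Megatron backend. The `megatron_cfg` block and its nested keys are assumptions made for illustration; the authoritative key names and defaults are in the NeMo-RL documentation.

```yaml
# Minimal sketch (assumed key names) -- check the NeMo-RL docs for the real schema.
policy:
  model_name: "meta-llama/Llama-3.1-8B-Instruct"  # example HuggingFace checkpoint
  megatron_cfg:
    enabled: true                     # use the Megatron-Core backend instead of DTensor
    tensor_model_parallel_size: 2     # tune to model size and available GPUs
    pipeline_model_parallel_size: 1
    activation_checkpointing: true    # trade extra compute for lower memory use
```

Because NeMo-RL handles the low-level tuning, most users would only need to switch the backend on and adjust the parallelism sizes for their hardware.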

Performance Improvements

Megatron-based training supports both dense and Mixture of Experts (MoE) models. Benchmarks show higher training throughput with Megatron-Core than with PyTorch DTensor across model configurations such as Llama 3.1 8B and 70B, with the gains evident in faster step times and improved convergence properties.

Additional Features and Future Prospects

NeMo-RL v0.3 also introduces features such as async rollouts and non-colocated generation, expanding its capabilities. Looking ahead, NVIDIA plans to support larger MoE models and introduce further optimizations, including FP8 generation support and non-colocated generation with the Megatron-Core backend.
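
To give a sense of how such features typically surface to users, the sketch below shows how non-colocated generation and async rollouts might be toggled in a configuration file; the key names here are hypothetical and only convey the shape of the setting, not NeMo-RL's actual schema.

```yaml
# Hypothetical keys, for illustration only -- not NeMo-RL's actual configuration schema.
generation:
  colocated: false      # run generation on GPUs separate from the training workers
  async_rollouts: true  # overlap rollout generation with policy training steps
```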

The advancements in NeMo-RL with Megatron-Core backend mark a significant step forward in optimizing reinforcement learning for large-scale language models, ensuring both efficiency and scalability in model training.

Image source: Shutterstock


Source: https://blockchain.news/news/nvidia-enhances-training-throughput-nemo-rl-megatron-core
