DeepSeek Introduces mHC Architecture to Improve Large Model Training

TLDR

  • DeepSeek introduced Manifold-Constrained Hyper-Connections (mHC) to improve large-model training scalability and efficiency.
  • The mHC method was tested on 3B, 9B, and 27B parameter models, showing stable performance without added computational cost.
  • mHC builds on ByteDance’s 2024 hyper-connection architecture by adding a manifold constraint to reduce memory overhead.
  • CEO Liang Wenfeng co-authored and uploaded the paper, reaffirming his direct involvement in DeepSeek’s technical development.
  • Industry observers expect a new DeepSeek model release ahead of Spring Festival 2026, based on the company’s publication patterns.

DeepSeek has released a new AI training method, Manifold-Constrained Hyper-Connections (mHC), in a paper uploaded to arXiv by CEO Liang Wenfeng. The architecture aims to improve training scalability for large models while keeping computational costs low. Researchers tested the method on models with 3, 9, and 27 billion parameters, showing consistent training efficiency. This comes as the company is expected to launch a new model before the Spring Festival in February 2026.

DeepSeek Builds on ResNet and Hyper-Connection Foundations

According to a report by SCMP, mHC enhances the hyper-connection (HC) design that ByteDance proposed in 2024 as an improvement on ResNet. ResNet makes deeper neural networks trainable by preserving signal strength across layers, but it struggles to keep learning efficient at large scale. ByteDance’s HC improved signal flow but did not fully address memory usage in larger models.
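The signal-preservation idea behind ResNet can be shown with a toy example. The sketch below is not taken from the DeepSeek or ByteDance papers: `toy_layer` and its 0.01 scale are hypothetical stand-ins for a learned transformation. It compares a plain stack, where each layer replaces the signal, with a residual stack, where each layer adds to it.

```python
def toy_layer(x, scale=0.01):
    # Hypothetical stand-in for a learned transformation F(x); it shrinks the signal.
    return [scale * v for v in x]

def plain_forward(x, depth):
    # Each layer replaces the signal: x <- F(x)
    for _ in range(depth):
        x = toy_layer(x)
    return x

def residual_forward(x, depth):
    # Each layer adds to the signal: x <- x + F(x)
    for _ in range(depth):
        fx = toy_layer(x)
        x = [a + b for a, b in zip(x, fx)]
    return x

def norm(x):
    # Euclidean length of the signal vector
    return sum(v * v for v in x) ** 0.5

x0 = [1.0, -2.0, 0.5]
plain_out = plain_forward(x0, depth=10)
residual_out = residual_forward(x0, depth=10)

print("input norm:   ", norm(x0))
print("plain norm:   ", norm(plain_out))
print("residual norm:", norm(residual_out))
```

In the plain stack the signal norm shrinks by a factor of 100 per layer, while the residual stack keeps it close to the input because the identity path carries the signal through.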

DeepSeek introduced a manifold constraint to limit expansion and better control memory and compute costs during training. This adjustment preserved the benefits of HC while making the network suitable for larger training tasks. Researchers wrote that mHC maintained performance without adding computational overhead when training at scale.
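The article does not say which manifold is used or how the constraint is enforced. The sketch below assumes, purely for illustration, that several parallel residual streams are mixed by a matrix constrained to be doubly stochastic (every row and column sums to 1), enforced here with Sinkhorn-style normalization. The function names, the score matrix, and the stream values are all hypothetical.

```python
import math

def sinkhorn(logits, iters=50):
    """Map a square matrix of raw scores to an (approximately) doubly stochastic matrix."""
    n = len(logits)
    m = [[math.exp(v) for v in row] for row in logits]  # make every entry positive
    for _ in range(iters):
        m = [[v / sum(row) for v in row] for row in m]  # rows sum to 1
        col = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col[j] for j in range(n)] for i in range(n)]  # columns sum to 1
    return m

def mix_streams(matrix, streams):
    """New stream i is a weighted sum of the old streams: sum_j matrix[i][j] * streams[j]."""
    n, d = len(streams), len(streams[0])
    return [[sum(matrix[i][j] * streams[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)]

def total(streams):
    """Sum of all entries across all streams."""
    return sum(sum(s) for s in streams)

# Hypothetical learned scores for mixing three residual streams (assumption).
raw = [[0.2, -1.0, 0.5], [1.5, 0.3, -0.7], [-0.4, 0.9, 0.1]]
unconstrained = [[math.exp(v) for v in row] for row in raw]
constrained = sinkhorn(raw)

streams_u = streams_c = [[1.0, 2.0], [0.5, 1.0], [3.0, 0.5]]
for _ in range(30):  # 30 layers of stream mixing only, with no layer function
    streams_u = mix_streams(unconstrained, streams_u)
    streams_c = mix_streams(constrained, streams_c)

print("unconstrained total:", total(streams_u))  # grows very large
print("constrained total:  ", total(streams_c))  # stays at the initial total
```

With column sums fixed at 1, the total signal across streams is conserved from layer to layer, whereas the unconstrained matrix amplifies it at every step. This is only one way a constraint on the mixing matrix can keep a deep stack stable, and it should not be read as DeepSeek's actual formulation.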

Lead authors Zhenda Xie, Yixuan Wei, and Huanqi Cao explained that the system enables stable training of deep networks without collapse. They said mHC works with minimal infrastructure adjustments, making it efficient for broader deployment. The architecture was tested across multiple model sizes, which supports the technique’s adaptability and reliability. DeepSeek reported that the method handled signal preservation and scalability better than previous HC-based frameworks.

Liang Wenfeng Directly Leads Technical Advancement

CEO Liang Wenfeng was listed as the final author and uploaded the paper himself, continuing his role in major DeepSeek research. He has consistently uploaded to arXiv the technical papers linked to the company’s top models, such as R1 and V3. Other researchers typically upload supporting studies not directly tied to product development.

His involvement in this paper signals continued leadership in the company’s core AI work. The release underscores DeepSeek’s approach of linking internal research closely with future product direction. Florian Brand, a PhD researcher at Trier University, said DeepSeek papers often indicate what models are coming next.

He noted that the R1 model followed a similar pattern of publication and then launch. Liang’s involvement has again drawn attention from analysts watching DeepSeek’s release schedule. The company has not announced a date, but its publication strategy has become predictable. DeepSeek has remained quiet on details, but research uploads suggest new systems are under development.

The post DeepSeek Introduces mHC Architecture to Improve Large Model Training appeared first on Blockonomi.
