Mixture-of-Adaptations (MoA) introduces stochastic routing, consistency regularization, and module merging to make large language model fine-tuning more parameter-efficient. By randomly routing inputs across adaptation modules, then merging or averaging their weights, MoA reduces FLOPs and computational cost without sacrificing performance. This approach connects with Bayesian inference and model ensembling, offering a robust yet efficient path to adapting LLMs.Mixture-of-Adaptations (MoA) introduces stochastic routing, consistency regularization, and module merging to make large language model fine-tuning more parameter-efficient. By randomly routing inputs across adaptation modules, then merging or averaging their weights, MoA reduces FLOPs and computational cost without sacrificing performance. This approach connects with Bayesian inference and model ensembling, offering a robust yet efficient path to adapting LLMs.

How Mixture-of-Adaptations Makes Language Model Fine-Tuning Cheaper and Smarter

Abstract and 1. Introduction

  1. Background

    2.1 Mixture-of-Experts

    2.2 Adapters

  2. Mixture-of-Adaptations

    3.1 Routing Policy

    3.2 Consistency regularization

    3.3 Adaptation module merging and 3.4 Adaptation module sharing

    3.5 Connection to Bayesian Neural Networks and Model Ensembling

  3. Experiments

    4.1 Experimental Setup

    4.2 Key Results

    4.3 Ablation Study

  4. Related Work

  5. Conclusions

  6. Limitations

  7. Acknowledgment and References

Appendix

A. Few-shot NLU Datasets B. Ablation Study C. Detailed Results on NLU Tasks D. Hyper-parameter

3 Mixture-of-Adaptations

\

3.1 Routing Policy

Recent work like THOR (Zuo et al., 2021) has demonstrated stochastic routing policy like random routing to work as well as classical routing mechanism like Switch routing (Fedus et al., 2021) with the following benefits. Since input examples are randomly routed to different experts, there is no requirement for additional load balancing as each expert has an equal opportunity of being activated simplifying the framework. Further, there are no added parameters, and therefore no additional computation, at the Switch layer for expert selection. The latter is particularly important in our setting for parameter-efficient fine-tuning to keep the parameters and FLOPs the same as that of a single adaptation module. To analyze the working of AdaMix, we demonstrate connections to stochastic routing and model weight averaging to Bayesian Neural Networks and model ensembling in Section 3.5.

\ \

\ \ Such stochastic routing enables adaptation modules to learn different transformations during training and obtain multiple views of the task. However, this also creates a challenge on which modules to use during inference due to random routing protocol during training. We address this challenge with the following two techniques that further allow us to collapse adaptation modules and obtain the same computational cost (FLOPs, #tunable adaptation parameters) as that of a single module.

3.2 Consistency regularization

\

\ \ \

3.3 Adaptation module merging

While the above regularization mitigates inconsistency in random module selection during inference, it still results in increased serving cost to host several adaptation modules. Prior works in fine-tuning language models for downstream tasks have shown improved performance on averaging the weights of different models fine-tuned with different random seeds outperforming a single fine-tuned model. Recent work (Wortsman et al., 2022) has also shown that differently fine-tuned models from the same initialization lie in the same error basin motivating the use of weight aggregation for robust task summarization. We adopt and extend prior techniques for language model fine-tuning to our parameterefficient training of multi-view adaptation modules

\ \

\

3.4 Adaptation module sharing

\

3.5 Connection to Bayesian Neural Networks and Model Ensemblin

\

\ \ This requires averaging over all possible model weights, which is intractable in practice. Therefore, several approximation methods have been developed based on variational inference methods and stochastic regularization techniques using dropouts. In this work, we leverage another stochastic regularization in the form of random routing. Here, the objective is to find a surrogate distribution qθ(w) in a tractable family of distributions that can replace the true model posterior that is hard to compute. The ideal surrogate is identified by minimizing the Kullback-Leibler (KL) divergence between the candidate and the true posterior.

\ \

\ \ \

\ \ \

\ \ \ \

:::info Authors:

(1) Yaqing Wang, Purdue University (wang5075@purdue.edu);

(2) Sahaj Agarwal, Microsoft (sahagar@microsoft.com);

(3) Subhabrata Mukherjee, Microsoft Research (submukhe@microsoft.com);

(4) Xiaodong Liu, Microsoft Research (xiaodl@microsoft.com);

(5) Jing Gao, Purdue University (jinggao@purdue.edu);

(6) Ahmed Hassan Awadallah, Microsoft Research (hassanam@microsoft.com);

(7) Jianfeng Gao, Microsoft Research (jfgao@microsoft.com).

:::


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::

\

Market Opportunity
FINE Logo
FINE Price(FINE)
$0.0000000007713
$0.0000000007713$0.0000000007713
-0.16%
USD
FINE (FINE) Live Price Chart
Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact service@support.mexc.com for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.

You May Also Like

U.S. Court Finds Pastor Found Guilty in $3M Crypto Scam

U.S. Court Finds Pastor Found Guilty in $3M Crypto Scam

The post U.S. Court Finds Pastor Found Guilty in $3M Crypto Scam appeared on BitcoinEthereumNews.com. Crime 18 September 2025 | 04:05 A Colorado judge has brought closure to one of the state’s most unusual cryptocurrency scandals, declaring INDXcoin to be a fraudulent operation and ordering its founders, Denver pastor Eli Regalado and his wife Kaitlyn, to repay $3.34 million. The ruling, issued by District Court Judge Heidi L. Kutcher, came nearly two years after the couple persuaded hundreds of people to invest in their token, promising safety and abundance through a Christian-branded platform called the Kingdom Wealth Exchange. The scheme ran between June 2022 and April 2023 and drew in more than 300 participants, many of them members of local church networks. Marketing materials portrayed INDXcoin as a low-risk gateway to prosperity, yet the project unraveled almost immediately. The exchange itself collapsed within 24 hours of launch, wiping out investors’ money. Despite this failure—and despite an auditor’s damning review that gave the system a “0 out of 10” for security—the Regalados kept presenting it as a solid opportunity. Colorado regulators argued that the couple’s faith-based appeal was central to the fraud. Securities Commissioner Tung Chan said the Regalados “dressed an old scam in new technology” and used their standing within the Christian community to convince people who had little knowledge of crypto. For him, the case illustrates how modern digital assets can be exploited to replicate classic Ponzi-style tactics under a different name. Court filings revealed where much of the money ended up: luxury goods, vacations, jewelry, a Range Rover, high-end clothing, and even dental procedures. In a video that drew worldwide attention earlier this year, Eli Regalado admitted the funds had been spent, explaining that a portion went to taxes while the remainder was used for a home renovation he claimed was divinely inspired. The judgment not only confirms that INDXcoin qualifies as a…
Share
BitcoinEthereumNews2025/09/18 09:14
MSCI’s Proposal May Trigger $15B Crypto Outflows

MSCI’s Proposal May Trigger $15B Crypto Outflows

MSCI's plan to exclude crypto-treasury companies could cause $15B outflows, impacting major firms.
Share
CoinLive2025/12/19 13:17
This U.S. politician’s suspicious stock trade just returned over 200% in weeks

This U.S. politician’s suspicious stock trade just returned over 200% in weeks

The post This U.S. politician’s suspicious stock trade just returned over 200% in weeks appeared on BitcoinEthereumNews.com. United States Representative Cloe Fields has seen his stake in Opendoor Technologies (NASDAQ: OPEN) stock return over 200% in just a matter of weeks. According to congressional trade filings, the lawmaker purchased a stake in the online real estate company on July 21, 2025, investing between $1,001 and $15,000. At the time, the stock was trading around $2 and had been largely stagnant for months. Receive Signals on US Congress Members’ Stock Trades Stocks Stay up-to-date on the trading activity of US Congress members. The signal triggers based on updates from the House disclosure reports, notifying you of their latest stock transactions. Enable signal The trade has since paid off, with Opendoor surging to $10, a gain of nearly 220% in under two months. By comparison, the broader S&P 500 index rose less than 5% during the same period. OPEN one-week stock price chart. Source: Finbold Assuming he invested a minimum of $1,001, the purchase would now be worth about $3,200, while a $15,000 stake would have grown to nearly $48,000, generating profits of roughly $2,200 and $33,000, respectively. OPEN’s stock rally Notably, Opendoor’s rally has been fueled by major corporate shifts and market speculation. For instance, in August, the company named former Shopify COO Kaz Nejatian as CEO, while co-founders Keith Rabois and Eric Wu rejoined the board, moves seen as a return to the company’s early innovative spirit.  Outgoing CEO Carrie Wheeler’s resignation and sale of millions in stock reinforced the sense of a new chapter. Beyond leadership changes, Opendoor’s surge has taken on meme-stock characteristics. In this case, retail investors piled in as shares climbed, while short sellers scrambled to cover, pushing prices higher.  However, the stock is still not without challenges, where its iBuying model is untested at scale, margins are thin, and debt tied to…
Share
BitcoinEthereumNews2025/09/18 04:02