This article introduces a novel knowledge consolidation strategy for instance-incremental learning (IIL) that uses an exponential moving average (EMA) to transfer learned knowledge from the student model to the teacher model.

Model Promotion: Using EMA to Balance Learning and Forgetting in IIL

2025/11/06 00:30

Abstract and 1. Introduction

  2. Related works

  3. Problem setting

  4. Methodology

    4.1. Decision boundary-aware distillation

    4.2. Knowledge consolidation

  5. Experimental results and 5.1. Experiment setup

    5.2. Comparison with SOTA methods

    5.3. Ablation study

  6. Conclusion and future work and References

Supplementary Material

  1. Details of the theoretical analysis on KCEMA mechanism in IIL
  2. Algorithm overview
  3. Dataset details
  4. Implementation details
  5. Visualization of dusted input images
  6. More experimental results

4.2. Knowledge consolidation

Unlike existing IIL methods, which focus only on the student model, we propose to consolidate knowledge from the student to the teacher to better balance learning and forgetting. The consolidation is implemented not through learning but through a model exponential moving average (EMA). Model EMA was originally introduced by Tarvainen et al. [28] to improve the generalization of models. In vanilla model EMA, the model is trained from scratch and EMA is applied after every iteration. However, the underlying mechanism of model EMA has not been thoroughly explained. In this work, we leverage model EMA for knowledge consolidation (KC) in the IIL task and explain its mechanism theoretically. Based on this analysis, we propose a new KC-EMA for knowledge consolidation. Mathematically, the model EMA can be formulated as

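$$\theta_t \leftarrow \alpha\,\theta_t + (1-\alpha)\,\theta_s,$$

where $\theta_t$ and $\theta_s$ are the parameters of the teacher and student models, respectively, and $\alpha \in (0, 1)$ is the EMA decay rate.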

Hence, the teacher model can achieve a low training loss on both the old task and the new task, which indicates improved generalization on both the old data and the new observations. This is verified by our experiments in Sec. 5. However, because α < 1, the gradient of the teacher model on the old task is larger than the initial gradient on the old task, and its gradient on the new task is larger than the final gradient of the student model on the new task. In other words, the teacher model sacrifices some performance on the old data alone or on the new data alone in exchange for better generalization across both. This perspective also partially explains the mechanism of vanilla EMA. In vanilla EMA, where the model is trained from scratch and only the new task is considered, only the second term in Equation 13 matters. Because the teacher model has a larger gradient on the training data than the student model, it is less likely to overfit the training data. As a result, the teacher model generalizes better, as Tarvainen et al. [28] observed.
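To make this trade-off concrete, the sketch below runs a hypothetical one-dimensional IIL scenario with two quadratic losses: an old task minimized at `A` and a new task minimized at `B`. It is an illustration under these assumptions, not the authors' KC-EMA or their training setup. The loss functions, `A`, `B`, the learning rate, the decay `alpha`, the step count and all function names are chosen for illustration only. The teacher starts at the old-task optimum, the student is copied from it and trained with plain gradient descent on the new task only, and after each student step the teacher is updated as `alpha * teacher + (1 - alpha) * student`.

```python
def ema_update(teacher, student, alpha):
    """Model EMA step: teacher <- alpha * teacher + (1 - alpha) * student.

    Works element-wise for floats or NumPy arrays of parameters.
    """
    return alpha * teacher + (1.0 - alpha) * student


def old_loss(w, a):
    """Hypothetical old-task loss with its minimum at w = a."""
    return 0.5 * (w - a) ** 2


def new_loss(w, b):
    """Hypothetical new-task loss with its minimum at w = b."""
    return 0.5 * (w - b) ** 2


def simulate(a, b, alpha=0.99, lr=0.1, steps=50):
    """Toy 1-D IIL run: the teacher starts at the old-task optimum a, the
    student is copied from it and trained by gradient descent on the new task
    only, and the teacher is updated by EMA after every student step."""
    teacher = a
    student = a
    for _ in range(steps):
        student -= lr * (student - b)  # gradient of new_loss at the student
        teacher = ema_update(teacher, student, alpha)
    return teacher, student


A, B = 0.0, 1.0  # hypothetical old- and new-task optima
teacher, student = simulate(A, B)


def total_loss(w):
    """Sum of the two toy task losses; minimized halfway between A and B."""
    return old_loss(w, A) + new_loss(w, B)


# For these quadratics the gradient magnitude is the distance to each optimum.
print(f"teacher={teacher:.3f}  student={student:.3f}")
print(f"|old-task grad| of teacher: {abs(teacher - A):.3f} (0 at the start)")
print(f"|new-task grad|: student={abs(student - B):.3f}  teacher={abs(teacher - B):.3f}")
print(f"old+new loss: start={total_loss(A):.3f}  "
      f"student={total_loss(student):.3f}  teacher={total_loss(teacher):.3f}")
```

With these settings the teacher ends up between the two optima: its old-task gradient is no longer zero and its new-task gradient exceeds the student's, yet its summed old+new loss is lower than that of both the starting point and the final student. Treating `A` as an arbitrary initialization and ignoring the old-task loss gives a rough analogue of the vanilla-EMA case, where the lagging teacher keeps a larger training gradient than the student.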


:::info Authors:

(1) Qiang Nie, Hong Kong University of Science and Technology (Guangzhou);

(2) Weifu Fu, Tencent Youtu Lab;

(3) Yuhuan Lin, Tencent Youtu Lab;

(4) Jialin Li, Tencent Youtu Lab;

(5) Yifeng Zhou, Tencent Youtu Lab;

(6) Yong Liu, Tencent Youtu Lab;

(7) Qiang Nie, Hong Kong University of Science and Technology (Guangzhou);

(8) Chengjie Wang, Tencent Youtu Lab.

:::


:::info This paper is available on arxiv under CC BY-NC-ND 4.0 Deed (Attribution-Noncommercial-Noderivs 4.0 International) license.

:::

