Details BSGAL's implementation on the LVIS dataset using CenterNet2 with ResNet-50/Swin-L backbones.

Technical Details: BSGAL Training, Swin-L Backbone, and Dynamic Threshold Strategy


Abstract and 1 Introduction

  2. Related work

    2.1. Generative Data Augmentation

    2.2. Active Learning and Data Analysis

  3. Preliminary

  4. Our method

    4.1. Estimation of Contribution in the Ideal Scenario

    4.2. Batched Streaming Generative Active Learning

  5. Experiments and 5.1. Offline Setting

    5.2. Online Setting

  6. Conclusion, Broader Impact, and References

A. Implementation Details

B. More ablations

C. Discussion

D. Visualization

A. Implementation Details

A.1. Dataset

We choose LVIS (Gupta et al., 2019) as the dataset for our experiments. LVIS is a large-scale instance segmentation dataset comprising approximately 160,000 images with over 2 million high-quality instance segmentation annotations across 1203 real-world categories. The categories are further divided into three groups, rare, common, and frequent, based on how many images they appear in: ‘rare’ categories appear in 1-10 images, ‘common’ categories in 11-100 images, and ‘frequent’ categories in more than 100 images. The overall dataset exhibits a long-tail distribution, closely resembling the data distribution in the real world, and is widely used in multiple settings, including few-shot segmentation (Liu et al., 2023) and open-world segmentation (Wang et al., 2022; Zhu et al., 2023). We therefore believe that selecting LVIS better reflects the model’s performance in real-world scenarios. We use the official LVIS dataset splits, with about 100,000 images in the training set and 20,000 images in the validation set.
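
For concreteness, the rare/common/frequent split above amounts to bucketing each category by the number of images it appears in. A minimal sketch follows; the function name is ours for illustration, not part of LVIS or the paper.

```python
def frequency_group(image_count: int) -> str:
    """Bucket an LVIS category by the number of training images it appears in,
    following the rare/common/frequent split described above."""
    if image_count <= 10:       # 1-10 images
        return "rare"
    if image_count <= 100:      # 11-100 images
        return "common"
    return "frequent"           # more than 100 images
```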

A.2. Data Generation

Our data generation and annotation process is consistent with Zhao et al. (2023); we briefly introduce it here. We first use Stable Diffusion V1.5 (Rombach et al., 2022a) (SD) as the generative model. For the 1203 categories in LVIS (Gupta et al., 2019), we generate 1000 images per category at a resolution of 512 × 512. The prompt template for generation is “a photo of a single {CATEGORY NAME}”. We use U2Net (Qin et al., 2020), SelfReformer (Yun and Lin, 2022), UFO (Su et al., 2023), and CLIPSeg (Lüddecke and Ecker, 2022) to annotate the raw generated images, and select the mask with the highest CLIP score as the final annotation. To ensure data quality, images with CLIP scores below 0.21 are filtered out as low-quality images. During training, we also employ the instance paste strategy provided by Zhao et al. (2023) for data augmentation: each instance is randomly resized to match the size distribution of its category in the training set, and the maximum number of pasted instances per image is set to 20.
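
As a rough illustration of this pipeline, the sketch below generates images for one category and applies the CLIP-score filter (the mask annotation step with the four segmenters is omitted). It assumes the diffusers and transformers libraries; the specific SD and CLIP checkpoints and the function name `generate_and_filter` are illustrative choices of ours, not the authors' released code. Only the prompt template, resolution, and the 0.21 threshold come from the text above.

```python
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
sd = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def generate_and_filter(category: str, n_images: int = 1000, threshold: float = 0.21):
    """Generate images for one category and keep those whose CLIP score
    (image-text cosine similarity) is at least `threshold`."""
    prompt = f"a photo of a single {category}"
    kept = []
    for _ in range(n_images):
        image = sd(prompt, height=512, width=512).images[0]
        inputs = processor(text=[prompt], images=image, return_tensors="pt").to(device)
        with torch.no_grad():
            out = clip(**inputs)
            score = torch.nn.functional.cosine_similarity(
                out.image_embeds, out.text_embeds
            ).item()
        if score >= threshold:  # filter out low-quality generations
            kept.append((image, score))
    return kept
```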

In addition, to further expand the diversity of the generated data and make our study more general, we also use other generative models, including DeepFloyd-IF (Shonenkov et al., 2023) (IF) and Perfusion (Tewel et al., 2023) (PER), with 500 images per category per model. For IF, we use the pre-trained model provided by the authors, and the generated images are the output of Stage II, with a resolution of 256 × 256. For PER, the base model is Stable Diffusion V1.5; for each category, we fine-tune the model for 400 steps on images cropped from the training set, and then use the fine-tuned model to generate images.

Table 7. Comparison of different generated data.

We also explore the effect of different generated data on model performance (see Table 7). Building on the original Stable Diffusion V1.5 data, adding the other generative models brings some performance improvement, but the gain is modest. Broken down by category frequency, IF gives a larger improvement on rare categories, while PER helps more on common categories. This is likely because the IF data is more diverse, whereas the PER data better matches the distribution of the training set. Since the overall performance improves to some extent, we adopt the SD + IF + PER generated data for subsequent experiments.

A.3. Model Training

Following Zhao et al. (2023), we use CenterNet2 (Zhou et al., 2021) as our segmentation model, with ResNet-50 (He et al., 2016) or Swin-L (Liu et al., 2022) as the backbone. For ResNet-50, the maximum number of training iterations is set to 90,000, and the model is initialized with weights first pretrained on ImageNet-22k and then finetuned on LVIS (Gupta et al., 2019), as in Zhao et al. (2023); we train on 4 Nvidia 4090 GPUs with a batch size of 16. For Swin-L, the maximum number of training iterations is set to 180,000, and the model is initialized with weights pretrained on ImageNet-22k, since our early experiments showed that this initialization brings a slight improvement over weights trained on LVIS; we train on 4 Nvidia A100 GPUs with a batch size of 16. In addition, because Swin-L has a large number of parameters and storing its gradients would require substantial extra memory, we use Algorithm 2 in practice.

Figure 5. Model performances when using different amounts of generated data.

All other unspecified parameters follow the same settings as X-Paste (Zhao et al., 2023), such as the AdamW (Loshchilov and Hutter, 2017) optimizer with an initial learning rate of 1e-4.
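
For orientation only, the solver settings above correspond roughly to the following detectron2-style config values. These are standard detectron2 keys, not the authors' released CenterNet2 config; the checkpoint path is a placeholder, and AdamW is configured by the project code rather than by a base detectron2 key.

```python
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.SOLVER.IMS_PER_BATCH = 16      # total batch size across 4 GPUs
cfg.SOLVER.MAX_ITER = 90_000       # 180_000 when using the Swin-L backbone
cfg.SOLVER.BASE_LR = 1e-4          # initial learning rate for the AdamW optimizer
cfg.MODEL.WEIGHTS = "pretrained/imagenet22k_lvis_init.pth"  # placeholder path
```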

A.4. Data Amount

In this work, we have generated over 2 million images. Figure 5 shows the model performance when using different amounts of generated data (1%, 10%, 40%, 70%, and 100%). Overall, as the amount of generated data increases, the model's performance also improves, though with some fluctuation. Our method consistently outperforms the baseline, which demonstrates its effectiveness and robustness.

A.5. Contribution Estimation

When estimating the contribution, we normalize the gradients before taking their inner product, so we essentially compute a cosine similarity. We compare the two choices experimentally in Table 8.

Table 8. Comparison of using gradient normalization or not.

Figure 6. Illustration of noisy images exhibiting various noise scales and categories. Each row, from top to bottom, corresponds to a different noise level: 0, 40, 100, 200, and 400, respectively. All images are sourced from the CIFAR-10 dataset.

From Table 8, we can see that normalizing the gradient yields a certain improvement for our method. In addition, since we need to maintain two different thresholds, it is difficult to keep the acceptance rate consistent. We therefore adopt a dynamic threshold strategy: we pre-set a target acceptance rate, maintain a queue that stores the contribution scores from previous iterations, and dynamically adjust the threshold according to this queue so that the realized acceptance rate stays at the pre-set value.
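
A minimal sketch of this dynamic threshold strategy follows, under our own assumptions about the interface: contribution scores arrive batch by batch, the queue is a fixed-length sliding window, and the threshold is recomputed as a quantile of the queued scores. The class, function, and parameter names are illustrative, not the authors' implementation.

```python
from collections import deque
import numpy as np

def contribution(grad_a: np.ndarray, grad_b: np.ndarray) -> float:
    """Cosine similarity between two flattened gradient vectors
    (the normalized inner product discussed above)."""
    return float(np.dot(grad_a, grad_b) /
                 (np.linalg.norm(grad_a) * np.linalg.norm(grad_b) + 1e-12))

class DynamicThreshold:
    """Keep the realized acceptance rate near a preset target by recomputing
    the threshold from a sliding window of recent contribution scores."""

    def __init__(self, accept_rate: float = 0.5, window: int = 1024):
        self.accept_rate = accept_rate
        self.scores = deque(maxlen=window)   # contributions from recent iterations
        self.threshold = 0.0                 # initial value before the queue fills

    def update(self, batch_scores) -> float:
        self.scores.extend(batch_scores)
        # Accept roughly the top `accept_rate` fraction of recent scores.
        self.threshold = float(np.quantile(list(self.scores), 1.0 - self.accept_rate))
        return self.threshold

    def accept(self, score: float) -> bool:
        return score >= self.threshold
```

With `accept_rate=0.3`, for example, roughly the top 30% of recent batches by contribution score would be accepted, regardless of how the score distribution drifts during training.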

A.6. Toy Experiment

The following are the specific experimental settings on CIFAR-10. We employ a simple ResNet-18 as the baseline model and train it for 200 epochs; the accuracy after training on the original training set is 93.02%. The learning rate is set to 0.1, using the SGD optimizer with a momentum of 0.9 and a weight decay of 5e-4, together with a cosine annealing learning rate scheduler. The constructed noisy images are depicted in Figure 6: image quality declines as the noise level increases, and when the noise level reaches 200 the images become significantly harder to identify. For Table 1, we use Split1 as R, while G consists of ‘Split2 + Noise40’, ‘Split3 + Noise100’, ‘Split4 + Noise200’, and ‘Split5 + Noise400’.
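
A self-contained sketch of this toy setup is given below. The optimizer, scheduler, learning rate, and epoch count match the text above, while the data augmentation, batch size, and the use of the stock torchvision ResNet-18 are our own assumptions (CIFAR-style variants usually also replace the first 7x7 convolution and max-pool).

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

# Standard CIFAR-10 pipeline (augmentation and batch size are assumptions).
transform = T.Compose([T.RandomCrop(32, padding=4), T.RandomHorizontalFlip(), T.ToTensor()])
train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torchvision.models.resnet18(num_classes=10).to(device)

# Settings from the text: SGD, lr 0.1, momentum 0.9, weight decay 5e-4,
# cosine annealing over 200 epochs.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
criterion = nn.CrossEntropyLoss()

for epoch in range(200):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```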

A.7. A Simplification: Only Forward Once


:::info Authors:

(1) Muzhi Zhu, Zhejiang University, China (equal contribution);

(2) Chengxiang Fan, Zhejiang University, China (equal contribution);

(3) Hao Chen, Zhejiang University, China (haochen.cad@zju.edu.cn);

(4) Yang Liu, Zhejiang University, China;

(5) Weian Mao, Zhejiang University, China and The University of Adelaide, Australia;

(6) Xiaogang Xu, Zhejiang University, China;

(7) Chunhua Shen, Zhejiang University, China (chunhuashen@zju.edu.cn).

:::


:::info This paper is available on arxiv under CC BY-NC-ND 4.0 Deed (Attribution-Noncommercial-Noderivs 4.0 International) license.

:::
