This deep-dive analysis proves why SymTax works, showing its 'symbiotic' enricher and taxonomy fusion are essential for its state-of-the-art performance.This deep-dive analysis proves why SymTax works, showing its 'symbiotic' enricher and taxonomy fusion are essential for its state-of-the-art performance.

A Quantitative and Qualitative Analysis of the SymTax Citation Recommendation Model

Abstract and 1. Introduction

  1. Related Work

  2. Proposed Dataset

  3. SymTax Model

    4.1 Prefetcher

    4.2 Enricher

    4.3 Reranker

  4. Experiments and Results

  5. Analysis

    6.1 Ablation Study

    6.2 Quantitative Analysis and 6.3 Qualitative Analysis

  6. Conclusion

  7. Limitations

  8. Ethics Statement and References

Appendix

6 Analysis

We conduct extensive analysis to assess further the modularity of SymTax, the importance of different modules, combinatorial choice of LM and taxonomy fusion, and the usage of hyperbolic space over Euclidean space. Furthermore, we analysed the effect of using section heading as an additional signal (shown in Appendix A).

\

6.1 Ablation Study

We perform an ablation study to highlight the importance of Symbiosis, taxonomy fusion and hyperbolic space. We consider two variants of SymTax, namely SciBERTvector and SPECTERgraph. For each of these two variants, we further conduct three experiments by (i) removing the Enricher module that works on the principle of Symbiosis, (ii) not considering the taxonomy attribute associated with the citation context and (iii) using Euclidean space to calculate the separation score.

\ As evident from Table 3, Symbiosis exclusion results in a drop of 21.40% and 24.45% in Recall@5 and NDCG respectively for SciBERTvector whereas for SPECTERgraph, it leads to a drop of 17.84% and 20.32% in Recall@5 and NDCG respectively. Similarly, taxonomy exclusion results in a drop of 34.94% and 27.88% in Recall@5 and NDCG respectively for SciBERTvector whereas for SPECTERgraph, it leads to a drop of 14.81% and 12.51% in Recall@5 and NDCG respectively. It is clear from Table 3 that the use of Euclidean space instead of hyperbolic space leads to performance drop across all metrics in both variants. Exclusion of Symbiosis impacts higher recall metrics more in comparison to excluding taxonomy fusion and hyperbolic space.

\ Table 4: Analysis on choice of LM and taxonomy fusion on 10k random samples from ArSyTa. Best results are highlighted in bold and second best are italicised.

\

6.2 Quantitative Analysis

We consider two available LMs, i.e. SciBERT and SPECTER, and the two types of taxonomy fusion, i.e. graph-based and vector-based. This results in four variants, as shown in Table 4. As evident from the results, SciBERTvector and SPECTERgraph are the best-performing variants. So, the combinatorial choice of LM and taxonomy fusion plays a vital role in model performance. The above observations can be attributed to SciBERT being a LM trained on plain scientific text. In contrast, SPECTER is a LM trained with Triplet loss using 1-hop neighbours of the positive sample from the citation graph as hard negative samples. So, SPECTER embodies graph information inside itself, whereas SciBERT does not.

\

6.3 Qualitative Analysis

We assess the quality of recommendations given by different algorithms by randomly choosing an example. Though random, we choose the example that has multiple citations in a given context so that we can present the qualitative analysis well by investigating the top-10 ranked predictions. As shown in Table 5, we consider an excerpt from Liu et al. (2020) that contains five citations. As we can see that Symtax correctly recommend three citations in the top-10, whereas HAtten only recommend one citation correctly at rank 1 and BM25 only suggest one correct citation at rank 10. The use of title is crucial to performance, as we can see that many recommendations consist of the words “BERT" and “Pretraining", which are the keywords present in the title. One more observation is that the taxonomy plays a vital role in recommendations. The taxonomy category of the query is ‘Computation and Language‘, and most of the recommended articles are from the same category. SymTax gives only one recommendation (Deep Residual Learning for Image Recognition) from a different category, i.e.“Computer Vision", whereas HAtten recommends three citations from different categories, i.e. (Deep Residual Learning for Image Recognition) from “Computer Vision" and (Batch Normalization, and Adam) from “Machine Learning".

Table 5: The table shows the top-10 citation recommendations given by various algorithms for a randomly chosen example from ArSyTa. Valid predictions are highlighted in bold. It clearly shows that SymTax (SciBERT_vector) is able to recommend three valid articles in the top-10. In contrast, each of the HAtten and BM25 could recommend only one valid article for the given citation context. # denotes the rank of the recommended citations.

\

:::info Authors:

(1) Karan Goyal, IIIT Delhi, India (karang@iiitd.ac.in);

(2) Mayank Goel, NSUT Delhi, India (mayank.co19@nsut.ac.in);

(3) Vikram Goyal, IIIT Delhi, India (vikram@iiitd.ac.in);

(4) Mukesh Mohania, IIIT Delhi, India (mukesh@iiitd.ac.in).

:::


:::info This paper is available on arxiv under CC by-SA 4.0 Deed (Attribution-Sharealike 4.0 International) license.

:::

\

Market Opportunity
DeepBook Logo
DeepBook Price(DEEP)
$0.050804
$0.050804$0.050804
-1.94%
USD
DeepBook (DEEP) Live Price Chart
Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact service@support.mexc.com for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.

You May Also Like

Taiko Makes Chainlink Data Streams Its Official Oracle

Taiko Makes Chainlink Data Streams Its Official Oracle

The post Taiko Makes Chainlink Data Streams Its Official Oracle appeared on BitcoinEthereumNews.com. Key Notes Taiko has officially integrated Chainlink Data Streams for its Layer 2 network. The integration provides developers with high-speed market data to build advanced DeFi applications. The move aims to improve security and attract institutional adoption by using Chainlink’s established infrastructure. Taiko, an Ethereum-based ETH $4 514 24h volatility: 0.4% Market cap: $545.57 B Vol. 24h: $28.23 B Layer 2 rollup, has announced the integration of Chainlink LINK $23.26 24h volatility: 1.7% Market cap: $15.75 B Vol. 24h: $787.15 M Data Streams. The development comes as the underlying Ethereum network continues to see significant on-chain activity, including large sales from ETH whales. The partnership establishes Chainlink as the official oracle infrastructure for the network. It is designed to provide developers on the Taiko platform with reliable and high-speed market data, essential for building a wide range of decentralized finance (DeFi) applications, from complex derivatives platforms to more niche projects involving unique token governance models. According to the project’s official announcement on Sept. 17, the integration enables the creation of more advanced on-chain products that require high-quality, tamper-proof data to function securely. Taiko operates as a “based rollup,” which means it leverages Ethereum validators for transaction sequencing for strong decentralization. Boosting DeFi and Institutional Interest Oracles are fundamental services in the blockchain industry. They act as secure bridges that feed external, off-chain information to on-chain smart contracts. DeFi protocols, in particular, rely on oracles for accurate, real-time price feeds. Taiko leadership stated that using Chainlink’s infrastructure aligns with its goals. The team hopes the partnership will help attract institutional crypto investment and support the development of real-world applications, a goal that aligns with Chainlink’s broader mission to bring global data on-chain. Integrating real-world economic information is part of a broader industry trend. Just last week, Chainlink partnered with the Sei…
Share
BitcoinEthereumNews2025/09/18 03:34
Kalshi Prediction Markets Are Pulling In $1 Billion Monthly as State Regulators Loom

Kalshi Prediction Markets Are Pulling In $1 Billion Monthly as State Regulators Loom

The post Kalshi Prediction Markets Are Pulling In $1 Billion Monthly as State Regulators Loom appeared on BitcoinEthereumNews.com. In brief Kalshi reached $1 billion in monthly volume and now dominates 62% of the global prediction market industry, surpassing Polymarket’s 37% share. Four states including Massachusetts have filed lawsuits claiming Kalshi operates as an unlicensed sportsbook, with Massachusetts seeking to permanently bar the platform. Kalshi operates under federal CFTC regulation as a designated contract market, arguing this preempts state gambling laws that require separate licensing. Prediction market Kalshi just topped $1 billion in monthly volume as state regulators nip at its heels with lawsuits alleging that it’s an unregistered sports betting platform. “Despite being limited to only American customers, Kalshi has now risen to dominate the global prediction market industry,” the company said in a press release. “New data scraped from publicly available activity metrics details this rise.” The publicly available data appears on a Dune Analytics dashboard that’s been tracking prediction market notional volume. The data show that Kalshi now accounts for roughly 62% of global prediction market volume, Polymarket for 37%, and the rest split between Limitless and Myriad, the prediction market owned by Decrypt parent company Dastan. Trading volume on Kalshi skyrocketed in August, not coincidentally at the start of the NFL season and as the prediction market pushes further into sports.  But regulators in Maryland, Nevada, and New Jersey have all issued cease-and-desist orders, arguing Kalshi’s event contracts amount to unlicensed sports betting. Each case has spilled into federal court, with judges issuing preliminary rulings but no final decisions yet. Last week, Massachusetts went further, filing a lawsuit that calls Kalshi’s sports contracts “illegal and unsafe sports wagering.” The 43-page Massachusetts lawsuit seeks to stop the company from allowing state residents on its platform—much the way Coinbase has had to do with its staking offerings in parts of the United States. Massachusetts Attorney General…
Share
BitcoinEthereumNews2025/09/19 09:21
[Pastilan] End the confidential fund madness

[Pastilan] End the confidential fund madness

UPDATE RULES. Former Commission on Audit commissioner Heidi Mendoza speaks during a public forum.
Share
Rappler2026/01/16 14:02