This paper explores the critical safety risk of typographic attacks against Vision-Large-Language-Models (Vision-LLMs) integrated into autonomous driving (AD) systems.

Typographic Attacks on Vision-LLMs: Evaluating Adversarial Threats in Autonomous Driving Systems

Abstract and 1. Introduction

  2. Related Work

    2.1 Vision-LLMs

    2.2 Transferable Adversarial Attacks

  3. Preliminaries

    3.1 Revisiting Auto-Regressive Vision-LLMs

    3.2 Typographic Attacks in Vision-LLMs-based AD Systems

  4. Methodology

    4.1 Auto-Generation of Typographic Attack

    4.2 Augmentations of Typographic Attack

    4.3 Realizations of Typographic Attacks

  5. Experiments

  6. Conclusion and References

Abstract

Vision-Large-Language-Models (Vision-LLMs) are increasingly being integrated into autonomous driving (AD) systems due to their advanced visual-language reasoning capabilities, targeting the perception, prediction, planning, and control mechanisms. However, Vision-LLMs have been shown to be susceptible to various types of adversarial attacks, which can compromise their reliability and safety. To further explore the risk in AD systems and the transferability of practical threats, we propose to leverage typographic attacks against AD systems that rely on the decision-making capabilities of Vision-LLMs. Unlike the few existing works that develop general datasets of typographic attacks, this paper focuses on realistic traffic scenarios where these attacks can be deployed, on their potential effects on decision-making autonomy, and on the practical ways in which these attacks can be physically presented. To achieve these goals, we first propose a dataset-agnostic framework for automatically generating false answers that can mislead Vision-LLMs’ reasoning. Then, we present a linguistic augmentation scheme that facilitates attacks on image-level and region-level reasoning, and we extend it with attack patterns against multiple reasoning tasks simultaneously. Based on these, we conduct a study on how these attacks can be realized in physical traffic scenarios. Through our empirical study, we evaluate the effectiveness, transferability, and realizability of typographic attacks in traffic scenes. Our findings demonstrate that typographic attacks are particularly harmful to existing Vision-LLMs (e.g., LLaVA, Qwen-VL, VILA, and Imp), thereby raising community awareness of vulnerabilities when incorporating such models into AD systems. We will release our source code upon acceptance.

1 Introduction

Vision-Large-Language-Models (Vision-LLMs) have seen rapid development in recent years [1, 2, 3], and their incorporation into autonomous driving (AD) systems has been seriously considered by both industry and academia [4, 5, 6, 7, 8, 9]. The integration of Vision-LLMs into AD systems showcases their ability to convey explicit reasoning steps to road users on the fly and to satisfy the need for textual justifications of traffic scenarios regarding perception, prediction, planning, and control, particularly in safety-critical circumstances in the physical world. The core strength of Vision-LLMs lies in their auto-regressive capabilities, acquired through large-scale pretraining with visual-language alignment [1], which enable them to perform zero-shot optical character recognition, grounded reasoning, visual question answering, visual-language reasoning, and more. Nevertheless, despite their impressive capabilities, Vision-LLMs are not impervious to adversarial attacks that can misdirect their reasoning processes [10]. Any successful attack strategy has the potential to pose critical problems when deploying Vision-LLMs in AD systems, especially one that can bypass the models’ black-box characteristics. As a step towards their reliable adoption in AD, studying the transferability of adversarial attacks is crucial both for raising awareness of practical threats against deployed Vision-LLMs and for building appropriate defense strategies for them.

In this work, we revisit the shared auto-regressive characteristic of different Vision-LLMs and turn that strength into a weakness by leveraging typographic forms of adversarial attacks, also known as typographic attacks. Typographic attacks were first studied in the context of the well-known Contrastive Language-Image Pre-training (CLIP) model [11, 12]. Early works in this area focused on developing a general typographic attack dataset targeting multiple-choice answering (such as object recognition, visual attribute detection, and commonsense answering) and enumeration [13]. Researchers also explored multiple-choice self-generating attacks against zero-shot classification [14], and proposed several defense mechanisms, including keyword-training [15] and prompting the model for detailed reasoning [16]. Despite these initial efforts, existing methodologies have neither offered a comprehensive attack framework nor been explicitly designed to investigate the impact of typographic attacks on safety-critical systems, particularly those in AD scenarios.
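
For reference, the auto-regressive characteristic mentioned above can be written in a generic form. The notation here is an assumption made for illustration (v for the visual tokens of the scene, q for the prompt tokens, y for the generated answer of length T) and is not taken from the paper's own formulation:

```latex
p_\theta(y \mid v, q) = \prod_{t=1}^{T} p_\theta\left(y_t \mid v, q, y_{<t}\right)
```

Every factor conditions on v, so text that appears in the scene enters the context of every generated token. This is the dependency a typographic attack tries to exploit.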

Our work aims to fill this research gap by studying typographic attacks from the perspective of AD systems that incorporate Vision-LLMs. In summary, our scientific contributions are threefold:

• Dataset-Independent Framework: we introduce a dataset-independent framework designed to automatically generate misleading answers that can disrupt the reasoning processes of Vision-LLMs.

• Linguistic Augmentation Scheme: we develop a linguistic augmentation scheme aimed at facilitating stronger typographic attacks on Vision-LLMs. This scheme targets reasoning at both the image and region levels and extends to multiple reasoning tasks simultaneously.

• Empirical Study in Semi-Realistic Scenarios: we conduct a study to explore the possible implementations of these attacks in real-world traffic scenarios.

Through our empirical study of typographic attacks in traffic scenes, we hope to raise community awareness of critical typographic vulnerabilities when incorporating such models into AD systems.
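
To make the first two contributions more concrete, the toy sketch below shows the general shape of such a pipeline: pick a false answer for each reasoning task, wrap it in a short phrase aimed at either the whole image or a named region, and emit one line of attack text per task so that several tasks are targeted at once. This is not the paper's framework. The `SceneQuestion` structure, the candidate-pool selection, and the text templates are all assumptions introduced here for illustration.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneQuestion:
    """One reasoning task about a traffic scene (hypothetical structure)."""
    question: str
    correct_answer: str
    candidates: tuple[str, ...]  # plausible answers, including the correct one
    region: str | None = None    # e.g. "the car ahead"; None means image-level


def pick_false_answer(task: SceneQuestion, rng: random.Random) -> str:
    """Choose a candidate that differs from the correct answer."""
    correct = task.correct_answer.strip().lower()
    wrong = [c for c in task.candidates if c.strip().lower() != correct]
    if not wrong:
        raise ValueError("no false answer available for: " + task.question)
    return rng.choice(wrong)


def compose_attack_text(task: SceneQuestion, false_answer: str) -> str:
    """Wrap the false answer in a short phrase (assumed templates)."""
    if task.region is None:
        # Image-level: repeat the question next to the false answer.
        return f"{task.question} {false_answer}".upper()
    # Region-level: name the region the false answer should attach to.
    return f"{task.region}: {false_answer}".upper()


def build_attack(tasks: list[SceneQuestion], seed: int = 0) -> list[str]:
    """Return one line of attack text per task, covering several tasks at once."""
    rng = random.Random(seed)
    return [compose_attack_text(t, pick_false_answer(t, rng)) for t in tasks]


tasks = [
    SceneQuestion("How many pedestrians are crossing?", "2", ("0", "2")),
    SceneQuestion("What is the vehicle doing?", "stopped",
                  ("stopped", "moving"), region="the car ahead"),
]
attack_lines = build_attack(tasks)
for line in attack_lines:
    print(line)
```

In a physical setting, lines like these would be the text printed on a sign, sticker, or vehicle surface. The sketch only produces the strings and does not model how they are rendered into the scene.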


:::info This paper is available on arXiv under a CC BY 4.0 DEED license.

:::

:::info Authors:

(1) Nhat Chung, CFAR and IHPC, A*STAR, Singapore and VNU-HCM, Vietnam;

(2) Sensen Gao, CFAR and IHPC, A*STAR, Singapore and Nankai University, China;

(3) Tuan-Anh Vu, CFAR and IHPC, A*STAR, Singapore and HKUST, HKSAR;

(4) Jie Zhang, Nanyang Technological University, Singapore;

(5) Aishan Liu, Beihang University, China;

(6) Yun Lin, Shanghai Jiao Tong University, China;

(7) Jin Song Dong, National University of Singapore, Singapore;

(8) Qing Guo, CFAR and IHPC, A*STAR, Singapore and National University of Singapore, Singapore.

:::
