The experimental section details the evaluation of the O3D-SIM representation and its integration with ChatGPT for Vision-Language Navigation (VLN).

Evaluating Novel 3D Semantic Instance Map for Vision-Language Navigation

2025/12/16 10:38

Abstract and 1 Introduction

  2. Related Works

    2.1. Vision-and-Language Navigation

    2.2. Semantic Scene Understanding and Instance Segmentation

    2.3. 3D Scene Reconstruction

  3. Methodology

    3.1. Data Collection

    3.2. Open-set Semantic Information from Images

    3.3. Creating the Open-set 3D Representation

    3.4. Language-Guided Navigation

  4. Experiments

    4.1. Quantitative Evaluation

    4.2. Qualitative Results

  5. Conclusion and Future Work, Disclosure statement, and References

4. Experiments

Having introduced the O3D-SIM creation pipeline and its integration with ChatGPT for natural language understanding and Vision-Language Navigation (VLN) enhancement, we now evaluate this novel representation both quantitatively and qualitatively. The evaluation also sheds light on how the O3D-SIM representation affects an agent’s ability to execute queries that mimic human interaction. It is structured into two subsections: Section 4.1 covers the quantitative assessment of O3D-SIM, and Section 4.2 covers the qualitative analysis of the representation.

Figure 4. The difference in ChatGPT’s output that results from the different nature of the two mapping approaches: SI-Maps is closed-set, whereas O3D-SIM is open-set. For queries that specify exact object classes, both approaches output the same code. For queries specified in an open-set manner, however, the newer approach passes a description of the goal to the code, whereas the older approach maps the description to one of the pre-known classes and passes that class to the code. The older approach benefits from the LLM’s understanding, whereas the newer approach benefits from the open-set embeddings (CLIP).
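The contrast described in the Figure 4 caption can be illustrated with a small Python sketch. It is not the authors' code: the instance list, the function names `closed_set_goal` and `open_set_goal`, and the bag-of-words `embed` function (a toy stand-in for a CLIP encoder) are all assumptions made for illustration. The closed-set function expects a class name that the LLM has already chosen from the known classes, while the open-set function receives the free-form description and ranks instances by embedding similarity.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a CLIP encoder (assumption): a bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical instance map: each entry has a class label (closed-set view)
# and an embedding (open-set view). The contents are invented for this sketch.
INSTANCES = [
    {"id": 0, "label": "chair", "embedding": embed("red armchair soft seat")},
    {"id": 1, "label": "chair", "embedding": embed("wooden dining chair")},
    {"id": 2, "label": "table", "embedding": embed("wooden dining table")},
]


def closed_set_goal(class_name):
    """Closed-set lookup: the LLM has already mapped the query to a known
    class, so the code only matches instances by label."""
    return [inst["id"] for inst in INSTANCES if inst["label"] == class_name]


def open_set_goal(description):
    """Open-set lookup: the description itself reaches the code, which picks
    the instance whose embedding is most similar to the query embedding."""
    query = embed(description)
    best = max(INSTANCES, key=lambda inst: cosine(query, inst["embedding"]))
    return best["id"]


if __name__ == "__main__":
    print(closed_set_goal("chair"))
    print(open_set_goal("red armchair"))
```

In this sketch the closed-set path cannot distinguish between the two chairs, because only the class label reaches the code, whereas the open-set path can select a specific instance from the description. A real system would replace `embed` with an actual vision-language encoder.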


:::info Authors:

(1) Laksh Nanwani, International Institute of Information Technology, Hyderabad, India; this author contributed equally to this work;

(2) Kumaraditya Gupta, International Institute of Information Technology, Hyderabad, India;

(3) Aditya Mathur, International Institute of Information Technology, Hyderabad, India; this author contributed equally to this work;

(4) Swayam Agrawal, International Institute of Information Technology, Hyderabad, India;

(5) A.H. Abdul Hafez, Hasan Kalyoncu University, Sahinbey, Gaziantep, Turkey;

(6) K. Madhava Krishna, International Institute of Information Technology, Hyderabad, India.

:::


:::info This paper is available on arXiv under the CC BY-SA 4.0 Deed (Attribution-ShareAlike 4.0 International) license.

:::
