  • Natural-language LLM outputs are great for humans but painful for code; you need strict JSON to automate anything reliably.
  • You can “force” JSON by combining strict format instructions, a concrete template, validation rules, and few-shot examples.

Stop Parsing Nightmares: Prompting LLMs to Return Clean, Parseable JSON

2025/12/17 13:19
11 min read

If you’re using large language models in real products, “the model gave a sensible answer” is not enough.

What you actually need is output your code can rely on: a response that parses the same way every time, with predictable fields and types.

This article walks through a practical framework for turning messy, natural-language LLM outputs into machine-friendly structured JSON, using only prompt design. We’ll cover:

  • Why JSON is the natural “bridge format” between LLMs and your backend
  • A 4-step prompt pattern for stable JSON output
  • Common failure modes (extra text, broken syntax, wrong types…) and how to fix them
  • Three real-world prompt templates (e-commerce, customer support, project management)

1. Why JSON? Moving from “human-readable” to “machine-readable”

By default, LLMs talk like people: paragraphs, bullet points, and the occasional emoji.

Example request: “Compare these three laptops for me and summarise their screens, processors and prices.”

A typical answer might be a couple of friendly paragraphs, with the prices and specs buried mid-sentence.

Nice for humans. Awful for code.

If you want to:

  • Plot prices in a chart
  • Filter out all non-touchscreen models
  • Load specs into a database

…you’re forced to regex your way through free text. Any tiny format change breaks your parsing.

JSON fixes this in three ways

1. Syntax is strict, parsing is deterministic

  • Keys are quoted.
  • Arrays use [], objects use {}.
  • Every mainstream language has a stable JSON library (json in Python, JSON.parse in JS, etc.).

If the output is valid JSON, parsing is a solved problem.

2. Types are explicit

  • Strings, numbers, booleans, arrays, objects.
  • You can enforce logic like “price_gbp must be a number, not "£1,299"”.

3. Nested structure matches real data

Think: user → order list → line items. JSON handles this naturally:

```json
{
  "user": {
    "name": "Alice",
    "orders": [
      { "product": "Laptop", "price_gbp": 1299 },
      { "product": "Monitor", "price_gbp": 199 }
    ]
  }
}
```

Example: natural language vs JSON

Free-text output: a prose comparison, one paragraph per laptop.

JSON output:

```json
{
  "laptop_analysis": {
    "analysis_date": "2025-01-01",
    "total_count": 3,
    "laptops": [
      {
        "brand": "Lenovo",
        "model": "Slim 7",
        "screen": { "size_inch": 16, "resolution": "2.5K", "touch_support": false },
        "processor": "Intel i7",
        "price_gbp": 1299
      },
      {
        "brand": "HP",
        "model": "Envy 14",
        "screen": { "size_inch": 14, "resolution": "2.2K", "touch_support": true },
        "processor": "AMD Ryzen 7",
        "price_gbp": 1049
      },
      {
        "brand": "Apple",
        "model": "MacBook Air M2",
        "screen": { "size_inch": 13.6, "resolution": "Retina-class", "touch_support": false },
        "processor": "Apple M2",
        "price_gbp": 1249
      }
    ]
  }
}
```

Now your pipeline can do:

```python
data = json.loads(output)
for laptop in data["laptop_analysis"]["laptops"]:
    ...
```

No brittle parsing. No surprises.


2. A 4-step pattern for “forced JSON” prompts

Getting an LLM to output proper JSON isn’t magic. A robust prompt usually has four ingredients:

  1. Format instructions – “Only output JSON, nothing else.”
  2. A concrete JSON template – the exact keys and structure you expect.
  3. Validation rules – type constraints, required fields, allowed values.
  4. Few-shot examples – one or two “here’s the input, here’s the JSON” samples.

Let’s go through them.


Step 1 – Hard-lock the output format

You must explicitly fight the model’s “chatty” instinct.

Bad instruction: a polite “please format the result as JSON”.

You will absolutely get:

```
Here is your analysis:

{ ... }

Hope this helps!
```

Your parser will absolutely die.

Use strict wording instead:

```
You MUST return ONLY valid JSON.

- Do NOT include any explanations, comments, or extra text.
- The output must be a single JSON object.
- If you include any non-JSON content, the result is invalid.
```

You can go even stricter by wrapping it:

```
【HARD REQUIREMENT】
Return output wrapped between the markers ---BEGIN JSON--- and ---END JSON---.
Outside these markers there must be NOTHING (no text, no spaces, no newlines).

Example:
---BEGIN JSON---
{"key": "value"}
---END JSON---
```

Then your code can safely extract the block between those markers before parsing.
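In Python, for instance, that extraction step is only a few lines. This is a minimal sketch: the marker strings match the prompt above, and it tolerates chatter outside the markers even though the prompt forbids it.

```python
import json

BEGIN, END = "---BEGIN JSON---", "---END JSON---"

def extract_json_block(raw: str) -> dict:
    """Pull out the text between the markers and parse it as JSON."""
    start = raw.index(BEGIN) + len(BEGIN)
    end = raw.index(END)
    return json.loads(raw[start:end].strip())

# Even if the model sneaks in extra text, the markers keep parsing safe:
reply = 'Sure! Here you go:\n---BEGIN JSON---\n{"key": "value"}\n---END JSON---'
print(extract_json_block(reply))  # {'key': 'value'}
```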


Step 2 – Provide a JSON “fill-in-the-blanks” template

Don’t leave structure to the model’s imagination. Tell it exactly what object you want.

Example: extracting news metadata.

```
{
  "news_extraction": {
    "article_title": "",      // string, full headline
    "publish_time": "",       // string, "YYYY-MM-DD HH:MM", or null
    "source": "",             // string, e.g. "BBC News"
    "author": "",             // string or null
    "key_points": [],         // array of 3–5 strings, each ≤ 50 chars
    "category": "",           // one of: "Politics", "Business", "Tech", "Entertainment", "Sport"
    "word_count": 0           // integer, total word count
  }
}
```

Template design tips:

  • Prefer English snake_case keys: product_name, price_gbp, word_count.
  • Use inline comments to mark types and constraints.
  • Explicitly say how to handle optional fields: null instead of empty string.
  • For arrays, describe the item type: tags: [] // array of strings, e.g. ["budget", "lightweight"].

This turns the model’s job into “fill in a form”, not “invent whatever feels right”.


Step 3 – Add lightweight validation rules

The template defines shape. Validation rules define what’s legal inside that shape.

Examples you can include in the prompt:

  • Type rules: price_gbp must be a number with no currency symbol; word_count must be an integer; touch_support must be true or false.
  • Enum rules: category must be one of the listed values; urgency_level must be 1, 2 or 3.
  • Null handling: use null for missing optional fields, never an empty string.
You don’t need a full JSON Schema in the prompt, but a few clear bullets like this reduce errors dramatically.


Step 4 – Use one or two few-shot examples

Models learn fast by imitation. Give them a mini “input → JSON” pair that matches your task.

Example: news extraction.

Prompt snippet:

```
Example input article:

"[Tech] UK startup launches home battery to cut energy bills
Source: The Guardian
Author: Jane Smith
Published: 2024-12-30 10:00

A London-based climate tech startup has launched a compact home battery designed to help households store cheap off-peak electricity and reduce their energy bills..."

Example JSON output:
```

```json
{
  "news_extraction": {
    "article_title": "UK startup launches home battery to cut energy bills",
    "publish_time": "2024-12-30 10:00",
    "source": "The Guardian",
    "author": "Jane Smith",
    "key_points": [
      "London climate tech startup releases compact home battery",
      "Product lets households store off-peak electricity and lower bills",
      "Targets UK homeowners looking to reduce reliance on the grid"
    ],
    "category": "Tech",
    "word_count": 850
  }
}
```

Then you append your real article and say: “Now produce the JSON for the following article, using exactly the same structure.”

This single example often bumps JSON correctness from “coin flip” to “production-ready”.


3. Debugging JSON output: 5 common failure modes

Even with good prompts, you’ll still see issues. Here’s what usually goes wrong and how to fix it.


Problem 1 – Extra natural language before/after JSON

Why it happens: chatty default behaviour; format instruction too soft.

How to fix:

  • Repeat a hard requirement at the end of the prompt.
  • Use explicit markers (---BEGIN JSON--- / ---END JSON---) as shown earlier.
  • Make sure your few-shot examples contain only JSON, no explanation.

Problem 2 – Broken JSON syntax

Examples:

  • Keys without quotes
  • Single quotes instead of double quotes
  • Trailing commas
  • Missing closing braces

Fixes:

  1. Add a “JSON hygiene” reminder:

```
JSON syntax rules:
- All keys MUST be in double quotes.
- Use double quotes for strings, never single quotes.
- No trailing commas after the last element in an object or array.
- All { [ must have matching } ].
```

  2. For very long/complex structures, generate in steps:
     • Step 1: output only the top-level structure.
     • Step 2: fill a particular nested array.
     • Step 3: add the rest.
  3. Add a retry loop in your code:
     • Try json.loads().
     • If it fails, send the error message back to the model and ask for a corrected version.

Problem 3 – Wrong data types

Examples:

  • "price_gbp": "1299.0" instead of 1299.0
  • "in_stock": "yes" instead of true
  • "word_count": "850 words"

Fixes:

  • Be blunt in the template comments:

```
"price_gbp": 0.0    // number ONLY, like 1299.0, no currency symbol
"word_count": 0     // integer ONLY, like 850, no text
"in_stock": false   // boolean, must be true or false
```

  • Include bad vs good examples in the prompt:

```
Wrong:   "word_count": "850 words"
Correct: "word_count": 850

Wrong:   "touch_support": "yes"
Correct: "touch_support": true
```

  • In your backend, add lightweight type coercion where safe (e.g. "1299" → 1299.0), but still log violations.
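A coercion helper along those lines might look like this. It’s a sketch: the regex-based cleanup is an assumption about which violations you want to tolerate, and every coercion is logged so you can tighten the prompt later.

```python
import logging
import re

def coerce_number(value):
    """Coerce strings like '1299', '£1,299' or '850 words' to a number,
    logging each violation so prompt problems stay visible."""
    if isinstance(value, (int, float)):
        return value
    cleaned = re.sub(r"[^\d.]", "", str(value))  # strip currency, commas, words
    logging.warning("type violation coerced: %r -> %s", value, cleaned)
    return float(cleaned) if "." in cleaned else int(cleaned)

coerce_number("£1,299.0")  # -> 1299.0, plus a warning in the log
```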

Problem 4 – Missing or extra fields

Examples:

  • author omitted even though it existed
  • An unexpected summary field appears

Fixes:

  • Spell out required vs forbidden fields:

```
The JSON MUST include exactly these fields: article_title, publish_time, source, author, key_points, category, word_count.

Do NOT add any new fields such as summary, description, tags, etc.
```

  • Add a checklist at the end of the instructions, e.g. “Before answering, verify: all seven required fields are present, and no extra fields have been added.”
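On the backend, the same required/forbidden list can be enforced with a small field check. This sketch uses the field names from the news template above:

```python
REQUIRED = {"article_title", "publish_time", "source", "author",
            "key_points", "category", "word_count"}

def check_fields(obj: dict) -> list[str]:
    """Return a list of problems: missing required keys and unexpected extras."""
    got = set(obj)
    problems = [f"missing: {k}" for k in sorted(REQUIRED - got)]
    problems += [f"unexpected: {k}" for k in sorted(got - REQUIRED)]
    return problems

# e.g. an output that dropped `author` and invented `summary`:
print(check_fields({"article_title": "x", "publish_time": None,
                    "source": "BBC News", "key_points": [],
                    "category": "Tech", "word_count": 850,
                    "summary": "..."}))  # ['missing: author', 'unexpected: summary']
```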

Problem 5 – Messy nested structures

This is where things like arrays of objects containing arrays go sideways.

Fixes:

  • Break down nested templates:

```
"laptops" is an array. Each element is an object with:

{
  "brand": "",
  "model": "",
  "screen": {
    "size_inch": 0,
    "resolution": "",
    "touch_support": false
  },
  "processor": "",
  "price_gbp": 0
}
```

  • Use a dedicated example focused just on one nested element.
  • Or ask the model to generate one laptop object first, validate it, then scale to an array.

4. Three ready-to-use JSON prompt templates

Here are three complete patterns you can lift straight into your own system.


Scenario 1 – E-commerce product extraction (for database import)

Goal: From a UK shop’s product description, extract key fields like product ID, category, specs, price, stock, etc.

Prompt core:

```
Task: Extract key product data from the following product description and return JSON only.

### Output requirements
1. Output MUST be valid JSON, no extra text.
2. Use this template exactly (do not rename keys):

{
  "product_info": {
    "product_id": "",       // string, e.g. "P20250201001"
    "product_name": "",     // full name, not abbreviated
    "category": "",         // one of: "Laptop", "Phone", "Appliance", "Clothing", "Food"
    "specifications": [],   // 2–3 core specs as strings
    "price_gbp": 0.0,       // number, price in GBP, e.g. 999.0
    "stock": 0,             // integer, units in stock
    "free_shipping": false, // boolean, true if free delivery in mainland UK
    "sales_count": 0        // integer, total units sold (0 if not mentioned)
  }
}

3. Rules:
   - No "£" symbol in price_gbp, number only.
   - If no product_id mentioned, use "unknown".
   - If no sales info, use 0 for sales_count.

### Product text:
"..."
```

Example model output:

```json
{
  "product_info": {
    "product_id": "P20250201005",
    "product_name": "Dell XPS 13 Plus 13.4\" Laptop",
    "category": "Laptop",
    "specifications": [
      "Colour: Platinum",
      "Memory: 16GB RAM, 512GB SSD",
      "Display: 13.4\" OLED, 120Hz"
    ],
    "price_gbp": 1499.0,
    "stock": 42,
    "free_shipping": true,
    "sales_count": 850
  }
}
```

(Note the escaped `\"` for the inch marks: unescaped quotes inside a string would make the JSON invalid.)

In Python, it’s just:

```python
import json

data = json.loads(model_output)
price = data["product_info"]["price_gbp"]
stock = data["product_info"]["stock"]
```

And you’re ready to insert into a DB.
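As a sketch of that last step, here is the parsed output going into an in-memory SQLite table. The table layout is illustrative, not prescribed by the template, and the list-valued `specifications` field is stored as a JSON string.

```python
import json
import sqlite3

model_output = '''{"product_info": {"product_id": "P20250201005",
  "product_name": "Dell XPS 13 Plus", "category": "Laptop",
  "specifications": ["16GB RAM", "512GB SSD"], "price_gbp": 1499.0,
  "stock": 42, "free_shipping": true, "sales_count": 850}}'''

info = json.loads(model_output)["product_info"]

conn = sqlite3.connect(":memory:")  # swap for your real database
conn.execute("""CREATE TABLE products
    (product_id TEXT, product_name TEXT, category TEXT, specifications TEXT,
     price_gbp REAL, stock INTEGER, free_shipping INTEGER, sales_count INTEGER)""")
conn.execute(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (info["product_id"], info["product_name"], info["category"],
     json.dumps(info["specifications"]),  # arrays serialised back to JSON text
     info["price_gbp"], info["stock"], info["free_shipping"], info["sales_count"]),
)
row = conn.execute("SELECT price_gbp, stock FROM products").fetchone()
print(row)  # (1499.0, 42)
```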


Scenario 2 – Customer feedback sentiment (for ticket routing)

Goal: Take free-text customer feedback and turn it into structured analysis for your support system.

Template:

```
{
  "feedback_analysis": {
    "feedback_id": "",      // string, you can generate like "F20250201093001"
    "sentiment": "",        // "Positive" | "Negative" | "Neutral"
    "core_demand": "",      // 10–30 chars summary of what the customer wants
    "issue_type": "",       // "Delivery" | "Quality" | "After-sales" | "Enquiry"
    "urgency_level": 0,     // 1 = low, 2 = medium, 3 = high
    "keywords": []          // 3–4 noun keywords, e.g. ["laptop", "screen crack"]
  }
}
```

Rule of thumb for urgency:

  • Product unusable (“won’t turn on”, “payment blocked”) → 3
  • Delays and inconvenience (“parcel 1 day late”) → 2
  • Simple questions (“how do I…?”) → 1

Example output:

```json
{
  "feedback_analysis": {
    "feedback_id": "F20250201093001",
    "sentiment": "Negative",
    "core_demand": "Request replacement or refund for dead-on-arrival laptop",
    "issue_type": "Quality",
    "urgency_level": 3,
    "keywords": ["laptop", "won't turn on", "replacement", "refund"]
  }
}
```

Your ticketing system can now:

  • Route all "Quality" issues with urgency_level = 3 to a priority queue.
  • Show agents a one-line core_demand instead of a wall of text.
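A routing function for those rules can be very small. This is a sketch: the queue names and the urgency thresholds are illustrative, not part of the template.

```python
def route(ticket: dict) -> str:
    """Pick a queue from the structured analysis (routing rules assumed)."""
    a = ticket["feedback_analysis"]
    if a["issue_type"] == "Quality" and a["urgency_level"] == 3:
        return "priority"       # unusable product -> jump the queue
    if a["urgency_level"] >= 2:
        return "standard"       # delays, inconvenience
    return "self-service"       # simple questions

ticket = {"feedback_analysis": {"sentiment": "Negative", "issue_type": "Quality",
                                "urgency_level": 3, "core_demand": "...",
                                "keywords": []}}
print(route(ticket))  # priority
```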

Scenario 3 – Project task breakdown (for Jira/Trello import)

Goal: Turn a “website redesign” paragraph into a structured task list.

Template:

```
{
  "project": "Website Redesign",
  "tasks": [
    {
      "task_id": "T001",    // T + 3 digits
      "task_name": "",      // 10–20 chars, clear action
      "owner": "",          // "Product Manager" | "Designer" | "Frontend" | "Backend" | "QA"
      "due_date": "",       // "YYYY-MM-DD", assume project start 2025-02-01
      "priority": "",       // "High" | "Medium" | "Low"
      "dependencies": []    // e.g. ["T001"], [] if none
    }
  ],
  "total_tasks": 0          // number of items in tasks[]
}
```

Rules:

  • Cover the full flow: requirements → design → build → test → release.
  • Make dependency chains realistic (frontend depends on design, etc.).
  • Dates must logically lead up to the stated launch date.

Example output (shortened):

```json
{
  "project": "Website Redesign",
  "tasks": [
    {
      "task_id": "T001",
      "task_name": "Gather detailed redesign requirements",
      "owner": "Product Manager",
      "due_date": "2025-02-03",
      "priority": "High",
      "dependencies": []
    },
    {
      "task_id": "T002",
      "task_name": "Design new homepage and listing UI",
      "owner": "Designer",
      "due_date": "2025-02-08",
      "priority": "High",
      "dependencies": ["T001"]
    },
    {
      "task_id": "T003",
      "task_name": "Implement login and registration backend",
      "owner": "Backend",
      "due_date": "2025-02-13",
      "priority": "High",
      "dependencies": ["T001"]
    }
  ],
  "total_tasks": 3
}
```
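Before importing a plan like that, it’s worth validating its internal consistency. This minimal check (illustrative, not part of the template) verifies the task count and that every dependency points at an earlier task, so tickets can be created in order:

```python
def validate_tasks(data: dict) -> list[str]:
    """Check total_tasks matches tasks[] and dependencies point backwards."""
    tasks = data["tasks"]
    errors = []
    if data["total_tasks"] != len(tasks):
        errors.append("total_tasks does not match tasks[]")
    seen = set()
    for t in tasks:
        for dep in t["dependencies"]:
            if dep not in seen:
                errors.append(f'{t["task_id"]} depends on unseen {dep}')
        seen.add(t["task_id"])
    return errors

plan = {"total_tasks": 2, "tasks": [
    {"task_id": "T001", "dependencies": []},
    {"task_id": "T002", "dependencies": ["T001"]}]}
print(validate_tasks(plan))  # []
```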

You can then POST tasks into Jira/Trello with their APIs and auto-create all tickets.


5. From “stable JSON” to “production-ready pipelines”

To recap:

  • Why JSON? It’s the natural contract between LLMs and code: deterministic parsing, clear types, nested structures.
  • How to get it reliably? Use the 4-step pattern:
  1. Hard format instructions
  2. A strict JSON template
  3. Light validation rules
  4. One or two good few-shot examples
  • How to ship it? Combine prompt-side constraints with backend safeguards:
  • Retry on JSONDecodeError with error feedback to the model.
  • Optional type coercion (e.g. "1299" → 1299.0) with logging.
  • JSON Schema validation for high-stakes use cases (finance, healthcare).
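Even a few lines of pure-Python type checking catch most of the failures above. The sketch below is a minimal stand-in for full JSON Schema validation; in production you would more likely use a dedicated library such as the jsonschema package, and the field names here are taken from the product template.

```python
# Expected type per field (a stand-in for a real JSON Schema).
SCHEMA = {"price_gbp": float, "in_stock": bool, "product_name": str}

def violations(obj: dict, schema: dict = SCHEMA) -> list[str]:
    """List missing fields and type mismatches against the schema."""
    out = []
    for key, typ in schema.items():
        if key not in obj:
            out.append(f"missing {key}")
        elif not isinstance(obj[key], typ):
            out.append(f"{key}: expected {typ.__name__}, got {type(obj[key]).__name__}")
    return out

# A string where a number was required is flagged immediately:
print(violations({"price_gbp": "1299", "in_stock": True, "product_name": "XPS 13"}))
```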

Once you can reliably get structured JSON out of an LLM, you move from “a model that writes nice answers” to “a component your pipeline can actually build on”.

That’s the real unlock.
