Back to Blog
10 min read

Avoiding the Hallucination: How to Force AI to Stay Within Your JSON Constraints

Learn proven techniques to prevent AI hallucinations when generating JSON. Includes validation strategies, constraint enforcement, and error handling templates.

TL;DR: Prevent AI hallucinations in JSON generation by using strict schemas, validation checkpoints, few-shot examples, and explicit constraints. This guide provides templates that enforce accuracy.

The Problem: AI Makes Things Up

You ask an AI to generate user data. It invents fields that don't exist in your schema, creates impossible values, or fabricates relationships.

Example hallucination:

{
  "userId": "USR-001",
  "name": "Alice",
  "department": "Engineering",
  "manager": "Bob Smith",
  "officeLocation": "Building A, Floor 3",
  "parkingSpot": "A-42"
}

Your schema only had: userId, name, department.

The AI hallucinated manager, officeLocation, and parkingSpot.

Why AI Hallucinates

Reason 1: Pattern Completion

LLMs are trained to complete patterns. If they see "user object," they add fields they've seen in training data.

Reason 2: Lack of Constraints

Without explicit boundaries, AI fills gaps with plausible-sounding data.

Reason 3: Ambiguous Instructions

Vague prompts leave room for interpretation.

The Solution: Constraint Enforcement

Core Principles

  1. Explicit Schema: Define exactly what fields are allowed
  2. Validation Rules: Specify what values are valid
  3. Negative Examples: Show what NOT to do
  4. Checkpoints: Add self-validation steps

Copy-Paste Anti-Hallucination Templates

Template 1: Strict Schema Enforcement

You are a JSON generator. You MUST follow this schema EXACTLY. Do not add any fields not listed.

ALLOWED SCHEMA (ONLY these fields):
{
  "userId": "string",
  "name": "string",
  "department": "string"
}

FORBIDDEN: Do not add any other fields like manager, email, phone, location, etc.

Task: Generate a user object for Alice in Engineering department, ID USR-001

Output ONLY JSON matching the schema:

AI Output:

{"userId":"USR-001","name":"Alice","department":"Engineering"}

Template 2: Enum Constraints

You are a JSON generator. Output ONLY valid JSON.

Schema:
{
  "status": "string (MUST be one of: active, inactive, pending)",
  "priority": "number (MUST be 1, 2, or 3)",
  "category": "string (MUST be one of: bug, feature, task)"
}

CRITICAL: status can ONLY be "active", "inactive", or "pending". Any other value is INVALID.
CRITICAL: priority can ONLY be 1, 2, or 3. No other numbers allowed.
CRITICAL: category can ONLY be "bug", "feature", or "task". No other categories exist.

Task: Generate an issue object with status active, priority 2, category bug

Output:

Template 3: Value Range Constraints

You are a JSON generator.

Schema with STRICT constraints:
{
  "age": "integer (MUST be between 0 and 120, no exceptions)",
  "score": "number (MUST be between 0.0 and 100.0)",
  "quantity": "integer (MUST be >= 0)"
}

VALIDATION RULES:
- If age is outside 0-120, the JSON is INVALID
- If score is outside 0.0-100.0, the JSON is INVALID
- If quantity is negative, the JSON is INVALID

Task: Generate data for age 30, score 85.5, quantity 10

Before outputting, verify all values are within constraints.

Output:

Template 4: No Extra Fields

You are a JSON generator.

EXACT schema (do not add or remove fields):
{
  "id": "string",
  "name": "string"
}

FORBIDDEN FIELDS (do not include these even if they seem relevant):
- email
- phone
- address
- age
- created_at
- updated_at
- Any other field not in the exact schema

Task: Generate for ID 123, name Alice

Output ONLY the 2 fields from the schema:

Template 5: Self-Validation Checkpoint

You are a JSON generator.

Schema:
{
  "productId": "string (format: PROD-XXXXX where X is digit)",
  "price": "number (min 0, max 10000)",
  "inStock": "boolean"
}

Task: Generate for product PROD-12345, price 99.99, in stock

BEFORE OUTPUTTING:
1. Check: Does productId match pattern PROD-XXXXX?
2. Check: Is price between 0 and 10000?
3. Check: Is inStock a boolean (not string)?
4. Check: Are there ONLY these 3 fields?

If any check fails, regenerate until all pass.

Output:

Validate AI-Generated JSON

Paste your AI output here to verify it matches your schema and contains no hallucinated fields.

Validate Schema →

Advanced Anti-Hallucination Techniques

Technique 1: Negative Examples

Show what NOT to do:

✅ CORRECT:
{"id":"001","name":"Alice"}

❌ INCORRECT (hallucinated fields):
{"id":"001","name":"Alice","email":"alice@example.com","age":30}

❌ INCORRECT (wrong format):
{"user_id":"001","user_name":"Alice"}

Now generate for ID 002, name Bob. Output ONLY the correct format:

Technique 2: Explicit Field Count

Generate JSON with EXACTLY 3 fields. No more, no less.

Fields:
1. id (string)
2. name (string)
3. active (boolean)

If your output has 2 fields or 4 fields, it is WRONG. Must be exactly 3.

Technique 3: Whitelist Approach

ALLOWED fields (whitelist):
- userId
- username
- email

EVERYTHING ELSE IS FORBIDDEN. Do not add:
- name, firstName, lastName
- phone, mobile
- address, location
- created, updated
- Any other field

Generate with userId USR-001, username alice, email alice@example.com

Technique 4: JSON Schema Validation

You are a JSON generator. Your output MUST validate against this JSON Schema:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["id", "name"],
  "properties": {
    "id": {"type": "string"},
    "name": {"type": "string"}
  },
  "additionalProperties": false
}

Note: "additionalProperties": false means NO extra fields allowed.

Task: Generate for id 123, name Alice

Output:

Real-World Use Cases

Use Case 1: API Response Generation

Prompt:

Generate a mock API response. ONLY include these fields, nothing else:

{
  "status": 200,
  "data": {
    "userId": "string",
    "username": "string"
  }
}

Do NOT add:
- timestamps
- metadata
- pagination
- Any other fields

Generate for user ID abc123, username alice:

Use Case 2: Configuration File

Prompt:

Generate a config JSON. EXACT fields required:

{
  "port": "integer (3000-9000)",
  "host": "string (only 'localhost' or '0.0.0.0')",
  "ssl": "boolean"
}

Do NOT add database, logging, or any other config sections.

Generate for port 3000, host localhost, SSL false:

Use Case 3: Test Data

Prompt:

Generate test user data. Schema (strict):

{
  "id": "integer (1-1000)",
  "name": "string (realistic name)",
  "role": "string (ONLY: admin, user, or guest)"
}

FORBIDDEN:
- email, phone, address
- created_at, updated_at
- Any nested objects

Generate 3 users with IDs 1, 2, 3:

Pro Tips

Tip 1: Use "ONLY" and "EXACTLY"

Weak: "Generate a user object"
Strong: "Generate ONLY a user object with EXACTLY these fields"

Tip 2: List Forbidden Fields

Explicitly state what NOT to include:

Do NOT include: email, phone, address, age, created_at

Tip 3: Add Validation Steps

After generating, verify:
1. Field count matches schema
2. All values are within constraints
3. No extra fields present

Tip 4: Use Few-Shot with Strict Examples

Example 1: {"id":"001","name":"Alice"}
Example 2: {"id":"002","name":"Bob"}
Example 3: {"id":"003","name":"Charlie"}

Pattern: ONLY id and name. No other fields.

Generate for ID 004, name Dana:

Common Hallucination Patterns

Pattern 1: Adding Timestamps

Hallucination:

{"id":"001","name":"Alice","created_at":"2024-01-15T10:30:00Z"}

Prevention:

Do NOT add timestamps like created_at, updated_at, timestamp, etc.

Pattern 2: Adding Metadata

Hallucination:

{"data":{"id":"001","name":"Alice"},"meta":{"version":"1.0"}}

Prevention:

Output ONLY the data object. No meta, metadata, or wrapper objects.

Pattern 3: Expanding Abbreviations

Hallucination:

{"userId":"001","userName":"Alice","userEmail":"alice@example.com"}

Prevention:

Field names: userId, name (NOT userName, NOT userEmail)

Pattern 4: Adding Relationships

Hallucination:

{"id":"001","name":"Alice","manager":"Bob","team":"Engineering"}

Prevention:

Do NOT add relationship fields like manager, team, department, etc.

Validation Checklist

After AI generation, verify:

Field Count: Matches schema exactly
Field Names: Exact match (case-sensitive)
Data Types: Correct types (string, number, boolean)
Value Constraints: Within specified ranges
No Extra Fields: Only schema fields present
No Missing Fields: All required fields present

Use our JSON Validator for automated checks.

Automation Script

import json
 
def validate_no_hallucination(generated_json, allowed_fields):
    """Verify AI didn't hallucinate extra fields"""
    data = json.loads(generated_json)
    actual_fields = set(data.keys())
    allowed_set = set(allowed_fields)
    
    extra_fields = actual_fields - allowed_set
    missing_fields = allowed_set - actual_fields
    
    if extra_fields:
        print(f"❌ Hallucinated fields: {extra_fields}")
        return False
    
    if missing_fields:
        print(f"❌ Missing fields: {missing_fields}")
        return False
    
    print("✅ No hallucination detected")
    return True
 
# Usage
generated = '{"id":"001","name":"Alice","email":"alice@example.com"}'
allowed = ["id", "name"]
validate_no_hallucination(generated, allowed)
# Output: ❌ Hallucinated fields: {'email'}

Conclusion

Preventing AI hallucinations requires explicit constraints, validation checkpoints, and clear boundaries. By using strict schemas and negative examples, you can ensure AI stays within your defined structure.

Key Takeaways:

  • Define exact schema with allowed fields
  • List forbidden fields explicitly
  • Use enum constraints for limited values
  • Add self-validation checkpoints
  • Validate output against schema

Master Anti-Hallucination Template:

EXACT schema (ONLY these fields):
{SCHEMA}

FORBIDDEN fields:
{LIST_OF_FORBIDDEN_FIELDS}

Task: {TASK}

Validate before output:
1. Field count matches
2. No extra fields
3. Values within constraints

Output:

Verify No Hallucinations

Use our JSON Validator to check if AI-generated JSON matches your exact schema with no extra fields.

Validate Now →