Avoiding the Hallucination: How to Force AI to Stay Within Your JSON Constraints
Learn proven techniques to prevent AI hallucinations when generating JSON. Includes validation strategies, constraint enforcement, and error handling templates.
TL;DR: Prevent AI hallucinations in JSON generation by using strict schemas, validation checkpoints, few-shot examples, and explicit constraints. This guide provides templates that enforce accuracy.
The Problem: AI Makes Things Up
You ask an AI to generate user data. It invents fields that don't exist in your schema, creates impossible values, or fabricates relationships.
Example hallucination:
{
"userId": "USR-001",
"name": "Alice",
"department": "Engineering",
"manager": "Bob Smith",
"officeLocation": "Building A, Floor 3",
"parkingSpot": "A-42"
}Your schema only had: userId, name, department.
The AI hallucinated manager, officeLocation, and parkingSpot.
Why AI Hallucinates
Reason 1: Pattern Completion
LLMs are trained to complete patterns. If they see "user object," they add fields they've seen in training data.
Reason 2: Lack of Constraints
Without explicit boundaries, AI fills gaps with plausible-sounding data.
Reason 3: Ambiguous Instructions
Vague prompts leave room for interpretation.
The Solution: Constraint Enforcement
Core Principles
- Explicit Schema: Define exactly what fields are allowed
- Validation Rules: Specify what values are valid
- Negative Examples: Show what NOT to do
- Checkpoints: Add self-validation steps
Copy-Paste Anti-Hallucination Templates
Template 1: Strict Schema Enforcement
You are a JSON generator. You MUST follow this schema EXACTLY. Do not add any fields not listed.
ALLOWED SCHEMA (ONLY these fields):
{
"userId": "string",
"name": "string",
"department": "string"
}
FORBIDDEN: Do not add any other fields like manager, email, phone, location, etc.
Task: Generate a user object for Alice in Engineering department, ID USR-001
Output ONLY JSON matching the schema:
AI Output:
{"userId":"USR-001","name":"Alice","department":"Engineering"}Template 2: Enum Constraints
You are a JSON generator. Output ONLY valid JSON.
Schema:
{
"status": "string (MUST be one of: active, inactive, pending)",
"priority": "number (MUST be 1, 2, or 3)",
"category": "string (MUST be one of: bug, feature, task)"
}
CRITICAL: status can ONLY be "active", "inactive", or "pending". Any other value is INVALID.
CRITICAL: priority can ONLY be 1, 2, or 3. No other numbers allowed.
CRITICAL: category can ONLY be "bug", "feature", or "task". No other categories exist.
Task: Generate an issue object with status active, priority 2, category bug
Output:
Template 3: Value Range Constraints
You are a JSON generator.
Schema with STRICT constraints:
{
"age": "integer (MUST be between 0 and 120, no exceptions)",
"score": "number (MUST be between 0.0 and 100.0)",
"quantity": "integer (MUST be >= 0)"
}
VALIDATION RULES:
- If age is outside 0-120, the JSON is INVALID
- If score is outside 0.0-100.0, the JSON is INVALID
- If quantity is negative, the JSON is INVALID
Task: Generate data for age 30, score 85.5, quantity 10
Before outputting, verify all values are within constraints.
Output:
Template 4: No Extra Fields
You are a JSON generator.
EXACT schema (do not add or remove fields):
{
"id": "string",
"name": "string"
}
FORBIDDEN FIELDS (do not include these even if they seem relevant):
- email
- phone
- address
- age
- created_at
- updated_at
- Any other field not in the exact schema
Task: Generate for ID 123, name Alice
Output ONLY the 2 fields from the schema:
Template 5: Self-Validation Checkpoint
You are a JSON generator.
Schema:
{
"productId": "string (format: PROD-XXXXX where X is digit)",
"price": "number (min 0, max 10000)",
"inStock": "boolean"
}
Task: Generate for product PROD-12345, price 99.99, in stock
BEFORE OUTPUTTING:
1. Check: Does productId match pattern PROD-XXXXX?
2. Check: Is price between 0 and 10000?
3. Check: Is inStock a boolean (not string)?
4. Check: Are there ONLY these 3 fields?
If any check fails, regenerate until all pass.
Output:
Validate AI-Generated JSON
Paste your AI output here to verify it matches your schema and contains no hallucinated fields.
Validate Schema →Advanced Anti-Hallucination Techniques
Technique 1: Negative Examples
Show what NOT to do:
✅ CORRECT:
{"id":"001","name":"Alice"}
❌ INCORRECT (hallucinated fields):
{"id":"001","name":"Alice","email":"alice@example.com","age":30}
❌ INCORRECT (wrong format):
{"user_id":"001","user_name":"Alice"}
Now generate for ID 002, name Bob. Output ONLY the correct format:
Technique 2: Explicit Field Count
Generate JSON with EXACTLY 3 fields. No more, no less.
Fields:
1. id (string)
2. name (string)
3. active (boolean)
If your output has 2 fields or 4 fields, it is WRONG. Must be exactly 3.
Technique 3: Whitelist Approach
ALLOWED fields (whitelist):
- userId
- username
- email
EVERYTHING ELSE IS FORBIDDEN. Do not add:
- name, firstName, lastName
- phone, mobile
- address, location
- created, updated
- Any other field
Generate with userId USR-001, username alice, email alice@example.com
Technique 4: JSON Schema Validation
You are a JSON generator. Your output MUST validate against this JSON Schema:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["id", "name"],
"properties": {
"id": {"type": "string"},
"name": {"type": "string"}
},
"additionalProperties": false
}
Note: "additionalProperties": false means NO extra fields allowed.
Task: Generate for id 123, name Alice
Output:
Real-World Use Cases
Use Case 1: API Response Generation
Prompt:
Generate a mock API response. ONLY include these fields, nothing else:
{
"status": 200,
"data": {
"userId": "string",
"username": "string"
}
}
Do NOT add:
- timestamps
- metadata
- pagination
- Any other fields
Generate for user ID abc123, username alice:
Use Case 2: Configuration File
Prompt:
Generate a config JSON. EXACT fields required:
{
"port": "integer (3000-9000)",
"host": "string (only 'localhost' or '0.0.0.0')",
"ssl": "boolean"
}
Do NOT add database, logging, or any other config sections.
Generate for port 3000, host localhost, SSL false:
Use Case 3: Test Data
Prompt:
Generate test user data. Schema (strict):
{
"id": "integer (1-1000)",
"name": "string (realistic name)",
"role": "string (ONLY: admin, user, or guest)"
}
FORBIDDEN:
- email, phone, address
- created_at, updated_at
- Any nested objects
Generate 3 users with IDs 1, 2, 3:
Pro Tips
Tip 1: Use "ONLY" and "EXACTLY"
❌ Weak: "Generate a user object"
✅ Strong: "Generate ONLY a user object with EXACTLY these fields"
Tip 2: List Forbidden Fields
Explicitly state what NOT to include:
Do NOT include: email, phone, address, age, created_at
Tip 3: Add Validation Steps
After generating, verify:
1. Field count matches schema
2. All values are within constraints
3. No extra fields present
Tip 4: Use Few-Shot with Strict Examples
Example 1: {"id":"001","name":"Alice"}
Example 2: {"id":"002","name":"Bob"}
Example 3: {"id":"003","name":"Charlie"}
Pattern: ONLY id and name. No other fields.
Generate for ID 004, name Dana:
Common Hallucination Patterns
Pattern 1: Adding Timestamps
Hallucination:
{"id":"001","name":"Alice","created_at":"2024-01-15T10:30:00Z"}Prevention:
Do NOT add timestamps like created_at, updated_at, timestamp, etc.
Pattern 2: Adding Metadata
Hallucination:
{"data":{"id":"001","name":"Alice"},"meta":{"version":"1.0"}}Prevention:
Output ONLY the data object. No meta, metadata, or wrapper objects.
Pattern 3: Expanding Abbreviations
Hallucination:
{"userId":"001","userName":"Alice","userEmail":"alice@example.com"}Prevention:
Field names: userId, name (NOT userName, NOT userEmail)
Pattern 4: Adding Relationships
Hallucination:
{"id":"001","name":"Alice","manager":"Bob","team":"Engineering"}Prevention:
Do NOT add relationship fields like manager, team, department, etc.
Validation Checklist
After AI generation, verify:
✅ Field Count: Matches schema exactly
✅ Field Names: Exact match (case-sensitive)
✅ Data Types: Correct types (string, number, boolean)
✅ Value Constraints: Within specified ranges
✅ No Extra Fields: Only schema fields present
✅ No Missing Fields: All required fields present
Use our JSON Validator for automated checks.
Automation Script
import json
def validate_no_hallucination(generated_json, allowed_fields):
"""Verify AI didn't hallucinate extra fields"""
data = json.loads(generated_json)
actual_fields = set(data.keys())
allowed_set = set(allowed_fields)
extra_fields = actual_fields - allowed_set
missing_fields = allowed_set - actual_fields
if extra_fields:
print(f"❌ Hallucinated fields: {extra_fields}")
return False
if missing_fields:
print(f"❌ Missing fields: {missing_fields}")
return False
print("✅ No hallucination detected")
return True
# Usage
generated = '{"id":"001","name":"Alice","email":"alice@example.com"}'
allowed = ["id", "name"]
validate_no_hallucination(generated, allowed)
# Output: ❌ Hallucinated fields: {'email'}Conclusion
Preventing AI hallucinations requires explicit constraints, validation checkpoints, and clear boundaries. By using strict schemas and negative examples, you can ensure AI stays within your defined structure.
Key Takeaways:
- Define exact schema with allowed fields
- List forbidden fields explicitly
- Use enum constraints for limited values
- Add self-validation checkpoints
- Validate output against schema
Master Anti-Hallucination Template:
EXACT schema (ONLY these fields):
{SCHEMA}
FORBIDDEN fields:
{LIST_OF_FORBIDDEN_FIELDS}
Task: {TASK}
Validate before output:
1. Field count matches
2. No extra fields
3. Values within constraints
Output:
Verify No Hallucinations
Use our JSON Validator to check if AI-generated JSON matches your exact schema with no extra fields.
Validate Now →