AI-Powered Data Cleaning: How to Prompt ChatGPT to Fix Corrupted JSON Exports
Learn how to use AI prompts to clean corrupted JSON data from exports, APIs, and logs. Includes step-by-step guides and copy-paste templates for instant data recovery.
TL;DR: Use AI to automatically fix corrupted JSON by providing the broken data, describing the expected structure, and requesting specific repairs. This guide provides templates for common corruption scenarios.
The Problem: Corrupted JSON Exports
You export data from a legacy system, scrape an API, or extract logs—and the JSON is broken. Missing quotes, truncated arrays, encoding issues, mixed formats.
Manual cleanup would take hours. AI can fix it in seconds.
Common JSON Corruption Scenarios
Scenario 1: Missing Quotes
{name: Alice, age: 30}Scenario 2: Truncated Data
{"users":[{"name":"Alice"},{"name":"Bob"},{"naScenario 3: Mixed Encoding
{"text":"Caf\u00e9 \u0026 R\u00e9sum\u00e9"}Scenario 4: Malformed Arrays
[{"id":1}{"id":2}{"id":3}]Scenario 5: Extra Commas
{"name":"Alice",,,"age":30,}The AI Repair Workflow
Step 1: Identify the Corruption Type
Quick diagnosis:
- Parse error at position X → Syntax issue
- Unexpected end of input → Truncation
- Invalid character → Encoding problem
- Missing comma/bracket → Structure issue
Step 2: Prepare the Repair Prompt
Template:
You are a JSON repair specialist. Fix the following corrupted JSON.
Corrupted JSON:
{PASTE_BROKEN_JSON}
Expected structure:
{DESCRIBE_STRUCTURE}
Corruption type: {ISSUE_DESCRIPTION}
Output ONLY the repaired, valid JSON. No explanations.
Step 3: Validate the Repair
Paste the AI output into our JSON Validator to ensure it's syntactically correct.
Copy-Paste Repair Templates
Template 1: Missing Quotes Repair
You are a JSON repair specialist.
Corrupted JSON:
{name: Alice, email: alice@example.com, age: 30}
Issue: Missing quotes around keys and string values
Expected structure: Object with string keys "name", "email" (string values) and "age" (number value)
Fix by:
1. Adding double quotes around all keys
2. Adding double quotes around string values
3. Keeping numbers unquoted
Output ONLY valid JSON:
AI Output:
{"name":"Alice","email":"alice@example.com","age":30}Template 2: Truncated Data Recovery
You are a JSON repair specialist.
Corrupted JSON (truncated):
{"users":[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"},{"id":3,"na
Expected structure: Object with "users" array containing objects with "id" (number) and "name" (string)
The data was truncated mid-record. Complete the JSON by:
1. Closing the truncated "name" field with a reasonable value
2. Closing all open brackets/braces
3. Ensuring valid syntax
Output ONLY valid JSON:
AI Output:
{"users":[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"},{"id":3,"name":"Charlie"}]}Template 3: Encoding Fix
You are a JSON repair specialist.
Corrupted JSON:
{"description":"Caf\u00e9 \u0026 R\u00e9sum\u00e9","tags":["caf\u00e9","r\u00e9sum\u00e9"]}
Issue: Unicode escapes that should be decoded
Convert all Unicode escapes (\uXXXX) to their actual characters.
Output ONLY valid JSON with decoded characters:
AI Output:
{"description":"Café & Résumé","tags":["café","résumé"]}Template 4: Missing Commas/Brackets
You are a JSON repair specialist.
Corrupted JSON:
[{"id":1}{"id":2}{"id":3}]
Issue: Missing commas between array items
Fix by adding commas between objects in the array.
Output ONLY valid JSON:
AI Output:
[{"id":1},{"id":2},{"id":3}]Template 5: Extra Commas Removal
You are a JSON repair specialist.
Corrupted JSON:
{"name":"Alice",,,"age":30,,"email":"alice@example.com",}
Issue: Extra commas and trailing comma
Fix by:
1. Removing duplicate commas
2. Removing trailing comma
Output ONLY valid JSON:
AI Output:
{"name":"Alice","age":30,"email":"alice@example.com"}Advanced Repair Scenarios
Scenario 1: Mixed Data Types
Problem:
{"age":"30","active":"true","count":"0"}Prompt:
Fix this JSON by converting string numbers to actual numbers and string booleans to actual booleans.
Corrupted: {"age":"30","active":"true","count":"0"}
Expected types:
- age: number
- active: boolean
- count: number
Output:
Result:
{"age":30,"active":true,"count":0}Scenario 2: Nested Structure Repair
Problem:
{"user":{"name":"Alice""email":"alice@example.com"},"status":"active"}Prompt:
Fix this JSON with missing comma in nested object.
Corrupted: {"user":{"name":"Alice""email":"alice@example.com"},"status":"active"}
Add missing comma between "name" and "email" fields.
Output:
Result:
{"user":{"name":"Alice","email":"alice@example.com"},"status":"active"}Scenario 3: Array Reconstruction
Problem:
{"items":"[1,2,3]"}Prompt:
Fix this JSON where array is incorrectly stored as string.
Corrupted: {"items":"[1,2,3]"}
Convert "items" value from string to actual array.
Output:
Result:
{"items":[1,2,3]}Validate Repaired JSON
After AI repairs your JSON, paste it here to verify it's syntactically correct and ready to use.
Validate Repair →Pro Tips for AI-Powered Repairs
Tip 1: Provide Context
Bad:
Fix this JSON: {broken data}
Good:
This JSON came from a MySQL export. Fix by adding missing quotes and converting string numbers to actual numbers.
Broken: {data}
Tip 2: Specify Expected Structure
Expected structure:
{
"users": [
{"id": number, "name": string, "active": boolean}
]
}
Tip 3: Handle Large Files in Chunks
For files >1000 lines:
- Split into chunks of 100-200 lines
- Repair each chunk separately
- Combine repaired chunks
- Validate final result
Tip 4: Preserve Original Data
Always keep a backup before AI repair:
cp broken.json broken.json.backupReal-World Use Cases
Use Case 1: Legacy System Export
Scenario: Export from old CRM system produces malformed JSON
Prompt:
Fix this JSON exported from a legacy CRM system.
Issues:
- Single quotes instead of double quotes
- Dates in non-standard format
- Phone numbers with inconsistent formatting
Corrupted data:
{'customer_name': 'Alice Smith', 'signup_date': '01/15/2024', 'phone': '555.123.4567'}
Convert to:
- Double quotes
- ISO 8601 dates
- Standard phone format (XXX) XXX-XXXX
Output:
Use Case 2: API Scraping Cleanup
Scenario: Scraped API responses have HTML entities
Prompt:
Clean this JSON from web scraping.
Corrupted:
{"title":"Alice\u0026#39;s Project","description":"A \u0026quot;great\u0026quot; tool"}
Decode all HTML entities to actual characters.
Output:
Use Case 3: Log File Parsing
Scenario: Application logs with incomplete JSON entries
Prompt:
Repair these incomplete JSON log entries.
Logs:
{"timestamp":"2024-01-15T10:30:00","level":"ERROR","message":"Failed to
{"timestamp":"2024-01-15T10:31:00","level":"INFO","message":"Success"}
Complete truncated entries with reasonable values and ensure all entries are valid JSON.
Output as JSON array:
Validation Checklist
After AI repair, verify:
✅ Syntax: No parse errors
✅ Structure: Matches expected schema
✅ Data Types: Numbers are numbers, booleans are booleans
✅ Completeness: No truncated values
✅ Encoding: Special characters display correctly
✅ Consistency: Field names match across records
Use our JSON Validator for automated checks.
When AI Repair Fails
Limitation 1: Severe Truncation
If >50% of data is missing, AI can't reliably reconstruct it.
Solution: Request original data or use partial recovery.
Limitation 2: Ambiguous Structure
If the corruption makes structure unclear, AI may guess wrong.
Solution: Provide explicit schema in prompt.
Limitation 3: Binary Data
AI can't repair binary corruption in JSON.
Solution: Re-export from source if possible.
Automation Script
For batch repairs, use this workflow:
import openai
import json
def repair_json_with_ai(broken_json, expected_structure):
prompt = f"""
You are a JSON repair specialist. Output ONLY valid JSON.
Corrupted JSON:
{broken_json}
Expected structure:
{expected_structure}
Fix all syntax errors and output valid JSON:
"""
response = openai.ChatCompletion.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}]
)
repaired = response.choices[0].message.content
# Validate
try:
json.loads(repaired)
return repaired
except:
return None
# Usage
broken = '{"name": Alice, "age": 30}'
expected = '{"name": "string", "age": number}'
fixed = repair_json_with_ai(broken, expected)
print(fixed)Conclusion
AI-powered JSON repair saves hours of manual work. By providing clear context, expected structure, and specific repair instructions, you can fix most corrupted JSON automatically.
Key Takeaways:
- Identify corruption type first
- Provide expected structure
- Use specific repair templates
- Always validate output
- Keep backups of original data
Master Repair Template:
Corrupted JSON: {YOUR_DATA}
Expected structure: {SCHEMA}
Issue: {PROBLEM_DESCRIPTION}
Fix by: {SPECIFIC_INSTRUCTIONS}
Output ONLY valid JSON:
Repair Your JSON Now
Use our Smart Repair tool to automatically fix common JSON syntax errors.
Try Smart Repair →