Scaling Agentic Recruitment: Using JSON Schema to Solve Data Hallucinations in AI-Driven Candidate Sourcing

With nearly a decade in IT Human Resources and a technical background in AI solutions, I have seen the "hallucination" problem firsthand. When you deploy an agentic workflow to source candidates from resumes or LinkedIn profiles, the LLM often "invents" skills or misinterprets years of experience to fit a prompt's desired outcome.
In recruitment, a hallucinated "Java" skill or an incorrect "years of experience" count isn't just a technical bug; it's a failed hire or a wasted interview.
To scale agentic recruitment, you cannot rely on natural language prompts alone. You must enforce Structured Output. By using JSON Schema, we move from "asking the AI to be accurate" to "forcing the AI to adhere to a strict data contract."
The Problem: The "Creative" Recruiter AI
Standard prompts like "Extract the candidate's experience into a list" often result in inconsistent formats:
- Candidate A: `{"exp": "5 years"}`
- Candidate B: `{"experience": "Since 2018"}`
- Candidate C: `{"years_of_experience": "Approx 4-6"}`
This inconsistency breaks downstream automation (like ATS integration or scoring algorithms).
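To make the breakage concrete, here is a minimal sketch that feeds the three responses above into a downstream filter. The `meets_minimum` function is a hypothetical stand-in for a scoring step, not part of any real ATS API.

```python
# The three inconsistent responses from the list above.
responses = [
    {"exp": "5 years"},
    {"experience": "Since 2018"},
    {"years_of_experience": "Approx 4-6"},
]


def meets_minimum(candidate: dict, minimum_years: int = 5) -> bool:
    """Hypothetical downstream filter that expects an integer field."""
    return candidate["years_of_experience"] >= minimum_years


failures = []
for response in responses:
    try:
        meets_minimum(response)
    except (KeyError, TypeError) as error:
        # KeyError: the field name differs; TypeError: the value is a string.
        failures.append(type(error).__name__)

print(failures)
```

None of the three responses gets through: two use a different key, and the third uses the right key but a string that cannot be compared with an integer.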
The Solution: Enforcing JSON Schema
By defining a strict schema, we ensure the agent returns a predictable object. This allows us to validate the data before it ever reaches the database.
Below is a production-ready implementation using a JSON Schema to standardize candidate extraction.
1. The Schema Definition (candidate-schema.json)
With this schema, a response that contains a string where an integer is required fails validation, and skill levels are restricted to a fixed set of categories.
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "CandidateProfile",
  "type": "object",
  "properties": {
    "full_name": { "type": "string" },
    "years_of_experience": {
      "type": "integer",
      "minimum": 0,
      "description": "Total years of professional experience"
    },
    "primary_stack": {
      "type": "array",
      "items": { "type": "string" },
      "minItems": 1
    },
    "top_skills": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "skill": { "type": "string" },
          "level": { "enum": ["Junior", "Mid", "Senior", "Expert"] }
        },
        "required": ["skill", "level"]
      }
    },
    "contact_info": {
      "type": "object",
      "properties": {
        "email": { "type": "string", "format": "email" },
        "linkedin_url": { "type": "string", "format": "uri" }
      },
      "required": ["email"]
    }
  },
  "required": ["full_name", "years_of_experience", "primary_stack"]
}
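As a rough sketch of what this schema catches, the snippet below checks a deliberately broken response with the Python `jsonschema` package. The `bad_response` data is made up for illustration. One caveat: `jsonschema` only enforces `format` keywords when a format checker is supplied, and, as far as I know, some formats such as `uri` additionally depend on optional packages being installed.

```python
from jsonschema import Draft7Validator, FormatChecker

# The CandidateProfile schema from above, written as a Python dict.
CANDIDATE_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "CandidateProfile",
    "type": "object",
    "properties": {
        "full_name": {"type": "string"},
        "years_of_experience": {"type": "integer", "minimum": 0},
        "primary_stack": {
            "type": "array",
            "items": {"type": "string"},
            "minItems": 1,
        },
        "top_skills": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "skill": {"type": "string"},
                    "level": {"enum": ["Junior", "Mid", "Senior", "Expert"]},
                },
                "required": ["skill", "level"],
            },
        },
        "contact_info": {
            "type": "object",
            "properties": {
                "email": {"type": "string", "format": "email"},
                "linkedin_url": {"type": "string", "format": "uri"},
            },
            "required": ["email"],
        },
    },
    "required": ["full_name", "years_of_experience", "primary_stack"],
}

# Fail fast if the schema itself is malformed.
Draft7Validator.check_schema(CANDIDATE_SCHEMA)

# Without a format checker, "format" keywords are treated as annotations only.
validator = Draft7Validator(CANDIDATE_SCHEMA, format_checker=FormatChecker())

# Made-up response with three deliberate problems.
bad_response = {
    "full_name": "Jane Doe",
    "years_of_experience": "7 years",  # string instead of integer
    "primary_stack": [],  # violates minItems
    "top_skills": [{"skill": "Python", "level": "Guru"}],  # not in the enum
}

# Collect every violation, keyed by the path of the offending field.
errors_by_field = {
    "/".join(str(part) for part in error.path): error.message
    for error in validator.iter_errors(bad_response)
}

for field, message in errors_by_field.items():
    print(f"{field}: {message}")
```

Using `iter_errors` instead of `validate` reports all violations in one pass, which is handy when you want to send the complete list back to the model or to a reviewer.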
2. Implementation: The System Prompt
When calling your LLM (GPT-4o, Claude 3.5, etc.), you must pass the schema and explicitly instruct the agent to adhere to it.
**System Prompt:**
You are a technical recruitment agent. Your task is to extract candidate data from the provided text.
You MUST return the data strictly as a JSON object following the provided JSON Schema.
If a field is missing from the text, omit that field instead of guessing a value. (The schema above does not accept `null`, so a `null` placeholder would fail validation.)
Do not invent data. Do not provide conversational filler.
**Schema:**
[Insert JSON Schema Above]
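One way to avoid pasting the schema by hand is to build the prompt from the same object your validator uses. The sketch below is only an illustration: `build_system_prompt` is a hypothetical helper, and the schema is trimmed down to keep the example short.

```python
import json


def build_system_prompt(instructions: str, schema: dict) -> str:
    """Append the serialized schema to the instructions (hypothetical helper)."""
    schema_text = json.dumps(schema, indent=2)
    return f"{instructions.strip()}\n\nSchema:\n{schema_text}"


# Trimmed-down schema for illustration; in practice, load candidate-schema.json.
schema = {
    "type": "object",
    "properties": {
        "full_name": {"type": "string"},
        "years_of_experience": {"type": "integer", "minimum": 0},
    },
    "required": ["full_name", "years_of_experience"],
}

instructions = (
    "You are a technical recruitment agent. "
    "Your task is to extract candidate data from the provided text.\n"
    "You MUST return the data strictly as a JSON object "
    "following the provided JSON Schema.\n"
    "Do not invent data. Do not provide conversational filler."
)

system_prompt = build_system_prompt(instructions, schema)
print(system_prompt)
```

Keeping a single schema object as the source for both the prompt and the validator means the two cannot drift apart when you add or rename a field.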
3. Validation Logic (Python)
To ensure the agent didn't hallucinate a format, use a validator. This prevents "garbage in, garbage out" scenarios.
import jsonschema
from jsonschema import validate

# A reduced version of the schema defined above
candidate_schema = {
    "type": "object",
    "properties": {
        "full_name": {"type": "string"},
        "years_of_experience": {"type": "integer"},
        "primary_stack": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["full_name", "years_of_experience", "primary_stack"]
}

# Response received from the AI agent, already parsed from JSON into a dict
ai_response = {
    "full_name": "Jane Doe",
    "years_of_experience": 7,
    "primary_stack": ["Python", "AWS", "Kubernetes"]
}

try:
    validate(instance=ai_response, schema=candidate_schema)
    print("✅ Data Validated: Ready for ATS Import")
except jsonschema.exceptions.ValidationError as e:
    print(f"❌ Hallucination/Format Error: {e.message}")
    # Trigger a retry loop or flag for human review
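The retry loop mentioned in the last comment could look roughly like the sketch below. It assumes the model is reachable through a plain function that takes a prompt string and returns raw text. `call_llm`, `extract_with_retries`, and `fake_llm` are hypothetical names, and the stub replies are made up so the example runs without any API access.

```python
import json
from typing import Callable, Optional

from jsonschema import Draft7Validator

# Reduced schema, matching the validation example above.
SCHEMA = {
    "type": "object",
    "properties": {
        "full_name": {"type": "string"},
        "years_of_experience": {"type": "integer"},
        "primary_stack": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["full_name", "years_of_experience", "primary_stack"],
}


def extract_with_retries(
    call_llm: Callable[[str], str],
    resume_text: str,
    max_attempts: int = 3,
) -> Optional[dict]:
    """Return a validated candidate dict, or None if every attempt fails."""
    validator = Draft7Validator(SCHEMA)
    prompt = resume_text
    for _ in range(max_attempts):
        raw = call_llm(prompt)
        try:
            candidate = json.loads(raw)
        except json.JSONDecodeError as error:
            problems = [f"Response was not valid JSON: {error.msg}"]
        else:
            problems = [e.message for e in validator.iter_errors(candidate)]
            if not problems:
                return candidate
        # Feed the validation errors back so the model can correct itself.
        prompt = (
            resume_text
            + "\n\nYour previous answer was rejected:\n"
            + "\n".join(problems)
        )
    return None  # the caller should flag this record for human review


# Stub model: the first reply breaks the schema, the second one is valid.
replies = iter([
    '{"full_name": "Jane Doe", "years_of_experience": "seven", "primary_stack": ["Python"]}',
    '{"full_name": "Jane Doe", "years_of_experience": 7, "primary_stack": ["Python"]}',
])


def fake_llm(prompt: str) -> str:
    return next(replies)


result = extract_with_retries(fake_llm, "Jane Doe, 7 years of Python experience")
print(result)
```

Capping the number of attempts matters: a response that keeps failing is usually a sign that the source text is ambiguous, and that case is better handled by a person than by another retry.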
Technical Insight: Why this scales
When you scale to sourcing 1,000+ candidates per day, manual verification is impossible. Using JSON Schema allows you to:
- Automate Quality Control: Any response that fails validation is automatically routed back to the LLM for a "self-correction" pass.
- Deterministic Filtering: You can run a simple `if candidate['years_of_experience'] >= 5` without worrying about parsing strings like "around five years."
- Integration Ready: Your data is immediately ready for API POST requests to your ATS or CRM.
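As a small illustration of the filtering and integration points above, the sketch below shortlists already-validated candidates and serializes them for an API call. The candidate records other than Jane Doe and the payload shape are assumptions made for the example; a real ATS will define its own request format.

```python
import json

# Candidates that already passed schema validation (sample data).
validated_candidates = [
    {"full_name": "Jane Doe", "years_of_experience": 7, "primary_stack": ["Python", "AWS"]},
    {"full_name": "Sam Lee", "years_of_experience": 3, "primary_stack": ["Go"]},
    {"full_name": "Alex Kim", "years_of_experience": 5, "primary_stack": ["Java"]},
]

# Safe integer comparison: the schema guarantees the field's type.
shortlist = [
    candidate
    for candidate in validated_candidates
    if candidate["years_of_experience"] >= 5
]

# Hypothetical request body; adapt the shape to your ATS or CRM.
payload = json.dumps({"candidates": shortlist})

shortlisted_names = [candidate["full_name"] for candidate in shortlist]
print(shortlisted_names)
```

Because every record has the same keys and types, the filter is a single comparison and the payload needs no per-candidate cleanup before it is sent.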
Next Steps for Your Infrastructure
If you are building these agents, the biggest bottleneck is often the quality of the data entering the pipeline. Before you scale your AI agents, ensure your landing pages and site tracking are optimized to capture the right signals.
To audit your current site's technical performance and ensure your conversion funnels are working properly, I recommend using inspect-my-site.com. A clean, fast site ensures that the candidates your agents find actually convert into applicants.
About the Author: Maria Jose Gonzalez Antelo is a professional content writer with nearly 10 years of experience in IT Human Resources. She specializes in the intersection of recruitment and AI solutions, leveraging a strong technical background to build scalable talent acquisition workflows.