JSONAIschemastandardizingcandidateprofilessystems

Scaling Agentic Recruitment: Using JSON Schema to Solve Data Hallucinations in AI-Driven Candidate Sourcing

By Maria José González Antelo· June 15, 2026

Photo by Daniil Komov on Unsplash

Scaling Agentic Recruitment: Using JSON Schema to Solve Data Hallucinations in AI-Driven Candidate Sourcing

With nearly a decade in IT Human Resources and a technical background in AI solutions, I have seen the "hallucination" problem firsthand. When you deploy an agentic workflow to source candidates from resumes or LinkedIn profiles, the LLM often "invents" skills or misinterprets years of experience to fit a prompt's desired outcome.

In recruitment, a hallucinated "Java" skill or an incorrect "years of experience" count isn't just a technical bug—it's a failed hire or a wasted interview.

To scale agentic recruitment, you cannot rely on natural language prompts alone. You must enforce Structured Output. By using JSON Schema, we move from "asking the AI to be accurate" to "forcing the AI to adhere to a strict data contract."

The Problem: The "Creative" Recruiter AI

Standard prompts like "Extract the candidate's experience into a list" often result in inconsistent formats:

Candidate A: {"exp": "5 years"}
Candidate B: {"experience": "Since 2018"}
Candidate C: {"years_of_experience": "Approx 4-6"}

This inconsistency breaks downstream automation (like ATS integration or scoring algorithms).

The Solution: Enforcing JSON Schema

By defining a strict schema, we ensure the agent returns a predictable object. This allows us to validate the data before it ever reaches the database.

Below is a production-ready implementation using a JSON Schema to standardize candidate extraction.

1. The Schema Definition (`candidate-schema.json`)

This schema ensures that the AI cannot return a string where an integer is required and enforces a specific set of categories.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "CandidateProfile",
  "type": "object",
  "properties": {
    "full_name": { "type": "string" },
    "years_of_experience": {
      "type": "integer",
      "minimum": 0,
      "description": "Total years of professional experience"
    },
    "primary_stack": {
      "type": "array",
      "items": { "type": "string" },
      "minItems": 1
    },
    "top_skills": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "skill": { "type": "string" },
          "level": { "enum": ["Junior", "Mid", "Senior", "Expert"] }
        },
        "required": ["skill", "level"]
      }
    },
    "contact_info": {
      "type": "object",
      "properties": {
        "email": { "type": "string", "format": "email" },
        "linkedin_url": { "type": "string", "format": "uri" }
      },
      "required": ["email"]
    }
  },
  "required": ["full_name", "years_of_experience", "primary_stack"]
}

2. Implementation: The System Prompt

When calling your LLM (GPT-4o, Claude 3.5, etc.), you must pass the schema and explicitly instruct the agent to adhere to it.

**System Prompt:**
You are a technical recruitment agent. Your task is to extract candidate data from the provided text.
You MUST return the data strictly as a JSON object following the provided JSON Schema.
If a field is missing from the text, return `null` for that specific field.
Do not invent data. Do not provide conversational filler.

**Schema:**
[Insert JSON Schema Above]

3. Validation Logic (Python)

To ensure the agent didn't hallucinate a format, use a validator. This prevents "garbage in, garbage out" scenarios.

import jsonschema
from jsonschema import validate

# The schema defined above
candidate_schema = {
    "type": "object",
    "properties": {
        "full_name": {"type": "string"},
        "years_of_experience": {"type": "integer"},
        "primary_stack": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["full_name", "years_of_experience", "primary_stack"]
}

# Response received from the AI Agent
ai_response = {
    "full_name": "Jane Doe",
    "years_of_experience": 7,
    "primary_stack": ["Python", "AWS", "Kubernetes"]
}

try:
    validate(instance=ai_response, schema=candidate_schema)
    print("✅ Data Validated: Ready for ATS Import")
except jsonschema.exceptions.ValidationError as e:
    print(f"❌ Hallucination/Format Error: {e.message}")
    # Trigger a retry loop or flag for human review

Technical Insight: Why this scales

When you scale to sourcing 1,000+ candidates per day, manual verification is impossible. Using JSON Schema allows you to:

Automate Quality Control: Any response that fails validation is automatically routed back to the LLM for a "self-correction" pass.
Deterministic Filtering: You can run a simple if candidate['years_of_experience'] >= 5 without worrying about parsing strings like "around five years."
Integration Ready: Your data is immediately ready for API POST requests to your ATS or CRM.

Next Steps for Your Infrastructure

If you are building these agents, the biggest bottleneck is often the quality of the data entering the pipeline. Before you scale your AI agents, ensure your landing pages and site tracking are optimized to capture the right signals.

To audit your current site's technical performance and ensure your conversion funnels are working properly, I recommend using inspect-my-site.com. A clean, fast site ensures that the candidates your agents find actually convert into applicants.

About the Author: Maria Jose Gonzalez Antelo is a professional content writer with nearly 10 years of experience in IT Human Resources. She specializes in the intersection of recruitment and AI solutions, leveraging a strong technical background to build scalable talent acquisition workflows.

Scaling Agentic Recruitment: Using JSON Schema to Solve Data Hallucinations in AI-Driven Candidate Sourcing

Scaling Agentic Recruitment: Using JSON Schema to Solve Data Hallucinations in AI-Driven Candidate Sourcing

The Problem: The "Creative" Recruiter AI

The Solution: Enforcing JSON Schema

1. The Schema Definition (candidate-schema.json)

2. Implementation: The System Prompt

3. Validation Logic (Python)

Technical Insight: Why this scales

Next Steps for Your Infrastructure

1. The Schema Definition (`candidate-schema.json`)