Four Checkpoints for AI Hiring Agents (with Human Approval)

A well-designed autonomous hiring agent should pause for human approval at exactly four moments: rejecting a candidate, sending the first outreach, booking an interview, and publishing a borderline score. Everywhere else, let it ship.

In May 2025, a federal judge allowed the age-discrimination claims in a lawsuit against Workday to proceed as a nationwide collective action. The plaintiff is a Black man over 40 who applied to more than 100 jobs through Workday-powered systems and was rejected, sometimes within minutes of hitting submit. Buried in the court filings is a number that has stuck with me ever since I read it: Workday represented that 1.1 billion applications were rejected by its software during the relevant period.

One point one billion.

I keep coming back to it. Not because I think Workday's screener is uniquely bad. It isn't. The same problems live inside every modern ATS. I keep coming back because of what the number implies. Somewhere along the way, the industry quietly accepted that "AI rejected you in 47 seconds" was an acceptable outcome. We built systems that act on people, at scale, with nobody watching.

When we started building Curriculo's hiring agent, we had to decide where to draw that line. Where does the agent get to act on its own, and where does it stop and wait for a person? We argued about it for the better part of a year. We landed on four places. Just four. We call them the checkpoints.

Why pure autonomy is the wrong default

The dream of fire-and-forget hiring AI was always a sales pitch dressed up as a roadmap. It is telling that the 2025 SHRM Benchmarking Survey found that average cost-per-hire and time-to-hire both went up during the same three-year period in which generative AI flooded the recruiting stack. Speed without control isn't speed. It is just churn at a higher frame rate.

The pattern Anthropic describes for production agents is the correct one. Agents plan and operate independently, then return to a human "for further information or judgement." LangGraph engineers call it interrupt-and-resume. HR leads call it "the part where I sign off." Same idea, different vocabulary.
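Interrupt-and-resume can be sketched in plain Python with a generator. The agent runs until it reaches an action that needs sign-off. At that point it yields a proposal, which is the interrupt. It continues only when the caller sends back a decision, which is the resume. This is a minimal sketch under stated assumptions: `Proposal`, `Decision`, `agent_run`, and `run_with_human` are hypothetical names, and the code does not use LangGraph, which ships its own interrupt primitives.

```python
from dataclasses import dataclass
from typing import Callable, Generator, Optional


@dataclass
class Proposal:
    """An action the agent wants to take, shown to a human before it runs."""
    action: str
    payload: dict


@dataclass
class Decision:
    """The human's answer at a checkpoint; edited_payload replaces the draft if set."""
    approved: bool
    edited_payload: Optional[dict] = None


def agent_run(candidate: str) -> Generator[Proposal, Decision, list[str]]:
    """Hypothetical agent loop: ordinary work proceeds freely, and the
    generator yields (interrupts) at a checkpoint until it is resumed."""
    log = [f"scored {candidate}"]  # non-checkpoint work: no approval needed
    draft = {"to": candidate, "body": "Thanks for your time..."}
    decision = yield Proposal("send_email", draft)  # interrupt: wait for a human
    if decision.approved:
        final = decision.edited_payload or draft
        log.append(f"sent email to {final['to']}")
    else:
        log.append("email held for further review")
    return log


def run_with_human(candidate: str, decide: Callable[[Proposal], Decision]) -> list[str]:
    """Drive the agent, calling `decide` (a stand-in for the reviewer UI) at each checkpoint."""
    gen = agent_run(candidate)
    proposal = next(gen)  # run until the first interrupt
    while True:
        try:
            proposal = gen.send(decide(proposal))  # resume with the human's decision
        except StopIteration as done:
            return done.value  # the agent finished; return its log
```

The reviewer logic lives outside the agent. The same loop therefore works whether `decide` is a web form, a Slack button, or a test stub.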

The hard part is not admitting you need checkpoints. It is deciding which actions are checkpoint-worthy and which are noise. Force the agent to ask permission too often and you have built something worse than a spreadsheet. Ask too little and you ship a 1.1-billion-rejection machine. Most of our internal arguments were about that calibration, not about the principle.

The four checkpoints

Here is the framework we landed on. These are the only four moments our hiring agent will not act on its own.

1. Reject decision

The agent can score a candidate as a strong no. It cannot send the rejection. It drafts the message, surfaces the specific reasons it scored low, links back to the resume passage that drove the call, and waits for a human to press send.
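"Waits for a human to press send" can be made structural rather than a convention. One way is to give the draft an approval field that the send path checks. This is a minimal sketch with assumptions: `Evidence`, `RejectionDraft`, and `send_rejection` are hypothetical names, and an in-memory list stands in for a real mail service.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Evidence:
    """One reason for a low score, linked to the resume passage that drove it."""
    reason: str
    resume_excerpt: str


@dataclass
class RejectionDraft:
    """A rejection the agent has written but is not allowed to send."""
    candidate_email: str
    score: int
    message: str
    evidence: list[Evidence] = field(default_factory=list)
    approved_by: Optional[str] = None  # stays None until a human signs off

    def approve(self, recruiter: str) -> None:
        # A reviewer should be able to see why the agent scored low before approving.
        if not self.evidence:
            raise ValueError("draft must cite at least one reason and resume excerpt")
        self.approved_by = recruiter


def send_rejection(draft: RejectionDraft, outbox: list) -> None:
    """Append to the outbox only if a named human approved the draft."""
    if draft.approved_by is None:
        raise PermissionError("rejection requires human approval before sending")
    outbox.append((draft.candidate_email, draft.message, draft.approved_by))
```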

This is the most important checkpoint, and we knew it would be from day one. Roughly 75% of job applications received no reply at all in 2025. Fifty-three percent of job seekers have been ghosted by an employer they applied to. Every one of those silences is a small reputation tax. Candidates who get ghosted by a company are 37% less likely to buy from that company later. Seventy-two percent of them post about the experience publicly.

Rejection is not the cruel part of hiring. Auto-rejection without a human looking is. So our agent proposes. A person sends.

2. First outreach

Sourcing emails are the second checkpoint. The agent can find the candidate, draft a message, and pull in the right hooks: a recent project, a shared connection, the specific line in a portfolio that matches the role. It cannot send the email by itself.

Why not? Because outbound is your brand talking out loud. If the agent gets the tone wrong once, you have annoyed a senior engineer who is now describing the experience in a Slack group with two thousand other engineers. Drafting at scale is fine. Sending at scale, on your behalf, with no human in the loop, is how a company becomes a cautionary tale.

The recruiter either edits the draft or sends it as-is. Either way, a human's name lands in the inbox.
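The data model can enforce the rule that a human's name lands in the inbox. The agent produces only a draft. A separate approval step takes the recruiter's identity and an optional edited body, and only that step produces something sendable. This is a minimal sketch: `OutreachDraft`, `SentEmail`, and `approve_outreach` are hypothetical, and actually delivering the email is out of scope.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class OutreachDraft:
    to: str
    subject: str
    body: str
    hooks: tuple[str, ...]  # e.g. a recent project or a shared connection


@dataclass(frozen=True)
class SentEmail:
    sender: str  # always the approving recruiter, never the agent
    to: str
    subject: str
    body: str


def approve_outreach(draft: OutreachDraft, recruiter: str,
                     edited_body: Optional[str] = None) -> SentEmail:
    """The recruiter edits the body or accepts it as-is; either way it goes out under their name."""
    final = replace(draft, body=edited_body) if edited_body is not None else draft
    return SentEmail(sender=recruiter, to=final.to, subject=final.subject, body=final.body)
```

Because both types are frozen, an edit produces a new object. The agent's original draft stays intact for comparison.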

3. Interview booking

The agent watches the back-and-forth, proposes time slots based on the panel's actual calendar and the candidate's stated availability, and writes the confirmation note. It does not place a meeting on anyone's calendar without a human "yes."

Part of the reason is cultural. Calendars are personal. Most recruiters I know would rather miss a meeting than be told one was added without their knowledge. But there is a quieter reason too. The booking step is where mistakes get expensive. Wrong time zone. Wrong panel member. Wrong job number on the invite. The cost of a clean human approval at this checkpoint is a few seconds of attention. The cost of a bad auto-booking is a candidate who quietly drops out and tells five friends that the process felt sloppy.
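Proposing slots is essentially interval intersection. This is a minimal sketch that assumes each panelist's and the candidate's availability has already been fetched as sorted, non-overlapping lists of timezone-aware `(start, end)` pairs. The function names are hypothetical and no calendar API is involved. Aware datetimes compare as instants, so a candidate's 10:00 at UTC-4 and a panelist's 14:00 UTC are treated as the same moment. That guards against the wrong-time-zone mistake. The output is a list of proposals, and nothing is written to a calendar.

```python
from datetime import datetime, timedelta, timezone

Interval = tuple[datetime, datetime]  # timezone-aware (start, end)


def intersect(a: list[Interval], b: list[Interval]) -> list[Interval]:
    """Two-pointer intersection of two sorted, non-overlapping interval lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def propose_slots(panel_free: list[list[Interval]], candidate_free: list[Interval],
                  length: timedelta) -> list[Interval]:
    """Slots where every panelist and the candidate are free. Proposals only: nothing is booked."""
    common = candidate_free
    for member in panel_free:
        common = intersect(common, member)
    slots = []
    for start, end in common:
        t = start
        while t + length <= end:
            slots.append((t, t + length))
            t += length
    return slots


if __name__ == "__main__":
    utc = timezone.utc
    candidate_tz = timezone(timedelta(hours=-4))  # candidate's stated offset
    panel = [
        [(datetime(2025, 6, 2, 13, 0, tzinfo=utc), datetime(2025, 6, 2, 16, 0, tzinfo=utc))],
        [(datetime(2025, 6, 2, 14, 0, tzinfo=utc), datetime(2025, 6, 2, 18, 0, tzinfo=utc))],
    ]
    candidate = [(datetime(2025, 6, 2, 10, 0, tzinfo=candidate_tz),
                  datetime(2025, 6, 2, 12, 0, tzinfo=candidate_tz))]
    for s, e in propose_slots(panel, candidate, timedelta(minutes=45)):
        print(s.isoformat(), "->", e.isoformat())
```

In the example, the candidate's 10:00 to 12:00 window at UTC-4 overlaps both panelists only between 14:00 and 16:00 UTC. That yields two 45-minute proposals for a human to confirm.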

4. Score publish

This is the only checkpoint with a confidence band, and it is where we let the agent operate with the most freedom.

Every candidate gets a 0-100 score. If the score is above the high-confidence threshold or below the low-confidence one, the agent publishes it to the pipeline and moves on. If the score lands in the murky middle, the "this candidate is interesting but I am not sure" zone, the agent queues it for human review and writes its reasoning so a recruiter can read the call like a short memo.

Among the four checkpoints, this is the only one where we trust the agent to act without asking, and we trust it only because the action is reversible. A score is a label, not a decision. A wrong label can be re-scored in three seconds. A bad rejection email cannot be un-sent.
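The confidence-band routing for scores fits in a few lines. This is a minimal sketch under stated assumptions. The thresholds of 35 and 75 are illustrative placeholders, and the right values will vary by role. `Route`, `Band`, and `triage` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    PUBLISH = "publish"       # confident call: label goes straight to the pipeline
    HUMAN_REVIEW = "review"   # murky middle: queued with a written rationale


@dataclass(frozen=True)
class Band:
    low: int = 35    # placeholder low-confidence threshold
    high: int = 75   # placeholder high-confidence threshold

    def route(self, score: int) -> Route:
        if not 0 <= score <= 100:
            raise ValueError("score must be in 0-100")
        # Strictly above the high threshold or strictly below the low one is confident.
        if score > self.high or score < self.low:
            return Route.PUBLISH
        return Route.HUMAN_REVIEW


def triage(scores: dict[str, int], band: Band) -> tuple[dict[str, int], list[str]]:
    """Split candidates into auto-published scores and a human review queue."""
    published: dict[str, int] = {}
    review_queue: list[str] = []
    for candidate, score in scores.items():
        if band.route(score) is Route.PUBLISH:
            published[candidate] = score
        else:
            review_queue.append(candidate)
    return published, review_queue
```

With those placeholders, `triage({"a": 90, "b": 50, "c": 12}, Band())` publishes `a` and `c` and queues `b` for review. A score exactly on a threshold counts as borderline, matching the "above" and "below" wording.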

What changes when you build this way

The day we shipped this design, two things shifted that we did not fully predict.

First, recruiters started spending their time on the part of the job that is actually theirs. Not screening three hundred resumes for the same role at eleven seconds apiece. Reading the agent's notes, signing off on the calls that need a real person, replying to candidates who deserve a human voice. The twenty-three hours of "screening" per hire that the industry quotes is mostly clicking. The agent absorbs the clicking. The recruiter does the work that requires being human.

Second, and this caught us off guard, candidates started replying more. We had assumed automated outreach would feel automated. It does, when it is automated all the way through. When a person edits the draft, even by a single sentence, and presses send, the email lands differently. Reply rates went up. Ghosting went down on both sides. We did not write that in the spec. It just happened.

There is a lesson in there I am still chewing on. The part of hiring AI that actually wins is not the autonomy. It is the discipline about where the autonomy stops. The agent that pauses four times beats the agent that never pauses, every time we have measured it.

The smallest possible checkpoint count

If you are building an agentic system inside your own product, hiring or otherwise, the question is not "how many guardrails should I add." It is "what is the smallest set of moments where a human signature meaningfully changes the outcome." For us, it was four.

Your number might be different. The principle will not be. Reversible actions can be automated. Irreversible ones, especially the ones that touch a real person on the other side of the screen, should pause and ask.
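One way to encode the principle in your own system is as a small policy function. It pauses on anything irreversible, plus any call the agent is unsure of, which is how the borderline-score checkpoint fits a reversibility rule. This is a sketch under assumptions: the `Action` fields and catalogue entries are hypothetical. Treating interview booking as irreversible is a modeling choice, since an invite reaches people the moment it lands even if it can later be cancelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    reversible: bool        # can it be undone cheaply after the fact?
    confident: bool = True  # False for borderline calls, such as a mid-band score


def needs_human(action: Action) -> bool:
    """Pause on anything irreversible or anything the agent is unsure of; run the rest."""
    return not action.reversible or not action.confident


# Hypothetical action catalogue loosely mirroring the examples above.
CATALOGUE = [
    Action("draft_rejection", reversible=True),
    Action("send_rejection", reversible=False),
    Action("draft_outreach", reversible=True),
    Action("send_first_outreach", reversible=False),
    Action("propose_interview_slots", reversible=True),
    Action("book_interview", reversible=False),  # the invite reaches people as soon as it lands
    Action("publish_confident_score", reversible=True),
    Action("publish_borderline_score", reversible=True, confident=False),
]


def checkpoints(catalogue: list[Action]) -> list[str]:
    """Names of the actions that must wait for a human."""
    return [a.name for a in catalogue if needs_human(a)]
```

Run against this catalogue, `checkpoints(CATALOGUE)` returns exactly four gated actions: sending a rejection, sending first outreach, booking an interview, and publishing a borderline score.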

If you want to see this running in production, our hiring agent lives inside a Gmail-style inbox at curriculo.me. It pauses at the four checkpoints above. Everywhere else, it just runs, and the recruiter spends their day on the parts of hiring that need a human anyway. The free plan is enough to feel the difference on a real role, which is the only honest way to evaluate any of this.

FAQ

What is a checkpoint in an AI hiring agent?

A checkpoint is a moment where an autonomous agent stops, presents its current state and proposed action to a human, and waits for approval before continuing. It is the standard human-in-the-loop pattern used by production agentic systems built on frameworks like LangGraph.

Why not let the AI hiring agent fully automate rejections?

Auto-rejection at scale damages employer brand and exposes the company to discrimination claims. The nationwide collective action against Workday, preliminarily certified in May 2025, covers a population that may include "hundreds of millions" of applicants. A human review on every rejection is the cheapest insurance policy a hiring team can buy.

How is this different from a normal ATS workflow?

A normal ATS waits for a human to do every step. A checkpointed agent handles every other step on its own and stops only at the four checkpoints. The recruiter goes from doing one hundred percent of the work to approving at most four decisions per candidate, and fewer for candidates who never reach outreach or an interview.

Where does Curriculo fit in?

Curriculo is an applicant tracking system built around a personal AI hiring agent and a Gmail-style inbox. The agent runs the four-checkpoint pattern by default. There is no per-seat fee on the free plan, which is the cheapest way to test whether checkpointed agents change anything for your team.
