Automated two-stage screening pipeline: Claude evaluates candidates against configurable binary signals, then deterministic tier logic shortlists the top performers.
Hiring for AI safety roles requires evaluating candidates across dozens of nuanced dimensions—alignment research intuition, policy fluency, technical depth, publication quality. Each signal demands careful reading of CVs, LinkedIn profiles, GitHub repos, and personal websites. A single recruiter screening 200 candidates against 15 signals is looking at 3,000 individual judgement calls.
The process was slow, inconsistent, and expensive. Different reviewers weighted the same evidence differently. Re-screening when role requirements shifted meant starting from scratch. And the best candidates—the ones who clear every bar—were buried in the same queue as everyone else.
Stage one uses Claude to evaluate each candidate against configurable binary signals—pass, fail, or unknown—with evidence citations drawn from assembled context: Airtable profiles, CV PDFs, LinkedIn data, GitHub, and personal websites. Signal definitions live in Airtable as the source of truth. Evaluations can be scoped by role, run in batch mode via Anthropic’s Batches API for 50% cost savings, or executed synchronously for quick iterations.
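As a rough sketch of how stage one might handle a verdict, the snippet below parses a model reply into a pass/fail/unknown signal result with evidence citations. The names (`Verdict`, `parse_verdict`) and the JSON schema are illustrative assumptions, not the project's actual code; the real pipeline sends the assembled context and signal definition to Claude and parses its reply similarly.

```python
import json
from dataclasses import dataclass

VALID_VERDICTS = {"pass", "fail", "unknown"}

@dataclass
class Verdict:
    signal: str
    verdict: str          # "pass" | "fail" | "unknown"
    evidence: list[str]   # citations drawn from the assembled context

def parse_verdict(signal: str, raw: str) -> Verdict:
    """Parse the model's JSON reply; fall back to 'unknown' on malformed output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return Verdict(signal, "unknown", [])
    verdict = data.get("verdict", "unknown")
    if verdict not in VALID_VERDICTS:
        verdict = "unknown"
    return Verdict(signal, verdict, data.get("evidence", []))

# Hypothetical model reply for one signal:
reply = '{"verdict": "pass", "evidence": ["first-author alignment paper, 2023"]}'
v = parse_verdict("publication_quality", reply)
```

Falling back to `unknown` rather than raising keeps a malformed reply from poisoning a batch run; the verdict can be re-evaluated later.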
Stage two is pure threshold math. Role configs define three tiers—hard requirements (all must pass), core competencies (configurable threshold), and differentiators (configurable threshold)—and the shortlister applies them deterministically against existing verdicts. No API calls, fully auditable, free to re-run whenever requirements change. The two stages are decoupled: evaluate incrementally, shortlist on demand.
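The tier logic above can be sketched in a few lines. The names (`RoleConfig`, `shortlisted`) and field layout are assumptions for illustration; the point is that the decision is a pure function of the existing verdicts and the role config, with no API calls.

```python
from dataclasses import dataclass

@dataclass
class RoleConfig:
    hard_requirements: list[str]   # all must pass
    core_competencies: list[str]
    core_threshold: int            # minimum passes among core competencies
    differentiators: list[str]
    diff_threshold: int            # minimum passes among differentiators

def shortlisted(verdicts: dict[str, str], cfg: RoleConfig) -> bool:
    """Deterministic threshold math over pass/fail/unknown verdicts."""
    def passes(signals: list[str]) -> int:
        return sum(verdicts.get(s) == "pass" for s in signals)

    if passes(cfg.hard_requirements) < len(cfg.hard_requirements):
        return False
    return (passes(cfg.core_competencies) >= cfg.core_threshold
            and passes(cfg.differentiators) >= cfg.diff_threshold)

# Hypothetical role config and verdicts:
cfg = RoleConfig(
    hard_requirements=["technical_depth"],
    core_competencies=["policy_fluency", "publication_quality"],
    core_threshold=1,
    differentiators=["oss_contributions"],
    diff_threshold=0,
)
verdicts = {"technical_depth": "pass", "publication_quality": "pass",
            "policy_fluency": "unknown"}
```

Because `unknown` simply fails to count toward any threshold, re-running the shortlister after more signals are evaluated, or after thresholds change, is free and fully auditable.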