The Evidence Problem in Candidate Assessments
The psychometric research on what actually predicts job performance is decades old and largely settled. Industrial-organisational psychologists have produced large-scale meta-analyses — most prominently Schmidt and Hunter's 1998 work, updated across subsequent studies — that rank assessment methods by their predictive validity. The findings are consistent: structured interviews, work sample tests, and cognitive ability measures are the strongest predictors. Unstructured interviews, reference checks, and most personality assessments are weak predictors when used alone.
The problem is that the assessment market does not sell primarily to researchers. It sells to HR teams under pressure to hire faster, demonstrate process rigour, and reduce legal risk. Many tools marketed as "predictive" are statistically validated in ways that would not pass peer review. "Face validity" — the assessment feels relevant to the role — is frequently confused with actual predictive validity.
Understanding this distinction matters before you add any assessment to your hiring funnel. An assessment that does not predict performance adds cost, increases candidate dropout, and creates a false impression of objectivity while still producing subjective outcomes.
The Main Assessment Categories and What the Evidence Says
Cognitive ability tests measure general mental ability — working memory, processing speed, abstract reasoning, and verbal and numerical comprehension. Meta-analyses consistently show cognitive ability to be one of the strongest single predictors of job performance across almost every role type. The catch is legal and ethical: cognitive tests show group-level differences that can create adverse impact (disproportionate screening-out of certain demographic groups). This does not make them unusable, but it does require validation and monitoring.
Work sample tests ask candidates to perform actual job-relevant tasks — a writing test for a communications role, a code review for a software engineer, a sales call simulation for a sales representative. Work samples have high predictive validity, high face validity (candidates understand why the test is relevant), and lower adverse impact than cognitive tests. The limitation is cost: designing and evaluating work samples requires domain expertise and evaluator time.
Situational Judgement Tests (SJTs) present candidates with realistic workplace scenarios and ask them to select from a range of possible responses. SJTs are popular for high-volume hiring because they can be administered at scale, are less susceptible to coaching than cognitive tests, and show lower adverse impact. Their predictive validity is moderate — stronger when the scenarios are role-specific and the response options are clearly differentiated.
Personality assessments measure stable traits — typically using frameworks like the Big Five (OCEAN) or proprietary models. Their predictive validity for job performance is generally modest when used alone, although Conscientiousness shows a consistent positive correlation with performance across roles. The bigger issue is that personality assessments are frequently misused: applied at early stages, used as primary decision criteria, and interpreted by evaluators without psychometric training. Used as supplementary data in combination with structured interviews and work samples, they add incremental value.
Skills tests verify specific technical or functional competencies — Excel proficiency, language fluency, coding language knowledge, accounting software familiarity. Their predictive validity for the specific skills being tested is high. The limitation is scope: a skills test tells you a candidate can do X today. It does not predict adaptability, learning velocity, or performance in contexts where that skill is deployed differently.
Predictive Validity Quick Reference
Strongest predictors (r > 0.50): work sample tests, structured interviews, cognitive ability tests.
Moderate predictors (r = 0.30–0.50): situational judgement tests, job knowledge tests.
Weaker predictors (r < 0.30 when used alone): personality assessments, reference checks, unstructured interviews, years of experience.
Source: Schmidt & Hunter (1998) and subsequent meta-analyses.
When to Use Assessments in the Hiring Funnel
Placement of assessments in the recruitment funnel affects both their effectiveness and candidate experience. The general principle is that assessment burden should increase with funnel progression — candidates should not invest significant time in assessments until they have cleared initial screening and the company has invested time reviewing their application.
Pre-application and early screening. Short cognitive or skills tests (10–20 minutes) can be deployed before or immediately after application for high-volume roles where minimum competency thresholds must be established quickly. This is appropriate for customer service, retail, logistics, and administrative roles. For specialist or senior roles, pre-application tests damage candidate experience and signal low employer brand sophistication.
Post-CV screen, pre-interview. This is the most common and usually the most appropriate placement. A candidate who has passed CV screening has already demonstrated baseline relevance. A work sample or SJT at this stage adds substantive information, and candidates tend to experience it as a natural progression rather than an arbitrary gate.
Between interview stages. Cognitive assessments, personality profiles, or extended work samples are appropriate between interview stages for roles where deeper evaluation of specific factors matters — management assessment centres, for example, or technical coding assessments for senior engineering roles.
ATS Assessment Integration in Treegarden
Treegarden connects assessment stages directly to the recruitment pipeline. When a candidate advances to a defined stage, assessment invitations dispatch automatically. Results appear on the candidate profile alongside CV, interview notes, and AI Match Score — giving hiring teams a unified view of every data point before making a decision.
Adverse Impact and Legal Obligations in Europe
Any assessment that disproportionately screens out candidates from a protected group — defined by gender, race, national origin, disability, age, or religion — raises legal risk under EU employment law and the European Convention on Human Rights. This applies even when discrimination is unintentional.
Adverse impact monitoring requires tracking pass rates by demographic group and comparing them to the overall pass rate. The commonly used "4/5ths rule" (also called the 80% rule) flags a selection method for review if the pass rate for a protected group is less than 80% of the pass rate for the highest-passing group. European jurisdictions may use different thresholds and methodologies, but the underlying principle — disproportionate impact requires justification — is broadly consistent.
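To make the monitoring arithmetic concrete, here is a minimal sketch of a 4/5ths-rule check in Python. The group labels and counts are illustrative only; real monitoring should also account for small sample sizes and whatever statistical thresholds apply in your jurisdiction.

```python
# Minimal sketch of an adverse-impact check using the 4/5ths (80%) rule.
# Group names and counts below are illustrative, not real data.

def adverse_impact_flags(results, threshold=0.8):
    """results: {group_name: (passed, total)} -> list of groups flagged for review."""
    rates = {group: passed / total
             for group, (passed, total) in results.items() if total > 0}
    highest = max(rates.values())
    # Flag any group whose pass rate falls below 80% of the highest group's rate.
    return [group for group, rate in rates.items() if rate < threshold * highest]

example = {
    "group_a": (45, 100),   # 45% pass rate
    "group_b": (30, 100),   # 30% pass rate -> 0.30 / 0.45 = 0.67, below 0.8, flagged
}
print(adverse_impact_flags(example))  # ['group_b']
```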
The justification defence for adverse impact is job-relatedness. If a cognitive test disproportionately screens out one demographic group but can be demonstrated through validation studies to be genuinely predictive of job performance for that specific role, it may be lawful. This validation requirement is why off-the-shelf cognitive tests from reputable providers typically come with role-level validation data — and why you should never use a cognitive assessment that cannot produce this evidence.
GDPR adds another layer: assessment results are personal data. You need a lawful basis for processing, a retention policy, and the ability to respond to data subject access requests that include assessment data.
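One practical way to honour the retention obligation is a scheduled job that identifies assessment results past their retention period. The sketch below assumes a hypothetical record structure and a twelve-month period; substitute whatever period your own retention policy defines.

```python
# Minimal sketch of a retention-policy check for assessment results.
# The record structure and 365-day period are assumptions for illustration.
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)

def expired_assessment_records(records, now=None):
    """records: iterable of dicts with a timezone-aware 'completed_at' datetime.
    Returns the records whose retention period has elapsed and which should be
    deleted or anonymised under your retention policy."""
    now = now or datetime.now(timezone.utc)
    return [r for r in records if now - r["completed_at"] > RETENTION]
```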
Building an Assessment Strategy That Works
An effective assessment strategy starts with a clear job analysis. Before selecting any assessment, define the key competencies and knowledge areas that distinguish high performers in the role. This job analysis should be documented, involve multiple stakeholders (including current high performers where possible), and be updated when the role or context changes significantly.
With a documented competency model, assessment selection becomes a matching exercise. For each competency, identify the assessment method with the best validity evidence for measuring it. For cognitively demanding roles, a cognitive ability test is appropriate. For roles requiring specific technical skills, a work sample test is preferable. For client-facing roles where interpersonal judgement matters, a situational judgement test adds value.
Resist the temptation to assess everything. Each additional assessment adds time cost for candidates and evaluator cost for the team. Assess only what you genuinely cannot evaluate from CV, interview, or other available information. Three targeted assessments aligned to key competencies will outperform ten broad assessments in predictive power and candidate completion rates.
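The matching exercise can be captured in something as simple as a lookup from competency to method. The sketch below is a toy illustration with made-up competencies; the point is that the job analysis, not the assessment catalogue, drives what gets measured.

```python
# Toy illustration: map each competency from the job analysis to the assessment
# method with the strongest validity evidence for measuring it.
# The competencies and mappings below are examples only.

COMPETENCY_TO_METHOD = {
    "abstract reasoning": "cognitive ability test",
    "spreadsheet modelling": "work sample test",
    "client communication": "situational judgement test",
    "stakeholder management": "structured interview",
}

def assessment_plan(required_competencies):
    """Return the minimal set of assessment methods covering the listed competencies."""
    return sorted({COMPETENCY_TO_METHOD[c] for c in required_competencies
                   if c in COMPETENCY_TO_METHOD})

print(assessment_plan(["spreadsheet modelling", "client communication"]))
```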
Candidate Time Investment Benchmarks
Early-stage screen (pre-interview): maximum 20–30 minutes.
Mid-funnel assessment (post-first interview): up to 60 minutes.
Late-stage assessment centre or extended work sample: up to 3 hours, but only for senior roles where the stakes justify the candidate investment.
Always disclose the estimated completion time in the invitation.
Integrating Assessments with Your ATS: What Good Looks Like
An assessment that exists outside your ATS creates administrative burden and fragmented decision-making. Evaluators end up toggling between systems, scores are not visible when reviewing CVs, and candidates who complete assessments but are not progressed may never receive a response because the assessment tool has no connection to the rejection workflow.
Good ATS-assessment integration works as follows. When a recruiter moves a candidate to the "assessment" stage in the Kanban pipeline, an invitation email dispatches automatically with the candidate's name, the role title, and a direct link to the assessment. The candidate completes the assessment in the third-party tool. On completion, results — scores, percentile rankings, or flag indicators — write back into the candidate record in the ATS. The recruiter reviews the assessment results alongside the CV and any previous interview notes within a single profile view. If the assessment result meets the defined threshold, the recruiter advances the candidate; if not, the rejection workflow triggers automatically.
This sequence eliminates manual data entry, ensures no candidate is stranded between systems, and gives every decision-maker a complete picture of the candidate before they form a view.
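For teams wiring this up themselves rather than relying on a native connector, the write-back and threshold step might look like the sketch below. The ats_client object, field names, and threshold are assumptions for illustration, not any specific vendor's API.

```python
# Minimal sketch of the results write-back step. The ats_client interface,
# payload fields, and pass threshold are hypothetical.

PASS_THRESHOLD = 70  # assumed percentile threshold for this role

def handle_assessment_completed(payload, ats_client):
    """Webhook handler: write scores to the candidate record, then advance or reject."""
    candidate_id = payload["candidate_id"]
    score = payload["percentile_score"]

    # Write the result back so it sits alongside the CV and interview notes.
    ats_client.update_candidate(candidate_id, assessment_score=score)

    if score >= PASS_THRESHOLD:
        ats_client.move_to_stage(candidate_id, "first_interview")
    else:
        ats_client.trigger_rejection_workflow(candidate_id)
```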
Frequently Asked Questions
Which type of assessment best predicts job performance?
Meta-analyses consistently show that work sample tests, structured interviews, and cognitive ability tests are the strongest predictors of job performance. Personality tests are useful supplementary data but should not be used as the primary decision-making tool. The combination of a cognitive test and a structured interview outperforms almost any single assessment.
Are personality assessments legal in European recruitment?
Personality assessments are legal in Europe provided they are job-relevant, non-discriminatory, and handled under GDPR. You must have a lawful basis for processing, obtain candidate consent where required, and ensure assessment data is not used in ways that could indirectly discriminate based on protected characteristics. Always validate assessments for the specific role before deploying them.
How long should candidate assessments take?
Keep early-stage screening assessments to 20–30 minutes, in line with the benchmarks above. Completion rates drop sharply for assessments exceeding one hour. Reserve longer work samples and technical tasks for later stages when candidates are more invested. Always communicate the expected time commitment in the invitation so candidates can plan appropriately.
Should assessments be completed before or after the CV review?
It depends on volume and role type. For high-volume roles, deploying a short cognitive or skills screen before CV review can reduce reviewer workload significantly. For specialist roles, CV review typically happens first to ensure minimum qualification requirements are met before investing candidate time in an assessment.
How do I integrate assessments with my ATS?
Look for assessment platforms that offer ATS integration via API or native connectors. At minimum, the integration should automatically send assessment invitations when a candidate reaches the relevant pipeline stage, and write assessment scores back into the candidate record so they appear alongside CV and interview notes for decision-making.
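As a rough illustration of the invitation trigger, the sketch below assumes a hypothetical assessment_client exposing a send_invitation call; a native connector or built-in integration replaces code like this entirely.

```python
# Minimal sketch of the invitation trigger on a pipeline stage change.
# The assessment_client interface and candidate fields are hypothetical.

ASSESSMENT_STAGE = "assessment"

def on_candidate_stage_change(candidate, new_stage, assessment_client):
    """Send an assessment invitation when a candidate reaches the assessment stage."""
    if new_stage != ASSESSMENT_STAGE:
        return
    assessment_client.send_invitation(
        candidate_email=candidate["email"],
        candidate_name=candidate["name"],
        role_title=candidate["role_title"],
    )
```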