An interview scorecard is the operational bridge between a structured interview and a structured hiring decision. Without a scorecard, interviewers leave interviews with impressions — some vivid, some faded, all subjective — and the hiring decision is made through an informal, largely undocumented conversation. With a scorecard, every interviewer produces a consistent record of their evaluation that can be aggregated, compared, and used as evidence in the hiring decision.

A well-designed interview scorecard includes:

  • The competencies being assessed (typically derived from the role's competency profile)
  • A rating scale for each competency (typically a 4- or 5-point scale with behavioural anchors at each level)
  • The specific interview questions asked
  • Space for qualitative notes that explain each rating
  • A section for overall impressions
  • A hire/no-hire recommendation
  • Any specific strengths or concerns for the debrief discussion

Scorecards create accountability in the interview process. When interviewers know their evaluation will be documented and compared to those of colleagues, they invest more care in the assessment. When discrepancies between scorers on the same competency are visible, they generate productive calibration conversations that surface different interpretations of the evidence. Over time, scorecard data can be analysed against hire outcomes to identify which interviewers and which criteria are most predictive of success.

The timing of scorecard completion is important. Scores should be completed within a few hours of the interview, while specific observations are still fresh, not days later when memory has faded to a general impression. ATS platforms that make scorecard completion easy from any device — including mobile — improve completion rates and timeliness significantly.

Key Points: Interview Scorecard

  • Competency-based structure: Scorecards are organised around the specific competencies relevant to the role, not general impressions.
  • Behavioural anchors: Each rating point on the scale has defined behavioural descriptors that guide consistent scoring across interviewers.
  • Documentation: Scorecards create a permanent record of evaluation rationale, essential for legally defensible decisions.
  • Calibration data: Score comparisons across interviewers for the same candidate reveal interpretation differences and calibrate future evaluations.
  • Timely completion: Completing within hours of the interview — while observations are fresh — produces more reliable data than retrospective scoring.

How Interview Scorecard Works in Treegarden

Treegarden's scorecard system is built into the interview workflow. When an interview is scheduled, the assigned scorecard is accessible to interviewers on any device. Scorecards include the competencies, questions, rating scale with anchors, and notes fields. Interviewers complete their scores independently; the hiring team sees aggregated results in the candidate pipeline. All scorecard data is stored permanently on the candidate record for audit, compliance, and analytical purposes.
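The independent-scoring-then-aggregation step can be sketched in a few lines. The data model below (the `Scorecard` class, field names, and competency labels) is illustrative only and is not Treegarden's actual schema:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Scorecard:
    interviewer: str
    # competency -> (rating on a shared 1-4 scale, qualitative note)
    ratings: dict[str, tuple[int, str]]

def aggregate(scorecards: list[Scorecard]) -> dict[str, float]:
    """Average each competency's rating across all interviewers."""
    by_competency: dict[str, list[int]] = {}
    for card in scorecards:
        for competency, (rating, _note) in card.ratings.items():
            by_competency.setdefault(competency, []).append(rating)
    return {c: round(mean(rs), 2) for c, rs in by_competency.items()}

cards = [
    Scorecard("alice", {"Communication": (3, "Clear STAR examples"),
                        "Problem solving": (4, "Strong decomposition")}),
    Scorecard("bob",   {"Communication": (2, "Rambling answers"),
                        "Problem solving": (3, "Solid, not exceptional")}),
]
print(aggregate(cards))  # {'Communication': 2.5, 'Problem solving': 3.5}
```

Because each interviewer's card is stored whole, the same records support per-competency averages in the pipeline view and later audit or analysis of individual ratings.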

See how Treegarden handles Interview Scorecard → Book a demo

Frequently Asked Questions About Interview Scorecard

What rating scale should an interview scorecard use?

A 4-point or 5-point rating scale is most commonly used in interview scorecards. A 4-point scale (1=Does not meet standard, 2=Partially meets standard, 3=Meets standard, 4=Exceeds standard) has the advantage of eliminating a neutral midpoint that interviewers often default to out of indecision — it forces a lean toward either meeting or not meeting the standard. A 5-point scale allows a true middle rating for candidates who demonstrate the competency adequately but not exceptionally. Both can work well; the key requirement is that each point on the scale has clear, behavioural anchor descriptions that define what a response at that level looks like. A rating scale without anchor descriptions produces impressionistic scores that cannot be reliably compared across interviewers or over time.
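A 4-point anchored scale of the kind described above can be represented as data so the scorecard tooling shows the anchor next to each rating and rejects off-scale values. The anchor wording here is a generic assumption; in practice each competency gets its own behavioural descriptors:

```python
# Illustrative 4-point scale; anchor texts are placeholders,
# not prescribed wording.
SCALE = {
    1: "Does not meet standard: no relevant example; answer stays abstract.",
    2: "Partially meets standard: relevant example, but outcome or own role unclear.",
    3: "Meets standard: specific example with clear personal actions and outcome.",
    4: "Exceeds standard: strong example plus judgement beyond the role's demands.",
}

def anchor_for(rating: int) -> str:
    """Return the behavioural anchor text, rejecting off-scale ratings."""
    if rating not in SCALE:
        raise ValueError(f"Rating must be one of {sorted(SCALE)}")
    return SCALE[rating]
```

Storing anchors alongside the scale, rather than in a separate training document, keeps the definition in front of the interviewer at the moment of scoring.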

How do you get interviewers to complete scorecards thoroughly and on time?

Scorecard completion quality is influenced by design and accountability. On the design side: make completion as frictionless as possible (accessible from mobile, pre-populated with the relevant competencies and questions); require both a numerical rating and a qualitative note for each competency rather than allowing scores without explanation; and structure the scorecard so it takes 10-15 minutes to complete thoughtfully — too short and it invites carelessness, too long and it invites rushing. On the accountability side: make it visible that scores will be compared in the hiring debrief; train interviewers on how to use the scorecard before they use it; and have the recruiting team follow up promptly (same day) on missing scorecards so that delayed completion becomes visible quickly. Organisations that treat scorecard completion as optional, or that allow decisions to be made without complete evaluations, signal that the process doesn't matter.

Do interview scorecards work for video and phone interviews?

Yes — interview scorecards are format-agnostic and apply equally to in-person, video, and phone interviews. For video and phone interviews, the same structured questions and scoring criteria apply. The observable evidence may differ slightly — body language signals that are visible in person may be absent or reduced on video — but the verbal content of candidates' responses and the quality of the STAR-structured examples they provide are equally assessable. Some organisations maintain the same scorecard for all interview formats; others make minor modifications to reflect the reduced observability of certain behaviours in remote formats. The core competency questions and scoring criteria should remain consistent regardless of format, so that evaluation data is comparable across candidates interviewed through different modalities.

What is interviewer calibration and how is it done?

Calibration is the process of aligning interviewers' interpretation of the rating scale and behavioural anchors so that the same candidate answer receives comparable scores from different interviewers. Calibration is done through group exercises where interviewers score the same recorded interview response independently, then compare and discuss their scores. Discrepancies reveal interpretation differences — one interviewer rated a response as a 4 (Exceeds) while another rated it as a 2 (Partially meets) — which can be explored and resolved through discussion of the behavioural anchors. Regular calibration sessions (quarterly or ahead of high-volume hiring periods) improve inter-rater reliability over time. ATS platforms that store historical scorecard data enable analytical calibration: identifying interviewers who systematically score higher or lower than their colleagues for the same candidates, flagging patterns that require recalibration.
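The analytical calibration described above — spotting interviewers who systematically score higher or lower than colleagues on the same candidates — can be approximated by comparing each interviewer's ratings to the per-candidate average. The data below is invented for illustration:

```python
from collections import defaultdict
from statistics import mean

# scores[candidate][interviewer] = overall rating on the shared scale
scores = {
    "cand_1": {"alice": 4, "bob": 2, "carol": 3},
    "cand_2": {"alice": 3, "bob": 1, "carol": 2},
    "cand_3": {"alice": 4, "bob": 3, "carol": 3},
}

def leniency(scores: dict[str, dict[str, int]]) -> dict[str, float]:
    """Mean deviation of each interviewer's rating from the per-candidate
    average. Positive values suggest systematic leniency, negative values
    systematic strictness; values near zero suggest good calibration."""
    deviations: dict[str, list[float]] = defaultdict(list)
    for ratings in scores.values():
        candidate_mean = mean(ratings.values())
        for interviewer, rating in ratings.items():
            deviations[interviewer].append(rating - candidate_mean)
    return {i: round(mean(ds), 2) for i, ds in deviations.items()}

print(leniency(scores))  # {'alice': 0.89, 'bob': -0.78, 'carol': -0.11}
```

A sustained large deviation (here, alice scoring almost a full point above the panel average) is the kind of pattern that would flag an interviewer for a recalibration session, not a verdict on its own.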