Site Reliability Engineer Job Description Template (Free, 2026)

SREs sit at the intersection of software engineering and operations, using code to solve reliability, scalability, and performance problems. Attracting them requires transparency about your SLOs, on-call culture, and error budget philosophy. This template includes 2026 US salary benchmarks and ATS-optimized formatting.

Copy-ready template

Job Title: Site Reliability Engineer [Mid-Level / Senior / Staff]
Department: Engineering / Infrastructure
Location: [City, State] / Remote / Hybrid
Reports To: Engineering Manager / VP Engineering
Employment Type: Full-Time

About [Company Name]
[Company Name] is a [stage/sector] company operating [product] at [scale, e.g., X requests/day, X TB/day]. We run on [AWS / GCP / Azure] and take reliability seriously: our engineering teams own their SLOs and operate with full production accountability. We're looking for engineers who make systems more reliable through software, not just process.

About the Role
As a Site Reliability Engineer, you will partner with product engineering teams to define SLOs, build observability infrastructure, automate operational toil, and respond to production incidents. You will spend at least 50% of your time on engineering work - building tooling, improving the deployment pipeline, and designing for failure modes - not firefighting. This is a high-trust, high-impact role where your decisions directly influence the reliability of systems used by [X] customers.

Key Responsibilities
• Define, measure, and enforce SLOs and error budgets across critical services with product engineering teams
• Build and maintain observability infrastructure: metrics, logging, tracing (Prometheus, Grafana, Datadog, Jaeger)
• Develop and operate deployment automation, canary analysis, and progressive delivery pipelines
• Lead incident response: on-call rotation, incident command, blameless post-mortem facilitation
• Identify and eliminate operational toil through automation using Python, Go, or shell scripting
• Design capacity planning models and run load/stress tests to forecast infrastructure needs
• Implement and improve disaster recovery procedures, runbooks, and chaos engineering practices
• Collaborate with development teams to embed reliability patterns (circuit breakers, retries, graceful degradation)
• Evaluate and introduce new infrastructure tooling; contribute to the platform engineering roadmap
• Mentor engineers in reliability engineering practices and help grow SRE culture across the org

Required Qualifications
• [3]+ years in a site reliability, platform, or infrastructure engineering role
• Proficiency in at least one general-purpose programming language used for automation and tooling: Python, Go, or similar
• Strong understanding of distributed systems, networking fundamentals (TCP/IP, DNS, HTTP/2), and cloud-native architectures
• Hands-on experience with Kubernetes, Docker, and container orchestration at production scale
• Experience designing and operating observability stacks (Prometheus/Alertmanager, Grafana, ELK/Loki, or equivalent)
• Solid understanding of SLO/SLI/SLA concepts and error budget management
• Experience with infrastructure-as-code tools (Terraform, Pulumi, CDK)

Nice to Have
• Contributions to open-source reliability or infrastructure projects
• Chaos engineering experience (Chaos Monkey, Gremlin, LitmusChaos)
• Service mesh or service proxy experience (Istio, Linkerd, Envoy)
• Experience with eBPF-based tooling for performance profiling

What We Offer
• Competitive salary: $[low]-$[high]/year (see benchmarks below)
• Equity: [X]% stock options / RSUs
• Health, dental, and vision insurance (100% employer-paid for employee)
• Flexible PTO + [X] company-wide holidays
• On-call compensation: [cash / time off in lieu (TOIL) policy]
• Remote-friendly / home office stipend of $[X]
• Learning & development budget: $[X]/year
• [Additional perk, e.g. conference attendance or wellness stipend]

Salary Range: $120,000-$210,000/year (US, 2026 benchmark; exact offer commensurate with experience)

[Company Name] is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
    

How to customize this SRE job description

1. State your current SLOs and on-call rotation openly

Experienced SREs will ask about your availability targets, error budgets, and incident frequency before accepting an offer. Being transparent in the JD, e.g. "99.9% SLO, 1-in-4 on-call week, average 2 pages/week", attracts engineers who can realistically assess fit and weeds out those who would be shocked by the workload.
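Before publishing figures like these, it can help to translate them into concrete terms a candidate will picture. The short Python sketch below is illustrative only: it assumes a time-based availability SLO measured over a 30-day window and a weekly 1-in-N rotation, and the function names are hypothetical, not taken from any monitoring tool.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO.

    Assumption: a time-based availability SLO over a rolling window
    (30 days by default). Request-based SLOs need a different formula.
    """
    window_minutes = window_days * 24 * 60
    return round(window_minutes * (100 - slo_percent) / 100, 1)


def on_call_weeks_per_year(rotation_size: int) -> float:
    """Weeks per year each engineer is on call in a weekly 1-in-N rotation."""
    return 52 / rotation_size


if __name__ == "__main__":
    # The example figures from the paragraph above: 99.9% SLO, 1-in-4 rotation.
    print(error_budget_minutes(99.9))   # allowed downtime per 30 days, in minutes
    print(on_call_weeks_per_year(4))    # on-call weeks per engineer per year
```

Under these assumptions, a 99.9% SLO leaves roughly 43 minutes of downtime per 30 days, and a 1-in-4 rotation means 13 on-call weeks per engineer per year. Stating the numbers this way in the JD gives candidates something tangible to evaluate.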

2. Describe the engineering-to-ops work ratio

What sets SRE apart from traditional system administration is the balance of engineering project work against reactive operations. If you follow the Google SRE principle of capping operational work at 50%, say so. Engineers evaluate this split carefully before applying.

3. Name your full observability stack

List every monitoring, alerting, logging, and tracing tool in use. SREs evaluate whether your observability investment is mature or whether they'll be building it from scratch. Neither is a dealbreaker, but candidates need to know which situation they're entering.

4. Clarify cloud provider and Kubernetes maturity

State whether you run managed Kubernetes (EKS, GKE, AKS) or self-managed, your cluster count, and the scale (node count, request volume). This context helps candidates assess the role's technical complexity before the first call.

Site Reliability Engineer salary benchmarks (US, 2026)

Level	Experience	Salary Range
Mid-Level	2-4 years	$120,000 - $155,000
Senior	5-8 years	$155,000 - $190,000
Staff SRE	8-12 years	$190,000 - $210,000
Principal SRE	12+ years	$210,000 - $280,000+

Source: Bureau of Labor Statistics, LinkedIn Salary, Glassdoor 2026 data. Ranges reflect US national figures; adjust +20-30% for San Francisco/NYC markets.
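As a rough illustration of the market adjustment, the Python sketch below applies the 20-30% premium to the table's ranges. How the premium is spread across a range is an assumption made here (20% on the low end, 30% on the high end), and the names are hypothetical. The Principal row is left out because its upper bound is open-ended.

```python
# National ranges copied from the table above (USD per year).
BENCHMARKS = {
    "Mid-Level": (120_000, 155_000),
    "Senior": (155_000, 190_000),
    "Staff SRE": (190_000, 210_000),
}


def adjust_for_market(low: int, high: int,
                      low_premium_pct: int = 20,
                      high_premium_pct: int = 30) -> tuple[int, int]:
    """Widen a national range for a high-cost market.

    Assumption: the smaller premium applies to the low end and the
    larger premium to the high end. Integer math avoids float rounding.
    """
    adjusted_low = low * (100 + low_premium_pct) // 100
    adjusted_high = high * (100 + high_premium_pct) // 100
    return adjusted_low, adjusted_high


if __name__ == "__main__":
    for level, (low, high) in BENCHMARKS.items():
        adj_low, adj_high = adjust_for_market(low, high)
        print(f"{level}: ${adj_low:,} - ${adj_high:,}")
```

With these assumptions, the Mid-Level range of $120,000 - $155,000 becomes $144,000 - $201,500 for a San Francisco or NYC posting. Treat the result as a starting point and check it against local data before publishing.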

Frequently asked questions

What should a site reliability engineer job description include?

A strong SRE JD includes current SLO targets, on-call rotation structure, observability stack, infrastructure scale, the engineering/ops work split, programming language expectations, and a salary range. SREs are in extremely high demand, and a vague JD will lose top candidates to more transparent postings.

What is the average SRE salary in the US in 2026?

SRE salaries are among the highest in engineering. Mid-level SREs earn $120,000-$155,000, senior SREs $155,000-$190,000, and staff-level SREs $190,000-$210,000. Principal SREs at FAANG-tier companies can earn $280,000+ in total compensation including equity. San Francisco and New York command 20-30% premiums.

How do I write an SRE job description that attracts top candidates?

Be transparent about your reliability posture. Share SLOs, error budget policy, and on-call burden honestly. Top SREs evaluate whether a company treats reliability as a first-class concern. Avoid conflating SRE with DevOps or SysAdmin, as the roles differ and experienced candidates will notice the confusion immediately.

Can I use this SRE job description template in my ATS?

Yes. This template works in any ATS including Treegarden, Greenhouse, Lever, and Workable. In Treegarden, paste it into the job wizard to format for your career page and publish to all connected job boards simultaneously.

Ready to post your first SRE job?

Paste this template into Treegarden, set your pipeline, and publish to 10+ job boards in under 30 seconds.