Job Description
Job Title: Site Reliability Engineer (L4/L5) - Ads
Company: Netflix
Years of Experience: 5+ years in Site Reliability, Production Engineering, or similar roles
Location: Remote
Role Type: Full-Time
Salary: $100,000 – $720,000 (annual; compensation can be split between salary and stock options)
Eligibility:
- 5+ years of experience as an SRE, Production Engineer, or in a similar role supporting high-traffic, business-critical services.
- Proficiency in a programming language such as Python, Go, or Java.
- Hands-on experience with cloud infrastructure (AWS/Azure/GCP), IaC (Terraform), and container orchestration (Kubernetes).
- Understanding of distributed systems and the challenges of large-scale reliability.
Role Overview
The Ads Reliability Engineer ensures the resilience, scalability, and reliability of Netflix’s Ad Suite. You will proactively design systems, automate workflows, respond to incidents, and embed a culture of reliability across teams. This role balances hands-on engineering with strategic influence to maintain uptime, optimize system performance, and enable engineering velocity at a global scale.
Key Responsibilities
- Design, implement, and maintain scalable and reliable infrastructure for the Netflix Ad Suite.
- Integrate observability, reliability, and security into the software development lifecycle.
- Develop automation for monitoring, deployment, and incident response.
- Participate in on-call rotations and manage incident response and postmortems.
- Coordinate capacity planning for Dynamic Ad Insertion at a global scale.
- Analyze distributed systems for failure modes and proactively prevent instability.
- Create documentation, best practices, and tooling to scale reliability across teams.
Skills and Qualifications
- 5+ years in SRE, Production Engineering, or similar roles.
- Proficient in coding (Python, Go, or Java), with a preference for automation over manual solutions.
- Hands-on experience with cloud platforms, IaC, and container orchestration (Kubernetes).
- Strong understanding of distributed systems, failure modes, and resiliency strategies.
- Excellent collaboration, communication, and troubleshooting skills.
- Experience with Ad Tech platforms, Dynamic Ad Insertion, or high-scale data pipelines is a plus.
- Growth mindset, proactive problem-solving, and the ability to influence cross-functional teams.