Job Details

Site Reliability Engineer (L4/L5) - Ads

Netflix

USA (Remote)

Employment Type: Full Time

Job Description

Job Title: Site Reliability Engineer (L4/L5) - Ads

Company: Netflix

Years of Experience: 5+ years in Site Reliability, Production Engineering, or similar roles

Location: Remote

Role Type: Full-Time

Salary: $100,000 – $720,000 per year (flexible split between salary and stock options)

Eligibility:

  • 5+ years of experience as an SRE, Production Engineer, or in a similar role supporting high-traffic, business-critical services.
  • Proficiency in programming languages like Python, Go, or Java.
  • Hands-on experience with cloud infrastructure (AWS/Azure/GCP), IaC (Terraform), and container orchestration (Kubernetes).
  • Understanding of distributed systems and the challenges of large-scale reliability.

Role Overview

The Ads Reliability Engineer ensures the resilience, scalability, and reliability of Netflix’s Ad Suite. You will proactively design systems, automate workflows, respond to incidents, and embed a culture of reliability across teams. This role balances hands-on engineering with strategic influence to maintain uptime, optimize system performance, and enable engineering velocity at a global scale.

Key Responsibilities

  • Design, implement, and maintain scalable and reliable infrastructure for the Netflix Ad Suite.
  • Integrate observability, reliability, and security into the software development lifecycle.
  • Develop automation for monitoring, deployment, and incident response.
  • Participate in on-call rotations and manage incident response and postmortems.
  • Coordinate capacity planning for Dynamic Ad Insertion at a global scale.
  • Analyze distributed systems for failure modes and proactively prevent instability.
  • Create documentation, best practices, and tooling to scale reliability across teams.

Skills and Qualifications

  • 5+ years in SRE, Production Engineering, or similar roles.
  • Proficient in coding (Python, Go, Java), with a preference for automation over manual solutions.
  • Hands-on experience with cloud platforms, IaC, and container orchestration (Kubernetes).
  • Strong understanding of distributed systems, failure modes, and resiliency strategies.
  • Excellent collaboration, communication, and troubleshooting skills.
  • Experience with Ad Tech platforms, Dynamic Ad Insertion, or high-scale data pipelines is a plus.
  • Growth mindset, proactive problem-solving, and the ability to influence cross-functional teams.