Engineering Manager - SRE

athena India

Full-time

Remote

Worldwide


Role Description

We are looking for a Site Reliability Engineering (SRE) Manager to lead our Cloud Infrastructure Engineering team at our Chennai R&D center. This team ensures the continuous availability of the technologies and systems that power athenahealth’s services.

The team manages thousands of servers and petabytes of storage, and processes thousands of web requests per second.
Its work helps create a seamless operating system for the medical office, abstracting away administrative complexity so doctors can focus on patient care.

Key Responsibilities

Team Leadership & Development:
- Lead, mentor, and develop a team of SREs, fostering a culture of collaboration, accountability, and continuous learning.
- Build a high-performing team focused on operational excellence, reliability, and scalability.
- Partner with Engineering, Product, and Project Management teams to align priorities and drive cross-functional collaboration.
Service Reliability & Performance:
- Define and track Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs) for critical systems.
- Monitor and enhance the reliability, availability, and performance of all production services and infrastructure.
- Drive improvements in incident management, root cause analysis, and postmortem processes.
- Implement proactive monitoring, alerting, and incident response strategies.
System Automation & Scalability:
- Lead automation efforts to eliminate manual tasks, improve system reliability, and streamline operations.
- Implement best practices for system design, capacity planning, and cost optimization.
- Work closely with engineering teams to build scalable, resilient, and efficient systems.
Collaboration & Cross-functional Engagement:
- Advocate for reliability best practices across engineering and product teams.
- Ensure reliability is embedded in the development lifecycle by reviewing code, designs, and deployment strategies.
- Align with other engineering managers on long-term goals, technical debt, and infrastructure investments.
Process & Efficiency Improvement:
- Continuously improve incident management, deployment pipelines, and system observability.
- Champion automation, monitoring, alerting, and reporting tools.
- Use data-driven insights to measure and optimize operational performance.

Qualifications

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
10+ years of experience building, scaling, and supporting highly available systems and services.
2-3 years of experience managing and mentoring technical teams.
Expertise in containerization (Docker; Kubernetes, both on-premises and in the cloud).
Strong background in Platform Engineering, TechOps, FinOps, and DevSecOps in a hybrid cloud environment.
Expertise in Infrastructure-as-Code (Terraform, Crossplane, Puppet, Ansible) and API integration.
Proficiency in at least one scripting or programming language (Python, Go, Ruby, etc.).
Hands-on experience with Linux systems, VMware, cloud platforms (AWS), and observability tools (Prometheus, Grafana, ELK, CloudWatch, Splunk).
Strong understanding of site reliability principles, telemetry, and monitoring best practices.
Experience with large-scale distributed systems and cloud-native architectures.
Familiarity with configuration management tools (Ansible, Chef, Puppet).
Solid grasp of security best practices and compliance standards.

Benefits

Health and financial benefits.
Perks specific to each location, including commuter support, employee assistance programs, tuition assistance, employee resource groups, and collaborative workspaces.
Events throughout the year, including book clubs, external speakers, and hackathons.
A company culture built on learning, the support of an engaged team, and an inclusive environment.
Flexibility to encourage a better work-life balance.
