Senior Site Reliability Engineer

Full-time

Remote

Worldwide

Role Description

As a Site Reliability Engineer, you will be part of a team that is passionately automating everything possible to make Guidewire systems run more efficiently. The Platform team is dedicated full-time to creating and running software that improves the reliability of systems in production, serving hundreds of customers and supporting millions of transactions each day.

Ensure the reliability of Guidewire’s flagship cloud platform and InsuranceSuite products
Build tooling to help ensure efficient operations and optimal availability of all SaaS multi-tenant and customer-focused systems
Collaborate closely with Guidewire’s core product developers to ensure that the Guidewire core cloud products address functional and non-functional requirements such as availability, performance, observability, and maintainability
Engage with product development (PD) teams by participating in design reviews and production readiness checks
Analyze data from observability and monitoring tools to improve operational metrics of microservices as well as the entire platform
Create system documentation and training materials to empower and educate team members
Oversee and automate the team’s growing presence in AWS
Build and maintain observability tooling, metrics, and dashboarding for a global platform product infrastructure
Improve incident management lifecycle to identify, mitigate, and learn from reliability risks and issues

Qualifications

Bachelor’s Degree in Computer Science or related field
Software engineering and task automation skills with Bash, Python, and/or Go
Experience supporting web applications running on Java / Apache / Tomcat in a live production environment
Familiarity with the Agile software development lifecycle
Deep background with Linux systems and engineering
Highly experienced with engineering and automating on Amazon Web Services (AWS)
Prior experience with IaC tools like Terraform/Terragrunt/Terraspace
Prior experience with DevOps/GitOps tools (Git, Bitbucket, Flux CD, TeamCity)
Background supporting production at scale in a heavily microservice-based environment
Hands-on engineering and ops expertise in containerization (Docker, Helm, Kubernetes/EKS, CNI and Ingress networking)
Strong understanding of single sign-on (SSO), SAML, and OAuth (hands-on experience with Okta is a bonus)
Seasoned expertise in X.509 certificate technology and the basic concepts of encryption
Experience working with relational databases such as Aurora Postgres and/or Oracle RDS
Advanced exposure to application development, web UI design and development, JSON, and application architecture
Extensive experience with observability tools (logging/APM) like Datadog, CloudWatch, and PagerDuty
Familiarity with event store/stream-processing technologies like Kafka or AWS SQS
Understanding of Open Application Model systems such as KubeVela or Crossplane

Requirements

Ability to read, write, and speak English
Ability to speak in public settings and to interface confidently with customers, partners, and vendors
Travel: up to 25% of the job will require travel, approximately one week a month

Personal Qualities and Soft Skills

Greatly prefer writing code to clicking through a GUI
Enjoy teaching, being a mentor to others, and working across boundaries
Outstanding troubleshooting skills; ability to think critically and display an aptitude for problem solving
Strong analytical mind with a penchant for process development and enhancement
A highly positive, can-do attitude and a desire to be a team player
Great communication skills and ability to explain complex technical concepts to a varied audience
Demonstrate strong follow-through and a strong work ethic, and consistently keep commitments
Speak Japanese
