Role Description
As a Site Reliability Engineer, you will be part of a team that is passionately automating everything possible to make Guidewire systems run more efficiently. The Platform team is dedicated full-time to creating and running software that improves the reliability of systems in production, serving hundreds of customers and supporting millions of transactions each day.
- Ensure the reliability of Guidewire’s flagship cloud platform and InsuranceSuite products
- Build tooling to help ensure efficient operations and optimal availability of all SaaS multi-tenant and customer-focused systems
- Collaborate closely with Guidewire’s core product developers to ensure that the Guidewire core cloud products address functional and non-functional requirements such as availability, performance, observability, and maintainability
- Engage with product development (PD) teams by participating in design reviews and production readiness checks
- Analyze data from observability and monitoring tools to improve operational metrics of microservices as well as the entire platform
- Create system documentation and training materials to empower and educate team members
- Oversee and automate the team’s growing presence in AWS
- Build and maintain observability tooling, metrics, and dashboarding for a global platform product infrastructure
- Improve incident management lifecycle to identify, mitigate, and learn from reliability risks and issues
Qualifications
- Bachelor’s Degree in Computer Science or related field
- Software engineering and task automation skills with Bash, Python, and/or Go
- Experience supporting web applications running on Java / Apache / Tomcat in a live production environment
- Familiarity with the Agile software development lifecycle
- Deep background with Linux systems and engineering
- Highly experienced with engineering and automating on Amazon Web Services (AWS)
- Prior experience with IaC tools like Terraform/Terragrunt/Terraspace
- Prior experience with DevOps/GitOps tools (Git, Bitbucket, Flux CD, TeamCity)
- Background supporting production at scale in a heavily microservice-based environment
- Hands-on engineering and ops expertise in containerization (Docker, Helm, Kubernetes/EKS, CNI and Ingress networking)
- Strong understanding of single sign-on (SSO), SAML, and OAuth (bonus for hands-on experience with Okta)
- Seasoned expertise in X.509 certificate technology and basic concepts of encryption
- Experience working with relational databases such as Aurora Postgres and/or Oracle RDS
- Advanced exposure to application development, web UI (design and development), JSON, and application architecture
- Extensive experience with observability tools (logging/APM) like Datadog, CloudWatch, and PagerDuty
- Familiarity with event store/stream-processing technologies like Kafka or AWS SQS
- Understanding of Open Application Model systems such as KubeVela or Crossplane
Requirements
- Ability to read, write, and speak English
- Ability to speak in public settings, interface with customers, partners and vendors confidently
- Travel: up to 25% of the time, approximately one week per month
Personal Qualities and Soft Skills
- Greatly prefer writing code to clicking through a GUI
- Enjoy teaching, being a mentor to others, and working across boundaries
- Outstanding troubleshooting skills; ability to think critically and display an aptitude for problem solving
- Strong analytical mind with a penchant for process development and enhancement
- A highly positive can-do attitude with a desire to be a team player
- Great communication skills and ability to explain complex technical concepts to a varied audience
- Demonstrate strong follow-through and a strong work ethic, and consistently meet commitments
- Speak Japanese