Note: By applying to this position you will have an opportunity to share your preferred working location from the following:
Warsaw, Poland; Dublin, Ireland.
Minimum qualifications:
- Bachelor’s degree in Engineering, Computer Science, a related field, or equivalent practical experience.
- 6 years of experience working with client-side web technologies (e.g., HTML, CSS, JavaScript, or HTTP).
- Experience troubleshooting technical issues for internal/external partners or customers.
Preferred qualifications:
- Experience working directly with AI/ML computing hardware, including Graphics Processing Units (GPUs) or other accelerators.
- Experience working with large-scale distributed systems and with ML frameworks (e.g., TensorFlow, PyTorch).
- Familiarity with containerization and orchestration technologies like Kubernetes or Slurm in an on-premises or cloud environment.
- Familiarity with common solutions, design patterns, or best practices.
- Understanding of the AI/ML training and inference life-cycle.
About the job
Our Technical Solutions Engineers for AI Infrastructure own customer issues and provide specialized support to other teams. In this role, you will be part of a global team that provides 24/7 support to ensure customers can seamlessly deploy their Artificial Intelligence (AI) and Machine Learning (ML) workloads on AI Infrastructure products. When customers encounter technical issues, you will ensure we have the expertise, tools, and processes to resolve them. You will troubleshoot technical problems with a mix of hardware and software debugging, networking, Linux system administration, coding/scripting, and updating documentation. You will help our customers succeed in the AI/ML space by improving the product, internal tools, processes, and documentation. You will help drive business growth by recognizing and advocating for our customers’ issues related to AI deployments.
Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.
Responsibilities
- Develop an in-depth understanding of AI/ML workloads and underlying hardware architectures by troubleshooting and reproducing customer-reported issues, determining their root cause, and building tools for faster diagnosis.
- Manage customers’ problems through effective diagnosis and resolution, and implement new investigation tools to increase productivity when handling customer issues on AI/ML infrastructure.
- Act as a consultant and subject matter expert for internal stakeholders in Engineering, Sales, and customer organizations to resolve deployment and operational obstacles in AI infrastructure environments.
- Work closely with multiple Product and Engineering teams to find ways to improve the product, and interact with our Site Reliability Engineering (SRE) teams to drive high-quality production.
- Maintain availability for non-standard work hours or shifts, which may include weekends as needed.
Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. See also Google's EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know by completing our Accommodations for Applicants form.