Product Manager (Lighthouse)

Fluidstack
Full-time
Remote friendly (San Francisco, California, United States)
Worldwide
$200,000 - $300,000 USD yearly
About Fluidstack

At Fluidstack, we’re building the infrastructure for abundant intelligence. We partner with top AI labs, governments, and enterprises, including Mistral, Poolside, Black Forest Labs, and Meta, to unlock compute at the speed of light.

We’re working with urgency to make AGI a reality. As such, our team is highly motivated and committed to delivering world-class infrastructure. We treat our customers’ outcomes as our own, taking pride in the systems we build and the trust we earn. If you’re motivated by purpose, obsessed with excellence, and ready to work very hard to accelerate the future of intelligence, join us in building what's next.

About the Role

We're looking for a Product Manager to lead Lighthouse, our MLOps and observability platform. You'll own the complete product lifecycle, from strategy and roadmap to execution and customer success. You will work directly with our engineering and infrastructure teams and collaborate closely with customers to ensure that we're providing ML developers with the metrics that matter.
You will have the opportunity to partner with top-tier AI labs to increase their utilization and performance, and to scale our infrastructure to hundreds of thousands of GPUs.

Focus

Build and execute the roadmap for Lighthouse.
Partner with engineering to translate customer requirements into technical specifications and guide implementation.
Create alerting rules for GPU cluster health, job failures, and resource bottlenecks.
Design dashboards for ML-specific KPIs (training loss curves, inference latency, batch processing metrics).
Collaborate with sales and customer success teams to drive adoption, gather feedback, and ensure customer satisfaction.
Engage directly with AI labs and enterprises to understand their observability challenges and shape the product roadmap accordingly.

About You

3-5+ years of experience building developer tools or cloud infrastructure, ideally in the observability space.
Deep experience with the LGTM stack, Alertmanager, or proprietary observability tools such as Datadog.
Understanding of the metrics that matter to AI/ML customers, including infrastructure availability, performance, and utilization, as well as application-level metrics such as MFU (model FLOPs utilization).
Understanding of GPU monitoring tools (DCGM, nvidia-smi, GPU exporters for Prometheus).
Knowledge of Infrastructure-as-Code (IaC) tools (e.g., Terraform, Pulumi) to standardize and simplify deployment of the observability stack.
Comfortable writing SQL queries.
Understanding of SLA and SLO frameworks and error budget management.
Experience with ML-specific monitoring tools (Weights & Biases, ClearML, etc.).

Salary & Benefits

Competitive total compensation package (salary + equity).
Retirement or pension plan, in line with local norms.
Health, dental, and vision insurance.
Generous PTO policy, in line with local norms.

The base salary range for this position is $200,000 - $300,000 per year, depending on experience, skills, qualifications, and location.
This range represents our good-faith estimate of the compensation for this role at the time of posting. Total compensation may also include equity in the form of stock options. We are committed to pay equity and transparency.

Fluidstack is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Fluidstack will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.