
We are hiring a Multimodal & Video lead with a strong technical background in Image/Video/3D generation and Multimodal Foundation Models. You will play a critical role in setting the technical direction and building multimodal foundation models for image/video/3D generation, editing, animation, and more. As a member of the team, you will have the opportunity to drive fundamental capabilities, lead teams on ambitious projects, and collaborate broadly across Tether with world-class engineers and researchers to advance open source development and the global AI community.
We are a fast-paced group focusing on model, data and applied research on vision and multimodal foundation models.
Responsibilities
- Lead the research, design, and development of state-of-the-art image, video, and 3D generation models, including multimodal foundation models.
- Lead high-impact, specialized projects focused on innovative text, image, audio, and video applications.
- Define and drive the technical roadmap for multimodal AI initiatives, aligning research goals with business and product objectives.
- Provide technical leadership and mentorship to teams of AI researchers and engineers, fostering innovation and skill development.
- Oversee the end-to-end lifecycle of multimodal model development, from dataset curation and model training to deployment and performance evaluation.
- Lead large-scale multi-node GPU model training, ensuring scalability, efficiency, and reproducibility of experiments.
- Collaborate closely with cross-functional teams, including product, design, and engineering, to integrate AI solutions into production systems.
- Drive applied research initiatives in image/video/3D generation, editing, animation, and other related domains.
- Monitor advancements in AI research and multimodal technologies, and incorporate novel techniques to improve model capabilities and performance.
- Contribute to the AI research community, including publications, open-source contributions, and participation in conferences.
- Establish best practices and standards for coding, model evaluation, and experimentation within the team.
- Lead and manage complex projects, ensuring timely delivery, quality outcomes, and alignment with strategic objectives.
- Communicate technical insights and updates effectively to executive leadership, stakeholders, and external collaborators.
- Promote a culture of collaboration, innovation, and excellence, maintaining high team morale and accountability.
In this role, you’ll have the opportunity to drive roadmaps and propose your own research plan to advance Image/Video/3D generation models and technologies. You’ll provide technical mentorship and guidance to the team and drive execution. You will also collaborate with broader teams across Tether.
Job requirements
Minimum Qualifications
- PhD, MS or equivalent experience
- Hands-on experience building Image/Video/3D generation and multimodal foundation models from scratch
- 5+ years of experience managing or leading 10+ research and engineering teams
- Excellent communication and interpersonal skills
- Excellent understanding of an AI-based product lifecycle.
- Hands-on experience building end-to-end multimodal foundation models on multi-node clusters with thousands of GPUs.
- Proficiency in modern deep learning and diffusion frameworks & libraries.
Preferred Qualifications
- Demonstrated expertise in computer vision, video generation foundation models, and/or multimodal research, especially in building such models from scratch.
- Strong history of delivering innovation in the space of multimodal & video.
- Ability to develop a long-term vision and execute strategies at scale while maintaining a grasp of technical details for better decision-making.
- Experience with VP-level presentations and reporting.
- Publications at leading AI conferences such as CVPR, ICCV, ECCV, ICML, ICLR, and NeurIPS.