Senior HPC Engineer
This position is with a leading start-up at the forefront of LLMs and AI safety, working on machine learning and HPC engineering.
The company is dedicated to engineering cutting-edge AI systems poised to revolutionize industries worldwide.
As an HPC Engineer, you will play a crucial role in developing a robust framework for rapid training and experimentation of large language models on multi-GPU systems.
You will develop the core inference engine used to deploy large machine learning models to customers at scale across distributed systems. You will also contribute significantly to the automated pipeline, optimizing for high-throughput training runs and rapid experimentation while achieving top hardware efficiency.
Qualifications:
We are seeking candidates with exceptional ML engineering skills, evidenced by:
Experience creating and managing high-performance computing clusters across GPUs/TPUs, preferably with PyTorch.
Proficiency in efficiently serving large machine learning models at scale, including quantization and distributed computing, using libraries such as DeepSpeed.
Strong software engineering acumen with expertise in software design/architecture, particularly in Python.
Experience with cloud platforms such as AWS, GCP or Azure is a plus.
Understanding of the latest AI research and the ability to implement it efficiently.
Prior experience at a leading machine learning company (OpenAI, DeepMind, Meta, Anthropic, Hugging Face, etc.).
Keywords: Machine Learning / LLM / Large Language Model / PyTorch / High Performance Computing / HPC / GPU / TPU / DeepSpeed / AI / OpenAI / Distributed Systems
By applying for this role, you acknowledge that we may collect your personal data and store and process it on our systems.
For more information, please see our Privacy Notice: https://eu-recruit.com/wp-content/uploads/2020/12/Privacy-Notice.pdf