Computing Systems · Huawei Research Center Zürich
Internship: High-Performance LLM Inference Algorithms and Frameworks
We are seeking a highly skilled and motivated student intern to join our team working on the development and optimization of a Large Language Model (LLM) inference engine. The successful candidate will have substantial experience with state-of-the-art AI models and frameworks, and a strong interest in performance optimization and engineering.
Key Responsibilities:
- Conduct cutting-edge research on high-performance LLM inference algorithms and frameworks
- Design and implement fast algorithms using frameworks such as PyTorch, TorchDynamo, CUDAGraph
- Design and implement high-performance GPU/NPU kernels using frameworks such as Triton, CUTLASS, ThunderKittens, etc.
- Integrate novel optimizations into inference frameworks such as vLLM and SGLang
- Publish research findings in top-tier conferences and journals, contributing to the academic community
Qualifications:
- Master's or PhD student in Computer Science, Electrical Engineering, or a related field
- Proven experience and proficiency with some of the following frameworks: PyTorch, TorchDynamo, CUDAGraph, Triton, CUTLASS, ThunderKittens
- Experience with profiling tools (e.g., PyTorch Profiler, Nsight Compute) and performance analysis (e.g., the roofline model)
- Experience with high-performance matrix and AI algorithms, such as tiled matrix multiplication, FlashAttention, linear attention, etc.
Preferred Qualifications:
- Contributions to open-source projects or active involvement in the AI research community.
- Strong understanding of hardware architecture and high-performance parallel computing
- Availability to work full-time for 6 months
Department: Computing Systems
Location: Huawei Research Center Zürich
Employment type: Internship
About Huawei Switzerland: founded in 1987