Computing Systems · Huawei Research Center Zürich
Internship: High-Performance LLM Inference Algorithms and Frameworks
We are seeking a highly skilled and motivated student intern to join our team working on the development and optimization of a Large Language Model (LLM) inference engine. The successful candidate will have substantial experience with state-of-the-art AI models and frameworks, and a strong interest in performance optimization and engineering.
Key Responsibilities:
- Conduct cutting-edge research on high-performance LLM inference algorithms and frameworks
- Design and implement fast algorithms using frameworks such as PyTorch, TorchDynamo, CUDAGraph
- Design and implement high-performance GPU/NPU kernels using frameworks such as Triton, CUTLASS, ThunderKittens, etc.
- Integrate novel optimizations into inference frameworks such as vLLM and SGLang
- Publish research findings in top-tier conferences and journals, contributing to the academic community
Qualifications:
- Master's or PhD student in Computer Science, Electrical Engineering, or a related field
- Proven experience and proficiency with some of the following frameworks: PyTorch, TorchDynamo, CUDAGraph, Triton, CUTLASS, ThunderKittens
- Experience with profiling tools (e.g., PyTorch Profiler, Nsight Compute) and performance analysis (e.g., the roofline model)
- Experience with high-performance matrix and AI algorithms, such as tiled matrix multiplication, FlashAttention, linear attention, etc.
Preferred Qualifications:
- Contributions to open-source projects or active involvement in the AI research community.
- Strong understanding of hardware architecture and high-performance parallel computing
- Availability to work full-time for 6 months
Department: Computing Systems
Location: Huawei Research Center Zürich
Employment type: Internship
About Huawei Switzerland: founded in 1987