Senior Researcher: AI Computing Systems
If you are enthusiastic about shaping Huawei’s European Research Institute together with a multicultural team of leading researchers, this is the right opportunity for you!
Huawei envisions a world where technology connects people, empowers industries, and unlocks human potential. Guided by its mission to enrich lives through communication and intelligent innovation, Huawei stands at the forefront of global digital transformation. As a leader in Information and Communications Technology (ICT), the company pioneers breakthroughs in artificial intelligence, cloud computing, and smart devices - building the intelligent foundation of a fully connected world.
Through its Carrier, Enterprise, and Consumer business groups, Huawei delivers resilient digital infrastructure, advanced cloud and AI platforms, and transformative devices that enable progress at every level. Supporting 45 of the world’s top 50 telecom operators and serving one-third of the global population across more than 170 countries, Huawei is shaping a future where connectivity becomes a powerful catalyst for opportunity and sustainable growth.
This spirit of bold innovation is embodied by Huawei Technologies Switzerland AG. From its research hubs in Zurich and Lausanne, pioneering teams push the boundaries of High-Performance Computing, Computer Architecture, Computer Vision, Robotics, Artificial Intelligence, Neuromorphic Computing, Wireless Technologies, and Networking - architecting the intelligent systems that will define tomorrow’s digital era.
We are looking for a strong researcher with hands-on LLM + RAG experience who can help build and optimize techniques such as KV-cache precomputation, KV reuse/blending (e.g., CacheBlend-style), and sparse attention / selective recompute. You will work close to the metal (attention kernels + profiling) and at the system level (vLLM/LMCache-style stacks), turning research ideas into robust, high-performance code.
Responsibilities:
Design and implement RAG acceleration techniques that reduce TTFT and improve throughput (e.g., document KV precomputation, reuse, caching policies).
Develop KV-cache reuse / blending pipelines and integrate them into inference stacks (batching, paging, eviction, correctness/quality trade-offs).
Implement and optimize sparse attention / selective attention paths, including mask construction and block-granularity strategies.
Work with PyTorch and modern attention backends/kernels (e.g., FlashAttention / FlashInfer-like kernels), profiling and optimizing performance.
Stay up to date with the latest research and open-source progress in LLM inference, KV caching, and RAG systems, and translate it into practical improvements.
Qualifications:
PhD in Computer Science, Electrical Engineering, or a related field.
Strong software engineering skills in Python, with substantial PyTorch experience (model internals, attention/KV cache concepts, performance-aware coding).
Solid understanding of transformer inference fundamentals: prefill vs decode, KV cache layout, masking, batching, latency/throughput trade-offs.
Experience benchmarking and profiling LLM inference workloads, and diagnosing performance bottlenecks.
Strong communication skills and comfort collaborating across research and engineering.
Preferred Qualifications (Nice to Have):
Experience with vLLM and/or LMCache (integration, debugging, extending attention/KV-cache logic).
Familiarity with attention kernel stacks and customization (FlashAttention/FlashInfer, Triton, CUDA extensions, custom ops).
Practical experience building RAG pipelines (retrieval, chunking, indexing, reranking) and understanding how retrieval interacts with inference latency.
Contributions to open-source projects or publications/technical reports in AI systems, LLM inference, caching, or storage-aware ML systems.
Systems background (Linux, performance engineering, storage/IO, memory hierarchy) and comfort working close to hardware.
Why join us:
Collaborate with world-class scientists and engineers in an open, curiosity-driven environment.
Access to state-of-the-art technology and tools.
Opportunities for professional growth and development.
Competitive salary and a high quality of life in Zurich, in the heart of Europe.
Last but certainly not least: be part of innovative projects that make a difference.
- Department: Future Storage Systems
- Location: Zürich
- Employment type: Full-time
- Employment level: Professionals