Systems Engineer

Systems Research Engineer (AI Infrastructure) - Edinburgh - Leading Global Firm

As large language models reshape the foundational software stack, AI-native infrastructure is redefining how these models are trained, served, and deployed at scale. We are driving innovation in AI infrastructure and agent-oriented serving architectures to shape the next generation of large-scale data centres and distributed AI systems.

Key Responsibilities

  • Distributed Systems R&D: Architect and implement distributed system components for AI workloads across heterogeneous clusters (CPU, GPU, accelerators).
  • Performance Optimisation: Conduct deep profiling and performance tuning of large-scale inference pipelines, focusing on KV cache management and memory scheduling.
  • Scalable Serving Infrastructure: Develop frameworks for multi-tenant, low-latency, and fault-tolerant AI serving, researching techniques for cache sharing and data locality.
  • Research & Publications: Translate novel designs into publishable contributions for leading systems and ML venues (e.g., OSDI, SOSP, EuroSys, NeurIPS).
  • Cross-Team Collaboration: Communicate technical insights to multidisciplinary teams and align on long-term infrastructure strategy.

Qualifications

  • Education: BSc, MSc, or PhD in Computer Science, Electrical Engineering, or a related field.
  • Systems Expertise: Strong knowledge of Operating Systems, Distributed Systems, and AI inference serving.
  • Technical Stack: Proficiency in C/C++ for systems development and Python for research prototyping.
  • Hands-on Experience: Experience with LLM serving frameworks, distributed cache optimisation, and performance profiling tools.
  • Preferred: A track record of publications in top-tier systems or ML conferences and practical experience in load balancing or resource scheduling for inference clusters.

Job Details

Company: European Tech Recruit
Location: Edinburgh, Scotland, United Kingdom
Posted