LLM Architect
Edinburgh, City of Edinburgh, United Kingdom
Bright Purple
systems: building ultra-reliable, ultra-scalable environments for inference and deployment.
What you’ll be doing
Designing cloud-native architectures to run large language models on serverless frameworks (e.g. Kubernetes, Knative, or custom-built FaaS).
Developing approaches to minimise cold-start latency through advanced container snapshotting, weight pre-loading, and graph partitioning.
Building distributed inference pipelines with tensor … Ray, Dask, MPI, or custom equivalents).
Experience with GPU cluster management (CUDA, NCCL, Triton Inference Server) and performance tuning across accelerators.
Solid grasp of cloud-native orchestration (Docker, Kubernetes, Helm) and observability tooling (Prometheus, Grafana, Jaeger).
Proven ability to translate cutting-edge research into engineered solutions that can scale globally.
Why this role stands out
Influence how next …
Employment Type: Permanent