Weights & Biases, MLflow, LangSmith, LangFuse, PromptLayer, Humanloop, Helicone, Arize Phoenix. Benchmark and evaluate LLM systems using Ragas, DeepEval and structured evaluation suites. Deployment & Infrastructure: Containerize and deploy workloads with Docker, Kubernetes, Knative and managed inference endpoints. Optimize model performance with quantization, distillation, caching, batching and routing strategies. You'll Bring: Strong Python skills, with experience using Transformers, LangChain, LlamaIndex and the broader GenAI ecosystem.
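The caching and batching strategies mentioned above can be sketched in miniature. The snippet below is a toy illustration only (all names are hypothetical, and `run_batch` stands in for a real batched model call): it fronts an inference function with an LRU response cache and groups cache misses into micro-batches.

```python
from collections import OrderedDict


class CachedMicroBatcher:
    """Toy sketch: an LRU response cache combined with micro-batching.

    `run_batch` stands in for a real batched model call; names here are
    illustrative, not a production API.
    """

    def __init__(self, run_batch, max_batch=8, cache_size=128):
        self.run_batch = run_batch
        self.max_batch = max_batch
        self.cache_size = cache_size
        self.cache = OrderedDict()  # prompt -> cached output, LRU order

    def generate(self, prompts):
        results = {}
        misses = []
        for p in prompts:
            if p in self.cache:
                self.cache.move_to_end(p)  # refresh LRU position
                results[p] = self.cache[p]
            else:
                misses.append(p)
        # Run uncached prompts through the model in micro-batches.
        for i in range(0, len(misses), self.max_batch):
            batch = misses[i:i + self.max_batch]
            for p, out in zip(batch, self.run_batch(batch)):
                results[p] = out
                self.cache[p] = out
                if len(self.cache) > self.cache_size:
                    self.cache.popitem(last=False)  # evict oldest entry
        return [results[p] for p in prompts]
```

In a real serving stack the same idea usually lives behind an async queue so concurrent requests share a batch; the sketch keeps it synchronous for clarity.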
Employment Type: Contract
Rate: market rates, outside IR35, remote first, UK but 1-2 days on site
building ultra-reliable, ultra-scalable environments for inference and deployment. What you'll be doing: Designing cloud-native architectures to run large language models on serverless frameworks (e.g. Kubernetes, Knative, or custom-built FaaS). Developing approaches to minimise cold-start latency through advanced container snapshotting, weight pre-loading, and graph partitioning. Building distributed inference pipelines with tensor parallelism.
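The "weight pre-loading" idea above can be shown in miniature: pay the checkpoint-load cost once at process start (e.g. in the container entrypoint), so the first request never triggers it. Everything here is a hypothetical stand-in for an expensive weight load, not a real framework API.

```python
import time

def _load_weights():
    """Stand-in for reading a large checkpoint from disk or a registry."""
    time.sleep(0.05)  # simulate slow I/O
    return {"layer0": [0.1, 0.2]}

# Eagerly load at import time (process/container start), so request
# handling never pays the load cost: weight pre-loading in miniature.
_MODEL = _load_weights()

def handle_request(x):
    # The per-request path only touches weights already resident in memory.
    return sum(_MODEL["layer0"]) * x
```

The lazy alternative (load on first request) keeps startup fast but pushes the latency spike onto the first caller, which is exactly the cold-start cost that snapshotting and pre-loading try to eliminate.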