robust AI/ML pipelines for model training, validation, and deployment (e.g., using MLflow, Vertex AI, or Azure ML). Expertise in managing model evaluation, drift monitoring, and continuous improvement processes. Strong focus on optimizing inference performance and cost (e.g., model compression, quantization, API optimization). Data Engineering for Generative AI: Experience in preparing and curating
Terraform, Pulumi, CloudFormation) for scalable, repeatable deployments. Automating with PowerShell, Python, or Bash to drive efficiency. Supporting Kubernetes and AKS environments in production. Leading incident response, postmortems, and continuous improvement processes. Driving cost optimisation, capacity planning, and load testing. Championing best practices in cloud security and resilience. Key Skills & Experience Required: Proven Site Reliability Engineering background.