Machine Learning Systems & Infrastructure Engineer
- Hiring Organisation
- Jobleads-UK
- Location
- Greater London, England, United Kingdom
Ship workloads with Docker and Kubernetes; maintain IaC (Terraform) for the surfaces you own and CI/CD pipelines, including self‐hosted GPU runners. Observability and reliability: Monitoring, logging, and alerting for job performance, data‐pipeline health, and cost (e.g., Prometheus/Grafana, OpenTelemetry); define SLOs and incident response … stores; and object storage with caching layers. Familiarity with ML workflow orchestration and experiment tracking (e.g., Kubeflow Pipelines, MLflow). Experience with monitoring and observability tooling (e.g., Prometheus/Grafana, OpenTelemetry) and CI/CD for infra and ML workflows (e.g., GitHub Actions). At SpAItial, we are committed ...