Wokingham, Berkshire, United Kingdom Hybrid / WFH Options
Experis
teams to automate deployment, monitoring, and infrastructure management. Ensure platform and business application reliability and performance against strict SLAs and KPIs. Implement and maintain cloud-native observability stacks (Prometheus, Grafana, Loki, Tempo). Develop and maintain Infrastructure as Code (IaC) using tools like Kustomize or Helm. Manage CI/CD pipelines using Tekton and ArgoCD. Support and troubleshoot OpenShift Operators …
• Experience working in Agile environments • Strong understanding of Site Reliability Engineering (SRE) principles • Familiarity with Azure DevOps for CI/CD and pipeline management • Knowledge of observability tools: Prometheus, Grafana, Loki, Tempo • Experience with Infrastructure as Code: Helm, Kustomize • Hands-on experience with Tekton and ArgoCD • Ability to support and troubleshoot OpenShift Operators (ServiceMesh, ODF, ACS, ACM, AMQ) • Understanding of …
Wokingham, Berkshire, England, United Kingdom Hybrid / WFH Options
Opus Recruitment Solutions Ltd
architecture are key. What You’ll Be Working With: • MySQL, Vitess, and Linux in production (don't worry if you haven't worked with Vitess) • Monitoring tools like Prometheus and Grafana • Shard allocation, replication tuning, disk performance • Backup, restore, and DR testing • Data migrations and custom table loads for NHS tenants • Zero-downtime patching and performance baselining What You’ll Bring …
API endpoints and overseeing model deployment workflows to ensure seamless integration and scalability. Key Responsibilities: Platform Operations & Monitoring • Monitor ML model endpoints and overall platform health using tools like Grafana and Domino Data Lab. • Respond to incidents and alerts, perform code fixes, manage incidents internally, and manage changes through ServiceNow. • Interface directly with Domino Data Lab support to resolve model … monitoring. • Working knowledge of core data science concepts, such as model evaluation metrics, overfitting, data drift, and feature importance. • Proficiency in AWS services (such as S3 and Redshift). • Experience with Grafana for monitoring and alerting. • Good to have: hands-on experience with the Domino Data Lab platform. • Solid understanding of CI/CD pipelines, version control, containerization, and orchestration. • Ability to communicate …