Cloud Infrastructure: AWS (EKS, RDS, Aurora, ElastiCache, Kafka, IAM). Secure Hosting: Experience working with air-gapped or government-secure environments. Container & Cluster Management: Docker, Kubernetes, Rancher, Jenkins, Helm. Monitoring & Observability: Prometheus, Grafana, ELK Stack, Dynatrace. Secrets & Identity Management: HashiCorp Vault, Keycloak. CI/CD & DevOps Tooling: Jenkins, Git, ServiceNow, Trivy, Terraform. Streaming & Messaging: Apache Kafka (including Kafka Replication). Data Layers … tooling and self-service developer pipelines for tenant teams. Proactively manage and resolve tech debt by working with central governance bodies, and ensure visibility to the board. Increase automation, observability, and testing coverage across the platform components while enabling data-driven decision-making. Align delivery with the product roadmap, collaborating with internal/external platform and infrastructure teams to support
London (City of London), South East England, United Kingdom
Scrumconnect Consulting
Cloud Infrastructure: AWS (EKS, RDS, Aurora, ElastiCache, Kafka, IAM). Secure Hosting: Experience working with air-gapped or government-secure environments. Container & Cluster Management: Docker, Kubernetes, Rancher, Jenkins, Helm. Monitoring & Observability: Prometheus, Grafana, ELK Stack, Dynatrace. Secrets & Identity Management: HashiCorp Vault, Keycloak. CI/CD & DevOps Tooling: Jenkins, Git, ServiceNow, Trivy, Terraform. Streaming & Messaging: Apache Kafka (including Kafka Replication). Data Layers … tooling and self-service developer pipelines for tenant teams. Proactively manage and resolve tech debt by working with central governance bodies, and ensure visibility to the board. Increase automation, observability, and testing coverage across the platform components while enabling data-driven decision-making. Align delivery with the product roadmap, collaborating with internal/external platform and infrastructure teams to support
global transportation agencies. As a senior engineer, you will play a critical role in designing, building, and scaling cloud services that enable remote device management, over-the-air updates, observability, and high-availability operations for our mobile perception platform. We tackle complex challenges related to scalability, performance, and security to enable smarter and safer cities through cutting-edge innovation. As … future of intelligent transportation systems. Responsibilities: Participate in incident prevention, response, and remediation efforts, learning and applying best practices. Design, build, and maintain scalable cloud services that support device observability, OTA updates, and fleet operations. Lead efforts to improve the reliability, security, and performance of multi-region AWS infrastructure using Infrastructure as Code (IaC) tools. Own CI/CD pipelines
Nottingham, Nottinghamshire, United Kingdom Hybrid / WFH Options
Capital One (Europe) plc
engineering solutions to make them more efficient, stable, and scalable. You'll lead on planning and implementing key SRE initiatives, optimise and automate how our systems operate, and improve observability through better monitoring and logging. You'll also work closely with your peers to drive consistency and high standards across SRE and the wider engineering community, so a real enthusiasm … vision set out by your Site Reliability Engineering Manager (SREM). Contribute to the major optimisation and improvement themes within the team. Identify opportunities to reduce operational overheads through observability and service automation. Drive engineering best practice (e.g., Operational Excellence, Security, Quality, Resilience etc.) and set standards across the team and wider SRE community. Innovate within your team and contribute
new AI/ML methods. Deployment and serving of models at scale. Infrastructure automation and cloud-native design. Responsible AI, LLM safety, and interpretability tooling. Data pipelines, versioning, and observability in production. A glimpse of roles we recruit for: AI Research Scientist, Machine Learning Engineer, Data Engineer with ML experience, Applied Scientist/Research Engineer, DevOps for AI/AI
building, and maintaining secure, high-performance network platforms. Work closely with internal teams and external partners to deliver integrated, end-to-end solutions. Support and enhance existing monitoring and observability frameworks using tools like SNMP and syslog. Deliver against technical roadmaps, ensuring platforms remain aligned with product support lifecycles. Stay up to date with new technologies, expanding your knowledge to
Hemel Hempstead, Hertfordshire, Felden, United Kingdom
Meritus
building, and maintaining secure, high-performance network platforms. Work closely with internal teams and external partners to deliver integrated, end-to-end solutions. Support and enhance existing monitoring and observability frameworks using tools like SNMP and syslog. Deliver against technical roadmaps, ensuring platforms remain aligned with product support lifecycles. Stay up to date with new technologies, expanding your knowledge to
Hemel Hempstead, Hertfordshire, South East, United Kingdom
Yolk Recruitment
building, and maintaining secure, high-performance network platforms. Work closely with internal teams and external partners to deliver integrated, end-to-end solutions. Support and enhance existing monitoring and observability frameworks using tools like SNMP and syslog. Deliver against technical roadmaps, ensuring platforms remain aligned with product support lifecycles. Stay up to date with new technologies, expanding your knowledge to
in biotech, pharma, or AI-driven drug discovery. Experience in both large organisations (with structured processes and metrics) and smaller/startup environments (delivering with limited resources). Knowledge of observability and reliability practices for product platforms. Security or compliance experience. Why Join? Be part of a world-class AI-first research environment shaping the future of drug discovery. Work on
London (City of London), South East England, United Kingdom
Hlx Life Sciences
in biotech, pharma, or AI-driven drug discovery. Experience in both large organisations (with structured processes and metrics) and smaller/startup environments (delivering with limited resources). Knowledge of observability and reliability practices for product platforms. Security or compliance experience. Why Join? Be part of a world-class AI-first research environment shaping the future of drug discovery. Work on
London, South East, England, United Kingdom Hybrid / WFH Options
Addition
complex sales cycles and build C-level relationships. Confident presenter with a consultative sales approach. Experience working with nearshore/offshore delivery models is a plus. Knowledge of AIOps, observability, or platform engineering is advantageous. What’s in It for You: Be part of a global team of 6,000+ technologists, with autonomy to shape a key growth sector. Hybrid
AI solutions for high-stakes processes. Our platform enables teams to create AI co-workers that automate complex workflows while keeping humans central to decision-making. With robust governance, observability, and scalability tailored for regulated sectors like healthcare, finance, and aviation, Noxus drives confident, effective AI adoption. About the Role: We are seeking a Junior Product Designer who will play
London (City of London), South East England, United Kingdom
Noxus
AI solutions for high-stakes processes. Our platform enables teams to create AI co-workers that automate complex workflows while keeping humans central to decision-making. With robust governance, observability, and scalability tailored for regulated sectors like healthcare, finance, and aviation, Noxus drives confident, effective AI adoption. About the Role: We are seeking a Junior Product Designer who will play
Slough, South East England, United Kingdom Hybrid / WFH Options
Hays
SQL for validation and analysis. Experience working with offshore engineering partners. Proficient with tools such as JIRA, Confluence, Figma and an API documentation platform. Nice to have: Exposure to observability and support workflows, for example Grafana or similar. Experience in UX or service design research and translating insights into requirements. Familiarity with cloud platforms and CI/CD concepts. Background
London, South East England, United Kingdom Hybrid / WFH Options
Hays
SQL for validation and analysis. Experience working with offshore engineering partners. Proficient with tools such as JIRA, Confluence, Figma and an API documentation platform. Nice to have: Exposure to observability and support workflows, for example Grafana or similar. Experience in UX or service design research and translating insights into requirements. Familiarity with cloud platforms and CI/CD concepts. Background
London, Lime Street, United Kingdom Hybrid / WFH Options
Hays Technology
SQL for validation and analysis. Experience working with offshore engineering partners. Proficient with tools such as JIRA, Confluence, Figma and an API documentation platform. Nice to have: Exposure to observability and support workflows, for example Grafana or similar. Experience in UX or service design research and translating insights into requirements. Familiarity with cloud platforms and CI/CD concepts. Background
help safeguard our enterprise systems and support secure digital transformation. Dynatrace exists to make the world's software work perfectly. Our unified software intelligence platform combines broad and deep observability and continuous runtime application security with the most advanced AIOps to provide answers and intelligent automation from data at an enormous scale. This enables innovators to modernize and automate cloud
Birmingham, England, United Kingdom Hybrid / WFH Options
eTeam
Birmingham/Sheffield/Hybrid. End Date: 28/11/2025. Role Overview: We are seeking an experienced OpenTelemetry Developer to lead the design, development, and deployment of observability solutions in on-premises environments. The ideal candidate will have strong expertise in Go programming, OpenTelemetry instrumentation, and CI/CD automation tailored for enterprise infrastructure. Key Responsibilities: · Develop and … diverse infrastructure setups. · Design and implement CI/CD pipelines for automated rollout and updates of OTel agents and collectors. · Collaborate with infrastructure, DevOps, and application teams to integrate observability into legacy and modern systems. · Optimize telemetry data collection, processing, and storage for performance and reliability. · Troubleshoot and resolve issues related to observability pipelines and instrumentation. · Contribute to internal documentation … Required Skills & Experience: · Strong proficiency in Go (Golang), especially for writing modular and reusable code. · Hands-on experience with OpenTelemetry Collector, agents, and SDKs. · Proven experience deploying and managing observability tools in on-premises infrastructure. · Familiarity with CI/CD tools (e.g., Jenkins, GitLab CI, GitHub Actions). · Experience with Linux systems, networking, and containerization (Docker). · Understanding of monitoring
London, South East, England, United Kingdom Hybrid / WFH Options
Harnham - Data & Analytics Recruitment
SaaS hosting. Implement GitOps deployment workflows using ArgoCD. Create and manage infrastructure as code with Terraform. Set up CI/CD pipelines for infrastructure and application deployment. Implement monitoring, observability, and cloud cost optimisation (FinOps). Collaborate with ML engineers to fine-tune infrastructure for large-scale model training. What You'll Bring: 5+ years in cloud infrastructure/DevOps roles … GitOps tools, and CI/CD (GitHub Actions preferred). Proficiency in Python and scripting for automation. Solid understanding of cloud networking, security, and cross-cloud connectivity. Experience in monitoring, observability, and cost optimisation. Nice to Have: Experience with ML tooling (MLflow, Kubeflow). Knowledge of FastAPI, Databricks, or Snowflake. Exposure to SRE practices or cloud security certifications. Familiarity with Prometheus, Grafana
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Suits Me Limited
across multiple squads to ensure our platform is scalable, secure, and designed for rapid deployment and operational excellence. You'll contribute to the development and automation of cloud infrastructure, observability systems, CI/CD pipelines, and event-based services that power key parts of our product ecosystem. About Suits Me: Suits Me is a multi-award-winning, ethical fintech dedicated … pipelines (e.g. GitHub Actions) to enable rapid and reliable delivery of services. Contributing to the design of scalable and secure platform components that enable developer productivity. Building and improving observability tooling (e.g. CloudWatch, Grafana) to support rapid detection and resolution of issues. Collaborating with developers and stakeholders across squads to understand infrastructure needs and ensure best practices are applied. Writing
with UK retailers and marketplaces. In this role, you'll ensure our systems are reliable, scalable, and secure. You'll help automate deployments, evolve our cloud infrastructure, and improve observability and developer experience, making it easier for product teams to deliver quality software quickly and safely. Why Zopa Manchester? We're building a new tech hub right in the heart … platform and developer experience teams. Ensuring our container platforms (including Kubernetes) are reliable, secure, and up to date. Designing scalable, self-service tools to reduce operational toil. Supporting infrastructure observability through metrics, tracing, and alerting. Working closely with product teams to foster a culture of reliability engineering. About you: 4+ years in a Platform/Site Reliability Engineering or similar
Leeds, Yorkshire, United Kingdom Hybrid / WFH Options
William Hill PLC
AWS, Linux, Kubernetes (EKS), Terraform, Istio, ArgoCD and Crossplane to continually evolve and meet the demands of our fast-paced industry. What you will be doing: Championing reliability: Implement observability and security solutions, with robust testing and disaster recovery strategies. Accelerating productivity: Automate deployments and maintain state-of-the-art CI/CD pipelines to deliver efficiently at scale. Powering … in AWS, using Terraform or similar Infrastructure as Code tools for streamlined management. Containerization: Skilled in Kubernetes administration and orchestration. Developer Experience: Experienced in developing SDLC pipelines with GitOps. Observability: Familiar with Prometheus, New Relic, Splunk, or similar monitoring tools. Security First: Demonstrates an understanding of security best practices in every workflow. With an Agile Mindset, you'll be an