plus. Linux: Solid understanding of Linux systems administration and scripting. Testing: Commitment to automated testing (unit, integration, end-to-end) and quality assurance throughout the software delivery lifecycle. Monitoring & Observability: Familiarity with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK stack) is advantageous. Analytical & Problem-Solving: Excellent analytical and problem-solving skills, with a focus on delivering measurable business…
City of Westminster, London, United Kingdom Hybrid/Remote Options
Additional Resources
high-volume processing. Deploying and managing containerised workloads through Kubernetes, Helm, and Docker. Automating infrastructure using Infrastructure-as-Code tools such as Terraform and Ansible. Ensuring system reliability through observability, monitoring, and proactive issue resolution. Collaborating with cross-functional teams to align data solutions with wider business needs. Supporting the continuous improvement of processes, deployment, and data quality standards. What…
Westminster, City of Westminster, Greater London, United Kingdom Hybrid/Remote Options
Additional Resources
high-volume processing. Deploying and managing containerised workloads through Kubernetes, Helm, and Docker. Automating infrastructure using Infrastructure-as-Code tools such as Terraform and Ansible. Ensuring system reliability through observability, monitoring, and proactive issue resolution. Collaborating with cross-functional teams to align data solutions with wider business needs. Supporting the continuous improvement of processes, deployment, and data quality standards. What…
London, South East, England, United Kingdom Hybrid/Remote Options
Additional Resources Ltd
high-volume processing. Deploying and managing containerised workloads through Kubernetes, Helm, and Docker. Automating infrastructure using Infrastructure-as-Code tools such as Terraform and Ansible. Ensuring system reliability through observability, monitoring, and proactive issue resolution. Collaborating with cross-functional teams to align data solutions with wider business needs. Supporting the continuous improvement of processes, deployment, and data quality standards. What…
Philadelphia, Pennsylvania, United States Hybrid/Remote Options
Morgan Lewis
services on Azure Cloud. Ensure scalability, reliability, and performance of cloud environments and deployed applications. Monitor, troubleshoot, and optimize infrastructure and containerized services using Docker. Manage logging, alerting, and observability systems for deployed applications and APIs. Work closely with engineering teams to automate testing, release management, and environment provisioning. Ensure security and compliance best practices are followed in cloud and…
and problem-solving skills. Knowledge of security practices (IAM, encryption, secrets management). Experience with incident management frameworks and SRE principles. Knowledge of performance tuning and capacity planning. Exposure to observability tools and log aggregation systems. Understanding of networking and security fundamentals. Design, implement, and maintain monitoring, logging, and alerting systems. Define and track Service Level Indicators (SLIs), Objectives (SLOs), and…
Edinburgh, Midlothian, United Kingdom Hybrid/Remote Options
Aberdeen
internal workshops, brown bags, or tech talks to share knowledge and promote adoption of tools and practices. About the Candidate. The ideal candidate will possess the following: Experience with observability tools (e.g., Grafana, Prometheus, Datadog). Background in DevOps, SRE, or platform engineering with a security-first mindset. Strong programming skills in languages such as .NET, JavaScript, Python or similar.
and/or Motion Planning to inform modeling & simulation (M&S) and physical systems. Developing and testing multi-agent autonomous systems and deploying them in real-world environments. Familiarity with observability concepts and tools. Knowledge of security best practices for DevOps and MLOps. Note: If you are interested, please share your updated resume and suggest the best number & time to connect…
Leeds, West Yorkshire, Yorkshire, United Kingdom Hybrid/Remote Options
WRK DIGITAL LTD
Lambda, CloudFront, RDS, etc.) and Azure (you don't need to be an expert, but being interested helps!). Promote strong engineering practices around code quality, automated testing, peer reviews, observability, and security, helping to instil a culture of quality and accountability in engineering. Collaborate closely with designers, product managers, and QA to ensure solutions are user-focused, technically sound, and…
London, South East England, United Kingdom Hybrid/Remote Options
Mercor
fault-tolerant microservices. Build and maintain CI/CD pipelines, deployment workflows, and infrastructure-as-code. Manage Kubernetes clusters, cloud infrastructure (AWS/GCP), and container orchestration. Implement monitoring, observability, and security best practices. Collaborate with backend and AI teams to optimize system performance and reliability. Continuously improve automation, deployment speed, and operational efficiency. Requirements: 3+ years of experience in…
Excellent communication and leadership skills with experience mentoring engineers. Preferred Skills: Experience with MuleSoft or other enterprise integration tools. Familiarity with container orchestration and cloud-native practices. Experience with observability tools such as Splunk, Prometheus, Grafana, or ELK Stack. Knowledge of domain-driven design, event sourcing, and reactive programming. Exposure to Agile environments and SAFe methodologies. Any relevant certifications in…
Manchester, North West, United Kingdom Hybrid/Remote Options
Anson McCade
GDS Service Standards, OAuth 2.0/OIDC, Zero Trust principles and government accreditation requirements. Oversee software quality, engineering standards, testing strategies, CI/CD pipelines, IaC (Terraform/Ansible), observability and resilience. Work alongside product, delivery, user research, DevOps and data teams to align user needs, policy requirements and technical feasibility. Mentor engineering and architecture teams, fostering best-practice…
DevOps & SRE Practices: Experience implementing CI/CD pipelines and DevOps methodologies. Knowledge of infrastructure monitoring (Datadog), log aggregation, and incident management. Understanding of SLO/SLA definition and observability best practices. Strategic & Business Acumen: Ability to align technical initiatives with business objectives and articulate ROI. Experience creating technical roadmaps and conducting cost-benefit analyses. Track record presenting to C…
best practices across the platform (IAM, secrets management, encryption) • Support compliance initiatives (ISO 27001, NIST, GDPR, MCERTS, etc.) • Manage network configuration, firewalls, and secure endpoints. Monitoring & Reliability: • Set up observability and monitoring tools (Prometheus, Grafana, Datadog, or CloudWatch) • Ensure high availability, scalability, and cost efficiency of cloud services • Define SLIs, SLOs, and SLAs for platform components • Troubleshoot production issues and…
driving a major transformation within Capital One. The Cloud Operations Resilience Engineering (CORE) Technology division is responsible for enabling and evolving Capital One's foundational cloud infrastructure layer, including observability, connectivity, resilience and availability. What You'll Do: Lead a portfolio of diverse technology projects and a team of developers with deep experience in machine learning, distributed microservices, and full…
London, South East England, United Kingdom Hybrid/Remote Options
Mott MacDonald
production-grade products, and with product managers to shape roadmaps based on technical feasibility and user value. DevOps & CI/CD: Support cloud-native deployment pipelines, automated testing, and observability for everything we build. Champion software engineering excellence: Drive continuous improvement across software engineering culture, codebases, and development practices. What You'll Bring: Clear communicator, with the ability to engage…
templates, or Bicep. Experience with streaming and messaging (Event Hubs, Service Bus) and orchestration via Data Factory. Working knowledge of containerization and orchestration fundamentals (Docker, AKS, ACR). Familiarity with observability and SRE practices using Azure Monitor, Log Analytics, and Application Insights; ability to define SLAs/SLOs/SLIs and runbook procedures. Strong software engineering skills in Python and/or…
using tools such as Terraform or CloudFormation. * Implement CI/CD pipelines, enabling continuous integration and continuous deployment for mission-critical applications. * Monitor system performance, availability, and security, implementing observability best practices. * Work in an Agile environment, engaging with stakeholders to understand requirements and deliver iterative improvements. Your skills and experience. Essential: * Experience deploying and managing cloud infrastructure on AWS…