Docker) and orchestration (Kubernetes) Proficiency in CI/CD tools (Jenkins, GitLab CI, GitHub Actions, or similar) Strong scripting skills in Python, Bash, or Go Experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, or similar) Developer Experience Focus Experience building internal tools and platforms for development teams Understanding of software development lifecycle and common developer pain points Familiarity More ❯
Authorization, APIs and API Gateway, Docker, Kubernetes (EKS), Kafka (MSK), Java, Spring Framework, Python, and AWS services. Strong plus if you are a database wiz. Expertise in monitoring and observability tools like Prometheus, Grafana, Honeycomb, Datadog, OpenTelemetry, New Relic, or similar tools to measure system health and performance. Programming and scripting experience in languages such as Python, Go, Bash More ❯
City of Westminster, London, United Kingdom Hybrid/Remote Options
Additional Resources
high-volume processing. Deploying and managing containerised workloads through Kubernetes, Helm, and Docker. Automating infrastructure using Infrastructure-as-Code tools such as Terraform and Ansible. Ensuring system reliability through observability, monitoring, and proactive issue resolution. Collaborating with cross-functional teams to align data solutions with wider business needs. Supporting the continuous improvement of processes, deployment, and data quality standards. What More ❯
London, South East, England, United Kingdom Hybrid/Remote Options
Additional Resources Ltd
high-volume processing. Deploying and managing containerised workloads through Kubernetes, Helm, and Docker. Automating infrastructure using Infrastructure-as-Code tools such as Terraform and Ansible. Ensuring system reliability through observability, monitoring, and proactive issue resolution. Collaborating with cross-functional teams to align data solutions with wider business needs. Supporting the continuous improvement of processes, deployment, and data quality standards. What More ❯
Philadelphia, Pennsylvania, United States Hybrid/Remote Options
Morgan Lewis
services on Azure Cloud. Ensure scalability, reliability, and performance of cloud environments and deployed applications. Monitor, troubleshoot, and optimize infrastructure and containerized services using Docker. Manage logging, alerting, and observability systems for deployed applications and APIs. Work closely with engineering teams to automate testing, release management, and environment provisioning. Ensure security and compliance best practices are followed in cloud and More ❯
and problem-solving skills. Knowledge of security practices (IAM, encryption, secrets management). Experience with incident management frameworks and SRE principles. Knowledge of performance tuning and capacity planning. Exposure to observability tools and log aggregation systems. Understanding of networking and security fundamentals. Design, implement, and maintain monitoring, logging, and alerting systems. Define and track Service Level Indicators (SLIs), Objectives (SLOs), and More ❯
Edinburgh, Midlothian, United Kingdom Hybrid/Remote Options
Aberdeen
internal workshops, brown bags, or tech talks to share knowledge and promote adoption of tools and practices. About the Candidate The ideal candidate will possess the following: Experience with observability tools (e.g., Grafana, Prometheus, Datadog). Background in DevOps, SRE, or platform engineering with a security-first mindset. Strong programming skills in languages such as .NET, JavaScript, Python or similar. More ❯
and/or Motion Planning to inform modeling & simulation (M&S) and physical systems Developing and testing multi-agent autonomous systems and deploying in real-world environments Familiarity with observability concepts and tools. Knowledge of security best practices for DevOps and MLOps. Note: If you are interested, please share your updated resume and suggest the best number & time to connect More ❯
and managing containerized workloads. Proficient with Terraform or similar Infrastructure as Code tools. Experience with CI/CD pipelines, using GitHub Actions, GitLab CI, ArgoCD, or Flux. Familiar with observability tools like OpenTelemetry, Prometheus, Grafana, or Datadog. Experience with at least one major cloud provider (AWS, Azure, or GCP); multi-cloud exposure is a plus. Basic understanding of networking and More ❯
driving a major transformation within Capital One. The Cloud Operations Resilience Engineering (CORE) Technology division is responsible for enabling and evolving Capital One's foundational cloud infrastructure layer, including observability, connectivity, resilience and availability. What You'll Do: Lead a team managing a portfolio of diverse technology projects and developers specializing in automation, distributed microservices, and back-end systems to More ❯
Leeds, West Yorkshire, Yorkshire, United Kingdom Hybrid/Remote Options
WRK DIGITAL LTD
Lambda, CloudFront, RDS, etc.) and Azure (you don't need to be an expert, but being interested helps!) Promote strong engineering practices around code quality, automated testing, peer reviews, observability, and security, helping to instil a culture of quality and accountability in engineering Collaborate closely with designers, product managers, and QA to ensure solutions are user-focused, technically sound, and More ❯
Excellent communication and leadership skills with experience mentoring engineers. Preferred Skills Experience with MuleSoft or other enterprise integration tools. Familiarity with container orchestration and cloud-native practices. Experience with observability tools such as Splunk, Prometheus, Grafana, or ELK Stack. Knowledge of domain-driven design, event sourcing, and reactive programming. Exposure to Agile environment and SAFe methodologies. Any relevant certifications in More ❯
Design, implement, and optimize CI/CD pipelines to accelerate delivery. Build and maintain containerization and orchestration environments (Docker, Kubernetes). Contribute to infrastructure automation, monitoring, and observability using tools such as Datadog, Grafana, and Prometheus. Provide support for Active Directory domains, group policies, and access controls. Configure and troubleshoot virtual networking components, firewalls, and VPNs in cloud and More ❯
Manchester, North West, United Kingdom Hybrid/Remote Options
Anson Mccade
GDS Service Standards, OAuth 2.0/OIDC, Zero Trust principles and government accreditation requirements. Oversee software quality, engineering standards, testing strategies, CI/CD pipelines, IaC (Terraform/Ansible), observability and resilience. Work alongside product, delivery, user research, DevOps and data teams to align user needs, policy requirements and technical feasibility. Mentor engineering and architecture teams, fostering best-practice More ❯
DevOps & SRE Practices Experience implementing CI/CD pipelines and DevOps methodologies Knowledge of infrastructure monitoring (Datadog), log aggregation, and incident management Understanding of SLO/SLA definition and observability best practices Strategic & Business Acumen Ability to align technical initiatives with business objectives and articulate ROI Experience creating technical roadmaps and conducting cost-benefit analyses Track record presenting to C More ❯
best practices across the platform (IAM, secrets management, encryption) • Support compliance initiatives (ISO 27001, NIST, GDPR, MCERTS, etc.) • Manage network configuration, firewalls, and secure endpoints Monitoring & Reliability • Set up observability and monitoring tools (Prometheus, Grafana, Datadog, or CloudWatch) • Ensure high availability, scalability, and cost efficiency of cloud services • Define SLIs, SLOs, and SLAs for platform components • Troubleshoot production issues and More ❯
Seattle, Washington, United States Hybrid/Remote Options
Axon
or similar. Experience of code collaboration such as GitHub, ArgoCD, or similar. Experience utilizing CI/CD platforms to automate provisioning infrastructure, software builds, tests, and releases. Experience using observability tools such as APM, logging, and metrics to assist with debugging issues. Experience using Infrastructure as Code tools for provisioning infrastructure such as Terraform, Pulumi, or similar. Experience designing tooling More ❯
templates, or Bicep Experience with streaming and messaging (Event Hubs, Service Bus) and orchestration via Data Factory Working knowledge of containerization and orchestration fundamentals (Docker, AKS, ACR) Familiarity with observability and SRE practices using Azure Monitor, Log Analytics, Application Insights; ability to define SLAs/SLOs/SLIs and runbook procedures Strong software engineering skills in Python and/or More ❯
using tools such as Terraform or CloudFormation. * Implement CI/CD pipelines, enabling continuous integration and continuous deployment for mission-critical applications. * Monitor system performance, availability, and security, implementing observability best practices. * Work in an Agile environment, engaging with stakeholders to understand requirements and deliver iterative improvements. Your skills and experience Essential: * Experience deploying and managing cloud infrastructure on AWS More ❯
as Terraform or CloudFormation. Implement and manage CI/CD pipelines, enabling continuous integration and deployment of mission-critical applications. Monitor and optimise system performance, availability, and security, applying observability best practices. Collaborate in an Agile environment, engaging with stakeholders to gather requirements and deliver iterative improvements. This role allows you to apply your expertise to challenging problems while shaping More ❯
Bristol, Avon, South West, United Kingdom Hybrid/Remote Options
Indotronix Avani UK Ltd
and CI/CD (e.g. GitHub, Azure DevOps). Awareness of data governance, lineage, and cataloguing tools. Curiosity to explore infrastructure as code, containerisation (Docker/Kubernetes), or data observability tools More ❯
high-quality, well-tested code using modern testing frameworks and patterns. Occasionally contributing to simple React/Next.js based UI screens for configuration or internal tools. Improving system reliability, observability, and developer efficiency through metrics, logging, and automation. Working closely with cross-functional teams in an agile environment, contributing to planning, delivery, and code reviews. Supporting continuous improvement of codebases More ❯
e.g., Grype, Syft) SAST/SCA tools (e.g., Fortify, SonarQube, Snyk, Trivy, ZAP) AWS (EKS, EC2, Lambda) Application networking with tools such as Istio, NGINX, or Traefik Monitoring and observability tools (e.g., Prometheus, Grafana) Authentication tools (e.g., Keycloak) Artifact repositories (e.g., JFrog Artifactory, Nexus) Additional programming experience with strongly typed languages such as C++ or Rust Familiarity with secure software More ❯
CircleCI) 5+ years of hands-on experience with Python or Golang A solid background in configuration management and infrastructure-as-code (Terraform) Solid experience in monitoring/observability systems (Grafana, Prometheus, etc.) Demonstrated expertise with container orchestration (Kubernetes/GKE) Experience managing Kubernetes platforms and resources, and using Kubernetes deployment tools and patterns (Helm, GitOps, Knative) You might More ❯