and service incidents with root cause analysis and preventive measures. Handle change requests, track recurring issues, and work on long-term fixes to improve system stability. Implement and maintain observability solutions using Prometheus, Grafana, and Splunk. Write PromQL queries for custom monitoring dashboards, alerting, and diagnostics. Manage and optimize CI/CD pipelines for automated testing, deployment, and rollback strategies. … AWS services at the DevOps Engineer level. Incident, change & problem management experience. This role is heavily operation-oriented, including on-call requirements. Strong background in setup & operation of enterprise observability tooling, specifically Prometheus, Grafana and Splunk, including usage of PromQL. Proficient in one or more of Python, Go, Bash, SQL. Familiar with GitHub/GitOps/container orchestration/
London, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
cross-functional teams. Embed environment automation via tools like Ansible and integrate seamlessly with CI/CD pipelines (e.g., Jenkins, GitLab). Monitor environment health and performance using modern observability tools, driving continuous improvement initiatives. What they are looking for: At least 5 years in test environment management, including hands-on work with OpenShift/Linux platforms. Strong exposure to … . Experience working within complex transformation programmes, ideally involving legacy-to-modern transitions. Proficient in Linux shell scripting and systems management. Sound knowledge of CI/CD frameworks and observability stacks (e.g., Prometheus, Grafana). Confident communicator, able to collaborate effectively with architects, developers, QA, and operations. The role is paying a base salary of between
that affect millions of users. Design and implement Infrastructure as Code solutions that set industry standards. Build resilient CI/CD pipelines using Bitbucket and Spacelift orchestration. Develop sophisticated observability strategies with Grafana, CloudWatch, and advanced monitoring tools. Leadership & Growth Opportunities: Mentor emerging DevOps talent and shape team culture. Influence architectural decisions across cross-functional teams. Drive strategic initiatives that … TypeScript capabilities (this is code-heavy DevOps). Cloud Platforms: Recent AWS experience with enterprise-scale deployments. CI/CD Mastery: Advanced experience with Jenkins, Bitbucket Pipelines, and orchestration tools. Observability: Hands-on expertise with Grafana, Splunk, CloudWatch for proactive monitoring. Leadership & Delivery: Proven track record architecting scalable, secure infrastructure solutions. Experience implementing advanced security measures across DevOps workflows. Large-scale
London, England, United Kingdom Hybrid / WFH Options
Durlston Partners
automation and internal tools for deployment, monitoring, and incident response. Tune performance across OS, network, and cloud layers; this role is hands-on and detail-oriented. Improve system resilience, observability, and security in a high-stakes production environment. Requirements: Fluent in Linux: not just using it, but understanding how it works under the hood. Advanced terminal skills: manipulating systems efficiently … time environments. Hands-on with Docker (Kubernetes is a plus), infrastructure-as-code, and CI/CD tooling. Strong scripting and automation experience in Python and Bash. Familiarity with observability stacks (Prometheus, OpenTelemetry, eBPF). Cloud infrastructure experience (AWS/GCP/Azure), with attention to IAM and software supply chain security. Curious, persistent, and comfortable experimenting at the lowest levels
automation, infrastructure provisioning and tooling to enhance development efficiency. You will manage Platform Reliability and Infrastructure ensuring a reliable and stable platform. You will oversee YouLend's Security and Observability frameworks, focusing on platform security, maintaining observability, and providing dashboards for developers to monitor service health. The ideal candidate is someone who has successfully built and scaled platform architectures, led … the ability to work across technical and non-technical teams. Excellent communication skills, with the ability to translate complex technical concepts to business stakeholders. Operational Focus: Expertise in platform observability, monitoring, incident management, and creating highly reliable systems. Experience implementing SLAs, SLOs, and SLIs is a plus. Security & Compliance: In-depth understanding of platform security, data privacy, and regulatory compliance
Stoxx's GCP platform infrastructure. Ensure the platform's scalability, reliability, and efficiency meet business and client requirements. Develop, build and support a robust CI/CD pipeline and observability stack. Be the go-to person for the most critical Platform issues, leading cross-functional teams where necessary, to deliver best-in-class engineering solutions. Drive continuous improvement initiatives to … Experience working in a global or multinational team setting. Strong documentation, communication and collaboration skills. Proven ability to drive innovation and continuous improvement initiatives. Focus on simplicity, automation and observability. Expertise in Python, GitHub Actions, Apigee, Airflow. Expertise in Observability tooling such as Prometheus/Grafana, ELK, Splunk or similar. Bachelor's or Master's degree in Computer Science or
performance issues. Managing regular patching and upgrade cycles for Infrastructure and Software. Managing security vulnerabilities and performing platform hardening activities. Developing automation to remove manual tasks. Developing and maintaining observability dashboards and alerting. Collaborating with Software Engineers and Users across the business. Required skills and experience: Strong knowledge of at least one Public Cloud provider: Azure, AWS or GCP (Managed … Compute, Networking, RBAC/IAM). Prior experience in Linux system administration in a production environment. Prior experience in provisioning and operating Kubernetes clusters in a production environment. Experience in observability with Grafana with a good understanding of PromQL and LogQL. Good knowledge of using Infrastructure-as-Code solutions such as Terraform. Comfortable with scripting for automation using Bash and Python
London, England, United Kingdom Hybrid / WFH Options
Ikerian
scalable AWS cloud environments and services. Manage and prioritise tasks in the cloud infrastructure backlog to address immediate needs and plan long-term improvements. Set up infrastructure monitoring and observability solutions, proactively addressing availability, performance or security issues. Assess new technologies, systems, and services for production readiness, ensuring seamless and stable integration. Prepare and maintain documentation on cloud processes, procedures … CI/CD pipelines and tools, including GitLab (preferred), GitHub Actions, Jenkins, etc. Basic understanding of cloud networking concepts, including VPC, Subnets, and Load Balancing. Familiarity with monitoring and observability tools for cloud environments, such as Grafana, Prometheus, OpenSearch, and the ELK stack. Strong analytical and problem-solving skills, with a proactive approach to challenges. A genuine interest in staying
London, England, United Kingdom Hybrid / WFH Options
CFP Energy
and enhance CI/CD pipelines, infrastructure/app templates, and automation workflows. Explore and integrate emerging technologies to evolve our platform offerings and support developer needs. Fine-tune observability tools to resolve issues quickly and deliver actionable alerts to the right people. Infrastructure as Code (IaC): Proven experience with cloud infrastructure automation (Terraform and Azure preferred). Kubernetes: Proficiency … GitOps workflows and Helm charts. Security: Hands-on experience with token/secret management tools (e.g., HashiCorp Vault, Azure Key Vault) and SSO/authentication systems (e.g., Okta). Observability: Hands-on experience with platforms like DataDog, Grafana, or Azure Monitor. Networking: Strong understanding of networking principles, DNS, and related technologies. CI/CD: Skilled in creating and maintaining CI
London, England, United Kingdom Hybrid / WFH Options
Elwood Technologies
environment. Automate manual processes and workflows, reducing operational overhead. Work closely with engineering teams to design and deploy scalable, fault-tolerant infrastructure solutions on AWS or GCP. Improve observability by utilizing monitoring, logging, and alerting systems (e.g., CloudWatch, Datadog). Lead post-incident reviews, contribute to the continuous improvement of system reliability and follow up on strategic fixes. Develop … you have experience of some or all of the following: Experience with client-impact triage, working cross-functionally with account managers or product teams. Proficiency with Datadog or similar observability platforms. Knowledge of serverless architectures (e.g., AWS Lambda, GCP Cloud Functions). Familiarity with RDBMS and NoSQL databases, such as RDS, CloudSQL, DynamoDB. Prior experience in fintech, trading platforms, or
understanding of modern architecture methods and patterns. Composable Architecture based on MACH principles (Microservices, API-first, Cloud-native, Headless), Event Driven. Skills to modernise architectural estates and drive serviceability, observability dashboarding and metrics in end products. Experience of Digital Transformation within either Java or Microsoft technologies landscape, Azure platform and .Net ecosystem. Expertise in Mobile and Web development frameworks and … languages like .Net, Java, Python. Database technologies and platforms like SQL, NoSQL, Data Lake, Snowflake, Databricks, MongoDB, Oracle. Frontend web development languages like React, Angular, JavaScript, HTML and CSS. Observability platforms like Splunk, Dynatrace, Datadog, Grafana. Integration technologies like REST, Kafka, iPaaS, API Management, ESB. Awareness of placement of workloads on On-Prem Servers and Cloud (Azure/AWS/
Slough, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
and will help clients adopt modern DevOps practices with a strong emphasis on automation, self-service, and operational excellence. Tech You'll Use: Terraform & GitHub Actions CI/CD, observability tooling (Grafana, Prometheus), containerisation (Docker). What You'll Be Doing: Designing and implementing secure, resilient AWS infrastructure. Building CI/CD pipelines and reusable deployment patterns. Advising on cloud-native
London, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
and will help clients adopt modern DevOps practices with a strong emphasis on automation, self-service, and operational excellence. Tech You'll Use: Terraform & GitHub Actions CI/CD, observability tooling (Grafana, Prometheus), containerisation (Docker). What You'll Be Doing: Designing and implementing secure, resilient AWS infrastructure. Building CI/CD pipelines and reusable deployment patterns. Advising on cloud-native
For: 3+ years’ hands-on experience with Solace PubSub+ in a production environment. Strong knowledge of WAN-based distributed systems and networking fundamentals. Experience with Prometheus and Grafana for observability and alerting. Confident in Linux/Unix systems and scripting (Bash, Python, etc.). Excellent problem-solving instincts and attention to detail. Strong communicator who works well across technical teams. Bonus
governance compliance. Utilize AWS, containerization (e.g., Docker), and Infrastructure as Code tools like Terraform and Ansible for performance and cost optimization. Implement best practices in DevOps and DevSecOps, including observability, security, networking, API integration, and disaster recovery. Mentor junior engineers and contribute to technical leadership, preferably with experience in broadcast workflows, audio/video streaming, and Agile methodologies. Key Requirements
Edinburgh, Scotland, United Kingdom Hybrid / WFH Options
JR United Kingdom
ideally with Terraform or CloudFormation. Hands-on experience with CI/CD pipelines and automation tooling. Background in containerisation and orchestration, e.g., Docker, Kubernetes. Familiarity with monitoring, alerting, and observability tools (e.g., Prometheus, Grafana, CloudWatch). Proven ability to troubleshoot and resolve complex infrastructure issues. Experience working in cross-functional engineering teams, ideally in a DevOps or SRE capacity. Strong
Cambourne, Cambridgeshire, United Kingdom Hybrid / WFH Options
Remotestar
strong track record of building and maintaining highly reliable infrastructure and services. Expertise in incident management, including incident response, resolution, and post-mortem analysis. Proficiency in monitoring, alerting, and observability tools such as Prometheus, Grafana, ELK stack or Datadog. Experience with cloud platforms such as AWS, Azure, or GCP, including infrastructure as code tools like Terraform or CloudFormation. Strong scripting
Flask/FastAPI/Django). Demonstrated expertise in containerizing applications and orchestrating them in Kubernetes environments. Experience working on at least one monitoring/observability stack (Datadog, ELK, Splunk, Loki, Grafana). Strong knowledge of Unix or Linux. Strong communication skills to collaborate with various stakeholders. Able to work independently in a fast-paced environment
London, England, United Kingdom Hybrid / WFH Options
Arcus Search
and will help clients adopt modern DevOps practices with a strong emphasis on automation, self-service, and operational excellence. Tech You'll Use: Terraform & GitHub Actions CI/CD, observability tooling (Grafana, Prometheus), containerisation (Docker). What You'll Be Doing: Designing and implementing secure, resilient AWS infrastructure. Building CI/CD pipelines and reusable deployment patterns. Advising on cloud-native
London, England, United Kingdom Hybrid / WFH Options
Anson McCade Pty
to automate provisioning. • Deploy and manage Kubernetes solutions, including AKS, EKS, and OpenShift. • Implement DevSecOps practices, integrating CI/CD pipelines and security controls. • Optimize cloud environments using FinOps, observability tooling, and SRE methodologies. • Work closely with Cloud Architects, Engineers, and Business Leaders to build scalable, high-performance platforms. • Enhance networking and security capabilities across hybrid cloud environments. The ideal