with GitOps tools (e.g., ArgoCD, Flux). CI/CD - Skilled in building and managing pipelines using Azure DevOps, GitHub Actions, etc. Monitoring - Experience with Prometheus, Grafana, and other observability tools. Application Stack - Familiarity with .NET, Node.js, React, and web server technologies like Nginx. Relevant certifications or the ability to demonstrate equivalent experience, such as: Terraform Associate About Acorn Insurance More ❯
Liverpool, Lancashire, United Kingdom Hybrid / WFH Options
The Acorn Group
with GitOps tools (e.g., ArgoCD, Flux). CI/CD - Skilled in building and managing pipelines using Azure DevOps, GitHub Actions, etc. Monitoring - Experience with Prometheus, Grafana, and other observability tools. Application Stack - Familiarity with .NET, Node.js, React, and web server technologies like Nginx. Relevant certifications or the ability to demonstrate equivalent experience, such as: Terraform Associate About Acorn Insurance More ❯
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Arm Limited
infrastructure "Nice To Have" Skills and Experience: Experience in a GitOps solution such as ArgoCD, Flux or Fleet Implementation of the Security Development Lifecycle (SDL) in infrastructure Monitoring and observability using Prometheus and Grafana, ELK stack or equivalent Use of Kubernetes management systems such as Rancher Familiarity with open source project development cycles and contribution processes, particularly around CI/ More ❯
DevOps & Automation Create and manage automation pipelines for deployments. Implement Infrastructure as Code (IaC) using tools such as Terraform or Ansible. Monitor and enhance system performance using logging and observability tools. Develop automation solutions for provisioning, scaling, and maintenance. Support containerization efforts with Docker/Kubernetes where applicable. Networking & System Administration Configure and maintain network infrastructure, including firewalls, VLANs, and More ❯
multiple stakeholders including development teams to implement and maintain reliable and scalable systems while adhering to industry best practices and security standards. Responsibilities and Impact: Design, implement, and maintain observability solutions to track system health and performance. Analyze observability data to identify and troubleshoot potential issues proactively. Develop and implement alerts and notifications for critical events. Collaborate with development teams … in Computer Science, Information Technology, or a related field. 5+ years of experience as a Site Reliability Engineer or equivalent in a similar role. Proficient in application and infrastructure observability, Splunk OpenTelemetry preferred Experienced in production environments running in AWS Comfortable with Infrastructure as Code, Terraform is preferred Comfortable with CI/CD pipelines such as GitHub Actions, Azure DevOps More ❯
Manchester, England, United Kingdom Hybrid / WFH Options
Magentus Group
to implement robust solutions that improve system performance, security, and developer productivity. You will be responsible for maintaining and evolving platform services, adopting best practices in infrastructure as code, observability, and DevOps methodologies. Key Responsibilities of the role: Platform Development & Automation Design, develop, and maintain cloud-native infrastructure and platform services. Automate provisioning, scaling, and monitoring of infrastructure and application … reliability. Implement Infrastructure as Code (IaC) using tools such as CDK, Terraform or CloudFormation. Reliability & Security Ensure platform reliability, scalability, and security through best practices and proactive monitoring. Implement observability solutions including logging, metrics, and distributed tracing. Support incident response and post-mortem analysis, driving continuous improvements. Collaborate with security teams to ensure compliance with security and regulatory requirements. Collaboration … tools (GitHub Actions, GitLab CI, or similar). Experience with scripting or programming languages (Python, Go, Bash, etc.). Understanding of networking, security principles, and best practices. Knowledge of observability tools such as Datadog, Prometheus, Grafana, etc. Desired Attributes Strong problem-solving skills with a proactive approach to improving systems and processes. Excellent communication and collaboration skills, able to work More ❯
of new software and tools into the platform. Support scalable, resilient cloud environments with modern DevOps practices. Promote GitOps deployment strategies and mentor peers in DevOps best practice. Enhance observability using tools like Prometheus and Grafana. This role is ideal for someone looking to take the next step in a DevOps career while working with a modern tech stack in More ❯
or DevOps Expertise in microservices and API design Docker, and container runtime platforms such as Kubernetes, EKS, ECS etc Strong understanding of operational concepts on AWS, particularly monitoring and observability, FinOps Utilising CI/CD tools, such as Bamboo, Jenkins, TeamCity, Bitbucket, in order to streamline delivery of new features and fixes Continual testing of code using Automated Testing Frameworks More ❯
Salford, England, United Kingdom Hybrid / WFH Options
BBC Group and Public Services
as-Code with AWS CDK, CloudFormation to provision and manage cloud environments. Build and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, CodeBuild, Jenkins. Integrate monitoring and observability tools such as AWS CloudWatch, Prometheus, Grafana for infrastructure and model health tracking. Ensure software quality through Test-Driven Development (TDD), unit testing frameworks (e.g., pytest, unittest), and automated integration More ❯
understanding of modern architecture methods and patterns. Composable Architecture based on MACH principles (Microservices, API-first, Cloud-native, Headless), Event Driven. Skills to modernise architectural estates and drive serviceability, observability dashboarding and metrics in end products. Experience of Digital Transformation within either Java or Microsoft technologies landscape, Azure platform and .Net ecosystem. Expertise in Mobile and Web development frameworks and … languages like .Net, Java, Python Database technologies and platforms like SQL, NoSQL, Data Lake, Snowflake, Databricks, MongoDB, Oracle Frontend web development languages like React, Angular, JavaScript, HTML and CSS Observability platforms like Splunk, Dynatrace, Datadog, Grafana Integration technologies like REST, Kafka, iPaaS, API Management, ESB Awareness of placement of workloads on On-Prem Servers and Cloud (Azure/AWS/ More ❯
a minimum of two years working with us post training Nice to have: Domain knowledge: Banking, Financial Services, Lending (Very nice to have – understanding the wholesale lending lifecycle) Monitoring & Observability: Experience in managing Tools like APPD, ELK stack, Grafana Security Practices: DevSecOps principles, vulnerability scanning, compliance automation, Certificate/vault/user role management. Strong attention to detail a passion More ❯
or DevOps Expertise in microservices and API design Docker, and container runtime platforms such as Kubernetes, EKS, ECS etc Strong understanding of operational concepts on AWS, particularly monitoring and observability, FinOps Utilising CI/CD tools, such as Bamboo, Jenkins, TeamCity, Bitbucket, in order to streamline delivery of new features and fixes Continual testing of code using Automated Testing Frameworks A More ❯
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
BAE Systems (New)
or DevOps Expertise in microservices and API design Docker, and container runtime platforms such as Kubernetes, EKS, ECS etc Strong understanding of operational concepts on AWS, particularly monitoring and observability, FinOps Utilising CI/CD tools, such as Bamboo, Jenkins, TeamCity, Bitbucket, in order to streamline delivery of new features and fixes Continual testing of code using Automated Testing Frameworks A More ❯
ll sit at the heart of our engineering operations, bringing together SRE principles and modern platform engineering practices. This includes combining principles of SRE - such as service-level reliability, observability, incident response - with platform engineering practices like GitOps, Infrastructure as Code, DevSecOps automation, and self-service enablement, to help development teams ship faster, safer, and more cost-efficiently. What you … ll be doing: Designing and operating highly reliable, scalable, and secure Azure-based platforms Applying SRE principles like SLOs, observability, and incident management to drive service reliability Building Infrastructure as Code using Terraform (v1.7+) and GitOps workflows Enabling teams through platform tools, reusable Terraform modules, and self-service infrastructure Enhancing CI/CD pipelines (Azure DevOps, YAML-based) with security … knowledge (AKS, Functions, SQL, Cosmos DB, etc.) Strong Infrastructure as Code skills with Terraform (v1.7+) Experience with CI/CD pipelines, GitOps, and automation tools (PowerShell, Bash) Familiarity with observability and incident tools like Datadog, ELK, and synthetic monitoring Solid understanding of networking (TCP/IP, Load Balancing, DNS, Routing) Good knowledge of DevSecOps practices - including security scanning, IAM, and More ❯
Crewe, Cheshire, United Kingdom Hybrid / WFH Options
Manchester Digital
platform security, reliability, and performance across systems deployed in Canada, the UK, and AWS cloud environments Contribute to key projects, platform optimizations, and ongoing maintenance initiatives Help drive scalability, observability, and operational excellence If you're passionate about infrastructure, cloud, and systems engineering, and want to help shape the future of mobility, we want to hear from you! Requirements We … configurations (Azure AD, Ory, Cognito, Firebase) - Understanding of Site Reliability Engineering and key concepts - Proficient in Infrastructure as Code pipeline deployments and pipeline version control within Terraform or CloudFormation. - Observability Systems, e.g., Nagios, New Relic - Able to troubleshoot/work under pressure, meet deadlines. - Previous experience in a cloud engineering role. - AWS certified as SysOps Administrator/Solutions Architect/… understanding of Infrastructure as Code principles and related tech such as Terraform or CloudFormation - Enhanced experience of AWS cloud technologies, e.g., ECS, EC2, VPC, Lambda, CFS. Ideally AWS certified. - Observability Systems, e.g., New Relic, CloudWatch, SquadCast - ITIL Qualified or awareness of the framework. Bonus Qualifications: - Experience with Linux system administration and troubleshooting. - Basic knowledge of AWS cloud technologies such as More ❯
data-related processes like data migrations and environment setup. Preferred (Nice to Have): Banking/Financial Services knowledge — especially around wholesale lending and Loan IQ. Experience with monitoring and observability tools such as APPD, ELK Stack, or Grafana. Understanding of DevSecOps principles, including vulnerability scanning, secrets management, and compliance automation. Further experience with CI/CD integration and pipeline automation More ❯
Manchester Area, United Kingdom Hybrid / WFH Options
Revolent Group
related processes like data migrations and environment setup. ✅ Preferred (Nice to Have): Banking/Financial Services knowledge, especially around wholesale lending and Loan IQ. Experience with monitoring and observability tools such as APPD, ELK Stack, or Grafana. Understanding of DevSecOps principles, including vulnerability scanning, secrets management, and compliance automation. Further experience with CI/CD integration and pipeline automation More ❯
this role, you will assist in upgrading the Elastic DP estate to Kubernetes, moving away from obsolete technology (Cloudera), upgrading to RHEL 8, and contributing to improving stability and observability of the platform. You will provide advanced analytics tooling and services for modeling analytics, working across continuous integration, development, build, and deployment using automation and cloud technologies to support the More ❯
position will align to a discipline where you will be expected to build and support solutions aligned with SDLC principles, providing technical excellence with a focus on scripting and observability coupled with a security mindset. What will you be doing day-to-day? Automation and Orchestration: Streamline the delivery and support processes by leveraging automation and IaC principles. Support and More ❯
Salford, Manchester, United Kingdom Hybrid / WFH Options
BBC Group and Public Services
/CD pipelines using GitHub Actions, AWS CodePipeline, Jenkins, and other tools, with an emphasis on reliability, reusability, and performance. Contribute to the design and integration of monitoring and observability solutions (CloudWatch, Prometheus, Grafana) to ensure infrastructure and model health. Champion software engineering excellence through Test-Driven Development (TDD), rigorous test automation, and continuous quality assurance practices. Support architectural decisions More ❯
The CoE Lead - Observability & Tools at JD Sports Fashion Plc is a critical, hands-on technical role focused on designing, building, and maintaining the company's Observability platform. This role ensures that our technology platforms operate efficiently and reliably, providing early insights for Engineering, Service Reliability, Service Delivery, and DevOps teams. The CoE Lead will manage the contract with third-party … performance indicators (KPIs). The position involves a 75% focus on the design of frameworks and a 25% focus on implementation and adoption. · Job Title – Centre Of Excellence Lead - Observability & Tooling · Location – BL9 8RR · Working rota – Monday to Friday · Working hours – 40 What You'll Be Doing: We are looking for an experienced CoE Lead to design, build, and maintain our … Observability platform. The CoE Lead will work closely with DevOps, Engineering, Service Reliability, and Service Delivery teams to continuously improve our Observability capabilities. This role is a technical, hands-on position with a 75% focus on framework design and 25% on implementation and adoption. You will contribute to pipeline design, enabling observability from the first deployment in test environments and More ❯
Manchester, England, United Kingdom Hybrid / WFH Options
Couchbase
Reliability Engineers are hybrid software and systems engineers. They are the glue holding things together, whether that’s infrastructure/platform, tooling support for our cloud business or managing Observability posture for Couchbase. We are looking for a candidate to join the Observability team, which is responsible for maintaining Reliability, Availability and Serviceability for the entire Couchbase … You will have an immediate impact on the day-to-day efficiency of cloud operations and an ongoing impact on growth. Responsibilities Develop/maintain software features in the Observability stack, which includes the metrics pipeline, alerting, logging and notifications Create/maintain monitoring dashboards that give insight into our customers' cluster health Develop control plane features requiring observability needs High … to identify and solve issues before they affect business productivity Roll up your sleeves to be a full stack engineer as we build end-to-end software solutions in the Observability domain Requirements 2+ years of experience as a software developer Proficiency with programming and scripting languages like Go, Python, Java, or Ruby Strong ability to write code and an understanding of basic DSA concepts More ❯
and refine queue-based processing to support asynchronous workflows and event-driven architecture. Work collaboratively with cross-functional teams, including DevOps, Infrastructure, and Product, to deliver robust systems. Leverage observability tools to monitor, alert, and troubleshoot application and integration health. Stay current on AI-driven software development practices (e.g., GPT-assisted development, Agentic AI workflows) and suggest practical implementations. Participate … Prior experience building middleware for data sync, order processing, and internal APIs in a multi-system e-commerce environment Understanding of architecture patterns: Microservices, SOA, Hexagonal, Modular Monolith Monitoring & Observability: Grafana, Prometheus, CloudWatch, New Relic, Datadog, etc. Solid grasp of AI trends in software development, particularly in using GPT tools and agentic systems Education: Mathematics or Computer Science degree (or More ❯
and refine queue-based processing to support asynchronous workflows and event-driven architecture. Work collaboratively with cross-functional teams, including DevOps, Infrastructure, and Product, to deliver robust systems. Leverage observability tools to monitor, alert, and troubleshoot application and integration health. Stay current on AI-driven software development practices (e.g., GPT-assisted development, Agentic AI workflows) and suggest practical implementations. Participate … Prior experience building middleware for data sync, order processing, and internal APIs in a multi-system e-commerce environment Understanding of architecture patterns: Microservices, SOA, Hexagonal, Modular Monolith Monitoring & Observability: Grafana, Prometheus, CloudWatch, New Relic, Datadog, etc. Solid grasp of AI trends in software development, particularly in using GPT tools and agentic systems Education: Mathematics or Computer Science degree (or More ❯