with GitOps tools (e.g., ArgoCD, Flux). CI/CD - Skilled in building and managing pipelines using Azure DevOps, GitHub Actions, etc. Monitoring - Experience with Prometheus, Grafana, and other observability tools. Application Stack - Familiarity with .NET, Node.js, React, and web server technologies like Nginx. Relevant certifications or the ability to demonstrate equivalent experience, such as: Terraform Associate About Acorn Insurance More ❯
Liverpool, Lancashire, United Kingdom Hybrid / WFH Options
The Acorn Group
with GitOps tools (e.g., ArgoCD, Flux). CI/CD - Skilled in building and managing pipelines using Azure DevOps, GitHub Actions, etc. Monitoring - Experience with Prometheus, Grafana, and other observability tools. Application Stack - Familiarity with .NET, Node.js, React, and web server technologies like Nginx. Relevant certifications or the ability to demonstrate equivalent experience, such as: Terraform Associate About Acorn Insurance More ❯
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Arm Limited
infrastructure "Nice To Have" Skills and Experience: Experience in a GitOps solution such as ArgoCD, Flux or Fleet Implementation of the Security Development Lifecycle (SDL) in infrastructure Monitoring and observability using Prometheus and Grafana, ELK stack or equivalent Use of Kubernetes management systems such as Rancher Familiarity with open source project development cycles and contribution processes, particularly around CI/ More ❯
Leeds, England, United Kingdom Hybrid / WFH Options
ZipRecruiter
platform modernisation Mentor and lead a small team of engineers Align DevOps capabilities with the wider business Champion DevEx, reliability, and security Embed operational excellence and incident response Promote observability and performance optimisation Lead DevOps Engineer Requirements Proven technical and some leader/mentoring experience Cloud- expertise (any cloud provider is fine: GCP, AWS or Azure) Knowledge of GitLab CI More ❯
Bradford, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
platform modernisation Mentor and lead a small team of engineers Align DevOps capabilities with the wider business Champion DevEx, reliability, and security Embed operational excellence and incident response Promote observability and performance optimisation Proven technical and some leader/mentoring experience Cloud-native expertise (any cloud provider is fine: GCP, AWS or Azure) Knowledge of GitLab CI/CD, Terraform More ❯
DevOps & Automation Create and manage automation pipelines for deployments. Implement Infrastructure as Code (IaC) using tools such as Terraform or Ansible. Monitor and enhance system performance using logging and observability tools. Develop automation solutions for provisioning, scaling, and maintenance. Support containerization efforts with Docker/Kubernetes where applicable. Networking & System Administration Configure and maintain network infrastructure, including firewalls, VLANs, and More ❯
multiple stakeholders including development teams to implement and maintain reliable and scalable systems while adhering to industry best practices and security standards. Responsibilities and Impact: Design, implement, and maintain observability solutions to track system health and performance. Analyze observability data to identify and troubleshoot potential issues proactively. Develop and implement alerts and notifications for critical events. Collaborate with development teams … in Computer Science, Information Technology, or a related field. 5+ years of experience as a Site Reliability Engineer or equivalent in a similar role. Proficient in application and infrastructure observability, Splunk OpenTelemetry preferred Experienced in production environments running in AWS Comfortable with Infrastructure as Code, Terraform is preferred Comfortable with CI/CD pipelines such as GitHub Actions, Azure DevOps More ❯
Manchester, England, United Kingdom Hybrid / WFH Options
Magentus Group
to implement robust solutions that improve system performance, security, and developer productivity. You will be responsible for maintaining and evolving platform services, adopting best practices in infrastructure as code, observability, and DevOps methodologies. Key Responsibilities of the role: Platform Development & Automation Design, develop, and maintain cloud-native infrastructure and platform services. Automate provisioning, scaling, and monitoring of infrastructure and application … reliability. Implement Infrastructure as Code (IaC) using tools such as CDK, Terraform or CloudFormation. Reliability & Security Ensure platform reliability, scalability, and security through best practices and proactive monitoring. Implement observability solutions including logging, metrics, and distributed tracing. Support incident response and post-mortem analysis, driving continuous improvements. Collaborate with security teams to ensure compliance with security and regulatory requirements. Collaboration … tools (GitHub Actions, GitLab CI, or similar). Experience with scripting or programming languages (Python, Go, Bash, etc.). Understanding of networking, security principles, and best practices. Knowledge of observability tools such as Datadog, Prometheus, Grafana, etc. Desired Attributes Strong problem-solving skills with a proactive approach to improving systems and processes. Excellent communication and collaboration skills, able to work More ❯
of new software and tools into the platform. Support scalable, resilient cloud environments with modern DevOps practices. Promote GitOps deployment strategies and mentor peers in DevOps best practice. Enhance observability using tools like Prometheus and Grafana. This role is ideal for someone looking to take the next step in a DevOps career while working with a modern tech stack in More ❯
You will establish governance frameworks to enhance security, compliance, and incident response. This role provides access to cutting-edge cloud technologies, including AWS serverless computing, Kubernetes orchestration, AI-driven observability, and security automation, keeping you at the forefront of innovation. Your responsibilities: Implement and manage highly available, scalable, and secure applications hosted on AWS Cloud, leveraging multi-region deployment strategies. More ❯
or DevOps Expertise in microservices and API design Docker, and container runtime platforms such as Kubernetes, EKS, ECS etc Strong understand of operational concepts on AWS, particularly monitoring and observability, FinOps Utilising CI/CD tools, such as Bamboo, Jenkins, TeamCity, Bitbucket, in order to streamline delivery of new features and fixes Continual testing of code using Automated Testing Frameworks More ❯
Site Reliability Engineer (SRE) who will be responsible for ensuring the reliability, scalability, and performance of our cloud infrastructure and applications. This role focuses on maintaining and improving system observability, automating operations, and enhancing deployment practices to support business-critical services. Reporting directly to the Lead Site Reliability Engineer, you will be expected to work independently while collaborating closely with … within commutable distance to the Ilkley office for occasional office attendance. VARIED DAY TO DAY RESPONSIBILITIES Ensuring system reliability, performance, and scalability through monitoring and automation Building and maintaining observability solutions using Grafana, Prometheus, Loki, OpenTelemetry Proactively identifying and resolving performance bottlenecks and infrastructure issues Automating infrastructure provisioning, configuration management, and deployments Implementing effective logging, monitoring, and alerting strategies Managing … efficiency WHAT ARE WE LOOKING FOR IN A CANDIDATE? Experience with SRE principles, such as incident management, error budgets, and service-level objectives (SLOs) Experience designing and implementing robust observability, monitoring and logging solutions Strong proficiency with observability and monitoring tools such as Grafana, Prometheus, and Loki Strong experience with distributed tracing and telemetry tools such as OpenTelemetry An understanding More ❯
Site Reliability Engineer (SRE) who will be responsible for ensuring the reliability, scalability, and performance of our cloud infrastructure and applications. This role focuses on maintaining and improving system observability, automating operations, and enhancing deployment practices to support business-critical services. Reporting directly to the Lead Site Reliability Engineer, you will be expected to work independently while collaborating closely with … learning and improving performance based on set targets will be expected. VARIED DAY TO DAY RESPONSIBILITIES Ensuring system reliability, performance, and scalability through monitoring and automation Building and maintaining observability solutions using Grafana, Prometheus, Loki, OpenTelemetry Proactively identifying and resolving performance bottlenecks and infrastructure issues Automating infrastructure provisioning, configuration management, and deployments Implementing effective logging, monitoring, and alerting strategies Managing … efficiency WHAT ARE WE LOOKING FOR IN A CANDIDATE? Experience with SRE principles, such as incident management, error budgets, and service-level objectives (SLOs) Experience designing and implementing robust observability, monitoring and logging solutions Strong proficiency with observability and monitoring tools such as Grafana, Prometheus, and Loki Strong experience with distributed tracing and telemetry tools such as OpenTelemetry An understanding More ❯
Salford, England, United Kingdom Hybrid / WFH Options
BBC Group and Public Services
as-Code with AWS CDK, CloudFormation to provision and manage cloud environments. Build and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, CodeBuild, Jenkins. Integrate monitoring and observability tools such as AWS CloudWatch, Prometheus, Grafana for infrastructure and model health tracking. Ensure software quality through Test-Driven Development (TDD), unit testing frameworks (e.g., pytest, unittest), and automated integration More ❯
Newcastle upon Tyne, England, United Kingdom Hybrid / WFH Options
BBC Group and Public Services
as-Code with AWS CDK, CloudFormation to provision and manage cloud environments. Build and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, CodeBuild, Jenkins. Integrate monitoring and observability tools such as AWS CloudWatch, Prometheus, Grafana for infrastructure and model health tracking. Ensure software quality through Test-Driven Development (TDD), unit testing frameworks (e.g., pytest, unittest), and automated integration More ❯
performance issues Managing regular patching and upgrade cycles for Infrastructure and Software Managing security vulnerabilities and performing platform hardening activities Developing automation to remove manual tasks Developing and maintaining observability dashboards and alerting Collaborating with Software Engineers and Users across the business Required skills and experience: Strong knowledge of at least one Public Cloud provider: Azure, AWS or GCP (Managed … Compute, Networking, RBAC/IAM) Prior experience in Linux system administration in a production environment Prior experience in provisioning and operating Kubernetes clusters in a production environment Experience in observability with Grafana with a good understanding of PromQL and LogQL Good knowledge of using Infrastructure-as-Code solutions such as Terraform Comfortable with scripting for automation using Bash and Python More ❯
understanding of modern architecture methods and patterns. Composable Architecture based on MACH principles (Microservices, API-first, Cloud-native, Headless), Event Driven. Skills to modernise architectural estates and drive serviceability, observability dashboarding and metrics in end products. Experience of Digital Transformation within either Java or Microsoft technologies landscape, Azure platform and .Net ecosystem. Expertise in Mobile and Web development frameworks and … languages like .Net, Java, Python Database technologies and platforms like SQL, NoSQL, Data Lake, Snowflake, Databricks, MongoDB, Oracle Frontend web development languages like React, Angular, JavaScript, HTML and CSS Observability platforms like Splunk, Dynatrace, Datadog, Grafana Integration technologies like REST, Kafka, iPaaS, API Management, ESB Awareness of placement of workloads on On-Prem Servers and Cloud (Azure/AWS/ More ❯
a minimum of two years working with us post training Nice to have: Domain knowledge: Banking, Financial Services, Lending (Very nice to have – understanding the wholesale lending lifecycle) Monitoring & Observability: Experience in managing Tools like APPD, ELK stack, Grafana Security Practices: DevSecOps principles, vulnerability scanning, compliance automation, Certificate/vault/user role management. Strong attention to detail a passion More ❯
or DevOps Expertise in microservices and API design Docker, and container runtime platforms such as Kubernetes, EKS, ECS etc Strong understand of operational concepts on AWS, particularly monitoring and observability, FinOps UtilisingCI/CD tools, such as Bamboo, Jenkins, TeamCity, Bitbucket, in order to streamline delivery of new features and fixes Continual testing of code using Automated Testing Frameworks A More ❯
ll sit at the heart of our engineering operations, bringing together SRE principles and modern platform engineering practices. This includes combining principles of SRE - such as service-level reliability, observability, incident response - with platform engineering practices like GitOps, Infrastructure as Code, DevSecOps automation, and self-service enablement, to help development teams ship faster, safer, and more cost-efficiently. What you … ll be doing: Designing and operating highly reliable, scalable, and secure Azure-based platforms Applying SRE principles like SLOs, observability, and incident management to drive service reliability Building Infrastructure as Code using Terraform (v1.7+) and GitOps workflows Enabling teams through platform tools, reusable Terraform modules, and self-service infrastructure Enhancing CI/CD pipelines (Azure DevOps, YAML-based) with security … knowledge (AKS, Functions, SQL, Cosmos DB, etc.) Strong Infrastructure as Code skills with Terraform (v1.7+) Experience with CI/CD pipelines, GitOps, and automation tools (PowerShell, Bash) Familiarity with observability and incident tools like Datadog, ELK, and synthetic monitoring Solid understanding of networking (TCP/IP, Load Balancing, DNS, Routing) Good knowledge of DevSecOps practices - including security scanning, IAM, and More ❯
Sheffield, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
automation and internal tools for deployment, monitoring, and incident response Tune performance across OS, network, and cloud layers — this role is hands-on and detail-oriented Improve system resilience, observability, and security in a high-stakes production environment Requirements: Fluent in Linux — not just using it, but understanding how it works under the hood Advanced terminal skills — manipulating systems efficiently … time environments Hands-on with Docker (Kubernetes is a plus), infrastructure-as-code, and CI/CD tooling Strong scripting and automation experience in Python and Bash Familiarity with observability stacks (Prometheus, OpenTelemetry, eBPF) Cloud infrastructure experience (AWS/GCP/Azure), with attention to IAM and software supply chain security Curious, persistent, and comfortable experimenting at the lowest levels More ❯
Crewe, Cheshire, United Kingdom Hybrid / WFH Options
Manchester Digital
platform security, reliability, and performance across systems deployed in Canada, the UK, and AWS cloud environments Contribute to key projects, platform optimizations, and ongoing maintenance initiatives Help drive scalability, observability, and operational excellence If you're passionate about infrastructure, cloud, and systems engineering-and want to help shape the future of mobility-we want to hear from you! Requirements We … configurations (Azure AD , Ory, Cognito, Firebase) - Understanding of Site Reliability Engineering and key concepts - Proficient in Infrastructure as Code pipeline deployments and pipeline version control within Terraform or CloudFormation. - Observability Systems, e.g., Nagios, New Relic - Able to troubleshoot/work under pressure, meet deadlines. - Previous experience in a cloud engineering role. - AWS certified as SysOps Administrator/Solutions Architect/… understanding of Infrastructure as Code principles and related tech such as Terraform or CloudFormation - Enhanced experience of AWS cloud technologies, e.g., ECS, EC2, VPC, Lambda, CFS. Ideally AWS certified. - Observability Systems, e.g., New Relic, CloudWatch, SquadCast - ITIL Qualified or awareness of the framework. Bonus Qualifications: -Experience with Linux system administration and troubleshooting. -Basic knowledge of AWS cloud technologies such as More ❯
Manchester Area, United Kingdom Hybrid / WFH Options
Revolent Group
related processes like data migrations and environment setup. ✅ Preferred (Nice to Have): Banking/Financial Services knowledge — especially around wholesale lending and Loan IQ . Experience with monitoring and observability tools such as APPD, ELK Stack, or Grafana. Understanding of DevSecOps principles , including vulnerability scanning, secrets management, and compliance automation. Further experience with CI/CD integration and pipeline automation More ❯
position will align to a discipline where you will be expected to build and support solutions aligned with SDLC principles, providing technical excellence with a focus on scripting and observability coupled with a security mindset. What will you be doing day-to-day? Automation and Orchestration: Streamline the delivery and support processes by leveraging automation and IaC principles. Support and More ❯