Chester, North West England, United Kingdom Hybrid / WFH Options
Ascendion
We are seeking a Platform Engineering Manager with a strong hands-on background in Java development and Site Reliability Engineering (SRE). The ideal candidate will have a broad technical skillset across Java, Spring, MuleSoft, Kafka, and Oracle DB, and must be capable of leading platform … develop resilient backend systems primarily using Java, Spring, Kafka, and Oracle. Implement best practices for observability, incident response, and operational excellence in line with SRE principles. Drive automation and self-healing mechanisms across platform components. Provide technical leadership and hands-on coding as needed. Monitor, troubleshoot, and resolve production issues … Java expertise with deep understanding of backend design patterns and frameworks (Spring Boot preferred). Proven experience in Site Reliability Engineering (SRE), including monitoring, alerting, and incident management. Hands-on experience with Kafka, MuleSoft, and Oracle DB. Familiarity with performance tuning, system design, and distributed computing concepts.
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
bet365 Group
A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will monitor the health, performance and … availability of critical systems, directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and automation for effective service management. … Collaboration is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles are integral to development. Your contributions will ensure our systems meet user …
Manchester Area, United Kingdom Hybrid / WFH Options
bet365
Who we are looking for: A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will … monitor the health, performance and availability of critical systems, directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and … automation for effective service management. Collaboration is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles are integral to development. Your contributions will …
helps Data Services & Analytics (DSA) teams improve their product and service reliability by providing observability and embedding Site Reliability Engineering (SRE) principles. You will be a key part of the team, working on engagements with product teams and helping grow SRE culture within the organisation. Job … description The DevOps (SRE) is responsible for improving the reliability of our platforms and services. Your role is proactive, ensuring relevant metrics are being measured and reliability improvements are identified and implemented when necessary. This will ensure the reliability and availability of services for users. You will … UK residency and security requirements - You need to have lived in the UK for the past 5 years. Essential Criteria As a DevOps (SRE), you will have experience of: Designing and implementing reliable cloud solutions using AWS or Azure according to best practices. (Software design - SWDN) Implementing automated testing …
Chester, Cheshire West and Chester, Cheshire, United Kingdom
Ascendion
We are seeking a Platform Engineering Manager with a strong hands-on background in Java development and Site Reliability Engineering (SRE). The ideal candidate will have a broad technical skillset across Java, Spring, MuleSoft, Kafka, and Oracle DB, and must be capable of leading platform … develop resilient backend systems primarily using Java, Spring, Kafka, and Oracle. Implement best practices for observability, incident response, and operational excellence in line with SRE principles. Drive automation and self-healing mechanisms across platform components. Provide technical leadership and hands-on coding as needed. Monitor, troubleshoot, and resolve production issues … Java expertise with deep understanding of backend design patterns and frameworks (Spring Boot preferred). Proven experience in Site Reliability Engineering (SRE), including monitoring, alerting, and incident management. Hands-on experience with Kafka, MuleSoft, and Oracle DB. Familiarity with performance tuning, system design, and distributed computing concepts.
The Site Reliability Engineering (SRE) team at Pendo is responsible for provisioning and maintaining cloud infrastructure from development through production for all product initiatives, and working with developers and product managers to ensure that our products are not only reliable and performant, but also cost-efficient. Our … on-call and incident management functions, supporting a high-throughput platform which processes more than 15 billion events per day. To ensure the reliability of this environment for our customers, SREs work closely with developers and product managers to understand service level objectives, think through failure scenarios, and design … systems which balance cost with reliability objectives. Additionally, SREs collaborate with the Information Security team to ensure that cloud infrastructure is properly secured, and that sufficient controls are in place to meet our compliance goals with respect to industry standards such as SOC 2. Role Responsibilities Write high-quality …
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Embarcaderomediagroup
Site Reliability & Platform Engineer to help lead the way. You'll sit at the heart of our engineering operations, bringing together SRE principles and modern platform engineering practices. This includes combining principles of SRE - such as service-level reliability, observability, incident response - with platform engineering … ship faster, safer, and more cost-efficiently. What you'll be doing: Designing and operating highly reliable, scalable, and secure Azure-based platforms Applying SRE principles like SLOs, observability, and incident management to drive service reliability Building Infrastructure as Code using Terraform (v1.7+) and GitOps workflows Enabling teams through … for someone passionate about building robust infrastructure and enabling others to move faster and more securely. You might come from a cloud engineering, SRE, or DevOps background - what matters most is your curiosity, systems thinking, and drive to improve operational efficiency. At Sorted, we are committed to fostering an …
Bradford, Yorkshire, United Kingdom Hybrid / WFH Options
Freemans Grattan Holdings (fgh)
our customer journey. Working collaboratively with a team of transformation experts you will have the flexibility to leverage your professional experience to solve computer engineering issues across a variety of technical areas, dependent on where your interests lie. Innovation is key as we look for new ideas which will … in a DevOps, or Site Reliability Engineering building high-traffic, high availability systems. Experience with site reliability engineering (SRE) principles and monitoring tools, including New Relic. Experience in website performance monitoring and tuning using tools such as Lighthouse and the ability to troubleshoot performance …
Leeds, West Yorkshire, Yorkshire and the Humber, United Kingdom Hybrid / WFH Options
Fruition Group
Job Title: Senior Site Reliability Engineer (SRE) Location: Leeds (Hybrid - c. 1-2 days per week) Salary: £60,000 - £80,000 + benefits Why Apply? This is a fantastic opportunity for a seasoned Senior Site Reliability Engineer to take a lead role in shaping the infrastructure … most innovative businesses in their market. Working with cutting-edge technology, this role offers high-impact challenges, meaningful collaboration, and excellent career progression. Senior SRE Responsibilities Manage and optimise cloud infrastructure to ensure scalability, high availability, and security. Design and implement robust CI/CD pipelines for efficient product delivery. … like GitLab CI, Terraform/OpenTofu, Ansible, and scripting languages such as PowerShell or Python. Champion infrastructure best practices and mentor junior team members. Senior SRE Requirements Extensive experience in SRE or DevOps roles within high-availability, cloud-native environments. Strong expertise with AWS (including EKS, MSK, RDS, VPC design, encryption …
re Looking For: Basic Required Qualifications: Bachelor's degree in Computer Science, Information Technology, or a related field. 5+ years of experience as a Site Reliability Engineer or equivalent in a similar role. Proficient in application and infrastructure observability, Splunk OpenTelemetry preferred Experienced in production environments running in … troubleshooting and problem-solving skills with a knack for identifying and resolving complex technical issues Familiarity working in an Agile environment True understanding of Site Reliability Engineering Ability to build and maintain a system and culture that supports and implements SLOs. Familiar with Docker & Kubernetes, specifically EKS …
Newcastle Upon Tyne, Tyne And Wear, United Kingdom
hackajob
so they can thrive in a digital world. We're looking for a QA Engineer to join our Site Reliability Engineering (SRE) team within Cloud Services Engineering & Operations. In this role, you'll play a key part in ensuring the reliability, performance, and resilience of … contributing to continuous improvement initiatives. Coordinate User Acceptance Testing (UAT) to ensure smooth product releases. Mentor junior QA engineers in automation, cloud QA, and SRE methodologies. What We're Looking For: Must-Have Skills: Proficiency in UI test automation (e.g., Selenium). Knowledge of CI/CD pipelines and test … automation integration. Familiarity with monitoring and logging tools (e.g., New Relic, Datadog, Prometheus, Grafana, Splunk). Understanding of SRE principles, including reliability testing and fault tolerance. Experience across the full testing lifecycle, from test planning to User Acceptance Testing (UAT). Nice-to-Have Skills: Experience in performance and …
Principal Cloud Engineer WRK digital are thrilled to be partnered with Skipton Building Society supporting the growth of their cloud engineering and architecture functions. As their highly skilled team expands, they are seeking a Principal Cloud Engineer to play a pivotal role in the development, implementation and optimization of … and customer outcomes. This is a great opportunity to join an expanding team! As Principal Cloud Engineer, you will be responsible for: Ensuring the reliability, security, and scalability of Azure Cloud based solutions while aligning with the Society’s overall objectives of innovation, efficiency, and regulatory compliance. Serving as … Azure methodologies and adhere to industry standards for security, compliance, and performance. Driving the adoption of DevOps and Site Reliability Engineering (SRE) principles to improve operational efficiency, resilience, and service reliability. Engaging with IT leadership, Security teams, Engineering and Data teams to shape Cloud strategy ensuring …
Newcastle Upon Tyne, Tyne And Wear, United Kingdom
Sage City
Job Description We are looking for a Site Reliability Engineer to join our SRE Enablement team, a specialised function within Cloud Operations focused on building reusable infrastructure, automation, and tools that enable CloudOps and Engineering teams to operate more efficiently. You will have the opportunity to be … a key driver for SRE adoption within Sage, taking the helm in developing scalable frameworks to improve developer experience, remove toil and ultimately focus on embedding SRE best practices within the wider business. If you have experience working with Terraform and modern CI/CD workflows this could be the … also engage with broader teams to help implement these new approaches. You will have oversight of the entirety of Sage's product-suite and SRE teams as you work closely with them to build tools to make them more successful. Please note this is a hybrid role - you will be …
Bolton, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go …
Leeds, West Yorkshire, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go …
Leigh, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go …
Bury, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go …
Altrincham, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go …
Ashton-Under-Lyne, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go …
Ashton-Under-Lyne, Greater Manchester, UK Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go …
native and containerised applications Build solutions as part of a DevSecOps and Agile ecosystem Ensure solution works in a reliable and resilient way using Site Reliability Engineering methods to increase availability while reducing costs and callouts. Help the client and end users to understand trade-offs when …