Manchester, Lancashire, United Kingdom Hybrid / WFH Options
bet365 Group
A Site Reliability Engineer who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will monitor the health, performance and … availability of critical systems, directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and automation for effective service management. … Collaboration is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles are integral to development. Your contributions will ensure our systems meet user
Manchester Area, United Kingdom Hybrid / WFH Options
bet365
Who we are looking for: A Site Reliability Engineer who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will … monitor the health, performance and availability of critical systems, directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and … automation for effective service management. Collaboration is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles are integral to development. Your contributions will
Chester, Cheshire West and Chester, Cheshire, United Kingdom
Ascendion
We are seeking a Platform Engineering Manager with a strong hands-on background in Java development and Site Reliability Engineering (SRE). The ideal candidate will have a broad technical skillset across Java, Spring, MuleSoft, Kafka, and Oracle DB, and must be capable of leading platform … develop resilient backend systems primarily using Java, Spring, Kafka, and Oracle. Implement best practices for observability, incident response, and operational excellence in line with SRE principles. Drive automation and self-healing mechanisms across platform components. Provide technical leadership and hands-on coding as needed. Monitor, troubleshoot, and resolve production issues … Java expertise with deep understanding of backend design patterns and frameworks (Spring Boot preferred). Proven experience in Site Reliability Engineering (SRE), including monitoring, alerting, and incident management. Hands-on experience with Kafka, MuleSoft, and Oracle DB. Familiarity with performance tuning, system design, and distributed computing concepts.
and ensure Morrisons’ applications and infrastructure are resilient, efficient, and aligned with architectural goals. This is a key role for those passionate about advancing SRE practices at enterprise scale. Responsibilities Act as SME within their Domain teams for advice & guidance in terms of CI/CD, automation and product ways … of working and SRE/Engineering standards Drive the adoption of Engineering standards and Continuous Delivery principles within multiple domains The escalation point for SRE/Engineering ways of working Influence good practices and standards within SDLC throughout the business Influence partners Infrastructure best practices Implementation of … and patterns Engineering Tooling, Patterns, Framework and Standards Proprietary code quality management inclusive of technical debt About you Knowledge In-depth understanding of SRE/Engineering, Architecture and Testing practices In-depth understanding of the principles of CI/CD within SRE/Engineering In-depth understanding
The Site Reliability Engineering (SRE) team at Pendo is responsible for provisioning and maintaining cloud infrastructure from development through production for all product initiatives, and working with developers and product managers to ensure that our products are not only reliable and performant, but also cost-efficient. Our … on-call and incident management functions, supporting a high-throughput platform which processes more than 15 billion events per day. To ensure the reliability of this environment for our customers, SREs work closely with developers and product managers to understand service level objectives, think through failure scenarios, and design … systems which balance cost with reliability objectives. Additionally, SREs collaborate with the Information Security team to ensure that cloud infrastructure is properly secured, and that sufficient controls are in place to meet our compliance goals with respect to industry standards such as SOC 2. Role Responsibilities Write high-quality
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Embarcaderomediagroup
Site Reliability & Platform Engineer to help lead the way. You'll sit at the heart of our engineering operations, bringing together SRE principles and modern platform engineering practices. This includes combining principles of SRE - such as service-level reliability, observability, incident response - with platform engineering … ship faster, safer, and more cost-efficiently. What you'll be doing: Designing and operating highly reliable, scalable, and secure Azure-based platforms Applying SRE principles like SLOs, observability, and incident management to drive service reliability Building Infrastructure as Code using Terraform (v1.7+) and GitOps workflows Enabling teams through … for someone passionate about building robust infrastructure and enabling others to move faster and more securely. You might come from a cloud engineering, SRE, or DevOps background - what matters most is your curiosity, systems thinking, and drive to improve operational efficiency. At Sorted, we are committed to fostering an
also assist with CloudOps activities. Are you an experienced IT professional with a strong background in DevOps and Site Reliability Engineering (SRE)? Are you passionate about working with cutting-edge technologies, driving agile methodologies, and implementing CI/CD practices? Do you have knowledge of infrastructure as … code? Experience required: - Solid experience in a similar role, working on DevOps or SRE initiatives within complex IT environments with Software Engineering - AWS environment - Proficiency in DevOps practices and related technologies, such as CI/CD pipelines & infrastructure as code tools such as Terraform, Ansible, Puppet or Bicep. - Strong … to ensure system reliability and performance. - Any Linux experience would be a bonus Key Responsibilities: - Drive the strategy and implementation for DevOps and SRE practices. - Collaborate with cross-functional teams to design and implement CI/CD pipelines, ensuring efficient and reliable software delivery. - Establish and maintain best practices
Bradford, Yorkshire, United Kingdom Hybrid / WFH Options
Freemans Grattan Holdings (fgh)
our customer journey. Working collaboratively with a team of transformation experts you will have the flexibility to leverage your professional experience to solve computer engineering issues across a variety of technical areas, dependent on where your interests lie. Innovation is key as we look for new ideas which will … in a DevOps, or Site Reliability Engineering building high-traffic, high availability systems. Experience with site reliability engineering (SRE) principles and monitoring tools, including New Relic. Experience in website performance monitoring and tuning using tools such as Lighthouse and the ability to troubleshoot performance
re Looking For: Basic Required Qualifications: Bachelor's degree in Computer Science, Information Technology, or a related field. 5+ years of experience as a Site Reliability Engineer or equivalent in a similar role. Proficient in application and infrastructure observability, Splunk OpenTelemetry preferred Experienced in production environments running in … troubleshooting and problem-solving skills with a knack for identifying and resolving complex technical issues Familiarity working in an Agile environment True understanding of Site Reliability Engineering Ability to build and maintain a system and culture that supports and implements SLOs. Familiar with Docker & Kubernetes, specifically EKS
program migrating services between Kubernetes environments. This position requires a strong blend of software engineering fundamentals and Site Reliability Engineering (SRE) principles, focusing on automation, reliability, and observability throughout the migration lifecycle. You will leverage your expertise in our cloud-native, Agile DevOps environment to … ensure a smooth and efficient transition, shaping the reliability and performance of our services. Coaching and mentoring others on best practices related to migration and reliability is a key part of this role. Key Responsibilities & Skills: Software Development & Adaptation: Design, build, test, and refactor software applications, specifically adapting … migration plans, technical designs, status updates, and risks to technical and product stakeholders. Collaborate effectively across teams and mentor engineers on software craft, Kubernetes, SRE principles, and migration techniques.
Principal Cloud Engineer WRK digital are thrilled to be partnered with Skipton Building Society supporting the growth of their cloud engineering and architecture functions. As their highly skilled team expands, they are seeking a Principal Cloud Engineer to play a pivotal role in the development, implementation and optimization of … and customer outcomes. This is a great opportunity to join an expanding team! As Principal Cloud Engineer, you will be responsible for: Ensuring the reliability, security, and scalability of Azure Cloud based solutions while aligning with the Society’s overall objectives of innovation, efficiency, and regulatory compliance. Serving as … Azure methodologies and adhere to industry standards for security, compliance, and performance. Driving the adoption of DevOps and Site Reliability Engineering (SRE) principles to improve operational efficiency, resilience, and service reliability. Engaging with IT leadership, Security teams, Engineering and Data teams to shape Cloud strategy ensuring
Newcastle Upon Tyne, Tyne And Wear, United Kingdom
Sage City
Job Description We are looking for a Site Reliability Engineer to join our SRE Enablement team, a specialised function within Cloud Operations focused on building reusable infrastructure, automation, and tools that enable CloudOps and Engineering teams to operate more efficiently. You will have the opportunity to be … a key driver for SRE adoption within Sage, taking the helm in developing scalable frameworks to improve developer experience, remove toil and ultimately focus on embedding SRE best practices within the wider business. If you have experience working with Terraform and modern CI/CD workflows this could be the … also engage with broader teams to help implement these new approaches. You will have oversight of the entirety of Sage's product suite and SRE teams as you work closely with them to build tools to make them more successful. Please note this is a hybrid role - you will be
new priorities, you’ll set the standard. You’ll engage with technical and non-technical customers and have a positive influence on the wider engineering community. With our encouragement to spend up to 30% of your time on development, innovation, and experimentation, you’ll have the freedom to explore … new possibilities for yourself, and for GCHQ. You don’t need to be a Software Engineer to apply; you might be working in Cloud Engineering and Security, UX, Site Reliability Engineering, Front-End Design, Agile, Solution Architecture, Data Engineering, or Machine Learning Operations. You’ll
Bolton, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go
Altrincham, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go
Leigh, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go
Bury, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go
Leeds, West Yorkshire, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go
Ashton-Under-Lyne, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go
Bradford, West Yorkshire, Yorkshire, United Kingdom Hybrid / WFH Options
Exalto Consulting ltd
UK-GDPR, PCI-DSS) and OWASP best practices. Support disaster recovery planning and backup strategies. What We're Looking For: Considerable experience in DevOps/Site Reliability Engineering for high-traffic systems. Experience with CI/CD tools (Jenkins, GitLab) and performance monitoring (New Relic, Lighthouse). Strong
Newcastle upon Tyne, Tyne and Wear, Tyne & Wear, United Kingdom
Tenth Revolution Group
rota to support business-critical services Continuously identify and deliver improvements to operational processes and tooling Essential Skills & Experience: Significant experience in a DevOps, SRE, Systems Administration or similar role (ideally 5+ years) Proven hands-on experience with AWS (Kubernetes experience preferred) Strong scripting skills (e.g. PowerShell, Python, Bash) Operating
and with the communities in which we work and live. It is personal to all of us.” – Julie Sweet, Accenture CEO Accenture Next Gen Engineering: Next Gen Engineering is the home of our dedicated Technology people who are focused on engineering memorable yet differentiated and captivating customer … looking for experienced DevOps/Platform Engineers to join our vibrant community of Platform Engineering professionals, encompassing knowledge and experience in DevOps, DevSecOps, SRE, Observability, and Internal Developer Platforms/Portals, based at our London offices on a full-time, permanent basis As a member of our Next Gen … Engineering team, you will have the opportunity to: Create Innovative Platform Solutions: Take part in architecting and implementing cutting-edge platform engineering solutions tailored to address unique business challenges across several industries. Technical Leadership: Display your technical leadership skills by guiding and collaborating with both onshore and offshore
engage with content and communities they are most passionate about. The Position: FanDuel Sports Network is seeking a Senior Director, Infrastructure to join the Engineering Team. Reporting to the VP, Engineering, this position will lead our organization's broadcast, systems, network, and cloud strategy, design, implementation, maintenance, and … operations. The ideal candidate will possess strong leadership qualities, technical expertise, and strategic vision to ensure the reliability, scalability, and security of our infrastructure. This position is located in Southport, CT. The Game Plan: (What you will do) Strategic Planning: Develop and execute a comprehensive infrastructure strategy aligned with … organizational goals and objectives. Team Leadership: Lead and mentor a team of broadcast and site reliability engineers, fostering a culture of collaboration, innovation, and excellence. Broadcast Systems Design: Oversee design, implementation, and maintenance of broadcast systems ensuring flexibility, efficiency, and stability. Network Design and Architecture: Oversee design, implementation
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Modix International
platforms across the UK and Europe. As a Senior Cloud Platform Engineer, you'll play a pivotal role in ensuring the scalability, security, and reliability of our AWS environments, while driving automation and best practices. You'll be responsible for architecting, configuring, and supporting cloud infrastructure that supports high … For: To succeed in this role, you should have at least 5 years of hands-on experience in cloud platforms, ideally in a DevOps, SRE, or Cloud Engineering capacity. If you're passionate about AWS and enjoy collaborating with teams to deliver cutting-edge cloud solutions, this is the … role for you. Key qualifications and skills include: 5+ years' experience in Cloud Platform Engineering, DevOps, or SRE roles. A degree in Computer Science, IT, or equivalent experience. In-depth knowledge of AWS services including EC2, Lambda, ECS, EKS, S3, VPC, Route 53, API Gateway, and RDS. Experience with