Site Reliability Engineering (SRE) Manager. Locations: London, UK. Time type: Full time. Posted on: Today. Job requisition ID: R35765. As a leading financial services and healthcare technology company based on revenue, SS&C is headquartered in Windsor, Connecticut, and has 27,000+ employees in 35 countries. Some 20,000 financial … name a few. Site Reliability Manager. Locations: London, Surbiton, Essex (Hybrid). The Opportunity: We are seeking a highly motivated and experienced Site Reliability Engineering (SRE) Manager to lead a team of SREs responsible for the reliability, scalability, and performance of our production systems. This role is pivotal in bridging the gap between development and … direction of infrastructure and reliability initiatives. Advocate for best practices in observability, CI/CD, and infrastructure as code. What You Will Bring: Proven experience managing or leading SRE, DevOps, or infrastructure teams. Strong background in systems engineering, cloud platforms (AWS, Azure), and container orchestration (Kubernetes). Proficiency in monitoring, alerting, and incident management tools (Prometheus, Grafana, PagerDuty …
Bristol, Avon, South West, United Kingdom Hybrid / WFH Options
Twinstream Limited
Site Reliability Engineer | £65,000–£95,000 DOE | Hybrid (Bristol-based, occasional site visits) | Clearance: Must be eligible for DV Clearance. Founded in 2019 by engineers solving complex cross-domain problems for government organisations, TwinStream delivers technical excellence and exceptional service to high-profile clients. Our teams work both on-site … and remotely, supporting mission-critical systems where performance and reliability are paramount. The Site Reliability Engineer Role: We are seeking a Site Reliability Engineer (SRE) to ensure the availability, performance, and cost-effectiveness of our cloud and on-prem services. You will collaborate with software engineers and system administrators to improve observability, reduce downtime, and … proactively mitigate reliability risks across a growing portfolio of services. Key Responsibilities of the Site Reliability Engineer: Improve reliability and performance across multiple subsystems. Automate manual tasks and eliminate unnecessary alerts. Enhance monitoring capabilities to identify and resolve issues before they impact users. Support and optimise CI/CD pipelines and cloud infrastructure. Research and evaluate …
Preferred qualifications: Master's degree or PhD in Computer Science, or a related technical field. Experience as a cloud customer. About the job: Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and our … externally-visible systems, have reliability, uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs will keep an ever-watchful eye on our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to … manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate …
Chester, Cheshire West and Chester, Cheshire, United Kingdom Hybrid / WFH Options
Ascendion
Job Title: Platform Engineer/SRE. Work Location: Bromley/Chester, UK (Hybrid, 3 days a week). Job Description: We are seeking a Platform Engineer/SRE with a strong and diverse technical background. The ideal candidate will possess hands-on development experience along with Site Reliability Engineering (SRE) expertise. This role requires a proactive individual … who can lead by example, address platform stability issues, and develop resilient and reliable systems. Key Responsibilities: Provide hands-on technical leadership in platform engineering initiatives. Ensure platform stability and resilience by identifying and resolving reliability issues. … Collaborate with cross-functional teams to deliver scalable and robust system solutions. Key Skills Required: Strong development experience in Java (primary skill). Site Reliability Engineering (SRE) experience. Proficiency with Kafka, Mule, and Oracle Database. Ability to work at a managerial level while remaining hands-on with technical tasks. Nice to Have: Knowledge of payment systems …
Chester, Cheshire West and Chester, Cheshire, United Kingdom
Ascendion
Job Description: We are seeking a Platform Engineering Manager with a strong hands-on background in Java development and Site Reliability Engineering (SRE). The ideal candidate will have a broad technical skillset across Java, Spring, MuleSoft, Kafka, and Oracle DB, and must be capable of leading platform stability efforts while contributing directly to development. Experience … and implement improvements. Architect and develop resilient backend systems primarily using Java, Spring, Kafka, and Oracle. Implement best practices for observability, incident response, and operational excellence in line with SRE principles. Drive automation and self-healing mechanisms across platform components. Provide technical leadership and hands-on coding as needed. Monitor, troubleshoot, and resolve production issues, conducting root cause analysis and … platform engineering experience. Strong Java expertise with deep understanding of backend design patterns and frameworks (Spring Boot preferred). Proven experience in Site Reliability Engineering (SRE), including monitoring, alerting, and incident management. Hands-on experience with Kafka, MuleSoft, and Oracle DB. Familiarity with performance tuning, system design, and distributed computing concepts. Experience with CI/CD …
Farnborough, Hampshire, England, United Kingdom Hybrid / WFH Options
Addition
Site Reliability Engineer (Defence). This is a chance to join a forward-thinking digital solutions business delivering secure technology for the Defence and Security sector. As a Site Reliability Engineer, you’ll be at the heart of building, scaling, and maintaining critical platforms that underpin mission-ready technology. Role Overview: Role: Site Reliability Engineer … Security What You’ll Be Doing: Designing and maintaining Kubernetes environments for scalable deployments. Building and optimising CI/CD pipelines to improve efficiency. Implementing monitoring systems to ensure reliability and performance. Driving automation initiatives to reduce manual processes. Managing repositories and version control for seamless collaboration. Partnering with development teams to align platform capabilities with requirements. Supporting long … in security, maintainability, and scalability. Staying ahead of emerging technologies to keep the platform cutting-edge. Main Skills Needed: Applicants must be eligible for Security Clearance. Proven experience in Site Reliability or Platform Engineering (5+ years). Strong knowledge of Kubernetes and container orchestration. Expertise in CI/CD tools (Jenkins, GitLab, etc.). Experience with AWS …
Site Reliability Engineer (International Travel). This is a chance to join a forward-thinking digital solutions business delivering secure technology for the Defence and Security sector. As a Site Reliability Engineer, you’ll be at the heart of building, scaling, and maintaining critical platforms that underpin mission-ready technology. Role Overview: Role: Site Reliability … Security What You’ll Be Doing: Designing and maintaining Kubernetes environments for scalable deployments. Building and optimising CI/CD pipelines to improve efficiency. Implementing monitoring systems to ensure reliability and performance. Driving automation initiatives to reduce manual processes. Managing repositories and version control for seamless collaboration. Partnering with development teams to align platform capabilities with requirements. Supporting long … the platform cutting-edge. Main Skills Needed: Applicants must be eligible for Security Clearance. Happy to travel internationally according to project requirements (all costs covered). Proven experience in Site Reliability or Platform Engineering (4+ years). Strong knowledge of Kubernetes and container orchestration. Expertise in CI/CD tools (Jenkins, GitLab, etc.). Experience with AWS …
Junior Site Reliability … Engineer. We are currently working with a leading Financial Services company that is looking for a Junior Site Reliability Engineer to join its ever-expanding platform/SRE team, based at its Shoreditch, London office, where you will be expected to work on-site 4 days a week. They are looking for you to have excellent cloud knowledge … ideally AWS, as well as experience of PowerShell/Python. As the Junior Site Reliability Engineer, you will be a self-starter with excellent stakeholder management experience who can show outcome-based work. You will ideally have 2 years of commercial experience, coming from an IT Operations/Cloud infrastructure background. Please note this is an …
to debug, optimize code, and to automate routine tasks. Excellent problem-solving approach, with effective verbal and written communication skills. About the job: Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and our … externally-visible systems, have reliability, uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs will keep an ever-watchful eye on our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to … manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate …
FDM is a global business and technology consultancy seeking a Private Cloud SRE Manager to work for our client within the financial services sector. This is initially a 12-month contract with the potential to extend and become permanent. It will be a hybrid role based in Leeds or Manchester. Our client is looking for a passionate and experienced Engineer … to join the Site Reliability Engineering (SRE) team to help run and evolve one of the group’s most critical platforms. You’ll be a key contributor to the stability, performance, and scalability of services, supporting the organisation's digital transformation and long-term technology vision. You’ll work actively with container platforms, VMware infrastructure, and observability tooling … ensuring their services are resilient and efficient. You’ll also lead and participate in post-mortems, drive automation, and continuously improve the platform through engineering-led solutions. This role also involves working in Agile environments, collaborating across multiple teams and disciplines to deliver high-quality outcomes at pace. Responsibilities: Enhance and support a wide range of platform technologies, including …
Join our team as a MongoDB Site Reliability Engineer, where you'll be at the forefront of designing and maintaining robust, high-performance systems that power critical financial services. In this dynamic and fast-paced environment, your role will be essential to ensuring our infrastructure remains resilient, secure, and scalable. You'll work on automating operations, enhancing system … If you're motivated by solving multi-layered problems and building systems that perform reliably amid shifting priorities, we encourage you to apply. To be successful as a MongoDB Site Reliability Engineer, you should have experience with: Working in Site Reliability Engineering, DevOps, and MongoDB administration in financial services. Using MongoDB features like replica sets, sharding …
Cheltenham, Gloucestershire, South West, United Kingdom
Oscar Associates (UK) Limited
Site Reliability Engineer | Cheltenham | £600 per day (Outside IR35) About the Role: We're seeking an experienced Site … Reliability Engineer with live eDV clearance to join an on-site team in Cheltenham. This contract role involves supporting and maintaining a managed cross-domain service, applying SRE practices to ensure reliability, security, and performance. Contract Details: Employment: Contract (Outside IR35). Rate: £600 per day. Length: 6 months (long-term extensions very likely). Location: Cheltenham … days on-site). Clearance: Live eDV required. Start Date: ASAP. Key Responsibilities: Build and deploy code using Java, Maven, NPM, Terraform, and Ansible across OpenShift, RHEL/CentOS, and Docker. Monitor and optimise system performance with Influx and Grafana. Provide 2nd/3rd line support, incident response, and root cause analysis. Carry out BAU maintenance including patching, database housekeeping …
Role Overview: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our engineering team to support critical application deployments in a "follow-the-sun" environment. In this role, you will leverage your expertise in cloud provisioning, infrastructure as code, and container orchestration to ensure the reliability, scalability, and performance of our … and versioning. Containerization and Orchestration: Deploy, manage, and provide ongoing support for containerized applications using Kubernetes, including Amazon EKS (Elastic Kubernetes Service) and Azure Kubernetes Service (AKS), ensuring their reliability, availability, and performance. Monitoring and Alerting: Monitor application performance and system health through observability tools (e.g., Prometheus, Grafana, ELK stack), proactively identifying and resolving issues to ensure high availability … tasks and manage configurations. Load Balancing: Implement and maintain load balancing solutions to ensure optimal distribution of application traffic and high availability. Collaboration with Development Teams: Collaborate with software engineering teams to design, develop, and maintain robust systems and solutions, including RESTful APIs, ensuring seamless integration across platforms. Post-Mortem Analysis: Conduct comprehensive post-mortem analyses following incidents, identifying …
CDW. JOB TITLE: Senior Automation Engineer II. DEPARTMENT: DevOps Engineer. ROLE PURPOSE: This role is to design, build, and scale enterprise cloud platforms with a strong focus on automation, reliability, and developer experience. As part of the Cloud Infrastructure & DevOps team, you will build multi-cloud infrastructure that powers hundreds of production services, including critical Salesforce DevOps pipelines. You … environments. Drive infrastructure compliance, DevSecOps, and policy-as-code practices. KNOWLEDGE, SKILLS AND EXPERIENCE: Minimum 5 years of experience in Platform Engineering, Site Reliability Engineering (SRE), or DevOps roles supporting cloud-native enterprise environments. Proficient in Microsoft Azure and AWS platforms with hands-on experience in Kubernetes (AKS/EKS), Helm charts, and service mesh technologies … or HashiCorp Terraform Associate are advantageous. Strong interpersonal skills including clear communication, collaboration across teams, adaptability in fast-paced environments, and a proactive mindset with a focus on reliability, performance, and developer enablement. We make technology work so people can do great things. CDW is a leading multi-brand provider of information technology solutions to business, government, education and …
an essential role in supporting AWS public cloud infrastructure while championing automation through Infrastructure as Code solutions such as Terraform. Your day-to-day activities will involve collaborating with SRE and engineering teams to enhance system observability, proactively managing operational risks, maintaining high standards of security compliance, and ensuring robust disaster recovery capabilities. You will be responsible for documenting … Maintain the reliability and security of cloud environments by implementing robust monitoring tools and adhering to industry best practices. Enhance observability and telemetry within cloud-hosted environments using SRE methodologies to deliver on Service Level Agreements (SLAs), Objectives (SLOs), and Indicators (SLIs). Document and regularly review operational risks within the cloud environment, ensuring that identified issues are tracked … for all cloud-hosted services through effective backup strategies and disaster recovery processes, including planning and conducting quarterly DR tests. Collaborate closely with Site Reliability Engineering (SRE) and engineering teams to ensure optimal management of the cloud environment. Support asset management processes throughout their lifecycle, ensuring compliance with end-of-service (EOS) and end-of-life …
has helped build some of the world's largest companies. Our team in London is growing and we're looking for talented people to join us on our journey. Engineering at Duffel: We're building tools to simplify travel distribution, search and booking. What does this actually mean? It's one common and seamless API. This brings huge technical … experience to go with it. The tools used on the team include Elixir, Phoenix, Kubernetes and Google Cloud Platform. Site Reliability Engineering at Duffel: As an SRE at Duffel, you'll be part of a small team within engineering that is responsible for the reliability, performance, and resilience of our infrastructure and applications. You will … be working closely with engineering teams to understand their needs and help meet the demands of our product as we scale globally. What we're looking for: An infrastructure and systems engineering generalist who is comfortable diving deep into the weeds on different issues. Some recent examples include: A configuration issue between Google's Load Balancer and the …
JOB TITLE: Google Product Site Reliability Engineer. LOCATION: London. HOURS: Full-time, 35 hours per week. WORKING PATTERN: Our work style is hybrid, which involves spending at least two days per week, or 40% of our time, at our London office. About this opportunity: We're modernising with cloud, a platform that is quick, secure and resilient for … customers and easy, modern and green for developers. We're looking for a Google Product Site Reliability Engineer to join our Public Cloud Platform. You'll have a unique opportunity to be part of an ambitious team with the purpose of driving our tech modernisation agenda and enabling us to become the biggest Fintech in the UK. The … learn and develop your engineering skills. It would be great if you also had: Direct experience in cloud engineering, with understanding and demonstrable experience of SRE principles. Strong Linux background, including working with filesystems and processes. Good understanding of the SDLC. Experience building fault-tolerant systems and strong DR policies. You'll be able to demonstrate …
Core, BCG X, and CT worldwide. This role is also accountable for embedding security within DevSecOps practices, enforcing automation at scale, and applying Site Reliability Engineering (SRE) principles across all security services. The role requires strong partnership with ISRM, with a focus on balancing and prioritizing security requirements, automation opportunities, user experience needs, and broader business outcomes. … that support modern work scenarios, remote access, zero-trust networking, and AI/ML workloads. Leverage automation frameworks and IaC to improve scalability and reduce manual intervention. Operational Security, SRE & Assurance: Ensure security platforms are resilient, continuously monitored, and designed for 24x7 support and incident response readiness. Embed security telemetry and observability to enable proactive threat detection and automated response. … Apply SRE principles to improve reliability, performance, and maintainability of security services. Lead platform health, patching automation, and vulnerability remediation workflows. Define service level objectives (SLOs) and key performance indicators (KPIs) for all security services. Compliance, Governance & Risk Management: Ensure alignment with global compliance requirements such as ISO 27001, NIST, SOC 2, GDPR, and others. Partner with governance, legal …
enabling innovation and agility across BCG Core, BCG X, and CT worldwide. This role is accountable for embedding security within DevSecOps practices, applying Site Reliability Engineering (SRE) principles across all security services, and aligning with privacy, compliance, and business leaders to maintain trust and regulatory compliance. Key Responsibilities: Strategic Leadership & Transformation: Define and execute a unified security … remote access, zero-trust networking, and protection of sensitive data in AI/ML workloads. Leverage automation frameworks and IaC to improve scalability and reduce manual intervention. Operational Security, SRE & Assurance: Ensure security platforms are resilient, continuously monitored, and designed for 24x7 support and incident response readiness. Embed security telemetry and observability to enable proactive threat detection and automated response. … Apply SRE principles to improve reliability, performance, and maintainability of security services. Define service level objectives (SLOs) and key performance indicators (KPIs) for all security services. Compliance, Governance & Risk Management: Ensure alignment with global compliance requirements such as ISO 27001, NIST, SOC 2, GDPR, and others. Partner with governance, legal, and ISRM teams to implement enforceable policies and standards …
Bath, Avon, England, United Kingdom Hybrid / WFH Options
Deerfoot Recruitment Solutions Ltd
Site Reliability Engineer | Work From Home (WFH) + Quarterly Visits to Bath | Full Time, Initial 12 Month Fixed Term Contract | Salary DOE (~£45k - £60k) + Benefits + Bonus. Deerfoot Recruitment is working with an established FCA-authorised outsourced service provider in the financial services sector, seeking a talented Site Reliability Engineer to join their IT Operations team. This role … cloud architecture. Engage in infrastructure design, implementation, and operation to ensure highly available, scalable systems. Work collaboratively across development and operations teams throughout the software lifecycle. Champion system automation, reliability, and continuous improvement initiatives. Monitor production systems with auto … healing and auto-scaling methodologies. Support CI/CD pipelines and streamline infrastructure-as-code workflows. Maintain strong security-first practices within infrastructure design and management. About You: Proven SRE generalist with broad cloud infrastructure experience and adaptability. Experience deploying cloud infrastructure in a regulated financial services environment. Skilled in Terraform and PowerShell automation tools. Familiarity with Windows Server and …