SiteReliabilityEngineering (SRE) Manager page is loaded SiteReliabilityEngineering (SRE) Managerlocations: London, UKtime type: Full timeposted on: Posted Todayjob requisition id: R35765As a leading financial services and healthcare technology company based on revenue, SS&C is headquartered in Windsor, Connecticut, and has 27,000+ employees in 35 countries. Some 20,000 financial … name a few. SiteReliability Manager Locations : London, Surbiton, Essex Hybrid The Opportunity We are seeking a highly motivated and experienced SiteReliabilityEngineering (SRE) Manager to lead a team of SREs responsible for the reliability, scalability, and performance of our production systems. This role is pivotal in bridging the gap between development and … direction of infrastructure and reliability initiatives. Advocate for best practices in observability, CI/CD, and infrastructure as code. What You Will Bring: Proven experience managing or leading SRE, DevOps, or infrastructure teams. Strong background in systems engineering, cloud platforms (AWS, Azure), and container orchestration (Kubernetes). Proficiency in monitoring, alerting, and incident management tools (Prometheus, Grafana, PagerDuty More ❯
SiteReliability Engineer | £65,000–£95,000 DOE | Hybrid (Bristol-based, occasional site visits) Clearance: Must be eligible for DV Clearance Founded in 2019 by engineers solving complex cross-domain problems for government organisations, TwinStream delivers technical excellence and exceptional service to high-profile clients. Our teams work both on-site … and remotely, supporting mission-critical systems where performance and reliability are paramount. The SiteReliability Engineer Role: We are seeking a SiteReliability Engineer (SRE) to ensure the availability, performance, and cost-effectiveness of our cloud and on-prem services. You will collaborate with software engineers and system administrators to improve observability, reduce downtime, and … proactively mitigate reliability risks across a growing portfolio of services. Key Responsibilities of the SiteReliability Engineer: Improve reliability and performance across multiple subsystems. Automate manual tasks and eliminate unnecessary alerts. Enhance monitoring capabilities to identify and resolve issues before they impact users. Support and optimise CI/CD pipelines and cloud infrastructure. Research and evaluate More ❯
SiteReliability Engineer £65,000 £95,000 DOE Hybrid (Bristol-based, occasional site visits) Clearance: Must be eligible for DV Clearance Founded in 2019 by engineers solving complex cross-domain problems for government organisations, TwinStream delivers technical excellence and exceptional service to high-profile clients. Our teams work both on-site … and remotely, supporting mission-critical systems where performance and reliability are paramount. The SiteReliability Engineer Role: We are seeking a SiteReliability Engineer (SRE) to ensure the availability, performance, and cost-effectiveness of our cloud and on-prem services. You will collaborate with software engineers and system administrators to improve observability, reduce downtime, and … proactively mitigate reliability risks across a growing portfolio of services. Key Responsibilities of the SiteReliability Engineer: Improve reliability and performance across multiple subsystems. Automate manual tasks and eliminate unnecessary alerts. Enhance monitoring capabilities to identify and resolve issues before they impact users. Support and optimise CI/CD pipelines and cloud infrastructure. Research and evaluate More ❯
SiteReliability Engineer | £65,000–£95,000 DOE | Hybrid (Bristol-based, occasional site visits)Clearance: Must be eligible for DV Clearance Founded in 2019 by engineers solving complex cross-domain problems for government organisations, TwinStream delivers technical excellence and exceptional service to high-profile clients. Our teams work both on-site … and remotely, supporting mission-critical systems where performance and reliability are paramount. The SiteReliability Engineer Role: We are seeking a SiteReliability Engineer (SRE) to ensure the availability, performance, and cost-effectiveness of our cloud and on-prem services. You will collaborate with software engineers and system administrators to improve observability, reduce downtime, and … proactively mitigate reliability risks across a growing portfolio of services. Key Responsibilities of the SiteReliability Engineer: Improve reliability and performance across multiple subsystems. Automate manual tasks and eliminate unnecessary alerts. Enhance monitoring capabilities to identify and resolve issues before they impact users. Support and optimise CI/CD pipelines and cloud infrastructure. Research and evaluate More ❯
Bristol, Avon, South West, United Kingdom Hybrid / WFH Options
Twinstream Limited
SiteReliability Engineer | £65,000–£95,000 DOE | Hybrid (Bristol-based, occasional site visits) Clearance: Must be eligible for DV Clearance Founded in 2019 by engineers solving complex cross-domain problems for government organisations, TwinStream delivers technical excellence and exceptional service to high-profile clients. Our teams work both on-site … and remotely, supporting mission-critical systems where performance and reliability are paramount. The SiteReliability Engineer Role: We are seeking a SiteReliability Engineer (SRE) to ensure the availability, performance, and cost-effectiveness of our cloud and on-prem services. You will collaborate with software engineers and system administrators to improve observability, reduce downtime, and … proactively mitigate reliability risks across a growing portfolio of services. Key Responsibilities of the SiteReliability Engineer: Improve reliability and performance across multiple subsystems. Automate manual tasks and eliminate unnecessary alerts. Enhance monitoring capabilities to identify and resolve issues before they impact users. Support and optimise CI/CD pipelines and cloud infrastructure. Research and evaluate More ❯
Preferred qualifications: Master's degree or PhD in Computer Science, or a related technical field. Experience as a cloud customer. About the job SiteReliabilityEngineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services-both our internally critical and our … externally-visible systems-have reliability, uptime appropriate to customer's needs and a fast rate of improvement. Additionally SRE's will keep an ever-watchful eye on our systems capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to … manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate More ❯
Job Title: Platform Engineer/SRE Work Location: Bromley/Chester, UK (Hybrid 3 days in a week) Job Description: We are seeking a Platform Engineer/SRE with a strong and diverse technical background. The ideal candidate will possess hands-on development experience along with SiteReliabilityEngineering (SRE) expertise. This role requires a proactive individual … who can lead by example, address platform stability issues, and develop resilient and reliable systems. Key Responsibilities: Provide hands-on technical leadership in platform engineering initiatives. Ensure platform stability and resilience by identifying and resolving reliability issues. … Collaborate with cross-functional teams to deliver scalable and robust system solutions. Key Skills Required: Strong development experience in Java (primary skill). SiteReliabilityEngineering ( SRE ) experience. Proficiency with Kafka , Mule , and Oracle Database . Ability to work at a managerial level while remaining hands-on with technical tasks. Nice to Have: Knowledge of payment systems More ❯
Chester, Cheshire West and Chester, Cheshire, United Kingdom Hybrid / WFH Options
Ascendion
Job Title: Platform Engineer/SRE Work Location: Bromley/Chester, UK (Hybrid – 3 days in a week) Job Description: We are seeking a Platform Engineer/SRE with a strong and diverse technical background. The ideal candidate will possess hands-on development experience along with SiteReliabilityEngineering (SRE) expertise. This role requires a proactive individual … who can lead by example, address platform stability issues, and develop resilient and reliable systems. Key Responsibilities: Provide hands-on technical leadership in platform engineering initiatives. Ensure platform stability and resilience by identifying and resolving reliability issues. … Collaborate with cross-functional teams to deliver scalable and robust system solutions. Key Skills Required: Strong development experience in Java (primary skill). SiteReliabilityEngineering ( SRE ) experience. Proficiency with Kafka , Mule , and Oracle Database . Ability to work at a managerial level while remaining hands-on with technical tasks. Nice to Have: Knowledge of Payments systems More ❯
Job Title: Platform Engineer/SRE Work Location: Bromley/Chester, UK (Hybrid – 3 days in a week) Job Description: We are seeking a Platform Engineer/SRE with a strong and diverse technical background. The ideal candidate will possess hands-on development experience along with SiteReliabilityEngineering (SRE) expertise. This role requires a proactive individual … who can lead by example, address platform stability issues, and develop resilient and reliable systems. Key Responsibilities: Provide hands-on technical leadership in platform engineering initiatives. Ensure platform stability and resilience by identifying and resolving reliability issues. … Collaborate with cross-functional teams to deliver scalable and robust system solutions. Key Skills Required: Strong development experience in Java (primary skill). SiteReliabilityEngineering ( SRE ) experience. Proficiency with Kafka , Mule , and Oracle Database . Ability to work at a managerial level while remaining hands-on with technical tasks. Nice to Have: Knowledge of payment systems More ❯
Reliability Engineer III Apply locations Belfast - Millennium House time type Full time posted on Posted 3 Days Ago job requisition id 32913 CME Group is seeking a SRE III to help, build, operate and scale systems in our Markets portfolio. Markets SREs work on products and applications related to CME's Globex trading platform. Our systems deliver an … learn how we observe, monitor, automate, and improve Production service reliability and act as a mentor to junior colleagues. He/she will have a keen interest in SRE and enjoy the cut-and-thrust of operating Production systems. They will be a strong communicator, and may have previously worked in an SRE role, a software engineering role … ideas and reliability improvement suggestions to the Product backlog Support the migration of markets applications to Google Cloud Platform (GCP) Act as a mentor to L2 and L1 SRE colleagues What We're Looking for: Experience with Linux-based systems Experience with Cloud-based platform(s) - Google Cloud Platform, GCE, and/or GKE a bonus Understanding of application More ❯
Chester, Cheshire West and Chester, Cheshire, United Kingdom
Ascendion
Job Description: We are seeking a Platform Engineering Manager with a strong hands-on background in Java development and SiteReliabilityEngineering (SRE). The ideal candidate will have a broad technical skillset across Java, Spring, MuleSoft, Kafka, and Oracle DB, and must be capable of leading platform stability efforts while contributing directly to development. Experience … and implement improvements. Architect and develop resilient backend systems primarily using Java, Spring, Kafka, and Oracle. Implement best practices for observability, incident response, and operational excellence in line with SRE principles. Drive automation and self-healing mechanisms across platform components. Provide technical leadership and hands-on coding as needed. Monitor, troubleshoot, and resolve production issues, conducting root cause analysis and … platform engineering experience. Strong Java expertise with deep understanding of backend design patterns and frameworks (Spring Boot preferred). Proven experience in SiteReliabilityEngineering (SRE), including monitoring, alerting, and incident management. Hands-on experience with Kafka, MuleSoft, and Oracle DB. Familiarity with performance tuning, system design, and distributed computing concepts. Experience with CI/CD More ❯
Farnborough, Hampshire, England, United Kingdom Hybrid / WFH Options
Addition
SiteReliability Engineer (Defence) This is a chance to join a forward-thinking digital solutions business delivering secure technology for the Defence and Security sector. As a SiteReliability Engineer, you’ll be at the heart of building, scaling, and maintaining critical platforms that underpin mission-ready technology. Role Overview: Role: SiteReliability Engineer … Security What You’ll Be Doing: Designing and maintaining Kubernetes environments for scalable deployments. Building and optimising CI/CD pipelines to improve efficiency. Implementing monitoring systems to ensure reliability and performance. Driving automation initiatives to reduce manual processes. Managing repositories and version control for seamless collaboration. Partnering with development teams to align platform capabilities with requirements. Supporting long … in security, maintainability, and scalability. Staying ahead of emerging technologies to keep the platform cutting-edge. Main Skills Needed: Applications must be eligible for Security Clearance. Proven experience in SiteReliability or Platform Engineering (5+ years). Strong knowledge of Kubernetes and container orchestration. Expertise in CI/CD tools (Jenkins, GitLab, etc.). Experience with AWS More ❯
SiteReliability Engineer (International Travel) This is a chance to join a forward-thinking digital solutions business delivering secure technology for the Defence and Security sector. As a SiteReliability Engineer, you’ll be at the heart of building, scaling, and maintaining critical platforms that underpin mission-ready technology. Role Overview: Role: SiteReliability … Security What You’ll Be Doing: Designing and maintaining Kubernetes environments for scalable deployments. Building and optimising CI/CD pipelines to improve efficiency. Implementing monitoring systems to ensure reliability and performance. Driving automation initiatives to reduce manual processes. Managing repositories and version control for seamless collaboration. Partnering with development teams to align platform capabilities with requirements. Supporting long … the platform cutting-edge. Main Skills Needed: Applications must be eligible for Security Clearance. Happy to travel internationally according to project requirements (All costs covered). Proven experience in SiteReliability or Platform Engineering (4+ years). Strong knowledge of Kubernetes and container orchestration. Expertise in CI/CD tools (Jenkins, GitLab, etc.). Experience with AWS More ❯
Junior SiteReliability … Engineer We are currently working with a leading Financial Services company, who are looking for a Junior SiteReliability Engineer to join their ever-expanding platform/SRE team from their Shoreditch, London, Office where you will be expected to travel to the office 4 days a week. They are looking for you to have excellent cloud knowledge … ideally AWS as well as having experience of Powershell/Python. As the Junior SiteReliability Engineer, you will be a self-starter who has excellent stakeholder management experience who can show outcome based work. You will ideally have 2 years of commercial experience coming from an IT Operations/Cloud infrastructure background. Please note this is an More ❯
to debug, optimize code, and to automate routine tasks. Excellent problem-solving approach, with effective verbal and written communication skills. About the job SiteReliabilityEngineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services-both our internally critical and our … externally-visible systems-have reliability, uptime appropriate to customer's needs and a fast rate of improvement. Additionally SRE's will keep an ever-watchful eye on our systems capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to … manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate More ❯
FDM is a global business and technology consultancy seeking a Private Cloud SRE Manager to work for our client within the financial services sector. This is initially a 12-month contract with the potential to extend and become permanent. It will be a hybrid role based in Leeds or Manchester. Our client is looking for a passionate and experienced Engineer … to join the SiteReliabilityEngineering (SRE) team to help run and evolve one of the group’s most critical platforms. You’ll be a key contributor to the stability, performance, and scalability of services, supporting the organisations digital transformation and long-term technology vision. You’ll work actively with container platforms, VMware infrastructure, and observability tooling … ensuring their services are resilient and efficient. You’ll also lead and participate in post-mortems, drive automation, and continuously improve the platform through engineering-led solutions. This role also involves working in Agile environments, collaborating across multiple teams and disciplines to deliver high-quality outcomes at pace. Responsibilities Enhance and support a wide range of platform technologies, including More ❯
Join our team as a MongoDB SiteReliability Engineer, where you'll be at the forefront of designing and maintaining robust, high-performance systems that power critical financial services. In this dynamic and fast-paced environment, your role will be essential to ensuring our infrastructure remains resilient, secure, and scalable. You'll work on automating operations, enhancing system … If you're motivated by solving, multi-layered problems and building systems that perform reliably amid shifting priorities, we encourage you to apply. To be successful as a MongoDB SiteReliability Engineer, you should have experience with: Working in SiteReliabilityEngineering, DevOps, and MongoDB administration in financial services. Using MongoDB features like replicaset, sharding More ❯
Lisburn, County Antrim, United Kingdom Hybrid / WFH Options
Camlin
industries, including power and rail, and also has interests in a number of R&D projects in a variety of scientific sectors. At Camlin we believe in high quality engineering and design, allowing us to develop market leading products and services. In short, we love creating value for our customers by solving … difficult problems. As of today, the Camlin operation spans over 20 countries across the globe. Job Overview We are seeking a dedicated and experienced SiteReliability Engineer (SRE) to join our dynamic team. The SRE will be responsible for ensuring the reliability, performance, and availability of our critical systems and services. This role requires a blend of … software engineering and operations skills to build and run large-scale, distributed, fault-tolerant systems. Key Responsibilities System Reliability and Performance Design, implement, and maintain scalable and reliable infrastructure. Monitor system performance, detect issues, and ensure maximum uptime. Develop and implement strategies for disaster recovery and data backup. Automation and Tooling Automate repetitive tasks to improve efficiency and More ❯
Sr SiteReliability Engineer page is loaded Sr SiteReliability Engineer Apply locations Ireland United Kingdom time type Full time posted on Posted 4 Days Ago job requisition id REQ - 04040 - It's fun to work in a company where people truly BELIEVE in what they're doing! We're committed to bringing passion and customer … focus to the business. Job Description SENIOR SITERELIABILITY ENGINEER WHAT YOU'LL DO: Collaborate with software development teams to enable reliable and stable software services Develop software solutions that enhance the reliability and performance of services Optimize software release and deployment of ABC systems and cloud infrastructure in AWS Be an advocate for availability, reliability and scalability practices Define and adhere to Service Level Objectives and adhere to standard processes Enable product engineering teams through support of automated deployment pipelines Collaborate with product development as an advocate for scalable architectural approaches Advocate for infrastructure and application security practices in the development process Respond to production incidents in a balanced rotation with other SREs More ❯
Cheltenham, Gloucestershire, South West, United Kingdom
Oscar Associates (UK) Limited
SiteReliability Engineer | Cheltenham | £600 per day (Outside IR35) About the Role: We're seeking an experienced Site … Reliability Engineer with live eDV clearance to join an on-site team in Cheltenham. This contract role involves supporting and maintaining a managed cross-domain service, applying SRE practices to ensure reliability, security, and performance. Contract Details: Employment: Contract (Outside IR35) Rate: £600 per day Length: 6 months (long-term extensions very likely). Location: Cheltenham … days on-site) Clearance: Live eDV required Start Date: ASAP Key Responsibilities: Build and deploy code using Java, Maven, NPM, Terraform, and Ansible across OpenShift, RHEL/CentOS, and Docker. Monitor and optimise system performance with Influx and Grafana. Provide 2nd/3rd line support, incident response, and root cause analysis. Carry out BAU maintenance including patching, database housekeeping More ❯
Role Overview: We are seeking a highly skilled and motivated SiteReliability Engineer (SRE) to join our engineering team to support critical application deployments in a "follow-the-sun" environment. In this role, you will leverage your expertise in cloud provisioning, infrastructure as code, and container orchestration to ensure the reliability, scalability, and performance of our … and versioning. Containerization and Orchestration: Deploy, manage, and provide ongoing support for containerized applications using Kubernetes, including Amazon EKS (Elastic Kubernetes Service) and Azure Kubernetes Service (AKS), ensuring their reliability, availability, and performance. Monitoring and Alerting: Monitor application performance and system health through observability tools (e.g., Prometheus, Grafana, ELK stack), proactively identifying and resolving issues to ensure high availability … tasks and manage configurations. Load Balancing: Implement and maintain load balancing solutions to ensure optimal distribution of application traffic and high availability. Collaboration with Development Teams: Collaborate with software engineering teams to design, develop, and maintain robust systems and solutions, including RESTful APIs, ensuring seamless integration across platforms. Post-Mortem Analysis: Conduct comprehensive post-mortem analyses following incidents, identifying More ❯
CDW. JOB TITLE: Senior Automation Engineer II DEPARTMENT: DevOps Engineer ROLE PURPOSE: This role is to design, build, and scale enterprise cloud platforms with a strong focus on automation, reliability, and developer experience. As part of the Cloud Infrastructure & DevOps team, you will build multi-cloud infrastructure that powers hundreds of production services, including critical Salesforce DevOps pipelines. You … environments. Drive infrastructure compliance, DevSecOps, and policy-as-code practices. KNOWLEDGE, SKILLS AND EXPERIENCE: Minimum 5 years of experience in Platform Engineering, SiteReliabilityEngineering (SRE), or DevOps roles supporting cloud-native enterprise environments Proficient in Microsoft Azure and AWS platforms with hands-on experience in Kubernetes (AKS/EKS), Helm charts, and service mesh technologies … or HashiCorp Terraform Associate are advantageous Strong interpersonal skills including clear communication, collaboration across teams, adaptability in fast-paced environments, and a proactive mindset with a focus on reliability, performance, and developer enablement We make technology work so people can do great things. CDW is a leading multi-brand provider of information technology solutions to business, government, education and More ❯
an essential role in supporting AWS public cloud infrastructure while championing automation through Infrastructure as Code solutions such as Terraform. Your day-to-day activities will involve collaborating with SRE and engineering teams to enhance system observability, proactively managing operational risks, maintaining high standards of security compliance, and ensuring robust disaster recovery capabilities. You will be responsible for documenting … Maintain the reliability and security of cloud environments by implementing robust monitoring tools and adhering to industry best practices.* Enhance observability and telemetry within cloud-hosted environments using SRE methodologies to deliver on Service Level Agreements (SLAs), Objectives (SLOs), and Indicators (SLIs).* Document and regularly review operational risks within the cloud environment, ensuring that identified issues are tracked … for all cloud-hosted services through effective backup strategies and disaster recovery processes, including planning and conducting quarterly DR tests.* Collaborate closely with SiteReliabilityEngineering (SRE) and engineering teams to ensure optimal management of the cloud environment.* Support asset management processes throughout their lifecycle, ensuring compliance with end-of-service (EOS) and end-of-life More ❯
SiteReliability Engineer - Barcelona Join a leading global travel technology company that's transforming the way businesses manage travel. You will be working with cutting-edge platforms combined with world-class travel inventory with powerful management tools, delivering freedom for … travellers and control for companies, saving time, money,and hassle for everyone. Our client are an award-winning scale-up company seeking a talented SiteReliability Engineer (SRE) to help ensure our systems are fast, reliable, and ready to scale. What is in t for you: Salary up to €85,000plus equity in the company. Generous holiday allowance … company events. Mental health support tools. Significant opportunities for growth and progression. The role: Design, build, and maintain scalable, secure cloud infrastructure in AWS. Monitor and manage system performance, reliability, and security. Implement and refine monitoring tools to ensure system health and availability. Partner with development teams to create resilient, secure applications. Participate in our on-call rotation, responding More ❯
has helped build some of the world's largest companies. Our team in London is growing and we're looking for talented people to join us on our journey Engineering at Duffel We're building tools to simplify travel distribution, search and booking. What does this actually mean? It's one common and seamless API. This brings huge technical … experience to go with it. The tools used on the team include Elixir, Phoenix, Kubernetes and Google Cloud Platform. SiteReliabilityEngineering at Duffel As an SRE at Duffel, you'll be part of a small team within engineering that is responsible for the reliability, performance, and resilience of our infrastructure and applications. You will … be working closely with engineering teams to understand their needs and help meet the demands of our product as we scale globally. What we're looking for - An infrastructure and systems engineering generalist who is comfortable diving deep into the weeds on different issues. Some recent examples include: - A configuration issue between Google's Load Balancer and the More ❯