Site Reliability Engineering (SRE) Manager. Locations: London, UK. Time type: Full time. Posted on: Posted Today. Job requisition ID: R35765. As a leading financial services and healthcare technology company based on revenue, SS&C is headquartered in Windsor, Connecticut, and has 27,000+ employees in 35 countries. Some 20,000 financial … name a few. Site Reliability Manager Locations: London, Surbiton, Essex Hybrid The Opportunity We are seeking a highly motivated and experienced Site Reliability Engineering (SRE) Manager to lead a team of SREs responsible for the reliability, scalability, and performance of our production systems. This role is pivotal in bridging the gap between development and … direction of infrastructure and reliability initiatives. Advocate for best practices in observability, CI/CD, and infrastructure as code. What You Will Bring: Proven experience managing or leading SRE, DevOps, or infrastructure teams. Strong background in systems engineering, cloud platforms (AWS, Azure), and container orchestration (Kubernetes). Proficiency in monitoring, alerting, and incident management tools (Prometheus, Grafana, PagerDuty
Bristol, Avon, South West, United Kingdom Hybrid / WFH Options
Twinstream Limited
Site Reliability Engineer | £65,000–£95,000 DOE | Hybrid (Bristol-based, occasional site visits) Clearance: Must be eligible for DV Clearance Founded in 2019 by engineers solving complex cross-domain problems for government organisations, TwinStream delivers technical excellence and exceptional service to high-profile clients. Our teams work both on-site … and remotely, supporting mission-critical systems where performance and reliability are paramount. The Site Reliability Engineer Role: We are seeking a Site Reliability Engineer (SRE) to ensure the availability, performance, and cost-effectiveness of our cloud and on-prem services. You will collaborate with software engineers and system administrators to improve observability, reduce downtime, and … proactively mitigate reliability risks across a growing portfolio of services. Key Responsibilities of the Site Reliability Engineer: Improve reliability and performance across multiple subsystems. Automate manual tasks and eliminate unnecessary alerts. Enhance monitoring capabilities to identify and resolve issues before they impact users. Support and optimise CI/CD pipelines and cloud infrastructure. Research and evaluate
Richmond, Virginia, United States Hybrid / WFH Options
CarMax
and prevention of issues. Define and measure service level objectives (SLO/SLI) and key performance indicators (KPI). Identify patterns in issues and drive plans to improve the reliability of the platforms and solutions. Direct teams in the definition and maintenance of the data platform and solutions to ensure that the critical business insights and end analyst(s … (Enterprise Data Lake/Data Warehouse), Cloud Computing, Systems Engineering, Master Data Management (MDM), Machine Learning (ML) Engineering, Infrastructure & Operations, Site Reliability Engineering (SRE), Data Governance 5+ years experience in managing direct reports 5+ years of working experience in leading the end-to-end design, development and support of data management disciplines including [data … production support 5+ years’ experience building enterprise-grade solutions with Microsoft Azure or equivalent cloud technologies Experience in DevOps, version control systems (Git), Site Reliability Engineering (SRE) practices, testing frameworks, Infrastructure as Code, building CI/CD pipelines (preferably Azure DevOps), scripting languages such as Python and shell scripts and working in an Agile/Scrum setting.
Job Title: Platform Engineer/SRE Work Location: Bromley/Chester, UK (Hybrid, 3 days in a week) Job Description: We are seeking a Platform Engineer/SRE with a strong and diverse technical background. The ideal candidate will possess hands-on development experience along with Site Reliability Engineering (SRE) expertise. This role requires a proactive individual … who can lead by example, address platform stability issues, and develop resilient and reliable systems. Key Responsibilities: Provide hands-on technical leadership in platform engineering initiatives. Ensure platform stability and resilience by identifying and resolving reliability issues. … Collaborate with cross-functional teams to deliver scalable and robust system solutions. Key Skills Required: Strong development experience in Java (primary skill). Site Reliability Engineering (SRE) experience. Proficiency with Kafka, Mule, and Oracle Database. Ability to work at a managerial level while remaining hands-on with technical tasks. Nice to Have: Knowledge of payment systems
Chester, Cheshire West and Chester, Cheshire, United Kingdom Hybrid / WFH Options
Ascendion
Job Title: Platform Engineer/SRE Work Location: Bromley/Chester, UK (Hybrid – 3 days in a week) Job Description: We are seeking a Platform Engineer/SRE with a strong and diverse technical background. The ideal candidate will possess hands-on development experience along with Site Reliability Engineering (SRE) expertise. This role requires a proactive individual … who can lead by example, address platform stability issues, and develop resilient and reliable systems. Key Responsibilities: Provide hands-on technical leadership in platform engineering initiatives. Ensure platform stability and resilience by identifying and resolving reliability issues. … Collaborate with cross-functional teams to deliver scalable and robust system solutions. Key Skills Required: Strong development experience in Java (primary skill). Site Reliability Engineering (SRE) experience. Proficiency with Kafka, Mule, and Oracle Database. Ability to work at a managerial level while remaining hands-on with technical tasks. Nice to Have: Knowledge of payments systems
Job Title: Platform Engineer/SRE Work Location: Bromley/Chester, UK (Hybrid – 3 days in a week) Job Description: We are seeking a Platform Engineer/SRE with a strong and diverse technical background. The ideal candidate will possess hands-on development experience along with Site Reliability Engineering (SRE) expertise. This role requires a proactive individual … who can lead by example, address platform stability issues, and develop resilient and reliable systems. Key Responsibilities: Provide hands-on technical leadership in platform engineering initiatives. Ensure platform stability and resilience by identifying and resolving reliability issues. … Collaborate with cross-functional teams to deliver scalable and robust system solutions. Key Skills Required: Strong development experience in Java (primary skill). Site Reliability Engineering (SRE) experience. Proficiency with Kafka, Mule, and Oracle Database. Ability to work at a managerial level while remaining hands-on with technical tasks. Nice to Have: Knowledge of payment systems
San Jose, California, United States Hybrid / WFH Options
JPS Tech Solutions LLC
in delivering tailored staffing solutions and empowering professionals to achieve their career goals. Important: These roles are strictly for W2 candidates (No C2C or third-party submissions allowed) Role: Site Reliability Engineer Location: San Jose, CA (Hybrid) Position Type: Contract on … W2 Only Experience Required: 10+ Years of Experience Eligible Visas: USC/GC/H4-EAD Job Description: We are looking for a talented Site Reliability Engineer (SRE) with a strong background in Google Cloud Platform (GCP) and Red Hat OpenShift administration. The ideal candidate will be responsible for ensuring the reliability, performance, and scalability of … our on-premise and cloud-based systems, along with a focus on reducing costs for Google Cloud. System Reliability: Ensure the reliability and uptime of critical services and infrastructure. Google Cloud Expertise: Design, implement, and manage cloud infrastructure using Google Cloud services. Automation: Develop and maintain automation scripts and tools to improve system efficiency and reduce manual intervention. Monitoring
Farnborough, Hampshire, England, United Kingdom Hybrid / WFH Options
Addition
Site Reliability Engineer (Defence) This is a chance to join a forward-thinking digital solutions business delivering secure technology for the Defence and Security sector. As a Site Reliability Engineer, you’ll be at the heart of building, scaling, and maintaining critical platforms that underpin mission-ready technology. Role Overview: Role: Site Reliability Engineer … Security What You’ll Be Doing: Designing and maintaining Kubernetes environments for scalable deployments. Building and optimising CI/CD pipelines to improve efficiency. Implementing monitoring systems to ensure reliability and performance. Driving automation initiatives to reduce manual processes. Managing repositories and version control for seamless collaboration. Partnering with development teams to align platform capabilities with requirements. Supporting long … in security, maintainability, and scalability. Staying ahead of emerging technologies to keep the platform cutting-edge. Main Skills Needed: Applicants must be eligible for Security Clearance. Proven experience in Site Reliability or Platform Engineering (5+ years). Strong knowledge of Kubernetes and container orchestration. Expertise in CI/CD tools (Jenkins, GitLab, etc.). Experience with AWS
Bradford, Yorkshire, United Kingdom Hybrid / WFH Options
Freemans Grattan Holdings (fgh)
capabilities and optimise and enhance our customer journey. Working collaboratively with a team of transformation experts, you will have the flexibility to leverage your professional experience to solve computer engineering issues across a variety of technical areas, dependent on where your interests lie. Innovation is key as we look for new ideas which will improve the customer experience and … centre 5+ years of experience in a DevOps or Site Reliability Engineering role building high-traffic, high-availability systems. Experience with site reliability engineering (SRE) principles and monitoring tools, including New Relic. Experience in website performance monitoring and tuning using tools such as Lighthouse and the ability to troubleshoot performance issues. Proficiency in CI/
Lisburn, County Antrim, United Kingdom Hybrid / WFH Options
Camlin
industries, including power and rail, and also has interests in a number of R&D projects in a variety of scientific sectors. At Camlin we believe in high quality engineering and design, allowing us to develop market leading products and services. In short, we love creating value for our customers by solving … difficult problems. As of today, the Camlin operation spans over 20 countries across the globe. Job Overview We are seeking a dedicated and experienced Site Reliability Engineer (SRE) to join our dynamic team. The SRE will be responsible for ensuring the reliability, performance, and availability of our critical systems and services. This role requires a blend of … software engineering and operations skills to build and run large-scale, distributed, fault-tolerant systems. Key Responsibilities System Reliability and Performance Design, implement, and maintain scalable and reliable infrastructure. Monitor system performance, detect issues, and ensure maximum uptime. Develop and implement strategies for disaster recovery and data backup. Automation and Tooling Automate repetitive tasks to improve efficiency and
Washington, Washington DC, United States Hybrid / WFH Options
ClearanceJobs
Remote - Site Reliability Engineer (SRE) ClearanceJobs is aiding their partner, headquartered in New York City and widely recognized as the industry leader in CPS protection, in their search for a skilled Site Reliability Engineer (SRE). The selected candidate will support and maintain our customers' FedRAMP-compliant deployment in AWS GovCloud for public sector customers. The … SRE will be responsible for ensuring high availability, security, and compliance of cloud-based environments while driving automation, monitoring, and incident response best practices. U.S. Citizenship (required for working in GovCloud environments) Terms: Full-time/Direct Hire Location: Remote (DMV area) Salary: $200k - $260k (will fluctuate pending experience) Qualifications: • 6-8+ years of experience in SRE, DevOps, or Cloud … and scripting (Python, Bash). • Experience with logging, monitoring, and observability tools in a cloud-native environment. • Strong troubleshooting, problem-solving, and automation mindset. Responsibilities/Impact as an SRE: • AWS GovCloud Operations: Manage and optimize cloud-based infrastructure in AWS GovCloud, ensuring FedRAMP compliance and high availability. • Reliability & Performance: Monitor and enhance system performance, scalability, and reliability
Gloucester, Gloucestershire, UK Hybrid / WFH Options
CGI
We work, build, and operate bespoke, technically complex, mission-critical systems which help our clients keep us all safe and secure. We are currently looking for experienced site reliability engineers to join our cross-functional team who, in partnership with our clients, will help define, guide and assure the delivery of integrated solutions. The role offers fantastic opportunities … ELK stack, Terraform, Grafana, SonarQube, OpenShift, Linux Required qualifications to be successful in this role Proven experience in Site Reliability Engineering or a similar DevOps/SRE role supporting cloud-based applications. Strong scripting and automation skills using Bash, Python, or Go. Experience with CI/CD pipelines and tools such as Jenkins, GitLab CI, and Ansible. … on big data projects is highly advantageous. Qualifications: Degree in Computer Science, Engineering, or related technical field (or equivalent practical experience). Relevant certifications in AWS, DevOps, or SRE practices are a plus. #LI-JS2 Together, as owners, let’s turn meaningful insights into action. Life at CGI is rooted in ownership, teamwork, respect and belonging. Here, you’ll
Role Overview: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our engineering team to support critical application deployments in a "follow-the-sun" environment. In this role, you will leverage your expertise in cloud provisioning, infrastructure as code, and container orchestration to ensure the reliability, scalability, and performance of our … and versioning. Containerization and Orchestration: Deploy, manage, and provide ongoing support for containerized applications using Kubernetes, including Amazon EKS (Elastic Kubernetes Service) and Azure Kubernetes Service (AKS), ensuring their reliability, availability, and performance. Monitoring and Alerting: Monitor application performance and system health through observability tools (e.g., Prometheus, Grafana, ELK stack), proactively identifying and resolving issues to ensure high availability … tasks and manage configurations. Load Balancing: Implement and maintain load balancing solutions to ensure optimal distribution of application traffic and high availability. Collaboration with Development Teams: Collaborate with software engineering teams to design, develop, and maintain robust systems and solutions, including RESTful APIs, ensuring seamless integration across platforms. Post-Mortem Analysis: Conduct comprehensive post-mortem analyses following incidents, identifying
Overview Site Reliability Engineer - Global Network Services Transformation A leading financial technology organisation is embarking on an exciting journey to transform its Network Services Group, and they're now seeking a Site Reliability Engineer to join their growing team. This opportunity is perfect for someone who thrives at the intersection of software engineering and infrastructure … reliability. The successful candidate will design, develop, and maintain self-service automation tools that drive efficiency, reduce costs, and improve resilience across one of the world's most sophisticated network infrastructures. Working with colleagues across the US, UK, India, and Singapore, this engineer will play a pivotal role in advancing the company's automation-first approach, deploying microservices … JavaScript/jQuery/HTML5/CSS is highly desirable. Familiar with Terraform or similar IaC tools. Comfortable in Linux environments; confident using VS Code. Strong grounding in software engineering practices and DevOps culture. Excellent communicator with analytical problem-solving skills. Experience in networking or security automation is a distinct advantage. Personal Qualities Proactive, with a problem-solving mindset.
Reigate, Surrey, England, United Kingdom Hybrid / WFH Options
esure Group
Reliability Engineer to join our Tech Enable team. As a Lead Engineer for Site Reliability, you must demonstrate various skills to effectively lead and engage in SRE practices. The successful candidate will act as a point of escalation for critical issues, applying technical expertise to promptly address complex problems in collaboration with additional teams. What you’ll … do: Serve as the SRE Lead's backup, assuming leadership duties when necessary to maintain the continuity and efficiency of SRE operations. Provide day-to-day guidance, support, and informed decision-making for the team, maintaining stability and direction. Serve as a subject matter expert, shaping technical direction, leading initiatives, and mentoring colleagues to build team capability. Stay up to … date with emerging technologies and industry trends, sharing knowledge across company communities to embed SRE best practice. Drive continual improvement by automating manual processes and optimising monitoring systems to achieve full estate coverage. Lead initiatives to improve availability, performance, and scalability through proactive monitoring, capacity planning, and ongoing maintenance. Collaborate with development squads to embed monitoring, reliability, and scalability
Nottingham, Nottinghamshire, United Kingdom Hybrid / WFH Options
Commify Group
us and be part of our success story! Role Summary In the role of Site Reliability Engineer at Commify, you will be an integral part of our SRE team. Your focus will be on ensuring that our products and platforms perform at their best, understanding how our software interacts with both physical and Cloud infrastructure to deliver exceptional … Maintaining high levels of system performance through monitoring and performance tuning Implementing scalability and fault tolerance Automating processes and improving operational efficiencies Troubleshooting application and middleware challenges Collaborating with engineering teams to support high-throughput production environments Building and maintaining robust deployment pipelines What essentials are we looking for? Proficiency with Microsoft Azure Strong expertise in Terraform, App Services … and Kubernetes Fluent in both written and spoken English A genuine passion for reliability in systems Experience in creating and modifying Terraform deployments Prior experience in an operations role, ideally as a Site Reliability Engineer Ability to work cross-functionally, take ownership of tasks, and prioritize effectively Excellent communication and collaboration skills Experience with monitoring solutions (e.g.
software-defined networking principles. Embed zero-trust principles and user-centric design into all remote connectivity services. Align remote connectivity architecture with broader enterprise network, security, and cloud strategies. Engineering & Operations: Lead the engineering, deployment, and lifecycle management of remote access solutions such as Cisco AnyConnect, Zscaler, and other mainstream VPN … platforms. Drive automation of remote access provisioning, policy enforcement, and configuration management through Infrastructure as Code (IaC) and zero-touch deployment practices. Apply Site Reliability Engineering (SRE) principles to improve performance, availability, and troubleshooting. Establish observability practices across all access points with real-time metrics, logs, and telemetry. Security, Compliance & Governance: Ensure compliance with corporate security and … segmentation, and endpoint-based access control. Proven ability to scale remote connectivity solutions to tens of thousands of users and devices. Experience with IaC, network automation, observability tooling, and SRE methodologies. Preferred Qualifications: Certifications such as CCNP, CCIE, PCNSE, Zscaler Certified, or equivalent. Familiarity with secure hybrid work and cloud networking models. Background in network performance optimization, user-centric design
London, South East, England, United Kingdom Hybrid / WFH Options
Holland & Barrett International Limited
want to hear from you! Key Responsibilities: Security Strategy: Help define and execute the Holland & Barrett cloud security strategy, partnering with platform and Site Reliability Engineering (SRE) teams to build robust infrastructure that supports our business. Perimeter Security: Establish platform perimeter security by implementing controls at ingress and egress points, including creating and maintaining an edge network
Washington, Washington DC, United States Hybrid / WFH Options
OMW Consulting
Job Title: Site Reliability Engineer (SRE) Location: Washington, DC - Hybrid Clearance: TS/SCI Salary: $160k-$200k Join a dynamic team dedicated to delivering best-in-class service quality and issue resolution for mission-critical deployments. In this role, you will be instrumental in shaping operational policies and implementations while working in both on-premise DoD environments and … various OSI model layers to meet SLAs. Collaborate with developers to maintain secure and efficient workflows. What We're Looking For: Minimum of 4 years of experience as an SRE engineer, with a strong focus on automation and deployment. Active security clearance with experience in DoD IT environments. Proficiency in VMware, Kubernetes, Docker, Helm, Ansible, and Terraform. Strong understanding of
City of London, London, United Kingdom Hybrid / WFH Options
Digital Realty (UK) Limited
Position Title: Site Reliability Engineer, Interconnection Service and Network Delivery Location: Hybrid: Austin, Dallas, Boston, Ashburn, Atlanta, London, or Amsterdam Your role In this role, you will be responsible for deploying and maintaining all Digital Realty interconnection fabric network infrastructure. The ideal candidate can demonstrate a unique blend of network engineering, network operations, and software understanding through … the application of engineering principles. You will focus on delivering operational discipline and embrace key operational principles including automation, agile development, and scripting. What you'll do You will be part of the global Fabric Engineering organization and work in tandem with other teams to build and maintain a global network infrastructure. Ideal candidates for this role will bring … an understanding of carrier-class network infrastructure as well as experience working in a fast-paced development environment. What you'll need 5+ years of operations and engineering experience Bachelor's degree in Computer Science (or equivalent) preferred Strong experience with automation tools (Ansible, Terraform, etc.) Strong experience working with Linux systems and tools Experience with Python (or equivalent high-level language
want it to go. *** Applicants must be solely UK National and already hold HMG HLC clearance *** Role Location: Gloucester or Manchester We are seeking highly skilled and motivated Site Reliability Engineers to join our team. The ideal candidates will possess a good understanding of engineering principles and a broad understanding of full-stack software technologies, with hands … and cost optimisation (rightsizing, reserved instances, auto scaling). • Disaster Recovery & Business Continuity Planning: Develop and test backup/DR strategies, restore drills, and self-healing infrastructure to ensure reliability and uptime. • Collaboration & Knowledge Sharing: Work closely with DevOps, development, security and operations teams; prepare architecture/design documents, network diagrams, runbooks and training materials. Required qualifications to be … encryption, audit logging, network isolation, and compliance frameworks. • Monitoring & Optimization Tools: Familiarity with CloudWatch, Grafana, Datadog, Prometheus, ELK or similar The position requires team members to work from the client site to ensure the reliability and availability of critical systems. Together, as owners, let’s turn meaningful insights into action. Life at CGI is rooted in ownership, teamwork, respect
the globe. What you'll do: As a Site Reliability Engineer at Zefr, you'll apply your expertise in cloud infrastructure, CI/CD, Observability, and core SRE concepts to deliver high-quality, reliable, and scalable solutions. A significant aspect of this role involves working closely with Zefr's Engineering and Data Science teams ensuring the infrastructure … secure, resilient, scalable, and cost-efficient applications and systems/pipelines in AWS and GCP. Foster and push our DevOps culture and philosophy by encouraging continuous improvement across all engineering teams. Proactively maintain the health of production environments, including monitoring application performance and resource utilization. Participate in 24/7 on-call rotation, respond to system performance issues and … at the application and infrastructure level. Mature our CI/CD workflows and release process. Maintain a forward-thinking approach, actively researching and proposing new solutions. Propose and review Engineering Requests for Comments (RFCs) to drive Engineering architecture and practices. Technology Stack at Zefr: Core Infrastructure & Cloud Platforms: Cloud Providers: Google Cloud Platform (GCP), Amazon Web Services (AWS
Honolulu, Hawaii, United States Hybrid / WFH Options
OMW Consulting
Role - Site Reliability Engineer Location - Honolulu - Hybrid - 1-2 days a week on site Security … clearance - Minimum Secret - need this ahead of applying Salary - $150k-$200k + Equity I am partnered with a leading defense tech scale-up who are looking to add an SRE to their team based in Hawaii. This role is hybrid with an expectation of 1-2 days on site in Honolulu; however, there are some weeks when you will … not need to go on site at all. Due to the nature of the client, you must hold an active Secret clearance as a minimum ahead of applying for this position. To be considered for this position you must have experience with the following: Experience with Security Clearance and DoD IT Environment: You hold an active security clearance, are
Fleet, Hampshire, United Kingdom Hybrid / WFH Options
RVU Co UK
Staff Platform Engineer Department: Engineering Employment Type: Permanent Location: Fleet Description Hybrid - 2 Days per week in the Fleet office Tempcover Tempcover is at the forefront of the fast-growing world of short-term insurance. Our mission is to make car insurance flexible, quick, and easy for drivers. We've sold millions of policies that have helped drivers get … ownership, empowerment and impact. Each Engineer plays an integral role in the development, delivery, maintenance, and support of our insurance-based systems, both public-facing and internal. The platform engineering team enables our engineers to quickly build and run safe, secure and cost-effective systems in our public cloud. What you'll be doing As a Staff Platform Engineer … you'll be working as part of an agile team that provides services and tools to our internal engineering teams Suggest and drive change across the engineering team and both challenge and improve existing practices and systems Mentor and coach other engineers; helping them grow whilst fostering a strong engineering culture You'll be introducing new technologies
Hart, Yorkshire, United Kingdom Hybrid / WFH Options
RVU Co UK
Staff Platform Engineer Department: Engineering Employment Type: Permanent Location: Fleet Description Hybrid - 2 Days per week in the Fleet office Tempcover is at the forefront of the fast-growing world of short-term insurance. Our mission is to make car insurance flexible, quick, and easy for drivers. We've sold millions of policies that have helped drivers get where … ownership, empowerment and impact. Each Engineer plays an integral role in the development, delivery, maintenance, and support of our insurance-based systems, both public-facing and internal. The platform engineering team enables our engineers to quickly build and run safe, secure and cost-effective systems in our public cloud. What you'll be doing As a Staff Platform Engineer … you'll be working as part of an agile team that provides services and tools to our internal engineering teams Suggest and drive change across the engineering team and both challenge and improve existing practices and systems Mentor and coach other engineers; helping them grow whilst fostering a strong engineering culture You'll be introducing new