London, England, United Kingdom Hybrid / WFH Options
Client Server
Lead Site Reliability Engineer (SRE) Java - FinTech … Lead Site Reliability Engineer/SRE (Java) London/WFH to £130k Are you a Site Reliability Engineering technologist with a Java software engineering background seeking a role where you can make the technology choices, influence strategy and remain hands … been consistently voted as one of the UK's top employers. As a Lead Site Reliability Engineer you will focus on improving and raising the bar for SRE operations across the firm. You will establish SLOs, leveraging public cloud, containerisation, reliability testing and observability, liaising with business stakeholders to establish the product roadmap and providing technical leadership to More ❯
Site Reliability Engineer (SRE) - Payments London, England, United Kingdom Software and Services Description SRE and Engineering Operations Engineers in the team take part in every aspect of the software development lifecycle. We work in a fast-paced environment and are responsible for hands-on coding of critical system components. We have constructive design discussions, learn from each … optimization, analytical problem solving, and analytical thinking skills. Experience building systems both on-premise (data center) and on public cloud (AWS, GCP, or Azure welcome). Understanding of core SRE concepts - Monitoring, Alerting, Incident management. Preferred Qualifications Expertise with container platforms (e.g. Docker, or similar). Experience in presenting complex technical concepts to both technical and non-technical stakeholders. Proven More ❯
Altinity is looking for a great Cloud Service Site Reliability Engineer to work on ClickHouse, the hottest analytic database on the planet. ClickHouse now has more contributors than ElasticSearch, previously the biggest open-source analytic project on GitHub. We're looking to hire even more. Altinity is a distributed company that values employees, open-source, and doing the … right things for customers. As a Cloud Service Site Reliability Engineer you will be helping us build out Altinity.Cloud, an enterprise ready, cloud service for managed ClickHouse. Here's how to tell if you fit: You are interested in all things cloud and understand the plumbing that makes cloud applications work. You know how to deploy and … operate public-facing, container-based services. You work easily with remote engineers. You have outstanding skills in site operation, including: Proven operational skills on Kubernetes and public clouds including AWS; Native fluency in Golang with Python a plus; Outstanding knowledge of networking (including DNS, load-balancers, peering), storage, and compute; Experienced at automating service deployment using CI/CD More ❯
and known for consistent success and impressive profitability. With continued growth across the firm, they are now looking to expand their world-class engineering team by hiring an experienced Site Reliability Engineer to help design, optimise and maintain their global trading infrastructure. (FYI: the base salary advertised does not include cash bonuses, paid bi-annually. Your total More ❯
London, England, United Kingdom Hybrid / WFH Options
xAI
Site Reliability Engineer (SRE) - grok.com & API London, UK About xAI xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We … the field, please tell us the name of your employer. If you are currently employed in the field, please tell us your role including your seniority level (e.g. Software Engineer II). LinkedIn Profile If you have a public LinkedIn profile, please provide its URL. X Profile If you have a public X profile, please provide its URL. If More ❯
Crawley, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
Job Opportunity: Site Reliability Engineer (SRE) Are you an Azure DevOps/SRE professional looking for your next opportunity? Do you have a passion for ensuring application reliability and performance? Do you thrive in a collaborative, high-impact environment? If yes, this could be your next big opportunity! Our client, a leading provider of financial services … to join their team on a permanent basis. Responsibilities: Managing incidents and post-mortems for on-premises and cloud applications. Monitoring performance using modern tools and implementing automation. Driving SRE and DevOps best practices. Supporting releases with minimal downtime. Key Skills & Experience: Experience in SRE, IT operations, software development, or DevOps. Familiarity with CI/CD, IaC, Agile, and ITIL … frameworks. Proficiency in Azure Monitor, Application Insights, KQL, and incident management. Hands-on experience with YAML pipelines. Experience with Bicep, SolarWinds, Terraform, and PowerShell. Interested in joining a growing SRE team focused on automation and reliability? Click Apply now or send your CV to email@domain.com. This role offers hybrid working with one day a week in the More ❯
London, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
Site Reliability Engineer (SRE) Lead – Observability Location: London (Hybrid, 2 days on site per week) Contract Role Overview: Join a high-impact team where you'll lead and shape the SRE and Observability function for a major transformation programme. This role goes beyond traditional SRE – you’ll … champion best practices across product teams, drive observability strategy, and work hands-on with cutting-edge tools like Datadog and AWS. Key Responsibilities: Lead the SRE function and promote observability-first thinking across development and operations teams. Define and implement the observability roadmap across product domains in collaboration with the client. Be hands-on with Datadog for infrastructure and application … and improvements across observability platforms. Partner with engineering squads to deliver on observability requirements in an agile, demand-led way. Core Skills & Experience: Proven experience as a hands-on SRE Engineer. Deep understanding of observability and monitoring practices. Practical experience with Datadog (or similar observability platforms). Strong DevOps toolchain knowledge: GitHub, GitHub Actions, Jenkins, CodeQL, Nexus, CloudFormation, Terraform. Solid More ❯
Crawley, England, United Kingdom Hybrid / WFH Options
James Chase
Engineer with a passion for leadership and AWS innovation? We’re partnering with a high-growth technology company that is seeking a Site Reliability Engineering (SRE) Team Lead/Technical Lead to join their world-class engineering function. This is not your average technical leadership role: you’ll be driving strategic reliability initiatives, shaping cloud … practices, and leading a team of talented SREs committed to automation, scalability, and operational excellence. What You’ll Be Doing Lead, coach, and grow a high-performing DevOps/SRE team. Define and execute the SRE strategy to support scalability, performance, and resilience across critical systems. Own and evolve the AWS infrastructure – think EC2, RDS, ECS, Fargate, IAM, VPC and … Prometheus, and Datadog. Act as a technical mentor and thought leader within both your team and the broader engineering organisation. What We’re Looking For: Proven leadership experience within SRE, DevOps, or Infrastructure teams. Hands-on mastery of AWS services and cloud-native design patterns (microservices, containers, serverless). Proficient in Ansible (Terraform knowledge is a strong advantage). Strategic More ❯
Lead Site Reliability Engineer Central London (Hybrid) Up to £95k + Car Allowance & Bonus TRIA are working with a leading hospitality client, which is investing heavily in the performance, stability, and reliability of its digital platforms, to hire a Lead SRE. This is a hands-on leadership role - you won’t just guide others, you’ll … uptime The stack includes Kubernetes, Terraform, AWS, Python, and modern CI/CD tools, and it's evolving. If you're confident in a crisis, understand what a good SRE practice looks like, and want to leave systems in a better place than you found them, please apply to be considered and learn more! What you’ll bring: Experience in … high-traffic digital or eCommerce platforms; 5+ years in SRE/DevOps roles; strong background in incident response; observability, automation, and infrastructure as code expertise; leadership skills - mentoring others or leading from the front More ❯
Crawley, England, United Kingdom Hybrid / WFH Options
Manor Royal Business District
Vacancy Name: Site Reliability Engineer Vacancy No: VN1607 Employment Type: Full-Time Primary Work Location: People's Partnership - Manhattan Building, Crawley Description: Site Reliability Engineer About People’s Partnership: At the heart of our not-for-profit organisation is a commitment and a motivation to make the future-saving experience a simple one for … to reduce toil, and improve availability, reliability, security, and velocity. Maintain effective feedback loops so that findings can be prioritised and acted upon in a timely fashion. Follow SRE and DevOps core principles to drive adoption and utilisation. What we’re looking for: Strong background in one or more of the following areas: SRE/application support/IT …/software development/DevOps. Experience working within both Agile and ITIL frameworks. Experience working with DevOps principles and concepts such as CI/CD and IaC. Experience of SRE environments and processes specifically in the areas of availability, incident management and monitoring. Knowledge of scripting languages and desired state configuration such as Bicep or Terraform, and PowerShell. Experience using More ❯
Halian Technology is looking for a talented and driven Site Reliability Engineer (SRE) to join our growing technology team. In this role, you'll ensure the reliability, scalability, and performance of our digital platforms that support memorable customer experiences across the hospitality sector. You'll work alongside our engineering, product, and infrastructure teams to build high-availability systems and … automated operations that support the future of digital hospitality. Key Responsibilities: Drive system reliability, availability, and performance through engineering excellence. Design and implement monitoring, alerting, and observability tools using platforms like Datadog. Automate operational tasks using scripting, Infrastructure as Code (IaC), and configuration management tools. Troubleshoot incidents, lead root cause analysis, and improve Mean Time to Resolution (MTTR). … infrastructure meets security and compliance standards. Optimise system resources for both performance and cost-effectiveness. Contribute to incident response and participate in on-call rotations. Track and improve key SRE metrics such as error rates, incident count, and monitoring coverage. What You'll Bring: 3+ years of experience in Site Reliability Engineering, DevOps, or equivalent roles. Strong skills in More ❯
London, England, United Kingdom Hybrid / WFH Options
Dayforce
Engineering team at Dayforce, where we lead the charge in ensuring our state-of-the-art products set new benchmarks in scalability, availability, and reliability. We embrace the SRE engagement model to deliver exceptional performance. As a member of our team, you'll help build and maintain a suite of internal tools that proactively alert, report, and autonomously remediate … about Dayforce’s cloud infrastructure and the applications that run on them to build a full mental model of how the Dayforce ecosystem works. Work on projects to improve SRE processes. Participate in incidents, assist in investigating root cause, and help remediate Dayforce environment issues. Create runbooks using existing components. Develop trusted relationships with all parts of Dayforce’s business. … PagerDuty On-Call rotations as required. Skills And Experience We Value Self-starter and passionate individual willing to learn new concepts and technologies as well as contributing to the SRE powered ecosystem. Experience with at least one object-oriented programming language (C# and Java preferred). Experience with at least one scripting language (Python and PowerShell preferred). Experience with More ❯
Senior Site Reliability Engineer, London Client: Leap29 Location: London, United Kingdom Job Category: Other - EU work permit required: Yes Job Reference: dc49efa9d6ed Job Views: 5 Posted: 25.06.2025 Expiry Date: 09.08.2025 Job Description: Senior Site Reliability Engineer, observability … borderless network. Arming their clients with a precise understanding of how the network impacts their applications, users and customers. This role will be a unique opportunity for an experienced SRE to provide the tools, services, and infrastructure to monitor and observe the Platform. Leveraging cloud native tools and enabling the developers to instrument, analyse, and monitor the application. Permanent position More ❯
environment where software engineers and infrastructure specialists collaborate seamlessly? If so, this opportunity is for you! We’re looking for a mid-level Site Reliability Engineer (SRE) to join our User Apps Department and play a vital role in supporting the world’s most technologically advanced controlled environment agricultural system and its spin-off technologies. What You … software engineers to self-serve on feature delivery * Embedding within teams and projects as the subject matter expert in infrastructure and software delivery * Enhancing our monitoring suite to make reliability and debugging a shared responsibility across all teams * Exploring and evaluating new tools and technologies, driving proofs of concept, and shaping best practices * Participating in the out-of-hours … system resilience and reliability Who We’re Looking For: You’ll have experience working as an embedded engineer within a team, while also engaging in the broader SRE practice. Ideally, you have a strong foundation in both software engineering and infrastructure, allowing you to empathise with challenges on both sides. Our stack includes Azure and Kubernetes, with a More ❯
Job Description Site Reliability Engineer Exciting opportunity to join a growing technical leader, in a specialist technical capacity Hybrid based position (2 days a week on site) Salary up to £60,000 Central Manchester based client Based out of our revamped central Manchester office, you will join at an exciting time for our organisation, where we … with everything that happens Utilize a wide range of technologies like Terraform, AWS/GCP, Splunk, New Relic, Grafana, Python, and Golang We need you to have Experience in SRE/DevOps focused positions An appreciation of the Software Delivery lifecycle A finger on the pulse for the latest technologies and trends To be Considered Please apply by clicking online More ❯
London, England, United Kingdom Hybrid / WFH Options
Prima
Senior Site Reliability Engineer at Prima. Are you currently looking for a thrilling opportunity in a dynamic, constantly evolving environment? Are you curious to see how a company that operates in … health and well-being is, so we’ll go the extra mile to help you when we can. We are seeking an experienced Site Reliability Engineer (SRE) to join our Infrastructure team. As an SRE, your primary responsibility will be to ensure the reliability, availability, and performance of our technology platforms. You will support the Software … the implementation and maintenance of best security practices, participating in vulnerability assessments, and threat mitigation. Requirements Deep understanding and experience in Site Reliability Engineering and in implementing SRE Practices Excellent knowledge of AWS services and hands-on experience in production environments Proficiency with networking protocols, DNS principles, and container orchestration technologies (Kubernetes, Helm) Experience in proposing, discussing, and More ❯
Site Reliability Engineer Senior Lead, Slough Client: Mars Location: Slough, United Kingdom Job Category: Other - EU … work permit required: Yes Job Reference: 202520eca4d8 Job Views: 2 Posted: 02.06.2025 Expiry Date: 17.07.2025 Job Description: The Systems Reliability Engineering (SRE) Senior Lead is a pivotal leader within our organization, responsible for ensuring the reliability, performance, and scalability of our critical systems. This role is instrumental in strategizing and overseeing … s degree in Information Technology, Computer Science, Business Management, or a related field 7+ years of experience in IT departments or a relevant field 3+ years in a leadership, SRE, DevOps, or systems engineering role. A seasoned professional with a deep understanding of Site Reliability Engineering (SRE) principles, DevOps best practices, and cutting-edge technologies. Strong analytical, interpersonal More ❯
and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. Our team is globally located, focused on ensuring production stability, automation, reliability, and observability. We are looking for solution-oriented, commercially minded, customer-focused individuals used to working in an agile environment who want to be part of building something new … eager to expand their skills while working on an exciting new venture. Your work will have a significant impact on our company, clients, and business partners worldwide. As a Site Reliability Engineer III at JPMorgan Chase within the Corporate Technology - Market Risk, you will address complex business problems with simple solutions. Using code and cloud infrastructure, you … and engineers. Design and implement deployment strategies using automated CI/CD pipelines. Implement infrastructure, configuration, and network as code. Understand SLIs and SLOs to proactively resolve issues, supporting SRE best practices. Minimum Qualifications Formal training or certification in SRE concepts. Proficiency in at least one programming language such as Python. Experience with a technology stack involving software design, coding More ❯
London, England, United Kingdom Hybrid / WFH Options
Orgvue
Principal Site Reliability Engineer, London Client: Orgvue Location: London, United Kingdom Job Category: Other - EU work permit required: Yes Job Reference: 465704a68d8a Job Views: 37 Posted: 22.06.2025 Expiry Date: 06.08.2025 Job Description: Orgvue is an organisational design and planning platform that empowers your business to transform its workforce by understanding the work people do and the skills … teams. Responsibilities Define and enforce SLOs, SLIs, and error budgets across critical services Crafting and implementing a cloud infrastructure and tooling strategy Work across our Org to level up SRE practices Help implement robust observability metrics, logs & traces using our observability tool Guide the team in building automated, self-healing systems Own and evolve our incident response processes, including on … scalability, and operational excellence Evaluate and introduce tools, patterns, and practices that improve the performance and reliability of our SaaS platform Requirements Desired Skills & Experience: Demonstrable experience leading SRE transformations Deep hands-on expertise with Kubernetes (EKS preferred) in production environments Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.) Expert in More ❯
Senior Site Reliability/Gitops Engineer at Canonical. Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in enterprise initiatives such as public cloud, data … and growing company structure, employing over 1200 colleagues across 75+ countries with few office-based roles, emphasizing global collaboration and periodic in-person meetings. We are hiring a Senior Site Reliability/Gitops Engineer for our Information Systems (IS) team. This role is ideal for an automation-focused senior technologist passionate about Linux, eager to develop their … success of Ubuntu and open source products. Job Summary The IS team supports and maintains Canonical’s IT production services, serving over 60 million Ubuntu users. As a Senior SRE & Gitops Engineer, you will drive operations automation in both private and public clouds, utilizing open source infrastructure as code tools, CI/CD practices, and Canonical’s automation products. More ❯
Global Site Reliability Engineer Location: London About Us Founded in 2013, GSR is a leading market maker and programmatic trading firm in the fast-evolving world of cryptocurrency trading. With over 200 employees across seven countries, we provide billions of dollars in liquidity daily to cryptocurrency protocols and exchanges. We build long-term relationships with crypto communities … GSR is an opportunity to be deeply embedded in every major sector of the cryptocurrency ecosystem. About the Role We are seeking a Site Reliability Engineer (SRE) to design, optimize, and support highly available systems across our global trading infrastructure. As part of GSR's SRE team, you will manage a multi-regional cloud environment while integrating … work across all layers of infrastructure, including: Networking & Exchange Connectivity Linux Systems & Kubernetes Administration Microservice Orchestration & Observability Disaster Recovery & Security Optimization Your mission is to improve latency, scalability, and reliability, ensuring GSR remains a best-in-class market maker. We value engineers who drive automation, reduce friction, and enhance developer velocity through better tooling, CI/CD, and infrastructure More ❯
scale across all three major cloud providers, Aura runs hundreds of Kubernetes clusters and hosts thousands of Neo4j instances in production at any given time. We’re reshaping what SRE means at Neo4j Aura—and we want you to be part of that journey. Rather than firefighting or chasing alerts, we’re helping teams design for reliability from day … one. That means building the tools, practices, and culture that embed SRE principles at the heart of how Aura operates. You’ll be joining a team focused on long-term resilience, engineering excellence, and meaningful collaboration with product teams. The Role Automate for insight and scale: Build systems that make troubleshooting fast, safe, and scalable across thousands of Neo4j instances. … tools and automation in Go—our primary language—with an emphasis on sound architecture, testing, and maintainability. Strong software skills in other languages, like Python, are also welcome. Applying SRE practices in real-world environments: defining SLIs and SLOs, reducing toil through automation, and driving reliability through engineering. Collaborating with other teams to promote SRE thinking—educating on principles More ❯