open source contributions, including open sourcing internal tools, contributing to public repositories, and sponsoring conferences. Responsibilities: As our first Site Reliability Engineer, you will help evolve SRE practices such as incident management, blameless postmortems, SLOs, and error budgets. Your role will involve building reliable, performant, auto-scalable, and highly available systems with support from the existing Platform … Infrastructure team. Enhance SRE practices across teams. Improve reliability KPIs of the platform. Balance reliability with feature delivery using SLOs and error budgets. Our engineering teams manage the entire lifecycle of services from initial development to high-load production operation. Your responsibility is to enable engineering teams to succeed in operations, not to run their services for them. … What you'll be working on: Kick-start our SRE function by promoting reliability best practices and processes. Identify slow code paths in critical applications using tools like Java Flight Recorder or Go’s pprof. Develop or modify tools and applications with reliability and performance in mind. Ensure systems can handle ten times the current load by improving …
contributions to public repositories, open sourcing in-house tools and sponsoring conferences. Responsibilities: As our first Site Reliability Engineer, you will contribute to the evolution of SRE practices like incident management, blameless postmortems, SLOs and error budgets. You will contribute to building reliable, performant, auto-scalable and highly available systems. You will have the support of the existing … Platform Infrastructure team. Level up SRE practices across the teams. Improve the reliability KPIs of the platform. Help balance reliability with feature delivery using SLOs and error budgets. Our engineering teams own the lifecycle of services from first commit to high-load operation in production. Your responsibility will be to help engineering teams succeed at operations, not … ll be working on: Exposing slow-running code paths in critical applications using tools like Java Flight Recorder or Go’s pprof. Writing tools or modifying existing applications with reliability and performance in mind. Ensuring our systems and their individual components can withstand 10x load by improving our performance testing. Shortening mean time to discovery and recovery with improvements …
Reading, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
gaming studios in the industry and are seeking an exceptional Site Reliability Engineer to bring their expertise and innovative thinking to strengthen their team. As an SRE, your main purpose is to solve scalability issues through collaboration and automation, applying engineering principles to infrastructure and operational challenges. Work closely with various teams to improve manual tasks, operational …
flexible remote working locations within UK/Europe) Employment type: Permanent. Working Hours: Full time (9-6 UK). Salary: Up to £110K + Shares + Benefits. TransFICC is hiring a Site Reliability Engineer to provide high-performance services to our customers. We develop an integration service … product that enables our clients to have a flexible, hosted service without requiring their internal resources to respond to connectivity challenges across trading venues. You will be joining our SRE team and contributing to TransFICC's automation culture. We are a multi-disciplinary team covering everything from desktop and laptop support to data centre provisioning of servers and vendor network … automated, so having experience with a software automation tool like Ansible and coding ability is a must. We are looking for someone experienced as a sys admin or network engineer; however, you must have a reasonable understanding of both. Constructive, open-minded and self-motivated. A belief in lifelong learning, and an awareness of how much there still is …
Manchester, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
Time to enhance your scope; broaden your horizon by delving into Site Reliability Engineering (SRE). You’ll take the skills you have picked up in software engineering and apply these to improve overall system and application performance and reliability. You’ll work on internal developer tooling, using modern programming languages such as Golang, Python or TypeScript - so …
One of our long-standing global SaaS clients is making a key hire in the form of a Site Reliability Engineer. This business, with a multi-national reach, specialises in providing truly industry-leading loyalty solutions to its retail and hospitality clients. Their growth has been impressive, and there … s never been a more exciting time to get in on the action! In this role, you'll play a key part in ensuring the reliability, performance, and scalability of their platform. You'll support internal and external stakeholders … and clients to drive improvement and innovation, helping to move their platform forwards by introducing new processes and technologies. This is an incredibly exciting opportunity for a mid-level SRE to join a global company that puts its employees' growth and development at the heart of what it does! The Person: 4+ years' experience in a similar role. Experience working …
days a week in Birmingham or Sheffield. Role Overview: We are seeking a Lead Technical Subject Matter Expert (SME) with strong systems thinking and a solid grasp of SRE principles to drive the technical uplift of capacity and observability controls across our technology estate. This role blends hands-on engineering depth with architectural oversight and focuses on enhancing performance, resilience … aligning technical capabilities with internal control frameworks and regulatory expectations. Key Responsibilities: • Lead the design and technical evaluation of capacity management, utilisation monitoring, and observability controls across platforms. • Apply SRE-aligned practices to identify control gaps, performance risks, and areas for automation. • Assess existing tooling, data flows and operational practices to identify control gaps and propose remediation strategies. • Collaborate with … Experience: • 10+ years in engineering, infrastructure, or technical architecture roles in complex technology environments. • Solid understanding of compute, storage, and network capacity planning across mixed deployment models. • Familiarity with SRE disciplines such as observability, service-level indicators/objectives (SLIs/SLOs), and automation of operational tasks. • Demonstrated ability to interpret and apply control requirements in technical design contexts. • Hands …
for experienced SREs to help grow our small team into a global footprint that can provide expert engagement across our core serving systems. As an early member of the SRE team, you will report directly to the Director of Managed Infrastructure and play a foundational role in expanding our SRE practice, integrating reliability principles more deeply into Vercel’s … repeatable, low-toil operational practices through the development of automated systems for software delivery, system failover, and capacity management. About You: At least 3 years of experience in an SRE role, or at least 5 years of experience in an adjacent role (e.g., platform engineering), operating in a scaled environment. Firm grasp of the SRE philosophy and mindset, with practical … experience working on or directly with SRE teams that have proactively engaged in system design and improvement. Strong sense of accountability and commitment to problem-solving, backed by a curiosity to dig deep and identify root causes. Willingness to proactively engage with development teams to influence the course of software design and operational practices. Capability to manage risk, make decisions …
Hamilton Barnes is currently representing a major vehicle manufacturer that is actively seeking a Site Reliability Engineer for an initial 6-month contract with the possibility of extension. This position has on-site commitments of 2/3 days per week in Gaydon. If you are interested in learning more, we encourage you to apply today! Responsibilities …
London, England, United Kingdom Hybrid / WFH Options
Vercel
looking for experienced SREs to help grow our small team into a global footprint that can provide expert engagement across our core serving systems. As an early member of the SRE team, you will report directly to the Director of Managed Infrastructure and play a foundational role in expanding our SRE practice, integrating reliability principles more deeply into Vercel’s … Devise repeatable, low-toil operational practices through the development of automated systems for software delivery, system failover, and capacity management. About You: At least 3 years' experience in an SRE role, or at least 5 years' experience in an adjacent role (e.g. platform engineering), operating in a scaled environment. Firm grasp of the SRE philosophy and mindset, with practical experience … working on or directly with SRE teams that have proactively engaged in system design and improvement. Strong sense of accountability and commitment to problem solving, backed by a curiosity to dig deep and identify root causes. Willingness to proactively engage with development teams to influence the course of software design and operational practices. Capability to manage risk, make decisions, and …
Site Reliability Engineer - Core & Security (f/m/d). Posted On: 04/28/2025. Job Information: Number of Positions: 1; Assigned Recruiter(s): Yann Provost; Hiring Manager: Yann Provost; Technology; Work Experience: 4-5 years; City: Lausanne, Switzerland or remote in EU/UK; State/Province: Vaud (fr); 1006. Job Description: Exoscale is … Exoscale strives to create an environment with great working conditions and welcomes diverse applicants. As part of its ongoing efforts to grow its infrastructure footprint, Exoscale is hiring a Site Reliability Engineer. The site reliability engineer plays a critical role in ensuring constant availability of the Exoscale platform. The engineering team at Exoscale works on … all aspects from designing & developing products, to their operation and support. With an expanding customer base and new products to further advance Exoscale's product portfolio, site reliability engineers build and maintain a wide range of technologies. As users of Exoscale itself, site reliability engineers also take active part in improving products. This position focuses on …
London, England, United Kingdom Hybrid / WFH Options
cloudControl
Site Reliability Engineer - Core & Security (f/m/d). Posted On: 04/28/2025. Job Information: Number of Positions: 1; Assigned Recruiter(s): Yann Provost; Hiring Manager: Yann Provost; Technology; Work Experience: 4-5 years; City: Lausanne, Switzerland or remote in EU/UK; State/Province: Vaud (fr); 1006. Job Description: Exoscale is … Exoscale strives to create an environment with great working conditions and welcomes diverse applicants. As part of its ongoing efforts to grow its infrastructure footprint, Exoscale is hiring a Site Reliability Engineer. The site reliability engineer plays a critical role in ensuring constant availability of the Exoscale platform. The engineering team at Exoscale works on … all aspects from designing & developing products, to their operation and support. With an expanding customer base and new products to further advance Exoscale's product portfolio, site reliability engineers build and maintain a wide range of technologies. As users of Exoscale itself, site reliability engineers also take active part in improving products. This position focuses on …
Overview of Role: As a site reliability engineer, you will be responsible for implementing, building and maintaining tooling and automation to enhance the reliability of our platform, and working closely with product engineering teams to achieve optimal reliability outcomes. This role is crucial to ensuring our payment systems are robust, scalable, and dependable, providing our … experiences. Be responsible for and deliver: Collaborate with product engineering teams to improve the observability of their applications. Create and maintain standards and ways of working that promote the reliability of PayPoint’s platform. Promote a culture of accountability and ownership by creating tooling and automation that allows product engineers to effectively operate their services in production. Provide primary … office in Welwyn Garden City. You will benefit from a range of company benefits such as: Holiday purchase scheme, with 25 days holiday plus bank holidays as standard. Free on-site gym at our office, and nationwide corporate rate gym membership. Online benefits portal where you can access lots of deals and discounts, for example on shopping or holidays. Progression …
mission, and comprehensive benefits. Your Mission: Provide self-service cloud-native products for delivery teams while matching business requirements such as security, compliance, cost and reliability. As a Senior SRE, you will: Take part in the design, development, deployment and management of infrastructure products. Evangelize the best practices around observability, reliability, security and performance. Help the company grow faster …
London, England, United Kingdom Hybrid / WFH Options
Attio Ltd
role important? You will join the Security, Infrastructure, and Performance (SIP) team, focusing on building a resilient, scalable, and secure platform to support our growing customer base. As a Site Reliability Engineer, your work will directly impact Attio’s ability to scale and deliver a robust platform for our users. This role requires software engineering experience. What … Kubernetes. Contribute across the stack, including TypeScript, Node.js, and Google Cloud Platform. Champion operational excellence and resilience (99.99% SLO). Manage CI/CD pipelines to improve deployment speed and reliability. Support backup, disaster recovery, and security. Experience with Google Spanner is a nice to have. Hiring Process: An introductory call with a member of our talent team ~ 30 minutes … Competitive salary of £80,000 to £100,000. Equity in an early-stage tech company on an incredible trajectory. Optional remote working and flexibility. Enhanced parental leave. Team off-site in fun places! (We've been to Barcelona, Lisbon and Malta so far.) Team events in London. Apple hardware and a budget for desk amenities.
Salary: Up to £80,000 + 25% annual bonus + PMI + benefits package. About the role: We're looking for a Lead Site Reliability Engineer to shape and drive our approach to monitoring and observability, helping us to deliver reliable, scalable solutions that underpin the performance of our critical services. As part of the Monitoring Team … evolution of automation and event-driven response across the platform. Internally, this role is known as Senior Technical Operations Engineer. About you: You're an experienced observability specialist or site reliability engineer with the technical depth and critical thinking needed to lead by example. You understand how monitoring underpins availability and performance - and you know how to …
Site Reliability Engineer. Location: IND-BLR-Divyasree Technopolis. Time type: Full time. Posted: Yesterday. Job requisition ID: R. About LSEG: The London Stock Exchange Group (LSEG) is a global financial markets infrastructure and data provider headquartered in London, UK. Established in 2007, though its core … Exchange dates to 1801, LSEG plays a meaningful role in capital markets worldwide. It operates several trading venues, including the London Stock Exchange, Borsa Italiana, and Turquoise. This is an Application Support Engineer role that handles the ingestion and delivery of subsets of licensed data to each of our client bases using in-house software. This role also involves taking database backup … and creative culture where we encourage new ideas and are committed to sustainability across our global business. You will experience the critical role we have in helping to re-engineer the financial ecosystem to support and drive sustainable economic growth. Together, we are aiming to achieve this growth by accelerating the just transition to net zero, enabling growth of …
Are you a passionate Software Engineer looking for an exciting new challenge? Join this team and transition into maintaining and enhancing the reliability of one of the world's largest platforms. In this role, you will utilise your expertise in Golang coding to develop robust applications, ensuring the systems remain resilient, scalable, and efficient. If you thrive in … presence and commitment to innovation, you will have the opportunity to work on projects that reach millions of users, making a real difference in the tech world. As a Site Reliability Engineer, you will be responsible for designing, developing, and maintaining systems and applications using Golang. You will monitor and optimise system performance with tools such as … Grafana, Prometheus, New Relic, and Splunk. Your role will involve identifying and resolving reliability issues, automating processes, and ensuring the seamless operation of the platform. If you have a passion for technology and a drive to ensure excellence, we would love to hear from you.
Senior Software Engineer (SRE), DailyPay, Belfast, Northern Ireland, United Kingdom. Posted 3 weeks ago. About Us: DailyPay is transforming the way people get paid. As a worktech company and the industry’s leading on-demand pay solution, DailyPay uses an award-winning technology platform to help America’s … with operations throughout the United States as well as in Belfast. For more information, visit DailyPay's Press Center. The Role: DailyPay is looking for a talented and motivated engineer with 4+ years of experience as a professional software engineer or site reliability engineer. You will be a senior member and a technical leader of our … advocate for operational and engineering excellence. Our team primarily uses Go, Terraform, AWS and DataDog. How You Will Make An Impact: You will be a key contributor to our SRE team. You will tackle a wide variety of technical problems, providing solutions and tooling to product development teams, enabling them to monitor and improve their systems. You will provide advice …
to debug, optimize code, and to automate routine tasks. Systematic problem-solving approach, coupled with effective verbal and written communication skills. About the job: Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services (both our internally critical and our externally … visible systems) have reliability, uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs will keep an ever-watchful eye on our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you’ll have the opportunity to manage … the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think …
and to automate routine tasks. Systematic problem-solving approach, coupled with effective communication skills. Excellent communication, project management and technical skills. About the job: Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services (both our internally critical and our externally … visible systems) have reliability, uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs will keep an ever-watchful eye on our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage … the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think …
to debug, optimize code, and to automate routine tasks. Systematic problem-solving approach, coupled with effective verbal and written communication skills. About the job: Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services (both our internally critical and our externally … visible systems) have reliability, uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs will keep an ever-watchful eye on our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage … the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think …
our colleagues, clients and partners - and the way we deliver value. Being agile will make us more responsive, more adaptable, and ultimately more innovative. We're looking for a Site Reliability Engineer to: • work as a part of an agile pod (team) • determine the reliability of our digital products, technology services, and the infrastructure that underpins … problems • collect and analyze operational data, define and monitor key metrics to identify and communicate areas for improvement • apply a broad range of engineering practices with a focus on reliability, from instrumentation, performance analysis, and log analytics to automated testing, deployment, and operations • ensure the … quality, security, reliability, and compliance of our solutions by applying our digital principles and implementing both functional and non-functional requirements • engage across the company and chapter of SRE and DevSecOps to push SLOs earlier in the SDLC ('shift left'). Your Career Comeback: We are open to applications from career returners. Find out more about our program on …
London, England, United Kingdom Hybrid / WFH Options
NatWest Group
Join us as a Site Reliability Engineer, Financial Crime Technology. In this key role, you’ll improve, drive, and embed non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services. You’ll enjoy significant stakeholder interaction, working in collaboration with engineers to ensure … ll work from home some of the time, but you’ll also spend a minimum of 2 days per week working from the office. What you'll do: As our Site Reliability Engineer, you’ll work closely with our feature team and other colleagues to meet defined service level objectives and continually improve systems and environments. You’ll … and help to our release process, suggesting and making improvements where possible. You’ll scale systems sustainably through mechanisms like automation, evolving them by pushing for changes that improve reliability and velocity. We’ll also look to you to coach and provide guidance to colleagues and the wider team, leading where required. In addition to this, you’ll: Proactively …
Edinburgh, Scotland, United Kingdom Hybrid / WFH Options
NatWest Group
Join us as a Site Reliability Engineer, Financial Crime Technology. In this key role, you’ll improve, drive, and embed non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services. You’ll enjoy significant stakeholder interaction, working in collaboration with engineers to ensure … ll work from home some of the time, but you’ll also spend a minimum of 2 days per week working from the office. What you'll do: As our Site Reliability Engineer, you’ll work closely with our feature team and other colleagues to meet defined service level objectives and continually improve systems and environments. You’ll … and help to our release process, suggesting and making improvements where possible. You’ll scale systems sustainably through mechanisms like automation, evolving them by pushing for changes that improve reliability and velocity. We’ll also look to you to coach and provide guidance to colleagues and the wider team, leading where required. In addition to this, you’ll: Proactively …