Wakefield, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
here, and as we continue to grow, reliability, automation, and scalability have never been more important to us. You will be our first SRE, so a strong background in implementing SRE best practices would be ideal. You will know what good looks like and strive to continuously improve automation … tooling using AWS, Terraform, Docker, and CI/CD pipelines. Supporting and evolving our container-based architecture (we use ECS and Fargate). Driving SRE best practices: SLIs/SLOs, error budgets, reducing toil, and improving observability. Using (and hopefully enjoying!) tools like Datadog, Prometheus, Grafana, and Nix to support … with AWS, Terraform, Docker, and container orchestration (ECS/Fargate). Good understanding of CI/CD pipelines and DevOps workflows. Solid grasp of SRE principles – SLIs, SLOs, error budgets, observability, etc. Familiarity with Datadog, Prometheus, Grafana, or similar tools. Experience with Nix is a plus (or curiosity to learn …
Stockport, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
here, and as we continue to grow, reliability, automation, and scalability have never been more important to us. You will be our first SRE, so a strong background in implementing SRE best practices would be ideal. You will know what good looks like and strive to continuously improve automation … tooling using AWS, Terraform, Docker, and CI/CD pipelines. Supporting and evolving our container-based architecture (we use ECS and Fargate). Driving SRE best practices: SLIs/SLOs, error budgets, reducing toil, and improving observability. Using (and hopefully enjoying!) tools like Datadog, Prometheus, Grafana, and Nix to support … with AWS, Terraform, Docker, and container orchestration (ECS/Fargate). Good understanding of CI/CD pipelines and DevOps workflows. Solid grasp of SRE principles – SLIs, SLOs, error budgets, observability, etc. Familiarity with Datadog, Prometheus, Grafana, or similar tools. Experience with Nix is a plus (or curiosity to learn …
London, England, United Kingdom Hybrid / WFH Options
Blockchain.com
code, create, and ultimately build an open, accessible and fair financial future, one piece of software at a time. We are looking for a Site Reliability Engineer to join our Core team to encourage infrastructure best practices across our organization that would allow us to securely scale a … distributed financial platform tackles some of the most interesting problems in crypto for millions of our customers and continues to grow rapidly. The SRE team at Blockchain combines software and systems engineering to provide a platform that abstracts complexity for increased security, reliability, and rapid product delivery. The … SRE organization at Blockchain is a work in progress; our focus is always on how to make our existing systems better. We pride ourselves on having created an environment where individuals have a high degree of freedom in proposing, discussing, designing, and implementing changes. We are a team that places …
London, England, United Kingdom Hybrid / WFH Options
Blockchain Ventures
code, create, and ultimately build an open, accessible and fair financial future, one piece of software at a time. We are looking for a Site Reliability Engineer to join our Core team to encourage infrastructure best practices across our organization that would allow us to securely scale a … distributed financial platform tackles some of the most interesting problems in crypto for millions of our customers and continues to grow rapidly. The SRE team at Blockchain combines software and systems engineering to provide a platform that abstracts complexity for increased security, reliability, and rapid product delivery. The … SRE organization at Blockchain is a work in progress; our focus is always on how to make our existing systems better. We pride ourselves on having created an environment where individuals have a high degree of freedom in proposing, discussing, designing, and implementing changes. We are a team that places …
Peterborough, England, United Kingdom Hybrid / WFH Options
Compare the Market
never been more blindingly obvious why you would choose Compare the Market. We’d love you to be part of our journey. As the Site Reliability Engineer, you will ensure the highest levels of system uptime and performance, contributing directly to the trust and reliability of … boxes, but would love to hear what makes you great for this role. Some of the great things you’ll be doing: • System Reliability - Ensure the uptime and reliability of critical systems and applications, minimizing downtime and service disruptions. • Automation - Develop and maintain automated processes for deployment …
London, England, United Kingdom Hybrid / WFH Options
Algolia
Scalability is at the heart of our mission: we power 1.5 trillion searches a year for 10K+ customers all over the world. As a Site Reliability Engineering Manager in the Production Engineering team at Algolia, you will lead the Fleet team of Site Reliability Engineers responsible … for the provisioning and the global reliability of the Search Products at scale. Your team will focus on creating pragmatic solutions to optimize the Search Products' availability and costs at scale, depending on the needs of the customer, the Product teams, and the different engineering teams that deliver a … YOUR ROLE WILL CONSIST OF: Collaborating with senior leadership to define the overall technical direction and strategy for the organization, and ensure that the SRE team's goals and initiatives are aligned with this strategy. Building and maintaining strong relationships with stakeholders across the organization, as you represent the SRE …
organisation and make faster, more informed decisions. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney. Role: Principal Site Reliability Engineer. You will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure. You will … a collaborative spirit. Responsibilities: Define and enforce SLOs, SLIs, and error budgets across critical services; develop and implement cloud infrastructure and tooling strategies; enhance SRE practices across the organization; implement robust observability (metrics, logs, and traces) using our observability tools; guide the team in building automated, self-healing systems; own … to ensure compliance and operational excellence; evaluate and adopt tools and practices to improve platform performance and reliability. Desired Skills & Experience: Experience leading SRE transformations; hands-on expertise with Kubernetes (EKS preferred) in production; strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch …
London, England, United Kingdom Hybrid / WFH Options
ZipRecruiter
Job Description: Senior Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First. Salary: £85,000 + bonus. Location: This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities: Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for disaster recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OpenTelemetry, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go …
Manchester, England, United Kingdom Hybrid / WFH Options
Mindrift
Freelance Site Reliability Engineer (Security Automation & Penetration Testing). About the Company: At Mindrift, innovation meets opportunity. We believe in using the power of collective intelligence to ethically shape the future of AI. Our goal? Advance the field of …
London, England, United Kingdom Hybrid / WFH Options
Orgvue
and make faster, more informed decisions. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney. As a Principal Site Reliability Engineer, you will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure. You will … SLOs, SLIs, and error budgets across critical services; craft and implement a cloud infrastructure and tooling strategy; work across our organization to level up SRE practices; help implement robust observability (metrics, logs, and traces) using our observability tools; guide the team in building automated, self-healing systems; own and evolve our … evaluate and introduce tools, patterns, and practices that improve the performance and reliability of our SaaS platform. Desired Skills & Experience: Demonstrable experience leading SRE transformations; deep hands-on expertise with Kubernetes (EKS preferred) in production environments; strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB …
our customers to innovate in hyperscaler cloud, enabling seamless migration, advanced security, and data-driven success. Currently, we are looking for a Senior Azure Site Reliability Engineer to join our team in the UK. Your daily responsibilities: architect, implement, and improve existing monitoring and alerting systems; proactively …
Role: Join us as a Site Reliability Engineer and help us build the future of data sovereignty! We're seeking an SRE passionate about creating high-performance, scalable, and reliable services for our production infrastructure. You'll have a direct impact, improving existing systems and developing innovative … for self-hosted deployments, including infrastructure and tooling for monitoring, alerting, and troubleshooting. This will involve designing and implementing robust metrics and logging systems. Engineer the Acra platform for high availability and fault tolerance. This includes ensuring resilience against cloud availability zone outages and the ability to gracefully handle … resource utilization. Collaborate closely with the product engineering team to influence the design and implementation of new products and features, ensuring they meet our reliability and scalability standards from the outset. Preferred Qualifications: A bachelor's degree (or foreign equivalent) in Computer Science or a related field is desired; relevant …
Maidstone, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
streamline international payments and ensure compliance at scale, all through smart automation and modern cloud-native infrastructure. They’re looking to bring on a Site Reliability Engineer with deep experience in observability. If you’ve worked with tools like Prometheus in AWS, supported development teams with … ll Be Doing: Managing and improving observability tools like Prometheus, Grafana, and CloudWatch; helping product teams with tracing and monitoring to improve performance and reliability; defining and improving SLIs/SLOs, automating tasks … and reducing operational noise; working with AWS (EKS, EC2, Lambda, RDS), Terraform, and CI/CD tools. What They’re Looking For: Experience in SRE or DevOps roles in a production environment; strong knowledge of observability tools, especially Prometheus in AWS; experience with tracing, metrics, and logs to support development …
Belfast, Northern Ireland, United Kingdom Hybrid / WFH Options
JR United Kingdom
streamline international payments and ensure compliance at scale, all through smart automation and modern cloud-native infrastructure. They’re looking to bring on a Site Reliability Engineer with deep experience in observability. If you’ve worked with tools like Prometheus in AWS, supported development teams with … ll Be Doing: Managing and improving observability tools like Prometheus, Grafana, and CloudWatch; helping product teams with tracing and monitoring to improve performance and reliability; defining and improving SLIs/SLOs, automating tasks … and reducing operational noise; working with AWS (EKS, EC2, Lambda, RDS), Terraform, and CI/CD tools. What They’re Looking For: Experience in SRE or DevOps roles in a production environment; strong knowledge of observability tools, especially Prometheus in AWS; experience with tracing, metrics, and logs to support development …
the world's most innovative fintechs, and the Financial Times recognised us as one of Europe's fastest-growing companies in 2023. The Client Site Reliability Engineer role in Infrastructure, Client Services will be responsible for enabling and supporting our clients to deliver a best-in-class … at scale. This role supports clients in their cloud infrastructure preparation, deployment, optimisation and troubleshooting. Duties: Hands-on cloud infrastructure consulting, both on client site and remotely; working with customers and external partners to design and prepare suitable cloud infrastructure to ensure Thought Machine Vault products can be tested … empower holistic digital transformation in collaboration with Thought Machine Client Architects; supporting and troubleshooting client, SaaS and internal cloud infrastructure both remotely and on site, including by promoting and deploying suitable monitoring, logging and alerting tools; working closely with internal product and engineering teams to ensure client feedback is …
our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more. The Role: We're looking for Site Reliability Engineers who can help us build, operate, and maintain high-performance, scalable, and reliable services for our production infrastructure. Site Reliability …
Sheffield, England, United Kingdom Hybrid / WFH Options
KnowBe4
LinkedIn: https://www.linkedin.com/company/knowbe4/life/uk/
KnowBe4’s Site Reliability Engineers help ensure that our platforms are reliable, secure, scalable, and efficient. They work alongside other engineers in a fast-paced, agile … development environment, and share solutions to advance the technologies running our systems, improve their safety and reliability, and make the complex distributed services that deliver our platforms easy to understand. The ideal member of our team gets excited about new AWS service releases, stays up to date on industry … AWS: ECS, Lambda, Step Functions, SNS/SQS, Transit Gateway, Aurora, DynamoDB, CloudFront, S3, AppSync, API Gateway, and many more. Responsibilities: Work with other Site Reliability Engineers to build highly scalable and resilient applications and infrastructure in AWS; maintain and improve extensible infrastructure-as-code using Terraform; learn …
Reading, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
and are seeking an exceptional Site Reliability Engineer to bring their expertise and innovative thinking to strengthen their team. As an SRE, your main purpose will be to solve scalability issues through collaboration and automation, applying engineering principles to infrastructure and operational challenges. Work closely with various teams …
/Europe). Employment type: Permanent. Working hours: Full time (9-6 UK). Salary: Up to £110K + shares + benefits. TransFICC is hiring a Site Reliability Engineer to provide high-performance services to our customers. We develop an integration service product that enables our clients … to have a flexible, hosted service without requiring their internal resources to respond to connectivity challenges across trading venues. You will be joining our SRE team and contributing to TransFICC's automation culture. We are a multi-disciplinary team covering everything from desktop and laptop support to data centre provisioning … a software automation tool like Ansible, and coding ability is a must. We are looking for someone experienced as a sysadmin or network engineer; however, you must have a reasonable understanding of both. Constructive, open-minded and self-motivated. A belief in lifelong learning, and an awareness of …
London, England, United Kingdom Hybrid / WFH Options
Attio Ltd
the Security, Infrastructure, and Performance (SIP) team, focusing on building a resilient, scalable, and secure platform to support our growing customer base. As a Site Reliability Engineer, your work will directly impact Attio’s ability to scale and deliver a robust platform for our users. This role … including TypeScript, Node.js, and Google Cloud Platform; champion operational excellence and resilience (99.99% SLO); manage CI/CD pipelines to improve deployment speed and reliability; support backup, disaster recovery, and security. Experience with Google Spanner is a nice-to-have. Hiring Process: An introductory call with a member of … to £100,000; equity in an early-stage tech company on an incredible trajectory; optional remote working and flexibility; enhanced parental leave; team off-sites in fun places (we've been to Barcelona, Lisbon and Malta so far!); team events in London; Apple hardware and a budget for desk …
London, England, United Kingdom Hybrid / WFH Options
Heroic Labs
Hi there! We're looking for a Cloud Engineer (Software Reliability Engineer) to join the growing team at Heroic Labs. Our cornerstone offerings (Nakama, Hiro, Satori, and Heroic Cloud) comprise the Heroic Game Stack (HGS), an integrated platform delivering unparalleled performance and flexibility. We are simplifying the … across the globe to power games that individually make over 90 billion requests per month into our infrastructure. About the Role: Our Cloud/SRE engineers run the Heroic infrastructure, which powers many thousands of games in the world. You’ll be building both the infrastructure and the gateway access … Heroic Cloud) which interact with it. We believe that in order for a Cloud/SRE engineer to be successful, they must have a very good working knowledge of the internals of Postgres (think: ‘almost’ building an extension for Postgres), as well as very in-depth technical knowledge of GCP, AWS …
Manchester, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
Time to enhance your scope and broaden your horizons by delving into Site Reliability Engineering (SRE). You’ll take the skills you have picked up in software engineering and apply them to improve overall system and application performance and reliability. You’ll work on internal developer tooling, using …
which currently involves spending at least two days per week, or 40% of our time, at our Bristol office. About this opportunity: Our Cloud SRE (Site Reliability Engineering) team is looking for an experienced and passionate engineer with strong hands-on development experience. As a Cloud SRE … the Bank's vision for 2023 and beyond! Specific activities might include: working with service teams to directly influence and drive the adoption of SRE best practices and ways of working within our microservices; collaborating with infrastructure engineers to ensure resilience and scalability across the platform; observing, investigating, and fixing service … the following to consider you for interview: Ideally, you'll come from a software engineering or telemetry background and have now moved into an SRE role. Technical Skills: Experience working with a broad set of GCP products (or extensive experience with another public cloud platform, such as Azure or AWS …
Cambourne, Cambridgeshire, United Kingdom Hybrid / WFH Options
Remotestar
a presence in London, Hong Kong, Amsterdam, and as well in Mumbai and now in New York in 2001. About the role: As the SRE Manager, you will play a critical role in ensuring the reliability, scalability, and performance of our infrastructure and services through both direct technical contribution … streamline operational workflows and improve efficiency. Develop and maintain tools, scripts, and dashboards to monitor system health, performance, and reliability. Build a first-class SRE team: through a combination of leading by example, coaching, and mentoring, mould the team you would want to have around you. Provide leadership and guidance to … the SRE team, fostering a culture of collaboration, innovation, and continuous improvement. RESPONSIBILITIES: Proven experience in a senior or lead SRE role, with a strong track record of building and maintaining highly reliable infrastructure and services. Expertise in incident management, including incident response, resolution, and post-mortem analysis. Proficiency in …
databases and we want to grow that number, along with delivering more features without compromising on reliability and scalability. This is where our SRE team comes into the picture. The SRE team is responsible for managing Neon's multi-region, multi-cloud deployment in close collaboration with the broader … engineering team. All the features we want to implement can only reach our customers if the changes are delivered reliably, which means the SRE team plays a significant role in defining our pace of development. Successful candidates will get the opportunity to contribute to the effort of evolving Neon to … cloud and infrastructure topics. Be ready to join an on-call rotation. We're looking for someone who has 4+ years of experience working in Site Reliability Engineering; experience with cloud infrastructure components in Azure and/or AWS; experience in a complex Linux infrastructure environment; experience focusing on …