Global Site Reliability Engineer Location: London About Us Founded in 2013, GSR is a leading market maker and programmatic trading firm in the fast-evolving world of cryptocurrency trading. With over 200 employees across seven countries, we provide billions of dollars in liquidity daily to cryptocurrency protocols and exchanges. We build long-term relationships with crypto communities … GSR is an opportunity to be deeply embedded in every major sector of the cryptocurrency ecosystem. About the Role We are seeking a Site Reliability Engineer (SRE) to design, optimize, and support highly available systems across our global trading infrastructure. As part of GSR's SRE team, you will manage a multi-regional cloud environment while integrating … work across all layers of infrastructure, including: Networking & Exchange Connectivity; Linux Systems & Kubernetes Administration; Microservice Orchestration & Observability; Disaster Recovery & Security Optimization. Your mission is to improve latency, scalability, and reliability, ensuring GSR remains a best-in-class market maker. We value engineers who drive automation, reduce friction, and enhance developer velocity through better tooling, CI/CD, and infrastructure …
scale across all three major cloud providers, Aura runs hundreds of Kubernetes clusters and hosts thousands of Neo4j instances in production at any given time. We’re reshaping what SRE means at Neo4j Aura, and we want you to be part of that journey. Rather than firefighting or chasing alerts, we’re helping teams design for reliability from day … one. That means building the tools, practices, and culture that embed SRE principles at the heart of how Aura operates. You’ll be joining a team focused on long-term resilience, engineering excellence, and meaningful collaboration with product teams. The Role Automate for insight and scale: Build systems that make troubleshooting fast, safe, and scalable across thousands of Neo4j instances. … tools and automation in Go, our primary language, with an emphasis on sound architecture, testing, and maintainability. Strong software skills in other languages, like Python, are also welcome. Applying SRE practices in real-world environments: defining SLIs and SLOs, reducing toil through automation, and driving reliability through engineering. Collaborating with other teams to promote SRE thinking: educating on principles …
Every Preply lesson sparks change, fuels ambition, and drives progress that matters. Meet the team! As a member of the Platform tribe, the Site Reliability Engineer (SRE) at Preply combines software development, infrastructure operations and business skills to run a large-scale, fault-tolerant, global language education platform. The SRE ensures that Preply systems have high reliability, top-in-the-industry uptime and a fast rate of product innovations. Additionally, SREs keep an ever-watchful eye on the capacity and performance of our system. You'll work on core parts of our platform and help us to meet the challenges of growing our organization in terms of both traffic and the number of developers. The … SRE team unites infrastructure, engineering and business to ensure great synergy that helps Preply succeed. Our main focuses are: Top-in-the-industry uptime record and latency. Blazingly fast Lead Time for our product engineers. Running infrastructure in a cost-effective way. We work in small teams, so you will be able to influence system design and contribute a lot …
London, England, United Kingdom Hybrid / WFH Options
GSR
GSR is an opportunity to be deeply embedded in every major sector of the cryptocurrency ecosystem. About the Role We are seeking a Site Reliability Engineer (SRE) to design, optimize, and support highly available systems across our global trading infrastructure. As part of GSR’s SRE team, you will manage a multi-regional cloud environment while integrating … IaC). You will work across all layers of infrastructure, including: Networking & Exchange Connectivity; Microservice Orchestration & Observability; Disaster Recovery & Security Optimization. Your mission is to improve latency, scalability, and reliability, ensuring GSR remains a best-in-class market maker. We value engineers who drive automation, reduce friction, and enhance developer velocity through better tooling, CI/CD, and infrastructure …
Principal Site Reliability Engineer - Core Systems Hybrid in London or Remote within the UK The company Imagine a world where every small business has the power to thrive. That's the world we're building at iwoca. Small businesses aren't just statistics – they're the heartbeat of our communities, the character of our high streets, and … get closer to our goal of funding one million businesses. The role will focus on complex data systems and flows that power multiple internal products and services. Our biggest reliability challenges aren’t sudden spikes in users or data volume - they’re the accumulating complexity, interconnectivity … and constant evolution of these internal systems. So, you’ll need to empathise with different teams’ pressures, have a practical, problem-solving mindset, and do some far-sighted SRE work. You’ll combine hands-on work with technical leadership to: Influence without authority: understand our systems, advocate for reliability improvements, and build relationships with other teams and tech …
London, England, United Kingdom Hybrid / WFH Options
FastMarkets Ltd
Principal Site Reliability Engineer (Hybrid) Full-time Department: Technology Employment Type: Permanent Fastmarkets is an independent commodity pricing and information organisation with over 600 staff. We are fuelled by values that bring us all together and are united by a collective passion to make a difference. We are supported by a working model that is based on a … journey. Fastmarkets is owned by global private equity firm Astorg, a specialist investor in healthcare, software, technology, business services and technology-based industrial companies. Fastmarkets requires an experienced Senior Site Reliability Engineer with great DevOps and stakeholder management skills. To complement an worldwide … existing team we're looking for someone to help us modernise our Azure cloud platforms to cloud-native, containerised, fully automated deployment pipelines. Reporting to the Head of SRE, the correct candidate will have extensive experience in modernising Azure platforms, excel in Infrastructure and code, as well as being comfortable in more traditional DevOps work. This role will also …
Senior Site Reliability Engineer - Monitoring and Observability at Macquarie Group … be part of a friendly and supportive team where everyone - no matter what role - contributes ideas and drives outcomes. What role will you play? As a Monitoring and Observability Engineer, you will run and maintain enterprise-wide log analytics, monitoring, and observability services. You will be responsible for improving the value provided by the log analytics platform to drive … Employment type: Full-time. Job function: Engineering and Information Technology.
Manchester, England, United Kingdom Hybrid / WFH Options
ZipRecruiter
Job Description Who we are looking for A Junior Site Reliability Engineer, who will improve system reliability, observability and performance through strong engineering and assist with incident resolution and operational excellence. Supported by our site reliability engineering team, you will work to integrate reliability and observability practices into the Software Development Life Cycle … and enhance overall performance. You will ensure the health, performance and availability of critical systems, directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including instrumentation with tools such as OpenTelemetry, improving logging practices, and developing features for maintainability. You will also assist in creating tools and automation for effective service management. This … toolsets. Working with IT Operations to provide and support the use of critical tooling that will enable increasing levels of value to the Business. Driving initiatives to enhance system reliability and observability, both within the team and across the department, fostering a culture of continuous improvement. “By applying to us you are agreeing to share your Personal Data in …
Stoke-on-Trent, England, United Kingdom Hybrid / WFH Options
ZipRecruiter
Job Description Who we are looking for A Junior Site Reliability Engineer, who will improve system reliability, observability and performance through strong engineering and assist with incident resolution and operational excellence. Supported by our site reliability engineering team, you will work to integrate reliability and observability practices into the Software Development Life Cycle … and enhance overall performance. You will ensure the health, performance and availability of critical systems, directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including instrumentation with tools such as OpenTelemetry, improving logging practices, and developing features for maintainability. You will also assist in creating tools and automation for effective service management. This … toolsets. Working with IT Operations to provide and support the use of critical tooling that will enable increasing levels of value to the Business. Driving initiatives to enhance system reliability and observability, both within the team and across the department, fostering a culture of continuous improvement. “By applying to us you are agreeing to share your Personal Data in …
vouchers for their team and the ability to "work from anywhere" for two weeks of the year Paid one month sabbatical after four years' employment Role Overview Luminance’s SRE team combines strong problem solving, infrastructure tooling and wider DevOps practices to provide a service of Luminance’s unique software applications. The team plays a crucial role in incident response … and issue resolution, swiftly addressing and resolving service interruptions to maintain the highest level of customer satisfaction. With a focus on automation, scalability, reliability and security, the team enables Luminance to ensure a performant, seamless experience for its users. You will join a small, dynamic team of creative engineers and work together to tackle some of Luminance’s greatest …
top AI computing platform. We equip engineers with the tools to deploy AI that is fast, secure, affordable, and built to scale. Whether they need powerhouse GPU hardware on-site or the flexibility of cloud-based solutions, we've got the horsepower to make it happen. Lambda's AI Cloud has been adopted by the world's leading companies … performance through the use of network engineering and other applicable technologies Help with deploying and maintaining network monitoring and management tools You Have 5+ years of experience in SWE, SRE or Network Reliability Engineering Been part of the implementation of production-scale networking projects Experience being on-call and incident response management Have experience building and maintaining Software Defined …
Site Reliability Engineer, London (City of London) Client: Caspian One Location: London (City of London), United Kingdom Job Category: Other; EU work permit required: Yes Job Views: 3 Posted: 16.06.2025 Expiry Date: 31.07.2025 Job Description: Overview This role is critical …
Description Summary: We're looking for an experienced Platform/Infrastructure Engineer with a strong Microsoft Azure background and deep knowledge of Kubernetes. You'll play a key role in designing, deploying, and maintaining infrastructure and services that power our products. This role requires hands-on experience with automation, modern IaC practices, CI/CD, and maintaining production-grade … and maintain Infrastructure as Code using Terraform or OpenTofu Develop scripts and automation to support infrastructure and deployment workflows - PowerShell is preferred Collaborate with engineering teams to support platform reliability and enable delivery Maintain visibility and awareness through monitoring and logging tools such as Datadog, Azure Monitor, App Insights etc. Support incident resolution and participate in an on-call … such as Azure Monitor, App Insights, or similar Clear communicator with the ability to collaborate across cross-functional teams Nice to Have: Azure certifications (e.g. Azure Administrator, Azure DevOps Engineer) Experience with GitOps and tools such as ArgoCD or Flux Familiarity with Configuration as Code tools like Ansible or Puppet Exposure to large-scale distributed systems or high-volume …
Sheffield, England, United Kingdom Hybrid / WFH Options
KnowBe4, Inc
Snr. Site Reliability Engineer (Remote position located in Leeds/Sheffield, United Kingdom) Sheffield, United Kingdom About KnowBe4 KnowBe4, the provider of the world's largest security awareness training and simulated phishing platform, is used by tens of thousands of organizations around the globe. KnowBe4 enables organizations to manage the ongoing problem of social engineering by helping … person, we strive to make every day fun and engaging; from team lunches to trivia competitions to local outings, there is always something exciting happening at KnowBe4. KnowBe4’s Site Reliability Engineers help ensure that our platforms are reliable, secure, scalable, and efficient. They work alongside other engineers in a fast-paced, agile development environment, and share solutions … to advance the technologies running our systems, improve their safety and reliability, and make the complex distributed services that deliver our platforms easy to understand. The ideal member of our team gets excited about new AWS service releases, stays up-to-date on industry trends and design patterns, and has excellent time-management and communication skills. Some of the …
Leeds, England, United Kingdom Hybrid / WFH Options
KnowBe4, Inc
Snr. Site Reliability Engineer (Remote position located in Leeds/Sheffield, United Kingdom) Sheffield, United Kingdom About KnowBe4 KnowBe4, the provider of the world's largest security awareness training and simulated phishing platform, is used by tens of thousands of organizations around the globe. KnowBe4 enables organizations to manage the ongoing problem of social engineering by helping … person, we strive to make every day fun and engaging; from team lunches to trivia competitions to local outings, there is always something exciting happening at KnowBe4. KnowBe4’s Site Reliability Engineers help ensure that our platforms are reliable, secure, scalable, and efficient. They work alongside other engineers in a fast-paced, agile development environment, and share solutions … to advance the technologies running our systems, improve their safety and reliability, and make the complex distributed services that deliver our platforms easy to understand. The ideal member of our team gets excited about new AWS service releases, stays up-to-date on industry trends and design patterns, and has excellent time-management and communication skills. Some of the …
London, England, United Kingdom Hybrid / WFH Options
Palantir
the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more. The Role We’re looking for Site Reliability Engineers who can help us build, operate, and maintain high-performance, scalable, and reliable services for our production infrastructure. Site Reliability Engineers combine engineering experience …
Site Reliability and DevOps - Senior Site Reliability Engineer Join Whitbread as a Senior Site Reliability Engineer and help ensure our systems run at their best, delivering exceptional experiences for both our guests and teams. You’ll focus on improving performance, designing and automating processes … and evaluate emerging technologies, market trends, and innovations, assessing their potential to add value to strategy formation, planning, and decision-making within Whitbread. What You’ll Need Highly experienced SRE Engineer with solid proven experience in IT who can influence teams in the best practice in SRE. A DevOps mindset with a willingness to share knowledge and able to …
London, England, United Kingdom Hybrid / WFH Options
DIGITALQ UK IT SOLUTIONS LIMITED
Job title: Site Reliability Engineering - DevOps Job Description: Job Summary The Opportunity: Our client is a leader in the mobile sector. As a Site Reliability Engineer, you will partner with other teams, using your expertise to guide the design, development, and delivery of products and solutions built … CD Pipeline Enhancement: Contribute to creating and improving CI/CD pipelines to support rapid and reliable software releases. Key Skills/Experience: • Minimum 5 years of experience as an SRE • Strong knowledge of SRE principles and methodology • Adaptable and flexible approach to work If you possess these skills and want to contribute to a dynamic team environment, we encourage you … the updated CV to Work Location: Hybrid remote in London City Reference ID: DQUK/APPLE/AD004
consultants, analysts, and support staff. Overview: We are looking for a highly skilled and visionary leader to join our team as the Head of Site Reliability Engineering (SRE) with a strong focus on AWS cloud infrastructure. The ideal candidate will have a deep understanding of cloud architectures, extensive experience in SRE practices, and the ability to lead and … scale SRE teams to ensure the availability, performance, and security of our systems. Key Responsibilities: Leadership and Team Management: Lead and manage the SRE team to ensure high availability, scalability, and performance of our AWS-based infrastructure. Provide mentorship and guidance to junior and senior engineers, fostering a culture of operational excellence and continuous improvement. Cloud Infrastructure Management: Oversee the … design, implementation, and maintenance of cloud infrastructure in AWS, ensuring the systems are secure, reliable, and highly available. Use best practices for AWS services, automation, and monitoring. SRE Practices Implementation: Establish and lead the implementation of SRE principles, such as Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets, to drive the team's focus on reliability. Incident …
Job Description The SRE Manager is responsible for leading the Site Reliability Engineering function across Europe, ensuring the reliability, scalability, and performance of critical infrastructure and services. This role plays a key part in the global follow-the-sun support model, working closely with the Global SRE Leader to support platforms worldwide. The ideal candidate will bring … operational excellence to a high-impact team. You'll collaborate with Engineering, Infrastructure, and Operations teams to maintain high availability and resilient service delivery, while also mentoring a regional SRE team focused on continuous improvement and innovation. Key Responsibilities: Technical Leadership Develop deep expertise in the Titanium trading platform to lead and support critical business operations. Oversee team workload, ensuring … priorities align with business goals and resource capacity. Operational Excellence Champion initiatives that enhance system availability, scalability, and performance. Collaborate with the Global SRE Leader to refine and enforce operational policies (e.g., Capacity Planning, Change Management, Disaster Recovery). Cross-Functional Collaboration Partner with Software Engineering, Infrastructure, Operations, Security, and Business teams to deliver secure and reliable platforms. Team Development …
expansion, an opportunity has become available for a Head of Site Reliability Engineering to join our team to help us transform our existing operational workloads to an SRE approach. Key Responsibilities Establishing and managing our new SRE function Operating and modernising our existing cloud infrastructure Partnering with our DevOps team to ensure fast & supportable platform updates Maintaining the … are maintained at the highest levels Acting as a key Incident Commander and escalation point Liaising closely with our SecOps teams to ensure timely vulnerability management Educating teams in SRE practices and maintaining high standards of compliance Implementing world-class observability standards utilising SLI/SLO/Error Budgets Continually evolving our observability platforms for greater coverage Liaising with Product … Engineering teams for constant evolution of metrics Aligning SRE Sprints & Backlog with our roadmaps to meet business expectations Guiding our teams in a more Agile approach to demand management Actively taking part in our daily stand-ups and keeping our Sprints on track Keeping up-to-date documentation in our JIRA & Confluence tools Owning and maintaining our SRE Incident Management …
SRE/DevOps Engineer – High Frequency Trading - Multi Strategy Hedge Fund - Multi Billion Dollar Hedge Fund - Multiple Headcount - Up to £600k TC Posted by Mondrian Alpha. Join a leading multi-strategy hedge fund, where you’ll collaborate with elite engineers and top investment professionals to develop cutting-edge trading technology. We are seeking highly skilled SRE/DevOps … across multiple assets globally. Build effective tooling for automation across all phases of SDLC, ensuring rigorous testing, release, and deployment processes. Improve trading systems' performance, monitoring, availability, and reliability. Provide day-to-day support for mission-critical trading applications and infrastructure. Develop a deep understanding of trading workflows and demonstrate excellent incident management skills. Communicate effectively with …
London, England, United Kingdom Hybrid / WFH Options
Algolia
API-first approach. Performance and Scalability is at the heart of our mission: we power 1.5 trillion searches a year, for 10K+ customers all over the world. As a Site Reliability Engineering Manager in the Production Engineering team of Algolia, you will lead the Fleet team of Site Reliability Engineers responsible for the provisioning and the … global reliability of the Search Products at scale. Your team will focus on creating pragmatic solutions to optimize the Search Products availability and costs at scale, depending on the needs of the customer, the Product teams, and the different engineering teams that deliver a unique Search Experience to our customers. You will manage a team of experienced Individual Contributors … scale and identifying optimization opportunities. YOUR ROLE WILL CONSIST OF: Collaborating with senior leadership to define the overall technical direction and strategy for the organization, and ensure that the SRE team's goals and initiatives are aligned with this strategy. Building and maintaining strong relationships with stakeholders across the organization, as you represent the SRE organization in cross-functional meetings. …