Site Reliability Engineer, ML Infrastructure, Large Models SRE. Google. London, UK. Mid: Experience driving progress, solving problems, and mentoring more junior team members; deeper expertise and applied knowledge within relevant area. Bachelor's degree in Computer Science or a related technical field or equivalent practical experience. 5 years … Models/Machine Learning tooling and infrastructure. Experience in automation, monitoring, and incident response. Experience in C++, Java, Python, or Go. Understanding of Site Reliability Engineering (SRE) principles and best practices. Excellent communication, project and stakeholder management skills. About the job: Site Reliability Engineering (SRE) combines software and systems engineering to build and … run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google's services, both our internally critical and our externally visible systems, have reliability, uptime appropriate to users' needs, and a fast rate of improvement. Additionally, SREs will keep an ever-watchful eye on our systems' capacity and performance. Much of our software development focuses on optimizing …
reliability of all cloud systems while keeping levels of manual work low. SREs are expected to be experienced in software engineering principles, operational discipline, and automation. The SRE team work on a fully remote basis and in conjunction with their US and Australian teams as well. This company are a market leader in student community management software … ensure high availability and performance. Collaborate with product engineering teams to design/build fit-for-purpose and observable software. Required Skills and Experience: Proven experience in an SRE/DevOps/Platform Engineering role, having previously worked in a Software Engineering role in .NET and C#, Java, or a similar OO development language. Proficiency in … and this job is part of a large program of change and improvement in their Cloud SaaS products over the coming years. If you are looking for an interesting SRE role with a forward-thinking global organisation, then this would be a tremendous career opportunity to consider. Please apply with your CV to find out more.
Nottingham, Nottinghamshire, United Kingdom Hybrid / WFH Options
Capital One (Europe) plc
Nottingham, Nottinghamshire. Senior Software Development Engineer - Site Reliability. About the Role: We're looking for a Senior Engineer to join our Site Reliability Engineering (SRE) team. This role is ideal for a skilled Java engineer with a passion for understanding how complex systems work, analysing performance, and applying engineering solutions to make them more … efficient, stable, and scalable. You'll lead on planning and implementing key SRE initiatives, optimise and automate how our systems operate, and improve observability through better monitoring and logging. You'll also work closely with your peers to drive consistency and high standards across SRE and the wider engineering community, so a real enthusiasm for influencing others and leading … to reduce operational overheads through observability and service automation. Drive engineering best practice (e.g., Operational Excellence, Security, Quality, Resilience, etc.) and set standards across the team and wider SRE community. Innovate within your team and contribute within your technical domain. Deliver key pieces of intent from inception through to design and hands-on delivery, in collaboration with your SREM.
Milton Keynes, Buckinghamshire, England, United Kingdom
Noir
Site Reliability Engineer (SRE) - Market leading company - Milton Keynes (Tech stack: .NET, C#, ASP.NET Core, SQL Server, PowerShell, Azure CLI, Bash, Azure DevOps, Jenkins, GitHub Actions, Docker, Kubernetes). Help shape the tech future of a UK market leader! Backed by a major financial institution with soaring profits, my client is modernising platforms, embracing AI, and driving automation at scale. … We're hiring a Lead Site Reliability Engineer (SRE) to drive reliability, observability, and performance across our Azure cloud infrastructure. You'll work in a modern engineering environment where we live by "you build it, you run it", focused on automation, scale, and resilience. Tech stack you'll work with: .NET, C#, ASP.NET Core, SQL Server … PowerShell, Azure CLI, Bash, Azure DevOps, Jenkins, GitHub Actions, Docker, Kubernetes. We want to hear from you if: As a Site Reliability Engineer (SRE) you've delivered scalable systems using .NET, C#, and ASP.NET Core, with real-world experience managing production workloads. You've automated operations using PowerShell, Azure CLI, and Bash to reduce toil and boost efficiency.
Prestigious opportunity with a Global Investment Giant for a Site Reliability Engineering (SRE) Manager to be based in our Manchester HQ, leading a talented team of engineers dedicated to maintaining and enhancing the reliability of our systems. Working closely with cross-functional teams across the globe, including business stakeholders, product managers, and software engineers, you will … role has an opportunity to provide strategic guidance on improvements. At the forefront of providing production support services, including incident logging, incident resolution, problem management, change management practices, and SRE support, we are inviting you to join our success story. As our Site Reliability Engineering Manager you will: Lead, coach, and develop a high-performing SRE team. … for incident response, root cause analysis, and post-mortem reviews to prevent future incidents. Work closely with business and technology teams to understand their needs and ensure alignment with reliability and uptime goals. Facilitate communication and collaboration across global teams. Drive the development and adoption of automation tools to improve efficiency and reduce manual intervention. Establish and maintain comprehensive …
automotive software development. The right candidate will have excellent communication skills, solid coding skills, expertise in building scalable, reliable, highly available and fault-tolerant systems, and broad knowledge of software engineering and site reliability engineering in areas such as Large-Scale Data and Compute Infrastructure, Stream Processing, Kubernetes, High-Performance Networking, Observability and Infrastructure Automation. RESPONSIBILITIES: Set … maintain, optimize and support large scale, multi-region, multi-cloud compute and storage infrastructure powering our data platform and mission critical services. Work with fellow Data Infrastructure engineers and Site Reliability engineers to ensure our systems are scalable, reliable, fault-tolerant, highly available, highly performant, and observable. Manage incidents, triage product or system issues and debug/track …/resolve by analyzing the root cause of these issues and the impact on users & operations. Work closely with other Data Infrastructure engineers, Site Reliability engineers, ML Platform engineers, Computer Vision and ML engineers on high-impact projects to create innovative solutions to problems in the self-drive space. Mentor junior engineers in their day-to-day work …
Site Reliability Engineer (SRE) Manager - Apple Services Engineering. London, England, United Kingdom. Software and Services. Description: Apple Services Engineering (ASE)'s Compute team is seeking a highly motivated individual with strong technical and communication skills to join us on our quest to build and enhance massive clusters hosting Virtual Machines, Containers and associated infrastructure that can … engage with the upstream community to drive Apple's requirements. Ultimately, you will help build the platform that delivers our applications at scale to our end users. As a Compute Site Reliability Engineering manager, you will be leading a team responsible for providing the platform for mission-critical cloud systems to maintain constant uptime, scale seamlessly, and allow … for new applications and services to flourish. Minimum Qualifications: Extensive Leadership in Cloud Computing: In-depth experience building and leading high-performing engineering teams, with a deep focus on cloud computing and hands-on experience across public and/or private cloud environments. Large-Scale Infrastructure Management: Proven ability to manage enterprise services in large-scale *nix environments and …
Technical Specialist - Site Reliability Engineer. Locations: Gurgaon Office, FIL Bengaluru Office. Time type: Full time. Posted 2 days ago. Time left to apply: closes 16 September 2025 (24 days left to apply). Job requisition ID: J60278. About the Opportunity: Job Type: Permanent. Application Deadline: 16 September 2025. Job Description: Title: Technical … analysts and investment operations staff in all international locations, including Canada, London, Hong Kong and Tokyo. About your role: We are seeking a talented Site Reliability Engineer (SRE) to join our Technology team supporting critical applications within the ISS Production Services. This role blends traditional software engineering practices with reliability-focused operations, aiming to enhance the … scalability, availability, and performance of client- and market-facing applications. The SRE will work directly with application development, architecture, DevOps, and business teams to ensure systems are designed and maintained with reliability and performance in mind, while meeting the demanding requirements of financial services operations. About you: Define and manage SLOs, SLIs, and error budgets aligned with business goals.
Reliability Engineer. Locations: Belfast - Millennium House. Time type: Full time. Posted yesterday. Job requisition ID: 32912. About us: CME Group is seeking a Staff SRE to help build, operate and scale systems in our Markets portfolio. Markets SREs work on products and applications related to CME's Globex trading platform. Our systems deliver an exceptional … combination of low-latency performance and rock-solid reliability to seamlessly handle the world's busiest trading days. The successful candidate will have a strong understanding of SRE principles and practices, enjoy the cut-and-thrust of operating Production systems, be a strong communicator, and may have previously worked in an SRE role, a software engineering role, a … DevOps role or a systems engineering role. About the role: As a Staff SRE you'll lead Product direction for improving reliability. You will shape our roadmap and architecture and drive high-impact changes across teams. Key responsibilities: Serve as the technical leader for Product reliability - defining a Product Reliability Roadmap and influencing decisions on direction and prioritisation …
Reliability Engineer III. Locations: Belfast - Millennium House. Time type: Full time. Posted 3 days ago. Job requisition ID: 32913. CME Group is seeking an SRE III to help build, operate and scale systems in our Markets portfolio. Markets SREs work on products and applications related to CME's Globex trading platform. Our systems deliver an … learn how we observe, monitor, automate, and improve Production service reliability and act as a mentor to junior colleagues. They will have a keen interest in SRE and enjoy the cut-and-thrust of operating Production systems. They will be a strong communicator, and may have previously worked in an SRE role, a software engineering role … ideas and reliability improvement suggestions to the Product backlog. Support the migration of markets applications to Google Cloud Platform (GCP). Act as a mentor to L2 and L1 SRE colleagues. What We're Looking for: Experience with Linux-based systems. Experience with Cloud-based platform(s) - Google Cloud Platform, GCE, and/or GKE a bonus. Understanding of application …
Stoke-On-Trent, England, United Kingdom Hybrid / WFH Options
Click Dealer
Lead Site Reliability Engineer. Location: Remote working, with 1 day in every 2 weeks at our Stoke-On-Trent office (5 mins from station). Salary: £Competitive + company benefits (Full time/permanent role). About Click Dealer: At Click Dealer, we’re passionate about building software and digital tools that make life easier for automotive dealerships - driving … of companies backed by the Global Investment firm, Carlyle Group. Today, we’re proud to be trusted by over 1,800 independent and franchise dealerships across the UK. The SRE Team: We count on our site reliability engineers (SREs) to empower users with a rich feature set, high availability, and stellar performance levels to pursue their goals. About … the role: This is a hands-on role. Working as part of the Software Engineering team, you will run the SRE function on a day-to-day basis, working closely with the Head of Engineering and providing SRE expertise whilst collaborating with cross-functional teams to develop real-world solutions and positive user experiences. The successful candidate will review …
sector. This is initially a 12-month contract with the potential to extend and will be a hybrid role based in London. Our client is seeking an experienced DevOps Site Reliability Engineer to help build, maintain, and evolve modern technology platforms that deliver reliable, scalable, and secure services. This is a hands-on technical role that combines full … stack development, cloud engineering, and operational excellence - with a strong emphasis on automation, performance, and continuous improvement. You will work closely with development teams, infrastructure engineers, and business stakeholders to design, implement, and support resilient systems capable of adapting to rapidly changing requirements. This role is perfect for someone who enjoys solving complex problems, driving technical innovation, and collaborating … consistency, and reliability. Implement advanced monitoring, alerting, and self-healing capabilities to ensure high system availability. Partner with development teams to integrate Site Reliability Engineering (SRE) best practices into all stages of the software lifecycle. Troubleshoot complex production issues, perform root cause analysis, and implement permanent solutions. Lead and contribute to continuous improvement initiatives across infrastructure …
to £95,000 + Bonus + Shares. Watford (Hybrid). Method Resourcing are proud to be partnering with a fast-growing, international technology business delivering critical services across multiple high-reliability sectors. They're seeking a Head of Delivery Enablement who can … ensure cohesive, end-to-end delivery across architecture, DevOps, quality assurance, and project delivery. Role Overview: Acting as the Technical Product Owner for Site Reliability Engineering (SRE), you'll manage the technical backlog to balance future strategic initiatives with feedback from engineering teams. You will guide DevOps engineers through the full delivery lifecycle, lead the development … strategic work, align on tooling, and drive improvements in observability, automation, and testing. Ideal Experience & Skills: Demonstrated technical leadership across diverse skillsets, including Site Reliability Engineering (SRE), DevOps, and Quality Assurance (QA). Proven track record of aligning and integrating cross-functional technical teams and complex systems. Strong stakeholder management skills with the ability to influence decisions and …
Has anyone actually ever given you a good description of what SRE is? Recently I've met dozens of companies implementing an SRE function. Half are just rebranding an ops team (because Ops ain't cool), some don't want to call the additional silo they have created 'DevOps' (because apparently that's the wrong thing to do) so they … re calling it SRE and the rest actually don't really know how to describe what they're doing. And if you can't describe it simply, you don't know what it is, chief ('because Google do it' isn't the right answer). That was until today, when I met a company who actually whiteboarded their vision … process rather than the build. We discussed Kubernetes, Prometheus and API Gateways. Most importantly, they spoke like they knew what the hell they were on about. Not just about SRE, but on the whole Engineering process. This is a company at the top of their game, who are about to introduce a brand new monetisation model to a …
London, South East, England, United Kingdom Hybrid / WFH Options
Rise Technical Recruitment Limited
Senior Site Reliability Engineer. London - Hybrid. £80,000 - £90,000 + 38 Days Holiday + Private Healthcare + Life Assurance + Flexible Working + Pension. Excellent opportunity for a Site Reliability Engineer to join a forward-thinking and high-growth technology company offering a hybrid work environment, great benefits, and opportunities for further progression! This company … With a strong culture rooted in integrity, creativity, and technical excellence, they've become a trusted partner across global industries. In this role you'll take ownership of platform reliability, resilience engineering, and incident management across cutting-edge cloud infrastructure. You'll play a key role in ensuring uptime, performance, and continuous improvement of core systems. The ideal … and conduct chaos engineering experiments. Monitor and maintain Kafka clusters for performance and reliability. Respond to and resolve application-level production incidents. The Person: 5+ years in SRE, DevOps, or infrastructure engineering. Strong experience with AWS, EKS/Kubernetes, and Terraform. Familiar with Kafka and observability tools like Datadog or Grafana. Able to troubleshoot issues across infrastructure …
City of London, London, United Kingdom Hybrid / WFH Options
Rise Technical Recruitment
Senior Site Reliability Engineer. London - Hybrid. £80,000 - £90,000 + 38 Days Holiday + Private Healthcare + Life Assurance + Flexible Working + Pension. Excellent opportunity for a Site Reliability Engineer to join a forward-thinking and high-growth technology company offering a hybrid work environment, great benefits, and opportunities for further progression! This company … With a strong culture rooted in integrity, creativity, and technical excellence, they've become a trusted partner across global industries. In this role you'll take ownership of platform reliability, resilience engineering, and incident management across cutting-edge cloud infrastructure. You'll play a key role in ensuring uptime, performance, and continuous improvement of core systems. The ideal … and conduct chaos engineering experiments. Monitor and maintain Kafka clusters for performance and reliability. Respond to and resolve application-level production incidents. The Person: 5+ years in SRE, DevOps, or infrastructure engineering. Strong experience with AWS, EKS/Kubernetes, and Terraform. Familiar with Kafka and observability tools like Datadog or Grafana. Able to troubleshoot issues across infrastructure …
Employment Type: Permanent
Salary: £80000 - £90000/annum 38 Days Holiday, Healthcare, Pension
Join us as an Oracle Site Reliability Engineer to help us build and maintain resilient, high-performing systems in a fast-paced financial services environment. If you're passionate about automation, observability, and continuous improvement, we'd love to hear from you. To be successful as an Oracle Site Reliability Engineer, you should have experience with … Significant experience in Site Reliability Engineering, DevOps, Oracle tuning or infrastructure, ideally gained over several years in financial services or investment banking. Good experience with Oracle databases, performance tuning and SQL tuning, and shell scripting (e.g., Bash). Proficiency in scripting languages such as Python or Bash to automate workflows and reduce manual effort. Some other highly …
Vacancy for Snr Site Reliability Engineer (SRE) at Preservica. Abingdon/Remote, UK. About You: You have a proven track record in DevOps and software development, with a passion for creating reliable solutions to deploy software at scale and speed. You are eager to challenge the status quo, learn, and adopt new technologies. Excellent communication skills across all … Our team is small but growing, so self-motivation, organization, and the ability to multitask and prioritize are crucial. The Role: Serve as a primary visionary for DevOps/Site Reliability Engineering across the entire technology organization. Eliminate process bottlenecks to enable frictionless, reliable, and high-velocity feature development through automation of Build, Test, Deploy, and Operate …
Luupli started internal testing in June 2024 and is getting ready for commercial beta testing from December 2024, with the hope of launching fully in summer 2025. Job Title: Site Reliability Platform Engineer. About Luupli: Luupli is a social media app that has equity, diversity, and equality at its heart. We believe that social media can be a … made up of passionate and dedicated individuals who are committed to making Luupli a success. Role Description: We are seeking a talented and experienced Site Reliability Engineer (SRE) to join our team. As an SRE, you will play a crucial role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure and services, primarily hosted … Terraform, and proficiency in scripting with Python or Bash, we invite you to apply for this exciting opportunity. Role and Responsibilities: 1. Infrastructure Design and Automation: - Collaborate with software engineering and operations teams to design, build, and maintain cloud-based infrastructure using AWS and Terraform. - Implement and enhance infrastructure-as-code (IaC) practices using Terraform to ensure reproducibility and …
Senior Site Reliability Engineer. At UnlikelyAI, we are building the future of AI: one that is reliable, accurate and transparent. Our neurosymbolic technology harnesses the power of LLMs and generative AI, and combines it with classical symbolic technology to produce hallucination-resistant artificial intelligence for high-trust applications. To support our rapidly increasing commercial momentum, we're looking … for an experienced and pragmatic site reliability engineer to join our exceptional team. This role is ideal for someone who has successfully scaled systems from prototype to production and enjoys working in cross-functional teams to champion cloud-native engineering. We are looking for someone with the experience and expertise to define, and own, our approach to building … for reliability and security as first-class citizens. This is a strategically important role for our technology team, as we rapidly approach entering full production in multiple projects. You'll work on a range of customer-facing and internal infrastructure projects, applying your engineering skills to solve complex reliability and scalability challenges. Your ability to build robust …
schools more joyful places to work, as well as learn. About the role: We are looking for an enthusiastic and proactive Site Reliability Engineer to join our SRE team and help us ensure we provide world-class resilience and performance across the platform. The remit and focus of the role is to advise on all aspects of site … and backups. Conduct assessments of capacity and plan for scaling to meet current and future business needs. Work closely with the Head of Platform Engineering and Head of SRE to strategize and implement scalable solutions. Work closely with the Platform team, feature teams, 2nd line support and other stakeholders to ensure a good level of service is provided … for our customers and embed SRE practices. Be a key player in the response to and troubleshooting of incidents, ensuring rapid resolution and minimising downtime. Participate in blameless postmortems to identify root cause and corrective actions. Develop and maintain playbooks and documentation. About you: Experience in performance monitoring and analysis. Capacity planning experience. Scripting and automation skills, with experience in relevant technologies. Experience …
Fancy being our next SRE Superstar? Site Reliability Engineer (SRE). Sunderland (Hybrid). Full-time. Alright, listen up! Here at Tombola, we're not just about bingo - we're about brilliant tech, seamless experiences, and keeping millions of players happy. And to do that, we need a Site Reliability Engineer who's as excited about rock-solid … working hand-in-hand with our dev, infra, and security teams, making sure we balance exciting new features with unbeatable stability. What you'll be getting up to: System Reliability & Availability Hero: You'll be the guardian of our uptime, making sure our critical systems are always available and hitting those all-important SLAs. You'll also be … tech and better ways of doing things, constantly pushing us to improve system reliability, performance, and efficiency. Sound like a bit of you? If you're an experienced SRE with a passion for building reliable, scalable, and efficient systems, and you love working in a fun, collaborative environment, then we want to hear from you! Ready to join the …
Site Reliability Engineer/SRE. Location: Remote (UK-based) - background checks required. Salary: £45,000 - £55,000 (negotiable). Profectus Recruitment is partnered with a rapidly growing tech business seeking a skilled Site Reliability Engineer to help keep their high-volume systems secure, reliable, and high-performing. The Role: You'll design, maintain, and improve applications and …