Site Reliability Engineer, ML Infrastructure, Large Models SRE. Google, London, UK. Mid: experience driving progress, solving problems, and mentoring more junior team members; deeper expertise and applied knowledge within the relevant area. Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience. 5 years … Models/Machine Learning tooling and infrastructure. Experience in automation, monitoring, and incident response. Experience in C++, Java, Python, or Go. Understanding of Site Reliability Engineering (SRE) principles and best practices. Excellent communication, project, and stakeholder management skills. About the job: Site Reliability Engineering (SRE) combines software and systems engineering to build and … run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google's services, both our internally critical and our externally visible systems, have reliability and uptime appropriate to users' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance. Much of our software development focuses on optimizing …
reliability of all cloud systems while keeping levels of manual work low. SREs are expected to be experienced in software engineering principles, operational discipline, and automation. The SRE team works on a fully remote basis, in conjunction with the company's US and Australian teams. This company is a market leader in student community management software … ensure high availability and performance. Collaborate with product engineering teams to design and build fit-for-purpose, observable software. Required Skills and Experience: Proven experience in an SRE/DevOps/Platform Engineering role, having previously worked in a software engineering role in .NET and C#, Java, or a similar OO development language. Proficiency in … and this job is part of a large program of change and improvement in their Cloud SaaS products over the coming years. If you are looking for an interesting SRE role with a forward-thinking global organisation, this would be a tremendous career opportunity to consider. Please apply with your CV to find out more.
Nottingham, Nottinghamshire, United Kingdom Hybrid / WFH Options
Capital One (Europe) plc
Nottingham, Nottinghamshire. Senior Software Development Engineer - Site Reliability. About the Role: We're looking for a Senior Engineer to join our Site Reliability Engineering (SRE) team. This role is ideal for a skilled Java engineer with a passion for understanding how complex systems work, analysing performance, and applying engineering solutions to make them more … efficient, stable, and scalable. You'll lead on planning and implementing key SRE initiatives, optimise and automate how our systems operate, and improve observability through better monitoring and logging. You'll also work closely with your peers to drive consistency and high standards across SRE and the wider engineering community, so a real enthusiasm for influencing others and leading … to reduce operational overheads through observability and service automation. Drive engineering best practice (e.g., Operational Excellence, Security, Quality, Resilience) and set standards across the team and wider SRE community. Innovate within your team and contribute within your technical domain. Deliver key pieces of intent from inception through to design and hands-on delivery, in collaboration with your SREM.
Milton Keynes, Buckinghamshire, England, United Kingdom
Noir
Site Reliability Engineer (SRE) - Market-leading company - Milton Keynes (Tech stack: .NET, C#, ASP.NET Core, SQL Server, PowerShell, Azure CLI, Bash, Azure DevOps, Jenkins, GitHub Actions, Docker, Kubernetes). Help shape the tech future of a UK market leader! Backed by a major financial institution with soaring profits, my client is modernising platforms, embracing AI, and driving automation at scale. … We're hiring a Lead Site Reliability Engineer (SRE) to drive reliability, observability, and performance across our Azure cloud infrastructure. You'll work in a modern engineering environment where we live by "you build it, you run it", focused on automation, scale, and resilience. Tech stack you'll work with: .NET, C#, ASP.NET Core, SQL Server … PowerShell, Azure CLI, Bash, Azure DevOps, Jenkins, GitHub Actions, Docker, Kubernetes. We want to hear from you if: as a Site Reliability Engineer (SRE) you've delivered scalable systems using .NET, C#, and ASP.NET Core, with real-world experience managing production workloads; you've automated operations using PowerShell, Azure CLI, and Bash to reduce toil and boost efficiency.
Prestigious opportunity with a Global Investment Giant for a Site Reliability Engineering (SRE) Manager to be based in our Manchester HQ, leading a talented team of engineers dedicated to maintaining and enhancing the reliability of our systems. Working closely with cross-functional teams across the globe, including business stakeholders, product managers, and software engineers, you will ensure … role has an opportunity to provide strategic guidance on improvements. At the forefront of providing production support services, including incident logging, incident resolution, problem management, change management practices, and SRE support, we are inviting you to join our success story. As our Site Reliability Engineering Manager you will: Lead, coach, and develop a high-performing SRE team. Foster … for incident response, root cause analysis, and post-mortem reviews to prevent future incidents. Work closely with business and technology teams to understand their needs and ensure alignment with reliability and uptime goals. Facilitate communication and collaboration across global teams. Drive the development and adoption of automation tools to improve efficiency and reduce manual intervention. Establish and maintain comprehensive …
automotive software development. The right candidate will have excellent communication skills, solid coding skills, expertise in building scalable, reliable, highly available, and fault-tolerant systems, and broad knowledge of software engineering and site reliability engineering in areas such as Large-Scale Data and Compute Infrastructure, Stream Processing, Kubernetes, High-Performance Networking, Observability, and Infrastructure Automation. RESPONSIBILITIES: Set … maintain, optimize, and support large-scale, multi-region, multi-cloud compute and storage infrastructure powering our data platform and mission-critical services. Work with fellow Data Infrastructure engineers and Site Reliability engineers to ensure our systems are scalable, reliable, fault-tolerant, highly available, highly performant, and observable. Manage incidents, triage product or system issues, and debug/track …/resolve them by analyzing the root cause of these issues and the impact on users and operations. Work closely with other Data Infrastructure engineers, Site Reliability engineers, ML Platform engineers, and Computer Vision and ML engineers on high-impact projects to create innovative solutions to problems in the self-driving space. Mentor junior engineers in their day-to-day work …
Site Reliability Engineer (SRE) Manager - Apple Services Engineering. London, England, United Kingdom. Software and Services. Description: Apple Service Engineering (ASE)'s Compute team is seeking a highly motivated individual with strong technical and communication skills to join us on our quest to build and enhance massive clusters hosting Virtual Machines, Containers and associated infrastructure that can … engage with the upstream community to drive Apple's requirements. Ultimately, you will help build the platform that delivers our applications at scale to our end users. As a Compute Site Reliability Engineering manager, you will be leading a team responsible for providing the platform for mission-critical cloud systems to maintain constant uptime, scale seamlessly, and allow … for new applications and services to flourish. Minimum Qualifications: Extensive Leadership in Cloud Computing: in-depth experience building and leading high-performing engineering teams, with a deep focus on cloud computing and hands-on experience across public and/or private cloud environments. Large-Scale Infrastructure Management: proven ability to manage enterprise services in large-scale *nix environments and …
Technical Specialist - Site Reliability Engineer. Locations: Gurgaon Office, FIL Bengaluru Office. Time type: Full time. Posted 2 days ago. End date: 16 September 2025 (24 days left to apply). Job requisition ID: J60278. About the Opportunity: Job Type: Permanent. Application Deadline: 16 September 2025. Job Description: Title Technical … analysts and investment operations staff in all international locations, including Canada, London, Hong Kong and Tokyo. About your role: We are seeking a talented Site Reliability Engineer (SRE) to join our Technology team supporting critical applications within ISS Production Services. This role blends traditional software engineering practices with reliability-focused operations, aiming to enhance the … scalability, availability, and performance of client- and market-facing applications. The SRE will work directly with application development, architecture, DevOps, and business teams to ensure systems are designed and maintained with reliability and performance in mind, while meeting the demanding requirements of financial services operations. About you: Define and manage SLOs, SLIs, and error budgets aligned with business goals.
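The listing above asks candidates to define and manage SLOs, SLIs, and error budgets. As a rough illustration only (not part of the original ad, and not any employer's tooling), a request-based error budget is the fraction of requests the SLO allows to fail, multiplied by the traffic in the window. The sketch below assumes a simple good/total request-count SLI; `RequestSLO`, `availability_sli`, `error_budget`, and `budget_remaining` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RequestSLO:
    """Hypothetical request-based SLO over a fixed window (e.g. 30 days)."""
    target: float        # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int  # requests served in the window


def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that succeeded."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return good_requests / total_requests


def error_budget(slo: RequestSLO) -> float:
    """Number of failed requests the SLO tolerates over the window."""
    return (1.0 - slo.target) * slo.total_requests


def budget_remaining(slo: RequestSLO, failed_requests: int) -> float:
    """Fraction of the error budget left (negative once it is overspent)."""
    budget = error_budget(slo)
    if budget <= 0:
        raise ValueError("a 100% target leaves no error budget")
    return (budget - failed_requests) / budget


if __name__ == "__main__":
    slo = RequestSLO(target=0.999, total_requests=1_000_000)
    print(round(error_budget(slo)))               # 1000 failures allowed
    print(round(budget_remaining(slo, 250), 2))   # 0.75 of the budget left
```

In common SRE practice an exhausted budget is treated as a signal to prioritise reliability work over new releases; the exact policy is a team decision the listing does not specify.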
Stoke-On-Trent, England, United Kingdom Hybrid / WFH Options
Click Dealer
Lead Site Reliability Engineer. Location: Remote working, with 1 day every 2 weeks at our Stoke-on-Trent office (5 mins from the station). Salary: £Competitive + company benefits (full-time/permanent role). About Click Dealer: At Click Dealer, we're passionate about building software and digital tools that make life easier for automotive dealerships - driving … of companies backed by the global investment firm Carlyle Group. Today, we're proud to be trusted by over 1,800 independent and franchise dealerships across the UK. The SRE Team: We count on our site reliability engineers (SREs) to empower users with a rich feature set, high availability, and stellar performance levels to pursue their goals. About … the role: This is a hands-on role. Working as part of the Software Engineering team, you will run the SRE function on a day-to-day basis, working closely with the Head of Engineering to provide SRE expertise whilst collaborating with cross-functional teams to develop real-world solutions and positive user experiences. The successful candidate will review …
sector. This is initially a 12-month contract with the potential to extend, and will be a hybrid role based in London. Our client is seeking an experienced DevOps Site Reliability Engineer to help build, maintain, and evolve modern technology platforms that deliver reliable, scalable, and secure services. This is a hands-on technical role that combines full … stack development, cloud engineering, and operational excellence, with a strong emphasis on automation, performance, and continuous improvement. You will work closely with development teams, infrastructure engineers, and business stakeholders to design, implement, and support resilient systems capable of adapting to rapidly changing requirements. This role is perfect for someone who enjoys solving complex problems, driving technical innovation, and collaborating … consistency, and reliability. Implement advanced monitoring, alerting, and self-healing capabilities to ensure high system availability. Partner with development teams to integrate Site Reliability Engineering (SRE) best practices into all stages of the software lifecycle. Troubleshoot complex production issues, perform root cause analysis, and implement permanent solutions. Lead and contribute to continuous improvement initiatives across infrastructure …
Halifax, Yorkshire, United Kingdom Hybrid / WFH Options
Lloyds Bank plc
Google Product Site Reliability Engineer. Locations: Halifax Trinity Road. Time type: Full time. Posted yesterday. End Date: September 5, 2025 (13 days left to apply). Job requisition ID: 140613. End Date: Thursday 04 September 2025. Salary Range … support flexible working. Flexible Working Options: Hybrid Working, Job Share. Job Description: JOB TITLE: Google Product Site Reliability Engineer. SALARY: £70,929 - £78,810 (Halifax), £81,999 - £91,110 (London). LOCATION(S): Halifax or London. HOURS: Full-time - 35 hours per week. WORKING PATTERN: Our … Experience in automating/scripting to remove toil. It would be great if you also had: direct experience in cloud engineering, with understanding and experience of SRE principles and practice. You'll be able to demonstrate: Ability to work with architectural, business, and other engineers to shape, design, and engineer solutions. Ability to work with a team …
Has anyone actually ever given you a good description of what SRE is? Recently I've met dozens of companies implementing an SRE function. Half are just rebranding an ops team (because Ops ain't cool), some don't want to call the additional silo they have created 'DevOps' (because apparently that's the wrong thing to do) so they … re calling it SRE, and the rest don't really know how to describe what they're doing. And if you can't describe it simply, you don't know what it is, chief ("because Google do it" isn't the right answer). That was until today, when I met a company who actually whiteboarded their vision … process rather than the build. We discussed Kubernetes, Prometheus and API Gateways. Most importantly, they spoke like they knew what the hell they were on about. Not just about SRE, but about the whole engineering process. This is a company at the top of their game, who are about to introduce a brand new monetisation model to a …
Site Reliability Engineer where you'll spearhead the evolution of our digital landscape, driving innovation and excellence. As a Microsoft SQL Database Site Reliability Engineer (SRE) at Barclays, you will assume a key technical role. You will help shape the direction of our database administration, ensuring our technological approaches are innovative and aligned with the … Bank's business goals. You will drive high-impact projects to completion, collaborate with management, and implement SRE practices using software engineering and database administration to address infrastructure and operational challenges at scale. As part of the Database SRE team, you will be data-driven and work to eliminate toil through simplification, automation, and observability, thereby enhancing the reliability … have experience with: Technical specialisation with MS SQL expertise on version - SQL for complex database-related issues, from availability to tuning to architecture, at enterprise scale. Contribute to shaping and designing the SRE practice for the MSSQL offering, delivered through the SRE team. Serve as the technical escalation point for complex database-related issues, providing expert solutions. Assist in the establishment and evolution of the SRE function and …
London, South East, England, United Kingdom Hybrid / WFH Options
Rise Technical Recruitment Limited
Senior Site Reliability Engineer, London - Hybrid. £80,000 - £90,000 + 38 Days Holiday + Private Healthcare + Life Assurance + Flexible Working + Pension. Excellent opportunity for a Site Reliability Engineer to join a forward-thinking and high-growth technology company offering a hybrid work environment, great benefits, and opportunities for further progression! This company operates … performance. With a strong culture rooted in integrity, creativity, and technical excellence, they've become a trusted partner across global industries. In this role you'll take ownership of platform reliability, resilience engineering, and incident management across cutting-edge cloud infrastructure. You'll play a key role in ensuring uptime, performance, and continuous improvement of core systems. The ideal … and conduct chaos engineering experiments
* Monitor and maintain Kafka clusters for performance and reliability
* Respond to and resolve application-level production incidents
The Person:
* 5+ years in SRE, DevOps, or infrastructure engineering
* Strong experience with AWS, EKS/Kubernetes, and Terraform
* Familiar with Kafka and observability tools like Datadog or Grafana
* Able to troubleshoot issues across infrastructure …
City of London, London, United Kingdom Hybrid / WFH Options
Rise Technical Recruitment
Employment Type: Permanent
Salary: £80,000 - £90,000/annum + 38 Days Holiday, Healthcare, Pension
Join us as an Oracle Site Reliability Engineer to help us build and maintain resilient, high-performing systems in a fast-paced financial services environment. If you're passionate about automation, observability, and continuous improvement, we'd love to hear from you. To be successful as an Oracle Site Reliability Engineer, you should have experience with … Significant experience in Site Reliability Engineering, DevOps, Oracle tuning, or infrastructure, ideally gained over several years in financial services or investment banking. Good experience with Oracle databases, performance tuning, SQL tuning, and shell scripting (e.g., Bash). Proficiency in scripting languages such as Python or Bash to automate workflows and reduce manual effort. Some other highly …
Vacancy for Snr Site Reliability Engineer (SRE) at Preservica. Abingdon/Remote, UK. About You: You have a proven track record in DevOps and software development, with a passion for creating reliable solutions to deploy software at scale and speed. You are eager to challenge the status quo, learn, and adopt new technologies. Excellent communication skills across all … Our team is small but growing, so self-motivation, organization, and the ability to multitask and prioritize are crucial. The Role: Serve as a primary visionary for DevOps/Site Reliability Engineering across the entire technology organization. Eliminate process bottlenecks to enable frictionless, reliable, and high-velocity feature development through automation of Build, Test, Deploy, and Operate …
Senior Site Reliability Engineer. At UnlikelyAI, we are building the future of AI: one that is reliable, accurate and transparent. Our neurosymbolic technology harnesses the power of LLMs and generative AI, and combines it with classical symbolic technology to produce hallucination-resistant artificial intelligence for high-trust applications. To support our rapidly increasing commercial momentum, we're looking … for an experienced and pragmatic site reliability engineer to join our exceptional team. This role is ideal for someone who has successfully scaled systems from prototype to production and enjoys working in cross-functional teams to champion cloud-native engineering. We are looking for someone with the experience and expertise to define, and own, our approach to building … for reliability and security as first-class citizens. This is a strategically important role for our technology team, as we rapidly approach full production in multiple projects. You'll work on a range of customer-facing and internal infrastructure projects, applying your engineering skills to solve complex reliability and scalability challenges. Your ability to build robust …
Fancy being our next SRE Superstar? Site Reliability Engineer (SRE), Sunderland (Hybrid), Full-time. Alright, listen up! Here at Tombola, we're not just about bingo - we're about brilliant tech, seamless experiences, and keeping millions of players happy. And to do that, we need a Site Reliability Engineer who's as excited about rock-solid … working hand-in-hand with our dev, infra, and security teams, making sure we balance exciting new features with unbeatable stability. What you'll be getting up to: System Reliability & Availability Hero: You'll be the guardian of our uptime, making sure our critical systems are always available and hitting those all-important SLAs. You'll also be … tech and better ways of doing things, constantly pushing us to improve system reliability, performance, and efficiency. Sound like you? If you're an experienced SRE with a passion for building reliable, scalable, and efficient systems, and you love working in a fun, collaborative environment, then we want to hear from you! Ready to join the …
Site Reliability Engineer/SRE. Location: Remote (UK-based) - background checks required. Salary: £45,000 - £55,000 (negotiable). Profectus Recruitment is partnered with a rapidly growing tech business seeking a skilled Site Reliability Engineer to help keep their high-volume systems secure, reliable, and high-performing. The Role: You'll design, maintain, and improve applications and …
customer's systems are built and maintained. This role blends operational product support with software engineering to create applications that help us understand the overall health of our systems. The SRE team sits within a wider programme at the core of the customer mission. The role holder: As an SRE, fundamentally you will be doing work that has historically been done … expertise to substitute automation for human labour, with the objective of limiting traditional manual operations work (incident tickets, on-call, etc.) to no more than half of the SRE team's time (and aiming for considerably less). You will have an enthusiasm to learn and experiment, to develop tools to understand application health and improve their reliability … enable them to be scalable and resilient to failure, and how to get the best out of the infrastructure they are deployed to. Participating in the wider DevOps/SRE community within the organisation. Competencies: It is desirable for you to have experience in the areas below. However, more valued for this role is that you have excitement and enthusiasm …
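The listing above caps manual operations work at no more than half of the SRE team's time. As a hypothetical illustration (the per-engineer `(toil_hours, total_hours)` timesheet format and the names `team_toil_fraction` and `over_toil_cap` are assumptions, not the employer's process), the check reduces to summing logged hours across the team and comparing the ratio with the cap.

```python
from typing import Iterable, Tuple

TOIL_CAP = 0.5  # "no more than half of the SRE team's time" (from the listing)


def team_toil_fraction(timesheets: Iterable[Tuple[float, float]]) -> float:
    """Share of team hours spent on toil.

    timesheets: hypothetical (toil_hours, total_hours) pairs, one per engineer.
    """
    toil = total = 0.0
    for toil_hours, total_hours in timesheets:
        toil += toil_hours
        total += total_hours
    if total <= 0:
        raise ValueError("no working hours recorded")
    return toil / total


def over_toil_cap(timesheets: Iterable[Tuple[float, float]],
                  cap: float = TOIL_CAP) -> bool:
    """True when the team's toil share exceeds the cap for the period."""
    return team_toil_fraction(timesheets) > cap


if __name__ == "__main__":
    week = [(10, 40), (30, 40)]           # two engineers, 40 of 80 hours on toil
    print(team_toil_fraction(week))       # 0.5
    print(over_toil_cap(week))            # False: exactly at the cap
    print(over_toil_cap([(25, 40), (30, 40)]))  # True: 55 of 80 hours
```

Aggregating hours across the team, rather than averaging per-engineer ratios, matches the listing's wording about the team's time; a per-engineer view would be a separate design choice.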
Site Reliability Engineer with a strong focus on leadership and team management. Around 70% of this role is about building, mentoring and directing a high-performing SRE team, setting strategy and driving operational excellence. The remaining 30% will be hands-on involvement in AWS-based platforms, automation and performance tuning. Key Responsibilities: Lead and develop a team … of SRE engineers, setting priorities, providing coaching, and creating a culture of reliability and continuous improvement. Define and own SRE strategy, standards, and ways of working across the organisation. Collaborate with engineering, operations, and product teams to ensure seamless delivery and robust systems. Oversee system reliability, availability, and performance across large, business-critical platforms. Provide technical guidance … GitLab, Concourse) and ensure AWS platforms meet operational best practice. Produce regular reporting and communicate clearly with senior stakeholders. Key Requirements: Strong experience managing or leading engineering/SRE/DevOps teams in a complex environment. Track record of mentoring, coaching, and growing technical teams. Excellent stakeholder engagement skills, with the ability to influence at all levels. Broad technical …