About the opportunity We are seeking a Site Reliability Engineer to join the Platform Engineering domain in the AI Platform team. The mission of Platform Engineering is to provide trusted, performant, self-service platforms that empower product teams to build 'the bank the world loves to use.' The AI Platform team contributes to this mission by creating …
About the opportunity We are seeking a Senior Site Reliability Engineer to join the Platform Engineering Domain in the AI Platform Team. The mission of Platform Engineering is to provide trusted, performant, self-service platforms that empower product teams to build 'the bank the world loves to use.' The AI Platform team contributes to this mission by …
Job Title: Cloud Engineer/SRE - Golang & GitHub Location: Remote - UK, London Salary/Rate: Up to £690 a day Inside IR35 Start Date: August 2025 Job Type: 12 Month Contract Company Introduction: We are seeking a highly skilled Cloud Engineer/SRE with Development experience in Go and GitHub to join our client in the Global Analytical … Risk sector. We are seeking a highly skilled and motivated Cloud Engineer/SRE to join our newly formed Enterprise GitHub Operations & Tooling team. This is a foundational role where you will be instrumental in designing, building, and managing the core services and tooling that underpin our extensive use of GitHub Enterprise. You will be responsible for developing code … deploying, managing) GitHub Actions (designing complex workflows, custom actions) GitHub Enterprise, Organization and Repository settings. Operations/Infrastructure Background: Proven experience in an operations, site reliability engineering (SRE), or infrastructure engineering role, with a strong appreciation for automation and stability. Modern SDLC Practices: Familiarity with: Dependency management. Security remediation processes and secure coding practices. Testing frameworks and methodologies. …
We are seeking an exceptional technology leader to oversee our global site reliability engineering (SRE), DevOps, and Platform Engineering teams. This hands-on engineering leadership role requires someone who can both provide technical vision and build strong stakeholder relationships across the organization. The ideal candidate will bring a combination of deep technical expertise, strategic thinking, and people leadership … Leadership: Serve as a hands-on technical leader who can architect, design, and guide the implementation of highly resilient systems Build a compelling vision and strategic roadmap for our SRE, DevOps, and Platform Engineering functions Establish and evangelize engineering best practices across teams and the wider organization Drive technical innovation while ensuring operational excellence Provide architectural guidance to ensure systems … initiatives, capabilities, and constraints Required Skills & Experience: Extensive experience in engineering leadership roles Strong hands-on technical background in cloud platforms, containerization, and modern DevOps practices Demonstrated experience leading SRE, DevOps, or Platform Engineering teams Deep understanding of system architecture, resilience patterns, and high-availability design Experience developing strategic roadmaps and executing technical vision Proven ability to build and maintain …
deployments as well as accurate health monitoring through all our clients, both new and old. The person in this role will join the Site Reliability Engineering team (SRE). The main role of the SRE team is to facilitate the scalability of Dayshape and allow us to meet the demands of an increasing client base. What you'll … do Lead initiatives to enhance Dayshape's ability to scale our cloud platform Maintain and improve our cloud estate in Azure Improve SRE and other teams' working lives through automation of manual tasks Lead in making the deployment of Dayshape more scalable Increase our knowledge sharing of SRE across the organisation Improve the observability of Dayshape through reporting and tool …
cloud environments Reliability Engineering: Lead initiatives to improve system reliability, establish SLOs, and implement monitoring and alerting strategies Team Leadership: Build, mentor, and grow a high-performing SRE team while fostering a culture of innovation and continuous improvement Incident Management: Establish and optimize incident response processes, lead major incident reviews, and drive systematic improvements Automation Development: Spearhead automation … operations and improve system reliability Performance Optimization: Lead projects to optimize system performance, capacity planning, and cost efficiency Cross-team Collaboration: Work closely with development teams to implement SRE best practices and drive operational excellence Technical Strategy: Develop and execute technical roadmaps aligned with business goals and scaling requirements Security Integration: Ensure security best practices are embedded in infrastructure … service providers Operational Excellence: Drive continuous improvement in operational processes, tooling, and methodologies What you bring to the role: Technical Leadership Experience: 5+ years of experience leading and managing SRE/DevOps teams, with a proven track record of improving system reliability and performance Architectural Vision: Deep understanding of distributed systems, cloud platforms (AWS/GCP/Azure), and …
Site Reliability Engineering/DevOps Engineer Are you enthusiastic about designing and managing cloud platforms? Do you find satisfaction in ensuring the reliability and performance of complex systems? About Team: The LexisNexis Intellectual Property (IP) division provides international patent content and a suite of online and analytic tools that meet the evolving needs of the intellectual … area or product line. It contributes directly to project plans, schedules, and methodologies for implementing cross-functional software assets and infrastructure. Responsibilities include cloud platform design across multiple systems, SRE activities, mentoring less-experienced team members, and collaborating with users, customers, and stakeholders to translate their requirements into effective solutions. Additionally, it focuses on fostering a culture of innovation and … and orchestration tools (e.g., Docker, Kubernetes/EKS). Proficiency in scripting languages (e.g., Python, Bash, TypeScript, PowerShell). Knowledge of networking concepts and security best practices. Familiarity with SRE activities and best practices. Familiarity with DevOps practices and tools. Experience with monitoring and logging tools (e.g., DataDog, Coralogix, AWS CloudWatch, Azure Monitor). Excellent problem-solving and stakeholder management …
Cambourne, Cambridgeshire, United Kingdom Hybrid / WFH Options
Remotestar
to gemstone supplies. They have a presence in London, Hong Kong, and Amsterdam, as well as in Mumbai, and now in New York in 2001. About the role: As the SRE Manager, you will play a critical role in ensuring the reliability, scalability, and performance of our infrastructure and services through both direct technical contribution along with team building and … tooling. Drive automation initiatives to streamline operational workflows and improve efficiency. Develop and maintain tools, scripts, and dashboards to monitor system health, performance, and reliability. Build a first-class SRE team. Through a combination of leading by example, coaching and mentoring, mould the team you would want to have around you. Provide leadership and guidance to the SRE team, fostering a … culture of collaboration, innovation, and continuous improvement. RESPONSIBILITIES: Proven experience in a senior or lead SRE role, with a strong track record of building and maintaining highly reliable infrastructure and services. Expertise in incident management, including incident response, resolution, and post-mortem analysis. Proficiency in monitoring, alerting, and observability tools such as Prometheus, Grafana, ELK stack or Datadog. Experience with …
developer experience to go with it. The tools used on the team include Elixir, Phoenix, Kubernetes and Google Cloud Platform. Site Reliability Engineering at Duffel As an SRE at Duffel, you'll be part of a small team within engineering that is responsible for the reliability, performance, and resilience of our infrastructure and applications. You will be … silently drop spans. - An enthusiasm for both software development and systems engineering. - A high bar for code and configuration quality and readability. - A good understanding of current observability and reliability practices. - Experienced and comfortable in running incident response. - Big picture thinking - you can make trade-offs on technical work streams against business impact. - Fantastic communication skills. You're able … We manage a data pipeline using Pub/Sub, Airbyte, and dbt. Our Current Focus We're currently driving a big shift in how we think about and monitor reliability across the engineering organisation, with a focus on early detection of customer-impacting issues. We're extending and standardising our use of OpenTelemetry, and introducing Honeycomb as the single …
impact. We value continuous learning, personal growth, and providing our team with resources to succeed. Ready to shape the future? Let's talk. We're looking for a seasoned SRE with a front-end focus, expert in React applications, to join our SRE team. In this role, you'll ensure the reliability, performance, and operability of our React-based … invalidation, HTTP caching headers) to reduce latency and origin load. Collaborate with UX teams to balance feature richness with performance targets. Collaboration & Knowledge Sharing Serve as the React/SRE subject-matter expert: mentor engineers on best practices for building resilient front-ends. Produce and maintain runbooks, debugging guides, and incident-playbooks specific to client-side failures. Partner closely with … wider backend SRE, DevOps, and product teams to ensure end-to-end reliability. Enhanced leave - 38 days inclusive of 8 UK Public Holidays. Private Health Care including family cover. Life Assurance - 5x salary. Flexible working - work from home and/or in our London Office. Employee Assistance Program. Company Pension (Salary Sacrifice options available). Access to training and development. …
Our client is looking for a number of Principal Site Reliability Engineers to join their team on an initial six-month contract, working a couple of days a week onsite in Wokingham and the rest remotely. This role is Inside IR35 and requires a candidate with an active SC clearance. Key Responsibilities Lead and drive platform-first initiatives to … improve scalability, reliability, and performance. Design, build, and maintain resilient infrastructure supporting distributed systems. Implement monitoring and alerting systems to ensure high availability and performance. Collaborate with engineering teams to enhance system reliability and mitigate risks. Develop and maintain CI/CD pipelines for seamless deployment and release management. Continuously evaluate and recommend improvements to platform infrastructure and … to appointment which can take up to a minimum 10 weeks. LA International is a HMG approved ICT Recruitment and Project Solutions Consultancy, operating globally from the largest single site in the UK as an IT Consultancy or as an Employment Business & Agency depending upon the precise nature of the work, for security cleared jobs or non-clearance vacancies …
Are you a passionate Software Engineer looking for an exciting new challenge? Join this team and transition into maintaining and enhancing the reliability of one of the world's largest platforms. In this role, you will utilise your expertise in Golang coding to develop robust applications, ensuring the systems remain resilient, scalable, and efficient. If you thrive in … presence and commitment to innovation, you will have the opportunity to work on projects that reach millions of users, making a real difference in the tech world. As a Site Reliability Engineer, you will be responsible for designing, developing, and maintaining systems and applications using Golang. You will monitor and optimise system performance with tools such as … Grafana, Prometheus, New Relic, and Splunk. Your role will involve identifying and resolving reliability issues, automating processes, and ensuring the seamless operation of the platform. If you have a passion for technology and a drive to ensure excellence, we would love to hear from you …
Our client is looking for a number of hands-on Site Reliability Engineers to join their team on an initial six-month contract, working a couple of days a week onsite in Wokingham and the rest remotely. This role is Inside IR35 and requires a candidate with an active SC clearance. Key Responsibilities Detect and mitigate system issues to … to appointment which can take up to a minimum 10 weeks. LA International is a HMG approved ICT Recruitment and Project Solutions Consultancy, operating globally from the largest single site in the UK as an IT Consultancy or as an Employment Business & Agency depending upon the precise nature of the work, for security cleared jobs or non-clearance vacancies …
MySQL, Vue.js, and AWS. Participating in an on-call roster is required as part of this role. This is a hybrid role with 2 days in the office. Senior SRE Position We are seeking a Senior SRE with experience of working with scaled SaaS production infrastructure. The successful candidate will work as part of a team focused on site reliability, security, and scalability as we manage our rapid growth. Monitoring the above environments and reacting to alerts and issues that may arise in day-to-day operation of their product line. They will participate in an on-call rota for priority-1 level health, security, stability, and uptime of production, staging, and development environments. …
You'll Do Location: London, England Build robust, easy-to-use foundational platforms and tools that enable engineering teams to provision services rapidly, consistently, and securely. Exemplify cloud-native site reliability best practices. Write code that is performant, maintainable, clear, and concise. Employ strong problem-solving skills, with the ability to debug problems in cloud-native distributed systems. …
along the way! Job Summary We have built Curve Dental into an industry-leading provider of beautiful cloud software for the dental industry. Who We're Looking For Our Site Reliability Engineers (SREs) are passionate about automation and its power to streamline the deployment and operation of software. They collaborate closely with developers to support a wide range …
no wonder that leading organizations, like Samsung and Toyota, trust MongoDB to build next-generation, AI-powered applications. We are looking for an experienced Staff Engineer for our SRE, InfraSec team, to guide the security of our cloud-based infrastructure. As a Staff SRE, you will be very hands-on technically while also mentoring a small team of SREs. … to ensure that our infrastructure adheres to the highest security standards. They build essential security infrastructure and implement controls that reinforce the platform's security posture. This is an SRE team, which means you can expect a highly hands-on approach, tackling the technical challenges of implementing large scale solutions. This team is deeply involved in the technical aspects of … monitoring and anomaly detection. Security Tooling: Evaluate, implement, and manage cloud-native security tools and platforms for endpoint security, identity management (IAM), and CSPM. Qualifications: Experience: 7+ years in SRE, infrastructure engineering or similar, with a strong focus on security, including 2+ years in a senior or staff engineering role. Security Mindset: Deep understanding of cloud environment security, from OS …
Senior Software Engineer/SRE - Application Middleware Location London Business Area Engineering and CTO Ref # Description & Requirements Are you passionate about building high-performance systems that are fast, resilient, and operate at global scale? Join Bloomberg's Application Middleware SRE team, where you'll combine software engineering and systems expertise to keep the backbone of the Bloomberg Terminal … running smoothly for hundreds of thousands of users around the world. We're not your typical SRE team. We're embedded in a group that powers real-time connectivity, and we own systems where uptime isn't just important - it's essential to the global financial system. This is your opportunity to engineer resilience at scale, automate critical infrastructure … and shape reliability practices across one of the world's most powerful tech platforms. The Team We're the Site Reliability Engineering team within Bloomberg's Application Middleware group. Our mission: ensure that Bloomberg's core connectivity and messaging layers are resilient, scalable, and fully observable. We own systems that operate at high throughput and low latency …
top AI computing platform. We equip engineers with the tools to deploy AI that is fast, secure, affordable, and built to scale. Whether they need powerhouse GPU hardware on-site or the flexibility of cloud-based solutions, we've got the horsepower to make it happen. Lambda's AI Cloud has been adopted by the world's leading companies … performance through the use of network engineering and other applicable technologies Help with deploying and maintaining network monitoring and management tools You Have 5+ years of experience being SWE, SRE or Network Reliability Engineering Been part of the implementation of production-scale networking projects Experience being on-call and incident response management Have experience building and maintaining Software Defined …
We are seeking a skilled Azure Cloud DevOps Engineer to join our team. The ideal candidate will have a strong background in DevOps practices, cloud solutions, and network engineering in Microsoft Azure. This role involves maintaining and developing a cloud environment that hosts mission critical financial services applications used across Australia and New Zealand. This role is pivotal for … in Computer Science, Information Technology, or a related field. At least one of the below certifications: Microsoft Certified: Azure Administrator Associate Microsoft Certified: Azure Developer Associate Microsoft Certified: DevOps Engineer Expert Microsoft Certified: Azure Network Engineer Associate Cisco Certified Network Associate (CCNA) Additional Information What We Offer Hybrid work model 20 days of annual leave Comprehensive medical and … countries, FORTUNE Best Companies to work and Glassdoor Best Places to Work (globally 4.4 Stars) to name a few. Check out Experian Life on social or our Careers Site to understand why. Experian is proud to be an Equal Opportunity and Affirmative Action employer. Innovation is a critical part of Experian's DNA and practices, and our diverse workforce …
The role As a Site Reliability Engineer you will focus on improving stability and security aspects of the technical stack of Quorso by: Owning monitoring and logging integrations, as well as alerting capabilities by improving and automating currently manual processes Identifying and logging discovered performance and security related issues Working on remediation for the discovered issues related to backend and infrastructural layers, as well as … Stores technology simplifies retailers' data into daily Next Best Actions ("Missions") for every store, guaranteed to engage teams and drive sales. We're an Enterprise platform, targeting large multi-site retailers. We're growing fast with some of the largest retailers in the world already using Quorso to react faster and become more Agile in the face of a … our investors include CEOs and Chairpersons of a number of the 100 largest companies in the world. Requirements Experience of working in the role or as backend/devops engineer for at least 4 years on projects using Ruby, SQL and Kubernetes Ability to quickly learn platform stack and staying up to date with ongoing development Experience in proactive implementation …
shaping the future of AI. Together, we can make a meaningful impact. See more about our culture on . About The Job Mistral AI is seeking an Applied AI Engineer focused on DevOps to facilitate the adoption of its products among customers and collaborate with them to address complex technical challenges. Applied AI Engineers, ML Infra at Mistral AI … in English • You hold a Bachelor's or Master's degree in Computer Science, Engineering, or a related field • You have 2+ years of experience in a DevOps or Site Reliability Engineering role • You're experienced with deploying and managing AI-based products in production environments • You are fluent in Python • You have experience with containerization technologies such … You hold strong communication skills with an ability to explain complex technical concepts in simple terms to technical and non-technical audiences Ideally you have: • Experience as a Customer Engineer, Forward Deployed Engineer, Sales Engineer, Solutions Architect, or Technical Product Manager • Familiarity with AI frameworks such as PyTorch or TensorFlow • Contributions to open-source projects, particularly in …
Wokingham, England, United Kingdom Hybrid / WFH Options
eTeam
We are a Global Recruitment specialist that provides support to the clients across EMEA, APAC, US and Canada. We have an excellent job opportunity for you. Role Title: Principal SRE Location: Wokingham (Reading). Hybrid, 60% remote and 40% onsite Duration: Until 30/01/2026 Rate: £580 per day Inside IR35 through an Umbrella Company Contractor Must … Hold Active SC Clearance Role Description: Key Responsibilities: Lead and drive platform-first initiatives to improve scalability, reliability, and performance. Design, build, and maintain resilient infrastructure supporting distributed systems. Implement monitoring and alerting systems to ensure high availability and performance. Collaborate with engineering teams to enhance system reliability and mitigate risks. Develop and maintain CI/CD pipelines …