London, South East England, United Kingdom Hybrid / WFH Options
Nationwide Building Society
Senior Application Engineer London or Swindon Office Hybrid role - x2 days on site/x3 Work from home Nationwide is leveraging the power of Cloud, DevOps and Agile to bring teams together and create compelling Digital experiences for members of today and tomorrow. At the same time, we’re … solutions Knowledge of Financial services and the design and delivery of Conversational Banking solutions Knowledge or interest in Site Reliability Engineering (SRE) principles Our customer first behaviours put customers and members at the heart of how we work together. They are the set of behaviours that every …
products and services are designed for people to be fearless, to be changemakers. What the role involves: As a Site Reliability Engineer (SRE), you are an integral part of our open-source project, ensuring the reliability, availability, and performance of our production systems. This role combines service … operation, systems engineering, and software engineering principles to operate and monitor services as well as create or maintain tools, automations, and infrastructure code that bolster the efficiency and resilience of our platform. Design, write, and deliver tools and software primarily using Python, Bash, Terraform, or Nix to improve … postmortems. Collaborate with the development teams to ensure that solutions are designed with customer experience, scalability, and performance in mind. Analyze system performance and reliability, offering recommendations for enhancement. Develop and uphold service-level objectives (SLOs), service-level indicators (SLIs), and error budgets for our services. Participate in on …
Performance Optimisation: Lighthouse, caching (Squid Cache, F5 Load Balancer) Security & Compliance: OWASP, UK-GDPR, PCI-DSS Infrastructure & Networking: Site Reliability Engineering (SRE), disaster recovery planning Key Responsibilities: Develop and manage CI/CD pipelines to streamline deployments. Monitor and optimise website performance, ensuring uptime and scalability. Implement … security best practices and compliance policies. Enhance system observability through logging, monitoring, and alerting. Collaborate with engineering teams to drive DevOps best practices. If you're interested, please apply by emailing me with a copy of your most up to date CV and your current availability so I may …
Monitor and analyze security events, investigate incidents, and provide response and remediation support. Collaborate with cross-functional teams to integrate security controls into software engineering, business processes, and IT systems. Stay abreast of emerging threats and technologies, recommending enhancements to the company's security posture. Participate in security audits … Relic, or similar analytics and monitoring tools. Stay informed about modern DevOps concepts, infrastructure automation, cloud services, and Site Reliability Engineering (SRE) best practices. Manage endpoint security including antivirus, patch management, and encryption for devices. Ensure configuration and maintenance of network hardware, firewalls, VPNs, working closely with … ISO 27001, SOC 2, GDPR, PCI DSS). Experience in DevOps with a solid grasp of infrastructure automation, CI/CD, cloud infrastructure, and SRE principles. Proficiency with Datadog, New Relic, or comparable monitoring and analytics platforms. Familiarity with ITSM tools, endpoint management, and asset tracking solutions. Strong leadership, analytical …
leadership - Managing senior-level stakeholder relationships - Leading technical delivery workstreams and supporting their direction - Implementing Agile, DevOps and Site Reliability Engineering (SRE) practices - Forming blended delivery teams with clients and third parties - Mentoring team members and developing programme and project management capabilities - Building and growing the PM … Excellent problem-solving and decision-making skills - Proven success delivering medium to large-scale programmes in complex environments - Strong understanding of Agile, DevOps and SRE - Experience working with Central Government, policing or law enforcement highly advantageous #LI-RJ1 Together, as owners, let's turn meaningful insights into action. Life at …
which pronouns you use (For example: she/her, he/him, they/them, etc). At Bumble, Site Reliability Engineers (SRE) are responsible for ensuring the reliability, scalability and performance of software systems while bridging the gap between development, security and operations. We proactively manage … infrastructure provisioning. Monitor system health and performance, identifying and fixing issues Respond to system outages, troubleshooting root causes and implementing preventative measures Collaborate with engineering teams and security engineers to improve system reliability, security and performance Participate in on-call rotations Create and maintain documentation to improve knowledge … must Proficiency in at least Python or Golang programming languages Experience with CI/CD pipelines Strong proficiency with Kubernetes architecture Prior experience in SRE, System administration or DevOps roles Strong proficiency with Linux/Unix operating systems, including hands-on experience in configuration and troubleshooting Proficiency with using Puppet …
Developer Engineering is a function of the CTO organisation. Our mission is to make it easy and enjoyable for software engineering teams to go from a business idea to delivering an innovative product solution. The main goals are to improve and upgrade our tools, streamline our processes, automate … and strengthen our controls, and help development teams adopt modern working methods. The Team The Codified Controls team within Developer Engineering is revolutionizing how we manage policies, standards, and controls through a company-wide "everything-as-code" initiative. This greenfield program offers a unique opportunity to significantly reduce process … engineers with proven experience in product-oriented environments and a demonstrated ability to empathize with users. The Role We're seeking a technically versatile Engineering Lead to join a new product team. You'll provide technical leadership and strategic direction, shaping the future of our software engineering efforts …
with timely remediation of issues. Administer third-party service agreements (generators, UPS, batteries, etc.) ensuring service levels are met. Drive continuous improvement in infrastructure reliability, energy efficiency, and compliance. Governance, Compliance & Security Develop, implement, and audit security programs including emergency response, crisis management, physical security, and incident handling. Ensure … Required Experience & Skills 15+ years’ experience in telecoms, IT infrastructure, and data centre M&E operations, design, and buildouts. Proven leadership experience managing multi-site critical infrastructure and high-availability environments. Strong knowledge of data centre electrical and mechanical systems, ITIL processes, and facilities management. Experience with ISO standards … Compliance City & Guilds BS7671 – Latest Edition IOSH Managing Safely® Desirable Certifications (ISC)² CISSP PRINCE2 ISO 45001 Internal Auditor Site Reliability Engineering (SRE) qualifications Certified Data Centre Professional (CDCDP, CDCMP, CDCSP, CDCEP) MIET/IWFM Memberships Please apply now for an informal chat …
Site Reliability … Engineer Fully Remote (UK Only) £60,000 – £65,000 We’re working with a fast-growing online retailer that’s hiring its first-ever SRE. This is your chance to take full ownership of reliability across a high-traffic, customer-facing platform – setting the standards, choosing the tools … incident response – with a direct line into the product, engineering, and security teams. The business is investing heavily in performance, uptime, and scalability. SRE is a key part of that strategy. Tech stack includes: AWS, Azure, Docker, Kubernetes, Prometheus, Grafana, Linux, and Cloudflare – but there’s full freedom to …
leading organizations, like Samsung and Toyota, trust MongoDB to build next-generation, AI-powered applications. We are looking for an experienced Lead for our SRE, InfraSec team, to guide the security of our cloud-based infrastructure. As a Lead SRE, you will be very hands-on technically while also directly … adheres to the highest security standards. They build essential security infrastructure and implement controls that reinforce the platform's security posture. This is an SRE team, which means you can expect a highly hands-on approach, tackling the technical challenges of implementing large scale solutions. This team is deeply involved … implement, and manage cloud-native security tools and platforms for endpoint security, identity management (IAM), and CSPM Qualifications: Experience: 7+ years of experience in SRE, infrastructure engineering or similar role, with a strong focus on security work, with ideally 2+ years in a leadership or senior engineering role …
the role: We are looking for a highly capable and experienced Site Reliability Engineer to join our growing tech team. As an SRE you will be a hands-on coach for the development teams maintaining and improving our solutions' reliability. You will be part of our DevOps team … but spend most of your time working closely with the engineering teams. Our ideal candidate will be passionate about best practices within technology teams, fully supportive of what the group is doing, and who wishes to make a difference. Responsibilities: Work with the development teams to build robust and …
London, South East England, United Kingdom Hybrid / WFH Options
Tenth Revolution Group
within creative or product-led software organisations (SMEs preferred). Expert level knowledge of AWS, IaC, Pipelines, and Containers. 10+ years of experience in site reliability engineering, systems engineering, or a related field, with a focus on cloud migration and modernization. Deep hands-on knowledge of … the target architecture and migration roadmap. Hybrid Architecture Design: Create scalable, secure hybrid cloud solutions that integrate on-premise infrastructure with cloud services. Ensure Reliability: Maintain and enhance the reliability and availability of key systems throughout the transformation. Champion Automation & IaC: Promote Infrastructure as Code (e.g., Terraform, CloudFormation …
Shazam Site Reliability Engineers are not just responsible for making sure all services and systems that Shazam relies on are operating at their highest level; they're also responsible for helping development teams embrace these principles … as they develop software. Shazam SREs embed themselves with development teams and act as extensions of those teams to propagate best practices. As an SRE, you'll collaborate with development teams to help them understand the bigger picture of distributed systems, beyond individual components. We are strong believers in ownership … with software engineers being responsible for the code they write. The SRE team helps build the competencies across teams to ensure we build scalable and supportable systems. This role sits in our London office reporting to our Head of SRE. The successful candidate will be assisting multiple development teams based …
Out in Science, Technology, Engineering, and Mathematics
Your Impact As a contributor in the APX SRE organization, you are passionate about delivering solutions to the real-time problems our mission-critical cloud native services encounter. You are also obsessed about achieving the high quality and reliability our customers demand. You will work closely not only with … the APX SRE organization, but your technical deliverables will reach the entire engineering organization to enable product teams to continuously deliver features on the vanguard of innovation. What You'll Do Location: London, England. Build robust, easy-to-use foundational platforms and tools that enable engineering teams to … provision services rapidly, consistently, and securely. Exemplify cloud-native site reliability best practices. Write code that is performant, maintainable, clear, and concise. Employ strong problem-solving skills, with the ability to debug problems in cloud-native distributed systems. Influence and educate the engineering organization to adopt new …
are seeking an experienced Platform Engineering leader with a hands-on engineering background, who can articulate the business benefits that Observability and SRE provide to our clients and take on the responsibility of handling client engagements from both technical and business perspectives. Requirements: We are ideally looking for … someone with a strong background and experience in the following: Observability and SRE Practices: In-depth understanding of observability and Site Reliability Engineering practices. Familiarity with tools in the LGTM stack (Loki, Grafana, Tempo, Mimir) or equivalent observability platforms. Containerisation: Strong experience building and managing containerised applications … We help brands across the globe design and build innovative products, platforms, and digital experiences for the modern world. By integrating experience design, complex engineering, and data expertise, we help our clients imagine what's possible, and accelerate their transition into tomorrow's digital businesses. Headquartered in Silicon Valley …
Blip is a leading tech company focused on software engineering solutions for sports entertainment. We operate at scale. As part of Flutter Entertainment, we play an essential role in the Group's goal of becoming the global leader in online sports betting and iGaming, developing innovative products and platforms … create or leverage). Experience being "on-call" for a service, and familiarity with incident notification tooling (e.g. PagerDuty, Opsgenie). Comprehensive understanding of SRE principles (e.g. working knowledge of the Google SRE book). Demonstrated strength in leading a project in an agile/scrum environment. Thrives in a … maintained a system and culture that supported and implemented SLOs. Has shown to be a thought leader contributing to the broader industry conversation about SRE principles and topics (e.g. speaking at conferences). This is what you should have. What do we have, you ask? Well you can check our …
Site Reliability Engineer - Field Operations London, UK C3 AI (NYSE: AI) is the Enterprise AI application software company. C3 AI delivers a family of fully integrated products including the C3 Agentic AI Platform, an end-to-end platform for developing, deploying, and operating enterprise AI applications, C3 AI … to streamline system updates and upgrades. Set up critical infrastructure, tools, and framework to streamline the deployment cycle. Work cross-functionally with Services and Engineering teams. Qualifications: Bachelor's degree in a Science, Technology, Engineering or Mathematics (STEM), or comparable area of study. Demonstrated experience in deploying, managing …
empowering development teams by creating toolchains, guidelines, and standards. Our focus is on enabling seamless automation and CI/CD, comprehensive observability, and unwavering reliability in a secured cloud-native environment. The Opportunity The Staff Engineer position within the Platform As a Service team offers a compelling opportunity for … utilisation, enhancing fault tolerance, and ensuring the platform's ability to meet evolving demands efficiently and effectively. You provide guidance and mentorship to other SRE team members, helping them to develop their skills and knowledge of best practices in site reliability engineering. You establish and enforce engineering … organization. You collaborate with senior leadership to shape the vision and direction of the company (cloud) infrastructures, and you help drive the development of SRE-specific strategies and initiatives that align with business objectives. You build and maintain strong relationships with stakeholders across the organization, and you represent the SRE …
new priorities, you’ll set the standard. You’ll engage with technical and non-technical customers and have a positive influence on the wider engineering community. With our encouragement to spend up to 30% of your time on development, innovation, and experimentation, you’ll have the freedom to explore … new possibilities for yourself, and for GCHQ. You don’t need to be a Software Engineer to apply; you might be working in Cloud Engineering and Security, UX, Site Reliability Engineering, Front-End Design, Agile, Solution Architecture, Data Engineering, or Machine Learning Operations. You’ll …
As a Senior Site Reliability Engineer at Convera, your role is pivotal in ensuring the stability and resilience of our systems. You'll spearhead our incident management strategy, swiftly identifying and mitigating risks to uphold our service reliability. You will be responsible for: Taking the lead on incident … architecture, deployment processes, and observability practices. Elevating the customer experience as the ultimate benchmark of our reliability standards. Sharing industry best practices in SRE, ensuring our team remains at the forefront of innovation. Facilitating blameless post-mortems, instituting actionable alerts, and streamlining incident management through automation. You should apply … Amazon EKS. Preferred qualifications include: Prior involvement in the Fintech sector or other regulated industries. Familiarity with the Grafana observability stack. Experience in Chaos Engineering methodologies. Your expertise will be instrumental in fortifying our infrastructure and delivering exceptional reliability to our customers. About Convera Convera is the largest …
eDV Site Reliability Engineer Looking for an eDV SRE. Someone with a defence industry specialism with a passion … for creating efficient and secure cloud infrastructure. You will play a critical part in transforming and enhancing both internal and external operations through effective SRE practices. Core Responsibilities Infrastructure Excellence: Design, manage, and evolve our cloud-based infrastructure to support high-traffic applications and seamless service delivery. Secure Deployment: Develop …
As a Senior Site Reliability Engineer at Convera, your role is pivotal in ensuring the stability and resilience of our systems. You'll spearhead our incident management strategy, swiftly identifying and mitigating risks to uphold our service reliability. You will be responsible for: Taking the lead on incident … architecture, deployment processes, and observability practices. Elevating the customer experience as the ultimate benchmark of our reliability standards. Sharing industry best practices in SRE, ensuring our team remains at the forefront of innovation. Facilitating blameless post-mortems, instituting actionable alerts, and streamlining incident management through automation. You should apply … Amazon EKS. Preferred qualifications include: Prior involvement in the Fintech sector or other regulated industries. Familiarity with the Grafana observability stack. Experience in Chaos Engineering methodologies. About Convera Convera is the largest non-bank B2B cross-border payments company in the world. Formerly Western Union Business Solutions, we leverage …