Manchester, England, United Kingdom Hybrid / WFH Options
Women's Engineering Society
You’ll contribute to the architecture and design of new and existing systems, establish best working practices, and deliver high-quality software products. With your knowledge of various software engineering methodologies, you’ll bring fresh ideas and approaches that have a real impact at the heart of our mission to keep the UK safe in the real world, and … role with plenty of opportunities to develop yourself and others. You might be reviewing pull requests, defining review, branching, and deployment strategies, or working with a range of software engineering frameworks. You operate at a deep technical level, leveraging your familiarity with languages and runtimes such as JavaScript, Java, C++, Node.js, Python, Rust, Go, and .NET. Importantly, you’ll bring a … genuine excitement for discovering new software engineering techniques. You are part of a wider network of peers keen to share experiences, collaborate on projects, and learn from each other. With your experience, you set the standard, share innovative ways of working, and identify new priorities. You might lead and mentor a team or be the technical expert within a …
London, England, United Kingdom Hybrid / WFH Options
BAE Systems Digital Intelligence
to complex challenges as part of a team who help keep the UK safe? Join BAE Systems as an experienced DevOps Engineer. As a key member of a Software Engineering team, you’ll be working with our National Security Customers to build systems that support their core mission capabilities. You’ll work as part of empowered, autonomous DevOps teams … our customer organisations. You will work in a small team given as much ownership and responsibility as you have the appetite for, but be part of a much bigger Engineering community to give you the support you need to grow in your career. We fully embrace DevOps ways of working in our teams, and build a very broad range … an organisation that makes a huge impact on the security of the UK. About you. You will have many of the following: experience working in a similar DevOps/SRE/Infrastructure role; an appreciation of Infrastructure as Code and CI/CD tooling; an understanding of live service and how to support critical business systems; scripting abilities with languages …
Cambridge, England, United Kingdom Hybrid / WFH Options
Arm
Job Overview: We are building a modern, cloud-native compute orchestration platform to support large-scale, compute-intensive engineering workloads. As a Senior Software Engineer, you will play a key role in designing and delivering a highly scalable, reliable, and observable system, with a particular focus on software development and performance testing. This role is … Design, implement, and maintain core components of the platform using cloud-native technologies. Lead efforts around performance benchmarking, load testing, and scalability validation. Define and enforce SLAs; work with SRE/DevOps to ensure high availability and observability. Tune platform performance under high-throughput workloads and lead capacity planning. Automate and execute stress/load tests using both synthetic and …
London, England, United Kingdom Hybrid / WFH Options
Rollbar, Inc
Our Senior Backend Engineer will be an integral part of our EMEA engineering teams. This role is based remotely as a full-time employee in the UK, Ireland, Estonia, the Netherlands, Sweden, and Spain. We are also open to contractors in Eastern Europe and Portugal. Who We Are: DoiT is a global technology company that works with cloud-driven organizations … Microsoft Azure, we work alongside more than 4,000 customers worldwide. About DoiT's PerfectScale Platform: DoiT offers PerfectScale, a pioneering Kubernetes optimization and management solution that empowers DevOps, SRE, and Platform Engineering teams to optimize cloud performance while minimizing costs. We combine advanced AI technology with human subject-matter expertise to help organizations achieve peak Kubernetes efficiency. The solution … through design and implementation to maintenance. You're expected to propose ideas that you think would be excellent additions to the products. Write clean, maintainable code using engineering best practices, and uphold clean code and best practices when reviewing your peers' code. Improve the health of the codebase. We’re mindful of accumulating …
Manchester, England, United Kingdom Hybrid / WFH Options
MRJ Recruitment
strong DevOps culture, so you'll be a central figure in advocating for scalable infrastructure and robust platform engineering principles. This means close collaboration with development, QA, and SRE teams to build secure, cost-effective, and repeatable systems. You'll blend deep technical work with impactful leadership, needing a solid grasp of production operations, incident response, Infrastructure as Code …
Slough, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
play a critical role in ensuring system reliability, scalability, and performance across both AWS and Azure environments. This is your opportunity to lead cloud-native transformation and embed SRE best practices into engineering at scale. What you’ll be doing as their Site Reliability Engineer: You’ll be the go-to expert for designing and maintaining … CI/CD pipelines to reduce toil and accelerate deployment frequency. Build observability into everything: own monitoring, alerting, and incident response to minimize MTTR and improve system health. Champion SRE culture and reliability-focused engineering, helping shape sustainable engineering practices, SLAs, SLOs, and error budgets. Contribute across the stack with flexibility in tooling; experience with Python, Go … dental insurance, 25 days annual leave + bank holidays, R&D and personal training budgets, and much more. This is an incredibly rare chance for a seasoned, high-performing SRE to make their mark on high-impact transformation projects in a business that’s truly committed to doing things the right way.
Leeds, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
play a critical role in ensuring system reliability, scalability, and performance across both AWS and Azure environments. This is your opportunity to lead cloud-native transformation and embed SRE best practices into engineering at scale. What you’ll be doing as their Site Reliability Engineer: You’ll be the go-to expert for designing and maintaining … CI/CD pipelines to reduce toil and accelerate deployment frequency. Build observability into everything: own monitoring, alerting, and incident response to minimize MTTR and improve system health. Champion SRE culture and reliability-focused engineering, helping shape sustainable engineering practices, SLAs, SLOs, and error budgets. Contribute across the stack with flexibility in tooling; experience with Python, Go … dental insurance, 25 days annual leave + bank holidays, R&D and personal training budgets, and much more. This is an incredibly rare chance for a seasoned, high-performing SRE to make their mark on high-impact transformation projects in a business that’s truly committed to doing things the right way.
Stockport, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
team. Things are moving fast here, and as we continue to grow, reliability, automation, and scalability have never been more important to us. You will be our first SRE, so a strong background in implementing SRE best practices would be ideal. You will know what good looks like and strive to continuously improve automation, availability, and resilience. This is … to build out infrastructure and tooling using AWS, Terraform, Docker, and CI/CD pipelines. Supporting and evolving our container-based architecture (we use ECS and Fargate). Driving SRE best practices: SLIs/SLOs, error budgets, reducing toil, and improving observability. Using (and hopefully enjoying!) tools like Datadog, Prometheus, Grafana, and Nix to support your work. What we’re … looking for: Strong experience with AWS, Terraform, Docker, and container orchestration (ECS/Fargate). Good understanding of CI/CD pipelines and DevOps workflows. Solid grasp of SRE principles – SLIs, SLOs, error budgets, observability, etc. Familiarity with Datadog, Prometheus, Grafana, or similar tools. Experience with Nix is a plus (or curiosity to learn it). Bonus if you’ve …
Liverpool, England, United Kingdom Hybrid / WFH Options
Bellrock Group
Site Reliability Engineer - Liverpool (Hybrid Working). As a Site Reliability Engineer at Concerto (part of Bellrock Group), you will play a pivotal role in ensuring the reliability, performance, and scalability of our Intelligent Assets Management SaaS platform. You will lead the improvement of … infrastructure, DevOps, and monitoring across our systems, empowering the engineering team to release features faster and more safely. Your hands-on experience and strategic thinking will help embed SRE principles throughout the team, improving customer experience, system health, and developer productivity. You’ll work across internal environments and customer-facing systems, shaping operational excellence and reliability at every … scalable environments using technologies such as Terraform. Work closely with developers, QA, and DBAs to improve platform design and release workflows. Implement and promote best practices for operational readiness, reliability, and fault tolerance. Guide the platform team on tooling, automation, instrumentation, observability, and best practice in Azure. Build a high-quality platform aligned to the Microsoft Cloud Adoption Framework …
Ipswich, England, United Kingdom Hybrid / WFH Options
Devopshunt
brings together Design, Development, Test and Technical Services all under one roof. Collectively we work in an Agile/Scrum model, uniquely positioning us to exploit the best of SRE/DevOps practices. You will help us manage changes and deliveries for our platforms to support the ‘stand out services’ our company is so proud of. You will have opportunities … to contribute to the best practices used by our SRE team within Software Delivery. The team is diverse and adaptive, ranging from beginners to experienced hybrid engineers. The activities covered are broad, exploiting a range of cloud environments such as AWS and GCP, truly embracing the hybrid skills of the future! What you’ll be doing: Be the delivery-focused … coach for the teams, using various tools and agile methodologies focusing on driving efficiency for our Site Reliability Engineering (SRE) teams. Play a crucial part in collecting requirements, setting up deliverables, reporting progress to executive stakeholders, managing complex dependencies, and contributing to the product delivery process. Directly impact our customers by owning and scheduling our critical projects …
Swindon, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
Site Reliability Engineer, Swindon, Wiltshire. Client: Harrington Starr. Location: Swindon, Wiltshire, United Kingdom. Job Category: Other. EU work permit required: Yes. Posted: 04.06.2025. Expiry Date: 19.07.2025. Job Description: Site Reliability Engineer – Fintech. Up to … s leading financial institutions to streamline international payments and ensure compliance at scale, all through smart automation and modern cloud-native infrastructure. They’re looking to bring on a Site Reliability Engineer with deep experience in observability. If you’ve worked with tools like Prometheus in AWS, supported development teams with tracing and performance insights, and thrive … SLIs/SLOs, automating tasks, and reducing operational noise. Working with AWS (EKS, EC2, Lambda, RDS), Terraform, and CI/CD tools. What They’re Looking For: Experience in SRE or DevOps roles in a production environment. Strong knowledge of observability tools, especially Prometheus in AWS. Experience with tracing, metrics, and logs to support development teams. Skills in Python or …
Maidstone, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
s leading financial institutions to streamline international payments and ensure compliance at scale, all through smart automation and modern cloud-native infrastructure. They’re looking to bring on a Site Reliability Engineer with deep experience in observability. If you’ve worked with tools like Prometheus in AWS, supported development teams with tracing and performance insights, and thrive … great next step. What You’ll Be Doing: Managing and improving observability tools like Prometheus, Grafana, and CloudWatch. Helping product teams with tracing and monitoring to improve performance and reliability. Defining and improving … SLIs/SLOs, automating tasks, and reducing operational noise. Working with AWS (EKS, EC2, Lambda, RDS), Terraform, and CI/CD tools. What They’re Looking For: Experience in SRE or DevOps roles in a production environment. Strong knowledge of observability tools, especially Prometheus in AWS. Experience with tracing, metrics, and logs to support development teams. Skills in Python or …
Belfast, Northern Ireland, United Kingdom Hybrid / WFH Options
JR United Kingdom
s leading financial institutions to streamline international payments and ensure compliance at scale, all through smart automation and modern cloud-native infrastructure. They’re looking to bring on a Site Reliability Engineer with deep experience in observability. If you’ve worked with tools like Prometheus in AWS, supported development teams with tracing and performance insights, and thrive … great next step. What You’ll Be Doing: Managing and improving observability tools like Prometheus, Grafana, and CloudWatch. Helping product teams with tracing and monitoring to improve performance and reliability. Defining and improving … SLIs/SLOs, automating tasks, and reducing operational noise. Working with AWS (EKS, EC2, Lambda, RDS), Terraform, and CI/CD tools. What They’re Looking For: Experience in SRE or DevOps roles in a production environment. Strong knowledge of observability tools, especially Prometheus in AWS. Experience with tracing, metrics, and logs to support development teams. Skills in Python or …
sector, our technology is truly flexible and designed to transform any business at scale. We've created a unified platform that adapts to diverse needs, offering the scalability and reliability that legacy systems simply can't match. At ZILO, our DNA is built on Character, Creativity, and Craftsmanship. We face every challenge with integrity, explore new ideas with a curious … re ready to shape the future, let's talk. Job Description: As a Go Developer at ZILO Technologies, you will play a crucial role in maintaining and enhancing the reliability, performance, and scalability of our platform. You will be responsible for fixing defects, implementing small changes, and contributing to ongoing enhancements of our Go-based microservices stack. Key … platform. Implement small changes and enhancements to improve system functionality and performance. Contribute to the design, development, and deployment of microservices in a Go environment. Monitor system performance and reliability, proactively addressing potential issues. Develop and maintain automation tools to streamline operational processes. Participate in on-call rotations to ensure 24/7 system availability and rapid incident response.
sector, our technology is truly flexible and designed to transform any business at scale. We've created a unified platform that adapts to diverse needs, offering the scalability and reliability that legacy systems simply can't match. At ZILO, our DNA is built on Character, Creativity, and Craftsmanship. We face every challenge with integrity, explore new ideas with a curious … you're ready to shape the future, let's talk. Job Description: As a Developer at ZILO Technologies, you will play a crucial role in maintaining and enhancing the reliability, performance, and scalability of our platform. You will be responsible for fixing defects, implementing small changes, and contributing to ongoing enhancements of our Java-based microservices stack. Key … platform. Implement small changes and enhancements to improve system functionality and performance. Contribute to the design, development, and deployment of microservices in a Java environment. Monitor system performance and reliability, proactively addressing potential issues. Develop and maintain automation tools to streamline operational processes. Participate in on-call rotations to ensure 24/7 system availability and rapid incident response.
London, England, United Kingdom Hybrid / WFH Options
Thought Machine
Magazine named us one of the world’s most innovative fintechs, and the Financial Times recognised us as one of Europe’s fastest-growing companies in 2023. The Client Site Reliability Engineer role in Infrastructure, Client Services will be responsible for enabling and supporting our clients to deliver a best-in-class cloud-native implementation of Thought Machine … infrastructure, from presales to production at scale. This role supports clients in their cloud infrastructure preparation, deployment, optimisation, and troubleshooting. Duties: Hands-on cloud infrastructure consulting, both on client site and remotely. Working with customers and external partners to design and prepare suitable cloud infrastructure to ensure Thought Machine Vault products can be tested and run successfully at scale. … systems outside of Vault to empower holistic digital transformation in collaboration with Thought Machine Client Architects. Supporting and troubleshooting client, SaaS, and internal cloud infrastructure both remotely and on site, including by promoting and deploying suitable monitoring, logging, and alerting tools. Working closely with internal product and engineering teams to ensure client feedback is incorporated into improvements to …
London, England, United Kingdom Hybrid / WFH Options
TECEZE
Job Title: Site Reliability Engineer - Manager. Location: Hybrid Remote – London EC2M. Contract (12 months). Rate: Outside IR35 … to £330 Per Day. About the Role: We are partnering with one of the top companies in the mobile industry to hire a Site Reliability Engineer (SRE) Manager. In this role, you will collaborate with cross-functional teams to drive the design, development, and delivery of high-performing, scalable, and reliable infrastructure and services. You’ll be … Infrastructure as Code using Terraform. Deep understanding of Linux internals, standard networking protocols, and distributed systems architecture. Hands-on experience with automation and performance optimisation. Strong knowledge of SRE principles and methodologies. Experience with observability tools and telemetry systems. Exposure to Google Cloud Platform (GCP). Familiarity with hybrid or multi-cloud architecture. Experience with service meshes or edge …
London, England, United Kingdom Hybrid / WFH Options
Algolia
API-first approach. Performance and scalability are at the heart of our mission: we power 1.5 trillion searches a year for 10K+ customers all over the world. As a Site Reliability Engineering Manager in the Production Engineering team of Algolia, you will lead the Fleet team of Site Reliability Engineers responsible for the provisioning … and the global reliability of the Search Products at scale. Your team will focus on creating pragmatic solutions to optimize the Search Products' availability and costs at scale, depending on the needs of the customer, the Product teams, and the different engineering teams that deliver a unique Search Experience to our customers. You will manage a team of … scale and identifying optimization opportunities. YOUR ROLE WILL CONSIST OF: Collaborating with senior leadership to define the overall technical direction and strategy for the organization, and ensuring that the SRE team's goals and initiatives are aligned with this strategy. Building and maintaining strong relationships with stakeholders across the organization, as you represent the SRE organization in cross-functional meetings.
London, England, United Kingdom Hybrid / WFH Options
Tide
and money. About the role: As the Principal Engineer (Director) - Quality & Reliability, you will be a hands-on leader overseeing teams including Site Reliability Engineering (SRE), Quality, Client Platform, and Engineering Productivity. Your goal is to develop strategies that enhance system reliability, streamline processes, and ensure high-quality software products. This role is for … Assurance and Testing Strategy: Lead the QA team in creating automated testing frameworks and integrating them into CI/CD pipelines. Site Reliability Engineering: Guide the SRE team in maintaining scalable, available, and secure systems with effective monitoring and alerting. Client Platform Enablement: Oversee tools and frameworks for web and mobile client engineers to improve developer experience. … major incident responses, root cause analysis, and preventive measures. What we are looking for: Extensive experience in senior technical leadership roles in software or systems engineering. Proven leadership in SRE, QA, or engineering productivity teams, preferably in fintech. Strong programming skills (e.g., Python, Java, C++) and hands-on development experience. Deep understanding of cloud platforms (AWS, Azure, GCP), containers …
London, England, United Kingdom Hybrid / WFH Options
Track24
or New Relic to gain performance and health insights. Incident Management: Establish and manage monitoring and incident response processes to maintain system reliability. Site Reliability Engineering (SRE): Support system availability, performance, and scalability through SRE best practices. Application Support: Collaborate with development teams to assist in deployment and ongoing performance monitoring of applications. The highlights: 25 Days … and orchestration tools. Proficiency in monitoring tools such as Datadog, Splunk, or New Relic. Strong understanding of CI/CD pipelines and automation tools. Experience with incident management and SRE best practices. Excellent problem-solving skills and the ability to work collaboratively across teams.
Portsmouth, Hampshire, United Kingdom Hybrid / WFH Options
Checkatrade
journey and providing support throughout the process. You will play a key role in shaping our platform's technical direction, working with modern technologies, and ensuring high standards of reliability, security, and performance. Location: King's Cross, London or Portsmouth. Hybrid working. Where do you fit in? We're seeking a Senior Platform Engineer with a strong background in cloud … and maintaining our infrastructure using tools like Kubernetes, Terraform, Helm, and Datadog. You will drive the adoption of infrastructure-as-code practices, implement CI/CD pipelines, and champion SRE principles to ensure platform reliability and scalability. Collaborating with cross-functional teams, you'll contribute to a seamless developer experience and play a vital role in securing and optimizing … Experience with AWS is also valuable, with a willingness to work within a GCP environment. Experience with programming languages such as Golang, Python, and JavaScript. Passion for automation, DevOps, SRE, and observability practices. Proven leadership, management skills, and excellent communication abilities. We are an equal opportunities employer committed to diversity and inclusion in the workplace. About us: We're Checkatrade …
Crewe, Cheshire, United Kingdom Hybrid / WFH Options
Manchester Digital
safer and smarter through connected car and telematics innovation. As a Platform Engineer, you'll play a critical role in supporting that mission by collaborating with infrastructure, cloud, and engineering teams to build, maintain, and continuously enhance the availability, security, and performance of our production and test environments. In this role, you will be responsible for: Ensuring platform security … reliability, and performance across systems deployed in Canada, the UK, and AWS cloud environments. Contributing to key projects, platform optimizations, and ongoing maintenance initiatives. Helping drive scalability, observability, and operational excellence. If you're passionate about infrastructure, cloud, and systems engineering, and want to help shape the future of mobility, we want to hear from you! Requirements: We … for CI/CD. - Understanding and implementation of security hardening and vulnerability management. - Understanding and management of identity providers and SSO configurations (Azure AD, Ory, Cognito, Firebase). - Understanding of Site Reliability Engineering and key concepts. - Proficient in Infrastructure as Code pipeline deployments and pipeline version control within Terraform or CloudFormation. - Observability systems, e.g., Nagios, New Relic. - Able …
platform and innovative technological solutions. In this role, you will shape and execute Twinstake's technological vision, lead our technology strategy, foster innovation, and ensure the scalability, security, and reliability of our staking platform. This is a unique opportunity for a seasoned technology leader to make a meaningful impact in a dynamic and rapidly evolving industry. What you will … Twinstake's business objectives and market needs. Oversee infrastructure security, ensuring regulatory compliance, proactive threat mitigation, and best-in-class data protection. Optimize infrastructure and product platforms for maximum reliability, scalability, and cost efficiency. Organizational Leadership: Build, lead, and inspire a talented team of engineers and data experts. Foster a culture of excellence, innovation, and accountability within the technology … What makes you stand out: Extensive experience with cloud computing platforms (AWS, Azure, or Google Cloud) and containerization technologies such as Docker and Kubernetes. Solid understanding of DevOps and SRE practices, with hands-on experience in CI/CD pipelines and infrastructure automation. Proficiency in infrastructure-focused programming languages such as Go, C#, or Java. What we offer: Exposure to …
London, England, United Kingdom Hybrid / WFH Options
BAE
to complex challenges as part of a team who help keep the UK safe? Join BAE Systems as an experienced DevOps Engineer. As a key member of a Software Engineering team, you’ll be working with our National Security Customers to build systems that support their core mission capabilities. You’ll work as part of empowered, autonomous DevOps teams … our customer organisations. You will work in a small team given as much ownership and responsibility as you have the appetite for, but be part of a much bigger Engineering community to give you the support you need to grow in your career. We fully embrace DevOps ways of working in our teams, and build a very broad range … an organisation that makes a huge impact on the security of the UK. About you. You will have many of the following: experience working in a similar DevOps/SRE/Infrastructure role; an appreciation of Infrastructure as Code and CI/CD tooling; an understanding of live service and how to support critical business systems; scripting abilities with languages …
APIs that enable internal and external users to access data and model outputs. Implement secure authentication and authorization systems for platform users. Maintain and improve our cloud platform’s reliability, security, and compliance (e.g., GDPR, HIPAA readiness). Automate testing, training, and deployment of models through … robust CI/CD pipelines. Monitor and troubleshoot performance issues across data and inference workflows in production. What We’re Looking For: 5+ years of experience in DevOps, MLOps, SRE, or Data Engineering roles. Strong proficiency with public cloud platforms (e.g., GCP, AWS, or Azure), with a preference for GCP. Expertise in Terraform and infrastructure-as-code practices. Solid experience …