London, England, United Kingdom Hybrid / WFH Options
Metro Bank
associated technologies like Istio, Karpenter, and Kong. • Demonstrated experience in managing the full route to live, encompassing the automation of the software development lifecycle (SDLC), with a focus on observability, security, code quality, infrastructure as code (IaC), and environment provisioning. • Strong communication and presentation skills, with experience communicating with stakeholders at all levels Our promise to you... • We will make More ❯
Basingstoke, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
native Infrastructure-as-Code (IaC) solutions from the ground up? Our client is seeking a talented and motivated Senior Software Engineer to lead the development of our next-generation observability platform. THIS IS NOT A DEVOPS ROLE. Responsibilities Collaborate within a dynamic software engineering team to architect and build a new cloud-native IaC platform. Develop software using technologies such More ❯
Hull, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
native Infrastructure-as-Code (IaC) solutions from the ground up? Our client is seeking a talented and motivated Senior Software Engineer to lead the development of our next-generation observability platform. THIS IS NOT A DEVOPS ROLE. Responsibilities Collaborate within a dynamic software engineering team to architect and build a new cloud-native IaC platform. Develop software using technologies such More ❯
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
Twinstream Limited
Socials & Events Cycle to Work Scheme & Life Assurance Key Responsibilities of the Site Reliability Engineer: Work closely with engineers and sysadmins to increase performance and reduce toil Advance system observability, monitoring and alerting Automate, troubleshoot, and proactively resolve issues before they escalate Improve development environments to meet delivery and quality targets Research and evaluate tools and platforms to support scale More ❯
BS1, Bristol, City of Bristol, United Kingdom Hybrid / WFH Options
Twinstream Limited
Socials & Events Cycle to Work Scheme & Life Assurance Key Responsibilities of the Site Reliability Engineer: Work closely with engineers and sysadmins to increase performance and reduce toil Advance system observability, monitoring and alerting Automate, troubleshoot, and proactively resolve issues before they escalate Improve development environments to meet delivery and quality targets Research and evaluate tools and platforms to support scale More ❯
Employment Type: Permanent
Salary: £80000 - £110000/annum Hybrid, Great Benefits
Bristol, Avon, South West, United Kingdom Hybrid / WFH Options
Twinstream Limited
Socials & Events Cycle to Work Scheme & Life Assurance Key Responsibilities of the Site Reliability Engineer: Work closely with engineers and sysadmins to increase performance and reduce toil Advance system observability, monitoring and alerting Automate, troubleshoot, and proactively resolve issues before they escalate Improve development environments to meet delivery and quality targets Research and evaluate tools and platforms to support scale More ❯
London, England, United Kingdom Hybrid / WFH Options
Ten Lifestyle Group
cost optimisation). Experience with cloud platforms (AWS, GCP, Azure) and infrastructure-as-code (Terraform). Familiarity and hands-on with DevOps practices (CI/CD, Docker, K8s) and observability tools (Prometheus, Grafana, Datadog). Experience in distributed systems and scaling. Knowledge and hands-on experience with multiple data stores (both SQL and NoSQL). Desired experience in building agentic More ❯
position will align to a discipline where you will be expected to build and support solutions aligned with SDLC principles, providing technical excellence with a focus on scripting and observability coupled with a security mindset. What will you be doing day-to-day? Automation and Orchestration: Streamline the delivery and support processes by leveraging automation and IaC principles. Support and More ❯
Cardiff, Wales, United Kingdom Hybrid / WFH Options
ZipRecruiter
/CD pipelines using GitHub Actions, AWS CodePipeline, Jenkins, and other tools, with an emphasis on reliability, reusability, and performance. Contribute to the design and integration of monitoring and observability solutions (CloudWatch, Prometheus, Grafana) to ensure infrastructure and model health. Champion software engineering excellence through Test-Driven Development (TDD), rigorous test automation, and continuous quality assurance practices. Support architectural decisions More ❯
London, England, United Kingdom Hybrid / WFH Options
Smarkets
designing, developing, and implementing distributed systems Can demonstrate deep knowledge in running services in cloud microservice environments and hands-on experience with Kubernetes Familiarity with AWS cloud Familiarity with observability principles and tools (Grafana, Prometheus, Sentry, Elasticsearch, Jaeger) Excellent planning and communication skills and able to lead conversations with development and product teams Preferred Skills and Experience 6-8+ More ❯
Bristol, Avon, South West, United Kingdom Hybrid / WFH Options
Hargreaves Lansdown
Excited to grow your career? Our purpose is to empower people to save and invest with confidence. We are looking for great people to join us, so please come and invest in YOUR future at HL. We know that sometimes More ❯
London, England, United Kingdom Hybrid / WFH Options
Sprout.ai LTD
Salary banding: £90,000 - £110,000 dependent on experience Working pattern: 1-2 days per week in office Location: London About our Engineering Team As a business which has AI at its core, we need to have a reliable, scalable More ❯
London, England, United Kingdom Hybrid / WFH Options
EDB
EDB provides a data and AI platform that enables organizations to harness the full power of Postgres for transactional, analytical, and AI workloads across any cloud, anywhere. EDB empowers enterprises to More ❯
Complexio's Foundational AI works to automate business activities by ingesting whole company data, both structured and unstructured, and making sense of it. Using proprietary models and algorithms, Complexio forms a deep understanding of how humans are interacting and using it. Automation can then replicate and improve these More ❯
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
Just Eat Takeaway.com
customers with hundreds of thousands of restaurant, grocery and convenience partners across the globe. About the role: Just Eat Takeaway is seeking an aspiring Engineer to join the Platform Observability team. The team sits within the Platform & Reliability department, which exists to provide global engineering a magnifying glass into their services while driving commercial availability and optimization. The team is … responsible for looking after a wide range of Observability capabilities that underpin our global platforms. As a Platform Engineer, you will support the implementation and continual evolution of these areas, following guidance from senior engineers within the department. In this role, you will be expected to have a passion for technology and a desire to learn. You will have the More ❯
Job Title: Senior SRE - Site Reliability Engineering for Observability Location: London (Mostly Remote | 1 Day/Week in Office) Pay Rate: £50 - £62 per hour (Inside IR35) Contract Duration: Initial 12 Months Working Hours: 11:00 AM - 7:00 PM About the Role We're looking for a Senior Site Reliability Engineer (SRE) to join a high-impact Observability team … monitoring and logging platforms that ensure service reliability, performance, and visibility. If you're passionate about distributed systems, high-throughput data pipelines, and enabling engineering teams with top-tier observability tooling, this is the role for you. What You'll Be Doing Designing and operating observability platforms (logging, monitoring, alerting) at scale. Managing large, high-performance Elasticsearch clusters and Prometheus … deployments. Building scalable data pipelines using Kafka to process millions of events per second. Developing tools, APIs, and dashboards to enable self-service observability for engineering teams. Automating infrastructure using Terraform and configuration with Ansible. Participating in on-call rotations to ensure platform uptime and responsiveness. What We're Looking For 5+ years of experience in SRE/DevOps More ❯
SRE team in the Platform Engineering organization and help us scale our products to millions of end-users. We are looking for individuals with a passion for automation and observability, ensuring the quality and availability of our services. Location - We are flexible on remote working from home, if you are based in the UK or Germany. This is a fully … our 24x7 on-call rotation, SCRUM, and deployment planning Perform Root Cause Analysis (RCA) and provide recommendations for application teams Improve availability and reduce customer impact using industry-best observability tools Ensure best-practice and security-minded architecture by influencing design decisions Create and maintain technical documentation and SOPs Develop software, scripts, or tooling to improve efficiency and reduce … time of applications and infrastructure Other duties as needed About You 5+ years' experience in Site Reliability Engineer roles Expert+ level Linux administration, scripting, and troubleshooting Demonstrable knowledge of Observability tools (Prometheus/Grafana, New Relic, Splunk, DataDog) Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB distributions, etc) Extensive More ❯
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
TwinStream
services. You will be working with multiple feature development teams and the BAU/Support team to define and evolve our cloud & on-prem infrastructure & delivery pipelines, improving system observability, demonstrating performance and capacity improvements and proactively identifying and mitigating reliability risks. Key Responsibilities of the Site Reliability Engineer: Collaborate with Software Engineers to improve reliability and performance in their … subsystems Partner with System Administrators in automating toil and eliminating alerts Evolve observability and monitoring capabilities to identify and solve problems before they impact the business Support development environments to help us achieve our delivery and quality goals Research and evaluate technologies, tools and services to influence buy-vs-build decisions Develop expertise in diverse technical and business domains Expand … in one of our platform languages (Java, Go, Python or similar) Knowledge of cross domain principles & technologies Experience of working in a service management environment Practical applications of using observability patterns in previous systems Creating and monitoring system availability metrics and using those to drive work that reduces downtime There are many great reasons to join our team! Pension Plan More ❯
Bristol, England, United Kingdom Hybrid / WFH Options
TwinStream
services. You will be working with multiple feature development teams and the BAU/Support team to define and evolve our cloud & on-prem infrastructure & delivery pipelines, improving system observability, demonstrating performance and capacity improvements and proactively identifying and mitigating reliability risks. Key Responsibilities of the Site Reliability Engineer: Collaborate with Software Engineers to improve reliability and performance in their … subsystems Partner with System Administrators in automating toil and eliminating alerts Evolve observability and monitoring capabilities to identify and solve problems before they impact the business Support development environments to help us achieve our delivery and quality goals Research and evaluate technologies, tools and services to influence buy-vs-build decisions Develop expertise in diverse technical and business domains Expand … in one of our platform languages (Java, Go, Python or similar) Knowledge of cross domain principles & technologies Experience of working in a service management environment Practical applications of using observability patterns in previous systems Creating and monitoring system availability metrics and using those to drive work that reduces downtime There are many great reasons to join our team! Pension Plan More ❯
Liverpool, England, United Kingdom Hybrid / WFH Options
Bellrock Group
and DBAs to improve platform design and release workflows. Implement and promote best practices for operational readiness, reliability, and fault tolerance. Guide the platform team on tooling, automation, instrumentation, observability and best practice in Azure. Build a high-quality platform aligned to the Microsoft Cloud Adoption Framework, with Well Architected design, Defender, Advisor, Policy and governance in mind. Design and … background in CI/CD tools—GitHub Actions and Octopus Deploy. Proficient in writing and managing Infrastructure as Code (Terraform, ARM templates). Experienced in setting up and maintaining observability stacks (e.g. Application Insights, Prometheus, Grafana). Familiar with container orchestration concepts; Kubernetes experience is a plus. Scripting or programming experience in PowerShell, Python, or similar languages. Comfortable balancing speed More ❯
London, England, United Kingdom Hybrid / WFH Options
So Energy
supporting our sustainability mission. What you’ll be getting up to: Architect, design and evolve scalable, performant, low-maintenance backend systems that power the Nova platform. Optimise for resilience, observability, security, and cost efficiency across all layers of the system. Be an active agent of change, constantly identifying opportunities to simplify, consolidate, and modernise our technical landscape. Lead the evaluation … meaningfully across the stack: Frontend conversations: Vue.js, modern component-driven design, API design for seamless integration. Infrastructure: GCP stack, Terraform, Kubernetes, Docker, CI/CD pipelines (GitHub Actions, SonarCloud), observability (Datadog, Grafana). Data: BigQuery, SQL/NoSQL, event-driven architecture, data pipelines. Bring holistic thinking to system design, including scalability, latency, operational excellence, and future-proofing. This role will … microservices, API design, and event-driven architectures. Experience with cloud-native development (GCP preferred; AWS experience relevant). Infrastructure-as-code expertise: Terraform, Kubernetes. Database mastery: PostgreSQL, BigQuery, NoSQL. Observability and monitoring: Datadog, Grafana, logging pipelines. Security best practices: OAuth, SSO, data protection, and secure coding principles. Familiarity with frontend frameworks (React, Vue) and mobile technologies (Ionic, Swift, Android) a More ❯
Sheffield, England, United Kingdom Hybrid / WFH Options
KnowBe4
and communication skills. Some Of The Technologies We Use Programming Languages - Python, Ruby, Rust Infrastructure as Code - Terraform, AWS CDK Source Code Management and CI/CD - GitLab, Snyk Observability - DataDog, Airbrake Containerized Workloads - Docker Cloud-native infrastructure in AWS - ECS, Lambda, Step Functions, SNS/SQS, Transit Gateway, Aurora, DynamoDB, CloudFront, S3, AppSync, API Gateway, and many more. Responsibilities … build highly scalable and resilient applications and infrastructure in AWS Maintain and improve extensible infrastructure-as-code using Terraform Learn, maintain, and improve our existing deployment strategies Deliver effective observability, monitoring, and alerting patterns for KnowBe4’s applications and infrastructure Minimum Qualifications: BS/MS/Ph.D. or equivalent plus 5 years experience Training in secure coding practices (preferred) Proficient More ❯
Blackpool, England, United Kingdom Hybrid / WFH Options
ZipRecruiter
company's data platforms, ensuring high availability, performance, and security Implement data governance policies and procedures, ensuring compliance with data quality standards and regulatory requirements Design and implement data observability and data quality monitoring solutions, enabling proactive identification and resolution of data issues Key Behaviours Technical Passion & Innovation: Demonstrates a strong passion for data technologies and a commitment to staying … as GitHub or Azure DevOps Experience with Azure DevOps for CI/CD pipeline development and data operations (DataOps) Experience with Python or other relevant coding Experience with Data Observability tools Exposure to Agile Project Methodology, i.e. Scrum The Application Timeline A first stage video call with the internal recruitment team (15 minute call) A face-to-face or video More ❯
in the team Contribute to solution architecture and strategic technical direction Build, integrate, and maintain REST APIs and backend services Champion best practices in software quality, CI/CD, observability, and DevOps Collaborate with cross-functional teams including Product, QA, and DevOps Optionally take on people management responsibilities for engineers Stay updated with emerging backend and cloud technologies Key Skills More ❯