London, England, United Kingdom Hybrid / WFH Options
Circadia Technologies Ltd
monitored by our Circadia Contactless Monitor (IoT devices) every day, growing to 100k+ in the next 2 to 3 years. Key Responsibilities: Maintain and enhance AWS infrastructure instrumentation and observability tools (e.g., Grafana, alarms) to ensure system reliability. Oversee Circadia's CI/CD pipelines (Jenkins) to enable efficient and seamless code deployment. Manage and maintain a fully separated staging
London, England, United Kingdom Hybrid / WFH Options
Cencora
Python, or Bash. Deep knowledge of containerization technologies, including Docker and Kubernetes. Excellent understanding of networking principles (IP addressing, virtual networks, network security and networking models). Understanding of observability and site-reliability principles (SLOs, SLIs) and working with engineering teams to improve the applications and platform. Good understanding of SQL and working with relational databases. Experience working
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
BAE Systems (New)
or DevOps. Expertise in microservices and API design. Docker and container runtime platforms such as Kubernetes, EKS, ECS, etc. Strong understanding of operational concepts on AWS, particularly monitoring and observability, and FinOps. Utilising CI/CD tools such as Bamboo, Jenkins, TeamCity, and Bitbucket to streamline delivery of new features and fixes. Continual testing of code using Automated Testing Frameworks. A
London, England, United Kingdom Hybrid / WFH Options
BAE
or DevOps. Expertise in microservices and API design. Docker and container runtime platforms such as Kubernetes, EKS, ECS, etc. Strong understanding of operational concepts on AWS, particularly monitoring and observability, and FinOps. Utilising CI/CD tools such as Bamboo, Jenkins, TeamCity, and Bitbucket to streamline delivery of new features and fixes. Continual testing of code using Automated Testing Frameworks. A
London, England, United Kingdom Hybrid / WFH Options
Gorgias
Kafka, Debezium, Apache Flink. Facilitate rollout strategies at scale with GitLab CI and ArgoCD. Roll out best practices around Kubernetes/Helm/Operators, SLIs/SLOs, Incident Management, Observability, Security, and Disaster Recovery to all Product-Engineering teams and drive their adoption. Automate complex infrastructure pieces for our worldwide footprint with best practices IaC with TF, strong scripting
London, England, United Kingdom Hybrid / WFH Options
IG Group Holdings plc
Job Title: Staff Platform Engineer (AWS/GCP). Job Description: So, who are we? Hello, we're IG Group, a FTSE 250 fintech that provides cutting-edge mobile, web and desktop platforms that help our clients trade Stocks & Shares, leveraged
working, or the ability to flex your start and finish times. Where possible, we support a working pattern that suits your lifestyle and helps you reach your ambitions. Title: Observability Engineer. Base location: Belfast/Remote UK. About the company: Imperva, a Thales company, is an analyst-recognized cybersecurity leader, championing the fight to secure data and applications wherever they … PoPs and core infrastructure with modern technologies, embracing Infrastructure as Code at all levels with automation as a core requirement for all projects. We are looking for an Observability Engineer to work within our SRE teams to design, build and iterate on our O11Y platform. This engineer will work both hands-on and strategically with our architects … global service delivery and product teams to plan an observability roadmap and then execute on those plans. Responsibilities: Assess & Enhance Observability: Review the current observability platform, identify areas for improvement, and guide the team in enhancing monitoring, logging, tracing, and alerting capabilities. Design & Implement Solutions: Develop and optimize observability solutions that provide deep insights into system and service health.
LOTS of growth opportunities. If you’re an SRE who thrives in a fast-moving environment, loves solving real infrastructure challenges, and wants to play a strategic role in observability and uptime, this one’s for you. We're supporting a tech-driven, purpose-led company in the energy sector that's scaling up its platform capabilities. As their next … Site Reliability Engineer, you'll work closely with engineering and platform teams to ensure systems are performant, resilient, and scalable while shaping the observability and incident response strategy. What you’ll be doing: Designing and maintaining cost-effective, reliable AWS infrastructure. Driving monitoring and alerting standards using New Relic (or similar). Automating processes to reduce manual work and increase reliability … Modern AWS setup: EC2, ECS, RDS, S3, VPC, etc. IaC: AWS CDK, Terraform, or CloudFormation. CI/CD pipelines + scripting (Python, Bash, PowerShell). Containerized applications (Docker + ECS). Observability tooling like New Relic, CloudWatch, Prometheus, Datadog. Who we’re looking for: Proven SRE or platform engineering experience in a high-availability environment. Passion for reliability, automation, and system performance
Hereford, Herefordshire, West Midlands, United Kingdom Hybrid / WFH Options
Twinstream Limited
Work Scheme. Key Responsibilities of the Site Reliability Engineer: Partner with developers to improve performance and reliability across systems. Automate toil and reduce unnecessary alerts with smart tooling. Evolve observability so we can prevent issues before they become incidents. Improve CI/CD pipelines and support development teams in delivering quality faster. Explore new technologies, tools, and services that improve … plus). Experience with Terraform and modern IaC practices. Hands-on with Docker and orchestration tools (Kubernetes, OpenShift, or Docker Swarm). CI/CD experience (Jenkins or equivalent). Monitoring/observability tools: Grafana, Prometheus, or InfluxDB. Event-driven messaging: RabbitMQ or similar. Strong Linux skills, scripting, and understanding of network security protocols. Experience with AWS: EC2, S3, RDS, Lambda. Desirable: Experience … coding in Python, Java, or Go. Exposure to cross-domain solutions. Experience in a service management environment. Observability best practices and metric-driven reliability improvement. Security Requirements: Due to the sensitive nature of our work, candidates must be eligible for Developed Vetting (DV) clearance. All offers are subject to security screening. Ready to Engineer Systems That Matter? If you're a
Chesterfield, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
automation and internal tools for deployment, monitoring, and incident response. Tune performance across OS, network, and cloud layers; this role is hands-on and detail-oriented. Improve system resilience, observability, and security in a high-stakes production environment. Requirements: Fluent in Linux: not just using it, but understanding how it works under the hood. Advanced terminal skills: manipulating systems efficiently … time environments. Hands-on with Docker (Kubernetes is a plus), infrastructure-as-code, and CI/CD tooling. Strong scripting and automation experience in Python and Bash. Familiarity with observability stacks (Prometheus, OpenTelemetry, eBPF). Cloud infrastructure experience (AWS/GCP/Azure), with attention to IAM and software supply chain security. Curious, persistent, and comfortable experimenting at the lowest levels
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Embarcaderomediagroup
ll sit at the heart of our engineering operations, bringing together SRE principles and modern platform engineering practices. This includes combining SRE principles (such as service-level reliability, observability, and incident response) with platform engineering practices like GitOps, Infrastructure as Code, DevSecOps automation, and self-service enablement, to help development teams ship faster, safer, and more cost-efficiently. What you … ll be doing: Designing and operating highly reliable, scalable, and secure Azure-based platforms. Applying SRE principles like SLOs, observability, and incident management to drive service reliability. Building Infrastructure as Code using Terraform (v1.7+) and GitOps workflows. Enabling teams through platform tools, reusable Terraform modules, and self-service infrastructure. Enhancing CI/CD pipelines (Azure DevOps, YAML-based) with security … knowledge (AKS, Functions, SQL, Cosmos DB, etc.). Strong Infrastructure as Code skills with Terraform (v1.7+). Experience with CI/CD pipelines, GitOps, and automation tools (PowerShell, Bash). Familiarity with observability and incident tools like Datadog, ELK, and synthetic monitoring. Solid understanding of networking (TCP/IP, Load Balancing, DNS, Routing). Good knowledge of DevSecOps practices, including security scanning, IAM, and
Sheffield, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
automation and internal tools for deployment, monitoring, and incident response. Tune performance across OS, network, and cloud layers; this role is hands-on and detail-oriented. Improve system resilience, observability, and security in a high-stakes production environment. Requirements: Fluent in Linux: not just using it, but understanding how it works under the hood. Advanced terminal skills: manipulating systems efficiently … time environments. Hands-on with Docker (Kubernetes is a plus), infrastructure-as-code, and CI/CD tooling. Strong scripting and automation experience in Python and Bash. Familiarity with observability stacks (Prometheus, OpenTelemetry, eBPF). Cloud infrastructure experience (AWS/GCP/Azure), with attention to IAM and software supply chain security. Curious, persistent, and comfortable experimenting at the lowest levels
Crewe, Cheshire, United Kingdom Hybrid / WFH Options
Manchester Digital
platform security, reliability, and performance across systems deployed in Canada, the UK, and AWS cloud environments. Contribute to key projects, platform optimizations, and ongoing maintenance initiatives. Help drive scalability, observability, and operational excellence. If you're passionate about infrastructure, cloud, and systems engineering, and want to help shape the future of mobility, we want to hear from you! Requirements We … configurations (Azure AD, Ory, Cognito, Firebase) - Understanding of Site Reliability Engineering and key concepts - Proficient in Infrastructure as Code pipeline deployments and pipeline version control within Terraform or CloudFormation. - Observability Systems, e.g., Nagios, New Relic - Able to troubleshoot, work under pressure, and meet deadlines. - Previous experience in a cloud engineering role. - AWS certified as SysOps Administrator/Solutions Architect/… understanding of Infrastructure as Code principles and related tech such as Terraform or CloudFormation - Enhanced experience of AWS cloud technologies, e.g., ECS, EC2, VPC, Lambda, CFS. Ideally AWS certified. - Observability Systems, e.g., New Relic, CloudWatch, SquadCast - ITIL Qualified or awareness of the framework. Bonus Qualifications: - Experience with Linux system administration and troubleshooting. - Basic knowledge of AWS cloud technologies such as
infrastructure and system issues, as well as log ingestion and communication issues. Design and develop scalable, robust, and high-performance data pipelines and data storage solutions. Develop and maintain observability frameworks using tools like Kibana, Grafana, or similar. Work with cross-functional teams to define observability and search requirements. Scale, script and maintain our development and production platform foundation with
London, England, United Kingdom Hybrid / WFH Options
ZipRecruiter
with the founding team to integrate models into internal and external user flows. Write clean, production-ready code, often improving or refactoring existing prototypes. Think holistically about agent lifecycle, observability, failure handling, and scalability. Help define the tech stack and architecture for core components of the platform. Contribute to novel research and publish at top conferences when opportunities arise. What …/LLM libraries (e.g., Transformers, LangChain, LangGraph, OpenAI APIs). Experience with cloud platforms (AWS, GCP, or Azure), deployment, and CI/CD pipelines. Familiarity with containerization (Docker, Kubernetes) and observability (e.g., Prometheus, Grafana). A builder mindset: you're comfortable with ambiguous specs, early-stage infrastructure, and iterating fast. Excellent communication and self-management skills. Nice To Have: Familiarity with agentic
Northern Ireland, United Kingdom Hybrid / WFH Options
Ocho
cross-functional teams to design and deliver full-featured software components • Drive a “security-first” mindset across development practices, including OAuth2 and IAM policies • Lead operational efforts using modern observability frameworks to monitor and debug production systems • Mentor junior engineers and contribute to a culture of continuous improvement Essential Criteria: • Strong commercial experience in Golang and Python • Proven track record … secure application design principles • Hands-on experience designing and consuming RESTful and GraphQL APIs • Strong SQL skills and familiarity with data warehouses like Snowflake • Day-2 operations experience, including observability, debugging, and triage Desirable Skills: • Experience with Auth0, AWS Cognito, or similar identity platforms • Familiarity with Helm, Prometheus, Grafana, or OpenTelemetry • Exposure to other cloud platforms (GCP, Azure) • CI/
London, England, United Kingdom Hybrid / WFH Options
55 Redefined Ltd
using Docker and deploying them to Container Platforms (EKS, AKS and Kubernetes). Implementing and managing CI/CD pipelines for data applications. Implementing and managing comprehensive monitoring and observability solutions using tools like Grafana, Prometheus, and other non-native monitoring tools, ensuring data quality across the entire data flow. Working with Infrastructure as Code (IaC) tools (e.g., Terraform, Ansible … Familiarity with open-source data tools (e.g., Spark, Kafka, PostgreSQL). Competent understanding of IaC concepts (e.g., Terraform, Ansible). Understanding of data architecture principles. Experience with monitoring and observability tools like Grafana and Prometheus. Your Security Clearance: To be successfully appointed to this role, it is a requirement to obtain Security Check (SC) clearance. To obtain SC clearance, the
strategy, execution, tooling and best practices. Collaborate with multiple product teams and respective owners to design infrastructure as we scale. Building custom metrics and features to enhance Primer's observability. Infrastructure as code (IaC) development. Writing processes and documentation for system design, troubleshooting and maintenance. What are we looking for? Strong experience with a cloud provider (AWS preferred but we … Kubernetes clusters. Knowledge of security best practices and the ability to implement security controls at the infrastructure level. Experience with monitoring and logging tools like Datadog or Grafana's observability stack (Prometheus, Tempo, Loki, Grafana). Familiarity with the open standard OpenTelemetry. Excellent written and verbal communication skills; we're a collaborative team! PLEASE NOTE: Our engineering teams work fully remotely
results that matter. By taking advantage of all structured and unstructured data, securing and protecting private information more effectively, Elastic's complete, cloud-based solutions for search, security, and observability help organizations deliver on the promise of AI. What Is The Role: You will have the opportunity to work with a tremendous services, engineering, product, and sales team and wear … consultant will be focused on excellence, taking the initiative for self-improvement, and will possess great communication skills. Our customers' use cases extend across all the Elastic Solutions: Enterprise Search, Observability and Security, and beyond, and the scale of data in their environments ranges from gigabytes to petabytes. This diverse mix of a customer base means the challenges they face that
London, England, United Kingdom Hybrid / WFH Options
Elasticsearch B.V
results that matter. By taking advantage of all structured and unstructured data, securing and protecting private information more effectively, Elastic’s complete, cloud-based solutions for search, security, and observability help organizations deliver on the promise of AI. What Is The Role: You will have the opportunity to work with a tremendous services, engineering, product, and sales team and wear … consultant will be focused on excellence, taking the initiative for self-improvement, and will possess great communication skills. Our customers’ use cases extend across all the Elastic Solutions: Enterprise Search, Observability and Security, and beyond, and the scale of data in their environments ranges from gigabytes to petabytes. This diverse mix of a customer base means the challenges they face that
Manchester Area, United Kingdom Hybrid / WFH Options
Revolent Group
related processes like data migrations and environment setup. ✅ Preferred (Nice to Have): Banking/Financial Services knowledge, especially around wholesale lending and Loan IQ. Experience with monitoring and observability tools such as APPD, ELK Stack, or Grafana. Understanding of DevSecOps principles, including vulnerability scanning, secrets management, and compliance automation. Further experience with CI/CD integration and pipeline automation
Edinburgh, Scotland, United Kingdom Hybrid / WFH Options
JR United Kingdom
native Infrastructure-as-Code (IaC) solutions from the ground up? Our client is seeking a talented and motivated Senior Software Engineer to lead the development of their next-generation observability platform. THIS IS NOT A DEVOPS ROLE. Responsibilities: Collaborate within a dynamic software engineering team to architect and build a new cloud-native IaC platform. Develop software using technologies such
Bath, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
native Infrastructure-as-Code (IaC) solutions from the ground up? Our client is seeking a talented and motivated Senior Software Engineer to lead the development of their next-generation observability platform. THIS IS NOT A DEVOPS ROLE. Responsibilities: Collaborate within a dynamic software engineering team to architect and build a new cloud-native IaC platform. Develop software using technologies such
Aberdeen, Scotland, United Kingdom Hybrid / WFH Options
JR United Kingdom
native Infrastructure-as-Code (IaC) solutions from the ground up? Our client is seeking a talented and motivated Senior Software Engineer to lead the development of their next-generation observability platform. THIS IS NOT A DEVOPS ROLE. Responsibilities: Collaborate within a dynamic software engineering team to architect and build a new cloud-native IaC platform. Develop software using technologies such