transactional operations and columnar formats for efficient large-scale analytical querying. Support DevOps practices including CI/CD, infrastructure-as-code, automated testing, release and version control, and system observability for data pipelines. Establish metrics and KPIs, and identify and deploy tools to measure data pipeline health, data quality, timeliness and accuracy, team performance, cost-effectiveness, and business impact. Actively …
…CI/CD pipelines using GitHub Actions, AWS CodePipeline, Jenkins, and other tools, with an emphasis on reliability, reusability, and performance. Contribute to the design and integration of monitoring and observability solutions (CloudWatch, Prometheus, Grafana) to ensure infrastructure and model health. Champion software engineering excellence through Test-Driven Development (TDD), rigorous test automation, and continuous quality assurance practices. Support architectural decisions …
Salford, Manchester, United Kingdom Hybrid / WFH Options
BBC Group and Public Services
…CI/CD pipelines using GitHub Actions, AWS CodePipeline, Jenkins, and other tools, with an emphasis on reliability, reusability, and performance. Contribute to the design and integration of monitoring and observability solutions (CloudWatch, Prometheus, Grafana) to ensure infrastructure and model health. Champion software engineering excellence through Test-Driven Development (TDD), rigorous test automation, and continuous quality assurance practices. Support architectural decisions …
Cardiff, South Glamorgan, United Kingdom Hybrid / WFH Options
BBC Group and Public Services
…CI/CD pipelines using GitHub Actions, AWS CodePipeline, Jenkins, and other tools, with an emphasis on reliability, reusability, and performance. Contribute to the design and integration of monitoring and observability solutions (CloudWatch, Prometheus, Grafana) to ensure infrastructure and model health. Champion software engineering excellence through Test-Driven Development (TDD), rigorous test automation, and continuous quality assurance practices. Support architectural decisions …
Newcastle Upon Tyne, Tyne And Wear, United Kingdom Hybrid / WFH Options
BBC Group and Public Services
…CI/CD pipelines using GitHub Actions, AWS CodePipeline, Jenkins, and other tools, with an emphasis on reliability, reusability, and performance. Contribute to the design and integration of monitoring and observability solutions (CloudWatch, Prometheus, Grafana) to ensure infrastructure and model health. Champion software engineering excellence through Test-Driven Development (TDD), rigorous test automation, and continuous quality assurance practices. Support architectural decisions …
Cardiff, Wales, United Kingdom Hybrid / WFH Options
BBC Group and Public Services
…CI/CD pipelines using GitHub Actions, AWS CodePipeline, Jenkins, and other tools, with an emphasis on reliability, reusability, and performance. Contribute to the design and integration of monitoring and observability solutions (CloudWatch, Prometheus, Grafana) to ensure infrastructure and model health. Champion software engineering excellence through Test-Driven Development (TDD), rigorous test automation, and continuous quality assurance practices. Support architectural decisions …
Bristol, Avon, South West, United Kingdom Hybrid / WFH Options
Hargreaves Lansdown
Excited to grow your career? Our purpose is to empower people to save and invest with confidence. We are looking for great people to join us, so please come and invest in YOUR future at HL. We know that sometimes …
We’re seeking an experienced contractor to support the delivery of observability solutions for a new, large-scale infrastructure environment. This role focuses on developing insightful, automated Grafana dashboards, with a strong emphasis on data integration and actionable telemetry. Required skills: excellent, concise communication skills, essential for collaborating with technical teams to shape observability outputs; deep experience with Grafana … dashboard creation, templating, and performance optimization; a strong understanding of PromQL or the VictoriaMetrics and VictoriaLogs query languages; the ability to interpret and map RESTful API data into observability pipelines and dashboards; familiarity with IaC outputs and tooling (e.g., Terraform) as data sources for observability; solid programming ability in Golang (preferred) or Python for automation and integration; strong collaboration skills to work with cross …
The CoE Lead - Observability & Tools at JD Sports Fashion Plc is a critical, hands-on technical role focused on designing, building, and maintaining the company's Observability platform. This role ensures that our technology platforms operate efficiently and reliably, providing early insights for Engineering, Service Reliability, Service Delivery, and DevOps teams. The CoE Lead will manage the contract with third-party … performance indicators (KPIs). The position involves a 75% focus on the design of frameworks and a 25% focus on implementation and adoption. · Job Title – Centre of Excellence Lead - Observability & Tooling · Location – BL9 8RR · Working rota – Monday to Friday · Working hours – 40. What You'll Be Doing: We are looking for an experienced CoE Lead to design, build, and maintain our … Observability platform. The CoE Lead will work closely with DevOps, Engineering, Service Reliability, and Service Delivery teams to continuously improve our Observability capabilities. This role is a technical, hands-on position with a 75% focus on framework design and 25% on implementation and adoption. You will contribute to pipeline design, enabling observability from the first deployment in test environments and …
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
Just Eat Takeaway.com
customers with hundreds of thousands of restaurant, grocery and convenience partners across the globe. About the role: Just Eat Takeaway is seeking an aspiring Engineer to join the Platform Observability team. The team sits within the Platform & Reliability department, which exists to give engineering teams worldwide a magnifying glass into their services while driving commercial availability and optimization. The team is … responsible for looking after a wide range of Observability capabilities that underpin our global platforms. As a Platform Engineer, you will support the implementation and continual evolution of these areas, following guidance from senior engineers within the department. In this role, you will be expected to have a passion for technology and a desire to learn. You will have the …
throughput applications. Develop and refine automation solutions using Ansible, Python, and Terraform. Troubleshoot hardware, networking, and performance issues in various environments. Deploy monitoring and log aggregation tools to improve observability. Collaborate with teams to identify bottlenecks and deploy scalable, automated solutions. What We're Looking For: 6+ years of Linux system administration and engineering experience in performance-critical environments. Proficiency … in Python and Bash scripting, with hands-on Ansible experience. Solid networking fundamentals: IP addressing, VLANs, etc. Familiarity with observability tools like Prometheus, Grafana, and ELK. Infrastructure-as-code experience with Terraform and CI/CD pipelines. Proven ability to resolve complex system-level issues and performance challenges. Knowledge of containers and orchestration tools (Docker, Kubernetes). Desirable: Experience with …
Manchester, England, United Kingdom Hybrid / WFH Options
Couchbase
Reliability Engineers are hybrid software and systems engineers. They are the glue holding things together, whether that’s infrastructure/platform, tooling support for our cloud business, or managing the Observability posture for Couchbase. For this role, we are looking for a candidate to join the Observability team, which is responsible for maintaining Reliability, Availability and Serviceability for the entire Couchbase … You will have an immediate impact on the day-to-day efficiency of cloud operations and an ongoing impact on growth. Responsibilities: Develop and maintain software features in the Observability stack, which includes the metrics pipeline, alerting, logging, and notifications. Create and maintain monitoring dashboards that give insight into our customers' cluster health. Develop control plane features that have observability needs. High … to identify and solve issues before they affect business productivity. Roll up your sleeves to be a full-stack engineer as we build end-to-end software solutions in the Observability domain. Requirements: 2+ years of experience as a software developer. Proficiency with programming and scripting languages like Go, Python, Java, or Ruby. Strong ability to write code and an understanding of basic DSA (data structures and algorithms) concepts …
environment using Microsoft Teams, email and calendar. Bachelor's degree in a relevant major or equivalent years of experience. Any of the following would be a plus: Experience with Observability across multiple domains (APM, Infrastructure, Synthetics, Logs, etc.) within cloud and on-premises environments using Datadog, Azure Monitor and Application Insights, New Relic, and Grafana. Experience working in B2B SaaS companies … Experience with cloud containers, specifically Kubernetes. Responsibilities & Duties. Develop: Architecture, strategy and implementations to enable or enhance the Observability and Reliability of applications and services running on IaaS and PaaS in Microsoft Azure. AWS and GCP are nice to have. Service Level Objectives and indicators focused on improving business workflow performance and availability. Technical and business dashboards, metrics, and actionable … AWS and GCP are nice to have. Training & mentoring for peers and less experienced engineers. Production environments with on-call rotations. Advocacy: Train and mentor engineering teams on modern observability practices and techniques. Define and socialize SRE culture, best practices, and architectural and security standards. Assess and raise risks across the organization. Partnership with: Internal engineering, architecture and operations teams to …
service mesh solutions across our distributed systems. In this role, you will lead the design and operation of Kong Mesh (based on Kuma) for managing microservices communication, security, and observability at scale. You’ll play a crucial role in defining service-to-service architecture and ensuring platform reliability, scalability, and security. Key Responsibilities: Lead the design and deployment of Kong … Mesh across our environments (on-prem and cloud). Define and enforce best practices for service mesh architecture, traffic routing, zero-trust security, observability, and policy enforcement. Collaborate with infrastructure, security, and development teams to integrate Kong Mesh with CI/CD, monitoring, and logging solutions. Develop custom policies, plugins, and automation scripts to enhance Kong Mesh capabilities. Monitor mesh …
a key member of the Dynatrace sales engine and will be responsible for providing excellent technical support to the sales team. You will be the expert on Dynatrace and observability, with a specialization in Log Management and Analytics. Within this exciting role, you will be responsible for executing great demos that demonstrate Dynatrace's unique approach to solving the customer … be filled at a higher level based on candidate experience. What will help you succeed. Preferred Requirements: Experience with query languages such as SQL, SPL, or KQL. Experience with observability and log collectors/pipelines such as Fluent Bit, OpenTelemetry, Cribl, and Logstash. Experience with web technologies such as HTML, CSS, and JavaScript. Experience with programming/scripting side technologies such … OpenShift, Serverless functions, and CI/CD pipelines. Experience with automation tools like Ansible, Puppet, Terraform, etc. Why you will love being a Dynatracer: Dynatrace is a leader in unified observability and security. We provide a culture of excellence with competitive compensation packages designed to recognize and reward performance. Our employees work with the largest cloud providers, including AWS, Microsoft, and …
Dundee, Scotland, United Kingdom Hybrid / WFH Options
Scopely
with cross-functional teams including game development, QA, operations, and management. Develop automation tools and processes to improve reproducibility and efficiency. Monitor, audit, and report on build systems, incorporating observability and alerts throughout the CI/CD lifecycle. Participate in code reviews and development processes to enhance engineering effectiveness. What We’re Looking For: Extensive experience in build and release roles, with a solid understanding of CI/CD practices for Unity 3D games. Proficiency in scripting and programming languages such as Python and Bash. Experience with observability tools like ELK, Grafana, Prometheus, or Datadog. Knowledge of version control systems (e.g., Git) and build tools like Jenkins, GitLab, Maven, or Gradle. Experience with Unity integrations, CI systems, and languages like Groovy is …
and refine queue-based processing to support asynchronous workflows and event-driven architecture. Work collaboratively with cross-functional teams, including DevOps, Infrastructure, and Product, to deliver robust systems. Leverage observability tools to monitor, alert on, and troubleshoot application and integration health. Stay current on AI-driven software development practices (e.g., GPT-assisted development, agentic AI workflows) and suggest practical implementations. Participate … Prior experience building middleware for data sync, order processing, and internal APIs in a multi-system e-commerce environment. Understanding of architecture patterns: Microservices, SOA, Hexagonal, Modular Monolith. Monitoring & Observability: Grafana, Prometheus, CloudWatch, New Relic, Datadog, etc. Solid grasp of AI trends in software development, particularly in using GPT tools and agentic systems. Education: Mathematics or Computer Science degree (or …
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
TwinStream
services. You will be working with multiple feature development teams and the BAU/Support team to define and evolve our cloud & on-prem infrastructure & delivery pipelines, improving system observability, demonstrating performance and capacity improvements, and proactively identifying and mitigating reliability risks. Key Responsibilities of the Site Reliability Engineer: Collaborate with Software Engineers to improve reliability and performance in their … subsystems. Partner with System Administrators in automating toil and eliminating alerts. Evolve observability and monitoring capabilities to identify and solve problems before they impact the business. Support development environments to help us achieve our delivery and quality goals. Research and evaluate technologies, tools and services to influence buy-vs-build decisions. Develop expertise in diverse technical and business domains. Expand … in one of our platform languages (Java, Go, Python or similar). Knowledge of cross-domain principles & technologies. Experience of working in a service management environment. Practical application of observability patterns in previous systems. Creating and monitoring system availability metrics and using those to drive work that reduces downtime. There are many great reasons to join our team! Pension Plan …
Bristol, England, United Kingdom Hybrid / WFH Options
TwinStream
services. You will be working with multiple feature development teams and the BAU/Support team to define and evolve our cloud & on-prem infrastructure & delivery pipelines, improving system observability, demonstrating performance and capacity improvements, and proactively identifying and mitigating reliability risks. Key Responsibilities of the Site Reliability Engineer: Collaborate with Software Engineers to improve reliability and performance in their … subsystems. Partner with System Administrators in automating toil and eliminating alerts. Evolve observability and monitoring capabilities to identify and solve problems before they impact the business. Support development environments to help us achieve our delivery and quality goals. Research and evaluate technologies, tools and services to influence buy-vs-build decisions. Develop expertise in diverse technical and business domains. Expand … in one of our platform languages (Java, Go, Python or similar). Knowledge of cross-domain principles & technologies. Experience of working in a service management environment. Practical application of observability patterns in previous systems. Creating and monitoring system availability metrics and using those to drive work that reduces downtime. There are many great reasons to join our team! Pension Plan …
global transportation agencies. As a senior engineer, you will play a critical role in designing, building, and scaling cloud services that enable remote device management, over-the-air updates, observability, and high-availability operations for our mobile perception platform. We tackle complex challenges related to scalability, performance, and security to enable smarter and safer cities through cutting-edge innovation. As … future of intelligent transportation systems. Responsibilities: Participate in incident prevention, response, and remediation efforts, learning and applying best practices. Design, build, and maintain scalable cloud services that support device observability, OTA updates, and fleet operations. Lead efforts to improve the reliability, security, and performance of multi-region AWS infrastructure using Infrastructure as Code (IaC) tools. Own CI/CD pipelines …
AWS in a production environment. Expertise in Kubernetes, including AKS, EKS, containerization, and Helm. Proven ability to meet and maintain SOC 2 or equivalent compliance. Strong background in automation, observability, and GitOps workflows. Comfortable using AI coding tools like GitHub Copilot, Cursor, or Claude to enhance delivery. Bonus if you have experience supporting hybrid or disconnected deployment environments or working … Be Using. Cloud: Azure (including AKS, API Management, and DevOps Pipelines) and AWS (including EKS, Lambda, and CloudFormation). Infrastructure as Code and GitOps: Terraform, Bicep, Pulumi, ArgoCD, and FluxCD. Observability: Prometheus, Grafana, OpenTelemetry, and Datadog. Security and Compliance: HashiCorp Vault, Azure Key Vault, AWS KMS, OPA Gatekeeper, and Drata or similar. Interested in exploring this further? This is a high …