Observability Jobs in the UK

501 to 525 of 892 Observability Jobs in the UK

Cloud Architect

Oxford, England, United Kingdom
Experis UK
network segmentation). Lead migration and modernisation (re-host/re-platform/re-factor) for priority applications. Implement IaC at scale (Terraform preferred; standard modules; pipelines). Build observability (logs, metrics, traces, SLOs) and resilience (HA, DR, RTO/RPO). Drive FinOps: cost transparency, budgets, showback/chargeback, right-sizing. Embed security-by-design and compliance (CIS, NIST … warehouses (BigQuery, Synapse, Redshift), ETL/ELT. API strategy (APIM/API Gateway/Apigee), messaging (SQS/SNS/Service Bus/PubSub), event-driven design. Operations & Reliability: Observability stack (CloudWatch/CloudTrail, Azure Monitor/Log Analytics, Cloud Logging/Monitoring; Prometheus/Grafana). DR/BCP architectures (cross-region, multi-region, backups, runbooks; tested failover). … KMS. Data/Integration: Event Hubs/Kafka/PubSub, API Gateway/APIM/Apigee, Data Factory/Glue/Cloud Data Fusion, BigQuery/Synapse/Redshift. Observability: Prometheus/Grafana, OpenTelemetry, CloudWatch, Azure Monitor, Cloud Monitoring, ELK/Elastic. Scripting: Python/Bash/PowerShell; strong Git and code review practices. Certifications (Nice to Have) Azure: AZ …

Senior Platform Engineer

City Of London, England, United Kingdom
develop
Apply strong networking knowledge to optimise performance, security, and reliability. Ensure compliance with financial services regulations and internal security policies. Contribute to CI/CD pipelines and cloud-native observability solutions. Explore and integrate emerging technologies including AI/LLM-based solutions to enhance automation and operational efficiency. Key Skills & Experience Essential: Strong hands-on experience with AWS (EC2, VPC … or working with AI/LLM solutions Familiarity with Terraform, Ansible, GitLab CI/CD, or similar tools Exposure to financial services or other highly regulated industries Experience with observability stacks (Prometheus, Grafana, ELK, etc. …

DevOps Engineer

Nottingham, England, United Kingdom
GTS Group Ltd
based microservices. Troubleshoot production issues, ensuring uptime and documenting processes on the internal wiki. Automate deployments, testing processes, and infrastructure provisioning (Terraform, Ansible, GitHub Actions). Implement monitoring and observability solutions for proactive issue detection. Provide occasional support for internal IT infrastructure (e.g., laptops, printers, office networking). Occasionally maintain and support CMS platforms (Magento, Joomla, WordPress). Experience Required … management) Docker containerization Python scripting for automation Git version control Desirable (Future-Facing Skills): Infrastructure as Code (Terraform, Pulumi, Ansible) Container orchestration (Kubernetes) Go development for microservice utilities Modern observability tools (Prometheus, Grafana, Datadog) CI/CD pipeline management (GitHub Actions, GitLab CI, Jenkins) Firewall-as-a-Service solutions (e.g., Cloudflare) Endpoint/device management (e.g., Intune, NinjaOne) Exposure to …

Platform Engineer

City of London, London, United Kingdom
TechChain Talent
optimize cloud-native infrastructure across AWS or GCP. Scale and operate Kubernetes clusters for mission-critical use cases. Implement highly resilient patterns: autoscaling, HPA, service mesh, high availability. Manage observability using Prometheus, Grafana, Loki, and OpenTelemetry. Ensure near-zero downtime via automated monitoring and incident response workflows. Optimize infrastructure for security, performance, and cost efficiency. Platform Engineering/DevOps … trading, and on-chain engineering teams. Ideal Tech Stack Languages Go, Rust, TypeScript, Python Cloud AWS or GCP Infrastructure Kubernetes, Helm, Karpenter, EKS/GKE IaC Terraform, Terragrunt, Crossplane Observability Prometheus, Grafana, Loki, OpenTelemetry Security IAM, Vault, KMS, secrets management Web3/On-chain (Strong Plus) Experience with Solana, Anchor, RPC node architecture, indexers Familiarity with high-frequency, real-time …

Platform Engineer

London Area, United Kingdom
TechChain Talent
optimize cloud-native infrastructure across AWS or GCP. Scale and operate Kubernetes clusters for mission-critical use cases. Implement highly resilient patterns: autoscaling, HPA, service mesh, high availability. Manage observability using Prometheus, Grafana, Loki, and OpenTelemetry. Ensure near-zero downtime via automated monitoring and incident response workflows. Optimize infrastructure for security, performance, and cost efficiency. Platform Engineering/DevOps … trading, and on-chain engineering teams. Ideal Tech Stack Languages Go, Rust, TypeScript, Python Cloud AWS or GCP Infrastructure Kubernetes, Helm, Karpenter, EKS/GKE IaC Terraform, Terragrunt, Crossplane Observability Prometheus, Grafana, Loki, OpenTelemetry Security IAM, Vault, KMS, secrets management Web3/On-chain (Strong Plus) Experience with Solana, Anchor, RPC node architecture, indexers Familiarity with high-frequency, real-time …

Performance Tester

London Area, United Kingdom
Bestman Solutions
performance, scalability, failover, DR, resilience, alerting, and monitoring. Design and execute load, stress, endurance, and failover tests using industry-standard tools such as JMeter, LoadRunner, or ADS. Set up observability dashboards (Grafana, Splunk, Dynatrace, Kibana, or Datadog) to monitor test execution and system performance. Analyse results to identify performance bottlenecks, system vulnerabilities, and areas for optimisation. Report findings and recommendations … business teams. Experience working in Agile delivery environments with cross-functional teams. Nice to Have Background in financial services or experience supporting legacy-to-modernisation migrations. Understanding of infrastructure observability, cloud platforms, and microservice orchestration. Exposure to automation frameworks and scripting for performance testing. This is a key role within a global transformation programme, offering the chance to shape how …

Performance Tester

City of London, London, United Kingdom
Bestman Solutions
performance, scalability, failover, DR, resilience, alerting, and monitoring. Design and execute load, stress, endurance, and failover tests using industry-standard tools such as JMeter, LoadRunner, or ADS. Set up observability dashboards (Grafana, Splunk, Dynatrace, Kibana, or Datadog) to monitor test execution and system performance. Analyse results to identify performance bottlenecks, system vulnerabilities, and areas for optimisation. Report findings and recommendations … business teams. Experience working in Agile delivery environments with cross-functional teams. Nice to Have Background in financial services or experience supporting legacy-to-modernisation migrations. Understanding of infrastructure observability, cloud platforms, and microservice orchestration. Exposure to automation frameworks and scripting for performance testing. This is a key role within a global transformation programme, offering the chance to shape how …

Head of Infrastructure

City of London, London, United Kingdom
Hybrid/Remote Options
Harnham
the following: Technical tasks Architecting and scaling cloud infrastructure (GCP preferred) and high-performance computing environments Leading the design and implementation of DevOps platforms, CI/CD pipelines, and observability tools (Terraform, Docker, Kubernetes, Jenkins) Partnering with engineering and R&D to define technical roadmaps for compute and infrastructure products Other key responsibilities Managing and mentoring a team, fostering a … GitHub Actions; Terraform or CloudFormation; Prometheus, Grafana, Datadog, or New Relic; Slurm, Torque, LSF; MPI; Hadoop or Spark; Experience with high-performance computing, distributed systems, and observability tools Strong communication and executive presence, with the ability to translate complex technical concepts for diverse audiences Familiarity with AI/ML operations is a plus BENEFITS The successful Director …

Head of Infrastructure

London Area, United Kingdom
Hybrid/Remote Options
Harnham
the following: Technical tasks Architecting and scaling cloud infrastructure (GCP preferred) and high-performance computing environments Leading the design and implementation of DevOps platforms, CI/CD pipelines, and observability tools (Terraform, Docker, Kubernetes, Jenkins) Partnering with engineering and R&D to define technical roadmaps for compute and infrastructure products Other key responsibilities Managing and mentoring a team, fostering a … GitHub Actions; Terraform or CloudFormation; Prometheus, Grafana, Datadog, or New Relic; Slurm, Torque, LSF; MPI; Hadoop or Spark; Experience with high-performance computing, distributed systems, and observability tools Strong communication and executive presence, with the ability to translate complex technical concepts for diverse audiences Familiarity with AI/ML operations is a plus BENEFITS The successful Director …

Senior Rust Software Engineer

England, United Kingdom
Moody's Investors Service
experience designing and working with relational database schemas Excellent problem solving and communication skills, with a collaborative mindset Proficient in incremental software delivery leveraging agile processes Experience with software observability practices (distributed tracing, OpenTelemetry, etc.) Basic understanding of artificial intelligence concepts, with curiosity and enthusiasm for learning how AI tools can be used to improve processes and drive efficiency. Interest … systems Collaborate with cross functional teams including Product, QA, and DevOps Mentor junior engineers and promote engineering best practices Ensure code quality, security, and performance across all deliverables Champion observability and ensure software is observable, maintainable and resilient About the team Our Corp & Gov Technology team is responsible for delivering innovative software solutions that support Moody's public and private …
Employment Type: Permanent
Salary: GBP Annual

Senior Site Reliability Engineer

Edinburgh, Midlothian, United Kingdom
Hybrid/Remote Options
Lloyds Bank plc
a key role in ensuring the reliability, scalability, and security of our cloud-native data platforms. This is a hands-on engineering role with a strong focus on automation, observability, incident response, and cross-team collaboration Job Description JOB TITLE: Senior Site Reliability Engineer SALARY: £70,929 - £78,810 LOCATION: Edinburgh or Leeds WORKING PATTERN: Hybrid, 40% (or two days … Cloud Engineering roles. Strong knowledge of Cloud platforms: GCP (preferred), AWS or Azure. Proficiency in Terraform, Docker, Kubernetes, and CI/CD tools (e.g., Jenkins, Harness). Experience with observability tools and distributed tracing. Solid understanding of cloud security principles and vulnerability management. Excellent communication and documentation skills. A collaborative mindset and a bias for action. You'll help shape …
Employment Type: Permanent
Salary: GBP Annual

Senior Cloud Infrastructure Engineer

Bristol, Avon, South West, United Kingdom
Hybrid/Remote Options
Hargreaves Lansdown
HL version control set) with quality gates, automated testing, security scanning, and progressive delivery. Introduce and run GitOps for Kubernetes (AKS preferred), patterns and multi-environment promotions. Own platform observability: metrics, logs and traces using Azure Monitor/Log Analytics/Application Insights, plus Datadog/Grafana where appropriate. Embed security by design: Azure Policy, Defender for Cloud, secrets management … cluster operations, node pools, networking (CNI), ingress, secrets, RBAC and workload identity. Experience with GitOps, and container build pipelines (e.g., ACR, OPA policies, image scanning). Working knowledge of observability tooling (Azure Monitor, Log Analytics, Application Insights, Datadog/Grafana) and alerting/response workflows. Understanding of the Microsoft Cloud Adoption Framework, Azure Landing Zones and the Well-Architected Framework. …
Employment Type: Permanent, Part Time, Work From Home

Senior Data Engineer

England, United Kingdom
Hybrid/Remote Options
Harvey Nash
looking for an experienced Data Engineer to support on an initial 6 Month Contract engagement. You will own their data platform end to end, from ingestion & modelling to orchestration, observability & governance. You'll be responsible for designing & building robust, reliable pipelines, evolving their lakehouse/warehouse layers & enabling fast, trustworthy analytics for multiple teams. Tech you'll be working with …

Linux Production Engineer

City of London, London, United Kingdom
Autonomai Recruitment
distributed systems Contribute to ongoing improvements in reliability, latency, and scalability Qualifications: Linux expertise with a solid understanding of networking and containerisation Proficiency in at least Python Experience with observability tooling Proven track record in designing and maintaining highly distributed systems Apply now for a confidential chat …

Linux Production Engineer

London Area, United Kingdom
Autonomai Recruitment
distributed systems Contribute to ongoing improvements in reliability, latency, and scalability Qualifications: Linux expertise with a solid understanding of networking and containerisation Proficiency in at least Python Experience with observability tooling Proven track record in designing and maintaining highly distributed systems Apply now for a confidential chat …

Cloud Engineer - Azure

City of London, London, United Kingdom
Vallum Associates
Azure Container Instances), and ACA (Azure Container Apps). Create and maintain comprehensive documentation on newly implemented features suitable for an enterprise environment. Design and implement robust monitoring and observability tools to track container performance and health. Automate testing processes by utilizing public cloud elasticity and ephemeral resources, ensuring streamlined operations and reduced manual efforts. Contribute to the software development …

Cloud Engineer - Azure

London Area, United Kingdom
Vallum Associates
Azure Container Instances), and ACA (Azure Container Apps). Create and maintain comprehensive documentation on newly implemented features suitable for an enterprise environment. Design and implement robust monitoring and observability tools to track container performance and health. Automate testing processes by utilizing public cloud elasticity and ephemeral resources, ensuring streamlined operations and reduced manual efforts. Contribute to the software development …

Cloud Engineer

London Area, United Kingdom
algo1
how to manage workloads at scale. Proficient with Infrastructure as Code tools and practices. Comfortable writing automation, configuration, and tooling to simplify operations and reduce manual effort. Knowledgeable about observability tools & best practices. Ability to collaborate across teams with excellent written and verbal communication skills. Nice to Have Qualifications: Experience with multi-cloud and/or hybrid deployments. Knowledge of …

Cloud Engineer

City of London, London, United Kingdom
algo1
how to manage workloads at scale. Proficient with Infrastructure as Code tools and practices. Comfortable writing automation, configuration, and tooling to simplify operations and reduce manual effort. Knowledgeable about observability tools & best practices. Ability to collaborate across teams with excellent written and verbal communication skills. Nice to Have Qualifications: Experience with multi-cloud and/or hybrid deployments. Knowledge of …

DevOps Lead

Birmingham, West Midlands, United Kingdom
Hybrid/Remote Options
Robert Walters
to improve performance Develop strategies to improve performance across group technology DevOps Lead: Experience Technical depth across, but not limited to: Java, UNIX, Linux, Middleware, WebLogic, Cloud Platforms, Observability tools Designing/Developing/Implementing technology advancements Experience of improving resilience of complex production environments The permanent opportunity for a DevOps Lead will pay a salary range of …
Employment Type: Permanent, Work From Home
Salary: £80,000

Analytics Engineer

London, United Kingdom
Tenth Revolution Group
optimise BI dashboards and data products using Tableau, translating business needs into visual insights. Orchestrate and monitor data pipelines, ensuring data quality and timely delivery. Implement data quality checks, observability, and maintain data cataloging and lineage. Drive CI/CD practices using GitHub Actions or similar tools. Collaborate with cross-functional teams to improve platform capabilities and analytics maturity. Requirements …
Employment Type: Permanent
Salary: £70,000 - £85,000 per annum

Analytics Engineer

London, South East, England, United Kingdom
Tenth Revolution Group
optimise BI dashboards and data products using Tableau, translating business needs into visual insights. Orchestrate and monitor data pipelines, ensuring data quality and timely delivery. Implement data quality checks, observability, and maintain data cataloging and lineage. Drive CI/CD practices using GitHub Actions or similar tools. Collaborate with cross-functional teams to improve platform capabilities and analytics maturity. Requirements …
Employment Type: Full-Time
Salary: £70,000 - £85,000 per annum

Python Developer

Hammersmith, England, United Kingdom
Understanding Recruitment
Design schemas and pipelines across Postgres and MongoDB Run CI and CD, improve build times, handle deployments and rollbacks Collaborate with data and ML to productionise models Instrument for observability and own incidents end to end What you will bring 1+ year engineering with strong Python in production Hands on Elasticsearch experience Solid SQL plus practical MongoDB CI and CD …

Lead Data Analyst

Greater London, England, United Kingdom
Harnham
ETL/ELT workflows, and reporting environments Ensure system stability, uptime, and SLA compliance through proactive monitoring Lead incident management, root cause analysis, and production deployments Implement automation and observability to improve performance and reduce manual effort Manage L2 support issues and coordinate fixes with Engineering and DevOps teams Drive improvements in data quality, governance, and workflow efficiency Collaborate with …

Senior API Software Engineer

Belfast, Northern Ireland, United Kingdom
Trust In SODA
scalable APIs in C#/.NET + Azure - Shape API standards, gateway strategy, versioning & authentication - Drive event-driven integrations and seamless third-party connectivity - Lead API performance, reliability, and observability improvements - Mentor engineers and influence architecture across multiple teams What you bring: - Deep C#/.NET experience in production systems - Strong REST API design + OpenAPI/Swagger knowledge - SQL …

Observability salary percentiles
10th Percentile: £56,718
25th Percentile: £67,500
Median: £80,000
75th Percentile: £105,000
90th Percentile: £139,750