Observability Jobs in London

1 to 25 of 363 Observability Jobs in London

SAP Sovereign Cloud Expert DevOps Engineer

London, United Kingdom
SAP SE
IAM, networking security (NSGs, ASGs), and governance policies to ensure compliance and risk mitigation. Monitoring & Logging: Experience with Azure Monitor, Application Insights, Log Analytics, and Prometheus/Grafana for observability and performance monitoring. Scripting & Automation: Strong scripting skills in PowerShell, Bash, and Python, along with automation frameworks like Ansible. Collaboration & Problem-Solving: Ability to work closely with development, security …
Employment Type: Permanent
Salary: GBP Annual

Monitoring & Observability Engineer

South East London, London, United Kingdom
COMPUTACENTER (UK) LIMITED
GPS). Our teams operate across the UK, Germany, France, and India, delivering complex, enterprise-grade IT solutions and consultancy across infrastructure, cloud, and modern operations. As a Monitoring & Observability Engineer, you'll work in high-impact delivery teams that support some of the world's most well-known organisations. You'll play a key role in helping our customers achieve greater … visibility, performance, and reliability across their IT estates, contributing to their operational success through proactive insight and incident prevention. What you'll do: Design, implement, and manage observability solutions using industry-leading tools such as Dynatrace (primary), Grafana, and Splunk. Collect and analyse telemetry data (metrics, logs, traces, events) to diagnose and resolve system and application performance issues. Integrate monitoring platforms … with ITSM tools (e.g. ServiceNow) and CI/CD pipelines to enable proactive alerting and resolution workflows. Act as a Monitoring & Observability SME within customer delivery teams. Support incident response activities and postmortems by identifying patterns, root causes, and optimisation opportunities. Work collaboratively with cross-functional teams to define and implement best practices in observability and monitoring. Attend customer and …
Employment Type: Permanent

Monitoring & Observability Engineer

London, United Kingdom
Computacenter AG & Co. oHG
Monitoring & Observability Engineer. Life on the team: At Computacenter, you'll be joining a world-class team of over 1,000 skilled professionals within Group Professional Services (GPS). Our teams operate across the UK, Germany, France, and India, delivering complex, enterprise-grade IT solutions and consultancy across infrastructure, cloud, and … modern operations. As a Monitoring & Observability Engineer, you'll work in high-impact delivery teams that support some of the world's most well-known organisations. You'll play a key role in helping our customers achieve greater visibility, performance, and reliability across their IT estates, contributing to their operational success through proactive insight and incident prevention. What you'll … do: Design, implement, and manage observability solutions using industry-leading tools such as Dynatrace (primary), Grafana, and Splunk. Collect and analyse telemetry data (metrics, logs, traces, events) to diagnose and resolve system and application performance issues. Integrate monitoring platforms with ITSM tools (e.g. ServiceNow) and CI/CD pipelines to enable proactive alerting and resolution workflows. Act as a Monitoring …
Employment Type: Permanent
Salary: GBP Annual

Senior Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
Stratospherec Ltd
one or more public cloud providers such as Azure, AWS or GCP. Proficiency using Infrastructure as Code (IaC) tools such as Terraform (preferred), Ansible, or CloudFormation. Experience with monitoring, observability and logging tools such as Datadog, Prometheus, Grafana, or similar. Proven track record of maintaining highly-available and performant production environments. Ability to identify and implement effective mitigation strategies and …
Employment Type: Permanent
Salary: £85000 - £90000/annum Excellent Benefits package

Senior Site Reliability Engineer (SRE) / Unix

London, United Kingdom
Morgan Hunt UK Limited
Objective (RPO) of zero. Conduct DR testing (3 scheduled tests per financial year, potentially outside core hours). Maintain CommVault backup administration (Oracle DB, RHEL, MongoDB). Monitoring & Observability: Support logging & observability stacks (InfluxDB, Grafana, Prometheus, Nagios). Enhance monitoring via REST APIs, time-series databases, and full-stack tools (TICK, Elasticsearch, OpenSearch). Promote SLO/SLI measurement …/Perl). Load balancers (HAProxy, Keepalived). Containers & Orchestration: Docker, Kubernetes, OpenShift. Cloud & IaC: AWS (VPC, EC2, S3, NLB). Terraform/CDK for automation. Monitoring & Observability: Prometheus, Grafana, InfluxDB, Nagios. Full-stack admin (Elasticsearch, Fluentd, OpenSearch). Methodologies: Agile (Scrum/Kanban), CI/CD, IaC principles. Risk-aware, customer-focused, proactive problem-solving …
Employment Type: Permanent
Salary: GBP Annual

Senior Site Reliability Engineer SRE / Unix

London, South East, England, United Kingdom
Morgan Hunt Recruitment
Objective (RPO) of zero. Conduct DR testing (3 scheduled tests per financial year, potentially outside core hours). Maintain CommVault backup administration (Oracle DB, RHEL, MongoDB). Monitoring & Observability: Support logging & observability stacks (InfluxDB, Grafana, Prometheus, Nagios). Enhance monitoring via REST APIs, time-series databases, and full-stack tools (TICK, Elasticsearch, OpenSearch). Promote SLO/SLI measurement …/Perl). Load balancers (HAProxy, Keepalived). Containers & Orchestration: Docker, Kubernetes, OpenShift. Cloud & IaC: AWS (VPC, EC2, S3, NLB). Terraform/CDK for automation. Monitoring & Observability: Prometheus, Grafana, InfluxDB, Nagios. Full-stack admin (Elasticsearch, Fluentd, OpenSearch). Methodologies: Agile (Scrum/Kanban), CI/CD, IaC principles. Risk-aware, customer-focused, proactive problem-solving …
Employment Type: Contractor
Rate: £550 per day

Senior Azure Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
Nordcloud group
to L3 networking. Programming languages, such as C#, Python, Perl, Java, C++. CI/CD tools such as Azure DevOps, GitHub Actions, GitLab, Jenkins, TeamCity. Scripting languages such as PowerShell, Bash. Observability/Monitoring: Prometheus, Grafana, Splunk. Containerisation tools such as Docker, K8S, OpenShift, EC, containers. Hosting technologies such as IIS, nginx, Apache, App Service, Lightsail. Analytical and creative approach to problem …
Employment Type: Permanent
Salary: GBP Annual

SRE Engineer

London, South East, England, United Kingdom
Robert Walters
will involve designing robust software solutions that enhance system performance while ensuring high availability for critical applications. You will work hand-in-hand with product engineering teams to improve observability tools and telemetry systems, driving forward automation initiatives that reduce manual intervention. By participating in incident management processes, facilitating transparent communication with stakeholders and leading blameless post-mortems, you will … a focus on automating these activities wherever possible. Provide on-call support during production incidents outside standard working hours as required by business needs. Contribute to enhancing product observability and telemetry by supporting ongoing modernisation efforts within the infrastructure. Collaborate closely with engineering teams to brainstorm ideas that simplify infrastructure management and streamline SRE practices. What you bring: Proficiency …
Employment Type: Contractor
Rate: £400 - £500 per day

Solace Messaging Administrator

London, Clerkenwell, United Kingdom
Eligo Recruitment Ltd
our enterprise messaging infrastructure, ensuring high availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, network optimization, and system observability using industry-standard monitoring tools. Required Skills & Qualifications: 3+ years of experience administering enterprise-grade messaging systems. Strong background in production support, preferably in a 24x7 enterprise environment. Experience working …
Employment Type: Permanent

Solace Messaging Administrator

London, South East, England, United Kingdom
Eligo Recruitment
our enterprise messaging infrastructure, ensuring high availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, network optimization, and system observability using industry-standard monitoring tools. Required Skills & Qualifications: 3+ years of experience administering enterprise-grade messaging systems. Strong background in production support, preferably in a 24x7 enterprise environment. Experience working …
Employment Type: Full-Time
Salary: Competitive salary

Vice President, DevOps Engineer (NE) (London)

Highgate, Greater London, UK
Hybrid / WFH Options
BlackRock, Inc
access to the best tools available. We combine problem-solving skills with software and systems engineering to take a proactive approach in building fault-tolerant and secure systems, improving observability and zealously automating away toil. In this role you will: Use your site reliability expertise to design, operate and support Preqin's infrastructure, middleware and internal services. Improving their performance …
Employment Type: Full-time

Principal DevOps Engineer - AWS (London)

London, UK
NTT DATA
to architect secure, performant, and highly available cloud solutions. Proficiency with monitoring and log analytics tools such as AWS CloudWatch, ELK Stack, Prometheus, Datadog, or New Relic, to maintain observability and ensure operational excellence. Demonstrated leadership skills in managing complex, high-pressure situations and guiding teams through incident resolution. Exceptional communication and presentation skills, with proven experience engaging with senior …
Employment Type: Full-time

Head of SRE and Production Engineering (London)

London, UK
SS&C Technologies
reliability of cloud and hybrid infrastructure powering some of the most critical client-facing applications in financial services. You will be the strategic and operational leader for platform reliability, observability, incident response, CI/CD modernisation, and developer productivity. You will drive automation, lead with metrics, and build systems and teams that proactively address issues before they impact clients. Key … and services. Implement a comprehensive incident management lifecycle (on-call, escalation, RCA, blameless postmortems). Reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) through automated observability, alerting, and playbooks. CI/CD and Platform Engineering: Oversee the development and evolution of CI/CD pipelines for all GIDS products using GitHub Actions, ArgoCD, TeamCity, Octopus Deploy … and GitOps principles. Integrate static and dynamic code analysis, vulnerability scanning, artifact promotion, and release gating into the SDLC. Ensure pipeline scalability and governance while maintaining developer velocity. Observability & Troubleshooting: Lead the implementation and usage of modern observability stacks (e.g., OpenTelemetry, Prometheus, Grafana, Splunk, Datadog). Establish SLOs, SLIs, and error budgets with product and engineering teams. Drive root cause …
Employment Type: Full-time

Network Security Engineer - London

London, United Kingdom
Hybrid / WFH Options
Analyticsengineering
IT workflows. Your responsibilities will also include developing CI/CD pipelines tailored for IT infrastructure, enhancing deployment efficiency, and integrating robust network security measures. You will establish comprehensive observability and proactive issue resolution strategies. We are seeking individuals passionate about network automation, security, and scalable IT solutions that enhance both campus and cloud network operations. You should possess extensive …
Employment Type: Permanent
Salary: GBP Annual

Senior AWS Engineer

London
Hybrid / WFH Options
BAE Systems
or DevOps. Expertise in microservices and API design. Docker, and container runtime platforms such as Kubernetes, EKS, ECS etc. Strong understanding of operational concepts on AWS, particularly monitoring and observability, FinOps. Utilising CI/CD tools, such as Bamboo, Jenkins, TeamCity, Bitbucket, in order to streamline delivery of new features and fixes. Continual testing of code using Automated Testing Frameworks …
Employment Type: Permanent

Lead DevOps Engineer

London, United Kingdom
Hippo Digital Limited
ensure code quality and reliability; Experience of working with Docker for containerisation and application packaging; Experience of implementing and managing monitoring solutions, with experience in Prometheus and Grafana for observability and alerting; Experience of implementing and managing robust security practices, including Encryption (TLS) and Secret Management in the Cloud; Experience of leveraging GitLab API for advanced automation, integration, and reporting …
Employment Type: Permanent
Salary: GBP Annual

Global IT Quality Engineer Senior Director & CoE Lead

London, United Kingdom
The Boston Consulting Group GmbH
upgrades, ensuring comprehensive testing across third-party and custom-built applications. Establish Advanced Performance Engineering: Establish a robust performance engineering strategy, integrating advanced tools for application performance monitoring (APM), observability, and telemetry. Focus on early identification of performance bottlenecks and quality assurance measures tailored for large-scale enterprise systems, ensuring seamless functionality across platforms. Collaborate Across Cross-Functional Teams/ …
Employment Type: Permanent
Salary: GBP Annual

Senior Production Support Engineer

London, United Kingdom
TP ICAP Group
CI/CD pipelines, infrastructure as code (IaC), and automated testing. Experience with industry-standard monitoring tools (ITRS or similar). Proficiency in managing Kubernetes clusters, including deployment, scaling, storage, observability, and lifecycle management. Understanding of financial regulations and reporting requirements in Europe such as MiFID II. Person Profile: The role will suit someone who relishes the prospect of supporting an …
Employment Type: Permanent
Salary: GBP Annual

Senior Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
Stott and May
Tuesdays, Thursdays WFH). Pay: negotiable, inside IR35. We're looking for an experienced DevOps Engineer to join our team on a contract basis, with a focus on AWS infrastructure, observability tooling, and CI/CD automation. This is a hands-on role supporting high-availability systems, rapid deployments, and production incident response. Key Responsibilities: Manage and monitor AWS infrastructure for … performance and security. Respond to production incidents, perform root cause analysis, and implement fixes. Maintain observability tools (Prometheus, Grafana, Splunk) and write PromQL queries. Improve and operate CI/CD pipelines using GitHub Actions and Kubernetes. Automate infrastructure tasks with Python, Bash, Go or SQL. Work with Git-based workflows for infrastructure as code. Troubleshoot Kubernetes workloads and containerised services …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

South West London, London, England, United Kingdom
Oscar Technology
experienced Site Reliability Engineer (SRE) to join them on a 6-month contract (outside IR35). You'll be leading efforts across AWS and Azure Cloud environments, focusing on automation, observability, infrastructure as code and performance at scale. Stakeholder engagement and strong communication are essential in this role, so if you've been in a start-up/smaller team, this … scripting (Python, Bash, PowerShell), and cloud architecture. Comfortable with containerisation and orchestration (Docker, Kubernetes). Understanding of networking, DNS, IAM, and load balancing in cloud environments. Hands-on experience with observability tooling and production-level troubleshooting. If this sounds like you, it's a great opportunity so apply now! Site Reliability Engineer - AWS/Azure | Outside IR35 | £450-500/day …
Employment Type: Contractor
Rate: £450 - £500 per day

VP of Platform Engineering (London)

Wandsworth, Greater London, UK
YouLend
infrastructure provisioning and tooling to enhance development efficiency. You will manage Platform Reliability and Infrastructure, ensuring a reliable and stable platform. You will oversee YouLend's Security and Observability frameworks, focusing on platform security, maintaining observability, and providing dashboards for developers to monitor service health. The ideal candidate is someone who has successfully built and scaled platform architectures, led … the ability to work across technical and non-technical teams. Excellent communication skills, with the ability to translate complex technical concepts to business stakeholders. Operational Focus: Expertise in platform observability, monitoring, incident management, and creating highly reliable systems. Experience implementing SLAs, SLOs, and SLIs is a plus. Security & Compliance: In-depth understanding of platform security, data privacy, and regulatory compliance …
Employment Type: Full-time

Principal SRE Engineer

London, South East, England, United Kingdom
Robert Walters
incidents using data-driven decision making to minimise downtime and financial impact while leading root cause analysis and conducting blameless post-mortems. Enhance application health monitoring by implementing robust observability solutions and automating manual processes to improve system resilience. Drive cost optimisation initiatives and manage capacity resources to ensure efficient and scalable operations across all FX trading platforms. Collaborate with … Deep technical expertise in Linux/Unix systems administration combined with strong SQL skills and proficiency in scripting languages such as Python or Java. Demonstrated experience with monitoring and observability tools including Prometheus, Grafana, Splunk, Geneos, OpenTelemetry or Corvil is highly desirable. Familiarity with cloud platforms as well as containerisation technologies like Kubernetes or Docker alongside CI/CD pipeline …
Employment Type: Full-Time
Salary: £110,000 - £125,000 per annum

Senior DevOps Engineer - FinOps - Enabling Services

London, United Kingdom
Hybrid / WFH Options
Lloyds Bank plc
teams to build cost-effective solutions on GCP while maintaining agility and fostering innovation. This position is perfect for engineers who are passionate about optimising cloud usage, enhancing cost observability, and championing a FinOps culture. What you'll do: Partner with engineering, finance and product teams to drive cost-efficiency across GCP. Design and implement automation to boost cost optimisation … had GCP certifications (e.g. Professional Cloud DevOps Engineer, Professional Cloud Architect). FinOps Foundation certifications (e.g. Practitioner, Engineer). Familiarity with security tools e.g. HashiCorp Vault, Aquasec, Nexus IQ. Knowledge of observability tools e.g. Dynatrace. Experience in cost management tools e.g. Cloudability. About working for us: Our focus is to ensure we're inclusive every day, building an organisation that reflects modern …
Employment Type: Permanent
Salary: GBP Annual

Principal DevOps Engineer (London)

London, UK
TP ICAP Group
testers and operations to automate builds, deployment and release of applications running in the cloud and on-premise. Provide guidance on industry best practices for software deployment, development, and observability. Engineer tooling to implement those practices. Assist and, where appropriate, architect solutions using containerisation and serverless technologies. Drive automation for environment management, logging and monitoring. Engage with vendors and service … stack CI/CD, GitLab, Jenkins, Sonatype Nexus. Knowledge and working experience of containerising application components including writing Dockerfiles and deploying to Kubernetes. Deep understanding of pipelines as code. Observability concepts and tooling: OpenSearch, Cribl, Grafana, Prometheus, CloudWatch. Experience of working with agile teams. Job Band & Level: Manager/7. Not The Perfect Fit? Concerned that …
Employment Type: Full-time

Senior Network Security Engineer

London, United Kingdom
CFP Energy (UK) Ltd
e.g., Slackbots and integrations) to streamline IT operations and business processes. Monitoring and Maintenance: Manage and maintain network security systems through system patches and periodic maintenance tasks. Establish comprehensive observability and proactive issue-resolution strategies using tools like SNMP, Syslog, NetFlow, Elasticsearch (ELK Stack), and Grafana. Collaboration and Communication: Work with CyberEnergia teams to identify functional needs, develop secure architectures, and …
Employment Type: Permanent
Salary: GBP Annual

Observability salary percentiles, London
10th Percentile: £65,000
25th Percentile: £73,125
Median: £82,500
75th Percentile: £108,125
90th Percentile: £120,000