676 to 700 of 1,242 Observability Jobs

Site Reliability Engineer

Hiring Organisation: Computappoint
Location: City Of London, England, United Kingdom

where reliability genuinely isn't optional. The role blends application support, platform engineering and SRE practice. It suits someone who leans toward automation and observability over reactive firefighting. Responsibilities: Managing OpenShift and Kubernetes clusters across physical, virtual, and containerised environments Operating observability stacks (Grafana, Prometheus, Splunk) and driving proactive monitoring … call rotation Key Requirements: Hands-on Kubernetes and/or OpenShift experience in production Scripting skills in Python, Bash, or PowerShell Familiarity with observability tooling and SRE principles SQL and database knowledge (MySQL, Oracle, or similar) Experience supporting .NET, Java, or microservices applications It would be great ...

Site Reliability Engineer

Hiring Organisation: VIQU IT
Location: United Kingdom, Whitechapel, Greater London
Employment Type: Permanent
Salary: £40000 - £50000/annum

Engineer to help improve the reliability, scalability and automation of their AWS estate. This is a hands-on engineering role working across cloud infrastructure, observability, CI/CD and platform tooling, helping development teams deliver faster and more reliably. You’ll be joining a collaborative engineering environment with the opportunity … scalable AWS infrastructure. Develop and manage Infrastructure as Code using AWS CDK. Support CI/CD pipelines and deployment automation. Improve monitoring, logging and observability across distributed systems. Support incident management, root cause analysis and platform reliability improvements. Work closely with engineering and architecture teams to improve operational performance ...

Senior Backend Engineer

Hiring Organisation: SecurityHQ
Location: London, England, United Kingdom

versioned and user-friendly API contracts. Participate in architecture design, code reviews and technical discussions, contributing to overall engineering quality and standards. Quality, Testing & Observability Build and maintain comprehensive test suites including unit, integration, contract and end-to-end testing. Ensure services are fully instrumented with logging, metrics and tracing … support observability in production. Treat testing, monitoring and CI signals as essential components of delivery. Agile Delivery & Continuous Improvement Contribute to agile ceremonies including refinement, estimation and retrospectives. Support continuous improvement across engineering practices, ways of working and use of AI-assisted development tools. Technical Experience & Skills Essential ...

Platform Engineer - Kubernetes / Azure

Hiring Organisation: Keystone Recruitment Partners Ltd
Location: United Kingdom
Employment Type: Permanent
Salary: £450 - £550/day

operation of enterprise cloud-native services. The role will focus on Kubernetes-based platforms running on Microsoft Azure, including service mesh, application deployment, observability, security, and production support. Key responsibilities: Build, configure, and support Kubernetes environments on Azure, including AKS. Deploy and manage Spring Boot applications in containerized environments. Work … Ability to work independently and communicate effectively with both technical and non-technical stakeholders. Desirable experience: Terraform, Helm, GitOps, Azure DevOps, or GitHub Actions. Observability tools such as Prometheus, Grafana, OpenTelemetry, or Azure Monitor. Financial services or other regulated enterprise environments. ...

Platform Engineer - Kubernetes / Azure

Hiring Organisation: Keystone Recruitment Partners Ltd
Location: Nationwide, United Kingdom
Employment Type: Permanent, Contract
Salary: £450 - £550/day

Staff Site Reliability Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

find and analyze reliability problems across our stack, then design and implement software and systems to create step-function improvements. You will design robust observability solutions, lead incident response, automate operational tasks, and continuously improve our infrastructure's reliability, all while mentoring and educating the broader engineering team to make … reliability a core value at Replit. You Will Architect and Implement Observability: Design, build, and lead the implementation of comprehensive monitoring, logging, and tracing solutions. Create dashboards and metrics that provide real-time visibility into system health and performance, enabling proactive issue detection. Define and Drive Reliability Standards: Work with ...

Integration Developer FTC

Hiring Organisation: itecopeople
Location: London, United Kingdom
Employment Type: Permanent, Work From Home
Salary: £60,000

Build connectors, event-processing services, and data pipelines Design scalable integration patterns, schemas, and event flows Develop CDC pipelines and resilient messaging solutions Improve observability through logging, metrics, and tracing Deploy containerised services using Docker and Kubernetes Contribute to architecture, code reviews, and engineering standards Collaborate with developers, data engineers … design Agile development experience Strong communication and collaboration skills Desirable Skills Go and/or Python CDC pipeline development Azure cloud experience Observability tooling (Prometheus, Grafana, OpenTelemetry) Experience within regulated environments What's on Offer Hybrid working - 2 days per week in London Salary up to £60,900 Generous pension ...

Senior Azure Platform Engineer

Hiring Organisation: Rebel Recruitment
Location: Salford, Greater Manchester, North West, United Kingdom
Employment Type: Permanent, Work From Home
Salary: £90,000

about being part of that journey. You'll be working in an environment where tools like GitHub Copilot, OpenAI models, Claude, Gemini, AI-powered observability platforms, intelligent deployment workflows, and internal AI tooling are actively being explored and introduced to improve how engineering teams work day to day. This … designing and improving Azure infrastructure, evolving Kubernetes platforms within AKS, building reusable Infrastructure-as-Code patterns using Terraform and Crossplane, and helping improve reliability, observability, and security across the wider platform estate. You'll also spend time improving developer tooling and CI/CD processes, helping engineering teams deploy faster ...

Senior DevOps Engineer

Hiring Organisation: Halian Technology Limited
Location: Reading, Berkshire, South East, United Kingdom
Employment Type: Permanent, Work From Home
Salary: £95,000

reliability, and availability Implement self-service tooling to empower development teams Drive DevOps best practices across the digital product lifecycle Develop and enhance monitoring, observability, and incident response processes Support global engineering teams delivering high-traffic platforms Key Requirements Proven experience supporting digital product delivery in a DevOps or platform … with Infrastructure as Code (Terraform, Ansible, Puppet or similar) Hands-on experience with Kubernetes, Docker, and cloud platforms (AWS preferred) Experience with monitoring/observability tools (Prometheus, Grafana, ELK, APM tools) Solid understanding of system performance, scalability, and resilience Strong collaboration and communication skills within cross-functional product teams Desirable ...

Principal Software Development Engineer

Hiring Organisation: Jobleads-UK
Location: Glasgow, Scotland, United Kingdom

Code, automation frameworks and database‐as‐code practices using tools such as Redgate Flyway. Take ownership of critical customer systems, ensuring operational resilience, observability, performance optimisation and rapid incident response. Collaborate closely with Product, Delivery, Operations and Commercial teams to shape technical solutions, delivery plans and strategic outcomes. Promote secure … Connect or Genesys Cloud. Proven ability to design and deliver secure, scalable and resilient cloud‐native solutions within complex enterprise environments. Strong understanding of observability, operational support, reliability engineering and end‐to‐end ownership practices. Knowledge of regulated financial services environments, including UK GDPR and FCA Consumer Duty requirements. Excellent ...

Senior Software Development Engineer in Test

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

from traditional testing approaches towards a modern, engineering‐led quality strategy: unit testing, contract testing, component testing, integration testing, E2E flows, synthetics, and strong observability across our microservices. We’re looking for a Senior SDET who is hands‐on, highly technical, and passionate about setting teams up for long‐term … teams adopt best practices confidently. Collaborate with Engineering and DevOps to evolve CI/CD pipelines and embed automation earlier in the lifecycle. Improve observability around testing and reliability, integrating logs, traces, metrics, synthetics, and alerts to increase confidence in releases. Promote good testing principles and high‐quality engineering practices ...

AI Engineer

Hiring Organisation: Morgan McKinley
Location: Yorkshire and Humberside, England, United Kingdom
Employment Type: Full-Time
Salary: Salary negotiable

with Generative and Agentic AI patterns, including LLM integration, RAG architectures, prompt-driven workflows, and AI service orchestration. Integrate AI capabilities with enterprise systems, observability tooling, and security frameworks. Design and maintain CI/CD pipelines within cloud-native engineering environments. Support benchmarking, evaluation, experimentation, and cost optimisation … Skills Experience with Kong API Gateway, Kong Mesh, and Flux CD. RESTful API and microservices development. Terraform and GitOps workflows. Exposure to prompt evaluation, observability, or AI red-teaming tools. SQL and NoSQL database experience. Understanding of vector search technologies and Retrieval-Augmented Generation (RAG) patterns. About You A proactive ...

Lead AI Engineer

Hiring Organisation: Morgan McKinley
Location: Yorkshire and Humberside, England, United Kingdom
Employment Type: Full-Time
Salary: Salary negotiable

Senior Developer

Hiring Organisation: Addition
Location: Watford, Hertfordshire, England, United Kingdom
Employment Type: Full-Time
Salary: £80,000 per annum

Doing: Designing, deploying and managing automation and monitoring platforms that support large-scale applications and services Building and maintaining monitoring, alerting and observability tooling across the platform Creating dashboards that translate complex technical data into meaningful insights for stakeholders Developing automation to integrate new systems using existing frameworks Managing … Docker) Strong Python development skills, including scripting and Lambda functions Experience building and managing CI/CD pipelines, ideally with GitHub Actions Monitoring and observability tooling such as AppDynamics, Grafana, InfluxDB, Graphite, Sensu or similar Experience working with serverless architectures (Lambda, API Gateway, DynamoDB, EventBridge) Solid understanding of Linux/ ...

DevOps Engineer

Hiring Organisation: WTW
Location: Surrey, United Kingdom
Employment Type: Full Time

Managed Identities, Azure networking and Microsoft Entra ID. • Integrate and support security tooling and quality gates, including Mend, Snyk, Invicti, Wiz and GitLeaks. • Improve observability and feedback across build, deployment and environment health using tools such as Datadog, Azure Monitor and Log Analytics. • Help development teams diagnose delivery, deployment … troubleshooting. • Experience embedding security, quality and compliance checks into delivery pipelines, including vulnerability scanning, container scanning, secrets scanning and release evidence. • Good understanding of observability practices, including logs, metrics, dashboards, alerts and environment health checks. • Strong troubleshooting skills across pipelines, deployment automation, Kubernetes workloads, cloud configuration and environment issues. • Ability ...

Principal Machine Learning Engineer

Hiring Organisation: Jobleads-UK
Location: City Of London, England, United Kingdom

evolve the ML Platform, ensuring it supports: Reusable and scalable deployment patterns CI/CD for machine learning Full model lifecycle management Monitoring, observability, and alerting Secure and compliant operation Shape platform standards and interfaces that enable consistent ML delivery across squads and value streams Lead technical spikes and proof … fundamentals (OOP, testing, design patterns). Deep experience building, deploying, and operating production ML systems, including: online and batch model serving, monitoring, alerting, and observability, retraining and lifecycle management. Strong understanding of core data science concepts, sufficient to: review and challenge modelling approaches, ensure models are production‐ready and correctly ...

DevOps Engineer

Hiring Organisation: WTW
Location: Surrey, United Kingdom
Employment Type: Full Time

infrastructure. Automate environment provisioning across development and production. Manage backend state, pipelines, and state-change detection integrations. Platform Engineering & SRE Own and improve reliability, observability, and performance of the platform. Implement SLOs, alerting, dashboards, and auto remediation where possible. Troubleshoot cluster level, networking, and workload deployment issues. Lead root cause … endpoints, Certificate/Secret management etc Strong debugging and operational experience (SRE mindset). Solid experience of DevSecOps architecture, processes & tooling Solid understanding of Observability Process & Tooling Logging, metrics, traces, dashboards Other highly desirable, but not essential skills are: Experience with: GitOps - ArgoCD or GitOps workflows Zero downtime deployments (blue ...

Principal Software Development Engineer

Hiring Organisation: Jobleads-UK
Location: Manchester, England, United Kingdom

/CD pipelines, Infrastructure as Code, automation frameworks, and database-as-code practices using Redgate Flyway. Take ownership of critical customer systems, ensuring operational resilience, observability, performance optimisation, and rapid incident response. Collaborate closely with Product, Delivery, Operations, and Commercial teams to shape technical solutions, delivery plans, and strategic outcomes. Promote secure … Connect or Genesys Cloud. Proven ability to design and deliver secure, scalable, and resilient cloud-native solutions within complex enterprise environments. Strong understanding of observability, operational support, reliability engineering, and end-to-end ownership practices. Knowledge of regulated financial services environments, including UK GDPR and FCA Consumer Duty requirements. Excellent communication and stakeholder management ...

DevOps Platform Engineer

Hiring Organisation: hireful
Location: Manchester / Work from home, Greater Manchester, United Kingdom
Employment Type: Permanent
Salary: £75000 - £85000/annum £80,000 - £85,000 + 10% Bonus + Exte

We are recruiting founding Platform Engineers on behalf of a fast-growing enterprise level (global, 500+ staff) software business with a strong engineering culture and a genuine commitment to doing things the right way. They ...

DevOps Platform Engineer

Hiring Organisation: hireful
Location: London, United Kingdom
Employment Type: Permanent
Salary: £75000 - £85000/annum £80,000 - £85,000 + 10% Bonus + Exte

Infrastructure Engineer - DevOps, SASE

Hiring Organisation: HCLTech
Location: Leeds, England, United Kingdom

HCLTech is a global technology company, spread across 60 countries, delivering industry-leading capabilities centered around digital, engineering, cloud and AI, powered by a broad portfolio of technology services and products. We work with clients ...

Infrastructure Engineer - DevOps, Palo Alto

Hiring Organisation: HCLTech
Location: Manchester Area, United Kingdom

Senior Product Manager, FS Resilience & Market Data

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

ITRS is looking for a Senior Product Manager based in London to lead in delivering critical IT observability solutions. The role involves defining product strategy and engaging with Tier 1 financial institution customers to ensure the roadmap aligns with real needs. You will work on key projects including financial trading ...

Cloud SRE: Resilient, Scalable Infra & Automation

Hiring Organisation: Jobleads-UK
Location: Glasgow, Scotland, United Kingdom

Scotland. This role involves ensuring the reliability, availability, and performance of our healthcare platforms that support millions worldwide. You will work to improve system observability, automate processes, and lead initiatives to enhance platform resilience. The successful candidate will have at least 3 years of related experience, a passion for operational ...

Product Engineer

Hiring Organisation: Radley James
Location: London Area, United Kingdom

integrations end-to-end. The role involves: SuiteQL, SuiteTalk REST/SOAP, SuiteScript OneWorld + multi-entity accounting complexity Integration architecture, sync logic, observability Customer onboarding and troubleshooting Python backend systems and distributed workflows Experience with QBO, Xero, Sage Intacct, Ramp, Bill.com, or similar accounting platforms is highly valuable. Strong ...