501 to 525 of 572 Observability Jobs in England

DevSecOps Capability Manager

Hiring Organisation: WRK DIGITAL LTD
Location: Humber, Devon, UK
Employment Type: Full-time

Direction: Set DevSecOps strategy across pipelines and security automation; Establish governance for CI/CD, IaC, and cloud delivery; Define observability standards (SLOs, tracing, dashboards); Embed security into pipelines (SAST, SCA, DAST, secrets, IaC scanning); Govern "Golden Path" templates and adoption. Operational … DevSecOps, and security integration; Strong cloud, containerisation, and IaC knowledge; Proven ability to improve DORA and engineering performance metrics; Experience with observability and monitoring frameworks; Strong background in security tooling (SAST, SCA, DAST, scanning tools); Solid understanding of cloud security, IAM, and zero-trust principles ...

Digital Senior Full Stack Engineer

Hiring Organisation: Leeds Building Society
Location: Leeds, West Yorkshire, Yorkshire, United Kingdom
Employment Type: Permanent, Work From Home
Salary: £75,000

services. You'll lead complex technical delivery, champion modern engineering practices and help shape high-quality solutions through clean architecture, automation, CI/CD, observability and secure-by-default development. Just as importantly, you'll coach and mentor other engineers, raise standards across the squad and define ways of working. … leading code/design reviews; uplifting test automation and quality gates. Ability to influence stakeholders across Product, Architecture, InfoSec, Risk and Operations; governance experience. Observability experience: metrics, logs, traces; operational ownership of services. Experience of supporting UI/UX Design would be beneficial. And in return ...

Senior Backend Developer (Python, AI, API) - Remote

Hiring Organisation: Nicoll Curtin Technology
Location: London, United Kingdom
Employment Type: Permanent
Salary: GBP Annual

Senior Backend Engineer - API, Python, Next.js, Node, AI, Workflow Automation. I am seeking a Senior Backend Engineer who thrives on solving complex operational challenges at scale. This isn't a feature factory role, it's ...

Data Platform Engineer

Hiring Organisation: Connells Limited
Location: Milton Keynes, Buckinghamshire, UK
Employment Type: Full-time

Job Description: We are seeking an Azure Platform Engineer to join our Group Technology team in Milton Keynes on a 6-month contract basis. You will play a key role in delivering the Connells Group ...

Platform SRE for Observability — Satellite Networks

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

Aalyria Technologies in London is seeking a senior SRE to design and build a centralized observability platform for satellite‐level systems. You will shape the strategy, implement best practices, and automate the stack using Terraform and ArgoCD. You will own SLOs/SLIs, contribute to incident response, and partner with ...

Staff Data Engineer – Data Quality & Governance

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

application to adjustments@depop.com. For any other non-disability related questions, please reach out to our Talent Partners. Role: We’re building a Data Quality, Observability & Governance Team to improve the reliability, trust, and compliance of Depop’s data ecosystem. As a Staff Data Engineer in this team, you’ll lead the design … reduce the mean time to detection and resolution of data incidents, by establishing data contracts between producers and consumers, developing robust data observability systems, and embedding governance and GDPR compliance principles across the data lifecycle. You’ll collaborate with product engineering, data platform, analytics, and legal teams to build confidence ...

Splunk Lead Engineer

Hiring Organisation: VIQU IT
Location: London, Bishopsgate, United Kingdom
Employment Type: Contract
Contract Rate: £550 - £700/day Inside IR35

client a leading finance house are looking for a Lead Splunk Engineer to take the lead in the design and implementation of monitoring and observability patterns and standards within the Observability Team. This role will act as a technical authority, ensuring best practices are followed, automation first approach is taken … mentoring the team to build sustainable capability, advocate monitoring and observability best practice to the wider technology domain. For this opportunity you will have proven skills in: · Attention to detail with the ability to craft concise, informational user documentation · Experience of researching and developing solutions that expand, modernise or improve ...

Splunk Lead Engineer - SC Cleared

Hiring Organisation: VIQU Ltd
Location: London, United Kingdom
Employment Type: Contract
Contract Rate: GBP Daily

client a leading finance house are looking for a Lead Splunk Engineer to take the lead in the design and implementation of monitoring and observability patterns and standards within the Observability Team. This role will act as a technical authority, ensuring best practices are followed, automation first approach is taken … mentoring the team to build sustainable capability, advocate monitoring and observability best practice to the wider technology domain. For this opportunity you will have proven skills in: Attention to detail with the ability to craft concise, informational user documentation; Experience of researching and developing solutions that expand, modernise or improve ...

Senior Applied Scientist, Insights, Prime Video

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

will work alongside other scientists and engineering teams to deliver your research into production systems. About the team Our team owns Prime Video observability features for development teams. We consume PBs of data daily which feed into multiple observability features focussed on reducing the customer impact time. Basic Qualifications Experience ...

Front Office Quantitative Developer

Hiring Organisation: Lorien Resourcing Limited
Location: City of London, London, United Kingdom
Employment Type: Contract

across global markets. Key Responsibilities: Develop and enhance core electronic trading platform components. Deliver strategic platform improvement and roadmap initiatives. Build real-time monitoring, observability and performance tools. Improve platform scalability, reliability and operational efficiency. Support release management and production environments. Work closely with traders, quants and technology teams … across the wider platform environment, including: CI/CD and build pipelines; Release and deployment processes; Developer tooling and platform improvements; Monitoring, alerting and observability; Production support and troubleshooting. This is an excellent opportunity for a senior engineer who enjoys combining electronic trading, quantitative development and platform engineering within ...

SRE Managing Consultant

Hiring Organisation: Akkodis
Location: City of London, London, United Kingdom
Employment Type: Permanent
Salary: £90000 - £100000/annum

include: Define and embed SRE engagement models aligned to modern engineering and traditional ITSM/ITIL practices; Establish SLIs, SLOs, and Error Budgets; Shape observability strategies using metrics, logs, and traces; Design incident response models and post-incident learning loops; Reduce toil through automation and engineering excellence; Deliver SRE capability … Looking For: Extensive experience in SRE, cloud operations, or DevOps; Proven consulting or advisory background; Experience with AWS, Azure, or GCP; Strong observability and incident management expertise; Ability to obtain UK SC clearance. Modis International Ltd acts as an employment agency for permanent recruitment and an employment business ...

Operations Engineer

Hiring Organisation: Ascent Resourcing Limited
Location: Birmingham, West Midlands, England, United Kingdom
Employment Type: Full-Time
Salary: £55,000 - £60,000 per annum

continuity. Key Responsibilities Provide operational support for enterprise platforms, applications, integrations, and associated technologies. Monitor system health, availability, and performance using monitoring, alerting, and observability tools. Analyse, troubleshoot, and resolve incidents affecting services and platforms. Perform root cause analysis and contribute to implementing permanent solutions to prevent recurring issues. Coordinate … within IT operations, support engineering, or service management environments. Experience supporting business-critical production services and operational platforms. Knowledge of monitoring, logging, alerting, and observability practices. Experience working with incident, problem, change, and release management processes. Excellent communication skills with the ability to collaborate effectively across multiple technical and business ...

Senior Site Reliability Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

East Coast preferred, for timezone coverage) About the team Cloud Infrastructure owns the platform every Synthesia product runs on — AWS, Kubernetes, MongoDB, Temporal, our observability stack, and the vendor and cost relationships underneath them. We're a small, high-leverage team scaling toward a domain-ownership model: small groups that … automate based on risk and blast radius, not just time saved. A platform domain — over time, deep ownership of a domain such as Temporal, observability, or Kubernetes operations, partnering with the engineers building in it. Vendor & third-party management — own key external relationships and integrations (e.g. LLM API providers, third ...

AI Ops Platform Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

ready, auditable, compliant and scalable across merchant payment use cases. Accountable for the end‐to‐end engineering of GenAI and ML platforms, embedding governance, observability and operational resilience by design, while enabling teams to deploy and run AI solutions with clarity, assurance and accountability at scale. Qualifications and Experience Supporting … platforms with embedded governance approaches, including policy‐as‐code, guardrails, alignment to model risk frameworks, and maintaining lifecycle traceability with audit‐ready evidence. Applying observability and reliability practices to AI platforms, contributing to service level measures and monitoring latency, cost, quality and failure modes, supported by tools such as CloudWatch ...

Senior Full stack Developer - Birmingham - Perm

Hiring Organisation: INFUSED SOLUTIONS LIMITED
Location: Birmingham, UK
Employment Type: Full-time

root causes of recurring technical problems and implementing long-term solutions. Improving platform reliability, resilience, and overall product quality. Performing application profiling, performance tuning, and optimisation. Enhancing observability, monitoring, alerting, and diagnostic capabilities. Working with engineering teams to improve development practices and technical standards. Reducing technical debt and identifying opportunities for platform improvement. Reviewing existing systems … ownership of technical challenges. Strong communication skills and the ability to collaborate effectively across engineering teams. Desirable: Experience working on SaaS platforms or cloud-based applications. Exposure to observability and monitoring tools. Experience with performance profiling and optimisation techniques. Knowledge of scalability, resilience, and reliability engineering principles. Familiarity with CI/CD pipelines and modern software delivery ...

Principal ML Platform Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

GPUs and cloud infrastructure. Develop internal tools and abstractions and agentic systems that reduce operational overhead for researchers and engineers. Drive improvements across observability, automation, reliability, and developer experience. Collaborate closely with researchers and product engineers to understand pain points and turn them into robust platform capabilities. Contribute to technical … model serving systems in production. Supporting research or data‐intensive workloads. Working with GPU‐based systems or other performance‐sensitive infrastructure. Experience with observability and debugging in distributed systems. Familiarity with Terraform, Datadog, GitHub Actions, or similar tools. Bonus points for Experience building agentic or LLM‐powered internal tools. Experience ...

Architect, Staff & Senior Systems Software Engineer London, UK

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

topology. Engineer for Reliability at Scale: Make distributed inference dependable across failure domains (fault handling, graceful degradation, load balancing, and recovery), and define the observability, tracing, and tooling standards that let teams diagnose problems across the runtime, network, and accelerator rather than through logs alone. Drive Bring‐Up: Evaluate system … Nice to have Experience with dataflow or non‐GPU accelerator architectures; pre/post‐silicon bring‐up on custom hardware (ASIC/FPGA); production observability at scale (hardware counters, Prometheus/Grafana‐style export, device and cluster views). Compensation & Equity Competitive Salary: Commensurate with your experience, skills, and location. ...

Architect, Staff & Senior Systems Software Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

topology. Engineer for Reliability at Scale: Make distributed inference dependable across failure domains (fault handling, graceful degradation, load balancing, and recovery), and define the observability, tracing, and tooling standards that let teams diagnose problems across the runtime, network, and accelerator rather than through logs alone. Drive Bring-Up: Evaluate system … Nice to have Experience with dataflow or non-GPU accelerator architectures; pre/post-silicon bring-up on custom hardware (ASIC/FPGA); production observability at scale (hardware counters, Prometheus/Grafana-style export, device and cluster views). Adjacent depth is welcome: HPC cluster design, high-speed networking, distributed ...

Staff Reliability Engineer (Full Stack)

Hiring Organisation: Feeld
Location: Greater London, United Kingdom
Employment Type: Full Time
Salary: 100000 to 130000 GBP Annually

Native). Lead technical problem-solving during incidents: coordinate response, diagnose root causes, communicate status, and drive to resolution. Build and evolve monitoring/observability (dashboards, alerts, tracing, logging) that enables fast detection and diagnosis. Drive post‐incident reviews (blameless) and ensure learnings become durable fixes (tech changes, runbooks, automation … comfort working across services and APIs. Proven incident response leadership: on-call participation, triage, mitigation, and root-cause analysis (RCA) with follow-through. Solid observability skills: practical experience with logging/metrics/tracing and turning signals into actionable alerts and dashboards. Experience collaborating with mobile teams and understanding mobile ...

Lead Software Engineer - Proxy/SSE Network Security

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

resilience outcomes. Drive operational excellence at scale for perimeter, proxy, and SSE services in the US, including incident, change, and problem management rigor, observability and resiliency validation practices, automation to improve repeatability and evidence quality, reduction of client and partner impact, and execution of Technology Lifecycle Management (TLM) and modernization … design, exception frameworks, audit-ready traceability, and measurable risk reduction reporting. Experience with large-scale operations for externally facing or security enforcement services, including observability strategy, resilience testing, incident response alignment, and reduction of repeat incidents and client-impacting events. Experience designing and operating hybrid edge architectures and cloud interconnect ...

Staff SRE: Observability, Automation & Global Reliability

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

London. This role focuses on the reliability, scalability, and performance of Replit's infrastructure serving millions of users worldwide. You will work on designing observability solutions, leading incident response, and automating operational tasks while mentoring other engineers. The ideal candidate has extensive experience in Site Reliability Engineering, strong programming skills ...

GenAI Python Engineer/Hybrid

Hiring Organisation: iBSC
Location: Sheffield, Yorkshire, United Kingdom
Employment Type: Contract
Contract Rate: GBP Annual

against ground truth. Work closely with architects, platform teams, and business stakeholders to deliver scalable and secure solutions. Follow enterprise standards for security, governance, observability, and performance. Required Skills and Experience Strong experience in AI/ML engineering, with hands-on exposure to Generative AI use cases. Experience in building … based AI stack. Experience with high-volume document processing. Familiarity with enterprise architecture, security, and compliance controls. Exposure to monitoring, model evaluation, and AI observability tools. Preferred Profile Able to independently build and deploy GenAI applications from ingestion to retrieval and evaluation. Strong problem-solving skills with a practical implementation ...

Staff Engineer

Hiring Organisation: 17918
Location: London, United Kingdom

engineering standards, best practices and reusable patterns while partnering with Enterprise Architecture and influencing technical direction Drive engineering excellence by improving code quality, testing, observability, reliability and operational practices Support end-to-end delivery by guiding teams through complex technical challenges, improving decision-making, and contributing to planning and risk … data lakes/lakehouse architectures, Iceberg or similar table formats, as well as batch and streaming processing Knowledge of data quality, governance, cataloguing and observability tools (e.g. Datadog), with DBT or AI-assisted engineering practices as a plus Additional Information Your benefits We're a community here that cares as much about ...

Staff Engineer

Hiring Organisation: Stepstone UK
Location: South East London, London, United Kingdom
Employment Type: Permanent

Senior Software Product Strategy & Product Marketing Lead — Data Center AI and Personal AI - Qu[...]

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

center offerings. Define European requirements for AI infrastructure software, including model serving, orchestration, workload management, developer tools, runtime environments, compilers, SDKs, containerization, Kubernetes integration, observability, benchmarking, security, and deployment workflows. Develop European messaging around performance, power efficiency, total cost of ownership, software maturity, deployment flexibility, openness, sovereignty, and integration with … serving, and performance optimization across latency, throughput, and power. Understanding of AI infrastructure software stack, including Linux, containers, Kubernetes, and cloud‐native deployment patterns, observability frameworks (e.g., OpenTelemetry), CNCF ecosystem, and integration with enterprise or hyperscaler data center control planes. Solid knowledge of server and system architecture, including ...