501 to 525 of 561 Observability Jobs

Lead DevOps Engineer (Azure)

Hiring Organisation
Reed Technology
Location
East Anglia, United Kingdom
Employment Type
Permanent
Salary
£75,000
pipeline templates, PR/branch policies, approvals and gated releases * Creating 'golden path' delivery patterns so teams can deploy without bespoke pipelines Operational readiness & observability * Defining monitoring, logging, alerting and dashboards * Improving incident response, runbooks and recovery processes * Shaping DR and operational processes (no on-call at present) Ways …/CD engineering experience * Experience implementing governance, security guardrails and delivery controls * Comfortable operating without an existing DevOps team Desirable * Azure Policy at scale * Observability, SRE or platform engineering practices * Container/AKS experience * Cost governance and showback/chargeback experience Why this role? * Opportunity to own and shape DevOps ...

Cloud Security and Platform Engineer

Hiring Organisation
RealityMine
Location
Trafford Park, England, United Kingdom
mainly focused on AWS, with growing involvement in other cloud and SaaS platforms. You’ll improve existing environments—managing identity and access, governance, security, observability, and lifecycle—by reducing risks, eliminating unsafe configurations, validating ownership, and ensuring the cloud estate is clearly governed and auditable. You will take an active … role in improving RealityMine’s security posture by improving and operating security scanning, improving monitoring and observability, and ensuring risks, vulnerabilities, and end of life components are identified and addressed in a timely and pragmatic way. You will also develop automation used to support security and operational hygiene, reducing manual ...

Forward Deployed Engineer

Hiring Organisation
Novatus Global
Location
City of London, London, United Kingdom
Novatus Global is a Series B scale-up RegTech SaaS provider and boutique advisory firm, helping financial institutions manage their most complex regulatory requirements. We combine deep consulting expertise with cutting-edge SaaS solutions, enabling ...

Cloud Advisory - Agentic Focused Architecture Consultant

Hiring Organisation
Accenture
Location
London Area, United Kingdom
where GenAI and Agentic play a role. Champion system performance, resilience, and efficiency: Proactively identifying and addressing consumption and scalability challenges. Champion full stack observability using modern full stack observability, SRE and AIOps. Manage & Mentor: Lead teams of architects and engineers, providing technical coaching, career counselling, performance management, and coaching ...

Data Platform Solution Architect

Hiring Organisation
MarkJames 🌍
Location
Essex, England, United Kingdom
Define and implement data lakehouse solutions using Apache Iceberg and S3 Lead performance tuning across Snowflake, Airflow, and Iceberg environments Ensure platform reliability, observability, and scalability Drive adoption of cloud-native design patterns and best practices Collaborate with engineering, DevOps, and business stakeholders Requirements Strong experience in Solution Architecture … architectures (Iceberg preferred) Expertise in performance tuning and optimisation Nice to Have CI/CD and DevOps practices Terraform/Infrastructure as Code Monitoring & observability tools (APM) Data governance & catalog tools Cloud security best practices Data modelling and ingestion frameworks ...

AI Product Developer

Hiring Organisation
DCV Technologies
Location
Leeds, West Yorkshire, United Kingdom
Employment Type
Contract
Contract Rate
£400 - £500/day
pipelines, embeddings and vector search Integrate AI solutions with enterprise systems, APIs and cloud platforms (Azure) Implement secure-by-design AI engineering, monitoring and observability Develop reusable components including prompt libraries, agent frameworks and connectors Prototype, test and improve model performance, reliability and scalability Key Skills Strong experience with LLMs …/Azure OpenAI/cloud-native development Experience building production AI systems and API integrations Knowledge of DevOps, CI/CD, monitoring and observability Experience with LangChain, agent frameworks or Copilot Studio is highly desirable Desirable Experience in financial services or regulated environments Understanding of responsible AI, security and governance ...

Lead Site Reliability Engineer

Hiring Organisation
McGregor Boyall
Location
Leeds, West Yorkshire, England, United Kingdom
Employment Type
Full-Time
Salary
£90,000 - £105,000 per annum
they migrate services to the cloud. Work with Product Owners and Engineering Leads to balance feature delivery with system reliability, performance and health. Use observability tooling, performance metrics and SRE principles to proactively identify issues and reduce operational toil. Implement incident and problem management practices, ensuring strong root cause analysis … Technical Skills required: Strong cloud engineering background, ideally across Azure and GCP. Experience building or operating large-scale, resilient cloud platforms. Deep understanding of observability tooling (metrics, logs, traces). Hands-on experience with modern SRE practices: SLOs/SLIs, error budgets, automation to reduce toil, production readiness and robust ...

Senior SRE (Java)

Hiring Organisation
Morgan McKinley
Location
City of London, London, England, United Kingdom
Employment Type
Full-Time
Salary
Salary negotiable
Software-First Approach to Reliability I am currently partnering with a major FTSE 100 FinTech company that is undergoing a massive modernisation and observability overhaul. They aren't looking for a traditional, infrastructure-heavy SRE; they need a Senior Java Developer who has recently transitioned into the SRE space. … Foundation: 5+ years of Java development experience with a deep understanding of JVM internals. The SRE Pivot: Recent experience in a Site Reliability or Observability role, with hands-on knowledge of OpenTelemetry, Jaeger, or similar tracing tools. The Mindset: A strong philosophy on what makes a "good ...

AI Architect

Hiring Organisation
Stackstudio Digital Ltd
Location
United Kingdom
Employment Type
Permanent
into high value solutions Enforce IAM least privilege with IAM Conditions, organisation policies, and scoped service accounts; integrate BeyondCorp for zero trust access Operationalise observability using Cloud Logging, Cloud Monitoring, Error Reporting, Trace, and Profiler; build model/LLM telemetry dashboards and alerts Identify the right AI/ML frameworks … patterns, vector databases, embeddings, and prompt/guardrail engineering Desirable Skills/Knowledge/Experience Knowledge of MLOps/AgentOps, CI/CD, and observability Strong understanding of regulated financial services environments Proven experience implementing AI risk controls, model governance, and auditability Ensure alignment with FCA, PRA, data privacy, model ...

Gen AI Engineer

Hiring Organisation
Wave Group
Location
England, United Kingdom
applications in production environments Evidence of debugging real issues such as incorrect outputs, latency spikes, retrieval failures or agent misbehaviour Experience with monitoring and observability of LLM systems, for example Langfuse, Prometheus, Grafana, OpenTelemetry or similar Strong understanding of RAG systems, retrieval pipelines and evaluation workflows Experience with agentic frameworks … application and infrastructure layers Multimodal experience across text and image or video is beneficial Tech stack Python, AWS, LangGraph, LangChain, vector databases, evaluation tooling, observability platforms, Docker Why join Small, senior team with high ownership Systems already in production with real customers Bi-weekly shipping cycles with fast feedback loops ...

Senior Software Engineer (DevSecOps)

Hiring Organisation
CBSbutler Holdings Limited trading as CBSbutler
Location
Skipton, North Yorkshire, United Kingdom
Employment Type
Contract
Contract Rate
£550 - £580/day
measurable outcomes. The role You will take ownership of the full delivery lifecycle: from pipeline design and environment architecture through to release-linked observability and incident readiness. Day to day, you can expect to be shipping small, frequent changes using trunk-based development and feature flags, embedding security and quality … DAST, IaC scanning, SBOM, WAF configuration, and pipeline attestations Experience building and managing ephemeral, production-like environments with data-on-demand capability Strong observability skills - tracing, metrics, logs, SLO/error budget management, and deployment annotations Familiarity with DORA metrics and a track record of removing flow constraints at squad ...

Founding Engineer

Hiring Organisation
Omnam Investment Group
Location
London Area, United Kingdom
environments Lead integrations with external systems and support early data onboarding Establish engineering standards, tooling, documentation, and technical processes from the start Set up observability, monitoring, and performance systems Jump in wherever needed, from quick scripts and data cleaning to debugging production issues What You Bring 5+ years of engineering … with backend frameworks (FastAPI, Django, Node.js, Rails, etc.) Strong SQL, data modeling, and database design knowledge Familiarity with IaC, containers, CI/CD, and observability tools Bonus: experience in ETL, or hospitality/proptech/real-estate technology Why Join Us We work together in the heart of London ...

Site Reliability Engineer - Observability

Hiring Organisation
N26 GmbH
Location
Berlin, Germany
Employment Type
Permanent
Salary
EUR Annual
About the opportunity We are seeking a Site Reliability Engineer to join the Observability group inside our Platform Engineering domain. Platform Engineering's goal is to provide easy to use, self-service platforms to enable other segments to easily build, deploy and monitor their business applications. And Observability's role ...

LEAD TECHNICAL ARCHITECT - GLOBAL SAAS/AI PLATFORM

Hiring Organisation
Clarity Resourcing (UK) LLP
Location
United Kingdom
Employment Type
Permanent
Salary
GBP Annual
integrations Real Time systems/telecoms architecture (highly desirable) Salesforce/CRM integration at architectural level AI/ML architecture (integration, pipelines, platform design) Observability, monitoring, resilience engineering Architectural governance and decision frameworks Strong documentation and system design communication EXPERIENCE REQUIRED 8+ years software engineering (Back End/platform focus … Lead architecture reviews and guide engineering decisions Act as escalation point for complex cross-platform challenges Establish architecture governance, documentation, and standards Improve resilience, observability, and operational maturity Lead evolution of AI capabilities across the platform Mentor engineers and elevate technical capability across teams PERSONAL ATTRIBUTES Proactive, ownership-driven mindset ...

Senior Technical Delivery Manager

Hiring Organisation
Stackstudio Digital Ltd
Location
Norwich, Norfolk, East Anglia, United Kingdom
Employment Type
Contract
Contract Rate
From £500 to £550 per day
/Tableau), ML feature pipelines, self-service data products. Oversee architecture conformance, security/compliance (PII/PHI, GDPR), and cost optimization. Ensure observability (logging/metrics), DQ SLAs, lineage, and platform SLOs. Align data initiatives to insurance value streams: Policy Admin, Claims, Underwriting, Pricing/Actuarial, Distribution/Broker … engineering (lakehouse, ETL/ELT, orchestration), and BI/analytics platforms. Strong understanding of architecture alignment, data privacy/security (PII/PHI, GDPR), observability, operational SLOs, and cost optimization. Broad insurance domain knowledge across policy administration, claims, underwriting, pricing/actuarial, distribution, fraud, and regulatory reporting, with ability ...

Senior Java Engineer (reliability & observability)

Hiring Organisation
GCS
Location
Northampton, Northamptonshire, United Kingdom
Employment Type
Permanent
Salary
£45,000 - £60,000/annum
Boot development experience in high-throughput systems Deep understanding of event-driven and messaging architectures (Kafka, JMS, AMQP or similar) Experience engineering reliability and observability at scale (monitoring, tracing, SLIs/SLOs) Desirable Skills: Experience building notification delivery infrastructure (webhooks, push, SMS) Awareness of the payments domain, including processing flows ...

Senior Automation Engineer

Hiring Organisation
Raytheon
Location
Glenrothes, Fife, Scotland, United Kingdom
Employment Type
Contract
commissioning of robotic cells and assembly systems; perform First Article Inspection (FAI) and ensure compliance with safety standards, e.g. ISO 9001 or AS9100. Observability & Support: Maintain platform observability and respond to incidents through Root Cause Analysis (RCA) to improve service efficiency. System Integration: Designing and implementing interfaces between MES (e.g. ...

Production Engineer - DevOps skills (Lisbon or Porto)

Hiring Organisation
Lùkla
Location
Lisboa, Portugal
Employment Type
Permanent
Salary
EUR Annual
scalable environments. If you are passionate about automation, cloud, and continuous system improvement, this opportunity is for you. Responsibilities: Ensure the stability, performance, and observability of production systems Implement and manage monitoring and observability solutions (e.g., Dynatrace) Automate operational processes through scripts and playbooks Work with orchestration and scheduling tools … infrastructures Collaborate with cross-functional teams in an agile environment Requirements: Technical skills Experience in DevOps/Production Engineering (minimum 2 years) Knowledge of: Observability (e.g., Dynatrace), Terraform, OpenShift/Cloud environments, Schedulers (CFT, AutoSys) Automation with: Python (scripting), Ansible (ability to create playbooks from scratch) Soft Skills Strong communication ...

SRE Lead (Banking/Financial)

Hiring Organisation
Ascendion
Location
City of London, London, United Kingdom
across production systems. Key Responsibilities: Lead the SRE function across the engineering organisation and drive operational excellence across production systems. Define and implement the observability and monitoring strategy, including dashboards, alerting, SLOs, SLAs, and error budgets. Establish comprehensive monitoring coverage to ensure visibility into system health, infrastructure, and business-critical … engineering teams. Manage incident response processes, including on-call management and post-incident reviews. Collaborate with product and engineering teams to build reliability and observability into new systems. Monitor UI behaviour and end-to-end system performance, not just infrastructure metrics. Essential Skills & Experience: Proven experience as an SRE Lead ...

ML & AI - Engineers/Architect/Lead

Hiring Organisation
KBC Technologies Group
Location
England, United Kingdom
version control, and ensuring production-ready AI systems. You’ll also play a key role in integrating AI/LLM agents with strong observability and rollback mechanisms. Location: Leeds/Manchester Client: IT End Client: Banking domain Work Mode: Hybrid Contract: Inside IR35 Salary: Market Standards Key Responsibilities … workflows Manage model versioning and release processes Monitor inference cost, latency, and model drift Safely integrate AI/LLM agents into production systems Implement observability, alerting, and rollback mechanisms Experience Levels We’re hiring across multiple seniority levels: Senior Developer: 3–6 years (ML/AI Engineering) Lead Engineer ...

Rust Engineer

Hiring Organisation
Huxley Associates
Location
London, United Kingdom
Employment Type
Permanent
Salary
£150,000 - £180,000/annum
from systems that actually matter. ETrading, you will build the infrastructure that sits between our traders and the market - execution paths, data pipelines, and observability tooling that power trillions in annual notional volume. When a system performs at 3am under peak load, you will be one of the reasons why. … kernel bypass awareness (DPDK, io_uring) Distributed messaging and event streaming: Kafka, NATS, or equivalent; ordering guarantees, exactly-once semantics, consumer group management Production observability: metrics (Prometheus/OpenTelemetry), distributed tracing, structured logging, and alert design CI/CD pipeline design including benchmarking gates, automated performance regression detection, and reproducible ...

Site Reliability Engineer (SRE)

Hiring Organisation
UA Consulting
Location
City of London, London, United Kingdom
Employment Type
Permanent
Salary
£75,000
platform. Key Responsibilities Partner with development teams to define and manage SLOs/SLIs, and use error budgets to guide engineering decisions. Enhance observability ensuring metrics, logs, and tracing are in place to detect and fix issues proactively. Lead cost optimisation initiatives: monitor spend, rightsize workloads, tune autoscaling, and drive … with Kubernetes (on-prem and AWS EKS). Proven track record defining and working with SLOs/SLIs in production environments. Deep understanding of observability (metrics, logging, tracing, telemetry ...

Data Engineer

Hiring Organisation
Tieto
Location
Lisboa, Portugal
Employment Type
Permanent
Salary
EUR Annual
evaluating live data flows, identifying inefficiencies, and improving overall data quality and signal clarity. Working alongside cross-functional teams (data, AI, QA, DevOps, observability), you'll help define which data truly adds value and ensure the platform scales effectively within an Azure and Microsoft Fabric environment. If you enjoy bringing … with ML teams to prepare datasets and support feature development Monitor and analyze production data to improve performance and reduce noise Help define data observability strategies and meaningful metrics Collaborate with multidisciplinary teams across engineering and operations What we're looking for Around 3-5 years of experience in data ...

Head of Software Engineering

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
with Product and Design as part of a leadership trio, shaping vision and outcomes. Establish modern engineering standards (cloud‐first, CI/CD, automation, observability, secure SDLC). Drive operational excellence across performance, resilience, and security. Build and scale a multi‐site engineering organisation, embedding a culture of ownership … architectures, and distributed systems. Strong knowledge of Web, Mobile, FE technologies such as JavaScript, React, Kotlin, .NET, Azure. Experience implementing CI/CD pipelines, observability, and secure engineering practices. Track record of scaling teams and delivering in fast‐paced, evolving environments. Experience working in or with startup/scale ...

Senior Network Engineer, Cingularity

Hiring Organisation
IMG
Location
London Area, United Kingdom
specialised DTM (Dynamic Synchronous Transfer Mode) network—while strategically introducing automation to enhance resilience and performance. While you will assist with the development of observability tools, 24/7 monitoring is managed by our Technical Operations Centre (TOC), supported by a joint effort between Systems, Broadcast, and Network Engineering teams. … transmission paths. NetOps & Monitoring Refinement Internal Tooling: Build and refine monitoring techniques where the primary "customers" are our internal TOC and Event Engineering teams. Observability Design: Utilise and assist in the development of modern monitoring and logging systems (e.g., Prometheus, Grafana, ELK/OpenSearch) and the Netbox source of truth. ...