576 to 600 of 1,190 Permanent Observability Jobs

MuleSoft & Salesforce Agentic Engineer

Hiring Organisation
Arbuthnot Latham & Co., Limited
Location
Wolverhampton, West Midlands, United Kingdom
Employment Type
Permanent
Gateway (LLM/MCP/A2A) to integrate agents into existing flows, data models and processes. Agent Control, Monitoring & Governance Implement control, monitoring and observability for Salesforce agents, including usage, decisioning outcomes, errors and exceptions. Ensure agent behaviour aligns with internal policies, regulatory expectations and audit requirements appropriate to asset ...

Quantitative Developer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
keep: research workflows, client-reporting drafts, commentary support, meeting prep. Write production code that other quants want to build on. Own reliability, testing, and observability for what you ship. Mentor teammates on effective AI-augmented engineering practice. Carry out other duties as assigned. What to Expect When You Join ...

Senior Software Engineer I (Android) London, UK

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Data and Product to interpret results, and iterate based on real user behaviour. Quality & Reliability: Maintain high standards for testing, crash‐free sessions and observability, and contribute to incident investigation and prevention. Qualifications Experience: 4+ years of Android engineering experience building and shipping consumer products in Kotlin. Architectural Depth: Comfortable ...

Head of Application Operations

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
driving effective RCAs. Strong Problem Management and RCA facilitation with a track record of implementing preventative actions that reduce operational risk. Proficient with observability and ITSM tooling to enable proactive monitoring, SLO/SLA definition and data‐driven operational dashboards. Strong people leadership with experience organising teams for fast execution ...

Director, Solutions Engineering Splunk UKI

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
within the UKI region. Experience working across multiple customer segments (Enterprise, Public Sector, Service Provider, Commercial). Strong domain expertise in enterprise software (e.g., Cybersecurity, Observability, Cloud & AI, IT Operations, Application Performance Management, or Big Data). Exceptional communication and articulation skills; ability to translate complex technical ideas into clear business ...

Senior Software Engineer

Hiring Organisation
Jobleads-UK
Location
United Kingdom
INSHUR engineers work by contributing to squad, collective, or discipline-level initiatives, especially those advancing AI-augmented engineering practices across the organisation. Own Observability: You'll ensure systems stay healthy and visible by identifying monitoring gaps and independently managing escalations, building confidence that your area runs smoothly. Collaborate Across Functions ...

Site Reliability Engineer (AWS)

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
place for you. What You'll Do: Own reliability – Maintain and improve our AWS infrastructure using Terraform, bringing your expertise and best practices. Champion observability – Partner with developers to implement effective monitoring, logging, and tracing strategies. Strengthen security – Work closely with the CISO to implement security best practices and ensure … compliance. Optimise costs – Monitor cloud spend and implement FinOps best practices. Maintain CI/CD pipelines – Implement and maintain reliability and observability aspects of GitHub workflows and deployment pipelines. Incident response – Lead incidents, run blameless post-mortems, and drive continuous improvement. Enable developers – Mentor teams on SRE and observability practices ...

Monitoring & Observability Engineer

Hiring Organisation
COMPUTACENTER (UK) LIMITED
Location
London, United Kingdom
Employment Type
Permanent
Salary
GBP Annual
Life on the team. Location: UK Wide. At Computacenter, you'll be joining a world-class team of over 1,000 skilled professionals within Group Professional Services (GPS). Our teams operate across the UK, Germany ...

Network Reliability Engineer – Observability & Automation

Hiring Organisation
Jobleads-UK
Location
United Kingdom
Genesys is seeking a Network Engineer for Operations Reliability in the United Kingdom. The role focuses on maintaining the reliability, stability, and performance of enterprise network services, including LAN, WAN, and cloud connectivity. Candidates should ...

Solutions Architect - AI, Observability & Security Presales

Hiring Organisation
Jobleads-UK
Location
United Kingdom
Elasticsearch B.V. is looking for a Solutions Architect to serve as a technical authority and trusted advisor. This role involves understanding customer goals, guiding sales efforts, and building relationships. The ideal candidate should have a ...

Staff Site Reliability Engineer - Cloud

Hiring Organisation
Jobleads-UK
Location
Newcastle upon Tyne, England, United Kingdom
Newcastle: UK - London: UK - Leeds. Time type: Full time. Posted on: Posted Today. Job requisition ID: R55272. Elevate Global Operations as our Next Cloud Site Reliability Engineer (Observability Expert)! Trimble is an industrial technology company transforming the way the world works by delivering solutions that enable our customers to thrive. We create technologies … progress with connected hardware and software solutions. What Makes This Role Great: In this role, you will be the primary architect of our Observability Centre of Excellence, directly influencing the reliability and uptime of global platforms that keep world industries moving. Key Exciting Responsibilities: Lead a global "OTel First" strategy ...

AI Native Software Engineer

Hiring Organisation
TekWissen UK
Location
London Area, United Kingdom
invocation, and policy‐based routing. Build cloud‐native backend services and APIs to support AI‐driven applications and enterprise integrations. Implement evaluation, monitoring, and observability frameworks to ensure accuracy, latency, reliability, and system health across AI agent lifecycles. Optimize AI and system performance across cost, scalability, and latency dimensions … Frameworks: LangGraph, AutoGen, CrewAI (or similar). Cloud & DevOps Tooling: Docker, Kubernetes, Terraform, Helm, CI/CD pipelines. Enterprise Integration: APIs, enterprise platforms, monitoring and observability tools. Why You’ll Love This Role: Build real, enterprise‐grade AI systems that move beyond experimentation into production. Remain deeply technical ...

Site Reliability Engineer

Hiring Organisation
Jobleads-UK
Location
Birmingham, England, United Kingdom
pipelines to facilitate smooth deployments and automate workflows. Collaborate with development teams to establish best practices in system architecture, deployment, and monitoring. Implement observability solutions to gain insights into system performance and user experience. Participate in on-call rotations to respond to system alerts, perform root cause analysis, and implement … code tools (Terraform, Ansible, etc.) for automating deployments. Proficiency in scripting and programming languages such as Python, Go, or Bash. Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK stack). Excellent problem-solving skills and the ability to work effectively in high-pressure situations. Health Care Plan (Medical, Dental ...

Artificial Intelligence (AI) DevOps

Hiring Organisation
WTW
Location
Greater London, United Kingdom
Employment Type
Full Time
Role The responsibilities will include: Help to design, build, and maintain AI‐augmented DevOps pipelines, integrating LLM‐powered tooling, automated testing, code generation, observability, and environment provisioning. Develop automation for operational workflows (permissions, tagging, remediation tasks, infrastructure housekeeping, monitoring pipelines) Help to build foundational components that allow delivery teams … necessary any and all of the security processes required for operational suitability within WTW for solutions (including SAST and DAST processes) Ensure operational stability, observability, and controlled evolution of AI and agentic systems for the ICT Consultancy business Maintain & support AI tools and AI based systems once deployed and help ...

Principal AI Engineer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
data models, service integrations, and internal tools. Architect systems using modern cloud patterns such as microservices, event‐driven design, and managed services, ensuring reliability, observability, and scalability. Provide architectural leadership across integrations with enterprise systems and third‐party platforms. ML Ops, Reliability & Engineering Best Practices Productionize AI and machine learning … solutions using modern ML Ops and software engineering practices. Establish standards for testing, deployment, observability, drift detection, retraining, and documentation. Drive quality, automation, and performance in systems where accuracy, resilience, and reliability are critical. Leadership, Mentorship & Execution Serve as a hands‐on technical leader and player‐coach, mentoring engineers while ...

Kubernetes Linux AIOps Engineer – Elite Quant Hedge Fund

Hiring Organisation
Winston Fox
Location
City of London, London, United Kingdom
Infrastructure DevOps Engineer/SRE with expertise in Kubernetes, Linux, Observability, IaC and AIOps sought by a market-leading Quantitative Hedge Fund to aid further business growth. Our client is one of the world's elite Quant Hedge Fund managers, with large-scale, massively distributed systems, and ample opportunity … Terraform, C...) Must be able to write high-quality automation/scripts from scratch. Configuration Management Tools (Ansible/Puppet/Kapitan/Terraform...) Observability: Experience within the modern open-source ecosystem (ELK, OpenTelemetry, LGTM stack, Prometheus, Grafana, Loki...) CI/CD and GitLab/GitOps: working with Development teams. ...

Site Reliability Engineer (Kubernetes / Multi-Cloud) UK Based

Hiring Organisation
Jobleads-UK
Location
Hereford, England, United Kingdom
Cluster Autoscaler, KEDA, Karpenter). Help improve workload reliability and performance. Support networking, identity, compute, and storage services. Assist with maintaining secure and scalable environments. Observability & Monitoring: Work with Prometheus, Grafana, OpenTelemetry, Azure Monitor, and CloudWatch. Build dashboards, alerts, and logging/tracing pipelines. Support monitoring aligned to SLIs/SLOs … networking, and scaling. Cloud: Experience with Azure and/or AWS. Familiarity with networking, IAM, and core services. Infrastructure as Code: Experience with Terraform. Observability: Familiarity with monitoring/logging tools (Prometheus, Grafana, Loki). Other Technical Skills: Helm Charts/Kustomize creation and maintenance. Containers (Docker). Exposure to both Azure ...

AI Engineering Enablement Director

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
/ML, software, or platform engineering, with exposure to automated testing and infrastructure‐as‐code or policy‐as‐code. Working knowledge of AI observability (logs, metrics, traces, behavioural signals) and practical methods to evaluate or improve AI system behaviour. Familiarity with AI risk and governance frameworks (e.g., NIST … FinOps, such as cost‐aware model selection, unit economics, or prompt‐efficiency practices. Experience with MLOps or AI delivery tooling, or with AI‐specific observability systems. Participation in industry communities or standards bodies, with the ability to translate external practice into internal adoption. Experience facilitating workshops or engineering enablement events. ...

Sr. Distinguished Machine Learning Engineer (Remote-Eligible)

Hiring Organisation
Capital One
Location
McLean, Virginia, United States
Employment Type
Permanent
Salary
USD Annual
Overview: At Capital One, we are creating responsible and reliable AI systems, changing banking for good. For years, Capital One has been an industry leader in using machine ...

Technical Architect

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Significant experience as a technical or solution architect in complex digital or enterprise environments. Strong software engineering foundation, with practical knowledge of modern application architectures (e.g., microservices, APIs, distributed systems). Proven ability to design ...

OpenTelemetry Architect

Hiring Organisation
Ampstek
Location
London Area, United Kingdom
Summary We are seeking an experienced OpenTelemetry Architect to lead the design and implementation of enterprise observability solutions using OpenTelemetry. The ideal candidate will have strong expertise in observability architecture, telemetry pipelines, distributed tracing, and monitoring platform integrations across cloud and hybrid environments. Key Responsibilities Design and implement enterprise-wide … OpenTelemetry architecture and observability frameworks. Define telemetry standards, governance, and best practices for logs, metrics, traces, and events. Architect scalable OpenTelemetry Collector deployments and telemetry pipelines. Lead integration of OpenTelemetry with monitoring and observability platforms such as Dynatrace, Datadog, Grafana, Splunk, and New Relic. Design telemetry routing, enrichment, filtering ...

Azure SRE Engineer

Hiring Organisation
Oscar Associates (UK) Limited
Location
Glasgow, Lanarkshire, United Kingdom
Employment Type
Permanent
Salary
GBP 575 - 625 Daily
Contract. We're looking for two experienced Azure Site Reliability Engineers to join a major Financial Services programme focused on platform health, reliability, and observability across a large-scale Azure environment ...

Infrastructure & Devops Engineer (m/w/d)

Hiring Organisation
iVentureGroup GmbH
Location
Hammerbrook, Hamburg, Germany
Employment Type
Permanent
Salary
EUR Annual
responsibility for our operational IT (24/7) while also driving modern platform initiatives. Whether Kubernetes clusters, CI/CD pipelines or observability, you are in your element. ...

GCP SRE for BI Platform — Reliability & Incidents

Hiring Organisation
Jobleads-UK
Location
United Kingdom
experienced Site Reliability Engineer to oversee the health of GCP-hosted APIs and services. This role involves monitoring uptime, leading incident responses, and building observability infrastructures. The ideal candidate has 2+ years in a Site Reliability or DevOps role, practical GCP experience, and a solid grasp of cloud security. Join ...

Lead Platform Engineer – Cloud Native, Kubernetes & Mentorship

Hiring Organisation
Jobleads-UK
Location
United Kingdom
London to manage teams and stakeholders while working with cutting edge technology. This role involves shaping platform strategy, mentoring engineers, and ensuring the observability and reliability of systems. With an annual salary of £80,000 to £100,000, the company promotes professional growth by funding multiple Kubernetes certifications ...