576 to 600 of 1,190 Permanent Observability Jobs

MuleSoft & Salesforce Agentic Engineer

Hiring Organisation: Arbuthnot Latham & Co., Limited
Location: Wolverhampton, West Midlands, United Kingdom
Employment Type: Permanent

Gateway (LLM/MCP/A2A) to integrate agents into existing flows, data models and processes. Agent Control, Monitoring & Governance Implement control, monitoring and observability for Salesforce agents, including usage, decisioning outcomes, errors and exceptions. Ensure agent behaviour aligns with internal policies, regulatory expectations and audit requirements appropriate to asset ...

Quantitative Developer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

keep: research workflows, client-reporting drafts, commentary support, meeting prep. Write production code that other quants want to build on. Own reliability, testing, and observability for what you ship. Mentor teammates on effective AI-augmented engineering practice. Carry out other duties as assigned. What to Expect When You Join ...

Senior Software Engineer I (Android) London, UK

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

Data and Product to interpret results, and iterate based on real user behaviour. Quality & Reliability: Maintain high standards for testing, crash‐free sessions and observability, and contribute to incident investigation and prevention. Qualifications Experience: 4+ years of Android engineering experience building and shipping consumer products in Kotlin. Architectural Depth: Comfortable ...

Head of Application Operations

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

driving effective RCAs. Strong Problem Management and RCA facilitation with a track record of implementing preventative actions that reduce operational risk. Proficient with observability and ITSM tooling to enable proactive monitoring, SLO/SLA definition and data‐driven operational dashboards. Strong people leadership with experience organising teams for fast execution ...

Director, Solutions Engineering Splunk UKI

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

within the UKI region. Experience working across multiple customer segments (Enterprise, Public Sector, Service Provider, Commercial). Strong domain expertise in enterprise software (e.g., Cybersecurity, Observability, Cloud & AI, IT Operations, Application Performance Management, or Big Data). Exceptional communication and articulation skills; ability to translate complex technical ideas into clear business ...

Senior Software Engineer

Hiring Organisation: Jobleads-UK
Location: United Kingdom

INSHUR engineers work by contributing to squad, collective, or discipline-level initiatives, especially those advancing AI-augmented engineering practices across the organisation. Own Observability: You'll ensure systems stay healthy and visible by identifying monitoring gaps and independently managing escalations, building confidence that your area runs smoothly. Collaborate Across Functions ...

Site Reliability Engineer (AWS)

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

place for you. What You'll Do Own reliability – Maintain and improve our AWS infrastructure using Terraform, bringing your expertise and best practices Champion observability – Partner with developers to implement effective monitoring, logging, and tracing strategies Strengthen security – Work closely with the CISO to implement security best practices and ensure … compliance Optimise costs – Monitor cloud spend and implement FinOps best practices Maintain CI/CD pipelines – Implement and maintain reliability and observability aspects of GitHub workflows and deployment pipelines Incident response – Lead incidents, run blameless post-mortems, and drive continuous improvement Enable developers – Mentor teams on SRE and observability practices ...

Monitoring & Observability Engineer

Hiring Organisation: COMPUTACENTER (UK) LIMITED
Location: London, United Kingdom
Employment Type: Permanent
Salary: GBP Annual

Life on the team Location: UK Wide At Computacenter, you'll be joining a world-class team of over 1,000 skilled professionals within Group Professional Services (GPS). Our teams operate across the UK, Germany ...

Network Reliability Engineer – Observability & Automation

Hiring Organisation: Jobleads-UK
Location: United Kingdom

Genesys is seeking a Network Engineer for Operations Reliability in the United Kingdom. The role focuses on maintaining the reliability, stability, and performance of enterprise network services, including LAN, WAN, and cloud connectivity. Candidates should ...

Solutions Architect - AI, Observability & Security Presales

Hiring Organisation: Jobleads-UK
Location: United Kingdom

Elasticsearch B.V. is looking for a Solutions Architect to serve as a technical authority and trusted advisor. This role involves understanding customer goals, guiding sales efforts, and building relationships. The ideal candidate should have a ...

Staff Site Reliability Engineer - Cloud

Hiring Organisation: Jobleads-UK
Location: Newcastle upon Tyne, England, United Kingdom

Newcastle: UK - London: UK - Leeds. Time type: Full time. Posted on: Posted Today. Job requisition id: R55272. Elevate Global Operations as our Next Cloud Site Reliability Engineer (Observability Expert)! Trimble is an industrial technology company transforming the way the world works by delivering solutions that enable our customers to thrive. We create technologies … progress with connected hardware and software solutions. What Makes This Role Great: In this role, you will be the primary architect of our Observability Centre of Excellence, directly influencing the reliability and uptime of global platforms that keep world industries moving. Key Exciting Responsibilities: Lead a global "OTel First" strategy ...

AI Native Software Engineer

Hiring Organisation: TekWissen UK
Location: London Area, United Kingdom

invocation, and policy‐based routing Build cloud‐native backend services and APIs to support AI‐driven applications and enterprise integrations Implement evaluation, monitoring, and observability frameworks to ensure accuracy, latency, reliability, and system health across AI agent lifecycles Optimize AI and system performance across cost, scalability, and latency dimensions … Frameworks: LangGraph, AutoGen, CrewAI (or similar) Cloud & DevOps Tooling: Docker, Kubernetes, Terraform, Helm, CI/CD pipelines Enterprise Integration: APIs, enterprise platforms, monitoring and observability tools Why You’ll Love This Role Build real, enterprise‐grade AI systems that move beyond experimentation into production Remain deeply technical ...

Site Reliability Engineer

Hiring Organisation: Jobleads-UK
Location: Birmingham, England, United Kingdom

pipelines to facilitate smooth deployments and automate workflows. Collaborate with development teams to establish best practices in system architecture, deployment, and monitoring. Implement observability solutions to gain insights into system performance and user experience. Participate in on-call rotations to respond to system alerts, perform root cause analysis, and implement … code tools (Terraform, Ansible, etc.) for automating deployments. Proficiency in scripting and programming languages such as Python, Go, or Bash. Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK stack). Excellent problem-solving skills and the ability to work effectively in high-pressure situations. Health Care Plan (Medical, Dental ...

Artificial Intelligence (AI) DevOps

Hiring Organisation: WTW
Location: Greater London, United Kingdom
Employment Type: Full Time

Role The responsibilities will include: Help to design, build, and maintain AI‐augmented DevOps pipelines, integrating LLM‐powered tooling, automated testing, code generation, observability, and environment provisioning. Develop automation for operational workflows (permissions, tagging, remediation tasks, infrastructure housekeeping, monitoring pipelines) Help to build foundational components that allow delivery teams … necessary any and all of the security processes required for operational suitability within WTW for solutions (including SAST and DAST processes) Ensure operational stability, observability, and controlled evolution of AI and agentic systems for the ICT Consultancy business Maintain & support AI tools and AI based systems once deployed and help ...

Principal AI Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

data models, service integrations, and internal tools. Architect systems using modern cloud patterns such as microservices, event‐driven design, and managed services, ensuring reliability, observability, and scalability. Provide architectural leadership across integrations with enterprise systems and third‐party platforms. ML Ops, Reliability & Engineering Best Practices Productionize AI and machine learning … solutions using modern ML Ops and software engineering practices. Establish standards for testing, deployment, observability, drift detection, retraining, and documentation. Drive quality, automation, and performance in systems where accuracy, resilience, and reliability are critical. Leadership, Mentorship & Execution Serve as a hands‐on technical leader and player‐coach, mentoring engineers while ...

Kubernetes Linux AIOps Engineer – Elite Quant Hedge Fund

Hiring Organisation: Winston Fox
Location: City of London, London, United Kingdom

Infrastructure DevOps Engineer/SRE with expertise in Kubernetes, Linux, Observability, IaC and AIOps sought by a market-leading Quantitative Hedge Fund to aid further business growth. Our client is one of the World's Elite Quant Hedge Fund Managers with large-scale, massively Distributed Systems, and ample opportunity … Terraform, C...) Must be able to write high quality Automation/scripts from scratch. Configuration Management Tools (Ansible/Puppet/Kapitan/Terraform...) Observability: Experience within the modern open-source ecosystem (ELK, OpenTelemetry, LGTM stack, Prometheus, Grafana, Loki...) CI/CD and GitLab/GitOps: working with Development teams. ...

Site Reliability Engineer (Kubernetes / Multi-Cloud) UK Based

Hiring Organisation: Jobleads-UK
Location: Hereford, England, United Kingdom

Cluster Autoscaler, KEDA, Karpenter) Help improve workload reliability and performance Support networking, identity, compute, and storage services Assist with maintaining secure and scalable environments Observability & Monitoring Work with Prometheus, Grafana, OpenTelemetry, Azure Monitor, and CloudWatch Build dashboards, alerts, and logging/tracing pipelines Support monitoring aligned to SLIs/SLOs … networking, and scaling Cloud Experience with Azure and/or AWS Familiarity with networking, IAM, and core services Infrastructure as Code Experience with Terraform Observability Familiarity with monitoring/logging tools (Prometheus, Grafana, Loki) Other Technical Skills Helm Charts/Kustomize creation and maintenance Containers (Docker) Exposure to both Azure ...

AI Engineering Enablement Director

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

/ML, software, or platform engineering, with exposure to automated testing and infrastructure‐as‐code or policy‐as‐code. Working knowledge of AI observability (logs, metrics, traces, behavioural signals) and practical methods to evaluate or improve AI system behaviour. Familiarity with AI risk and governance frameworks (e.g., NIST … FinOps, such as cost‐aware model selection, unit economics, or prompt‐efficiency practices. Experience with MLOps or AI delivery tooling, or with AI‐specific observability systems. Participation in industry communities or standards bodies, with the ability to translate external practice into internal adoption. Experience facilitating workshops or engineering enablement events. ...

Sr. Distinguished Machine Learning Engineer (Remote-Eligible)

Hiring Organisation: Capital One
Location: McLean, Virginia, United States
Employment Type: Permanent
Salary: USD Annual

Sr. Distinguished Machine Learning Engineer (Remote-Eligible) Overview: At Capital One, we are creating responsible and reliable AI systems, changing banking for good. For years, Capital One has been an industry leader in using machine ...

Technical Architect

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

Significant experience as a technical or solution architect in complex digital or enterprise environments. Strong software engineering foundation, with practical knowledge of modern application architectures (e.g., microservices, APIs, distributed systems). Proven ability to design ...

OpenTelemetry Architect

Hiring Organisation: Ampstek
Location: London Area, United Kingdom

Summary We are seeking an experienced OpenTelemetry Architect to lead the design and implementation of enterprise observability solutions using OpenTelemetry. The ideal candidate will have strong expertise in observability architecture, telemetry pipelines, distributed tracing, and monitoring platform integrations across cloud and hybrid environments. Key Responsibilities Design and implement enterprise-wide … OpenTelemetry architecture and observability frameworks. Define telemetry standards, governance, and best practices for logs, metrics, traces, and events. Architect scalable OpenTelemetry Collector deployments and telemetry pipelines. Lead integration of OpenTelemetry with monitoring and observability platforms such as Dynatrace, Datadog, Grafana, Splunk, and New Relic. Design telemetry routing, enrichment, filtering ...

Azure SRE Engineer

Hiring Organisation: Oscar Associates (UK) Limited
Location: Glasgow, Lanarkshire, United Kingdom
Employment Type: Permanent
Salary: GBP 575 - 625 Daily

Contract We're looking for two experienced Azure Site Reliability Engineers to join a major Financial Services programme focused on platform health, reliability, and observability across a large-scale Azure environment ...

Infrastructure & Devops Engineer (m/w/d)

Hiring Organisation: iVentureGroup GmbH
Location: Hammerbrook, Hamburg, Germany
Employment Type: Permanent
Salary: EUR Annual

Responsibility for our operational IT (24/7), while at the same time driving modern platform initiatives forward. Whether it's Kubernetes clusters, CI/CD pipelines or observability, you are in your element. ...

GCP SRE for BI Platform — Reliability & Incidents

Hiring Organisation: Jobleads-UK
Location: United Kingdom

experienced Site Reliability Engineer to oversee the health of GCP-hosted APIs and services. This role involves monitoring uptime, leading incident responses, and building observability infrastructures. The ideal candidate has 2+ years in a Site Reliability or DevOps role, practical GCP experience, and a solid grasp of cloud security. Join ...

Lead Platform Engineer – Cloud Native, Kubernetes & Mentorship

Hiring Organisation: Jobleads-UK
Location: United Kingdom

London to manage teams and stakeholders while working with cutting edge technology. This role involves shaping platform strategy, mentoring engineers, and ensuring the observability and reliability of systems. With an annual salary of £80,000 to £100,000, the company promotes professional growth by funding multiple Kubernetes certifications ...