1,001 to 1,025 of 1,201 Permanent Observability Jobs

Staff SRE: Observability, Automation & Global Reliability

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
London. This role focuses on the reliability, scalability, and performance of Replit's infrastructure serving millions of users worldwide. You will work on designing observability solutions, leading incident response, and automating operational tasks while mentoring other engineers. The ideal candidate has extensive experience in Site Reliability Engineering, strong programming skills ...

RVP, EMEA Sales - Observability

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
just to execute a function, but to help redefine the future of how work gets done. Observe by Snowflake brings AI-native observability to the Snowflake AI Data Cloud, helping engineering and data teams debug, optimize, and understand systems operating at massive scale. Traditional observability tools were not built … strong judgment, and the ability to align people, strategy, and execution across functions. WHAT WE LOOK FOR 10+ years of experience selling cloud, infrastructure, observability, data platforms, or enterprise software. 2+ years of experience managing high-performing enterprise sales teams. Experience selling to senior technical and business stakeholders, including CIOs ...

Software Engineer (Prometheus / Grafana)

Hiring Organisation
SRT Marine Systems PLC
Location
Bristol, United Kingdom
Employment Type
Permanent
Salary
£50000 - £75000/annum
Software Engineer (Prometheus/Grafana) here at SRT, you will be part of a small team tasked with implementing an end-user observability visualisation. Currently, we have observability dashboards in place for our engineers, utilising Prometheus for metrics collection and Grafana for visualisation. This initiative aims to deliver a more … across multiple sites. We are fortunate to have a team of highly experienced engineers, including UX designers, who can provide support and guidance. Our lead observability engineer will oversee and assist with your work throughout the project in the role of Software Engineer (Prometheus/Grafana). Key Responsibilities - Software Engineer ...

Software Engineer (Prometheus / Grafana)

Hiring Organisation
SRT Marine Systems PLC
Location
Birmingham, West Midlands (County), United Kingdom
Employment Type
Permanent
Salary
£50000 - £75000/annum
Software Engineer (Prometheus/Grafana) here at SRT, you will be part of a small team tasked with implementing an end-user observability visualisation. Currently, we have observability dashboards in place for our engineers, utilising Prometheus for metrics collection and Grafana for visualisation. This initiative aims to deliver a more … across multiple sites. We are fortunate to have a team of highly experienced engineers, including UX designers, who can provide support and guidance. Our lead observability engineer will oversee and assist with your work throughout the project in the role of Software Engineer (Prometheus/Grafana). Key Responsibilities - Software Engineer ...

Principal Engineer - Customer Engagement Platform

Hiring Organisation
Jobleads-UK
Location
Skipton, England, United Kingdom
Apps, Power Automate and the CRM/engagement ecosystem. You define and embed cross‐cutting standards such as API/event contracts, workflow architecture, observability, resilience patterns, and dependency baselines, and drive adoption of the Golden Path: policy‐as‐code CI/CD, progressive delivery, automated rollback/forward … building on Dynamics 365, Power Platform and workflow automation to move with speed *and* confidence. Through Golden Path pipelines, policy‐as‐code, release‐linked observability, on‐demand environments and shift‐left quality, you turn high‐performance delivery into a normal, repeatable capability that compounds over time. This empowers colleagues, reduces ...

Senior Sales

Hiring Organisation
Harrington Starr
Location
London Area, United Kingdom
Senior Enterprise Sales Financial Markets | Observability & Infrastructure Software London/Hybrid £120k–£140k base x2 OTE uncapped This is a high-growth, PE-backed technology platform used by Tier-1 financial institutions to monitor, analyse and optimise mission-critical trading and infrastructure environments. The software sits deep within low-latency … Infrastructure & SRE teams Production Engineering Trading Technology Capacity & Performance Engineering Enterprise Architecture You will be positioning a platform that sits at the heart of observability, operational resilience and infrastructure intelligence across complex financial ecosystems. Commercial Scope Deals are large, strategic and multi-year: £100k+ minimum entry point £500k typical deal ...

Principal Cloud Engineer - Azure - Hybrid - Manchester

Hiring Organisation
Jobleads-UK
Location
Manchester, England, United Kingdom
Implement governance, policy, and identity standards (Entra ID) Develop core platform capabilities, including: API Management (APIM) and Web Application Firewall (WAF) Logging, monitoring, and observability Introduce and scale Infrastructure as Code (Terraform) across the environment Contribute to the design and implementation of business continuity and disaster recovery strategies Support … working with: Azure landing zones and governance frameworks Infrastructure as Code (Terraform preferred) Identity and access management (Entra ID/Azure AD) Monitoring and observability tooling Experience working in environments undergoing cloud transformation Ability to operate across engineering and architecture, with a focus on practical implementation Strong communication skills ...

Principal Cloud Engineer - Azure - Hybrid - Manchester

Hiring Organisation
Experis
Location
Manchester, United Kingdom
Employment Type
Permanent
Salary
£78000/annum + Excellent Bens
Implement governance, policy, and identity standards (Entra ID) Develop core platform capabilities, including: API Management (APIM) and Web Application Firewall (WAF) Logging, monitoring, and observability Introduce and scale Infrastructure as Code (Terraform) across the environment Contribute to the design and implementation of business continuity and disaster recovery strategies Support … working with: Azure landing zones and governance frameworks Infrastructure as Code (Terraform preferred) Identity and access management (Entra ID/Azure AD) Monitoring and observability tooling Experience working in environments undergoing cloud transformation Ability to operate across engineering and architecture, with a focus on practical implementation Strong communication skills ...

Senior Backend / Full-Stack Engineer (E5/E6 Level) – AI-Native Startup – Strong Comp + Equity

Hiring Organisation
Mondrian Alpha
Location
London Area, United Kingdom
preferred • Experience with modern frontend frameworks (React/Next.js) is a plus for full-stack candidates • Strong understanding of system design, reliability, scalability, and observability • Experience in startups or fast-paced product environments is highly desirable • AI-native mindset — comfortable leveraging AI tooling and rapid iteration workflows • Strong communication skills … React, Next.js • AI-native workflows and internal LLM tooling • Distributed systems and real-time infrastructure • OpenSearch, SingleStore, Trigger.dev, Axiom • Modern cloud-native infrastructure and observability stack What they offer: • Excellent compensation + meaningful equity • High-ownership environment with direct impact on product and architecture • Small, elite engineering team • Direct collaboration ...

AI Engineering Director

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Build APIs, integrations, MCP Servers, and reusable platform capabilities to connect AI systems with enterprise platforms, tools, and workflows. Establish evaluation, experimentation, regression, and observability frameworks to continuously improve AI system quality, reliability, and agent behavior. Mentor senior engineers and influence engineering direction through code reviews, architecture discussions, technical standards … makers with compelling technical arguments. Preferred Qualifications, Capabilities, and Skills Experience with enterprise-scale AI platform development. Knowledge of industry-standard AI evaluation and observability frameworks. Expertise in cloud-native architectures and container orchestration. Proven track record of cross-functional collaboration and leadership. Familiarity with MCP protocols and enterprise integration ...

Lead Product Engineer

Hiring Organisation
Albert Bow
Location
London Area, United Kingdom
production LLM systems, including evals, retrieval, agent orchestration, prompts and tool use • Setting the standard for engineering quality, testing, code review, deployment and observability • Working directly with customers to understand problems and shape solutions • Mentoring engineers through code reviews, technical discussions and hands-on collaboration • Making pragmatic architectural decisions around … Fluency in Python or a closely related backend language • Experience designing systems with genuine complexity, including queues, async workflows, durable state and production-grade observability • Hands-on experience building and operating LLM-based products in production • Strong experience with evals, agentic systems, RAG, prompt design and tool use • A proven ...

Full Stack Engineer

Hiring Organisation
Techmunity | AI Startup Recruitment
Location
London Area, United Kingdom
pattern (RAG, agents, structured outputs, classification) balancing accuracy, latency, cost and data privacy. Making AI features production-grade. Own the eval harnesses, prompt versioning, observability, cost controls and guardrails that separate a demo from a product. Pull the context features need. Query the data warehouse for what each feature depends … practical LLM toolkit: RAG, tool use and agents, structured outputs, prompt engineering Know how to make AI features production-grade: eval harnesses, prompt versioning, observability, cost and latency controls, guardrails Be comfortable with SQL to pull what you need from the data warehouse Come from a maths, physics, computer science ...

Senior Software Engineer, Banking Connectivity London, UK

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
that high-integrity financial data is correctly distributed across internal systems. Your work will focus on scaling integrations while improving the system’s resilience, observability, and overall structure. You will play a key role in evolving the platform to support new banking partners, products, and regulatory requirements while addressing technical … real‐world banking constraints Collaborate with product, operations, and external partners to unblock integrations and accelerate delivery Improve system quality through pragmatic enhancements in observability, testing, and resilience. This is a high‐impact role. What You'll Bring Experience building and supporting reliable backend systems with external integrations (APIs, webhooks ...

Senior Software Engineer / Reliability Engineering - Real-time Data

Hiring Organisation
Jobleads-UK
Location
City Of London, England, United Kingdom
Build and maintain production-grade software supporting Bloomberg’s global distribution infrastructure Design and implement scalable, fault-tolerant systems with a focus on observability, performance, and automation Analyse system behaviour under real-world and failure scenarios to validate capacity, failover, and recovery meet resilience objectives Identify bottlenecks, scaling limits … Work With Configuration systems serving thousands of servers across the global network Service discovery and clustering systems for distributed infrastructure Monitoring and observability frameworks for large-scale server estates Tooling for diagnosing data quality and distribution issues Ownership of systems may evolve over time as the team focuses on areas ...

Group Head of Engineering

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
club platforms, aligned to transformation priorities Set clear architectural direction and embed modern engineering standards (cloud-first, CI/CD, automated testing, observability, secure SDLC) Own end‐to‐end delivery outcomes, ensuring valuable increments are shipped frequently, safely, and predictably Drive operational excellence across reliability, resilience, performance, and security Establish … continuous improvement Experience Senior engineering leader with strong hands‐on technical credentials. Deep experience across cloud-first architectures, distributed systems, CI/CD, observability, and secure SDLC. Experience delivering AI-enabled capabilities into production environments. Proven track record of improving reliability and leading incident response and prevention. Experience scaling engineering ...

Senior Director, Master Data Management

Hiring Organisation
Jobleads-UK
Location
Northampton, England, United Kingdom
manage the MDM product/platform team (product, engineering, data quality, metadata/lineage). Implement DataOps for MDM (CI/CD, automated testing, observability, change control, incident/problem management). Deliver golden record services (match/merge/survivorship, hierarchy management) and reference data services. Define integration architecture … merge/survivorship, hierarchy & reference data management, quality management, metadata & lineage. Hands‐on familiarity with DataOps (CI/CD for data, automated data testing, observability), microservices, and event streaming patterns (e.g., CDC, pub/sub). Experience with enterprise data catalogs, lineage tooling, and at least one MDM platform (commercial ...

Automation Engineer

Hiring Organisation
RealityMine
Location
Trafford Park, Greater Manchester, UK
test automation frameworks (including our AI-assisted tools), scripting (e.g. Python and JavaScript), CI/CD tooling and our internal observability tools to design and execute automated test suites, manage device infrastructure, and provide fast, reliable feedback to product and engineering teams. Our offices are in Trafford Park, Manchester … managing or using a device farm solution (e.g. AWS Device Farm, Firebase Test Lab, BrowserStack, Sauce Labs, or an internal farm). · Familiarity with observability and monitoring for test and device infrastructure (logs, metrics, dashboards, alerts). · Knowledge of mobile platform internals (Android/iOS), SDK integration testing, or backend ...

Automation Engineer

Hiring Organisation
RealityMine
Location
Trafford Park, England, United Kingdom
test automation frameworks (including our AI-assisted tools), scripting (e.g. Python and JavaScript), CI/CD tooling and our internal observability tools to design and execute automated test suites, manage device infrastructure, and provide fast, reliable feedback to product and engineering teams. Our offices are in Trafford Park, Manchester … managing or using a device farm solution (e.g. AWS Device Farm, Firebase Test Lab, BrowserStack, Sauce Labs, or an internal farm). · Familiarity with observability and monitoring for test and device infrastructure (logs, metrics, dashboards, alerts). · Knowledge of mobile platform internals (Android/iOS), SDK integration testing, or backend ...

Senior Software Engineer / Reliability Engineering - Real-time Data

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Build and maintain production-grade software supporting Bloomberg’s global distribution infrastructure Design and implement scalable, fault-tolerant systems with a focus on observability, performance, and automation Analyse system behaviour under real-world and failure scenarios to validate capacity, failover, and recovery meet resilience objectives Identify bottlenecks, scaling limits … Work With Configuration systems serving thousands of servers across the global network Service discovery and clustering systems for distributed infrastructure Monitoring and observability frameworks for large-scale server estates Tooling for diagnosing data quality and distribution issues Ownership of systems may evolve over time as the team focuses on areas ...

Principal Machine Learning Engineer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Anaplan's platform and third-party integrations Optimise model inference pipelines for performance, cost, and scalability in production environments Implement monitoring, logging, and observability for GenAI systems to track usage, errors, and model behaviour Collaborate with data scientists to productionise ML models and forecasting algorithms Your Skills Extensive hands … Experience with A/B testing and experimentation frameworks for AI features Contributions to open-source ML projects or research publications Experience with model observability tools (LangSmith, W&B, MLflow) DEIB Our Commitment to Diversity, Equity, Inclusion and Belonging (DEIB) We believe attracting and retaining the best talent and fostering ...

EMEA VP of AI-Observability Sales

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Snowflake is seeking a Sales Leader for the EMEA region to build and lead a high-performing sales team focused on AI-driven observability solutions. The ideal candidate will have over 10 years of experience in cloud and enterprise software sales, with a track record of managing successful sales teams. … This role offers a unique opportunity to shape the future of data observability in a fast-growing environment. A BA/BS degree is required, alongside strong leadership and coaching skills. ...

Data Engineer

Hiring Organisation
HCLTech
Location
London Area, United Kingdom
HCLTech is a global technology company, home to more than 220,000 people across 60 countries, delivering industry-leading capabilities centered around digital, engineering, cloud and AI, powered by a broad portfolio of technology services ...

Principal AI Engineer - UK (Remote)

Hiring Organisation
NST Recruitment Ltd
Location
United Kingdom
Employment Type
Permanent, Work From Home
Principal AI Engineer Generative AI, LLMs, Python, CI/CD, SaaS/PaaS, Prompt Engineering, Agentic Workflows, Platform Systems, Remote (UK) Up to £200,000 + Equity + Benefits This is a fantastic Principal AI ...

Senior Software Engineer

Hiring Organisation
In Product
Location
City of London, London, United Kingdom
Tech Lead – London, Hybrid (2 days/week) – £100,000-£120,000 plus Benefits – High Growth Startup We’re partnering with a fast-growing healthtech company on a mission to transform primary care. Their platform ...

SRE Lead: Automation, Observability & Reliability

Hiring Organisation
Jobleads-UK
Location
Bromley, England, United Kingdom
Huxley is seeking an experienced SRE Lead to oversee SRE strategy within an investment banking environment. The role focuses on driving automation, improving observability, and enhancing reliability by design. Ideal candidates will possess over 8 years of SRE experience, particularly in resilience engineering, and demonstrable skills in scaling operations. This ...