151 to 175 of 411 Permanent Observability Jobs

Cloud Application Developer

Hiring Organisation
Sanderson Recruitment
Location
Gloucestershire, South West, United Kingdom
Employment Type
Permanent
with web application services. Familiarity with Test-Driven Development and Behaviour-Driven Development. Experience in API design, development, and integration. Knowledge of monitoring and observability systems. Understanding of modern digital delivery processes and agile methodologies. Working knowledge of the Atlassian toolset. Experience with JavaScript front-end frameworks and core front ...

LLM Architect

Hiring Organisation
Bright Purple
Location
Edinburgh, City of Edinburgh, United Kingdom
Employment Type
Permanent
with GPU cluster management (CUDA, NCCL, Triton Inference Server) and performance tuning across accelerators. Solid grasp of cloud-native orchestration (Docker, Kubernetes, Helm) and observability tooling (Prometheus, Grafana, Jaeger). Proven ability to translate cutting-edge research into engineered solutions that can scale globally. Why this role stands out Influence ...

Lead Software Engineer

Hiring Organisation
Hyre AI Limited
Location
London, United Kingdom
Employment Type
Permanent
Salary
£85000 - £100000/annum
Microservices and distributed system design Experience with AI/LLM frameworks (e.g. LangChain, OpenAI API, Vertex, LiteLLM) GCP experience and DevOps best practices (monitoring, observability, CI/CD) PostgreSQL and relational data modelling Event-driven systems (Pub/Sub, Kafka) and NoSQL (e.g. MongoDB) Experience building and running production SaaS ...

Embedded Systems Reliability Engineer C

Hiring Organisation
NMS Recruit Ltd
Location
Chester, Cheshire, United Kingdom
Employment Type
Permanent
Salary
£55000 - £60000/annum Hybrid, Life Insurance, Income Prote
Familiarity with MQTT and messaging protocols used in distributed systems Experience with Qt and GUI development for Windows and Linux environments Working knowledge of observability concepts, incident response and long-term reliability strategies Exposure to hardware-in-the-loop (HIL) testing and embedded diagnostics Benefits ...

Embedded Systems Reliability Engineer C++

Hiring Organisation
Russell Taylor Group Ltd
Location
Chester, Cheshire, North West, United Kingdom
Employment Type
Permanent
Salary
£60,000
Familiarity with MQTT and messaging protocols used in distributed systems Experience with Qt and GUI development for Windows and Linux environments Working knowledge of observability concepts, incident response and long-term reliability strategies Exposure to hardware-in-the-loop (HIL) testing and embedded diagnostics Benefits ...

Senior Backend Developer (.NET | AI-First SaaS Platform)

Hiring Organisation
Keepnet
Location
United Kingdom
/CD (Azure), and post-release monitoring. Work on distributed and asynchronous systems using message queues, background workers, and event-driven workflows. Use observability signals (logging, metrics, tracing, tools like Sentry) to proactively detect, diagnose, and prevent production issues. Collaborate closely with frontend, product, and customer-facing teams to deliver ...

AI Solutions Manager

Hiring Organisation
Durlston Partners
Location
Greater London, England, United Kingdom
quantitative models, enterprise data pipelines, and real-time reasoning systems. Evolve team structures and capacity models to optimize delivery. Own safety, security, compliance, and observability of AI systems operating in production investment environments. Maintain deep awareness of emerging AI technologies, agentic patterns, and best practices. How Success Is Measured Strength ...

Lead Software Engineer

Hiring Organisation
We Are Dcoded Limited
Location
Manchester, North West, United Kingdom
Employment Type
Permanent
Salary
£95,000
Contributing to architectural decisions while remaining grounded in delivery Helping your squad plan effectively using a now/next/later mindset Championing reliability, observability, and supportability. You build it, you help run it Improving automation, CI/CD pipelines, and engineering standards incrementally Supporting and mentoring others through pairing ...

Lead Test Analyst

Hiring Organisation
Horwich Farrelly
Location
Salford, Lancashire, England, United Kingdom
Employment Type
Full-Time
Salary
Competitive salary
enable fast, reliable testing at scale. Coordinate cross-team dependencies, test data, and environment scheduling for parallel initiatives Champion shift-left/shift-right, observability and risk-based release decisions. Oversee UAT, regression and sign-off on release quality, with clear quality metrics and executive reporting; confident go/ ...

Senior Software Engineer (SatOS Team)

Hiring Organisation
Spire
Location
Glasgow, Scotland, United Kingdom
Contribute to the continuous improvement of our development processes and tools Perform ground-based testing and in-orbit verification of new software services Implement observability solutions for satellite-side services Work with our customers to translate their requirements into effective software solutions Key Skills: 5+ years experience in professional software ...

Technical Lead

Hiring Organisation
Perch Group
Location
North West, England, United Kingdom
development using Azure Data Factory Exposure to Databricks, Synapse or Spark Experience working within event-driven architectures Understanding of DevOps, IaC (Terraform/Bicep), Observability The Application Timeline A first stage video call with the internal talent acquisition team (15 minutes) A second stage Teams interview with the hiring manager ...

Senior Software Engineer

Hiring Organisation
Perch Group
Location
North West, England, United Kingdom
Factory Familiarity with Databricks, Synapse, or Spark Experience working within event-driven architectures Understanding of DevOps principles, Infrastructure as Code (Terraform/Bicep), and observability best practices The Application Timeline A first stage phone call with the internal talent acquisition team (15 minutes) A second stage competency Teams call interview ...

Lead AI Engineer (FM Hosting, LLM Inference)

Hiring Organisation
Capital One
Location
New York, United States
Employment Type
Permanent
Salary
USD Annual
develop, test, deploy, and support AI software components including foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, and observability, etc. Leverage a broad stack of Open Source and SaaS AI technologies such as AWS Ultraclusters, Huggingface, VectorDBs, Nemo Guardrails, PyTorch, and more. Invent ...

Lead AI Engineer (FM Hosting, LLM Inference)

Hiring Organisation
Capital One
Location
Cambridge, Massachusetts, United States
Employment Type
Permanent
Salary
USD Annual
develop, test, deploy, and support AI software components including foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, and observability, etc. Leverage a broad stack of Open Source and SaaS AI technologies such as AWS Ultraclusters, Huggingface, VectorDBs, Nemo Guardrails, PyTorch, and more. Invent ...

Lead AI Engineer (FM Hosting, LLM Inference)

Hiring Organisation
Capital One
Location
Charlottesville, Virginia, United States
Employment Type
Permanent
Salary
USD Annual
develop, test, deploy, and support AI software components including foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, and observability, etc. Leverage a broad stack of Open Source and SaaS AI technologies such as AWS Ultraclusters, Huggingface, VectorDBs, Nemo Guardrails, PyTorch, and more. Invent ...

Lead AI Engineer (FM Hosting, LLM Inference)

Hiring Organisation
Capital One
Location
Baltimore, Maryland, United States
Employment Type
Permanent
Salary
USD Annual
develop, test, deploy, and support AI software components including foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, and observability, etc. Leverage a broad stack of Open Source and SaaS AI technologies such as AWS Ultraclusters, Huggingface, VectorDBs, Nemo Guardrails, PyTorch, and more. Invent ...

Lead AI Engineer (FM Hosting, LLM Inference)

Hiring Organisation
Capital One
Location
Washington, Washington DC, United States
Employment Type
Permanent
Salary
USD Annual
develop, test, deploy, and support AI software components including foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, and observability, etc. Leverage a broad stack of Open Source and SaaS AI technologies such as AWS Ultraclusters, Huggingface, VectorDBs, Nemo Guardrails, PyTorch, and more. Invent ...

Lead AI Engineer (FM Hosting, LLM Inference)

Hiring Organisation
Capital One
Location
Dover, Delaware, United States
Employment Type
Permanent
Salary
USD Annual
develop, test, deploy, and support AI software components including foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, and observability, etc. Leverage a broad stack of Open Source and SaaS AI technologies such as AWS Ultraclusters, Huggingface, VectorDBs, Nemo Guardrails, PyTorch, and more. Invent ...

AI Engineer

Hiring Organisation
Champion Data
Location
London Area, United Kingdom
strong bias toward practical usability and maintainability. Applied AI Integration - Apply modern AI capabilities pragmatically to real engineering and operational problems, focusing on reliability, observability, and safe deployment rather than novelty alone. Innovation Handover - Work closely with core engineering teams to transition successful prototypes into production-ready solutions, including documentation ...

Lead AI Engineer (FM Hosting, LLM Inference)

Hiring Organisation
Capital One
Location
McLean, Virginia, United States
Employment Type
Permanent
Salary
USD Annual
develop, test, deploy, and support AI software components including foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, and observability, etc. Leverage a broad stack of Open Source and SaaS AI technologies such as AWS Ultraclusters, Huggingface, VectorDBs, Nemo Guardrails, PyTorch, and more. Invent ...

Senior Software Engineer (Dublin, Hybrid)

Hiring Organisation
G Treasury SS, LLC
Location
Dublin, Ireland
Employment Type
Permanent
Salary
EUR 125,000 - 150,000 Annual
flags, ensuring seamless integration and deployment Conduct rigorous unit, integration, and non-functional (performance, security) testing to guarantee our software is production-ready Leverage observability tools and logging to troubleshoot and resolve issues across development, test, and production environments Share your enthusiasm for tech trends, explore and learn new technologies ...

Site Reliability Engineer / SRE / Systems Engineer

Hiring Organisation
AWD Online
Location
Manchester, North West, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£70,000
effective incident management across live environments. This Site Reliability Engineer/Systems Engineer role offers the chance to work with modern cloud technologies, containerisation, observability tools and automation practices, while influencing long-term reliability improvements across business-critical systems. APPLY TODAY Ready to make your next career move? Apply … live production issues through to resolution or handover System Monitoring and Availability: Maintaining high availability, performance and scalability of production platforms and services Observability Implementation: Managing logging, monitoring, alerting and metrics to proactively identify and resolve issues Reliability Improvements: Collaborating with development teams to translate operational insights into long-term ...

Senior Site Reliability Engineer - AI Platform

Hiring Organisation
N26 GmbH
Location
Berlin, Germany
Employment Type
Permanent
Salary
EUR Annual
team's strategy, roadmap, and architecture. Drive incident management and troubleshooting efforts, ensuring a stable and predictable AI development and deployment environment. Improve observability and monitoring, ensuring the AI Platform meets performance and compliance requirements. What you need to be successful Background and skills: Strong hands-on experience in designing … security best practices in cloud environments. Hands-on experience with CI/CD pipelines (GitHub Actions, ArgoCD, Jenkins, or similar). Familiarity with observability tools (Datadog, Prometheus, Grafana, OpenTelemetry). Nice to have: Experience in AI/ML production systems and the unique challenges of scaling AI workloads. Experience ...

Senior Site Reliability Engineer

Hiring Organisation
Alexander Ash Consulting
Location
Scotland, United Kingdom
drive SRE strategy, standards, and maturity across complex platforms. Design, build, and operate resilient, scalable, and secure infrastructure. Lead reliability engineering initiatives, including automation, observability, and incident management. Provide senior technical leadership during major incidents and drive long-term remediation. Use data, metrics, and SLOs to drive continuous improvement ...

Senior ML Infrastructure Engineer

Hiring Organisation
Harnham
Location
England, United Kingdom
large, heterogeneous datasets • Scale public-facing data infrastructure supporting ML research • Optimise distributed AI workloads for latency, throughput, reliability, and GPU utilisation • Build observability tooling for data quality, pipeline health, and experiments • Support GPU infrastructure for large-scale model training • Translate research prototypes into robust, production systems • Scope and supervise ...