301 to 325 of 493 Observability Jobs in England

AI Engineer

Hiring Organisation
Morgan McKinley
Location
Yorkshire and Humberside, England, United Kingdom
Employment Type
Full-Time
Salary
Salary negotiable
with Generative and Agentic AI patterns, including LLM integration, RAG architectures, prompt-driven workflows, and AI service orchestration. Integrate AI capabilities with enterprise systems, observability tooling, and security frameworks. Design and maintain CI/CD pipelines within cloud-native engineering environments. Support benchmarking, evaluation, experimentation, and cost optimisation … Skills Experience with Kong API Gateway, Kong Mesh, and Flux CD. RESTful API and microservices development. Terraform and GitOps workflows. Exposure to prompt evaluation, observability, or AI red-teaming tools. SQL and NoSQL database experience. Understanding of vector search technologies and Retrieval-Augmented Generation (RAG) patterns. About You A proactive ...

Lead AI Engineer

Hiring Organisation
Morgan McKinley
Location
Yorkshire and Humberside, England, United Kingdom
Employment Type
Full-Time
Salary
Salary negotiable
with Generative and Agentic AI patterns, including LLM integration, RAG architectures, prompt-driven workflows, and AI service orchestration. Integrate AI capabilities with enterprise systems, observability tooling, and security frameworks. Design and maintain CI/CD pipelines within cloud-native engineering environments. Support benchmarking, evaluation, experimentation, and cost optimisation … Skills Experience with Kong API Gateway, Kong Mesh, and Flux CD. RESTful API and microservices development. Terraform and GitOps workflows. Exposure to prompt evaluation, observability, or AI red-teaming tools. SQL and NoSQL database experience. Understanding of vector search technologies and Retrieval-Augmented Generation (RAG) patterns. About You A proactive ...

Senior Developer

Hiring Organisation
Addition
Location
Watford, Hertfordshire, England, United Kingdom
Employment Type
Full-Time
Salary
£80,000 per annum
Doing: Designing, deploying and managing automation and monitoring platforms that support large-scale applications and services Building and maintaining monitoring, alerting and observability tooling across the platform Creating dashboards that translate complex technical data into meaningful insights for stakeholders Developing automation to integrate new systems using existing frameworks Managing … Docker) Strong Python development skills, including scripting and Lambda functions Experience building and managing CI/CD pipelines, ideally with GitHub Actions Monitoring and observability tooling such as AppDynamics, Grafana, InfluxDB, Graphite, Sensu or similar Experience working with serverless architectures (Lambda, API Gateway, DynamoDB, EventBridge) Solid understanding of Linux/ ...

Site Reliability Engineer Newcastle upon Tyne, England, GB Posted 13 hours ago

Hiring Organisation
Jobleads-UK
Location
Newcastle upon Tyne, England, United Kingdom
service and infrastructure. Key Responsibilities: Develop and maintain infrastructure as code (IaC) using Terraform to ensure reliable and scalable cloud environments; Implement and enhance observability solutions using tools like New Relic, DataDog, Sumologic and Splunk for monitoring, logging, and alerting; Perform code deployments and manage CI/CD pipelines using … like Prometheus, Grafana, New Relic, DataDog, Splunk, Cloudwatch, Sumologic etc. Strong understanding of networking and security concepts. Additional experience preferred in: SRE observability experience with NewRelic or Datadog; OpenTelemetry; AIOps/MLOps; SecOps. How to Apply: Please submit an online application for this position by clicking ...

Principal Software Development Engineer

Hiring Organisation
Jobleads-UK
Location
Manchester, England, United Kingdom
/CD pipelines, Infrastructure as Code, automation frameworks, and database-as-code practices using Redgate Flyway. Take ownership of critical customer systems, ensuring operational resilience, observability, performance optimisation, and rapid incident response. Collaborate closely with Product, Delivery, Operations, and Commercial teams to shape technical solutions, delivery plans, and strategic outcomes. Promote secure … Connect or Genesys Cloud. Proven ability to design and deliver secure, scalable, and resilient cloud-native solutions within complex enterprise environments. Strong understanding of observability, operational support, reliability engineering, and end-to-end ownership practices. Knowledge of regulated financial services environments, including UK GDPR and FCA Consumer Duty requirements. Excellent communication and stakeholder management ...

Senior Product Manager, FS Resilience & Market Data

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
ITRS is looking for a Senior Product Manager based in London to lead in delivering critical IT observability solutions. The role involves defining product strategy and engaging with Tier 1 financial institution customers to ensure the roadmap aligns with real needs. You will work on key projects including financial trading ...

Golang Backend Architect — Distributed Systems Leader

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Golang and experience in cloud architectures, CI/CD practices, and modern database management. You will contribute to enhancing data privacy and implementing observability tools. This position offers a unique opportunity to work at the intersection of technology and finance. ...

Operational Systems Support Consultant (Monitoring & Tooling)

Hiring Organisation
F5 consultants
Location
Bristol, Avon, South West, United Kingdom
Employment Type
Permanent
Salary
£70,000
event management, monitoring optimisation, and tooling transformation (moving away from legacy platforms like TrueSight). Ideal for someone currently working as a: Monitoring Engineer, Observability Engineer, Tooling Consultant, NOC Engineer, or Systems Engineer (with monitoring experience). What You'll Do Improve event management & alerting (reduce noise, increase value) Support ...

Platform Engineer Microsoft Fabric Azure

Hiring Organisation
Client Server
Location
City of London, London, United Kingdom
Employment Type
Permanent, Work From Home
integrations Monitor cloud services, troubleshoot incidents and drive platform reliability and performance Develop dashboards and reporting solutions using Power BI and Fabric Improve automation, observability, scalability and operational efficiency across the environment Location: You'll join the team in the London office, with flexibility to work from home once ...

Networking Specialist

Hiring Organisation
Ncounter
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£180,000 - £200,000 per annum
essential, alongside confidence working with modern data centre technologies. Nice to Haves: • Experience with automation using Python, Ansible, or similar tools • Exposure to observability and monitoring platforms • Understanding of network security and secure routing design • Hands-on experience with Arista and/or Cisco in production environments • Industry certifications such ...

Head of Data & AI Platforms & Engineering

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
OLAs across data platforms and services. Ensure compliance with security, privacy and regulatory requirements across all data platforms. Implement frameworks for data quality, observability, lineage and metadata in partnership with governance teams. Oversee operational monitoring, incident management and continuous optimisation of platform services. Build and lead high performing data engineering ...

Bid Solution Architect - LONDON - PART TIME

Hiring Organisation
Reed
Location
Southwark, London, England, United Kingdom
Employment Type
Temporary
Salary
Salary negotiable
ready. Assure the full scheduling system architecture, focusing on performance and resilience. Validate integration assumptions, API patterns, data flows, and control mechanisms. Ensure system observability, failover, and peak-load behavior are credible and evidenced. Design or validate security controls across application, infrastructure, and operations. Ensure alignment of IAM, encryption, logging ...

Principal Software Architect

Hiring Organisation
Jobleads-UK
Location
Bristol, England, United Kingdom
/software ecosystem. Assess the architectural impact of new technologies. Be aware of the usability, performance, reliability, maintainability, testability, security and observability constraints on the software architecture. Prototyping and validating architectural concepts through proof-of-concept implementations. Contribute to future and/or related product definitions with a forward-looking ...

Engineering Manager - Platform Reliability

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Lakebase Platform Reliability team’s footprint spans multiple stacks, systems, and stakeholders. They include AI‐powered tooling and workflows for customer management, real‐time observability during incidents, monitoring and auditing systems that underpin compliance requirements, and customer‐facing operational APIs and maintenance workflows. You’ll contribute to the wider platform ...

Head of Data & AI Platforms & Engineering

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
OLAs across data platforms and services. Ensure compliance with security, privacy and regulatory requirements across all data platforms. Implement frameworks for data quality, observability, lineage and metadata in partnership with governance teams. Oversee operational monitoring, incident management and continuous optimisation of platform services. Build and lead high‐performing data engineering ...

Enterprise Solutions Architect

Hiring Organisation
Jobleads-UK
Location
Oxford, England, United Kingdom
Success Factors Strong integration design capability: Domain Driven Design, event‐based integration, API design principles, resilience patterns, operational considerations (SLAs, observability, incident readiness) Excellent stakeholder management and communication: can influence at exec level, simplify complex trade‐offs, and align diverse teams behind common patterns and outcomes Desirable attributes: TOGAF certification ...

Regional Vice President

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
advantage of all structured and unstructured data — securing and protecting private information more effectively — Elastic’s complete, cloud-based solutions for search, security, and observability help organizations deliver on the promise of AI. What is The Role: Elastic, the Search AI company, is looking for a high-energy Regional Vice ...

DV Cleared Cloud Engineer - Contract

Hiring Organisation
Experis
Location
South West, United Kingdom
Employment Type
Contract
Contract Rate
£525 - £550/day
support critical, high-availability systems within secure government environments. This is an exciting opportunity to work across cloud and on-prem infrastructure, improving reliability, observability, automation, and delivery pipelines. Key Responsibilities Improve system reliability, performance, and scalability Collaborate with development and support teams Enhance monitoring, observability, and alerting capabilities Automate … relational databases Messaging technologies such as RabbitMQ Desirable Skills Java, Go, or Python development experience Azure experience Service management environment experience Knowledge of observability best practices and availability metrics Experience with secure or cross-domain environments If you receive suspicious outreach claiming to be from us, please contact ...

Platform Engineer

Hiring Organisation
UA Consulting
Location
City of London, London, United Kingdom
Employment Type
Contract
Contract Rate
From £300 to £400 per day
Platform Engineer with strong site reliability principles to join our Platform team. You'll focus on maintaining and improving production reliability, automating operational tasks, and enhancing our observability stack. You'll work closely with SREs, support engineers, release managers, and incident managers to ensure our systems meet SLIs, SLOs, and SLA targets. Key Responsibilities Maintain and optimise production environments … production workloads (EKS, EC2, RDS/Aurora, S3, IAM). Infrastructure as Code with Terraform and configuration management with Ansible. Strong experience with observability tools (Grafana, Prometheus, Loki, Tempo). Understanding of SRE concepts (SLIs, SLOs, error budgets, capacity planning). Comfortable working in incident and problem management processes. Strong ...

Platform Engineer

Hiring Organisation
UA Consulting
Location
City of London, London, United Kingdom
Employment Type
Permanent
Salary
£75,000
Platform Engineer with strong site reliability principles to join our Platform team. You'll focus on maintaining and improving production reliability, automating operational tasks, and enhancing our observability stack. You'll work closely with SREs, support engineers, release managers, and incident managers to ensure our systems meet SLIs, SLOs, and SLA targets. Key Responsibilities Maintain and optimise production environments … production workloads (EKS, EC2, RDS/Aurora, S3, IAM). Infrastructure as Code with Terraform and configuration management with Ansible. Strong experience with observability tools (Grafana, Prometheus, Loki, Tempo). Understanding of SRE concepts (SLIs, SLOs, error budgets, capacity planning). Comfortable working in incident and problem management processes. Strong ...

AI Platform/ DevOps Engineer

Hiring Organisation
The Portfolio Group
Location
City of London, London, Castle Baynard, United Kingdom
Employment Type
Permanent
Salary
£70,000 - £80,000/annum + Benefits
Bedrock Knowledge Bases) and embedding pipelines Build and maintain CI/CD pipelines for inference services, retrievers, ingestion workflows, and RAG components Implement observability across AI workloads using CloudWatch, MLflow, and OpenTelemetry - covering latency, throughput, cost, and system health Apply secure-by-design principles including IAM, encryption, network controls … Terraform experience for infrastructure-as-code, provisioning and managing cloud infrastructure at scale Experience operating containerised services, managing CI/CD pipelines, and owning observability and reliability Familiarity with vector databases or search infrastructure (OpenSearch, Algolia) is a strong advantage Python proficiency for scripting, automation, and deploying production services Solid ...

Go Full Stack Developer

Hiring Organisation
itecopeople
Location
London, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£60,000
event-driven services Contribute to CI/CD pipelines and cloud-native deployments Review code and champion engineering best practices Improve application performance, observability and reliability Collaborate within Agile delivery teams across multiple projects Support technical decision-making and continuous improvement Skills & Experience We are looking for candidates with strong … reviews, testing and engineering governance Experience with any of the following would be highly advantageous: Microsoft Azure Python GitOps tooling (Argo CD/Flux) Observability tooling (Prometheus, Grafana, OpenTelemetry) AI/LLM-enabled applications Event-driven architectures and messaging platforms What's on Offer Opportunity to work on cutting-edge ...

Site Reliability Engineer

Hiring Organisation
Pertemps London
Location
London, United Kingdom
Employment Type
Permanent
Salary
GBP 50,000 Annual
Jenkins, GitLab CI) Develop and maintain Terraform modules for infrastructure-as-code Build automation tools (CLI tools, scripts, GitHub Apps, self-service tooling) Own observability: dashboards, alerts, monitoring, and runbooks Continuously improve platform processes and reduce operational toil What We're Looking For Essential Skills & Experience 2-3 years … GitHub Actions, GitLab CI, Jenkins) Ability to write production-quality code in Python or Bash Solid networking fundamentals (DNS, load balancers, CDNs) Experience with observability tools (NewRelic, Datadog, Prometheus, Grafana) Comfortable participating in on-call rotations Experience using AI tools (e.g. ChatGPT, Copilot, Cursor) to enhance productivity Desirable Go, Ansible ...

AKS DevOps Engineer - Azure Kubernetes

Hiring Organisation
Reed
Location
London Gatwick Airport, Gatwick, West Sussex, England, United Kingdom
Employment Type
Full-Time
Salary
£70,000 per annum, Inc benefits
/CD pipelines using Azure DevOps with YAML. Implement and maintain secure networking patterns and apply cloud security best practices. Create and maintain platform observability using Azure Monitor, Analytics, and Application Insights. Collaborate with engineering teams to ensure service reliability on the platform. Promote best practice in cloud engineering … private endpoints, load balancing, etc. Scripting proficiency in Bash, PowerShell, or Python. Linux operating system knowledge and troubleshooting capability. Experience implementing monitoring, logging, and observability solutions in Azure. Ability to communicate platform issues such as risk, platform health, and cost to non-technical audiences. Desirable Skills: Experience contributing to architecture ...

ML Infrastructure Lead

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
versioning, reproducibility, experimentation, feature management and release management Own and improve the production environment for machine learning systems, ensuring strong standards for availability, performance, observability and resilience Define and implement monitoring across model and platform layers, including system health, data quality, drift, latency, throughput and cost efficiency Build or optimise … pipelines, infrastructure-as-code and workflow orchestration Experience with tools such as Airflow or similar platform and orchestration technologies Good understanding of model observability, data quality, feature pipelines, lineage and reproducibility Experience designing scalable infrastructure for ML workloads, including training, batch inference and real-time serving Strong appreciation of reliability ...