776 to 800 of 1,228 Permanent Observability Jobs

RVP Europe Sales

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

improve quality, efficiency, security, and profitability. Our software combines application intelligence, experience visibility, contextual insights, and real-time control to help customers elevate observability and do more with the networks they already run. As AI reshapes how the world works, connects, and communicates, AppLogic Networks helps ensure modern applications ...

Principal Engineer (Post-Purchase)

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

Tulip apps), fulfilment, shipping and CS tooling. Scale, reliability and delivery: Lead cross‐team initiatives that increase throughput and reduce cost‐to‐serve. Improve observability and operability across the flow from “buy” to “delivered,” reducing WISMO and manual interventions. Data and tooling coherence: Assist in enabling a 360° order view ...

Senior DevOps Engineer

Hiring Organisation: Morgan McKinley
Location: Oxford, Oxfordshire, England, United Kingdom
Employment Type: Full-Time
Salary: Salary negotiable

Code (IaC): Work closely with the team to design, implement, and maintain scalable cloud architecture using modern IaC frameworks and centralized Git repositories. Observability & SRE Practices: Perform root-cause analysis of production incidents and mature our observability, logging, and metrics-gathering tools to improve system reliability. DevSecOps: Ensure security … infrastructure, applications, and data in a hybrid cloud environment Designing and maintaining robust CI/CD Automation pipelines Implementation of open-source standards for observability (e.g., OpenTelemetry) Strong troubleshooting, analytical, and system-debugging skills Desired Skills We are also keen to discuss experience in: FinOps practices, including cost control, optimization ...

Enterprise Network Architect

Hiring Organisation: Jobleads-UK
Location: Bournemouth, England, United Kingdom

tools. Deep understanding of security frameworks, firewalls, endpoint protection, and SIEM tools. Strong knowledge of data management platforms, databases, data lakes, Fabric and ETL processes. Experience with observability tools and practices, including monitoring, logging, tracing, and metrics collection using platforms such as ELK stack, Grafana, SolarWinds & Azure Monitor. Ability to design and implement observability ...

DevOps Engineer

Hiring Organisation: Reed
Location: County Durham, England, United Kingdom
Employment Type: Full-Time
Salary: £45,000 - £60,000 per annum, Inc benefits

pipelines using Azure DevOps Supporting monitoring, reliability, and operational readiness Working alongside engineers to embed better DevOps and platform practices Contributing to security, observability, and continuity planning What they’re looking for Proven experience in an Azure-focused DevOps or platform engineering role Hands-on Terraform experience used in live … essential) DevSecOps exposure Cloud cost management/FinOps awareness Understanding of .NET/C# based platforms Scripting with PowerShell, Bash or Python Experience with observability and monitoring tools Interest in using AI tools to improve engineering productivity Working setup & culture Hybrid working with a flexible, trust-based approach Supportive, inclusive ...

SRE DevOps Engineer

Hiring Organisation: WTW
Location: Surrey, United Kingdom
Employment Type: Full-Time

product team to develop and support operationally resilient cloud infrastructure. The ideal candidate will have a track record in Microsoft Azure and Observability platforms in complex SaaS environments and have excellent communication skills. You will be joining our growing engineering organization building a wide range of market-leading InsurTech solutions … with focus on high cadence and cost effectiveness Implement infrastructure as code Support the team in infrastructure and networking related issues Maintain and configure observability platforms such as Datadog Proactively monitor production and other environments to ensure stability, availability, security and integrity Participate in incident response, troubleshooting, and root cause ...

Principal Site Reliability Engineer

Hiring Organisation: F5 consultants
Location: Reading, Berkshire, South East, United Kingdom
Employment Type: Permanent, Work From Home
Salary: £95,000

improve platform reliability across complex Kubernetes and OpenShift environments. You'll work within a modern cloud-native environment leveraging Kubernetes, OpenShift, GitOps, service mesh, observability tooling, and automation-first engineering practices. This is a technically hands-on role where you'll take a leading voice in platform stability, mentor others … Kubernetes and OpenShift (non-negotiable) Experience working in complex multi-cloud or hybrid environments Proficiency in service mesh technologies such as Istio Experience with observability stacks including Prometheus, Grafana, Loki, and Tempo Strong Infrastructure as Code experience using Kustomize or Helm, with scripting skills in Bash and/or Python ...

Cloud DevOps Engineer - Derby - £70K

Hiring Organisation: Akkodis
Location: Derbyshire, United Kingdom
Employment Type: Permanent
Salary: £50,000 - £70,000 per annum

where there's genuinely a lot going on, in a good way. They're moving away from legacy infrastructure, modernising their cloud estate, improving observability, and continuing to build out their platform engineering capability. So if you enjoy being part of real change rather than just keeping the lights … collaboration too, you'll be working closely with Dev, QA and Product, helping teams release software reliably while also pushing forward things like monitoring, observability and overall platform resilience. Tech-wise? It's an Azure-first setup, but they're open to people who've worked across ...

AI Technical Lead

Hiring Organisation: 167Solutions
Location: United Kingdom

with responsible AI principles Mentoring AI Engineers, Machine Learning Engineers and Data Scientists Driving engineering best practice including CI/CD, testing, monitoring and observability Technical Experience Required Artificial Intelligence & Machine Learning Strong commercial experience designing, training, evaluating and deploying Machine Learning models Experience with Generative AI and Large Language Models … Engineering SQL and NoSQL databases Data Warehousing and Data Lake architectures ETL and Data Pipeline development Git CI/CD Docker Kubernetes Monitoring and Observability platforms Leadership Experience You will ideally have experience: Leading AI, Machine Learning or Software Engineering teams Making architectural and technical decisions within enterprise environments Engaging ...

Senior Software Engineer - Python, TypeScript and AWS

Hiring Organisation: Jobleads-UK
Location: Belfast, Northern Ireland, United Kingdom

DevOps/deployment tools such as Jenkins, Bamboo, Git or similar. Essential skills (must have) AWS TypeScript Python Desired skills (nice to have) Snowflake Observability tools Generative AI What you’ll be doing Contribute to solving complex business problems by delivering high‐quality software that provides outstanding experiences for customers … appetite for rapid prototyping and iteration. Collaborative communicator who can translate between technical and non‐technical stakeholders. Quality‐focused engineer with an interest in observability, reliability and security What’s on offer Health insurance (including access to a digital doctor), life assurance and income protection. Employee discount schemes, annual bonuses ...

Senior AI Engineer - Google AI & Generative Intelligence

Hiring Organisation: Eclaro
Location: Paramus, New Jersey, United States
Employment Type: Permanent
Salary: USD Annual

Actions or GitLab CI. Support on-premise, cloud (GCP/Vertex AI), and hybrid infrastructure deployments including edge devices for local inference. LLM Monitoring & Observability: Monitor LLM performance and usage with LangSmith and Weights & Biases. Track and optimize AI infrastructure costs using OpenMeter and custom dashboards. Set up continuous evaluation … pipelines to ensure ongoing model quality and reliability. Monitor application and model performance end-to-end with LangSmith observability tools. Required Qualifications: 10-15 years of overall software engineering experience. 5 years of hands-on experience in Artificial Generative Intelligence, including LLMs, SLMs, RAG, and multi-agent systems. Deep expertise ...

Principal Cloud Platform Engineer

Hiring Organisation: Jobleads-UK
Location: Cambridge, England, United Kingdom

/CD pipelines and infrastructure to support hosting cloud services at scale, including meeting customer expectations and SLAs. You also understand quality, resiliency, observability and supportability. This role is suited to an experienced Principal Cloud Platform Engineer who can operate autonomously in a complex cloud environment, rapidly understand existing systems … have experience writing and reviewing production‐quality code or automation in languages such as C#, Groovy, PowerShell, or similar. You have extensive experience with observability, monitoring, and alerting, and with using telemetry to drive reliability improvements (for example, Splunk, Grafana). You have experience operating in regulated or compliance‐driven ...

Client Service Delivery

Hiring Organisation: Accenture
Location: Birmingham, England, United Kingdom

Service Delivery Management Own full lifecycle service delivery across infrastructure and cloud environments, ensuring alignment to SLAs, KPIs, scope, and cost. Leverage AIOps and observability tools (e.g. Dynatrace, Datadog, New Relic, Elastic) to proactively monitor service health and performance. Utilise predictive alerting and anomaly detection to prevent incidents and optimise … infrastructure and cloud environments Strong understanding of IT Managed Services frameworks Hands-on experience with AIOps tools such as Dynatrace and ServiceNow Familiarity with observability tools (e.g. Datadog, New Relic, Elastic) Knowledge of event analytics tools such as Splunk IT Service Intelligence and Moogsoft Experience in stakeholder and client management ...

Lead Engineer - Kubernetes & CNI

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

networking, cluster usage, and deployment patterns. Operational Excellence * Support troubleshooting and root-cause analysis across multilayered network and container environments. * Contribute to platform reliability and observability improvements through dashboards, run-books, and automation. * Evaluate new tools, technologies, and approaches and present findings to the wider engineering community. Ways of Working * Operate effectively … networking internals. * Working knowledge of Kubernetes deployment pipelines, Git-based workflows, and automated testing principles. * Strong recent Linux (especially Ubuntu) experience. * Exposure to monitoring, observability, and troubleshooting across containerised workloads. * Familiarity with CI/CD tools (GitLab, Jira, Confluence). * Experience with automation tools such as Terraform, Ansible, or scripting ...

Data Architect / Data Engineers

Hiring Organisation: Vaco LLC
Location: Cincinnati, Ohio, United States
Employment Type: Permanent
Salary: USD 100 Annual

data and analytics assets Data Engineering Design and implement metadata-driven batch ingestion frameworks Develop up to 10 production-grade data pipelines Implement monitoring, observability, and remediation processes Design dimensional data models optimized for analytics and reporting Modern Cloud Data Platforms Architect solutions using Azure data services (Microsoft Fabric, Synapse … Power BI report or dashboard to validate the solution Document reporting and analytics processes DataOps & MLOps Enablement Design CI/CD, testing, and observability frameworks across data pipelines Promote data quality, lineage, and reproducibility through modern DevOps practices Support AI-ready data architectures where applicable Enablement, Collaboration & Leadership Lead technical ...

SRE Managing Consultant - Cloud Operating Model

Hiring Organisation: Jobleads-UK
Location: Manchester, England, United Kingdom

Budgets: Establish service measures and targets (SLIs/SLOs) and introduce Error Budgets to enable data‐driven trade‐offs between reliability and delivery velocity. Observability & Operational Insight: Shape observability approaches (metrics/logs/traces) and operational monitoring models that make reliability risks visible and actionable, improving operational decision‐making. … large‐scale delivery contexts; associate‐level certifications are desirable but not mandatory. Design, establish, and evolve SRE‐led centres of excellence (e.g. Reliability, Observability, or Operational Excellence), setting enterprise‐level standards for SLIs/SLOs, incident management, observability, and continuous improvement across cloud and hybrid platforms. Exposure to modern observability ...

Principal Engineer - Member Experience Platform

Hiring Organisation: Jobleads-UK
Location: Skipton, England, United Kingdom

Quality), and bar‐raising across squads: you shorten lead times, increase deployment frequency, hold change‐failure rate low, and improve MTTR through release‐linked observability - turning fast, safe flow into the default way of working. Operating at platform scale, you define cross‐cutting architecture and delivery standards (API/event contracts … resilience, observability, language/dependency baselines) and drive adoption through the Golden Path: policy‐as‐code CI/CD, progressive delivery (feature flags, canary/blue‐green), automated rollback/forward‐fix, ephemeral, data‐ready environments, and guardrails that make security and compliance by design. You partner with Platform Ownership ...

Database Reliability Engineer

Hiring Organisation: Jobleads-UK
Location: Manchester, England, United Kingdom

Cross-Cloud Portability: Use CNPG and cloud-native patterns to keep our database layer provider-agnostic, enabling seamless deployment across AWS and GCP. Evolve Observability & Monitoring: Build deep, proactive monitoring and alerting for our global database fleet to detect performance regressions and health issues before they impact our customers. Support … orchestration. Distributed Systems Enthusiast: Excited by challenges of multi‐tenant, multi‐region, and multi‐cloud environments while ensuring data integrity and mobility. Security & Observability Mindset: Prioritize security and build deep observability (Prometheus/Grafana/OpenTelemetry/Humio) with automated guardrails. Engineering via Code: Primarily deliver via code; use Java ...

DevOps Engineer

Hiring Organisation: Infinity Quest
Location: Halifax, England, United Kingdom

Actions, Harness, Jenkins). • Networking & Security: Experience with GCP Cloud Armor, GCP Networking, and embedding secure-by-design controls from design to runtime. • Automation & Observability: Implementing actionable observability, performance tuning, and automation to reduce toil. Defining and operating against SLOs/SLIs. • Scripting & Tooling: Scripting in Bash, PowerShell, or Python. … Performance & Reliability: Define, monitor, and operate against service level objectives (SLOs/SLIs), ensuring high availability, performance, and fault tolerance. • Continuous Improvement: Drive automation, observability, and performance tuning to reduce manual effort and improve platform reliability. • Collaboration: Work closely with architecture and feature teams to evolve the cloud roadmap ...

OpenShift Administrator

Hiring Organisation: Coltech
Location: Sheffield, England, United Kingdom

hybrid cloud environments Monitor, troubleshoot, and optimise OpenShift workloads for performance and reliability Manage OpenShift networking, storage, security, patching, and upgrades Implement monitoring and observability solutions using tools such as Prometheus, Grafana, or ELK Stack Analyse system performance and recommend improvements for efficiency and cost optimisation Collaborate with DevOps, development … complex enterprise environments Desirable Skills Experience with hybrid cloud environ Exposure to automation tools such as Ansible or Terraform Knowledge of enterprise monitoring and observability platforms Experience working in large-scale regulated environments ...

Site Reliability Engineer II

Hiring Organisation: Jobleads-UK
Location: United Kingdom

design, build and evolve our infrastructure platform. You will develop Terraform modules, build CI/CD pipelines with GitHub Actions and deliver automation and observability improvements that keep our platform reliable, secure and easy for teams to adopt at scale. Responsibilities: Designing and developing reusable Terraform modules that enable teams … deliver reliable and repeatable infrastructure deployments Diagnosing and resolving complex infrastructure issues by identifying root causes across distributed cloud environments Developing automation and observability tooling to improve how infrastructure is operated at scale Implementing security and governance controls within modules and pipelines so teams inherit secure configurations by default Collaborating ...

Data Platform Engineer

Hiring Organisation: PRISM DIGITAL LIMITED
Location: Milton Keynes, Buckinghamshire, South East, United Kingdom
Employment Type: Permanent, Work From Home
Salary: £75,000

availability Own incident resolution, root cause analysis, and continuous improvement Collaborate with engineers and third-party providers to mature the platform Contribute to monitoring, observability, and cost optimisation strategies Support projects and business initiatives through robust platform delivery What They're Looking For: Microsoft Fabric experience Terraform experience Cloud platform engineering … delivery environments What You'll Work With: Microsoft Fabric Terraform (Infrastructure as Code) Azure cloud technologies SQL Server GitHub/CI/CD tooling Monitoring & observability tools Platform design patterns (scalability, resilience, cost control) Nice to Haves: GitHub Actions/CI/CD pipelines Zero Trust architecture Cloud cost monitoring & reporting ...

Site Reliability Engineers

Hiring Organisation: F5 consultants
Location: Reading, Berkshire, South East, United Kingdom
Employment Type: Permanent, Work From Home
Salary: £70,000

support, shared ownership, and continuous improvement. You'll work hands-on in a modern cloud-native environment leveraging Kubernetes, OpenShift, GitOps, service mesh, and observability tooling. There is genuine investment in your development through training, certifications, and the expertise of those around you. You'll also be part … Ability to work within complex multi-cloud or hybrid environments with a solid foundation in distributed systems Expertise in observability tooling such as Prometheus, Grafana, Loki, and Tempo Proficiency in IaC tools such as Kustomize and Helm, with scripting skills in Bash/Python Experience managing GitOps pipelines using Tekton ...

Infrastructure / DevOps Engineer

Hiring Organisation: rmg digital
Location: England, United Kingdom

Managing and optimising AWS services, including ECS, Lambda, VPC, and Aurora Postgres Building and maintaining CI/CD pipelines using GitHub Actions Implementing monitoring, observability, and alerting using Datadog Supporting development teams with deployment, automation, and operational best practices Improving infrastructure security, scalability, reliability, and cost-efficiency Monitoring system performance … Infrastructure as Code tools such as Terraform and/or CDK Understanding of CI/CD pipelines and GitHub Actions Familiarity with monitoring and observability tooling, such as Datadog Knowledge of containerisation concepts and infrastructure best practices Some experience with TypeScript or JavaScript for scripting and CDK purposes Strong troubleshooting ...

Site Reliability Engineer, K8s

Hiring Organisation: Jobleads-UK
Location: United Kingdom

agreed targets Lead incident response for BI platform services — triage, resolve, and follow up with post-mortems that actually prevent recurrence Build and manage observability infrastructure: dashboards, alerts, and logging across GCP services Track GCP cloud spend and set up cost alerting to flag anomalies before they become problems Review … related field, or equivalent hands‐on experience Practical experience with GCP — Cloud Run, API Gateway, and BigQuery in particular Experience with monitoring and observability tooling (Cloud Monitoring, Datadog, or similar) Solid grasp of cloud security fundamentals — IAM, network controls, access management Proficiency with Git and version control in a team ...