451 to 475 of 572 Observability Jobs in England

Senior Cloud Engineer

Hiring Organisation: Anson Mccade
Location: Manchester, North West, United Kingdom
Employment Type: Permanent
Salary: £70,000

Senior Platform Engineer Deliver and support cloud platform engineering solutions across client environments. Design services with a reliability mindset, using SLIs, SLOs, and observability practices. Implement and maintain Infrastructure as Code using Terraform across environments. Support incident management, problem management, and continuous improvement of production platforms. Contribute to observability solutions … Infrastructure as Code across non-production and production environments. Understanding of SRE principles including SLIs, SLOs, error budgets, resilience, and reliability. Experience with observability and monitoring tools such as Dynatrace or similar. Experience supporting production platforms including incident and problem management. Exposure to AIOps practices and automation for proactive issue ...

Platform Engineer

Hiring Organisation: TXP
Location: Telford, Shropshire, West Midlands, United Kingdom
Employment Type: Contract
Contract Rate: £500 - £525 per day

using GitLab. * Support the containerisation and deployment of applications. * Work closely with engineering teams delivering Java-based microservices. * Implement and maintain monitoring, logging and observability solutions. * Troubleshoot platform and deployment issues across multiple environments. Essential Skills * Kubernetes * Helm (strong commercial experience required) * AWS * GitLab CI/CD Pipelines * Containerisation (Docker …/Kubernetes) * Microservices architecture and deployment * Observability, monitoring and logging * Experience supporting engineering teams delivering Java applications Desirable Skills * MongoDB * Terraform * Public Sector or Government experience ...

DevOps Engineer

Hiring Organisation: DGH Recruitment Ltd
Location: City of London, London, United Kingdom
Employment Type: Permanent
Salary: £80000 - £100000/annum

platform reliability. Key Responsibilities - Design, deploy, and manage AI platforms and agent infrastructure - Build and maintain CI/CD pipelines and DevOps workflows - Implement observability, monitoring, and logging solutions - Optimise performance, scalability, and cost efficiency - Support AI teams with infrastructure, deployment, and integration - Ensure platform security, compliance, and high availability …/CD, automation, and DevOps best practices - Experience with Kubernetes/containerisation technologies - Strong programming skills (e.g. Python, Go, Node.js) - Experience with observability tools (e.g. OpenTelemetry, Datadog) - Understanding of security, performance optimisation, and scalability Desirable Skills - Experience working on AI/ML platforms or deployments - Exposure to large-scale distributed ...

Azure DevOps Engineer

Hiring Organisation: Langham Recruitment
Location: Birmingham, West Midlands (County), United Kingdom
Employment Type: Contract
Contract Rate: £500 - £550/day Remote, Outside IR35

DevOps. Implement Infrastructure as Code using Terraform and automation tooling. Support the migration and modernisation of existing applications into Azure. Improve monitoring, logging and observability across environments. Collaborate with development teams to streamline deployment processes. Troubleshoot infrastructure, deployment and performance issues. Ensure environments adhere to security, resilience and disaster recovery … Python. Exposure to C#, .NET or ASP.NET environments. Experience migrating applications and services from on-premise environments into Azure. Familiarity with monitoring, logging and observability tools. Strong understanding of cloud security and governance principles. Contract Details Azure DevOps Engineer £500-£550 per day Outside IR35 Initial 3-month contract Remote ...

Principal Cloud Architect

Hiring Organisation: TXP
Location: Southampton, UK
Employment Type: Full-time

product and delivery teams. The successful candidate will provide manager-level technical leadership across DevOps, cloud platforms, Infrastructure as Code, CI/CD, networking, security, observability and reliability engineering. They will help shape enterprise-scale transformation, hybrid cloud strategy and platform services aligned to the Azure Well-Architected Framework, ensuring solutions … compute/storage design. Evaluate platform changes including major provider upgrades (AzureRM/Cloudflare), DR and high availability improvements, cost optimisation strategies, and observability frameworks. Lead technical designs for large-scale refactoring and provider upgrades, environment creation pipelines, secure container registry access, identity integration and Zero Trust patterns, and event-driven ...

Principal Cloud Architect

Hiring Organisation: TXP
Location: Southampton, Hampshire, South East, United Kingdom
Employment Type: Contract
Contract Rate: £550 - £600 per day

delivery teams. The successful candidate will provide manager-level technical leadership across DevOps, cloud platforms, Infrastructure as Code, CI/CD, networking, security, observability and reliability engineering. They will help shape enterprise-scale transformation, hybrid cloud strategy and platform services aligned to the Azure Well-Architected Framework, ensuring solutions … compute/storage design. Evaluate platform changes including major provider upgrades (AzureRM/Cloudflare), DR and high availability improvements, cost optimisation strategies, and observability frameworks. Lead technical designs for large-scale refactoring and provider upgrades, environment creation pipelines, secure container registry access, identity integration and Zero Trust patterns, and event ...

Senior Site Reliability Engineer

Hiring Organisation: VIQU IT
Location: Wavendon, Bedfordshire, United Kingdom
Employment Type: Permanent
Salary: GBP 65,000 - 75,000 Annual

experience with both Azure and on-premise virtual machines. Experience with Infrastructure as Code/Terraform, container orchestration (Kubernetes or AKS), and monitoring and observability tooling (Prometheus, Grafana, Datadog, or Azure Monitor). Ability to implement new processes and tools, ensuring the wider development and support teams adopt new ways … Engineer Utilise various technologies (Terraform, Kubernetes, etc.) to manage, provision, and configure servers and networks, and automate application lifecycles. Regularly use Datadog and other observability tools for application performance monitoring. Implement new ways of working, helping to shape how the organisation responds to and recovers from incidents. Take ownership of incident ...

Senior Site Reliability Engineer

Hiring Organisation: VIQU IT
Location: Milton Keynes, Wavendon, Buckinghamshire, United Kingdom
Employment Type: Permanent
Salary: £65000 - £75000/annum

Senior / Principal DevOps Engineer

Hiring Organisation: Hays Technology
Location: Bury, Greater Manchester, United Kingdom
Employment Type: Contract
Contract Rate: £700 - £800/day (depending on level)

best practices across engineering teams and onboard products onto shared platforms. Build and maintain secure, scalable, and high-performing cloud infrastructure in AWS. Implement observability, monitoring, and operational insights across multiple environments. Improve deployment processes, reduce friction, and enable self-service capabilities for development teams. Support cloud and infrastructure incident … focus on automation. Experience with containerisation and workload orchestration technologies. Scripting and programming experience with tools such as Python and Bash. Strong understanding of observability, reliability, and operational best practices. Knowledge of information security principles and experience embedding security throughout the software delivery lifecycle. If you're interested in this ...

Principal Platform Engineer

Hiring Organisation: Sanderson Recruitment
Location: City of London, London, United Kingdom
Employment Type: Permanent

persistence platforms Provide technical leadership and architectural guidance across multiple engineering teams Define engineering standards, platform roadmaps and best practices Drive automation, resilience, observability and operational excellence initiatives Support and mentor engineers through code reviews, coaching and technical leadership Collaborate with architects and stakeholders to translate business requirements into technical … automation and DevOps practices Experience mentoring engineers and providing technical leadership Key Technologies AWS Terraform Linux Cassandra Couchbase ScyllaDB Kafka CI/CD Pipelines Observability & Monitoring Platforms Distributed Database Technologies Nice to Have Experience with additional distributed persistence technologies Background in large-scale cloud-native environments Experience defining enterprise platform ...

Principal AI Platform Engineer - Fintech

Hiring Organisation: Client Server
Location: Reigate, Surrey, South East, United Kingdom
Employment Type: Permanent, Work From Home

expert for AI platform architecture and integration challenges. You'll also work closely with Security and Compliance to embed robust guardrails, observability and governance into every layer of the platform. Location/WFH: There's a flexible work from home hybrid model, you'll join colleagues in the Reigate office … prompt engineering and modern GenAI tooling You have a strong knowledge of software engineering fundamentals, including system design, APIs, CI/CD, testing, observability, cloud-native architecture and operational excellence You can design scalable platforms, SDKs and self-service tooling that simplify complexity and accelerate engineering teams You're proficient ...

Senior Data Management Professional - Data Engineer - Commodities Data London, GBR Posted today

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

improve performance, reliability, and maintainability. Design automated pipeline controls for validation, monitoring, schema change, exception handling, and data integrity. Develop workflow orchestration, alerting, observability, and remediation processes. Translate business and client needs into engineering‐ready requirements and scalable technical solutions. Partner with Engineering on platform evolution, architecture, tooling, system design … hands‐on experience with Python or similar programming/scripting languages. Experience with querying structured, semi‐structured, and unstructured datasets. Experience with workflow orchestration, observability, monitoring, alerting, and scalable architecture design. Ability to analyze, refactor, and modernize legacy systems. Strong understanding of data lifecycle management, data integration, data modelling, data ...

Senior Site Reliability Engineer

Hiring Organisation: Experian Ltd
Location: Nottingham, Nottinghamshire, East Midlands, United Kingdom
Employment Type: Permanent, Work From Home

Perform detailed post-incident investigations to identify underlying causes. Document findings and share learnings to prevent recurrence. Implement preventive measures and continuous improvement processes. Observability Champion monitoring, logging, and alerting strategies using tools like Prometheus, Grafana, ELK, and AWS CloudWatch. Build real-time dashboards to visualize system health and reliability … culture of shared responsibility for uptime and performance across engineering teams. Qualifications Deep expertise with various AWS services. Advanced knowledge of monitoring and observability tools. Strong leadership capabilities with a focus on setting clear direction, aligning team efforts with organizational goals, and maintaining high levels of motivation and engagement across ...

Senior AI Engineer | London

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

Vector databases & retrieval: Pinecone, Weaviate, Chroma, pgvector, FAISS; embeddings, semantic and hybrid search, reranking MLOps/LLMOps & deployment: Docker, Kubernetes, FastAPI, CI/CD; observability, tracing, evaluation tooling (LangSmith, LangFuse); guardrails, prompt/version management Responsible AI & safety: bias & fairness, hallucination mitigation, evaluation, privacy, security, governance of AI and agentic … optimization. LLMOps, Evaluation & Optimization: Experience operationalizing LLM and agentic applications—building evaluation harnesses and offline/online metrics for quality, groundedness, and safety; implementing observability, tracing, and monitoring; continuously optimizing accuracy, cost, and latency. Familiarity with guardrails, red‐team, and responsible deployment of AI systems in production. Communication Skills: Excellent ...

DevOps Engineer

Hiring Organisation: Opus Recruitment Solutions Ltd
Location: Leeds, West Yorkshire, England, United Kingdom
Employment Type: Contractor
Contract Rate: £400 - £450 per day

Inside IR35 | Hybrid, 1 day onsite in Leeds | 6-month initial contract DevOps Engineer to work on the development of a large-scale observability platform, experienced in cloud technologies, infrastructure as code, building and operating distributed systems at scale, proficient in DevOps engineering, reviewing, writing and testing code, working with source … control mechanisms and deploying infrastructure. Key experience we are looking for: Previous experience building and supporting large-scale AWS observability and monitoring platforms. Strong Python development background with experience creating automation and engineering tooling. Hands-on Kubernetes (K8s) experience deploying, managing and troubleshooting containerised workloads. Experience using Grafana ...

DevOps Engineer

Hiring Organisation: Oscar Associates (UK) Limited
Location: Manchester, North West, United Kingdom
Employment Type: Permanent
Salary: £70,000

scalable, reliable and cost-efficient as it moves into full production. Working closely with engineering teams, you'll drive automation, improve deployment pipelines, strengthen observability and ensure the platform performs under high-volume, real-time workloads. This is a hands-on position with genuine ownership and plenty of opportunity … enhancing CI/CD pipelines with blue/green deployments and automated rollback Driving platform reliability, resilience and scalability Developing monitoring, alerting and observability across the environment Managing cloud costs and implementing best FinOps practices Participating in a small production on-call rota Technology AWS ECS Fargate Terraform Aurora ...

Global DevOps Lead

Hiring Organisation: Stott & May Professional Search Limited
Location: Oxford, Oxfordshire, UK
Employment Type: Full-time

closely with engineering, cloud, and operations teams to deliver a modern, automated, and scalable platform. You'll drive DevOps strategy across infrastructure, CI/CD, observability, SRE, and cloud optimisation while influencing senior stakeholders across the business. Key Responsibilities - Define and implement a global DevOps operating model, including governance, standards and best … initiatives. - Partner with engineering and cloud teams to establish clear ownership across DevOps and infrastructure. - Lead the implementation and optimisation of enterprise monitoring and observability using Datadog. - Build scalable deployment pipelines that improve release quality and speed. - Establish and monitor DORA metrics, driving improvements in deployment frequency, lead time, change ...

Site Reliability Engineer (SRE) - Cloud & Automation

Hiring Organisation: Spencer Rose Ltd
Location: London, United Kingdom
Employment Type: Permanent
Salary: GBP 60,000 - 70,000 Annual

implementation of SRE practices across the organisation, working closely with infrastructure teams to optimise deployment processes and embed automation and operational excellence. Enhance observability and reliability, defining and implementing SLAs, SLOs and SLIs to improve alerting, monitoring, and capacity planning. Identify and eliminate toil, developing frameworks to analyse recurring issues … beneficial). Experience supporting and building multi-environment, multi-region cloud platforms (AWS or GCP), using IaC and GitOps workflows. Hands-on experience with observability/APM tooling such as Grafana, Datadog or Dynatrace. Background working in regulated financial services or banking environments. Excellent troubleshooting, analytical and communication skills, able ...

Vice President, DevOps Production Services

Hiring Organisation: Jobleads-UK
Location: Manchester, England, United Kingdom

enterprise applications and ensure platform stability, resiliency, and availability. Monitor application health, system performance, batch jobs, interfaces, and alerts using enterprise monitoring and observability tools. Investigate, troubleshoot, and resolve production incidents within defined SLAs. Perform root cause analysis (RCA) for recurring issues and drive permanent fixes. Analyze production logs, identify … Cloud experience preferred. Knowledge of automation/scripting using Python, Shell, or PowerShell. Exposure to DevOps/SRE practices, CI/CD pipelines, and observability tooling. Strong communication skills with the ability to provide concise incident and executive status updates. ...

Site Reliability Engineer

Hiring Organisation: Connells Limited
Location: Milton Keynes, Buckinghamshire, South East, United Kingdom
Employment Type: Permanent, Work From Home

hands-on role in ensuring it is reliable, scalable, and observable. You will help establish and mature SRE practices, focusing on: Monitoring and observability Incident response Post-incident review Reliability testing and capacity planning Toil reduction Enabling development velocity We offer a hybrid working arrangement with one day per week … Build dashboards, alerts, and runbooks to improve visibility Automate repetitive tasks to reduce operational toil Collaborate with cross-functional teams to enhance reliability and observability Support performance testing and capacity planning Proactively identify and prioritise reliability improvements Experience & Skills Required: Hands-on experience with Azure Monitoring (Application Insights, Alerts, Action ...

Principal AI Platform Engineer - Fintech

Hiring Organisation: Client Server
Location: Reigate, Surrey, UK
Employment Type: Full-time

expert for AI platform architecture and integration challenges. You'll also work closely with Security and Compliance to embed robust guardrails, observability and governance into every layer of the platform. Location/WFH: There's a flexible work from home hybrid model, you'll join colleagues in the Reigate office twice … frameworks, RAG, prompt engineering and modern GenAI tooling You have a strong knowledge of software engineering fundamentals, including system design, APIs, CI/CD, testing, observability, cloud-native architecture and operational excellence You can design scalable platforms, SDKs and self-service tooling that simplify complexity and accelerate engineering teams You're proficient with ...

Senior Java Engineer - FX eTrading

Hiring Organisation: Pontoon
Location: London, United Kingdom
Employment Type: Contract
Contract Rate: £800 - £900/day

data, order/risk workflows, and real-time streaming capabilities. Optimise Performance: Focus on improving latency, throughput, and reliability across the entire stack. Implement observability practices (metrics, tracing, logging) and conduct performance profiling. Establish Best Practices: Champion engineering excellence through code standards, testing strategies (unit/integration/… including market data, order flows, and execution workflows. Hands-On Skills: Proficiency with CI/CD, containerisation, cloud/on-prem deployments, and observability practices. AI Integration: Comfortable integrating AI coding tools into daily development workflows. Communication Skills: Excellent communication and stakeholder engagement abilities, with a track record of leading ...

Contract - Staff .Net Backend Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

performance. Championing a high‐quality engineering culture – test coverage, peer review, CI/CD discipline via GitHub Actions, Infrastructure as Code, secure coding, observability and performance – aligned to the Reapit Global Technology Strategy, Reapit Connect and agentic tooling. Mentoring and up‐levelling engineers around you through pairing, PR review, architectural …/CD (ideally GitHub Actions), Infrastructure as Code (AWS CDK or Terraform), comprehensive testing (unit, integration and contract), and a genuine commitment to observability, performance and secure‐by‐default coding. Technical leadership without the title – a track record of lifting teams through pairing, mentoring, PR review and example, rather than ...

Junior DevOps Engineer

Hiring Organisation: Experis
Location: City of London, London, United Kingdom
Employment Type: Contract
Contract Rate: £350 - £400/day

Junior DevOps Engineer 3 months initially - extensions London - flexible Inside IR35 - umbrella only The Junior DevOps Engineer will report to the DevOps manager and support the design, delivery, automation and operation of secure, reliable and ...

Site Reliability Engineer

Hiring Organisation: Connells Limited
Location: Milton Keynes, Buckinghamshire, UK
Employment Type: Full-time

Job Description We are seeking an experienced Site Reliability Engineer (SRE) to join our Group Technology Team in Milton Keynes. ConnellsX is Connells Group Technology's internal developer platform, built on Microsoft Azure. It simplifies cloud hosting ...