our enterprise messaging infrastructure, ensuring high availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, network optimization, and system observability using industry-standard monitoring tools. Required Skills & Qualifications: 3+ years of experience administering enterprise-grade messaging systems. Strong background in production support, preferably in a 24x7 enterprise environment. Experience working
Wokingham, Berkshire, United Kingdom Hybrid / WFH Options
Nordcloud
to L3 networking Programming languages, such as C#, Python, Perl, Java, C++ CI/CD tools such as Azure DevOps, GitHub Actions, GitLab, Jenkins, TeamCity Scripting languages such as PowerShell, Bash Observability/Monitoring: Prometheus, Grafana, Splunk Must have experience with either Kubernetes or OpenShift Hosting technologies such as IIS, nginx, Apache, App Service, Lightsail Analytical and creative approach to problem solving
London, South East, England, United Kingdom Hybrid / WFH Options
Become
and ability to collaborate across multidisciplinary teams Desirable Attributes Exposure to event-driven architectures and messaging systems (e.g., Kafka) Experience with Infrastructure as Code (e.g., Terraform, Ansible) Familiarity with observability tools and performance tuning Ability to mentor junior engineers and contribute to backend design leadership Prior consulting experience or experience in client-facing roles Engagement Model Outside IR35 12-month
Reading, Berkshire, South East, United Kingdom Hybrid / WFH Options
Halian Technology Limited
in the team Contribute to solution architecture and strategic technical direction Build, integrate, and maintain REST APIs and backend services Champion best practices in software quality, CI/CD, observability, and DevOps Collaborate with cross-functional teams including Product, QA, and DevOps Optionally take on people management responsibilities for engineers Stay updated with emerging backend and cloud technologies Key Skills
London, South East, England, United Kingdom Hybrid / WFH Options
Become
a strong consulting mindset is highly desirable Desirable Attributes Experience with event-driven architectures and messaging systems (e.g., Kafka) Exposure to Infrastructure as Code (e.g., Terraform, Ansible) Familiarity with observability tools and performance tuning Ability to mentor junior engineers and lead backend design initiatives Engagement Model Outside IR35 12-month initial contract with potential for extension or permanent employment Hybrid
South East London, London, United Kingdom Hybrid / WFH Options
TEN10 SOLUTIONS LIMITED
stakeholder management skills. Nice-to-Have: Hands-on experience with Databricks, Apache Spark, and Azure Deequ. Familiarity with Big Data tools and distributed data processing. Experience with data observability and data quality monitoring. Proficiency with CI/CD tools like Jenkins, Azure DevOps, or GitLab CI. Previous consultancy or client-facing experience. Additional languages like SQL, TypeScript, or Bash
S3, Aurora, and ElastiCache Proficient with CI/CD tools such as GitLab, GitHub Actions, or CircleCI Strong testing capabilities using JUnit, RestAssured, or similar frameworks Proactive with monitoring, observability, and system health Desirable Skills: Exposure to monitoring platforms like Datadog, Grafana, Prometheus, or PagerDuty Familiarity with Python scripting Experience with Kubernetes and deployment tools such as Helm Why Join
Milton Keynes, Buckinghamshire, England, United Kingdom
Noir
financial institution with soaring profits - my client is modernising platforms, embracing AI, and driving automation at scale. We're hiring a Lead Site Reliability Engineer (SRE) to drive reliability, observability, and performance across our Azure cloud infrastructure. You'll work in a modern engineering environment where we live by "you build it, you run it", focused on automation, scale, and
London, South East, England, United Kingdom Hybrid / WFH Options
Lorien
communication skills and ability to work independently or lead a small team Nice to Have: Experience with TYK API Gateway Exposure to microservices and event-driven architectures Familiarity with observability tools (e.g., Prometheus, Grafana) Carbon60, Lorien & SRG - The Impellam Group STEM Portfolio are acting as an Employment Business in relation to this vacancy.
Maidenhead, Berkshire, United Kingdom Hybrid / WFH Options
dynaTrace software GmbH
Docker, Kubernetes etc. Ideal candidates will have 2+ years of Dynatrace Technology experience Dynatrace Product Certification. Why you will love being a Dynatracer Dynatrace is a leader in unified observability and security. We provide a culture of excellence with competitive compensation packages designed to recognize and reward performance. Our employees work with the largest cloud providers, including AWS, Microsoft, and
ElastiCache Familiarity with modern CI/CD platforms – ideally GitLab, but GitHub Actions or CircleCI also welcome Proficiency in testing frameworks like JUnit and RestAssured A passion for monitoring, observability, and maintaining resilient systems Desirable Skills: Experience with monitoring and alerting tools like Datadog, Prometheus, Grafana, or PagerDuty Exposure to Python scripting Familiarity with deployment platforms such as Kubernetes and
Infrastructure Observability Engineer - Leading Trading Company Location: London, UK Contract Type: Permanent Salary: Competitive + Benefits About Our Client Our client is a well-established trading company with a strong presence in the global commodities market. They are committed to leveraging cutting-edge technology solutions to drive operational excellence and maintain their competitive edge in the fast-paced trading environment. … The Role We are seeking an experienced Infrastructure Observability Engineer to lead the design, implementation, and continuous improvement of our client's enterprise observability platform. This role focuses on delivering comprehensive monitoring, event correlation, and impact analysis, demonstrating AIOps capabilities and tools such as BMC Helix Operations Manager. The ideal candidate will be passionate about improving access to infrastructure performance … automating operational intelligence, and reducing mean time to resolution (MTTR) through intelligent alerting and root cause analysis. Key Responsibilities Own and evolve the enterprise observability strategy across all infrastructure tracks Design, implement, and support event management and impact analysis workflows using platforms such as BMC Helix Operations Manager Integrate and correlate data from multiple sources (e.g., 20+ monitoring systems) into
years in platform/SRE/DevOps roles * Strong Kubernetes experience (config and deployment) * Deep CI/CD experience - Jenkins, GitLab CI/CD or similar * Skilled with infra observability tooling (Prometheus, Grafana, etc.) * Confident with Git and repo management workflows * Strong automation mindset - reducing manual intervention wherever possible * Cloud experience (AWS, Azure or GCP) * Must be a sole UK
innovation within the team. Desirable Exposure to AWS AI services (e.g., Lex, Bedrock). Experience with serverless architectures and event-driven design patterns. Familiarity with containerization (Docker, ECS) and observability tooling. Team Fit A proactive mindset with a passion for mentoring and uplifting team performance. Strong communication skills and the ability to work collaboratively across distributed teams. A drive to
innovation within the team. Desirable: Exposure to AWS AI services (e.g., Lex, Bedrock). Experience with serverless architectures and event-driven design patterns. Familiarity with containerisation (Docker, ECS) and observability tooling. Team Fit: A proactive mindset with a passion for mentoring and uplifting team performance. Strong communication skills and the ability to work collaboratively across distributed teams. A drive to
architectures across Azure, AWS, and Google Cloud Leading platform engineering squads using DevSecOps, Kubernetes, and automation tooling Enabling edge and private cloud capabilities (e.g., Azure Stack, AWS Outposts) Implementing observability and governance tooling to support modern operations Supporting Agile and product-based delivery using SRE, CI/CD, and Infrastructure as Code Advising clients on architecture optimisation, security, cost control
automation, scalability, and high reliability. A strong working knowledge of Microsoft Azure is essential. The role involves daily coding and technical leadership across orchestration, CI/CD pipelines, cloud services, observability, and security, working alongside site reliability, onboarding, architecture, and delivery functions. You're expected to scale impact through others by upskilling team members, hiring where needed, and championing platform engineering
London, South East, England, United Kingdom Hybrid / WFH Options
Become
collaboration skills across multidisciplinary teams Desirable Attributes Exposure to microservices architecture and event-driven systems (e.g., Kafka) Experience with design systems and component libraries (e.g., Material, Storybook) Familiarity with observability tools and performance tuning Prior consulting experience or experience in client-facing roles Engagement Model Outside IR35 12-month initial contract with potential for extension or permanent employment Hybrid working
Caldecotte, Milton Keynes, Buckinghamshire, England, United Kingdom
Connells Group HQ
day-to-day and strategic decision making. You will be a hands-on and customer-focused engineering servant-leader. You will be comfortable moving across orchestration, automation, pipelines, cloud services, observability and security domains (even if you are not an expert in them all). A non-negotiable is experience and familiarity with Microsoft Azure. You will play your part in operating
GPS). Our teams operate across the UK, Germany, France, and India, delivering complex, enterprise-grade IT solutions and consultancy across infrastructure, cloud, and modern operations. As a Monitoring & Observability Engineer, you'll work in high-impact delivery teams that support some of the world's most well-known organisations. You'll play a key role in helping our customers achieve greater … visibility, performance, and reliability across their IT estates, contributing to their operational success through proactive insight and incident prevention. What you'll do Design, implement, and manage observability solutions using industry-leading tools such as Dynatrace (primary), Grafana, and Splunk Collect and analyse telemetry data (metrics, logs, traces, events) to diagnose and resolve system and application performance issues Integrate monitoring platforms … with ITSM tools (e.g. ServiceNow) and CI/CD pipelines to enable proactive alerting and resolution workflows Act as a Monitoring & Observability SME within customer delivery teams Support incident response activities and postmortems by identifying patterns, root causes, and optimisation opportunities Work collaboratively with cross-functional teams to define and implement best practices in observability and monitoring Attend customer and
secure, scalable cloud data solutions, aligning with business and compliance needs. Key Responsibilities Design, build, and maintain cloud-native data pipelines (Azure/GCP) Implement scalable data management frameworks: observability, validation, lineage Translate business needs into effective technical prototypes and solutions Collaborate with stakeholders, data teams, and service partners Ensure data security, governance, and regulatory compliance Monitor and optimise cloud
Reigate, Surrey, England, United Kingdom Hybrid / WFH Options
Client Server Ltd
of IaC principles and tools such as Terraform and Pulumi You have experience of building and improving CI/CD pipelines for product teams You have experience with cloud observability (logging, tracing, metrics, monitoring and alerting) You have experience with containerisation - Azure Container Apps preferred You have strong scripting skills with PowerShell and/or C# .NET coding You enjoy
London, South East, England, United Kingdom Hybrid / WFH Options
Method Resourcing
teams to operationalize models and ship ML-powered features into production. Continuously assess and iterate on production models, balancing long-term ML strategy with tactical improvements. Champion code quality, observability, and resilience within their ML systems through reviews and hands-on contributions. Help shape their internal ML standards and practices, ensuring they stay ahead of industry advancements. Offer technical mentorship
in organizations that have adopted agile and product management principles in a globally distributed team setup Led or actively participated in setting architecture direction for products with focus on observability, security, and scalability Expertise in building Cloud Native applications in .NET/Java Tech Stack leveraging microservices based architecture Good understanding of CI/CD principles focused on test automation