Develop a baseline monitoring and tooling concept for cloud to address the need for compliance infrastructure reporting within agile deliveries as part of our Observability strategy. Develop concepts and tools for chargeback and showback (Financial Instrumentation) in a multicloud context. Implement and mature a cloud forecasting and capacity management solution
CD Pipeline Development: Develop and maintain robust CI/CD pipelines for continuous integration and deployment of ML models and related infrastructure Monitoring and Observability: Build and maintain comprehensive monitoring and alerting systems for our ML infrastructure and models, leveraging tools like DataDog to ensure system health and performance Collaboration
London, South East England, United Kingdom Hybrid / WFH Options
Digital Skills ltd
level experience in AWS Networking/TCP/Firewalls/Certs Advanced proficiency with containers and container orchestration tools such as Docker and Kubernetes Observability champion, experience in designing and building monitoring and logging tools such as CloudWatch, ELK, and Grafana Strong scripting skills in Bash, JavaScript or similar Knowledge
high availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, WAN optimization, and system observability using tools like Prometheus and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers across environments (on-prem and cloud).
London, South East England, United Kingdom Hybrid / WFH Options
Premier Group
GitLab CI/Jenkins) Automate deployments and monitoring for multiple environments Implement Infrastructure as Code using Terraform Manage containerised environments with Docker & Kubernetes Enhance observability with tools like Prometheus, Grafana, and Datadog Collaborate closely with developers, testers, and platform teams 🧰 Tech Stack You'll Use: Cloud: AWS (core services: EC2
enhance internal DevOps culture, tooling, and CI/CD processes. Collaborate cross-functionally to continuously innovate and improve development workflows and system operations. Foster observability and reliability across live systems through best-in-class monitoring and automation. Day to Day: Collaborate with engineers and architects to define and implement cloud
Networks, ExpressRoute, VPNs, NSGs, and Azure Firewall for secure connectivity. Integrate hybrid cloud solutions using Azure Arc and hybrid connectivity strategies. Monitoring & Resilience: Implement observability using Azure Monitor, Log Analytics, App Insights, and Prometheus/Grafana. Design for high availability (HA), disaster recovery (DR), and business continuity (BCP).
London, South East England, United Kingdom Hybrid / WFH Options
LHH
or CloudFormation. Implement CI/CD pipelines, enabling continuous integration and continuous deployment for mission-critical applications. Monitor system performance, availability, and security, implementing observability best practices. Work in an Agile environment, engaging with stakeholders to understand requirements and deliver iterative improvements. Your skills and experience Essential: Experience deploying and
Warwick, Warwickshire, United Kingdom Hybrid / WFH Options
ICEO
GCP, AWS, or Azure). Proven ability to implement redundancy and disaster recovery scenarios. Track record in scaling high-efficiency production systems. Proficiency with observability tools (e.g., Prometheus, Grafana, Grafana Mimir, OpenTelemetry). Strong written and spoken English (B2 level or higher). Nice to Have: Experience with Argo CD
Bristol, Avon, South West, United Kingdom Hybrid / WFH Options
Hargreaves Lansdown Asset Management Limited
end to end testing tools and practices (e.g. Jest, Cypress, Backstop, Playwright). Experience with CI/CD and Trunk Based Development. Experience with observability tools and practices, including monitoring, logging, and tracing to ensure system reliability and performance. Understanding of Microservices & principles of RESTful API development, including structuring, documenting
Leeds, West Yorkshire, United Kingdom Hybrid / WFH Options
Future Talent Group
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise
Altrincham, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise
Bury, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise
Leigh, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise
Bolton, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise
London, South East England, United Kingdom Hybrid / WFH Options
Future Talent Group
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise
London (City of London), South East England, United Kingdom Hybrid / WFH Options
Future Talent Group
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise
London (West End), South East England, United Kingdom Hybrid / WFH Options
Future Talent Group
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise
Ashton-Under-Lyne, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise
environments (e.g. Docker), and IaC tools like Terraform and Ansible for infrastructure performance and cost efficiency. • Implement best practices in DevOps and DevSecOps, including observability, security, networking, API integration, and disaster recovery. • Mentor junior engineers and contribute technical leadership, ideally with experience in broadcast workflows, audio/video streaming, and
Greater Bristol Area, United Kingdom Hybrid / WFH Options
Searchability NS&D
CD pipelines (Jenkins, GitHub Actions, GitLab CI/CD) and automation tools like Terraform and Ansible Programming: Proficiency in Python, Go, or Ruby Monitoring & Observability: Hands-on experience with Prometheus, Grafana, ELK Stack, or similar technologies Core Attributes A passion for solving complex technical challenges in high-availability production environments
looking for experienced DevOps/Platform Engineers to join our vibrant community of Platform Engineering professionals, encompassing knowledge and experience in DevOps, DevSecOps, SRE, Observability, and Internal Developer Platforms/Portals, based at our London offices on a full-time, permanent basis As a member of our Next Gen Engineering
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Modix International
Actions). Strong troubleshooting skills for cloud infrastructure and application performance. Knowledge of cloud security, compliance, and identity management. Experience with monitoring and observability tools (New Relic, Splunk). A continuous improvement mindset and a desire to optimize systems for security, performance, and cost. AWS Certifications (e.g., AWS Certified
secure or regulated environments (e.g. Defence, Government, Critical National Infrastructure). Desirable: Familiarity with cloud platforms such as AWS, Azure, or OpenStack. Experience with observability tooling (e.g. Prometheus, Grafana, ELK stack). Exposure to infrastructure security principles and compliance frameworks. What’s in It for You: Salary from £80,000+