and modern deployment practices Familiarity with infrastructure-as-code tools such as Terraform Strong understanding of security best practices in application and infrastructure design Exposure to observability tools (e.g. Prometheus, Grafana, structured logging) Confident debugging and resolving issues in complex distributed systems Product-oriented mindset with a collaborative approach to improving developer experience Bonus: experience with Kafka, gRPC, or contributing More ❯
London, England, United Kingdom Hybrid / WFH Options
Betsson Group
RabbitMQ, Kafka). Strong grasp of telemetry, observability, and performance monitoring in distributed systems. Track record of technical leadership and setting engineering standards. Nice to Have: Experience with OpenTelemetry, Prometheus, Grafana, or similar observability tooling. Exposure to hybrid-cloud or cloud migration strategies. Familiarity with performance optimisation in low-latency data pipelines. Contributions to DevOps-related communities, blogs, open source More ❯
london, south east england, united kingdom Hybrid / WFH Options
Betsson Group
RabbitMQ, Kafka). Strong grasp of telemetry, observability, and performance monitoring in distributed systems. Track record of technical leadership and setting engineering standards. Nice to Have: Experience with OpenTelemetry, Prometheus, Grafana, or similar observability tooling. Exposure to hybrid-cloud or cloud migration strategies. Familiarity with performance optimisation in low-latency data pipelines. Contributions to DevOps-related communities, blogs, open source More ❯
slough, south east england, united kingdom Hybrid / WFH Options
Betsson Group
RabbitMQ, Kafka). Strong grasp of telemetry, observability, and performance monitoring in distributed systems. Track record of technical leadership and setting engineering standards. Nice to Have: Experience with OpenTelemetry, Prometheus, Grafana, or similar observability tooling. Exposure to hybrid-cloud or cloud migration strategies. Familiarity with performance optimisation in low-latency data pipelines. Contributions to DevOps-related communities, blogs, open source More ❯
standards Skills and Qualifications REQUIRED/NON-NEGOTIABLE: Full AWS stack (including Lambda, SQS, SNS) IAM management for pipelines and users Blue/green deployment experience Terraform or CloudFormation Prometheus CloudWatch NICE TO HAVE: SonarQube Wiz FinOps tagging experience (Apptio) Kafka Kong EE Dynatrace Micro-UI patterns (JWT tokenisation and passthrough More ❯
Leeds, West Yorkshire, Yorkshire, United Kingdom Hybrid / WFH Options
Fruition Group
Working with developers and SREs to solve complex problems What we're looking for: Strong experience with AWS (EC2, ECS, Lambda, RDS etc.) Good knowledge of observability tools (Grafana, Prometheus, OpenTelemetry, Datadog, or similar) Background in software engineering (JavaScript/TypeScript & Node.js, although any language is fine) Experience with Infrastructure as Code (Terraform, CloudFormation, or similar) CI/CD pipelines More ❯
Leeds, West Yorkshire, United Kingdom Hybrid / WFH Options
Corecom Consulting
influencing and negotiating in a highly regulated environment. Desirable skills include: RESTful design and API Gateway tools (Apigee or equivalent) Kubernetes/OpenShift Monitoring tools such as Grafana, Prometheus, Dynatrace CI/CD with GitLab or Artifactory Security practices (OWASP, JWT, certificates, encryption) Experience within IT support environments Benefits 25 days holiday + Bank Holidays, with the option to More ❯
nodes and IPoE 13. Proven ability to work independently and collaboratively in a fast-paced technical environment. 14. Demonstrable knowledge of the telecommunications industry and technologies. 15. Experience of working with Prometheus and Grafana More ❯
Newcastle Upon Tyne, Tyne and Wear, North East, United Kingdom
Randstad Digital
Excellent communication skills for engaging business stakeholders, end-users, and technologists. ITIL certification (or equivalent ITIL framework experience). Technical expertise in: Databases & design: SQL Server Monitoring tools: Grafana, Prometheus, Victoria Metrics Scheduling tools: Control-M Operating systems: Windows, Linux Containerisation & cloud: Kubernetes, Azure Collaboration tools: JIRA, Git, Bitbucket This is a fantastic opportunity to work on impactful projects with More ❯
and predictive analytics. Understanding of AI frameworks and libraries (e.g., TensorFlow, PyTorch, Scikit-learn) and their application in network automation and monitoring. Experience with telemetry and observability frameworks (e.g., Prometheus, Grafana) for real-time network monitoring and troubleshooting. Experience: Minimum of 7 years of experience in network engineering, operations, and support. Proven ability to work hands-on and take strong More ❯
Demonstrate expert knowledge and application of the OWASP Top 10 security risks; proactively identify, remediate, and educate the team on security vulnerabilities. Architect logging, monitoring, and tracing strategies (OpenTelemetry, Prometheus, Application Insights), and drive adoption of best practices for platform reliability. Architect and optimise CI/CD pipelines (Azure DevOps, GitHub Actions), automate quality gates, enable blue/green deployments More ❯
with Python or Ansible is considered advantageous. Comprehensive understanding of virtualisation platforms and container orchestration tools enables you to propose scalable solutions confidently. Familiarity with monitoring stacks such as Prometheus or Grafana allows you to provide valuable insights into system performance for clients. Exceptional interpersonal skills empower you to build rapport with stakeholders at all levels while communicating complex ideas More ❯
initiatives. Expertise in virtualisation platforms as well as container orchestration technologies, including associated tooling, to optimise resource utilisation across diverse workloads. Hands-on knowledge of monitoring stacks such as Prometheus or Grafana along with log management solutions like ELK/EFK or their equivalents. Proven ability to diagnose intricate technical problems using structured troubleshooting methodologies that minimise disruption to business More ❯
London, Bloomsbury, United Kingdom Hybrid / WFH Options
IntaPeople
CI, Jenkins, GitHub Actions, or AWS CodePipeline Support and train technical staff in upskilling necessary for ongoing operations Monitor and ensure system reliability, availability, and performance using tools like CloudWatch, Prometheus, Icinga2, Grafana, and Datadog Automate deployment, scaling, and management of containerized applications using Docker and Kubernetes Desirable skills Travis CI Monitoring – Grafana, Icinga Prometheus RabbitMQ/AMQP Working knowledge More ❯
Bloomsbury, Shropshire, United Kingdom Hybrid / WFH Options
IntaPeople
CI, Jenkins, GitHub Actions, or AWS CodePipeline Support and train technical staff in upskilling necessary for ongoing operations Monitor and ensure system reliability, availability, and performance using tools like CloudWatch, Prometheus, Icinga2, Grafana, and Datadog Automate deployment, scaling, and management of containerized applications using Docker and Kubernetes Desirable skills Travis CI Monitoring Grafana, Icinga Prometheus RabbitMQ/AMQP Working knowledge More ❯
Amazon EKS (Elastic Kubernetes Service) and Azure Kubernetes Service (AKS), ensuring their reliability, availability, and performance. Monitoring and Alerting: Monitor application performance and system health through observability tools (e.g., Prometheus, Grafana, ELK stack), proactively identifying and resolving issues to ensure high availability and rapid incident response. Security and IAM: Implement security best practices, managing Identity and Access Management (IAM) policies … with container orchestration technologies, particularly Kubernetes. Familiarity with version control systems (e.g., Git) and CI/CD pipelines for efficient code deployment. Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack) to ensure system observability. Strong experience with SQL databases and AWS DynamoDB, focusing on performance tuning and optimization. Proven ability to design and manage RESTful APIs, ensuring More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Understanding Recruitment
/infrastructure engineering role Strong scripting skills in Python, Bash, or Ruby Familiarity with configuration management tools (Ansible, Puppet, or Chef) Interest or exposure to observability tools like Datadog, Prometheus, or Grafana A passion for learning and improving in high-performance environments This is a rare chance to learn from elite engineers and contribute directly to a platform supporting global More ❯
london, south east england, united kingdom Hybrid / WFH Options
Understanding Recruitment
/infrastructure engineering role Strong scripting skills in Python, Bash, or Ruby Familiarity with configuration management tools (Ansible, Puppet, or Chef) Interest or exposure to observability tools like Datadog, Prometheus, or Grafana A passion for learning and improving in high-performance environments This is a rare chance to learn from elite engineers and contribute directly to a platform supporting global More ❯
slough, south east england, united kingdom Hybrid / WFH Options
Understanding Recruitment
/infrastructure engineering role Strong scripting skills in Python, Bash, or Ruby Familiarity with configuration management tools (Ansible, Puppet, or Chef) Interest or exposure to observability tools like Datadog, Prometheus, or Grafana A passion for learning and improving in high-performance environments This is a rare chance to learn from elite engineers and contribute directly to a platform supporting global More ❯
london (city of london), south east england, united kingdom Hybrid / WFH Options
Understanding Recruitment
/infrastructure engineering role Strong scripting skills in Python, Bash, or Ruby Familiarity with configuration management tools (Ansible, Puppet, or Chef) Interest or exposure to observability tools like Datadog, Prometheus, or Grafana A passion for learning and improving in high-performance environments This is a rare chance to learn from elite engineers and contribute directly to a platform supporting global More ❯
experience building and deploying services with Java and Spring Boot. Comfort working in a cloud-native environment - Kubernetes (EKS), containers, scaling etc. An interest in observability, using tools like Prometheus and Grafana to keep services healthy and understand usage patterns. Familiarity with AWS services and how to integrate them into modern applications. A keen focus on quality and security, baking More ❯