Hampshire, England, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment
level goals You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration management … principles and hands-on experience with tools such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless More ❯
London, England, United Kingdom Hybrid / WFH Options
ZipRecruiter
level goals You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration management … principles and hands-on experience with tools such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless More ❯
along with the Onyx portfolio management team, to deliver industry-leading DevOps and Infrastructure products that provide Infrastructure-as-code abstractions and operating principles, leading cloud computing capability, automation, observability, operability, and developer experience. You will drive the product roadmap, guide product development initiatives, and ensure the successful launch and adoption of DevOps and Infrastructure products. Together, you will facilitate … the following characteristics, it would be a plus: Strong understanding of modern infrastructure and site reliability engineering practice, including Infrastructure-as-code tools (e.g. Terraform, Ansible) and metrics and observability tools (e.g. Prometheus, Grafana). Strong understanding of modern DevOps practice, including DevOps stacks (e.g. Jenkins, GitLab, CircleCI). Cloud experience (e.g. AWS, Google Cloud, Azure, Kubernetes). Familiar with More ❯
London, England, United Kingdom Hybrid / WFH Options
Canada Life
infrastructure to the cloud and understanding the challenges involved Familiarity with cloud security best practices, identity and access management (IAM), and encryption techniques Microsoft Azure certifications are a plus Observability Designing, implementing and day-to-day use of logging and monitoring tools to capture data for alerting and issue identification and resolution using DataDog, App Insights or similar tools. Designing … applications and infrastructure for observability, security, and reliability. Networking & Security Monitor and enhance network performance, ensuring high levels of security and scalability across all cloud environments. Enforce security best practices in AKS, including network policies, RBAC (Role-Based Access Control), and integration with Azure Active Directory Core Services Software development experience, ideally in .NET stack. SQL skills to manage and More ❯
London, England, United Kingdom Hybrid / WFH Options
Redefined Ltd
infrastructure to the cloud and understanding the challenges involved Familiarity with cloud security best practices, identity and access management (IAM), and encryption techniques Microsoft Azure certifications are a plus Observability Designing, implementing and day-to-day use of logging and monitoring tools to capture data for alerting and issue identification and resolution using DataDog, App Insights or similar tools. Designing … applications and infrastructure for observability, security, and reliability. Networking & Security Monitor and enhance network performance, ensuring high levels of security and scalability across all cloud environments. Enforce security best practices in AKS, including network policies, RBAC (Role-Based Access Control), and integration with Azure Active Directory Core Services Azure core services such as Azure Storage, including Blob, Azure VMs, Azure More ❯
emergency events outside of your local time-zone. Here's what you need: Technical Expertise In-depth understanding of the Linux operating environment: kernel tuning, network stack tuning, system observability & instrumentation, and security & access management. Solid understanding of layer 2-7 networking fundamentals and the relationship between servers & services, and the transit of their packets through network hardware. In-depth … experience engineering and maintaining a private-cloud infrastructure: Bare-metal, vSphere, KVM, Kubernetes. Experience with tools like Ansible, Terraform, Docker, Kafka, Nexus. Experience with observability platforms: InfluxDB, Prometheus, ELK, Jaeger, Grafana, Nagios, Zabbix. Familiarity with Big Data tools: Hadoop, HDFS, Spark, HBase. Ability to write code in Go, Python, Bash, or Perl for automation. Work Experience 6-8 years of More ❯
emergency events outside of your local time-zone. Here's What You Need Technical Expertise In-depth understanding of the Linux operating environment: kernel tuning, network stack tuning, system observability & instrumentation, and security & access management. Solid understanding of layer 2-7 networking fundamentals and the relationship between servers & services, and the transit of their packets through network hardware. In-depth … experience engineering and maintaining a private-cloud infrastructure: Bare-metal, vSphere, KVM, Kubernetes. Experience with tools like Ansible, Terraform, Docker, Kafka, Nexus Experience with observability platforms: InfluxDB, Prometheus, ELK, Jaeger, Grafana, Nagios, Zabbix Familiarity with Big Data tools: Hadoop, HDFS, Spark, HBase Ability to write code in Go, Python, Bash, or Perl for automation. Work Experience 6-8 years of More ❯
tools, including experience with some of the following tools: GitLab CI, GitHub Actions, Concourse CI, Jenkins X, TeamCity, Artifactory, etc.; Infrastructure provisioning (at least one of Terraform, Ansible, CloudFormation); Observability and Application monitoring (ELK stack, TICK stack, Grafana, Prometheus, New Relic, Datadog, etc.); Networking concepts - Bastion hosts, Reverse Proxies, Load Balancing, TLS, etc. Key Soft Skills required: Naturally resilient, tenacious More ❯
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Arm Limited
infrastructure "Nice To Have" Skills and Experience: Experience in a GitOps solution such as ArgoCD, Flux or Fleet Implementation of the Security Development Lifecycle (SDL) in infrastructure Monitoring and observability using Prometheus and Grafana, ELK stack or equivalent Use of Kubernetes management systems such as Rancher Familiarity with open source project development cycles and contribution processes, particularly around CI/ More ❯
working in Agile teams using tools like Git, Jira, and Confluence Eligible for SC and NPPV3 clearance Desirable: Container orchestration with Kubernetes HashiCorp tools: Vault, Consul, Packer Monitoring and observability with Grafana, Prometheus, or similar Familiarity with cloud networking, VPCs, NAT Gateways, security groups, etc. Personal Attributes: Proactive and self-driven with a passion for technology Strong problem-solving mindset More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Amber Labs
working in Agile teams using tools like Git, Jira, and Confluence Eligible for SC and NPPV3 clearance Desirable: Container orchestration with Kubernetes HashiCorp tools: Vault, Consul, Packer Monitoring and observability with Grafana, Prometheus, or similar Familiarity with cloud networking, VPCs, NAT Gateways, security groups, etc. Personal Attributes: Proactive and self-driven with a passion for technology Strong problem-solving mindset More ❯
South East London, England, United Kingdom Hybrid / WFH Options
Amber Labs
working in Agile teams using tools like Git, Jira, and Confluence Eligible for SC and NPPV3 clearance Desirable: Container orchestration with Kubernetes HashiCorp tools: Vault, Consul, Packer Monitoring and observability with Grafana, Prometheus, or similar Familiarity with cloud networking, VPCs, NAT Gateways, security groups, etc. Personal Attributes: Proactive and self-driven with a passion for technology Strong problem-solving mindset More ❯
ARM, or Pulumi. Experience in building secure applications and infrastructure. Strong communication skills, with the ability to convey and understand complex technical concepts clearly and concisely. SRE skills including observability and telemetry monitoring. Familiarity with the HashiCorp Suite (Packer, Terraform, Vault, Vagrant, Consul). Experience in containerization using Docker, Kubernetes, OpenShift, and Helm. Programming skills in languages such as Python More ❯
and firewalls. Experience with load balancers (F5, HAProxy, Nginx) and network monitoring tools. Experience in DNS management and troubleshooting. Experience in network security best practices. Proficiency in monitoring and observability tools (Prometheus, Grafana, Splunk). Proficiency in at least one scripting language (Python, Bash) for automation. Experience with CI/CD pipeline management and DevOps practices. Strong understanding of disaster More ❯
London, England, United Kingdom Hybrid / WFH Options
BlackRock, Inc
access to the best tools available. We combine problem-solving skills with software and systems engineering to take a proactive approach in building fault-tolerant and secure systems, improving observability and zealously automating away toil. In this role you will: Use your site reliability expertise to design, operate and support Preqin's infrastructure, middleware and internal services. Improving their performance More ❯
and high availability CI/CD Pipeline Development: Develop and maintain robust CI/CD pipelines for continuous integration and deployment of ML models and related infrastructure Monitoring and Observability: Build and maintain comprehensive monitoring and alerting systems for our ML infrastructure and models, leveraging tools like DataDog to ensure system health and performance Collaboration and Mentorship: Collaborate effectively with More ❯
etc. Infrastructure as Code and CI/CD paradigms and systems such as: Ansible, Terraform, Jenkins, Bamboo, Concourse etc. Monitoring utilising products such as: Prometheus, Grafana, ELK, filebeat etc. Observability - SRE Big Data solutions (ecosystems) and technologies such as: Apache Spark and the Hadoop Ecosystem Edge technologies e.g. NGINX, HAProxy etc. Excellent knowledge of YAML or similar languages The following More ❯
Manual Tester (DV Security Clearance) Position Description CGI was recognised in the Sunday Times Best Places to Work List 2025 and has been named one of the 'World's Best Employers' by Forbes magazine. We offer a competitive salary, excellent More ❯
Consultant IRC250319 Job: IRC250319 Location: United Kingdom - London Designation: Senior Consultant Experience: 5-10 years Function: Engineering Skills: Cloud(Azure/AWS/GCP), Containers, DevOps Practices, Grafana, Kubernetes, Observability stack, SRE Management, Terraform Work Model: Hybrid We are seeking an experienced Platform Engineering leader with a hands-on engineering background, who can articulate the business benefits that Observability and … on the responsibility of handling client engagements from both technical and business perspectives. Requirements: We are ideally looking for someone with a strong background and experience in the following: Observability and SRE Practices: In-depth understanding of observability and Site Reliability Engineering practices. Familiarity with tools in the LGTM stack (Loki, Grafana, Tempo, Mimir) or equivalent observability platforms. Containerisation: Strong More ❯
and firewalls. Experience with load balancers (F5, HAProxy, Nginx) and network monitoring tools. Experience in DNS management and troubleshooting. Experience in network security best practices. Proficiency in monitoring and observability tools (Prometheus, Grafana, Splunk). Proficiency in at least one scripting language (Python, Bash) for automation. Experience with CI/CD pipeline management and DevOps practices. Strong understanding of disaster … performance. Experience in tools like df, du, lsblk, and fdisk for managing and troubleshooting file systems and disk partitions. Familiarity with tools like Prometheus and Grafana for monitoring and observability. Seniority level: Not Applicable. Employment type: Full-time. Job function: Information Technology. Industries: Computer and Network Security. More ❯
London, England, United Kingdom Hybrid / WFH Options
Mistral AI
computing and highly available distributed systems • Exposure to site reliability issues in critical environments (issue root cause analysis, in-production troubleshooting, on-call rotations...) • Experience working against reliability KPIs (observability, alerting, SLAs) • Hands-on experience with CI/CD, containerization and orchestration tools (Docker, Kubernetes...), monitoring, logging, alerting and observability tools (Prometheus, Grafana, ELK Stack, Datadog...), infrastructure-as-code tools More ❯
emergency events outside of your local time-zone. Here's what you need: Technical Expertise In-depth understanding of the Linux operating environment: kernel tuning, network stack tuning, system observability & instrumentation, and security & access management. Solid understanding of layer 2-7 networking fundamentals and the relationship between servers & services, and the transit of their packets through network hardware. In-depth … experience engineering and maintaining a private-cloud infrastructure: Bare-metal, vSphere, KVM, Kubernetes. Experience with tools like Ansible, Terraform, Docker, Kafka, Nexus Experience with observability platforms: InfluxDB, Prometheus, ELK, Jaeger, Grafana, Nagios, Zabbix Familiarity with Big Data tools: Hadoop, HDFS, Spark, HBase Ability to write code in Go, Python, Bash, or Perl for automation. Work Experience 5-7+ years More ❯
Leeds, England, United Kingdom Hybrid / WFH Options
ZipRecruiter
platform modernisation Mentor and lead a small team of engineers Align DevOps capabilities with the wider business Champion DevEx, reliability, and security Embed operational excellence and incident response Promote observability and performance optimisation Lead DevOps Engineer Requirements Proven technical and some leader/mentoring experience Cloud-native expertise (any cloud provider is fine: GCP, AWS or Azure) Knowledge of GitLab CI More ❯
Bradford, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
platform modernisation Mentor and lead a small team of engineers Align DevOps capabilities with the wider business Champion DevEx, reliability, and security Embed operational excellence and incident response Promote observability and performance optimisation Proven technical and some leader/mentoring experience Cloud-native expertise (any cloud provider is fine: GCP, AWS or Azure) Knowledge of GitLab CI/CD, Terraform More ❯