Newcastle upon Tyne, England, United Kingdom Hybrid / WFH Options
BBC Group and Public Services
to provision and manage cloud environments. Build and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, CodeBuild, and Jenkins. Integrate monitoring and observability tools such as AWS CloudWatch, Prometheus, and Grafana for infrastructure and model health tracking. Ensure software quality through Test-Driven Development (TDD), unit testing frameworks (e.g., pytest, unittest), and automated integration tests. Conduct regular code reviews, participate …
Role: DevOps Engineer. Location: Slough, Berkshire, UK. Salary: £38,000 - £40,000. IT Global Consulting Limited is seeking an experienced and highly motivated DevOps Engineer responsible for development, maintenance, and ongoing support of the platform. The role involves acting as the internal …
IT Global Consulting Limited is seeking an experienced and highly motivated DevOps Engineer responsible for platform development, maintenance, and support, acting as the internal focal point for an enterprise platform. The ideal candidate should be self-motivated, ambitious, and eager …
Site Reliability Engineer (Prometheus and Grafana) (15797). London, England. About the Role: Join a global team of engineers, operators, and Agile practitioners responsible for building and operating a world-class Data Loss Prevention (DLP) infrastructure. This role is within the Cybersecurity organization, focusing on enhancing observability and telemetry across the DLP stack to support a cloud-first strategy while maintaining … an exciting opportunity for engineers with strong SRE and monitoring experience, and also a great entry point for professionals looking to transition into cybersecurity. Key Responsibilities: Design and maintain Prometheus metrics collection and PromQL queries. Build, review, and optimize Grafana and Splunk dashboards using observability best practices (e.g., Four Golden Signals, RED methodology). Refine alerting rules across tools like PagerDuty … Prometheus, and Splunk to eliminate noise and identify gaps. Work closely with engineering squads to implement and maintain SLOs/SLIs and error budgets. Operate Prometheus in agent mode and troubleshoot issues. Use telemetry data to generate actionable insights for the DLP teams. Drive continuous improvement of monitoring and observability systems. Participate in a 24/7 on-call support …
have: Experience with other database platforms such as ClickHouse, FoundationDB, MSSQL, MongoDB, Redis, and Neo4j. Experience with configuration management tools, e.g. Chef, Terraform, Ansible. Experience with an observability and monitoring stack such as Prometheus exporters, Logstash, Elasticsearch, Prometheus, Thanos, Grafana, and Alertmanager. Experience with CI/CD pipelines. Experience working with various cloud providers (AWS and GCP). Experience with virtualization technologies such as …
and software brokers across cloud and on-prem platforms. Responding to production incidents and working on root cause analysis and long-term fixes. Monitoring system health and performance with Prometheus, Grafana, and custom dashboards. Optimising Solace across WAN environments for secure, low-latency message delivery. Partnering with development and support teams to troubleshoot integration and message flow issues. Driving capacity … What We’re Looking For: 3+ years’ hands-on experience with Solace PubSub+ in a production environment. Strong knowledge of WAN-based distributed systems and networking fundamentals. Experience with Prometheus and Grafana for observability and alerting. Confident in Linux/Unix systems and scripting (Bash, Python, etc.). Excellent problem-solving instincts and attention to detail. Strong communicator who works well …
high system availability, enabling rapid delivery through CI/CD, and supporting development teams with robust infrastructure and tooling. A key part of the role is proactive monitoring using Prometheus, Grafana, and Splunk, as well as participating in on-call rotations to respond to live incidents. Collaboration across engineering, security, and product teams is essential to build scalable and resilient … cause analysis and preventive measures. 3. Handle change requests, track recurring issues, and work on long-term fixes to improve system stability. 4. Implement and maintain observability solutions using Prometheus, Grafana, and Splunk. 5. Write PromQL queries for custom monitoring dashboards, alerting, and diagnostics. 6. Manage and optimize CI/CD pipelines for automated testing, deployment, and rollback strategies. 7. … Engineer level 2. Incident, change & problem management experience. This role is heavily operations-oriented, including on-call requirements. 3. Strong background in setup & operation of enterprise observability tooling, specifically Prometheus, Grafana and Splunk, including usage of PromQL. 4. Proficient in one or more of Python, Go, Bash, and SQL. 5. Familiar with GitHub/GitOps/container orchestration/Kubernetes …
Experience with load balancers (F5, HAProxy, Nginx) and network monitoring tools. Experience in DNS management and troubleshooting. Experience in network security best practices. Proficiency in monitoring and observability tools (Prometheus, Grafana, Splunk). Proficiency in at least one scripting language (Python, Bash) for automation. Experience with CI/CD pipeline management and DevOps practices. Strong understanding of disaster recovery and … dstat for monitoring storage and system performance. Experience with tools like df, du, lsblk, and fdisk for managing and troubleshooting file systems and disk partitions. Familiarity with tools like Prometheus and Grafana for monitoring and observability.
workloads. Implement and maintain DevOps tooling (Terraform, Ansible, GitLab CI/CD, Jenkins). Lead PoCs for new storage technologies and present results to technical leadership. Support observability via Grafana, Prometheus, Splunk, and related platforms. Contribute to containerization efforts with Docker and Kubernetes (preferred). What We’re Looking For: 8+ years of experience in storage systems administration and infrastructure/platform … Linux performance tuning, particularly in HPC or ML/AI contexts. Programming/scripting experience in Python, Golang, or similar languages. Familiarity with modern observability and monitoring tools (Grafana, Prometheus, Splunk). Experience supporting AI/ML modelling environments is highly desirable. Knowledge of container and orchestration technologies (Docker, Kubernetes) is a plus. Proactive, collaborative, and passionate about building world-class …
knowledge of Terraform, Ansible or similar tools. Scripting: Skilled in Bash, PowerShell, or equivalent. CI/CD & GitOps: Experience with Azure DevOps, GitHub Actions, ArgoCD, Flux. Monitoring: Exposure to Prometheus, Grafana, and alerting systems. Development Ecosystem: Familiarity with .NET, Node.js, React, and Nginx a bonus. Certifications like CKA or Terraform Associate are welcomed but not essential. Key Responsibilities: Lead the … tools into the platform. Support scalable, resilient cloud environments with modern DevOps practices. Promote GitOps deployment strategies and mentor peers in DevOps best practice. Enhance observability using tools like Prometheus and Grafana. This role is ideal for someone looking to take the next step in a DevOps career while working with a modern tech stack in a supportive, growth-focused …
Experience with load balancers (F5, HAProxy, Nginx) and network monitoring tools. Experience in DNS management and troubleshooting. Experience in network security best practices. Proficiency in monitoring and observability tools (Prometheus, Grafana, Splunk). Proficiency in at least one scripting language (Python, Bash) for automation. Experience with CI/CD pipeline management and DevOps practices. Strong understanding of disaster recovery and … dstat for monitoring storage and system performance. Experience with tools like df, du, lsblk, and fdisk for managing and troubleshooting file systems and disk partitions. Familiarity with tools like Prometheus and Grafana for monitoring and observability. Seniority level: Not Applicable. Employment type: Full-time. Job function: Information Technology. Industries: Computer and Network Security.
networking and security principles. Responsibilities: Manage and scale Kubernetes clusters for infrastructure. Develop infrastructure automation using Python and tools like Ansible or Terraform. Monitor infrastructure health and performance using Prometheus, Grafana, and logs. Maintain and integrate the identity provider system with OAuth/OIDC/LDAP. Deploy and maintain Ceph clusters for distributed, fault-tolerant storage. Monitor and manage user identities … Experience: General programming and debugging skills (irrespective of programming language). 3+ years in DevOps, SRE, or system infrastructure roles. Strong experience with Kubernetes and container orchestration. Familiarity with Grafana, Prometheus, and ArgoCD. Knowledge of infrastructure-as-code tools: Terraform, Ansible, or Helm. Knowledge of virtualisation and containerisation: clustering, storage configuration, and virtualisation management. Understanding of networking fundamentals: firewalls, VPNs, NAT, VLANs …
Experience with load balancers (F5, HAProxy, Nginx) and network monitoring tools. Experience in DNS management and troubleshooting. Experience in network security best practices. Proficiency in monitoring and observability tools (Prometheus, Grafana, Splunk). Proficiency in at least one scripting language (Python, Bash) for automation. Experience with CI/CD pipeline management and DevOps practices. Strong understanding of disaster recovery and … dstat for monitoring storage and system performance. Experience with tools like df, du, lsblk, and fdisk for managing and troubleshooting file systems and disk partitions. Familiarity with tools like Prometheus and Grafana for monitoring and observability. We back our colleagues and their loved ones with benefits and programs that support their holistic well-being. That means we prioritize their physical …
management of cloud network infrastructure. SRE and Monitoring Tools: Familiarity with SRE tools and principles, such as setting up monitoring, logging, and alerting for network components using tools like Prometheus, Grafana, AWS CloudWatch, or Azure Monitor. Security Best Practices: Strong understanding of cloud-native security, including encryption, identity and access management (IAM), network security groups, and compliance with security standards … in software-defined networking (SDN). Experience implementing Site Reliability Engineering (SRE) practices, including the use of monitoring, logging, and alerting tools to ensure network reliability and uptime (e.g., Prometheus, Grafana, AWS CloudWatch). Proven experience in connecting on-premises networks to cloud environments via technologies like Direct Connect, ExpressRoute, or VPNs for hybrid cloud solutions. Relevant certifications such as …
engineering experience in performance-critical environments. Proficiency in Python and Bash scripting, with hands-on Ansible experience. Solid networking fundamentals: IP addressing, VLANs, etc. Familiarity with observability tools like Prometheus, Grafana, and ELK. Infrastructure-as-code experience with Terraform and CI/CD pipelines. Proven ability to resolve complex system-level issues and performance challenges. Knowledge of container orchestration tools …
Birmingham, England, United Kingdom Hybrid / WFH Options
Digital Gurus
agile delivery. Tech Stack: AWS (EC2, ECS, RDS, S3, IAM, CloudWatch, VPC, etc.); Terraform, Vagrant for infrastructure provisioning; Jenkins, Git, Jira, Confluence, ServiceNow; Linux (Amazon Linux 2023), Docker; Monitoring: Prometheus, Grafana. This is an exciting opportunity to be part of a fast-moving team delivering critical infrastructure support to a high-profile programme. You’ll work with both cloud and …
WAF, ALB, ELB, Network ACLs, Security Groups, KMS, S3, and other relevant services. Experience with code and security analysis tools like Black Duck, Checkmarx, SonarQube. Application and infrastructure monitoring using Prometheus and Grafana. Log management using ELK stack, Docker, Kubernetes, and Rancher. Ability to work with Subject Matter Experts to ensure the service meets user needs.
Edinburgh, Scotland, United Kingdom Hybrid / WFH Options
JR United Kingdom
or CloudFormation. Hands-on experience with CI/CD pipelines and automation tooling. Background in containerisation and orchestration, e.g., Docker, Kubernetes. Familiarity with monitoring, alerting, and observability tools (e.g., Prometheus, Grafana, CloudWatch). Proven ability to troubleshoot and resolve complex infrastructure issues. Experience working in cross-functional engineering teams, ideally in a DevOps or SRE capacity. Strong scripting skills (e.g. …
Cambridge, Landbeach, Cambridgeshire, United Kingdom
Polytec Personnel Ltd
Puppet to implement Infrastructure as Code. Experience of using static code analysis tools, such as Black Duck. Able to use and manage other monitoring tools, such as Nagios, SolarWinds, Grafana, Prometheus, etc. Experience of resolving complex issues using your debugging skills. Strong communication skills, including the ability to explain technical concepts to non-technical colleagues. Able to listen and take advice …