London, England, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment
service level goals You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration … such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless postmortems, including leading cross-functional teams through …
including vulnerability management and compliance. Collaborate with development and operations teams to improve system performance and scalability. Maintain and improve logging, monitoring, and alerting systems using tools like Prometheus, Grafana, ELK Stack, or Datadog Support and optimize infrastructure for both Linux and Windows-based environments. Participate in incident management, problem resolution, and root cause analysis. Ensure documentation of infrastructure, processes …
Southampton, Hampshire, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment
service level goals You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration … such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless postmortems, including leading cross-functional teams through …
Portsmouth, England, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment
service level goals You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration … such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless postmortems, including leading cross-functional teams through …
Hampshire, England, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment
service level goals You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration … such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless postmortems, including leading cross-functional teams through …
London, England, United Kingdom Hybrid / WFH Options
ZipRecruiter
service level goals You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration … such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless postmortems, including leading cross-functional teams through …
London, England, United Kingdom Hybrid / WFH Options
Tes
environment. Security Best Practices: Strong understanding of security frameworks and compliance standards for cloud infrastructure and DevOps processes. Monitoring & Observability: Understanding of monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK) to ensure system performance and issue tracking. Skills CI/CD Tools: Hands-on experience with Jenkins, GitLab CI/CD, Travis CI, or similar tools for building CI …
Grays, England, United Kingdom Hybrid / WFH Options
TES
environment. Security Best Practices: Strong understanding of security frameworks and compliance standards for cloud infrastructure and DevOps processes. Monitoring & Observability: Understanding of monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK) to ensure system performance and issue tracking. Skills CI/CD Tools: Hands-on experience with Jenkins, GitLab CI/CD, Travis CI, or similar tools for building CI …
automating data processing tasks. Experience with CI/CD tools (GitHub Actions, Jenkins, AWS CodePipeline), and integrating data-centric workflows. Familiarity with monitoring and logging tools (e.g., Prometheus, Loki, Grafana) in application and data-intensive environments. Proficiency in Configuration Management tools (Chef, Puppet, Ansible) and data orchestration tools (e.g., Airflow, Prefect). Strong background in containerization using Docker and orchestration …
London, England, United Kingdom Hybrid / WFH Options
ZigZag Global
scripting and automation using languages like PowerShell, Bash, or Python. Hands-on experience with CI/CD tools like Azure DevOps, GitHub Actions or GitLab CI. Practical experience with Grafana, Prometheus and/or other monitoring tools. Solid understanding of networking, security, and compliance principles. Excellent problem-solving and troubleshooting skills. Strong communication and collaboration skills, with the ability to …
Use Terraform, AWS CDK, or CloudFormation to automate cloud resource provisioning, enabling consistent and repeatable infrastructure deployments. Monitoring & Observability: Implement monitoring, logging, and alerting solutions using tools like Prometheus, Grafana, Loki, Datadog, or CloudWatch to ensure system health and performance. Security & Compliance: Implement security best practices for cloud infrastructure, including IAM policies, security groups, and VPC configurations, to ensure compliance … Go. Experience with CI/CD tools such as Jenkins, GitLab CI, or AWS CodePipeline for automated deployment and testing. Familiarity with monitoring and logging tools such as Prometheus, Grafana, Loki, or Datadog. Strong understanding of cloud security best practices and IAM management. Excellent problem-solving and troubleshooting skills with the ability to resolve complex infrastructure and application issues. Strong …
Actions) Work with cloud platforms such as AWS, Azure, or GCP Manage infrastructure as code using tools like Terraform Monitor and maintain production systems using tools such as Prometheus, Grafana, or Datadog Collaborate with development and QA teams to improve deployment processes and system reliability Contribute to incident response, troubleshooting, and root cause analysis Requirements Approximately 18 months of experience …
London (City of London), South East England, United Kingdom
Sparta Global
Actions) Work with cloud platforms such as AWS, Azure, or GCP Manage infrastructure as code using tools like Terraform Monitor and maintain production systems using tools such as Prometheus, Grafana, or Datadog Collaborate with development and QA teams to improve deployment processes and system reliability Contribute to incident response, troubleshooting, and root cause analysis Requirements Approximately 18 months of experience …
Experience in migrating monolithic applications into microservices architectures. In-depth Linux/Unix experience, emphasizing system performance tuning and automation. Familiarity with monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Loki, OTel, ELK stack) to ensure system reliability and performance. Experience in developing and working with backend application technologies (e.g. Express, Django). Benefits we offer: 23 days’ holiday + …
London, England, United Kingdom Hybrid / WFH Options
Quaisr Limited
or HashiCorp Nomad. Excellent problem-solving, communication, and collaboration skills. Nice to have: Experience managing distributed systems, microservices, and event-driven architectures. Knowledge of observability tools such as Prometheus, Grafana, ELK Stack, or Datadog. Experience with security best practices, monitoring, and incident response. Familiarity with DevSecOps and compliance frameworks (ISO 27001, SOC 2, GDPR). Exposure to big data processing …
London, England, United Kingdom Hybrid / WFH Options
Global Screening Services
Take strategic direction and own end-to-end delivery of solutions. Expert knowledge of SRE fundamentals and a commitment to best practice Fluency with common observability tooling like Prometheus, Grafana, OTel and CloudWatch Experience analysing and building data telemetry, querying (PromQL), modelling, pipelines and dashboards to provide concise, focused insights and alerts for distributed systems Strong experience with Python and …
practices, RBAC, IAM, networking security (NSGs, ASGs), and governance policies to ensure compliance and risk mitigation. Monitoring & Logging: Experience with Azure Monitor, Application Insights, Log Analytics, and Prometheus/Grafana for observability and performance monitoring. Scripting & Automation: Strong scripting skills in PowerShell, Bash, and Python, along with automation frameworks like Ansible. Collaboration & Problem-Solving: Ability to work closely with development …
London, England, United Kingdom Hybrid / WFH Options
ZigZag Global
scripting and automation using languages like PowerShell, Bash, or Python Hands-on experience with CI/CD tools like Azure DevOps, GitHub Actions or GitLab CI Practical experience with Grafana, Prometheus and/or other monitoring tools Solid understanding of networking, security, and compliance principles Excellent problem-solving and troubleshooting skills Strong communication and collaboration skills, with the ability to …
AWS, Azure, GCP). Implement and manage containerization and orchestration platforms (Docker, Kubernetes, ECS, etc.). Monitor system performance, availability, and security using monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack, Splunk, Datadog). Troubleshoot and resolve infrastructure and application issues in development, test, and production environments. Collaborate with development teams to ensure smooth code deployments and environment consistency. … experience with cloud platforms (AWS, Azure, or GCP). Knowledge of containerization and orchestration tools (Docker, Kubernetes, Helm). Familiarity with monitoring/logging tools like ELK Stack, Prometheus, Grafana, or Splunk. Experience with Infrastructure as Code (Terraform, Ansible, or similar). Strong understanding of networking, firewalls, DNS, and load balancing. Preferred Qualifications: Certifications such as AWS Certified DevOps Engineer …