Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
Required Skills and Experience: Proven experience in cloud infrastructure automation. Deep knowledge of cloud platforms (AWS, Azure, GCP), containerization (Docker, Kubernetes, Rancher), automation tools (Terraform, Ansible), and monitoring solutions (Prometheus, Grafana). Strong scripting and programming skills in Bash, Python, and Go. Experience in DevOps, SRE, or Platform Engineering roles with a focus on hybrid infrastructure. Familiarity with Agile methodologies …
processing and ETL tools like Apache Kafka, Spark, or Hadoop. Familiarity with containerization and orchestration tools such as Docker and Kubernetes. Experience with monitoring and alerting tools such as Prometheus, Grafana, or ELK for data infrastructure. Understanding of ML algorithms, their development and implementation. Confidence developing end-to-end solutions. Experience with infrastructure as code, e.g. Terraform, Ansible. If you …
CI, or similar. Manage cloud infrastructure (OCI, AWS, Azure, or GCP) using Infrastructure as Code tools like Terraform or Serverless Functions. Monitor system health and performance using tools like Prometheus, Grafana, Datadog, or New Relic. Collaborate closely with development teams to automate builds, performance tests, and deployments. Ensure system security, compliance, and best practices are followed in deployment pipelines. Ensure …
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
teams to: Modernise our infrastructure by leading the migration from Docker Swarm to Kubernetes. Design and operate CI/CD pipelines using CloudBees and GitLab. Build out observability with Prometheus, Grafana, OpenTelemetry, and Dynatrace. Automate cloud deployments (AWS-first) using Terraform and platform tooling. Improve security posture across IAM, secrets, and networking. Help the team ship faster and safer by …
Cambourne, Cambridgeshire, United Kingdom Hybrid / WFH Options
Remotestar
building and maintaining highly reliable infrastructure and services. Expertise in incident management, including incident response, resolution, and post-mortem analysis. Proficiency in monitoring, alerting, and observability tools such as Prometheus, Grafana, the ELK stack, or Datadog. Experience with cloud platforms such as AWS, Azure, or GCP, including infrastructure as code tools like Terraform or CloudFormation. Strong scripting and automation skills, with …
setting up and managing monitoring, metrics, and alerting systems. Experience operating production-grade services at scale. Great to have: Experience with tools such as Terraform, SaltStack, MongoDB, Elasticsearch, Kafka, Prometheus, Grafana, or HashiCorp Vault. Experience with securing applications, services, and data, including authentication, authorization, TLS, and encryption. Exposure to Kubernetes (administering, deploying, or developing apps on K8s clusters). Understanding of …
in their Hertfordshire office. In this role, you'll take ownership of the end-to-end monitoring and alerting stack, designing and maintaining infrastructure and alert configurations (e.g., with Prometheus/Grafana or equivalent), and building dashboards that clearly communicate metrics to business stakeholders. You'll drive system automation and integration, crafting scripts and workflows, primarily in Python, to onboard …
Watford, Hertfordshire, South East, United Kingdom
La Fosse
in their Hertfordshire office. In this role, you'll take ownership of the end-to-end monitoring and alerting stack, designing and maintaining infrastructure and alert configurations (e.g., with Prometheus/Grafana or equivalent), and building dashboards that clearly communicate metrics to business stakeholders. You'll drive system automation and integration, crafting scripts and workflows, primarily in Python, to …
and service flow mappings aligned with application performance needs. Develop and optimize Dynatrace Query Language (DQL) queries for actionable insights. Support observability design and migration from tools such as Prometheus, Grafana, and AWS CloudWatch to Dynatrace. Advise on RBAC models, data access strategies, and security best practices for multi-team environments. Design monitoring strategies for Kubernetes workloads in hybrid cloud … Ability to design noise-reducing alert strategies using Dynatrace AI (Davis). Strong communication skills to engage stakeholders and understand business and technical requirements. Nice to Have: Exposure to Prometheus, Grafana, and AWS CloudWatch monitoring tools. Familiarity with Terraform, GitLab, or other Infrastructure-as-Code frameworks. Background in SRE or platform engineering. Integration experience with collaboration tools such as Slack …
alerts, and service flow mappings aligned to engineering needs. Help teams craft complex DQL queries to extract meaningful insights from telemetry data. Support observability design and migration efforts from Prometheus and CloudWatch to Dynatrace. Advise on RBAC models and data access strategies based on team structure and security requirements. Assist in monitoring strategy for Kubernetes-based workloads, especially in hybrid … alert detection and alert correlation using Davis AI to create alerting configurations that reduce noise and produce high-precision alerts. Dynatrace Observability Monitoring AWS Additional Skills & Qualifications: Exposure to Prometheus and AWS CloudWatch monitoring tools. Experience with Terraform, GitLab, or similar DevOps tooling. Background in Site Reliability Engineering (SRE) or platform engineering is a plus. Integration experience with tools like Slack …
Infrastructure Specialist Posting Date: 17 Jul 2025 Function: Cyber Security Unit: Business Location: Ipswich (4405), Ipswich, United Kingdom Salary: Competitive + 5k …