Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
Required Skills and Experience: Proven experience in cloud infrastructure automation. Deep knowledge of cloud platforms (AWS, Azure, GCP), containerization (Docker, Kubernetes, Rancher), automation tools (Terraform, Ansible), and monitoring solutions (Prometheus, Grafana). Strong scripting and programming skills in Bash, Python, and Go. Experience in DevOps, SRE, or Platform Engineering roles with a focus on hybrid infrastructure. Familiarity with Agile methodologies More ❯
CI, or similar. Manage cloud infrastructure (OCI, AWS, Azure, or GCP) using Infrastructure as Code tools like Terraform or Serverless Functions. Monitor system health and performance using tools like Prometheus, Grafana, or Datadog or NewRelic. Collaborate closely with development teams to automate builds, performance tests, and deployments. Ensure system security, compliance, and best practices are followed in deployment pipelines. Ensure More ❯
Cambourne, Cambridgeshire, United Kingdom Hybrid / WFH Options
Remotestar
building and maintaining highly reliable infrastructure and services. Expertise in incident management, including incident response, resolution, and post-mortem analysis. Proficiency in monitoring, alerting, and observability tools such as Prometheus, Grafana, ELK stack or Datadog. Experience with cloud platforms such as AWS, Azure, or GCP, including infrastructure as code tools like Terraform or CloudFormation. Strong scripting and automation skills, with More ❯
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
or other cloud providers (e.g. GCP, Azure). Strong understanding of key security technologies and protocols such as TLS, OAuth and SPIFFE. Observability, alerting, metrics collection and visualisation (e.g. Prometheus, Grafana, Elasticsearch, Dynatrace). "Nice To Have" Skills and Experience: We would be even more impressed if you are passionate about the following: Cluster processes performance tuning and optimisation. Migrating More ❯
setting up and managing monitoring, metrics, and alerting systems Experience operating production-grade services at scale Great to have: Experience with tools such as: Terraform, SaltStack, MongoDB, Elasticsearch, Kafka, Prometheus, Grafana or HashiCorp Vault Experience with securing applications, services, and data, including authentication, authorization, TLS, and encryption Exposure to Kubernetes (administering, deploying, or developing apps on K8s clusters) Understanding of More ❯
setting up and managing monitoring, metrics, and alerting systems Experience operating production-grade services at scale Great to have: Experience with tools such as: Terraform, SaltStack, MongoDB, Elasticsearch, Kafka, Prometheus, Grafana or HashiCorp Vault Experience with securing applications, services, and data, including authentication, authorization, TLS, and encryption Exposure to Kubernetes (administering, deploying, or developing apps on K8s clusters) Understanding of More ❯
in their Hertfordshire office. In this role, you'll take ownership of the end-to-end monitoring and alerting stack, designing and maintaining infrastructure and alert configurations (e.g., with Prometheus/Grafana or equivalent), and building dashboards that clearly communicate metrics to business stakeholders. You'll drive system automation and integration, crafting scripts and workflows-primarily in Python-to onboard More ❯
Watford, Hertfordshire, South East, United Kingdom
La Fosse
in their Hertfordshire office. In the role you'll take ownership of the end-to-end monitoring and alerting stack, designing and maintaining infrastructure and alert configurations (e.g., with Prometheus/Grafana or equivalent), and building dashboards that clearly communicate metrics to business stakeholders You'll drive system automation and integration, crafting scripts and work flows-primarily in Python—to More ❯