at least one programming language that compiles to machine code such as Rust, C++, or Go. Expert knowledge of monitoring technologies such as Prometheus, Grafana, and PagerDuty. Expert knowledge of deployment technologies such as Pulumi or Terraform. Expert knowledge of Kubernetes. Responsibilities: Improving our observability by adding/adjusting metrics.
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
AI Tech Suite
CI/CD tooling, IaC, and cloud-native technologies. Advanced scripting (Bash, Python) and automation experience. Skilled in monitoring and observability tools (e.g., Prometheus, Grafana, ELK). Strong problem-solving, communication, and leadership skills. Familiarity with and experience of CI/CD tools: Jenkins, GitLab CI. Infrastructure as Code: Terraform, Ansible.
such as AWS, Azure, or GCP. Manage infrastructure as code using tools like Terraform. Monitor and maintain production systems using tools such as Prometheus, Grafana, or Datadog. Collaborate with development and QA teams to improve deployment processes and system reliability. Contribute to incident response, troubleshooting, and root cause analysis. Requirements
understanding of CI/CD pipelines and tools (e.g., GitHub CI, GitLab CI, CircleCI, Jenkins). Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack). Strong scripting skills (e.g., Bash, Python) for automation tasks. Excellent problem-solving skills and attention to detail. Strong communication and collaboration skills.
AWS Lambda, Google Cloud Functions, Azure Functions). Containerisation technologies (e.g. Docker, Kubernetes, OpenShift). Tools for logging, monitoring, alerting and observability (e.g. ELK, Splunk, Prometheus, Grafana). Working knowledge of operating systems, including CLI experience and deploying and configuring application or web servers. We are currently operating a discretionary hybrid working model which
end-to-end delivery of solutions. Expert knowledge of SRE fundamentals and a commitment to best practice. Fluency with common observability tooling like Prometheus, Grafana, OpenTelemetry (OTel) and CloudWatch. Experience analysing and building data telemetry, querying (PromQL), modelling, pipelines and dashboards to provide concise, focused insights and alerts for distributed systems
NSGs, ASGs), and governance policies to ensure compliance and risk mitigation. Monitoring & Logging: Experience with Azure Monitor, Application Insights, Log Analytics, and Prometheus/Grafana for observability and performance monitoring. Scripting & Automation: Strong scripting skills in PowerShell, Bash, and Python, along with automation frameworks like Ansible. Collaboration & Problem-Solving
Tracking - e.g. JIRA, Confluence. Monitoring, Logging, and Performance Tuning - Skills in monitoring system performance and logs to ensure uptime and identify performance bottlenecks - e.g. Grafana, Datadog. Networking Concepts - Knowledge of TCP/IP, DNS, VPN, load balancing, and firewalls. Security Best Practices - Implementing security in DevOps (e.g., IAM policies, network
Implement comprehensive monitoring, logging, and alerting systems to proactively identify and address performance issues, errors, and security threats. Use tools like Azure Monitor, Prometheus, Grafana, or similar to collect and analyse metrics, logs, and traces. Configure alerts and notifications to ensure timely responses to critical events. Security & Compliance: Implement security
or other build tools; Ansible or other IT automation/software provisioning tools; JIRA, Confluence;
* Experience in monitoring/reporting tools such as Splunk, Grafana/Prometheus, etc.
* Experience in Agile practices
* Working knowledge of environment monitoring tools such as GCO, New Relic, Prometheus, Grafana.
* Collaboration Skills: Proactive can-do attitude
using Infrastructure as Code for configuration management and code implementation - Terraform etc. Experience setting up and using monitoring and alerting tools such as Dynatrace, Grafana, CloudWatch etc. Experience using configuration management tools like Puppet, Ansible, Packer, Chef. Experience with various testing tools - Selenium, Cucumber etc. Experience in scripting - bash/
/CD pipelines using the likes of GitLab, Jenkins, CircleCI, CodeBuild etc. Familiarity with scripting (Bash or Python). Monitoring and alerting tools - Prometheus, Grafana or Splunk, ELK. We're looking for someone who wants to progress their career into the DevOps arena. Submit your CV now to be considered.