at least one programming language that compiles to machine code such as Rust, C++, or Go. Expert knowledge of monitoring technologies such as Prometheus, Grafana, and PagerDuty. Expert knowledge of deployment technologies such as Pulumi or Terraform. Expert knowledge of Kubernetes. Responsibilities: Improving our observability by adding/adjusting metrics. More ❯
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
AI Tech Suite
at least one programming language that compiles to machine code such as Rust, C++, or Go. Expert knowledge of monitoring technologies such as Prometheus, Grafana, and PagerDuty. Expert knowledge of deployment technologies such as Pulumi or Terraform. Expert knowledge of Kubernetes. Responsibilities: Improving our observability by adding/adjusting metrics. More ❯
or PowerShell for automation. Understanding of AWS networking concepts, including VPCs, subnets, and security groups. Experience with monitoring and logging solutions such as Prometheus, Grafana, ELK Stack, or AWS CloudWatch. Familiarity with Zero Trust security models and best practices for securing cloud workloads. Ability to troubleshoot complex infrastructure issues and More ❯
london, south east england, United Kingdom Hybrid / WFH Options
LHH
or PowerShell for automation. Understanding of AWS networking concepts, including VPCs, subnets, and security groups. Experience with monitoring and logging solutions such as Prometheus, Grafana, ELK Stack, or AWS CloudWatch. Familiarity with Zero Trust security models and best practices for securing cloud workloads. Ability to troubleshoot complex infrastructure issues and More ❯
end to end delivery of solutions. Expert knowledge of SRE fundamentals and a commitment to best practice Fluency with common observability tooling like Prometheus, Grafana, OTEL and Cloudwatch Experience analysing and building data telemetry, querying (PromQL), modelling, pipelines and dashboards to provide concise, focused insights and alerts for distributed systems More ❯
NSGs, ASGs), and governance policies to ensure compliance and risk mitigation. Monitoring & Logging : Experience with Azure Monitor, Application Insights, Log Analytics, and Prometheus/Grafana for observability and performance monitoring. Scripting & Automation : Strong scripting skills in PowerShell, Bash, and Python , along with automation frameworks like Ansible . Collaboration & Problem-Solving More ❯
Tracking - e.g. JIRA, Confluence Monitoring, Logging, and Performance Tuning - Skills in monitoring systems' performance and logs to ensure uptime and identify performance bottlenecks - e.g. Grafana, Datadog Networking Concepts - Knowledge in TCP/IP, DNS, VPN, load balancing, and firewalls Security Best Practices - Implementing security in DevOps (e.g., IAM policies, network More ❯
Greater Bristol Area, United Kingdom Hybrid / WFH Options
Searchability NS&D
CI/CD) and automation tools like Terraform and Ansible Programming : Proficiency in Python, Go, or Ruby Monitoring & Observability : Hands-on experience with Prometheus, Grafana, ELK Stack, or similar technologies Core Attributes A passion for solving complex technical challenges in high-availability production environments Strong communication and collaboration skills, able More ❯
and Orchestration: Knowledge of containerization and orchestration technologies (e.g., Docker, Kubernetes) Monitoring and Logging: Experience with monitoring and logging tools like DataDog, Prometheus, or Grafana Data Engineering Skills: Knowledge of event streaming platforms (e.g., Apache Kafka) and SQL database management Strong Communication and Collaboration: Excellent communication skills and the ability More ❯
languages like Python, Bash, or Shell. Experience with Docker and Kubernetes for container orchestration and management. Familiarity with monitoring tools like CloudWatch, Prometheus, or Grafana, and best practices in security for cloud environments. Strong communication skills and experience working with development, operations, and security teams. Preferred Qualifications AWS Certified DevOps More ❯
with TDD, BDD, and automated testing frameworks (PyTest, Selenium). Familiarity with security best practices in software development. Knowledge of observability tools like Prometheus, Grafana, and ELK stack. More ❯
container orchestration tools such as Docker and Kubernetes Observability champion, experience in designing and building monitoring and logging tools such as CloudWatch, ELK, and Grafana Strong scripting skills in Bash, JavaScript or similar Knowledge of SecDevOps security best practices and experience implementing security controls in a cloud environment including SIEM More ❯
containerization for applications and their subsequent orchestration within Kubernetes environments. Experience working on at least one monitoring/observability stack (Datadog, ELK, Splunk, Loki, Grafana). Strong knowledge of Unix or Linux Strong communication skills to collaborate with various stakeholders Able to work independently in a fast-paced environment Detail More ❯
london, south east england, united kingdom Hybrid / WFH Options
Digital Skills ltd
container orchestration tools such as Docker and Kubernetes Observability champion, experience in designing and building monitoring and logging tools such as CloudWatch, ELK, and Grafana Strong scripting skills in Bash, JavaScript or similar Knowledge of SecDevOps security best practices and experience implementing security controls in a cloud environment including SIEM More ❯
management tools (e.g., Ansible, Puppet, Chef), containerization and orchestration (e.g., Docker, Kubernetes) Experience with Cloud services (e.g., AWS, Azure, GCP), Monitoring tools (e.g., Prometheus, Grafana). Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation) About the Team J.P. Morgan is a global leader in financial services, providing strategic advice and More ❯
using Infrastructure as Code (IaC) tools such as Terraform (preferred), Ansible, or CloudFormation. Experience with monitoring, observability and logging tools such as DataDog, Prometheus, Grafana, or similar. Proven track record of maintaining highly-available and performant production environments. Ability to identify and implement effective mitigation strategies and operational playbooks. Useful More ❯
understanding of modern infrastructure and site reliability engineering practice, including Infrastructure-as-code tools (e.g. Terraform, Ansible ) and metrics and observability tools (e.g. Prometheus, Grafana ). Strong understanding of modern DevOps practice, including DevOps stacks (e.g. Jenkins, GitLab, CircleCI ). Cloud experience (e.g. AWS, Google Cloud, Azure, Kubernetes). Familiar More ❯
Experience with DevOps principles and concepts such as Infrastructure as Code (Ansible/Saltstack), CI/CD (Gitlab, Jenkins, Git), monitoring and visualization (Prometheus, Grafana) Experience with big data technologies such as NoSQL/RDBMS, Redis, ElasticSearch, Kafka Experience with containers and container management (Docker, Kubernetes) Experience analysing and building More ❯
any cloud infrastructure principles (like AWS, GCP or OCI), understanding basic principles of secure software delivery is a plus Familiar with Observability tools like Grafana or Prometheus, understanding the importance of giving the correct visibility to our platforms and environments We highly value ownership and initiative with capabilities to drive More ❯
as code (IaC) tools such as Terraform, Ansible, or Chef for automation and configuration management. Strong understanding of monitoring and observability tools like Prometheus, Grafana, Azure App Insights for proactive system monitoring and troubleshooting. Knowledge of networking, security principles, and best practices in a cloud environment. Demonstrated experience of CI More ❯
data throughput in Java and C++. We use Airflow for workflow management, Kafka for data pipelines, Bitbucket for source control, Jenkins for continuous integration, Grafana + Prometheus for metrics collection, ELK for log shipping and monitoring, Docker and Kubernetes for containerisation, OpenStack for our private cloud, Ansible and Terraform for More ❯
agile and fast-paced development environments. Excellent analytical, problem-solving, and communication skills. Desirable: Experience with container orchestration and monitoring tools (e.g., Kubernetes, Prometheus, Grafana). Exposure to cloud platforms (AWS, Azure, GCP) in a QA/testing capacity. Knowledge of static code analysis tools and vulnerability scanners (e.g., SonarQube More ❯
by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, Cloudwatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise (EKS, SQS, RDS More ❯