that address complex business requirements and drive decision-making. Your skills and experience: Proficiency with AWS Tools: Demonstrable experience using AWS Glue, AWS Lambda, Amazon Kinesis, Amazon EMR, Amazon Athena, Amazon DynamoDB, Amazon CloudWatch, Amazon SNS and AWS Step Functions. Programming Skills: Strong …
San Diego, California, United States Hybrid / WFH Options
SAIC
Hat Certified OpenShift Administrator, Azure Administrator, Red Hat OpenShift (EX280), or Certified Kubernetes Administrator (CKA). Familiarity with observability tools like Prometheus, Zabbix, Grafana, Amazon CloudWatch, or Azure Monitor. Target salary range: $200,001 - $240,000. The estimate displayed represents the typical salary range for this position based …
Southampton, Hampshire, United Kingdom Hybrid / WFH Options
NICE
or CircleCI. Strong knowledge of containerization technologies (e.g., Docker, Kubernetes) and microservices architecture. Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack, CloudWatch). Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems. Experience of incident management and blameless postmortems that includes …
North Yorkshire, Yorkshire and the Humber, United Kingdom Hybrid / WFH Options
Queen Square Recruitment
of DevSecOps best practices and compliance standards (e.g., ISO 27001, GDPR, NIST). Proficiency in monitoring tools and centralized logging (ELK, Prometheus, Grafana, AWS CloudWatch). Strong scripting skills (e.g., Python, Bash) for automation and tool integration. Demonstrated ability to lead DevOps teams and define scalable best practices. Eligibility …
secure systems using OpenID Connect (OIDC) solutions like Keycloak. + Cloud Management: Leverage extensive AWS knowledge, including EC2, RDS, S3, EKS, Route 53, CloudFormation, CloudWatch, Lambda, and more. + Containerization and Orchestration: Deploy and manage applications with Kubernetes and Docker to achieve seamless scalability and reliability. + CI/ …
the-clock production support with a focus on automation, scalability, security, and application resiliency; troubleshoot performance bottlenecks using monitoring tools such as AppDynamics and CloudWatch, while maintaining comprehensive documentation to streamline cloud operations and drive efficiency; lead and support Major Incident Management (MIM) processes, ensuring swift resolution of application …
Bristol, South West England, United Kingdom Hybrid / WFH Options
Sanderson
and compliance across our infrastructure. About you: Extensive hands-on experience in an Enterprise environment supporting a wide range of AWS services (VPC, EC2, CloudWatch, EKS, RDS, S3, EBS) with a strong focus on security and scalability. Proven ability to support on-premises Compute Platforms, including VMware, Enterprise Storage …
Reston, Virginia, United States Hybrid / WFH Options
CGI
deployment of tools in AWS. Strong experience in AWS EC2, ECS, S3, CloudFormation. Solid understanding of Cloud Security Tools like IAM, Key Management Services, CloudWatch, RDS, DDoS etc. Worked on NoSQL/DynamoDB, Oracle, RDS. Experience with Python - good to have. Experience in full life cycle application/system …
Infrastructure as Code for configuration management and code implementation - Terraform etc. Experience setting up and using monitoring and alerting tools such as Dynatrace, Grafana, CloudWatch etc. Experience using Configuration management tools like Puppet, Ansible, Packer, Chef. Experience with various testing tooling - Selenium, Cucumber etc. Experience in scripting - bash/ …
Edinburgh, Central Scotland, United Kingdom Hybrid / WFH Options
Provn
/CD pipelines and automation tooling. Background in containerisation and orchestration – e.g., Docker, Kubernetes. Familiarity with monitoring, alerting, and observability tools (e.g., Prometheus, Grafana, CloudWatch). Proven ability to troubleshoot and resolve complex infrastructure issues. Experience working in cross-functional engineering teams, ideally in a DevOps or SRE capacity. …
scale, distributed environment - this could be a great next step. What You’ll Be Doing: Managing and improving observability tools like Prometheus, Grafana, and CloudWatch. Helping product teams with tracing and monitoring to improve performance and reliability. Defining and improving SLIs/SLOs, automating tasks, and reducing operational noise …
architects, developers, and security teams to align delivery with business and compliance objectives. Implement security-first DevOps using tools like Terraform, Kubernetes, Jenkins, and CloudWatch. Lead threat detection, logging, and incident response strategies across environments. Define DevSecOps practices and mentor junior engineers in modern cloud security and automation. Experience …
of tools such as Docker, Kubernetes, and CI/CD pipelines (Jenkins, Git, Helm, Terraform). Ability to work with monitoring/logging tools (e.g. CloudWatch, Prometheus, Grafana). Previous experience supporting production environments or investigating application-level issues. Comfortable in Agile environments using Jira, Confluence, and similar tools. Strong communication …
London, South East England, United Kingdom Hybrid / WFH Options
psd group
of tools such as Docker, Kubernetes, and CI/CD pipelines (Jenkins, Git, Helm, Terraform). Ability to work with monitoring/logging tools (e.g. CloudWatch, Prometheus, Grafana). Previous experience supporting production environments or investigating application-level issues. Comfortable in Agile environments using Jira, Confluence, and similar tools. Strong communication …
Leeds, England, United Kingdom Hybrid / WFH Options
Fruition Group
product delivery. Lead deployment strategies and ensure smooth feature rollouts with minimal downtime. Define and manage monitoring, logging, and telemetry using tools like AWS CloudWatch, Prometheus, and Datadog. Lead incident response and production troubleshooting with a proactive and preventative mindset. Drive automation initiatives with tools like GitLab CI, Terraform/ …
City of London, England, United Kingdom Hybrid / WFH Options
Fruition Group
product delivery. Lead deployment strategies and ensure smooth feature rollouts with minimal downtime. Define and manage monitoring, logging, and telemetry using tools like AWS CloudWatch, Prometheus, and Datadog. Lead incident response and production troubleshooting with a proactive and preventative mindset. Drive automation initiatives with tools like GitLab CI, Terraform/ …
London (City of London), South East England, United Kingdom Hybrid / WFH Options
Fruition Group
product delivery. Lead deployment strategies and ensure smooth feature rollouts with minimal downtime. Define and manage monitoring, logging, and telemetry using tools like AWS CloudWatch, Prometheus, and Datadog. Lead incident response and production troubleshooting with a proactive and preventative mindset. Drive automation initiatives with tools like GitLab CI, Terraform/ …
transformations. Hands-on expertise with Kubernetes (EKS preferred) in production. Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.). Proficiency in Infrastructure as Code using Terraform and knowledge of GitOps workflows. Strong background in observability: metrics, visualization, logging, tracing. Understanding of automation …
by customer data insights. Key Requirements: Technical Expertise: Hands-on experience with Cloud DevOps, specifically AWS (EC2, EKS, ECS, Fargate, Lambda, CloudFormation, Load Balancers, CloudWatch) and equivalents in Azure and GCP. Familiarity with observability tools such as Kibana, Grafana, Datadog, New Relic, and others. Proficiency in RegEx, Lucene, and PromQL. …
incident resolution and engage with various teams (including 3rd parties) for support escalation. You are a good fit if: You have previously worked with AWS cloud administration, including services such as: EC2, S3, ELB, RDS, IAM, Route 53, Auto Scaling Groups, Lambda, CloudWatch, CloudFormation and Security … clustering. You are comfortable with various logging, monitoring and alerting platforms and have expertise in the usage (and, desirably, the deployment) of e.g. ELK, CloudWatch, Fluentd, to enable forensic log analysis and system tuning as well as data-driven performance analysis (i.e. SLI/SLO) and capacity planning. You …