that address complex business requirements and drive decision-making. Your skills and experience: Proficiency with AWS tools, with demonstrable experience using AWS Glue, AWS Lambda, Amazon Kinesis, Amazon EMR, Amazon Athena, Amazon DynamoDB, Amazon CloudWatch, Amazon SNS and AWS Step Functions. Programming skills: Strong…
Greater London, England, United Kingdom Hybrid / WFH Options
Star Recruits (UK)
RDS management, backup strategies, failover configurations, and performance tuning. Optimize caching using ElastiCache to boost system performance. 📈 Monitoring, Observability & Reliability: Build observability frameworks using CloudWatch, Grafana, InfluxDB, and Elasticsearch. Implement proactive alerting and escalation processes to maintain uptime and responsiveness. Develop logging strategies that offer deep system insights. 🤝 Cross…
Infrastructure as Code for configuration management and code implementation (e.g. Terraform). Experience setting up and using monitoring and alerting tools such as Dynatrace, Grafana and CloudWatch. Experience using configuration management tools such as Puppet, Ansible, Packer and Chef. Experience with testing tools such as Selenium and Cucumber. Experience in scripting (bash/…
West London, South East England, United Kingdom Hybrid / WFH Options
Harrington Starr
scale, distributed environment, this could be a great next step. What You’ll Be Doing: Managing and improving observability tools like Prometheus, Grafana, and CloudWatch; helping product teams with tracing and monitoring to improve performance and reliability; defining and improving SLIs/SLOs, automating tasks, and reducing operational noise…
City of London, Greater London, UK Hybrid / WFH Options
Fruition Group
product delivery. Lead deployment strategies and ensure smooth feature rollouts with minimal downtime. Define and manage monitoring, logging, and telemetry using tools like AWS CloudWatch, Prometheus, and Datadog. Lead incident response and production troubleshooting with a proactive and preventative mindset. Drive automation initiatives with tools like GitLab CI, Terraform/…
transformations. Hands-on expertise with Kubernetes (EKS preferred) in production. Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.). Proficiency in Infrastructure as Code using Terraform and knowledge of GitOps workflows. Strong background in observability: metrics, visualization, logging, and tracing. Understanding of automation…
London, South East England, United Kingdom Hybrid / WFH Options
ITR Partners
by customer data insights. Key Requirements: Technical Expertise: Hands-on experience with Cloud DevOps, specifically AWS (EC2, EKS, ECS, Fargate, Lambda, CloudFormation, Load Balancers, CloudWatch) and equivalents in Azure and GCP. Familiarity with observability tools such as Kibana, Grafana, Datadog, New Relic, and others. Proficiency in RegEx, Lucene, and PromQL.
incident resolution and engage with various teams (including third parties) for support escalation. You are a good fit if: You have previously worked with AWS cloud administration, including services such as EC2, S3, ELB, RDS, IAM, Route 53, Auto Scaling Groups, Lambda, CloudWatch, CloudFormation and Security … clustering. You are comfortable with various logging, monitoring and alerting platforms and have expertise in the usage (and, desirably, the deployment) of tools such as ELK, CloudWatch and Fluentd to enable forensic log analysis and system tuning, as well as data-driven performance analysis (e.g. SLIs/SLOs) and capacity planning. You…
delivery of solutions. Expert knowledge of SRE fundamentals and a commitment to best practice. Fluency with common observability tooling like Prometheus, Grafana, OpenTelemetry (OTel) and CloudWatch. Experience analysing and building telemetry data, querying (PromQL), modelling, pipelines and dashboards to provide concise, focused insights and alerts for distributed systems. Strong experience … with Python and/or Go; Java (Spring Boot and Micrometer) useful. Demonstrable experience working with AWS services like SQS, EKS, RDS, VPC, EC2, CloudWatch (X-Ray, Metrics and Logs) and Lambda. Solid knowledge of Linux systems and bash scripting. Strong knowledge of networking and common protocols (TCP, DNS, TLS, HTTP…
event-driven architectures. Ability to design and implement highly available, fault-tolerant systems using AWS services. Expertise in monitoring and troubleshooting AWS environments using CloudWatch, X-Ray, and other AWS tools.
London, South East England, United Kingdom Hybrid / WFH Options
Searchability NS&D
CI/CD pipelines (Drone/GitLab). Excellent understanding of Kafka internals, stream processing, and secure Kafka deployments. Strong experience with monitoring (Prometheus, Grafana, CloudWatch). Knowledge of security hardening, IAM, WAF, Shield and Vault. Working knowledge of Agile, Infrastructure-as-Code, and DevSecOps practices. UK*C or Enhanced DV (eDV…
Python, PHP), version control (Git), and database tools (MySQL/MariaDB). Experience with AWS cloud technologies, including EC2, S3, Lambda, and monitoring tools (CloudWatch, New Relic, Zabbix), as well as infrastructure provisioning (Terraform, CloudFormation) and containerisation. Knowledge of authentication, security, networking concepts, and scaling applications within AWS is…
London, South East England, United Kingdom Hybrid / WFH Options
Xcede
CloudFormation. Maintain infrastructure definitions under source control for consistency and repeatability. Monitoring & Incident Response: Establish observability standards using tools such as Azure Monitor and CloudWatch. Act as a point of escalation for critical incidents, ensuring rapid diagnosis and resolution. Cross-Team Collaboration: Partner with engineers, testers, and security leads…