London, England, United Kingdom Hybrid / WFH Options
BlackRock, Inc
self-service automation. Work on incident resolution and engage with various teams (including 3rd parties) for support escalation. You are a good fit if: You have previously worked with Amazon AWS cloud administration, including services such as EC2, S3, ELB, RDS, IAM, Route 53, Auto Scaling Groups, Lambda, CloudWatch, CloudFormation and Security Groups. You possess expertise in … including knowledge of HA/clustering. You are comfortable with various logging, monitoring and alerting platforms and have expertise in the usage (and, desirably, the deployment) of e.g. ELK, CloudWatch and Fluentd, to enable forensic log analysis and system tuning as well as data-driven performance analysis (i.e. SLI/SLO) and capacity planning. You are a competent Linux & Windows …
London, England, United Kingdom Hybrid / WFH Options
Nivoda
Architecture: Develop and deploy serverless applications using AWS Lambda and related services to enable cost-efficient and highly responsive systems. Monitoring & Incident Response: Set up proactive monitoring using AWS CloudWatch, Prometheus, or Grafana. Troubleshoot and resolve infrastructure or application issues promptly to ensure high availability. Security & Compliance: Enforce AWS security best practices, including IAM policies, VPC configurations, and security … experience with GitOps tools like ArgoCD and application packaging with Helm. Strong scripting abilities (e.g., Python, Bash) to automate workflows. Familiarity with monitoring and logging tools (e.g., AWS CloudWatch, ELK stack, Prometheus). Solid understanding of IAM, networking (VPCs, subnets, routing), and security best practices. Preferred: Experience with serverless architectures on AWS (e.g., AWS Lambda, API Gateway, DynamoDB …
London, England, United Kingdom Hybrid / WFH Options
Global Screening Services
and own end-to-end delivery of solutions. Expert knowledge of SRE fundamentals and a commitment to best practice. Fluency with common observability tooling like Prometheus, Grafana, OTEL and CloudWatch. Experience analysing and building data telemetry, querying (PromQL), modelling, pipelines and dashboards to provide concise, focused insights and alerts for distributed systems. Strong experience with Python and/or … GoLang. Java (Spring Boot and Micrometer) useful. Demonstrable experience working with AWS services like SQS, EKS, RDS, VPC, EC2, CloudWatch (X-Ray, Metrics and Logs) and Lambda. Solid knowledge of Linux systems and Bash scripting. Strong knowledge of networking and common protocols (TCP, DNS, TLS, HTTP). Experience with DevOps principles and tooling such as Infrastructure as Code (Terraform) and CI/…
and maintaining tools that support data science and MLOps/LLMOps workflows. Collaborate with Data Scientists to deploy, serve, and monitor LLMs in real-time and batch environments using Amazon SageMaker and Bedrock. Implement Infrastructure-as-Code with AWS CDK and CloudFormation to provision and manage cloud environments. Build and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, CodeBuild … Jenkins. Integrate monitoring and observability tools such as AWS CloudWatch, Prometheus and Grafana for infrastructure and model health tracking. Ensure software quality through Test-Driven Development (TDD), unit testing frameworks (e.g., pytest, unittest), and automated integration tests. Conduct regular code reviews, participate in pair programming, and advocate for clean code, modular design, and maintainable architecture. Collaborate with architects and stakeholders …/MLOps experience with a strong focus on building and delivering scalable infrastructure for ML and AI applications using Python and cloud-native technologies. Experience with cloud services, especially Amazon Web Services (AWS): SageMaker, Bedrock, S3, EC2, Lambda, IAM, VPC, ECS/EKS. Proficiency in Infrastructure-as-Code using AWS CDK or CloudFormation. Experience implementing and scaling MLOps workflows …
using GitLab to ensure continuous integration, delivery, and deployment of applications. Collaborate with the development team to optimise pipeline efficiency and ensure code quality. Implement monitoring solutions using AWS CloudWatch, Prometheus, Grafana, or similar tools to ensure visibility into application performance, health, and security. Troubleshoot production issues and provide resolution. Ensure the security of cloud infrastructure by implementing best … or PowerShell. Experience automating infrastructure tasks using Infrastructure as Code (IaC) tools like Terraform or AWS CloudFormation. Monitoring & Logging Tools: Experience with monitoring and logging tools such as AWS CloudWatch, Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana). Benefits: Join a rapidly expanding start-up where personal growth is a part of our DNA. Benefit from a flexible work …
or CloudFormation to automate cloud resource provisioning, enabling consistent and repeatable infrastructure deployments. Monitoring & Observability: Implement monitoring, logging, and alerting solutions using tools like Prometheus, Grafana, Loki, Datadog, or CloudWatch to ensure system health and performance. Security & Compliance: Implement security best practices for cloud infrastructure, including IAM policies, security groups, and VPC configurations, to ensure compliance and data protection.
services: CloudTrail, Secrets Manager, WAF, X-Ray. AWS services widely used in production: Business Applications: SES; Cloud Financial Management: Cost Explorer; Compute: ECS (Fargate), Lambda; Containers: ECR; Management & Governance: CloudWatch, Systems Manager (Parameter Store), Trusted Advisor; Networking & Content Delivery: API Gateway, CloudFront, ELB, Route 53, VPC; Security, Identity, & Compliance: Certificate Manager, Cognito, IAM; Storage: S3. AWS services used in production … on a product-specific basis: Analytics: Kinesis, OpenSearch; Application Integration: Amazon MQ (RabbitMQ), SNS, SQS; Compute: EC2 (legacy, moving towards ECS), EBS; Database: DocumentDB, ElastiCache (Redis), RDS; Front-end Web & Mobile: Amplify; Migration & Transfer: AWS Transfer (SFTP), DMS. You can find further details about the role, including key responsibilities and accountabilities, alongside the organisational structure and person specification in …
London, England, United Kingdom Hybrid / WFH Options
ZipRecruiter
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise (EKS, SQS, RDS, EC2, Lambda, CloudWatch, etc.).
London, England, United Kingdom Hybrid / WFH Options
Hopecompass
must. Proficiency in scripting languages such as Python, Bash, or Shell. Experience with containerization and orchestration tools like Docker and Kubernetes. Experience with monitoring and logging tools such as CloudWatch, Prometheus, or Grafana. Excellent problem-solving skills and attention to detail. Strong communication and collaboration skills.
Southampton, Hampshire, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment
/CD, or CircleCI. Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture. Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch. Excellent analytical and troubleshooting abilities, especially within complex distributed systems. Proven experience handling incident management and conducting blameless postmortems, including leading cross-functional teams through resolution and communication during …
Hampshire, England, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment
/CD, or CircleCI. Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture. Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch. Excellent analytical and troubleshooting abilities, especially within complex distributed systems. Proven experience handling incident management and conducting blameless postmortems, including leading cross-functional teams through resolution and communication during …
Hedge End, England, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment
/CD, or CircleCI. Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture. Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch. Excellent analytical and troubleshooting abilities, especially within complex distributed systems. Proven experience handling incident management and conducting blameless postmortems, including leading cross-functional teams through resolution and communication during …
London, England, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment
/CD, or CircleCI. Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture. Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch. Excellent analytical and troubleshooting abilities, especially within complex distributed systems. Proven experience handling incident management and conducting blameless postmortems, including leading cross-functional teams through resolution and communication during …
Birmingham, England, United Kingdom Hybrid / WFH Options
Digital Gurus
and scripting (Shell, Python, PowerShell). Deploy and manage containerised services using Docker. Drive best practices in DevOps, security, and agile delivery. Tech Stack: AWS (EC2, ECS, RDS, S3, IAM, CloudWatch, VPC, etc.); Terraform and Vagrant for infrastructure provisioning; Jenkins, Git, Jira, Confluence, ServiceNow; Linux (Amazon Linux 2023), Docker; Monitoring: Prometheus, Grafana. This is an exciting opportunity to be part …
allows for constructive criticism. Essential Skills and Experience: 5+ years of experience with a broad range of AWS technologies (e.g., EC2, RDS, ELB, EBS, EFS, S3, VPC, Glacier, IAM, CloudWatch, KMS) to develop and maintain an AWS-based cloud solution, with an emphasis on best-practice cloud security. Expertise in provisioning infrastructure using Terraform and VMs with tools such … configuration management tools. Strong scripting skills (e.g., Shell, Python, PowerShell, Perl, Java) and automation skills. Thorough knowledge of Jenkins and pipelines using Groovy script. Experience with Docker containers and the Amazon Linux 2023 AMI. Experience with system monitoring tools (e.g., Grafana, Alertmanager, Prometheus, and Node Exporter). Ability to analyse and resolve complex infrastructure resource and application deployment issues.
Cardiff, Wales, United Kingdom Hybrid / WFH Options
ZipRecruiter
scalable Data Science, MLOps, and LLMOps workflows across the organisation. Drive strategy and execution for deploying, serving, and monitoring large language models (LLMs) in real-time and batch environments using Amazon SageMaker, Bedrock, and related services. Guide the use of Infrastructure-as-Code (IaC) practices with AWS CDK and CloudFormation to provision and manage secure and maintainable cloud environments. Design … pipelines using GitHub Actions, AWS CodePipeline, Jenkins, and other tools, with an emphasis on reliability, reusability, and performance. Contribute to the design and integration of monitoring and observability solutions (CloudWatch, Prometheus, Grafana) to ensure infrastructure and model health. Champion software engineering excellence through Test-Driven Development (TDD), rigorous test automation, and continuous quality assurance practices. Support architectural decisions for …
London, England, United Kingdom Hybrid / WFH Options
VE3
security of our systems hosted on AWS while streamlining DevOps practices across our teams. Key Responsibilities: Design, implement, and manage AWS cloud infrastructure (EC2, VPC, IAM, RDS, S3, CloudWatch, etc.). Maintain and optimize CI/CD pipelines using tools like GitHub Actions, Jenkins, or AWS CodePipeline. Perform system administration tasks for Linux/Unix-based environments. … Strong understanding of networking (DNS, TCP/IP, VPN, firewalls). Knowledge of containerization technologies (Docker, ECS, EKS, or Kubernetes). Experience with monitoring/logging tools such as CloudWatch, ELK Stack, Prometheus/Grafana. Excellent problem-solving skills and the ability to work independently. Preferred Qualifications: AWS Certified SysOps Administrator/DevOps Engineer – Professional. Experience with hybrid cloud …
City of London, England, United Kingdom Hybrid / WFH Options
VE3
security of our systems hosted on AWS while streamlining DevOps practices across our teams. Key Responsibilities: Design, implement, and manage AWS cloud infrastructure (EC2, VPC, IAM, RDS, S3, CloudWatch, etc.). Maintain and optimize CI/CD pipelines using tools like GitHub Actions, Jenkins, or AWS CodePipeline. Perform system administration tasks for Linux/Unix-based environments. … Strong understanding of networking (DNS, TCP/IP, VPN, firewalls). Knowledge of containerization technologies (Docker, ECS, EKS, or Kubernetes). Experience with monitoring/logging tools such as CloudWatch, ELK Stack, Prometheus/Grafana. Excellent problem-solving skills and the ability to work independently. Preferred Qualifications: AWS Certified SysOps Administrator/DevOps Engineer – Professional. Experience with hybrid cloud …
tools like Drone. Automate infrastructure provisioning using Terraform or other Infrastructure-as-Code tools. Build and maintain monitoring and alerting systems using Prometheus, Grafana, or AWS native monitoring tools like CloudWatch. Collaborate with development and DevOps teams to design MSK and Kubernetes-based solutions. Troubleshoot complex issues related to Kafka and container orchestration. Document infrastructure setups, architectures, and operational procedures … Competencies: Strong experience with AWS MSK for Apache Kafka and understanding of Kafka internals (brokers, topics, partitions, producers/consumers). Proficiency with AWS services like S3, IAM, VPC and CloudWatch. Hands-on experience with Kubernetes in production environments. Strong knowledge of Docker and managing containerized applications. Proficiency in configuring and managing Kubernetes clusters, including monitoring, scaling, and automated deployments … tools like Terraform. Knowledge of container build and deployment automation using CI/CD pipelines. Experience in observability tools for both MSK and Kubernetes, including Prometheus, Grafana, and AWS CloudWatch for metrics and logs. Deep understanding of Kafka and Kubernetes security practices, including network policies and IAM roles. Experience with Vault. Strong analytical and troubleshooting skills. Ability to work …
AWS Networking, VPC, DynamoDB. Databases: DB2, PostgreSQL, MariaDB, Oracle. Operating Systems/Platforms: Linux, Mainframe (legacy computing platform). Containerization/Orchestration: Containers (Kubernetes). Monitoring/Alerting: CloudWatch, Prometheus, etc. Qualifications: You'll need to have obtained a degree in Computer Science, Engineering, Mathematics, or a related STEM discipline, or demonstrate equivalent technical experience or interest. Benefits …
Delivering Large-scale, Long-term IT Projects for the Public Sector. Key Skills & Experience: Good experience with AWS technologies (e.g., EC2, RDS, ELB, EBS, EFS, S3, VPC, Glacier, IAM, CloudWatch, KMS) to develop and maintain an AWS-based cloud solution, with an emphasis on best-practice cloud security. Provisioning infrastructure using Terraform and VMs with tools such as Vagrant. … configuration management tools. Strong scripting skills (e.g., Shell, Python, PowerShell, Perl, Java) and automation skills. Thorough knowledge of Jenkins and pipelines using Groovy script. Experience with Docker containers and the Amazon Linux 2023 AMI. Experience with system monitoring tools (e.g., Grafana, Alertmanager, Prometheus, and Node Exporter). Experience with Git, Jira, Confluence, and ServiceNow for incident and change management.
Delivering Large-scale, Long-term IT Projects for the Public Sector. Key skills & experience: Good experience with AWS technologies (e.g., EC2, RDS, ELB, EBS, EFS, S3, VPC, Glacier, IAM, CloudWatch, KMS) to develop and maintain an AWS-based cloud solution, with an emphasis on best-practice cloud security. Provisioning infrastructure using Terraform and VMs with tools such as Vagrant. … configuration management tools. Strong scripting skills (e.g., Shell, Python, PowerShell, Perl, Java) and automation skills. Thorough knowledge of Jenkins and pipelines using Groovy script. Experience with Docker containers and the Amazon Linux 2023 AMI. Experience with system monitoring tools (e.g., Grafana, Alertmanager, Prometheus, Node Exporter). Experience with Git, Jira, Confluence, and ServiceNow for incident and change management. Desired …
Birmingham, West Midlands (County), United Kingdom
Syntax Consultancy Ltd
Delivering Large-scale, Long-term IT Projects for the Public Sector. Key skills & experience: Good experience with AWS technologies (e.g., EC2, RDS, ELB, EBS, EFS, S3, VPC, Glacier, IAM, CloudWatch, KMS) to develop and maintain an AWS-based cloud solution, with an emphasis on best-practice cloud security. Provisioning infrastructure using Terraform and VMs with tools such as Vagrant. … configuration management tools. Strong scripting skills (e.g., Shell, Python, PowerShell, Perl, Java) and automation skills. Thorough knowledge of Jenkins and pipelines using Groovy script. Experience with Docker containers and the Amazon Linux 2023 AMI. Experience with system monitoring tools (e.g., Grafana, Alertmanager, Prometheus, Node Exporter). Experience with Git, Jira, Confluence, and ServiceNow for incident and change management. Desired …