London, England, United Kingdom Hybrid / WFH Options
BlackRock, Inc
self-service automation. Work on incident resolution and engage with various teams (including third parties) for support escalation. You are a good fit if: You have previously worked with AWS cloud administration, including services such as EC2, S3, ELB, RDS, IAM, Route 53, Auto Scaling Groups, Lambda, CloudWatch, CloudFormation and Security Groups. You possess expertise in … including knowledge of HA/clustering. You are comfortable with various logging, monitoring and alerting platforms and have expertise in the usage (and, desirably, the deployment) of tools such as ELK, CloudWatch and Fluentd, enabling forensic log analysis and system tuning as well as data-driven performance analysis (SLIs/SLOs) and capacity planning. You are a competent Linux & Windows …
London, England, United Kingdom Hybrid / WFH Options
Nivoda
Architecture: Develop and deploy serverless applications using AWS Lambda and related services to enable cost-efficient and highly responsive systems. Monitoring & Incident Response: Set up proactive monitoring using AWS CloudWatch, Prometheus, or Grafana. Troubleshoot and resolve infrastructure or application issues promptly to ensure high availability. Security & Compliance: Enforce AWS security best practices, including IAM policies, VPC configurations, and security … experience with GitOps tools such as Argo CD and application packaging with Helm. Strong scripting abilities (e.g., Python, Bash) to automate workflows. Familiarity with monitoring and logging tools (e.g., AWS CloudWatch, ELK stack, Prometheus). Solid understanding of IAM, networking (VPCs, subnets, routing), and security best practices. Preferred: Experience with serverless architectures on AWS (e.g., AWS Lambda, API Gateway, DynamoDB …
London, England, United Kingdom Hybrid / WFH Options
Global Screening Services
and own end-to-end delivery of solutions. Expert knowledge of SRE fundamentals and a commitment to best practice. Fluency with common observability tooling such as Prometheus, Grafana, OpenTelemetry (OTEL) and CloudWatch. Experience analysing and building telemetry data, querying (PromQL), modelling, pipelines and dashboards to provide concise, focused insights and alerts for distributed systems. Strong experience with Python and/or … Go. Java (Spring Boot and Micrometer) useful. Demonstrable experience working with AWS services such as SQS, EKS, RDS, VPC, EC2, CloudWatch (X-Ray, Metrics and Logs) and Lambda. Solid knowledge of Linux systems and Bash scripting. Strong knowledge of networking and common protocols (TCP, DNS, TLS, HTTP). Experience with DevOps principles and tooling such as Infrastructure as Code (Terraform) and CI/…
and maintaining tools that support data science and MLOps/LLMOps workflows. Collaborate with Data Scientists to deploy, serve, and monitor LLMs in real-time and batch environments using Amazon SageMaker and Bedrock. Implement Infrastructure-as-Code with AWS CDK and CloudFormation to provision and manage cloud environments. Build and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, CodeBuild … Jenkins. Integrate monitoring and observability tools such as AWS CloudWatch, Prometheus, and Grafana for infrastructure and model health tracking. Ensure software quality through Test-Driven Development (TDD), unit testing frameworks (e.g., pytest, unittest), and automated integration tests. Conduct regular code reviews, participate in pair programming, and advocate for clean code, modular design, and maintainable architecture. Collaborate with architects and stakeholders …/MLOps experience with a strong focus on building and delivering scalable infrastructure for ML and AI applications using Python and cloud-native technologies. Experience with cloud services, especially Amazon Web Services (AWS): SageMaker, Bedrock, S3, EC2, Lambda, IAM, VPC, ECS/EKS. Proficiency in Infrastructure-as-Code using AWS CDK or CloudFormation. Experience implementing and scaling MLOps workflows …
London, Salford and Newcastle upon Tyne, England; Cardiff, Wales; Glasgow, Scotland, United Kingdom Hybrid / WFH Options
BBC Group and Public Services
and maintaining tools that support data science and MLOps/LLMOps workflows. Collaborate with Data Scientists to deploy, serve, and monitor LLMs in real-time and batch environments using Amazon SageMaker and Bedrock. Implement Infrastructure-as-Code with AWS CDK and CloudFormation to provision and manage cloud environments. Build and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, CodeBuild … Jenkins. Integrate monitoring and observability tools such as AWS CloudWatch, Prometheus, and Grafana for infrastructure and model health tracking. Ensure software quality through Test-Driven Development (TDD), unit testing frameworks (e.g., pytest, unittest), and automated integration tests. Conduct regular code reviews, participate in pair programming, and advocate for clean code, modular design, and maintainable architecture. Collaborate with architects and stakeholders …/MLOps experience with a strong focus on building and delivering scalable infrastructure for ML and AI applications using Python and cloud-native technologies. Experience with cloud services, especially Amazon Web Services (AWS): SageMaker, Bedrock, S3, EC2, Lambda, IAM, VPC, ECS/EKS. Proficiency in Infrastructure-as-Code using AWS CDK or CloudFormation. Experience implementing and scaling MLOps workflows …
using GitLab to ensure continuous integration, delivery, and deployment of applications. Collaborate with the development team to optimise pipeline efficiency and ensure code quality. Implement monitoring solutions using AWS CloudWatch, Prometheus, Grafana, or similar tools to ensure visibility into application performance, health, and security. Troubleshoot and resolve production issues. Ensure the security of cloud infrastructure by implementing best … or PowerShell. Experience automating infrastructure tasks using Infrastructure as Code (IaC) tools such as Terraform or AWS CloudFormation. Monitoring & Logging Tools: Experience with monitoring and logging tools such as AWS CloudWatch, Prometheus, Grafana, and the ELK Stack (Elasticsearch, Logstash, Kibana). Benefits: Join a rapidly expanding start-up where personal growth is part of our DNA. Benefit from a flexible work …
infrastructure in AWS using IaC (Terraform preferred). Develop and maintain CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, or similar). Implement monitoring, logging, and alerting using tools such as CloudWatch, Prometheus, Grafana, or ELK. Manage containerised services using Docker and Kubernetes (EKS). Collaborate with developers, security, and platform teams to improve deployment efficiency and reliability. Ensure cloud architecture aligns …
or CloudFormation to automate cloud resource provisioning, enabling consistent and repeatable infrastructure deployments. Monitoring & Observability: Implement monitoring, logging, and alerting solutions using tools like Prometheus, Grafana, Loki, Datadog, or CloudWatch to ensure system health and performance. Security & Compliance: Implement security best practices for cloud infrastructure, including IAM policies, security groups, and VPC configurations, to ensure compliance and data protection.
services: CloudTrail, Secrets Manager, WAF, X-Ray.
AWS services widely used in production:
Business Applications: SES
Cloud Financial Management: Cost Explorer
Compute: ECS (Fargate), Lambda
Containers: ECR
Management & Governance: CloudWatch, Systems Manager (Parameter Store), Trusted Advisor
Networking & Content Delivery: API Gateway, CloudFront, ELB, Route 53, VPC
Security, Identity, & Compliance: Certificate Manager, Cognito, IAM
Storage: S3
AWS services used in production … on a product-specific basis:
Analytics: Kinesis, OpenSearch
Application Integration: Amazon MQ (RabbitMQ), SNS, SQS
Compute: EC2 (legacy, moving towards ECS), EBS
Database: DocumentDB, ElastiCache (Redis), RDS
Front-end Web & Mobile: Amplify
Migration & Transfer: AWS Transfer (SFTP), DMS
You can find further details about the role, including key responsibilities and accountabilities, alongside the organisational structure and person specification in …
London, England, United Kingdom Hybrid / WFH Options
ZipRecruiter
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise (EKS, SQS, RDS, EC2, Lambda, CloudWatch, etc.).
London, England, United Kingdom Hybrid / WFH Options
Hopecompass
must. Proficiency in scripting languages such as Python, Bash, or Shell. Experience with containerization and orchestration tools like Docker and Kubernetes. Experience with monitoring and logging tools such as CloudWatch, Prometheus, or Grafana. Excellent problem-solving skills and attention to detail. Strong communication and collaboration skills.
Southampton, Hedge End, Hampshire and London, England, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment
/CD, or CircleCI. Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture. Skilled in using observability and monitoring tools such as Prometheus, Grafana, the ELK stack, or AWS CloudWatch. Excellent analytical and troubleshooting abilities, especially within complex distributed systems. Proven experience in incident management and in conducting blameless postmortems, including leading cross-functional teams through resolution and communication during …
Birmingham, England, United Kingdom Hybrid / WFH Options
Digital Gurus
and scripting (Shell, Python, PowerShell). Deploy and manage containerised services using Docker. Drive best practices in DevOps, security, and agile delivery. Tech Stack: AWS (EC2, ECS, RDS, S3, IAM, CloudWatch, VPC, etc.); Terraform and Vagrant for infrastructure provisioning; Jenkins, Git, Jira, Confluence, ServiceNow; Linux (Amazon Linux 2023), Docker; Monitoring: Prometheus, Grafana. This is an exciting opportunity to be part …
allows for constructive criticism. Essential Skills and Experience: 5+ years of experience with a broad range of AWS technologies (e.g., EC2, RDS, ELB, EBS, EFS, S3, VPC, Glacier, IAM, CloudWatch, KMS) to develop and maintain an AWS-based cloud solution, with an emphasis on best-practice cloud security. Expertise in provisioning infrastructure using Terraform and VMs with tools such … configuration management tools. Strong scripting (e.g., Shell, Python, PowerShell, Perl, Java) and automation skills. Thorough knowledge of Jenkins and pipelines written in Groovy. Experience with Docker containers and the Amazon Linux 2023 AMI. Experience with system monitoring tools (e.g., Grafana, Alertmanager, Prometheus, and Node Exporter). Ability to analyse and resolve complex infrastructure resource and application deployment issues.
Cardiff, Wales, United Kingdom Hybrid / WFH Options
ZipRecruiter
scalable Data Science, MLOps, and LLMOps workflows across the organisation. Drive strategy and execution for deploying, serving, and monitoring large language models (LLMs) in real-time and batch environments using Amazon SageMaker, Bedrock, and related services. Guide the use of Infrastructure-as-Code (IaC) practices with AWS CDK and CloudFormation to provision and manage secure, maintainable cloud environments. Design … pipelines using GitHub Actions, AWS CodePipeline, Jenkins, and other tools, with an emphasis on reliability, reusability, and performance. Contribute to the design and integration of monitoring and observability solutions (CloudWatch, Prometheus, Grafana) to ensure infrastructure and model health. Champion software engineering excellence through Test-Driven Development (TDD), rigorous test automation, and continuous quality assurance practices. Support architectural decisions for …