patterns (blue/green, rolling, canary) and implement automated rollback and recovery. Drive continuous improvement in the infrastructure space through initiatives to remove single points of failure and improve autoscaling, high availability and managed service adoption across the platform. Collaborate with SRE, Security and Engineering teams to enhance observability, monitoring and alerting through tools like Prometheus, Grafana and CloudWatch. Partner …
Bristol, Avon, South West, United Kingdom Hybrid/Remote Options
Hargreaves Lansdown
improve developer experience (DX), and reduce lead time for changes through automation and platform enhancements. Implement cost visibility and optimisation (FinOps) across the platform: tagging, budgets/alerts, rightsizing, autoscaling and usage reporting. Maintain platform documentation, runbooks and service catalog entries; contribute to onboarding guides and demo sessions for consumers of the platform. Participate in an on-call rota for …
Employment Type: Permanent, Part Time, Work From Home
London, South East England, United Kingdom Hybrid/Remote Options
Sprout
Deploying all changes, including complex machine learning models, reliably to customers within 15 minutes; building a real-time, configuration-driven platform that seamlessly adapts to diverse customer needs; ensuring autoscaling and cost-efficient model serving in production, with robust support for ML monitoring and experimentation; centralised reporting/metrics for both the business and our customers; powering user experience of …
for data processing, analytics, and workload management. A good understanding of AWS, Snowflake, Databricks, and Azure architectures and features, including but not limited to: Compute services: EC2, Lambda, AutoScaling, VPC; Storage and container services: ECS, S3, DynamoDB, RDS; Management & governance tools: KMS, IAM, CloudFormation, CloudWatch, CloudTrail, Unity Catalog; Analytics services: Glue, Athena, Crawlers, Lake Formation, Redshift, Databricks …
our software is deployed in Docker containers on ECS, much of it through infrastructure-as-code tools like Terraform and Pulumi. Our data is stored in a serverless/autoscaling MySQL database. Newer work is often written in TypeScript, using NestJS and SvelteKit. We'll use a variety of testing tools in both languages: end-to-end, integration and unit …
London, South East, England, United Kingdom Hybrid/Remote Options
Eligo Recruitment
Lead CI/CD pipeline automation, security scanning, and compliance integration (ISO 27001). Configure network architecture, manage VPNs (Tailscale), and support cloud security policies. Implement monitoring, alerting, and auto-scaling strategies to maintain 99%+ uptime. Mentor engineering teams on cloud best practices and develop reusable deployment tooling. What You’ll Bring: Strong experience with GCP, Terraform, and …
Belfast, County Antrim, Northern Ireland, United Kingdom
Reed
will blend deep troubleshooting skills with solid project delivery and a security-by-design approach. Day-to-day of the role: Design, deploy, and operate AWS EC2 workloads, AutoScaling Groups, ALB/NLB, and manage secure VPC architectures. Implement and manage IAM roles, policies, and encryption using AWS KMS. Configure and manage Microsoft Intune/Entra ID … Science/IT or equivalent experience. 5+ years in enterprise IT with L3 support, networking, and systems administration responsibilities. Proven expertise across AWS (EC2, VPC, IAM, Load Balancing, AutoScaling), Windows Server/AD, Intune/Entra ID, and enterprise networking/security. Hands-on experience with VoIP deployments and QoS, and with SharePoint/OneDrive administration. Strong …