ECS, and Nomad. Implement infrastructure as code (IaC) using Terraform for provisioning and managing cloud infrastructure. Define and enforce best practices for container security, networking, and observability. Implement auto-scaling solutions using Karpenter, Cluster Autoscaler, or custom scaling policies. Troubleshoot and resolve complex issues related to container orchestration and workload performance. Collaborate with developers, DevOps, and SRE
using Logstash, Elasticsearch, and Kibana. Perform initial analysis on incidents and escalate issues to the appropriate support level when necessary. Manage Application ID and provide cloud elasticity by auto-scaling resources based on business requirements. Ensure Disaster Recovery (DR) and manual redundancy failover capabilities. Provide regular service management reports to keep clients informed. Follow ITIL processes and Integrated
and operations teams, applying DevOps methodologies to streamline processes and enhance system reliability. Performance Optimization: Expertise in tuning cloud applications for cost efficiency, scalability, and high availability, leveraging Azure Autoscaling, Load Balancers, and Traffic Manager. At least 5 years of hands-on experience in Azure Hyperscale/DevOps. Over 10 years of experience in DevOps roles, driving automation, scalability, and
City, Cardiff, United Kingdom Hybrid / WFH Options
SRT Marine Systems PLC
stack (database, rabbit, services hosted in Docker containers). We simply need help provisioning multiple horizontal instances of this single VM instance, like you would achieve with elastic auto-scaling middleware like Kubernetes (but we will not be using Kubernetes). The idea is to be able to flexibly configure IaC and deploy a fixed number of instances
Cardiff, South Glamorgan, United Kingdom Hybrid / WFH Options
SRT Marine Systems PLC
stack (database, rabbit, services hosted in Docker containers). We simply need help provisioning multiple horizontal instances of this single VM instance, like you would achieve with elastic auto-scaling middleware like Kubernetes (but we will not be using Kubernetes). The idea is to be able to flexibly configure IaC and deploy a fixed number of instances
IAM, etc.). 6. Performance Optimisation & Scaling: Analyse system bottlenecks and recommend performance tuning strategies. Support database optimisations, caching mechanisms, and load balancing strategies. Assist in designing auto-scaling solutions to handle peak loads efficiently. 7. DevOps & CI/CD Implementation: Support the implementation of CI/CD pipelines for automated testing and deployment. Provide recommendations on
tools such as Git. Strong organizational skills and proficiency in English communication; additional languages are a benefit. Experience designing and managing applications supporting high availability, zero downtime upgrades, auto-scaling, distributed architecture, metrics-driven decisions, serverless operations, and containerization. Datalex's purpose is to transform airline retail. We are a market leader in airline retail technology, delivering digital
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Embarcaderomediagroup
scanning tools (Trivy, tfsec) integrated into pipelines. A proactive approach to problem-solving, documentation, and coaching. Additional bonus skills include experience with Azure governance tools, advanced Datadog capabilities, Kubernetes autoscaling solutions, GitOps workflows, automated cost dashboards, compliance frameworks, and internal platform development. What You Can Expect: Competitive salary: £70,000 - £80,000 depending on experience. 25 days holiday plus bank
performance backend services powering Roku mobile apps. Build and maintain robust APIs and microservices integrating seamlessly with mobile clients and cloud infrastructure. Develop and optimize a highly efficient auto-scaling platform utilizing Docker containers and Kubernetes orchestration. Ensure reliability and uptime of backend systems through comprehensive monitoring, testing, and automation. Troubleshoot and resolve complex technical challenges within production
modular design, automation (e.g., GitHub Actions/Terraform Cloud), and CI/CD best practices. - Proven capability in architecting and deploying Google Kubernetes Engine (GKE) clusters, including service mesh, autoscaling, workload identity, and observability. - Solid understanding of GCP security, identity federation (SAML/OIDC), RBAC, Zero Trust networking, and VPC Service Controls. - Experience leading cloud migration projects, including DNS, traffic
Strong proficiency in Kubernetes, microservices architecture, Helm, GitLab CI/CD, ArgoCD, Prometheus, and Grafana. Programming experience in at least one language; Golang or Python preferred. Deep understanding of autoscaling, version upgrades, and cloud service optimization. Bonus if you're familiar with technologies like Kafka, Elasticsearch, PostgreSQL, ScyllaDB, Databricks, Dagster, Sentry, Kong. Employee Benefits: Our competitive benefits packages are designed
more. Mentor engineers across the team, fostering a strong engineering culture of ownership, curiosity, and excellence. Drive modernization efforts, introducing patterns like GitOps, Policy-as-Code (Kyverno), Cilium networking, autoscaling, and better resource efficiency. Collaborate deeply with SRE, Platform, and Application teams to align infrastructure capabilities with real-world product demands. Champion best practices in CI/CD, reliability, container
/CD processes and tools. Have previously worked with MLOps tools like MLFlow and Airflow, or on common problems such as model and API monitoring, data drift and validation, autoscaling, access permissions. Have previously worked with monitoring tools such as New Relic or Grafana. Understand the use of feature stores and related data technologies for operational machine learning products. Are
wagmi, and Viem libraries for modern wallet connection flows. Experience deploying web applications on AWS using services like Elastic Beanstalk, EC2, and S3, with understanding of environment config, auto-scaling, and CloudWatch logging. Proficient in setting up CI/CD pipelines using GitHub Actions (or similar tools), including artifact packaging and automated deployment. Working knowledge of CloudFront, CDN
Agile working practices, Jira and Confluence. Contribute to cost visibility by using cost and usage data to enable self-service reporting. Drive sustainable usage practices (e.g. terminating idle resources, autoscaling). Translate technical decisions into cost impacts, surfacing trade-offs to customers. Work collaboratively across teams to embed cost-awareness into design, development, deployment, and monitoring practices.
and platform efficiency: Collaborate with FinOps and platform teams to implement node pool optimization strategies (e.g., N2D/E2 migration), improve workload placement, and reduce cloud spend through smarter autoscaling and resource tuning. Oversee the evolution of our products on multi-tenant hosting platform across GCP and Azure. Shape architectural decisions related to Kubernetes, service meshes, API, etc. Drive innovation
similar technologies. Designed and implemented distributed, event-driven systems using Kafka Streams, AWS Kinesis, or similar. Optimize for low-latency and high-throughput processing (1M+ TPS) microservices. Implemented auto-scaling, blue-green deployments, and canary releases. Build and maintain SLAs, SLOs, and SLIs for critical services. Strong practical knowledge and experience developing robust caching solutions, utilizing technologies such
Central London, London, United Kingdom Hybrid / WFH Options
Eligo Recruitment Limited
Lead CI/CD pipeline automation, security scanning, and compliance integration (ISO 27001). Configure network architecture, manage VPNs (Tailscale), and support cloud security policies. Implement monitoring, alerting, and auto-scaling strategies to maintain 99%+ uptime. Mentor engineering teams on cloud best practices and develop reusable deployment tooling. What You'll Bring: Strong experience with GCP, Terraform, and Infrastructure