Senior DevOps Engineer

Senior DevOps Engineer – AI & Cloud Infrastructure

Type: Permanent / Full-Time (Employment or Contract considered)

Location: Remote or Hybrid

Time Zones: Compatible with UK, European, and North American working hours

The Opportunity

We’re working with a high-growth tech start-up building a next-generation AI cloud platform focused on fast, reliable inference for large language models and other compute-intensive workloads.

The platform combines modern cloud infrastructure, Kubernetes, GPU clusters, and developer-first tooling to support mission-critical AI systems operating across multiple regions.

They’re now looking for a Senior DevOps Engineer to take ownership of the infrastructure backbone — someone who enjoys operating complex systems at scale and working closely with infrastructure, ML, and product engineering teams.

What You’ll Be Doing

AI Cloud Infrastructure

Design, build, and operate highly available, secure infrastructure supporting AI inference, fine-tuning, and data processing workloads
Manage multi-region Kubernetes clusters, including GPU-heavy environments
Implement autoscaling strategies across heterogeneous compute fleets

Infrastructure as Code & Automation

Own and evolve infrastructure-as-code using tools such as Terraform and Helm
Automate provisioning of compute, networking, and storage
Build tooling to spin environments up and down for experiments, benchmarks, and customer deployments

CI/CD & Release Engineering

Design and maintain CI/CD pipelines across backend, infrastructure, and ML components
Implement safe deployment strategies (e.g. blue/green, canary releases)
Partner with engineers to improve build speed, test reliability, and deployment confidence

Observability, Reliability & SRE

Build and operate observability stacks (metrics, logging, tracing)
Define and monitor SLOs / SLAs for latency, availability, and reliability
Create runbooks, playbooks, and incident response processes for production systems

Security & Best Practices

Implement best practices around secrets management, access control, and network security
Support secure, multi-tenant environments for enterprise customers
Help foster a culture of operational excellence, ownership, and reliability

What They’re Looking For

Essential

4–8+ years’ experience in DevOps, SRE, Platform, or Infrastructure Engineering
Strong experience running production systems on major cloud platforms (AWS, GCP, or Azure)
Deep hands-on experience with Kubernetes in production
Strong Infrastructure-as-Code skills (Terraform or equivalent)
Proficiency in at least one scripting or programming language (e.g. Python, Go, Bash)
Solid understanding of networking, security fundamentals, and distributed systems
Proven experience building reliable, observable, automated systems

Nice to Have

Experience supporting GPU-based workloads or ML infrastructure
Exposure to AI / ML platforms, inference systems, or data pipelines
Familiarity with modern CI/CD tooling and GitOps approaches
Experience with observability tooling (metrics, logs, tracing)
Background in cloud platforms, AI infrastructure, or high-scale SaaS environments

Why Join

Work on core infrastructure powering cutting-edge AI systems
High impact and ownership over architecture and tooling decisions
Collaboration with senior engineers across infrastructure, ML, and product
Competitive compensation, equity, and long-term growth potential
Flexible remote / hybrid working
