Principal Site Reliability Engineer

Fimador are looking for a Principal DevOps / SRE Engineer to help shape, scale, and operate a high-availability AWS and Linux platform. This is a senior IC role where you’ll combine deep technical expertise with technical leadership, mentoring, and a strong focus on operational excellence.

You’ll work closely with architecture, engineering, and platform teams to design reliable infrastructure, strengthen CI/CD delivery, improve observability, and drive automation across the software delivery lifecycle. You’ll also play a key role in raising engineering standards, influencing best practice, and helping teams build and operate services that are secure, scalable, and production-ready.

What you’ll be doing

Lead the technical direction for cloud infrastructure, SRE practices, automation, and platform reliability as a senior individual contributor.
Design, build, and maintain scalable AWS and Linux environments with a focus on availability, resilience, security, and performance.
Develop and improve Infrastructure as Code using Terraform, creating reusable, modular, and policy-compliant infrastructure patterns.
Build, optimise, and maintain CI/CD pipelines in Jenkins, enabling faster, safer, and more consistent software releases.
Champion SRE principles including observability, incident response, reliability engineering, service ownership, error budgets, automation, and continuous improvement.
Drive operational excellence by reducing toil, improving deployment reliability, strengthening monitoring, and helping teams improve MTTR and change success rates.
Partner with architecture and engineering teams to influence platform design, tooling decisions, and delivery practices.
Mentor engineers, provide technical guidance, and raise the overall capability of the team without moving away from hands-on delivery.

Your experience will ideally be in:

Writing high-quality Terraform code that supports repeatable, consistent, multi-environment provisioning.
Building and enhancing Jenkins pipelines that include automated testing, security checks, environment promotion, and reliable release controls.
AWS cloud infrastructure and Linux-based production environments.
Site Reliability Engineering, DevOps, platform engineering, or cloud infrastructure roles.
Terraform and Infrastructure as Code best practices.
Jenkins-based CI/CD pipeline design, build, and optimisation.
Observability, monitoring, alerting, incident management, and production operations.
Kubernetes, Helm, networking, ingress, RBAC, secrets management, and scalable deployment patterns.
Automation using Bash, Python, Java, or similar languages.
Mentoring engineers and influencing technical direction in an IC leadership capacity.

This is a hybrid opportunity with 2 days on site per week.
