Principal Site Reliability Engineer
Fimador are looking for a Principal DevOps / SRE Engineer to help shape, scale, and operate a high-availability AWS and Linux platform. This is a senior IC role where you’ll combine deep technical expertise with technical leadership, mentoring, and a strong focus on operational excellence.
You’ll work closely with architecture, engineering, and platform teams to design reliable infrastructure, strengthen CI/CD delivery, improve observability, and drive automation across the software delivery lifecycle. You’ll also play a key role in raising engineering standards, influencing best practice, and helping teams build and operate services that are secure, scalable, and production-ready.
What you’ll be doing
- Lead the technical direction for cloud infrastructure, SRE practices, automation, and platform reliability as a senior individual contributor.
- Design, build, and maintain scalable AWS and Linux environments with a focus on availability, resilience, security, and performance.
- Develop and improve Infrastructure as Code using Terraform, creating reusable, modular, and policy-compliant infrastructure patterns.
- Build, optimise, and maintain CI/CD pipelines in Jenkins, enabling faster, safer, and more consistent software releases.
- Champion SRE principles including observability, incident response, reliability engineering, service ownership, error budgets, automation, and continuous improvement.
- Drive operational excellence by reducing toil, improving deployment reliability, strengthening monitoring, and helping teams improve MTTR and change success rates.
- Partner with architecture and engineering teams to influence platform design, tooling decisions, and delivery practices.
- Mentor engineers, provide technical guidance, and raise the overall capability of the team without moving away from hands-on delivery.
Your experience will ideally be in:
- Writing high-quality Terraform code that supports repeatable, consistent, multi-environment provisioning.
- Building and enhancing Jenkins pipelines that include automated testing, security checks, environment promotion, and reliable release controls.
- AWS cloud infrastructure and Linux-based production environments.
- Site Reliability Engineering, DevOps, platform engineering, or cloud infrastructure roles.
- Terraform and Infrastructure as Code best practices.
- Jenkins-based CI/CD pipeline design, build, and optimisation.
- Observability, monitoring, alerting, incident management, and production operations.
- Kubernetes, Helm, networking, ingress, RBAC, secrets management, and scalable deployment patterns.
- Automation using Bash, Python, Java, or similar languages.
- Mentoring engineers and influencing technical direction in an IC leadership capacity.
This is a hybrid opportunity with 2 days on site per week.