Site Reliability Engineer

Job Title: Site Reliability Engineer

Job Description

This Site Reliability Engineer role focuses on designing, building and maintaining cloud-based, high-volume, high-speed systems that provide critical data services to the insurance industry. You will work primarily in AWS, using Linux, containers and modern automation and CI/CD tooling to improve reliability, performance and security. The position combines hands-on engineering, incident response and continuous improvement of the platform and its supporting infrastructure.

Responsibilities

Design, implement and support scalable, resilient cloud-based solutions in AWS for high-volume, high-speed data systems.
Apply structured problem-solving skills to investigate and resolve technical issues across production and non-production environments.
Own and deliver regular maintenance activities such as system patching, upgrades and general platform housekeeping.
Diagnose and address system performance issues, identifying bottlenecks and implementing improvements.
Develop and maintain automation using scripting languages such as Python and tools like Ansible and Terraform to manage infrastructure and deployments.
Build, support and test infrastructure components as part of a collaborative engineering team.
Contribute to the design and implementation of observability and resilience practices to improve system reliability.
Participate in incident response, troubleshooting and root cause analysis to enhance system stability and prevent recurrence.
Work with CI/CD pipelines (e.g. GitLab CI or GitHub Actions) to streamline build, test and deployment processes.
Use containerisation technologies, particularly Docker, to package and run applications consistently across environments.
Follow agile working methodologies, taking ownership of user stories and driving them through to completion.
Continuously identify opportunities for system improvement, automation and simplification, and implement agreed changes.
Collaborate closely with developers and other engineers to ensure infrastructure and applications work seamlessly together.

Essential Skills

Proven experience in a Site Reliability Engineer (SRE) role, working on production systems.
Previous industry experience working in a team that supports, builds and tests infrastructure.
Background in software development, with the ability to understand and work with application code and tooling.
Strong hands-on experience with AWS or other major cloud platforms in a production environment.
Solid knowledge of Linux systems, including deploying, maintaining and upgrading Linux-based servers, and working confidently in terminal-based environments.
Experience with Linux distributions such as Red Hat Enterprise Linux or CentOS (or similar).
Proficiency in scripting or development using languages such as Python.
Practical experience with infrastructure automation technologies such as Ansible and Terraform.
Hands-on experience with CI/CD pipelines using tools such as GitLab CI or GitHub Actions.
Strong experience with Docker and container-based workflows.
Familiarity with agile methodologies and working practices.
Understanding of observability and resilience concepts within an SRE context.

Additional Skills & Qualifications

Experience contributing to the design and implementation of new technologies and platform solutions.
Exposure to resilience engineering practices, including designing for fault tolerance and graceful degradation.
Experience implementing or working with observability tooling and practices (e.g. logging, metrics, tracing).
Ability to work closely with development teams to align infrastructure and application delivery.
Strong sense of ownership, with the determination to see tasks and user stories through to completion.
Clear, concise communication skills for collaborating within cross-functional teams and documenting solutions.

Why Work Here?

You will join a technology-focused environment where reliability, automation and modern engineering practices are at the core of how systems are built and run. The organisation offers the opportunity to work on large-scale, cloud-native platforms with contemporary tooling, giving you scope to deepen your SRE expertise and broaden your cloud and automation skills. You can expect a contract of at least one year with strong potential for extension, providing stability while you contribute to meaningful, high-impact projects. The culture encourages continuous improvement, knowledge sharing and collaborative problem-solving, supporting your professional growth in a modern engineering setting.

Work Environment

You will work in a cloud-centric environment built primarily on AWS, supporting high-volume, high-speed data systems. The technology stack includes Linux (such as Red Hat Enterprise Linux or CentOS), Python, Ansible, Terraform, Docker and CI/CD pipelines using GitLab CI or GitHub Actions. The team follows agile methodologies, working in iterative cycles with user stories, regular ceremonies and close collaboration between developers and reliability engineers. Day-to-day work is hands-on and terminal-focused within Linux environments, with a strong emphasis on automation, observability, security and resilience. The setting is professional and technology-driven, with modern tooling and processes that support efficient remote collaboration and focused engineering work.

Location

Nottingham, UK

Rate/Salary

400.00 - 450.00 GBP Daily

Trading as TEKsystems. Allegis Group Limited, Maxis 2, Western Road, Bracknell, RG12 1RT, United Kingdom. No. (phone number removed). Allegis Group Limited operates as an Employment Business and Employment Agency as set out in the Conduct of Employment Agencies and Employment Businesses Regulations 2003. TEKsystems is a company within the Allegis Group network of companies (collectively referred to as "Allegis Group"). Aerotek, Aston Carter, EASi, Talentis Solutions, TEKsystems, Stamford Consultants and The Stamford Group are Allegis Group brands. If you apply, your personal data will be processed as described in the Allegis Group Online Privacy Notice available at (url removed).

To access our Online Privacy Notice, which explains what information we may collect, use, share, and store about you, and describes your rights and choices about this, please go to (url removed).

We are part of a global network of companies and as a result, the personal data you provide will be shared within Allegis Group and transferred and processed outside the UK, Switzerland and European Economic Area subject to the protections described in the Allegis Group Online Privacy Notice. We store personal data in the UK, EEA, Switzerland and the USA. If you would like to exercise your privacy rights, please visit the "Contacting Us" section of our Online Privacy Notice at (url removed)/en-gb/privacy-notices for details on how to contact us. To protect your privacy and security, we may take steps to verify your identity, such as a password and user ID if there is an account associated with your request, or identifying information such as your address or date of birth, before proceeding with your request. If you are resident in the UK, EEA or Switzerland, we will process any access request you make in accordance with our commitments under the UK Data Protection Act, EU-U.S. Privacy Shield or the Swiss-U.S. Privacy Shield.
