Site Reliability Engineer
Day Rate: £500 - £600
Location: Herefordshire, hybrid with 3-4 days on site per week (4 days preferable); occasional travel to other UK sites
Contract: 3-month rolling contract
Availability: On-call rota (24/7 when required)
Security Clearance: Security Clearance (SC) required; DV (MOD) preferred; must be eligible for DV clearance
Start Date: ASAP

Overview
We're looking for a Site Reliability Engineer (SRE) to join our client's growing cross-domain services team, supporting critical systems used by major UK government organisations. You'll play a key role in ensuring our platforms remain highly available, performant, and cost-efficient. You'll collaborate closely with software development, support, and operations teams to improve cloud and on-prem infrastructure, optimise CI/CD pipelines, enhance system observability, and proactively manage reliability risks across complex environments.

Key Responsibilities
- Partner with Software Engineers to enhance system reliability, scalability, and performance.
- Collaborate with System Administrators to automate repetitive tasks and streamline alerts.
- Advance monitoring and observability practices to identify and resolve issues before they affect users.
- Support development and testing environments to help meet delivery and quality objectives.
- Research, evaluate, and recommend tools and technologies to improve operational efficiency.
- Develop a deep understanding of the technical ecosystem, contributing to both cloud and on-prem solutions.

Skills and Experience
- Strong background with configuration management tools (e.g. Ansible, Chef, Puppet).
- Hands-on experience with Terraform for infrastructure as code.
- Expertise with containerisation and orchestration (Docker, Kubernetes, OpenShift, or Swarm).
- Skilled in CI/CD pipeline tools (e.g. Jenkins, GitLab CI).
- Proficient with monitoring and observability tools (Grafana, Prometheus, InfluxDB).
- Experience integrating event-driven systems using MQ solutions (RabbitMQ or similar).
- Strong knowledge of SQL and relational databases.
- Advanced Linux administration and shell scripting skills.
- Familiarity with network security protocols.
- Experience deploying and maintaining systems on AWS (EC2, RDS, S3, Lambda).
- Programming experience in Java, Go, or Python.
- Understanding of cross-domain technologies and security models.
- Background in service management environments and ITIL practices.
- Proven application of observability patterns and system health metrics.
- Experience with Microsoft Azure cloud services.

Company: itecopeople
Location: Herefordshire, UK (Hybrid/Remote Options)