SRE - Site Reliability Engineer
Senior Site Reliability Engineer (Observability)
Location: London/UK (Remote)
Contract: 12 Months Initial
Rate: £55 - £62 per hour (Inside IR35)
Job Overview
We are looking for a Senior Site Reliability Engineer with strong experience in Observability, Monitoring and Distributed Systems to support large-scale cloud infrastructure serving millions of devices globally. The role focuses on building and scaling monitoring, logging and alerting platforms to ensure high availability and performance of cloud services.
Responsibilities
- Design, deploy and scale observability platforms
- Manage and scale Prometheus monitoring systems
- Deploy and maintain large Elasticsearch clusters
- Build and maintain data pipelines using Kafka
- Develop alerting and monitoring frameworks
- Automate infrastructure using Terraform and Ansible
- Develop tools and scripts using Python, Go, Ruby or Bash
- Work with Linux systems (Debian/Ubuntu)
- Participate in on-call rotation
- Improve system reliability, performance and scalability
Required Skills
- 5+ years' experience in Site Reliability Engineering / DevOps
- Strong Linux systems experience
- Observability and Monitoring tools experience
- Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana)
- Kafka
- Terraform / Infrastructure as Code
- Ansible / Configuration Management
- Programming experience (Python, Go, Ruby or Bash)
- Distributed systems and cloud infrastructure experience
This is an urgent vacancy and the hiring manager is shortlisting for interviews immediately. Please apply with a copy of your CV or send it to khushboo.pandey@randstad.co.uk
Randstad Technologies is acting as an Employment Business in relation to this vacancy.