SRE - Site Reliability Engineer
Job Description
Senior Site Reliability Engineer (Observability)
Location: London/UK (Remote)
Contract: 12 Months Initial
Rate: £55 - £62 per hour (Inside IR35)
Job Overview
We are looking for a Senior Site Reliability Engineer with strong experience in observability, monitoring and distributed systems to support large-scale cloud infrastructure serving millions of devices globally. The role focuses on building and scaling monitoring, logging and alerting platforms to ensure the high availability and performance of cloud services.
Responsibilities
- Design, deploy and scale observability platforms
- Manage and scale Prometheus monitoring systems
- Deploy and maintain large Elasticsearch clusters
- Build and maintain data pipelines using Kafka
- Develop alerting and monitoring frameworks
- Automate infrastructure using Terraform and Ansible
- Develop tools and scripts using Python, Go, Ruby or Bash
- Work with Linux systems (Debian/Ubuntu)
- Participate in on-call rotation
- Improve system reliability, performance and scalability
Required Skills
- 5+ years of experience in Site Reliability Engineering / DevOps
- Strong Linux systems experience
- Experience with observability and monitoring tools
- Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana)
- Kafka
- Terraform / Infrastructure as Code
- Ansible / Configuration Management
- Programming experience (Python, Go, Ruby or Bash)
- Distributed systems and cloud infrastructure experience
This is an urgent vacancy and the hiring manager is shortlisting for interviews immediately. Please apply with a copy of your CV or send it to khushboo.pandey@randstad.co.uk.
Randstad Technologies is acting as an Employment Business in relation to this vacancy.