SRE - Site Reliability Engineer

Senior Site Reliability Engineer (Observability)

Location: London/UK (Remote)

Contract: 12 Months Initial

Rate: £55 - £62 per hour (Inside IR35)

Job Overview

We are looking for a Senior Site Reliability Engineer with strong experience in Observability, Monitoring and Distributed Systems to support large-scale cloud infrastructure serving millions of devices globally. The role focuses on building and scaling monitoring, logging and alerting platforms to ensure high availability and performance of cloud services.

Responsibilities

Design, deploy and scale observability platforms
Manage and scale Prometheus monitoring systems
Deploy and maintain large Elasticsearch clusters
Build and maintain data pipelines using Kafka
Develop alerting and monitoring frameworks
Automate infrastructure using Terraform and Ansible
Develop tools and scripts using Python, Go, Ruby or Bash
Work with Linux systems (Debian/Ubuntu)
Participate in on-call rotation
Improve system reliability, performance and scalability

Required Skills

5+ years' experience in Site Reliability Engineering/DevOps
Strong Linux systems experience
Observability and Monitoring tools experience
Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana)
Kafka
Terraform/Infrastructure as Code
Ansible/Configuration Management
Programming experience (Python, Go, Ruby or Bash)
Distributed systems and cloud infrastructure experience

This is an urgent vacancy and the hiring manager is shortlisting for interview immediately. Please apply with a copy of your CV.

Randstad Technologies is acting as an Employment Business in relation to this vacancy.
