Site Reliability Engineer

We are looking for a Software Engineer with strong SRE / Platform Engineering experience to enhance system reliability, security, and performance.

This role focuses on observability, API security, infrastructure hardening, and incident management in a modern cloud-based environment.

Key Responsibilities

  • Improve platform reliability, availability, and performance
  • Identify and fix security vulnerabilities
  • Strengthen API security (auth, rate limiting, access control)
  • Implement monitoring, logging, and observability solutions
  • Automate incident detection and response
  • Perform root cause analysis (RCA) and prevent recurrence of incidents
  • Support high availability and disaster recovery
  • Collaborate with engineering and security teams

Must-Have Skills

  • Strong experience in SRE / DevOps / Platform Engineering
  • Hands-on experience with observability tools (Splunk preferred)
  • Knowledge of API security & vulnerability management
  • Experience with Java / Spring Boot / Vert.x
  • Experience with Docker / Kubernetes
  • Scripting: Python / Bash

Good to Have

  • Threat & vulnerability management
  • Infrastructure hardening
  • Cloud platforms (AWS / Azure)
  • Incident management & RCA

Job Details

Company: Thrive IT Systems
Location: Burgess Hill, UK
Posted