Site Reliability Engineer
We are looking for a Software Engineer with strong SRE / Platform Engineering experience to enhance system reliability, security, and performance.
This role focuses on observability, API security, infrastructure hardening, and incident management in a modern cloud-based environment.
Key Responsibilities
- Improve platform reliability, availability, and performance
- Identify and fix security vulnerabilities
- Strengthen API security (authentication, rate limiting, access control)
- Implement monitoring, logging, and observability solutions
- Automate incident detection and response
- Perform root cause analysis (RCA) and prevent recurrence of incidents
- Support high availability and disaster recovery
- Collaborate with engineering and security teams
Must-Have Skills
- Strong experience in SRE / DevOps / Platform Engineering
- Hands-on experience with observability tools (Splunk preferred)
- Knowledge of API security & vulnerability management
- Experience with Java / Spring Boot / Vert.x
- Experience with Docker / Kubernetes
- Scripting: Python / Bash
Good to Have
- Threat & vulnerability management
- Infrastructure hardening
- Cloud platforms (AWS / Azure)
- Incident management & RCA