Site Reliability Engineer (Remote)

Key Responsibilities:

  • Design, implement, and maintain scalable, highly available infrastructure and services.
  • Develop automation scripts and tools to improve system reliability and operational efficiency.
  • Monitor and troubleshoot system performance, identifying and resolving issues to minimise downtime.
  • Implement and maintain CI/CD pipelines to support efficient software delivery.
  • Develop and enforce best practices for security, monitoring, and incident management.
  • Collaborate with development teams to enhance application performance and stability.
  • Create detailed documentation and conduct post-incident reviews to identify root causes and implement long-term solutions.

Essential Skills and Experience:

  • Proven experience in Site Reliability Engineering, DevOps, or similar roles.
  • Strong understanding of cloud platforms (AWS, Azure, or GCP) and containerisation technologies (Kubernetes, Docker).
  • Proficiency in scripting languages such as Python, Bash, or Go.
  • Hands-on experience with monitoring and observability tools such as Prometheus, Grafana, and the ELK stack.
  • Familiarity with infrastructure-as-code tools such as Terraform or Ansible.
  • Solid understanding of networking concepts and system security best practices.
  • Excellent problem-solving skills and a passion for automation and continuous improvement.

Desirable:

  • Certifications in cloud platforms or DevOps tools.
  • Experience with large-scale distributed systems.

This role offers the opportunity to work on mission-critical projects in a fast-paced and collaborative environment, driving innovation and reliability in our technology ecosystem.

Rullion celebrates and supports diversity and is committed to ensuring equal opportunities for both employees and applicants.

Job Details

Company: Rullion Ltd
Location: Nationwide, United Kingdom (Hybrid / Remote Options)
Employment Type: Contract
Posted