Site Reliability Engineer
Cambridge, Cambridgeshire, United Kingdom
Hybrid / WFH Options
AI Tech Suite
… initiatives with broader organizational goals.
Establish and maintain SLIs, SLOs, and SLAs for critical systems and services.
Drive the adoption of best practices in automation, monitoring, and incident response.

Software Engineer, Site Reliability Engineer
Fireworks AI offers a fast and efficient platform for building and deploying generative AI applications … and managing infrastructure at scale, particularly on the edge.
Proficiency in Python, Docker, Linux systems, and scripting (Bash, Python).
Strong expertise with infrastructure automation tools (Terraform, Ansible).
Experience managing observability and monitoring systems, particularly Prometheus.
Deep understanding of networking concepts and protocols.

Responsibilities:
Design, build, and maintain … scalable and resilient infrastructure on the edge.
Develop automation and infrastructure-as-code solutions using Terraform, Ansible, and scripting languages (Python, Bash).
Deploy and manage containerized applications using Docker and related technologies.
Ensure system observability by building and optimizing monitoring systems, particularly using Prometheus.
Troubleshoot and optimize Linux …
Employment Type: Permanent
Salary: GBP Annual
Posted: