Production Engineer
- Hiring Organisation: Jobleads-UK
- Location: Greater London, England, United Kingdom
communication efforts during incidents, updating stakeholders and keeping clear records of incident activities.

Operational Support & Reliability: Monitor system performance and health using tools like Prometheus and Grafana, identifying any performance issues or potential incidents. Help implement automation and process improvements to enhance efficiency and reduce manual intervention in incident detection …

knowledge of cloud infrastructure. Familiarity with incident management practices and frameworks (e.g., ITIL, SRE best practices). Experience with monitoring and alerting tools (e.g., Prometheus, Grafana) or willingness to learn. Basic experience with scripting or automation tools (e.g., Python, Bash, Terraform, Ansible). Strong communication skills, with the ability ...