RDS. Good understanding of monitoring and logging solutions, e.g. Prometheus, AWS CloudWatch, Grafana, OpenTelemetry, Honeycomb, ELK, etc. Basic SRE knowledge and experience with alerting and incident management platforms (e.g. Opsgenie, PagerDuty). Proven ability to provide and support strong and scalable CI/CD pipelines. Linux, Git, Docker and good scripting skills in e.g. Python, Bash, Go. You should be
and written communication skills and are willing to present and defend your ideas to technical and non-technical audiences. Additional Desired Skills: Experience with incident management platforms like PagerDuty, Opsgenie, or similar tools. Understanding of SLO/SLA management and implementations. Knowledge of industry-standard incident management frameworks and best practices. Familiarity with automated remediation and runbook automation. Experience
protocols, encoding/transcoding workflows. Demonstrated ability to lead technical recovery during high-pressure incidents. Familiarity with observability tools (e.g., Grafana, Prometheus, Datadog) and incident management platforms (e.g., PagerDuty, Opsgenie). Excellent communication and stakeholder management skills. Strong analytical and problem-solving abilities. What's in It for You? Hybrid Work Model: We've adopted a flexible hybrid working
recurring issues that require development effort and the escalation and prioritisation of those items. Monitor hardware, applications and environmental conditions of our Order Management systems using tools such as Opsgenie and Checkmk (Nagios). Manage production releases of our Order Management systems. Participate in Disaster Recovery planning, updating runbooks and DR tests. Ensure that all procedures and policies are up