and written communication skills and are willing to present and defend your ideas to technical and non-technical audiences. Additional Desired Skills: Experience with incident management platforms like PagerDuty, Opsgenie, or similar tools. Understanding of SLO/SLA management and implementations. Knowledge of industry-standard incident management frameworks and best practices. Familiarity with automated remediation and runbook automation. Experience
protocols, encoding/transcoding workflows. Demonstrated ability to lead technical recovery during high-pressure incidents. Familiarity with observability tools (e.g., Grafana, Prometheus, Datadog) and incident management platforms (e.g., PagerDuty, Opsgenie). Excellent communication and stakeholder management skills. Strong analytical and problem-solving abilities. What's in It for You? Hybrid Work Model: We've adopted a flexible hybrid working
dashboards and reports. Maintain existing alarms and create new ones to monitor application health, mainly on AWS. Integrate with third-party systems to improve monitoring and reporting. Manage the Opsgenie rotation schedule(s) and participate in rotations for mission-critical systems. Collaborate with IT and security network specialists for cohesive monitoring. Work with development engineers to implement application service