systems, monitoring tools (e.g., Prometheus, Grafana), and caching (e.g., Redis). Strong communication and cross-functional collaboration skills. Desirable Skills: Familiarity with SRE principles and incident management tools (e.g., PagerDuty). Understanding of DevOps security practices and FinOps. Experience with MACH architecture, CDN optimization, and service mesh/API gateways. If you're interested, please get in touch ASAP.
or CircleCI. Strong testing capabilities using JUnit, RestAssured, or similar frameworks. Proactive with monitoring, observability, and system health. Desirable Skills: Exposure to monitoring platforms like Datadog, Grafana, Prometheus, or PagerDuty. Familiarity with Python scripting. Experience with Kubernetes and deployment tools such as Helm. Why Join H&B Tech? Help define the future of digital health & wellness in a purpose-led
in testing frameworks like JUnit and RestAssured. A passion for monitoring, observability, and maintaining resilient systems. Desirable Skills: Experience with monitoring and alerting tools like Datadog, Prometheus, Grafana, or PagerDuty. Exposure to Python scripting. Familiarity with deployment platforms such as Kubernetes and tools like Helm. Why Join H&B Tech? Be part of a fast-moving, forward-thinking team at
similar GitHub Actions, CircleCI). Understands the importance of monitoring and is proactive in resolving critical issues. Fluent in testing frameworks such as JUnit and RestAssured. Desirable: Exposure to monitoring and alerting platforms such as Datadog, PagerDuty, Grafana, and Prometheus. Exposure to Python scripting. Exposure to deployment platforms like Kubernetes and tools like Helm. Excited to build meaningful software that helps people live healthier lives? We'd love
successful in this role, you should have: Experience in architecture and engineering of Event Intelligence Solutions/AIOps platforms. Experience engineering monitoring platforms such as IBM Netcool, Moogsoft, BigPanda, PagerDuty, ServiceNow AIOps. Proficiency in Python, and hands-on knowledge of Ansible Automation Platform. Other highly valued skills include: Knowledge of Observability Platforms: Prometheus, Grafana, ELK, Splunk. Experience with integration into
London, South East, England, United Kingdom Hybrid / WFH Options
Salt Search
coverage planning to ensure 24/7 coverage. Act as a back-up during unforeseen coverage gaps. Process Oversight: Maintain and improve escalation workflows, including use of tools like PagerDuty and Salesforce. Ensure compliance with SLAs and contractual obligations. Client Assurance: Engage in senior-level conversations with clients to reassure them during service disruptions, incidents and bugs. Incident Management: Coordinate … communication skills, with the ability to manage high-pressure situations and make decisions. Strong client relations skills and a can-do attitude. Familiarity with support platforms (e.g., Salesforce Case Management, ServiceNow, PagerDuty). Experience with business continuity practices and procedures. Ability to analyse data and trends to inform decision-making. Comfortable working out-of-hours and managing support coverage across geographies. Self