Good understanding of monitoring and logging solutions, e.g. Prometheus, AWS CloudWatch, Grafana, OpenTelemetry, Honeycomb, ELK, etc. Basic SRE knowledge, plus experience with alerting and incident management platforms (e.g. Opsgenie, PagerDuty). Proven ability to provide and support strong, scalable CI/CD pipelines. Linux, Git, Docker, and good scripting skills in e.g. Python, Bash, or Go. You should be proficient in …
or CircleCI. Strong testing capabilities using JUnit, RestAssured, or similar frameworks. Proactive with monitoring, observability, and system health. Desirable Skills: exposure to monitoring platforms like Datadog, Grafana, Prometheus, or PagerDuty; familiarity with Python scripting; experience with Kubernetes and deployment tools such as Helm. Why Join H&B Tech? Help define the future of digital health & wellness in a purpose-led …
in testing frameworks like JUnit and RestAssured. A passion for monitoring, observability, and maintaining resilient systems. Desirable Skills: experience with monitoring and alerting tools like Datadog, Prometheus, Grafana, or PagerDuty; exposure to Python scripting; familiarity with deployment platforms such as Kubernetes and tools like Helm. Why Join H&B Tech? Be part of a fast-moving, forward-thinking team at …
least the following requirements: expert knowledge of Kubernetes; expert knowledge of continuous deployment systems such as Buildkite and ArgoCD; expert knowledge of monitoring technologies such as Prometheus, Grafana, and PagerDuty; expert knowledge of infrastructure-as-code technologies such as Pulumi or Terraform. Location: We hire engineers in London and in Palo Alto. We usually work from the office 5 days …
similar GitHub Actions, CircleCI). Understands the importance of monitoring and is proactive in resolving critical issues. Fluent in testing frameworks such as JUnit and RestAssured. Desirable: exposure to monitoring and alerting platforms (Datadog, PagerDuty, Grafana, Prometheus); exposure to Python scripting; exposure to deployment platforms like Kubernetes and tools like Helm. Ready to shape the future of health and wellness through tech? Apply now and …
expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates. Tech Stack: Kubernetes, Buildkite/ArgoCD, Prometheus/Grafana/PagerDuty, Pulumi/Terraform. SGLang: This team is leading the development of one of the most popular open-source inference engines, SGLang. You have the opportunity to work on open …
verbal and written communication skills and are willing to present and defend your ideas to technical and non-technical audiences. Additional Desired Skills: experience with incident management platforms like PagerDuty, Opsgenie, or similar tools; understanding of SLO/SLA management and implementation; knowledge of industry-standard incident management frameworks and best practices; familiarity with automated remediation and runbook automation; experience …
of Site Reliability Engineering (SRE) principles, including incident management, monitoring, alerting, and performance tuning. Strong knowledge of Software Development Lifecycle (SDLC) processes. Familiarity with incident management platforms like ServiceNow, PagerDuty, or similar tools. Excellent analytical and problem-solving abilities with a focus on delivering timely and effective solutions. Outstanding communication skills, capable of collaborating across diverse, cross-functional teams. Experience …
successful in this role, you should have: experience in the architecture and engineering of Event Intelligence Solutions/AIOps platforms; experience engineering monitoring platforms such as IBM Netcool, Moogsoft, BigPanda, PagerDuty, and ServiceNow AIOps; proficiency in Python and hands-on knowledge of Ansible Automation Platform. Other highly valued skills include: knowledge of observability platforms (Prometheus, Grafana, ELK, Splunk); experience with integration into …
and CS workflows, and create better pipeline visibility. We've raised $6M and are growing fast, with a strong and committed customer base that includes Zscaler, 1Password, Ramp, Postman, PagerDuty, and dozens more. Job Description: We are seeking a driven and innovative Growth Engineer to join our team in Rosario, Argentina. In this role, you will be responsible for developing …
resolve complex technical issues using tools like Datadog, Bugsnag, and JIRA, and interpret log data to identify root causes and trends. Familiarity with incident management workflows using tools like PagerDuty and Bugsnag, with the ability to prioritise, document, and escalate issues appropriately based on severity and impact. Knowledge of APIs, SSO, and web technologies to support platform configuration and client …