SaaS Monitoring Engineer

We are a global recruitment specialist supporting clients across EMEA, APAC, the US and Canada. We have an excellent job opportunity for you.

Role Title: SaaS Monitoring Engineer

Location: Canada Square, London; Hybrid 60% office / 40% home

Duration: Until 18/12/2026

Role Description:

Role Overview:

We are seeking a highly motivated and detail-oriented SaaS Monitoring Engineer to join our growing cloud operations team. In this role, you will be responsible for designing, implementing, and maintaining monitoring solutions that ensure the health, performance, and reliability of our Software-as-a-Service (SaaS) platforms. You will play a critical part in proactively identifying issues, minimizing downtime, and enabling data-driven decision-making through real-time observability.

A key responsibility of this position is building and maintaining a centralized console dashboard that provides a comprehensive, real-time view of SaaS service health, system performance, and key operational metrics.

Key Responsibilities:

Monitoring & Observability:

Design, implement, and manage robust monitoring frameworks for SaaS applications and infrastructure.

Track system health, availability, latency, error rates, and resource utilization across distributed systems.

Continuously improve observability through logs, metrics, and traces using modern monitoring tools such as Datadog, Prometheus, Grafana, Azure Monitor, or similar platforms.

Console Dashboard Development:

Design and create a centralized console dashboard that provides a real-time overview of the health of all SaaS services.

Ensure the dashboard displays actionable insights, including service uptime, API performance, incident alerts, and dependency status.

Optimize dashboard usability by tailoring views for different stakeholders (engineering, operations, leadership).

Integrate data from multiple sources into a unified visualization platform for seamless monitoring.

Incident Management & Troubleshooting:

Set up intelligent alerting mechanisms to detect anomalies and performance degradation.

Investigate and troubleshoot incidents quickly to identify root causes and implement permanent fixes.

Collaborate with DevOps and engineering teams during incident response and postmortem reviews.

Automation & Optimization:

Automate monitoring processes, alert escalation, and response workflows.

Continuously refine alert thresholds to reduce noise and improve signal accuracy.

Implement predictive monitoring techniques to anticipate potential outages.

Collaboration & Communication:

Work closely with software engineers, DevOps, and product teams to ensure monitoring requirements are embedded early in the development lifecycle.

Provide insights and reporting on SaaS performance trends and operational risks.

Document monitoring strategies, configurations, and best practices.

Required Skills & Qualifications:

Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience).

Proven experience in monitoring cloud-based or SaaS environments.

Strong understanding of distributed systems, microservices architecture, and cloud platforms (AWS, Azure, or GCP).

Hands-on experience with monitoring and visualization tools (e.g., Grafana, Prometheus, ELK stack, Splunk, Datadog).

Experience building interactive dashboards and console-based monitoring systems.

Proficiency in scripting or programming languages such as Python, Bash, or Go.

Familiarity with containerization and orchestration tools (Docker, Kubernetes).

Strong analytical and problem-solving skills with attention to detail.

Preferred Qualifications:

Experience with Site Reliability Engineering (SRE) principles and practices.

Knowledge of CI/CD pipelines and DevOps methodologies.

Exposure to AIOps or machine learning-based monitoring tools.

Certification in cloud platforms or monitoring technologies.

If you are interested in this position and would like to learn more, please send us your CV and we will get in touch with you as soon as possible. Please note that candidates are often shortlisted within 48 hours.

Job Details

Company: eTeam

Location: London Area, United Kingdom (Hybrid / Remote Options)

Posted