Cloud Engineer - AWS Observability (Telford)

We are looking for a technically proficient Observability Subject Matter Expert (SME) to architect, implement, and manage observability frameworks across a complex hybrid-cloud environment. This role will focus on AWS-native services (Amazon Connect, data services, and integration services), enterprise platforms (Pega and contact center systems), and the underlying infrastructure, ensuring end-to-end visibility, performance optimization, and proactive incident response.

Key Responsibilities:

Observability Architecture & Strategy:

· Design and implement observability pipelines using AWS-native and third-party tools.

· Define telemetry standards (metrics, logs, traces) across microservices, APIs, and data pipelines.

· Establish SLIs/SLOs and integrate them into service health dashboards.

AWS Workload Monitoring:

· Implement observability for Amazon Connect (contact flows, agent metrics, call quality).

· Monitor AWS data services (Glue, Redshift, Athena, S3, Lake Formation) for performance, throughput, and data lineage.

· Instrument AWS integration services (API Gateway, EventBridge, Step Functions, Lambda) with distributed tracing and structured logging.

Tooling & Automation:

· Deploy and manage observability tools: CloudWatch, X-Ray, OpenTelemetry, Prometheus, Grafana, Datadog, Splunk, ELK.

· Automate alerting, anomaly detection, and incident correlation using AI/ML-based tools.

· Integrate observability into CI/CD pipelines and Infrastructure-as-Code (IaC) workflows.

Incident Management & RCA:

· Lead real-time diagnostics during major incidents using telemetry data.

· Conduct post-incident reviews with detailed root cause analysis and observability insights.

Collaboration & Governance:

· Work closely with DevOps, Security, and Application teams to enforce observability standards.

· Ensure compliance with data governance, retention, and security policies for telemetry data.

Required Skills & Experience:

· 7+ years of experience in observability engineering.

· Deep expertise in AWS services, especially Amazon Connect, Glue, Lambda, API Gateway, and S3, as well as AWS infrastructure and networking.

· Strong hands-on experience with observability stacks such as Dynatrace, OpenTelemetry, Prometheus, Grafana, Datadog, Splunk, ELK, and CloudWatch/X-Ray.

· Proficiency in scripting (Python, Bash) and IaC (Terraform, CloudFormation).

· Experience monitoring enterprise platforms such as Pega and contact center systems.

· Solid understanding of distributed systems, networking, and application performance tuning.

Company
Infoplus Technologies UK Limited
Location
Telford, Shropshire, UK
Posted