Observability Architect
We are looking for an experienced Enterprise Tooling & Observability Lead to drive the strategy, design, implementation, and modernization of enterprise monitoring, logging, APM, and operational tooling during and after large-scale on-prem to AWS cloud migrations.
The ideal candidate brings deep expertise across observability platforms, infrastructure/application monitoring, cloud-native operations, and integration of enterprise tools into cloud architectures. This role ensures a seamless migration of tooling capabilities, enhanced visibility, and improved reliability in the AWS operating model.
Role Type: Contract Inside IR35
Duration: 4-6 weeks
Location: London, UK (Hybrid)
Technical Expertise
- 14+ years of experience in enterprise monitoring, logging, APM, and observability tooling.
- Strong understanding of AWS architecture, cloud-native monitoring tools, and hybrid observability.
- Experience with:
  - APM platforms: Dynatrace, AppDynamics, Datadog
  - Logging platforms: Splunk, ELK/OpenSearch, CloudWatch Logs
  - Metrics & telemetry: Prometheus, Grafana, OpenTelemetry
  - Event management: ServiceNow, PagerDuty, Moogsoft, BigPanda
- Strong knowledge of instrumentation for distributed systems, microservices, containers (EKS, ECS), serverless workloads, and legacy systems.
Migration & Architecture Skills
- Proven experience supporting large-scale on-prem to AWS migrations.
- Deep understanding of migration patterns and observability dependencies.
- Hands-on experience designing observability for multi-account AWS landing zones and multi-region architectures.
Soft Skills & Leadership
- Excellent communication, architectural documentation, and executive presentation skills.
- Ability to influence stakeholders across engineering, cloud, SRE, operations, and leadership.
- Experience leading cross-functional teams and managing vendor/tooling relationships.
Preferred Qualifications
- AWS Certified Solutions Architect / Cloud Practitioner / DevOps Engineer
- Certifications in observability platforms (Datadog, Dynatrace, Splunk, etc.)
- Knowledge of ITIL, SRE principles, and enterprise operational frameworks
- Experience with automation using Python, Terraform, CloudFormation (nice-to-have)
Success Indicators
- Smooth transition of observability and tooling through all migration waves.
- Enhanced end-to-end visibility across applications, networks, and infrastructure post-migration.
- Reduction in incidents, MTTR, and monitoring gaps after migration to AWS.