Principal Software Engineer
Location: Cambridge
Our client is scaling a large, distributed cloud platform and is looking for a Principal Engineer to act as the Subject Matter Expert (SME) across observability and cloud infrastructure.
You’ll be working at serious scale: managing thousands of Kubernetes nodes, handling tens of terabytes of logs daily, and supporting millions of real-time metrics across a highly distributed environment.
The Role
This is a senior, hands-on role where you will own the technical direction and standards of the observability ecosystem.
As the SME, you’ll define best practice, guide architectural decisions, and act as the go-to expert across engineering teams, ensuring systems are scalable, cost-efficient, and high-performing.
Key Responsibilities
- Act as the SME for observability and cloud infrastructure across the organisation
- Lead architecture across metrics, logs, and tracing systems
- Design and optimise high-throughput data pipelines and storage layers
- Implement telemetry data-reduction strategies such as sampling, aggregation, and down-sampling
- Extend and enhance open-source observability tools at scale
- Partner with engineering teams to standardise tooling and improve adoption
- Drive reliability, scalability, and cost optimisation across the platform
- Define and promote best practices aligned with OpenTelemetry and modern observability standards
- Mentor engineers and elevate engineering quality across teams
Tech Environment
- Kubernetes at scale (thousands of nodes)
- High-volume telemetry (hundreds of thousands of events per second)
- Observability stack: Prometheus, Grafana, Loki, Tempo, ELK/OpenSearch, ClickHouse
- Multi-cloud (AWS, GCP)
- Infrastructure as code (Terraform) and CI/CD pipelines
What We’re Looking For
- 15+ years building and scaling distributed systems
- Strong hands-on experience with Go (Golang), plus Python or shell scripting
- Deep expertise in observability at scale
- Strong Kubernetes and cloud infrastructure experience
- Proven ability to design systems for performance, scale, and cost efficiency
- Experience with service mesh technologies (e.g. Istio/Envoy)
- Ability to operate as a technical authority and trusted advisor across teams
Nice to Have
- Open-source or CNCF contributions
- Experience using AI tools to improve engineering efficiency
Why Join
- Be the go-to expert shaping a large-scale observability platform
- Work on complex, high-impact infrastructure challenges
- Strong ownership and influence at Principal level