know as soon as possible. What you'll be doing: We are looking for a Site Reliability Engineer to join the team and play a key role in scaling OpenTelemetry, driving service health, deep observability, and high availability across our entire technology infrastructure. You will have strong software engineering skills (ideally in TypeScript and Rust) and a deep understanding of … working across infrastructure and application layers, and you will lead by example in everything from SLOs and SLIs to post-incident reviews. What you will be doing: Observability and OpenTelemetry: Own and evolve our observability strategy across services. Lead how we collect, process, sample, and surface trace and metrics data using OpenTelemetry. Focus on high-signal telemetry that enables fast … test/deploy automation. Proven ability to lead incident response and post-incident review processes. Strong problem-solving mindset and attention to detail. Desirable skills: Knowledge and understanding of OpenTelemetry tools, specification, APIs, etc. Some experience in Rust or a similar compiled language, e.g. Go. Experience instrumenting and running OpenTelemetry in production at scale. Knowledge of distributed tracing and trace sampling …
with tools like Evidently AI or Weights & Biases to detect data drift and model degradation, triggering alerts as needed. Logging and Tracing: Set up logging systems with ELK Stack, OpenTelemetry, or LangSmith to capture AI events, errors, and traces for debugging and auditing. Security Implementation: Apply secure-by-design principles to protect models and data from vulnerabilities (e.g., adversarial attacks … or Go is a plus for UI or system-level integration. Experience with containerization (Docker, Kubernetes) and API development (REST, GraphQL). Expertise in logging frameworks (e.g., ELK Stack, OpenTelemetry) and visualization tools (e.g., Plotly, Chart.js). AI-Specific Skills: Understanding of AI model metrics (e.g., F1 score, latency) and drift detection techniques (e.g., PSI, KS test). Knowledge of …
in software delivery, CI/CD, observability, and infrastructure-as-code. Drive improvements in telemetry and observability, helping us move from log-centric metrics to first-class telemetry using OpenTelemetry and modern observability stacks. Optimise for performance, helping the platform scale for low-latency, high-throughput demands in real-time sports data delivery. Mentor and guide engineers, promoting a strong … e.g., RabbitMQ, Kafka). Strong grasp of telemetry, observability, and performance monitoring in distributed systems. Track record of technical leadership and setting engineering standards. Nice to Have: Experience with OpenTelemetry, Prometheus, Grafana, or similar observability tooling. Exposure to hybrid-cloud or cloud migration strategies. Familiarity with performance optimisation in low-latency data pipelines. Contributions to DevOps-related communities, blogs, open …
rebuilding nearly every component of our observability platform, from data collection to real-time analytics. You will drive core initiatives that move Twilio from fragmented tooling to a unified, OpenTelemetry-first observability stack built for scale. You'll lead technically and strategically: designing platform components, influencing architectural decisions, mentoring engineers, and engaging with teams across Platform Engineering and R&D. … workflows. Design and build developer-friendly tooling and APIs to support incident response, performance analysis, and platform debugging at scale. Leverage (and optionally contribute to) open-source standards like OpenTelemetry to ensure interoperability and extensibility. Champion a pragmatic approach to observability, balancing performance, cost, and user value across diverse engineering teams. Qualifications: Twilio values diverse experiences from all kinds of … logging platforms, metrics pipelines, tracing infrastructure, or profiling tools). Lead technical execution for major components of Twilio's observability overhaul, including the shift to centralized S3-based data lakes, OpenTelemetry instrumentation, and ClickHouse-backed query engines. Deep proficiency in at least one modern programming language (e.g., Go, Python, Java). Familiarity with high-cardinality data challenges and telemetry correlation techniques. …
Overview: This software platform is on a mission to make data less of a headache and more of a superpower. By leaving behind outdated, costly methods, it delivers fast monitoring of logs, metrics, traces, and security events while saving customers …
London, South East, England, United Kingdom Hybrid / WFH Options
Morela
working with cutting-edge technology. You will design, implement, and optimise IT Operations solutions across observability, AIOps, and ITSM platforms, help clients adopt best practices in Event Management and OpenTelemetry, and act as a trusted technical advisor bridging technology and business strategy. You’ll also help develop frameworks, accelerators, and methodologies that define how the company delivers its services. Skills … related technical roles. Hands-on experience with observability platforms: Dynatrace, AppDynamics, Datadog. Experience with AIOps/ITSM tools: BigPanda, Splunk ITSM, ServiceNow, or equivalent. Expertise in Event Management and OpenTelemetry. Strong knowledge of ITSM/ITIL frameworks and Enterprise Architecture principles. Proven experience delivering solutions to large enterprise clients. Ability to bridge technical delivery with business strategy, advising senior stakeholders …
London, South East, England, United Kingdom Hybrid / WFH Options
Get Staffed Online Recruitment Limited
Senior C# .NET Developer. Mayfair, London. Hybrid (1-2 days per week in the office). £68,000 - £72,000 per annum. The Role: Our client is hiring a backend C# .NET Developer to join their engineering team in a hybrid …
London, St James's, United Kingdom Hybrid / WFH Options
Stock in the Channel
Senior C# .NET Developer. Mayfair, London. Hybrid (1-2 days per week in the office). £68,000 - £72,000 per annum. The Role: We’re hiring a backend C# .NET Developer to join our engineering team in a hybrid role …