Warwick, Warwickshire, United Kingdom Hybrid / WFH Options
ICEO
programming language (Python, GoLang, C++, or Java). Solid experience with Terraform for IaC. Hands-on skills with observability tools (Prometheus, Grafana, ELK stack, OpenTelemetry) and logging pipelines (Kibana, Elasticsearch). Expertise in Docker and container orchestration using Kubernetes (preferably on GCP) and Helm. Familiarity with CI/CD systems More ❯
Warwick, Warwickshire, United Kingdom Hybrid / WFH Options
ICEO
to implement redundancy and disaster recovery scenarios. Track record in scaling high-efficiency production systems. Proficiency with observability tools (e.g., Prometheus, Grafana, Grafana Mimir, OpenTelemetry). Strong written and spoken English (B2 level or higher). Nice to Have: Experience with Argo CD and Argo Rollouts. Familiarity with technologies such More ❯
related field. Proven experience as a Site Reliability Engineer or similar role. Proficient in Java, Spring Boot, distributed systems, and modern observability practices (e.g., OpenTelemetry, Prometheus), with strong cross-functional collaboration and knowledge-sharing skills. In-depth knowledge of system architecture, distributed systems, and networking. Experience with cloud platforms (e.g. More ❯
infrastructure level Experience with monitoring and logging tools like DataDog or Grafana's observability stack (Prometheus, Tempo, Loki, Grafana) Familiarity with the open standard OpenTelemetry Excellent written and verbal communication skills, we're a collaborative team! PLEASE NOTE: Our engineering teams work fully remotely across Europe but we are focusing More ❯
Indianapolis, Indiana, United States Hybrid / WFH Options
Eli Lilly and Company
prioritize the needs and experiences of end-users, ensuring that applications are reliable, efficient, and user-friendly. What You Should Bring: Extensive observability background, OpenTelemetry, AWS knowledge, Kubernetes, DevOps practices, monitoring tools (Splunk, AppDynamics, Datadog), scripting (Python), L1 & L2 support, incident management, ITIL, and documentation skills. Experience with disaster recovery More ❯
London, South East England, United Kingdom Hybrid / WFH Options
Lorien
on a hybrid basis at their offices in London. Essential Skills Python, Pytest Common Python libraries such as Pandas/NumPy/Jupyter notebooks OpenTelemetry Git/GitHub GitHub Actions Docker Microservices and/or lambdas PostgreSQL Streamlit Desirable skills include: The FastAPI ecosystem (Pydantic, SQLAlchemy, Alembic) AWS – including More ❯
and the technologies that revolve around them Demonstrate significant experience with implementing code with Go, C++ or Python Develop projects with technologies such as OpenTelemetry, CSI, CNI, CI/CD tooling, Load Balancing, Service Mesh frameworks Work in a way that works for you FlexBase, Akamai's Global Flexible Working More ❯
tolerant systems with strong recovery mechanisms and failover strategies to maintain service continuity. Implement comprehensive logging and tracing using tools such as zap, klog, OpenTelemetry, and Jaeger to enhance monitoring and troubleshooting. Apply Test-Driven Development (TDD) and engage in Pair Programming to ensure high code quality and promote team More ❯
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Tbwa Chiat/Day Inc
or DBaaS environment. Strong understanding of cloud infrastructure components (e.g., compute, storage, networking) and their cost drivers. Experience with observability tools (e.g., Prometheus, Grafana, OpenTelemetry) and a deep understanding of monitoring and alerting best practices. Exceptional communication skills, capable of articulating complex technical concepts to diverse audiences. Demonstrated ability to More ❯
you'll help influence and shape both the strategy and implementation of our evolving observability capabilities across the Birdie system; you'll leverage OpenTelemetry and SRE practices to support squads in proactively identifying issues before they impact customers; you'll play a vital role in building and maintaining our More ❯
Arlington, Virginia, United States Hybrid / WFH Options
Two Six Technologies
driven architecture and streaming data Experience identifying and remediating performance bottlenecks in a large microservice software architecture Have worked with telemetry solutions such as OpenTelemetry Two Six Technologies is committed to providing competitive and comprehensive compensation packages that reflect the value we place on our employees and their contributions. We More ❯
approaches for improving system reliability, auditing, and financial reconciliation accuracy. Open Standards: Support our commitment to observability and open standards. Contribute to initiatives around OpenTelemetry, OpenAPI, and other tools that improve transparency and traceability across services. About you At least 5 years of professional experience in software development, with a More ❯
Dundee, Angus, United Kingdom Hybrid / WFH Options
Ivanti
Tools: Help deploy and manage observability platforms such as Azure Application Insights (AppInsights), New Relic, Prometheus, and Grafana. Support Distributed Tracing & Telemetry: Work with OpenTelemetry to collect and export telemetry data for better system insights and debugging. Optimize Logging & Metrics Collection: Assist in implementing structured logging and improving system performance … experience in observability, monitoring, or DevOps-related roles. Basic experience with monitoring tools such as Azure AppInsights, New Relic, Prometheus, and Grafana. Understanding of OpenTelemetry, New Relic, AppInsights APM for telemetry data collection. Familiarity with AWS and Azure cloud environments. Exposure to Kubernetes and container monitoring. Basic scripting knowledge (Python More ❯
London, South East England, United Kingdom Hybrid / WFH Options
Computappoint
Observability Engineer - Grafana IR35 Status: Inside IR35 Rate: £700/day Contract Length: Initial 6 months Office Location: Central London Hybrid Model: 3 days per week in office, 2 days remote About the Role: As an experienced Observability Engineer, you More ❯
and operational automation. AI & Observability Product Development Work with engineering teams to develop an AI-driven observability and automation platform, leveraging: Telemetry ingestion (Kafka, OpenTelemetry, Fluentd). Streaming analytics (Flink, Spark, CEP engines). AI-driven anomaly detection & automation (AutoGPT, LangChain, MLflow, TensorFlow). Define technical requirements and architecture priorities … and executing product roadmaps, from idea to launch and scale. Hands-on experience with telemetry data (logs, metrics, traces) and IT infrastructure monitoring (e.g., OpenTelemetry, Prometheus, ELK, Splunk, ITRS Geneos, Datadog, Dynatrace, etc.). Knowledge of AI/ML frameworks (TensorFlow, PyTorch, MLflow) and automation tools (Terraform, Ansible, ServiceNow ITSM More ❯
Manchester Area, United Kingdom Hybrid / WFH Options
bet365
Who we are looking for A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability More ❯
Stoke-On-Trent, England, United Kingdom Hybrid / WFH Options
bet365
Who we are looking for A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability More ❯
happens, and how we grow. What we are looking for: Our ideal candidate would be an experienced technical communicator who has substantial experience with OpenTelemetry (collector, SDKs, OTLP, and/or semantic conventions), who loves OSS and wants to participate in OSS communities. We're looking for someone who is … excited about teaching and engaging with the world in person and wants to make OpenTelemetry better through technical and communication contributions. If this sounds like you and you stay on top of what's happening in the observability world, we should talk. We strongly encourage under-represented groups to apply. As … how to build with OpenTelemetry. You will participate in events, author content, write code, appear in videos, and take part in many other programs. Previous engagement with OpenTelemetry is an absolute must, and it is a big plus if you already participate meaningfully in any other open source communities such as Kubernetes More ❯
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
bet365 Group
A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will monitor More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Sanderson Recruitment
Role: Site Reliability Engineer Location: London (Hybrid) Salary: £80,000 - £105,000 As our Site Reliability Engineer, you'll work closely with our feature team and other colleagues to meet defined service level objectives and continually improve systems and environments. More ❯