related field. 5+ years of experience as a Site Reliability Engineer or in a similar role. Proficient in application and infrastructure observability (Splunk OpenTelemetry preferred). Experienced in production environments running in AWS. Comfortable with Infrastructure as Code (Terraform preferred). Comfortable with CI/CD pipelines such as GitHub…
Prometheus, Logz.io, SignalFx, Instana, Splunk, Honeycomb, Jaeger. Hands-on experience with Infrastructure as Code (Terraform/Ansible). Hands-on experience with technical integrations (OpenTelemetry/Fluentd/Fluent Bit/Filebeat/Logstash). Hands-on experience with complex troubleshooting of Kubernetes and Docker containers. Good knowledge of regex, Lucene, and PromQL…
Warwick, Warwickshire, United Kingdom Hybrid / WFH Options
ICEO
programming language (Python, GoLang, C++, or Java). Solid experience with Terraform for IaC. Hands-on skills with observability tools (Prometheus, Grafana, ELK stack, OpenTelemetry) and logging pipelines (Kibana, Elasticsearch). Expertise in Docker and container orchestration using Kubernetes (preferably on GCP) and Helm. Familiarity with CI/CD systems…
Warwick, Warwickshire, United Kingdom Hybrid / WFH Options
ICEO
to implement redundancy and disaster recovery scenarios. Track record in scaling high-efficiency production systems. Proficiency with observability tools (e.g., Prometheus, Grafana, Grafana Mimir, OpenTelemetry). Strong written and spoken English (B2 level or higher). Nice to Have: Experience with Argo CD and Argo Rollouts. Familiarity with technologies such…
Code (IaC): Proficiency with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation. Distributed Tracing: Experience with distributed tracing tools like Jaeger or OpenTelemetry for debugging microservices. Security: Strong knowledge of securing microservices, Kubernetes clusters, and cloud-based applications. Additional Information: We believe that coming together as a community…
and performance. Experience in implementing observability, instrumenting applications to provide insights into system performance. Hands-on experience with tools such as Dynatrace, Prometheus and OpenTelemetry for monitoring, tracing, and real-time alerting is highly sought after. An understanding of microservices and container orchestration with the ability to optimise containerised applications…
SNS, SQS, EventBridge). Knowledge of GraphQL, WebSockets, or real-time data streaming. Exposure to DevOps and observability practices (e.g., Prometheus, Datadog, AWS CloudWatch, OpenTelemetry). Prior experience in leading distributed engineering teams.
discovery/registry frameworks. In-depth knowledge of CI/CD pipelines, automated testing, distributed tracing, and observability tools (e.g., Prometheus, Grafana, Jaeger/OpenTelemetry). Proven skills in event-driven architectures, messaging systems (e.g., RabbitMQ, Kafka), and data modeling across diverse database types. Previous experience with healthcare software, EMR…
best practices. Experience implementing and managing logging solutions (such as the ELK stack). Proficiency with monitoring platforms (such as Prometheus). Familiarity with tracing technologies (including OpenTelemetry or Jaeger). Background in performance optimization and resource allocation. Industry certifications (cloud platforms preferred). Knowledge of Agile development practices. Capability to diagnose and address critical…
and Kubernetes. Manage CI/CD pipelines using GitHub Actions and ensure smooth delivery to production. Own monitoring, alerting, and observability, using tools like OpenTelemetry and Dynatrace. Security & Compliance: Ensure systems are compliant with PCI DSS, PSD2, and SCA. Champion secure coding practices and data protection across services. Collaboration & Mentoring…
infrastructure level. Experience with monitoring and logging tools like Datadog or Grafana's observability stack (Prometheus, Tempo, Loki, Grafana). Familiarity with the open standard OpenTelemetry. Excellent written and verbal communication skills; we're a collaborative team! PLEASE NOTE: Our engineering teams work fully remotely across Europe but we are focusing…
on experience with containerization (Docker, Kubernetes). Strong security mindset with experience in compliance frameworks (SOC, PCI, GDPR). Familiarity with monitoring tools like OpenTelemetry, Instana, or LogicMonitor. Scripting experience (Ruby, Python, Bash) for automation and infrastructure management.
React, GoLang); Proficient in (Azure) cloud platforms and tooling (…, Terraform/OpenTofu, ArgoCD, GitLab); Experienced in using and extending observability tooling like Datadog, Grafana, OpenTelemetry and system/application performance monitoring; Ability to debug, optimize code, and automate routine operational tasks; Deep understanding of infrastructure and software development security best…
Indianapolis, Indiana, United States Hybrid / WFH Options
Eli Lilly and Company
prioritize the needs and experiences of end-users, ensuring that applications are reliable, efficient, and user-friendly. What You Should Bring: Extensive observability background, OpenTelemetry, AWS knowledge, Kubernetes, DevOps practices, monitoring tools (Splunk, AppDynamics, Datadog), scripting (Python), L1 & L2 support, incident management, ITIL, and documentation skills. Experience with disaster recovery…
databases (ideally Postgres, MongoDB). Experience of event streaming (Apache Kafka) would also be beneficial. Familiarity with observability platforms such as Grafana, Zabbix, Prometheus, OpenTelemetry/SigNoz. Experience of mobile telecoms principles and platforms would be advantageous but is not mandatory (such as EPC, DIAMETER/SS7 signalling, GTP and…
skills and experiences are highly desirable: Experience with event-driven architecture and design patterns. Knowledge of the Kubernetes ecosystem, specifically AWS EKS. Proficiency with OpenTelemetry for observability. Previous experience mentoring and guiding junior team members. The Walt Disney Company is an Equal Opportunity Employer. We strive to be a diverse…
SNS, SQS, EventBridge). Knowledge of GraphQL, WebSockets, or real-time data streaming. Exposure to DevOps and observability practices (e.g., Prometheus, Datadog, AWS CloudWatch, OpenTelemetry). Prior experience in leading distributed engineering teams. Carbon60, Lorien & SRG - The Impellam Group STEM Portfolio are acting as an Employment Business in relation to…
transform the TechOps team. Participate in the operational management of OpenShift. Work with technologies such as Ansible, PowerShell, C#, SQL Server, Elastic, Grafana, Prometheus, OpenTelemetry, bare-metal builds, and Hyper-V automation. What we are looking for: Experience in TechOps, especially with Infrastructure as Code. Familiarity with development technologies like C#…
Job ID: 42014 Location: Birmingham : 1 Trinity Park : Bi Position Category: Information Technology Position Type: Employee Regular. LRQA is a global assurance provider with operations in over 100 countries and a mission to delight our customers. We have a diverse…