related field. 5+ years of experience as a Site Reliability Engineer or in an equivalent role. Proficient in application and infrastructure observability; Splunk OpenTelemetry preferred. Experienced in production environments running in AWS. Comfortable with Infrastructure as Code; Terraform is preferred. Comfortable with CI/CD pipelines such as GitHub
Prometheus, Logz.io, SignalFX, Instana, Splunk, Honeycomb, Jaeger. Hands-on experience with Infrastructure as Code (Terraform/Ansible). Hands-on experience with technical integrations (OpenTelemetry/fluentd/fluentbit/filebeat/logstash). Hands-on experience with complex troubleshooting of Kubernetes and Docker containers. Good knowledge of Regex, Lucene, PromQL
Warwick, Warwickshire, United Kingdom Hybrid / WFH Options
ICEO
programming language (Python, GoLang, C++, or Java). Solid experience with Terraform for IaC. Hands-on skills with observability tools (Prometheus, Grafana, ELK stack, OpenTelemetry) and logging pipelines (Kibana, Elasticsearch). Expertise in Docker and container orchestration using Kubernetes (preferably on GCP) and Helm. Familiarity with CI/CD systems
Warwick, Warwickshire, United Kingdom Hybrid / WFH Options
ICEO
to implement redundancy and disaster recovery scenarios. Track record in scaling high-efficiency production systems. Proficiency with observability tools (e.g., Prometheus, Grafana, Grafana Mimir, OpenTelemetry). Strong written and spoken English (B2 level or higher). Nice to Have: Experience with Argo CD and Argo Rollouts. Familiarity with technologies such
in cloud-native environments at scale. Exposure to high-load, high-performance systems and large-scale microservices architectures. Experience with observability and monitoring frameworks (OpenTelemetry, Grafana, Prometheus). Knowledge of Graph Databases and AI integration in platform operations is a plus. Experience mentoring junior engineers and leading cross-functional initiatives.
Code (IaC): Proficiency with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation. Distributed Tracing: Experience with distributed tracing tools like Jaeger or OpenTelemetry for debugging microservices. Security: Strong knowledge of securing microservices, Kubernetes clusters, and cloud-based applications. Additional Information: We believe that coming together as a community
SNS, SQS, EventBridge). Knowledge of GraphQL, WebSockets, or real-time data streaming. Exposure to DevOps and observability practices (e.g., Prometheus, Datadog, AWS CloudWatch, OpenTelemetry). Prior experience in leading distributed engineering teams.
best practices. Experience implementing and managing logging solutions (such as ELK stack). Proficiency with monitoring platforms (such as Prometheus). Familiarity with tracing technologies (including OpenTelemetry or Jaeger). Background in performance optimization and resource allocation. Industry certifications (cloud platforms preferred). Knowledge of Agile development practices. Capability to diagnose and address critical
and Kubernetes. Manage CI/CD pipelines using GitHub Actions and ensure smooth delivery to production. Own monitoring, alerting, and observability, using tools like OpenTelemetry and Dynatrace. Security & Compliance: Ensure systems are compliant with PCI DSS, PSD2, and SCA. Champion secure coding practices and data protection across services. Collaboration & Mentoring
on experience with containerization (Docker, Kubernetes). Strong security mindset with experience in compliance frameworks (SOC, PCI, GDPR). Familiarity with monitoring tools like OpenTelemetry, Instana, or LogicMonitor. Scripting experience (Ruby, Python, Bash) for automation and infrastructure management.
databases (ideally Postgres, MongoDB). Experience of event streaming (Apache Kafka) would also be beneficial. Familiarity with observability platforms such as Grafana, Zabbix, Prometheus, OpenTelemetry/SigNoz. Experience of mobile telecoms principles and platforms would be advantageous but is not mandatory (such as EPC, DIAMETER/SS7 signalling, GTP and
skills and experiences are highly desirable: Experience with event-driven architecture and design patterns. Knowledge of the Kubernetes ecosystem, specifically AWS EKS. Proficiency with OpenTelemetry for observability. Previous experience mentoring and guiding junior team members. The Walt Disney Company is an Equal Opportunity Employer. We strive to be a diverse
SNS, SQS, EventBridge). Knowledge of GraphQL, WebSockets, or real-time data streaming. Exposure to DevOps and observability practices (e.g., Prometheus, Datadog, AWS CloudWatch, OpenTelemetry). Prior experience in leading distributed engineering teams. Carbon60, Lorien & SRG - The Impellam Group STEM Portfolio are acting as an Employment Business in relation to
Job ID: 42014 Location: Birmingham : 1 Trinity Park : Bi Position Category: Information Technology Position Type: Employee Regular LRQA is a global assurance provider with operations in over 100 countries and a mission to delight our customers. We have a diverse
owning the delivery of significant functionality, ideally having worked with peers of different levels to complete projects collaboratively. Our technology stack: Python (including FastAPI, OpenTelemetry, procrastinate, SQLAlchemy, Uvicorn), Postgres, MySQL, Liquibase, Retool, Docker, AWS. Who you are: Seven or more years of professional experience in software engineering. Proven experience leading the
roles (e.g. Solutions Architect, Sales Engineering, Pre-Sales). Background in enterprise SaaS, especially in infrastructure monitoring, analytics, or APM. Hands-on expertise with OpenTelemetry, Kubernetes, and modern cloud-native observability stacks. Familiarity with streaming data and real-time metric processing. Experience working in Agile environments and across the full
to drive our alerting, or coordinating across multiple teams to manage the response to an incident. Our technology stack: AWS (including ECS and RDS), OpenTelemetry, NewRelic, Python, Postgres, Liquibase, Angular, Docker. Who you are: Four or more years of professional experience in a customer-facing technical support or engineering role. Excellent
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Tbwa Chiat/Day Inc
or DBaaS environment. Strong understanding of cloud infrastructure components (e.g., compute, storage, networking) and their cost drivers. Experience with observability tools (e.g., Prometheus, Grafana, OpenTelemetry) and a deep understanding of monitoring and alerting best practices. Exceptional communication skills, capable of articulating complex technical concepts to diverse audiences. Demonstrated ability to
you'll help influence and shape both the strategy and implementation of our evolving observability capabilities across the Birdie system; you'll leverage OpenTelemetry and SRE practices to support squads in proactively identifying issues before they impact customers; you'll play a vital role in building and maintaining our