position. Hands-on experience with incident response, including designing and improving incident management processes. Expertise in observability practices, including metrics, logs, and traces, and an understanding of distributed tracing tools (e.g., OpenTelemetry). Strong problem-solving skills with a focus on building resilient, fault-tolerant systems. Excellent communication skills and a collaborative mindset. Must have SEC+ or higher certification or ability …
Hawthorne, California, United States Hybrid / WFH Options
GCR Professional Services
in-the-loop (HIL) testing environments. Improve monitoring, logging, and debugging capabilities for embedded applications. Manage containerization and virtualization of embedded development environments using tools like Kubernetes, Grafana, and OpenTelemetry. Research and implement best practices for security, performance, and scalability. Automate software releases and version control strategies for embedded firmware. Skills and/or Experience Needed: MS or BS in …
Bash, etc.). Strong problem-solving and analytical abilities. Excellent communication and teamwork skills. Eagerness to learn and adapt in a fast-paced trading environment. Desirable: Experience with metrics and monitoring (OpenTelemetry, Splunk, Prometheus, Grafana, etc.). Experience and knowledge of working with distributed systems. Experience with Kubernetes. Knowledge of networking (HTTP/TCP/UDP/IP). Experience in financial markets. …
Sheffield, South Yorkshire, Yorkshire, United Kingdom Hybrid / WFH Options
VANLOQ LIMITED
Skills: Proven experience in Python development and FastAPI. Strong knowledge of PostgreSQL database administration. Excellent problem-solving, debugging, and analytical skills. Nice to Have: Exposure to observability tools (Prometheus, Grafana, OpenTelemetry). Experience with enterprise tools (Control-M, TrueSight, Guardium, Tenable Nessus, Delinea). Understanding of security and software development in highly regulated environments. End-to-end experience with CI/CD …
/sub messaging frameworks (e.g., ActiveMQ, ZeroMQ). Familiarity with integrating software applications as a suite of independent, small, and modular services (microservices, OSGi). Experience with system monitoring frameworks (Prometheus, OTel, InfluxDB/Telegraf, etc.). Minimum of 3 years' experience with IP network protocols and development of distributed or networked applications. Third-party and subcontract staffing agencies are not eligible for partnership …
Python) or JMeter, with data parameterization and correlation. Manage distributed load generation (containers, cloud workers) to simulate millions of concurrent users. Integrate performance metrics from CloudWatch, Prometheus, Grafana, and OpenTelemetry to analyze system bottlenecks. Develop SLA/SLO dashboards and integrate performance gates into CI/CD pipelines. Collaborate with DevOps and developers to tune JVM, Akka, thread pools, GC …
The world can't wait. You Have: 7+ years of experience measuring service SLIs using custom metrics, logs, and traces and tools such as Prometheus, Grafana, or OpenTelemetry. 7+ years of experience developing Infrastructure as Code (IaC) in Terraform. 7+ years of experience scripting or coding in Python, Go, or Bash. 7+ years of experience designing SLIs, SLOs …
of CI/CD pipelines using GitLab and ArgoCD. Design and operate containerised workloads with EKS, Fargate, and Kubernetes. Manage Kubernetes deployments using Helm charts. Implement observability solutions using OpenTelemetry (OTel), Grafana, and Splunk. Optimise infrastructure with Karpenter for autoscaling and cost efficiency. Ensure robust AWS networking (VPC, Transit Gateway, PrivateLink, Route 53) and enforce security best practices. Drive incident … response, monitoring, and performance tuning. Key Technologies: AWS (EKS, Fargate, EC2, S3), Terraform, CloudFormation, GitLab, ArgoCD, Docker, Kubernetes, Helm, Cassandra, OTel, Grafana, Splunk, Karpenter, Python, Bash. Desirable: Experience with Google Cloud Platform (GCP), Apigee Hybrid, and hybrid/multi-cloud environments. Carbon60, Lorien & SRG - The Impellam Group STEM Portfolio are acting as an Employment Business in relation to this vacancy.
experience in technical integrations and POCs. Comfortable coding in any high-level programming language (Java, Go, Python). Strong hands-on knowledge of Kubernetes, AWS, Azure, GCP, Docker, Prometheus, and OpenTelemetry. Industry knowledge and opinions on monitoring, observability, log management, and SIEM. Engineering/DevOps background - advantage. Experience in technical sales of log analytics/monitoring/APM/SIEM - advantage. Cultural …
capability. Preferred Education, Experience, & Skills: Splunk Certified Cloud Architect, Splunk Certified Admin, or Splunk Certified Power User. Cloud certifications (AWS/Azure/GCP). Experience with observability frameworks (OpenTelemetry), metrics pipelines, and metric-to-log correlation. Prior experience operating at enterprise scale (multi-TB ingestion/day, global deployments). Proficiency with Terraform, Ansible, or similar IaC tools and …
of ITSM/incident management processes and tools (Halo ITSM, ServiceNow, Jira Service Management). Cloud experience (AWS, Azure, GCP) and deploying observability tools in cloud-native environments. Understanding of OpenTelemetry and modern observability standards. Strong problem-solving skills and ability to work in a fast-paced start-up or consulting environment. Why Join: Work with our exclusive client, a high …
London, South East, England, United Kingdom Hybrid / WFH Options
Morela
of ITSM/incident management processes and tools (Halo ITSM, ServiceNow, Jira Service Management). Cloud experience (AWS, Azure, GCP) and deploying observability tools in cloud-native environments. Understanding of OpenTelemetry and modern observability standards. Strong problem-solving skills and ability to work in a fast-paced start-up or consulting environment. Why Join: Work with our exclusive client, a high …
Preferred Qualifications: OpenShift certifications (e.g., Red Hat Certified Specialist in OpenShift Administration). Experience with multi-cluster and hybrid cloud OpenShift deployments. Familiarity with monitoring and logging tools (e.g., OTel, Grafana, Splunk stack). Knowledge of OpenShift Operators and Helm charts. Experience with large-scale migration projects. About WIPRO: Wipro is an exciting organization to work for. We ranked as …
Handsworth, Yorkshire and the Humber, United Kingdom
Wipro
Preferred Qualifications: OpenShift certifications (e.g., Red Hat Certified Specialist in OpenShift Administration). Experience with multi-cluster and hybrid cloud OpenShift deployments. Familiarity with monitoring and logging tools (e.g., OTel, Grafana, Splunk stack). Knowledge of OpenShift Operators and Helm charts. Experience with large-scale migration projects. About WIPRO: Wipro is an exciting organization to work for. We ranked as …
LangSmith. Experience in prompt engineering and generative AI integration. Skills & Tools: Programming: Java, Python, Spring Boot, REST APIs. Cloud: AWS (Glue, Kinesis, EMR, Route 53), containerization, serverless architecture. Monitoring: OpenTelemetry, Dynatrace, LoadRunner, Splunk. Collaboration: Jira, Confluence. Testing: Unit, integration, functional, performance. Agile: SAFe and Agile methodologies. Qualifications: Education: Bachelor's Level Degree (Required). The future is what you make it …
serverless architectures. Deep understanding of CI/CD (GitHub Actions, Jenkins, or AWS CodePipeline). Proven ability to secure and scale production systems. Monitoring and observability tools (CloudWatch, Grafana, OpenTelemetry). Familiar with data exchange formats (JSON, YAML, Parquet) and API design. Leadership & Delivery: 4-8 years in software development and/or DevOps, including 2+ in a management or …
technical experience in Cloud DevOps, SaaS, or observability, with 5+ years in leadership roles. Strong hands-on experience with AWS, GCP, Azure, K8s, Terraform, and observability tools: Prometheus, Grafana, OpenTelemetry, ELK, Splunk, Datadog, and similar. Proficiency with metrics, logs, traces, and APM. Leadership & Global Operations: Proven success leading multi-regional or global technical teams with direct management of managers. Demonstrated …
architectures. Deep understanding of CI/CD (GitHub Actions, Jenkins, or AWS CodePipeline). Proven ability to secure and scale production systems. Monitoring and observability tools (CloudWatch, Grafana, OpenTelemetry). Familiar with data exchange formats (JSON, YAML, Parquet) and API design. Leadership & Delivery: 4-8 years in software development and/or DevOps, including 2+ in a management or team …
Go-based, making it the most effective language for this role. Experience with, or strong interest in, observability tools (Prometheus, Grafana, Loki, Tempo, ELK/OpenSearch, ClickHouse) and standards (OpenTelemetry, OpenTracing, OpenMetrics). Deep understanding of distributed systems and data models. Hands-on experience with Kubernetes and cloud platforms (AWS, GCP, Azure). Benefits: Roku is committed to offering a …
that's building something exceptional. Tech Snapshot (don't worry if you don't know it all): Kotlin, TypeScript, Terraform, Azure/AWS/GCP, Temporal, Postgres, graph databases, OpenTelemetry, Grafana, containerised dev environments, CI/CD pipelines. Perks & Culture: Competitive salary + EMI share options. Breakfast and dinner on tap, plus snacks that raise the bar. Regular socials + …
release validation, and production monitoring. Strong communication skills; can adapt output to technical and non-technical audiences. Bonus Points: Background in QA, test automation, or release engineering. Experience with OpenTelemetry, distributed tracing, or event-driven logs. Experience in continuous delivery environments with real-time observability needs. Prior involvement in incident reviews or quality postmortems. Relevant certifications (e.g., Data Analytics, SQL …