/Accounts - AWS Control Tower, GCP Resource Manager, etc. Network - AWS Transit Gateway, GCP Shared VPC, AWS Route53, GCP Cloud DNS, etc. Observability - AWS OpenSearch, GCP Monitoring/Traces, OpenTelemetry, Grafana, Prometheus, etc. Automation Prowess: Hands-on experience with modern Infrastructure as Code (IaC) automation tools and frameworks (Terraform, Jenkins, Ansible, etc.). Software Development Acumen: A software development background …
london, south east england, united kingdom Hybrid / WFH Options
Deutsche Bank
and programming languages such as C++, Java or Python. Strong understanding of distributed systems and low-latency architectures. Hands-on experience with observability stacks (e.g., Prometheus, Grafana, Splunk, Geneos, OpenTelemetry) and infrastructure automation (e.g., Ansible, Terraform, CI/CD pipelines). Strong understanding of the trade lifecycle, market data, and fixed income products; FX or algorithmic trading experience is a plus.
artifact promotion, and release gating into the SDLC. Ensure pipeline scalability and governance while maintaining developer velocity. Observability & Troubleshooting: Lead the implementation and usage of modern observability stacks (e.g., OpenTelemetry, Prometheus, Grafana, Splunk, Datadog). Establish SLOs, SLIs, and error budgets with product and engineering teams. Drive root cause identification using distributed tracing, advanced log analysis, and anomaly detection. Security …
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
Modernise our infrastructure by leading the migration from Docker Swarm to Kubernetes. Design and operate CI/CD pipelines using CloudBees and GitLab. Build out observability with Prometheus, Grafana, OpenTelemetry, and Dynatrace. Automate cloud deployments (AWS-first) using Terraform and platform tooling. Improve security posture across IAM, secrets, and networking. Help the team ship faster and safer by mentoring on … distributed systems at scale in production. Cloud: AWS (primary), Kubernetes (future), Docker (current), Terraform. Excellent debugging skills across network, systems, and data stack. Observability tooling, e.g. custom metrics pipelines, OpenTelemetry tracing, or integrations across telemetry stacks. Security engineering and practical understanding of IAM hardening, zero-trust network principles, and secrets management in data-heavy systems. Passion for building reliable, secure …
experience. What will help you succeed: Preferred Requirements: Experience with query languages such as SQL, SPL, or KQL. Experience with observability and log collectors/pipelines such as Fluent Bit, OpenTelemetry, Cribl, and Logstash. Experience with web technologies such as HTML, CSS, and JavaScript. Experience with programming/scripting side technologies such as Java, .NET, PHP, Go, Node.js and database. Advanced …
platform, writing new monitoring queries to drive our alerting, or coordinating across multiple teams to manage the response to an incident. Our technology stack: AWS (including ECS and RDS), OpenTelemetry, New Relic, Python, Postgres, Liquibase, Angular, Docker. Who you are: Four or more years of professional experience in a customer-facing technical support or engineering role. Excellent verbal and written communication skills …
At Anaplan, we are a team of innovators who are focused on optimizing business decision-making through our leading scenario planning and analysis platform so our customers can outpace their competition and the market. What unites Anaplanners across teams and …
between Google's Load Balancer and the HTTP server in our main Elixir application causing HTTP 5XX responses to be returned to our customers. - Debugging an issue in our OpenTelemetry pipelines causing us to silently drop spans. - An enthusiasm for both software development and systems engineering. - A high bar for code and configuration quality and readability. - A good understanding of … to managing our Kubernetes configuration, using ArgoCD and Helm. - We manage a high-availability metrics collection system using Grafana, Thanos & Prometheus. We're in the process of transitioning to OpenTelemetry and Honeycomb for our application telemetry (traces and metrics). - We manage a data pipeline using Pub/Sub, Airbyte, and dbt. Our Current Focus: We're currently driving a … how we think about and monitor reliability across the engineering organisation, with a focus on early detection of customer-impacting issues. We're extending and standardising our use of OpenTelemetry, and introducing Honeycomb as the single place for engineers to understand how our applications are operating in production. This project involves both technical work, on the application libraries and infrastructure …
Proven experience in building and scaling observability platforms in a cloud-native environment. Observability Expertise: Deep understanding of observability pillars (metrics, logs, traces) and related tools (e.g., Prometheus, Grafana, OpenTelemetry, Jaeger, Kibana/Elastic Stack). AI/ML Proficiency: Hands-on experience integrating ML/AI models into observability systems to drive advanced insights, anomaly detection, and predictive analysis. Distributed …
Back-end Engineer (Go). Application Deadline: 5 September 2025. Department: Technology. Employment Type: Full Time. Location: Belfast. Reporting To: Noel. Description: Imagine catching criminals before they strike. That's exactly what Napier's AI-powered platform does! By analysing transactions …
lifecycle tools, model monitoring, and versioning. Exposure to tools like KServe, Ray Serve, Triton, or vLLM is a big plus. Bonus Points: Experience with observability frameworks like Prometheus or OpenTelemetry. Knowledge of ML libraries: TensorFlow, PyTorch, Hugging Face. Exposure to Azure or GCP. Passion for financial services. Qualifications: Degree in Computer Science, Engineering, Data Science, or similar. What We Offer: A …
implement security controls at the infrastructure level. Experience with monitoring and logging tools like Datadog or Grafana's observability stack (Prometheus, Tempo, Loki, Grafana). Familiarity with the open standard OpenTelemetry. Excellent written and verbal communication skills; we're a collaborative team! PLEASE NOTE: Our engineering teams work fully remotely across Europe but we are focusing our hiring strategy on these …
systems administration combined with strong SQL skills and proficiency in scripting languages such as Python or Java. Demonstrated experience with monitoring and observability tools including Prometheus, Grafana, Splunk, Geneos, OpenTelemetry or Corvil is highly desirable. Familiarity with cloud platforms as well as containerisation technologies like Kubernetes or Docker alongside CI/CD pipeline management is important for this role. Comprehensive …
the backend APIs that power it. Your work will bring our entire platform to life, from schema management and composition checks to advanced analytics and distributed tracing powered by OpenTelemetry and ClickHouse. We are looking for a hands-on technical leader who can seamlessly integrate these distinct technology stacks to solve complex challenges for our enterprise customers, ensuring our platform … and maintaining the backend APIs and services that power the Studio, primarily using Go and TypeScript/Node.js. Owning and evolving the user experience for our observability stack (OpenTelemetry, ClickHouse) and features like Role-Based Access Control (RBAC) to ensure deep, actionable insights into our users' systems. Collaborating on our command-line tool (wgc) and platform SDKs to create …
patterns, and packaging. Familiarity with building performant and reliable Python systems, including low-level C/C++ extensions (e.g., using pybind11, Cython) and instrumentation for production telemetry (e.g., Prometheus, OpenTelemetry). A proactive ownership mindset and the ability to navigate ambiguity. Excellent collaboration and communication skills for working effectively with teams and stakeholders. Ideally: Professional experience in GPGPU programming (e.g., CUDA …
the evolution of its query compiler, plugin system, and overall performance, ensuring it scales to meet the demands of the largest enterprises while integrating seamlessly with our observability stack (OpenTelemetry, ClickHouse) and the rest of the Cosmo platform. TEAM INTEGRATION: You align with the CTO. You collaborate closely with the entire engineering team, product managers, and directly with customers. A … the router's Go-based plugin system, enabling deep, performant customization for enterprise users. Ensuring the router integrates seamlessly with our observability stack, exporting rich metrics and traces via OpenTelemetry to platforms like ClickHouse. Embedding security best practices directly into the router, implementing features like JWT authentication and ensuring it meets enterprise and SOC 2 compliance standards. Mentoring other engineers …
experience, some of which should have a focus on observability. Excellent knowledge and hands-on experience with monitoring, logging, and tracing tools such as Prometheus, VictoriaMetrics, Grafana, Datadog, New Relic, OpenTelemetry, ELK Stack, or similar. Experience with high-volume data storage (structured and unstructured). A strong technical background, with current capabilities and willingness to get hands-on when needed. Excellent …