to translate complexity into clarity Experience with Terraform, Helm, or GitOps tooling Familiarity with front-end technologies such as React and TypeScript Exposure to GraphQL, observability stacks (e.g., Prometheus, OpenTelemetry), or large-scale data platforms Prior work in regulated industries (BFSI, telecom, public sector) To succeed in this role, you'll bring more than just technical knowledge. You'll demonstrate
lifecycle tools, model monitoring, and versioning Exposure to tools like KServe, Ray Serve, Triton, or vLLM is a big plus Bonus Points Experience with observability frameworks like Prometheus or OpenTelemetry Knowledge of ML libraries: TensorFlow, PyTorch, HuggingFace Exposure to Azure or GCP Passion for financial services Qualifications Degree in Computer Science, Engineering, Data Science, or similar What We Offer A
and programming languages such as C++, Java or Python. Strong understanding of distributed systems and low-latency architectures Hands-on experience with observability stacks (e.g., Prometheus, Grafana, Splunk, Geneos, OpenTelemetry) and infrastructure automation (e.g., Ansible, Terraform, CI/CD pipelines) Strong understanding of the trade lifecycle, market data, and fixed income products; FX or algorithmic trading experience is a plus
infrastructure, CI/CD pipelines, and cloud networking Demonstrated ability to manage application observability, triage and monitoring Technical Competencies (Desirable): Familiarity with Auth0, AWS Cognito, Helm, Prometheus/Grafana, OpenTelemetry or Honeycomb Experience with CI/CD pipelines for containerised and serverless environments Knowledge of additional cloud platforms such as GCP or Azure Benefits Market-leading salary
/Accounts - AWS Control Tower, GCP Resource Manager, etc. Network - AWS Transit Gateway, GCP Shared VPC, AWS Route53, GCP Cloud DNS, etc. Observability - AWS OpenSearch, GCP Monitoring/Traces, OpenTelemetry, Grafana, Prometheus, etc. Automation Prowess: Hands-on experience with modern Infrastructure as Code (IaC) automation tools and frameworks (Terraform, Jenkins, Ansible, etc.). Software Development Acumen: A software development background
experience, some of which should have a focus on Observability. Excellent knowledge and hands-on experience with monitoring, logging, and tracing tools such as Prometheus, VictoriaMetrics, Grafana, Datadog, New Relic, OpenTelemetry, ELK Stack, or similar. Experience with high-volume data storage (structured and unstructured). A strong technical background, with current capabilities and willingness to get hands-on when needed. Excellent
systems by design Nice-to-Haves: Exposure to regulated environments (e.g., BFSI, healthcare, public sector) Experience with performance, security, or chaos testing Familiarity with observability tooling (e.g., Prometheus, Grafana, OpenTelemetry) Knowledge of contract testing, mocking, or service virtualization Mindset & Cultural Fit A builder's mindset, focused on enabling early, frequent, and safe delivery through automated confidence A belief that quality
debugging issues in client Windows environments Refactoring components to improve system quality - performance optimizations and network improvements Helping us build up our tools for observability and distributed tracing (using OpenTelemetry and Grafana) Keeping the Mimica platform up-to-date with the latest framework developments and devising innovative solutions in the Intelligent Automation space Documenting procedures and guides to facilitate knowledge
multi-tenant PostgreSQL, sharded MySQL). Strong backend fundamentals around concurrency, caching, indexing and distributed systems trade-offs. Proven track record of setting SLOs, building dashboards (Prometheus/Grafana, OpenTelemetry, etc.) and tuning alerts. Comfort with Kubernetes, IaC and cloud-native patterns; can debug from network to application layer. Start-up bias for action: you prioritise high-leverage fixes, ship
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
Modernise our infrastructure by leading the migration from Docker Swarm to Kubernetes Design and operate CI/CD pipelines using CloudBees and GitLab Build out observability with Prometheus, Grafana, OpenTelemetry, and Dynatrace Automate cloud deployments (AWS-first) using Terraform and platform tooling Improve security posture across IAM, secrets, and networking Help the team ship faster and safer by mentoring on … distributed systems at scale in production. Cloud AWS (primary), Kubernetes (future), Docker (current), Terraform. Excellent debugging skills across network, systems, and data stack. Observability tooling, e.g. custom metrics pipelines, OpenTelemetry tracing, or integrations across telemetry stacks. Security engineering and practical understanding of IAM hardening, zero-trust network principles, and secrets management in data-heavy systems. Passion for building reliable, secure
in software delivery, CI/CD, observability, and infrastructure-as-code. Drive improvements in telemetry and observability, helping us move from log-centric metrics to first-class telemetry using OpenTelemetry and modern observability stacks. Optimise for performance, helping the platform scale for low-latency, high-throughput demands in real-time sports data delivery. Mentor and guide engineers, promoting a strong … e.g., RabbitMQ, Kafka). Strong grasp of telemetry, observability, and performance monitoring in distributed systems. Track record of technical leadership and setting engineering standards. Nice to Have: Experience with OpenTelemetry, Prometheus, Grafana, or similar observability tooling. Exposure to hybrid-cloud or cloud migration strategies. Familiarity with performance optimisation in low-latency data pipelines. Contributions to DevOps-related communities, blogs, open
engineering in Go. You will not only architect our internal systems for scale but also build and operate key product infrastructure, including our customer-facing telemetry pipeline (built on OpenTelemetry and ClickHouse) and the AI pipeline that empowers our products. We are looking for a hands-on technical leader, driven by the challenge of solving ambiguous, 'eBay-scale' problems, whether … on, but is not limited to: Architecting, building, and operating the core cloud-native infrastructure for WunderGraph Cosmo, primarily using Go and Kubernetes. Owning and evolving our observability stack (OpenTelemetry, Prometheus, ClickHouse) and the infrastructure supporting our AI-driven features to ensure deep, actionable insights into our systems. Building and optimizing CI/CD pipelines to improve build times, automate … system architecture, distributed systems, and the challenges of running high-performance API gateways. Familiarity with GraphQL Federation is a significant plus. Experience building or managing modern observability stacks (e.g., OpenTelemetry, Prometheus, Grafana, ClickHouse). A self-starter attitude and a leader's mindset: you are comfortable with ambiguity, can identify and solve ill-defined problems, and don't need hand
About Birdie Birdie is the leading home healthcare technology platform that aims to radically transform the lives of older adults. Its all-in-one solution supports around 4.8 million (and growing) care visits every month, equipping care providers with the
AI workloads, multi-language environments, and cloud infrastructure that's designed to be straightforward to set up and maintain. We build with technologies developers actually want to work with: OpenTelemetry for standardized instrumentation SQL for intuitive querying (no proprietary query language to learn) Rust, Python, and TypeScript for performance and productivity Postgres, DataFusion, and object storage for scalable backends Unlike … the standard for AI application development. We're signatories of the open source pledge and build on open standards because we believe in interoperability, not lock-in. Use our OpenTelemetry-based SDK with any compatible backend - we're confident you'll choose us on merit. We're backed by Sequoia Capital and run a fully remote team across multiple time
Guildford, Surrey, United Kingdom Hybrid / WFH Options
Electronic Arts
Locations: Guildford, Surrey, United Kingdom Role ID 209795 Worker Type Regular Employee Studio/Department Other Work Model Hybrid Description & Requirements Electronic Arts creates next-level entertainment experiences that inspire players and fans around the world. Here, everyone is part
the evolution of its query compiler, plugin system, and overall performance, ensuring it scales to meet the demands of the largest enterprises while integrating seamlessly with our observability stack (OpenTelemetry, ClickHouse) and the rest of the Cosmo platform. TEAM INTEGRATION You align with the CTO. You collaborate closely with the entire engineering team, product managers, and directly with customers. A … the router's Go-based plugin system, enabling deep, performant customization for enterprise users. Ensuring the router integrates seamlessly with our observability stack, exporting rich metrics and traces via OpenTelemetry to platforms like ClickHouse. Embedding security best practices directly into the router, implementing features like JWT authentication and ensuring it meets enterprise and SOC 2 compliance standards. Mentoring other engineers
Analytics and conversion optimization Growth hacking and experimentation frameworks AI automation tools for marketing workflows (content generation, personalization, analytics) Nice to haves: Experience in the observability/monitoring space (OpenTelemetry, APM tools) Background working with AI/ML products or communities Hands-on experience with LLMs and AI agents for marketing automation Track record of building developer communities Experience with … AI workloads, multi-language environments, and cloud infrastructure that's designed to be straightforward to set up and maintain. We build with technologies developers actually want to work with: OpenTelemetry for standardized instrumentation SQL for intuitive querying (no proprietary query language to learn) Rust, Python, and TypeScript for performance and productivity Postgres, DataFusion, and object storage for scalable backends Unlike … the standard for AI application development. We're signatories of the open source pledge and build on open standards because we believe in interoperability, not lock-in. Use our OpenTelemetry-based SDK with any compatible backend - we're confident you'll choose us on merit. We're backed by Sequoia Capital and run a fully remote team across multiple time
the backend APIs that power it. Your work will bring our entire platform to life, from schema management and composition checks to advanced analytics and distributed tracing powered by OpenTelemetry and ClickHouse. We are looking for a hands-on technical leader who can seamlessly integrate these distinct technology stacks to solve complex challenges for our enterprise customers, ensuring our platform … and maintaining the backend APIs and services that power the Studio, primarily using Go and TypeScript/Node.js. Owning and evolving the user experience for our observability stack (OpenTelemetry, ClickHouse) and features like Role-Based Access Control (RBAC) to ensure deep, actionable insights into our users' systems. Collaborating on our command-line tool (wgc) and platform SDKs to create