in: Languages: Java 17+ (Java 21 preferred) Frameworks: Micronaut (preferred), Spring Boot Testing: JUnit, Mockito Build Tools: Gradle Data & Messaging: Kafka, MongoDB APIs: GraphQL Federation, REST Infrastructure & Observability: Terraform, OpenTelemetry, Dynatrace Soft Skills & Leadership Exceptional communication skills - able to distill and present engineering decisions to executives and business teams. Experienced in managing relationships with third-party vendors and platform providers. More ❯
platform, writing new monitoring queries to drive our alerting, or coordinating across multiple teams to manage the response to an incident. Our technology stack: AWS (including ECS and RDS), OpenTelemetry, NewRelic, Python, Postgres, Liquibase, Angular, Docker Who you are: Four or more years professional experience in a customer-facing technical support or engineering role Excellent verbal and written communication skills More ❯
multi-tenant PostgreSQL, sharded MySQL). Strong backend fundamentals around concurrency, caching, indexing and distributed systems trade-offs. Proven track record of setting SLOs, building dashboards (Prometheus/Grafana, OpenTelemetry, etc.) and tuning alerts. Comfort with Kubernetes, IaC and cloud-native patterns; can debug from network to application layer. Start-up bias for action: you prioritise high-leverage fixes, ship More ❯
you will work on the best-in-class open-banking decision-making platform, and learn how to operate with low latency, at scale. Our technology stack: Python (including FastAPI, OpenTelemetry, procrastinate, SQLAlchemy, Uvicorn), Postgres, MySQL, Liquibase, Retool, Docker, AWS Who you are: Three or more years professional experience in software engineering Proficiency in writing well-structured Python code with type More ❯
in distributed, real-time systems Experience with containerisation and orchestration technologies, such as Kubernetes, in production environments Familiarity with observability tooling and practices, such as Victoria Metrics, Prometheus, Grafana, OpenTelemetry and SLOs Well-developed debugging skills with the ability to navigate unfamiliar systems, identify root causes and deliver effective solutions under time pressure Proven track record of contributing to fault More ❯
Implement secure architecture and platform hardening aligned with defence-grade standards, supporting identity, access control, encryption, and system resilience. Monitoring & Continuous Improvement Set up and maintain monitoring solutions (e.g. ELK, OpenTelemetry, Prometheus), troubleshoot performance, and deliver root cause analysis and remediation. What We're Looking For DV Clearance: Active Developed Vetting clearance is essential. Systems Engineering Experience: 2nd/3rd More ❯
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
Modernise our infrastructure by leading the migration from Docker Swarm to Kubernetes Design and operate CI/CD pipelines using CloudBees and GitLab Build out observability with Prometheus, Grafana, OpenTelemetry, and Dynatrace Automate cloud deployments (AWS-first) using Terraform and platform tooling Improve security posture across IAM, secrets, and networking Help the team ship faster and safer by mentoring on … distributed systems at scale in production. Cloud AWS (primary), Kubernetes (future), Docker (current), Terraform. Excellent debugging skills across network, systems, and data stack. Observability tooling, e.g. custom metrics pipelines, OpenTelemetry tracing, or integrations across telemetry stacks. Security engineering and practical understanding of IAM hardening, zero-trust network principles, and secrets management in data-heavy systems. Passion for building reliable, secure More ❯
between Google's Load Balancer and the HTTP server in our main Elixir application causing HTTP 5XX responses to be returned to our customers. - Debugging an issue in our OpenTelemetry pipelines causing us to silently drop spans. - An enthusiasm for both software development and systems engineering. - A high bar for code and configuration quality and readability. - A good understanding of … to managing our Kubernetes configuration, using ArgoCD and Helm. - We manage a high-availability metrics collection system using Grafana, Thanos & Prometheus. We're in the process of transitioning to OpenTelemetry and Honeycomb for our application telemetry (traces and metrics). - We manage a data pipeline using Pub/Sub, Airbyte, and dbt. Our Current Focus We're currently driving a … how we think about and monitor reliability across the engineering organisation, with a focus on early detection of customer-impacting issues. We're extending and standardising our use of OpenTelemetry, and introducing Honeycomb as the single place for engineers to understand how our applications are operating in production. This project involves both technical work, on the application libraries and infrastructure More ❯
in software delivery, CI/CD, observability, and infrastructure-as-code. Drive improvements in telemetry and observability, helping us move from log-centric metrics to first-class telemetry using OpenTelemetry and modern observability stacks. Optimise for performance, helping the platform scale for low-latency, high-throughput demands in real-time sports data delivery. Mentor and guide engineers, promoting a strong … e.g., RabbitMQ, Kafka). Strong grasp of telemetry, observability, and performance monitoring in distributed systems. Track record of technical leadership and setting engineering standards. Nice to Have: Experience with OpenTelemetry, Prometheus, Grafana, or similar observability tooling. Exposure to hybrid-cloud or cloud migration strategies. Familiarity with performance optimisation in low-latency data pipelines. Contributions to DevOps-related communities, blogs, open More ❯
engineering in Go. You will not only architect our internal systems for scale but also build and operate key product infrastructure, including our customer-facing telemetry pipeline (built on OpenTelemetry and ClickHouse) and the AI pipeline that empowers our products. We are looking for a hands-on technical leader, driven by the challenge of solving ambiguous, 'eBay-scale' problems - whether … on, but is not limited to: Architecting, building, and operating the core cloud-native infrastructure for WunderGraph Cosmo, primarily using Go and Kubernetes. Owning and evolving our observability stack (OpenTelemetry, Prometheus, ClickHouse) and the infrastructure supporting our AI-driven features to ensure deep, actionable insights into our systems. Building and optimizing CI/CD pipelines to improve build times, automate … system architecture, distributed systems, and the challenges of running high-performance API gateways. Familiarity with GraphQL Federation is a significant plus. Experience building or managing modern observability stacks (e.g., OpenTelemetry, Prometheus, Grafana, ClickHouse). A self-starter attitude and a leader's mindset: you are comfortable with ambiguity, can identify and solve ill-defined problems, and don't need hand More ❯
Back-end Engineer (Go) Application Deadline: 5 September 2025 Department: Technology Employment Type: Full Time Location: Belfast Reporting To: Noel Description Imagine catching criminals before they strike - that's exactly what Napier's AI-powered platform does! By analysing transactions More ❯
About Birdie Birdie is the leading home healthcare technology platform that aims to radically transform the lives of older adults. Its all-in-one solution supports around 4.8 million (and growing) care visits every month, equipping care providers with the More ❯
At Anaplan, we are a team of innovators who are focused on optimizing business decision-making through our leading scenario planning and analysis platform so our customers can outpace their competition and the market. What unites Anaplanners across teams and More ❯
AI workloads, multi-language environments, and cloud infrastructure that's designed to be straightforward to set up and maintain. We build with technologies developers actually want to work with: OpenTelemetry for standardized instrumentation SQL for intuitive querying (no proprietary query language to learn) Rust, Python, and TypeScript for performance and productivity Postgres, DataFusion, and object storage for scalable backends Unlike … the standard for AI application development. We're signatories of the open source pledge and build on open standards because we believe in interoperability, not lock-in. Use our OpenTelemetry-based SDK with any compatible backend - we're confident you'll choose us on merit. We're backed by Sequoia Capital and run a fully remote team across multiple time More ❯
Guildford, Surrey, United Kingdom Hybrid / WFH Options
Electronic Arts
Locations : Guildford, Surrey, United Kingdom Role ID 209795 Worker Type Regular Employee Studio/Department Other Work Model Hybrid Description & Requirements Electronic Arts creates next-level entertainment experiences that inspire players and fans around the world. Here, everyone is part More ❯
the evolution of its query compiler, plugin system, and overall performance, ensuring it scales to meet the demands of the largest enterprises while integrating seamlessly with our observability stack (OpenTelemetry, ClickHouse) and the rest of the Cosmo platform. TEAM INTEGRATION You align with the CTO. You collaborate closely with the entire engineering team, product managers, and directly with customers. A … the router's Go-based plugin system, enabling deep, performant customization for enterprise users. Ensuring the router integrates seamlessly with our observability stack, exporting rich metrics and traces via OpenTelemetry to platforms like ClickHouse. Embedding security best practices directly into the router, implementing features like JWT authentication and ensuring it meets enterprise and SOC 2 compliance standards. Mentoring other engineers More ❯
Analytics and conversion optimization Growth hacking and experimentation frameworks AI automation tools for marketing workflows (content generation, personalization, analytics) Nice to haves: Experience in the observability/monitoring space (OpenTelemetry, APM tools) Background working with AI/ML products or communities Hands-on experience with LLMs and AI agents for marketing automation Track record of building developer communities Experience with … AI workloads, multi-language environments, and cloud infrastructure that's designed to be straightforward to set up and maintain. We build with technologies developers actually want to work with: OpenTelemetry for standardized instrumentation SQL for intuitive querying (no proprietary query language to learn) Rust, Python, and TypeScript for performance and productivity Postgres, DataFusion, and object storage for scalable backends Unlike … the standard for AI application development. We're signatories of the open source pledge and build on open standards because we believe in interoperability, not lock-in. Use our OpenTelemetry-based SDK with any compatible backend - we're confident you'll choose us on merit. We're backed by Sequoia Capital and run a fully remote team across multiple time More ❯
the backend APIs that power it. Your work will bring our entire platform to life, from schema management and composition checks to advanced analytics and distributed tracing powered by OpenTelemetry and ClickHouse. We are looking for a hands-on technical leader who can seamlessly integrate these distinct technology stacks to solve complex challenges for our enterprise customers, ensuring our platform … and maintaining the backend APIs and services that power the Studio, primarily using Go and TypeScript/Node.js. Owning and evolving the user experience for our observability stack (OpenTelemetry, ClickHouse) and features like Role-Based Access Control (RBAC) to ensure deep, actionable insights into our users' systems. Collaborating on our command-line tool (wgc) and platform SDKs to create More ❯