We're looking for a Cloud Platform Engineer to join our infrastructure team. In this role, you'll take ownership of critical data and compute systems that power high-performance analytics at scale. You'll work across ClickHouse, Kafka, OpenSearch, and Kubernetes environments, ensuring everything runs smoothly, securely, and efficiently. If you enjoy solving complex technical challenges and optimizing … large-scale distributed systems, this is the perfect opportunity for you. About Intapp: Intapp, based in Silicon Valley, is a leading Vertical AI SaaS company, collaborating with over 2,550 professional and financial services firms globally. With 20+ years of industry expertise, Intapp's "Intelligence Applied" approach is transforming how businesses operate and leverages a strategic partner … You Will Do: Manage, monitor, and optimize ClickHouse clusters in production environments, including schema design, query tuning, replication setup, and capacity planning. Operate and maintain Kafka, OpenSearch, and other distributed systems, ensuring high performance, scalability, and reliability. Deploy, configure, and manage containerized applications and stateful workloads on Kubernetes, following best practices for security and resource efficiency. Implement and …
to help design and evolve internal compute frameworks that underpin their trading and research platforms. This is a high-impact role, working at the intersection of software engineering and distributed systems, with the opportunity to build scalable tools and frameworks used across the business. What you'll do: Design, develop, and maintain performant, reliable frameworks and services in Python … Build productivity tools and platforms that streamline workflows across investment and technology teams Contribute to code reviews and best practices, improving overall development quality Work across distributed systems, containers, and automation pipelines to deliver scalable solutions What we're looking for: 6+ years' professional software development experience Strong proficiency in high-performance Python (deep ecosystem knowledge and best practices) … Experience with at least one JVM language (Java, Kotlin, Scala) is an advantage Background in distributed systems and large-scale compute frameworks Familiarity with Docker, Kubernetes, Linux environments, and CI/CD automation This role offers the chance to work on complex technical challenges at scale, with direct impact on high-performance computing platforms. If you're a skilled …
Employment Type: Permanent
Salary: £170,000 - £200,000/annum plus Bonus & Package
Guildford, Surrey, England, United Kingdom Hybrid/Remote Options
Jonothan Bosworth
Senior Python/C++ Distributed Systems Engineer Location: Hybrid Salary: £60,000 – £70,000 Type: Permanent About the Role: Our employer-partner is looking for a Senior Python/C++ Engineer with strong experience in distributed systems, VoIP, audio/video processing, and cloud-native architectures. This role suits someone who enjoys building performance-critical tools … working across the full stack, and contributing to scalable real-time communication products. Key Responsibilities: Develop backend services and distributed components using Python and C++. Work with media processing technologies (FFMPEG, audio mixing, streaming pipelines). Engineer VoIP, DECT, and SIP-based communication software, including real-time call handling. Build and maintain REST APIs (Flask, FastAPI, Django) and … control-plane systems. Desirable: FFMPEG, PyQt, NumPy, SQLAlchemy experience. Understanding of secure communications (SSL/TLS, JWT). Passion for scalable, highly available architecture (real-time or mission-critical systems). Why Apply? Join a highly technical engineering group solving complex real-time communication challenges. Opportunity to work on both cloud and on-premise distributed systems. A role …
United Kingdom Hybrid/Remote Options
Jobgether
In this role, you will contribute to the design, development, and maintenance of high-scale messaging infrastructure that supports billions of communications monthly. You will work on mission-critical systems responsible for message delivery, routing, and reliability, collaborating closely with cross-functional teams. This position offers the opportunity to tackle complex distributed systems, network, and performance challenges … ensure safe, efficient, and high-quality message delivery. Lead architectural and design discussions, contributing to long-term technical strategy. Write clean, testable, and maintainable code following best practices for distributed and network-intensive systems. Participate in code reviews, mentor fellow engineers, and foster a culture of learning and operational excellence. Requirements 7+ years of professional backend engineering experience working … on highly available, scalable systems. Strong understanding of distributed systems, networked services, and high-throughput data flows. Proficiency in object-oriented programming languages such as Java, Kotlin, or similar; experience with MySQL, Kafka, HBase, and Kubernetes is a plus. Experience solving complex reliability, performance, and throughput challenges. Excellent collaboration and communication skills across engineering, product, and operational teams.
Role: Distributed Systems Software Engineer - Up to £190k + Bonus Salary: Up to £190k + Bonus Location: London (Hybrid) Skills: Language agnostic, just need to be a keen technologist (Ideally experienced in Rust, Python or C++) This firm is an elite company with high tech standards who have previously set tech world records. They are made up of … to the limits. They’ll find the best team to suit your skillset/interests but you could be working on: • Designing and developing scalable, tested and production-grade distributed systems • R&D work for functional programming; either pre-existing languages (such as Rust and Erlang), or purpose-built languages similar to OCaml • Building out Machine Learning Infrastructure …
St. Albans, Hertfordshire, England, United Kingdom
Method Resourcing
business for nearly 30 years, and after a very successful period, they have acquired several competitors. After a period of consolidation, they are now looking to unify all their systems into a single source of truth. They are also looking to scale massively over the next 5 years as they enter new markets, and as a result, are looking … several million events per day, before scaling up to 100+ million events per day. You'll work at the top of the engineering track, designing and delivering high-performance, distributed systems while guiding others through implementation and problem-solving. This is a deep technical role, ideal for someone who thrives on code, architecture, and tangible impact. What you … ll do Design and build scalable, distributed systems that support critical environments. Lead technical decision-making and resolve engineering challenges across domains. Own the delivery of complex features, ensuring performance, resilience, and maintainability. Collaborate with Engineering Leads, Architects, and Product to translate roadmap goals into reality. Contribute to a 5-year architectural refresh, evolving systems to event …
Senior Software Developer | Surrey | Hybrid | £65,000 - £75,000 Are you an experienced systems-level engineer craving impact? We’re seeking a Senior Software Developer versed in Rust, or an equivalent systems language, who thrives in high-availability, mission-critical environments. Our client is a fast-growing technology provider delivering next-generation communications solutions to a global customer … is a chance to join a forward-thinking engineering team where you’ll make a real impact. You’ll play a key role in architecting, building, and optimising telecommunications systems in Rust, contributing to secure, high-performance, and scalable solutions used worldwide. You’ll collaborate closely across DevOps, API (Java), front-end, and database teams, and be empowered to … architectural refinements, and shape technology direction. What We're Looking For Three key areas of experience; strong candidates may excel in any one or two: Rust programming or equivalent systems-level expertise Proven experience in Rust development is ideal, but strong developers in C, C++, or similar systems languages are very welcome. Telecommunications or comparable high-availability background …
Mercor is hiring AI Agent Infrastructure Engineers on behalf of a leading AI Lab developing scalable systems to power the next generation of intelligent, autonomous agents. This is a unique opportunity to work with world-class AI researchers and engineers, building the infrastructure that enables advanced reasoning, multi-agent coordination, and real-world deployment of AI systems. Responsibilities Design … build, and optimize infrastructure for training, deploying, and scaling AI agents across distributed systems. Develop robust backend services, APIs, and orchestration frameworks that support multi-agent workflows and high-performance compute environments. Collaborate closely with research and product teams to integrate model-serving pipelines, memory systems, and reasoning components. Implement monitoring, observability, and failover mechanisms to ensure high … identifying bottlenecks and improving efficiency across data, compute, and model layers. Participate in synchronous collaboration sessions (4-hour windows, 2–3 times per week) to review architecture decisions, troubleshoot distributed systems, and iterate on design improvements. Requirements Strong background in Computer Science, Software Engineering, or Systems Design, with a focus on large-scale distributed infrastructure. Experience with …
software that empowers businesses to reach new heights. The Opportunity As a DevOps Engineer at Bright, you'll be enabling our engineering teams to build, deploy, and operate production systems at scale. You'll support the infrastructure and deployment pipelines for our API platform and product development teams, ensuring they can ship fast, safely, and reliably to tens of … of customers across the UK and Ireland. Working across our Azure-based Kubernetes infrastructure, you'll build and maintain the CI/CD pipelines, infrastructure as code, and observability systems that enable small, autonomous squads to deploy with confidence. You'll be instrumental in creating the foundations that allow our teams to move at startup pace while maintaining production … scanning, and quality gates Create deployment strategies (blue/green, canary, rolling updates) Support teams with deployment tooling and best practices Observability & Reliability Implement comprehensive monitoring, logging, and alerting systems Build dashboards and metrics for system health and performance Design and implement incident response procedures Conduct post-mortems and drive continuous improvement Optimize system performance and resource utilization Security …
backend services and microservices using Java. Collaborate broadly: Work with cross-functional teams to deliver scalable telecom solutions. Optimise performance: Integrate databases, APIs, and ensure efficiency across systems. Harden systems: Implement redundancy, security, and performance tuning for telecom operations. Problem-solve: Troubleshoot challenges in distributed systems and live client environments. Contribute to Agile: Participate in code reviews … ensure best practices across the lifecycle. What we’re looking for 4+ years’ hands-on backend Java development experience. Strong expertise in RESTful APIs and microservice architectures. Background in distributed systems, OOP, and networking fundamentals. Cloud environment experience (AWS, GCP, etc.), plus Docker/Kubernetes and CI/CD. Proven track record optimising apps for performance, memory, and … scalability. SQL/NoSQL database experience, including deployment and integration. Knowledge of messaging systems (Kafka, RabbitMQ, Pub/Sub). Excellent communication and analytical skills. Nice to have: Telecom-specific protocols (SMPP, SIP), OSS/BSS integrations, or network APIs. Event-driven systems, CQRS, or high-redundancy architectures. Security scanning, testing, Git, and Agile/Scrum experience. Interest …
Job summary A Lead Systems Engineer is typically the technical lead for multidisciplinary teams delivering and operating multiple components for a system. We work on national, highly available distributed systems being built and run by in-house teams. The systems can differ in size, scale and purpose, but an example system would: Have a round-the … which builds and operates a set of products including the Personal Demographics Service (PDS) and GP Registration Main duties of the job Developing, building and operating national, highly available distributed systems being built and run within NHS England. Operating within and contributing to the NHS England engineering principles. Have technical ownership across the system space, including application, environments … networks, pipelines and operational tools. Engaged in peer-to-peer collaboration to solve engineering problems and drive up organisation engineering standards. This is a significant part of the Lead Systems Engineer role, in the order of 25% of time. Responsible for Engineering maturity within the team. Coaching and mentoring colleagues to develop the team. Candidates will need to demonstrate …
St. Albans, Hertfordshire, England, United Kingdom
Method Resourcing
Senior Software Engineer (C#, Azure, Event-Driven Systems) £75,000-£8,000 + Bonus + Shares | St Albans (Hybrid) A long-established, high-growth technology business is beginning a major architectural rebuild following several acquisitions. They are unifying multiple systems into a single source of truth and preparing to scale their platform significantly over the next five years. … Senior Software Engineer to play a key role in designing, building, and delivering the next-generation event-driven platform. You'll help break down a large monolith into a distributed, event-driven system processing several million events per day, scaling further as new markets open. This role is hands-on: building services, solving technical challenges, and contributing to the … engineering standards that will define the new platform. Key Responsibilities * Build scalable, resilient, event-driven services using C# and Azure. * Work on distributed systems that support high-throughput, high-availability environments. * Deliver features end-to-end with a focus on performance, reliability, and maintainability. * Collaborate with Engineering Leads, Architects, and Product to refine designs and deliver roadmap goals.
London, South East, England, United Kingdom Hybrid/Remote Options
Utility Warehouse Limited
impact will include: Improving resilience, scalability and system reliability Raising engineering standards across observability, SLAs and deployment quality Contributing to key launches (including partner rollout and rebrand work) Supporting distributed system improvements and database migration projects Core Responsibilities Work primarily in Go, GraphQL, Docker & Kubernetes Break down complex work and deliver with minimal oversight Maintain strong engineering standards across … customer acquisition platforms. You’ll work primarily with Go, GraphQL, Docker and Kubernetes. You’ll own deployments end-to-end within our team’s Kubernetes namespace and contribute to systems where resilience, reliability, observability and uptime really matter. You’ll thrive here if you enjoy autonomy, solving distributed systems problems, and mentoring others as a player-coach. … essential. Required Skills and Experience To be successful in this role, you’ll need: Strong production experience with Go (non-negotiable) Fluency with GraphQL, Docker and Kubernetes Experience with distributed systems, concurrency and event-driven architectures Good understanding of resilience, observability, uptime, SLAs and progressive degradation Ability to deliver end-to-end: design, build, deploy, support Experience with …
role will be hands-on as well as leading the team to shape the future backend systems. As a Staff Software Engineer, you'll lead backend architecture, scaling secure distributed systems for a growing customer base. Set engineering standards, mentor engineers, and collaborate across teams to deliver scalable features. Tackle challenges in performance, fault tolerance, and data-heavy … workloads while influencing product strategy. Requirements: Strong foundations in algorithms, data structures, and distributed systems Experience building and operating large-scale backend systems Expertise in system and API design, scalability, and performance tuning Proficiency in a modern backend language (Java preferred) Knowledge of cloud-native architectures, containers, and CI/CD Proven leadership in technical strategy and mentoring
London, South East England, United Kingdom Hybrid/Remote Options
eBay
build communities to create economic opportunity for all. About the role Every day, millions of people sell and ship on eBay. Our Shipping team builds the products and backend systems that make that possible, helping sellers meet the fast-changing expectations of global buyers. We move quickly, stay curious, and ship continuously. What you'll do Design, build and … operate large-scale distributed systems and APIs that power eBay's global shipping experiences. Take full ownership of your code, from design and deployment to production support, monitoring and on-call. Architect secure, maintainable, and scalable solutions, contributing to documentation and continuous delivery. Lead by example through mentoring, pair programming, reviewing code, and driving engineering best practices. Collaborate … with Product and Engineering partners to define requirements, manage dependencies, and deliver end-to-end solutions. Continuously improve systems, processes, and tooling, from testing and observability to introducing new technologies and AI-driven enhancements. What you bring Strong experience with Java, Spring Boot, and API development in enterprise-scale environments. Hands-on knowledge of SQL and NoSQL databases, asynchronous …
About xAI xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected … strong communication skills. They should be able to concisely and accurately share knowledge with their teammates. About the team In this role you will work on xAI's production systems that power and the API. The actual team matching will happen at the offer stage. About the role As an ideal candidate you have a good understanding of how … a compiled language such as C++, Rust, or Go is highly beneficial. Expert knowledge of either Rust or C++, Experience in designing, implementing, and maintaining reliable and horizontally scalable distributed systems, Knowledge of service observability and reliability best practices, Experience in operating commonly used databases such as PostgreSQL, ClickHouse, and MongoDB. Additionally, any of the below points will …
Wigan, Lancashire, England, United Kingdom Hybrid/Remote Options
Searchability
and platform teams THE SITE RELIABILITY ENGINEER ROLE: As a Site Reliability Engineer, you'll ensure the reliability, performance and scalability of critical digital platforms. You'll monitor production systems, refine SLAs/SLOs and error budgets, design scalable solutions, improve architecture through telemetry insights, and build dashboards that provide clear visibility of system health. You'll also contribute … and knowledge of container orchestration (Kubernetes) and Infrastructure as Code (Terraform) * Experience with monitoring and observability tools such as Grafana, Prometheus or OpenTelemetry * Strong understanding of networking fundamentals and distributed systems * Ability to collaborate effectively with engineering, operations and product teams TO BE CONSIDERED: Please either apply through this advert or email me directly via . For further information … your application to our client in conjunction with this vacancy only. KEY SKILLS SRE, Site Reliability Engineer, AWS, Kubernetes, Terraform, Observability, Performance, SLAs/SLOs, Monitoring, Automation, Go, .NET, Distributed Systems, Cloud-Native Engineering
customers, and stakeholders. What you will do: Define and drive the product roadmap for ClickHouse Cloud, with a focus on cloud-native database capabilities such as multi-region replication, distributed query execution, fault tolerance, storage tiering, and schema evolution. Collaborate with engineering to prioritize and deliver improvements to the core database engine, storage layer, and query optimizer, ensuring the … effectively managing expectations and keeping stakeholders informed. About You: 8+ years of product management experience with a strong background in cloud-based SaaS products. Deep familiarity with database systems, distributed systems, or real-time analytics platforms is strongly preferred. Hands-on experience with SQL and strong understanding of query execution, indexing strategies, partitioning, and performance tuning. … or organization. If you have any questions or comments about compensation as a candidate, please get in touch with us at Perks Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries. Healthcare - Employer contributions towards your healthcare. Equity in the company - Every new team member who joins our company receives …
research leader, where you’ll architect and optimise the platforms that deliver large-scale language models to production. You’ll be working on some of the hardest challenges in distributed AI systems: building ultra-reliable, ultra-scalable environments for inference and deployment. What you’ll be doing Designing cloud-native architectures to run large language models on serverless … frameworks (e.g. Kubernetes, Knative, or custom-built FaaS). Developing approaches to minimise cold-start latency through advanced container snapshotting, weight pre-loading, and graph partitioning. Building distributed inference pipelines with tensor parallelism, model sharding, and efficient memory scheduling to serve LLMs at scale. Experimenting with quantisation, pruning, and KV-cache management to squeeze maximum throughput from GPU … accelerator clusters. Working closely with applied researchers to turn state-of-the-art methods into robust, production-grade systems. What you’ll bring Deep understanding of large-scale ML systems engineering, with direct experience in deploying or optimising LLMs. Hands-on expertise in C Rust/Go for systems programming, plus Python for model integration. Strong knowledge of …