You'll be responsible for running and optimising large-scale production environments (5,000+ hosts) built on technologies such as Kafka, Redis, Kubernetes and MySQL. This is a hands-on, systems-level position focused on reliability, scalability, performance and troubleshooting. You'll work alongside experienced engineers, operating with a high degree of autonomy to keep critical systems healthy, resilient … and observable. Key Responsibilities Manage, configure and maintain Linux systems (CentOS, Rocky, RHEL or similar distributions) in production environments Install, upgrade and troubleshoot third-party systems including Kafka, Redis, Kubernetes and MySQL Support day-to-day operations in data centre/large-scale infrastructure environments (5,000+ hosts) Contribute to system reliability, scalability and performance improvements across the … platform Participate in an on-call rotation (one week every 4–5 weeks) to ensure 24x7 availability of critical systems Collaborate with internal teams to improve observability, monitoring and alerting across services Identify and implement operational improvements to existing monitoring, logging and incident response processes Use scripting and automation (primarily Bash and Python) to reduce toil and streamline recurring …
Bolton, Greater Manchester, UK Hybrid/Remote Options
TechNET IT Recruitment Ltd
You'll be responsible for running and optimising large-scale production environments (5,000+ hosts) built on technologies such as Kafka, Redis, Kubernetes and MySQL. This is a hands-on, systems-level position focused on reliability, scalability, performance and troubleshooting. You'll work alongside experienced engineers, operating with a high degree of autonomy to keep critical systems healthy, resilient … and observable. Key Responsibilities Manage, configure and maintain Linux systems (CentOS, Rocky, RHEL or similar distributions) in production environments Install, upgrade and troubleshoot third-party systems including Kafka, Redis, Kubernetes and MySQL Support day-to-day operations in data centre/large-scale infrastructure environments (5,000+ hosts) Contribute to system reliability, scalability and performance improvements across the … platform Participate in an on-call rotation (one week every 4–5 weeks) to ensure 24x7 availability of critical systems Collaborate with internal teams to improve observability, monitoring and alerting across services Identify and implement operational improvements to existing monitoring, logging and incident response processes Use scripting and automation (primarily Bash and Python) to reduce toil and streamline recurring …
Stoke-on-Trent, Staffordshire, UK Hybrid/Remote Options
TechNET IT Recruitment Ltd
You'll be responsible for running and optimising large-scale production environments (5,000+ hosts) built on technologies such as Kafka, Redis, Kubernetes and MySQL. This is a hands-on, systems-level position focused on reliability, scalability, performance and troubleshooting. You'll work alongside experienced engineers, operating with a high degree of autonomy to keep critical systems healthy, resilient … and observable. Key Responsibilities Manage, configure and maintain Linux systems (CentOS, Rocky, RHEL or similar distributions) in production environments Install, upgrade and troubleshoot third-party systems including Kafka, Redis, Kubernetes and MySQL Support day-to-day operations in data centre/large-scale infrastructure environments (5,000+ hosts) Contribute to system reliability, scalability and performance improvements across the … platform Participate in an on-call rotation (one week every 4–5 weeks) to ensure 24x7 availability of critical systems Collaborate with internal teams to improve observability, monitoring and alerting across services Identify and implement operational improvements to existing monitoring, logging and incident response processes Use scripting and automation (primarily Bash and Python) to reduce toil and streamline recurring …
Birmingham, Leeds, Liverpool, London (Canary Wharf), United Kingdom Hybrid/Remote Options
UK Health Security Agency
Continuous Delivery (CI/CD); ensuring our services run reliably, are scalable, and perform optimally Monitor and manage these aspects while taking responsibility for multiple cloud infrastructure services Observing systems will be key to prioritising the operational service improvements and performance improvements to meet/exceed SLOs (Service Level Objectives) The role will be responsible to the Principal Specialist … manual intervention and improve efficiency. Write code that is maintainable, clear, and concise. Optimise system performance using strong problem-solving skills to identify bottlenecks with an engineering mindset. Ensure systems can handle current and future workloads through automation and capacity planning. Continuously improve services through observability, and identify ways to improve observability practices. Follow SRE principles. Guide and educate … HPC & SRE engineering team. As an SRE, you will play a critical role in ensuring the stability, scalability, and performance of our services. You will combine software engineering and systems engineering to build, improve and run reliable, scalable production systems. Key Responsibilities Service Reliability & Performance Ensure services are stable, scalable, and performant through engineering best practices and system design.
research leader, where you’ll architect and optimise the platforms that deliver large-scale language models to production. You’ll be working on some of the hardest challenges in distributed AI systems: building ultra-reliable, ultra-scalable environments for inference and deployment. What you’ll be doing Designing cloud-native architectures to run large language models on serverless … frameworks (e.g. Kubernetes, Knative, or custom-built FaaS). Developing approaches to minimise cold-start latency through advanced container snapshotting, weight pre-loading, and graph partitioning. Building distributed inference pipelines with tensor parallelism, model sharding, and efficient memory scheduling to serve LLMs at scale. Experimenting with quantisation, pruning, and KV-cache management to squeeze maximum throughput from GPU … accelerator clusters. Working closely with applied researchers to turn state-of-the-art methods into robust, production-grade systems. What you’ll bring Deep understanding of large-scale ML systems engineering, with direct experience in deploying or optimising LLMs. Hands-on expertise in C Rust/Go for systems programming, plus Python for model integration. Strong knowledge of …
of these to the team. Required skills and experience: A solid track record of object-oriented programming principles and a good understanding of design patterns. Experience working on complex systems, debugging distributed systems and dealing with large or complex volumes. You will understand and use Version Control System best practices. We use Git, but equivalents are acceptable.
Liverpool, Merseyside, North West, United Kingdom Hybrid/Remote Options
Acorn Insurance
a hybrid working basis Salary: £65,000 - £75,000 depending on experience We're looking for a passionate Senior Backend Developer who thrives on solving complex problems and building systems that scale. You'll be at the forefront of creating next-generation backend services that genuinely make a difference for our customers and business. In this role, you'll … that values openness and collaboration. Bonus: Clean Architecture Understanding Clean Architecture and Mediator Pattern is a huge plus! If you have it, you'll help us build maintainable, testable systems that follow industry best practices. Our Tech Stack: Backend: C#, .NET 8.0, ASP.NET Core Database: SQL Server, Entity Framework Infrastructure: Docker, Azure Tools: Unit and Integration Testing, Git, Agile … building robust, scalable systems. Understanding Clean Architecture and Mediator Pattern is desirable but not mandatory. Bonus Points For: Understanding Clean Architecture, Mediator Pattern Solid grasp of asynchronous communication in distributed systems Experience with high-throughput, data-intensive systems Contributed towards systems utilising Event-Driven Architecture Why Join Us? Modern tech stack and a strong culture of …
Edinburgh, Midlothian, Scotland, United Kingdom Hybrid/Remote Options
Cathcart Technology
Software Engineering Manager required to lead a team of Java engineers in Edinburgh, shaping the development of next-generation, large-scale systems that solve complex technical challenges in a collaborative environment. The Opportunity This is a chance to lead a team of talented engineers building sophisticated, large-scale backend systems that deliver real-time, mission-critical functionality to … challenges in a regulated, high-availability environment, driving innovation while ensuring reliability, quality, and customer trust. You'll lead an agile team who will be working on Java-based distributed systems and cloud platforms, whilst guiding delivery across the full software development lifecycle. You'll empower your team to take ownership, innovate, and deliver high-impact features with … of software innovation. They have built a deeply technical, forward-thinking engineering culture where collaboration, experimentation, and operational excellence are core capabilities. Teams work with large-scale datasets, complex systems, and distributed architectures to deliver scalable, resilient, and high-quality software, supported by cutting-edge infrastructure and cloud technologies. Why this role? ** Lead a talented team of engineers …
engineering alongside driving best practice and projects. What You'll Do Working primarily in Java (Spring Boot, Hibernate, etc.) Drive design and architecture decisions for scalable, secure, and performant systems Collaborate cross-functionally with Product, Design, and DevOps to deliver high-impact features Conduct code reviews, establish coding standards, and promote engineering best practices Continuously improve team workflows and … development processes What They're Looking For Good expertise in Java and JVM-based systems (3+ years coding experience preferred) Solid background in system design, distributed systems, and cloud architectures (AWS/Azure/GCP) Passion for clean code, testing, and performance optimization Excellent communication, leadership, and decision-making skills A mindset for innovation, problem-solving, and …
and middleware platforms, and they're looking for a TypeScript Engineer to help shape the services and tooling behind this transformation. The team's mission is to replace legacy systems with clean, event-driven services and provide consistent, reliable data to applications across the business - from customer-facing products to operational systems. The Role You'll be joining a … cloud-hosted workloads. You'll work closely with engineers, analysts, and product specialists to design scalable services, contribute to architectural improvements, and help evolve the company's approach to distributed data and developer experience. Day-to-day responsibilities include: Designing and implementing backend services in TypeScript using Node.js (REST, GraphQL, workers, event-driven consumers). Building and integrating services … and implementing REST or GraphQL APIs, background workers, and microservices. Ability to work with SQL or NoSQL databases and understand how to model and query data effectively. Familiarity with distributed systems concepts such as event sourcing, resilience patterns, and asynchronous communication. Experience with containerised development (Docker) and deploying cloud-native applications (Azure, AWS, or GCP). Strong understanding …
London, South East, England, United Kingdom Hybrid/Remote Options
Harnham - Data & Analytics Recruitment
Up to £160,000 + Bonus + Benefits We're supporting a major financial services organisation hiring an eFX Software Engineer to build and optimise ultra-low-latency trading systems used across global FX markets. If you're a high-performance Java engineer who loves solving complex technical challenges at scale, this is a standout opportunity. What you'll … work on: Engineering sub-40μs latency eFX systems with high throughput and fault tolerance Designing real-time pricing, risk and trading components Evolving low-latency Java patterns (lock-free, low-GC, CPU/cache optimisation) Working closely with quants, traders, architects and senior engineers Influencing technical strategy across distributed, performance-critical systems What we're looking for … low-latency Java within trading, eFX or electronic markets Deep understanding of networking (TCP/UDP/FIX), Linux tuning, performance profiling Background in designing high-performance architectures and distributed systems Strong grasp of FX pricing, risk, and trading workflows Solid engineering fundamentals: testing, CI/CD, API design, automation Why join? Work on mission-critical systems …
Leeds, West Yorkshire, Yorkshire, United Kingdom Hybrid/Remote Options
Syntax Consultancy Limited
large-scale IT modernisation programmes for Government clients. Key experience + tasks will include: Java Full Stack Developer with experience developing large-scale micro-services, messaging, web services + distributed systems. In-depth Java 8/11, Spring Boot, Spring Framework and Micro-services development experience. JavaScript, NodeJS, ReactJS preferred and REST API, PostgreSQL/NoSQL databases, JSON and … Serverless. DevOps and CI/CD technologies, including hands-on experience of Jenkins, Docker, Ansible, Git, Kubernetes. Experience in large scale integration projects involving messaging, web services and distributed systems, ELK stack, OpenStack platform Agile development methods (Scrum, BDD, TDD, Kanban). Advantageous skills: Docker, JavaScript: ReactJS and NodeJS preferred, AWS API Gateway and Serverless technologies, Message …
to lead the technical design and implementation of our most critical data infrastructure and products. In this senior-level individual contributor role, you'll be responsible for designing scalable systems, setting data architecture standards, and solving complex technical challenges that power analytics, data science, and business function use cases across the company. You will work closely with engineers, product … work within a dynamic company, then we'd love to hear from you. THE GAME PLAN Everyone on our team has a part to play Architect High-Impact Data Systems Design and implement scalable, maintainable, and secure batch & streaming data pipelines and architectures that support enterprise-wide data needs Define standards for data modeling, data product design, and pipeline … end data solutions Drive Engineering Best Practices Establish and enforce engineering best practices around code quality, testing, documentation, and deployment Contribute to the evolution of the data platform, ensuring systems are modular, interoperable, and resilient Lead technical design and code reviews, mentoring peers and raising the bar for engineering excellence Lead Strategic Initiatives Partner with data platform teams, analytics …
Senior .NET Developer - Manchester (Hybrid) Join high-performing development teams of 250+ building real-time, large-scale systems used by millions of users across the UK, North America and South Africa. This is a chance to work on business-critical software that directly drives user engagement and revenue. Hybrid role based in Manchester city centre (2 days per week … made at team level, you will have a say in the tech stack and development approach, with a strong emphasis on team collaboration. This is a stimulating environment where systems must operate in real-time, requiring robust event-driven architectures, streaming data pipelines, and reactive programming. You'll tackle complex scalability challenges across distributed systems, ensuring speed … and reliability under heavy user loads. Security and compliance are central to the platform, so you'll be involved in building secure systems with strong authentication, encryption, and adherence to regulatory standards. What You'll Get to Work With: Modern microservices architecture powering high-volume systems Containerisation using Docker and Kubernetes for scalable deployments Cloud-native platforms and …