Systems: Design scalable backend systems using Python, event-driven services, and service patterns. Own Infrastructure as Code: Lead the design and maintenance of our Terraform-managed AWS infrastructure. DevOps & Observability: Set up and manage CI/CD pipelines, logging, tracing, and system health checks. Security & Compliance: Build with SOC2 and GDPR in mind: IAM, encryption, audit logs, zero trust. Performance ...
manage changes, and perform problem analysis to maintain system uptime and reliability. Collaborate with internal teams and customers to troubleshoot and resolve infrastructure and application issues. Operate and enhance observability tooling, including Prometheus, Grafana, and Splunk, with a strong focus on PromQL. Participate in an on-call rotation to support critical production systems. Improve and maintain CI/CD pipelines and ...
etc.) Comfort with basic computer administration including software installation, system configuration, and networking. Comfort with Git and automated build pipelines (Jenkins, GitLab CI/CD, etc.). Preferred: Passion for observability (Elastic, APM, Grafana, etc.). Experience integrating software with a Large Language Model (LLM). Experience with retrieval-augmented generation (RAG). Production-grade software development experience with Python. Service containerization and deployment ...
City of London, Greater London, UK Hybrid / WFH Options
Better Placed Ltd - A Sunday Times Top 10 Employer!
and scaling production-grade AI-powered products. Strong collaborator, with a track record of working closely with AI research, product, and infrastructure teams. Bonus Points: Exposure to MLOps, AI observability, or LLM deployment at scale. Experience with data engineering for large-scale pipelines. Prior background in enterprise SaaS or developer tools. Why Join: AI-native mission: Shape the next generation ...
container-based runtime environments. Proficient in modelling relational data. Have worked with NoSQL data in the past. Proficient in fine-tuning databases for high performance. A good understanding of observability tooling, covering metrics, traces and logging. Ability to mentor others, regardless of experience level. Proficient in utilising cloud infrastructure provided by AWS/GCP/Azure. Worked with a variety ...
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
and attitude on automating common repetitive tasks. A suitable sense of ownership and responsibility in driving tasks to timely full completion. "Nice To Have" Skills and Experience: AIOps and Observability. Meaningful experience in a distributed team. Working in a sophisticated, multi-geography, engineering services environment! Providing technical support and mentoring to othe Accommodations at Arm: At Arm, we want to ...
City of London, London, United Kingdom Hybrid / WFH Options
Trireme
cutting-edge trading tools. From initial design to seamless deployment, you'll drive key infrastructure decisions to ensure optimal scalability and performance. Your expertise will shape comprehensive monitoring and observability solutions, guaranteeing a resilient Web3 experience for our users. If you're driven to build the backbone of the future of trading, we want to hear from you.
platform services that are crucial for accelerating the productivity of all engineering teams. As an Engineering Manager, you will lead and expand two core components: the Feature Flags Service and the Engineering Internal Observability Platform (E360). You'll ensure that these systems serve as the foundation for seamless and efficient engineering workflows, allowing us to deliver top-tier AI-driven products. What You'll Do at ...
high-quality code, primarily in TypeScript (Node.js), and selecting appropriate languages (like Python, Go, Rust, etc.) for specific tasks, ranging from scripting data workflows to compiling performant services. Fostering Observability & Dependability: Integrating monitoring tools for metrics, tracing, and logging, with the objective of maintaining exceptional system uptime (over 99.9%). Ensuring Security & Adherence: Implementing secure coding standards and data protection ...
roles. Experience with technical troubleshooting and scripting languages such as Python, Go, or Bash. Experience with Kubernetes security, including workload isolation, RBAC, and network policies, containerisation, orchestration, and Kubernetes observability tools (e.g., Falco, Prometheus, Grafana). Experience with infrastructure-as-code and configuration management tools (e.g., Terraform, Helm, ArgoCD). United Kingdom Security Vetting: Developed Vetting (DV) clearance. Preferred qualifications ...
full lifecycle. Develop APIs and tools that expose predictors as scalable services to other teams. Contribute to software engineering best practices across the ML stack (testing, CI/CD, observability). Partner with platform engineers and product stakeholders to ensure technical alignment and delivery. What we're looking for: 6+ years of experience in software engineering with strong focus on machine ...
software development and architecture. Experience influencing technical decisions across the different stakeholder levels of the business, including non-technical audiences. Ability to foster a culture around data-driven reliability, observability, monitoring, and automation. Due to the global nature of the team, a degree of flexible working will be required to accommodate different time zones. We are an equal opportunities employer.
automating our physical server inventory using Infrastructure as Code (IaC). You will work across all layers of infrastructure, including: Networking & Exchange Connectivity; Linux Systems & Kubernetes Administration; Microservice Orchestration & Observability; Disaster Recovery & Security Optimization. Your mission is to improve latency, scalability, and reliability, ensuring GSR remains a best-in-class market maker. We value engineers who drive automation, reduce friction ...
services/message buses and other architectural elements. Deploy these applications to the cloud using features such as containers, leveraging CI/CD to support this process, backed with good observability when running them in production. Ensure quality through the creation of documentation and use of unit/integration/contract testing, with consideration of security/performance requirements. We ...
jobs, or streaming) so they integrate well into the core product user journeys. Model monitoring & maintenance: implement monitoring for model performance (accuracy, drift, latency). Set up alerts and observability tools to track data/model health in production. Automate retraining workflows based on triggers (e.g., data drift, performance drop). Role Summary: End-to-End ML workflow automation: data ...
Experience using managed languages such as Python, Go, C#, Java, or similar. Experience utilizing CI/CD platforms to automate provisioning infrastructure, software builds, tests, and releases. Experience using observability tools such as APM, logging, and metrics to assist with debugging issues. Experience designing tooling to simplify the operational management of SaaS/PaaS systems. Familiarity with building flexible and ...
and CI/CD workflows (GitLab CI). Write clean, production-grade code in Python (Scala is a bonus). Build infrastructure using Terraform, AWS CloudFormation, or SAM. Drive observability across the platform using Datadog or CloudWatch. Actively mentor Data Engineers and Associates, and lead technical discussions and design sessions. Key requirements: Must-Have: Strong experience with AWS services: Glue ...
Cardiff, South Glamorgan, United Kingdom Hybrid / WFH Options
RVU Co UK
Experience of building and designing cost-optimised Cloud platforms (preferably Azure) from the ground up, following well-architected principles. Solid understanding of platform and reliability engineering approaches (SRE), including observability, performance optimisation, capturing analytics and security best practices. Experience implementing Service Level Objectives and using them to drive error budgets, risk management and alerting. Knowledge and experience with operating containers ...
DevOps: you build it, you run it. Tech Stack: M&S uses a variety of technologies including: Java, Spring, Spring Boot, Micronaut; React, Next.js, TypeScript, Angular; Azure Cloud, Kubernetes, Dynatrace (observability); SQL Server, MongoDB; Ignite, Redis. Everyone’s Welcome: We are ambitious about the future of retail. We’re disrupting, innovating and leading the industry into a more conscientious, inspiring digital ...