London (City of London), South East England, United Kingdom Hybrid / WFH Options
Understanding Recruitment
/infrastructure engineering role. Strong scripting skills in Python, Bash, or Ruby. Familiarity with configuration management tools (Ansible, Puppet, or Chef). Interest or exposure to observability tools like Datadog, Prometheus, or Grafana. A passion for learning and improving in high-performance environments. This is a rare chance to learn from elite engineers and contribute directly to a platform supporting global …
AWS, GCP, or Azure). Familiarity with auth, billing, or subscription systems. Background in 3D graphics, creative tooling, or ML pipelines. Knowledge of observability tools like Grafana, Prometheus, or OpenTelemetry. This is a rare opportunity to join an early-stage team backed by leading deep-tech investors, building the foundation of a platform that fuses AI, creativity, and …
City of London, London, United Kingdom Hybrid / WFH Options
Paradigm Talent
AWS, GCP, or Azure). Familiarity with auth, billing, or subscription systems. Background in 3D graphics, creative tooling, or ML pipelines. Knowledge of observability tools like Grafana, Prometheus, or OpenTelemetry. This is a rare opportunity to join an early-stage team backed by leading deep-tech investors, building the foundation of a platform that fuses AI, creativity, and …
London (City of London), South East England, United Kingdom
BGC Group
ensuring high availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, WAN optimization, and system observability using tools like Prometheus and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers across environments (on-prem and cloud). Provide production support for messaging-related incidents, including root … cause analysis and resolution. Monitor system performance and health using Prometheus and Grafana; proactively identify and address anomalies. Configure and optimize Solace across WAN environments, ensuring low-latency, secure, and reliable messaging. Collaborate with development and application support teams to troubleshoot message flow issues and integration problems. Perform capacity planning, scaling, and tuning of Solace infrastructure to meet current and … background in production support, preferably in a 24x7 enterprise environment. Experience working with distributed systems over WAN, with an understanding of networking, latency, and failover strategies. Solid experience with Prometheus and Grafana for system monitoring and alerting. Proficiency in troubleshooting message delivery, persistence, and topic routing. Experience with capacity management, performance tuning, and system scaling. Familiarity with Linux/Unix …
and platform engineering. Tech Stack: Cloud: AWS (EC2, RDS, S3, IAM, CloudWatch, Lambda); Infrastructure as Code: Terraform; Containerisation & Orchestration: Docker, Kubernetes (EKS), Helm; Configuration Management: Ansible; Monitoring & Observability: Grafana, Prometheus; CI/CD: GitHub Actions; Automation & Scripting: Python, Bash, Go or Java. What We’re Looking For: Proven experience running AWS cloud infrastructure in a production or regulated (financial) environment. … Hands-on experience managing Kubernetes clusters (preferably EKS). Strong understanding of Infrastructure as Code using Terraform. Familiarity with monitoring and observability stacks such as Prometheus and Grafana. Experience building and maintaining CI/CD pipelines (GitHub Actions or similar). Strong scripting or automation skills using Python, Bash, Go or Java. A collaborative mindset, comfortable working alongside developers …
Build and maintain Infrastructure as Code (IaC) using Terraform and Ansible. Design highly reliable, scalable, and secure infrastructure supporting performance-critical workloads. Build proactive monitoring, observability, and alerting with Prometheus, Grafana, Azure Monitor, Datadog, and Dynatrace. Troubleshoot complex system issues spanning applications, networks, and infrastructure. Define platform SLAs, SLOs, and governance standards for self-service use. Collaborate closely with Salesforce … and Ansible, along with scripting in PowerShell, Python, or Bash. Experience implementing GitOps workflows and managing platform SLAs, SLOs, and governance standards. Familiarity with observability and monitoring tools including Prometheus, Grafana, Azure Monitor, Datadog, or Dynatrace. Preferred experience supporting Salesforce DevOps pipelines and working with Java, .NET, or Node.js application environments. Exposure to AI/ML platforms, real-time data …
/EKS knowledge to help the team overcome technical barriers. What They’re Looking For: - 5–10 years’ hands-on Kubernetes (EKS on AWS) experience. - Strong skills with Terraform, Prometheus, and scaling infra. - Collaborative and adaptable in a fast-paced environment where priorities shift quickly. - Ability to solve technical challenges and mentor others through example. If you're interested and …
London, South East, England, United Kingdom Hybrid / WFH Options
Venn Group
of technologies including RHEL, CentOS, Ubuntu, VMware, and F5 load balancers. Manage web services, LAMP stack applications, Samba servers, and authentication proxies. Utilise tools such as Ansible, Katello, Nagios, Prometheus, and Grafana for configuration and monitoring. Automate routine tasks using scripts and infrastructure-as-code practices. Maintain clear and up-to-date technical documentation. Support knowledge sharing and training for …
City of London, London, United Kingdom Hybrid / WFH Options
M-XR
models (MongoDB, PostgreSQL). Implement asset storage, retrieval, and management systems (AWS S3). Build job queue management for async ML workflows (SNS, SQS). Set up application monitoring and logging (CloudWatch, Grafana, Prometheus). Implement CI/CD for application deployment (Bitbucket Pipelines). Create API documentation and developer tools. What we are looking for: 5+ years backend development experience with production applications. Track record …
London, South East, England, United Kingdom Hybrid / WFH Options
Ncounter
EVPN, VLAN/VXLAN, MLAG, STP. Hands-on with Arista/Cisco; strong troubleshooting tools (Wireshark, netcat, etc.). Familiar with network security, automation (Python, Ansible), and observability stacks (Prometheus, Grafana). Excellent communicator with experience delivering in high-stakes, collaborative settings. STEM degree and CCNP/CCIE preferred. Why Join? Join a trusted global institution where networking is core …
them with Kubernetes. Ensure high availability and scalability of AI services through robust orchestration strategies. Monitoring & Reliability: Set up monitoring and alerting systems using Cloud Monitoring, Cloud Logging, Prometheus. Troubleshoot infrastructure issues and ensure minimal downtime for critical AI services. Required Skills: Strong hands-on experience with GCP services: Compute Engine, Kubernetes, Cloud Storage, BigQuery, Cloud Run. Proficient in …/CD tools: Google Cloud Build, Jenkins, GitHub Actions. Proven experience with Terraform and other IaC tools. Experience in multi-cloud environments. Familiarity with monitoring tools such as Prometheus. Exposure to AI/ML infrastructure and data workflows in financial services. People Source Consulting Ltd is acting as an Employment Agency in relation to this vacancy. People Source specialise …
City of London, London, United Kingdom Hybrid / WFH Options
Alexander Ash Consulting
with research and infrastructure teams to deliver scalable, reliable solutions. Drive automation using Terraform, Ansible, GitLab, Jenkins, and support SDLC best practices. Provide visibility and performance monitoring using Splunk, Prometheus, Grafana. Contribute to containerisation and orchestration strategy with Docker and Kubernetes. Stay ahead of industry trends, conduct POCs, and deliver technical recommendations. What We’re Looking For: 10+ … experience with DevOps and CI/CD tooling (Terraform, Ansible, GitLab, Jenkins). Programming/scripting knowledge in Python, Golang, or similar. Experience with metrics visualisation tools (Splunk, Prometheus, Grafana). Knowledge of containerisation and orchestration (Docker, Kubernetes). Experience in hedge funds, trading firms, or other low-latency/HPC environments is highly desirable.