availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, WAN optimization, and system observability using tools like Prometheus and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers across environments (on-prem and cloud). Provide production support for messaging-related incidents, including root cause analysis … and resolution. Monitor system performance and health using Prometheus and Grafana; proactively identify and address anomalies. Configure and optimize Solace across WAN environments, ensuring low-latency, secure, and reliable messaging. Collaborate with development and application support teams to troubleshoot message flow issues and integration problems. Perform capacity planning, scaling, and tuning of Solace infrastructure to meet current and future demand. … production support, preferably in a 24x7 enterprise environment. Experience working with distributed systems over WAN, with an understanding of networking, latency, and failover strategies. Solid experience with Prometheus and Grafana for system monitoring and alerting. Proficiency in troubleshooting message delivery, persistence, and topic routing. Experience with capacity management, performance tuning, and system scaling. Familiarity with Linux/Unix systems and …
London (City of London), South East England, United Kingdom
BGC Group
availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, WAN optimization, and system observability using tools like Prometheus and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers across environments (on-prem and cloud). Provide production support for messaging-related incidents, including root cause analysis … and resolution. Monitor system performance and health using Prometheus and Grafana; proactively identify and address anomalies. Configure and optimize Solace across WAN environments, ensuring low-latency, secure, and reliable messaging. Collaborate with development and application support teams to troubleshoot message flow issues and integration problems. Perform capacity planning, scaling, and tuning of Solace infrastructure to meet current and future demand. … production support, preferably in a 24x7 enterprise environment. Experience working with distributed systems over WAN, with an understanding of networking, latency, and failover strategies. Solid experience with Prometheus and Grafana for system monitoring and alerting. Proficiency in troubleshooting message delivery, persistence, and topic routing. Experience with capacity management, performance tuning, and system scaling. Familiarity with Linux/Unix systems and …
perfect environment for you. Tech Stack: Cloud: AWS (EC2, RDS, S3, IAM, Lambda, CloudWatch). Containerisation & Orchestration: Docker, Kubernetes (EKS). Infrastructure as Code: Terraform. Configuration Management: Ansible. Monitoring & Observability: Prometheus, Grafana, ELK Stack. CI/CD: GitHub Actions. Scripting & Automation: Python, Bash, or Go. What You'll Be Doing: Designing and maintaining reliable, scalable, and secure infrastructure for production systems. Automating operational … We're Looking For: Strong experience running cloud infrastructure (AWS preferred) in production. Proven background in Kubernetes operations (EKS, Helm, or similar). Solid knowledge of monitoring, alerting, and logging (Grafana, Prometheus, ELK). Hands-on experience with Terraform and CI/CD tooling. Strong scripting or development background (Python, Go, or similar). Excellent troubleshooting skills and a proactive, problem …
London, South East, England, United Kingdom Hybrid / WFH Options
Ncounter
VLAN/VXLAN, MLAG, STP. Hands-on with Arista/Cisco; strong troubleshooting tools (Wireshark, netcat, etc.). Familiar with network security, automation (Python, Ansible), and observability stacks (Prometheus, Grafana). Excellent communicator with experience delivering in high-stakes, collaborative settings. STEM degree and CCNP/CCIE preferred. Why Join? Join a trusted global institution where networking is core to …
London, South East, England, United Kingdom Hybrid / WFH Options
Venn Group
including RHEL, CentOS, Ubuntu, VMware, and F5 load balancers. Manage web services, LAMP stack applications, Samba servers, and authentication proxies. Utilise tools such as Ansible, Katello, Nagios, Prometheus, and Grafana for configuration and monitoring. Automate routine tasks using scripts and infrastructure-as-code practices. Maintain clear and up-to-date technical documentation. Support knowledge sharing and training for first- and …
London, South East, England, United Kingdom Hybrid / WFH Options
Eligo Recruitment
ll Bring: Strong experience with GCP, Terraform, and Infrastructure-as-Code. Deep knowledge of cloud networking, security automation, and compliance standards. Proficiency in CI/CD pipelines, monitoring tools (Grafana, Datadog), and scripting. A collaborative mindset with excellent communication and mentoring skills. Why Join? Shape a next-gen AI infrastructure with autonomy and purpose. Hybrid working with regular meetups in …
data models (MongoDB, PostgreSQL). Implement asset storage, retrieval, and management systems (AWS S3). Build job queue management for async ML workflows (SNS, SQS). Set up application monitoring and logging (CloudWatch, Grafana, Prometheus). Implement CI/CD for application deployment (Bitbucket Pipelines). Create API documentation and developer tools. What we are looking for: 5+ years of backend development experience with production applications. Track …
City of London, London, United Kingdom Hybrid / WFH Options
M-XR
data models (MongoDB, PostgreSQL). Implement asset storage, retrieval, and management systems (AWS S3). Build job queue management for async ML workflows (SNS, SQS). Set up application monitoring and logging (CloudWatch, Grafana, Prometheus). Implement CI/CD for application deployment (Bitbucket Pipelines). Create API documentation and developer tools. What we are looking for: 5+ years of backend development experience with production applications. Track …
London, South East England, United Kingdom Hybrid / WFH Options
M-XR
data models (MongoDB, PostgreSQL). Implement asset storage, retrieval, and management systems (AWS S3). Build job queue management for async ML workflows (SNS, SQS). Set up application monitoring and logging (CloudWatch, Grafana, Prometheus). Implement CI/CD for application deployment (Bitbucket Pipelines). Create API documentation and developer tools. What we are looking for: 5+ years of backend development experience with production applications. Track …
London (City of London), South East England, United Kingdom Hybrid / WFH Options
M-XR
data models (MongoDB, PostgreSQL). Implement asset storage, retrieval, and management systems (AWS S3). Build job queue management for async ML workflows (SNS, SQS). Set up application monitoring and logging (CloudWatch, Grafana, Prometheus). Implement CI/CD for application deployment (Bitbucket Pipelines). Create API documentation and developer tools. What we are looking for: 5+ years of backend development experience with production applications. Track …
and maintain Infrastructure as Code (IaC) using Terraform and Ansible. Design highly reliable, scalable, and secure infrastructure supporting performance-critical workloads. Build proactive monitoring, observability, and alerting with Prometheus, Grafana, Azure Monitor, Datadog, and Dynatrace. Troubleshoot complex system issues spanning applications, networks, and infrastructure. Define platform SLAs, SLOs, and governance standards for self-service use. Collaborate closely with Salesforce DevOps … Ansible, along with scripting in PowerShell, Python, or Bash. Experience implementing GitOps workflows and managing platform SLAs, SLOs, and governance standards. Familiarity with observability and monitoring tools including Prometheus, Grafana, Azure Monitor, Datadog, or Dynatrace. Preferred experience supporting Salesforce DevOps pipelines and working with Java, .NET, or Node.js application environments. Exposure to AI/ML platforms, real-time data pipelines …
mandatory. Strong understanding of monitoring, observability, and telemetry (metrics, logs, traces). Ability to translate technical concepts into actionable business requirements. Hands-on experience with tools such as Datadog, BigPanda, or Grafana would be desirable. Excellent stakeholder management skills, including product and engineering teams. If you think this role is a good fit, apply now! Hays Specialist Recruitment Limited acts as an …
London (City of London), South East England, United Kingdom
Hays
mandatory. Strong understanding of monitoring, observability, and telemetry (metrics, logs, traces). Ability to translate technical concepts into actionable business requirements. Hands-on experience with tools such as Datadog, BigPanda, or Grafana would be desirable. Excellent stakeholder management skills, including product and engineering teams. If you think this role is a good fit, apply now! Hays Specialist Recruitment Limited acts as an …
and stakeholder management skills. Familiarity with Agile or Kanban methodologies. Experience using project management tools such as Jira or Confluence. Comfort working with data visualisation tools such as Tableau, Grafana, or Power BI. Ability to thrive in an environment that moves quickly and values execution. Nice to have: Background in cybersecurity, data analytics, or enterprise SaaS. Experience implementing automation or …
London (City of London), South East England, United Kingdom
Method Resourcing
and stakeholder management skills. Familiarity with Agile or Kanban methodologies. Experience using project management tools such as Jira or Confluence. Comfort working with data visualisation tools such as Tableau, Grafana, or Power BI. Ability to thrive in an environment that moves quickly and values execution. Nice to have: Background in cybersecurity, data analytics, or enterprise SaaS. Experience implementing automation or …
City of London, London, England, United Kingdom Hybrid / WFH Options
Eligo Recruitment
indexing, and capacity planning for mission-critical systems. Develop secure backup, recovery, and disaster recovery procedures. Explore multi-tenant and sharded architectures to support growth. Implement monitoring strategies using Grafana, Datadog, and CI/CD integrations. Champion database best practices, mentor teams, and standardize tooling and automation. What You’ll Bring: Extensive experience managing cloud-hosted PostgreSQL at scale. Proficiency …
City of London, London, United Kingdom Hybrid / WFH Options
Alexander Ash Consulting
research and infrastructure teams to deliver scalable, reliable solutions. Drive automation using Terraform, Ansible, GitLab, Jenkins, and support SDLC best practices. Provide visibility and performance monitoring using Splunk, Prometheus, Grafana. Contribute to containerisation and orchestration strategy with Docker and Kubernetes. Stay ahead of industry trends, conduct POCs, and deliver technical recommendations. What We’re Looking For: 10+ years … with DevOps and CI/CD tooling (Terraform, Ansible, GitLab, Jenkins). Programming/scripting knowledge in Python, Golang, or similar. Experience with metrics visualisation tools (Splunk, Prometheus, Grafana). Knowledge of containerisation and orchestration (Docker, Kubernetes). Experience in hedge funds, trading firms, or other low-latency/HPC environments is highly desirable.
London, South East England, United Kingdom Hybrid / WFH Options
Alexander Ash Consulting
research and infrastructure teams to deliver scalable, reliable solutions. Drive automation using Terraform, Ansible, GitLab, Jenkins, and support SDLC best practices. Provide visibility and performance monitoring using Splunk, Prometheus, Grafana. Contribute to containerisation and orchestration strategy with Docker and Kubernetes. Stay ahead of industry trends, conduct POCs, and deliver technical recommendations. What We’re Looking For: 10+ years … with DevOps and CI/CD tooling (Terraform, Ansible, GitLab, Jenkins). Programming/scripting knowledge in Python, Golang, or similar. Experience with metrics visualisation tools (Splunk, Prometheus, Grafana). Knowledge of containerisation and orchestration (Docker, Kubernetes). Experience in hedge funds, trading firms, or other low-latency/HPC environments is highly desirable.
London (City of London), South East England, United Kingdom Hybrid / WFH Options
Alexander Ash Consulting
research and infrastructure teams to deliver scalable, reliable solutions. Drive automation using Terraform, Ansible, GitLab, Jenkins, and support SDLC best practices. Provide visibility and performance monitoring using Splunk, Prometheus, Grafana. Contribute to containerisation and orchestration strategy with Docker and Kubernetes. Stay ahead of industry trends, conduct POCs, and deliver technical recommendations. What We’re Looking For: 10+ years … with DevOps and CI/CD tooling (Terraform, Ansible, GitLab, Jenkins). Programming/scripting knowledge in Python, Golang, or similar. Experience with metrics visualisation tools (Splunk, Prometheus, Grafana). Knowledge of containerisation and orchestration (Docker, Kubernetes). Experience in hedge funds, trading firms, or other low-latency/HPC environments is highly desirable.