Deploy, UrbanCode etc. • Containers – Docker, Kubernetes, Mesosphere etc. • Configuration Management – Ansible, Chef, Puppet etc. • Cloud – AWS preferred; multi-cloud experience (e.g. with Azure, GCP) highly desirable • Monitoring – ELK, Prometheus, Splunk etc. • Experience in one of the following programming or scripting languages: Java, Bash, Python, PowerShell, Golang, etc. • Experience working with Linux and/or Windows systems. About you (ideally): • Demonstrate a …
City of London, England, United Kingdom Hybrid / WFH Options
VE3
networking (DNS, TCP/IP, VPN, firewalls). Knowledge of containerization technologies (Docker, ECS, EKS, or Kubernetes). Experience with monitoring/logging tools such as CloudWatch, ELK Stack, Prometheus/Grafana. Excellent problem-solving skills and the ability to work independently. Preferred Qualifications: AWS Certified SysOps Administrator/DevOps Engineer – Professional. Experience with hybrid cloud/on-prem environments.
City of London, Greater London, UK Hybrid / WFH Options
Amber Labs
teams using tools like Git, Jira, and Confluence. Eligible for SC and NPPV3 clearance. Desirable: • Container orchestration with Kubernetes • HashiCorp tools: Vault, Consul, Packer • Monitoring and observability with Grafana, Prometheus, or similar • Familiarity with cloud networking (VPCs, NAT Gateways, security groups, etc.) Personal Attributes: • Proactive and self-driven with a passion for technology • Strong problem-solving mindset • Collaborative team player
ensuring high availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, WAN optimization, and system observability using tools like Prometheus and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers across environments (on-prem and cloud). Provide production support for messaging-related incidents, including root … cause analysis and resolution. Monitor system performance and health using Prometheus and Grafana; proactively identify and address anomalies. Configure and optimize Solace across WAN environments, ensuring low-latency, secure, and reliable messaging. Collaborate with development and application support teams to troubleshoot message flow issues and integration problems. Perform capacity planning, scaling, and tuning of Solace infrastructure to meet current and … background in production support, preferably in a 24x7 enterprise environment. Experience working with distributed systems over WAN, with an understanding of networking, latency, and failover strategies. Solid experience with Prometheus and Grafana for system monitoring and alerting. Proficiency in troubleshooting message delivery, persistence, and topic routing. Experience with capacity management, performance tuning, and system scaling. Familiarity with Linux/Unix …
tools; Maven, Gradle or other build tools; Ansible or other IT automation/software provisioning tools; JIRA, Confluence; * Experience in monitoring/reporting tools such as Splunk, Grafana/Prometheus, etc. * Experience in Agile practices * Working knowledge of environment monitoring tools such as GCO, New Relic, Prometheus, Grafana. * Collaboration Skills: Proactive, can-do attitude; a creative approach towards solving technical problems
canary releases). Monitoring, Logging & Alerting: Implement comprehensive monitoring, logging, and alerting systems to proactively identify and address performance issues, errors, and security threats. Use tools like Azure Monitor, Prometheus, Grafana, or similar to collect and analyse metrics, logs, and traces. Configure alerts and notifications to ensure timely responses to critical events. Security & Compliance: Implement security best practices and controls …
City of London, Greater London, UK Hybrid / WFH Options
Explore Group
on production support. Tech Stack: • Cloud: AWS (EKS, ECS, RDS, IAM, Lambda, etc.) • IaC: Terraform, Terragrunt • Containerisation: Docker, Kubernetes (EKS) • CI/CD: GitHub Actions, Argo CD, Helm • Monitoring: Prometheus, Grafana, CloudWatch, OpenTelemetry • Languages: Python, Bash, Go (bonus) What We're Looking For: • Strong experience in SRE, DevOps, or Production Engineering roles • Proven hands-on skills with AWS, Terraform, and …
Production experience with Kubernetes and cloud-native deployment strategies. Hands-on with AWS, GCP, and Azure for compute, networking, and storage configurations. Familiarity with monitoring/logging tools (e.g., Prometheus, Grafana, ELK stack). Trading Systems & Finance: Solid understanding of trading infrastructure, latency optimization, execution systems, and market data feeds. Experience working in or with quantitative research, HFT, or hedge …
advantageous: • Software development in web technologies or OOP (e.g., Python, Java) • Database tech: Oracle SQL, PostgreSQL, MongoDB • Proficient with Linux/Windows command line (Bash, PowerShell) • Monitoring: Grafana, Prometheus, ELK, Splunk • Agile working and tooling (e.g., Jira, Confluence) • Diagnosing and resolving complex system issues • ITIL knowledge or exposure to IT service operations • Containerisation: Docker, Kubernetes, OpenShift • Awareness of modern …
City of London, England, United Kingdom Hybrid / WFH Options
Parser Limited
relevant tools. Security Best Practices: IAM, MFA, data encryption, firewall configurations. Programming/Scripting: Python, Terraform, or similar languages. Event-Driven Architectures: Kafka. Monitoring and Logging: Datadog, ELK Stack, Prometheus, etc. Experience in agile methodologies and DevOps practices. Location: Hybrid; office located in London (Hayes area). Office presence required: Yes. Frequency: 2–3 times a week in the office.
and software brokers across cloud and on-prem platforms • Responding to production incidents and working on root cause analysis and long-term fixes • Monitoring system health and performance with Prometheus, Grafana, and custom dashboards • Optimising Solace across WAN environments for secure, low-latency message delivery • Partnering with development and support teams to troubleshoot integration and message flow issues • Driving capacity … What We’re Looking For: • 3+ years’ hands-on experience with Solace PubSub+ in a production environment • Strong knowledge of WAN-based distributed systems and networking fundamentals • Experience with Prometheus and Grafana for observability and alerting • Confident in Linux/Unix systems and scripting (Bash, Python, etc.) • Excellent problem-solving instincts and attention to detail • Strong communicator who works well …
high system availability, enabling rapid delivery through CI/CD, and supporting development teams with robust infrastructure and tooling. A key part of the role is proactive monitoring using Prometheus, Grafana, and Splunk, as well as participating in on-call rotations to respond to live incidents. Collaboration across engineering, security, and product teams is essential to build scalable and resilient … cause analysis and preventive measures. 3. Handle change requests, track recurring issues, and work on long-term fixes to improve system stability. 4. Implement and maintain observability solutions using Prometheus, Grafana, and Splunk. 5. Write PromQL queries for custom monitoring dashboards, alerting, and diagnostics. 6. Manage and optimize CI/CD pipelines for automated testing, deployment, and rollback strategies. 7. … Engineer level 2. Incident, change & problem management experience. This role is heavily operations-oriented, including on-call requirements. 3. Strong background in setting up and operating enterprise observability tooling, specifically Prometheus, Grafana and Splunk, including use of PromQL. 4. Proficient in one or more of Python, Go, Bash, SQL. 5. Familiar with GitHub/GitOps/container orchestration/Kubernetes …
engineering experience in performance-critical environments • Proficiency in Python and Bash scripting, with hands-on Ansible experience • Solid networking fundamentals: IP addressing, VLANs, etc. • Familiarity with observability tools like Prometheus, Grafana, and ELK • Infrastructure-as-code experience with Terraform and CI/CD pipelines • Proven ability to resolve complex system-level issues and performance challenges • Knowledge of container orchestration tools …
infrastructure across on-prem and AWS • Administer and optimise Kubernetes clusters and containerised pipelines • Implement and maintain Infrastructure as Code using Terraform • Improve observability and resilience using tools like Prometheus • Manage and monitor GitLab CI/CD pipelines for multi-platform builds (Linux, Windows, macOS) • Collaborate with engineering teams to optimise developer workflows and apply DevOps best practices • Set clear …
City of London, Greater London, UK Hybrid / WFH Options
Vertus Partners
and scaled. Key Responsibilities: • Lead the development of automation and monitoring solutions to improve system resilience and eliminate recurring manual work • Own and evolve observability practices using tools like Prometheus, Grafana, Splunk, Geneos, Corvil, etc. • Engage directly with senior traders and engineers to troubleshoot complex trading system issues and improve end-to-end workflows • Drive post-incident reviews and implement …
by thought leaders like Martin Fowler. Hands-on experience building and maintaining complex CI/CD pipelines, preferably with GitHub Actions. Familiarity with monitoring and observability tools (e.g., Prometheus, Grafana, Google Cloud's operations suite). A solid understanding of networking principles and cloud security best practices. Experience with other cloud platforms like Amazon Web Services (AWS) or Microsoft …