Deploy, UrbanCode etc.
• Containers – Docker, Kubernetes, Mesosphere etc.
• Configuration Management – Ansible, Chef, Puppet etc.
• Cloud – AWS preferred; multi-cloud experience (i.e. with Azure, GCP etc.) highly desirable
• Monitoring – ELK, Prometheus, Splunk etc.
• Experience in one of the following scripting languages: Java, Bash, Python, PowerShell, Golang, etc.
• Experience working with Linux and/or Windows systems
About you (ideally):
• Demonstrate a …
South East London, England, United Kingdom Hybrid / WFH Options
LHH
Kubernetes.
• Strong scripting skills in Python, Bash, or PowerShell for automation.
• Understanding of AWS networking concepts, including VPCs, subnets, security groups.
• Experience with monitoring and logging solutions, such as Prometheus, Grafana, ELK Stack, or AWS CloudWatch.
• Familiarity with Zero Trust security models and best practices for securing cloud workloads.
• Ability to troubleshoot complex infrastructure issues and optimize cloud deployments.
Your …
South East London, England, United Kingdom Hybrid / WFH Options
Amber Labs
teams using tools like Git, Jira, and Confluence Eligible for SC and NPPV3 clearance Desirable: Container orchestration with Kubernetes HashiCorp tools: Vault, Consul, Packer Monitoring and observability with Grafana, Prometheus, or similar Familiarity with cloud networking, VPCs, NAT Gateways, security groups, etc. Personal Attributes: Proactive and self-driven with a passion for technology Strong problem-solving mindset Collaborative team player
South East London, England, United Kingdom Hybrid / WFH Options
Explore Group
on production support Tech Stack Cloud: AWS (EKS, ECS, RDS, IAM, Lambda, etc.) IaC: Terraform, Terragrunt Containerisation: Docker, Kubernetes (EKS) CI/CD: GitHub Actions, Argo CD, Helm Monitoring: Prometheus, Grafana, CloudWatch, OpenTelemetry Languages: Python, Bash, Go (bonus) What We're Looking For Strong experience in SRE, DevOps, or Production Engineering roles Proven hands-on skills with AWS, Terraform, and …
Production experience with Kubernetes and cloud-native deployment strategies. Hands-on with AWS, GCP, and Azure for compute, networking, and storage configurations. Familiarity with monitoring/logging tools (e.g., Prometheus, Grafana, ELK stack). Trading Systems & Finance: Solid understanding of trading infrastructure, latency optimization, execution systems, and market data feeds. Experience working in or with quantitative research, HFT, or hedge …
Experience with load balancers (F5, HAProxy, Nginx) and network monitoring tools. Experience in DNS management and troubleshooting. Experience in network security best practices. Proficiency in monitoring and observability tools (Prometheus, Grafana, Splunk). Proficiency in at least one scripting language (Python, Bash) for automation. Experience with CI/CD pipeline management and DevOps practices. Strong understanding of disaster recovery and … dstat for monitoring storage and system performance. Experience in tools like df, du, lsblk, and fdisk for managing and troubleshooting file systems and disk partitions. Familiarity with tools like Prometheus and Grafana for monitoring and observability
and software brokers across cloud and on-prem platforms Responding to production incidents and working on root cause analysis and long-term fixes Monitoring system health and performance with Prometheus, Grafana, and custom dashboards Optimising Solace across WAN environments for secure, low-latency message delivery Partnering with development and support teams to troubleshoot integration and message flow issues Driving capacity … continuity What We're Looking For: 3+ years' hands-on experience with Solace PubSub+ in a production environment Strong knowledge of WAN-based distributed systems and networking fundamentals Experience with Prometheus and Grafana for observability and alerting Confident in Linux/Unix systems and scripting (Bash, Python, etc.) Excellent problem-solving instincts and attention to detail Strong communicator who works well …
reliability across production and non-production environments. You will be working on incident response, capacity planning, WAN optimization, and system observability so should have experience with tools such as Prometheus and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers Provide production support for messaging-related incidents, including root cause analysis and resolution. Monitor system performance … and health using Prometheus and Grafana; proactively identify and address anomalies. Configure and optimize Solace across WAN environments, ensuring low-latency, secure, and reliable messaging. Collaborate with development and application support teams to troubleshoot message flow issues and integration problems. Perform capacity planning, scaling, and tuning of Solace infrastructure to meet current and future demand. Automate routine maintenance tasks and … years of experience administering Solace PubSub+ messaging systems. Strong background in production support, preferably in a 24x7 enterprise environment. Experience working with distributed systems over WAN Solid experience with Prometheus and Grafana Proficiency in troubleshooting Experience with capacity management, performance tuning, and system scaling. Familiarity with Linux/Unix systems and scripting Beneficial skills: Experience with containerized environments such as …
East London, London, United Kingdom Hybrid / WFH Options
Oliver Bernard
/technologies as possible: AWS Cloud and AWS Services Containerisation with Docker and/or Kubernetes Terraform Strong CI/CD (GitOps, ArgoCD, CircleCI etc) knowledge Monitoring experience with Prometheus and Grafana Linux and Network concepts Front Office trading experience is a must. The role can offer remote working anywhere in the UK.
South East London, England, United Kingdom Hybrid / WFH Options
Oliver Bernard
/technologies as possible: AWS Cloud and AWS Services Containerisation with Docker and/or Kubernetes Terraform Strong CI/CD (GitOps, ArgoCD, CircleCI etc) knowledge Monitoring experience with Prometheus and Grafana Linux and Network concepts Front Office trading experience is a must. The role can offer remote working anywhere in the UK.
for CI/CD processes. Operate and maintain Kafka clusters for real-time data pipelines. Diagnose and resolve issues across systems, networks, containers, and applications. Use observability tools (Grafana, Prometheus, Kibana, Elasticsearch) to monitor system health. Automate system management tasks using Ansible. Participate in an on-call rotation to support global operations. Required Skills & Experience: Strong hands-on Linux (RHEL … managing Kubernetes clusters. Proficiency with GitLab for version control and CI/CD workflows. Solid understanding of Kafka in high-throughput environments. Experience with observability tools such as Grafana, Prometheus, Kibana, and Elasticsearch. Expertise in Ansible for automation and configuration management. Strong problem-solving skills across infrastructure layers (compute, network, OS, containers). More ❯
common patterns and implementing best practices Exposure to secrets management platforms (e.g., HashiCorp Vault) Familiarity with infrastructure as code using Terraform Experience with monitoring, logging, and security tools (e.g., Prometheus, Grafana, and BQL) Expertise in containerization and orchestration using Kubernetes for deployments Experience working with high-availability systems architecture and the ability to support critical scalable and robust systems Bachelor …
Ansible Strong debugging, testing, and performance tuning skills Nice to Have: Experience with event-driven architecture and message queues (e.g., Pub/Sub, Kafka) Familiarity with observability tools (e.g., Prometheus, Grafana, Stackdriver) Understanding of security best practices in microservices and API development Experience working in Agile/Scrum environments
South East London, England, United Kingdom Hybrid / WFH Options
Unitary
you if: Have worked with visualisation tools such as Grafana for creating and maintaining dashboards that provide meaningful insights into system performance Are proficient with metrics platforms such as Prometheus, InfluxDB, or OpenTelemetry for collecting and analysing system data Have experience with incident management tools such as Incident.io for coordinating response efforts and recording follow-up learnings and actions Can …
more programming languages (Go, Rust, C++, Java) Proven experience in troubleshooting and resolving complex issues in large-scale backend systems Experience with observability stack (e.g. Elasticsearch, Logstash, Kibana, Datadog, Prometheus, Grafana) and Infrastructure-as-code (e.g. Terraform) Experience with building platform solutions/services on top of major cloud providers (GCP, AWS) is a plus Experience with building and operating …
roles. Strong experience with Java (Spring) and cloud platforms (ideally Azure). Proven track record in building and maintaining mission-critical systems. Deep understanding of Kubernetes, observability tooling (Grafana, Prometheus, ELK, etc.), and Infrastructure as Code (Terraform, Bicep). Ability to lead technical conversations across Engineering and Product. Bonus points if you bring: Experience in fintech, crypto, or regulated digital …
suit a software engineer who cares about clean, testable code and good software practices, but prefers working in the infra/tooling space. What you’ll be doing: Writing Prometheus exporters and integrations for infrastructure systems Building out dashboards and monitoring pipelines in Grafana and Prometheus Developing infrastructure-as-code tooling (Terraform, Ansible) Designing well-structured, testable software that improves … system visibility What they’re looking for: Strong software engineering skills (Go or Python preferred) Experience working in or alongside platform engineering teams Familiarity with modern observability tools (Grafana, Prometheus, etc.) Comfort working across both code and infrastructure – but this is not a pure ops/SRE role If you've worked in finance that would be great but not …
System Engineer within financial services Know how to write good code (Go, Python, Bash, etc.). Know how to use virtualization (Docker, KVM, etc.). Familiar with monitoring systems (Prometheus, Grafana, etc.). Know about networking hardware (switches, routers). If this opportunity is of interest, please reach out to Daniel O'Connell directly on LinkedIn or email at daniel.oconnell …
cloud-native tools and scripting (e.g., Terraform, Ansible, AWS RDS/Aurora tools, Azure SQL automation). Monitoring & Health Checks: Utilize tools such as CloudWatch, Azure Monitor, OEM, or Prometheus to monitor performance and availability. Troubleshooting & Root Cause Analysis: Diagnose and resolve database incidents; conduct RCAs for critical incidents and outages. Collaboration: Work closely with DevOps, Application, and Security teams