Deploy, UrbanCode, etc.
• Containers – Docker, Kubernetes, Mesosphere, etc.
• Configuration Management – Ansible, Chef, Puppet, etc.
• Cloud – AWS preferred; multi-cloud experience (i.e. with Azure, GCP, etc.) highly desirable
• Monitoring – ELK, Prometheus, Splunk, etc.
• Experience in one of the following scripting languages: Java, Bash, Python, PowerShell, Golang, etc.
• Experience working with Linux and/or Windows systems
About you (ideally):
• Demonstrate a …
South East London, England, United Kingdom Hybrid / WFH Options
Amber Labs
teams using tools like Git, Jira, and Confluence. Eligible for SC and NPPV3 clearance. Desirable: container orchestration with Kubernetes; HashiCorp tools (Vault, Consul, Packer); monitoring and observability with Grafana, Prometheus, or similar; familiarity with cloud networking (VPCs, NAT gateways, security groups, etc.). Personal Attributes: proactive and self-driven with a passion for technology; strong problem-solving mindset; collaborative team player.
South East London, England, United Kingdom Hybrid / WFH Options
Explore Group
on production support.
Tech Stack
Cloud: AWS (EKS, ECS, RDS, IAM, Lambda, etc.)
IaC: Terraform, Terragrunt
Containerisation: Docker, Kubernetes (EKS)
CI/CD: GitHub Actions, Argo CD, Helm
Monitoring: Prometheus, Grafana, CloudWatch, OpenTelemetry
Languages: Python, Bash, Go (bonus)
What We're Looking For
Strong experience in SRE, DevOps, or Production Engineering roles. Proven hands-on skills with AWS, Terraform, and …
Production experience with Kubernetes and cloud-native deployment strategies. Hands-on with AWS, GCP, and Azure for compute, networking, and storage configurations. Familiarity with monitoring/logging tools (e.g., Prometheus, Grafana, ELK stack). Trading Systems & Finance: Solid understanding of trading infrastructure, latency optimization, execution systems, and market data feeds. Experience working in or with quantitative research, HFT, or hedge …
using Kubernetes, ECS, or similar. Manage Helm charts or Kustomize templates and enforce container security standards. Drive Observability and Operational Readiness: implement monitoring, logging, and alerting with tools like Prometheus, Grafana, ELK, or Datadog. Create dashboards and promote the adoption of SLOs and error budgets. Embed Security and Support Compliance: integrate security into pipelines (e.g. secrets detection, policy-as-code …
South East London, England, United Kingdom Hybrid / WFH Options
Cognitive Group | Part of the Focus Cloud Group
with root cause analysis and preventive measures. Handle change requests, track recurring issues, and work on long-term fixes to improve system stability. Implement and maintain observability solutions using Prometheus, Grafana, and Splunk. Write PromQL queries for custom monitoring dashboards, alerting, and diagnostics. Manage and optimize CI/CD pipelines for automated testing, deployment, and rollback strategies. Develop and maintain … the DevOps Engineer level. Incident, change and problem management experience. This role is heavily operations-oriented, including on-call requirements. Strong background in setting up and operating enterprise observability tooling, specifically Prometheus, Grafana and Splunk, including use of PromQL. Proficient in one or more of Python, Go, Bash and SQL. Familiar with GitHub/GitOps/container orchestration/Kubernetes operations. Working …
and software brokers across cloud and on-prem platforms; responding to production incidents and working on root cause analysis and long-term fixes; monitoring system health and performance with Prometheus, Grafana, and custom dashboards; optimising Solace across WAN environments for secure, low-latency message delivery; partnering with development and support teams to troubleshoot integration and message flow issues; driving capacity … continuity. What We're Looking For: 3+ years' hands-on experience with Solace PubSub+ in a production environment. Strong knowledge of WAN-based distributed systems and networking fundamentals. Experience with Prometheus and Grafana for observability and alerting. Confident in Linux/Unix systems and scripting (Bash, Python, etc.). Excellent problem-solving instincts and attention to detail. Strong communicator who works well …
engineering experience in performance-critical environments. Proficiency in Python and Bash scripting, with hands-on Ansible experience. Solid networking fundamentals: IP addressing, VLANs, etc. Familiarity with observability tools like Prometheus, Grafana, and ELK. Infrastructure-as-code experience with Terraform and CI/CD pipelines. Proven ability to resolve complex system-level issues and performance challenges. Knowledge of container orchestration tools …
and agile development practices. Nice to Have: experience with GraphQL; task queues and async jobs (Celery, BullMQ, RabbitMQ); familiarity with cloud platforms (AWS, GCP, Azure); logging and monitoring (ELK, Prometheus, etc.). Soft Skills: strong problem-solving mindset; clear documentation and communication skills; team player with the ability to work independently. Why Join Us? Work on a large-scale, industry-first …
South East London, England, United Kingdom Hybrid / WFH Options
Hunter Bond
Engineer – Trading
Client: FinTech
Salary: £120,000-£220,000 + Bonus
Location: London/Hybrid
Skills: Linux, Chef, Ansible, NFS, GPFS, Weka, Python, Go, Rust, CI/CD, ELK, Prometheus, Grafana, AWS, GCP
The role: My client is seeking a Linux Compute and Storage Engineer to join their team. You will predominantly work on complex low-latency Linux compute systems. … Chef or Ansible for configuration management. GPFS, NFS, Weka, etc. Some experience working in Python, Go or Rust. Familiarity with CI/CD and Agile practices. Observability – ELK, Prometheus, Grafana. Degree in a relevant subject highly desirable. Please apply ASAP for more information.
Strong understanding of software design patterns, clean code practices, and Agile methodologies. Nice to Have: experience with GraphQL or gRPC; exposure to monitoring/logging tools (e.g., CloudWatch, ELK, Prometheus); knowledge of security best practices in API and cloud development; familiarity with data streaming using Kafka or Kinesis.
experience with AWS services. Proven knowledge of Kubernetes and containerised application delivery. Infrastructure as Code with Terraform. CI/CD pipelines using GitLab or Drone. Monitoring and logging – Grafana, Prometheus, or CloudWatch. Experience in secure environments – knowledge of IAM, Vault, networking. Active UK*C or Enhanced DV Clearance is a must (sole British nationals only). TO BE CONSIDERED: Please apply …
for CI/CD processes. Operate and maintain Kafka clusters for real-time data pipelines. Diagnose and resolve issues across systems, networks, containers, and applications. Use observability tools (Grafana, Prometheus, Kibana, Elasticsearch) to monitor system health. Automate system management tasks using Ansible. Participate in an on-call rotation to support global operations. Required Skills & Experience: Strong hands-on Linux (RHEL … managing Kubernetes clusters. Proficiency with GitLab for version control and CI/CD workflows. Solid understanding of Kafka in high-throughput environments. Experience with observability tools such as Grafana, Prometheus, Kibana, and Elasticsearch. Expertise in Ansible for automation and configuration management. Strong problem-solving skills across infrastructure layers (compute, network, OS, containers).
Ansible. Strong debugging, testing, and performance tuning skills. Nice to Have: experience with event-driven architecture and message queues (e.g., Pub/Sub, Kafka); familiarity with observability tools (e.g., Prometheus, Grafana, Stackdriver); understanding of security best practices in microservices and API development; experience working in Agile/Scrum environments.
South East London, England, United Kingdom Hybrid / WFH Options
Searchability NS&D
with Kubernetes, Docker, Helm. Proficient in Terraform and CI/CD pipelines (Drone/GitLab). Excellent understanding of Kafka internals, stream processing, and secure Kafka deployments. Strong experience across monitoring (Prometheus, Grafana, CloudWatch). Knowledge of security hardening, IAM, WAF, Shield, Vault. Working knowledge of Agile, Infrastructure-as-Code, and DevSecOps practices. UK*C or Enhanced DV (eDV) Clearance is a must.
South East London, England, United Kingdom Hybrid / WFH Options
Unitary
you if you: have worked with visualisation tools such as Grafana for creating and maintaining dashboards that provide meaningful insights into system performance; are proficient with metrics platforms such as Prometheus, InfluxDB, or OpenTelemetry for collecting and analysing system data; have experience with incident management tools such as Incident.io for coordinating response efforts and recording follow-up learnings and actions; can …
roles. Strong experience with Java (Spring) and cloud platforms (ideally Azure). Proven track record in building and maintaining mission-critical systems. Deep understanding of Kubernetes, observability tooling (Grafana, Prometheus, ELK, etc.), and Infrastructure as Code (Terraform, Bicep). Ability to lead technical conversations across Engineering and Product. Bonus points if you bring: experience in fintech, crypto, or regulated digital …
suit a software engineer who cares about clean, testable code and good software practices, but prefers working in the infra/tooling space. What you’ll be doing: writing Prometheus exporters and integrations for infrastructure systems; building out dashboards and monitoring pipelines in Grafana and Prometheus; developing infrastructure-as-code tooling (Terraform, Ansible); designing well-structured, testable software that improves … system visibility. What they’re looking for: strong software engineering skills (Go or Python preferred); experience working in or alongside platform engineering teams; familiarity with modern observability tools (Grafana, Prometheus, etc.); comfort working across both code and infrastructure – but this is not a pure ops/SRE role. If you've worked in finance, that would be great but not …
System Engineer within financial services. Know how to write good code (Go, Python, Bash, etc.). Know how to use virtualization (Docker, KVM, etc.). Familiar with monitoring systems (Prometheus, Grafana, etc.). Know about networking hardware (switches, routers). If this opportunity is of interest, please reach out to Daniel O'Connell directly on LinkedIn or email at daniel.oconnell …
cloud-native tools and scripting (e.g., Terraform, Ansible, AWS RDS/Aurora tools, Azure SQL automation). Monitoring & Health Checks: Utilize tools such as CloudWatch, Azure Monitor, OEM, or Prometheus to monitor performance and availability. Troubleshooting & Root Cause Analysis: Diagnose and resolve database incidents; conduct RCAs for critical incidents and outages. Collaboration: Work closely with DevOps, Application, and Security teams …
Computer Science, Engineering, or related field. Strong programming skills in Go (ideally), Rust or C++. Solid experience in building and supporting complex backend systems at scale. Experience with Elasticsearch, Prometheus, Grafana and/or Datadog. Exposure to either AWS or GCP, plus IaC (Terraform or similar), would be beneficial. Knowledge of open-source storage tools (Ceph, MinIO, JuiceFS or FUSE) and …
South East London, England, United Kingdom Hybrid / WFH Options
Ncounter Technology Recruitment
EVPN, VLAN/VXLAN, MLAG, STP. Hands-on with Arista/Cisco; strong troubleshooting tools (Wireshark, netcat, etc.). Familiar with network security, automation (Python, Ansible), and observability stacks (Prometheus, Grafana). Excellent communicator with experience delivering in high-stakes, collaborative settings. STEM degree and CCNP/CCIE preferred. Why Join? Join a trusted global institution where networking is core …
East London, London, United Kingdom Hybrid / WFH Options
Opus Recruitment Solutions
AI Ops | Grafana | Observability | PagerDuty | Prometheus | SRE | Site Reliability Engineer | Telecommunications | Consultant | Dashboard | Systems Engineer
Looking to make a step into SRE? Excited by the prospect of AI Ops? I've partnered with an exciting business that has recently been acquired by a European leader in the AI Ops consultancy space. Taking on their UK market to replicate their consistent success … hands-on with AI Ops, then get in touch. In return, the role offers £55k and an opportunity to work remotely within the UK.