CI/CD pipelines Hands-on experience with infrastructure-as-code (e.g. Terraform) Deep understanding of security best practices in cloud and application delivery Exposure to observability tooling (Prometheus, Grafana, structured logging, etc.) Confident debugging and resolving issues in complex distributed systems Background in B2B SaaS web applications, with familiarity in Node a plus Able to operate autonomously within small More ❯
West London, London, United Kingdom Hybrid/Remote Options
Staffworx Limited
End development. Familiarity with testing frameworks (Vitest, Playwright) for both API and end-to-end testing. Experience with Docker, Helm, YAML, Kubernetes, and cloud-native deployments. Telemetry tools: Prometheus, Grafana, OpenTelemetry, Datadog, APM tools. Understanding of infrastructure-as-code and CI/CD pipelines. Ability to improve codebases and influence architectural direction. Experience mentoring or coaching engineers. Please send updated More ❯
london, south east england, united kingdom Hybrid/Remote Options
ZILO
PostgreSQL - enough to run queries, analyse data, and perform safe fixes. Familiarity with Kubernetes and modern cloud platforms (AWS, GCP, or Azure). Understanding of incident management and observability tools (Grafana, Prometheus, etc.). A mindset focused on reliability, quality, and ownership. Benefits Enhanced leave - 38 days inclusive of 8 UK Public Holidays Private Health Care including family cover Life Assurance – 5x More ❯
london, south east england, united kingdom Hybrid/Remote Options
Insignis
Azure SQL for our relational data. Our architecture is - where appropriate - event-driven with Kafka. We perform integration testing with Cypress and Playwright. We monitor our systems using AppInsights, Grafana, and Zenduty. We ensure code quality with static code analysis using SonarCloud. Requirements We recognise that it's unlikely for anyone to possess every skill listed here. What's important More ❯
majority of our backend codebase AWS & GCP - we're cloud-native Microservice-based architecture Kubernetes (EKS) TeamCity for CI/CD (with multiple production releases per day) Terraform and Grafana Our Interview process Interviewing is a two-way process and we want you to have the time and opportunity to get to know us, as much as we are getting More ❯
london, south east england, united kingdom Hybrid/Remote Options
Starling
AWS & GCP - we're cloud-native Microservice-based architecture Kubernetes (EKS) TeamCity for CI/CD (lots of teams are releasing code 15-20 times per day) Terraform and Grafana Interview Process Interviewing is a two-way process and we want you to have the time and opportunity to get to know us, as much as we are getting to More ❯
administering Salesforce (especially Marketing Cloud) is mandatory Basic knowledge of Java, hands-on Linux and Terraform experience Ability to write scripts in Python Experience with monitoring tools (e.g., Grafana) Understanding of full-stack web/mobile technologies, protocols, and web server standards Fluent Mandarin/Cantonese is mandatory Robert Walters Operations Limited is an employment business and employment More ❯
Crawley, Sussex, United Kingdom Hybrid/Remote Options
Akixi
systems. Familiarity with telephony platforms (Microsoft Teams, SIP, or Graph APIs). Experience working in Agile/Scrum teams. Bonus Skills Experience with BroadWorks or Webex platforms. Familiarity with Grafana, Datadog, or CloudWatch monitoring. ISTQB or similar QA certification. Why Join Akixi Work on cutting-edge analytics for cloud communications. Be part of a growing QA engineering function driving automation More ❯
Newcastle Upon Tyne, Tyne and Wear, England, United Kingdom
Nigel Wright Group
business stakeholders, end-users and technologists ITIL (or similar) certification (or experience working within an ITIL framework) Strong understanding of application design, relational databases (SQL Server), monitoring and alerting tools (Grafana, Prometheus, VictoriaMetrics), scheduling tools (Control-M), operating systems (Windows/Linux), Kubernetes, cloud platforms (Azure), issue tracking and source control (JIRA, Git, Bitbucket). Interview Process: Coding Challenge – We More ❯
predictive analytics. Understanding of AI frameworks and libraries (e.g., TensorFlow, PyTorch, Scikit-learn) and their application in network automation and monitoring. Experience with telemetry and observability frameworks (e.g., Prometheus, Grafana) for real-time network monitoring and troubleshooting. Experience: Minimum of 7 years of experience in network engineering, operations, and support. Proven ability to work hands-on and take strong technical More ❯
Core Services – EC2, RDS, S3, IAM, Lambda, CloudWatch) Infrastructure as Code: Terraform Containerisation & Orchestration: Docker, Kubernetes (EKS), Helm Configuration Management: Ansible CI/CD Pipelines: GitHub Actions Monitoring & Observability: Grafana, Prometheus Scripting/Automation: Python or Java What We’re Looking For Proven experience managing and scaling AWS cloud environments, ideally supporting live software products or high-traffic platforms. Strong … background in Terraform and Infrastructure as Code best practices. Practical experience with Kubernetes (EKS) in production. Familiarity with monitoring and observability tools such as Grafana and Prometheus. Hands-on experience building CI/CD pipelines (GitHub Actions, Jenkins, CircleCI, etc.). Solid scripting and automation experience using Python or Java. A collaborative engineer who enjoys working closely with developers More ❯
and backend systems. Implement observability best practices using OpenTelemetry (OTEL) for tracing, metrics, and logging. Collaborate with platform and DevOps teams to integrate telemetry data with systems such as Grafana, Prometheus, Jaeger, or Tempo. Define and maintain instrumentation standards across Java applications to ensure consistency and performance visibility. Diagnose complex production issues through telemetry data and performance profiling. Contribute to More ❯
deployment patterns Ability to analyse issues across application, network, and infrastructure layers Clear communication skills and the ability to collaborate across engineering teams Useful Extras Experience with Prometheus or Grafana Knowledge of Terraform, Ansible, or similar infrastructure as code tools If you are a practical engineer who enjoys owning and improving Solace-based messaging platforms and wants to play a More ❯
perfect environment for you. Tech Stack Cloud: AWS (EC2, RDS, S3, IAM, Lambda, CloudWatch) Containerisation & Orchestration: Docker, Kubernetes (EKS) Infrastructure as Code: Terraform Configuration Management: Ansible Monitoring & Observability: Prometheus, Grafana, ELK Stack CI/CD: GitHub Actions Scripting & Automation: Python, Bash, or Go What You'll Be Doing Designing and maintaining reliable, scalable, and secure infrastructure for production systems. Automating operational … We're Looking For Strong experience running cloud infrastructure (AWS preferred) in production. Proven background in Kubernetes operations (EKS, Helm, or similar). Solid knowledge of monitoring, alerting, and logging (Grafana, Prometheus, ELK). Hands-on experience with Terraform and CI/CD tooling. Strong scripting or development background (Python, Go, or similar). Excellent troubleshooting skills and a proactive, problem More ❯
and improve autoscaling, high availability and managed service adoption across the platform. Collaborate with SRE, Security and Engineering teams to enhance observability, monitoring and alerting through tools like Prometheus, Grafana and CloudWatch. Partner with Security to embed best practices for IAM, secrets management, WAF, and posture management. Optimise performance and cloud spend through automation tools and cost visibility dashboards. Participate … knowledge of Kubernetes operations on AWS (EKS), including cluster scaling, deployment automation, and monitoring. Solid background in Linux administration, networking, and cloud security principles. Familiarity with observability tools (Prometheus, Grafana, Loki) and structured alerting practices. Experience with database migrations, HA configurations, backups, and DR strategies. Strong scripting and automation skills (Terraform, Python, Bash, or similar). Excellent communication and collaboration More ❯
london, south east england, united kingdom Hybrid/Remote Options
Black Pen Recruitment
extensive use of automation tools such as Terraform and Ansible, alongside programming in Python. Their environments are entirely based on Ubuntu Linux. Experience with server monitoring software (e.g. Prometheus, Grafana, Zabbix, Datadog) and a solid understanding of security principles and best practices (including hardening, access control, auditing, and incident response) is highly valued. This is a remote-first role, and … Terraform, Pulumi) Configuration management with Ansible Cloud platforms (AWS, Azure) Containerization (LXC, LXD, Docker, Kubernetes) CI/CD tooling (TeamCity, Jenkins, GitHub Actions) Server monitoring and alerting systems (Prometheus, Grafana, Zabbix, Datadog) Strong Python programming skills Solid Linux administration and general networking knowledge Understanding of infrastructure security best practices, including secure configuration, identity and access management, and compliance controls Experience More ❯
Science, Engineering, or related field. Strong programming skills in Go (ideally), Rust or C++. Solid experience in building and supporting complex backend systems at scale. Experience with Elasticsearch, Prometheus, Grafana and/or Datadog. Exposure to either AWS or GCP plus IaC (Terraform or similar) would be beneficial. Knowledge of open-source storage tools (Ceph, MinIO, JuiceFS or FUSE) and familiarity More ❯
to-end systems and processes Experience of network support and troubleshooting Exposure to UK and EU equity markets Desirable Prior experience in a similar role Knowledge or experience of Grafana Previous experience of Binary Protocols Previous experience of the Atlassian suite of products TCP/UDP knowledge Job Offer Competitive salary ranging from £50,000 to £70,000 per annum. More ❯