London, South East, England, United Kingdom Hybrid / WFH Options
Lorien
ability to work independently or lead a small team Nice to Have: Experience with TYK API Gateway Exposure to microservices and event-driven architectures Familiarity with observability tools (e.g., Prometheus, Grafana) Carbon60, Lorien & SRG - The Impellam Group STEM Portfolio are acting as an Employment Business in relation to this vacancy. More ❯
with Kubernetes, Docker, Helm Proficient in Terraform, CI/CD Pipelines (Drone/GitLab) Excellent understanding of Kafka internals, stream processing, and secure Kafka deployments Strong experience across monitoring (Prometheus, Grafana, CloudWatch) Knowledge of security hardening, IAM, WAF, Shield, Vault Working knowledge of Agile, Infrastructure-as-Code, and DevSecOps practices UK*C or Enhanced DV (eDV) Clearance is a must More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Searchability NS&D
with Kubernetes, Docker, Helm Proficient in Terraform, CI/CD Pipelines (Drone/GitLab) Excellent understanding of Kafka internals, stream processing, and secure Kafka deployments Strong experience across monitoring (Prometheus, Grafana, CloudWatch) Knowledge of security hardening, IAM, WAF, Shield, Vault Working knowledge of Agile, Infrastructure-as-Code, and DevSecOps practices UK*C or Enhanced DV (eDV) Clearance is a must More ❯
Hands-on experience in technical integrations and POCs Comfortable coding in any high-level programming language (Java, Go, Python) Strong hands-on knowledge of Kubernetes, AWS, Azure, GCP, Docker, Prometheus, and OpenTelemetry Industry knowledge and opinions on Monitoring, Observability, Log Management, SIEM Engineering/DevOps Background - advantage Experience in Technical Sales of Log Analytics/Monitoring/APM/SIEM More ❯
penetration testing coordination SaaS or multi-tenant platform delivery models Data protection regulations (e.g. GDPR, ISO 27001) Disaster Recovery (DR), high availability (HA), and business continuity planning Observability tooling: Prometheus, Grafana, Azure Monitor, Log Analytics Role Context This senior role sits at the intersection of cloud engineering, architecture governance, and strategic platform enablement. The Azure Solution Architect will operate across More ❯
AWS Control Tower, GCP Resource Manager, etc. Network - AWS Transit Gateway, GCP Shared VPC, AWS Route53, GCP Cloud DNS, etc. Observability - AWS OpenSearch, GCP Monitoring/Traces, OpenTelemetry, Grafana, Prometheus, etc. Automation Prowess: Hands-on experience with modern Infrastructure as Code (IaC) automation tools and frameworks (Terraform, Jenkins, Ansible, etc.). Software Development Acumen: A software development background is highly More ❯
Azure, AWS or GCP. Experience with Kubernetes is desirable. You have a high degree of experience in observing the performance and health of applications via tools such as Grafana, Prometheus, Datadog, Sentry, etc. You have a strong desire and are an advocate for performant applications. You have a flair for simplicity when problem solving. Excellent communication skills, with the More ❯
testing. Strong knowledge of containerisation (e.g., Docker) and orchestration (e.g., Kubernetes). Deep understanding of cloud security principles: IAM, network security, encryption. Experience with monitoring/alerting tools (e.g., Prometheus, Grafana, ELK stack). Proficient in Git or other version control systems. Desirable Knowledge, Skills and Experience: Certifications in OCI or other cloud platforms (AWS, GCP). Experience with security More ❯
London, South East, England, United Kingdom Hybrid / WFH Options
Harnham - Data & Analytics Recruitment
observability, and cost optimisation Nice to Have Experience with ML tooling (MLflow, Kubeflow) Knowledge of FastAPI, Databricks, or Snowflake Exposure to SRE practices or cloud security certifications Familiarity with Prometheus, Grafana, or Datadog Interested? If you want to be part of a world-class AI team at an early stage, where your infrastructure decisions will directly shape the company's More ❯
and live data visualisation Collaborate with QA and DevOps to enhance automated testing and deployment pipelines Lead efforts in securing, scaling, and monitoring the frontend environment Use observability tools (Prometheus, Grafana, Loki) to monitor UI health and performance Drive UI architectural decisions, performance benchmarking, and best practice implementation Skills and Experience Required Degree in Computer Science, Engineering, or a related More ❯
and predictive analytics. Understanding of AI frameworks and libraries (e.g., TensorFlow, PyTorch, Scikit-learn) and their application in network automation and monitoring. Experience with telemetry and observability frameworks (e.g., Prometheus, Grafana) for real-time network monitoring and troubleshooting. Experience: Minimum of 7 years of experience in network engineering, operations, and support. Proven ability to work hands-on and take strong More ❯
methods such as unit, integration, contract and E2E testing. You have a high degree of experience in observing the performance and health of applications via tools such as Grafana, Prometheus, Datadog, Sentry, etc. You have a strong desire and are an advocate for performant applications. Proactive in solving problems simply and effectively, with an eye for pragmatic solutions. Excellent More ❯
looking for someone with deep expertise in: o Infrastructure as Code: Terraform, CloudFormation o Security best practices: IAM, KMS, encryption in transit/at rest, DevSecOps o Monitoring & observability: Datadog, Prometheus, Grafana, ELK, or similar What You Bring o 6+ years in DevOps or platform engineering, with experience in a technical lead role. o Proven experience designing and operating cloud-native More ❯
also welcome Proficiency in testing frameworks like JUnit and RestAssured A passion for monitoring, observability, and maintaining resilient systems Desirable Skills: Experience with monitoring and alerting tools like Datadog, Prometheus, Grafana, or PagerDuty Exposure to Python scripting Familiarity with deployment platforms such as Kubernetes and tools like Helm Why Join H&B Tech? Be part of a fast-moving, forward More ❯
InfluxDB, and ClickHouse: schema design, indexing, and caching for sub-second reads. Experience deploying microservices in production using Docker and Kubernetes. Skilled in setting up observability and alerting pipelines (Prometheus, Grafana), including model drift detection. Experience with real-time ML inference and model serving frameworks (e.g., TorchServe, Triton, BentoML) for low-latency applications. Experience designing feedback loops, active learning, or More ❯
Redis, InfluxDB, and ClickHouse: schema design, indexing, and caching for sub-second reads. Experience deploying microservices in production using Docker and Kubernetes. Skilled in setting up observability and alerting pipelines (Prometheus, Grafana), including model drift detection. Experience with real-time ML inference and model serving frameworks (e.g., TorchServe, Triton, BentoML) for low-latency applications. Experience designing feedback loops, active learning, or More ❯
issues Support Kubernetes/OpenShift environments and application deployments Enable developers through onboarding and technical support Maintain and improve CI/CD pipelines (Tekton, Argo CD) Monitor systems using Prometheus, Grafana, Splunk, Loki, and EFK Automate infrastructure provisioning using scripting and IaC tools Collaborate with vendors and internal teams for issue resolution What You'll Bring Strong Linux (Red Hat More ❯
Fleet, Hampshire, United Kingdom Hybrid / WFH Options
Minutes To Seconds
MetalLB) Expert in Linux systems (systemd, networking, kernel tuning), Kubernetes internals, and container runtimes Real-world application of SRE principles in high-stakes, always-on environments Strong background operating Prometheus, Grafana, and Elasticsearch/Fluentd/Kibana (ELK/EFK) stacks Preferred Qualifications Experience integrating Kubernetes with OpenStack and Magnum Knowledge of Rancher add-ons: Fleet, Longhorn, CIS Scanning Familiarity More ❯
London, South East, England, United Kingdom Hybrid / WFH Options
INTEC SELECT LIMITED
documentation Conduct architecture reviews, technical audits, and drive adoption of best practices Partner with infrastructure teams to ensure system reliability and operational efficiency Integrate monitoring and logging solutions (e.g., Prometheus, Grafana, ELK) Define strategies for disaster recovery, scaling, and infrastructure resilience Improve observability by enhancing visibility into performance and error metrics Skills and Experience Required 10+ years of backend development More ❯
Farnborough, England, United Kingdom Hybrid / WFH Options
Addition+
in Platform or Site Reliability Engineering (5+ years ideally) Proven background with Kubernetes, CI/CD tooling (e.g. GitLab, Jenkins), and IaC (Terraform, Ansible) Confident with monitoring tools (e.g. Prometheus, Grafana) Git proficiency and solid repository management knowledge Comfortable leading technical decisions and collaborating with engineering teams What’s in It for You: A genuinely collaborative, no-blame engineering culture More ❯
explain complex systems to mixed audiences, and build trust through technical credibility. Automation-first mindset: Skilled in infrastructure-as-code (Terraform or Pulumi), CI/CD workflows, observability stacks (Prometheus, Grafana, Loki), and scripting (Python, Bash). Bonus: Prior experience working with GPU capacity providers, hyperscaler partnerships, or AI infrastructure startups. Benefits: Competitive total compensation package. Retirement or pension plan More ❯
years of technical experience in Cloud DevOps, SaaS, or observability, with 5+ years in leadership roles. Strong hands-on experience with AWS, GCP, Azure, K8S, Terraform and observability tools: Prometheus, Grafana, OpenTelemetry, ELK, Splunk, Datadog, and similar. Proficiency with metrics, logs, traces and APM. Leadership & Global Operations Proven success leading multi-regional or global technical teams with direct management of More ❯