London, South East, England, United Kingdom Hybrid / WFH Options
Lorien
ability to work independently or lead a small team Nice to Have: Experience with TYK API Gateway Exposure to microservices and event-driven architectures Familiarity with observability tools (e.g., Prometheus, Grafana) Carbon60, Lorien & SRG - The Impellam Group STEM Portfolio are acting as an Employment Business in relation to this vacancy. More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Searchability NS&D
with Kubernetes, Docker, Helm Proficient in Terraform, CI/CD Pipelines (Drone/GitLab) Excellent understanding of Kafka internals, stream processing, and secure Kafka deployments Strong experience across monitoring (Prometheus, Grafana, CloudWatch) Knowledge of security hardening, IAM, WAF, Shield, Vault Working knowledge of Agile, Infrastructure-as-Code, and DevSecOps practices UK*C or Enhanced DV (eDV) Clearance is a must More ❯
Hands-on experience in technical integrations and POCs Comfortable coding in any high-level programming language (Java, Go, Python) Strong hands-on knowledge of Kubernetes, AWS, Azure, GCP, Docker, Prometheus, and OpenTelemetry Industry knowledge and opinions on Monitoring, Observability, Log Management, SIEM Engineering/DevOps Background - advantage Experience in Technical Sales of Log Analytics/Monitoring/APM/SIEM More ❯
with ML lifecycle tools, model monitoring, and versioning Exposure to tools like KServe, Ray Serve, Triton, or vLLM is a big plus Bonus Points Experience with observability frameworks like Prometheus or OpenTelemetry Knowledge of ML libraries: TensorFlow, PyTorch, HuggingFace Exposure to Azure or GCP Passion for financial services Qualifications Degree in Computer Science, Engineering, Data Science, or similar What We More ❯
Actions, CircleCI) Understands the importance of monitoring and is proactive in resolving critical issues. Fluent in testing frameworks JUnit, RestAssured Desirable: Exposure to monitoring and alerting platforms: Datadog, PagerDuty, Grafana, Prometheus Exposure to Python scripting Exposure to deployment platforms like Kubernetes and tools like Helm. Ready to shape the future of health and wellness through tech? Apply now and help build More ❯
setting up and managing monitoring, metrics, and alerting systems Experience operating production-grade services at scale Great to have: Experience with tools such as: Terraform, SaltStack, MongoDB, Elasticsearch, Kafka, Prometheus, Grafana or HashiCorp Vault Experience with securing applications, services, and data, including authentication, authorization, TLS, and encryption Exposure to Kubernetes (administering, deploying, or developing apps on K8s clusters) Understanding of More ❯
AWS Control Tower, GCP Resource Manager, etc. Network - AWS Transit Gateway, GCP Shared VPC, AWS Route53, GCP Cloud DNS, etc. Observability - AWS OpenSearch, GCP Monitoring/Traces, OpenTelemetry, Grafana, Prometheus, etc. Automation Prowess: Hands-on experience with modern Infrastructure as Code (IaC) automation tools and frameworks (Terraform, Jenkins, Ansible, etc.). Software Development Acumen: A software development background is highly More ❯
Azure, AWS or GCP. Experience with Kubernetes is desirable. You have a high degree of experience in observing the performance and health of applications via tools such as Grafana, Prometheus, Datadog, Sentry, etc. You have a strong desire and are an advocate for performant applications. You have a flair for simplicity when problem solving. Excellent communication skills, with the More ❯
via Grafana or PowerBI). Ideally, Infrastructure as Code with CloudFormation/ARM templates, Terraform and Ansible. Ideally, Linux Server Administration including container technology & ecosystem (Docker, Kubernetes, Prometheus) linked to AAD. Ideally, experience in telecommunications and similar regulated verticals and environments. Ideally, working knowledge of ISO 27000, ITIL, or similar regulated environment. Ideally, exposure to CRM & ERP systems More ❯
testing. Strong knowledge of containerisation (e.g., Docker) and orchestration (e.g., Kubernetes). Deep understanding of cloud security principles: IAM, network security, encryption. Experience with monitoring/alerting tools (e.g., Prometheus, Grafana, ELK stack). Proficient in Git or other version control systems. Desirable Knowledge, Skills and Experience: Certifications in OCI or other cloud platforms (AWS, GCP). Experience with security More ❯
London, South East, England, United Kingdom Hybrid / WFH Options
Harnham - Data & Analytics Recruitment
observability, and cost optimisation Nice to Have Experience with ML tooling (MLflow, Kubeflow) Knowledge of FastAPI, Databricks, or Snowflake Exposure to SRE practices or cloud security certifications Familiarity with Prometheus, Grafana, or Datadog Interested? If you want to be part of a world-class AI team at an early stage, where your infrastructure decisions will directly shape the company's More ❯
and live data visualisation Collaborate with QA and DevOps to enhance automated testing and deployment pipelines Lead efforts in securing, scaling, and monitoring the frontend environment Use observability tools (Prometheus, Grafana, Loki) to monitor UI health and performance Drive UI architectural decisions, performance benchmarking, and best practice implementation Skills and Experience Required Degree in Computer Science, Engineering, or a related More ❯
Guildford, Surrey, United Kingdom Hybrid / WFH Options
Person Centred Software Ltd
and communication skills across distributed teams Bonus points for experience with: Flutter, Blazor, Angular, React, microservices, SaaS platforms, Azure services (Functions, Service Bus), GitLab CI/CD, monitoring tools (Prometheus, Azure App Insights), high availability systems. What We Offer: A base salary of £60,000 - £75,000 and bonus depending on experience Modern town centre offices in Guildford, with opportunity for ad More ❯
and predictive analytics. Understanding of AI frameworks and libraries (e.g., TensorFlow, PyTorch, Scikit-learn) and their application in network automation and monitoring. Experience with telemetry and observability frameworks (e.g., Prometheus, Grafana) for real-time network monitoring and troubleshooting. Experience: Minimum of 7 years of experience in network engineering, operations, and support. Proven ability to work hands-on and take strong More ❯
methods such as unit, integration, contract and E2E testing. You have a high degree of experience in observing the performance and health of applications via tools such as Grafana, Prometheus, Datadog, Sentry, etc. You have a strong desire and are an advocate for performant applications. Proactive in solving problems simply and effectively, with an eye for pragmatic solutions. Excellent More ❯
services on GCP and/or AWS Bachelor's degree in Computer Science or related field Strong proficiency in Kubernetes, microservices architecture, Helm, GitLab CI/CD, ArgoCD, Prometheus, and Grafana. Programming experience in at least one language; Golang or Python preferred Deep understanding of autoscaling, version upgrades, and cloud service optimization Bonus if you're familiar with technologies like More ❯
such as Oracle SQL, Mongo, Postgres o Know your way around Linux and Windows command lines, e.g. Bash and PowerShell o Monitoring large systems using technologies such as Grafana, Prometheus, ELK, Splunk o Experience of working in Agile teams, and the tooling that supports it, e.g. Atlassian o Diagnosing and troubleshooting application issues resulting in service outages o Troubleshooting skills More ❯
looking for someone with deep expertise in: o Infrastructure as Code: Terraform, CloudFormation o Security best practices: IAM, KMS, encryption in transit/at rest, DevSecOps o Monitoring & observability: Datadog, Prometheus, Grafana, ELK, or similar What You Bring o 6+ years in DevOps or platform engineering, with experience in a technical lead role. o Proven experience designing and operating cloud-native More ❯
also welcome Proficiency in testing frameworks like JUnit and RestAssured A passion for monitoring, observability, and maintaining resilient systems Desirable Skills: Experience with monitoring and alerting tools like Datadog, Prometheus, Grafana, or PagerDuty Exposure to Python scripting Familiarity with deployment platforms such as Kubernetes and tools like Helm Why Join H&B Tech? Be part of a fast-moving, forward More ❯
InfluxDB, and ClickHouse: schema design, indexing, and caching for sub-second reads. Experience deploying microservices in production using Docker and Kubernetes. Skilled in setting up observability and alerting pipelines (Prometheus, Grafana), including model drift detection. Experience with real-time ML inference and model serving frameworks (e.g., TorchServe, Triton, BentoML) for low-latency applications. Experience designing feedback loops, active learning, or More ❯
address performance bottlenecks and ensure scalability. Assist engineering teams with implementing and reviewing SLOs. Continually improve observability through monitoring, alerting, and dashboards, using tools such as Datadog or Prometheus. Work with other teams to ensure it is effective and provides full coverage. Ensure the service is highly available and resilient Champion best practices in design for high More ❯