and backend systems. Implement observability best practices using OpenTelemetry (OTEL) for tracing, metrics, and logging. Collaborate with platform and DevOps teams to integrate telemetry data with systems such as Grafana, Prometheus, Jaeger, or Tempo. Define and maintain instrumentation standards across Java applications to ensure consistency and performance visibility. Diagnose complex production issues through telemetry data and performance profiling. Contribute to More ❯
london, south east england, united kingdom Hybrid/Remote Options
Risk Ledger
get stuck into the ambiguity of an early stage company. Experience of data processing, machine learning, AI or integrating large datasets into a product. Worked with observability solutions (Kibana, Grafana, Sentry). Salary range £110,000 - £130,000 GBP The perks: Competitive base salary Generous EMI equity package 3% employer match on pension 25 days annual leave + bank holidays More ❯
and improve autoscaling, high availability and managed service adoption across the platform. Collaborate with SRE, Security and Engineering teams to enhance observability, monitoring and alerting through tools like Prometheus, Grafana and CloudWatch. Partner with Security to embed best practices for IAM, secrets management, WAF, and posture management. Optimise performance and cloud spend through automation tools and cost visibility dashboards. Participate … knowledge of Kubernetes operations on AWS (EKS), including cluster scaling, deployment automation, and monitoring. Solid background in Linux administration, networking, and cloud security principles. Familiarity with observability tools (Prometheus, Grafana, Loki) and structured alerting practices. Experience with database migrations, HA configurations, backups, and DR strategies. Strong scripting and automation skills (Terraform, Python, Bash, or similar). Excellent communication and collaboration More ❯
london, south east england, united kingdom Hybrid/Remote Options
Black Pen Recruitment
extensive use of automation tools such as Terraform and Ansible, alongside programming in Python. Their environments are entirely based on Ubuntu Linux. Experience with server monitoring software (e.g. Prometheus, Grafana, Zabbix, Datadog) and a solid understanding of security principles and best practices (including hardening, access control, auditing, and incident response) is highly valued. This is a remote-first role, and … Terraform, Pulumi) Configuration management with Ansible Cloud platforms (AWS, Azure) Containerization (LXC, LXD, Docker, Kubernetes) CI/CD tooling (TeamCity, Jenkins, GitHub Actions) Server monitoring and alerting systems (Prometheus, Grafana, Zabbix, Datadog) Strong Python programming skills Solid Linux administration and general networking knowledge Understanding of infrastructure security best practices, including secure configuration, identity and access management, and compliance controls Experience More ❯
Science, Engineering, or related field. Strong programming skills in Go (ideally), Rust or C++. Solid experience in building and supporting complex backend systems at scale. Experience with Elasticsearch, Prometheus, Grafana and/or Datadog. Exposure to either AWS or GCP, plus IaC (Terraform or similar), would be beneficial. Knowledge of open-source storage tools (Ceph, Minio, JuiceFS or Fuse) and familiarity More ❯
to-end systems and processes Experience of network support and troubleshooting Exposure to UK and EU equity markets Desirable Prior experience in a similar role Knowledge or experience of Grafana Previous experience of Binary Protocols Previous experience of the Atlassian suite of products TCP/UDP knowledge Job Offer Competitive salary ranging from £50,000 to £70,000 per annum. More ❯
London, South East, England, United Kingdom Hybrid/Remote Options
Eligo Recruitment
ll Bring Strong experience with GCP, Terraform, and Infrastructure-as-Code Deep knowledge of cloud networking, security automation, and compliance standards Proficiency in CI/CD pipelines, monitoring tools (Grafana, Datadog), and scripting A collaborative mindset with excellent communication and mentoring skills Why Join? Shape a next-gen AI infrastructure with autonomy and purpose Hybrid working with regular meetups in More ❯
. Experience with GPU cluster management (CUDA, NCCL, Triton Inference Server) and performance tuning across accelerators. Solid grasp of cloud-native orchestration (Docker, Kubernetes, Helm) and observability tooling (Prometheus, Grafana, Jaeger). Proven ability to translate cutting-edge research into engineered solutions that can scale globally. Why this role stands out Influence how next-generation LLM services are built and More ❯
/CD pipelines using GitLab and ArgoCD. Design and operate containerised workloads with EKS, Fargate, and Kubernetes. Manage Kubernetes deployments using Helm charts. Implement observability solutions using OpenTelemetry (OTel), Grafana, and Splunk. Optimise infrastructure with Karpenter for autoscaling and cost efficiency. Ensure robust AWS networking (VPC, Transit Gateway, PrivateLink, Route 53) and enforce security best practices. Drive incident response, monitoring … and performance tuning. Key Technologies: AWS (EKS, Fargate, EC2, S3), Terraform, CloudFormation, GitLab, ArgoCD, Docker, Kubernetes, Helm, Cassandra, OTel, Grafana, Splunk, Karpenter, Python, Bash. Desirable: Experience with Google Cloud Platform (GCP), Apigee Hybrid, and hybrid/multi-cloud environments. Carbon60, Lorien & SRG - The Impellam Group STEM Portfolio are acting as an Employment Business in relation to this vacancy. More ❯
bristol, south west england, united kingdom Hybrid/Remote Options
TwinStream
Who are we: In 2019, our founders were working as engineers solving complex cross-domain problems within government organisations. TwinStream was formed to consolidate their collective expertise and experience into one business, providing technical excellence and exceptional service to their More ❯
the testing and deployment of these services all the way to production in a controlled and secure way. Tech stack - Java engineer needs experience with the Spring Boot framework, TDD, Grafana and Prometheus for monitoring and alerting, and an understanding of the CI/CD process. All candidates must pass a BPSS. Immediate start. End March 2026. Weekly travel to Leeds/Newcastle/Manchester. More ❯
Manchester, Lancashire, England, United Kingdom Hybrid/Remote Options
Lorien
technologies. with clear progression routes available. Key Requirements: Strong troubleshooting and fault-resolution experience across infrastructure and applications Hands-on experience with monitoring tools such as Instana, Splunk, Prometheus, Grafana, or SolarWinds Confident supporting both Windows and Linux operating systems Experience working in ITIL-aligned support environments Understanding of web hosting technologies (DNS, HTTP/S, SSL Certs, and basic More ❯
leeds, west yorkshire, yorkshire and the humber, united kingdom
Entain
JavaScript, Typescript, Python Frameworks: React Native Databases: NoSQL (DynamoDB), SQL AWS services: Lambda, S3, API Gateway, Step Functions, SQS, Athena DevOps and monitoring tools such as Datadog, New Relic, Grafana Desirable: Experience in mobile application development. Experience in sports betting, gaming, or related high-scale transactional domains. Previous experience leading organisational change or scaling teams. Additional Information At Entain, we More ❯
Newcastle Upon Tyne, Tyne and Wear, England, United Kingdom
Noir
DevOps Engineer - FinTech - Newcastle (Tech stack: DevOps Engineer, PowerShell, C#, Java, Python, Ansible, Terraform, Docker, Kubernetes, Docker Swarm, ELK, Grafana, CI/CD, TeamCity, SQL Server, Windows, Linux, Programmer, Developer, Architect, DevOps Engineer) Our client is a cutting-edge FinTech company with a reputation for innovation and excellence. They design and build advanced trading and analytics platforms used globally by … Practical knowledge of automation tools such as Terraform or Ansible. Background in container platforms (e.g., Docker) with orchestration using Kubernetes or Swarm. Familiarity with system monitoring solutions (e.g., ELK, Grafana, or similar). Proven track record in building and maintaining CI/CD pipelines, preferably with TeamCity. Experience working with SQL databases, particularly Microsoft SQL Server. Comfort managing both Windows More ❯
of our top secure communications clients is looking for an Automation Test Engineer to join a critical project. You'll gain exposure across the DevOps lifecycle, including Kubernetes and Grafana, as well as experience with HIL and embedded systems. Company Details Driving next-generation technology transformation to improve security, transport and logistics for various critical industries. Advanced IoT and secure More ❯
London, South East, England, United Kingdom Hybrid/Remote Options
Adecco
data applications using Docker and Kubernetes (including EKS and AKS). CI/CD for Data: Implement and maintain automated pipelines for data applications. Monitoring & Observability: Deploy solutions using Grafana, Prometheus, and other tools to ensure data quality and system health. Infrastructure as Code: Use Terraform and Ansible to provision and manage data infrastructure. Performance Optimization: Enhance data processing for … tooling. Familiarity with open-source data technologies (Spark, Kafka, PostgreSQL). Knowledge of Infrastructure as Code (Terraform, Ansible). Understanding of data architecture principles. Experience with monitoring tools like Grafana and Prometheus. Strong leadership skills to guide teams and influence technical direction. Why Join Us You'll work on innovative projects in a collaborative environment that values automation, scalability, and inclusion. … we are on the client's supplier list for this position. Keywords: Lead DataOps Engineer, DataOps, Data Pipeline Automation, Airflow, Prefect, Dagster, Docker, Kubernetes, EKS, AKS, CI/CD, Terraform, Ansible, Grafana, Prometheus, Spark, Kafka, PostgreSQL, Infrastructure as Code, Cloud Data Engineering, Hybrid Working, Security Clearance, Leadership, DevOps, Observability, Monitoring. More ❯
Visualization, Ignition, Wonderware). Experience with Linux-based embedded systems or RTOS. Hands-on experience with WAGO PLCs and edge computers. Knowledge of cloud computing and visualisation tools (e.g., Grafana). Please apply for immediate consideration. At Adept Resourcing - Commercial & Engineering, we specialise in connecting companies with top talent that drives innovation, growth and success. With our industry expertise, extensive network More ❯
Cheltenham, Gloucestershire, South West, United Kingdom
itecopeople
DV-Cleared Application Support Engineer - Contract (Outside IR35) The Role We are seeking a DV-cleared Application Support Engineer to join our client's on-site team in the Cheltenham area. You will help maintain and support a managed cross More ❯
Wigan, Lancashire, England, United Kingdom Hybrid/Remote Options
Searchability
critical events. SITE RELIABILITY ENGINEER ESSENTIAL SKILLS At least 2 years' experience working as an SRE Deep understanding of system reliability, scalability and performance tuning Experience with observability tools (Grafana, Prometheus, OpenTelemetry) Proficiency in a programming language such as Go or .NET for automation and debugging Hands-on experience with AWS or another major cloud platform Knowledge of Kubernetes, Terraform … us to process and submit (subject to required skills) your application to our client in conjunction with this vacancy only. KEY SKILLS SRE, Site Reliability Engineering, AWS, Kubernetes, Terraform, Grafana, Prometheus, OpenTelemetry, Go, .NET, Cloud Infrastructure, Observability, CI/CD, DevOps, Automation, Performance Tuning, Incident Management More ❯
Wigan, Greater Manchester, United Kingdom Hybrid/Remote Options
Searchability (UK) Ltd
critical events. SITE RELIABILITY ENGINEER ESSENTIAL SKILLS At least 2 years' experience working as an SRE Deep understanding of system reliability, scalability and performance tuning Experience with observability tools (Grafana, Prometheus, OpenTelemetry) Proficiency in a programming language such as Go or .NET for automation and debugging Hands-on experience with AWS or another major cloud platform Knowledge of Kubernetes, Terraform … us to process and submit (subject to required skills) your application to our client in conjunction with this vacancy only. KEY SKILLS SRE, Site Reliability Engineering, AWS, Kubernetes, Terraform, Grafana, Prometheus, OpenTelemetry, Go, .NET, Cloud Infrastructure, Observability, CI/CD, DevOps, Automation, Performance Tuning, Incident Management More ❯
Bristol, Avon, South West, United Kingdom Hybrid/Remote Options
Hargreaves Lansdown
GitOps for Kubernetes (AKS preferred), patterns and multi-environment promotions. Own platform observability: metrics, logs and traces using Azure Monitor/Log Analytics/Application Insights, plus Datadog/Grafana where appropriate. Embed security by design: Azure Policy, Defender for Cloud, secrets management with Key Vault, SBOM and image scanning, policy-as-code and least privilege IAM. Drive reliability using … workload identity. Experience with GitOps, and container build pipelines (e.g., ACR, OPA policies, image scanning). Working knowledge of observability tooling (Azure Monitor, Log Analytics, Application Insights, Datadog/Grafana) and alerting/response workflows. Understanding of the Microsoft Cloud Adoption Framework, Azure Landing Zones and the Well-Architected Framework. Familiarity with DevSecOps practices: threat modelling, dependency and container scanning More ❯
Employment Type: Permanent, Part Time, Work From Home
Hereford, Herefordshire, West Midlands, United Kingdom
IO Associates
iO Associates are partnered with a growing SME in the Defence industry, currently looking for an experienced SRE to start with them as soon as possible. Rate: £550 per day (Outside IR35) Duration: Initial 3 months Location: Hereford - 3 days More ❯
at the heart of technology delivery. Responsibilities include: Designing and enforcing SLOs, SLIs, and SLAs to ensure high reliability and performance. Building and maintaining monitoring/observability solutions (Datadog, Grafana, Azure Application Insights, Log Analytics). Managing Infrastructure as Code (Terraform, Pulumi, CloudFormation) for scalable, repeatable deployments. Automating with PowerShell, Python, or Bash to drive efficiency. Supporting Kubernetes and AKS … Required: Proven Site Reliability Engineering background. Strong Terraform skills with live environment deployment. Kubernetes/AKS expertise. Scripting in PowerShell, Python or Bash. Monitoring experience (Datadog preferred, Azure or Grafana considered). Background in web applications and distributed systems. Desirable Skills: Knowledge of Microservices Architecture. Familiarity with Kanban. Experience with Puppet or Chef. If you’re passionate about Site Reliability More ❯