City Of Westminster, London, United Kingdom Hybrid / WFH Options
Additional Resources
high-volume processing. Deploying and managing containerised workloads through Kubernetes, Helm, and Docker. Automating infrastructure using Infrastructure-as-Code tools such as Terraform and Ansible. Ensuring system reliability through observability, monitoring, and proactive issue resolution. Collaborating with cross-functional teams to align data solutions with wider business needs. Supporting the continuous improvement of processes, deployment, and data quality standards. What
Bristol, Avon, South West, United Kingdom Hybrid / WFH Options
Sanderson Recruitment
Lead SRE/Observability Engineering Lead - (Outside IR35 Contract/Remote) Location: Bristol/London HQ - Largely Remote (Occasional Travel) Day Rate: Outside IR35 - £650 to £750 p/d Duration: 3-6 Months Initial - with intention to extend Payment Terms: Monthly Our client is a FTSE 100 Wealth/Asset Management firm seeking to engage a Lead SRE Engineer (Observability … SME) to support the implementation and instrumentation of their new Observability solution. This role will be critical in delivering against our Digital OKRs by embedding observability best practices, frameworks, and tooling across digital platforms and engineering teams. Key Responsibilities: Strategy & Roadmap: Define and drive the observability roadmap in alignment with business priorities and digital platform objectives. Champion observability-by-design … manage SLIs, SLOs, and error budgets to track and improve system reliability. Support capacity and availability planning through real-time telemetry and predictive analytics. Instrumentation & Runbooks: Design and implement observability runbooks covering metrics, logs, traces, synthetics, and customer journey monitoring. Set standards for instrumentation, dashboards, and alerting, and enable teams to self-serve their system metrics and traces. Implementation & Enablement: Assist
Wokingham, Berkshire, South East, United Kingdom Hybrid / WFH Options
Sanderson Government and Defence
for a sharp-minded Site Reliability Engineer to join our cloud-native mission in Azure. If you thrive in Agile teams, live for automation, and know your way around observability stacks and CI/CD pipelines - this is your playground. What you'll be doing: Automating deployment, monitoring & infrastructure with precision Owning platform reliability, performance & SLAs Building IaC with Helm
Hereford, Herefordshire, West Midlands, United Kingdom Hybrid / WFH Options
Twinstream Limited
ensuring the availability, performance, and resilience of our secure, high-impact services. You'll work with development and support teams to evolve infrastructure, streamline delivery pipelines, and strengthen system observability, ensuring performance bottlenecks and reliability risks are resolved before they ever reach production. Expect a technically rich environment, diverse challenges, and the opportunity to make a measurable difference. Key Responsibilities … Reliability Engineer: Partner with Software Engineers to enhance reliability and performance across complex systems Collaborate with SysAdmins to automate toil and eliminate manual intervention Build smarter monitoring, logging, and observability pipelines to detect and resolve issues early Support and improve development environments to hit delivery and quality goals Research new tools, services, and architectures to drive scalability and resilience Expand … Ansible, Chef, etc.) Skilled with Docker and Kubernetes/OpenShift/Docker Swarm Hands-on experience building and maintaining CI/CD pipelines (e.g. Jenkins) Deep understanding of monitoring & observability tools (Grafana, Prometheus, InfluxDB) Solid grounding in Linux, network security, SQL, and AWS (EC2, S3, RDS, Lambda) Comfortable with MQ messaging (RabbitMQ or similar) Bonus points for: Experience with Azure
for a 2-month initial contract. You will build and harden Node.js microservices running on Azure Container Apps, orchestrating asynchronous, file-based and Delta-linked workflows with strong reliability, observability, and security. Key responsibilities: Design and implement Node.js/TypeScript services (Express/lightweight HTTP) for async job orchestration. Implement FIFO/round-robin workers, leases/heartbeats, retries/
London, South East, England, United Kingdom Hybrid / WFH Options
Harnham - Data & Analytics Recruitment
developing and scaling the company's core data platform, ensuring teams across the business can access, trust, and use data effectively. You'll drive initiatives that improve data quality, observability, and governance, while helping shape a platform-as-a-product mindset. Key responsibilities include: Building and maintaining data infrastructure: Develop microservices, pipelines, and backend systems that power analytics and machine … learning initiatives. Driving platform evolution: Design and implement scalable, secure, and efficient data services using tools such as Terraform, Docker, and AWS. Data governance and observability: Introduce and enhance tooling for data lineage, contracts, monitoring, and cataloguing. Operational excellence: Lead automation, monitoring, and incident response to maintain high platform reliability. Cross-functional collaboration: Work with data scientists, ML engineers, analysts … Proven track record of designing, building, and scaling data platforms in production environments. Hands-on experience with big data technologies such as Airflow, dbt, Databricks, and data catalogue/observability tools (e.g. Monte Carlo, Atlan, DataHub). Knowledge of cloud infrastructure (AWS or GCP) - including services such as S3, RDS, EMR, ECS, IAM. Experience with DevOps tooling, particularly Terraform and
Hereford, Herefordshire, West Midlands, United Kingdom Hybrid / WFH Options
Hays
focused on ensuring service availability, performance, and cost-efficiency across both cloud and on-prem infrastructure. You'll work closely with development and support teams to evolve infrastructure, enhance observability, and proactively mitigate reliability risks. Key Responsibilities: Collaborate with software engineers to improve reliability and performance Automate operational tasks and reduce alert fatigue Enhance monitoring and observability to pre-empt … platforms, ideally AWS (EC2, RDS, S3, Lambda) Desirable: Coding experience in Java, Go, Python or similar Knowledge of cross-domain technologies Experience in service management environments Practical application of observability patterns Experience with Azure Additional Information: Due to the nature of the work, successful candidates will be required to undergo security vetting. We welcome applications from all backgrounds and are
looking for an experienced Data Engineer to support an initial 6-month contract engagement. You will own their data platform end to end, from ingestion and modelling to orchestration, observability and governance. You'll be responsible for designing and building robust, reliable pipelines, evolving their lakehouse/warehouse layers and enabling fast, trustworthy analytics for multiple teams. Tech you'll be working with
Knutsford, Cheshire, England, United Kingdom Hybrid / WFH Options
Tenth Revolution Group
and driven Security Engineer to join our small, focused team building a telemetry pipeline MVP. You'll play a key role in designing and securing our containerized environments, ensuring observability tools and infrastructure are built with security at their core. This role blends deep technical expertise with a hands-on, collaborative approach, ideal for someone who thrives in fast-moving … documentation and response playbooks What You Bring Hands-on experience with Kubernetes, OpenShift, and secure production systems Strong GitLab and CI/CD security expertise Familiarity with telemetry and observability stacks Solid grasp of networking, firewalls, and core security principles Knowledge of container security tools (Aqua, Twistlock, Trivy) Understanding of frameworks like NIST or ISO 27001 Excellent analytical and communication
modern warehouse and experience with dbt. Building audiences at scale, including identity resolution and edge-case handling. Rigorous testing/QA mindset and production hygiene (version control, code review, observability). Understanding of privacy & consent in the UK/EU (GDPR, PECR) and operationalising suppression rules. Disclaimer: This vacancy is being advertised by either Advanced Resource Managers Limited, Advanced Resource
manage and support a customer's AWS and Data platform Be technically hands-on Provide incident and problem management on the AWS IaaS and PaaS platform Monitoring and observability of system and platform performance Collaboration with development and build teams on application and platform deployments and changes Involvement in the resolution of incidents and problems in an efficient and … timely manner Actively monitor an AWS platform and components for technical issues Implement and improve on the existing monitoring and observability solution Be involved in the resolution of technical incident tickets Assist in the root cause analysis of incidents Assist with improving efficiency and processes within the team Examine traces and logs Escalate incidents and problems to the appropriate teams
proactive incident management. Key Responsibilities: Translate high-level monitoring non-functional requirements (NFRs) into actionable configurations across tools such as Splunk, Dynatrace, and AppDynamics. Deliver full-stack observability solutions, including application-aware network performance monitoring (NPM), synthetics, log analytics, and infrastructure metrics. Provide live support for monitoring technologies and assist with live service support, including key business … and tooling exploitation to enhance operational efficiency within immature teams Required Skills and Experience: Strong understanding of and experience in SRE principles and methodologies. Strong understanding of observability within a complex tech stack. Hands-on experience with monitoring tools such as Splunk, Splunk ITSI, Dynatrace, AppDynamics, and synthetic monitoring platforms. Strong understanding and experience with implementing