City of Westminster, London, United Kingdom Hybrid / WFH Options
Additional Resources
high-volume processing. Deploying and managing containerised workloads through Kubernetes, Helm, and Docker. Automating infrastructure using Infrastructure-as-Code tools such as Terraform and Ansible. Ensuring system reliability through observability, monitoring, and proactive issue resolution. Collaborating with cross-functional teams to align data solutions with wider business needs. Supporting the continuous improvement of processes, deployment, and data quality standards. What …
Westminster, City of Westminster, Greater London, United Kingdom Hybrid / WFH Options
Additional Resources
high-volume processing. Deploying and managing containerised workloads through Kubernetes, Helm, and Docker. Automating infrastructure using Infrastructure-as-Code tools such as Terraform and Ansible. Ensuring system reliability through observability, monitoring, and proactive issue resolution. Collaborating with cross-functional teams to align data solutions with wider business needs. Supporting the continuous improvement of processes, deployment, and data quality standards. What …
London, South East, England, United Kingdom Hybrid / WFH Options
Additional Resources Ltd
high-volume processing. Deploying and managing containerised workloads through Kubernetes, Helm, and Docker. Automating infrastructure using Infrastructure-as-Code tools such as Terraform and Ansible. Ensuring system reliability through observability, monitoring, and proactive issue resolution. Collaborating with cross-functional teams to align data solutions with wider business needs. Supporting the continuous improvement of processes, deployment, and data quality standards. What …
Bristol, Avon, South West, United Kingdom Hybrid / WFH Options
Sanderson Recruitment
Lead SRE/Observability Engineering Lead (Outside IR35 Contract/Remote). Location: Bristol/London HQ - Largely Remote (Occasional Travel). Day Rate: Outside IR35 - £650 to £750 p/d. Duration: 3-6 Months Initial - with intention to extend. Payment Terms: Monthly. Our client is a FTSE 100 Wealth/Asset Management firm seeking to engage a Lead SRE Engineer (Observability … SME) to support the implementation and instrumentation of their new Observability solution. This role will be critical in delivering against our Digital OKRs by embedding observability best practices, frameworks, and tooling across digital platforms and engineering teams. Key Responsibilities: Strategy & Roadmap: Define and drive the observability roadmap in alignment with business priorities and digital platform objectives. Champion observability-by-design … manage SLIs, SLOs, and error budgets to track and improve system reliability. Support capacity and availability planning through real-time telemetry and predictive analytics. Instrumentation & Runbooks: Design and implement observability runbooks covering metrics, logs, traces, synthetics, and customer journey monitoring. Set standards for instrumentation, dashboards, and alerting, and enable teams to self-serve their system metrics and traces. Implementation & Enablement: Assist …
to design, build, and maintain the platforms and tooling that underpin our infrastructure provisioning and delivery lifecycle. You'll work collaboratively with cross-functional teams to automate infrastructure, enhance observability, and embed best practices in VMware Hypervisor and DevOps. Key Responsibilities: Build and maintain on-prem and cloud infrastructure (VMware Hypervisor, vSphere, OpenStack, AWS, GCP, Azure). Apply deep …
GitHub Actions, or GitLab CI. Solid understanding of containerization technologies (Docker, Kubernetes). Working knowledge of Python and SQL for automation and data pipeline development. Familiarity with monitoring and observability tools (Grafana, Prometheus, CloudWatch). Strong grasp of data architecture principles and ETL design patterns. Financial services or regulated industry experience (desirable).
Strong experience with AWS (VPCs, EC2, ECS/EKS, RDS, S3, etc.). Solid understanding of database systems (Postgres, SQL Server). IaC mastery (Terraform, CloudFormation, Ansible). Passion for monitoring and observability (Grafana, Elastic, PagerDuty, etc.). Familiarity with configuration management tools (Puppet, etc.). Git, Docker, and scripting skills (bash or similar). A collaborative mindset and the ability to communicate technical concepts clearly …
development and maintenance of CI/CD pipelines using GitLab and ArgoCD. Design and operate containerised workloads with EKS, Fargate, and Kubernetes. Manage Kubernetes deployments using Helm charts. Implement observability solutions using OpenTelemetry (OTel), Grafana, and Splunk. Optimise infrastructure with Karpenter for autoscaling and cost efficiency. Ensure robust AWS networking (VPC, Transit Gateway, PrivateLink, Route 53) and enforce security best …
manage infrastructure as code (IaC) using tools such as Terraform or AWS CloudFormation. Automate deployment pipelines and CI/CD processes for data platform components. Ensure platform reliability, observability, and security through monitoring, logging, and alerting solutions. Collaborate with Data Engineers, Architects, and Security teams to align infrastructure with data platform requirements. Support data sharing systems, ensuring secure and …
design and evolution of our API schemas, ensuring they meet the complex demands of a rapidly growing platform. Champion best practice in code quality, automated testing (Vitest, Playwright) and observability to deliver resilient, maintainable, and production-ready business logic. Drive DevOps excellence by collaborating on CI/CD pipelines (Jenkins, Concourse), containerisation (Docker) and Kubernetes deployments. Mentor and empower fellow …
Leicester, Leicestershire, England, United Kingdom
Uniting Ambition
organisational standards. Support environment troubleshooting, incident resolution, and root cause analysis. Drive continuous improvement and advocate best practices across infrastructure automation and cloud governance. Implement and maintain monitoring and observability solutions using Dynatrace. Essential Experience: Proven experience in a DevOps or Cloud Engineer role within an enterprise Azure environment. Strong hands-on experience with: Azure (IaaS, PaaS, networking, identity), Terraform …
Alto preferred), network access control (802.1x, RADIUS), or zero-trust security concepts. Exposure to infrastructure-as-code (Terraform, Ansible) and version control systems (Git). Experience with monitoring and observability tools (LogicMonitor, Grafana, Prometheus). Knowledge of hybrid cloud networking, including AWS Direct Connect or GCP Interconnect. Relevant certifications such as CCNP, AWS Advanced Networking Specialty, or Google Cloud Network …
for leading and executing the migration of data, dashboards, alerts, and configurations from Splunk systems to Elasticsearch. This role involves deep technical expertise in Splunk architecture, data ingestion, and observability tools, along with strong project management and stakeholder communication skills. Must-have skills: Splunk, ELK Stack, Kibana. Nice-to-have skills: stakeholder communication, strong project management. Responsibilities: Minimum number …
Wokingham, Berkshire, South East, United Kingdom Hybrid / WFH Options
Sanderson Government and Defence
for a sharp-minded Site Reliability Engineer to join our cloud-native mission in Azure. If you thrive in Agile teams, live for automation, and know your way around observability stacks and CI/CD pipelines, this is your playground. What you'll be doing: Automating deployment, monitoring & infrastructure with precision. Owning platform reliability, performance & SLAs. Building IaC with Helm …
Security: Firewalls, VPNs, routing, and endpoint management. Backup & DR: Experience with enterprise backup solutions (e.g., Veeam, Datto) and disaster recovery planning. Automation & Monitoring: Familiar with Terraform, PowerShell, Ansible, and observability tools (Azure Monitor, CloudWatch). Demonstrable experience managing budgets, vendors, and high-performing technical teams. Excellent stakeholder management and communication skills, able to influence at both technical and executive level. Strong …
Hereford, Herefordshire, West Midlands, United Kingdom Hybrid / WFH Options
Twinstream Limited
ensuring the availability, performance, and resilience of our secure, high-impact services. You'll work with development and support teams to evolve infrastructure, streamline delivery pipelines, and strengthen system observability, ensuring performance bottlenecks and reliability risks are resolved before they ever reach production. Expect a technically rich environment, diverse challenges, and the opportunity to make a measurable difference. Key Responsibilities … Reliability Engineer: Partner with Software Engineers to enhance reliability and performance across complex systems. Collaborate with SysAdmins to automate toil and eliminate manual intervention. Build smarter monitoring, logging, and observability pipelines to detect and resolve issues early. Support and improve development environments to hit delivery and quality goals. Research new tools, services, and architectures to drive scalability and resilience. Expand … Ansible, Chef, etc.). Skilled with Docker and Kubernetes/OpenShift/Docker Swarm. Hands-on experience building and maintaining CI/CD pipelines (e.g. Jenkins). Deep understanding of monitoring & observability tools (Grafana, Prometheus, InfluxDB). Solid grounding in Linux, network security, SQL, and AWS (EC2, S3, RDS, Lambda). Comfortable with MQ messaging (RabbitMQ or similar). Bonus points for: Experience with Azure …
hands-on experience in Microsoft Azure ML Studio * Experience using business intelligence tools, preferably Power BI * Experience applying Generative AI and prompting techniques * Strong understanding of data governance, model observability, and compliance frameworks * Proven ability to deliver secure, scalable, and responsible data science solutions. If this sounds like you and you are available on short notice, apply now.
Suite Architect to lead design, automation, and modernisation initiatives across multiple customer environments. This role will focus on developing scalable cloud templates, orchestrating virtual infrastructure, and driving automation and observability using VMware's Aria and NSX technologies. The ideal candidate will combine deep technical expertise with strong communication and customer engagement skills, acting as both an architect and a hands …
company's customer experience (CX) vision. You will collaborate closely with other software engineers, product teams, and AI specialists to develop LLM AI-powered applications, ensuring their scalability, security, observability and performance. This role is hands-on, with a primary focus on coding, testing, and deploying AI solutions in a fast-paced, agile environment. Responsibilities: Code Development and Testing: Write …
for a 2-month initial contract. You will build and harden Node.js microservices running on Azure Container Apps, orchestrating asynchronous, file-based and Delta-linked workflows with strong reliability, observability, and security. Key responsibilities: Design and implement Node.js/TypeScript services (Express/lightweight HTTP) for async job orchestration. Implement FIFO/round-robin workers, leases/heartbeats, retries/…
London, South East, England, United Kingdom Hybrid / WFH Options
Harnham - Data & Analytics Recruitment
developing and scaling the company's core data platform, ensuring teams across the business can access, trust, and use data effectively. You'll drive initiatives that improve data quality, observability, and governance, while helping shape a platform-as-a-product mindset. Key responsibilities include: Building and maintaining data infrastructure: Develop microservices, pipelines, and backend systems that power analytics and machine … learning initiatives. Driving platform evolution: Design and implement scalable, secure, and efficient data services using tools such as Terraform, Docker, and AWS. Data governance and observability: Introduce and enhance tooling for data lineage, contracts, monitoring, and cataloguing. Operational excellence: Lead automation, monitoring, and incident response to maintain high platform reliability. Cross-functional collaboration: Work with data scientists, ML engineers, analysts … Proven track record of designing, building, and scaling data platforms in production environments. Hands-on experience with big data technologies such as Airflow, DBT, Databricks, and data catalogue/observability tools (e.g. Monte Carlo, Atlan, Datahub). Knowledge of cloud infrastructure (AWS or GCP) - including services such as S3, RDS, EMR, ECS, IAM. Experience with DevOps tooling, particularly Terraform and …