to design, build, and maintain the platforms and tooling that underpin our infrastructure provisioning and delivery lifecycle. You'll work collaboratively with cross-functional teams to automate infrastructure, enhance observability, and embed best practices in VMware Hypervisor and DevOps. Key Responsibilities: Build and maintain on-prem and cloud infrastructure (VMware Hypervisor, vSphere, OpenStack, AWS, GCP, Azure). Apply deep
GitHub Actions, or GitLab CI. Solid understanding of containerization technologies (Docker, Kubernetes). Working knowledge of Python and SQL for automation and data pipeline development. Familiarity with monitoring and observability tools (Grafana, Prometheus, CloudWatch). Strong grasp of data architecture principles and ETL design patterns. Financial services or regulated industry experience (desirable).
Wokingham, Berkshire, United Kingdom Hybrid / WFH Options
Experis
Collaborate with Agile teams to automate deployment, monitoring, and infrastructure management. Ensure platform and business application reliability and performance against strict SLAs and KPIs. Implement and maintain cloud-native observability stacks (Prometheus, Grafana, Loki, Tempo). Develop and maintain Infrastructure as Code (IaC) using tools like Kustomize or Helm. Manage CI/CD pipelines using Tekton and ArgoCD. Support and
design and evolution of our API schemas, ensuring they meet the complex demands of a rapidly growing platform. Champion best practice in code quality, automated testing (Vitest, Playwright) and observability to deliver resilient, maintainable, and production-ready business logic. Drive DevOps excellence by collaborating on CI/CD pipelines (Jenkins, Concourse), containerisation (Docker) and Kubernetes deployments. Mentor and empower fellow
consistency, repeatability, and auditability across environments Develop and maintain developer tooling and golden templates (CI/CD pipelines, scaffolds, environments) to standardize best practices across teams Design and implement observability frameworks (metrics, tracing, logging, alerting) that are easy to consume and part of the platform baseline Eliminate repetitive tasks through automation and opinionated defaults, so teams are not blocked by … and orchestration (Docker, Kubernetes) Familiarity with CI/CD systems (GitHub Actions, GitLab CI, Jenkins, etc.) Hands-on experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation) Knowledge of observability tools (Prometheus, Grafana, ELK stack, Datadog, etc.). Solid grasp of Linux systems and networking fundamentals Strong problem-solving and debugging skills Your Package & Perks: A competitive salary Flexible working More ❯
Alto preferred), network access control (802.1x, RADIUS), or zero-trust security concepts. Exposure to infrastructure-as-code (Terraform, Ansible) and version control systems (Git). Experience with monitoring and observability tools (LogicMonitor, Grafana, Prometheus). Knowledge of hybrid cloud networking, including AWS Direct Connect or GCP Interconnect. Relevant certifications such as CCNP, AWS Advanced Networking Specialty, or Google Cloud Network
Farnborough, Hampshire, South East, United Kingdom
Stott & May Professional Search Limited
play a key role in designing, implementing, and maintaining performance testing frameworks to ensure application reliability, scalability, and efficiency. This role combines expertise in performance engineering, infrastructure automation, and observability to optimise system performance across production and pre-production environments. Your Responsibilities * Design and implement performance testing strategies using tools such as JMeter, Gatling, or LoadRunner. * Monitor and analyse system … such as Jenkins or GitLab CI. * Proficiency in scripting languages such as Python or Bash. * Experience with Infrastructure as Code tools (Ansible, Terraform). * Strong knowledge of monitoring and observability tools (Prometheus, Grafana). * Familiarity with Linux environments and cloud platforms (AWS, Azure, or GCP). * Excellent analytical, problem-solving, and communication skills. Desirable Skills: * Experience with Kubernetes and container More ❯
Wokingham, Berkshire, South East, United Kingdom Hybrid / WFH Options
Sanderson Government and Defence
for a sharp-minded Site Reliability Engineer to join our cloud-native mission in Azure. If you thrive in Agile teams, live for automation, and know your way around observability stacks and CI/CD pipelines - this is your playground. What you'll be doing: Automating deployment, monitoring & infrastructure with precision Owning platform reliability, performance & SLAs Building IaC with Helm
ground models with enterprise data (SharePoint, Dataverse, SQL, Azure AI Search/RAG). Craft, test and version prompts; define evaluation metrics, safety rails and guardrails. Implement telemetry/observability (App Insights/Kusto), A/B tests and continuous improvement loops. Work with Security/Compliance on data access, DLP, retention and audit; follow least-privilege and secure-by
to improving scalability, performance, and reliability across distributed systems, while also mentoring engineers and shaping technical direction. Key Responsibilities Build scalable, resilient microservices & platform components Optimize performance, reliability, and observability Contribute to system architecture & simplification Ensure clean, tested, high-quality code Mentor engineers and set technical standards Produce clear documentation & architectural diagrams What They're Looking For 5+ years' experience
insight, and proactive incident management. Key Responsibilities Translate high-level monitoring non-functional requirements (NFRs) into actionable configurations across tools such as Splunk, Dynatrace, and AppDynamics. Deliver full-stack observability solutions, including application-aware network performance monitoring (NPM), synthetics, log analytics, and infrastructure metrics. Provide live support for monitoring technologies and assist with live service support, including key business events
functional teams delivering and maintaining large-scale digital platforms, ensuring high availability, scalability, and resilience. The role requires a blend of technical depth and leadership capability, particularly in automation, observability, and mentoring team members. Key Skills & Experience: DevOps/SRE experience (5+ years) – ownership of projects, strong automation and Infrastructure-as-Code approach, incident management, and leadership of initiatives. Terraform … state management, and AWS integration. Kafka – experience with production clusters, scaling, tuning, troubleshooting, and event-driven systems. MongoDB – strong admin experience including replication, sharding, tuning, and backups. Monitoring/Observability – Prometheus, Grafana, ELK, Datadog, with strong alerting/SLO design. AWS – expertise across EC2, VPC, S3, RDS, IAM, ALB/NLB, and cost optimisation. Linux – advanced administration, performance debugging, and
UKIC DV Cleared Site Reliability/DevOps Engineer London - 5 Days Onsite Up to £550 per day (Umbrella, Inside IR35) 12-Month Contract Must hold UKIC DV Clearance Are you passionate about reliability, automation, and supporting mission-critical systems? Join
South West London, London, United Kingdom Hybrid / WFH Options
Purview Consultancy Services Ltd
and agentic workflows Drive architectural reviews for LlamaParse/Azure Document Intelligence integration Design fault-tolerant, high-availability AI systems with automatic failover and load balancing Establish comprehensive monitoring, observability, and performance optimization strategies Mentor technical teams and establish AI engineering best practices using modern toolchains Oversee model performance evaluation using LangGraph evals and DeepEval frameworks
Suite Architect to lead design, automation, and modernisation initiatives across multiple customer environments. This role will focus on developing scalable cloud templates, orchestrating virtual infrastructure, and driving automation and observability using VMware's Aria and NSX technologies. The ideal candidate will combine deep technical expertise with strong communication and customer engagement skills, acting as both an architect and a hands
for a 2-month initial contract. You will build and harden Node.js microservices running on Azure Container Apps, orchestrating asynchronous, file-based and Delta-linked workflows with strong reliability, observability, and security. Key responsibilities: Design and implement Node.js/TypeScript services (Express/lightweight HTTP) for async job orchestration. Implement FIFO/round-robin workers, leases/heartbeats, retries/
company's customer experience (CX) vision. You will collaborate closely with other software engineers, product teams, and AI specialists to develop LLM AI-powered applications, ensuring their scalability, security, observability and performance. This role is hands-on, with a primary focus on coding, testing, and deploying AI solutions in a fast-paced, agile environment. Responsibilities: Code Development and Testing Write
specialism in vulnerability management Self-starter, able to work in technical detail and motivate a diverse group of stakeholders to build sponsorship for significant and impactful change Desired: Establishing observability platforms Capabilities adjacent to exposure/vulnerability management capabilities (i.e. cyber security asset management, attack surface management, etc.) Pragmatic application of zero-trust philosophies Cloud based security (GCP, AWS and
South West London, London, United Kingdom Hybrid / WFH Options
Purview Consultancy Services Ltd
Intelligence Implement advanced RAG systems with text-embedding-3-large and Azure DB for Postgres Lead hands-on development using Claude Code for rapid agentic workflow creation Establish AI observability and monitoring using Arize Phoenix and Azure AI Foundry Fine-tune and optimize Azure OpenAI GPT-5 models for financial document understanding Implement comprehensive evaluation strategies using LangGraph evals and
business processes. (LEAD) Familiarity with Microsoft Power Platform concepts, including Power Automate, Power Apps, and Dataverse. (LEAD) Experience applying Generative AI and prompting techniques. Strong understanding of AI governance, observability, and compliance frameworks. Proven ability to deliver secure, scalable, and responsible AI solutions. Excellent communication and presentation skills. Extensive experience working collaboratively with diverse colleagues and stakeholders. Knowledge of the