for leading and executing the migration of data, dashboards, alerts, and configurations from Splunk systems to Elasticsearch. This role involves deep technical expertise in Splunk architecture, data ingestion, and observability tools, along with strong project management and stakeholder communication skills. Must have skills: Splunk, ELK Stack, Kibana. Nice to have skills: stakeholder communication skills, strong project management. Detailed Job Description: Ability to deploy More ❯
Milton Keynes, Buckinghamshire, South East, United Kingdom
Interact Consulting Limited
or strong interest in learning) cloud-native tooling: AWS (especially CloudWatch); Artifact Management (e.g., Artifactory, CodeArtifact); Infrastructure as Code with Terraform. Monitor test metrics, troubleshoot failures, and improve system observability and debuggability. More ❯
London, South East, England, United Kingdom Hybrid / WFH Options
Tank Recruitment
backend services/databases. Experience with TDD and testing frameworks such as Jest and Pact. Knowledge of CI/CD pipelines (ideally GitHub Actions). Hands-on experience with observability/monitoring tools (e.g., DataDog). A proactive, problem-solving mindset with the ability to work both independently and collaboratively. Senior Software Engineer Location: London (Hybrid 2 Days a week More ❯
of student lifecycle processes in Higher Education and relevant data domains. Knowledge of event-driven and message-based architectures (Event Hub, Kafka, or Service Bus). Experience with monitoring and observability tools like Azure Monitor, Application Insights, and Log Analytics. Awareness of data security, GDPR, and compliance in educational or public sector environments. Exposure to OpenAPI/Swagger, API lifecycle management More ❯
and maintain secure, event-driven integrations via webhooks and callback mechanisms Lead the backend design of new products and services, ensuring interoperability and long-term maintainability Optimize for resilience, observability, and fault-tolerant behavior across distributed cloud systems Deploy infrastructure using AWS CDK and build with modern tools like Golang, TypeScript, Python, and PHP Ensure clean interface contracts and clear More ❯
Tool to production by building and supporting ML-driven applications. Furthering Developer Experience (DevEx) by mentoring others in writing code that is intuitive, clear, and easy to test Developing observability for new and existing ML applications and GenAI/LLM integrations, making use of the Grafana Stack (Prometheus, Loki, Tempo) Working closely with Data Scientists and ML Engineers throughout the More ❯
scalability and reduce manual intervention. Operational Security, SRE & Assurance: Ensure security platforms are resilient, continuously monitored, and designed for 24x7 support and incident response readiness. Embed security telemetry and observability to enable proactive threat detection and automated response. Apply SRE principles to improve reliability, performance, and maintainability of security services. Define service level objectives (SLOs) and key performance indicators (KPIs More ❯
consistency, repeatability, and auditability across environments Develop and maintain developer tooling and golden templates (CI/CD pipelines, scaffolds, environments) to standardize best practices across teams Design and implement observability frameworks (metrics, tracing, logging, alerting) that are easy to consume and part of the platform baseline Eliminate repetitive tasks through automation and opinionated defaults, so teams are not blocked by … and orchestration (Docker, Kubernetes) Familiarity with CI/CD systems (GitHub Actions, GitLab CI, Jenkins, etc.) Hands-on experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation) Knowledge of observability tools (Prometheus, Grafana, ELK stack, Datadog, etc.). Solid grasp of Linux systems and networking fundamentals Strong problem-solving and debugging skills Your Package & Perks: A competitive salary Flexible working More ❯
to improving scalability, performance, and reliability across distributed systems, while also mentoring engineers and shaping technical direction. Key Responsibilities Build scalable, resilient microservices & platform components Optimize performance, reliability, and observability Contribute to system architecture & simplification Ensure clean, tested, high-quality code Mentor engineers and set technical standards Produce clear documentation & architectural diagrams What They're Looking For 5+ years' experience More ❯
Wokingham, Berkshire, South East, United Kingdom Hybrid / WFH Options
Sanderson Government and Defence
for a sharp-minded Site Reliability Engineer to join our cloud-native mission in Azure. If you thrive in Agile teams, live for automation, and know your way around observability stacks and CI/CD pipelines - this is your playground. What you'll be doing: Automating deployment, monitoring & infrastructure with precision Owning platform reliability, performance & SLAs Building IaC with Helm More ❯
South West, England, United Kingdom Hybrid / WFH Options
Interquest
platform - Working deeply with Kubernetes (AKS) alongside a highly skilled team of specialists - Supporting and delivering projects in collaboration with the wider delivery team - Driving best practices around automation, observability, scalability, and performance - Balancing projects with operational support across a growing international environment What's in this for you? - Salary up to £60k+ - Fully remote global team (no plans to ever More ❯
Birmingham, West Midlands (County), United Kingdom Hybrid / WFH Options
Sherborne Talent Solutions
automation, and optimisation of CI/CD pipelines to drive speed, reliability, and consistency. Manage and optimise Azure infrastructure for scalability, security, performance, and cost control. Champion modern monitoring, observability, and incident management practices to maintain high availability. Partner with engineering, architecture, and product leadership to accelerate delivery and reduce operational friction. Drive adoption of FinOps principles to balance technical More ❯
London, South East, England, United Kingdom Hybrid / WFH Options
Huxley
availability, secure deployments, and efficient agent orchestration using AKS. You will create and maintain CI/CD pipelines for Azure services, Semantic Kernel agents, manage Kubernetes clusters, and integrate observability tools to monitor system health and performance. You'll also ensure alignment with enterprise-grade security practices, including zero trust principles, identity-aware routing, and integration with Azure API Management More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Huxley Associates
availability, secure deployments, and efficient agent orchestration using AKS. You will create and maintain CI/CD pipelines for Azure services, Semantic Kernel agents, manage Kubernetes clusters, and integrate observability tools to monitor system health and performance. You'll also ensure alignment with enterprise-grade security practices, including zero trust principles, identity-aware routing, and integration with Azure API Management More ❯
London, South East, England, United Kingdom Hybrid / WFH Options
Salt Search
production. Deploy, maintain, and optimise machine learning services within a cloud environment (AWS). Recommend and implement prompt management tools and provide expertise in prompt engineering. Introduce and manage observability, monitoring, and evaluation frameworks for ML and AI services. Enable auto-evaluation of prompts and models against domain-specific requirements. Build Python-based microservices, data pipelines, and serverless functions. Collaborate More ❯
Leeds, West Yorkshire, United Kingdom Hybrid / WFH Options
Tria
within enterprise systems. Strong understanding of cloud platforms (Azure preferred). Knowledge of Infrastructure-as-Code (IaC), APIs, and automation tools. Familiarity with CI/CD pipelines, monitoring, and observability tools. Knowledge of ITSM, Agile, DevOps, and service-level objectives (SLOs) and indicators (SLIs). Excellent problem-solving skills and ability to work in complex, multi-supplier environments. Desirable: Bachelor More ❯
security risks. Own your work: Take responsibility for the design, delivery, and quality of significant technical projects. Drive improvement: Champion engineering best practices across testing, CI/CD, observability, and documentation. Collaborate widely: Work closely with Product, DevOps, QA, and Customer Success to build solutions that deliver real business value. Support secure development: Apply secure coding practices and support More ❯
a focus on security, data protection, and performance optimization. Experience managing transport and change governance, incident triage, and root cause analysis. Skilled in monitoring tools like SAP Cloud ALM, observability platforms, and incident management platforms such as Jira or Azure DevOps. Adept at documentation using Confluence and following agile methodologies like Scrum and Kanban. Exceptional stakeholder management and communication skills More ❯
designed infrastructure that scales without slowing anyone down. Tame complex LLM infrastructure (real-time usage, flaky providers, token routing - the lot). Raise the quality bar across the board: observability, auth, reliability, and more. This isn't a role for passengers. It's for engineers who love ambiguity, thrive under pressure, and see infrastructure as a multiplier. What We're More ❯
team leadership. Expert-level knowledge of AWS and deep hands-on experience with AWS CDK in production environments. Strong background in DevSecOps, infrastructure-as-code, CI/CD, and observability practices. Proven ability to scale cloud platforms in a high-growth, high-regulatory tech environment. Experience building and leading high-performing technical teams across multiple cloud disciplines. Strong understanding of More ❯
Wetherby, West Yorkshire, Yorkshire, United Kingdom
Equals One Ltd
Architecture: Evolve a modular, scalable platform (ECS on AWS), with clear boundaries between ingestion, retrieval, reasoning and delivery. Quality & reliability: Testing (unit/integration/evals), CI/CD, observability (tracing/metrics for LLM and retrieval paths), and performance tuning. Collaboration: Work closely with Product and ELT; mentor engineers; contribute to technical strategy and research. Innovation: Research and recommend More ❯
LS22, Wetherby, City and Borough of Leeds, West Yorkshire, United Kingdom
Handshaik
Architecture: Evolve a modular, scalable platform (ECS on AWS), with clear boundaries between ingestion, retrieval, reasoning and delivery. Quality & reliability: Testing (unit/integration/evals), CI/CD, observability (tracing/metrics for LLM and retrieval paths), and performance tuning. Collaboration: Work closely with Product and ELT; mentor engineers; contribute to technical strategy and research. Innovation: Research and recommend More ❯