Sheffield, South Yorkshire, Yorkshire, United Kingdom Hybrid / WFH Options
VANLOQ LIMITED
and containerised environments Required Skills: Proven experience in Python development & FastAPI Strong knowledge of PostgreSQL database administration Excellent problem-solving, debugging, and analytical skills Nice to Have: Exposure to observability tools (Prometheus, Grafana, OpenTelemetry) Experience with enterprise tools (Control-M, TrueSight, Guardium, Tenable Nessus, Delinea) Understanding of security and software development in highly regulated environments End-to-end experience…
Experience using LLM-based workflows and integrating AI capabilities into applications Experience with cloud deployment services (AWS preferred, Azure, GCP) Knowledge of containerization and orchestration (Docker, Kubernetes) Experience with observability tools (Prometheus, Grafana, New Relic, Datadog…
Newcastle Upon Tyne, Tyne and Wear, England, United Kingdom Hybrid / WFH Options
Lorien
cloud infrastructure on Azure or AWS. Driving Infrastructure as Code (IaC) practices using Terraform. Building and optimising CI/CD pipelines to accelerate delivery. Implementing and maintaining monitoring and observability with Prometheus and Grafana. Enabling team collaboration and incident response through Slack and other ChatOps tools. Leading, mentoring, and supporting engineers (or preparing to step into people management if you…
Leighton Buzzard, Bedfordshire, United Kingdom Hybrid / WFH Options
Big Red Recruitment Midlands Limited
voice will be critical in its evolution. KEY RESPONSIBILITIES - Oversee day-to-day platform operations, including monitoring, incident response and troubleshooting. - Moving across orchestration, automation, pipelines, cloud services, observability and security domains. - Leading and managing short- and long-term project planning. - Developing and implementing cloud governance, security and compliance. - Leading automation and IaC improvements. - Providing mentorship and professional development…
Track, SonarQube, SBOM generation). Oversee infrastructure, authentication, access control, patching, and hardening for Linux and Windows systems. Implement monitoring and logging (Grafana, Prometheus, EFK/ELK) and ensure observability of systems. Coach and enable development teams, defining and promoting DevOps best practices. Technologies & Tools: Docker, Kubernetes, Helm, Rancher, GitLab CI/CD, JFrog Artifactory, VMware vSphere, Linux/Windows…
test solutions and automation frameworks using Python, Terraform, and modern cloud-native practices. Contribute to the platform’s CI/CD pipeline by integrating automated testing, resilience checks, and observability hooks at every stage. Lead initiatives that drive testability, platform resilience, and validation as code across all layers of the ML platform stack. Collaborate with engineering, MLOps, and infrastructure teams…
design and evolution of our API schemas, ensuring they meet the complex demands of a rapidly growing platform. Champion best practice in code quality, automated testing (Vitest, Playwright) and observability to deliver resilient, maintainable, and production-ready business logic. Drive DevOps excellence by collaborating on CI/CD pipelines (Jenkins, Concourse), containerisation (Docker) and Kubernetes deployments. Mentor and empower fellow…
Cambridge, Cambridgeshire, England, United Kingdom
Opus Recruitment Solutions Ltd
Build and maintain automated test suites for APIs, service components, and AI pipelines. Automate evaluation of AI outputs for accuracy, safety, and consistency. Define quality metrics and observability hooks in collaboration with engineering teams. Ensure compliance with AI regulations and standards such as NIST AI RMF and the EU AI Act. Conduct threat modelling and security testing…
etc.) Comfort with basic computer administration including software installation, system configuration, and networking. Comfort with git and automated build pipelines (Jenkins, GitLab CI/CD, etc.) Preferred Passion for observability (Elastic, APM, Grafana, etc.) Experience integrating software with a Large Language Model (LLM) Experience with retrieval-augmented generation (RAG) Production-grade software development experience with Python Service containerization and deployment…
through coaching, recruitment, and career development aligned with DDaT frameworks. Excellent development skills, with a depth of experience including C#, Java (Spring Boot, JPA/Hibernate), REST APIs, observability and monitoring, queue technologies and security. Detailed knowledge of best practices such as SOLID principles Experience of building new and evolving microservices with emphasis on high availability and data integrity.
mindset, from commit to production Collaborate directly with end-users and internal teams to understand needs and deliver value Operate across multi-cloud environments (AWS, GCP, Azure) Drive system observability and reliability with tools like Datadog Help shape our engineering culture by mentoring, sharing knowledge, and encouraging best practices Push boundaries, challenge assumptions, and ensure delivery of meaningful solutions Tech…
Actions (Preferred) or similar. Working knowledge of Kubernetes and GPU scheduling, including setup of GPU-enabled clusters and deployment of GPU workloads in Kubernetes. Familiarity with GPU monitoring and observability, using tools such as Prometheus, Grafana, NVIDIA Data Center GPU Manager (DCGM), or custom scripts. Proven ability to analyze deployment approaches for GPU-accelerated serving frameworks and deliver reference implementations.
in piloting new products. Qualifications: Strong experience with Confluent Kafka and AWS cloud, including experience in building and operating solutions for high-scale distributed systems. Prior experience with enabling "Observability" using tools for Distributed tracing, Event logging, APM Synthetic monitoring. Understanding of SRE Practices. Experience in Automation. Experience in building self-service platforms. Prior experience with web services and messaging…
teams. Desirable Microsoft Azure certifications. GraphQL (e.g. HotChocolate). Exposure to Kafka or other event-driven platforms. Knowledge of DevOps/IaC (Docker, Azure DevOps). Familiarity with Azure observability, identity, and security tools. Gitflow knowledge. Personal Qualities Customer-focused and improvement-driven. Positive, proactive, and collaborative. Strong problem-solving and influencing skills. Committed to personal and team development. Think…
clients' products Finding opportunities to exploit cloud native technologies with clients' products Being part of designing and delivering cloud-native applications that deliver on key architectural requirements (scalability, reliability, observability, security, etc.) and DevOps best practices Providing technical guidance, mentoring, and support to the development teams and other architects Designing applications that can be supported and maintained Your key skills…
experience in Azure and GCP) Kubernetes (AWS EKS) and container infrastructure IAM and managing cloud identities at scale Secure development and application of IaC solutions (Terraform, Helm) Cloud-native observability and management tools Development experience in Go, Python and Rust PREFERRED QUALIFICATIONS Bachelor's degree in computer science or a related field and/or candidates with equivalent job experience…
ISO 27001, CIS) and support audit readiness for GenAI deployments. - Evaluate and enforce IAM, data residency, and privacy controls across AWS, Azure, and hybrid environments. Operational Readiness & Monitoring - Build observability and monitoring frameworks for GenAI systems using tools like Prometheus, Grafana, ELK, and Langfuse. - Develop automated pipelines for model deployment, evaluation, and rollback using Kubernetes, Helm, and CI/CD…
Newcastle Upon Tyne, Tyne and Wear, North East, United Kingdom
Anson McCade
to deliver infrastructure that supports high-performance applications and services Lead automation initiatives using tools like Terraform, Ansible, or scripting languages (e.g. PowerShell, Python) Drive improvements in infrastructure monitoring, observability, and incident response Evaluate and introduce new technologies to improve scalability, availability, and security Support endpoint management and enterprise IT systems (e.g. Intune, SCCM, JAMF) Contribute to disaster recovery and…
of high-impact integration solutions across services and platforms. Collaborate on reusable API assets such as SDKs, templates, shared schemas, and common middleware. Implement robust error handling, logging, and observability across services and endpoints. Promote automation of API tests, documentation, contract validation, and pipeline integration. Collaboration & Engineering Maturity Act as a subject matter expert for APIs across squads and tribes…
including ontologies (OWL/RDF) and graph databases (e.g., Neo4j). Familiarity with the concepts behind the Model Context Protocol (MCP) or similar advanced agentic architectures. Experience with modern observability stacks, particularly OpenTelemetry. Experience designing multi-tenant enterprise software platforms. Knowledge of enterprise security patterns and identity management systems.
services/message buses and other architectural elements Deploy these applications using features such as containers to cloud leveraging CI/CD to support this process backed with good observability when running these in production Ensure quality through the creation of documentation and use of unit/integration/contract testing with a consideration of security/performance requirements We…
Warrington, Cheshire, North West England, United Kingdom
Accenture
services/message buses and other architectural elements Deploy these applications using features such as containers to cloud leveraging CI/CD to support this process backed with good observability when running these in production Ensure quality through the creation of documentation and use of unit/integration/contract testing with a consideration of security/performance requirements We…
Bolton, Greater Manchester, North West England, United Kingdom
Accenture
services/message buses and other architectural elements Deploy these applications using features such as containers to cloud leveraging CI/CD to support this process backed with good observability when running these in production Ensure quality through the creation of documentation and use of unit/integration/contract testing with a consideration of security/performance requirements We…
Exposure to AI/ML-based testing, self-healing tests, or model-based test generation. Understanding of platform thinking and working in centralized enablement teams. Hands-on experience with observability and monitoring tools for test impact analysis. Educational Requirements Bachelor's Degree in Computer Science, Engineering, or a related field is required. Why Join Medline? Be a key contributor to…
running on Java 21. We're in the process of moving our backend services to Spring Boot. We've invested heavily in our Datadog integration to bring world-class observability and monitoring to our systems. We've recently moved to GitLab and are currently building out our next generation of automated deployment pipelines. We've incorporated some of the best…