London, South East England, United Kingdom Hybrid / WFH Options
DraftKings Inc
through others. Deep knowledge of microservices architecture, containerization and orchestration paradigms, messaging, and streaming systems; familiarity with .NET, Kafka, and Kubernetes technologies is a plus. Strong background in system observability, monitoring, and alerting. Understanding of infrastructure and CI/CD tooling in a DevOps culture. A culture of testing automation at multiple levels, from unit tests to domain-wide end
DevOps: you build it, you run it. Tech Stack: M&S uses a variety of technologies, including: Java, Spring, Spring Boot, Micronaut; React, Next.js, TypeScript, Angular; Azure Cloud, Kubernetes, Dynatrace (observability); SQL Server, MongoDB; Ignite, Redis. Everyone's Welcome: We are ambitious about the future of retail. We're disrupting, innovating and leading the industry into a more conscientious, inspiring digital
assurance (QA), deployment, and infrastructure pipelines. Driving the adoption of cloud-native tools and automation, enhancing integration, scalability, and cost efficiency. Developing and refining cloud support processes, improving reliability, observability, and user experience. Like many organisations, we need to maintain our services 24/7; therefore, on occasion there may be a requirement to work out of hours, for which
of Go Lang or Java, with hands-on experience building scalable services. Ability and willingness to enhance existing Go Lang backend services regardless of specialisation. Experience working with an observability stack (logging, metrics, tracing). Expertise in building RESTful APIs following company standards. Understanding of Domain-Driven Design and Modularization concepts. Asynchronous processing with approaches like coroutines, message queuing
product planning, roadmap discussions, and strategic prioritization. Operational Excellence: Own key engineering KPIs including system uptime, velocity, tech debt reduction, and deployment frequency. Drive cloud infrastructure cost-efficiency, system observability, and DevSecOps maturity. Lead incident management and escalation processes with customer sensitivity and transparency. Qualifications: 10+ years in software engineering, including 5+ years in engineering leadership roles. Proven experience building
Cardiff, South Glamorgan, United Kingdom Hybrid / WFH Options
RVU Co UK
Experience of building and designing cost-optimised Cloud platforms (preferably Azure) from the ground up, following well-architected principles. Solid understanding of platform and reliability engineering approaches (SRE), including observability, performance optimisation, capturing analytics and security best practices. Experience implementing Service Level Objectives and using them to drive error budgets, risk management and alerting. Knowledge and experience with operating containers
London, South East England, United Kingdom Hybrid / WFH Options
Morae Services India Private Limited
product planning, roadmap discussions, and strategic prioritization. Operational Excellence: Own key engineering KPIs including system uptime, velocity, tech debt reduction, and deployment frequency. Drive cloud infrastructure cost-efficiency, system observability, and DevSecOps maturity. Lead incident management and escalation processes with customer sensitivity and transparency. Qualifications: 10+ years in software engineering, including 5+ years in engineering leadership roles. Proven experience building
Kubernetes) at scale. Experience working with a cloud provider (AWS/Azure/GCE), or sysadmin/SRE experience in data centers. Experience designing, building, and operating high-scale observability or infrastructure systems. Working knowledge of networking fundamentals; experience with CNIs or cloud networking infrastructure preferred. What We Require: 4+ years of professional software development experience on core infrastructure with
Malvern, Worcestershire, United Kingdom Hybrid / WFH Options
QinetiQ Limited
the evaluation of the performance of LLMs in different contexts. Accountabilities: Understands the technical aspects of the project and the wider customer business model. Solution architecture, including security, availability, observability, scalability, performance, reliability, and cost-efficiency. Ensures team members understand and adhere to project standards for quality, documentation, techniques and tools. Identifies, escalates & manages technical risk with Team Manager and
in Python. Proficiency in designing and executing complex prompt strategies and input/output data validation models to achieve desired outputs from LLMs. Experience monitoring AI applications using popular observability tools (e.g. Langfuse, LangSmith) to ensure seamless performance and monitoring. Strong skills in data transformations for both structured and unstructured data; ability to integrate these processes into scalable pipelines. Experience
a live service for users. Experience with understanding network architectures and troubleshooting network-related issues using Linux tools. In-depth expertise in at least one of: Kubernetes, Terraform, Networking, Observability. Flexibility and mobility are required to deliver this role as there may be requirements to spend time onsite with our clients and partners to enable delivery of the first-class
Cardiff, South Glamorgan, Wales, United Kingdom Hybrid / WFH Options
Confused.com
Experience of building and designing cost-optimised Cloud platforms (preferably Azure) from the ground up, following well-architected principles. Solid understanding of platform and reliability engineering approaches (SRE), including observability, performance optimisation, capturing analytics and security best practices. Experience implementing Service Level Objectives and using them to drive error budgets, risk management and alerting. Knowledge and experience with operating containers
AI team. I’m Responsible For... Delivering robust, fully tested, maintainable software that impacts end users. Designing and implementing production-ready scalable NLP applications and APIs. Developing monitoring and observability solutions and integration testing frameworks. Conducting code reviews and providing constructive feedback to team members. Ensuring the scalability, performance, and reliability of AI applications. Staying up-to-date with the
distributed service architectures, including how best to test and release them, and how to ensure system stability when making changes independent of other services. You are able to use observability tooling to understand, diagnose, improve, debug, measure and visualise platform health. You are up-to-date with the latest technologies, including AI, for example machine learning for personalisation or automation
and support of existing software systems, ensuring prompt resolution of issues and bugs. Tech stack: The team uses the following core technologies: Java/Kotlin; GraphQL Federation; Cloud: Azure; Observability: Dynatrace. Who you are: Previous polyglot hands-on senior software engineer. Experience working on highly scalable software solutions across web or backend. Extensive background in software engineering with several years
enablement teams, to promote these through regular knowledge sharing sessions. Accountable for operational efficiency - drive improvements in efficiency, reliability, and scalability supported by logging, monitoring and observability as a foundational capability. Responsible for adoption - promote the platform capabilities through technical communities of practice leadership, high internal standards for documented processes and internal guides, and take steps
Applied AI team. I’m Responsible For... Delivering robust, fully tested, maintainable software that impacts end users. Designing and implementing production-ready scalable NLP applications and APIs. Developing monitoring and observability solutions and integration testing frameworks. Conducting code reviews and providing constructive feedback to team members. Ensuring the scalability, performance, and reliability of AI applications. Staying up-to-date with the