Experience using AWS (Serverless) and/or GCP. Understand the importance of driving quality into code through test automation. Have supported applications in production, with demonstrable experience of good observability practices within a full-stack environment (e.g. RUM, tracing). Have worked in a collaborative environment with strong engineering practices and know what good engineering looks like. Care about the product … mindset, you will have intimate knowledge of our products from code commit through to production operation. Supporting production systems with monitoring tools such as Datadog. Strive for stable systems observability. You will champion our principles, fuel a growth mindset by getting involved in communities and help improve our engineering culture. Pushing the boundaries, questioning the status quo, ensuring what we …
platform that epitomises the best of modern cloud practices - leveraging immutable infrastructure, zero trust and effective pipelines to allow our teams to quickly operationalise workloads that have compliance, security, observability, logging, and alerting baked in, and we're growing a team passionate to deliver it. What Experience You'll Bring to the Team Building out an effective, modular, platform leveraging … existing infrastructure to the new platform. Helping us track, understand, and optimize cloud spend Enabling product teams to own and run their own infrastructure, backed by solid provision around observability, alerting, and operability Building out an effective HA/DR approach so that teams are serviced well around these needs Supporting 'secure by default' approaches on the platform, so that More ❯
and frameworks that accelerate development velocity across Samsara's web and mobile applications. Ensure high reliability, performance, and security across the stack by implementing best practices in testing, monitoring, observability, and CI/CD pipelines. Oversee the development lifecycle from planning to deployment, following Agile methodologies to ensure timely and efficient delivery. Champion best practices for API design, authentication, service … for efficient data fetching and API design, with a working knowledge of integrating GraphQL in mobile and web environments. Performance & Security: Deep understanding of web performance optimizations, caching strategies, observability, and security best practices. An ideal candidate also has: Leadership Experience: A proven ability to scale and manage diverse engineering teams and foster a high-performance culture. Collaboration & Communication: Strong More ❯
Picture yourself at one of the world's most innovative companies, providing life-saving AI solutions that can make a real difference in helping to combat one of the world's biggest health burdens. Surrounded by teams and people who More ❯
Complexio's Foundational AI works to automate business activities by ingesting whole company data - both structured and unstructured - and making sense of it. Using proprietary models and algorithms Complexio forms a deep understanding of how humans are interacting and using More ❯
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Starling Bank
problems and challenges, who can work across teams do great things here at Starling, to continue changing banking for good. Responsibilities: As a Data Scientist in the Machine Learning Observability & Governance team, you will play a crucial role in enabling Starling Bank to maximally exploit AI in line with its risk appetite, while ensuring ethical and responsible AI practices. Your … responsible. Stakeholder Communication & Visibility: Ensure clear communication and good visibility with stakeholders such as risk teams, regarding how data scientists at Starling observe and manage ML and AI models. Observability Centre of Excellence: Support colleagues in enhancing their observability work by maintaining existing observability tooling, assisting in identifying key metrics to monitor, and providing expert advice on internally-developed model More ❯
platform engineering team as we scale LangSmith and LangGraph Platform products. You'll work in Europe (remotely) and architect and operate the critical systems that power our customers' AI observability and LangGraph app deployments, working directly with cutting-edge technologies at the intersection of AI and distributed systems. Scale critical systems : Design and implement high throughput data-intensive systems supporting … building and operating production systems at scale Infrastructure expertise : Deep knowledge of Kubernetes, containerized infrastructure, cloud platforms (e.g. GCP) Database expertise : Production experience with OSS datastores (PostgreSQL, Redis, Kafka) Observability mastery : Hands-on experience with observability stacks (Datadog, Prometheus/Grafana, OpenTelemetry or similar) Programming proficiency : Strong hands-on software engineering skills (Python, Go, Rust) Operational mindset : "You build it More ❯
Experience with AWS (preferred) or GCP. Infrastructure as Code (Terraform, CloudFormation). Git/GitHub/GitLab. Proficiency in at least one programming language. Understanding of core concepts like observability, networking, and cloud management consoles. Excellent communication in English. Nice-to-Haves: Experience with specific observability tools (Prometheus, Grafana, PagerDuty). Experience with CRM and ticketing systems (Freshdesk, Zendesk). More ❯
a sharp troubleshooter who can solve problems independently, especially within complex, distributed systems. You'll also need a solid grasp of modern containerized environments and how to effectively use observability tools to keep things running smoothly. You'll have the chance to work both independently and collaboratively with a global team, always striving to ensure our applications are stable, performant … Kubernetes. Investigate logs and system behavior, pulling data from pods and containers using tools like kubectl, Event Viewer, and central logging platforms. Monitor application health, performance, and availability, leveraging observability platforms like Grafana and Kibana. Test and interact with API endpoints, documenting and validating their functionality using tools like Swagger/OpenAPI and Postman. Respond to support tickets promptly within … and log retrieval. APIs : Skilled in interacting with and testing RESTful APIs using Swagger/OpenAPI and Postman. Elastic Stack : Familiarity with Elasticsearch for log aggregation, indexing, and querying. Observability : Good understanding of monitoring and visualization concepts; experience with Grafana. Scripting & Automation : Automation-first mindset, can streamline tasks. PowerShell experience is a plus. Diagnostics : Skilled with Event Viewer, IIS Manager …
delivering fast access from any geography Playlist Services: Dynamic path configuration systems optimizing user connectivity in real-time PGM Relays: Infrastructure for reliable multicast data delivery We use automation, observability, and software engineering to detect issues before they impact customers and reduce manual toil wherever we can. What You'll Do Build production-grade software that powers Bloomberg's global … infrastructure Design and implement scalable, fault-tolerant systems with a focus on observability, performance, and automation Collaborate across engineering teams to introduce automated, self-service operational workflows Conduct deep systems analysis and root cause investigations for complex, distributed systems Propose and prototype innovative approaches to reliability and risk mitigation Contribute to design docs, runbooks, and post-incident reviews-clear communication …/or Kubernetes or other Pipeline Management Platforms is a significant advantage. Machine Management at Scale: Experience with capacity planning and automating the lifecycle of large machine fleets. System Observability and Monitoring: Deep understanding of SLIs/SLOs/SLAs, alerting, and building dashboards for complex systems. Reliability in Distributed Systems: Knowledge of fault tolerance and the unique challenges of More ❯
incident tooling (e.g., PagerDuty, Datadog). Technical Expertise required for this engagement: Guide operational practices across services built using Java (Spring Boot) , Kafka , MongoDB and related technologies. Oversee monitoring, observability, and performance tuning using Datadog , ELK , Prometheus , or similar tooling. Problem Management & Root Cause Elimination required: Lead proactive and reactive problem management efforts. Identify recurring production issues and collaborate with … rapid change practices including canary releases, feature flags, and progressive delivery. Continuous Improvement & DevOps Practices: Drive automation and self-service initiatives to reduce manual intervention and operational burden. Champion observability best practices (metrics, traces, logs) and error budget tracking. Promote DevOps culture and continuous feedback loops between engineering and operations. Governance, Risk & Compliance: Ensure operational processes comply with security, privacy More ❯
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Moneysupermarket Group
platform underpins critically important services, designed for security, scalability, and developer efficiency. We leverage AWS, container orchestration (ECS), and infrastructure-as-code (Terraform, CDK) to deliver resilient, automated environments. Observability is powered by Datadog, and we're embracing AI-driven tooling (GitHub Copilot, ChatGPT) to accelerate delivery and improve quality. You'll help us evolve this ecosystem to support rapid … and drive smart, scalable data decisions across the Group. Design & Deliver : Architect and implement scalable, secure platform solutions that empower product teams. Innovate & Improve : Drive continuous improvement in automation, observability, and developer experience. Collaborate & Influence : Partner with Product, Security, and Engineering leaders to align platform strategy with business goals. What value you'll bring to the role: Proven experience leading … data engineering. Deep expertise in AWS, infrastructure-as-code (Terraform/CDK), and container orchestration. Strong understanding of database technologies (MongoDB, relational DBs) and data modelling. Passion for automation, observability, and improving developer experience. Comfortable leveraging AI tools to accelerate delivery and improve quality. Excellent communicator who can influence technical and non-technical stakeholders. Our interview process: A call with …
at scale, leveraging AWS Organizations, Landing Zones, and multi-account best practices. Develop and maintain Infrastructure as Code solutions using Terraform, CloudFormation, and AWS CDK. Champion security, compliance, and observability by integrating services like AWS Security Hub, GuardDuty, and Inspector. Design CI/CD pipelines to enable seamless deployments and self-service models for customers. Innovate with AWS Networking, KMS … architectures and multi-account AWS setups. Extensive experience with AWS Organisations Expert-level knowledge of AWS Networking, TLS, and security best practices. Experience with container orchestration (Kubernetes, EKS) and observability tools (Grafana, ELK). A passion for innovation, problem-solving, and delivering high-impact solutions. Working with Control Tower and Landing Zones Why Work For Us? Competitive base salary up More ❯
portals, dashboards, internal tools, and web applications. Collaborate closely with DevOps on CI/CD pipelines, deployment workflows, infrastructure, and SecOps compliance. Uphold high standards for code quality, system observability, and technical documentation. Act as the technical lead, setting direction and best practices for the full-stack engineering team. Mentor engineers, providing guidance on architecture, design patterns, and career growth. … cross-functional teams Deep experience with React, TypeScript, .NET Core, SOAP/REST APIs, and MySQL/PostgreSQL, Red Hat OpenShift, Kubernetes Understanding of DevOps, cloud deployments, and service observability Bonus: Interest/experience in AI, digital twins, Nvidia Omniverse SDK & APIs, Universal Scene Description What We Offer : Reimbursement for tuition and professional dues Three weeks of vacation and five More ❯
enhancing our proprietary search engine , indexing and querying structured and unstructured data. Collaborate closely with the AI team to deliver intelligent, contextual responses to user queries. Ensure high performance, observability, and resilience across all backend services. Contribute to technical strategy , code reviews, and overall engineering best practices. You may be suited for this role if you meet the following criteria … 5+ years of backend development experience. Expertise in Python and cloud-based architectures (preferably GCP). Strong understanding of modern software development best practices, including CI/CD, containerization, observability, and microservices . Experience with data integrations and APIs , particularly across enterprise tools. Familiarity with search indexing and large-scale data pipelines is a strong plus. Strong understanding of system More ❯
differentiator. As a Staff Software Engineer in Commercial Trading, your expertise will help us on this journey, creating solutions for the business that are robust and scalable, with good observability and metrics, following best-in-class engineering practice. What's In It For You: Being a part of M&S is exactly that - playing your part to bring the magic … and, as part of our modernization drive, will be introducing new ones. The sorts of technologies include: Java, Spring, Spring Boot, Micronaut; React, Next.js, TypeScript, Angular; Azure Cloud, Kubernetes, Dynatrace (observability); SQL Server, MongoDB; Ignite, Redis. Everyone's Welcome: We are ambitious about the future of retail. We're disrupting, innovating and leading the industry into a more conscientious, inspiring digital …
Bristol, Avon, South West, United Kingdom Hybrid / WFH Options
Hargreaves Lansdown
documentation practices, including Architectural Decision Records, Solution Memos, and C4 diagrams. Guide cloud architecture choices, particularly around container orchestration and the use of AWS services. Champion best practices for observability, logging, security, and networking. Identify opportunities to enhance Developer Experience and efficiency through smarter tooling and frameworks. Support engineering teams with mentoring, pairing, and skills development. Lead conversations around Event … every level of the organisation. Proven ability to balance trade-offs, costs, and technical constraints. Experience coaching teams towards engineering and architecture best practices. Deep understanding of security, networking, observability, and system flows. Adept at producing clear, concise architectural documentation. Desirable: Previous experience as a Solution or Enterprise Architect. Background in enterprise systems and legacy-to-modern transitions. Familiarity with More ❯
Employment Type: Permanent, Part Time, Work From Home
newport, wales, united kingdom Hybrid / WFH Options
Hargreaves Lansdown
documentation practices, including Architectural Decision Records, Solution Memos, and C4 diagrams. Guide cloud architecture choices, particularly around container orchestration and the use of AWS services. Champion best practices for observability, logging, security, and networking. Identify opportunities to enhance Developer Experience and efficiency through smarter tooling and frameworks. Support engineering teams with mentoring, pairing, and skills development. Lead conversations around Event … every level of the organisation. Proven ability to balance trade-offs, costs, and technical constraints. Experience coaching teams towards engineering and architecture best practices. Deep understanding of security, networking, observability, and system flows. Adept at producing clear, concise architectural documentation. Desirable: Previous experience as a Solution or Enterprise Architect. Background in enterprise systems and legacy-to-modern transitions. Familiarity with More ❯
bath, south west england, united kingdom Hybrid / WFH Options
Hargreaves Lansdown
documentation practices, including Architectural Decision Records, Solution Memos, and C4 diagrams. Guide cloud architecture choices, particularly around container orchestration and the use of AWS services. Champion best practices for observability, logging, security, and networking. Identify opportunities to enhance Developer Experience and efficiency through smarter tooling and frameworks. Support engineering teams with mentoring, pairing, and skills development. Lead conversations around Event … every level of the organisation. Proven ability to balance trade-offs, costs, and technical constraints. Experience coaching teams towards engineering and architecture best practices. Deep understanding of security, networking, observability, and system flows. Adept at producing clear, concise architectural documentation. Desirable: Previous experience as a Solution or Enterprise Architect. Background in enterprise systems and legacy-to-modern transitions. Familiarity with More ❯
bristol, south west england, united kingdom Hybrid / WFH Options
Hargreaves Lansdown
documentation practices, including Architectural Decision Records, Solution Memos, and C4 diagrams. Guide cloud architecture choices, particularly around container orchestration and the use of AWS services. Champion best practices for observability, logging, security, and networking. Identify opportunities to enhance Developer Experience and efficiency through smarter tooling and frameworks. Support engineering teams with mentoring, pairing, and skills development. Lead conversations around Event … every level of the organisation. Proven ability to balance trade-offs, costs, and technical constraints. Experience coaching teams towards engineering and architecture best practices. Deep understanding of security, networking, observability, and system flows. Adept at producing clear, concise architectural documentation. Desirable: Previous experience as a Solution or Enterprise Architect. Background in enterprise systems and legacy-to-modern transitions. Familiarity with More ❯
bradley stoke, south west england, united kingdom Hybrid / WFH Options
Hargreaves Lansdown
documentation practices, including Architectural Decision Records, Solution Memos, and C4 diagrams. Guide cloud architecture choices, particularly around container orchestration and the use of AWS services. Champion best practices for observability, logging, security, and networking. Identify opportunities to enhance Developer Experience and efficiency through smarter tooling and frameworks. Support engineering teams with mentoring, pairing, and skills development. Lead conversations around Event … every level of the organisation. Proven ability to balance trade-offs, costs, and technical constraints. Experience coaching teams towards engineering and architecture best practices. Deep understanding of security, networking, observability, and system flows. Adept at producing clear, concise architectural documentation. Desirable: Previous experience as a Solution or Enterprise Architect. Background in enterprise systems and legacy-to-modern transitions. Familiarity with More ❯
lead senior engineers and technical customers to a desired outcome, without prescribing it Authoritative skills at cloud computing (network, security, serverless, Kubernetes etc) and automation Experience with implementation of Observability and Reliability using market technologies (e.g.: Dynatrace) Good experience with Performance Engineering (load testing, derivations, tuning, core web vitals, page speed etc.) Expertise in reliability testing Able to influence people … highly technical to non-technical Tech Stack M&S uses a variety of technologies across Fulfilment systems, including: Java, Micronaut, GraphQL ReactJS, Next.js Kafka, MongoDB Azure Cloud, Terraform, Dynatrace (observability) Everyone's welcome We are ambitious about the future of retail. We're disrupting, innovating and leading the industry into a more conscientious, inspiring digital era. We're transforming how More ❯
systems. Define, evolve, and manage data schemas and catalogues -from raw staging to high-quality analytics and feature stores-ensuring consistency and discoverability. Build end-to-end monitoring and observability for your pipelines: owning data quality, latency, completeness, and lineage at every stage. Champion secure, governed data practices : access controls, secrets management, encrypted data-in-transit/at-rest, and … Kafka), and 3rd party SaaS integrations, with idempotency and error handling. Storage & Query Engines: Strong with RDBMS (PostgreSQL, MySQL), NoSQL (DynamoDB, Cassandra), data lakes (Parquet, ORC), and warehouse paradigms. Observability & Quality: Deep familiarity with metrics, logging, tracing, and data quality tools (e.g., Great Expectations, Monte Carlo, custom validation/test suites). Security & Governance: Data encryption, secrets management, RBAC/ More ❯
Our client, a leading consulting company, are seeking a Google Cloud Architect to lead enterprise-scale Google …