data science teams to deliver AI-enhanced features and intelligent automation. Guide the integration of AI/ML into both engineering workflows and customer-facing capabilities. Establish and evolve observability practices including structured logging, distributed tracing, and real-time alerting. Promote a culture of automation across testing, deployment, infrastructure, and compliance. Partner with QA and DevOps to implement shift-left … CI/CD pipelines, and infrastructure as code (IaC). Demonstrated experience with AI/ML technologies and their practical application in product development or engineering efficiency. Familiarity with observability stacks and SRE practices. Proficiency in TDD, BDD, and integrating quality gates into the development lifecycle. Extensive experience with multi-tenant SaaS architectures and managing performance at scale. Experience with … on multiple concurrent initiatives. Ability to balance technical depth with strategic thinking and business alignment. Tools Development & Deployment: GitHub, Docker, Kubernetes AI/ML: Azure AI, OpenAI, and similar Observability: Dynatrace, New Relic, Grafana, or similar QA & Testing: Selenium, Playwright, Postman, Cucumber, or similar Automation & IaC: Terraform, Ansible, Bicep, or similar Incident Management: PagerDuty, Opsgenie, or similar Security & Compliance: Snyk More ❯
retry logic, circuit breakers, and rate-limiting to ensure the APIs can withstand transient failures. Use techniques such as load balancing, failover mechanisms, and distributed architectures to improve fault tolerance. Monitoring & Observability: Set up and maintain real-time monitoring and alerting using tools like Prometheus, Grafana, ELK stack, Datadog, or New Relic. Ensure comprehensive logging, tracing, and metrics collection (e.g., through OpenTelemetry, Jaeger, or … role. Strong expertise in designing and building RESTful APIs using Node.js. Experience in building highly available, fault-tolerant systems that can handle production-level traffic. Proficiency in monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack, Datadog, New Relic). Experience with resilience patterns such as circuit breakers, retry logic, and rate limiting. Deep understanding of API security best practices (OAuth2, JWT, API More ❯
and CTO Ref # Description & Requirements Our Team: The Public Cloud Engineering organization provides a suite of services to facilitate Bloomberg's usage of public cloud. From security, to observability, to networking, to access management, to compute, our organization provides the foundational building blocks on which Bloomberg's solutions on public cloud are built. Within this organization, our team provides … Kubernetes clusters, for deploying containerized workloads Utilities to deploy and manage virtual machines and Kubernetes clusters on public cloud Integrations with other aspects of public cloud lifecycle, such as observability, security, and access management What's in it for you: You will be part of a team that is building the foundation to support a multi-cloud environment for public More ❯
optimize and maintain Kubernetes environments and CI/CD pipelines. Develop and refine automation scripts to enhance system reliability, including automated recovery and self-healing capabilities. Build and maintain observability frameworks, integrating metrics, logging, and tracing tools for proactive issue identification. Qualifications: Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience. A minimum of … stack, or Datadog). Experience with Elastic will be highly helpful with this position. Hands-on experience with incident response, including designing and improving incident management processes. Expertise in Observability practices, including metrics, logs, traces, and understanding of distributed tracing tools (e.g., OpenTelemetry). Strong problem-solving skills with a focus on building resilient, fault-tolerant systems. Excellent communication skills More ❯
Are you an experienced Test Analyst with a background in secure or classified programmes, ready to contribute to projects of national importance? Step into a role where you’ll challenge the complex to strengthen security and help protect what matters More ❯
the future team through recruitment and onboarding. Required Skills - We're primarily using AWS, utilising Lambda, ECS, SQS, API Gateway among others. Our database engine is MongoDB and our observability platform is Datadog. Our application is written in TypeScript/Node.js and our infrastructure is defined in Terraform. Experience working with JavaScript/TypeScript but also open to other languages More ❯
Luton, Bedfordshire, United Kingdom Hybrid / WFH Options
Stott and May
teams, using existing templates and best practices. Assess applications hosted in AWS Cloud to define best practices for integration with Azure CCaaS. Contribute to the design and integration of observability tooling into central dashboards and monitoring systems. Skills, Knowledge & Experience Strong experience with Microsoft Azure PaaS solutions. Expertise in CCaaS and MS Dynamics 365 . Proven capability in Azure infrastructure More ❯
F# are welcome) Proven track record of building and scaling distributed backend systems Solid understanding of infrastructure-as-code and cloud orchestration (AWS, Terraform, Docker) Familiarity with queue management, observability tooling, and shipping in fast-paced environments Awareness of GenAI and prompt engineering, or a keen interest to develop expertise in this area A self-starter attitude, with a strong More ❯
leaks, and performance bottlenecks Turn research prototypes into robust, production-ready software modules Lead architecture discussions and enforce clean, scalable design patterns Drive engineering standards across CI/CD, observability, and system modularisation Mentor developers through code reviews, pair programming, and design walkthroughs Bridge the gap between research and deployable robotics software-across embedded and cloud platforms What we're More ❯
London, South East, England, United Kingdom Hybrid / WFH Options
Method Resourcing
teams to operationalize models and ship ML-powered features into production. Continuously assess and iterate on production models, balancing long-term ML strategy with tactical improvements. Champion code quality, observability, and resilience within their ML systems through reviews and hands-on contributions. Help shape their internal ML standards and practices, ensuring they stay ahead of industry advancements. Offer technical mentorship More ❯
Oxfordshire, England, United Kingdom Hybrid / WFH Options
Humand Talent
to deepen your expertise in these areas. Nice-to-Have Experience (or Areas You’ll Grow Into) Working with distributed systems or IoT-style devices. Building internal tools or observability dashboards. Firmware, device protocols, or embedded development exposure. Systems performance optimisation or memory/latency tuning. Python frameworks like FastAPI or similar. Any work in telemetry, edge computing, or robotics. More ❯
Reading, England, United Kingdom Hybrid / WFH Options
Humand Talent
to deepen your expertise in these areas. Nice-to-Have Experience (or Areas You’ll Grow Into) Working with distributed systems or IoT-style devices. Building internal tools or observability dashboards. Firmware, device protocols, or embedded development exposure. Systems performance optimisation or memory/latency tuning. Python frameworks like FastAPI or similar. Any work in telemetry, edge computing, or robotics. More ❯
Oxford, England, United Kingdom Hybrid / WFH Options
Humand Talent
to deepen your expertise in these areas. Nice-to-Have Experience (or Areas You’ll Grow Into) Working with distributed systems or IoT-style devices. Building internal tools or observability dashboards. Firmware, device protocols, or embedded development exposure. Systems performance optimisation or memory/latency tuning. Python frameworks like FastAPI or similar. Any work in telemetry, edge computing, or robotics. More ❯
Slough, South East England, United Kingdom Hybrid / WFH Options
Humand Talent
to deepen your expertise in these areas. Nice-to-Have Experience (or Areas You’ll Grow Into) Working with distributed systems or IoT-style devices. Building internal tools or observability dashboards. Firmware, device protocols, or embedded development exposure. Systems performance optimisation or memory/latency tuning. Python frameworks like FastAPI or similar. Any work in telemetry, edge computing, or robotics. More ❯
Banbury, South East England, United Kingdom Hybrid / WFH Options
Humand Talent
to deepen your expertise in these areas. Nice-to-Have Experience (or Areas You’ll Grow Into) Working with distributed systems or IoT-style devices. Building internal tools or observability dashboards. Firmware, device protocols, or embedded development exposure. Systems performance optimisation or memory/latency tuning. Python frameworks like FastAPI or similar. Any work in telemetry, edge computing, or robotics. More ❯
Oxford District, South East England, United Kingdom Hybrid / WFH Options
Humand Talent
to deepen your expertise in these areas. Nice-to-Have Experience (or Areas You’ll Grow Into) Working with distributed systems or IoT-style devices. Building internal tools or observability dashboards. Firmware, device protocols, or embedded development exposure. Systems performance optimisation or memory/latency tuning. Python frameworks like FastAPI or similar. Any work in telemetry, edge computing, or robotics. More ❯
Streaming Data Strategy with a comprehensive approach to data control, compliance, and security; unconstrained by their infrastructure providers. Our platform mitigates data security risks while enhancing communication, automation, and observability across data flows, enabling teams to collaborate effortlessly across the organisation. With hubs in London and New York, we're looking for people who are passionate about our mission and More ❯
the architecture of our platform: modular, secure, scalable, and maintainable from day one Define integration patterns across internal services and third-party providers Own key infrastructure choices (messaging systems, observability, deployment strategies, etc.) Collaborate closely with Product Managers, Designers, and Mobile Engineers to shape end-to-end journeys Be hands-on in code when needed, but primarily act as a More ❯
technical skills with business insight We'd love to see: - Master's degree or equivalent experience or certification such as a CFA charter holder or CAIA - Experience with data observability platforms (e.g., Great Expectations, Soda) - Exposure to distributed systems like Spark - Background in statistics or machine learning - Experience with navigating complex data environments Salary Range = 110000 - 190000 USD Annually + More ❯
action to restore functionality for both simple and complex issues. Proactively acquire new knowledge and stay current with emerging trends, technologies, and patterns to improve product availability, reliability, efficiency, observability, and performance, while driving consistency in monitoring and operations at scale. Embody and promote our culture and values in day-to-day work and decision-making. Benefits/perks listed More ❯
family delivers infrastructure-as-code, policy-as-code, and self-service provisioning across AWS, Azure, and GCP. Our portfolio includes account and subscription management, networking, storage, containers, serverless, and observability tooling to enable secure, scalable, and automated cloud environments. Key responsibilities Product Strategy and Vision: Develop and communicate the vision and strategy for digital product(s). Market Research: Conduct More ❯
Brooklyn Park, Minnesota, United States Hybrid / WFH Options
Innova
CSS skills including layout systems and responsive design Experience with Git, unit testing (Jest, Mocha), and UI testing (Cypress, Playwright) Proven collaboration with UX and product teams Familiarity with observability tools (OpenTelemetry, Grafana) Understanding of RESTful services and backend integration Bonus Points: Experience with AI-assisted coding tools (GitHub Copilot, ChatGPT) Familiarity with Java or Kotlin Exposure to design systems (Figma More ❯
applications of AI for the construction domain, pushing the boundaries of what's possible. Build core infrastructure that allows us to build and ship LLM apps quickly - this includes observability and how we work with several LLM providers plus our own fine-tuned models. Work with other engineers in the product and research teams to bring new models and applications to More ❯
others in the team. You have a bias to simplicity, where you care most about achieving impact Bonus Experience with evaluation harnesses and frameworks for Generative AI Experience with observability, monitoring, and safety techniques for deployed GenAI systems Experience in strongly typed languages such as Go The Company Our mission is to be the definitive food company. We are transforming More ❯
Reliability Engineer (SRE) to ensure the availability, performance, and cost-effectiveness of our cloud and on-prem services. You will collaborate with software engineers and system administrators to improve observability, reduce downtime, and proactively mitigate reliability risks across a growing portfolio of services. Key Responsibilities of the Site Reliability Engineer: Improve reliability and performance across multiple subsystems. Automate manual tasks … Linux administration, scripting, and network security protocols. Experience with cloud services (preferably AWS – EC2, RDS, S3, Lambda). Desirable: Experience coding in Java, Go, or Python; cross-domain technologies; observability patterns; and service management environments. Why Join TwinStream? Salary: £65,000–£95,000 (DOE & clearance level) Pension: 8% employer contribution Private Healthcare: Includes dental & optical cover for you & your family More ❯