926 to 950 of 1,290 Observability Jobs

Principal AI Engineer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Agentic ecosystem, responsible for the high‐level design choices that define how agents run at PhysicsX. You will cover topics such as: Agent Observability: Own the implementation to enforce deep tracing, granular cost tracking, and observability across the lifecycle. Agent Deployment: Deliver an intuitive deployment lifecycle which simplifies questions around … behalf of users in a regulated enterprise environment. The Tech Stack Core Platform: Python (Primary), Go or TypeScript (Secondary), Kubernetes, Docker, Terraform. Observability & Evals: OTel, LangSmith, Arize, Braintrust. Who You Are An Architect at Heart: You have strong, reasoned opinions on Durable Execution vs. Standard Async, Vector Search vs. Keyword ...

Head of Engineering

Hiring Organisation
Xapien
Location
London Area, United Kingdom
leads who own architectural decisions within a domain-driven design structure. ● Establish engineering-wide standards for code quality, review processes, and technical governance. ● Build observability, incident management, and on-call practices that scale with team growth and deployment frequency. ● Embed DevOps, MLOps, security, and compliance practices into … Series A/B). ● Technical Credibility: Strong background in cloud-native architectures, distributed systems, and modern delivery practices (CI/CD, automated testing, observability). Experience with cloud cost management and infrastructure optimisation. ● Operational Maturity: Experience building observability, on-call rotations, and incident management practices as engineering organisations scale ...

Azure Site Reliability Engineer

Hiring Organisation
WWT EMEA UK LIMITED
Location
Glasgow, Lanarkshire, Scotland, United Kingdom
Employment Type
Contract
Contract Rate
From £650 to £700 per day
Technology (WWT) is seeking experienced Azure Site Reliability Engineers to join a client-embedded Platform Health workstream. You will help deliver critical reliability and observability capabilities that underpin two major Azure milestones: Gold Dev and Azure General Availability. The ideal candidate will work shoulder-to-shoulder with product, engineering … Glasgow, United Kingdom (Onsite) Job Description: Engineer will support Platform Health workstream, focusing on Azure GA readiness and Gold Dev milestones, with emphasis on observability, automation, and secure cloud architecture. Key Responsibilities: Design and implement SLOs/SLIs across user, application, and infrastructure layers Build Azure platform health solutions using ...

Azure SRE Engineer

Hiring Organisation
Oscar Associates (UK) Limited
Location
Glasgow, Lanarkshire, Scotland, United Kingdom
Employment Type
Part Time
Salary
£575 - £625 per day
Contract We're looking for two experienced Azure Site Reliability Engineers to join a major Financial Services programme focused on platform health, reliability, and observability across a large-scale Azure environment. You'll be responsible for building and maintaining Azure platform health infrastructure using Terraform, developing Python-based automation … integrations, and implementing SLOs/SLIs across infrastructure and application layers. The role also involves working with observability tooling, event-driven integrations, and Azure-native services in a highly collaborative environment with engineering and product stakeholders. Required experience: * Strong hands-on Azure engineering experience * Terraform in production environments (primary ...

Azure SRE Engineer

Hiring Organisation
Oscar Technology
Location
Glasgow, Lanarkshire, Scotland, United Kingdom
Employment Type
Contractor
Contract Rate
£575 - £625 per day
Contract We're looking for two experienced Azure Site Reliability Engineers to join a major Financial Services programme focused on platform health, reliability, and observability across a large-scale Azure environment. You'll be responsible for building and maintaining Azure platform health infrastructure using Terraform, developing Python-based automation … integrations, and implementing SLOs/SLIs across infrastructure and application layers. The role also involves working with observability tooling, event-driven integrations, and Azure-native services in a highly collaborative environment with engineering and product stakeholders. Required experience: * Strong hands-on Azure engineering experience * Terraform in production environments (primary ...

Agentic AI Data Architect

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
ModelOps - Azure AI Foundry (model hosting, versioning, monitoring); Evaluation frameworks (LLM-as-judge, test datasets); Prompt/version control, cost/latency monitoring DevOps & Observability - CI/CD pipelines (Azure DevOps/GitHub Actions); Logging, monitoring, observability (App Insights, etc.); Performance tuning and scalability As part of a leading global ...

Senior Platform Engineer

Hiring Organisation
REALM
Location
United Kingdom
building and owning the production infrastructure for a multi-user distributed system from the ground up. That means designing for debuggability and observability from day one, not bolting it on later. Core remit includes scalable multi-environment Terraform, secrets management, gradual deployment practices (blue/green), and the ability … testing. An AI/multi-agent infrastructure component is on the near-term roadmap. The stack IaC Terraform + Terragrunt Helm/Kubernetes AWS Observability Prometheus/Grafana Auth0 Rust/Golang NoSQL What they're looking for Production experience with Terraform, Helm/Kubernetes, AWS networking, and debugging multi ...

Data Reliability Engineer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
work from home 2 days per week. This is a high-impact role focused on improving data quality, reducing incidents, and building scalable observability across a modern enterprise data platform. You’ll help ensure data across the organisation is accurate, reliable, and trusted for critical business decision-making. … style roles, with strong SQL and Python skills and experience working in modern cloud-based data environments. Hands‐on experience with data observability tools such as Grafana, Monte Carlo, or Acceldata, and data governance/quality platforms like Informatica, Collibra or Microsoft Purview is highly desirable. Experience within the Azure ...

DevOps Engineer

Hiring Organisation
Prism Digital
Location
Cambridge, Cambridgeshire, UK
across two regions, with the plan to bring the second along the same path over time. They also need someone to introduce proper observability and monitoring - knowing when things aren't running, alerting the right people, and building the kind of visibility that lets the team respond rather than react. … session host infrastructure (being deprecated over the next 12-18 months) CI/CD tooling and working with the development team Observability and monitoring tooling - currently limited; you'd shape this Disaster recovery architecture MFC C++, .NET services, Angular front-end (context for the broader dev estate) Nice to Haves ...

DevOps Manager

Hiring Organisation
Prism Digital
Location
Cambridge, England, United Kingdom
across two regions, with the plan to bring the second along the same path over time. They also need someone to introduce proper observability and monitoring - knowing when things aren't running, alerting the right people, and building the kind of visibility that lets the team respond rather than react. … session host infrastructure (being deprecated over the next 12-18 months) CI/CD tooling and working with the development team Observability and monitoring tooling - currently limited; you'd shape this Disaster recovery architecture MFC C++, .NET services, Angular front-end (context for the broader dev estate) Nice to Haves ...

Senior Software Engineer / SRE - Electronic Trading

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Senior Software Engineer/SRE - Electronic Trading Location London Business Area Engineering and CTO Ref # 10050148 Description & Requirements About Observability Engineering Senior Software Engineers - SRE in Electronic Trading (ET) ensure our global enterprise products spanning fixed income, equities, and derivatives are resilient and observable. This role focuses on building … culture and platforms of observability and resilience to prevent market disruptions for global traders. We specialize in proactive anomaly detection, providing advanced performance insights and best practice guidance. Our team collaborates with application developers to define meaningful SLOs, implement chaos engineering, and build diagnostic tools that mitigate architectural risks ...

Technical Lead

Hiring Organisation
Findrs
Location
Aylesbury, England, United Kingdom
technical contribution. The Role As Software Tech Lead, the successful candidate will define and drive the overall software architecture across backend services, APIs, observability systems, data infrastructure, and cloud integrations. Working closely with product and engineering leadership, they will translate complex deployment requirements into scalable technical solutions while ensuring … part in setting technical standards across the business. From API consistency and schema evolution through to CI/CD practices, security baselines, and observability frameworks, the successful candidate will help establish the engineering foundations that support long term scale. On the backend, the role will involve guiding and contributing ...

Snowflake Data Cloud Architect

Hiring Organisation
Talent Software Services
Location
New York, United States
Employment Type
Permanent
Salary
USD 180,000 Annual
upstream and downstream system interoperability. Data Governance and Compliance: Implement RBAC, data masking, and encryption aligned with enterprise data policy. Ensure lineage and observability for regulatory reporting and audit. Technical Leadership: Act as a trusted advisor for architectural decisions and future-state roadmaps. Prepare technical specifications and design documentation. Innovation ...

Senior AI Product Engineer

Hiring Organisation
Jobleads-UK
Location
York and North Yorkshire, England, United Kingdom
awareness Use tools such as DSPy (or similar) for optimisation and evaluation Deploy and operate services using Azure (OpenAI, Web Apps/Functions) Implement observability (Application Insights) and CI/CD (Azure DevOps) Contribute to infrastructure via Terraform Build high‐quality, async Python services with strong testing (pytest) Collaborate with … similar) Strong API and data modelling skills Experience with async Python Experience with Azure environments and CI/CD pipelines Familiarity with Terraform and observability tooling Minimum Qualifications Degree or equivalent Right to work in the country of employment Integrity and Ethics All StarCompliance employees are expected to commit ...

Principal Full Stack Engineer & Architecture Lead

Hiring Organisation
BCT Resourcing
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£80,000 - £90,000 per annum
technical design decisions * Define scalable, secure, and maintainable engineering standards * Provide technical leadership across frontend, backend, APIs, infrastructure, and integrations * Drive platform scalability, resilience, observability, and performance * Partner with leadership teams to align technical strategy with business goals * Act as the senior technical authority for complex engineering decisions Hands-On Engineering … Lambda, API Gateway, EventBridge, SQS, Step Functions, S3, CloudWatch, RDS) Backend Node.js, TypeScript Frontend React, Next.js, Tailwind CSS Data & Architecture PostgreSQL, Serverless, Event-Driven Microservices DevOps & Observability Terraform/AWS CDK, CI/CD, Monitoring & Logging About You We are looking for a technically strong and commercially minded engineering leader with: * 10+ years of software ...

AI Engineering Product Manager

Hiring Organisation
Jobleads-UK
Location
Waterside, Scotland, United Kingdom
grade AI agents integrated with complex airline systems. Establish best practices for OpenAI, Anthropic, Azure OpenAI, LangGraph, AutoGen and other frameworks. Implement engineering discipline: observability, safety, automated evaluation, behavioural testing and continuous improvement. Matrix and Partner Leadership Operate effectively across Group, OpCos, cloud, data and security teams. Coordinate delivery streams … direct authority. Demonstrated integration of LLM‐based agents with enterprise systems, APIs, RPA, orchestration platforms and internal tools. Grounding in DevSecOps, cloud‐native architecture, observability and CI/CD. Strong communication skills; able to translate complex technical concepts to senior executives. Experience with high‐stakes, fast‐paced environments and ambiguous ...

Platform Principal Engineer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
self-service capabilities. Upskill and Mentor: Transition the in-house engineering team into a high-performing internal platform team throughout the platform build process. Observability: Design and implement enterprise-grade logging, metrics, and tracing for Kubernetes at scale. IaC Leadership: Implement and manage Infrastructure as Code to a senior standard … Terraform/Open Tofu module design. (MUST) Kubernetes Engineering: GitOps (Argo CD/Flux), secrets management, ingress/mesh, and OPA/Gatekeeper. (MUST) Observability: OpenTelemetry (MUST) Tooling: Spacelift, Atlantis, or Terraform Cloud (Desired) Governance: EPAC (Enterprise Policy as Code) (Desired) What You'll Bring To Us Recent, hands ...

Senior Software Engineer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
flexibility, simplicity and delivery speed Build and maintain backend services and integrations that support our insurance journeys Work with infrastructure, CI/CD and observability to help the team ship safely and often Partner with product, design and data to turn ambiguous opportunities into concrete, measurable improvements Raise the technical … similar Testing: integration and end-to-end testing, component story testing, and visual regression testing CI/CD: Automated testing and deployment pipelines Observability: Analytics platforms, error monitoring and performance tracking Cloudflare experience, including Workers, CDN or load balancing Builder.io or other visual/content tooling experience ...

Director of Software Engineering

Hiring Organisation
Spire
Location
Glasgow, Scotland, United Kingdom
hands-on: review code, prototype solutions, and get into the details when it matters Establish engineering standards across code quality, system design, testing, and observability, and hold the team to them Be the person engineers come to when the problem is genuinely hard Team Building & Culture Recruit, develop, and retain … Experience writing performance software in Rust Background in space systems, aerospace, or highly constrained real-time environments Experience building data lakes, telemetry platforms, or observability infrastructure at scale A history of leading teams through technical transformations and not just maintaining the status quo Spire operates a hybrid work model ...

Principal Machine Learning Infrastructure Engineer London, United Kingdom

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
training pipelines for throughput, fault tolerance, and cost efficiency, including checkpointing strategies, gradient accumulation, and multi-node synchronization. Build and maintain experiment tracking and observability systems that give researchers clear visibility into training runs, hyperparameter sweeps, and model performance. Data I/O and Performance Solve data loading bottlenecks … workflows generate and consume data Experience building model serving infrastructure with latency and throughput requirements Familiarity with experiment tracking tools (Weights & Biases, MLflow) and observability stacks (Prometheus, Grafana) What we offer Equity options – share in our success and growth. 10% employer pension contribution – invest in your future. Free office lunches ...

Cloud Architect

Hiring Organisation
Tata Consultancy Services
Location
Luton, England, United Kingdom
least privilege, KMS encryption, secrets management, data classification, PII redaction, prompt/response filtering, and model governance. Drive non-functional requirements: reliability, scalability, latency, observability, DR, and cost controls (FinOps) for GenAI workloads. Guide build teams through solution design, reviews, and implementation; produce architecture artefacts (HLD/LLD), patterns … more languages (Python/Node.js preferred) and infrastructure-as-code (CDK/CloudFormation/Terraform) for repeatable deployments. Experience setting up observability for GenAI: tracing, logging, metrics, and model/application performance dashboards. Excellent communication skills for architecture storytelling, stakeholder management, and client-facing workshops. Rewards & Benefits TCS is consistently ...

Senior Front-End Engineer

Hiring Organisation
Mochi Health
Location
San Francisco, California, United States
Employment Type
Permanent
Salary
USD Annual
will own your applications end-to-end: architecting the client-side state, building flawlessly responsive UI, optimizing rendering performance, and owning the frontend observability in production. If you are drawn to product problems where the UX complexity is real, the autonomy is absolute, and the impact on patient outcomes … responsible for what happens after the code ships. You will own the frontend deployment pipelines, establish strict performance budgets, and manage client-side observability and error tracking (e.g., Sentry, Datadog) to catch regressions before our patients do. Build Agentic Workflows: Mochi is an AI-first engineering org. You will ...

Cloud SRE - Global Observability Lead (Remote UK)

Hiring Organisation
Jobleads-UK
Location
Newcastle upon Tyne, England, United Kingdom
leading technology company is seeking a Staff Site Reliability Engineer - Cloud to architect the Observability Centre of Excellence, ensuring reliability and uptime of global platforms. This role involves implementing OpenTelemetry, developing automation scripts, and optimizing platform performance while collaborating with engineering teams. Required skills include experience with observability tools like ...

Senior SRE & Observability Engineer – Trade Tech

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Bloomberg L.P. is seeking a Senior Software Engineer/SRE for the TRAX Observability team in London. This role involves enhancing systems for performance metrics, improving telemetry reliability, and collaborating with various teams across global offices. Candidates should have experience with high-level programming languages, Unix/Linux basics … observability concepts like distributed tracing and logging. Strong communication skills are essential. The position emphasizes technical growth, stakeholder influence, and a commitment to diversity and inclusion within the workplace. ...

Lead Machine Learning Engineer - REMOTE

Hiring Organisation
Lennar Homes
Location
Boston, Massachusetts, United States
Employment Type
Permanent
Salary
USD 190,700 Annual
Lead ML Engineer - REMOTE. We are Lennar. Lennar is one of the nation's leading homebuilders, dedicated to making an impact and creating an extraordinary experience for their Homeowners, Communities, and Associates by building quality ...