326 to 350 of 1,199 Permanent Observability Jobs

Senior Software Engineer

Hiring Organisation
Jobleads-UK
Location
England, United Kingdom
models, data pipelines, and event‐driven systems on Databricks and Snowflake into the Intelligence layer Maintain the quality bar through code reviews, automated testing, observability, and CI/CD Support junior and mid‐level engineers and help them grow WHAT WE'RE LOOKING FOR Extensive full‐stack experience building ...

Senior Agentic AI Specialist

Hiring Organisation
Midwest Family Mutual Insurance Company
Location
Urbandale, Iowa, United States
Employment Type
Permanent
Salary
USD Annual
reusable patterns where appropriate. Tool Integration: Support integration with internal systems and APIs. Help manage tool access and ensure reliable execution and error handling. Observability & Operations: Contribute to logging, tracing, and monitoring. Support analysis of system behavior, cost, and performance in production. Governance & Configuration: Work with configuration across agents, prompts ...

Head of Engineering - Retail

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
scale (50+ FTE including partners), within Financial Services. Strong knowledge of modern engineering practices, including software engineering, architecture, cloud platforms, CI/CD, DevSecOps, observability, and security‐focused design. Practical experience delivering large‐scale technology change and modernising legacy platforms, preferably within a Microsoft ecosystem. The ability to build high ...

Senior Software Engineer

Hiring Organisation
OAG
Location
Luton, England, United Kingdom
models, data pipelines, and event-driven systems on Databricks and Snowflake into the Intelligence layer Maintain the quality bar through code reviews, automated testing, observability, and CI/CD Support junior and mid-level engineers and help them grow WHAT WE'RE LOOKING FOR Extensive full-stack experience building ...

Software Engineering Manager - Loyalty

Hiring Organisation
Jobleads-UK
Location
City of Westminster, England, United Kingdom
methodologies, Promoter of DevOps: you build it, you run it. Tech Stack Java, Spring, Spring Boot, Micronaut React, Next.js, TypeScript, Angular Azure Cloud, Kubernetes, Dynatrace (observability) SQL Server, MongoDB Ignite, Redis What’s In It For You Working at M&S means being part of something bigger – helping to deliver quality ...

Senior AI Platform Engineer

Hiring Organisation
Vaco LLC
Location
San Francisco, California, United States
Employment Type
Permanent
Salary
USD 200,000 Annual
architectures (Kafka, Pulsar). Understanding of retrieval-augmented generation (RAG) patterns. Background in authorization/identity systems (ReBAC, RBAC, Zanzibar-style). Familiarity with observability (OpenTelemetry, DataHub, MLflow, Prometheus). Experience in enterprise AI governance (audit, lineage, compliance). Contributions to open-source AI frameworks (LangChain, OpenAI MCP, Hugging Face ...

Solution Architect

Hiring Organisation
Jobleads-UK
Location
Bradford, England, United Kingdom
Cloud & Technical Responsibilities Design and implement solutions using Azure services Promote cloud-native and microservices-based architectures Ensure solutions meet non-functional requirements Support observability, monitoring and optimisation strategies Ecommerce & Domain Alignment Support delivery of scalable ecommerce platforms Ensure systems support high traffic, performance and customer experience expectations Collaborate with ...

Senior Data Platform Owner

Hiring Organisation
Costa Coffee
Location
St. Albans, Hertfordshire, England, United Kingdom
Employment Type
Full-Time
Salary
Competitive salary
with architecture, engineering, security, and product teams. Ensure alignment to enterprise strategy and business priorities Improving reliability, quality, and cost efficiency – Lead initiatives in observability, governance, and performance optimisation. Ensure the platform is resilient, trusted, and cost-effective Who you are We’re looking for a senior data leader with ...

Senior Software Engineer - DataHub Experience & Control Plane

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
cataloging, schema understanding, semantic context, lineage, and AI‐assisted workflows. Bring product taste to engineering decisions, from interaction design and API shape to latency, observability, failure handling, and operational safety. Help create new data experiences across portals, notebooks, query tools, programmable workflows, and agentic interfaces. Design systems that are simple ...

Alpha Data Services - Data Solution Architect, Managing Director

Hiring Organisation
State Street
Location
Greater London, United Kingdom
Employment Type
Full Time
with Financial Services data domains - Security Master, Entity Master, Benchmarks, Positions, Transactions, Cash, Performance, Analytics, ESG etc. Knowledge of Data Management, Data Governance, Data Observability, Service Catalogue, Data Lineage tools, Operational Dashboards, Process Monitoring tools Working knowledge in Azure/AWS cloud platforms with Snowflake and Databricks is a huge ...

Artificial Intelligence Engineer

Hiring Organisation
SBS
Location
United Kingdom
reading, understanding, and extending code they did not write — not lobbying to rewrite it. Production mindset: Thinks about error handling, retry logic, graceful degradation, observability, and security as first-class concerns. Understands that agentic autonomy creates new failure modes that must be anticipated. Clear communicator: Can explain technical trade-offs ...

AI Engineer II

Hiring Organisation
Lennar Homes
Location
Miami, Florida, United States
Employment Type
Permanent
Salary
USD Annual
governed data. Establish agent orchestration patterns: define tool invocation sequences, fallback logic, human-in-the-loop escalation rules, and multi-agent handoff protocols. Build observability into the agent layer: logging, tracing, usage dashboards, and performance monitoring so agent reliability can be measured and improved over time. Maintain governance and data ...

Engineering Manager

Hiring Organisation
Novatus
Location
London Area, United Kingdom
event-driven (we use Kafka), service-based systems using DDD and hexagonal architecture. Strong software engineering fundamentals (clean code, automated testing, CI/CD, observability). Hands-on experience with AWS. Actively uses AI tooling (e.g. Copilot, Claude Code) and has opinions on best practices. Ability to translate complex regulatory ...

Head of AI Platform & Applied Intelligence

Hiring Organisation
Jobleads-UK
Location
Reigate, England, United Kingdom
tooling, frameworks, and services - making principled, cost‐conscious decisions grounded in practical experience rather than vendor positioning Establish AI telemetry and runtime observability - latency, cost, accuracy drift, and performance monitoring - so the platform is instrumented from the outset Applied Intelligence - Product & Delivery Drive the applied use of AI across Infinity ...

Senior Software Engineer - Pay Sustainable Engineering

Hiring Organisation
Jobleads-UK
Location
Birmingham, England, United Kingdom
e.g., E2E/Cypress). Backend Excellence: Engineers sophisticated backend solutions involving API versioning, caching strategies, and complex data migration plans. Operational Maturity: Leads observability and SRE practices; defines SLOs, manages incident responses, and conducts blameless post-mortems. Security & Risk: Oversees operational security, including secrets hygiene and dependency risk management ...

Site Reliability Engineer

Hiring Organisation
Jobleads-UK
Location
Glasgow, Scotland, United Kingdom
that support millions of users worldwide. As a Site Reliability Engineer, you will work at the intersection of software engineering and operations, applying automation, observability, and reliability engineering practices to improve platform stability, reduce operational toil, and enable development teams to deliver high-quality solutions with confidence. What … critical role in maintaining and evolving Orion Health's cloud infrastructure and operational platforms. You will help define and implement reliability standards, improve system observability, automate operational processes, and lead efforts to enhance platform resilience. Design, implement, and maintain reliable, scalable, and secure infrastructure that supports Orion Health's products ...

Senior Software Engineer/SRE - TRAX Observability

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Senior Software Engineer/SRE – TRAX Observability Location: London Business Area: Engineering and CTO Ref #: 10049287 About TRAX TRade Automation and eXecution (TRAX) is part of Bloomberg Enterprise Products Engineering. We build trade automation solutions and multiple Execution Management Systems (EMSs) that enable clients to route orders, execute trades … manage trades. Ensuring these systems are observable, scalable, resilient, and well‐managed from a technical risk perspective is critical — and that’s where TRAX Observability comes in. TRAX Observability provides the data infrastructure, dashboards, and insights needed to understand system behavior and client experience across our EMS platforms. We equip ...

Site Reliability Engineer - Observability & Automation

Hiring Organisation
Jobleads-UK
Location
Manchester, England, United Kingdom
Manchester Digital is seeking a Site Reliability Engineer to enhance system reliability and observability. This role focuses on monitoring critical systems and improving operational efficiency through engineering solutions. The ideal candidate will be proficient in ...

AI Infrastructure Platform SRE | Kubernetes & Observability

Hiring Organisation
Jobleads-UK
Location
England, United Kingdom
Radiant is seeking a Site Reliability Engineer (SRE) in the UK to manage Kubernetes operations and optimize Linux systems. The ideal candidate will have over 5 years of experience in high-performance, 24/7 ...

SQL Database SRE: Reliability, Automation & Observability

Hiring Organisation
Jobleads-UK
Location
Knutsford, England, United Kingdom
GCS Recruitment is seeking a Database SRE in Knutsford, UK, to lead engineering efforts that enhance Microsoft SQL operations. Candidates need in-depth SQL Server knowledge and strong SRE practices to drive automation and ensure ...

Senior Software Engineer

Hiring Organisation
Permax Recruitment Limited
Location
West London, London, United Kingdom
Employment Type
Permanent, Work From Home
efficiently, this role spans cloud infrastructure, data platform engineering, and AI tooling. You'll manage our Snowflake environment and contribute to our Claude Enterprise observability alongside your core AWS and DevOps responsibilities. Beyond keeping systems running, we expect you to identify improvements, take ownership of them, and actively upskill colleagues … prod Lead the implementation of monitoring, logging, and alerting systems to ensure reliability in our solutions Collaborate in the management and optimisation of our observability dashboards, ensuring platform health is visible and actionable across the team Take ownership of our Snowflake environment: access controls, cost governance, performance, and data organisation ...

Software Engineer/ SRE (Linux)

Hiring Organisation
Visa
Location
Basingstoke, Hampshire, UK
Employment Type
Full-time
platform strategy. In this role, you'll ensure our development platform and tools let engineers focus on innovation instead of infrastructure. You'll promote observability best practices and automate resolution of recurring issues, working closely with software engineering teams to support security, availability, and performance. Responsibilities include triaging issues, collaborating … implement, and maintain systems for high availability, scalability, and performance. Monitor and improve application reliability through proactive measures and incident response. Develop and maintain observability solutions (metrics, logging, tracing). Participate in on-call rotations and drive root cause analysis for incidents. Collaboration & Continuous Improvement Partner with engineering teams ...

Senior Site Reliability Engineer

Hiring Organisation
Veloc Inc
Location
Irving, Texas, United States
Employment Type
Permanent
Salary
USD Annual
Partner with development teams to improve deployment safety, release reliability, and operational scalability. Drive standardization of cloud infrastructure, operational engineering practices, and deployment governance. Observability & Performance Optimization (15% of role) Build and maintain monitoring, logging, tracing, and alerting capabilities across distributed systems. Establish service-level objectives (SLOs), SLIs, and error …/CD pipelines, and Infrastructure as Code tools. Strong scripting and automation skills using Python, Bash, PowerShell, Go, or similar languages. Experience with observability and monitoring platforms such as Datadog, Grafana, Prometheus, or Splunk. Strong understanding of networking, Linux/Windows administration, distributed systems, and cloud-native architectures. Experience with ...

Senior DevOps, Infrastructure & Security Engineer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
vulnerability management, threat modelling, penetration testing, and incident response planning Build and evolve CI/CD pipelines, release management processes, and deployment automation Establish observability, monitoring, logging, alerting, and operational runbooks Manage secrets, key custody, access controls, and infrastructure governance Deliver backup, disaster recovery, and business continuity strategies Drive compliance … Kubernetes, Docker, and containerised application deployment Modern CI/CD platforms including GitHub Actions, Cloud Build, Buildkite, CircleCI, or similar Cloud platforms, ideally GCP Observability tooling including Prometheus, Grafana, OpenTelemetry, or equivalent PostgreSQL operations, backup, recovery, and data durability Identity management, API gateways, networking, and access controls Bash and Python ...

Senior Platform Engineer

Hiring Organisation
Akixi
Location
United Kingdom
inventory structuring, and role-based automation. Manage secrets securely using services such as AWS Secrets Manager or HashiCorp Vault. Implement robust monitoring, alerting, and observability tooling (e.g., CloudWatch, Prometheus, Grafana, Datadog). Participate in incident response, root cause analysis, and resilience improvements. Maintain and evolve CI/CD pipelines using … container orchestration and deployment (Docker, ECS, or Kubernetes). Proficient with GitOps or IaC-based workflows. Familiarity with Google SRE practices, particularly around reliability, observability, and operational excellence. Understanding of systems reliability metrics and associated tooling Soft Skills & Behaviours Self-driven with a bias toward action and ownership. Excellent communicator ...