Observability Jobs

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Santa Clara, California, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Grand Rapids, Michigan, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Providence, Rhode Island, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
El Paso, Texas, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Fort Worth, Texas, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Rapid City, South Dakota, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Fargo, North Dakota, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Fort Lauderdale, Florida, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Fort Wayne, Indiana, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Saint Paul, Minnesota, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Las Vegas, Nevada, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Charleston, West Virginia, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Little Rock, Arkansas, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Sioux Falls, South Dakota, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Salt Lake City, Utah, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Technology Head of AI

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
adherence to security and compliance requirements. Drive rapid experimentation with clear exit criteria; scale successful pilots into reliable, maintainable services using automated testing, observability and release practices. Develop and manage strategic vendor and partner relationships, balancing build vs. buy decisions and negotiating commercial and risk terms that protect value. Provide … including model risk management, privacy, data residency, human oversight and auditability. Hands-on understanding of AI/MLOps practices and platforms, including model lifecycle, observability, cost control, CI/CD, feature stores and data integration. Experience defining and governing enterprise standards and architectures for AI platforms, APIs and integration, aligned ...

Staff Machine Learning Engineer, ML Infrastructure

Hiring Organisation
SimpliSafe
Location
Cambridge, Massachusetts, United States
Employment Type
Permanent
Salary
USD Annual
highest-stakes ML systems at SimpliSafe. Identify and remove the systemic bottlenecks in our ML deployment infrastructure - whether that's serving reliability, deployment friction, observability gaps, scaling, or cost. Build and operate real-time CV inference at scale: Own the design and evolution of cloud-side inference systems that process … durable. Own reliability and operational excellence: Lead incident response and postmortems for critical ML systems; turn lessons learned into platform-level improvements. Define SLOs, observability standards, and on-call practices for ML services in production. Qualifications: 8+ years of software/ML engineering experience, with a clear track record ...

London-Based Observability TAM - Drive Real-Time Data Value

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
leading tech company in Greater London is seeking a seasoned Technical Account Manager (TAM) to redefine the observability landscape. The role involves leading post-sales journeys, engaging with stakeholders from software engineers to executives, and troubleshooting complex integrations. Candidates should have hands-on experience with observability tools like Grafana, DataDog ...

Manager Applications

Hiring Organisation
Medline Industries
Location
Northbrook, Illinois, United States
Employment Type
Permanent
Salary
USD 201,000 Annual
Job Summary: The Application Manager will be responsible for the organization's portfolio of business applications across various departments. This will include development, implementation, upgrades, daily management, and maintenance, stakeholder management, application availability, system and ...

Site Reliability Engineer — Observability & Automation

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
infrastructure. The role involves designing and implementing monitoring solutions, analyzing system performance, and optimizing operational processes. The ideal candidate will have strong monitoring, observability, and alerting skills, experience with cloud platforms, and excellent problem-solving abilities. This position offers a hybrid work environment, unlimited PTO, and a comprehensive benefits program. ...

Staff SRE: Observability, Automation & Global Reliability

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
London. This role focuses on the reliability, scalability, and performance of Replit's infrastructure serving millions of users worldwide. You will work on designing observability solutions, leading incident response, and automating operational tasks while mentoring other engineers. The ideal candidate has extensive experience in Site Reliability Engineering, strong programming skills ...

RVP, EMEA Sales - Observability

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
just to execute a function, but to help redefine the future of how work gets done. Observe by Snowflake brings AI-native observability to the Snowflake AI Data Cloud, helping engineering and data teams debug, optimize, and understand systems operating at massive scale. Traditional observability tools were not built … strong judgment, and the ability to align people, strategy, and execution across functions. WHAT WE LOOK FOR 10+ years of experience selling cloud, infrastructure, observability, data platforms, or enterprise software. 2+ years of experience managing high-performing enterprise sales teams. Experience selling to senior technical and business stakeholders, including CIOs ...

Software Engineer (Prometheus / Grafana)

Hiring Organisation
SRT Marine Systems PLC
Location
Bristol, United Kingdom
Employment Type
Permanent
Salary
£50000 - £75000/annum
Software Engineer (Prometheus/Grafana) here at SRT, you will be part of a small team tasked with implementing an end-user observability visualisation. Currently, we have observability dashboards in place for our engineers, utilising Prometheus for metrics collection and Grafana for visualisation. This initiative aims to deliver a more … across multiple sites. We are fortunate to have a team of highly experienced engineers, including UX designers, who can provide support and guidance. Our lead observability engineer will oversee and assist with your work throughout the project in the role of Software Engineer (Prometheus/Grafana). Key Responsibilities - Software Engineer ...

Software Engineer (Prometheus / Grafana)

Hiring Organisation
SRT Marine Systems PLC
Location
Birmingham, West Midlands (County), United Kingdom
Employment Type
Permanent
Salary
£50000 - £75000/annum
Software Engineer (Prometheus/Grafana) here at SRT, you will be part of a small team tasked with implementing an end-user observability visualisation. Currently, we have observability dashboards in place for our engineers, utilising Prometheus for metrics collection and Grafana for visualisation. This initiative aims to deliver a more … across multiple sites. We are fortunate to have a team of highly experienced engineers, including UX designers, who can provide support and guidance. Our lead observability engineer will oversee and assist with your work throughout the project in the role of Software Engineer (Prometheus/Grafana). Key Responsibilities - Software Engineer ...

Principal Engineer - Customer Engagement Platform

Hiring Organisation
Jobleads-UK
Location
Skipton, England, United Kingdom
Apps, Power Automate and the CRM/engagement ecosystem. You define and embed cross‐cutting standards such as API/event contracts, workflow architecture, observability, resilience patterns, and dependency baselines, and drive adoption of the Golden Path: policy‐as‐code CI/CD, progressive delivery, automated rollback/forward … building on Dynamics 365, Power Platform and workflow automation to move with speed *and* confidence. Through Golden Path pipelines, policy‐as‐code, release‐linked observability, on‐demand environments and shift‐left quality, you turn high‐performance delivery into a normal, repeatable capability that compounds over time. This empowers colleagues, reduces ...