976 to 1,000 of 1,201 Permanent Observability Jobs

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
St. Louis, Missouri, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Charleston, South Carolina, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Des Moines, Iowa, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Fort Lauderdale, Florida, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Fargo, North Dakota, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Saint Paul, Minnesota, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Santa Clara, California, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Overland Park, Kansas, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Cedar Rapids, Iowa, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Fort Worth, Texas, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Providence, Rhode Island, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Charleston, West Virginia, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Little Rock, Arkansas, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Grand Rapids, Michigan, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
El Paso, Texas, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Rapid City, South Dakota, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Las Vegas, Nevada, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Fort Wayne, Indiana, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Sioux Falls, South Dakota, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Staff AI Machine Learning Engineer

Hiring Organisation
Medeloop
Location
Salt Lake City, Utah, United States
Employment Type
Permanent
Salary
USD Annual
decommissioning agents dynamically for complex healthcare workflows). Develop rigorous evaluation and safety frameworks - automated testing, benchmarking, regression testing, adversarial testing, safety guardrails, observability (tracing, logging, metrics), and human-in-the-loop mechanisms to ensure reliable, compliant performance in production. Drive LLM and ML model development - train, fine-tune … tools: LangChain/LangGraph, Model Context Protocol (MCP), Agent-to-Agent (A2A) protocols, Hugging Face, PyTorch, vector databases/semantic search, prompt engineering, and observability platforms (e.g., LangSmith, Phoenix). Experience designing fully automated evaluation and testing pipelines for autonomous agents and their orchestration, including metrics for reliability, safety, factuality ...

Technology Head of AI

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
adherence to security and compliance requirements. Drive rapid experimentation with clear exit criteria; scale successful pilots into reliable, maintainable services using automated testing, observability and release practices. Develop and manage strategic vendor and partner relationships, balancing build vs. buy decisions and negotiating commercial and risk terms that protect value. Provide … including model risk management, privacy, data residency, human oversight and auditability. Hands-on understanding of AI/MLOps practices and platforms, including model lifecycle, observability, cost control, CI/CD, feature stores and data integration. Experience defining and governing enterprise standards and architectures for AI platforms, APIs and integration, aligned ...

Staff Machine Learning Engineer, ML Infrastructure

Hiring Organisation
SimpliSafe
Location
Cambridge, Massachusetts, United States
Employment Type
Permanent
Salary
USD Annual
highest-stakes ML systems at SimpliSafe. Identify and remove the systemic bottlenecks in our ML deployment infrastructure - whether that's serving reliability, deployment friction, observability gaps, scaling, or cost. Build and operate real-time CV inference at scale: Own the design and evolution of cloud-side inference systems that process … durable. Own reliability and operational excellence: Lead incident response and postmortems for critical ML systems; turn lessons learned into platform-level improvements. Define SLOs, observability standards, and on-call practices for ML services in production. Qualifications: 8+ years of software/ML engineering experience, with a clear track record ...

London-Based Observability TAM - Drive Real-Time Data Value

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
leading tech company in Greater London is seeking a seasoned Technical Account Manager (TAM) to redefine the observability landscape. The role involves leading post-sales journeys, engaging with stakeholders from software engineers to executives, and troubleshooting complex integrations. Candidates should have hands-on experience with observability tools like Grafana, Datadog ...

Manager Applications

Hiring Organisation
Medline Industries
Location
Northbrook, Illinois, United States
Employment Type
Permanent
Salary
USD 201,000 Annual
Job Summary: The Application Manager will be responsible for the organization's portfolio of business applications across various departments. This will include development, implementation, upgrades, daily management, and maintenance, stakeholder management, application availability, system and ...

Site Reliability Engineer — Observability & Automation

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
infrastructure. The role involves designing and implementing monitoring solutions, analyzing system performance, and optimizing operational processes. The ideal candidate will have strong monitoring, observability, and alerting skills, experience with cloud platforms, and excellent problem-solving abilities. This position offers a hybrid work environment, unlimited PTO, and a comprehensive benefits program. ...