AI Engineer

Numi are partnered with an early-stage HealthTech company building the next generation of conversational AI for the medical field. Their platform allows clinicians to create controllable, customized AI assistants that automate workflows and support clinical decision-making.

While many tools exist for general transcription, we are solving the "last mile" problem in healthcare: enabling doctors and nurses to build their own deterministic, safe agents using natural language.

The Engineering Challenge

We are building a mission-critical platform where safety, latency, and correctness must coexist in a chaotic real-world environment. Our core technical hurdles include:

Ultra-Low Latency Voice: Delivering sub-2-second response times across full-duplex audio while handling "barge-ins" (interruptions) and natural turn-taking in noisy environments.
User-Defined Agent Logic: Creating a "no-code" engine where users can verbally define workflows and guardrails that the system executes deterministically.
Hierarchical & Auditable RAG: Routing queries through complex layers of patient history, clinical guidelines, and organizational policies with full traceability.
Resilient Orchestration: Managing long-running conversation states, tool usage, and concurrency control to ensure reliability even if external EHR or telephony systems falter.
Safety & Compliance: Enforcing strict scope-of-practice guardrails, PHI redaction, and rapid fallback mechanisms when model confidence drops.
Continuous Improvement: A feedback loop that utilizes automated evaluators and shadow testing to safely evolve the system in production.

The Role

You will own the architecture and evolution of the "Central Brain", the core service that powers our AI agents. You will design multi-agent systems that reason, retrieve data, and communicate via voice and text in real time.

Key Responsibilities:

End-to-End Ownership: Define the architecture, SLAs, and error handling for the core reasoning engine.
Real-Time Comms Engineering: Implement streaming voice pipelines, focusing on VAD (Voice Activity Detection), interruption handling, and SIP/WebRTC integrations.
Advanced Agent Orchestration: Build planner-executor patterns and manage shared memory across agents.
Prompt Engineering & Optimization: Use programmatic approaches to compile and iteratively improve prompts based on evaluation metrics.
RAG Optimization: Enhance retrieval signal through hybrid search, re-ranking, and query rewriting, ensuring high context precision and recall.
Observability & Evals: Build robust tracing (OpenTelemetry) and automated CI/CD evaluation gates (faithfulness, hallucination detection) to prevent regression.

Qualifications

Must Haves:

5+ years of backend or ML engineering experience, with a recent deep focus on LLM application layers.
Expert-level Python: Deep familiarity with FastAPI, asyncio, and pydantic.
Real-Time Experience: Proven track record building low-latency voice or text systems (experience with WebRTC, sockets, or similar streaming technologies).
Agentic Patterns: Hands-on experience with ReAct, Chain-of-Thought (CoT), or other reasoning frameworks.
Startup DNA: Ability to ship fast and manage technical debt in a rapidly evolving environment.

Nice to Haves:

Experience with DSPy or other programmatic prompt optimizers.
Familiarity with LLM-as-a-judge evaluation setups.
Knowledge of VoIP standards (SIP, SRTP) or modern voice infrastructure (e.g., LiveKit).
Experience with GCP (Cloud Run, GKE) and Healthcare data standards.

Tech Stack

Core: Python, FastAPI, Pydantic, asyncio.
Data: Postgres, Redis, Vector Stores.
Voice/Infra: WebRTC, SIP Gateways, Docker, Kubernetes, Terraform.
AI/Ops: OpenTelemetry, Custom Eval Frameworks.
