LLMOps Engineer
Role: LLMOps Engineer
Location: Remote (UK or US-based)
Type: Full-time, permanent
Compensation: Competitive salary + equity available
About Veridox:
Veridox is an AI-driven fraud detection platform purpose-built for insurers. We combine document analysis with contextual intelligence to produce detailed risk analyses, with a strong focus on trust, accuracy and explainability.
As part of our growing team, you’ll play a key role in scaling the technical vision that powers our platform.
Role Summary:
We are developing a sophisticated AI engine that goes far beyond simple chatbots, utilizing a multi-step, semi-agentic architecture with consecutive LLM calls, complex retrieval pipelines, and recursive summarization logic.
As an LLMOps Engineer, you will bridge the gap between experimental AI logic and robust, production-grade systems. You will engineer the pipelines that drive our models, own the evaluation infrastructure, architect the retrieval optimization layer, and ensure our system is grounded in statistical rigor.
If you are passionate about Python, data validity, and defining how agentic systems are measured and optimized at scale, this is the role for you.
Key Responsibilities:
1. Automated Evaluation & Benchmarking
- Architect the "Judge": Design and build automated evaluation pipelines using LLM-as-a-judge frameworks (e.g., GPT-4o grading production outputs) to measure subjective quality at scale.
- Golden Dataset Management: Develop strategies for curating, versioning, and expanding "Ground Truth" datasets. Ensure every system update is tested against a set of edge cases large enough to yield statistically significant results.
- Data-Driven Gates: Implement CI/CD pipelines that automatically block prompt or logic changes if key metrics (Faithfulness, Recall, Latency) degrade.
2. Pipeline Architecture & Retrieval Optimization
- Hands-on RAG Implementation: Take ownership of the retrieval layer’s performance. Write complex vector queries, manage storage schemas, and implement advanced chunking/re-ranking logic to maximize context precision.
- Chain Optimization: Analyze and refine the orchestration logic of consecutive LLM calls to balance reasoning depth with latency and cost.
- Prompt Architecture: Collaborate on designing prompt chains that are testable, efficient, and as deterministic as possible.
3. Observability & Production Health
- Full-Stack Tracing: Implement end-to-end tracing (using tools like LangSmith, Arize Phoenix, or custom solutions) to visualize the entire lifecycle of a user request.
- Drift Detection: Set up real-time monitoring for hallucination spikes, token usage anomalies, and response quality drift.
- Cost Intelligence: Track unit economics per feature and optimize token usage without sacrificing output quality.
Requirements:
Technical Skills:
- Python: Strong proficiency required; must be able to write production-grade code.
- LLM Orchestration: Experience with frameworks like LangChain, LlamaIndex, or custom Python orchestration logic.
- Vector Storage & Querying: Hands-on experience with vector databases (e.g., Pinecone, Weaviate, Milvus, pgvector). Ability to write and optimize complex vector queries.
- Evaluation Suites: Experience with RAG evaluation tools (Ragas, DeepEval, TruLens) or building custom evaluation harnesses.
Analytical Mindset:
- Statistical Rigor: Think in terms of sample sizes, confidence intervals, and regression testing.
- Root Cause Analysis: Ability to isolate failures in complex chains (retrieval, reasoning, formatting).
Operations & DevOps:
- Experience with workflow orchestration services such as AWS Step Functions or Azure Durable Functions, and with cloud environments (AWS, GCP, Azure).
- Familiarity with CI/CD workflows (e.g., Bitbucket Pipelines).
Nice to Have:
- Experience fine-tuning smaller models (7B–8B parameters) for specialized, cost-effective evaluation.
- Background in Search Engineering or Recommender Systems.
- Experience with agentic workflows involving tool use or self-correction loops.
Why Join Us?
You will work on a system where evaluation is central to the product. You’ll have the autonomy to define standards for building, measuring, and improving complex AI systems.
If you care about rigor, impact, and building things that matter, we’d love to hear from you.
- Company
- Veridox
- Location
- United Kingdom, UK
- Posted