Gen AI Engineer
We're hiring a Gen AI Engineer
If you've built LLM systems that ran in production, watched them fail there, and been the one to fix them, this is worth reading.
What our client does
They're an AI company operating at the intersection of computer vision and large language models, building intelligent workflows for industries where work happens in the field: utilities, telecoms, energy, retail.
Their platform processes real-world operational data at scale, helping global enterprise clients make faster, safer and more accurate decisions about their assets and people.
They have built one of the largest datasets of real-world operational workflows in this space and have over 50 AI models running in production today.
The role
This is not a prompt engineering or prototype role.
You will own LLM systems end to end in production, including:
- Building and deploying LLM applications and agent workflows
- Monitoring system behaviour, performance and output quality
- Debugging issues across pipelines such as retrieval, orchestration and model outputs
- Tracing failures using logs, metrics and LLM call inspection
- Designing evaluation frameworks to detect regressions and drift
You will be responsible not just for building systems, but for keeping them reliable under real-world conditions.
What we're looking for
- Experience building and running LLM applications in production environments
- Evidence of debugging real issues such as incorrect outputs, latency spikes, retrieval failures or agent misbehaviour
- Experience with monitoring and observability of LLM systems, for example Langfuse, Prometheus, Grafana, OpenTelemetry or similar
- Strong understanding of RAG systems, retrieval pipelines and evaluation workflows
- Experience with agentic frameworks such as LangGraph, CrewAI or similar, going beyond basic LangChain usage
- Ability to explain how you diagnose and fix issues step by step
- Strong Python and experience working across application and infrastructure layers
- Multimodal experience across text and image or video is beneficial
Tech stack
Python, AWS, LangGraph, LangChain, vector databases, evaluation tooling, observability platforms, Docker
Why join
- Small, senior team with high ownership
- Systems already in production with real customers
- Bi-weekly shipping cycles with fast feedback loops
- Remote-first with optional London office and monthly meetups
- Equity, healthcare allowance, pension
You will be working on systems where failures matter and fixing them is part of the job.