AI Engineer - Reply
About Cognita Reply:
Cognita Reply is the dedicated vertical within Reply that brings OpenAI's AI into clients' day-to-day processes, where change is measured by efficiency, innovation, and ROI. We emphasize partnership, compliance and security, deep sector knowledge, and speed of execution. We deliver scalable, governed, and measurable solutions that integrate with the systems that keep companies running.
Role Overview:
As an AI Engineer, you will design, build, and deploy production-grade AI features for our clients. In this hands-on role, you will work across the entire stack, from data ingestion and retrieval to model selection, evaluation, and scalable service deployment. You will balance quality, latency, cost, and governance, collaborating closely with solution architects and client stakeholders to transform use cases into robust, observable AI services.
Responsibilities:
- Develop LLM-powered applications (including assistants, agents, summarization, classification, extraction, and workflow automation) using OpenAI/Azure OpenAI and open-weight models as needed.
- Implement retrieval-augmented generation (RAG) pipelines, including data ingestion, chunking, embeddings, vector/hybrid search, re-ranking, caching, and content governance processes.
- Build reliable APIs and background workers using Python (FastAPI/Flask), integrating with queues and background jobs to productionize pilots into secure, scalable services.
- Instrument evaluation and observability for AI services, leveraging golden sets, offline and online evaluations, A/B testing, and monitoring of quality, safety, drift, cost, and latency.
- Apply MLOps and DevOps best practices, including CI/CD, infrastructure as code (Terraform), containerization (Docker), orchestration (Kubernetes), and experiment tracking (MLflow).
- Engineer AI systems for security and compliance, focusing on data minimization, PII handling, DLP, secrets management, data residency, and auditability requirements.
- Collaborate with product and consulting teams as well as client subject matter experts to scope requirements, define KPIs/ROI, and iterate quickly from pilot to scale.
- Document designs and architectural decisions, sharing reusable components, templates, and reference architectures with the wider team.
Requirements:
- Minimum 2 years of experience in software engineering or machine learning engineering, with a track record of delivering production services.
- Strong proficiency in Python and backend development, with hands-on experience using at least one deep learning framework such as PyTorch or TensorFlow.
- Practical experience working with large language models (LLMs), embeddings, and vector databases, as well as retrieval-augmented generation (RAG) and prompt/tool/function-calling design for structured outputs.
- Proficiency with cloud platforms (AWS or Azure) and with containerization and orchestration tools (Docker, Kubernetes), plus experience with CI/CD pipelines.
- Experience in gathering requirements and delivering solutions for clients, demonstrating clear communication and an ability to meet KPIs.
- Additional experience with agentic frameworks, workflow/orchestration tools, fine-tuning, observability, data engineering pipelines, or security reviews is a plus.
Reply is committed to making sure that our selection methods are fair to everyone. To help you during the recruitment process, please let us know of any Reasonable Adjustments you may need.
- Company: Reply
- Location: London, UK
- Employment Type: Full-time
- Posted