Infrastructure Engineer
We’re working with an early-stage, high-calibre team building systems that turn large volumes of unstructured data (think PDFs, documents, messy real-world inputs) into production-ready datasets that drive critical decisions.
This is a builder-led environment where engineers own problems end-to-end, automating what they did yesterday and moving on to harder challenges. The focus is on designing reliable, scalable pipelines, using AI agents and LLMs where they add value, and shipping high-quality data as a product.
Responsibilities
- Own data pipelines end-to-end: ingestion, parsing, cleaning, transformation, storage, and delivery
- Build systems to extract structured data from unstructured sources (PDFs, filings, documents at scale)
- Design and scale pipelines leveraging AI agents for extraction and processing
- Work hands-on with LLMs in production (RAG pipelines, agentic workflows, evaluation and fallback logic)
- Drive data quality: validation, anomaly detection, reconciliation, and reliability
- Continuously improve systems by automating workflows and increasing output over time
- Operate with full ownership: identifying problems, building solutions, and delivering outcomes
Requirements
- Strong hands-on Python experience building production systems (not just orchestration)
- Experience working with unstructured data (document processing, OCR, text extraction, or scraping)
- Experience using LLMs in production (RAG, extraction pipelines, agent-based systems)
- Strong understanding of data modelling and experience working with databases such as PostgreSQL (SQLite a plus)
- Experience building in small teams (startup or similar) with end-to-end ownership
- Familiarity with async processing, queues, and distributed systems
- A proactive builder mindset: comfortable operating in ambiguity and driving work independently