systems, and potentially SAP.
Ensure data quality, governance, and lineage tracking throughout the project.

Required Skills
ETL/ELT pipeline design and data validation frameworks.
Advanced Python (pandas, numpy, boto3) and SQL (complex queries, optimisation).
Experience with AWS Glue, Step Functions, and event-driven architectures.
Knowledge of vector … databases, embeddings, and semantic search strategies.
Familiarity with document parsing libraries (PyPDF2, pdfplumber, Textract) and OCR tools.
Understanding of data governance, schema validation, and master data management.
Strong grasp of real-time vs batch processing trade-offs.

Beneficial Experience
CockroachDB deployment and management.
PySpark or similar for large-scale ...