Lead PySpark Engineer
As the Technical Lead, you will drive the high-stakes migration of legacy SAS analytics to a modern, cloud-native PySpark ecosystem on AWS. This isn't just a lift-and-shift: you will refactor complex procedural logic into scalable, production-ready distributed pipelines for a Tier-1 financial services environment.
Core Responsibilities
Engineering Leadership: Design and develop complex ETL/ELT pipelines and Data Marts using PySpark, EMR, and Glue.
Legacy Modernisation: Architect the conversion of SAS Base/Macros into modular, testable Python code using SAS2PY and manual refactoring.
Performance Tuning: Optimise Spark execution (partitioning, shuffling, caching) to ensure cost-efficient processing of massive financial datasets.
Quality & Governance: Implement rigorous CI/CD, unit testing, and data reconciliation frameworks to ensure "penny-perfect" accuracy.
Technical Stack
Engine: PySpark (Expert), Python (Clean Code/SOLID principles).
AWS: EMR, Glue, S3, Athena, IAM, Lambda.
Data Modelling: SCD Type 2, Fact/Dimension tables, Data Vault/Star Schema.
Legacy: Proficiency in reading/debugging SAS (Base, Macros, DI Studio).
DevOps: Git-based workflows, Jenkins/GitLab CI, Terraform.
Randstad Technologies is acting as an Employment Business in relation to this vacancy.