Lead PySpark Engineer

As the Technical Lead, you will drive the high-stakes migration of legacy SAS analytics to a modern, cloud-native PySpark ecosystem on AWS. This isn't just a lift and shift: you will refactor complex procedural logic into scalable, production-ready distributed pipelines for a Tier-1 financial services environment.



Core Responsibilities

  • Engineering Leadership: Design and develop complex ETL/ELT pipelines and Data Marts using PySpark, EMR, and Glue.

  • Legacy Modernisation: Architect the conversion of SAS Base/Macros into modular, testable Python code using SAS2PY and manual refactoring.

  • Performance Tuning: Optimise Spark execution (partitioning, shuffling, caching) to ensure cost-efficient processing of massive financial datasets.

  • Quality & Governance: Implement rigorous CI/CD, unit testing, and data reconciliation frameworks to ensure "penny-perfect" accuracy.



Technical Stack

  • Engine: PySpark (Expert), Python (Clean Code/SOLID principles).

  • AWS: EMR, Glue, S3, Athena, IAM, Lambda.

  • Data Modelling: SCD Type 2, Fact/Dimension tables, Data Vault/Star Schema.

  • Legacy: Proficiency in reading/debugging SAS (Base, Macros, DI Studio).

  • DevOps: Git-based workflows, Jenkins/GitLab CI, Terraform.

Randstad Technologies is acting as an Employment Business in relation to this vacancy.

Job Details

Company: Randstad Technologies Recruitment
Location: City, London, United Kingdom EC1A2
Employment Type: Contract
Salary: GBP 281 - 292 Daily
Posted