AWS Data Engineer
Mandatory Skills: Python, PySpark, AWS, Cloud, AWS Services, AWS Components
- Designing and developing scalable, testable data pipelines using Python and Apache Spark
- Orchestrating data workflows with AWS tools like Glue, EMR Serverless, Lambda, and S3
- Applying modern software engineering practices: version control, CI/CD, modular design, and automated testing
- Contributing to the development of a lakehouse architecture using Apache Iceberg
- Collaborating with business teams to translate requirements into data-driven solutions
- Building observability into data flows and implementing basic quality checks
- Participating in code reviews, pair programming, and architecture discussions
- Continuously learning about the financial indices domain and sharing insights with the team
What you'll bring:
- Writes clean, maintainable Python code (ideally with type hints, linters, and tests written with a framework like pytest)
- Understands data engineering basics: batch processing, schema evolution, and building ETL pipelines
- Has experience with or is eager to learn Apache Spark for large-scale data processing
- Is familiar with the AWS data stack (e.g., S3, Glue, Lambda, EMR)
- Enjoys learning the business context and working closely with stakeholders
- Works well in Agile teams and values collaboration over solo heroics
Nice-to-haves:
It's great (but not required) if you also bring:
- Experience with Apache Iceberg or similar table formats
- Familiarity with CI/CD tools like GitLab CI, Jenkins, or GitHub Actions
- Exposure to data quality frameworks like Great Expectations or Deequ
- Curiosity about financial markets, index data, or investment analytics
Note: Hybrid role (2 or 3 days a week onsite)