Data Engineer
We are seeking a highly skilled Python Data Engineer with hands-on experience in behavior-driven testing with Behave, PySpark development, Delta Lake optimization, and Azure cloud services. This role involves designing, developing, and deploying scalable data processing solutions in a containerized environment, with an emphasis on delivering maintainable, configurable, test-driven code.
Key Responsibilities:
- Develop and maintain data ingestion, transformation, and validation pipelines using Python and PySpark.
- Implement unit and behavior-driven tests with Behave, ensuring robust mocking and patching of external dependencies (see the testing sketch after this list).
- Design and maintain Delta Lake tables for optimized query performance, ACID compliance, and incremental data loads (see the Delta merge sketch after this list).
- Build and manage containerized environments using Docker for consistent development, testing, and deployment.
- Develop configurable, parameter-driven codebases to support modular and reusable data solutions.
- Integrate Azure services, including Azure Functions for serverless transformation logic, Azure Key Vault for secure credential management, and Azure Blob Storage for data lake operations.
- Collaborate closely with cloud architects, data scientists, and DevOps teams to ensure seamless CI/CD workflows, version control, and environment consistency.
- Troubleshoot and optimize Spark jobs for performance and scalability in production environments.
- Maintain technical documentation and adhere to best practices in cloud security and data governance.
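To make the testing expectation concrete, below is a minimal sketch of a Behave step module that mocks and patches an external dependency. The feature text is shown as a comment, and the module path `pipeline.ingest`, the `SourceClient` class, and the `run_ingestion` entry point are hypothetical names used purely for illustration.

```python
# features/steps/ingestion_steps.py
#
# Matching feature file (features/ingestion.feature):
#   Feature: Source ingestion
#     Scenario: All records land in staging
#       Given a source system that returns 5 records
#       When the ingestion job runs
#       Then 5 records are written to the staging table
from unittest.mock import MagicMock, patch

from behave import given, then, when


@given("a source system that returns {count:d} records")
def step_given_source(context, count):
    # Stand-in for the real source client so the test never touches
    # external infrastructure.
    context.mock_source = MagicMock()
    context.mock_source.fetch.return_value = [{"id": i} for i in range(count)]


@when("the ingestion job runs")
def step_when_ingest(context):
    # Patch the dependency where it is looked up, then run the job under test.
    with patch("pipeline.ingest.SourceClient", return_value=context.mock_source):
        from pipeline.ingest import run_ingestion  # hypothetical entry point
        context.result = run_ingestion()


@then("{count:d} records are written to the staging table")
def step_then_written(context, count):
    assert len(context.result) == count
    context.mock_source.fetch.assert_called_once()
```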
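Similarly, the Delta Lake responsibilities typically center on transactional upserts. The sketch below shows an incremental load via Delta's MERGE API in PySpark, assuming the delta-spark package is available on the cluster; the paths and the `order_id` join key are placeholders.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

# Spark session with Delta Lake enabled (requires the delta-spark package).
spark = (
    SparkSession.builder.appName("incremental-load")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)

# Newly arrived batch; the landing path is a placeholder.
updates = spark.read.parquet("/landing/orders/")

target = DeltaTable.forPath(spark, "/lake/silver/orders")  # placeholder path

# MERGE gives ACID upsert semantics: matched rows update, new rows insert,
# all within a single transaction.
(
    target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```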
Required Skills and Experience:
- Strong proficiency in Python programming with emphasis on modular and test-driven design.
- Demonstrated experience in writing unit tests and BDD scenarios using Behave or similar frameworks.
- In-depth understanding of mocking, patching, and dependency injection in Python testing.
- Proficiency in PySpark with hands-on experience in distributed data processing and performance tuning.
- Solid understanding of Delta Lake concepts, transactional guarantees, and schema evolution.
- Experience with Docker for development, testing, and deployment workflows.
- Familiarity with Azure components such as Azure Functions, Key Vault, Blob Storage, and Data Lake Storage Gen2 (a Key Vault sketch follows this list).
- Ability to implement configuration-driven applications for flexible deployment across environments (see the configuration sketch after this list).
- Experience with CI/CD pipelines (Azure DevOps or similar) and infrastructure-as-code tools is a plus.
- Strong problem-solving skills and ability to work independently in fast-paced, agile environments.
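For the Azure skills, a typical pattern is retrieving credentials at runtime from Key Vault rather than hard-coding them. A minimal sketch using the azure-identity and azure-keyvault-secrets packages follows; the vault URL and secret name are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# DefaultAzureCredential resolves to az-login locally, managed identity in
# Azure, or a service principal, with no code changes between environments.
credential = DefaultAzureCredential()

client = SecretClient(
    vault_url="https://my-vault.vault.azure.net",  # placeholder vault URL
    credential=credential,
)

# Placeholder secret name; the value might feed Blob Storage access, etc.
storage_key = client.get_secret("blob-storage-key").value
```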
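And for configuration-driven design, one common approach is loading environment-specific YAML into a typed settings object so the same code promotes unchanged from dev to prod. The file layout and field names below are assumptions, not a prescribed structure.

```python
from dataclasses import dataclass

import yaml  # PyYAML


@dataclass(frozen=True)
class PipelineConfig:
    source_path: str
    target_table: str
    batch_size: int


def load_config(path: str) -> PipelineConfig:
    """Load one environment's settings; unknown keys fail fast."""
    with open(path) as f:
        raw = yaml.safe_load(f)
    return PipelineConfig(**raw["pipeline"])


# config/dev.yaml (illustrative):
# pipeline:
#   source_path: /landing/orders/
#   target_table: /lake/silver/orders
#   batch_size: 10000
config = load_config("config/dev.yaml")
```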
Preferred Qualifications:
- Experience developing in Databricks or Azure Synapse with Delta Lake integration.
- Knowledge of best practices in data security and governance within Azure ecosystems.
- Strong communication skills and experience collaborating with distributed teams.