Data Engineer
We are seeking a highly skilled Python Data Engineer with hands-on experience in behavior-driven testing with Behave, PySpark development, Delta Lake optimization, and Azure cloud services. This role involves designing, developing, and deploying scalable data processing solutions in a containerized environment, with an emphasis on delivering maintainable, configurable, test-driven code.
Key Responsibilities:
- Develop and maintain data ingestion, transformation, and validation pipelines using Python and PySpark.
- Implement unit and behavior-driven tests with Behave, ensuring robust mocking and patching of external dependencies (see the testing sketch after this list).
- Design and maintain Delta Lake tables for optimized query performance, ACID compliance, and incremental data loads (see the Delta merge sketch after this list).
- Build and manage containerized environments using Docker for consistent development, testing, and deployment.
- Develop configurable, parameter-driven codebases to support modular and reusable data solutions.
- Integrate Azure services, including Azure Functions for serverless transformation logic, Azure Key Vault for secure credential management, and Azure Blob Storage for data lake operations.
- Collaborate closely with cloud architects, data scientists, and DevOps teams to ensure seamless CI/CD workflows, version control, and environment consistency.
- Troubleshoot and optimize Spark jobs for performance and scalability in production environments.
- Maintain technical documentation and adhere to best practices in cloud security and data governance.
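To make the testing expectation concrete, below is a minimal sketch of a Behave step module that mocks and patches an external dependency. The feature text is shown as a comment, and the module path `pipeline.ingest`, the `SourceClient` class, and the `run_ingestion` entry point are hypothetical names used purely for illustration.

```python
# features/steps/ingestion_steps.py
#
# Matching feature file (features/ingestion.feature):
#   Feature: Source ingestion
#     Scenario: All records land in staging
#       Given a source system that returns 5 records
#       When the ingestion job runs
#       Then 5 records are written to the staging table
from unittest.mock import MagicMock, patch

from behave import given, then, when


@given("a source system that returns {count:d} records")
def step_given_source(context, count):
    # Stand-in for the real source client so the test never touches
    # external infrastructure.
    context.mock_source = MagicMock()
    context.mock_source.fetch.return_value = [{"id": i} for i in range(count)]


@when("the ingestion job runs")
def step_when_ingest(context):
    # Patch the dependency where it is looked up, then run the job under test.
    with patch("pipeline.ingest.SourceClient", return_value=context.mock_source):
        from pipeline.ingest import run_ingestion  # hypothetical entry point
        context.result = run_ingestion()


@then("{count:d} records are written to the staging table")
def step_then_written(context, count):
    assert len(context.result) == count
    context.mock_source.fetch.assert_called_once()
```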
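Similarly, the Delta Lake responsibilities typically center on transactional upserts. The sketch below shows an incremental load via Delta's MERGE API in PySpark, assuming the delta-spark package is available on the cluster; the paths and the `order_id` join key are placeholders.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

# Spark session with Delta Lake enabled (requires the delta-spark package).
spark = (
    SparkSession.builder.appName("incremental-load")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)

# Newly arrived batch; the landing path is a placeholder.
updates = spark.read.parquet("/landing/orders/")

target = DeltaTable.forPath(spark, "/lake/silver/orders")  # placeholder path

# MERGE gives ACID upsert semantics: matched rows update, new rows insert,
# all within a single transaction.
(
    target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```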
Required Skills and Experience:
- Strong proficiency in Python programming with emphasis on modular and test-driven design.
- Demonstrated experience in writing unit tests and BDD scenarios using Behave or similar frameworks.
- In-depth understanding of mocking, patching, and dependency injection in Python testing.
- Proficiency in PySpark with hands-on experience in distributed data processing and performance tuning.
- Solid understanding of Delta Lake concepts, transactional guarantees, and schema evolution.
- Experience with Docker for development, testing, and deployment workflows.
- Familiarity with Azure components such as Azure Functions, Key Vault, Blob Storage, and Data Lake Storage Gen2 (a Key Vault sketch follows this list).
- Ability to implement configuration-driven applications for flexible deployment across environments (see the configuration sketch after this list).
- Experience with CI/CD pipelines (Azure DevOps or similar) and infrastructure-as-code tools is a plus.
- Strong problem-solving skills and ability to work independently in fast-paced, agile environments.
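For the Azure skills, a typical pattern is retrieving credentials at runtime from Key Vault rather than hard-coding them. A minimal sketch using the azure-identity and azure-keyvault-secrets packages follows; the vault URL and secret name are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# DefaultAzureCredential resolves to az-login locally, managed identity in
# Azure, or a service principal, with no code changes between environments.
credential = DefaultAzureCredential()

client = SecretClient(
    vault_url="https://my-vault.vault.azure.net",  # placeholder vault URL
    credential=credential,
)

# Placeholder secret name; the value might feed Blob Storage access, etc.
storage_key = client.get_secret("blob-storage-key").value
```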
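And for configuration-driven design, one common approach is loading environment-specific YAML into a typed settings object so the same code promotes unchanged from dev to prod. The file layout and field names below are assumptions, not a prescribed structure.

```python
from dataclasses import dataclass

import yaml  # PyYAML


@dataclass(frozen=True)
class PipelineConfig:
    source_path: str
    target_table: str
    batch_size: int


def load_config(path: str) -> PipelineConfig:
    """Load one environment's settings; unknown keys fail fast."""
    with open(path) as f:
        raw = yaml.safe_load(f)
    return PipelineConfig(**raw["pipeline"])


# config/dev.yaml (illustrative):
# pipeline:
#   source_path: /landing/orders/
#   target_table: /lake/silver/orders
#   batch_size: 10000
config = load_config("config/dev.yaml")
```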
Preferred Qualifications:
- Experience developing in Databricks or Azure Synapse with Delta Lake integration.
- Knowledge of best practices in data security and governance within Azure ecosystems.
- Strong communication skills and experience collaborating with distributed teams.