Data Engineer

We are seeking a highly skilled Python Data Engineer with hands-on experience in behavior-driven testing with Behave, PySpark development, Delta Lake optimization, and Azure cloud services. This role involves designing, developing, and deploying scalable data processing solutions in a containerized environment, with an emphasis on maintainable, configurable, and test-driven code.

Key Responsibilities:

  • Develop and maintain data ingestion, transformation, and validation pipelines using Python and PySpark.
  • Implement unit and behavior-driven testing using Behave, ensuring robust mocking and patching of dependencies.
  • Design and maintain Delta Lake tables for optimized query performance, ACID compliance, and incremental data loads.
  • Build and manage containerized environments using Docker for consistent development, testing, and deployment.
  • Develop configurable, parameter-driven codebases to support modular and reusable data solutions.
  • Integrate Azure services, including Azure Functions for serverless transformation logic, Azure Key Vault for secure credential management, and Azure Blob Storage for data lake operations.
  • Collaborate closely with cloud architects, data scientists, and DevOps teams to ensure seamless CI/CD workflows, version control, and environment consistency.
  • Troubleshoot and optimize Spark jobs for performance and scalability in production environments.
  • Maintain technical documentation and adhere to best practices in cloud security and data governance.
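To give a flavour of the mocking and patching mentioned above, here is a minimal sketch using Python's built-in unittest.mock (the function names are illustrative, not from this role):

```python
from unittest.mock import patch

# Hypothetical function standing in for an external data source.
def fetch_records():
    raise RuntimeError("would call external storage")  # never hit in tests

# Hypothetical unit under test: counts records flagged as valid.
def count_valid_records():
    records = fetch_records()  # dependency patched below
    return sum(1 for r in records if r.get("valid"))

# Patch the dependency so the logic can be tested without any external system.
with patch(f"{__name__}.fetch_records",
           return_value=[{"valid": True}, {"valid": False}, {"valid": True}]):
    result = count_valid_records()

print(result)  # 2
```

In a Behave suite the same patching would typically happen inside step implementations or environment hooks, keeping scenarios independent of live services.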

Required Skills and Experience:

  • Strong proficiency in Python programming with emphasis on modular and test-driven design.
  • Demonstrated experience in writing unit tests and BDD scenarios using Behave or similar frameworks.
  • In-depth understanding of mocking, patching, and dependency injection in Python testing.
  • Proficiency in PySpark with hands-on experience in distributed data processing and performance tuning.
  • Solid understanding of Delta Lake concepts, transactional guarantees, and schema evolution.
  • Experience with Docker for development, testing, and deployment workflows.
  • Familiarity with Azure components such as Azure Functions, Key Vault, Blob Storage, and Data Lake Storage Gen2.
  • Ability to implement configuration-driven applications for flexible deployment across environments.
  • Experience with CI/CD pipelines (Azure DevOps or similar) and infrastructure-as-code tools is a plus.
  • Strong problem-solving skills and ability to work independently in fast-paced, agile environments.
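As a rough illustration of the configuration-driven style the role calls for, the sketch below parameterizes a transformation from a JSON config so the same code can run unchanged across environments (paths and keys are hypothetical):

```python
import json

# Hypothetical config; in production this might be loaded from a file,
# environment variable, or Azure Key Vault per deployment environment.
CONFIG_JSON = """
{
  "drop_columns": ["debug_id"],
  "rename": {"ts": "event_time"}
}
"""

def transform(record, config):
    # Drop and rename fields exactly as the config dictates --
    # no environment-specific logic is hard-coded here.
    out = {k: v for k, v in record.items() if k not in config["drop_columns"]}
    for old, new in config["rename"].items():
        if old in out:
            out[new] = out.pop(old)
    return out

config = json.loads(CONFIG_JSON)
row = {"ts": "2024-01-01", "user": "a1", "debug_id": "x"}
print(transform(row, config))  # {'user': 'a1', 'event_time': '2024-01-01'}
```

The same pattern scales to PySpark, where the config would instead drive `drop` and `withColumnRenamed` calls on a DataFrame.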

Preferred Qualifications:

  • Experience developing in Databricks or Synapse with Delta Lake integration.
  • Knowledge of best practices in data security and governance within Azure ecosystems.
  • Strong communication skills and experience collaborating with distributed teams.

Job Details

Company
Hays
Location
England, UK
Employment Type
Full-time