Databricks Engineer

Data Pipeline Development:

  • Design and implement end-to-end data pipelines in Azure Databricks, handling ingestion from various data sources, performing complex transformations, and publishing data to Azure Data Lake or other storage services.
  • Write efficient and standardized Spark SQL and PySpark code for data transformations, ensuring data integrity and accuracy across the pipeline.
  • Automate pipeline orchestration using Databricks Workflows or by integrating with external tools (e.g., Apache Airflow, Azure Data Factory); a minimal end-to-end PySpark sketch follows this list.
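
By way of illustration, a minimal PySpark sketch of this end-to-end pattern, assuming hypothetical ADLS paths, a JSON landing zone, and column names; a real pipeline would substitute the project's own sources, schemas, and business rules.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()  # already provided as `spark` on Databricks

    # Ingest raw JSON events from a landing zone (path is hypothetical).
    raw = spark.read.json("abfss://landing@examplelake.dfs.core.windows.net/events/")

    # Transform: drop malformed rows, standardise types, derive a load date.
    clean = (
        raw.dropna(subset=["event_id"])
           .withColumn("event_ts", F.to_timestamp("event_ts"))
           .withColumn("load_date", F.current_date())
    )

    # Publish to the curated zone as Delta for downstream consumers.
    (clean.write.format("delta")
          .mode("append")
          .partitionBy("load_date")
          .save("abfss://curated@examplelake.dfs.core.windows.net/events/"))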


Data Ingestion & Transformation:

  • Build scalable data ingestion processes to handle structured, semi-structured, and unstructured data from various sources (APIs, databases, file systems).
  • Implement data transformation logic using Spark, ensuring data is cleaned, transformed, and enriched according to business requirements.
  • Leverage Databricks features such as Delta Lake to manage and track changes to data, enabling better versioning and performance for incremental data loads (see the MERGE sketch after this list).
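
As a sketch of the incremental-load pattern, an upsert with the Delta Lake MERGE API, assuming hypothetical paths and a customer_id business key:

    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = SparkSession.builder.getOrCreate()

    # Incoming incremental batch (staging path is hypothetical).
    updates = spark.read.format("delta").load(
        "abfss://staging@examplelake.dfs.core.windows.net/customers_batch/")

    target = DeltaTable.forPath(
        spark, "abfss://curated@examplelake.dfs.core.windows.net/customers/")

    # Upsert: update matched rows, insert new ones. Delta records the operation
    # in its transaction log, so earlier versions remain queryable via time travel.
    (target.alias("t")
           .merge(updates.alias("s"), "t.customer_id = s.customer_id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())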


Data Publishing & Integration:

  • Publish clean, transformed data to Azure Data Lake or other cloud storage solutions for consumption by analytics and reporting tools (see the publishing sketch after this list).
  • Define and document best practices for managing and maintaining robust, scalable data pipelines.
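
A minimal publishing sketch: registering a curated Delta location as a catalog table so SQL-based analytics and reporting tools can query it by name. Catalog, schema, and path names are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Expose the curated Delta files as a named table for BI consumers.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS main.curated.events
        USING DELTA
        LOCATION 'abfss://curated@examplelake.dfs.core.windows.net/events/'
    """)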


Data Governance & Security:

  • Implement and maintain data governance policies using Unity Catalog, ensuring proper organization, access control, and metadata management across data assets (a grants sketch follows this list).
  • Apply data security best practices, such as encryption at rest and in transit and role-based access control (RBAC), within Azure Databricks and Azure services.
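
A sketch of group-based, least-privilege grants in Unity Catalog SQL; the `analysts` group and object names are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Grant only what the reporting team needs in order to read curated data.
    spark.sql("GRANT USE CATALOG ON CATALOG main TO `analysts`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA main.curated TO `analysts`")
    spark.sql("GRANT SELECT ON TABLE main.curated.events TO `analysts`")

    # Attach a description so the asset is discoverable in the catalog.
    spark.sql("COMMENT ON TABLE main.curated.events IS 'Curated event stream for BI'")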


Performance Tuning & Optimization:

  • Optimize Spark jobs for performance by tuning configurations, partitioning data, and caching intermediate results to minimize processing time and resource consumption (see the tuning sketch after this list).
  • Continuously monitor and improve pipeline performance, addressing bottlenecks and optimizing for cost efficiency in Azure.
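
A sketch of common tuning levers (adaptive query execution, repartitioning by join key, caching a reused result, and compacting Delta files); table names are hypothetical, and real values depend on data volume and cluster size.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Let adaptive query execution coalesce small shuffle partitions at runtime.
    spark.conf.set("spark.sql.adaptive.enabled", "true")

    # Repartition by the join key before a heavy join to spread work evenly.
    events = spark.read.table("main.curated.events").repartition(200, "customer_id")

    # Cache a dimension that several downstream steps reuse, then materialise it.
    customers = spark.read.table("main.curated.customers").cache()
    customers.count()

    # Compact small files and co-locate rows that are commonly filtered together.
    spark.sql("OPTIMIZE main.curated.events ZORDER BY (customer_id)")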


Automation & Monitoring:

  • Automate data pipeline deployment and management using tools like Terraform, ensuring consistency across environments.
  • Set up monitoring and alerting for pipelines using Databricks' built-in features and Azure Monitor to detect and resolve issues proactively; a Jobs API sketch with failure alerts follows this list.
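
The list above names Terraform for deployment; purely to keep these sketches in one language, here is the same idea via the Databricks Jobs REST API (2.1) from Python, creating a scheduled job with failure e-mail alerts. Host, cluster ID, notebook path, and recipients are hypothetical, and the equivalent definition could be managed as a databricks_job resource in Terraform.

    import os
    import requests

    host = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical workspace
    token = os.environ["DATABRICKS_TOKEN"]

    payload = {
        "name": "nightly-events-pipeline",
        "tasks": [{
            "task_key": "transform",
            "notebook_task": {"notebook_path": "/Pipelines/transform_events"},
            "existing_cluster_id": "1234-567890-abcde123",
        }],
        "schedule": {
            "quartz_cron_expression": "0 0 2 * * ?",  # 02:00 every night
            "timezone_id": "Europe/London",
        },
        "email_notifications": {"on_failure": ["data-eng@example.com"]},
    }

    resp = requests.post(f"{host}/api/2.1/jobs/create",
                         headers={"Authorization": f"Bearer {token}"},
                         json=payload)
    resp.raise_for_status()
    print("Created job:", resp.json()["job_id"])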


Requirements:

  • Data Pipeline Expertise: Extensive experience in designing and implementing scalable ETL/ELT data pipelines in Azure Databricks, transforming raw data into usable datasets for analysis.
  • Azure Databricks Proficiency: Strong knowledge of Spark (SQL, PySpark) for data transformation and processing within Databricks, along with experience building workflows and automation using Databricks Workflows.
  • Azure Data Services: Hands-on experience with Azure services like Azure Data Lake, Azure Blob Storage, and Azure Synapse for data storage, processing, and publication.
  • Data Governance & Security: Familiarity with managing data governance and security using Databricks Unity Catalog, ensuring data is appropriately organized, secured, and accessible to authorized users.
  • Optimization & Performance Tuning: Proven experience in optimizing data pipelines for performance, cost-efficiency, and scalability, including partitioning, caching, and tuning Spark jobs.
  • Cloud Architecture & Automation: Strong understanding of Azure cloud architecture, including best practices for infrastructure-as-code, automation, and monitoring in data environments.
Company: Tenth Revolution Group
Location: London, United Kingdom
Employment Type: Contract
Salary: £400 - £500/day