Databricks Engineer
Data Pipeline Development:
- Design and implement end-to-end data pipelines in Azure Databricks, handling ingestion from various data sources, performing complex transformations, and publishing data to Azure Data Lake or other storage services (a minimal sketch follows this list).
- Write efficient and standardized Spark SQL and PySpark code for data transformations, ensuring data integrity and accuracy across the pipeline.
- Automate pipeline orchestration using Databricks Workflows or by integrating with external tools (e.g., Apache Airflow, Azure Data Factory).
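As a concrete illustration of the end-to-end pattern described above, here is a minimal PySpark sketch for a Databricks notebook: it ingests raw JSON from the lake, applies basic cleaning, and publishes Delta output. The storage account, container paths, and column names are all hypothetical, not part of the role description.

```python
# Minimal end-to-end pipeline sketch. On Databricks, `spark` is provided by
# the runtime; the builder line below just keeps the script self-contained.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical ADLS paths -- replace with real storage accounts/containers.
SOURCE_PATH = "abfss://raw@examplestore.dfs.core.windows.net/orders/"
TARGET_PATH = "abfss://curated@examplestore.dfs.core.windows.net/orders/"

# Ingest: raw JSON files landed in the data lake.
raw = spark.read.json(SOURCE_PATH)

# Transform: enforce types, drop malformed rows, derive a partition column.
curated = (
    raw.filter(F.col("order_id").isNotNull())
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
)

# Publish: Delta output, partitioned by date for downstream consumers.
(curated.write
        .format("delta")
        .mode("overwrite")
        .partitionBy("order_date")
        .save(TARGET_PATH))
```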
Data Ingestion & Transformation:
- Build scalable data ingestion processes to handle structured, semi-structured, and unstructured data from various sources (APIs, databases, file systems).
- Implement data transformation logic using Spark, ensuring data is cleaned, transformed, and enriched according to business requirements.
- Leverage Databricks features such as Delta Lake to manage and track changes to data, enabling better versioning and performance for incremental data loads (see the MERGE sketch after this list).
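One common way to handle the incremental loads mentioned above is a Delta Lake MERGE (upsert). This hedged sketch reuses the hypothetical paths and `order_id` key from the previous example and assumes a Delta table already exists at the target path.

```python
# Incremental-load sketch using Delta Lake MERGE. Requires the delta-spark
# package (built into Databricks runtimes).
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

SOURCE_PATH = "abfss://raw@examplestore.dfs.core.windows.net/orders/"      # hypothetical
TARGET_PATH = "abfss://curated@examplestore.dfs.core.windows.net/orders/"  # hypothetical

incoming = spark.read.json(SOURCE_PATH)  # newly landed records
target = DeltaTable.forPath(spark, TARGET_PATH)

# Upsert: update rows whose key already exists, insert the rest. Delta's
# transaction log also provides versioning/time travel on the table.
(target.alias("t")
       .merge(incoming.alias("s"), "t.order_id = s.order_id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())
```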
Data Publishing & Integration:
- Publish clean, transformed data to Azure Data Lake or other cloud storage solutions for consumption by analytics and reporting tools (sketched after this list).
- Define and document best practices for managing and maintaining robust, scalable data pipelines.
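Publishing can mean writing files to a lake path (as in the first sketch) or registering a governed table that analytics tools query by name. A brief sketch of the latter, continuing with the `curated` DataFrame from the first example; the catalog.schema.table name is hypothetical.

```python
# Register curated data as a named table so reporting tools can query it
# without knowing the underlying storage path. Names are hypothetical.
curated.write.format("delta").mode("overwrite").saveAsTable("main.sales.orders")

# Downstream consumers then read by name:
orders = spark.table("main.sales.orders")
```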
Data Governance & Security:
- Implement and maintain data governance policies using Unity Catalog, ensuring proper organization, access control, and metadata management across data assets (example grants follow this list).
- Apply data security best practices, such as encryption at rest and in transit and role-based access control (RBAC), within Azure Databricks and Azure services.
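In Unity Catalog, access control is typically expressed as SQL grants. A hedged sketch, assuming the hypothetical `main.sales.orders` table from the publishing example and a hypothetical `data_analysts` account group:

```python
# Grant read access through Unity Catalog; group and object names are
# hypothetical. USE CATALOG / USE SCHEMA are prerequisites for SELECT.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.sales TO `data_analysts`")
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `data_analysts`")
```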
Performance Tuning & Optimization:
- Optimize Spark jobs for performance by tuning configurations, partitioning data, and caching intermediate results to minimize processing time and resource consumption (see the tuning sketch after this list).
- Continuously monitor and improve pipeline performance, addressing bottlenecks and optimizing for cost efficiency in Azure.
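The bullet above names the usual levers; here is a short sketch of what they look like in practice, continuing with the `curated` DataFrame from earlier. Partition counts and column names (`customer_id`, `amount`) are illustrative, not prescriptive.

```python
from pyspark.sql import functions as F

# Size the shuffle to the data volume instead of relying on the default.
spark.conf.set("spark.sql.shuffle.partitions", "400")

# Repartition on the join/aggregation key to spread skewed data evenly.
orders = curated.repartition(400, "customer_id")

# Cache an intermediate result that several downstream aggregations reuse,
# so it is computed once rather than once per action.
orders.cache()
daily_counts = orders.groupBy("order_date").count()
customer_totals = orders.groupBy("customer_id").agg(F.sum("amount"))
```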
Automation & Monitoring:
- Automate data pipeline deployment and management using tools like Terraform, ensuring consistency across environments.
- Set up monitoring and alerting for pipelines using Databricks built-in features and Azure Monitor to detect and resolve issues proactively (a minimal freshness-check sketch follows).
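Alongside Azure Monitor and the built-in job-failure notifications, a cheap in-pipeline safeguard is a freshness check that deliberately fails the job (and so fires the configured alert) when the published table goes stale. A hedged sketch, assuming a UTC session timezone and the hypothetical target path from earlier:

```python
from datetime import datetime, timedelta
from pyspark.sql import functions as F

TARGET_PATH = "abfss://curated@examplestore.dfs.core.windows.net/orders/"  # hypothetical

# Find the newest record timestamp in the published Delta table.
latest = (spark.read.format("delta").load(TARGET_PATH)
               .agg(F.max("order_ts").alias("latest_ts"))
               .first()["latest_ts"])

# Fail loudly if no data arrived in the last 24 hours; the job failure then
# triggers the Databricks / Azure Monitor alert configured for this job.
if latest is None or latest < datetime.utcnow() - timedelta(hours=24):
    raise RuntimeError(f"orders table is stale; latest record: {latest}")
```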
Requirements
- Data Pipeline Expertise: Extensive experience in designing and implementing scalable ETL/ELT data pipelines in Azure Databricks, transforming raw data into usable datasets for analysis.
- Azure Databricks Proficiency: Strong knowledge of Spark (SQL, PySpark) for data transformation and processing within Databricks, along with experience building workflows and automation using Databricks Workflows.
- Azure Data Services: Hands-on experience with Azure services like Azure Data Lake, Azure Blob Storage, and Azure Synapse for data storage, processing, and publication.
- Data Governance & Security: Familiarity with managing data governance and security using Databricks Unity Catalog, ensuring data is appropriately organized, secured, and accessible to authorized users.
- Optimization & Performance Tuning: Proven experience in optimizing data pipelines for performance, cost-efficiency, and scalability, including partitioning, caching, and tuning Spark jobs.
- Cloud Architecture & Automation: Strong understanding of Azure cloud architecture, including best practices for infrastructure-as-code, automation, and monitoring in data environments.
- Company: Tenth Revolution Group
- Location: London, United Kingdom
- Employment Type: Contract
- Salary: £400 - £500/day
- Posted: