Databricks Architect (Contract) - Greenfield Data Platform

Location: Hybrid working (London)

Duration: 12-month initial contract

Are you a visionary Databricks Architect with a passion for building cutting-edge data platforms from the ground up? Do you thrive on shaping strategy and driving technical excellence in a greenfield environment?

Our client is embarking on a pivotal journey to establish a brand-new, enterprise-grade data platform using the full power of Databricks. This is a unique opportunity to lead the architectural design and implementation of a truly greenfield data ecosystem that will underpin all future data-driven initiatives, from advanced analytics to AI/ML.

We are looking for a hands-on architect who can translate business needs into robust, scalable, and secure Databricks solutions.

The Role:

As our Databricks Architect, you will be instrumental in defining and delivering our new data strategy and architecture. This is a greenfield project, meaning you'll have the exciting challenge of building the entire Databricks Lakehouse Platform from scratch. You will provide critical technical leadership, guidance, and hands-on expertise to ensure the successful establishment of a scalable, high-performance, and future-proof data environment.

Phase 1: Strategic Vision & Blueprint

  • Data Strategy & Roadmap: Collaborate with business stakeholders and leadership to define the overarching data vision, strategy, and a phased roadmap for the Databricks Lakehouse Platform.
  • Architectural Design: Lead the end-to-end design of the Databricks Lakehouse architecture (Medallion architecture), including data ingestion patterns, storage layers (Delta Lake), processing frameworks (Spark), and consumption mechanisms.
  • Technology Selection: Evaluate and recommend optimal Databricks features and integrations (e.g., Unity Catalog, Photon, Delta Live Tables, MLflow) and complementary cloud services (e.g., Azure Data Factory, Azure Data Lake Storage, Power BI).
  • Security & Governance Frameworks: Design robust data governance, security, and access control models within the Databricks ecosystem, ensuring compliance with industry standards and regulations.

Phase 2: Core Platform Build & Development

  • Hands-on Implementation: Act as a lead engineer in the initial build-out of core data pipelines, ETL/ELT processes, and data models using PySpark, SQL, and Databricks notebooks.
  • Data Ingestion & Integration: Establish scalable data ingestion frameworks from diverse sources (batch and streaming) into the Lakehouse; one such pattern is sketched after this list.
  • Performance Optimisation: Design and implement solutions for optimal data processing performance, cost efficiency, and scalability within Databricks.
  • CI/CD & Automation: Develop and implement Continuous Integration/Continuous Delivery (CI/CD) pipelines for automated deployment of Databricks assets and data solutions.
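
For illustration, a minimal sketch of one ingestion pattern this phase covers: incremental file ingestion with Databricks Auto Loader into a bronze Delta table. The storage path, schema/checkpoint locations, and table name are hypothetical placeholders.

```python
# A minimal Auto Loader ingestion sketch with hypothetical paths and
# table names; runs in a Databricks notebook where `spark` is predefined.

raw_path = "abfss://landing@<storage-account>.dfs.core.windows.net/orders/"

bronze_stream = (
    spark.readStream.format("cloudFiles")            # Auto Loader source
    .option("cloudFiles.format", "json")             # raw files are JSON
    .option("cloudFiles.schemaLocation", "/tmp/schemas/orders")
    .load(raw_path)
)

(
    bronze_stream.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/orders")
    .trigger(availableNow=True)   # process new files, then stop (batch-style run)
    .toTable("lakehouse.bronze.orders")   # governed Unity Catalog Delta table
)
```

The availableNow trigger lets the same streaming code run as a scheduled incremental batch job, one common way to balance latency against cost.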

Phase 3: Enablement, Optimisation & Transition

  • Team Enablement: Provide mentorship and technical guidance to a growing team of Data Engineers and Analysts, fostering best practices and Databricks expertise.
  • Data Quality & Monitoring: Implement comprehensive data quality checks, monitoring, and alerting mechanisms to ensure data integrity and reliability (see the sketch after this list).
  • MLOps Integration: Lay the groundwork for seamless integration with Machine Learning Operations (MLOps) capabilities for future AI initiatives.
  • Documentation & Knowledge Transfer: Create comprehensive technical documentation and conduct knowledge transfer sessions to ensure long-term sustainability of the platform.
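
For illustration, a minimal sketch of the declarative quality checks this phase describes, using Delta Live Tables expectations (one of the features named in Phase 1). The table, source, and rule names are hypothetical.

```python
# A minimal Delta Live Tables quality sketch with hypothetical names;
# runs only inside a DLT pipeline, where the `dlt` module is available.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Silver orders with basic quality rules applied")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop failing rows
@dlt.expect("recent_order", "order_date >= '2020-01-01'")      # record violations only
def silver_orders():
    return (
        dlt.read_stream("bronze_orders")              # hypothetical bronze table
        .withColumn("processed_at", F.current_timestamp())
    )
```

Expectation results are captured in the pipeline's event log, which can feed the monitoring and alerting mechanisms described above.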

Required Skills & Experience

  • Proven Databricks Expertise: Deep, hands-on experience designing and implementing solutions on the Databricks Lakehouse Platform (Delta Lake, Unity Catalog, Spark, Databricks SQL).
  • Cloud Data Architecture: Extensive experience with Azure data services (e.g., Azure Data Factory, Azure Data Lake Storage, Azure Synapse) and architecting cloud-native data platforms.
  • Programming Proficiency: Expert-level skills in Python (PySpark) and SQL for data engineering and transformation. Scala is a strong plus.
  • Data Modelling: Strong understanding and practical experience with data warehousing, data lake, and dimensional modelling concepts.
  • ETL/ELT & Data Pipelines: Proven track record of designing, building, and optimising complex data pipelines for both batch and real-time processing.

Desirable Skills & Certifications

  • Databricks Certified Data Engineer Associate/Professional.
  • Microsoft Certified: Azure Data Engineer Associate (DP-203) or Azure Solutions Architect Expert (AZ-305/304).
  • Experience with other cloud providers (AWS, GCP).
  • Knowledge of streaming technologies (Kafka, Event Hubs).

Company: Osmii

Location: City of London, Greater London, UK (Hybrid / WFH Options)