Databricks Architect (Contract) - Greenfield Data Platform
Location: Hybrid working (London)
Duration: 12-month initial contract
Are you a visionary Databricks Architect with a passion for building cutting-edge data platforms from the ground up? Do you thrive on shaping strategy and driving technical excellence in a greenfield environment?
Our client is embarking on a pivotal journey to establish a brand-new, enterprise-grade data platform using the full power of Databricks. This is a unique opportunity to lead the architectural design and implementation of a truly greenfield data ecosystem that will underpin all future data-driven initiatives, from advanced analytics to AI/ML.
We are looking for a hands-on architect who can translate business needs into robust, scalable, and secure Databricks solutions.
The Role:
As our Databricks Architect, you will be instrumental in defining and delivering our new data strategy and architecture. This is a greenfield project, meaning you'll have the exciting challenge of building the entire Databricks Lakehouse Platform from scratch. You will provide critical technical leadership, guidance, and hands-on expertise to ensure the successful establishment of a scalable, high-performance, and future-proof data environment.
Phase 1: Strategic Vision & Blueprint
- Data Strategy & Roadmap: Collaborate with business stakeholders and leadership to define the overarching data vision, strategy, and a phased roadmap for the Databricks Lakehouse Platform.
- Architectural Design: Lead the end-to-end design of the Databricks Lakehouse architecture (Medallion architecture), including data ingestion patterns, storage layers (Delta Lake), processing frameworks (Spark), and consumption mechanisms.
- Technology Selection: Evaluate and recommend optimal Databricks features and integrations (e.g., Unity Catalog, Photon, Delta Live Tables, MLflow) and complementary cloud services (e.g., Azure Data Factory, Azure Data Lake Storage, Power BI).
- Security & Governance Frameworks: Design robust data governance, security, and access control models within the Databricks ecosystem, ensuring compliance with industry standards and regulations.
Phase 2: Core Platform Build & Development
- Hands-on Implementation: Act as a lead engineer in the initial build-out of core data pipelines, ETL/ELT processes, and data models using PySpark, SQL, and Databricks notebooks.
- Data Ingestion & Integration: Establish scalable data ingestion frameworks from diverse sources (batch and streaming) into the Lakehouse.
- Performance Optimisation: Design and implement solutions for optimal data processing performance, cost efficiency, and scalability within Databricks.
- CI/CD & Automation: Develop and implement Continuous Integration/Continuous Delivery (CI/CD) pipelines for automated deployment of Databricks assets and data solutions.
Phase 3: Enablement, Optimisation & Transition
- Team Enablement: Provide mentorship and technical guidance to a growing team of Data Engineers and Analysts, fostering best practices and Databricks expertise.
- Data Quality & Monitoring: Implement comprehensive data quality checks, monitoring, and alerting mechanisms to ensure data integrity and reliability.
- MLOps Integration: Lay the groundwork for seamless integration with Machine Learning Operations (MLOps) capabilities for future AI initiatives.
- Documentation & Knowledge Transfer: Create comprehensive technical documentation and conduct knowledge transfer sessions to ensure long-term sustainability of the platform.
Required Skills & Experience
- Proven Databricks Expertise: Deep, hands-on experience designing and implementing solutions on the Databricks Lakehouse Platform (Delta Lake, Unity Catalog, Spark, Databricks SQL).
- Cloud Data Architecture: Extensive experience with Azure data services (e.g., Azure Data Factory, Azure Data Lake Storage, Azure Synapse) and architecting cloud-native data platforms.
- Programming Proficiency: Expert-level skills in Python (PySpark) and SQL for data engineering and transformation. Scala is a strong plus.
- Data Modelling: Strong understanding and practical experience with data warehousing, data lake, and dimensional modelling concepts.
- ETL/ELT & Data Pipelines: Proven track record of designing, building, and optimising complex data pipelines for both batch and real-time processing.
Desirable Skills & Certifications
- Databricks Certified Data Engineer Associate/Professional.
- Microsoft Certified: Azure Data Engineer Associate (DP-203) or Azure Solutions Architect Expert (AZ-305).
- Experience with other cloud providers (AWS, GCP).
- Knowledge of streaming technologies (Kafka, Event Hubs).
Company: Osmii
Location: City of London, Greater London, UK (Hybrid / WFH options)