PySpark Data Engineer | up to £450/day Inside IR35 | Remote with occasional London travel
We are seeking a PySpark Data Engineer to support the development of a modern, scalable data lake for a new strategic programme. This is a greenfield initiative to replace fragmented legacy reporting solutions, offering the opportunity to shape a long-term, high-impact platform … from the ground up.
Key Responsibilities:
* Design, build, and maintain scalable data pipelines using PySpark 3/4 and Python 3.
* Contribute to the creation of a unified data lake following medallion architecture principles.
* Leverage Databricks and Delta Lake (Parquet format) for efficient, reliable data processing.
* Apply BDD testing practices using Python Behave and ensure code quality with Python … Coverage.
* Collaborate with cross-functional teams and participate in Agile delivery workflows.
* Manage configurations and workflows using YAML, Git, and Azure DevOps.
Required Skills & Experience:
* Proven expertise in PySpark 3/4 and Python 3 for large-scale data engineering.
* Hands-on experience with Databricks, Delta Lake, and medallion architecture.
* Familiarity with Python Behave for Behaviour Driven Development.
* Strong …
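For readers less familiar with the pattern, the sketch below shows roughly what a bronze-to-silver hop in a medallion-style Delta Lake pipeline looks like in PySpark. The paths, column names, and the "orders" dataset are invented for illustration and are not taken from the advert.

```python
# Minimal sketch of a bronze -> silver hop in a medallion-style pipeline.
# Paths, column names, and the "orders" dataset are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze_to_silver").getOrCreate()

# Bronze: raw ingested records stored as Delta (Parquet under the hood).
bronze_df = spark.read.format("delta").load("/mnt/lake/bronze/orders")

# Silver: cleaned, de-duplicated, typed records ready for downstream use.
silver_df = (
    bronze_df
    .dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .filter(F.col("order_id").isNotNull())
)

silver_df.write.format("delta").mode("overwrite").save("/mnt/lake/silver/orders")
```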
skills Strong experience with Databricks Proficient in SQL for data querying and transformation Solid understanding of Azure Services including: Azure Data Factory Azure Storage Containers Lakehouse architecture Expertise in PySpark for big data processing In-depth knowledge of data migration processes, including ETL and data mapping Excellent verbal and written communication skills Experience with CI/CD pipelines … for deployment automation Familiarity with Cosmos DB Proficiency in Python beyond PySpark Working knowledge of Azure DevOps tools and practices Benefits Collaborative working environment - we stand shoulder to shoulder with our clients and our peers through good times and challenges We empower all passionate technology-loving professionals by allowing them to expand their skills and take part in inspiring …
City of London, London, United Kingdom Hybrid / WFH Options
Mars
pet owners everywhere. Join us on a multi-year digital transformation journey where your work will unlock real impact. 🌟 What you'll do Build robust data pipelines using Python, PySpark, and cloud-native tools Engineer scalable data models with Databricks, Delta Lake, and Azure tech Collaborate with analysts, scientists, and fellow engineers to deliver insights Drive agile DevOps practices …
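As a rough illustration of the Delta Lake data-modelling work mentioned above, the snippet below sketches an incremental upsert (MERGE) into a Delta table using the delta-spark API; the table paths and join key are assumptions, not details from the role.

```python
# Illustrative sketch of an incremental upsert into a Delta table on Databricks.
# Table paths and the join key are assumptions for the example; delta-spark
# must be available on the cluster.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # on Databricks, a session already exists

# New or changed records arriving from an upstream bronze feed.
updates_df = spark.read.format("delta").load("/mnt/lake/bronze/customers_increment")

# Merge them into the curated silver table: update matches, insert new rows.
target = DeltaTable.forPath(spark, "/mnt/lake/silver/customers")
(
    target.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```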
stack”. You’ll be expected to work across a broad tech landscape: Big Data & Distributed Systems: HDFS, Hadoop, Spark, Kafka Cloud: Azure or AWS Programming: Python, Java, Scala, PySpark – you’ll need two or more, Python preferred Data Engineering Tools: Azure Data Factory, Databricks, Delta Lake, Azure Data Lake SQL & Warehousing: Strong experience with advanced SQL and database …
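To make the Spark/Kafka part of that landscape concrete, here is a loose sketch of a Structured Streaming job that reads a Kafka topic and lands raw events in a lake path. The broker address, topic, and paths are placeholders, and the spark-sql-kafka connector is assumed to be on the classpath.

```python
# Loose sketch: Kafka -> Spark Structured Streaming -> raw landing zone.
# Broker, topic, and paths are placeholders; requires the spark-sql-kafka package.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka_ingest").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    # Kafka delivers the payload as bytes; cast it to a string for downstream parsing.
    .select(F.col("value").cast("string").alias("payload"),
            F.col("timestamp").alias("event_ts"))
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "/mnt/lake/bronze/events")
    .option("checkpointLocation", "/mnt/lake/_checkpoints/events")
    .start()
)
```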
Databricks environments and developing lakehouse architectures with a focus on automation, performance tuning, cost optimisation, and system reliability. Proven proficiency in programming languages such as Python, T-SQL, and PySpark, with practical knowledge of test-driven development. Demonstrated capability in building secure, scalable data solutions on Azure with an in-depth understanding of data security and regulatory compliance, using …
/Data Warehousing or Data Engineering. Develop customer relationships and build internal partnerships with account executives and teams. Prior experience with coding in a core programming language (e.g., Python, PySpark, or SQL) and willingness to learn a base level of Spark. Proficient with Big Data Analytics technologies, including hands-on expertise with complex proofs-of-concept and public cloud …
processes Develop dashboards and visualizations Work closely with data scientists and stakeholders Follow CI/CD and code best practices (Git, testing, reviews) Tech Stack & Experience: Strong Python (Pandas), PySpark, and SQL skills Cloud data tools (Azure Data Factory, Synapse, Databricks, etc.) Data integration experience across formats and platforms Strong communication and data literacy Nice to Have: Commodities/ …
data tooling, helping to solve complex data challenges that have wide-reaching impact across multiple business domains. Key Requirements: Strong experience in AWS data engineering tools (e.g., Glue, Athena, PySpark, Lake Formation) Solid skills in Python and SQL for data processing and analysis Deep understanding of data governance, quality, and security A passion for building scalable, secure, and efficient …
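By way of illustration, a simple AWS Glue PySpark job of the kind implied above might look like the sketch below: read a catalogued table, clean it, and write partitioned Parquet back to S3 for Athena to query. The database, table, and bucket names are invented for the example.

```python
# Hedged sketch of a small AWS Glue PySpark job. Database, table, and bucket
# names are invented; a crawler or Lake Formation is assumed to maintain the catalogue.
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a raw table from the Glue Data Catalog and convert to a Spark DataFrame.
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="events"
).toDF()

cleaned = raw.dropDuplicates().filter("event_id IS NOT NULL")

# Partitioned Parquet on S3 is directly queryable from Athena once catalogued.
cleaned.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-curated-bucket/events/"
)

job.commit()
```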
a focus on data quality at scale. Hands-on expertise in core GCP data services such as BigQuery, Composer, Dataform, Dataproc, and Pub/Sub. Strong programming skills in PySpark, Python, and SQL. Proficiency in ETL processes, data mining, and data storage principles. Experience with BI and data visualisation tools, such as Looker or Power BI. Excellent communication skills …
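As a hedged sketch of the GCP side: a PySpark job (for example on Dataproc) can read from and write to BigQuery via the spark-bigquery connector, which must be supplied with the job. The project, dataset, table, and staging-bucket names below are placeholders.

```python
# Sketch of a Dataproc-style PySpark job using the spark-bigquery connector.
# Project, dataset, table, and bucket names are placeholders for illustration.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bq_aggregate").getOrCreate()

# Read source rows from BigQuery.
orders = (
    spark.read.format("bigquery")
    .option("table", "example-project.sales.orders")
    .load()
)

# Aggregate to a daily summary.
daily = (
    orders.groupBy(F.to_date("order_ts").alias("order_date"))
    .agg(F.sum("amount").alias("total_amount"))
)

# Write back to BigQuery; the indirect write path stages via a GCS bucket.
(
    daily.write.format("bigquery")
    .option("table", "example-project.sales.daily_totals")
    .option("temporaryGcsBucket", "example-staging-bucket")
    .mode("overwrite")
    .save()
)
```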
City of London, London, United Kingdom Hybrid / WFH Options
Recruit with Purpose
they modernise the use of their data. Overview of responsibilities in the role: Design and maintain scalable, high-performance data pipelines using Azure Data Platform tools such as Databricks (PySpark), Data Factory, and Data Lake Gen2. Develop curated data layers (bronze, silver, gold) optimised for analytics, reporting, and AI/ML, ensuring they meet performance, governance, and reuse standards. …
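To illustrate the curated-layer idea, the sketch below promotes a silver table to a gold summary table in Databricks. The schema, table, and column names are examples only, not details of the client's platform.

```python
# Small sketch of a silver -> gold promotion registered as a table for BI/AI use.
# Schema, table, and column names are examples only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # already provided in a Databricks notebook

silver = spark.read.table("silver.policies")

# Gold: business-level aggregates, reusable across reports and models.
gold = (
    silver.filter(F.col("status") == "active")
    .groupBy("product", "region")
    .agg(
        F.countDistinct("policy_id").alias("active_policies"),
        F.sum("annual_premium").alias("total_premium"),
    )
)

gold.write.format("delta").mode("overwrite").saveAsTable("gold.policy_summary")
```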
Date: 21 Mar 2025 Location: Edinburgh, GB Macclesfield, GB Glasgow, GB Company: Royal London Group Contract Type: Permanent Location: Wilmslow or Edinburgh or Glasgow Working style: Hybrid 50% home/office based The Group Data Office (GDO) is responsible for …
data types, data structures, schemas (JSON and Spark), and schema management. Key Skills and Experience: Strong understanding of complex JSON manipulation Experience with Data Pipelines using custom Python/PySpark frameworks Knowledge of the 4 core Data categories (Reference, Master, Transactional, Freeform) and handling Reference Data Understanding of Data Security principles, access controls, GDPR, and handling sensitive datasets Strong … scripting, environment variables Experience with browser-based IDEs like Jupyter Notebooks Familiarity with Agile methodologies (SAFe, Scrum, JIRA) Languages and Frameworks: JSON YAML Python (advanced proficiency, Pydantic bonus) SQL PySpark Delta Lake Bash Git Markdown Scala (bonus) Azure SQL Server (bonus) Technologies: Azure Databricks Apache Spark Delta Tables Data processing with Python Power BI (Data ingestion and integration) JIRA Additional …
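As a small example of the schema-management theme above, the sketch below pins an explicit Spark schema for nested JSON instead of relying on inference; the field names and landing path are illustrative.

```python
# Sketch: declare an explicit Spark schema for nested JSON rather than inferring it,
# which keeps schema management deterministic. Field names and path are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.types import (
    StructType, StructField, StringType, TimestampType, ArrayType
)

spark = SparkSession.builder.getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType(), nullable=False),
    StructField("event_ts", TimestampType(), nullable=True),
    StructField("payload", StructType([
        StructField("reference_code", StringType(), True),
        StructField("tags", ArrayType(StringType()), True),
    ]), True),
])

# Reading with a fixed schema rejects silent drift from upstream producers.
events = spark.read.schema(event_schema).json("/mnt/lake/landing/events/")
events.printSchema()
```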
Azure Databricks, handling ingestion from various data sources, performing complex transformations, and publishing data to Azure Data Lake or other storage services. Write efficient and standardized Spark SQL and PySpark code for data transformations, ensuring data integrity and accuracy across the pipeline. Automate pipeline orchestration using Databricks Workflows or integration with external tools (e.g., Apache Airflow, Azure Data Factory … in designing and implementing scalable ETL/ELT data pipelines in Azure Databricks, transforming raw data into usable datasets for analysis. Azure Databricks Proficiency: Strong knowledge of Spark (SQL, PySpark) for data transformation and processing within Databricks, along with experience building workflows and automation using Databricks Workflows. Azure Data Services: Hands-on experience with Azure services like Azure Data …
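An indicative sketch of the kind of transformation step described above: ingest from a bronze Delta path, transform with Spark SQL, and publish a silver Delta output to ADLS Gen2. The storage account, container, and column names are made up for the example.

```python
# Indicative sketch: ingest, transform with Spark SQL, publish Delta to ADLS Gen2.
# Storage account, containers, and column names are invented placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

raw = spark.read.format("delta").load(
    "abfss://bronze@examplelake.dfs.core.windows.net/transactions"
)
raw.createOrReplaceTempView("transactions_raw")

# Standardised Spark SQL transformation producing a daily summary.
curated = spark.sql("""
    SELECT account_id,
           CAST(booking_ts AS DATE) AS booking_date,
           SUM(amount)              AS daily_amount
    FROM transactions_raw
    WHERE amount IS NOT NULL
    GROUP BY account_id, CAST(booking_ts AS DATE)
""")

curated.write.format("delta").mode("overwrite").save(
    "abfss://silver@examplelake.dfs.core.windows.net/transactions_daily"
)
```

A step like this would typically be scheduled via Databricks Workflows or an external orchestrator such as Azure Data Factory, as the advert notes.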
field/experience Hands-on data science expertise with code-based model development (e.g., R, Python) Strong knowledge of deploying end-to-end machine learning models in Databricks utilizing PySpark, MLflow and workflows Strong knowledge of data platforms and tools, including Hadoop, Spark, SQL, and NoSQL databases Communicate algorithmic solutions in a clear, understandable way. Leverage data visualization techniques …
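For context, the snippet below sketches roughly how a simple Spark ML model might be trained and logged with MLflow on Databricks; the training table, feature columns, and run name are assumptions for illustration.

```python
# Rough sketch: train a Spark ML pipeline and log it with MLflow.
# The training table, feature columns, and run name are assumptions.
import mlflow
import mlflow.spark
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.getOrCreate()
train_df = spark.read.table("gold.churn_training")

assembler = VectorAssembler(
    inputCols=["tenure_months", "monthly_spend"], outputCol="features"
)
lr = LogisticRegression(featuresCol="features", labelCol="churned")
pipeline = Pipeline(stages=[assembler, lr])

# Track the run so the fitted model and parameters are reproducible and deployable.
with mlflow.start_run(run_name="churn_baseline"):
    model = pipeline.fit(train_df)
    mlflow.log_param("model_type", "logistic_regression")
    mlflow.spark.log_model(model, "model")
```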
Design and implement end-to-end data architecture on AWS using tools such as Glue, Lake Formation, and Athena Develop scalable and secure ETL/ELT pipelines using Python, PySpark, and SQL Drive decisions on data modeling, lakehouse architecture, and integration strategies with Databricks and Snowflake Collaborate cross-functionally to embed data governance, quality, and lineage into platform design … Serve as a trusted advisor to engineering and business stakeholders on data strategy and architecture What You Bring: Deep, hands-on expertise with AWS data services (Glue, Lake Formation, PySpark, Athena, etc.) Strong coding skills in Python and SQL for building, testing, and optimizing data pipelines Proven experience designing secure, scalable, and reliable data architectures in cloud environments Solid …