extract data from diverse sources, transform it into usable formats, and load it into data warehouses, data lakes or lakehouses. Big Data Technologies: Utilize big data technologies such as Spark, Kafka, and Flink for distributed data processing and analytics. Cloud Platforms: Deploy and manage data solutions on cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP), leveraging … SQL for data manipulation and scripting. Strong understanding of data modelling concepts and techniques, including relational and dimensional modelling. Experience in big data technologies and frameworks such as Databricks, Spark, Kafka, and Flink. Experience in using modern data architectures, such as lakehouse. Experience with CI/CD pipelines, version control systems like Git, and containerization (e.g., Docker). Experience … with ETL tools and technologies such as Apache Airflow, Informatica, or Talend. Strong understanding of data governance and best practices in data management. Experience with cloud platforms and services such as AWS, Azure, or GCP for deploying and managing data solutions. Strong problem-solving and analytical skills with the ability to diagnose and resolve complex data-related issues. SQL …
extract data from diverse sources, transform it into usable formats, and load it into data warehouses, data lakes or lakehouses. Big Data Technologies: Utilize big data technologies such as Spark, Kafka, and Flink for distributed data processing and analytics. Cloud Platforms: Deploy and manage data solutions on cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP), leveraging … SQL for data manipulation and scripting. Strong understanding of data modelling concepts and techniques, including relational and dimensional modelling. Experience in big data technologies and frameworks such as Databricks, Spark, Kafka, and Flink. Experience in using modern data architectures, such as lakehouse. Experience with CI/CD pipelines and version control systems like Git. Knowledge of ETL tools and … technologies such as Apache Airflow, Informatica, or Talend. Knowledge of data governance and best practices in data management. Familiarity with cloud platforms and services such as AWS, Azure, or GCP for deploying and managing data solutions. Strong problem-solving and analytical skills with the ability to diagnose and resolve complex data-related issues. SQL (for database management and querying …
In this role, you will be responsible for designing, building, and maintaining robust data pipelines and infrastructure on the Azure cloud platform. You will leverage your expertise in PySpark, Apache Spark, and Apache Airflow to process and orchestrate large-scale data workloads, ensuring data quality, efficiency, and scalability. If you have a passion for data engineering and … significant impact, we encourage you to apply! Job Responsibilities ETL/ELT Pipeline Development: Design, develop, and optimize efficient and scalable ETL/ELT pipelines using Python, PySpark, and Apache Airflow. Implement batch and real-time data processing solutions using Apache Spark. Ensure data quality, governance, and security throughout the data lifecycle. Cloud Data Engineering: Manage and optimize … effectiveness. Implement and maintain CI/CD pipelines for data workflows to ensure smooth and reliable deployments. Big Data & Analytics: Develop and optimize large-scale data processing pipelines using Apache Spark and PySpark. Implement data partitioning, caching, and performance tuning techniques to enhance Spark-based workloads. Work with diverse data formats (structured and unstructured) to support advanced …
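As a rough illustration of the pipeline work this listing describes, the sketch below shows a minimal Airflow DAG that triggers a PySpark batch transformation. The storage paths, dataset names and schedule are hypothetical placeholders rather than details from the posting.

```python
# Illustrative sketch only: a minimal Airflow DAG that runs a PySpark batch ETL step.
# Paths, dataset names and the schedule are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def transform_sales(**_):
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("daily_sales_etl").getOrCreate()
    raw = spark.read.csv(
        "abfss://raw@example.dfs.core.windows.net/sales/", header=True, inferSchema=True
    )
    cleaned = (
        raw.dropna(subset=["order_id"])                       # basic data-quality rule
           .withColumn("order_date", F.to_date("order_date"))
           .groupBy("order_date", "region")
           .agg(F.sum("amount").alias("total_amount"))
    )
    cleaned.write.mode("overwrite").parquet(
        "abfss://curated@example.dfs.core.windows.net/sales_daily/"
    )
    spark.stop()


with DAG(
    dag_id="daily_sales_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",   # batch cadence; real-time paths would use Spark Structured Streaming
    catchup=False,
) as dag:
    PythonOperator(task_id="transform_sales", python_callable=transform_sales)
```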
Databricks platform. Optimise data pipelines for performance, efficiency, and cost-effectiveness. Implement data quality checks and validation rules within data pipelines. Data Transformation & Processing: Implement complex data transformations using Spark (PySpark or Scala) and other relevant technologies. Develop and maintain data processing logic for cleaning, enriching, and aggregating data. Ensure data consistency and accuracy throughout the data lifecycle. Azure … Databricks Implementation: Work extensively with Azure Databricks, including Unity Catalog, Delta Lake, Spark SQL, and other relevant services. Implement best practices for Databricks development and deployment. Optimise Databricks workloads for performance and cost. Program in languages such as SQL, Python, R, YAML, and JavaScript. Data Integration: Integrate data from various sources, including relational databases, APIs, and … best practices. Essential Skills & Experience: 10+ years of experience in data engineering, with at least 3+ years of hands-on experience with Azure Databricks. Strong proficiency in Python and Spark (PySpark) or Scala. Deep understanding of data warehousing principles, data modelling techniques, and data integration patterns. Extensive experience with Azure data services, including Azure Data Factory, Azure Blob Storage …
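To make the transformation responsibilities above concrete, here is a minimal, hypothetical PySpark sketch of cleaning, enriching and aggregating data into a Delta table. It assumes a Databricks notebook where `spark` is pre-defined, and all table and column names are invented for the example.

```python
# Illustrative sketch only: cleaning, enriching and aggregating data on Databricks.
# Table and column names (bronze_orders, dim_customer, analytics.gold.daily_revenue) are hypothetical.
from pyspark.sql import functions as F

orders = spark.table("bronze_orders").dropDuplicates(["order_id"])   # cleaning
customers = spark.table("dim_customer")

enriched = orders.join(customers, on="customer_id", how="left")      # enriching

daily_revenue = (
    enriched.withColumn("order_date", F.to_date("order_ts"))
            .groupBy("order_date", "customer_segment")               # aggregating
            .agg(F.sum("net_amount").alias("revenue"))
)

# Persist as a governed Delta table (a Unity Catalog three-level namespace is assumed here).
daily_revenue.write.format("delta").mode("overwrite").saveAsTable("analytics.gold.daily_revenue")
```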
In this role, you will be responsible for designing, building, and maintaining robust data pipelines and infrastructure on the Azure cloud platform. You will leverage your expertise in PySpark, Apache Spark, and Apache Airflow to process and orchestrate large-scale data workloads, ensuring data quality, efficiency, and scalability. If you have a passion for data engineering and … to apply! Job Responsibilities Data Engineering & Data Pipeline Development: Design, develop, and optimize scalable data workflows using Python, PySpark, and Airflow. Implement real-time and batch data processing using Spark. Enforce best practices for data quality, governance, and security throughout the data lifecycle. Ensure data availability, reliability, and performance through monitoring and automation. Cloud Data Engineering: Manage cloud infrastructure … data processing workloads. Implement CI/CD pipelines for data workflows to ensure smooth and reliable deployments. Big Data & Analytics: Build and optimize large-scale data processing pipelines using Apache Spark and PySpark. Implement data partitioning, caching, and performance tuning for Spark-based workloads. Work with diverse data formats (structured and unstructured) to support advanced analytics and …
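The partitioning and caching techniques this listing names might look roughly like the following sketch. The dataset, storage paths and tuning values are assumptions, not details from the posting.

```python
# Illustrative sketch only: partitioning, caching and shuffle tuning for a hypothetical events dataset.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("events_tuning")
    .config("spark.sql.shuffle.partitions", "200")   # tune shuffle parallelism to the cluster size
    .getOrCreate()
)

events = spark.read.parquet("abfss://raw@example.dfs.core.windows.net/events/")

# Cache a dataframe that several downstream aggregations reuse.
recent = events.filter(F.col("event_date") >= "2024-01-01").cache()

daily_counts = recent.groupBy("event_date").count()
by_type = recent.groupBy("event_type").count()
daily_counts.show()   # both actions reuse the cached dataframe instead of re-reading the source
by_type.show()

# Repartition on the write key and partition the output for efficient downstream reads.
(recent.repartition("event_date")
       .write.mode("overwrite")
       .partitionBy("event_date")
       .parquet("abfss://curated@example.dfs.core.windows.net/events_by_day/"))
```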
technologies to create and maintain data assets and reports for business insights. Assist in engineering and managing data models and pipelines within a cloud environment, utilizing technologies like Databricks, Spark, Delta Lake, and SQL. Contribute to the maintenance and enhancement of our progressive tech stack, which includes Python, PySpark, Logic Apps, Azure Functions, ADLS, Django, and ReactJs. Support the … environment and various platforms, including Azure, SQL Server. NoSQL databases are good to have. Hands-on experience with data pipeline development, ETL processes, and big data technologies (e.g., Hadoop, Spark, Kafka). Experience with DataOps practices and tools, including CI/CD for data pipelines. Experience in medallion data architecture and other similar data modelling approaches. Experience with data …
as a Data Site Reliability Engineer or similar role, focusing on data infrastructure management. Proficiency in data technologies, such as relational databases, data warehousing, big data platforms (e.g., Hadoop, Spark), data streaming (e.g., Kafka), and cloud services (e.g., AWS, GCP, Azure). Programming skills in Python, Java, or Scala, with automation and scripting experience. Experience with containerization and orchestration tools …
Job Accountabilities Develop robust, scalable data pipelines to serve the easyJet analyst and data science community. Highly competent hands-on experience with relevant Data Engineering technologies, such as Databricks, Spark, Spark API, Python, SQL Server, Scala. Work with data scientists, machine learning engineers and DevOps engineers to develop and deploy machine learning models and algorithms aimed at … indexing, partitioning. Hands-on IaC development experience with Terraform or CloudFormation. Understanding of ML development workflow and knowledge of when and how to use dedicated hardware. Significant experience with Apache Spark or any other distributed data programming frameworks (e.g. Flink, Hadoop, Beam). Familiarity with Databricks as a data and AI platform or the Lakehouse Architecture. Experience with data … e.g. access management, data privacy, handling of sensitive data (e.g. GDPR). Desirable Skills Experience in event-driven architecture, ingesting data in real time in a commercial production environment with Spark Streaming, Kafka, DLT or Beam. Understanding of the challenges faced in the design and development of a streaming data pipeline and the different options for processing unbounded data (pubsub …
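For the event-driven Spark Streaming and Kafka experience listed as desirable, a minimal Structured Streaming sketch is shown below. The broker, topic, schema and paths are hypothetical, and the Spark–Kafka connector package is assumed to be available on the cluster.

```python
# Illustrative sketch only: ingesting events from Kafka with Spark Structured Streaming into Delta.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("bookings_stream").getOrCreate()

schema = StructType([
    StructField("booking_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
    .option("subscribe", "bookings")                    # hypothetical topic
    .load()
)

parsed = raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e")).select("e.*")

query = (
    parsed.writeStream.format("delta")
    .option("checkpointLocation", "/checkpoints/bookings")   # enables recovery after failure
    .outputMode("append")
    .start("/lake/silver/bookings")
)
query.awaitTermination()
```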
optimisation for efficient storage and retrieval. Build and maintain data models to support analytics and machine learning workflows. Pipeline Orchestration: Develop, monitor, and optimize ETL/ELT workflows using Apache Airflow. Ensure data pipelines are robust, error-tolerant, and scalable for real-time and batch processing. Data Scraping & Unstructured Data Processing: Develop and maintain scalable web scraping solutions to … or a related field; or equivalent professional experience. Experience: 5+ years of experience in data engineering or a related field. Strong expertise in data pipeline orchestration tools such as Apache Airflow. Proven track record of designing and implementing data lakes and warehouses (experience with Azure is a plus). Demonstrated experience with Terraform for infrastructure provisioning and management.
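A small, hypothetical example of the web-scraping work referenced above follows; the URL, CSS selectors and output file are placeholders, and the failure handling is sketched so an orchestrator such as Airflow could retry the task.

```python
# Illustrative sketch only: a small, polite scraping routine; all selectors and URLs are hypothetical.
import csv
import time

import requests
from bs4 import BeautifulSoup


def scrape_listings(url: str, pages: int = 3, delay: float = 1.0) -> list[dict]:
    rows = []
    for page in range(1, pages + 1):
        resp = requests.get(url, params={"page": page}, timeout=10)
        resp.raise_for_status()                      # fail loudly so the orchestrator can retry
        soup = BeautifulSoup(resp.text, "html.parser")
        for card in soup.select("div.listing-card"):
            rows.append({
                "title": card.select_one("h2").get_text(strip=True),
                "price": card.select_one("span.price").get_text(strip=True),
            })
        time.sleep(delay)                            # rate-limit between requests
    return rows


if __name__ == "__main__":
    data = scrape_listings("https://example.com/listings")
    with open("listings.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "price"])
        writer.writeheader()
        writer.writerows(data)
```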
technical and professional experience. Preferred Skills: Experience working within the public sector. Knowledge of cloud platforms (e.g., IBM Cloud, AWS, Azure). Familiarity with big data processing frameworks (e.g., Apache Spark, Hadoop). Understanding of data warehousing concepts and experience with tools like IBM Cognos or Tableau. Certifications: While not required, the following certifications would be highly beneficial … Experience working within the public sector. Knowledge of cloud platforms (e.g., IBM Cloud, AWS, Azure). Familiarity with big data processing frameworks (e.g., Apache Spark, Hadoop). Understanding of data warehousing concepts and experience with tools like IBM Cognos or Tableau. ABOUT BUSINESS UNIT IBM Consulting is IBM's consulting and global professional services business, with market-leading …
Functions, Azure SQL Database, HDInsight, and Azure Machine Learning Studio. Data Storage & Databases: SQL & NoSQL Databases: Experience with databases like PostgreSQL, MySQL, MongoDB, and Cassandra. Big Data Ecosystems: Hadoop, Spark, Hive, and HBase. Data Integration & ETL: Data Pipelining Tools: Apache NiFi, Apache Kafka, and Apache Flink. ETL Tools: AWS Glue, Azure Data Factory, Talend, and Apache …
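As one illustration of the Kafka-based integration tooling named here, a minimal produce/consume round trip using the kafka-python client is sketched below; the broker address, topic and record contents are assumptions.

```python
# Illustrative sketch only: moving records through Kafka as one stage of a data-integration flow.
import json

from kafka import KafkaProducer, KafkaConsumer

TOPIC = "customer_updates"   # hypothetical topic

# Producer side: push a changed record onto the topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"customer_id": 42, "status": "active"})
producer.flush()

# Consumer side: read records and hand them to the next pipeline stage.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)   # in a real pipeline this would write to the warehouse or lake
    break
```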
modern data products within a high-performance cloud platform • Collaborate with cross-functional squads to solve real-world data challenges • Design, build, and optimise scalable data pipelines using Python, Spark & Databricks • Work on orchestration, monitoring, and performance optimisation • Create frameworks and processes for high-quality, scalable data workflows • Help shape and promote software engineering best practices • Support ML/… Python data ecosystem • Solid SQL skills and experience with data modelling best practices • Hands-on experience with Databricks or Snowflake, ideally on AWS (open to Azure) • Strong knowledge of Spark or PySpark • Experience with CI/CD, Git, Jenkins (or similar tools) • Proven ability to think about scalability, production readiness, and data quality • Experience working in Agile, collaborative teams …
real-time data pipelines for processing large-scale data. Experience with ETL processes for data ingestion and processing. Proficiency in Python and SQL. Experience with big data technologies like Apache Hadoop and Apache Spark. Familiarity with real-time data processing frameworks such as Apache Kafka or Flink. MLOps & Deployment: Experience deploying and maintaining large-scale ML inference …
Maths or similar Science or Engineering discipline. Strong Python and other programming skills (Java and/or Scala desirable). Strong SQL background. Some exposure to big data technologies (Hadoop, Spark, Presto, etc.). NICE TO HAVES OR EXCITED TO LEARN: Some experience designing, building and maintaining SQL databases (and/or NoSQL). Some experience with designing efficient physical data models …
microservice architecture, API development. Machine Learning (ML): Deep understanding of machine learning principles, algorithms, and techniques. Experience with popular ML frameworks and libraries like TensorFlow, PyTorch, scikit-learn, or Apache Spark. Proficiency in data preprocessing, feature engineering, and model evaluation. Knowledge of ML model deployment and serving strategies, including containerization and microservices. Familiarity with ML lifecycle management, including versioning …
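The preprocessing, feature-engineering and model-evaluation skills described above can be illustrated with a short scikit-learn pipeline; the dataset and column names are synthetic, invented purely for the example.

```python
# Illustrative sketch only: preprocessing, feature engineering and evaluation with scikit-learn.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data: two numeric features and one categorical feature.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 70, 500),
    "spend": rng.normal(100, 30, 500),
    "channel": rng.choice(["web", "store", "app"], 500),
    "churned": rng.integers(0, 2, 500),
})

X, y = df.drop(columns="churned"), df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "spend"]),                      # scale numeric features
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["channel"]),     # encode categorical feature
])

model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))           # model evaluation
```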
two of the following: Python, SQL, Java. Commercial experience in client-facing projects is a plus, especially within multi-disciplinary teams. Deep knowledge of database technologies: Distributed systems (e.g., Spark, Hadoop, EMR); RDBMS (e.g., SQL Server, Oracle, PostgreSQL, MySQL); NoSQL (e.g., MongoDB, Cassandra, DynamoDB, Neo4j). Solid understanding of software engineering best practices - code reviews, testing frameworks, CI/CD …
years of experience in data engineering or a related field, with a focus on building scalable data systems and platforms. Expertise in modern data tools and frameworks such as Spark, dbt, Airflow, Kafka, Databricks, and cloud-native services (AWS, GCP, or Azure). Understanding of data modeling, distributed systems, ETL/ELT pipelines, and streaming architectures. Proficiency in SQL and …
SageMaker, GCP AI Platform, Azure ML, or equivalent). Solid understanding of data-engineering concepts: SQL/noSQL, data pipelines (Airflow, Prefect, or similar), and batch/streaming frameworks (Spark, Kafka). Leadership & Communication: Proven ability to lead cross-functional teams in ambiguous startup settings. Exceptional written and verbal communication skills, able to explain complex concepts to both technical …
tools, and statistical packages. Strong analytical, problem-solving, and critical thinking skills. 8. Experience with social media analytics and understanding of user behaviour. 9. Familiarity with big data technologies, such as Apache Hadoop, Apache Spark, or Apache Kafka. 10. Knowledge of AWS machine learning services, such as Amazon SageMaker and Amazon Comprehend. 11. Experience with data governance and security best …
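To illustrate the combination of social-media analytics (item 8) and AWS machine learning services (item 10), here is a brief, hypothetical sketch of calling Amazon Comprehend on sample posts; AWS credentials and region are assumed to be configured, and the posts are dummy data.

```python
# Illustrative sketch only: sentiment analysis of sample social-media posts with Amazon Comprehend.
import boto3

comprehend = boto3.client("comprehend", region_name="eu-west-1")

posts = [
    "Loving the new update, everything feels faster!",
    "App keeps crashing since yesterday, really frustrating.",
]

for post in posts:
    result = comprehend.detect_sentiment(Text=post, LanguageCode="en")
    print(result["Sentiment"], result["SentimentScore"])   # e.g. POSITIVE / NEGATIVE with confidence scores
```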
field. Technical Skills Required Hands-on software development experience with Python and experience with modern software development and release engineering practices (e.g. TDD, CI/CD). Experience with Apache Spark or any other distributed data programming frameworks. Comfortable writing efficient SQL and debugging on cloud warehouses like Databricks SQL or Snowflake. Experience with cloud infrastructure like AWS … Skills Hands-on development experience in an airline, e-commerce or retail industry. Experience in event-driven architecture, ingesting data in real time in a commercial production environment with Spark Streaming, Kafka, DLT or Beam. Experience implementing end-to-end monitoring, quality checks, lineage tracking and automated alerts to ensure reliable and trustworthy data across the platform. Experience of …
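The quality checks and automated alerts this listing mentions might translate into something like the following PySpark sketch; the table name, thresholds and the alerting hook are all assumptions.

```python
# Illustrative sketch only: an automated data-quality check that fails loudly so an alert can fire.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_checks").getOrCreate()
bookings = spark.table("silver.bookings")   # hypothetical table

total = bookings.count()
null_ids = bookings.filter(F.col("booking_id").isNull()).count()
duplicates = total - bookings.dropDuplicates(["booking_id"]).count()

failures = []
if total == 0:
    failures.append("table is empty")
if null_ids > 0:
    failures.append(f"{null_ids} rows with null booking_id")
if duplicates > total * 0.01:                # tolerate <1% duplicates (arbitrary threshold)
    failures.append(f"{duplicates} duplicate booking_ids")

if failures:
    # Placeholder for a real alert (Slack webhook, PagerDuty, Airflow failure callback, ...).
    raise RuntimeError("Data quality check failed: " + "; ".join(failures))
```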
solutions using Databricks on Azure or AWS. Databricks Components: Proficient in Delta Lake, Unity Catalog, MLflow, and other core Databricks tools. Programming & Query Languages: Strong skills in SQL and Apache Spark (Scala or Python). Relational Databases: Experience with on-premises and cloud-based SQL databases. Data Engineering Techniques: Skilled in Data Governance, Architecture, Data Modelling, ETL/…
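For the MLflow proficiency listed here, a minimal experiment-tracking sketch is shown below; the experiment path, model choice and dataset are placeholders, and on Databricks the tracking server is assumed to be the built-in one.

```python
# Illustrative sketch only: logging a run's parameters, metric and model with MLflow.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlflow.set_experiment("/Shared/demo-forecast")   # hypothetical experiment path

with mlflow.start_run():
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("mae", mae)
    mlflow.sklearn.log_model(model, "model")
```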