in at least one of the big 3 cloud ML stacks (AWS, Azure, GCP). Hands-on experience with open-source ETL and data pipeline orchestration tools such as Apache Airflow and NiFi. Experience with large-scale/Big Data technologies, such as Hadoop, Spark, Hive, Impala, PrestoDB, Kafka. Experience with workflow orchestration tools like Apache Airflow.
at least 5 years of experience • Working experience with the Palantir Foundry platform is a must • Experience designing and implementing data analytics solutions on enterprise data platforms and distributed computing (Spark/Hive/Hadoop preferred). • Proven track record of understanding and transforming customer requirements into a best-fit design and architecture. • Demonstrated experience in end-to-end data management, data
advocate best practices within a Centre of Excellence. Skills, knowledge and expertise: Deep expertise in the Databricks platform, including Jobs and Workflows, Cluster Management, Catalog Design and Maintenance, Apps, Hive Metastore Management, Network Management, Delta Sharing, Dashboards, and Alerts. Proven experience working with big data technologies, i.e., Databricks and Apache Spark. Proven experience working with Azure data platform
Google Cloud Platform (GCP) Strong proficiency in SQL and experience with relational databases such as MySQL, PostgreSQL, or Oracle Experience with big data technologies such as Hadoop, Spark, or Hive Familiarity with data warehousing and ETL tools such as Amazon Redshift, Google BigQuery, or Apache Airflow Proficiency in Python and at least one other programming language such as
years of experience working on mission-critical data pipelines and ETL systems. 5+ years of hands-on experience with big data technology, systems and tools such as AWS, Hadoop, Hive, and Snowflake Expertise with common Software Engineering languages such as Python, Scala, Java, SQL and a proven ability to learn new programming languages Experience with workflow orchestration tools such … certification/s Strong data visualization skills to convey information and results clearly Experience with DevOps tools such as Docker, Kubernetes, Jenkins, etc. Experience with event messaging frameworks like Apache Kafka The hiring range for this position in Santa Monica, California is $136,038 to $182,490 per year, in Glendale, California is $136,038 to $182,490 per
Belfast, Northern Ireland, United Kingdom Hybrid / WFH Options
Citigroup Inc
role. Demonstrated execution capabilities. Strong analytical and quantitative skills; data-driven and results-oriented Experience with Core Java required (Spark a plus) Experience with SQL Experience working with Hadoop, Hive, Sqoop and other technologies in Cloudera's CDP distribution. Understanding of version control (git) Experience working as part of an agile team. Excellent written and oral communication skills Technical … Skills: Strong knowledge of Java Some knowledge of Hadoop, Hive, SQL, Spark Understanding of Unix Shell Scripting CI/CD Pipeline Maven or Gradle experience Predictive analytics (desirable) PySpark (desirable) Trade Surveillance domain knowledge (desirable) Education: Bachelor’s/University degree or equivalent experience What we’ll provide you: By joining Citi, you will not only be part of a
automation processes and best practices to streamline data workflows and reduce manual interventions. Must have: AWS, ETL, EMR, Glue, Spark/Scala, Java, Python. Good to have: Cloudera (Spark, Hive, Impala, HDFS), Informatica PowerCenter, Informatica DQ/DG, Snowflake, Erwin. Qualifications: Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field. 5 to
as Teradata, Oracle, SAP BW and migration of these data warehouses to modern cloud data platforms. Deep understanding and hands-on experience with big data technologies like Hadoop, HDFS, Hive, Spark and cloud data platform services. Proven track record of designing and implementing large-scale data architectures in complex environments. CI/CD/DevOps experience is a plus. Skills: Strong
Experience performing data analytics on AWS platforms Experience writing efficient SQL and implementing complex ETL transformations on big data platforms. Experience with Big Data technologies (Spark, Impala, Hive, Redshift, Kafka, etc.) Experience in data quality testing; adept at writing test cases and scripts, presenting and resolving data issues Experience with Databricks, Snowflake, and Iceberg is required Preferred qualifications
or more) Experience in data mining, data warehousing, ETL Experience in handling large volumes of data on SQL, NoSQL and Big Data databases Experience in the Hadoop ecosystem: Hadoop, Spark, Hive, and/or Scala Experience in programming languages: PHP, Python, C, Java Experience in web development with the Laravel MVC framework Comfortable working in Shell scripting, AWS (cloud), Linux and
Easter Howgate, Midlothian, United Kingdom Hybrid / WFH Options
Leonardo UK Ltd
and tools, including experience with CI/CD pipelines, containerisation, and workflow orchestration. Familiar with ETL/ELT frameworks, and experienced with Big Data Processing Tools (e.g. Spark, Airflow, Hive, etc.) Knowledge of programming languages (e.g. Java, Python, SQL) Hands-on experience with SQL/NoSQL database design Degree in STEM, or similar field; a Master's is a
foundation in data engineering, data analytics, or data science, with the ability to work effectively with various data types and sources. Experience using big data technologies (e.g. Hadoop, Spark, Hive) and database management systems (e.g. SQL and NoSQL). Graph Database Expertise: Deep understanding of graph database concepts, data modeling, and query languages (e.g., Cypher). Demonstrate hands-on
and well-tested solutions to automate data ingestion, transformation, and orchestration across systems. Own data operations infrastructure: Manage and optimise key data infrastructure components within AWS, including Amazon Redshift, Apache Airflow for workflow orchestration and other analytical tools. You will be responsible for ensuring the performance, reliability, and scalability of these systems to meet the growing demands of data … pipelines, data warehouses, and leveraging AWS data services. Strong proficiency in DataOps methodologies and tools, including experience with CI/CD pipelines, containerized applications, and workflow orchestration using Apache Airflow. Familiar with ETL frameworks, and bonus experience with Big Data processing (Spark, Hive, Trino), and data streaming. Proven track record - You've made a demonstrable impact
Azure SQL Database, HDInsight, and Azure Machine Learning Studio. Data Storage & Databases: SQL & NoSQL Databases: Experience with databases like PostgreSQL, MySQL, MongoDB, and Cassandra. Big Data Ecosystems: Hadoop, Spark, Hive, and HBase. Data Integration & ETL: Data Pipelining Tools: Apache NiFi, Apache Kafka, and Apache Flink. ETL Tools: AWS Glue, Azure Data Factory, Talend, and Apache
tools to automate profit-and-loss forecasting and planning for the Physical Consumer business. We are building the next generation Business Intelligence solutions using big data technologies such as Apache Spark, Hive/Hadoop, and distributed query engines. As a Data Engineer at Amazon, you will be working in a large, extremely complex and dynamic data environment. You
Belfast, Northern Ireland, United Kingdom Hybrid / WFH Options
Citi
and data science solutions that are Accurate, Reliable, Relevant, Consistent, Complete, Scalable, Timely, Secure, Nimble. Olympus is built on a big data platform and technologies from the Cloudera distribution, such as HDFS, Hive, Impala, Spark, YARN, Sentry, Oozie, Kafka. Our team interfaces with a vast client base and works in close partnership with Operations, Development and other technology counterparts running the application … personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency. Skills & Qualifications: Working knowledge of various components and technologies in the Cloudera distribution, such as HDFS, Hive, Impala, Spark, YARN, Sentry, Oozie, Kafka. Very good knowledge of analyzing cluster bottlenecks: performance tuning, effective resource usage, capacity planning, investigating. Perform daily performance monitoring of
more scripting language (e.g., Python, KornShell) - 3+ years of experience analyzing and interpreting data with Redshift, Oracle, NoSQL, etc. PREFERRED QUALIFICATIONS - Experience with big data technologies such as: Hadoop, Hive, Spark, EMR - Experience with big data processing technology (e.g., Hadoop or Apache Spark), data warehouse technical architecture, infrastructure components, ETL, and reporting/analytic tools and environments Our inclusive culture
data modeling, warehousing, and ETL pipelines Proficiency in SQL Experience with scripting languages like Python or KornShell Unix experience Troubleshooting data and infrastructure issues Preferred Qualifications Experience with Hadoop, Hive, Spark, EMR Experience with ETL tools like Informatica, ODI, SSIS, BODI, DataStage Knowledge of distributed storage and computing systems Experience with reporting and analytics platforms We promote an inclusive
London, England, United Kingdom Hybrid / WFH Options
Solirius Reply
learning frameworks (e.g., scikit-learn, TensorFlow, XGBoost, PyTorch). Strong foundation in statistics, probability, and hypothesis testing. Experience with cloud platforms (AWS, GCP, Azure) and big data tools (Spark, Hive, Databricks, etc.) is a plus. Excellent communication and storytelling skills with the ability to explain complex concepts to non-technical stakeholders. Proven track record of delivering impactful data science
Milton Keynes, England, United Kingdom Hybrid / WFH Options
Santander
effective communication skills to interact with team members, stakeholders and end users conveying technical concepts in a comprehensible manner Skills across the following data competencies: SQL (AWS Athena/Hive/Snowflake) Hadoop/EMR/Spark/Scala Data structures (tables, views, stored procedures) Data Modelling - star/snowflake Schemas, efficient storage, normalisation Data Transformation DevOps - data pipelines
London, England, United Kingdom Hybrid / WFH Options
AlphaSights
and well-tested solutions to automate data ingestion, transformation, and orchestration across systems. Own data operations infrastructure: Manage and optimise key data infrastructure components within AWS, including Amazon Redshift, Apache Airflow for workflow orchestration and other analytical tools. You will be responsible for ensuring the performance, reliability, and scalability of these systems to meet the growing demands of data … pipelines, data warehouses, and leveraging AWS data services. Strong proficiency in DataOps methodologies and tools, including experience with CI/CD pipelines, containerized applications, and workflow orchestration using Apache Airflow. Familiar with ETL frameworks, and bonus experience with Big Data processing (Spark, Hive, Trino), and data streaming. Proven track record – You’ve made a demonstrable impact
distributed web application Deep understanding of software architecture, object-oriented design principles, and data structures Extensive experience in developing microservices using Java, Python Experience with distributed computing frameworks like Hive/Hadoop and Apache Spark. Good experience in test-driven development and automating test cases using Java/Python Experience in SQL/NoSQL (Oracle, Cassandra) database design Demonstrated … HR related applications Experience with the following cloud services: AWS Elastic Beanstalk, EC2, S3, CloudFront, RDS, DynamoDB, VPC, ElastiCache, Lambda Working experience with Terraform Experience in creating workflows for Apache Airflow About Roku Roku pioneered streaming to the TV. We connect users to the streaming content they love, enable content publishers to build and monetize large audiences, and provide
S3, AWS Glue, EMR, Kinesis, FireHose, Lambda, and IAM roles and permissions - Experience building large-scale, high-throughput, 24x7 data systems - Experience with big data technologies such as: Hadoop, Hive, Spark, EMR - Experience providing technical leadership and mentoring other engineers for best practices on data engineering Our inclusive culture empowers Amazonians to deliver the best results for our customers.