techniques
* Demonstrable knowledge of applying Data Engineering best practices (coding practices for DS, unit testing, version control, code review)
* Big Data ecosystems: Cloudera/Hortonworks, AWS EMR, GCP Dataproc or GCP Cloud Data Fusion
* NoSQL databases: DynamoDB/Neo4j/Elasticsearch, Google Cloud Datastore
* Snowflake Data Warehouse
Experience building data lakes and data pipelines in the cloud using Azure and Databricks or similar tools. Spark Developer certification from any of Databricks, MapR, Cloudera or Hortonworks is an added advantage but not required. Practice with Unix command-line tools. Familiarity with agile methodology. Strong database and data analysis skills
visualization – tools like Tableau
Master data management (MDM) – concepts and expertise in tools like Informatica & Talend MDM
Big data – Hadoop ecosystem, distributions like Cloudera/Hortonworks, Pig and Hive
Data processing frameworks – Spark & Spark Streaming
Hands-on experience with multiple databases such as PostgreSQL, Snowflake, Oracle, MS SQL Server, NoSQL
Greater London, England, United Kingdom Hybrid / WFH Options
First Derivative
development and the opportunity to design your own path. We support a variety of external training courses and accreditations, such as AWS, GCP, Azure and Cloudera to name a few, and are truly passionate about our Mentor Program, through which our senior colleagues generously set aside personal time to coach and …
Greater London, England, United Kingdom Hybrid / WFH Options
InterEx Group
experience in Big Data implementation projects. Experience in the definition of Big Data architecture with different tools and environments: cloud (AWS, Azure and GCP), Cloudera, NoSQL databases (Cassandra, MongoDB), ELK, Kafka, Snowflake, etc. Past experience with Data Engineering and data quality tools (Informatica, Talend, etc.). Previous involvement in …
make corrective recommendations. Monitoring – able to monitor Spark jobs using wider tools such as Grafana to see whether there are cluster-level failures. Cloudera (CDP) – understanding of how Cloudera Spark is set up and how the runtime libraries are used by PySpark code. Prophecy – high-level understanding …