role. This is your chance to engineer solutions that truly matter. Key Responsibilities: Design, develop, and optimize scalable data pipelines using technologies such as Apache Spark, Apache Iceberg, Trino, OpenSearch, AWS EMR, NiFi, and Kubernetes containers. Ingest and move structured and unstructured data using approved methods into … of working with diverse data types and formats, including structured, semi-structured, and unstructured data. Familiarity with data ingestion tools and platforms such as Apache NiFi, Spark, and related open-source technologies. Demonstrated ability to collaborate across teams, including data scientists, software engineers, data stewards, and mission partners. Knowledge …
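By way of illustration (not from the posting), a minimal PySpark sketch of the pipeline shape this role describes: landing a structured feed in an Apache Iceberg table on S3. The catalog, bucket, table, and partition column names are assumptions, and the iceberg-spark-runtime package is assumed to be on the classpath.

```python
# Hypothetical sketch only: ingesting a structured feed into an Apache
# Iceberg table with PySpark. All names below are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (
    SparkSession.builder.appName("ingest-sketch")
    # Register an Iceberg catalog backed by S3 (e.g. on AWS EMR).
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3://example-bucket/warehouse")
    .getOrCreate()
)

# Read a raw structured drop (assumed to carry an event_date column) and
# materialize it as a partitioned Iceberg table.
raw = spark.read.json("s3://example-bucket/landing/events/")
raw.writeTo("lake.analytics.events").partitionedBy(col("event_date")).createOrReplace()
```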
design, implementation, testing, and support of next-generation features related to Dremio's Query Planner and Reflections technologies. Work with open source projects like Apache Calcite and Apache Iceberg. Use modular design patterns to deliver an architecture that's elegant, simple, extensible, and maintainable. Solve complex technical … distributed query engines. Hands-on experience in query processing or optimization, distributed systems, concurrency control, data replication, code generation, networking, storage systems, heap management, Apache Arrow, SQL operators, caching techniques, and disk spilling. Hands-on experience with multi-threaded and asynchronous programming models …
grasp of data governance/data management concepts, including metadata management, master data management, and data quality. Ideally, have experience with a Data Lakehouse toolset (Iceberg). What you'll get in return: Hybrid working (4 days per month in London HQ + as and when required). Access to market-leading …
Scala; Starburst and Athena; Kafka and Kinesis; DataHub; MLflow and Airflow; Docker and Terraform; Kafka, Spark, Kafka Streams, and KSQL; dbt; AWS, S3, Iceberg, Parquet, Glue, and EMR for our Data Lake; Elasticsearch and DynamoDB. More information: Enjoy fantastic perks like private healthcare & dental insurance, a generous work …
and reporting. Experience with data warehousing concepts and platforms, such as Snowflake and Amazon Redshift, and with databases such as Postgres, Solr, Accumulo, or Iceberg. Experience integrating structured and unstructured data from various sources such as APIs, databases, or flat files, and with web services and communication protocols. Experience …
working with hierarchical reference data models. Proven expertise in handling high-throughput, real-time market data streams. Familiarity with distributed computing frameworks such as Apache Spark. Operational experience supporting real-time systems. Equal Opportunity Workplace: We are proud to be an equal opportunity workplace. We do not discriminate based …
and be responsible for building and maintaining sophisticated data pipelines that ingest data from major auto customers, utilizing advanced AWS services including EMR, S3, Iceberg, and Databricks. Your work will directly impact thousands of auto dealerships across the US in the way they run their business and unleash the …
Mountain View, California, United States Hybrid / WFH Options
LinkedIn
design, and your passion for writing code that performs at an extreme scale. LinkedIn has already pioneered well-known open-source infrastructure projects like Apache Kafka, Pinot, Azkaban, Samza, Venice, DataHub, Feathr, etc. We also work with industry-standard open source infrastructure products like Kubernetes, gRPC, and GraphQL - come … build a fast-growing team - You will work closely with the open-source community to participate in and influence cutting-edge open-source projects (e.g., Apache Iceberg) - You will deliver incremental impact by driving innovation while iteratively building and shipping software at scale - You will diagnose technical problems, debug … running large-scale distributed systems - Experience with industry, open-source, and/or academic research in technologies such as Hadoop, Spark, Kubernetes, Feathr, GraphQL, gRPC, Apache Kafka, Pinot, Samza, or Venice - Experience with open-source project management and governance. Suggested Skills - Distributed systems - Backend Systems Infrastructure - Java/Golang/…
data. What you offer: Experience with AWS cloud. Experience programming, debugging, and running production systems in Python. Exposure to open-source technologies such as Iceberg, Trino, and Airflow. Passionate about the use and adoption of these capabilities, focused on user experience and ensuring our business sees real value from …
to-end engineering experience supported by excellent tooling and automation. Preferred Qualifications, Capabilities, and Skills: Good understanding of the Big Data stack (Spark/Iceberg). Ability to learn new technologies and patterns on the job and apply them effectively. Good understanding of established patterns, such as stability patterns …
analysis and automation. Proficiency in building and maintaining batch and streaming ETL/ELT pipelines at scale, employing tools such as Airflow, Fivetran, Kafka, Iceberg, Parquet, Spark, and Glue for developing end-to-end data orchestration, leveraging AWS services to ingest, transform, and process large volumes of structured and …
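To make the streaming half of such a pipeline concrete, here is a hedged sketch: a Kafka topic written to Parquet on S3 with Spark Structured Streaming. The broker, topic, and S3 paths are illustrative assumptions, not details from the posting.

```python
# Hedged sketch of a streaming ETL leg: Kafka -> S3 Parquet via Spark
# Structured Streaming. Broker, topic, and path names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("stream-etl-sketch").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
    # Kafka delivers bytes; cast the value before any downstream parsing.
    .select(col("value").cast("string").alias("payload"))
)

(
    events.writeStream.format("parquet")
    .option("path", "s3://example-bucket/silver/orders/")
    # The checkpoint makes the sink restartable with exactly-once file output.
    .option("checkpointLocation", "s3://example-bucket/checkpoints/orders/")
    .trigger(processingTime="1 minute")
    .start()
    .awaitTermination()
)
```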
delivering customer proposals aligned with Analytics Solutions. Experience with one or more relevant tools (Sqoop, Flume, Kafka, Oozie, Hue, ZooKeeper, HCatalog, Solr, Avro, Parquet, Iceberg, Hudi). Experience developing software and data engineering code in one or more programming languages (Java, Python, PySpark, Node, etc.). AWS and other …
Reference Data Management, and Metadata Management. Be comfortable coding with Python or Scala and proficient in SQL. In-depth understanding of the Parquet, Delta Lake, and Iceberg data formats. Have a background in using multiple data storage technologies, including relational, document, key/value, graph, and object stores. Have the ability to …
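As a rough contrast of the three formats this listing names (illustrative only, not from the posting), the following sketch writes the same DataFrame each way. It assumes a Spark session already carrying the Delta Lake package and an Iceberg catalog named "lake"; all paths and names are placeholders.

```python
# Illustrative contrast of Parquet vs. Delta Lake vs. Iceberg writes.
# Assumes delta-spark and iceberg-spark-runtime are on the classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("formats-sketch").getOrCreate()
df = spark.range(10).withColumnRenamed("id", "customer_id")

# Parquet: columnar data files only, with no table-level transaction log.
df.write.mode("overwrite").parquet("s3://example-bucket/parquet/customers/")

# Delta Lake: Parquet files plus a _delta_log directory providing ACID commits.
df.write.format("delta").mode("overwrite").save("s3://example-bucket/delta/customers/")

# Iceberg: a catalog-tracked table whose snapshots enable time travel.
df.writeTo("lake.crm.customers").createOrReplace()
```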
for each project, including ETL mappings, a code use guide, code location, and access instructions. Design and optimize data pipelines using tools such as Spark, Apache Iceberg, Trino, OpenSearch, EMR cloud services, NiFi, and Kubernetes containers. Ensure the pedigree and provenance of the data is maintained such that the …
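One hedged way to keep pedigree and provenance traveling with the data, sketched below, is to stamp each row with where it came from and when it was ingested. The paths and marker column names are assumptions for illustration.

```python
# Sketch: stamping provenance columns onto ingested rows with PySpark.
# Paths and the _source/_ingested column names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp, input_file_name, lit

spark = SparkSession.builder.appName("provenance-sketch").getOrCreate()

df = (
    spark.read.parquet("s3://example-bucket/landing/feed/")
    .withColumn("_source_file", input_file_name())    # originating S3 object
    .withColumn("_ingested_at", current_timestamp())  # ingest wall-clock time
    .withColumn("_source_system", lit("partner-feed"))
)
df.write.mode("append").parquet("s3://example-bucket/bronze/feed/")
```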
from real-time/batch ingestion to delivering analytics feedback and reporting to consumers. Understanding of different file formats (Parquet, ORC, etc.) and table formats (Iceberg, Delta, etc.) and how to enable them in different data catalogs. Required Education: Bachelor's degree in Computer Science, Information Systems, Software, Electrical, or …
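For one concrete example of "enabling" a table format in a specific catalog, here is a hedged sketch registering Iceberg against AWS Glue from Spark. The catalog name, bucket, and table are assumptions, and the Iceberg AWS bundle must be on the classpath.

```python
# Hedged sketch: enabling the Iceberg table format via an AWS Glue catalog.
# Catalog name, bucket, and table below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("catalog-sketch")
    .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.glue.warehouse", "s3://example-bucket/warehouse")
    .getOrCreate()
)

# Tables created through this catalog are registered in Glue as Iceberg tables.
spark.sql("CREATE TABLE IF NOT EXISTS glue.demo.events (id BIGINT, ts TIMESTAMP) USING iceberg")
```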
processes using infrastructure-as-code (Terraform). Build and maintain data pipelines using Airflow. Manage our tech stack, including Python, Node.js, PostgreSQL, MongoDB, Kafka, and Apache Iceberg. Optimize infrastructure costs and develop strategies for efficient resource utilization. Provide critical support by monitoring services and resolving production issues. Contribute to the …
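A minimal Airflow DAG sketch of the orchestration work described follows; the DAG id, schedule, and task bodies are placeholders, and the `schedule` argument assumes Airflow 2.4 or later.

```python
# Minimal Airflow DAG sketch: two placeholder tasks run daily in sequence.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract() -> None:
    print("pull from source")  # placeholder task body

def load() -> None:
    print("write to warehouse")  # placeholder task body

with DAG(
    dag_id="example_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # run load only after extract succeeds
```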
various data modelling methods such as Star Schema, Snowflake, and Data Vault design. Experience in implementing a Data Lakehouse using a Medallion Architecture with Apache Iceberg on S3 Object Storage …
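To ground the Medallion pattern this listing names, here is a hedged sketch of one hop on an Iceberg/S3 Lakehouse: raw "bronze" rows cleaned into a curated "silver" table. The catalog and table names are assumptions, presuming a configured Iceberg catalog named "lake".

```python
# Hedged Medallion sketch: bronze (raw, append-only) -> silver (curated).
# Catalog, table, and column names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_timestamp

spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

bronze = spark.read.table("lake.bronze.orders")  # append-only raw layer
silver = (
    bronze.dropDuplicates(["order_id"])          # basic hygiene
    .withColumn("order_ts", to_timestamp(col("order_ts")))
    .filter(col("order_id").isNotNull())
)
silver.writeTo("lake.silver.orders").createOrReplace()
```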
using tools and techniques such as BDD, data reconciliation, source control, TDD, and Jenkins. Documenting configurations, processes, and best practices. Knowledge of file formats: JSON, Iceberg, Avro. Basic knowledge of AWS technologies like IAM roles, Lake Formation, Security Groups, CloudFormation, and Redshift. Big Data/Data Warehouse testing experience. Experience in the …
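A sketch of the kind of data-reconciliation check this implies follows: comparing row counts and a column checksum between source and target. The dataset names, the `amount` column, and the tolerance are all illustrative assumptions.

```python
# Sketch: source-to-target reconciliation on row count and a column checksum.
from pyspark.sql import SparkSession
from pyspark.sql.functions import sum as sum_

spark = SparkSession.builder.appName("recon-sketch").getOrCreate()

source = spark.read.parquet("s3://example-bucket/source/orders/")
target = spark.read.table("lake.silver.orders")

# Cheap structural check first, then a numeric checksum on a key measure.
assert source.count() == target.count(), "row count mismatch"

src_total = source.agg(sum_("amount")).first()[0]
tgt_total = target.agg(sum_("amount")).first()[0]
assert abs(src_total - tgt_total) < 0.01, "amount checksum mismatch"
```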
a highly skilled Software Engineer with expertise in SQL and either Python or Scala, who is experienced in building large-scale data pipelines using Apache Spark and designing robust data architectures on AWS. The ideal candidate will have hands-on experience in data lake architectures, open table formats (Delta Lake/Iceberg), and modern data platforms. If you are a problem solver, a data infrastructure enthusiast, and someone who thrives in fast-paced environments, we'd love to hear from you! Job Description Core Responsibilities: Collaborate with project stakeholders and cross-functional teams (including Frontend Service Engineers) to … learning applications, and real-time analytics. We process tens of billions of ad events daily, leveraging a modern data stack that includes Databricks, AWS, Apache Spark, ClickHouse, Snowflake, and Google Looker. Disclaimer: This information has been designed to indicate the general nature and level of work performed by employees …
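Finally, a common step in pipelines like those described is an idempotent upsert into an open-format table. The following is an illustrative Spark SQL MERGE against Iceberg, not this employer's code; it requires the Iceberg Spark session extensions, and the table and column names are assumptions.

```python
# Illustrative sketch: idempotent upsert into an Iceberg table via MERGE.
# Requires IcebergSparkSessionExtensions; names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("merge-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)

# A small batch of updates exposed as a temporary view for the MERGE source.
spark.createDataFrame([(1, "clicked")], ["event_id", "status"]) \
    .createOrReplaceTempView("updates")

spark.sql("""
    MERGE INTO lake.ads.events AS t
    USING updates AS s
    ON t.event_id = s.event_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```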