diverse sources, transform it into usable formats, and load it into data warehouses, data lakes or lakehouses. Big Data Technologies: Utilize big data technologies such as Spark, Kafka, and Flink for distributed data processing and analytics. Cloud Platforms: Deploy and manage data solutions on cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP), leveraging cloud-native services
table/file formats: Delta Lake, Parquet, Iceberg, Hudi Previous hands-on expertise with Spark Experience working with containerisation technologies - Docker, Kubernetes Streaming Knowledge: Experience with Kafka/Flink or other streaming ecosystems, with a solid understanding of their components DevOps experience building CI/CD pipelines (Jenkins), IaC (Terraform) Direct experience contributing to projects involving lakehouse/
Comfortable using Git; an awareness of CI/CD practices and tools such as GitHub Actions or Azure DevOps Nice to have: Experience of working with Apache Spark/Flink/Kafka Familiarity with object storage e.g. AWS S3 Knowledge of containerised development workflows using e.g., VSCode Basic understanding of cloud platforms like AWS or GCP Experience contributing to
projects. • Experience with distributed systems and microservices architectures. • Knowledge of security best practices and compliance requirements. • Experience with real-time data processing and streaming platforms (e.g., Apache Kafka, Apache Flink). • Familiarity with chaos engineering principles and tools.
are recognised by industry leaders like Gartner's Magic Quadrant, Forrester Wave and Frost Radar. Our tech stack: Superset and similar data visualisation tools. ETL tools: Airflow, DBT, Airbyte, Flink, etc. Data warehousing and storage solutions: ClickHouse, Trino, S3. AWS Cloud, Kubernetes, Helm. Relevant programming languages for data engineering tasks: SQL, Python, Java, etc. What you will be doing
Advanced skills in Python or another major language; writing clean, testable, production-grade ETL code at scale. Modern Data Pipelines: Experience with batch and streaming frameworks (e.g., Apache Spark, Flink, Kafka Streams, Beam), including orchestration via Airflow, Prefect or Dagster. Data Modeling & Schema Management: Demonstrated expertise in designing, evolving, and documenting schemas (OLAP/OLTP, dimensional, star/snowflake
in a high-ownership, fast-paced environment. Nice to have: Experience working in the Payments, Fintech, or Financial Crime domain (e.g., fraud detection, AML, KYC). Experience with Apache Flink or other streaming data frameworks is highly desirable. Experience working in teams, building and maintaining data science and AI solutions. Experience integrating with third-party APIs, especially in regulated
Mathematics, or a related field, and/or related professional experience Nice to Have Familiarity with big data processing with highly scalable technologies such as Spark, Kafka, RabbitMQ, Redis, Flink, Airflow and Cassandra Familiarity with Cloud Platforms like AWS, Azure, or GCP Familiarity with S3-compatible data stores (e.g., AWS S3, Azure Blob Storage, GCP Cloud Storage) Familiarity with
Nice to Have (But Not Required) Experience with sustainability or carbon accounting standards (e.g., GLEC, ISO 14083, GHG Protocol). Experience with high-scale data processing (e.g., Spark, Presto, Flink). Contributions to open-source Python tooling or emissions-related libraries. Experience working in both startup and enterprise environments. Our Values If you want to know the heart of
or in a similar role Technical expertise with data models Great numerical and analytical skills Experience with event-driven and streaming data architectures (using technologies such as Apache Spark, Flink or similar) Degree in Computer Science, IT, or similar field; a Master's is a plus or four years' equivalent experience Taptap Values Impact first Team next Accept reality
through rigorous validation and testing procedures. Key Qualifications Proficiency in building data pipelines with Python and Java Proficiency in large scale data processing technologies such as Spark, Kafka and Flink Familiarity with RDBMS and solid experience with SQL for data manipulation and reporting Experience building scalable solutions in cloud computing environment such as AWS EKS and S3 Experience in
scalable data processing. Data Systems: Previous experience with both batch and streaming systems, understanding their limitations and challenges. Data Processing Technologies: Familiarity with a range of technologies such as Flink, Spark, Polars, Dask, etc. Data Storage Solutions: Knowledge of various storage technologies, including S3, RDBMS, NoSQL, Delta/Iceberg, Cassandra, Clickhouse, Kafka, etc. Data Formats and Serialization: Experience with
non-technical stakeholders A background in software engineering, MLOps, or data engineering with production ML experience Nice to have: Familiarity with streaming or event-driven ML architectures (e.g. Kafka, Flink, Spark Structured Streaming) Experience working in regulated domains such as insurance, finance, or healthcare Exposure to large language models (LLMs), vector databases, or RAG pipelines Experience building or managing
with a focus on data quality and reliability. Design and manage data storage solutions, including databases, warehouses, and lakes. Leverage cloud-native services and distributed processing tools (e.g., Apache Flink, AWS Batch) to support large-scale data workloads. Operations & Tooling Monitor, troubleshoot, and optimize data pipelines to ensure performance and cost efficiency. Implement data governance, access controls, and security … pipelines and data architectures. Hands-on expertise with cloud platforms (e.g., AWS) and cloud-native data services. Comfortable with big data tools and distributed processing frameworks such as Apache Flink or AWS Batch. Strong understanding of data governance, security, and best practices for data quality. Effective communicator with the ability to work across technical and non-technical teams. Additional … following prior to applying to GSR? Experience level, applicable to this role? Select How many years have you designed, built, and operated stateful, exactly-once streaming pipelines in Apache Flink (or an equivalent framework such as Spark Structured Streaming or Kafka Streams)? Select Which statement best describes your hands-on responsibility for architecting and tuning cloud-native data lake
join a fast-growing team that plays an integral part of the revenue producing arm of a company, then our team is for you. Technologies include Scala, Python, Apache Flink, Spark, Databricks, and AWS (ECS, Lambda, DynamoDB, WAF, among others). Experience in these areas is preferred but not required. Qualifications: You collaborate with team members and project management
with big data technologies (e.g., Spark, Hadoop) Background in time-series analysis and forecasting Experience with data governance and security best practices Real-time data streaming is a plus (Kafka, Beam, Flink) Experience with Kubernetes is a plus Energy/maritime domain knowledge is a plus What We Offer Competitive salary commensurate with experience and comprehensive benefits package (medical, dental, vision) Significant
Go, Julia etc.) •Experience with Amazon Web Services (S3, EKS, ECR, EMR, etc.) •Experience with containers and orchestration (e.g. Docker, Kubernetes) •Experience with Big Data processing technologies (Spark, Hadoop, Flink etc) •Experience with interactive notebooks (e.g. JupyterHub, Databricks) •Experience with GitOps-style automation •Experience with *nix (e.g., Linux, BSD, etc.) tooling and scripting •Participated in projects that are
leading data and ML platform infrastructure, balancing maintenance with exciting greenfield projects. develop and maintain our real-time model serving infrastructure, utilising technologies such as Kafka, Python, Docker, Apache Flink, Airflow, and Databricks. Actively assist in model development and debugging using tools like PyTorch, Scikit-learn, MLFlow, and Pandas, working with models from gradient boosting classifiers to custom GPT
the ground up. Familiarity with AWS services like S3, EMR, and technologies like Terraform and Docker. Know the ins and outs of current big data frameworks like Spark or Flink, but this is not an absolute requirement - you're a quick learner! This role is open to individuals based in or willing to relocate to London.
delivering under tight deadlines without compromising quality. Your Qualifications 12+ years of software engineering experience, ideally in platform, infrastructure, or data-centric product development. Expertise in Apache Kafka, Apache Flink, and/or Apache Pulsar. Deep understanding of event-driven architectures, data lakes, and streaming pipelines. Strong experience integrating AI/ML models into production systems, including prompt engineering