Data Engineer

The overall technical lead and architect. Designs the metadata schema, builds the simulation onboarding pipeline, deploys the metadata embedding pipeline and OpenSearch k-NN vector store, and authors the data export format specification for AI/ML use cases. This role is the deepest technical seat on the engagement.

Key responsibilities on this engagement

  • Run the Sprint 1 architecture review of the existing UAT codebase (S3 + Glue + S3 Tables + OpenSearch + Athena) and deliver written gap findings.
  • Design the metadata schema, taxonomy, and field catalogue (Light, Brain, Power).
  • Tune data orchestration: Glue jobs, Athena queries, S3 Tables config, and scheduling.
  • Lead the deep-dive technical sessions with analysts on visualization requirements.
  • Build and validate the simulation data onboarding pipeline against real data — including the 30 GB-per-run acoustic spectra dataset.
  • Configure and validate the OpenSearch k-NN vector store and the Bedrock embedding pipeline.
  • Author the AI/ML data export format specification and the AI onboarding pattern document.
  • Co-design the API middleware blueprint with the Cloud Infrastructure Architect.

Must-have

  • Principal-level hands-on data engineering on AWS — 7+ years
  • Deep production experience with S3, S3 Tables, Glue, Athena, and OpenSearch (including k-NN / vector search)
  • Built and shipped vector embedding workloads
  • Strong metadata modelling and data taxonomy design experience for scientific or engineering domains
  • Comfort working with Parquet, JSON-LD, and large binary scientific data formats (mesh, time-series, spectra)
  • Python proficiency; PySpark / Glue job tuning experience

Nice-to-have / differentiators

  • Prior simulation / CAE / HPC data lake experience (Ansys, Siemens NX, BETA CAE, OpenFOAM, etc.)
  • Familiarity with surrogate model training data pipelines
  • Experience with SageMaker Unified Studio or comparable governed data-mesh tooling (in case integration is required)
  • Multi-cloud data engineering (AWS and GCP) experience
  • Published or contributed to AWS data architecture patterns or blueprints

Job Details

Company
Zensar Technologies
Location
United Kingdom
Posted