Graph Engineers
The Graph Engineers will focus on scientific data and knowledge, specifically metadata and harmonization. The role is responsible for maximizing the value of our data assets over their lifetime and bringing purpose to data. Engineers act as translators, turning highly technical information from domain experts into an appropriate data model, complete with the supporting ontologies and vocabularies, that can be used to structure and index the data effectively. In practice, this means working with product managers and R&D subject matter experts to encode the language of science (data models, ontologies, standards, etc.) into data products, acting as the voice of the Knowledgebase and of the interoperability and value of the asset.
- Metadata harmonization/curation and large-scale dataset ingestion (structured, auditable transformations)
- Ontology alignment (e.g., via OLS) and entity normalisation; schema-driven automation (e.g., JSON)
- Knowledge graph/semantic technologies where applicable (RDF, SPARQL, Neo4j/GraphDB)
- API/ETL engineering and data pipeline delivery (e.g., FastAPI, PostgreSQL) with cloud execution as required
- Languages & Query: SPARQL, Scala, Python, and SQL.
- Semantic Technologies: RDF/triple stores, OWL, SHACL, LinkML, and ontologies such as RAO.
- Platforms & Infrastructure: Google Cloud Platform (GCP), BigQuery, Google Cloud Storage (GCS), and Infrastructure as Code (IaC).
- Data Engineering: ETL processes, data harmonization, URI generation, and graph embedding machine learning pipelines.
- Tools: GitHub/GitLab, Jira/Confluence, TopQuadrant (EDG), Apache Jena, Protégé, and Semaphore.
Working hours are aligned to EST.