Standardization of the HIPC Data Templates: The Story So Far
Ahmad C. Bukhari, Ph.D., Kei-Hoi Cheung, Ph.D. and Steven H. Kleinstein, Ph.D.
Yale University, School of Medicine
User Group
(HIPC)
● An important resource for raw data and protocols from clinical trials, mechanistic studies and novel methods for cellular and molecular measurements
● Provides templates and standard operating procedures to facilitate data representation and transfer.
● Provides a variety of tools for data access and manipulation
ImmPort SQL dump for local hosting
Human Immunology Project Consortium (HIPC)
● Well-characterized human cohorts are studied using a variety of modern analytic tools including multiplex transcriptional, cytokine, and proteomic assays.
● HIPC submitted data is an important subset of the ImmPort database
● Submitted HIPC data is not standardized.
● Inconsistent naming and data reporting
Our aim is to make HIPC data FAIR
● Findability
○ Finding a large variety of related datasets is an important step to knowledge discovery
● Accessibility
○ A growing number of datasets are being submitted to public repositories such as ImmPort. These datasets can be accessed through different methods including web-based search, bulk download and API access
● Interoperability
○ Data mining/analysis often requires multiple datasets to be integrated within a single repository or across multiple repositories
● Reusability
○ Entering enough metadata as part of the data submission process facilitates data reuse
❖ FAIR is a set of Digital Object Compliance principles, defined under the NIH Commons initiative, that describe the properties of digital objects
Current practices towards data FAIRness
● Minimum information standards (checklists) specify the minimum amount of information (metadata) needed for reporting results in a reproducible and reusable fashion. For example,
○ MIAME: Minimum Information About a Microarray Experiment
○ MIAPE: Minimum Information About a Proteomics Experiment
● Scientific communities have developed templates incorporating detailed checklists of the metadata needed to describe particular types of experimental data sources.
● Standard identifiers/terminologies/ontologies have been created for different domains
We propose an ontological mapping for the ImmPort data submission templates.
● Ontology term mapping makes it possible to achieve semantic normalization across different repositories.
● Ontologically annotated datasets allow context-aware queries and data integration
● Mapping to controlled vocabularies, relationships and rules facilitates run-time data validation.
● These help achieve data FAIRness.
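As an illustration of the run-time validation point above, here is a minimal sketch of checking a submitted value against a controlled vocabulary. It assumes a hypothetical in-memory value set; the permitted labels, the `ALLOWED_VALUES` table and the `validate_row` function are illustrative names and are not part of ImmPort or CEDAR.

```python
# Hypothetical controlled vocabulary for one template column.
# The permitted labels are illustrative, not taken from the ImmPort templates.
ALLOWED_VALUES = {
    "biosample type": {"whole blood", "serum", "pbmc"},
}


def validate_row(row):
    """Return a list of error messages for values outside the vocabulary."""
    errors = []
    for column, value in row.items():
        allowed = ALLOWED_VALUES.get(column)
        if allowed is None:
            continue  # no controlled vocabulary for this column; skip it
        if value.strip().lower() not in allowed:
            errors.append(f"{column}: '{value}' is not a permitted term")
    return errors
```

A row such as `{"biosample type": "Serum"}` passes after case normalization, while an unlisted value is reported at submission time instead of being stored as free text.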
Ontology mapping of templates
[Workflow figure with numbered steps 1 to 6. Labeled components: Ontology Recommender; a collection of ontologies (OBI, OBO, Cell, PR); Concept mapper; Terms Suggestion; Suggested Alteration; Expert Verification; Finalizing Mapping; Retrieve annotation (concept URI, definitions, etc.); Incorporate into CEDAR and ImmPort]
Our mapping strategy
• For certain value sets such as cell populations and cytokines, the Concept Mapper (CM) maps the values to domain-specific ontologies such as Cell Ontology (CL) and Protein Ontology (PR)
• For other elements, CM maps them to the terms in Ontology for Biomedical Investigations (OBI)
• For elements that do not have matches in OBI, we map these elements to terms in top-ranked ontologies by OBO Foundry
• For elements that do not have any ontology term matches, we perform a manual search in BioPortal and other available repositories for these missing terms.
• We work closely with individual ontology groups (e.g., CL, OBI) to fill the gaps
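The tiered strategy above can be sketched as a simple fallback chain. This is only an illustration under stated assumptions: the dictionaries stand in for real ontology lookups, every term ID is a placeholder, and `map_element` is a hypothetical name, not the Concept Mapper's actual interface. The sketch also assumes that a value with no match in its domain ontology falls through to the later tiers, which the slides do not state.

```python
# Tiny stand-in lexicons; a real run would query ontology services instead.
# All term IDs below are placeholders for illustration only.
DOMAIN_ONTOLOGIES = {
    "cell population": {"CL": {"t cell": "CL:EXAMPLE_1"}},
    "cytokine": {"PR": {"interleukin-6": "PR:EXAMPLE_1"}},
}
OBI_TERMS = {"assay": "OBI:EXAMPLE_1"}
# Stands in for terms from other top-ranked ontologies.
FALLBACK_TERMS = {"age": "FALLBACK:EXAMPLE_1"}


def map_element(label, value_set_kind=None):
    """Return (source, term_id) for a template element or value."""
    key = label.strip().lower()
    # Tier 1: values from known value sets go to domain-specific ontologies.
    if value_set_kind in DOMAIN_ONTOLOGIES:
        for ontology, terms in DOMAIN_ONTOLOGIES[value_set_kind].items():
            if key in terms:
                return (ontology, terms[key])
    # Tier 2: other elements are looked up in OBI.
    if key in OBI_TERMS:
        return ("OBI", OBI_TERMS[key])
    # Tier 3: fall back to other top-ranked ontologies.
    if key in FALLBACK_TERMS:
        return ("fallback", FALLBACK_TERMS[key])
    # Tier 4: nothing matched; flag the element for manual search.
    return ("manual", None)
```

Elements that reach the last tier are the candidates for manual search and, where no term exists, for submission to the relevant ontology group.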
Template elements mapped to ontologies
• Assay types (e.g., gene expression, flow cytometry, ELISA, HAI, Luminex)
• Template types (e.g., human subject, biosample)
• Column names (e.g., biosample type, measurement technique)
• Value sets (e.g., set of cell populations, set of measurement techniques)
Mapping Statistics
Assay Type                  # Templates  # Sub-Templates  # Concept  # Value Set
Microarray gene expression  6            10               113        209
Flow cytometry              6            -                67         262
ELISA                       2            -                39         602
HAI                         2            -                37         117
Luminex                     7            -                102        1032
General                     6            -                115        190
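For readers who want the mapping statistics as data, the following minimal sketch transcribes the per-assay counts reported above and totals each column. The tuple layout and the `column_totals` helper are illustrative choices, and a "-" in the sub-template column is treated as zero, which is an assumption.

```python
# Rows transcribed from the mapping statistics:
# (assay type, templates, sub-templates, concepts, value sets).
# A "-" for sub-templates in the source is recorded here as 0 (an assumption).
MAPPING_STATS = [
    ("Microarray gene expression", 6, 10, 113, 209),
    ("Flow cytometry", 6, 0, 67, 262),
    ("ELISA", 2, 0, 39, 602),
    ("HAI", 2, 0, 37, 117),
    ("Luminex", 7, 0, 102, 1032),
    ("General", 6, 0, 115, 190),
]


def column_totals(rows):
    """Sum each numeric column across all assay types."""
    names = ("templates", "sub_templates", "concepts", "value_sets")
    return {
        name: sum(row[index + 1] for row in rows)
        for index, name in enumerate(names)
    }


totals = column_totals(MAPPING_STATS)
```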
Newly added terms in OBI
• OBI_0001121: A device that moves charged particles through a ....
• OBI_0002115: A cytometry assay in which the presence of molecules
CEDAR helps to generate ontology-linked metadata
Use case: CEDAR immunology data submission templates
CEDAR has employed our suggested mapping
Map to cell term in Cell Ontology
Manual mapping to “assay” in OBI
Automatic mapping with NCIT
https://cedar.metadatacenter.net
Automatic mapping with OBI
Future plan
• Refine mapping of new assay types with updated algorithm.
• Mapping of clinical metadata with ontology terms.
• Incorporate our ontology-term mapping approach into CEDAR and ImmPort
• Submit missing terms to relevant ontologies (e.g., OBI)