Research Project on Metadata Extraction, Exploration and Pooling: Challenges and Achievements Ronald Steinhau (Entimo AG - Berlin/Germany)

Post on 02-Jan-2016

Content

- Project Goals
- Pre-Requisites
- Work Packages
- Advanced Workflows
- Conclusions and Outlook

© Entimo AG | Stralauer Platz 33-34 | 10243 Berlin | www.entimo.com 2

Project Goals (1)

Main Goals
- Support different metadata systems
  - SDTM, ADaM, BRIDG, custom
- Explore items dependent on contexts
- Accelerate the mapping process
- Re-use information from comparable studies
- Provide support in specification creation and issue resolution (full automation is illusory)

Project Goals (2)

Additional Goals
- Immediate usage and classification of metadata
- Advanced metadata management based on ISO 11179 for Metadata Repositories
- Cross-linking between MD-Systems incl. terminology/codelists
- Smart search and recommendation of attributes and mappings
- Preserve history of user decisions after recommendations

Work Packages

1. Development Preparation
2. Specification / Modeling
3. Development
4. Test & Optimizations

Development Preparation

Development Environment
- Eclipse Helios / Scala IDE

Advanced Libraries
- Statistical analysis
- Machine (“adaptive”) learning

Infrastructure - Clinical Repository
- Based on relational database
- Fully generic tables (free schema)
- Fast, minimal redundancy
- Audit trail, versioning, SAS compliance

- Missing Values
- Codelists
- Formats
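
The slide does not say how the "fully generic tables (free schema)" are laid out. As a rough illustration only, the sketch below assumes an entity-attribute-value layout in which one relational table stores every cell of every dataset. The table name `cell`, its columns, and the helper functions are hypothetical and are not taken from the Entimo repository.

```python
import sqlite3


def create_generic_store(conn):
    # Assumption: one table holds all values of all datasets, so a new
    # dataset or attribute needs no schema change.
    conn.execute(
        "CREATE TABLE cell ("
        "dataset TEXT, row_id INTEGER, attribute TEXT, value TEXT, "
        "PRIMARY KEY (dataset, row_id, attribute))"
    )


def insert_row(conn, dataset, row_id, record):
    # Store each attribute of the record as its own (attribute, value) cell.
    conn.executemany(
        "INSERT INTO cell VALUES (?, ?, ?, ?)",
        [
            (dataset, row_id, name, None if value is None else str(value))
            for name, value in record.items()
        ],
    )


def read_row(conn, dataset, row_id):
    # Reassemble one logical row from its cells.
    cursor = conn.execute(
        "SELECT attribute, value FROM cell WHERE dataset = ? AND row_id = ?",
        (dataset, row_id),
    )
    return dict(cursor.fetchall())


conn = sqlite3.connect(":memory:")
create_generic_store(conn)
# Two datasets with different attributes share the same physical table.
insert_row(conn, "DM", 1, {"USUBJID": "S-001", "AGE": 34, "SEX": "F"})
insert_row(conn, "VS", 1, {"USUBJID": "S-001", "VSTESTCD": "SYSBP", "VSORRES": 120})
print(read_row(conn, "DM", 1))
```

In this sketch every value is stored as text, so the data type, missing-value handling, codelists and formats would have to live in the metadata rather than in the table definition.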

Specification / Modeling

- Metadata management & rules
- Data analysis
- Smart recommendations & history usage
- Finding and applying mapping specs
- Mapping / meta generator

Specification / Modeling (1)
Example Workflow: Import Clinical Data

Analyze Data
- Analyze data and retrieve statistical profiles
- Extract all available metadata/data attributes:
  - Name (synonym support)
  - Label / Comment (Google-like searches)
  - Profiles (statistics-based searches)
  - Codelist analysis (context-sensitive)
  - …
- Save all data in the clinical data repository
- Save meta-information in the metadata repository
- Keep links between data and metadata
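
To make the idea of a "statistical profile" concrete, here is a minimal sketch of what profiling a single column could look like. It is an illustration under assumptions, not the project's implementation: the function `profile_column`, the chosen statistics, and the sample values are all hypothetical.

```python
from collections import Counter


def profile_column(name, label, values):
    # Treat None and the empty string as missing (an assumption of this sketch).
    present = [v for v in values if v not in (None, "")]
    numeric = []
    for v in present:
        try:
            numeric.append(float(v))
        except (TypeError, ValueError):
            pass
    counts = Counter(str(v) for v in present)
    profile = {
        "name": name,
        "label": label,
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "max_length": max((len(str(v)) for v in present), default=0),
        # Numeric only if every present value parses as a number.
        "is_numeric": bool(present) and len(numeric) == len(present),
        "top_values": counts.most_common(3),
    }
    if profile["is_numeric"]:
        profile["min"] = min(numeric)
        profile["max"] = max(numeric)
        profile["mean"] = sum(numeric) / len(numeric)
    return profile


sex = profile_column("SEX", "Sex of subject", ["F", "M", "F", None, "M", "F"])
age = profile_column("AGE", "Age in years", ["34", "51", "", "47"])
print(sex)
print(age)
```

A profile like this could be stored next to the name and label of an attribute, so that later searches can compare columns by their statistics as well as by their names.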

Specification / Modeling (2)
Example Workflow: Import Clinical Data

Provide recommendations:
- Data types and their type length
- Primary keys
- Code lists
- References to existing metadata (SDTM, BRIDG, custom)

Find attributes used in mappings
- SDTM/custom domain memberships
- BRIDG references
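
The recommendations for data types, type lengths, primary keys and code lists can be pictured with simple heuristics. The sketch below is one possible approach under stated assumptions: all function names, the thresholds in `looks_like_codelist`, and the sample rows are hypothetical, and the slides do not describe the rules the project actually applies.

```python
from itertools import combinations


def _present(values):
    # None and the empty string count as missing in this sketch.
    return [str(v) for v in values if v not in (None, "")]


def recommend_type(values):
    # Suggest the narrowest type that fits every value, plus the longest length.
    present = _present(values)
    if not present:
        return ("char", 0)
    length = max(len(v) for v in present)
    for type_name, cast in (("integer", int), ("float", float)):
        try:
            for v in present:
                cast(v)
            return (type_name, length)
        except ValueError:
            continue
    return ("char", length)


def primary_key_candidates(rows, max_width=2):
    # Return the smallest column combinations that identify every row uniquely.
    columns = list(rows[0])
    for width in range(1, max_width + 1):
        found = []
        for combo in combinations(columns, width):
            keys = [tuple(row[c] for c in combo) for row in rows]
            complete = all(v not in (None, "") for key in keys for v in key)
            if complete and len(set(keys)) == len(rows):
                found.append(combo)
        if found:
            return found
    return []


def looks_like_codelist(values, max_distinct=10, max_ratio=0.5):
    # Few distinct values that repeat often hint at a coded attribute.
    present = _present(values)
    distinct = set(present)
    return (
        bool(present)
        and len(distinct) <= max_distinct
        and len(distinct) / len(present) <= max_ratio
    )


rows = [
    {"USUBJID": "S-001", "VISITNUM": "1", "VSORRES": "120"},
    {"USUBJID": "S-001", "VISITNUM": "2", "VSORRES": "120"},
    {"USUBJID": "S-002", "VISITNUM": "1", "VSORRES": "120"},
    {"USUBJID": "S-002", "VISITNUM": "2", "VSORRES": "135"},
]
print(recommend_type([row["VSORRES"] for row in rows]))
print(primary_key_candidates(rows))
print(looks_like_codelist(["F", "M", "F", "M", "F", "F"]))
```

Heuristics of this kind only produce suggestions; on small samples a measurement column can look unique by accident, which fits the slides' point that the user confirms or corrects each recommendation.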

Example: Schema Recommendation

Enhanced Data Import

[Workflow diagram. Elements: Source Selection; File or external DB; Schema Analysis; Types, Prim. Keys, Glob. Attr.; Schema-Completion & Verification; Questionnaires / Recommendations (applying rules); Similarity Analysis; MDR / Pool; Metadata Links; Data Import; Clin. Repository and/or SAS-Datasets; Statistics and Profiles; Optional assignment of metadata. Thick lines indicate enhanced workflow.]

Mapping / Meta-Generator

Finding mapping specifications
- Find and recommend existing mappings
- Support users with the completion (modification) of copied mappings
- Tag mappings with metadata for smarter recognition

Applying mappings
- Generate mapping programs
- Execute mapping programs with data
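
One simple way to "find and recommend existing mappings" is to rank stored mappings by how much their source attributes overlap with the new source. The sketch below uses Jaccard similarity on attribute names as a stand-in; the slides do not state which similarity measure the project uses, and the function names, the threshold, and the sample mappings are hypothetical.

```python
def jaccard(a, b):
    # Share of common elements among all elements of both sets.
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_mappings(source_attributes, stored_mappings, threshold=0.5):
    # Compare attribute names case-insensitively and keep sufficiently similar mappings.
    wanted = {name.upper() for name in source_attributes}
    scored = []
    for mapping in stored_mappings:
        known = {name.upper() for name in mapping["source_attributes"]}
        score = jaccard(wanted, known)
        if score >= threshold:
            scored.append((score, mapping["name"]))
    # Best match first; ties are ordered by name for a stable result.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, round(score, 2)) for score, name in scored]


stored = [
    {"name": "study_A_to_DM", "source_attributes": ["SUBJ", "SEX", "BIRTHDT", "RACE"]},
    {"name": "study_B_to_VS", "source_attributes": ["SUBJ", "VISIT", "SYSBP", "DIABP"]},
    {"name": "study_C_to_DM", "source_attributes": ["subj", "sex", "birthdt"]},
]
print(recommend_mappings(["SUBJ", "SEX", "BIRTHDT", "COUNTRY"], stored))
```

Name overlap is only the crudest signal. Synonyms, labels, statistical profiles and tags attached to a mapping, as listed on the slides, could each contribute a further score in the same ranking.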

Enhanced Data Mapping

[Workflow diagram. Elements: Select Mapping Source and Target; Clin. Repository and/or SAS-Datasets; Find & Recommend similar Mappings; MDR (Pool); Similarity Analysis; Clone Mapping-Task(s); Create To-Do-List; Mapping Completion and Execution; Enhance Mapping with additional Metadata; Pooling; Derive Metadata From Dataset; Direct Metadata Selection; Metadata Links. Thick lines indicate enhanced workflow.]
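
The "Clone Mapping-Task(s)" and "Create To-Do-List" steps can be illustrated as follows: an existing mapping is copied for a new source, and every rule that no longer fits becomes an open item for the user. This is a hedged sketch; the mapping structure (`rules` with `source` and `target`), the function `clone_with_todo`, and the sample data are hypothetical.

```python
import copy


def clone_with_todo(mapping, new_source_attributes):
    # Work on a deep copy so the stored mapping stays untouched.
    clone = copy.deepcopy(mapping)
    available = set(new_source_attributes)
    used = set()
    todo = []
    for rule in clone["rules"]:
        if rule["source"] in available:
            used.add(rule["source"])
        else:
            # The old source attribute does not exist in the new data.
            todo.append(
                "resolve source for target " + rule["target"]
                + " (was " + rule["source"] + ")"
            )
            rule["source"] = None
    # New attributes that no cloned rule consumes also need a decision.
    for name in sorted(available - used):
        todo.append("decide how to map new attribute " + name)
    return clone, todo


mapping = {
    "name": "study_A_to_DM",
    "rules": [
        {"source": "SUBJ", "target": "USUBJID"},
        {"source": "SEX", "target": "SEX"},
        {"source": "BIRTHDT", "target": "BRTHDTC"},
    ],
}
clone, todo = clone_with_todo(mapping, ["SUBJ", "SEX", "DOB", "COUNTRY"])
for item in todo:
    print(item)
```

Here the rules for USUBJID and SEX carry over unchanged, while the renamed birth date and the extra attributes end up on the to-do list for manual completion.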

Conclusions

Providing “smart” technical infrastructure is challenging, but necessary for complex systems

Once in place, the positive effects grow with usage and stored content

Interconnected metadata systems and data provide better transparency and reusability

Contextual knowledge (e.g. drug, study) leads to improved results

Outlook

- Define more metadata inter-connections
- Collect time-saving statistics with larger studies
- Deeper integration into entimICE

Embrace the new principle “analyse, recommend, re-use”!

End

Thank you for your attention!

Questions?