![Page 1: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/1.jpg)
Data Management Plans: A good idea, but not sufficient
Andreas Rauber
Department of Software Technology and Interactive Systems
Vienna University of Technology &
Secure Business Austria
http://www.ifs.tuwien.ac.at/~andi
![Page 2: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/2.jpg)
Outline
Why are Data Management Plans good but insufficient?
From Data to Process Management Plans
How to capture process & context?
Summary
![Page 3: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/3.jpg)
Sustainable (e-)Science
Data is key enabler in science
- Basis for evaluation and verification
- Basis for re-use
- Basis for meta-studies
Safeguarding investment made in data
Need to preserve and curate the data
Preservation: keeping data usable over time, mostly by fighting technical & semantic obsolescence
How to avoid data being lost after projects end?
![Page 4: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/4.jpg)
Sustainable (e-)Science
Data Management Plans as integral part of research proposals
Need recognized by researchers, funding bodies,…
Focus on
- Data
- Descriptions
- Declarations of activities to ensure long-term availability of data
Data Management Plans are good, but not sufficient!
https://data.uni-bielefeld.de/de/data-management-plan
https://dmp.cdlib.org/
https://dmponline.dcc.ac.uk/
![Page 5: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/5.jpg)
Data Management Plans
Short, free-form text, requiring human interpretation
Declarations of intent
Not enforceable, hardly verifiable (burden remains with researchers / institutions, who need to become data management experts)
Focus solely on data, ignoring the process: pre-processing, processing, analysis
Limits
- availability of data & results
- verification of results
- re-use and re-purposing
http://deepblue.lib.umich.edu/bitstream/handle/2027.42/86586/CoE_DMP_template_v1.pdf?sequence=1
http://rci.ucsd.edu/_files/DMP%20Example%20Cosman.pdf
![Page 6: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/6.jpg)
From Data to Processes
Excursion: Scientific Processes
![Page 7: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/7.jpg)
From Data to Processes
Rhythm Pattern Feature Set
- extracts numeric descriptors from audio
- basically 2 Fourier Transforms
- some psycho-acoustic modelling
- some filters (Gaussian, gradient) to make features more robust
Used for
- music genre classification
- clustering of music by similarity
- retrieval
Implemented first in Matlab, then in Java
- both publicly available on website
- same same but different...
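To make "basically 2 Fourier Transforms" concrete, here is a minimal sketch of the two-stage structure only: a magnitude spectrogram per frame (first transform), then a transform over time of each frequency band's envelope (second transform), which exposes the amplitude-modulation ("rhythm") frequencies. This is not the published Rhythm Patterns implementation: the psycho-acoustic modelling (e.g. a Bark-like scale, loudness) and the Gaussian/gradient filters are left out, the equal-width bands are a crude stand-in, and the function names `dft_magnitudes` and `rhythm_sketch` are hypothetical. A naive DFT keeps it dependency-free.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Naive DFT; returns magnitudes of the non-negative frequency bins."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def rhythm_sketch(signal, frame_len=128, n_bands=4):
    """Two-stage transform: spectrogram per frame, then a DFT over time per band.

    Returns one modulation-amplitude spectrum per band.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    n_frames = len(signal) // frame_len
    # Stage 1: magnitude spectrogram of non-overlapping, Hann-windowed frames.
    spectrogram = []
    for f in range(n_frames):
        frame = signal[f * frame_len:(f + 1) * frame_len]
        spectrogram.append(dft_magnitudes([x * w for x, w in zip(frame, window)]))
    # Group frequency bins into equally wide bands (no psycho-acoustic scale here).
    n_bins = len(spectrogram[0])
    edges = [round(b * n_bins / n_bands) for b in range(n_bands + 1)]
    envelopes = [
        [sum(row[edges[b]:edges[b + 1]]) for row in spectrogram] for b in range(n_bands)
    ]
    # Stage 2: DFT over time of each band envelope = amplitude-modulation spectrum.
    return [dft_magnitudes(env) for env in envelopes]


# Test signal in the spirit of set1_freq440Hz_Am...: a 440 Hz tone, amplitude-modulated.
sr, n = 4000, 8192
am = 8 * sr / n  # 3.90625 Hz, i.e. exactly 8 modulation cycles in the signal
sig = [(1 + 0.5 * math.sin(2 * math.pi * am * t / sr)) * math.sin(2 * math.pi * 440 * t / sr)
       for t in range(n)]
mod = rhythm_sketch(sig)
band0 = mod[0]  # band holding 440 Hz (bins 0..15 at 31.25 Hz per bin)
peak = max(range(1, len(band0)), key=lambda k: band0[k])  # skip the DC bin
print(peak)  # 8: the modulation bin matching the 8 AM cycles
```

Every choice here (frame length, windowing, band layout, precision) changes the numbers slightly, which is exactly why two independent implementations can end up "same same but different".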
![Page 8: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/8.jpg)
From Data to Processes
Excursion: Scientific Processes
[Figure: Java vs. Matlab outputs for the test signals set1_freq440Hz_Am12.0Hz, set1_freq440Hz_Am05.5Hz, set1_freq440Hz_Am11.0Hz]
![Page 9: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/9.jpg)
From Data to Processes
Excursion: Scientific Processes
Bug?
Psychoacoustic transformation tables?
Forgetting a transformation?
Different implementation of filters?
Limited accuracy of calculation?
Difference in FFT implementation?
...?
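One of the candidate causes, limited accuracy of calculation, can be demonstrated in isolation. The sketch below accumulates the same values once in double precision and once with a 32-bit accumulator (emulated by rounding through `struct`). It does not claim that this is what separates the Matlab and Java implementations; it only shows that a precision choice alone is enough to make two otherwise identical computations disagree. The helper names are hypothetical.

```python
import struct


def to_float32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(values, single_precision):
    total = 0.0
    for v in values:
        if single_precision:
            # Round operand and running sum to float32 at every step,
            # mimicking an accumulator declared as a 32-bit float.
            total = to_float32(total + to_float32(v))
        else:
            total += v
    return total


values = [0.1] * 10_000
d64 = accumulate(values, single_precision=False)
d32 = accumulate(values, single_precision=True)
print(d64, d32, abs(d64 - d32))
```

The double-precision sum stays within about 1e-10 of 1000, while the single-precision one drifts by roughly 0.1. Differences of this kind propagate through later steps (filters, normalisation) and are invisible unless the process, not just the data, is documented.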
![Page 10: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/10.jpg)
From Data to Processes
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0038234
![Page 11: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/11.jpg)
From Data to Processes
To sum up:
Data
- is the fuel for scientific processes
- is the result of scientific processes
Curation of data thus needs to consider these processes
Data Management Plans
- are data centric
- put too little focus on the processes associated with data
- are written by humans for humans
![Page 12: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/12.jpg)
Outline
Why are Data Management Plans insufficient?
From Data to Process Management Plans
How to capture process & context?
Summary
![Page 13: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/13.jpg)
Process Management Plans
Process Management Plans (PMPs)
Go beyond data to cover research process:
- ideas, steps, tools, documentation, results, …
- data is only one (important) element, and is often itself the result of a research (pre-)process
Ensure re-executability, re-usability
Must be machine-actionable & verifiable
Basis for preservation and re-use of research
Similar to “research objects”, “executable papers”, …
![Page 14: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/14.jpg)
Process Management Plans
Need to establish
Models for representing such process management plans (PMPs)
Must be machine-readable and machine-actionable
Identify “minimum set” of information
Devise means to automate (most of) the activity in creating and maintaining those PMPs
Establish them to replace (enhance / subsume / …) Data Management Plans
![Page 15: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/15.jpg)
Process Management Plans
Structure of PMPs (following the concept of DMPs):
1. Overview and context
2. Description of processes and their implementation: process description | process implementation | data used and produced by process
3. Preservation: preservation history | long-term storage and funding
4. Sharing and reuse: sharing | reuse | verification | legal aspects
5. Monitoring and external dependencies
6. Adherence and review
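Since PMPs must be machine-readable and machine-actionable, one way to picture this is a typed record whose fields follow the section structure above, serialisable to JSON and checkable against a "minimum set". The sketch below is an assumption throughout: the class, field names and the choice of which fields form the minimum set are hypothetical, since the slides only ask that such a set be identified.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProcessManagementPlan:
    """Hypothetical machine-readable PMP; fields follow the six sections."""
    overview_and_context: str = ""
    process_description: str = ""
    process_implementation: list = field(default_factory=list)  # tools, versions, ...
    data_used_and_produced: list = field(default_factory=list)
    preservation_history: list = field(default_factory=list)
    long_term_storage_and_funding: str = ""
    sharing: str = ""
    reuse: str = ""
    verification: str = ""
    legal_aspects: str = ""
    monitoring_and_external_dependencies: list = field(default_factory=list)
    adherence_and_review: str = ""


# Assumed "minimum set" of information; not defined on the slides.
MINIMUM_SET = (
    "overview_and_context",
    "process_description",
    "process_implementation",
    "data_used_and_produced",
)


def missing_fields(plan):
    """Machine check: which minimum-set fields are still empty?"""
    return [name for name in MINIMUM_SET if not getattr(plan, name)]


def to_json(plan):
    return json.dumps(asdict(plan), indent=2, sort_keys=True)


plan = ProcessManagementPlan(
    overview_and_context="Music genre classification experiment",
    process_description="Extract features from audio, train and evaluate a classifier",
)
print(missing_fields(plan))  # ['process_implementation', 'data_used_and_produced']
```

Unlike a free-text DMP, a record like this can be validated automatically at submission time and again at review.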
![Page 16: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/16.jpg)
Outline
Why are Data Management Plans insufficient?
From Data to Process Management Plans
How to capture process & context?
Summary
![Page 17: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/17.jpg)
Process Capture
Need to establish what forms part of a process:
- analyzing process documentation
- establishing context of process, relationships between elements
- monitoring of process activities
Capture and describe this in a context model
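A context model can be thought of as typed elements plus relationships between them. The sketch below stores these as (subject, relation, object) triples and answers one question a preservation planner needs: everything a process step transitively depends on. It is a simplified stand-in, not the ontology-based model described on the next slide; the class name and all element names are hypothetical.

```python
from collections import defaultdict


class ContextModel:
    """Hypothetical context model: typed elements plus relationship triples."""

    def __init__(self):
        self.types = {}                 # element name -> element type
        self.edges = defaultdict(list)  # subject -> [(relation, object)]

    def add(self, name, kind):
        self.types[name] = kind

    def relate(self, subject, relation, obj):
        for name in (subject, obj):
            if name not in self.types:
                raise KeyError(f"unknown element: {name}")
        self.edges[subject].append((relation, obj))

    def dependencies(self, start):
        """All elements reachable from start via any relation (depth-first)."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for _, target in self.edges.get(node, []):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


model = ContextModel()
model.add("feature extraction", "process step")
model.add("feature extractor (Java)", "software")
model.add("Java runtime", "software")
model.add("audio files (MP3)", "data")
model.add("MP3 decoder", "software")
model.relate("feature extraction", "uses", "feature extractor (Java)")
model.relate("feature extraction", "reads", "audio files (MP3)")
model.relate("feature extractor (Java)", "runs on", "Java runtime")
model.relate("audio files (MP3)", "requires", "MP3 decoder")
print(sorted(model.dependencies("feature extraction")))
```

The MP3 decoder shows up even though the step never names it directly, which is the kind of implicit dependency that free-text plans tend to miss.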
![Page 18: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/18.jpg)
Architectural Concepts
Based on enterprise architecture frameworks (Zachman), taxonomies (e.g. PREMIS), …
DIO: Domain-Independent Ontology
DSO: Domain-Specific Ontologies (legal, sensor, multimedia codecs, …)
[Figure: DIO (ArchiMate) linked to DSO-1 and DSO-2 through the DIO-DSO1 and DIO-DSO2 transformation maps]
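The transformation maps connect each domain-specific ontology to the domain-independent layer. A minimal way to picture one is a lookup from DSO classes to DIO element types, so that facts captured with domain vocabulary can be viewed uniformly. The ArchiMate element types used (`SystemSoftware`, `Artifact`, `DataObject`, `Contract`) are real ArchiMate concepts, but the DSO class names, instances and the mappings themselves are hypothetical examples, not the maps used in the project.

```python
# Hypothetical DIO-DSO transformation maps: DSO class -> ArchiMate element type.
MULTIMEDIA_MAP = {
    "AudioCodec": "SystemSoftware",
    "EncodedAudioFile": "Artifact",
    "AudioFeatureVector": "DataObject",
}
LEGAL_MAP = {
    "SoftwareLicence": "Contract",
}


def to_dio(instances, transformation_map):
    """Translate (instance, DSO class) pairs into (instance, DIO element type)."""
    lifted = []
    for instance, dso_class in instances:
        try:
            lifted.append((instance, transformation_map[dso_class]))
        except KeyError:
            raise ValueError(f"no DIO mapping for DSO class {dso_class!r}") from None
    return lifted


dio_view = (
    to_dio([("mp3 decoder", "AudioCodec"), ("song01.mp3", "EncodedAudioFile")], MULTIMEDIA_MAP)
    + to_dio([("codec licence", "SoftwareLicence")], LEGAL_MAP)
)
print(dio_view)
```

Keeping one map per DSO means a new domain can be plugged in without touching the domain-independent layer; unmapped classes fail loudly instead of silently dropping context.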
![Page 19: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/19.jpg)
Process Capture
Example: Music Classification Process
Input: music (e.g. MP3 format)
Input: training data, i.e. music with genre labels
Output: classification of music, e.g. into genres
Intermediate steps
- extract numeric description (features) from music
- combine features with ground truth into specific file format, …
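The process above can be described declaratively as steps with inputs and outputs, which already allows a machine check: can every step obtain its inputs, and in what order must the steps run? In the sketch below, the step names, the `train_classifier`/`classify` split of the elided "…" part, and the function `execution_order` are hypothetical illustrations.

```python
# Hypothetical step declarations: name -> (inputs, outputs).
STEPS = {
    "extract_features": (["music (MP3)"], ["feature vectors"]),
    "merge_ground_truth": (["feature vectors", "genre labels"], ["training file"]),
    "train_classifier": (["training file"], ["classifier model"]),
    "classify": (["classifier model", "feature vectors"], ["genre assignments"]),
}
EXTERNAL_INPUTS = {"music (MP3)", "genre labels"}


def execution_order(steps, external_inputs):
    """Order steps so each runs only once all of its inputs are available."""
    available = set(external_inputs)
    pending = dict(steps)
    order = []
    while pending:
        ready = [name for name, (ins, _) in pending.items() if set(ins) <= available]
        if not ready:
            missing = {i for ins, _ in pending.values() for i in ins} - available
            raise ValueError(f"inputs never produced: {sorted(missing)}")
        for name in sorted(ready):
            order.append(name)
            available.update(pending.pop(name)[1])
    return order


print(execution_order(STEPS, EXTERNAL_INPUTS))
# ['extract_features', 'merge_ground_truth', 'train_classifier', 'classify']
```

Dropping "genre labels" from the external inputs makes the check fail and name the missing inputs, which is the kind of gap a prose description would not reveal.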
![Page 20: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/20.jpg)
Process Capture
Taverna
![Page 21: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/21.jpg)
Process Capture
Software setup can be detected automatically on operating systems with package management (e.g. Linux);
this also allows detection of licenses and dependencies
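On Debian-based Linux systems, the package manager records installed packages in `/var/lib/dpkg/status` as blank-line-separated stanzas of `Field: value` lines, and Debian packages ship their license terms in `/usr/share/doc/<package>/copyright`. The sketch below parses status-format text to list installed packages with versions and direct dependencies; it runs on a made-up sample (package names and versions are hypothetical) so it works anywhere, and simplifies dependency alternatives (`a | b`) to their first choice.

```python
def parse_dpkg_status(text):
    """Parse dpkg 'status'-format stanzas into {package: {"version", "depends"}}."""
    packages = {}
    for stanza in text.strip().split("\n\n"):
        fields, key = {}, None
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t") and key:  # continuation of previous field
                fields[key] += "\n" + line.strip()
            elif ":" in line:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        # Keep only packages that are actually installed.
        if "installed" not in fields.get("Status", "").split():
            continue
        depends = [
            # "pkg (>= 1.0)" -> "pkg"; for "a | b" only "a" is kept (simplification).
            dep.strip().split(" ")[0]
            for dep in fields.get("Depends", "").split(",")
            if dep.strip()
        ]
        packages[fields["Package"]] = {"version": fields.get("Version"), "depends": depends}
    return packages


SAMPLE_STATUS = """\
Package: feature-extractor
Status: install ok installed
Version: 1.2-1
Depends: java-runtime (>= 7), libc6
Description: hypothetical package used for illustration
 continuation lines start with a space

Package: old-tool
Status: deinstall ok config-files
Version: 0.9
"""

print(parse_dpkg_status(SAMPLE_STATUS))
```

On a real system one would pass the contents of `/var/lib/dpkg/status` instead of the sample, then look up each package's copyright file to collect license information.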
![Page 22: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/22.jpg)
Process Capture
![Page 23: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/23.jpg)
Process Capture
Example: Music Classification Workflow
![Page 24: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/24.jpg)
![Page 25: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/25.jpg)
Process Re-deployment
Preservation and Re-deployment
“Encapsulate” as complex “research objects” (RO)
Re-deployment beyond original environment
- Format migration of elements of ROs
- Cross-compilation of code
- Emulation-as-a-Service, virtual machines, …
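Encapsulating a process as a research object implies, at minimum, knowing exactly which files belong to it and detecting when any of them change after migration or re-deployment. A common way to do this, similar in spirit to BagIt payload manifests, is a checksum manifest. The sketch below is a simplified assumption, not a research-object specification; the function names and demo files are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root):
    """List every file under root with its size and SHA-256 digest."""
    root = Path(root)
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return {"files": entries}


def verify_manifest(root, manifest):
    """Return paths that are missing or whose content no longer matches."""
    root = Path(root)
    changed = []
    for entry in manifest["files"]:
        path = root / entry["path"]
        if not path.is_file() or hashlib.sha256(path.read_bytes()).hexdigest() != entry["sha256"]:
            changed.append(entry["path"])
    return changed


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "data").mkdir()
    Path(tmp, "data", "features.csv").write_text("0.1,0.2\n")
    Path(tmp, "run.sh").write_text("java -jar extractor.jar\n")
    manifest = build_manifest(tmp)
    Path(tmp, "data", "features.csv").write_text("0.1,0.3\n")  # simulate a silent change
    changed = verify_manifest(tmp, manifest)

print(changed)  # ['data/features.csv']
```

Checksums only establish that bits are unchanged; whether a migrated format or cross-compiled program still behaves the same needs the re-execution verification on the next slide.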
![Page 26: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/26.jpg)
Process Re-deployment
Verification, Validation & Data
Verify correctness of re-execution
[Figure: validation and verification framework linking process instance data, points of capture and metrics]
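Verifying a re-execution means comparing process instance data captured at the same points of capture in the original and the repeated run, using explicit metrics. The sketch below compares numeric values per capture point with a configurable tolerance and reports the maximum absolute difference. The capture-point names, the tolerance and the report format are assumptions for illustration; choosing appropriate metrics is itself part of the verification framework.

```python
import math


def compare_runs(original, rerun, rel_tol=1e-9, abs_tol=0.0):
    """Compare values captured at named points of capture in two executions.

    original / rerun: {capture_point: list of floats}. Returns a report per point.
    """
    report = {}
    for point in sorted(set(original) | set(rerun)):
        a, b = original.get(point), rerun.get(point)
        if a is None or b is None:
            report[point] = {"status": "missing"}
            continue
        if len(a) != len(b):
            report[point] = {"status": "length mismatch"}
            continue
        max_abs = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
        equal = all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b))
        report[point] = {"status": "match" if equal else "differs", "max_abs_diff": max_abs}
    return report


original = {"after_fft": [1.0, 2.0, 3.0], "features": [0.5, 0.25], "labels": [0.0]}
rerun = {"after_fft": [1.0, 2.0, 3.0000000001], "features": [0.5, 0.26]}
report = compare_runs(original, rerun, rel_tol=1e-6)
for point, result in report.items():
    print(point, result)
```

Capturing several intermediate points, rather than only the final output, localises where a re-executed process starts to diverge.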
Data and data citation
- Identifying subsets of data in large and dynamic databases
- Timestamping and versioning of data
- Assigning PID (DOI, …) to time-stamped query
[Figure: data citation architecture with Data (Table A, Table B), Query, Query Store, Subsets, PID Provider and PID Store]
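The citation idea above can be sketched end to end: the data is versioned (updates close the old row version with a timestamp instead of overwriting it), the query is stored together with its execution timestamp and a hash of its result, and the stored query receives a PID. Re-executing the query "as of" that timestamp returns the same subset even after the data has changed. This in-memory sketch assumes integer timestamps and a one-filter query format; the class names and the `pid:N` identifiers are hypothetical placeholders for a database and a real PID provider (e.g. DOI).

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    key: str
    value: float
    valid_from: int                 # timestamp at which this version was written
    valid_to: Optional[int] = None  # timestamp at which it was superseded


class VersionedTable:
    """Updates close the current version instead of overwriting it."""

    def __init__(self):
        self.rows = []

    def upsert(self, key, value, ts):
        for row in self.rows:
            if row.key == key and row.valid_to is None:
                row.valid_to = ts
        self.rows.append(Row(key, value, ts))

    def as_of(self, ts):
        return [r for r in self.rows
                if r.valid_from <= ts and (r.valid_to is None or r.valid_to > ts)]


class QueryStore:
    """Stores (query, timestamp, result hash) under a PID; re-executes on resolve."""

    def __init__(self, table):
        self.table = table
        self.entries = {}

    def run(self, query, ts):
        # Hypothetical query format: a single "min_value" filter.
        return sorted((r.key, r.value) for r in self.table.as_of(ts)
                      if r.value >= query["min_value"])

    def cite(self, query, ts):
        subset = self.run(query, ts)
        pid = f"pid:{len(self.entries) + 1}"  # placeholder for a PID provider
        self.entries[pid] = (dict(query), ts, self._digest(subset))
        return pid

    def resolve(self, pid):
        query, ts, digest = self.entries[pid]
        subset = self.run(query, ts)
        if self._digest(subset) != digest:
            raise RuntimeError(f"{pid}: re-executed subset does not match stored hash")
        return subset

    @staticmethod
    def _digest(subset):
        return hashlib.sha256(repr(subset).encode("utf-8")).hexdigest()


table = VersionedTable()
table.upsert("a", 1.0, ts=1)
table.upsert("b", 5.0, ts=1)
store = QueryStore(table)
pid = store.cite({"min_value": 2.0}, ts=2)
table.upsert("b", 4.0, ts=3)  # later correction
table.upsert("c", 7.0, ts=3)  # later addition
print(store.resolve(pid))                   # [('b', 5.0)]
print(store.run({"min_value": 2.0}, ts=3))  # [('b', 4.0), ('c', 7.0)]
```

Only the query, timestamp and hash need to be stored per citation, not a copy of the subset, which is what makes the approach scale to large, dynamic databases.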
![Page 27: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/27.jpg)
Sustainable (e-)Science
How to get there?
Research infrastructure support
- Versioning systems
- Logging (“virtual lab-book”)
- Virtual machines / pre-configured virtual labs for research
- Data citation support for large, dynamic databases
R&D in process preservation, re-deployment & verification
- Evolving research environments, code migration, …
- Verification of process re-execution
- Financial impact, business models
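A "virtual lab-book" can start as simply as automatic logging of every processing step that is called, with its arguments, result and timing, so the record does not depend on researchers remembering to write it. The decorator below is a minimal sketch under that assumption; the name `lab_book` and the record fields are hypothetical, and a real lab-book would write to persistent, append-only storage rather than a Python list.

```python
import functools
import json
import time


def lab_book(log):
    """Decorator factory: append one JSON record per call of the wrapped step."""
    def decorate(step):
        @functools.wraps(step)
        def wrapper(*args, **kwargs):
            started = time.time()
            result = step(*args, **kwargs)
            log.append(json.dumps({
                "step": step.__name__,
                "args": repr(args),
                "kwargs": repr(kwargs),
                "result": repr(result),
                "started": started,
                "seconds": time.time() - started,
            }))
            return result
        return wrapper
    return decorate


entries = []


@lab_book(entries)
def normalise(values):
    total = sum(values)
    return [v / total for v in values]


normalise([1, 3])
record = json.loads(entries[0])
print(record["step"], record["args"], record["result"])  # normalise ([1, 3],) [0.25, 0.75]
```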
![Page 28: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/28.jpg)
Summary
Need to move beyond concept of data
Need to move beyond the focus on description
Process Management Plans (PMPs) extending DMPs
Process capture, preservation & verification
Capture “all” elements of a research process
Machine-readable and -actionable
Data and process re-use as basis for data driven science
![Page 29: Data Management Plans: A good idea, but not sufficient](https://reader035.vdocument.in/reader035/viewer/2022062518/56814647550346895db357b1/html5/thumbnails/29.jpg)
Thank you!
http://www.ifs.tuwien.ac.at/imp