Discovering Data Science Design Patterns with Examples from R and Python Software Ecosystem
Dmitrij Petrov
Autumn 2017
30/11/2017, Dmitrij Petrov - Master Thesis Presentation - Autumn 2017
Outlining the Master Thesis
Motivation
• Design patterns capture proven solutions to recurring problems in:
  • Architecture
    • Started the Pattern Language Movement
  • Object-Oriented Programming
    • Seminal work for software analysis, design and implementation
  • Cloud Computing, Database Modelling, etc.
  • Data Science
Research Questions
• RQ1: What exactly do the terms software ecosystem, data science and design pattern mean?
• RQ2: Which data science-oriented design patterns can be recognized?
• RQ3: Which specific free and open-source software (FOSS) R and Python tools can be used to solve common data mining problems?
Methodology – 3D2P framework
Pattern prospecting
- Literature sources
Pattern mining
- General Inductive Approach & open/axial coding
- Discovery of patterns (i.e. best practices and their relationships)
Pattern writing
- Follow pattern-writing (PW) guidelines for their documentation
Relevant works: Thomas (‘06), Inventado & Scupelli (‘15), Meszaros & Doble (‘96)
A Pattern Example – “Build Me Dataset”
1. Pattern name & sketch: “Build Me Dataset”
2. Context: you want to process data from multiple data sources/formats
3. Problem: how to extract and store the data in a common data structure
4. Solution: a “table”, i.e. a “data frame”
5. Consequences: the approach is very simple, but it can also be slow
6. Known uses: modelling, visualization…
7. Examples: from the R & Python ecosystems
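The “Build Me Dataset” solution can be sketched in plain Python as follows. This is a minimal, dependency-free illustration under stated assumptions, not the thesis's own example: the two inputs, the column schema and the helper names (`rows_from_csv`, `rows_from_json`, `build_dataset`) are hypothetical. In practice the “data frame” role is usually played by a pandas `DataFrame` (e.g. `pandas.read_csv`, `pandas.read_json`, `pandas.concat`) in Python, or by `data.frame` in R.

```python
import csv
import io
import json

# Hypothetical inputs: the same kind of records arriving in two formats.
CSV_SOURCE = "id,name,score\n1,alice,0.9\n2,bob,0.7\n"
JSON_SOURCE = '[{"id": 3, "name": "carol", "score": 0.8}]'

# The common data structure: a fixed column schema plus a list of rows.
COLUMNS = ["id", "name", "score"]


def rows_from_csv(text):
    """Parse CSV text into dicts, converting each field to the schema's type."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"id": int(r["id"]), "name": r["name"], "score": float(r["score"])}
        for r in reader
    ]


def rows_from_json(text):
    """Parse a JSON array of objects, keeping only the schema's columns."""
    return [{col: obj[col] for col in COLUMNS} for obj in json.loads(text)]


def build_dataset(*row_lists):
    """Concatenate the rows from every source into one table (list of dicts)."""
    table = []
    for rows in row_lists:
        table.extend(rows)
    return table


dataset = build_dataset(rows_from_csv(CSV_SOURCE), rows_from_json(JSON_SOURCE))
for row in dataset:
    print([row[col] for col in COLUMNS])
```

Sharing one schema (`COLUMNS`) is what makes rows from different formats interchangeable downstream. The per-row parsing and type conversion is simple to write, but for large inputs it is one plausible source of the “can be slow” consequence, which vectorised data-frame libraries aim to reduce.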
Expected Outcomes
1. Aim to formulate Data Science design patterns
2. Data Science R and Python Toolkit Matrix
   • A holistic map of tools can simplify the knowledge discovery process
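One way to picture the Toolkit Matrix is as a lookup from a data science task and a language to the tools that cover it. The sketch below is an assumption about its shape, not the thesis's actual matrix: the tasks and tool entries are hypothetical illustrations (each is a real FOSS package), and the helpers `tools_for` and `tasks_covered` are made-up names.

```python
# Hypothetical (task, language) -> tools matrix; entries are illustrative only.
TOOLKIT = {
    "data frames": {"R": ["data.frame", "tibble"], "Python": ["pandas"]},
    "visualization": {"R": ["ggplot2"], "Python": ["matplotlib", "seaborn"]},
    "modelling": {"R": ["caret"], "Python": ["scikit-learn"]},
}


def tools_for(task, language):
    """Return the tools listed for a task in one language (empty list if none)."""
    return TOOLKIT.get(task, {}).get(language, [])


def tasks_covered(language):
    """Return the tasks for which at least one tool is listed in that language."""
    return [task for task, cells in TOOLKIT.items() if cells.get(language)]


print(tools_for("visualization", "Python"))
print(tasks_covered("R"))
```

Storing the matrix as nested dictionaries keeps both directions of lookup cheap: by task (which tools exist?) and by language (which tasks does an ecosystem cover?).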