Data Science 101
Chris Parsons
14 February 2018
© 2016 IBM Corporation
Agenda
• Disclaimer
• What format does my data need to be in for the Machine Learning frameworks?
• Transforming Data
• Options/Alternatives
• Moving Forward
How do I get data into PowerAI?
What format does my data need to be in for Machine Learning?
• Labeled data
  – Metadata (this is a dog, cat, etc.)
  – Subfolders named after each label (/dog, /cat)
  – CSV (prevalent)
• TensorFlow
  – TFRecords
• Caffe
  – Blobs (Caffe's in-memory arrays, typically fed from an LMDB or LevelDB database)
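The subfolder convention above can be turned into the CSV form with a few lines of Python. This is a minimal sketch, assuming one level of class folders under a root directory (e.g. `images/dog/`, `images/cat/`); the function name `build_label_manifest` and the `path,label` column names are illustrative and not part of any PowerAI or framework API.

```python
import csv
from pathlib import Path


def build_label_manifest(root, out_csv):
    """Write a (path, label) CSV where each label is the name of the
    subfolder a file sits in, e.g. root/dog/img1.jpg -> ("dog/img1.jpg", "dog")."""
    root = Path(root)
    rows = []
    # Each immediate subdirectory of root is treated as one class label.
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for f in sorted(class_dir.iterdir()):
            if f.is_file():
                # Store paths relative to root, with forward slashes on every OS.
                rows.append((f.relative_to(root).as_posix(), class_dir.name))
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["path", "label"])
        writer.writerows(rows)
    return rows
```

Converting the resulting manifest into TFRecords or an LMDB database is a separate, framework-specific step that this sketch does not cover.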
Transforming Data
Credit - Udacity
Transforming Data
Transforming Data
Transforming Data
Tumor Proliferation Assessment – mitosis detection
Images from electron-microscope; size of image: 70K × 60K

| Framework | Format | Input Size (Faster R-CNN) |
|---|---|---|
| Caffe | LMDB | 1K × 1K |
| TensorFlow | TFRecord | 1K × 1K |
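To reach the 1K × 1K input size, each 70K × 60K image has to be cut into tiles. The sketch below only computes the tile grid; it assumes "K" means 1,000 pixels and non-overlapping tiles, and the helper name `tile_boxes` is hypothetical. Reading the pixels inside each box is left to whatever image library the pipeline uses.

```python
def tile_boxes(width, height, tile=1000):
    """Yield (left, top, right, bottom) boxes covering a width x height image
    with tile x tile patches, row by row; edge tiles are clipped to the image."""
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            yield (left, top, min(left + tile, width), min(top + tile, height))


# Assumes "70K * 60K" means 70,000 x 60,000 pixels.
boxes = list(tile_boxes(70_000, 60_000))
print(len(boxes))  # 4200 (70 columns x 60 rows)
```

At 70 × 60 tiles, one image yields 4,200 candidate inputs before any filtering. A stride smaller than the tile size would produce overlapping tiles; that raises the count but could help with mitoses that straddle a tile border.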
• Data transformation
• Data distribution among training, validation and testing
• Data shuffle
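The last two steps, shuffling and distributing the data among training, validation and test sets, can be sketched with the standard library alone. The 80/10/10 ratios, the fixed seed and the name `shuffle_split` are assumptions for illustration; the fixed seed keeps the split reproducible between runs.

```python
import random


def shuffle_split(items, train=0.8, val=0.1, seed=42):
    """Shuffle a copy of `items` with a fixed seed, then cut it into
    train / validation / test lists; the test list gets the remainder."""
    items = list(items)
    random.Random(seed).shuffle(items)  # private RNG, global state untouched
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

For tiled images it is usually safer to split at the image (or patient) level first and tile afterwards, so that tiles cut from the same image never land in both the training and the test set.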
Options/Alternatives
DSX (IBM Data Science Experience)
• Access data from local files
  – Local CSV data
  – Loaded and transformed into a DSX “Asset”
  – Drag and drop/file explorer
• Access HDFS data
  – Cloudera/Hortonworks/BigInsights
• Access RDB data
  – Scala, Python, R APIs
  – Db2, Netezza, Informix, Oracle; also MongoDB (a NoSQL document store)
• Remote data?
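For the RDB route, most Python database drivers follow the DB-API 2.0 interface (PEP 249), so a query result can be landed as CSV, the prevalent format listed earlier, with generic code. This sketch does not use any DSX-specific API: `sqlite3` stands in for Db2, Netezza, Informix or Oracle, and the `samples` table and `query_to_csv` helper are hypothetical.

```python
import csv
import io
import sqlite3


def query_to_csv(conn, sql, out):
    """Run `sql` on a DB-API 2.0 connection and write the result as CSV to
    the file-like `out`, using column names from cursor.description as header."""
    cur = conn.cursor()
    cur.execute(sql)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])
    rows = cur.fetchall()
    writer.writerows(rows)
    return len(rows)


# sqlite3 stands in for a real RDB driver in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (path TEXT, label TEXT)")
conn.executemany("INSERT INTO samples VALUES (?, ?)",
                 [("img1.jpg", "dog"), ("img2.jpg", "cat")])
buf = io.StringIO()
n = query_to_csv(conn, "SELECT path, label FROM samples ORDER BY path", buf)
```

Swapping in a real database should only change how `conn` is created; the query-and-write logic relies solely on the PEP 249 cursor methods.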