Environmental Research with RapidMiner · 2019-08-05
![Page 1: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/1.jpg)
Environmental Research with RapidMiner
![Page 2: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/2.jpg)
About Me
Rodrigo Fuentealba Cartes
Lead Data Scientist and Senior Software Developer at Pegasus
Mr. Fuentealba has been using and developing open source technologies since 1995. His career in data science began in 2008, when he started building models for healthcare and for predictive maintenance of vessels.
![Page 3: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/3.jpg)
![Page 4: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/4.jpg)
Use Case
This project has been in development since 2016 as an effort to address environmental issues in the salmon farming process.
The Pegasus Group provides data science services and technology support to this project.
![Page 5: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/5.jpg)
Background
• Chile
• World's 2nd largest farmed salmon exporter.
• Salmon farming is the country's 3rd largest economic activity.
• In 2017, it generated USD 4.5 billion in revenue.
![Page 6: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/6.jpg)
Problem
• Sea Lice
• Deadly parasite that infests and damages salmonids.
• Threatens the environment, the communities and the local economy, both directly and indirectly.
• USD 350 million is spent to address it.
![Page 7: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/7.jpg)
Sea Lice
![Page 8: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/8.jpg)
Challenge
• Understanding how Caligus spreads.
• Predicting which salmon farms are in immediate danger.
• Evaluating the best treatments against Caligus.
![Page 9: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/9.jpg)
Warming up
![Page 10: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/10.jpg)
How to solve these challenges?
• Apply a Hydrodynamic Model to review tide directions.
• Apply Predictive Analytics to detect farms in danger.
• Apply Machine Learning to evaluate the best treatments.
![Page 11: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/11.jpg)
Methodology
• RMDS: Rod's Methodology for Data Science
• Understanding the Context.
• Asking the right Questions.
• Identifying the Nouns.
• Taking action with Verbs.
• Interpreting Answers.
![Page 12: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/12.jpg)
RMDS vs CRISP-DM
• Context
• Questions
• Nouns (Data)
• Verbs (Processes)
• Answers
![Page 13: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/13.jpg)
Infrastructure
• GIS
• DWH
• CMM
• Hydra
• 12 DBs
• API
• Connie
• Dashboard
![Page 14: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/14.jpg)
Applying Nouns and Verbs
• GIS (Noun)
• DWH (Verb)
• CMM (Verb)
• Hydra (Noun)
• 9 DBs (Noun)
• API (Noun)
• Connie (Noun)
• Dashboards (Verb)
![Page 15: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/15.jpg)
No Big Data
But there are massive amounts of it.
![Page 16: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/16.jpg)
How much data do we have?
100 Gb (stable)
47 Gb (hourly)
10 Gb (yearly)
300 Gb (stable)
1 Gb (hourly)
![Page 17: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/17.jpg)
Challenge 1: How is the parasite spread?
![Page 18: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/18.jpg)
How is Caligus spread?
• Hydrodynamic Model
• Streaming a 4D representation of the ocean (latitude, longitude, depth and time) in time-series format.
• Processing this representation with the Navier-Stokes equations and map/reducing it into the Connie Matrix.
(Think of an automatic BMP-to-SVG transformation, only a few million times heavier.)
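The slides do not show how the Connie Matrix is actually built. As a rough illustration only, a connectivity matrix can be read as a table of how often particles released at one farm end up at another. The sketch below assumes that some particle-tracking step has already produced (source, destination) pairs; the farm names, the sample pairs and the `connectivity_matrix` function are hypothetical and not part of the original project.

```python
from collections import Counter

def connectivity_matrix(arrivals, farms):
    """Row-normalised matrix: entry [i][j] is the share of particles
    released at farms[i] that arrived at farms[j]."""
    counts = Counter(arrivals)  # (source, destination) -> particle count
    matrix = []
    for src in farms:
        released = sum(counts[(src, dst)] for dst in farms)
        row = [counts[(src, dst)] / released if released else 0.0 for dst in farms]
        matrix.append(row)
    return matrix

# Hypothetical output of a particle-tracking run (not real project data).
farms = ["A", "B", "C"]
arrivals = [("A", "A"), ("A", "B"), ("A", "B"), ("A", "C"),
            ("B", "B"), ("B", "C"),
            ("C", "C")]
matrix = connectivity_matrix(arrivals, farms)
for farm, row in zip(farms, matrix):
    print(farm, [round(p, 2) for p in row])
```

Each row sums to 1, so a large off-diagonal entry suggests that whatever drifts away from the source farm tends to reach the destination farm.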
![Page 19: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/19.jpg)
4D Representation of the Ocean
[Figure: blocks of data with X, Y and Z dimensions, repeated along a TIME axis.]
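One way to picture the 4D representation as a time series is a stream of records, one per grid cell and time step. The sketch below is an assumption about the layout, not the project's actual format; the `ocean` dictionary, the field names, the coordinates and the values are all hypothetical.

```python
from itertools import product

def stream_cells(ocean, times, depths, lats, lons):
    """Yield one record per (time, depth, lat, lon) cell, with time varying slowest."""
    for t, z, y, x in product(times, depths, lats, lons):
        yield {"time": t, "depth": z, "lat": y, "lon": x, "value": ocean[(t, z, y, x)]}

# Tiny hypothetical grid: 2 time steps, 1 depth level, 2 x 2 horizontal cells.
times, depths = [0, 1], [5.0]
lats, lons = [-42.0, -41.5], [-73.0, -72.5]
ocean = {key: 0.1 * i for i, key in enumerate(product(times, depths, lats, lons))}

records = list(stream_cells(ocean, times, depths, lats, lons))
print(len(records), records[0])
```

Because time is the outermost loop, every record of one time step is emitted before the next step starts, which is what lets a consumer treat the stream as a time series.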
![Page 20: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/20.jpg)
![Page 21: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/21.jpg)
Performance of Hydrodynamic Model
[Chart: one axis runs from 0 to 3000; the other runs from 1500 down to 300 in steps of 100, followed by "Hit".]
![Page 22: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/22.jpg)
Connectivity Matrix
![Page 23: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/23.jpg)
Connectivity Matrix
![Page 24: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/24.jpg)
Challenge 2: Which farms are in danger?
![Page 25: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/25.jpg)
Which farms are in danger?
• Answer: the ones in the path of Caligus!
• Mix operational databases, the GIS database and the Connie Matrix in the data warehouse.
• Perform time-series analysis and k-Means clustering on different pairs, 360 times on each block.
• A manually trained Decision Tree helps categorize the threat level on a scale from 0 to 10.
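The RapidMiner process itself is only shown as a screenshot, so the following is a minimal sketch of the general idea rather than the project's pipeline: cluster farms with k-Means, then map each cluster centre to a threat level with hand-written rules standing in for the decision tree. The two features, the sample values and every threshold are invented for illustration, and the time-series step is left out.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm: assign points to the nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

def threat_level(exposure, lice_nearby):
    """Hand-written rules standing in for the decision tree (thresholds are invented)."""
    if exposure < 0.2:
        return 0 if lice_nearby < 1.0 else 3
    if exposure < 0.6:
        return 5 if lice_nearby < 3.0 else 7
    return 8 if lice_nearby < 3.0 else 10

# Hypothetical farm features: (exposure from the connectivity matrix, average lice count nearby).
farms = [(0.05, 0.2), (0.10, 0.5), (0.15, 0.4), (0.70, 4.0), (0.80, 5.5), (0.90, 6.0)]
centroids, clusters = kmeans(farms, centroids=[farms[0], farms[-1]])
levels = [threat_level(*c) for c in centroids]
print(centroids, levels)
```

Seeding the centroids explicitly keeps the run deterministic; a real process would more likely use random restarts and scaled features.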
![Page 26: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/26.jpg)
RapidMiner: Getting Operational DBs
![Page 27: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/27.jpg)
RapidMiner: Joining Operational and GIS
![Page 28: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/28.jpg)
RapidMiner: Joining Connie Matrix
• Same ol', same ol',
• Except that it's done with PostgreSQL and PostGIS.
• So, no pictures of this process.
![Page 29: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/29.jpg)
RapidMiner: k-Means + Decision Tree
![Page 30: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/30.jpg)
Reports
![Page 31: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/31.jpg)
Results
• Find farms that might be attacked within 2 weeks.
• Trained on data from 2016, tested on data from 2017.
• This has been pretty consistent with data from 2018.
| | True Hit | True Miss | Class Precision (%) |
|---|---|---|---|
| Pred. Hit | 4982 | 1845 | 72.97 |
| Pred. Miss | 890 | 192817 | 99.54 |
| Class Recall (%) | 84.84 | 99.05 | |
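The percentages in the results table follow directly from the four counts. The short check below recomputes them; the function name `confusion_metrics` is only illustrative, and it treats "Hit" as the positive class, which is an assumption about how the table should be read.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return precision and recall for both classes plus overall accuracy, in percent."""
    total = tp + fp + fn + tn
    return {
        "precision_hit": 100 * tp / (tp + fp),
        "precision_miss": 100 * tn / (tn + fn),
        "recall_hit": 100 * tp / (tp + fn),
        "recall_miss": 100 * tn / (tn + fp),
        "accuracy": 100 * (tp + tn) / total,
    }

# Counts taken from the results table: rows are predictions, columns are true classes.
metrics = confusion_metrics(tp=4982, fp=1845, fn=890, tn=192817)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The first four values reproduce the figures on the slide. Overall accuracy is not stated there; it is derived here from the same counts and is dominated by the much larger "Miss" class, which is why precision and recall on "Hit" are the more informative numbers.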
![Page 32: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/32.jpg)
Challenge 3: What is the best treatment?
![Page 33: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/33.jpg)
![Page 34: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/34.jpg)
Data Model for Production/Mortality
![Page 35: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/35.jpg)
Challenge
• Explore the operational databases to:
• Find where production is maximized and the mortality rate is minimized.
• Analyze diseases, Caligus reports, treatments and vaccinations.
• Retrieve the patterns applied in the best farms and apply them to the worst ones.
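As a toy illustration of comparing the best and the worst farms, the sketch below ranks farms by production and mortality and then lists the treatments that are more common in the better half. The records, the treatment codes and the ranking rule are all hypothetical; the actual analysis used many more variables and is not shown in the slides.

```python
from collections import Counter

# Hypothetical, obfuscated records: (farm, production in tonnes, mortality rate, treatments applied).
farms = [
    ("F1", 950, 0.04, {"T1", "T3"}),
    ("F2", 900, 0.05, {"T1", "T3"}),
    ("F3", 600, 0.12, {"T2"}),
    ("F4", 550, 0.15, {"T2", "T3"}),
]

def rank_farms(records):
    """Order farms from best to worst: high production first, low mortality as tie-breaker."""
    return sorted(records, key=lambda r: (-r[1], r[2]))

def treatment_counts(records):
    """Count how many farms in the group applied each treatment."""
    return Counter(t for _, _, _, treatments in records for t in treatments)

ranked = rank_farms(farms)
half = len(ranked) // 2
best, worst = treatment_counts(ranked[:half]), treatment_counts(ranked[half:])
# Treatments that are more common among the best farms than among the worst ones.
candidates = sorted(t for t in best if best[t] > worst[t])
print(candidates)
```

A frequency comparison like this only surfaces candidates; it says nothing about causation, which is why the real-life testing described later matters.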
![Page 36: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/36.jpg)
Notice
While the database (the structure) has been entirely designed by me, the information (the data) contained in it is proprietary and I cannot share it with you. That doesn't mean I can't obfuscate the data to show you how we performed the analysis.
Also, it has been simplified from nearly a thousand processes to just two, as proper data extraction and classification were quite difficult.
![Page 37: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/37.jpg)
Preparation Process
![Page 38: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/38.jpg)
Analytics Process
![Page 39: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/39.jpg)
Results
![Page 40: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/40.jpg)
Real Life Testing
• Sample: 20 farms out of nearly 5,800.
• The combination of treatments was designed through SVM, Neural Networks and Time-Series (too complex to be shown here).
• Mortality reduced by 46.1% (73.7% for Caligus).
• USD 97,565 saved in treatments.
• Expected to save USD 24 million by 2019.
![Page 41: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/41.jpg)
Conclusions
• #DataSci is about solving challenges with technology: we apply it in many other use cases.
• Proper data prep overcomes the limits imposed by technical debt. Developments in public organizations suffer a lot from this.
• A quick process model (20%) helps us fail fast and achieve results earlier.
• RapidMiner excels at both. We couldn't have done this without it.
![Page 42: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/42.jpg)
RapidMiner: Data Science, Fast and Simple
![Page 43: Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive](https://reader034.vdocument.in/reader034/viewer/2022050515/5f9f426dea350623c53a3678/html5/thumbnails/43.jpg)
Contact Information
Rodrigo Fuentealba Cartes
E-mail: [email protected]
Twitter: @datasciencegems
LinkedIn: https://www.linkedin.com/in/rodrigofuentealbacartes/