Research Paper: Project Proposal
Austin Wilson, Roberto Campos, Isaac Shah
Economic impact of epidemics and pandemics
Market losses!
https://www.europarl.europa.eu/RegData/etudes/BRIE/2020/646195/EPRS_BRI(2020)646195_EN.pdf
Market losses from a pandemic could be up to $500 billion
Lower-middle-income countries are impacted more than high-income countries
Industries affected
The healthcare industry sees a large spike in costs when a pandemic occurs. The insurance industry is also affected, because more people go to the doctor.
Industries affected
The agricultural industry is adversely impacted.
● In developed countries, the agricultural industry is incentivized to prioritize spending on infectious disease prevention.
● In less developed countries, agricultural companies are not incentivized to spend on reducing infectious disease.
● An infectious disease outbreak that starts in one of these less developed countries can result in travel and trade isolation.
Travel industry
● People do not want to travel to places where the disease is running rampant
● People do not want to be on planes or ships where they think there might be an outbreak
● The H1N1 outbreak caused an estimated $2.8 billion loss to the Mexican travel industry
Time Series Data Mining by Philippe Esling
● Data representation: how can a time series be represented, and what is its shape?
● Similarity measurement: how do we compare two time series objects?
● Indexing method: how can we speed up query time for big data?
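Similarity measurement can be made concrete with two common measures: Euclidean distance, which compares points at the same index, and dynamic time warping (DTW), which lets one series stretch in time to align with another. The sketch below is our own minimal pure-Python version, not code from Esling's survey; the function names `euclidean` and `dtw` are hypothetical.

```python
import math


def euclidean(a, b):
    """Lock-step distance: compares a[i] with b[i]; lengths must match."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs series of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Classic O(n*m) dynamic time warping with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


if __name__ == "__main__":
    base = [0, 1, 2, 3, 2, 1, 0, 0]
    shifted = [0, 0, 1, 2, 3, 2, 1, 0]  # same shape, delayed by one step
    print(euclidean(base, shifted))  # about 2.449, i.e. sqrt(6)
    print(dtw(base, shifted))        # 0.0: DTW absorbs the one-step delay
```

The toy series differ only by a one-step delay, so DTW reports zero distance while Euclidean distance does not. An elastic measure like this may matter when comparing outbreaks that start on different dates.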
Clustering
● Whole series clustering tries to maximize the distance between different clusters while minimizing the variance within each cluster
● We can also use subsequence clustering, where we split a single time series into subsequences and group those into different clusters
● Classification is similar to whole series clustering, except that we are given sets of time series with a label for each set; the task is to train a classifier that labels new time series
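Whole series clustering can be sketched with k-means, treating each equal-length series as a single point and assigning it to the nearest centroid. The slides do not name a clustering algorithm, so k-means with Euclidean distance and a deterministic farthest-point initialization is our assumption; `kmeans_series` and `within_cluster_ss` are hypothetical helper names.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_series(series, k, iters=20):
    """Whole-series k-means; returns (labels, centroids).

    Initialization picks series[0], then repeatedly the series farthest
    from every centroid chosen so far, so results are deterministic.
    """
    centroids = [list(series[0])]
    while len(centroids) < k:
        far = max(series, key=lambda s: min(sq_dist(s, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(series)
    for _ in range(iters):
        # Assignment step: each series joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(s, centroids[j]))
                  for s in series]
        # Update step: each centroid becomes the pointwise mean of its members.
        for j in range(k):
            members = [s for s, lab in zip(series, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def within_cluster_ss(series, labels, centroids):
    """Within-cluster sum of squares, the quantity k-means tries to minimize."""
    return sum(sq_dist(s, centroids[lab]) for s, lab in zip(series, labels))


if __name__ == "__main__":
    toy = [[0, 1, 2, 3], [0, 1, 2, 4], [0, 1, 3, 3],   # rising pattern
           [10, 9, 8, 7], [10, 9, 7, 7], [11, 9, 8, 7]]  # falling pattern
    labels, centroids = kmeans_series(toy, k=2)
    print(labels, within_cluster_ss(toy, labels, centroids))
```

The toy rising and falling series should land in separate clusters. Classification could reuse the same distance, for example by giving a new series the label of its nearest labeled neighbor.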
Segmentation
● Create an accurate approximation of the time series while reducing its dimensionality
● Keep the essential features and drop redundant or uninformative ones
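One widely used way to trade detail for dimensionality is Piecewise Aggregate Approximation (PAA): cut the series into equal-width frames and keep only each frame's mean. PAA is not named on the slide; it is used here only to illustrate the idea, and `paa` is a hypothetical helper that assumes the series length is a multiple of the number of segments.

```python
def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width frame.

    Simplifying assumption: len(series) is a positive multiple of `segments`.
    """
    n = len(series)
    if segments <= 0 or n % segments:
        raise ValueError("series length must be a positive multiple of segments")
    width = n // segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]


if __name__ == "__main__":
    daily = [1, 3, 2, 4, 10, 12, 11, 13]
    print(paa(daily, 2))  # [2.5, 11.5]
    print(paa(daily, 4))  # [2.0, 3.0, 11.0, 12.0]
```

Eight values shrink to two while still showing the jump between the first and second half of the series.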
Piecewise linear approximation
● One of the most successful approaches to segmentation over the years
● Split the time series into segments
● Fit individual polynomial or linear curves to each segment
● Sliding windows
○ Keep growing a window until it exceeds an error threshold
● Top-down
○ Recursively partition the data set until some stopping criterion is met
● Bottom-up
○ Start from the finest segments and iteratively merge segments
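The sliding-window variant can be sketched in a few lines: anchor a window, grow it one point at a time while a least-squares line still fits within the error threshold, then start the next window where the last one ended. This is our minimal sketch of the sliding-window idea only (top-down and bottom-up are not shown); using the sum of squared residuals as the error measure and the name `sliding_window_pla` are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # a single point: use a flat line
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def fit_error(series, start, end):
    """Sum of squared residuals of the best line over series[start:end]."""
    xs = list(range(start, end))
    ys = series[start:end]
    slope, intercept = fit_line(xs, ys)
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def sliding_window_pla(series, max_error):
    """Return (start, end) segments, end exclusive, each fit by one line."""
    segments = []
    anchor, n = 0, len(series)
    while anchor < n:
        end = min(anchor + 2, n)  # a line needs at least two points
        # Grow the window while adding the next point keeps the error small.
        while end < n and fit_error(series, anchor, end + 1) <= max_error:
            end += 1
        segments.append((anchor, end))
        anchor = end
    return segments


if __name__ == "__main__":
    ramp_then_flat = [0, 1, 2, 3, 4, 10, 10, 10, 10]
    print(sliding_window_pla(ramp_then_flat, max_error=0.01))  # [(0, 5), (5, 9)]
```

The threshold controls the trade-off named on the previous slide: a larger `max_error` yields fewer, coarser segments.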
Data-adaptive vs non-data-adaptive vs model-based
● Data-adaptive: parameters are modified based on the values of consecutive segments
● Non-data-adaptive: parameters of transformation remain the same for every series
● Model-based: assume the time series has been produced by an underlying model and find the parameters of the model
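A model-based representation can be illustrated with a first-order autoregressive model, x[t] = c + phi * x[t-1] + noise: once c and phi are estimated, those two numbers summarize the whole series. AR(1) is our choice of example model, not one named in the slides, and `fit_ar1` and `forecast_next` are hypothetical helpers using ordinary least squares.

```python
def fit_ar1(series):
    """Least-squares estimate of (c, phi) in x[t] = c + phi * x[t-1] + noise."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mp, mc = sum(prev) / n, sum(curr) / n
    sxx = sum((p - mp) ** 2 for p in prev)
    if sxx == 0:
        raise ValueError("constant series: phi is not identifiable")
    phi = sum((p - mp) * (q - mc) for p, q in zip(prev, curr)) / sxx
    c = mc - phi * mp
    return c, phi


def forecast_next(series, c, phi):
    """One-step-ahead forecast from the fitted AR(1) parameters."""
    return c + phi * series[-1]


if __name__ == "__main__":
    # Noise-free toy series generated with c = 2 and phi = 0.5.
    xs = [10.0]
    for _ in range(9):
        xs.append(2 + 0.5 * xs[-1])
    c, phi = fit_ar1(xs)
    print(round(c, 6), round(phi, 6))
    print(forecast_next(xs, c, phi))
```

Because the toy series has no noise, the fit should recover the generating parameters; real data would give noisier estimates.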
Data
COVID - Detailed: Novel Corona Virus 2019 Dataset
COVID - South Korea: https://www.kaggle.com/kimjihoo/coronavirusdataset
Stocks: https://www.kaggle.com/borismarjanovic/price-volume-data-for-all-us-stocks-etfs
Tasks to Perform + Task Division
Research: Isaac S.
Study past events and find datasets we can use to analyze the initial problems in past situations. Locate when each problem first began, when the situation plateaued, and when it returned to normal.
Tableau: Isaac
Visualization will be done before choosing which models to work with. Try to find visible trends, and look for patterns and similarities between events. Try to map each case on a US map and check whether case counts correlate with stock market performance.
Modeling: Roberto, Austin
Explore which types of models can be used to solve each problem. For example, should we use linear regression or logistic regression, and can we find which variables are important? Is a fully connected neural network useful for the problem we are currently analyzing? Should we use a CNN to identify important features? Can we use an SVM to categorize past events and then categorize the current event, COVID-19?
Data Pre-Processing: Roberto, Austin
Data pre-processing will play an important role. We have to analyze which types of data go into which model, because different models require different pre-processing.
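The correlation check and the pre-processing it depends on can be sketched as follows: convert cumulative case counts into daily new cases, convert closing prices into daily returns, z-score both, and compute the Pearson correlation. This is a hypothetical sketch; it assumes both series have already been aligned to the same trading days (a step not shown), and all function names are our own.

```python
from statistics import mean, pstdev


def daily_changes(cumulative):
    """Cumulative case counts -> new cases per day."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def daily_returns(closes):
    """Closing prices -> simple daily returns, p[t] / p[t-1] - 1."""
    return [b / a - 1 for a, b in zip(closes, closes[1:])]


def zscore(xs):
    """Rescale to mean 0 and population standard deviation 1."""
    mu, sd = mean(xs), pstdev(xs)
    if sd == 0:
        raise ValueError("a constant series cannot be z-scored")
    return [(x - mu) / sd for x in xs]


def pearson(xs, ys):
    """Pearson correlation as the mean product of population z-scores."""
    if len(xs) != len(ys):
        raise ValueError("align both series to the same dates first")
    return sum(a * b for a, b in zip(zscore(xs), zscore(ys))) / len(xs)


if __name__ == "__main__":
    # Toy numbers for illustration only, not real case or price data.
    cases = [0, 5, 15, 40, 90, 180]
    closes = [100.0, 99.0, 97.5, 94.0, 90.0, 85.0]
    print(pearson(daily_changes(cases), daily_returns(closes)))
```

A correlation found this way is descriptive only; a formal significance test (for example in R, as the Tools slide suggests) would still be needed.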
Tools
Jupyter
Interactive notebooks to present our models visually and in detail.
Python
Our language of choice to pre-process data and create ML models. We are interested in using an ANN or CNN for our model, and we will also consider simple linear or log-linear models.
R
Used in support of Python, as R is a great statistical tool that provides statistical inference. It can help us test whether there is a statistically significant correlation between the variables we are studying.
Tableau
A versatile visualization tool for creating robust custom graphs.
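As an example of the simple log-linear models mentioned above, early epidemic growth is often summarized by fitting log(cases[t]) = a + b * t, where b is the daily growth rate and ln(2) / b is the doubling time. Applying the model to case counts this way is our illustrative choice, and `fit_log_linear` and `doubling_time` are hypothetical helpers using ordinary least squares.

```python
import math


def fit_log_linear(counts):
    """Fit log(counts[t]) = a + b * t by least squares; returns (a, b)."""
    if len(counts) < 2:
        raise ValueError("need at least two observations")
    if any(c <= 0 for c in counts):
        raise ValueError("a log-linear fit needs strictly positive counts")
    ts = range(len(counts))
    ys = [math.log(c) for c in counts]
    n = len(ys)
    mt, my = (n - 1) / 2, sum(ys) / n  # mean of t = 0..n-1, mean of log counts
    stt = sum((t - mt) ** 2 for t in ts)
    b = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / stt
    a = my - b * mt
    return a, b


def doubling_time(b):
    """Days for the fitted curve exp(a + b * t) to double; requires b > 0."""
    return math.log(2) / b


if __name__ == "__main__":
    toy_counts = [100, 200, 400, 800, 1600]  # toy data doubling every day
    a, b = fit_log_linear(toy_counts)
    print(math.exp(a), b, doubling_time(b))
```

The fit is linear in log space, which is why it can be tried before heavier ANN or CNN models; it only describes the exponential phase, not the plateau.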
Progress + Experience
Initial design/case study/prototype/experiments
With the combined expertise of the team, we will be able to find and analyze data that can help us answer our problem statement. Once the data is gathered, quick visualizations will be rendered to gain further insight. All three members of the team have extensive knowledge of Tableau.
Models can be easily prototyped with the Sklearn and TensorFlow libraries. Two members of the team have experience using these libraries and have access to consulting outside of the classroom.
Progress milestones: what will be completed by weeks 11 and 14
By weeks 11 and 14, the team will have developed visual aids and prototype models, and will begin refining them and handling specific details, for example, increasing the accuracy of our model.
Experience
● Modeling with Sklearn and TensorFlow
● Modeling with R
● Data pre-processing
● Tableau
● Google Colab for big data
Sources
Economic impact of epidemics/pandemics: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6491983/
Time Series Data Mining: https://www.researchgate.net/publication/261722458_Time-Series_Data_Mining
Thanks!