Machine Learning for Knowledge Dissemination in Creative Economies

Krzysztof Pampuch

• What is machine learning?

• Basic terminology

• Systematics of ML methods

• How to measure the quality of our model

• Selected methods of ML

• What does ML look like in everyday practice?

Statistics, Computer Science

Machine learning (ML) is a category of algorithms that allow software applications to become more accurate in predicting outcomes without being explicitly programmed.

No.   Length of sepal   Width of sepal   Length of petal   Width of petal   Label
1     5.1               3.5              1.4               0.2              Setosa
2     4.9               3.0              1.4               0.2              Setosa
3     6.4               3.5              4.5               1.2              Versicolor
…     …                 …                …                 …                …
100   5.9               3.0              5.0               1.8              Virginica

Features (predictors): the four measurement columns

Label (predicted variable): the last column

The McCulloch-Pitts neuron (1943)

Frank Rosenblatt's neuron, the perceptron (1957)

Learning concept:
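The slide does not spell out the learning rule, so the following is only a minimal sketch of one common reading of the perceptron idea: weights are adjusted whenever a sample is misclassified. The function names, the step activation, the learning rate and the logical AND data are all assumptions made for illustration.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Perceptron-style rule: change the weights only on misclassified samples."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else 0  # step activation
            delta = target - predicted              # -1, 0 or +1
            if delta != 0:
                weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
                bias += lr * delta
                errors += 1
        if errors == 0:  # every sample classified correctly, stop early
            break
    return weights, bias


def predict(weights, bias, x):
    """Apply the learned weights with the same step activation."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Toy data (assumed for illustration): logical AND, which is linearly separable.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
```

On linearly separable data such as this, the rule stops changing the weights once all samples are classified correctly.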

Machine learning

• unsupervised: clustering, dimensionality reduction

• supervised: classification, regression

• reinforcement learning

quantitative

• can be expressed using specific units of measurement

qualitative

• can be described only in words, can't be ordered

Criteria:

• Efficiency

• Stability

  • For other samples

  • Over time

• Interpretability

• We split the dataset into:

• Training set - used for training a model

• Validation set - used to choose the best model

• Test set - used to make sure that our model is stable

train | validation | test
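The three-way split above can be sketched as follows. This is a minimal illustration using only the standard library; the function name, the 60/20/20 proportions and the fixed seed are assumptions, not values given in the slides.

```python
import random


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the rows and cut them into train, validation and test parts."""
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical data: 100 observation indices.
train, val, test = train_val_test_split(range(100))
```

Shuffling before cutting avoids a split that follows the original order of the data, for example all observations of one class landing in the test set.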

k-fold cross-validation: in each of the k rounds a different part of the data is the test set and the rest is the training set

Each observation is used exactly once for testing and k-1 times for training

The quality of a model is the mean of the scores computed over all k rounds
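A minimal sketch of this scheme, under the assumption that folds are contiguous blocks taken without shuffling. The helper names `k_fold_indices` and `cross_val_score`, and the `fit` and `score` callables, are hypothetical and only illustrate the bookkeeping.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


def cross_val_score(xs, ys, k, fit, score):
    """Average the score obtained on each of the k held-out folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model,
                            [xs[i] for i in test_idx],
                            [ys[i] for i in test_idx]))
    return sum(scores) / len(scores)


# Count how often each of 10 observations lands in a test or training fold.
test_count, train_count = Counter(), Counter()
for train_idx, test_idx in k_fold_indices(10, 5):
    test_count.update(test_idx)
    train_count.update(train_idx)
```

With 10 observations and k = 5, every index appears once in `test_count` and four times in `train_count`, matching the "once for testing, k-1 times for training" rule.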

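An expected error on a test set:

$$E\left[(y_i - \hat{y}_i)^2\right] = \mathrm{Var}(\hat{y}_i) + [\mathrm{Bias}(\hat{y}_i)]^2 + \mathrm{Var}(\varepsilon)$$

$\mathrm{Var}(\hat{y}_i)$ - variance

$\mathrm{Bias}(\hat{y}_i)$ - bias

$\mathrm{Var}(\varepsilon)$ - variance of a random component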

• Bias reflects the error we make when approximating reality with a model

• Variance reflects how much the prediction would change if a different set of data were used to learn the model

• The variance of the random component is independent of the process modeled and is irreducible

• Best situation: negligible bias and variance

The more "flexible" the method, the lower the bias

The more "flexible" the method, the higher the variance
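The trade-off between bias and variance can be illustrated with a small simulation. Everything here is an assumption chosen for the sketch: the true function `x * x`, the noise level, the evaluation point, and the two models (a constant mean predictor as the rigid method, a 1-nearest-neighbour predictor as the flexible one).

```python
import random
import statistics


def true_f(x):
    """Assumed ground-truth relationship, used only for this illustration."""
    return x * x


def simulate(n_repeats=2000, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of two models at the point x0 = 1.0."""
    rng = random.Random(seed)
    xs = [i / 20 for i in range(21)]  # fixed inputs on [0, 1]
    x0 = 1.0                          # point at which predictions are compared
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    rigid_preds, flexible_preds = [], []
    for _ in range(n_repeats):
        # A fresh training set: same inputs, new random noise.
        ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
        rigid_preds.append(statistics.fmean(ys))  # constant model
        flexible_preds.append(ys[nearest])        # 1-nearest neighbour
    result = {}
    for name, preds in (("rigid", rigid_preds), ("flexible", flexible_preds)):
        bias = statistics.fmean(preds) - true_f(x0)
        result[name] = {"bias_sq": bias ** 2,
                        "variance": statistics.pvariance(preds)}
    return result


result = simulate()
```

In this setup the constant model has a large squared bias and a small variance, while the nearest-neighbour model shows the opposite pattern.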

• Goal: to fit a linear function to our data

• $y = \beta_0 + \sum_{i=1}^{p} \beta_i x_i + \epsilon$

• How to find model coefficients?

• By minimizing the cost function:

$$L = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

• Disadvantages: sensitivity to outliers, poor modeling of nonlinear relationships
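For a single feature (p = 1) the squared-error cost has a closed-form minimum, which the sketch below computes. The function names and the toy data are assumptions for illustration; the second fit shows how a single outlier shifts the coefficients.

```python
def fit_simple_linear(xs, ys):
    """Least-squares estimates (beta0, beta1) for y = beta0 + beta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


def squared_error_cost(xs, ys, beta0, beta1):
    """Sum of squared differences between observed and fitted values."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))


xs = [0, 1, 2, 3]
clean = fit_simple_linear(xs, [1, 3, 5, 7])          # points on y = 1 + 2x
with_outlier = fit_simple_linear(xs, [1, 3, 5, 27])  # last point is an outlier
```

Because errors are squared, the outlier pulls the fitted slope from 2 up to 8 in this toy example.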

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

• Values in the range [0, 1]

• Interpretation: How much of the variance of the data does the model explain?

$\bar{y}$ - mean value of $y$
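A minimal sketch of this measure, computed as one minus the ratio of the residual sum of squares to the total sum of squares. The function name and the example values are assumptions.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("y_true is constant, R^2 is undefined")
    return 1 - ss_res / ss_tot


y_true = [1, 2, 3, 4]
perfect = r_squared(y_true, [1, 2, 3, 4])         # model reproduces the data
baseline = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])  # always predicts the mean
partial = r_squared(y_true, [1.5, 2, 3, 3.5])
```

A model that always predicts the mean scores 0 and a perfect model scores 1. Evaluated on new data, a model that does worse than the mean predictor yields a negative value under this formula.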

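• Misclassification Rate: $MR = 1 - \frac{\sum_i f_{ii}}{\sum_{i,j} f_{ij}} = \frac{\sum_{i \neq j} f_{ij}}{\sum_{i,j} f_{ij}}$

• Accuracy: $ACC = 1 - MR$

• Multi-class log-loss: $MLL = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log(p_{ij})$

• ROC, AUC, F-measure: $F_1 = \frac{2TP}{2TP + FP + FN}$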

                  True value
                  0       1       2
Predicted   0     f_00    f_01    f_02
            1     f_10    f_11    f_12
            2     f_20    f_21    f_22

                  True value
                  1/T     0/N
Predicted   1/T   TP      FP
            0/N   FN      TN
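The measures above can be computed directly from a confusion matrix. The sketch below is illustrative only: the function names, the probability clipping constant `eps` and the example counts are assumptions.

```python
import math


def misclassification_rate(conf):
    """Share of off-diagonal counts in a square confusion matrix."""
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(len(conf)))
    return 1 - correct / total


def accuracy(conf):
    return 1 - misclassification_rate(conf)


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


def multiclass_log_loss(y_onehot, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    Probabilities are clipped to [eps, 1 - eps] to avoid log(0).
    """
    total = 0.0
    for y_row, p_row in zip(y_onehot, probs):
        for y, p in zip(y_row, p_row):
            total += y * math.log(min(max(p, eps), 1 - eps))
    return -total / len(y_onehot)


# Hypothetical 3-class confusion matrix: 17 of 20 observations on the diagonal.
conf = [[5, 1, 0],
        [0, 4, 1],
        [1, 0, 8]]
mr = misclassification_rate(conf)
acc = accuracy(conf)
f1 = f1_score(tp=8, fp=2, fn=2)
# Two observations, two classes, a model that always outputs 50/50.
mll = multiclass_log_loss([[1, 0], [0, 1]], [[0.5, 0.5], [0.5, 0.5]])
```

The misclassification rate and accuracy depend only on the diagonal and the total, so they do not change if the matrix is transposed.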

K-means DBSCAN

Data → Feature engineering → Train set / Test set → Learning → Model validation

• Data almost never has the desired format

• Often we have to acquire data from many sources

• Volume, inflow rate

• Examples of problems

• Storage of terabytes of data

• Data from various DBMS + external data

• Data refreshing and retention

• Consistency of data types

• Unstructured data

• Character encoding, number and date formats

• The most time-consuming activity

• The type of processing required depends on the type of data and the problem

• Generating features – manual vs automatic:

• Examples of feature generation:

preprocessing → dimensionality reduction → prediction

Text

• Regular expressions

• Tokenization

• Lemmatization

• Bag-of-words

• TF-IDF

Customer data

• Total payments

• Balance on accounts

• Number of logins

• Demographic data

Audio / video

• Signal framing

• LPC, MFCC

• Color/gradient histograms

• SIFT, SURF

• Bag-of-words
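As an illustration of the text-oriented items in the list above, the following sketch builds bag-of-words counts and TF-IDF weights with the standard library only. The regular-expression tokenizer, the function names and the specific TF-IDF variant (term frequency times `log(N / document frequency)`) are assumptions; libraries often use smoothed variants.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(text):
    """Count how often each token occurs, ignoring word order."""
    return Counter(tokenize(text))


def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    bags = [bag_of_words(doc) for doc in documents]
    n_docs = len(bags)
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())  # each document counts once per term
    weights = []
    for bag in bags:
        n_tokens = sum(bag.values())
        weights.append({
            term: (count / n_tokens) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights


# Hypothetical two-document corpus.
docs = ["The cat sat", "The dog sat down"]
weights = tf_idf(docs)
```

Terms that occur in every document, such as "the" here, get weight zero, while terms specific to one document are weighted up.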

• High dimensionality of the space of features:

• Degrades the predictive power of models

• Introduces redundancy (variable correlation)

• Leads to overfitting

• Requires larger data sets to achieve the same goal

• Increases the computational effort

• And besides… decision-makers do not like complex models and many variables

• So let’s reduce the dimensionality!

• Principle of operation (most often):

• Reproducing the data as accurately as possible in a space of lower dimensionality

• Highlighting as well as possible the information that differentiates the values of the predicted variable

Feature extraction and feature selection: reducing $n$ features to $k < n$
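One simple form of feature selection can be sketched as ranking features by the absolute Pearson correlation with the predicted variable and keeping the top k. This is only one of many possible criteria; the function names, the dict-of-columns layout and the toy data are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def select_k_best(columns, target, k):
    """Return the names of the k columns most correlated with the target."""
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    return ranked[:k]


# Hypothetical data: n = 3 candidate features, keep k = 2.
columns = {
    "informative": [1, 2, 3, 4, 5],
    "noisy": [3, 1, 4, 1, 5],
    "constant": [7, 7, 7, 7, 7],
}
target = [2, 4, 6, 8, 10]
selected = select_k_best(columns, target, k=2)
```

Selection keeps a subset of the original columns, so the result stays interpretable; extraction methods instead build new features from combinations of the originals.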
