![Page 1: Machine Learning for Knowledge Dissemination in Creative ...dbis.fberg.tuke.sk/public/media/0134/pampuch-ml.pdf · Science Machine learning (ML) is a category of algorithm that allows](https://reader033.vdocument.in/reader033/viewer/2022042317/5f068e757e708231d4189375/html5/thumbnails/1.jpg)
Machine Learning for Knowledge Dissemination in Creative Economies
Krzysztof Pampuch
• What is machine learning?
• Basic terminology
• Systematics of ML methods
• How to measure the quality of our model
• Selected methods of ML
• What does ML look like in everyday practice?
Statistics, Computer Science
Machine learning (ML) is a category of algorithms that allow software applications to become more accurate in predicting outcomes without being explicitly programmed.
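| No. | Sepal length | Sepal width | Petal length | Petal width | Label |
|---|---|---|---|---|---|
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | Setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | Setosa |
| 3 | 6.4 | 3.5 | 4.5 | 1.2 | Versicolor |
| … | … | … | … | … | … |
| 100 | 5.9 | 3.0 | 5.0 | 1.8 | Virginica |

• Rows: observations
• Measurement columns: features (predictors)
• Last column: label (predicted variable)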
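In code, this terminology usually maps onto a feature matrix with one row per observation and a separate label vector. The sketch below uses only the three rows printed on the slide; the names `X`, `y` and `feature_names` are illustrative, and the first two columns are assumed to be the sepal measurements of the Iris data set.

```python
import numpy as np

# Illustrative column names (assumption: the Iris sepal/petal measurements).
feature_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Feature matrix: one row per observation, one column per feature (predictor).
X = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [6.4, 3.5, 4.5, 1.2],
])

# Label vector: the predicted variable, one entry per observation.
y = np.array(["Setosa", "Setosa", "Versicolor"])

n_observations, n_features = X.shape
print(n_observations, n_features)
```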
The McCulloch-Pitts neuron (1943)
Frank Rosenblatt's neuron, the perceptron (1957)
Learning concept:
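A minimal sketch of the classic perceptron learning rule, assuming 0/1 labels and a step activation. The function names and the logical-AND example are illustrative and may differ from what the slide presented.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, epochs=20):
    """Perceptron rule: adjust weights only when a prediction is wrong."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0   # step (threshold) activation
            update = lr * (target - pred)        # zero when the prediction is right
            w += update * xi
            b += update
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)

# Logical AND is linearly separable, so the rule converges on it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(predict(X, w, b))
```

The update is proportional to the prediction error, which is why correctly classified observations leave the weights unchanged.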
Machine learning
• unsupervised
  • clustering
  • dimensionality reduction
• supervised
  • classification
  • regression
• reinforcement learning
Quantitative
• can be expressed using specific units of measurement
Qualitative
• can be described only in words, cannot be ordered
Criteria:
• Efficiency
• Stability
  • For other samples
  • Over time
• Interpretability
• We split the dataset into:
  • Train set - used for training a model
  • Validation set - used to choose the best model
  • Test set - used to make sure that our model is stable
Data set partition: train | validation | test
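The three-way split can be sketched with a shuffled index array. The 60/20/20 proportions, the seed and the function name `train_val_test_split` are assumptions chosen for illustration; the slide does not prescribe any ratio.

```python
import numpy as np

def train_val_test_split(n_samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Return three disjoint index arrays that together cover range(n_samples)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)          # shuffle before cutting
    n_test = int(round(test_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = train_val_test_split(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

Models are fitted on `train_idx`, compared on `val_idx`, and the chosen model is checked once on `test_idx`.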
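k-fold cross-validation: in each of the k rounds a different fold serves as the test set and the remaining folds form the training set.
Each observation is used exactly once for testing and k-1 times for training
The quality of the model is the mean of the scores obtained on the k test sets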
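A minimal k-fold cross-validation sketch. The helper names `k_fold_indices` and `cross_val_score`, and the toy "predict the training mean" model, are illustrative assumptions rather than anything prescribed by the slide.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]

def cross_val_score(X, y, fit, score, k=5):
    """Average the score obtained on each of the k held-out folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        model = fit(X[train_idx], y[train_idx])
        scores.append(score(model, X[test_idx], y[test_idx]))
    return float(np.mean(scores))

# Toy model: always predict the training mean; score with mean squared error.
fit = lambda X_tr, y_tr: y_tr.mean()
mse = lambda model, X_te, y_te: float(np.mean((y_te - model) ** 2))
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X[:, 0]
print(cross_val_score(X, y, fit, mse, k=5))
```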
The expected error on a test set:
$$E\left[(y_i - \hat{y}_i)^2\right] = \mathrm{Var}(\hat{y}_i) + [\mathrm{Bias}(\hat{y}_i)]^2 + \mathrm{Var}(\varepsilon)$$
$\mathrm{Var}(\hat{y}_i)$ - variance
$\mathrm{Bias}(\hat{y}_i)$ - bias
$\mathrm{Var}(\varepsilon)$ - variance of a random component
• The bias reflects the error we make when approximating reality with a model
• The variance reflects how much the prediction would change if a different data set were used to train the model
• The variance of the random component is independent of the model and irreducible
• Best situation: negligible bias and variance
The more "flexible" the method, the lower the bias
$$E\left[(y_i - \hat{y}_i)^2\right] = \mathrm{Var}(\hat{y}_i) + [\mathrm{Bias}(\hat{y}_i)]^2 + \mathrm{Var}(\varepsilon)$$
The more "flexible" the method, the higher the variance
$$E\left[(y_i - \hat{y}_i)^2\right] = \mathrm{Var}(\hat{y}_i) + [\mathrm{Bias}(\hat{y}_i)]^2 + \mathrm{Var}(\varepsilon)$$
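The trade-off between the two reducible terms can be illustrated by simulation: fit a rigid and a more flexible model to many freshly drawn training sets and measure how the predictions behave. Everything in this sketch is an assumption made for illustration: the true function `sin(2*pi*x)`, the noise level, the sample sizes, and polynomial degree as the measure of flexibility.

```python
import numpy as np

def bias_variance(degree, n_train=30, n_repeats=200, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit by simulation."""
    rng = np.random.default_rng(seed)
    true_f = lambda x: np.sin(2 * np.pi * x)
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_repeats, x_test.size))
    for r in range(n_repeats):
        x = rng.uniform(0, 1, n_train)                    # a fresh training set
        y = true_f(x) + rng.normal(0, noise_sd, n_train)  # noisy observations
        coefs = np.polyfit(x, y, degree)
        preds[r] = np.polyval(coefs, x_test)
    # Bias: gap between the average prediction and the truth.
    bias_sq = np.mean((preds.mean(axis=0) - true_f(x_test)) ** 2)
    # Variance: spread of predictions across training sets.
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

for degree in (1, 5):
    b2, var = bias_variance(degree)
    print(f"degree={degree}: bias^2={b2:.3f}, variance={var:.3f}")
```

In this setup the straight line (degree 1) should show the larger squared bias and the degree-5 polynomial the larger variance.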
• Goal: to fit a linear function to our data
• $y = \beta_0 + \sum_{i=1}^{p} \beta_i x_i + \varepsilon$
• How to find the model coefficients?
• Minimizing the cost function: $L = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$
• Disadvantages: sensitivity to outliers, poor modeling of nonlinear relationships
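A minimal least-squares sketch on synthetic data. The coefficients in `beta_true`, the intercept 4.0 and the noise scale are hypothetical values chosen for the example; `np.linalg.lstsq` is one of several ways to minimize the squared-error cost.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 200, 3
X = rng.normal(size=(N, p))
beta_true = np.array([1.5, -2.0, 0.5])           # hypothetical coefficients
y = 4.0 + X @ beta_true + rng.normal(scale=0.1, size=N)

# Prepend a column of ones so the first coefficient plays the role of beta_0.
X_design = np.column_stack([np.ones(N), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

y_hat = X_design @ beta_hat
L = np.sum((y - y_hat) ** 2)                      # the cost being minimized
print(beta_hat.round(2), round(float(L), 3))
```

Because the cost squares each residual, a single extreme observation can pull the fitted coefficients noticeably, which is the outlier sensitivity listed among the disadvantages.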
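$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$
• Values in the range [0, 1] for a least-squares fit with an intercept, evaluated on its training data
• Interpretation: how much of the variance of the data does the model explain?
$\bar{y}$ - mean value of $y$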
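A minimal sketch of the coefficient of determination in its usual form, one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the toy vectors are illustrative.

```python
import numpy as np

def r_squared(y, y_hat):
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)        # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation around the mean
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y, y))                       # perfect predictions -> 1.0
print(r_squared(y, np.full(4, y.mean())))    # always predicting the mean -> 0.0
print(r_squared(y, y[::-1]))                 # worse than the mean -> -3.0
```

The last call shows that predictions worse than the mean push the value below zero, which can happen when a model is scored on data it was not fitted to.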
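• Misclassification rate: $MR = 1 - \frac{\sum_i f_{ii}}{\sum_{i,j} f_{ij}}$
• Accuracy: $ACC = 1 - MR$
• Multi-class log-loss: $MLL = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log(p_{ij})$
• ROC, AUC, F-measure: $F_1 = \frac{2TP}{2TP + FP + FN}$

Multi-class confusion matrix (rows: predicted value, columns: true value):

| Predicted \ True | 0 | 1 | 2 |
|---|---|---|---|
| 0 | $f_{00}$ | $f_{01}$ | $f_{02}$ |
| 1 | $f_{10}$ | $f_{11}$ | $f_{12}$ |
| 2 | $f_{20}$ | $f_{21}$ | $f_{22}$ |

Binary confusion matrix (rows: predicted value, columns: true value):

| Predicted \ True | 1/T | 0/N |
|---|---|---|
| 1/T | $TP$ | $FP$ |
| 0/N | $FN$ | $TN$ |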
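These measures can be computed directly from a confusion matrix and from predicted class probabilities. The counts in `f` below are made up for illustration, and the clipping constant `eps` is an implementation detail added here to avoid taking the logarithm of zero.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = predicted, columns = true.
f = np.array([
    [50,  2,  1],
    [ 3, 45,  5],
    [ 2,  3, 39],
])
acc = np.trace(f) / f.sum()     # correct predictions lie on the diagonal
mr = 1.0 - acc

def f1_score(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)

def multiclass_log_loss(y_onehot, p, eps=1e-15):
    """y_onehot and p have shape (N, M): N observations, M classes."""
    p = np.clip(p, eps, 1.0)    # avoid log(0)
    return -np.mean(np.sum(y_onehot * np.log(p), axis=1))

print(round(acc, 4), round(mr, 4), f1_score(tp=8, fp=2, fn=2))
```

Accuracy and misclassification rate depend only on hard class decisions, while the log-loss also penalizes confident but wrong probability estimates.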
K-means DBSCAN
Data → Feature engineering → Train set / Test set → Model learning → Model validation
• Data almost never has the desired format
• Often we have to acquire data from many sources
• Volume, inflow rate
• Examples of problems
  • Storage of terabytes of data
  • Data from various DBMS + external data
  • Data refreshing and retention
  • Consistency of data types
  • Unstructured data
  • Character encoding, number and date formats
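As a small illustration of the last item, the sketch below normalizes numbers written with a decimal comma, tries several date layouts, and decodes bytes from a legacy code page. The accepted formats, the code page `cp1250` and the helper names are assumptions; real sources need their own list.

```python
from datetime import date, datetime

def parse_number(text):
    """Accept both '1234.5' and the comma-decimal form '1 234,5'."""
    cleaned = text.replace("\u00a0", "").replace(" ", "")
    if "," in cleaned and "." not in cleaned:
        cleaned = cleaned.replace(",", ".")
    return float(cleaned)

DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")   # assumed input formats

def parse_date(text):
    """Try each known layout in turn and fail loudly on anything else."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

raw = "Krak\u00f3w".encode("cp1250")    # bytes as a legacy system might store them
print(parse_number("1 234,5"), parse_date("31.12.2019"), raw.decode("cp1250"))
```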
• The most time-consuming activity
• The type of processing required depends on the type of data and the problem
• Generating features - manual vs automatic
• Examples of feature generation:
  • Text: regular expressions, tokenization, lemmatization, bag-of-words, TF-IDF
  • Customer data: total payments, balance on accounts, number of logins, demographic data
  • Audio / video: signal framing, LPC, MFCC, color/gradient histograms, SIFT, SURF, bag-of-words
• Processing stages over time: preprocessing → dimensionality reduction → prediction
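For the text case, tokenization, bag-of-words and TF-IDF can be sketched with the standard library alone. The three documents are invented, lemmatization is skipped, and the IDF formula `log(N / df)` is one common variant among several.

```python
import math
import re
from collections import Counter

docs = [
    "Machine learning learns from data",
    "Data cleaning takes time",
    "Learning from text data",
]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())    # regular-expression tokenizer

bags = [Counter(tokenize(d)) for d in docs]       # bag-of-words per document
vocab = sorted(set().union(*bags))

def tf_idf(bag, term):
    tf = bag[term] / sum(bag.values())            # term frequency in this document
    df = sum(1 for b in bags if term in b)        # number of documents with the term
    return tf * math.log(len(bags) / df)

weights = [{t: tf_idf(b, t) for t in b} for b in bags]
print(vocab)
print(weights[0])
```

A word that occurs in every document, such as "data" here, receives weight zero, while a word unique to one document is weighted up.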
• High dimensionality of the space of features:
  • Degrades the predictive power of models
  • Introduces redundancy (variable correlation)
  • Leads to overfitting
  • Requires larger data sets to achieve the same goal
  • Increases the computational effort
  • And besides… decision-makers do not like complex models and many variables
• So let’s reduce the dimensionality!
• Principle of operation (most often):
  • The most accurate reproduction of data in the space of lower dimensionality
  • The best possible highlighting of information differentiating the predicted value of variables
Feature extraction: $(x_1, x_2, \dots, x_n) \xrightarrow{f} (y_1, y_2, \dots, y_k)$
Feature selection: $(x_1, x_2, \dots, x_n) \rightarrow (x_{i_1}, x_{i_2}, \dots, x_{i_k})$
$k < n$
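The difference between the two approaches can be sketched on synthetic data: selection keeps k of the original columns, extraction builds k new variables from all of them. Using column variance as the selection criterion and PCA (via SVD) as the extraction function are assumptions chosen for illustration; the slide does not name specific methods.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n, k = 200, 5, 2
# Synthetic data whose columns have very different spreads.
X = rng.normal(size=(n_samples, n)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

# Feature selection: keep the k original columns with the largest variance.
selected = np.sort(np.argsort(X.var(axis=0))[-k:])
X_selected = X[:, selected]

# Feature extraction (PCA): project onto the top-k principal directions.
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_extracted = X_centered @ Vt[:k].T

print(selected, X_selected.shape, X_extracted.shape)
```

Selected features keep their original meaning, which helps interpretability, while extracted features are combinations of all inputs and are uncorrelated by construction in the PCA case.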