descriptive machine learning explanations · improving the model - linear model and random forest...

16
DALEX Descriptive mAchine Learning EXplanations Alicja Gosiewska MI2 Data Lab Warsaw University of Technology

Upload: others

Post on 27-Jan-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Descriptive mAchine Learning EXplanations · Improving the model - Linear model and random forest had equal performance for apartments dataset. - In general the random forest model

DALEXDescriptive mAchine Learning EXplanations

Alicja GosiewskaMI2 Data Lab

Warsaw University of Technology

Page 2: Descriptive mAchine Learning EXplanations · Improving the model - Linear model and random forest had equal performance for apartments dataset. - In general the random forest model

Data and models

Page 3: Descriptive mAchine Learning EXplanations · Improving the model - Linear model and random forest had equal performance for apartments dataset. - In general the random forest model

Data and models

Page 4: Descriptive mAchine Learning EXplanations · Improving the model - Linear model and random forest had equal performance for apartments dataset. - In general the random forest model

Data and models

Page 5: Descriptive mAchine Learning EXplanations · Improving the model - Linear model and random forest had equal performance for apartments dataset. - In general the random forest model

The explain() function

explain(model, data, y, predict_function, link, ..., label)

Page 6: Descriptive mAchine Learning EXplanations · Improving the model - Linear model and random forest had equal performance for apartments dataset. - In general the random forest model

Model performance

Page 7: Descriptive mAchine Learning EXplanations · Improving the model - Linear model and random forest had equal performance for apartments dataset. - In general the random forest model

Model performance

Page 8: Descriptive mAchine Learning EXplanations · Improving the model - Linear model and random forest had equal performance for apartments dataset. - In general the random forest model

auditor: model performance

Page 9: Descriptive mAchine Learning EXplanations · Improving the model - Linear model and random forest had equal performance for apartments dataset. - In general the random forest model

Variable importance

Page 10: Descriptive mAchine Learning EXplanations · Improving the model - Linear model and random forest had equal performance for apartments dataset. - In general the random forest model

Variable response

Page 11: Descriptive mAchine Learning EXplanations · Improving the model - Linear model and random forest had equal performance for apartments dataset. - In general the random forest model

Improving the model

- Linear model and random forest had equal performance for apartments dataset.

- In general the random forest model has smaller residuals than the linear model but there is a small

fraction of very large residuals.

- Random forest model under-predicts expensive apartments. It is not a model that we would like to

employ.

Page 12: Descriptive mAchine Learning EXplanations · Improving the model - Linear model and random forest had equal performance for apartments dataset. - In general the random forest model

Improving the model

- Linear model and random forest had equal performance for apartments dataset.

- In general the random forest model has smaller residuals than the linear model but there is a small

fraction of very large residuals.

- Random forest model under-predicts expensive apartments. It is not a model that we would like to

employ.

- `construction_year` is important for the random forest model.

- the relation between `construction_year` and the price of square meter is non linear.

Page 13: Descriptive mAchine Learning EXplanations · Improving the model - Linear model and random forest had equal performance for apartments dataset. - In general the random forest model

Improving the model

Page 14: Descriptive mAchine Learning EXplanations · Improving the model - Linear model and random forest had equal performance for apartments dataset. - In general the random forest model

Improving the model

Page 15: Descriptive mAchine Learning EXplanations · Improving the model - Linear model and random forest had equal performance for apartments dataset. - In general the random forest model
Page 16: Descriptive mAchine Learning EXplanations · Improving the model - Linear model and random forest had equal performance for apartments dataset. - In general the random forest model

We acknowledge the financial support from the NCN Opus grant

2016/21/B/ST6/02176

Acknowledgements