Big Data: New Tricks for Econometrics
Varian, Hal R. "Big data: New tricks for econometrics." The Journal of Economic Perspectives (2014): 3-27.
Konstantina Christakopoulou, Liang Zeng (Group G21)
Related to Chapter 28: Data Mining
Motivation: Machine Learning for Economic Transactions. Linear Regression Is Not Enough!
- Big data size
- A lot of features: variables must be selected
- Relationships are not only linear!
Connection to the Course: Decision Trees (e.g., ID3)
Challenges of ID3:
- Cannot handle continuous attributes
- Prone to outliers
1. C4.5 and Classification And Regression Trees (CART) can handle:
+ continuous and discrete attributes
+ missing attributes
+ over-fitting, via post-pruning
2. Random Forests: an ensemble of decision trees. Randomization (choosing samples + choosing attributes) leads to better accuracy!
ID3 Decision Tree
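ID3 grows the tree by splitting on the attribute with the largest information gain (reduction in entropy). A minimal stdlib sketch of that computation on a made-up weather-style attribute (the data and attribute name are illustrative, not from the paper):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

# hypothetical attribute: each value of "outlook" maps to the labels it covers
outlook = {"sunny": ["no", "no", "yes"], "rain": ["yes", "yes", "no"]}
labels = [l for branch in outlook.values() for l in branch]

# information gain = parent entropy - weighted entropy of the branches
gain = entropy(labels) - sum(
    len(branch) / len(labels) * entropy(branch) for branch in outlook.values()
)
print(round(gain, 3))  # ID3 splits on the attribute with the largest gain
```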
Classification and Regression Trees (CART)
A classification tree is used when the predicted outcome is the class to which the data belongs.
A regression tree is used when the predicted outcome can be considered a real number (e.g., the age of a house, or a patient's length of stay in a hospital).
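The same CART machinery covers both cases; only the leaf prediction changes. A minimal scikit-learn sketch (the four data points and the "house age" target are made up for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0]])      # one numeric feature
y_class = ["old", "old", "new", "new"]          # class labels -> classification tree
y_real = [30.0, 28.0, 5.0, 3.0]                 # real-valued outcome (e.g. house age) -> regression tree

clf = DecisionTreeClassifier().fit(X, y_class)  # leaves predict a class
reg = DecisionTreeRegressor().fit(X, y_real)    # leaves predict a number (leaf mean)
print(clf.predict([[4.0]]), reg.predict([[4.0]]))
```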
Classification and Regression Trees (CART)
Predict Titanic survivors using age and class.
Classification and Regression Trees (CART)
A CART for survivors of the Titanic, built in R.
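The slide builds the tree in R; a comparable sketch with scikit-learn shows the same kind of readable split rules. The handful of (age, class) rows below are made up for illustration, not the real passenger data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# hypothetical rows: [age, passenger class]; 1 = survived (not the real Titanic data)
X = np.array([[5, 3], [8, 2], [30, 1], [40, 1], [25, 3], [50, 3], [35, 2], [60, 3]])
y = np.array([1, 1, 1, 1, 0, 0, 1, 0])

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
rules = export_text(tree, feature_names=["age", "class"])
print(rules)  # human-readable splits, analogous to the plotted R tree
```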
Random Forests
Random Forests
- Choose a bootstrap sample and start to grow a tree.
- At each node: choose a random sample of predictors to make the next decision.
- Repeat many times to grow a forest of trees.
- For prediction: have each tree make its prediction, then take a majority vote.
Decision Tree Learning
+ One tree
+ Trained on all learning samples
+ Prone to distortions, e.g., outliers

Random Forest
+ Many decision trees
+ Each tree trained on a random subset of samples
+ Reduces the effect of outliers (less overfitting)
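The steps above can be sketched directly: bootstrap each tree's sample, restrict each split to a random subset of predictors (`max_features="sqrt"` here), and combine predictions by majority vote. The synthetic dataset and parameter choices are illustrative assumptions, not from the paper:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), len(X))           # bootstrap sample (with replacement)
    t = DecisionTreeClassifier(max_features="sqrt",  # random predictor subset at each node
                               random_state=0)
    trees.append(t.fit(X[idx], y[idx]))

votes = np.stack([t.predict(X) for t in trees])      # each tree makes its prediction...
majority = (votes.mean(axis=0) > 0.5).astype(int)    # ...then a majority vote
print("training accuracy:", (majority == y).mean())
```

In practice `sklearn.ensemble.RandomForestClassifier` wraps this whole loop; the manual version just makes the bootstrap-plus-random-predictors randomization visible.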
Thank you!