machine learning from disaster
![Page 1: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/1.jpg)
MACHINE LEARNING FROM DISASTER
F#unctional Londoners @ Skills Matter
Phil Trelford 2013 @ptrelford
![Page 2: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/2.jpg)
RMS Titanic
On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew.
…there were not enough lifeboats for the passengers and crew.
…some groups of people were more likely to survive than others, such as women, children, and the upper-class.
![Page 3: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/3.jpg)
Kaggle competition
![Page 4: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/4.jpg)
Kaggle Titanic dataset
train.csv
test.csv

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.25 | | S |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.05 | | S |
| 6 | 0 | 3 | Moran, Mr. James | male | | 0 | 0 | 330877 | 8.4583 | | Q |
| 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2 | 3 | 1 | 349909 | 21.075 | | S |
| 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27 | 0 | 2 | 347742 | 11.1333 | | S |
| 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14 | 1 | 0 | 237736 | 30.0708 | | C |
| 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4 | 1 | 1 | PP 9549 | 16.7 | G6 | S |
| 12 | 1 | 1 | Bonnell, Miss. Elizabeth | female | 58 | 0 | 0 | 113783 | 26.55 | C103 | S |
| 13 | 0 | 3 | Saundercock, Mr. William Henry | male | 20 | 0 | 0 | A/5. 2151 | 8.05 | | S |
| 14 | 0 | 3 | Andersson, Mr. Anders Johan | male | 39 | 1 | 5 | 347082 | 31.275 | | S |
| 15 | 0 | 3 | Vestrom, Miss. Hulda Amanda Adolfina | female | 14 | 0 | 0 | 350406 | 7.8542 | | S |
| 16 | 1 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55 | 0 | 0 | 248706 | 16 | | S |
| 17 | 0 | 3 | Rice, Master. Eugene | male | 2 | 4 | 1 | 382652 | 29.125 | | Q |
| 18 | 1 | 2 | Williams, Mr. Charles Eugene | male | | 0 | 0 | 244373 | 13 | | S |
| 19 | 0 | 3 | Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele) | female | 31 | 1 | 0 | 345763 | 18 | | S |
| 20 | 1 | 3 | Masselmani, Mrs. Fatima | female | | 0 | 0 | 2649 | 7.225 | | C |
| 21 | 0 | 2 | Fynney, Mr. Joseph J | male | 35 | 0 | 0 | 239865 | 26 | | S |
| 22 | 1 | 2 | Beesley, Mr. Lawrence | male | 34 | 0 | 0 | 248698 | 13 | D56 | S |
| 23 | 1 | 3 | McGowan, Miss. Anna "Annie" | female | 15 | 0 | 0 | 330923 | 8.0292 | | Q |
| 24 | 1 | 1 | Sloper, Mr. William Thompson | male | 28 | 0 | 0 | 113788 | 35.5 | A6 | S |
| 25 | 0 | 3 | Palsson, Miss. Torborg Danira | female | 8 | 3 | 1 | 349909 | 21.075 | | S |
| 26 | 1 | 3 | Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson) | female | 38 | 1 | 5 | 347077 | 31.3875 | | S |
| 27 | 0 | 3 | Emir, Mr. Farred Chehab | male | | 0 | 0 | 2631 | 7.225 | | C |
| 28 | 0 | 1 | Fortune, Mr. Charles Alexander | male | 19 | 3 | 2 | 19950 | 263 | C23 C25 C27 | S |
![Page 5: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/5.jpg)
DATA ANALYSIS
Titanic: Machine Learning from Disaster
![Page 6: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/6.jpg)
FSharp.Data: CSV Provider
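For comparison with the typed access a CSV type provider gives, here is a minimal Python sketch that reads the same column layout with the standard `csv` module. The two sample rows are copied from the dataset slide; the in-memory `sample` string stands in for `train.csv` and is an assumption made so the snippet runs on its own. Unlike the type provider, every field arrives as a string, and a missing age arrives as an empty string.

```python
import csv
import io

# Stand-in for train.csv: the header plus two rows from the dataset slide.
sample = '''PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
'''

# DictReader keys each row by the header names; all values are strings.
passengers = list(csv.DictReader(io.StringIO(sample)))

first = passengers[0]
name = first["Name"]                  # quoted field keeps its comma
survived = int(first["Survived"])     # conversion is manual here
missing_age = passengers[1]["Age"]    # empty string, not a number
```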
![Page 7: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/7.jpg)
Counting
let female (passenger:Passenger) = passenger.Sex = "female"
let survived (passenger:Passenger) = passenger.Survived = 1
let females = passengers |> where female
let femaleSurvivors = females |> tally survived
let femaleSurvivorsPc = females |> percentage survived
![Page 8: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/8.jpg)
Tally Ho!
/// Tally up items that match specified criteria
let tally criteria items =
    items |> Array.filter criteria |> Array.length
/// Percentage of items that match specified criteria
let percentage criteria items =
    let total = items |> Array.length
    let count = items |> tally criteria
    float count * 100.0 / float total
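A small worked example may help here. The Python sketch below mirrors `tally` and `percentage` and applies them to the sex and survival columns of the first nine rows of the dataset slide. The tuple layout and the snake_case names are assumptions made for illustration, not part of the talk's code.

```python
# (sex, survived) for passengers 1-9 on the dataset slide.
passengers = [
    ("male", 0), ("female", 1), ("female", 1), ("female", 1), ("male", 0),
    ("male", 0), ("male", 0), ("male", 0), ("female", 1),
]

def tally(criteria, items):
    """Count the items that match the criteria."""
    return sum(1 for item in items if criteria(item))

def percentage(criteria, items):
    """Percentage of items that match the criteria (items must be non-empty)."""
    return tally(criteria, items) * 100.0 / len(items)

def female(p):
    return p[0] == "female"

def survived(p):
    return p[1] == 1

females = [p for p in passengers if female(p)]
female_survivors = tally(survived, females)
female_survivors_pc = percentage(survived, females)
overall_pc = percentage(survived, passengers)
```

In this nine-row sample all four women survived, so `female_survivors_pc` is 100.0 while `overall_pc` is 4 in 9. Both functions divide by the group size, so an empty group raises `ZeroDivisionError`.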
![Page 9: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/9.jpg)
Survival rate
/// Survival rate of a criteria’s group
let survivalRate criteria =
    passengers |> Array.groupBy criteria
    |> Array.map (fun (key,matching) ->
        key, matching |> Array.percentage survived
    )
let embarked = survivalRate (fun p -> p.Embarked)
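The grouping step can be sketched in Python as follows: group the passengers by a key, then compute the survival percentage inside each group. The data is the port of embarkation and survival flag of the first nine rows of the dataset slide; the function signature (passengers passed as an argument) is an assumption made to keep the example self-contained.

```python
from collections import defaultdict

# (embarked, survived) for passengers 1-9 on the dataset slide.
passengers = [
    ("S", 0), ("C", 1), ("S", 1), ("S", 1), ("S", 0),
    ("Q", 0), ("S", 0), ("S", 0), ("S", 1),
]

def survival_rate(key, items):
    """Map each group key to the percentage of survivors in that group."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return {
        k: sum(1 for _, survived in group if survived == 1) * 100.0 / len(group)
        for k, group in groups.items()
    }

embarked = survival_rate(lambda p: p[0], passengers)
```

With only nine rows the rates are not meaningful (Cherbourg and Queenstown each have a single passenger), which is a reminder to check group sizes before reading much into a percentage.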
![Page 10: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/10.jpg)
Score
// child is not shown on these slides; it is assumed to be a Passenger predicate like female
let score f = passengers |> Array.percentage (fun p -> f p = survived p)
let rate = score (fun p -> (child p || female p) && not (p.Pclass = 3))
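To make the scoring idea concrete, the Python sketch below evaluates the same hand-written rule against the first nine rows of the dataset slide. The `Passenger` tuple, the under-16 cutoff in `child`, and treating a missing age as "not a child" are all assumptions; the slides do not show how `child` is defined.

```python
from typing import NamedTuple, Optional

class Passenger(NamedTuple):
    sex: str
    age: Optional[float]   # None where the age is missing
    pclass: int
    survived: int

# Passengers 1-9 from the dataset slide.
passengers = [
    Passenger("male", 22, 3, 0), Passenger("female", 38, 1, 1),
    Passenger("female", 26, 3, 1), Passenger("female", 35, 1, 1),
    Passenger("male", 35, 3, 0), Passenger("male", None, 3, 0),
    Passenger("male", 54, 1, 0), Passenger("male", 2, 3, 0),
    Passenger("female", 27, 3, 1),
]

def female(p):
    return p.sex == "female"

def child(p):
    # Assumed cutoff: under 16; unknown ages count as adult.
    return p.age is not None and p.age < 16

def score(rule, items):
    """Percentage of passengers whose outcome the rule predicts correctly."""
    hits = sum(1 for p in items if rule(p) == (p.survived == 1))
    return hits * 100.0 / len(items)

rate = score(lambda p: (child(p) or female(p)) and p.pclass != 3, passengers)
```

On this sample the rule is right for seven of the nine passengers; the two misses are third-class women who survived.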
![Page 11: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/11.jpg)
MACHINE LEARNING
Titanic: Machine Learning from Disaster
![Page 12: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/12.jpg)
20 Questions
The game suggests that the information (as measured by Shannon's entropy statistic) required to identify an arbitrary object is at most 20 bits. The game is often used as an example when teaching people about information theory. Mathematically, if each question is structured to eliminate half the objects, 20 questions will allow the questioner to distinguish between 2^20 or 1,048,576 objects.
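The arithmetic can be checked with a few lines of Python. The helper names `entropy` and `questions_needed` are illustrative, not from the talk.

```python
import math

def entropy(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def questions_needed(object_count):
    """Yes/no questions needed if each one halves the remaining candidates."""
    return math.ceil(math.log2(object_count))

objects = 2 ** 20                  # 1,048,576
bits = math.log2(objects)          # information needed to pick one object
one_question = entropy([0.5, 0.5])  # a perfectly balanced yes/no question
```

A question that splits the candidates evenly yields one bit, so twenty such questions yield the 20 bits needed for 1,048,576 equally likely objects. An unbalanced question, say 90/10, yields less than one bit on average.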
![Page 13: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/13.jpg)
Decision Trees
A tree can be "learned" by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning.
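Recursive partitioning can be sketched in Python as below. This is an illustrative ID3-style builder, not the `createTree` used in the talk (which the transcript does not show): choosing the attribute with the lowest weighted entropy, the nested-dict tree shape, and all function names are assumptions. Each row is a list of feature values followed by the class label, here sex, class and survival for the first nine rows of the dataset slide.

```python
import math
from collections import Counter

def entropy(rows):
    """Shannon entropy (bits) of the class labels in the last column."""
    total = len(rows)
    counts = Counter(row[-1] for row in rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def split(rows, axis, value):
    """Rows whose feature at `axis` equals `value`, with that column removed."""
    return [row[:axis] + row[axis + 1:] for row in rows if row[axis] == value]

def best_axis(rows):
    """Feature index whose split leaves the lowest weighted entropy."""
    def remainder(axis):
        subsets = [split(rows, axis, v) for v in {row[axis] for row in rows}]
        return sum(len(s) / len(rows) * entropy(s) for s in subsets)
    return min(range(len(rows[0]) - 1), key=remainder)

def create_tree(rows, labels):
    classes = [row[-1] for row in rows]
    if len(set(classes)) == 1:      # pure subset: stop and emit a leaf
        return classes[0]
    if len(rows[0]) == 1:           # no features left: majority vote
        return Counter(classes).most_common(1)[0][0]
    axis = best_axis(rows)
    rest = labels[:axis] + labels[axis + 1:]
    values = sorted({row[axis] for row in rows})
    return {labels[axis]: {v: create_tree(split(rows, axis, v), rest)
                           for v in values}}

rows = [
    ["male", 3, False], ["female", 1, True], ["female", 3, True],
    ["female", 1, True], ["male", 3, False], ["male", 3, False],
    ["male", 1, False], ["male", 3, False], ["female", 3, True],
]
tree = create_tree(rows, ["sex", "class"])
```

On these nine rows a single split on sex already separates survivors from non-survivors, so the recursion stops after one level. The full training set is not that clean and produces deeper trees.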
![Page 14: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/14.jpg)
Split data set (from ML in Action)
Python
def splitDataSet(dataSet, axis, value):
    retDataSet = []
    for featVec in dataSet:
        if featVec[axis] == value:
            reducedFeatVec = featVec[:axis]
            reducedFeatVec.extend(featVec[axis+1:])
            retDataSet.append(reducedFeatVec)
    return retDataSet
F#
let splitDataSet(dataSet:obj[][], axis, value) =
    [|for featVec in dataSet do
        if featVec.[axis] = value then
            yield featVec |> Array.removeAt axis|]
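A short usage example shows what the split returns. The Python sketch below is a condensed equivalent of the slide's function (renamed `split_data_set`, an illustrative choice) applied to four hypothetical rows of sex, class and survival.

```python
def split_data_set(data_set, axis, value):
    """Keep rows whose feature at `axis` equals `value`, dropping that column."""
    return [row[:axis] + row[axis + 1:]
            for row in data_set if row[axis] == value]

# Hypothetical rows: [sex, class, survived]
data_set = [
    ["male", 3, False],
    ["female", 1, True],
    ["female", 3, True],
    ["male", 1, False],
]

females = split_data_set(data_set, 0, "female")    # splits on sex
first_class = split_data_set(data_set, 1, 1)       # splits on class
```

Here `females` is `[[1, True], [3, True]]` and `first_class` is `[["female", True], ["male", False]]`. The tested column is removed from each result because, once a branch has fixed its value, it carries no further information for that subtree.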
![Page 15: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/15.jpg)
Decision Tree
let labels =
    [|"sex"; "class"|]
let features (p:Passenger) : obj[] =
    [|box p.Sex; box p.Pclass|]
let dataSet : obj[][] =
    [|for passenger in passengers ->
        [|yield! features passenger
          yield box (passenger.Survived = 1)|] |]
let tree = createTree(dataSet, labels)
![Page 16: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/16.jpg)
Overfitting
![Page 17: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/17.jpg)
CLASSIFY
Titanic: Machine Learning from Disaster
![Page 18: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/18.jpg)
Decision Tree: Create -> Classify
let rec classify(inputTree, featLabels:string[], testVec:obj[]) =
    match inputTree with
    | Leaf(x) -> x
    | Branch(s,xs) ->
        let featIndex = featLabels |> Array.findIndex ((=) s)
        xs |> Array.pick (fun (value,tree) ->
            if testVec.[featIndex] = value
            then classify(tree, featLabels,testVec) |> Some
            else None
        )
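The same walk can be sketched in Python, assuming the tree is stored as nested dictionaries (`{label: {value: subtree}}`) with bare values as leaves. Both that representation and the hand-written `tree` below are assumptions made for illustration; the tree is not a result from the talk.

```python
def classify(tree, feat_labels, test_vec):
    """Follow the branches matching test_vec until a leaf is reached."""
    if not isinstance(tree, dict):            # leaf: the predicted class
        return tree
    label, branches = next(iter(tree.items()))
    feat_index = feat_labels.index(label)     # which column this node tests
    subtree = branches[test_vec[feat_index]]  # KeyError for unseen values
    return classify(subtree, feat_labels, test_vec)

# Hypothetical tree: women survive unless travelling third class.
tree = {"sex": {"male": False,
                "female": {"class": {1: True, 2: True, 3: False}}}}
labels = ["sex", "class"]

first_class_woman = classify(tree, labels, ["female", 1])
third_class_woman = classify(tree, labels, ["female", 3])
first_class_man = classify(tree, labels, ["male", 1])
```

A feature value that never appeared in training has no branch, so the lookup raises `KeyError`. The F# version behaves similarly, since `Array.pick` throws when no branch matches. A more forgiving classifier would fall back to a default class in that case.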
![Page 19: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/19.jpg)
Titanic Data
| Variable | Description |
| --- | --- |
| survival | Survival (0 = No; 1 = Yes) |
| pclass | Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd) |
| name | Name |
| sex | Sex |
| age | Age |
| sibsp | Number of Siblings/Spouses Aboard |
| parch | Number of Parents/Children Aboard |
| ticket | Ticket Number |
| fare | Passenger Fare |
| cabin | Cabin |
| embarked | Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton) |
Tips:
* Empty floats - Double.NaN
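The tip matters because NaN silently poisons arithmetic and never compares equal to anything, itself included. A minimal Python sketch of the same situation follows; `parse_age` is a hypothetical helper and the four age strings are illustrative.

```python
import math

def parse_age(text):
    """Parse an age field, mapping an empty field to NaN."""
    return float(text) if text.strip() else math.nan

ages = [parse_age(t) for t in ["22", "38", "", "54"]]

# NaN propagates through sums, so filter it out before averaging.
naive_total = sum(ages)
known = [a for a in ages if not math.isnan(a)]
mean_age = sum(known) / len(known)

# NaN is not equal to itself, so test with isnan rather than ==.
missing = ages[2]
is_missing = math.isnan(missing)
```

Any feature built from age, such as a "child" test, has to decide explicitly what a missing age means, because comparisons like `age < 16` are simply false for NaN.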
![Page 20: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/20.jpg)
RESOURCES
Titanic: Machine Learning from Disaster
![Page 21: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/21.jpg)
Special thanks!
◦ Mathias Brandewinder for the Machine Learning samples
  ◦ http://www.clear-lines.com/blog/
◦ Tomas Petricek & Gustavo Guerra for the FSharp.Data library
  ◦ http://fsharp.github.io/FSharp.Data/
◦ F# Team for Type Providers
  ◦ http://blogs.msdn.com/b/dsyme/archive/2013/01/30/twelve-type-providers-in-pictures.aspx
◦ Peter Harrington for the Machine Learning in Action code samples
  ◦ http://www.manning.com/pharrington/
◦ Kaggle for the Titanic data set
  ◦ http://www.kaggle.com/c/titanic-gettingStarted
![Page 22: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/22.jpg)
Machine Learning Job Trends
Source: indeed.co.uk
![Page 23: Machine learning from disaster](https://reader033.vdocument.in/reader033/viewer/2022061223/54c64e754a7959ad7b8b458c/html5/thumbnails/23.jpg)
What next?
F# Machine Learning information
◦ http://fsharp.org/machine-learning/
Random Forests
◦ http://tinyurl.com/randomforests
Progressive F# Tutorials
◦ http://skillsmatter.com/event/scala/progressive-f-tutorials-2013