machine learning in practice lecture 3 carolyn penstein rosé language technologies institute/...

Machine Learning in Practice Lecture 3

Carolyn Penstein Rosé

Language Technologies Institute / Human-Computer Interaction Institute

Plan for Today

Announcements: Assignment 2, Quiz 1

Weka helpful hints

Topic of the day: Input and Output

More on cross-validation

ARFF format

Weka Helpful Hints

Increase Heap Size

Weka Helpful Hint: Documentation!!

Click on the More button!

Output Predictions Option

Important note: Because of the way Weka randomizes the data for cross-validation, the only circumstance under which you can match the instance numbers to positions in your data is when you have separate train and test sets, so that the order is preserved!
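To see why instance numbers stop matching data positions under cross-validation, here is a minimal sketch in Python. It is not Weka's actual randomization procedure; the function name `shuffled_folds`, the seed, and the fold assignment are assumptions made only for illustration.

```python
import random

def shuffled_folds(n_instances, n_folds, seed):
    """Shuffle the row positions, then deal them into folds (illustrative only)."""
    order = list(range(n_instances))
    random.Random(seed).shuffle(order)
    return [order[i::n_folds] for i in range(n_folds)]

# Cross-validation: predictions come out fold by fold, in shuffled order.
folds = shuffled_folds(n_instances=10, n_folds=5, seed=1)
prediction_order = [row for fold in folds for row in fold]

# Separate train and test sets: the test file is read top to bottom,
# so prediction i corresponds to row i of the test data.
separate_test_order = list(range(10))

print("cross-validation order:", prediction_order)
print("separate test set order:", separate_test_order)
```

Every row still appears exactly once in the cross-validation output, but its position in the output no longer tells you which row of the original file it was.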

View Classifier Errors

Input and Output

Representations

Concept: the rule you want to learn

Instance: one data point from your training or testing data (row in table)

Attribute: one of the features that an instance is composed of (column in table)

Numeric versus Nominal Attributes

What kind of reasoning does your representation enable?

Numeric attributes allow instances to be ordered

Numeric attributes allow you to measure distance between instances

Sometimes numeric attributes make too fine-grained a distinction

.2 .25 .28 .31 .35 .45 .47 .52 .6 .63
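The two kinds of reasoning that numeric attributes enable, ordering and distance, can be sketched with the values from the slide. This is only an illustration; the helper name `distance` is an assumption, and absolute difference is just one possible distance measure.

```python
# Values of one numeric attribute, taken from the slide.
values = [0.2, 0.25, 0.28, 0.31, 0.35, 0.45, 0.47, 0.52, 0.6, 0.63]

def distance(a, b):
    """Distance between two instances along a single numeric attribute."""
    return abs(a - b)

# Ordering: numeric values can be sorted.
ordered = sorted(values)

# Distance: we can measure how far apart neighboring instances are.
gaps = [round(distance(a, b), 2) for a, b in zip(ordered, ordered[1:])]

print(ordered)
print(gaps)
```

Some neighboring values differ by only a few hundredths, which is the kind of distinction that may be too fine-grained to matter for the concept being learned.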

Numeric versus Nominal Attributes

.2 .25 .28 .31 .35 .45 .47 .52 .6 .63

.2 .3 .5 .6

Numeric attributes can be discretized into nominal values

Then you lose ordering and distance

Another option is applying a function that maps a range of values into a single numeric value

Nominal attributes can be mapped into numbers, e.g., decide that blue=1 and green=2

But are inferences made based on this valid?
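The options on this slide can be sketched as follows. The cut points, the bin labels, and the representative values are hypothetical choices made so that the ten values collapse to .2, .3, .5, and .6 as on the slide; the slide does not say how that mapping was actually produced.

```python
import math

values = [0.2, 0.25, 0.28, 0.31, 0.35, 0.45, 0.47, 0.52, 0.6, 0.63]

# Hypothetical bins: (upper bound, nominal label, representative number).
BINS = [
    (0.225, "bin_a", 0.2),
    (0.40, "bin_b", 0.3),
    (0.56, "bin_c", 0.5),
    (math.inf, "bin_d", 0.6),
]

def find_bin(value):
    """Return the (label, representative) pair of the first bin containing value."""
    for upper, label, representative in BINS:
        if value < upper:
            return label, representative
    raise ValueError(f"no bin for {value!r}")

# Option 1: discretize into nominal values (ordering and distance are lost).
nominal = [find_bin(v)[0] for v in values]

# Option 2: map each range to a single number (ordering and distance are kept).
coarse = [find_bin(v)[1] for v in values]

# Mapping nominal values to numbers introduces an order and a distance
# that the original values never had.
COLOR_CODES = {"blue": 1, "green": 2}
implied_order = COLOR_CODES["green"] > COLOR_CODES["blue"]
implied_distance = COLOR_CODES["green"] - COLOR_CODES["blue"]

print(nominal)
print(coarse)
print(implied_order, implied_distance)
```

A learner given the color codes could conclude that green is "greater than" blue, or that the two are "1 apart", even though neither statement means anything for colors.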

Example!

Problem: Learn a rule that predicts how much time a person spends doing math problems each day

Attributes: You know gender, age, socio-economic status of parents, chosen field if any

How would you represent age, and why? What would you expect the target rule to look like?

Styles of Learning

Classification: learn rules from labeled instances that allow you to assign new instances to a class

Association: look for relationships between features, not just rules that predict a class from an instance (more general)

Clustering: look for instances that are similar (involves comparisons of multiple features)

Numeric Prediction (regression models)

Food Web

http://www.cas.psu.edu/DOCS/WEBCOURSE/WETLAND/WET1/identify.html

Food Web

What else would be affected if wheat were to disappear?

Food Web

How would you represent this data?

Food Web

What would the learned rule look like?

Food Web

What if you wanted a more general rule, i.e., Affects(Entity1, Entity2)?

122 rows altogether! Now let's look at the learned rule... Does it have to be this complicated?

Food Web

What would your representation for Affects(Entity1, Entity2) look like?
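One possible representation, offered only as a sketch, is a table with one row per ordered pair of entities and a yes/no class label. The toy food web below is hypothetical (it is not the web from the slides, and it does not reproduce the 122 rows), and treating "affects" as "depends on, directly or through a chain of prey" is an assumption.

```python
from itertools import permutations

# Hypothetical toy food web: each consumer maps to the things it eats.
EATS = {
    "mouse": ["wheat"],
    "grasshopper": ["wheat"],
    "owl": ["mouse"],
    "frog": ["grasshopper"],
}

def entities(eats):
    """All entity names that appear as a consumer or as food."""
    names = set(eats)
    for foods in eats.values():
        names.update(foods)
    return sorted(names)

def affects(source, target, eats):
    """True if target depends on source, directly or through a chain of prey."""
    stack = list(eats.get(target, []))
    seen = set()
    while stack:
        food = stack.pop()
        if food == source:
            return True
        if food not in seen:
            seen.add(food)
            stack.extend(eats.get(food, []))
    return False

def build_rows(eats):
    """One instance per ordered pair: (Entity1, Entity2, class label)."""
    return [
        (a, b, "yes" if affects(a, b, eats) else "no")
        for a, b in permutations(entities(eats), 2)
    ]

rows = build_rows(EATS)
affected_by_wheat = sorted(b for a, b, label in rows if a == "wheat" and label == "yes")

for row in rows:
    print(row)
print("affected if wheat disappears:", affected_by_wheat)
```

With two nominal attributes that only name the entities, a learner can do little more than memorize pairs, which may be one reason a rule learned from such a table ends up complicated.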

Food Web
