
CP365: Artificial Intelligence

Tech News!

Apple news conference tomorrow?

Google cancels Project Ara modular phone

Weather-Based Stock Market Predictions?

Dataset Preparation

Clean – remove bogus data/fill in missing data

Normalize data – adjust features to be similar magnitudes

Deal with Missing Data

Option 1: remove datapoints with any missing feature values

Option 2: fill in missing data with <data_missing> tags for categorical data

Option 3: fill in missing data with global means for numeric data

Option 4: fill in missing data with values from similar data points
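A minimal sketch of the first three options in Python, assuming each datapoint is a dict mapping feature names to values and None marks a missing value (both conventions are mine, for illustration). Option 4 would additionally need a similarity measure, e.g. nearest neighbors.

```python
def drop_incomplete(data):
    """Option 1: remove datapoints with any missing feature values."""
    return [pt for pt in data if None not in pt.values()]

def fill_categorical(data, feature, tag="<data_missing>"):
    """Option 2: tag missing categorical values with a special token."""
    for pt in data:
        if pt[feature] is None:
            pt[feature] = tag

def fill_with_global_mean(data, feature):
    """Option 3: fill missing numeric values with the global mean."""
    present = [pt[feature] for pt in data if pt[feature] is not None]
    mean = sum(present) / len(present)
    for pt in data:
        if pt[feature] is None:
            pt[feature] = mean
```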

Remove Outliers

Some datapoints may have ridiculous feature values.

We can remove outliers from our dataset to increase performance.

What is an outlier?

Outliers

Patient Height (cm)   Patient Weight (kg)   ...   Prognosis
131.2                 59.2                  ...   Good
176.7                 82.9                  ...   Good
12613.9               66.0                  ...   Poor
161.0                 70.2                  ...   Poor

The height of 12613.9 cm is an obvious outlier. How can we define what makes an outlier?

We could use 3σ as the threshold.

The height column has mean x̄ = 156.3 and σ = 23.1 (computed without the possible outlier).

The 3σ thresholds would be (156.3 - 3 * 23.1, 156.3 + 3 * 23.1), or (87.0, 225.6).
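A sketch of the 3σ test in Python. Note that, as on the slide, the mean and σ are computed without the suspect point; an extreme value left in can inflate σ enough to hide itself. Function names are my own.

```python
import statistics

def is_outlier(values, idx, num_sigmas=3.0):
    """3-sigma test: compare values[idx] against the mean and standard
    deviation of the *remaining* values (the suspect point is left out)."""
    rest = values[:idx] + values[idx + 1:]
    mean = statistics.mean(rest)
    sigma = statistics.stdev(rest)
    return abs(values[idx] - mean) > num_sigmas * sigma

heights = [131.2, 176.7, 12613.9, 161.0]
kept = [h for i, h in enumerate(heights) if not is_outlier(heights, i)]
print(kept)  # [131.2, 176.7, 161.0] -- the 12613.9 cm entry is dropped
```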

A Bad Dataset

Patient Height (nm)   Patient Weight (tons)   ...   Prognosis
1.31 x 10⁹            0.065                   ...   Good
1.76 x 10⁹            0.091                   ...   Good
1.23 x 10⁹            0.073                   ...   Poor
1.61 x 10⁹            0.077                   ...   Poor

How will these large differences affect learning?

Data Normalization Procedure

Patient Height (nm): 1.31 x 10⁹, 1.76 x 10⁹, 1.23 x 10⁹, 1.61 x 10⁹

Range of extreme values: (1.23 x 10⁹, 1.76 x 10⁹)

Normalized range: (0.0, 1.0), or (-1.0, 1.0)

[Diagram: each height maps linearly from the old range onto the normalized range.]

Data Normalization Formula

$newpt = \frac{pt - oldmin}{oldmax - oldmin} \cdot (newmax - newmin) + newmin$

Patient Height (nm): 1.31 x 10⁹, 1.76 x 10⁹, 1.23 x 10⁹, 1.61 x 10⁹

Say we want the normalized value, newpt, for the first height, 1.31 x 10⁹, called pt.

oldmax = 1.76 x 10⁹, oldmin = 1.23 x 10⁹
newmax = 1.0, newmin = 0.0

newpt = ((1.31 x 10⁹ − 1.23 x 10⁹) / (1.76 x 10⁹ − 1.23 x 10⁹)) ⋅ (1.0 − 0.0) + 0.0 ≈ 0.15
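The formula translates directly into Python; a small sketch using the slide's heights (the function name is my own):

```python
def normalize(pt, oldmin, oldmax, newmin=0.0, newmax=1.0):
    """Map pt linearly from [oldmin, oldmax] onto [newmin, newmax]."""
    return (pt - oldmin) / (oldmax - oldmin) * (newmax - newmin) + newmin

heights = [1.31e9, 1.76e9, 1.23e9, 1.61e9]
lo, hi = min(heights), max(heights)
print([round(normalize(h, lo, hi), 2) for h in heights])
# [0.15, 1.0, 0.0, 0.72]
```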

How do we know if an ML model is any good?

Overfitting

[Plot: error vs. epoch; training error keeps falling while testing error bottoms out and then rises.]

A Biological Neuron

Human Brain

How many neurons?

Animal             Number of Neurons (cerebral cortex)
Rat                20,000,000
Dog                160,000,000
Cat                300,000,000
Pig                450,000,000
Horse              1,200,000,000
Dolphin            5,800,000,000
African Elephant   11,000,000,000
Human              20,000,000,000

How many connections?

Human                      100,000,000,000,000
Google (2012)              1,700,000,000
Google/Stanford (2013)     11,200,000,000
Digital Reasoning (2015)   160,000,000,000

Artificial Neuron

[Diagram: input connections with weights w1, w2, w3 feed a threshold function; output connections carry the result.]

Hard Threshold

[Diagram: inputs with weights w1, w2, w3 feed the threshold function.]

$S = \sum_i input_i \cdot weight_i$

if S > THRESHOLD: output = 1
else: output = 0

Hard Threshold: Step Function

Write down artificial neurons with weights and thresholds that model the following functions:

Identity
Logical AND
Logical OR
Logical XOR
Constant function
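As a sanity check for one of these, here is a sketch of a hard-threshold neuron in Python; the weights and threshold shown are one of several workable choices for logical AND.

```python
def hard_threshold_neuron(inputs, weights, threshold):
    """Output 1 when the weighted input sum exceeds the threshold, else 0."""
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s > threshold else 0

# Logical AND: only (1, 1) pushes the sum past 1.5.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", hard_threshold_neuron([x1, x2], [1.0, 1.0], 1.5))

# Lowering the threshold to 0.5 gives logical OR. No single unit can
# produce XOR, since one neuron only draws a linear boundary.
```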

Sigmoid Threshold

[Diagram: inputs with weights w1, w2, w3 feed the threshold function.]

$S = \sum_i input_i \cdot weight_i$

$output = \frac{1}{1 + e^{-S}}$

Sigmoid Threshold: 'S' Function

Output Calculations

[Diagram: a sigmoid unit with w1 = 0.1, w2 = 0.2, w3 = 0.42.]

Features: x1 = 0.66, x2 = 0.11, x3 = 0.20

s = w1 * x1 + w2 * x2 + w3 * x3
s = 0.1 * 0.66 + 0.2 * 0.11 + 0.42 * 0.20
s = 0.172

$\frac{1}{1 + e^{-0.172}} = 0.54$

y1 = 0.54
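The same calculation in Python (the sigmoid helper is my own naming):

```python
import math

def sigmoid(s):
    """The 'S' threshold: squashes any real s into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))

weights = [0.1, 0.2, 0.42]
features = [0.66, 0.11, 0.20]
s = sum(w * x for w, x in zip(weights, features))
print(round(s, 3), round(sigmoid(s), 2))  # 0.172 0.54
```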

Perceptron Network

[Diagram: input layer connected directly to output layer.]

Perceptron: Linear Boundary

Linear Boundary?

Multilayer Network

[Diagram: input layer → hidden layer(s) → output layer.]

ANN Learning – How to get the weights?

[Plot: error surface as a function of weight1 and weight2.]

ANN Learning

● How do we get the right weights?

● Perceptron: gradient descent

● Multilayer network: back-propagation

Node Activation Function

$a_j = g(input_j) = g\left(\sum_{i=0}^{n} w_{ij} a_i\right)$

a_j: the activation (output) of node j.

g: the threshold activation function.

input_j: the weighted sum of all input activations.

Minimize Global Error Function

$error = \sum_j (t_j - a_j)^2$

For every output node, j, sum up the difference between the target value and the generated output value, squared.
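In code this is a one-liner; a sketch assuming parallel lists of target and output values:

```python
def global_error(targets, outputs):
    """Sum of squared (target - output) differences over all output nodes."""
    return sum((t - a) ** 2 for t, a in zip(targets, outputs))

print(global_error([1.0], [0.54]))  # 0.2116
```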

Perceptron Learning

$\Delta w_{ij} = \eta (t_j - a_j) a_i$

Update the weight on connection i → j.

η: the learning rate (0.3ish).

(t_j − a_j): the difference in target and generated output.

a_i: the input activation.
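A sketch of the rule as a function, assuming a single node j whose incoming activations sit in a list (names are mine):

```python
def perceptron_update(weights, inputs, target, output, eta=0.3):
    """Delta rule: w_ij += eta * (t_j - a_j) * a_i for every incoming weight."""
    return [w + eta * (target - output) * a
            for w, a in zip(weights, inputs)]
```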

Let's learn NAND!

Dataset: NAND

Input1  Input2  Label
0       0       1
0       1       1
1       0       1
1       1       0

[Diagram: inputs In1 and In2 enter with weights W1 and W2; a constant bias input of 1.0 enters with weight W3; a single output node Out.]

Starting weight values: W1 = 0.81, W2 = 0.55, W3 = 0.16

η = 0.3

Use the sigmoid threshold.

$a_j = g(input_j) = g\left(\sum_{i=0}^{n} w_{ij} a_i\right)$

$\Delta w_{ij} = \eta (t_j - a_j) a_i$
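Putting the pieces together, a sketch of the whole exercise: one sigmoid unit with the slide's starting weights and the bias input fixed at 1.0, trained with the perceptron rule (the epoch count is an arbitrary choice of mine):

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# NAND dataset; the constant 1.0 bias input carries weight W3.
DATA = [([0, 0], 1), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
weights = [0.81, 0.55, 0.16]  # W1, W2, W3 from the slide
eta = 0.3

for epoch in range(5000):
    random.shuffle(DATA)                     # random presentation order
    for (x1, x2), target in DATA:
        inputs = [x1, x2, 1.0]               # append the bias input
        output = sigmoid(sum(w * a for w, a in zip(weights, inputs)))
        # Perceptron rule: w_i += eta * (t - a) * a_i
        weights = [w + eta * (target - output) * a
                   for w, a in zip(weights, inputs)]

for (x1, x2), target in sorted(DATA):
    out = sigmoid(sum(w * a for w, a in zip(weights, [x1, x2, 1.0])))
    print(x1, x2, target, round(out, 2))     # outputs approach the labels
```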

ANN Learning - Backpropagation

[Diagram: input layer → hidden layer → output layer.]

Put in input values and feed the activation forward to produce the output.

Then calculate the error in the output layer and back-propagate it to update lower weights.

ANN Learning - Backpropagation

$\Delta w_{ij} = \eta \delta_j a_i$

Update the weight on connection i → j.

δ_j: think of this as the error measure for node j. It is different for output and hidden weights.

a_i: the input activation.

ANN Learning – Backpropagation for Output Nodes

$\delta_j = a_j(1 - a_j)(t_j - a_j)$

δ_j: the error measure for output node j.

a_j(1 − a_j): the derivative of the sigmoid function.

(t_j − a_j): the difference in target vs. generated output.

ANN Learning – Backpropagation for Hidden Nodes

$\delta_j = a_j(1 - a_j)\sum_k \delta_k w_{jk}$

δ_j: the error measure for hidden node j.

a_j(1 − a_j): the derivative of the sigmoid function.

∑_k δ_k w_{jk}: a combination of the output errors that this node's weights contribute to.
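A compact sketch of both formulas inside one training step for a single-hidden-layer network (structure and naming are my own; bias terms are omitted for brevity):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def backprop_step(x, t, w_hidden, w_out, eta=0.3):
    """One forward pass plus one backpropagation update.
    w_hidden[j][i]: weight from input i to hidden node j.
    w_out[k][j]:    weight from hidden node j to output node k."""
    # Feed the activation forward.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    y = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in w_out]

    # Output nodes: delta_k = a_k (1 - a_k)(t_k - a_k)
    d_out = [yk * (1 - yk) * (tk - yk) for yk, tk in zip(y, t)]

    # Hidden nodes: delta_j = a_j (1 - a_j) * sum_k delta_k * w_jk
    d_hid = [hj * (1 - hj) * sum(d_out[k] * w_out[k][j]
                                 for k in range(len(y)))
             for j, hj in enumerate(h)]

    # Both layers use the same update: w_ij += eta * delta_j * a_i
    for k, row in enumerate(w_out):
        for j in range(len(row)):
            row[j] += eta * d_out[k] * h[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += eta * d_hid[j] * x[i]
    return y
```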

ANN Learning

● Initialize random network weights
● for epoch in range(NUMBER_EPOCHS):
    ● Train the network on a random presentation of instances
    ● Update weights with backpropagation
    ● Report the global error function value

Choosing the Learning Rate, η

What happened when our learning rate was too high for linear regression?

How do we choose an appropriate learning rate for ANNs?

Bold Driver

After each epoch...

if error went down: η = η * 1.05
else: η = η * 0.50
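As a sketch in Python:

```python
def bold_driver(eta, error, prev_error):
    """Grow eta gently after a good epoch; halve it after a bad one."""
    return eta * 1.05 if error < prev_error else eta * 0.50
```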


Choosing the Network Structure

[Diagram: input layer → hidden layer → output layer.]

How many nodes? What are their connections?

# of output nodes: determined by the number of function outputs.

# of input nodes: determined by the number of function inputs.

Too few hidden nodes: unable to get a detailed enough approximation of the target function.

Too many hidden nodes: slower to train and easier to overfit training data.

ANN Representational Power

● With one hidden layer: can model all continuous functions

● With two hidden layers: can model all functions

Rules of Thumb

● Use 1 or 2 hidden layers

● Use about (2/3)n hidden nodes for reasonably complex functions

● Don't train for too many epochs

Splitting up datasets

Training data – use to train your ML model

Validation data – use to improve your ML model while training

Testing data – use to test performance of your ML model

K-Fold Cross Validation

The full dataset is split into k chunks. On each pass, one chunk serves as the validation dataset and the remaining chunks serve as the training dataset.

Perform k training/validation passes.

Each pass counts as a classification accuracy sample.

Extreme case: K = dataset size (leave-one-out testing).
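A sketch of the splitting logic, assuming the dataset is a list that has already been shuffled:

```python
def k_fold_splits(data, k):
    """Yield (training, validation) pairs, one per fold."""
    chunk = len(data) // k
    for i in range(k):
        validation = data[i * chunk:(i + 1) * chunk]
        training = data[:i * chunk] + data[(i + 1) * chunk:]
        yield training, validation

# k = len(data) gives leave-one-out testing: each pass validates
# on exactly one held-out datapoint.
```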

ANN Implementation?

Break!
