TRANSCRIPT
CSCI3230: Entropy, Information & Decision Tree
Liu Pengfei
Week 12, Fall 2015
Which feature to choose?

Information Gain
Suppose we have a dataset of people. Which split is more informative?
Split over whether the account exceeds 50K (over 50K vs. less than or equal to 50K), or
split over whether the applicant is employed (employed vs. unemployed)?
Information Gain
Uncertainty/Entropy (informal): measures the level of uncertainty in a group of examples.
Uncertainty
[Figure: three groups of examples, from a very uncertain group, to a less uncertain one, to one with minimum uncertainty.]
Entropy: a common way to measure uncertainty
• Entropy = -Σᵢ pᵢ log₂(pᵢ), where pᵢ is the probability of class i, i.e. the proportion of class i in the set.
• Entropy comes from information theory. The higher the entropy, the less the information content of any single observation; some say instead: higher entropy, more information, since more bits are needed to encode the outcomes.
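A minimal Python sketch of this formula (not from the slides; the function name is illustrative):

```python
import math

def entropy(probabilities):
    """Entropy in bits: -sum(p * log2(p)), skipping zero-probability classes."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 -> maximum uncertainty for two classes
print(entropy([1.0]))       # 0.0 -> minimum uncertainty
```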
Information Gain
We want to determine which attribute in a given set of training feature vectors is most useful for discriminating between the classes to be learned.
Information gain tells us how important a given attribute of the feature vectors is.
We will use it to decide the ordering of attributes in the nodes of a decision tree.
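The slide omits the formula; a standard reconstruction is that the information gain of an attribute A on a set S is the entropy reduction achieved by splitting on A:

```latex
IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \, H(S_v)
```

where S_v is the subset of S for which attribute A takes value v.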
Example of Entropy
Whether students like the movie Gladiator:

Gender  Major    Like
Male    Math     Yes
Female  History  No
Male    CS       Yes
Female  Math     No
Female  Math     No
Male    CS       Yes
Male    History  No
Female  Math     Yes

E(Like) = -(4/8) log₂(4/8) - (4/8) log₂(4/8) = 1
Example of Entropy (continued)
Conditioning on Major for the same dataset:
P(Like = Yes | Major = Math) = 0.5
P(Like = Yes | Major = History) = 0
P(Like = Yes | Major = CS) = 1
E(Like | Major = Math) = 1
E(Like | Major = History) = 0
E(Like | Major = CS) = 0
Example of Entropy (continued)
Conditioning on Gender for the same dataset:
P(Like = Yes | Gender = male) = 0.75
P(Like = Yes | Gender = female) = 0.25
E(Like | Gender = male) = -(1/4) log₂(1/4) - (3/4) log₂(3/4) ≈ 0.811
E(Like | Gender = female) = -(3/4) log₂(3/4) - (1/4) log₂(1/4) ≈ 0.811
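A short Python sketch (not from the slides; names are illustrative) that reproduces these numbers and computes the resulting information gain of each attribute:

```python
from math import log2
from collections import Counter

data = [("Male", "Math", "Yes"), ("Female", "History", "No"),
        ("Male", "CS", "Yes"), ("Female", "Math", "No"),
        ("Female", "Math", "No"), ("Male", "CS", "Yes"),
        ("Male", "History", "No"), ("Female", "Math", "Yes")]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, label=2):
    base = entropy([r[label] for r in rows])  # E(Like) = 1
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[label] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

print(info_gain(data, 0))  # Gender: 1 - 0.811 ≈ 0.189
print(info_gain(data, 1))  # Major:  1 - 0.5   = 0.5
```

Splitting on Major gives the larger gain, which is why a decision tree would test Major first.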
CSCI3230: Introduction to Neural Network I
Liu Pengfei
Week 12, Fall 2015
The Angelina Effect
Angelina Jolie is a carrier of a mutation in the BRCA1 gene.
Her estimated risk of developing breast cancer was more than 80 percent, and of ovarian cancer, 50 percent.
Her aunt died from breast cancer and her mother from ovarian cancer.
She decided to go for surgery and announced her decision to have both breasts removed.
Can we use a drug for the treatment?
Neural Network Project
Topic: Protein-Ligand Binding Affinity Prediction
Goal: Given the structural properties of a drug, you are helping a chemist to develop a classifier that can predict how strongly a drug (ligand) will bind to a target (protein).
The due date will be announced later.
Work alone or form a group of at most 2 students (i.e. groups of 3 or more students are not allowed).
You may use any one of the following languages: C, C++, Java, SWI-Prolog, CLisp, Python, Ruby or Perl.
However, you cannot use data mining or machine learning packages.
Start the project as early as possible!
How to register your group?
What to include in your zip file?
Your zip file should contain the following:
preprocess.c – your source code (if you use C)
preprocessor.sh – a script file to compile your source code
trainer.c – your source code (if you use C)
trainer.sh – a script file to compile your source code
best.nn – your neural network
File names are case sensitive!
Grading
Please note that the neural network interfaces/formats are different from those of previous years.
We adopt a policy of zero tolerance on plagiarism. Plagiarism will be SERIOUSLY punished.
To make evaluation of the project easier, we will grade your work based on the F-measure.
Model Evaluation
Cost-sensitive measures, computed from the confusion matrix below:
Precision (p) = TP / (TP + FP) = a / (a + c)
Recall (r) = TP / (TP + FN) = a / (a + b)
F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)

Confusion matrix:
                      Predicted: Yes   Predicted: No
Actual: Class = Yes   a (TP)           b (FN)
Actual: Class = No    c (FP)           d (TN)
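A small Python sketch of these three measures (not from the slides; the counts are illustrative):

```python
def evaluate(tp, fn, fp):
    """Precision, recall and F-measure from confusion-matrix counts a=TP, b=FN, c=FP."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)  # = 2a / (2a + b + c)
    return precision, recall, f_measure

print(evaluate(tp=8, fn=2, fp=2))  # ≈ (0.8, 0.8, 0.8)
```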
Introduction to Neural Network
Biological Neuron
A neuron is an electrically excitable cell that processes and transmits information through electrical and chemical signals.
A chemical signal occurs via a synapse, a specialized connection with other cells.
Artificial Neuron
An artificial neuron is a logic computing unit.
In this simple case, we use a step function as the activation function: only 0 and 1 are possible outputs.
Mechanism:
Input: the weighted sum of the inputs, in = Σᵢ wᵢ xᵢ
Output: y = g(in)
Activation function g(u): a step function with threshold θ, i.e. g(u) = 1 if u > θ, and 0 otherwise.
Example
Inputs x₁ = 5, x₂ = 3.2, x₃ = 0.1. If w₁ = 0.5, w₂ = -0.75, w₃ = 0.8, and a step function g with threshold θ = 0.2 is used as the activation function, what is the output?
Summed input = 5(0.5) + 3.2(-0.75) + 0.1(0.8) = 0.18
Output = g(summed input); since 0.18 < 0.2, the output is 0.
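A minimal Python sketch of this step-function neuron (not from the slides; names are illustrative):

```python
def step_neuron(inputs, weights, threshold):
    """Step-activation neuron: fires 1 if the weighted input sum exceeds the threshold."""
    summed = sum(x * w for x, w in zip(inputs, weights))
    return 1 if summed > threshold else 0

# The slide's example: summed input = 0.18 < 0.2, so the output is 0
print(step_neuron([5, 3.2, 0.1], [0.5, -0.75, 0.8], threshold=0.2))  # 0
```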
Artificial Neuron
An artificial neuron is a logic computing unit.
In this case, we use the sigmoid function as the activation function: real values between 0 and 1 are possible outputs.
Mechanism:
Input: in = Σᵢ wᵢ xᵢ
Output: y = g(in)
Activation function: the sigmoid function, g(z) = 1 / (1 + e^(-z))
Example
Inputs x₁ = 5, x₂ = 3.2, x₃ = 0.1. If w₁ = 0.5, w₂ = -0.75, w₃ = 0.8, and the sigmoid function g is used as the activation function, what is the output?
Summed input = 5(0.5) + 3.2(-0.75) + 0.1(0.8) = 0.18
Output = g(0.18) = 1 / (1 + e^(-0.18)) ≈ 0.54488
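The same neuron with a sigmoid activation, as a Python sketch (not from the slides):

```python
import math

def sigmoid_neuron(inputs, weights):
    """Sigmoid-activation neuron: squashes the weighted input sum into (0, 1)."""
    summed = sum(x * w for x, w in zip(inputs, weights))
    return 1 / (1 + math.exp(-summed))

# The slide's example: g(0.18) ≈ 0.54488
print(sigmoid_neuron([5, 3.2, 0.1], [0.5, -0.75, 0.8]))
```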
Logic Gate Simulation: AND
Inputs and output take values in {0, 1}; the neuron outputs 1 when Sum(input) > t.

Gate  I1  I2  Total input  t    Input > t?
AND   0   0   0            1.5  0
AND   0   1   1            1.5  0
AND   1   0   1            1.5  0
AND   1   1   2            1.5  1
Logic Gate Simulation: OR

Gate  I1  I2  Total input  t    Input > t?
OR    0   0   0            0.5  0
OR    0   1   1            0.5  1
OR    1   0   1            0.5  1
OR    1   1   2            0.5  1
Logic Gate Simulation: NOT (the single input has weight -1)

Gate  I1  I2   Total input  t     Input > t?
NOT   0   N/A  0            -0.5  1
NOT   1   N/A  -1           -0.5  0
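A compact Python sketch of the three gates above (not from the slides; unit weights for AND/OR and weight -1 for NOT, matching the tables):

```python
def gate_neuron(inputs, weights, t):
    """Threshold neuron used as a logic gate: output 1 iff the weighted sum exceeds t."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) > t else 0

AND = lambda a, b: gate_neuron([a, b], [1, 1], t=1.5)
OR  = lambda a, b: gate_neuron([a, b], [1, 1], t=0.5)
NOT = lambda a:    gate_neuron([a],    [-1],   t=-0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("NOT:", NOT(0), NOT(1))  # 1 0
```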
Logic Gate Simulation
Each of the previous cases can be viewed as a classification problem: separate class 0 from class 1. The neuron simply finds a line to separate the two classes:
AND: I1 + I2 - 1.5 = 0
OR: I1 + I2 - 0.5 = 0
XOR: ?

XOR truth table:
I1  I2  Output
0   0   0
0   1   1
1   0   1
1   1   0
Linear Separability
Two classes ('+' and '-') are linearly separable in two dimensions if we can find weights w1, w2 (and a bias b) such that
every point x = (x1, x2) in class '+' satisfies w1 x1 + w2 x2 + b > 0, and
every point x in class '-' satisfies w1 x1 + w2 x2 + b < 0.
Two classes that cannot be split this way, such as the XOR pattern above, are linearly inseparable in two dimensions: separating them would need two straight lines, so no single line suffices.
http://en.wikipedia.org/wiki/Linear_separability
Artificial Neural Networks
Important concepts:
What is a perceptron?
What is a single-layer perceptron?
What is a multi-layer perceptron?
What is the feed-forward property?
What is the general learning principle?
Technical Terms
Perceptron = neuron
Single-layer perceptron = single-layer neural network
Multi-layer perceptron = multi-layer neural network
The presence of one or more hidden layers is what distinguishes a multi-layer perceptron from a single-layer perceptron.
The simplest kind of neural network is a single-layer perceptron network, which consists of a single layer of output nodes; the inputs are fed directly to the outputs via a series of weights.
Multi-layer Perceptron
Multi-layer: an input layer, hidden layer(s) and an output layer.
Feed-forward: links go in one direction only.
An MLP consists of multiple layers of nodes in a directed graph; except for the input nodes, each node is a neuron.
The multilayer perceptron therefore consists of three or more layers (an input and an output layer with one or more hidden layers).
http://en.wikipedia.org/wiki/Multilayer_perceptron
Feed-Forward Property
Given the inputs to the input layer (L0), the outputs of the neurons in L1 can be calculated.
Then the outputs of the neurons in L2 can be calculated, and so on.
Finally, the outputs of the neurons in the output layer can be calculated.
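A minimal sketch of this layer-by-layer computation in Python (not from the slides; the layer sizes, weights and sigmoid activation are assumptions for illustration):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def feed_forward(inputs, layers):
    """Propagate activations one layer at a time; layers[k] is a list of
    per-neuron weight vectors over the previous layer's outputs."""
    activations = inputs
    for layer in layers:
        activations = [sigmoid(sum(w * a for w, a in zip(neuron_weights, activations)))
                       for neuron_weights in layer]
    return activations

# Hypothetical 2-3-1 network with made-up weights (no bias terms, for brevity)
hidden = [[0.2, -0.4], [0.7, 0.1], [-0.5, 0.9]]
output = [[0.3, -0.6, 0.8]]
print(feed_forward([1.0, 0.5], [hidden, output]))
```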
General Learning Principle
1. For supervised learning, we provide the model with a set of inputs and targets.
2. The model returns the outputs.
3. We reduce the difference between the outputs and the targets by updating the weights.
4. Repeat steps 1-3 until some stopping criterion is met.
[Figure: a single neuron with three inputs, weights w1-w3, an output, and the target it is compared against.]
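As an illustration of steps 1-4, a perceptron-style update rule in Python (one of several possible update rules; the slides do not specify one, and the task below is made up):

```python
def train_perceptron(data, weights, lr=0.1, epochs=20):
    """Steps 1-4: present inputs and targets, compute the output,
    and nudge each weight to shrink the target-output difference."""
    for _ in range(epochs):
        for inputs, target in data:
            output = 1 if sum(w * x for w, x in zip(weights, inputs)) > 0 else 0
            error = target - output  # step 3: the difference to reduce
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
    return weights

# Hypothetical task: learn OR from its truth table (a constant 1 acts as a bias input)
data = [([a, b, 1], int(a or b)) for a in (0, 1) for b in (0, 1)]
print(train_perceptron(data, [0.0, 0.0, 0.0]))
```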
Summary
1. We have learnt the similarity between biological neurons and artificial neurons.
2. We have learnt the underlying mechanism of artificial neurons.
3. We have learnt how artificial neurons compute logic (AND, OR, NOT).
4. We have learnt the meaning of perceptron, single-layer perceptron and multi-layer perceptron.
5. We have learnt how information is propagated between neurons in a multi-layer perceptron (the feed-forward property).
6. We have learnt the general learning principle of artificial neural networks.
Announcements
Written assignment 3 will be released this week.
The neural network project specification will be released this week.
Since there will be no tutorial next Thursday due to the congregation ceremony, please attend one of the other two tutorial sessions on Wednesday.