

Smoothing, Sampling, and Simulation

Vasileios Hatzivassiloglou

University of Texas at Dallas


Back to motif finding

• Apply MLE to the profile data

• Note that we already used MLE when calculating each cell

• Now θ is the set of letter choices, one for each position

• Because each choice is independent of the others, the MLE is to
  – choose at each position j the letter $\arg\max_i A_{ij}$

• The algorithm takes O(kn) time
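The MLE step can be sketched in Python. This is a minimal sketch, assuming the motif instances are already aligned, equal-length strings over the alphabet ACGT, and that a profile is stored as a dict mapping each letter i to its list of per-position values A_ij; the function names `profile_from_instances` and `mle_consensus` are hypothetical. If k is the number of instances and n the motif length, the counting pass takes O(kn) time.

```python
def profile_from_instances(instances, alphabet="ACGT"):
    """MLE profile: A[i][j] = c_ij / k, the fraction of the k aligned
    instances that have letter i at position j."""
    k = len(instances)
    n = len(instances[0])
    counts = {a: [0] * n for a in alphabet}
    for s in instances:              # O(kn) counting pass
        for j, ch in enumerate(s):
            counts[ch][j] += 1
    return {a: [c / k for c in counts[a]] for a in alphabet}


def mle_consensus(profile):
    """Pick argmax_i A_ij independently at each position j.
    Ties go to the letter that comes first in the profile's key order."""
    letters = list(profile)
    n = len(profile[letters[0]])
    return "".join(max(letters, key=lambda a: profile[a][j]) for j in range(n))


profile = profile_from_instances(["ACGT", "ACGA", "TCGT"])
print(mle_consensus(profile))  # prints ACGT
```

Because the positions are independent, each column is maximized on its own; no search over combinations of letters is needed.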


Representing profiles

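• Usually stored as $\log A_{ij}$ values
  – historically, for ease of calculation
  – with computers, for maintaining accuracy (a product of many small probabilities can underflow to 0, while a sum of logs does not)

• Smoothing
  – estimated values can be 0 (a letter never observed at a position)
  – this will affect calculations, sometimes leading to serious problems (e.g., log 0 is undefined, or there is no solution)
  – smoothing raises the 0 probabilities to small positive values
  – it has to lower the other estimated probabilities correspondingly, so that the probabilities at each position still sum to 1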
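Both points above can be checked directly. This is a minimal Python sketch, assuming a profile stored as a dict from letter to per-position probabilities; `to_log_profile` and `log_score` are hypothetical names. It shows a product of many probabilities underflowing to zero while the sum of their logs stays representable, and how a single zero estimate drives a score to −∞, which is the problem smoothing addresses.

```python
import math


def to_log_profile(profile):
    """Store a profile {letter: [A_i1, ..., A_in]} as log A_ij values.
    math.log(0) raises ValueError, so zero entries are mapped to -inf here."""
    return {a: [math.log(p) if p > 0 else -math.inf for p in col]
            for a, col in profile.items()}


def log_score(log_profile, seq):
    """log P(seq | profile): a sum of logs instead of a product of probabilities."""
    return sum(log_profile[ch][j] for j, ch in enumerate(seq))


# Multiplying many small probabilities underflows to 0.0 in double precision...
probs = [0.25] * 1000
product = 1.0
for p in probs:
    product *= p
print(product)                               # prints 0.0
# ...while the equivalent sum of logs is an ordinary float.
print(sum(math.log(p) for p in probs))       # about -1386.29

# A single zero entry makes every sequence that uses it score -inf.
log_prof = to_log_profile({"A": [1.0, 0.0], "C": [0.0, 1.0]})
print(log_score(log_prof, "AA"))             # prints -inf
```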


Additive smoothing

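• Replace each probability

$$A_{ij} = \frac{c_{ij}}{k}$$

with

$$A_{ij} = \frac{c_{ij} + \epsilon}{k + \epsilon|\Sigma|}$$

where ε is a small number (such as 0.001) and |Σ| is the size of the alphabet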
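A small Python sketch of additive smoothing follows. It assumes the smoothed value has the form (c_ij + ε)/(k + ε|Σ|), where c_ij counts letter i at position j over k instances and |Σ| is the alphabet size; the function name `additive_smoothing` and the count-dict layout are hypothetical. Adding ε to every count and ε|Σ| to the denominator is what keeps each position's probabilities summing to 1.

```python
def additive_smoothing(counts, k, eps=0.001):
    """Smooth a count profile {letter: [c_i1, ..., c_in]} built from k instances.

    Each cell becomes (c_ij + eps) / (k + eps * |Sigma|): every count gets the
    same small pseudocount eps, and the denominator grows by eps for each of
    the |Sigma| letters so that every position still sums to 1."""
    sigma = len(counts)                 # alphabet size |Sigma|
    denom = k + eps * sigma
    return {a: [(c + eps) / denom for c in col] for a, col in counts.items()}


# Two instances (k = 2), e.g. "AC" and "AC": G and T were never observed.
counts = {"A": [2, 0], "C": [0, 2], "G": [0, 0], "T": [0, 0]}
smoothed = additive_smoothing(counts, k=2)
print(smoothed["G"][0])   # small but nonzero
```

The observed letters lose a little probability mass (A at position 1 drops just below 1), which is exactly the mass handed to the unseen letters.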


Student presentations

• Scheduled for December 2 and December 4

• Each student gets 10 minutes (7 minutes for presentation, 3 minutes for questions)

• Select project or topic and papers in consultation with the instructor by November 13


Potential presentation topics

• Similarity

• Statistical, predictive, and generative models

• Simulation

• Estimation

• Classification

• Clustering

• Text mining and knowledge discovery


Statistical sampling

• A very general method for solving difficult problems with many variables that cannot be solved directly, but where partial solutions can be “guessed” and improved

• Commonly known as “Monte Carlo” methods, after the casino in Monte Carlo, Monaco; the name reportedly alludes to the gambling of a relative of one of the pioneers of the technique, Stanislaw Ulam
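To make the idea concrete, here is a small Monte Carlo estimate in Python. It is an illustrative example, not one from the slides: the hypothetical function `mc_ball_volume` estimates the volume of the unit ball in `dim` dimensions by sampling points uniformly from the cube [-1, 1]^dim and counting how many land inside the ball. Exact formulas exist for this case, but the same sampling code works unchanged as the number of variables grows.

```python
import random


def mc_ball_volume(dim, n_samples, seed=0):
    """Monte Carlo estimate of the volume of the unit ball in `dim` dimensions.

    Points are drawn uniformly from the cube [-1, 1]^dim (volume 2**dim); the
    fraction landing inside the ball estimates the ball's share of that volume.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        point = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        if sum(x * x for x in point) <= 1.0:
            inside += 1
    return (2 ** dim) * inside / n_samples


print(mc_ball_volume(3, 100_000))  # close to 4*pi/3, about 4.19
```

The statistical error of such an estimate shrinks roughly as 1/√n_samples, independently of `dim`, which is why sampling stays usable when direct methods do not.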


Famous MC applications

• Buffon’s needle (18th century)

• Enrico Fermi’s study of the neutron (1930)

• The Manhattan project (1944)

• Currently used in
  – aerodynamics
  – video games and computer-generated films
  – share pricing
  – bioinformatics


Buffon’s needle

• How can π be calculated?

• Consider randomly throwing a needle of length l onto a floor made of parallel boards of width w (w > l). Then it can be shown that the probability p of the needle crossing a line between boards is

$$p = \frac{2l}{\pi w}$$

• By estimating p experimentally (the MLE is the fraction of throws that cross a line), one can then calculate $\pi = \frac{2l}{pw}$

• Using this, the estimate 355/113 was obtained (accurate to 6 decimal places)
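The experiment can also be simulated. This is a minimal Python sketch, assuming the crossing probability p = 2l/(πw) for a needle of length l no longer than the board width w; `buffon_pi_estimate` and its parameters are hypothetical names. The needle's direction is drawn by rejection sampling inside the unit circle, so the simulation never uses the value of π itself; the fraction of crossings is the MLE of p, and π is recovered as 2l/(pw).

```python
import math
import random


def buffon_pi_estimate(n_throws, needle_len=1.0, board_width=2.0, seed=0):
    """Estimate pi by simulating Buffon's needle (needle_len <= board_width)."""
    if needle_len > board_width:
        raise ValueError("p = 2l/(pi*w) assumes the needle is no longer than w")
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_throws):
        # Distance from the needle's centre to the nearest line: uniform on [0, w/2].
        x = rng.uniform(0.0, board_width / 2)
        # Uniform random direction via rejection sampling in the unit circle.
        while True:
            u, v = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            r2 = u * u + v * v
            if 0.0 < r2 <= 1.0:
                break
        sin_theta = abs(v) / math.sqrt(r2)   # |sin| of the angle to the lines
        # The needle crosses a line if its half-length reaches across x.
        if x <= (needle_len / 2) * sin_theta:
            crossings += 1
    if crossings == 0:
        raise ValueError("no crossings observed; increase n_throws")
    p_hat = crossings / n_throws                     # MLE of p
    return 2 * needle_len / (p_hat * board_width)    # invert p = 2l/(pi*w)


print(buffon_pi_estimate(100_000))
```

With 100,000 throws the estimate typically lands within a few hundredths of π; convergence is slow, roughly 1/√n_throws.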


The classification problem

• Given examples from two or more different classes of objects, and a description of a new object, which class does the new object come from?

• Problems vary widely depending on what kind of description of the objects is available


Example classification problems

• Given samples of spam and non-spam email messages, classify an incoming message as spam or non-spam

• Given samples of paying and non-paying credit card holders, accept or reject a credit card application

• Given samples of patients who entered a hospital, predict whether a given patient will exit the hospital alive