Smoothing, Sampling, and Simulation
Vasileios Hatzivassiloglou
University of Texas at Dallas
Back to motif finding
• Apply MLE to the profile data
• Note that we already used MLE when calculating each cell
• Now θ is the set of choices for each letter
• Because each choice is independent of the others, the MLE is
  – Choose at each position j the letter $\arg\max_i A_{ij}$
• The algorithm takes O(kn) time
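The per-position choice can be sketched in a few lines of Python. This is only an illustration under assumptions not stated on the slide: the input is k aligned sequences of equal length n over the DNA alphabet, the function names `profile_counts` and `mle_consensus` are made up for this example, and ties are broken by alphabet order.

```python
def profile_counts(sequences, alphabet="ACGT"):
    """Return counts[letter][j]: how often `letter` occurs at position j."""
    n = len(sequences[0])
    counts = {a: [0] * n for a in alphabet}
    for seq in sequences:              # k sequences ...
        for j, letter in enumerate(seq):   # ... of length n: O(kn) work
            counts[letter][j] += 1
    return counts


def mle_consensus(sequences, alphabet="ACGT"):
    """Pick, independently at each position j, the letter maximizing A_ij."""
    k = len(sequences)
    n = len(sequences[0])
    counts = profile_counts(sequences, alphabet)
    # MLE of each profile cell: relative frequency c_ij / k
    profile = {a: [counts[a][j] / k for j in range(n)] for a in alphabet}
    return "".join(
        max(alphabet, key=lambda a: profile[a][j]) for j in range(n)
    )


consensus = mle_consensus(["ACGT", "ACGA", "TCGT"])
```

Counting dominates the cost, which is consistent with the O(kn) bound if k is read as the number of sequences and n as the motif length.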
Representing profiles
• Usually stored as $\log A_{ij}$ values
  – historically for ease of calculation
  – with computers for maintaining accuracy
• Smoothing
  – estimated values can be 0
  – this will affect calculations, sometimes leading to serious problems (e.g., no solution)
  – smoothing increases 0 probabilities
  – it has to reduce other estimated probabilities to account for this
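Both points can be illustrated with a small Python sketch. The helper names `product_score` and `log_score` and the numbers are invented for this example; the sketch only shows why a product of many small probabilities loses accuracy in floating point while a sum of logarithms does not, and how a single 0 estimate wipes out a score.

```python
import math


def product_score(probs):
    """Multiply probabilities directly (prone to floating-point underflow)."""
    score = 1.0
    for p in probs:
        score *= p
    return score


def log_score(probs):
    """Sum log-probabilities; a zero probability maps to minus infinity."""
    return sum(math.log(p) if p > 0 else float("-inf") for p in probs)


probs = [0.01] * 200
underflowed = product_score(probs)   # true value 1e-400 is not representable
in_logs = log_score(probs)           # finite: 200 * log(0.01)

# One unsmoothed zero estimate makes the whole score minus infinity.
with_zero = log_score([0.5, 0.0, 0.25])
```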
Additive smoothing
• Replace each probability
  $A_{ij} = \frac{c_{ij}}{k}$
  with
  $A_{ij} = \frac{c_{ij} + \epsilon}{k + \epsilon|\Sigma|}$
  where $\epsilon$ is a small number (such as 0.001)
$k/|\Sigma|$
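A minimal Python sketch of additive smoothing follows. It assumes the usual form of the rule, in which each count c_ij out of k sequences becomes (c_ij + ε) / (k + ε|Σ|) with |Σ| the alphabet size; the function name `smoothed_profile` and the example counts are hypothetical.

```python
def smoothed_profile(counts, k, epsilon=0.001):
    """Additively smooth a count profile.

    counts[letter] is the list of per-position counts c_ij,
    k is the number of sequences the counts were taken from.
    """
    alphabet_size = len(counts)   # |Sigma|
    return {
        letter: [(c + epsilon) / (k + epsilon * alphabet_size) for c in column]
        for letter, column in counts.items()
    }


# Hypothetical counts from k = 3 sequences of length 2.
counts = {"A": [3, 0], "C": [0, 1], "G": [0, 2], "T": [0, 0]}
profile = smoothed_profile(counts, k=3)

# Each column still sums to 1, and no entry is exactly 0.
column_sums = [sum(profile[a][j] for a in counts) for j in range(2)]
```

Adding ε to every count and ε|Σ| to the denominator raises the zero entries slightly and lowers the nonzero ones, so each column remains a probability distribution.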
Student presentations
• Scheduled for December 2 and December 4
• Each student gets 10 minutes (7 minutes for presentation, 3 minutes for questions)
• Select project or topic and papers in consultation with the instructor by November 13
Potential presentation topics
• Similarity
• Statistical, predictive, and generative models
• Simulation
• Estimation
• Classification
• Clustering
• Text mining and knowledge discovery
Statistical sampling
• A very general method for solving difficult problems with many variables that cannot be solved directly, but where partial solutions can be “guessed” and improved
• Commonly known as “Monte Carlo” methods (from the Monaco casino) because one of the pioneers of the technique liked gambling
Famous MC applications
• Buffon’s needle (18th century)
• Enrico Fermi’s study of the neutron (1930s)
• The Manhattan project (1944)
• Currently used in
  – aerodynamics
  – video games and computer-generated films
  – share pricing
  – bioinformatics
Buffon’s needle
• How to calculate π?
• Consider throwing a needle of length l at random onto a floor with parallel boards of width w (w>l). Then it can be shown that the probability p of the needle crossing a line between boards is $p = \frac{2l}{\pi w}$
• By estimating p (experimentally through MLE) one can then calculate $\pi = \frac{2l}{pw}$
• Using this, the estimate 355/113 was obtained (accurate to 6 decimal places)
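The experiment can be simulated rather than performed by hand. The sketch below is an illustration, not the historical procedure: the function name `estimate_pi`, the default lengths, and the fixed seed are assumptions made for this example. To avoid using π while estimating it, the needle direction is drawn by rejection sampling a point in the unit disk instead of sampling an angle.

```python
import math
import random


def estimate_pi(throws, l=1.0, w=2.0, seed=0):
    """Estimate pi by simulating Buffon's needle (requires l < w)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(throws):
        # Distance from the needle's centre to the nearest line.
        d = rng.uniform(0.0, w / 2)
        # Random direction: uniform point in the unit disk, by rejection.
        while True:
            x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            r2 = x * x + y * y
            if 0.0 < r2 <= 1.0:
                break
        sin_theta = abs(y) / math.sqrt(r2)
        # The needle crosses a line if its half-projection reaches it.
        if d <= (l / 2) * sin_theta:
            crossings += 1
    if crossings == 0:
        raise ValueError("no crossings observed; use more throws")
    p_hat = crossings / throws        # MLE of the crossing probability
    return 2 * l / (p_hat * w)        # invert p = 2l / (pi * w)


approx = estimate_pi(200_000)
```

The estimate converges slowly: the error shrinks roughly with the square root of the number of throws, so each extra correct digit costs about a hundred times more throws.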
The classification problem
• Given examples from two or more different classes of objects, and a description of a new object, which class does the new object come from?
• A lot of variation depending on what kind of description we have available
Example classification problems
• Given samples of spam and non-spam email messages, classify an incoming message as spam or non-spam
• Given samples of paying and non-paying credit card holders, accept or reject a credit card application
• Given samples of patients who entered a hospital, predict whether a given patient will exit the hospital alive