![Page 1: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/1.jpg)
Transcriptional regulation & Clustering
Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia
MTAT.03.239 Bioinformatics
![Page 2: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/2.jpg)
• Part 1: Transcriptional regulation
  - Gene regulation in eukaryotes
  - PWM
  - TFBS prediction using PWM
• Part 2: Clustering
  - Goal
  - Types of clustering
  - Distance measures
  - Applications
![Page 3: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/3.jpg)
Information flow in a eukaryotic cell
http://www.nature.com/scitable/topicpage/gene-expression-14121669
![Page 4: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/4.jpg)
An intron is any nucleotide sequence within a gene that is removed by RNA splicing while the final mature RNA product of a gene is being generated.
An exon is any nucleotide sequence encoded by a gene that remains present within the final mature RNA product of that gene.
![Page 5: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/5.jpg)
![Page 6: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/6.jpg)
Transcription factor
A transcription factor is a protein that binds to specific DNA sequences, thereby controlling the flow (or transcription) of genetic information from DNA to messenger RNA.
Transcription factors perform this function alone or with other proteins in a complex, by promoting (as an activator) or blocking (as a repressor) the recruitment of RNA polymerase (the enzyme that performs the transcription of genetic information from DNA to RNA).
Figure labels: TF1, TF2
Image from “Optimization of PWMs using statistically synchrofasostatic morphogenetic infrastructural modeling” by Konstantin Tretjakov
![Page 7: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/7.jpg)
Transcriptional regulators can determine cell types
http://www.nature.com/scitable/topicpage/gene-expression-14121669
![Page 8: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/8.jpg)
Gene
Enhancer
TSS: Transcription Start Site
“Proximal” promoter (100bp-2Kb 5’ Upstream)
How is gene expression regulated?
Transcription begins when an RNA polymerase binds to a so-called promoter sequence on the DNA molecule.
Binding of regulatory proteins to an enhancer sequence causes a shift in chromatin structure that either promotes or inhibits RNA polymerase and transcription factor binding.
Promoter analysis. TFBS Detection by D.Rico
![Page 9: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/9.jpg)
Promoters
• Promoters are DNA segments upstream of transcripts that initiate transcription
• A promoter attracts RNA polymerase to the transcription start site
5’ Promoter 3’
Promoter analysis. TFBS Detection by D.Rico
![Page 10: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/10.jpg)
Enhancers
An enhancer is a short region of DNA that can be bound by proteins to enhance transcription levels of genes (it does not need to be particularly close to the genes it acts on).
http://www.nature.com/scitable/topicpage/gene-expression-14121669
![Page 11: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/11.jpg)
![Page 12: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/12.jpg)
Transcription repression
An inactive repressor protein can become activated by another molecule. The active repressor can then interfere with RNA polymerase binding to the promoter, effectively preventing transcription.
http://www.nature.com/scitable/topicpage/gene-expression-14121669
![Page 13: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/13.jpg)
How to identify Transcription Factor Binding Sites (TFBS)?
http://www.nature.com/scitable/topicpage/gene-expression-14121669
![Page 14: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/14.jpg)
Transcription factors recognize specific sequences.
http://www.bio.jhu.edu/Faculty/Privalov/
TGAGTCATGACTCA
Gcn4
DNA
![Page 15: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/15.jpg)
Some positions can have different nucleotides: IUPAC ambiguity codes
![Page 16: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/16.jpg)
TGAGTCA, TGACTCA → TGASTCA
Gcn4 consensus sequence
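One way to make the consensus idea concrete is to translate IUPAC codes into a regular expression and search a sequence with it. The sketch below is illustrative only: the function name `find_consensus_sites` and the test sequence are assumptions, not part of the slides; the code table itself is the standard IUPAC nucleotide alphabet.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

def find_consensus_sites(consensus, sequence):
    """Return (start, matched_site) for every match of an IUPAC consensus."""
    pattern = "".join(IUPAC[code] for code in consensus.upper())
    # The lookahead keeps overlapping matches, which a plain search would skip.
    regex = re.compile("(?=(" + pattern + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]

# Hypothetical sequence containing both variants covered by TGASTCA.
hits = find_consensus_sites("TGASTCA", "AATGAGTCATTTGACTCAGG")
print(hits)
```

A consensus string treats every allowed base at a position as equally good, which is the limitation that motivates the weight matrices introduced later.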
![Page 17: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/17.jpg)
TFBS: Detection methods
• in vivo: Functional analysis, ChIP
• in vitro on cloned fragment: Footprinting reactions, Exonuclease digests, Gel retardation (EMSA), UV Crosslinking
• in vitro on artificial DNA: SELEX (Systematic Evolution of Ligands by Exponential enrichment)
Slide from Promoter analysis. TFBS Detection by D.Rico
![Page 18: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/18.jpg)
ChIP-Seq can be used to detect TF binding sites
Not all nucleotides are likely to be present at each position
![Page 19: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/19.jpg)
TF Binding Sites
• Problems:
– often poorly defined consensus
– sequences not conserved within species, and even worse between species
– examples of enhancers functionally conserved but not sequence-conserved
– most of the TFBS sequence data comes from just a few species
– very often in vitro experiments
– 2 completely different binding sites could be merged in the same matrix/consensus
Promoter analysis. TFBS Detection by D.Rico
![Page 20: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/20.jpg)
Binding sites and motifs
• Transcription factor binding is specific, hence binding sites are similar to each other, but variability is often seen
• A motif is the common sequence pattern among binding sites of a transcription factor
![Page 21: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/21.jpg)
Data collection
Probabilities can be calculated and corrected for background
Position weight matrices (PWMs) are also called position-specific scoring matrices (PSSMs). They are in log scale.
![Page 22: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/22.jpg)
From PFM to PWM/PSSM
http://www.nature.com/nrg/journal/v5/n4/box/nrg1315_BX2.html
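The conversion from counts to log-scale weights can be sketched as follows. This is a minimal illustration, not necessarily the exact formula in the linked box: it assumes a uniform background of 0.25 per base and a total pseudocount of 1 per column split according to the background, and the function name `pfm_to_pwm` is made up for the example. The counts are the four-site matrix shown on the summary slide of this deck.

```python
import math

def pfm_to_pwm(pfm, background=None, pseudocount=1.0):
    """Convert a position frequency matrix (counts) into log2-odds weights.

    pfm maps each base to a list of counts, one per motif position.
    pseudocount is the total pseudocount added per column (an assumption).
    """
    bases = sorted(pfm)
    if background is None:
        # Assumed uniform background: every base equally likely.
        background = {b: 1.0 / len(bases) for b in bases}
    length = len(pfm[bases[0]])
    pwm = {b: [] for b in bases}
    for i in range(length):
        total = sum(pfm[b][i] for b in bases)
        for b in bases:
            # Corrected probability of base b at position i.
            p = (pfm[b][i] + pseudocount * background[b]) / (total + pseudocount)
            # Log-odds against the background frequency.
            pwm[b].append(math.log2(p / background[b]))
    return pwm

pfm = {
    "A": [4, 3, 0, 0, 0, 0],
    "C": [0, 0, 0, 2, 0, 4],
    "G": [0, 1, 4, 1, 0, 0],
    "T": [0, 0, 0, 1, 4, 0],
}
pwm = pfm_to_pwm(pfm)
for base in "ACGT":
    print(base, [round(w, 2) for w in pwm[base]])
```

The pseudocount keeps unseen bases from getting a weight of minus infinity; how large it should be is a modelling choice.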
![Page 23: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/23.jpg)
SEQUENCE LOGOS: The information content of a matrix column ranges from 0 bits (no base preference) to 2 bits (only 1 base used).
http://weblogo.berkeley.edu/ http://www.lecb.ncifcrf.gov/~toms/sequencelogo.html
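The 0 to 2 range follows from computing the information content of a DNA column as 2 bits minus the Shannon entropy of its base frequencies. The sketch below assumes plain observed frequencies and ignores the small-sample correction that logo tools may apply; `column_information` is a hypothetical helper name.

```python
import math

def column_information(counts):
    """Information content in bits of one DNA column: 2 - Shannon entropy."""
    total = sum(counts)
    entropy = 0.0
    for count in counts:
        if count > 0:  # a zero count contributes nothing to the entropy
            p = count / total
            entropy -= p * math.log2(p)
    return math.log2(4) - entropy

print(column_information([4, 0, 0, 0]))  # only one base used
print(column_information([1, 1, 1, 1]))  # no base preference
print(column_information([2, 2, 0, 0]))  # two bases equally likely
```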
![Page 24: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/24.jpg)
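Summary
Sites: AAGTTC, AAGCTC, AGGCTC, AAGGTC
Position frequency matrix (rows: bases, columns: positions 1 to 6):
A: 4 3 0 0 0 0
C: 0 0 0 2 0 4
G: 0 1 4 1 0 0
T: 0 0 0 1 4 0
Consensus: ARGBTC
Slide from Promoter analysis. TFBS Detection by D.Rico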
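The counts and the consensus on this slide can be reproduced from the four sites with a few lines of code. In this sketch the consensus rule is an assumption: every base observed at least once in a column is included in the IUPAC code, whereas real tools usually apply frequency thresholds. The helper names are illustrative.

```python
# IUPAC code for each set of bases observed in a column.
IUPAC_BY_SET = {
    frozenset(bases): code
    for bases, code in {
        "A": "A", "C": "C", "G": "G", "T": "T",
        "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
        "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
    }.items()
}

def build_pfm(sites):
    """Count how often each base occurs at each position of aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pfm = {base: [0] * length for base in "ACGT"}
    for site in sites:
        for position, base in enumerate(site):
            pfm[base][position] += 1
    return pfm

def consensus(pfm):
    """IUPAC consensus that includes every base seen at least once (assumed rule)."""
    length = len(pfm["A"])
    return "".join(
        IUPAC_BY_SET[frozenset(b for b in "ACGT" if pfm[b][i] > 0)]
        for i in range(length)
    )

sites = ["AAGTTC", "AAGCTC", "AGGCTC", "AAGGTC"]
pfm = build_pfm(sites)
print(pfm)
print(consensus(pfm))
```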
![Page 25: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/25.jpg)
PWM databases
Transfac: not free, >848 matrices, loads of information and references, quality score based on methods used
Jaspar: open source, 174 matrices, minimal information, majority based on SELEX method (80%)
![Page 26: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/26.jpg)
TRANSFAC®
http://www.gene-regulation.com/pub/databases.html
![Page 27: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/27.jpg)
http://jaspar.genereg.net/
![Page 28: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/28.jpg)
Jaspar example: Pax6
![Page 29: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/29.jpg)
Futility Theorem: Essentially all predicted TFBSs will have no functional role. It’s necessary to constrain the search space.
![Page 30: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/30.jpg)
Multiple approaches to constrain the search space
• Promoter regions
• Conserved sequences
• Open chromatin
• Integrate over a promoter region
• Proximity to transcription start site (TSS)
• etc.
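A rough sketch of TFBS prediction with a PWM, combined with one of the constraints above, is a sliding-window scan that only reports hits inside a chosen region. Everything specific here is hypothetical: the 3-position weight matrix, the sequence, the threshold and the `region` argument (standing in for, say, a proximal promoter) are made up for illustration.

```python
def score_window(pwm, window):
    """Sum the PWM weights of the bases in one window."""
    return sum(pwm[base][i] for i, base in enumerate(window))

def scan(pwm, sequence, threshold, region=None):
    """Return (position, score) for windows scoring at least threshold.

    region is an optional half-open (start, end) interval; only windows
    lying fully inside it are scored. It is a stand-in for any prior
    knowledge used to constrain the search space.
    """
    width = len(next(iter(pwm.values())))
    start, end = region if region is not None else (0, len(sequence))
    start, end = max(start, 0), min(end, len(sequence))
    hits = []
    for pos in range(start, end - width + 1):
        score = score_window(pwm, sequence[pos:pos + width])
        if score >= threshold:
            hits.append((pos, score))
    return hits

# Hypothetical toy matrix that favours the motif ACG.
toy_pwm = {
    "A": [1, -1, -1],
    "C": [-1, 1, -1],
    "G": [-1, -1, 1],
    "T": [-1, -1, -1],
}
sequence = "TTACGTTTTACGTT"
all_hits = scan(toy_pwm, sequence, threshold=3)
constrained_hits = scan(toy_pwm, sequence, threshold=3, region=(0, 7))
print(all_hits)
print(constrained_hits)
```

Restricting the scan does not make a hit functional; it only reduces the number of candidates that need further evidence.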
![Page 31: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/31.jpg)
Cluster Analysis
Adapted from Meelis Kull’s slides, Bioinformatics course 2011
![Page 32: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/32.jpg)
What is cluster analysis?
Clustering is finding groups of objects such that the objects in a group are:
• similar (or related) to the objects in the same group, and
• different from (or unrelated to) the objects in other groups
![Page 33: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/33.jpg)
Why cluster biological data?
• Intuition building
• Hypothesis generation
• Summarizing / compressing large data
![Page 34: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/34.jpg)
Partitional vs Hierarchical
![Page 35: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/35.jpg)
Fuzzy vs Non-Fuzzy
• Fuzzy: each object belongs to each cluster with some weight (the weight can be zero)
• Non-fuzzy: each object belongs to exactly one cluster
![Page 36: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/36.jpg)
Hierarchical clustering
Hierarchical clustering is usually depicted as a dendrogram (tree)
![Page 37: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/37.jpg)
Hierarchical clustering
• Each subtree corresponds to a cluster
• Height of branching shows distance
![Page 38: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/38.jpg)
Hierarchical clustering (0)
Algorithm for Agglomerative Hierarchical Clustering: Join the two closest objects
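The agglomerative procedure, join the two closest clusters and repeat until one cluster is left, can be sketched naively as below. This is an illustration under stated assumptions rather than an efficient implementation: it recomputes all pairwise cluster distances at every step, the cluster distance defaults to the minimum over pairs, and the function name and the 1-D toy data are made up.

```python
def agglomerative(points, distance, linkage=min):
    """Naive agglomerative clustering.

    Returns the merge history as (cluster_a, cluster_b, height) tuples,
    where clusters are tuples of point indices and height is the cluster
    distance at which the two were joined.
    """
    clusters = [(i,) for i in range(len(points))]  # start: one cluster per object
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Cluster distance: linkage over all cross-cluster pairs.
                d = linkage(
                    distance(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

points = [0.0, 1.0, 5.0, 6.5]  # hypothetical 1-D data
history = agglomerative(points, lambda x, y: abs(x - y))
for cluster_a, cluster_b, height in history:
    print(cluster_a, cluster_b, height)
```

The merge heights are what a dendrogram draws as branching heights; stopping the loop early yields a flat set of clusters.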
![Page 39: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/39.jpg)
Hierarchical clustering (1)
Join the two closest objects
![Page 40: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/40.jpg)
Hierarchical clustering (2)
Keep joining the closest pairs
![Page 41: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/41.jpg)
Hierarchical clustering (3)
Keep joining the closest pairs
![Page 42: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/42.jpg)
Hierarchical clustering (4)
Keep joining the closest pairs
![Page 43: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/43.jpg)
Hierarchical clustering (5)
Keep joining the closest pairs
![Page 44: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/44.jpg)
Hierarchical clustering (10)
After 10 steps we have 4 clusters left
![Page 45: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/45.jpg)
Hierarchical clustering (10)
Several ways to measure distance between clusters:
• Single linkage (MIN)
![Page 46: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/46.jpg)
Hierarchical clustering (10)
Several ways to measure distance between clusters:
• Single linkage (MIN)
• Complete linkage (MAX)
![Page 47: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/47.jpg)
Hierarchical clustering (10)
Several ways to measure distance between clusters:
• Single linkage (MIN)
• Complete linkage (MAX)
• Average linkage
• Weighted
• Unweighted
• ...
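The linkage criteria differ only in how the pairwise distances between two clusters are summarized. The sketch below computes three of them for two hypothetical 1-D clusters; the average shown is the plain mean over all cross-cluster pairs, which is an assumption about which averaging variant is meant.

```python
from itertools import product
from statistics import mean

def pairwise_distances(cluster_a, cluster_b, distance):
    """All distances between an object of cluster_a and one of cluster_b."""
    return [distance(a, b) for a, b in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b, distance):
    """Distance between the two closest members (MIN)."""
    return min(pairwise_distances(cluster_a, cluster_b, distance))

def complete_linkage(cluster_a, cluster_b, distance):
    """Distance between the two farthest members (MAX)."""
    return max(pairwise_distances(cluster_a, cluster_b, distance))

def average_linkage(cluster_a, cluster_b, distance):
    """Mean distance over all cross-cluster pairs."""
    return mean(pairwise_distances(cluster_a, cluster_b, distance))

def absolute_difference(x, y):
    return abs(x - y)

left, right = [0.0, 1.0], [5.0, 6.5]  # hypothetical clusters
print(single_linkage(left, right, absolute_difference))
print(complete_linkage(left, right, absolute_difference))
print(average_linkage(left, right, absolute_difference))
```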
![Page 48: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/48.jpg)
Hierarchical clustering (11)
In this example and at this stage we have the same result as in partitional clustering
![Page 49: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/49.jpg)
Hierarchical clustering (12)
In the final step the two remaining clusters are joined into a single cluster
![Page 50: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/50.jpg)
Hierarchical clustering (13)
In the final step the two remaining clusters are joined into a single cluster
![Page 51: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/51.jpg)
Examples of Hierarchical Clustering in Bioinformatics
Phylogeny
Gene expression clustering
![Page 52: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/52.jpg)
K-means clustering
• Partitional, non-fuzzy
• Partitions the data into K clusters
• K is given by the user
Algorithm:
• Choose K initial centers for the clusters
• Assign each object to its closest center
• Recalculate cluster centers
• Repeat until the assignment converges
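The four steps of the algorithm map directly onto a short loop. The following is a minimal sketch, assuming squared Euclidean distance, initial centers drawn at random from the data with a fixed seed, and convergence defined as an unchanged assignment; the function name and the toy points are illustrative.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on tuples of numbers; returns (centers, assignment)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 1: choose K initial centers
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each object to its closest center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:  # step 4: stop once nothing changes
            break
        assignment = new_assignment
        # Step 3: recalculate each center as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, assignment

# Hypothetical data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment = kmeans(points, k=2)
print(centers)
print(assignment)
```

Because the result depends on the initial centers, implementations commonly run the algorithm several times or use a smarter initialization.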
![Page 53: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/53.jpg)
K-means (1)
![Page 54: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/54.jpg)
K-means (2)
![Page 55: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/55.jpg)
K-means (3)
![Page 56: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/56.jpg)
K-means (4)
![Page 57: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/57.jpg)
K-means (5)
![Page 58: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/58.jpg)
K-means (6)
![Page 59: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/59.jpg)
K-means clustering summary
• One of the fastest clustering algorithms
• Therefore very widely used
• Sensitive to the choice of initial centres
  – many algorithms to choose initial centres cleverly
• Assumes that the mean can be calculated
  – can be used on vector data
  – cannot be used on sequences (what is the mean of A and T?)
![Page 60: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/60.jpg)
Distance measures
Distance of vectors $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$:
• Euclidean distance: $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
• Manhattan distance: $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
• Correlation distance: $d(x, y) = 1 - r(x, y)$, where $r(x, y)$ is the Pearson correlation coefficient
Distance of sequences ACCTTG and TACCTG:
• Hamming distance: ACCTTG vs TACCTG => 3
• Levenshtein distance: .ACCTTG vs TACC.TG => 2
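The five measures can be written out directly from their definitions. The sketch below uses only the standard library; the function names are illustrative, and the Levenshtein distance is computed with the usual row-by-row dynamic programming, which is one common way to do it. The sequence pair is the one from the slide.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)

def correlation_distance(x, y):
    return 1 - pearson(x, y)

def hamming(s, t):
    """Number of mismatching positions; defined for equal lengths only."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(a != b for a, b in zip(s, t))

def levenshtein(s, t):
    """Minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        current = [i]
        for j, b in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,             # delete a from s
                current[j - 1] + 1,          # insert b into s
                previous[j - 1] + (a != b),  # substitute, or match for free
            ))
        previous = current
    return previous[-1]

print(hamming("ACCTTG", "TACCTG"))      # 3, as on the slide
print(levenshtein("ACCTTG", "TACCTG"))  # 2, as on the slide
```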
![Page 61: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/61.jpg)
K-medoids clustering
• The same as K-means, except that the center is required to be at an object
• Medoid: an object which has minimal total distance to all other objects in its cluster
• Can be used on more complex data, with any distance measure
• Slower than K-means
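Because only distances between objects are needed, K-medoids can cluster sequences directly. The sketch below alternates assignment and medoid update in the same way as K-means; it is a simple illustrative variant, not a specific published algorithm, and it assumes distinct objects, random initial medoids with a fixed seed, and Hamming distance on made-up sequences.

```python
import random

def hamming(s, t):
    return sum(a != b for a, b in zip(s, t))

def assign(objects, medoids, distance):
    """Map each medoid index to the indices of the objects closest to it."""
    clusters = {m: [] for m in medoids}
    for i, obj in enumerate(objects):
        nearest = min(medoids, key=lambda m: distance(obj, objects[m]))
        clusters[nearest].append(i)
    return clusters

def kmedoids(objects, k, distance, max_iter=100, seed=0):
    """Return (medoid indices, clusters) for any distance function."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(objects)), k)
    for _ in range(max_iter):
        clusters = assign(objects, medoids, distance)
        # New medoid: the member with minimal total distance to its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(
                distance(objects[c], objects[o]) for o in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids, assign(objects, medoids, distance)

# Hypothetical sequences forming two obvious groups.
sequences = ["AAAA", "AAAT", "AATA", "CCCC", "CCCG", "CCGC"]
medoids, clusters = kmedoids(sequences, k=2, distance=hamming)
print([sequences[m] for m in medoids])
print(clusters)
```

Each update evaluates all pairs within a cluster, which is one reason the method is slower than K-means.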
![Page 62: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/62.jpg)
K-medoids (1)
![Page 63: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/63.jpg)
K-medoids (2)
![Page 64: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/64.jpg)
K-medoids (3)
![Page 65: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/65.jpg)
K-medoids (4)
![Page 66: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/66.jpg)
K-medoids (5)
![Page 67: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/67.jpg)
K-medoids (6)
![Page 68: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/68.jpg)
K-medoids (7)
![Page 69: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/69.jpg)
K-medoids (8)
![Page 70: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/70.jpg)
K-medoids (9)
![Page 71: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/71.jpg)
Examples of K-means and K-medoids in Bioinformatics
Gene expression clustering
Sequence clustering
![Page 72: Transcriptional regulation &Clustering · Transcriptional regulation &Clustering Elena Nikolaeva elenanik@ut.ee University of Tartu, Estonia MTAT.03.239)Bioinforma2cs! • Part1 :Transcriponalregulaon!](https://reader034.vdocument.in/reader034/viewer/2022042804/5f56720c8d67a66a311951cc/html5/thumbnails/72.jpg)
Summary of Clustering
• Aims: intuition, hypothesis generation, summarization
• Types:
  – Hierarchical/Partitional
  – Fuzzy/Non-Fuzzy
  – Vector-based/Distance-based, etc.
• Distance measures:
  – Euclidean, Manhattan, Correlation
  – Hamming, Levenshtein
  – etc.
• Applications:
  – Clustering genes, sequences, organisms, etc.