minería de datos - wordpress.com · minería de datos analisis de un set de datos ! visualization...

Post on 29-Feb-2020

Data Mining

ANALYSIS OF A SINGLE DATASET

•  Visualization Techniques
•  Combined Graph
•  Charts and Pies
•  Search for specific functions

Data Mining on the DAG

✓  When working with large datasets, annotation results need to be summarized

✓  The DAG provides visualization of annotation data within its biological context

✓  In Blast2GO: the Combined Graph function

Combined Graph

Each term has a number of associated sequences

Nodes can be coloured to indicate relevance

Each term is displayed within its biological context

Node shape differentiates between direct and indirect annotation

Combined Graph

Different GO branches

Reduce nodes by number of annotated sequences

Criterion for highlighting and filtering nodes

Node data to be displayed

Combined Graph

Let's draw the DAG of the dataset analyzed yesterday (1000 sequences).

Too many nodes!

We need a way to find the relevant information.

Node Information Content

[Figure: an example DAG shown twice. One copy labels each node with the value accumulated by node (Sequence Count: 1, 3, 4, 5); the other with the incoming information (Node Score: 1, 3, 2.4, 2.5).]

Node score

We compute a node score that reflects the amount of direct information at the node.

[Figure: example DAG with node scores 1, 3, 2.4 and 2.5.]

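Node score

Worked example with α = 0.6. GO1 (1 sequence) and GO2 (3 sequences) are annotated directly (dist = 0). Both lie at dist = 1 from GO3 and at dist = 2 from GO4, which also carries 1 direct sequence (dist = 0).

\[
\begin{aligned}
\mathrm{NodeScore}(\mathrm{GO1}) &= 1 \cdot 0.6^{0} = 1 \\
\mathrm{NodeScore}(\mathrm{GO2}) &= 3 \cdot 0.6^{0} = 3 \\
\mathrm{NodeScore}(\mathrm{GO3}) &= 1 \cdot 0.6^{1} + 3 \cdot 0.6^{1} = 0.6 + 1.8 = 2.4 \\
\mathrm{NodeScore}(\mathrm{GO4}) &= 1 \cdot 0.6^{2} + 3 \cdot 0.6^{2} + 1 \cdot 0.6^{0} = 0.36 + 1.08 + 1 = 2.44
\end{aligned}
\]

[Figure: DAG labelled GO1 = 1, GO2 = 3, GO3 = 2.4, GO4 = 2.5. The figure shows the GO4 value as 2.5.]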
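The worked example can be reproduced with a few lines of Python. This is only a sketch under assumptions: the child-to-parent dictionary is a hypothetical encoding of the slide's four-node DAG, the distance is taken as the shortest number of edges, and the general form used (each directly annotated term contributes its sequence count times α raised to the distance) is inferred from the four equations rather than taken from Blast2GO's code.

```python
# Hypothetical encoding of the slide's example DAG: child -> list of parents.
PARENTS = {"GO1": ["GO3"], "GO2": ["GO3"], "GO3": ["GO4"], "GO4": []}
# Number of sequences annotated directly to each term (from the slide).
DIRECT = {"GO1": 1, "GO2": 3, "GO3": 0, "GO4": 1}
ALPHA = 0.6


def distances_to_ancestors(term, parents):
    """Shortest edge distance from `term` to itself and to each ancestor."""
    dist = {term: 0}
    frontier = [term]
    while frontier:
        next_frontier = []
        for t in frontier:
            for p in parents[t]:
                if p not in dist:  # first visit is the shortest path (BFS)
                    dist[p] = dist[t] + 1
                    next_frontier.append(p)
        frontier = next_frontier
    return dist


def node_scores(parents, direct, alpha):
    """Sum, for every node, direct_count * alpha**distance over its descendants."""
    scores = {t: 0.0 for t in parents}
    for term, n_seqs in direct.items():
        for ancestor, d in distances_to_ancestors(term, parents).items():
            scores[ancestor] += n_seqs * alpha ** d
    return scores


scores = node_scores(PARENTS, DIRECT, ALPHA)
counts = node_scores(PARENTS, DIRECT, 1.0)  # alpha = 1 gives the plain sequence count here
for term in sorted(scores):
    print(term, round(scores[term], 2), int(counts[term]))
```

Setting α to 1 removes the distance penalty, which for this example yields the accumulated sequence counts (1, 3, 4, 5) from the earlier figure.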

Node score vs Annotation score

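[Figure: two diagrams. The first repeats the node-score DAG (1, 3, 2.4, 2.5). The second illustrates the annotation score for one sequence: Blast hits (hit1, hit2, hit3) map to GO terms below ROOT, and candidate terms carry scores such as 50, 52, 55 and 60.]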

Annotation Score:

-  In annotation context

-  Relates to Blast results of ONE sequence

\[
\mathrm{AS} = \max\{\%\mathrm{sim} \cdot \mathrm{EC}_w\} + (\#\mathrm{TPR\_GOs} - 1) \cdot \mathrm{GO}_w
\]

Node Score:

-  In data-mining context

-  Relates to analysis of a GROUP of sequences

DO NOT MIX THEM UP!
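To make the contrast with the node score concrete, the annotation score formula on this slide can be written as a small function. This is a hedged sketch: the function and argument names are hypothetical, the maximum is assumed to run over the Blast hits that support one candidate GO term, and the count that the slide calls #TPR_GOs is simply passed in as an integer because the slide does not define it further.

```python
def annotation_score(hits, n_gos, go_weight):
    """Annotation score for ONE candidate GO term of ONE sequence.

    hits      -- list of (percent_similarity, ec_weight) pairs, one per Blast hit
    n_gos     -- the count the slide writes as #TPR_GOs (assumed to be an integer)
    go_weight -- the weight the slide writes as GOw
    """
    direct_term = max(sim * ec_w for sim, ec_w in hits)   # max{%sim * ECw}
    abstraction_term = (n_gos - 1) * go_weight            # (#TPR_GOs - 1) * GOw
    return direct_term + abstraction_term


# Made-up numbers, for illustration only.
example = annotation_score(hits=[(80, 1.0), (90, 0.7)], n_gos=3, go_weight=5)
print(example)
```

The function takes the hits of a single sequence, whereas the node score is computed over the annotations of a whole group of sequences.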

Filtered Graph

Direct annotations

Transition nodes

# Filtered Nodes

Compacting Graphs by GOSlim

Show node content

Saving Options

Save as picture and as txt

Graph Charts

Graph Charts

•  Sequence Distribution/GO as Multilevel-Pie (#score or #seq cutoff)

•  Sequence Distribution/GO as Bar-Chart

•  Sequence Distribution/GO as Level-Pie (level selection)

Multilevel vs. GO-Slim Chart

GO-Slim: Handy to summarize functional content

Multi-level Pie with a sequence filter of 20

Use DAG to analyze a function

How many sequences are annotated to the function “photosynthesis”?

Option 1: Find in the GO graph → direct & indirect annotation

Option 2: Find through the Select function. Two sub-options:

    Option 2.1. Direct annotation (use GOid or description)

    Option 2.2. Direct & indirect (use GOid and “include GO parents”)

DAG can be used to make queries on general concepts without direct annotations
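The difference between the direct query and the direct & indirect query can be sketched outside Blast2GO as follows. Everything here is a toy assumption: the three-term ontology, the four sequences and the function names are hypothetical and only illustrate the idea of expanding each annotation with its GO parents before matching.

```python
# Toy ontology: child term -> list of parent terms (hypothetical subset).
PARENTS = {
    "photosynthesis": [],
    "photosynthesis, light reaction": ["photosynthesis"],
    "photosynthesis, dark reaction": ["photosynthesis"],
}
# Toy annotation: sequence ID -> set of directly annotated terms.
ANNOTATIONS = {
    "seq1": {"photosynthesis"},
    "seq2": {"photosynthesis, light reaction"},
    "seq3": {"photosynthesis, dark reaction", "photosynthesis, light reaction"},
    "seq4": set(),
}


def with_parents(term, parents):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def direct_hits(term, annotations):
    """Sequences annotated to exactly this term (Option 2.1)."""
    return {s for s, terms in annotations.items() if term in terms}


def direct_and_indirect_hits(term, annotations, parents):
    """Sequences annotated to this term or to any descendant (Option 2.2)."""
    return {
        s
        for s, terms in annotations.items()
        if any(term in with_parents(t, parents) for t in terms)
    }


direct = direct_hits("photosynthesis", ANNOTATIONS)
everything = direct_and_indirect_hits("photosynthesis", ANNOTATIONS, PARENTS)
print(len(direct), len(everything))
```

In this toy data only one sequence carries the general term directly, while three are found once parents are included, which is why general concepts are best queried through the DAG.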

Example: analyze a specific function

Find a function on the graph

search export

Example: analyze a specific function

Select all sequences annotated to this function and its descendants

Example: analyze a specific function

Locate these sequences

Example: analyze a specific function

Explore the annotation diversity of a given function within the graph

By exporting the sequence table you can see all sequences annotated to a given function (GO term)

Conclusions

✓  DAGs are interesting for browsing functional annotation but can be too large

✓  With filtering and pruning options you can create more navigable DAGs

✓  Pies are good to compact information: try out levels

✓  GO-Slim compacts to more equivalent terms than filtering the GO

✓  You can use the DAG to query on general terms

Data Mining

ANALYSIS OF SEVERAL DATASETS

•  Functional Enrichment
•  Enriched Graphs
•  Meta-analysis

Enrichment Analysis

Interpretation of a large list of genes: which are the relevant functions?

                 One Gene List (A)    The other list (B)
Biosynthesis           54%                  18%
Sporulation            18%                  27%

Are these two groups of genes carrying out different biological roles?

Are these differences statistically significant?

Fisher's Exact Test

                 One Gene List (A)    The other list (B)
Biosynthesis           54%                  18%
Sporulation            18%                  27%

Contingency table

                     B    A
Biosynthesis         2    6
No biosynthesis      9    5

p-value for biosynthesis < 0.05

                     B    A
Sporulation          3    2
No sporulation       8    9

p-value for sporulation > 0.05
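For readers who want to check a contingency table by hand, a two-sided Fisher's exact test can be written with the standard library alone. This is a minimal sketch, not Blast2GO's implementation: the function name is hypothetical, and "two-sided" is taken in the usual sense of summing all tables with the same margins that are no more probable than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: genes with / without the function. Columns: list A / list B.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so that ties with the observed table are included.
    p_value = sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Counts from the slide, written as (A with, B with, A without, B without).
p_biosynthesis = fisher_exact_two_sided(6, 2, 5, 9)
p_sporulation = fisher_exact_two_sided(2, 3, 9, 8)
print(round(p_biosynthesis, 3), round(p_sporulation, 3))
```

Run on the slide's counts, this sketch gives roughly 0.18 for biosynthesis and 1.0 for sporulation, so with only 11 genes per list the "< 0.05" label is best read as illustrative of the kind of result the test reports.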

Multiple testing correction

We run this test for every GO term in our dataset!

Many tests => many false positives => we need a correction!

FDR control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. In a list of rejected hypotheses, FDR controls the expected proportion of incorrectly rejected null hypotheses.

FWER control: the familywise error rate is the probability of making one or more false discoveries among all the hypotheses when performing multiple pairwise tests. It is the more conservative of the two.
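The two kinds of correction can be compared on a short list of p-values. The slides do not say which procedures Blast2GO applies, so the sketch below swaps in two standard ones as an assumption: Benjamini-Hochberg for FDR control and Bonferroni for FWER control. The p-values are made up.

```python
def bonferroni(pvalues):
    """FWER control: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """FDR control: Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # one made-up p-value per GO term
fwer = bonferroni(raw)
fdr = benjamini_hochberg(raw)
print([round(p, 3) for p in fwer])
print([round(p, 3) for p in fdr])
```

Every Bonferroni-adjusted value is at least as large as its Benjamini-Hochberg counterpart, which is the sense in which FWER control is more conservative.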

Fisher’s Exact Test in Blast2GO

             B    A
GO           3    2
No GO        8    9

(the two columns are labelled Test-set and Ref-set on the slide)

Three files:

•  Blast2GO project with annotations (.dat/.annot)
•  One txt file with IDs: Test-set (.txt)
•  Another txt file with IDs: Ref-set (.txt)
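How the three inputs turn into one contingency table per GO term can be sketched as below. The data structures are assumptions made for illustration: annotations are held as a dictionary from sequence ID to a set of GO IDs, the two ID lists are plain sets, and all names and example IDs are hypothetical rather than Blast2GO's own.

```python
def contingency_table(go_term, annotations, test_ids, ref_ids):
    """Return [[test with GO, ref with GO], [test without GO, ref without GO]]."""
    test_with = sum(1 for s in test_ids if go_term in annotations.get(s, ()))
    ref_with = sum(1 for s in ref_ids if go_term in annotations.get(s, ()))
    return [
        [test_with, ref_with],
        [len(test_ids) - test_with, len(ref_ids) - ref_with],
    ]


# Made-up project: sequence ID -> GO IDs (as would be read from an annotation file).
annotations = {
    "s1": {"GO:0005840"},
    "s2": {"GO:0005840", "GO:0006412"},
    "s3": {"GO:0006412"},
    "s4": set(),
    "s5": {"GO:0005840"},
}
test_ids = {"s1", "s2"}        # contents of the Test-set ID file
ref_ids = {"s3", "s4", "s5"}   # contents of the Ref-set ID file

table = contingency_table("GO:0005840", annotations, test_ids, ref_ids)
print(table)
```

One such table is built for every GO term in the project and then passed to the exact test.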

Different types of comparisons

●  Compare one condition against another (Set 1 vs. Set 2)

    ●  Remove common IDs
    ●  Test-Set and Ref-Set are interchangeable

●  Compare a subset against the total

    ●  Gossip default setting
    ●  Test-Set and Ref-Set are NOT interchangeable

[Figure: Venn diagrams of Set 1 and Set 2 with their common IDs, and of a Test-Set and a Ref-Set with their common IDs.]
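The two comparison types differ mainly in how shared IDs are handled, which set operations make explicit. This is a sketch under assumptions: the function names are hypothetical, and for the subset-against-total case the reference is taken to be the total minus the test set, which is one common convention and not something the slide states.

```python
def condition_vs_condition(set1, set2):
    """Compare two conditions: drop the IDs they share from both sides."""
    common = set1 & set2
    return set1 - common, set2 - common


def subset_vs_total(subset, total):
    """Compare a subset against the total (assumed: reference = total minus subset)."""
    if not subset <= total:
        raise ValueError("the test set must be contained in the total")
    return subset, total - subset


pair = condition_vs_condition({"a", "b", "c"}, {"c", "d"})
nested = subset_vs_total({"a"}, {"a", "b", "c"})
print(pair, nested)
```

In the first case swapping the two arguments only swaps the outputs, which matches the slide's remark that the sets are interchangeable; in the second case the roles are fixed by the containment check.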

FET in Blast2GO

●  The two-tailed test identifies not only over-represented but also under-represented functions.

●  If no Ref-Set is chosen, all annotations are used as reference.

Enrichment Results

●  Result table with link outs to sequence lists

Most specific terms

Retains only the lowest, most specific enriched term per GO branch
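The filtering idea can be sketched as: drop every enriched term that is an ancestor of another enriched term. The toy ontology, the term names and the function names below are hypothetical, and the sketch is one plausible reading of "most specific", not Blast2GO's own routine.

```python
def ancestors(term, parents):
    """All ancestors of a term in a child -> parents mapping."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def most_specific(enriched, parents):
    """Keep only enriched terms that have no enriched descendant."""
    covered = set()
    for term in enriched:
        covered |= ancestors(term, parents)
    return {t for t in enriched if t not in covered}


# Toy DAG: GO_C is below GO_B, which is below GO_A; GO_D is a second child of GO_A.
toy_parents = {"GO_A": [], "GO_B": ["GO_A"], "GO_C": ["GO_B"], "GO_D": ["GO_A"]}
kept = most_specific({"GO_A", "GO_B", "GO_C", "GO_D"}, toy_parents)
print(sorted(kept))
```

Here GO_A and GO_B are removed because more specific enriched terms exist below them, leaving one term per branch.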

Enriched Graph

View enriched terms data as DAG graphs!

=> To draw all nodes, set the filter to 1

Bar-Chart

●  Export enriched terms as chart!

=> Filter results

[Chart legend: % of sequences in Ref group; % of sequences in Test group]

If Test > Ref: over-represented

If Ref > Test: under-represented

Meta-analysis in Blast2GO

Annotation Result (.annot)

Sequence_1 GO:0005792
Sequence_1 GO:0006412
Sequence_1 GO:0003735
Sequence_2 GO:0016705
Sequence_2 GO:0005840
Sequence_2 GO:0005506

Enrichment Result

Treatment_1 GO:0005792
Treatment_1 GO:0006412
Treatment_1 GO:0003735

⇔ Equivalent formats

By joining different functional enrichment results we can create an annotation file of conditions that captures their functional profile.

Enrichment Result (.annot)

Treatment_1 GO:0005792
Treatment_1 GO:0006412
Treatment_1 GO:0003735
Treatment_2 GO:0016705
Treatment_2 GO:0005840
Treatment_2 GO:0005506
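Joining several enrichment results into one annotation-style file is a simple reshaping step, sketched below. The function name is hypothetical, and the tab between the two columns is an assumption about the .annot layout; the treatment names and GO IDs are the ones shown on the slide.

```python
def enrichment_to_annot(enriched_by_treatment):
    """Write one 'treatment<TAB>GO ID' line per enriched term, treatments as IDs."""
    lines = []
    for treatment, go_ids in enriched_by_treatment.items():
        for go_id in go_ids:
            lines.append(f"{treatment}\t{go_id}")
    return "\n".join(lines) + "\n"


# Enriched terms per treatment, as listed on the slide.
enriched = {
    "Treatment_1": ["GO:0005792", "GO:0006412", "GO:0003735"],
    "Treatment_2": ["GO:0016705", "GO:0005840", "GO:0005506"],
}
annot_text = enrichment_to_annot(enriched)
print(annot_text, end="")
```

Because each treatment now plays the role of a sequence, the resulting file can be explored with the same graph and chart functions used for ordinary annotations.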

Meta-analysis in Blast2GO

Use seq names to see treatments

Use color by SeqCount

FIND SIMILARITIES BETWEEN TREATMENTS

Meta-analysis in Blast2GO DISPLAY FUNCTIONAL DISSIMILARITIES ON DAG

Use second column number for color

Exercises: Data Mining