cpu time 051010

Modeling Bayesian Phylogenetic Inference in Protein Data Analysis by Using Mr. Bayes, Proml, Consensus Applications

Upload: teresashawn-intl-corp

Post on 06-Jul-2015

DESCRIPTION

CPU Time Applications

TRANSCRIPT

Page 1: Cpu time 051010

Modeling Bayesian Phylogenetic Inference in Protein Data Analysis by Using Mr. Bayes, Proml, Consensus Applications

Page 2: Cpu time 051010

Mr. Bayes vs. Proml (maximum likelihood)

[Figure: chart; category axis 1 to 21, value axis 0 to 12000; series: Series1, Series2]

Page 3: Cpu time 051010

CPU time/Mr. Bayes/Proml

[Figure: chart; category axis 1 to 17, depth axis S1 to S3, value axis 0 to 4500; series: Series1, Series2, Series3]

Page 4: Cpu time 051010

Diff. of Maximum Likelihood (Mr. Bayes - Proml) vs. CPU (sec)

[Figure: "maximum likelihood"; x-axis: diff (postml - proml), 0 to 800; y-axis: cpu time (sec), 0 to 4500; series: Series1]

Page 5: Cpu time 051010

Diff. of Maximum Likelihood (Mr. Bayes - Proml) vs. CPU (sec)

[Figure: "maximum likelihood"; x-axis: diff (postml - preml), 1 to 19; y-axis: cpu time (sec), 0 to 4500; series: Series1, Series2]

Page 6: Cpu time 051010

Linear Regression in Testing Datasets

[Figure: "linear regression"; x-axis 0 to 15000, y-axis 0 to 12000; series: Series1]

Page 7: Cpu time 051010

Testing Datasets Plus One/Two Long-Branch Datasets

[Figure: "mrbayes vs proml (plus AB,CD data)"; category axis 1 to 34, value axis 0 to 16000; series: Series1, Series2]

Page 8: Cpu time 051010

Linear Regression After Bayesian Correction for Testing Datasets & One/Two Long-Branch Datasets

[Figure: x-axis 0 to 20000, y-axis 0 to 16000; series: Series1]

Page 9: Cpu time 051010

Phylogeny for All Testing Datasets

[Figure: "phy all"; x-axis: no., 0 to 300; y-axis: length, -0.5 to 3; series: Series1]

Page 10: Cpu time 051010

Phylogeny for All Datasets

[Figure: "phylogeny for all datasets"; x-axis: no., 0 to 400; y-axis: length, -0.5 to 3; series: Series1]

Page 11: Cpu time 051010

One Long Branch Datasets

[Figure: "one long branch"; x-axis: no., 0 to 60; y-axis: length, -0.5 to 2.5; series: Series1]

Page 12: Cpu time 051010

Two Long Branches Datasets

[Figure: "two long branches"; x-axis: no., 0 to 60; y-axis: length, 0 to 1.6; series: Series1]

Page 13: Cpu time 051010

Phylogeny (sequence length from Proml)

[Figure, panel "phy07": x-axis: species, 0 to 15; y-axis: length, -0.05 to 0.3; series: Series1]

[Figure, panel "AB50J": x-axis: no., 0 to 8; y-axis: length, -0.5 to 2; series: Series1]

[Figure, panel "CD20J": x-axis: no., 0 to 8; y-axis: length, 0 to 1.2; series: Series1]

[Figure, panel "phy06": x-axis: species, 0 to 10; y-axis: length, -0.1 to 0.6; series: Series1]

Page 14: Cpu time 051010

One/Two Long-Branch Datasets (Maximum Likelihood)

CD10J   1200   0.74   179   1132   5.10   468
CD50J   1264   8.05   314   1193   6.72   682

[Figure: chart; value axis 11000 to 14000; series: Series1]

Page 15: Cpu time 051010

Data Analysis

• Testing datasets: phy01 ~ phy21, nexus01 ~ nexus21
• Experimental datasets: one long branch (AB10J ~ AB70aj), two long branches (CD10J ~ CD70aj)
• Operating system: Mac OS X ver. 10.3.9
• Dual 800 MHz PowerPC G4
• 256 MB SDRAM
• Mr. Bayes 3.1.1
• Phylip 3.67 (Proml, Consensus)

Page 16: Cpu time 051010

Data Analysis (continued)

• Testing sample size: 21 x 2
• Experimental samples: 7 x 2
• Degrees of freedom: 20
• Chi-square: 283.1561 > 31.41 (alpha = 0.05)
• Proml and Mr. Bayes are the two dependent variables
• ANOVA: SSw = 2669051, SSb = 24253093
• SStotal = 50943143.71
• Eta squared = 0.476081596
• Type I error = 0.05
• Type II error = 1.83%
• Power = 98.17%
• Instrument threshold = 1E-8
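The summary statistics on this slide are tied together by a few standard identities, and the following Python sketch checks them using only the numbers printed above. It assumes that eta squared is the ratio of the between-group sum of squares to the total sum of squares, and that power is one minus the type II error rate; the variable names are illustrative and not taken from the original analysis. The sketch also prints the within-group sum of squares implied by SStotal minus SSb, which comes to roughly 26,690,051 rather than the printed SSw, so the printed SSw is not used in any calculation here.

```python
# Values copied from the slide; the names are illustrative assumptions.
ss_between = 24253093.0
ss_total = 50943143.71
type_ii_error = 0.0183
chi_square = 283.1561
chi_square_critical = 31.41  # critical value quoted on the slide (df = 20, alpha = 0.05)

# Eta squared: share of the total variation attributed to the between-group term.
eta_squared = ss_between / ss_total

# Power is the complement of the type II error rate.
power = 1.0 - type_ii_error

# The null hypothesis is rejected when the statistic exceeds the critical value.
reject_null = chi_square > chi_square_critical

# Within-group sum of squares implied by the total and between-group values.
ss_within_implied = ss_total - ss_between

print(f"eta squared = {eta_squared:.9f}")
print(f"power = {power:.2%}")
print(f"reject null at alpha = 0.05: {reject_null}")
print(f"implied SSw = {ss_within_implied:.2f}")
```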

Page 17: Cpu time 051010

Testing Datasets

Testing samples:

        CPU         character   diff (Mr. Bayes - Proml)
mean    1576.467    296.3333    226.959
sd      878.5355    131.7778    104.6243

Linear regression between Mr. Bayes and Proml for the testing datasets:

slope       1.058352
intercept   14.79771
correl      0.999724

y(Mr. Bayes) = 1.058351726 x(Proml) + 14.79771
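A regression line of this kind can be fitted by ordinary least squares. The sketch below is a minimal Python version, assuming the usual closed-form estimates for slope, intercept, and Pearson correlation. The Proml values in it are hypothetical placeholders, not the original measurements, and the Mr. Bayes values are generated from the slide's fitted line, so the fit simply recovers the printed slope and intercept.

```python
import math


def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


# Hypothetical Proml values (NOT the original data).
proml = [500.0, 1200.0, 2500.0, 4000.0, 9000.0]
# Hypothetical Mr. Bayes values placed exactly on the slide's fitted line.
mrbayes = [1.058352 * x + 14.79771 for x in proml]

slope, intercept, r = least_squares(proml, mrbayes)
print(slope, intercept, r)
```

With real measurements the points scatter around the line, and the correlation falls below 1, as in the 0.999724 reported on the slide.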

Page 18: Cpu time 051010

Experimental samples:

        AB (one long branch)   CD (two long branches)
mean    11802.88               13122.86
sd      179.5717               364.0589

Linear regression between experimental samples:

slope       0.492193    t-test   3.47E-05
intercept   5343.856    f-test   0.109857
correl      0.996717

Page 19: Cpu time 051010

Linear Regression for All

After Bayesian modeling:

        AB (one long branch)   CD (two long branches)
mean    12506.4                13903.5
sd      190.05                 385.302

Linear regression for all datasets (including experimental and testing):

slope       1.058352
intercept   14.79764
correl      0.999959

y(Mr. Bayes) = 1.058352 x(Proml) + 14.79764
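The "after Bayesian modeling" means and standard deviations look like the experimental-sample statistics (AB mean 11802.88, sd 179.5717; CD mean 13122.86, sd 364.0589) passed through the fitted line. That reading is an inference from the numbers, not something the slides state. The Python sketch below applies the line to those statistics under that assumption; the function names are illustrative.

```python
# Fitted line from the slide: y(Mr. Bayes) = 1.058352 * x(Proml) + 14.79764
SLOPE = 1.058352
INTERCEPT = 14.79764


def correct_value(x):
    """Map a single value (or a mean) through the fitted line."""
    return SLOPE * x + INTERCEPT


def correct_spread(sd):
    """A standard deviation is only scaled by the slope; the intercept drops out."""
    return SLOPE * sd


# Experimental-sample statistics as printed on the experimental-samples slide.
experimental = {
    "AB": {"mean": 11802.88, "sd": 179.5717},
    "CD": {"mean": 13122.86, "sd": 364.0589},
}

for name, stats in experimental.items():
    corrected_mean = correct_value(stats["mean"])
    corrected_sd = correct_spread(stats["sd"])
    print(name, round(corrected_mean, 1), round(corrected_sd, 3))
```

Under this assumption the corrected values agree with the printed table to within about 0.1 for the means and 0.001 for the standard deviations.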

Page 20: Cpu time 051010

Tree Hierarchical Structure: AB10J

AB10J.JTT

        +----------seq.7
        |
  +-----5   +---------seq.4
  |     |   |
  |     +---2         +-------seq.6
  |         |    +----4
  |         +----3    +----------seq.5
  |              |
  |              +-----------seq.2
  |
  1------------------------------------------seq.3
  |
  +--------seq.1

AB10J.consensus

                +--------------------seq.4
                |
         +--1.0-|             +------seq.6
         |      |      +--1.0-|
         |      +--1.0-|      +------seq.5
  +------|             |
  |      |             +-------------seq.2
  |      |
  |      |                    +------seq.1
  |      +----------------1.0-|
  |                           +------seq.3
  |
  +----------------------------------seq.7
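One way to compare the Proml tree with the consensus tree is to list the bipartitions (splits) of the taxa that each tree implies and check whether the two sets match. The Python sketch below does this for AB10J. The nested tuples are a hand transcription of the two drawings on this slide and are therefore an assumption, as is the helper naming; branch lengths are ignored.

```python
def leaves(tree):
    """Return the set of leaf names below a nested-tuple tree node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of a tree, treated as unrooted.

    Each split is stored as the side that does not contain a fixed
    reference leaf, so the result does not depend on where the drawing is rooted.
    """
    all_leaves = leaves(tree)
    reference = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        if reference in side:
            side = all_leaves - side
        # Skip trivial splits (empty side, single leaf, or all but one leaf).
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


# Topologies transcribed by hand from the slide (assumption).
ml_tree = ("seq.3", "seq.1", ("seq.7", ("seq.4", (("seq.6", "seq.5"), "seq.2"))))
consensus_tree = ((("seq.4", (("seq.6", "seq.5"), "seq.2")), ("seq.1", "seq.3")), "seq.7")

same_topology = splits(ml_tree) == splits(consensus_tree)
print("identical unrooted topology:", same_topology)
```

Comparing split sets rather than the drawings themselves avoids being misled by the different rooting and leaf order of the two printouts.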

Page 21: Cpu time 051010

Histogram AB10J

[Figure: "AB10J"; x-axis: no., 0 to 8; y-axis: length, 0 to 0.8; series: Series1]

Page 22: Cpu time 051010

Tree Hierarchical Structure: CD10J

CD10J.JTT

     +----seq.7
     |
  +--5 +----seq.4
  |  | |
  |  +-2    +-------------------seq.6
  |    |  +-4
  |    +--3 +-----seq.5
  |       |
  |       +-----seq.2
  |
  1---seq.3
  |
  +---------------------seq.1

CD10J.consensus

                +--------------------seq.4
                |
         +--1.0-|             +------seq.6
         |      |      +--1.0-|
         |      +--1.0-|      +------seq.5
  +------|             |
  |      |             +-------------seq.2
  |      |
  |      |                    +------seq.1
  |      +----------------1.0-|
  |                           +------seq.3
  |
  +----------------------------------seq.7

Page 23: Cpu time 051010

Histogram CD10J

[Figure: "CD10J"; x-axis: no., 0 to 8; y-axis: length, 0 to 0.8; series: Series1]

Page 24: Cpu time 051010

Discussion

• Bayesian modeling can be used to evaluate type I and type II errors, eta squared, power, chi-square (χ²), ANOVA, the correlation coefficient, linear regression, etc.

• A 2x2 table can be designed to evaluate risk measures such as the risk difference (RD), relative risk (RR), and relative odds (RO).

• The Proml and Consensus outputs yield a histogram profile along with the hierarchical tree structure, which makes peak area integration possible.
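The 2x2 table mentioned in the discussion can be sketched as follows. This Python example assumes the standard epidemiological layout (exposed and unexposed rows, outcome and no-outcome columns) and the usual definitions of RD, RR, and RO (the odds ratio). The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the study.

```python
def risk_measures(a, b, c, d):
    """Compute risk measures from a 2x2 table.

    a, b: exposed group with / without the outcome
    c, d: unexposed group with / without the outcome
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rd = risk_exposed - risk_unexposed  # risk difference
    rr = risk_exposed / risk_unexposed  # relative risk
    ro = (a * d) / (b * c)              # relative odds (odds ratio)
    return rd, rr, ro


# Hypothetical counts (NOT from the original analysis).
rd, rr, ro = risk_measures(20, 80, 10, 90)
print(f"RD = {rd:.2f}, RR = {rr:.2f}, RO = {ro:.2f}")
```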

Page 25: Cpu time 051010

Questions

• CPU time can be used to count all hydrogen-bond activity through a kinesthetic module in the computer, covering the hydrogen-bond configurations of matched base pairs (A-T, A-U, C-G) and/or the alignment of sequences written in the separate genetic codes A, T, U, C, G.

• CPU time can possibly be used to count all triggering of stem cell activity through functional proteins.

• CPU time has already been used in forensic science to count pattern differences in a suspect's sample during judicial investigations.