Biostatistics and Statistical Bioinformatics

Setia Pramana

Universitas Brawijaya Malang, 7 October 2011




BECOMING A STATISTICIAN?


Who Needs Statisticians?

• Can a statistician only become a lecturer or teacher?
• No. There are many more applied fields.
• My classmates work in:
– Information and Communication Technology
– Research and Development
– Government: Ministry of Finance, PLN, Bank Indonesia, Danareksa, etc.
– Entrepreneurship
– Many more...
• Writer
• Read the book: 9 Summers 10 Autumns


BIOSTATISTICIANS


Biostatistics

• The study of statistics as applied to biological areas such as biological laboratory experiments, medical research (including clinical research), and public health services research.

• “Biostatistics, far from being an unrelated mathematical science, is a discipline essential to modern medicine – a pillar in its edifice” (Journal of the American Medical Association, 1966).


Biostatistics

• Public Health:
– Epidemiology
– Modeling infectious diseases: HIV, HCV
– Disease mapping
– Genetics: family-related diseases

• Bioinformatics:
– Image processing
– Data mining
– Pattern recognition
– etc.


Biostatistics

• Agriculture
– Experimental Design
– Genetics
• Biomedical Research
• Evidence-based medicine
• Clinical studies
• Drug Development


Statistical Methods?

• t-test
• ANOVA
• Regression
• Cluster analysis
• Discriminant analysis
• Non-linear modeling
• Multiple comparisons
• Linear mixed models
• Bayesian methods
• Etc.


BIOSTATISTICIANS IN DRUG DEVELOPMENT


Drug Development

• Takes 10-15 years
• Costs far more than 1 million USD (commonly cited estimates are on the order of 1 billion USD per approved drug)
• Aims to ensure that only drugs that are both safe and effective can be marketed
• Stages:
– Drug Discovery
– Pre-clinical Development
– Clinical Development -> 4 Phases

Statisticians are involved in all stages (a must)


• Pharmaceutical development: discovery of compound; synthesis and purification of drug substance; manufacturing procedures
• Pre-clinical (animal) studies: pharmacological profile; acute toxicity; effects of long-term usage
• Investigational New Drug application
• Phase I clinical trials: small; focus on safety
• Phase II clinical trials: medium size; focus on safety and short-term efficacy
• Phase III clinical trials: large and comparative; focus on efficacy and cost benefits
• New Drug Application
• Phase IV clinical trials: “real world” experience; demonstrate cost benefits; rare adverse reactions


International Conference on Harmonization (ICH)

• The international harmonization of requirements for drug research and development, so that information generated in one country or area is acceptable to other countries or areas.
• Regions: Europe, USA, Japan.
• All clinical trials must follow ICH regulations.
• Statistics plays an important role.
• Statistical Principles for Clinical Trials (ICH E9).


Preclinical and Clinical Development

• Statisticians are involved from the beginning of the study
• Planning the study
– Formulating the hypothesis
– Choosing the endpoint
– Choosing the design and sample size
• Conduct of the study
– Patient accrual
– Data collection
• Data quality control, data analysis
• Publication of results
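As a rough illustration of the "choosing the sample size" step, the sketch below applies the standard normal-approximation formula for comparing two means, n per arm = 2 (z_{1-alpha/2} + z_{power})^2 sigma^2 / delta^2. The function name and the planning values (a difference of 5 with a standard deviation of 10) are hypothetical and not taken from the slides; a real trial would derive them from its protocol.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Patients per arm for a two-sided, two-sample comparison of means.

    Normal approximation:
        n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2
    delta: smallest difference in means worth detecting (hypothetical input)
    sigma: assumed common standard deviation (hypothetical input)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)  # round up: a fraction of a patient cannot be enrolled


# Hypothetical planning values, for illustration only.
print(sample_size_per_arm(delta=5, sigma=10))              # 80% power
print(sample_size_per_arm(delta=5, sigma=10, power=0.90))  # 90% power
```

With these inputs the formula gives 63 patients per arm at 80% power and 85 at 90% power, which shows how quickly the required size grows when more power is demanded.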


BIOINFORMATICS


Bioinformatics

• Bioinformatics is a science straddling the domains of biomedicine, informatics, mathematics and statistics.

• It applies computational techniques to biological data.

• Functional Genomics
• Proteomics
• Sequence Analysis
• Phylogenetics
• Etc.


“Informatics” in Bioinformatics

• Databases
– Building, Querying
– Object DB
• Text String Comparison
– Text Search
• Finding Patterns
– AI / Machine Learning
– Clustering
– Data mining
• etc.


Central Dogma of Molecular Biology

• Genes contain construction information

• Nearly all structure and function is carried out by proteins


Genomics

• Premise: Physiological changes -> Gene expression changes -> mRNA abundance level changes

• Objective: Use gene expression levels measured via DNA microarrays to identify a set of genes that are differentially expressed across two sets of samples (e.g., in diseased cells compared to normal cells)


Microarray Technology

• DNA microarrays are a new and promising biotechnology that allows the expression of thousands of genes to be monitored simultaneously


Gene Expression Analysis

• Overview of the process of generating high throughput gene expression data using microarrays.


Preprocessed data

Genes   C1    C2    C3    T1    T2    T3
G8521   6.89  7.18  6.60  7.40  7.15  7.40
G8522   6.78  6.55  6.37  6.89  6.78  6.92
G8523   6.52  6.61  6.72  6.51  6.59  6.46
G8524   5.67  5.69  5.88  7.43  7.16  7.31
G8525   5.64  5.91  5.61  7.41  7.49  7.41
G8526   4.63  4.85  5.72  5.71  5.47  5.79
G8527   8.28  7.88  7.84  8.12  7.99  7.97
G8528   7.81  7.58  7.24  7.79  7.38  8.60
G8529   4.26  4.20  4.82  3.11  4.94  3.08
G8530   7.36  7.45  7.31  7.46  7.53  7.35
G8531   5.30  5.36  5.70  5.41  5.73  5.77
G8532   5.84  5.48  5.93  5.84  5.73  5.75


Applications

• Drugs with high efficacy and few or no side effects
• Personalized medicine
• Genes related to disease
• Biological discovery
– new and better molecular diagnostics
– new molecular targets for therapy
– finding and refining biological pathways
• Molecular diagnosis of, e.g., leukemia and breast cancer
• Appropriate treatment for a given genetic signature
• Potential new drug targets


Challenges

• Mega data, difficult to visualize
• Too few records (columns/samples), usually < 100
• Too many rows (genes), usually > 1,000
• Too many rows (genes tested simultaneously) are likely to lead to false positives
• For exploration, a large set of all relevant genes is desired
• For diagnostics or identification of therapeutic targets, the smallest set of genes is needed
• The model needs to be explainable to biologists
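The false-positive problem can be made concrete with a small simulation. The sketch below assumes 10,000 genes with no true effect, so every p-value is drawn uniformly from [0, 1], and counts how many genes pass an unadjusted 0.05 cut-off, a Bonferroni cut-off, and the Benjamini-Hochberg false discovery rate procedure. The gene count, seed, and function name are illustrative choices, not values from the slides.

```python
import random


def benjamini_hochberg(pvalues, q=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0  # largest rank k with p_(k) <= q * k / m
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= q * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


random.seed(1)
n_genes = 10_000  # illustrative number of genes, none truly different

# Under the null hypothesis, p-values are uniform on [0, 1].
null_pvalues = [random.random() for _ in range(n_genes)]

naive_hits = sum(p < 0.05 for p in null_pvalues)
bonferroni_hits = sum(p < 0.05 / n_genes for p in null_pvalues)
bh_hits = len(benjamini_hochberg(null_pvalues))

print("unadjusted p < 0.05:", naive_hits)
print("Bonferroni:         ", bonferroni_hits)
print("Benjamini-Hochberg: ", bh_hits)
```

Without adjustment, about 5% of the genes (roughly 500 here) are expected to look "significant" by chance alone, which is why gene-selection methods are normally paired with a multiplicity correction.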


Microarray Data Analysis Types

• Gene Selection
– find genes for therapeutic targets
• Classification (Supervised)
– identify disease (biomarker study)
– predict outcome / select best treatment
• Clustering (Unsupervised)
– find new biological classes / refine existing ones
– understand regulatory relationships/pathways
– exploration


Gene Selection

• Modified t-test
• Significance Analysis of Microarrays (SAM)
• Limma (linear models for microarray data)
• Random forest
• Lasso (least absolute shrinkage and selection operator)
• Linear mixed model
• Elastic net
• Etc.
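To give a feel for the simplest form of gene selection, the sketch below ranks the twelve genes from the "Preprocessed data" slide by an ordinary Welch two-sample t-statistic. This is a plain t-statistic, not the modified t-test, SAM, or limma; those methods additionally stabilise the variance estimate, which matters with only three arrays per group. Reading C1-C3 as control arrays and T1-T3 as treated arrays is an assumption, as the slides do not define the labels.

```python
from math import sqrt
from statistics import mean, variance

# Values copied from the "Preprocessed data" slide.
# Assumption: C1-C3 are control arrays, T1-T3 are treated arrays.
expression = {
    "G8521": ([6.89, 7.18, 6.60], [7.40, 7.15, 7.40]),
    "G8522": ([6.78, 6.55, 6.37], [6.89, 6.78, 6.92]),
    "G8523": ([6.52, 6.61, 6.72], [6.51, 6.59, 6.46]),
    "G8524": ([5.67, 5.69, 5.88], [7.43, 7.16, 7.31]),
    "G8525": ([5.64, 5.91, 5.61], [7.41, 7.49, 7.41]),
    "G8526": ([4.63, 4.85, 5.72], [5.71, 5.47, 5.79]),
    "G8527": ([8.28, 7.88, 7.84], [8.12, 7.99, 7.97]),
    "G8528": ([7.81, 7.58, 7.24], [7.79, 7.38, 8.60]),
    "G8529": ([4.26, 4.20, 4.82], [3.11, 4.94, 3.08]),
    "G8530": ([7.36, 7.45, 7.31], [7.46, 7.53, 7.35]),
    "G8531": ([5.30, 5.36, 5.70], [5.41, 5.73, 5.77]),
    "G8532": ([5.84, 5.48, 5.93], [5.84, 5.73, 5.75]),
}


def welch_t(control, treated):
    """Welch t-statistic: difference in means over its standard error."""
    se = sqrt(variance(control) / len(control) + variance(treated) / len(treated))
    return (mean(treated) - mean(control)) / se


# Rank genes by the absolute size of the statistic, largest first.
ranked = sorted(
    ((gene, welch_t(c, t)) for gene, (c, t) in expression.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for gene, t in ranked:
    print(f"{gene}: t = {t:6.2f}")
```

On these values G8525 and G8524 stand out with by far the largest statistics. In a real analysis the ranking would be followed by p-values and a multiple-testing adjustment.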


Visualization

• Dimensionality reduction
• PCA (Principal Component Analysis)
• Biplot
• Multidimensional scaling
• Etc.


Clustering

• Cluster the genes
• Cluster the arrays/conditions
• Cluster both simultaneously

• K-means
• Hierarchical
• Biclustering algorithms
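As a minimal sketch of "clustering the arrays", the code below runs a bare-bones k-means with two clusters on the six arrays from the "Preprocessed data" slide, treating each array as a vector of twelve gene values. Starting the two centres at the first and last array is a simplification made here so that the run is deterministic; practical implementations use random or k-means++ initialisation and several restarts.

```python
# Rows are genes G8521..G8532; columns are arrays C1, C2, C3, T1, T2, T3
# (values copied from the "Preprocessed data" slide).
table = [
    [6.89, 7.18, 6.60, 7.40, 7.15, 7.40],
    [6.78, 6.55, 6.37, 6.89, 6.78, 6.92],
    [6.52, 6.61, 6.72, 6.51, 6.59, 6.46],
    [5.67, 5.69, 5.88, 7.43, 7.16, 7.31],
    [5.64, 5.91, 5.61, 7.41, 7.49, 7.41],
    [4.63, 4.85, 5.72, 5.71, 5.47, 5.79],
    [8.28, 7.88, 7.84, 8.12, 7.99, 7.97],
    [7.81, 7.58, 7.24, 7.79, 7.38, 8.60],
    [4.26, 4.20, 4.82, 3.11, 4.94, 3.08],
    [7.36, 7.45, 7.31, 7.46, 7.53, 7.35],
    [5.30, 5.36, 5.70, 5.41, 5.73, 5.77],
    [5.84, 5.48, 5.93, 5.84, 5.73, 5.75],
]
array_names = ["C1", "C2", "C3", "T1", "T2", "T3"]

# Transpose so that each array becomes one point with 12 coordinates.
profiles = [list(column) for column in zip(*table)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_indices, max_iter=20):
    """Plain k-means; the initial centres are the points at init_indices."""
    centers = [list(points[i]) for i in init_indices]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # no change, so the algorithm has converged
            break
        labels = new_labels
        # Update step: each centre moves to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


labels = kmeans(profiles, init_indices=(0, 5))
for name, lab in zip(array_names, labels):
    print(name, "-> cluster", lab)
```

With this starting point the three C arrays fall into one cluster and the three T arrays into the other, which is the kind of structure an unsupervised method is expected to recover without being told the group labels.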


Clustering

• Cluster or Classify genes according to tumors

• Cluster tumors according to genes


Biclustering

• A biclustering method is an unsupervised learning method which looks for sub-matrices in a data matrix with a high similarity of elements.

• Algorithms: statistics-based, AI, machine learning.

• BiclustGUI: A User Friendly Interface for Biclustering Analysis
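The slides do not say how "high similarity of elements" is measured. One widely used criterion, assumed here purely for illustration, is the mean squared residue of Cheng and Church: a sub-matrix scores 0 when every element equals its row effect plus its column effect, and the score grows as the sub-matrix becomes less coherent. The toy matrix and function name below are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the sub-matrix given by rows x cols.

    residue_ij = a_ij - rowmean_i - colmean_j + overall mean
    A score of 0 means a perfectly additive (coherent) bicluster.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return total / (n_rows * n_cols)


# Toy data: rows 0-2 and columns 0-2 form an additive pattern,
# while the remaining row and column are unrelated values.
toy = [
    [1, 2, 3, 9],
    [2, 3, 4, 1],
    [4, 5, 6, 7],
    [8, 0, 5, 2],
]

bicluster_score = mean_squared_residue(toy, rows=[0, 1, 2], cols=[0, 1, 2])
whole_score = mean_squared_residue(toy, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
print("bicluster:", bicluster_score)
print("whole matrix:", whole_score)
```

A biclustering algorithm of this family searches over row and column subsets for large sub-matrices whose score stays below a chosen threshold; the search itself is omitted here.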


Bicluster Structure


Software/Statistical Packages

• Minitab
• SAS
• SPSS
• R
• S-Plus
• Matlab
• Stata


• R is now growing, especially in bioinformatics
– Statistics, data analysis, machine learning
– Free
– High quality
– Open source
– Extendable (you can submit and publish your own package!)
– Can be integrated with other languages (C/C++, Java, Python)
– Large active user community
– Command-based (a drawback for some users)


Summary

• Statisticians can flexibly get involved in many fields.
• Statistics is only the tool; its applications range widely.
• Biostatisticians have many opportunities in public health services (e.g. the Centers for Disease Control and Prevention, CDC), pharmaceutical companies, research institutions, etc.
• Statistical bioinformatics: cutting-edge technology -> methods are growing -> many more developments in the future.