In silico blood genotyping from exome sequencing data

Silvio Tosatto

BioComputing UP, Department of Biology, University of Padova, Italy

URL: http://protein.bio.unipd.it/

Today

• Personalized genetics has been upon us for some time

• How good are we at actually identifying phenotypes from a whole genome?

The CAGI Personal Genome Project (PGP) Challenge

• Few goals are more pure to genome interpretation than predicting traits from raw sequence (or genotype) data

• In this CAGI challenge, phenotypes/traits are predicted for real people with genetic data

• Genetic information for 10 individuals from the Personal Genome Project is provided (PGP-10)

Dataset provided by George Church

Personal genome project (PGP) ‐ Predict individuals’ phenotype

Numerical traits

33. Birth weight (in g)
34. HDL level (in mg/dL) *
35. LDL level (in mg/dL) *
36. Triglyceride level (in mg/dL) *
37. Fasting blood glucose level (in mg/dL)
38. Warfarin dose (in mg)
39. Age at Menarche
40. Annual income (in $)

Blood Groups

• Clear genetic cause of phenotypes

• Model system for phenotype prediction

• Good description in literature

• High relevance, especially for blood transfusions

(Blood. 2009;114: 248-256)

Example: ABO glycosyltransferase

Blood Grp | Genes | Antigens
ABO | ABO | A, B, O

Amino acid residues differing between blood group A- and B-active transferases, respectively (Arg176Gly; Gly235Ser; Leu266Met; Gly268Ala), are shown with the single-letter code and their positions indicated.
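
As a rough illustration of how these four positions separate the two transferases, the sketch below labels a protein as A-like or B-like from its residues at positions 176, 235, 266 and 268. The function name and the input format (a dict from position to one-letter residue) are assumptions made for this example, and the O allele is not handled.

```python
# Residues that distinguish A- and B-active ABO transferases (from the slide):
# Arg176Gly, Gly235Ser, Leu266Met, Gly268Ala (A residue first, B residue second).
A_RESIDUES = {176: "R", 235: "G", 266: "L", 268: "G"}
B_RESIDUES = {176: "G", 235: "S", 266: "M", 268: "A"}


def classify_transferase(residues):
    """Label a transferase 'A', 'B' or 'ambiguous' from four key residues.

    `residues` is a hypothetical input: {position: one-letter amino acid}.
    """
    a_hits = sum(residues.get(pos) == aa for pos, aa in A_RESIDUES.items())
    b_hits = sum(residues.get(pos) == aa for pos, aa in B_RESIDUES.items())
    if a_hits == 4:
        return "A"
    if b_hits == 4:
        return "B"
    # Mixed or missing residues: this toy classifier does not decide.
    return "ambiguous"
```

Requiring all four positions to agree is a deliberately strict choice for the sketch; a real rule set would need to cover hybrid and null alleles as well.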

Relevant Blood Types

Blood Grp | Genes | Antigens
ABO | ABO | A, B, O
RH | RHCE, RHD | D, E, C plus 50 minor
DUFFY | DARC | FY(a), FY(b)
Kell | KEL | K1, K2 plus 23 minor
Diego | SLC4A1 | Dia, Dib, Wra, Wrb
Kidd | SLC14A1 | Jk(a), Jk(b)
Lewis | FUT3 | a, b
Lutheran | BCAM | Lu(a), Lu(b) plus 15 minor
MNS | GYPA, GYPB, GYPE | M, N, S plus 40 minor
Bombay | FUT1, FUT2 | H, secretor

10 out of about 30 blood groups are relevant for transfusions

BOOGIE: BlOOd Group IdEntifier

• A knowledge-based system to predict blood groups from sequencing data

• All 10 groups relevant for blood transfusions are predicted

• A specialized genotype-phenotype knowledge base is required

BOOGIE: Knowledge representation

• Stored in tree-like structure

• Rules expressed in “if <mutation(s)> then <phenotype(s)>” form
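
A minimal sketch of what such “if <mutation(s)> then <phenotype(s)>” rules could look like in code is given below. The `Rule` class, the `apply_rules` function and the mutation strings are illustrative assumptions, not BOOGIE's actual data model, and the rules are kept in a flat list rather than a tree for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One 'if <mutation(s)> then <phenotype(s)>' rule (hypothetical layout)."""
    gene: str
    mutations: frozenset  # all of these must be observed for the rule to fire
    phenotypes: tuple     # phenotype labels implied by the rule


def apply_rules(rules, observed):
    """Return the phenotypes of every rule whose mutations are all observed.

    `observed` maps a gene name to the set of mutation strings in the sample.
    """
    result = []
    for rule in rules:
        if rule.mutations <= observed.get(rule.gene, set()):
            result.extend(rule.phenotypes)
    return result


# Toy rule built from the ABO example; not an entry of the real knowledge base.
rules = [
    Rule("ABO", frozenset({"R176G", "G235S", "L266M", "G268A"}), ("B",)),
]
sample = {"ABO": {"R176G", "G235S", "L266M", "G268A"}}
print(apply_rules(rules, sample))
```

Using a subset test means a rule fires only when every listed mutation is present, which matches the "if all of these, then" reading of the rule form.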

BOOGIE: Knowledge collection

– Manually curated

– 580 rules derived

Relevant variants

• ANNOVAR (Wang et al., Nucleic Acids Research 2010) is used to reduce the SNVs to a manageable number

• Starting point: millions of SNVs

– Gene-based annotation of variants

– Select conserved positions

– Remove unrelated genes

• Result: few relevant SNVs
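
The last filtering step, removing variants in unrelated genes, can be sketched as follows. This is only an illustration: the variant records are hypothetical dicts standing in for rows of gene-based annotation output, `keep_relevant` is a name invented here, and the conservation filter is left out.

```python
# Genes from the blood group table in this talk.
BLOOD_GROUP_GENES = {
    "ABO", "RHCE", "RHD", "DARC", "KEL", "SLC4A1", "SLC14A1",
    "FUT3", "BCAM", "GYPA", "GYPB", "GYPE", "FUT1", "FUT2",
}


def keep_relevant(variants, genes=BLOOD_GROUP_GENES):
    """Keep only the variants annotated to one of the given genes.

    Each variant is a hypothetical dict with at least a 'gene' key.
    """
    return [v for v in variants if v["gene"] in genes]


# Toy input: one variant in a blood group gene, one in an unrelated gene.
variants = [
    {"gene": "ABO", "change": "toy variant 1"},
    {"gene": "TP53", "change": "toy variant 2"},
]
print(keep_relevant(variants))
```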

BOOGIE Pipeline

[Figure: BOOGIE pipeline diagram; it includes the blood group, gene and antigen table shown under “Relevant Blood Types”]

Benchmarking

• BOOGIE covers all known blood group variants

• Difficulty in finding genome sequences with known blood phenotypes

• Personal Genome Project (PGP) as annotated benchmark set

Personal Genome Project (PGP)

The mission of the PGP is to encourage the development of personal genomics

• Genetic information for 10 individuals from the Personal Genome Project is provided (PGP-10)

• A larger dataset (PGP-1K) aims to cover at least 1,000 genomes

Unfortunately, only ABO and Rh blood group information is available

PGP-10 Data

Back row (left to right): James Sherley, Misha Angrist, John Halamka, Keith Batchelder, Rosalynn Gill.

Front row (left to right): Esther Dyson, George Church, Kirk Maxey.

Not shown: Stan Lapidus and Steven Pinker.

PGP-10 Results

Individual | PGP1 | PGP4 | PGP8
Known | O + | A - | B +
ABO | O | A | B
Rh | c; e; weak D | c; e; weak D | c; e; weak D
DUFFY | FY(a+); FY(b-) | FY(a-); FY(b+) | FY(a-); FY(b+)
KELL | K2; K21+; K4-; K3-; K11; K17; K14; K24; K6+; K7- | K2; K21+; K4-; K3-; K11; K17; K14; K24; K6+; K7- | K2; K21+; K4-; K3-; K11; K17; K14; K24; K6+; K7-
Diego | Dib; Memph neg | Dib; Memph neg | Dib; Memph neg
KIDD | Jk(a-); Jk(b+) | Jk(a-); Jk(b+) | Jk(a+); Jk(b-)
Lewis | negative | negative | negative
Lutheran | Lu(a-); Lu(b+); Lu6+; Lu9-; Lu4; Lu8+; Aua+; Aub- | Lu(a-); Lu(b+); Lu6-; Lu9+; Lu4-; Lu8+; Aua-; Aub+ | Lu(a-); Lu(b+); Lu6+; Lu9-; Lu4-; Lu8+; Aua+; Aub-
MNS | M; S | M; s | M; s
Bombay | H+; secretor | H+; secretor | H+; secretor

BOOGIE correctly predicts all ABO types and all Rh groups except one (PGP-4)
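
The comparison behind this statement can be sketched for the three individuals shown on this slide. The mapping from a predicted Rh string to a +/- sign (`rh_sign`) is an assumption made for the example, and the resulting fractions cover only these three people, not the full benchmark.

```python
# Known and predicted types for the three individuals shown on the slide.
known = {"PGP1": ("O", "+"), "PGP4": ("A", "-"), "PGP8": ("B", "+")}
predicted_abo = {"PGP1": "O", "PGP4": "A", "PGP8": "B"}
predicted_rh = {
    "PGP1": "c; e; weak D",
    "PGP4": "c; e; weak D",
    "PGP8": "c; e; weak D",
}


def rh_sign(prediction):
    """Assumption: any prediction mentioning D (incl. 'weak D') counts as Rh+."""
    return "+" if "D" in prediction else "-"


def accuracy(pairs):
    """Fraction of (predicted, known) pairs that agree."""
    pairs = list(pairs)
    return sum(p == k for p, k in pairs) / len(pairs)


abo_acc = accuracy((predicted_abo[i], known[i][0]) for i in known)
rh_acc = accuracy((rh_sign(predicted_rh[i]), known[i][1]) for i in known)
print(f"ABO: {abo_acc:.0%}, Rh: {rh_acc:.0%}")
```

Under this mapping the only mismatch is PGP-4, whose known type is Rh-negative while a weak D is predicted.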

PGP-1K Results

• A second dataset was built from all PGP-1K participants with available blood group information, for a total of 22 individuals

• This dataset contains microarray data (23andMe SNPs)

P = predicted; R = real; * = blood group relevant SNPs missing from dataset

Conclusions

• We developed a method, called BOOGIE, to predict the ten blood groups relevant for transfusions from sequencing data

– Specialized knowledge base with 580 genotype-to-phenotype rules

– Novel variants can be easily considered

• Benchmarking was (so far) only possible on PGP data for the ABO and Rh blood groups

– The ABO and Rh systems are correctly predicted in 85-100% of cases

– The Rh- type presents some additional difficulties

Acknowledgements

Manuel Giollo

Giovanni Minervini

Marta Scalzotto (not shown)

Emanuela Leonardi

Carlo Ferrari

URL: http://protein.bio.unipd.it/

Funding: FIRB Futuro in Ricerca, Università di Padova, CARIPLO, AIRC