Page 1: Carlow IT Bioinformatics November 2006

CFTR – gene cloning and initial bioinformatic analysis

Riordan, 12 co-authors(*) and Tsui (1989) Science 245:1066

Carlow IT Bioinformatics, November 2006

* Including Francis Collins, later leader of the Human Genome Sequencing Project

Cystic fibrosis

• Horrible inherited disease
  – Affecting lungs, pancreas, sweat glands

• Abnormally high transmembrane electrical potential
  – Decreased Cl⁻ ion transport across the membrane

• Often associated with failure to respond to cAMP-dependent protein kinase: no phosphorylation, no function

More symptoms etc.

• Difficult breathing
• Early death (life expectancy 6 months in 1959, 38 years in 2006)
• More prone to infections (thicker mucus)
• Can do prenatal diagnosis or a sweat test
• "Woe is the child who tastes salty from a kiss on the brow, for he is cursed, and soon must die" (German proverb, 1700s)

• We modify antimicrobial peptides (AMPs) such as defensins: can we make one that is effective in a high-salt environment?

Genetics & epidemiology

• Located on chromosome 7q31.2; the gene spans about 180 kb
• 1 in 25 Europeans carries a CFTR mutation, so about 1 in 2500 live births have the disease
• Males and females equally affected
• Life expectancy higher in males: nobody knows why

• Why so common? (heterozygote advantage?)
• Cholera toxin requires normal CFTR
• Also a possible connection with typhoid
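The 1 in 2500 figure follows from the carrier rate if we assume random mating and Hardy-Weinberg equilibrium: both parents must be carriers (1/25 × 1/25 = 1/625), and each child of two carriers then has a 1/4 chance of inheriting two mutant copies. A minimal sketch of that arithmetic (the function name is ours, purely illustrative):

```python
def incidence_from_carrier_freq(carrier_freq: float) -> float:
    """Expected fraction of affected births for an autosomal recessive trait.

    Assumes random mating and Hardy-Weinberg equilibrium: both parents must
    be carriers (carrier_freq squared), and each child of two carriers is
    affected with probability 1/4.
    """
    return carrier_freq ** 2 / 4


if __name__ == "__main__":
    risk = incidence_from_carrier_freq(1 / 25)
    print(f"about 1 in {1 / risk:.0f} live births")
```

The same answer comes from the allele frequency q: carriers are about 2q of the population, so q is about 1/50 and affected births are q² = 1/2500.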

Mapping

• Genetic linkage to markers pinpoints chromosome 7

• Chromosome walking to zero in on the gene

• No genome sequence in those days
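Chromosome walking can be pictured as repeated overlap steps: use the far end of the current clone as a probe, pick up a library clone that hybridises with it and extends further, and repeat. The toy below is only a sketch of that idea; the clone coordinates (in kb) and the "take the clone that reaches furthest" rule are our own assumptions, not the procedure the CF groups actually followed.

```python
from typing import List, Tuple

Clone = Tuple[int, int]  # hypothetical (start_kb, end_kb), end exclusive


def walk(clones: List[Clone], start: int, target: int) -> List[Clone]:
    """Greedy walk from a marker at `start` until a clone reaches `target`."""
    path: List[Clone] = []
    probe, reach = start, start
    while reach < target:
        # "Screen the library": clones that contain the probe and extend further.
        hits = [c for c in clones if c[0] <= probe < c[1] and c[1] > reach]
        if not hits:
            raise ValueError(f"walk stalled at {probe} kb")
        best = max(hits, key=lambda c: c[1])
        path.append(best)
        reach = best[1]
        probe = reach - 1  # next probe comes from the last kb of the new clone
    return path


if __name__ == "__main__":
    library = [(0, 40), (30, 75), (35, 60), (70, 110), (100, 150)]
    for step, clone in enumerate(walk(library, start=5, target=140), 1):
        print(step, clone)
```

Each step can only advance as far as the longest overlapping clone, which is one reason walking over hundreds of kb took so long.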

Clone and sequence

• Why bother?
  – Because we can!
  – Can we predict features/functions?
  – Can we compare CF with normal to identify the mutation?

• Working with cDNA, not genomic DNA
• Generate cDNA libraries from cells and cell lines
• Screen for cDNAs that hybridise with a known CFTR fragment
• Eventually (after much hard work) obtained 19 overlapping cDNA clones
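Working out how far a set of overlapping cDNA clones reaches is an interval-merging problem. A sketch, assuming each clone has already been placed on transcript coordinates as a half-open (start, end) interval in bases; the example coordinates are made up for illustration:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in bases, end exclusive


def merge_clones(clones: List[Interval]) -> List[Interval]:
    """Merge overlapping clones into contigs (clones that merely touch stay separate)."""
    contigs: List[Interval] = []
    for start, end in sorted(clones):
        if contigs and start < contigs[-1][1]:
            last_start, last_end = contigs[-1]
            contigs[-1] = (last_start, max(last_end, end))
        else:
            contigs.append((start, end))
    return contigs


if __name__ == "__main__":
    clones = [(0, 1800), (1500, 3200), (200, 900), (3000, 4700), (4600, 6100)]
    contigs = merge_clones(clones)
    print(contigs, sum(e - s for s, e in contigs), "bases covered")
```

A single contig means the clones tile the transcript without gaps; more than one contig points to a gap that still needs another clone.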

Fig 1

19 normal clones, 2 CF clones

Fig 3: where expressed

Patchy expression profile

Gene sequence

• Clones span 6.1 kb of RNA
• ORF encodes a protein of 1480 amino acids
  – So much bigger than the ~300 AA average

• In 1989, far fewer than 1000 human genes had been sequenced
• Bioinformatic analysis possible then:
  – Start codon: consensus sequence for translation start around the AUG
  – Secondary structure prediction
  – Hydropathy plot
  – Homology searches (pre-BLAST)
  – Glycosylation and Ser/Thr kinase sites
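A 1480-residue protein needs 1480 × 3 = 4440 coding bases plus a 3-base stop codon, i.e. 4443 nt, which fits inside the 6.1 kb of transcript. Finding such an ORF is a simple scan. The sketch below only searches the three forward frames and takes the first AUG of each open stretch; both are simplifying assumptions, not the original analysis.

```python
from typing import Optional, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int, int]]:
    """Longest ATG...stop ORF on the forward strand.

    Returns (start, end, n_amino_acids) with 0-based start and end just past
    the stop codon, or None if there is no complete ORF.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA as well as DNA
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                n_aa = (i - start) // 3  # codons from ATG up to, not including, the stop
                if best is None or n_aa > best[2]:
                    best = (start, i + 3, n_aa)
                start = None
    return best


if __name__ == "__main__":
    print(longest_orf("CCAUGAAAUUUUAGCC"))  # (2, 14, 3)
```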

Start of ORF

• 5’-AGACCAUGCA-3’ in CFTR

• 5’-(CC)[A/G]CCAUGG(G)-3’ consensus
  – Convinced?
  – I’m not
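One way to make "convinced?" concrete is to check the two positions usually considered most important in the consensus: a purine at -3 and a G at +4, counting the A of AUG as +1. A sketch (the function name is ours):

```python
from typing import Dict, Optional


def kozak_check(context: str, atg_index: int) -> Dict[str, object]:
    """Inspect positions -3 and +4 around an AUG (A of AUG = +1)."""
    seq = context.upper().replace("U", "T")
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no AUG at atg_index")
    minus3: Optional[str] = seq[atg_index - 3] if atg_index >= 3 else None
    plus4: Optional[str] = seq[atg_index + 3] if atg_index + 3 < len(seq) else None
    return {
        "-3": minus3,
        "+4": plus4,
        "purine_at_-3": minus3 in ("A", "G"),
        "G_at_+4": plus4 == "G",
    }


if __name__ == "__main__":
    # CFTR context from the slide; the AUG starts at index 5.
    print(kozak_check("AGACCAUGCA", 5))
```

For the CFTR context this reports an A at -3, matching the consensus purine, but a C at +4 instead of G: only a partial match.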

The sequence 1

Figure labels: exon splice junctions, predicted kinase sites, TM domains, transcription start, amino-acid and RNA position counts
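Predicted kinase and glycosylation sites come from matching short consensus motifs. A sketch using PROSITE-style patterns as regular expressions: N-glycosylation N-{P}-[ST]-{P} and the cAMP/cGMP-dependent kinase site [RK](2)-x-[ST]. These two patterns are our choice for illustration; the 1989 analysis may have used different consensus rules, and a motif hit is only a candidate site.

```python
import re
from typing import List

# Lookaheads let overlapping matches be reported.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")
PKA_SITE = re.compile(r"(?=[RK]{2}.[ST])")


def find_sites(protein: str, pattern: re.Pattern) -> List[int]:
    """1-based positions of the first residue of every (possibly overlapping) match."""
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    toy = "ARRASNGTANPSW"  # made-up peptide, not CFTR
    print("N-glycosylation:", find_sites(toy, N_GLYCOSYLATION))
    print("kinase:", find_sites(toy, PKA_SITE))
```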

The sequence 2

First ATP-binding fold is underlined

ΔF508 circled
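ΔF508 is an in-frame deletion of three bases, so the downstream reading frame is preserved and only one residue is lost. The sketch translates a short stretch before and after the deletion. The fragment ATC ATC TTT GGT GTT (I506 I507 F508 G509 V510) and the deleted bases CTT are the ones commonly quoted, but treat them as an assumption to check against the reference sequence; the deletion also turns codon 507 from ATC into ATT, which still encodes isoleucine.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons of a DNA string in frame 0 ('*' = stop)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


normal = "ATCATCTTTGGTGTT"            # assumed fragment for codons 506-510
delta_f508 = normal[:5] + normal[8:]  # remove "CTT" (0-based positions 5-7)

if __name__ == "__main__":
    print(translate(normal), "->", translate(delta_f508))  # IIFGV -> IIGV
```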

Protein analysis

The whole protein is two similar halves, each with 6 membrane-spanning domains (hydrophobic peaks in the hydropathy plot) and a nucleotide-binding fold (NBF, a hydrophilic region), linked by a charged R region
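Hydropathy peaks come from averaging a per-residue hydrophobicity score over a sliding window and looking for long stretches that stay high. The sketch uses the Kyte-Doolittle scale with a 19-residue window and a 1.6 cut-off, a common rule of thumb for flagging candidate membrane-spanning segments; these parameter choices are ours, not necessarily those used in the paper.

```python
from typing import List

# Kyte-Doolittle hydropathy index per amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy(protein: str, window: int = 19) -> List[float]:
    """Mean hydropathy of each window; entry i covers residues i..i+window-1 (0-based)."""
    scores = [KD[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_windows(protein: str, window: int = 19,
                         threshold: float = 1.6) -> List[int]:
    """0-based starts of windows whose mean hydropathy exceeds `threshold`."""
    return [i for i, h in enumerate(hydropathy(protein, window)) if h > threshold]


if __name__ == "__main__":
    toy = "DDDDD" + "L" * 19 + "KKKKK"  # made-up: one hydrophobic stretch
    print(candidate_tm_windows(toy))
```

On a real sequence, a run of consecutive flagged windows marks one candidate TM segment; two clusters of six such runs is the pattern behind the two-halves picture.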

Fig 6: homology/similarity around F508

Comparing two conserved regions in CFTR and other proteins (multidrug resistance proteins, transporters etc.): some have two similar regions, some only one.

Conserved hydrophobic aromatic position at 508

Structure of the fold

• The two halves have similar structure but low amino-acid conservation (the best match is only 27/66 identities)

• Other family members are much more tightly conserved

• No signal peptide, which suggests that the first TM domain is oriented inside to outside (i → o)

• External loops very short
• …except between TM7 and TM8, where there is an N-glycosylation site
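27 identities in 66 aligned positions is about 41% identity. Percent identity always needs a stated denominator; the sketch below assumes the two sequences are already aligned (gaps as "-") and counts only columns where neither has a gap, which is one of several conventions in use.

```python
from typing import Tuple


def percent_identity(a: str, b: str) -> Tuple[int, int, float]:
    """(identities, compared columns, percent) for two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    same = sum(x == y for x, y in columns)
    return same, len(columns), 100 * same / len(columns)


if __name__ == "__main__":
    print(percent_identity("AC-GT", "ACTGA"))  # (3, 4, 75.0)
    print(f"{100 * 27 / 66:.1f}%")             # 40.9%
```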

More…

• The R domain is encoded by one exon; 69 of its 241 residues are polar, in alternating positive and negative charge regions

• It also carries most of the predicted kinase phosphorylation sites
• All family members secrete something:
  – Chloride (CFTR)
  – Pigment (Drosophila white gene)
  – Lytic peptide (E. coli hemolysin)

• …so what about the “function unknown” mbpX gene in liverwort chloroplasts?

More…

• Hypothesis: CFTR is itself the ion channel

• 10 of the 12 TM domains have >1 positively charged amino acid
  – i.e. amphipathic helices
  – cf. brain Na⁺ channel and GABA receptor Cl⁻ channel

• Contrast P-glycoprotein
  – Closely related, but no positively charged TM amino acids

• Big protein: maybe other functions too
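Counting positively charged residues per TM segment is easy to automate once segment boundaries are predicted. A sketch, assuming the segments are given as strings and counting K and R as positive (His is mostly uncharged at neutral pH, so it is optional here); the slide's ">1" criterion corresponds to min_count=2. The example segments are invented.

```python
from typing import List


def count_positive(segment: str, positive: str = "KR") -> int:
    """Number of residues in `segment` drawn from the `positive` set."""
    return sum(1 for aa in segment.upper() if aa in positive)


def segments_with_positive(segments: List[str], min_count: int = 2,
                           positive: str = "KR") -> List[str]:
    """Segments carrying at least `min_count` positively charged residues."""
    return [s for s in segments if count_positive(s, positive) >= min_count]


if __name__ == "__main__":
    tms = ["LLKLLRLLVA", "LLLLLLLLVA", "LKLLLLLLVA"]  # invented segments
    hits = segments_with_positive(tms)
    print(f"{len(hits)}/{len(tms)} segments have >1 positive residue")
```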

Fig 7: a composite model (glycosylation sites marked)

In colour, from Wikipedia

Conclusions

• From very little data, and a very small database to compare with, it was possible to make predictions about structure and function that have stood the test of time.

Database size:

Year    Bases             Sequences
1988    23,800,000        20,579
1989    34,762,585        28,791
1990    49,179,285        39,533
2000    11,101,066,288    10,106,023
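The database sizes quoted on this slide imply very fast growth. A sketch of the arithmetic, assuming steady exponential growth between 1988 and 2000 (a simplification): the base count grew about 466-fold, i.e. it doubled roughly every 1.35 years.

```python
import math
from typing import Dict, Tuple

# Database size quoted on the slide: year -> (bases, sequences).
DB_SIZE: Dict[int, Tuple[int, int]] = {
    1988: (23_800_000, 20_579),
    1989: (34_762_585, 28_791),
    1990: (49_179_285, 39_533),
    2000: (11_101_066_288, 10_106_023),
}


def fold_growth(y0: int, y1: int) -> float:
    """How many times larger the base count was in year y1 than in year y0."""
    return DB_SIZE[y1][0] / DB_SIZE[y0][0]


def doubling_time(y0: int, y1: int) -> float:
    """Years per doubling of the base count, assuming steady exponential growth."""
    return (y1 - y0) / math.log2(fold_growth(y0, y1))


if __name__ == "__main__":
    print(f"{fold_growth(1988, 2000):.0f}-fold, doubling every "
          f"{doubling_time(1988, 2000):.2f} years")
```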

Postscript

• ΔF508 may be about delivery of the protein to the membrane
  – It functions fine if you trick cells into delivering it!

• By 1995, 300 different mutations had been identified in the gene

• Last month: 1531 different mutations listed at
  – http://www.genet.sickkids.on.ca/cftr/StatisticsPage.html

• With the human genome, SNPs and ESTs, sequence information is much easier to interpret