The Structural and Functional Analyses of Disease-Causing WD40 Proteins/Domains Mutations
Speaker: Ma Jing
Supervisors: Prof. Ye, Prof. Wu
2014.09.26
Introduction in Chinese
Background
• The classification of mutations
• Roles and interactions that may be affected by mutation (previous work)
HGMD
• Introduce the data types and entry counts of HGMD
• Introduce the schemata and tables stored in HGMD
LRRK2
• Structural and functional in silico analysis of LRRK2 missense substitutions
• The analysis steps in this paper and some comparisons
Edgetic
• Introduce the edgetic network model, which explains complex disease
Data
• WD40 protein mutation distribution in HGMD
• Disease-causing reliability classification of WD40 protein mutations, and genes with high mutation frequency
Outline
Background
Introduction of HGMD
WD40 Protein Mutations Data Collection
Explore the Structural and Functional Analyses Direction
Summary
Background
What is a mutation?
In genetics, a mutation is a change in the nucleotide sequence of the genome of an organism, virus, or extrachromosomal genetic element.
What can a mutation cause?
Mutations in genes can have no effect, alter the product of a gene, or prevent the gene from functioning properly or completely.
Classification of mutation types
• By effect on structure
• By effect on function
• By impact on protein sequence
Background
By effect on structure
Small-scale mutations
• Point mutations
• Insertions
• Deletions
Large-scale mutations
• Gene duplications
• Deletions of large chromosomal regions
• Chromosomal translocations
• Interstitial deletions
• Chromosomal inversions
• Loss of heterozygosity
Point mutations: transitions (Alpha) and transversions (Beta)
Single nucleotide polymorphism (SNP): a substitution found in more than 1% of the population
[Figure labels: A, a; inactivation; homozygous genotype]
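The transition/transversion distinction for point mutations can be sketched in a few lines of Python. This is only an illustration: the function name `substitution_class` is an assumption of this sketch and does not come from the talk.

```python
# Purines and pyrimidines in DNA.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion.

    Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    Transversion: purine <-> pyrimidine.
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    same_family = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_family else "transversion"
```

For example, `substitution_class("A", "G")` stays within the purines and is a transition, while `substitution_class("A", "C")` crosses families and is a transversion.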
Background
By effect on function
• Loss-of-function mutations
• Gain-of-function mutations
• Dominant negative mutations
• Lethal mutations
• Back mutation (reversion)
[Figure panels: gain-of-function, loss-of-function, lethal mutation, dominant negative, reversion]
Dominant negative: even when paired with a normal RTK, it still doesn't work
Background
By impact on protein sequence
• Frameshift mutations
• Nonsense mutations
• Missense mutations
• Silent mutations
So how do gene mutations cause structural and functional effects on proteins?
Degeneracy of codons
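As a rough illustration of the sequence-level classes and of codon degeneracy, the sketch below classifies a single-codon substitution using the standard genetic code. The function name `classify_substitution` is an assumption of this sketch; frameshift mutations are not covered because they require insertions or deletions rather than substitutions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon substitution by its impact on the protein sequence."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"      # degeneracy: different codon, same residue
    if alt_aa == "*":
        return "nonsense"    # a premature stop codon is introduced
    if ref_aa == "*":
        return "stop-loss"   # the original stop codon now encodes a residue
    return "missense"        # a different amino acid is encoded
```

Here GAA and GAG both encode glutamate, so `classify_substitution("GAA", "GAG")` is silent, whereas GAG to GTG replaces glutamate with valine and is a missense change.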
Background
Previous report
Each mutation is associated with an effect on one or more roles of the residue concerned. Roles that may be affected are: protein stability or folding, ligand binding, catalysis, regulation by allosteric and other mechanisms, post-translational modification.
The types of interactions that are changed are also identified: hydrophobic, hydrogen bond (H-bond), van der Waals, electrostatic interactions, and disulfide bonds.
Wang, Zhen, and John Moult. "SNPs, protein structure, and disease." Human mutation 17.4 (2001): 263-270
Background
Wang, Zhen, and John Moult. "SNPs, protein structure, and disease." Human mutation 17.4 (2001): 263-270
Some examples (wild type: colored by element; mutant: colored red)
a: Mutation R252L causes the loss of a salt bridge and hydrogen bonds in coagulation factor XIIIA, destabilizing the folded structure and resulting in poor clot formation.
b: Mutant L106R introduces a positive charge into the hydrophobic core of the protein, destabilizing the folded structure.
c: Mutant D128G removes four hydrogen bonds between a pair of subunits, and introduces a higher entropy cost on folding, destabilizing the folded state.
d: Hindrance of ligand binding. Mutant G75D introduces a negative charge into the cavity, interfering with retinol binding both electrostatically and sterically.
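Several of these examples hinge on a change in side-chain charge. A minimal sketch, assuming the usual simplification that R and K carry +1, D and E carry -1, and all other residues (including H) are treated as neutral, parses a label such as L106R and reports the net charge change. The helper names `parse_missense` and `charge_change` are hypothetical.

```python
import re

# Assumed simplification: formal side-chain charge near neutral pH.
SIDE_CHAIN_CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}

# One-letter wild-type residue, position, one-letter mutant residue.
_MISSENSE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_missense(label: str) -> tuple[str, int, str]:
    """Split a label such as 'R252L' into (wild type, position, mutant)."""
    match = _MISSENSE.match(label)
    if match is None:
        raise ValueError(f"not a missense label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def charge_change(label: str) -> int:
    """Return mutant charge minus wild-type charge for a missense label."""
    wild_type, _, mutant = parse_missense(label)
    return SIDE_CHAIN_CHARGE.get(mutant, 0) - SIDE_CHAIN_CHARGE.get(wild_type, 0)
```

With these assumptions, L106R gains a positive charge (+1) and G75D gains a negative charge (-1), matching the descriptions in examples b and d. A charge change alone does not predict pathogenicity; it only flags substitutions worth inspecting in the structure.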
Background
Wang, Zhen, and John Moult. "SNPs, protein structure, and disease." Human mutation 17.4 (2001): 263-270
Previous work vs. WD40 proteins
• Disease: previous work covered monogenic inherited disorders; WD40 proteins are also partly related to complex diseases, such as PD.
• Protein structures in previous work: 3D structures, or protein structures with 40% or higher sequence identity.
• Obtaining WD40 protein structures: purification and crystallization of WD40 proteins are very difficult, and sequence identities among WD40 proteins are low.
So being able to predict relatively accurate WD40 structures is our advantage. It helps us better understand the pathogenesis of mutations.
Introduction of HGMD
http://www.hgmd.org/
Number of entries in HGMD by type
Data type                        Public release (academic/non-profit only)   HGMD Professional release 2014.2
Mutation totals                  108508                                      156932
Missense/nonsense                60713                                       87173
Splicing                         10238                                       14302
Regulatory                       1909                                        3024
Small deletions                  17123                                       23731
Small insertions                 7058                                        9917
Small indels                     1606                                        2282
Repeat variations                336                                         456
Gross insertions/duplications    1473                                        2797
Complex rearrangements           1040                                        1567
Gross deletions                  7012                                        11683
Gene/sequence data
Genes                            4037                                        6800
cDNA reference sequences         3896                                        6567
[Figure: Number of Mutation Entries by Year of Publication. x-axis: Year (1985 to 2010); y-axis: number of mutation entries (0 to 14000); series: Public, Professional]
Note: the public release contains almost no recently collected entries.
HGMD(Human Gene Mutation Database)
Introduction of HGMD
The HGMD database comprises four schemata:
• hgmd_pro: contains all the curated mutation data.
• hgmd_snp: contains validated dbSNP entries for the genes present in HGMD.
• hgmd_advanced: contains ontology mappings for the diseases/phenotypes in HGMD from nomenclature sources such as UMLS, ICD or SNOMED.
• hgmd_phenbase: supports the advanced search interface for HGMD; it is not intended for end-user queries, does not contain additional original data, and is not covered here.
HGMD_PRO schema overview
Introduction of HGMD
MUTATION table description
Field      Type            Null   Key   Default
disease    varchar(125)    YES          NULL
gene       varchar(15)     NO     PRI
Base       varchar(9)      NO     PRI
Position   decimal(1,0)    YES          NULL
Upstr      varchar(1)      YES          NULL
Downstr    varchar(1)      YES          NULL
Amino      varchar(8)      YES          NULL
Codon      int(11)         NO     PRI   0
Tag                        YES          NULL
author     varchar(25)     YES          NULL
journal    varchar(10)     YES          NULL
fullname   varchar(50)     YES          NULL
vol        varchar(6)      YES          NULL
Page       varchar(10)     YES          NULL
year       year(4)         YES          NULL
pmid       varchar(8)      YES          NULL
comments   varchar(125)    YES          NULL
acc_num    varchar(10)     YES    UNI   NULL
new_date   date            YES          NULL
TAG    Mutation reliability type
DM     pathological mutation
DM?    pathological mutation (in doubt)
DP     disease-associated polymorphism (no direct evidence)
DFP    disease-associated polymorphism (supporting evidence)
FP     in vitro/laboratory or in vivo functional polymorphism (no disease association reported)
FTV    polymorphic or rare variant (no disease association reported)
R      retired record (deemed to no longer be disease-causing)
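To make the role of the TAG field concrete, the sketch below loads a few rows into an in-memory SQLite table and groups them by tag. The table name `toy_mutation` and its four columns are simplifications assumed for this illustration, not the real HGMD_PRO schema (HGMD itself is distributed as a MySQL database); the rows are the LRRK2 entries listed later in this talk.

```python
import sqlite3

# Toy rows: (disease, gene, mutation, tag), taken from the LRRK2 slide.
ROWS = [
    ("Parkinson disease", "LRRK2", "R1441H", "DM"),
    ("Parkinson disease", "LRRK2", "R1441G", "DM"),
    ("Parkinson disease", "LRRK2", "R1441C", "DM"),
    ("Parkinson disease", "LRRK2", "R1483Q", "DM"),
    ("Potential protein deficiency", "LRRK2", "R1483*", "FTV"),
]

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; not the actual MUTATION table layout.
conn.execute(
    "CREATE TABLE toy_mutation (disease TEXT, gene TEXT, mutation TEXT, tag TEXT)"
)
conn.executemany("INSERT INTO toy_mutation VALUES (?, ?, ?, ?)", ROWS)

# Count entries per reliability tag.
tag_counts = dict(
    conn.execute("SELECT tag, COUNT(*) FROM toy_mutation GROUP BY tag")
)

# Keep only entries tagged as pathological mutations (DM).
dm_mutations = [
    row[0]
    for row in conn.execute(
        "SELECT mutation FROM toy_mutation WHERE tag = 'DM' ORDER BY mutation"
    )
]
```

Filtering on the tag in this way is one simple route to separating disease-causing entries from polymorphisms and retired records before any structural analysis.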
WD40 Protein Mutations Data Collection
WD40 Protein Mutation Distribution in HGMD
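                  Genes   Mutations
HGMD              6800    156951
WD40 (all)        536
WD40 (in HGMD)    211     3613
WD40 Protein Mutation Extracted from HGMD
Share of HGMD covered by WD40 proteins: 3.1% of genes / 2.3% of mutations
BRCA2: 1462 mutations (40.5% of the WD40 protein mutations). Error: not BRCA2, just a part of it
41 genes with >= 10 mutations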
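The percentages on this slide follow directly from the counts. A short check, with variable names chosen only for this sketch:

```python
# Counts as given on the slide.
hgmd_genes, hgmd_mutations = 6800, 156951
wd40_genes_in_hgmd, wd40_mutations = 211, 3613
brca2_mutations = 1462


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


gene_share = pct(wd40_genes_in_hgmd, hgmd_genes)          # WD40 genes among HGMD genes
mutation_share = pct(wd40_mutations, hgmd_mutations)      # WD40 mutations among HGMD mutations
brca2_share = pct(brca2_mutations, wd40_mutations)        # BRCA2 among WD40 mutations

print(f"{gene_share}% of genes, {mutation_share}% of mutations, BRCA2 {brca2_share}%")
```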
WD40 Protein Mutations Data Collection
[Figure: The TAG Type Distribution of WD40 Protein Mutations. x-axis: TAG type (DM, DM?, DP, DFP, FP, FTV, R); y-axis: number of WD40 protein mutations classified in HGMD (0 to 3000); bars labeled with the number of each type]
High Mutation Frequency Genes
Gene        Number    Gene        Number
AAAS        68        KIF21A      14
AHI1        47        LGI1        39
APAF1       10        LYST        60
BBS1        71        NBEAL2      33
BBS2        65        NFIA        10
BRCA2       1462      NHLRC1      62
CSF1R       48        PAFAH1B1    142
CTLA4       13        PALB2       107
CUL4B       13        PCSK9       66
DDB2        10        PDGFRA      19
DNAI1       24        PEX7        46
ERCC8       29        PON1        19
FGFR1       163       PSEN2       35
GGCX        27        STAT5B      11
HGD         113       WDR19       18
HLA-DRB1    12        WDR34       15
HPS5        11        WDR35       13
IFT140      21        WDR36       26
IFT172      14        WDR45       28
IL2RG       199       WDR62       28
Note: the counts cover all mutations in each gene, not only those within its WD40 domain.
Introduction in Chinese
Explore the Structural and Functional Analyses Direction
Step 1: Redefine the domain boundaries based on the predicted tertiary structures (EasyModeller and the LOMETS meta-server).
Cardona, Fernando, Marta Tormos-Pérez, and Jordi Pérez-Tur. "Structural and functional in silico analysis of LRRK2 missense substitutions." Molecular biology reports 41.4 (2014): 2529-2542.
Explore the Structural and Functional Analyses Direction
Step 2: Map the distribution of the PD-related substitutions stored in HGMD onto the functional domains of LRRK2. Mutations described as pathogenic are underlined.
Cardona, Fernando, Marta Tormos-Pérez, and Jordi Pérez-Tur. "Structural and functional in silico analysis of LRRK2 missense substitutions." Molecular biology reports 41.4 (2014): 2529-2542.
R1483Q
R1514Q
Q1823K
D2175H
Y2189C
V2390M
Explore the Structural and Functional Analyses Direction
Step 3: We predicted the effect of the missense mutations related to PD in secondary and tertiary structures, on the electrostatic surface, in polar contacts and charge distribution, as well as stability, function alteration and possible pathogenicity.
Cardona, Fernando, Marta Tormos-Pérez, and Jordi Pérez-Tur. "Structural and functional in silico analysis of LRRK2 missense substitutions." Molecular biology reports 41.4 (2014): 2529-2542.
The column "other alteration" is the result of the structural alterations or putative phosphorylation alterations detected.
Mutations described as pathogenic are marked by an asterisk (*).
Explore the Structural and Functional Analyses Direction
Cardona, Fernando, Marta Tormos-Pérez, and Jordi Pérez-Tur. "Structural and functional in silico analysis of LRRK2 missense substitutions." Molecular biology reports 41.4 (2014): 2529-2542.
Predicted examples of changes caused by missense mutations
Reported as not pathogenic in the paper, but listed as pathogenic in HGMD
The substitution had a drastic effect on the tertiary structure, as it introduced torsion into this domain and changed the inner and outer angles, as well as the total domain length.
Explore the Structural and Functional Analyses Direction
Nuytemans, Karen, et al. "Founder mutation p. R1441C in the leucine-rich repeat kinase 2 gene in Belgian Parkinson's disease patients." European Journal of Human Genetics 16.4 (2008): 471-479.
Mutations described in report
Reported and novel coding mutations in the LRRK2 Roc and kinase domains. Symbols: red stars, putative pathogenic missense mutations detected in the Flanders-Belgian PD population; green stars, novel rare variants, putatively not pathogenic; yellow stars, known polymorphisms;
Mutations described in HGMD
Disease                         Gene     Mutation   Reftag
Parkinson disease               LRRK2    R1441H     DM
Parkinson disease               LRRK2    R1441G     DM
Parkinson disease               LRRK2    R1441C     DM
Parkinson disease               LRRK2    R1483Q     DM
Potential protein deficiency    LRRK2    R1483*     FTV
described as pathogenic in HGMD
Explore the Structural and Functional Analyses Direction
Cardona, Fernando, Marta Tormos-Pérez, and Jordi Pérez-Tur. "Structural and functional in silico analysis of LRRK2 missense substitutions." Molecular biology reports 41.4 (2014): 2529-2542.
Mutation analyses of the LRRK2 WD40 domain structure
The LRRK2 WD40 domain boundaries used in this paper differ from the WDSP prediction.
The structure of the mutant protein was predicted with EasyModeller and the LOMETS meta-server. Using this structure to understand the pathogenesis of WD40 protein mutations is not the optimal method.
Explore the Structural and Functional Analyses Direction
Zhong, Quan, et al. "Edgetic perturbation models of human inherited disorders." Molecular systems biology 5.1 (2009).
Dreze, Matija, et al. "'Edgetic' perturbation of a C. elegans BCL2 ortholog." Nature methods 6.11 (2009): 843-849.
Edgetic (edge-specific genetic perturbation) network
The "edgetic" model concept is put forward and defined.
The "edgetic" model is tested and verified.
Explore the Structural and Functional Analyses Direction
Zhong, Quan, et al. "Edgetic perturbation models of human inherited disorders." Molecular systems biology 5.1 (2009).
Dreze, Matija, et al. "'Edgetic' perturbation of a C. elegans BCL2 ortholog." Nature methods 6.11 (2009): 843-849.
Complex genotype-to-phenotype relationships in Mendelian disorders:
I. A single gene can be associated with multiple disorders (allelic heterogeneity)
II. A single disorder can be caused by mutations in any one of several genes (locus heterogeneity)
III. Only a subset of individuals carrying a mutation are affected by the disease (incomplete penetrance)
IV. Not all individuals with a given mutation are affected equally (variable expressivity)
Node removal versus edgetic perturbation models of network changes underlying phenotypic alterations
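The contrast between the two perturbation models can be illustrated on a toy interaction network. In this sketch the proteins P, A, B and C and their interactions are invented for illustration only: a node-removal mutation deletes P together with all of its interactions, while an edgetic mutation removes a single interaction and leaves the rest intact.

```python
# Toy undirected interaction network; each edge is an unordered protein pair.
EDGES = {
    frozenset(pair)
    for pair in [("P", "A"), ("P", "B"), ("P", "C"), ("A", "B")]
}


def remove_node(edges: set, node: str) -> set:
    """Node removal: the protein is lost, so every edge touching it is lost."""
    return {edge for edge in edges if node not in edge}


def remove_edge(edges: set, a: str, b: str) -> set:
    """Edgetic perturbation: only the interaction between a and b is lost."""
    return edges - {frozenset((a, b))}


node_removed = remove_node(EDGES, "P")      # only the A-B interaction survives
edgetic = remove_edge(EDGES, "P", "A")      # P still interacts with B and C

print(len(node_removed), "edge(s) after node removal")
print(len(edgetic), "edge(s) after edgetic perturbation")
```

Different edgetic mutations in the same protein remove different edges, which is one way the model accommodates a single gene being associated with multiple disorders.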
Summary
HGMD is a good source of human mutation data; we can use this information to compare with previous work.
WDSP, which can predict WD40 protein/domain structures relatively accurately, gives us an advantage in understanding mutation pathogenesis.
MD simulations of mutant proteins could help us obtain a relatively accurate post-mutation structure, which may be difficult for biologists to obtain.
Using a model to explain non-monogenic, complex disease is more meaningful than considering a single mutation.
Acknowledgement
Thank you for your attention!