Ana Sofia Pedrosa Pinto
Bioinformatics Group Department of Molecular Biology
Faculty of Science, University of Zagreb Horvatovac 102a, 10 000 Zagreb
December 20, 2012
Study Committee of the Biophysics Doctoral Programme Faculty of Science, University of Split
REQUEST FOR APPROVAL OF THE DISSERTATION TOPIC
Title of the proposed topic:
Computational analysis of human plasma N‐glycome and genotypes
Mentor:
Prof. Kristian Vlahoviček
Full Professor of Bioinformatics and Group Leader of the Bioinformatics Group in the Faculty of
Science, University of Zagreb.
Proposed members for the committee:
Prof. Kristian Vlahoviček
Prof. Igor Weber
Prof. Andreja Ambriović‐Ristov
Requirements for the dissertation topic defense:
1. Attended and successfully passed all the courses enrolled in the Biophysics Doctoral Programme.
2. Presented the qualification seminar and was approved by the evaluation committee.
3. Published two papers in high impact journals (see Curriculum Vitae below).
Enclosed documents:
1. Overview and description of the research: AnaSofiaPedrosaPinto_ResearchOverview.pdf
2. Mentor's candidate evaluation about the research: AnaSofiaPedrosaPinto_MentorEvaluation.pdf
Curriculum Vitae of the Doctoral Candidate:
Education
October 2008 – present Interdisciplinary PhD Program in Biophysics, University of Split, Croatia
October 2005 – July 2006 Socrates/Erasmus Program, University of Bologna, Italy
September 2002 – September 2007 Graduated in Biomedical Engineering, Faculty of Sciences and Technology, University of Coimbra, Portugal
Thesis title: Home Monitoring for Obstructive Sleep Apnea Diagnosis in Children
Work experience
October 2009 – December 2011 Teaching assistant of Algorithms and Programming course (Computational Biology module), University of Zagreb, Croatia
October 2007 – September 2008 Research project entitled Computational Analysis of Exonic Splicing Regulators, Bioinformatics Group, University of Zagreb, Croatia
Publications
Pinto, S., Vlahoviček, K. and Buratti, E. (2011) PRO‐MINE: A Bioinformatics Repository and Analytical Tool for TARDBP Mutations. Human Mutation, 32: E1948–E1958. doi: 10.1002/humu.21393
Pučić, M., Pinto, S., Novokmet, M., Knežević, A., Gornik, O., Polašek, O., Vlahoviček, K., Wei, W., Rudd, P. M., Wright, A. F., Campbell, H., Rudan, I., and Lauc, G. (2010) Common aberrations from the normal human plasma N‐glycan profile. Glycobiology 20, 970‐975.
Participation in Conferences and Workshops
Participated in the EMBO YIP PhD Course, Heidelberg, Germany (2011) [oral
and poster presentation].
Attended the 10th Congress of the Croatian Society of Biochemistry and
Molecular Biology, Opatija, Croatia (2010) [poster presentation].
Participated in the FEBS workshop: Education in biochemistry and molecular
biology, Opatija, Croatia (2010).
Participated in the workshop Glycomics meets Genomics: novel strategies in
combining omics approaches, Dubrovnik, Croatia (2010).
Attended the 3rd Adriatic Meeting on Computational Solutions in the Life
Sciences, Primosten, Croatia (2009) [poster presentation].
Attended the EMBO Young Scientists Forum, Zagreb, Croatia (2009).
Attended the Scientific Symposium: 50 Years of Molecular Biology in
Croatia, Zagreb, Croatia (2008) [oral presentation].
Participated in the ENSAPI08 Course: Ensembl API access, Programming workshop, Oeiras, Portugal (2008).
RESEARCH OVERVIEW
CANDIDATE: Ana Sofia Pedrosa Pinto
STUDY PROGRAMME: Biophysics Doctoral Programme
Faculty of Science, University of Split
RESEARCH INSTITUTION: Bioinformatics Group, Department of Molecular Biology
Faculty of Science, University of Zagreb
CONTENTS
TITLE
INTRODUCTION
1. Glycomics
2. Glycosylation
3. Structural analyses and previous studies of N‐glycans
4. Aim of the thesis
MATERIALS & METHODS
1. Human Samples
2. Glycan analysis and glycan profiles
3. Phenotype and Genotype Data
DATA ANALYSIS AND RESEARCH RELEVANCE
1. Clustering
2. Discriminant Analysis of Principal Components
3. SNP selection
ETHICAL GUIDELINES
REFERENCES
TITLE
Computational analysis of human plasma N‐glycome and genotypes
INTRODUCTION
1. Glycomics
Glycans are considered to be nature's most abundant and diverse biopolymers [1] and
can be found in free form or attached to proteins and lipids, forming
glycoconjugates.
Investigating the structure, biosynthesis and biological function of glycans is the focus of
glycobiology and has been of interest since the early nineteenth century. The field of
glycobiology has expanded with the aid of technological developments and new discoveries,
and the concept of glycomics emerged by analogy with genomics and
proteomics.
Glycomics is the systematic study of the entire set of glycan structures expressed by specific
cells, tissues or organisms, comprehending the analyses of their genetic, physiologic and
pathologic aspects, in order to gain more knowledge about the factors regulating the
synthesis of glycans and to understand the role and association of glycans in biological
processes.
The spectrum of all glycans and glycoconjugates is called the glycome and is estimated to be
much larger than the proteome itself [2], which underlines its importance and
the need to consider the glycome when studying and analysing biological
systems and processes.
The "omics" approaches of genomics, transcriptomics and proteomics have enabled gene
mapping, elucidated possible relationships between genes and diseases, and explored protein
modifications under different physiological states of the cell or organism. Although these
discoveries have been unquestionably important, the view of human physiology is not
complete and many explanations are still lacking. The new emerging field of glycomics
integrated with the other "omics" fields and combined with major advances in technology
can help to reveal some of the missing links.
2. Glycosylation
Glycans are usually composed of 10 to 15 monosaccharides (or simple sugars) and consist of
a core and an antennary structure (figure 1). While the synthesis of the core structure is
mainly conserved, the antennary structure assembly is often regulated in a tissue‐ or cell
lineage‐specific manner [3] and can vary in length, structure (number of branches) and
composition (types of sugars). The variability of the antennary structure gives rise to a vast
glycome composed of thousands of glycan isomers which increase diversity in the proteome.
Glycans participate in many biological processes including protein folding, molecular
trafficking and clearance, signaling and cell‐cell interactions, among others [4].
Figure 1. Eukaryotic glycan structures showing the core (red) and antennary (green) structures. (Adapted from [5])
Glycosylation is the enzymatic process through which glycans are synthesized and attached
to proteins and lipids. Contrary to protein synthesis, where a single gene codes for a protein,
glycan synthesis is not template driven but encoded within a complex network of
glycosyltransferases, glycosidases, transcription factors, transporters and other proteins [4,6].
Glycosylation is the most complex and abundant post‐translational modification, affecting
the structure of proteins and increasing their functional diversity. Nearly all membrane and
extracellular proteins are glycosylated [7], hence small structural and conformational
modifications of the glycans attached to them can be sufficient to cause loss or impairment
of protein function. Dysregulation of glycosylation has been associated with several diseases
such as cancer, diabetes, cardiovascular, congenital and immunological disorders.
N‐glycosylation is the most common type of glycosylation. It links glycans (named N‐linked
glycans, or N‐glycans for short) to asparagine residues of the protein through rather
extensive and intricate biosynthetic mechanisms. The complete absence of N‐glycosylation
can be lethal to the embryo, demonstrating its importance for the normal growth and
function of the organism [8]. In addition, congenital disorders of glycosylation, a group of rare but
severe inherited metabolic disorders, are characterized by abnormal N‐glycosylation
resulting from inherited mutations that alter the function of enzymes in
the N‐glycosylation synthetic pathway.
Immunoglobulin G (IgG) is the most abundant antibody isotype found in human blood and is
an example of an N‐glycosylated protein. Several studies investigating the structural and
functional properties of the glycans attached to IgG have associated changes in these glycans
with various diseases, such as rheumatoid arthritis and diabetes [9‐11]. It was also shown
that different glycosylation of IgG can lead to opposite pro‐ and anti‐inflammatory responses
[12].
3. Structural analyses and previous studies of N‐glycans
Structural analyses of glycans have always been challenging, not only because the
complexity of glycan structures makes their identification difficult but also because
limitations of the available analytical tools restricted the analysis to a small
number of samples.
High‐performance liquid chromatography (HPLC) is one of the techniques widely used for glycan
analysis and has recently been adapted for high‐throughput glycan quantification [13].
The developed method allowed the first large scale study which evaluated the variability and
heritability of the human plasma N‐glycome [14]. Two main findings should be noted: first,
the variability of glycans at the population level was shown to be larger than expected and
should be taken into account when using glycan levels for diagnostic purposes; second,
different N‐glycan groups are under distinct influence of genetic and environmental control
with variable strength.
Since the changes reported to be associated with disease are usually smaller than the high
glycan variability observed, a follow‐up study was conducted to test the stability of the human plasma
N‐glycan profile of healthy individuals over a period of time [15]. The plasma N‐glycome
exhibited temporal stability, indicating that the genetic background might have a significant
influence on it. Consequently, glycan profiles present themselves as potential diagnostic
markers for diseases as their changes will be determined by environmental factors and/or
altered physiological processes.
Associations of N‐glycan levels with gender, age and lifestyle parameters, such as smoking
and diet, were reported [16]. However, these factors explained only a small fraction of the
glycan variability, supporting the view that glycans are under greater genetic control.
Several individuals exhibiting different deviations from the normal plasma glycan profile
were identified within a population [17]. Groups of individuals with the most similar glycan
profiles to these outliers were determined, by performing consensus scoring of pairwise
distance between vectors containing measured glycan values, and analysed for the presence
of common phenotypic characteristics. Some groups shared clinical conditions, like renal
problems, while others were apparently healthy. The study showed the presence of specific
glyco‐phenotypes indicating a possible association between them and certain
(patho)physiological conditions which could originate from rare mutations and/or
combinations of common mutations. The reported glyco‐phenotypes are the result of a
preliminary analysis that needs additional studies with a larger data set and with genomic
data to validate and corroborate these findings.
These initial studies characterize the N‐glycome in detail and give an overall view of its
behaviour at the population level. However, to demonstrate the practical applicability of
glycan profiles as biomarkers and to unravel the influence of the genetic component on
glycan levels, both disease traits and genetic data need to be integrated and combined in
further studies.
The first attempt to join genomics and glycomics was recently made in the first
genome‐wide association study of the human N‐glycome, which identified some polymorphisms affecting
glycan levels [18].
4. Aim of the thesis
Glycans are involved in human health and disease. The great potential of the human plasma
N‐glycome as a disease biomarker is reinforced by the simplicity of the non‐invasive blood
test. Additionally, the glycan composition of plasma is expected to reflect the physiological
status of the organism, and advances in technology have enabled fast and simple
quantitative analysis of glycans.
The previously reported studies and their results concerning the human plasma N‐glycome
constitute the basis for this thesis research and motivate the analyses proposed to be
performed which are described below in more detail (see Data Analysis and Research
Relevance section).
The aims of the present research are to locate disease‐related N‐glycosylation changes
that could be used as biomarkers and to reveal polymorphisms associated with glycan levels
that would provide more insight into the glycosylation network.
MATERIALS & METHODS
1. Human Samples
This research is based on samples from individuals who took part in larger genetic epidemiology
studies which aim to investigate genetic variability and map genes associated with common
complex disease traits in genetically isolated populations.
The individuals belong to four different population cohorts: the islands of Vis and Korčula in
Croatia [19], the Orkney archipelago in Scotland [20] and the Karesuando community in
Sweden [21].
The data consists of 975 individuals from Vis island, 957 individuals from Korčula island, 890
individuals from the Orkney islands and 686 individuals from the Swedish community.
2. Glycan analysis and glycan profiles
Plasma samples from the individuals were collected and subjected to a previously reported
procedure for the release and labeling of the plasma N‐glycans [13].
Hydrophilic interaction high performance liquid chromatography (HILIC) was performed on
released glycans before and after desialylation (the process of removing the sialic acids from
the glycans by sialidase digestion). The chromatograms obtained were divided into
chromatographic peaks based on the similarity of the glycan structures present in each peak,
resulting in 16 groups of glycans before desialylation (figure 2) and 13 groups of desialylated
glycans. The amount of glycans present in each peak is expressed as a percentage of the
total integrated chromatogram, calculated as the amount of glycan structures in the
peak divided by the total N‐glycome.
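As a minimal illustration of this normalisation, the Python sketch below converts integrated peak areas into percentages of the total chromatogram. The function name and the example areas are hypothetical and are not taken from the study data.

```python
def peak_percentages(peak_areas):
    """Express each integrated peak area as a percentage of the total chromatogram."""
    total = sum(peak_areas)
    if total <= 0:
        raise ValueError("total integrated area must be positive")
    return [100.0 * area / total for area in peak_areas]


# Hypothetical integrated areas for four chromatographic peaks (arbitrary units).
areas = [12.0, 30.0, 48.0, 10.0]
percentages = peak_percentages(areas)
print(percentages)
```

Because every peak is divided by the same total, the resulting values always sum to 100, so the traits are relative rather than absolute quantities.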
Glycans were also quantified by weak anion exchange high‐pressure liquid chromatography
(WAX‐HPLC) according to the number of attached sialic acids: monosialylated, disialylated,
trisialylated and tetrasialylated.
Additionally, glycan structural features, such as fucosylation, galactosylation, sialylation of
biantennary structures and branching degree, were approximated by adding the glycans
sharing the same structural characteristic from the integrated glycan profiles obtained by
HILIC, HILIC after sialidase treatment, or WAX‐HPLC.
Immunoglobulin G was isolated from plasma and the released IgG glycans
were subjected to ultra performance liquid chromatography [11]. The chromatograms
obtained for the IgG glycans were divided into 24 peaks and additional IgG glycan structural
features were derived.
Each individual will be represented by a plasma N‐glycan profile consisting of 33
chromatographic peaks and 13 derived glycan structural features (in a total of 46 traits) and
an IgG glycan profile comprising 23 chromatographic peaks and 54 derived glycan structural
features (in a total of 77 traits).
Figure 2. N‐Glycan profile chromatogram from human plasma sample divided into 16 groups of
glycans. Individual glycan structures present in each chromatographic peak are shown. (Adapted
from [22]).
3. Phenotype and Genotype Data
Besides glycan profiles, phenotype data is available for the samples of Vis and Korčula and
genotype data is available for Vis, Korčula and Orkney cohorts.
Phenotypes include age, gender, health‐related lifestyle variables (such as diet and smoking
status) and physiological measurements including blood pressure, uric acid, cholesterol,
insulin and blood glucose.
Genotype data consists of approximately 258,000 SNPs, of which about 900 are related to
glycosylation.
DATA ANALYSIS AND RESEARCH RELEVANCE
The exploratory analysis of the three data sets (glycan profiles, phenotypes and genotypes)
is performed in the R programming environment using specialized packages. R is a free
software environment for statistical computing and graphics [23].
Several machine learning, data mining and statistical methods, as well as different graphical
representation approaches, are applied to analyse and model the data and to derive
relevant biological information.
1. Clustering
Cluster analysis aims to divide a set of data points into groups (called clusters) by assigning
similar data points to the same cluster. One of the most widely used clustering algorithms is
K‐means clustering, which partitions the data points into k clusters such that the sum of
squared distances from the data points to their assigned cluster centers is minimized. One drawback of
this algorithm is that the number of clusters k must be specified in advance and an inappropriate
choice might lead to poor or misleading results. It should also be noted that K‐means might
produce slightly different final clusters for the same data set, as a result of the random
assignment of the initial cluster centers and the data structure. When performing K‐means it
is therefore necessary to determine the optimal number of clusters for a given data set and to assess
the stability of the clusters obtained.
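The K‐means procedure described above can be sketched in a few lines of Python. This is a plain Lloyd‐style implementation written for illustration only: the analyses in this research are performed in R, and the function names, the toy data and the seed below are assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Return (labels, centers, within-cluster sum of squares)."""
    rng = random.Random(seed)
    # Random initialisation: pick k distinct data points as starting centers.
    centers = rng.sample(points, k)
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wss


# Toy data: two well-separated groups of three points each.
points = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
          [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
labels, centers, wss = kmeans(points, k=2, seed=1)
print(labels, round(wss, 3))
```

Running the function with different seeds or different values of k and comparing the within‐cluster sum of squares is one simple way to see the sensitivity to initialisation and to the choice of k.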
Consensus Clustering is one approach that has been proposed to assess the stability of
clusters and find reliable clusters [24]. The consensus clustering method consists of
repeating the K‐means algorithm on different subsets of the data and constructing a
consensus matrix, which can be viewed as a summary of the individual results of all K‐means
repetitions. Each element of the consensus matrix is defined as the ratio between the
number of times two samples clustered together and the number of times both samples were
selected for the same clustering run. In addition to the consensus matrix, other measures to assess
and validate the robustness of the clusters can be computed.
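The construction of the consensus matrix can be sketched as follows. This is a simplified illustration under stated assumptions: the subsampling fraction, the number of runs, the helper names and the toy data are hypothetical, and the K‐means helper is a bare‐bones version rather than the R routines used in this research.

```python
import random


def kmeans_labels(points, k, rng, max_iter=50):
    """Plain Lloyd-style K-means; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(points, k, n_runs=50, fraction=0.8, seed=0):
    """Fraction of runs in which two samples co-cluster, among runs containing both."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]  # times i and j shared a cluster
    sampled = [[0] * n for _ in range(n)]   # times i and j were both selected
    size = max(k, int(round(fraction * n)))
    for _ in range(n_runs):
        subset = rng.sample(range(n), size)
        labels = kmeans_labels([points[i] for i in subset], k, rng)
        for a, i in enumerate(subset):
            for b, j in enumerate(subset):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]


# Toy data: two well-separated groups of three points each.
points = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
          [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
matrix = consensus_matrix(points, k=2)
for row in matrix:
    print([round(value, 2) for value in row])
```

Entries close to 1 or 0 indicate pairs that are consistently grouped together or apart, whereas intermediate values point to unstable assignments for the chosen k.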
Consensus clustering was applied to the glycan profiles of the four population cohorts,
varying the number of clusters defined for the K‐means algorithm. The most
reliable and stable cluster division for each population is identified by comparing the
consensus matrices and the stability measures obtained for all the different cluster divisions.
The optimal number of clusters differs between populations.
The establishment of stable clusters shows an internal structure of the populations and
allows further analyses of the glycan and phenotype signatures of each cluster. The
comparison of these signatures can be useful in the investigation of the association between
glycans and phenotypes and can bring new insights into it.
2. Discriminant Analysis of Principal Components
Principal Component Analysis (PCA) and Discriminant Analysis (DA) are statistical techniques
used to search for patterns in data, as well as to characterize and/or separate the data. For
high‐dimensional data sets, where a direct graphical representation of the data is not
feasible, these methods are also a means of reducing the dimensionality of the
variables describing the data without much loss of information. In both PCA and DA, the
initial set of (possibly correlated) variables is converted into a set of uncorrelated
variables. In other words, the data is transformed into a new coordinate system where linear
combinations of variables explain most of the data variability.
When investigating the genetic structure of biological populations, PCA might not be the
most appropriate method and may yield poor results because it searches for the
directions of largest overall variability in the data, ignoring possible divergence
between groups. DA can be a better choice because it tries to maximize the between‐group
separation while minimizing the within‐group variation. However, the performance of DA in
the analysis of genetic data is compromised by the correlation existing between SNPs and by
the much larger number of SNPs compared to the number of samples.
A new multivariate method called Discriminant Analysis of Principal Components (DAPC) has
been developed to overcome these limitations and to allow DA to be applied to
genetic data [25]. DAPC combines the two methods described above: as a first step, PCA is used for
dimensionality reduction and removal of correlation between variables, after which
DA is applied to the retained principal components. Moreover, DAPC computes the contribution of the variables
to the obtained population structures.
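The two steps of DAPC can be illustrated with the following sketch, which assumes NumPy is available. It is not the implementation described in [25]: the function name, the simulated data and the choice of three retained components are hypothetical, and the variable contributions are computed here as normalised squared loadings, which is one common convention.

```python
import numpy as np


def dapc_sketch(X, groups, n_pcs):
    """PCA for decorrelation, then linear discriminant analysis on the PC scores."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    # Step 1: PCA via singular value decomposition of the centered data.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pcs = Xc @ Vt[:n_pcs].T  # uncorrelated principal component scores
    # Step 2: discriminant analysis on the retained components.
    overall = pcs.mean(axis=0)
    Sw = np.zeros((n_pcs, n_pcs))  # within-group scatter
    Sb = np.zeros((n_pcs, n_pcs))  # between-group scatter
    unique_groups = np.unique(groups)
    for g in unique_groups:
        Z = pcs[groups == g]
        mean_g = Z.mean(axis=0)
        Sw += (Z - mean_g).T @ (Z - mean_g)
        d = (mean_g - overall)[:, None]
        Sb += len(Z) * (d @ d.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1]
    n_da = min(n_pcs, len(unique_groups) - 1)
    W = eigvecs.real[:, order[:n_da]]
    scores = pcs @ W               # coordinates on the discriminant functions
    loadings = Vt[:n_pcs].T @ W    # discriminant functions in original variables
    return scores, loadings


# Simulated data: 2 groups of 20 samples, 5 variables, group 1 shifted in variable 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
labels = np.repeat([0, 1], 20)
X[labels == 1, 0] += 4.0

scores, loadings = dapc_sketch(X, labels, n_pcs=3)
# Relative contribution of each original variable to each discriminant function.
contrib = loadings ** 2 / (loadings ** 2).sum(axis=0)
print(contrib.round(3))
```

Because the discriminant step works on a small number of uncorrelated components, the within‐group scatter matrix stays invertible even when the original variables are numerous or correlated.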
DAPC and PCA were computed on the glycan data for the individual populations and for the four populations
merged. In the case of the individual populations, PCA shows no underlying
internal structure, in contrast to the results obtained with
clustering and DAPC. In the case of the four populations merged, DAPC performs better than
PCA in discriminating the four populations, with only Vis and Korčula not being clearly
separated from each other, which is to be expected as these two populations are
geographically closer. From this analysis, the glycans most important for separating the
populations were obtained and identified as those showing the largest differences across the
populations.
Applying DAPC to the genetic data in parallel with the glycan data may reveal the
contribution of the genetic component to the structure of the populations and the influence
of genotypes on certain glyco‐phenotypes.
3. SNP selection
Screening for relevant biomarkers is important for the development of better strategies
regarding detection, treatment and prevention of diseases.
Genome‐wide association studies (GWAS) have been used to identify genetic variants
underlying human diseases and traits. In brief, these association studies compare the
frequency of alleles or genotypes of a particular variant or SNP between disease cases and
controls [26]. Establishing associations for polygenic traits is not as straightforward as for
single‐gene disorders, owing to the complex interplay between the several genes involved
in producing a specific phenotype.
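The comparison of allele frequencies between cases and controls for a single SNP can be illustrated with a chi‐square test on the 2×2 table of allele counts. The sketch below is a generic textbook version under stated assumptions: the function name and the genotype counts are hypothetical, and real GWAS pipelines add quality control, covariates and multiple‐testing correction.

```python
import math


def allelic_chi_square(case_genotypes, control_genotypes):
    """Chi-square test (1 degree of freedom) on allele counts of one biallelic SNP.

    Each argument is a tuple of genotype counts (n_AA, n_Aa, n_aa).
    """
    def allele_counts(genotypes):
        n_AA, n_Aa, n_aa = genotypes
        return 2 * n_AA + n_Aa, 2 * n_aa + n_Aa  # counts of allele A and allele a

    table = [allele_counts(case_genotypes), allele_counts(control_genotypes)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts for 100 cases and 100 controls.
chi2, p_value = allelic_chi_square((30, 50, 20), (20, 50, 30))
print(round(chi2, 3), round(p_value, 4))
```

Each SNP is tested on its own in this scheme, which is why correlations between neighbouring SNPs do not enter the test statistic.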
GWAS has identified hundreds of common genetic variants associated with complex
diseases, such as cancer, diabetes and cardiovascular and neurological diseases.
Nevertheless, most of these variants explain only a small proportion of the genetic
contribution to the disease variation and cannot be taken with full reliability as risk factors
for disease [27]. Another drawback of GWAS is that each SNP is analysed individually for
association with the phenotype, ignoring any correlations among SNPs.
Recently, multivariate methods that account for the dependencies between SNPs have been
developed and applied to the challenging task of SNP selection [28‐29]. These approaches
can effectively be used in high‐dimensional data sets with comparable or better
performance than GWAS.
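One of the cited approaches [29] ranks variables by CAR scores, that is, the correlations between the response and the decorrelated predictors. A minimal sketch of that idea, assuming NumPy is available, is shown below. It uses the plain empirical correlation matrix, which is only usable when there are more samples than SNPs; with hundreds of thousands of SNPs a regularised correlation estimate would be needed. The function name and the simulated genotypes are hypothetical.

```python
import numpy as np


def car_scores(X, y):
    """Correlation between y and the decorrelated, standardized predictors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    n = len(ys)
    R = Xs.T @ Xs / n      # predictor correlation matrix
    r_xy = Xs.T @ ys / n   # marginal correlations with the response
    # Inverse matrix square root of R via its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(R)
    R_inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return R_inv_sqrt @ r_xy


# Simulated genotypes (0/1/2 minor-allele counts) for 200 individuals and 4 SNPs.
rng = np.random.default_rng(1)
n = 200
snp0 = rng.binomial(2, 0.4, n)
# SNP 1 mostly copies SNP 0, mimicking correlation between nearby SNPs.
snp1 = np.where(rng.random(n) < 0.8, snp0, rng.binomial(2, 0.4, n))
snp2 = rng.binomial(2, 0.3, n)
snp3 = rng.binomial(2, 0.5, n)
genotypes = np.column_stack([snp0, snp1, snp2, snp3])
trait = 1.0 * snp0 + rng.normal(size=n)  # only SNP 0 affects the simulated trait

scores = car_scores(genotypes, trait)
print((scores ** 2).round(3))
```

With empirical correlations the squared scores add up to the proportion of trait variance explained by a linear model on all SNPs, so each squared score can be read as the share attributed to one SNP after accounting for its correlation with the others.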
The first comprehensive analysis of common genetic polymorphisms that affect protein
glycosylation was recently performed by combining high‐throughput glycan analysis with
GWAS and reported a set of polymorphisms affecting the plasma N‐glycan levels [18]. This
study puts together genomics and glycomics in an effort to map and unravel the complex
network of proteins involved in the genetic regulation of the human glycome.
By incorporating the SNP‐SNP correlation, the above‐mentioned multivariate methods
could identify novel associations between SNPs and protein
glycosylation and, in this manner, complement and/or improve the results of this first
genome‐wide association study of the human plasma N‐glycome.
ETHICAL GUIDELINES
The data analysed in this research was used in previous studies which were approved by the
appropriate ethical committees.
REFERENCES
[1] Ohtsubo, K. and Marth, J. D. (2006) Glycosylation in cellular mechanisms of health and disease. Cell. 126: 855‐867
[2] Lee, R. T., Lauc, G., et al. (2005) Glycoproteomics: protein modifications for versatile functions. Meeting on glycoproteomics. EMBO Rep. 6: 1018‐1022
[3] Varki, A. Essentials of glycobiology (2nd). Cold Spring Harbor Laboratory Press (Cold Spring Harbor, N.Y.): 2009. Available from: http://www.ncbi.nlm.nih.gov/books/NBK1908/
[4] Freeze, H. H. (2006) Genetic defects in the human glycome. Nat Rev Genet. 7: 537‐551
[5] Balzarini, J. (2007) Targeting the glycans of glycoproteins: a novel paradigm for antiviral therapy. Nat Rev Microbiol. 5: 583‐597
[6] Lauc, G., Rudan, I., et al. (2010) Complex genetic regulation of protein glycosylation. Mol Biosyst. 6: 329‐335
[7] Apweiler, R., Hermjakob, H., et al. (1999) On the frequency of protein glycosylation, as deduced from analysis of the SWISS‐PROT database. Biochim Biophys Acta. 1473: 4‐8
[8] Marek, K. W., Vijay, I. K., et al. (1999) A recessive deletion in the GlcNAc‐1‐phosphotransferase gene results in peri‐implantation embryonic lethality. Glycobiology. 9: 1263‐1271
[9] Hounsell, E. F. and Davies, M. J. (1993) Role of protein glycosylation in immune regulation. Ann Rheum Dis. 52 Suppl 1: S22‐29
[10] Field, M. C., Amatayakul‐Chantler, S., et al. (1994) Structural analysis of the N‐glycans from human immunoglobulin A1: comparison of normal human serum immunoglobulin A1 with that isolated from patients with rheumatoid arthritis. Biochem J. 299 ( Pt 1): 261‐275
[11] Pucic, M., Knezevic, A., et al. (2011) High throughput isolation and glycosylation analysis of IgG‐variability and heritability of the IgG glycome in three isolated human populations. Mol Cell Proteomics. 10: M111 010090
[12] Huhn, C., Selman, M. H., et al. (2009) IgG glycosylation analysis. Proteomics. 9: 882‐913
[13] Royle, L., Campbell, M. P., et al. (2008) HPLC‐based analysis of serum N‐glycans on a 96‐well plate platform with dedicated database software. Anal Biochem. 376: 1‐12
[14] Knezevic, A., Polasek, O., et al. (2009) Variability, heritability and environmental determinants of human plasma N‐glycome. J Proteome Res. 8: 694‐701
[15] Gornik, O., Wagner, J., et al. (2009) Stability of N‐glycan profiles in human plasma. Glycobiology. 19: 1547‐1553
[16] Knezevic, A., Gornik, O., et al. (2010) Effects of aging, body mass index, plasma lipid profiles, and smoking on human plasma N‐glycans. Glycobiology. 20: 959‐969
[17] Pucic, M., Pinto, S., et al. (2010) Common aberrations from the normal human plasma N‐glycan profile. Glycobiology. 20: 970‐975
[18] Lauc, G., Essafi, A., et al. (2010) Genomics meets glycomics‐the first GWAS study of human N‐Glycome identifies HNF1alpha as a master regulator of plasma protein fucosylation. PLoS Genet. 6: e1001256
[19] Rudan, I., Biloglav, Z., et al. (2006) Effects of inbreeding, endogamy, genetic admixture, and outbreeding on human health: a (1001 Dalmatians) study. Croat Med J. 47: 601‐610
[20] McQuillan, R., Leutenegger, A. L., et al. (2008) Runs of homozygosity in European populations. Am J Hum Genet. 83: 359‐372
[21] Igl, W., Johansson, A., et al. (2010) The Northern Swedish Population Health Study (NSPHS)‐‐a paradigmatic study in a rural population combining community health and basic research. Rural Remote Health. 10: 1363
[22] Lauc, G. and Zoldos, V. (2010) Protein glycosylation‐‐an evolutionary crossroad between genes and environment. Mol Biosyst. 6: 2373‐2379
[23] R Core Team (2012) R: A Language and Environment for Statistical Computing. Available from: http://www.R‐project.org
[24] Monti, S., Tamayo, P., et al. (2003) Consensus clustering: A resampling‐based method for class discovery and visualization of gene expression microarray data. Mach Learn. 52: 91‐118
[25] Jombart, T., Devillard, S., et al. (2010) Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. Bmc Genet. 11:
[26] Hirschhorn, J. N. and Daly, M. J. (2005) Genome‐wide association studies for common diseases and complex traits. Nat Rev Genet. 6: 95‐108
[27] Queitsch, C., Carlson, K. D., et al. (2012) Lessons from model organisms: phenotypic robustness and missing heritability in complex disease. PLoS Genet. 8: e1003041
[28] Zuber, V. and Strimmer, K. (2009) Gene ranking and biomarker discovery under correlation. Bioinformatics. 25: 2700‐2707
[29] Zuber, V. and Strimmer, K. (2011) High‐Dimensional Regression and Variable Selection Using CAR Scores. Stat Appl Genet Mol. 10:
UNIVERSITY OF ZAGREB FACULTY OF SCIENCE
DIVISION OF BIOLOGY Rooseveltov trg 6 10000 Zagreb
tel +385 1 4877700 fax +385 1 4826260 [email protected] http://www.biol.pmf.hr
Professor Kristian Vlahoviček, PhD Head, Bioinformatics Group e-mail: [email protected]
Zagreb, December 19th, 2012.
Subject: Ana Sofia Pedrosa Pinto, justification of the PhD thesis topic
Sofia Pinto has been a PhD student in my lab since 2007 and has been enrolled in the PhD program since 2008. She is actively pursuing several research directions, of which we propose for her dissertation topic the computational analysis of glycosylation patterns in human plasma and plasma-associated proteins and their association with population genotypes.
Sofia has so far published two research papers, in which she demonstrated mastery of the skills for analyzing and visualizing high-throughput biological data and outlined her first results in the analysis of human plasma glycosylation profiles, further demonstrating her command of advanced statistics and machine learning methods. She has also demonstrated the ability to independently plan and execute complex computational analyses of large sets of biological data.
I confidently state that the proposed topic of Sofia’s doctoral dissertation “Computational analysis of human plasma N-glycome and genotypes” is fully compliant with all the requirements for her doctoral degree.
Professor Kristian Vlahoviček, PhD, EMBO YIP IG Head, Bioinformatics Group Faculty of Science, University of Zagreb, Croatia