
Ana Sofia Pedrosa Pinto

[email protected]

Bioinformatics Group Department of Molecular Biology

Faculty of Science, University of Zagreb Horvatovac 102a, 10 000 Zagreb

December 20, 2012

Study Committee of the Biophysics Doctoral Programme Faculty of Science, University of Split

REQUEST FOR APPROVAL OF THE DISSERTATION TOPIC

Title of the proposed topic:

Computational analysis of human plasma N‐glycome and genotypes

Mentor:

Prof. Kristian Vlahoviček

Full Professor of Bioinformatics and Group Leader of the Bioinformatics Group in the Faculty of Science, University of Zagreb.

Proposed members for the committee:

Prof. Kristian Vlahoviček

Prof. Igor Weber

Prof. Andreja Ambriović‐Ristov

Requirements for the dissertation topic defense:

1. Attended and successfully passed all the courses enrolled in the Biophysics Doctoral Programme.

2. Presented the qualification seminar and was approved by the evaluation committee.

3. Published two papers in high impact journals (see Curriculum Vitae below).

Enclosed documents:

1. Overview and description of the research: AnaSofiaPedrosaPinto_ResearchOverview.pdf

2. Mentor's candidate evaluation about the research: AnaSofiaPedrosaPinto_MentorEvaluation.pdf

Curriculum Vitae of the Doctoral Candidate:

Education

October 2008 – present Interdisciplinary PhD Program in Biophysics, University of Split, Croatia

October 2005 – July 2006 Socrates/Erasmus Program, University of Bologna, Italy

September 2002 – September 2007 Graduated in Biomedical Engineering, Faculty of Sciences and Technology, University of Coimbra, Portugal. Thesis title: Home Monitoring for Obstructive Sleep Apnea Diagnosis in Children

Work experience

October 2009 – December 2011 Teaching assistant of Algorithms and Programming course (Computational Biology module), University of Zagreb, Croatia

October 2007 – September 2008 Research project entitled Computational Analysis of Exonic Splicing Regulators, Bioinformatics Group, University of Zagreb, Croatia

Publications

Pinto, S., Vlahoviček, K. and Buratti, E. (2011) PRO‐MINE: A Bioinformatics Repository and Analytical Tool for TARDBP Mutations. Human Mutation, 32: E1948–E1958. doi: 10.1002/humu.21393

Pučić, M., Pinto, S., Novokmet, M., Knežević, A., Gornik, O., Polašek, O., Vlahoviček, K., Wei, W., Rudd, P. M., Wright, A. F., Campbell, H., Rudan, I., and Lauc, G. (2010) Common aberrations from the normal human plasma N‐glycan profile. Glycobiology, 20: 970‐975.

Participation in Conferences and Workshops

Participated in the EMBO YIP PhD Course, Heidelberg, Germany (2011) [oral and poster presentation].

Attended the 10th Congress of the Croatian Society of Biochemistry and Molecular Biology, Opatija, Croatia (2010) [poster presentation].

Participated in the FEBS workshop: Education in biochemistry and molecular biology, Opatija, Croatia (2010).

Participated in the workshop Glycomics meets Genomics: novel strategies in combining omics approaches, Dubrovnik, Croatia (2010).

Attended the 3rd Adriatic Meeting on Computational Solutions in the Life Sciences, Primosten, Croatia (2009) [poster presentation].

Attended the EMBO Young Scientists Forum, Zagreb, Croatia (2009).

Attended the Scientific Symposium: 50 Years of Molecular Biology in Croatia, Zagreb, Croatia (2008) [oral presentation].

Participated in the ENSAPI08 Course: Ensembl API access, Programming workshop, Oeiras, Portugal (2008).

RESEARCH OVERVIEW

CANDIDATE: Ana Sofia Pedrosa Pinto

STUDY PROGRAMME: Biophysics Doctoral Programme

Faculty of Science, University of Split

RESEARCH INSTITUTION: Bioinformatics Group, Department of Molecular Biology

Faculty of Science, University of Zagreb

CONTENTS

TITLE

INTRODUCTION

1. Glycomics

2. Glycosylation

3. Structural analyses and previous studies of N‐glycans

4. Aim of the thesis

MATERIALS & METHODS

1. Human Samples

2. Glycan analysis and glycan profiles

3. Phenotype and Genotype Data

DATA ANALYSIS AND RESEARCH RELEVANCE

1. Clustering

2. Discriminant Analysis of Principal Components

3. SNP selection

ETHICAL GUIDELINES

REFERENCES

TITLE

Computational analysis of human plasma N‐glycome and genotypes

INTRODUCTION

1. Glycomics

Glycans are considered to be the most abundant and diverse of nature's biopolymers [1]. They can be found in free form or attached to proteins and lipids, forming glycoconjugates.

Investigating the structure, biosynthesis and biological function of glycans is the focus of glycobiology, a subject of interest since the early nineteenth century. The field of glycobiology has expanded with the aid of technological developments and new discoveries, and the concept of glycomics emerged by analogy with those of genomics and proteomics.

Glycomics is the systematic study of the entire set of glycan structures expressed by specific cells, tissues or organisms. It comprises the analysis of their genetic, physiological and pathological aspects, with the goal of learning more about the factors that regulate glycan synthesis and understanding the roles of glycans in biological processes.

The complete repertoire of glycans and glycoconjugates is called the glycome, and it is estimated to be much larger than the proteome itself [2]. This underlines the importance of the glycome and the need to consider it when studying and analysing biological systems and processes.

The "omics" approaches of genomics, transcriptomics and proteomics have enabled gene mapping, elucidated possible relationships between genes and diseases, and explored protein modifications under different physiological states of the cell or organism. Although these discoveries have been unquestionably important, our view of human physiology is not complete and many explanations are still lacking. The emerging field of glycomics, integrated with the other "omics" fields and combined with major advances in technology, can help to reveal some of the missing links.

2. Glycosylation

Glycans are usually composed of 10 to 15 monosaccharides (simple sugars) and consist of a core and an antennary structure (figure 1). The synthesis of the core structure is largely conserved. By contrast, the assembly of the antennary structure is often regulated in a tissue‐ or cell lineage‐specific manner [3] and can vary in length, structure (number of branches) and composition (types of sugars). This variability of the antennary structure gives rise to a vast glycome of thousands of glycan isomers, which increases the diversity of the proteome.

Glycans participate in many biological processes, including protein folding, molecular trafficking and clearance, signaling and cell‐cell interactions, among others [4].

Figure 1. Eukaryotic glycan structures showing the core (red) and antennary (green) structures. (Adapted from [5])

Glycosylation is the enzymatic process through which glycans are synthesized and attached to proteins and lipids. In protein synthesis, a gene provides the template for a protein. Glycan synthesis, by contrast, is not template‐driven; it is encoded within a complex network of glycosyltransferases, glycosidases, transcription factors, transporters and other proteins [4,6].

Glycosylation is the most complex and abundant post‐translational modification. It affects the structure of proteins and increases their functional diversity. Nearly all membrane and extracellular proteins are glycosylated [7], and small structural and conformational modifications of the attached glycans can be sufficient to cause loss or impairment of protein function. Dysregulation of glycosylation has been associated with several diseases, such as cancer, diabetes, and cardiovascular, congenital and immunological disorders.

N‐glycosylation is the most common type of glycosylation. It links glycans (called N‐linked glycans, or N‐glycans for short) to an asparagine residue of the protein through extensive and intricate biosynthetic mechanisms. The complete absence of N‐glycosylation can be lethal to the embryo, demonstrating its importance for the normal growth and function of the organism [8]. Congenital disorders of glycosylation are a group of rare but severe inherited metabolic disorders. They are likewise characterized by abnormal N‐glycosylation, resulting from inherited mutations that alter the function of enzymes in the N‐glycosylation pathway.

Immunoglobulin G (IgG), the most abundant antibody isotype in human blood, is an example of an N‐glycosylated protein. Several studies have investigated the structural and functional properties of the glycans attached to IgG. These studies associated changes in the glycans with various diseases, such as rheumatoid arthritis and diabetes [9‐11]. It was also shown that different glycosylation of IgG can lead to opposite (pro‐ or anti‐inflammatory) responses [12].

3. Structural analyses and previous studies of N‐glycans

Structural analyses of glycans have always been challenging for two reasons. First, the complexity of glycan structures makes their identification difficult. Second, the limitations of the analytical tools restricted analyses to a small number of samples.

High‐performance liquid chromatography (HPLC) is one of the techniques widely used for glycan analysis, and it has recently been adapted for high‐throughput glycan quantification [13].

This method enabled the first large‐scale study of the variability and heritability of the human plasma N‐glycome [14]. Two main findings should be noted. First, the variability of glycans at the population level was larger than expected, and it should be taken into account when glycan levels are used for diagnostic purposes. Second, different N‐glycan groups are under genetic and environmental control of distinct and variable strength.

Changes reported to be associated with disease are usually smaller than this high population variability. A follow‐up study therefore tested the stability of the plasma N‐glycan profile of healthy individuals over time [15]. The plasma N‐glycome exhibited temporal stability, indicating that the genetic background might have a significant influence on it. Consequently, glycan profiles are potential diagnostic markers for disease: because an individual's profile is stable, changes in it are likely to reflect environmental factors and/or altered physiological processes.

Associations of N‐glycan levels with gender, age and lifestyle parameters, such as smoking and diet, have been reported [16]. However, these factors explained only a small fraction of the glycan variability, supporting the view that glycan levels are to a large extent under genetic control.

Several individuals exhibiting different deviations from the normal plasma glycan profile were identified within a population [17]. For each of these outliers, the group of individuals with the most similar glycan profiles was determined by consensus scoring of pairwise distances between vectors of measured glycan values. These groups were then analysed for shared phenotypic characteristics. Some groups shared clinical conditions, such as renal problems, while others were apparently healthy. The study revealed specific glyco‐phenotypes, suggesting a possible association between them and certain (patho)physiological conditions. These conditions could originate from rare mutations and/or combinations of common mutations. The reported glyco‐phenotypes are the result of a preliminary analysis. Additional studies with a larger data set and with genomic data are needed to validate and corroborate these findings.

These initial studies characterize the N‐glycome in detail and give an overall view of its behaviour at the population level. However, two questions remain: the practical applicability of glycan profiles as biomarkers, and the influence of the genetic component on glycan levels. Answering them requires further studies that integrate both disease traits and genetic data.

Genomics and glycomics were recently joined for the first time in a genome‐wide association study of the human N‐glycome, which identified several polymorphisms affecting glycan levels [18].

4. Aim of the thesis

Glycans are involved in human health and disease. The potential of the human plasma N‐glycome as a disease biomarker is reinforced by the simplicity of obtaining it from a blood sample. Additionally, the glycan composition of plasma is expected to reflect the physiological status of the organism, and advances in technology have enabled fast and simple quantitative analysis of glycans.

The previously reported studies of the human plasma N‐glycome, and their results, form the basis of this thesis research. They also motivate the proposed analyses, which are described in more detail below (see the Data Analysis and Research Relevance section).

The present research has two aims. The first is to locate disease‐related N‐glycosylation changes that could be used as biomarkers. The second is to reveal polymorphisms associated with glycan levels that would provide more insight into the glycosylation network.

MATERIALS & METHODS

1. Human Samples

This research is based on samples from individuals who took part in larger genetic epidemiology studies. These studies aim to investigate genetic variability and to map genes associated with common complex disease traits in genetically isolated populations.

The individuals belong to four different population cohorts: the islands of Vis and Korčula in Croatia [19], the Orkney archipelago in Scotland [20] and the Karesuando community in Sweden [21].

The data consists of 975 individuals from the island of Vis, 957 from the island of Korčula, 890 from the Orkney islands and 686 from the Karesuando community in Sweden (3,508 individuals in total).

2. Glycan analysis and glycan profiles

Plasma samples from the individuals were collected and subjected to a previously reported procedure for the release and labeling of the plasma N‐glycans [13].

Hydrophilic interaction high‐performance liquid chromatography (HILIC) was performed on the released glycans before and after desialylation, i.e. the removal of sialic acids from the glycans by sialidase digestion. The resulting chromatograms were divided into chromatographic peaks based on the similarity of the glycan structures present in each peak. This yielded 16 groups of glycans before desialylation (figure 2) and 13 groups of desialylated glycans. The amount of glycans in each peak is expressed as a percentage of the total integrated chromatogram. It is calculated as the amount of glycan structures in the peak divided by the total N‐glycome, multiplied by 100.
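The sketch below illustrates this normalisation by converting integrated peak areas into percentages of the total chromatogram. It is written in Python purely for illustration; the analyses in this research are done in R. The four peak areas are invented toy values, not measurements from the cohorts.

```python
def relative_abundances(peak_areas):
    """Return each peak's area as a percentage of the total integrated area."""
    total = sum(peak_areas)
    if total <= 0:
        raise ValueError("total integrated area must be positive")
    return [100.0 * area / total for area in peak_areas]


# Toy integrated areas for four peaks (invented values, not study data).
areas = [120.0, 30.0, 45.0, 5.0]
percentages = relative_abundances(areas)
print([round(p, 1) for p in percentages])  # [60.0, 15.0, 22.5, 2.5]
```

Because every peak is divided by the same total, the resulting traits are compositional: they always sum to 100%.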

Glycans were also quantified by weak anion exchange high‐pressure liquid chromatography (WAX‐HPLC) according to the number of attached sialic acids: monosialylated, disialylated, trisialylated and tetrasialylated.

Additionally, several glycan structural features were approximated: fucosylation, galactosylation, sialylation of biantennary structures and degree of branching. Each feature was approximated by summing the glycans that share that structural characteristic. The sums used either the HILIC profiles integrated after sialidase treatment or the WAX‐HPLC profiles.
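Derived traits of this kind can be pictured as sums over groups of normalised peaks. The sketch below assumes hypothetical peak labels (`DG1` to `DG4`). Which peaks carry each feature is also assumed here; the real assignments follow from the structural annotation of each chromatographic peak.

```python
def derived_trait(profile, peaks_with_feature):
    """Approximate a structural feature as the summed percentage of all
    peaks whose glycans carry that feature."""
    return sum(profile[peak] for peak in peaks_with_feature)


# Hypothetical desialylated profile: peak label -> % of total chromatogram.
profile = {"DG1": 10.0, "DG2": 25.0, "DG3": 40.0, "DG4": 25.0}

# Hypothetical peak-to-feature assignments (illustrative only).
features = {
    "fucosylated": ["DG2", "DG4"],
    "highly_branched": ["DG3", "DG4"],
}
traits = {name: derived_trait(profile, peaks) for name, peaks in features.items()}
print(traits)  # {'fucosylated': 50.0, 'highly_branched': 65.0}
```

A peak can contribute to several features, as `DG4` does here, because one glycan structure may be both fucosylated and highly branched.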

IgG was isolated from plasma, and the released IgG glycans were analysed by ultra performance liquid chromatography (UPLC) [11]. The chromatograms obtained for the IgG glycans were divided into 24 peaks, and additional IgG glycan structural features were derived.

Each individual will be represented by two profiles. The plasma N‐glycan profile consists of 33 chromatographic peaks and 13 derived glycan structural features (46 traits in total). The IgG glycan profile comprises 23 chromatographic peaks and 54 derived glycan structural features (77 traits in total).

Figure 2. N‐Glycan profile chromatogram from a human plasma sample divided into 16 groups of glycans. Individual glycan structures present in each chromatographic peak are shown. (Adapted from [22]).

3. Phenotype and Genotype Data

Besides glycan profiles, phenotype data is available for the Vis and Korčula samples, and genotype data is available for the Vis, Korčula and Orkney cohorts.

Phenotypes include age, gender, health‐related lifestyle variables (such as diet and smoking status) and physiological measurements, including blood pressure, uric acid, cholesterol, insulin and blood glucose.

Genotype data consists of approximately 258,000 SNPs, of which about 900 are related to glycosylation.

DATA ANALYSIS AND RESEARCH RELEVANCE

The exploratory analysis of the three data sets (glycan profiles, phenotypes and genotypes) is performed in the R programming environment using specialized packages. R is a free software environment for statistical computing and graphics [23].

Several machine learning, data mining and statistical methods, as well as different graphical representation approaches, are applied to analyse and model the data and to derive relevant biological information.

1. Clustering

Cluster analysis aims to divide a set of data points into groups (called clusters) by assigning similar data points to the same cluster. One of the most widely used clustering algorithms is K‐means clustering. It partitions the data points into k clusters such that the sum of squared distances from the data points to their assigned cluster centers is minimized. One drawback of this algorithm is that the number of clusters k must be specified in advance, and an inappropriate choice can lead to poor or wrong results. Also, K‐means may produce slightly different final clusters when run repeatedly on the same data set. This happens because the initial cluster centers are assigned at random, and, depending on the data structure, the algorithm may converge to different local optima. When performing K‐means it is therefore necessary to determine the optimal number of clusters for a given data set and to assess the stability of the clusters obtained.
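A minimal sketch of K‐means follows, written as Lloyd's algorithm in plain Python. It assumes Euclidean distance and random initial centres drawn from the data points. It is an illustration, not the R implementation used in this research. The `seed` argument makes one run reproducible; changing it shows the run‐to‐run variability described above.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=None):
    """Lloyd's algorithm for K-means.

    Alternates an assignment step (each point goes to its nearest centre)
    and an update step (each centre moves to the mean of its points);
    neither step increases the within-cluster sum of squares.
    Returns the labels, the centres and that sum of squares.
    """
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]  # random initial centres
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(coord) / len(members) for coord in zip(*members)]
    wss = sum(squared_distance(p, centres[lab]) for p, lab in zip(points, labels))
    return labels, centres, wss


# Toy two-dimensional "profiles" forming two well-separated groups.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centres, wss = kmeans(data, k=2, seed=1)
print(labels, round(wss, 3))
```

The cluster numbers themselves are arbitrary and may swap between seeds; only the grouping of points is meaningful.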

Consensus clustering is one approach that has been proposed to assess the stability of clusters and find reliable clusters [24]. The method repeats the K‐means algorithm on different subsets of the data. It then constructs a consensus matrix that summarizes the results of all K‐means repetitions. Each element of the consensus matrix is the ratio of two counts: the number of times two samples were clustered together, and the number of times both samples were selected for clustering. In addition to the consensus matrix, other measures to assess and validate the robustness of the clusters can be computed.
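The consensus matrix definition above translates directly into code. In this sketch, each resampling run is represented (by assumption) as a dictionary from sample index to cluster label. A run contains only the samples drawn in that run. The three runs are hypothetical K‐means results on four samples. Only whether two labels are equal within the same run matters, because cluster labels are arbitrary from one run to the next.

```python
def consensus_matrix(n_samples, runs):
    """M[i][j] = (# runs in which i and j were clustered together) /
    (# runs in which both i and j were selected)."""
    together = [[0] * n_samples for _ in range(n_samples)]
    selected = [[0] * n_samples for _ in range(n_samples)]
    for labels in runs:
        for i in labels:
            for j in labels:
                selected[i][j] += 1
                if labels[i] == labels[j]:
                    together[i][j] += 1
    # Pairs never selected together get 0.0 (the ratio is undefined).
    return [
        [together[i][j] / selected[i][j] if selected[i][j] else 0.0
         for j in range(n_samples)]
        for i in range(n_samples)
    ]


# Three hypothetical K-means runs on resampled subsets of four samples.
runs = [
    {0: 0, 1: 0, 2: 1, 3: 1},
    {0: 1, 1: 1, 3: 0},  # sample 2 not drawn in this run
    {0: 0, 2: 0, 3: 1},  # sample 1 not drawn in this run
]
M = consensus_matrix(4, runs)
for row in M:
    print(row)
```

Samples 0 and 1 were clustered together in both runs that drew them (`M[0][1] == 1.0`). Samples 0 and 2 co‐clustered in only one of their two shared runs (`M[0][2] == 0.5`). A stable clustering pushes entries towards 0 or 1.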

Consensus clustering was applied to the glycan profiles of the four population cohorts, varying the number of clusters k used by the K‐means algorithm. The most reliable and stable division of each population is identified by comparing the consensus matrices and stability measures obtained for all the different values of k. The optimal number of clusters differs between populations.

The existence of stable clusters reveals an internal structure within the populations. It also allows further analysis of the glycan and phenotype signatures of each cluster. Comparing these signatures can help investigate the association between glycans and phenotypes and bring new insights into it.

2. Discriminant Analysis of Principal Components

Principal Component Analysis (PCA) and Discriminant Analysis (DA) are statistical techniques used to search for patterns in data and to characterize and/or separate the data. High‐dimensional data sets cannot be represented graphically in a direct way. For such data, these methods also reduce the number of variables describing the data without much loss of information. In both PCA and DA, the initial set of (possibly correlated) variables is converted into a new set of uncorrelated variables, each a linear combination of the original ones. In other words, the data is transformed into a new coordinate system in which a few linear combinations of variables explain most of the data variability.
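To make the PCA idea concrete, the sketch below finds the first principal component by power iteration. The first principal component is the leading eigenvector of the covariance matrix. Power iteration was chosen only to keep the example dependency‐free; R's PCA functions rely on standard matrix decompositions instead. The three data points are a toy example.

```python
import math


def first_principal_component(rows, n_iter=200):
    """First principal component via power iteration on the sample covariance
    matrix. Assumes the largest eigenvalue is strictly larger than the second
    and that the all-ones starting vector is not orthogonal to its eigenvector."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
           for a in range(p)]

    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Rayleigh quotient: variance of the data along the unit vector v.
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centred]
    return v, variance, scores


rows = [(2.0, 1.0), (0.0, 0.0), (-2.0, -1.0)]
v, variance, scores = first_principal_component(rows)
print([round(x, 4) for x in v], round(variance, 4))  # [0.8944, 0.4472] 5.0
```

The `scores` are the coordinates of each point on the new axis. Their sample variance equals the eigenvalue, which is the sense in which the first component "explains" the most variability.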

When investigating the genetic structure of biological populations, PCA may not be the most appropriate method and can yield poor results. It searches for the directions in which the overall variability of the data is largest, ignoring possible differences between groups. DA can be a better choice because it maximizes the between‐group separation while minimizing the within‐group variation. However, two properties of genetic data compromise the performance of DA: the correlation between SNPs, and the much larger number of SNPs compared with the number of samples. With more variables than samples, the within‐group covariance matrix cannot be inverted.
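The contrast between PCA and DA can be illustrated with two toy groups that are spread widely along x but separated along y. The sketch computes Fisher's two‐group discriminant direction, `w = Sw^-1 (m1 - m2)`, where `Sw` is the pooled within‐group scatter matrix and `m1`, `m2` are the group means. It is restricted to two dimensions so that the 2 × 2 inverse can be written out explicitly. The data points are invented.

```python
def mean(rows):
    return [sum(coord) / len(rows) for coord in zip(*rows)]


def scatter(rows, m):
    """2x2 scatter matrix: sum of outer products of deviations from m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - m[0], r[1] - m[1])
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def fisher_direction(group1, group2):
    """Two-group Fisher discriminant direction w = Sw^-1 (m1 - m2) in 2-D.

    Sw must be invertible; it becomes singular when there are more
    variables than samples (one reason DA struggles with many SNPs).
    """
    m1, m2 = mean(group1), mean(group2)
    s1, s2 = scatter(group1, m1), scatter(group2, m2)
    sw = [[s1[a][b] + s2[a][b] for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = (m1[0] - m2[0], m1[1] - m2[1])
    return [inv[a][0] * diff[0] + inv[a][1] * diff[1] for a in range(2)]


# Two toy groups: wide spread along x (large total variance), separated along y.
group1 = [(-4.0, 0.9), (0.0, 1.1), (4.0, 1.0)]
group2 = [(-4.0, -1.0), (0.0, -0.9), (4.0, -1.1)]
w = fisher_direction(group1, group2)
print([round(x, 3) for x in w])
```

Pooled over both groups, these points vary much more along x than along y, so the first principal component would point along x. The Fisher direction instead points along y, the axis that actually separates the groups.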

A multivariate method called Discriminant Analysis of Principal Components (DAPC) was developed to overcome these limitations and allow DA to be applied to genetic data [25]. DAPC combines the two methods described above. First, PCA is used to reduce dimensionality and remove the correlation between variables. DA is then applied to the retained principal components. Moreover, DAPC computes the contribution of each original variable to the population structure obtained.

DAPC and PCA were applied to the glycan data of each population separately and of the four populations merged. For the individual populations, PCA shows no underlying internal structure, in contrast with the results obtained with clustering and DAPC. For the four merged populations, DAPC discriminates the populations better than PCA. Only Vis and Korčula are not clearly separated from each other, which is to be expected since these two populations are geographically close. This analysis also identified the glycans most important for separating the populations, namely those showing the largest differences across them.

Applying DAPC to the genetic data in parallel with the glycan data may reveal two things: the contribution of the genetic component to the structure of the populations, and the influence of genotypes on certain glyco‐phenotypes.

3. SNP selection

Screening for relevant biomarkers is important for developing better strategies for the detection, treatment and prevention of diseases.

Genome‐wide association studies (GWAS) have been used to identify genetic variants underlying human diseases and traits. In brief, these studies compare the frequency of alleles or genotypes of a particular variant (SNP) between disease cases and controls [26]. Establishing associations for polygenic traits is not as straightforward as for single‐gene disorders, owing to the complex interplay between the many genes involved in producing a specific phenotype.
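The case/control comparison described above can be illustrated with a basic allelic test for a single SNP. Allele counts form a 2 × 2 table (cases/controls × minor/major allele). Pearson's chi‐square statistic with one degree of freedom then measures the difference in allele frequencies. This is a textbook sketch with invented genotypes, not the association model used in [18]. Quantitative traits such as glycan levels are usually analysed with regression instead.

```python
import math


def allelic_chi_square(case_genotypes, control_genotypes):
    """Pearson chi-square test (1 df) comparing minor-allele frequency
    between cases and controls for one biallelic SNP.

    Genotypes are coded as the number of minor alleles carried (0, 1 or 2);
    each individual contributes two alleles to the 2x2 allele-count table.
    """
    def allele_counts(genotypes):
        minor = sum(genotypes)
        return [minor, 2 * len(genotypes) - minor]

    table = [allele_counts(case_genotypes), allele_counts(control_genotypes)]
    total = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented genotypes: 50 cases (40 minor alleles) and 50 controls (20).
cases = [2] * 10 + [1] * 20 + [0] * 20
controls = [2] * 5 + [1] * 10 + [0] * 35
chi2, p = allelic_chi_square(cases, controls)
print(round(chi2, 2), round(p, 4))
```

In a genome‐wide scan this test would be repeated for every SNP, each one independently of the others. That per‐SNP design is the limitation discussed in the next paragraph.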

GWAS have identified hundreds of common genetic variants associated with complex diseases, such as cancer, diabetes, and cardiovascular and neurological diseases. Nevertheless, most of these variants explain only a small proportion of the genetic contribution to disease variation and cannot be fully relied upon as risk factors for disease [27]. Another drawback of GWAS is that each SNP is tested individually for association with the phenotype, ignoring any correlations among SNPs.

Recently, multivariate methods that account for the dependencies between SNPs have been developed and applied to the challenging task of SNP selection [28‐29]. These approaches can be used effectively on high‐dimensional data sets, with performance comparable to or better than that of GWAS.
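One example of such a method is CAR scores [29]. They decorrelate the marginal SNP–trait correlations, ω = P^(-1/2) ρ. Here ρ holds the SNP–trait correlations and P is the SNP–SNP correlation matrix, and variables can then be ranked by ω². The sketch below is limited, by assumption, to two SNPs so that P^(-1/2) has a simple closed form. The genotypes (coded 0/1/2) and trait values are invented. A general implementation would compute P^(-1/2) from an eigendecomposition.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def car_scores_two_snps(snp1, snp2, trait):
    """CAR scores omega = P^(-1/2) rho for exactly two SNPs.

    rho holds the marginal SNP-trait correlations and P the 2x2 SNP-SNP
    correlation matrix [[1, r], [r, 1]], whose inverse square root is
    0.5 * [[a + b, a - b], [a - b, a + b]] with a = 1/sqrt(1 + r) and
    b = 1/sqrt(1 - r). Requires |r| < 1 (SNPs not perfectly correlated).
    """
    rho = [pearson(snp1, trait), pearson(snp2, trait)]
    r = pearson(snp1, snp2)
    a, b = 1 / math.sqrt(1 + r), 1 / math.sqrt(1 - r)
    p_inv_sqrt = [[(a + b) / 2, (a - b) / 2], [(a - b) / 2, (a + b) / 2]]
    omega = [p_inv_sqrt[i][0] * rho[0] + p_inv_sqrt[i][1] * rho[1] for i in range(2)]
    return rho, omega


# Invented data: two correlated SNPs (minor-allele counts) and a trait
# that was made to follow snp1.
snp1 = [0, 1, 2, 1, 0, 2, 1, 0]
snp2 = [0, 1, 2, 2, 0, 1, 1, 0]
trait = [1.0, 2.1, 3.2, 2.0, 0.9, 3.1, 2.2, 1.1]
rho, omega = car_scores_two_snps(snp1, snp2, trait)
print("marginal:", [round(x, 3) for x in rho])
print("CAR:     ", [round(x, 3) for x in omega])
```

Both SNPs correlate strongly with the trait on their own because they are correlated with each other. After decorrelation, the CAR score of `snp1` is roughly twice that of `snp2`, so the ranking separates them more clearly than the marginal correlations do.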

The first comprehensive analysis of common genetic polymorphisms affecting protein glycosylation was recently performed by combining high‐throughput glycan analysis with GWAS. It reported a set of polymorphisms affecting plasma N‐glycan levels [18]. This study brings together genomics and glycomics in an effort to map and unravel the complex network of proteins involved in the genetic regulation of the human glycome.

The multivariate methods mentioned above take SNP‐SNP correlations into account. This gives them the potential to identify novel associations between SNPs and protein glycosylation. In this way, they could complement and/or improve the results of this first genome‐wide association study of the human plasma N‐glycome.

ETHICAL GUIDELINES

The data analysed in this research was used in previous studies, which were approved by the appropriate ethics committees.

REFERENCES

[1] Ohtsubo, K. and Marth, J. D. (2006) Glycosylation in cellular mechanisms of health and disease. Cell. 126: 855‐867

[2] Lee, R. T., Lauc, G., et al. (2005) Glycoproteomics: protein modifications for versatile functions. Meeting on glycoproteomics. EMBO Rep. 6: 1018‐1022

[3] Varki, A., et al. (2009) Essentials of Glycobiology (2nd ed.). Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY. Available from: http://www.ncbi.nlm.nih.gov/books/NBK1908/

[4] Freeze, H. H. (2006) Genetic defects in the human glycome. Nat Rev Genet. 7: 537‐551

[5] Balzarini, J. (2007) Targeting the glycans of glycoproteins: a novel paradigm for antiviral therapy. Nat Rev Microbiol. 5: 583‐597

[6] Lauc, G., Rudan, I., et al. (2010) Complex genetic regulation of protein glycosylation. Mol Biosyst. 6: 329‐335

[7] Apweiler, R., Hermjakob, H., et al. (1999) On the frequency of protein glycosylation, as deduced from analysis of the SWISS‐PROT database. Biochim Biophys Acta. 1473: 4‐8

[8] Marek, K. W., Vijay, I. K., et al. (1999) A recessive deletion in the GlcNAc‐1‐phosphotransferase gene results in peri‐implantation embryonic lethality. Glycobiology. 9: 1263‐1271

[9] Hounsell, E. F. and Davies, M. J. (1993) Role of protein glycosylation in immune regulation. Ann Rheum Dis. 52 Suppl 1: S22‐29

[10] Field, M. C., Amatayakul‐Chantler, S., et al. (1994) Structural analysis of the N‐glycans from human immunoglobulin A1: comparison of normal human serum immunoglobulin A1 with that isolated from patients with rheumatoid arthritis. Biochem J. 299 ( Pt 1): 261‐275

[11] Pucic, M., Knezevic, A., et al. (2011) High throughput isolation and glycosylation analysis of IgG‐variability and heritability of the IgG glycome in three isolated human populations. Mol Cell Proteomics. 10: M111.010090

[12] Huhn, C., Selman, M. H., et al. (2009) IgG glycosylation analysis. Proteomics. 9: 882‐913

[13] Royle, L., Campbell, M. P., et al. (2008) HPLC‐based analysis of serum N‐glycans on a 96‐well plate platform with dedicated database software. Anal Biochem. 376: 1‐12

[14] Knezevic, A., Polasek, O., et al. (2009) Variability, heritability and environmental determinants of human plasma N‐glycome. J Proteome Res. 8: 694‐701

[15] Gornik, O., Wagner, J., et al. (2009) Stability of N‐glycan profiles in human plasma. Glycobiology. 19: 1547‐1553

[16] Knezevic, A., Gornik, O., et al. (2010) Effects of aging, body mass index, plasma lipid profiles, and smoking on human plasma N‐glycans. Glycobiology. 20: 959‐969

[17] Pucic, M., Pinto, S., et al. (2010) Common aberrations from the normal human plasma N‐glycan profile. Glycobiology. 20: 970‐975

[18] Lauc, G., Essafi, A., et al. (2010) Genomics meets glycomics‐the first GWAS study of human N‐Glycome identifies HNF1alpha as a master regulator of plasma protein fucosylation. PLoS Genet. 6: e1001256

[19] Rudan, I., Biloglav, Z., et al. (2006) Effects of inbreeding, endogamy, genetic admixture, and outbreeding on human health: a (1001 Dalmatians) study. Croat Med J. 47: 601‐610

[20] McQuillan, R., Leutenegger, A. L., et al. (2008) Runs of homozygosity in European populations. Am J Hum Genet. 83: 359‐372

[21] Igl, W., Johansson, A., et al. (2010) The Northern Swedish Population Health Study (NSPHS)‐‐a paradigmatic study in a rural population combining community health and basic research. Rural Remote Health. 10: 1363

[22] Lauc, G. and Zoldos, V. (2010) Protein glycosylation‐‐an evolutionary crossroad between genes and environment. Mol Biosyst. 6: 2373‐2379

[23] R Core Team (2012) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna. Available from: http://www.R-project.org

[24] Monti, S., Tamayo, P., et al. (2003) Consensus clustering: A resampling‐based method for class discovery and visualization of gene expression microarray data. Mach Learn. 52: 91‐118

[25] Jombart, T., Devillard, S., et al. (2010) Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genet. 11:

[26] Hirschhorn, J. N. and Daly, M. J. (2005) Genome‐wide association studies for common diseases and complex traits. Nat Rev Genet. 6: 95‐108

[27] Queitsch, C., Carlson, K. D., et al. (2012) Lessons from model organisms: phenotypic robustness and missing heritability in complex disease. PLoS Genet. 8: e1003041

[28] Zuber, V. and Strimmer, K. (2009) Gene ranking and biomarker discovery under correlation. Bioinformatics. 25: 2700‐2707

[29] Zuber, V. and Strimmer, K. (2011) High‐Dimensional Regression and Variable Selection Using CAR Scores. Stat Appl Genet Mol. 10:

UNIVERSITY OF ZAGREB FACULTY OF SCIENCE

DIVISION OF BIOLOGY Rooseveltov trg 6 10000 Zagreb

tel +385 1 4877700 fax +385 1 4826260 [email protected] http://www.biol.pmf.hr


Professor Kristian Vlahoviček, PhD Head, Bioinformatics Group e-mail: [email protected]

Zagreb, December 19th, 2012.

Subject: Ana Sofia Pedrosa Pinto, justification of the PhD thesis topic

Sofia Pinto has been working in my lab since 2007 and has been enrolled in the PhD program since 2008. She is actively pursuing several research directions. Of these, we propose as her dissertation topic the computational analysis of glycosylation patterns in human plasma and plasma-associated proteins, and their association with population genotypes.

Sofia has so far published two research papers. In them, she demonstrated that she has mastered the skills for analyzing and visualizing high-throughput biological data. She also presented her first results in the analysis of human plasma glycosylation profiles, further demonstrating her mastery of advanced statistics and machine learning methods. She has demonstrated the ability to independently plan and execute complex computational analyses of large sets of biological data.

I confidently state that the proposed topic of Sofia’s doctoral dissertation, “Computational analysis of human plasma N-glycome and genotypes”, is fully compliant with all the requirements for her doctoral degree.

Professor Kristian Vlahoviček, PhD, EMBO YIP IG Head, Bioinformatics Group Faculty of Science, University of Zagreb, Croatia
