
High throughput urine biomarker discovery and integrative analysis for translational medicine

Bruce Ling, Ph.D.

A molecular indicator of a specific biological property; a biochemical feature or facet that can be used to measure the progress of disease or the effects of treatment (NIH, 2002)

Biomarker

• Small molecules
– Glucose (diabetes)
– Serum cholesterol (cardiovascular disease)

• Proteins
– PSA (prostate cancer)
– HER2 (IHC) (breast cancer, Herceptin therapy)
– hCG (pregnancy test)

• RNA/DNA
– HER2 (FISH) (breast cancer)
– Oncotype DX (Genomic Health, breast cancer)

Biomarker examples

Pediatric Diseases

• Kidney transplant Acute Rejection

• Kawasaki Disease

• Systemic Juvenile Idiopathic Arthritis

• Necrotizing Enterocolitis

• Inflammatory Bowel Disease

• Glioblastoma multiforme

• Preterm Labor

Where to look for biomarkers

– Disease tissue

– Proximal/distal fluids

• Plasma/serum, urine, amniotic, synovial fluid, CSF, saliva, tears, etc.

Why Urine?

• Patient consenting

• Non-invasive

• Easy to collect for time course analysis

• Abundant and stable

Urine is a rich resource for biomarker discovery

• Filtration of plasma
– 900 liters daily

• Urine proteome
– > 1500 proteins, ~30 mg/day
– 30% from circulation
– 70% from urogenital tract

• Urine peptidome
– > 100,000 naturally occurring peptides, ~20 mg/day

1) Equal mass of protein and peptide in urine translates into at least a ten-fold greater molar abundance of peptides than proteins

2) Urine peptide analysis is not hampered by highly abundant protein issues

3) A one-hour, one-dimensional HPLC separation is sufficient for the analysis of more than 100,000 urine peptides, allowing high-throughput biomarker discovery
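The molar-abundance argument in point 1 can be checked with back-of-envelope arithmetic. The sketch below uses the daily masses quoted on the slides (~30 mg protein, ~20 mg peptide); the average molecular weights (30 kDa per protein, 2 kDa per peptide) are illustrative assumptions that do not come from the deck.

```python
# Back-of-envelope molar comparison of urinary proteins and peptides.
PROTEIN_MG_PER_DAY = 30.0      # from the slides
PEPTIDE_MG_PER_DAY = 20.0      # from the slides
ASSUMED_PROTEIN_DA = 30_000.0  # assumption: average protein mass
ASSUMED_PEPTIDE_DA = 2_000.0   # assumption: average peptide mass


def micromoles(mass_mg, mol_weight_da):
    """Convert a mass in mg to micromoles (mg / (g/mol) = mmol)."""
    return mass_mg / mol_weight_da * 1000.0


protein_umol = micromoles(PROTEIN_MG_PER_DAY, ASSUMED_PROTEIN_DA)
peptide_umol = micromoles(PEPTIDE_MG_PER_DAY, ASSUMED_PEPTIDE_DA)
molar_ratio = peptide_umol / protein_umol

print(f"proteins: {protein_umol:.1f} umol/day")
print(f"peptides: {peptide_umol:.1f} umol/day")
print(f"peptide/protein molar ratio: {molar_ratio:.1f}")
```

Under these assumed masses the peptides come out roughly ten times more abundant in molar terms; a different choice of average masses would shift the ratio.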

Urine Peptidome: a fertile ground for biomarker discovery

Challenges of Urine Analysis

• Dilution factor causing concentration variations
– Solution: content normalization
– Creatinine; housekeeping (abundant) urine peptides; equal peptide mass

• Peptide content can be complicated by
– Diet, exercise, circadian rhythm, circulatory levels of hormones
– Solution: careful experimental design to avoid these confounding issues, e.g., cohorts of patients of similar demographics; multi-center sample collection and validation
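Creatinine normalization, one of the options named above, can be sketched as follows. The function name, the data layout (a dict of m/z to intensity), and the example numbers are hypothetical; the deck does not describe its own normalization code.

```python
def normalize_to_creatinine(peak_intensities, creatinine_mg_per_dl):
    """Divide each peptide signal by the sample's creatinine concentration.

    peak_intensities: dict mapping peptide m/z to raw intensity (hypothetical layout).
    """
    if creatinine_mg_per_dl <= 0:
        raise ValueError("creatinine concentration must be positive")
    return {mz: inten / creatinine_mg_per_dl
            for mz, inten in peak_intensities.items()}


# Two collections from the same hypothetical patient: one dilute, one concentrated.
dilute = {982.59: 50.0, 1680.98: 20.0}
concentrated = {982.59: 200.0, 1680.98: 80.0}

norm_dilute = normalize_to_creatinine(dilute, creatinine_mg_per_dl=25.0)
norm_concentrated = normalize_to_creatinine(concentrated, creatinine_mg_per_dl=100.0)

print(norm_dilute)
print(norm_concentrated)
```

After normalization the two collections give the same per-creatinine signal, which is the point of correcting for dilution.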

Sample preparation workflow:

• 5 ml urine
• Centricon 10K filtration: filtrate (peptides <6K) carried forward; proteins >6K retained
• C18 desalt (Sep-pak)
• Ethyl acetate extraction (post ethyl acetate fraction carried forward)
• C18 HPLC, 30 seconds per fraction
• Collect on MALDI target plate
• Mass spectrometry analysis and protein identification

Urine Peptidome Profiling by Mass Spectrometry

Biomarker HTS Flows

Sample peptides:

-Class 1:1,2,3…

-Class 2:1,2,3…

-Class 3:1,2,3…

RP-HPLC

Collect 120 fractions on MALDI plates

MALDI-TOF MS on each fraction

MASS-Conductor ®

Machine learning

feature discovery and classification

Candidate Biomarkers

987.62

1027.51

1098.55

etc.

Biomarker Confirmation/Validation

Identify

Differentiating Markers

New sample

Sets

Validation

New center sample sets

Higher throughput

Quantitative methods

Quantitative MS

Immunoassay

Testing

New Longitudinal sample sets

Exploration

Protein ID

MS/MS

Data Challenges in Urine Peptide Biomarker Discovery

• Data tracking and storage
– Patient demographics
– Peptide profiles in various fractions/samples

• Dimension reduction and data reduction
– Multi-dimensional data sets
– Huge data sets and lots of noise

A project of 40 samples produced 241.5 GB of raw data in a MySQL database

Data dimensions: HPLC fraction, peptide mass, peptide signal, patient ID, patient demographics

Decode the Urine Peptidome

                  Patient 1   Patient 2   Patient 3   Patient 4   …
peptide 1         signal      signal      signal      signal      …
peptide 2         signal      signal      signal      signal      …
peptide 3         …           …           …           …           …
peptide 4         …           …           …           …           …
peptide 5         …           …           …           …           …
…                 …           …           …           …           …
peptide 100,000   …           …           …           …           …

???

Decode the Urine Peptidome

• Peak finding in each fraction for each sample

• Align the peaks across the samples

• Create common peak index
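The three steps above can be sketched on toy data. This is a minimal illustration and not the MASS-Conductor implementation: the local-maximum peak picker, the greedy gap-based alignment, the tolerance value, and all function names are assumptions.

```python
def find_peaks(spectrum, min_intensity):
    """Return local maxima above a threshold from a list of (m/z, intensity)."""
    peaks = []
    for i in range(1, len(spectrum) - 1):
        mz, inten = spectrum[i]
        if (inten >= min_intensity
                and inten > spectrum[i - 1][1]
                and inten >= spectrum[i + 1][1]):
            peaks.append((mz, inten))
    return peaks


def build_common_index(peaks_by_sample, tol):
    """Greedy alignment: pool all peak m/z values and split where the gap exceeds tol."""
    all_mz = sorted(mz for peaks in peaks_by_sample.values() for mz, _ in peaks)
    groups = []
    for mz in all_mz:
        if groups and mz - groups[-1][-1] <= tol:
            groups[-1].append(mz)
        else:
            groups.append([mz])
    return [sum(g) / len(g) for g in groups]  # one centroid per common peak


def index_sample(peaks, common_index, tol):
    """Map one sample's peaks onto the common index (0.0 where no peak was seen)."""
    row = [0.0] * len(common_index)
    if not common_index:
        return row
    for mz, inten in peaks:
        j = min(range(len(common_index)), key=lambda k: abs(common_index[k] - mz))
        if abs(common_index[j] - mz) <= tol:
            row[j] = max(row[j], inten)
    return row


spectra = {
    "patient_1": [(982.50, 0.0), (982.59, 120.0), (982.70, 0.0),
                  (1027.40, 0.0), (1027.51, 40.0), (1027.60, 0.0)],
    "patient_2": [(982.50, 0.0), (982.61, 90.0), (982.70, 0.0)],
}
TOL = 0.05  # assumed m/z tolerance
peaks_by_sample = {s: find_peaks(sp, min_intensity=10.0) for s, sp in spectra.items()}
common_index = build_common_index(peaks_by_sample, TOL)
matrix = {s: index_sample(p, common_index, TOL) for s, p in peaks_by_sample.items()}
print(common_index)
print(matrix)
```

The resulting matrix has one row per sample and one column per common peak, matching the peptide-by-patient table sketched on the earlier slide.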

Data mining issues in Biomarker Discovery

• Peak number >> sample number

• False discovery in multiple hypothesis testing

• Multi-class classification and validation

• Discovery of biomarker signature
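When the number of peaks far exceeds the number of samples, testing every peak inflates false discoveries. The slides do not name a specific correction at this point, so the sketch below uses the Benjamini-Hochberg step-up procedure purely as a familiar illustration; the p-values are made up.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return indices of hypotheses rejected at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is under its threshold.
        if p_values[i] <= rank / m * alpha:
            largest_rank = rank
    return sorted(order[:largest_rank])


# Hypothetical per-peak p-values from a two-class comparison.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(p_values, alpha=0.05)
print(significant)
```

Five of these ten p-values are below 0.05 when taken one at a time, but only the first two survive the correction.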

• Robust loading and tracking of high-volume proteomic data

• Robust reduction of raw data sets and enabling of efficient and accurate peak finding, alignment and indexing

• Robust and automatic high throughput computing for expensive algorithms

• Integration of FDR analysis and multi-class classification algorithms to obtain statistically differentiating feature panels

• Automatic generation of data reports with graphics

MASS-Conductor® Platform Supports Urine Peptide Biomarker Discovery

MASS-Conductor® Platform High Throughput Computing

Urine Biomarker Discovery: Case Study

Integrative Urinary Peptidomics in Renal Transplantation Identifies Novel Biomarkers for Acute Rejection

Xuefeng B. Ling²*, Tara K. Sigdel¹*, Kenneth Lau², Lihua Ying¹, Irwin Lau², James Schilling²¥, Minnie M. Sarwal¹¥

Divisions of ¹Nephrology and Department of Pediatrics, ²Biotechnology Core, Stanford University School of Medicine, Stanford University, Stanford, CA 94305

Kidney Transplant Rejection

• Most effective treatment for end stage renal disease

• 16,000 per year in US

• Grafts monitored by biopsy

• Unmet needs:
– Less invasive and more frequent monitoring
– Acute rejection vs. stable graft
– Acute rejection vs. BK virus

Allograft Acute Rejection Urine Biomarker Discovery

1. LCMS data reduction
– Peak finding
– Peak alignment
– Peak indexing

2. Supervised data mining
– Feature selection
– Training
– Testing

3. Unsupervised data mining
– 2D clustering

4. Quantitative LCMS validation

Biomarker Panel: Supervised Analysis

Biomarker Panel: Unsupervised Analysis

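[Figure: THP domain map from the NH2 terminus to the COOH terminus, showing EGF-like Domain I, EGF-like Domain II, EGF-like Domain III, and the ZP domain; residue labels 28, 64, 65, 107, 108, 149, 334, 585]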

Urine THP Peptide Biomarkers Fall into a Tight Cluster in C-Terminus

1. R.VLNLGPITR.K
2. G.SVIDQSRVLNLGPI.T
3. I.DQSRVLNLGPITR.K
4. R.SGSVIDQSRVLNLGPI.T
5. S.VIDQSRVLNLGPITR.K
6. R.SGSVIDQSRVLNLGPIT.R
7. G.SVIDQSRVLNLGPITR.K
8. R.SGSVIDQSRVLNLGPITR.K
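The deck elsewhere lists MH+ values for these THP peptides (for example 982.59 for VLNLGPITR and 1680.98 for VIDQSRVLNLGPITR). As a sanity check, the singly protonated monoisotopic mass can be computed from the sequence using standard residue masses. This is a minimal sketch that assumes unmodified peptides; the function name is hypothetical.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # charge carrier for the [M+H]+ ion


def mh_plus(sequence):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO[aa] for aa in sequence) + WATER + PROTON


for seq in ("VLNLGPITR", "VIDQSRVLNLGPITR"):
    print(seq, round(mh_plus(seq), 2))
```

Both computed values agree with the listed masses to within about 0.02 Da. Modified peptides (for example hydroxylated collagen fragments) would need the modification mass added.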

MRM: Multiplexed Quantitative Biomarker Validation

[Figure: two ROC panels, AR versus STA and AR versus BK; sample: urine peptides; x axis 1 - Specificity (0.0 to 1.0), y axis Sensitivity (0.0 to 1.0); curves for THP 1680.98 VIDQSRVLNLGPITR and THP 1912.07 SGSVIDQSRVLNLGPITR; AUC values shown: 0.83, 0.74, 0.92, 0.83]

ROC Analysis of THP Peptide Biomarkers Quantified by MRM
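The area under a ROC curve can be computed directly from the measured values: it equals the probability that a randomly chosen case scores higher than a randomly chosen control. The sketch below uses that rank-based definition; the input values are invented for illustration and are not the MRM measurements behind the AUCs reported above.

```python
def roc_auc(case_values, control_values):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count half."""
    wins = 0.0
    for c in case_values:
        for k in control_values:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical classifier scores (higher = more likely acute rejection).
ar_scores = [0.9, 0.8, 0.4]
sta_scores = [0.7, 0.3, 0.2]
auc = roc_auc(ar_scores, sta_scores)
print(round(auc, 3))
```

For a marker that decreases in the case group, the raw abundance gives an AUC below 0.5; negating the values (or reporting 1 - AUC) puts it on the usual scale.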

Collagen peptide biomarkers

1. COL1A1 1235.56 APGDRGEPGPPGP
2. COL1A1 1251.55 APGDRGEPGPPGP
3. COL1A1 1322.57 APGDRGEPGPPGPA
4. COL1A1 1316.59 DAGPVGPPGPPGPPG
5. COL1A1 1409.66 GPPGPPGPPGPPGPPS
6. COL1A1 2048.92 NGDDGEAGKPGRPGERGPPGP
7. COL1A1 2064.91 NGDDGEAGKPGRPGERGPPGP
8. COL1A1 2192.97 NGDDGEAGKPGRPGERGPPGPQ
9. COL1A1 2362.12 GKNGDDGEAGKPGRPGERGPPGPQ
10. COL1A1 2378.10 GKNGDDGEAGKPGRPGERGPPGPQ
11. COL1A1 2645.24 GPPGKNGDDGEAGKPGRPGERGPPGPQ
12. COL1A1 1709.79 PPGEAGKPGEQGVPGDLG
13. COL1A1 2031.95 PPGEAGKPGEQGVPGDLGAPGP
14. COL1A1 2221.97 ADGQPGAKGEPGDAGAKGDAGPPGP
15. COL1A1 2205.99 ADGQPGAKGEPGDAGAKGDAGPPGP
16. COL1A1 2277.01 ADGQPGAKGEPGDAGAKGDAGPPGPA
17. COL1A1 2293.01 ADGQPGAKGEPGDAGAKGDAGPPGPA
18. COL1A1 2617.15 GPPGADGQPGAKGEPGDAGAKGDAGPPGPA
19. COL1A1 2086.93 EGSPGRDGSPGAKGDRGETGPA
20. COL1A1 2157.96 AEGSPGRDGSPGAKGDRGETGPA
21. COL1A1 3014.41 ESGREGAPGAEGSPGRDGSPGAKGDRGETGPA
22. COL1A1 1266.58 SPGPDGKTGPPGPA
23. COL1A1 2129.99 DGKTGPPGPAGQDGRPGPPGPPG
24. COL1A1 2017.93 GRPGEVGPPGPPGPAGEKGSPG
25. COL1A2 2081.94 DGPPGRDGQPGHKGERGYPG
26. COL1A2 2195.99 NDGPPGRDGQPGHKGERGYPG
27. COL2A1 1861.85 SNGNPGPPGPPGPSGKDGPK
28. COL3A1 1738.76 NDGAPGKNGERGGPGGPGP
29. COL3A1 2008.93 DGESGRPGRPGERGLPGPPG
30. COL3A1 2079.92 DAGAPGAPGGKGDAGAPGERGPPG
31. COL3A1 2565.18 GAPGQNGEPGGKGERGAPGEKGEGGPPG
32. COL3A1 2743.24 KNGETGPQGPPGPTGPGGDKGDTGPPGPQG
33. COL4A1 1424.66 PGQQGNPGAQGLPGP
34. COL4A2 1126.51 GLPGLPGPKGFA
35. COL4A3 1161.52 GEPGPPGPPGNLG
36. COL4A4 1218.55 GLPGPPGPKGPRG
37. COL4A5 1144.52 GPPGPPGPLGPLG
38. COL4A5 1269.53 PGLDGMKGDPGLP
39. COL4A5 1733.76 GIKGEKGNPGQPGLPGLP
40. COL4A6 1158.52 GLPGPPGPPGPPS
41. COL5A1 1748.82 KGPQGKPGLAGMPGANGPP
42. COL7A1 1690.80 PGLPGQVGETGKPGAPGR
43. COL9A1 1732.84 KRPDSGATGLPGRPGPPG
44. COL11A1 1441.64 GPPGPPGLPGPQGPKG
45. COL11A1 1828.84 DGPPGPPGERGPQGPQGPV
46. COL17A1 1368.62 LPGPPGPPGSFLSN
47. COL18A1 1142.51 GPPGPPGPPGPPS

THP peptide biomarkers

1. THP 982.59 VLNLGPITR
2. THP 1047.48 SGSVIDQSRV
3. THP 1211.66 DQSRVLNLGPI
4. THP 1225.69 SRVLNLGPITR
5. THP 1324.76 IDQSRVLNLGPI
6. THP 1423.83 VIDQSRVLNLGPI
7. THP 1468.82 DQSRVLNLGPITR
8. THP 1510.87 SVIDQSRVLNLGPI
9. THP 1567.91 GSVIDQSRVLNLGPI
10. THP 1581.91 IDQSRVLNLGPITR
11. THP 1654.91 SGSVIDQSRVLNLGPI
12. THP 1680.98 VIDQSRVLNLGPITR
13. THP 1755.96 SGSVIDQSRVLNLGPIT
14. THP 1768.01 SVIDQSRVLNLGPITR
15. THP 1912.07 SGSVIDQSRVLNLGPITR
16. THP 2040.16 SGSVIDQSRVLNLGPITRK

AR Urine Biomarkers are Collagen and THP Peptides

Hypothesis 1: Gene expression alteration in AR

Hypothesis 2: Protease expression alteration in AR

Hypothesis 3: Protease inhibitor expression alteration in AR

Hypothesis of Molecular Mechanisms for AR Urine Biomarkers

1. Exploration
– Exploration data set⁶ (TGCG), Affymetrix HG-U95Av2
– (AR: PBL, n=6; BX, n=7) (STA: PBL, n=9; BX, n=10) (NR: PBL, n=8; BX, n=5) (HC: PBL, n=8; BX, n=9)
– Exploration analysis

2. Confirmation
– Confirmation data set (Stanford), Affymetrix HU-133
– (AR: BX, n=37) (HC: BX, n=23)
– Confirmation analysis

3. Validation
– Validation data set (Stanford), quantitative RT-PCR
– (AR: BX, n=14) (STA: BX, n=10) (HC: BX, n=10)
– Validation analysis

Analyses:
• Expression analysis of the precursor genes corresponding to the peptide biomarkers
• Expression analysis of metzincin superfamily genes
• Expression analysis of protease inhibitor genes
• Discovery of mechanism biomarkers

Transcriptome Analysis of Allograft Biopsies

Parental Protein Expression Analysis of Allograft Biopsies Contrasting Urine Peptide Biomarker Changes

Genome-wide Protease and Protease Inhibitor Expression Analysis of Allograft Biopsies Revealed Up Regulation of MMP7, SERPING1, TIMP1

[Figure: signal intensity (0 to 50) of TIMP1, COL1A2, UMOD, SERPING1, MMP7, and COL3A1 in AR, STA, and HC biopsies; ROC curve with x axis 1 - Specificity and y axis Sensitivity (0.0 to 1.0); Mean (AUC): 0.98]

Allograft Biopsies Expression Biomarkers Effectively Classified AR

Proposed Underlying Mechanisms for the AR Urine Peptide Biomarkers

Hypothesis: Collagen Breakdown and Deposition in AR

• Decreased collagen peptides in AR
• Increased TIMP1 (collagenase inhibitor) in AR
• Increased MMP7 in AR
• Decreased collagenase activity in AR tissue
• Decreased collagen breakdown in AR
• Increased collagen expression in AR
• Increased collagen deposition in AR
• More graft fibrosis after an AR episode?

Biopsy gene expression: GSE 14328

Integrated Analysis Urine Peptidomics

Urine

Renal Biopsy

Urine Peptide Analysis by MS

Urine Biomarker Discovery: Case Study

Ensemble Analyses of Urine Peptide Profiles with Clinical Findings Sufficiently Predict Pediatric Necrotizing Enterocolitis Outcomes

Running title: NEC peptide biomarkers

Xuefeng B. Ling¹, Kenneth Lau¹, Roger Lu¹, Gigi Liu¹, Harvey Cohen¹, James Schilling¹, Karl G. Sylvester¹¥

¹Department of Pediatrics, Stanford University, Stanford, CA 94305

Unmet Medical Needs in Necrotizing Enterocolitis

Necrotizing enterocolitis (NEC) is a medical condition primarily seen in premature infants, where portions of the bowel undergo necrosis (tissue death).

Despite decades of research, the pathogenesis of NEC remains obscure, the diagnostic parameters are unclear, and both treatment and prevention strategies remain inadequate and dated.

There is a real need for better molecular identification of NEC to help alter its onset and progression.

Clinical parameters do not adequately predict outcome in Necrotizing Enterocolitis

[Figure: rate of NEC-S occurrence (% patients, 0 to 30) versus NEC score (-10 to 40), divided into Low, Intermediate, and High Risk Groups; group counts shown on the plot: M: n = 2, S: n = 15; M: n = 16, S: n = 10; M: n = 26, S: n = 0; legend: NEC M, NEC S]

Clinical Parameters Based Model stratifies Necrotizing Enterocolitis Patients

NEC Urine Naturally Occurring Peptide Biomarker Discovery

1. LCMS data reduction
– Peak finding
– Peak alignment
– Peak indexing

2. Supervised data mining
– Feature selection
– Training
– Testing

3. Unsupervised data mining
– 2D clustering

Biomarker Panel: Supervised Analysis (Training and Testing)

Biomarker Panel: Unsupervised Analysis

Biomarker Panel: Combined data set and ROC analysis

Permutation based FDR analysis of the biomarker signature

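Discovery set, n = 34 (clinical diagnosis: 17 NEC M, 17 NEC S)

NEC risk groups (medical NEC scoring), with counts given as clinical M / clinical S:

Low, n = 7 (clinical diagnosis: 7 M, 0 S)
– Medical NEC scoring: diagnosed as M: 7 / 0; diagnosed as S: 0 / 0
– Urine peptide based classification: classified as M: 7 / 0; classified as S: 0 / 0
– Percent agreement with clinical diagnosis: 100 % (+), 100 % (-), 100 %

Intermediate, n = 15 (clinical diagnosis: 9 M, 6 S)
– Medical NEC scoring: diagnosed as M: 4 / 3; diagnosed as S: 5 / 3
– Urine peptide based classification: classified as M: 8 / 1; classified as S: 1 / 5
– Percent agreement with clinical diagnosis: 88.9 % (+), 83.3 % (-), 86.1 %

High, n = 9 (clinical diagnosis: 0 M, 9 S)
– Medical NEC scoring: diagnosed as M: 0 / 1; diagnosed as S: 0 / 8
– Urine peptide based classification: classified as M: 0 / 0; classified as S: 0 / 9
– Percent agreement with clinical diagnosis: 100 % (+), 100 % (-), 100 %

N/A, n = 3

P = 0.01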
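The percent-agreement figures on this slide can be reproduced from the intermediate-risk counts. The sketch below reads the counts 8, 1, 1, 5 as a 2x2 table (rows: classified as M or S by urine peptides; columns: clinical diagnosis M or S). That reading of the slide layout is an assumption, as are the function and variable names.

```python
def percent_agreement(table):
    """Per-class and mean agreement (%) of a 2x2 classification table.

    table rows: classified as M, classified as S.
    table columns: clinical diagnosis M, clinical diagnosis S.
    """
    (m_as_m, s_as_m), (m_as_s, s_as_s) = table
    agree_m = 100.0 * m_as_m / (m_as_m + m_as_s)   # clinical M called M
    agree_s = 100.0 * s_as_s / (s_as_m + s_as_s)   # clinical S called S
    return agree_m, agree_s, (agree_m + agree_s) / 2.0


# Intermediate risk group, urine peptide based classification (assumed layout).
intermediate = [[8, 1],
                [1, 5]]
agree_m, agree_s, mean_agreement = percent_agreement(intermediate)
print(round(agree_m, 1), round(agree_s, 1), round(mean_agreement, 1))
```

Under this reading the three values round to the 88.9 %, 83.3 %, and 86.1 % shown on the slide, which suggests the third figure is the mean of the two per-class rates.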

Proposed Ensemble Approach to Diagnose Necrotizing Enterocolitis Patients

NEC Patients

Clinical Model

NEC Risk

Urine Biomarkers

NEC Diagnosis

TABLE 2

Cluster  Protein  Location   MH+      Sequence                        Rel. abundance M  Rel. abundance S  U test P value
1        COL1A1   220-249    2924.41  RGppGPPGKNGDDGEAGKPGRPGERGPpGp   0.2562           -0.2562           4.25E-03
1        COL1A1   220-249    2940.36  RGPPGppGKNGDDGEAGKpGRpGERGpPGP   0.2541           -0.2541           6.80E-03
2        COL1A2   485-514    2889.36  ARGEPGNIGFPGPKGPTGDPGKNGDKGHAG   0.2265           -0.2265           8.93E-05
3        COL1A2   925-952    2865.31  GRDGNpGNDGpPGRDGQpGHKGERGYpG     0.2919           -0.2919           1.99E-03
3        COL1A2   933-952    2081.94  DGpPGRDGQpGHKGERGYpG             0.2655           -0.2655           5.39E-03
4        COL1A2   135-157    2229.06  AGpPGKAGEDGHpGKPGRpGERG          0.2732           -0.2732           1.45E-02
4        COL1A2   131-157    2626.27  ARGpAGpPGKAGEDGHpGKPGRpGERG      0.223            -0.223            2.16E-02
4        COL1A2   131-157    2642.28  ARGpAGpPGKAGEDGHpGKpGRpGERG      0.2016           -0.2016           3.14E-02
4        COL1A2   137-157    2142.05  GpPGKAGEDGHPGKPGRpGERG           0.2624           -0.2624           1.06E-02
4        COL1A2   131-157    2158.03  GPpGKAGEDGHpGKPGRpGERG           0.3038           -0.3038           2.16E-02
5        COL3A1   813-840    2565.18  GApGQNGEPGGKGERGApGEKGEGGpPG     0.2623           -0.2623           2.58E-03
6        COL3A1   1168-1194  2680.19  NRGERGSEGSPGHPGQpGppGppGAPGP    -0.2382            0.2382           1.06E-02
6        COL3A1   1168-1194  2696.22  NRGERGSEGSpGHpGQpGPPGPpGApGp     0.1893           -0.1893           1.96E-02
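Table 2 reports a U test P value per peptide. A minimal sketch of a two-sided Mann-Whitney U test is shown below, using the normal approximation without tie or continuity correction. The abundance values are invented for illustration, and the deck does not say which implementation or variant produced its P values.

```python
import math


def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, two-sided P) using the normal approximation."""
    n_a, n_b = len(group_a), len(group_b)
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_a, p_two_sided


# Hypothetical normalized abundances of one peptide in NEC M and NEC S samples.
nec_m = [0.31, 0.27, 0.22, 0.35, 0.29]
nec_s = [-0.25, -0.30, -0.21, -0.28, -0.26]
u_stat, p_value = mann_whitney_u(nec_m, nec_s)
print(u_stat, round(p_value, 4))
```

With only five samples per group the normal approximation is rough; an exact test would be preferable at these sizes.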

Overlapping Urine Peptide Biomarkers for NEC

Proposed Underlying Mechanisms of Urine Naturally Occurring Peptide Biomarkers

[Figure: panels A, B, C; x axis 1 to 13; y axis 0.00E+00 to 1.20E+00; treatment labels Enbrel and Anakinra; response labels PR and CR]

Prediction of drug response in SJIA

Urine peptide biomarkers: the discovery process

Sample peptides:

-Class 1:1,2,3…

-Class 2:1,2,3…

-Class 3:1,2,3…

SCX/RP-HPLC

Collect 100 fractions on MALDI plates

MALDI-TOF MS

for each sample

LC fraction, m/z, abundance

MASS-Conductor ®

Machine learning

feature discovery and classification

Biomarker panels

MS/MS protein ID

Prospective validation with quantitative mass spec (MRM)

Interdisciplinary Skills for Biomarker Discovery

• Biology
• Analytic biochemistry
• Biostatistics
• Computer Science
• Medicine

Genome vs. Proteome

The Isotope Envelope

MASS-Conductor Urine biomarker discovery and testing

1. Training set (10 AR, 10 STA, 6 BK)
– LCMS raw spectra
– Peak finding, peak alignment, feature extraction
– 20937 unique features

2. Predictor discovery in training set
– Classifier training
– Six-fold cross-validation
– Classify AR, STA, BK
– Predictor sets

3. Predictor confirmation in testing set
– Testing set (10 AR, 10 STA, 4 BK)
– Linear discriminant analysis (LDA)
– Calculate estimates of predicted class probabilities
– Analysis of goodness of class separation

4. Pattern analysis in all set
– All set (20 AR, 20 STA, 10 BK, 10 NS, 10 HC)
– Predictors of 40 peptides
– Remove background signals
– Normalization
– 2D hierarchical clustering, heatmap plotting
– Cluster analysis

5. Platform validation
– 2 peptide biomarkers
– MRM assay development
– MRM assay: AR, STA, BK, NS, HC
– Training + testing samples
– Correlation analysis: LC-MALDI versus MRM
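The six-fold cross-validation on the training set (10 AR, 10 STA, 6 BK) can be sketched by dealing sample indices into folds class by class, so every fold sees each class. The function name and the round-robin assignment are illustrative assumptions; the deck does not describe how its folds were drawn.

```python
def stratified_folds(labels, k):
    """Deal sample indices into k folds, class by class, in round-robin order."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    position = 0
    for label in by_class:  # dicts keep insertion order
        for idx in by_class[label]:
            folds[position % k].append(idx)
            position += 1
    return folds


labels = ["AR"] * 10 + ["STA"] * 10 + ["BK"] * 6
folds = stratified_folds(labels, k=6)

for fold_number, test_idx in enumerate(folds, start=1):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # A classifier would be trained on train_idx and scored on test_idx here.
    print(fold_number, len(train_idx), sorted(labels[i] for i in test_idx))
```

A real run would shuffle within each class first; that step is omitted here to keep the output deterministic.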

Allograft Acute Rejection Urine Biomarker Discovery

Correlation Studies Between LCMS and MRM Platforms
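Agreement between two measurement platforms is commonly summarized with a Pearson correlation coefficient over the same samples. The sketch below assumes that choice of statistic, which the slide title does not specify, and uses invented intensities.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical signals for one peptide measured on both platforms in five samples.
lc_maldi_signal = [120.0, 340.0, 90.0, 410.0, 260.0]
mrm_signal = [1.1, 3.6, 0.8, 4.0, 2.9]
r = pearson_r(lc_maldi_signal, mrm_signal)
print(round(r, 3))
```

A rank-based coefficient such as Spearman's would be a reasonable alternative if the two platforms respond nonlinearly.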

Analytical Challenges: High complexity and wide dynamic range

Tirumalai, R. S. (2003) Mol. Cell. Proteomics 2: 1096-1103

Plasma Proteins: Big Trees, Bushes, Grass + Bugs

www.genwaybio.com

Analytical Challenges: Detect low abundance proteins

Big Trees = HAP

Bushes = MAP

Grass + Bugs = LAP

Bottom up LCMS Biomarker Discovery

• Sample preparation (protein mixture)
• Digestion (digested peptides)
• Peptide purification
• Multi-dimensional chromatography (SCX, RP)
• Mass-spec spectra
• Data analysis
• MS/MS protein ID

Mass Spectrometry In A Nutshell

[Figure: ion source (hν), mass analyzer (F = ma, time), detector; output is the MS spectrum plotted against m/z]

MS/MS Peptide Sequencing

[Figure: source (hν), 1st mass analyzer, gate, collision cell, fragment ions, 2nd mass analyzer, detector; output is the MS/MS spectrum]

Differential Expression Analysis in Quantitative LCMS

MS based
– Peptide 1: m/z; Peptide 2: m/z’; Peptide 3: m/z’’
– MASS-Conductor® exhaustive MS comparison

MS/MS based
– Peptide 1: protein ID; Peptide 2: protein ID’; Peptide 3: protein ID’’
– Spectrum counting
– Labeling, e.g. iTRAQ

Qualitative Comparative Analysis– Spectrum Counting

PROTEIN X in Sample A versus Sample B (MS/MS)

IF the number of detected peptides differs between Sample A and Sample B

THEN [PROTEIN X] differs between Sample A and Sample B
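Spectrum counting compares how many MS/MS identifications map to the same protein in each sample. The sketch below is a hypothetical minimal version: the input is a plain list of protein names (one per identified spectrum), and the pseudocount that avoids division by zero is an assumption.

```python
from collections import Counter


def spectral_counts(identified_proteins):
    """Count MS/MS identifications per protein for one sample."""
    return Counter(identified_proteins)


def count_ratio(counts_a, counts_b, protein, pseudocount=1.0):
    """Ratio of spectral counts (A over B) with a pseudocount for missing proteins."""
    return (counts_a[protein] + pseudocount) / (counts_b[protein] + pseudocount)


# Hypothetical identification lists: one protein name per identified spectrum.
sample_a = spectral_counts(["UMOD"] * 8 + ["COL1A1"] * 2)
sample_b = spectral_counts(["UMOD"] * 2 + ["COL1A1"] * 2)

umod_ratio = count_ratio(sample_a, sample_b, "UMOD")
col1a1_ratio = count_ratio(sample_a, sample_b, "COL1A1")
print(umod_ratio, col1a1_ratio)
```

The counts only suggest a direction of change, since the number of spectra per protein also depends on protein length and instrument sampling.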

• Samples S1 to S4: parallel denature and digest
• Each sample's peptides (-NH) are labeled with one isobaric tag (-PRG): 114 + 31, 115 + 30, 116 + 29, 117 + 28
• Mix
• MS: Reporter-Balance-Peptide INTACT; 4 samples at identical m/z
– Chemically identical
– Migrate together in HPLC
• MS/MS:
– Peptide fragments (b and y ions) EQUAL: PROTEIN IDENTIFICATION
– Reporter ions DIFFERENT: 114, 115, 116, 117

MSMS Based Comparative Analysis– iTRAQ (isobaric tag)

Reporter ions: 114, 115, 116, 117
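Relative quantification with iTRAQ comes from the reporter-ion intensities in one MS/MS spectrum. The sketch below expresses each channel relative to a reference channel; the intensities, the choice of 114 as reference, and the function name are illustrative assumptions.

```python
REPORTER_CHANNELS = (114, 115, 116, 117)


def ratios_to_reference(reporter_intensities, reference=114):
    """Express each reporter-ion intensity relative to the reference channel."""
    ref = reporter_intensities[reference]
    if ref <= 0:
        raise ValueError("reference channel has no signal")
    return {ch: reporter_intensities[ch] / ref for ch in REPORTER_CHANNELS}


# Hypothetical reporter-ion intensities from one peptide's MS/MS spectrum.
spectrum_reporters = {114: 1000.0, 115: 2000.0, 116: 500.0, 117: 1500.0}
ratios = ratios_to_reference(spectrum_reporters)
print(ratios)
```

In practice, ratios from all peptides of a protein are combined, and isotope-impurity corrections are applied first; both steps are left out of this sketch.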

• More abundant proteins tend to get more sequence coverage in MS/MS, crowding out MS/MS opportunities for peptides from low-abundance proteins

• Spectrum counting is semi-quantitative

• iTRAQ is not scalable for moderate-throughput biomarker discovery
– iTRAQ cost
– iTRAQ tag number

Issues in MS/MS Based Analysis

MS Based Comparative Analysis– Targeted MASS-Conductor® Approach

1. ALL peptide MS signals are exhaustively compared, leading to the discovery of statistically differential signals

2. ONLY the peptides of interest, usually a very small number, are then targeted for MS/MS identification. If necessary, MS/MS signals can be enhanced by loading more sample or by fraction enrichment before MS

• Robust handling of high-volume proteomic data
– e.g. one SCX fraction and 120 RP fractions

• 40-sample project, MySQL data storage
– Raw data is 241.5 GB
– Peak data is 4.4 GB

• Robust and automatic high-throughput computing
• Robust reduction of raw data sets, enabling efficient and accurate feature discovery
• Sophisticated data mining approaches to obtain statistically differentiating features
• Graphic data analysis

MASS-Conductor® Platform Data Mining Requirements

“MASS-Conductor ®”: an in-house software platform, including Java, Perl, R, Ruby and MySQL implementations

• Interface with AB and Thermo mass specs
– Convert LC-MALDI T2D files in a batch manner to text files

• Extract mono-isotopic LC-MALDI peaks

• Track multiple scans of the same MALDI plate and the HPLC SCX/RP fractions where each peak resides

• Cluster mono-isotopic peaks across categorical samples for comparative analysis

• Interface and integrate SAM, PAM, 1d classifiers, 2d classifiers, margin tree, CART algorithm packages for differential feature selection and classification
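Extracting mono-isotopic peaks means keeping the lightest peak of each isotope envelope and discarding the heavier isotope peaks. The sketch below is a deliberately simple illustration for singly charged ions (typical for MALDI): a peak is dropped if another peak sits about one carbon-13 spacing below it. The tolerance and function name are assumptions, and this is not the MASS-Conductor algorithm.

```python
ISOTOPE_SPACING = 1.00335  # 13C minus 12C mass difference (Da), charge 1+


def monoisotopic_peaks(peaks, tol=0.02):
    """Keep peaks that have no neighbor one isotope spacing lighter.

    peaks: list of (m/z, intensity) for singly charged ions.
    """
    all_mz = [mz for mz, _ in peaks]
    mono = []
    for mz, inten in peaks:
        has_lighter_neighbor = any(
            abs((mz - other) - ISOTOPE_SPACING) <= tol for other in all_mz
        )
        if not has_lighter_neighbor:
            mono.append((mz, inten))
    return mono


# Hypothetical peak list: one three-peak isotope envelope plus a separate peak.
peak_list = [(982.60, 100.0), (983.60, 55.0), (984.61, 18.0), (1027.51, 40.0)]
mono = monoisotopic_peaks(peak_list)
print(mono)
```

This ignores intensity patterns and overlapping envelopes, so a production method would need to model the expected isotope distribution as well.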

“MASS-Conductor ®” data flow:

• Inputs: patient datasets, spectrum raw datasets
• Peak extraction: produces peak datasets
• Common feature alignment/extraction: produces feature datasets
• Feature indexing: produces indexed datasets
• Storage: Mass-Conductor database
• Biomarker discovery: binary/multi-class classification, false discovery rate analysis
• Output: potential biomarkers
• Web-service collaboration

DATA REDUCTION in “MASS-Conductor ®” Peak Extraction from Spectra Raw Data

Patient sample, LC-MALDI spot/fraction 13
– m/z 900 – 4000: 118142 raw data points reduced to 1690 peak data points
– m/z 1200 – 1250: 2530 data points reduced to 62 peaks
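The size of the data reduction can be put in numbers using figures quoted in the deck: 118142 raw points versus 1690 peak points for one spot, 2530 points versus 62 peaks in the m/z 1200 to 1250 window, and 241.5 GB of raw data versus 4.4 GB of peak data for the 40-sample project. Reading the 2530 and 62 figures as a before/after pair is an interpretation of the slide.

```python
def reduction_factor(before, after):
    """How many times smaller the reduced data is than the raw data."""
    return before / after


spot_factor = reduction_factor(118142, 1690)    # one spot, m/z 900 to 4000
window_factor = reduction_factor(2530, 62)      # m/z 1200 to 1250 window
storage_factor = reduction_factor(241.5, 4.4)   # whole project, GB

print(f"spot:    {spot_factor:.1f}x")
print(f"window:  {window_factor:.1f}x")
print(f"storage: {storage_factor:.1f}x")
```

All three comparisons land in the range of roughly 40 to 70 fold.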

[Figure: MS signal (0 to 1000) versus HPLC fractions (0 to 120) for one peptide, shown before and after data reduction; panels labeled AR, S, V; classes labeled Class A, Class B, Class C]

DATA REDUCTION – One Peptide Example

Peak Extraction from Spectra Raw Data

SEQUENCE 640 AA; 69761 MW
001 MGQPSLTWML MVVVASWFIT TAATDTSEAR WCSECHSNAT CTEDEAVTTC TCQEGFTGDG
061 LTCVDLDECA IPGAHNCSAN SSCVNTPGSF SCVCPEGFRL SPGLGCTDVD ECAEPGLSHC
121 HALATCVNVV GSYLCVCPAG YRGDGWHCEC SPGSCGPGLD CVPEGDALVC ADPCQAHRTL
181 DEYWRSTEYG EGYACDTDLR GWYRFVGQGG ARMAETCVPV LRCNTAAPMW LNGTHPSSDE
241 GIVSRKACAH WSGHCCLWDA SVQVKACAGG YYVYNLTAPP ECHLAYCTDP SSVEGTCEEC
301 SIDEDCKSNN GRWHCQCKQD FNITDISLLE HRLECGANDM KVSLGKCQLK SLGFDKVFMY
361 LSDSRCSGFN DRDNRDWVSV VTPARDGPCG TVLTRNETHA TYSNTLYLAD EIIIRDLNIK
421 INFACSYPLD MKVSLKTALQ PMVSALNIRV GGTGMFTVRM ALFQTPSYTQ PYQGSSVTLS
481 TEAFLYVGTM LDGGDLSRFA LLMTNCYATP SSNATDPLKY FIIQDRCPHT RDSTIQVVEN
541 GESSQGRFSV QMFRFAGNYD LVYLHCEVYL CDTMNEKCKP TCSGTRFRSG SVIDQSRVLN
601 LGPITRKGVQ ATVSRAFSSL GLLKVWLPLL LSATLTLTFQ

Human THP precursor, Swiss-Prot: P07911
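The C-terminal clustering of the THP peptides can be checked by locating them in the precursor sequence. The sketch below uses only residues 541 to 640 as printed above, so positions are offset by 541; the function name is hypothetical.

```python
# Residues 541-640 of the THP precursor as printed in the sequence listing.
SEGMENT_START = 541
THP_C_TERMINAL = (
    "GESSQGRFSVQMFRFAGNYDLVYLHCEVYLCDTMNEKCKPTCSGTRFRSGSVIDQSRVLN"
    "LGPITRKGVQATVSRAFSSLGLLKVWLPLLLSATLTLTFQ"
)


def locate(peptide):
    """Return (first, last) residue numbers of a peptide, or None if absent."""
    offset = THP_C_TERMINAL.find(peptide)
    if offset < 0:
        return None
    first = SEGMENT_START + offset
    return first, first + len(peptide) - 1


for peptide in ("VLNLGPITR", "SGSVIDQSRVLNLGPITR", "SGSVIDQSRVLNLGPITRK"):
    print(peptide, locate(peptide))
```

All three example peptides fall within residues 589 to 607, consistent with the tight C-terminal cluster named in the slide title.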

Urine THP Peptide Biomarkers Fall into Tight Clusters in C-Terminus
