Making the Most of Your Edman Sequencing Data:
A Primer on Data Calling, Analysis, Interpretation, and Reporting
ESRG Tutorial, ABRF 2003, Feb 10-13, Denver, CO
Ben Madden, Mayo Clinic, Rochester, MN
Topics Covered
1. Aspects of calling amino acids
2. Factors that interfere with making amino acid assignments
3. Database searching
4. Reporting results
5. Calling, searching, and interpreting sample examples
Goals of Edman Sequencing
Assign N-terminal sequence
Identify the protein(s)/peptide(s) present in sample
Locate position of mutation/modification
Indirectly establish presence of amino-terminal modification (acetylation, pyroglutamic acid, etc.)
Data generation
Detecting changes in the heights/areas of peaks corresponding to PTH-amino acids in a series of consecutive HPLC chromatograms.
- an increase in height signals the presence of an amino acid at a particular cycle, followed by a decrease at a later cycle
- some noise-level changes are present throughout the run
Analysis of raw data
Requires the means to compare peak heights or peak areas in one chromatogram with the peak heights and areas in succeeding chromatograms.
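The comparison described above can be sketched in a few lines, assuming the peak heights have already been integrated and exported per cycle. The data layout (one dictionary of residue to height per cycle) and all numbers are hypothetical, not taken from any sequencer software.

```python
# Hypothetical integrated peak heights (arbitrary units), one dict per cycle.
cycles = [
    {"A": 12.0, "G": 3.0, "L": 2.5},   # cycle 1
    {"A": 4.0, "G": 11.0, "L": 2.8},   # cycle 2
    {"A": 2.0, "G": 4.5, "L": 10.0},   # cycle 3
]

def cycle_differences(cycles):
    """For each cycle after the first, return the height change versus the previous cycle."""
    diffs = []
    for prev, curr in zip(cycles, cycles[1:]):
        residues = sorted(set(prev) | set(curr))
        diffs.append({r: curr.get(r, 0.0) - prev.get(r, 0.0) for r in residues})
    return diffs

def largest_increase(diff):
    """Residue with the biggest rise in one cycle-to-cycle comparison."""
    return max(diff, key=diff.get)

diffs = cycle_differences(cycles)
for n, d in enumerate(diffs, start=2):
    print(f"cycle {n}: largest increase = {largest_increase(d)}")
```

The largest increase is only a candidate call; whether it is assigned still depends on background level and on the residue-specific recoveries covered later in the tutorial.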
Methods of calling amino acids
Strip chart recorder / light box
Computer chromatography software
- ABI Model 610
- HP/Agilent Chemstation
- SequencePro
- any chromatography software
Chromatography software: Overlay or stacking option
Compares 2 or more chromatograms
- manually visualize the differences in peak heights
- more forgiving of inconsistencies in chromatography
[Figure: overlay of chromatograms :2:3:Std1]
Chromatography software: Subtraction mode option
Shows combined image of current chromatogram minus peaks of prior chromatogram
- requires tight chromatography
- hard to see consecutive amino acids
[Figure: subtraction chromatogram "A-singleexm Residue 3-2" with PTH-amino acid peak labels]
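A minimal sketch of why subtraction mode needs tight chromatography, using synthetic Gaussian peaks. Everything here is an assumption for illustration: the peak positions, heights, widths, and the 0.1 min retention drift are invented, and real software works on detector traces rather than generated curves.

```python
import math

# Hypothetical time axis: 6.0 to 18.0 min in 0.05 min steps.
times = [6.0 + 0.05 * i for i in range(241)]

def gaussian(t, center, height, width=0.15):
    return height * math.exp(-((t - center) ** 2) / (2 * width ** 2))

def trace(peaks, shift=0.0):
    """Synthetic chromatogram: sum of Gaussian peaks, optionally drifted in time."""
    return [sum(gaussian(t, c + shift, h) for c, h in peaks) for t in times]

def subtract(current, prior):
    """Point-by-point subtraction: current chromatogram minus prior chromatogram."""
    return [c - p for c, p in zip(current, prior)]

def max_abs_in_window(values, lo, hi):
    return max(abs(v) for t, v in zip(times, values) if lo <= t <= hi)

# (retention time, height). The 17.0 min peak stands for a constant
# by-product peak that should cancel completely on subtraction.
prior = [(9.0, 5.0), (14.0, 1.0), (17.0, 8.0)]
current = [(9.0, 1.5), (14.0, 5.0), (17.0, 8.0)]

aligned = subtract(trace(current), trace(prior))
drifted = subtract(trace(current, shift=0.1), trace(prior))

print("residual near 17 min, aligned:", max_abs_in_window(aligned, 16.0, 18.0))
print("residual near 17 min, drifted:", max_abs_in_window(drifted, 16.0, 18.0))
```

With aligned retention times the constant peak cancels and only the new residue stands out; with a small drift the same peak leaves a large positive/negative artifact that can be mistaken for signal.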
Chromatography software options
Histograms
- peak height/areas of a single amino acid in each cycle of the run
- requires good integration/quantitation
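As a rough illustration of the histogram view, the sketch below prints a text bar chart for one amino acid across the cycles of a run. The pmol values and the function names are hypothetical; commercial packages draw this graphically from their own integration results.

```python
# Hypothetical quantitated yields (pmol) per cycle.
cycles = [
    {"A": 1.0, "G": 7.5},
    {"A": 1.2, "G": 2.0},
    {"A": 8.0, "G": 1.1},
    {"A": 2.5, "G": 0.9},
    {"A": 1.4, "G": 0.8},
]

def residue_series(cycles, residue):
    """Yield of a single amino acid in each cycle of the run."""
    return [c.get(residue, 0.0) for c in cycles]

def text_histogram(series, width=40):
    """One bar per cycle, scaled so the tallest bar is `width` characters."""
    top = max(series) or 1.0
    lines = []
    for n, value in enumerate(series, start=1):
        bar = "#" * round(width * value / top)
        lines.append(f"cycle {n:2d} {value:6.1f} {bar}")
    return "\n".join(lines)

ala = residue_series(cycles, "A")
print(text_histogram(ala))
```

The bars are only as good as the integration behind them, which is the caveat the slide raises.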
Chromatography software options
Software calling
- requires tight chromatography and solid integration
- may miss problematic amino acids
- not always reliable
What constitutes a called amino acid?
Potentially any signal that rises above background level variations and falls at a later cycle can be an assignment
Calls may be defined as positive or tentative depending on how far above background levels the peaks rise
Experience is a factor
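The rule of thumb above (rise above background, fall at a later cycle, positive or tentative by margin) could be sketched as follows. This is not how any particular sequencing package makes calls: the median-based background estimate, the noise value, and both threshold multipliers are assumptions chosen only for illustration.

```python
from statistics import median

# Hypothetical yields (pmol) for three residues over five cycles.
run = {
    "A": [9.0, 3.0, 1.5, 1.2, 1.0],
    "G": [1.0, 8.0, 2.5, 1.3, 1.1],
    "S": [0.8, 0.9, 2.4, 1.0, 0.9],   # serine typically recovers poorly
}

def call_cycle(series_by_residue, cycle_index, noise, tentative=2.0, positive=4.0):
    """Classify residues at one cycle as 'positive' or 'tentative'.

    A residue qualifies only if it rises relative to the previous cycle and
    falls in the next one. The last cycle is never called because no later
    cycle exists to confirm the fall.
    """
    calls = {}
    for residue, series in series_by_residue.items():
        value = series[cycle_index]
        rises = cycle_index == 0 or value > series[cycle_index - 1]
        falls = cycle_index + 1 < len(series) and series[cycle_index + 1] < value
        if not (rises and falls):
            continue
        excess = value - median(series)   # crude run-wide background estimate
        if excess >= positive * noise:
            calls[residue] = "positive"
        elif excess >= tentative * noise:
            calls[residue] = "tentative"
    return calls

for i in range(3):
    print(f"cycle {i + 1}:", call_cycle(run, i, noise=0.5))
```

Fixed thresholds like these cannot replace the operator's judgment; a low-recovery residue such as Ser may deserve a call at a margin that would be ignored for Ala.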
Calling amino acids
Although all amino acids are similar in the Edman degradation reactions and the resulting PTH-amino acids all have very similar extinction coefficients at 269 nm, there are some modifications that can affect the height/areas of the HPLC peaks
Calling amino acids
PTH-Ser recoveries are lower due to loss of H2O
- forms PTH-DTT-dehydroalanine derivative
[Figure: chromatogram with peaks labeled S, dha, and A]
Calling amino acids
PTH-Thr recovery can be lower due to loss of H2O
- forms numerous dehydro-aminoisobutyric acid-DTT derivatives
[Figure: chromatogram "B- Celgene/JM UbcH7 011003 8:Residue 5"; peaks marked T and x]
Calling amino acids
Methionine can oxidize, resulting in lower recovery of PTH-Met
[Figure: two chromatogram panels with peaks labeled M and MetO]
Calling amino acids
PTH-Glu is accompanied by PTH-Glu-aniline amide, which becomes more abundant in later cycles
[Figure: overlay of chromatograms :13:14 with peaks labeled E and E']
Calling amino acids
Tryptophan is often oxidized to several kynureninyl adducts, resulting in low or no recovery of PTH-Trp
[Figure: overlay of chromatograms :8:9:Std1 with peaks labeled ox W, P, V, DPTU, W, DPU, F; standard peaks in orange]
Calling amino acids
No PTH-cysteine peaks are observed due to loss of H2S
- might see some PTH-DTT-dehydroalanine
- must be alkylated (iodoacetamide, vinyl-pyridine, acrylamide, etc.)
PTC-proline is slow to cleave, leaving greater than normal lag
Calling amino acids
[Figure: overlay of chromatograms :6:7:Std1. Captions: PVDF blot where cysteine reacted with acrylamide during electrophoresis (peaks labeled PTH-S-propionamide-Cys, E, H); proline lag (peak labeled P)]
Calling amino acids
PTH-Asn and PTH-Gln will partially deamidate to give PTH-Asp and PTH-Glu
PTH-His and PTH-Arg can be low due to poor extraction
Calling amino acids
“Blank” cycles can occur due to:
- unalkylated Cys
- completely oxidized Trp
- modified amino acid
• Asn-CHO, (N)-X-S/T motif
Calling amino acids
Multiple amino acids per cycle
- major and minor signals might be used to assign more than one sequence
- more challenging to distinguish major and minor with comparable signals, taking into account low-recovery amino acids (C, W, S, T, R, H, M)
- numerous low-level signals in each cycle may just be noise
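One way to picture the major/minor problem is the sketch below, which ranks two signals per cycle and flags the cycles where the ranking should not be trusted. The yields and the 2-fold ratio cutoff are hypothetical; only the list of low-recovery residues comes from the slide.

```python
LOW_RECOVERY = set("CWSTRHM")  # residues the slide lists as low recovery

# Hypothetical background-subtracted yields (pmol), two signals per cycle.
cycle_yields = [
    {"A": 20.0, "G": 6.0},   # clear major and minor
    {"L": 18.0, "S": 5.0},   # Ser recovers poorly, so it may be the true major
    {"V": 9.0, "E": 8.0},    # comparable signals
]

def split_major_minor(cycle_yields, min_ratio=2.0):
    """For each cycle return (major, minor, confident).

    Assumes at least two non-zero signals per cycle.
    """
    result = []
    for yields in cycle_yields:
        ranked = sorted(yields, key=yields.get, reverse=True)
        major, minor = ranked[0], ranked[1]
        ratio = yields[major] / yields[minor]
        # Distrust the ranking when signals are comparable or when the weaker
        # signal belongs to a residue that typically recovers poorly.
        confident = ratio >= min_ratio and minor not in LOW_RECOVERY
        result.append((major, minor, confident))
    return result

for n, row in enumerate(split_major_minor(cycle_yields), start=1):
    print(f"cycle {n}: major={row[0]} minor={row[1]} confident={row[2]}")
```

Cycles flagged as not confident are the ones where a database search, covered later, can help sort the assignments.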
Calling amino acids
Rising background throughout the run
- dependent on protein stability
- more prominent with larger protein runs
- will limit length of calls
Factors preventing good assignments
Presence of non-sample amino acids
- contaminated cartridge blocks
• cleaning procedures using MeOH, ACN/H2O, nitric acid, pyrolysis
- contaminated supports (PVDF blots, PVDF strips, GF, etc.)
- “dirty” Polybrene
Factors preventing good assignments
Sequencer performance (chemical or mechanical problems)
- excessive lag
- poor repetitive yield
- evaluate by running a standard protein frequently or use an internal peptide standard for each run
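Repetitive yield from a standard run is commonly estimated from a straight-line fit of log yield against cycle number. The sketch below assumes that approach; the cycle numbers and pmol values are synthetic (generated with a 93% repetitive yield), and extrapolating the initial yield to cycle 0 is a convention chosen here, not one stated in the tutorial.

```python
import math

def fit_repetitive_yield(cycle_numbers, pmol):
    """Least-squares fit of log10(pmol) against cycle number.

    Returns (initial_yield, repetitive_yield): the yield extrapolated to
    cycle 0 and the fraction of signal retained from one cycle to the next.
    """
    logs = [math.log10(p) for p in pmol]
    n = len(cycle_numbers)
    mean_x = sum(cycle_numbers) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in cycle_numbers)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycle_numbers, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** intercept, 10 ** slope

# Synthetic standard: cycles where a well-recovered residue occurs.
standard_cycles = [2, 5, 9, 12, 15]
standard_pmol = [50.0 * 0.93 ** c for c in standard_cycles]

initial_yield, repetitive_yield = fit_repetitive_yield(standard_cycles, standard_pmol)
print(f"initial yield {initial_yield:.1f} pmol, repetitive yield {repetitive_yield:.1%}")
```

In practice the fit is best restricted to residues with good recovery, since Ser, Thr, Cys, Trp and the others noted earlier would pull the slope down.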
Factors preventing good assignments
Sequencer chemical artifacts
- bad solvents and reagents, additives
• excessive DTT in S2B
- HPLC solvents, buffers, additives
- co-Gln, aniline, DPTU, DPU
Factors preventing good assignments
Sequencer chemical artifacts cont.
- R2B (red) vs R2C (blue) (+/- PMTC)
[Figure: three chromatogram overlays (:10:13, :9:8, :9:8) with peaks labeled PMTC, DPTU, and ox Trp; one panel captioned "R2C red vs R2B blue"]
Factors preventing good assignments
HPLC problems
- retention time reproducibility
• replace worn pump seals
• eliminate leaky fittings
• gradients / column equilibration
- baseline flatness
• acetone, KH2PO4
- column life
• lower, broader peaks with older column
[Figure: chromatogram overlays (:3:4:5) comparing a normal run with a pump seal failure]
Factors preventing good assignments
cLC guard column failure
Factors affecting good assignments
Samples
- sample amount / purity
• the lower the amount, the higher the purity required for confident calls
- sample prep (see last year's tutorial online at WWW.ABRF.ORG)
Reasons for database searching
Identification of protein sample
- Is assigned sequence in the database?
- Does the hit clearly identify the protein?
• Is the match real or by chance?
• Longer sequences for definitive hits
Determine if the sequence is unique.
- Is assigned sequence not in the database?
- Does no exact hit mean you have a new sequence?
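A back-of-the-envelope calculation shows why longer sequences give more definitive hits. The sketch assumes all 20 amino acids are equally frequent and independent, which real proteins are not, and the database sizes are hypothetical. It is a rough estimate of exact chance matches, not the E-value statistic that BLAST or FASTA report.

```python
def expected_chance_hits(query_length, database_residues, alphabet=20):
    """Approximate number of exact matches expected by chance alone.

    Assumes uniform, independent residue frequencies (a simplification).
    """
    return database_residues * (1.0 / alphabet) ** query_length

# Hypothetical database of 500 million residues.
DATABASE_RESIDUES = 5e8

for length in (5, 8, 10, 15):
    hits = expected_chance_hits(length, DATABASE_RESIDUES)
    print(f"{length:2d} residues: ~{hits:.2g} chance matches")
```

Under these assumptions a 5-residue call matches by chance more than a hundred times in such a database, while a 10-residue call is expected to match by chance far less than once.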
Reasons for database searching
Determine homology to other sequences in the database
- need enough sequence to establish a relation
Sort multiple Edman assignments
- multiple amino acids per cycle
Statistical based search algorithms
BLAST
- Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990) J. Mol. Biol. 215, 403-410
FASTA
- Lipman, D.J. and Pearson, W.R. (1985) Science 227, 1435-1441
SSEARCH (Smith-Waterman)
- Smith, T.F. and Waterman, M.S. (1981) J. Mol. Biol. 147, 195-197
Text based search algorithms
FINDPATTERN (GCG)
MSPATTERN / MSEDMAN
Peptidesearch
Protein Databases
NCBI nr
SWALL
Swissprot
TrEMBL
Ludwig nr
Owl
PIR
PRF
Factors that influence database searching
Search algorithm (FASTA, BLAST, SSEARCH)
Length of query sequence (>5)
Scoring matrix (PAM#, BLOSUM#)
Gap cost / penalty
Factors that influence database searching
Word size (1-3)
Filtering
Expect (E)
Database
Web-Based Searching
National Center for Biotechnology Information (NCBI)
- www.ncbi.nlm.nih.gov/BLAST/
- online BLAST tutorial
- BLAST searches
- nr database
Web-Based Searching
European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI)
- www.ebi.ac.uk/tools
- FASTA
- BLAST
- SSEARCH
- SWALL database
Database Searching: First Attempt
BLAST default parameters
- filter
- BLOSUM62
- expect 10
- wordsize 3
- database nr
FASTA default parameters
- BLOSUM50
- expect 1
- wordsize (k-tup) 2
- database swall
Search returns no hits
Search parameters too strict
- remove filters
- increase Expect
- use lower PAM or higher BLOSUM matrix
- decrease word size
- At NCBI BLAST can use “nearly identical short sequences” option
• E 20000
• PAM30
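The loosening steps above can be written down as plain data, which makes it easy to record in a report exactly what was changed between a first and second attempt. The dictionary keys are hypothetical labels, not actual BLAST option names; the values are the ones quoted on the slides, and the word size of 2 is an assumed example of "decrease word size".

```python
# First-attempt BLAST settings as listed on the slide (keys are hypothetical labels).
BLAST_DEFAULTS = {
    "filter": True,
    "matrix": "BLOSUM62",
    "expect": 10,
    "word_size": 3,
    "database": "nr",
}

def relax_for_short_query(params):
    """Return a loosened copy of the settings for a short Edman query."""
    relaxed = dict(params)          # leave the original settings untouched
    relaxed.update(
        filter=False,               # remove filters
        matrix="PAM30",             # lower PAM matrix
        expect=20000,               # increase Expect
        word_size=2,                # decrease word size (assumed value)
    )
    return relaxed

second_attempt = relax_for_short_query(BLAST_DEFAULTS)
changed = {k: (BLAST_DEFAULTS[k], v) for k, v in second_attempt.items() if BLAST_DEFAULTS[k] != v}
print(changed)
```

Keeping both parameter sets side by side also covers the "database search parameters" item that the reporting slides recommend including.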
Search returns many hits
Are the hits occurring by random chance?
- parameters not strict enough
• decrease E
• higher PAM or lower BLOSUM
• use filter
Are the hits to a highly conserved sequence?
Need more sequence data
Searching still returns no exact hit
Sequence not in protein database
- search nucleotide database
• TBLASTN, TFASTA
- ESTs
Database searching with multiple amino acids
Amino acids are similar in amount
Cannot assign a major sequence
Use search algorithms that allow multiple entries
Use search results to sort amino acid assignments into protein sequences
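The idea of searching with several candidates per cycle, then using the hit to sort the assignments, can be illustrated with a regular expression. This is only a sketch of text-based pattern matching in general, not the algorithm used by MSPATTERN, Peptidesearch, or FASTF; the per-cycle calls and the three-entry "database" are invented.

```python
import re

# Hypothetical calls: two candidate residues in each of five cycles.
cycle_calls = ["AG", "LV", "EK", "DS", "FY"]

# Hypothetical miniature database.
database = {
    "protA": "MKALEDFTRG",
    "protB": "MSGVKSYQLN",
    "protC": "MTTPLLGAAW",
}

def build_pattern(cycle_calls):
    """One character class per cycle, e.g. [AG][LV]..."""
    return "".join(f"[{''.join(sorted(c))}]" for c in cycle_calls)

def search(cycle_calls, database):
    """Return (entry name, start position, matched sequence) for every hit."""
    pattern = build_pattern(cycle_calls)
    hits = []
    for name, seq in database.items():
        # The lookahead lets overlapping matches be reported as well.
        for m in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, m.start(1), m.group(1)))
    return hits

def remainder(cycle_calls, matched):
    """Residues left in each cycle after removing one matched sequence."""
    return ["".join(sorted(set(c) - {r})) for c, r in zip(cycle_calls, matched)]

hits = search(cycle_calls, database)
print(hits)
print(remainder(cycle_calls, hits[0][2]))
```

Once one hit accounts for a residue in every cycle, the leftover residues form the second sequence, which can be checked against the other hits.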
Web based searching: multiple amino acids
Text based searching
- Protein Prospector (www.prospector.ucsf.edu)
• MSPATTERN
• numerous databases
• M.W. / species filtering
- PepSearch (www.mann.embl-heidelberg.de/GroupPages/PageLink/peptidesearchpage.html)
Web based searching: multiple amino acids
Statistical based searching
- FASTF / FASTF3
• http://fasta.bioch.virginia.edu
• http://www.ebi.ac.uk/fasta33
• numerous databases
Reporting Results: Contents
Whatever the user wants
Raw data (PTH chromatograms, list of all amino acid yields)
Called amino acids in each cycle
- major and minor
- positive or tentative
Pmoles (raw or background subtracted)
Initial yield / repetitive yield
Reporting Results: Contents
Individual cycle comments
Comments on the sequencing run
Assigned sequence/s
Database search parameters and results
Reconcile the sequencer data and database results
Reporting Results: Contents
Sequencer and run conditions
Sample workup
Reporting Results: Styles
Lab designed report forms
Sequence analysis software printouts
Spreadsheets
Word processor
Database programs (Filemaker Pro)
Handwritten report / email
ESRG 2003 Sample User Reports
36 reports received
- 25 lab designed report form
- 6 sequence analysis software printout
- 3 spreadsheet
- 1 email
- 1 copy of handwritten lab notebook page
Information included in ABRF ESRG 2003 user reports
Information included                    %
Sample information                      58%
Sample preparation                      17%
Sequencer run conditions                14%
Raw data                                8%
Manually called amino acids             89%
Positive / tentative call distinction   69%
Place for minor calls                   33%
Computer called amino acids             14%
Pmole raw                               44%
Pmole background subtracted             22%
Individual cycle discussion             39%
Assigned sequence                       69%
IY / RY information                     31%
Sequencing run discussion               33%
Edman degradation discussion            17%
Perform database search                 44%
Database search parameters              25%
Copy of database search results         88%
Copy of database protein entry          38%
Database search discussion              50%
ESRG 2003 Report Examples