Influenza Research Database (IRD)
In order to provide comprehensive, consistent annotations for influenza virus sequence data, the IRD team has developed a variety of custom annotation and computational pipelines. This special newsletter issue highlights the variant protein annotation efforts carried out by IRD.
Variant Protein Annotations
In recent years, the influenza community has identified several novel proteins generated from non-canonical translation strategies such as leaky ribosomal scanning (PB1-F2, PB1-N40, PA-N155 and PA-N182), ribosomal frameshift (PA-X) and alternative splicing (M42 and NS3). Anticipating the desire to search and analyze these newly discovered variant proteins, the IRD team developed a custom annotation algorithm that predicts the open reading frames and protein sequences for each of the PB1-N40, PA-N155, PA-N182, PA-X, M42 and NS3 variant proteins based on the presence of experimentally defined sequence features (SOP).
Using this algorithm, the IRD team has annotated all relevant influenza segment sequences with the variant proteins that are predicted to be present. As of March 2017, over 93% of complete genome strains in IRD have predicted PB1-N40, PA-N155, PA-N182 and PA-X (in three variant forms: +41, +61 or other) proteins (Table 1). M42 and NS3 arise from alternative splicing that is very rare and subject to strict requirements, and are therefore found in only 0.2% and 0.1% of influenza strains, respectively. These annotations can be viewed on the Strain Details page (Figure 1).
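To make the idea of predicting an N-terminally truncated variant concrete, here is a minimal Python sketch. It is not the IRD algorithm, which relies on experimentally defined sequence features described in its SOP. It only assumes that a leaky-scanning product such as PA-N155 begins at an in-frame ATG at the codon position its name suggests, and the helper names (`translate`, `truncated_variant`, `codon_start_nt`) are hypothetical. Under that assumption codon 155 starts at nucleotide 463, which is consistent with the "463" in the accession IRD_528916889_463_2151 shown for PA-N155 protein(562), if that number denotes a start coordinate.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate an in-frame DNA coding sequence up to the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")  # X marks an ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def codon_start_nt(codon):
    """1-based nucleotide position at which a 1-based codon begins."""
    return 3 * (codon - 1) + 1


def truncated_variant(cds, start_codon):
    """Protein starting at the given in-frame codon, or None if it is not ATG.

    Hypothetical helper for illustration only; the real annotation
    pipeline checks additional sequence features.
    """
    offset = codon_start_nt(start_codon) - 1
    if cds[offset:offset + 3].upper() != "ATG":
        return None
    return translate(cds[offset:])


# Toy sequence (not influenza data): codons ATG GCT ATG AAA TAA.
toy = "ATGGCTATGAAATAA"
print(truncated_variant(toy, 3))  # MK
print(codon_start_nt(155))        # 463
```

With a real PA coding sequence, `truncated_variant(pa_cds, 155)` and `truncated_variant(pa_cds, 182)` would give candidate PA-N155 and PA-N182 sequences under the same assumption.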
Search for Variant Proteins
These predicted sequences can be retrieved from the Nucleotide Sequence Search and Protein Sequence Search pages (Figure 2), transferred to any of the IRD analysis tools (Figure 3), and downloaded.
Outreach Events
• May 29, 2017: IRD/ViPR workshop, Erasmus University Medical Center, Netherlands
• June 24-28, 2017: American Society for Virology Annual Meeting, Madison, Wisconsin
Figure 2. The Protein Sequence Search page supports queries based on ‘classical proteins’, ‘variant proteins’ and sequence-associated metadata.
[Figure 1 screenshot, captured 3/10/2017: Strain Details page for A/northern shoveler/Minnesota/Sg-00651/2008 (H1N1), listing the PA polymerase (acidic) protein together with the predicted PA-N155 protein(562) and PA-N182 protein(535). The PA record shows 147 predicted epitopes across MHC supertypes (A3 32, A2 23, A24 48, B7 10, B44 34), three other proteins with an identical sequence (CY042759, CY042775, KF424249), Gene Ontology terms from UniProtKB (GO:0006351, GO:0003723, GO:0003968) and cross-references to InterPro IPR001009 and Pfam PF00603. The PA-N155 record shows Gene Symbol PA-N155, UniProtKB Accession N/A, IRD Protein Accession IRD_528916889_463_2151.1, IRD Protein GI IRD_528916889_463_2151, Source IRD, the Pfam domain PF00603 (Flu_PA, residues 1-561), a low-complexity region at residues 41-55 (seg) and predicted epitopes by MHC supertype (A2 18, A3 22, A24 36, B7 10, B44 26).]
[Figure 3 screenshot: Protein Sequence Search Results page for a PA-X query, returning 39,757 proteins at 20 records per page (page 59 of 1988), sorted by strain name in ascending order. Columns include Name, Sequence Accession, Complete Genome, Segment, Segment Length, Subtype, Collection Date, Host Species, Country, State/Province, Flu Season and Strain Name. All rows shown are segment 3 records annotated as PA-X protein(+41), PA-X protein(+61) or PA-X protein(other); for example CY181574 (A/Anhui/DEWH7208/2013, H7N9) is PA-X protein(+61), CY071593 (A/Ankara/WR1429T/2009, H1N1) is PA-X protein(+41) and CY073079 (A/Ankara/WRAIR1428T/2009, H1N1) is PA-X protein(other). The page offers Add to Working Set, Save Search, Download and a Run Analysis menu.]
Figure 1. Variant protein annotations are displayed on the Strain Details page
Figure 3. A portion of the Protein Sequence Search Results page from a query for PA-X, showing annotations of the three different PA-X variants: PA-X (+41), PA-X (+61) and PA-X (other). Selected records from this page can be input to any of the analysis tools under the ‘Run Analysis’ dropdown menu (red arrow), or downloaded to a local computer.
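The three PA-X labels in Figure 3 can be read as a classification by the length of the frameshifted extension. The sketch below illustrates that reading in Python. It assumes, from the published description of PA-X rather than from this newsletter, that PA-X shares its first 191 residues with PA and that "+41" and "+61" give the number of residues added by the frameshifted reading frame. The function name and the constant are hypothetical, and IRD's actual labelling rules may differ.

```python
# Residues PA-X shares with PA before the frameshift.
# Assumption taken from the literature, not from the newsletter.
PA_SHARED_N_TERMINUS = 191


def classify_pa_x(protein_length):
    """Label a PA-X protein by the length of its frameshifted extension.

    Hypothetical helper: anything that is not a 41- or 61-residue
    extension is reported as 'other'.
    """
    extension = protein_length - PA_SHARED_N_TERMINUS
    if extension == 41:
        return "PA-X protein(+41)"
    if extension == 61:
        return "PA-X protein(+61)"
    return "PA-X protein(other)"


for length in (232, 252, 233):
    print(length, classify_pa_x(length))
```

Classifying on the extension length rather than the full protein length keeps the rule tied to the feature the labels name.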
Release Date: Mar 16, 2017
This project is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN272201400028C and is a collaboration between Northrop Grumman Health IT, J. Craig Venter Institute, and Vecna Technologies.
[Figure 2 screenshot, captured 3/24/17: Protein Sequence Search form at www.fludb.org. Filters shown: data to return (segment/nucleotide, protein, strain); virus type (A, B, C, provisional Influenza D, PMID:24595369); subtype and strain name (comma-separated entries, e.g. H1N1, H7, H3N2); sequence completeness (include partial sequences, complete segments only, complete genomes only); 2009 pH1N1 sequences (include, include only, exclude); 'classical' proteins by segment (1 PB2, 2 PB1, 3 PA, 4 HA, 5 NP, 6 NA, 7 M1, 7 M2, 8 NS1, 8 NS2); 'variant' proteins by segment (2 PB1-F2, 2 PB1-N40, 3 PA-N155, 3 PA-N182, 3 PA-X, 7 M42, 8 NS3); date range (year, with month available under Advanced Options); host; avian host species; geographic grouping; country; USA state. The example query reports 8,360 matching results.]
The March 2017 release of IRD is now available; visit www.fludb.org.
Variant Protein        Variant Protein from Complete Genomes   Percentage*   Source
PB1-F2                 22585                                   70.9%         GenBank
PB1-N40                31509                                   99.0%         IRD
PA-N155                31650                                   99.1%         IRD
PA-N182                29755                                   93.1%         IRD
PA-X                   31649                                   99.1%         IRD
PA-X protein(+41)      9178                                    28.7%         GenBank & IRD
PA-X protein(+61)      22392                                   70.1%         GenBank & IRD
PA-X protein(other)    79                                      0.2%          GenBank & IRD
PA-X protein           2837                                    8.9%          GenBank
M42                    76                                      0.2%          IRD
NS3                    38                                      0.1%          IRD
Table 1. Variant protein annotations in IRD as of March 10, 2017
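As a quick sanity check on Table 1, the counts for the three PA-X forms can be added up and compared with the PA-X total. The short Python sketch below does only that arithmetic, using the counts copied from the table; the variable names are illustrative.

```python
# Counts copied from Table 1 (complete genomes, March 10, 2017).
PA_X_TOTAL = 31649
PA_X_FORMS = {
    "PA-X protein(+41)": 9178,
    "PA-X protein(+61)": 22392,
    "PA-X protein(other)": 79,
}

# Sum of the three forms, to compare against the PA-X row.
forms_sum = sum(PA_X_FORMS.values())
print(forms_sum, forms_sum == PA_X_TOTAL)

# Share of each form among all predicted PA-X proteins.
for name, count in PA_X_FORMS.items():
    print(f"{name}: {100 * count / PA_X_TOTAL:.1f}% of PA-X")
```

The three forms account for every PA-X protein counted in the table, so the "other" category is the remainder after +41 and +61 are assigned.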
New Features in IRD
Influenza Research Database (IRD)
March 2017