BioinfRes SoSe 16 (summer semester 2016): Bioinformatics Resources - GenBank - Lecture & Exercises (29 April 2016)

Prof. B. Rost, Dr. L. Richter, J. Reeb, Institut für Informatik I12

  • National Center for Biotechnology Information (NCBI)

    (image: http://nihrecord.nih.gov/newsletters/2013/07_19_2013/images/milestonesPic6.jpg)

    ●  first ideas in the mid-1980s

    ●  division of the National Library of Medicine (NLM) within the National Institutes of Health (NIH)

    ●  political mission

    ●  founded in 1988

    ●  director: David Lipman

  • NCBI's political mission as defined by the bill:

    1.  design, develop, implement, and manage automated systems for the collection, storage, retrieval, analysis, and dissemination of knowledge concerning human molecular biology, biochemistry, and genetics;

    2.  perform research into advanced methods of computer-based information processing capable of representing and analyzing the vast number of biologically important molecules and compounds;

    3.  enable persons engaged in biotechnology research and medical care to use systems developed under paragraph (1) and methods described in paragraph (2); and

    4.  coordinate, as much as is practicable, efforts to gather biotechnology information on an international basis.

  • Selected NCBI Accomplishments

    (timeline figure) Milestones shown: BLAST, GenBank at NCBI, NCBI website, Genomes, OMIM, PubMed, Human Genome, PubMed Central, Entrez Gene/DTDs, NIH Public Access, Genome Reference Consortium, 1000 Genomes Project. Years marked on the timeline: 1990, 1992, 1994, 1995, 1996, 1997, 1999, 2000, 2003, 2005, 2007, 2008.

  • NCBI Resources

    ●  NCBI currently hosts a vast number of resources: http://www.ncbi.nlm.nih.gov/guide/all/

    ●  grouped according to various criteria:
      -  metadata, project-centric
      -  method-oriented
      -  topic-oriented

    ●  sorted into the sections: databases, downloads, submissions, tools, how-tos

  • GenBank's Origin

    ●  Walter Goad, Los Alamos National Laboratory

    ●  Los Alamos Sequence Database, 1979

    ●  creation and release of GenBank in 1982

    ●  end of 1982: 2000 sequences

    ●  move to NCBI in 1992

    (image: http://www.lanl.gov/science-innovation/features/innovations/images/light/thumbnails/21.jpg)

  • Minutes from the 20th Anniversary of GenBank in 2002

    "... Among them is a memo on Los Alamos National Laboratory stationery dated May 9, 1980, that reads: Monday, May 12 at 10:30 Steve Simon invites you for cake and coffee to celebrate 100,000 bases now in the DNA sequence library."

    taken from https://www.genomeweb.com/genbank-turns-20

  • Growth of GenBank and WGS

    -  doubling approx. every 18 months; diagram for release 207, Apr. 2015
    -  current version: release 213, Apr. 2016: 211.423.912.047 bases in GenBank, 1.452.207.704.949 bases in WGS
    -  taken from http://www.ncbi.nlm.nih.gov/genbank/statistics
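A short worked conversion, assuming purely exponential growth: if the number of bases N doubles every 18 months, the growth factor per year is

```latex
N(t) = N_0 \cdot 2^{t/18} \quad (t \text{ in months})
\qquad\Longrightarrow\qquad
\frac{N(t+12)}{N(t)} = 2^{12/18} = 2^{2/3} \approx 1.59
```

i.e. roughly 59% more bases per year.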

  • Growth of GenBank and WGS

    -  current release 213: 193.739.511 sequences in GenBank, 338.922.537 sequences in WGS
    -  taken from http://www.ncbi.nlm.nih.gov/genbank/statistics; diagram for release 207, Apr. 2015

  • References for GenBank

    ●  the current citation source: "GenBank". Nucleic Acids Res. 2014 Jan;42(Database issue):D32-7. doi:10.1093/nar/gkt1030. Epub 2013 Nov 11.

    ●  PMID: 24217914

    ●  part of the International Nucleotide Sequence Database Collaboration (INSDC), together with the EMBL Nucleotide Sequence Database (EMBL-Bank), part of the European Nucleotide Archive (ENA), and the DNA Data Bank of Japan (DDBJ)

  • Fastest-Growing Divisions

    Division | Description | Bases, Release 197 (8/2013) | Annual Increase (%)
    WGS* | Whole-genome shotgun data | 500.420.412.665 | 62.4
    TSA* | Transcriptome shotgun data | 8.6333123.935 | 49.9
    PHG | Phages | 119.812.712 | 42.5
    VRL | Viruses | 1.757.202.472 | 22.9
    BCT | Bacteria | 10.281.048.518 | 21.8
    ENV | Environmental samples | 3.743.277.434 | 10.9
    INV | Invertebrates | 2.737.140.464 | 9.8
    PAT | Patented sequences | 13.290.161.247 | 9.7
    PLN | Plants | 5.963.882.822 | 8.8
    GSS | Genome survey sequences | 23.726.384.753 | 8.1
    VRT | Other vertebrates | 3.068.956.026 | 6.3
    MAM | Other mammals | 911.342.025 | 5.6
    ... | ... | ... | ...
    TOTAL | All GenBank sequences | 654.613.333.676 | 45.1

    * not distributed with the release; available from separate project sections on the server

  • Top Organisms (Rel. 207)

    Organism | Entries | Non-WGS base pairs
    Homo sapiens | 20.921.637 | 17.714.786.437
    Mus musculus | 9.727.522 | 9.995.696.539
    Rattus norvegicus | 2.193.812 | 6.526.236.496
    Bos taurus | 2.227.298 | 5.410.360.312
    Zea mays | 4.177.175 | 5.201.714.457
    Sus scrofa | 3.297.029 | 4.895.127.638
    Danio rerio | 1.727.668 | 3.133.901.682
    Triticum aestivum | 1.796.780 | 1.927.718.314
    ... | ... | ...
    Oryza sativa Japonica Group | 1.376.410 | 1.265.556.227
    ... | ... | ...
    Arabidopsis thaliana | 2.578.785 | 1.202.100.008
    ... | ... | ...

  • Distribution of Sequence Files (Rel. 207, selected divisions)

    Division | Number of files
    BCT | 178
    CON | 317
    ENV | 81
    EST | 478
    HTG | 142
    INV | 126
    PAT | 219
    PLN | 107
    TSA | 175
    VRL | 34

    Release 207 consists of 2333 text files in total.

  • Database Files

    ●  GenBank comes as a set of compressed text files available via FTP

    ●  see ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt

    ●  2333 ASCII files (listed by division, plus additional list files) in the range of 0.7-520 MB

    ●  uncompressed: ~709 GB

    ●  each file consists of two portions

  • Database Files

    ●  Part 1: highly conserved database file header (fixed layout, 79 columns), e.g.:

         GBBCT1.SEQ  Genetic Sequence Data Bank
         April 15 2015
         NCBI-GenBank Flat File Release 207.0
         Bacterial Sequences (Part 1)
         51396 loci, 92682287 bases, from 51396 reported sequences

    ●  Part 2: sequence entries for the division described in the header
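The last header line summarises the content of the file. This is a minimal Python sketch for reading that line. It assumes the three counts always appear in the order and wording of the example (loci, bases, reported sequences) and that the amount of whitespace between fields may vary. `parse_header_stats` is a hypothetical helper name.

```python
import re

# Assumed wording, e.g. "   51396 loci,    92682287 bases, from    51396 reported sequences"
STATS_RE = re.compile(
    r"\s*(?P<loci>\d+) loci,\s+(?P<bases>\d+) bases,\s+from\s+(?P<seqs>\d+) reported sequences"
)

def parse_header_stats(line: str) -> dict:
    """Return the loci, base and sequence counts from a division file header line."""
    match = STATS_RE.match(line)
    if match is None:
        raise ValueError(f"not a header statistics line: {line!r}")
    return {name: int(value) for name, value in match.groupdict().items()}

print(parse_header_stats("   51396 loci,    92682287 bases, from    51396 reported sequences"))
# {'loci': 51396, 'bases': 92682287, 'seqs': 51396}
```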

  • The GenBank Flat File Format

    ●  a sequence entry consists of many records (lines)

    ●  each record consists of two parts:
      -  Part 1: columns 1-10, entry field name
      -  Part 2: the rest of the line, with the content

  • Part 1/1

    ●  a keyword, beginning in column 1 of the record (e.g., REFERENCE is a keyword)

    ●  a subkeyword beginning in column 3, with columns 1 and 2 blank (e.g., AUTHORS is a subkeyword of REFERENCE)

    ●  or a subkeyword beginning in column 4, with columns 1, 2, and 3 blank (e.g., PUBMED is a subkeyword of REFERENCE)

  • Part 1/2

    ●  blank characters, indicating that this record is a continuation of the information under the keyword or subkeyword above it

    ●  a code, beginning in column 6, indicating the nature of an entry (feature key) in the FEATURES table

  • Part 1/3

    ●  a number, ending in column 9 of the record:
      -  this number occurs in the portion of the entry describing the actual nucleotide sequence and designates the numbering of sequence positions

    ●  two slashes (//) in positions 1 and 2, marking the end of an entry

  • Part 2

    ●  the second part of each sequence entry record contains the information appropriate to its keyword

    ●  in positions 13 to 80 for keywords

    ●  in positions 11 to 80 for the sequence
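The column rules from the Part 1 and Part 2 slides are enough to classify each record of an entry. This is a minimal Python sketch that assumes well-formed lines. The function name `classify_record` and its `(kind, name, content)` return value are illustrative choices, not part of any NCBI tool.

```python
from typing import Tuple

def classify_record(line: str) -> Tuple[str, str, str]:
    """Return (kind, name, content) for one flat-file record, based on its first part."""
    if line.startswith("//"):                          # end of entry, columns 1-2
        return ("end", "//", "")
    first = len(line) - len(line.lstrip(" ")) + 1      # 1-based column of first non-blank
    if first == 1:                                     # keyword in column 1
        return ("keyword", line[:10].strip(), line[12:].rstrip())
    number = line[:9].strip()
    if number.isdigit() and line[9:10] in ("", " "):   # number ending in column 9
        return ("sequence", number, line[10:].rstrip())    # bases from column 11
    if first in (3, 4):                                # subkeyword in column 3 or 4
        return ("subkeyword", line[:10].strip(), line[12:].rstrip())
    if first == 6:                                     # feature key in column 6
        return ("feature", line[5:21].strip(), line[21:].rstrip())
    return ("continuation", "", line.strip())          # blank first part
```

The sketch tests for the sequence number before it tests for subkeywords. This order matters because a six-digit position number such as 100001 starts in column 4 and would otherwise be mistaken for a subkeyword.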

  • Entry Field Types (incomplete)

    ●  Locus: a short mnemonic name for the entry, chosen to suggest the sequence's definition; mandatory keyword / exactly one record

    ●  Definition: a concise description of the sequence; mandatory keyword / one or more records

    ●  Accession:
      -  the primary accession number is a unique, unchanging identifier assigned to each GenBank sequence record
      -  to be used when citing GenBank records
      -  mandatory keyword / one or more records

  • Entry Field Types (incomplete)

    ●  Version:
      -  compound identifier consisting of the primary accession number and a numeric version number associated with the current version of the sequence data in the record
      -  optionally followed by an integer identifier (a "GI") assigned to the sequence by NCBI
      -  mandatory keyword / exactly one record
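This sketch splits a VERSION record into the accession number, the version number and the optional GI. It assumes the record layout described here, for example `VERSION     U49845.1  GI:1293613` (an illustrative line). `parse_version` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Assumed layout: keyword, accession.version, optionally "GI:<integer>"
VERSION_RE = re.compile(
    r"VERSION\s+(?P<accession>\S+?)\.(?P<version>\d+)(?:\s+GI:(?P<gi>\d+))?\s*$"
)

def parse_version(line: str) -> Tuple[str, int, Optional[int]]:
    """Return (accession, version, gi) from a VERSION record; gi is None if absent."""
    match = VERSION_RE.match(line)
    if match is None:
        raise ValueError(f"not a VERSION record: {line!r}")
    gi = match.group("gi")
    return match.group("accession"), int(match.group("version")), int(gi) if gi else None
```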

  • Entry Field Types (incomplete)

    ●  DBLINK: provides cross-references to resources that support the existence of a sequence record; optional keyword / one or more records

    ●  Keywords: short phrases describing gene products and other information about an entry; mandatory keyword in all annotated entries / one or more records

  • Entry Field Types (incomplete)

    ●  Source: common name of the organism or the name most frequently used in the literature; mandatory keyword in all annotated entries / one or more records / includes one subkeyword

    ●  Organism: formal scientific name of the organism (first line) and taxonomic classification levels (second and subsequent lines); mandatory subkeyword in all annotated entries / two or more records

  • Entry Field Types (incomplete)

    ●  Reference:
      -  citations for all articles containing data reported in this entry
      -  includes seven subkeywords and may repeat
      -  mandatory keyword / one or more records

    ●  Journal: lists the journal name, volume, year, and page numbers of the citation; mandatory subkeyword / one or more records

    ●  optional subkeywords: Authors, Consortium, Title, Medline, Pubmed, Remark

  • Entry Field Types (incomplete)

    ●  Features: table containing information on portions of the sequence that code for proteins and RNA molecules, and on sites of biological significance; optional keyword / one or more records

    ●  Origin:
      -  specification of how the first base of the reported sequence is operationally located within the genome
      -  mandatory keyword / exactly one record
      -  followed by sequence data (multiple records)

    ●  //: entry termination symbol; mandatory at the end of an entry / exactly one record
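This is a minimal sketch that uses the `//` terminator and the ORIGIN record together. It assumes an uncompressed division file that has been read into one string, and entries that each start with a LOCUS record. `split_entries` and `origin_sequence` are hypothetical helper names.

```python
from typing import List

def split_entries(text: str) -> List[List[str]]:
    """Collect the lines of each entry, from its LOCUS record up to (excluding) '//'."""
    entries: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            current = [line]                 # new entry; file header lines before it are skipped
        elif line.startswith("//"):
            if current:
                entries.append(current)      # '//' terminates the entry
            current = []
        elif current:
            current.append(line)
    return entries

def origin_sequence(entry: List[str]) -> str:
    """Concatenate the bases of the sequence records that follow ORIGIN."""
    bases: List[str] = []
    in_sequence = False
    for line in entry:
        if line.startswith("ORIGIN"):
            in_sequence = True
        elif in_sequence:
            # columns 1-9: position number; columns 11-80: bases, separated by blanks
            bases.append(line[10:].replace(" ", ""))
    return "".join(bases)
```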

  • Detailed LOCUS Format

    Columns  Contents

    01-05 'LOCUS'

    06-12 spaces

    13-28 Locus name

    29-29 space

    30-40 Length of sequence, right-justified

    41-41 space

    42-43 bp

    44-44 space

    45-47 spaces, ss- (single-stranded), ds- (double-stranded), or ms- (mixed-stranded)

    48-53 NA, DNA, RNA, tRNA (transfer RNA), rRNA (ribosomal RNA), mRNA (messenger RNA), uRNA (small nuclear RNA), left justified

    54-55 space

    56-63 'linear' followed by two spaces, or 'circular'

    64-64 space

    65-67 The division code

    68-68 space

    69-79 Date, in the form dd-MMM-yyyy (e.g., 15-MAR-1991)
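The column table translates directly into string slicing. This is a minimal Python sketch. It assumes the strictly fixed columns listed on the slide. In practice some LOCUS lines may not follow these positions exactly, so whitespace-based splitting can be more robust. `parse_locus` and its dictionary keys are illustrative names.

```python
def col(line: str, first: int, last: int) -> str:
    """Return the text in 1-based, inclusive columns first..last."""
    return line[first - 1:last]

def parse_locus(line: str) -> dict:
    """Split a LOCUS line by the fixed column positions of the LOCUS format."""
    if col(line, 1, 5) != "LOCUS":
        raise ValueError("not a LOCUS line")
    return {
        "name": col(line, 13, 28).strip(),
        "length": int(col(line, 30, 40)),            # right-justified number
        "unit": col(line, 42, 43),                   # 'bp'
        "strandedness": col(line, 45, 47).strip(),   # '', 'ss-', 'ds-' or 'ms-'
        "molecule": col(line, 48, 53).strip(),
        "topology": col(line, 56, 63).strip(),
        "division": col(line, 65, 67),
        "date": col(line, 69, 79),
    }

# Illustrative line, assembled field by field so that every value sits in its column
example = ("LOCUS" + " " * 7 + "SCU49845".ljust(16) + " " + "5028".rjust(11)
           + " bp " + "   " + "DNA".ljust(6) + "  " + "linear".ljust(8)
           + " PLN " + "21-JUN-1999")
print(parse_locus(example)["length"])  # 5028
```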

  • Accession Format

    ●  six or eight characters

    ●  six-character format:
      -  a single uppercase letter
      -  5 digits

    ●  eight-character format:
      -  two uppercase letters
      -  6 digits

    ●  if the ACCESSION record lists several numbers, the primary accession number is always the first one
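A regular expression can check the two formats on this slide. This Python sketch covers only those two formats. It does not match other INSDC accession formats (e.g. those used for WGS projects) or RefSeq accessions that contain "_". The helper names are hypothetical, and the accession strings below are only format examples.

```python
import re
from typing import List, Tuple

# One uppercase letter + 5 digits, or two uppercase letters + 6 digits
ACCESSION_RE = re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")

def is_accession(token: str) -> bool:
    """True if token has one of the two accession formats described above."""
    return ACCESSION_RE.fullmatch(token) is not None

def split_accession_record(content: str) -> Tuple[str, List[str]]:
    """Return (primary, secondary numbers) from the content of an ACCESSION record."""
    numbers = content.split()
    return numbers[0], numbers[1:]
```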

  • Features (incomplete)

    ●  authoritative source: http://www.insdc.org/documents/feature-table

    ●  the feature table:
      -  contains information about genes and gene products
      -  contains information about regions of biological significance
      -  can enumerate differences between various reports
      -  provides cross-references to other data collections
      -  allows hierarchical relations between the features

  • Layout

    ●  the first line of the feature table is a header

    ●  it includes the keyword 'FEATURES' and the column header 'Location/Qualifiers'

    ●  each feature consists of:
      -  a descriptor line containing a feature key and a location
      -  a continuation line for the location may follow
      -  feature qualifiers may follow the descriptor line
      -  key: columns 6-20; the location starts in column 22
      -  qualifiers on subsequent lines, starting at column 22 with a '/'
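The layout rules (key in columns 6-20, location and qualifiers from column 22) are enough to group a feature table into features. This is a simplified Python sketch that assumes well-formed input. It joins location continuation lines directly and joins qualifier continuation lines with a single space. The second choice is a simplification: long /translation values, for example, should be joined without blanks. `parse_feature_table` is a hypothetical name.

```python
from typing import Dict, List

def parse_feature_table(lines: List[str]) -> List[Dict]:
    """Group feature-table lines into features with key, location and qualifiers."""
    features: List[Dict] = []
    for line in lines:
        if line.startswith("FEATURES"):          # header line with 'Location/Qualifiers'
            continue
        key = line[5:20].strip()                  # columns 6-20
        text = line[21:].rstrip()                 # column 22 onwards
        if key:                                   # descriptor line starts a new feature
            features.append({"key": key, "location": text, "qualifiers": []})
        elif text.startswith("/"):                # a new qualifier
            features[-1]["qualifiers"].append(text)
        elif features[-1]["qualifiers"]:          # continuation of a qualifier value
            features[-1]["qualifiers"][-1] += " " + text
        else:                                     # continuation of the location
            features[-1]["location"] += text
    return features
```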

  • A Few Frequent Features

    ●  CDS: sequence coding for amino acids in a protein (includes the stop codon)

    ●  exon: region that codes for part of a spliced mRNA

    ●  gene: region that defines a functional gene, possibly including upstream (promoter, enhancer, etc.) and downstream control elements, and for which a name has been assigned

    ●  mRNA: messenger RNA

    ●  ... currently >60 features

  • Location and Qualifiers

    ●  Location:
      -  a location can be: a single base, a span of bases, a site between two bases, a join of sequences, ...
      -  examples: 23, 23..56, 23^24, join(23..56,87..110)

    ●  Qualifiers:
      -  format: from column 22, /qualifier_name[=value]
      -  types: free text, enumeration or controlled vocabulary, citations, sequences, feature labels
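A few string operations are enough to turn the four example locations into numeric intervals. This is a minimal Python sketch that handles exactly these forms: a single base, a span, a site between two bases, and a join of such parts. It does not cover other location syntax, such as complement(...) or the partial-end markers < and >; those inputs raise ValueError. `parse_location` and the `(start, end, kind)` tuples are illustrative.

```python
import re
from typing import List, Tuple

def parse_location(location: str) -> List[Tuple[int, int, str]]:
    """Turn a simple location into (start, end, kind) tuples; positions are 1-based."""
    match = re.fullmatch(r"join\((.*)\)", location.strip())
    parts = match.group(1).split(",") if match else [location]
    spans: List[Tuple[int, int, str]] = []
    for part in (p.strip() for p in parts):
        if "^" in part:                       # site between two bases, e.g. 23^24
            start, end = part.split("^")
            spans.append((int(start), int(end), "between"))
        elif ".." in part:                    # span of bases, e.g. 23..56
            start, end = part.split("..")
            spans.append((int(start), int(end), "span"))
        else:                                 # single base, e.g. 23
            spans.append((int(part), int(part), "base"))
    return spans
```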

  • Database Cross References (/db_xref)

    ●  http://www.ncbi.nlm.nih.gov/genbank/collab/db_xref/

    ●  Qualifier: /db_xref="database:identifier"

    ●  Definition: database cross-reference; pointer to related information in another database

    ●  Scope: all feature keys

    ●  Example: /db_xref="Swiss-Prot:P12345"

    ●  currently >120 databases available
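This is a minimal sketch for the qualifier format /qualifier_name[=value] and the "database:identifier" convention of /db_xref. It assumes one complete qualifier per string and database names that contain no colon. `parse_qualifier` and `split_db_xref` are hypothetical helpers.

```python
from typing import Optional, Tuple

def parse_qualifier(text: str) -> Tuple[str, Optional[str]]:
    """Split '/qualifier_name[=value]' into (name, value); value is None if absent."""
    if not text.startswith("/"):
        raise ValueError(f"qualifiers start with '/': {text!r}")
    name, sep, value = text[1:].partition("=")
    if not sep:
        return name, None
    if len(value) >= 2 and value[0] == value[-1] == '"':   # strip surrounding quotes
        value = value[1:-1]
    return name, value

def split_db_xref(value: str) -> Tuple[str, str]:
    """Split a db_xref value 'database:identifier' at the first colon."""
    database, sep, identifier = value.partition(":")
    if not sep:
        raise ValueError(f"expected 'database:identifier', got {value!r}")
    return database, identifier
```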

  • Anatomy of a GenBank Flat File

    (figure slides: an annotated example record, highlighting the locus line; the accession number, version and GI number; and the feature table with annotations)

  • Useful Resources from NCBI

    ●  Materials:
      -  Electronic bookshelf
      -  http://www.ncbi.nlm.nih.gov/education/factsheets/
      -  ftp://ftp.ncbi.nih.gov/pub/factsheets/Factsheet_Books.pdf
      -  NCBI manuals
      -  textbooks

  • Useful Resources from NCBI

    ●  Processes, e.g. the Prokaryotic Genome Annotation Pipeline:
      -  designed for bacterial and archaeal genomes
      -  multi-level process, including prediction of protein-coding genes and of functional genome units such as rRNAs, tRNAs, small RNAs, pseudogenes, control regions, repeats, insertion elements, etc.
      -  combination of ab initio prediction and homology-based methods

  • Useful Resources from NCBI

    ●  reference database: RefSeq, http://www.ncbi.nlm.nih.gov/refseq/

    ●  comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins

    ●  stable reference for genome annotation, esp. the RefSeqGene subset

    ●  reference sequences

    ●  reference coordinates

    ●  accessible via BLAST, Entrez and FTP

  • RefSeq

    ●  created by:
      -  Eukaryotic Genome Annotation Pipeline
      -  Prokaryotic Genome Annotation Pipeline
      -  manual curation
      -  submissions to INSDC members

    ●  reflects current knowledge of sequence data and biology

    ●  consistent format

    ●  the accession number contains an "_" (e.g., NM_000546)

  • RefSeq Growth (figure)

  • Databases Accessible via Entrez

    http://www.ncbi.nlm.nih.gov/gquery/

  • Computation: BLAST at NCBI

  • Searching NCBI/Entrez

    ●  provides an integrated search interface to the different NCBI databases: Entrez Programming Utilities (E-utilities)

    ●  base URL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/

    ●  >40 databases

    ●  stable interface of nine server-side programs

    ●  http://www.ncbi.nlm.nih.gov/books/NBK25501/

  • Entrez Guidelines

    ●  if you use the E-utilities against the guidelines, you might be banned!

    ●  more than 100 requests: run them on weekends or outside US peak times (9 pm-5 am EST)

    ●  no more than 3 requests per second

    ●  provide an email address and a tool name: &tool=<name>&email=<address>

    ●  registering the email address and tool name with NCBI may relax these restrictions

    ●  supported by Biopython
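The rule of no more than three requests per second can be enforced on the client side. This is a minimal Python sketch of a throttle. `RateLimiter` is a hypothetical helper. The clock and sleep functions can be replaced only so that the behaviour can be checked without actually waiting.

```python
import time
from typing import Callable, Optional

class RateLimiter:
    """Enforce a minimum interval between calls (default: at most 3 per second)."""

    def __init__(self, max_per_second: float = 3.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.last: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request is allowed, then record its start time."""
        now = self.clock()
        if self.last is not None:
            delay = self.min_interval - (now - self.last)
            if delay > 0:
                self.sleep(delay)
                now += delay
        self.last = now
```

Call `limiter.wait()` immediately before every E-utility request.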

  • Constructing URLs

    ●  parameter names are lower case (&lowercasename); exception: &WebEnv

    ●  no required order

    ●  null values and inappropriate parameters are generally ignored

    ●  no spaces; use + instead

    ●  use URL encoding for special characters, e.g. %22 for ", %23 for #, %40 for @
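Python's standard library already applies the encoding rules listed on this slide. This is a minimal sketch that assumes the HTTPS form of the base URL. NCBI serves the E-utilities over HTTPS; the slide shows the older http:// address. `eutils_url` is a hypothetical helper name, and the tool and email values are placeholders.

```python
from urllib.parse import urlencode

# Assumption: HTTPS version of the base URL shown on the slides
BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def eutils_url(program: str, **params: object) -> str:
    """Build an E-utility URL; parameters whose value is None are left out."""
    query = {name: value for name, value in params.items() if value is not None}
    # urlencode uses quote_plus: space -> '+', '"' -> %22, '#' -> %23, '@' -> %40
    return f"{BASE_URL}{program}?{urlencode(query)}"

url = eutils_url("esearch.fcgi", db="pubmed", term='"gene expression"',
                 tool="my_tool", email="me@example.org", retmax=None)
print(url)
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22gene+expression%22&tool=my_tool&email=me%40example.org
```

Keyword arguments keep their spelling, so the one mixed-case parameter can be passed as `WebEnv=...`. The same helper works for the other programs, e.g. `eutils_url("efetch.fcgi", db="nuccore", id="<uid>", rettype="gb", retmode="text")` for GenBank flat files.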

  • E-utilities

    ●  EInfo

    ●  ESearch

    ●  EPost

    ●  ESummary

    ●  EFetch

    ●  ELink

    ●  EGQuery

    ●  ESpell

    ●  ECitMatch

  • ESearch

    ●  text search: eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi

    ●  responds to a text query with the list of matching UIDs in a given database (for later use in ESummary, EFetch or ELink), along with the term translations of the query
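ESearch returns XML by default. This is a minimal sketch that reads the matching UIDs and the query translation with Python's `xml.etree.ElementTree`. The response below is a hypothetical, shortened example, not real NCBI output. Count can be larger than the number of returned IDs, because &retmax (20 by default) limits how many IDs are returned. `parse_esearch` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened ESearch response (real responses contain more elements)
SAMPLE_ESEARCH = """<eSearchResult>
  <Count>2</Count><RetMax>2</RetMax><RetStart>0</RetStart>
  <IdList><Id>111111</Id><Id>222222</Id></IdList>
  <QueryTranslation>"gene expression"[All Fields]</QueryTranslation>
</eSearchResult>"""

def parse_esearch(xml_text: str) -> dict:
    """Extract the hit count, the returned UIDs and the query translation."""
    root = ET.fromstring(xml_text)
    return {
        "count": int(root.findtext("Count", default="0")),
        "ids": [element.text for element in root.findall("IdList/Id")],
        "translation": root.findtext("QueryTranslation"),
    }
```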

  • ESummary

    ●  document summary downloads: eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi

    ●  responds to a list of UIDs from a given database with the corresponding document summaries

  • EGQuery

    ●  global query: eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi

    ●  responds to a text query with the number of records matching the query in each Entrez database

  • EInfo

    ●  database statistics: eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi

    ●  provides the number of records indexed in each field of a given database, the date of the last update of the database, and the available links from the database to other Entrez databases

    ●  without &db: lists all available databases

  • EFetch

    ●  data record downloads: eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi

    ●  responds to a list of UIDs in a given database with the corresponding data records in a specified format

  • ELink

    ●  Entrez links: eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi

    ●  responds to a list of UIDs in a given database with either a list of related UIDs (and relevancy scores) in the same database or a list of linked UIDs in another Entrez database

  • ELink

    ●  checks for the existence of a specified link from a list of one or more UIDs

    ●  creates a hyperlink to the primary LinkOut provider for a specific UID and database, or lists LinkOut URLs and attributes for multiple UIDs

  • EPost

    ●  UID uploads: eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi

    ●  accepts a list of UIDs from a given database, stores the set on the History Server, and responds with a query key and web environment for the uploaded dataset

  • ESpell

    ●  spelling suggestions: eutils.ncbi.nlm.nih.gov/entrez/eutils/espell.fcgi

    ●  retrieves spelling suggestions for a text query in a given database

  • ECitMatch

    ●  batch citation searching in PubMed: eutils.ncbi.nlm.nih.gov/entrez/eutils/ecitmatch.cgi

    ●  retrieves PubMed IDs (PMIDs) corresponding to a set of input citation strings

  • Identifiers

    ●  records are identified by an integer ID called UID

    ●  UIDs are database-specific, e.g. GI numbers, PMIDs, MMDB-IDs

    ●  UIDs serve as both input and output

    ●  especially useful in combination with the History Server

    ●  a full description of parameters and syntax can be found at: http://www.ncbi.nlm.nih.gov/books/NBK25499/

  • Selected UIDs

    Entrez database | UID common name | E-utility database name
    Books | Book ID | books
    Conserved Domains | PSSM-ID | cdd
    dbVar | dbVar ID | dbvar
    EST | GI number | nucest
    Gene | Gene ID | gene
    Genome | Genome ID | genome
    MeSH | MeSH ID | mesh
    NCBI Web Site | Web Site ID | ncbisearch
    Nucleotide | GI number | nuccore
    PubMed | PMID | pubmed
    ... | ... | ...

  • Entrez Core Engine

    ●  EGQuery, ESearch, and ESummary

    ●  two tasks:
      -  assemble a list of UIDs that match a text query (ESearch)
      -  retrieve a brief summary record, called a Document Summary (DocSum), for each UID (ESummary)

    ●  EGQuery: global version of ESearch

    ●  esearch.fcgi?db=database&term=query
       esummary.fcgi?db=database&id=uid1,uid2,uid3,...

    ●  can be expanded into more complicated Entrez queries

  • Entrez Databases (EInfo, EFetch, and ELink)

    ●  EInfo:
      -  provides detailed information about each database
      -  including lists of the indexing fields in the database
      -  and available links to other Entrez databases

  • Entrez Databases (EInfo, EFetch, and ELink)

    ●  added value to the raw data:
      -  supports a variety of display formats: EFetch returns UID lists in XML and plain text (&retmode) for all databases; other formats (&rettype) are database-specific
      -  http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/?report=objectonly
      -  efetch.fcgi?db=database&id=uid1,uid2,uid3&rettype=report_type&retmode=data_mode

  • Entrez Databases (EInfo, EFetch, and ELink)

    ●  added value to the raw data:
      -  links to records in other Entrez databases, manifested as lists of associated UIDs
      -  UIDs must be valid in the source database (&dbfrom)
      -  elink.fcgi?dbfrom=protein&db=gene&id=15718680,157427902

  • Entrez History Server

    ●  simple: in the GUI, accessible via the respective tabs

    ●  you can temporarily store sets of UIDs as input for later queries through other tools

    ●  each list of UIDs is specified by:
      -  &query_key (integer label)
      -  &WebEnv (cookie string)

  • Creation of a Stored UID List

    ●  EPost:
      -  EPost can be used to upload a UID list
      -  returns &query_key and &WebEnv

    ●  ESearch:
      -  stores the results if given &usehistory=y

    ●  ELink:
      -  stores the results if given &cmd=neighbor_history

  • Usage of Stored UID Lists

    ●  use of stored lists: esummary.fcgi?db=database&WebEnv=webenv&query_key=key

    ●  one web environment can hold multiple result lists

    ●  lists in the same web environment can be combined with AND, OR, NOT

    ●  by default, every call creates a new environment

    ●  -> pass &WebEnv in subsequent calls to store the lists in the same web environment
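This is a sketch of the History Server workflow using only the standard library. It reads WebEnv and QueryKey from an ESearch response made with &usehistory=y, reuses them in ESummary, and combines two stored lists of the same web environment. Referring to stored lists inside &term as #1, #2 (URL-encoded as %231, %232) follows the E-utilities documentation. The response string and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Tuple
from urllib.parse import urlencode

BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Hypothetical ESearch response to a request made with &usehistory=y
SAMPLE_RESPONSE = ("<eSearchResult><Count>1</Count><RetMax>1</RetMax><RetStart>0</RetStart>"
                   "<QueryKey>1</QueryKey><WebEnv>MCID_example</WebEnv>"
                   "<IdList><Id>111111</Id></IdList></eSearchResult>")

def history_handle(esearch_xml: str) -> Tuple[str, str]:
    """Return (WebEnv, query_key) from an ESearch response."""
    root = ET.fromstring(esearch_xml)
    return root.findtext("WebEnv"), root.findtext("QueryKey")

def esummary_from_history(db: str, webenv: str, query_key: str) -> str:
    """ESummary URL for a UID list stored on the History Server."""
    return BASE_URL + "esummary.fcgi?" + urlencode(
        {"db": db, "WebEnv": webenv, "query_key": query_key})

def combine_stored_lists(db: str, webenv: str, key_a: str, key_b: str, op: str = "AND") -> str:
    """ESearch URL combining two stored lists (#a op #b) in the same web environment."""
    return BASE_URL + "esearch.fcgi?" + urlencode(
        {"db": db, "term": f"#{key_a} {op} #{key_b}", "WebEnv": webenv, "usehistory": "y"})
```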

  • Sketching Pipelines

    ●  get DocSummaries or entries for keywords or IDs:
      -  ESearch -> ESummary/EFetch
      -  EPost -> ESummary/EFetch

    ●  filter/limit a record set:
      -  EPost/ELink -> ESearch

    ●  more advanced queries:
      -  ESearch -> ELink -> ESummary/EFetch
      -  EPost -> ELink -> ESearch -> EFetch
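The first pipeline (ESearch -> EFetch via the History Server) fits in a compact Python sketch. The HTTP call is passed in as a parameter, so the logic can be tried without network access. The tool and email defaults are placeholders that you must replace. The function name is hypothetical, and large result sets may additionally need paging with &retstart/&retmax.

```python
import xml.etree.ElementTree as ET
from typing import Callable
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def http_get(url: str) -> str:
    """Download a URL and decode it as UTF-8 text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")

def esearch_then_efetch(db: str, term: str, rettype: str, retmode: str = "text",
                        get: Callable[[str], str] = http_get,
                        tool: str = "my_tool", email: str = "me@example.org") -> str:
    """ESearch with usehistory=y, then EFetch the stored UID list."""
    common = {"tool": tool, "email": email}   # placeholders: use your own values
    search_url = BASE_URL + "esearch.fcgi?" + urlencode(
        {"db": db, "term": term, "usehistory": "y", **common})
    root = ET.fromstring(get(search_url))
    webenv, query_key = root.findtext("WebEnv"), root.findtext("QueryKey")
    if not webenv or not query_key:
        raise RuntimeError("ESearch response contains no History Server handle")
    fetch_url = BASE_URL + "efetch.fcgi?" + urlencode(
        {"db": db, "WebEnv": webenv, "query_key": query_key,
         "rettype": rettype, "retmode": retmode, **common})
    return get(fetch_url)
```

With the default `http_get`, a call such as `esearch_then_efetch("nuccore", "U49845[accn]", "gb")` should return GenBank flat files for the matching records. When chaining many such calls, keep to at most three requests per second.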

  • E-utility Webinar

    ●  https://www.youtube.com/watch?v=iCFVVexp30o