Basics on bioinformatics

Lecture 2

Nunzio D’Agostino

nunzio.dagostino@entecra.it; nunzio.dagostino@gmail.com


Database or databank?

Initially

o Databank (UK)

o Database (USA)

Solution

The abbreviation db


Entity-Relationship (ER) modeling

Notation uses three main constructs:

o Data entities

An entity represents a set or collection of real-world objects that share the same properties: a person, place, object, event or concept about which data is to be maintained.

o Attributes

A named property or characteristic of an entity.

o Relationships

An association between the instances of one or more entity types.

By connectivity, relationships can be classified as:

one-to-one (1:1)

one-to-many (1:N)

many-to-many (N:N)


Cardinality

1 : 1

1 : N

N : M

ER example
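The ER diagram shown on this slide did not survive extraction. As a stand-in, the following is a minimal sketch of how entities, attributes and relationships might map onto relational tables, using Python's built-in sqlite3 module. The entity names (gene, transcript, go_term) and every sample value are hypothetical and chosen only to show a 1:N and an N:M relationship; they are not taken from the lecture.

```python
import sqlite3

# Hypothetical schema: entities become tables, attributes become columns,
# a 1:N relationship becomes a foreign key, and an N:M relationship
# is resolved through a junction table.
SCHEMA = """
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY,
    symbol  TEXT NOT NULL
);
CREATE TABLE transcript (              -- 1:N, one gene has many transcripts
    transcript_id INTEGER PRIMARY KEY,
    gene_id       INTEGER NOT NULL REFERENCES gene(gene_id),
    length        INTEGER
);
CREATE TABLE go_term (
    go_id INTEGER PRIMARY KEY,
    name  TEXT NOT NULL
);
CREATE TABLE gene_go (                 -- N:M, resolved by a junction table
    gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
    go_id   INTEGER NOT NULL REFERENCES go_term(go_id),
    PRIMARY KEY (gene_id, go_id)
);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO gene VALUES (?, ?)",
                     [(1, "geneA"), (2, "geneB")])
    conn.executemany("INSERT INTO transcript VALUES (?, ?, ?)",
                     [(10, 1, 900), (11, 1, 750), (12, 2, 1200)])
    conn.executemany("INSERT INTO go_term VALUES (?, ?)",
                     [(100, "termX"), (101, "termY")])
    conn.executemany("INSERT INTO gene_go VALUES (?, ?)",
                     [(1, 100), (1, 101), (2, 100)])
    conn.commit()
    return conn


def transcripts_per_gene(conn):
    """Follow the 1:N relationship: count transcripts for each gene."""
    rows = conn.execute(
        "SELECT g.symbol, COUNT(t.transcript_id) FROM gene g "
        "LEFT JOIN transcript t ON t.gene_id = g.gene_id "
        "GROUP BY g.gene_id ORDER BY g.gene_id").fetchall()
    return dict(rows)


def genes_for_term(conn, go_id):
    """Follow the N:M relationship through the junction table."""
    rows = conn.execute(
        "SELECT g.symbol FROM gene g "
        "JOIN gene_go gg ON gg.gene_id = g.gene_id "
        "WHERE gg.go_id = ? ORDER BY g.symbol", (go_id,)).fetchall()
    return [row[0] for row in rows]
```

Calling `build_demo_db()` and then `transcripts_per_gene(conn)` or `genes_for_term(conn, 100)` walks the two relationship types in either direction.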


database: basic structure

Databases are composed of tables of data.

Gi | Accession | Length | Cultivar | Dev. stage | Tissue | Sequence
30320090 | CD003352 | 356 | - | Turning stage of fruit ripening | Pericarp | GTACTCCTAAAC…..
15195408 | BI421671 | 492 | TA496 | 25-40 days old | callus | CCACAACCACA…..
50892290 | AJ784669 | 346 | West Virginia 106 | 8 days post anthesis | fruit | CAAATTTA…..

Tables hold logically related sets of data. A table is essentially the same thing as a spreadsheet: a set of rows and columns.


Each table has several records, or entries: a record stores all the information for a given individual.

Records are the rows of a data table.


Each record has several fields: a field is an individual piece of data, a single attribute of the record.

Fields are the columns of a data table.


Each record (row) has a unique identifier, the primary key. The primary key serves to identify the data stored in this record across all the tables in the database.

Databases are manipulated with a language called SQL (Structured Query Language). It is a "baby English" type of language: it uses real words, but is rigid about their order and placement.

Various database software exists: Oracle, MS Access, MySQL, etc.
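To make the vocabulary of tables, records, fields and primary keys concrete, the sketch below loads the three example records from the table on these slides into an in-memory database and queries them with SQL. SQLite, through Python's built-in sqlite3 module, is an assumption made for convenience; the slides themselves name only Oracle, MS Access and MySQL. The table name `est`, the column names and the choice of `gi` as primary key are illustrative, and the truncated sequences are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table; each column is a field, and gi is declared as the primary key.
conn.execute("""
    CREATE TABLE est (
        gi        INTEGER PRIMARY KEY,
        accession TEXT NOT NULL,
        length    INTEGER,
        cultivar  TEXT,
        dev_stage TEXT,
        tissue    TEXT
    )
""")

# Each tuple is one record (row). The "-" cultivar is stored as NULL (None).
records = [
    (30320090, "CD003352", 356, None,
     "Turning stage of fruit ripening", "Pericarp"),
    (15195408, "BI421671", 492, "TA496", "25-40 days old", "callus"),
    (50892290, "AJ784669", 346, "West Virginia 106",
     "8 days post anthesis", "fruit"),
]
conn.executemany("INSERT INTO est VALUES (?, ?, ?, ?, ?, ?)", records)

# SQL uses real words in a fixed order: SELECT ... FROM ... WHERE ... ORDER BY
long_ests = conn.execute(
    "SELECT accession, length FROM est WHERE length > 350 ORDER BY length DESC"
).fetchall()

# The primary key identifies exactly one record.
by_key = conn.execute(
    "SELECT accession FROM est WHERE gi = ?", (50892290,)
).fetchone()

# A second record with an existing primary key is rejected.
try:
    conn.execute("INSERT INTO est (gi, accession) VALUES (?, ?)",
                 (30320090, "DUPLICATE"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The last step shows why a primary key can act as a unique identifier: the database refuses to store two rows with the same key.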


Why biological databases?

o Make biological data available to scientists

   Consolidation of data (gather data from different sources)

   Provide access to large datasets that cannot be published explicitly (genome, …)

o Make biological data available in computer-readable format

   Make data accessible for automated analysis


Biological db

o Vary in size, quality, coverage, level of interest

o Many of the major ones are covered in the annual Database Issue of Nucleic Acids Research (2010)


What makes a good db?

o comprehensiveness

o accuracy

o is up-to-date

o good interface

o batch search/download

o API (web services, DAS, etc.)


"Must have" items when using a db

o Remember the server, the database, and the program version used

o Write down sequence identification numbers

o Databases are not like good wine (use up-to-date builds)

o Use local installs when it becomes necessary


Primary and derived data

Primary databases:

Databases consisting of experimentally derived data, such as nucleotide sequences and three-dimensional structures.

Secondary databases:

Databases consisting of data derived from the analysis or treatment of primary data.


Nucleotide sequence databases

GenBank www.ncbi.nlm.nih.gov/GenBank

EMBL www.ebi.ac.uk/embl

DDBJ www.ddbj.nig.ac.jp

The 3 databases are synchronized on a daily basis, and the accession numbers are consistent.

There are no legal restrictions on the use of these databases. However, they do contain some patented sequences.


GenBank sample record

http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html

Record sections, from top to bottom: header, title, taxonomy, citation, features, sequence.

LOCUS       AF115338     591 bp    DNA     linear   BCT 19-AUG-1999
DEFINITION  Pseudomonas fluorescens ECF sigma factor SigX (sigX) gene,
            complete cds.
ACCESSION   AF115338
VERSION     AF115338.1  GI:4959391
KEYWORDS    .
SOURCE      Pseudomonas fluorescens.
  ORGANISM  Pseudomonas fluorescens
            Bacteria; Proteobacteria; gamma subdivision; Pseudomonadaceae;
            Pseudomonas.
REFERENCE   1  (bases 1 to 591)
  AUTHORS   Brinkman,F.S., Schoofs,G., Hancock,R.E. and De Mot,R.
  TITLE     Influence of a putative ECF sigma factor on expression of the
            major outer membrane protein, OprF, in Pseudomonas aeruginosa
            and Pseudomonas fluorescens
  JOURNAL   J. Bacteriol. 181 (16), 4746-4754 (1999)
  MEDLINE   99369842
   PUBMED   10438740
REFERENCE   2  (bases 1 to 591)
  AUTHORS   De Mot,R.
  TITLE     Direct Submission
  JOURNAL   Submitted (04-DEC-1998) F.A. Janssens Laboratory of Genetics,
            Applied Plant Sciences, K. Mercierlaan 92, Heverlee B-3001,
            Belgium
FEATURES             Location/Qualifiers
     source          1..591
                     /organism="Pseudomonas fluorescens"
                     /strain="M114"
                     /db_xref="taxon:294"
     gene            1..591
                     /gene="sigX"
     CDS             1..591
                     /gene="sigX"
                     /codon_start=1
                     /transl_table=11
                     /product="ECF sigma factor SigX"
                     /protein_id="AAD34329.1"
                     /db_xref="GI:4959392"
                     /translation="MNKAQTLSTRYDPRELSDEELVARSHTELFHVTRAYEELMRRYQ
                     RTLFNVCARYLGNDRDADDVCQEVMLKVLYGLKNLEGKSKFKTWLYSITYNECITQYR
                     KERRKRRLMDALSLDPLEEASEEKALQPEEKGGLDRWLVYVNPIDRGILVLRFVAELE
                     FQEIADIMHMGLSATKMRYKRALDKLREKFAGETET"
BASE COUNT      157 a    133 c    170 g    131 t
ORIGIN
        1 atgaataaag cccaaacgct atccacgcgc tacgaccccc gcgagctctc tgatgaggag
       61 ttggtcgcgc gctcgcatac cgagcttttt cacgtaacgc gcgcctatga agaactgatg
      121 cggcgttacc agcgaacatt atttaacgtt tgtgcgagat atcttgggaa cgatcgcgac
      181 gcagacgatg tctgtcagga agtcatgttg aaggtgctgt atggcctgaa gaacctcgag
      241 gggaaatcga agttcaaaac gtggctctac agcatcacgt acaacgaatg tattacgcag
      301 tatcggaagg aacggcgaaa gcgtcgcttg atggacgcat tgagtcttga ccccctcgag
      361 gaagcgtccg aagaaaaggc gcttcaaccc gaggagaagg gcgggcttga tcgctggctg
      421 gtgtatgtga acccgattga ccgtggaatt ctggtgcttc gatttgtcgc agagctggaa
      481 tttcaggaga tcgcagacat catgcacatg ggtttgagtg cgacaaaaat gcgttacaaa
      541 cgtgctctag ataaattgcg tgagaaattt gcaggcgaga ctgaaactta g
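Because a GenBank record is a computer-readable flat file, its sections can be pulled apart with a few lines of code. The following is a minimal sketch in plain Python, not an official parser: it assumes the usual layout in which top-level keywords start in column 1 and their values start in column 13, and it works on an abbreviated excerpt of the sample record. The function names `parse_header` and `parse_origin` are made up for this illustration; for real work an established parser is the safer choice.

```python
# Abbreviated excerpt of the sample record (several fields left out).
RECORD = """\
LOCUS       AF115338     591 bp    DNA     linear   BCT 19-AUG-1999
DEFINITION  Pseudomonas fluorescens ECF sigma factor SigX (sigX) gene,
            complete cds.
ACCESSION   AF115338
VERSION     AF115338.1  GI:4959391
SOURCE      Pseudomonas fluorescens.
FEATURES             Location/Qualifiers
     gene            1..591
                     /gene="sigX"
ORIGIN
        1 atgaataaag cccaaacgct atccacgcgc tacgaccccc gcgagctctc tgatgaggag
"""


def parse_header(text):
    """Collect the top-level keyword fields that appear before FEATURES."""
    fields = {}
    current = None
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            break
        keyword, value = line[:12].strip(), line[12:].strip()
        if keyword and not line.startswith(" "):
            # A new top-level keyword such as LOCUS or ACCESSION.
            current = keyword
            fields[current] = value
        elif current and not keyword:
            # Continuation line: columns 1-12 are blank.
            fields[current] += " " + value
    return fields


def parse_origin(text):
    """Return the sequence after ORIGIN without position numbers or spaces."""
    _, _, tail = text.partition("ORIGIN")
    return "".join(ch for ch in tail if ch.isalpha())


header = parse_header(RECORD)
sequence = parse_origin(RECORD)
```

Here `header["ACCESSION"]` holds the accession number and `sequence` holds the nucleotides of the single ORIGIN line included in the excerpt.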


Protein sequence database

The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.

The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. It has two sections:

o Swiss-Prot, which is manually annotated and reviewed.

o TrEMBL, which is automatically annotated and is not reviewed.

The UniProt Reference Clusters (UniRef) are used to speed up sequence similarity searches.


UniProt entry


Protein data bank

The PDB archive contains information about experimentally determined structures of proteins, nucleic acids, and complex assemblies (X-ray, NMR, computationally predicted).

Mission: maintain a single archive of macromolecular structural data that is freely and openly available to the global community.

Number of Structures Available


PDB entry


Protein structure levels


The Gene Ontology (GO)

GO goals

The GO website: http://www.geneontology.org


GO is divided into 3 domains (levels of annotation):

o Molecular function - basic activities of a gene product at the molecular level

o Biological process - set of molecular events with a defined beginning and end

o Cellular component - the parts of a cell or its extracellular environment


GO structure

The structure of GO can be described as a directed acyclic graph (DAG), where each GO term is a node and the relationships between the terms are arcs between the nodes.

Example from the figure: nuclear chromosome is_a chromosome and part_of nucleus; mitochondrial chromosome is_a chromosome and part_of mitochondrion.

GO currently has 2 relationship types:

o is_a: an is_a child of a parent means that the child is a complete type of its parent, but can be discriminated in some way from other children of the parent.

o part_of: a part_of child of a parent means that the child is always a constituent of the parent that, in combination with other constituents of the parent, makes up the parent.
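The DAG idea can be sketched with an ordinary dictionary. The example below uses only the five terms and the two relationship types named on this slide; the data structure and the function names `parents` and `ancestors` are assumptions made for illustration and do not reflect how the GO project itself stores or serves the ontology.

```python
from collections import deque

# Each term maps to its outgoing arcs: (relationship, parent term).
EDGES = {
    "nuclear chromosome": [("is_a", "chromosome"),
                           ("part_of", "nucleus")],
    "mitochondrial chromosome": [("is_a", "chromosome"),
                                 ("part_of", "mitochondrion")],
    "chromosome": [],
    "nucleus": [],
    "mitochondrion": [],
}


def parents(term, relationship=None):
    """Direct parents of a term, optionally restricted to one relationship."""
    return sorted(parent for rel, parent in EDGES.get(term, [])
                  if relationship in (None, rel))


def ancestors(term):
    """Every term reachable by following arcs upward (breadth-first)."""
    seen = set()
    queue = deque([term])
    while queue:
        for _, parent in EDGES.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

A term such as "nuclear chromosome" has two parents, which is what makes the structure a graph rather than a simple tree.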


Searching for papers

http://www.ncbi.nlm.nih.gov/pubmed

http://scholar.google.com/

http://www.scopus.com/home.url

http://portal.isiknowledge.com/


Querying GenBank

http://www.ncbi.nlm.nih.gov/sites/gquery

From the Entrez main page, search for the gene whose accession number is BC043443.

o How many results do we get in the Gene db?

o What is the official name of the gene? Other possible names?

o On which DNA strand is it located?

o How many splice variants does it have?

o Which disease is the gene associated with?

o Is it involved in the apoptosis process?

o How long is the coding sequence of the first splice variant?


http://www.ncbi.nlm.nih.gov/genbank/

NG_000007


What kind of molecule is it? Genomic DNA


Where is the promoter of the HBB gene located? Upstream of nucleotide 70545


Indicate the number of exons = 3

Indicate the length of the second exon = 71039-70817+1 = 223 nts

Indicate the number of introns = 2

Indicate the length of the first intron = 70816-70687+1 = 130 nts


Indicate the location of the 5' UTR = 70545..70594

Indicate the length of the 5' UTR = 70594-70545+1 = 50 nts

Indicate the location of the 3' UTR = 72019..72150

Indicate the length of the 3' UTR = 72150-72019+1 = 132 nts
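The "+1" in these answers comes from GenBank locations being 1-based and inclusive at both ends, so a feature written as start..end covers end - start + 1 nucleotides. A small sketch of that arithmetic follows, using the coordinates given in the answers for the second exon and the two UTRs; the function name `feature_length` is made up for this example.

```python
def feature_length(start, end):
    """Length in nucleotides of a feature spanning start..end.

    Coordinates are 1-based and inclusive, as in GenBank locations.
    """
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Coordinates taken from the answers on these slides.
FEATURES = {
    "5' UTR": (70545, 70594),
    "exon 2": (70817, 71039),
    "3' UTR": (72019, 72150),
}

lengths = {name: feature_length(start, end)
           for name, (start, end) in FEATURES.items()}
```

Forgetting the "+1" is a common off-by-one mistake: a feature written as 5..5 is one nucleotide long, not zero.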


Indicate the nucleotide positions of the start codon = 70595,70596,70597


Download the sequence of the HBB gene in FASTA format


Selected region: from 70545 to 72150


>gi|28380636:70545-72150 Homo sapiens beta globin region (HBB@); and hemoglobin, beta (HBB); and hemoglobin, delta (HBD); and hemoglobin, epsilon 1 (HBE1); and hemoglobin, gamma A (HBG1); and hemoglobin, gamma G (HBG2), RefSeqGene on chromosome 11
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGTTGGTATCAAGGTTACAAGACAGGTTTAAGGAGACCAATAGAAACTGGGCATGTGGAGACAGAGAAG
ACTCTTGGGTTTCTGATAGGCACTGACTCTCTCTGCCTATTGGTCTATTTTCCCACCCTTAGGCTGCTGG
TGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGG
CAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGAC
AACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACT
TCAGGGTGAGTCTATGGGACGCTTGATGTTTTCTTTCCCCTTCTTTTCTATGGTTAAGTTCATGTCATAG
GAAGGGGATAAGTAACAGGGTACAGTTTAGAATGGGAAACAGACGAATGATTGCATCAGTGTGGAAGTCT
CAGGATCGTTTTAGTTTCTTTTATTTGCTGTTCATAACAATTGTTTTCTTTTGTTTAATTCTTGCTTTCT
TTTTTTTTCTTCTCCGCAATTTTTACTATTATACTTAATGCCTTAACATTGTGTATAACAAAAGGAAATA
TCTCTGAGATACATTAAGTAACTTAAAAAAAAACTTTACACAGTCTGCCTAGTACATTACTATTTGGAAT
ATATGTGTGCTTATTTGCATATTCATAATCTCCCTACTTTATTTTCTTTTATTTTTAATTGATACATAAT
CATTATACATATTTATGGGTTAAAGTGTAATGTTTTAATATGTGTACACATATTGACCAAATCAGGGTAA
TTTTGCATTTGTAATTTTAAAAAATGCTTTCTTCTTTTAATATACTTTTTTGTTTATCTTATTTCTAATA
CTTTCCCTAATCTCTTTCTTTCAGGGCAATAATGATACAATGTATCATGCCTCTTTGCACCATTCTAAAG
AATAACAGTGATAATTTCTGGGTTAAGGCAATAGCAATATCTCTGCATATAAATATTTCTGCATATAAAT
TGTAACTGATGTAAGAGGTTTCATATTGCTAATAGCAGCTACAATCCAGCTACCATTCTGCTTTTATTTT
ATGGTTGGGATAAGGCTGGATTATTCTGAGTCCAAGCTAGGCCCTTTTGCTAATCATGTTCATACCTCTT
ATCTTCCTCCCACAGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
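A FASTA file is simple enough to read by hand: a header line starting with ">" followed by one or more lines of sequence. The sketch below is a minimal reader in plain Python, written for illustration only. The toy record reuses the first 70 nucleotides of the HBB region shown here, split over two lines, under a made-up header; the function name `parse_fasta` is also hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy record: made-up header, first 70 nt of the downloaded region.
DEMO = """\
>demo_region_start_70545
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAA
CCTCAAACAGACACCATGGTGCATCTGACTCCTGA
"""

REGION_START = 70545
header, sequence = parse_fasta(DEMO)[0]

# Convert the 0-based string index of the first ATG to a 1-based
# coordinate on the reference record.
first_atg = REGION_START + sequence.find("ATG")
```

The value of `first_atg` can be compared with the start codon position given earlier in the exercise, and the full download should contain 72150 - 70545 + 1 nucleotides.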


Querying GenBank: link to geneID


Querying PubMed

http://www.ncbi.nlm.nih.gov/pubmed

How many articles did Nunzio D’Agostino publish?

D'Agostino, Nunzio [Full Author Name] OR D Agostino, Nunzio [Full Author Name]

How many of these are related to EST?

D'Agostino, Nunzio [Full Author Name] AND EST [Title/Abstract]

How many of these are in the journal BMC Genomics?

D'Agostino, Nunzio [Full Author Name] OR D Agostino, Nunzio [Full Author Name] AND BMC Genomics [journal]

How many articles in PubMed include the word "RNA-Seq" in the title?

RNA-Seq [title]

How many reviews containing the word "transcriptome" were published in 2008?

transcriptome [title] AND review [Publication Type] AND 2008[publication date]
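The same searches can also be prepared for automated use. The sketch below only builds request URLs for NCBI's E-utilities ESearch service; it does not send them. Using E-utilities at all is an assumption, since the slides use the PubMed web page. The helper name `esearch_url` is made up, and `rettype=count` asks ESearch to return only the number of hits.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="pubmed"):
    """Build an ESearch URL that asks only for the hit count of `term`."""
    query = urlencode({"db": db, "term": term, "rettype": "count"})
    return ESEARCH + "?" + query


# Search strings copied from the exercise.
QUERIES = {
    "all articles": "D'Agostino, Nunzio [Full Author Name] OR "
                    "D Agostino, Nunzio [Full Author Name]",
    "EST": "D'Agostino, Nunzio [Full Author Name] AND EST [Title/Abstract]",
    "RNA-Seq in title": "RNA-Seq [title]",
    "transcriptome reviews 2008": "transcriptome [title] AND "
                                  "review [Publication Type] AND "
                                  "2008[publication date]",
}

urls = {label: esearch_url(term) for label, term in QUERIES.items()}

# Decoding a URL again shows that the search string survives the encoding.
decoded = parse_qs(urlsplit(urls["RNA-Seq in title"]).query)
```

`urlencode` takes care of the spaces, brackets and apostrophes in the search strings, which would otherwise break the URL.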