![Page 1: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/1.jpg)
CACAO Biocurator Training
CACAO Fall 2011
![Page 2: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/2.jpg)
CACAO
• Syllabus
• What is CACAO & why is it important?
• Training
• Examples
![Page 3: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/3.jpg)
Mutualistic Relationship
• We want you to get experience with:
  1. CRITICALLY reading scientific papers
  2. Bioinformatics resources
  3. Collaborating with other biocurators
  4. Synthesizing functional annotations
• We want to get high quality functional annotations to contribute back to the GO Consortium and other biological databases
![Page 4: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/4.jpg)
What is an annotation?
Hint: try looking for a definition on Wikipedia.
![Page 5: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/5.jpg)
What is a functional annotation?
• Process of attaching information from the scientific literature to proteins
![Page 6: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/6.jpg)
Growing need for functional annotations
• Advances in DNA sequencing mean lots of new genomes & metagenomes
![Page 7: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/7.jpg)
Classic MODel
Literature
Datasets
Curators (rate limiting)
Database
![Page 8: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/8.jpg)
Classic MODel is Expensive
YIKES!
![Page 9: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/9.jpg)
Growing need for high quality functional annotations
• High quality annotations allow us to infer the function of genes
• Which allows us to understand the capabilities of genomes and understand the patterns of gene expression
![Page 10: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/10.jpg)
Two problems meet
How can we get more curators with finite budgets?
How can we incorporate more critical analysis into undergraduate education?
![Page 11: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/11.jpg)
What does a functional annotation have to do with this course?
• Process of attaching information from the scientific literature to proteins
• CACAO will teach you to become a biocurator: you will be adding functional annotations to the biological database GONUTS (http://gowiki.tamu.edu)
![Page 12: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/12.jpg)
CACAO
Community Assessment of Community Annotation with Ontologies
How well can you (with our coaching) assign gene functions using GO?
![Page 13: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/13.jpg)
Can students become biocurators? YES!
| | Spring 2010 | Fall 2010 | Spring 2011 |
|---|---|---|---|
| Institutions | TAMU | TAMU, UCL | TAMU, Miami (Ohio), N. Texas, Penn State, Mich. State |
| Rounds | 1 round | 4 rounds | 5 rounds |
| Annotations* / Submitted | 118/153 | 496/753 | 726/1013 |

1340 GO annotations in 2 & 1/2 semesters!
![Page 14: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/14.jpg)
Functional annotation with Gene Ontology
• Controlled vocabulary with
  – Term identifiers
    • GO:0000075
  – Name
    • cell cycle checkpoint
  – Definitions
    • "A point in the eukaryotic cell cycle where progress through the cycle can be halted until conditions are suitable for the cell to proceed to the next stage." [GOC:mah, ISBN:0815316194]
  – Relationships
    • is_a GO:0000074 ! regulation of progression through cell cycle
• Terms arranged in a Directed Acyclic Graph (DAG)
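To make the parts of a term concrete, here is a minimal Python sketch of how a term and its `is_a` links could be stored and walked. The `GOTerm` class, the `TERMS` dictionary and the `ancestors` function are illustrative names only, not part of GO or GONUTS; the two terms are the ones shown on this slide.

```python
from dataclasses import dataclass, field


@dataclass
class GOTerm:
    """One entry in the controlled vocabulary (illustrative structure)."""
    go_id: str
    name: str
    definition: str = ""
    is_a: list = field(default_factory=list)  # IDs of parent terms


# Only the two terms from the slide; the real ontology is far larger.
TERMS = {
    "GO:0000075": GOTerm(
        go_id="GO:0000075",
        name="cell cycle checkpoint",
        definition="A point in the eukaryotic cell cycle where progress "
                   "through the cycle can be halted until conditions are "
                   "suitable for the cell to proceed to the next stage.",
        is_a=["GO:0000074"],
    ),
    "GO:0000074": GOTerm(
        go_id="GO:0000074",
        name="regulation of progression through cell cycle",
    ),
}


def ancestors(go_id, terms):
    """Return the IDs of every term reachable by following is_a upward."""
    seen = set()
    stack = list(terms[go_id].is_a)
    while stack:
        parent = stack.pop()
        if parent in seen:
            continue  # in a DAG a term can be reached by more than one path
        seen.add(parent)
        if parent in terms:
            stack.extend(terms[parent].is_a)
    return seen


print(ancestors("GO:0000075", TERMS))
```

Because the graph is directed and acyclic, the walk always terminates, and the `seen` set keeps a term with several parents from being counted twice.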
![Page 15: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/15.jpg)
Why use Ontologies?
• Standardization
• facilitate comparison across systems
• facilitate computer based reasoning systems
  – Good for data mining!
• leading functional annotation ontology = Gene Ontology (GO)
![Page 16: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/16.jpg)
What is GO? Who is the GO Consortium (GOC)?
• GO = ~30,000 terms for gene product attributes
1. Molecular Function (enzyme activity)
2. Biological Process (pathways)
3. Cellular Component (parts of the cell)
• GO Consortium - set of biological databases that are involved in developing GO and contributing GO annotations
![Page 17: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/17.jpg)
Cellular Component
• where a gene product acts
![Page 18: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/18.jpg)
Molecular Function
• activities or “jobs” of a gene product
glucose-6-phosphate isomerase activity
figure from GO consortium presentations
![Page 19: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/19.jpg)
Biological Process
• a commonly recognized series of events
cell division
Figure from Nature Reviews Microbiology 6, 28-40 (January 2008)
![Page 21: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/21.jpg)
Search for GO terms on GONUTS
http://gowiki.tamu.edu
![Page 22: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/22.jpg)
Which subontology (MF, BP or CC) would the following terms fit in?
GO:0003909 DNA ligase activity
GO:0071705 Nitrogen compound transport
GO:0007124 Pseudohyphal growth
GO:0015123 Acetate transmembrane transporter activity
GO:0071514 Genetic imprinting
GO:0005773 Vacuole
GO:0000312 Plastid small ribosomal subunit
![Page 23: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/23.jpg)
What do we know so far?
1. You will be making functional (GO) annotations using GO terms.
2. You can search for GO terms on GONUTS.
Questions?
![Page 25: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/25.jpg)
Why are we using GONUTS?
• Students can add functional annotations to proteins.
• It has all the GO terms in it, too.
• Some of the GO terms have usage notes.
• It works a lot like Wikipedia, so it’s familiar.
• It has the ability to keep track of each student’s and team’s annotations.
• We run it.
http://gowiki.tamu.edu
![Page 26: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/26.jpg)
REQUIRED parts of a GO annotation
GO
http://gowiki.tamu.edu/wiki/index.php/ECOLI:LPOB
** I will cover this again!!
![Page 27: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/27.jpg)
Parts of a GO annotation (cont)
Evidence code
![Page 28: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/28.jpg)
Parts of a GO annotation (cont)
Reference Notes (about evidence)
![Page 29: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/29.jpg)
What do we know so far?
1. You will be making functional (GO) annotations using GO terms.
2. You can search for GO terms on GONUTS.
3. You will be adding your GO annotations to GONUTS.
4. There are 4 required parts to a GO annotation.
5. You have to base your annotation on an experiment published in a scientific paper.
Questions?
![Page 30: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/30.jpg)
Next week
• Review of GO & GO annotations
• More biocurator training
  – lots of examples
  – lots of practice
BICH 485 & 689 students - please stick around to talk about these courses!
![Page 31: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/31.jpg)
Plan for training
1. Synthesizing GO annotations
2. Refinements
3. Judging & Assessment
4. Individual & Team tracking
![Page 32: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/32.jpg)
Part 1: Synthesizing GO annotations
![Page 33: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/33.jpg)
What can you annotate?
• Proteins.
  – Any protein with a record in UniProt (Universal Protein Resource - http://uniprot.org)
• How can you find proteins to annotate?
  – Think of ways to identify a protein or paper to annotate
![Page 34: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/34.jpg)
Choosing a protein to annotate
1. randomly
2. topics of interest (e.g. efflux pump proteins, biofilms, marine biology)
3. papers you have come across while doing other stuff
4. methods you know or want to learn
5. phenotypes and mutants you are interested in
6. by author
7. by pathway or regulon
8. suggested by another
   - high ratio of IEA:manual annotations in GONUTS
   - mentioned in another class
9. current paper mentions another gene product
10. review papers (e.g. Annual Reviews are excellent sources)
11. UniProt, GONUTS, WikiPathways, PubMed searches
12. protein annotated by other teams
13. ask a coach
![Page 35: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/35.jpg)
Search for GO terms on GONUTS
http://gowiki.tamu.edu
![Page 36: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/36.jpg)
Practice
1. What is the GO term for GO:0004713?
2. What is the GO identifier for mitosis?
3. How many results (ballpark) do you get when you search for cell division using the Go, Search or G buttons?
4. How many child terms are there for plasma membrane? How many grandchildren?
5. What term is the parent of GO:0006825?
http://gowiki.tamu.edu
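The words child, grandchild and parent in the questions above describe positions in the GO graph. The sketch below shows how each would be counted on a made-up toy graph; the single-letter term names and the `CHILDREN` table are hypothetical and do not correspond to real GO terms.

```python
# Hypothetical toy graph: each key lists its direct children.
CHILDREN = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": [],
}


def children(term):
    """Terms exactly one step below the given term."""
    return set(CHILDREN.get(term, []))


def grandchildren(term):
    """Terms exactly two steps below the given term."""
    result = set()
    for child in children(term):
        result |= children(child)
    return result


def parents(term):
    """Terms exactly one step above the given term."""
    return {p for p, kids in CHILDREN.items() if term in kids}


print(sorted(children("A")))       # ['B', 'C']
print(sorted(grandchildren("A")))  # ['D', 'E']
print(sorted(parents("D")))        # ['B', 'C']
```

Term D has two parents, which is allowed in a directed acyclic graph, and it is still counted once as a grandchild of A because the results are sets.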
![Page 37: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/37.jpg)
Finding a scientific paper on a certain protein
• Has to be a scientific paper with experimental data in it.
  – Anything else is a valid reason to challenge!
• PubMed, PubMed Central, Google Scholar…
• No review articles
• No books, textbooks, Wikipedia articles, class notes…
• You will need the PMID number
![Page 38: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/38.jpg)
Practice - searching PubMed
1. How many papers do you get when you search for “coli”?
2. How many of those papers are reviews?
3. What is the title of the oldest paper when you search for “coli AND RNA polymerase”?
4. How many results are there when you search for “GTPase activity and Gene Ontology”?
5. What is the PMID of the paper when you search for “Hu JC AND coli AND lysR AND 2010”?
http://pubmed.org
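The searches above combine terms with AND. As a rough sketch, the same kind of query can be assembled in Python and turned into a URL for NCBI's E-utilities `esearch` service. E-utilities is not mentioned in these slides and is brought in here only as an assumption for illustration; the helper names `pubmed_query` and `esearch_url` are made up, and the code builds the URL without contacting the server.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (an addition for illustration only).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_query(*terms):
    """Join search terms with the boolean operator AND."""
    return " AND ".join(terms)


def esearch_url(query, retmax=20):
    """Build a URL asking PubMed for up to retmax PMIDs matching the query."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


query = pubmed_query("Hu JC", "coli", "lysR", "2010")
print(query)
print(esearch_url(query))
```

Typing the printed query string into the PubMed search box gives the same search as practice question 5.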
![Page 39: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/39.jpg)
Why do we annotate on GONUTS?
• UniProt (Universal Protein Resource) will not let us annotate protein records on their site.
• They are a professionally-curated & closed database.
• GONUTS will.
• GONUTS pulls the info from the UniProt record when it makes a page for you to edit.
![Page 40: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/40.jpg)
• UniProt - http://www.uniprot.org
• UniProt is not community edited, so we can’t add annotations directly to their database
Making a protein page on GONUTS requires a UniProt accession
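A UniProt accession has a fixed shape, so a quick format check can catch a gene name or page title pasted in by mistake. The sketch below uses the accession pattern as published in UniProt's help pages, which is an assumption brought in from outside these slides; `looks_like_accession` is an illustrative name, and the check only tests the shape of the string, not whether the record exists.

```python
import re

# Accession pattern as documented by UniProt (not given in the slides).
ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def looks_like_accession(text):
    """True if text has the shape of a UniProt accession."""
    return ACCESSION.fullmatch(text.strip()) is not None


print(looks_like_accession("P04637"))      # accession of human p53
print(looks_like_accession("p53"))         # a protein name, not an accession
print(looks_like_accession("ECOLI:LPOB"))  # a GONUTS page title
```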
![Page 41: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/41.jpg)
Practice - Searching UniProt
Find the UniProt accessions for:
a) Mouse Lsr protein
b) Diphtheria toxin from Corynebacterium
c) mutS from E. coli K-12
http://uniprot.org
![Page 42: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/42.jpg)
How do you make a new gene page in GONUTS?
• Use a UniProt accession to make a page on GONUTS that you can add your own annotations to.
• GoPageMaker will:
  - Check if the page exists in GONUTS & take you there if it does.
  - Make a page & pull all of the annotations from UniProt into a table that you can edit.
![Page 43: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/43.jpg)
Practice
1. How many annotations are on the page for the p53 protein from humans?
2. How many different evidence codes are there on the page for the Bub1a protein from mice?
3. Give one of the paper identifiers for an annotation for the LpxK protein from E. coli.
http://gowiki.tamu.edu
![Page 44: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/44.jpg)
What do we know so far?
1. You will be making functional (GO) annotations using GO terms.
2. You can search for GO terms on GONUTS.
3. You will be adding your GO annotations to GONUTS.
4. There are 4 required parts to a GO annotation.
5. You have to base your annotation on an experiment published in a scientific paper.
6. You can annotate any protein with a record in UniProt.
7. You have to make a page in GONUTS for your protein using the UniProt accession.
Questions?
![Page 45: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/45.jpg)
What are evidence codes?
• Describe the type of work or analysis done by the authors
• 5 general categories of evidence codes:
  1. Experimental
  2. Computational
  3. Author Statement
  4. Curator Assigned
  5. Automatically assigned by GO
![Page 46: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/46.jpg)
What are the evidence codes?
• Describe the type of work or analysis done by the authors
• 5 general categories of evidence codes:
  1. Experimental
  2. Computational
  3. Author Statement
  4. Curator Assigned
  5. Automatically assigned by GO
• CACAO biocurators may only use certain experimental and computational evidence codes
![Page 47: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/47.jpg)
Experimental Evidence Codes
• IDA: Inferred from Direct Assay
• IMP: Inferred from Mutant Phenotype
• IGI: Inferred from Genetic Interaction
• IEP: Inferred from Expression Pattern
• IPI: Inferred from Physical Interaction
• EXP: Inferred from Experiment
http://geneontology.org/GO.evidence.shtml
![Page 49: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/49.jpg)
Computational Evidence Codes
• ISS: Inferred from Sequence or Structural Similarity
• ISO: Inferred from Sequence Orthology
• ISA: Inferred from Sequence Alignment
• ISM: Inferred from Sequence Model
• IGC: Inferred from Genomic Context
• IBA: Inferred from Biological Aspect of Ancestor
• IBD: Inferred from Biological Aspect of Descendant
• IKR: Inferred from Key Residues
• IRD: Inferred from Rapid Divergence
• RCA: Inferred from Reviewed Computational Analysis
http://geneontology.org/GO.evidence.shtml
![Page 51: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/51.jpg)
Summary of Evidence Codes for CACAO
• IDA: Inferred from Direct Assay
• IMP: Inferred from Mutant Phenotype
• IGI: Inferred from Genetic Interaction
• IEP: Inferred from Expression Pattern
• ISO: Inferred from Sequence Orthology
• ISA: Inferred from Sequence Alignment
• ISM: Inferred from Sequence Model
• IGC: Inferred from Genomic Context
• If it’s not one of these 8, your annotation is incorrect!!!
![Page 52: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/52.jpg)
Required parts (for every annotation)
GO:0004713
PMID:1111
IDA: Inferred from direct assay
Figure 2a
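The four required parts can be thought of as one record with four fields. The following Python sketch checks only the form of each field, using the eight CACAO evidence codes and the example values from this slide. The `Annotation` class and its `problems` method are illustrative names, not how GONUTS works internally, and a record that passes can still be wrong if the paper does not actually support the GO term.

```python
import re
from dataclasses import dataclass

# The eight evidence codes CACAO accepts.
ALLOWED_CODES = {"IDA", "IMP", "IGI", "IEP", "ISO", "ISA", "ISM", "IGC"}


@dataclass
class Annotation:
    go_id: str          # e.g. GO:0004713
    reference: str      # e.g. PMID:1111
    evidence_code: str  # e.g. IDA
    notes: str          # where in the paper the evidence is, e.g. Figure 2a

    def problems(self):
        """List form errors; an empty list means all four parts are present."""
        found = []
        if not re.fullmatch(r"GO:\d{7}", self.go_id):
            found.append("GO ID should be GO: followed by seven digits")
        if not re.fullmatch(r"PMID:\d+", self.reference):
            found.append("reference should be PMID: followed by digits")
        if self.evidence_code not in ALLOWED_CODES:
            found.append("evidence code is not accepted in CACAO")
        if not self.notes.strip():
            found.append("notes should say where the evidence is")
        return found


good = Annotation("GO:0004713", "PMID:1111", "IDA", "Figure 2a")
bad = Annotation("GO:4713", "1111", "EXP", "")
print(good.problems())
for problem in bad.problems():
    print(problem)
```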
![Page 53: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/53.jpg)
What you might also have to fill in
http://geneontology.org/GO.evidence.shtml
![Page 54: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/54.jpg)
What do we know so far?
1. You will be making functional (GO) annotations using GO terms.
2. You can search for GO terms on GONUTS.
3. You will be adding your GO annotations to GONUTS.
4. There are 4 required parts to a GO annotation.
5. You have to base your annotation on an experiment published in a scientific paper.
6. You can annotate any protein with a record in UniProt.
7. You have to make a page in GONUTS for your protein using the UniProt accession.
Questions?
![Page 55: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/55.jpg)
Practice - Identify the problem annotation(s) & why
| # | GO ID | Reference | Evidence Code | Notes |
|---|---|---|---|---|
| 1 | GO:0003674 | PMID:20372022 | IDA: Inferred from Direct Assay | Table 2. |
| 2 | GO:0016985 | PMID:20372022 | IMP: Inferred from Mutant Phenotype | Table 2. |
| 3 | GO:0016985 | PMID:20372022 | IDA: Inferred from Direct Assay | |
| 4 | GO:0016985 | PMID:20372022 | IDA: Inferred from Direct Assay | Table 2. |
| 5 | GO:0003674 | PMID:20372022 | IDA: Inferred from Direct Assay | Table 2. |
| 6 | GO:0016985 | PMID:20372002 | IGI: Inferred from Genetic Interaction | Table 2. |
| 7 | GO:0016985 | 20372022 | IDA: Inferred from Direct Assay | Table 2. |
| 8 | GO:0016985 | PMID:20372002 | EXP: Inferred from Experiment | Table 2. |

9. What is the UniProt accession of the protein described/annotated?
![Page 56: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/56.jpg)
How is CACAO scored?
• Points for a complete annotation
  – GO term (right level of specificity)
  – Reference (paper)
  – Evidence code
  – Identify where in the paper the evidence is
• Refinements used to steal points for incorrect &/or incomplete annotations
  – Identify a problem
  – Suggest correct alternative
• Refinements can be entered by any team (including the original team)
![Page 57: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/57.jpg)
How can you get the annotations required by Rubric #2?
1. Synthesize complete & correct annotations.
2. Correctly refine (challenge & correct) someone else’s annotation.
3. If your annotation gets challenged, offer the best correction.
![Page 58: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/58.jpg)
Summary
• You will be searching literature for experimental evidence for a protein’s function (MF), processes (BP) and location (CC)
![Page 59: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/59.jpg)
Where do annotations show up?
![Page 60: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/60.jpg)
Refinements & Challenges
![Page 61: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/61.jpg)
What can you challenge?
![Page 62: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/62.jpg)
Scoreboard
![Page 63: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/63.jpg)
Schedule
![Page 64: CACAO Biocurator Training](https://reader036.vdocument.in/reader036/viewer/2022062804/56814bd9550346895db8b321/html5/thumbnails/64.jpg)
Spring 2011 - Results by organism
[Bar chart: annotations per organism, split into # wrong, # change and # perfect; vertical axis from 0 to 250. Organisms: Bacillus, Burkholderia, E. coli, Pseudomonas, Salmonella, Staphylococcus, Streptococcus, Vibrio, S. cerevisiae, Chlamydomonas, Arabidopsis, C. elegans, Drosophila, human, mouse, Olive baboon]