I. Introduction and Red Line Education for Data-unlimited Science

Upload: clara-walton

Post on 03-Jan-2016


Research Education

For the first time in the history of biology, students can work with the same data, at the same time, and with the same tools as research scientists.

Educational Challenge

Context of scientific discovery

My own suspicion is that the universe is not only queerer than we suppose, but queerer than we can suppose. J.B.S. Haldane, Possible Worlds and Other Essays (1927)

[Figure: Plant Genomes Vary Widely in Size. Phylogenetic tree of monocots and dicots on a time axis (million years: 60, 40, 20, Present), with numeric node labels from 9 to 150-300. Taxa shown: Oryza (rice), Avena (oats), Hordeum (barley), Triticum (wheat), Setaria (foxtail millet), Pennisetum (pearl millet), Sorghum, Zea (maize), Arabidopsis, Brachypodium, Glycine max (soy). Genome sizes shown range from 145 Mb to >20,000 Mb; two are marked "?? Mb". A legend marker denotes a genome duplication event.]

Genome Duplication/Fractionation

DNA Subway Concepts (Big Ideas)

• Genomes are complex and dynamic (queer).
• DNA sequence is information.
• DNA sequence is biological identity.
• Gene annotation adds meaning to DNA sequence.
• Concept of gene continues to evolve.
• A genome is more than genes.

Insights from Genomics in Education, Washington University, June 16-19, 2009

44 participants from three worlds and three kingdoms

• Bioinformatics: Students have limited patience for pure computer work and want a wet bench hook.

• Student-scientist partnerships: Someone has to care about the data generated by students.

• Students as co-investigators: Projects should potentially lead to publication.

• Scale: Need to move from individual experiments to course-based and distributed research projects.

Walk or…

Ride…

DNA Subway: an educational Discovery Environment

• Simplified bioinformatics workflows
• Developed with 25 collaborators at 11 institutions
• Since March 2010 launch: 2,905 registered users; 52,591 visits, 24,593 unique visits

• Red Line: predict and annotate genes in <150 kb
• Yellow Line: identify homologs in sequenced genomes
• Blue Line: analyze DNA barcodes and build gene trees
• Green Line: align and analyze RNA-seq data (coming)

Red Line Learning Questions

• What is a gene and how does it relate to DNA sequence?
• What are the components of genes?
• How does a gene relate to the central dogma of molecular biology: DNA <-> RNA -> Protein?
• How does a gene encode a protein?
• How is mathematical evidence used to predict genes?
• How does biological evidence (from RNA and proteins) confirm gene predictions?
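Two of these questions (how a gene encodes a protein, and how purely computational evidence can suggest where a gene lies) can be illustrated with a minimal Python sketch. It is not how the Red Line's gene prediction programs work: it only scans the forward strand for open reading frames (ATG to an in-frame stop) and translates them with the standard genetic code, ignoring introns and the reverse strand. The function names `transcribe`, `translate`, and `find_orfs` are hypothetical and chosen for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA -> mRNA: same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(dna):
    """Translate coding-strand DNA from its first base until a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # "X": codon with an unknown base
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def find_orfs(dna, min_codons=2):
    """Toy ORF scan (an assumption, not a real gene predictor).

    Returns (start, end, protein) for each ATG...stop stretch in the three
    forward reading frames. Coordinates are 0-based, end-exclusive, and the
    end includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                j = i
                # Walk codon by codon until an in-frame stop or the sequence end.
                while j + 3 <= len(dna) and CODON_TABLE.get(dna[j:j + 3], "X") != "*":
                    j += 3
                if j + 3 <= len(dna):  # an in-frame stop codon was found
                    protein = translate(dna[i:j])
                    if len(protein) >= min_codons:
                        orfs.append((i, j + 3, protein))
                    i = j + 3
                    continue
            i += 3
    return orfs
```

For example, `find_orfs("CCATGGCCTAAGG")` returns `[(2, 11, "MA")]`: the ATG at position 2 opens a frame that encodes Met-Ala before the TAA stop. Biological evidence such as sequenced cDNAs or proteins would still be needed to confirm that such a stretch is actually expressed.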

Genes as Beads on a String

http://www.ncbi.nlm.nih.gov/genome/guide/human/

Morgan’s Beads on a String

Human Globin Locus on Chromosome 11

Is that really all there is?

Human Genome Insights (ENCODE)

• Majority of genome is transcribed
• ~50% transposons
• ~25% protein coding genes/1.3% exons
• ~23,700 protein coding genes
• ~160,000 transcripts
• Average gene ~36,000 bp
  – 7 exons @ ~300 bp
  – 6 introns @ ~5,700 bp
• 7 alternatively spliced products (95% of genes)
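The average-gene figures above are internally consistent, which a short arithmetic check makes visible. The sketch below uses only the numbers quoted on the slide; the helper name `average_gene_length` is hypothetical.

```python
def average_gene_length(n_exons, exon_bp, n_introns, intron_bp):
    """Total gene length in bp as the sum of all exons and all introns."""
    return n_exons * exon_bp + n_introns * intron_bp


# Slide values: 7 exons of ~300 bp and 6 introns of ~5,700 bp.
gene_bp = average_gene_length(7, 300, 6, 5_700)
exon_total_bp = 7 * 300
exon_fraction = exon_total_bp / gene_bp

print(gene_bp)                 # 36300, i.e. the ~36,000 bp average gene
print(f"{exon_fraction:.1%}")  # 5.8% of an average gene is exon sequence
```

Seen this way, roughly 94% of an average gene is intron, which helps explain how protein coding genes can span about a quarter of the genome while exons make up only about 1.3% of it.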

Piano Keys?

Keys dynamically placed by real data (features, coordinates)

• This map allows students to appreciate some of the complexity of the genome.

• Clicking on links to sequence confirms a relationship between something called a gene and a DNA sequence.

What is a gene and how does it relate to DNA?

Gene Annotation Workflow

• Submit Sequence
• Identify & Mask Repeats
• Predict Genes
• Search Datasets
• Build Gene Models
• Prospect Genomes
• Compare Annotations
• (Optional) Load User Data
• Predict Function

Brent Buckner, Ph.D. Truman State University

“I have found that students are overwhelmed by their first introduction to genome sequences viewed on a genome browser. Students who used DNA Subway needed little or no guidance when they moved on to use MaizeGDB and had an easier time transitioning to genomes depicted in different genome browsers.”

DNA Subway Case Study
Brent Buckner, Ph.D., Truman State University

• Sophomore genetics class, spring 2010 and 2011
  – 70 students used Red Line to annotate 3.7 Mbp of maize genome
  – 12 hours effort, each student annotated 100 kb
  – Follow-up research projects by 7 undergraduates:
    • Compared syntenic regions of maize Chr. 6 and sorghum
    • 65 hours effort, each student annotated 1 million bp
    • MaizeGDB, MaizeSequence.org, InterProScan, CoGE, PlexDB, Circos

• Sophomore genetics class, spring 2012
  – 19 students used Red Line to visualize next-gen RNA-Seq data to investigate presence/absence variation (PAV) in maize
  – 12 hours effort, each student group annotated 100 kb and then imported next-gen RNA-Seq data from 5 different tissues in 30 maize inbred lines for a gene that they had previously shown exhibits PAV

Judy Brusslan, Ph.D., CSU, Long Beach

“When I used the Red Line exercise in six lab sections of my General Genetics class this Fall, it went smoothly and best of all, there was a mass “Ah-ha” moment when the results of the gene prediction programs were displayed on the Genome Browser. The use of BLASTX and BLASTN within the Red Line allowed the students to visualize the different outputs and understand the value of sequenced cDNAs for gene prediction.”