Modeling molecular evolution
Jodi Schwarz and Marc Smith, Vassar College
Biol/CS353 Bioinformatics
Biol / CS 353 Bioinformatics
• Team-taught Biol and CompSci course
• 7 students:
  – CS experience: 3 yes, 4 no
  – Bio experience: 5 yes, 2 no
• Project-based course; no exams
• Worked in Biol/CS pairs on projects
• I3U near end of course; last project before independent research projects
Common approach for all projects
• Biological question
• Algorithm design
  – Step-by-step approach to complete a task or solve the problem
• Implementation
  – The actual programming “script” that will carry out the steps of the algorithm
• Evaluation of implementation and algorithm
• Revision or augmentation
I3U: added an experimental component to our basic approach
• Previous projects focused on pattern finding, mining whole genome data
Goal of I3U:
• Model a biological/evolutionary process
• Test the model with empirical data
• Perform computational experiments
Model molecular evolution
• Step 1: Model the effect of random vs targeted nucleotide substitutions on a protein sequence
  – What do we mean by random?
  – Determine the similarity of the original protein sequence to the “evolved” sequence
• Step 2: Assess the real nt diversity at positions 1, 2, 3 of codons in real homologs (HSP70)
  – Construct alignment of homologs and determine nt diversity at each position
• Evaluate the models using the empirical data
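Step 1 can be sketched in a few lines of code. The course scripts were written in Perl and are not reproduced here; the Python below is only an illustrative stand-in. It assumes that "random" means substitution sites drawn uniformly without replacement, with each replacement base drawn uniformly from the three other nucleotides, and that "targeted" means the same draw restricted to third codon positions. The function names, the toy sequence, and the per-residue identity measure are all hypothetical choices, not the instructors' or students' method.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA string; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def mutate(dna, n_subs, rng, third_only=False):
    """Substitute n_subs distinct sites (assumption: uniform choice of site and base)."""
    seq = list(dna)
    sites = [i for i in range(len(seq)) if not third_only or i % 3 == 2]
    for i in rng.sample(sites, n_subs):
        seq[i] = rng.choice([b for b in BASES if b != seq[i]])
    return "".join(seq)


def identity(a, b):
    """Fraction of positions at which two equal-length proteins match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_identity(dna, n_subs, runs=100, third_only=False, seed=0):
    """Average ancestral-vs-evolved protein identity over repeated runs."""
    rng = random.Random(seed)
    ancestral = translate(dna)
    scores = [
        identity(ancestral, translate(mutate(dna, n_subs, rng, third_only)))
        for _ in range(runs)
    ]
    return sum(scores) / runs


if __name__ == "__main__":
    toy = "ATGGCCAAGCTGGAGTTCGACCTG"  # hypothetical toy sequence, not HSP70
    print("random:  ", mean_identity(toy, 5))
    print("targeted:", mean_identity(toy, 5, third_only=True))
```

Fixing the seed makes a computational experiment repeatable, which helps when pairs compare results across revisions of the model.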
Learning goals
• CS students: To apply their knowledge of data structures and algorithms to a biological domain
• Biology students: To apply their knowledge of the biology to design algorithms
• For the collaboration:
  – To become familiar with modeling a biological process: a simple model must be constructed and tested first
  – To test the model using empirical data
Assessment
• Assignments
  – Alignment assignment
  – 2 Perl scripts
    • Model random vs targeted substitution pattern
    • Determine the codon nt diversity in HSP70 genes
  – Output from the 2 Perl scripts
    • Raw output
    • Graphs summarizing data
• Observation
  – Collaboration
  – Critical thinking
Example student results
Effect of random vs targeted substitutions on a protein sequence (compared the “ancestral” sequence to the “evolved” sequence)
[Figure: two panels, “Random substitutions” and “Substitutions targeted to 3rd position”; 100 runs]
Example student results of empirical data
Average diversity by nucleotide position within codons:
Codon position 1: 1.50
Codon position 2: 1.29
Codon position 3: 2.32
Most variation occurs in position 3
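A per-position average like the one above could be computed along the following lines. The slides do not define "diversity", so this Python sketch assumes one plausible reading: the number of distinct nucleotides observed in an alignment column, averaged over all columns at each codon position. It also assumes the alignment is in frame (column 0 is codon position 1) and simply skips any column containing a gap. The function name and the toy alignment are hypothetical.

```python
def codon_position_diversity(aligned_seqs):
    """Mean count of distinct nucleotides per column, for codon positions 1, 2, 3.

    Assumptions: sequences are equal-length, in frame, and gapped columns
    (containing '-') are skipped rather than counted.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")

    totals = [0, 0, 0]
    counts = [0, 0, 0]
    for col in range(length):
        column = {s[col].upper() for s in aligned_seqs}
        if "-" in column:
            continue
        pos = col % 3  # 0, 1, 2 correspond to codon positions 1, 2, 3
        totals[pos] += len(column)
        counts[pos] += 1
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]


if __name__ == "__main__":
    toy_alignment = ["ATGGCT", "ATGGCC", "ATGGCA"]  # hypothetical, not HSP70
    print(codon_position_diversity(toy_alignment))  # [1.0, 1.0, 2.0]
```

Under this definition every value lies between 1 (fully conserved) and 4 (all four nucleotides present), which is compatible with the reported averages but does not confirm that the students used the same measure.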
Collaboration across disciplines
• How we tried to teach collaboration:
  – We defined the meaning of collaboration
    • CS students do not need to become biologists and vice versa
    • Each person contributes a different set of expertise
    • Learning how to speak each other’s language
    • Communication
  – We modeled it
    • Overt reliance on each other’s expertise
    • Spontaneous discussions
  – Giving students lots of experience collaborating: several shifts in pairs over the semester
Assessment of collaboration
Attitude: reluctant vs eager
At beginning (self) vs. during project (experience)

Gradational Assessment of Collaboration

Score  Self       Experience
0      reluctant  avoided
1      eager      problems
2      reluctant  positive
3      eager      positive

Student  Score
A        0
B        1
C        2
D        3
E        3
F        3
G        3

Teams  Team Score
A+C    2
B+F    4
E+G    6
D      3 (D worked alone)
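The team scores listed above match the sum of each team's individual student scores (for example, A + C = 0 + 2 = 2). The slides do not state this rule explicitly, so the short Python sketch below treats it as an assumption inferred from the numbers; the variable and function names are hypothetical.

```python
# Individual collaboration scores as listed on the slide.
student_scores = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 3, "F": 3, "G": 3}

# Team membership as listed on the slide; D worked alone.
teams = [("A", "C"), ("B", "F"), ("E", "G"), ("D",)]


def team_score(members, scores):
    """Assumed rule: a team's score is the sum of its members' scores."""
    return sum(scores[m] for m in members)


team_scores = {"+".join(t): team_score(t, student_scores) for t in teams}
print(team_scores)  # {'A+C': 2, 'B+F': 4, 'E+G': 6, 'D': 3}
```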
1. How a genomics approach crosses levels of biological organization
2. How genomic-level science is conducted
3. How computational approaches are deployed to answer genomic questions
4. How to find potential functional/evolutionary patterns in DNA/protein sequence
5. Independently use bioinformatic tools to address biological/genomic questions
6. Examine the output of a bioinformatic analysis and relate it to a biological question
7. Provide one or more clear examples of how genomics uses an interdisciplinary approach
Most improvement: questions that are explicitly bioinformatic
Least: questions that are more broadly about genomics (CS)
Likert scale (1-5)
What worked well
• Overall approach was great: question, algorithm, implementation, analysis, iteration
• Use of starter code allowed students to
  – Undertake much more sophisticated projects
  – See examples of more advanced algorithms/code
• Encountering unanticipated results and problems
  – Gaps in alignments not in groups of 3
  – Spontaneous discussions leading to AHA moments
• Students enjoyed the modeling process
  – One student’s final project focused on modeling molecular evolution
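The "gaps not in groups of 3" problem matters because a gap run whose length is not a multiple of 3 shifts the reading frame, so column index no longer maps cleanly to codon position. A minimal Python sketch for flagging such runs in one aligned sequence follows; it assumes gaps are written as `-`, and the function name is hypothetical.

```python
import re


def off_frame_gaps(aligned_seq):
    """Return (start_index, run_length) for each gap run not divisible by 3."""
    return [
        (m.start(), len(m.group()))
        for m in re.finditer(r"-+", aligned_seq)
        if len(m.group()) % 3 != 0
    ]


if __name__ == "__main__":
    print(off_frame_gaps("ATG---GCT"))   # []
    print(off_frame_gaps("ATG--GCTA-"))  # [(3, 2), (9, 1)]
```

An empty list means every gap run preserves the frame; any reported run is a place where a per-codon-position analysis would need realignment or special handling.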
What didn’t work as well
• Some collaborations are not successful
• Ran out of time: insufficient analysis and reflection
• For the I3U: Assessment strategy not well developed
  – Can we retroactively extract more informative assessment?
Assessing biology knowledge
• Algorithm development
  – Ability to help partner understand the difference between mutation and selection
  – Ability to recognize assumptions of model
  – Ability to use the empirical data to evaluate model
Assessing the CS
• Variables
  – Abstraction: representing information as data
  – Types of data: predefined, atomic, aggregate
  – Scope: declaration, initialization, mutation
• Algorithms
  – Control flow: unconditional, conditional, repetition
  – Input/Output and regex (pattern matching)
  – Top-down design: subroutines
  – To reuse or not to reuse (code)?
• Incremental development / experimentation
• Elegance: readability and maintainability
• Biological question
  – What pattern of nucleotide substitution occurs in protein-coding genes?
• Algorithm
  – What do we know about mutation, nt/AA sequences?
  – Assumptions
• Implementation
  – Instructors provided “starter code”
  – Students read and ran the code to see what it did
  – Pairs discussed how to add to and refine it, and did so
• Evaluation
  – Analyze the CS: Did it run and did it do the job we asked?
  – Analyze the biology: Did it accurately represent the biological process?
• Testing the models against empirical evidence
  – Aligned HSP70 genes and evaluated the pattern of substitution
• Which model most closely matched the biology?