Modeling molecular evolution

Jodi Schwarz and Marc Smith, Vassar College

Biol/CS353 Bioinformatics

Biol / CS 353 Bioinformatics
• Team-taught Biol and CompSci course
• 7 students:
– CS experience: 3 yes, 4 no
– Bio experience: 5 yes, 2 no
• Project-based course; no exams
• Worked in Biol/CS pairs on projects
• I3U near end of course; last project before independent research projects

Common approach for all projects
• Biological question
• Algorithm design
– Step-by-step approach to complete a task or solve the problem
• Implementation
– The actual programming “script” that will carry out the steps of the algorithm
• Evaluation of implementation and algorithm
• Revision or augmentation

I3U: added an experimental component to our basic approach

• Previous projects focused on pattern finding, mining whole genome data

Goal of I3U:
• Model a biological/evolutionary process
• Test the model with empirical data
• Perform computational experiments

Model molecular evolution
• Step 1: Model the effect of random vs targeted nucleotide substitutions on a protein sequence
– What do we mean by random?
– Determine the similarity of the original protein sequence to the “evolved” sequence
• Step 2: Assess the real nt diversity at positions 1, 2, 3 of codons in real homologs (HSP70)
– Construct alignment of homologs and determine nt diversity at each position

• Evaluate the models using the empirical data
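
Step 1 can be sketched in a few lines of code. The course scripts were written in Perl and are not reproduced here; the Python below is only an illustrative sketch under stated assumptions. "Random" is taken to mean that every nucleotide site is equally likely to be hit, "targeted" means only third codon positions are hit, and "similarity" is percent amino-acid identity between the translated ancestral and evolved sequences. The names `TOY_GENE`, `mutate`, `percent_identity`, and `mean_identity` are hypothetical, and the toy sequence is invented for the example (it is not HSP70).

```python
import random

# Standard genetic code, indexed by the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Hypothetical toy "ancestral" gene made only of fourfold-degenerate codons
# (Ala, Gly, Pro, Val, Leu, Thr, Ala), so any third-position change is silent.
TOY_GENE = "GCTGGTCCAGTTCTGACCGCA"


def translate(dna):
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    usable = len(dna) // 3 * 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def mutate(dna, n_subs, sites, rng):
    """Apply n_subs substitutions at sites drawn (with replacement) from sites."""
    seq = list(dna)
    for _ in range(n_subs):
        site = rng.choice(sites)
        # Always change to a different base, so each hit is a real substitution.
        seq[site] = rng.choice([b for b in BASES if b != seq[site]])
    return "".join(seq)


def percent_identity(a, b):
    """Percentage of positions at which two equal-length strings agree."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def mean_identity(dna, n_subs, targeted, runs=100, seed=0):
    """Average protein identity (ancestral vs evolved) over repeated runs."""
    rng = random.Random(seed)
    # Targeted model: third codon positions only. Random model: every site.
    sites = [i for i in range(len(dna)) if not targeted or i % 3 == 2]
    ancestral = translate(dna)
    total = 0.0
    for _ in range(runs):
        evolved = translate(mutate(dna, n_subs, sites, rng))
        total += percent_identity(ancestral, evolved)
    return total / runs


if __name__ == "__main__":
    print("random  :", mean_identity(TOY_GENE, 8, targeted=False))
    print("targeted:", mean_identity(TOY_GENE, 8, targeted=True))
```

Because the toy gene uses only fourfold-degenerate codons, the targeted model leaves the protein unchanged, while the random model degrades it. A real gene would show a smaller but still visible gap between the two models.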

Learning goals
• CS students: To apply their knowledge of data structures and algorithms to a biological domain
• Biology students: To apply their knowledge of the biology to design algorithms
• For the collaboration:
– To become familiar with modeling a biological process: a simple model must be constructed and tested first
– To test the model using empirical data

Assessment
• Assignments
– Alignment assignment
– 2 Perl scripts
  • Model random vs targeted substitution pattern
  • Determine the codon nt diversity in HSP70 genes
– Output from the 2 Perl scripts
  • Raw output
  • Graphs summarizing data
• Observation
– Collaboration
– Critical thinking

Example student results
Effect of random vs targeted substitutions on a protein sequence (compared the “ancestral” sequence to the “evolved” sequence)

[Figure: two panels, “Random substitutions” and “Substitutions targeted to 3rd position”; 100 runs]

Example student results of empirical data

Average diversity by nucleotide position within codons:

Codon position 1: 1.50
Codon position 2: 1.29
Codon position 3: 2.32

Most variation occurs in position 3
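
One way such per-position averages could be computed is sketched below. The slides do not define "diversity", so this sketch assumes it means the number of distinct nucleotides observed in an alignment column (1 to 4), averaged separately over columns at codon positions 1, 2, and 3. It also assumes the alignment starts in frame. The function names and the four-sequence toy alignment are hypothetical; they are not the students' Perl script or the HSP70 data.

```python
def column_diversity(column):
    """Number of distinct nucleotides in one alignment column, ignoring gaps."""
    return len({nt for nt in column.upper() if nt in "ACGT"})


def diversity_by_codon_position(aligned_seqs):
    """Average column diversity for codon positions 1, 2 and 3.

    Assumes column 0 of the alignment is codon position 1 (in frame).
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")
    totals = [0, 0, 0]
    counts = [0, 0, 0]
    for i in range(length):
        d = column_diversity("".join(s[i] for s in aligned_seqs))
        if d == 0:  # column contains only gaps: nothing to measure
            continue
        totals[i % 3] += d
        counts[i % 3] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


if __name__ == "__main__":
    # Hypothetical toy alignment of four in-frame sequences.
    toy_alignment = ["ATGGCTGGT",
                     "ATGGCCGGA",
                     "ATGGCAGGT",
                     "ATGTCGGGC"]
    for pos, value in enumerate(diversity_by_codon_position(toy_alignment), 1):
        print(f"Codon position {pos}: {value:.2f}")
```

On this toy alignment, as in the student results, the third position comes out as the most variable.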

Collaboration across disciplines
• How we tried to teach collaboration:
– We defined the meaning of collaboration
  • CS students do not need to become biologists and vice versa
  • Each person contributes a different set of expertise
  • Learning how to speak each other’s language
  • Communication
– We modeled it
  • Overt reliance on each other’s expertise
  • Spontaneous discussions
– Giving students lots of experience collaborating: several shifts in pairs over the semester

Assessment of collaboration
Attitude: reluctant vs eager

At beginning (self) vs. during project (experience)

Gradational Assessment of Collaboration

Score  Self       Experience
0      reluctant  avoided
1      eager      problems
2      reluctant  positive
3      eager      positive

Student  Score
A        0
B        1
C        2
D        3
E        3
F        3
G        3

Team Score  Team
2           A+C
4           B+F
6           E+G
3           D worked alone

1. how a genomics approach crosses levels of biological organization
2. how genomic-level science is conducted
3. how computational approaches are deployed to answer genomic questions
4. how to find potential functional/evolutionary patterns in DNA/protein sequence
5. independently use bioinformatic tools to address biological/genomic questions
6. examine the output of a bioinformatic analysis and relate it to a biological question
7. provide one or more clear examples of how genomics uses an interdisciplinary approach

Most improvement: questions that are explicitly bioinformatic
Least: questions that are more broadly about genomics (CS)

Likert scale (1-5)

What worked well

• Overall approach was great: question, algorithm, implementation, analysis, iteration

• Use of starter code allowed students to
– Undertake much more sophisticated projects
– See examples of more advanced algorithm/code

• Encountering unanticipated results and problems
– Gaps in alignments not in groups of 3
– Spontaneous discussions leading to AHA moments

• Students enjoyed the modeling process
– One student’s final project focused on modeling molecular evolution
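
The alignment problem noted above (gaps not in groups of 3) matters because a gap whose length is not a multiple of 3 shifts the reading frame, so downstream columns no longer correspond to the same codon position. A minimal check for this might look like the sketch below. It assumes gaps are written as "-", and `frame_breaking_gaps` is a hypothetical helper name, not part of the course's starter code.

```python
import re


def frame_breaking_gaps(aligned_seq):
    """Return (start, length) for each gap run whose length is not a multiple of 3."""
    return [(m.start(), len(m.group()))
            for m in re.finditer(r"-+", aligned_seq)
            if len(m.group()) % 3 != 0]


if __name__ == "__main__":
    # Hypothetical aligned sequence: one in-frame gap, two frame-breaking gaps.
    print(frame_breaking_gaps("ATG---GCT--GGT-"))
```

Running a check like this on every sequence before computing per-position statistics flags the alignments that need to be redone or trimmed.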

What didn’t work as well
• Some collaborations are not successful

• Ran out of time: insufficient analysis and reflection

• For the I3U: Assessment strategy not well developed
– Can we retroactively extract more informative assessment?

Assessing biology knowledge

• Algorithm development
– Ability to help partner understand the difference between mutation and selection
– Ability to recognize assumptions of model
– Ability to use the empirical data to evaluate model

Assessing the CS
• Variables
– Abstraction: representing information as data
– Types of data: predefined, atomic, aggregate
– Scope: declaration, initialization, mutation
• Algorithms
– Control flow: unconditional, conditional, repetition
– Input/Output and regex (pattern matching)
– Top-down design: subroutines
– To reuse or not to reuse (code)?
• Incremental development / experimentation
• Elegance: readability and maintainability

• Biological question
– What pattern of nucleotide substitution occurs in protein-coding genes?

• Algorithm
– What do we know about mutation, nt/AA sequences?
– Assumptions

• Implementation
– Instructors provided “starter code”
– Students read and ran the code to see what it did
– Pairs discussed how to add to and refine it, and did so

• Evaluation
– Analyze the CS: Did it run and did it do the job we asked?
– Analyze the biology: Did it accurately represent the biological process?

• Testing the models against empirical evidence
– Aligned HSP70 genes and evaluated the pattern of substitution

• Which model most closely matched the biology?
