
Post on 26-Jun-2020


Discrete Modeling, Discovery and Prediction for Evolving, Living Systems

Myra B. Cohen1, Nicole R. Buan2, Christine Kelley3, Mikaela Cashman1, Jennie L. Catlett2

1. Department of Computer Science & Engineering 2. Department of Biochemistry 3. Department of Mathematics

Motivation

Green Energy vs. Petroleum-based Fuels

Methane-producing archaea (methanogens)

•  Phylogenetically distinct group
•  Derive all their energy from reduction of C1 compounds to methane

•  4% of the global C cycle (2 Gigatons per year)*

•  Strict anaerobes

* Thauer RK. et al. 2008. Microbiology. 6:579-591.

Global C cycle

Methanogen Biotechnology

www.spaceX.com

www.sagentpharma.com

WHO Essential Medicine ~50% all chemotherapy

www.fordcngokc.com

www.mineralhq.com

Transportation Cleaner than diesel

Methanogen Biotechnology

Deer Island, MA Hyperion, CA

Lincoln, NE

Biomass Energy - Nebraska

Aliens Among Us

Adapted from Pace, NR. 2009. MMBR. 73(4):565-76

Humans

E. coli

methanogens

A Tale of two pathways… Methylotrophic Acetoclastic

Entropy-retarded Enthalpy-retarded

We can control behavior

Typical Organism Behaviors

(e.g., E. coli)

First-principles reasoning?
•  Methanogens are ruled by:
– Thermodynamics and biochemistry, information processing, regulation, selection, mutation, etc.
•  To date, no general set of equations describes behavior and evolution in a way that
– Applies equally well to methanogens, bacteria, and eukaryotes

Dynamic
•  Organisms reproduce with ~99.999% probability of genetic information being passed to next generation
•  Mutations occur which can change gene functionality
•  Environment impacts the behavior:
–  Food sources
–  Light
–  Temperature
–  Pressure
–  …?

Data Driven
•  As these organisms grow/die within their environment, they both sense the environment and exchange messages (communicate) with other organisms in their vicinity
•  Based on what they sense they produce outputs (e.g. methane)

Models Today

Chemical Reaction Networks

Reaction Networks
•  Allow us to model the chemical reactions (as PDEs) through a cell
•  Based on the “whole cell model”

Physical Models
•  Flux balance analysis:
–  Optimization algorithm that solves the system of reaction equations to calculate the steady-state fluxes of an organism’s reaction network
–  Can be used to predict biomass based on inputs
•  Gapfilling:
–  Incomplete models may have incomplete networks and will not grow. Gapfilling fills in missing reaction pathways using mixed-integer linear programming
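Flux balance analysis is commonly written as a linear program. The sketch below uses standard textbook notation that does not appear on the slides: S is assumed to be the stoichiometric matrix, v the vector of reaction fluxes, c the objective weights (for example, selecting the biomass reaction), and l, u the lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S v = 0, \\
& l \le v \le u .
\end{aligned}
```

The constraint S v = 0 is the steady-state condition: every internal metabolite is produced and consumed at the same rate. Changing the upper bounds on uptake reactions is one way the growth medium could enter such a model.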

Problems with Existing Models

•  Highly dependent on human annotations from empirical data

•  Infer unknown behavior from organisms that are annotated

•  Complex – difficult to reason about high level behavior

Variance of Pathways

Lieber, Catlett, Madayiputhiya, Nandukumar, Lopez, Metcalf and Buan. 2014. PLOS One. 9(9): e107563.

Application Systems

Lieber, Catlett, Madayiputhiya, Nandukumar, Lopez, Metcalf and Buan. 2014. PLOS One. 9(9): e107563.

Organisms sense, adapt

Use DDDAS?

Software (Discrete) Testing Perspective

Configurable Software

Discrete/Model Sampling

Observe Behavior

Optimize Parameters for an objective

Pierobon, Cohen, Buan, Kelley, SCIM: Sampling, Characterization, Inference and Modeling of Biological Consortia, 2015

Methanogen Configuration Options

•  Media compounds (e.g. glucose)
•  Light
•  Pressure
•  Temp
•  Oxygen

Use discrete values for sampling

Reasoning about Configurations with Coding Theory

Error correcting codes: transmit information reliably and efficiently across space/time

Factor graphs
•  Variable nodes represent information
•  Constraint nodes represent constraints/dependencies
•  Decoding and error-correction is performed via message passing on the edges of the graph.
•  Update rules of the messages at the nodes follow the belief propagation algorithm on Bayesian networks

Factor Graph

•  The input (i.e., “channel information”) to each variable node is a vector with n parameters (one for each factor)

•  Update rules are designed for each factor, and iterative decoding is performed to determine how the system behaves for various inputs

•  We can test how the system changes with modifications to certain factors

[Figure: factor graph with factor nodes f1, f2, f3, f4 and variable nodes x1, x2, x3, x4, x5, x6]

f(x1,x2,x3,x4,x5,x6) = f1(x1,x3,x6) f2(x2,x4) f3(x1,x5) f4(x3,x5)
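To make the factorization concrete, here is a minimal Python sketch that evaluates f as a product of local factors and computes an exact marginal by brute-force enumeration. The factor scopes come from the formula above; the binary levels and the `agree` factor are hypothetical placeholders, not the update rules used by the authors. Note that this particular graph contains a cycle (x1, f1, x3, f4, x5, f3), so message passing on it would in general only approximate the marginal computed here.

```python
from itertools import product

# Scopes copied from the factorization above (variables are 1-indexed).
SCOPES = {
    "f1": (1, 3, 6),
    "f2": (2, 4),
    "f3": (1, 5),
    "f4": (3, 5),
}


def agree(*values):
    """Hypothetical local factor: weight 2.0 if all arguments are equal, else 1.0."""
    return 2.0 if len(set(values)) == 1 else 1.0


# Assumption for illustration: every factor uses the same placeholder rule.
FACTORS = {name: agree for name in SCOPES}


def joint(x, factors=FACTORS):
    """Evaluate f(x1..x6) as the product of the local factors."""
    value = 1.0
    for name, scope in SCOPES.items():
        value *= factors[name](*(x[i - 1] for i in scope))
    return value


def marginal(var, factors=FACTORS, levels=(0, 1)):
    """Exact normalized marginal of one variable, summing over all configurations."""
    totals = {level: 0.0 for level in levels}
    for x in product(levels, repeat=6):
        totals[x[var - 1]] += joint(x, factors)
    z = sum(totals.values())
    return {level: total / z for level, total in totals.items()}


if __name__ == "__main__":
    print(marginal(1))
```

Brute force is only practical for a handful of variables (here 2^6 configurations); the appeal of message passing is that it works with the local factors instead of the full joint.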

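[Figure: messages at variable node x1, showing the channel input in_x1, the incoming messages ρ_{f1→x1} and ρ_{f3→x1}, and the outgoing message μ_{x1→f1}]

Configurations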
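For reference, the textbook sum-product updates at x1 can be written with the same symbols (μ for variable-to-factor messages, ρ for factor-to-variable messages, in for the channel input). This is the standard form and is only an assumption about what the slide depicts; the slides state that update rules are designed per factor, so the authors' rules may differ.

```latex
\begin{aligned}
\mu_{x_1 \to f_1}(x_1) &= \mathrm{in}_{x_1}(x_1)\,\rho_{f_3 \to x_1}(x_1), \\
\rho_{f_1 \to x_1}(x_1) &= \sum_{x_3,\,x_6} f_1(x_1, x_3, x_6)\,
    \mu_{x_3 \to f_1}(x_3)\,\mu_{x_6 \to f_1}(x_6).
\end{aligned}
```

The outgoing message from x1 to f1 combines the channel input with the messages from every neighboring factor except f1 itself; in the factorization f = f1 f2 f3 f4, the only other neighbor of x1 is f3.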

[Figure: genetic algorithm over configurations. A population of n configurations (1, 2, 3, …, n) is scored by fitness (methane/flux) and ordered by fitness; crossover and mutation then produce the next population.]
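The population, crossover and mutation steps can be sketched as a small genetic algorithm in Python. Everything specific here is an assumption made for illustration: `toy_fitness` is a stand-in for the methane/flux score (a real run would query a metabolic model or lab data), and the population size, mutation rate, one-point crossover and elitism are arbitrary choices, not the authors' settings.

```python
import math
import random

LEVELS = (1, 10, 100)  # discrete max-flux levels (assumed sampling values)
N_FACTORS = 6          # number of configuration options being varied


def toy_fitness(config):
    """Hypothetical stand-in for methane/flux: rewards higher flux levels."""
    return sum(math.log10(v) for v in config)


def evolve(fitness, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Return (best configuration, best fitness seen at the start of each generation)."""
    rng = random.Random(seed)
    population = [
        tuple(rng.choice(LEVELS) for _ in range(N_FACTORS))
        for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        # Rank the population by fitness, best first.
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]  # elitism: carry the best configuration over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, N_FACTORS)  # one-point crossover
            child = list(a[:cut] + b[cut:])
            for i in range(N_FACTORS):
                if rng.random() < mutation_rate:  # point mutation
                    child[i] = rng.choice(LEVELS)
            children.append(tuple(child))
        population = children
    return max(population, key=fitness), history


if __name__ == "__main__":
    best, history = evolve(toy_fitness)
    print(best, history[-1])
```

Because the best configuration is always carried into the next generation, the recorded best fitness never decreases; whether the search reaches the global optimum depends on the fitness landscape and the settings.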

DDDAS System

Sensors Evolution/Adaptation

Simulation/updating of models

Feasibility

Goals
•  Evaluate models for optimization
•  Use a well studied methanogen
– Methanosarcina acetivorans
•  Explore a part of configuration space contained in KBase
•  Understand how well current models describe the organism

Exploring Environment
•  Iteration One (729 data points)
– 12 compounds in growth media: H2O, Phosphate, CO2, NH3, Acetate, Sulfate, H+, L-Cysteine, Co2+, Ni2+, Fe2+, H2
– Vary max flux for 6 of them (3 different flux values each, 3^6 = 729)
•  Iteration Two (2187 data points)
– Two compounds had no impact and were made constant; 3 more were added, giving 7 factors (3^7 = 2187)
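The data-point counts follow from a full factorial design: every combination of three flux levels across the varied compounds. A minimal Python sketch is below. The levels 1, 10 and 100 are the flux values that appear in the results; the factor names are placeholders, since the slides do not say which six compounds were varied in iteration one.

```python
from itertools import product

FLUX_LEVELS = (1, 10, 100)  # max-flux values that appear in the results


def full_factorial(factors, levels=FLUX_LEVELS):
    """Yield one dict per configuration, covering every combination of levels."""
    for combo in product(levels, repeat=len(factors)):
        yield dict(zip(factors, combo))


# Placeholder names: the slides do not list which compounds were varied.
iteration_one = [f"factor_{i}" for i in range(1, 7)]  # 6 factors
iteration_two = [f"factor_{i}" for i in range(1, 8)]  # 7 factors

if __name__ == "__main__":
    print(sum(1 for _ in full_factorial(iteration_one)))  # 3**6 configurations
    print(sum(1 for _ in full_factorial(iteration_two)))  # 3**7 configurations
```

Each yielded dictionary could be applied as a set of max-flux bounds before running the model once, which is why the number of data points grows as 3 to the power of the number of factors.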

Results (iteration 1)

[Figure: decision tree splitting on Phosphate and L-Cysteine max flux, with branches Flux=1 and Flux=10 or 100, and leaf values 1.2, 4.6 and 5.1]

Results (iteration 2)

[Figure: decision tree splitting on Acetate, Phosphate, L-Cysteine and CO2 max flux, with branches Flux=1, Flux=10, Flux=100 and Flux=10 or 100, and leaf values .05, .5, 1.2, 4.6 and 5.1]

But
•  We know the models are not perfect
•  Still need laboratory data

Next Iterations
•  Drill down on the four primary factors:
– Acetate, Phosphate, L-Cysteine and CO2
•  Use smaller flux distances
•  Run genetic algorithm on a large number of flux values and more compounds
•  Validate results in lab and update model

Summary
•  View biological organisms as part of a DDDA system
•  Developing techniques for discrete sampling/modeling of their configuration space
•  Developing optimization techniques to fit into the DDDAS loop

References
1.  Thauer RK. et al. 2008. Microbiology. 6:579-591
2.  Pace, NR. 2009. MMBR. 73(4):565-76
3.  Lieber, Catlett, Madayiputhiya, Nandukumar, Lopez, Metcalf and Buan. 2014. PLOS One. 9(9): e107563
4.  Pierobon, Cohen, Buan, Kelley, SCIM: Sampling, Characterization, Inference and Modeling of Biological Consortia, 2015
5.  J. Swanson, M.B. Cohen, M.B. Dwyer, B.J. Garvin and J. Firestone, Beyond the Rainbow: Self-Adaptive Failure Avoidance in Configurable Systems, Foundations of Software Engineering, 2014, pp. 377-388

Acknowledgements

CCF-1161767 CNS-1205472 IOS-1449525

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies
