CSC 599: Computational Scientific Discovery

Lecture 10: The Scienceomatic Systems:

Deductive Reasoning and Model Design

Outline

History Trajectory Object
    Application: Exhaustive Search

Explanation Usage Object
    Application: Explanation Preference

Assertion Reasoning Object
    Application: Simulation

What Scientists Want

Tell me:
    “What is the prediction?”
    “How shall I improve this model?”
    “How did you get that answer?”
    “What shall we do next?”
    “What's the precision of that value?”

User sees one interface, but:
    Assertion Reasoning Object answers “Predict value of X at time Y”
    Explanation Usage Object answers “Why is value of X at time Y equal to Z?”
    History Trajectory Object answers “What is good to try next?”

Recall

History Trajectory Object

Does several things:
1. Decides which operator to do next based on:
    a) How successful they have been (operator id)
    b) Type of data (data id)
    c) Tactics
    d) Strategy
2. Keeps track of what's been tried before:
    a) operator/data
    b) success/failure
    c) “by how much”
    d) who/when/why/etc.
3. Modifiable:
    a) Learns the best operators for given data
    b) PROGRAMMABLE?!? (Under these conditions create an operator that does this . . .)

History Trajectory Object: Application 1: Systematic Search

Exhaustive Search: simplest to more complex
    Plays to strengths of computers
    Examples:
        MECHEM
        Inductive Process Modeling

History Trajectory Object can do this:
1. Queries Assertion Usage Object for assertions
2. Manipulates assertions to build more complex ones

History Trajectory Object: Systematic Search (2)

Idea:
    There are 3 processes total.
    Two (leaf1 and leaf2) are primitive.
    The third (node2) can be made arbitrarily more complex by taking a type1 process as 1st parameter and a type2 process as 2nd.
    leaf1 is type1; leaf2 and node2 are type2.

Program (in Prolog):
    process1(leaf1).
    process2(leaf2).
    process2(node2(P1,P2)) :- process1(P1), process2(P2).

History Trajectory Object: Systematic Search (3)

Program generates process templates
    “Template” means model of same structure; parameters have not yet been computed (by calculus, simulated annealing, etc.)
    Generated from simplest -> increasingly more complex

Output:
    ?- process2(A).
    A = leaf2 ;
    A = node2(leaf1, leaf2) ;
    A = node2(leaf1, node2(leaf1, leaf2)) ;
    A = node2(leaf1, node2(leaf1, node2(leaf1, leaf2)))
    etc.
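The same simplest-first enumeration can be sketched in Python. This is only an illustration of the idea, not Scienceomatic code: the names TYPE1_PROCESSES, process2_at_depth and process2 are made up here, and templates are represented as plain strings.

```python
from itertools import count, islice

# Assumption: leaf1 is the only type1 process, as in the Prolog program.
TYPE1_PROCESSES = ["leaf1"]

def process2_at_depth(depth):
    """Return every type2 template with exactly `depth` nested node2 layers."""
    if depth == 0:
        return ["leaf2"]
    return [f"node2({p1}, {p2})"
            for p1 in TYPE1_PROCESSES
            for p2 in process2_at_depth(depth - 1)]

def process2():
    """Yield type2 templates from simplest to increasingly complex."""
    for depth in count():
        yield from process2_at_depth(depth)

first_three = list(islice(process2(), 3))
print(first_three)
```

Because the generator is lazy, a caller can stop as soon as a template fits the data well enough, which is the point of searching from simple to complex.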

History Trajectory Object: Implementation

This algorithm is fundamentally the same as MECHEM and Inductive Process Modeling (IPM)
    One algorithm that generates model templates!
    Calls numeric program to do parameter fitting
    Can do both MECHEM and IPM!

Making it efficient
    Like MECHEM, rely on domain knowledge:
        “Reactions that form pure Carbon are unlikely”
    Like IPM, rely on object type information:
        “Rabbits are prey, coyotes are predators or prey”

Implement in:
    Prolog? (Scienceomatic)
    Lisp? (MECHEM, IPM re-write)
    ML? Haskell? Anything else?

Explanation Usage Object

Sample of important methods:
    Predict object1's attribute attribute1
        Satisfy with assertion usage object
        Satisfy with solved problem library
            Philosophy of science justification: Kuhnian exemplar (what scientists do)
            Artificial Intelligence justification: EBL (explanation-based learning), cheaper than de novo reasoning
    Give trace why object object1's attribute attribute1 is value value1.
    Give trace how assertion assertion1 is justified (e.g. derived)
    Refine reasoning method
        “I like traces like this over traces like that because . . .”

Explanation Usage Object: Application 2: Explanation Preference

Default behavior:
    Favor shallowest explanation (more on details of this later)

Problem:
    Shallowest may not be most correct!

We've seen this before with MECHEM:
    Computer scientist's “best mechanism” means shortest syntax
    Chemist's “best mechanism” means least energetic rate-determining step

Explanation Usage Object: Explanation Preference (2)

falling_without_drag:
    F = m * g

falling_with_drag:
    F = m * g - 0.5 * ρ * v^2 * A * Cd
where:
    ρ = fluid's (e.g. air's) density
    v = velocity
    A = object's area
    Cd = drag coefficient

Explanation Usage Object: Explanation Preference (3)

Scienceomatic can compute both answers
    The falling_without_drag answer might be returned first if it uses a less deep explanation tree

Can tell Scienceomatic preferences:
    “Prefer falling_with_drag answer before falling_without_drag answer”
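A minimal Python sketch of this preference mechanism follows. The Explanation class, the rank function, the tree depths (1 and 2) and the numeric inputs are all assumptions chosen for illustration; only the two force formulas come from the slides.

```python
from dataclasses import dataclass

@dataclass
class Explanation:
    name: str
    depth: int     # depth of the explanation tree (hypothetical values below)
    force: float   # predicted net force in newtons

def falling_without_drag(m, g=9.81):
    return m * g

def falling_with_drag(m, v, rho, area, cd, g=9.81):
    return m * g - 0.5 * rho * v ** 2 * area * cd

def rank(explanations, preferred=()):
    """Order explanations: user-preferred names first, then shallowest first."""
    def key(e):
        pref = preferred.index(e.name) if e.name in preferred else len(preferred)
        return (pref, e.depth)
    return sorted(explanations, key=key)

candidates = [
    Explanation("falling_with_drag", 2, falling_with_drag(1.0, 10.0, 1.2, 0.1, 0.5)),
    Explanation("falling_without_drag", 1, falling_without_drag(1.0)),
]
default_first = rank(candidates)[0]
preferred_first = rank(candidates, preferred=("falling_with_drag",))[0]
print(default_first.name, preferred_first.name)
```

With no stated preference the shallower falling_without_drag explanation comes first; naming falling_with_drag as preferred moves it ahead regardless of depth.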

Explanation Usage Object: Implementation

An explanation data structure is a tree
    Lisp, ML, Haskell: most general
    Prolog: acceptable
        May need Prolog's “2nd order” predicates: =.., functor, arg
        Example:
            ?- f(a,b) =.. L.
            L = [f, a, b] ;
    C/C++/Java/C#: possible (of course) but may not be natural
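To make the tree idea concrete, here is a rough Python analogue of the three Prolog predicates. It assumes a term is a nested tuple whose first element is the functor name; the function names univ, functor, arg and tree_depth are illustrative only.

```python
def univ(term):
    """Analogue of Prolog's =.. : split a term into [functor, arg1, arg2, ...]."""
    return list(term) if isinstance(term, tuple) else [term]

def functor(term):
    """Return (name, arity) of a term; atoms have arity 0."""
    if isinstance(term, tuple):
        return term[0], len(term) - 1
    return term, 0

def arg(n, term):
    """Return the n-th argument of a compound term (1-based, as in Prolog)."""
    return term[n]

def tree_depth(term):
    """Depth of a term tree; atoms have depth 0."""
    if not isinstance(term, tuple):
        return 0
    return 1 + max((tree_depth(a) for a in term[1:]), default=0)

term = ("f", "a", "b")
print(univ(term), functor(term), arg(1, term))
```

A depth function like tree_depth is the kind of measure a "favor shallowest explanation" default would need.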

Your ideas?

Assertion Reasoning Object

Sample of important methods:
    Retrieve assertion assertion1
        Show assertion
        Edit assertion
    Predict object object1's attribute attribute1
        Plot these values
        Compare predicted and recorded values
    Justify (e.g. logical resolution) assertion assertion1

Large Scale Knowledge Organization

Five components:
    Definitions/Expectations/Assumptions
        “Meters measure length”
        “100 cm = 1 meter”
        “Evolution is impossible because of X, Y, Z”
    Theory
        Newton's Laws of Motion and Gravitation
    Generalization
        Johannes Kepler's Laws
    Data
        Tycho Brahe's Observations
    Analytics
        How to sum & integrate, change coord. systems, etc.

Reasoning Over Components

Which components queried -> Reasoning type
    theorize:   d/e/a, theory, general, data, analytics
    empiricize: d/e/a, data, general, theory, analytics
    ab_initio:  d/e/a, theory, analytics
    read_data:  d/e/a, data, analytics

d/e/a always first
    Enforce agreement with base assumptions
analytics always last
    Recast query to other form if all else fails
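The component orderings above can be written down as a small lookup table. The sketch below assumes each component is just a callable that returns an answer or None; the answer function and the toy components are hypothetical, and the special roles of d/e/a (enforcing agreement) and analytics (recasting the query) are not modeled.

```python
REASONING_ORDER = {
    "theorize":   ["d/e/a", "theory", "general", "data", "analytics"],
    "empiricize": ["d/e/a", "data", "general", "theory", "analytics"],
    "ab_initio":  ["d/e/a", "theory", "analytics"],
    "read_data":  ["d/e/a", "data", "analytics"],
}

def answer(query, reasoning_type, components):
    """Ask each component in order; return (component name, answer) or None."""
    for name in REASONING_ORDER[reasoning_type]:
        result = components[name](query)
        if result is not None:
            return name, result
    return None

# Toy components: each returns a canned string or None (illustrative only).
components = {
    "d/e/a": lambda q: None,
    "theory": lambda q: "computed from Newton's laws",
    "general": lambda q: "read off Kepler's laws",
    "data": lambda q: "looked up in Brahe's observations",
    "analytics": lambda q: None,
}

print(answer("position of Mars", "theorize", components))
print(answer("position of Mars", "empiricize", components))
```

The same query therefore gets a theory-based answer under theorize and a data-based answer under empiricize, purely because of the order in which components are consulted.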

Each Component

Inherited Knowledge
    Birds fly, but penguins don't fly

Dynamic Knowledge (processes)
    For falling things:
        a = F/m = g
        v = v0 + g*t
        height = h0 + v0*t + (1/2)*g*t^2

Static Knowledge
    For homogeneous gases:
        PV = nRT

Inherited Knowledge

Works with ontology
    is_a(bird,animal).
    inherit(bird,can_fly_attr,true).

    is_a(penguin,bird).
    inherit(penguin,can_fly_attr,false).

    instance_of(tweety,bird).
    instance_of(opus,penguin).

Deduce:
    “tweety can fly”
    “opus cannot fly”

Inherited Knowledge (2)

Works with ontology, cont'd
    Remove
        instance_of(opus,penguin).
    Add
        is_a(penguin_with_pilots_license, penguin).
        inherit(penguin_with_pilots_license, can_fly_attr, true).
        instance_of(opus,penguin_with_pilots_license).

Deduce:
    “opus can fly”
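One way to read the ontology facts is "the most specific class that says anything about the attribute wins". The Python sketch below encodes that reading; the dictionaries and the lookup function are illustrative, and it assumes the false value for can_fly_attr is attached to penguin.

```python
# is_a: class -> parent class
IS_A = {
    "bird": "animal",
    "penguin": "bird",
    "penguin_with_pilots_license": "penguin",
}

# inherit: (class, attribute) -> value
INHERIT = {
    ("bird", "can_fly_attr"): True,
    ("penguin", "can_fly_attr"): False,
    ("penguin_with_pilots_license", "can_fly_attr"): True,
}

def lookup(instance_of, individual, attr):
    """Walk up the is_a chain; the most specific class with a value wins."""
    cls = instance_of[individual]
    while cls is not None:
        if (cls, attr) in INHERIT:
            return INHERIT[(cls, attr)]
        cls = IS_A.get(cls)
    return None

before = {"tweety": "bird", "opus": "penguin"}
after = {"tweety": "bird", "opus": "penguin_with_pilots_license"}

print(lookup(before, "tweety", "can_fly_attr"))
print(lookup(before, "opus", "can_fly_attr"))
print(lookup(after, "opus", "can_fly_attr"))
```

Changing only opus's instance_of fact is enough to flip the deduction, which matches the remove/add steps on the slide.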

Static Knowledge

Assertions
    Modular units of knowledge
        Numeric relations (e.g. equations)
        Decision trees
    A name to uniquely identify them
        ideal_gas_law
    A typed entity list of entity names and the set or domain in which they must reside
        gas_ent (in single_compound_gas_class),
        container_ent (in fluid_container_class),
        molecule_ent (in molecule_class)

Static Knowledge

A condition list telling when knowledge is applicable:
    1. molecule_ent.is_gas_phase_mutually_attractive_attr == false
    2. molecule_ent.is_gas_phase_mutually_replusive_attr == false
    3. gas_ent.total_molecular_volume_attr << container_ent.containers_volume_attr
    4. gas_ent.is_gas_randomly_moving_attr == true
    5. molecule_ent.is_newtonian_particle_attr == true
    6. gas_ent.materials_molecule_attr == molecule_ent
    7. gas_ent.fluids_container_attr == container_ent

Static Knowledge

An expression:
    PV = RnT:
        gas_ent.gases_pressure_attr * container_ent.objects_internal_volume_attr ==
            value(normal(8.3145,0.00005), joules_per_mole_kelvin_domain)
            * gas_ent.materials_mole_num_attr
            * gas_ent.objects_temperature_attr
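As a rough illustration of how such an assertion might be held in memory, the Python sketch below bundles a name, a typed entity list, a condition list and a solver for pressure. The Assertion class, the flat "entity.attribute" fact keys, and the choice to check only the first two conditions are all simplifying assumptions, not the Scienceomatic representation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

R = 8.3145  # J/(mol K): the mean of normal(8.3145, 0.00005) on the slide

@dataclass
class Assertion:
    name: str
    entities: Dict[str, str]                   # entity name -> class it must belong to
    conditions: List[Callable[[dict], bool]]   # when the knowledge is applicable
    solve: Callable[[dict], float]             # here: solve PV = RnT for P

    def applies(self, facts):
        return all(cond(facts) for cond in self.conditions)

ideal_gas_law = Assertion(
    name="ideal_gas_law",
    entities={
        "gas_ent": "single_compound_gas_class",
        "container_ent": "fluid_container_class",
        "molecule_ent": "molecule_class",
    },
    # Only conditions 1 and 2 are modeled in this sketch.
    conditions=[
        lambda f: f["molecule_ent.is_gas_phase_mutually_attractive_attr"] is False,
        lambda f: f["molecule_ent.is_gas_phase_mutually_replusive_attr"] is False,
    ],
    solve=lambda f: (R * f["gas_ent.materials_mole_num_attr"]
                     * f["gas_ent.objects_temperature_attr"]
                     / f["container_ent.objects_internal_volume_attr"]),
)

def predict_pressure(facts):
    """Return pressure in pascals, or None when the assertion does not apply."""
    return ideal_gas_law.solve(facts) if ideal_gas_law.applies(facts) else None

facts = {
    "molecule_ent.is_gas_phase_mutually_attractive_attr": False,
    "molecule_ent.is_gas_phase_mutually_replusive_attr": False,
    "gas_ent.materials_mole_num_attr": 1.0,                 # mol
    "gas_ent.objects_temperature_attr": 273.15,             # K
    "container_ent.objects_internal_volume_attr": 0.0224,   # m^3
}
pressure = predict_pressure(facts)
print(pressure)
```

For one mole at 273.15 K in 0.0224 cubic meters the result is close to one atmosphere, as expected for an ideal gas.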

Dynamic (Process) Knowledge

Processes have:
    Name
    Typed entity list
        Entity mappings from inherited to base process entities
    Conditions
        If process happened we know they held
        Sources of knowledge rather than things to check
    Subassertions
        Numeric relations or decision trees telling what happens
    Simulation code
        Compiles to Java source
    Test constraints
    Instances of the class
    Serial or parallel decomposition

Example: Pendulum (1)

Entities
    process_ent: the process
    init_state_ent: init. state
    intermediate_state_ent: intermediate states
    final_state_ent: final state
    axes_configuration_ent: configuration of X and Y axes in which pendulum swings
    pendulum_ent: pendulum on end of string
    arm_ent: swinging arm
    gravitational_field_ent: gravitational field

Example: Pendulum (2)

Conditions
    arm_ent.entities_mass_attr << pendulum_ent.entities_mass_attr
    (maybe more)

Subassertions
    object.attribute.descriptor
    descriptors:
        .cont: continuous
        .init: initial
        .final: final
        .delta: change in attribute
        .current/.next/.prev: current/next/previous discrete states

    Arm's length is constant:
        pendulum_ent.x.cont ^ 2 + pendulum_ent.y.cont ^ 2 == arm_ent.length ^ 2


Subassertions (taking the origin at the pivot and the y axis pointing up)
    X axis forces:
        pendulum_ent.objects_mass_attr * (pendulum_ent.x.delta.delta / process_ent.time.delta.delta) ==
            - arm_ent.objects_lengthwise_tension_attr.cont * pendulum_ent.x.cont / arm_ent.objects_length_attr

Subassertions, cont'd
    Y axis forces:
        pendulum_ent.objects_mass_attr * (pendulum_ent.y.delta.delta / process_ent.time.delta.delta) ==
            - arm_ent.objects_lengthwise_tension_attr.cont * pendulum_ent.y.cont / arm_ent.objects_length_attr
            - pendulum_ent.objects_mass_attr * gravitational_field_ent.grav_fields_acceleration_attr

Hierarchy of Processes

Motion
    Very abstract
1-D motion
    Specifies that motion is along one dimension only
    abstract means “fnc to be given in derived class”
1-D uniform acceleration
    Specifies uniform accel.
    abstract_const means “constant to be given in derived class”
1-D gravitational accel.
    Gives conditions

Reasoning, Revisited

“Deduction”
    1st Inherited, 2nd Dynamic, 3rd Static
    (Prolog?) Iterative Deepening Depth-First Search
    Assumption: “shallowest” answer is simplest
    Specify preference with Explanation Usage Obj

Probabilistic reasoning
    Bayesian Logic Program on top of Prolog
        Simulator -> Conditional Probability Tables (CPTs)
        CPTs -> Bayesian Logic Program
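The iterative deepening search named under “Deduction” can be sketched for a toy propositional rule base. Everything here is an assumption made for illustration: goals are plain strings with no variables, rules maps a goal to alternative lists of subgoals, and the rule and fact names are invented.

```python
def depth_limited(goal, rules, facts, limit):
    """Return a proof tree (goal, subtrees) within the depth limit, else None."""
    if goal in facts:
        return (goal, [])
    if limit == 0:
        return None
    for body in rules.get(goal, []):
        subtrees = []
        for subgoal in body:
            tree = depth_limited(subgoal, rules, facts, limit - 1)
            if tree is None:
                break               # this rule body fails at the current limit
            subtrees.append(tree)
        else:
            return (goal, subtrees)  # every subgoal was proved
    return None

def iterative_deepening(goal, rules, facts, max_depth=10):
    """Try depth limits 0, 1, 2, ... so the shallowest proof is found first."""
    for limit in range(max_depth + 1):
        tree = depth_limited(goal, rules, facts, limit)
        if tree is not None:
            return limit, tree
    return None

# The deeper (with drag) rule is listed first, yet the shallow one is found first.
rules = {
    "force": [["mass", "gravity", "drag"], ["mass", "gravity"]],
    "drag": [["density", "velocity", "area", "cd"]],
}
facts = {"mass", "gravity", "density", "velocity", "area", "cd"}

found_depth, proof = iterative_deepening("force", rules, facts)
print(found_depth, proof)
```

The returned tree doubles as the explanation trace, and its depth is what the shallowest-first assumption compares.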

Thanks Tony Garcia!

Simulation

Assertion Usage Object: Application 3: Make a simulator
    Use simulator code for each process to make a Java (or C++) simulator
    Free parameters picked based on domain ranges
    Does N Monte Carlo simulation runs

Biology: Intelligent Design

Life is too complex to have arisen by chance, therefore someone must have designed it
    ≠ Creationism: doesn't say who the designer was (maybe Space Aliens)

Selection occurs, but Evolution requires:
    geographical isolation, AND
    new trait mutation, AND
    superior fitness
For all of these to be true is improbable

Biology: Evolution

No such restraints

New Synthesis view of Evolution:
    Speciation by geographic isolation and selection

Post New Synthesis:
    Keep geographic isolation
    Do we need maximal selection?

Biology: Common Model

1. Logistic Growth
    dN = (growth_rate) * N * (1 - N/(regions_capacity))
    Small population -> fast growth; large population -> slow growth
2. Mendelian Genetics
    2 genes: A dominates over a, B dominates over b
3. Hardy-Weinberg Conditions
    H-W proof: no change in allele freq., given assumptions
    Relaxed assumptions: large population, no fitness diff.
    Retained assumptions: no mutation, no immigration, no emigration, only same-generation random mating
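The logistic growth rule in item 1 is easy to check numerically. The sketch below uses a hypothetical function name and made-up numbers (growth rate 0.5, capacity 1000) and reports growth relative to population size, which is where the "small grows fast, large grows slowly" behavior shows up.

```python
def logistic_growth_step(n, growth_rate, capacity):
    """Return dN = growth_rate * N * (1 - N / capacity)."""
    return growth_rate * n * (1 - n / capacity)

small = logistic_growth_step(10, 0.5, 1000)
large = logistic_growth_step(990, 0.5, 1000)
full = logistic_growth_step(1000, 0.5, 1000)

# Relative growth per individual: high when N is small, low near capacity.
print(small / 10, large / 990, full)
```

At capacity dN is zero, so the population stops growing.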

Specific Model: “Best of a Bad Circumstance”

Everyone has 2 gene_a alleles and 2 gene_b alleles
    Having at least one copy of A is best
    Having at least one copy of B is 2nd best
    Having neither is worst

fitness(A???) >= fitness(aaB?) >= fitness(aabb)

Genotypes                                 Fitness
AABB, AABb, AAbb, AaBB, AaBb, Aabb        Strong :)
aaBB, aaBb                                Medium :|
aabb                                      Weak :(

To Test

Intelligent Design
    “B's frequency will never increase other than due to fitness”
    We test “Does B ever increase?”
    Do more detailed statistical analysis afterward

Evolution
    The null hypothesis: B can increase beyond what fitness alone accounts for

Biological Processes

population_over_time

Runs create_1st_generation and then create_subsequent_generation 100 times

create_1st_generation

Calls randomly_create_organism the number of times given by the initial generation size.

randomly_create_organism

Stochastic decision trees initialize alleles for gene_a and gene_b according to the free parameter initial probabilities of A and B. A third decision tree then computes the organism's fitness.

create_subsequent_generation

Uses Logistic Growth to compute how many organisms to create. For each it randomly chooses parents (weighted by their fitness) and calls randomly_mate_organisms.

randomly_mate_organisms

Decision trees stochastically define alleles for gene_a and gene_b according to parents and Mendelian genetics. A third computes its fitness. The organism is also attached to its generation.
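The five processes can be sketched as plain Python functions. This is a simplified reading, not the generated simulator: organisms are tuples (gene_a, gene_b, fitness), the next generation size is taken to be N + dN from logistic growth, an organism may be drawn as both parents, and parents are chosen uniformly if every fitness is zero. All of these choices are assumptions.

```python
import random

def fitness_of(gene_a, gene_b, fit):
    """fit = (fitness(A???), fitness(aaB?), fitness(aabb))."""
    if "A" in gene_a:
        return fit[0]
    if "B" in gene_b:
        return fit[1]
    return fit[2]

def randomly_create_organism(rng, p_a, p_b, fit):
    gene_a = tuple("A" if rng.random() < p_a else "a" for _ in range(2))
    gene_b = tuple("B" if rng.random() < p_b else "b" for _ in range(2))
    return gene_a, gene_b, fitness_of(gene_a, gene_b, fit)

def randomly_mate_organisms(rng, mother, father, fit):
    # Mendelian inheritance: one randomly chosen allele per gene from each parent.
    gene_a = (rng.choice(mother[0]), rng.choice(father[0]))
    gene_b = (rng.choice(mother[1]), rng.choice(father[1]))
    return gene_a, gene_b, fitness_of(gene_a, gene_b, fit)

def create_subsequent_generation(rng, parents, growth_rate, capacity, fit):
    n = len(parents)
    size = max(1, round(n + growth_rate * n * (1 - n / capacity)))
    weights = [p[2] for p in parents]
    if sum(weights) == 0:
        weights = None  # assumption: fall back to uniform choice
    children = []
    for _ in range(size):
        mother, father = rng.choices(parents, weights=weights, k=2)
        children.append(randomly_mate_organisms(rng, mother, father, fit))
    return children

def allele_freq(population, allele, gene_index):
    copies = sum(org[gene_index].count(allele) for org in population)
    return copies / (2 * len(population))

def population_over_time(seed, init_size, p_a, p_b, growth_rate, capacity, fit,
                         generations=100):
    """Return the frequency of B in the first and every subsequent generation."""
    rng = random.Random(seed)
    population = [randomly_create_organism(rng, p_a, p_b, fit)
                  for _ in range(init_size)]
    history = [allele_freq(population, "B", 1)]
    for _ in range(generations):
        population = create_subsequent_generation(rng, population, growth_rate,
                                                  capacity, fit)
        history.append(allele_freq(population, "B", 1))
    return history

history = population_over_time(seed=1, init_size=10, p_a=0.5, p_b=0.5,
                               growth_rate=0.5, capacity=100, fit=(1.0, 0.5, 0.1))
print(history[0], history[-1])
```

Comparing the first and last entries of history is one simple way to ask whether B increased in a given run.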

Biological Free Parameters

Parameter                Values
Carrying capacity        x, where x = floor(10^y+0.5), y in {1.0, 1.0625, .. 3.0}
Init. generation size    x, where x = floor(10^y+0.5), y in {1.0, 1.0625, .. 3.0} (but never greater than Carrying Capacity)
Growth rate              {0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0}
# generations            {100}
p(A) in init. pop        {0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0}
p(B) in init. pop        {0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0}
fitness(A???)            {1.0}
fitness(aaB?)            {0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0}
fitness(aabb)            {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1} (but never greater than fitness(aaB?))

Simulator Algorithm

Scilog + Bio Model -> C++ simulator
    Has random seed as parameter
    Simulation run 100,000 times
    Index used as random seed

Flowchart:
    Set random seed
    -> Pick all free params
    -> Param constraint violated? (Y / N)
    -> Run simulation
    -> Output results
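The flowchart can be sketched as a short driver loop. This is a guess at the control flow, not the generated C++ simulator: it assumes a violating parameter draw is simply redrawn, it takes the parameter ranges as read from the free-parameter slide, and run_simulation is a placeholder callable supplied by the caller.

```python
import math
import random

# x = floor(10^y + 0.5) for y in {1.0, 1.0625, .., 3.0}: 33 log-spaced sizes.
SIZES = [math.floor(10 ** (1.0 + i * 0.0625) + 0.5) for i in range(33)]
TENTHS_FROM_0 = [i / 10 for i in range(11)]     # {0, 0.1, .., 1.0}
TENTHS_FROM_1 = [i / 10 for i in range(1, 11)]  # {0.1, .., 1.0}

def pick_free_params(rng):
    return {
        "carrying_capacity": rng.choice(SIZES),
        "init_generation_size": rng.choice(SIZES),
        "growth_rate": rng.choice(TENTHS_FROM_0),
        "p_a": rng.choice(TENTHS_FROM_0),
        "p_b": rng.choice(TENTHS_FROM_1),
        "fitness_aaB": rng.choice(TENTHS_FROM_1),
        "fitness_aabb": rng.choice(TENTHS_FROM_0),
    }

def constraint_violated(p):
    return (p["init_generation_size"] > p["carrying_capacity"]
            or p["fitness_aabb"] > p["fitness_aaB"])

def run_trial(index, run_simulation):
    """One Monte Carlo trial; the run index doubles as the random seed."""
    rng = random.Random(index)
    params = pick_free_params(rng)
    while constraint_violated(params):   # assumption: redraw until valid
        params = pick_free_params(rng)
    return params, run_simulation(params, rng)

# Placeholder simulation that only echoes the carrying capacity.
results = [run_trial(i, lambda p, rng: p["carrying_capacity"]) for i in range(20)]
print(results[0])
```

Seeding each trial with its index makes any single run reproducible without storing the chosen parameters.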

Simulator Results

frac(A) always increases
frac(B) increases in 57,038 of 100,000 trials
    I.D.: “Isn't that selection?”

Principal Component Analysis

                                 V1       V2       V3       V4       V5
Eigenvalue                       1.47     1.17     1        0.83     0.53
Proportion                       29.41%   23.38%   20.00%   16.62%   10.59%
growth rate                      0        -0.03    1        0.02     0
carrying capacity                0.71     0.02     0        0        -0.71
init. generation size            0.71     0.02     0        -0.01    0.71
fitness(aaB?) - fitness(aabb)    0.03     -0.71    -0.03    0.71     0.01
freq(B) increase                 -0.02    0.71     0.01     0.71     0.01

V4: increase in B with selection (16.62%), but . . .
V2: increase in B without selection (23.38%)
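The Proportion row is each eigenvalue divided by the sum of all eigenvalues. The short check below recomputes it from the rounded eigenvalues in the table; the variable names are illustrative, and small differences from the table are expected because the eigenvalues shown are already rounded. The eigenvalues sum to 5, which is consistent with a PCA over five standardized variables.

```python
eigenvalues = [1.47, 1.17, 1.0, 0.83, 0.53]
table_proportions = [29.41, 23.38, 20.00, 16.62, 10.59]  # percent, from the table

total = sum(eigenvalues)
proportions = [100 * ev / total for ev in eigenvalues]

for name, computed, listed in zip(["V1", "V2", "V3", "V4", "V5"],
                                  proportions, table_proportions):
    print(name, round(computed, 2), listed)
```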

Discussion

Q: How can B increase without selection?
A: The Founder Effect!
    Random variations in the initial generation influence later generations
    For our simulations: (init. generation size) < 20
        This may not be the real-world threshold
    Only requirements:
        Geographic isolation
        Diverse initial population

Values

“Chicago is 185 meters above sea level, ±5 meters”

Primary value or sample distribution
    The value being held
        One value: 1.85e+2
        Explicit set: [180, 185, 190]
        Implicit set: normal(185,5)

Domain
    Metadata about the value
        Dimensions (e.g. length)
        Units (e.g. meters)
        Axis (e.g. “height above sea level, somewhere on Earth”)
        Legal values (e.g. “0 meters to 10,000 meters”)

Values (2)

State
    When the value is said to hold
        “Mean/median/mode value during all of 2007”

Subject
    Object the value describes
        chicago

Attribute
    Aspect of object being described
        height_above_sea_level_attr
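The five parts of a value (primary value, domain, state, subject, attribute) map naturally onto a small record type. The Python sketch below is one possible layout; the class and field names are assumptions, and "±5 meters" is read as normal(185,5) as on the earlier slide.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Normal:
    mean: float
    stddev: float

@dataclass(frozen=True)
class Domain:
    dimension: str
    units: str
    axis: str
    legal_range: Tuple[float, float]

    def allows(self, x):
        low, high = self.legal_range
        return low <= x <= high

@dataclass(frozen=True)
class Value:
    primary: Union[float, Tuple[float, ...], Normal]  # one value, explicit set, or distribution
    domain: Domain
    state: str       # when the value is said to hold
    subject: str     # object the value describes
    attribute: str   # aspect of the object being described

height_domain = Domain(
    dimension="length",
    units="meters",
    axis="height above sea level, somewhere on Earth",
    legal_range=(0.0, 10000.0),
)

chicago_height = Value(
    primary=Normal(185.0, 5.0),
    domain=height_domain,
    state="Mean/median/mode value during all of 2007",
    subject="chicago",
    attribute="height_above_sea_level_attr",
)
print(chicago_height.primary, chicago_height.domain.allows(chicago_height.primary.mean))
```

Keeping the domain separate from the primary value lets the same legality check serve single values, explicit sets and distributions.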

Questions for you:

Easiest language to write History Trajectory and Explanation Usage Objects in? Lisp? ML? Haskell? Prolog?

Easiest to program GUI?
