csc 599: computational scientific discovery

CSC 599: Computational Scientific Discovery Lecture 10: The Scienceomatic Systems: Deductive Reasoning and Model Design

Upload: zandra

Post on 16-Mar-2016



TRANSCRIPT

Page 1: CSC 599: Computational Scientific Discovery

CSC 599: Computational Scientific Discovery

Lecture 10: The Scienceomatic Systems:

Deductive Reasoning and Model Design

Page 2: CSC 599: Computational Scientific Discovery

Outline

History Trajectory Object
  Application: Exhaustive Search

Explanation Usage Object
  Application: Explanation Preference

Assertion Reasoning Object
  Application: Simulation

Page 3: CSC 599: Computational Scientific Discovery

What Scientists Want

Tell me:
  “What is the prediction?”
  “How shall I improve this model?”
  “How did you get that answer?”
  “What shall we do next?”
  “What's the precision of that value?”

User sees one interface, but:
  Assertion Reasoning Object answers “Predict value of X at time Y”
  Explanation Usage Object answers “Why is value of X at time Y equal to Z?”
  History Trajectory Object answers “What is good to try next?”

Page 4: CSC 599: Computational Scientific Discovery

Recall

Page 5: CSC 599: Computational Scientific Discovery

History Trajectory Object

Does several things:
1. Decides which operator to do next based on:
   a) How successful they have been (operator id)
   b) Type of data (data id)
   c) Tactics
   d) Strategy
2. Keeps track of what's been tried before
   a) operator/data
   b) success/failure
   c) “by how much”
   d) who/when/why/etc.
3. Modifiable
   a) Learns best operators for given data
   b) PROGRAMMABLE?!? (Under these conditions create an operator that does this . . .)

Page 6: CSC 599: Computational Scientific Discovery

History Trajectory Object: Application 1: Systematic Search

Exhaustive Search: simplest to more complex
  Plays to strengths of computers
  Examples:
    MECHEM
    Inductive Process Modeling

History Trajectory Object can do this:
1. Queries Assertion Usage Object for assertions
2. Manipulates assertions to build more complex ones

Page 7: CSC 599: Computational Scientific Discovery

History Trajectory Object: Systematic Search (2)

Idea: There are 3 processes total.
  Two (leaf1 and leaf2) are primitive.
  The third (node2) can be made arbitrarily more complex by taking a type1 process as 1st parameter and a type2 process as 2nd.
  leaf1 is type1; leaf2 and node2 are type2

Program (in Prolog):
  process1(leaf1).
  process2(leaf2).
  process2(node2(P1,P2)) :- process1(P1), process2(P2).

Page 8: CSC 599: Computational Scientific Discovery

History Trajectory Object: Systematic Search (3)

Program generates process templates
  “Template” means a model of the same structure whose parameters have not yet been computed (by calculus, simulated annealing, etc.)
  Generated from simplest -> increasingly more complex

Output:
  ?- process2(A).
  A = leaf2 ;
  A = node2(leaf1, leaf2) ;
  A = node2(leaf1, node2(leaf1, leaf2)) ;
  A = node2(leaf1, node2(leaf1, node2(leaf1, leaf2)))
  etc.
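For readers who do not use Prolog, the same simplest-first enumeration can be sketched with Python generators. This is only an illustrative translation of the three clauses, not the Scienceomatic implementation; the string form of the templates is an assumption made for display.

```python
from itertools import islice

def process1():
    """type1 processes: only the primitive leaf1."""
    yield "leaf1"

def process2():
    """type2 processes: leaf2, or node2 built from a type1 and a type2 process."""
    yield "leaf2"
    for p1 in process1():
        # Recursing into process2() lazily yields ever larger templates.
        for p2 in process2():
            yield f"node2({p1}, {p2})"

# Take the first four templates, simplest first.
for template in islice(process2(), 4):
    print(template)
```

Because the generator is lazy, the caller decides how many templates to draw, which plays the role of pressing ";" at the Prolog prompt.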

Page 9: CSC 599: Computational Scientific Discovery

History Trajectory Object: Implementation

This algorithm is fundamentally the same as MECHEM and Inductive Process Modeling (IPM)
  One algorithm that generates model templates!
  Calls numeric program to do parameter fitting
  Can do both MECHEM and IPM!

Making it efficient
  Like MECHEM, rely on domain knowledge
    “Reactions that form pure Carbon are unlikely”
  Like IPM, rely on object type information
    “Rabbits are prey, coyotes are predators or prey”

Implement in:
  Prolog? (Scienceomatic)
  Lisp? (MECHEM, IPM re-write)
  ML? Haskell? Anything else?

Page 10: CSC 599: Computational Scientific Discovery

Explanation Usage Object

Sample of important methods:
  Predict object1's attribute attribute1
    Satisfy with assertion usage object
    Satisfy with solved problem library
      Philosophy of science justification: Kuhnian exemplar: what scientists do
      Artificial Intelligence justification: EBL: cheaper than de novo reasoning
  Give trace why object object1's attribute attribute1 is value value1.
  Give trace how assertion assertion1 is justified (e.g. derived)
  Refine reasoning method
    “I like traces like this over traces like that because . . .”

Page 11: CSC 599: Computational Scientific Discovery

Explanation Usage Object: Application 2: Explanation Preference

Default behavior:
  Favor shallowest explanation (more on details of this later)

Problem
  Shallowest may not be most correct!

We've seen this before with MECHEM:
  Computer scientist's “best mechanism” means shortest syntax
  Chemist's “best mechanism” means least energetic rate determining step

Page 12: CSC 599: Computational Scientific Discovery

Explanation Usage Object: Explanation Preference (2)

falling_without_drag:
  F = mg

falling_with_drag:
  F = mg - 0.5 * ρ * v^2 * A * Cd
  where:
    ρ = fluid's (e.g. air's) density
    v = velocity
    A = object's area
    Cd = drag coefficient
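The two competing models can be compared numerically with a short Python sketch. The function names mirror the slide's assertion names; the default values for g (9.81 m/s^2) and air density (1.225 kg/m^3) are assumptions added here for illustration, not values from the lecture.

```python
def falling_without_drag(mass, g=9.81):
    """Net downward force with drag ignored: F = m*g."""
    return mass * g

def falling_with_drag(mass, velocity, area, drag_coefficient,
                      fluid_density=1.225, g=9.81):
    """Net downward force with quadratic drag: F = m*g - 0.5*rho*v^2*A*Cd."""
    drag = 0.5 * fluid_density * velocity ** 2 * area * drag_coefficient
    return mass * g - drag

# At rest the two models agree; the gap grows with the square of velocity.
print(falling_without_drag(1.0))
print(falling_with_drag(1.0, velocity=10.0, area=0.1, drag_coefficient=0.5))
```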

Page 13: CSC 599: Computational Scientific Discovery

Explanation Usage Object: Explanation Preference (3)

Scienceomatic can compute both answers
  falling_without_drag answer might be returned first if it uses a less deep explanation tree

Can tell Scienceomatic preferences:
  “Prefer falling_with_drag answer before falling_without_drag answer”
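One way to picture this behavior is a ranking function that defaults to shallowest-first but lets a stated preference win. The sketch below is hypothetical: the `(label, children)` tree encoding and the `rank` function are assumptions for illustration, not the Explanation Usage Object's actual interface.

```python
def depth(tree):
    """Depth of an explanation tree encoded as (label, [child trees])."""
    _label, children = tree
    return 1 + max((depth(child) for child in children), default=0)

def rank(explanations, preferred=()):
    """Order explanations: user-preferred labels first, then shallowest first."""
    def key(tree):
        label = tree[0]
        pref = preferred.index(label) if label in preferred else len(preferred)
        return (pref, depth(tree))
    return sorted(explanations, key=key)

without_drag = ("falling_without_drag", [("F = mg", [])])
with_drag = ("falling_with_drag",
             [("F = mg - drag", [("drag = 0.5*rho*v^2*A*Cd", [])])])

print(rank([with_drag, without_drag])[0][0])
print(rank([with_drag, without_drag], preferred=["falling_with_drag"])[0][0])
```

With no preference the shallower `falling_without_drag` tree sorts first; naming `falling_with_drag` as preferred reverses the order.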

Page 14: CSC 599: Computational Scientific Discovery

Explanation Usage Object: Implementation

An explanation data structure is a tree
  Lisp, ML, Haskell: most general
  Prolog: acceptable
    May need Prolog's “2nd order” predicates: =.., functor, arg
    Example:
      ?- f(a,b) =.. L.
      L = [f, a, b] ;
  C/C++/Java/C#: possible (of course) but may not be natural

Your ideas?

Page 15: CSC 599: Computational Scientific Discovery

Assertion Reasoning Object

Sample of important methods:
  Retrieve assertion assertion1
    Show assertion
    Edit assertion
  Predict object object1's attribute attribute1
    Plot these values
    Compare predicted and recorded values
  Justify (e.g. logical resolution) assertion assertion1

Page 16: CSC 599: Computational Scientific Discovery

Large Scale Knowledge Organization

Five components
  Definitions/Expectations/Assumptions
    “Meters measure length”
    “100 cm = 1 meter”
    “Evolution is impossible because of X, Y, Z”
  Theory
    Newton's Laws of Motion and Gravitation
  Generalization
    Johannes Kepler's Laws
  Data
    Tycho Brahe's Observations
  Analytics
    How to sum & integrate, change coord. systems, etc.

Page 17: CSC 599: Computational Scientific Discovery

Reasoning Over Components

Which components queried -> Reasoning type
  theorize: d/e/a, theory, general, data, analytics
  empiricize: d/e/a, data, general, theory, analytics
  ab_initio: d/e/a, theory, analytics
  read_data: d/e/a, data, analytics

d/e/a always first
  Enforce agreement with base assumptions
analytics always last
  Recast query to other form if all else fails
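The component orderings above can be read as a lookup table that drives a simple "first component that answers wins" loop. The sketch below is a simplification under stated assumptions: each component is modeled as a hypothetical function returning an answer or `None`, and d/e/a is treated as just the first lookup rather than as a consistency check.

```python
REASONING_ORDER = {
    "theorize":   ["d/e/a", "theory", "general", "data", "analytics"],
    "empiricize": ["d/e/a", "data", "general", "theory", "analytics"],
    "ab_initio":  ["d/e/a", "theory", "analytics"],
    "read_data":  ["d/e/a", "data", "analytics"],
}

def answer(query, reasoning_type, components):
    """Try components in the order for this reasoning type.

    Returns (component_name, result) for the first component that
    produces a non-None result, or None if every component fails.
    """
    for name in REASONING_ORDER[reasoning_type]:
        result = components[name](query)
        if result is not None:
            return name, result
    return None

# Hypothetical components: only theory and data can answer this query.
components = {
    "d/e/a": lambda q: None,
    "theory": lambda q: "computed from Newton's laws",
    "general": lambda q: None,
    "data": lambda q: "looked up in observations",
    "analytics": lambda q: None,
}

print(answer("position of Mars", "theorize", components))
print(answer("position of Mars", "empiricize", components))
```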

Page 18: CSC 599: Computational Scientific Discovery

Each Component

Inherited Knowledge
  Birds fly, but penguins don't fly

Dynamic Knowledge (processes)
  For falling things:
    a = F/m = g
    v = v0 + gt
    height = h0 + v0t + (1/2)gt^2

Static Knowledge
  For homogeneous gases:
    PV = nRT

Page 19: CSC 599: Computational Scientific Discovery

Inherited Knowledge

Works with ontology
  is_a(bird,animal).
  inherit(bird,can_fly_attr,true).

  is_a(penguin,bird).
  inherit(penguin,can_fly_attr,false).

  instance_of(tweety,bird).
  instance_of(opus,penguin).

Deduce:
  “tweety can fly”
  “opus can not fly”

Page 20: CSC 599: Computational Scientific Discovery

Inherited Knowledge (2)

Works with ontology, cont'd
  Remove
    instance_of(opus,penguin).
  Add
    is_a(penguin_with_pilots_license, penguin).
    inherit(penguin_with_pilots_license, can_fly_attr,true).
    instance_of(opus,penguin_with_pilots_license).

Deduce:
  “opus can fly”
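The deductions on these two slides follow from a "most specific class wins" lookup. A minimal Python sketch of that rule is given below; the dictionaries stand in for the is_a and inherit facts (with the penguin class overriding can_fly_attr to false, as the deduction about opus requires), and `lookup` is a hypothetical helper, not Scienceomatic code.

```python
# is_a facts: class -> parent class
IS_A = {
    "bird": "animal",
    "penguin": "bird",
    "penguin_with_pilots_license": "penguin",
}

# inherit facts: (class, attribute) -> default value for members of that class
INHERIT = {
    ("bird", "can_fly_attr"): True,
    ("penguin", "can_fly_attr"): False,
    ("penguin_with_pilots_license", "can_fly_attr"): True,
}

def lookup(instance_of, individual, attribute):
    """Walk up the is_a chain; the most specific class with a value wins."""
    cls = instance_of[individual]
    while cls is not None:
        if (cls, attribute) in INHERIT:
            return INHERIT[(cls, attribute)]
        cls = IS_A.get(cls)
    return None  # no class in the chain says anything about this attribute

before = {"tweety": "bird", "opus": "penguin"}
after = {"tweety": "bird", "opus": "penguin_with_pilots_license"}

print(lookup(before, "tweety", "can_fly_attr"))
print(lookup(before, "opus", "can_fly_attr"))
print(lookup(after, "opus", "can_fly_attr"))
```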

Page 21: CSC 599: Computational Scientific Discovery

Static Knowledge

Assertions
  Modular units of knowledge
    Numeric relations (e.g. equations)
    Decision trees
  A name to uniquely identify them
    ideal_gas_law
  A typed entity list of entity names and the set or domain in which they must reside:
    gas_ent (in single_compound_gas_class),
    container_ent (in fluid_container_class),
    molecule_ent (in molecule_class),

Page 22: CSC 599: Computational Scientific Discovery

Static Knowledge

A condition list telling when knowledge is applicable:
  1. molecule_ent.is_gas_phase_mutually_attractive_attr == false
  2. molecule_ent.is_gas_phase_mutually_replusive_attr == false
  3. gas_ent.total_molecular_volume_attr << container_ent.containers_volume_attr
  4. gas_ent.is_gas_randomly_moving_attr == true
  5. molecule_ent.is_newtonian_particle_attr == true
  6. gas_ent.materials_molecule_attr == molecule_ent
  7. gas_ent.fluids_container_attr == container_ent

Page 23: CSC 599: Computational Scientific Discovery

Static Knowledge

An expression:
PV = RnT:
  gas_ent.gases_pressure_attr * container_ent.objects_internal_volume_attr ==
    value(normal(8.3145,0.00005), joules_per_mole_kelvin_domain)
    * gas_ent.materials_mole_num_attr
    * gas_ent.objects_temperature_attr
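A static assertion of this kind pairs a condition list with an expression. The Python sketch below shows that pairing for the ideal gas law under simplifying assumptions: the state is a plain dictionary with hypothetical keys, only three of the seven conditions are modeled, and the "<<" test is approximated by an arbitrary 1% threshold.

```python
R_MEAN, R_SIGMA = 8.3145, 0.00005  # J/(mol*K), from normal(8.3145, 0.00005)

# A subset of the condition list; the 1% cutoff for "<<" is an assumption.
CONDITIONS = [
    lambda s: not s["mutually_attractive"],
    lambda s: not s["mutually_repulsive"],
    lambda s: s["molecular_volume"] < 0.01 * s["container_volume"],
]

def applicable(state):
    """The assertion may be used only if every condition holds."""
    return all(condition(state) for condition in CONDITIONS)

def predict_pressure(state):
    """Solve PV = RnT for P; returns (mean, sigma) in pascals, or None."""
    if not applicable(state):
        return None
    factor = state["moles"] * state["temperature"] / state["container_volume"]
    # Only the uncertainty in R is propagated in this sketch.
    return R_MEAN * factor, R_SIGMA * factor

state = {
    "mutually_attractive": False,
    "mutually_repulsive": False,
    "molecular_volume": 1e-6,    # m^3
    "container_volume": 0.0224,  # m^3
    "moles": 1.0,
    "temperature": 273.15,       # K
}
print(predict_pressure(state))
```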

Page 24: CSC 599: Computational Scientific Discovery

Dynamic (Process) Knowledge

Processes have
  Name
  Typed entity list
    Entity mappings from inherited to base process entities
  Conditions
    If process happened we know they held
    Sources of knowledge rather than things to check
  Subassertions
    Numeric relations or decision trees telling what happens
  Simulation code
    Compiles to Java source
  Test constraints
  Instances of the class
  Serial or parallel decomposition

Page 25: CSC 599: Computational Scientific Discovery

Example: Pendulum (1)

Entities
  process_ent: The process
  init_state_ent: init. state
  intermediate_state_ent: intermediate states
  final_state_ent: final state
  axes_configuration_ent: configuration of X and Y axes in which pendulum swings
  pendulum_ent: pendulum on end of string
  arm_ent: swinging arm
  gravitational_field_ent: gravitational field

Page 26: CSC 599: Computational Scientific Discovery

Example: Pendulum (2)

Conditions
  arm_ent.entities_mass_attr << pendulum_ent.entities_mass_attr
  (maybe more)

Page 27: CSC 599: Computational Scientific Discovery

Example: Pendulum (3)

Subassertions
  object.attribute.descriptor
  descriptors:
    .cont: continuous
    .init: initial
    .final: final
    .delta: change in attribute
    .current/.next/.prev: current/next/previous states (discrete)

Arm's length is constant:
  pendulum_ent.x.cont ^ 2 + pendulum_ent.y.cont ^ 2 == arm_ent.length ^ 2

Page 28: CSC 599: Computational Scientific Discovery

Example: Pendulum (4)

Subassertions
  X axis forces:
    pendulum_ent.objects_mass_attr * (pendulum_ent.x.delta.delta / process_ent.time.delta.delta) ==
      arm_ent.objects_lengthwise_tension_attr.cont * pendulum_ent.x.cont * arm_ent.objects_length_attr

Page 29: CSC 599: Computational Scientific Discovery

Example: Pendulum (5)

Subassertions, cont'd
  Y axis forces:
    pendulum_ent.objects_mass_attr * (pendulum_ent.y.delta.delta / process_ent.time.delta.delta) ==
      arm_ent.objects_lengthwise_tension_attr.cont * pendulum_ent.y.cont * arm_ent.objects_length_attr
      - pendulum_ent.objects_mass_attr * gravitational_field_ent.grav_fields_acceleration_attr

Page 30: CSC 599: Computational Scientific Discovery

Hierarchy of Processes

Motion
  Very abstract
1-D motion
  Specifies that motion is along one dimension only
  abstract means “fnc to be given in derived class”
1-D uniform acceleration
  Specifies uniform accel.
  abstract_const means “constant to be given in derived class”
1-D gravitational accel.
  Gives conditions
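The "abstract" and "abstract_const" ideas in this hierarchy map naturally onto abstract base classes. The Python sketch below is an analogy only: the class names, the `position` method, and the value -9.81 m/s^2 are assumptions chosen for illustration, not the lecture's process definitions.

```python
from abc import ABC, abstractmethod

class Motion(ABC):
    """Very abstract: only promises that position depends on time."""
    @abstractmethod
    def position(self, t):
        ...

class OneDMotion(Motion):
    """Motion along one dimension: position is a single number.
    position() is still abstract (function to be given in a derived class)."""

class OneDUniformAcceleration(OneDMotion):
    """Gives the function, but leaves the constant to a derived class."""
    def __init__(self, x0, v0):
        self.x0 = x0
        self.v0 = v0

    @property
    @abstractmethod
    def acceleration(self):
        """abstract_const: constant to be given in a derived class."""

    def position(self, t):
        return self.x0 + self.v0 * t + 0.5 * self.acceleration * t ** 2

class OneDGravitationalAcceleration(OneDUniformAcceleration):
    """Supplies the constant (upward taken as positive)."""
    acceleration = -9.81

ball = OneDGravitationalAcceleration(x0=100.0, v0=0.0)
print(ball.position(2.0))
```

Only the most derived class can be instantiated; trying to construct `OneDUniformAcceleration` directly raises `TypeError` because its constant is still missing.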

Page 31: CSC 599: Computational Scientific Discovery

Reasoning, Revisited

“Deduction”
  1st Inherited, 2nd Dynamic, 3rd Static
  (Prolog?) Iterative Deepening Depth-First Search
    Assumption: “shallowest” answer is simplest
    Specify preference with Explanation Usage Obj

Probabilistic reasoning
  Bayesian Logic Program on top of Prolog
    Simulator -> Conditional Probability Tables (CPTs)
    CPTs -> Bayesian Logic Program
  Thanks Tony Garcia!

Simulation
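The iterative-deepening search named under “Deduction” can be sketched generically in Python. The example graph and node names below are made up for illustration; the point is only that the search returns the shallowest derivation even when a deeper one is listed first.

```python
def depth_limited(node, is_goal, children, limit, path=()):
    """Depth-first search that gives up below `limit` levels."""
    path = path + (node,)
    if is_goal(node):
        return path
    if limit == 0:
        return None
    for child in children(node):
        found = depth_limited(child, is_goal, children, limit - 1, path)
        if found is not None:
            return found
    return None

def iterative_deepening(start, is_goal, children, max_depth=20):
    """Retry with limits 0, 1, 2, ... so the shallowest answer is found first."""
    for limit in range(max_depth + 1):
        found = depth_limited(start, is_goal, children, limit)
        if found is not None:
            return found
    return None

# Hypothetical derivation graph: the deep route is listed before the shallow one.
GRAPH = {
    "query": ["deep_rule", "shallow_rule"],
    "deep_rule": ["step1"],
    "step1": ["answer"],
    "shallow_rule": ["answer"],
}

path = iterative_deepening("query",
                           is_goal=lambda n: n == "answer",
                           children=lambda n: GRAPH.get(n, []))
print(path)
```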

Page 32: CSC 599: Computational Scientific Discovery

Assertion Usage Object: Application 3: Make a simulator

Use simulator code for each process to make a Java (or C++) simulator
  Free parameters picked based on domain ranges
  Does N Monte Carlo simulation runs

Page 33: CSC 599: Computational Scientific Discovery

Biology: Intelligent Design

Life is too complex to have arisen by chance, therefore someone must have designed it
  ≠ Creationism: Doesn't say who designer was (maybe Space Aliens)

Selection occurs, but Evolution requires:
  geographical isolation, AND
  new trait mutation, AND
  superior fitness
For all of these to be true is improbable

Page 34: CSC 599: Computational Scientific Discovery

Biology: Evolution

No such restraints

New Synthesis view of Evolution:
  Speciation by geographic isolation and selection

Post New Synthesis:
  Keep geographic isolation
  Do we need maximal selection?

Page 35: CSC 599: Computational Scientific Discovery

Biology: Common Model

1. Logistic Growth
   dN = (growth_rate) * N * (1 - N/(regions_capacity))
   Small population -> fast growth; Large pop. -> slow
2. Mendelian Genetics
   2 genes: A dominates over a, B dominates over b
3. Hardy-Weinberg Conditions
   H-W proof: no change in allele freq., given assumptions
   Relaxed assumptions: large population, no fitness diff.
   Retained assumptions: no mutation, no immigration, no emigration, only same-generation random mating
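The logistic growth rule in item 1 is easy to check by hand or in code. The following Python sketch applies the formula once per generation; the function names and the sample numbers are illustrative assumptions.

```python
def logistic_delta(population, growth_rate, capacity):
    """dN = growth_rate * N * (1 - N / capacity)."""
    return growth_rate * population * (1 - population / capacity)

def grow(population, growth_rate, capacity, generations):
    """Population size after each generation, starting value included."""
    sizes = [population]
    for _ in range(generations):
        population = population + logistic_delta(population, growth_rate, capacity)
        sizes.append(population)
    return sizes

# Relative growth is fastest when N is small and stops as N approaches capacity.
sizes = grow(population=10, growth_rate=0.5, capacity=100, generations=20)
print([round(n, 1) for n in sizes])
```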

Page 36: CSC 599: Computational Scientific Discovery

Specific Model: “Best of a Bad Circumstance”

Everyone has 2 gene_a alleles and 2 gene_b alleles
  Having at least one copy of A is best
  Having at least one copy of B is 2nd best
  Having neither is worst

fitness(A???) >= fitness(aaB?) >= fitness(aabb)

Genotypes                                Fitness
AABB, AABb, AAbb, AaBB, AaBb, Aabb       Strong :)
aaBB, aaBb                               Medium :|
aabb                                     Weak :(
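The genotype-to-fitness table reduces to two membership tests. A small Python sketch, assuming genotypes are written as four-letter strings (two gene_a alleles, then two gene_b alleles) as in the table:

```python
def fitness_class(genotype):
    """Classify a genotype such as 'AaBb' as 'strong', 'medium', or 'weak'."""
    gene_a, gene_b = genotype[:2], genotype[2:]
    if "A" in gene_a:   # at least one dominant A: best
        return "strong"
    if "B" in gene_b:   # no A, but at least one dominant B: 2nd best
        return "medium"
    return "weak"       # aabb: neither

for genotype in ["AaBb", "aaBb", "aabb"]:
    print(genotype, fitness_class(genotype))
```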

Page 37: CSC 599: Computational Scientific Discovery

To Test

Intelligent Design
  “B's frequency will never increase other than due to fitness”
  We test “Does B ever increase?”
  Do more detailed statistical analysis afterward

Evolution
  The null hypothesis: that B can increase beyond what fitness explains

Page 38: CSC 599: Computational Scientific Discovery

Biological Processes

population_over_time
  Runs create_1st_generation and then create_subsequent_generation 100 times

create_1st_generation
  Calls randomly_create_organism the number of times given by the initial generation size.

randomly_create_organism
  Stochastic decision trees initialize alleles for gene_a and gene_b according to the free parameter initial probabilities of A and B. A third decision tree then computes the organism's fitness.

create_subsequent_generation
  Uses Logistic Growth to compute how many organisms to create. For each it randomly chooses parents (weighted by their fitness) and calls randomly_mate_organisms.

randomly_mate_organisms
  Decision trees stochastically define alleles for gene_a and gene_b according to parents and Mendelian genetics. A third computes its fitness. The organism is also attached to its generation.
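The mating step can be sketched in a few lines of Python. This is a hedged illustration, not the generated simulator: organisms are modeled as hypothetical dictionaries of two-letter allele strings, fitness computation is left out, and parent selection uses sampling with replacement, so the same organism may be drawn twice.

```python
import random

def inherit_gene(parent1_alleles, parent2_alleles, rng):
    """Mendelian inheritance: one randomly chosen allele from each parent."""
    pair = rng.choice(parent1_alleles) + rng.choice(parent2_alleles)
    return "".join(sorted(pair))  # normalize 'aA' to 'Aa'

def randomly_mate_organisms(parent1, parent2, rng):
    """Build a child's genotype for gene_a and gene_b from two parents."""
    return {gene: inherit_gene(parent1[gene], parent2[gene], rng)
            for gene in ("gene_a", "gene_b")}

def choose_parents(population, fitnesses, rng):
    """Pick two parents with probability proportional to fitness."""
    return rng.choices(population, weights=fitnesses, k=2)

rng = random.Random(0)
population = [
    {"gene_a": "AA", "gene_b": "bb"},
    {"gene_a": "aa", "gene_b": "Bb"},
    {"gene_a": "aa", "gene_b": "bb"},
]
mother, father = choose_parents(population, [1.0, 0.5, 0.1], rng)
print(randomly_mate_organisms(mother, father, rng))
```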

Page 39: CSC 599: Computational Scientific Discovery

Biological Free Parameters

Parameter               Values
Carrying capacity       x, where x = floor(10^y+0.5), y in {1.0, 1.0625, .. 3.0}
Init. generation size   x, where x = floor(10^y+0.5), y in {1.0, 1.0625, .. 3.0} (but never greater than Carrying Capacity)
Growth rate             {0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0}
# generations           {100}
p(A) in init. pop       {0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0}
p(B) in init. pop       {0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0}
fitness(A???)           {1.0}
fitness(aaB?)           {0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0}
fitness(aabb)           {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1} (but never greater than fitness(aaB?))

Page 40: CSC 599: Computational Scientific Discovery

Simulator Algorithm

Scilog + Bio Model -> C++ simulator
  Has random seed as parameter
Simulation run 100,000 times
  Index used as random seed

[Flowchart: Set random seed -> Pick all free params -> Param constraint violated? (branches labeled Y and N) -> Run simulation -> Output results]
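The driver loop in the flowchart can be sketched in Python as follows. Several things here are assumptions rather than facts from the slides: the parameter names, drawing each parameter uniformly from its listed set, and above all the handling of the "Y" branch, which this sketch interprets as "pick the parameters again". The `run_simulation` argument is a hypothetical stand-in for the generated C++ simulator.

```python
import math
import random

TENTHS = [i / 10 for i in range(11)]                                # 0.0 .. 1.0
SIZES = [math.floor(10 ** (1 + i / 16) + 0.5) for i in range(33)]   # 10 .. 1000

def pick_params(rng):
    """Draw every free parameter from its listed set of values."""
    return {
        "carrying_capacity": rng.choice(SIZES),
        "init_generation_size": rng.choice(SIZES),
        "growth_rate": rng.choice(TENTHS),
        "p_A": rng.choice(TENTHS),
        "p_B": rng.choice(TENTHS[1:]),
        "fitness_A": 1.0,
        "fitness_aaB": rng.choice(TENTHS[1:]),
        "fitness_aabb": rng.choice(TENTHS),
    }

def violates_constraints(p):
    """The two 'but never greater than' constraints from the parameter list."""
    return (p["init_generation_size"] > p["carrying_capacity"]
            or p["fitness_aabb"] > p["fitness_aaB"])

def run_all(run_simulation, n_runs):
    results = []
    for index in range(n_runs):
        rng = random.Random(index)           # run index doubles as the seed
        params = pick_params(rng)
        while violates_constraints(params):  # ASSUMPTION: re-pick on violation
            params = pick_params(rng)
        results.append(run_simulation(params, rng))
    return results

# Stand-in simulator that just echoes its parameters.
print(run_all(lambda params, rng: params, n_runs=2))
```

Seeding each run with its own index makes any single run reproducible without rerunning the whole batch.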

Page 41: CSC 599: Computational Scientific Discovery

Simulator Results

frac(A) always increases
frac(B) increases in 57,038 of 100,000 trials

I.D.: “Isn't that selection?”

Page 42: CSC 599: Computational Scientific Discovery

Principal Component Analysis

                                 V1       V2       V3       V4       V5
Eigenvalue                       1.47     1.17     1        0.83     0.53
Proportion                       29.41%   23.38%   20.00%   16.62%   10.59%
growth rate                      0        -0.03    1        0.02     0
carrying capacity                0.71     0.02     0        0        -0.71
init. generation size            0.71     0.02     0        -0.01    0.71
fitness(aaB?) - fitness(aabb)    0.03     -0.71    -0.03    0.71     0.01
freq(B) increase                 -0.02    0.71     0.01     0.71     0.01

V4: increase in B with selection (16.62%), but . . .
V2: increase in B without selection (23.38%)

Page 43: CSC 599: Computational Scientific Discovery

Discussion

Q: How can B increase without selection?
A: The Founder Effect!
  Random variations in the initial generation influence later generations

For our simulations: (init. generation size) < 20
  This may not be the real-world threshold

Only requirements:
  Geographic isolation
  Diverse initial population

Page 44: CSC 599: Computational Scientific Discovery

Values

“Chicago is 185 meters above sea level, ±5 meters”

Primary value or sample distribution
  The value being held
    One value: 1.85e+2
    Explicit set: [180, 185, 190]
    Implicit set: normal(185,5)

Domain
  Metadata about the value
    Dimensions (e.g. length)
    Units (e.g. meters)
    Axis (e.g. “height above sea level, somewhere on Earth”)
    Legal values (e.g. “0 meters to 10,000 meters”)

Page 45: CSC 599: Computational Scientific Discovery

Values (2)

State
  When the value is said to hold
    “Mean/median/mode value during all of 2007”

Subject
  Object the value describes
    chicago

Attribute
  Aspect of object being described
    height_above_sea_level_attr
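The five parts of a value (primary value, domain, state, subject, attribute) fit naturally into a record type. The Python sketch below is one possible encoding under stated assumptions: the field names are invented for illustration, and the primary value is restricted to the normal(mean, sigma) form.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Domain:
    """Metadata about a value."""
    dimension: str
    units: str
    axis: str
    legal_min: float
    legal_max: float

@dataclass(frozen=True)
class Value:
    """A value with its uncertainty and the context in which it holds."""
    mean: float      # primary value, as in normal(mean, sigma)
    sigma: float
    domain: Domain
    state: str       # when the value is said to hold
    subject: str     # object the value describes
    attribute: str   # aspect of the object being described

    def is_legal(self):
        """True if the mean lies within the domain's legal range."""
        return self.domain.legal_min <= self.mean <= self.domain.legal_max

height_domain = Domain("length", "meters",
                       "height above sea level, somewhere on Earth",
                       0.0, 10000.0)
chicago_height = Value(185.0, 5.0, height_domain,
                       "mean value during all of 2007",
                       "chicago", "height_above_sea_level_attr")
print(chicago_height.is_legal())
```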

Page 46: CSC 599: Computational Scientific Discovery

Questions for you:

Easiest language to write History Trajectory and Explanation Usage Objects in?
  Lisp? ML? Haskell? Prolog?

Easiest to program GUI?