CSC 599: Computational Scientific Discovery
Lecture 10: The Scienceomatic Systems:
Deductive Reasoning and Model Design
Outline
History Trajectory Object Application: Exhaustive Search
Explanation Usage Object Application: Explanation Preference
Assertion Reasoning Object Application: Simulation
What Scientists Want
Tell me:
  “What is the prediction?”
  “How shall I improve this model?”
  “How did you get that answer?”
  “What shall we do next?”
  “What's the precision of that value?”
User sees one interface, but:
  Assertion Reasoning Object answers “Predict value of X at time Y”
  Explanation Usage Object answers “Why is value of X at time Y equal to Z?”
  History Trajectory Object answers “What is good to try next?”
Recall
History Trajectory Object
Does several things:
1. Decides which operator to do next based on:
  a) How successful they have been (operator id)
  b) Type of data (data id)
  c) Tactics
  d) Strategy
2. Keeps track of what's been tried before
  a) operator/data
  b) success/failure
  c) “by how much”
  d) who/when/why/etc.
3. Modifiable
  a) Learns best operators for given data
  b) PROGRAMMABLE?!? (Under these conditions create an operator that does this . . .)
History Trajectory Object: Application 1: Systematic Search
Exhaustive Search: simplest to more complex
  Plays to strengths of computers
  Examples:
    MECHEM
    Inductive Process Modeling
History Trajectory Object can do this:
1. Queries Assertion Usage Object for assertions
2. Manipulates assertions to build more complex ones
History Trajectory Object: Systematic Search (2)
Idea: There are 3 processes total.
  Two (leaf1 and leaf2) are primitive.
  The third (node2) can be made arbitrarily more complex by taking a type1 process as 1st parameter and a type2 process as 2nd.
  leaf1 is type1; leaf2 and node2 are type2
Program (in Prolog):
  process1(leaf1).
  process2(leaf2).
  process2(node2(P1,P2)) :- process1(P1), process2(P2).
History Trajectory Object: Systematic Search (3)
Program generates process templates
  “Template” means model of same structure
  Parameters have not yet been computed (by calculus, simulated annealing, etc.)
  Generated from simplest -> increasingly more complex
Output:
  ?- process2(A).
  A = leaf2 ;
  A = node2(leaf1, leaf2) ;
  A = node2(leaf1, node2(leaf1, leaf2)) ;
  A = node2(leaf1, node2(leaf1, node2(leaf1, leaf2)))
  etc.
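The same simplest-first enumeration can be sketched outside Prolog. The block below is a minimal Python illustration, not the Scienceomatic implementation; the generator names simply mirror the Prolog predicates, and templates are represented as plain strings as an assumption for readability.

```python
from itertools import islice

def process1():
    """Yield every type1 process template (only the primitive leaf1)."""
    yield "leaf1"

def process2():
    """Yield type2 templates from simplest to most complex."""
    # Base clause: process2(leaf2).
    yield "leaf2"
    # Recursive clause: process2(node2(P1, P2)) :- process1(P1), process2(P2).
    # The type2 argument is the outer loop so every finite type1 choice is
    # tried before moving on to a deeper type2 template.
    for p2 in process2():
        for p1 in process1():
            yield f"node2({p1}, {p2})"

# Take the first four templates, matching the four answers in the Prolog output.
templates = list(islice(process2(), 4))
for template in templates:
    print(template)
```

Because the generator is lazy, a caller can stop as soon as a template fits the data well enough, which is the point of searching from simple to complex.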
History Trajectory Object: Implementation
This algorithm is fundamentally the same as MECHEM and Inductive Process Modeling
  One algorithm that generates model templates!
  Calls numeric program to do parameter fitting
  Can do both MECHEM and IPM!
Making it efficient
  Like MECHEM, rely on domain knowledge
    “Reactions that form pure Carbon are unlikely”
  Like IPM, rely on object type information
    “Rabbits are prey, coyotes are predators or prey”
Implement in:
  Prolog? (Scienceomatic)
  Lisp? (MECHEM, IPM re-write)
  ML? Haskell? Anything else?
Explanation Usage Object
Sample of important methods:
  Predict object1's attribute attribute1
    Satisfy with assertion usage object
    Satisfy with solved problem library
      Philosophy of science justification: Kuhnian exemplar: what scientists do
      Artificial Intelligence justification: EBL: cheaper than de novo reasoning
  Give trace why object object1's attribute attribute1 is value value1.
  Give trace how assertion assertion1 is justified (e.g. derived)
  Refine reasoning method
    “I like traces like this over traces like that because . . .”
Explanation Usage Object: Application 2: Explanation Preference
Default behavior:
  Favor shallowest explanation (more on details of this later)
Problem
  Shallowest may not be most correct!
We've seen this before with MECHEM:
  Computer scientist's “best mechanism” means shortest syntax
  Chemist's “best mechanism” means least energetic rate determining step
Explanation Usage Object: Explanation Preference (2)
falling_without_drag:
  $F = mg$
falling_with_drag:
  $F = mg - \tfrac{1}{2}\,\rho\,v^{2}\,A\,C_d$
where:
  $\rho$ = fluid's (e.g. air's) density
  $v$ = velocity
  $A$ = object's area
  $C_d$ = drag coefficient
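To see how far the two assertions can disagree, here is a small Python sketch that evaluates both. The function names, the default air density of 1.225 kg/m^3, g = 9.81 m/s^2, and the sample object (1 kg, 0.01 m^2, drag coefficient 0.47, moving at 10 m/s) are all illustrative assumptions, not values from the lecture.

```python
def force_without_drag(mass, g=9.81):
    """Net force on a falling object ignoring drag: F = m g."""
    return mass * g

def force_with_drag(mass, velocity, area, drag_coefficient,
                    fluid_density=1.225, g=9.81):
    """Net force with drag: F = m g - 0.5 * rho * v^2 * A * Cd."""
    drag = 0.5 * fluid_density * velocity ** 2 * area * drag_coefficient
    return mass * g - drag

# Hypothetical example object (assumed values, SI units).
f_simple = force_without_drag(mass=1.0)
f_drag = force_with_drag(mass=1.0, velocity=10.0, area=0.01,
                         drag_coefficient=0.47)
print(f_simple, f_drag)
```

At low speed the two answers are close; the drag term grows with the square of the velocity, so the gap widens as the object speeds up.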
Explanation Usage Object: Explanation Preference (3)
Scienceomatic can compute both answers
  falling_without_drag answer might be returned first if it uses a less deep explanation tree
Can tell Scienceomatic preferences:
  “Prefer falling_with_drag answer before falling_without_drag answer”
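One way to picture the preference mechanism is as a sort key: stated preferences come first, and tree depth breaks ties. The Python sketch below is an assumption about how this could look, not the Scienceomatic design; the `Explanation` class, the `rank` function, and the sub-explanation names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Explanation:
    """A node in an explanation tree (hypothetical structure)."""
    assertion: str
    children: list = field(default_factory=list)

    def depth(self):
        """Number of levels in the tree rooted at this node."""
        return 1 + max((child.depth() for child in self.children), default=0)

def rank(explanations, preferred=()):
    """Order explanations: user-preferred assertions first, then shallowest."""
    def key(explanation):
        if explanation.assertion in preferred:
            position = preferred.index(explanation.assertion)
        else:
            position = len(preferred)  # unlisted assertions come after listed ones
        return (position, explanation.depth())
    return sorted(explanations, key=key)

# Hypothetical trees: the drag answer needs extra supporting steps.
without_drag = Explanation("falling_without_drag")
with_drag = Explanation("falling_with_drag", [
    Explanation("drag_force", [Explanation("fluid_density_lookup")]),
])

default_order = [e.assertion for e in rank([with_drag, without_drag])]
preferred_order = [e.assertion for e in
                   rank([with_drag, without_drag],
                        preferred=["falling_with_drag"])]
print(default_order, preferred_order)
```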
Explanation Usage Object: Implementation
An explanation data structure is a tree
  Lisp, ML, Haskell: most general
  Prolog: acceptable
    May need Prolog's “2nd order” predicates: =.., functor, arg
    Example:
      ?- f(a,b) =.. L.
      L = [f, a, b] ;
  C/C++/Java/C#: possible (of course) but may not be natural
Your ideas?
Assertion Reasoning Object
Sample of important methods:
  Retrieve assertion assertion1
    Show assertion
    Edit assertion
  Predict object object1's attribute attribute1
    Plot these values
    Compare predicted and recorded values
  Justify (e.g. logical resolution) assertion assertion1
Large Scale Knowledge Organization
Five components
  Definitions/Expectations/Assumptions
    “Meters measure length”
    “100 cm = 1 meter”
    “Evolution is impossible because of X, Y, Z”
  Theory
    Newton's Laws of Motion and Gravitation
  Generalization
    Johannes Kepler's Laws
  Data
    Tycho Brahe's Observations
  Analytics
    How to sum & integrate, change coord. systems, etc.
Reasoning Over Components
Which components queried -> Reasoning type
  theorize: d/e/a, theory, general, data, analytics
  empiricize: d/e/a, data, general, theory, analytics
  ab_initio: d/e/a, theory, analytics
  read_data: d/e/a, data, analytics
d/e/a always first
  Enforce agreement with base assumptions
analytics always last
  Recast query to other form if all else fails
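The four reasoning types amount to four query orders over the same components. A minimal Python sketch, assuming each component can be modeled as a function that returns an answer or `None` (the `answer` function and the toy components are hypothetical, and the d/e/a consistency check is reduced to a pass-through):

```python
# Component order for each reasoning type, copied from the slide.
REASONING_ORDERS = {
    "theorize":   ["d/e/a", "theory", "general", "data", "analytics"],
    "empiricize": ["d/e/a", "data", "general", "theory", "analytics"],
    "ab_initio":  ["d/e/a", "theory", "analytics"],
    "read_data":  ["d/e/a", "data", "analytics"],
}

def answer(query, reasoning_type, components):
    """Ask each component in order; return (component name, result) for the first hit."""
    for name in REASONING_ORDERS[reasoning_type]:
        result = components[name](query)
        if result is not None:
            return name, result
    return None

# Toy components (assumed): theory and data both know an answer, the rest do not.
components = {
    "d/e/a": lambda query: None,
    "theory": lambda query: 9.81,
    "general": lambda query: None,
    "data": lambda query: 9.79,
    "analytics": lambda query: None,
}

print(answer("gravitational acceleration", "theorize", components))
print(answer("gravitational acceleration", "empiricize", components))
```

The same query returns the theoretical value under `theorize` and the recorded value under `empiricize`, which is the practical difference between the two orders.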
Each Component
Inherited Knowledge
  Birds fly, but penguins don't fly
Dynamic Knowledge (processes)
  For falling things:
    $a = F/m = g$
    $v = v_0 + g t$
    $\mathrm{height} = h_0 + v_0 t + \tfrac{1}{2} g t^{2}$
Static Knowledge
  For homogeneous gases:
    $PV = nRT$
Inherited Knowledge
Works with ontology
  is_a(bird,animal).
  inherit(bird,can_fly_attr,true).
  is_a(penguin,bird).
  inherit(penguin,can_fly_attr,false).
  instance_of(tweety,bird).
  instance_of(opus,penguin).
Deduce:
  “tweety can fly”
  “opus can not fly”
Inherited Knowledge (2)
Works with ontology, cont'd
Remove
  instance_of(opus,penguin).
Add
  is_a(penguin_with_pilots_license, penguin).
  inherit(penguin_with_pilots_license, can_fly_attr,true).
  instance_of(opus,penguin_with_pilots_license).
Deduce:
  “opus can fly”
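The deductions on these two slides follow from "most specific class wins". Here is a minimal Python sketch of that lookup, assuming the can't-fly override is attached to penguin and that the ontology fits in three dictionaries; the `lookup` function is hypothetical.

```python
def lookup(instance, attribute, is_a, inherit, instance_of):
    """Walk up the is_a chain and return the first inherited value found."""
    cls = instance_of[instance]
    while cls is not None:
        if (cls, attribute) in inherit:
            return inherit[(cls, attribute)]
        cls = is_a.get(cls)  # None once the top of the hierarchy is passed
    return None

is_a = {"bird": "animal", "penguin": "bird"}
inherit = {("bird", "can_fly_attr"): True,
           ("penguin", "can_fly_attr"): False}  # assumed: override lives on penguin
instance_of = {"tweety": "bird", "opus": "penguin"}

tweety_flies = lookup("tweety", "can_fly_attr", is_a, inherit, instance_of)
opus_flies_before = lookup("opus", "can_fly_attr", is_a, inherit, instance_of)

# The edit from the second slide: opus becomes a penguin with a pilot's license.
is_a["penguin_with_pilots_license"] = "penguin"
inherit[("penguin_with_pilots_license", "can_fly_attr")] = True
instance_of["opus"] = "penguin_with_pilots_license"

opus_flies_after = lookup("opus", "can_fly_attr", is_a, inherit, instance_of)
print(tweety_flies, opus_flies_before, opus_flies_after)
```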
Static Knowledge
Assertions
  Modular units of knowledge
    Numeric relations (e.g. equations)
    Decision trees
  A name to uniquely identify them
    ideal_gas_law
  A typed entity list of entity names and the set or domain in which they must reside
    gas_ent (in single_compound_gas_class),
    container_ent (in fluid_container_class),
    molecule_ent (in molecule_class),
Static Knowledge
A condition list telling when knowledge is applicable:
  1. molecule_ent.is_gas_phase_mutually_attractive_attr == false
  2. molecule_ent.is_gas_phase_mutually_replusive_attr == false,
  3. gas_ent.total_molecular_volume_attr << container_ent.containers_volume_attr
  4. gas_ent.is_gas_randomly_moving_attr = true
  5. molecule_ent.is_newtonian_particle_attr == true
  6. gas_ent.materials_molecule_attr = molecule_ent
  7. gas_ent.fluids_container_attr == container_ent
Static Knowledge
An expression:
PV = RnT:
  gas_ent.gases_pressure_attr * container_ent.objects_internal_volume_attr ==
    value(normal(8.3145,0.00005), joules_per_mole_kelvin_domain) * gas_ent.materials_mole_num_attr * gas_ent.objects_temperature_attr
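An assertion therefore pairs a guard (the condition list) with a relation (the expression). The Python sketch below is one assumed way to represent that, using dictionaries for entities. The `much_less` threshold of 0.01 for "<<" is an arbitrary illustration, and the two entity-linking conditions (6 and 7) are left out.

```python
R = 8.3145  # J/(mol K), the central value given on the slide

def ideal_gas_law_applies(gas, container, molecule, much_less=0.01):
    """Check conditions 1-5 of the ideal_gas_law assertion (6 and 7 omitted)."""
    return (
        molecule["is_gas_phase_mutually_attractive_attr"] is False
        and molecule["is_gas_phase_mutually_replusive_attr"] is False
        and gas["total_molecular_volume_attr"]
            < much_less * container["containers_volume_attr"]
        and gas["is_gas_randomly_moving_attr"] is True
        and molecule["is_newtonian_particle_attr"] is True
    )

def predict_pressure(gas, container):
    """Solve P V = R n T for P (SI units: Pa, m^3, mol, K)."""
    return (R * gas["materials_mole_num_attr"] * gas["objects_temperature_attr"]
            / container["objects_internal_volume_attr"])

# Hypothetical example: one mole at 273.15 K in 22.414 litres.
molecule = {"is_gas_phase_mutually_attractive_attr": False,
            "is_gas_phase_mutually_replusive_attr": False,
            "is_newtonian_particle_attr": True}
gas = {"total_molecular_volume_attr": 1e-6,
       "is_gas_randomly_moving_attr": True,
       "materials_mole_num_attr": 1.0,
       "objects_temperature_attr": 273.15}
container = {"containers_volume_attr": 0.022414,
             "objects_internal_volume_attr": 0.022414}

applicable = ideal_gas_law_applies(gas, container, molecule)
pressure = predict_pressure(gas, container) if applicable else None
print(applicable, pressure)
```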
Dynamic (Process) Knowledge
Processes have
  Name
  Typed entity list
    Entity mappings from inherited to base process entities
  Conditions
    If process happened we know they held
    Sources of knowledge rather than things to check
  Subassertions
    Numeric relations or decision trees telling what happens
  Simulation code
    Compiles to Java source
  Test constraints
  Instances of the class
  Serial or parallel decomposition
Example: Pendulum (1)
Entities
  process_ent: The process
  init_state_ent: init. state
  intermediate_state_ent: intermediate states
  final_state_ent: final state
  axes_configuration_ent: configuration of X and Y axes in which pendulum swings
  pendulum_ent: pendulum on end of string
  arm_ent: swinging arm
  gravitational_field_ent: gravitational field
Example: Pendulum (2)
Conditions
  arm_ent.entities_mass_attr << pendulum_ent.entities_mass_attr
  (maybe more)
Example: Pendulum (3)
Subassertions
  object.attribute.descriptor
  descriptors:
    .cont: continuous
    .init: initial
    .final: final
    .delta: change in attribute
    .current/.next/.prev: current/next/previous states (discrete)
  Arm's length is constant:
    pendulum_ent.x.cont ^ 2 + pendulum_ent.y.cont ^ 2 == arm_ent.length ^ 2
Example: Pendulum (4)
Subassertions
  X axis forces:
    pendulum_ent.objects_mass_attr * (pendulum_ent.x.delta.delta / process_ent.time.delta.delta) ==
      arm_ent.objects_lengthwise_tension_attr.cont * pendulum_ent.x.cont / arm_ent.objects_length_attr
Example: Pendulum (5)
Subassertions, cont'd
  Y axis forces:
    pendulum_ent.objects_mass_attr * (pendulum_ent.y.delta.delta / process_ent.time.delta.delta) ==
      arm_ent.objects_lengthwise_tension_attr.cont * pendulum_ent.y.cont / arm_ent.objects_length_attr
      - pendulum_ent.objects_mass_attr * gravitational_field_ent.grav_fields_acceleration_attr
Hierarchy of Processes
Motion
  Very abstract
1-D motion
  Specifies that motion along one dimension only
  abstract means “fnc to be given in derived class”
1-D uniform acceleration
  Specifies uniform accel.
  abstract_const means “constant to be given in derived class”
1-D gravitational accel.
  Gives conditions
Reasoning, Revisited
“Deduction”
  1st Inherited, 2nd Dynamic, 3rd Static
  (Prolog?) Iterative Deepening-Depth First Search
  Assumption: “shallowest” answer is simplest
  Specify preference with Explanation Usage Obj
Probabilistic reasoning
  Bayesian Logic Program on top of Prolog
    Simulator -> Conditional Probability Tables (CPTs)
    CPTs -> Bayesian Logic Program
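Iterative deepening is what makes the "shallowest answer first" assumption hold regardless of rule order. The Python sketch below shows this on a propositional toy problem. It is an illustration only: real Scienceomatic queries would involve variables and unification, and the rule and fact names are hypothetical.

```python
def depth_limited(goal, rules, facts, limit):
    """Return a proof tree (goal, subproofs) using at most `limit` rule levels."""
    if goal in facts:
        return (goal, [])
    if limit == 0:
        return None
    for body in rules.get(goal, []):
        subproofs = []
        for subgoal in body:
            proof = depth_limited(subgoal, rules, facts, limit - 1)
            if proof is None:
                break  # this rule body fails within the limit; try the next one
            subproofs.append(proof)
        else:
            return (goal, subproofs)
    return None

def iterative_deepening(goal, rules, facts, max_depth=10):
    """Retry with growing depth limits so the shallowest proof is found first."""
    for limit in range(max_depth + 1):
        proof = depth_limited(goal, rules, facts, limit)
        if proof is not None:
            return limit, proof
    return None

# Hypothetical knowledge: the drag rule is listed first but needs a deeper tree.
rules = {
    "force": [["mass", "gravity", "drag"], ["mass", "gravity"]],
    "drag": [["density", "velocity", "area", "drag_coefficient"]],
}
facts = {"mass", "gravity", "density", "velocity", "area", "drag_coefficient"}

found_depth, found_proof = iterative_deepening("force", rules, facts)
print(found_depth, found_proof)
```

Even though the drag rule comes first, the depth-1 pass can only complete the no-drag rule, so that proof is returned. This is the situation where an explicit preference is needed to get the drag answer.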
Thanks Tony Garcia!
Simulation
Assertion Usage Object: Application 3: Make a simulator
  Use simulator code for each process to make Java (or C++) simulator
  Free parameters picked based on domain ranges
  Does N Monte Carlo simulation runs
Biology: Intelligent Design
Life is too complex to have arisen by chance, therefore someone must have designed it
  ≠ Creationism: Doesn't say who designer was (maybe Space Aliens)
Selection occurs, but Evolution requires:
  geographical isolation, AND
  new trait mutation, AND
  superior fitness
For all of these to be true is improbable
Biology: Evolution
No such restraints
New Synthesis view of Evolution:
  Speciation by geographic isolation and selection
Post New Synthesis:
  Keep geographic isolation
  Do we need maximal selection?
Biology: Common Model
1. Logistic Growth
  dN = (growth_rate) * N * (1 - N/(regions_capacity))
  Small population -> fast growth; Large pop. -> slow
2. Mendelian Genetics
  2 genes: A dominates over a, B dominates over b
3. Hardy-Weinberg Conditions
  H-W proof: no change in allele freq., given assumptions
  Relaxed assumptions: large population, no fitness diff.
  Retained assumptions: no mutation, no immigration, no emigration, only same-generation random mating
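The "small population grows fast, large population grows slowly" remark can be checked directly from the logistic formula. A short Python sketch; the function name and the sample numbers (growth rate 0.5, capacity 1000) are assumptions for illustration.

```python
def logistic_growth(n, growth_rate, regions_capacity):
    """dN = growth_rate * N * (1 - N / regions_capacity)."""
    return growth_rate * n * (1 - n / regions_capacity)

# Per-capita growth (dN / N) for a small, a large, and a saturated population.
small = logistic_growth(10, 0.5, 1000) / 10      # far below capacity
large = logistic_growth(900, 0.5, 1000) / 900    # near capacity
full = logistic_growth(1000, 0.5, 1000)          # at capacity
print(small, large, full)
```

Per organism, the small population grows at nearly the full growth rate, while the population near capacity grows at about a tenth of it and stops growing at capacity.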
Specific Model: “Best of a Bad Circumstance”
Everyone has 2 gene_a alleles and 2 gene_b alleles
  Having at least one copy of A is best
  Having at least one copy of B is 2nd best
  Having neither is worst
fitness(A???) >= fitness(aaB?) >= fitness(aabb)

Genotypes                             Fitness
AABB, AABb, AAbb, AaBB, AaBb, Aabb    Strong :)
aaBB, aaBb                            Medium :|
aabb                                  Weak :(
To Test
Intelligent Design
  “B's frequency will never increase other than due to fitness”
  We test “Does B ever increase?”
  Do more detailed statistical analysis afterward
Evolution
  The null hypothesis: that B can increase above fitness
Biological Processes
population_over_time
Runs create_1st_generation and then create_subsequent_generation 100 times
create_1st_generation
Calls randomly_create_organism the number of times given by the initial generation size.
randomly_create_organism
Stochastic decision trees initialize alleles for gene_a and gene_b according to the free parameter initial probabilities of A and B. A third decision tree then computes the organism's fitness.
create_subsequent_generation
Uses Logistic Growth to compute how many organisms to create. For each it randomly chooses parents (weighted by their fitness) and calls randomly_mate_organisms.
randomly_mate_organisms
Decision trees stochastically define alleles for gene_a and gene_b according to parents and Mendelian genetics. A third computes its fitness. The organism is also attached to its generation.
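The five processes above can be sketched in a few dozen lines. The Python below is a loose illustration rather than the generated simulator. It assumes organisms are dictionaries, that the next generation's size is the current size plus the logistic dN (truncated to an integer), and that both parents are drawn independently by fitness (so the same parent may be drawn twice). `frac_B` is a hypothetical helper.

```python
import random

def fitness_of(gene_a, gene_b, fits):
    """fits = (fitness(A???), fitness(aaB?), fitness(aabb))."""
    if "A" in gene_a:
        return fits[0]
    if "B" in gene_b:
        return fits[1]
    return fits[2]

def randomly_create_organism(rng, p_A, p_B, fits):
    gene_a = tuple("A" if rng.random() < p_A else "a" for _ in range(2))
    gene_b = tuple("B" if rng.random() < p_B else "b" for _ in range(2))
    return {"gene_a": gene_a, "gene_b": gene_b,
            "fitness": fitness_of(gene_a, gene_b, fits)}

def randomly_mate_organisms(rng, mother, father, fits):
    # Mendelian inheritance: one allele per gene from each parent.
    gene_a = (rng.choice(mother["gene_a"]), rng.choice(father["gene_a"]))
    gene_b = (rng.choice(mother["gene_b"]), rng.choice(father["gene_b"]))
    return {"gene_a": gene_a, "gene_b": gene_b,
            "fitness": fitness_of(gene_a, gene_b, fits)}

def create_subsequent_generation(rng, parents, growth_rate, capacity, fits):
    n = len(parents)
    # Assumption: new size = old size + logistic dN, truncated to an integer.
    size = n + int(growth_rate * n * (1 - n / capacity))
    weights = [p["fitness"] for p in parents]
    if size <= 0 or sum(weights) == 0:
        return []  # population cannot reproduce
    children = []
    for _ in range(size):
        mother, father = rng.choices(parents, weights=weights, k=2)
        children.append(randomly_mate_organisms(rng, mother, father, fits))
    return children

def population_over_time(seed, init_size, p_A, p_B, growth_rate, capacity,
                         fits, generations=100):
    rng = random.Random(seed)
    generation = [randomly_create_organism(rng, p_A, p_B, fits)
                  for _ in range(init_size)]  # create_1st_generation
    history = [generation]
    for _ in range(generations):
        generation = create_subsequent_generation(
            rng, generation, growth_rate, capacity, fits)
        if not generation:
            break
        history.append(generation)
    return history

def frac_B(generation):
    """Fraction of gene_b alleles that are B (hypothetical helper)."""
    alleles = [allele for org in generation for allele in org["gene_b"]]
    return alleles.count("B") / len(alleles)

history = population_over_time(seed=0, init_size=20, p_A=0.3, p_B=0.5,
                               growth_rate=0.5, capacity=100,
                               fits=(1.0, 0.6, 0.2), generations=10)
print([len(g) for g in history], frac_B(history[0]), frac_B(history[-1]))
```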
Biological Free Parameters
Parameter               Values
Carrying capacity       x, where x = floor(10^y+0.5), y in {1.0, 1.0625, .. 3.0}
Init. generation size   x, where x = floor(10^y+0.5), y in {1.0, 1.0625, .. 3.0} (but never greater than Carrying Capacity)
Growth rate             {0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0}
# generations           {100}
p(A) in init. pop       {0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0}
p(B) in init. pop       {0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0}
fitness(A???)           {1.0}
fitness(aaB?)           {0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0}
fitness(aabb)           {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1} (but never greater than fitness(aaB?))
Simulator Algorithm
Scilog + Bio Model -> C++ simulator
  Has random seed as parameter
Simulation run 100,000 times
  Index used as random seed
Flowchart: Set random seed -> Pick all free params -> Param constraint violated? (Y/N) -> Run simulation -> Output results
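The flowchart can be read as a driver loop around the simulator. The Python sketch below is one interpretation, not the generated C++ code. It assumes that a violated constraint leads to re-picking the parameters (the slide does not say where the Y branch goes), and that the four value sets on the parameter slide belong to p(A), p(B), fitness(A???) and fitness(aaB?) in that order. All function names are hypothetical.

```python
import random

# x = floor(10^y + 0.5) for y in {1.0, 1.0625, .. 3.0}: 33 values from 10 to 1000.
SIZE_CHOICES = [int(10 ** (1.0 + 0.0625 * i) + 0.5) for i in range(33)]
TENTHS = [i / 10 for i in range(11)]  # {0, 0.1, .. 1.0}

def pick_free_params(rng):
    return {
        "capacity": rng.choice(SIZE_CHOICES),
        "init_size": rng.choice(SIZE_CHOICES),
        "growth_rate": rng.choice(TENTHS),
        "p_A": rng.choice(TENTHS),
        "p_B": rng.choice(TENTHS[1:]),      # assumed set {0.1 .. 1.0}
        "fit_A": 1.0,
        "fit_B": rng.choice(TENTHS[1:]),    # assumed set {0.1 .. 1.0}
        "fit_none": rng.choice(TENTHS),
    }

def violates_constraints(params):
    return (params["init_size"] > params["capacity"]
            or params["fit_none"] > params["fit_B"])

def run_trials(n_trials, simulate):
    """Run `simulate(rng, params)` once per trial, seeding with the trial index."""
    results = []
    for index in range(n_trials):
        rng = random.Random(index)             # Set random seed
        params = pick_free_params(rng)         # Pick all free params
        while violates_constraints(params):    # assumption: re-pick on violation
            params = pick_free_params(rng)
        results.append(simulate(rng, params))  # Run simulation, output results
    return results

# Stand-in simulator that just echoes its parameters.
picked = run_trials(50, lambda rng, params: params)
print(picked[0])
```

Seeding each trial with its index makes any single run reproducible without storing its parameters.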
Simulator Results
frac(A) always increases
frac(B) increases in 57,038 of 100,000 trials
  I.D.: “Isn't that selection?”
Principal Component Analysis

                                 V1       V2       V3       V4       V5
Eigenvalue                       1.47     1.17     1        0.83     0.53
Proportion                       29.41%   23.38%   20.00%   16.62%   10.59%
growth rate                      0        -0.03    1        0.02     0
carrying capacity                0.71     0.02     0        0        -0.71
init. generation size            0.71     0.02     0        -0.01    0.71
fitness(aaB?) - fitness(aabb)    0.03     -0.71    -0.03    0.71     0.01
freq(B) increase                 -0.02    0.71     0.01     0.71     0.01

V4: increase in B with selection (16.62%), but . . .
V2: increase in B without selection (23.38%)
Discussion
Q: How can B increase without selection?
A: The Founder Effect!
  Random variations in the initial generation influence later generations
For our simulations: (init. generation size) < 20
  This may not be real-world threshold
Only requirements:
  Geographic isolation
  Diverse initial population
Values
“Chicago is 185 meters above sea level, ±5 meters”
Primary value or sample distribution: The value being held
  One value: 1.85e+2
  Explicit set: [180, 185, 190]
  Implicit set: normal(185,5)
Domain: Metadata about the value
  Dimensions (e.g. length)
  Units (e.g. meters)
  Axis (e.g. “height above sea level, somewhere on Earth”)
  Legal values (e.g. “0 meters to 10,000 meters”)
Values (2)
State: When the value is said to hold
  “Mean/median/mode value during all of 2007”
Subject: Object the value describes
  chicago
Attribute: Aspect of object being described
  height_above_sea_level_attr
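The five parts of a value map naturally onto a record type. The Python sketch below is one possible layout, not the Scienceomatic's. The class and field names are assumptions, and `Normal.spread` deliberately leaves open whether the second argument of normal(185,5) is a standard deviation or a half-width.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Normal:
    """Implicit set, written normal(mean, spread) on the slide."""
    mean: float
    spread: float

@dataclass(frozen=True)
class Domain:
    """Metadata about the value."""
    dimension: str
    units: str
    axis: str
    legal_range: Tuple[float, float]

    def is_legal(self, x):
        low, high = self.legal_range
        return low <= x <= high

@dataclass(frozen=True)
class Value:
    primary: Union[float, Tuple[float, ...], Normal]  # one value, explicit set, or implicit set
    domain: Domain
    state: str       # when the value is said to hold
    subject: str     # object the value describes
    attribute: str   # aspect of the object being described

    def central(self):
        """A single representative number for any of the three primary forms."""
        if isinstance(self.primary, Normal):
            return self.primary.mean
        if isinstance(self.primary, tuple):
            return sum(self.primary) / len(self.primary)
        return self.primary

height_domain = Domain("length", "meters",
                       "height above sea level, somewhere on Earth",
                       (0.0, 10000.0))
chicago_height = Value(Normal(185.0, 5.0), height_domain,
                       "Mean/median/mode value during all of 2007",
                       "chicago", "height_above_sea_level_attr")
print(chicago_height.central(), height_domain.is_legal(chicago_height.central()))
```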
Questions for you:
Easiest language to write History Trajectory and Explanation Usage Objects in? Lisp? ML? Haskell? Prolog?
Easiest to program GUI?