introduction into analysis of methods and tools for hypothesis-driven...

Tools and Methods for HD Experiment Support D. Kovalev Motivation Hypotheses-Oriented Approach Revised Facilities for Hypothesis-Driven Experiment Support Examples of Hypothesis-Driven Scientific Research Summary 1/52 Introduction into Analysis of Methods and Tools for Hypothesis-Driven Scientific Experiment Support L.A. Kalinichenko 1 D.Y. Kovalev 1 D.A. Kovaleva 2 O.Y. Malkov 2 1 Institute of Informatics Problems Russian Academy of Sciences 2 Institute of Astronomy Russian Academy of Sciences ACM SIGMOD CHAPTER, Moscow November 27, 2014


Introduction into Analysis of Methods and Tools for Hypothesis-Driven Scientific Experiment Support

L.A. Kalinichenko (1), D.Y. Kovalev (1), D.A. Kovaleva (2), O.Y. Malkov (2)

(1) Institute of Informatics Problems, Russian Academy of Sciences
(2) Institute of Astronomy, Russian Academy of Sciences

ACM SIGMOD CHAPTER, Moscow, November 27, 2014


Motivation

• Science is increasingly dependent on data as the core resource for discovery:
    • scientific instruments,
    • sensors,
    • simulations,
    • Web or social nets
• Big Data "movement": 3V’s, new infrastructures, new algorithms

The basic objective of Data-Intensive Sciences (DIS) is to infer knowledge from integrated data organized in networked infrastructures such as warehouses, grids, and clouds. Open access to large volumes of data therefore becomes a key prerequisite for discoveries in the 21st century.


Motivation

We have to do better at producing tools to support the whole research cycle – from data capture and data curation to data analysis and data visualization.

— Jim Gray, 2007

New tools are needed to bring humans into the data-analysis loop at all stages, recognizing that knowledge is often subjective and context-dependent and that some aspects of human intelligence will not be replaced anytime soon by machines.

— Frontiers in Massive Data Analysis, 2013


Motivation

• The data deluge has changed the way scientific experiments are done
• X-informatics disciplines were the first to deal with it
• They provide valuable insight into how modern data-intensive research is done (scheme, methods, algorithms, etc.)


Main Ideas of Talk

• The hypothesis remains the central research unit in DIS
• The most promising approaches and examples of problem solving in different DIS are collected
• A general scheme for doing scientific research is identified, together with possible methods for it
• Examples of complex DIS with various data and algorithm problems are highlighted


The Automation of Systems Biology
Revised Approach to Scientific Experiment

Example
Adam is a physically implemented laboratory automation system that exploits techniques from the field of artificial intelligence to execute cycles of scientific experiment.

Figure: Lab equipment

Adam formulated and tested 20 hypotheses concerning genes encoding 13 orphan enzymes . . . 12 hypotheses with no previous evidence were confirmed.

DeepQA: The architecture underlying Watson
(IBM Research, © 2010 IBM Corporation)

Generates many hypotheses, collects a wide range of evidence, and balances the combined confidences of over 100 different analytics that analyze the evidence from different dimensions.

Figure: DeepQA architecture diagram. Stages shown: Question; Question & Topic Analysis; Question Decomposition; Hypothesis Generation (Primary Search, Candidate Answer Generation); Hypothesis and Evidence Scoring (Answer Scoring, Evidence Retrieval, Deep Evidence Scoring); Synthesis; Final Confidence Merging & Ranking; Answer & Confidence. Inputs shown: Answer Sources, Evidence Sources, and learned Models that help combine and weigh the evidence. Annotations: multiple interpretations; 100’s of sources; 100’s of possible answers; 1000’s of pieces of evidence; 100,000’s of scores from many deep analysis algorithms; balance and combine.


Outline

1 Motivation

2 Hypotheses-Oriented Approach Revised
    Hypothesis Generation
    Hypothesis Evaluation
    Algorithmic Generation and Evaluation of Hypotheses
    Bayesian Motivation for Discovery

3 Facilities for Hypothesis-Driven Experiment Support
    Conceptualization of Scientific Experiment
    Scientific Hypothesis Formalization
    Hypotheses as Data in Probabilistic Databases

4 Examples of Hypothesis-Driven Scientific Research
    Besançon Galaxy Model
    Analysis of Connectome Based on Network Data
    Climate in Australia
    Financial Markets

5 Summary


Hypothesis-Oriented Approach to Scientific Experiment
First hypothesize, then experiment . . .

Definition
A scientific hypothesis is a proposed falsifiable explanation of a phenomenon which still has to be rigorously tested.

“Without hypothesis there is no science”

— M. Poincaré


Relationship between hypotheses, laws, theories

Definition
A scientific theory has undergone extensive testing and is generally accepted to be the accurate explanation behind an observation.

Definition
A scientific law is a proposition that points out an orderliness or regularity in nature: the prevalence of an invariable association between a particular set of conditions and particular phenomena.

• No essential difference between constructs used for expressing hypotheses, theories and laws.



Enhanced Knowledge Production Diagram


Induction, Deduction, Abduction

Definition
Induction is a technique by which individual pieces of evidence are collected and examined until a law is discovered or a theory is invented.

Definition
Deduction is the process of reasoning from one or more statements (premises) to reach a logically certain conclusion.

Definition
Abduction is the process of validating a given hypothesis through reasoning by successive approximation.


Different Representations of Scientific Hypothesis
From classical hypothesis to the DIS hypothesis

• Mathematical equation
    • $a(t) = -g$
    • $v(t) = -g t + v_0$
    • $s(t) = -(g/2) t^2 + v_0 t + s_0$
• Existential formula
    • $\forall x \in X \; \forall y \in Y,\; p(x) \rightarrow q(y)$

Database relation

    t    v       s
    0    0       5000
    1    -32     4984
    2    -64     4936
    3    -96     4856
    4    -128    4744

Algorithm

    for k = 0:n
        t = k * dt;
        v = -g*t + v_0;
        s = -(g/2)*t^2 + v_0*t + s_0;
        % array indices start at 1, so step k is stored at position k+1
        t_plot(k+1) = t;
        v_plot(k+1) = v;
        s_plot(k+1) = s;
    end
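The equation, relation and algorithm forms on this slide describe the same free-fall hypothesis. As a minimal sketch, the following Python function tabulates the relation from the equations. The parameter values g = 32, v_0 = 0 and s_0 = 5000 are assumptions inferred from the table rows, and the function name is hypothetical.

```python
def free_fall_relation(g, v0, s0, times):
    """Tabulate (t, v, s) rows from v(t) = -g*t + v0 and s(t) = -(g/2)*t**2 + v0*t + s0."""
    rows = []
    for t in times:
        v = -g * t + v0
        s = -(g / 2) * t**2 + v0 * t + s0
        rows.append((t, v, s))
    return rows


# Assumed parameters (inferred from the table, not stated on the slide).
rows = free_fall_relation(g=32, v0=0, s0=5000, times=range(5))
for t, v, s in rows:
    print(t, v, s)
```

With these assumed parameters the computed rows agree with the five rows of the database relation, which is the sense in which the relation can be read as a materialized hypothesis.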


Different Representations of Scientific Hypothesis
Associative or causal relationship


Hypothetico-Deductive Method

• Scientific inquiry proceeds by formulating a hypothesis in a form that could conceivably be falsified by a test on observable data.

• A test that could and does run contrary to predictions of the hypothesis is taken as a falsification of the hypothesis.

• A test that could but does not run contrary to the hypothesis corroborates the theory.


Hypothetico-Deductive Method

• Predictions are made from the independent variable to the dependent variable. It is the dependent variable that the researcher is interested in understanding.

Example
Effect of drug dosage on symptom severity. The independent variable is the dose and the dependent variable is the frequency/intensity of symptoms.

• Usually, statistical hypothesis testing is used:
    • Null and alternative hypotheses are stated;
    • The null hypothesis states that there is no relationship between the phenomena (variables) whose relation is under investigation, or at least not of the form given by the alternative hypothesis.
    • Rejecting the null hypothesis suggests that the alternative hypothesis may be true


Elements of Hypothesis-Driven Research

Figure: elements of hypothesis-driven research. Nodes shown: lsc:Hypothesis, Model, Phenomenon, lsc:Data, Observable. Edge labels shown: represents, realizes, lsc:dataUsed, lsc:dataProduced, hasAspect.


Research Lattice

Definition
A hypothesis lattice is formed by considering a set of hypotheses equipped with wasDerivedFrom as a strict order < (from the bottom to the top). Hypotheses directly derived from exactly one hypothesis are atomic, while those directly derived from at least two hypotheses are complex.


Research Lattices I

Figure: three parallel lattices of phenomena (p), hypotheses (h) and models (m). Relation label shown: explains.

Phenomena
    p1. Mass net flux
    p2. Fluid compression
    p3. Momentum net flux
    p4. Fluid friction
    p5. Body force effects
    p6. Fluid divergence
    p7. Fluid dynamics
    p8. Fluid behavior

Hypotheses
    h1. Mass conservation
    h2. Incompressible fluid
    h3. Momentum conservation
    h4. Viscous fluid
    h5. No body forces
    h6. blend(h1, h2)
    h7. blend(h3, h4, h5)
    h8. blend(h6, h7)

Models
    m1. to m5. (first level), m6. and m7. (second level), m8. (top)
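The lattice definition can be made concrete with a small sketch. It assumes that each blend(...) label in the figure lists the direct wasDerivedFrom parents of a hypothesis; h1 to h5 have no parent visible in the transcribed figure, so they are reported separately and not forced into the atomic class. All names in the code are illustrative.

```python
# Direct wasDerivedFrom edges, read off the blend(...) labels (an assumption).
was_derived_from = {
    "h1": [], "h2": [], "h3": [], "h4": [], "h5": [],
    "h6": ["h1", "h2"],
    "h7": ["h3", "h4", "h5"],
    "h8": ["h6", "h7"],
}


def classify(hypothesis):
    """Atomic: exactly one direct parent. Complex: at least two direct parents."""
    n_parents = len(was_derived_from[hypothesis])
    if n_parents == 0:
        return "no parent shown"
    return "atomic" if n_parents == 1 else "complex"


def ancestors(hypothesis):
    """All hypotheses below the given one in the strict order induced by wasDerivedFrom."""
    seen = set()
    stack = list(was_derived_from[hypothesis])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(was_derived_from[parent])
    return seen


for h in sorted(was_derived_from):
    print(h, classify(h), sorted(ancestors(h)))
```

Under this reading, h6, h7 and h8 are complex, and every other hypothesis lies below h8 in the order.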


Research Lattices II

Definition
The lattices are isomorphic if one takes subsets of M (Model), H (Hypotheses) and P (Phenomenon) such that formulates, explains and represents are both one-to-one and onto mappings (i.e., bijections), seen as structure-preserving mappings (morphisms).


Hypothesis Generation
Few possible methods to produce hypotheses

• Discovery as abduction
    Observations: P1, P2
    Rules: H1, H2 → P1;  H2, H3 → P2
    ⟹ H2 is a possible hypothesis
    Abductive logic programming systems: ACLP, A-system, ABDUAL, ProLogICA
• Anomalies search
• Computational analogies
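The abduction example can be sketched in a few lines of Python. The sketch assumes that "H1, H2 → P1" means each of H1 and H2 alone would account for P1 (and likewise for P2), so abduction reduces to intersecting the sets of candidate explanations. This is an illustration only, not how the abductive logic programming systems listed above work internally.

```python
def abduce(rules, observations):
    """Return the hypotheses that explain every observation.

    rules maps a hypothesis name to the set of phenomena it would account for.
    """
    candidates = None
    for obs in observations:
        explainers = {h for h, effects in rules.items() if obs in effects}
        candidates = explainers if candidates is None else candidates & explainers
    return candidates or set()


# Assumed reading of the slide: H1 or H2 explains P1, H2 or H3 explains P2.
rules = {"H1": {"P1"}, "H2": {"P1", "P2"}, "H3": {"P2"}}
print(abduce(rules, ["P1", "P2"]))
```

Only H2 accounts for both observations, which matches the conclusion on the slide.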


Approaches to hypothesis evaluation

• Logic-based approach
• Frequentist approach
    • relative frequencies of events
    • fixed unknown parameters
• Bayesian approach
    • degree of subjective belief
    • inferences by producing probability distributions
• Parameter estimation


Logic-based hypothesis

• Clean deduction exists in physics but is very rare in biology
• The premises logically entail the conclusion, where entailment means that the truth of the premises guarantees the truth of the conclusion
• Inductive logic:
    • Criterion of Adequacy (CoA): as evidence accumulates, the degree to which the collection of true evidence statements comes to support a hypothesis should tend to indicate that false hypotheses are probably false and true hypotheses are probably true
• Combinations with Bayesian approach


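Statistical testing of hypothesis
Testing the effect of a drug

100 rats are injected with a drug to test its effect on response time. The mean response time of rats that are not injected is 1.2 s. The mean response time of the 100 injected rats is 1.05 s, with a sample standard deviation of 0.5 s. Does the drug have an effect on response time?

$H_0$: the drug has no effect $\Rightarrow \mu = 1.2$ s
$H_1$: the drug has an effect $\Rightarrow \mu \neq 1.2$ s

Assume $H_0$ and approximate the standard error of the mean: $\sigma_{\bar{x}} = \sigma / \sqrt{100} = 0.5 / 10 = 0.05$

Figure: sampling distribution of the mean, with the x-axis marked at 1.05, 1.1, 1.15 and 1.2.

How many $\sigma_{\bar{x}}$ away: $z = (1.2 - 1.05) / 0.05 = 3 \Rightarrow p\text{-value} = 0.003$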
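The arithmetic of this example can be checked with a short sketch. It assumes a two-sided z-test with the normal approximation and uses the sample standard deviation in place of the unknown population value, as the slide does. The function name is hypothetical.

```python
import math


def z_test_two_sided(sample_mean, mu0, sample_std, n):
    """Return (standard error, z, two-sided p-value) under the normal approximation."""
    standard_error = sample_std / math.sqrt(n)
    z = (mu0 - sample_mean) / standard_error  # same sign convention as the slide
    # Two-sided tail probability of a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return standard_error, z, p_value


se, z, p_value = z_test_two_sided(sample_mean=1.05, mu0=1.2, sample_std=0.5, n=100)
print(f"standard error = {se:.2f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

The p-value of about 0.0027 rounds to the 0.003 quoted on the slide, so the null hypothesis would be rejected at the usual 0.05 or 0.01 significance levels.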


Bayesian approach to hypothesis evaluation

Figure: panels numbered 1 to 3; values shown, in extraction order: 0.5, 0.5; 0.4, 0.6; 0.6; 0.88, 0.12; 0.9.

Bayes' formula is used to compute posterior probabilities:

$P(h \mid D) = \dfrac{P(D \mid h)\, P(h)}{P(D)}$
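A minimal sketch of Bayes' formula for a finite set of competing hypotheses follows. The priors and likelihoods are made-up illustrative numbers, not the values from the figure, and P(D) is obtained by summing P(D|h)P(h) over the hypotheses.

```python
def posterior(priors, likelihoods):
    """Apply P(h|D) = P(D|h) P(h) / P(D) to every hypothesis h.

    priors[h] is P(h), likelihoods[h] is P(D|h); P(D) is their weighted sum.
    """
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}


# Illustrative numbers only (assumptions, not taken from the slides).
priors = {"h1": 0.5, "h2": 0.5}
likelihoods = {"h1": 0.8, "h2": 0.2}
post = posterior(priors, likelihoods)
print(post)
```

Repeating the update with the posterior as the next prior gives the step-by-step revision of belief that the Bayesian approach relies on.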



Conceptualization of Scientific Experiment

It becomes paramount to offer scientists mechanisms to manage the variety of knowledge produced during such investigations.

Figure: conceptual model of a scientific experiment (class diagram). Classes shown: Scientific Hypothesis; Sc Hypothesis Conceptual Model; Scientific Hypothesis Model; Phenomenon (with physical quantities and Space-Time Dimension); Ph_Process, with Discrete Ph Process and Continuous Ph Process; Domain ontology URL; Formal Language; Formal Representation; Mathematical Formulae (constant, function, equation); Mathematical Formulae XML; Discrete Phenomenon simulation; Event; Mesh; State; Computational Model View; Mesh Data View; Data View; Observational element; Simulated element. Associations shown: explains, elements, is_composed_of, represents, modeled_by, represented_as, represented_with, modeled_as, refers_to, compared_with, with multiplicities 1..1, 0..1, 0..* and 1..*.


Scientific Hypothesis Formalization
Diversity of the components of scientific hypothesis model

• Borrowed from applications in neuroscience and in computational hemodynamics of the human cardiovascular system
• Provided
    • by a mathematical model: a set of differential equations for continuous processes, quantifying the variations of physical quantities in continuous space-time;
    • by the mathematical solver (HEMOLAB) for discrete processes.
• Mathematical equations in MathML (enabling model interchange and reuse)


Scientific Hypothesis Formalization
Possible problems and extensions

• Mostly monotonic and not suitable for incomplete knowledge representation
• Another framework implemented by extending BioSigNet-RR:
    1 support for elaboration-tolerant representation and non-monotonic reasoning;
    2 seamless integration of hypothesis formation with knowledge representation and reasoning;
    3 use of various resources of biological data as well as human expertise to intelligently generate hypotheses;
    4 support for ranking hypotheses and for designing experiments to verify hypotheses.
• Prototype of an intelligent research assistant for molecular biologists.


Hypotheses as Uncertain Data in Probabilistic Databases
γ-DB approach

• A set of MathML equations is used to build the database relation schema and functional dependencies
• The database is filled with both simulated data and observed data
• This makes it possible to maintain several hypotheses explaining some phenomena
• An evaluation mechanism based on the Bayesian approach is provided to rank them.
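The idea of ranking competing hypotheses against observed data can be sketched as follows. This is only an illustration of Bayesian ranking, not the γ-DB algorithm: the two free-fall hypotheses (g = 32 and g = 30), the equal priors, the Gaussian error model and the noise level sigma are all assumptions, and the observations reuse the free-fall table shown earlier in the talk.

```python
import math

# Observed positions s at times t = 1..4 (taken from the free-fall relation).
observed = {1: 4984.0, 2: 4936.0, 3: 4856.0, 4: 4744.0}


def simulate(g, t, v0=0.0, s0=5000.0):
    """Simulated position under the hypothesis s(t) = -(g/2) t^2 + v0 t + s0."""
    return -(g / 2) * t**2 + v0 * t + s0


def rank(hypotheses, priors, sigma=5.0):
    """Rank hypotheses by posterior probability under an assumed Gaussian error model."""
    weights = {}
    for name, g in hypotheses.items():
        squared_error = sum((simulate(g, t) - s) ** 2 for t, s in observed.items())
        # Unnormalized posterior: prior times Gaussian likelihood of the residuals.
        weights[name] = priors[name] * math.exp(-squared_error / (2 * sigma**2))
    total = sum(weights.values())
    return sorted(((name, w / total) for name, w in weights.items()),
                  key=lambda item: item[1], reverse=True)


ranking = rank({"g=32": 32.0, "g=30": 30.0}, {"g=32": 0.5, "g=30": 0.5})
for name, probability in ranking:
    print(name, round(probability, 4))
```

The hypothesis whose simulated data lie closer to the observations receives the higher posterior and is ranked first.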



Besançon Galaxy Model
Multiparameter Hypotheses Examples

Definition
The Besançon Galaxy Model (BGM) is a model of stellar population synthesis of the Galaxy. It allows hypotheses on the star formation history, star evolution, and chemical and dynamical evolution of the Galaxy to be tested.

• BGM is based on a set of different interconnected hypotheses
• Some of the hypotheses are passed as free input parameters
• When new data arrive (e.g. Tycho-2), simulations from the model are compared with the data as a first test to verify and constrain the underlying hypotheses.


Besançon Galaxy Model
Example of underlying hypotheses

SFR
Star formation rate history: How many stars are formed at a given time

IMF
Initial Mass Function: How many stars of a given mass
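To illustrate how an IMF acts as a parameterized hypothesis, the sketch below uses a single power law dN/dm ∝ m^(-α) with the Salpeter exponent α = 2.35. This is an assumption chosen for illustration; the IMF actually used in BGM may differ, and the mass limits of 0.1 and 100 solar masses are also assumed.

```python
def imf_number_fraction(m_lo, m_hi, alpha=2.35, m_min=0.1, m_max=100.0):
    """Fraction of stars (by number) with mass in [m_lo, m_hi], in solar masses.

    Assumes a single power-law IMF, dN/dm proportional to m**(-alpha),
    normalized over [m_min, m_max]. Requires alpha != 1.
    """
    def integral(a, b):
        k = 1.0 - alpha
        return (b**k - a**k) / k

    return integral(m_lo, m_hi) / integral(m_min, m_max)


low_mass = imf_number_fraction(0.1, 1.0)
high_mass = imf_number_fraction(1.0, 100.0)
print(f"below 1 solar mass: {low_mass:.3f}, above: {high_mass:.3f}")
```

Changing alpha changes the predicted star counts per mass bin, which is how such a hypothesis can be passed to a population synthesis model as a free input parameter and then constrained by survey data.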

More on BGM hypotheses


Galaxy Model (Besançon): from data to Galactic formation and evolution

Sky survey data: color-magnitude diagram of observed stars in a given direction

Problem: how to extract information? Data on distances are not available.

Figure: color-magnitude diagram with regions labeled thin disc stars, thick disc stars, halo, and galaxies.

GALAXY MODEL: Ingredients
(basic complex multiparameter hypotheses)

Star formation rate history: how many stars are formed at a given time

Initial Mass Function (IMF): how many stars of a given mass

Stellar models (evolutionary tracks): how stars evolve with time

Stellar atmosphere models: how the physical state of the star atmosphere is observed (getting magnitudes and colors)

Stellar density distributions: how the star density changes across the sky

Chemical evolution: how the chemical abundances of the stars change with their birth time

Dynamical consistency: how to include dynamical constraints (based on gravity, conservation of energy, conservation of mass and of angular momentum)


Analysis of Connectome Based on Network Data

Definition
The connectome is a comprehensive map of neural connections in the human brain. The functional connectome denotes the collective set of functional connections in the human brain (its "wiring diagram").

• Uses fMRI, where associations are thought to represent functional connectivity, in the sense that the two regions of the brain participate together in the achievement of some higher-order function
• "Is there a difference between the networks of these two groups of subjects?"
• Projects:
    • 1000 Functional Connectomes Project (FCP)
    • Human Connectome Project (HCP): build a network map of the human brain in healthy, living adults.
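One way to approach the question of whether two groups of networks differ is a permutation test on a per-subject network summary. The sketch below is an illustration under stated assumptions: each subject's network is reduced to a single number (for example its edge density), the values are made up, and the test enumerates every relabeling of the subjects exactly. It is not the method of any of the projects named above.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = group_a + group_b
    n_a = len(group_a)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    hits = total = 0
    # Enumerate every way of relabeling n_a of the pooled subjects as group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Made-up edge densities for two groups of four subjects (assumption).
densities_a = [0.30, 0.32, 0.35, 0.31]
densities_b = [0.40, 0.42, 0.39, 0.45]
p = permutation_p_value(densities_a, densities_b)
print(round(p, 4))
```

The null hypothesis here is that group labels are exchangeable; a small p-value suggests the summary statistic differs between the groups. A real analysis would have to choose the network statistic with care and correct for multiple comparisons.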


Connectomic challenges

• On the microscale, the raw data for a connectome the size of the human brain would require a zettabyte (a trillion gigabytes) of memory, currently beyond the storage capacity of any computer

• Connectome data sets obtained at the macroscale with noninvasive imaging technology are manageable (petabyte scale).

• The network theory and respective data model should capture connectivity at any scale and thus naturally encompass multiscale architectures and nested connectivity patterns.

• Regarding the human mind, defining a classification scheme or ontology of mental processes has remained an unfulfilled challenge. Once a mapping between mental states and neural responses has been established, it should be possible to infer mental states from neural observations.


Multiscale Framework

• Virtual Brain and related modeling efforts explicitly build on a multiscale theoretical framework

• Multiscale approaches embrace models for dimension reduction where processes at smaller scales become part of compact descriptions of regularities at larger scales
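As a toy illustration of how a smaller-scale description can be folded into a compact larger-scale one, the sketch below averages a fine-scale connectivity matrix into a region-level matrix. It is not the Virtual Brain method; block averaging is swapped in here as the simplest form of coarse-graining, and the function name and the node-to-region assignment are assumptions made for the example.

```python
def coarse_grain(fine, groups):
    """Average a fine-scale connectivity matrix into a region-level one.

    fine    square matrix (list of lists) of node-to-node connection weights
    groups  groups[i] is the region index (0, 1, ...) of fine-scale node i
    Self-connections (i == j) are ignored.
    """
    n_regions = max(groups) + 1
    sums = [[0.0] * n_regions for _ in range(n_regions)]
    counts = [[0] * n_regions for _ in range(n_regions)]
    for i, gi in enumerate(groups):
        for j, gj in enumerate(groups):
            if i != j:
                sums[gi][gj] += fine[i][j]
                counts[gi][gj] += 1
    return [
        [sums[a][b] / counts[a][b] if counts[a][b] else 0.0
         for b in range(n_regions)]
        for a in range(n_regions)
    ]


# Four nodes grouped into two regions of two nodes each.
fine = [
    [0, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 0],
]
coarse = coarse_grain(fine, [0, 0, 1, 1])
```

The 4 x 4 description shrinks to 2 x 2, where each entry is the mean connection weight within or between regions.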


The Human Connectome Project: An NIH-funded effort to chart a comprehensive map of neuronal connections and its variability in healthy adults (on the macro-scale)

Macro-connectome (whole-brain, long-distance)

Micro-connectome (synapses, neurons)


Project Goals I

• Study a large population:
  • 1,200 healthy adults
  • 300 twin pairs and their non-twin siblings

• Cutting-edge neuroimaging methods:
  • 3T Skyra MRI, customized gradient (UMinn -> Wash U)
  • 7T MRI (UMinn, 200 subjects); perhaps also 10.5T
  • dMRI/tractography; R-fMRI; Task-fMRI
  • MEG/EEG (100 subjects)

• Extensive behavioral testing

• Blood samples for genotyping


Climate in Australia: Another view on hypothesis representation

• As data is continuously aggregated, hypotheses should be represented as programs that are executed repeatedly as new data arrives

• This method is tested by examining a hypothesis about temperature trends in Australia during the 20th century

Hypothesis

The temperature series is not stationary and is integrated of order 1 (I(1))

(See the appendix slide "Definitions for Climate Hypothesis".)
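The idea of a hypothesis as a program that is re-executed as data arrives can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, the `update` method and the toy check `level_not_stable` (with its tolerance of 0.5) are hypothetical, and the toy check is only a placeholder for a real statistical test.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, List


@dataclass
class HypothesisProgram:
    """A hypothesis packaged with the test that evaluates it."""
    name: str
    test: Callable[[List[float]], bool]  # True when the data support the hypothesis
    data: List[float] = field(default_factory=list)
    history: List[bool] = field(default_factory=list)

    def update(self, new_values):
        """Append newly arrived observations and re-run the test on all data."""
        self.data.extend(new_values)
        verdict = self.test(self.data)
        self.history.append(verdict)
        return verdict


def level_not_stable(series, tol=0.5):
    """Toy placeholder: does the mean of the second half drift from the first?"""
    half = len(series) // 2
    return abs(mean(series[half:]) - mean(series[:half])) > tol


h = HypothesisProgram("temperature level is not stable", level_not_stable)
h.update([20.0, 20.1, 19.9, 20.0])   # first batch of observations
h.update([21.0, 21.2, 21.4, 21.6])   # a later batch shifts the verdict
```

Keeping the verdict history makes it visible when accumulating data starts or stops supporting the hypothesis.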


Climate in Australia: Data and tools

Data sources:
1 National Oceanic and Atmospheric Administration (NOAA) marine and weather information
2 Australian Bureau of Meteorology dataset

Statistical tests:
• Phillips-Perron test
• Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test

Tools:
• R with the SPARQL and tseries packages
• agINFRA for scientific workflows in the natural sciences

Support for hypothesis

The authors obtained further evidence, on a different, independent dataset, that the time series is integrated of order 1.
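To make the I(1) check concrete, here is a rough sketch that swaps in a plain Dickey-Fuller regression for the Phillips-Perron and KPSS tests named above, since it needs only the standard library. The function names are hypothetical, the critical value is the approximate asymptotic 5% value for a regression with a constant, and no lag augmentation or serial-correlation correction is applied, so this is an illustration rather than a substitute for the R `tseries` tests.

```python
import math
import random

# Approximate asymptotic 5% Dickey-Fuller critical value (constant, no trend).
DF_CRIT_5PCT = -2.86


def df_tstat(series):
    """t statistic of rho in the regression  dy_t = a + rho * y_{t-1} + e_t."""
    x = series[:-1]
    d = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(d)
    mx, md = sum(x) / n, sum(d) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((v - mx) * (w - md) for v, w in zip(x, d))
    rho = sxy / sxx
    a = md - rho * mx
    ssr = sum((w - a - rho * v) ** 2 for v, w in zip(x, d))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return rho / se


def looks_integrated_of_order_one(series):
    """Unit root not rejected in levels, but rejected in first differences."""
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]
    unit_root_in_levels = df_tstat(series) > DF_CRIT_5PCT
    diffs_stationary = df_tstat(diffs) < DF_CRIT_5PCT
    return unit_root_in_levels and diffs_stationary


def simulate_random_walk(n, seed):
    """Synthetic I(1) series used only to exercise the check."""
    rng = random.Random(seed)
    y, out = 0.0, []
    for _ in range(n):
        y += rng.gauss(0.0, 1.0)
        out.append(y)
    return out
```

Because the unit-root test has a 5% false rejection rate by construction, an individual simulated random walk will occasionally fail the check; that is expected behavior, not a bug.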


Efficient Market Hypothesis: Two approaches

Definition: Weak form of Efficient Market Hypothesis

• The weak form of the efficient market hypothesis (EMH) states that prices of traded assets (e.g., stocks, bonds, or property) already reflect all past publicly available information.

• One possible formulation of the efficient market hypothesis used for weak-form tests is that share prices follow a random walk.

Two approaches to evaluating this hypothesis are suggested:

1 by analyzing stock market movements for several countries' indices in a selected period

2 by investigating how public sentiment (daily Twitter posts) predicts the stock market


Efficient Market Hypothesis: Classical approach

1 Collected daily closing prices from six European stock markets (France, Germany, UK, Greece, Portugal and Spain) for the period between 1993 and 2007

2 Stated the null hypothesis (successive price changes are independent, i.e. a random walk) and the alternative hypothesis (they are dependent)

3 Applied:
• serial correlation test
• runs test
• augmented Dickey-Fuller test
• multiple variance ratio test

EMH is supported

The null hypothesis should not be rejected for all six markets.
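Of the tests listed above, the runs test is the easiest to sketch. The version below is the standard Wald-Wolfowitz runs test on the signs of price changes, using the normal approximation; the function names are hypothetical, and zero changes are simply dropped, which is one common convention rather than something stated in the slides.

```python
import math


def count_runs(signs):
    """Number of maximal blocks of identical consecutive values."""
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def runs_test_z(changes):
    """z statistic of the runs test on the signs of successive price changes.

    Under independence, z is approximately standard normal, so |z| > 1.96
    rejects the random-walk null at the 5% level. Requires at least one
    positive and one negative change, otherwise the variance is zero.
    """
    signs = [c > 0 for c in changes if c != 0]
    n1 = sum(signs)            # number of positive changes
    n2 = len(signs) - n1       # number of negative changes
    n = n1 + n2
    runs = count_runs(signs)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (runs - expected) / math.sqrt(variance)


z_alternating = runs_test_z([1, -1] * 10)          # too many runs
z_trending = runs_test_z([1] * 10 + [-1] * 10)     # too few runs
```

Too few runs point to trending (positive dependence) and too many to mean reversion; both are departures from the independence assumed under the null.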


Efficient Market Hypothesis: Sentiment approach

1 Build public mood time series by sentiment analysis of tweets from February 28, 2008 to December 19, 2008

2 The null hypothesis states that the mood time series do not predict Dow Jones Industrial Average (DJIA) values

3 Granger causality analysis, in which Dow Jones values and mood time series are correlated, is used to test the null hypothesis

EMH is not supported

The results reject the null hypothesis and suggest that public opinion is predictive of changes in DJIA closing values.
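The Granger step can be sketched as a comparison of two regressions: one predicting the index from its own past, the other adding the lagged mood series. The code below is a minimal standard-library illustration on synthetic data, not the original study's pipeline; the function names, the single lag and the simulated series are all assumptions made for the example.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def rss(rows, y):
    """Residual sum of squares of the least-squares fit of y on the rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, y))


def granger_f(x, y, lag=1):
    """F statistic for H0: lagged x adds nothing to predicting y beyond lagged y."""
    times = range(lag, len(y))
    target = [y[t] for t in times]
    restricted = [[1.0] + [y[t - k] for k in range(1, lag + 1)] for t in times]
    full = [row + [x[t - k] for k in range(1, lag + 1)]
            for row, t in zip(restricted, times)]
    rss_r, rss_u = rss(restricted, target), rss(full, target)
    dof = len(target) - len(full[0])
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


# Synthetic data in which yesterday's "mood" drives today's "index" value.
rng = random.Random(1)
mood = [rng.gauss(0.0, 1.0) for _ in range(200)]
index = [0.0] + [0.8 * mood[t - 1] + 0.1 * rng.gauss(0.0, 1.0)
                 for t in range(1, 200)]
f_stat = granger_f(mood, index)
```

A large F statistic rejects the null that the mood series has no predictive value; as with any Granger analysis, this indicates predictive precedence, not causation in the everyday sense.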


Outline

1 Motivation

2 Hypotheses-Oriented Approach Revised
  • Hypothesis Generation
  • Hypothesis Evaluation
  • Algorithmic Generation and Evaluation of Hypotheses
  • Bayesian Motivation for Discovery

3 Facilities for Hypothesis-Driven Experiment Support
  • Conceptualization of Scientific Experiment
  • Scientific Hypothesis Formalization
  • Hypotheses as Data in Probabilistic Databases

4 Examples of Hypothesis-Driven Scientific Research
  • Besançon Galaxy Model
  • Analysis of Connectome Based on Network Data
  • Climate in Australia
  • Financial Markets

5 Summary


Summary

• The role of hypotheses in DIR is emphasized

• Basic concepts defining the role of hypotheses in the formation of scientific knowledge and the organization of scientific experiments are presented

• Basic approaches to hypothesis formulation applying logical reasoning, and various methods for hypothesis modeling and testing (including classical statistics, Bayesian hypothesis and parameter estimation methods, hypothetico-deductive approaches), are presented



• Facilities are aimed at the conceptualization of scientific experiments, hypothesis formulation and browsing in various domains (including biology, biomedical investigations, neuromedicine, astronomy), and the automatic organization of hypothesis-driven experiments

• Examples of scientific research applying hypotheses include modeling of population and structure synthesis of the Galaxy, connectome-related hypothesis testing, the study of temperature trends in Australia, and analysis of stock markets applying the EMH


Appendix


Examples of BGM Hypotheses


Definitions for Climate Hypothesis

Definition

Non-stationarity means that the level of the time series is not stable in time and can show increasing and decreasing trends

Definition

I(1) means that differencing the stochastic process once yields a stationary process (one whose main statistical properties remain unchanged over time)
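As a worked illustration of the two definitions (the random walk is chosen here as the simplest example and is not taken from the slides; the symbols $y_t$, $\varepsilon_t$ and $\sigma^2$ are introduced only for this sketch):

```latex
\begin{align*}
y_t &= y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \text{i.i.d.}(0, \sigma^2), \quad y_0 = 0,\\
\operatorname{Var}(y_t) &= t\,\sigma^2 \quad \text{(grows with } t\text{, so } y_t \text{ is not stationary)},\\
\Delta y_t &= y_t - y_{t-1} = \varepsilon_t \quad \text{(stationary, so } y_t \text{ is } I(1)\text{)}.
\end{align*}
```

The level of such a series wanders without settling, while its first differences have constant mean and variance.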
