Download - Yoram Rubin UC Berkeley - UFZ · 2021. 2. 13. · A General Approach for Assimilation of Multi‐ scale, Multi‐type Data Yoram Rubin UC Berkeley

A General Approach for Assimilation of Multi‐scale, Multi‐type Data

Yoram Rubin, UC Berkeley

Outline

• Perspective on data assimilation: Spatial variability in geologic media

• The Method of Anchored Distributions for data assimilation (principles, application, computational tools)

• Forward looking:

– Measurement Theory (what to measure, where to measure, how to maximize information yield)

– Open‐source community tool for data assimilation

The Media

• Geological media (soil, rock) are complex and spatially‐variable

• Larger observation scales reveal additional length scales of variability (multiple length‐scales)

• Characterization (mapping of soil properties) is subject to large uncertainty due to scarcity of data and spatial variability

Scale‐dependent variability

Small scale

Large Scale

Type-A and Type-B data

Hubbard and Rubin, 2005

[Figure: data acquisition approaches arranged by relative scale of investigation and relative resolution]

Relative scales of investigation: laboratory or point (~10^-3 to 10 m), local (~10^-1 to 10^2 m), regional (~10^1 to 10^5 m)

Relative resolution: high (~10^-4 to 1 m), moderate (~10^-1 to 10 m), low (~1 to 10^2 m)

Acquisition approaches shown: core/tank measurements, wellbore logging, crosshole measurements and well tests, surface geophysics, airborne/satellite

Acquisition approaches near the high‐resolution end of the chart provide high‐resolution information over small spatial extents; approaches near the low‐resolution end provide low‐resolution information over large spatial extents.

Example from ocean circulation

Another example is the mapping of ocean circulation, which relies on a variety of data types (e.g., temperature, density, velocity vector components) obtained from ship surveys, moored instruments, buoys drifting freely on or floating below the ocean surface, and satellites.

These data are measured over a wide range of scales and frequencies, and they need to be assimilated to yield accurate circulation models.

What are the challenges?

• Multi‐type, multi‐scale data assimilation:

– Multiple perspectives are expected to provide a more coherent 3D image of the subsurface…

– But data are collected at different scales, and many types of data are only weakly or indirectly related to hydrogeologic parameters. How, then, do we glean information for conditioning the target variable(s)?

What is MAD?

MAD (Method of Anchored Distributions) is a stochastic inverse modeling approach that:

– can be used to assimilate data from multiple sources

– is not constrained by modeling choices or specific data types

– reduces the computational effort using sparse parameterization

Rubin, Y., X. Chen, H. Murakami, and M. Hahn, 2010, Water Resources Research

MAD principles

• Data classification

– Type‐A data: Direct or local

– Type‐B data: Indirect and non‐local

• Localization: A strategy for unifying multiple data types using anchors.

• Projection: A geostatistical model is used for modeling global trends and for projecting data from measurements onto un‐sampled locations.
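As a rough illustration of the projection idea, the sketch below uses simple kriging to carry point measurements onto un‐sampled locations. It assumes a known mean and an exponential covariance; the function names (`exp_cov`, `simple_kriging`) and all parameter values are hypothetical, and MAD is not tied to this particular geostatistical model.

```python
import numpy as np

def exp_cov(h, variance=1.0, length=10.0):
    """Exponential covariance as a function of separation distance h (assumed model)."""
    return variance * np.exp(-np.abs(h) / length)

def simple_kriging(x_obs, y_obs, x_new, mean=0.0, variance=1.0, length=10.0):
    """Project measurements y_obs at x_obs onto locations x_new (1-D, known mean)."""
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Covariance among observations, and between observations and targets
    C = exp_cov(x_obs[:, None] - x_obs[None, :], variance, length)
    c0 = exp_cov(x_obs[:, None] - x_new[None, :], variance, length)
    weights = np.linalg.solve(C, c0)              # shape (n_obs, n_new)
    estimate = mean + weights.T @ (y_obs - mean)  # kriging estimate
    krig_var = variance - np.sum(weights * c0, axis=0)  # kriging variance
    return estimate, krig_var

# Hypothetical measurements along a transect (distances in meters)
x_obs = [0.0, 10.0, 25.0]
y_obs = [1.0, -0.5, 0.3]
est, var = simple_kriging(x_obs, y_obs, [0.0, 5.0, 1000.0])
print(est, var)
```

At a measured location the estimate reproduces the measurement with near‐zero variance; far from all measurements it reverts to the assumed mean with the full prior variance.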

Systematic classification of data

Spatial variable of interest: Y(x)

Entire field: Ỹ

Type‐A: data that give point values of Y, either directly or through models

Type‐B: data that are a function of the field Ỹ
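One way to write this classification, in the spirit of Rubin et al. (2010) cited in this talk. The symbols z_a, z_b, x_a, M, and the error terms ε are assumptions introduced here for illustration, not text recovered from the slide:

```latex
\begin{align}
  z_a &= Y(\mathbf{x}_a) + \varepsilon_a
      && \text{(Type-A: point values of } Y \text{ at locations } \mathbf{x}_a\text{)} \\
  z_b &= M(\tilde{Y}) + \varepsilon_b
      && \text{(Type-B: a function } M \text{ of the entire field } \tilde{Y}\text{)}
\end{align}
```

Here M stands for whatever forward model links the field to the observation (for example, a flow or transport simulation), which is what makes Type‐B data non‐local.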

Anchors are the carriers of information relevant to Y

• Anchors are intended to capture the information in the Type‐B data that is relevant to the target variables

• Anchors are statistical distributions of the target variables that could be assumed (priors) or inferred through Bayesian data assimilation.

• The distributions represent measurement errors and the quality of the relationship between the data and the target variables.

• Anchors are placed at strategic locations (in terms of information yield)

[Figure: anchors placed among Type‐A and Type‐B data locations]

Inverse modeling with MAD

• The model is defined through a joint statistical distribution of a vector that includes structural parameters and anchors

– Global trends are captured via the geostatistical model

– Local effects are captured by the anchors
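A sketch of how such a joint distribution can be factored with Bayes' rule. The notation is an assumption made here for illustration: θ for the structural parameters, ϑ for the anchors, z_a for Type‐A data, and z_b for Type‐B data.

```latex
\begin{equation}
  p(\theta, \vartheta \mid z_a, z_b)
  \;\propto\;
  \underbrace{p(\theta \mid z_a)\, p(\vartheta \mid \theta, z_a)}_{\text{prior, conditioned on Type-A data}}
  \;\underbrace{p(z_b \mid \theta, \vartheta, z_a)}_{\text{likelihood of Type-B data}}
\end{equation}
```

Read this way, the Type‐A data shape the prior of the structural parameters and anchors, while the Type‐B data enter only through the likelihood.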

The Architecture of MAD

[Diagram: block structure of MAD]

Block 1: Problem Setup, Derive Prior (Fixed)

Forward Model Driver (Portable)

Forward Model (External Software)

Block 2: Likelihood Analysis (Fixed)

Block 3: Derive Posterior, Diagnostics (Fixed)

Why is the block structure important?

• There is a need for a general, easy‐to‐apply computational tool for data assimilation that is modular, assumption‐free, and not linked to any particular modeling tool.

• Such a tool would put enormous expertise in the hands of environmental scientists, saving them from re‐creating data assimilation solutions with each project.
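The modularity argument can be made concrete with a toy sketch in which the forward model is just a callable handed to otherwise fixed blocks. Everything here is hypothetical: the function names, the five‐cell field, the prior ranges, and the kernel density estimate used for the likelihood are illustrative assumptions, not the actual MAD software.

```python
import numpy as np

def sample_prior(rng, n):
    """Block 1: draw (structural parameter mu, anchor value) pairs from an assumed prior."""
    mu = rng.uniform(-2.0, 2.0, n)
    anchor = rng.normal(mu, 0.5)
    return np.column_stack([mu, anchor])

def forward_driver(params, forward_model, rng, n_real=200, n_cells=5, anchor_cell=2):
    """Driver: generate field realizations given params and run the forward model on each."""
    mu, anchor = params
    fields = rng.normal(mu, 0.5, size=(n_real, n_cells))
    fields[:, anchor_cell] = anchor  # the anchor pins the field value at one cell
    return np.array([forward_model(f) for f in fields])

def kde_likelihood(simulated, observed):
    """Block 2: Gaussian kernel density estimate of p(z_b | params) at the observed value."""
    # Normal-reference rule-of-thumb bandwidth
    h = 1.06 * simulated.std(ddof=1) * len(simulated) ** (-0.2)
    u = (observed - simulated) / h
    return np.mean(np.exp(-0.5 * u**2)) / (h * np.sqrt(2.0 * np.pi))

def posterior_weights(prior_samples, forward_model, z_b_obs, rng):
    """Block 3: weight each prior sample by its estimated likelihood and normalize."""
    like = np.array([
        kde_likelihood(forward_driver(p, forward_model, rng), z_b_obs)
        for p in prior_samples
    ])
    return like / like.sum()

def block_average(field):
    """Stand-in for external forward-model software: a non-local (Type-B-like) response."""
    return field.mean()

rng = np.random.default_rng(0)
prior = sample_prior(rng, 300)
weights = posterior_weights(prior, block_average, z_b_obs=1.0, rng=rng)
print("posterior mean of (mu, anchor):", weights @ prior)
```

Swapping `block_average` for a call into a different simulator changes nothing in the prior, likelihood, or posterior blocks, which is the point of keeping them separate.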

Example

Data Classification

– Type A data includes observations that are a function of a point value

• Small‐scale pump‐tests (EBF)

• Core samples

– Type B data includes observations that are non‐local, including:

• Pumping tests

• Tracer tests

• Geophysical data

[Figure: ten panels of normalized hydraulic conductivity profiles from EBF tests. Surviving panel titles: Well 399-2-8, Well 399-2-16, Well 399-2-13. Axes: Normalized Ki (0.0 to 0.2) versus depth (30 to 60 ft bgs). Marked on the figure: locations where test conditions resulted in non‐representative EBF profiles.]

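[Figure: estimated mean field along a transect between 2-7 and 2-10, conditioned on point values only and on point values plus anchors. Legend: unknown true field (transect), estimated mean field, measured point values, anchors.]

Anchors capture local features and improve prediction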

Summary

• Anchored distributions (anchors) are statistical localization devices that represent the target variables at the smallest scale, conditioned on multi‐scale data.

• Anchors can be used for assimilating multiple types of data and migrating information across scales.

• Flexible data classification makes MAD applicable in multiple disciplines.

What’s on the horizon:

Measurement Theory

• Theory that addresses the following questions:

– What data to collect? Where? How many? At what frequency?

– How to answer these questions when addressing different goals?

– How to maximize the information yield from measurements?

What’s the problem?

• Current approaches for site characterization (and monitoring) are “need to know everything” approaches (suitable for research sites) whereas in many cases characterization should be “goal‐oriented” and should be viewed in the context of the application.

• In applications, we must deal with budgetary constraints, and so we need to make choices.

• A rational framework is needed for sound planning and prioritizing.

The First concept: Looking beyond hydrogeology

Tools are needed for comparing the contributions of hydrogeological and non‐hydrogeological data to the overall benefit of the project, recognizing that there may be contributors to uncertainty other than hydrogeology.

Such comparisons can be made using Comparative Information Yield Curves (de Barros et al., WRR, 2009, 2010).

The second concept: Hypothesis‐driven Approach for Site Characterization

• Characterization is performed in support of a hypothesis that could either be accepted or rejected, for example:

– Hypothesis: a water supply well is not in real danger of being contaminated

– Hypothesis: a contaminated site will enhance cancer risk in humans due to exposure of some sort.

• The challenge is to design a data acquisition strategy that would lead to accepting or rejecting the hypothesis with an a‐priori defined confidence level: characterization is viewed as the means for achieving a goal (that is, confirming or rejecting a hypothesis), not as a goal unto itself…

• and the goal is to minimize the risk of making the wrong decisions (Nowak et al., WRR, 2011)

Venn diagram of decisions/events

[Figure: Venn diagram of decision outcomes: accept H1 when true, accept H0 when true, accept H1 when not true, and two regions labeled as error types.]

H0 – null hypothesis

H1 – alternative hypothesis
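To see how a sampling design trades off against the two kinds of wrong decision, the sketch below estimates both error rates by Monte Carlo for a deliberately simple rule: accept H1 when the mean of n noisy measurements exceeds a critical value. The setup (Gaussian noise, the two hypothesized means, the critical value) and the function name `decision_error_rates` are hypothetical and are not taken from Nowak et al. In standard terminology, accepting H1 when H0 is true is a Type I error and accepting H0 when H1 is true is a Type II error.

```python
import numpy as np

def decision_error_rates(n_meas, critical, mean_h0, mean_h1, noise_sd,
                         n_trials=20000, seed=0):
    """Monte Carlo estimate of the two wrong-decision probabilities."""
    rng = np.random.default_rng(seed)
    # Sample means of n_meas noisy measurements under each hypothesis
    xbar_h0 = rng.normal(mean_h0, noise_sd, (n_trials, n_meas)).mean(axis=1)
    xbar_h1 = rng.normal(mean_h1, noise_sd, (n_trials, n_meas)).mean(axis=1)
    accept_h1_wrongly = np.mean(xbar_h0 > critical)   # H0 true, H1 accepted
    accept_h0_wrongly = np.mean(xbar_h1 <= critical)  # H1 true, H0 accepted
    return accept_h1_wrongly, accept_h0_wrongly

# Hypothetical design question: how many measurements are enough?
for n in (2, 8, 32):
    err1, err2 = decision_error_rates(n, critical=0.5, mean_h0=0.0,
                                      mean_h1=1.0, noise_sd=1.0)
    print(f"n={n:2d}  P(accept H1 | H0 true)={err1:.3f}  "
          f"P(accept H0 | H1 true)={err2:.3f}")
```

Both error rates shrink as n grows, so a target confidence level translates into a minimum number of measurements under these assumptions.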

How to increase information yield?

• Where to place anchors (or other localization devices)? Where they would be most efficient in gleaning information from observations.

• Stochastic sensitivity analysis coupled with Monte‐Carlo simulations identifies promising locations for placing anchors (Yang et al., WRR, in press).

Strategic placement of localization devices (such as pilot‐points and anchors) in inverse modeling schemes (Yang et al., Water Resources Research, 2012, in press).

Bayesian MAD

Data Assimilation: CT Scan for the Earth

Data assimilation for subsurface investigations: multiple data sources (of different quality and resolution), including direct measurements and tomographic data, are used simultaneously, leading to a more coherent 3D image of soil properties.

Hydraulic Conductivity log
