Yoram Rubin UC Berkeley - UFZ · 2021. 2. 13. · A General Approach for Assimilation of Multi‐scale, Multi‐type Data



A General Approach for Assimilation of Multi‐scale, Multi‐type Data

Yoram Rubin, UC Berkeley


Outline

• Perspective on data assimilation: spatial variability in geologic media

• The Method of Anchored Distributions (MAD) for data assimilation: principles, application, computational tools

• Forward looking:
  – Measurement theory (what to measure, where to measure, how to maximize information yield)
  – An open‐source community tool for data assimilation


The Media

• Geological media (soil, rock) are complex and spatially‐variable

• Larger observation scales reveal additional length scales of variability (multiple length‐scales)

• Characterization (mapping of soil properties) is subject to large uncertainty due to scarcity of data and spatial variability


Scale‐dependent variability

[Figure: spatial variability at a small scale and at a large scale]


Type-A and Type-B data

Hubbard and Rubin, 2005

[Figure: relative scales of investigation and relative resolution of data‐acquisition approaches]

• Laboratory or point: scale of investigation ~10⁻³ to 10 m; high resolution (~10⁻⁴ to 1 m)
• Local: scale of investigation ~10⁻¹ to 10² m; moderate resolution (~10⁻¹ to 10 m)
• Regional: scale of investigation ~10¹ to 10⁵ m; low resolution (~1 to 10² m)

Acquisition approaches, ordered from high‐resolution information over small spatial extents to low‐resolution information over large spatial extents: core/tank measurements, wellbore logging, crosshole measurements and well tests, surface geophysics, airborne/satellite.


Example from ocean circulation

Another example is the mapping of ocean circulation, which relies on a variety of data types (e.g., temperature, density, velocity vector components) obtained from ship surveys, moored instruments, buoys drifting freely on or floating below the ocean surface, and satellites.

These data are measured over a wide range of scales and frequencies, and they need to be assimilated to yield accurate circulation models.


What are the challenges?

• Multi‐type, multi‐scale data assimilation:
  – Multiple perspectives are expected to provide a more coherent 3D image of the subsurface…
  – But data are collected at different scales, and many types of data are only weakly or indirectly related to hydrogeologic parameters. How, then, can we glean information from them for conditioning the target variable(s)?


What is MAD?

MAD (Method of Anchored Distributions) is a stochastic inverse modeling approach that:
– can be used to assimilate data from multiple sources
– is not constrained by modeling choices or specific data types
– reduces the computational effort through sparse parameterization

Rubin, Y., X. Chen, H. Murakami, and M. Hahn, 2010, Water Resources Research


MAD principles

• Data classification:
  – Type‐A data: direct or local
  – Type‐B data: indirect and non‐local

• Localization: a strategy for unifying multiple data types using anchors.

• Projection: a geostatistical model is used for modeling global trends and for projecting information from measurement locations onto unsampled locations.
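To make the projection step concrete, here is a minimal sketch that uses simple kriging of a one‐dimensional stationary Gaussian field to carry anchor values to an unsampled location. The exponential covariance, its sill and length parameters, and the function names (`exp_cov`, `solve`, `project`) are illustrative assumptions, not the MAD software's API. Because MAD treats anchor values as random, a step like this would be repeated for many sampled anchor sets rather than run once.

```python
import math


def exp_cov(h, sill=1.0, length=10.0):
    """Assumed exponential covariance: C(h) = sill * exp(-|h| / length)."""
    return sill * math.exp(-abs(h) / length)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def project(x0, anchor_x, anchor_y, mean=0.0, sill=1.0, length=10.0):
    """Simple-kriging projection of anchor values onto location x0.

    Returns the conditional mean and variance of Y(x0) given Y = anchor_y
    at anchor_x, for a stationary field with known mean and covariance.
    """
    K = [[exp_cov(xi - xj, sill, length) for xj in anchor_x] for xi in anchor_x]
    k0 = [exp_cov(x0 - xi, sill, length) for xi in anchor_x]
    w = solve(K, k0)  # kriging weights; K is symmetric, so w = K^-1 k0
    est = mean + sum(wi * (yi - mean) for wi, yi in zip(w, anchor_y))
    var = sill - sum(wi * ki for wi, ki in zip(w, k0))
    return est, var


est, var = project(5.0, anchor_x=[0.0, 20.0], anchor_y=[1.0, -0.5])
print(f"Y(5): conditional mean {est:.3f}, variance {var:.3f}")
```

At an anchor location the projection returns the anchor value with zero variance; far from all anchors it relaxes to the global mean and the full sill, which is the "global trend vs. local effect" split described above.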


Systematic classification of data

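Spatial variable of interest: $Y(\mathbf{x})$; entire field: $\tilde{Y}$

• Type‐A data give point values of $Y$, either directly or through models: $z_a = f\big(Y(\mathbf{x}_a)\big) + \varepsilon_a$, where $f$ is the identity for direct measurements

• Type‐B data are a function of the entire field: $z_b = M(\tilde{Y}) + \varepsilon_b$, where $M$ is a forward model

Here $\varepsilon_a$ and $\varepsilon_b$ denote measurement errors.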
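The distinction can be illustrated with a toy one‐dimensional field. In this sketch, which is an assumption for illustration only, a Type‐A datum reads the log‐conductivity of a single cell, while a Type‐B datum is produced by a hypothetical forward model: the effective conductivity of cells in series (the harmonic mean of K = exp(Y)), so it changes whenever any cell changes. The function names `type_a` and `type_b` are hypothetical.

```python
import math
import random


def type_a(y_field, index, noise_sd=0.0, rng=random):
    """Type-A datum: the point value of Y at one cell, plus optional measurement error."""
    return y_field[index] + rng.gauss(0.0, noise_sd)


def type_b(y_field, noise_sd=0.0, rng=random):
    """Type-B datum: a function of the whole field.

    Hypothetical forward model M: effective conductivity of a 1-D column of
    equal-length cells in series, i.e. the harmonic mean of K = exp(Y).
    """
    k = [math.exp(y) for y in y_field]
    k_eff = len(k) / sum(1.0 / ki for ki in k)
    return k_eff + rng.gauss(0.0, noise_sd)


y = [0.0, math.log(2.0), math.log(4.0)]  # log-conductivity in three cells
print(type_a(y, 1))  # depends only on cell 1
print(type_b(y))     # depends on every cell
```

Changing cell 2 leaves the Type‐A reading at cell 1 untouched but shifts the Type‐B value, which is why Type‐B data cannot be used as point conditioning data directly.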


Anchors are the carriers of information relevant to Y

• Anchors are intended to capture the information in the Type‐B data that is relevant to the target variables

• Anchors are statistical distributions of the target variables that can be assumed (priors) or inferred through Bayesian data assimilation (posteriors).

• The distributions represent measurement errors and the quality of the relationship between the data and the target variables. 

• Anchors are placed at strategic locations (in terms of information yield)


[Figure: schematic of anchors relative to Type‐A and Type‐B data]


Inverse modeling with MAD

• The model is defined through the joint statistical distribution of a parameter vector that includes the structural parameters and the anchors:
  – Global trends are captured by the geostatistical model
  – Local effects are captured by the anchors
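Written as a formula, with notation assumed here rather than taken from the slides ($\boldsymbol{\theta}$ for the structural parameters, $\boldsymbol{\vartheta}$ for the anchor values, $\mathbf{z}_a$ and $\mathbf{z}_b$ for Type‐A and Type‐B data), Bayes' rule gives the joint posterior of the model vector. This is our reading of the formulation in Rubin et al. (2010), not a quotation of it.

```latex
p(\boldsymbol{\theta}, \boldsymbol{\vartheta} \mid \mathbf{z}_a, \mathbf{z}_b)
  \;\propto\;
  p(\mathbf{z}_b \mid \boldsymbol{\theta}, \boldsymbol{\vartheta}, \mathbf{z}_a)\,
  p(\boldsymbol{\theta}, \boldsymbol{\vartheta} \mid \mathbf{z}_a)
```

Type‐A data condition the prior directly, while Type‐B data enter only through the likelihood, which is where the forward model is needed.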


The Architecture of MAD

• Block 1: Problem setup; derive prior (fixed)
• Block 2: Likelihood analysis (fixed)
• Block 3: Derive posterior; diagnostics (fixed)
• Forward model driver (portable)
• Forward model (external software)
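A minimal end‐to‐end sketch of this block structure follows, assuming a single scalar anchor with a uniform prior and a toy averaging function standing in for the external forward‐model software. Estimating the likelihood with a kernel density estimate of simulated forward‐model outputs is in the spirit of MAD's nonparametric likelihood analysis as we understand it, but every function name here (`sample_prior`, `forward_model_driver`, `kde_likelihood`, `posterior_weights`, `run_sketch`) is hypothetical. The point of the design is that only `toy_forward_model` would change when a different simulator is plugged in.

```python
import math
import random


# Block 1 (problem setup / prior): hypothetical scalar anchor with a uniform prior.
def sample_prior(n, rng):
    return [rng.uniform(-2.0, 2.0) for _ in range(n)]


# Stand-in for the external forward-model software: returns the field average.
def toy_forward_model(field):
    return sum(field) / len(field)


# Forward model driver: builds realizations consistent with an anchor value
# and passes each one to whatever forward model is plugged in.
def forward_model_driver(anchor, n_real, rng, forward_model, n_cells=20):
    outputs = []
    for _ in range(n_real):
        field = [anchor + rng.gauss(0.0, 0.5) for _ in range(n_cells)]
        outputs.append(forward_model(field))
    return outputs


# Block 2 (likelihood analysis): Gaussian kernel density estimate of the
# simulated Type-B outputs, evaluated at the observed value.
def kde_likelihood(samples, z_obs):
    n = len(samples)
    mu = sum(samples) / n
    sd = math.sqrt(sum((s - mu) ** 2 for s in samples) / (n - 1))
    h = 1.06 * sd * n ** (-0.2)  # Silverman's rule-of-thumb bandwidth
    norm = n * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((z_obs - s) / h) ** 2) for s in samples) / norm


# Block 3 (posterior): normalize prior-sample likelihoods into posterior weights.
def posterior_weights(likelihoods):
    total = sum(likelihoods)
    return [lk / total for lk in likelihoods]


def run_sketch(z_obs, n_prior=200, n_real=50, seed=0):
    rng = random.Random(seed)
    anchors = sample_prior(n_prior, rng)
    likelihoods = [
        kde_likelihood(forward_model_driver(a, n_real, rng, toy_forward_model), z_obs)
        for a in anchors
    ]
    weights = posterior_weights(likelihoods)
    post_mean = sum(w * a for w, a in zip(weights, anchors))
    return anchors, weights, post_mean


anchors, weights, post_mean = run_sketch(z_obs=1.0)
print(f"posterior mean of the anchor: {post_mean:.2f}")
```

Because the prior samples are weighted by their likelihoods, the posterior here is an importance‐sampling approximation; a real application would replace the toy driver and model with calls to the external simulator.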


Why is the block structure important?

• There is a need for a general, easy‐to‐apply data‐assimilation computational tool that is modular, assumption‐free, and not linked to any particular modeling tool.

• Such a tool would put enormous expertise in the hands of environmental scientists, sparing them the need to re‐create data‐assimilation solutions with each project.


Example


Data classification
– Type‐A data include observations that are a function of a point value:
  • Small‐scale pumping tests (EBF, electromagnetic borehole flowmeter)
  • Core samples
– Type‐B data include non‐local observations:
  • Pumping tests
  • Tracer tests
  • Geophysical data


[Figure: normalized hydraulic conductivity profiles (normalized Kᵢ, 0.0 to 0.2) versus depth (30 to 60 ft bgs) at wells 399‐2‐8, 399‐2‐16, 399‐2‐13, 399‐2‐14, 399‐2‐19, 399‐2‐18, 399‐2‐17, 399‐2‐7, 399‐2‐12, and 399‐2‐15. Marked intervals: locations where test conditions resulted in non‐representative EBF profiles.]


[Figure: two panels, "Point values" and "Point values + anchors", each comparing the unknown true field along a transect with the estimated mean field; legend: measured point values, anchors]

Anchors capture local features and improve prediction


[Figure: panels for wells 2‐7 and 2‐10]


Summary

• Anchored distributions (anchors) are statistical localization devices that represent the target variables at the smallest scale, conditioned on multi‐scale data.  

• Anchors can be used for assimilating multiple types of data and migrating information across scales.

• Flexible data classification makes MAD applicable in multiple disciplines.


What’s on the horizon: 


Measurement Theory

• A theory that addresses the following questions:
  – What data to collect? Where? How many? At what frequency?

– How to answer these questions when addressing different goals?

– How to maximize the information yield from measurements?


What’s the problem?

• Current approaches for site characterization (and monitoring) are “need to know everything” approaches (suitable for research sites) whereas in many cases characterization should be “goal‐oriented” and should be viewed in the context of the application. 

• In applications, we must deal with budgetary constraints, and so we need to make choices.

• A rational framework is needed for sound planning and prioritizing. 


The first concept: Looking beyond hydrogeology

Tools are needed for comparing the contributions of hydrogeological and non‐hydrogeological data to the overall benefit of the project, recognizing that there may be contributors to uncertainty other than hydrogeology.

Such comparison can be addressed using Comparative Information Yield Curves (de Barros et al., WRR, 2009, 2010)


The second concept: Hypothesis‐driven Approach for Site Characterization

• Characterization is performed in support of a hypothesis that can be either accepted or rejected, for example:
  – Hypothesis: a water supply well is not in real danger of being contaminated.
  – Hypothesis: a contaminated site will increase cancer risk in humans due to some form of exposure.

• The challenge is to design a data acquisition strategy that would lead to accepting or rejecting the hypothesis with an a priori defined confidence level: characterization is viewed as the means for achieving a goal (that is, confirming or rejecting a hypothesis), not as a goal unto itself…

• The goal is to minimize the risk of making the wrong decision (Nowak et al., WRR, 2011).


Venn diagram of decisions/events

• Accept H1 when true; accept H0 when true (correct decisions)
• Type I (α) error: accept H1 when not true
• Type II (β) error: accept H0 when not true

H0 – null hypothesis; H1 – alternative hypothesis
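The two error types can be estimated by Monte Carlo simulation once a decision rule is fixed. The sketch below is a generic illustration, not the method of Nowak et al. (2011): it assumes a standard normal "true" quantity, a Gaussian estimation error whose spread stands in for how much characterization data is available, and a hypothetical threshold rule (accept H1 when the estimate exceeds the threshold). It returns α = P(accept H1 | H0 true) and β = P(accept H0 | H1 true).

```python
import random


def error_rates(n, threshold, est_sd, seed=0):
    """Monte Carlo estimate of (alpha, beta) for a threshold decision rule.

    H0: true value <= threshold; H1: true value > threshold.
    Decision: accept H1 when the estimate exceeds the threshold.
    """
    rng = random.Random(seed)
    n_h0 = n_h1 = type_1 = type_2 = 0
    for _ in range(n):
        truth = rng.gauss(0.0, 1.0)                # hypothetical true quantity
        estimate = truth + rng.gauss(0.0, est_sd)  # estimate from characterization data
        h1_true = truth > threshold
        accept_h1 = estimate > threshold
        if h1_true:
            n_h1 += 1
            type_2 += int(not accept_h1)  # accept H0 when it is not true
        else:
            n_h0 += 1
            type_1 += int(accept_h1)      # accept H1 when it is not true
    return type_1 / n_h0, type_2 / n_h1


for sd in (0.5, 0.1):
    alpha, beta = error_rates(20000, threshold=1.0, est_sd=sd)
    print(f"estimation sd {sd}: alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Reducing the estimation error shrinks both α and β; a data acquisition plan can then be judged by whether it brings these rates below the a priori confidence level.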



How to increase information yield? 

• Where should anchors (or other localization devices) be placed? Where they would be most efficient at gleaning information from observations.

• Stochastic sensitivity analysis coupled with Monte‐Carlo simulations identifies promising locations for placing anchors (Yang et al., WRR, in press). 
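A simplified proxy for this idea is sketched below; it is not the procedure of Yang et al. It assumes an uncorrelated standard normal field for brevity, draws Monte Carlo realizations, evaluates a hypothetical Type‐B observation that only "sees" cells 10 to 19 (as a test with a limited zone of influence might), and ranks candidate anchor cells by the absolute correlation between the field value at the cell and the simulated observation. The names `pearson`, `rank_anchor_locations`, and `toy_observation` are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def rank_anchor_locations(forward_model, n_cells, n_real=500, seed=0):
    """Rank candidate anchor cells by |correlation| between Y at the cell and
    the simulated Type-B observation, across Monte Carlo realizations."""
    rng = random.Random(seed)
    fields = [[rng.gauss(0.0, 1.0) for _ in range(n_cells)] for _ in range(n_real)]
    observations = [forward_model(f) for f in fields]
    scores = []
    for i in range(n_cells):
        values_at_i = [f[i] for f in fields]
        scores.append((abs(pearson(values_at_i, observations)), i))
    scores.sort(reverse=True)
    return [i for _, i in scores]


def toy_observation(field):
    """Hypothetical Type-B datum that depends only on cells 10-19."""
    return sum(field[10:20]) / 10


ranking = rank_anchor_locations(toy_observation, n_cells=30)
print("most informative candidate cells:", ranking[:10])
```

Cells outside the observation's zone of influence score near zero, so anchors placed there would carry little information from this datum.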


Strategic placement of localization devices (such as pilot‐points and anchors) in inverse modeling schemes (Yang et al., Water Resources Research, 2012, in press).    


Bayesian MAD



Data Assimilation: CT Scan for the Earth

Data assimilation for subsurface investigations: multiple data sources (of different quality and resolution), including direct measurements and tomographic data, are used simultaneously, leading to a more coherent 3D image of soil properties.

[Figure: hydraulic conductivity log]