Hydrological Modeling in the Era of Big
Data: Opportunities and Challenges
Big Data Challenges for Predictive Modeling of Complex Systems, Hong Kong, Nov. 2018
Yi Zheng * and Shijie Jiang
School of Environmental Science and Engineering, Southern University of Science and Technology
Global water issues
(Vörösmarty et al., Nature, 2010, 467: 555-561) (Oki and Kanae, Science, 2006, 313: 1068-1072)
Water shortage Water pollution
Flood risk
(Alfieri et al., Earth’s Future, 2017, 5: 171-182)
Water-related ecological disasters
Disappearing inland lakes (Aral Sea)
Vegetation degradation
Algae bloom
Global water issues
Dongjiang-Shenzhen Water Supply Project (东深供水工程); Eastern Water Source Supply Project (东部供水水源工程)
Old diversion project
New diversion project
Maozhou River (茅洲河)
Subway station flooded
Red tide
In Shenzhen, we have all…
Hydrological cycle
Source: Meteorological Office, United Kingdom (https://www.metoffice.gov.uk/)
Water, energy and materials
Physically based modeling
Freeze & Harlan (1969), in a classical paper, provide a blueprint for physically based hydrological modeling:
• Continuity between groundwater flow and unsaturated flow
• Coupling with surface water
• Role of vegetation
• Influence of meteorological phenomena
Water resources
Ecosystem
Physically based modeling
(By courtesy of Stefan Kollet)
Typical hydro. models (water balance only)
Climate model (energy balance)
New hydro. models (water & energy balances)
Physically based modeling
(By courtesy of Stefan Kollet)
Typical hydro. models
New hydro. models
Water and energy equations (an example)
From the subsurface into the atmosphere!
Model applications
Hydropower production
Soil and water conservation
Agricultural development
Watershed restoration
Flood control
Drinking water supply
Water quality protection
Receive wide applications!
(Pics are from the internet)
Big Earth Data
Space level: far space satellites; geosynchronous meteorological satellites; near space satellites; polar orbiting meteorological satellites
Airborne sensors: aircraft weather radar; weather balloon; airborne lidar
Radar-based: X-band; S-band
Ground level instruments: weather station; solar radiation; soil moisture; groundwater
Ground penetrating radar
Drone
Water level radar
Big Earth Data
Multisource (variety): plain text, point data, polyline data, polygon data, raster data, …
Large volume
High-frequency measurements of certain variables (velocity)
Water level
Meteorology
Water quality
GPS data
Arguably big data!
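NASA’s Water Cycle Missions
Hydrological variable | Mission/instrument | Spatial resolution (km) | Temporal resolution (days) | Launch year
Rainfall | GPM | 5 | 0.125 | 2014
Snowfall | GPM | 5 | 0.125 | 2014
Evaporation | Terra/MODIS | 0.5 | 1 | 1999
Evaporation | Aqua/MODIS | 0.5 | 1 | 2002
Evaporation | Suomi/VIIRS | 0.5 | 1 | 2013
Runoff | SWOT | 0.1 | 11 | 2021
Snow cover | Terra/MODIS | 0.5 | 1 | 1999
Snow cover | Aqua/MODIS | 0.5 | 1 | 2002
Snow cover | Suomi/VIIRS | 0.5 | 1 | 2013
Surface soil moisture | SMOS | 36 | 3 | 2009
Surface soil moisture | SMAP (radiometer) | 36 | 3 | 2015
Surface soil moisture | ASCAT | 25 | 1 | 2006
Surface soil moisture | GCOM-W/AMSR2 | 50 | 1 | 2012
Deep soil moisture | Biomass | 0.2 | 18 days/yr | 2021
Surface water elevation | Jason-3 | 10 | 10 | 2016
Surface water elevation | SARAL | 10 | 35 | 2013
Surface water elevation | SWOT | 0.1 | 11 | 2021
Surface water elevation | ICESat-2 | 1.5 | 90 | 2018
Terrestrial water storage change | GRACE | 220 | 30 | 2002
Vegetation/land cover/irrigated area | Terra/MODIS | 0.5 | 1 | 1999
Vegetation/land cover/irrigated area | Aqua/MODIS | 0.5 | 1 | 2002
Vegetation/land cover/irrigated area | Suomi/VIIRS | 0.5 | 1 | 2013
Vegetation/land cover/irrigated area | Landsat 8 | 0.03 | 16 | 2013
Vegetation/land cover/irrigated area | Landsat 9 | 0.03 | 16 | 2023
Vegetation stress | ISS/ECOSTRESS | 0.07 | 4 | 2018
Water vapour | Aqua/AIRS | 13.5 | 1 | 2002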
Source: The future of Earth observation in hydrology, Hydrol. Earth Syst. Sci., 21, 3879-3914
Big data for hydrology
Data-driven modeling
Data-driven hydrological modeling using machine learning techniques has been practiced for nearly two decades.
Techniques:
• Artificial Neural Network (ANN)
• Support Vector Machine (SVM)
• Fuzzy Inference System
• Genetic Programming
• Regression Tree
• Gaussian Process Regression
• …
Applications:
• Rainfall-runoff modelling
• Sediment yield forecasting
• Evaporation forecasting
• Lake and reservoir water level prediction
• Drought forecasting
• …
Rainfall → Catchment → Runoff
Input → Black box model → Output
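The input → black box → output idea can be illustrated with a very small sketch. It is not any of the techniques listed on the slide: it assumes a plain linear regression on lagged rainfall, fitted by least squares, and it uses synthetic rainfall and runoff series generated inside the example. The names `lagged_matrix`, `fit_black_box`, `predict` and `true_response` are hypothetical.

```python
import numpy as np

def lagged_matrix(rain, n_lags):
    """Rows are [P(t), P(t-1), ..., P(t-n_lags+1)] for every t with a full history."""
    rain = np.asarray(rain, dtype=float)
    cols = [rain[n_lags - 1 - k: len(rain) - k] for k in range(n_lags)]
    return np.column_stack(cols)

def fit_black_box(rain, runoff, n_lags=3):
    """Least-squares fit of runoff(t) on the last n_lags rainfall values."""
    X = lagged_matrix(rain, n_lags)
    y = np.asarray(runoff, dtype=float)[n_lags - 1:]
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

def predict(rain, weights):
    """Apply the fitted weights to a rainfall series."""
    return lagged_matrix(rain, len(weights)) @ weights

# Synthetic data (assumption): runoff is rainfall routed through a short linear response.
rng = np.random.default_rng(0)
rain = rng.gamma(shape=0.5, scale=4.0, size=200)
true_response = np.array([0.5, 0.3, 0.1])
runoff = np.convolve(rain, true_response)[:len(rain)]

weights = fit_black_box(rain, runoff, n_lags=3)
print("fitted weights:", np.round(weights, 3))
```

The model knows nothing about the catchment; it only maps inputs to outputs, which is why results from such models tend to be case-dependent.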
Ad hoc applications
Application
Model
Data
Some challenges
‘Big’ in the air, but ‘small’ at the ground. Direct measurements are still limited.
Physically based models involve significant uncertainty, due to data gaps and model structure.
Data-driven models have higher accuracy in many cases, but low transferability.
Physically based models can offer deep insights, but are computationally expensive.
Data-driven models are cheap to run, but can only produce case-dependent results.
Model
Data
Research opportunities
Key questions
What kind of roles could opportunistic sensing and crowdsourcing play in hydrological modeling?
What kind of roles could big earth data and deep learning play in hydrological modeling?
How to merge the strengths of both physically based and data-driven models?
Application
A high-resolution, low-cost, ground-level precipitation monitoring network has yet to be developed.
Method | Rain gauge | Disdrometer | Radar | Satellite
Principle | straightforward | microphysics | reflectivity | radiation
Coverage | point | point | area | area
Measured height | 1 m | 1 m | 5 - 20 km | 500 km
Case 1: Camera-based Rain Gauge
Can existing closed-circuit television (CCTV) surveillance cameras be used for opportunistic sensing?
A new approach to measuring rainfall intensity in real-world conditions based on videos acquired by ordinary surveillance cameras.
Framework
Part 1: Rain streaks identification. Separate rain streaks from the rain-free background.
• Input: captured videos
• Exploit the visual and temporal properties: spatial sparsity, vertical smoothness, horizontal discontinuity, temporal independence
• Output: identified rain streaks (background removed)
Part 2: Rainfall intensity estimation. Estimate rainfall intensity from the identified rain streaks.
• Depth of field (DoF) derived from the camera settings
• Focused? Yes: keep the rain streaks in the DoF (without blur effect). No: remove.
• Calculate the image depth and actual size of each streak
• Estimate the raindrop size distribution
• Output: instantaneous rainfall intensity
(Jiang et al., Advancing opportunistic hydrology sensing: a novel approach to measuring rainfall with ordinary surveillance cameras, 2018, submitted to Water Resources Research)
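The slides do not give the formulas behind the depth-of-field check in the framework. As a rough sketch only, the standard thin-lens depth-of-field limits could be used to decide whether a streak is in focus. The function names, the circle-of-confusion value `coc_mm` and the example camera settings are assumptions made for illustration, not the authors' settings.

```python
import math

def depth_of_field(focal_length_mm, f_number, focus_distance_mm, coc_mm=0.03):
    """Return (near_limit_mm, far_limit_mm) from the thin-lens DoF formulas.

    coc_mm is the acceptable circle of confusion (assumed value).
    """
    f2 = focal_length_mm ** 2
    blur = f_number * coc_mm * (focus_distance_mm - focal_length_mm)
    near = focus_distance_mm * f2 / (f2 + blur)
    # Beyond the hyperfocal distance the far limit is unbounded.
    far = focus_distance_mm * f2 / (f2 - blur) if f2 > blur else math.inf
    return near, far

def in_focus(streak_depth_mm, near, far):
    """True if a streak at this depth lies inside the depth of field."""
    return near <= streak_depth_mm <= far

# Hypothetical camera: 50 mm lens at f/8, focused at 5 m.
near, far = depth_of_field(50.0, 8.0, 5000.0)
depths = [2000.0, 4000.0, 6000.0, 12000.0]
kept = [d for d in depths if in_focus(d, near, far)]
print(f"DoF: {near:.0f} mm to {far:.0f} mm, kept streak depths: {kept}")
```

Streaks outside the two limits would be blurred in the image, which corresponds to the "Remove" branch of the flowchart.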
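Algorithm 1: Rain-streak identification
Not only based on the difference between consecutive frames, but also the visual properties of each frame.
Decomposition: $\mathcal{O} = \mathcal{B} + \mathcal{R}$
• $\mathcal{O}$: rain-containing image
• $\mathcal{B}$: rain-free background
• $\mathcal{R}$: rain-streak layer
[Figure: for each image layer, histogram of grayscale, grayscale in X and Y directions, and histogram of difference in grayscale]
A convex optimization problem:
$$\min_{\mathcal{U},\mathcal{V},\mathcal{W},\mathcal{X},\mathcal{R}} \; \lambda_1\|\mathcal{U}\|_1 + \lambda_2\|\mathcal{V}\|_1 + \lambda_3\|\mathcal{W}\|_1 + \lambda_4\|\mathcal{X}\|_1$$
$$\text{s.t.}\quad \mathcal{U} = \mathcal{R},\quad \mathcal{V} = \nabla_y \mathcal{R},\quad \mathcal{W} = \nabla_x(\mathcal{O} - \mathcal{R}),\quad \mathcal{X} = \nabla_t(\mathcal{O} - \mathcal{R})$$
• $\lambda_1$ term: sparsity of $\mathcal{R}$
• $\lambda_2$ term: vertical total variation of $\mathcal{R}$
• $\lambda_3$ term: horizontal total variation of $\mathcal{B}$
• $\lambda_4$ term: temporal stability of $\mathcal{B}$
Alternating direction method of multipliers (ADMM) used to solve the problem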
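To make the four penalty terms concrete, the sketch below evaluates the objective for a candidate rain layer and shows the soft-thresholding operator, which is the proximal operator of the L1 norm that ADMM-type solvers typically apply in their sub-steps. It is not the authors' solver. The array layout (time, rows, columns), the forward-difference gradients and the toy video are assumptions.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (element-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def objective(O, R, lambdas):
    """Weighted sum of the four L1 terms for video O and rain layer R.

    Arrays are assumed to have shape (time, rows, columns).
    """
    l1, l2, l3, l4 = lambdas
    B = O - R                                        # background implied by O = B + R
    sparsity_R = np.abs(R).sum()                     # ||R||_1
    vertical_tv_R = np.abs(np.diff(R, axis=1)).sum()    # ||grad_y R||_1
    horizontal_tv_B = np.abs(np.diff(B, axis=2)).sum()  # ||grad_x (O - R)||_1
    temporal_B = np.abs(np.diff(B, axis=0)).sum()       # ||grad_t (O - R)||_1
    return l1 * sparsity_R + l2 * vertical_tv_R + l3 * horizontal_tv_B + l4 * temporal_B

# Toy video (assumption): static background plus one vertical streak in the middle frame.
T, H, W = 3, 4, 5
B_true = np.broadcast_to(np.arange(H, dtype=float)[None, :, None], (T, H, W)).copy()
R_true = np.zeros((T, H, W))
R_true[1, :, 2] = 1.0
O = B_true + R_true

cost_true = objective(O, R_true, (1.0, 1.0, 1.0, 1.0))
cost_no_rain = objective(O, np.zeros_like(O), (1.0, 1.0, 1.0, 1.0))
print(cost_true, cost_no_rain)
```

On this toy video the correct decomposition has a lower cost than treating every pixel as background, which is the behaviour the penalty terms are designed to produce.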
Algorithm 1: Rain-streak identification
Case 1: Camera-based Rain Gauge
Parameter sensitivity
Parameter optimization
Convergence speed
Impact of wind
Comprehensive tests performed
Algorithm 2: Rainfall intensity estimation
Based on geometrical optics and microphysical characteristics of raindrops.
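The microphysical step can be sketched as follows, under stated assumptions. Given the diameters of the drops found in a known sampling volume, the rain rate is the volume flux of water through a horizontal plane: R = (π/6) Σ D³ v(D) / V, converted to mm/h. The fall speed v(D) = 9.65 − 10.3 exp(−0.6 D) (D in mm, v in m/s) is the empirical relation of Atlas et al. (1973). It is chosen here for illustration; the slides do not say which relation the authors use. The argument `sampling_volume_m3` and the drop list are hypothetical.

```python
import math

def terminal_velocity(d_mm):
    """Empirical raindrop fall speed in m/s for a diameter in mm (Atlas et al., 1973)."""
    return max(9.65 - 10.3 * math.exp(-0.6 * d_mm), 0.0)

def rain_rate_mm_per_h(diameters_mm, sampling_volume_m3):
    """Rain rate from the drops observed in a sampling volume.

    Each drop contributes volume (pi/6) D^3 [mm^3] falling at v(D) [m/s].
    """
    flux = sum((math.pi / 6.0) * d ** 3 * terminal_velocity(d) for d in diameters_mm)
    # mm^3 m^-3 * m s^-1 = mm^3 m^-2 s^-1; 1 m^2 = 1e6 mm^2; 3600 s per hour.
    return flux / sampling_volume_m3 * 1e-6 * 3600.0

# Hypothetical sample: drop diameters (mm) seen in 1 m^3 of air.
drops = [0.8, 1.0, 1.2, 1.5, 2.0, 2.0, 2.5]
rate = rain_rate_mm_per_h(drops, sampling_volume_m3=1.0)
print(f"rain rate: {rate:.3f} mm/h")
```

Because D enters as D³, a few large drops dominate the estimate, so errors in the estimated streak size matter more than errors in the drop count.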
Case 1: Camera-based Rain Gauge
An ordinary surveillance camera was installed on the SUSTech campus
Three different scenes were shot
Five rainfall events with varying intensities were recorded in summer 2018
One frame per second
The instantaneous rainfall intensity was estimated every 30 seconds
Case 1: Camera-based Rain Gauge
Lower cost, more realistic scenes, and higher estimation accuracy
Potential of using existing CCTV networks for opportunistic hydrology sensing
Case 1: Camera-based Rain Gauge
Validation
Comparison with previous approaches
We use computer vision (CV) to prepare model inputs for data-driven models, which represents a new framework to assimilate big earth data into data-driven hydrological models.
Case 2: Flow modeling using CV
Input data:
• Precipitation
• Temperature
• Leaf area index
Target:
• Streamflow
On the northern margin of the Qinghai-Tibetan Plateau
(Jiang et al., A computer vision-based approach to fusing spatiotemporal data for hydrological modeling, Journal of Hydrology, 2018, 25-40)
Traditional strategies cannot sufficiently account for the spatial heterogeneity of data fields.
• Use data at selected points: spatial information is not fully exploited
• Use data at every spatial unit (e.g., grid cell): curse of dimensionality
• Average over entire watershed: spatial information is not exploited at all
• Average over individual sub-basins: spatial information is not fully exploited
Case 2: Flow modeling using CV
Key idea: Use the CV technique to transform a data field into a one-dimensional feature vector. The feature vector is then used as the model input.
Feature | Intensity | Texture
Purpose | Representing pixel values | Representing spatial variability
Feature descriptor | Brightness | Local binary pattern (LBP)
Feature vector | Average brightness values of a series of sub-images | Histograms of LBP values of a series of sub-images
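A minimal sketch of how a gridded data field might be turned into such a feature vector is given below. It assumes a basic 8-neighbour LBP, a 2 × 2 grid of sub-images and 8 histogram bins; these choices, and the function names, are illustrative assumptions rather than the configuration used in the study.

```python
import numpy as np

def lbp_image(field):
    """Basic 8-neighbour local binary pattern code for each interior pixel."""
    h, w = field.shape
    center = field[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = field[1 + dy: h - 1 + dy, 1 + dx: w - 1 + dx]
        # Set this bit wherever the neighbour is at least as large as the centre.
        codes += (neighbour >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes

def split_blocks(array, n_rows, n_cols):
    """Split a 2-D array into n_rows x n_cols sub-images (row-major order)."""
    return [blk for band in np.array_split(array, n_rows, axis=0)
            for blk in np.array_split(band, n_cols, axis=1)]

def feature_vector(field, n_rows=2, n_cols=2, n_bins=8):
    """Concatenate mean brightness and normalised LBP histograms of the sub-images."""
    brightness = [blk.mean() for blk in split_blocks(field, n_rows, n_cols)]
    hists = []
    for blk in split_blocks(lbp_image(field), n_rows, n_cols):
        hist, _ = np.histogram(blk, bins=n_bins, range=(0, 256))
        hists.append(hist / max(blk.size, 1))
    return np.concatenate([np.asarray(brightness, dtype=float)] + hists)

# Hypothetical 20 x 30 precipitation field for one time step.
rng = np.random.default_rng(0)
field = rng.random((20, 30))
vec = feature_vector(field)
print(vec.shape)
```

The vector length depends only on the grid of sub-images and the number of bins, not on the resolution of the data field, which is what keeps the model input small.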
Case 2: Flow modeling using CV
Two features in this case
An ANN model
Input data fields as videos
1-D feature vector
Model output
The prediction matches the observation well.
Case 2: Flow modeling using CV
Short-term forecasting (compared against existing data-driven models)
Long-term simulation (compared against existing process-based models)
More stable performance for varying prediction lengths
Case 2: Flow modeling using CV
The CV model is much easier to build, but achieves a comparable or even better performance.
Transfer learning for ungauged basins
Roles of different predictors
The CV model outperforms traditional data-driven models in transfer learning, especially for flow peaks.
The flexibility in fusing multisource spatiotemporal data enables a systematic analysis of the roles of different predictors.
Case 2: Flow modeling using CV
Traditional approach
CV approach
Case 3: Surrogate-based optimization
Analyses requiring numerous model evaluations (optimization analysis, uncertainty analysis, etc.) combined with a computationally expensive model: difficult to apply!
Surrogate modeling (meta-modeling) approaches
Surrogate model:
1) To fully or partially replace the original complex model during the analysis
2) Computationally much cheaper than the original model
3) Usually in the form of a response surface, such as support vector machine (SVM), radial basis function (RBF), etc.
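As an illustration of the response-surface idea, the sketch below fits a cheap surrogate to a handful of runs of a costly model and checks it on held-out points. To stay dependency-light it uses a quadratic polynomial fitted by least squares rather than the SVM or RBF surrogates named on the slide. `expensive_model` is a hypothetical stand-in function, not a hydrological model.

```python
import numpy as np

def expensive_model(x):
    """Hypothetical stand-in for a costly simulator; x has shape (n, 2)."""
    return np.sin(3.0 * x[:, 0]) + x[:, 1] ** 2

def quadratic_features(x):
    """Full quadratic basis in two inputs: 1, x0, x1, x0*x1, x0^2, x1^2."""
    x0, x1 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones(len(x)), x0, x1, x0 * x1, x0 ** 2, x1 ** 2])

def fit_surrogate(x, y):
    """Least-squares fit of the quadratic response surface."""
    coef, *_ = np.linalg.lstsq(quadratic_features(x), y, rcond=None)
    return coef

def predict_surrogate(coef, x):
    """Evaluate the response surface; this is the cheap replacement for the model."""
    return quadratic_features(x) @ coef

rng = np.random.default_rng(1)
x_train = rng.uniform(0.0, 1.0, size=(50, 2))     # 50 "expensive" runs
coef = fit_surrogate(x_train, expensive_model(x_train))

x_test = rng.uniform(0.0, 1.0, size=(200, 2))     # held-out check
errors = predict_surrogate(coef, x_test) - expensive_model(x_test)
rmse = float(np.sqrt(np.mean(errors ** 2)))
print(f"hold-out RMSE of the surrogate: {rmse:.3f}")
```

Once fitted, the surrogate can be evaluated thousands of times at negligible cost, which is what makes optimization and uncertainty analysis feasible.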
Case 3: Surrogate-based optimization
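[Figure legend: saturated zone; river; hydrologic response unit; overland flow; subsurface finite-difference grid; pumping well; unsaturated zone; surface water-groundwater interaction]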
GSFLOW: a complex integrated surface water-groundwater model
Zhangye Basin (张掖盆地): ~9,106 km², where intensive agricultural irrigation competes with ecosystems for precious water resources
Surrogate-based Optimization for Integrated surface water-groundwater Modeling (SOIM), which couples SVM and SCE-UA.
Framework of SOIM
(Wu et al., Optimizing water resources management in large river basins with integrated surface water-groundwater modeling: a surrogate-based approach, Water Resources Research, 2015, 2153-2173)
Case 3: Surrogate-based optimization
Obtain 100-200 initial samples
Add some additional samples
e.g., with 200 initial samples, the optimization converges with fewer than 300 samples
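The sampling strategy (a batch of initial samples, then a few additional samples placed where the surrogate looks best) can be sketched as a simple loop. This is not SOIM: the surrogate is a quadratic least-squares surface instead of an SVM, the search over the surrogate is a random candidate search instead of SCE-UA, and `expensive_model` is a hypothetical two-variable objective.

```python
import numpy as np

def expensive_model(x):
    """Hypothetical costly objective to minimise; x has shape (n, 2)."""
    return (x[:, 0] - 0.3) ** 2 + (x[:, 1] - 0.7) ** 2

def features(x):
    """Quadratic basis used by the surrogate."""
    x0, x1 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones(len(x)), x0, x1, x0 * x1, x0 ** 2, x1 ** 2])

def surrogate_optimize(model, n_init=20, n_add=5, n_candidates=2000, seed=0):
    """Fit a surrogate on initial runs, then add one model run per iteration."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(n_init, 2))   # initial samples
    y = model(x)                                  # expensive runs
    for _ in range(n_add):
        coef, *_ = np.linalg.lstsq(features(x), y, rcond=None)
        # Cheap search: evaluate the surrogate on many random candidates.
        candidates = rng.uniform(0.0, 1.0, size=(n_candidates, 2))
        best = candidates[np.argmin(features(candidates) @ coef)][None, :]
        # One more expensive run at the surrogate's optimum, then refit.
        x = np.vstack([x, best])
        y = np.concatenate([y, model(best)])
    i = int(np.argmin(y))
    return x[i], float(y[i]), len(y)

best_x, best_y, n_runs = surrogate_optimize(expensive_model)
print(f"best x = {np.round(best_x, 3)}, objective = {best_y:.5f}, model runs = {n_runs}")
```

The expensive model is called only `n_init + n_add` times, while the thousands of trial evaluations fall on the surrogate.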
SVM surrogates can adequately replace the original GSFLOW model in many aspects (i.e., for many output variables).
(Wu B, Zheng Y*, et al., Water Resources Research, 2015)
Case 3: Surrogate-based optimization
The strengths of physically based and data-driven models are merged here!
Spatially optimize the conjunctive use of river flow and groundwater for irrigation in the 18 irrigation districts
A number of optimization schemes were addressed, considering different hydrological conditions, management concerns and constraints.
Case 3: Surrogate-based optimization
Only hundreds of GSFLOW runs are necessary, and the total computing cost is reduced from years to days (on computer clusters).
Case 3: Surrogate-based optimization
An example of the optimization results
Original pumping pattern
Suggested pumping pattern
Suggested changes
This study presents a good case in which the strengths of both physically based and data-driven models are nicely merged in a real-world application.
A. The recent progress in big data and AI opens a new door for the hydrology community, which has been relatively conservative in the past.
B. Hydrologists have much to learn from data scientists.
C. Opportunistic sensing can help solve the problem of “small at the ground”, and deserves further study.
D. Computer vision can bridge the gap between big earth data and hydrological modeling.
E. Physically based models and data-driven models should be friends, not enemies, and co-evolve.
F. The existing data are still not big enough to support deep learning in classic hydrological modeling.
G. How to make use of big data from other domains?