Regional Workshop on Development of Radon Maps and
the Definition of Radon-Prone Areas
Javier Elio
Vilnius, 9 - 11 July 2019
Software: R (and QGIS)
Reasons for Geoscientists and Engineers to Learn Coding (Michael Pyrcz, University of Texas at Austin - @GeostatGuy)
Caveats:
1. Any type of coding, scripting or workflow matched to your working environment is great. We don’t need to be C++ experts (me: actually, I know nothing about C++!).
2. We need the experience component of geoscience and engineering expertise. This is beyond coding and is essential to workflow logic development, best use of data, etc.
3. Some expert judgement will remain subjective and not completely reproducible. I’m not advocating for the geoscientist or engineer being replaced by a computer.
Transparency – no compiler accepts hand waving! Coding forces your logic to be uncovered for any other scientist or engineer to review.
Reproducibility – run it, get an answer, hand it over, run it, get the same answer. This is a main principle of the scientific method.
Quantification – programs need numbers. Feed the program and discover new ways to look at the world.
Open-source – leverage a world of brilliance. Check out packages and snippets, and be amazed by what great minds have freely shared.
Break Down Barriers – don’t throw it over the fence. Sit at the table with the developers and share more of your subject matter expertise for a better product.
Deployment – share it with others and multiply the impact. Performance metrics or altruism, your good work benefits many others.
Efficiency – minimize the boring parts of the job. Build a suite of scripts to automate common tasks and spend more time doing science and engineering!
Always Time to Do it Again! – how many times did you only do it once? It probably takes 2-4 times as long to script and automate a workflow. Usually worth it.
Be Like Us – it will change you. Users feel limited, programmers truly harness the power of their application and hardware.
R software
R-CRAN: https://cran.r-project.org/
RStudio: https://www.rstudio.com/
Introduction
https://r4ds.had.co.nz/
https://bookdown.org/rdpeng/rprogdatascience/
Index
1. Exploratory analysis
• Outliers, histograms, Q-Q plots, etc.
• Non-detected values
• ANOVA analysis
• Spatial distribution of indoor radon measurements (2D Kernel density plots)
2. Summary statistics by small areas (e.g. grids, municipalities, …)
• Map summary statistics (e.g. N, AM)
• Map probability of having an indoor radon concentration higher than a reference level
3. Interpolation techniques
• Inverse distance weighted (IDW)
• Ordinary kriging (OK)
4. Dose maps
5. Interactive maps
1. Exploratory analysis
Lognormal distribution?
Box-Cox transformation:
$$ Y = \begin{cases} \dfrac{X^{\lambda} - 1}{\lambda} & \lambda \neq 0 \\ \log(X) & \lambda = 0 \end{cases} $$
[Figure: Histogram Indoor Radon; x-axis: Rn [Bq m−3]; y-axis: Density]
[Figure: Q-Q plot (InRn); x-axis: InRn$Rn; y-axis: norm quantiles]
[Figure: Box-Cox Transformation; x-axis: λ (-1.0 to 1.0); y-axis: log-Likelihood; 95% level marked]
[Figure: Q-Q plot (log InRn); x-axis: InRn$LogRn; y-axis: norm quantiles]
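A minimal sketch of how the lognormality check could be reproduced in base R. It uses simulated values rather than the workshop dataset; the vector `rn`, the seed and the search interval for λ are assumptions made for illustration. The Box-Cox profile log-likelihood is computed by hand for a constant-mean normal model.

```r
# Simulated indoor radon values (assumption: NOT the workshop data)
set.seed(1)
rn <- rlnorm(1000, meanlog = log(25), sdlog = log(3.8))

# Box-Cox transform: (x^lambda - 1) / lambda, or log(x) when lambda = 0
box_cox <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Profile log-likelihood of lambda (up to an additive constant)
boxcox_loglik <- function(lambda, x) {
  y <- box_cox(x, lambda)
  n <- length(x)
  -n / 2 * log(mean((y - mean(y))^2)) + (lambda - 1) * sum(log(x))
}

# Lambda that maximises the profile log-likelihood
lambda_hat <- optimize(boxcox_loglik, interval = c(-2, 2), x = rn,
                       maximum = TRUE)$maximum
print(round(lambda_hat, 2))

# Q-Q plot of the log-transformed data against normal quantiles
qqnorm(log(rn), main = "Q-Q plot (log Rn)")
qqline(log(rn))
```

A value of `lambda_hat` close to 0 supports working with the log-transformed concentrations.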
How to treat non-detected values?
What can we do with values below the detection limit (DL)?
1. Assign a value equal to the DL. This overestimates the mean value.
2. Report as “zero”. This is not realistic for indoor radon and probably underestimates the mean.
3. Report as half of the DL (or another fraction). This may be adequate if the percentage of non-detects is low (< 10-15%). However, it alters the variance of the data, which may lead to inaccurate results.
4. Assign a concentration based on a statistical estimation. This is effective for datasets with a high proportion of detects (> 50%).
5. If the percentage of non-detects is large (> 50%), we may only be able to consider whether radon was detected (or not) above some level.
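[Diagram: measurement scale from zero (no element) to the upper limit (saturation of the detector). Below the detection limit: not detected. Between the detection limit and the quantification limit: detected, NOT quantified. Above the quantification limit: detected AND quantified.]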
Imputation methods
The final method depends on multiple factors (e.g. the number of data, the proportion of non-detects, the objectives of the survey). Here I present one method for replacing non-detects, ROS (Regression on Order Statistics), but if you think censored data are an issue in your data analysis you should consult a statistician.
                 Original data   After ROS
N                1000            1000
AM               64.76           64.73
SD               133.75          133.77
GM               24.67           23.18
GSD              3.77            4.28
Prob[Rn > 200]   5.73%           6.91%
[Figure: Q-Q plot, Original data; x-axis: InRn$LogRn; y-axis: norm quantiles]
[Figure: Q-Q plot, After ROS; x-axis: InRn_DL$LogRn; y-axis: norm quantiles]
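A simplified sketch of the ROS idea for a single detection limit, written in base R with simulated data. The detection limit of 10 Bq m−3, the function name `ros_single_dl` and the plotting-position formula are assumptions for illustration, and this is not necessarily the implementation used for the slides. Dedicated packages (e.g. NADA) offer fuller implementations, including multiple detection limits.

```r
# Simulated data with a hypothetical detection limit (assumption)
set.seed(42)
dl <- 10                                   # detection limit, Bq m-3
rn_true  <- rlnorm(1000, meanlog = log(25), sdlog = log(3.8))
censored <- rn_true < dl                   # TRUE for non-detects
rn_obs   <- ifelse(censored, dl, rn_true)  # non-detects stored at the DL

ros_single_dl <- function(x, censored) {
  n <- length(x)
  k <- sum(censored)
  n_det <- n - k
  p_cens <- k / n                          # estimated P(X < DL)
  # Plotting positions: detects above p_cens, non-detects below it
  pp_det  <- p_cens + (1 - p_cens) * seq_len(n_det) / (n_det + 1)
  pp_cens <- p_cens * seq_len(k) / (k + 1)
  # Regress log(detected values) on their normal quantiles
  log_det <- log(sort(x[!censored]))
  z_det   <- qnorm(pp_det)
  b <- unname(coef(lm(log_det ~ z_det)))
  # Impute non-detects from the fitted line (their order is arbitrary)
  out <- x
  out[censored] <- exp(b[1] + b[2] * qnorm(pp_cens))
  out
}

rn_ros <- ros_single_dl(rn_obs, censored)

gm  <- function(x) exp(mean(log(x)))
gsd <- function(x) exp(sd(log(x)))
print(c(GM_obs = gm(rn_obs), GM_ros = gm(rn_ros)))
print(c(GSD_obs = gsd(rn_obs), GSD_ros = gsd(rn_ros)))
```

Detected values are left untouched; only the censored values are replaced, so the imputed set should only be used for summary statistics, not read as individual measurements.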
Histogram – Boxplot – QQ Plot
[Figure: Histogram; x-axis: LogRn [Bq m−3]; y-axis: Density]
[Figure: Boxplot; axis: LogRn [Bq m−3]; label: Lognormal transformation]
[Figure: Normal Q-Q plot; x-axis: Observed Value; y-axis: Expected Normal Value]
Plot spatial distribution of the data
[Map: Indoor radon measurements (Simulated), 55°N to 56°N, 23°E to 24°E; classes in Bq/m3: [0,50), [50,100), [100,200), [200,300), [300,500), [500,1.43e+03]]
[Map: Indoor radon measurements (Simulated), 55°N to 56°N, 23°E to 24°E; classes in Bq/m3: < 200, >= 200]
2D Kernel density plots
InRn vs. Geology (ANOVA)
[Map: Geology 1:1M, 55°N to 56°N, 23°E to 24°E; legend (AgeName): Early Cretaceous, Early Triassic, Late Devonian, Late Jurassic, Late Permian, Middle Jurassic]
[Figure: boxplots of LogRn [Bq m−3] by Geology (AgeName)]
anova(lm_BG)
Analysis of Variance Table
Response: LogRn
           Df  Sum Sq Mean Sq F value    Pr(>F)
AgeName     5   46.23  9.2450  4.4495 0.0005156 ***
Residuals 994 2065.32  2.0778
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
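A sketch of how a table like this could be produced. The object name `lm_BG`, the response `LogRn` and the factor `AgeName` come from the output on the slide; the data frame `InRn` is simulated here, and the size of the geological effect is an arbitrary assumption.

```r
# Simulated data (assumption): 1000 measurements over six geological units
set.seed(7)
ages <- c("Early Cretaceous", "Early Triassic", "Late Devonian",
          "Late Jurassic", "Late Permian", "Middle Jurassic")
InRn <- data.frame(AgeName = factor(sample(ages, 1000, replace = TRUE)))

# Hypothetical effect: one unit has a higher mean log-radon
InRn$LogRn <- rnorm(1000, mean = 3.2, sd = 1.4) +
  ifelse(InRn$AgeName == "Late Devonian", 0.5, 0)

# One-way ANOVA of log-radon by geological age
lm_BG <- lm(LogRn ~ AgeName, data = InRn)
tab <- anova(lm_BG)
print(tab)

# Boxplots by geological unit
boxplot(LogRn ~ AgeName, data = InRn, las = 2, xlab = "", ylab = "LogRn")
```

With six units and 1000 measurements the degrees of freedom are 5 and 994, as in the slide; a small Pr(>F) indicates that mean log-radon differs between at least two units.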
2. Summary statistics by small areas (e.g. grids, municipalities, …)
Summary statistics
Number of dwellings sampled (N) by grid cells of 10 km x 10 km, and arithmetic mean (AM)
[Map: Number of data (N), colour scale 0 to 20; 55°N to 56°N, 23°E to 24°E]
[Map: Arithmetic mean; classes in Bq/m3: [0,25), [25,50), [50,75), [75,100), [100,200), [200,274], NA]
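A base-R sketch of summary statistics by grid cell. It assumes the coordinates have already been projected to a metric system and expressed in km (the columns `x_km` and `y_km` are hypothetical), and it uses simulated concentrations.

```r
# Simulated measurements (assumption): projected coordinates in km
set.seed(3)
pts <- data.frame(x_km = runif(500, 0, 60),
                  y_km = runif(500, 0, 110),
                  rn   = rlnorm(500, log(25), log(3.8)))

# Assign each point to a 10 km x 10 km cell (column.row index)
cell_size <- 10
cell <- interaction(floor(pts$x_km / cell_size),
                    floor(pts$y_km / cell_size), drop = TRUE)

# Number of data (N) and arithmetic mean (AM) per cell
grid_stats <- data.frame(
  cell = levels(cell),
  N    = as.vector(tapply(pts$rn, cell, length)),
  AM   = as.vector(tapply(pts$rn, cell, mean))
)
print(head(grid_stats))
```

Cells without measurements do not appear in `grid_stats`; on a map they would be shown as NA.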
Probabilistic map
• Objective:
  - Estimate the probability of having an indoor radon concentration higher than a reference level (e.g. 200 Bq m-3)
• Method:
  - Select the radon measurements in each grid cell (or municipality, small area, district, etc.)
  - Calculate the GM and the GSD in each grid cell
  - Assuming a log-normal distribution, estimate the probability of having a value above the reference level (RL)
  - In grid cells with few data (e.g. N ≤ 5), estimates are based on neighbouring grid cells (interpolation – IDW)
[Figure: histogram of InRn (x-axis: InRn, 0 to 500; y-axis: Frequency) with GM = 57, GSD = 2.4 and RL = 200 marked; on the standard normal scale (µ = 0, σ = 1) the GM maps to µ = 0 and the RL maps to k]
$$ k = \frac{\ln(RL) - \ln(GM)}{\ln(GSD)} $$
[Map: Prob[InRn > 200 Bq m-3], 55°N to 56°N, 23°E to 24°E; classes in %: [0,1), [1,5), [5,10), [10,20), [20,30), [30,44.5]]
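A short worked sketch of the probability calculation. Under the log-normal assumption, the probability of exceeding the RL is the upper tail of the standard normal distribution beyond k = (ln RL − ln GM) / ln GSD. The function name `prob_above` is hypothetical.

```r
# Probability of exceeding a reference level under a log-normal model
prob_above <- function(gm, gsd, rl = 200) {
  k <- (log(rl) - log(gm)) / log(gsd)   # standardised reference level
  pnorm(k, lower.tail = FALSE)          # P[Z > k]
}

# Grid-cell example from the slide: GM = 57, GSD = 2.4 (about 7.6 %)
p_cell <- prob_above(57, 2.4)
print(round(100 * p_cell, 1))

# Whole-dataset statistics from the imputation slide
p_orig <- prob_above(24.67, 3.77)   # reported as 5.73 %
p_ros  <- prob_above(23.18, 4.28)   # reported as 6.91 %
print(round(100 * c(p_orig, p_ros), 2))
```

The last two calls reproduce, up to rounding of GM and GSD, the Prob[Rn > 200] values reported with the summary statistics before and after ROS.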
3. Interpolation techniques
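Points for interpolation
Degrees vs. distance
Not projected data: Great-circle distance
Degrees   Distance (approx.)
1.0       111.32 km
0.1       11.132 km
0.01      1.1132 km
These values hold for N/S distances anywhere, and for E/W distances at the equator. E/W distances shrink with the cosine of the latitude: at 55.5°N one degree of longitude is about 63 km.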
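A minimal sketch of a great-circle (haversine) distance in base R, to illustrate the degrees-to-distance conversion for unprojected data. The function name `haversine_km` and the choice of the equatorial radius (6378.137 km, which gives 111.32 km per degree) are assumptions.

```r
# Great-circle distance between two lon/lat points (haversine formula)
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6378.137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

d_ns <- haversine_km(23, 55, 23, 56)      # 1 degree N/S: about 111.3 km
d_ew <- haversine_km(23, 55.5, 24, 55.5)  # 1 degree E/W at 55.5 N: about 63 km
print(round(c(NS = d_ns, EW = d_ew), 2))
```

The difference between the two results is why distances in degrees should not be used directly in IDW or variogram calculations at these latitudes.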
[Map: study area, 55°N to 56°N, 23°E to 24°E]
Inverse Distance Weighted (IDW)
$$ \hat{X}(s_0) = \frac{\sum_{i=1}^{n} \dfrac{1}{d_i^{\,p}}\, X(s_i)}{\sum_{i=1}^{n} \dfrac{1}{d_i^{\,p}}} $$
where d_i is the distance between the prediction point s_0 and the observation s_i, and p is the inverse distance power (idp).
[Figure: 10-fold cross-validation; x-axis: idp (1.0 to 4.0); y-axis: RMSE (about 118 to 130)]
[Map: IDW - Predictions; classes in Bq/m3: [0,50), [50,100), [100,200), [200,300), [300,500), [500,648]]
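A base-R sketch of IDW prediction with the power (idp) chosen by 10-fold cross-validation on the root mean squared error (RMSE). The data are simulated, the coordinates are assumed to be projected (km), and the helper names `idw_predict` and `cv_rmse` are hypothetical; the slides may have used a dedicated package for this step.

```r
# Simulated observations (assumption): projected coordinates in km
set.seed(11)
n <- 300
obs <- data.frame(x = runif(n, 0, 60), y = runif(n, 0, 110))
obs$rn <- rlnorm(n, meanlog = log(25) + obs$x / 60, sdlog = log(3))

# IDW prediction at one point: weights are 1 / distance^idp
idw_predict <- function(x0, y0, obs, idp = 2) {
  d <- sqrt((obs$x - x0)^2 + (obs$y - y0)^2)
  if (any(d == 0)) return(mean(obs$rn[d == 0]))  # exact at data points
  w <- 1 / d^idp
  sum(w * obs$rn) / sum(w)
}

# k-fold cross-validated RMSE for a given idp
cv_rmse <- function(obs, idp, fold) {
  err <- numeric(nrow(obs))
  for (f in unique(fold)) {
    train <- obs[fold != f, ]
    test_idx <- which(fold == f)
    pred <- vapply(test_idx, function(i) {
      idw_predict(obs$x[i], obs$y[i], train, idp)
    }, numeric(1))
    err[test_idx] <- pred - obs$rn[test_idx]
  }
  sqrt(mean(err^2))
}

# Same 10 folds for every candidate power
fold <- sample(rep(1:10, length.out = n))
idp_grid <- seq(1, 4, by = 0.5)
rmse <- sapply(idp_grid, function(p) cv_rmse(obs, p, fold))
best_idp <- idp_grid[which.min(rmse)]
print(data.frame(idp = idp_grid, RMSE = round(rmse, 1)))
```

Using the same folds for every candidate power keeps the RMSE values comparable across idp.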
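Ordinary kriging (OK)
[Map: OK - Predictions; classes in Bq/m3: [0,50), [50,100), [100,200), [200,300), [300,500), [500,529]]
Variogram model:
Model   psill   range
Nug     0.54    0.00
Exp     1.91    11.47
[Figure: Variogram; x-axis: dist (0 to 40); y-axis: gamma (0.0 to 2.5); experimental values (id: var1) with the fitted model]
Trans-Gaussian kriging using Box-Cox transforms:
Predictions are carried out on the transformed data and then back-transformed, without bias, to the original scale using the Lagrange multiplier (function krigeTg in R, packages “gstat” and “MASS”).
$$ \hat{X}(s_0) = \phi\big(\hat{Y}_{OK}(s_0)\big) + \phi''(\hat{\mu})\left(\frac{\sigma^{2}_{OK}(s_0)}{2} - m\right) $$
$$ \phi^{-1}(X) = \begin{cases} \dfrac{X^{\lambda} - 1}{\lambda} & \lambda \neq 0 \\ \log X & \lambda = 0 \end{cases} \qquad \phi(x) = \begin{cases} (x \cdot \lambda + 1)^{1/\lambda} & \lambda \neq 0 \\ e^{x} & \lambda = 0 \end{cases} \qquad \phi''(x) = \begin{cases} (1 - \lambda)(x \cdot \lambda + 1)^{1/\lambda - 2} & \lambda \neq 0 \\ e^{x} & \lambda = 0 \end{cases} $$
where \hat{Y}_{OK}(s_0) and \sigma^{2}_{OK}(s_0) are the ordinary kriging prediction and variance on the transformed scale, \hat{\mu} is the estimated mean of the transformed data and m is the Lagrange multiplier.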
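A small sketch that evaluates the variogram model reported on this slide (nugget 0.54, partial sill 1.91, exponential range 11.47) in base R. It assumes the parameterisation psill · (1 − exp(−h / range)) for the exponential component, with the nugget added for any distance greater than zero; the function name `vgm_nug_exp` is hypothetical.

```r
# Nugget + exponential variogram model (parameters taken from the slide)
vgm_nug_exp <- function(h, nugget = 0.54, psill = 1.91, range = 11.47) {
  # gamma(0) = 0; the nugget applies to any distance greater than zero
  ifelse(h == 0, 0, nugget + psill * (1 - exp(-h / range)))
}

h <- c(0, 5, 11.47, 40)
gamma_h <- vgm_nug_exp(h)
print(round(gamma_h, 3))

# Total sill approached at large distances: nugget + partial sill
total_sill <- 0.54 + 1.91
print(total_sill)
```

Under this parameterisation the model reaches about 95% of the partial sill at three times the range parameter, so the practical range is roughly 34 distance units.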
Radon concentration (AM) by small areas
In our case, grid cells of 10 km x 10 km; but we could choose other geometries (e.g. municipalities, districts, 1 km x 1 km cells, etc.)
[Map: Arithmetic mean; classes in Bq/m3: [0,25), [25,50), [50,75), [75,100), [100,200), [200,274], NA]
[Map: IDW - Predictions; classes in Bq/m3: [0,25), [25,50), [50,75), [75,100), [100,195), [195,200], NA]
[Map: OK - Predictions; classes in Bq/m3: [0,25), [25,50), [50,75), [75,100), [100,200), [200,253], NA]
4. Dose maps
Dose map
$$ \text{Annual dose}\ [\mathrm{mSv/y}] = C_{Rn} \cdot F_E \cdot T \cdot F_O \cdot F_D $$
Challenges (to be further investigated):
1. Annual indoor radon concentration (CRn):
a) Predictions over small areas (grid cells of 10 km x 10 km, 1 km x 1 km, municipalities, districts, etc.). Can we improve predictions? New statistical models (e.g. ML)? Other secondary variables (e.g. soil permeability)?
b) Floor correction model
2. Equilibrium factor (FE): converts CRn to the Equivalent Equilibrium Concentration (EEC) of radon daughters. Take default values (e.g. 0.4, UNSCEAR)? Regional trends? Dependence on usage?
3. Time spent indoors (occupancy factor, FO): UNSCEAR recommends a value of 0.8. Can we use the same value for the whole country (e.g. differences between rural and urban areas)? The same value for different nationalities?
4. Time spent at home vs. at workplaces? Radon characteristics at workplaces are in general different from those at dwellings. Can we model this?
5. Commuting patterns. The workplace is in most cases not at the same location as the home, and is sometimes quite far away (people commute 100 km). How do we model this effect?
6. Dose conversion factor (FD): dose coefficient applied to the EEC. The international recommendation is 9·10−6 mSv per Bq m−3 h (under discussion).
Indoor radon (AM - SD)
Select the best method according to our data
[Maps by 10 km x 10 km grid cells, classes in Bq/m3: Arithmetic mean (up to [200,274]); IDW - Predictions (up to [195,200]); OK - Predictions (up to [200,253])]
Dose map
$$ \text{Annual dose}\ [\mathrm{mSv/y}] = C_{Rn} \cdot F_E \cdot T \cdot F_O \cdot F_D $$
Initial try (to be adjusted):
• Indoor radon concentration at ground floor level and standard dose conversion factors.
• Uncertainty analysis by Monte Carlo simulation (Elío et al., Environment International 114: 69–76, 2018):
  Nsim = 100
  CRn ~ N(AM, SD) [truncated, InRn > 0]
  FE ~ LN(0.4, 1.15)
  FO ~ N(0.8, 0.03)
  FD ~ N(9·10−6, 1.5·10−6)
  T = 8760 h/y
• Map the AM and SD of the simulated values
[Map: Radiation dose - AM, 55°N to 56°N, 23°E to 24°E; colour scale 2 to 6 mSv/y]
[Map: Radiation dose - SD, 55°N to 56°N, 23°E to 24°E; colour scale 1 to 3 mSv/y]
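A sketch of the Monte Carlo dose calculation for a single grid cell, using the distributions listed on the slide. Two points are assumptions: LN(0.4, 1.15) is read here as a log-normal with GM = 0.4 and GSD = 1.15, and the truncated normal is sampled by simple rejection. The cell values (AM = 100, SD = 30 Bq m−3) and the function names are hypothetical.

```r
set.seed(2019)

# Normal draws truncated to positive values (simple rejection sampling)
rtruncnorm_pos <- function(n, mean, sd) {
  out <- numeric(0)
  while (length(out) < n) {
    draw <- rnorm(n, mean, sd)
    out <- c(out, draw[draw > 0])
  }
  out[seq_len(n)]
}

# Annual dose (mSv/y) = CRn * FE * T * FO * FD, simulated nsim times
simulate_dose <- function(am, sd_rn, nsim = 100, hours = 8760) {
  c_rn <- rtruncnorm_pos(nsim, am, sd_rn)                       # Bq m-3
  f_e  <- rlnorm(nsim, meanlog = log(0.4), sdlog = log(1.15))   # assumed GM, GSD
  f_o  <- rnorm(nsim, 0.8, 0.03)
  f_d  <- rnorm(nsim, 9e-6, 1.5e-6)                             # mSv per Bq m-3 h
  c_rn * f_e * hours * f_o * f_d
}

# Hypothetical grid cell: AM = 100 Bq m-3, SD = 30 Bq m-3
dose <- simulate_dose(am = 100, sd_rn = 30)
print(round(c(AM = mean(dose), SD = sd(dose)), 2))

# Point estimate with the central values, for comparison (2.52 mSv/y)
point_dose <- 100 * 0.4 * 8760 * 0.8 * 9e-6
print(round(point_dose, 2))
```

Repeating `simulate_dose` for every grid cell and mapping the mean and standard deviation of the simulated values would give maps of the kind shown on the slide.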
Interactive maps (leaflet)
It is possible to save the map as a web page (HTML) and upload it to the web:
File:///C:/Users/elioj/Documents/JAVIER_Trabajo/IAEA/2019_07_Lithuany/R-Lithuania/Rresults/Interactive_OK_Pred_Map.html
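A minimal sketch of an interactive map saved as an HTML page. It assumes the `leaflet` and `htmlwidgets` packages are installed (the code is skipped otherwise), uses simulated points, and writes to a temporary file rather than to the path shown on the slide.

```r
# Simulated measurement points (assumption), in longitude/latitude
set.seed(5)
pts <- data.frame(lon = runif(50, 23, 24),
                  lat = runif(50, 55, 56),
                  rn  = round(rlnorm(50, log(25), log(3.8))))

if (requireNamespace("leaflet", quietly = TRUE) &&
    requireNamespace("htmlwidgets", quietly = TRUE)) {
  # Base map with one marker per measurement and a popup with the value
  m <- leaflet::leaflet(pts)
  m <- leaflet::addTiles(m)
  m <- leaflet::addCircleMarkers(m, lng = ~lon, lat = ~lat, radius = 4,
                                 popup = ~paste(rn, "Bq/m3"))

  # Save as a web page; selfcontained = FALSE avoids the pandoc dependency
  out_file <- file.path(tempdir(), "interactive_map.html")
  htmlwidgets::saveWidget(m, file = out_file, selfcontained = FALSE)
  print(out_file)
}
```

With `selfcontained = FALSE` the page depends on a folder of supporting files written next to it, so both need to be uploaded together.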
Geographic Information System
https://qgis.org/en/site/