
Development of an IDL-based geospatial data processing framework for meteorology and air quality modeling

AQRP PROJECT 13-TN2

FINAL REPORT

SUBMITTED TO

UNIVERSITY OF TEXAS - AUSTIN AIR QUALITY RESEARCH PROGRAM (AQRP)

BY:

Hyun Cheol Kim 1,2,3, Fong Ngan 1,2,4, Pius Lee 2,3, and Daniel Tong 1,2,5

1 Cooperative Institute for Climate and Satellites, University of Maryland, College Park, Maryland

2 Air Resources Laboratory, NOAA, College Park, Maryland

3 Co-Principal Investigator, 4 Investigator

5 Principal Investigator

November 30, 2013


Project summary

This project investigates basic computational algorithms for handling Geographic Information System (GIS) data and satellite data essential to regional meteorological and chemical modeling. It develops a set of generalized libraries within a geospatial data processing framework that aims to process geospatial data more efficiently and accurately. The tool can process GIS data in both vector format (e.g., ESRI shapefiles) and raster format (e.g., GeoTIFF and IMG) for any given domain. Processing speed is improved through selective use of polygon clipping routines and other algorithms optimized for specific applications. The raster tool is built on a histogram reverse-indexing method that enables easy access to grouped pixels, and it generates statistics of pixel values within each grid cell with improved speed and enhanced control of memory usage. Geospatial data processing tools that determine spatial allocation using polygon clipping algorithms require substantial computational resources to calculate fractional weightings between GIS polygons (and/or polylines) of the physical space and gridded cells of the modeling space. To overcome the speed and computational accuracy issues, an efficient polygon/polyline clipping algorithm is crucial. One key element for faster spatial allocation is to optimize the computational iterations in both the polygon clipping and the map projection calculations.

The project had the following specific objectives: (A) to develop an optimized geospatial data processing tool that can transform raster and vector data formats to any target domain within the data coverage, with vastly shortened processing time and enhanced accuracy; (B) to collect and process sample GIS and satellite data so that they are readily deployable for modeling studies, with applications including a spatial regridding method for emissions and satellite data; and (C) to perform engineering tests demonstrating the tool's capability to improve routine data processing for meteorological and air quality models. An example test case is included in the user guide and the installation/sample testing package delivered in conjunction with this report.


Table of contents

1. Introduction
   1.1 Problem statement
   1.2 Project Objectives
   1.3 Data analysis, interpretation and management
2. Methodology: computational algorithms
   2.1 Vector data processing
      2.1.1 Polygon clipping algorithms
      2.1.2 Implementation of polygon clipping algorithms in IDL
      2.1.3 Polyline clipping algorithm
      2.1.4 Application of vector processing tool
         2.1.4.1 Spatial regridding
         2.1.4.2 Conservative remapping
         2.1.4.3 Conservative downscaling
         2.1.4.4 Discussion on regridding methods
         2.1.4.5 Fractional weighting
         2.1.4.6 Grid masking
   2.2 Raster data processing
3. Application of IGDP
   3.1 Satellite data processing
   3.2 Land use land cover data
4. Test run with WRF
   4.1 SST construction from geostationary satellite
   4.2 Model setting
   4.3 Basic model evaluation
   4.4 Test run
   4.5 Discussion
5. Summary and conclusions
6. References
7. Appendix
   7.1 User guide
   7.2 IDL library description


List of figures

Figure 2.1 Steps of Sutherland-Hodgman polygon clipping algorithm (http://www.cs.helsinki.fi/group/goa/viewing/leikkaus/intro2.html)

Figure 2.2 Implementation of the Sutherland-Hodgman polygon clipping algorithm in IDL.

Figure 2.3 Example of the conservative remapping method. The raw data is from 4-km MODIS Chlorophyll concentrations, and the horizontal grid resolution of the target domain is 4-km in Houston, Texas.

Figure 2.4 A schematic of the “Downscaling” method, using a fine resolution weighting kernel: (a) GOME-2 NO2 column raw pixels, (b) CMAQ NO2 column, (c) spatial weighting kernel from CMAQ, and (d) downscaled GOME-2 NO2 column.

Figure 2.5 Examples of the “Downscaling” regridding method. The left panel shows original NO2 columns from GOME-2 (upper) and OMI (lower), and the right panel shows 4-km CMAQ simulated NO2 columns. The middle panel shows the downscaled NO2 columns. Domain-averaged values are shown at the bottom of each plot. Only cells valid in both the satellite and CMAQ data are used for the average.

Figure 2.6 Comparison of MODIS AOD data spatial processing between the traditional pixel aggregation (left column) and conservative remapping (right column) methods, onto 108-km, 36-km, 12-km, 4-km and 1-km target domains.

Figure 2.7 Examples of model-site comparison methods: (a) “On-the-cell”, (b) “Linear-interpolation”, and (c) “fractional weighting”.

Figure 2.8 Examples of grid masking routine. Level 1 masking (state level) for Texas and Maryland (left), and level 0 masking (country level) for China, S. Korea, and Japan.

Figure 2.9 30-m NLCD land use land cover data set near Houston (left), and an example pixel distribution from a 4-km grid cell.

Figure 2.10 Examples of “inside polygon” determination. Shaded areas are county/FIPS boundaries for (a) Sacramento County, CA (06067), (b) Hancock County, ME (23009), (c) Roanoke County, VA (51161), and (d) Roanoke City, VA (51770). Blue crosses indicate AIRNow site locations outside the given county/FIPS boundary and red crosses indicate site locations inside the boundary.

Figure 2.11 Example of USGS LULC data processing for urban fraction (left) and water fraction (right).

Figure 3.1 Examples of satellite data processing in various domain definitions using the conservative regridding method.

Figure 3.2 An example of LULC data processing using the IGDP tool. The USGS Land Use/Land Cover Scheme, International Geosphere Biosphere Programme (IGBP), Simple Biosphere 2 Model, Vegetation Lifeforms, Biosphere Atmosphere Transfer Scheme, Simple Biosphere Model Scheme, and Global Ecosystems data sets are shown from the upper left.

Figure 4.1 SST diurnal variations inside Galveston Bay and the outer ocean.

Figure 4.2 WRF domains used for model simulations in three different spatial resolutions: 36-km (NA36), 12-km (SUS12) and 4-km (TX04).

Figure 4.3 Time series plot of 2-m temperature comparison between CMAQ (red) and MADIS surface observations (circles).

Figure 4.4 Same as Figure 4.3 except for 10-m wind speed.

Figure 4.5 Spatial distributions of 2-m temperature and 10-m wind speed. Plots are snapshots at 00 UTC on June 7, 2006.

Figure 4.6 Difference plots of SST (simulation with NCEP SST versus simulation with IGDP-updated SST) at 11 UTC and 23 UTC on June 6, 2006.

Figure 4.7 Difference plots (simulation with NCEP SST versus simulation with IGDP SST) of 2-m temperature (left) and 10-m wind speed (right) at 14 UTC on June 6, 2006.

Figure 7.1 A schematic plot for “run_no2.pro”.

Figure 7.2 Spatial plots of processed CMAQ and GOME-2 NO2 column data.


List of tables

Table 2.1 Comparison of MODIS AOD spatial regridding methods with the pixel aggregation method and conservative remapping (CR) method. Percentage of valid (available) pixels and processing time are shown.

Table 4.1 Model configurations used in this study.


1. Introduction

1.1 Problem statement

Fast and accurate handling of Geographic Information System (GIS) data and satellite data is essential in regional meteorological and air quality modeling and for data analysis. Accurate land use information is particularly important in meteorological simulations for land surface exchanges. It is also crucial in air quality simulations for spatial allocation of emission sources. There is increasing demand for geospatial data processing tools as finer resolution air quality simulations become more commonplace.

Modelers at the Texas Commission on Environmental Quality (TCEQ) are contemplating the use of fine resolution land use land cover (LULC) data and satellite-based remote sensing data for meteorological and air quality simulations. In previous TCEQ projects, Byun et al. (2007; 2008) incorporated a high resolution LULC dataset from the University of Texas Center for Space Research (UTCSR) and a Texas Forest Service (TFS) dataset (Cheng and Byun, 2008) into the Fifth-Generation Penn State/NCAR Mesoscale Model (MM5; Grell et al., 1994). A similar approach using fine resolution LULC data (30-m Texas LULC generated by Texas A&M) has also been applied to the Weather Research and Forecasting (WRF) model (Skamarock and Klemp, 2008), the successor of MM5. Satellite data are also an important source of model inputs as well as a basis for model performance evaluation. Precedents for such applications are evident in several previous TCEQ projects, e.g., implementations of Sea Surface Temperature (SST) (Byun et al., 2007) and soil moisture (Byun et al., 2011).

Many of these previous approaches utilizing finer resolution data were successful. However, there was some concern regarding the customary practice of geospatial data processing used in these approaches: each had focused on developing a project-specific data processing tool, and there was no integrated effort to handle a variety of GIS and/or satellite data sets in a unified framework. Had such a framework existed, it could have facilitated speedy incorporation of new data formats. As more and more fine resolution data have become available, existing data processing tools have proven too slow or too memory-restrictive to deal with them. For example, the current National Land Cover Database (NLCD) provides 30-m LULC data covering the Contiguous United States (CONUS) (161190 x 104424, roughly 16 billion pixels) and requires a large computer with long run times and large memory to process. It is usually impossible to load the entire national data set into computer memory, so random access and extraction of local subsets are fundamental capabilities of a successful data processing tool. GIS road network data in vectorized polyline format is another example: it is one of the surrogates used in vehicular emission processing, which demands high spatial precision and memory usage, and it usually contains more than a million entities, each including hundreds to thousands of polylines. Efficient tools to handle these data are therefore critically important.


1.2 Project Objectives

This project investigates basic computational algorithms to handle GIS data and satellite data. It aims to develop a set of generalized libraries within a geospatial data processing framework to process geospatial data more efficiently and accurately. We utilize the Interactive Data Language (IDL) by Exelis Visual Information Solutions to build the geospatial processing libraries. An IDL-based Geospatial Data Processor (IGDP) was created by the Air Resources Laboratory of the National Oceanic and Atmospheric Administration (ARL/NOAA). It can process vector (e.g., ESRI shapefile (.SHP)) and raster (e.g., Geo Tagged Image File Format (GeoTIFF) and ERDAS IMAGINE (.IMG)) GIS data for any given domain. Processing speed is improved through selective use of polygon clipping routines and other algorithms optimized for each particular application. The raster tool utilizes a histogram reverse-indexing method that enables easy access to grouped pixels, and it generates statistics of pixel values within each grid cell with improved speed and enhanced control of memory usage. The spatial allocation tool, which uses the polygon clipping algorithms, requires substantial computational power to calculate fractional weightings between GIS polygons (and/or polylines) and gridded cells. An efficient polygon/polyline clipping algorithm is needed to overcome the speed and accuracy deficiencies of conventional algorithms. One key to achieving faster spatial allocation is to optimize the computational iterations in both the polygon clipping and the map projection calculations.

The project has the following specific objectives:

1. To conduct a literature search for summarizing and comparing available GIS data processing algorithms. Advantages and constraints from each algorithm will be described.

2. To develop an optimized geospatial data processing tool that can handle raster data format (e.g. pixels) and vector data format (polylines and polygons) for any given target domain more efficiently and accurately.

3. To collect and to process sample GIS and satellite data. Applications will include a spatial regridding method on emissions and satellite data, such as the Moderate Resolution Imaging Spectroradiometer (MODIS) Aerosol Optical Depth (AOD), the Ozone Monitoring Instrument (OMI), and the Global Ozone Monitoring Experiment (GOME)-2 NO2 column data.

4. To perform an engineering test with the tool to process fine resolution LULC data.

5. To draft a final report that documents all work performed in support of the project.

1.3 Data analysis, interpretation and management

In addition to the development of a geospatial data processing tool, three types of data are collected and archived for the tool's operational tests and performance evaluations: (1) Various GIS data, in both vector and raster formats, are collected and utilized. GIS shapefiles for population, census tracts, road networks, railroads, etc., are used to evaluate the polygon and polyline clipping capability, and various fine resolution land use land cover data sets are used to test the tool's raster data handling capability. (2) Various satellite data, such as MODIS AOD, OMI/GOME-2 NO2 column data, and geostationary Sea Surface Temperature (SST) data, are collected and utilized to investigate the spatial regridding capability of the newly developed geospatial data tool. (3) A short-term engineering model run and its outputs are produced and archived. This engineering run evaluates the basic performance of the geospatial data processing tool and generates an example of the model's input data. It is not intended to be a best-effort simulation with scientific meaning, but rather an evaluation, using several reasonable methods, of the overall quality of the model inputs and a basis for developing future project topics. Simulation data are evaluated by comparison with a base run (i.e., with default model inputs) and with observational data.


2. Methodology: computational algorithms

Appropriate handling of GIS data is crucial in air quality simulations, especially in the preparation of emissions data. However, tools for GIS data have scarcely been developed within the air quality scientific community. In most cases, current solutions for GIS data processing rely on PC-based ArcGIS tools and/or the U.S. EPA Spatial Allocator. These tools have clear limitations for seamless GIS data processing: multi-platform compatibility, flexibility across data formats, and, most importantly, processing speed for fine resolution data. To overcome these limitations, we have designed a generalized GIS data processing library and package to process GIS, emissions, and satellite data.

Most GIS data fall into three types: points (pixels), lines (polylines), and areas (polygons). Pixel data are usually called a raster dataset; each pixel has no effective area, and its value represents the value at the center point of the pixel's footprint. Polyline and polygon data have an effective length or area, and because a polygon is simply a closed polyline, similar routines can be used for both. To convert polygon or pixel data to a gridded format, we need to know how these irregular polygons overlap each grid cell and how many pixels fall inside each grid cell. A polygon clipping algorithm and a pixel grouping algorithm are therefore the key components of the new GIS data processing tool that we developed.

2.1 Vector data processing

2.1.1 Polygon clipping algorithms

Traditionally, polygon clipping algorithms have been used in computer graphics to clip out the portions of a polygon that lie outside the window of the output device, to prevent undesirable effects. More recently, advanced computer graphics applications use polygon clipping to render 3D images through hidden surface removal and to produce high-quality surface details using techniques such as beam tracing. It is also used to distribute the objects of a scene to appropriate processors in multiprocessor ray-tracing systems to improve rendering speeds.

Clipping an arbitrary polygon against an arbitrary polygon is a basic routine in computer graphics. It may be applied thousands of times. Therefore the efficiency of these routines is extremely important. To achieve good results, several polygon clipping algorithms have been developed, from simplified clipping algorithms that can only clip regular (e.g. convex) polygons, to more complicated algorithms that can handle more general polygons (e.g. concave or self-intersecting polygons). These algorithms have their own advantages and disadvantages in processing efficiency and flexibility. We describe two such algorithms:

(1) Sutherland-Hodgman algorithm

The Sutherland-Hodgman polygon clipping algorithm (Sutherland and Hodgman, 1974) was introduced in 1974. It works by extending each edge of the convex clip polygon in turn and selecting only the vertices of the subject polygon that lie on the visible side. The algorithm begins with an input list of all vertices in the subject polygon. One side of the clip polygon is extended infinitely in both directions, and the path of the subject polygon is traversed: vertices from the input list are inserted into an output list if they lie on the visible side of the extended clip polygon line, and new vertices are added to the output list where the subject polygon path crosses the extended clip polygon line. This process is repeated for each side of the clip polygon, using the output list from one stage as the input list for the next. Once all sides of the clip polygon have been processed, the final list of vertices defines a new, single polygon that is entirely visible. These steps are shown in Figure 2.1. The algorithm is very fast and efficient, but it applies only when the clip (target) polygon is convex. This makes it a very good candidate for regridding arbitrarily shaped polygons onto gridded cells, which are always convex polygons.

Figure 2.1 Steps of Sutherland-Hodgman polygon clipping algorithm (http://www.cs.helsinki.fi/group/goa/viewing/leikkaus/intro2.html)
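Conceptually, each target grid cell serves as the convex clip window, and the overlap area between a GIS polygon and the cell follows directly from the clipped vertex list. The IDL sketch below illustrates the idea for a single rectangular cell; the routine name, arguments, and keywords are illustrative only and are not the routines shipped with the IGDP library.

; Sutherland-Hodgman clipping of an arbitrary polygon against one
; rectangular (convex) grid cell; returns the clipped overlap area.
function sh_clip_area, sx, sy, xmin, xmax, ymin, ymax, xout=px, yout=py
  compile_opt idl2
  px = double(sx) & py = double(sy)
  ; each column holds [a, b, c] describing the half-plane a*x + b*y <= c
  edges = [[1d, 0d, xmax], [-1d, 0d, -xmin], [0d, 1d, ymax], [0d, -1d, -ymin]]
  for e = 0, 3 do begin
    if n_elements(px) lt 3 then return, 0d            ; polygon fully clipped away
    a = edges[0, e] & b = edges[1, e] & c = edges[2, e]
    ox = list() & oy = list()
    for i = 0, n_elements(px) - 1 do begin
      j = (i + 1) mod n_elements(px)                  ; next vertex, wrapping around
      in_i = (a * px[i] + b * py[i]) le c
      in_j = (a * px[j] + b * py[j]) le c
      if in_i then begin
        ox->add, px[i] & oy->add, py[i]
      endif
      if in_i ne in_j then begin                      ; edge crosses the clip line
        t = (c - a * px[i] - b * py[i]) / (a * (px[j] - px[i]) + b * (py[j] - py[i]))
        ox->add, px[i] + t * (px[j] - px[i])
        oy->add, py[i] + t * (py[j] - py[i])
      endif
    endfor
    px = ox->toarray() & py = oy->toarray()
  endfor
  if n_elements(px) lt 3 then return, 0d
  k = (lindgen(n_elements(px)) + 1) mod n_elements(px)
  return, 0.5d * abs(total(px * py[k] - px[k] * py))  ; shoelace formula
end

Dividing the returned overlap area by the cell area gives the fractional weight used later in the conservative remapping equations.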

(2) Vatti clipping algorithm

The Vatti clipping algorithm (Vatti, 1992) is a more complicated and generalized polygon clipping algorithm. It allows clipping of any number of arbitrarily shaped subject polygons by any number of arbitrarily shaped clip polygons. Unlike the Sutherland-Hodgman algorithm, the Vatti algorithm does not restrict the types of polygons that can be used as subjects or clips: even complex (e.g., self-intersecting) polygons and polygons with holes can be processed. The algorithm supports the standard Boolean clipping operations: intersection, difference, union, and exclusive-or. It is generally applicable only in 2D space.

Compared to the Sutherland-Hodgman algorithm, the Vatti clipping algorithm is a more complete polygon clipping algorithm with more functionality, but its features sometimes go beyond our scope and incur an unavoidable loss of processing efficiency. Therefore, in developing the IGDP, we utilized both polygon clipping algorithms, the simple Sutherland-Hodgman and the more general Vatti algorithm, choosing between them based on the required features and the processing time of the GIS data.


2.1.2 Implementation of polygon clipping algorithms in IDL

Based on the literature review of basic polygon clipping algorithms, we decided that the Sutherland-Hodgman polygon clipping algorithm could serve as the default option for various GIS polygon data handling routines. We first implemented this algorithm in the Interactive Data Language (IDL) and tested it on various example polygons; these steps are shown in Figure 2.2. As expected, the IDL implementation of the Sutherland-Hodgman algorithm was successful and showed fast, efficient performance.

In addition to our own implementation, we further utilized the “polyfillaa” and “polyclip” IDL routines developed by Dr. Smith at the Ritter Astrophysical Research Center, University of Toledo, as part of the Cube Builder for IRS Spectra Maps (CUBISM) (Smith et al., 2007), which are based on the Sutherland-Hodgman algorithm. The advantage of this library is that its core polygon clipping routine is a self-compiling Dynamically Loadable Module (DLM) written in C. It performed polygon clipping about 50 times faster than the pure-IDL version of the Sutherland-Hodgman algorithm and worked as a normal IDL routine without any problem. We expect that Dr. Smith's polyfillaa library provides a complete solution with the numerical efficiency required for operational use within a geospatial data processing framework.

Figure 2.2 Implementation of the Sutherland-Hodgman polygon clipping algorithm in IDL.


2.1.3 Polyline clipping algorithm

A polyline clipping algorithm, which converts polyline data (e.g., road networks) onto modeling grids, is also implemented in IDL. We originally developed it as a two-dimensional horizontal algorithm, but soon extended it to a multi-dimensional version that can handle four-dimensional trajectory data (longitude, latitude, altitude, and time; e.g., trajectory output or aircraft emission data). The output of this polyline clipping algorithm is five-dimensional: it returns indices of location (three dimensions), time, and fraction (or density). A simplified Sutherland-Hodgman algorithm is used to cut each polyline where it crosses the boundaries of each grid cell within the model domain, and the clipped pieces are then reconstructed using their cell indices. This routine is designed for road and railroad network data and is used to generate emission surrogate files for those data. We also used this capability to build a trajectory/3-D Chemical Transport Model (CTM) combination tool, using Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) simulation output together with CMAQ model output and/or Sparse Matrix Operator Kernel Emissions (SMOKE) emission data, to investigate the chemical history of each parcel along a trajectory.
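As a rough illustration of the two-dimensional segment-splitting step, the sketch below clips a polyline that has already been transformed into grid coordinates in which cell (i, j) spans [i, i+1) x [j, j+1), and returns the cell indices and in-cell lengths of the resulting pieces. The function name and the grid-coordinate convention are assumptions made for this example and not the IGDP interface; the delivered routines additionally carry the altitude and time dimensions described above.

; split each polyline segment at cell boundaries and accumulate
; the length that falls into every grid cell it crosses
function polyline_to_grid, gx, gy
  compile_opt idl2
  ci = list() & cj = list() & clen = list()
  for s = 0, n_elements(gx) - 2 do begin
    x0 = gx[s] & y0 = gy[s] & x1 = gx[s+1] & y1 = gy[s+1]
    t = [0d, 1d]                                    ; parametric break points
    for k = long(ceil(min([x0, x1]))), long(floor(max([x0, x1]))) do $
      if x1 ne x0 then t = [t, (k - x0) / (x1 - x0)]
    for k = long(ceil(min([y0, y1]))), long(floor(max([y0, y1]))) do $
      if y1 ne y0 then t = [t, (k - y0) / (y1 - y0)]
    t = t[sort(t)]
    t = t[where(t ge 0d and t le 1d)]
    for p = 0, n_elements(t) - 2 do begin
      len = (t[p+1] - t[p]) * sqrt((x1 - x0)^2 + (y1 - y0)^2)
      if len le 0d then continue                    ; skip zero-length pieces
      tm = 0.5d * (t[p] + t[p+1])                   ; midpoint locates the cell
      ci->add, long(floor(x0 + tm * (x1 - x0)))
      cj->add, long(floor(y0 + tm * (y1 - y0)))
      clen->add, len
    endfor
  endfor
  return, {i: ci->toarray(), j: cj->toarray(), len: clen->toarray()}
end

Normalizing the per-cell lengths by the total length of each entity yields the fractional weights used in the emission surrogate generation described above.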

2.1.4 Application of vector processing tool

In this section, we introduce applications of the geospatial data processing capability based on the polygon clipping algorithms described in the previous section, focusing mostly on spatial regridding.

2.1.4.1 Spatial regridding

Regridding of model output or satellite data with different map projection settings is very important for inter-comparison of model results and satellite products. Spatial regridding is a commonly performed procedure in satellite data processing: it converts a data set between different map projections and resolutions. Among the numerous spatial regridding methods, interpolation and pixel aggregation are two of the most common. Interpolation is preferred when the target domain resolution is finer than the raw data pixels; pixel aggregation, which averages all the pixels inside each domain cell, is preferred when the grid cell size is larger than the raw data pixel size.

Despite their popularity, more mathematically complete methods for spatial regridding are needed, especially when dealing with fine resolution data and/or when conservation of the measured quantity is required. A case in point is the processing of emission data, which requires great care in spatial data handling because mass conservation is strictly enforced. EPA's Spatial Allocator, used in SMOKE emission processing, is one example of handling emission data with strict mass conservation; it requires the calculation of the fractional areas of overlapping polygons between raw data pixels and modeling grid cells. Simple interpolation might generate an approximate result, but it fails to conserve mass. For more accurate remapping of spatial data, we need to know the exact fractions between the initial data cells and the target grid cells.


In order to build a mass conservative spatial regridding tool, we have utilized the polygon clipping algorithms and developed a tool that performs accurate spatial regridding of satellite data. In the next two sections, we introduce the two core algorithms of the regridding tool: the "conservative remapping" and "conservative downscaling" methods.

2.1.4.2 Conservative remapping

The spatial allocator (i.e., vector) tool of the IGDP can provide exact fractions using the polygon clipping algorithm, and this information can be used for lossless spatial regridding in the conservative remapping method. This method reconstructs raw data pixels (e.g., satellite data) onto target domain grid cells by calculating the fractional weighting of each overlapping portion between data pixels and domain grid cells. If the raw pixel data are in density units (e.g., concentration), the overlapping fraction is computed for each data pixel and grid cell, and the grid cell concentration is calculated as a weighted average of the data pixels and fractions:

f_{i,j} = \frac{\mathrm{Area}(P_i \cap C_j)}{\mathrm{Area}(C_j)}

C_j = \frac{\sum_i P_i \cdot f_{i,j}}{\sum_i f_{i,j}}

where i and j are the indices of the data pixels, P, and the grid cells, C; f_{i,j} is the overlapping fraction, and \sum_i f_{i,j} = 1 if no missing pixels are involved in grid cell C_j.
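A minimal sketch of this density-unit remapping step is given below, assuming the overlap fractions f(i,j) have already been produced by the polygon clipper and stored as parallel arrays (pixel index, cell index, fraction). The variable and routine names are illustrative, not the IGDP API.

; weighted-average remapping of pixel densities onto grid cells
function remap_density, pix_val, pix_id, cell_id, frac, ncell
  compile_opt idl2
  num = dblarr(ncell)                       ; accumulates P_i * f_ij
  den = dblarr(ncell)                       ; accumulates f_ij
  for k = 0L, n_elements(frac) - 1 do begin
    v = pix_val[pix_id[k]]
    if ~finite(v) then continue             ; skip missing satellite pixels
    num[cell_id[k]] = num[cell_id[k]] + v * frac[k]
    den[cell_id[k]] = den[cell_id[k]] + frac[k]
  endfor
  out = replicate(!values.d_nan, ncell)     ; cells with no coverage stay missing
  good = where(den gt 0d, ngood)
  if ngood gt 0 then out[good] = num[good] / den[good]
  return, out
end

For pixel data in mass units, the same accumulation applies with g(i,j) in place of f(i,j) and without the denominator, as in the next set of equations.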

If the satellite pixel data is in mass units, equations for the conservative remapping are slightly different. We need to calculate fractions of overlapped area to raw data pixel size, instead of grid cell size.

g_{i,j} = \frac{\mathrm{Area}(P_i \cap C_j)}{\mathrm{Area}(P_i)}

C_j = \sum_i P_i \cdot g_{i,j}

where g_{i,j} is the fraction of the overlapped area relative to the data pixel area.

Figure 2.3 shows an example of the conservative remapping method. The raw data are 4-km MODIS chlorophyll concentrations, and the horizontal grid resolution of the target domain is 4 km over Houston, Texas. Although they have the same nominal resolution, their map projections are different, and the conservative remapping method projects the satellite data onto a model-ready grid.

This conservative remapping method is similar in approach to the variable-pixel linear reconstruction algorithm (informally, the "drizzle" algorithm) used in the astronomy community, which was developed to preserve photometry and resolution for the Hubble Space Telescope. Fruchter and Hook (2002) described the method and implementation of the drizzle algorithm and discussed its photometric and astrometric accuracy and image fidelity, as well as the noise characteristics of the output images.


Figure 2.3 Example of the conservative remapping method. The raw data is from 4-km MODIS Chlorophyll concentrations, and the horizontal grid resolution of the target domain is 4-km in Houston, Texas.

2.1.4.3 Conservative downscaling

The "conservative downscaling" method is an advanced form of conservative remapping. While conservative remapping simply redistributes data spatially, conservative downscaling reconstructs the spatial structure of the data using additional information. "Downscaling" is a common concept in meteorological simulation, especially when global circulation models are used to initialize and provide boundary conditions for regional models. We use a similar concept to describe the downscaling method in data processing: we apply the method to spatial regridding by supplying detailed information in addition to that already contained in the original coarse resolution data. It differs from a simple increase in resolution (e.g., simple interpolation) in that the raw data within each coarse pixel are restructured by a set of rules, analogous to a regional meteorological model downscaling the global meteorology with its own set of physical balances.

Figure 2.4 shows an example of the downscaling method, using NO2 column data from GOME-2 and OMI. Recently, remotely measured NO2 column data from satellites have become popular for monitoring changes and trends in surface NOx emissions. We utilized GOME-2 and OMI NO2 column data from the Tropospheric Emission Monitoring Internet Service (TEMIS) data center (http://www.temis.nl/). GOME-2 and OMI NO2 column data have different spatial resolutions and different overpass times; therefore, fair comparisons of these two data sets are not easy to conduct. Their spatial resolutions, GOME-2 at 80×40 km2 and OMI at 13×24 km2, are grossly different, so a simple regridding method is not the right tool for a fair comparison. If a normal regridding method were applied (even conservative remapping), the coarse resolution GOME-2 data would tend to underestimate columns in high concentration regions (e.g., urban areas or regions with dense road networks) and to overestimate them in suburban or rural areas when compared with the finer resolution OMI data or with in situ observations.

To overcome the constraint imposed by their resolutions and to compare them fairly, we applied additional information to enhance their spatial structure: a high resolution (4-km) CMAQ simulation. Figure 2.4a shows the raw data pixels from GOME-2 NO2 column measurements, usually at 80×40 km2 or coarser resolution, over the Southern California region on August 18, 2008. One can notice enhanced NO2 columns in the Los Angeles region, but with a mostly smoothed pattern. In contrast, the 4-km CMAQ simulated NO2 columns at 09:30 AM show a clear spatial distribution that correlates closely with NOx emission sources.

For each GOME-2 pixel, we retrieved the collocated CMAQ cells, as shown in Figure 2.4b. With the typical GOME-2 pixel size and CMAQ grid cell size, around 200 CMAQ cells fall within one GOME-2 data pixel. A spatial weighting kernel is constructed as the fractional distribution of the CMAQ NO2 values corresponding to that GOME-2 pixel, as shown in Figure 2.4c. The kernel is then applied to the GOME-2 data pixel to reconstruct its spatial distribution, as expressed in the following equation,

C_j = \frac{\sum_i P_i \cdot k_{i,j} \cdot f_{i,j}}{\sum_i f_{i,j}}

where k_{i,j} is the spatial weighting kernel of P_i at the position of C_j. The kernel k_{i,j} sums to 1 over each pixel P_i, so the average of each reconstructed GOME-2 pixel is identical to its original value. With all GOME-2 pixels reconstructed, we performed the conservative remapping again to build the complete distribution over the target domain. Figure 2.4d shows the final output, with more detailed structure.
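A minimal sketch of the per-pixel step is shown below: one coarse pixel value is redistributed over its collocated fine cells so that the fraction-weighted mean over the pixel remains equal to the original pixel value, which is the constraint stated above. The inputs (the collocated CMAQ columns and the overlap fractions from the clipper) and the routine name are illustrative.

; redistribute one coarse pixel using the CMAQ-based weighting kernel
function downscale_pixel, p_val, cmaq_sub, f_sub
  compile_opt idl2
  wmean  = total(cmaq_sub * f_sub) / total(f_sub)   ; fraction-weighted CMAQ mean
  kernel = cmaq_sub / wmean                         ; fine-scale shape with weighted mean of 1
  return, p_val * kernel                            ; reconstructed fine-scale values
end

The reconstructed values from all pixels are then passed through the conservative remapping step to assemble the full target-domain field while preserving the pixel averages.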

The most important point in implementing the downscaling procedure is the sequence of downscaling and regridding. In our implementation, we applied the spatial weighting kernel first, for each data pixel, and then performed the regridding to construct the full spatial distribution. If regridding were conducted first and downscaling afterwards, the quantities in the original data pixels could be dithered out across the whole domain. By downscaling each data pixel with its own weighting kernel, the quantity in the original pixel never propagates outside that pixel's coverage; the weighting kernels only add fine structure, and the general spatial patterns are determined solely by the original pixel distributions. Although the procedure was expected to be computationally expensive at an early stage of the implementation, we greatly improved its performance by applying efficient polygon clipping algorithms, optimized map projection routines, and well-structured computational flows.


Figure 2.5 shows examples of downscaled GOME-2 and OMI NO2 columns using a CMAQ NO2 column simulation as the spatial weighting kernel. GOME-2 NO2 columns and the CMAQ 09:30 AM NO2 columns are shown in the upper panels, and OMI and the CMAQ 13:30 PM NO2 columns are shown in the lower panels. One can clearly notice fine structures: Los Angeles and other cities, such as San Diego, Santa Clara, Bakersfield, and Mexicali, and also the I-10 freeway traffic. It is notable that the left and middle columns show the same data, yet the downscaled plots (middle) appear to show much higher concentrations, especially for GOME-2. The two representations should have identical domain means if there are no missing or overlapping pixels, and indeed the domain averages differ by less than 1%. This implies that, with the traditional spatial data processing approach, the NO2 column values in urban regions were seriously misrepresented, which could eventually lead to a misinterpretation of NOx emissions in those regions. We have confirmed that, under ideal conditions with no missing or overlapping data pixels, the domain average of the downscaled data is identical to that of the original data.

This method is not fully mature yet. However, with further sensitivity testing and statistical analysis, it could provide important information, such as long-term emission trends, by monitoring fine-scale signals of NOx emissions from satellite data. This is a potentially large advantage of advanced spatial regridding techniques.

Figure 2.4 A schematic of “Downscaling” method, using fine resolution weighting kernel (a) GOME-2 NO2 column raw pixels, (b) CMAQ NO2 column, (c) spatial weighting kernel from CMAQ, and (d) Downscaled GOME-2 NO2 column.


Figure 2.5 Examples of the “Downscaling” regridding method. The left panel shows original NO2 columns from GOME-2 (upper) and OMI (lower), and the right panel shows 4-km CMAQ simulated NO2 columns. The middle panel shows the downscaled NO2 columns. Domain-averaged values are shown at the bottom of each plot. Only cells valid in both the satellite and CMAQ data are used for the average.

2.1.4.4 Discussion on regridding methods

We further investigated the advantages of the spatial regridding methods introduced in the previous sections by applying them to the processing of Moderate Resolution Imaging Spectroradiometer (MODIS) Aerosol Optical Depth (AOD) data. Two spatial regridding methods, the traditional pixel aggregation method and the newly developed conservative remapping method, are compared for various domain grid resolutions (108 km, 36 km, 12 km, 4 km, and 1 km) over the whole CONUS domain. Figure 2.6 shows the spatial distributions of regridded MODIS Level 2 AOD data using the pixel aggregation method (left) and the conservative remapping method (right). At coarse resolutions (108 km and 36 km), there are no obvious differences between the two methods; as shown in Table 2.1, the percentages of valid pixels are similar, and the pixel method is faster. However, for finer target domains, with resolution similar to or finer than the MODIS AOD resolution (~10 km), the pixel method fails to cover many domain grid cells and leaves them unfilled. In the extreme case of a 1-km CONUS domain (about 16.6 million grid cells), the pixel method fails almost completely, providing less than 1% valid data, while the conservative remapping method, using fractional weighting, still achieves coverage similar to the coarse resolution cases. This implies that the conservative remapping method is applicable at any resolution and is crucial for fine resolution spatial data processing for model validation and data assimilation.

         108 km (50x30)   36 km (147x89)   12 km (442x265)   4 km (1199x799)   1 km (5200x3200)
Pixel    76% / 1s         60% / 2s         37% / 4s          7% / 7s           0.4% / 51s
CR       79% / 13s        66% / 14s        57% / 19s         55% / 55s         50% / 607s

Table 2.1 Comparison of MODIS AOD spatial regridding methods with the pixel aggregation method and conservative remapping (CR) method. Percentage of valid (available) pixels and processing time are shown.


Figure 2.6 Comparison of MODIS AOD data spatial processing between the traditional pixel aggregation (left column) and conservative remapping (right column) methods, onto 108-km, 36-km, 12-km, 4-km and 1-km target domains.


2.1.4.5 Fractional weighting

The polygon clipping and overlap fraction calculation capabilities of the IGDP tool can also be used in comparison studies between coarse resolution model outputs and on-site observations. There are several ways to conduct such a comparison: one can compare the site observation with the data of the model cell in which the site is located ("on-the-cell"), or use linear or bilinear interpolation to estimate the value at the site location ("linear interpolation"). A third approach is based on a more mathematically solid and mass conservative methodology: the so-called "fractional weighting" method uses the fractional areas of the grid cells overlapping an area of interest of a given radius around the site. As shown in Figure 2.7c, the model value is estimated as an average using the fractional weightings provided by the polygon clipping capability built into the IGDP tool.
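In IDL, once the clipper has returned the overlap fractions between the site's circle of interest and the model grid cells, the estimate reduces to a one-line weighted average; the variable names below (cell_val, frac) are illustrative.

; fraction-weighted mean of the model cells overlapping the site's area of interest
site_estimate = total(cell_val * frac) / total(frac)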

Figure 2.7 Examples of model-site comparison methods: (a) “On-the-cell”, (b) “Linear-interpolation”, and (c) “fractional weighting”.

2.1.4.6 Grid masking

Figure 2.8 shows examples of the grid masking routine using the Global Administrative Areas (GADM) boundary data. GADM provides global administrative boundaries at three levels (level 0: country; level 1: state/province; level 2: county), so any gridded model output can be masked to the administrative boundaries of interest.


Figure 2.8 Examples of grid masking routine. Level 1 masking (state level) for Texas and Maryland (left), and level 0 masking (country level) for China, S. Korea, and Japan.

2.2 Raster data processing

The raster data processing tool uses the histogram reverse-indexing method of the IDL HISTOGRAM function and provides fast access to grouped pixels. For each grid cell, the raster tool provides a histogram and statistics of the pixels inside the cell. Figure 2.9 shows an example of 30-m NLCD LULC data in a 4-km grid cell near the Houston region. Typically, around 18,000 pixels fall within a 4-km grid cell, and the histogram in the right panel shows the pixel count distribution of LULC types in a single grid cell.

The key feature of the raster data handling routine is an efficient indexing process that groups each raster data pixel into the target domain grid cells with minimal use of machine memory. Direct access to partial data from the raw raster files is also an important capability, since reading the entire raw data set into memory should be avoided.
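The sketch below shows the core of this approach, assuming each raster pixel has already been assigned the index of its target grid cell (0 to ncell-1); HISTOGRAM's REVERSE_INDICES keyword then gives direct access to all pixels of any cell without sorting or repeated WHERE calls. The procedure name and the printed statistics are illustrative.

; per-cell pixel statistics via histogram reverse indexing
pro cell_pixel_stats, pix, cell_id, ncell
  compile_opt idl2
  h = histogram(cell_id, min=0, max=ncell-1, binsize=1, reverse_indices=ri)
  for c = 0L, ncell - 1 do begin
    if h[c] eq 0 then continue              ; no pixels fall in this grid cell
    idx  = ri[ri[c]:ri[c+1]-1]              ; indices of the pixels in cell c
    vals = pix[idx]
    print, c, h[c], min(vals), max(vals), mean(vals)
  endfor
end

For categorical data such as NLCD classes, a second HISTOGRAM over vals yields the per-cell class distribution (and hence class fractions or the dominant type) shown in Figure 2.9.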


Figure 2.9 30-m NLCD land use land cover data set near Houston (left), and an example pixel distribution from a 4-km grid cell.

In addition to the basic raster data handling capability based on the IDL histogram reverse-indexing, we extended the pixel statistics capability to arbitrarily shaped polygons. One of the key features of this routine is determining whether a pixel lies inside or outside a given polygon. Figure 2.10 shows examples of the "inside polygon" determination routine. Shaded areas are counties or cities assigned a unique Federal Information Processing Standards (FIPS) code. We used the locations of EPA AIRNow sites to test the routine; red and blue crosses indicate site locations inside and outside the given FIPS boundary, respectively. A few technical notes (see the code sketch following Figure 2.10):

- The IDLanROI::ContainsPoints method from the IDL Region of Interest class is used to determine whether each point is located within a closed polygon region.

- A boundary may consist of a single closed polygon (Figure 2.10a) or multiple polygons (Figure 2.10b). In the multi-polygon case, a point is determined to be interior if it is inside any one of the polygons.

- If the boundary has a hole, a point inside the hole is considered exterior. A hole polygon is recognized by its rotation direction (clockwise or counter-clockwise). In the case of Figures 2.10c and 2.10d, the point in the center belongs to Roanoke City (FIPS = 51770), not to Roanoke County (FIPS = 51161), showing that our routine performs as expected.

Figure 2.10 Examples of “inside polygon” determination. Shaded areas are county/FIPS boundaries for (a) Sacramento County, CA (06067), (b) Hancock County, ME (23009), (c) Roanoke County, VA (51161), and (d) Roanoke City, VA (51770). Blue crosses indicate AIRNow site locations outside the given county/FIPS boundary and red crosses indicate site locations inside the boundary.
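A minimal sketch of the inside/outside test for one boundary ring is shown below; the boundary vertices and site coordinates are placeholders, and multi-part boundaries and holes follow the rules listed above (each ring tested separately, with hole rings flagging points as exterior).

; point-in-polygon test for one boundary ring using IDLanROI
function sites_in_polygon, bx, by, site_x, site_y
  compile_opt idl2
  roi  = obj_new('IDLanROI', bx, by)
  ; ContainsPoints returns 0 (exterior), 1 (interior), 2 (on edge), 3 (on vertex)
  flag = roi->ContainsPoints(site_x, site_y)
  obj_destroy, roi
  return, flag ge 1
end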


To investigate the basic computational algorithms and to test the performance of the raster data processing, we collected raster data sets. We downloaded the Global Land Cover Characterization (GLCC) Version 1.2 data sets from the U.S. Geological Survey (USGS) (http://edc2.usgs.gov/glcc/glcc.php), which include the Global Ecosystems, International Geosphere Biosphere Programme (IGBP), Biosphere Atmosphere Transfer Scheme, Simple Biosphere Model Scheme, Simple Biosphere 2 Model, and USGS Land Use/Land Cover Scheme data sets with global 1-km resolution coverage. The algorithms for raster data processing are rather straightforward compared with the vector processing algorithms that use complicated polygon clipping; however, optimization of the raw data access methods and pixel indexing is required to handle huge raster data sets efficiently. We built partial data access routines for several GIS raster formats, such as Geo Tagged Image File Format (GeoTIFF) and ERDAS IMAGINE (.IMG), to avoid unnecessary access to the whole data set, which usually causes memory problems. We also utilized the histogram reverse-indexing method of the IDL HISTOGRAM routine, which enables easy access to grouped pixels for given indices (e.g., target domain cell indices); a detailed description of IDL histogram reverse indexing can be found at http://www.idlcoyote.com/tips/histogram_tutorial.html. The tool also generates statistics of the pixel values within each grid cell with improved speed and enhanced control of memory usage. Figure 2.11 shows examples of USGS LULC data processing for urban fraction and water fraction.

Figure 2.11 Example of USGS LULC data processing for urban fraction (left) and water fraction (right).


3. Application of IGDP

In this chapter, we describe examples of various geospatial data sets processed using the IGDP tools, focusing on data processing and visualization examples. Detailed discussion of the scientific features of these examples is beyond the scope of this project and could be developed into future projects with specific scientific scopes.

3.1 Satellite data processing

The IGDP tool can be used for various satellite data processing tasks for visualization and model comparison. As discussed in the previous sections, the conservative spatial regridding method can convert swath-based satellite data onto various model grids. Figure 3.1 shows examples of satellite data processing in various domain settings for GOME-2 NO2 column density (upper), OMI NO2 column density (middle), and GOME-2 formaldehyde column density (lower). The left column shows examples of regridded data at global 1-degree resolution, and the middle and right columns show the 12-km CONUS domain and the 4-km Houston domain, respectively.

Figure 3.1 Examples of satellite data processing in various domain definitions using the conservative regridding method.


3.2 Land use land cover data

Efficient processing of land use land cover (LULC) data is also important for meteorological and chemical modeling. Figure 3.2 shows examples of LULC data processing over the CONUS domain using the Global Land Cover Characterization (GLCC) LULC data sets and the IGDP tool. The GLCC data set (http://edc2.usgs.gov/glcc/glcc.php), developed by the U.S. Geological Survey's (USGS) National Center for Earth Resources Observation and Science (EROS), the University of Nebraska-Lincoln (UNL), and the Joint Research Centre of the European Commission, is a 1-km resolution global land cover characteristics database for use in a wide range of environmental research and modeling applications (Loveland et al., 2000). The land cover characterization effort is part of the National Aeronautics and Space Administration (NASA) Earth Observing System Pathfinder Program and the International Geosphere-Biosphere Programme Data and Information System focus 1 activity, funded by the USGS, NASA, the U.S. Environmental Protection Agency, the National Oceanic and Atmospheric Administration, the U.S. Forest Service, and the United Nations Environment Programme. Figure 3.2 shows examples of dominant LULC types for the USGS Land Use/Land Cover Scheme, International Geosphere Biosphere Programme (IGBP), Simple Biosphere 2 Model, Vegetation Lifeforms, Biosphere Atmosphere Transfer Scheme, Simple Biosphere Model Scheme, and Global Ecosystems data sets.

Figure 3.2 An Example of LULC data processing using IGDP tool. USGS Land Use/Land Cover Scheme, International Geosphere Biosphere Programme (IGBP), Simple Biosphere 2 Model, Vegetation Lifeforms, Biosphere Atmosphere Transfer Scheme, Simple Biosphere Model Scheme, and Global Ecosystems data sets are shown from the upper left.
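A minimal sketch of how a dominant type could be selected for one grid cell is shown below; it is illustrative only, and "pix" is assumed to hold the integer LULC category codes of the raw 1-km pixels that fall inside the cell (e.g., gathered with the histogram reverse-indexing method described earlier).

; Count pixels per category code and pick the most frequent one.
h = histogram(pix, min=0)                 ; bin counts by category code
dominant_type = (where(h eq max(h)))[0]   ; most frequent category (ties: lowest code)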


4. Test run with WRF

In order to demonstrate the IGDP tool's capability to improve input data for modeling studies, we performed an engineering run with the WRF model. The test comprises two Sea Surface Temperature (SST) configurations: a base case using the NCEP real-time global (RTG) 0.5-degree SST, and a test case in which the SST is updated with Geostationary Operational Environmental Satellite (GOES) SST data.

4.1 SST construction from geostationary satellite

Following the approach of Byun et al. (2007), we constructed the SST diurnal variation using

    SST(x, y, t_s, t_d) = a · sin( (2π/24) · (t_d − d) ) + m
    a = a_monthly_mean(x, y)
    d = d_monthly_mean(x, y)
    m = m_monthly_mean(x, y) + m_daily(t_s)

where a is the amplitude of the SST variation, d is the phase shift, m is the mean SST, t_d is the hour of the day, and t_s is the day number.

Figure 4.1 SST diurnal variations inside Galveston Bay and the outer ocean.
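A minimal sketch of evaluating the diurnal reconstruction above in IDL is shown below; the amplitude, phase, and mean fields ("amp", "phs", "sst_mean") are assumed to have been derived beforehand from the GOES data, and the function name is hypothetical.

; Evaluate the diurnal SST formula for a given hour of day "td"; amp, phs and
; sst_mean are 2-D fields, so the result is a 2-D SST field.
function sst_diurnal, amp, phs, sst_mean, td
  return, amp * sin(2.0 * !pi / 24.0 * (td - phs)) + sst_mean
end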


4.2 Model setting

An engineering test was conducted for a period during the Second Texas Air Quality Study in 2006. The configuration, including model domains, grid resolution, physics options, and nudging strategy, follows Lee et al. (2012), which was set up according to the WRF modeling protocol used in TCEQ's SIP modeling. The modeling domain structure consists of nested domains of different resolutions: a coarse domain (36-km cell size, named 'NA36') that covers the continental United States, a regional domain (12-km cell size, named 'SUS12') over Texas and surrounding areas, and a fine domain (4-km cell size, named 'TX04') covering Eastern Texas (Figure 4.2).

WRF version 3.4.1 was used for the engineering test for the period June 3–12, 2006. The initial and boundary conditions for WRF were taken from the National Centers for Environmental Prediction (NCEP) North American Mesoscale (NAM) 3-hourly analyses, while the SST was updated daily with the NCEP real-time global sea surface temperature analysis at 0.5-degree grid spacing. Physics options are listed in Table 4.1.

The satellite-derived SST was processed with the IGDP tool and used to replace the SST in the WRF input files. The WRF result with the updated SST was compared with the control case, in which the NCEP SST was used through the standard WPS processing. By using the IGDP tool, we did not need to re-run the WPS step or add extra code to WPS to incorporate the satellite SST data into the WRF initial and boundary files.
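A minimal, hypothetical sketch of this kind of in-place replacement is shown below; the procedure name and file argument are placeholders, and the 'SST' variable name follows common WRF input/lower-boundary file conventions but should be checked against the actual files.

; Overwrite the SST field of an existing WRF NetCDF input file in place.
pro replace_wrf_sst, wrf_file, new_sst
  id = ncdf_open(wrf_file, /write)     ; open the existing WRF file for update
  ncdf_varput, id, 'SST', new_sst      ; overwrite the SST field (dimensions must match)
  ncdf_close, id
end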

Figure 4.2 WRF domains used for model simulations in three different spatial resolutions: 36-km (NA36), 12-km (SUS12) and 4-km (TX04).


Table 4.1 Model configurations used in this study.

Domain name: NA36 | SUS12 | TX04
Resolution: 36 km | 12 km | 4 km
Domain coverage: Continental US | Texas & adjoined states | Eastern Texas
Horizontal grid: 162 x 128 | 174 x 138 | 216 x 288
Initialization: NAM + NCEP daily SST | Run in 2-way nesting | Nest-down of SUS12
Microphysics: WSM5 (a) / WSM6 (b)
Cloud scheme: KF (c) / None
Radiation scheme: RRTM (d) for longwave radiation; MM5 (Dudhia) (e) for shortwave radiation
PBL scheme: YSU (f) scheme
Land surface model: 5-layer slab model (g)
Nudging: 3D grid nudging (no nudging of mass fields within PBL)

(a) WRF Single-Moment 5-class (Hong et al., 2004).
(b) WRF Single-Moment 6-class (Hong and Lim, 2006).
(c) Kain and Fritsch scheme (Kain, 2004).
(d) Rapid Radiative Transfer Model scheme (Mlawer et al., 1997).
(e) Dudhia (1989).
(f) Yonsei University scheme (Hong et al., 2006).
(g) 5-layer soil temperature model (Grell et al., 1994).

4.3 Basic model evaluation

In order to confirm the WRF model's general performance, we first conducted a basic evaluation of the WRF engineering test outputs. We collected Meteorological Assimilation Data Ingest System (MADIS) surface temperature and wind speed observations and compared them with the WRF simulation outputs (Figures 4.3 and 4.4). In general, the simulated 2-m temperature and 10-m wind speed agree well with the MADIS observations. Figure 4.5 also shows spatial distribution comparisons for 2-m temperature and 10-m wind speed. Based on these basic evaluations, we conclude that the current simulation is adequate for the purposes of the engineering test of the IDL-based Geospatial Data Processor (IGDP) tool.
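A minimal sketch of the kind of summary statistics behind such a comparison is shown below; "model" and "obs" are assumed to be co-located, equally sized arrays of simulated values and MADIS observations for the same times and stations.

bias = mean(model - obs)               ; mean bias
rmse = sqrt(mean((model - obs)^2))     ; root-mean-square error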


Figure 4.3 Time series plot of the 2-m temperature comparison between the WRF simulation (red) and MADIS surface observations (circles).

Figure 4.4 Same as Figure 4.3, but for 10-m wind speed.


Figure 4.5 Spatial distributions of 2-m temperature and 10-m wind speed. Plots are snapshots at 00 UTC on June 7, 2006.

4.4 Test run

Figure 4.6 shows difference plots between the NCEP SST used in the control case and the updated SST processed by the IGDP tool, for the early morning and the late afternoon. In general, the IGDP-processed SST, which was derived from hourly satellite data, shows a much clearer diurnal variation than the NCEP SST, which is a daily averaged product interpolated to the WRF input intervals. Over the Gulf in the morning, the SST values in the simulation with the IGDP SST were lower than the NCEP SST, while in the afternoon, along the Texas coast, the IGDP-processed SST was higher than in the control case. The more frequent SST updates provided by the IGDP tool did not have a large direct impact on the 2-m temperature and 10-m wind. However, they did alter the precipitation patterns on a few occasions during the 10-day simulation period. Figure 4.7 shows an example.


Figure 4.6 Difference plots of SST (simulation with NCEP SST versus simulation with IGDP-updated SST) at 11 UTC and 23 UTC on June 6, 2006.


Figure 4.7 Difference plots (simulation with NCEP SST versus simulation with IGDP SST) of 2-m temperature (left) and 10-m wind speed (right) at 14 UTC on June 6, 2006.

4.5 Discussion

In this chapter, we performed a WRF simulation to demonstrate the IGDP tool's capability to modify and improve input data for modeling studies. The test run presents a case study using two different SST data sets: a base case using the traditional NCEP SST data (0.5-degree spatial resolution) and a test case using a modified SST data set derived from geostationary satellite observations. Both data sets were processed with the IGDP tool, improving their spatial and temporal resolution. Although a detailed analysis of the scientific implications of each run remains for future study, this test case clearly shows the advantage of the IGDP tool, which can provide enhanced information from additional data sources (e.g., geostationary satellite data in this test run) on top of the conventional processing of basic data sets for modeling studies.


5. Summary and conclusions

Fast and accurate handling of various geospatial data, such as GIS and satellite data, is important in regional meteorological and chemical simulations. Accurate land use and vegetation information is crucial in meteorological simulations, especially for land surface exchange, and is also important in air quality simulations in association with the accurate allocation of emission sources. Demand for geospatial data processing tools has been increasing as finer-resolution meteorology and air quality simulations become more commonplace.

This project investigated basic computational algorithms to handle Geographic Information System (GIS) data and satellite data, which are essential in regional meteorological and chemical modeling. It developed a set of generalized libraries within a geospatial data processing framework aimed at processing geospatial data more efficiently and accurately. The tool processes GIS data in both vector format (e.g., ESRI shapefiles) and raster format (e.g., GEOTIFF and IMG) for any given domain. Processing speeds are improved through selective use of polygon-clipping routines and other algorithms optimized for specific applications. The raster tool was developed using a histogram reverse-indexing method that enables easy access to grouped pixels; it generates statistics of pixel values within each grid cell with improved speed and enhanced control of memory usage. Spatial allocation tools that use polygon clipping require substantial computational power to calculate the fractional weighting between GIS polygons (and/or polylines) and gridded cells. To overcome the speed and accuracy issues, an efficient polygon/polyline clipping algorithm is crucial. A key to faster spatial allocation was to optimize the computational iterations in both the polygon clipping and the map projection calculations.

The project achieved the following specific objectives: (A) To develop an optimized geospatial data processing tool that can transform raster and vector data formats to any target domain within the data coverage more efficiently and accurately. (B) To collect and process sample GIS and satellite data so that they are readily deployable for modeling studies. Applications include a spatial regridding method for emissions and satellite data, such as the Moderate Resolution Imaging Spectroradiometer (MODIS) Aerosol Optical Depth (AOD), the Ozone Monitoring Instrument (OMI), and the Global Ozone Monitoring Experiment (GOME)-2 NO2 column data. (C) To perform an engineering test to demonstrate the IGDP tool's capability to modify and improve modeling input data.


6. References

Byun, D., Kim, S., Cheng, F., Kim, H., & Ngan, F. (2007). Improved Modeling Inputs: Land Use and Sea Surface Temperature (p. 33).

Byun, D., Ngan, F., Cheng, F., Kim, H., & Kim, S. (2008). Improvement of MM5 surface characteristics (p. 44).

Byun, D., Ngan, F., & Kim, H. (2011). Improvement of Meteorological Modeling by Accurate Prediction of Soil Moisture in the Weather Research and Forecasting (WRF) Model (p. 46).

Cheng, F.-Y., & Byun, D. W. (2008). Application of high resolution land use and land cover data for atmospheric modeling in the Houston–Galveston metropolitan area, Part I: Meteorological simulation results. Atmospheric Environment, 42(33), 7795–7811. doi:10.1016/j.atmosenv.2008.04.055

Fruchter, A. S., & Hook, R. N. (2002). Drizzle: A Method for the Linear Reconstruction of Undersampled Images. Publications of the Astronomical Society of the Pacific, 114(792), 144–152. doi:10.1086/338393

Grell, G., Dudhia, J., & Stauffer, D. (1994). A description of the fifth-generation Penn State/NCAR mesoscale model (MM5). NCAR Technical Note.

Skamarock, W. C., & Klemp, J. B. (2008). A time-split nonhydrostatic atmospheric model for weather research and forecasting applications. Journal of Computational Physics, 227(7), 3465–3485. doi:10.1016/j.jcp.2007.01.037

Smith, J. D. T., Armus, L., Dale, D. A., Roussel, H., Sheth, K., Buckalew, B. A., … Kennicutt, R. C., Jr. (2007). Spectral Mapping Reconstruction of Extended Sources. Publications of the Astronomical Society of the Pacific, 119(860), 1133–1144. doi:10.1086/522634

Sutherland, I. E., & Hodgman, G. W. (1974). Reentrant polygon clipping. Communications of the ACM, 17(1), 32–42. doi:10.1145/360767.360802

Vatti, B. R. (1992). A generic solution to polygon clipping. Communications of the ACM, 35(7), 56–63. doi:10.1145/129902.129906


7. Appendix

In this chapter, we introduce a sample package for spatial regridding of satellite NO2 data so that readers can understand the IGDP libraries and apply them in their own applications. The package also includes a visualization library, the Model Instant Report / IDL Enhanced (MIRIE) tool, to generate the example spatial plots.

7.1 User guide

The sample package is delivered as "igdp_sample.tgz" and should be uncompressed before running the example (the commands are shown below).

"run1.pro" simply provides file names (CMAQ CONC and MCIP METCRO3D) to the main routine, "run_no2.pro".

Figure 7.1 shows a schematic diagram for the "run_no2.pro" routine. It has several sub-routines:

(1) “aqm_report_info.pro” investigates basic information from CMAQ CONC files, including the domain setting (e.g., the “minfo” structure) and time-flags (e.g., the “tflag” array). These data are passed to the “rd_no2_gome2_temis.pro” routine.

(2) “aqm_tflag.pro” examines time-flags and determines the target days to read and plot.

(3) “rd_no2_cmaq.pro” reads CMAQ CONC and MCIP METCRO3D files and calculates NO2 column density data. If the /show keyword is set, it generates spatial plots.

(4) “rd_no2_gome2_temis” reads and merges TEMIS GOME-2 NO2 column data from multiple files and performs regridding.
    a. “dn_no2_gome2_temis.pro” downloads the necessary GOME-2 NO2 column data from the TEMIS webpage.
    b. “rd_gome2_no2_file.pro” is a routine to read TEMIS GOME-2 data from one file (provided by the TEMIS website).
    c. “aqm_regrd.pro” performs spatial regridding, conservative remapping or conservative downscaling (if the weight= keyword is set).
    d. “aqm_plot_tile.pro” is part of the MIRIE system and generates spatial plots.

tar xvf igdp_sample.tgz
cd igdp
IDL> run1

file1 = file_search('data/cmaq','CONC.ncf')
file2 = file_search('data/cmaq','METCRO3D.ncf')
run_no2,file1,file2
end



Figure 7.1 A schematic diagram of “run_no2.pro”.

Figure 7.2 shows examples of processed CMAQ and satellite NO2 column density data.

Figure 7.2 Spatial plots of processed CMAQ and GOME-2 NO2 column data.


7.2 IDL library description

Descriptions of the sub-routines and calling sequences for selected routines are provided in this section.

NAME - FUNCTION
aqm_array.pro - Array calculation
aqm_plot_shp.pro - GIS shape file plotting
polyclip.pro - Polygon clipping routine
run_scr.pro - Script execution
aqm_fgi.pro - Fractional grid index calculation
aqm_plot_tile.pro - Spatial plot routine
color_index.pro - Color index handling routine
idx.pro - Data indexing routine
polyfillaa.pro - Polygon clipping routine
str.pro - String data handling routine
aqm_grid_index.pro - Lon/lat to grid conversion
color_load.pro - Graphic: color loading
pos.pro - Graphic subroutine
struct_list.pro - Structure data routine
aqm_polyfillaa.pro - Polyfillaa wrapper
color_table.pro - Graphic: color table handling
struct_make.pro - Structure data handling routine
aqm_grid_mid.pro - Map ID
aqm_proj.pro - Projection conversion
cvdat.pro - Wind data converting routine
map_3points.pro - Spherical area calculation
rd_m3b.pro - M3 data reading
struct_merge.pro - Structure data handling routine
aqm_grid_minfo.pro - Map information
aqm_regrd.pro - Regridding routine
data_check.pro - Data check and convert routine
rd_no2_cmaq.pro - CMAQ NO2 column data reading routine
struct.pro - Structure data handling routine
aqm_grid.pro - Grid information handling routine
aqm_report_info.pro - Read basic M3 file information
minfo.pro - Map information checking routine
rd_no2_gome2_temis.pro - Read TEMIS GOME-2 NO2 column data
struct_read.pro - Structure data handling routine
aqm_lt2utc.pro - Convert local time to UTC based on minfo
aqm_tflag.pro - Examine time-flags
ncdf_info.pro - Examine ncdf information
rd_shp.pro - Read GIS shape file
struct_retag.pro - Structure data handling routine
aqm_map_proj.pro - Map projection conversion routine
array2.pro - Array data handling routine
date2.pro - Time-flag data handling routine


path_check.pro - Directory path checking routine
rd_text.pro - ASCII data reading routine
struct_set.pro - Structure data handling routine
array.pro - Array data handling routine
date.pro - Time-flag data handling routine
pause.pro - Temporary pause
struct_tag.pro - Structure data handling routine
aqm_plot_info.pro - Check information for aqm_plot
dn_gadmv1.pro - GADM data downloading
plot_circle.pro - Circle plotting
struct_write.pro - Structure data handling routine
aqm_plot_mapdata.pro - Map data handling routine
chdmod.pro - Convert display modes
dn_no2_gome2_temis.pro - Download TEMIS GOME-2 NO2 data
plot_save.pro - Save plot
unit.pro - Unit conversion
aqm_plot_map_find.pro - Determine map data
file_name.pro - File name handling
plot_shp.pro - Shape file plotting
var_set.pro - Variable status checking routine
aqm_plot_map.pro - Map plotting routine
color_bar.pro - Plot color bar
run_no2.pro - Sample package for NO2 column data processing

Calling sequences for main routines are shown below.

;
; NAME
;   run_no2
;
; PURPOSE
;   To read NO2 column density data from CMAQ and TEMIS GOME-2
;
; USAGE
;   run_no2,file1,file2[,cday=]
;
; (Ex)
;   file1 = file_search('/data/data03/aqf/hyunk/data/aqm/AQMD/cctm','CCTM*.CONC.*.ncf')
;   file2 = file_search('/data/data03/aqf/hyunk/data/aqm/AQMD/mcip','METCRO3D*.ncf')
;   run_no2,file1,file2,cday=cday
;
; INPUT
;   file1 : CMAQ CONC files including 3-dimensional "NO2" field
;   file2 : MCIP METCRO3D files ("DENS" and "ZF")


;   cday  : target days
;
; AUTHOR
;   Hyun Cheol Kim ([email protected])
;-------------------------------------------------------------------------------
pro run_no2,file1,file2,cday=_cday
;
;
; NAME
;   aqm_report_info
;
; PURPOSE
;   To read basic information from M3 data files
;
; USAGE
;   ii = aqm_report_info(file)
;
; INPUT
;   file : M3 data file
;
; AUTHOR
;   Hyun Cheol Kim ([email protected])
;-------------------------------------------------------------------------------
function aqm_report_info,file
;
; NAME
;   rd_no2_cmaq
;
; PURPOSE
;   To read NO2 column data from CMAQ outputs
;
; USAGE
;   rr = rd_no2_cmaq(cday,file1,file2,[lhour=,platform=][,domain=,minfo=][,/show])
;
; INPUT
;   cday     : target day
;   file1    : CONC file(s)
;   file2    : METCRO3D file(s)
;   lhour    : local hour to read
;   platform : use local time corresponding to satellite instruments ('GOME','GOME2','OMI','SCIAMACHY')
;   domain   : domain information
;
; KEYWORD


;   show : show/save plots
;
; AUTHOR
;   2013-04-19 Hyun Cheol Kim ([email protected])
;-------------------------------------------------------------------------------
function rd_no2_cmaq,cday,file1,file2,platform=platform,lhour=lhour,show=show,_extra=_ex,opt=opt
;
; NAME
;   rd_no2_gome2_temis
;
; PURPOSE
;   To read TEMIS GOME-2 NO2 column data
;
; USAGE
;   rr = rd_no2_gome2_temis(cday[,domain=,minfo=][,weight=][,cloud=][,max_size=][,/show])
;
; INPUT
;   cday         : target day
;   domain/minfo : target area
;   cloud        : maximum cloud fraction to allow (default is 0.4)
;   max_size     : maximum pixel size to allow (default is 8000 km2)
;
; KEYWORD
;   pixel_size : return pixel size
;   show       : generate plots
;
; AUTHOR
;   Hyun Cheol Kim ([email protected])
;-------------------------------------------------------------------------------
function rd_no2_gome2_temis,cday,domain=domain,minfo=minfo,weighting=wt,$
  cloud=_cldfrac,max_size=_max_size,pixel_size=pixel_size,show=show,_extra=_ex
;
;
; NAME
;   dn_no2_gome2_temis
;
; PURPOSE
;   To download TEMIS GOME-2 NO2 column data
;
; USAGE
;   file = dn_no2_gome2_temis(cday)
;
; INPUT


;   cday : target day to read
;
; AUTHOR
;   Hyun Cheol Kim ([email protected])
;-------------------------------------------------------------------------------
function dn_no2_gome2_temis,cday
;
;
; NAME
;   aqm_regrd
;
; PURPOSE
;   To regrid pixel data into the target domain
;
; USAGE
;   rr = aqm_regrd(tlon,tlat,data[,domain=,minfo=][,min_fraction=][,finfo=][,/sum,/avg,/total,/mean,/average][,missing=][,save=][,/show,winn=])
;
; INPUT
;   tlon/tlat    : tiled lat/lon (ref. tlon = array(lon,/tile))
;   data         : raw data
;   domain/minfo : target domain setting
;   finfo        : fraction information (can be recycled for better speed)
;   min_fraction : minimum fraction for each grid cell to use
;   missing      : missing number
;   save         : save/use finfo to file
;   winn         : set window with /show
;   weight       : array for weighting
;
; KEYWORD
;   sum/total        : get total value
;   avg/mean/average : get weighted average
;   weight           : if set, use weighting method
;   show             : show plot
;
; AUTHOR
;   2013-04-04 Hyun Cheol Kim ([email protected])
;              Rewritten from aqm_rgdf.pro using list for accurate calculation
;   2013-04-22 Added weighting option
;-------------------------------------------------------------------------------
function aqm_regrd,tlon,tlat,data,domain=domain,minfo=minfo,$
  finfo=finfo,min_fraction=mf,min_size=min_size,max_size=max_size,$
  sum=sum,avg=avg,total=total,mean=mean,average=average,weighting=wt,$
  missing=missing,save=save,show=show,winn=winn


;
;
; NAME
;   aqm_plot_tile
;
; PURPOSE
;   To generate a spatial (tile) plot
;
; USAGE
;   aqm_plot_tile,data[,lon=,lat=|,tile=|,domain=|,minfo=|,file=]
;
; INPUT
;   data     : 2D data
;   lon      : longitude
;   lat      : latitude
;   tile     : ll locations of each cell [5,nx*ny]
;   domain   : domain nickname
;   minfo    : map information
;   file     : data file name (to obtain minfo)
;   var      : variable name (used to get predefined color table)
;   tflag    : time tflag string (to plot time stamp)
;   position : plot position
;   xsize    : plot width
;   ysize    : plot height
;   range    : color index range (pass to color_index)
;   unit     : data unit (color_bar labeling)
;   timezone : convert tflag to given timezone
;   limit    : map_set limit (for zoom in)
;   title    : plot title
;   wind     : add wind arrow (wind = {u:u,v:v,lon:lon,lat:lat})
;   obs      : plot observation circles
;   mapdata  : pass map data (see aqm_plot_map with /skip_plot)
;   missing  : missing value
;   color    : pass color_load
;   gif      : save gif
;   png      : save png
;   jpg      : save jpg
;   winn     : window position in X display (0~3)
;   option   : any additional options
;
; KEYWORD
;   congrid  : use congrid/tv method in tile plotting (default is polyfill);
;              congrid is faster, but polyfill is more accurate
;   hide     : do not show plot
;   pause    : pause
;


; AUTHOR
;   2010-01-13 Hyun Cheol Kim ([email protected])
;   2010-05-14 updated cl,ct,ci,cb
;-------------------------------------------------------------------------------
pro aqm_plot_tile,_data, $
  lon=lon,lat=lat,tile=tile, $
  domain=domain,minfo=minfo,file=file,$
  var=var,tflag=tflag,position=position,xsize=xsize,ysize=ysize,range=range,$
  unit=unit,timezone=timezone,limit=limit,congrid=congrid,title=title,$
  wind=wind,obs=obs,front=front,mapdata=mapdata,missing=missing,color=color,$
  gif=gif,png=png,jpg=jpg,update=update,$
  winn=winn,hide=hide,pause=pause,option=option,_extra=_ex
;
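As a hypothetical sketch of how the documented routines above fit together, the regridded field returned by aqm_regrd can be passed directly to aqm_plot_tile. The inputs tlon, tlat, data, and minfo are assumed to be prepared beforehand (e.g., by rd_no2_gome2_temis and aqm_report_info), and the missing value and plot title are placeholders.

; Regrid the tiled swath data onto the model grid described by minfo, then plot it.
rr = aqm_regrd(tlon, tlat, data, minfo=minfo, /avg, missing=-999.)
aqm_plot_tile, rr, minfo=minfo, title='Regridded NO2 column', /png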