stefan falke center for air pollution impact and trend analysis

30
Compilation and Design of a Functioning Distributed Database of North American Electric Generating Emissions Stefan Falke Center for Air Pollution Impact and Trend Analysis Washington University in St. Louis Gregory Stella Alpine Geophysics, LLC Terry Keating US EPA – Office of Air & Radiation Demonstration of a Distributed Emissions Inventory using Web Technologies

Upload: britain

Post on 14-Jan-2016

25 views

Category:

Documents


0 download

DESCRIPTION

Demonstration of a Distributed Emissions Inventory using Web Technologies. Compilation and Design of a Functioning Distributed Database of North American Electric Generating Emissions. Stefan Falke Center for Air Pollution Impact and Trend Analysis Washington University in St. Louis - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Stefan Falke Center for Air Pollution Impact and Trend Analysis

Compilation and Design of a Functioning Distributed Database of North American Electric Generating Emissions

Stefan FalkeCenter for Air Pollution Impact and Trend Analysis

Washington University in St. Louis

Gregory StellaAlpine Geophysics, LLC

Terry KeatingUS EPA – Office of Air & Radiation

Demonstration of a Distributed Emissions Inventory using Web Technologies

Page 2: Stefan Falke Center for Air Pollution Impact and Trend Analysis

2

Background

In support of this longer term goal, the Commission on Environmental Cooperation (CEC) and the US EPA have initiated a project to develop a prototype web tool for enabling uniform access to distributed emissions data from North American electricity generating power plants.

Air pollutant emission inventories for the US, Canada, and Mexico are compiled, stored and disseminated using different methods

The development of a single comprehensive and accurate emissions inventory is essential for the coordinated reporting, policy development, transport analyses, and socio-economic studies that create an environment for collaboration among international researchers, policy-makers, and the interested public

Page 3: Stefan Falke Center for Air Pollution Impact and Trend Analysis

3

Distributed Data and Management Networks

• CyberinfrastructureNSF’s initiative to apply new IT to building new ways of conducting collaborative research

http://www.communitytechnology.org/nsf_ci_report/

• Earth Observation SummitInternational effort to build comprehensive, coordinated, and sustained Earth observation systems

http://www.earthobservationsummit.gov

• EcoinformaticsEPA’s vision for national and international cooperation in data and technology development

http://oaspub.epa.gov/sor/user_conference$.startup

Advances in information science and technology are driving the trend toward distributed networks and virtual communities for science and management.

Integrated Ocean Observing SystemInternational network of ocean related monitoring, assessment, and communication http://www.ocean.us/

Linked Environments for Atmospheric Discovery Network of high-performance computers and software to gain new insights into weather http://lead.ou.edu/

Virtual ObservatoryNetwork for astronomical data sharing and distributed analysis http://www.us-vo.org/

Page 4: Stefan Falke Center for Air Pollution Impact and Trend Analysis

4

Emissions Community Collaborative Activities

NIF Data standards► Standard format and submission► NEI XML schema

Environmental Information Exchange Network► Network linking EPA, States, and other partners through the Internet and

standardized data formats

Facility Registry System ► Standard facility codes and locations

Data Sharing Efforts► States, Tribes, Local agencies, RPOs ► North America

Page 5: Stefan Falke Center for Air Pollution Impact and Trend Analysis

5

…is both a conceptual framework and implementation effort for the development of a fully integrated, distributed air emissions inventory – and the foundation for an all-media environmental information network

– Tie together data at all spatial and temporal scales using emerging distributed database technologies

– Provide shared, online tools for processing and analysis– Provide for the seamless merging, manipulation and analysis of Internet accessible air

quality-relevant data through the development of emerging Internet-oriented technologies – Make use of existing resources – link and partner with other efforts– Build a broad-based air quality user community: scientists, regulators, policy analysts and

the public – Create the network and toolkit piece-wise through multiple, connected projects

Networked Environmental Information System for Global Emissions Inventories

NEISGEI

Ongoing Efforts:NSF-EPA Digital Government Funded Projects:

The California Air Resources NetworkFire & Air Quality Data and Tools Network

Future Effort:EPA OAR RFA on Distributed Air Quality Data in Support of NEISGEI

Page 6: Stefan Falke Center for Air Pollution Impact and Trend Analysis

6

CAREN: The California Air Resources Network

US EPA

RPOs

AQMDs

Municipalities

TribesStates

???

Environmental data sharing among international, national, state and local governments, the public and academic and other non-governmental research organizations is a difficult challenge.

► Barrier: Technological incompatibilities► Barrier: Data format incompatibilities► Barrier: Financial (staff time) limitations

Eduard Hovy, Jose-Luis Ambite, Andrew Philpot USC Information Sciences Institute

The Solution Strategy (First Step):Automate the integration of heterogeneous databases

Use semi-automated information integration methods to generate translation protocols between related information sources, e.g.AQMD and CARB.

Page 7: Stefan Falke Center for Air Pollution Impact and Trend Analysis

7

Fire and Air Quality Data Network

Data Catalog

FS Coarse Spatial Data

NEI Fire Emissions

BLM Fire History

Wildland Fire Assessment System

HMS Fire Detection

Spatial Interpolation

Service

VIEWS

CAPITA

Data Wrappers

ftp

text tables

RDBMS

Data Sources

Data ‘wrappers’ are used to translate the format of data sets into a uniform format. The data are either stored on the CAPITA database server or dynamically accessed from its original source The datasets are registered with metadata in the data catalog GIS-type interfaces provide users with ways to view, analyze, and export the data

Page 8: Stefan Falke Center for Air Pollution Impact and Trend Analysis

8

Envisioned Emissions Community Resource of Data & Tools

XML

GIS

EstimationMethods

RDBMS

GeospatialOne-Stop

TransportModels

EmissionsInventoryCatalog

Users &Projects

Web Tools/Services

Emissions Inventories

Data Data Catalogs

Activity Data

Spatial Allocation

Comparison of Emissions

Methods

Data Analysis

Model Development

Wrappers

Emissions Factors

Surrogates

ReportGeneration

Page 9: Stefan Falke Center for Air Pollution Impact and Trend Analysis

9

Current Project Objectives

The project’s focus is on criteria pollutants and toxics because of their availability and accessibility.

Recommend and demonstrate to the CEC approaches for the comparability of techniques and methodologies for data gathering and analysis, data management, and electronic data communications for promoting access to publicly available electric utility emissions

Identify, collect, and review existing sources of electric generating utility (EGU) emissions and activity databases, and provide a summary of the state-of-science

Build a prototype web browser tool to query, retrieve, and explore emissions data from heterogeneous databases

• Demonstrate the utility of new information technologies in creating an integrated network of distributed data and tools• Dynamically link multiple existing data without requiring substantial modification on the provider’s end and provide interfaces that make the links transparent to the end user.• “Add value” to the linked data through the application of data analysis and processing tools

Page 10: Stefan Falke Center for Air Pollution Impact and Trend Analysis

10

Design Objectives

Distributed Non-intrusive to data provider Transparent to end user Flexible Extendable Light on user requirements

Page 11: Stefan Falke Center for Air Pollution Impact and Trend Analysis

11

Process of Building Demonstration

Identify and access relevant data (build wrappers)

Build relational database to temporarily store the data that are not accessible in a distributed manner

Acquire authorization and access to those data that are dynamically accessible through internet interfaces

Create field name “mappings” among datasets

Identify available web technologies for building a distributed emissions tool

Develop new components necessary for the prototype

Build web tool prototype for demonstrating the feasibility of exploring emissions data

Page 12: Stefan Falke Center for Air Pollution Impact and Trend Analysis

12

Available Internet Accessible Emissions Data

Data Source Time Coverage Pollutants Reporting Level

NEI (US)

1985-1999 (criteria)

1996-1999 (HAPs)NOx, SO2, CO, PM, VOC, HAPs

Boiler

eGrid (US) 1996-2000 NOx, SO2, CO2, Mercury Boiler & Generator

Clean Air Markets (US)

1980, 1985,

1988-1999 NOx, SO2, CO2

Generator

NPRI (Canada) 1994-2001

HAPs

(Criteria starting in 2002)

Facility

These are publicly available, on-line accessible emissions data. Other data resources are available, but at this time only in hard copy form and therefore not usable in demonstrating distributed database concepts.

• NEI, NPRI, and eGrid data were downloaded and stored in relational databases on the CAPITA server• BRAVO Mexican emissions data were obtained in electronic format and imported to the CAPITA server• Clean Air Markets was identified as the source most suitable for demonstrating distributed access

Page 13: Stefan Falke Center for Air Pollution Impact and Trend Analysis

13

Emissions Data Characteristics

Web browser queryWeb map serverRecent database structure upgrade

Web browser queryRemotely accessible using SecuRemoteNot yet publicly accessible

Web browser queryRemotely accessible using SecuRemoteOracle database

Downloadable Excel SpreadsheetsPlans for a dynamic web system were shelved

Page 14: Stefan Falke Center for Air Pollution Impact and Trend Analysis

14

Database Fields Mapping

Emissions inventories are based on different underlying data models.

Each inventory uses a uniquely defined set of field names. However, many of these field names are similar to (or their content is similar to) fields in another country’s inventory.

Some of the key relationships among the inventories have been captured by developing a “mapping” among fields.

These mappings provide a set of connections that can subsequently be applied to automated query and integration of data from multiple inventories.

SO2

SO2Yr

SO2_Ann

Sulfur Dioxide

SO2

Page 15: Stefan Falke Center for Air Pollution Impact and Trend Analysis

15

Leveraging Multiple Projects

The challenges in distributed information systems should be addressed by collaborative efforts across governments, agencies, researchers, and disciplines.

Common underlying goals among projects provide opportunities to naturally design systems to interoperate

Avoids one-time, stand alone solutions that cannot be reused

Page 16: Stefan Falke Center for Air Pollution Impact and Trend Analysis

16

DataFed.Net

The Aerosol Data and Services Federation, (http://www.datafed.net), is a network of providers and users for sharing atmospheric data and processing services. DataFed includes a Community of participants who share and use data and processing services, Mediator software component to homogenize data access and Peer-to-peer computing for composing web applications.

R. Husar, CAPITA

Page 17: Stefan Falke Center for Air Pollution Impact and Trend Analysis

17

Catalog of Air Quality Related Data

Metadata for each dataset are registered in a catalog allowing users to browse available datasets and determine which datasets to use for their particular application.

The catalog entries includes data access instructions.

http://capita.wustl.edu/dvoy_2.0.0/dvoy_services/datafed.aspx?

select data domain: aerosols, emissions, fire …

The emissions data are registered in the DataFed.net catalog

Page 18: Stefan Falke Center for Air Pollution Impact and Trend Analysis

18

Preview Data

Page 19: Stefan Falke Center for Air Pollution Impact and Trend Analysis

19

Modular Applications for Working with Online Data

Emissions data is multi-dimensional (plant, year, pollutant, fuel type, boiler capacity, etc.).Its multidimensionality requires multiple “views” of the data in a variety of end-use applications.

Data views can be created in the DataFeds.net framework, including maps, time series, and tables. Each view is independently linked to its data sources and described (i.e. geographic and temporal extents). Using the data access instructions registered in the catalog, a view can be dynamically assembled.

The modularly designed views can be embedded and controlled in web pages using Javascript, ASP or other web application programming languages.

Page 20: Stefan Falke Center for Air Pollution Impact and Trend Analysis

20

Web Services

Substantial progress has been achieved in data interoperability. One the next advances required is interoperable data analysis/processing tools.

Web services are applications that are used over the Web. Because they are self-contained and use XML-based standards (SOAP, WSDL, UDDI) for describing themselves and communicating with other web resources, they can be reused in a variety of independent applications.

Many of the analysis and processing tools used by the air quality management community could benefit from web service technology. Not only can their data be shared but their heterogeneous, distributed tools that operate on that data can be shared as well.

A longer term vision for web services is to be able to “orchestrate” or “chain” multiple services from multiple providers so that new “third party” applications can be constructed.

Page 21: Stefan Falke Center for Air Pollution Impact and Trend Analysis

21

In this example, the user can change the pollutant, date, map zoom data on/off, and map/time scales.

Example Web Services Application

Emissions data from multiple databases are displayed on maps, time series, and tables. Tools are included for browsing and querying the data.

Page 22: Stefan Falke Center for Air Pollution Impact and Trend Analysis

22

North American Emissions Demonstration Data Flow

MapPointAccess

DataCatalog

MapPointRender

MapImageOverlay

MapImageAccess MapImageRender

The settings of each web service can be changed by the user, creating a dynamic application

Wrappers

= web service

BRAVO

NPRI

NEI

eGrid

CAM

DataSet = NPRIYear=1999Parameter=SO2

MapPointAccess

DataSet = eGridYear=1999Parameter=SO2

MapPointAccess

DataSet = NEIYear=1999Parameter=SO2

Color = YellowSymbol=BarWidth=8

MapPointRender

Color = RedSymbol=BarWidth=8

MapPointRender

Color = BlueSymbol=BarWidth=8

DataSet = N.Am. Borders

Layer Order = N.Am, NEI, eGRid, NPRI

Color= Maroon Size=2

Name

Settings

Page 23: Stefan Falke Center for Air Pollution Impact and Trend Analysis

23

Embedded Images and Controllers in Web Page

Parameter Controller

Date Controller

Query Controller

The controllers and map image view can be linked and assembled in a web page. Changing the settings of a controller changes the URL of the map image and updates the web page.

The web page can be constructed using standard web application programming languages, such as JavaScript and ASP.

Page 24: Stefan Falke Center for Air Pollution Impact and Trend Analysis

24

Project Results

Technology has been demonstrated to be at the point where we can begin to apply some of the distributed database concepts to “real” situations

Emissions data present unique challenges due to their complex relational dimensionality

Collaborative efforts in the near future could generate a distributed North American emissions inventory– The initial versions of the inventory would help clarify the issues related

to handling complex queries

– Building and using a distributed emissions tool will assist in creating consensus data naming conventions

Page 25: Stefan Falke Center for Air Pollution Impact and Trend Analysis

25

Current Project Challenges

Dynamic Access to Data- technical snafus- security issues- slow performance

Complexity of Emissions Data

Quickly evolving technology

Page 26: Stefan Falke Center for Air Pollution Impact and Trend Analysis

26

What’s Missing

Metadata– More complete metadata would help in relating heterogeneous databases

More complete access to distributed datasets– A process for creating trusted provider-user agreements would help address

issues of security and data misuse

More comprehensive content– Networked data and tools that spark additional interest in the technology’s

potential

Actual Implementations– FASTNet will demonstrate the use of many of these technologies in the real time

monitoring of major aerosol events this summer

Page 27: Stefan Falke Center for Air Pollution Impact and Trend Analysis

27

Next Steps

Advance the prototype tool to make it more representative of what an emissions inventory would ultimately use

EPA’s RFA on Distributed Air Quality Data in Support of NEISGEI

Link to other web services (Geospatial One-Stop, TerraServer) and other relevant data sources, including Web Map Servers

Establish collaborative partnerships with other researchers and agencies developing related networks and tools (ReVa, Earth Science Federation (NASA), NSF Cyberinfrastructure efforts)

Build tools that add value to the distributed data and provide incentives for data providers to join networks

Clarify the handling of distributed data for optimal system performance

Page 28: Stefan Falke Center for Air Pollution Impact and Trend Analysis

28

Mediators

The flow of data from provider to user passes through brokers, or mediators.

The data continues to be maintained by original providers

Contracts are used to retain a constant link to the original data

Data is still not centrally stored but is “cached” in a format that allows efficient queries and analyses

Mediators provide an interface between the user and the data that enhances the effective information exchange between the two sides

Page 29: Stefan Falke Center for Air Pollution Impact and Trend Analysis

29

Adding Value to data through Services

Server Client

Mediator Services

Data Sources Data Users

multidimensional data cubes

analytical web services

Page 30: Stefan Falke Center for Air Pollution Impact and Trend Analysis

30

Demo Links

http://capita.wustl.edu/NAMEN/DemoFeb27.htm