HANDLING PUBLICLY GENERATED AIR QUALITY DATA, PETE TENEBRUSO & MIKE MATSKO, MARCH 8TH, 2017


  • HANDLING PUBLICLY GENERATED AIR QUALITY DATA PETE TENEBRUSO & MIKE MATSKO

    MARCH 8TH, 2017

  • EXAMPLES OF DEP DATA AND CROWDSOURCING

    • Storm Readiness

    • Beach Assessments

    • Park Closings

    • Emergency Management – Social Media

    • Watershed Ambassadors

    • ARMS

  • AIR INFORMATION MANAGEMENT SYSTEM (ARMS)

    • There are 286 monitors at these 40 air quality stations tracking 35 distinct parameters (continuous and non-continuous)

    • NOX, NO, SO2, PM2.5, NO2, CO, O3, wind speed, wind direction, temp are monitored

    • There are approximately 150,000,000 continuous air quality minute data points collected every year.
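
    The yearly figure above is roughly what the monitor count implies. The sketch below is only a back-of-envelope check: it assumes, for illustration, that all 286 monitors report one value per minute all year, which the slide does not state (some parameters are non-continuous).

```python
# Back-of-envelope check of the yearly data volume.
# Assumption (not from the slides): every one of the 286 monitors reports
# one value per minute for a full non-leap year.
MONITORS = 286
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600

points_per_year = MONITORS * MINUTES_PER_YEAR
print(f"{points_per_year:,}")  # 150,321,600
```

    Under that assumption the result lands close to the quoted 150,000,000 points per year.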

  • ARMS SITES & MONITORS

  • NJDEP Air and Radiation Monitoring System

    [Network diagram. Prepared by Harry Chen, 12/1/2010. Labels recoverable from the figure:]

    • Layers: Public Access Layer, Secure Access Layer, Core layer

    • Primary NJ Air/Rad Monitoring System @ Troop “C”; Standby NJ Air/Rad Monitoring System @ 401 East State Street

    • Master, Slave, and Standby Comm. Centers & FTP; Primary and Standby Databases; Clustering; Oracle Observer (Manage Fast Start Failover)

    • Envista Comm Center and Database

    • Connectivity: Verizon Wireless Network and Access Points, Digi Wireless Routers, MOXA NPort, Leased Lines & Phone Lines Network, Leased Line Modems, Phone Line Modems (including for backup), FRM/TEOM Modem, Modem Banks for Backup, Back Up Wireless Polling System (Verizon Air Card)

    • Site types: CREST Wireless Sites, CREST Leased Line Sites, Regular Air Wireless Sites, Air FRM Wireless Sites, Air TEOM Wireless Sites, Air FRM Phone Line Sites, Air TEOM Phone Line Sites

    • Other labels: GSN, DEP TLS, OIT ExtraNet

    • WWW.NJAQINOW.NET: hosted by Envitech, updated via FTP from ARMS Comm Center

    • Green devices represent future projects

  • CROWDSOURCING

    • A specific sourcing model in which individuals or organizations use contributions from Internet users to obtain needed services or ideas

    • Amazon Mechanical Turk

    • Kickstarter

    • Wikipedia

  • BACKGROUND

    • Massive data deluge in recent years

    • 80% of the world's data is unstructured (images, videos, raw text, etc.)

    • Algorithms to fully comprehend unstructured data have not been developed yet

    • Many experts believe we are at least several decades away from this goal

  • CONSIDERATIONS OF USING INFO FROM 'CROWDS'

    • Can disseminate both valid and invalid information

    • Crowds often have no immediate way to discern truth from falsehood

    • Crowds are prone to add opinion to data, which sometimes sticks more than the credible data themselves. Separating opinion from credible data through expert interpretation and curation, both centralized and decentralized, is important

    • Very few organizational or procedural channels specify how to aggregate and incorporate information in decision making

    • Better information is needed, not necessarily more monitors.

  • INTEGRATING EXPERTS, CROWDS, & ALGORITHMS.

  • CROWD SOURCING CONCERNS

    • How to solicit users

    • What they can contribute

    • How to combine their contributions

    • How to manage quality, open versus closed worlds, query semantics, query execution, optimization, and user interfaces

  • BENEFITS OF MACHINE LEARNING

    • Feature extraction: interpreting text to infer the time, location, people, etc. referred to in it

    • Classification: classify, group or tag information based on some explicit or unknown criteria

    • Clustering: machines can process vast amounts of data and present correlations and proximities that escape the human eye and brain, sometimes discovering non-obvious correlations between variables

    With large amounts of data available, it is not even necessary to have a deep understanding of the relationships within the data themselves: machines can on their own distil the relevant correlations from the noise through successive optimization.
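
    As a rough illustration of the feature extraction idea above, the sketch below pulls a time of day and any mentioned pollutants out of a free-text report. It uses plain regular expressions rather than a trained model, and the report text, the `extract_features` name, and the vocabulary list are all made up for the example. Matching is case-sensitive, so lowercase mentions such as "ozone" or "co" are missed.

```python
import re

# Hypothetical vocabulary, based on the pollutants listed for ARMS.
POLLUTANTS = ["PM2.5", "NO2", "SO2", "NOX", "CO", "O3", "NO"]

# Matches times such as "7:45 pm" or "14:30".
TIME_RE = re.compile(r"\b(\d{1,2}):(\d{2})\s*(am|pm)?", re.IGNORECASE)


def extract_features(report: str) -> dict:
    """Return the first time of day and the pollutants named in the text."""
    time_match = TIME_RE.search(report)
    found = []
    for name in POLLUTANTS:
        # Require non-alphanumeric neighbours so "NO" is not found in "NO2".
        pattern = rf"(?<![A-Za-z0-9]){re.escape(name)}(?![A-Za-z0-9])"
        if re.search(pattern, report):
            found.append(name)
    return {
        "time": time_match.group(0).strip() if time_match else None,
        "pollutants": found,
    }


# Made-up crowd report, for illustration only.
features = extract_features(
    "Strong smoke smell near the park at 7:45 pm, my sensor shows PM2.5 over 80 and O3 is high."
)
print(features)  # {'time': '7:45 pm', 'pollutants': ['PM2.5', 'O3']}
```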

  • MACHINE LEARNING SHORTCOMINGS

    • Algorithms are more specific than sensitive, meaning that important signals may be missed (false negatives)

    • A combination of algorithms is important to draw different types of events and event features from undifferentiated data

      • Understanding which algorithms to use, gained through experience, is essential

    • Algorithms need to be thoroughly validated, tested, and reassessed

    • Algorithms need data to train and feedback to learn; 'out of the box' value is difficult to achieve

    • Human factor: 'lazy over time'

      • With experience of accepted algorithms, over-dependency and improper cross-checks of an algorithm's results may lead to missed or misinterpreted signals

    • Low social acceptance of systems that do not function in a way that is predictable or describable

    • Past misuse of machine learning has led users to fear and 'distrust' algorithms that lack some human interaction

    • Should an algorithm declare a health emergency, or should it present data to an expert or authority with 'suggestions' and 'red flags', so that the authority can then declare a health emergency?
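
    The "more specific than sensitive" point can be made concrete with the standard definitions of the two rates. The counts below are invented for illustration and do not come from any DEP system.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of real events that the algorithm detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of non-events that the algorithm correctly ignored."""
    return true_neg / (true_neg + false_pos)


# Hypothetical evaluation: 10 real pollution events, 100 ordinary periods.
tp, fn = 6, 4    # 4 real events were missed (false negatives)
tn, fp = 98, 2   # only 2 false alarms

print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 0.60
print(f"specificity = {specificity(tn, fp):.2f}")  # 0.98
```

    A detector with these numbers rarely raises a false alarm but misses 4 of every 10 real events, which is the risk the slide describes.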

  • STANDARDIZATION NEEDED FOR INTEROPERABILITY

    • Interoperability challenges with data formats, service interfaces, semantics and measurement uniformity

    • Broad usage of open sensor standards is needed

    • The Sensor Web Enablement Initiative (SWE) by the OGC (Open Geospatial Consortium) seeks to provide open standards and protocols for enhanced operability within and between multiple platforms and vendors. It aims to make sensors discoverable, query-able, and controllable over the Internet.

    • Currently, the SWE family consists of seven standards:

      • Sensor Model Language (SensorML): XML schemas for defining the geometric, dynamic and observational properties of a sensor. Accommodates sensor discovery, processing and analysis of the retrieved data, as well as the geo-location of observed values.

      • Observations & Measurements (O&M)

      • Transducer Model Language (TML): generally speaking, TML can be understood as O&M's counterpart for streaming data, providing a method and message format describing how to interpret raw transducer data.

      • Sensor Observation Service (SOS): provides a service to retrieve measurement results from a sensor or a sensor network.

  • STANDARDIZATION CONTINUED

    • Sensor Planning Service (SPS): provides a standardized interface for collection assets and aims at automating complex information flows in large networks.

    • Sensor Alert Service (SAS): interfaces enabling sensors to advertise and publish alerts, including the associated metadata.

    • Web Notification Service (WNS): enables one-way and two-way message exchanges with other services. This is especially useful when several services are required to comply with a client's request, or when a response is only possible after a considerable delay.

  • SENSOR OBSERVATION SERVICE (SOS)
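
    As a rough sketch of how a client might ask an SOS server for measurements, the code below builds a GetObservation request URL. It assumes an SOS 2.0 server with the key-value-pair (KVP) binding enabled; the endpoint, offering, and observed-property identifiers are hypothetical placeholders, and a real server publishes its own identifiers in its capabilities document.

```python
from urllib.parse import urlencode


def build_get_observation_url(endpoint: str, offering: str,
                              observed_property: str,
                              start: str, end: str) -> str:
    """Build an SOS 2.0 KVP GetObservation URL for one time window."""
    params = {
        "service": "SOS",
        "version": "2.0.0",
        "request": "GetObservation",
        "offering": offering,
        "observedProperty": observed_property,
        # Time window as ISO 8601 start/end on the phenomenon time.
        "temporalFilter": f"om:phenomenonTime,{start}/{end}",
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and identifiers, for illustration only.
url = build_get_observation_url(
    "https://sos.example.org/service",
    offering="air_quality_station_01",
    observed_property="PM2.5",
    start="2017-03-08T00:00:00Z",
    end="2017-03-09T00:00:00Z",
)
print(url)
```

    The server would answer with the matching observations, typically encoded as O&M.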

  • NEED A GOOD PLAN

    • What are you trying to do? What's the value of this data?

    • What's the approach?

    • Selecting location and placement

    • Collecting

    • Quality control

    • Sensor maintenance

    • Data review

    • Data validation

    • Issues (interference and drift)

    • Analyze, interpret, communicate results

    • QA

    • QC
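
    The quality control and data validation steps above could begin with simple automated screening before human review. The sketch below flags missing values, out-of-range values, and a "stuck" sensor that repeats the same reading. The function name, limits, and readings are illustrative assumptions, not DEP procedure.

```python
from typing import List, Optional


def qc_flags(values: List[Optional[float]], low: float, high: float,
             max_repeats: int) -> List[str]:
    """Label each reading as ok, missing, out_of_range, or stuck."""
    flags = []
    run = 0          # length of the current run of identical readings
    prev = None
    for v in values:
        if v is None:
            flags.append("missing")
            run, prev = 0, None
            continue
        run = run + 1 if v == prev else 1
        prev = v
        if not (low <= v <= high):
            flags.append("out_of_range")
        elif run > max_repeats:
            flags.append("stuck")
        else:
            flags.append("ok")
    return flags


# Made-up PM2.5 readings with hypothetical limits of 0 to 300.
readings = [12.1, 12.4, None, 500.0, 9.8, 9.8, 9.8, 9.8]
flags = qc_flags(readings, low=0.0, high=300.0, max_repeats=3)
print(flags)
# ['ok', 'ok', 'missing', 'out_of_range', 'ok', 'ok', 'ok', 'stuck']
```

    Flagged readings would then go to a person for data review rather than being dropped automatically.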

  • SENSOR CONSIDERATIONS

    • Low cost

    • Varying reliability, quality, and accuracy

    • Questionable maintenance and calibration

    • Pollutants measured (ozone, PM, volatiles)

    • Location and placement: fixed/mobile, in/outside, below/above ground

    • IoT: security of devices

  • DATA MANAGEMENT CONSIDERATIONS

    • Several existing repositories; would not want to replicate them

    • DEP had experience in managing large sets of data, but not at this potential scale

    • Large cost of managing data

      • Infrastructure, tools, etc.

    • Leverage existing real-time and historical APIs

    • Separation of local, state, and nationwide data

    • Integration and analysis with existing state data
