TRANSCRIPT
-
HANDLING PUBLICLY GENERATED AIR QUALITY DATA PETE TENEBRUSO & MIKE MATSKO
MARCH 8TH, 2017
-
EXAMPLES OF DEP DATA AND CROWDSOURCING
• Storm Readiness
• Beach Assessments
• Park Closings
• Emergency Management – Social Media
• Watershed Ambassadors
• ARMS
-
AIR INFORMATION MANAGEMENT SYSTEM (ARMS)
• There are 286 monitors at these 40 air quality stations tracking 35 distinct parameters (continuous and non-continuous)
• Monitored parameters include NOX, NO, SO2, PM2.5, NO2, CO, O3, wind speed, wind direction, and temperature
• There are approximately 150,000,000 continuous air quality minute data points collected every year.
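As a rough sanity check on the figure above: assuming (hypothetically) that all 286 monitors reported one value per minute for a full 365-day year, the count comes out close to the quoted 150 million. Some monitors are non-continuous, so this is only an upper-bound sketch, not necessarily how the figure was derived.

```python
# Rough upper bound: assumes every monitor reports once per minute,
# all year long (an assumption; some monitors are non-continuous).
MONITORS = 286
MINUTES_PER_YEAR = 60 * 24 * 365


def yearly_minute_points(monitors: int = MONITORS) -> int:
    """Minute-level data points per year under the assumption above."""
    return monitors * MINUTES_PER_YEAR


print(yearly_minute_points())  # 150321600
```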
-
ARMS SITES & MONITORS
-
[Diagram: NJDEP Air and Radiation Monitoring System network. Prepared by Harry Chen, 12/1/2010. Green devices represent future projects.]
• Layers: Public Access Layer, Secure Access Layer, Core layer
• Primary NJ Air/Rad Monitoring System @ Troop “C”; Standby NJ Air/Rad Monitoring System @ 401 East State Street
• Envista Comm Center and Database: Primary and Standby Databases (clustering; Oracle Observer manages Fast Start Failover); Master, Slave, and Standby Comm. Centers & FTP
• Verizon Wireless Network and access points with Digi wireless routers: CREST, Regular Air, Air FRM, and Air TEOM wireless sites
• Leased lines and phone lines (MOXA NPort, leased line modems, phone line modems, FRM/TEOM modem): CREST leased line sites, Air FRM and Air TEOM phone line sites
• Backup: modem banks, phone line modem, Back Up Wireless Polling System (Verizon Air Card)
• Other labels: GSN, DEP TLS, OIT ExtraNet WWW
• WWW.NJAQINOW.NET: hosted by Envitech, updated via FTP from ARMS Comm Center
-
CROWDSOURCING
• A specific sourcing model in which individuals or organizations use contributions from Internet users to obtain needed services or ideas
• Amazon Mechanical Turk
• Kickstarter
• Wikipedia
-
BACKGROUND
• Massive data deluge in recent years
• 80% of the world's data is unstructured (images, videos, raw text, etc.)
• Algorithms to fully comprehend unstructured data have not been developed yet
• Many experts believe we are at least several decades away from this goal
-
CONSIDERATIONS OF USING INFO FROM 'CROWDS'
• Can disseminate both valid and invalid information
• Crowds often have no immediate way to discern truth from falsehood
• Crowds are prone to add opinion to data, which sometimes sticks more than the credible data themselves. Separating opinion from credible data through expert interpretation and curation, both centralized and decentralized, is important
• Very few organizational or procedural channels specify how to aggregate and incorporate information in decision making
• Better information is needed, not necessarily more monitors.
-
INTEGRATING EXPERTS, CROWDS, & ALGORITHMS.
-
CROWD SOURCING CONCERNS
• How to solicit users
• What they can contribute
• How to combine their contributions
• How to manage quality, open versus closed worlds, query semantics, query execution, optimization, and user interfaces
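One simple way to think about "how to combine their contributions" is a robust aggregate such as the median, which is less affected by a single bad reading than the mean. The sketch below is only an illustration; the readings are made-up values and `combine_readings` is a hypothetical helper, not something from the talk.

```python
from statistics import mean, median


def combine_readings(readings):
    """Combine several contributed readings for one place and time.

    Uses the median so that one faulty sensor does not dominate the result.
    """
    if not readings:
        raise ValueError("no readings to combine")
    return median(readings)


# Hypothetical PM2.5 readings (ug/m3) from five nearby low-cost sensors;
# the last one is an obvious outlier.
crowd = [11.8, 12.1, 12.4, 12.0, 55.0]
print(combine_readings(crowd))  # 12.1
print(mean(crowd) > 20)         # True: the mean is pulled up by the outlier
```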
-
BENEFITS OF MACHINE LEARNING
• Feature extraction – i.e. interpreting text to infer time, location, people, etc. referred to in it;
• Classification – classify, group or tag information based on some explicit or unknown criteria;
• Clustering – machines can process vast amounts of data and present correlations and proximities that escape the human eye and brain, sometimes discovering non-obvious correlations between variables
With large amounts of data available, it is not even necessary to have a deep understanding of the relationships within the data themselves: machines can on their own distil the noise from the relevant correlations through successive optimization.
-
MACHINE LEARNING SHORTCOMINGS
• Algorithms are more specific than sensitive, meaning that important signals may be missed (false negatives)
• A combination of algorithms is important to draw different types of events and event features from undifferentiated data
  • Understanding which algorithms to use, gained through experience, is essential
• Algorithms need to be thoroughly validated, tested, and reassessed
• Algorithms need data to train and feedback to learn. ‘Out of the box’ value is difficult
• Human factor – ‘lazy over time’
  • With experience of accepted algorithms, over-dependency and improper cross-checks of an algorithm's results may result in missed or misinterpreted signals
• Low social acceptance of systems that do not function in a way that is predictable or describable
• Past misuse of machine learning has led users to fear and ‘distrust’ algorithms without some human interaction.
• Should an algorithm declare a health emergency, or should it help present data to an expert or authority with 'suggestions' and 'red flags', so that the authority can then declare a health emergency?
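To make "more specific than sensitive" concrete, the sketch below computes both measures from a list of algorithm flags and a list of true events. The labels are made-up illustration data, and `sensitivity_specificity` is a hypothetical helper.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for two equal-length lists of booleans.

    sensitivity = TP / (TP + FN): share of real events the algorithm caught.
    specificity = TN / (TN + FP): share of non-events it correctly ignored.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one real event and one non-event")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: 4 real events in 10 periods; the algorithm flags only one.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, False, False, False, False, False, False, False, False, False]
sens, spec = sensitivity_specificity(predicted, actual)
print(sens, spec)  # 0.25 1.0
```

Here the algorithm raises no false alarms (specificity 1.0) but misses three of four real events (sensitivity 0.25), which is the false-negative problem the slide describes.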
-
STANDARDIZATION NEEDED FOR INTEROPERABILITY
• Interoperability challenges with data formats, service interfaces, semantics and measurement uniformity
• Broad usage of open sensor standards is needed
• The Sensor Web Enablement Initiative (SWE) by the OGC (Open Geospatial Consortium) seeks to provide open standards and protocols for enhanced interoperability within and between multiple platforms and vendors. They aim to make sensors discoverable, query-able, and controllable over the Internet.
• Currently, the SWE family consists of seven standards:
  • Sensor Model Language (SensorML)
    • XML schemas for defining the geometric, dynamic and observational properties of a sensor. Accommodates sensor discovery, processing and analysis of the retrieved data, as well as the geo-location of observed values.
  • Observations & Measurements (O&M)
  • Transducer Model Language (TML)
    • Generally speaking, TML can be understood as O&M's counterpart for streaming data, providing a method and message format describing how to interpret raw transducer data.
  • Sensor Observation Service (SOS)
    • This component provides a service to retrieve measurement results from a sensor or a sensor network.
-
STANDARDIZATION CONTINUED
• Sensor Planning Service (SPS)
  • This component provides a standardized interface for collection assets and aims at automating complex information flows in large networks.
• Sensor Alert Service (SAS)
  • Interfaces enabling sensors to advertise and publish alerts, including the associated metadata.
• Web Notification Service (WNS)
  • Enables one-way and two-way message exchanges with other services. This is especially useful when several services are required to comply with a client's request, or when a response is only possible after considerable delay.
-
SENSOR OBSERVATION SERVICE (SOS)
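As an illustration of retrieving measurements through SOS, the sketch below builds a GetObservation request URL, assuming a server that supports the SOS 2.0 key-value-pair (KVP) binding. The endpoint, offering name, and observed-property identifier are hypothetical placeholders; a real server advertises its own values in its GetCapabilities response.

```python
from urllib.parse import urlencode


def build_get_observation_url(endpoint, offering, observed_property, start, end):
    """Build an SOS 2.0 KVP GetObservation URL for one offering and time range.

    start and end are ISO 8601 timestamps, e.g. "2017-03-08T00:00:00Z".
    """
    params = {
        "service": "SOS",
        "version": "2.0.0",
        "request": "GetObservation",
        "offering": offering,
        "observedProperty": observed_property,
        # Time period filter on the observation's phenomenon time.
        "temporalFilter": f"om:phenomenonTime,{start}/{end}",
    }
    return f"{endpoint}?{urlencode(params)}"


# All values below are made up for illustration.
url = build_get_observation_url(
    endpoint="https://sos.example.org/service",
    offering="station_01",
    observed_property="PM2.5",
    start="2017-03-08T00:00:00Z",
    end="2017-03-09T00:00:00Z",
)
print(url)
```

The function only builds the URL; sending the request and parsing the O&M response are left out to keep the sketch small.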
-
NEED A GOOD PLAN
• What are you trying to do - what’s the value of this data
• What’s the approach?
• Selecting location and placement
• Collecting
• Quality control
• Sensor maintenance
• Data review
• Data validation
• Issues (interference and drift)
• Analyze, interpret, communicate results
• QA
• QC
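Two of the checks named above, data validation and drift, can be sketched as follows. This is a minimal illustration under assumptions: the valid range, the readings, and the idea of comparing a low-cost sensor against a co-located reference monitor are hypothetical choices, not a documented DEP procedure.

```python
def flag_out_of_range(readings, low, high):
    """Return the indices of readings outside the plausible range [low, high]."""
    return [i for i, value in enumerate(readings) if not low <= value <= high]


def drift_per_step(sensor, reference):
    """Estimate drift as the least-squares slope of (sensor - reference) over time.

    A slope near zero suggests a stable offset; a steady non-zero slope
    suggests the sensor is drifting away from the reference.
    """
    diffs = [s - r for s, r in zip(sensor, reference)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired readings")
    mean_x = (n - 1) / 2
    mean_y = sum(diffs) / n
    num = sum((i - mean_x) * (d - mean_y) for i, d in enumerate(diffs))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Hypothetical readings: negative and implausibly large values get flagged.
print(flag_out_of_range([12.0, -5.0, 35.5, 999.0], low=0.0, high=500.0))  # [1, 3]

# Hypothetical paired series: the sensor gains 1 unit per step on the reference.
print(drift_per_step([10, 11, 12, 13, 14], [10, 10, 10, 10, 10]))  # 1.0
```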
-
SENSOR CONSIDERATIONS
• Low cost
• Varying reliability, quality, and accuracy
• Questionable maintenance and calibration
• Pollutants measured (ozone, PM, volatiles)
• Location and placement - fixed/mobile, in/outside, below/above ground
• IoT – Security of devices
-
DATA MANAGEMENT CONSIDERATIONS
• Several existing repositories - would not want to replicate
• DEP had experience in managing large sets of data but not at this potential scale
• Large cost of managing data
  • Infrastructure/Tools/etc.
• Leverage existing real-time and historical APIs
• Separation of local, state, and nationwide data
• Integration and analysis with existing state data