GSN Related Developments Jan Beutel, Tonio Gsell, Roman Lim, Mustapha Yuecel, Christoph Walser, Matthias Keller, Bernhard Buchli, Josua Hunziker, Felix Sutton, Lothar Thiele, ETH Zurich


Post on 17-Dec-2015



Computer Engineering and Networks (Technische Informatik und Kommunikationsnetze)

GSN Data Integration

Data Management – Online Semantic Data

• Global Sensor Networks (GSN)
– Data streaming framework from EPFL
– Organized in “virtual sensors”, i.e. data types/semantics
– Hierarchies and concatenation of virtual sensors enable on-line processing
– Translates data from machine representation to SI values
– Adds metadata
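The chaining of virtual sensors described above can be pictured as a pipeline of streaming stages. The following is only an illustrative Python sketch, not GSN's own configuration or API: the helper names (`virtual_sensor`, `chain`), the field names, and the 12-bit ADC with a 3.0 V reference are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Record = Dict[str, Any]


def virtual_sensor(transform: Callable[[Record], Record]):
    """Wrap a per-record transform as a streaming stage (hypothetical helper)."""
    def stage(stream: Iterable[Record]) -> Iterator[Record]:
        for record in stream:
            yield transform(record)
    return stage


def chain(*stages):
    """Concatenate stages so the output of one feeds the next."""
    def run(stream: Iterable[Record]) -> Iterable[Record]:
        for stage in stages:
            stream = stage(stream)
        return stream
    return run


# Stage 1: machine representation to SI value
# (assumed 12-bit ADC with a 3.0 V reference).
to_volts = virtual_sensor(lambda r: {**r, "voltage_v": r["adc_raw"] * 3.0 / 4095})

# Stage 2: add metadata (the position number is invented).
annotate = virtual_sensor(lambda r: {**r, "position": 7})

pipeline = chain(to_volts, annotate)
raw = [{"timestamp": 0, "adc_raw": 4095}, {"timestamp": 60, "adc_raw": 0}]
result = list(pipeline(raw))
```

Each stage only sees one record at a time, so the chain processes data as it arrives rather than in batches.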

[Figure: data flow "Import from field" → GSN (private) → GSN (public) → "Web export". Metadata applied between the two instances: position, coordinates, sensor type, validity period, … Covers multi-site, multi-station, multi-revision data.]

Metadata Mapping Architecture

• Based on 2 GSN instances
– Separation of load/concern across two machines
– “Private” GSN instance: raw data, protected behind firewall, high availability
– “Public” GSN instance: mapped and converted data, world readable, non-critical
• Metadata stored in version control system (CSV text files, SVN)
• Mapping of
– Positions, coordinates, sensor types, conversion functions, sensor calibration…
• Conversion of
– Time formats, raw to SI values…
• Replay of metadata/mapping possible, e.g. on bugs/errors
• Change management
• Transparency, scalability, traceability, load balancing
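To make the mapping step concrete, here is a minimal Python sketch of a validity-period lookup against CSV metadata followed by a raw-to-SI conversion. The CSV layout, column names, device ID, and the linear gain/offset conversion are all invented for illustration; the actual PermaSense metadata files and conversion functions are not shown in these slides.

```python
import csv
import io
from datetime import datetime

# Hypothetical metadata file as it might be kept under version control.
MAPPING_CSV = """device_id,valid_from,valid_to,position,sensor_type,gain,offset
2001,2008-07-01,2009-08-15,3,thermistor,0.01,-40.0
2001,2009-08-15,2010-06-30,5,thermistor,0.02,-40.0
"""


def load_mapping(text):
    """Parse the CSV text into a list of typed mapping entries."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "device_id": int(row["device_id"]),
            "valid_from": datetime.fromisoformat(row["valid_from"]),
            "valid_to": datetime.fromisoformat(row["valid_to"]),
            "position": int(row["position"]),
            "sensor_type": row["sensor_type"],
            "gain": float(row["gain"]),
            "offset": float(row["offset"]),
        })
    return rows


def lookup(mapping, device_id, when):
    """Return the entry valid for this device at time `when`, or None."""
    for entry in mapping:
        if (entry["device_id"] == device_id
                and entry["valid_from"] <= when < entry["valid_to"]):
            return entry
    return None


def convert(mapping, device_id, when, raw):
    """Map a raw reading to a position and an SI value (assumed linear)."""
    entry = lookup(mapping, device_id, when)
    if entry is None:
        return None  # no metadata for this period: leave the reading unmapped
    return {
        "position": entry["position"],
        "sensor_type": entry["sensor_type"],
        "value_si": entry["gain"] * raw + entry["offset"],
    }


mapping = load_mapping(MAPPING_CSV)
early = convert(mapping, 2001, datetime(2009, 1, 1), 5000)
late = convert(mapping, 2001, datetime(2009, 9, 1), 5000)
```

Because the raw data stays untouched in the private instance, fixing a row in the CSV and re-running the conversion is enough to replay the mapping after a bug or error.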

Metadata Change Management

• Allows simple exchange of sensor hard-/software at runtime
• Post-deployment annotation
– Stop public GSN, change the deployment, annotate metadata, restart public GSN
• Automatic synchronization with 1 day change boundaries

Issue – Data Quality and Integrity

• Since 07/2008: ~150’000’000 data points
• Inconsistencies
– Between timestamps and sequence numbers
• Duplicates
• Data gaps
– Sporadic
– Systematic

[Figure: deployment timeline labeled "Installation & Service 2008", "Service 2009", "Revision / Extension June 2010", with the annotation "Sensor nodes 20-22: new installation".]

[Keller, SenSys 2009, IPSN 2011]
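The three problem classes listed above (inconsistencies, duplicates, gaps) can be checked with a single pass over a stream. The sketch below is a simplified illustration, not the method of the cited papers: it assumes each record is a `(sequence_number, timestamp)` pair in arrival order and ignores sequence-number wraparound and node reboots, which a real check would have to handle.

```python
def check_stream(records):
    """Classify anomalies in an iterable of (sequence_number, timestamp) pairs."""
    seen = set()
    duplicates, gaps, inconsistent = [], [], []
    prev = None
    for seq, ts in records:
        if (seq, ts) in seen:
            # Exact repeat of an earlier record.
            duplicates.append((seq, ts))
            continue
        seen.add((seq, ts))
        if prev is not None:
            prev_seq, prev_ts = prev
            if seq > prev_seq + 1:
                # Missing sequence numbers, reported as an inclusive range.
                gaps.append((prev_seq + 1, seq - 1))
            if (seq > prev_seq) != (ts > prev_ts):
                # Sequence number and timestamp disagree about ordering.
                inconsistent.append((seq, ts))
        prev = (seq, ts)
    return {"duplicates": duplicates, "gaps": gaps, "inconsistent": inconsistent}


sample = [(1, 100), (2, 110), (2, 110), (5, 140), (6, 130)]
report = check_stream(sample)
```

For the sample stream, the report flags the repeated record `(2, 110)`, the missing sequence numbers 3 to 4, and the record `(6, 130)` whose timestamp runs backwards.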

Mitigating Data Loss – BackLog Architecture

• BackLog = Auxiliary data aggregation layer at device level
– Remote storage and synchronization layer for Linux systems
– Python based, designed for PermaSense CoreStation
– Plugin architecture for extension to custom data sources
– Data multiplexed from plugins to the GSN wrapper over one socket
• Reliable (flow controlled) synchronization to GSN
• Schedulable plugin/script execution, remote controlled by GSN
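One way to picture multiplexing several plugins over a single socket with flow-controlled delivery is a length-prefixed frame plus a bounded window of unacknowledged messages. The sketch below is not the BackLog wire protocol, which these slides do not specify: the header layout, the class name `Backlog`, and the window-based acknowledgement scheme are assumptions chosen for illustration, and the socket itself is left out.

```python
import struct
from collections import deque

# Assumed frame header: plugin id (1 byte), message id (8 bytes),
# payload length (4 bytes), network byte order.
HEADER = struct.Struct("!BQI")


def encode(plugin_id, msg_id, payload):
    """Build one frame for the shared byte stream."""
    return HEADER.pack(plugin_id, msg_id, len(payload)) + payload


def decode_all(buffer):
    """Split a byte buffer into complete frames; return (messages, remainder)."""
    messages = []
    offset = 0
    while offset + HEADER.size <= len(buffer):
        plugin_id, msg_id, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # incomplete frame: wait for more bytes
        messages.append((plugin_id, msg_id, buffer[offset + HEADER.size:end]))
        offset = end
    return messages, buffer[offset:]


class Backlog:
    """Holds messages until acknowledged; at most `window` are in flight."""

    def __init__(self, window=2):
        self.window = window
        self.pending = deque()  # (msg_id, frame) not yet sent
        self.in_flight = {}     # msg_id -> frame, sent but not acknowledged
        self.next_id = 0

    def submit(self, plugin_id, payload):
        msg_id = self.next_id
        self.next_id += 1
        self.pending.append((msg_id, encode(plugin_id, msg_id, payload)))
        return msg_id

    def sendable(self):
        """Return frames that may be sent now without exceeding the window."""
        frames = []
        while self.pending and len(self.in_flight) < self.window:
            msg_id, frame = self.pending.popleft()
            self.in_flight[msg_id] = frame
            frames.append(frame)
        return frames

    def ack(self, msg_id):
        """Receiver confirmed storage; the message can be dropped locally."""
        self.in_flight.pop(msg_id, None)


backlog = Backlog(window=2)
backlog.submit(1, b"gps")
backlog.submit(2, b"wx")
backlog.submit(1, b"gps2")
first_batch = backlog.sendable()
received, rest = decode_all(b"".join(first_batch))
```

Keeping unacknowledged frames in `in_flight` means they can be resent after a connection loss, which is the property that mitigates data loss on an unreliable link.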

GSN New Functionalities

• Data wrappers and virtual sensors (based on BackLog)
– GPS, various formats
– Vaisala WXT520 weather station
– JPEG & RAW image manipulation
– Binary file grabber/storage on file system for large files (image data)
– OpenSense air quality sensors
– syslog-ng based log file grabber (aka remote tail -f /var/log/syslog)
– Dozer beacon generation (command push)
– Schedule BackLog plugin
– SI value conversion
– CamZilla robot control
– …
• GSN/MySQL/SensorViz performance statistics
– Custom virtual sensors to measure DB access timing, processing quantities…

GSN New Functionalities

• Frontend enhancements
– Network topology graphs, table views
– Log file viewer
– Virtual sensor search
– GSN uptime counter on front page
– Automatic device/type detection per deployment for automating web page generation
• Lots of enhancements and bug fixes
• Cacti-based system monitoring (MySQL)

SensorViz Plotting Frontend

• Time series plotting of large data sets
• Backend caching server for different data aggregates
• JavaScript plotting tool for web integration
• Customizable views, selection, pan & zoom
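The idea of caching aggregates for plotting can be sketched as bucketing a time series into fixed intervals and keeping min/mean/max per bucket, computed once per resolution. This is a generic illustration in Python, not SensorViz's implementation; the class name `AggregateCache` and the tuple layout are hypothetical.

```python
def aggregate(points, bucket_seconds):
    """Reduce (timestamp_seconds, value) pairs to (start, min, mean, max) rows."""
    buckets = {}
    for ts, value in points:
        start = ts - ts % bucket_seconds
        if start not in buckets:
            buckets[start] = [value, value, 0.0, 0]  # min, max, sum, count
        b = buckets[start]
        b[0] = min(b[0], value)
        b[1] = max(b[1], value)
        b[2] += value
        b[3] += 1
    return [
        (start, mn, total / count, mx)
        for start, (mn, mx, total, count) in sorted(buckets.items())
    ]


class AggregateCache:
    """Computes each resolution once and serves later requests from memory."""

    def __init__(self, points):
        self.points = list(points)
        self.cache = {}

    def get(self, bucket_seconds):
        if bucket_seconds not in self.cache:
            self.cache[bucket_seconds] = aggregate(self.points, bucket_seconds)
        return self.cache[bucket_seconds]


points = [(0, 1.0), (30, 3.0), (60, 5.0), (119, 7.0), (120, 2.0)]
cache = AggregateCache(points)
per_minute = cache.get(60)
```

A plotting client that zooms out can then request a coarser bucket size and receive a few rows per screen pixel instead of the full series; keeping min and max alongside the mean preserves visible spikes.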

Ideas to be discussed with EPFL team

• MySQL – GSN interface optimizations
– What can be improved?
– Partitioning of large tables?
– MySQL version/parameterization?
• Alerting dashboard
– Better control and overview for alert messaging
– Email is good but there is no overview of configs and status (see tools like Zabbix, Cacti)
• SensorViz integration, improvements
– Migration to other caching technique, in-DB views?
– Other plotting formats
– User interface enhancements
• “Standard” way to monitor system/component performance
– Performance metrics for every VS? Packets, timing, memory, bytes, rates…
– Dependency graph of VS/wrappers
– “traceroute” aka data provenance
• Performance statistics for MySQL DB
– Per VS or per DB: table size, records, Mbytes, total # entries…
– SHOW STATUS from table permasense_pvt