TRANSCRIPT
Making Every Drop Count: How i2O Addresses the Water Crisis with the IoT and Apache Cassandra
Mike Williams, Software and IT Director, i2O Water
Thank you for joining. We will begin shortly.
Webinar Housekeeping
• All attendees are placed on mute
• Input questions at any time using the online interface
Agenda
1 About i2O Water & The State of Water Consumption Today
2 Database Technology for Time Series and IoT
3 Relational vs NoSQL
4 Why NoSQL and Cassandra?
5 Impact of Cassandra
6 Q&A
About i2O Water
• Smart pressure management systems for water distribution networks
• Hardware/Firmware/Software
• IoT
• Reduces leakage, burst frequency and energy waste
• Saves over 235 million litres of water per day around the world
Water Consumption vs. Supply
• A staggering 46 billion litres (~12.15 billion US gallons) of drinking water are lost globally every day
• It’s not a problem restricted only to the developing world either – Montreal, for example, loses 40% of the water it produces
• A projected 40% global shortfall between water demand and supply by 2030
• The World Economic Forum ranked water crises as the top global risk (by impact) in its 2015 Global Risks Report
The Challenge with Relational Databases
• Massive volumes of time-series data (1.5 TB and growing) that need to be stored and analyzed in close to real time
• Low-energy, battery-powered devices
– Must be efficient in protocol design
– Must be efficient in message sizes
• The relational database (SQL Server) couldn't adequately handle time-series data and IoT needs at scale
– Re-indexing tables caused loss of performance
• The need to scale without impacting performance or availability
• Migration from the existing data management platform
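To illustrate why message size matters for low-energy, battery-powered devices, here is a minimal sketch comparing a compact JSON payload with a fixed-layout binary frame built with Python's struct module. The field layout (device id, Unix timestamp, pressure in millibar) and the function names are hypothetical examples, not i2O's actual protocol or message encoding.

```python
import json
import struct

# Hypothetical reading layout (NOT i2O's actual wire format):
#   device_id     : unsigned 32-bit int
#   timestamp     : unsigned 32-bit int (Unix seconds)
#   pressure_mbar : unsigned 16-bit int (pressure in millibar)
# "<" means little-endian with no padding, so a frame is 4 + 4 + 2 = 10 bytes.
READING_FORMAT = "<IIH"


def encode_binary(device_id: int, timestamp: int, pressure_mbar: int) -> bytes:
    """Pack one reading into a fixed 10-byte frame."""
    return struct.pack(READING_FORMAT, device_id, timestamp, pressure_mbar)


def decode_binary(frame: bytes) -> tuple[int, int, int]:
    """Unpack a frame produced by encode_binary."""
    return struct.unpack(READING_FORMAT, frame)


def encode_json(device_id: int, timestamp: int, pressure_mbar: int) -> bytes:
    """The same reading as compact JSON (no spaces), for comparison."""
    payload = {"device_id": device_id, "timestamp": timestamp, "pressure_mbar": pressure_mbar}
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


if __name__ == "__main__":
    reading = (15001, 1_450_000_000, 3250)
    binary = encode_binary(*reading)
    text = encode_json(*reading)
    print("binary bytes:", len(binary), "json bytes:", len(text))
    assert decode_binary(binary) == reading
```

The trade-off is that a fixed layout must be versioned carefully by both firmware and server, whereas JSON is self-describing but costs several times more bytes (and radio time) per reading.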
Key Database Requirements
• Strong real-time search and query capability
• Support high reliability and security
• Structured & unstructured real-time data
• Flexible data model with affordable scalability
How Cassandra Compared to RDBMS
• Wide rows allowed better modeling of time series: time sharding and rollup aggregations
• Smaller data footprint on “disk”
• Much faster write performance
• Faster read performance
• Scalable by design and architecture
• Encourages us to duplicate (denormalize) data
• Model the data around the queries you need to run efficiently
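One common way to express "time sharding" in Cassandra is a composite partition key of device plus day, so each wide partition holds one device-day of readings and never grows without bound, with rollups precomputed rather than calculated at query time. The sketch below assumes a hypothetical table named readings_by_device_day (the CQL is kept in a string for reference only) and shows the partition-key bucketing and an hourly rollup in plain Python; it is an illustration, not i2O's schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime, timezone

# Hypothetical schema for reference: (device_id, day) is the partition key,
# ts is the clustering column, newest readings first within a partition.
READINGS_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS readings_by_device_day (
    device_id text,
    day       date,
    ts        timestamp,
    pressure  double,
    PRIMARY KEY ((device_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
"""


@dataclass(frozen=True)
class Reading:
    device_id: str
    ts: datetime  # expected to be timezone-aware
    pressure: float


def partition_key(r: Reading) -> tuple[str, date]:
    """One partition per device per UTC day (the time 'shard')."""
    return (r.device_id, r.ts.astimezone(timezone.utc).date())


def hourly_rollup(readings: list[Reading]) -> dict:
    """Aggregate readings into (device_id, UTC hour) -> (count, min, max, mean)."""
    buckets: dict = defaultdict(list)
    for r in readings:
        hour = r.ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
        buckets[(r.device_id, hour)].append(r.pressure)
    return {
        key: (len(vals), min(vals), max(vals), sum(vals) / len(vals))
        for key, vals in buckets.items()
    }
```

Writing both the raw reading and its rollup row is the "duplicate data" trade-off from the slide: extra writes, which Cassandra handles cheaply, in exchange for single-partition reads for each query.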
How Cassandra Compared to other NoSQL
• Compared it to HBase
– Similar architectures
– C* (Cassandra) was a better-supported product
• Compared it to MongoDB
– C* was much more scalable by design
– Concerns over sharding and lossy writes
• Compared it to RavenDB
– C* was far more robust
– C* was far more performant
– C* was better supported
Why NoSQL and Apache Cassandra?
• Database platform built for IoT applications
• Optimized for storage and retrieval of time-series data
• Streaming and real-time analytics (Spark integration)
• Best performance in internal benchmark testing
• Strong search and real-time query capability on unstructured data (Solr integration)
• Designed for 100% uptime through a masterless architecture and multi-datacenter replication
• Linear scalability across commodity hardware makes supporting high-velocity data a reality (and affordable)
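The masterless, linearly scalable design rests on a token ring: every partition key hashes to a token, each node owns a range of tokens, and replicas are the next nodes clockwise. The simplified sketch below assumes one token per node (no vnodes) and placement similar to Cassandra's SimpleStrategy; it uses MD5 as a stand-in hash, whereas Cassandra's default partitioner is Murmur3. TokenRing and token are hypothetical names for illustration.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a partition key onto a 64-bit token ring.

    MD5 is a stand-in chosen for this sketch; Cassandra's default
    partitioner (Murmur3Partitioner) uses a different hash.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    """Simplified ring: one token per node, no vnodes, SimpleStrategy-like placement."""

    def __init__(self, nodes: list[str], replication_factor: int = 3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed the number of nodes")
        self.rf = replication_factor
        # Each node owns the token range ending at its own token.
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str) -> list[str]:
        """Primary owner is the first node whose token is >= the key's token
        (wrapping around); the next RF - 1 nodes clockwise hold the other copies."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


if __name__ == "__main__":
    ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
    print(ring.replicas("dev-1:2016-03-01"))
```

Because any node can compute the same placement, any node can coordinate a request, so there is no master to fail. Adding a node only takes over the token range between its token and its predecessor's, which is why capacity grows roughly linearly with commodity nodes.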
i2O Technical Environment
• SSL Offloading
• Load Balancing
• In-memory Cache
• Development Stack
• Message Broker
• Queuing
• Protocols: AMQP
• Message Encoding
• Data Stores: Cassandra, PostgreSQL, PostGIS
The Results
• >15,000 devices
• >70 water utilities globally
• +235M litres of water saved per day
• 20+ countries
• >99.9% uptime since launch, even during upgrades and node failures
• Security audited/tested
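Staying up "even during upgrades and node failures" follows from replication plus tunable consistency: a QUORUM request needs floor(RF/2) + 1 replicas to respond, so the remaining replicas of a partition can be down (for example, one node at a time in a rolling upgrade). The replication factor and consistency levels i2O used are not stated in the talk; the sketch below simply works through the arithmetic, with an RF of 3 assumed as a common example. The function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a QUORUM read or write: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas of a partition that can be unavailable while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


def is_strongly_consistent(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """Reads are guaranteed to overlap the latest acknowledged write when R + W > RF."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    for rf in (1, 3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, tolerates {tolerated_failures(rf)} down replica(s)")
    # With RF=3, QUORUM writes (2) and QUORUM reads (2) overlap: 2 + 2 > 3.
    print(is_strongly_consistent(quorum(3), quorum(3), 3))
```

With RF=3, one replica of every partition can be offline while QUORUM reads and writes continue and still see each other's results, which is what allows nodes to be upgraded or lost one at a time without downtime.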
Recommendations
• Think about security at all times, everywhere
• Think about scalability early
• Think about APIs and protocols early
• Use test infrastructure to practice changes to your technology
– Cheap to do; consider container technologies such as Docker