© 2016 IBM Corporation
Perform End-to-End Data Analysis in the Cloud
Building an IoT Ecosystem with Arduino and Bluemix
Dale Mumper
Open Source Analytics Solution Engineer - Industrial
Disclaimer
© Copyright IBM Corporation 2016. All rights reserved.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE
MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED
“AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM'S CURRENT
PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE
FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER
DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY
WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF
ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.
IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM's sole discretion. Information
regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or
functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any
future features or functionality described for our products remains at our sole discretion.
IBM, the IBM logo, ibm.com, Information Management, DB2, DB2 Connect, DB2 OLAP Server, pureScale, System Z, Cognos, solidDB, Informix,
Optim, InfoSphere, and z/OS are trademarks or registered trademarks of International Business Machines Corporation in the United States, other
countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or
™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks
may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and
trademark information” at www.ibm.com/legal/copytrade.shtml
Other company, product, or service names may be trademarks or service marks of others.
Agenda
Bio
Solution Overview
Bluemix Overview
Sensor Board
Node-RED
Cloudant
dashDB
Data Science Experience
Watson Analytics
Bio
Dale Mumper
IBM Open Source Analytics Solution Engineer
Consultant and analytics leader for over 20 years
Background in physics and math
Certifications
- Cloudera Certified Administrator for Apache Hadoop - CCAH
- Cloudera Certified Developer for Apache Hadoop - CCDH
- Microsoft MCSE – Data Platform
- Microsoft MCSE – Business Intelligence
- Oracle Certified Professional - OCP
IoT Defined
“The network of physical devices, vehicles, buildings and other items
embedded with electronics, software, sensors, actuators and network
connectivity that enables these objects to collect and exchange data.”
“The infrastructure of the information society.”
“Every object, device and every familiar part of the traditional home,
is being equipped with smart circuitry.”
“With a trillion sensors embedded in the environment—all connected
by computing systems, software and services—it will be possible to
hear the heartbeat of the Earth, impacting human interaction with the
globe as profoundly as the Internet has revolutionized
communications.”
IoT Market Drivers
USD 157.05 Billion in 2016
USD 661.74 Billion by 2021
Compound Annual Growth Rate (CAGR) of 33%
Impacting all industries
Industry leaders admit they lack a “clear perspective” on the business opportunities
afforded in the IoT arena – the trend remains nascent
2020 could see 30 Billion devices on the global net
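As a quick check on the growth figures above, the compound annual growth rate implied by the two market sizes can be recomputed directly. This is a minimal sketch: the function name is illustrative, and the five-year span is assumed from the two years quoted on the slide.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the slide: USD 157.05 billion (2016) to USD 661.74 billion (2021).
rate = cagr(157.05, 661.74, 2021 - 2016)
print(f"Implied CAGR: {rate:.1%}")
```

The result comes out at roughly one third per year, which is consistent with the 33% quoted above.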
Supplier Attention – open source software and open source hardware,
development tool kits, major vendor support
Technological Advances – ARM Cortex (1/10 the power usage), miniaturized
sensors, declining component costs, faster bandwidth
Increasing Demand - demand for 1st gen. will increase as costs decline and
next generations become more advanced; very price sensitive
Emerging Standards – semiconductor, hardware, networking and software
companies have joined with a number of industry associations and
academic consortiums; common APIs
Phone Sensor Demo
Step 1
• Take out your phone
• Go to the URL on the card
• Write down the Device ID
d:quickstart:phonesensor<Device ID>
Step 2
• ibm.biz/iotqstart
• Enter Device ID
Step 3
• Explore
• Move Phone
Tilt
Rotate
Slow vs. Fast
Environmental Recorder – ER1
Indoor Environmental Monitoring
• Measures and sends data
Temperature (from three different sensors)
Humidity
Air Pressure
Light Levels
LEDs provide operational feedback
• Connects to a local wifi network
Synchronizes time from an NTP source
Gets the real IP address and determines geolocation from IP address
Asks nearest weather station for local forecast
Connects to an MQTT broker and sends data
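The time-synchronization step above comes down to reading a timestamp out of an NTP reply and shifting it from the NTP epoch (1900) to the Unix epoch (1970). The sketch below shows only that conversion in Python; it is not the ER1 sketch itself. The helper that fabricates a reply exists purely for demonstration, and both function names are hypothetical.

```python
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2_208_988_800


def build_fake_ntp_reply(unix_seconds: int) -> bytes:
    """Build a 48-byte NTP-shaped packet for demonstration (not a real server reply)."""
    packet = bytearray(48)
    packet[0] = 0x24  # LI=0, version=4, mode=4 (server)
    # Transmit timestamp: 32-bit seconds at offset 40, 32-bit fraction at offset 44.
    struct.pack_into("!II", packet, 40, unix_seconds + NTP_UNIX_OFFSET, 0)
    return bytes(packet)


def unix_time_from_ntp(packet: bytes) -> int:
    """Extract the transmit timestamp and convert it to Unix seconds."""
    if len(packet) < 48:
        raise ValueError("NTP packets are at least 48 bytes long")
    (ntp_seconds,) = struct.unpack_from("!I", packet, 40)
    return ntp_seconds - NTP_UNIX_OFFSET


reply = build_fake_ntp_reply(1471003668)
print(unix_time_from_ntp(reply))
```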
Use Cases for the ER1
Sleep Therapy
Room Monitoring
Remote Property
Easily add sensors and capabilities
• UV and IR Sensor
• Distance (Ultrasonics and Laser)
• Motion
• Shock
• Vibration
• Rotation
• Tension and Flex
• Soil and Moisture
• GPS Module
• LTE Cellular Wi-Fi
• Solar Power and Battery
Bill of Materials
Arduino MKR1000
• Atmel ATSAMW25 SoC
SAMD21 Cortex M0+ ARM MCU
WINC1500 2.4GHz 802.11 b/g/n Wi-Fi
3.3V
256KB Flash
32KB SRAM
Full-Speed USB w/Embedded Host
Sensors
• Adafruit DS3231
• Adafruit SHT31-D
• Adafruit TSL2561
• Adafruit BMP183
• Adafruit Neopixels
Parts
• LED
• 220ohm resistor
• Full-sized breadboard
• USB A/MicroB Cable
• Jumper Wires, 3”, MM
• Jumper Wires, 6”, MM
Vendors
• adafruit.com
• arduino.cc
• element14.com
• digikey.com
IoT Analytics Ecosystem
IoT + Runtime + Cloudant + dashDB + Spark
REST (HTTP/s) API
IBM dashDB
Schema Discovery
IoT Platform
MQTT
Spark Connector
Arduino MKR1000
Combines the Arduino Zero and a Wi-Fi Shield at a Great Price Point
Atmel SAMD21 Cortex-M0+
• 3.3V
• 256KB Flash
• 32KB SRAM
• Clock Speed 48MHz
8 Digital I/O Pins
• 4 with PWM (pulse width modulated)
6 Analog Input Pins
1 Analog Output Pin
USB connection
Reset button
Wi-Fi
Encryption
Li-Po Battery Charger
1. MCU and Memory
2. Wi-Fi
3. Small Form Factor
4. Lower Cost
SHT31-D Sensor
Sensor made by Sensirion
• 2.5 x 2.5 x 0.9 mm
• temperature range of –40°C to 90°C
• ±2% relative humidity and ±0.3°C accuracy
PCB Board made by Adafruit
• 3V and 5V compliant
• I2C interface
Power Pins
• Vin
2.5 to 5VDC (Volts Direct Current)
• GND
Common Ground
I2C Logic Pins
• SCL
I2C clock
• SDA
I2C data pin
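The values that arrive over the I2C pins above are raw 16-bit words, not degrees or percent. A minimal sketch of the conversion follows, using the formulas published in the Sensirion SHT3x datasheet. The function names are illustrative, and the Fahrenheit helper is included only because the ER1 reports both units.

```python
def sht31_temperature_c(raw: int) -> float:
    """Convert a raw 16-bit SHT3x temperature word to degrees Celsius."""
    return -45.0 + 175.0 * raw / 65535.0


def sht31_humidity_pct(raw: int) -> float:
    """Convert a raw 16-bit SHT3x humidity word to percent relative humidity."""
    return 100.0 * raw / 65535.0


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Standard Celsius-to-Fahrenheit conversion."""
    return temp_c * 9.0 / 5.0 + 32.0


# A mid-scale raw word, chosen only for illustration.
raw_word = 0x6666
temp_c = sht31_temperature_c(raw_word)
print(f"{temp_c:.2f} C = {celsius_to_fahrenheit(temp_c):.2f} F")
```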
TSL2561 Sensor
Sensor made by ams AG
• Light-to-digital converter
• 188ulux to 88,000lux
• Infrared and Full Spectrum diodes
PCB Board made by Adafruit
• 3V and 5V compliant
• I2C interface
Power Pins
• Vin
2.5 to 5VDC (Volts Direct Current)
• GND
Common Ground
I2C Logic Pins
• SCL
I2C clock
• SDA
I2C data pin
Adafruit DS3231 Real-Time Clock (RTC)
Chip made by Maxim Integrated
• DS3231 Real-Time Clock (RTC)
• Temperature-compensated crystal oscillator and crystal
• Long-term accuracy
PCB Board made by Adafruit
• I2C interface
• Optional battery maintains time
Power Pins
• Vin
• GND
I2C Logic Pins
• SCL - I2C clock
• SDA - I2C data pin
BMP183 Sensor
Sensor made by Bosch
• 300 to 1100hPa (+9000m to -500m)
• Enhanced GPS, navigation, weather, vert. velocity
PCB Board made by Adafruit
• 3V and 5V compliant
• SPI interface
Power Pins
• Vin
2.5 to 5VDC (Volts Direct Current)
• GND
Common Ground
SPI Logic Pins
• SCK - Clock
• SDO - Serial Data OUT
• SDI - Serial Data IN
• CS - Chip Select
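The pairing of a pressure range with an altitude range above follows from the international barometric formula, which Bosch's pressure-sensor datasheets use to turn pressure into altitude. The sketch below assumes a standard sea-level pressure of 1013.25 hPa; a real deployment would substitute the local sea-level pressure, and the function name is illustrative.

```python
def pressure_to_altitude_m(pressure_hpa: float, sea_level_hpa: float = 1013.25) -> float:
    """Estimate altitude in metres from barometric pressure (international barometric formula)."""
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))


# The two ends of the sensor's stated range, plus standard sea level.
for hpa in (300.0, 1013.25, 1100.0):
    print(f"{hpa:8.2f} hPa -> {pressure_to_altitude_m(hpa):9.1f} m")
```

Low pressure maps to roughly 9 km of altitude and high pressure to a few hundred metres below sea level, which matches the shape of the range on the slide.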
NeoPixels == TOTALLY COOL
Ring
Jewel
Strips
Stick
Matrix
Turning Sensors into an IoT Device (ER1)
Sensors, Clock and LEDs in Review
Wi-Fi Connectivity
NTP Client
Time and Data Handling
C/C++ Style Floating Point Operations
HTTP Client
MQTT Client
JSON Parsing
ER1 Sketch Version 3.50
• Expects to find the IBM_CLASS 2.4GHz, WPA wireless network
Already has the SSID and the password in the sketch
• Defaults to using the IBM Watson IoT Platform in Quickstart Mode
• Sketch automatically determines the Device ID from the MAC
See your laminated MKR1000 card in your student kit
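To make the Device ID and Quickstart points above concrete, here is a hedged Python sketch of how connection settings could be derived from a MAC address. The device type `MKR1000` and event name `status` are taken from the sample message in this deck. The host name, port, client ID layout and publish topic are recalled from the Watson IoT Platform documentation of the time and should be treated as assumptions, and the function names are hypothetical.

```python
def device_id_from_mac(mac: str) -> str:
    """Normalize a MAC address into a 12-character lowercase hex device ID."""
    return mac.replace(":", "").replace("-", "").lower()


def quickstart_settings(mac: str, device_type: str = "MKR1000", event: str = "status") -> dict:
    """Assemble MQTT settings for Quickstart mode (values are assumptions, see lead-in)."""
    device_id = device_id_from_mac(mac)
    return {
        "host": "quickstart.messaging.internetofthings.ibmcloud.com",  # assumed host
        "port": 1883,  # plain MQTT; Quickstart is assumed to be unauthenticated
        "client_id": f"d:quickstart:{device_type}:{device_id}",
        "publish_topic": f"iot-2/evt/{event}/fmt/json",
    }


settings = quickstart_settings("F8:F0:05:F5:F8:DB")
for key, value in settings.items():
    print(f"{key}: {value}")
```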
Cloud Service Models
IaaS
• Virtual Servers
• Bare Metal Servers
• Network
• Storage
• Load Balancers
PaaS
• Database
• Web Server
• Development Tools
• Runtime Containers
SaaS
• CRM
• Games
• Virtual Desktop
Who Does What?
On-Premise
Applications
Data
Runtime
Middleware
OS
Virtualization
Servers
Storage
Networking
Managed by Client Managed by Provider
IaaS
Applications
Data
Runtime
Middleware
OS
Virtualization
Compute
Storage
Networking
PaaS
Applications
Data
Runtime
Middleware
OS
Virtualization
Compute
Storage
Networking
SaaS
Applications
Data
Runtime
Middleware
OS
Virtualization
Compute
Storage
Networking
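The color coding that separated "Managed by Client" from "Managed by Provider" did not survive in this transcript. The sketch below encodes the conventional split usually shown in this kind of chart, as an assumption rather than a reconstruction of the original slide: the client manages everything on-premise, the top five layers under IaaS, the top two under PaaS, and nothing under SaaS.

```python
# Stack layers from top to bottom, as listed on the slide
# (the On-Premise column labels the "Compute" layer as "Servers").
LAYERS = [
    "Applications", "Data", "Runtime", "Middleware", "OS",
    "Virtualization", "Compute", "Storage", "Networking",
]

# Number of layers, counted from the top, that the client manages (assumed split).
CLIENT_MANAGED_LAYERS = {"On-Premise": 9, "IaaS": 5, "PaaS": 2, "SaaS": 0}


def who_manages(model: str, layer: str) -> str:
    """Return 'Client' or 'Provider' for a layer under a given service model."""
    return "Client" if LAYERS.index(layer) < CLIENT_MANAGED_LAYERS[model] else "Provider"


for model in CLIENT_MANAGED_LAYERS:
    row = ", ".join(f"{layer}={who_manages(model, layer)}" for layer in LAYERS)
    print(f"{model}: {row}")
```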
IBM Cloud
Bluemix is an open-standard, cloud-based platform for building, managing, and
running applications of all types (web, mobile, big data, new smart devices…)
Go Live in Seconds
Zero to running in one click.
Development plans deploy in
seconds. Enterprise plans
deploy in 1-2 days.
DevOps
Development, monitoring,
deployment, and logging tools
allow the developer to run the
entire application.
APIs and Services
A catalog of IBM, third party,
and open source API services
allow the developer to stitch an
application together in minutes.
On-Premises Integration
Build hybrid environments.
Connect to on-premises
assets plus other public and
private clouds.
Flexible Pricing
Sign up in minutes. Pay as
you go and subscription
models offer choice and
flexibility.
Layered Security
IBM secures the platform and
infrastructure and provides
you with the tools to secure
your apps.
IBM Bluemix
Demo – Bluemix Overview
We Are Here
MQTT
This Is Our Destination
IoT + Runtime + Cloudant + dashDB + Spark
REST (HTTP/s) API
IBM dashDB
Schema Discovery
IoT Platform
MQTT
Spark Connector
IBM Watson IoT Starter Platform
1. Catalog > Boilerplates > Internet of Things Platform Starter
2. Fill in Name: <Name of App Here>
3. CREATE
Application is created and staged
• http://<hostname>.mybluemix.net
• Creates a Node.js SDK Container
• Creates a Cloudant NoSQL Database
Node-RED
A visual tool for wiring the Internet of Things
• Browser-based UI for creating flows of events
• Deploys actions in a lightweight runtime
• Based upon Node.js
  – Event-driven, non-blocking model
• Flows stored as JSON, so super easy to share
• Large library available today
• Suitable for server, network, edge and mobile device placement
• Open source project on GitHub
• IBM is a major contributor
• Benefits
  – Rapid Development
  – Simple to use with JSON
  – Simple REST API
  – Simple MQTT messaging
• Contributor Nodes
  – Simple to use other services
MQTT
Machine-to-Machine (M2M)/“Internet of Things” (IoT)
• Lightweight connectivity protocol for publish/subscribe messaging transport
• Small code footprint, limited bandwidth, low power usage
• Minimized packets and efficient distribution to multiple receivers
MQTT v3.1.1 now an OASIS Standard
• Invented by Dr. Andy Stanford-Clark (IBM) and Arlen Nipper (Eurotech)
• MQ Telemetry Transport (ISO/IEC PRF 20922)
MQTT Broker/Servers
• IBM WebSphere MQ Telemetry, MessageSight, Integration Bus
• Mosquitto, Eclipse Paho, Eurotech Everyware Device Cloud, emqttd,
Xively, Moquette, Yunab.io, m2m.io, RabbitMQ, Apache ActiveMQ, HiveMQ
MQTT Client Methods
• Connect, Disconnect, Subscribe, Unsubscribe, Publish
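The five client methods above can be illustrated without a network by a toy in-memory broker. This is a teaching sketch in Python, not a real MQTT implementation: there is no QoS, no retained messages and no wire protocol, and the class and function names are invented for the example. The topic-filter matching follows the MQTT wildcard rules (`+` for one level, `#` for all remaining levels) but ignores the special handling of `$`-prefixed topics.

```python
from collections import defaultdict
from typing import Callable


def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT-style matching: '+' matches one level, '#' matches all remaining levels."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class ToyBroker:
    """In-memory stand-in for a broker, exposing the five basic client methods."""

    def __init__(self) -> None:
        self.clients: set[str] = set()
        # topic filter -> {client_id: callback}
        self.subscriptions: dict[str, dict[str, Callable[[str, str], None]]] = defaultdict(dict)

    def connect(self, client_id: str) -> None:
        self.clients.add(client_id)

    def disconnect(self, client_id: str) -> None:
        self.clients.discard(client_id)
        for subscribers in self.subscriptions.values():
            subscribers.pop(client_id, None)

    def subscribe(self, client_id: str, topic_filter: str,
                  callback: Callable[[str, str], None]) -> None:
        if client_id not in self.clients:
            raise RuntimeError("client must connect before subscribing")
        self.subscriptions[topic_filter][client_id] = callback

    def unsubscribe(self, client_id: str, topic_filter: str) -> None:
        self.subscriptions[topic_filter].pop(client_id, None)

    def publish(self, topic: str, payload: str) -> int:
        """Deliver the payload to every matching subscriber; return the delivery count."""
        delivered = 0
        for topic_filter, subscribers in self.subscriptions.items():
            if topic_matches(topic_filter, topic):
                for callback in list(subscribers.values()):
                    callback(topic, payload)
                    delivered += 1
        return delivered


broker = ToyBroker()
broker.connect("dashboard")
received: list[tuple[str, str]] = []
status_filter = "iot-2/type/+/id/+/evt/status/fmt/json"
broker.subscribe("dashboard", status_filter, lambda t, p: received.append((t, p)))
first = broker.publish("iot-2/type/MKR1000/id/f8f005f5f8db/evt/status/fmt/json", '{"d": {}}')
broker.unsubscribe("dashboard", status_filter)
second = broker.publish("iot-2/type/MKR1000/id/f8f005f5f8db/evt/status/fmt/json", '{"d": {}}')
broker.disconnect("dashboard")
print(first, second, received)
```

One publisher can reach many receivers because delivery is driven by the subscribers' filters, not by the publisher knowing who is listening.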
msg.payload
{
  "topic": "iot-2/type/MKR1000/id/f8f005f5f8db/evt/status/fmt/json",
  "payload": {
    "d": {
      "IBM_IoT_Workshop": "Arduino_MKR1000",
      "recordType": "sensorsRead",
      "DS3231_epoch": 1471003668,
      "DS3231_date": "08-13-2016",
      "DS3231_time": "13:07:48",
      "DS3231_tempC": 28,
      "DS3231_tempF": 82.4,
      "SHT31_tempC": 27.72,
      "SHT31_tempF": 81.94,
      "SHT31_humidity": 45.32,
      "TSL2561_lux": 9,
      "BMP183_hPa": 1004.22,
      "BMP183_tempC": 28.08,
      "BMP183_tempF": 82.55,
      "BMP183_altStatic": 78.98,
      "BMP183_altComputed": 68.09,
      "local_IP": "192.168.0.170",
      "mac_addr": "f8f005f5f8db"
    }
  },
  "deviceId": "f8f005f5f8db",
  "deviceType": "MKR1000",
  "eventType": "status",
  "format": "json",
  "_msgid": "4a43bc63.b5bc44"
}
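A short Python sketch of how a consumer might unpack a message like the one above: the readings live under `payload.d`, and the device type, device ID, event type and format can also be recovered from the topic string. The embedded message is abbreviated to a few fields from the sample, and `parse_topic` is a hypothetical helper name.

```python
import json

# Abbreviated copy of the sample message, keeping only a few of its fields.
raw_message = """
{
  "topic": "iot-2/type/MKR1000/id/f8f005f5f8db/evt/status/fmt/json",
  "payload": {
    "d": {
      "recordType": "sensorsRead",
      "SHT31_tempC": 27.72,
      "SHT31_humidity": 45.32,
      "TSL2561_lux": 9,
      "BMP183_hPa": 1004.22
    }
  }
}
"""


def parse_topic(topic: str) -> dict:
    """Split iot-2/type/<deviceType>/id/<deviceId>/evt/<eventType>/fmt/<format>."""
    parts = topic.split("/")
    fixed = {0: "iot-2", 1: "type", 3: "id", 5: "evt", 7: "fmt"}
    if len(parts) != 9 or any(parts[i] != word for i, word in fixed.items()):
        raise ValueError(f"unexpected topic layout: {topic}")
    return {
        "deviceType": parts[2],
        "deviceId": parts[4],
        "eventType": parts[6],
        "format": parts[8],
    }


message = json.loads(raw_message)
meta = parse_topic(message["topic"])
readings = message["payload"]["d"]
print(meta["deviceId"], readings["recordType"], readings["SHT31_tempC"])
```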
Demo – Node-RED
ER1 Message Payloads
deviceStart
ipapiFetch
localWeather
sensorRead
badJSON
These are all placed into one NoSQL database
deviceStart
ipapiFetch
localWeather
sensorRead
Cloudant – NoSQL Database as a Service
Cloudant delivers a fully-managed database in service to the Analytics, App, and API economy
Powerful DBaaS Operational NoSQL JSON store
Master-less architecture for maximum scalability & availability
Advanced APIs
REST (HTTPS) API
Replication & synchronization
Geo-load balancing
Incremental MapReduce indexes
Military-grade Geospatial indexes
Lucene full-text search
Offline access to mobile apps & data
A fully-managed NoSQL database layer that can be developed & deployed in days
Spark Integration (Spark SQL)
dashDB Integration (Analytics)
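To give a feel for the REST (HTTPS) API mentioned above, the sketch below builds two requests with Python's standard library: one that stores a JSON document (`POST /<database>`) and one that runs a Cloudant Query (`POST /<database>/_find`). The requests are constructed but deliberately not sent. The account URL, database name, credentials and document fields are placeholders invented for this example.

```python
import base64
import json
import urllib.request


def cloudant_post(account_url: str, database: str, path: str, body: dict,
                  username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST to a Cloudant database."""
    url = f"{account_url.rstrip('/')}/{database}{path}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


# Placeholder values: none of these come from the presentation.
ACCOUNT = "https://example-account.cloudant.com"
DATABASE = "er1"

insert_req = cloudant_post(ACCOUNT, DATABASE, "",
                           {"recordType": "sensorsRead", "SHT31_tempC": 27.72},
                           "user", "secret")
query_req = cloudant_post(ACCOUNT, DATABASE, "/_find",
                          {"selector": {"recordType": "sensorsRead"}, "limit": 10},
                          "user", "secret")

for req in (insert_req, query_req):
    print(req.get_method(), req.full_url)
# Sending would be: urllib.request.urlopen(req), against a real account.
```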
Demo – Cloudant
Edge to Warehouse
Cloudant sits on the Edge of Cloud
• Fast, minimal latency, scalable
• Transactional
• Not the place for long-term storage
• Not the place for analytics
Move IoT data to a warehouse
• Basic business intelligence
• Connect to other sources of data
• The start of analytics journey
dashDB on Bluemix
• Data Warehouse as a Service
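The kind of "basic business intelligence" the warehouse step enables can be sketched with plain SQL. The example below uses Python's built-in SQLite as a stand-in for dashDB, which is an assumption made only so the snippet runs anywhere. The table layout is hypothetical, and apart from the first row, which reuses values from the sample message, the rows are made-up illustration data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensor_read (device_id TEXT, epoch INTEGER, temp_c REAL, humidity REAL)"
)

rows = [
    ("f8f005f5f8db", 1471003668, 27.72, 45.32),  # values from the sample message
    ("f8f005f5f8db", 1471003728, 27.80, 45.10),  # made-up follow-up reading
    ("a1b2c3d4e5f6", 1471003670, 22.10, 51.00),  # made-up second device
]
conn.executemany("INSERT INTO sensor_read VALUES (?, ?, ?, ?)", rows)

# Per-device summary: reading count, average temperature, peak humidity.
summary = conn.execute(
    """
    SELECT device_id, COUNT(*), ROUND(AVG(temp_c), 2), MAX(humidity)
    FROM sensor_read
    GROUP BY device_id
    ORDER BY device_id
    """
).fetchall()

for device_id, count, avg_temp, max_humidity in summary:
    print(device_id, count, avg_temp, max_humidity)
```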
IBM dashDB – Analytics Warehouse as a Service
For apps that need:
• Elastic scalability
• High availability
• Data model flexibility
• Data mobility
• Text search
• Geospatial
Available as:
• Fully managed DBaaS
• On-premises private cloud
• Hybrid architecture
BLU Acceleration
Netezza In-Database
Analytics
Cloudant NoSQL Integration
In-database analytics capabilities for best performance atop a fully-managed warehouse
dashDB MPP
Fully-managed data warehouse on cloud
Choice of SoftLayer or Amazon Web Services
BLU Acceleration columnar technology +
Netezza in-database analytics
BLU in-memory processing, data skipping, actionable
compression, parallel vector processing, “Load & Go”
administration
Netezza predictive analytic algorithms
Fully integrated RStudio & R language
Oracle compatibility
Massively Parallel Processing (MPP)
On disk data encryption and
secure connectivity
for
Analytics
Demo – dashDB
Replicating Cloudant JSON Data into dashDB
Cloudant’s Schema Discovery Process (SDP) translates JSON documents into a schema (or set of tables) that dashDB understands
SDP maintains continuous
synchronization from
Cloudant to dashDB
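The idea of translating JSON documents into a relational schema can be sketched as follows. This is not Cloudant's actual Schema Discovery Process, whose rules are not described in this deck. It is a simplified Python illustration that flattens nested keys into column names and picks a column type wide enough for every value seen. The type names and the underscore naming convention are assumptions.

```python
def flatten(doc: dict, prefix: str = "") -> dict:
    """Flatten nested objects into a single level, joining keys with underscores."""
    columns = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            columns.update(flatten(value, prefix=f"{name}_"))
        else:
            columns[name] = value
    return columns


def sql_type(value) -> str:
    """Map a Python value to an illustrative SQL column type."""
    if isinstance(value, bool):  # checked before int because bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    return "VARCHAR"


def discover_schema(docs: list) -> dict:
    """Infer one column type per flattened key across all documents."""
    schema = {}
    for doc in docs:
        for column, value in flatten(doc).items():
            new_type = sql_type(value)
            old_type = schema.get(column)
            if old_type is None or old_type == new_type:
                schema[column] = new_type
            elif {old_type, new_type} == {"BIGINT", "DOUBLE"}:
                schema[column] = "DOUBLE"  # widen integers to floating point
            else:
                schema[column] = "VARCHAR"  # fall back to text on any other conflict
    return schema


docs = [
    {"d": {"SHT31_tempC": 27.72, "TSL2561_lux": 9}},
    {"d": {"TSL2561_lux": 9.5, "mac_addr": "f8f005f5f8db"}},
]
schema = discover_schema(docs)
for column, column_type in schema.items():
    print(f"{column} {column_type}")
```

Scanning many documents before fixing the schema matters because JSON stores do not force every document to carry the same fields or types.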
Demo – Replication and SQL
Tailored Experiences For Users Collaborating Together
Data Engineer: Architects how data is organized & ensures operability
Data Scientist: Gets deep into the data to draw hidden insights for the business
Business Analyst: Works with data to apply insights to the business strategy
App Developer: Plugs into data and models & writes code to build apps
Workflow steps:
• Ingest data
• Transform: clean
• Create and build model
• Evaluate
• Deliver and deploy model
• Communicate results
• Understand problem and domain
• Explore and understand data
• Transform: shape
Stages: INPUT, ANALYSIS, OUTPUT
Tools: Data Connect, Data Science Experience, Watson Analytics, Bluemix
What is a “Notebook”?
Pen and Paper
Pen and paper has long provided the rich experience that scientists need to document progress through notes and drawings:
– Expressive
– Cumulative
– Collaborative
Notebooks
Notebooks are the digital equivalent of the “pen and paper” lab notebook, enabling data scientists to document reproducible analysis:
– Markdown and visualization
– Iterative exploration
– Easy to share
Web-Based Notebooks…
Notebooks:
“interactive computational environment, in which you can combine
code execution, rich text, mathematics, plots and rich media”
Jupyter
• Based on IPython
• Supports multiple interpreters
• Python, Scala, R
Zeppelin
• Apache incubator project
• Supports multiple interpreters
• Python, Scala, others
Data Scientist
&
Notebooks
Introducing the Data Science Experience - DSX
Currently in Public Beta
http://datascience.ibm.com
Learn
Built-in learning to get started or go the distance with advanced tutorials
Create
The best of open source and IBM value-add to create state-of-the-art data products
Collaborate
Community and social features that provide meaningful collaboration
Powered by
IBM Data Science Experience
Core Attributes of the Data Science Experience
Powered by IBM DataWorks in the Cloud
Community
• Find tutorials and datasets
• Connect with Data Scientists
• Ask questions
• Read articles and papers
• Fork and share projects
Open Source
• Code in Scala/Python/R/SQL
• Jupyter and Zeppelin* Notebooks
• RStudio IDE and Shiny apps
• Apache Spark
• Your favorite libraries
IBM Added Value
• Data Shaping/Pipeline UI*
• Auto-data preparation and modeling*
• Advanced Visualizations*
• Model management and deployment*
• Documented Model APIs*
• Spark as a Service
* DSX product roadmap items
Demo – Data Science Experience
IBM Watson Analytics - Smart Data Discovery in the Cloud
Designed to support the business professional’s analytics process so it’s easy to engage
with and find meanings and patterns in your data in minutes.
Data prep made easy
Guided exploration
Understand outcomes
Share insights
All the benefits of advanced analytics without the complexity
Demo – Watson Analytics
IBM investment into Apache Spark
Foster Community
• Educate 1M+ data scientists and engineers via online courses
• Sponsor AMPLab, creators and evangelists of Spark
Infuse the Portfolio
• Integrate Spark throughout portfolio
• 3,500 employees working on Spark-related topics
• Spark however customers want it – standalone, platform or products
Contribute to the Core
• Launch Spark Technology Center (STC), 300 engineers
• Open source SystemML
• Partner with Databricks
"It's like Spark just got blessed by the enterprise rabbi."
Ben Horowitz, Andreessen Horowitz
Source: https://www-03.ibm.com/press/us/en/pressrelease/47107.wss
IBM has the largest investment in Spark of any company in the world
IBM Spark
IBM Spark Technology Center
• Launched in June of 2015
• Goal to hire 300 Engineers.
• Goal to contribute to the Apache Spark community
• Contributed SystemML
technology to Apache community
• STC continues to grow...
IBM Contributes to core Apache Spark Project
www.spark.tc
http://www.spark.tc/blog/
IBM driving SQL and Machine Learning innovation
Big Data University
http://bigdatauniversity.com/
Foster Community - Free Education
Sign up to learn more!
Webinars, Meetups, Hands-on Labs
Learning Resources
Twitter: @data_gurus
http://ibm.biz/datagurus
Raffle!
Fill out the paper form
and drop it in the box.
Two books being given away!
Dale Mumper Open Source Analytics Solution Engineer - Industrial