Big Data in NATO and Your Role
DESCRIPTION
"Big Data" is term heard more and more in industry – but what does it really mean? There is a vagueness to the term reminiscent of that experienced in the early days of cloud computing. This has led to a number of implications for various industries and enterprises. These range from identifying the actual skills needed to recruit talent to articulating the requirements of a "big data" project. Secondary implications include difficulties in finding solutions that are appropriate to the problems at hand – versus solutions looking for problems. This presentation will take a look at Big Data and offer the audience with some considerations they may use immediately to assess the use of analytics in solving their problems. The talk begins with an idea of how big "Big Data" can be. This leads to an appreciation of how important "Management Questions" are to assessing analytic needs. The fields of data and analysis have become extremely important and impact nearly all facets of life and business. During the talk we will look at the two pillars of Big Data – Data Warehousing and Predictive Analytics. Then we will explore the open source tools and datasets available to NATO action officers to work in this domain. Use cases relevant to NATO will be explored with the purpose of show where analytics lies hidden within many of the day-to-day problems of enterprises. The presentation will close with a look at the future. Advances in the area of semantic technologies continue. The much acclaimed consultants at Gartner listed Big Data and Semantic Technologies as the first- and third-ranked top technology trends to modernize information management in the coming decade. They note there is an incredible value "locked inside all this ungoverned and underused information." 
HQ SACT can leverage this powerful analytic approach to capture requirement trends when establishing acquisition strategies, monitor Priority Shortfall Areas, prepare solicitations, and retrieve meaningful data from archives.
TRANSCRIPT
BIG DATA IN NATO: WHAT IT MEANS TO YOU
Jay Gendron
jaygendron
October 29, 2014
U
Our Journey
• BIG DATA
– What is BIG?
– What is DATA?
• Two Workhorses of Big Data
– Enterprise Data Warehousing
– Predictive Analytics
• Weaponry Available
– Open Source Tools
– Open Source Data
• Use Cases
• Future Trends
BIG DATA: HOW BIG IS BIG?
Need something small…
20lb copy paper = 0.004” = 0.1 mm
Let 1 byte = 1 paper thickness
Image: http://pencilgrinder.wordpress.com/
01000010 01101001 01100111 00100000 01000100 01100001 01110100 01100001 00100000 01101001 01101110 00100000 01001110 01000001 01010100 01001111 00111010 00100000 01010111 01101000 01100001 01110100 00100000 01101001 01110100 00100000 01001101 01100101 01100001 01101110 01110011 00100000 01110100 01101111 00100000 01011001 01101111 01110101 00101110
This is 39 bytes…or in our thinking…
a pile of paper 39 pages thick
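The bit string on this slide can be decoded to check the byte count. The sketch below assumes each 8-bit group is one ASCII character; the variable names are illustrative and do not come from the slides.

```python
# Bit string copied from the slide, one 8-bit group per character.
bits = (
    "01000010 01101001 01100111 00100000 01000100 "
    "01100001 01110100 01100001 00100000 01101001 "
    "01101110 00100000 01001110 01000001 01010100 "
    "01001111 00111010 00100000 01010111 01101000 "
    "01100001 01110100 00100000 01101001 01110100 "
    "00100000 01001101 01100101 01100001 01101110 "
    "01110011 00100000 01110100 01101111 00100000 "
    "01011001 01101111 01110101 00101110"
)

# Turn each group into its integer value, then pack the values as bytes.
payload = bytes(int(group, 2) for group in bits.split())

# Assumption: the bytes are plain ASCII text.
message = payload.decode("ascii")

print(message)       # Big Data in NATO: What it Means to You.
print(len(payload))  # 39
```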
So 1 MB is a million sheets thick…
1 MB
Imagine the front square
Completely covered with 10,000 sheets
Stacked 100 sheets high
That’s under one-half inch thick
…1 GB?
33 feet tall
10 meters
1 GB
266 feet tall
81 meters
8 GB
So what does 1 TB look like?
6.3 miles (10 km) high. Plane! Duck!
1 TB
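The paper-stack heights on the preceding slides can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes decimal prefixes (1 MB = 1,000,000 bytes, as the "million sheets" slide implies), one sheet per byte at 0.004 inch, and 10,000 sheets laid side by side on the square. The function and constant names are illustrative.

```python
SHEET_THICKNESS_IN = 0.004   # from the slide: 20 lb copy paper
SHEETS_PER_LAYER = 10_000    # sheets covering the square, side by side
INCHES_PER_FOOT = 12
FEET_PER_MILE = 5280
METERS_PER_INCH = 0.0254


def stack_height_inches(n_bytes):
    """Height of the pile when every byte is one sheet of paper."""
    layers = n_bytes / SHEETS_PER_LAYER
    return layers * SHEET_THICKNESS_IN


# Sizes shown on the slides, using decimal prefixes (an assumption).
sizes = {"1 MB": 10**6, "1 GB": 10**9, "8 GB": 8 * 10**9, "1 TB": 10**12}

for label, n_bytes in sizes.items():
    inches = stack_height_inches(n_bytes)
    feet = inches / INCHES_PER_FOOT
    miles = feet / FEET_PER_MILE
    meters = inches * METERS_PER_INCH
    print(f"{label}: {inches:,.1f} in = {feet:,.1f} ft = {miles:.2f} mi = {meters:,.1f} m")
```

Under these assumptions the results line up with the slides: about 0.4 inch for 1 MB, about 33 feet for 1 GB, about 266 feet for 8 GB, and about 6.3 miles for 1 TB.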
BIG data
There is no technical definition
3 V’s = Volume, Velocity, Variety
“Big Data is at the heart of modern science and business…the necessity of grappling with Big Data, and the desirability of unlocking the information hidden within it, is now a key theme in all the sciences – arguably the key scientific theme of our times.”
- Francis X. Diebold, University of Pennsylvania
A Personal Perspective on the Origin(s) and Development of “Big Data”: The Phenomenon, the Term, and the Discipline (2012)
Laney, D. (2001)
Volume
Large Synoptic Survey Telescope (LSST)
40TB/day
100+PB in 10-year lifetime
Illumina HiSeq 2000 DNA Sequencer
~1TB/day; 30 TB/month
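The volume figures on this slide can be cross-checked with simple rate arithmetic. The sketch below assumes both instruments run every day, uses decimal prefixes (1 PB = 1,000 TB), and treats a month as 30 days. These are simplifying assumptions, not operating figures from the slide's source.

```python
TB_PER_PB = 1_000      # decimal prefixes (an assumption)
DAYS_PER_YEAR = 365    # leap days ignored

# LSST: 40 TB/day over a 10-year lifetime.
lsst_tb_per_day = 40
lsst_lifetime_years = 10
lsst_total_pb = lsst_tb_per_day * DAYS_PER_YEAR * lsst_lifetime_years / TB_PER_PB

# Illumina HiSeq 2000: roughly 1 TB/day.
hiseq_tb_per_day = 1
hiseq_tb_per_month = hiseq_tb_per_day * 30

print(f"LSST: about {lsst_total_pb:.0f} PB over {lsst_lifetime_years} years")
print(f"HiSeq 2000: about {hiseq_tb_per_month} TB per 30-day month")
```

An uninterrupted 40 TB/day works out to about 146 PB over ten years, which is consistent with the "100+ PB" on the slide.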
Images: https://d396qusza40orc.cloudfront.net/datasci/lecture_slides/week1/005_escience.pdf
250 miles Exosphere
186 miles Thermosphere
25 miles Mesosphere
6 miles Troposphere
40 TB: 10,000 x 4,000,000,000 sheets high
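Applying the same paper model to one day of LSST data shows why the atmosphere appears on this slide. The sketch assumes the 0.004 inch sheet, 10,000 sheets per layer, and decimal terabytes. The altitude markers are copied from the slide as given.

```python
SHEET_THICKNESS_IN = 0.004
SHEETS_PER_LAYER = 10_000
INCHES_PER_MILE = 12 * 5280

forty_tb = 40 * 10**12                      # bytes, decimal prefix assumed
sheets_high = forty_tb // SHEETS_PER_LAYER  # height of the pile in sheets
height_miles = sheets_high * SHEET_THICKNESS_IN / INCHES_PER_MILE

# Altitude markers in miles, as listed on the slide.
altitude_markers_miles = {
    "Troposphere": 6,
    "Mesosphere": 25,
    "Thermosphere": 186,
    "Exosphere": 250,
}
layers_passed = [
    name for name, altitude in altitude_markers_miles.items()
    if height_miles >= altitude
]

print(f"{sheets_high:,} sheets high = {height_miles:,.1f} miles")
print("Markers passed:", ", ".join(layers_passed))
```

On these assumptions the pile is roughly 250 miles tall, so it passes every marker on the slide.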
How Big is the Internet?
Size of the Internet as of 31st Dec 2013
14.3 Trillion - Webpages, live on the Internet.
48 Billion - Webpages indexed by Google Inc.
14 Billion - Webpages indexed by Microsoft's Bing.
672 Exabytes - 672,000,000,000 Gigabytes (GB) of accessible data.
Source: http://www.factshunt.com/2014/01/total-number-of-websites-size-of.html
1 EB = 1,000,000,000,000,000,000 bytes = 1 x 10^18 bytes
Velocity
43.6 EB = Total Internet Traffic in 2013
Volume + Velocity + Variety
Image: http://www.mediabistro.com/alltwitter/files/2011/06/internet-60-seconds-infographic.jpg
DATA IS…
…that which aids Decision Making
Data is…
The “WebMaster”
Image: http://www.fivem.be/
The “DataMaster”
• Hiring a room of PhDs won’t solve Big Data
• They have a role…as does IT
• Ultimately Big Data will also be a team effort like the web buildout
• …and You have a role on that team
Image: http://www.fivem.be/
BIG DATA: THE WORKHORSES
Difficult Data is more apt
Images: Elephant - http://www.marcolotz.com/?p=77 Word Cloud - http://www.fotolia.com/id/36647313?by=serie
Enterprise Data Warehousing
Predictive Analytics
Enterprise Data Warehousing
• What began with MapReduce in 2004
• Evolved in open source like Hadoop
• Permanent contributions of evolution:
– Fault tolerance – running on many machines and accounting for failures
– Schema-on-Read – more flexibility in working with data in different forms
– User Defined Functions – giving developers more freedom in where to place queries
Source: https://class.coursera.org/datasci-002/lecture/15
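To make the MapReduce idea concrete, here is a minimal single-process sketch of a word count written as map, shuffle, and reduce phases. It is an illustration of the programming model only: it does not use the Hadoop API, it runs on one machine, and it has none of the fault tolerance listed above. The function names and sample documents are made up for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


# Hypothetical input; in a real cluster each document could sit on a different machine.
documents = [
    "big data in NATO",
    "big data is a team effort",
    "data aids decision making",
]

mapped = [pair for doc in documents for pair in map_phase(doc)]
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())

print(counts["data"])  # 3
print(counts["big"])   # 2
```

Because the map function reads raw text and decides the structure only at processing time, the sketch also hints at what "Schema-on-Read" means, and the user-written map and reduce functions play the role of user defined functions.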
Predictive Analytics
• Business Intelligence
• Statistics
• Visualization
• Programming
• Machine Learning
Image: http://www.ted.com/talks/nate_silver_on_race_and_politics?language=en
Desired End State
A team approach
Predict
BI
Stats
Analyst
Viz
S/W + scale
+ algorithm
+ statistics
+ programming + data products
Impact of the Phenomenon
Images: Galileo - http://www.crystalinks.com/galileo.html; Formulae - https://msschwarzeducationstation.wordpress.com/page/2/; Computers - http://www.utsystem.edu/blog/2011/09/26/ut-austin-awarded-50-million-build-faster-more-powerful-supercomputer; Book Cover - http://radar.oreilly.com/2011/09/building-data-science-teams.html
Theoretical
Computational
Empirical +
Empirical
Human-Computer Symbiosis
Sankar, S. (2012, June). The rise of human-computer cooperation [TEDGlobal 2012]. Podcast retrieved from https://www.ted.com/talks/shyam_sankar_the_rise_of_human_computer_cooperation?language=en
People and Culture Count
People are “tinkerers” & “hobbyists”
We ALL have 1 thing in common
WE ARE ALL DIFFERENT!
Experience
Perspective
THE WEAPONRY: OPEN SOURCES
Source: Sankar (2012). https://www.ted.com/talks/shyam_sankar_the_rise_of_human_computer_cooperation?language=en
Repurposing Data
Open Source Tools
Visual Text Analytics
Image: http://hermeneuti.ca/voyeur/tools
Open Datasets
Again…team approach
SCIENCE
• Stats
• S/W
DOMAINS
• SME
• BI
• Analyst
INFO TECH
• DBA
• Viz
Skill Building
• Free Courses
• Meetups
• Hackathons
• Podcasts
Image: http://www.amazon.com/Data-Science-Business-data-analytic-thinking/dp/1449361323
…to make the point
Fawcett, T. & Provost, F. (2013, August 9). Data Science for Business. Retrieved from https://www.safaribooksonline.com/library/view/data-science-for/9781449374273/.
…on page 13
DATA USE CASES
Remember…You have a role
Informal poll at Univ of WA:
How much time do you spend “handling data” as opposed to “doing science”?
Most common response: 90%
Image: http://www.ibmbigdatahub.com/infographic/infographic-big-data-exploration
Image: http://www.ibmbigdatahub.com/infographic/infographic-enhanced-360-view-customer
Image: http://www.ibmbigdatahub.com/infographic/security-intelligence-extension
Fraud Detection
Image: http://www.ibmbigdatahub.com/infographic/countering-fraud-big-data-world
Human-Computer Symbiosis
Sankar, S. (2012, June). The rise of human-computer cooperation [TEDGlobal 2012]. Podcast retrieved from https://www.ted.com/talks/shyam_sankar_the_rise_of_human_computer_cooperation?language=en
Image: http://www.ibmbigdatahub.com/infographic/operations-analytics
Data Warehouse Augmentation
Existing + NEW
Operational Efficiencies
Images: Arrow - http://canadawebservices.com/7-powerful-ways-increase-website-traffic/; Explore - http://www.keadventure.com/page/explore_more!.html
More Data
Leveraging Use Cases
Business and Commerce
• Big Data Exploration
• 360 Degree View
• Security & Intelligence
• Operational Analysis
• Data Warehouse Augmentation
NATO Enterprise
• FMN (“experiment” and data mining)
• Enterprise (supporting commands)
• Cyber Defence (threat tactics)
• C2 (requirements text analysis)
• Ent. Architecture & Technology (req’ts)
FUTURE TRENDS
According to Gartner
Image: http://www.forbes.com/sites/gilpress/2014/08/18/its-official-the-internet-of-things-takes-over-big-data-as-the-most-hyped-technology/
According to IBM
• More Analytics – Less Gut
• Data security and privacy
• Leaders with data knowledge
• Data-centric applications
• Integrating internal and external
• Investments in platforms
Cho, I. (2013, February 3). 6 trends in big data and analytics [IBM Big Data Hub]. Podcast retrieved from http://www.ibmbigdatahub.com/podcast/6-trends-big-data-and-analytics
An example
Visual Analytics
Koblin, A. (2011, March). Visualizing ourselves ... with crowd-sourced data [TED2011]. Podcast retrieved from http://www.ted.com/talks/aaron_koblin?language=en
Art and science meet
Miebach, N. (2011, July). Art made of storms [TEDGlobal 2011]. Podcast retrieved from http://www.ted.com/talks/nathalie_miebach?language=en
Summary
• Big Data – has implications
• Big Data = Data + Analytics
• Open Source – tools and data
• Use Case – leverage others’ results
• Future
– More analytics and applications
– Need for data fluency among managers
– Need processes to encourage exploring
You know WHAT
…the SO WHAT
NOW WHAT?
Empower
Make apps
Find data
References
Cho, I. (2013, February 3). 6 trends in big data and analytics [IBM Big Data Hub]. Podcast retrieved from http://www.ibmbigdatahub.com/podcast/6-trends-big-data-and-analytics.
Diebold, F.X. (2012, November 26). A personal perspective on the origin(s) and development of “big data”: The phenomenon, the term, and the discipline. Retrieved from http://www.ssc.upenn.edu/~fdiebold/papers/paper112/Diebold_Big_Data.pdf.
Fawcett, T. & Provost, F. (2013, August 9). Data Science for Business. Retrieved from https://www.safaribooksonline.com/library/view/data-science-for/9781449374273/. (on page 13)
Howe, B. (2013). Data science in science [PDF document]. Retrieved from Lecture Notes Online Web site: https://d396qusza40orc.cloudfront.net/datasci/lecture_slides/week1/005_escience.pdf.
Koblin, A. (2011, March). Visualizing ourselves ... with crowd-sourced data [TED2011]. Podcast retrieved from http://www.ted.com/talks/aaron_koblin?language=en.
Laney, D. (2001, February 6). 3-D data management: Controlling data volume, velocity and variety. META Group Research Note. Retrieved from http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf.
Miebach, N. (2011, July). Art made of storms [TEDGlobal 2011]. Podcast retrieved from http://www.ted.com/talks/nathalie_miebach?language=en.
Sall, E. (2013, February 12). Top 5 big use cases [IBM Big Data Hub]. Podcast retrieved from http://www.ibmbigdatahub.com/podcast/top-5-big-data-use-cases.
Sankar, S. (2012, June). The rise of human-computer cooperation [TEDGlobal 2012]. Podcast retrieved from https://www.ted.com/talks/shyam_sankar_the_rise_of_human_computer_cooperation?language=en.