Do You Have Big Data? (Most Likely!)
![Page 1: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/1.jpg)
![Page 2: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/2.jpg)
Do You Have Big Data? (Most Likely!)
Peter Myers – Bitwise Solutions
Saptak Sen – Microsoft
DBI-B325
![Page 3: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/3.jpg)
Presenter Introduction
Peter Myers
BI Expert – Bitwise Solutions
BBus, SQL Server MCSE, MCT, SQL Server MVP
Experienced in designing, developing and maintaining Microsoft database and application solutions since 1997
Focuses on education and mentoring
Based in Melbourne, [email protected]
www.linkedin.com/in/peterjsmyers
![Page 4: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/4.jpg)
Presenter Introduction
Saptak Sen
Senior Product Manager, Big Data, Microsoft Corporation
Focused on Big Data and NoSQL offerings for Microsoft customers. For the last 12 years at Microsoft he has worked on various distributed computing platforms.
Twitter: @saptak
![Page 5: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/5.jpg)
Session Objectives
To introduce:
  Big data
  Hadoop
  HDInsight
To describe big data processes
To demonstrate various big data scenarios
To describe and inspire you with big data capabilities and potential
To provide relevant resources for further investigation
![Page 6: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/6.jpg)
Introducing Big Data
“Big data is a collection of data sets so large and complex that it becomes awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analysis, and visualization.” – Wikipedia
![Page 7: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/7.jpg)
Introducing Big Data (Continued)
Big data solutions deal with complexities of:
VOLUME (Size)
VARIETY (Structure)
VELOCITY (Speed)
![Page 8: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/8.jpg)
Introducing Big Data (Continued)
[Chart: data size (Megabytes, Gigabytes, Terabytes, Petabytes) plotted against Data Complexity: Variety and Velocity]
ERP/CRM: Payables, Payroll, Inventory, Contacts, Deal Tracking, Sales Pipeline
Web 2.0: Web Logs, Digital Marketing, Search Marketing, Recommendations, Advertising, Mobile, Collaboration, eCommerce
Big Data: Log files, Spatial & GPS coordinates, Data market feeds, eGov feeds, Weather, Text/image, Click stream, Wikis/blogs, Sensors/RFID/devices, Social sentiment, Audio/video
![Page 9: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/9.jpg)
Introducing Big Data (Continued)
![Page 10: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/10.jpg)
Introducing Big Data
Responding to New Questions
Live Data Feed: How do I optimize my services based on patterns of weather, traffic, etc.?
Social Analytics: What’s the social sentiment of my product?
Advanced Analytics: How do I better predict future outcomes?
![Page 11: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/11.jpg)
Introducing Hadoop
Apache Hadoop is for big data
It is a set of open source projects that transform commodity hardware into a service that can:
  Store petabytes of data reliably
  Allow huge distributed computations
Key attributes:
  Open source
  Highly scalable
  Runs on commodity hardware
  Redundant and reliable (no data loss)
  Batch processing centric – using the “Map-Reduce” processing paradigm
![Page 12: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/12.jpg)
Introducing the Hadoop Ecosystem
Distributed Storage (HDFS)
Distributed Processing (Map Reduce)
Query (Hive)
Scripting (Pig)
NoSQL Database (HBase)
Metadata (HCatalog)
Data Integration (ODBC / SQOOP / REST)
Business Intelligence (Excel, PowerView…)
Machine Learning (Mahout)
Graph (Pegasus)
Stats processing (RHadoop)
Pipeline / workflow (Oozie)
Log file aggregation (Flume)
PDW
World’s Data (Azure Data Marketplace)
AD, System Center
Windows Azure Storage
![Page 13: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/13.jpg)
Introducing HDInsight
HDInsight is Microsoft’s 100% Apache compatible Hadoop distribution
Available as a Windows Azure service – presently available as developer preview
Empowers organizations with new insights on previously untouched unstructured data, while connecting to the most widely used BI tools on the planet
![Page 14: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/14.jpg)
How it Works
FIRST, STORE THE DATA
[Diagram: Files spread across four Server nodes]
![Page 15: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/15.jpg)
How it Works
SECOND, TAKE THE PROCESSING TO THE DATA

    // Map Reduce function in JavaScript
    // map: split each input line into words and emit (word, 1) for every non-empty word
    var map = function (key, value, context) {
        var words = value.split(/[^a-zA-Z]/);
        for (var i = 0; i < words.length; i++) {
            if (words[i] !== "") {
                context.write(words[i].toLowerCase(), 1);
            }
        }
    };

    // reduce: sum the counts emitted for one word and emit (word, total)
    var reduce = function (key, values, context) {
        var sum = 0;
        while (values.hasNext()) {
            sum += parseInt(values.next(), 10);
        }
        context.write(key, sum);
    };
[Diagram: Code, RUNTIME, and four Server nodes]
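
To see what the runtime does with the map and reduce functions on this slide, here is a minimal local sketch. The `runLocally` helper, its `context` objects, and the `hasNext`/`next` iterator are hypothetical stand-ins written for illustration; they are not the HDInsight runtime, which distributes the same steps across the servers that hold the data. The sample input lines are also made up.

```javascript
// Word count map and reduce, in the same shape as the slide's functions.
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

// Hypothetical single-process stand-in for the cluster runtime:
// run map over every line, group emitted values by key (the "shuffle"),
// then run reduce once per key.
function runLocally(lines, mapFn, reduceFn) {
    var groups = {};
    var mapContext = {
        write: function (k, v) {
            if (!Object.prototype.hasOwnProperty.call(groups, k)) {
                groups[k] = [];
            }
            groups[k].push(v);
        }
    };
    lines.forEach(function (line, offset) {
        mapFn(offset, line, mapContext);
    });

    var results = {};
    var reduceContext = {
        write: function (k, v) { results[k] = v; }
    };
    Object.keys(groups).forEach(function (k) {
        var values = groups[k];
        var position = 0;
        var iterator = {
            hasNext: function () { return position < values.length; },
            next: function () { return values[position++]; }
        };
        reduceFn(k, iterator, reduceContext);
    });
    return results;
}

var counts = runLocally(
    ["Do you have big data", "Most likely you have big data"],
    map,
    reduce
);
console.log(counts);
```

The grouping step in the middle is the part that the map and reduce functions never see: on a cluster it involves moving intermediate results between servers, which is one reason the model suits batch processing.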
![Page 16: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/16.jpg)
Demonstration
Peter Myers, Bitwise Solutions
1 – Word Count (The “Hello World” for Hadoop)
![Page 17: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/17.jpg)
Traditional E-Commerce Data Flow
[Diagram: NEW USER REGISTRY, NEW PURCHASE, NEW PRODUCT (operational data); Logs (excess data); ETL of some data into the Data Warehouse]
![Page 18: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/18.jpg)
New E-Commerce Big Data Flow
[Diagram: NEW USER REGISTRY, NEW PURCHASE, NEW PRODUCT (operational data); Data Warehouse; Logs; Raw Data “Store it All” Cluster]
![Page 19: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/19.jpg)
Demonstration
Peter Myers, Bitwise Solutions
2 – Integration Services ETL with HIVE
![Page 20: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/20.jpg)
The Hadoop Data Flow
Hadoop
Data Analytics
![Page 21: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/21.jpg)
Demonstration
Saptak Sen, Microsoft
3 – Self-Service BI with HIVE
![Page 22: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/22.jpg)
Hadoop Capabilities
Machine Learning
Graph Processing
Distributed Compute
Extract Load Transform
Predictive Analysis
![Page 23: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/23.jpg)
Common Big Data Algorithms
Mining Social-Network Graphs
Finding Similar Items
Mining Data Streams
Frequent Item Sets
Advertising on the Web
Link Analysis
Recommendation Systems
Clustering
![Page 24: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/24.jpg)
Common Big Data Algorithms
Frequent Item Sets – Market Basket Analysis
Market Basket Analysis
Plagiarism
BioMarkers
Related Concepts
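
As a rough illustration of the frequent item set idea behind market basket analysis, the sketch below counts how often each pair of items is bought together and keeps the pairs that reach a minimum count. The `frequentPairs` function, the `minSupport` parameter, and the sample baskets are all assumptions made for this example; this is not the algorithm used in the demonstrations, and a real implementation over big data would spread the counting across a cluster.

```javascript
// Count how often each pair of items appears together across baskets,
// then keep only the pairs seen at least minSupport times.
function frequentPairs(baskets, minSupport) {
    var pairCounts = {};
    baskets.forEach(function (basket) {
        // De-duplicate and sort so that each pair has one canonical key.
        var items = Array.from(new Set(basket)).sort();
        for (var i = 0; i < items.length; i++) {
            for (var j = i + 1; j < items.length; j++) {
                var pairKey = items[i] + "|" + items[j];
                pairCounts[pairKey] = (pairCounts[pairKey] || 0) + 1;
            }
        }
    });
    return Object.keys(pairCounts)
        .filter(function (k) { return pairCounts[k] >= minSupport; })
        .map(function (k) { return { pair: k.split("|"), count: pairCounts[k] }; })
        .sort(function (a, b) { return b.count - a.count; });
}

// Made-up shopping baskets for illustration only.
var baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "eggs"]
];
var result = frequentPairs(baskets, 2);
console.log(result);
```

Pairs involving eggs appear only once, so they fall below the threshold and are dropped. The same counting pattern maps naturally onto the map and reduce steps: emit each pair with a count of 1, then sum per pair.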
![Page 25: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/25.jpg)
Demonstration
Peter Myers, Bitwise Solutions
4 – Analysis Services Data Mining with HIVE
![Page 26: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/26.jpg)
Common Big Data Algorithms
Finding Similar or Complementary Items
Collaborative Filtering
Similar music tastes
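
One simple way to picture collaborative filtering on music tastes is to find the listener whose likes overlap most with yours and suggest what they like that you have not heard. The sketch below uses Jaccard similarity for the overlap. The `jaccard` and `recommend` functions, the listener names, and the artist names are all hypothetical; this is not Mahout's implementation, which offers several similarity measures and runs at much larger scale.

```javascript
// Jaccard similarity: size of the intersection divided by size of the union.
function jaccard(a, b) {
    var setA = new Set(a);
    var setB = new Set(b);
    var shared = 0;
    setA.forEach(function (item) {
        if (setB.has(item)) { shared++; }
    });
    var unionSize = setA.size + setB.size - shared;
    return unionSize === 0 ? 0 : shared / unionSize;
}

// Suggest artists liked by the most similar other listener
// that the target listener has not already liked.
function recommend(target, likes) {
    var best = null;
    var bestScore = -1;
    Object.keys(likes).forEach(function (other) {
        if (other === target) { return; }
        var score = jaccard(likes[target], likes[other]);
        if (score > bestScore) {
            bestScore = score;
            best = other;
        }
    });
    if (best === null) { return []; }
    var known = new Set(likes[target]);
    return likes[best].filter(function (artist) { return !known.has(artist); });
}

// Made-up listening data for illustration only.
var likes = {
    ana: ["Artist A", "Artist B", "Artist C"],
    ben: ["Artist A", "Artist B", "Artist D"],
    cho: ["Artist E", "Artist F"]
};
var suggestions = recommend("ana", likes);
console.log(suggestions);
```

Here ana and ben share two of four distinct artists, so ben is the closest match and his remaining artist becomes the suggestion.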
![Page 27: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/27.jpg)
Demonstration
Saptak Sen, Microsoft
5 – Data Mining with Apache Mahout
![Page 28: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/28.jpg)
Do You Have Big Data?
It is likely that you have big data – you’re definitely capturing outcome data, and probably capturing ambient data
All data – outcome or ambient – has value
Azure and SQL Server Data Platform can unleash insight from big data, small data, all data
![Page 29: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/29.jpg)
Take action and operationalize
Form theories, analyze, and refine
Find, combine, and manage
Complete. Powerful. Easy.
DATA INSIGHT
![Page 30: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/30.jpg)
Resources
Microsoft Big Data
http://www.microsoft.com/bigdata
Windows Azure HDInsight
https://www.hadooponazure.com
HDInsight Services for Windows
Includes an excellent set of BI specific resources in the section named “Using HDInsight with Other BI Technologies”
http://social.technet.microsoft.com/wiki/contents/articles/6204.hadoop-based-services-for-windows-en-us.aspx
Blog: Big Data for Everyone: Using Microsoft’s Familiar BI Tools with Hadoop
http://blogs.msdn.com/b/microsoft_business_intelligence1/archive/2012/02/24/big-data-for-everyone-using-microsoft-s-familiar-bi-tools-with-hadoop.aspx
![Page 31: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/31.jpg)
Related content
Breakout Sessions
DBI-B366: Big Data Analytics with Microsoft Excel 2013 [Wed 8:30AM]
DBI-B340: Taking Your Application Design to the Next Level by Using SQL Server 2012 Data Mining [Thu 10:15AM]
DBI-B401: Enriching Big Data for Analysis [Fri 10:15AM]
DBI-B221: Data Management in Microsoft HDInsight: How to Move and Store Your Data [Fri 4:30PM]
![Page 32: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/32.jpg)
Resources
msdn: Resources for Developers
http://microsoft.com/msdn
Learning: Microsoft Certification & Training Resources
www.microsoft.com/learning
TechNet: Resources for IT Professionals
http://microsoft.com/technet
Sessions on Demand
http://channel9.msdn.com/Events/TechEd
![Page 33: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/33.jpg)
Evaluate this session
Scan this QR code to evaluate this session.
![Page 34: Do You Have Big Data? (Most Likely!)](https://reader030.vdocument.in/reader030/viewer/2022033108/589ad2cd1a28abc93a8b589b/html5/thumbnails/34.jpg)
© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.