data science and big data analytics chap 9: advanced analytical theory and methods: text analysis...
Post on 28-Dec-2015
253 Views
Preview:
TRANSCRIPT
Data Science and Big Data Analytics
Chap 9: Advanced Analytical Theory and Methods: Text
Analysis
Charles TappertSeidenberg School of CSIS, Pace
University
Data Analytics Lifecycle
Data Analytics Lifecycle Overview Phase 1: Discovery Phase 2: Data Preparation Phase 3: Model Planning Phase 4: Model Building Phase 5: Communicate Results Phase 6: Operationalize Case Study: GINA
2.1 Data Analytics Lifecycle Overview
Huge volume of data Not just thousands/millions, but billions of
items Complexity of data types and
structures Varity of sources, formats, structures
Speed of new data creation and grow High velocity, rapid ingestion, fast
analysis
2.2 Phase 1: Discovery
Mobile sensors Social media – 700 Facebook updates/sec
in2012 Video surveillance Video rendering Smart grids Geophysical exploration Medical imaging Gene sequencing – more prevalent, less
expensive
2.3 Phase 2: Data Preparation
image
2.4 Phase 3: Model Planning
image
2.6 Phase 5: Communicate Results
Structured – defined data type, format, structure Transactional data, OLAP cubes, RDBMS, CVS files,
spreadsheets Semi-structured
Text data with discernable patterns – e.g., XML data Quasi-structured
Text data with erratic data formats – e.g., clickstream data Unstructured
Data with no inherent structure – text docs, PDF’s, images, video
2.7 Phase 6: Operationalize
image
2.8 Case Study: Global Innovation Network and
Analysis (GINA)
image
1.2 State of the Practicein Analytics
Business Intelligence (BI) versus Data Science
Current Analytical Architecture Drivers of Big Data Emerging Big Data Ecosystem
and a New Approach to Analytics
Business Intelligence (BI) versus Data Science
image
Business Intelligence (BI) versus Data Science
image
Current Analytical Architecture
image
Current Analytical Architecture
image
Drivers of Big Data
image
Emerging Big Data Ecosystem and a New Approach to Analytics
Four main groups of players Data devices
Games, smartphones, computers, etc. Data collectors
Phone and TV companies, Internet, Gov’t, etc. Data aggregators – make sense of data
Websites, credit bureaus, media archives, etc. Data users and buyers
Banks, law enforcement, marketers, employers, etc.
Emerging Big Data Ecosystem and a New Approach to Analytics
image
1.3 Key Roles for theNew Big Data
Ecosystem
image
Three Key Roles of theNew Big Data
Ecosystem
1. Deep analytical talent Advanced training in quantitative disciplines –
e.g., math, statistics, machine learning
2. Data savvy professionals Savvy but less technical than group 1
3. Technology and data enablers Support people – e.g., DB admins,
programmers, etc.
Three Recurring Data Scientist Activities
1. Reframe business challenges as analytics challenges
2. Design, implement, and deploy statistical models and data mining techniques on Big Data
3. Develop insights that lead to actionable recommendations
Profile of Data ScientistFive Main Sets of Skills
image
Profile of Data ScientistFive Main Sets of Skills
Quantitative skill – e.g., math, statistics
Technical aptitude – e.g., software engineering, programming
Skeptical mindset and critical thinking – ability to examine work critically
Curious and creative – passionate about data and finding creative solutions
Communicative and collaborative – can articulate ideas, can work with others
1.4 Examples of Big Data Analytics
Retailer Target Uses life events: marriage, divorce,
pregnancy Apache Hadoop
Open source Big Data infrastructure innovation
MapReduce paradigm, ideal for many projects
Social Media Company LinkedIn Social network for working professionals Can graph a user’s professional network 250 million users in 2014
Focus of Course
Focus on quantitative disciplines – e.g., math, statistics, machine learning
Provide overview of Big Data analytics In-depth study of a several key algorithms
top related