NoSQL for the SQL Server Pro
Lynn Langit
Feb 2013 – SDC, Sweden
Is NoSQL just Hadoop?
• HUGE Hype factor over last few years
Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license
• enables applications to work with thousands of nodes and petabytes of data
• was inspired by Google's MapReduce and Google File System (GFS) papers
Hadoop in the Enterprise
Working with Hadoop
Common Tools / Languages
• Java (JDK) / Eclipse
• MapReduce
• Map (query/format)
• Reduce (aggregate)
• plug-in for Eclipse (Java)
• Pig (ETL -- Java)
• Hive (HQL Query)
• HBase tables
• Others
• Mahout (analyze)
• Karmasphere (analyze)
• R (analyze)
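The Map (query/format) and Reduce (aggregate) steps can be illustrated with a word count. This is a minimal single-process sketch in plain Python, not Hadoop code; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and `shuffle` stands in for the grouping a real framework performs between the two phases.

```python
from collections import defaultdict

def map_phase(line):
    """Map: turn one input line into (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key (hypothetical stand-in for the framework's shuffle)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big hype", "data data"])
print(counts)
```

On a cluster the map calls would run in parallel on different nodes; here they simply run one after another.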
Demo - HDInsight - Cluster Allocation
What is the relationship?
NoSQL BigData
BigData = Exponentially More Data
• Retail Example -> ‘Feedback Economy’
– Number of transactions
– Number of behaviors (collected every minute)
[Chart: Purchases, Locations and Phone data by time of day (12:00 to 2:30), vertical axis 0 to 2500]
BigData = ‘Next State’ Questions
• What could happen?
• Why didn’t this happen?
• When will the next new thing happen?
• What will the next new thing be?
• What happens?
Collecting Behavioral data
Demo - HDInsight - MapReduce
Hitting (Relational) Walls
• CA
– Highly-available consistency
• CP
– Enforced consistency
• AP
– Eventual consistency
So many NoSQL options
• More than just the Elephant in the room
• More than 120 types of NoSQL databases
Flavors of NoSQL
Key / Value Database
• Schema-less
• State (Persistent or Volatile)
• Examples
– AWS DynamoDB
– Riak
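To make "schema-less" and "persistent or volatile" concrete, here is a toy sketch in plain Python. It does not use the DynamoDB or Riak APIs; the `KeyValueStore` class, its `path` parameter and the JSON file format are all assumptions chosen for illustration.

```python
import json
import os
import tempfile

class KeyValueStore:
    """Toy key/value store: values are opaque, no schema is enforced."""

    def __init__(self, path=None):
        # path=None means volatile (memory only); a path means persisted to a JSON file
        self.path = path
        self.data = {}
        if path and os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def put(self, key, value):
        self.data[key] = value
        if self.path:
            with open(self.path, "w") as f:
                json.dump(self.data, f)

    def get(self, key, default=None):
        return self.data.get(key, default)

# Volatile: the value lives only as long as this object
volatile = KeyValueStore()
volatile.put("session:42", {"cart": ["sku-1"]})

# Persistent: a second instance pointed at the same file sees the value
path = os.path.join(tempfile.mkdtemp(), "kv.json")
KeyValueStore(path).put("user:7", {"name": "Ada", "tags": ["admin"]})
reloaded = KeyValueStore(path)
```

Note that the two stored values have different shapes; the store never checks them against a schema.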
Column Database
• Wide, sparse column sets
• Examples:
– Cassandra
– HBase
– BigTable
– GAE HR DS
– Azure Tables
– SQL 2012 Tabular Model
More about Column Databases
• Type A
– Column-families
– Non-relational
– Sparse
– Examples: HBase, Cassandra, xVelocity (SQL 2012 Tabular)
• Type B
– Column-stores
– Relational
– Dense
– Example: SQL Server 2012 Columnstore index
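The sparse/dense distinction between Type A and Type B can be sketched with two plain Python structures. This is only an illustration of the data layout, not how HBase, Cassandra or a SQL Server columnstore index is implemented; the sample rows and the helper names `columns_for` and `total` are invented.

```python
# Type A: column-family, sparse. Each row key stores only the columns it has.
column_family = {
    "row1": {"name": "Ann", "city": "Malmo"},
    "row2": {"name": "Bo", "last_login": "2013-02-01"},  # different columns per row
}

# Type B: column-store, dense. Each column holds one value per row, stored together.
column_store = {
    "name":   ["Ann", "Bo", "Cy"],
    "amount": [120, 80, 200],
}

def columns_for(row_key):
    """List the columns that actually exist for one sparse row."""
    return sorted(column_family[row_key])

def total(column):
    """Aggregate by reading a single column, not whole rows."""
    return sum(column_store[column])

print(columns_for("row2"))
print(total("amount"))
```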
Demo - Document Database (MongoDB)
• document-oriented (collection of JSON documents) w/semi-structured data
– Encodings include BSON, JSON, XML…
• binary forms (PDF, Microsoft Office documents such as Word, Excel…)
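A "collection of JSON documents with semi-structured data" can be sketched with the Python standard library alone. This does not use MongoDB or its driver; the sample documents and the helpers `find` and `with_field` are hypothetical, meant only to show that documents in one collection need not share the same fields.

```python
import json

# Three documents in one collection; each has a different set of fields
collection = [json.loads(doc) for doc in (
    '{"_id": 1, "name": "Ann", "langs": ["SQL", "R"]}',
    '{"_id": 2, "name": "Bo", "address": {"city": "Stockholm"}}',
    '{"_id": 3, "name": "Cy", "langs": ["Java"], "address": {"city": "Malmo"}}',
)]

def find(docs, **criteria):
    """Return documents whose top-level fields equal the given values."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

def with_field(docs, field):
    """Return the _id of every document that has the given top-level field."""
    return [d["_id"] for d in docs if field in d]

print(find(collection, name="Bo"))
print(with_field(collection, "langs"))
```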
Demo - Graph Database (Neo4j)
• a lot of many-to-many relationships
• recursive self-joins
• when your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data
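"Finding connections" is the kind of query that needs recursive self-joins in a relational table but is a simple traversal over a graph. The sketch below uses a breadth-first search over an adjacency dictionary in plain Python; it is not Neo4j or Cypher, and the `follows` data and the `shortest_path` helper are invented for illustration.

```python
from collections import deque

# Hypothetical "who follows whom" relationships
follows = {
    "ann": ["bo", "cy"],
    "bo": ["dee"],
    "cy": ["dee"],
    "dee": ["eve"],
    "eve": [],
}

def shortest_path(graph, start, goal):
    """Breadth-first search: return one shortest chain of connections, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

print(shortest_path(follows, "ann", "eve"))
```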
So which type of NoSQL? Back to CAP…
Consistency
Availability
Partitioning
CP = NoSQL/column: Hadoop, BigTable, HBase, MemCacheDB
CA = SQL/RDBMS: SQL Server, Oracle, MySQL
AP = NoSQL/document or key/value: DynamoDB, CouchDB, Cassandra, Voldemort
Which type of NoSQL for which type of data?
Type of Data               Type of NoSQL solution    Example
Log files                  Wide Column               HBase
Product Catalogs           Key Value on disk         DynamoDB
User profiles              Key Value in memory       Redis
Startups                   Document                  MongoDB
Social media connections   Graph                     Neo4j
LOB w/Transactions         NONE! Use RDBMS           SQL Server
Cloud-hosted NoSQL up to 50x CHEAPER
The reality…two pivots
Storage Methods
• SQL (RDBMS)
• NoSQL
Storage Locations
• On premises
• Cloud-hosted
NoSQL (Cloud) BLOB Storage Buckets
• Amazon – S3 or Glacier
– The gold standard
• Google – Cloud Storage
– Free for developers
• Microsoft Azure BLOBS
• DropBox, Box…
Cloud-hosted RDBMS
• AWS RDS – SQL Server, MySQL, Oracle
– Medium cost
– Solid feature set, e.g. backup, snapshot
– Use existing tooling
• Google – MySQL
– Lowest cost
– Most limited RDBMS functionality
• Microsoft – SQL Azure
– Highest cost
Demo - AWS RDS
• SQL Server, MySQL or Oracle
• Essential to understand pricing models
Cloud Offerings – RDBMS AND NoSQL
                          AWS                               Google                               Microsoft
Cloud RDBMS               RDS – all major                   MySQL                                SQL Azure
NoSQL buckets             S3 or Glacier                     Cloud Storage                        Azure Blobs
NoSQL databases           DynamoDB                          H/R Data on GAE                      Azure Tables
Streaming ML or (Mahout)  Custom EC2                        Prospective Search & Prediction API  StreamInsight
Document or Graph         MongoDB on EC2                    Freebase                             MongoDB on Windows Azure
Hadoop                    Elastic MapReduce using S3 & EC2  none                                 HDInsight
Dremel/Warehousing        RedShift                          BigQuery                             none
Data Scientists…
Comparing…
Karmasphere Studio for AWS
Hadoop Connector to Excel
Google BigQuery
• Hadoop-like (Dremel) based service
• For massive amounts of data
• SQL-like query language
Dremel Realized => Impala
• Interactive Hadoop?
Other types of cloud data services
Hosting public datasets
• Pay to read
• Earn revenue by offering for read
Cleaning / matching (your) data
• ETL – Microsoft Data Explorer, Google Refine
• Data Quality – Windows Azure Data Market, InfoChimps, DataMarket.com
NoSQL To-Do List
Understand CAP & types of NoSQL databases
• Use NoSQL when business needs dictate
• Use the right type of NoSQL for your business problem
Try out NoSQL on the cloud
• Quick and cheap for behavioral data
• Mashup cloud datasets
• Good for specialized use cases, e.g. dev, test, training environments
Learn NoSQL access technologies
• New query languages, e.g. MapReduce, R, Infer.NET
• New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc…
The Changing Data Landscape
NoSQL
RDBMS
Other Services
www.TeachingKidsProgramming.org
• Free Courseware (recipes)
• Do a Recipe Teach a Kid (Ages 10+)
• Java or Microsoft SmallBasic
Toward Data Craftsmanship…
Follow me @LynnLangit
RSS my blog www.LynnLangit.com
Hire me
• To help build your BI/Big Data solution
• To teach your team next gen BI
• To learn more about using NoSQL solutions