![Page 1: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/1.jpg)
Big Data and Why NoSQL

Andy Cobley, School of Computing, University of Dundee
Twitter: @andycobley
![Page 2: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/2.jpg)
Who am I?

- Lecturer at University of Dundee
- Program director of Business Intelligence and new program Data Science (http://goo.gl/ljl6N and http://goo.gl/uwHSi)
- Geek and Hacker
![Page 3: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/3.jpg)
So what is Big Data?
![Page 4: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/4.jpg)
From evil Wikipedia

- “In information technology, big data consists of datasets that grow so large that they become awkward to work with using on-hand database management tools.”
- Which doesn’t tell us much
- Any definition that relies on data “size” will become obsolete very quickly as data storage capabilities grow.
![Page 5: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/5.jpg)
Let’s try something different

- The Three V’s
  - Volume: How big is the data? Terabytes? Petabytes?
  - Variety: Is it the same sort of data? What about blobs? Does it change?
  - Velocity: How fast is it coming in? Can we store it fast enough and then use it?

http://nosql.mypopescu.com/post/5547192335/bigdata-the-three-vs-volume-variety-velocity
![Page 6: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/6.jpg)
The Twitter problem

- Twitpocalypse
  - Overflow of status IDs for 32-bit signed integers
- But beyond that, can we physically store data fast enough?
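
To make the overflow concrete, here is a minimal Python sketch. Python integers do not overflow on their own, so the 32-bit signed wrap-around is simulated with `ctypes.c_int32`; the helper name `as_int32` is an assumption made for illustration and the values are not real status IDs.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a 32-bit signed integer can hold


def as_int32(value: int) -> int:
    """Return the value as a client storing it in a 32-bit signed int would see it."""
    return ctypes.c_int32(value).value


last_safe_id = INT32_MAX
next_id = last_safe_id + 1  # one more status than a 32-bit client can represent

print(as_int32(last_safe_id))  # 2147483647
print(as_int32(next_id))       # -2147483648: the ID has wrapped to a negative number
```

Any client that kept IDs in a 32-bit signed field would see the sequence jump from the largest positive value to a negative one, which is why a wider integer type is needed.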
![Page 7: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/7.jpg)
Suppose we are storing 16 columns of 16 bytes

- At 100 rows per second: 0.7 terabytes per year
- And at 1 million rows per second, that’s 7 petabytes per year
- This is volume
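
The arithmetic behind these figures can be checked with a short Python sketch. It assumes each row is exactly 16 × 16 = 256 bytes with no storage overhead or replication, a 365-day year, and binary units (TiB and PiB), which is what roughly reproduces the 0.7 and 7 on the slide.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, ignoring leap years
ROW_BYTES = 16 * 16                    # 16 columns of 16 bytes each


def bytes_per_year(rows_per_second: int) -> int:
    """Raw bytes written in one year at a constant insert rate."""
    return ROW_BYTES * rows_per_second * SECONDS_PER_YEAR


for rate in (100, 1_000_000):
    total = bytes_per_year(rate)
    tib = total / 2**40
    pib = total / 2**50
    print(f"{rate:>9} rows/s -> {tib:,.2f} TiB/year ({pib:.4f} PiB/year)")
```

At 100 rows per second this comes to a little over 0.7 TiB per year, and at 1 million rows per second a little over 7 PiB per year, before any indexes, replicas, or backups are counted.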
![Page 8: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/8.jpg)
Variability

- Data is sparse and can be different sizes
- Over time the type of data changes
- Consider click-through data: as pages evolve, new data types and fields need to be stored
![Page 9: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/9.jpg)
What about

| id | MassSpec | Meta data | Meta data |
|----|----------|-----------|-----------|
| 1  |          |           |           |
| 2  |          |           |           |
![Page 10: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/10.jpg)
We need UDF

- User Defined Functions inside the DB
- Or a different way of dealing with it, such as Hadoop or MRSQL.
![Page 11: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/11.jpg)
So what is NoSQL?

- Throws away everything you know about databases
- Is a family of different databases
- Lots of different “products”
- BUT! http://nosql.mypopescu.com/post/1016320617/mongodb-is-web-scale (warning: might offend)
- They should only be used when it’s sensible; they are not magic sauce.
![Page 12: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/12.jpg)
NoSQL types

- Key-Value
- Column-family
- Document databases
  - Allow sharding across nodes
- Graph
  - Fast for graph-like data and operations
![Page 13: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/13.jpg)
Some NoSQL databases

- CouchDB
- MongoDB
- Cassandra
- Riak
- HBase
- Neo4j

http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
![Page 14: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/14.jpg)
Sharding?

- Distribution of data across nodes
- Allows the load to be spread across multiple machines
- SQL databases can be sharded
- Not all NoSQL databases can be sharded
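
As a rough illustration of distributing data across nodes, the Python sketch below assigns each key to a shard by hashing it. This is one simple scheme (hash modulo the number of shards), not how any particular database does it; the node names and the `shard_for` helper are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


nodes = ["node-a", "node-b", "node-c"]  # hypothetical machines

for user in ("jsmith", "jbrown", "acobley"):
    print(user, "->", nodes[shard_for(user, len(nodes))])
```

The weakness of plain modulo sharding is that changing the number of nodes remaps most keys, which is one reason distributed stores tend to use a hash ring instead.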
![Page 15: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/15.jpg)
CAP Theorem

- CAP (or Brewer’s) theorem says:
- It’s impossible for a web service to provide all three of the following guarantees at the same time:
  - Consistency
  - Availability
  - Partition tolerance

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.1495&rep=rep1&type=pdf
But see: http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed and http://codahale.com/you-cant-sacrifice-partition-tolerance/
![Page 16: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/16.jpg)
http://blog.nahurst.com/visual-guide-to-nosql-systems
![Page 17: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/17.jpg)
Partitions?

- Essentially, failing to achieve consistency within a set time causes a partition.
- You can sacrifice availability to ensure consistency
- Partitions are rare and, if you have one server, almost never happen
- Partitions are caused by network failures and failed nodes
![Page 18: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/18.jpg)
Eventual Consistency

- Eventually all nodes will tell the same story
- Isn’t this a mad idea?
- Facebook (actually not)
- The Internet is based on an Eventual Consistency DB
  - DNS
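
The idea that all nodes eventually tell the same story can be sketched in a few lines of Python. This is a toy model under stated assumptions: two replicas, writes tagged with a logical timestamp, and "last write wins" as the reconciliation rule. It is not how DNS or any specific database is implemented; the `Replica` class and the addresses are made up for illustration.

```python
class Replica:
    """A toy replica that keeps the newest (timestamp, value) seen for each key."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        # Last write wins: only accept the write if it is newer than what we hold.
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        """Pull every entry from another replica, keeping the newest per key."""
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


a, b = Replica("a"), Replica("b")
a.write("www.example.org", "192.0.2.1", timestamp=1)
b.write("www.example.org", "192.0.2.2", timestamp=2)

stale = a.read("www.example.org")  # replica a has not yet heard of the newer write

a.sync_from(b)
b.sync_from(a)
print(stale, "->", a.read("www.example.org"), b.read("www.example.org"))
```

Between the second write and the sync, the two replicas give different answers; after the sync they agree on the newer value.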
![Page 19: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/19.jpg)
Introducing Cassandra

- Distributed / Decentralized
- Column Orientated
- Key Value Store
- Fault Tolerant
![Page 20: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/20.jpg)
Network topology of a Cassandra DB

- Multiple nodes
- Cassandra can be Rack Aware
- Keys are replicated across nodes
- It’s essentially a DHT (Distributed Hash Table)
- Think BitTorrent
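
A distributed hash table with replicated keys can be sketched as a hash ring in Python. This is a simplified model, assuming one token per node and replicas placed on the next nodes clockwise around the ring; it ignores rack awareness, and the `Ring` class and node names are hypothetical rather than Cassandra's actual partitioner.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Hash a node name or row key to a position on the ring."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Nodes sit at hashed positions; a key lives on the next nodes clockwise."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def nodes_for(self, key):
        # First node whose token is greater than the key's token, wrapping around.
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.ring)
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = Ring(["node-1", "node-2", "node-3", "node-4"], replicas=2)
owners = ring.nodes_for("Jsmith")
print("Jsmith is stored on", owners)
```

Because every node can compute the same ring, any node can work out where a key lives without asking a central coordinator, which is the decentralized property the slide is pointing at.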
![Page 21: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/21.jpg)
CQL

- Version 0.8 introduced CQL (Cassandra Query Language)
- Almost looks like SQL!
- http://crlog.info/2011/09/17/cassandra-query-language-cql-v2-0-reference/ (language reference)
- http://www.datastax.com/docs/0.8/dml/using_cql
![Page 22: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/22.jpg)
Demo

- Start Cassandra
- Open CQLSH
- Create Keyspace
- Create a columnfamily
- Now we can insert!
![Page 23: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/23.jpg)
So why does this work?

- Jsmith
  - Password: ch@ngem3a
- Jbrown
  - Gender: Male
  - Phone: 01382 345078

Column store: keys with name: value pairs underneath
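
One way to picture this layout is as a dictionary of dictionaries. The Python sketch below models the two example rows in memory; it is only an analogy for the data model, not Cassandra's storage format, and the `get_column` helper and the added `Email` column are hypothetical.

```python
# Each row key maps to its own set of column name -> value pairs.
# Rows do not need to share the same columns.
users = {
    "Jsmith": {"Password": "ch@ngem3a"},
    "Jbrown": {"Gender": "Male", "Phone": "01382 345078"},
}


def get_column(store, row_key, column, default=None):
    """Look up one column of one row, tolerating missing rows and columns."""
    return store.get(row_key, {}).get(column, default)


print(get_column(users, "Jbrown", "Phone"))   # 01382 345078
print(get_column(users, "Jsmith", "Gender"))  # None: that column was never stored

# Adding a new column to one row needs no schema change for the others.
users["Jsmith"]["Email"] = "jsmith@example.org"  # hypothetical value
```

Nothing is stored for the columns a row lacks, which is why sparse data fits this model well.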
![Page 24: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/24.jpg)
Interfacing to Cassandra

- Based on Thrift: http://thrift.apache.org/
- Large number of languages supported: http://wiki.apache.org/cassandra/ClientOptions
- I’ve used Java and Hector: http://prettyprint.me/
- Although there is a C# version: http://hectorsharp.com/
![Page 25: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/25.jpg)
Cassandra JDBC

- Very new, difficult to know how stable it is
- Needs compiling and libraries not in Cassandra!
- http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/
![Page 26: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/26.jpg)
Astyanax

- From Netflix
- Based on Hector but said to be a lot simpler!
- https://github.com/Netflix/astyanax/wiki
![Page 27: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/27.jpg)
jBloggyAppy, a demo app of Cassandra

- All source code on GitHub: https://github.com/acobley/jBoggyAppy
- Feel free to use and abuse
- Simple blogging app
![Page 28: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/28.jpg)
A word on using open source software

- Versioning!
- Things change!
- Documentation is wrong!
  - http://prettyprint.me/
- You end up reading unit tests to work out how to actually program.
![Page 29: Whynosql](https://reader035.vdocument.in/reader035/viewer/2022070316/5558c3aed8b42a235c8b461e/html5/thumbnails/29.jpg)
One last thing

- Dundee DDD, 17th November, Big Data track
- Anyone interested in speaking?