Experience Cassandra
Wenjing Wu, 2011-5-17
Outline
• About Cassandra
• Data Model
• Deployment
• Client Programming
• An example: implementing a name space
• Stress tests
What is Cassandra(1)
• Decentralized, fault-tolerant, scalable, durable distributed hash storage
• Originally developed by Facebook, now maintained by Apache
• Some big users: Cloudkick, Digg, Facebook, Twitter, Rackspace, Cisco, etc.
• A combination of Bigtable and Dynamo
• Like a big hash table (both 2- and 3-dimensional)
What is Cassandra(2)
• Eventual consistency
• CAP theorem: AP, but the tradeoff between A and C is configurable
• Easy to deploy
• Rich client APIs for your own application, easy to install and use
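The A/C tradeoff is usually reasoned about with replica counts: with N replicas, a read that waits for R replicas always sees the latest write acknowledged by W replicas when R + W > N. A minimal sketch of that rule follows; the function names are illustrative and are not part of any Cassandra API.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of `replicas`."""
    return replicas // 2 + 1


def read_sees_latest_write(replicas: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_acks + write_acks > replicas


N = 3  # replica count, chosen here only for illustration

# Waiting for one replica on both sides favours availability:
# a read may miss the newest value for a while (eventual consistency).
one_one = read_sees_latest_write(N, write_acks=1, read_acks=1)

# Waiting for a majority on both sides favours consistency.
quorum_quorum = read_sees_latest_write(N, quorum(N), quorum(N))

print("ONE/ONE overlaps:", one_one)
print("QUORUM/QUORUM overlaps:", quorum_quorum)
```

Requiring fewer acknowledgements keeps the service answering when nodes are down; requiring more makes reads and writes overlap.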
Data model(1)
• Non-SQL (NoSQL)
• Supports a single index per query
  – select username from user where city='beijing' (Yes)
  – select username from user where city='beijing' and age='28' (No!)
• No joins, no complicated queries
• Useful when the application fits these limits
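To make the single-index limit concrete, here is a small in-memory sketch, not Cassandra code. One index on city answers the first query directly; the second predicate (age) has to be applied by the application to the rows the index returned. The sample rows other than tom are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: row key -> columns.
users = {
    "userkey1": {"username": "tom", "city": "beijing", "age": "28"},
    "userkey2": {"username": "alice", "city": "beijing", "age": "35"},
    "userkey3": {"username": "bob", "city": "shanghai", "age": "28"},
}


def build_index(rows, column):
    """Map each value of `column` to the row keys that hold it."""
    index = defaultdict(list)
    for key, columns in rows.items():
        if column in columns:
            index[columns[column]].append(key)
    return index


city_index = build_index(users, "city")


def usernames_by_city(city):
    """The single-index query: served by the index alone."""
    return sorted(users[k]["username"] for k in city_index.get(city, []))


def usernames_by_city_and_age(city, age):
    """Second predicate is filtered client-side after the index lookup."""
    return sorted(
        users[k]["username"]
        for k in city_index.get(city, [])
        if users[k].get("age") == age
    )


print(usernames_by_city("beijing"))
print(usernames_by_city_and_age("beijing", "28"))
```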
Data model(2)
• Keyspace: one for each application, equivalent to a database
• Column: an attribute of the structured data; has a name, a value and a timestamp; equivalent to a column of a table
  – (column=username, value=tom, timestamp=1299137043078874)
• Column family: a series of columns like the one above. Define a column family User:
  – (column=username, value=tom, timestamp=1299137043078874)
  – (column=email, value=[email protected], timestamp=1299137043078133)
  – (column=city, value=beijing, timestamp=1299137043078141)
Data Model(3)
• A row: identified by a key; instantiates one or more of the columns in the column family:
  – RowKey: userkey1
  – (column=username, value=tom, timestamp=1299137043078874)
  – (column=email, value=[email protected], timestamp=1299137043078133)
• The application creates the key for each row (unique, usually a UUID to avoid collisions); each row can have a different number of columns within the column family
• Analogous to a 2-dimensional hash table
  – User{row_key1}{username}=tom
  – User{row_key1}{email}=[email protected]
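The 2-dimensional hash table analogy can be written down directly as nested Python dicts. This is only a model of the data layout, not how Cassandra stores data; the helper names insert and get are illustrative, and the second row is made up to show that rows need not share the same columns.

```python
import time


def insert(column_family, row_key, columns):
    """Store columns under row_key, stamping each with microseconds since epoch."""
    timestamp = int(time.time() * 1_000_000)
    row = column_family.setdefault(row_key, {})
    for name, value in columns.items():
        row[name] = (value, timestamp)


def get(column_family, row_key):
    """Return {column name: value} for one row, dropping the timestamps."""
    return {name: value for name, (value, _ts) in column_family[row_key].items()}


User = {}  # column family: row key -> {column name -> (value, timestamp)}

insert(User, "row_key1", {"username": "tom", "city": "beijing"})
insert(User, "row_key2", {"username": "alice"})  # fewer columns is fine

print(get(User, "row_key1"))
print(get(User, "row_key2"))
```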
Data Model(4)
• Super column family
  – Each column of a super column family is itself a group of columns, like a column family
• 3-dimensional hash table
  – Person{row_key1}{user}{user_name}=tom
  – Person{row_key1}{user}{email}=[email protected]
  – Person{row_key2}{manager}{user_name}=Alice
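The 3-dimensional case adds one more level of nesting. A minimal dict model, with get_subcolumn as an illustrative helper name (timestamps left out for brevity):

```python
# Super column family: row key -> super column -> column -> value.
Person = {
    "row_key1": {"user": {"user_name": "tom"}},
    "row_key2": {"manager": {"user_name": "Alice"}},
}


def get_subcolumn(super_cf, row_key, super_column, column, default=None):
    """Three-level lookup that returns `default` when any level is missing."""
    return super_cf.get(row_key, {}).get(super_column, {}).get(column, default)


print(get_subcolumn(Person, "row_key1", "user", "user_name"))
print(get_subcolumn(Person, "row_key2", "manager", "user_name"))
```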
Deployment(1)
• Pretty easy!
  – wget http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.7.5/apache-cassandra-0.7.5-bin.tar.gz
  – tar zxvf apache-cassandra-0.7.5-bin.tar.gz
  – cd apache-cassandra-0.7.5
  – sudo mkdir -p /var/log/cassandra
  – sudo chown -R `whoami` /var/log/cassandra
  – sudo mkdir -p /var/lib/cassandra
  – sudo chown -R `whoami` /var/lib/cassandra
Deployment(2)
• Start the service
  – bin/cassandra -f
• Try to connect with the client:
  – bin/cassandra-cli --host localhost --port 9160
• How to start:
  – create keyspace Keyspace1;
  – use Keyspace1;
  – create column family Users with comparator=UTF8Type and default_validation_class=UTF8Type;
  – set Users[jsmith][first] = 'John';
  – set Users[jsmith][last] = 'Smith';
• What do you see?
  – [default@KS1] get Users[jsmith];
  – => (column=last, value=Smith, timestamp=1287604215498000)
  – => (column=first, value=John, timestamp=1287604214111000)
Run over a cluster
• Configuration file: conf/cassandra.yaml
  – listen_address: fst01.ihep.ac.cn (for gossip)
  – rpc_address: fst01.ihep.ac.cn (for clients)
  – seeds:
      - fst02.ihep.ac.cn
      - fst03.ihep.ac.cn
      - fst04.ihep.ac.cn
• Test the cluster
  – bin/nodetool --host fst01.ihep.ac.cn ring
Client Programming
• Rich client options (C/Java/PHP/Perl/Python)
• Driver for Python clients: pycassa
• Easy to install
  – Install with easy_install
    • Have easy_install installed
    • $ easy_install pycassa
  – Manual install
    • $ easy_install thrift05
    • $ git clone git://github.com/pycassa/pycassa.git
    • $ cd pycassa/
    • $ sudo python setup.py install
API examples
• >>> import pycassa
• >>> pool = pycassa.connect('Keyspace1', ['localhost:9160'])
• >>> col_fam = pycassa.ColumnFamily(pool, 'User')
• >>> col_fam.insert('user_key1', {'username': 'tom'})
• >>> col_fam.get('user_key1')
• >>> col_fam.remove('user_key1')
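To try the same insert/get/remove flow without a running cluster, an in-memory stand-in can mimic the three calls. InMemoryColumnFamily is a hypothetical class written for this sketch; it is not part of pycassa, and raising KeyError for a missing row is an assumption of the sketch rather than pycassa's behaviour.

```python
class InMemoryColumnFamily:
    """Dict-backed stand-in exposing insert/get/remove like the calls above."""

    def __init__(self, name):
        self.name = name
        self._rows = {}  # row key -> {column name -> value}

    def insert(self, key, columns):
        """Add or overwrite columns of a row; other columns are kept."""
        self._rows.setdefault(key, {}).update(columns)

    def get(self, key):
        """Return a copy of the row's columns; KeyError if the row is absent."""
        if key not in self._rows:
            raise KeyError(key)
        return dict(self._rows[key])

    def remove(self, key):
        """Delete the whole row; removing a missing row is a no-op."""
        self._rows.pop(key, None)


col_fam = InMemoryColumnFamily("User")
col_fam.insert("user_key1", {"username": "tom"})
col_fam.insert("user_key1", {"city": "beijing"})  # adds a second column
row = col_fam.get("user_key1")
print(row)
col_fam.remove("user_key1")
```

Code written against these three methods can later be pointed at a real column family object, as long as it only relies on this small surface.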
An example: implement a namespace
• Use pycassa to implement a name space
• Similar to the ext3 file system, which uses inodes to represent metadata
• Two column families (Directory, FFile) describe the metadata
• CF Directory columns include:
  – Metadata: create/modify/access time, owner, group
  – Contents of the directory: subdirectory names, file names
Directory(1)
Row key: dir_key1
  owner      filestore
  group      filestore
  testdir1   dir_keyxxxxx1
  testdir2   dir_keyxxxxx2
  testfile1  file_keyyyyyy1
Directory(2)
• A row:
  – RowKey: dirkey_372c5d87-4567-11e0-bc71-001a64631cb0
  – => (column=dir3, value=3e180f00-459b-11e0-8846-001a64631cb0, timestamp=1299159388519845)
  – => (column=f2, value=c69f2ac2-45a6-11e0-9c79-001a64631cb0, timestamp=1299329058698329)
  – => (column=f3, value=ddd77c2e-45a5-11e0-934f-001a64631cb0, timestamp=1299328989534849)
  – => (column=group, value=root, timestamp=1299137043078874)
  – => (column=owner, value=root, timestamp=1299137043078874)
  – => (column=p3, value=edf0ed73-45a6-11e0-bf90-001a64631cb0, timestamp=1299164408007020)
FFile(1)
• CF FFile stores the metadata and contents of a specific file
• FFile columns include:
  – Metadata: create/modify/access time, owner, group, size, checksum
  – Contents of the file
FFile(3)
• A row:
  – RowKey: filekey_edf0ed73-45a6-11e0-bf90-001a64631cb0
  – => (column=content, value=
      127.0.0.1 localhost.localdomain localhost
      202.122.33.12 lcg002.ihep.ac.cn lcg002
      192.168.56.11 lwn011.ihep.ac.cn lwn011
      ...., timestamp=1299164408007882)
  – => (column=group, value=root, timestamp=1299164408007882)
  – => (column=owner, value=root, timestamp=1299164408007882)
  – => (column=size, value=11281, timestamp=1299164408007882)
Name space operation
• fs_ls (list a dir/file)
• fs_mkdir (make a dir)
• fs_rename (rename a file/dir)
• fs_mv (move a file/dir to another file/dir)
• fs_rm (remove a file/dir)
• fs_cpw (write a file to the storage)
• fs_cpr (read a file from the storage)
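Because a directory row holds its children as columns next to its metadata, listing and renaming can be expressed as operations on a single row. The sketch below is one plausible way to do it over a plain dict, not the implementation behind the slides; list_entries, rename_entry and the METADATA_COLUMNS set are assumptions of the sketch.

```python
# Column names reserved for metadata; everything else is a child entry.
METADATA_COLUMNS = {"owner", "group"}


def list_entries(dir_row):
    """Names of the sub directories and files stored in one directory row."""
    return sorted(name for name in dir_row if name not in METADATA_COLUMNS)


def rename_entry(dir_row, old_name, new_name):
    """Rename a child in place: the column name changes, the child's key does not."""
    if old_name in METADATA_COLUMNS or new_name in METADATA_COLUMNS:
        raise ValueError("metadata column names cannot be used as entry names")
    if old_name not in dir_row:
        raise FileNotFoundError(old_name)
    if new_name in dir_row:
        raise FileExistsError(new_name)
    dir_row[new_name] = dir_row.pop(old_name)


# Row modelled on the Directory example (keys shortened as on the slide).
dir_row = {
    "owner": "filestore",
    "group": "filestore",
    "testdir1": "dir_keyxxxxx1",
    "testfile1": "file_keyyyyyy1",
}

rename_entry(dir_row, "testfile1", "renamed1")
print(list_entries(dir_row))
```

Since the child row is addressed by its key rather than its name, a rename in this model touches only the parent row.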
How does it work
Row dir_key1 (the root directory):
  owner       filestore
  group       filestore
  testdir1    dir_keyxxx1
  testdir2    dir_keyxxx2
Row dir_keyxxx1:
  owner       filestore
  testdir12   dir_keyxxx4
  testfile11  file_keyyyy1
  testfile12  file_keyyyy2
Row file_keyyyy1:
  owner       filestore
  group       filestore
  size        1023
  content     This is a test file….
Lookup of /testdir1/testfile11: dir_key1 -> dir_keyxxx1 -> file_keyyyy1
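The lookup of /testdir1/testfile11 can be sketched as a walk over the rows shown above, held here in plain dicts instead of Cassandra column families. resolve, read_file and METADATA_COLUMNS are names made up for this sketch.

```python
ROOT_KEY = "dir_key1"
METADATA_COLUMNS = {"owner", "group"}  # not child entries

# Directory column family: row key -> columns (rows copied from the slide).
directory = {
    "dir_key1": {
        "owner": "filestore",
        "group": "filestore",
        "testdir1": "dir_keyxxx1",
        "testdir2": "dir_keyxxx2",
    },
    "dir_keyxxx1": {
        "owner": "filestore",
        "testdir12": "dir_keyxxx4",
        "testfile11": "file_keyyyy1",
        "testfile12": "file_keyyyy2",
    },
}

# FFile column family.
ffile = {
    "file_keyyyy1": {
        "owner": "filestore",
        "group": "filestore",
        "size": "1023",
        "content": "This is a test file….",
    },
}


def resolve(path):
    """Follow one column per path component, starting at the root row."""
    key = ROOT_KEY
    for name in [part for part in path.split("/") if part]:
        row = directory.get(key)
        if row is None:
            raise NotADirectoryError(path)  # tried to descend into a file
        if name in METADATA_COLUMNS or name not in row:
            raise FileNotFoundError(path)
        key = row[name]
    return key


def read_file(path):
    """Resolve the path, then fetch the content column of the file row."""
    return ffile[resolve(path)]["content"]


print(resolve("/testdir1/testfile11"))
print(read_file("/testdir1/testfile11"))
```

Each path component costs one row read, so lookup cost grows with directory depth.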
How to implement?
• mkdir: fs_mkdir /testdir1/testdir2/testdir3 (/testdir1/testdir2 already exists)
  – 1. Generate a key for this entry: new_key=dirkey_`uuid`
  – 2. Walk from the root directory (/, whose key is dirkey_1) to get the key of the parent directory (testdir2); assume the key is dirkey_XXX
  – 3. Insert a column into the parent directory entry (testdir2, with key dirkey_XXX). The column name is the name of the new directory (testdir3), and its value is new_key
  – 4. Create a new entry for the new directory, with all the metadata columns (owner, group)
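The four steps above can be sketched over a plain dict that stands in for the Directory column family. This is an illustration under stated assumptions, not the code behind the slides: walk, new_dir_key, METADATA_COLUMNS and the default owner/group values are made up here, and a time-based UUID is assumed for the key.

```python
import uuid

ROOT_KEY = "dirkey_1"
METADATA_COLUMNS = {"owner", "group"}

# Directory column family stand-in: row key -> columns.
directory = {ROOT_KEY: {"owner": "root", "group": "root"}}


def new_dir_key():
    """Step 1: a fresh key for the new entry."""
    return "dirkey_" + str(uuid.uuid1())


def walk(parts):
    """Step 2: follow the path components from the root and return the last key."""
    key = ROOT_KEY
    for name in parts:
        row = directory[key]
        if name in METADATA_COLUMNS or name not in row:
            raise FileNotFoundError("/" + "/".join(parts))
        key = row[name]
    return key


def fs_mkdir(path, owner="root", group="root"):
    parts = [part for part in path.split("/") if part]
    if not parts:
        raise ValueError("cannot create the root directory")
    *parent_parts, name = parts
    if name in METADATA_COLUMNS:
        raise ValueError("name clashes with a metadata column: " + name)
    new_key = new_dir_key()                                 # step 1
    parent_key = walk(parent_parts)                         # step 2
    if name in directory[parent_key]:
        raise FileExistsError(path)
    directory[parent_key][name] = new_key                   # step 3
    directory[new_key] = {"owner": owner, "group": group}   # step 4
    return new_key


fs_mkdir("/testdir1")
fs_mkdir("/testdir1/testdir2")
new_key = fs_mkdir("/testdir1/testdir2/testdir3")
print(new_key)
```

Steps 3 and 4 are two separate writes, so a real implementation has to decide what happens if the process stops between them.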
Stress test
• Testbed: a small cluster
  – 4-node cluster
  – Replication factor is 3
  – One client
• Test methodology:
  – Operation sequence: mkdir / touch a file / list dir and file
  – Directory depth: 4 (/dir1/dir2/dir3/dir4)
• Test result: finished 255102 operations (mkdir, create file, list dir, list file) in 111397.302446 seconds, 0.436 seconds for each operation sequence
• Another test (more than 10 million operations) failed due to a memory crash
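As a quick check of the reported average, the total time divided by the operation count can be recomputed. Only the two figures from the result line are used; the variable names are illustrative.

```python
operations = 255102               # reported operation count
elapsed_seconds = 111397.302446   # reported wall-clock time

seconds_per_operation = elapsed_seconds / operations
operations_per_second = operations / elapsed_seconds
elapsed_hours = elapsed_seconds / 3600

print(f"{seconds_per_operation:.4f} s per operation")
print(f"{operations_per_second:.2f} operations per second")
print(f"{elapsed_hours:.1f} hours in total")
```

The quotient is about 0.437 s, in line with the 0.436 figure on the slide, which works out to roughly 2.3 operations per second from the single client over about 31 hours.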