Counters for real-time statistics
Aug 2011
Quick Cassandra storage primer
Standard columns
- Idempotent writes: the last client timestamp wins
- Store byte[]; validators can be attached
- No internal locking
- No read before write

Example:
set Users['ecapriolo']['fname']='ed';
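
Why "last timestamp wins" makes writes idempotent can be shown with a toy model. This is only a sketch: the class `LastWriteWinsDemo` and its methods are hypothetical, and the timestamp comparison done inside `write` stands in for the reconciliation that Cassandra performs later (it does not read before writing).

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of one row of standard columns (hypothetical, not Cassandra code).
class LastWriteWinsDemo {
    static final class Cell {
        final String value;
        final long timestamp; // supplied by the client
        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // column name -> newest cell seen so far
    private final Map<String, Cell> row = new HashMap<>();

    // Keep the incoming cell only if its timestamp is strictly newer.
    // Ties keep the existing cell in this toy model.
    void write(String column, String value, long timestamp) {
        Cell current = row.get(column);
        if (current == null || timestamp > current.timestamp) {
            row.put(column, new Cell(value, timestamp));
        }
    }

    String read(String column) {
        Cell c = row.get(column);
        return c == null ? null : c.value;
    }

    public static void main(String[] args) {
        LastWriteWinsDemo users = new LastWriteWinsDemo();
        users.write("fname", "ed", 100L);
        users.write("fname", "ed", 100L);    // retried write: nothing changes
        users.write("fname", "edward", 90L); // older write arriving late: ignored
        System.out.println(users.read("fname")); // prints "ed"
    }
}
```

Replaying the same write any number of times leaves the row unchanged, which is what makes client retries safe for standard columns.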
Counter columns
- Store integral values only
- Can be incremented or decremented with a single RPC
- Local read before write
- Merged on read

Example:
incr followers['ecapriolo']['x'] by 30;
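
A rough way to picture "local read before write" and "merged on read" is a counter kept as one partial count per replica, summed when read. The sketch below is a simplified illustration under that assumption; `CounterShardDemo` and the replica names are hypothetical, and the real storage format carries more bookkeeping than this.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a counter column split into per-replica partial counts.
class CounterShardDemo {
    // replica id -> partial count held by that replica
    private final Map<String, Long> shards = new HashMap<>();

    // An increment is a delta: the replica reads its own partial count,
    // adds the delta, and writes the result back.
    void increment(String replica, long delta) {
        shards.merge(replica, delta, Long::sum);
    }

    // "Merged on read": the visible value is the sum of all partial counts.
    long read() {
        long total = 0;
        for (long partial : shards.values()) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        CounterShardDemo followers = new CounterShardDemo();
        followers.increment("node1", 30);
        followers.increment("node2", 5);
        followers.increment("node1", -10);    // decrements are negative deltas
        System.out.println(followers.read()); // prints 25

        followers.increment("node1", 30);     // replaying an increment is NOT idempotent
        System.out.println(followers.read()); // prints 55
    }
}
```

Unlike a standard column write, a replayed increment changes the result, so retries after a timeout can over-count.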
Counters combine powers with:
- composite keys: incr stats['user/date']['page'] by 1;
- scale to distribute writes

And you get:
- A distributed system to record events
- Pre-calculated real-time stats
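
A composite key such as 'user/date' is just a string assembled by the client before the increment is sent. A minimal sketch, assuming a "/" separator and UTC day buckets (the class `CompositeKeys` and the method `userDateKey` are hypothetical names):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class CompositeKeys {
    // Builds a row key of the form "user/yyyy-MM-dd".
    // A new formatter is created per call because SimpleDateFormat is not thread-safe.
    static String userDateKey(String user, Date when) {
        SimpleDateFormat day = new SimpleDateFormat("yyyy-MM-dd");
        day.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: buckets are cut in UTC
        return user + "/" + day.format(when);
    }

    public static void main(String[] args) {
        System.out.println(userDateKey("ecapriolo", new Date(0L))); // prints "ecapriolo/1970-01-01"
    }
}
```

Because each user/day pair gets its own row key, increments for different users or days are spread over different rows rather than piling onto a single one.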
Other ways to collect and report
Store in files, process into reports
- Example: data -> hdfs -> hive queries -> reports
- Light work on front end
- Heavy on back end

Store into relational database
- Example: data -> rdbms (indexed) -> real-time queries & reports -> reports
- Divides work between front end and back end
- Indexes can become choke points
Example data set
url        | username | event_time          | time_to_serve_millis
/page1.htm | edward   | 2011-01-02 04:01:04 | 45
/page1.htm | stacey   | 2011-01-02 04:01:05 | 46
/page1.htm | stacey   | 2011-01-02 04:02:07 | 40
/page2.htm | edward   | 2011-01-02 04:02:45 | 22
“Query” one: hit counts bucketed by minute
page       | time             | count
/page1.htm | 2011-01-02 04:01 | 2
/page1.htm | 2011-01-02 04:02 | 1
/page2.htm | 2011-01-02 04:02 | 1
“Query” two: resources consumed by user per hour
user   | time          | total_time_to_serve
edward | 2011-01-02 04 | 67
stacey | 2011-01-02 04 | 86
Turn a record line into a POJO
import java.util.Date;

// One parsed line of the example data set.
class Record {
    String url, username;
    Date date;
    int timeToServe;
}
Use your imagination here:
public static List<Record> readRecords(String file) throws Exception {
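
The slide leaves the body of readRecords to the reader. One possible sketch is shown here, assuming the input file holds pipe-delimited lines like the example data set, with timestamps of the form 2011-01-02 04:01:04 and an optional header line starting with "url". The wrapper class `RecordReader` and the helper `parseLine` are hypothetical; the full code linked at the end of the deck may do this differently.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

class RecordReader {
    static class Record {
        String url, username;
        Date date;
        int timeToServe;
    }

    // Parses one line: url | username | event_time | time_to_serve_millis
    static Record parseLine(String line) {
        String[] fields = line.split("\\|");
        if (fields.length != 4) {
            throw new IllegalArgumentException("expected 4 fields: " + line);
        }
        Record r = new Record();
        r.url = fields[0].trim();
        r.username = fields[1].trim();
        try {
            // Assumed timestamp layout; parsed in the JVM's default time zone.
            r.date = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse(fields[2].trim());
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad event_time in: " + line, e);
        }
        r.timeToServe = Integer.parseInt(fields[3].trim());
        return r;
    }

    public static List<Record> readRecords(String file) throws Exception {
        List<Record> records = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get(file), StandardCharsets.UTF_8)) {
            if (line.trim().isEmpty() || line.startsWith("url")) {
                continue; // skip blank lines and the header row
            }
            records.add(parseLine(line));
        }
        return records;
    }
}
```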
writeRecord() Method
public static void writeRecord(Cassandra.Client c, Record r) throws Exception {
    // One formatter per bucket size; the formatted date becomes part of a row key or a column name.
    DateFormat bucketByMinute = new SimpleDateFormat("yyyy-MM-dd HH:mm");
    DateFormat bucketByDay = new SimpleDateFormat("yyyy-MM-dd");
    DateFormat bucketByHour = new SimpleDateFormat("yyyy-MM-dd HH");
“Query” 1: page counts by minute
    // Row key: day + "-" + url. Column name: the minute bucket. Each hit adds 1.
    CounterColumn counter = new CounterColumn();
    ColumnParent cp = new ColumnParent("page_counts_by_minute");
    counter.setName(ByteBufferUtil.bytes(bucketByMinute.format(r.date)));
    counter.setValue(1);
    c.add(ByteBufferUtil.bytes(bucketByDay.format(r.date) + "-" + r.url),
          cp, counter, ConsistencyLevel.ONE);
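“Query” 2: usage by user per hour
    // Row key: day + "-" + username. Column name: the hour bucket. Each request adds its serve time.
    // Note: the column family is named user_usage_by_minute, but the buckets are hourly.
    CounterColumn counter2 = new CounterColumn();
    ColumnParent cp2 = new ColumnParent("user_usage_by_minute");
    counter2.setName(ByteBufferUtil.bytes(bucketByHour.format(r.date)));
    counter2.setValue(r.timeToServe);
    c.add(ByteBufferUtil.bytes(bucketByDay.format(r.date) + "-" + r.username),
          cp2, counter2, ConsistencyLevel.ONE);
} // end of writeRecord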
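
To check the key and column layout without a running cluster, the two increments can be replayed against nested maps. This is only an in-memory stand-in, not the Thrift client: `InMemoryCounters`, its `add` and `get` methods, and the `at` helper are hypothetical, while the column family names, key layout, and bucket formats are copied from the slides.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Map;
import java.util.TreeMap;

class InMemoryCounters {
    // column family -> row key -> column name -> counter value
    final Map<String, Map<String, Map<String, Long>>> store = new TreeMap<>();

    // Stand-in for the counter add call: apply a delta to one counter column.
    void add(String columnFamily, String rowKey, String column, long delta) {
        store.computeIfAbsent(columnFamily, k -> new TreeMap<>())
             .computeIfAbsent(rowKey, k -> new TreeMap<>())
             .merge(column, delta, Long::sum);
    }

    // Returns 0 for counters that were never incremented.
    long get(String columnFamily, String rowKey, String column) {
        Map<String, Map<String, Long>> rows = store.get(columnFamily);
        if (rows == null) return 0L;
        Map<String, Long> columns = rows.get(rowKey);
        if (columns == null) return 0L;
        return columns.getOrDefault(column, 0L);
    }

    // Same bucketing as writeRecord in the slides, against the in-memory store.
    void writeRecord(String url, String username, Date date, int timeToServe) {
        SimpleDateFormat byMinute = new SimpleDateFormat("yyyy-MM-dd HH:mm");
        SimpleDateFormat byHour = new SimpleDateFormat("yyyy-MM-dd HH");
        SimpleDateFormat byDay = new SimpleDateFormat("yyyy-MM-dd");
        add("page_counts_by_minute", byDay.format(date) + "-" + url, byMinute.format(date), 1);
        add("user_usage_by_minute", byDay.format(date) + "-" + username, byHour.format(date), timeToServe);
    }

    // Helper: a time on 2011-01-02 in the JVM's default time zone.
    static Date at(int hour, int minute, int second) {
        return new GregorianCalendar(2011, Calendar.JANUARY, 2, hour, minute, second).getTime();
    }

    public static void main(String[] args) {
        InMemoryCounters c = new InMemoryCounters();
        c.writeRecord("/page1.htm", "edward", at(4, 1, 4), 45);
        c.writeRecord("/page1.htm", "stacey", at(4, 1, 5), 46);
        c.writeRecord("/page1.htm", "stacey", at(4, 2, 7), 40);
        c.writeRecord("/page2.htm", "edward", at(4, 2, 45), 22);
        System.out.println(c.get("page_counts_by_minute", "2011-01-02-/page1.htm", "2011-01-02 04:01")); // 2
        System.out.println(c.get("user_usage_by_minute", "2011-01-02-stacey", "2011-01-02 04"));         // 86
    }
}
```

Feeding in the four rows of the example data set gives the same counts as the “Query” one and “Query” two tables.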
![Page 14: Counters for real-time statistics Aug 2011](https://reader035.vdocument.in/reader035/viewer/2022062723/56813d60550346895da72f92/html5/thumbnails/14.jpg)
How this works
Results
[default@counttest] list user_usage_by_minute;
-------------------
RowKey: 2011-01-02-stacey
=> (counter=2011-01-02 04, value=86)
-------------------
RowKey: 2011-01-02-edward
=> (counter=2011-01-02 04, value=67)
More Results
[default@counttest] list page_counts_by_minute;
-------------------
RowKey: 2011-01-02-/page1.htm
=> (counter=2011-01-02 04:01, value=2)
=> (counter=2011-01-02 04:02, value=1)
-------------------
RowKey: 2011-01-02-/page2.htm
=> (counter=2011-01-02 04:02, value=1)
Recap
Counters pushed work to the “front end”
- Data is bucketed, sorted, and indexed on insert
- Data is already “ready” on read
- Designed around how you want to read data

Distributed writes across the cluster
- Bucketed data by time, user, page, etc.
- Different from a table/index contention point
Questions?
Full code at: http://www.jointhegrid.com/highperfcassandra/?cat=7