mongodb tick data presentation
MongoDB as a Tick Store
MongoDB World
New York City, June 23-25
#MongoDBWorld
See what’s next in MongoDB, including:
• MongoDB 2.6
• Sharding
• Replication
• Aggregation
http://world.mongodb.com
Save $200 with discount code THANKYOU
3
Agenda
• What is MongoDB
  - The Company
  - The Product
• MongoDB for Tick Data
• Case Study
4
MongoDB Overview
• 350+ employees
• 1,000+ customers
• Over $231 million in funding
• 13 offices around the world
5
Global Community
• 7,000,000+ MongoDB downloads
• 150,000+ online education registrants
• 35,000+ MongoDB Management Service (MMS) users
• 30,000+ MongoDB User Group members
• 20,000+ MongoDB Days attendees
7
MongoDB
A NoSQL, document-based database.
Designed for building today's applications.
• Fast to build
• Quick to adapt
• Easy to scale
• Incorporates lessons learned from 40 years of RDBMS
8
Relational Model

EmpID | Name           | Dept | Title | Manager | Payband
9950  | Dunham, Justin | 500  | 1500  | 6531    | C

DeptID | Department
500    | Marketing

TitleID | Title
1500    | Product Manager

EmpBenPlanID | EmpFK | PlanFK
1            | 9950  | 100
2            | 9950  | 200

PlanID | BenFK | Plan
100    | 1     | PPO Plus
200    | 2     | Standard

BenID | Benefit
1     | Health
2     | Dental
9
Document Model

EmpID | Name           | Dept      | Title           | Manager | Payband | Benefits
9950  | Dunham, Justin | Marketing | Product Manager | 6531    | C       | Health: PPO Plus; Dental: Standard

EmpBenPlanID | EmpFK | PlanFK
1            | 9950  | 100
2            | 9950  | 200

PlanID | BenFK  | Plan
100    | Health | PPO Plus
200    | Dental | Standard
10
MongoDB - Agility
Dynamic Schemas
V 1.0, V 1.1, V 2.0

EmpID | Name           | Dept      | Title           | Manager | Payband | Benefits
9950  | Dunham, Justin | Marketing | Product Manager | 6531    | C       | Health: PPO Plus; Dental: Standard

EmpID | Name      | Title | Payband | Bonus
9952  | Joe White | CEO   | E       | 20,000

EmpID | Name           | Dept      | Title    | Manager | Payband | Shares
9531  | Nearey, Graham | Marketing | Director | 9952    | D       | 5000
11
MongoDB - Usability

Shell
Command-line shell for interacting directly with the database
> db.collection.insert({product: "MongoDB", type: "Document Database"})
>
> db.collection.findOne()
{
  "_id" : ObjectId("5106c1c2fc629bfe52792e86"),
  "product" : "MongoDB",
  "type" : "Document Database"
}

Drivers
Drivers for most popular programming languages and frameworks
• Java
• Python
• Perl
• Ruby
• Haskell
• JavaScript
12
MongoDB - Utility
• Complex Indexed Queries
  e.g. Age > 65 AND Male living near Lyon
• Aggregation
  e.g. Profit Margin by Age:

Age   | Profit Margin
1-17  | 0
18-35 | 20
36-50 | 80
51-65 | 50
66+   | 5
13
MongoDB - Scalability
• High Availability
• Auto Sharding
• Enterprise Monitoring
• Grid file storage
14
Options for building an Operational Database
• Column Family
• Key/Value Store
• Relational
• Document Store
15
MongoDB & Hadoop

Operational
• Online, Real-time
• High concurrency & HA
• Live analytics

Post Processing and
• Multi-source analytics
• Interactive & Batch
• Data lake

MongoDB Connector for Hadoop
17
Tick Data – Why MongoDB?
• Flexible Data Model
  – Easy Onboarding
• Flexible Querying and Indexing
  – Primary, Secondary & Index Intersection
• Aggregation Framework & Map-Reduce
  – Native to MongoDB
• Pre-aggregation pattern
  – Continuous and up-to-date snapshot of "object"
• Language Drivers & Hadoop Connector
  – Java, Python, Scala, R, Matlab
• High Throughput & Linear Scalability
18
Flexible Data Model
Easy Onboarding – e.g. Equities
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bidPrice: 55.37,
  offerPrice: 55.58,
  bidQuantity: 500,
  offerQuantity: 700
}
> db.ticks.find({symbol: "DIS", bidPrice: {$gt: 55.36}})
19
Flexible Data Model
Easy Onboarding – e.g. Depth of Book
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bidPrices: [55.37, 55.36, 55.35],
  offerPrices: [55.58, 55.59, 55.60],
  bidQuantities: [500, 1000, 2000],
  offerQuantities: [1000, 2000, 3000]
}
> db.ticks.find({bidPrices: {$gt: 55.36}})
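The depth-of-book query relies on how MongoDB compares a condition against an array field: a document matches when at least one element of the array satisfies the condition. The following is a minimal plain-JavaScript sketch of that rule. The helper `matchesAnyGt` is hypothetical and only stands in for the server-side matching; it is not part of MongoDB or its drivers.

```javascript
// Illustrative only: mimics how {bidPrices: {$gt: 55.36}} selects documents.
// matchesAnyGt is a hypothetical helper, not a MongoDB or driver API.
function matchesAnyGt(doc, field, threshold) {
  const value = doc[field];
  // An array field matches when ANY element passes the comparison.
  if (Array.isArray(value)) {
    return value.some((v) => v > threshold);
  }
  // A scalar numeric field is compared directly.
  return typeof value === "number" && value > threshold;
}

const depthOfBook = {
  symbol: "DIS",
  bidPrices: [55.37, 55.36, 55.35],
  offerPrices: [55.58, 55.59, 55.60],
};

const matchesLow = matchesAnyGt(depthOfBook, "bidPrices", 55.36);  // 55.37 qualifies
const matchesHigh = matchesAnyGt(depthOfBook, "bidPrices", 55.40); // no bid qualifies
console.log(matchesLow, matchesHigh);
```

Because any single level can satisfy the condition, a query like this answers "is there a bid above the threshold anywhere in the book", not "is the best bid above the threshold".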
20
Flexible Data Model
Easy Onboarding – e.g. News
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  title: "Disney Earnings…",
  body: "Walt Disney Company reported…",
  tags: ["earnings", "media", "walt disney"]
}
21
Flexible Data Model
Easy Onboarding – e.g. Social Networking
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  timestamp: ISODate("2013-02-15 10:00"),
  twitterHandle: "jdoe",
  tweet: "Heard @DisneyPictures is releasing…",
  usernamesIncluded: ["DisneyPictures"],
  hashTags: ["movierumors", "disney"]
}
23
Architecture for Querying Data
• Higher Latency Trading Applications
• Backtesting Applications
• Research & Analysis Applications
24
Flexible Querying and Indexing
Index any field [or arrays]
// Compound indexes
> db.ticks.ensureIndex({symbol: 1, timestamp: 1})
// Index on arrays
> db.ticks.ensureIndex({bidPrices: -1})
// Index on any depth
> db.ticks.ensureIndex({"bids.price": 1})
// Full text search
> db.ticks.ensureIndex({tweet: "text"})
// Note: newer MongoDB releases use createIndex() in place of ensureIndex()
25
Flexible Querying and Indexing
Rich Query Language
// Ticks for last month for media companies
// (both bounds go under one timestamp key; a repeated key would override the first)
> db.ticks.find({
    symbol: {$in: ["DIS", "VIA", "CBS"]},
    timestamp: {$gte: new ISODate("2013-01-01"), $lt: new ISODate("2013-02-01")}
  })
// Ticks when Disney’s bid breached 55.50 this month
> db.ticks.find({
    symbol: "DIS",
    bidPrice: {$gt: 55.50},
    timestamp: {$gte: new ISODate("2013-02-01")}
  })
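When a date range is written as a shell query, both bounds need to sit under a single `timestamp` key, because a JavaScript object literal that repeats a key keeps only the last value. A small sketch of the difference follows. The helper `tickRangeQuery` is hypothetical, and plain `Date` objects are used in place of the shell's `ISODate` so that the snippet runs outside the mongo shell.

```javascript
// Hypothetical helper (not a MongoDB API): builds a filter document for
// db.ticks.find() covering the half-open time range [from, to).
function tickRangeQuery(symbols, from, to) {
  return {
    symbol: { $in: symbols },
    // Both bounds live in ONE sub-document under a single "timestamp" key.
    timestamp: { $gte: from, $lt: to },
  };
}

// Repeating a key in an object literal keeps only the last value,
// so the lower bound below is silently dropped.
const lossy = {
  timestamp: { $gt: new Date("2013-01-01T00:00:00Z") },
  timestamp: { $lte: new Date("2013-01-31T00:00:00Z") },
};

const january = tickRangeQuery(
  ["DIS", "VIA", "CBS"],
  new Date("2013-01-01T00:00:00Z"),
  new Date("2013-02-01T00:00:00Z")
);

console.log(Object.keys(lossy.timestamp));   // only the $lte bound survives
console.log(Object.keys(january.timestamp)); // both bounds are present
```

Using an exclusive upper bound at the first instant of the next month is one way to cover the whole of the last day without guessing at the final timestamp.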
27
Aggregation Framework
Parallel execution across cluster
// Aggregate minute bars for Disney for February
// (assumes tick documents that carry a "price" field)
db.ticks.aggregate([
  { $match: { symbol: "DIS",
              timestamp: { $gte: new ISODate("2013-02-01"), $lt: new ISODate("2013-03-01") } } },
  { $project: { year: {$year: "$timestamp"}, month: {$month: "$timestamp"},
                day: {$dayOfMonth: "$timestamp"}, hour: {$hour: "$timestamp"},
                minute: {$minute: "$timestamp"}, second: {$second: "$timestamp"},
                timestamp: 1, price: 1 } },
  // Sort by time so that $first/$last yield the open and close of each minute
  { $sort: { timestamp: 1 } },
  { $group: { _id: { year: "$year", month: "$month", day: "$day",
                     hour: "$hour", minute: "$minute" },
              open: {$first: "$price"}, high: {$max: "$price"},
              low: {$min: "$price"}, close: {$last: "$price"} } }
])
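To make the sort and group stages concrete, here is a plain-JavaScript analogue that buckets ticks by minute and computes open, high, low and close. It is only a sketch of the same logic run in memory, not how the server executes a pipeline. The function name `minuteBars` and the sample ticks are made up for illustration, and each tick is assumed to carry a `timestamp` and a numeric `price`.

```javascript
// In-memory analogue of the $sort + $group stages. Assumes each tick has a
// `timestamp` (Date) and a numeric `price`; the sample data is invented.
function minuteBars(ticks) {
  // Sort a copy by time so "first" and "last" mean open and close.
  const sorted = [...ticks].sort((a, b) => a.timestamp - b.timestamp);
  const bars = new Map();
  for (const { timestamp, price } of sorted) {
    // Truncate to the start of the minute (UTC) to form the group key.
    const key = new Date(Math.floor(timestamp.getTime() / 60000) * 60000).toISOString();
    const bar = bars.get(key);
    if (!bar) {
      bars.set(key, { minute: key, open: price, high: price, low: price, close: price });
    } else {
      bar.high = Math.max(bar.high, price);
      bar.low = Math.min(bar.low, price);
      bar.close = price; // ticks are in time order, so the latest one wins
    }
  }
  return [...bars.values()];
}

const sampleTicks = [
  { timestamp: new Date("2013-02-15T10:00:05Z"), price: 55.4 },
  { timestamp: new Date("2013-02-15T10:00:40Z"), price: 55.6 },
  { timestamp: new Date("2013-02-15T10:00:20Z"), price: 55.3 },
  { timestamp: new Date("2013-02-15T10:01:10Z"), price: 55.5 },
];

const bars = minuteBars(sampleTicks);
console.log(bars);
```

The sort step matters: without it, open and close would reflect arrival order instead of time order.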
29
Pre-aggregation pattern
Real-time and continuous state

All Ticks Collection
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bidPrices: [55.37, 55.36, 55.35],
  …
}
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 …
}

Pre-aggregated State
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  Daily_high: 66.1,
  Daily_low: 57.1,
  Daily_volume: 100222
}
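One possible way to keep such a pre-aggregated document current is to fold every incoming tick into it with a single upsert that uses the `$max`, `$min` and `$inc` update operators. The sketch below only builds the arguments for that update, so it runs without a database. The helper `dailySummaryUpsert`, the `day` key, the `price` and `quantity` fields on the tick, and the `dailySummary` collection name are all assumptions made for illustration; only the `Daily_high`, `Daily_low` and `Daily_volume` field names come from the slide.

```javascript
// Hypothetical helper: builds the arguments for an upsert that folds one
// tick into a per-symbol daily summary. `price`, `quantity` and `day` are
// assumed names; Daily_high / Daily_low / Daily_volume follow the slide.
function dailySummaryUpsert(tick) {
  const day = tick.timestamp.toISOString().slice(0, 10); // e.g. "2013-02-15"
  return {
    filter: { symbol: tick.symbol, day: day },
    update: {
      $max: { Daily_high: tick.price },      // keep the larger of stored vs. new
      $min: { Daily_low: tick.price },       // keep the smaller of stored vs. new
      $inc: { Daily_volume: tick.quantity }, // accumulate traded quantity
    },
    options: { upsert: true },               // create the summary on the first tick
  };
}

const op = dailySummaryUpsert({
  symbol: "DIS",
  timestamp: new Date("2013-02-15T10:00:00Z"),
  price: 57.1,
  quantity: 500,
});
console.log(JSON.stringify(op, null, 2));
// In the shell this could be applied as (collection name is hypothetical):
//   db.dailySummary.update(op.filter, op.update, op.options)
```

Readers of the summary document then get the current daily state with a single lookup, without scanning the all-ticks collection.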
31
Process Data in Hadoop
• MongoDB’s Hadoop Connector
• Supports Map/Reduce, Streaming, Pig
• MongoDB as input/output storage for Hadoop jobs
  – No need to go through HDFS
• Leverage power of Hadoop ecosystem against operational data in MongoDB
33
Why MongoDB Is Fast and Scalable
• Better data locality (diagram compares Relational and MongoDB)
• In-Memory Caching
• Auto-Sharding
• Read/write scaling
35
Easy On-boarding
Easy On-boarding of all Financial Data

Problem
• Financial data comes in many different shapes and sizes, and it needs to be on-boarded for research and analysis from multiple platforms such as Bloomberg and Reuters
  Shapes
  - Time Series News
  - Event
  - Sentiment
  Sizes
  - 1MB 1x a day price data
  - 1GB x 1000s data matrices
  - 40GB 1-minute data
  - 30TB Tick data
  - Even bigger: options data
• On-boarding can take weeks in a relational model with complex schema designs and ETL
• An FX Option can be an 80+ table schema
• Relational technology is a scale-up architecture and did not meet the performance requirements of AHL

Why MongoDB
• Dynamic schema: can on-board data of any shape or size almost instantly, without having to go through a typical "ETL" lifecycle
• Performance: quant researchers want data rendered in <1s for up to 20 years of historical data for back-testing trading strategies
• Replication: a team of 40 quant researchers relies on this system being up
• Sharding: can scale seamlessly and accommodate data of any shape and size
36
Easy On-boarding
Results

Low latency:
- 1xDay data: 4ms for 10,000 rows (vs. 2,210ms from SQL)
- OneMinute / Tick data: 1s for 3.5M rows Python (vs. 15s – 40s+ from OtherTick)
- 1s for 15M rows Java

Parallel Access:
- Cluster with 256+ concurrent data access
- Consistent throughput – little load on the Mongo server

Efficient:
- 10-15x reduction in network load
- Negligible decompression cost (lz4: 1.8Gb/s)
39
James (AHL) Presentation Links
• Slides: http://www.slideshare.net/JamesBlackburn1/mongodb-and-python-as-a-market-data-platform
• YouTube: James Blackburn - Python and MongoDB as a Platform for Financial Market Data
Q&A