Polyglottany Is Not a Sin

Eric Lubow, @elubow, #MongoBoston

Upload: eric-lubow

Post on 04-Jul-2015

TRANSCRIPT

Page 1: Polyglottany Is Not A Sin

Polyglottany Is Not a Sin

Eric Lubow

@elubow

#MongoBoston

Page 2: Polyglottany Is Not A Sin

Overview

• SimpleReach

• Definitions and Data Stores

• Evolution to Polyglottany

• Tie It Together

• Final Thoughts

• Questions

Page 3: Polyglottany Is Not A Sin

Socially Intelligent

Page 4: Polyglottany Is Not A Sin

Size

• 150m events recorded per day and growing

• 600m Pageviews per month and growing

Page 5: Polyglottany Is Not A Sin

Polyglot Persistence

Polyglot Persistence, like polyglot programming, is all about choosing the right persistence option for the task at hand.

http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence

Page 6: Polyglottany Is Not A Sin

Right Tool For The Job

Page 7: Polyglottany Is Not A Sin

Decisions. Decisions.

Question categories: Data, Tech, Financial, Other

• Do I have legal requirements (HIPAA/FIPS/Sarbanes-Oxley/PII)?

• What kind of enterprise support is available?

• What is the community like?

• Does the product roadmap pertain to my roadmap?

• Are my display requirements for realtime data?

• Do I need to aggregate data on the fly?

• Is my data structured or unstructured?

• Does my data lend itself to a specific design pattern?

• What are my query patterns?

• Is my data ingestion high volume/high velocity?

• Am I batch loading data?

• Am I write heavy or read heavy?

• Are data relationships important?

• Does my data need to be immediately available everywhere?

• Am I cloud based?

• Am I hardware based?

• Am I a cloud/iron hybrid?

• How much am I willing to spend?

• How much am I willing to spend if something goes wrong?

• How fault tolerant is the system?

• What supporting tools do I need?

• Is there support for my language?

• Is the encryption/authentication/authorization support sufficient for my needs?

• Are there monitoring architectures already built?

• Are there best practices guides already

• Will the data need to be distributed?


Page 8: Polyglottany Is Not A Sin

No One Size Fits All

Page 9: Polyglottany Is Not A Sin

Tools

C*

Page 10: Polyglottany Is Not A Sin

Free vs. Cost

Page 11: Polyglottany Is Not A Sin

Languages

Page 12: Polyglottany Is Not A Sin

Pre-Scale

Page 13: Polyglottany Is Not A Sin

SimpleReach Pre-Scale

Page 14: Polyglottany Is Not A Sin

Scale

Page 15: Polyglottany Is Not A Sin

SimpleReach

C*

Page 16: Polyglottany Is Not A Sin

Mongo Conference

Page 17: Polyglottany Is Not A Sin

Cassandra (C*)

• Large data volume ingestion at high velocity

• Really fast writes to many locations (eventual consistency)

• Query by column groups within rows (slicing)

• OpsCenter

• Data toolkit: more than a data storage layer

• TTLs for small group aggregation

• Wrote Helenus, Node.js driver for Cassandra
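The "slicing" bullet above can be pictured with a small in-memory model. This is only a sketch of the idea, not Cassandra's API: `WideRow`, its methods, and the sample hourly counts are all hypothetical. It assumes column names within a row are kept in sorted order, so a contiguous range of columns can be located with a binary search instead of a full scan.

```python
from bisect import bisect_left, bisect_right


class WideRow:
    """Toy model of one wide row whose column names stay sorted (hypothetical)."""

    def __init__(self):
        self._names = []   # column names, kept in sorted order
        self._values = {}  # column name -> value

    def insert(self, name, value):
        if name not in self._values:
            # Insert at the sorted position so ranges stay contiguous.
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def slice(self, start, finish):
        """Return (name, value) pairs with start <= name <= finish."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, finish)
        return [(name, self._values[name]) for name in self._names[lo:hi]]


# Made-up data: hourly pageview counts for one article, keyed by hour.
row = WideRow()
for hour, views in [("2013-01-01T09", 120), ("2013-01-01T10", 340),
                    ("2013-01-01T11", 275), ("2013-01-01T12", 90)]:
    row.insert(hour, views)

# Read only the 10:00 and 11:00 columns of the row.
late_morning = row.slice("2013-01-01T10", "2013-01-01T11")
```

Choosing column names that sort in the order you want to read them (timestamps here) is what makes a slice a cheap range read in this model.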

Page 18: Polyglottany Is Not A Sin

MongoDB

• Fast atomic increments (Node.js is native JSON)

• Sharding

• Solid ORM for Rails (Mongoid)

• Fast access for pub/sub of durable/persisted documents

• B-Tree Indexes

• Document based via JSON

• TTLs for ephemeral data
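To make the "fast atomic increments" bullet concrete, here is the shape of an update document using MongoDB's `$inc` operator, together with a small in-memory emulation of what it does to a document. The emulation (`apply_inc`) and the field names are hypothetical and exist only so the example runs without a server. On a real deployment the server applies the increment atomically per document, which this plain-dict version does not attempt to model.

```python
def apply_inc(document, update):
    """Emulate the "$inc" part of an update on a plain dict, in place.

    Illustration only: dotted paths address embedded documents, and a
    missing field is treated as 0 before the increment is applied.
    """
    for dotted_path, amount in update.get("$inc", {}).items():
        *parents, leaf = dotted_path.split(".")
        target = document
        for key in parents:
            target = target.setdefault(key, {})
        target[leaf] = target.get(leaf, 0) + amount
    return document


# Hypothetical counters for one article; field names are made up.
update = {"$inc": {"pageviews": 1, "social.twitter": 3}}
doc = {"_id": "article-42", "pageviews": 10}

apply_inc(doc, update)
apply_inc(doc, update)
```

Because the client sends only the delta, two writers incrementing the same counter do not need to read the current value first.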

Page 19: Polyglottany Is Not A Sin

Redis

• Supports hundreds of thousands of transactions per second

• Great caching engine

• Supports useful variable types like sets, sorted sets, and lists

• Everything is guaranteed to be memory mapped (mmap)

• Transactional and supports bulk operations

• Centralized queueing and locking system
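As an illustration of why sorted sets are listed as a useful type, the sketch below mimics two Redis sorted-set commands, ZINCRBY and ZREVRANGE, with a plain Python class. `ToySortedSet` is a hypothetical stand-in, not a Redis client, and it handles only non-negative indexes. The article names and scores are made up.

```python
class ToySortedSet:
    """In-memory stand-in for a Redis sorted set (illustration only)."""

    def __init__(self):
        self._scores = {}  # member -> score

    def zincrby(self, amount, member):
        """Add amount to the member's score, creating it at 0 if absent."""
        self._scores[member] = self._scores.get(member, 0) + amount
        return self._scores[member]

    def zrevrange(self, start, stop):
        """Members from highest to lowest score; stop is inclusive.

        Only non-negative indexes are handled in this sketch.
        """
        ranked = sorted(self._scores.items(),
                        key=lambda item: (item[1], item[0]),
                        reverse=True)
        return [member for member, _score in ranked[start:stop + 1]]


# Hypothetical stream of (article, social actions) events.
trending = ToySortedSet()
for article, actions in [("article-a", 1), ("article-b", 5),
                         ("article-a", 3), ("article-c", 2)]:
    trending.zincrby(actions, article)

top_two = trending.zrevrange(0, 1)
```

The point of the structure is that ranking is maintained as scores change, so a "top N" read does not require re-aggregating the raw events.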

Page 20: Polyglottany Is Not A Sin

Infobright

• Works with standard MySQL driver

• Column Stores for ad-hoc analytics queries in SQL

• Databases built for business intelligence

• Heavy compression of data

• Pre-aggregated data (Knowledge Grid)
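The "pre-aggregated data" bullet is easier to see with a toy example. The sketch below assumes a simplified picture of the idea: column values are stored in fixed-size packs, and each pack carries small summary statistics (min, max, sum) that let a query skip or answer whole packs without reading their rows. The function names, the pack size, and the data are hypothetical, and the real engine is considerably more elaborate.

```python
PACK_SIZE = 4  # tiny on purpose; chosen only to keep the example readable


def build_packs(values, pack_size=PACK_SIZE):
    """Split one column into packs and precompute per-pack statistics."""
    packs = []
    for i in range(0, len(values), pack_size):
        rows = values[i:i + pack_size]
        packs.append({"rows": rows, "min": min(rows),
                      "max": max(rows), "sum": sum(rows)})
    return packs


def sum_where_at_least(packs, threshold):
    """SUM(value) WHERE value >= threshold, plus how many packs were scanned."""
    total, packs_scanned = 0, 0
    for pack in packs:
        if pack["max"] < threshold:
            continue                  # no row can match: skip the pack
        if pack["min"] >= threshold:
            total += pack["sum"]      # every row matches: use the stored sum
            continue
        packs_scanned += 1            # mixed pack: the rows must be read
        total += sum(v for v in pack["rows"] if v >= threshold)
    return total, packs_scanned


# Made-up column of 12 values, giving three packs.
packs = build_packs([1, 2, 3, 4, 50, 60, 70, 80, 5, 40, 6, 90])
result = sum_where_at_least(packs, 30)
```

In this run only one of the three packs has to be read row by row; the other two are resolved from the stored statistics alone.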

Page 21: Polyglottany Is Not A Sin

Ruby, Node.js, Python

• Polyglottany doesn’t only apply to data stores

• Each language has its own benefit to each data storage layer

• Each language has its own individual benefits

• JSON, APIs, Performance

Page 22: Polyglottany Is Not A Sin

Choice

Page 23: Polyglottany Is Not A Sin

Cons

• Redis - Can only utilize a single core. SerDe price.

• MySQL Column Store - DELETE/UPDATEs are VERY expensive

• Cassandra - No B-tree indexes

• Mongo - Indexes must fit in memory. Forced Replica ping times

• Python - Whitespace. Community

• Ruby - Not high performance enough for our standards

• JavaScript (Node.js) - Bad for CPU or IO intensive workloads

Page 24: Polyglottany Is Not A Sin

Tying It Together

Even with the right tools, 80% of the work of building a big data system is acquiring and refining the raw data into usable data.

Page 25: Polyglottany Is Not A Sin

Tying It Together

Page 26: Polyglottany Is Not A Sin

Tying It Together

• Service Oriented Architecture (Internal API)

• Data accuracy checks: visual and programmatic

• Built framework for testing out storage engines

• Access to many toolsets (for all languages and DBs)
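A programmatic data accuracy check of the kind listed above might look roughly like this. It is a minimal sketch under stated assumptions: each store can report a count per key, the same key is expected to hold the same count everywhere, and a small tolerance may be acceptable while writes are still propagating. The function name, the store names used as dictionary keys, and the sample counts are all hypothetical.

```python
def find_mismatches(counts_by_store, tolerance=0):
    """Return {key: {store: count}} for keys whose counts disagree.

    counts_by_store maps a store name to a {key: count} dict. A key that
    is missing from a store is treated as a count of 0 for that store.
    """
    all_keys = set()
    for counts in counts_by_store.values():
        all_keys.update(counts)

    mismatches = {}
    for key in sorted(all_keys):
        seen = {store: counts.get(key, 0)
                for store, counts in counts_by_store.items()}
        if max(seen.values()) - min(seen.values()) > tolerance:
            mismatches[key] = seen
    return mismatches


# Made-up snapshot of per-article counts pulled from three stores.
snapshot = {
    "cassandra": {"article-1": 100, "article-2": 57},
    "mongodb": {"article-1": 100, "article-2": 55},
    "infobright": {"article-1": 100, "article-2": 57},
}

report = find_mismatches(snapshot)
relaxed_report = find_mismatches(snapshot, tolerance=2)
```

A report like this pairs naturally with the visual checks: the script narrows down which keys to look at, and a dashboard shows whether the gap is closing or growing.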

Page 27: Polyglottany Is Not A Sin

Service Architecture

[Diagram: Internal API, Analytics, Real-time, C*]
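One way to read the diagram is that callers talk to a single internal API, and the API decides which backend answers each kind of request. The sketch below shows that routing idea only. The `InternalAPI` class, the request kinds, and the stand-in backend functions are hypothetical and say nothing about how the actual service was built.

```python
class InternalAPI:
    """Single entry point that hides which data store serves a request."""

    def __init__(self):
        self._handlers = {}  # request kind -> callable backend

    def register(self, kind, handler):
        self._handlers[kind] = handler

    def query(self, kind, **params):
        if kind not in self._handlers:
            raise KeyError(f"no backend registered for {kind!r}")
        return self._handlers[kind](**params)


# Stand-in backends; real ones would call the relevant data store.
def realtime_counts(article_id):
    return {"source": "realtime-store", "article_id": article_id}


def analytics_report(article_id, days):
    return {"source": "analytics-store", "article_id": article_id, "days": days}


api = InternalAPI()
api.register("realtime", realtime_counts)
api.register("analytics", analytics_report)

result = api.query("analytics", article_id="article-42", days=7)
```

With this shape, swapping the store behind a request kind means registering a different handler, and callers do not change.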

Page 28: Polyglottany Is Not A Sin

Distributed Architecture

US-EAST-1a

MONGO-SHARD-0001-B

MONGO-SHARD-0000-A

CASSANDRA-0001

CASSANDRA-0010

REDIS-0001A

MYSQL-0001

iAPI-0001

US-EAST-1b

MONGO-SHARD-0002-B

MONGO-SHARD-0001-A

CASSANDRA-0002

CASSANDRA-0011

REDIS-0001B

iAPI-0002

US-EAST-1e

MONGO-SHARD-0002-A

MONGO-SHARD-0000-B

CASSANDRA-0003

CASSANDRA-0012

MYSQL-0002

iAPI-0003
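The MongoDB host names in this layout can be checked mechanically. The sketch below assumes that the -A and -B suffixes mark two members of the same shard (an assumption, since the slide does not say so) and verifies that no shard has both members in the same availability zone. Only the MongoDB hosts from the slide are included, and the function name is hypothetical.

```python
from collections import defaultdict

# MongoDB hosts per availability zone, copied from the slide.
MONGO_PLACEMENT = {
    "us-east-1a": ["MONGO-SHARD-0001-B", "MONGO-SHARD-0000-A"],
    "us-east-1b": ["MONGO-SHARD-0002-B", "MONGO-SHARD-0001-A"],
    "us-east-1e": ["MONGO-SHARD-0002-A", "MONGO-SHARD-0000-B"],
}


def colocated_shards(placement):
    """Return shards that have more than one member in the same zone.

    Assumes host names look like <shard>-<member>, e.g. MONGO-SHARD-0000-A.
    """
    zones_by_shard = defaultdict(list)
    for zone, hosts in placement.items():
        for host in hosts:
            shard, _member = host.rsplit("-", 1)
            zones_by_shard[shard].append(zone)
    return sorted(shard for shard, zones in zones_by_shard.items()
                  if len(set(zones)) < len(zones))


problems = colocated_shards(MONGO_PLACEMENT)
```

Under that naming assumption, an empty result means losing any single zone still leaves one member of every shard running.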

Page 29: Polyglottany Is Not A Sin

Points To Consider

• Data consistency - Same in all data stores

• How important is data durability?

• Managing many servers (Chef, AWS, CSSH)

• Managing and learning many different applications and tuning for them

• Expertise

Page 30: Polyglottany Is Not A Sin

Expertise

• What happens when you need help?

• How do you become experts?

• What happens when you need more experts?

Page 31: Polyglottany Is Not A Sin

Summary

• Polyglottany is not a sin

• Know your data read/write patterns

• Know the tools available to you

• Know your compromises

• Expertise

Page 32: Polyglottany Is Not A Sin

We’re Hiring

Page 33: Polyglottany Is Not A Sin

Questions are guaranteed in life. Answers aren’t.

Eric Lubow

@elubow

#MongoBoston

Thank you.