performance management in ‘big data’ applications

46
Performance Management in ‘Big Data’ Applications It’s still about the Application Michael Kopp, Technology Strategist [email protected] m @mikopp Edward Capriolo [email protected] @edwardcapriolo m6d.com/blog

Upload: michael-kopp

Post on 27-May-2015

1.347 views

Category:

Technology


0 download

DESCRIPTION

Do applications using NoSQL still require performance management? Is it always the best option to throw more hardware at a MapReduce job? In both cases, performance management is still about the application, but "Big Data" technologies have added a new wrinkle.

TRANSCRIPT

Page 1: Performance Management in ‘Big Data’ Applications

Performance Management in ‘Big Data’ ApplicationsIt’s still about the Application

Michael Kopp, Technology Strategist

[email protected]

@mikopp

blog.dynatrace.com

Edward Capriolo

[email protected]

@edwardcapriolo

m6d.com/blog

Page 2: Performance Management in ‘Big Data’ Applications

High Volume/Low Latency DBs

3

JavaWeb

BigData

Key Benefits1) Fast Read/Write2) Horizontal Scalability3) Redundancy and High

Availability

Key Challenges1) Even Distribution2) Correct Schema and Access

patterns3) Understanding Application Impact

Page 3: Performance Management in ‘Big Data’ Applications

Hive high-levelmap/reducequery JOB

123

BigData

JOB

1234...

754Key Benefits1) Massive Horizontal Batch Job2) Split big Problems into smaller

ones

Hive Server

batchtrigger

Large Parallel Batch Processing

Key Challenges1) Optimal Distribution 2) Unwieldy Configuration3) Can easily waste your

resources

Page 4: Performance Management in ‘Big Data’ Applications

What is m6d?

5

Page 5: Performance Management in ‘Big Data’ Applications

6

Impressions look like…

Page 6: Performance Management in ‘Big Data’ Applications

7

Map Reduce Performance

Page 7: Performance Management in ‘Big Data’ Applications

8

Typical MapReduce Job at m6d

Page 8: Performance Management in ‘Big Data’ Applications

9

Hadoop at m6d

• Critical piece of infrastructure

• Long Term Data Storage

– Raw logs

– Aggregations

– Reports

– Generated data (feed back loops)

• Numerous ETL (Extract Transform Load)

• Scheduled and adhoc processes

• Used directly by Tech-Team, Ad Ops, Data Science

Page 9: Performance Management in ‘Big Data’ Applications

Hadoop at m6d

• Two deployments 'production' and 'research'– ~ 500 TB - 40+ Nodes

– ~ 350 TB – 20+ Nodes

• Thousands of jobs – <5 minute jobs and 12 hour Job Flows

– Mostly Hive Jobs

– Some custom code and streaming jobs

Page 10: Performance Management in ‘Big Data’ Applications

Hadoop Design Tenants

• Linear scalability by adding more hardware

• HDFS Distributed file system

– User space file system

– Blocks are replicated across nodes

– Limited semantics

• MapReduce

– Paradigm which models using map/reduce

– Data Locality

– Split Job into Tasks by Data

– Retry in failure

Page 11: Performance Management in ‘Big Data’ Applications

12

Schema Design Challenges

• Partition data for good distribution

– By time interval (optionally a second level)

• Partition pruning with WHERE

– Clustering (aka bucketing)

• Optimized sampling and joins

– Columnar

• Column oriented • Raw Data Growth

• Data features change (more distinct X)

Page 12: Performance Management in ‘Big Data’ Applications

13

Key Performance Challenges

• Intermediate I/O

– Compression codec

– Block size

– Split-table formats

• Contentions between jobs

• Data and Map/Reduce Distribution• Data Skew

• Non Uniform Computation (long running tasks)

• ‘Cost' of new feature – is this justified?

• Tuning variables (spills, buffers, Etc, etc)

Page 13: Performance Management in ‘Big Data’ Applications

14

How to handle Performance Issues?

• Profile the Job / Query?– Who should do this?

(DBA, Dev, Ops, DevOps , NoOps, Big Data Guru)

– How should we do this?• Look at job run times day over day?

• Look at code and micro-benchmark?

• Collect Job Counters?

• Upgrade often for latest performance features?

• Investigate/purchase newer better hardware– More cores? RAM? 10G Ethernet? SSD

• Read blogs?Test Data is not like

Real Data

Page 14: Performance Management in ‘Big Data’ Applications

15

But how to optimize the job itself?

Page 15: Performance Management in ‘Big Data’ Applications

16

Understanding Map/Reduce Performance

Maximum Parallelism

Actual Mapping Parallelism

Also your own Code

Attention Data Volume!

Attention Potential Choke

Point!

Maximum Reduce

Parallelism

Actual Reduce Parallelism

Also your own Code

Millions of Executions!!!

Page 16: Performance Management in ‘Big Data’ Applications

Understanding Map/Reduce Performance

Page 17: Performance Management in ‘Big Data’ Applications

18

Map/Reduce Performance

Page 18: Performance Management in ‘Big Data’ Applications

19

Map/Reduce behind the scenesSerialize

De-Serialize and Serialize

again

Potentionally Inefficient

Too Many Files, Same Key

spread all over

Expensive Synchronous

Combine

De-Serialize and Serialize

again

Page 19: Performance Management in ‘Big Data’ Applications

20

Map/Reduce Combine and Spill Performance

1) Pre Combine in Mapping Step2) Avoid many intermediate files and combines

Page 20: Performance Management in ‘Big Data’ Applications

21

Map/Reduce “Map” Performance

Focus on Big HotspotsAvoid Brute ForceSave a lot of HardwareThen Optimize Hadoop

Page 21: Performance Management in ‘Big Data’ Applications

22

Map/Reduce to the Max!

• Ensure Data Locality

• Optimize Map/Reduce Hotspots

• Reduce Intermediate Data and “Overhead”

• Ensure optimal Data and Compute Distribution

• Tune Hadoop Environment

Page 22: Performance Management in ‘Big Data’ Applications

23

Cassandra and

Application Performance

Page 23: Performance Management in ‘Big Data’ Applications

24

1. Browsers visit Publishers and create impressions.2. Publishers sell impressions via Exchanges.3. Exchanges serve as auction houses for the impressions4. On behalf of the marketer, m6d bids the impressions via

the auction house. If m6d wins, we display our ad to the browser.

A High Level look at RTB

Page 24: Performance Management in ‘Big Data’ Applications

25

Cassandra at m6d for Real Time Bidding

• RTB limited data is provided from exchange

• System to store information on users

– Frequency Capping

– Visit History

– Segments (product service affinity)

• Low latency Requirements

– Less then 100ms

– Requires fast read/write on discrete data

Page 25: Performance Management in ‘Big Data’ Applications

26

Cassandra design

Page 26: Performance Management in ‘Big Data’ Applications

Key Cassandra Design Tennents

• Swap/paging not possible

• Mostly schema-less

• Writes do not read– Read/Write is an anti-pattern

• Optimize around put and get– Not for scan and query

• De-Normalize data– Attempt to get all data in single read*

Page 27: Performance Management in ‘Big Data’ Applications

28

Cassandra Design Challenges

• De-normailize

– Store data to optimize reads

– Composite (multi-column) keys

• Multi-column family and Multi-tenant scenarios

• Compress settings

– Disk and cache savings

– CPU and JVM costs

• Data/Compaction settings

– Size tiered vs LevelDB

• Caching, Memtable and other tuning

Page 28: Performance Management in ‘Big Data’ Applications

29

How to handle performance issues?

• Monitor standard vitals (cpu,disk) ?

• Read blogs and documentation?

• Use Cassandra JMX to track req/sec

• Use Cassandra JMX to track size of Column Families, rows and columns

• Upgrade often to get latest performance enhancements? *

What about the Application?

Page 29: Performance Management in ‘Big Data’ Applications

30

APM for Cassandra

Page 30: Performance Management in ‘Big Data’ Applications

NoSQL APM is not so different after all…

31

JavaWeb

Key APM Problems Identified1) Response Time Contribution2) data access patterns3) transaction to query

relationship (transaction flow)

Database

Page 31: Performance Management in ‘Big Data’ Applications

32

Response Time Contribution

Access PatternAccess PatternAccess Pattern

Contribution to Business Transaction Connection Pool

Page 32: Performance Management in ‘Big Data’ Applications

33

Statement Analysis

Contribution to Business Transaction

Executions per Transactions and

Total

Average and Total Execution Time

Page 33: Performance Management in ‘Big Data’ Applications

34

Where, Why, How and which Transaction…

Where and why in my Transaction

Single Statement Performance

Which Web Service

Which Business Transaction

Page 34: Performance Management in ‘Big Data’ Applications

How does this apply to NoSQL Databases?

35

Key APM Problems Identified1) Response Time Contribution2) data access patterns3) transaction to query

relationship (transaction flow)

1) Data Access Distribution2) End-to-End Monitoring3) Storage (I/O, GC) Bottlenecks4) Consistency Level

JavaWeb

Page 35: Performance Management in ‘Big Data’ Applications

37

Real End-to-End Application Performance

Third Party

Services

External

End User

Our Application

End User Response Time Contribution

Page 36: Performance Management in ‘Big Data’ Applications

38

Understanding Cassandra’s Contribution

Which statements did the Transaction Execute?Which node where they executed against?Which Consistency Level was used?

Contribution of each StatmentToo many calls? Data Access patterns

Page 37: Performance Management in ‘Big Data’ Applications

39

Understand Response Time Contribution

4 Calls~15 ms Contribution

5 Calls~50-80 ms Contribution?

Access and Data Distribution

Page 38: Performance Management in ‘Big Data’ Applications

40

Why and how was a statement executed?

45ms latency? 60ms waiting on the server?

Page 39: Performance Management in ‘Big Data’ Applications

41

Any Hotspots on the Cassandra Nodes?

Much more load on Node3?Which Transactions are

responsible

Page 40: Performance Management in ‘Big Data’ Applications

42

Specific Cassandra Health Metrics

Page 41: Performance Management in ‘Big Data’ Applications

43

General Health of Cassandra

Too much GC Suspensions?

Memory Issues?

Too many requests?

Page 42: Performance Management in ‘Big Data’ Applications

44

Conclusion

Page 43: Performance Management in ‘Big Data’ Applications

45

Extend Performance Focus on Application

JavaWeb

A Fast Database doesn’t make a fast Application

Page 44: Performance Management in ‘Big Data’ Applications

Hive high-levelmap/reducequery JOB

123

JOB

1234...

754

master node

Hive Server

batchtrigger

data/task node

data/task node

Intelligent MapReduce APM

data/task node

Simple Optimizations with big impact

Page 45: Performance Management in ‘Big Data’ Applications

47

Big Data is about solving Application Problems

APM is about Application Performance and Efficiency

Page 46: Performance Management in ‘Big Data’ Applications

THANK YOU

48

Michael Kopp, Technology Strategist

[email protected]

@mikopp

blog.dynatrace.com

Edward Capriolo

[email protected]

@edwardcapriolo

m6d.com/blog