
DESCRIPTION

This lecture was presented at UCL on the Financial Computing course in October 2011.

TRANSCRIPT

A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for High Performance Data Access

Ben Stopford, RBS

How fast is a HashMap lookup?

~20 ns

That’s how long it takes light to travel a room

How fast is a database lookup?

~20 ms

That’s how long it takes light to go to Australia and back… 3 times
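For the curious, the first number can be approximated with the minimal timing loop sketched below. It is an illustration rather than a rigorous benchmark (a harness such as JMH would control JIT and GC effects properly), and the map size and iteration count are arbitrary choices, not figures from the lecture.

```java
import java.util.HashMap;
import java.util.Map;

public class LookupTiming {
    public static void main(String[] args) {
        // Populate a map with one million entries.
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < 1_000_000; i++) {
            map.put(i, "value-" + i);
        }

        // Warm-up pass so the JIT compiles the lookup path before we time it.
        long checksum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            checksum += map.get(i).length();
        }

        int iterations = 10_000_000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // Use the result so the JIT cannot eliminate the lookup.
            checksum += map.get(i % 1_000_000).length();
        }
        long elapsed = System.nanoTime() - start;

        System.out.printf("~%d ns per lookup (checksum %d)%n",
                elapsed / iterations, checksum);
    }
}
```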

Computers really are very fast!

The problem is we’re quite good at writing software that slows them down

Desktop Virtualization

We love abstraction

There are many reasons why abstraction is a good idea… performance just isn’t one of them

Question: is it fair to compare a Database with a HashMap?

Not really…

Key Point

On one end of the scale sits the HashMap… on the other sits the database… but it’s a very, very long scale that sits between them.

Times are changing

Database Architecture is Aging

The Traditional Architecture

[Diagram: the architecture spectrum, from Traditional and Shared Disk through In-Memory to Distributed In-Memory and Shared Nothing, moving towards a simpler contract]

Simplifying the Contract

How big is the internet?

5 exabytes

(which is 5,000 petabytes or 5,000,000 terabytes)

How big is an average enterprise database?

80% < 1TB (in 2009)

Simplifying the Contract

Databases have huge operational overheads

Taken from “OLTP Through the Looking Glass, and What We Found There”, Harizopoulos et al.

Avoid that overhead with a simpler contract and by avoiding I/O
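As a sketch of what a simpler contract might look like, the interface below strips data access down to point reads and writes; the name and shape are illustrative, not taken from the lecture.

```java
// A deliberately minimal data-access contract: no SQL parsing, no query
// planning, no per-statement locking or logging overhead, just typed
// point reads and writes against keys the application already knows.
public interface SimpleStore<K, V> {
    V get(K key);             // point lookup only; no ad-hoc queries
    void put(K key, V value); // single-record write; no multi-row transactions
}
```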

Improving Database Performance: Shared Disk Architecture

Shared Disk

Improving Database Performance: Shared Nothing Architecture

Each machine is responsible for a subset of the records. Each record exists on only one machine.

[Diagram: the records are partitioned across six nodes (1, 2, 3… | 97, 98, 99… | 169, 170… | 244, 245… | 333, 334… | 765, 769…), and a client routes each request to the node that owns the record]
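The routing step can be sketched as below. The slide’s record ranges suggest range partitioning, but the principle, one owner per record, is the same; the class here assumes simple modulo hashing and is purely illustrative.

```java
import java.util.List;

// Routes each record key to exactly one node, so every record lives on
// a single machine. Modulo hashing is shown for simplicity; production
// systems often prefer consistent hashing so that adding or removing a
// node moves as few records as possible.
public class PartitionRouter {
    private final List<String> nodes; // e.g. "host:port" addresses

    public PartitionRouter(List<String> nodes) {
        this.nodes = nodes;
    }

    public String nodeFor(Object key) {
        // floorMod keeps the bucket non-negative even for negative hash codes.
        int bucket = Math.floorMod(key.hashCode(), nodes.size());
        return nodes.get(bucket);
    }
}

// Usage: new PartitionRouter(List.of("node-a", "node-b", "node-c")).nodeFor(98L)
```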

Improving Database Performance (3): In-Memory Databases (single address space)

Databases must cache subsets of the data in memory

Not knowing what you don’t know

[Diagram: the data lives on disk, with 90% of it held in an in-memory cache]

If you can fit it ALL in memory, you know everything!
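This caching trade-off can be sketched with java.util.LinkedHashMap in access order, which provides LRU eviction almost for free. Here loadFromDisk is a hypothetical stand-in for the slow path a cache miss would really take.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Caches a bounded subset of records in memory, evicting the least
// recently used entry once the limit is hit. A miss falls through to
// the slow disk path: the "not knowing what you don't know" cost.
public class RecordCache extends LinkedHashMap<Long, String> {
    private final int maxEntries;

    public RecordCache(int maxEntries) {
        super(16, 0.75f, true); // access-order iteration gives LRU behaviour
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
        return size() > maxEntries;
    }

    public String lookup(long id) {
        String cached = get(id);
        return cached != null ? cached : loadFromDisk(id); // ~ms, not ~ns
    }

    private String loadFromDisk(long id) {
        return "record-" + id; // hypothetical placeholder for real disk I/O
    }
}
```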

The architecture of an in memory database

Memory is at least 100x faster than disk

[Chart: access latencies on a log scale running from picoseconds to milliseconds: L1 cache ref, L2 cache ref, main memory ref, 1MB read from main memory, cross-network round trip, cross-continental round trip, 1MB read from disk/network]

* An L1 ref is about 2 clock cycles, or 0.7 ns. This is the time it takes light to travel 20 cm

Memory allows random access. Disk only works well for sequential reads

This makes in-memory databases very fast!

The proof is in the stats. TPC-H Benchmarks on a 1TB data set

So why haven’t in memory databases taken off?

Address spaces are relatively small and of a finite, fixed size

Durability

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data, but this time using only RAM.

[Diagram: the same partitioned layout as before, but each node now holds its records in RAM]

Distribution solves our two problems

We get massive amounts of parallel processing

But at the cost of losing the single address space
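A minimal scatter-gather sketch, with a hypothetical layout and query: the same predicate runs against every partition in parallel, and the partial results are merged. The merge step is precisely the price of no longer having one address space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Scatter-gather over in-memory partitions: the same predicate runs on
// every partition in parallel (scatter), then the partial counts are
// merged (gather).
public class ScatterGather {
    public static long parallelCount(List<Map<Long, String>> partitions,
                                     String needle) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (Map<Long, String> p : partitions) {
                tasks.add(() -> p.values().stream()
                                 .filter(v -> v.contains(needle))
                                 .count());
            }
            long total = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get(); // gather the partial results
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```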

[Diagram, repeated: the architecture spectrum from Traditional and Shared Disk through In-Memory to Distributed In-Memory and Shared Nothing, moving towards a simpler contract]

There are three key themes here:

Distribution: gain scalability through a distributed architecture.

Simplify the contract: improve scalability by picking appropriate ACID properties.

No disk: all data is held in RAM.

ODC

ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised, Graph DB

450 processes

Messaging (Topic Based) as a system of record (persistence)

2TB of RAM

ODC represents a balance between throughput and latency

What is Latency?

What is Throughput?

Which is best for latency?

Traditional Database, Shared Nothing (Distributed), or In-Memory Database?

Which is best for throughput?

Traditional Database, Shared Nothing (Distributed), or In-Memory Database?

So why do we use distributed in memory?

[Diagram: in-memory storage plus plentiful hardware delivers both low latency and high throughput]

This is the technology of the now. So what is the technology of the future?

Terabyte Memory Architectures

Fast Persistent Storage

New Innovations on the Horizon

These factors are remolding the hardware landscape into one where memory is both vast and durable

This is changing the way we write software

Huge servers in the commodity space are driving us towards single process architectures that utilise many cores and large address spaces

We can attain hundreds of thousands of executions per second from a single process if it is well optimised.

“All computers wait at the same speed”

We need to optimise for our CPU architecture
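As one concrete example of optimising for the CPU, the sketch below reads the same array twice: once sequentially and once with a cache-line-sized stride. On most hardware the sequential pass is several times faster because it uses every byte of each 64-byte cache line and keeps the prefetcher fed. Timings vary by machine; this is an illustration, not a calibrated benchmark.

```java
// Sequential vs. strided traversal of the same array. Both loops read
// every element once, but the strided loop uses only one int from each
// 64-byte cache line per pass and defeats the hardware prefetcher.
public class CacheTraversal {
    public static void main(String[] args) {
        int n = 1 << 24;               // 16M ints (64 MB)
        int[] data = new int[n];

        long sum = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) sum += data[i];              // sequential
        long seqNs = System.nanoTime() - t0;

        t0 = System.nanoTime();
        int stride = 16;               // 16 ints * 4 bytes = one cache line
        for (int s = 0; s < stride; s++)
            for (int i = s; i < n; i += stride) sum += data[i];  // strided
        long stridedNs = System.nanoTime() - t0;

        System.out.printf("sequential %.1f ms, strided %.1f ms (sum=%d)%n",
                seqNs / 1e6, stridedNs / 1e6, sum);
    }
}
```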

[Chart, repeated: the latency figures from earlier, from L1 and L2 cache refs through main memory refs, 1MB memory reads, network and cross-continental round trips, to 1MB disk/network reads]

* An L1 ref is about 2 clock cycles, or 0.7 ns. This is the time it takes light to travel 20 cm

Tools like VTune allow us to optimise software to truly leverage our hardware

So what does this all mean?

Further Reading
