sn wf12 amd fabric server (satheesh nanniyur) oct 12


DESCRIPTION

Big Data has influenced data center architecture in ways unimagined before. This presentation explores the Fabric Compute and Storage architectures that enable extreme scale-out, low-power, high-density Big Data deployments.

TRANSCRIPT

Page 1:
Page 2:

Fabric Architecture: A Big Idea for the Big Data infrastructure

Satheesh Nanniyur
Senior Product Line Manager
AMD Data Center Server Solutions (formerly SeaMicro)

Page 3:

Agenda

• Defining Big Data from an infrastructure perspective
• Fabric Architecture for Big Data
• An overview of the Fabric Server and Fabric Storage
• Illustrating Fabric Architecture benefits for Hadoop
• Conclusion

Page 4:

Have you come across Big Data?

Apple’s virtual smartphone assistant, Siri, uses complex machine learning techniques

Target’s “pregnancy prediction score” – NY Times: “How companies learn your secrets”, Feb 2012

Page 5:

So, what really is Big Data?

• Business: “Key basis of competition and growth…”

• Observational: “Too big, moves too fast, or doesn’t fit the structures of your database”

• Mathematical: “Every day, we create 2.5 “million trillion” (quintillion) bytes of data”

• Systems: “Exceeds the processing capacity of conventional database systems”

Page 6:

The Infrastructural definition of Big Data

• Massive Storage: store “all” data, not knowing its use in advance

• Massive Compute: ask a query, and when you do, get the answer fast

Page 7:

Big Data infrastructure is not business as usual

The IT architectural approach used in clustered environments such as a large Hadoop grid is radically different from that of converged and virtualized IT environments

Massive Storage

• Petabyte-scale, high density storage
• Flexible storage to compute ratio to meet evolving business needs

Massive Compute

• High density scale-out compute
• Power and space efficient infrastructure

IDC White Paper, “Big Data: What It Is and Why You Should Care”

Page 8:

Fabric Architecture for Big Data: The holy grail of Big Data infrastructure

Imagine a world where you could simply stack up servers, with each server:

• Taking up only a fraction of a rack unit
• Sharing over 5 PB of storage
• On a 10GE network with no cabling
• Flexibly provisioned with storage

Page 9:

A deeper look at the traditional rack-mount architecture

[Diagram: rack of nodes connected through ToR and aggregation switches, with the attendant cabling and management overhead]

• Compromise between compute and storage density
• Rigid compute to storage ratio
• Oversubscribed network suited to north-south traffic, not the heavy east-west traffic Big Data requires
• Too many adapters (NIC, storage controller) and cables that can fail

Page 10:

Fabric with 3-D Torus for Big Data Infrastructure

• Big Data is a big shift from north-south traffic to east-west
• Switchless linear scalability that avoids bottlenecks
• Highly available network minimizing node loss and data reconstruction
• High density scale-out architecture with low power and space
• High speed and low latency interconnection
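To make the topology concrete, the sketch below is an illustration only (dimensions, naming, and node counts are assumptions, not the SeaMicro implementation). It shows the defining property of a 3-D torus: every node at coordinate (x, y, z) links directly to six neighbors, one in each of the X+, X-, Y+, Y-, Z+, and Z- directions, with wrap-around at the edges, so traffic spreads over many short hops instead of funneling through a central switch.

    import java.util.ArrayList;
    import java.util.List;

    // Minimal 3-D torus addressing sketch: each node has exactly six neighbors,
    // reached by stepping +/-1 in one dimension with modular wrap-around.
    public class TorusNeighbors {

        static int wrap(int v, int size) {
            return ((v % size) + size) % size; // wrap-around keeps edge nodes fully connected
        }

        // Neighbor coordinates of node (x, y, z) in an X x Y x Z torus,
        // listed in the order X+, X-, Y+, Y-, Z+, Z-.
        static List<int[]> neighbors(int x, int y, int z, int X, int Y, int Z) {
            List<int[]> out = new ArrayList<int[]>();
            out.add(new int[] { wrap(x + 1, X), y, z });
            out.add(new int[] { wrap(x - 1, X), y, z });
            out.add(new int[] { x, wrap(y + 1, Y), z });
            out.add(new int[] { x, wrap(y - 1, Y), z });
            out.add(new int[] { x, y, wrap(z + 1, Z) });
            out.add(new int[] { x, y, wrap(z - 1, Z) });
            return out;
        }

        public static void main(String[] args) {
            // Example: in an 8 x 8 x 8 torus, node (0, 0, 0) wraps around to x = 7 on its X- link.
            for (int[] n : neighbors(0, 0, 0, 8, 8, 8)) {
                System.out.printf("(%d, %d, %d)%n", n[0], n[1], n[2]);
            }
        }
    }

Because every node contributes its own links, adding nodes adds fabric bandwidth in proportion, which is the switchless linear scalability described above.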

Page 11:

An overview of the Fabric Server

[Diagram: SeaMicro Fabric Node with IOVT, attaching an x86 server over PCIe to fabric links in the X+, X-, Y+, Y-, Z+, and Z- directions]

• 512 x86 cores with 4TB DRAM in 10RU

• Up to 5 petabytes of storage

• Flexible Storage to Compute ratio

• 10GE network per server, with 160 Gbps of uplink bandwidth

Page 12:

Fabric Storage ... for Big Data? Isn’t Big Data always deployed with DAS?

[Diagram: flexible Fabric Storage to compute ratio vs. the rigid storage to compute ratio of traditional rackmount servers, whose captive storage leaves compute and network underutilized]

• Add storage capacity independent of compute to increase cluster efficiency

• Flexibly provision storage capacity to meet evolving customer needs

“… the rate of change was killing us, where the data volumes were practically doubling every month. Trying to keep up with that growth was an extreme challenge to say the least …”

Customer quote from the IDC white paper, “Big Data: What It Is and Why You Should Care”

Page 13:

Massive capacity scale-out Fabric Storage

[Diagram: Traditional Rackmount (captive DAS with a rigid storage to compute ratio) vs. Freedom Fabric (flexible scale-out Fabric Storage up to 5 PB, shared by Intel/AMD x86 servers)]

• Massive scale-out capacity with commodity drives
• Decoupled from compute and network to grow storage independently

Page 14:

Hadoop and the SMAQ stack

• Data Storage: HDFS
• Data Processing: MapReduce framework
• Query: Pig, Hive

Built to scale linearly with massive scale-out storage (HDFS) and compute (MapReduce)
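To make the stack concrete, here is the canonical Hadoop word-count job, adapted from the standard Apache Hadoop MapReduce tutorial (a generic example, not code from this deck): HDFS holds the input splits and the final output, the Mapper and Reducer are the MapReduce framework hooks, and the combiner trims the intermediate data shuffled east-west between nodes.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: read input splits from HDFS and emit (word, 1) pairs.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce phase: sum the counts shuffled to each reducer and write results back to HDFS.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class); // combiner cuts the east-west shuffle volume
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Submitted with two HDFS paths as arguments, the job scales out simply by adding nodes, since each node contributes both HDFS storage and map/reduce capacity.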

Page 15:

Hadoop data processing phases: Fabric Architecture cost-efficiently meets the Hadoop infrastructure needs

[Diagram: Hadoop pipeline stages and their resource profiles: HDFS input read (storage intensive) → Map and intermediate data write (compute intensive) → Shuffle (network intensive) → Reduce (compute intensive) → HDFS output write (storage intensive)]

• 5 petabytes of storage capacity with independent scale-out
• 512 x86 cores with 4TB DRAM per Fabric Server in 10RU
• 10 Gbps inter-node bandwidth per server
• 160 Gbps shared uplink for inter-rack traffic
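The shuffle step is where the fabric's east-west bandwidth matters most, and a job can also be tuned to reduce how much intermediate data crosses the network. The snippet below is an illustrative sketch using standard Hadoop 2.x job-level settings; the property names are Hadoop's, while the values are assumptions chosen only for illustration.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ShuffleTuning {

        // Builds a Job whose intermediate map output is compressed before the shuffle
        // copies it between nodes, and whose reduce-task count is set explicitly.
        public static Job newTunedJob() throws Exception {
            Configuration conf = new Configuration();

            // Compress map output so less intermediate data crosses the fabric.
            conf.setBoolean("mapreduce.map.output.compress", true);

            // Do not schedule reducers until 80% of maps have finished,
            // which delays and smooths the burst of shuffle traffic.
            conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.8f);

            Job job = Job.getInstance(conf, "shuffle-tuned job");

            // Reduce-task count sets how many shuffle streams fan out from each mapper;
            // 64 is purely illustrative and should be sized to the cluster.
            job.setNumReduceTasks(64);
            return job;
        }
    }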

Page 16:

Hadoop resource usage pattern, based on a TeraSort run on the SeaMicro SM15000

[Charts: compute, storage, and network utilization over the Map, Shuffle, and Reduce phases of the TeraSort run]
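For readers who want to reproduce this kind of profile on their own cluster, TeraSort is the standard sort benchmark shipped with the Hadoop examples. The driver below is a hedged sketch: it assumes the Hadoop MapReduce examples jar (which provides org.apache.hadoop.examples.terasort.TeraGen and TeraSort) is on the classpath, and the row count and HDFS paths are illustrative choices, not the settings used for the SM15000 measurement.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.examples.terasort.TeraGen;
    import org.apache.hadoop.examples.terasort.TeraSort;
    import org.apache.hadoop.util.ToolRunner;

    public class RunTeraSort {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Generate the input: 10^9 rows of 100 bytes each, roughly 100 GB on HDFS.
            ToolRunner.run(conf, new TeraGen(),
                    new String[] { "1000000000", "/benchmarks/terasort-in" });

            // Sort it; watching compute, disk, and network during this job
            // reproduces the map/shuffle/reduce usage pattern shown above.
            ToolRunner.run(conf, new TeraSort(),
                    new String[] { "/benchmarks/terasort-in", "/benchmarks/terasort-out" });
        }
    }

In practice the same two steps are usually launched with the hadoop jar command against the examples jar.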

Page 17:

Deployment Challenges of Hadoop

• Plan for peak utilization
  – Hadoop infrastructure utilization is bursty

• Compute, Storage, and Network mix dependent on application workload
  – Flexible ratios optimize deployment

• Power and Space Efficiency key to large scale deployment

• Administrative cost can increase as rapidly as your data
  – Simplified deployment and reduced hardware components decrease TCO

Page 18:

Fabric Server for Hadoop Deployment: the Fabric Server offers 60% more compute and storage in the same power and space envelope

                                          Traditional Rackmount    SeaMicro Fabric Server
Intel Xeon Cores                          320                      512
AMD Opteron Cores*                        320                      1024
Storage                                   720 TB                   1136 TB
Storage Scalability                       None                     Up to 4PB
Network B/W per server                    Up to 2 Gbps             Up to 8 Gbps
Network Downlinks                         40                       0
ToR Switches                              2                        0 (built-in)
Aggregation (End of Row) switch/router    1                        1

Based on SeaMicro SM15000 and HP DL380 Gen8 2U dual socket octal core servers in a 42U rack
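As a quick check of the headline claim using the table's own figures: 512 Xeon cores versus 320 is 512 / 320 = 1.6, i.e. 60% more cores, and 1136 TB versus 720 TB is about 58% more raw storage, both delivered within the same 42U rack, power, and space envelope.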

Page 19:

Summary

Traditional architectures cannot scale to meet the needs of Big Data

Efficient Big Data deployments need flexible storage to compute ratio

The conventional wisdom still holds: fewer hardware components mean fewer failures and lower TCO

Fabric Servers provide unprecedented density, bandwidth, and scalability for Big Data deployments

Page 20:

For more information, visit http://www.amd.com/seamicro or email [email protected]