Massively scalable storage for the IDC 3rd platform era
Ashish Nadkarni, Storage Team
The IT industry is in the midst of a massive transformation
Copyright 2014 IDC
Source: IDC
Cloud
• $3.8B and 28EB for public cloud
• More than 2 trillion objects stored in Amazon S3 [1]
Mobility
• 227M tablets and 1.9B phones ship worldwide
• Roughly $50B spent on mobile chipsets
Social
• 700 million expressions/day [2]
• Facebook deploys more than 7PB of storage per month for photos [3]
Big Data
• 1.8EB of capacity purchased for Big Data
• 1.4ZB of machine-generated data [4]
1. http://techcrunch.com/2013/04/18/amazons-s3-now-stores-2-trillion-objects-up-from-1-trillion-last-june-regularly-peaks-at-over-1-1m-requests-per-second/
2. Executive Panel at Oracle World, Sept. 2013
3. http://www.datacenterknowledge.com/archives/2013/01/18/facebook-builds-new-data-centers-for-cold-storage/
4. IDC White Paper, "The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East," sponsored by EMC, Dec. 2012
In the data center, the 3rd platform is changing infrastructure priorities as well
[Diagram: applications, devices, content and data, alongside virtualization, containerization and analytics]
Traditional storage falls short in meeting data requirements for the 3rd platform era
Data Requirements Are Changing
Attribute | Performance Optimized | Capacity Optimized
Data type | Structured | Unstructured
Record size | Kilobytes or less | Megabytes to terabytes
Data updates | Frequent | Rare/never
Access frequency | Heavy | Light
Metadata | Fixed | Variable
Scale required | Up to terabytes | Exabytes
Traditional Storage Falls Short
• Scalability: RAID cannot effectively scale to petabyte levels. Growing disk sizes lead to long rebuild times and an increased chance of downtime.
• Data integrity: Data integrity suffers when system size is 10,000,000,000 times larger than the bit error rate of a hard drive.
• Security: Data security suffers as copies multiply across locations.
• Cost efficiency: Replication at petabyte scale and above is not cost-effective.
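The RAID rebuild and bit-error points above can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes a 10 TB drive rebuilt at a sustained 100 MB/s, an 8-drive RAID 5 group (so 70 TB must be read from the 7 survivors), and unrecoverable-read-error (URE) rates of 1 in 10^14 and 1 in 10^15 bits, figures commonly quoted on desktop- and enterprise-class drive datasheets. All of these numbers and function names are illustrative assumptions, not IDC data.

```python
import math

def rebuild_hours(disk_tb: float, rebuild_mb_per_s: float) -> float:
    """Best-case hours to rewrite one failed disk at a sustained rate (decimal TB and MB)."""
    seconds = (disk_tb * 1e12) / (rebuild_mb_per_s * 1e6)
    return seconds / 3600

def prob_unrecoverable_read(tb_read: float, bit_error_rate: float) -> float:
    """Chance of at least one URE while reading tb_read terabytes.

    Treats every bit read as an independent trial failing with probability
    bit_error_rate, i.e. 1 - (1 - p)^n, computed via log1p/expm1 for accuracy.
    """
    bits = tb_read * 1e12 * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))

if __name__ == "__main__":
    # Assumed RAID 5 group of 8 x 10 TB drives: one fails, 7 survivors are read in full.
    print(f"rebuild time: {rebuild_hours(10, 100):.1f} h")
    for ber in (1e-14, 1e-15):
        p = prob_unrecoverable_read(70, ber)
        print(f"BER {ber:g}: P(URE during rebuild) = {p:.3f}")
```

Under these assumptions the best-case rebuild takes roughly 28 hours, and the chance of hitting at least one URE while reading 70 TB is above 99% at 1 in 10^14 and a little over 40% at 1 in 10^15. Real rebuilds are usually slower because the array keeps serving application I/O.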
• Unstructured data accounts for 70-80% of storage capacity growth
• Capacity-optimized storage spending is growing at a 16.2% CAGR
Evolving data profiles require a different kind of storage infrastructure
• Ever-growing data sets require dispersed data management
• Workloads need to run right where the data is located
• Geo-dispersed data sets require geo-dispersed workloads
• Workloads are gaining location awareness
• Social and mobile usage is generating more unstructured data
• Machines are generating more semi-structured data
[Diagram: decentralized data stores holding mobile data, app data and user data]
This infrastructure has to be agile, software-defined and deliver “public cloud economics” for private clouds
Businesses need an infrastructure that:
1. Can scale on demand
2. Is highly agile
3. Allows easy control of costs
4. Allows a shift of CapEx dollars to OpEx dollars
5. Is service-based
The solution: Software-defined infrastructure
Buyers should look at new storage architectures in the context of next-gen applications
[Chart: cost vs. performance, with segments labeled 5-15%, 5-25% and 60-90%]
Four key technologies disrupting Enterprise Storage: Flash, Cloud, Software Defined and Convergence
[Diagram: areas disrupted by Flash, Cloud and Software Defined, including licensed software; commercial systems TB & $; architectures; HDD TB & $; controller designs; internal and DAS footprint; open source; ODM/VAI]
Convergence: packaging; licensed software; mgmt. & provisioning; new architectures (hyperconvergence)
Most massively scalable object storage systems are a form of software-defined storage solutions
• Employ a shared-nothing, software-defined architecture: COTS hardware with a local file system or database as the persistent disk store
• Use a flat, distributed, object-based layout; this differentiates them from scale-out file systems with object interfaces
• Along with object interfaces, many platforms deploy geo-dispersed namespaces and erasure coding or replication
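A flat, distributed layout means no directory tree or central lookup table decides where an object lives. One common way to get that property (not necessarily what any vendor in this webinar uses) is rendezvous, or highest-random-weight, hashing: every node scores each object key, and the top-scoring nodes hold its replicas. The node names, SHA-256 scoring, replica count and the `place` function are illustrative assumptions.

```python
import hashlib
from typing import List

def _score(node: str, key: str) -> int:
    # Hash the (node, key) pair; any client can compute this locally, no directory needed.
    digest = hashlib.sha256(f"{node}/{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def place(key: str, nodes: List[str], copies: int = 3) -> List[str]:
    """Return the `copies` nodes with the highest score for this object key."""
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested number of copies")
    return sorted(nodes, key=lambda n: _score(n, key), reverse=True)[:copies]

if __name__ == "__main__":
    cluster = [f"node{i}" for i in range(6)]
    print(place("photos/2014/cat.jpg", cluster))
```

Because a score depends only on the key and the node name, adding a seventh node moves only those objects for which the new node now ranks in the top three; every other placement stays where it was, which is what lets shared-nothing clusters grow without a central rebalancing map.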
Data organization: object-based layout
Persistent data stores: shared nothing; internal disk; local file system
Storage services: OBS interfaces; geo-dispersed metadata; erasure coding or replication
Delivery model: software only; physical appliances; cloud; virtual appliances
Delivery model offers agility and cost optimization
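Erasure coding, listed above as an alternative to replication, stores parity instead of full copies. The simplest possible erasure code is a single XOR parity fragment (the same idea as RAID 5): it survives the loss of any one fragment at a raw-capacity cost of (k+1)/k. Production object stores typically use Reed-Solomon-style k+m codes that tolerate m losses; this k+1 sketch, its function names and its zero-padding scheme are illustrative assumptions.

```python
from functools import reduce
from typing import List, Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal, zero-padded fragments and append one XOR parity fragment."""
    size = -(-len(data) // k)                  # ceiling division
    padded = data.ljust(size * k, b"\0")
    frags = [padded[i * size:(i + 1) * size] for i in range(k)]
    return frags + [reduce(xor_bytes, frags)]

def recover(frags: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (marked None) by XOR-ing all survivors."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost fragment")
    survivors = [f for f in frags if f is not None]
    if missing:
        rebuilt = list(frags)
        rebuilt[missing[0]] = reduce(xor_bytes, survivors)
        return rebuilt
    return survivors

def decode(frags: List[bytes], length: int) -> bytes:
    """Join the k data fragments (dropping parity) and strip the padding."""
    return b"".join(frags[:-1])[:length]

if __name__ == "__main__":
    data = b"object storage with erasure coding"
    frags = encode(data, k=4)      # 4 data + 1 parity fragment: 1.25x raw capacity
    frags[2] = None                # simulate a failed node
    assert decode(recover(frags), len(data)) == data
```

Spreading the five fragments across five nodes gives node-level fault tolerance for 25% overhead, where keeping a second full replica would cost 100%.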
Confidential: Not for external use or attribution
Attributes of Leaders in the OBS Market: Business Capabilities
• Supplier demonstrates unconditional commitment to the product/solution and, more importantly, to the market
  – Business, financial, marketing and product development
• Supplier demonstrates clear product capabilities and strategy
  – Product is versatile and offers more than 80% of listed features
• Product has consistent, established (and increasing) revenue
  – Revenue size alone is not the reason for inclusion in the leader category
• Supplier has a global presence
  – Product is marketed and sold worldwide, directly or via partners
Attributes of Leaders in the OBS Market: Product Capabilities
• Data reliability, availability and durability
  – How available is my data?
  – What are my SLAs?
• Seamless (no-forklift), pay-as-you-go model
  – Add capacity or replace old hardware without stopping the system
• Data security
  – Regulatory audit, compliance, encryption
• Performance
  – Ability to tier data
  – Ability to provide SLAs on hot data via SSD/flash
• Cost efficiency
  – Overhead (RAID vs. replicas vs. erasure coding)
• Workloads
  – Supported use cases across multiple verticals
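The cost-efficiency bullet compares RAID, replicas and erasure coding by overhead. The arithmetic is simple: n-way replication needs n bytes of raw capacity per usable byte, while a k+m erasure code needs (k+m)/k (RAID 6 behaves like a k+2 code within each group). The scheme widths below (3 copies, 8+2, 10+4) and the function names are illustrative assumptions, not figures from the presentation.

```python
def raw_for_replication(usable: float, copies: int) -> float:
    """Raw capacity needed to hold `usable` capacity as `copies` full replicas."""
    return usable * copies

def raw_for_erasure_code(usable: float, data_frags: int, parity_frags: int) -> float:
    """Raw capacity for a k+m code: every k data fragments carry m parity fragments."""
    return usable * (data_frags + parity_frags) / data_frags

if __name__ == "__main__":
    usable_pb = 1.0
    schemes = {
        "3-way replication (tolerates 2 lost copies)": raw_for_replication(usable_pb, 3),
        "RAID 6 as 8+2 (tolerates 2 lost drives per group)": raw_for_erasure_code(usable_pb, 8, 2),
        "erasure coding 10+4 (tolerates 4 lost fragments)": raw_for_erasure_code(usable_pb, 10, 4),
    }
    for name, raw in schemes.items():
        print(f"{name}: {raw:.2f} PB raw for {usable_pb:.0f} PB usable")
```

At petabyte scale, the gap between 3.00 PB and 1.40 PB of raw disk per usable petabyte is the cost argument against replication, and the 10+4 code still survives more simultaneous failures than three replicas.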
Attributes of Leaders in the OBS Market: Commitment to making the platform versatile
Workloads and Use Cases
Supplier’s commitment to a diverse set of use cases and workloads, deployed on-premises or off-premises through a provider:
• Active archive
• Long-term preservation archive
• Business analytics
• Content depots
• Hybrid cloud
• General collaboration
• Cold storage
• Dispersed machine-generated data
Partner Ecosystem
Supplier’s commitment to building a comprehensive partner ecosystem that benefits buyers:
• A comprehensive ecosystem of best-in-class application providers that can deliver a cost-effective, scalable and highly available end-to-end storage solution
• Partners that provide third-party application interoperability and support
• Partners that can provide mutual and complementary support for the entire integrated storage solution
Essential Guidance for Buyers: What do buyers look for when examining a new product and vendor?
Q: In selecting an external storage system, on a scale of 1 to 5, how important are the following factors in selecting a product and vendor? (5 = very important, 1 = not important)
• Application or workload requirements: 4.33
• Architectural and/or feature function merits: 4.33
• Relationship at C-level with supplier: 3.62
• Reseller recommendations: 3.20
• Geographic location of the storage supplier: 3.18
• Other: 2.44
Focus on:
• Application/workload requirements
• Architecture/feature deployments
Essential Guidance for Buyers: Key questions to ask suppliers of object-based storage solutions
• Platform scalability: How far can the system scale from a hardware, throughput (bandwidth), file-size, and file-volume perspective?
• Deployment flexibility: Does the supplier offer deployment flexibility and agility for the OBS platform?
• Data management capabilities: Does it support advanced metadata, indexing, and analytics?
• Storage efficiency: What data optimization technologies, such as automated data tiering, are supported?
• Data resiliency capabilities: What data resiliency schemes, such as replication and erasure coding, does the system employ?
• Workload adjacency: How does the system handle distributed and localizable workloads like MapReduce and hypervisors?
• File ingest/store capabilities: How efficiently can the system ingest and store large files, as well as a high volume of small files?
Essential Guidance for Buyers: Object-based storage offers an opportunity for IT organizations
Enables IT organizations to:
1. Become more agile
2. Embrace a service-oriented approach
3. Become more collaborative with their business units/clients
4. Reduce costs
Enables IT managers to:
1. Expand their own and their team’s skill sets
2. Change buying patterns to buy only when necessary
3. Shift CapEx dollars to OpEx dollars
4. Consolidate and optimize storage and compute infrastructure
Object-based storage solutions offer a higher ROI on a reduced TCO!
Using Software Defined Storage to address new workloads
Oscar Wahlberg, Director, Product Management
Nexenta: Inventor & Global Leader in Software-Defined Storage
• Inventor and global leader in Software-Defined Storage (SDS)
• Founded in 2008, 200+ employees, HQ in Santa Clara, CA
• Open source roots delivering enterprise-ready Software-Defined Storage solutions
• 6,000+ enterprise and community customers worldwide with over 1 exabyte of capacity deployed to date; largest in the SDS market
• 350+ partners worldwide; outreach to thousands via the Nexenta Alliance Network
• 30+ patents and patent applications in the U.S. and Europe
• Management team with a track record of success: revenues, acquisitions, IPOs
Disruptive Economics For All Workloads
Any Protocol • Any Platform • Choice of Hardware
Block – FC • File – NFS • Object – S3 • Block – iSCSI • File – SMB • Object – Swift • File – HDFS • …
Take back control with Software Defined Storage
• Software Defined Storage enables choice and flexibility
• Design a solution for specific use cases, or use well-defined Reference Architectures as building blocks
• Use your preferred hardware vendors
• Deploy with intelligent software to provide storage services
  – Easy to integrate into existing infrastructures
  – Easy to start small and grow on demand
A new generation of applications creates unique storage challenges
• New application workloads create unique storage challenges
  – Scalability into petabytes and beyond
  – Ability to grow, shrink and move dynamically
  – Predictable, controllable cost: CapEx and OpEx
  – Data integrity and resilience at scale
• Traditional storage falls short of meeting these needs
• Software Defined Storage fueled by modern hardware provides a solution
  – Industry-standard hardware with modern x86 CPUs
  – Large-capacity drives: 8-10 TB and growing
  – The advent of flash storage
NexentaEdge – Next Generation Scale Out Storage
• Block (iSCSI & Cinder) and Object (Swift & S3) storage services
  – Scale-out cluster of x86 servers on a 10GbE fabric
  – Designed to scale from 100TB to 100PB and 1000s of servers
  – Next-generation architecture optimizing both network and storage
• Key differentiators
  – Nexenta DNA: end-to-end data integrity, Cloud COW, snapshots, clones
  – Cluster-wide inline deduplication & compression at chunk level
  – Simplified operations & management
  – Self-balancing with real-time data placement optimization
• Target use cases
  – OpenStack clouds
  – Object-based active archives
  – Big Data infrastructure
[Diagram: scale-out cluster serving Object (Swift, S3) & Block (iSCSI)]
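Cluster-wide inline deduplication and compression at chunk level can be illustrated with a toy content-addressed store: each chunk is named by its cryptographic hash, so identical chunks from different objects collapse to one stored copy, and the same hash is re-checked on read as an integrity test. This is not NexentaEdge's implementation; the fixed 4 KiB chunk size, SHA-256, zlib and the `ChunkStore` class are assumptions for illustration only.

```python
import hashlib
import zlib
from typing import Dict, List

class ChunkStore:
    """Hypothetical content-addressed store: fixed-size chunks keyed by SHA-256, kept compressed."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}   # chunk id -> compressed bytes
        self.logical_bytes = 0               # bytes written by clients

    def put(self, data: bytes) -> List[str]:
        """Store data inline and return its manifest (ordered list of chunk ids)."""
        manifest = []
        for off in range(0, len(data), self.chunk_size):
            chunk = data[off:off + self.chunk_size]
            cid = hashlib.sha256(chunk).hexdigest()
            if cid not in self.chunks:        # duplicate chunks are stored only once
                self.chunks[cid] = zlib.compress(chunk)
            manifest.append(cid)
        self.logical_bytes += len(data)
        return manifest

    def get(self, manifest: List[str]) -> bytes:
        out = []
        for cid in manifest:
            chunk = zlib.decompress(self.chunks[cid])
            # The same hash doubles as an end-to-end integrity check on read.
            if hashlib.sha256(chunk).hexdigest() != cid:
                raise IOError(f"checksum mismatch for chunk {cid}")
            out.append(chunk)
        return b"".join(out)

    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())

if __name__ == "__main__":
    store = ChunkStore()
    image_a = bytes(range(256)) * 64           # 16 KiB of repeating data
    image_b = image_a[:8192] + b"x" * 8192     # shares its first half with image_a
    ma, mb = store.put(image_a), store.put(image_b)
    assert store.get(ma) == image_a and store.get(mb) == image_b
    print(len(store.chunks), "unique chunks;", store.logical_bytes,
          "logical bytes ->", store.physical_bytes(), "physical bytes")
```

Two images that share most of their content end up as only a handful of unique chunks, which is why near-identical data such as VM boot images is a natural fit for chunk-level deduplication.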
NexentaEdge Target Use Cases and Differentiators
OpenStack Cloud: iSCSI Cinder and Swift Object API; low-latency block services; inline cluster-wide deduplication; inline compression; instant snapshots and clones
Active Archive: Swift and S3 Object API; simple multi-PB scaling; cloud-scale availability management; automated capacity balancing; inline data reduction
Scale-Out Storage for VMware: low-latency iSCSI services; simple multi-PB scaling; cloud-scale availability management; inline cluster-wide deduplication; instant snapshots and clones
Big Data Lake: high-performance HDFS; automated capacity balancing; simple multi-PB scaling; inline data reduction; instant snapshots and clones
Fully distributed, enterprise-class scale-out storage
• Scale-out technology is ideal for storing large volumes (petabytes) of data
• Storing millions or billions of small objects or files presents a different challenge: scaling your metadata storage
• NexentaEdge does not have a centralized metadata server; instead, metadata is stored as “just another object in the cluster”
  – Without a centralized metadata server, there is no single point of failure
  – Storing metadata as objects allows NexentaEdge to scale out without the risk of bottlenecks
• FlexHash uses dynamic data placement based on utilization and rebalances resources automatically
• Replicast is an auto-configuring storage network protocol designed to reduce overhead and network bandwidth use
• NexentaEdge uses the same strong cryptographic checksum throughout the system for data integrity, locality and deduplication
• Using a dynamic, decentralized hash table allows NexentaEdge to scale to virtually unlimited size
[Diagram: Block & Object gateways in front of the cluster; an Object and its Object Manifest; FlexHash – dynamic data placement hash algorithm; Replicast – network-optimized transport protocol; data placement negotiation]
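The slides say FlexHash places data dynamically based on utilization and that the content hash also drives locality, but they do not describe the algorithm. As a purely hypothetical illustration of combining those two ideas, the sketch maps each chunk's hash to a fixed placement group, then picks the least-utilized servers inside that group: any gateway can work out which group to contact without a metadata server, while new writes drift toward emptier servers. The group layout, utilization figures and the `group_for`/`choose_targets` names are assumptions, and Replicast's negotiation step is not modeled.

```python
import hashlib
from typing import Dict, List

def group_for(chunk_id: str, num_groups: int) -> int:
    """Hypothetical: derive a placement group from the chunk's hex content hash."""
    return int(chunk_id[:16], 16) % num_groups

def choose_targets(chunk_id: str, groups: List[List[str]],
                   utilization: Dict[str, float], replicas: int = 3) -> List[str]:
    """Hypothetical: pick the `replicas` least-utilized servers in the chunk's group."""
    members = groups[group_for(chunk_id, len(groups))]
    if replicas > len(members):
        raise ValueError("placement group is smaller than the replica count")
    # Sort by current utilization, breaking ties by server name for determinism.
    return sorted(members, key=lambda s: (utilization[s], s))[:replicas]

if __name__ == "__main__":
    groups = [["s0", "s1", "s2", "s3"], ["s4", "s5", "s6", "s7"]]
    utilization = {"s0": 0.9, "s1": 0.2, "s2": 0.5, "s3": 0.1,
                   "s4": 0.3, "s5": 0.8, "s6": 0.4, "s7": 0.6}
    cid = hashlib.sha256(b"example chunk").hexdigest()
    print(group_for(cid, len(groups)), choose_targets(cid, groups, utilization))
```

Keeping the group choice purely hash-based bounds how many servers a read has to ask, while the utilization-driven choice inside the group is what gives self-balancing behaviour as the cluster fills unevenly.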
NexentaEdge Key Differentiators
• Enterprise-grade data integrity and functionality for scale-out block & object storage
  – Strong cryptographic checksums provide guaranteed data integrity and self-healing
  – Unlimited snapshots and clones down to single-object granularity
• Deduplication & compression for ultimate storage efficiency
  – Saves storage capacity and improves performance by reducing network traffic
  – Ideal for VM boot image storage and for archives of data with duplicated information
• Scale-out design that can deliver best-in-class performance on small random I/O
  – Important for iSCSI and block services supporting VM environments
  – Typical scale-out object solutions struggle with small random I/O; NexentaEdge is different
• Low-touch operational model for simplified management and low OpEx
  – Data automatically flows around hot spots and failed components
  – Self-balancing cluster for both data placement and performance
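Snapshots and clones at single-object granularity become cheap when each object version is only a manifest, an ordered list of immutable, hash-named chunks: a snapshot or clone copies the manifest, and a later write stores a new chunk and a new manifest instead of overwriting data in place (copy-on-write). The `VersionedBucket` class below is a hypothetical sketch of that idea, not NexentaEdge's data model; it omits garbage collection of unreferenced chunks.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Manifest = Tuple[str, ...]   # ordered chunk ids making up one object version

class VersionedBucket:
    """Hypothetical copy-on-write store: objects are manifests over shared, immutable chunks."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}                   # chunk id -> data, never modified
        self.heads: Dict[str, Manifest] = {}                 # object name -> current version
        self.snapshots: Dict[str, Dict[str, Manifest]] = {}  # snapshot -> frozen manifests

    def _put_chunk(self, data: bytes) -> str:
        cid = hashlib.sha256(data).hexdigest()
        self.chunks.setdefault(cid, data)
        return cid

    def write(self, name: str, chunks: List[bytes]) -> None:
        self.heads[name] = tuple(self._put_chunk(c) for c in chunks)

    def update_chunk(self, name: str, index: int, data: bytes) -> None:
        # Copy-on-write: store a new chunk and point a new manifest at it;
        # older manifests (snapshots, clones) still reference the old chunk.
        manifest = list(self.heads[name])
        manifest[index] = self._put_chunk(data)
        self.heads[name] = tuple(manifest)

    def snapshot(self, snap: str, names: Optional[List[str]] = None) -> None:
        # Only manifests are copied; no chunk data moves.
        selected = list(self.heads) if names is None else names
        self.snapshots[snap] = {n: self.heads[n] for n in selected}

    def clone(self, src: str, dst: str) -> None:
        self.heads[dst] = self.heads[src]

    def read(self, name: str, snap: Optional[str] = None) -> bytes:
        manifest = self.heads[name] if snap is None else self.snapshots[snap][name]
        return b"".join(self.chunks[c] for c in manifest)
```

For example, writing an object, snapshotting it, cloning it and then changing one chunk leaves the snapshot and the clone reading the original bytes while only one new chunk is stored.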
NexentaEdge – Meeting the demands of the future
• Designed from the ground up to address the requirements of a new generation of applications that cannot be serviced by traditional storage
  – Scale on demand
  – Ability to address multiple workloads
  – Cost-efficient: both CapEx and OpEx
Q&A