colony for-openstack-grizzly-summit
Presentation slides from the OpenStack Grizzly Summit, October 15, 2012, San Diego.
Copyright © 2012 NTT DATA Corporation
15/Oct/2012 NTT DATA INTELLILINK
Motonobu Ichimura
Inter-cloud object storage:
Colony
Copyright © 2012 NTT DATA INTELLILINK Corporation
http://etherpad.openstack.org/grizzly-colony
EtherPad
Agenda
•What is Colony?
–Our goal
–Use case
•How to make Swift network (or region) aware
–Problems with original swift code
–Our modification
–Investigation
–Conclusion
•Future Plan
–Problems to tackle (and being tackled)
–Collaboration
What is Colony?
Goal: academic community cloud
[Diagram: Univ.-A Cloud, Univ.-B Cloud, ..., Univ.-X Cloud (education clouds and research clouds) linked by the Science Information Network with intercloud services to form an Academic Community Cloud]
Intercloud object storage service
[Diagram: each cloud runs Nova, Glance, and Swift; some Swift clusters are for local use and others for intercloud use]
Colony federates cloud object storage services, like Swift, to achieve an intercloud object storage service.
Inter-cloud object storage service: Colony
[Diagram: Cloud-A runs Swift-A with local containers A1-A3 (objects A1-1 to A1-3), and Cloud-B runs Swift-B with local containers B1-B3 (objects B1-1 to B1-3). The geographically distributed Swift-I holds the inter-cloud containers (I1, I2, I3, I4, I10, I13). From the users' point of view, Cloud-A also shows inter-cloud containers I1 and I4 (objects I4-1 to I4-3), and Cloud-B also shows inter-cloud containers I1 and I8 (objects I1-1 to I1-3).]
Colony achieves the federation
[Diagram: the Colony server runs Apache (mod_wsgi, mod_shib), Colony-horizon, Colony-keystone, Colony-dispatcher, Squid, and Slapd on Ubuntu. Swift-I and Swift-A each run Colony-Keystone, Slapd, and Swift. A Cloud-A user authenticates with a Shibboleth IdP.]
•Provide seamless access to multiple Swifts
•Authenticate with Shibboleth IdP
Use case
We plan to use Colony as:
•Object storage for cloud-to-cloud migration
•Object storage to deliver VM images around Japan
•Object storage to store big data
Software components developed in Colony
•Colony-Horizon – based on diablo/stable Horizon with some enhancements
–Multi-region support: users can choose which Swift is used to store/retrieve objects
–Swift container ACL and metadata support
–Swift object metadata support
–>5 GB segmented upload support …
•Colony-Keystone – based on diablo/stable Keystone with some enhancements
–Authenticate with Shibboleth
–%{tenant_name} can be used in endpointTemplates, in addition to %{tenant_id}, to federate cloud services
•Colony-Dispatcher – new
–Relay requests to multiple object services (and merge the responses for clients)
–Relay requests to a specific object service indicated by URI
–Choose the “nearest” swift-proxy server to relay requests
–Copy objects among different Swifts
•Utilities – new
–Tools to simplify admin tasks to federate object storage services
Colony-horizon
Users can choose which Swift to use (Swift-A or Swift-I)
Colony-Keystone
[Diagram: Shibboleth IdP, Shibboleth SP, Colony-Horizon, Colony-Keystone]
0-1. User registration by mail_addr
0-2. Associate ePPN to mail_addr by initial access
1. ID/passwd
2. Attribute: ePPN, mail_addr
3. Attribute: ePPN
4. auth_token
Modifications to Keystone
• Add ePPN field to Keystone schema
• Add REST API services to create a token by ePPN ('/token_by/eppn') and by email address ('/token_by/email')
• Add a REST API service to register/update ePPN ('/users/{user_id}/eppn')
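The registration and token steps listed on this slide can be sketched with a small in-memory model. Everything in it is hypothetical: the `UserStore` class, its method names, and the example addresses are illustrative and are not Colony-Keystone's actual code. Only the ePPN and mail_addr attributes and the order of the steps come from the slide.

```python
import uuid


class UserStore:
    """Hypothetical in-memory stand-in for the Keystone user table."""

    def __init__(self):
        self.users = {}  # mail_addr -> {"eppn": str or None}

    def register(self, mail_addr):
        # Step 0-1: the user is registered by mail address only.
        self.users[mail_addr] = {"eppn": None}

    def token_by_email(self, mail_addr, eppn):
        # Step 0-2: on first access, associate the ePPN with the mail address.
        user = self.users.get(mail_addr)
        if user is None:
            raise KeyError("unknown mail address: %s" % mail_addr)
        if user["eppn"] is None:
            user["eppn"] = eppn
        elif user["eppn"] != eppn:
            raise ValueError("mail address is bound to a different ePPN")
        return self._issue_token()

    def token_by_eppn(self, eppn):
        # Steps 3-4: later logins present only the ePPN and get an auth token.
        for user in self.users.values():
            if user["eppn"] == eppn:
                return self._issue_token()
        raise KeyError("unknown ePPN: %s" % eppn)

    @staticmethod
    def _issue_token():
        return uuid.uuid4().hex


store = UserStore()
store.register("alice@univ-a.example")
first = store.token_by_email("alice@univ-a.example", "alice@univ-a.example.org")
later = store.token_by_eppn("alice@univ-a.example.org")
```

The sketch assumes that an ePPN, once bound to a mail address, is never silently replaced; the slides do not say how the real service handles that case.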
Colony-dispatcher
[Diagram: Swift Client, Colony Dispatcher, and the Swift Proxies of Swift-A (local) and Swift-I (intercloud); the client sees A:container1, A:container2, I:container1, I:container2]
1. Swift client can send requests to Swift-A and Swift-I through Colony Dispatcher
2. Colony Dispatcher merges the responses from each Swift and sends the result to the Swift client
Requests modified for merging responses:
•Account Info
•Container List
•X-Copy-from/to
Responses merged by Colony Dispatcher carry a prefix to indicate which Swift is used to store the data.
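The prefixing of merged listings can be illustrated with a short sketch. The function names and the dictionary layout are assumptions made for illustration; the slides only show that merged container names look like `A:container1` and `I:container1`.

```python
def merge_container_lists(listings):
    """Merge per-Swift container listings into one prefixed listing.

    listings maps a prefix such as "A" or "I" to the container names
    returned by that Swift.
    """
    merged = []
    for prefix in sorted(listings):
        for name in listings[prefix]:
            merged.append("%s:%s" % (prefix, name))
    return merged


def split_prefixed_name(prefixed):
    """Return (prefix, container) so a request can be relayed to one Swift."""
    prefix, _, name = prefixed.partition(":")
    return prefix, name


listings = {
    "A": ["container1", "container2"],  # Swift-A (local)
    "I": ["container1", "container2"],  # Swift-I (intercloud)
}
merged = merge_container_lists(listings)
print(merged)
# ['A:container1', 'A:container2', 'I:container1', 'I:container2']
print(split_prefixed_name("I:container2"))
# ('I', 'container2')
```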
Caching
[Diagram: Swift Client, Colony Dispatcher, Cache (Proxy), and the Swift Proxies of Swift-A (local) and Swift-I (intercloud); the client sees A:container1, A:container2, I:container1, I:container2]
Colony Dispatcher can use a cache proxy (like Squid) per Swift proxy to retrieve objects from remote Swifts.
How to make Swift network aware
Current implementation
Problems with the original Swift code
•PUT/GET performance
–The Swift proxy waits until the object has been put to all storage servers.
–The Swift proxy randomly chooses the node from which to retrieve an object.
Test Environments
Sapporo
Tokyo
9900MBps
900MBps(0.1msec)
CPU: AMD Opteron 6128 2000 MHz (16 cores) Mem: 32 GB NIC: 10000baseT/Full
CPU: Intel(R) Xeon(R) CPU E7-8870 (40 cores) Mem: 126 GB NIC: 1000baseT/Full
x2
x2
PUT operation
[Diagram: Proxy and three storage servers in Tokyo, three storage servers in Sapporo, and a Client]
Object PUT operation is always affected by the worst case.
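A tiny sketch of why the worst case dominates: if the proxy waits for every replica write, as the problem slide describes, the PUT finishes only when the slowest replica does. The per-replica times below are hypothetical values chosen for illustration, not measurements from the test environment.

```python
def put_completion_time(replica_times):
    """The proxy answers the client only after every replica write finishes."""
    return max(replica_times)


# Hypothetical per-replica write times in seconds (not measured values).
all_local = [0.02, 0.02, 0.03]
one_remote = [0.02, 0.02, 0.40]

print(put_completion_time(all_local))   # 0.03
print(put_completion_time(one_remote))  # 0.4
```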
Object's location
PUT object's throughput @Tokyo (Bytes/sec)
GET operation
[Diagram: Proxy and three storage servers in Tokyo, three storage servers in Sapporo, and a Client; labels: 1/replications; High-bandwidth, low-latency]
Object's location
GET object's throughput @Tokyo (Bytes/sec)
Performance degradation by network between Sapporo and Tokyo
Our modification
How to solve - Basic Idea
•Limitations
–Don’t modify data structures (including the ring)
–Minimize customization
•Adding some rules to the ring’s data structure
–Zone information is treated as a decimal number, so the difference between zone A and zone B is taken to represent the distance between zone A and zone B
•Adding some zone hints to Swift proxy servers
•Changing the order of nodes for the proxy server
How to solve
[app:proxy-server]
nearby_mode = false
own_zone = 100
near_distance = 10
[Diagram: Tokyo (zones 100-102) and Sapporo (zones 200-202), each with a proxy; Tokyo proxy: zone 100, distance 10; Sapporo proxy: zone 200, distance 10]
The proxy with zone info 100 and zone distance 10 considers storage servers in zones 100-110 to be located near the proxy.
The proxy with zone info 200 and zone distance 10 considers storage servers in zones 200-210 to be located near the proxy.
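The zone window can be checked with a few lines of Python. The comparison mirrors the `isnearby` helper inside `get_near_nodes()` on the Code slide, which treats the upper bound as exclusive, so with `own_zone = 100` and `near_distance = 10` zone 109 counts as near and zone 110 does not. The standalone function name `is_nearby` and the dictionary are illustrative only.

```python
def is_nearby(own_zone, node_zone, near_distance):
    """True when node_zone lies in [own_zone, own_zone + near_distance)."""
    return own_zone <= node_zone < own_zone + near_distance


# Zone ranges listed on the slide: 100-102 and 200-202.
tokyo_proxy = {"own_zone": 100, "near_distance": 10}
for zone in (100, 102, 109, 110, 200):
    print(zone, is_nearby(tokyo_proxy["own_zone"], zone,
                          tokyo_proxy["near_distance"]))
# 100 True
# 102 True
# 109 True
# 110 False
# 200 False
```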
PUT operation
[Diagram: Proxy (zone_info: 100, zone_distance: 10) and storage servers A, B, C in Tokyo; storage servers D, F, G in Sapporo; Client]
The proxy initially puts objects to the nearest storage servers using zone information and zone distance. The object replicator then replicates them to their proper locations asynchronously.
PUT operation
[Diagram: Proxy and storage servers A, B, C in Tokyo; storage servers D, E, F in Sapporo are crossed out (××); Client]
Hinted handoff
This is the same situation as when all storage servers located in Sapporo are broken.
GET operation
[Diagram: Proxy and three storage servers in Tokyo, three storage servers in Sapporo, and a Client]
1. First, try to retrieve the object from a storage server near the proxy.
2. After that, try to retrieve the object from a storage server in a primary zone.
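The two GET steps can be sketched as an ordered sequence of attempts. This is a simplified illustration, not the patched proxy code: `get_object` and the `fetch` callable are hypothetical, and the node dictionaries carry only the `zone` key that the ring's node dicts are shown to have on the Code slide.

```python
def get_object(nodes, own_zone, near_distance, fetch):
    """Try near nodes first, then the remaining primary nodes.

    fetch(node) is a hypothetical callable returning the object body,
    or None when the node does not have the object or is unreachable.
    """
    def is_near(node):
        return own_zone <= node["zone"] < own_zone + near_distance

    # sorted() is stable, so nodes keep their ring order within each group.
    ordered = sorted(nodes, key=lambda node: not is_near(node))
    for node in ordered:
        body = fetch(node)
        if body is not None:
            return node, body
    return None, None


nodes = [{"id": "sapporo-1", "zone": 200},
         {"id": "tokyo-1", "zone": 100},
         {"id": "sapporo-2", "zone": 201}]
tried = []


def fetch(node):
    tried.append(node["id"])
    return b"data" if node["zone"] >= 200 else None  # Tokyo copy is missing


node, body = get_object(nodes, own_zone=100, near_distance=10, fetch=fetch)
print(tried)       # ['tokyo-1', 'sapporo-1']
print(node["id"])  # sapporo-1
```

In this example the near node is asked first even though it comes second in ring order, and the remote replica is used only after the near attempt fails.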
DELETE operation
[Diagram: Proxy and three storage servers in Tokyo, three storage servers in Sapporo, and a Client]
1. First, try to delete the object from a storage server near the proxy.
2. After that, try to delete the object from a storage server in a primary zone.
Code
ring.py: adding get_near_nodes()

    def get_near_nodes(self, account, container, obj, own_zone, near_distance):
        """
        Get the partition and nodes same as get_nodes, but return only
        the nodes whose zone is near own_zone.

        :param account: account name
        :param container: container name
        :param obj: object name
        :param own_zone: first zone number of the proxy's own zone range
        :param near_distance: width of the range; zones from own_zone up to
                              (but not including) own_zone + near_distance
                              are treated as near
        :returns: a tuple of (partition, list of node dicts)
        """
        part, nodes = self.get_nodes(account, container, obj)

        def isnearby(one, other, distance):
            # near means: one <= other < one + distance
            return one <= other < one + distance

        near_nodes = []
        for node in nodes:
            if isnearby(own_zone, node['zone'], near_distance):
                near_nodes.append(node)
        # Top up from handoff nodes until replica_count near nodes are found.
        if len(near_nodes) < self.replica_count:
            for node in self.get_more_nodes(part):
                if isnearby(own_zone, node['zone'], near_distance):
                    near_nodes.append(node)
                    if len(near_nodes) >= self.replica_count:
                        break
        return part, near_nodes

proxy/server.py: then modify each method to use get_near_nodes()

    @@ -1044,6 +1056,14 @@ def POST(self, req):
             container_partition, containers, _junk, req.acl, _junk = \
                 self.container_info(self.account_name, self.container_name,
                     account_autocreate=self.app.account_autocreate)
    +        if self.app.nearby_mode:
    +            partition, near_nodes = self.app.object_ring.get_near_nodes(
    +                self.account_name, self.container_name, self.object_name,
    +                self.app.own_zone, self.app.near_distance)
    +            print 'before nodes: %s' % containers
    +            containers = near_nodes + \
    +                [cont for cont in containers if cont['zone'] not in [c['zone'] for c in near_nodes]]
    +            print 'after nodes: %s' % containers
             if 'swift.authorize' in req.environ:
                 aresp = req.environ['swift.authorize'](req)
                 if aresp:
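To see the reordering end to end without a Swift installation, the same logic can be run against a toy ring. `ToyRing` and `reorder` are hypothetical stand-ins written for this illustration; only the method names `get_nodes`, `get_more_nodes`, and `get_near_nodes`, the `zone` key, and the "near nodes first, then nodes from other zones" ordering are taken from the slide. The sketch assumes handoff nodes are added only while fewer than `replica_count` near nodes are known.

```python
class ToyRing:
    """Hypothetical stand-in for Swift's ring, for illustration only."""

    def __init__(self, primaries, handoffs, replica_count=3):
        self.primaries = primaries
        self.handoffs = handoffs
        self.replica_count = replica_count

    def get_nodes(self, account, container=None, obj=None):
        return 0, list(self.primaries)  # partition 0, primary nodes

    def get_more_nodes(self, part):
        return iter(self.handoffs)

    def get_near_nodes(self, account, container, obj, own_zone, near_distance):
        part, nodes = self.get_nodes(account, container, obj)

        def is_nearby(zone):
            return own_zone <= zone < own_zone + near_distance

        near = [n for n in nodes if is_nearby(n["zone"])]
        if len(near) < self.replica_count:
            for node in self.get_more_nodes(part):
                if is_nearby(node["zone"]):
                    near.append(node)
                    if len(near) >= self.replica_count:
                        break
        return part, near


def reorder(nodes, near_nodes):
    """Near nodes first, then nodes from zones not already covered."""
    near_zones = {n["zone"] for n in near_nodes}
    return near_nodes + [n for n in nodes if n["zone"] not in near_zones]


ring = ToyRing(
    primaries=[{"zone": 200}, {"zone": 100}, {"zone": 201}],
    handoffs=[{"zone": 202}, {"zone": 101}, {"zone": 102}],
)
part, nodes = ring.get_nodes("AUTH_test", "c", "o")
_, near = ring.get_near_nodes("AUTH_test", "c", "o",
                              own_zone=100, near_distance=10)
ordered_zones = [n["zone"] for n in reorder(nodes, near)]
print(ordered_zones)
# [100, 101, 102, 200, 201]
```

With a Tokyo proxy (`own_zone=100`), the one near primary and two near handoff nodes come first, and the Sapporo primaries follow.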
Investigation
[Chart: PUT Average (bytes/sec) @Sapporo, Original vs. Patched, for object sizes 1K, 1M, 10M, 100M, 1G; y-axis 0 to 40,000,000]
[Chart: PUT Average (bytes/sec) @Tokyo, Original vs. Patched, for object sizes 1K, 1M, 10M, 100M, 1G; y-axis 0 to 160,000,000]
Using Cache
[Diagram: Proxy and three storage servers in Tokyo, three storage servers in Sapporo, a Proxy in Kyushu, and a Client]
What about the case where all objects are located in remote areas?
Colony-Dispatcher as a cache
Colony-Dispatcher can act as a swift-proxy-proxy with a cache mechanism
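The caching idea can be sketched as a read-through cache placed in front of a remote proxy. This is a deliberately simplified illustration: the slides use a cache proxy such as Squid, whereas `CachingDispatcher`, `fetch_from_remote_swift`, and the example path are hypothetical, and the sketch ignores invalidation, expiry, and object size limits.

```python
class CachingDispatcher:
    """Hypothetical read-through cache in front of a remote Swift proxy."""

    def __init__(self, fetch_remote):
        self.fetch_remote = fetch_remote  # callable: path -> bytes
        self.cache = {}
        self.remote_fetches = 0

    def get(self, path):
        if path not in self.cache:
            # Cache miss: cross the inter-site link once and keep the body.
            self.cache[path] = self.fetch_remote(path)
            self.remote_fetches += 1
        return self.cache[path]


def fetch_from_remote_swift(path):
    # Placeholder for an HTTP GET against a remote swift-proxy.
    return ("body of %s" % path).encode()


dispatcher = CachingDispatcher(fetch_from_remote_swift)
dispatcher.get("/v1/AUTH_test/I:container1/obj")
dispatcher.get("/v1/AUTH_test/I:container1/obj")
print(dispatcher.remote_fetches)  # 1
```

The second request is served locally, so only the first GET pays the cost of the link between sites.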
Investigation – Cache effectiveness
With Colony-Dispatcher as a cache, retrieving objects from remote areas could perform well.
[Chart: GET average (bytes/sec) @Tokyo for object sizes 1K, 1M, 10M, 100M, 1G; y-axis 0 to 250,000,000; four series whose legend labels were lost]
[Chart: GET average (bytes/sec) @Sapporo for object sizes 1K, 1M, 10M, 100M, 1G; y-axis 0 to 350,000,000; four series whose legend labels were lost]
Conclusion
•Re-ordering the nodes by region for the proxy resolves the GET/PUT performance issues
–This feature can be implemented with minimal customization (<50 lines of code).
•Using a cache is a good idea for inter-cloud use
Our future plan
Problems to tackle
•Object’s location
–Adding Region concepts to the ring structure might help.
–Primary nodes isolated by region
•Replication’s performance
– Key factor
• We aggressively used the hinted-handoff mechanism to
– Using UDT instead of TCP for replication
– Using pyinotify for I/O-event-driven replication
– Separation of network for replication
– Hop-by-hop replication
Are you interested in Colony?
•Please contact me if you are interested in the Colony project.
–We want to collaborate with people who want to use/develop Swift as an inter-cloud object store.
Are you interested in academic clouds?
•If you are interested in how to integrate clouds using Dodai and Colony:
–My colleague (Guan-san) will give a presentation about Dodai (Cluster as a Service) at 17:20 @Manchester A
–Yokoyama-san (a member of NII) might talk about how to integrate Colony and Dodai in a lightning talk (LT)
Thank you.
Q&A
•Please phrase your question using simple grammar if possible.