colony for-openstack-summit


Page 1: Colony for-openstack-summit

Copyright © 2012 NTT DATA Corporation

15/Oct/2012 NTT DATA INTELLILINK

Motonobu Ichimura @famao

Inter-cloud object storage: Colony

Page 2: Colony for-openstack-summit

Copyright © 2012 NTT DATA INTELLILINK Corporation

http://etherpad.openstack.org/grizzly-colony

EtherPad

Page 3: Colony for-openstack-summit

Agenda

• What is Colony?
  – Our goal
  – Use case

• How to make swift network (or region) aware
  – Problems with original swift code
  – Our modification
  – Investigation
  – Conclusion

• Future plan
  – Problems to tackle (and being tackled)
  – Collaboration

Page 4: Colony for-openstack-summit

What is Colony?

Page 5: Colony for-openstack-summit

Goal: academic community cloud

[Diagram labels: Univ.-A Cloud, Univ.-B Cloud, ..., Univ.-X Cloud; Academic Community Cloud; Education Cloud; Research Cloud; Science Information Network; Intercloud services]

Page 6: Colony for-openstack-summit

Intercloud object storage service

[Diagram labels: Swift for intercloud use; Swift for local use; Swift; Nova; Glance]

Colony federates cloud object storage services, such as swift, to achieve an intercloud object storage service.

Page 7: Colony for-openstack-summit

Inter-cloud object storage service: Colony

Users' points of view

Cloud-A (Swift-A)
  Containers: Container A1, Container A2, Container A3
  Inter-cloud containers: Inter-cloud Container I1, Inter-cloud Container I4
  Objects: Object A1-1, Object A1-2, Object A1-3; Object I4-1, Object I4-2, Object I4-3

Cloud-B (Swift-B)
  Containers: Container B1, Container B2, Container B3
  Inter-cloud containers: Inter-cloud Container I1, Inter-cloud Container I8
  Objects: Object B1-1, Object B1-2, Object B1-3; Object I1-1, Object I1-2, Object I1-3

Cloud Services (Swift-I, geographically distributed)
  Inter-cloud containers: Inter-cloud Container I1, I2, I3, I4, I10, I13
  Objects: Object I4-1, Object I4-2, Object I4-3; Object I1-1, Object I1-2, Object I1-3

Page 8: Colony for-openstack-summit

Colony achieves the federation

Provide seamless access to multiple swifts

Authenticate with Shibboleth IdP

[Diagram labels: Cloud-A User; Shibboleth IdP; Colony: Apache, mod_wsgi, mod_shib, Colony-horizon, Colony-keystone, Colony-dispatcher, Squid, Slapd, Ubuntu; Swift-I and Swift-A, each with Colony-Keystone, Slapd, Swift]

Page 9: Colony for-openstack-summit

Use case

We plan to use Colony as:

• Object storage for cloud-to-cloud migration
• Object storage to deliver VM images around Japan
• Object storage to store big data

Page 10: Colony for-openstack-summit

Developed software components in Colony

• Colony-Horizon – based on diablo/stable Horizon with some enhancements
  • Multi-region support – users can choose which swift is used to store/retrieve objects
  • Swift container ACL and metadata support
  • Swift object metadata support
  • >5G segment upload support
  • …

• Colony-Keystone – based on diablo/stable Keystone with some enhancements
  • Authenticate with Shibboleth
  • %{tenant_name} can be used for endpointTemplates in addition to %{tenant_id} to federate cloud services

• Colony-Dispatcher – new
  • Relay requests to multiple object services (and merge responses for clients)
  • Relay requests to a specific object service indicated by URI
  • Choose the “nearest” swift-proxy server to relay requests
  • Copy objects among different swifts

• Utilities – new
  • Tools to simplify admin tasks to federate object storage services

Page 11: Colony for-openstack-summit

Colony-horizon

Users can choose swift (Swift-A or Swift-I)

Page 12: Colony for-openstack-summit

Colony-keystone

[Diagram labels: Shibboleth IdP; Shibboleth SP; Colony-Horizon; Colony-Keystone]

0-1. User registration by mail_addr
0-2. Associate ePPN to mail_addr by initial access
1. ID/passwd
2. Attribute: ePPN, mail_addr
3. Attribute: ePPN
4. auth_token

Modifications to keystone
• Add ePPN field to keystone schema
• Add REST API services to create a token by ePPN ('/token_by/eppn') and by email address ('/token_by/email')
• Add a REST API service to register/update ePPN ('/users/{user_id}/eppn')
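The registration and token steps on this slide can be sketched in a few lines of Python. This is a toy in-memory model, not Colony-Keystone's code: the class and method names (`ToyUserStore`, `register_user`, `associate_eppn`, `token_by_eppn`) and the example addresses are hypothetical, and a random hex string stands in for auth_token.

```python
import uuid


class ToyUserStore:
    """Hypothetical in-memory stand-in for the keystone user table."""

    def __init__(self):
        self._eppn_by_mail = {}  # mail_addr -> ePPN (None until first access)

    def register_user(self, mail_addr):
        # Step 0-1: the user is registered by mail address only.
        self._eppn_by_mail[mail_addr] = None

    def associate_eppn(self, mail_addr, eppn):
        # Step 0-2: on initial access, bind the ePPN to the registered address.
        if mail_addr not in self._eppn_by_mail:
            raise KeyError("unknown mail_addr: %s" % mail_addr)
        self._eppn_by_mail[mail_addr] = eppn

    def token_by_eppn(self, eppn):
        # Steps 3-4: issue a token only when the ePPN belongs to a known user.
        if eppn is None or eppn not in self._eppn_by_mail.values():
            raise KeyError("unknown ePPN: %s" % eppn)
        return uuid.uuid4().hex


store = ToyUserStore()
store.register_user("alice@example.org")
store.associate_eppn("alice@example.org", "alice@univ-a.example.org")
token = store.token_by_eppn("alice@univ-a.example.org")
```

The point of the sketch is the ordering: a token can be issued by ePPN only after the ePPN has been associated with a registered mail address.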

Page 13: Colony for-openstack-summit

Colony-dispatcher

[Diagram labels: Swift Client; Colony Dispatcher; Swift Proxy (x3); Swift-A (local); Swift-I (intercloud); A:container1, A:container2, I:container1, I:container2]

1. Swift client can send requests to Swift-A and Swift-I through Colony Dispatcher.
2. Colony Dispatcher merges the responses from each Swift and sends the result to Swift Client.

Requests modified for merging responses:
• Account Info
• Container List
• X-Copy-from/to

Response merged by Colony Dispatcher has a prefix to indicate which Swift is used to store.
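A minimal sketch of the merging step, assuming the "A:" and "I:" prefixes visible in the diagram labels. The function names are hypothetical and the real dispatcher works on HTTP responses rather than Python lists.

```python
def merge_container_lists(listings):
    """Merge per-swift container names into one prefixed listing.

    listings maps a prefix (e.g. "A" for Swift-A) to that swift's containers.
    """
    merged = []
    for prefix in sorted(listings):
        for name in listings[prefix]:
            merged.append("%s:%s" % (prefix, name))
    return merged


def split_prefixed_name(prefixed):
    """Recover (prefix, container) so a request can be relayed to one swift."""
    prefix, _, name = prefixed.partition(":")
    return prefix, name


merged = merge_container_lists({
    "A": ["container1", "container2"],  # Swift-A (local)
    "I": ["container1", "container2"],  # Swift-I (intercloud)
})
```

Because both swifts may hold a container with the same name, the prefix is what keeps the merged listing unambiguous.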

Page 14: Colony for-openstack-summit

Caching

[Diagram labels: Swift Client; Colony Dispatcher; Cache (Proxy); Swift Proxy (x3); Swift-A (local); Swift-I (intercloud); A:container1, A:container2, I:container1, I:container2]

Colony Dispatcher can use a cache proxy (such as squid) per swift proxy to retrieve objects from remote swifts.

Page 15: Colony for-openstack-summit

How to make swift network aware

Page 16: Colony for-openstack-summit

Current implementation

Page 17: Colony for-openstack-summit

Problems with the original swift code

• PUT/GET performance
  – The swift proxy waits until all copies of an object have been put to the storage servers.
  – The swift proxy randomly chooses the node from which to retrieve an object.

Page 18: Colony for-openstack-summit

Test Environments

Sapporo

Tokyo

200-850Mbps(18msec)

9900MBps

900MBps(0.1msec)

CPU: AMD Opteron 6128 2000MHz (16core), Mem: 32GB, NIC: 10000baseT/Full

CPU: Intel(R) Xeon(R) CPU E7-8870 (40core), Mem: 126GB, NIC: 1000baseT/Full

x2

x2

Page 19: Colony for-openstack-summit

PUT operation

[Diagram labels: Tokyo; Proxy; Storage x3; Sapporo; Storage x3; Client]

Object PUT operation is always affected by the worst case.
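To see why the slowest replica dominates, here is a rough model that assumes, as the deck describes for the original code, that the proxy waits for every replica write. The function names and the link figures are illustrative assumptions, not measurements from the test environment.

```python
def transfer_seconds(size_bytes, bandwidth_bytes_per_sec, latency_sec):
    """Time to push one replica over one link (very rough model)."""
    return latency_sec + size_bytes / bandwidth_bytes_per_sec


def put_seconds(size_bytes, replica_links):
    """PUT completes only when the slowest replica write has finished."""
    return max(
        transfer_seconds(size_bytes, bandwidth, latency)
        for bandwidth, latency in replica_links
    )


# Illustrative links: (bandwidth in bytes/sec, latency in seconds).
local_link = (100_000_000, 0.0)
remote_link = (10_000_000, 1.0)

all_local = put_seconds(100_000_000, [local_link] * 3)
one_remote = put_seconds(100_000_000, [local_link, local_link, remote_link])
```

In this model a single remote replica is enough to pull the whole PUT down to the remote link's speed, which is the "worst case" effect the slide refers to.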

Page 20: Colony for-openstack-summit

Object's location (number of replicas per site)

name    @Tokyo   @Sapporo
1K      1        2
1M      2        1
10M     1        2
100M    2        1
1G      2        1

Page 21: Colony for-openstack-summit

PUT object's throughput @Tokyo (Bytes/sec)

name    1            2            3            4            5
1K      4,857        5,596        2,384        405          7,844
1M      1,109,196    1,161,519    1,157,529    1,092,685    1,162,359
10M     2,052,541    1,935,695    2,066,010    2,065,412    2,068,340
100M    9,425,346    9,411,894    9,441,722    9,427,770    9,432,213
1G      47,020,441   47,032,115   47,667,067   47,083,438   47,852,594

Page 22: Colony for-openstack-summit

GET operation

[Diagram labels: Tokyo; Proxy; Storage x3; Sapporo; Storage x3; Client; 1/replications; High-bandwidth, low-latency]
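The effect of random node selection on GET can be put into numbers. The sketch below reads the "1/replications" label as each replica being picked with equal probability, which is an interpretation of the diagram; the function names and the throughput figures used in any call are illustrative assumptions.

```python
from fractions import Fraction


def local_hit_probability(local_replicas, remote_replicas):
    """Chance that a uniformly random replica choice lands on a local node."""
    total = local_replicas + remote_replicas
    return Fraction(local_replicas, total)


def expected_throughput(local_replicas, remote_replicas, local_bps, remote_bps):
    """Mean per-request throughput under random replica selection."""
    p_local = local_hit_probability(local_replicas, remote_replicas)
    return p_local * local_bps + (1 - p_local) * remote_bps


# One replica near the proxy, two at the remote site.
p = local_hit_probability(1, 2)
```

With one local replica out of three, only one GET in three is served over the high-bandwidth, low-latency path on average.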

Page 23: Colony for-openstack-summit

Object's location (number of replicas per site)

name         @Tokyo   @Sapporo
1K           1        2
1M           2        1
10M          1        2
100M         2        1
1G           2        1
1.txt (1G)   3        0
5.txt (1G)   0        3

Page 24: Colony for-openstack-summit

GET object's throughput @Tokyo (Bytes/sec)

name                 1             2             3             4             5
1K                   8,859         8,165         8,225         11,455        11,504
1M                   1,222,259     1,172,193     1,149,629     1,148,493     49,542,924
10M                  96,848,249    97,777,529    2,098,071     100,899,319   99,814,948
100M                 104,857,600   9,670,414     9,672,893     9,658,095     9,657,313
1G                   117,490,592   115,273,333   51,117,116    51,109,464    51,099,616
1.txt (Worst case)   51,085,780    44,245,222    50,812,419    50,923,435    51,066,880
5.txt (Best case)    117,473,740   115,216,645   115,340,248   115,288,545   114,347,285

Performance degradation by network between Sapporo and Tokyo

Page 25: Colony for-openstack-summit

Our modification

Page 26: Colony for-openstack-summit

How to solve - Basic Idea

• Limitation
  – Don’t modify data structure (including ring)
  – Minimize customization

• Adding some rules to the ring’s data structure
  – Zone information is treated as a decimal number, so the difference between zoneA and zoneB is taken to represent the distance between zoneA and zoneB

• Adding some zone hints to Swift proxy servers
• Changing the order of nodes for the proxy server

Page 27: Colony for-openstack-summit

How to solve

[app:proxy-server]

nearby_mode = false

own_zone = 100

near_distance = 10

[Diagram labels: Tokyo (zone 100-102), Proxy: Zone 100, Distance 10; Sapporo (zone 200-202), Proxy: Zone 200, Distance 10]

A proxy that has zone info (100) and zone distance (10) considers storage servers between zone 100-110 to be located near the proxy.

A proxy that has zone info (200) and zone distance (10) considers storage servers between zone 200-210 to be located near the proxy.
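The zone rule can be checked with a small predicate. This sketch mirrors the comparison used by the `isnearby` helper in the deck's `get_near_nodes()` code, where the upper bound is exclusive (so with own_zone 100 and distance 10, zone 110 itself is not treated as near). The zone list is taken from the diagram; the variable names are illustrative.

```python
def is_nearby(own_zone, node_zone, near_distance):
    """True when node_zone falls in [own_zone, own_zone + near_distance)."""
    return own_zone <= node_zone < own_zone + near_distance


# Zones from the diagram: Tokyo uses 100-102, Sapporo uses 200-202.
storage_zones = [100, 101, 102, 200, 201, 202]

near_tokyo = [z for z in storage_zones if is_nearby(100, z, 10)]
near_sapporo = [z for z in storage_zones if is_nearby(200, z, 10)]
```

Encoding the site in the hundreds digit of the zone number is what lets the proxy infer locality without any change to the ring's data structure.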

Page 28: Colony for-openstack-summit

PUT operation

[Diagram labels: Tokyo; Proxy (zone_info: 100, zone_distance: 10); Storage A, Storage B, Storage C; Sapporo; Storage D, Storage F, Storage G; Client]

The proxy initially puts objects to the nearest storage servers, using zone information and zone distance. The object replicator then asynchronously replicates them to the proper location.

Page 29: Colony for-openstack-summit

PUT operation

[Diagram labels: Tokyo; Proxy; Storage A, Storage B, Storage C; Sapporo; Storage D, Storage E, Storage F; ×× (x3); Client; Hinted hand off]

This is the same situation as when all storage servers located in Sapporo are broken.

Page 30: Colony for-openstack-summit

GET operation

[Diagram labels: Tokyo; Proxy; Storage x3; Sapporo; Storage x3; Client]

1. First, try to retrieve the object from a storage server near the proxy.

2. After that, try to retrieve the object from a storage server indicated as a primary zone.

Page 31: Colony for-openstack-summit

DELETE operation

[Diagram labels: Tokyo; Proxy; Storage x3; Sapporo; Storage x3; Client]

1. First, try to delete the object from a storage server near the proxy.

2. After that, try to delete the object from a storage server indicated as a primary zone.

Page 32: Colony for-openstack-summit

Code

Adding get_near_nodes() to ring.py:

ring.py

    def get_near_nodes(self, account, container, obj, own_zone, near_distance):
        """
        Get the partition and nodes in the same way as get_nodes,
        keeping only the nodes whose zone is near own_zone.

        :param account: account name
        :param container: container name
        :param obj: object name
        :param own_zone: first zone number of the proxy's own zone range
        :param near_distance: width of the range; zones from own_zone up to
                              (but not including) own_zone + near_distance
                              are treated as nearby
        :returns: a tuple of (partition, list of node dicts)
        """
        part, nodes = self.get_nodes(account, container, obj)

        def isnearby(one, other, distance):
            if one <= other and one + distance > other:
                return True
            return False

        # Nearby primary nodes come first.
        near_nodes = []
        for node in nodes:
            if isnearby(own_zone, node['zone'], near_distance):
                near_nodes.append(node)
        # Top up with nearby hand-off nodes.
        if len(near_nodes) <= self.replica_count:
            for node in self.get_more_nodes(part):
                if isnearby(own_zone, node['zone'], near_distance):
                    near_nodes.append(node)
                if len(near_nodes) >= self.replica_count:
                    break
        return part, near_nodes

And then modify proxy/server.py to use get_near_nodes() for each method:

proxy/server.py

    @@ -1044,6 +1056,14 @@ def POST(self, req):
                 container_partition, containers, _junk, req.acl, _junk = \
                     self.container_info(self.account_name, self.container_name,
                         account_autocreate=self.app.account_autocreate)
    +            if self.app.nearby_mode:
    +                partition, near_nodes = self.app.object_ring.get_near_nodes(
    +                    self.account_name, self.container_name, self.object_name,
    +                    self.app.own_zone, self.app.near_distance)
    +                print 'before nodes: %s' % containers
    +                containers = near_nodes + \
    +                    [cont for cont in containers if cont['zone'] not in [c['zone'] for c in near_nodes]]
    +                print 'after nodes: %s' % containers
                 if 'swift.authorize' in req.environ:
                     aresp = req.environ['swift.authorize'](req)
                     if aresp:
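The two pieces above can be tried outside Swift with a toy, self-contained version. This is a simplified sketch and not the Colony patch itself: nodes are plain dicts carrying only a zone, the primary and hand-off lists are passed in directly instead of coming from a ring, the helper names are hypothetical, and the top-up step runs only while fewer than replica_count nearby nodes have been found.

```python
def is_nearby(own_zone, node_zone, near_distance):
    """True when node_zone falls in [own_zone, own_zone + near_distance)."""
    return own_zone <= node_zone < own_zone + near_distance


def pick_near_nodes(primaries, handoffs, own_zone, near_distance, replica_count):
    """Nearby primaries first, topped up with nearby hand-off nodes."""
    near = [n for n in primaries if is_nearby(own_zone, n["zone"], near_distance)]
    if len(near) < replica_count:
        for node in handoffs:
            if is_nearby(own_zone, node["zone"], near_distance):
                near.append(node)
            if len(near) >= replica_count:
                break
    return near


def order_for_proxy(primaries, near_nodes):
    """Nearby nodes first, then primaries from zones not already covered."""
    near_zones = {n["zone"] for n in near_nodes}
    return near_nodes + [n for n in primaries if n["zone"] not in near_zones]


# Toy placement: one primary in Tokyo (zone 100), two in Sapporo (200, 201).
primaries = [{"zone": 100}, {"zone": 200}, {"zone": 201}]
handoffs = [{"zone": 202}, {"zone": 101}, {"zone": 102}, {"zone": 203}]

near = pick_near_nodes(primaries, handoffs, own_zone=100,
                       near_distance=10, replica_count=3)
ordered = order_for_proxy(primaries, near)
ordered_zones = [n["zone"] for n in ordered]
```

For a Tokyo proxy the resulting order tries the three Tokyo zones first and the Sapporo primaries afterwards, which corresponds to the "near first, then primary zone" behaviour described for GET and DELETE.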

Page 33: Colony for-openstack-summit

Investigation

[Chart: PUT Average (bytes/sec) @Sapporo, Original vs. Patched, for object sizes 1K, 1M, 10M, 100M, 1G]

[Chart: PUT Average (bytes/sec) @Tokyo, Original vs. Patched, for object sizes 1K, 1M, 10M, 100M, 1G]

Page 34: Colony for-openstack-summit

Using Cache

[Diagram labels: Tokyo; Proxy; Storage x3; Sapporo; Storage x3; Client; Kyusyu; Proxy]

What about the case where all objects are located in remote areas?

Page 35: Colony for-openstack-summit

Colony-Dispatcher as a cache

Colony-Dispatcher can be a swift-proxy-proxy with a cache mechanism
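The idea of a proxy in front of the swift proxy that caches objects can be illustrated with a read-through cache. This is only a sketch under simplifying assumptions: the class name, the `fetch_remote` callable and the object paths are hypothetical, and it ignores expiry, invalidation, object size limits and HTTP cache semantics, which a real cache proxy such as squid would handle.

```python
class CachingDispatcher:
    """Toy read-through cache in front of a remote object fetch."""

    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote  # callable: path -> bytes
        self._cache = {}
        self.remote_fetches = 0

    def get(self, path):
        # Only the first request for a path crosses the wide-area link.
        if path not in self._cache:
            self._cache[path] = self._fetch_remote(path)
            self.remote_fetches += 1
        return self._cache[path]


def fake_remote_swift(path):
    """Stand-in for a GET against a swift proxy in a remote region."""
    return ("contents of %s" % path).encode("utf-8")


dispatcher = CachingDispatcher(fake_remote_swift)
first = dispatcher.get("/v1/AUTH_demo/container1/object1")
second = dispatcher.get("/v1/AUTH_demo/container1/object1")
```

After the first retrieval the object is served locally, so repeated GETs of the same remote object no longer depend on the inter-site network.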

Page 36: Colony for-openstack-summit

Investigation – Cache effectiveness

Using Colony-Dispatcher as a cache could improve the performance of retrieving objects from remote areas.

[Chart: GET average (bytes/sec) @Tokyo, for object sizes 1K, 1M, 10M, 100M, 1G; legend not recoverable ("Column K")]

[Chart: GET average (bytes/sec) @Sapporo, for object sizes 1K, 1M, 10M, 100M, 1G; legend not recoverable ("Column K")]

Page 37: Colony for-openstack-summit

Conclusion

• Re-ordering the nodes by region for the proxy resolves the GET/PUT performance issues
  – This feature can be implemented with minimal customization (fewer than 50 lines of code).

• Using a cache is a good idea for inter-cloud use

Page 38: Colony for-openstack-summit

Our future plan

Page 39: Colony for-openstack-summit

Problems to tackle

• Object’s location
  • Adding Region concepts to the ring structure might help.
    – Primary nodes isolated by region

• Replication’s performance
  – Key factor
    • We aggressively used hinted-hand-off mechanism to
  – Using UDT instead of TCP for replication
  – Using pyinotify for I/O event driven replication
  – Separation of network for replication
  – Hop-by-hop replication

Page 40: Colony for-openstack-summit

Are you interested in Colony?

• Please contact me if you are interested in the Colony project.
  – We want to collaborate with people who want to use/develop swift as an inter-cloud object store.

Page 41: Colony for-openstack-summit

Are you interested in academic clouds?

• If you are interested in how to integrate clouds using Dodai and Colony:
  – My colleague (guan-san) will make a presentation about dodai (Cluster as a service) at 17:20 @Manchester A
  – Yokoyama-san (a member of NII) might talk about how to integrate both Colony and Dodai in a lightning talk (LT)

Page 42: Colony for-openstack-summit

Thank you.

Page 43: Colony for-openstack-summit

Q&A

• Please phrase your question using simple grammar if possible.

Page 44: Colony for-openstack-summit

Copyright © 2011 NTT DATA Corporation
