Moving to Nova Cells Without Destroying the World
Post on 14-Aug-2015
Moving to Nova Cells Without Destroying the World
Mike Dorman @misterdorm
Senior Systems Engineer, Go Daddy
http://x.co/yvrcells
CELLS INTRODUCTION
How to scale nova?
http://docs.openstack.org/openstack-ops/content/scaling.html
Use cells to overcome …
• Large number of nova-computes
• Single message queue instance
• Complicated scheduling
• Multi-site behind one API
Cells defined
• Hierarchy of Nova instances
• Each has database, message queue, scheduler, and compute
• Message routing between cells to perform operations
• Top-level API cell for nova-api and cell scheduling
• Overrides the default compute API class
• Lots of caveats
• This is cells v1 (v2 in Liberty)
(Two diagram slides; source: http://comstud.com/cells.pdf)
More details to get started
• Nova cells configuration reference
  http://docs.openstack.org/juno/config-reference/content/section_compute-cells.htm
• openstack-dev cells discussions
  http://www.gossamer-threads.com/lists/openstack/dev/16277
• CERN’s cells architecture
  http://openstack-in-production.blogspot.com/2014/03/cern-cloud-architecture-update-for.html
• Folsom cells design summit slides
  http://comstud.com/FolsomCells.pdf
• Exploring OpenStack Nova Cells
  http://www.dorm.org/blog/exploring-openstack-nova-cells/
• Talks by Rackspace, CERN, NeCTAR
PLANNING THE CONVERSION
Goals
• Get to cells before scaling fire drill
• Keep nova RMQ, DB close to compute nodes
• Maintain existing instances state
• Little or no downtime
Basic plan
• Existing nova becomes first compute cell
• Split RMQ cluster
• Create new nova instance for API cell
• Import data to API cell
• Existing nova-api service until final cutover
ENVIRONMENT PREP
Getting ready
• New servers for the API cell services
• Database for nova API cell
• Migrate non-nova services to new machines
• Network ACLs
• Check DNS
Extra credit: Split RabbitMQ cluster
• Not strictly necessary!
• To minimize downtime and maintain state
• First add new nodes
• Split and contract cluster
Diagram sequence (slides 13 to 18; services shown: heat, neutron, glance, nova, ceilometer)
• Expand RabbitMQ cluster: New RMQ/App Servers (to be: API cell) are added alongside the Original RMQ/App Servers (to be: compute cell)
• Reconfigure non-nova services
• Split brain
• Remove opposite nodes: the Original RMQ/App Servers become the Compute Cell Servers, the New RMQ/App Servers become the API Cell Servers
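The slides show the expand and contract steps only as diagrams. As a rough sketch, the script below prints the rabbitmqctl commands one might use for them. It assumes standard RabbitMQ clustering commands (join_cluster, forget_cluster_node); the node names and both function names are hypothetical. How the split itself was forced is not stated in the slides, so that step is left out.

```shell
#!/bin/sh
# Sketch only: prints rabbitmqctl commands instead of running them.
# Node names are hypothetical placeholders, not from the slides.

# Commands to run ON a new node so that it joins the existing cluster.
expand_cmds() {
    existing_node="$1"
    echo "rabbitmqctl stop_app"
    echo "rabbitmqctl join_cluster $existing_node"
    echo "rabbitmqctl start_app"
}

# Commands to run on one side after the split, to drop the opposite nodes.
contract_cmds() {
    for node in "$@"; do
        echo "rabbitmqctl forget_cluster_node $node"
    done
}

expand_cmds rabbit@orig-rmq-01
contract_cmds rabbit@new-rmq-01 rabbit@new-rmq-02
```

Printing rather than executing keeps this a runbook aid: review the output, then run each command by hand on the right node.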
CONFIGURE COMPUTE CELL
Set up record for parent cell
    nova-manage cell create \
        --name=api --cell_type=parent \
        --username=api_rmq_user --password=api_rmq_pass \
        --hostname=api_rmq_host --virtual_host=api_rmq_vhost

• Use the API cell RMQ servers!
• Or use the cells_config option and put this in JSON:
  http://docs.openstack.org/juno/config-reference/content/section_compute-cells.html#cell-config-optional-json
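For the cells_config route, a minimal sketch of generating the JSON for the parent cell record might look like this. The key names are recalled from the config reference linked above and should be checked against it; parent_cell_json is a hypothetical helper, and the values are the same placeholders as in the nova-manage command.

```shell
#!/bin/sh
# Sketch: print a cells_config JSON document describing the parent (API) cell.
# Key names are an assumption recalled from the Juno config reference; verify them.
# Arguments are placeholders: RMQ user, password, host, virtual host.

parent_cell_json() {
    user="$1"; pass="$2"; host="$3"; vhost="$4"
    cat <<EOF
{
    "api": {
        "name": "api",
        "transport_url": "rabbit://${user}:${pass}@${host}/${vhost}",
        "weight_offset": 0.0,
        "weight_scale": 1.0,
        "is_parent": true
    }
}
EOF
}

parent_cell_json api_rmq_user api_rmq_pass api_rmq_host api_rmq_vhost
```

Redirect the output to a file and point the cells_config option at it. The child cell record in the API cell would follow the same shape with is_parent set to false.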
Enable nova-cells in compute cell

    [cells]
    enable = true
    name = cell_01
    cell_type = compute

• Start up nova-cells, verify connections to RMQ
• Do not restart nova-api after this!
Disable quotas in compute cell
• Quotas will be enforced by the API cell

    [DEFAULT]
    quota_driver=nova.quota.NoopQuotaDriver
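Before starting nova-cells, it may help to confirm that the compute cell settings above actually landed in the config file. The following is a sketch of such a check; ini_get and check_compute_cell_conf are hypothetical helpers (not part of nova), and the simple parser assumes plain "key = value" lines.

```shell
#!/bin/sh
# Sketch: pre-flight check of the compute cell settings in an ini-style file.
# ini_get and check_compute_cell_conf are hypothetical helpers, not nova tools.

# usage: ini_get FILE SECTION KEY  (prints the value, or nothing if absent)
ini_get() {
    awk -v section="[$2]" -v key="$3" '
        /^[ \t]*\[/ {
            line = $0
            gsub(/[ \t]/, "", line)
            in_section = (line == section)
            next
        }
        in_section && index($0, "=") > 0 {
            k = substr($0, 1, index($0, "=") - 1)
            gsub(/[ \t]/, "", k)
            if (k == key) {
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+/, "", v)
                gsub(/[ \t]+$/, "", v)
                print v
                exit
            }
        }
    ' "$1"
}

check_compute_cell_conf() {
    conf="$1"
    enable="$(ini_get "$conf" cells enable | tr 'A-Z' 'a-z')"
    [ "$enable" = "true" ] || { echo "cells/enable is not true" >&2; return 1; }
    [ "$(ini_get "$conf" cells cell_type)" = "compute" ] ||
        { echo "cells/cell_type is not compute" >&2; return 1; }
    [ "$(ini_get "$conf" DEFAULT quota_driver)" = "nova.quota.NoopQuotaDriver" ] ||
        { echo "quota_driver is not the no-op driver" >&2; return 1; }
    echo "compute cell config looks OK"
}

# Usage: check_compute_cell_conf /etc/nova/nova.conf
```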
BOOTSTRAP NOVA FOR API CELL
Install & configure nova as usual
• Install packages, db sync
• Use the API cell RMQ servers!
• Configure cells options

    [cells]
    enable = true
    name = api
    cell_type = api

• Don’t start services yet (need to import data)
Set up record for child cell

    nova-manage cell create \
        --name=cell_01 --cell_type=child \
        --username=comp_rmq_user --password=comp_rmq_pass \
        --hostname=comp_rmq_host --virtual_host=comp_rmq_vhost

• Use the compute cell RMQ servers!
• Remember cells_config/json option
IMPORT NOVA DATA
Seed API cell data
• API cell needs flavor, quota, instance, etc. data
• Must do this directly in SQL
• Shut down nova-api to prevent changes while you do this

    mysqldump nova_orig_db table_name | \
        mysql nova_api_cell_db
Tables to import
• instance_types
• instance_type_extra_specs
• instance_type_projects
• instances
• instance_info_caches
• block_device_mapping
• instance_system_metadata
• instance_groups
• instance_group_member
• instance_group_metadata
• instance_group_policy
• key_pairs
• quota_classes
• quota_usages
• quotas
• snapshots
• snapshot_id_mappings
• virtual_interfaces
• volumes
• May be others you need!
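The per-table mysqldump command can be wrapped in a loop over the table list. This is a sketch under stated assumptions: the database names are the slide’s placeholders, DRY_RUN and import_table are names introduced here, and connection options (host, user, password) are omitted. By default it only prints what it would run.

```shell
#!/bin/sh
# Sketch: copy the listed tables from the original nova DB into the API cell DB.
# DRY_RUN and import_table are hypothetical; DB names are placeholders.
# Note: mysqldump's default output drops and recreates each table on import.

SRC_DB="${SRC_DB:-nova_orig_db}"
DST_DB="${DST_DB:-nova_api_cell_db}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

TABLES="instance_types instance_type_extra_specs instance_type_projects
instances instance_info_caches block_device_mapping instance_system_metadata
instance_groups instance_group_member instance_group_metadata
instance_group_policy key_pairs quota_classes quota_usages quotas
snapshots snapshot_id_mappings virtual_interfaces volumes"

import_table() {
    table="$1"
    if [ "$DRY_RUN" = "1" ]; then
        echo "mysqldump $SRC_DB $table | mysql $DST_DB"
        return 0
    fi
    # Dump to a temp file first so a failed dump is never loaded.
    dump="$(mktemp)" || return 1
    mysqldump "$SRC_DB" "$table" > "$dump" && mysql "$DST_DB" < "$dump"
    status=$?
    rm -f "$dump"
    return "$status"
}

for table in $TABLES; do
    import_table "$table" || { echo "import failed: $table" >&2; break; }
done
```

Run it once as-is to review the commands, then with DRY_RUN=0 while nova-api is shut down.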
RESTART SERVICES
Start up all nova services
API Cell
• nova-cells
• nova-api
• nova-consoleauth *
• nova-spicehtml5proxy
• nova-serialproxy

Compute Cell
• nova-cells
• nova-cert
• nova-conductor
• nova-console
• nova-scheduler
• nova-network
• nova-compute
• (Maybe nova-api)

* http://blog.mgagne.ca/nova-cells-and-console-access/
CAVEATS
Things that just don’t work
• Neutron vif plugging notifications to nova
    vif_plugging_is_fatal = false
    vif_plugging_timeout = 5
  (But this causes a race condition)
• Any notifications between cells and other services
    ceilometer
    http://openstack-in-production.blogspot.com/2014/03/cern-cloud-architecture-update-for.html
• nova cells-list “circular reference detected” bug
    https://bugs.launchpad.net/nova/+bug/1312002
    https://review.openstack.org/#/c/106991/2/nova/cells/state.py
• Console auth
    Make sure to set cells/enable=true on all node types
    http://blog.mgagne.ca/nova-cells-and-console-access/
Some objects are not cell-aware
• Flavors and Server Groups
    Must exist in API cell and compute cell DB (with same IDs!)
    https://github.com/NeCTAR-RC/nova/commit/5abc8847dc89b162b6ae678176a5cfe4989144a9
• Block Devices
    http://blog.mgagne.ca/nova-cells-and-block-device-mapping/
• Security groups
• ???
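Since flavors must exist in both databases with the same IDs, a quick comparison of the two instance_types tables can catch drift. A minimal sketch, assuming direct mysql access to both databases; list_flavors and same_listing are hypothetical helpers, and the DB names in the usage comment are placeholders.

```shell
#!/bin/sh
# Sketch: check that flavors carry the same IDs in the API cell and compute cell DBs.
# list_flavors and same_listing are hypothetical helpers introduced here.

# Prints "id<TAB>flavorid<TAB>name" for every non-deleted flavor in the given DB.
list_flavors() {
    mysql -N -B -e \
        "SELECT id, flavorid, name FROM instance_types WHERE deleted = 0 ORDER BY id" \
        "$1"
}

# Succeeds only if the two listing files are identical; prints the diff otherwise.
same_listing() {
    diff "$1" "$2"
}

# Usage (needs DB access):
#   list_flavors nova_api_cell_db > /tmp/api.flavors
#   list_flavors nova_orig_db     > /tmp/cell.flavors
#   same_listing /tmp/api.flavors /tmp/cell.flavors && echo "flavors match"
```

The same pattern could be pointed at other tables that are not cell-aware, with the query adjusted to match.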
Host aggregates and availability zones
• nova-api server read cell state from DB:
  https://github.com/NeCTAR-RC/nova/commit/6fe7057fb4957485d3bac06579ddc38c93458064
• Add AZ support for cells:
  https://github.com/NeCTAR-RC/nova/commit/048bd2d6d438fb8fa9ad7d3e0d57e7d03c546f6f
• Support aggregate API in cells:
  https://github.com/NeCTAR-RC/nova/commit/8ca8828d191bc271460eb80567717fd15ef6167c
• Ability to filter cells capacity report:
  https://github.com/NeCTAR-RC/nova/commit/97921ef1010c5e5bca357d77682bd0ee42d6ffcc
• Print cell name in cell timeout exceptions:
  https://github.com/NeCTAR-RC/nova/commit/60f669ba1ed5221d71138a72fb2cf3b34c07a970
• Use sysmetadata to get instances AZ in API cell:
  https://github.com/NeCTAR-RC/nova/commit/95e4cccac623c601e074a618ea71d121a359e00f
• Use sysmetadata to get instance_name in API cell:
  https://github.com/NeCTAR-RC/nova/commit/6bf1cf78b86bed99733e1119b891397dee15a65e
Other issues
• nova.cells.messaging errors
    nova.cells.messaging OperationalError: (OperationalError) (1048, "Column 'instance_uuid' cannot be null") 'UPDATE instance_extra SET updated_at=%s, instance_uuid=%s WHERE instance_extra.id = %s'
  No clue on this, but doesn’t seem to break anything
• Database consistency between API and compute cells
    Communication interruption between cells can cause this
    Use case for running nova-api in compute cells
CELLS V2
A better way forward for nova
• Cells is the default mode
• No nova-cells service
• nova-api calls directly to each cell’s DB and message queue
https://wiki.openstack.org/wiki/Nova-Cells-v2
https://etherpad.openstack.org/p/kilo-nova-cells-manifesto
Give me Liberty or give me death!
• Experimental in Liberty
• Transition from no cells to cells v2 should be seamless
• Unclear how cells v1 will migrate to v2
• Unless you really need to go to cells right now … wait for Liberty