
VALIDATED HARDWARE CONFIGURATION

Partner Name Reference Architecture


Table of Contents

DOCUMENT HISTORY
GLOSSARY
TRADEMARKS
OVERVIEW
    USE CASE SUMMARY
    MIRANTIS OPENSTACK
    HARDWARE OPTIONS
    NETWORKING
    KEY BENEFITS
MIRANTIS OPENSTACK ARCHITECTURE
    MIRANTIS OPENSTACK OVERVIEW
    NODE TYPES
        Infrastructure Node
        Controller Node
        Compute Node
        Storage Node
    OPTIONAL COMPONENTS
NETWORK ARCHITECTURE
    NETWORK DESIGN
    CABLING
        Rack Physical Cabling Schema
        Server Interface Configuration
        Logical Networking Schema
RACK HARDWARE SPECIFICATION
    SERVERS
    RECOMMENDED MINIMUM SERVER CONFIGURATIONS
    SWITCHES
DISK CONFIGURATION
    COMPUTE NODE
    CONTROLLER NODE
    STORAGE NODE
    INFRASTRUCTURE NODE
SCALING
    SUGGESTED RACK CONFIGURATIONS
TESTING
REFERENCES


Document History

Version | Revision Date | Description
0.1     | DD-MM-YYYY    | Initial Version


Glossary

BMC: The baseboard management controller.
Bonding: A virtual network interface consisting of aggregated physical links.
Ceph: A distributed block store, object store, and file system.
DVR: Distributed Virtual Router, a way of configuring and running the Neutron L3 agent on every Compute node in order to move load off the Controller nodes. Each Compute node serves East-West routing and North-South Floating IP translation for local instances, while Controller nodes serve only North-South traffic for those instances which don't have Floating IPs assigned.
East-West traffic: Traffic within a cloud, i.e. VM-to-VM or tenant-to-tenant traffic.
Ephemeral storage: A non-persistent disk or volume used by a VM.
Floating IP: An IP address that can be instantly moved from one VM to another. Floating IP addresses enable dynamic IP address association with virtual machine instances.
Fuel: An open source project built to deploy OpenStack. Part of the OpenStack Big Tent.
Fuel plugin: A special package which extends Fuel's functionality. For more details see https://wiki.openstack.org/wiki/Fuel/Plugins
IPMI: Intelligent Platform Management Interface.
LACP: Link Aggregation Control Protocol, an international standard for bonding network interfaces, supported by almost all vendors and operating systems. The current specification is IEEE 802.1AX-2008.
MOS: Mirantis OpenStack.
Multi-chassis Link Aggregation: A configuration in which two or more independent network switches act as a single switch from the LACP standpoint.
NAT: Network Address Translation, a mechanism that allows hosts on one network to reach another network without proper routing being established.
North-South traffic: Traffic between cloud VMs and the rest of the network (anything outside the cloud).
OS: Operating system.
RA: Reference Architecture.
Rally: An open source tool to perform cloud verification, benchmarking, and profiling.
Shaker: A wrapper around popular network performance testing tools, used to measure the network performance of a cloud.
Tempest: A set of integration tests to be run against a live OpenStack cloud.
TLS: Transport Layer Security, a protocol aimed at securing network connections.


Trademarks

<Partner-specific trademark notes>. Ubuntu is a registered trademark of Canonical Ltd. in the U.S. and other countries.

DISCLAIMER: The OpenStack® Word Mark and OpenStack Logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation in the United States and other countries, and are used with the OpenStack Foundation's permission. Neither Mirantis nor <Partner> is affiliated with, endorsed by, or sponsored by the OpenStack Foundation or the OpenStack community.

Other trademarks and trade names may be used in this publication to refer to either the entities claiming the marks and names or their products. Mirantis, Inc., disclaims any proprietary interest in trademarks and trade names other than its own.

Overview

This document describes the <RA name>: a fully validated deployment of Mirantis OpenStack on <Partner-Product> servers and <Network-equipment-provider-name> switches.

The <RA name> is engineered to facilitate construction of a <RA goal description>.

This Reference Architecture details all hardware and software components of the <RA name>; describes their physical integration and logical interoperation; and discusses in depth certain critical aspects of the design relevant to performance, high availability, data-loss prevention, and scaling.

Use Case Summary

This Reference Architecture (RA) is designed for <RA workload type>. <Additional Description Here>

Mirantis OpenStack

Mirantis is a top contributor to OpenStack. Fuel, a core part of the Mirantis OpenStack distribution, has been accepted into the OpenStack Big Tent.

Mirantis recommends the following configuration of Mirantis OpenStack:

● Mirantis OpenStack version 8.0 (OpenStack Liberty)
● Host OS: Ubuntu 14.04
● Core OpenStack services to be deployed:
  ○ Keystone
  ○ Nova
  ○ Glance
  ○ Cinder
  ○ Neutron
● Optional OpenStack services to be deployed:
  ○ Horizon
  ○ Heat
  ○ Murano
  ○ Ceilometer
● HA Configuration: Three OpenStack Controller nodes with:
  ○ HAproxy
  ○ Corosync/Pacemaker
  ○ MySQL Galera
  ○ RabbitMQ
  ○ MongoDB
● Storage: A Ceph cluster provides a redundant backend for:
  ○ Glance
  ○ Cinder
  ○ Nova Ephemeral
  ○ Swift API
  Cluster size = 3 (3x replication) is used to protect the data.
● Secure public OpenStack API endpoints and Horizon with TLSv1.2.

Hardware Options

<Describe Partner Hardware Stack>.

<Describe Hardware Stack and Configuration Benefits>

Networking

<Describe Networking: what switches are used, their benefits, LACP/MLAG benefits>

Key Benefits

● <Identify Key Benefits of the whole solution – Bullets>

Mirantis OpenStack Architecture

Mirantis OpenStack Overview

Mirantis OpenStack 8.0 is a Liberty-based OpenStack distribution running on top of Ubuntu 14.04.


In Mirantis OpenStack, all API services run on dedicated Controller nodes for better control and operability. Compute and Storage nodes serve the virtualization and storage load, respectively. To avoid possible cross-service interference, these components are placed on separate nodes.

All OpenStack services except Ceilometer use the local MySQL Galera cluster as their database backend; Ceilometer uses MongoDB.

All services use RabbitMQ as a message queueing service. Murano uses a separate instance of RabbitMQ in order to separate security domains.

Ceph provides the Swift API via RadosGW and is also used as a backend for Cinder, Glance, and Nova ephemeral storage.

To learn more, see the description of the Mirantis OpenStack architecture at https://docs.mirantis.com/openstack/fuel/fuel-8.0/mos-planning-guide.html#fuel-reference-architecture-overview.

Node Types

Servers used in this Reference Architecture will be deployed as one of these node types:

● Infrastructure
● Controller
● Compute
● Storage

Infrastructure Node

The Infrastructure node is an Ubuntu 14.04 based node which carries two virtual appliances:

● Fuel Master node - an OpenStack deployment tool.
● Cloud Validation node - a set of OpenStack post-deployment validation tools, including Tempest and Rally.

Controller Node

A Controller node is a control plane component of the cloud which incorporates all core OpenStack infrastructure services: MySQL, RabbitMQ, HAproxy, the OpenStack APIs (Keystone, Nova, Neutron, Glance, Cinder, Heat, Murano, Ceilometer, and RadosGW for the Swift API), Horizon, and MongoDB. This node is not used to run virtual instances or store cloud data.

Compute Node

The Compute node is a KVM hypervisor component of the cloud that runs virtual instances. It runs Nova-compute along with the Neutron vSwitch and vRouter components.

Storage Node

Ceph is used as the storage back end. A Storage node is a component of an OpenStack environment that stores and replicates all user data stored in the cloud.

High Availability

The High Availability model implemented in Mirantis OpenStack (MOS) is described in detail in the official documentation: https://docs.mirantis.com/openstack/fuel/fuel-8.0/mos-planning-guide.html#fuel-ref-arch-ha-arch

Here is a short description of how Mirantis OpenStack HA works:

To protect core OpenStack services from node failure, all control plane components are distributed across multiple Controller nodes. A minimum of 3 nodes is required to provide high availability for the cloud control plane. It is possible to deploy a larger odd number of Controller nodes, or to add nodes in pairs after the initial deployment (keeping the total number odd), to distribute load and increase the redundancy level. Pacemaker/Corosync manages critical infrastructure components when a failure occurs. Because Corosync uses a majority-quorum voting algorithm, a cluster of N Controller nodes survives the failure of up to (N-1)/2 nodes (rounded down): 1 node for a 3-node cluster, 2 nodes for a 5-node cluster, and so on.
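As a minimal illustrative sketch of the quorum arithmetic above (not part of the Mirantis tooling itself), the number of tolerable Controller failures can be computed as follows; the controller counts are just examples.

```python
# Illustrative sketch: how many Controller failures a Corosync/Pacemaker
# cluster of N nodes can tolerate while keeping quorum (majority voting).
def tolerable_failures(n_controllers: int) -> int:
    # Quorum requires a strict majority of nodes to remain alive,
    # so at most floor((N - 1) / 2) nodes may fail.
    return (n_controllers - 1) // 2

for n in (3, 5, 7):
    print(f"{n} controllers -> survives {tolerable_failures(n)} failure(s)")
# 3 controllers -> survives 1 failure(s)
# 5 controllers -> survives 2 failure(s)
# 7 controllers -> survives 3 failure(s)
```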

Almost all services work in active-active mode, with load balanced by HAproxy listening on a virtual IP address. Most OpenStack components support active/active mode natively, as do RabbitMQ, MongoDB, and Galera Cluster, which protects MySQL. The only exception is Cinder, which runs in active/passive mode; this is expected to be addressed in future OpenStack releases.


From the networking perspective, high availability is maintained by redundant, bonded connections to two ToR switches. That means that if a cable is cut, or even if an entire switch goes down, the cloud survives with half of its normal network bandwidth still available to the affected nodes.

Another low-level hardware feature that provides redundancy is RAID1 for the disks on Infrastructure and Storage nodes, which keeps the operating system intact in case of a disk failure. It is also highly recommended to use two independent hot-plug power supplies for each hardware unit, ideally connected to separate power circuits, to protect the cloud from a power failure.

Ceph provides redundancy for stored data. The recommended replication factor is 3, which means that at any given time 3 replicas of each object are distributed among the Ceph nodes. It is recommended that each Storage node hold no more than 10% of the cluster's raw capacity, to avoid placing excessive load on the network and the Ceph cluster when a Storage node fails.
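Below is a minimal sketch of the resulting capacity arithmetic, assuming the 3x replication factor and the 10%-per-node guideline described above; the node count and disk sizes are hypothetical examples, not figures from this Reference Architecture.

```python
# Illustrative sketch: usable Ceph capacity with 3x replication, plus a check
# against the guideline that no single Storage node should hold more than
# ~10% of the raw cluster capacity. All inputs are hypothetical examples.
replication_factor = 3
nodes = 12            # hypothetical number of Storage nodes
osds_per_node = 10    # hypothetical OSD disks per node
osd_size_tb = 1.2     # hypothetical SAS disk size, TB

raw_tb = nodes * osds_per_node * osd_size_tb
usable_tb = raw_tb / replication_factor
per_node_share = 1 / nodes   # fraction of raw capacity held by one node

print(f"Raw: {raw_tb:.1f} TB, usable with {replication_factor}x replication: {usable_tb:.1f} TB")
print(f"Each node holds ~{per_node_share:.1%} of raw capacity "
      f"({'OK' if per_node_share <= 0.10 else 'above the 10% guideline'})")
```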


Optional Components

There are two possible ways to use MongoDB with Ceilometer: install MongoDB locally or use an external MongoDB database. Both options are available in Mirantis OpenStack. If MongoDB is installed locally, we recommend using at least 3 separate servers to form a MongoDB cluster. Placing MongoDB on the Controller nodes may cause resource starvation for key services such as MySQL or RabbitMQ, which may lead to severe issues with the cloud as a whole. The best approach is to use dedicated nodes for MongoDB, either external nodes or nodes deployed as part of MOS.

By default, Mirantis OpenStack protects the public OpenStack APIs and Horizon with TLSv1.2, using self-signed or user-provided certificates; it is possible to disable this feature if needed.

Mirantis OpenStack may be extended via Fuel plugins to provide two validated options for monitoring your cloud: Zabbix or the LMA toolchain (Elasticsearch/Kibana, InfluxDB/Grafana, and Nagios).

Network Architecture

The underlying network comprises the OpenStack control plane, data plane, and BMC management. A pair of 10G ToR switches in each rack forms an MC-LAG group and carries OpenStack control plane and data plane traffic as well as the rack uplinks. MC-LAG allows servers' bonded NICs and rack uplinks to be connected to both switches (unlike traditional single-switch LACP), so every connection is protected against the failure of a physical link or an entire switch.

A single 1G switch serves the 1Gbps BMC management network connections, which carry IPMI, switch management, and OpenStack provisioning traffic.

Ceph I/O and monitoring traffic is transported on the OpenStack Management network and is protected against single-path network failure with LACP.

It is recommended to transport Ceph replication traffic on a separate physical network and, as with the OpenStack Management network, protect it with LACP.

It is also recommended to transport tenant traffic over a separate physical network and protect it with LACP.

Network Design

The networking model of Mirantis OpenStack requires the following networks to be set up (VLAN numbers are given for reference only):

Network                                                         | VLAN IDs
BMC Management networks:
  Mirantis OpenStack Admin/PXE (OpenStack Provisioning) network | 120
  Out of Band IPMI network                                      | 100
OpenStack control plane networks:
  OpenStack Management and Ceph Public network                  | 140
  OpenStack Public network                                      | 160
  OpenStack Ceph Replication network                            | 180
OpenStack data plane network:
  OpenStack Private network                                     | 200-1000

To learn more about Mirantis OpenStack’s networking model see https://docs.mirantis.com/openstack/fuel/fuel-8.0/mos-planning-guide.html#plan-the-network

We recommend deploying Mirantis OpenStack with Neutron DVR enabled. In this mode, Neutron runs a standalone virtual router on every Compute node. This vRouter can route both East-West and North-South traffic for Floating IPs of local instances, dramatically decreasing load on Controller nodes (which serve only North-South NAT traffic when DVR is enabled). To learn more about DVR see https://wiki.openstack.org/wiki/Neutron/DVR

The following subnets are used by Mirantis OpenStack by default.

Mirantis OpenStack PXE Network: 10.20.0.0/24. This network may be fully isolated inside the cloud and does not need to be routed to the customer's network.

OpenStack Management and Ceph Public Network: 192.168.0.0/24.

OpenStack Public Network: Depends on the cloud's intended use; a /26 network is the recommended minimum. To calculate the required size of the Public network, see https://docs.mirantis.com/openstack/fuel/fuel-8.0/mos-planning-guide.html#calculate-float-public-ip

OpenStack Private Network: A set of VLANs carrying tenant traffic. Since these networks are isolated from the rest of the data center, any IP space may be used here. For validation purposes, a single tenant network with the address 192.168.122.0/24 is created during deployment; it can be removed after validation is complete.

OpenStack Ceph Replication Network: 192.168.1.0/24. This network may be fully isolated inside the cloud and does not need to be routed to the customer's network.

Out of Band IPMI and Switch Management Network: The network should be large enough to accommodate all IPMI and switch management IP addresses along with a gateway for them.
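As a rough, hedged illustration of the kind of sizing the linked planning guide walks through, the sketch below estimates a Public network prefix from hypothetical address requirements. The controller count, VIP count, floating IP pool size, and reserved-address count are assumptions for illustration only; consult the guide for the authoritative calculation.

```python
# Rough sketch of Public network sizing with hypothetical inputs.
import math

controller_nodes = 3   # nodes that need a public IP (assumed)
virtual_ips = 2        # public VIPs for the control plane (assumed)
floating_ips = 50      # desired floating IP pool size (example)
reserved = 3           # network, broadcast, and gateway addresses (assumed)

required = controller_nodes + virtual_ips + floating_ips + reserved
prefix = 32 - math.ceil(math.log2(required))
print(f"Need at least {required} addresses -> at least a /{prefix} public network")
# Need at least 58 addresses -> at least a /26 public network
```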


Cabling

Rack Physical Cabling Schema

<Source of the image https://drive.google.com/file/d/0Bw6txZ1qvn9CS2xFQXhaOUFIU1k/view?usp=sharing >


Server Interface Configuration

The following tables show how switch ports should be configured for the corresponding server interfaces.

Controller Node

Interface | 1st 1G         | 1st & 2nd 10G (bond0)                           | IPMI
VLANs     | 120 (untagged) | Active/Active LACP; 140, 160, 200-1000 (tagged) | 100 (untagged)

Compute Node

Interface | 1st 1G         | 1st & 2nd 10G (bond0)                           | IPMI
VLANs     | 120 (untagged) | Active/Active LACP; 140, 160, 200-1000 (tagged) | 100 (untagged)

Note: Public network (VLAN 160) should be provided to Compute nodes only if Neutron DVR is used.

Storage Node

Interface | 1st 1G         | 1st & 3rd 10G (bond0)                      | 2nd & 4th 10G (bond1)            | IPMI
VLANs     | 120 (untagged) | Active/Active LACP; 140, 200-1000 (tagged) | Active/Active LACP; 180 (tagged) | 100 (untagged)

Note: Storage node has 2 network cards with 2 ports each. Each bonded interface consists of two ports from different physical network cards to provide protection against network card failure.


Logical Networking Schema

<Source of the image https://drive.google.com/file/d/0Bw6txZ1qvn9COWpsaW1IQ19wckU/view?usp=sharing >

Rack Hardware Specification


Servers

<Servers used and brief descriptions>

Recommended Minimum Server Configurations

                         | Controller Node         | Compute Node              | Storage Node                  | Infrastructure Node
Server Model             | -                       | -                         | -                             | -
Quantity                 | 3                       | -                         | -                             | 1
Sockets                  | -                       | -                         | -                             | 1
Cores per socket         | -                       | -                         | -                             | 6
RAM Configuration        | -                       | -                         | -                             | 32 GB
Storage Configuration    | 2x 400GB RAID1 SSDs     | 2x 400GB RAID1 SATA disks | -                             | 2x 400GB RAID1 SATA disks
Networking Configuration | 1G NIC and 2x 10G NICs  | 1G NIC and 2x 10G NICs    | 1G NIC and 2x 10G 2-port NICs | 2x 1G NICs

Switches

Switch Model | Purpose                         | Quantity
-            | Provisioning and OOB traffic    | -
-            | Intracloud and external traffic | -

Disk Configuration

The following configuration is the recommended minimum in terms of storage requirements.

Node           | RAID type | Disk numbers                                   | Disk type        | Purpose
Controller     | RAID 1    | 1,2                                            | SSD              | Operating system
Controller     | RAID 1    | 3,4                                            | SSD              | MongoDB
Compute        | RAID 1    | 1,2                                            | SATA             | Operating system
Storage        | RAID 1    | 1,2                                            | SATA             | Operating system
Storage        | None      | <Put actual quantity of disks per server here> | <10k SAS or SSD> | Ceph OSDs
Storage        | None      | <1 journal disk per 5 or 15 OSDs (see below)>  | SSD              | Ceph journal
Infrastructure | RAID 1    | 1,2                                            | SSD              | Operating system

Compute Node

Each Compute node requires at least one disk with average speed characteristics, used to deploy the OS. We recommend using at least 400GB disks in RAID1 for this purpose.

Controller Node

Each Controller node requires at least one disk with good speed characteristics for the OS and OpenStack services. We recommend using 400GB SSDs in RAID1 for these purposes.

You may use a separate set of disks to host MongoDB if the database runs on the Controller nodes. We recommend using 2 SSDs in RAID1 of at least 200GB each for a cloud with 30 Compute nodes. The actual MongoDB load depends on the Ceilometer configuration; please contact your Mirantis representative for detailed information.

Storage Node

Storage nodes require at least one disk with average speed characteristics to host the OS. We recommend using at least 200GB disks in RAID1 for this purpose.

To speed up Ceph, we recommend putting Ceph journals on SSDs. The number and size of the required SSDs can be roughly estimated as follows: allocate 40GB of SSD space for every 1-1.2TB SAS disk, and use one SSD journal for at most 5 SAS disks. For NVMe or PCIe journal devices, the recommendation is one journal device for every 15 OSD drives. For example, this formula yields 2x 200GB SSDs for 10x 1.2TB SAS disks.
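A minimal sketch of this rule of thumb, assuming the 40GB-per-OSD and 5-OSDs-per-SSD figures given above; the helper function and disk counts are illustrative, not part of any Mirantis tooling.

```python
# Illustrative sketch of the SSD journal sizing rule of thumb described above:
# ~40 GB of SSD space per 1-1.2 TB SAS OSD, and at most 5 SAS OSDs per SSD
# (for NVMe/PCIe journal devices, up to 15 OSDs per device instead).
import math

def journal_ssds(osd_count: int, gb_per_osd: int = 40, osds_per_ssd: int = 5):
    """Return (number of journal SSDs, size in GB of each SSD)."""
    ssds = math.ceil(osd_count / osds_per_ssd)
    size_gb = gb_per_osd * math.ceil(osd_count / ssds)
    return ssds, size_gb

ssds, size_gb = journal_ssds(10)              # 10x 1.2TB SAS OSDs, as in the text
print(f"{ssds}x {size_gb}GB SSD journals")    # -> 2x 200GB SSD journals
```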

<Describe recommended storage node disk calculation based on actual HW and recommendations from above >

Infrastructure Node

Infrastructure nodes require at least one disk with average speed characteristics for the OS. We recommend using at least one 400GB disk for this purpose. RAID1 is optional but preferable; otherwise, the Fuel Master node must be backed up.

<Describe number and configuration of disks on Infrastructure nodes>


Scaling

<Describe scaling test results and recommendations, and provide a contact for users who wish to inquire about scaling options>.

Suggested Rack Configurations

<Describe possible rack configurations, e.g. 10 Compute + 4 Storage nodes, 20 Compute + 8 Storage nodes, etc.: how many average VMs fit into each configuration and what the power footprints are.>

Testing

Post-deployment cloud validation is performed in a semi-automated way using tools such as Tempest, Rally, and Shaker.

The testing approach is described in a separate Test Plan document.

References

● Mirantis OpenStack 8.0 Documentation https://docs.mirantis.com/openstack/fuel/fuel-8.0/