red hat openstack platform 16 · connecting red hat openstack platform to service telemetry...

61
Red Hat OpenStack Platform 16.0 Service Telemetry Framework Installing and deploying Service Telemetry Framework Last Updated: 2020-10-21

Upload: others

Post on 20-Aug-2020

25 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Red Hat OpenStack Platform 16 · CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK 3.2. DEPLOYING WITH ROUTED L3 NETWORKS 3.3. CONFIGURING RED HAT OPENSTACK PLATFORM

Red Hat OpenStack Platform 160

Service Telemetry Framework

Installing and deploying Service Telemetry Framework

Last Updated 2020-10-21

Red Hat OpenStack Platform 160 Service Telemetry Framework

Installing and deploying Service Telemetry Framework

OpenStack Teamrhos-docsredhatcom

Legal Notice

Copyright copy 2020 Red Hat Inc

The text of and illustrations in this document are licensed by Red Hat under a Creative CommonsAttributionndashShare Alike 30 Unported license (CC-BY-SA) An explanation of CC-BY-SA isavailable athttpcreativecommonsorglicensesby-sa30 In accordance with CC-BY-SA if you distribute this document or an adaptation of it you mustprovide the URL for the original version

Red Hat as the licensor of this document waives the right to enforce and agrees not to assertSection 4d of CC-BY-SA to the fullest extent permitted by applicable law

Red Hat Red Hat Enterprise Linux the Shadowman logo the Red Hat logo JBoss OpenShiftFedora the Infinity logo and RHCE are trademarks of Red Hat Inc registered in the United Statesand other countries

Linux reg is the registered trademark of Linus Torvalds in the United States and other countries

Java reg is a registered trademark of Oracle andor its affiliates

XFS reg is a trademark of Silicon Graphics International Corp or its subsidiaries in the United Statesandor other countries

MySQL reg is a registered trademark of MySQL AB in the United States the European Union andother countries

Nodejs reg is an official trademark of Joyent Red Hat is not formally related to or endorsed by theofficial Joyent Nodejs open source or commercial project

The OpenStack reg Word Mark and OpenStack logo are either registered trademarksservice marksor trademarksservice marks of the OpenStack Foundation in the United States and othercountries and are used with the OpenStack Foundations permission We are not affiliated withendorsed or sponsored by the OpenStack Foundation or the OpenStack community

All other trademarks are the property of their respective owners

Abstract

This guide contains information about installing the core components and deploying ServiceTelemetry Framework

Table of Contents

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE12 INSTALLATION SIZE

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK21 THE CORE COMPONENTS OF STF22 PREPARING YOUR OCP ENVIRONMENT FOR STF

221 Persistent volumes2211 Using ephemeral storage

222 Resource allocation223 Node tuning operator

23 DEPLOYING STF TO THE OCP ENVIRONMENT231 Deploying STF to the OCP environment with ElasticSearch232 Deploying STF to the OCP environment without ElasticSearch233 Creating a namespace234 Creating an OperatorGroup235 Enabling the OperatorHubio Community Catalog Source236 Enabling Red Hat STF Operator Source237 Subscribing to the AMQ Certificate Manager Operator238 Subscribing to the Elastic Cloud on Kubernetes Operator239 Subscribing to the Service Telemetry Operator2310 Creating a ServiceTelemetry object in OCP

24 REMOVING STF FROM THE OCP ENVIRONMENT241 Deleting the namespace242 Removing the OperatorSource

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FORSERVICE TELEMETRY FRAMEWORK

331 Retrieving the AMQ Interconnect route address332 Configuring the STF connection for the overcloud333 Validating client-side installation

CHAPTER 4 ADVANCED FEATURES41 CUSTOMIZING THE DEPLOYMENT

411 Manifest override parameters412 Overriding a managed manifest

Procedure42 ALERTS

Additional resources421 Creating an alert rule in Prometheus

ProcedureAdditional resources

422 Configuring custom alertsProcedureAdditional resources

423 Creating an alert route in AlertmanagerProcedureAdditional resources

43 HIGH AVAILABILITY

446

8889999

1010111111

121213141415171717

191919

20202122

2626262828293030303131313131323434

Table of Contents

1

431 Configuring high availabilityProcedure

44 DASHBOARDS441 Setting up Grafana to host the dashboard

4411 Viewing and editing queries442 The Grafana infrastructure dashboard

4421 Top panels4422 Networking panels4423 CPU panels4424 Memory panels4425 Diskfile system

45 CONFIGURING MULTIPLE CLOUDS451 Planning AMQP address prefixes452 Deploying Smart Gateways

Procedure4521 Example manifests

453 Creating the OpenStack environment fileProcedureAdditional resources

454 Querying metrics data from multiple clouds46 EPHEMERAL STORAGE

461 Configuring ephemeral storage

APPENDIX A COLLECTD PLUG-INS

343435353636363738383839404141

42434445454646

47

Red Hat OpenStack Platform 160 Service Telemetry Framework

2

Table of Contents

3

CHAPTER 1 INTRODUCTION TOSERVICE TELEMETRY FRAMEWORK

Service Telemetry Framework (STF) provides automated collection of measurements and data fromremote clients - Red Hat OpenStack Platform or third-party nodes - and transmission of thatinformation to a centralized receiving Red Hat OpenShift Container Platform (OCP) deployment forstorage retrieval and monitoring The data can be either of two types

Metric

a numeric measurement of an application or system

Event

irregular and discrete occurrences that happen in a system

The collection components that are required on the clients are lightweight The multicast message busthat is shared by all clients and the deployment provides fast and reliable data transport Other modularcomponents for receiving and storing data are deployed in containers on OCP

STF provides access to monitoring functions such as alert generation visualization through dashboardsand single source of truth telemetry analysis to support orchestration

11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE

Service Telemetry Framework (STF) uses the components described in Table 11 ldquoSTF componentsrdquo

Table 11 STF components

Client Component Server(OCP)

yes An AMQP 1x compatible messaging bus to shuttle the metrics to STF forstorage in Prometheus

yes

no Smart Gateway to pick metrics and events from the AMQP 1x bus and todeliver events to ElasticSearch or to provide metrics to Prometheus

yes

no Prometheus as time-series data storage yes

no ElasticSearch as events data storage yes

yes collectd to collect infrastructure metrics and events no

yes Ceilometer to collect Red Hat OpenStack Platform metrics and events no

Figure 11 Service Telemetry Framework architecture overview

Red Hat OpenStack Platform 160 Service Telemetry Framework

4

Figure 11 Service Telemetry Framework architecture overview

NOTE

The Service Telemetry Framework data collection components collectd and Ceilometerand the transport components AMQ Interconnect and Smart Gateway are fullysupported The data storage components Prometheus and ElasticSearch including theOperator artifacts and visualization component Grafana are community-supported andare not officially supported

For metrics on the client side collectd collects high-resolution metrics collectd delivers the data toPrometheus by using the AMQP1 plugin which places the data onto the message bus On the serverside a Golang application called the Smart Gateway takes the data stream from the bus and exposes itas a local scrape endpoint for Prometheus

If you plan to collect and store events collectd or Ceilometer delivers event data to the server side byusing the AMQP1 plugin which places the data onto the message bus Another Smart Gateway writesthe data to the ElasticSearch datastore

Server-side STF monitoring infrastructure consists of the following layers

Service Telemetry Framework 1x (STF)

Red Hat OpenShift Container Platform (OCP)

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

5

Infrastructure platform

Figure 12 Server-side STF monitoring infrastructure

For more information about how to deploy Red Hat OpenShift Container Platform see the OCPproduct documentation You can install OCP on cloud platforms or on bare metal For more informationabout STF performance and scaling see httpsaccessredhatcomarticles4907241

NOTE

Do not install OCP on the same infrastructure that you want to monitor

12 INSTALLATION SIZE

The size of your Red Hat OpenShift Container Platform installation depends on the following factors

The number of nodes you want to monitor

The number of metrics you want to collect

The resolution of metrics

The length of time that you want to store the data

Installation of Service Telemetry Framework (STF) depends on the existingRed Hat OpenShift Container Platform environment Ensure that you install monitoring for

Red Hat OpenStack Platform 160 Service Telemetry Framework

6

Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platformenvironment You can install Red Hat OpenShift Container Platform (OCP) on baremetal or othersupported cloud platforms For more information about installing OCP see OpenShift ContainerPlatform 43 Documentation

The size of your OCP environment depends on the infrastructure you select For more informationabout minimum resources requirements when installing OCP on baremetal see Minimum resourcerequirements in the Installing a cluster on bare metal guide For installation requirements of the variouspublic and private cloud platforms which you can install see the corresponding installationdocumentation for your cloud platform of choice

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

7

CHAPTER 2 INSTALLING THE CORE COMPONENTS OFSERVICE TELEMETRY FRAMEWORK

Before you install Service Telemetry Framework (STF) ensure thatRed Hat OpenShift Container Platform (OCP) version 4x is running and that you understand the corecomponents of the framework As part of the OCP installation planning process ensure that theadministrator provides persistent storage and enough resources to run the STF components on top ofthe OCP environment

WARNING

Red Hat OpenShift Container Platform version 43 or later is currently required for asuccessful installation of STF

21 THE CORE COMPONENTS OF STF

The following STF core components are managed by Operators

Prometheus and AlertManager

ElasticSearch

Smart Gateway

AMQ Interconnect

Each component has a corresponding Operator that you can use to load the various applicationcomponents and objects

Additional resources

For more information about Operators see the Understanding Operators guide

22 PREPARING YOUR OCP ENVIRONMENT FOR STF

As you prepare your OCP environment for STF you must plan for persistent storage adequateresources and event storage

Ensure that persistent storage is available in your Red Hat OpenShift Container Platform clusterto permit a production grade deployment For more information see Section 221 ldquoPersistentvolumesrdquo

Ensure that enough resources are available to run the Operators and the application containersFor more information see Section 222 ldquoResource allocationrdquo

To install ElasticSearch you must use a community catalog source If you do not want to use acommunity catalog or if you do not want to store events see Section 23 ldquoDeploying STF to theOCP environmentrdquo

STF uses ElasticSearch to store events which requires a larger than normal

Red Hat OpenStack Platform 160 Service Telemetry Framework

8

vmmax_map_count The vmmax_map_count value is set by default inRed Hat OpenShift Container Platform For more information about how to edit the value of vmmax_map_count see Section 223 ldquoNode tuning operatorrdquo

221 Persistent volumes

STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus andElasticSearch can store metrics and events

Additional resources

For more information about configuring persistent storage for OCP see Understanding persistentstorage

2211 Using ephemeral storage

WARNING

You can use ephemeral storage with STF However if you use ephemeral storageyou might experience data loss if a pod is restarted updated or rescheduled ontoanother node Use ephemeral storage only for development or testing and notproduction environments

Procedure

To enable ephemeral storage for STF set storageEphemeralEnabled true in your ServiceTelemetry manifest

Additional resources

For more information about enabling ephemeral storage for STF see Section 461 ldquoConfiguringephemeral storagerdquo

222 Resource allocation

To enable the scheduling of pods within the OCP infrastructure you need resources for thecomponents that are running If you do not allocate enough resources pods remain in a Pending statebecause they cannot be scheduled

The amount of resources that you require to run STF depends on your environment and the number ofnodes and clouds that you want to monitor

Additional resources

For recommendations about sizing for metrics collection seehttpsaccessredhatcomarticles4907241

For information about sizing requirements for ElasticSearch seehttpswwwelasticcoguideencloud-on-k8scurrentk8s-managing-compute-resourceshtml

223 Node tuning operator

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

9

STF uses ElasticSearch to store events which requires a larger than normal vmmax_map_count The vmmax_map_count value is set by default in Red Hat OpenShift Container Platform

If you want to edit the value of vmmax_map_count you cannot apply node tuning manually using the sysctl command because Red Hat OpenShift Container Platform manages nodes directly To configurevalues and apply them to the infrastructure you must use the node tuning operator For moreinformation see Using the Node Tuning Operator

In an OCP deployment the default node tuning operator specification provides the required profiles forElasticSearch workloads or pods scheduled on nodes To view the default cluster node tuningspecification run the following command

The output of the default specification is documented at Default profiles set on a cluster Theassignment of profiles is managed in the recommend section where profiles are applied to a node whencertain conditions are met When scheduling ElasticSearch to a node in STF one of the following profilesis applied

openshift-control-plane-es

openshift-node-es

When scheduling an ElasticSearch pod there must be a label present that matches tunedopenshiftioelasticsearch If the label is present one of the two profiles is assigned to the podNo action is required by the administrator if you use the recommended Operator for ElasticSearch Ifyou use a custom-deployed ElasticSearch with STF ensure that you add the tunedopenshiftioelasticsearch label to all scheduled pods

Additional resources

For more information about virtual memory usage by ElasticSearch seehttpswwwelasticcoguideenelasticsearchreferencecurrentvm-max-map-counthtml

For more information about how the profiles are applied to nodes see Custom tuning specification

23 DEPLOYING STF TO THE OCP ENVIRONMENT

You can deploy STF to the OCP environment in one of two ways

Deploy STF and store events with ElasticSearch For more information see Section 231ldquoDeploying STF to the OCP environment with ElasticSearchrdquo

Deploy STF without ElasticSearch and disable events support For more information seeSection 232 ldquoDeploying STF to the OCP environment without ElasticSearchrdquo

231 Deploying STF to the OCP environment with ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 235 ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

oc get Tuneddefault -o yaml -n openshift-cluster-node-tuning-operator

Red Hat OpenStack Platform 160 Service Telemetry Framework

10

4 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

5 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

6 Section 238 ldquoSubscribing to the Elastic Cloud on Kubernetes Operatorrdquo

7 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

8 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

232 Deploying STF to the OCP environment without ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

4 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

5 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

6 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

233 Creating a namespace

Create a namespace to hold the STF components The service-telemetry namespace is usedthroughout the documentation

Procedure

Enter the following command

234 Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods

Procedure

Enter the following command

oc new-project service-telemetry

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorGroupmetadata name service-telemetry-operator-group namespace service-telemetryspec targetNamespaces - service-telemetryEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

11

Additional resources

For more information see OperatorGroups

235 Enabling the OperatorHubio Community Catalog Source

Before you install ElasticSearch you must have access to the resources on the OperatorHubioCommunity Catalog Source

Procedure

Enter the following command

236 Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform you must enable the operatorsource

Procedure

1 Install an OperatorSource that contains the Service Telemetry Operator and the SmartGateway Operator

2 To validate the creation of your OperatorSource use the oc get operatorsources command A

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind CatalogSourcemetadata name operatorhubio-operators namespace openshift-marketplacespec sourceType grpc image quayiooperator-frameworkupstream-community-operatorslatest displayName OperatorHubio Operators publisher OperatorHubioEOF

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorSourcemetadata labels opsrc-provider redhat-operators-stf name redhat-operators-stf namespace openshift-marketplacespec authorizationToken displayName Red Hat STF Operators endpoint httpsquayiocnr publisher Red Hat registryNamespace redhat-operators-stf type appregistryEOF

Red Hat OpenStack Platform 160 Service Telemetry Framework

12

2 To validate the creation of your OperatorSource use the oc get operatorsources command Asuccessful import results in the MESSAGE field returning a result of The object has been successfully reconciled

3 To validate that the Operators are available from the catalog use the oc get packagemanifestcommand

237 Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STFcomponents because the AMQ Certificate Manager Operator runs globally-scoped and is notcompatible with the dependency management of Operator Lifecycle Manager when used with othernamespace-scoped operators

Procedure

1 Subscribe to the AMQ Certificate Manager Operator create the subscription and validate theAMQ7 Certificate Manager

NOTE

The AMQ Certificate Manager is installed globally for all namespaces so the namespace value provided is openshift-operators You might not see your amq7-cert-managerv100 ClusterServiceVersion in the service-telemetrynamespace for a few minutes until the processing executes against thenamespace

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME TYPE ENDPOINT REGISTRY DISPLAYNAME PUBLISHER STATUS MESSAGEredhat-operators-stf appregistry httpsquayiocnr redhat-operators-stf Red Hat STF Operators Red Hat Succeeded The object has been successfully reconciled

$ oc get packagemanifests | grep Red Hat STF

smartgateway-operator Red Hat STF Operators 2m50sservicetelemetry-operator Red Hat STF Operators 2m50s

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name amq7-cert-manager namespace openshift-operatorsspec channel alpha installPlanApproval Automatic name amq7-cert-manager source redhat-operators sourceNamespace openshift-marketplaceEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

13

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-managerv100 has a phase Succeeded

238 Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator and if you plan to store events in ElasticSearch youmust enable the Elastic Cloud Kubernetes Operator

Procedure

1 Apply the following manifest to your OCP environment to enable the Elastic Cloud onKubernetes Operator

2 To verify that the ClusterServiceVersion for ElasticSearch Cloud on Kubernetes succeededenter the oc get csv command

239 Subscribing to the Service Telemetry Operator

To instantiate an STF instance create the ServiceTelemetry object to allow the Service TelemetryOperator to create the environment

Procedure

1 To create the Service Telemetry Operator subscription enter the oc apply -f command

$ oc get --namespace openshift-operators csv

NAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name elastic-cloud-eck namespace service-telemetryspec channel stable installPlanApproval Automatic name elastic-cloud-eck source operatorhubio-operators sourceNamespace openshift-marketplaceEOF

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1

Red Hat OpenStack Platform 160 Service Telemetry Framework

14

2 To validate the Service Telemetry Operator and the dependent operators enter the followingcommand

2310 Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework you must create an instance of ServiceTelemetry in OCPBy default eventsEnabled is set to false If you do not want to store events in ElasticSearch ensurethat eventsEnabled is set to false For more information see Section 232 ldquoDeploying STF to the OCPenvironment without ElasticSearchrdquo

The following core parameters are available for a ServiceTelemetry manifest

Table 21 Core parameters for a ServiceTelemetry manifest

Parameter Description Default Value

eventsEnabled Enable events support in STFRequires prerequisite steps toensure ElasticSearch can bestarted For more information seeSection 238 ldquoSubscribing to theElastic Cloud on KubernetesOperatorrdquo

false

metricsEnabled Enable metrics support in STF true

kind Subscriptionmetadata name servicetelemetry-operator namespace service-telemetryspec channel stable installPlanApproval Automatic name servicetelemetry-operator source redhat-operators-stf sourceNamespace openshift-marketplaceEOF

$ oc get csv --namespace service-telemetryNAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeededamq7-interconnect-operatorv120 Red Hat Integration - AMQ Interconnect 120 Succeededelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeededprometheusoperator0370 Prometheus Operator 0370 prometheusoperator0320 Succeededservice-telemetry-operatorv102 Service Telemetry Operator 102 service-telemetry-operatorv101 Succeededsmart-gateway-operatorv101 Smart Gateway Operator 101 smart-gateway-operatorv100 Succeeded

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

15

highAvailabilityEnabled Enable high availability in STF Formore information see Section 43ldquoHigh availabilityrdquo

false

storageEphemeralEnabled Enable ephemeral storagesupport in STF For moreinformation see Section 46ldquoEphemeral storagerdquo

false

Parameter Description Default Value

Procedure

1 To store events in ElasticSearch set eventsEnabled to true during deployment

2 To view the STF deployment logs in the Service Telemetry Operator use the oc logscommand

PLAY RECAP localhost ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3 View the pods and the status of each pod to determine that all workloads are operatingnominally

NOTE

If you set eventsEnabled true the notification Smart Gateways will Error andCrashLoopBackOff for a period of time before ElasticSearch starts

oc apply -f - ltltEOFapiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec eventsEnabled true metricsEnabled trueEOF

oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

$ oc get pods

NAME READY STATUS RESTARTS AGEalertmanager-stf-default-0 22 Running 0 26melastic-operator-645dc8b8ff-jwnzt 11 Running 0 88melasticsearch-es-default-0 11 Running 0 26minterconnect-operator-6fd49d9fb9-4bl92 11 Running 0

Red Hat OpenStack Platform 160 Service Telemetry Framework

16

24 REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality

Complete the following tasks

1 Section 241 ldquoDeleting the namespacerdquo

2 Section 242 ldquoRemoving the OperatorSourcerdquo

241 Deleting the namespace

To remove the operational resources for STF from OCP delete the namespace

Procedure

1 Run the oc delete command

2 Verify that the resources have been deleted from the namespace

242 Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again delete the OperatorSourceWhen you remove the OperatorSource PackageManifests related to STF are removed from theOperator Lifecycle Manager catalog

Procedure

1 Delete the OperatorSource

46mprometheus-operator-bf7d97fb9-kwnlx 11 Running 0 46mprometheus-stf-default-0 33 Running 0 26mservice-telemetry-operator-54f4c99d9b-k7ll6 22 Running 0 46msmart-gateway-operator-7ff58bcf94-66rvx 22 Running 0 46mstf-default-ceilometer-notification-smartgateway-6675df547q4lbj 11 Running 0 26mstf-default-collectd-notification-smartgateway-698c87fbb7-xj528 11 Running 0 26mstf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn 11 Running 0 26mstf-default-interconnect-7458fd4d69-nqbfs 11 Running 0 26m

oc delete project service-telemetry

$ oc get allNo resources found

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

17

2 Verify that the STF PackageManifests are removed from the platform If successful thefollowing command returns no result

3 If you enabled the OperatorHubio Community Catalog Source during the installationprocess and you no longer need this catalog source delete it

Additional resources

For more information about the OperatorHubio Community Catalog Source see Section 23ldquoDeploying STF to the OCP environmentrdquo

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stfoperatorsourceoperatorscoreoscom redhat-operators-stf deleted

$ oc get packagemanifests | grep Red Hat STF

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operatorscatalogsourceoperatorscoreoscom operatorhubio-operators deleted

Red Hat OpenStack Platform 160 Service Telemetry Framework

18

CHAPTER 3 COMPLETING THESERVICE TELEMETRY FRAMEWORK CONFIGURATION

31 CONNECTING RED HAT OPENSTACK PLATFORM TOSERVICE TELEMETRY FRAMEWORK

To collect metrics events or both and to send them to the Service Telemetry Framework (STF)storage domain you must configure the Red Hat OpenStack Platform overcloud to enable datacollection and transport

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes thatemploy routed L3 domains such as distributed compute node (DCN) or spine-leaf see Section 32ldquoDeploying to non-standard network topologiesrdquo

32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network you must makeconfiguration adjustments so that AMQ Interconnect can transport data to theService Telemetry Framework (STF) server instance This scenario is typical in a spine-leaf or aDCN topology For more information about DCN configuration see the Spine Leaf Networkingguide

If you use STF with Red Hat OpenStack Platform 160 and plan to monitor your Ceph Block orObject storage nodes you must make configuration changes that are similar to the configurationchanges that you make to the spine-leaf and DCN network configuration To monitor Ceph nodesuse the CephStorageExtraConfig parameter to define which network interface to load into theAMQ Interconnect and collectd configuration files

Similarly you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters ifyour environment uses Block and Object storage roles

The deployment of a spine-leaf topology involves creating roles and networks then assigningthose networks to the available roles When you configure data collection and transport for STF foran Red Hat OpenStack Platform deployment the default network for roles is InternalApi For CephBlock and Object storage roles the default network is Storage Because a spine-leaf configurationcan result in different networks being assigned to different Leaf groupings and those names aretypically unique additional configuration is required in the parameter_defaults section of theRed Hat OpenStack Platform environment files

Procedure

1 Document which networks are available for each of the Leaf roles For examples of networkname definitions see Creating a network data file in the Spine Leaf Networking guide Formore information about the creation of the Leaf groupings (roles) and assignment of thenetworks to those groupings see Creating a roles data file in the Spine Leaf Networkingguide

CephStorageExtraConfig tripleoprofilebasemetricscollectdamqp_host hiera(storage) tripleoprofilebasemetricsqdrlistener_addr hiera(storage) tripleoprofilebaseceilometeragentnotificationnotifier_host_addr hiera(storage)

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

19

2 Add the following configuration example to the ExtraConfig section for each of the leafroles In this example internal_api_subnet is the value defined in the name_lowerparameter of your network definition (with _subnet appended to the name for Leaf 0) andis the network to which the ComputeLeaf0 leaf role is connected In this case the networkidentification of 0 corresponds to the Compute role for leaf 0 and represents a value thatis different from the default internal API network nameFor the ComputeLeaf0 leaf role specify extra configuration to perform a hiera lookup todetermine which network interface for a particular network to assign to the collectd AMQPhost parameter Perform the same configuration for the AMQ Interconnect listeneraddress parameter

Additional leaf roles typically replace _subnet with _leafN where N represents a uniqueindentifier for the leaf

This example configuration is on a CephStorage leaf role

33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUDFOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud you must configure the data collectionapplications and the data transport to STF and deploy the overcloud

To configure the Red Hat OpenStack Platform overcloud complete the following tasks

1 Section 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

2 Section 332 ldquoConfiguring the STF connection for the overcloudrdquo

3 Section 333 ldquoValidating client-side installationrdquo

331 Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF you must provide theAMQ Interconnect route address in the STF connection file

Procedure

1 Log in to your Red Hat OpenShift Container Platform (OCP) environment

2 In the service-telemetry project retrieve the AMQ Interconnect route address

ComputeLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_subnet)

ComputeLeaf1ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_leaf1)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_leaf1)

CephStorageLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(storage_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(storage_subnet)

Red Hat OpenStack Platform 160 Service Telemetry Framework

20

NOTE

If your STF installation differs from the documentation ensure that youretrieve the correct AMQ Interconnect route address

332 Configuring the STF connection for the overcloud

To configure the STF connection you must create a file that contains the connection configurationof the AMQ Interconnect for the overcloud to the STF deployment Enable the collection of eventsand storage of the events in STF and deploy the overcloud

Procedure

1 Log in to the Red Hat OpenStack Platform undercloud as the stack user

2 Create a configuration file called stf-connectorsyaml in the homestack directory

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all dataingestion and data storage components for single cloud deployments Toshare the data storage domain with multiple clouds see Section 45ldquoConfiguring multiple cloudsrdquo

3 In the stf-connectorsyaml file configure the MetricsQdrConnectors address to connectthe AMQ Interconnect on the overcloud to the STF deployment

Add the CeilometerQdrPublishEvents true parameter to enable collection andtransport of Ceilometer events to STF

Replace the host parameter with the value of HOSTPORT that you retrieved inSection 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

4 Add the following files to your Red Hat OpenStack Platform director deployment to setupcollectd and AMQ Interconnect

the stf-connectorsyaml environment file

the enable-stfyaml file that ensures that the environment is being used during theovercloud deployment

$ oc get routes -ogo-template= range items printf sn spechost end | grep -5671stf-default-interconnect-5671-service-telemetryappsinfrawatch

parameter_defaults EventPipelinePublishers [] CeilometerQdrPublishEvents true MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch port 443 role edge sslProfile sslProfile verifyHostname false

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

21

the ceilometer-write-qdryaml file that ensures that Ceilometer telemetry is sent to STF

5 Deploy the Red Hat OpenStack Platform overcloud

333 Validating client-side installation

To validate data collection from the STF storage domain query the data sources for delivereddata To validate individual nodes in the Red Hat OpenStack Platform deployment connect to theconsole using SSH

Procedure

1 Log in to an overcloud node for example controller-0

2 Ensure that metrics_qdr container is running on the node

3 Return the internal network address on which AMQ Interconnect is running for example 17217144 listening on port 5666

4 Return a list of connections to the local AMQ Interconnect

openstack overcloud deploy ltother argumentsgt --templates usrshareopenstack-tripleo-heat-templates --environment-file ltother-environment-filesgt --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsmetricsceilometer-write-qdryaml --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsenable-stfyaml --environment-file homestackstf-connectorsyaml

$ sudo podman container inspect --format StateStatus metrics_qdr

running

$ sudo podman exec -it metrics_qdr cat etcqpid-dispatchqdrouterdconf

listener host 17217144 port 5666 authenticatePeer no saslMechanisms ANONYMOUS

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --connections

Connections id host container role dir security authentication tenant ============================================================================================================================================================================================================================================================================================ 1 stf-default-interconnect-5671-service-telemetryappsinfrawatch443 stf-default-

Red Hat OpenStack Platform 160 Service Telemetry Framework

22

There are four connections

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat clientThe outbound STF connection is provided to the MetricsQdrConnectors hostparameter and is the route for the STF storage domain The other hosts are internalnetwork addresses of the client connections to this AMQ Interconnect

5 To ensure that messages are being delivered list the links and view the _edge address inthe deliv column for delivery of messages

interconnect-7458fd4d69-bgzfb edge out TLSv12(DHE-RSA-AES256-GCM-SHA384) anonymous-user 12 1721714460290 openstackorgomcontainercontroller-0ceilometer-agent-notification255c02cee550f143ec9ea030db5cccba14 normal in no-security no-auth 16 1721714436408 metrics normal in no-security anonymous-user 899 1721714439500 10a2e99d-1b8a-4329-b48c-4335e5f75c84 normal in no-security no-auth

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --linksRouter Links type dir conn id id peer class addr phs cap pri undel unsett deliv presett psdrop acc rej rel mod delay rate =========================================================================================================================================================== endpoint out 1 5 local _edge 250 0 0 0 2979926 2979924 0 0 0 2 0 0 0 endpoint in 1 6 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 7 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 8 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 9 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 10 250 0 0 0 911 911 0 0 0 0 0 911 0 endpoint in 1 11 250 0 0 0 0 911 0 0 0 0 0 0 0 endpoint out 12 32 local templSY6Mcicol4J2Kp 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 16 41 250 0 0 0 2979924 2979924 0 0 0 0 0 0 0 endpoint in 912 1834 mobile $management 0 250 0 0 0 1

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

23

6 To list the addresses from Red Hat OpenStack Platform nodes to STF connect to OCP toget the AMQ Interconnect pod name and list the connections List the availableAMQ Interconnect pods

7 Connect to the pod and run the qdstat --connections command to list the knownconnections

In this example there are three edge connections from the Red Hat OpenStack Platformnodes with connection id 22 23 and 24

8 To view the number of messages delivered by the network use each address with the oc exec command

0 0 1 0 0 0 0 0 endpoint out 912 1835 local temp9Ok2resI9tmt+CT 250 0 0 0 0 0 0 0 0 0 0 0 0

$ oc get pods -l application=stf-default-interconnect

NAME READY STATUS RESTARTS AGEstf-default-interconnect-7458fd4d69-bgzfb 11 Running 0 6d21h

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 182547243852 UTCstf-default-interconnect-7458fd4d69-bgzfb

Connections id host container role dir security authentication tenant last dlv uptime ====================================================================================================================================================================================================== 1 1012922143062 rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv] normal in no-security anonymous-user 000000000 006215025 2 1013005255754 rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5] normal in no-security anonymous-user 000212557 006214936 3 1013005143110 rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6] normal in no-security anonymous-user 000213653 006214909 22 101280151948 Routerceph-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 23 101280151950 Routercompute-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 24 101280152082 Routercontroller-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000000 000220834 27 12700142202 c2f541c1-4c97-4b37-a189-a396c08fb079 normal in no-security no-auth 000000000 000000000

Red Hat OpenStack Platform 160 Service Telemetry Framework

24

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 182010293258 UTCstf-default-interconnect-7458fd4d69-bgzfb

Router Addresses class addr phs distrib pri local remote in out thru fallback ==================================================================================================================== mobile anycastceilometereventsample 0 balanced - 1 0 1553 1553 0 0 mobile collectdnotify 0 multicast - 1 0 10 10 0 0 mobile collectdtelemetry 0 multicast - 1 0 7798049 7798049 0 0

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

25

CHAPTER 4 ADVANCED FEATURESThe following optional features can provide additional functionality to theService Telemetry Framework (STF)

Customizing the deployment For more information see Section 41 ldquoCustomizing thedeploymentrdquo

Alerts For more information see Section 42 ldquoAlertsrdquo

High availability For more information see Section 43 ldquoHigh availabilityrdquo

Dashboards For more information see Section 44 ldquoDashboardsrdquo

Multiple clouds For more information see Section 45 ldquoConfiguring multiple cloudsrdquo

Ephemeral storage For more information see Section 46 ldquoEphemeral storagerdquo

41 CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load intoRed Hat OpenShift Container Platform (OCP) The Operator then creates other objects in memorywhich results in the dependent Operators creating the workloads they are responsible formanaging

WARNING

When you override the manifest you must provide the entire manifest contentsincluding object names or namespaces There is no dynamic parametersubstitution when you override a manifest

To override a manifest successfully with Service Telemetry Framework (STF) deploy a defaultenvironment using the core options only For more information about the core options seeSection 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo When you deploy STF use the oc getcommand to retrieve the default deployed manifest When you use a manifest that was originallygenerated by Service Telemetry Operator the manifest is compatible with the other objects thatare managed by the Operators

For example when the metricsEnabled true parameter is configured in the ServiceTelemetrymanifest the Service Telemetry Operator requests components for metrics retrieval and storageusing the default manifests In some cases you might want to override the default manifest Formore information see Section 411 ldquoManifest override parametersrdquo

411 Manifest override parameters

This table describes the available parameters that you can use to override a manifest along withthe corresponding retrieval commands

Table 41 Manifest override parameters

Red Hat OpenStack Platform 160 Service Telemetry Framework

26

Override parameter Description Retrieval command

alertmanagerManifest Override the Alertmanagerobject creation The PrometheusOperator watches for Alertmanager objects

oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest Override the Secret thatcontains the Alertmanagerconfiguration The PrometheusOperator uses a secret named alertmanager- alertmanager-name forexample stf-default to providethe alertmanageryamlconfiguration to Alertmanager

oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest Override the ElasticSearchobject creation The Elastic Cloudon Kuberneters Operator watchesfor ElasticSearch objects

oc get elasticsearch elasticsearch -oyaml

interconnectManifest Override the Interconnectobject creation The AMQInterconnect Operator watchesfor Interconnect objects

oc get interconnect stf-default-interconnect -oyaml

prometheusManifest Override the Prometheusobject creation The PrometheusOperator watches for Prometheus objects

oc get prometheus stf-default -oyaml

servicemonitorManifest Override the ServiceMonitorobject creation The PrometheusOperator watches for ServiceMonitor objects

oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest

Override the SmartGatewayobject creation for collectdmetrics The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest

Override the SmartGatewayobject creation for collectdevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-notification -oyaml

CHAPTER 4 ADVANCED FEATURES

27

smartgatewayCeilometerEventsManifest

Override the SmartGatewayobject creation for Ceilometerevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-ceilometer-notification -oyaml

Override parameter Description Retrieval command

412 Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest For a list of availablemanifest override parameters see Section 41 ldquoCustomizing the deploymentrdquo The default ServiceTelemetry object is stf-default Use oc get servicetelemetry to list the available STFdeployments

TIP

The oc edit command loads the default system editor To override the default editor pass or setthe environment variable EDITOR to the preferred editor For example EDITOR=nano oc edit servicetelemetry stf-default

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Load the ServiceTelemetry object into an editor

4 To modify the ServiceTelemetry object provide a manifest override parameter and thecontents of the manifest to write to OCP instead of the defaults provided by STF

NOTE

The trailing pipe (|) after entering the manifest override parameter indicatesthat the value provided is multi-line

oc project service-telemetry

oc edit servicetelemetry stf-default

$ oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata annotations kubectlkubernetesiolast-applied-configuration | apiVersioninfrawatchv1alpha1kindServiceTelemetrymetadataannotationsnamestf-defaultnamespaceservice-telemetryspecmetricsEnabledtrue creationTimestamp 2020-04-14T202942Z

Red Hat OpenStack Platform 160 Service Telemetry Framework

28

1

2

Manifest override parameter is defined in the spec of the ServiceTelemetry object

End of the manifest override content

5 Save and close

42 ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager Alert rules in Prometheusservers send alerts to an Alertmanager which manages the alerts Alertmanager can silenceinhibit or aggregate alerts and send notifications using email on-call notification systems or chatplatforms

To create an alert complete the following tasks

1 Create an alert rule in Prometheus For more information see Section 421 ldquoCreating analert rule in Prometheusrdquo

generation 1 name stf-default namespace service-telemetry resourceVersion 1949423 selfLink apisinfrawatchv1alpha1namespacesservice-telemetryservicetelemetrysstf-default uid d058bc41-1bb0-49f5-9a8b-642f4b8adb95spec metricsEnabled true smartgatewayCollectdMetricsManifest | 1 apiVersion smartgatewayinfrawatchv2alpha1 kind SmartGateway metadata name stf-default-collectd-telemetry namespace service-telemetry spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdtelemetry debug true prefetch 15000 serviceType metrics size 1 useTimestamp true 2status conditions - ansibleResult changed 0 completion 2020-04-14T203219079508 failures 0 ok 52 skipped 1 lastTransitionTime 2020-04-14T202959Z message Awaiting next reconciliation reason Successful status True type Running

CHAPTER 4 ADVANCED FEATURES

29

2 Create an alert route in Alertmanager For more information see Section 423 ldquoCreating analert route in Alertmanagerrdquo

Additional resourcesFor more information about alerts or notifications with Prometheus and Alertmanager seehttpsprometheusiodocsalertingoverview

To view an example set of alerts that you can use with Service Telemetry Framework (STF) seehttpsgithubcominfrawatchservice-telemetry-operatortreemasterdeployalerts

421 Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications If the rule condition returns an emptyresult set the condition is false Otherwise the rule is true and it triggers an alert

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Create a PrometheusRule object that contains the alert rule The Prometheus Operatorloads the rule into Prometheus

4 To verify that the rules have been loaded into Prometheus by the Operator create a podwith access to curl

5 Run curl to access the prometheus-operated service to return the rules loaded intomemory

oc project service-telemetry

oc apply -f - ltltEOFapiVersion monitoringcoreoscomv1kind PrometheusRulemetadata creationTimestamp null labels prometheus stf-default role alert-rules name prometheus-alarm-rules namespace service-telemetryspec groups - name openstackrules rules - alert Metric Listener down expr collectd_qpid_router_status lt 1 To change the rule edit the value of the expr parameterEOF

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl prometheus-operated9090apiv1rulesstatussuccessdatagroups

Red Hat OpenStack Platform 160 Service Telemetry Framework

30

6 To verify that the output shows the rules loaded into the PrometheusRule object forexample the output contains the defined openstackrules exit from the pod

7 Clean up the environment by deleting the curl pod

Additional resourcesFor more information on alerting see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

422 Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 421ldquoCreating an alert rule in Prometheusrdquo

Procedure

1 Use the oc edit command

2 Edit the PrometheusRules manifest

3 Save and close

Additional resourcesFor more information about configuring alerting rules seehttpsprometheusiodocsprometheuslatestconfigurationalerting_rules

For more information about PrometheusRules objects seehttpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

423 Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system such as email IRC or other notificationchannel The Prometheus Operator manages the Alertmanager configuration as anRed Hat OpenShift Container Platform (OCP) secret STF by default deploys a basic configurationthat results in no receivers

[nameopenstackrulesfileetcprometheusrulesprometheus-stf-default-rulefiles-0service-telemetry-prometheus-alarm-rulesyamlrules[nameMetric Listener downquerycollectd_qpid_router_status u003c 1duration0labelsannotationsalerts[]healthoktypealerting]interval30]

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

oc edit prometheusrules prometheus-alarm-rules

alertmanageryaml |- global resolve_timeout 5m route

CHAPTER 4 ADVANCED FEATURES

31

To deploy a custom Alertmanager route with STF an alertmanagerConfigManifest parameter mustbe passed to the Service Telemetry Operator that results in an updated secret managed by thePrometheus Operator For more information see Section 412 ldquoOverriding a managed manifestrdquo

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object for your STF deployment

4 Add a new parameter alertmanagerConfigManifest and the Secret object contents todefine the alertmanageryaml configuration for Alertmanager

NOTE

This loads the default template that is already managed by ServiceTelemetry Operator To validate the changes are populating correctlychange a value return the alertmanager-stf-default secret and verify thatthe new value is loaded into memory for example changing the value globalresolve_timeout from 5m to 10m

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

oc project service-telemetry

oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec metricsEnabled true alertmanagerConfigManifest | apiVersion v1 kind Secret metadata name alertmanager-stf-default namespace service-telemetry type Opaque stringData alertmanageryaml |- global resolve_timeout 10m route

Red Hat OpenStack Platform 160 Service Telemetry Framework

32

5 Verify that the configuration was applied to the secret

6 To verify the configuration has been loaded into Alertmanager create a pod with access to curl

7 Run curl against the alertmanager-operated service to retrieve the status and configYAMLcontents and review the supplied configuration matches the configuration loaded intoAlertmanager

8 Verify that the configYAML field contains the expected changes Exit from the pod

9 To clean up the environment delete the curl pod

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

$ oc get secret alertmanager-stf-default -o go-template=index data alertmanageryaml | base64decode

global resolve_timeout 10mroute group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver nullreceivers- name null

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl alertmanager-operated9093apiv1status

statussuccessdataconfigYAMLglobaln resolve_timeout 10mn http_config n smtp_hello localhostn smtp_require_tls truen pagerduty_url httpseventspagerdutycomv2enqueuen hipchat_api_url httpsapihipchatcomn opsgenie_api_url httpsapiopsgeniecomn wechat_api_url httpsqyapiweixinqqcomcgi-binn victorops_api_url httpsalertvictoropscomintegrationsgeneric20131114alertnrouten receiver nulln group_byn - jobn group_wait 30sn group_interval 5mn repeat_interval 12hnreceiversn- name nullntemplates []n

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

CHAPTER 4 ADVANCED FEATURES

33

Additional resourcesFor more information about the Red Hat OpenShift Container Platform secret and the Prometheusoperator see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

43 HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover fromfailures in its component services Although Red Hat OpenShift Container Platform (OCP) restartsa failed pod if nodes are available to schedule the workload this recovery process might take morethan one minute during which time events and metrics are lost A high availability configurationincludes multiple copies of STF components reducing recovery time to approximately 2 secondsTo protect against failure of an OCP node deploy STF to an OCP cluster with three or more nodes

NOTE

STF is not yet a fully fault tolerant system Delivery of metrics and events during therecovery period is not guaranteed

Enabling high availability has the following effects

Two AMQ Interconnect pods run instead of the default 1

Three ElasticSearch pods run instead of the default 1

Recovery time from a lost pod in either of these services reduces to approximately 2seconds

431 Configuring high availability

To configure STF for high availability add highAvailabilityEnabled true to the ServiceTelemetryobject in OCP You can this set this parameter at installation time or if you already deployed STFcomplete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Use the oc command to edit the ServiceTelemetry object

4 Add highAvailabilityEnabled true to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry

spec eventsEnabled true metricsEnabled true highAvailabilityEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

34

44 DASHBOARDS

Use third-party application Grafana to visualize system-level metrics gathered by collectd for eachindividual host node For more information about configuring collectd see Section 33ldquoConfiguring Red Hat OpenStack Platform overcloud for Service Telemetry Frameworkrdquo

441 Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment so you mustdeploy the Grafana Operator from OperatorHubio

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Clone the dashboard repository

4 Deploy the Grafana operator

5 To verify that the operator launched successfully run the oc get csv command If the valueof the PHASE column is Succeeded the operator launched successfully

6 Launch a Grafana instance

7 Verify that the Grafana instance deployed

8 Create the datasource and dashboard resources

oc project service-telemetry

git clone httpsgithubcominfrawatchdashboardscd dashboards

oc create -f deploysubscriptionyaml

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEgrafana-operatorv320 Grafana Operator 320 Succeeded

$ oc create -f deploygrafanayaml

$ oc get pod -l app=grafana

NAME READY STATUS RESTARTS AGEgrafana-deployment-7fc7848b56-sbkhv 11 Running 0 1m

CHAPTER 4 ADVANCED FEATURES

35

9 Verify that the resources installed correctly

10 Navigate to httpsltgrafana-route-addressgt in a web browser Use the oc get routescommand to retrieve the Grafana route address

11 To view the dashboard click Dashboards and Manage

Additional resources

For more information about enabling the OperatorHubio catalog source see Section 235ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

4411 Viewing and editing queries

Procedure

1 Log in to Red Hat OpenShift Container Platform To view and edit queries log in as the admin user

2 Change to the service-telemetry namespace

3 To retrieve the default username and password describe the Grafana object using oc describe

442 The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time Select a node from theupper left corner of the dashboard

4421 Top panels

Title Unit Description

oc create -f deploydatasourceyaml -f deployrhos-dashboardyaml

$ oc get grafanadashboards

NAME AGErhos-dashboard 7d21h

$ oc get grafanadatasourcesNAME AGEservice-telemetry-grafanadatasource 1m

oc get routes

oc project service-telemetry

oc describe grafana service-telemetry-grafana

Red Hat OpenStack Platform 160 Service Telemetry Framework

36

Current Global Alerts - Current alerts fired byPrometheus

Recent Global Alerts - Recently fired alerts in 5m timesteps

Status Panel - Node status up downunavailable

Uptime smhdMY Total operational time of node

CPU Cores cores Total number of cores

Memory bytes Total memory

Disk Size bytes Total storage size

Processes processes Total number of processes listedby type

Load Average processes Load average represents theaverage number of running anduninterruptible processes residingin the kernel execution queue

4422 Networking panels

Panels that display the network interfaces of the node

Panel Unit Description

Physical Interfaces Ingress Errors errors Total errors with incoming data

Physical Interfaces Egress Errors errors Total errors with outgoing data

Physical Interfaces Ingress ErrorRates

errorss Rate of incoming data errors

Physical Interfaces egress ErrorRates

errorss Rate of outgoing data errors

Physical Interfaces PacketsIngress pps Incoming packets persecond

Physical Interfaces PacketsEgress

pps

Outgoing packets per second Physical Interfaces Data Ingress bytess

CHAPTER 4 ADVANCED FEATURES

37

Incoming data rates Physical Interfaces Data Egress bytess

Outgoing data rates Physical Interfaces Drop RateIngress

pps

Incoming packets drop rate Physical Interfaces Drop RateEgress

pps

4423 CPU panels

Panels that display CPU usage of the node

Panel Unit Description

Current CPU Usage percent Instantaneous usage at the timeof the last query

Aggregate CPU Usage percent Average non-idle CPU activity ofall cores on a node

Aggr CPU Usage by Type percent Shows time spent for each type ofthread averaged across all cores

4424 Memory panels

Panels that display memory usage on the node

Panel Unit Description

Memory Used percent Amount of memory being used attime of last query

Huge Pages Used hugepages Number of hugepages beingused

Memory

4425 Diskfile system

Panels that display space used on disk

Panel Unit Description Notes

Disk Space Usage percent Total disk use at time oflast query

Red Hat OpenStack Platform 160 Service Telemetry Framework

38

Inode Usage percent Total inode use at timeof last query

Aggregate Disk SpaceUsage

bytes Total disk space usedand reserved

Because this query relieson the df plugintemporary file systemsthat do not necessarilyuse disk space areincluded in the resultsThe query tries to filterout most of these but itmight not be exhaustive

Disk Traffic bytess Shows rates for bothreading and writing

Disk Load percent Approximatepercentage of total diskbandwidth being usedThe weighted IO timeseries includes thebacklog that might beaccumulating For moreinformation see thecollectd disk plugin docs

Operationss opss Operations done persecond

Average IO OperationTime

seconds Average time each IOoperation took tocomplete This averageis not accurate see thecollectd disk plugin docs

Panel Unit Description Notes

45 CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance ofService Telemetry Framework (STF)

1 Plan the AMQP address prefixes that you want to use for each cloud For more informationsee Section 451 ldquoPlanning AMQP address prefixesrdquo

2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on thecorresponding address prefixes For more information see Section 452 ldquoDeploying SmartGatewaysrdquo

3 Configure each cloud to send its metrics and events to STF on the correct address Formore information see Section 453 ldquoCreating the OpenStack environment filerdquo

CHAPTER 4 ADVANCED FEATURES

39

Figure 41 Two Red Hat OpenStack Platform clouds connect to STF

451 Planning AMQP address prefixes

By default Red Hat OpenStack Platform nodes get data through two data collectors collectd andCeilometer These components send telemetry data or notifications to the respective AMQPaddresses for example collectdtelemetry where STF Smart Gateways listen on those addressesfor monitoring data

To support multiple clouds and to identify which cloud generated the monitoring data configureeach cloud to send data to a unique address Prefix a cloud identifier to the second part of theaddress The following list shows some example addresses and identifiers

collectdcloud1-telemetry

collectdcloud1-notify

anycastceilometercloud1-eventsample

collectdcloud2-telemetry

collectdcloud2-notify

Red Hat OpenStack Platform 160 Service Telemetry Framework

40

anycastceilometercloud2-eventsample

collectdus-east-1-telemetry

collectdus-west-3-telemetry

452 Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud one forcollectd metrics one for collectd events and one for Ceilometer events Configure each of theSmart Gateways to listen on the AMQP address that you define for the corresponding cloud

When you deploy STF for the first time Smart Gateway manifests are created that define the initialSmart Gateways for a single cloud When deploying Smart Gateways for multiple cloud supportyou deploy multiple Smart Gateways for each of the data collection types that handle the metricsand the events data for each cloud The initial Smart Gateways act as a template to createadditional Smart Gateways along with any authentication information required to connect to thedata stores

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

oc project service-telemetry

3 Use the initially deployed Smart Gateways as a template for additional Smart GatewaysList the currently deployed Smart Gateways with the oc get smartgateways command Forexample if you deployed STF with metricsEnabled true and eventsEnabled true thefollowing Smart Gateways are displayed in the output

$ oc get smartgateways

NAME AGEstf-default-ceilometer-notification 14dstf-default-collectd-notification 14dstf-default-collectd-telemetry 14d

4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary filewhich you can modify later and use to create the new set of Smart Gateways

truncate --size 0 tmpcloud1-smartgatewaysyaml ampamp for sg in $(oc get smartgateways -oname)do echo --- gtgt tmpcloud1-smartgatewaysyaml oc get $sg -oyaml --export gtgt tmpcloud1-smartgatewaysyamldone

5 Modify the Smart Gateway manifest in the tmpcloud1-smartgatewaysyaml file Adjustthe metadataname and specamqpUrl fields to include the cloud identifier from yourschema For more information see ] To view example Smart Gateway manifests seeltltexample-manifests_advanced-features[

6 Deploy your new Smart Gateways

CHAPTER 4 ADVANCED FEATURES

41

oc apply -f tmpcloud1-smartgatewaysyaml

7 Verify that each Smart Gateway is running This can take several minutes depending on thenumber of Smart Gateways

oc get po -l app=smart-gateway

4521 Example manifests

IMPORTANT

The content in the following examples might be different to the file content in yourdeployment Copy the manifests in your deployment

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that youwant to use for your clouds For more information see Section 451 ldquoPlanning AMQP addressprefixesrdquo

NOTE

Your output may have some additional metadata parameters that you can removefrom the manifests you that load into OCP

apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-ceilometer-notification-cloud1 1spec amqpDataSource ceilometer amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672anycastceilometercloud1-eventsample 2 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-notification-cloud1 3spec amqpDataSource collectd amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-

Red Hat OpenStack Platform 160 Service Telemetry Framework

42

1

2

3

4

5

6

Name for Ceilometer notifications for cloud1

AMQP Address for Ceilometer notifications for cloud1

Name for collectd telemetry for cloud1

AMQP Address for collectd telemetry for cloud1

Name for collectd notifications for cloud1

AMQP Address for collectd notifications for cloud1

453 Creating the OpenStack environment file

To label traffic according to the cloud of origin you must create a configuration with cloud-specificinstance names Create an stf-connectorsyaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefixscheme For more information see Section 451 ldquoPlanning AMQP address prefixesrdquo

notify 4 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-telemetry-cloud1 5spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-telemetry 6 debug false prefetch 15000 serviceType metrics size 1 useTimestamp true

CHAPTER 4 ADVANCED FEATURES

43

WARNING

Remove enable-stfyaml and ceilometer-write-qdryaml environment filereferences from your overcloud deployment This configuration is redundantand results in duplicate information being sent from each cloud node

Procedure

1 Create the stf-connectorsyaml file and modify it to match the AMQP address that youwant for this cloud deployment

resource_registry OSTripleOServicesCollectd usrshareopenstack-tripleo-heat-templatesdeploymentmetricscollectd-container-puppetyaml OSTripleOServicesMetricsQdr usrshareopenstack-tripleo-heat-templatesdeploymentmetricsqdr-container-puppetyaml OSTripleOServicesCeilometerAgentCentral usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-central-container-puppetyaml OSTripleOServicesCeilometerAgentNotification usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-notification-container-puppetyaml OSTripleOServicesCeilometerAgentIpmi usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-ipmi-container-puppetyaml OSTripleOServicesComputeCeilometerAgent usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-compute-container-puppetyaml OSTripleOServicesRedis usrshareopenstack-tripleo-heat-templatesdeploymentdatabaseredis-pacemaker-puppetyaml

parameter_defaults EnableSTF true

EventPipelinePublishers [] CeilometerEnablePanko false CeilometerQdrPublishEvents true CeilometerQdrEventsConfig driver amqp topic cloud1-event 1

CollectdConnectionType amqp1 CollectdAmqpInterval 5 CollectdDefaultPollingInterval 5

CollectdAmqpInstances cloud1-notify 2 notify true format JSON presettle false cloud1-telemetry 3 format JSON presettle true

MetricsQdrAddresses

Red Hat OpenStack Platform 160 Service Telemetry Framework

44

+ lt1gt Define the topic for Ceilometer events This value is the address format of anycastceilometercloud1-eventsample lt2gt Define the topic for collectd events This value is theformat of collectdcloud1-notify lt3gt Define the topic for collectd metrics This value is the formatof collectdcloud1-telemetry lt4gt Adjust the MetricsQdrConnectors host to the address of the STFroute

1 Ensure that the naming convention in the stf-connectorsyaml file aligns with the specamqpUrl field in the Smart Gateway configuration For example configure the CeilometerQdrEventsConfigtopic field to a value of cloud1-event

2 Save the file in a directory for custom environment files for example homestackcustom_templates

3 Source the authentication file

4 Include the stf-connectorsyaml file in the overcloud deployment command along with anyother environment files relevant to your environment

Additional resourcesFor information about validating the deployment see Section 333 ldquoValidating client-sideinstallationrdquo

454 Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it wasscraped from You can use this label to query data from a specific cloud

To query data from a specific cloud use a Prometheus promql query that matches the associatedservice label for example collectd_uptimeservice=stf-default-collectd-telemetry-cloud1-smartgateway

- prefix collectd distribution multicast - prefix anycastceilometer distribution multicast

MetricsQdrSSLProfiles - name sslProfile

MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch 4 port 443 role edge verifyHostname false sslProfile sslProfile

[stackundercloud-0 ~]$ source stackrc

(undercloud) [stackundercloud-0 ~]$

(undercloud) [stackundercloud-0 ~]$ openstack overcloud deploy --templates usrshareopenstack-tripleo-heat-templates -e homestackcustom_templatesstf-connectorsyaml

CHAPTER 4 ADVANCED FEATURES

45

46 EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storingdata in your Red Hat OpenShift Container Platform (OCP) cluster Ephemeral storage is notrecommended in a production environment due to the volatility of the data in the platform whenoperating correctly and as designed For example restarting a pod or rescheduling the workload toanother node results in the loss of any local data written since the pod started

If you enable ephemeral storage in STF the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests

461 Configuring ephemeral storage

To configure STF for ephemeral storage add storageEphemeralEnabled true to theServiceTelemetry object in OCP You can add storageEphemeralEnabled true at installation timeor if you already deployed STF complete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object

4 Add the storageEphemeralEnabled true parameter to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry stf-default

spec eventsEnabled true metricsEnabled true storageEphemeralEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

46

APPENDIX A COLLECTD PLUG-INSThis section contains a complete list of collectd plug-ins and configurations forRed Hat OpenStack Platform 160

collectd-aggregation

collectdpluginaggregationaggregators

collectdpluginaggregationinterval

collectd-amqp1

collectd-apache

collectdpluginapacheinstances (ex localhost rArr url rArr httplocalhostmod_statusauto)

collectdpluginapacheinterval

collectd-apcups

collectd-battery

collectdpluginbatteryvalues_percentage

collectdpluginbatteryreport_degraded

collectdpluginbatteryquery_state_fs

collectdpluginbatteryinterval

collectd-ceph

collectdplugincephdaemons

collectdplugincephlongrunavglatency

collectdplugincephconvertspecialmetrictypes

collectd-cgroups

collectdplugincgroupsignore_selected

collectdplugincgroupsinterval

collectd-conntrack

None

collectd-contextswitch

collectdplugincontextswitchinterval

collectd-cpu

APPENDIX A COLLECTD PLUG-INS

47

collectdplugincpureportbystate

collectdplugincpureportbycpu

collectdplugincpuvaluespercentage

collectdplugincpureportnumcpu

collectdplugincpureportgueststate

collectdplugincpusubtractgueststate

collectdplugincpuinterval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectdplugincsvdatadir

collectdplugincsvstorerates

collectdplugincsvinterval

collectd-df

collectdplugindfdevices

collectdplugindffstypes

collectdplugindfignoreselected

collectdplugindfmountpoints

collectdplugindfreportbydevice

collectdplugindfreportinodes

collectdplugindfreportreserved

collectdplugindfvaluesabsolute

collectdplugindfvaluespercentage

collectdplugindfinterval

collectd-disk

collectdplugindiskdisks

collectdplugindiskignoreselected

collectdplugindiskudevnameattr

Red Hat OpenStack Platform 160 Service Telemetry Framework

48

collectdplugindiskinterval

collectd-entropy

collectdpluginentropyinterval

collectd-ethstat

collectdpluginethstatinterfaces

collectdpluginethstatmaps

collectdpluginethstatmappedonly

collectdpluginethstatinterval

collectd-exec

collectdpluginexeccommands

collectdpluginexeccommands_defaults

collectdpluginexecglobals

collectdpluginexecinterval

collectd-fhcount

collectdpluginfhcountvaluesabsolute

collectdpluginfhcountvaluespercentage

collectdpluginfhcountinterval

collectd-filecount

collectdpluginfilecountdirectories

collectdpluginfilecountinterval

collectd-fscache

None

collectd-hddtemp

collectdpluginhddtemphost

collectdpluginhddtempport

collectdpluginhddtempinterval

collectd-hugepages

collectdpluginhugepagesreport_per_node_hp

APPENDIX A COLLECTD PLUG-INS

49

collectdpluginhugepagesreport_root_hp

collectdpluginhugepagesvalues_pages

collectdpluginhugepagesvalues_bytes

collectdpluginhugepagesvalues_percentage

collectdpluginhugepagesinterval

collectd-intel_rdt

collectd-interface

collectdplugininterfaceinterfaces

collectdplugininterfaceignoreselected

collectdplugininterfacereportinactive

Collectdplugininterfaceinterval

collectd-ipc

None

collectd-ipmi

collectdpluginipmiignore_selected

collectdpluginipminotify_sensor_add

collectdpluginipminotify_sensor_remove

collectdpluginipminotify_sensor_not_present

collectdpluginipmisensors

collectdpluginipmiinterval

collectd-irq

collectdpluginirqirqs

collectdpluginirqignoreselected

collectdpluginirqinterval

collectd-load

collectdpluginloadreport_relative

collectdpluginloadinterval

collectd-logfile

collectdpluginlogfilelog_level

Red Hat OpenStack Platform 160 Service Telemetry Framework

50

collectdpluginlogfilelog_level

collectdpluginlogfilelog_file

collectdpluginlogfilelog_timestamp

collectdpluginlogfileprint_severity

collectdpluginlogfileinterval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectdpluginmemcachedinstances

collectdpluginmemcachedinterval

collectd-memory

collectdpluginmemoryvaluesabsolute

collectdpluginmemoryvaluespercentage

collectdpluginmemoryinterval collectd-multimeter

collectd-multimeter

collectd-mysql

collectdpluginmysqlinterval

collectd-netlink

collectdpluginnetlinkinterfaces

collectdpluginnetlinkverboseinterfaces

collectdpluginnetlinkqdiscs

collectdpluginnetlinkclasses

collectdpluginnetlinkfilters

collectdpluginnetlinkignoreselected

collectdpluginnetlinkinterval

collectd-network

collectdpluginnetworktimetolive

collectdpluginnetworkmaxpacketsize

APPENDIX A COLLECTD PLUG-INS

51

collectdpluginnetworkforward

collectdpluginnetworkreportstats

collectdpluginnetworklisteners

collectdpluginnetworkservers

collectdpluginnetworkinterval

collectd-nfs

collectdpluginnfsinterval

collectd-ntpd

collectdpluginntpdhost

collectdpluginntpdport

collectdpluginntpdreverselookups

collectdpluginntpdincludeunitid

collectdpluginntpdinterval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectdpluginopenvpnstatusfile

collectdpluginopenvpnimprovednamingschema

collectdpluginopenvpncollectcompression

collectdpluginopenvpncollectindividualusers

collectdpluginopenvpncollectusercount

collectdpluginopenvpninterval

collectd-ovs_events

collectdpluginovs_eventsaddress

collectdpluginovs_eventsdispatch

collectdpluginovs_eventsinterfaces

collectdpluginovs_eventssend_notification

Red Hat OpenStack Platform 160 Service Telemetry Framework

52

collectdpluginovs_events$port

collectdpluginovs_eventssocket

collectd-ovs_stats

collectdpluginovs_statsaddress

collectdpluginovs_statsbridges

collectdpluginovs_statsport

collectdpluginovs_statssocket

collectd-ping

collectdpluginpinghosts

collectdpluginpingtimeout

collectdpluginpingttl

collectdpluginpingsource_address

collectdpluginpingdevice

collectdpluginpingmax_missed

collectdpluginpingsize

collectdpluginpinginterval

collectd-powerdns

collectdpluginpowerdnsinterval

collectdpluginpowerdnsservers

collectdpluginpowerdnsrecursors

collectdpluginpowerdnslocal_socket

collectdpluginpowerdnsinterval

collectd-processes

collectdpluginprocessesprocesses

collectdpluginprocessesprocess_matches

collectdpluginprocessescollect_context_switch

collectdpluginprocessescollect_file_descriptor

collectdpluginprocessescollect_memory_maps

collectdpluginpowerdnsinterval

APPENDIX A COLLECTD PLUG-INS

53

collectd-protocols

collectdpluginprotocolsignoreselected

collectdpluginprotocolsvalues

collectd-python

collectd-serial

collectd-smart

collectdpluginsmartdisks

collectdpluginsmartignoreselected

collectdpluginsmartinterval

collectd-snmp_agent

collectd-statsd

collectdpluginstatsdhost

collectdpluginstatsdport

collectdpluginstatsddeletecounters

collectdpluginstatsddeletetimers

collectdpluginstatsddeletegauges

collectdpluginstatsddeletesets

collectdpluginstatsdcountersum

collectdpluginstatsdtimerpercentile

collectdpluginstatsdtimerlower

collectdpluginstatsdtimerupper

collectdpluginstatsdtimersum

collectdpluginstatsdtimercount

collectdpluginstatsdinterval

collectd-swap

collectdpluginswapreportbydevice

collectdpluginswapreportbytes

collectdpluginswapvaluesabsolute

Red Hat OpenStack Platform 160 Service Telemetry Framework

54

collectdpluginswapvaluespercentage

collectdpluginswapreportio

collectdpluginswapinterval

collectd-syslog

collectdpluginsysloglog_level

collectdpluginsyslognotify_level

collectdpluginsysloginterval

collectd-table

collectdplugintabletables

collectdplugintableinterval

collectd-tail

collectdplugintailfiles

collectdplugintailinterval

collectd-tail_csv

collectdplugintail_csvmetrics

collectdplugintail_csvfiles

collectd-tcpconns

collectdplugintcpconnslocalports

collectdplugintcpconnsremoteports

collectdplugintcpconnslistening

collectdplugintcpconnsallportssummary

collectdplugintcpconnsinterval

collectd-ted

collectd-thermal

collectdpluginthermaldevices

collectdpluginthermalignoreselected

collectdpluginthermalinterval

collectd-threshold

APPENDIX A COLLECTD PLUG-INS

55

collectdpluginthresholdtypes

collectdpluginthresholdplugins

collectdpluginthresholdhosts

collectdpluginthresholdinterval

collectd-turbostat

collectdpluginturbostatcore_c_states

collectdpluginturbostatpackage_c_states

collectdpluginturbostatsystem_management_interrupt

collectdpluginturbostatdigital_temperature_sensor

collectdpluginturbostattcc_activation_temp

collectdpluginturbostatrunning_average_power_limit

collectdpluginturbostatlogical_core_names

collectd-unixsock

collectd-uptime

collectdpluginuptimeinterval

collectd-users

collectdpluginusersinterval

collectd-uuid

collectdpluginuuiduuid_file

collectdpluginuuidinterval

collectd-virt

collectdpluginvirtconnection

collectdpluginvirtrefresh_interval

collectdpluginvirtdomain

collectdpluginvirtblock_device

collectdpluginvirtinterface_device

collectdpluginvirtignore_selected

collectdpluginvirthostname_format

Red Hat OpenStack Platform 160 Service Telemetry Framework

56

collectdpluginvirtinterface_format

collectdpluginvirtextra_stats

collectdpluginvirtinterval

collectd-vmem

collectdpluginvmemverbose

collectdpluginvmeminterval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectdpluginwrite_graphitecarbons

collectdpluginwrite_graphitecarbon_defaults

collectdpluginwrite_graphiteglobals

collectd-write_kafka

collectdpluginwrite_kafkakafka_host

collectdpluginwrite_kafkakafka_port

collectdpluginwrite_kafkakafka_hosts

collectdpluginwrite_kafkatopics

collectd-write_log

collectdpluginwrite_logformat

collectd-zfs_arc

None

APPENDIX A COLLECTD PLUG-INS

57

  • Table of Contents
  • CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
    • 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
    • 12 INSTALLATION SIZE
      • CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
        • 21 THE CORE COMPONENTS OF STF
        • 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
          • 221 Persistent volumes
            • 2211 Using ephemeral storage
              • 222 Resource allocation
              • 223 Node tuning operator
                • 23 DEPLOYING STF TO THE OCP ENVIRONMENT
                  • 231 Deploying STF to the OCP environment with ElasticSearch
                  • 232 Deploying STF to the OCP environment without ElasticSearch
                  • 233 Creating a namespace
                  • 234 Creating an OperatorGroup
                  • 235 Enabling the OperatorHubio Community Catalog Source
                  • 236 Enabling Red Hat STF Operator Source
                  • 237 Subscribing to the AMQ Certificate Manager Operator
                  • 238 Subscribing to the Elastic Cloud on Kubernetes Operator
                  • 239 Subscribing to the Service Telemetry Operator
                  • 2310 Creating a ServiceTelemetry object in OCP
                    • 24 REMOVING STF FROM THE OCP ENVIRONMENT
                      • 241 Deleting the namespace
                      • 242 Removing the OperatorSource
                          • CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
                            • 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
                            • 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
                            • 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
                              • 331 Retrieving the AMQ Interconnect route address
                              • 332 Configuring the STF connection for the overcloud
                              • 333 Validating client-side installation
                                  • CHAPTER 4 ADVANCED FEATURES
                                    • 41 CUSTOMIZING THE DEPLOYMENT
                                      • 411 Manifest override parameters
                                      • 412 Overriding a managed manifest
                                        • Procedure
                                            • 42 ALERTS
                                              • Additional resources
                                              • 421 Creating an alert rule in Prometheus
                                                • Procedure
                                                • Additional resources
                                                  • 422 Configuring custom alerts
                                                    • Procedure
                                                    • Additional resources
                                                      • 423 Creating an alert route in Alertmanager
                                                        • Procedure
                                                        • Additional resources
                                                            • 43 HIGH AVAILABILITY
                                                              • 431 Configuring high availability
                                                                • Procedure
                                                                    • 44 DASHBOARDS
                                                                      • 441 Setting up Grafana to host the dashboard
                                                                        • 4411 Viewing and editing queries
                                                                          • 442 The Grafana infrastructure dashboard
                                                                            • 4421 Top panels
                                                                            • 4422 Networking panels
                                                                            • 4423 CPU panels
                                                                            • 4424 Memory panels
                                                                            • 4425 Diskfile system
                                                                                • 45 CONFIGURING MULTIPLE CLOUDS
                                                                                  • 451 Planning AMQP address prefixes
                                                                                  • 452 Deploying Smart Gateways
                                                                                    • Procedure
                                                                                    • 4521 Example manifests
                                                                                      • 453 Creating the OpenStack environment file
                                                                                        • Procedure
                                                                                        • Additional resources
                                                                                          • 454 Querying metrics data from multiple clouds
                                                                                            • 46 EPHEMERAL STORAGE
                                                                                              • 461 Configuring ephemeral storage
                                                                                                  • APPENDIX A COLLECTD PLUG-INS
Page 2: Red Hat OpenStack Platform 16 · CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK 3.2. DEPLOYING WITH ROUTED L3 NETWORKS 3.3. CONFIGURING RED HAT OPENSTACK PLATFORM

Red Hat OpenStack Platform 160 Service Telemetry Framework

Installing and deploying Service Telemetry Framework

OpenStack Teamrhos-docsredhatcom

Legal Notice

Copyright copy 2020 Red Hat Inc

The text of and illustrations in this document are licensed by Red Hat under a Creative CommonsAttributionndashShare Alike 30 Unported license (CC-BY-SA) An explanation of CC-BY-SA isavailable athttpcreativecommonsorglicensesby-sa30 In accordance with CC-BY-SA if you distribute this document or an adaptation of it you mustprovide the URL for the original version

Red Hat as the licensor of this document waives the right to enforce and agrees not to assertSection 4d of CC-BY-SA to the fullest extent permitted by applicable law

Red Hat Red Hat Enterprise Linux the Shadowman logo the Red Hat logo JBoss OpenShiftFedora the Infinity logo and RHCE are trademarks of Red Hat Inc registered in the United Statesand other countries

Linux reg is the registered trademark of Linus Torvalds in the United States and other countries

Java reg is a registered trademark of Oracle andor its affiliates

XFS reg is a trademark of Silicon Graphics International Corp or its subsidiaries in the United Statesandor other countries

MySQL reg is a registered trademark of MySQL AB in the United States the European Union andother countries

Nodejs reg is an official trademark of Joyent Red Hat is not formally related to or endorsed by theofficial Joyent Nodejs open source or commercial project

The OpenStack reg Word Mark and OpenStack logo are either registered trademarksservice marksor trademarksservice marks of the OpenStack Foundation in the United States and othercountries and are used with the OpenStack Foundations permission We are not affiliated withendorsed or sponsored by the OpenStack Foundation or the OpenStack community

All other trademarks are the property of their respective owners

Abstract

This guide contains information about installing the core components and deploying ServiceTelemetry Framework

Table of Contents

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE12 INSTALLATION SIZE

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK21 THE CORE COMPONENTS OF STF22 PREPARING YOUR OCP ENVIRONMENT FOR STF

221 Persistent volumes2211 Using ephemeral storage

222 Resource allocation223 Node tuning operator

23 DEPLOYING STF TO THE OCP ENVIRONMENT231 Deploying STF to the OCP environment with ElasticSearch232 Deploying STF to the OCP environment without ElasticSearch233 Creating a namespace234 Creating an OperatorGroup235 Enabling the OperatorHubio Community Catalog Source236 Enabling Red Hat STF Operator Source237 Subscribing to the AMQ Certificate Manager Operator238 Subscribing to the Elastic Cloud on Kubernetes Operator239 Subscribing to the Service Telemetry Operator2310 Creating a ServiceTelemetry object in OCP

24 REMOVING STF FROM THE OCP ENVIRONMENT241 Deleting the namespace242 Removing the OperatorSource

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FORSERVICE TELEMETRY FRAMEWORK

331 Retrieving the AMQ Interconnect route address332 Configuring the STF connection for the overcloud333 Validating client-side installation

CHAPTER 4 ADVANCED FEATURES41 CUSTOMIZING THE DEPLOYMENT

411 Manifest override parameters412 Overriding a managed manifest

Procedure42 ALERTS

Additional resources421 Creating an alert rule in Prometheus

ProcedureAdditional resources

422 Configuring custom alertsProcedureAdditional resources

423 Creating an alert route in AlertmanagerProcedureAdditional resources

43 HIGH AVAILABILITY

446

8889999

1010111111

121213141415171717

191919

20202122

2626262828293030303131313131323434

Table of Contents

1

431 Configuring high availabilityProcedure

44 DASHBOARDS441 Setting up Grafana to host the dashboard

4411 Viewing and editing queries442 The Grafana infrastructure dashboard

4421 Top panels4422 Networking panels4423 CPU panels4424 Memory panels4425 Diskfile system

45 CONFIGURING MULTIPLE CLOUDS451 Planning AMQP address prefixes452 Deploying Smart Gateways

Procedure4521 Example manifests

453 Creating the OpenStack environment fileProcedureAdditional resources

454 Querying metrics data from multiple clouds46 EPHEMERAL STORAGE

461 Configuring ephemeral storage

APPENDIX A COLLECTD PLUG-INS

343435353636363738383839404141

42434445454646

47

Red Hat OpenStack Platform 160 Service Telemetry Framework

2

Table of Contents

3

CHAPTER 1 INTRODUCTION TOSERVICE TELEMETRY FRAMEWORK

Service Telemetry Framework (STF) provides automated collection of measurements and data fromremote clients - Red Hat OpenStack Platform or third-party nodes - and transmission of thatinformation to a centralized receiving Red Hat OpenShift Container Platform (OCP) deployment forstorage retrieval and monitoring The data can be either of two types

Metric

a numeric measurement of an application or system

Event

irregular and discrete occurrences that happen in a system

The collection components that are required on the clients are lightweight The multicast message busthat is shared by all clients and the deployment provides fast and reliable data transport Other modularcomponents for receiving and storing data are deployed in containers on OCP

STF provides access to monitoring functions such as alert generation visualization through dashboardsand single source of truth telemetry analysis to support orchestration

11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE

Service Telemetry Framework (STF) uses the components described in Table 11 ldquoSTF componentsrdquo

Table 11 STF components

Client Component Server(OCP)

yes An AMQP 1x compatible messaging bus to shuttle the metrics to STF forstorage in Prometheus

yes

no Smart Gateway to pick metrics and events from the AMQP 1x bus and todeliver events to ElasticSearch or to provide metrics to Prometheus

yes

no Prometheus as time-series data storage yes

no ElasticSearch as events data storage yes

yes collectd to collect infrastructure metrics and events no

yes Ceilometer to collect Red Hat OpenStack Platform metrics and events no

Figure 11 Service Telemetry Framework architecture overview

Red Hat OpenStack Platform 160 Service Telemetry Framework

4

Figure 11 Service Telemetry Framework architecture overview

NOTE

The Service Telemetry Framework data collection components collectd and Ceilometerand the transport components AMQ Interconnect and Smart Gateway are fullysupported The data storage components Prometheus and ElasticSearch including theOperator artifacts and visualization component Grafana are community-supported andare not officially supported

For metrics on the client side collectd collects high-resolution metrics collectd delivers the data toPrometheus by using the AMQP1 plugin which places the data onto the message bus On the serverside a Golang application called the Smart Gateway takes the data stream from the bus and exposes itas a local scrape endpoint for Prometheus

If you plan to collect and store events collectd or Ceilometer delivers event data to the server side byusing the AMQP1 plugin which places the data onto the message bus Another Smart Gateway writesthe data to the ElasticSearch datastore

Server-side STF monitoring infrastructure consists of the following layers

Service Telemetry Framework 1x (STF)

Red Hat OpenShift Container Platform (OCP)

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

5

Infrastructure platform

Figure 12 Server-side STF monitoring infrastructure

For more information about how to deploy Red Hat OpenShift Container Platform see the OCPproduct documentation You can install OCP on cloud platforms or on bare metal For more informationabout STF performance and scaling see httpsaccessredhatcomarticles4907241

NOTE

Do not install OCP on the same infrastructure that you want to monitor

12 INSTALLATION SIZE

The size of your Red Hat OpenShift Container Platform installation depends on the following factors

The number of nodes you want to monitor

The number of metrics you want to collect

The resolution of metrics

The length of time that you want to store the data

Installation of Service Telemetry Framework (STF) depends on the existingRed Hat OpenShift Container Platform environment Ensure that you install monitoring for

Red Hat OpenStack Platform 160 Service Telemetry Framework

6

Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platformenvironment You can install Red Hat OpenShift Container Platform (OCP) on baremetal or othersupported cloud platforms For more information about installing OCP see OpenShift ContainerPlatform 43 Documentation

The size of your OCP environment depends on the infrastructure you select For more informationabout minimum resources requirements when installing OCP on baremetal see Minimum resourcerequirements in the Installing a cluster on bare metal guide For installation requirements of the variouspublic and private cloud platforms which you can install see the corresponding installationdocumentation for your cloud platform of choice

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

7

CHAPTER 2 INSTALLING THE CORE COMPONENTS OFSERVICE TELEMETRY FRAMEWORK

Before you install Service Telemetry Framework (STF) ensure thatRed Hat OpenShift Container Platform (OCP) version 4x is running and that you understand the corecomponents of the framework As part of the OCP installation planning process ensure that theadministrator provides persistent storage and enough resources to run the STF components on top ofthe OCP environment

WARNING

Red Hat OpenShift Container Platform version 43 or later is currently required for asuccessful installation of STF

21 THE CORE COMPONENTS OF STF

The following STF core components are managed by Operators

Prometheus and AlertManager

ElasticSearch

Smart Gateway

AMQ Interconnect

Each component has a corresponding Operator that you can use to load the various applicationcomponents and objects

Additional resources

For more information about Operators see the Understanding Operators guide

22 PREPARING YOUR OCP ENVIRONMENT FOR STF

As you prepare your OCP environment for STF you must plan for persistent storage adequateresources and event storage

Ensure that persistent storage is available in your Red Hat OpenShift Container Platform clusterto permit a production grade deployment For more information see Section 221 ldquoPersistentvolumesrdquo

Ensure that enough resources are available to run the Operators and the application containersFor more information see Section 222 ldquoResource allocationrdquo

To install ElasticSearch you must use a community catalog source If you do not want to use acommunity catalog or if you do not want to store events see Section 23 ldquoDeploying STF to theOCP environmentrdquo

STF uses ElasticSearch to store events which requires a larger than normal

Red Hat OpenStack Platform 160 Service Telemetry Framework

8

vmmax_map_count The vmmax_map_count value is set by default inRed Hat OpenShift Container Platform For more information about how to edit the value of vmmax_map_count see Section 223 ldquoNode tuning operatorrdquo

221 Persistent volumes

STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus andElasticSearch can store metrics and events

Additional resources

For more information about configuring persistent storage for OCP see Understanding persistentstorage

2211 Using ephemeral storage

WARNING

You can use ephemeral storage with STF However if you use ephemeral storageyou might experience data loss if a pod is restarted updated or rescheduled ontoanother node Use ephemeral storage only for development or testing and notproduction environments

Procedure

To enable ephemeral storage for STF set storageEphemeralEnabled true in your ServiceTelemetry manifest

Additional resources

For more information about enabling ephemeral storage for STF see Section 461 ldquoConfiguringephemeral storagerdquo

222 Resource allocation

To enable the scheduling of pods within the OCP infrastructure you need resources for thecomponents that are running If you do not allocate enough resources pods remain in a Pending statebecause they cannot be scheduled

The amount of resources that you require to run STF depends on your environment and the number ofnodes and clouds that you want to monitor

Additional resources

For recommendations about sizing for metrics collection seehttpsaccessredhatcomarticles4907241

For information about sizing requirements for ElasticSearch seehttpswwwelasticcoguideencloud-on-k8scurrentk8s-managing-compute-resourceshtml

223 Node tuning operator

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

9

STF uses ElasticSearch to store events which requires a larger than normal vmmax_map_count The vmmax_map_count value is set by default in Red Hat OpenShift Container Platform

If you want to edit the value of vmmax_map_count you cannot apply node tuning manually using the sysctl command because Red Hat OpenShift Container Platform manages nodes directly To configurevalues and apply them to the infrastructure you must use the node tuning operator For moreinformation see Using the Node Tuning Operator

In an OCP deployment the default node tuning operator specification provides the required profiles forElasticSearch workloads or pods scheduled on nodes To view the default cluster node tuningspecification run the following command

The output of the default specification is documented at Default profiles set on a cluster Theassignment of profiles is managed in the recommend section where profiles are applied to a node whencertain conditions are met When scheduling ElasticSearch to a node in STF one of the following profilesis applied

openshift-control-plane-es

openshift-node-es

When scheduling an ElasticSearch pod there must be a label present that matches tunedopenshiftioelasticsearch If the label is present one of the two profiles is assigned to the podNo action is required by the administrator if you use the recommended Operator for ElasticSearch Ifyou use a custom-deployed ElasticSearch with STF ensure that you add the tunedopenshiftioelasticsearch label to all scheduled pods

Additional resources

For more information about virtual memory usage by ElasticSearch seehttpswwwelasticcoguideenelasticsearchreferencecurrentvm-max-map-counthtml

For more information about how the profiles are applied to nodes see Custom tuning specification

23 DEPLOYING STF TO THE OCP ENVIRONMENT

You can deploy STF to the OCP environment in one of two ways

Deploy STF and store events with ElasticSearch For more information see Section 231ldquoDeploying STF to the OCP environment with ElasticSearchrdquo

Deploy STF without ElasticSearch and disable events support For more information seeSection 232 ldquoDeploying STF to the OCP environment without ElasticSearchrdquo

231 Deploying STF to the OCP environment with ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 235 ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

oc get Tuneddefault -o yaml -n openshift-cluster-node-tuning-operator

Red Hat OpenStack Platform 160 Service Telemetry Framework

10

4 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

5 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

6 Section 238 ldquoSubscribing to the Elastic Cloud on Kubernetes Operatorrdquo

7 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

8 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

232 Deploying STF to the OCP environment without ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

4 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

5 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

6 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

233 Creating a namespace

Create a namespace to hold the STF components The service-telemetry namespace is usedthroughout the documentation

Procedure

Enter the following command

234 Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods

Procedure

Enter the following command

oc new-project service-telemetry

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorGroupmetadata name service-telemetry-operator-group namespace service-telemetryspec targetNamespaces - service-telemetryEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

11

Additional resources

For more information see OperatorGroups

235 Enabling the OperatorHubio Community Catalog Source

Before you install ElasticSearch you must have access to the resources on the OperatorHubioCommunity Catalog Source

Procedure

Enter the following command

236 Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform you must enable the operatorsource

Procedure

1 Install an OperatorSource that contains the Service Telemetry Operator and the SmartGateway Operator

2 To validate the creation of your OperatorSource use the oc get operatorsources command A

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind CatalogSourcemetadata name operatorhubio-operators namespace openshift-marketplacespec sourceType grpc image quayiooperator-frameworkupstream-community-operatorslatest displayName OperatorHubio Operators publisher OperatorHubioEOF

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorSourcemetadata labels opsrc-provider redhat-operators-stf name redhat-operators-stf namespace openshift-marketplacespec authorizationToken displayName Red Hat STF Operators endpoint httpsquayiocnr publisher Red Hat registryNamespace redhat-operators-stf type appregistryEOF

Red Hat OpenStack Platform 160 Service Telemetry Framework

12

2 To validate the creation of your OperatorSource use the oc get operatorsources command Asuccessful import results in the MESSAGE field returning a result of The object has been successfully reconciled

3 To validate that the Operators are available from the catalog use the oc get packagemanifestcommand

237 Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STFcomponents because the AMQ Certificate Manager Operator runs globally-scoped and is notcompatible with the dependency management of Operator Lifecycle Manager when used with othernamespace-scoped operators

Procedure

1 Subscribe to the AMQ Certificate Manager Operator create the subscription and validate theAMQ7 Certificate Manager

NOTE

The AMQ Certificate Manager is installed globally for all namespaces so the namespace value provided is openshift-operators You might not see your amq7-cert-managerv100 ClusterServiceVersion in the service-telemetrynamespace for a few minutes until the processing executes against thenamespace

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME TYPE ENDPOINT REGISTRY DISPLAYNAME PUBLISHER STATUS MESSAGEredhat-operators-stf appregistry httpsquayiocnr redhat-operators-stf Red Hat STF Operators Red Hat Succeeded The object has been successfully reconciled

$ oc get packagemanifests | grep Red Hat STF

smartgateway-operator Red Hat STF Operators 2m50sservicetelemetry-operator Red Hat STF Operators 2m50s

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name amq7-cert-manager namespace openshift-operatorsspec channel alpha installPlanApproval Automatic name amq7-cert-manager source redhat-operators sourceNamespace openshift-marketplaceEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

13

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-managerv100 has a phase Succeeded

238 Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator and if you plan to store events in ElasticSearch youmust enable the Elastic Cloud Kubernetes Operator

Procedure

1 Apply the following manifest to your OCP environment to enable the Elastic Cloud onKubernetes Operator

2 To verify that the ClusterServiceVersion for ElasticSearch Cloud on Kubernetes succeededenter the oc get csv command

239 Subscribing to the Service Telemetry Operator

To instantiate an STF instance create the ServiceTelemetry object to allow the Service TelemetryOperator to create the environment

Procedure

1 To create the Service Telemetry Operator subscription enter the oc apply -f command

$ oc get --namespace openshift-operators csv

NAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name elastic-cloud-eck namespace service-telemetryspec channel stable installPlanApproval Automatic name elastic-cloud-eck source operatorhubio-operators sourceNamespace openshift-marketplaceEOF

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1

Red Hat OpenStack Platform 160 Service Telemetry Framework

14

2 To validate the Service Telemetry Operator and the dependent operators enter the followingcommand

2310 Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework you must create an instance of ServiceTelemetry in OCPBy default eventsEnabled is set to false If you do not want to store events in ElasticSearch ensurethat eventsEnabled is set to false For more information see Section 232 ldquoDeploying STF to the OCPenvironment without ElasticSearchrdquo

The following core parameters are available for a ServiceTelemetry manifest

Table 21 Core parameters for a ServiceTelemetry manifest

Parameter Description Default Value

eventsEnabled Enable events support in STFRequires prerequisite steps toensure ElasticSearch can bestarted For more information seeSection 238 ldquoSubscribing to theElastic Cloud on KubernetesOperatorrdquo

false

metricsEnabled Enable metrics support in STF true

kind Subscriptionmetadata name servicetelemetry-operator namespace service-telemetryspec channel stable installPlanApproval Automatic name servicetelemetry-operator source redhat-operators-stf sourceNamespace openshift-marketplaceEOF

$ oc get csv --namespace service-telemetryNAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeededamq7-interconnect-operatorv120 Red Hat Integration - AMQ Interconnect 120 Succeededelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeededprometheusoperator0370 Prometheus Operator 0370 prometheusoperator0320 Succeededservice-telemetry-operatorv102 Service Telemetry Operator 102 service-telemetry-operatorv101 Succeededsmart-gateway-operatorv101 Smart Gateway Operator 101 smart-gateway-operatorv100 Succeeded

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

15

highAvailabilityEnabled Enable high availability in STF Formore information see Section 43ldquoHigh availabilityrdquo

false

storageEphemeralEnabled Enable ephemeral storagesupport in STF For moreinformation see Section 46ldquoEphemeral storagerdquo

false

Parameter Description Default Value

Procedure

1 To store events in ElasticSearch set eventsEnabled to true during deployment

2 To view the STF deployment logs in the Service Telemetry Operator use the oc logscommand

PLAY RECAP localhost ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3 View the pods and the status of each pod to determine that all workloads are operatingnominally

NOTE

If you set eventsEnabled true the notification Smart Gateways will Error andCrashLoopBackOff for a period of time before ElasticSearch starts

oc apply -f - ltltEOFapiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec eventsEnabled true metricsEnabled trueEOF

oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

$ oc get pods

NAME READY STATUS RESTARTS AGEalertmanager-stf-default-0 22 Running 0 26melastic-operator-645dc8b8ff-jwnzt 11 Running 0 88melasticsearch-es-default-0 11 Running 0 26minterconnect-operator-6fd49d9fb9-4bl92 11 Running 0

Red Hat OpenStack Platform 160 Service Telemetry Framework

16

24 REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality

Complete the following tasks

1 Section 241 ldquoDeleting the namespacerdquo

2 Section 242 ldquoRemoving the OperatorSourcerdquo

241 Deleting the namespace

To remove the operational resources for STF from OCP delete the namespace

Procedure

1 Run the oc delete command

2 Verify that the resources have been deleted from the namespace

242 Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again delete the OperatorSourceWhen you remove the OperatorSource PackageManifests related to STF are removed from theOperator Lifecycle Manager catalog

Procedure

1 Delete the OperatorSource

46mprometheus-operator-bf7d97fb9-kwnlx 11 Running 0 46mprometheus-stf-default-0 33 Running 0 26mservice-telemetry-operator-54f4c99d9b-k7ll6 22 Running 0 46msmart-gateway-operator-7ff58bcf94-66rvx 22 Running 0 46mstf-default-ceilometer-notification-smartgateway-6675df547q4lbj 11 Running 0 26mstf-default-collectd-notification-smartgateway-698c87fbb7-xj528 11 Running 0 26mstf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn 11 Running 0 26mstf-default-interconnect-7458fd4d69-nqbfs 11 Running 0 26m

oc delete project service-telemetry

$ oc get allNo resources found

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

17

2 Verify that the STF PackageManifests are removed from the platform If successful thefollowing command returns no result

3 If you enabled the OperatorHubio Community Catalog Source during the installationprocess and you no longer need this catalog source delete it

Additional resources

For more information about the OperatorHubio Community Catalog Source see Section 23ldquoDeploying STF to the OCP environmentrdquo

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stfoperatorsourceoperatorscoreoscom redhat-operators-stf deleted

$ oc get packagemanifests | grep Red Hat STF

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operatorscatalogsourceoperatorscoreoscom operatorhubio-operators deleted

Red Hat OpenStack Platform 160 Service Telemetry Framework

18

CHAPTER 3 COMPLETING THESERVICE TELEMETRY FRAMEWORK CONFIGURATION

31 CONNECTING RED HAT OPENSTACK PLATFORM TOSERVICE TELEMETRY FRAMEWORK

To collect metrics events or both and to send them to the Service Telemetry Framework (STF)storage domain you must configure the Red Hat OpenStack Platform overcloud to enable datacollection and transport

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes thatemploy routed L3 domains such as distributed compute node (DCN) or spine-leaf see Section 32ldquoDeploying to non-standard network topologiesrdquo

32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network you must makeconfiguration adjustments so that AMQ Interconnect can transport data to theService Telemetry Framework (STF) server instance This scenario is typical in a spine-leaf or aDCN topology For more information about DCN configuration see the Spine Leaf Networkingguide

If you use STF with Red Hat OpenStack Platform 160 and plan to monitor your Ceph Block orObject storage nodes you must make configuration changes that are similar to the configurationchanges that you make to the spine-leaf and DCN network configuration To monitor Ceph nodesuse the CephStorageExtraConfig parameter to define which network interface to load into theAMQ Interconnect and collectd configuration files

Similarly you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters ifyour environment uses Block and Object storage roles

The deployment of a spine-leaf topology involves creating roles and networks then assigningthose networks to the available roles When you configure data collection and transport for STF foran Red Hat OpenStack Platform deployment the default network for roles is InternalApi For CephBlock and Object storage roles the default network is Storage Because a spine-leaf configurationcan result in different networks being assigned to different Leaf groupings and those names aretypically unique additional configuration is required in the parameter_defaults section of theRed Hat OpenStack Platform environment files

Procedure

1 Document which networks are available for each of the Leaf roles For examples of networkname definitions see Creating a network data file in the Spine Leaf Networking guide Formore information about the creation of the Leaf groupings (roles) and assignment of thenetworks to those groupings see Creating a roles data file in the Spine Leaf Networkingguide

CephStorageExtraConfig tripleoprofilebasemetricscollectdamqp_host hiera(storage) tripleoprofilebasemetricsqdrlistener_addr hiera(storage) tripleoprofilebaseceilometeragentnotificationnotifier_host_addr hiera(storage)

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

19

2 Add the following configuration example to the ExtraConfig section for each of the leafroles In this example internal_api_subnet is the value defined in the name_lowerparameter of your network definition (with _subnet appended to the name for Leaf 0) andis the network to which the ComputeLeaf0 leaf role is connected In this case the networkidentification of 0 corresponds to the Compute role for leaf 0 and represents a value thatis different from the default internal API network nameFor the ComputeLeaf0 leaf role specify extra configuration to perform a hiera lookup todetermine which network interface for a particular network to assign to the collectd AMQPhost parameter Perform the same configuration for the AMQ Interconnect listeneraddress parameter

Additional leaf roles typically replace _subnet with _leafN where N represents a uniqueindentifier for the leaf

This example configuration is on a CephStorage leaf role

33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUDFOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud you must configure the data collectionapplications and the data transport to STF and deploy the overcloud

To configure the Red Hat OpenStack Platform overcloud complete the following tasks

1 Section 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

2 Section 332 ldquoConfiguring the STF connection for the overcloudrdquo

3 Section 333 ldquoValidating client-side installationrdquo

331 Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF you must provide theAMQ Interconnect route address in the STF connection file

Procedure

1 Log in to your Red Hat OpenShift Container Platform (OCP) environment

2 In the service-telemetry project retrieve the AMQ Interconnect route address

ComputeLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_subnet)

ComputeLeaf1ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_leaf1)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_leaf1)

CephStorageLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(storage_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(storage_subnet)

Red Hat OpenStack Platform 160 Service Telemetry Framework

20

NOTE

If your STF installation differs from the documentation ensure that youretrieve the correct AMQ Interconnect route address

332 Configuring the STF connection for the overcloud

To configure the STF connection you must create a file that contains the connection configurationof the AMQ Interconnect for the overcloud to the STF deployment Enable the collection of eventsand storage of the events in STF and deploy the overcloud

Procedure

1 Log in to the Red Hat OpenStack Platform undercloud as the stack user

2 Create a configuration file called stf-connectorsyaml in the homestack directory

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all dataingestion and data storage components for single cloud deployments Toshare the data storage domain with multiple clouds see Section 45ldquoConfiguring multiple cloudsrdquo

3 In the stf-connectorsyaml file configure the MetricsQdrConnectors address to connectthe AMQ Interconnect on the overcloud to the STF deployment

Add the CeilometerQdrPublishEvents true parameter to enable collection andtransport of Ceilometer events to STF

Replace the host parameter with the value of HOSTPORT that you retrieved inSection 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

4 Add the following files to your Red Hat OpenStack Platform director deployment to setupcollectd and AMQ Interconnect

the stf-connectorsyaml environment file

the enable-stfyaml file that ensures that the environment is being used during theovercloud deployment

$ oc get routes -ogo-template= range items printf sn spechost end | grep -5671stf-default-interconnect-5671-service-telemetryappsinfrawatch

parameter_defaults EventPipelinePublishers [] CeilometerQdrPublishEvents true MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch port 443 role edge sslProfile sslProfile verifyHostname false

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

21

the ceilometer-write-qdryaml file that ensures that Ceilometer telemetry is sent to STF

5 Deploy the Red Hat OpenStack Platform overcloud

333 Validating client-side installation

To validate data collection from the STF storage domain query the data sources for delivereddata To validate individual nodes in the Red Hat OpenStack Platform deployment connect to theconsole using SSH

Procedure

1 Log in to an overcloud node for example controller-0

2 Ensure that metrics_qdr container is running on the node

3 Return the internal network address on which AMQ Interconnect is running for example 17217144 listening on port 5666

4 Return a list of connections to the local AMQ Interconnect

openstack overcloud deploy ltother argumentsgt --templates usrshareopenstack-tripleo-heat-templates --environment-file ltother-environment-filesgt --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsmetricsceilometer-write-qdryaml --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsenable-stfyaml --environment-file homestackstf-connectorsyaml

$ sudo podman container inspect --format StateStatus metrics_qdr

running

$ sudo podman exec -it metrics_qdr cat etcqpid-dispatchqdrouterdconf

listener host 17217144 port 5666 authenticatePeer no saslMechanisms ANONYMOUS

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --connections

Connections id host container role dir security authentication tenant ============================================================================================================================================================================================================================================================================================ 1 stf-default-interconnect-5671-service-telemetryappsinfrawatch443 stf-default-

Red Hat OpenStack Platform 160 Service Telemetry Framework

22

There are four connections

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat clientThe outbound STF connection is provided to the MetricsQdrConnectors hostparameter and is the route for the STF storage domain The other hosts are internalnetwork addresses of the client connections to this AMQ Interconnect

5 To ensure that messages are being delivered list the links and view the _edge address inthe deliv column for delivery of messages

interconnect-7458fd4d69-bgzfb edge out TLSv12(DHE-RSA-AES256-GCM-SHA384) anonymous-user 12 1721714460290 openstackorgomcontainercontroller-0ceilometer-agent-notification255c02cee550f143ec9ea030db5cccba14 normal in no-security no-auth 16 1721714436408 metrics normal in no-security anonymous-user 899 1721714439500 10a2e99d-1b8a-4329-b48c-4335e5f75c84 normal in no-security no-auth

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --linksRouter Links type dir conn id id peer class addr phs cap pri undel unsett deliv presett psdrop acc rej rel mod delay rate =========================================================================================================================================================== endpoint out 1 5 local _edge 250 0 0 0 2979926 2979924 0 0 0 2 0 0 0 endpoint in 1 6 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 7 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 8 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 9 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 10 250 0 0 0 911 911 0 0 0 0 0 911 0 endpoint in 1 11 250 0 0 0 0 911 0 0 0 0 0 0 0 endpoint out 12 32 local templSY6Mcicol4J2Kp 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 16 41 250 0 0 0 2979924 2979924 0 0 0 0 0 0 0 endpoint in 912 1834 mobile $management 0 250 0 0 0 1

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

23

6 To list the addresses from Red Hat OpenStack Platform nodes to STF connect to OCP toget the AMQ Interconnect pod name and list the connections List the availableAMQ Interconnect pods

7 Connect to the pod and run the qdstat --connections command to list the knownconnections

In this example there are three edge connections from the Red Hat OpenStack Platformnodes with connection id 22 23 and 24

8 To view the number of messages delivered by the network use each address with the oc exec command

0 0 1 0 0 0 0 0 endpoint out 912 1835 local temp9Ok2resI9tmt+CT 250 0 0 0 0 0 0 0 0 0 0 0 0

$ oc get pods -l application=stf-default-interconnect

NAME READY STATUS RESTARTS AGEstf-default-interconnect-7458fd4d69-bgzfb 11 Running 0 6d21h

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 182547243852 UTCstf-default-interconnect-7458fd4d69-bgzfb

Connections id host container role dir security authentication tenant last dlv uptime ====================================================================================================================================================================================================== 1 1012922143062 rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv] normal in no-security anonymous-user 000000000 006215025 2 1013005255754 rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5] normal in no-security anonymous-user 000212557 006214936 3 1013005143110 rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6] normal in no-security anonymous-user 000213653 006214909 22 101280151948 Routerceph-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 23 101280151950 Routercompute-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 24 101280152082 Routercontroller-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000000 000220834 27 12700142202 c2f541c1-4c97-4b37-a189-a396c08fb079 normal in no-security no-auth 000000000 000000000

Red Hat OpenStack Platform 160 Service Telemetry Framework

24

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 182010293258 UTCstf-default-interconnect-7458fd4d69-bgzfb

Router Addresses class addr phs distrib pri local remote in out thru fallback ==================================================================================================================== mobile anycastceilometereventsample 0 balanced - 1 0 1553 1553 0 0 mobile collectdnotify 0 multicast - 1 0 10 10 0 0 mobile collectdtelemetry 0 multicast - 1 0 7798049 7798049 0 0

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

25

CHAPTER 4 ADVANCED FEATURESThe following optional features can provide additional functionality to theService Telemetry Framework (STF)

Customizing the deployment For more information see Section 41 ldquoCustomizing thedeploymentrdquo

Alerts For more information see Section 42 ldquoAlertsrdquo

High availability For more information see Section 43 ldquoHigh availabilityrdquo

Dashboards For more information see Section 44 ldquoDashboardsrdquo

Multiple clouds For more information see Section 45 ldquoConfiguring multiple cloudsrdquo

Ephemeral storage For more information see Section 46 ldquoEphemeral storagerdquo

41 CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load intoRed Hat OpenShift Container Platform (OCP) The Operator then creates other objects in memorywhich results in the dependent Operators creating the workloads they are responsible formanaging

WARNING

When you override the manifest you must provide the entire manifest contentsincluding object names or namespaces There is no dynamic parametersubstitution when you override a manifest

To override a manifest successfully with Service Telemetry Framework (STF) deploy a defaultenvironment using the core options only For more information about the core options seeSection 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo When you deploy STF use the oc getcommand to retrieve the default deployed manifest When you use a manifest that was originallygenerated by Service Telemetry Operator the manifest is compatible with the other objects thatare managed by the Operators

For example when the metricsEnabled true parameter is configured in the ServiceTelemetrymanifest the Service Telemetry Operator requests components for metrics retrieval and storageusing the default manifests In some cases you might want to override the default manifest Formore information see Section 411 ldquoManifest override parametersrdquo

411 Manifest override parameters

This table describes the available parameters that you can use to override a manifest along withthe corresponding retrieval commands

Table 41 Manifest override parameters

Red Hat OpenStack Platform 160 Service Telemetry Framework

26

Override parameter Description Retrieval command

alertmanagerManifest Override the Alertmanagerobject creation The PrometheusOperator watches for Alertmanager objects

oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest Override the Secret thatcontains the Alertmanagerconfiguration The PrometheusOperator uses a secret named alertmanager- alertmanager-name forexample stf-default to providethe alertmanageryamlconfiguration to Alertmanager

oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest Override the ElasticSearchobject creation The Elastic Cloudon Kuberneters Operator watchesfor ElasticSearch objects

oc get elasticsearch elasticsearch -oyaml

interconnectManifest Override the Interconnectobject creation The AMQInterconnect Operator watchesfor Interconnect objects

oc get interconnect stf-default-interconnect -oyaml

prometheusManifest Override the Prometheusobject creation The PrometheusOperator watches for Prometheus objects

oc get prometheus stf-default -oyaml

servicemonitorManifest Override the ServiceMonitorobject creation The PrometheusOperator watches for ServiceMonitor objects

oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest

Override the SmartGatewayobject creation for collectdmetrics The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest

Override the SmartGatewayobject creation for collectdevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-notification -oyaml

CHAPTER 4 ADVANCED FEATURES

27

smartgatewayCeilometerEventsManifest

Override the SmartGatewayobject creation for Ceilometerevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-ceilometer-notification -oyaml

Override parameter Description Retrieval command

412 Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest For a list of availablemanifest override parameters see Section 41 ldquoCustomizing the deploymentrdquo The default ServiceTelemetry object is stf-default Use oc get servicetelemetry to list the available STFdeployments

TIP

The oc edit command loads the default system editor To override the default editor pass or setthe environment variable EDITOR to the preferred editor For example EDITOR=nano oc edit servicetelemetry stf-default

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Load the ServiceTelemetry object into an editor

4 To modify the ServiceTelemetry object provide a manifest override parameter and thecontents of the manifest to write to OCP instead of the defaults provided by STF

NOTE

The trailing pipe (|) after entering the manifest override parameter indicatesthat the value provided is multi-line

oc project service-telemetry

oc edit servicetelemetry stf-default

$ oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata annotations kubectlkubernetesiolast-applied-configuration | apiVersioninfrawatchv1alpha1kindServiceTelemetrymetadataannotationsnamestf-defaultnamespaceservice-telemetryspecmetricsEnabledtrue creationTimestamp 2020-04-14T202942Z

Red Hat OpenStack Platform 160 Service Telemetry Framework

28

1

2

Manifest override parameter is defined in the spec of the ServiceTelemetry object

End of the manifest override content

5 Save and close

42 ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager Alert rules in Prometheusservers send alerts to an Alertmanager which manages the alerts Alertmanager can silenceinhibit or aggregate alerts and send notifications using email on-call notification systems or chatplatforms

To create an alert complete the following tasks

1 Create an alert rule in Prometheus For more information see Section 421 ldquoCreating analert rule in Prometheusrdquo

generation 1 name stf-default namespace service-telemetry resourceVersion 1949423 selfLink apisinfrawatchv1alpha1namespacesservice-telemetryservicetelemetrysstf-default uid d058bc41-1bb0-49f5-9a8b-642f4b8adb95spec metricsEnabled true smartgatewayCollectdMetricsManifest | 1 apiVersion smartgatewayinfrawatchv2alpha1 kind SmartGateway metadata name stf-default-collectd-telemetry namespace service-telemetry spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdtelemetry debug true prefetch 15000 serviceType metrics size 1 useTimestamp true 2status conditions - ansibleResult changed 0 completion 2020-04-14T203219079508 failures 0 ok 52 skipped 1 lastTransitionTime 2020-04-14T202959Z message Awaiting next reconciliation reason Successful status True type Running

CHAPTER 4 ADVANCED FEATURES

29

2 Create an alert route in Alertmanager For more information see Section 423 ldquoCreating analert route in Alertmanagerrdquo

Additional resourcesFor more information about alerts or notifications with Prometheus and Alertmanager seehttpsprometheusiodocsalertingoverview

To view an example set of alerts that you can use with Service Telemetry Framework (STF) seehttpsgithubcominfrawatchservice-telemetry-operatortreemasterdeployalerts

421 Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications If the rule condition returns an emptyresult set the condition is false Otherwise the rule is true and it triggers an alert

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Create a PrometheusRule object that contains the alert rule The Prometheus Operatorloads the rule into Prometheus

4 To verify that the rules have been loaded into Prometheus by the Operator create a podwith access to curl

5 Run curl to access the prometheus-operated service to return the rules loaded intomemory

oc project service-telemetry

oc apply -f - ltltEOFapiVersion monitoringcoreoscomv1kind PrometheusRulemetadata creationTimestamp null labels prometheus stf-default role alert-rules name prometheus-alarm-rules namespace service-telemetryspec groups - name openstackrules rules - alert Metric Listener down expr collectd_qpid_router_status lt 1 To change the rule edit the value of the expr parameterEOF

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl prometheus-operated9090apiv1rulesstatussuccessdatagroups

Red Hat OpenStack Platform 160 Service Telemetry Framework

30

6 To verify that the output shows the rules loaded into the PrometheusRule object forexample the output contains the defined openstackrules exit from the pod

7 Clean up the environment by deleting the curl pod

Additional resourcesFor more information on alerting see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

422 Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 421ldquoCreating an alert rule in Prometheusrdquo

Procedure

1 Use the oc edit command

2 Edit the PrometheusRules manifest

3 Save and close

Additional resourcesFor more information about configuring alerting rules seehttpsprometheusiodocsprometheuslatestconfigurationalerting_rules

For more information about PrometheusRules objects seehttpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

423 Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system such as email IRC or other notificationchannel The Prometheus Operator manages the Alertmanager configuration as anRed Hat OpenShift Container Platform (OCP) secret STF by default deploys a basic configurationthat results in no receivers

[nameopenstackrulesfileetcprometheusrulesprometheus-stf-default-rulefiles-0service-telemetry-prometheus-alarm-rulesyamlrules[nameMetric Listener downquerycollectd_qpid_router_status u003c 1duration0labelsannotationsalerts[]healthoktypealerting]interval30]

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

oc edit prometheusrules prometheus-alarm-rules

alertmanageryaml |- global resolve_timeout 5m route

CHAPTER 4 ADVANCED FEATURES

31

To deploy a custom Alertmanager route with STF an alertmanagerConfigManifest parameter mustbe passed to the Service Telemetry Operator that results in an updated secret managed by thePrometheus Operator For more information see Section 412 ldquoOverriding a managed manifestrdquo

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object for your STF deployment

4 Add a new parameter alertmanagerConfigManifest and the Secret object contents todefine the alertmanageryaml configuration for Alertmanager

NOTE

This loads the default template that is already managed by ServiceTelemetry Operator To validate the changes are populating correctlychange a value return the alertmanager-stf-default secret and verify thatthe new value is loaded into memory for example changing the value globalresolve_timeout from 5m to 10m

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

oc project service-telemetry

oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec metricsEnabled true alertmanagerConfigManifest | apiVersion v1 kind Secret metadata name alertmanager-stf-default namespace service-telemetry type Opaque stringData alertmanageryaml |- global resolve_timeout 10m route

Red Hat OpenStack Platform 160 Service Telemetry Framework

32

5 Verify that the configuration was applied to the secret

6 To verify the configuration has been loaded into Alertmanager create a pod with access to curl

7 Run curl against the alertmanager-operated service to retrieve the status and configYAMLcontents and review the supplied configuration matches the configuration loaded intoAlertmanager

8 Verify that the configYAML field contains the expected changes Exit from the pod

9 To clean up the environment delete the curl pod

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

$ oc get secret alertmanager-stf-default -o go-template=index data alertmanageryaml | base64decode

global resolve_timeout 10mroute group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver nullreceivers- name null

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl alertmanager-operated9093apiv1status

statussuccessdataconfigYAMLglobaln resolve_timeout 10mn http_config n smtp_hello localhostn smtp_require_tls truen pagerduty_url httpseventspagerdutycomv2enqueuen hipchat_api_url httpsapihipchatcomn opsgenie_api_url httpsapiopsgeniecomn wechat_api_url httpsqyapiweixinqqcomcgi-binn victorops_api_url httpsalertvictoropscomintegrationsgeneric20131114alertnrouten receiver nulln group_byn - jobn group_wait 30sn group_interval 5mn repeat_interval 12hnreceiversn- name nullntemplates []n

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

CHAPTER 4 ADVANCED FEATURES

33

Additional resourcesFor more information about the Red Hat OpenShift Container Platform secret and the Prometheusoperator see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

43 HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover fromfailures in its component services Although Red Hat OpenShift Container Platform (OCP) restartsa failed pod if nodes are available to schedule the workload this recovery process might take morethan one minute during which time events and metrics are lost A high availability configurationincludes multiple copies of STF components reducing recovery time to approximately 2 secondsTo protect against failure of an OCP node deploy STF to an OCP cluster with three or more nodes

NOTE

STF is not yet a fully fault tolerant system Delivery of metrics and events during therecovery period is not guaranteed

Enabling high availability has the following effects

Two AMQ Interconnect pods run instead of the default 1

Three ElasticSearch pods run instead of the default 1

Recovery time from a lost pod in either of these services reduces to approximately 2seconds

431 Configuring high availability

To configure STF for high availability add highAvailabilityEnabled true to the ServiceTelemetryobject in OCP You can this set this parameter at installation time or if you already deployed STFcomplete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Use the oc command to edit the ServiceTelemetry object

4 Add highAvailabilityEnabled true to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry

spec eventsEnabled true metricsEnabled true highAvailabilityEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

34

44 DASHBOARDS

Use third-party application Grafana to visualize system-level metrics gathered by collectd for eachindividual host node For more information about configuring collectd see Section 33ldquoConfiguring Red Hat OpenStack Platform overcloud for Service Telemetry Frameworkrdquo

441 Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment so you mustdeploy the Grafana Operator from OperatorHubio

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Clone the dashboard repository

4 Deploy the Grafana operator

5 To verify that the operator launched successfully run the oc get csv command If the valueof the PHASE column is Succeeded the operator launched successfully

6 Launch a Grafana instance

7 Verify that the Grafana instance deployed

8 Create the datasource and dashboard resources

oc project service-telemetry

git clone httpsgithubcominfrawatchdashboardscd dashboards

oc create -f deploysubscriptionyaml

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEgrafana-operatorv320 Grafana Operator 320 Succeeded

$ oc create -f deploygrafanayaml

$ oc get pod -l app=grafana

NAME READY STATUS RESTARTS AGEgrafana-deployment-7fc7848b56-sbkhv 11 Running 0 1m

CHAPTER 4 ADVANCED FEATURES

35

9 Verify that the resources installed correctly

10 Navigate to httpsltgrafana-route-addressgt in a web browser Use the oc get routescommand to retrieve the Grafana route address

11 To view the dashboard click Dashboards and Manage

Additional resources

For more information about enabling the OperatorHubio catalog source see Section 235ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

4411 Viewing and editing queries

Procedure

1 Log in to Red Hat OpenShift Container Platform To view and edit queries log in as the admin user

2 Change to the service-telemetry namespace

3 To retrieve the default username and password describe the Grafana object using oc describe

442 The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time Select a node from theupper left corner of the dashboard

4421 Top panels

Title Unit Description

oc create -f deploydatasourceyaml -f deployrhos-dashboardyaml

$ oc get grafanadashboards

NAME AGErhos-dashboard 7d21h

$ oc get grafanadatasourcesNAME AGEservice-telemetry-grafanadatasource 1m

oc get routes

oc project service-telemetry

oc describe grafana service-telemetry-grafana

Red Hat OpenStack Platform 160 Service Telemetry Framework

36

Current Global Alerts - Current alerts fired byPrometheus

Recent Global Alerts - Recently fired alerts in 5m timesteps

Status Panel - Node status up downunavailable

Uptime smhdMY Total operational time of node

CPU Cores cores Total number of cores

Memory bytes Total memory

Disk Size bytes Total storage size

Processes processes Total number of processes listedby type

Load Average processes Load average represents theaverage number of running anduninterruptible processes residingin the kernel execution queue

4422 Networking panels

Panels that display the network interfaces of the node

Panel Unit Description

Physical Interfaces Ingress Errors errors Total errors with incoming data

Physical Interfaces Egress Errors errors Total errors with outgoing data

Physical Interfaces Ingress ErrorRates

errorss Rate of incoming data errors

Physical Interfaces egress ErrorRates

errorss Rate of outgoing data errors

Physical Interfaces PacketsIngress pps Incoming packets persecond

Physical Interfaces PacketsEgress

pps

Outgoing packets per second Physical Interfaces Data Ingress bytess

CHAPTER 4 ADVANCED FEATURES

37

Incoming data rates Physical Interfaces Data Egress bytess

Outgoing data rates Physical Interfaces Drop RateIngress

pps

Incoming packets drop rate Physical Interfaces Drop RateEgress

pps

4423 CPU panels

Panels that display CPU usage of the node

Panel Unit Description

Current CPU Usage percent Instantaneous usage at the timeof the last query

Aggregate CPU Usage percent Average non-idle CPU activity ofall cores on a node

Aggr CPU Usage by Type percent Shows time spent for each type ofthread averaged across all cores

4424 Memory panels

Panels that display memory usage on the node

Panel Unit Description

Memory Used percent Amount of memory being used attime of last query

Huge Pages Used hugepages Number of hugepages beingused

Memory

4425 Diskfile system

Panels that display space used on disk

Panel Unit Description Notes

Disk Space Usage percent Total disk use at time oflast query

Red Hat OpenStack Platform 160 Service Telemetry Framework

38

Inode Usage percent Total inode use at timeof last query

Aggregate Disk SpaceUsage

bytes Total disk space usedand reserved

Because this query relieson the df plugintemporary file systemsthat do not necessarilyuse disk space areincluded in the resultsThe query tries to filterout most of these but itmight not be exhaustive

Disk Traffic bytess Shows rates for bothreading and writing

Disk Load percent Approximatepercentage of total diskbandwidth being usedThe weighted IO timeseries includes thebacklog that might beaccumulating For moreinformation see thecollectd disk plugin docs

Operationss opss Operations done persecond

Average IO OperationTime

seconds Average time each IOoperation took tocomplete This averageis not accurate see thecollectd disk plugin docs

Panel Unit Description Notes

45 CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance ofService Telemetry Framework (STF)

1 Plan the AMQP address prefixes that you want to use for each cloud For more informationsee Section 451 ldquoPlanning AMQP address prefixesrdquo

2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on thecorresponding address prefixes For more information see Section 452 ldquoDeploying SmartGatewaysrdquo

3 Configure each cloud to send its metrics and events to STF on the correct address Formore information see Section 453 ldquoCreating the OpenStack environment filerdquo

CHAPTER 4 ADVANCED FEATURES

39

Figure 41 Two Red Hat OpenStack Platform clouds connect to STF

451 Planning AMQP address prefixes

By default Red Hat OpenStack Platform nodes get data through two data collectors collectd andCeilometer These components send telemetry data or notifications to the respective AMQPaddresses for example collectdtelemetry where STF Smart Gateways listen on those addressesfor monitoring data

To support multiple clouds and to identify which cloud generated the monitoring data configureeach cloud to send data to a unique address Prefix a cloud identifier to the second part of theaddress The following list shows some example addresses and identifiers

collectdcloud1-telemetry

collectdcloud1-notify

anycastceilometercloud1-eventsample

collectdcloud2-telemetry

collectdcloud2-notify

Red Hat OpenStack Platform 160 Service Telemetry Framework

40

anycastceilometercloud2-eventsample

collectdus-east-1-telemetry

collectdus-west-3-telemetry

452 Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud one forcollectd metrics one for collectd events and one for Ceilometer events Configure each of theSmart Gateways to listen on the AMQP address that you define for the corresponding cloud

When you deploy STF for the first time Smart Gateway manifests are created that define the initialSmart Gateways for a single cloud When deploying Smart Gateways for multiple cloud supportyou deploy multiple Smart Gateways for each of the data collection types that handle the metricsand the events data for each cloud The initial Smart Gateways act as a template to createadditional Smart Gateways along with any authentication information required to connect to thedata stores

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

oc project service-telemetry

3 Use the initially deployed Smart Gateways as a template for additional Smart GatewaysList the currently deployed Smart Gateways with the oc get smartgateways command Forexample if you deployed STF with metricsEnabled true and eventsEnabled true thefollowing Smart Gateways are displayed in the output

$ oc get smartgateways

NAME AGEstf-default-ceilometer-notification 14dstf-default-collectd-notification 14dstf-default-collectd-telemetry 14d

4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary filewhich you can modify later and use to create the new set of Smart Gateways

truncate --size 0 tmpcloud1-smartgatewaysyaml ampamp for sg in $(oc get smartgateways -oname)do echo --- gtgt tmpcloud1-smartgatewaysyaml oc get $sg -oyaml --export gtgt tmpcloud1-smartgatewaysyamldone

5 Modify the Smart Gateway manifest in the tmpcloud1-smartgatewaysyaml file Adjustthe metadataname and specamqpUrl fields to include the cloud identifier from yourschema For more information see ] To view example Smart Gateway manifests seeltltexample-manifests_advanced-features[

6 Deploy your new Smart Gateways

CHAPTER 4 ADVANCED FEATURES

41

oc apply -f tmpcloud1-smartgatewaysyaml

7 Verify that each Smart Gateway is running This can take several minutes depending on thenumber of Smart Gateways

oc get po -l app=smart-gateway

4521 Example manifests

IMPORTANT

The content in the following examples might be different to the file content in yourdeployment Copy the manifests in your deployment

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that youwant to use for your clouds For more information see Section 451 ldquoPlanning AMQP addressprefixesrdquo

NOTE

Your output may have some additional metadata parameters that you can removefrom the manifests you that load into OCP

apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-ceilometer-notification-cloud1 1spec amqpDataSource ceilometer amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672anycastceilometercloud1-eventsample 2 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-notification-cloud1 3spec amqpDataSource collectd amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-

Red Hat OpenStack Platform 160 Service Telemetry Framework

42

1

2

3

4

5

6

Name for Ceilometer notifications for cloud1

AMQP Address for Ceilometer notifications for cloud1

Name for collectd telemetry for cloud1

AMQP Address for collectd telemetry for cloud1

Name for collectd notifications for cloud1

AMQP Address for collectd notifications for cloud1

453 Creating the OpenStack environment file

To label traffic according to the cloud of origin you must create a configuration with cloud-specificinstance names Create an stf-connectorsyaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefixscheme For more information see Section 451 ldquoPlanning AMQP address prefixesrdquo

notify 4 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-telemetry-cloud1 5spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-telemetry 6 debug false prefetch 15000 serviceType metrics size 1 useTimestamp true

CHAPTER 4 ADVANCED FEATURES

43

WARNING

Remove enable-stfyaml and ceilometer-write-qdryaml environment filereferences from your overcloud deployment This configuration is redundantand results in duplicate information being sent from each cloud node

Procedure

1 Create the stf-connectorsyaml file and modify it to match the AMQP address that youwant for this cloud deployment

resource_registry OSTripleOServicesCollectd usrshareopenstack-tripleo-heat-templatesdeploymentmetricscollectd-container-puppetyaml OSTripleOServicesMetricsQdr usrshareopenstack-tripleo-heat-templatesdeploymentmetricsqdr-container-puppetyaml OSTripleOServicesCeilometerAgentCentral usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-central-container-puppetyaml OSTripleOServicesCeilometerAgentNotification usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-notification-container-puppetyaml OSTripleOServicesCeilometerAgentIpmi usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-ipmi-container-puppetyaml OSTripleOServicesComputeCeilometerAgent usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-compute-container-puppetyaml OSTripleOServicesRedis usrshareopenstack-tripleo-heat-templatesdeploymentdatabaseredis-pacemaker-puppetyaml

parameter_defaults EnableSTF true

EventPipelinePublishers [] CeilometerEnablePanko false CeilometerQdrPublishEvents true CeilometerQdrEventsConfig driver amqp topic cloud1-event 1

CollectdConnectionType amqp1 CollectdAmqpInterval 5 CollectdDefaultPollingInterval 5

CollectdAmqpInstances cloud1-notify 2 notify true format JSON presettle false cloud1-telemetry 3 format JSON presettle true

MetricsQdrAddresses

Red Hat OpenStack Platform 160 Service Telemetry Framework

44

+ lt1gt Define the topic for Ceilometer events This value is the address format of anycastceilometercloud1-eventsample lt2gt Define the topic for collectd events This value is theformat of collectdcloud1-notify lt3gt Define the topic for collectd metrics This value is the formatof collectdcloud1-telemetry lt4gt Adjust the MetricsQdrConnectors host to the address of the STFroute

1 Ensure that the naming convention in the stf-connectorsyaml file aligns with the specamqpUrl field in the Smart Gateway configuration For example configure the CeilometerQdrEventsConfigtopic field to a value of cloud1-event

2 Save the file in a directory for custom environment files for example homestackcustom_templates

3 Source the authentication file

4 Include the stf-connectorsyaml file in the overcloud deployment command along with anyother environment files relevant to your environment

Additional resourcesFor information about validating the deployment see Section 333 ldquoValidating client-sideinstallationrdquo

454 Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it wasscraped from You can use this label to query data from a specific cloud

To query data from a specific cloud use a Prometheus promql query that matches the associatedservice label for example collectd_uptimeservice=stf-default-collectd-telemetry-cloud1-smartgateway

- prefix collectd distribution multicast - prefix anycastceilometer distribution multicast

MetricsQdrSSLProfiles - name sslProfile

MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch 4 port 443 role edge verifyHostname false sslProfile sslProfile

[stackundercloud-0 ~]$ source stackrc

(undercloud) [stackundercloud-0 ~]$

(undercloud) [stackundercloud-0 ~]$ openstack overcloud deploy --templates usrshareopenstack-tripleo-heat-templates -e homestackcustom_templatesstf-connectorsyaml

CHAPTER 4 ADVANCED FEATURES

45

46 EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storingdata in your Red Hat OpenShift Container Platform (OCP) cluster Ephemeral storage is notrecommended in a production environment due to the volatility of the data in the platform whenoperating correctly and as designed For example restarting a pod or rescheduling the workload toanother node results in the loss of any local data written since the pod started

If you enable ephemeral storage in STF the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests

461 Configuring ephemeral storage

To configure STF for ephemeral storage add storageEphemeralEnabled true to theServiceTelemetry object in OCP You can add storageEphemeralEnabled true at installation timeor if you already deployed STF complete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object

4 Add the storageEphemeralEnabled true parameter to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry stf-default

spec eventsEnabled true metricsEnabled true storageEphemeralEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

46

APPENDIX A COLLECTD PLUG-INSThis section contains a complete list of collectd plug-ins and configurations forRed Hat OpenStack Platform 160

collectd-aggregation

collectdpluginaggregationaggregators

collectdpluginaggregationinterval

collectd-amqp1

collectd-apache

collectdpluginapacheinstances (ex localhost rArr url rArr httplocalhostmod_statusauto)

collectdpluginapacheinterval

collectd-apcups

collectd-battery

collectdpluginbatteryvalues_percentage

collectdpluginbatteryreport_degraded

collectdpluginbatteryquery_state_fs

collectdpluginbatteryinterval

collectd-ceph

collectdplugincephdaemons

collectdplugincephlongrunavglatency

collectdplugincephconvertspecialmetrictypes

collectd-cgroups

collectdplugincgroupsignore_selected

collectdplugincgroupsinterval

collectd-conntrack

None

collectd-contextswitch

collectdplugincontextswitchinterval

collectd-cpu

APPENDIX A COLLECTD PLUG-INS

47

collectdplugincpureportbystate

collectdplugincpureportbycpu

collectdplugincpuvaluespercentage

collectdplugincpureportnumcpu

collectdplugincpureportgueststate

collectdplugincpusubtractgueststate

collectdplugincpuinterval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectdplugincsvdatadir

collectdplugincsvstorerates

collectdplugincsvinterval

collectd-df

collectdplugindfdevices

collectdplugindffstypes

collectdplugindfignoreselected

collectdplugindfmountpoints

collectdplugindfreportbydevice

collectdplugindfreportinodes

collectdplugindfreportreserved

collectdplugindfvaluesabsolute

collectdplugindfvaluespercentage

collectdplugindfinterval

collectd-disk

collectdplugindiskdisks

collectdplugindiskignoreselected

collectdplugindiskudevnameattr

Red Hat OpenStack Platform 160 Service Telemetry Framework

48

collectdplugindiskinterval

collectd-entropy

collectdpluginentropyinterval

collectd-ethstat

collectdpluginethstatinterfaces

collectdpluginethstatmaps

collectdpluginethstatmappedonly

collectdpluginethstatinterval

collectd-exec

collectdpluginexeccommands

collectdpluginexeccommands_defaults

collectdpluginexecglobals

collectdpluginexecinterval

collectd-fhcount

collectdpluginfhcountvaluesabsolute

collectdpluginfhcountvaluespercentage

collectdpluginfhcountinterval

collectd-filecount

collectdpluginfilecountdirectories

collectdpluginfilecountinterval

collectd-fscache

None

collectd-hddtemp

collectdpluginhddtemphost

collectdpluginhddtempport

collectdpluginhddtempinterval

collectd-hugepages

collectdpluginhugepagesreport_per_node_hp

APPENDIX A COLLECTD PLUG-INS

49

collectdpluginhugepagesreport_root_hp

collectdpluginhugepagesvalues_pages

collectdpluginhugepagesvalues_bytes

collectdpluginhugepagesvalues_percentage

collectdpluginhugepagesinterval

collectd-intel_rdt

collectd-interface

collectdplugininterfaceinterfaces

collectdplugininterfaceignoreselected

collectdplugininterfacereportinactive

Collectdplugininterfaceinterval

collectd-ipc

None

collectd-ipmi

collectdpluginipmiignore_selected

collectdpluginipminotify_sensor_add

collectdpluginipminotify_sensor_remove

collectdpluginipminotify_sensor_not_present

collectdpluginipmisensors

collectdpluginipmiinterval

collectd-irq

collectdpluginirqirqs

collectdpluginirqignoreselected

collectdpluginirqinterval

collectd-load

collectdpluginloadreport_relative

collectdpluginloadinterval

collectd-logfile

collectdpluginlogfilelog_level

Red Hat OpenStack Platform 160 Service Telemetry Framework

50

collectdpluginlogfilelog_level

collectdpluginlogfilelog_file

collectdpluginlogfilelog_timestamp

collectdpluginlogfileprint_severity

collectdpluginlogfileinterval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectdpluginmemcachedinstances

collectdpluginmemcachedinterval

collectd-memory

collectdpluginmemoryvaluesabsolute

collectdpluginmemoryvaluespercentage

collectdpluginmemoryinterval collectd-multimeter

collectd-multimeter

collectd-mysql

collectdpluginmysqlinterval

collectd-netlink

collectdpluginnetlinkinterfaces

collectdpluginnetlinkverboseinterfaces

collectdpluginnetlinkqdiscs

collectdpluginnetlinkclasses

collectdpluginnetlinkfilters

collectdpluginnetlinkignoreselected

collectdpluginnetlinkinterval

collectd-network

collectdpluginnetworktimetolive

collectdpluginnetworkmaxpacketsize

APPENDIX A COLLECTD PLUG-INS

51

collectdpluginnetworkforward

collectdpluginnetworkreportstats

collectdpluginnetworklisteners

collectdpluginnetworkservers

collectdpluginnetworkinterval

collectd-nfs

collectdpluginnfsinterval

collectd-ntpd

collectdpluginntpdhost

collectdpluginntpdport

collectdpluginntpdreverselookups

collectdpluginntpdincludeunitid

collectdpluginntpdinterval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectdpluginopenvpnstatusfile

collectdpluginopenvpnimprovednamingschema

collectdpluginopenvpncollectcompression

collectdpluginopenvpncollectindividualusers

collectdpluginopenvpncollectusercount

collectdpluginopenvpninterval

collectd-ovs_events

collectdpluginovs_eventsaddress

collectdpluginovs_eventsdispatch

collectdpluginovs_eventsinterfaces

collectdpluginovs_eventssend_notification

Red Hat OpenStack Platform 160 Service Telemetry Framework

52

collectdpluginovs_events$port

collectdpluginovs_eventssocket

collectd-ovs_stats

collectdpluginovs_statsaddress

collectdpluginovs_statsbridges

collectdpluginovs_statsport

collectdpluginovs_statssocket

collectd-ping

collectdpluginpinghosts

collectdpluginpingtimeout

collectdpluginpingttl

collectdpluginpingsource_address

collectdpluginpingdevice

collectdpluginpingmax_missed

collectdpluginpingsize

collectdpluginpinginterval

collectd-powerdns

collectdpluginpowerdnsinterval

collectdpluginpowerdnsservers

collectdpluginpowerdnsrecursors

collectdpluginpowerdnslocal_socket

collectdpluginpowerdnsinterval

collectd-processes

collectdpluginprocessesprocesses

collectdpluginprocessesprocess_matches

collectdpluginprocessescollect_context_switch

collectdpluginprocessescollect_file_descriptor

collectdpluginprocessescollect_memory_maps

collectdpluginpowerdnsinterval

APPENDIX A COLLECTD PLUG-INS

53

collectd-protocols

collectdpluginprotocolsignoreselected

collectdpluginprotocolsvalues

collectd-python

collectd-serial

collectd-smart

collectdpluginsmartdisks

collectdpluginsmartignoreselected

collectdpluginsmartinterval

collectd-snmp_agent

collectd-statsd

collectdpluginstatsdhost

collectdpluginstatsdport

collectdpluginstatsddeletecounters

collectdpluginstatsddeletetimers

collectdpluginstatsddeletegauges

collectdpluginstatsddeletesets

collectdpluginstatsdcountersum

collectdpluginstatsdtimerpercentile

collectdpluginstatsdtimerlower

collectdpluginstatsdtimerupper

collectdpluginstatsdtimersum

collectdpluginstatsdtimercount

collectdpluginstatsdinterval

collectd-swap

collectdpluginswapreportbydevice

collectdpluginswapreportbytes

collectdpluginswapvaluesabsolute

Red Hat OpenStack Platform 160 Service Telemetry Framework

54

collectdpluginswapvaluespercentage

collectdpluginswapreportio

collectdpluginswapinterval

collectd-syslog

collectdpluginsysloglog_level

collectdpluginsyslognotify_level

collectdpluginsysloginterval

collectd-table

collectdplugintabletables

collectdplugintableinterval

collectd-tail

collectdplugintailfiles

collectdplugintailinterval

collectd-tail_csv

collectdplugintail_csvmetrics

collectdplugintail_csvfiles

collectd-tcpconns

collectdplugintcpconnslocalports

collectdplugintcpconnsremoteports

collectdplugintcpconnslistening

collectdplugintcpconnsallportssummary

collectdplugintcpconnsinterval

collectd-ted

collectd-thermal

collectdpluginthermaldevices

collectdpluginthermalignoreselected

collectdpluginthermalinterval

collectd-threshold

APPENDIX A COLLECTD PLUG-INS

55

collectdpluginthresholdtypes

collectdpluginthresholdplugins

collectdpluginthresholdhosts

collectdpluginthresholdinterval

collectd-turbostat

collectdpluginturbostatcore_c_states

collectdpluginturbostatpackage_c_states

collectdpluginturbostatsystem_management_interrupt

collectdpluginturbostatdigital_temperature_sensor

collectdpluginturbostattcc_activation_temp

collectdpluginturbostatrunning_average_power_limit

collectdpluginturbostatlogical_core_names

collectd-unixsock

collectd-uptime

collectdpluginuptimeinterval

collectd-users

collectdpluginusersinterval

collectd-uuid

collectdpluginuuiduuid_file

collectdpluginuuidinterval

collectd-virt

collectdpluginvirtconnection

collectdpluginvirtrefresh_interval

collectdpluginvirtdomain

collectdpluginvirtblock_device

collectdpluginvirtinterface_device

collectdpluginvirtignore_selected

collectdpluginvirthostname_format

Red Hat OpenStack Platform 160 Service Telemetry Framework

56

collectdpluginvirtinterface_format

collectdpluginvirtextra_stats

collectdpluginvirtinterval

collectd-vmem

collectdpluginvmemverbose

collectdpluginvmeminterval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectdpluginwrite_graphitecarbons

collectdpluginwrite_graphitecarbon_defaults

collectdpluginwrite_graphiteglobals

collectd-write_kafka

collectdpluginwrite_kafkakafka_host

collectdpluginwrite_kafkakafka_port

collectdpluginwrite_kafkakafka_hosts

collectdpluginwrite_kafkatopics

collectd-write_log

collectdpluginwrite_logformat

collectd-zfs_arc

None

APPENDIX A COLLECTD PLUG-INS

57

  • Table of Contents
  • CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
    • 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
    • 12 INSTALLATION SIZE
      • CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
        • 21 THE CORE COMPONENTS OF STF
        • 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
          • 221 Persistent volumes
            • 2211 Using ephemeral storage
              • 222 Resource allocation
              • 223 Node tuning operator
                • 23 DEPLOYING STF TO THE OCP ENVIRONMENT
                  • 231 Deploying STF to the OCP environment with ElasticSearch
                  • 232 Deploying STF to the OCP environment without ElasticSearch
                  • 233 Creating a namespace
                  • 234 Creating an OperatorGroup
                  • 235 Enabling the OperatorHubio Community Catalog Source
                  • 236 Enabling Red Hat STF Operator Source
                  • 237 Subscribing to the AMQ Certificate Manager Operator
                  • 238 Subscribing to the Elastic Cloud on Kubernetes Operator
                  • 239 Subscribing to the Service Telemetry Operator
                  • 2310 Creating a ServiceTelemetry object in OCP
                    • 24 REMOVING STF FROM THE OCP ENVIRONMENT
                      • 241 Deleting the namespace
                      • 242 Removing the OperatorSource
                          • CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
                            • 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
                            • 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
                            • 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
                              • 331 Retrieving the AMQ Interconnect route address
                              • 332 Configuring the STF connection for the overcloud
                              • 333 Validating client-side installation
                                  • CHAPTER 4 ADVANCED FEATURES
                                    • 41 CUSTOMIZING THE DEPLOYMENT
                                      • 411 Manifest override parameters
                                      • 412 Overriding a managed manifest
                                        • Procedure
                                            • 42 ALERTS
                                              • Additional resources
                                              • 421 Creating an alert rule in Prometheus
                                                • Procedure
                                                • Additional resources
                                                  • 422 Configuring custom alerts
                                                    • Procedure
                                                    • Additional resources
                                                      • 423 Creating an alert route in Alertmanager
                                                        • Procedure
                                                        • Additional resources
                                                            • 43 HIGH AVAILABILITY
                                                              • 431 Configuring high availability
                                                                • Procedure
                                                                    • 44 DASHBOARDS
                                                                      • 441 Setting up Grafana to host the dashboard
                                                                        • 4411 Viewing and editing queries
                                                                          • 442 The Grafana infrastructure dashboard
                                                                            • 4421 Top panels
                                                                            • 4422 Networking panels
                                                                            • 4423 CPU panels
                                                                            • 4424 Memory panels
                                                                            • 4425 Diskfile system
                                                                                • 45 CONFIGURING MULTIPLE CLOUDS
                                                                                  • 451 Planning AMQP address prefixes
                                                                                  • 452 Deploying Smart Gateways
                                                                                    • Procedure
                                                                                    • 4521 Example manifests
                                                                                      • 453 Creating the OpenStack environment file
                                                                                        • Procedure
                                                                                        • Additional resources
                                                                                          • 454 Querying metrics data from multiple clouds
                                                                                            • 46 EPHEMERAL STORAGE
                                                                                              • 461 Configuring ephemeral storage
                                                                                                  • APPENDIX A COLLECTD PLUG-INS
Page 3: Red Hat OpenStack Platform 16 · CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK 3.2. DEPLOYING WITH ROUTED L3 NETWORKS 3.3. CONFIGURING RED HAT OPENSTACK PLATFORM

Legal Notice

Copyright copy 2020 Red Hat Inc

The text of and illustrations in this document are licensed by Red Hat under a Creative CommonsAttributionndashShare Alike 30 Unported license (CC-BY-SA) An explanation of CC-BY-SA isavailable athttpcreativecommonsorglicensesby-sa30 In accordance with CC-BY-SA if you distribute this document or an adaptation of it you mustprovide the URL for the original version

Red Hat as the licensor of this document waives the right to enforce and agrees not to assertSection 4d of CC-BY-SA to the fullest extent permitted by applicable law

Red Hat Red Hat Enterprise Linux the Shadowman logo the Red Hat logo JBoss OpenShiftFedora the Infinity logo and RHCE are trademarks of Red Hat Inc registered in the United Statesand other countries

Linux reg is the registered trademark of Linus Torvalds in the United States and other countries

Java reg is a registered trademark of Oracle andor its affiliates

XFS reg is a trademark of Silicon Graphics International Corp or its subsidiaries in the United Statesandor other countries

MySQL reg is a registered trademark of MySQL AB in the United States the European Union andother countries

Nodejs reg is an official trademark of Joyent Red Hat is not formally related to or endorsed by theofficial Joyent Nodejs open source or commercial project

The OpenStack reg Word Mark and OpenStack logo are either registered trademarksservice marksor trademarksservice marks of the OpenStack Foundation in the United States and othercountries and are used with the OpenStack Foundations permission We are not affiliated withendorsed or sponsored by the OpenStack Foundation or the OpenStack community

All other trademarks are the property of their respective owners

Abstract

This guide contains information about installing the core components and deploying ServiceTelemetry Framework

Table of Contents

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE12 INSTALLATION SIZE

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK21 THE CORE COMPONENTS OF STF22 PREPARING YOUR OCP ENVIRONMENT FOR STF

221 Persistent volumes2211 Using ephemeral storage

222 Resource allocation223 Node tuning operator

23 DEPLOYING STF TO THE OCP ENVIRONMENT231 Deploying STF to the OCP environment with ElasticSearch232 Deploying STF to the OCP environment without ElasticSearch233 Creating a namespace234 Creating an OperatorGroup235 Enabling the OperatorHubio Community Catalog Source236 Enabling Red Hat STF Operator Source237 Subscribing to the AMQ Certificate Manager Operator238 Subscribing to the Elastic Cloud on Kubernetes Operator239 Subscribing to the Service Telemetry Operator2310 Creating a ServiceTelemetry object in OCP

24 REMOVING STF FROM THE OCP ENVIRONMENT241 Deleting the namespace242 Removing the OperatorSource

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FORSERVICE TELEMETRY FRAMEWORK

331 Retrieving the AMQ Interconnect route address332 Configuring the STF connection for the overcloud333 Validating client-side installation

CHAPTER 4 ADVANCED FEATURES41 CUSTOMIZING THE DEPLOYMENT

411 Manifest override parameters412 Overriding a managed manifest

Procedure42 ALERTS

Additional resources421 Creating an alert rule in Prometheus

ProcedureAdditional resources

422 Configuring custom alertsProcedureAdditional resources

423 Creating an alert route in AlertmanagerProcedureAdditional resources

43 HIGH AVAILABILITY

446

8889999

1010111111

121213141415171717

191919

20202122

2626262828293030303131313131323434

Table of Contents

1

431 Configuring high availabilityProcedure

44 DASHBOARDS441 Setting up Grafana to host the dashboard

4411 Viewing and editing queries442 The Grafana infrastructure dashboard

4421 Top panels4422 Networking panels4423 CPU panels4424 Memory panels4425 Diskfile system

45 CONFIGURING MULTIPLE CLOUDS451 Planning AMQP address prefixes452 Deploying Smart Gateways

Procedure4521 Example manifests

453 Creating the OpenStack environment fileProcedureAdditional resources

454 Querying metrics data from multiple clouds46 EPHEMERAL STORAGE

461 Configuring ephemeral storage

APPENDIX A COLLECTD PLUG-INS

343435353636363738383839404141

42434445454646

47

Red Hat OpenStack Platform 160 Service Telemetry Framework

2

Table of Contents

3

CHAPTER 1 INTRODUCTION TOSERVICE TELEMETRY FRAMEWORK

Service Telemetry Framework (STF) provides automated collection of measurements and data fromremote clients - Red Hat OpenStack Platform or third-party nodes - and transmission of thatinformation to a centralized receiving Red Hat OpenShift Container Platform (OCP) deployment forstorage retrieval and monitoring The data can be either of two types

Metric

a numeric measurement of an application or system

Event

irregular and discrete occurrences that happen in a system

The collection components that are required on the clients are lightweight The multicast message busthat is shared by all clients and the deployment provides fast and reliable data transport Other modularcomponents for receiving and storing data are deployed in containers on OCP

STF provides access to monitoring functions such as alert generation visualization through dashboardsand single source of truth telemetry analysis to support orchestration

11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE

Service Telemetry Framework (STF) uses the components described in Table 11 ldquoSTF componentsrdquo

Table 11 STF components

Client Component Server(OCP)

yes An AMQP 1x compatible messaging bus to shuttle the metrics to STF forstorage in Prometheus

yes

no Smart Gateway to pick metrics and events from the AMQP 1x bus and todeliver events to ElasticSearch or to provide metrics to Prometheus

yes

no Prometheus as time-series data storage yes

no ElasticSearch as events data storage yes

yes collectd to collect infrastructure metrics and events no

yes Ceilometer to collect Red Hat OpenStack Platform metrics and events no

Figure 11 Service Telemetry Framework architecture overview

Red Hat OpenStack Platform 160 Service Telemetry Framework

4

Figure 11 Service Telemetry Framework architecture overview

NOTE

The Service Telemetry Framework data collection components collectd and Ceilometerand the transport components AMQ Interconnect and Smart Gateway are fullysupported The data storage components Prometheus and ElasticSearch including theOperator artifacts and visualization component Grafana are community-supported andare not officially supported

For metrics on the client side collectd collects high-resolution metrics collectd delivers the data toPrometheus by using the AMQP1 plugin which places the data onto the message bus On the serverside a Golang application called the Smart Gateway takes the data stream from the bus and exposes itas a local scrape endpoint for Prometheus

If you plan to collect and store events collectd or Ceilometer delivers event data to the server side byusing the AMQP1 plugin which places the data onto the message bus Another Smart Gateway writesthe data to the ElasticSearch datastore

Server-side STF monitoring infrastructure consists of the following layers

Service Telemetry Framework 1x (STF)

Red Hat OpenShift Container Platform (OCP)

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

5

Infrastructure platform

Figure 12 Server-side STF monitoring infrastructure

For more information about how to deploy Red Hat OpenShift Container Platform see the OCPproduct documentation You can install OCP on cloud platforms or on bare metal For more informationabout STF performance and scaling see httpsaccessredhatcomarticles4907241

NOTE

Do not install OCP on the same infrastructure that you want to monitor

12 INSTALLATION SIZE

The size of your Red Hat OpenShift Container Platform installation depends on the following factors

The number of nodes you want to monitor

The number of metrics you want to collect

The resolution of metrics

The length of time that you want to store the data

Installation of Service Telemetry Framework (STF) depends on the existingRed Hat OpenShift Container Platform environment Ensure that you install monitoring for

Red Hat OpenStack Platform 160 Service Telemetry Framework

6

Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platformenvironment You can install Red Hat OpenShift Container Platform (OCP) on baremetal or othersupported cloud platforms For more information about installing OCP see OpenShift ContainerPlatform 43 Documentation

The size of your OCP environment depends on the infrastructure you select For more informationabout minimum resources requirements when installing OCP on baremetal see Minimum resourcerequirements in the Installing a cluster on bare metal guide For installation requirements of the variouspublic and private cloud platforms which you can install see the corresponding installationdocumentation for your cloud platform of choice

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

7

CHAPTER 2 INSTALLING THE CORE COMPONENTS OFSERVICE TELEMETRY FRAMEWORK

Before you install Service Telemetry Framework (STF) ensure thatRed Hat OpenShift Container Platform (OCP) version 4x is running and that you understand the corecomponents of the framework As part of the OCP installation planning process ensure that theadministrator provides persistent storage and enough resources to run the STF components on top ofthe OCP environment

WARNING

Red Hat OpenShift Container Platform version 43 or later is currently required for asuccessful installation of STF

21 THE CORE COMPONENTS OF STF

The following STF core components are managed by Operators

Prometheus and AlertManager

ElasticSearch

Smart Gateway

AMQ Interconnect

Each component has a corresponding Operator that you can use to load the various applicationcomponents and objects

Additional resources

For more information about Operators see the Understanding Operators guide

22 PREPARING YOUR OCP ENVIRONMENT FOR STF

As you prepare your OCP environment for STF you must plan for persistent storage adequateresources and event storage

Ensure that persistent storage is available in your Red Hat OpenShift Container Platform clusterto permit a production grade deployment For more information see Section 221 ldquoPersistentvolumesrdquo

Ensure that enough resources are available to run the Operators and the application containersFor more information see Section 222 ldquoResource allocationrdquo

To install ElasticSearch you must use a community catalog source If you do not want to use acommunity catalog or if you do not want to store events see Section 23 ldquoDeploying STF to theOCP environmentrdquo

STF uses ElasticSearch to store events which requires a larger than normal

Red Hat OpenStack Platform 160 Service Telemetry Framework

8

vmmax_map_count The vmmax_map_count value is set by default inRed Hat OpenShift Container Platform For more information about how to edit the value of vmmax_map_count see Section 223 ldquoNode tuning operatorrdquo

221 Persistent volumes

STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus andElasticSearch can store metrics and events

Additional resources

For more information about configuring persistent storage for OCP see Understanding persistentstorage

2211 Using ephemeral storage

WARNING

You can use ephemeral storage with STF However if you use ephemeral storageyou might experience data loss if a pod is restarted updated or rescheduled ontoanother node Use ephemeral storage only for development or testing and notproduction environments

Procedure

To enable ephemeral storage for STF set storageEphemeralEnabled true in your ServiceTelemetry manifest

Additional resources

For more information about enabling ephemeral storage for STF see Section 461 ldquoConfiguringephemeral storagerdquo

222 Resource allocation

To enable the scheduling of pods within the OCP infrastructure you need resources for thecomponents that are running If you do not allocate enough resources pods remain in a Pending statebecause they cannot be scheduled

The amount of resources that you require to run STF depends on your environment and the number ofnodes and clouds that you want to monitor

Additional resources

For recommendations about sizing for metrics collection seehttpsaccessredhatcomarticles4907241

For information about sizing requirements for ElasticSearch seehttpswwwelasticcoguideencloud-on-k8scurrentk8s-managing-compute-resourceshtml

223 Node tuning operator

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

9

STF uses ElasticSearch to store events which requires a larger than normal vmmax_map_count The vmmax_map_count value is set by default in Red Hat OpenShift Container Platform

If you want to edit the value of vmmax_map_count you cannot apply node tuning manually using the sysctl command because Red Hat OpenShift Container Platform manages nodes directly To configurevalues and apply them to the infrastructure you must use the node tuning operator For moreinformation see Using the Node Tuning Operator

In an OCP deployment the default node tuning operator specification provides the required profiles forElasticSearch workloads or pods scheduled on nodes To view the default cluster node tuningspecification run the following command

The output of the default specification is documented at Default profiles set on a cluster Theassignment of profiles is managed in the recommend section where profiles are applied to a node whencertain conditions are met When scheduling ElasticSearch to a node in STF one of the following profilesis applied

openshift-control-plane-es

openshift-node-es

When scheduling an ElasticSearch pod there must be a label present that matches tunedopenshiftioelasticsearch If the label is present one of the two profiles is assigned to the podNo action is required by the administrator if you use the recommended Operator for ElasticSearch Ifyou use a custom-deployed ElasticSearch with STF ensure that you add the tunedopenshiftioelasticsearch label to all scheduled pods

Additional resources

For more information about virtual memory usage by ElasticSearch seehttpswwwelasticcoguideenelasticsearchreferencecurrentvm-max-map-counthtml

For more information about how the profiles are applied to nodes see Custom tuning specification

23 DEPLOYING STF TO THE OCP ENVIRONMENT

You can deploy STF to the OCP environment in one of two ways

Deploy STF and store events with ElasticSearch For more information see Section 231ldquoDeploying STF to the OCP environment with ElasticSearchrdquo

Deploy STF without ElasticSearch and disable events support For more information seeSection 232 ldquoDeploying STF to the OCP environment without ElasticSearchrdquo

231 Deploying STF to the OCP environment with ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 235 ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

oc get Tuneddefault -o yaml -n openshift-cluster-node-tuning-operator

Red Hat OpenStack Platform 160 Service Telemetry Framework

10

4 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

5 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

6 Section 238 ldquoSubscribing to the Elastic Cloud on Kubernetes Operatorrdquo

7 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

8 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

232 Deploying STF to the OCP environment without ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

4 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

5 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

6 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

233 Creating a namespace

Create a namespace to hold the STF components The service-telemetry namespace is usedthroughout the documentation

Procedure

Enter the following command

234 Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods

Procedure

Enter the following command

oc new-project service-telemetry

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorGroupmetadata name service-telemetry-operator-group namespace service-telemetryspec targetNamespaces - service-telemetryEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

11

Additional resources

For more information see OperatorGroups

235 Enabling the OperatorHubio Community Catalog Source

Before you install ElasticSearch you must have access to the resources on the OperatorHubioCommunity Catalog Source

Procedure

Enter the following command

236 Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform you must enable the operatorsource

Procedure

1 Install an OperatorSource that contains the Service Telemetry Operator and the SmartGateway Operator

2 To validate the creation of your OperatorSource use the oc get operatorsources command A

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind CatalogSourcemetadata name operatorhubio-operators namespace openshift-marketplacespec sourceType grpc image quayiooperator-frameworkupstream-community-operatorslatest displayName OperatorHubio Operators publisher OperatorHubioEOF

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorSourcemetadata labels opsrc-provider redhat-operators-stf name redhat-operators-stf namespace openshift-marketplacespec authorizationToken displayName Red Hat STF Operators endpoint httpsquayiocnr publisher Red Hat registryNamespace redhat-operators-stf type appregistryEOF

Red Hat OpenStack Platform 160 Service Telemetry Framework

12

2 To validate the creation of your OperatorSource use the oc get operatorsources command Asuccessful import results in the MESSAGE field returning a result of The object has been successfully reconciled

3 To validate that the Operators are available from the catalog use the oc get packagemanifestcommand

237 Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STFcomponents because the AMQ Certificate Manager Operator runs globally-scoped and is notcompatible with the dependency management of Operator Lifecycle Manager when used with othernamespace-scoped operators

Procedure

1 Subscribe to the AMQ Certificate Manager Operator create the subscription and validate theAMQ7 Certificate Manager

NOTE

The AMQ Certificate Manager is installed globally for all namespaces so the namespace value provided is openshift-operators You might not see your amq7-cert-managerv100 ClusterServiceVersion in the service-telemetrynamespace for a few minutes until the processing executes against thenamespace

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME TYPE ENDPOINT REGISTRY DISPLAYNAME PUBLISHER STATUS MESSAGEredhat-operators-stf appregistry httpsquayiocnr redhat-operators-stf Red Hat STF Operators Red Hat Succeeded The object has been successfully reconciled

$ oc get packagemanifests | grep Red Hat STF

smartgateway-operator Red Hat STF Operators 2m50sservicetelemetry-operator Red Hat STF Operators 2m50s

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name amq7-cert-manager namespace openshift-operatorsspec channel alpha installPlanApproval Automatic name amq7-cert-manager source redhat-operators sourceNamespace openshift-marketplaceEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

13

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-managerv100 has a phase Succeeded

238 Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator and if you plan to store events in ElasticSearch youmust enable the Elastic Cloud Kubernetes Operator

Procedure

1 Apply the following manifest to your OCP environment to enable the Elastic Cloud onKubernetes Operator

2 To verify that the ClusterServiceVersion for ElasticSearch Cloud on Kubernetes succeededenter the oc get csv command

239 Subscribing to the Service Telemetry Operator

To instantiate an STF instance create the ServiceTelemetry object to allow the Service TelemetryOperator to create the environment

Procedure

1 To create the Service Telemetry Operator subscription enter the oc apply -f command

$ oc get --namespace openshift-operators csv

NAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name elastic-cloud-eck namespace service-telemetryspec channel stable installPlanApproval Automatic name elastic-cloud-eck source operatorhubio-operators sourceNamespace openshift-marketplaceEOF

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1

Red Hat OpenStack Platform 160 Service Telemetry Framework

14

2 To validate the Service Telemetry Operator and the dependent operators enter the followingcommand

2310 Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework you must create an instance of ServiceTelemetry in OCPBy default eventsEnabled is set to false If you do not want to store events in ElasticSearch ensurethat eventsEnabled is set to false For more information see Section 232 ldquoDeploying STF to the OCPenvironment without ElasticSearchrdquo

The following core parameters are available for a ServiceTelemetry manifest

Table 21 Core parameters for a ServiceTelemetry manifest

Parameter Description Default Value

eventsEnabled Enable events support in STFRequires prerequisite steps toensure ElasticSearch can bestarted For more information seeSection 238 ldquoSubscribing to theElastic Cloud on KubernetesOperatorrdquo

false

metricsEnabled Enable metrics support in STF true

kind Subscriptionmetadata name servicetelemetry-operator namespace service-telemetryspec channel stable installPlanApproval Automatic name servicetelemetry-operator source redhat-operators-stf sourceNamespace openshift-marketplaceEOF

$ oc get csv --namespace service-telemetryNAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeededamq7-interconnect-operatorv120 Red Hat Integration - AMQ Interconnect 120 Succeededelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeededprometheusoperator0370 Prometheus Operator 0370 prometheusoperator0320 Succeededservice-telemetry-operatorv102 Service Telemetry Operator 102 service-telemetry-operatorv101 Succeededsmart-gateway-operatorv101 Smart Gateway Operator 101 smart-gateway-operatorv100 Succeeded

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

15

highAvailabilityEnabled Enable high availability in STF Formore information see Section 43ldquoHigh availabilityrdquo

false

storageEphemeralEnabled Enable ephemeral storagesupport in STF For moreinformation see Section 46ldquoEphemeral storagerdquo

false

Parameter Description Default Value

Procedure

1 To store events in ElasticSearch set eventsEnabled to true during deployment

2 To view the STF deployment logs in the Service Telemetry Operator use the oc logscommand

PLAY RECAP localhost ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3 View the pods and the status of each pod to determine that all workloads are operatingnominally

NOTE

If you set eventsEnabled true the notification Smart Gateways will Error andCrashLoopBackOff for a period of time before ElasticSearch starts

oc apply -f - ltltEOFapiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec eventsEnabled true metricsEnabled trueEOF

oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

$ oc get pods

NAME READY STATUS RESTARTS AGEalertmanager-stf-default-0 22 Running 0 26melastic-operator-645dc8b8ff-jwnzt 11 Running 0 88melasticsearch-es-default-0 11 Running 0 26minterconnect-operator-6fd49d9fb9-4bl92 11 Running 0

Red Hat OpenStack Platform 160 Service Telemetry Framework

16

24 REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality

Complete the following tasks

1 Section 241 ldquoDeleting the namespacerdquo

2 Section 242 ldquoRemoving the OperatorSourcerdquo

241 Deleting the namespace

To remove the operational resources for STF from OCP delete the namespace

Procedure

1 Run the oc delete command

2 Verify that the resources have been deleted from the namespace

242 Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again delete the OperatorSourceWhen you remove the OperatorSource PackageManifests related to STF are removed from theOperator Lifecycle Manager catalog

Procedure

1 Delete the OperatorSource

46mprometheus-operator-bf7d97fb9-kwnlx 11 Running 0 46mprometheus-stf-default-0 33 Running 0 26mservice-telemetry-operator-54f4c99d9b-k7ll6 22 Running 0 46msmart-gateway-operator-7ff58bcf94-66rvx 22 Running 0 46mstf-default-ceilometer-notification-smartgateway-6675df547q4lbj 11 Running 0 26mstf-default-collectd-notification-smartgateway-698c87fbb7-xj528 11 Running 0 26mstf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn 11 Running 0 26mstf-default-interconnect-7458fd4d69-nqbfs 11 Running 0 26m

oc delete project service-telemetry

$ oc get allNo resources found

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

17

2 Verify that the STF PackageManifests are removed from the platform If successful thefollowing command returns no result

3 If you enabled the OperatorHubio Community Catalog Source during the installationprocess and you no longer need this catalog source delete it

Additional resources

For more information about the OperatorHubio Community Catalog Source see Section 23ldquoDeploying STF to the OCP environmentrdquo

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stfoperatorsourceoperatorscoreoscom redhat-operators-stf deleted

$ oc get packagemanifests | grep Red Hat STF

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operatorscatalogsourceoperatorscoreoscom operatorhubio-operators deleted

Red Hat OpenStack Platform 160 Service Telemetry Framework

18

CHAPTER 3 COMPLETING THESERVICE TELEMETRY FRAMEWORK CONFIGURATION

31 CONNECTING RED HAT OPENSTACK PLATFORM TOSERVICE TELEMETRY FRAMEWORK

To collect metrics events or both and to send them to the Service Telemetry Framework (STF)storage domain you must configure the Red Hat OpenStack Platform overcloud to enable datacollection and transport

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes thatemploy routed L3 domains such as distributed compute node (DCN) or spine-leaf see Section 32ldquoDeploying to non-standard network topologiesrdquo

32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network you must makeconfiguration adjustments so that AMQ Interconnect can transport data to theService Telemetry Framework (STF) server instance This scenario is typical in a spine-leaf or aDCN topology For more information about DCN configuration see the Spine Leaf Networkingguide

If you use STF with Red Hat OpenStack Platform 160 and plan to monitor your Ceph Block orObject storage nodes you must make configuration changes that are similar to the configurationchanges that you make to the spine-leaf and DCN network configuration To monitor Ceph nodesuse the CephStorageExtraConfig parameter to define which network interface to load into theAMQ Interconnect and collectd configuration files

Similarly you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters ifyour environment uses Block and Object storage roles

The deployment of a spine-leaf topology involves creating roles and networks then assigningthose networks to the available roles When you configure data collection and transport for STF foran Red Hat OpenStack Platform deployment the default network for roles is InternalApi For CephBlock and Object storage roles the default network is Storage Because a spine-leaf configurationcan result in different networks being assigned to different Leaf groupings and those names aretypically unique additional configuration is required in the parameter_defaults section of theRed Hat OpenStack Platform environment files

Procedure

1 Document which networks are available for each of the Leaf roles For examples of networkname definitions see Creating a network data file in the Spine Leaf Networking guide Formore information about the creation of the Leaf groupings (roles) and assignment of thenetworks to those groupings see Creating a roles data file in the Spine Leaf Networkingguide

CephStorageExtraConfig tripleoprofilebasemetricscollectdamqp_host hiera(storage) tripleoprofilebasemetricsqdrlistener_addr hiera(storage) tripleoprofilebaseceilometeragentnotificationnotifier_host_addr hiera(storage)

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

19

2 Add the following configuration example to the ExtraConfig section for each of the leafroles In this example internal_api_subnet is the value defined in the name_lowerparameter of your network definition (with _subnet appended to the name for Leaf 0) andis the network to which the ComputeLeaf0 leaf role is connected In this case the networkidentification of 0 corresponds to the Compute role for leaf 0 and represents a value thatis different from the default internal API network nameFor the ComputeLeaf0 leaf role specify extra configuration to perform a hiera lookup todetermine which network interface for a particular network to assign to the collectd AMQPhost parameter Perform the same configuration for the AMQ Interconnect listeneraddress parameter

Additional leaf roles typically replace _subnet with _leafN where N represents a uniqueindentifier for the leaf

This example configuration is on a CephStorage leaf role

33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUDFOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud you must configure the data collectionapplications and the data transport to STF and deploy the overcloud

To configure the Red Hat OpenStack Platform overcloud complete the following tasks

1 Section 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

2 Section 332 ldquoConfiguring the STF connection for the overcloudrdquo

3 Section 333 ldquoValidating client-side installationrdquo

331 Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF you must provide theAMQ Interconnect route address in the STF connection file

Procedure

1 Log in to your Red Hat OpenShift Container Platform (OCP) environment

2 In the service-telemetry project retrieve the AMQ Interconnect route address

ComputeLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_subnet)

ComputeLeaf1ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_leaf1)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_leaf1)

CephStorageLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(storage_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(storage_subnet)

Red Hat OpenStack Platform 160 Service Telemetry Framework

20

NOTE

If your STF installation differs from the documentation ensure that youretrieve the correct AMQ Interconnect route address

332 Configuring the STF connection for the overcloud

To configure the STF connection you must create a file that contains the connection configurationof the AMQ Interconnect for the overcloud to the STF deployment Enable the collection of eventsand storage of the events in STF and deploy the overcloud

Procedure

1 Log in to the Red Hat OpenStack Platform undercloud as the stack user

2 Create a configuration file called stf-connectorsyaml in the homestack directory

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all dataingestion and data storage components for single cloud deployments Toshare the data storage domain with multiple clouds see Section 45ldquoConfiguring multiple cloudsrdquo

3 In the stf-connectorsyaml file configure the MetricsQdrConnectors address to connectthe AMQ Interconnect on the overcloud to the STF deployment

Add the CeilometerQdrPublishEvents true parameter to enable collection andtransport of Ceilometer events to STF

Replace the host parameter with the value of HOSTPORT that you retrieved inSection 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

4 Add the following files to your Red Hat OpenStack Platform director deployment to setupcollectd and AMQ Interconnect

the stf-connectorsyaml environment file

the enable-stfyaml file that ensures that the environment is being used during theovercloud deployment

$ oc get routes -ogo-template= range items printf sn spechost end | grep -5671stf-default-interconnect-5671-service-telemetryappsinfrawatch

parameter_defaults EventPipelinePublishers [] CeilometerQdrPublishEvents true MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch port 443 role edge sslProfile sslProfile verifyHostname false

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

21

the ceilometer-write-qdryaml file that ensures that Ceilometer telemetry is sent to STF

5 Deploy the Red Hat OpenStack Platform overcloud

333 Validating client-side installation

To validate data collection from the STF storage domain query the data sources for delivereddata To validate individual nodes in the Red Hat OpenStack Platform deployment connect to theconsole using SSH

Procedure

1 Log in to an overcloud node for example controller-0

2 Ensure that metrics_qdr container is running on the node

3 Return the internal network address on which AMQ Interconnect is running for example 17217144 listening on port 5666

4 Return a list of connections to the local AMQ Interconnect

openstack overcloud deploy ltother argumentsgt --templates usrshareopenstack-tripleo-heat-templates --environment-file ltother-environment-filesgt --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsmetricsceilometer-write-qdryaml --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsenable-stfyaml --environment-file homestackstf-connectorsyaml

$ sudo podman container inspect --format StateStatus metrics_qdr

running

$ sudo podman exec -it metrics_qdr cat etcqpid-dispatchqdrouterdconf

listener host 17217144 port 5666 authenticatePeer no saslMechanisms ANONYMOUS

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --connections

Connections id host container role dir security authentication tenant ============================================================================================================================================================================================================================================================================================ 1 stf-default-interconnect-5671-service-telemetryappsinfrawatch443 stf-default-

Red Hat OpenStack Platform 160 Service Telemetry Framework

22

There are four connections

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat clientThe outbound STF connection is provided to the MetricsQdrConnectors hostparameter and is the route for the STF storage domain The other hosts are internalnetwork addresses of the client connections to this AMQ Interconnect

5 To ensure that messages are being delivered list the links and view the _edge address inthe deliv column for delivery of messages

interconnect-7458fd4d69-bgzfb edge out TLSv12(DHE-RSA-AES256-GCM-SHA384) anonymous-user 12 1721714460290 openstackorgomcontainercontroller-0ceilometer-agent-notification255c02cee550f143ec9ea030db5cccba14 normal in no-security no-auth 16 1721714436408 metrics normal in no-security anonymous-user 899 1721714439500 10a2e99d-1b8a-4329-b48c-4335e5f75c84 normal in no-security no-auth

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --linksRouter Links type dir conn id id peer class addr phs cap pri undel unsett deliv presett psdrop acc rej rel mod delay rate =========================================================================================================================================================== endpoint out 1 5 local _edge 250 0 0 0 2979926 2979924 0 0 0 2 0 0 0 endpoint in 1 6 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 7 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 8 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 9 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 10 250 0 0 0 911 911 0 0 0 0 0 911 0 endpoint in 1 11 250 0 0 0 0 911 0 0 0 0 0 0 0 endpoint out 12 32 local templSY6Mcicol4J2Kp 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 16 41 250 0 0 0 2979924 2979924 0 0 0 0 0 0 0 endpoint in 912 1834 mobile $management 0 250 0 0 0 1

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

23

6 To list the addresses from Red Hat OpenStack Platform nodes to STF connect to OCP toget the AMQ Interconnect pod name and list the connections List the availableAMQ Interconnect pods

7 Connect to the pod and run the qdstat --connections command to list the knownconnections

In this example there are three edge connections from the Red Hat OpenStack Platformnodes with connection id 22 23 and 24

8 To view the number of messages delivered by the network use each address with the oc exec command

0 0 1 0 0 0 0 0 endpoint out 912 1835 local temp9Ok2resI9tmt+CT 250 0 0 0 0 0 0 0 0 0 0 0 0

$ oc get pods -l application=stf-default-interconnect

NAME READY STATUS RESTARTS AGEstf-default-interconnect-7458fd4d69-bgzfb 11 Running 0 6d21h

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 182547243852 UTCstf-default-interconnect-7458fd4d69-bgzfb

Connections id host container role dir security authentication tenant last dlv uptime ====================================================================================================================================================================================================== 1 1012922143062 rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv] normal in no-security anonymous-user 000000000 006215025 2 1013005255754 rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5] normal in no-security anonymous-user 000212557 006214936 3 1013005143110 rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6] normal in no-security anonymous-user 000213653 006214909 22 101280151948 Routerceph-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 23 101280151950 Routercompute-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 24 101280152082 Routercontroller-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000000 000220834 27 12700142202 c2f541c1-4c97-4b37-a189-a396c08fb079 normal in no-security no-auth 000000000 000000000

Red Hat OpenStack Platform 160 Service Telemetry Framework

24

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 182010293258 UTCstf-default-interconnect-7458fd4d69-bgzfb

Router Addresses class addr phs distrib pri local remote in out thru fallback ==================================================================================================================== mobile anycastceilometereventsample 0 balanced - 1 0 1553 1553 0 0 mobile collectdnotify 0 multicast - 1 0 10 10 0 0 mobile collectdtelemetry 0 multicast - 1 0 7798049 7798049 0 0

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

25

CHAPTER 4 ADVANCED FEATURESThe following optional features can provide additional functionality to theService Telemetry Framework (STF)

Customizing the deployment For more information see Section 41 ldquoCustomizing thedeploymentrdquo

Alerts For more information see Section 42 ldquoAlertsrdquo

High availability For more information see Section 43 ldquoHigh availabilityrdquo

Dashboards For more information see Section 44 ldquoDashboardsrdquo

Multiple clouds For more information see Section 45 ldquoConfiguring multiple cloudsrdquo

Ephemeral storage For more information see Section 46 ldquoEphemeral storagerdquo

41 CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load intoRed Hat OpenShift Container Platform (OCP) The Operator then creates other objects in memorywhich results in the dependent Operators creating the workloads they are responsible formanaging

WARNING

When you override the manifest you must provide the entire manifest contentsincluding object names or namespaces There is no dynamic parametersubstitution when you override a manifest

To override a manifest successfully with Service Telemetry Framework (STF) deploy a defaultenvironment using the core options only For more information about the core options seeSection 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo When you deploy STF use the oc getcommand to retrieve the default deployed manifest When you use a manifest that was originallygenerated by Service Telemetry Operator the manifest is compatible with the other objects thatare managed by the Operators

For example when the metricsEnabled true parameter is configured in the ServiceTelemetrymanifest the Service Telemetry Operator requests components for metrics retrieval and storageusing the default manifests In some cases you might want to override the default manifest Formore information see Section 411 ldquoManifest override parametersrdquo

411 Manifest override parameters

This table describes the available parameters that you can use to override a manifest along withthe corresponding retrieval commands

Table 41 Manifest override parameters

Red Hat OpenStack Platform 160 Service Telemetry Framework

26

Override parameter Description Retrieval command

alertmanagerManifest Override the Alertmanagerobject creation The PrometheusOperator watches for Alertmanager objects

oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest Override the Secret thatcontains the Alertmanagerconfiguration The PrometheusOperator uses a secret named alertmanager- alertmanager-name forexample stf-default to providethe alertmanageryamlconfiguration to Alertmanager

oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest Override the ElasticSearchobject creation The Elastic Cloudon Kuberneters Operator watchesfor ElasticSearch objects

oc get elasticsearch elasticsearch -oyaml

interconnectManifest Override the Interconnectobject creation The AMQInterconnect Operator watchesfor Interconnect objects

oc get interconnect stf-default-interconnect -oyaml

prometheusManifest Override the Prometheusobject creation The PrometheusOperator watches for Prometheus objects

oc get prometheus stf-default -oyaml

servicemonitorManifest Override the ServiceMonitorobject creation The PrometheusOperator watches for ServiceMonitor objects

oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest

Override the SmartGatewayobject creation for collectdmetrics The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest

Override the SmartGatewayobject creation for collectdevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-notification -oyaml

CHAPTER 4 ADVANCED FEATURES

27

smartgatewayCeilometerEventsManifest

Override the SmartGatewayobject creation for Ceilometerevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-ceilometer-notification -oyaml

Override parameter Description Retrieval command

412 Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest For a list of availablemanifest override parameters see Section 41 ldquoCustomizing the deploymentrdquo The default ServiceTelemetry object is stf-default Use oc get servicetelemetry to list the available STFdeployments

TIP

The oc edit command loads the default system editor To override the default editor pass or setthe environment variable EDITOR to the preferred editor For example EDITOR=nano oc edit servicetelemetry stf-default

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Load the ServiceTelemetry object into an editor

4 To modify the ServiceTelemetry object provide a manifest override parameter and thecontents of the manifest to write to OCP instead of the defaults provided by STF

NOTE

The trailing pipe (|) after entering the manifest override parameter indicatesthat the value provided is multi-line

oc project service-telemetry

oc edit servicetelemetry stf-default

$ oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata annotations kubectlkubernetesiolast-applied-configuration | apiVersioninfrawatchv1alpha1kindServiceTelemetrymetadataannotationsnamestf-defaultnamespaceservice-telemetryspecmetricsEnabledtrue creationTimestamp 2020-04-14T202942Z

Red Hat OpenStack Platform 160 Service Telemetry Framework

28

1

2

Manifest override parameter is defined in the spec of the ServiceTelemetry object

End of the manifest override content

5 Save and close

42 ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager Alert rules in Prometheusservers send alerts to an Alertmanager which manages the alerts Alertmanager can silenceinhibit or aggregate alerts and send notifications using email on-call notification systems or chatplatforms

To create an alert complete the following tasks

1 Create an alert rule in Prometheus For more information see Section 421 ldquoCreating analert rule in Prometheusrdquo

generation 1 name stf-default namespace service-telemetry resourceVersion 1949423 selfLink apisinfrawatchv1alpha1namespacesservice-telemetryservicetelemetrysstf-default uid d058bc41-1bb0-49f5-9a8b-642f4b8adb95spec metricsEnabled true smartgatewayCollectdMetricsManifest | 1 apiVersion smartgatewayinfrawatchv2alpha1 kind SmartGateway metadata name stf-default-collectd-telemetry namespace service-telemetry spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdtelemetry debug true prefetch 15000 serviceType metrics size 1 useTimestamp true 2status conditions - ansibleResult changed 0 completion 2020-04-14T203219079508 failures 0 ok 52 skipped 1 lastTransitionTime 2020-04-14T202959Z message Awaiting next reconciliation reason Successful status True type Running

CHAPTER 4 ADVANCED FEATURES

29

2 Create an alert route in Alertmanager For more information see Section 423 ldquoCreating analert route in Alertmanagerrdquo

Additional resourcesFor more information about alerts or notifications with Prometheus and Alertmanager seehttpsprometheusiodocsalertingoverview

To view an example set of alerts that you can use with Service Telemetry Framework (STF) seehttpsgithubcominfrawatchservice-telemetry-operatortreemasterdeployalerts

421 Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications If the rule condition returns an emptyresult set the condition is false Otherwise the rule is true and it triggers an alert

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Create a PrometheusRule object that contains the alert rule The Prometheus Operatorloads the rule into Prometheus

4 To verify that the rules have been loaded into Prometheus by the Operator create a podwith access to curl

5 Run curl to access the prometheus-operated service to return the rules loaded intomemory

oc project service-telemetry

oc apply -f - ltltEOFapiVersion monitoringcoreoscomv1kind PrometheusRulemetadata creationTimestamp null labels prometheus stf-default role alert-rules name prometheus-alarm-rules namespace service-telemetryspec groups - name openstackrules rules - alert Metric Listener down expr collectd_qpid_router_status lt 1 To change the rule edit the value of the expr parameterEOF

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl prometheus-operated9090apiv1rulesstatussuccessdatagroups

Red Hat OpenStack Platform 160 Service Telemetry Framework

30

6 To verify that the output shows the rules loaded into the PrometheusRule object forexample the output contains the defined openstackrules exit from the pod

7 Clean up the environment by deleting the curl pod

Additional resourcesFor more information on alerting see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

422 Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 421ldquoCreating an alert rule in Prometheusrdquo

Procedure

1 Use the oc edit command

2 Edit the PrometheusRules manifest

3 Save and close

Additional resourcesFor more information about configuring alerting rules seehttpsprometheusiodocsprometheuslatestconfigurationalerting_rules

For more information about PrometheusRules objects seehttpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

423 Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system such as email IRC or other notificationchannel The Prometheus Operator manages the Alertmanager configuration as anRed Hat OpenShift Container Platform (OCP) secret STF by default deploys a basic configurationthat results in no receivers

[nameopenstackrulesfileetcprometheusrulesprometheus-stf-default-rulefiles-0service-telemetry-prometheus-alarm-rulesyamlrules[nameMetric Listener downquerycollectd_qpid_router_status u003c 1duration0labelsannotationsalerts[]healthoktypealerting]interval30]

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

oc edit prometheusrules prometheus-alarm-rules

alertmanageryaml |- global resolve_timeout 5m route

CHAPTER 4 ADVANCED FEATURES

31

To deploy a custom Alertmanager route with STF an alertmanagerConfigManifest parameter mustbe passed to the Service Telemetry Operator that results in an updated secret managed by thePrometheus Operator For more information see Section 412 ldquoOverriding a managed manifestrdquo

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object for your STF deployment

4 Add a new parameter alertmanagerConfigManifest and the Secret object contents todefine the alertmanageryaml configuration for Alertmanager

NOTE

This loads the default template that is already managed by ServiceTelemetry Operator To validate the changes are populating correctlychange a value return the alertmanager-stf-default secret and verify thatthe new value is loaded into memory for example changing the value globalresolve_timeout from 5m to 10m

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

oc project service-telemetry

oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec metricsEnabled true alertmanagerConfigManifest | apiVersion v1 kind Secret metadata name alertmanager-stf-default namespace service-telemetry type Opaque stringData alertmanageryaml |- global resolve_timeout 10m route

Red Hat OpenStack Platform 160 Service Telemetry Framework

32

5 Verify that the configuration was applied to the secret

6 To verify the configuration has been loaded into Alertmanager create a pod with access to curl

7 Run curl against the alertmanager-operated service to retrieve the status and configYAMLcontents and review the supplied configuration matches the configuration loaded intoAlertmanager

8 Verify that the configYAML field contains the expected changes Exit from the pod

9 To clean up the environment delete the curl pod

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

$ oc get secret alertmanager-stf-default -o go-template=index data alertmanageryaml | base64decode

global resolve_timeout 10mroute group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver nullreceivers- name null

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl alertmanager-operated9093apiv1status

statussuccessdataconfigYAMLglobaln resolve_timeout 10mn http_config n smtp_hello localhostn smtp_require_tls truen pagerduty_url httpseventspagerdutycomv2enqueuen hipchat_api_url httpsapihipchatcomn opsgenie_api_url httpsapiopsgeniecomn wechat_api_url httpsqyapiweixinqqcomcgi-binn victorops_api_url httpsalertvictoropscomintegrationsgeneric20131114alertnrouten receiver nulln group_byn - jobn group_wait 30sn group_interval 5mn repeat_interval 12hnreceiversn- name nullntemplates []n

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

CHAPTER 4 ADVANCED FEATURES

33

Additional resourcesFor more information about the Red Hat OpenShift Container Platform secret and the Prometheusoperator see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

43 HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover fromfailures in its component services Although Red Hat OpenShift Container Platform (OCP) restartsa failed pod if nodes are available to schedule the workload this recovery process might take morethan one minute during which time events and metrics are lost A high availability configurationincludes multiple copies of STF components reducing recovery time to approximately 2 secondsTo protect against failure of an OCP node deploy STF to an OCP cluster with three or more nodes

NOTE

STF is not yet a fully fault tolerant system Delivery of metrics and events during therecovery period is not guaranteed

Enabling high availability has the following effects

Two AMQ Interconnect pods run instead of the default 1

Three ElasticSearch pods run instead of the default 1

Recovery time from a lost pod in either of these services reduces to approximately 2seconds

431 Configuring high availability

To configure STF for high availability add highAvailabilityEnabled true to the ServiceTelemetryobject in OCP You can this set this parameter at installation time or if you already deployed STFcomplete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Use the oc command to edit the ServiceTelemetry object

4 Add highAvailabilityEnabled true to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry

spec eventsEnabled true metricsEnabled true highAvailabilityEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

34

44 DASHBOARDS

Use third-party application Grafana to visualize system-level metrics gathered by collectd for eachindividual host node For more information about configuring collectd see Section 33ldquoConfiguring Red Hat OpenStack Platform overcloud for Service Telemetry Frameworkrdquo

441 Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment so you mustdeploy the Grafana Operator from OperatorHubio

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Clone the dashboard repository

4 Deploy the Grafana operator

5 To verify that the operator launched successfully run the oc get csv command If the valueof the PHASE column is Succeeded the operator launched successfully

6 Launch a Grafana instance

7 Verify that the Grafana instance deployed

8 Create the datasource and dashboard resources

oc project service-telemetry

git clone httpsgithubcominfrawatchdashboardscd dashboards

oc create -f deploysubscriptionyaml

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEgrafana-operatorv320 Grafana Operator 320 Succeeded

$ oc create -f deploygrafanayaml

$ oc get pod -l app=grafana

NAME READY STATUS RESTARTS AGEgrafana-deployment-7fc7848b56-sbkhv 11 Running 0 1m

CHAPTER 4 ADVANCED FEATURES

35

9 Verify that the resources installed correctly

10 Navigate to httpsltgrafana-route-addressgt in a web browser Use the oc get routescommand to retrieve the Grafana route address

11 To view the dashboard click Dashboards and Manage

Additional resources

For more information about enabling the OperatorHubio catalog source see Section 235ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

4411 Viewing and editing queries

Procedure

1 Log in to Red Hat OpenShift Container Platform To view and edit queries log in as the admin user

2 Change to the service-telemetry namespace

3 To retrieve the default username and password describe the Grafana object using oc describe

442 The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time Select a node from theupper left corner of the dashboard

4421 Top panels

Title Unit Description

oc create -f deploydatasourceyaml -f deployrhos-dashboardyaml

$ oc get grafanadashboards

NAME AGErhos-dashboard 7d21h

$ oc get grafanadatasourcesNAME AGEservice-telemetry-grafanadatasource 1m

oc get routes

oc project service-telemetry

oc describe grafana service-telemetry-grafana

Red Hat OpenStack Platform 160 Service Telemetry Framework

36

Current Global Alerts - Current alerts fired byPrometheus

Recent Global Alerts - Recently fired alerts in 5m timesteps

Status Panel - Node status up downunavailable

Uptime smhdMY Total operational time of node

CPU Cores cores Total number of cores

Memory bytes Total memory

Disk Size bytes Total storage size

Processes processes Total number of processes listedby type

Load Average processes Load average represents theaverage number of running anduninterruptible processes residingin the kernel execution queue

4422 Networking panels

Panels that display the network interfaces of the node

Panel Unit Description

Physical Interfaces Ingress Errors errors Total errors with incoming data

Physical Interfaces Egress Errors errors Total errors with outgoing data

Physical Interfaces Ingress ErrorRates

errorss Rate of incoming data errors

Physical Interfaces egress ErrorRates

errorss Rate of outgoing data errors

Physical Interfaces PacketsIngress pps Incoming packets persecond

Physical Interfaces PacketsEgress

pps

Outgoing packets per second Physical Interfaces Data Ingress bytess

CHAPTER 4 ADVANCED FEATURES

37

Incoming data rates Physical Interfaces Data Egress bytess

Outgoing data rates Physical Interfaces Drop RateIngress

pps

Incoming packets drop rate Physical Interfaces Drop RateEgress

pps

4423 CPU panels

Panels that display CPU usage of the node

Panel Unit Description

Current CPU Usage percent Instantaneous usage at the timeof the last query

Aggregate CPU Usage percent Average non-idle CPU activity ofall cores on a node

Aggr CPU Usage by Type percent Shows time spent for each type ofthread averaged across all cores

4424 Memory panels

Panels that display memory usage on the node

Panel Unit Description

Memory Used percent Amount of memory being used attime of last query

Huge Pages Used hugepages Number of hugepages beingused

Memory

4425 Diskfile system

Panels that display space used on disk

Panel Unit Description Notes

Disk Space Usage percent Total disk use at time oflast query

Red Hat OpenStack Platform 160 Service Telemetry Framework

38

Inode Usage percent Total inode use at timeof last query

Aggregate Disk SpaceUsage

bytes Total disk space usedand reserved

Because this query relieson the df plugintemporary file systemsthat do not necessarilyuse disk space areincluded in the resultsThe query tries to filterout most of these but itmight not be exhaustive

Disk Traffic bytess Shows rates for bothreading and writing

Disk Load percent Approximatepercentage of total diskbandwidth being usedThe weighted IO timeseries includes thebacklog that might beaccumulating For moreinformation see thecollectd disk plugin docs

Operationss opss Operations done persecond

Average IO OperationTime

seconds Average time each IOoperation took tocomplete This averageis not accurate see thecollectd disk plugin docs

Panel Unit Description Notes

45 CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance ofService Telemetry Framework (STF)

1 Plan the AMQP address prefixes that you want to use for each cloud For more informationsee Section 451 ldquoPlanning AMQP address prefixesrdquo

2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on thecorresponding address prefixes For more information see Section 452 ldquoDeploying SmartGatewaysrdquo

3 Configure each cloud to send its metrics and events to STF on the correct address Formore information see Section 453 ldquoCreating the OpenStack environment filerdquo

CHAPTER 4 ADVANCED FEATURES

39

Figure 41 Two Red Hat OpenStack Platform clouds connect to STF

451 Planning AMQP address prefixes

By default Red Hat OpenStack Platform nodes get data through two data collectors collectd andCeilometer These components send telemetry data or notifications to the respective AMQPaddresses for example collectdtelemetry where STF Smart Gateways listen on those addressesfor monitoring data

To support multiple clouds and to identify which cloud generated the monitoring data configureeach cloud to send data to a unique address Prefix a cloud identifier to the second part of theaddress The following list shows some example addresses and identifiers

collectdcloud1-telemetry

collectdcloud1-notify

anycastceilometercloud1-eventsample

collectdcloud2-telemetry

collectdcloud2-notify

Red Hat OpenStack Platform 160 Service Telemetry Framework

40

anycastceilometercloud2-eventsample

collectdus-east-1-telemetry

collectdus-west-3-telemetry

452 Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud one forcollectd metrics one for collectd events and one for Ceilometer events Configure each of theSmart Gateways to listen on the AMQP address that you define for the corresponding cloud

When you deploy STF for the first time Smart Gateway manifests are created that define the initialSmart Gateways for a single cloud When deploying Smart Gateways for multiple cloud supportyou deploy multiple Smart Gateways for each of the data collection types that handle the metricsand the events data for each cloud The initial Smart Gateways act as a template to createadditional Smart Gateways along with any authentication information required to connect to thedata stores

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

oc project service-telemetry

3 Use the initially deployed Smart Gateways as a template for additional Smart GatewaysList the currently deployed Smart Gateways with the oc get smartgateways command Forexample if you deployed STF with metricsEnabled true and eventsEnabled true thefollowing Smart Gateways are displayed in the output

$ oc get smartgateways

NAME AGEstf-default-ceilometer-notification 14dstf-default-collectd-notification 14dstf-default-collectd-telemetry 14d

4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary filewhich you can modify later and use to create the new set of Smart Gateways

truncate --size 0 tmpcloud1-smartgatewaysyaml ampamp for sg in $(oc get smartgateways -oname)do echo --- gtgt tmpcloud1-smartgatewaysyaml oc get $sg -oyaml --export gtgt tmpcloud1-smartgatewaysyamldone

5 Modify the Smart Gateway manifest in the tmpcloud1-smartgatewaysyaml file Adjustthe metadataname and specamqpUrl fields to include the cloud identifier from yourschema For more information see ] To view example Smart Gateway manifests seeltltexample-manifests_advanced-features[

6 Deploy your new Smart Gateways

CHAPTER 4 ADVANCED FEATURES

41

oc apply -f tmpcloud1-smartgatewaysyaml

7 Verify that each Smart Gateway is running This can take several minutes depending on thenumber of Smart Gateways

oc get po -l app=smart-gateway

4521 Example manifests

IMPORTANT

The content in the following examples might be different to the file content in yourdeployment Copy the manifests in your deployment

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that youwant to use for your clouds For more information see Section 451 ldquoPlanning AMQP addressprefixesrdquo

NOTE

Your output may have some additional metadata parameters that you can removefrom the manifests you that load into OCP

apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-ceilometer-notification-cloud1 1spec amqpDataSource ceilometer amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672anycastceilometercloud1-eventsample 2 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-notification-cloud1 3spec amqpDataSource collectd amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-

Red Hat OpenStack Platform 160 Service Telemetry Framework

42

1

2

3

4

5

6

Name for Ceilometer notifications for cloud1

AMQP Address for Ceilometer notifications for cloud1

Name for collectd telemetry for cloud1

AMQP Address for collectd telemetry for cloud1

Name for collectd notifications for cloud1

AMQP Address for collectd notifications for cloud1

453 Creating the OpenStack environment file

To label traffic according to the cloud of origin you must create a configuration with cloud-specificinstance names Create an stf-connectorsyaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefixscheme For more information see Section 451 ldquoPlanning AMQP address prefixesrdquo

notify 4 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-telemetry-cloud1 5spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-telemetry 6 debug false prefetch 15000 serviceType metrics size 1 useTimestamp true

CHAPTER 4 ADVANCED FEATURES

43

WARNING

Remove enable-stfyaml and ceilometer-write-qdryaml environment filereferences from your overcloud deployment This configuration is redundantand results in duplicate information being sent from each cloud node

Procedure

1 Create the stf-connectorsyaml file and modify it to match the AMQP address that youwant for this cloud deployment

resource_registry OSTripleOServicesCollectd usrshareopenstack-tripleo-heat-templatesdeploymentmetricscollectd-container-puppetyaml OSTripleOServicesMetricsQdr usrshareopenstack-tripleo-heat-templatesdeploymentmetricsqdr-container-puppetyaml OSTripleOServicesCeilometerAgentCentral usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-central-container-puppetyaml OSTripleOServicesCeilometerAgentNotification usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-notification-container-puppetyaml OSTripleOServicesCeilometerAgentIpmi usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-ipmi-container-puppetyaml OSTripleOServicesComputeCeilometerAgent usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-compute-container-puppetyaml OSTripleOServicesRedis usrshareopenstack-tripleo-heat-templatesdeploymentdatabaseredis-pacemaker-puppetyaml

parameter_defaults EnableSTF true

EventPipelinePublishers [] CeilometerEnablePanko false CeilometerQdrPublishEvents true CeilometerQdrEventsConfig driver amqp topic cloud1-event 1

CollectdConnectionType amqp1 CollectdAmqpInterval 5 CollectdDefaultPollingInterval 5

CollectdAmqpInstances cloud1-notify 2 notify true format JSON presettle false cloud1-telemetry 3 format JSON presettle true

MetricsQdrAddresses

Red Hat OpenStack Platform 160 Service Telemetry Framework

44

+ lt1gt Define the topic for Ceilometer events This value is the address format of anycastceilometercloud1-eventsample lt2gt Define the topic for collectd events This value is theformat of collectdcloud1-notify lt3gt Define the topic for collectd metrics This value is the formatof collectdcloud1-telemetry lt4gt Adjust the MetricsQdrConnectors host to the address of the STFroute

1 Ensure that the naming convention in the stf-connectorsyaml file aligns with the specamqpUrl field in the Smart Gateway configuration For example configure the CeilometerQdrEventsConfigtopic field to a value of cloud1-event

2 Save the file in a directory for custom environment files for example homestackcustom_templates

3 Source the authentication file

4 Include the stf-connectorsyaml file in the overcloud deployment command along with anyother environment files relevant to your environment

Additional resourcesFor information about validating the deployment see Section 333 ldquoValidating client-sideinstallationrdquo

454 Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it wasscraped from You can use this label to query data from a specific cloud

To query data from a specific cloud use a Prometheus promql query that matches the associatedservice label for example collectd_uptimeservice=stf-default-collectd-telemetry-cloud1-smartgateway

- prefix collectd distribution multicast - prefix anycastceilometer distribution multicast

MetricsQdrSSLProfiles - name sslProfile

MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch 4 port 443 role edge verifyHostname false sslProfile sslProfile

[stackundercloud-0 ~]$ source stackrc

(undercloud) [stackundercloud-0 ~]$

(undercloud) [stackundercloud-0 ~]$ openstack overcloud deploy --templates usrshareopenstack-tripleo-heat-templates -e homestackcustom_templatesstf-connectorsyaml

CHAPTER 4 ADVANCED FEATURES

45

46 EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storingdata in your Red Hat OpenShift Container Platform (OCP) cluster Ephemeral storage is notrecommended in a production environment due to the volatility of the data in the platform whenoperating correctly and as designed For example restarting a pod or rescheduling the workload toanother node results in the loss of any local data written since the pod started

If you enable ephemeral storage in STF the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests

461 Configuring ephemeral storage

To configure STF for ephemeral storage add storageEphemeralEnabled true to theServiceTelemetry object in OCP You can add storageEphemeralEnabled true at installation timeor if you already deployed STF complete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object

4 Add the storageEphemeralEnabled true parameter to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry stf-default

spec eventsEnabled true metricsEnabled true storageEphemeralEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

46

APPENDIX A COLLECTD PLUG-INSThis section contains a complete list of collectd plug-ins and configurations forRed Hat OpenStack Platform 160

collectd-aggregation

collectdpluginaggregationaggregators

collectdpluginaggregationinterval

collectd-amqp1

collectd-apache

collectdpluginapacheinstances (ex localhost rArr url rArr httplocalhostmod_statusauto)

collectdpluginapacheinterval

collectd-apcups

collectd-battery

collectdpluginbatteryvalues_percentage

collectdpluginbatteryreport_degraded

collectdpluginbatteryquery_state_fs

collectdpluginbatteryinterval

collectd-ceph

collectdplugincephdaemons

collectdplugincephlongrunavglatency

collectdplugincephconvertspecialmetrictypes

collectd-cgroups

collectdplugincgroupsignore_selected

collectdplugincgroupsinterval

collectd-conntrack

None

collectd-contextswitch

collectdplugincontextswitchinterval

collectd-cpu

APPENDIX A COLLECTD PLUG-INS

47

collectdplugincpureportbystate

collectdplugincpureportbycpu

collectdplugincpuvaluespercentage

collectdplugincpureportnumcpu

collectdplugincpureportgueststate

collectdplugincpusubtractgueststate

collectdplugincpuinterval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectdplugincsvdatadir

collectdplugincsvstorerates

collectdplugincsvinterval

collectd-df

collectdplugindfdevices

collectdplugindffstypes

collectdplugindfignoreselected

collectdplugindfmountpoints

collectdplugindfreportbydevice

collectdplugindfreportinodes

collectdplugindfreportreserved

collectdplugindfvaluesabsolute

collectdplugindfvaluespercentage

collectdplugindfinterval

collectd-disk

collectdplugindiskdisks

collectdplugindiskignoreselected

collectdplugindiskudevnameattr

Red Hat OpenStack Platform 160 Service Telemetry Framework

48

collectdplugindiskinterval

collectd-entropy

collectdpluginentropyinterval

collectd-ethstat

collectdpluginethstatinterfaces

collectdpluginethstatmaps

collectdpluginethstatmappedonly

collectdpluginethstatinterval

collectd-exec

collectdpluginexeccommands

collectdpluginexeccommands_defaults

collectdpluginexecglobals

collectdpluginexecinterval

collectd-fhcount

collectdpluginfhcountvaluesabsolute

collectdpluginfhcountvaluespercentage

collectdpluginfhcountinterval

collectd-filecount

collectdpluginfilecountdirectories

collectdpluginfilecountinterval

collectd-fscache

None

collectd-hddtemp

collectdpluginhddtemphost

collectdpluginhddtempport

collectdpluginhddtempinterval

collectd-hugepages

collectdpluginhugepagesreport_per_node_hp

APPENDIX A COLLECTD PLUG-INS

49

collectdpluginhugepagesreport_root_hp

collectdpluginhugepagesvalues_pages

collectdpluginhugepagesvalues_bytes

collectdpluginhugepagesvalues_percentage

collectdpluginhugepagesinterval

collectd-intel_rdt

collectd-interface

collectdplugininterfaceinterfaces

collectdplugininterfaceignoreselected

collectdplugininterfacereportinactive

Collectdplugininterfaceinterval

collectd-ipc

None

collectd-ipmi

collectdpluginipmiignore_selected

collectdpluginipminotify_sensor_add

collectdpluginipminotify_sensor_remove

collectdpluginipminotify_sensor_not_present

collectdpluginipmisensors

collectdpluginipmiinterval

collectd-irq

collectdpluginirqirqs

collectdpluginirqignoreselected

collectdpluginirqinterval

collectd-load

collectdpluginloadreport_relative

collectdpluginloadinterval

collectd-logfile

collectdpluginlogfilelog_level

Red Hat OpenStack Platform 160 Service Telemetry Framework

50

collectdpluginlogfilelog_level

collectdpluginlogfilelog_file

collectdpluginlogfilelog_timestamp

collectdpluginlogfileprint_severity

collectdpluginlogfileinterval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectdpluginmemcachedinstances

collectdpluginmemcachedinterval

collectd-memory

collectdpluginmemoryvaluesabsolute

collectdpluginmemoryvaluespercentage

collectdpluginmemoryinterval collectd-multimeter

collectd-multimeter

collectd-mysql

collectdpluginmysqlinterval

collectd-netlink

collectdpluginnetlinkinterfaces

collectdpluginnetlinkverboseinterfaces

collectdpluginnetlinkqdiscs

collectdpluginnetlinkclasses

collectdpluginnetlinkfilters

collectdpluginnetlinkignoreselected

collectdpluginnetlinkinterval

collectd-network

collectdpluginnetworktimetolive

collectdpluginnetworkmaxpacketsize

APPENDIX A COLLECTD PLUG-INS

51

collectdpluginnetworkforward

collectdpluginnetworkreportstats

collectdpluginnetworklisteners

collectdpluginnetworkservers

collectdpluginnetworkinterval

collectd-nfs

collectdpluginnfsinterval

collectd-ntpd

collectdpluginntpdhost

collectdpluginntpdport

collectdpluginntpdreverselookups

collectdpluginntpdincludeunitid

collectdpluginntpdinterval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectdpluginopenvpnstatusfile

collectdpluginopenvpnimprovednamingschema

collectdpluginopenvpncollectcompression

collectdpluginopenvpncollectindividualusers

collectdpluginopenvpncollectusercount

collectdpluginopenvpninterval

collectd-ovs_events

collectdpluginovs_eventsaddress

collectdpluginovs_eventsdispatch

collectdpluginovs_eventsinterfaces

collectdpluginovs_eventssend_notification

Red Hat OpenStack Platform 160 Service Telemetry Framework

52

collectdpluginovs_events$port

collectdpluginovs_eventssocket

collectd-ovs_stats

collectdpluginovs_statsaddress

collectdpluginovs_statsbridges

collectdpluginovs_statsport

collectdpluginovs_statssocket

collectd-ping

collectdpluginpinghosts

collectdpluginpingtimeout

collectdpluginpingttl

collectdpluginpingsource_address

collectdpluginpingdevice

collectdpluginpingmax_missed

collectdpluginpingsize

collectdpluginpinginterval

collectd-powerdns

collectdpluginpowerdnsinterval

collectdpluginpowerdnsservers

collectdpluginpowerdnsrecursors

collectdpluginpowerdnslocal_socket

collectdpluginpowerdnsinterval

collectd-processes

collectdpluginprocessesprocesses

collectdpluginprocessesprocess_matches

collectdpluginprocessescollect_context_switch

collectdpluginprocessescollect_file_descriptor

collectdpluginprocessescollect_memory_maps

collectdpluginpowerdnsinterval

APPENDIX A COLLECTD PLUG-INS

53

collectd-protocols

collectdpluginprotocolsignoreselected

collectdpluginprotocolsvalues

collectd-python

collectd-serial

collectd-smart

collectdpluginsmartdisks

collectdpluginsmartignoreselected

collectdpluginsmartinterval

collectd-snmp_agent

collectd-statsd

collectdpluginstatsdhost

collectdpluginstatsdport

collectdpluginstatsddeletecounters

collectdpluginstatsddeletetimers

collectdpluginstatsddeletegauges

collectdpluginstatsddeletesets

collectdpluginstatsdcountersum

collectdpluginstatsdtimerpercentile

collectdpluginstatsdtimerlower

collectdpluginstatsdtimerupper

collectdpluginstatsdtimersum

collectdpluginstatsdtimercount

collectdpluginstatsdinterval

collectd-swap

collectdpluginswapreportbydevice

collectdpluginswapreportbytes

collectdpluginswapvaluesabsolute

Red Hat OpenStack Platform 160 Service Telemetry Framework

54

collectdpluginswapvaluespercentage

collectdpluginswapreportio

collectdpluginswapinterval

collectd-syslog

collectdpluginsysloglog_level

collectdpluginsyslognotify_level

collectdpluginsysloginterval

collectd-table

collectdplugintabletables

collectdplugintableinterval

collectd-tail

collectdplugintailfiles

collectdplugintailinterval

collectd-tail_csv

collectdplugintail_csvmetrics

collectdplugintail_csvfiles

collectd-tcpconns

collectdplugintcpconnslocalports

collectdplugintcpconnsremoteports

collectdplugintcpconnslistening

collectdplugintcpconnsallportssummary

collectdplugintcpconnsinterval

collectd-ted

collectd-thermal

collectdpluginthermaldevices

collectdpluginthermalignoreselected

collectdpluginthermalinterval

collectd-threshold

APPENDIX A COLLECTD PLUG-INS

55

collectdpluginthresholdtypes

collectdpluginthresholdplugins

collectdpluginthresholdhosts

collectdpluginthresholdinterval

collectd-turbostat

collectdpluginturbostatcore_c_states

collectdpluginturbostatpackage_c_states

collectdpluginturbostatsystem_management_interrupt

collectdpluginturbostatdigital_temperature_sensor

collectdpluginturbostattcc_activation_temp

collectdpluginturbostatrunning_average_power_limit

collectdpluginturbostatlogical_core_names

collectd-unixsock

collectd-uptime

collectdpluginuptimeinterval

collectd-users

collectdpluginusersinterval

collectd-uuid

collectdpluginuuiduuid_file

collectdpluginuuidinterval

collectd-virt

collectdpluginvirtconnection

collectdpluginvirtrefresh_interval

collectdpluginvirtdomain

collectdpluginvirtblock_device

collectdpluginvirtinterface_device

collectdpluginvirtignore_selected

collectdpluginvirthostname_format

Red Hat OpenStack Platform 160 Service Telemetry Framework

56

collectdpluginvirtinterface_format

collectdpluginvirtextra_stats

collectdpluginvirtinterval

collectd-vmem

collectdpluginvmemverbose

collectdpluginvmeminterval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectdpluginwrite_graphitecarbons

collectdpluginwrite_graphitecarbon_defaults

collectdpluginwrite_graphiteglobals

collectd-write_kafka

collectdpluginwrite_kafkakafka_host

collectdpluginwrite_kafkakafka_port

collectdpluginwrite_kafkakafka_hosts

collectdpluginwrite_kafkatopics

collectd-write_log

collectdpluginwrite_logformat

collectd-zfs_arc

None

APPENDIX A COLLECTD PLUG-INS

57

  • Table of Contents
  • CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
    • 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
    • 12 INSTALLATION SIZE
      • CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
        • 21 THE CORE COMPONENTS OF STF
        • 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
          • 221 Persistent volumes
            • 2211 Using ephemeral storage
              • 222 Resource allocation
              • 223 Node tuning operator
                • 23 DEPLOYING STF TO THE OCP ENVIRONMENT
                  • 231 Deploying STF to the OCP environment with ElasticSearch
                  • 232 Deploying STF to the OCP environment without ElasticSearch
                  • 233 Creating a namespace
                  • 234 Creating an OperatorGroup
                  • 235 Enabling the OperatorHubio Community Catalog Source
                  • 236 Enabling Red Hat STF Operator Source
                  • 237 Subscribing to the AMQ Certificate Manager Operator
                  • 238 Subscribing to the Elastic Cloud on Kubernetes Operator
                  • 239 Subscribing to the Service Telemetry Operator
                  • 2310 Creating a ServiceTelemetry object in OCP
                    • 24 REMOVING STF FROM THE OCP ENVIRONMENT
                      • 241 Deleting the namespace
                      • 242 Removing the OperatorSource
                          • CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
                            • 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
                            • 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
                            • 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
                              • 331 Retrieving the AMQ Interconnect route address
                              • 332 Configuring the STF connection for the overcloud
                              • 333 Validating client-side installation
                                  • CHAPTER 4 ADVANCED FEATURES
                                    • 41 CUSTOMIZING THE DEPLOYMENT
                                      • 411 Manifest override parameters
                                      • 412 Overriding a managed manifest
                                        • Procedure
                                            • 42 ALERTS
                                              • Additional resources
                                              • 421 Creating an alert rule in Prometheus
                                                • Procedure
                                                • Additional resources
                                                  • 422 Configuring custom alerts
                                                    • Procedure
                                                    • Additional resources
                                                      • 423 Creating an alert route in Alertmanager
                                                        • Procedure
                                                        • Additional resources
                                                            • 43 HIGH AVAILABILITY
                                                              • 431 Configuring high availability
                                                                • Procedure
                                                                    • 44 DASHBOARDS
                                                                      • 441 Setting up Grafana to host the dashboard
                                                                        • 4411 Viewing and editing queries
                                                                          • 442 The Grafana infrastructure dashboard
                                                                            • 4421 Top panels
                                                                            • 4422 Networking panels
                                                                            • 4423 CPU panels
                                                                            • 4424 Memory panels
                                                                            • 4425 Diskfile system
                                                                                • 45 CONFIGURING MULTIPLE CLOUDS
                                                                                  • 451 Planning AMQP address prefixes
                                                                                  • 452 Deploying Smart Gateways
                                                                                    • Procedure
                                                                                    • 4521 Example manifests
                                                                                      • 453 Creating the OpenStack environment file
                                                                                        • Procedure
                                                                                        • Additional resources
                                                                                          • 454 Querying metrics data from multiple clouds
                                                                                            • 46 EPHEMERAL STORAGE
                                                                                              • 461 Configuring ephemeral storage
                                                                                                  • APPENDIX A COLLECTD PLUG-INS
Page 4: Red Hat OpenStack Platform 16 · CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK 3.2. DEPLOYING WITH ROUTED L3 NETWORKS 3.3. CONFIGURING RED HAT OPENSTACK PLATFORM

Table of Contents

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE12 INSTALLATION SIZE

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK21 THE CORE COMPONENTS OF STF22 PREPARING YOUR OCP ENVIRONMENT FOR STF

221 Persistent volumes2211 Using ephemeral storage

222 Resource allocation223 Node tuning operator

23 DEPLOYING STF TO THE OCP ENVIRONMENT231 Deploying STF to the OCP environment with ElasticSearch232 Deploying STF to the OCP environment without ElasticSearch233 Creating a namespace234 Creating an OperatorGroup235 Enabling the OperatorHubio Community Catalog Source236 Enabling Red Hat STF Operator Source237 Subscribing to the AMQ Certificate Manager Operator238 Subscribing to the Elastic Cloud on Kubernetes Operator239 Subscribing to the Service Telemetry Operator2310 Creating a ServiceTelemetry object in OCP

24 REMOVING STF FROM THE OCP ENVIRONMENT241 Deleting the namespace242 Removing the OperatorSource

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FORSERVICE TELEMETRY FRAMEWORK

331 Retrieving the AMQ Interconnect route address332 Configuring the STF connection for the overcloud333 Validating client-side installation

CHAPTER 4 ADVANCED FEATURES41 CUSTOMIZING THE DEPLOYMENT

411 Manifest override parameters412 Overriding a managed manifest

Procedure42 ALERTS

Additional resources421 Creating an alert rule in Prometheus

ProcedureAdditional resources

422 Configuring custom alertsProcedureAdditional resources

423 Creating an alert route in AlertmanagerProcedureAdditional resources

43 HIGH AVAILABILITY

446

8889999

1010111111

121213141415171717

191919

20202122

2626262828293030303131313131323434

Table of Contents

1

431 Configuring high availabilityProcedure

44 DASHBOARDS441 Setting up Grafana to host the dashboard

4411 Viewing and editing queries442 The Grafana infrastructure dashboard

4421 Top panels4422 Networking panels4423 CPU panels4424 Memory panels4425 Diskfile system

45 CONFIGURING MULTIPLE CLOUDS451 Planning AMQP address prefixes452 Deploying Smart Gateways

Procedure4521 Example manifests

453 Creating the OpenStack environment fileProcedureAdditional resources

454 Querying metrics data from multiple clouds46 EPHEMERAL STORAGE

461 Configuring ephemeral storage

APPENDIX A COLLECTD PLUG-INS

343435353636363738383839404141

42434445454646

47

Red Hat OpenStack Platform 160 Service Telemetry Framework

2

Table of Contents

3

CHAPTER 1 INTRODUCTION TOSERVICE TELEMETRY FRAMEWORK

Service Telemetry Framework (STF) provides automated collection of measurements and data fromremote clients - Red Hat OpenStack Platform or third-party nodes - and transmission of thatinformation to a centralized receiving Red Hat OpenShift Container Platform (OCP) deployment forstorage retrieval and monitoring The data can be either of two types

Metric

a numeric measurement of an application or system

Event

irregular and discrete occurrences that happen in a system

The collection components that are required on the clients are lightweight The multicast message busthat is shared by all clients and the deployment provides fast and reliable data transport Other modularcomponents for receiving and storing data are deployed in containers on OCP

STF provides access to monitoring functions such as alert generation visualization through dashboardsand single source of truth telemetry analysis to support orchestration

11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE

Service Telemetry Framework (STF) uses the components described in Table 11 ldquoSTF componentsrdquo

Table 11 STF components

Client Component Server(OCP)

yes An AMQP 1x compatible messaging bus to shuttle the metrics to STF forstorage in Prometheus

yes

no Smart Gateway to pick metrics and events from the AMQP 1x bus and todeliver events to ElasticSearch or to provide metrics to Prometheus

yes

no Prometheus as time-series data storage yes

no ElasticSearch as events data storage yes

yes collectd to collect infrastructure metrics and events no

yes Ceilometer to collect Red Hat OpenStack Platform metrics and events no

Figure 11 Service Telemetry Framework architecture overview

Red Hat OpenStack Platform 160 Service Telemetry Framework

4

Figure 11 Service Telemetry Framework architecture overview

NOTE

The Service Telemetry Framework data collection components collectd and Ceilometerand the transport components AMQ Interconnect and Smart Gateway are fullysupported The data storage components Prometheus and ElasticSearch including theOperator artifacts and visualization component Grafana are community-supported andare not officially supported

For metrics on the client side collectd collects high-resolution metrics collectd delivers the data toPrometheus by using the AMQP1 plugin which places the data onto the message bus On the serverside a Golang application called the Smart Gateway takes the data stream from the bus and exposes itas a local scrape endpoint for Prometheus

If you plan to collect and store events collectd or Ceilometer delivers event data to the server side byusing the AMQP1 plugin which places the data onto the message bus Another Smart Gateway writesthe data to the ElasticSearch datastore

Server-side STF monitoring infrastructure consists of the following layers

Service Telemetry Framework 1x (STF)

Red Hat OpenShift Container Platform (OCP)

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

5

Infrastructure platform

Figure 12 Server-side STF monitoring infrastructure

For more information about how to deploy Red Hat OpenShift Container Platform see the OCPproduct documentation You can install OCP on cloud platforms or on bare metal For more informationabout STF performance and scaling see httpsaccessredhatcomarticles4907241

NOTE

Do not install OCP on the same infrastructure that you want to monitor

12 INSTALLATION SIZE

The size of your Red Hat OpenShift Container Platform installation depends on the following factors

The number of nodes you want to monitor

The number of metrics you want to collect

The resolution of metrics

The length of time that you want to store the data

Installation of Service Telemetry Framework (STF) depends on the existingRed Hat OpenShift Container Platform environment Ensure that you install monitoring for

Red Hat OpenStack Platform 160 Service Telemetry Framework

6

Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platformenvironment You can install Red Hat OpenShift Container Platform (OCP) on baremetal or othersupported cloud platforms For more information about installing OCP see OpenShift ContainerPlatform 43 Documentation

The size of your OCP environment depends on the infrastructure you select For more informationabout minimum resources requirements when installing OCP on baremetal see Minimum resourcerequirements in the Installing a cluster on bare metal guide For installation requirements of the variouspublic and private cloud platforms which you can install see the corresponding installationdocumentation for your cloud platform of choice

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

7

CHAPTER 2 INSTALLING THE CORE COMPONENTS OFSERVICE TELEMETRY FRAMEWORK

Before you install Service Telemetry Framework (STF) ensure thatRed Hat OpenShift Container Platform (OCP) version 4x is running and that you understand the corecomponents of the framework As part of the OCP installation planning process ensure that theadministrator provides persistent storage and enough resources to run the STF components on top ofthe OCP environment

WARNING

Red Hat OpenShift Container Platform version 43 or later is currently required for asuccessful installation of STF

21 THE CORE COMPONENTS OF STF

The following STF core components are managed by Operators

Prometheus and AlertManager

ElasticSearch

Smart Gateway

AMQ Interconnect

Each component has a corresponding Operator that you can use to load the various applicationcomponents and objects

Additional resources

For more information about Operators see the Understanding Operators guide

22 PREPARING YOUR OCP ENVIRONMENT FOR STF

As you prepare your OCP environment for STF you must plan for persistent storage adequateresources and event storage

Ensure that persistent storage is available in your Red Hat OpenShift Container Platform clusterto permit a production grade deployment For more information see Section 221 ldquoPersistentvolumesrdquo

Ensure that enough resources are available to run the Operators and the application containersFor more information see Section 222 ldquoResource allocationrdquo

To install ElasticSearch you must use a community catalog source If you do not want to use acommunity catalog or if you do not want to store events see Section 23 ldquoDeploying STF to theOCP environmentrdquo

STF uses ElasticSearch to store events which requires a larger than normal

Red Hat OpenStack Platform 160 Service Telemetry Framework

8

vmmax_map_count The vmmax_map_count value is set by default inRed Hat OpenShift Container Platform For more information about how to edit the value of vmmax_map_count see Section 223 ldquoNode tuning operatorrdquo

221 Persistent volumes

STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus andElasticSearch can store metrics and events

Additional resources

For more information about configuring persistent storage for OCP see Understanding persistentstorage

2211 Using ephemeral storage

WARNING

You can use ephemeral storage with STF However if you use ephemeral storageyou might experience data loss if a pod is restarted updated or rescheduled ontoanother node Use ephemeral storage only for development or testing and notproduction environments

Procedure

To enable ephemeral storage for STF set storageEphemeralEnabled true in your ServiceTelemetry manifest

Additional resources

For more information about enabling ephemeral storage for STF see Section 461 ldquoConfiguringephemeral storagerdquo

222 Resource allocation

To enable the scheduling of pods within the OCP infrastructure you need resources for thecomponents that are running If you do not allocate enough resources pods remain in a Pending statebecause they cannot be scheduled

The amount of resources that you require to run STF depends on your environment and the number ofnodes and clouds that you want to monitor

Additional resources

For recommendations about sizing for metrics collection seehttpsaccessredhatcomarticles4907241

For information about sizing requirements for ElasticSearch seehttpswwwelasticcoguideencloud-on-k8scurrentk8s-managing-compute-resourceshtml

223 Node tuning operator

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

9

STF uses ElasticSearch to store events which requires a larger than normal vmmax_map_count The vmmax_map_count value is set by default in Red Hat OpenShift Container Platform

If you want to edit the value of vmmax_map_count you cannot apply node tuning manually using the sysctl command because Red Hat OpenShift Container Platform manages nodes directly To configurevalues and apply them to the infrastructure you must use the node tuning operator For moreinformation see Using the Node Tuning Operator

In an OCP deployment the default node tuning operator specification provides the required profiles forElasticSearch workloads or pods scheduled on nodes To view the default cluster node tuningspecification run the following command

The output of the default specification is documented at Default profiles set on a cluster Theassignment of profiles is managed in the recommend section where profiles are applied to a node whencertain conditions are met When scheduling ElasticSearch to a node in STF one of the following profilesis applied

openshift-control-plane-es

openshift-node-es

When scheduling an ElasticSearch pod there must be a label present that matches tunedopenshiftioelasticsearch If the label is present one of the two profiles is assigned to the podNo action is required by the administrator if you use the recommended Operator for ElasticSearch Ifyou use a custom-deployed ElasticSearch with STF ensure that you add the tunedopenshiftioelasticsearch label to all scheduled pods

Additional resources

For more information about virtual memory usage by ElasticSearch seehttpswwwelasticcoguideenelasticsearchreferencecurrentvm-max-map-counthtml

For more information about how the profiles are applied to nodes see Custom tuning specification

23 DEPLOYING STF TO THE OCP ENVIRONMENT

You can deploy STF to the OCP environment in one of two ways

Deploy STF and store events with ElasticSearch For more information see Section 231ldquoDeploying STF to the OCP environment with ElasticSearchrdquo

Deploy STF without ElasticSearch and disable events support For more information seeSection 232 ldquoDeploying STF to the OCP environment without ElasticSearchrdquo

231 Deploying STF to the OCP environment with ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 235 ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

oc get Tuneddefault -o yaml -n openshift-cluster-node-tuning-operator

Red Hat OpenStack Platform 160 Service Telemetry Framework

10

4 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

5 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

6 Section 238 ldquoSubscribing to the Elastic Cloud on Kubernetes Operatorrdquo

7 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

8 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

232 Deploying STF to the OCP environment without ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

4 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

5 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

6 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

233 Creating a namespace

Create a namespace to hold the STF components The service-telemetry namespace is usedthroughout the documentation

Procedure

Enter the following command

234 Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods

Procedure

Enter the following command

oc new-project service-telemetry

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorGroupmetadata name service-telemetry-operator-group namespace service-telemetryspec targetNamespaces - service-telemetryEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

11

Additional resources

For more information see OperatorGroups

235 Enabling the OperatorHubio Community Catalog Source

Before you install ElasticSearch you must have access to the resources on the OperatorHubioCommunity Catalog Source

Procedure

Enter the following command

236 Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform you must enable the operatorsource

Procedure

1 Install an OperatorSource that contains the Service Telemetry Operator and the SmartGateway Operator

2 To validate the creation of your OperatorSource use the oc get operatorsources command A

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind CatalogSourcemetadata name operatorhubio-operators namespace openshift-marketplacespec sourceType grpc image quayiooperator-frameworkupstream-community-operatorslatest displayName OperatorHubio Operators publisher OperatorHubioEOF

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorSourcemetadata labels opsrc-provider redhat-operators-stf name redhat-operators-stf namespace openshift-marketplacespec authorizationToken displayName Red Hat STF Operators endpoint httpsquayiocnr publisher Red Hat registryNamespace redhat-operators-stf type appregistryEOF

Red Hat OpenStack Platform 160 Service Telemetry Framework

12

2 To validate the creation of your OperatorSource use the oc get operatorsources command Asuccessful import results in the MESSAGE field returning a result of The object has been successfully reconciled

3 To validate that the Operators are available from the catalog use the oc get packagemanifestcommand

237 Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STFcomponents because the AMQ Certificate Manager Operator runs globally-scoped and is notcompatible with the dependency management of Operator Lifecycle Manager when used with othernamespace-scoped operators

Procedure

1 Subscribe to the AMQ Certificate Manager Operator create the subscription and validate theAMQ7 Certificate Manager

NOTE

The AMQ Certificate Manager is installed globally for all namespaces so the namespace value provided is openshift-operators You might not see your amq7-cert-managerv100 ClusterServiceVersion in the service-telemetrynamespace for a few minutes until the processing executes against thenamespace

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME TYPE ENDPOINT REGISTRY DISPLAYNAME PUBLISHER STATUS MESSAGEredhat-operators-stf appregistry httpsquayiocnr redhat-operators-stf Red Hat STF Operators Red Hat Succeeded The object has been successfully reconciled

$ oc get packagemanifests | grep Red Hat STF

smartgateway-operator Red Hat STF Operators 2m50sservicetelemetry-operator Red Hat STF Operators 2m50s

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name amq7-cert-manager namespace openshift-operatorsspec channel alpha installPlanApproval Automatic name amq7-cert-manager source redhat-operators sourceNamespace openshift-marketplaceEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

13

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-managerv100 has a phase Succeeded

238 Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator and if you plan to store events in ElasticSearch youmust enable the Elastic Cloud Kubernetes Operator

Procedure

1 Apply the following manifest to your OCP environment to enable the Elastic Cloud onKubernetes Operator

2 To verify that the ClusterServiceVersion for ElasticSearch Cloud on Kubernetes succeededenter the oc get csv command

239 Subscribing to the Service Telemetry Operator

To instantiate an STF instance create the ServiceTelemetry object to allow the Service TelemetryOperator to create the environment

Procedure

1 To create the Service Telemetry Operator subscription enter the oc apply -f command

$ oc get --namespace openshift-operators csv

NAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name elastic-cloud-eck namespace service-telemetryspec channel stable installPlanApproval Automatic name elastic-cloud-eck source operatorhubio-operators sourceNamespace openshift-marketplaceEOF

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1

Red Hat OpenStack Platform 160 Service Telemetry Framework

14

2 To validate the Service Telemetry Operator and the dependent operators enter the followingcommand

2310 Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework you must create an instance of ServiceTelemetry in OCPBy default eventsEnabled is set to false If you do not want to store events in ElasticSearch ensurethat eventsEnabled is set to false For more information see Section 232 ldquoDeploying STF to the OCPenvironment without ElasticSearchrdquo

The following core parameters are available for a ServiceTelemetry manifest

Table 21 Core parameters for a ServiceTelemetry manifest

Parameter Description Default Value

eventsEnabled Enable events support in STFRequires prerequisite steps toensure ElasticSearch can bestarted For more information seeSection 238 ldquoSubscribing to theElastic Cloud on KubernetesOperatorrdquo

false

metricsEnabled Enable metrics support in STF true

kind Subscriptionmetadata name servicetelemetry-operator namespace service-telemetryspec channel stable installPlanApproval Automatic name servicetelemetry-operator source redhat-operators-stf sourceNamespace openshift-marketplaceEOF

$ oc get csv --namespace service-telemetryNAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeededamq7-interconnect-operatorv120 Red Hat Integration - AMQ Interconnect 120 Succeededelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeededprometheusoperator0370 Prometheus Operator 0370 prometheusoperator0320 Succeededservice-telemetry-operatorv102 Service Telemetry Operator 102 service-telemetry-operatorv101 Succeededsmart-gateway-operatorv101 Smart Gateway Operator 101 smart-gateway-operatorv100 Succeeded

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

15

highAvailabilityEnabled Enable high availability in STF Formore information see Section 43ldquoHigh availabilityrdquo

false

storageEphemeralEnabled Enable ephemeral storagesupport in STF For moreinformation see Section 46ldquoEphemeral storagerdquo

false

Parameter Description Default Value

Procedure

1 To store events in ElasticSearch set eventsEnabled to true during deployment

2 To view the STF deployment logs in the Service Telemetry Operator use the oc logscommand

PLAY RECAP localhost ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3 View the pods and the status of each pod to determine that all workloads are operatingnominally

NOTE

If you set eventsEnabled true the notification Smart Gateways will Error andCrashLoopBackOff for a period of time before ElasticSearch starts

oc apply -f - ltltEOFapiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec eventsEnabled true metricsEnabled trueEOF

oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

$ oc get pods

NAME READY STATUS RESTARTS AGEalertmanager-stf-default-0 22 Running 0 26melastic-operator-645dc8b8ff-jwnzt 11 Running 0 88melasticsearch-es-default-0 11 Running 0 26minterconnect-operator-6fd49d9fb9-4bl92 11 Running 0

Red Hat OpenStack Platform 160 Service Telemetry Framework

16

24 REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality

Complete the following tasks

1 Section 241 ldquoDeleting the namespacerdquo

2 Section 242 ldquoRemoving the OperatorSourcerdquo

241 Deleting the namespace

To remove the operational resources for STF from OCP delete the namespace

Procedure

1 Run the oc delete command

2 Verify that the resources have been deleted from the namespace

242 Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again delete the OperatorSourceWhen you remove the OperatorSource PackageManifests related to STF are removed from theOperator Lifecycle Manager catalog

Procedure

1 Delete the OperatorSource

46mprometheus-operator-bf7d97fb9-kwnlx 11 Running 0 46mprometheus-stf-default-0 33 Running 0 26mservice-telemetry-operator-54f4c99d9b-k7ll6 22 Running 0 46msmart-gateway-operator-7ff58bcf94-66rvx 22 Running 0 46mstf-default-ceilometer-notification-smartgateway-6675df547q4lbj 11 Running 0 26mstf-default-collectd-notification-smartgateway-698c87fbb7-xj528 11 Running 0 26mstf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn 11 Running 0 26mstf-default-interconnect-7458fd4d69-nqbfs 11 Running 0 26m

oc delete project service-telemetry

$ oc get allNo resources found

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

17

2 Verify that the STF PackageManifests are removed from the platform If successful thefollowing command returns no result

3 If you enabled the OperatorHubio Community Catalog Source during the installationprocess and you no longer need this catalog source delete it

Additional resources

For more information about the OperatorHubio Community Catalog Source see Section 23ldquoDeploying STF to the OCP environmentrdquo

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stfoperatorsourceoperatorscoreoscom redhat-operators-stf deleted

$ oc get packagemanifests | grep Red Hat STF

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operatorscatalogsourceoperatorscoreoscom operatorhubio-operators deleted

Red Hat OpenStack Platform 160 Service Telemetry Framework

18

CHAPTER 3 COMPLETING THESERVICE TELEMETRY FRAMEWORK CONFIGURATION

31 CONNECTING RED HAT OPENSTACK PLATFORM TOSERVICE TELEMETRY FRAMEWORK

To collect metrics events or both and to send them to the Service Telemetry Framework (STF)storage domain you must configure the Red Hat OpenStack Platform overcloud to enable datacollection and transport

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes thatemploy routed L3 domains such as distributed compute node (DCN) or spine-leaf see Section 32ldquoDeploying to non-standard network topologiesrdquo

32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network you must makeconfiguration adjustments so that AMQ Interconnect can transport data to theService Telemetry Framework (STF) server instance This scenario is typical in a spine-leaf or aDCN topology For more information about DCN configuration see the Spine Leaf Networkingguide

If you use STF with Red Hat OpenStack Platform 160 and plan to monitor your Ceph Block orObject storage nodes you must make configuration changes that are similar to the configurationchanges that you make to the spine-leaf and DCN network configuration To monitor Ceph nodesuse the CephStorageExtraConfig parameter to define which network interface to load into theAMQ Interconnect and collectd configuration files

Similarly you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters ifyour environment uses Block and Object storage roles

The deployment of a spine-leaf topology involves creating roles and networks then assigningthose networks to the available roles When you configure data collection and transport for STF foran Red Hat OpenStack Platform deployment the default network for roles is InternalApi For CephBlock and Object storage roles the default network is Storage Because a spine-leaf configurationcan result in different networks being assigned to different Leaf groupings and those names aretypically unique additional configuration is required in the parameter_defaults section of theRed Hat OpenStack Platform environment files

Procedure

1 Document which networks are available for each of the Leaf roles For examples of networkname definitions see Creating a network data file in the Spine Leaf Networking guide Formore information about the creation of the Leaf groupings (roles) and assignment of thenetworks to those groupings see Creating a roles data file in the Spine Leaf Networkingguide

CephStorageExtraConfig tripleoprofilebasemetricscollectdamqp_host hiera(storage) tripleoprofilebasemetricsqdrlistener_addr hiera(storage) tripleoprofilebaseceilometeragentnotificationnotifier_host_addr hiera(storage)

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

19

2 Add the following configuration example to the ExtraConfig section for each of the leafroles In this example internal_api_subnet is the value defined in the name_lowerparameter of your network definition (with _subnet appended to the name for Leaf 0) andis the network to which the ComputeLeaf0 leaf role is connected In this case the networkidentification of 0 corresponds to the Compute role for leaf 0 and represents a value thatis different from the default internal API network nameFor the ComputeLeaf0 leaf role specify extra configuration to perform a hiera lookup todetermine which network interface for a particular network to assign to the collectd AMQPhost parameter Perform the same configuration for the AMQ Interconnect listeneraddress parameter

Additional leaf roles typically replace _subnet with _leafN where N represents a uniqueindentifier for the leaf

This example configuration is on a CephStorage leaf role

33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUDFOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud you must configure the data collectionapplications and the data transport to STF and deploy the overcloud

To configure the Red Hat OpenStack Platform overcloud complete the following tasks

1 Section 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

2 Section 332 ldquoConfiguring the STF connection for the overcloudrdquo

3 Section 333 ldquoValidating client-side installationrdquo

331 Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF you must provide theAMQ Interconnect route address in the STF connection file

Procedure

1 Log in to your Red Hat OpenShift Container Platform (OCP) environment

2 In the service-telemetry project retrieve the AMQ Interconnect route address

ComputeLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_subnet)

ComputeLeaf1ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_leaf1)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_leaf1)

CephStorageLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(storage_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(storage_subnet)

Red Hat OpenStack Platform 160 Service Telemetry Framework

20

NOTE

If your STF installation differs from the documentation ensure that youretrieve the correct AMQ Interconnect route address

332 Configuring the STF connection for the overcloud

To configure the STF connection you must create a file that contains the connection configurationof the AMQ Interconnect for the overcloud to the STF deployment Enable the collection of eventsand storage of the events in STF and deploy the overcloud

Procedure

1 Log in to the Red Hat OpenStack Platform undercloud as the stack user

2 Create a configuration file called stf-connectorsyaml in the homestack directory

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all dataingestion and data storage components for single cloud deployments Toshare the data storage domain with multiple clouds see Section 45ldquoConfiguring multiple cloudsrdquo

3 In the stf-connectorsyaml file configure the MetricsQdrConnectors address to connectthe AMQ Interconnect on the overcloud to the STF deployment

Add the CeilometerQdrPublishEvents true parameter to enable collection andtransport of Ceilometer events to STF

Replace the host parameter with the value of HOSTPORT that you retrieved inSection 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

4 Add the following files to your Red Hat OpenStack Platform director deployment to setupcollectd and AMQ Interconnect

the stf-connectorsyaml environment file

the enable-stfyaml file that ensures that the environment is being used during theovercloud deployment

$ oc get routes -ogo-template= range items printf sn spechost end | grep -5671stf-default-interconnect-5671-service-telemetryappsinfrawatch

parameter_defaults EventPipelinePublishers [] CeilometerQdrPublishEvents true MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch port 443 role edge sslProfile sslProfile verifyHostname false

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

21

the ceilometer-write-qdryaml file that ensures that Ceilometer telemetry is sent to STF

5 Deploy the Red Hat OpenStack Platform overcloud

333 Validating client-side installation

To validate data collection from the STF storage domain query the data sources for delivereddata To validate individual nodes in the Red Hat OpenStack Platform deployment connect to theconsole using SSH

Procedure

1 Log in to an overcloud node for example controller-0

2 Ensure that metrics_qdr container is running on the node

3 Return the internal network address on which AMQ Interconnect is running for example 17217144 listening on port 5666

4 Return a list of connections to the local AMQ Interconnect

openstack overcloud deploy ltother argumentsgt --templates usrshareopenstack-tripleo-heat-templates --environment-file ltother-environment-filesgt --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsmetricsceilometer-write-qdryaml --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsenable-stfyaml --environment-file homestackstf-connectorsyaml

$ sudo podman container inspect --format StateStatus metrics_qdr

running

$ sudo podman exec -it metrics_qdr cat etcqpid-dispatchqdrouterdconf

listener host 17217144 port 5666 authenticatePeer no saslMechanisms ANONYMOUS

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --connections

Connections id host container role dir security authentication tenant ============================================================================================================================================================================================================================================================================================ 1 stf-default-interconnect-5671-service-telemetryappsinfrawatch443 stf-default-

Red Hat OpenStack Platform 160 Service Telemetry Framework

22

There are four connections

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat clientThe outbound STF connection is provided to the MetricsQdrConnectors hostparameter and is the route for the STF storage domain The other hosts are internalnetwork addresses of the client connections to this AMQ Interconnect

5 To ensure that messages are being delivered list the links and view the _edge address inthe deliv column for delivery of messages

interconnect-7458fd4d69-bgzfb edge out TLSv12(DHE-RSA-AES256-GCM-SHA384) anonymous-user 12 1721714460290 openstackorgomcontainercontroller-0ceilometer-agent-notification255c02cee550f143ec9ea030db5cccba14 normal in no-security no-auth 16 1721714436408 metrics normal in no-security anonymous-user 899 1721714439500 10a2e99d-1b8a-4329-b48c-4335e5f75c84 normal in no-security no-auth

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --linksRouter Links type dir conn id id peer class addr phs cap pri undel unsett deliv presett psdrop acc rej rel mod delay rate =========================================================================================================================================================== endpoint out 1 5 local _edge 250 0 0 0 2979926 2979924 0 0 0 2 0 0 0 endpoint in 1 6 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 7 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 8 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 9 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 10 250 0 0 0 911 911 0 0 0 0 0 911 0 endpoint in 1 11 250 0 0 0 0 911 0 0 0 0 0 0 0 endpoint out 12 32 local templSY6Mcicol4J2Kp 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 16 41 250 0 0 0 2979924 2979924 0 0 0 0 0 0 0 endpoint in 912 1834 mobile $management 0 250 0 0 0 1

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

23

6 To list the addresses from Red Hat OpenStack Platform nodes to STF connect to OCP toget the AMQ Interconnect pod name and list the connections List the availableAMQ Interconnect pods

7 Connect to the pod and run the qdstat --connections command to list the knownconnections

In this example there are three edge connections from the Red Hat OpenStack Platformnodes with connection id 22 23 and 24

8 To view the number of messages delivered by the network use each address with the oc exec command

0 0 1 0 0 0 0 0 endpoint out 912 1835 local temp9Ok2resI9tmt+CT 250 0 0 0 0 0 0 0 0 0 0 0 0

$ oc get pods -l application=stf-default-interconnect

NAME READY STATUS RESTARTS AGEstf-default-interconnect-7458fd4d69-bgzfb 11 Running 0 6d21h

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 182547243852 UTCstf-default-interconnect-7458fd4d69-bgzfb

Connections id host container role dir security authentication tenant last dlv uptime ====================================================================================================================================================================================================== 1 1012922143062 rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv] normal in no-security anonymous-user 000000000 006215025 2 1013005255754 rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5] normal in no-security anonymous-user 000212557 006214936 3 1013005143110 rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6] normal in no-security anonymous-user 000213653 006214909 22 101280151948 Routerceph-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 23 101280151950 Routercompute-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 24 101280152082 Routercontroller-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000000 000220834 27 12700142202 c2f541c1-4c97-4b37-a189-a396c08fb079 normal in no-security no-auth 000000000 000000000

Red Hat OpenStack Platform 160 Service Telemetry Framework

24

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 182010293258 UTCstf-default-interconnect-7458fd4d69-bgzfb

Router Addresses class addr phs distrib pri local remote in out thru fallback ==================================================================================================================== mobile anycastceilometereventsample 0 balanced - 1 0 1553 1553 0 0 mobile collectdnotify 0 multicast - 1 0 10 10 0 0 mobile collectdtelemetry 0 multicast - 1 0 7798049 7798049 0 0

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

25

CHAPTER 4 ADVANCED FEATURESThe following optional features can provide additional functionality to theService Telemetry Framework (STF)

Customizing the deployment For more information see Section 41 ldquoCustomizing thedeploymentrdquo

Alerts For more information see Section 42 ldquoAlertsrdquo

High availability For more information see Section 43 ldquoHigh availabilityrdquo

Dashboards For more information see Section 44 ldquoDashboardsrdquo

Multiple clouds For more information see Section 45 ldquoConfiguring multiple cloudsrdquo

Ephemeral storage For more information see Section 46 ldquoEphemeral storagerdquo

41 CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load intoRed Hat OpenShift Container Platform (OCP) The Operator then creates other objects in memorywhich results in the dependent Operators creating the workloads they are responsible formanaging

WARNING

When you override the manifest you must provide the entire manifest contentsincluding object names or namespaces There is no dynamic parametersubstitution when you override a manifest

To override a manifest successfully with Service Telemetry Framework (STF) deploy a defaultenvironment using the core options only For more information about the core options seeSection 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo When you deploy STF use the oc getcommand to retrieve the default deployed manifest When you use a manifest that was originallygenerated by Service Telemetry Operator the manifest is compatible with the other objects thatare managed by the Operators

For example when the metricsEnabled true parameter is configured in the ServiceTelemetrymanifest the Service Telemetry Operator requests components for metrics retrieval and storageusing the default manifests In some cases you might want to override the default manifest Formore information see Section 411 ldquoManifest override parametersrdquo

411 Manifest override parameters

This table describes the available parameters that you can use to override a manifest along withthe corresponding retrieval commands

Table 41 Manifest override parameters

Red Hat OpenStack Platform 160 Service Telemetry Framework

26

Override parameter Description Retrieval command

alertmanagerManifest Override the Alertmanagerobject creation The PrometheusOperator watches for Alertmanager objects

oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest Override the Secret thatcontains the Alertmanagerconfiguration The PrometheusOperator uses a secret named alertmanager- alertmanager-name forexample stf-default to providethe alertmanageryamlconfiguration to Alertmanager

oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest Override the ElasticSearchobject creation The Elastic Cloudon Kuberneters Operator watchesfor ElasticSearch objects

oc get elasticsearch elasticsearch -oyaml

interconnectManifest Override the Interconnectobject creation The AMQInterconnect Operator watchesfor Interconnect objects

oc get interconnect stf-default-interconnect -oyaml

prometheusManifest Override the Prometheusobject creation The PrometheusOperator watches for Prometheus objects

oc get prometheus stf-default -oyaml

servicemonitorManifest Override the ServiceMonitorobject creation The PrometheusOperator watches for ServiceMonitor objects

oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest

Override the SmartGatewayobject creation for collectdmetrics The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest

Override the SmartGatewayobject creation for collectdevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-notification -oyaml

CHAPTER 4 ADVANCED FEATURES

27

smartgatewayCeilometerEventsManifest

Override the SmartGatewayobject creation for Ceilometerevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-ceilometer-notification -oyaml

Override parameter Description Retrieval command

412 Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest For a list of availablemanifest override parameters see Section 41 ldquoCustomizing the deploymentrdquo The default ServiceTelemetry object is stf-default Use oc get servicetelemetry to list the available STFdeployments

TIP

The oc edit command loads the default system editor To override the default editor pass or setthe environment variable EDITOR to the preferred editor For example EDITOR=nano oc edit servicetelemetry stf-default

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Load the ServiceTelemetry object into an editor

4 To modify the ServiceTelemetry object provide a manifest override parameter and thecontents of the manifest to write to OCP instead of the defaults provided by STF

NOTE

The trailing pipe (|) after entering the manifest override parameter indicatesthat the value provided is multi-line

oc project service-telemetry

oc edit servicetelemetry stf-default

$ oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata annotations kubectlkubernetesiolast-applied-configuration | apiVersioninfrawatchv1alpha1kindServiceTelemetrymetadataannotationsnamestf-defaultnamespaceservice-telemetryspecmetricsEnabledtrue creationTimestamp 2020-04-14T202942Z

Red Hat OpenStack Platform 160 Service Telemetry Framework

28

1

2

Manifest override parameter is defined in the spec of the ServiceTelemetry object

End of the manifest override content

5 Save and close

42 ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager Alert rules in Prometheusservers send alerts to an Alertmanager which manages the alerts Alertmanager can silenceinhibit or aggregate alerts and send notifications using email on-call notification systems or chatplatforms

To create an alert complete the following tasks

1 Create an alert rule in Prometheus For more information see Section 421 ldquoCreating analert rule in Prometheusrdquo

generation 1 name stf-default namespace service-telemetry resourceVersion 1949423 selfLink apisinfrawatchv1alpha1namespacesservice-telemetryservicetelemetrysstf-default uid d058bc41-1bb0-49f5-9a8b-642f4b8adb95spec metricsEnabled true smartgatewayCollectdMetricsManifest | 1 apiVersion smartgatewayinfrawatchv2alpha1 kind SmartGateway metadata name stf-default-collectd-telemetry namespace service-telemetry spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdtelemetry debug true prefetch 15000 serviceType metrics size 1 useTimestamp true 2status conditions - ansibleResult changed 0 completion 2020-04-14T203219079508 failures 0 ok 52 skipped 1 lastTransitionTime 2020-04-14T202959Z message Awaiting next reconciliation reason Successful status True type Running

CHAPTER 4 ADVANCED FEATURES

29

2 Create an alert route in Alertmanager For more information see Section 423 ldquoCreating analert route in Alertmanagerrdquo

Additional resourcesFor more information about alerts or notifications with Prometheus and Alertmanager seehttpsprometheusiodocsalertingoverview

To view an example set of alerts that you can use with Service Telemetry Framework (STF) seehttpsgithubcominfrawatchservice-telemetry-operatortreemasterdeployalerts

421 Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications If the rule condition returns an emptyresult set the condition is false Otherwise the rule is true and it triggers an alert

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Create a PrometheusRule object that contains the alert rule The Prometheus Operatorloads the rule into Prometheus

4 To verify that the rules have been loaded into Prometheus by the Operator create a podwith access to curl

5 Run curl to access the prometheus-operated service to return the rules loaded intomemory

oc project service-telemetry

oc apply -f - ltltEOFapiVersion monitoringcoreoscomv1kind PrometheusRulemetadata creationTimestamp null labels prometheus stf-default role alert-rules name prometheus-alarm-rules namespace service-telemetryspec groups - name openstackrules rules - alert Metric Listener down expr collectd_qpid_router_status lt 1 To change the rule edit the value of the expr parameterEOF

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl prometheus-operated9090apiv1rulesstatussuccessdatagroups

Red Hat OpenStack Platform 160 Service Telemetry Framework

30

6 To verify that the output shows the rules loaded into the PrometheusRule object forexample the output contains the defined openstackrules exit from the pod

7 Clean up the environment by deleting the curl pod

Additional resourcesFor more information on alerting see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

422 Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 421ldquoCreating an alert rule in Prometheusrdquo

Procedure

1 Use the oc edit command

2 Edit the PrometheusRules manifest

3 Save and close

Additional resourcesFor more information about configuring alerting rules seehttpsprometheusiodocsprometheuslatestconfigurationalerting_rules

For more information about PrometheusRules objects seehttpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

423 Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system such as email IRC or other notificationchannel The Prometheus Operator manages the Alertmanager configuration as anRed Hat OpenShift Container Platform (OCP) secret STF by default deploys a basic configurationthat results in no receivers

[nameopenstackrulesfileetcprometheusrulesprometheus-stf-default-rulefiles-0service-telemetry-prometheus-alarm-rulesyamlrules[nameMetric Listener downquerycollectd_qpid_router_status u003c 1duration0labelsannotationsalerts[]healthoktypealerting]interval30]

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

oc edit prometheusrules prometheus-alarm-rules

alertmanageryaml |- global resolve_timeout 5m route

CHAPTER 4 ADVANCED FEATURES

31

To deploy a custom Alertmanager route with STF an alertmanagerConfigManifest parameter mustbe passed to the Service Telemetry Operator that results in an updated secret managed by thePrometheus Operator For more information see Section 412 ldquoOverriding a managed manifestrdquo

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object for your STF deployment

4 Add a new parameter alertmanagerConfigManifest and the Secret object contents todefine the alertmanageryaml configuration for Alertmanager

NOTE

This loads the default template that is already managed by ServiceTelemetry Operator To validate the changes are populating correctlychange a value return the alertmanager-stf-default secret and verify thatthe new value is loaded into memory for example changing the value globalresolve_timeout from 5m to 10m

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

oc project service-telemetry

oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec metricsEnabled true alertmanagerConfigManifest | apiVersion v1 kind Secret metadata name alertmanager-stf-default namespace service-telemetry type Opaque stringData alertmanageryaml |- global resolve_timeout 10m route

Red Hat OpenStack Platform 160 Service Telemetry Framework

32

5 Verify that the configuration was applied to the secret

6 To verify the configuration has been loaded into Alertmanager create a pod with access to curl

7 Run curl against the alertmanager-operated service to retrieve the status and configYAMLcontents and review the supplied configuration matches the configuration loaded intoAlertmanager

8 Verify that the configYAML field contains the expected changes Exit from the pod

9 To clean up the environment delete the curl pod

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

$ oc get secret alertmanager-stf-default -o go-template=index data alertmanageryaml | base64decode

global resolve_timeout 10mroute group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver nullreceivers- name null

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl alertmanager-operated9093apiv1status

statussuccessdataconfigYAMLglobaln resolve_timeout 10mn http_config n smtp_hello localhostn smtp_require_tls truen pagerduty_url httpseventspagerdutycomv2enqueuen hipchat_api_url httpsapihipchatcomn opsgenie_api_url httpsapiopsgeniecomn wechat_api_url httpsqyapiweixinqqcomcgi-binn victorops_api_url httpsalertvictoropscomintegrationsgeneric20131114alertnrouten receiver nulln group_byn - jobn group_wait 30sn group_interval 5mn repeat_interval 12hnreceiversn- name nullntemplates []n

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

CHAPTER 4 ADVANCED FEATURES

33

Additional resourcesFor more information about the Red Hat OpenShift Container Platform secret and the Prometheusoperator see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

43 HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover fromfailures in its component services Although Red Hat OpenShift Container Platform (OCP) restartsa failed pod if nodes are available to schedule the workload this recovery process might take morethan one minute during which time events and metrics are lost A high availability configurationincludes multiple copies of STF components reducing recovery time to approximately 2 secondsTo protect against failure of an OCP node deploy STF to an OCP cluster with three or more nodes

NOTE

STF is not yet a fully fault tolerant system Delivery of metrics and events during therecovery period is not guaranteed

Enabling high availability has the following effects

Two AMQ Interconnect pods run instead of the default 1

Three ElasticSearch pods run instead of the default 1

Recovery time from a lost pod in either of these services reduces to approximately 2seconds

431 Configuring high availability

To configure STF for high availability add highAvailabilityEnabled true to the ServiceTelemetryobject in OCP You can this set this parameter at installation time or if you already deployed STFcomplete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Use the oc command to edit the ServiceTelemetry object

4 Add highAvailabilityEnabled true to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry

spec eventsEnabled true metricsEnabled true highAvailabilityEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

34

44 DASHBOARDS

Use third-party application Grafana to visualize system-level metrics gathered by collectd for eachindividual host node For more information about configuring collectd see Section 33ldquoConfiguring Red Hat OpenStack Platform overcloud for Service Telemetry Frameworkrdquo

441 Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment so you mustdeploy the Grafana Operator from OperatorHubio

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Clone the dashboard repository

4 Deploy the Grafana operator

5 To verify that the operator launched successfully run the oc get csv command If the valueof the PHASE column is Succeeded the operator launched successfully

6 Launch a Grafana instance

7 Verify that the Grafana instance deployed

8 Create the datasource and dashboard resources

oc project service-telemetry

git clone httpsgithubcominfrawatchdashboardscd dashboards

oc create -f deploysubscriptionyaml

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEgrafana-operatorv320 Grafana Operator 320 Succeeded

$ oc create -f deploygrafanayaml

$ oc get pod -l app=grafana

NAME READY STATUS RESTARTS AGEgrafana-deployment-7fc7848b56-sbkhv 11 Running 0 1m

CHAPTER 4 ADVANCED FEATURES

35

9 Verify that the resources installed correctly

10 Navigate to httpsltgrafana-route-addressgt in a web browser Use the oc get routescommand to retrieve the Grafana route address

11 To view the dashboard click Dashboards and Manage

Additional resources

For more information about enabling the OperatorHubio catalog source see Section 235ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

4411 Viewing and editing queries

Procedure

1 Log in to Red Hat OpenShift Container Platform To view and edit queries log in as the admin user

2 Change to the service-telemetry namespace

3 To retrieve the default username and password describe the Grafana object using oc describe

442 The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time Select a node from theupper left corner of the dashboard

4421 Top panels

Title Unit Description

oc create -f deploydatasourceyaml -f deployrhos-dashboardyaml

$ oc get grafanadashboards

NAME AGErhos-dashboard 7d21h

$ oc get grafanadatasourcesNAME AGEservice-telemetry-grafanadatasource 1m

oc get routes

oc project service-telemetry

oc describe grafana service-telemetry-grafana

Red Hat OpenStack Platform 160 Service Telemetry Framework

36

Current Global Alerts - Current alerts fired byPrometheus

Recent Global Alerts - Recently fired alerts in 5m timesteps

Status Panel - Node status up downunavailable

Uptime smhdMY Total operational time of node

CPU Cores cores Total number of cores

Memory bytes Total memory

Disk Size bytes Total storage size

Processes processes Total number of processes listedby type

Load Average processes Load average represents theaverage number of running anduninterruptible processes residingin the kernel execution queue

4422 Networking panels

Panels that display the network interfaces of the node

Panel Unit Description

Physical Interfaces Ingress Errors errors Total errors with incoming data

Physical Interfaces Egress Errors errors Total errors with outgoing data

Physical Interfaces Ingress ErrorRates

errorss Rate of incoming data errors

Physical Interfaces egress ErrorRates

errorss Rate of outgoing data errors

Physical Interfaces PacketsIngress pps Incoming packets persecond

Physical Interfaces PacketsEgress

pps

Outgoing packets per second Physical Interfaces Data Ingress bytess

CHAPTER 4 ADVANCED FEATURES

37

Incoming data rates Physical Interfaces Data Egress bytess

Outgoing data rates Physical Interfaces Drop RateIngress

pps

Incoming packets drop rate Physical Interfaces Drop RateEgress

pps

4423 CPU panels

Panels that display CPU usage of the node

Panel Unit Description

Current CPU Usage percent Instantaneous usage at the timeof the last query

Aggregate CPU Usage percent Average non-idle CPU activity ofall cores on a node

Aggr CPU Usage by Type percent Shows time spent for each type ofthread averaged across all cores

4424 Memory panels

Panels that display memory usage on the node

Panel Unit Description

Memory Used percent Amount of memory being used attime of last query

Huge Pages Used hugepages Number of hugepages beingused

Memory

4425 Diskfile system

Panels that display space used on disk

Panel Unit Description Notes

Disk Space Usage percent Total disk use at time oflast query

Red Hat OpenStack Platform 160 Service Telemetry Framework

38

Inode Usage percent Total inode use at timeof last query

Aggregate Disk SpaceUsage

bytes Total disk space usedand reserved

Because this query relieson the df plugintemporary file systemsthat do not necessarilyuse disk space areincluded in the resultsThe query tries to filterout most of these but itmight not be exhaustive

Disk Traffic bytess Shows rates for bothreading and writing

Disk Load percent Approximatepercentage of total diskbandwidth being usedThe weighted IO timeseries includes thebacklog that might beaccumulating For moreinformation see thecollectd disk plugin docs

Operationss opss Operations done persecond

Average IO OperationTime

seconds Average time each IOoperation took tocomplete This averageis not accurate see thecollectd disk plugin docs

Panel Unit Description Notes

45 CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance ofService Telemetry Framework (STF)

1 Plan the AMQP address prefixes that you want to use for each cloud For more informationsee Section 451 ldquoPlanning AMQP address prefixesrdquo

2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on thecorresponding address prefixes For more information see Section 452 ldquoDeploying SmartGatewaysrdquo

3 Configure each cloud to send its metrics and events to STF on the correct address Formore information see Section 453 ldquoCreating the OpenStack environment filerdquo

CHAPTER 4 ADVANCED FEATURES

39

Figure 41 Two Red Hat OpenStack Platform clouds connect to STF

451 Planning AMQP address prefixes

By default Red Hat OpenStack Platform nodes get data through two data collectors collectd andCeilometer These components send telemetry data or notifications to the respective AMQPaddresses for example collectdtelemetry where STF Smart Gateways listen on those addressesfor monitoring data

To support multiple clouds and to identify which cloud generated the monitoring data configureeach cloud to send data to a unique address Prefix a cloud identifier to the second part of theaddress The following list shows some example addresses and identifiers

collectdcloud1-telemetry

collectdcloud1-notify

anycastceilometercloud1-eventsample

collectdcloud2-telemetry

collectdcloud2-notify

Red Hat OpenStack Platform 160 Service Telemetry Framework

40

anycastceilometercloud2-eventsample

collectdus-east-1-telemetry

collectdus-west-3-telemetry

452 Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud one forcollectd metrics one for collectd events and one for Ceilometer events Configure each of theSmart Gateways to listen on the AMQP address that you define for the corresponding cloud

When you deploy STF for the first time Smart Gateway manifests are created that define the initialSmart Gateways for a single cloud When deploying Smart Gateways for multiple cloud supportyou deploy multiple Smart Gateways for each of the data collection types that handle the metricsand the events data for each cloud The initial Smart Gateways act as a template to createadditional Smart Gateways along with any authentication information required to connect to thedata stores

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

oc project service-telemetry

3 Use the initially deployed Smart Gateways as a template for additional Smart GatewaysList the currently deployed Smart Gateways with the oc get smartgateways command Forexample if you deployed STF with metricsEnabled true and eventsEnabled true thefollowing Smart Gateways are displayed in the output

$ oc get smartgateways

NAME AGEstf-default-ceilometer-notification 14dstf-default-collectd-notification 14dstf-default-collectd-telemetry 14d

4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary filewhich you can modify later and use to create the new set of Smart Gateways

truncate --size 0 tmpcloud1-smartgatewaysyaml ampamp for sg in $(oc get smartgateways -oname)do echo --- gtgt tmpcloud1-smartgatewaysyaml oc get $sg -oyaml --export gtgt tmpcloud1-smartgatewaysyamldone

5 Modify the Smart Gateway manifest in the tmpcloud1-smartgatewaysyaml file Adjustthe metadataname and specamqpUrl fields to include the cloud identifier from yourschema For more information see ] To view example Smart Gateway manifests seeltltexample-manifests_advanced-features[

6 Deploy your new Smart Gateways

CHAPTER 4 ADVANCED FEATURES

41

oc apply -f tmpcloud1-smartgatewaysyaml

7 Verify that each Smart Gateway is running This can take several minutes depending on thenumber of Smart Gateways

oc get po -l app=smart-gateway

4521 Example manifests

IMPORTANT

The content in the following examples might be different to the file content in yourdeployment Copy the manifests in your deployment

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that youwant to use for your clouds For more information see Section 451 ldquoPlanning AMQP addressprefixesrdquo

NOTE

Your output may have some additional metadata parameters that you can removefrom the manifests you that load into OCP

apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-ceilometer-notification-cloud1 1spec amqpDataSource ceilometer amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672anycastceilometercloud1-eventsample 2 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-notification-cloud1 3spec amqpDataSource collectd amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-

Red Hat OpenStack Platform 160 Service Telemetry Framework

42

1

2

3

4

5

6

Name for Ceilometer notifications for cloud1

AMQP Address for Ceilometer notifications for cloud1

Name for collectd telemetry for cloud1

AMQP Address for collectd telemetry for cloud1

Name for collectd notifications for cloud1

AMQP Address for collectd notifications for cloud1

453 Creating the OpenStack environment file

To label traffic according to the cloud of origin you must create a configuration with cloud-specificinstance names Create an stf-connectorsyaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefixscheme For more information see Section 451 ldquoPlanning AMQP address prefixesrdquo

notify 4 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-telemetry-cloud1 5spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-telemetry 6 debug false prefetch 15000 serviceType metrics size 1 useTimestamp true

CHAPTER 4 ADVANCED FEATURES

43

WARNING

Remove enable-stfyaml and ceilometer-write-qdryaml environment filereferences from your overcloud deployment This configuration is redundantand results in duplicate information being sent from each cloud node

Procedure

1 Create the stf-connectorsyaml file and modify it to match the AMQP address that youwant for this cloud deployment

resource_registry OSTripleOServicesCollectd usrshareopenstack-tripleo-heat-templatesdeploymentmetricscollectd-container-puppetyaml OSTripleOServicesMetricsQdr usrshareopenstack-tripleo-heat-templatesdeploymentmetricsqdr-container-puppetyaml OSTripleOServicesCeilometerAgentCentral usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-central-container-puppetyaml OSTripleOServicesCeilometerAgentNotification usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-notification-container-puppetyaml OSTripleOServicesCeilometerAgentIpmi usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-ipmi-container-puppetyaml OSTripleOServicesComputeCeilometerAgent usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-compute-container-puppetyaml OSTripleOServicesRedis usrshareopenstack-tripleo-heat-templatesdeploymentdatabaseredis-pacemaker-puppetyaml

parameter_defaults EnableSTF true

EventPipelinePublishers [] CeilometerEnablePanko false CeilometerQdrPublishEvents true CeilometerQdrEventsConfig driver amqp topic cloud1-event 1

CollectdConnectionType amqp1 CollectdAmqpInterval 5 CollectdDefaultPollingInterval 5

CollectdAmqpInstances cloud1-notify 2 notify true format JSON presettle false cloud1-telemetry 3 format JSON presettle true

MetricsQdrAddresses

Red Hat OpenStack Platform 160 Service Telemetry Framework

44

+ lt1gt Define the topic for Ceilometer events This value is the address format of anycastceilometercloud1-eventsample lt2gt Define the topic for collectd events This value is theformat of collectdcloud1-notify lt3gt Define the topic for collectd metrics This value is the formatof collectdcloud1-telemetry lt4gt Adjust the MetricsQdrConnectors host to the address of the STFroute

1 Ensure that the naming convention in the stf-connectorsyaml file aligns with the specamqpUrl field in the Smart Gateway configuration For example configure the CeilometerQdrEventsConfigtopic field to a value of cloud1-event

2 Save the file in a directory for custom environment files for example homestackcustom_templates

3 Source the authentication file

4 Include the stf-connectorsyaml file in the overcloud deployment command along with anyother environment files relevant to your environment

Additional resourcesFor information about validating the deployment see Section 333 ldquoValidating client-sideinstallationrdquo

454 Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it wasscraped from You can use this label to query data from a specific cloud

To query data from a specific cloud use a Prometheus promql query that matches the associatedservice label for example collectd_uptimeservice=stf-default-collectd-telemetry-cloud1-smartgateway

- prefix collectd distribution multicast - prefix anycastceilometer distribution multicast

MetricsQdrSSLProfiles - name sslProfile

MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch 4 port 443 role edge verifyHostname false sslProfile sslProfile

[stackundercloud-0 ~]$ source stackrc

(undercloud) [stackundercloud-0 ~]$

(undercloud) [stackundercloud-0 ~]$ openstack overcloud deploy --templates usrshareopenstack-tripleo-heat-templates -e homestackcustom_templatesstf-connectorsyaml

CHAPTER 4 ADVANCED FEATURES

45

46 EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storingdata in your Red Hat OpenShift Container Platform (OCP) cluster Ephemeral storage is notrecommended in a production environment due to the volatility of the data in the platform whenoperating correctly and as designed For example restarting a pod or rescheduling the workload toanother node results in the loss of any local data written since the pod started

If you enable ephemeral storage in STF the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests

461 Configuring ephemeral storage

To configure STF for ephemeral storage add storageEphemeralEnabled true to theServiceTelemetry object in OCP You can add storageEphemeralEnabled true at installation timeor if you already deployed STF complete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object

4 Add the storageEphemeralEnabled true parameter to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry stf-default

spec eventsEnabled true metricsEnabled true storageEphemeralEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

46

APPENDIX A COLLECTD PLUG-INSThis section contains a complete list of collectd plug-ins and configurations forRed Hat OpenStack Platform 160

collectd-aggregation

collectdpluginaggregationaggregators

collectdpluginaggregationinterval

collectd-amqp1

collectd-apache

collectdpluginapacheinstances (ex localhost rArr url rArr httplocalhostmod_statusauto)

collectdpluginapacheinterval

collectd-apcups

collectd-battery

collectdpluginbatteryvalues_percentage

collectdpluginbatteryreport_degraded

collectdpluginbatteryquery_state_fs

collectdpluginbatteryinterval

collectd-ceph

collectdplugincephdaemons

collectdplugincephlongrunavglatency

collectdplugincephconvertspecialmetrictypes

collectd-cgroups

collectdplugincgroupsignore_selected

collectdplugincgroupsinterval

collectd-conntrack

None

collectd-contextswitch

collectdplugincontextswitchinterval

collectd-cpu

APPENDIX A COLLECTD PLUG-INS

47

collectdplugincpureportbystate

collectdplugincpureportbycpu

collectdplugincpuvaluespercentage

collectdplugincpureportnumcpu

collectdplugincpureportgueststate

collectdplugincpusubtractgueststate

collectdplugincpuinterval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectdplugincsvdatadir

collectdplugincsvstorerates

collectdplugincsvinterval

collectd-df

collectdplugindfdevices

collectdplugindffstypes

collectdplugindfignoreselected

collectdplugindfmountpoints

collectdplugindfreportbydevice

collectdplugindfreportinodes

collectdplugindfreportreserved

collectdplugindfvaluesabsolute

collectdplugindfvaluespercentage

collectdplugindfinterval

collectd-disk

collectdplugindiskdisks

collectdplugindiskignoreselected

collectdplugindiskudevnameattr

Red Hat OpenStack Platform 160 Service Telemetry Framework

48

collectdplugindiskinterval

collectd-entropy

collectdpluginentropyinterval

collectd-ethstat

collectdpluginethstatinterfaces

collectdpluginethstatmaps

collectdpluginethstatmappedonly

collectdpluginethstatinterval

collectd-exec

collectdpluginexeccommands

collectdpluginexeccommands_defaults

collectdpluginexecglobals

collectdpluginexecinterval

collectd-fhcount

collectdpluginfhcountvaluesabsolute

collectdpluginfhcountvaluespercentage

collectdpluginfhcountinterval

collectd-filecount

collectdpluginfilecountdirectories

collectdpluginfilecountinterval

collectd-fscache

None

collectd-hddtemp

collectdpluginhddtemphost

collectdpluginhddtempport

collectdpluginhddtempinterval

collectd-hugepages

collectdpluginhugepagesreport_per_node_hp

APPENDIX A COLLECTD PLUG-INS

49

collectdpluginhugepagesreport_root_hp

collectdpluginhugepagesvalues_pages

collectdpluginhugepagesvalues_bytes

collectdpluginhugepagesvalues_percentage

collectdpluginhugepagesinterval

collectd-intel_rdt

collectd-interface

collectdplugininterfaceinterfaces

collectdplugininterfaceignoreselected

collectdplugininterfacereportinactive

Collectdplugininterfaceinterval

collectd-ipc

None

collectd-ipmi

collectdpluginipmiignore_selected

collectdpluginipminotify_sensor_add

collectdpluginipminotify_sensor_remove

collectdpluginipminotify_sensor_not_present

collectdpluginipmisensors

collectdpluginipmiinterval

collectd-irq

collectdpluginirqirqs

collectdpluginirqignoreselected

collectdpluginirqinterval

collectd-load

collectdpluginloadreport_relative

collectdpluginloadinterval

collectd-logfile

collectdpluginlogfilelog_level

Red Hat OpenStack Platform 160 Service Telemetry Framework

50

collectdpluginlogfilelog_level

collectdpluginlogfilelog_file

collectdpluginlogfilelog_timestamp

collectdpluginlogfileprint_severity

collectdpluginlogfileinterval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectdpluginmemcachedinstances

collectdpluginmemcachedinterval

collectd-memory

collectdpluginmemoryvaluesabsolute

collectdpluginmemoryvaluespercentage

collectdpluginmemoryinterval collectd-multimeter

collectd-multimeter

collectd-mysql

collectdpluginmysqlinterval

collectd-netlink

collectdpluginnetlinkinterfaces

collectdpluginnetlinkverboseinterfaces

collectdpluginnetlinkqdiscs

collectdpluginnetlinkclasses

collectdpluginnetlinkfilters

collectdpluginnetlinkignoreselected

collectdpluginnetlinkinterval

collectd-network

collectdpluginnetworktimetolive

collectdpluginnetworkmaxpacketsize

APPENDIX A COLLECTD PLUG-INS

51

collectdpluginnetworkforward

collectdpluginnetworkreportstats

collectdpluginnetworklisteners

collectdpluginnetworkservers

collectdpluginnetworkinterval

collectd-nfs

collectdpluginnfsinterval

collectd-ntpd

collectdpluginntpdhost

collectdpluginntpdport

collectdpluginntpdreverselookups

collectdpluginntpdincludeunitid

collectdpluginntpdinterval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectdpluginopenvpnstatusfile

collectdpluginopenvpnimprovednamingschema

collectdpluginopenvpncollectcompression

collectdpluginopenvpncollectindividualusers

collectdpluginopenvpncollectusercount

collectdpluginopenvpninterval

collectd-ovs_events

collectdpluginovs_eventsaddress

collectdpluginovs_eventsdispatch

collectdpluginovs_eventsinterfaces

collectdpluginovs_eventssend_notification

Red Hat OpenStack Platform 160 Service Telemetry Framework

52

collectdpluginovs_events$port

collectdpluginovs_eventssocket

collectd-ovs_stats

collectdpluginovs_statsaddress

collectdpluginovs_statsbridges

collectdpluginovs_statsport

collectdpluginovs_statssocket

collectd-ping

collectdpluginpinghosts

collectdpluginpingtimeout

collectdpluginpingttl

collectdpluginpingsource_address

collectdpluginpingdevice

collectdpluginpingmax_missed

collectdpluginpingsize

collectdpluginpinginterval

collectd-powerdns

collectdpluginpowerdnsinterval

collectdpluginpowerdnsservers

collectdpluginpowerdnsrecursors

collectdpluginpowerdnslocal_socket

collectdpluginpowerdnsinterval

collectd-processes

collectdpluginprocessesprocesses

collectdpluginprocessesprocess_matches

collectdpluginprocessescollect_context_switch

collectdpluginprocessescollect_file_descriptor

collectdpluginprocessescollect_memory_maps

collectdpluginpowerdnsinterval

APPENDIX A COLLECTD PLUG-INS

53

collectd-protocols

collectdpluginprotocolsignoreselected

collectdpluginprotocolsvalues

collectd-python

collectd-serial

collectd-smart

collectdpluginsmartdisks

collectdpluginsmartignoreselected

collectdpluginsmartinterval

collectd-snmp_agent

collectd-statsd

collectdpluginstatsdhost

collectdpluginstatsdport

collectdpluginstatsddeletecounters

collectdpluginstatsddeletetimers

collectdpluginstatsddeletegauges

collectdpluginstatsddeletesets

collectdpluginstatsdcountersum

collectdpluginstatsdtimerpercentile

collectdpluginstatsdtimerlower

collectdpluginstatsdtimerupper

collectdpluginstatsdtimersum

collectdpluginstatsdtimercount

collectdpluginstatsdinterval

collectd-swap

collectdpluginswapreportbydevice

collectdpluginswapreportbytes

collectdpluginswapvaluesabsolute

Red Hat OpenStack Platform 160 Service Telemetry Framework

54

collectdpluginswapvaluespercentage

collectdpluginswapreportio

collectdpluginswapinterval

collectd-syslog

collectdpluginsysloglog_level

collectdpluginsyslognotify_level

collectdpluginsysloginterval

collectd-table

collectdplugintabletables

collectdplugintableinterval

collectd-tail

collectdplugintailfiles

collectdplugintailinterval

collectd-tail_csv

collectdplugintail_csvmetrics

collectdplugintail_csvfiles

collectd-tcpconns

collectdplugintcpconnslocalports

collectdplugintcpconnsremoteports

collectdplugintcpconnslistening

collectdplugintcpconnsallportssummary

collectdplugintcpconnsinterval

collectd-ted

collectd-thermal

collectdpluginthermaldevices

collectdpluginthermalignoreselected

collectdpluginthermalinterval

collectd-threshold

APPENDIX A COLLECTD PLUG-INS

55

collectdpluginthresholdtypes

collectdpluginthresholdplugins

collectdpluginthresholdhosts

collectdpluginthresholdinterval

collectd-turbostat

collectdpluginturbostatcore_c_states

collectdpluginturbostatpackage_c_states

collectdpluginturbostatsystem_management_interrupt

collectdpluginturbostatdigital_temperature_sensor

collectdpluginturbostattcc_activation_temp

collectdpluginturbostatrunning_average_power_limit

collectdpluginturbostatlogical_core_names

collectd-unixsock

collectd-uptime

collectdpluginuptimeinterval

collectd-users

collectdpluginusersinterval

collectd-uuid

collectdpluginuuiduuid_file

collectdpluginuuidinterval

collectd-virt

collectdpluginvirtconnection

collectdpluginvirtrefresh_interval

collectdpluginvirtdomain

collectdpluginvirtblock_device

collectdpluginvirtinterface_device

collectdpluginvirtignore_selected

collectdpluginvirthostname_format

Red Hat OpenStack Platform 160 Service Telemetry Framework

56

collectdpluginvirtinterface_format

collectdpluginvirtextra_stats

collectdpluginvirtinterval

collectd-vmem

collectdpluginvmemverbose

collectdpluginvmeminterval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectdpluginwrite_graphitecarbons

collectdpluginwrite_graphitecarbon_defaults

collectdpluginwrite_graphiteglobals

collectd-write_kafka

collectdpluginwrite_kafkakafka_host

collectdpluginwrite_kafkakafka_port

collectdpluginwrite_kafkakafka_hosts

collectdpluginwrite_kafkatopics

collectd-write_log

collectdpluginwrite_logformat

collectd-zfs_arc

None

APPENDIX A COLLECTD PLUG-INS

57

  • Table of Contents
  • CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
    • 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
    • 12 INSTALLATION SIZE
      • CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
        • 21 THE CORE COMPONENTS OF STF
        • 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
          • 221 Persistent volumes
            • 2211 Using ephemeral storage
              • 222 Resource allocation
              • 223 Node tuning operator
                • 23 DEPLOYING STF TO THE OCP ENVIRONMENT
                  • 231 Deploying STF to the OCP environment with ElasticSearch
                  • 232 Deploying STF to the OCP environment without ElasticSearch
                  • 233 Creating a namespace
                  • 234 Creating an OperatorGroup
                  • 235 Enabling the OperatorHubio Community Catalog Source
                  • 236 Enabling Red Hat STF Operator Source
                  • 237 Subscribing to the AMQ Certificate Manager Operator
                  • 238 Subscribing to the Elastic Cloud on Kubernetes Operator
                  • 239 Subscribing to the Service Telemetry Operator
                  • 2310 Creating a ServiceTelemetry object in OCP
                    • 24 REMOVING STF FROM THE OCP ENVIRONMENT
                      • 241 Deleting the namespace
                      • 242 Removing the OperatorSource
                          • CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
                            • 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
                            • 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
                            • 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
                              • 331 Retrieving the AMQ Interconnect route address
                              • 332 Configuring the STF connection for the overcloud
                              • 333 Validating client-side installation
                                  • CHAPTER 4 ADVANCED FEATURES
                                    • 41 CUSTOMIZING THE DEPLOYMENT
                                      • 411 Manifest override parameters
                                      • 412 Overriding a managed manifest
                                        • Procedure
                                            • 42 ALERTS
                                              • Additional resources
                                              • 421 Creating an alert rule in Prometheus
                                                • Procedure
                                                • Additional resources
                                                  • 422 Configuring custom alerts
                                                    • Procedure
                                                    • Additional resources
                                                      • 423 Creating an alert route in Alertmanager
                                                        • Procedure
                                                        • Additional resources
                                                            • 43 HIGH AVAILABILITY
                                                              • 431 Configuring high availability
                                                                • Procedure
                                                                    • 44 DASHBOARDS
                                                                      • 441 Setting up Grafana to host the dashboard
                                                                        • 4411 Viewing and editing queries
                                                                          • 442 The Grafana infrastructure dashboard
                                                                            • 4421 Top panels
                                                                            • 4422 Networking panels
                                                                            • 4423 CPU panels
                                                                            • 4424 Memory panels
                                                                            • 4425 Diskfile system
                                                                                • 45 CONFIGURING MULTIPLE CLOUDS
                                                                                  • 451 Planning AMQP address prefixes
                                                                                  • 452 Deploying Smart Gateways
                                                                                    • Procedure
                                                                                    • 4521 Example manifests
                                                                                      • 453 Creating the OpenStack environment file
                                                                                        • Procedure
                                                                                        • Additional resources
                                                                                          • 454 Querying metrics data from multiple clouds
                                                                                            • 46 EPHEMERAL STORAGE
                                                                                              • 461 Configuring ephemeral storage
                                                                                                  • APPENDIX A COLLECTD PLUG-INS
Page 5: Red Hat OpenStack Platform 16 · CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK 3.2. DEPLOYING WITH ROUTED L3 NETWORKS 3.3. CONFIGURING RED HAT OPENSTACK PLATFORM

431 Configuring high availabilityProcedure

44 DASHBOARDS441 Setting up Grafana to host the dashboard

4411 Viewing and editing queries442 The Grafana infrastructure dashboard

4421 Top panels4422 Networking panels4423 CPU panels4424 Memory panels4425 Diskfile system

45 CONFIGURING MULTIPLE CLOUDS451 Planning AMQP address prefixes452 Deploying Smart Gateways

Procedure4521 Example manifests

453 Creating the OpenStack environment fileProcedureAdditional resources

454 Querying metrics data from multiple clouds46 EPHEMERAL STORAGE

461 Configuring ephemeral storage

APPENDIX A COLLECTD PLUG-INS

343435353636363738383839404141

42434445454646

47

Red Hat OpenStack Platform 160 Service Telemetry Framework

2

Table of Contents

3

CHAPTER 1 INTRODUCTION TOSERVICE TELEMETRY FRAMEWORK

Service Telemetry Framework (STF) provides automated collection of measurements and data fromremote clients - Red Hat OpenStack Platform or third-party nodes - and transmission of thatinformation to a centralized receiving Red Hat OpenShift Container Platform (OCP) deployment forstorage retrieval and monitoring The data can be either of two types

Metric

a numeric measurement of an application or system

Event

irregular and discrete occurrences that happen in a system

The collection components that are required on the clients are lightweight The multicast message busthat is shared by all clients and the deployment provides fast and reliable data transport Other modularcomponents for receiving and storing data are deployed in containers on OCP

STF provides access to monitoring functions such as alert generation visualization through dashboardsand single source of truth telemetry analysis to support orchestration

11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE

Service Telemetry Framework (STF) uses the components described in Table 11 ldquoSTF componentsrdquo

Table 11 STF components

Client Component Server(OCP)

yes An AMQP 1x compatible messaging bus to shuttle the metrics to STF forstorage in Prometheus

yes

no Smart Gateway to pick metrics and events from the AMQP 1x bus and todeliver events to ElasticSearch or to provide metrics to Prometheus

yes

no Prometheus as time-series data storage yes

no ElasticSearch as events data storage yes

yes collectd to collect infrastructure metrics and events no

yes Ceilometer to collect Red Hat OpenStack Platform metrics and events no

Figure 11 Service Telemetry Framework architecture overview

Red Hat OpenStack Platform 160 Service Telemetry Framework

4

Figure 11 Service Telemetry Framework architecture overview

NOTE

The Service Telemetry Framework data collection components collectd and Ceilometerand the transport components AMQ Interconnect and Smart Gateway are fullysupported The data storage components Prometheus and ElasticSearch including theOperator artifacts and visualization component Grafana are community-supported andare not officially supported

For metrics on the client side collectd collects high-resolution metrics collectd delivers the data toPrometheus by using the AMQP1 plugin which places the data onto the message bus On the serverside a Golang application called the Smart Gateway takes the data stream from the bus and exposes itas a local scrape endpoint for Prometheus

If you plan to collect and store events collectd or Ceilometer delivers event data to the server side byusing the AMQP1 plugin which places the data onto the message bus Another Smart Gateway writesthe data to the ElasticSearch datastore

Server-side STF monitoring infrastructure consists of the following layers

Service Telemetry Framework 1x (STF)

Red Hat OpenShift Container Platform (OCP)

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

5

Infrastructure platform

Figure 12 Server-side STF monitoring infrastructure

For more information about how to deploy Red Hat OpenShift Container Platform see the OCPproduct documentation You can install OCP on cloud platforms or on bare metal For more informationabout STF performance and scaling see httpsaccessredhatcomarticles4907241

NOTE

Do not install OCP on the same infrastructure that you want to monitor

12 INSTALLATION SIZE

The size of your Red Hat OpenShift Container Platform installation depends on the following factors

The number of nodes you want to monitor

The number of metrics you want to collect

The resolution of metrics

The length of time that you want to store the data

Installation of Service Telemetry Framework (STF) depends on the existingRed Hat OpenShift Container Platform environment Ensure that you install monitoring for

Red Hat OpenStack Platform 160 Service Telemetry Framework

6

Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platformenvironment You can install Red Hat OpenShift Container Platform (OCP) on baremetal or othersupported cloud platforms For more information about installing OCP see OpenShift ContainerPlatform 43 Documentation

The size of your OCP environment depends on the infrastructure you select For more informationabout minimum resources requirements when installing OCP on baremetal see Minimum resourcerequirements in the Installing a cluster on bare metal guide For installation requirements of the variouspublic and private cloud platforms which you can install see the corresponding installationdocumentation for your cloud platform of choice

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

7

CHAPTER 2 INSTALLING THE CORE COMPONENTS OFSERVICE TELEMETRY FRAMEWORK

Before you install Service Telemetry Framework (STF) ensure thatRed Hat OpenShift Container Platform (OCP) version 4x is running and that you understand the corecomponents of the framework As part of the OCP installation planning process ensure that theadministrator provides persistent storage and enough resources to run the STF components on top ofthe OCP environment

WARNING

Red Hat OpenShift Container Platform version 43 or later is currently required for asuccessful installation of STF

21 THE CORE COMPONENTS OF STF

The following STF core components are managed by Operators

Prometheus and AlertManager

ElasticSearch

Smart Gateway

AMQ Interconnect

Each component has a corresponding Operator that you can use to load the various applicationcomponents and objects

Additional resources

For more information about Operators see the Understanding Operators guide

22 PREPARING YOUR OCP ENVIRONMENT FOR STF

As you prepare your OCP environment for STF you must plan for persistent storage adequateresources and event storage

Ensure that persistent storage is available in your Red Hat OpenShift Container Platform clusterto permit a production grade deployment For more information see Section 221 ldquoPersistentvolumesrdquo

Ensure that enough resources are available to run the Operators and the application containersFor more information see Section 222 ldquoResource allocationrdquo

To install ElasticSearch you must use a community catalog source If you do not want to use acommunity catalog or if you do not want to store events see Section 23 ldquoDeploying STF to theOCP environmentrdquo

STF uses ElasticSearch to store events which requires a larger than normal

Red Hat OpenStack Platform 160 Service Telemetry Framework

8

vmmax_map_count The vmmax_map_count value is set by default inRed Hat OpenShift Container Platform For more information about how to edit the value of vmmax_map_count see Section 223 ldquoNode tuning operatorrdquo

221 Persistent volumes

STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus andElasticSearch can store metrics and events

Additional resources

For more information about configuring persistent storage for OCP see Understanding persistentstorage

2211 Using ephemeral storage

WARNING

You can use ephemeral storage with STF However if you use ephemeral storageyou might experience data loss if a pod is restarted updated or rescheduled ontoanother node Use ephemeral storage only for development or testing and notproduction environments

Procedure

To enable ephemeral storage for STF set storageEphemeralEnabled true in your ServiceTelemetry manifest

Additional resources

For more information about enabling ephemeral storage for STF see Section 461 ldquoConfiguringephemeral storagerdquo

222 Resource allocation

To enable the scheduling of pods within the OCP infrastructure you need resources for thecomponents that are running If you do not allocate enough resources pods remain in a Pending statebecause they cannot be scheduled

The amount of resources that you require to run STF depends on your environment and the number ofnodes and clouds that you want to monitor

Additional resources

For recommendations about sizing for metrics collection seehttpsaccessredhatcomarticles4907241

For information about sizing requirements for ElasticSearch seehttpswwwelasticcoguideencloud-on-k8scurrentk8s-managing-compute-resourceshtml

223 Node tuning operator

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

9

STF uses ElasticSearch to store events which requires a larger than normal vmmax_map_count The vmmax_map_count value is set by default in Red Hat OpenShift Container Platform

If you want to edit the value of vmmax_map_count you cannot apply node tuning manually using the sysctl command because Red Hat OpenShift Container Platform manages nodes directly To configurevalues and apply them to the infrastructure you must use the node tuning operator For moreinformation see Using the Node Tuning Operator

In an OCP deployment the default node tuning operator specification provides the required profiles forElasticSearch workloads or pods scheduled on nodes To view the default cluster node tuningspecification run the following command

The output of the default specification is documented at Default profiles set on a cluster Theassignment of profiles is managed in the recommend section where profiles are applied to a node whencertain conditions are met When scheduling ElasticSearch to a node in STF one of the following profilesis applied

openshift-control-plane-es

openshift-node-es

When scheduling an ElasticSearch pod there must be a label present that matches tunedopenshiftioelasticsearch If the label is present one of the two profiles is assigned to the podNo action is required by the administrator if you use the recommended Operator for ElasticSearch Ifyou use a custom-deployed ElasticSearch with STF ensure that you add the tunedopenshiftioelasticsearch label to all scheduled pods

Additional resources

For more information about virtual memory usage by ElasticSearch seehttpswwwelasticcoguideenelasticsearchreferencecurrentvm-max-map-counthtml

For more information about how the profiles are applied to nodes see Custom tuning specification

23 DEPLOYING STF TO THE OCP ENVIRONMENT

You can deploy STF to the OCP environment in one of two ways

Deploy STF and store events with ElasticSearch For more information see Section 231ldquoDeploying STF to the OCP environment with ElasticSearchrdquo

Deploy STF without ElasticSearch and disable events support For more information seeSection 232 ldquoDeploying STF to the OCP environment without ElasticSearchrdquo

231 Deploying STF to the OCP environment with ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 235 ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

oc get Tuneddefault -o yaml -n openshift-cluster-node-tuning-operator

Red Hat OpenStack Platform 160 Service Telemetry Framework

10

4 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

5 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

6 Section 238 ldquoSubscribing to the Elastic Cloud on Kubernetes Operatorrdquo

7 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

8 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

232 Deploying STF to the OCP environment without ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

4 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

5 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

6 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

233 Creating a namespace

Create a namespace to hold the STF components The service-telemetry namespace is usedthroughout the documentation

Procedure

Enter the following command

234 Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods

Procedure

Enter the following command

oc new-project service-telemetry

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorGroupmetadata name service-telemetry-operator-group namespace service-telemetryspec targetNamespaces - service-telemetryEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

11

Additional resources

For more information see OperatorGroups

235 Enabling the OperatorHubio Community Catalog Source

Before you install ElasticSearch you must have access to the resources on the OperatorHubioCommunity Catalog Source

Procedure

Enter the following command

236 Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform you must enable the operatorsource

Procedure

1 Install an OperatorSource that contains the Service Telemetry Operator and the SmartGateway Operator

2 To validate the creation of your OperatorSource use the oc get operatorsources command A

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind CatalogSourcemetadata name operatorhubio-operators namespace openshift-marketplacespec sourceType grpc image quayiooperator-frameworkupstream-community-operatorslatest displayName OperatorHubio Operators publisher OperatorHubioEOF

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorSourcemetadata labels opsrc-provider redhat-operators-stf name redhat-operators-stf namespace openshift-marketplacespec authorizationToken displayName Red Hat STF Operators endpoint httpsquayiocnr publisher Red Hat registryNamespace redhat-operators-stf type appregistryEOF

Red Hat OpenStack Platform 160 Service Telemetry Framework

12

2 To validate the creation of your OperatorSource use the oc get operatorsources command Asuccessful import results in the MESSAGE field returning a result of The object has been successfully reconciled

3 To validate that the Operators are available from the catalog use the oc get packagemanifestcommand

237 Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STFcomponents because the AMQ Certificate Manager Operator runs globally-scoped and is notcompatible with the dependency management of Operator Lifecycle Manager when used with othernamespace-scoped operators

Procedure

1 Subscribe to the AMQ Certificate Manager Operator create the subscription and validate theAMQ7 Certificate Manager

NOTE

The AMQ Certificate Manager is installed globally for all namespaces so the namespace value provided is openshift-operators You might not see your amq7-cert-managerv100 ClusterServiceVersion in the service-telemetrynamespace for a few minutes until the processing executes against thenamespace

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME TYPE ENDPOINT REGISTRY DISPLAYNAME PUBLISHER STATUS MESSAGEredhat-operators-stf appregistry httpsquayiocnr redhat-operators-stf Red Hat STF Operators Red Hat Succeeded The object has been successfully reconciled

$ oc get packagemanifests | grep Red Hat STF

smartgateway-operator Red Hat STF Operators 2m50sservicetelemetry-operator Red Hat STF Operators 2m50s

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name amq7-cert-manager namespace openshift-operatorsspec channel alpha installPlanApproval Automatic name amq7-cert-manager source redhat-operators sourceNamespace openshift-marketplaceEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

13

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-managerv100 has a phase Succeeded

238 Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator and if you plan to store events in ElasticSearch youmust enable the Elastic Cloud Kubernetes Operator

Procedure

1 Apply the following manifest to your OCP environment to enable the Elastic Cloud onKubernetes Operator

2 To verify that the ClusterServiceVersion for ElasticSearch Cloud on Kubernetes succeededenter the oc get csv command

239 Subscribing to the Service Telemetry Operator

To instantiate an STF instance create the ServiceTelemetry object to allow the Service TelemetryOperator to create the environment

Procedure

1 To create the Service Telemetry Operator subscription enter the oc apply -f command

$ oc get --namespace openshift-operators csv

NAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name elastic-cloud-eck namespace service-telemetryspec channel stable installPlanApproval Automatic name elastic-cloud-eck source operatorhubio-operators sourceNamespace openshift-marketplaceEOF

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1

Red Hat OpenStack Platform 160 Service Telemetry Framework

14

2 To validate the Service Telemetry Operator and the dependent operators enter the followingcommand

2310 Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework you must create an instance of ServiceTelemetry in OCPBy default eventsEnabled is set to false If you do not want to store events in ElasticSearch ensurethat eventsEnabled is set to false For more information see Section 232 ldquoDeploying STF to the OCPenvironment without ElasticSearchrdquo

The following core parameters are available for a ServiceTelemetry manifest

Table 21 Core parameters for a ServiceTelemetry manifest

Parameter Description Default Value

eventsEnabled Enable events support in STFRequires prerequisite steps toensure ElasticSearch can bestarted For more information seeSection 238 ldquoSubscribing to theElastic Cloud on KubernetesOperatorrdquo

false

metricsEnabled Enable metrics support in STF true

kind Subscriptionmetadata name servicetelemetry-operator namespace service-telemetryspec channel stable installPlanApproval Automatic name servicetelemetry-operator source redhat-operators-stf sourceNamespace openshift-marketplaceEOF

$ oc get csv --namespace service-telemetryNAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeededamq7-interconnect-operatorv120 Red Hat Integration - AMQ Interconnect 120 Succeededelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeededprometheusoperator0370 Prometheus Operator 0370 prometheusoperator0320 Succeededservice-telemetry-operatorv102 Service Telemetry Operator 102 service-telemetry-operatorv101 Succeededsmart-gateway-operatorv101 Smart Gateway Operator 101 smart-gateway-operatorv100 Succeeded

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

15

highAvailabilityEnabled Enable high availability in STF Formore information see Section 43ldquoHigh availabilityrdquo

false

storageEphemeralEnabled Enable ephemeral storagesupport in STF For moreinformation see Section 46ldquoEphemeral storagerdquo

false

Parameter Description Default Value

Procedure

1 To store events in ElasticSearch set eventsEnabled to true during deployment

2 To view the STF deployment logs in the Service Telemetry Operator use the oc logscommand

PLAY RECAP localhost ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3 View the pods and the status of each pod to determine that all workloads are operatingnominally

NOTE

If you set eventsEnabled true the notification Smart Gateways will Error andCrashLoopBackOff for a period of time before ElasticSearch starts

oc apply -f - ltltEOFapiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec eventsEnabled true metricsEnabled trueEOF

oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

$ oc get pods

NAME READY STATUS RESTARTS AGEalertmanager-stf-default-0 22 Running 0 26melastic-operator-645dc8b8ff-jwnzt 11 Running 0 88melasticsearch-es-default-0 11 Running 0 26minterconnect-operator-6fd49d9fb9-4bl92 11 Running 0

Red Hat OpenStack Platform 160 Service Telemetry Framework

16

24 REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality

Complete the following tasks

1 Section 241 ldquoDeleting the namespacerdquo

2 Section 242 ldquoRemoving the OperatorSourcerdquo

241 Deleting the namespace

To remove the operational resources for STF from OCP delete the namespace

Procedure

1 Run the oc delete command

2 Verify that the resources have been deleted from the namespace

242 Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again delete the OperatorSourceWhen you remove the OperatorSource PackageManifests related to STF are removed from theOperator Lifecycle Manager catalog

Procedure

1 Delete the OperatorSource

46mprometheus-operator-bf7d97fb9-kwnlx 11 Running 0 46mprometheus-stf-default-0 33 Running 0 26mservice-telemetry-operator-54f4c99d9b-k7ll6 22 Running 0 46msmart-gateway-operator-7ff58bcf94-66rvx 22 Running 0 46mstf-default-ceilometer-notification-smartgateway-6675df547q4lbj 11 Running 0 26mstf-default-collectd-notification-smartgateway-698c87fbb7-xj528 11 Running 0 26mstf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn 11 Running 0 26mstf-default-interconnect-7458fd4d69-nqbfs 11 Running 0 26m

oc delete project service-telemetry

$ oc get allNo resources found

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

17

2 Verify that the STF PackageManifests are removed from the platform If successful thefollowing command returns no result

3 If you enabled the OperatorHubio Community Catalog Source during the installationprocess and you no longer need this catalog source delete it

Additional resources

For more information about the OperatorHubio Community Catalog Source see Section 23ldquoDeploying STF to the OCP environmentrdquo

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stfoperatorsourceoperatorscoreoscom redhat-operators-stf deleted

$ oc get packagemanifests | grep Red Hat STF

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operatorscatalogsourceoperatorscoreoscom operatorhubio-operators deleted

Red Hat OpenStack Platform 160 Service Telemetry Framework

18

CHAPTER 3 COMPLETING THESERVICE TELEMETRY FRAMEWORK CONFIGURATION

31 CONNECTING RED HAT OPENSTACK PLATFORM TOSERVICE TELEMETRY FRAMEWORK

To collect metrics events or both and to send them to the Service Telemetry Framework (STF)storage domain you must configure the Red Hat OpenStack Platform overcloud to enable datacollection and transport

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes thatemploy routed L3 domains such as distributed compute node (DCN) or spine-leaf see Section 32ldquoDeploying to non-standard network topologiesrdquo

32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network you must makeconfiguration adjustments so that AMQ Interconnect can transport data to theService Telemetry Framework (STF) server instance This scenario is typical in a spine-leaf or aDCN topology For more information about DCN configuration see the Spine Leaf Networkingguide

If you use STF with Red Hat OpenStack Platform 160 and plan to monitor your Ceph Block orObject storage nodes you must make configuration changes that are similar to the configurationchanges that you make to the spine-leaf and DCN network configuration To monitor Ceph nodesuse the CephStorageExtraConfig parameter to define which network interface to load into theAMQ Interconnect and collectd configuration files

Similarly you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters ifyour environment uses Block and Object storage roles

The deployment of a spine-leaf topology involves creating roles and networks then assigningthose networks to the available roles When you configure data collection and transport for STF foran Red Hat OpenStack Platform deployment the default network for roles is InternalApi For CephBlock and Object storage roles the default network is Storage Because a spine-leaf configurationcan result in different networks being assigned to different Leaf groupings and those names aretypically unique additional configuration is required in the parameter_defaults section of theRed Hat OpenStack Platform environment files

Procedure

1 Document which networks are available for each of the Leaf roles For examples of networkname definitions see Creating a network data file in the Spine Leaf Networking guide Formore information about the creation of the Leaf groupings (roles) and assignment of thenetworks to those groupings see Creating a roles data file in the Spine Leaf Networkingguide

CephStorageExtraConfig tripleoprofilebasemetricscollectdamqp_host hiera(storage) tripleoprofilebasemetricsqdrlistener_addr hiera(storage) tripleoprofilebaseceilometeragentnotificationnotifier_host_addr hiera(storage)

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

19

2 Add the following configuration example to the ExtraConfig section for each of the leafroles In this example internal_api_subnet is the value defined in the name_lowerparameter of your network definition (with _subnet appended to the name for Leaf 0) andis the network to which the ComputeLeaf0 leaf role is connected In this case the networkidentification of 0 corresponds to the Compute role for leaf 0 and represents a value thatis different from the default internal API network nameFor the ComputeLeaf0 leaf role specify extra configuration to perform a hiera lookup todetermine which network interface for a particular network to assign to the collectd AMQPhost parameter Perform the same configuration for the AMQ Interconnect listeneraddress parameter

Additional leaf roles typically replace _subnet with _leafN where N represents a uniqueindentifier for the leaf

This example configuration is on a CephStorage leaf role

33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUDFOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud you must configure the data collectionapplications and the data transport to STF and deploy the overcloud

To configure the Red Hat OpenStack Platform overcloud complete the following tasks

1 Section 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

2 Section 332 ldquoConfiguring the STF connection for the overcloudrdquo

3 Section 333 ldquoValidating client-side installationrdquo

331 Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF you must provide theAMQ Interconnect route address in the STF connection file

Procedure

1 Log in to your Red Hat OpenShift Container Platform (OCP) environment

2 In the service-telemetry project retrieve the AMQ Interconnect route address

ComputeLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_subnet)

ComputeLeaf1ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_leaf1)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_leaf1)

CephStorageLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(storage_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(storage_subnet)

Red Hat OpenStack Platform 160 Service Telemetry Framework

20

NOTE

If your STF installation differs from the documentation ensure that youretrieve the correct AMQ Interconnect route address

332 Configuring the STF connection for the overcloud

To configure the STF connection you must create a file that contains the connection configurationof the AMQ Interconnect for the overcloud to the STF deployment Enable the collection of eventsand storage of the events in STF and deploy the overcloud

Procedure

1 Log in to the Red Hat OpenStack Platform undercloud as the stack user

2 Create a configuration file called stf-connectorsyaml in the homestack directory

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all dataingestion and data storage components for single cloud deployments Toshare the data storage domain with multiple clouds see Section 45ldquoConfiguring multiple cloudsrdquo

3 In the stf-connectorsyaml file configure the MetricsQdrConnectors address to connectthe AMQ Interconnect on the overcloud to the STF deployment

Add the CeilometerQdrPublishEvents true parameter to enable collection andtransport of Ceilometer events to STF

Replace the host parameter with the value of HOSTPORT that you retrieved inSection 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

4 Add the following files to your Red Hat OpenStack Platform director deployment to setupcollectd and AMQ Interconnect

the stf-connectorsyaml environment file

the enable-stfyaml file that ensures that the environment is being used during theovercloud deployment

$ oc get routes -ogo-template= range items printf sn spechost end | grep -5671stf-default-interconnect-5671-service-telemetryappsinfrawatch

parameter_defaults EventPipelinePublishers [] CeilometerQdrPublishEvents true MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch port 443 role edge sslProfile sslProfile verifyHostname false

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

21

the ceilometer-write-qdryaml file that ensures that Ceilometer telemetry is sent to STF

5 Deploy the Red Hat OpenStack Platform overcloud

333 Validating client-side installation

To validate data collection from the STF storage domain query the data sources for delivereddata To validate individual nodes in the Red Hat OpenStack Platform deployment connect to theconsole using SSH

Procedure

1 Log in to an overcloud node for example controller-0

2 Ensure that metrics_qdr container is running on the node

3 Return the internal network address on which AMQ Interconnect is running for example 17217144 listening on port 5666

4 Return a list of connections to the local AMQ Interconnect

openstack overcloud deploy ltother argumentsgt --templates usrshareopenstack-tripleo-heat-templates --environment-file ltother-environment-filesgt --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsmetricsceilometer-write-qdryaml --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsenable-stfyaml --environment-file homestackstf-connectorsyaml

$ sudo podman container inspect --format StateStatus metrics_qdr

running

$ sudo podman exec -it metrics_qdr cat etcqpid-dispatchqdrouterdconf

listener host 17217144 port 5666 authenticatePeer no saslMechanisms ANONYMOUS

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --connections

Connections id host container role dir security authentication tenant ============================================================================================================================================================================================================================================================================================ 1 stf-default-interconnect-5671-service-telemetryappsinfrawatch443 stf-default-

Red Hat OpenStack Platform 160 Service Telemetry Framework

22

There are four connections

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat clientThe outbound STF connection is provided to the MetricsQdrConnectors hostparameter and is the route for the STF storage domain The other hosts are internalnetwork addresses of the client connections to this AMQ Interconnect

5 To ensure that messages are being delivered list the links and view the _edge address inthe deliv column for delivery of messages

interconnect-7458fd4d69-bgzfb edge out TLSv12(DHE-RSA-AES256-GCM-SHA384) anonymous-user 12 1721714460290 openstackorgomcontainercontroller-0ceilometer-agent-notification255c02cee550f143ec9ea030db5cccba14 normal in no-security no-auth 16 1721714436408 metrics normal in no-security anonymous-user 899 1721714439500 10a2e99d-1b8a-4329-b48c-4335e5f75c84 normal in no-security no-auth

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --linksRouter Links type dir conn id id peer class addr phs cap pri undel unsett deliv presett psdrop acc rej rel mod delay rate =========================================================================================================================================================== endpoint out 1 5 local _edge 250 0 0 0 2979926 2979924 0 0 0 2 0 0 0 endpoint in 1 6 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 7 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 8 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 9 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 10 250 0 0 0 911 911 0 0 0 0 0 911 0 endpoint in 1 11 250 0 0 0 0 911 0 0 0 0 0 0 0 endpoint out 12 32 local templSY6Mcicol4J2Kp 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 16 41 250 0 0 0 2979924 2979924 0 0 0 0 0 0 0 endpoint in 912 1834 mobile $management 0 250 0 0 0 1

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

23

6 To list the addresses from Red Hat OpenStack Platform nodes to STF connect to OCP toget the AMQ Interconnect pod name and list the connections List the availableAMQ Interconnect pods

7 Connect to the pod and run the qdstat --connections command to list the knownconnections

In this example there are three edge connections from the Red Hat OpenStack Platformnodes with connection id 22 23 and 24

8 To view the number of messages delivered by the network use each address with the oc exec command

0 0 1 0 0 0 0 0 endpoint out 912 1835 local temp9Ok2resI9tmt+CT 250 0 0 0 0 0 0 0 0 0 0 0 0

$ oc get pods -l application=stf-default-interconnect

NAME READY STATUS RESTARTS AGEstf-default-interconnect-7458fd4d69-bgzfb 11 Running 0 6d21h

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 182547243852 UTCstf-default-interconnect-7458fd4d69-bgzfb

Connections id host container role dir security authentication tenant last dlv uptime ====================================================================================================================================================================================================== 1 1012922143062 rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv] normal in no-security anonymous-user 000000000 006215025 2 1013005255754 rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5] normal in no-security anonymous-user 000212557 006214936 3 1013005143110 rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6] normal in no-security anonymous-user 000213653 006214909 22 101280151948 Routerceph-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 23 101280151950 Routercompute-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 24 101280152082 Routercontroller-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000000 000220834 27 12700142202 c2f541c1-4c97-4b37-a189-a396c08fb079 normal in no-security no-auth 000000000 000000000

Red Hat OpenStack Platform 160 Service Telemetry Framework

24

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 182010293258 UTCstf-default-interconnect-7458fd4d69-bgzfb

Router Addresses class addr phs distrib pri local remote in out thru fallback ==================================================================================================================== mobile anycastceilometereventsample 0 balanced - 1 0 1553 1553 0 0 mobile collectdnotify 0 multicast - 1 0 10 10 0 0 mobile collectdtelemetry 0 multicast - 1 0 7798049 7798049 0 0

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

25

CHAPTER 4 ADVANCED FEATURESThe following optional features can provide additional functionality to theService Telemetry Framework (STF)

Customizing the deployment For more information see Section 41 ldquoCustomizing thedeploymentrdquo

Alerts For more information see Section 42 ldquoAlertsrdquo

High availability For more information see Section 43 ldquoHigh availabilityrdquo

Dashboards For more information see Section 44 ldquoDashboardsrdquo

Multiple clouds For more information see Section 45 ldquoConfiguring multiple cloudsrdquo

Ephemeral storage For more information see Section 46 ldquoEphemeral storagerdquo

41 CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load intoRed Hat OpenShift Container Platform (OCP) The Operator then creates other objects in memorywhich results in the dependent Operators creating the workloads they are responsible formanaging

WARNING

When you override the manifest you must provide the entire manifest contentsincluding object names or namespaces There is no dynamic parametersubstitution when you override a manifest

To override a manifest successfully with Service Telemetry Framework (STF) deploy a defaultenvironment using the core options only For more information about the core options seeSection 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo When you deploy STF use the oc getcommand to retrieve the default deployed manifest When you use a manifest that was originallygenerated by Service Telemetry Operator the manifest is compatible with the other objects thatare managed by the Operators

For example when the metricsEnabled true parameter is configured in the ServiceTelemetrymanifest the Service Telemetry Operator requests components for metrics retrieval and storageusing the default manifests In some cases you might want to override the default manifest Formore information see Section 411 ldquoManifest override parametersrdquo

411 Manifest override parameters

This table describes the available parameters that you can use to override a manifest along withthe corresponding retrieval commands

Table 41 Manifest override parameters

Red Hat OpenStack Platform 160 Service Telemetry Framework

26

Override parameter Description Retrieval command

alertmanagerManifest Override the Alertmanagerobject creation The PrometheusOperator watches for Alertmanager objects

oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest Override the Secret thatcontains the Alertmanagerconfiguration The PrometheusOperator uses a secret named alertmanager- alertmanager-name forexample stf-default to providethe alertmanageryamlconfiguration to Alertmanager

oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest Override the ElasticSearchobject creation The Elastic Cloudon Kuberneters Operator watchesfor ElasticSearch objects

oc get elasticsearch elasticsearch -oyaml

interconnectManifest Override the Interconnectobject creation The AMQInterconnect Operator watchesfor Interconnect objects

oc get interconnect stf-default-interconnect -oyaml

prometheusManifest Override the Prometheusobject creation The PrometheusOperator watches for Prometheus objects

oc get prometheus stf-default -oyaml

servicemonitorManifest Override the ServiceMonitorobject creation The PrometheusOperator watches for ServiceMonitor objects

oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest

Override the SmartGatewayobject creation for collectdmetrics The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest

Override the SmartGatewayobject creation for collectdevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-notification -oyaml

CHAPTER 4 ADVANCED FEATURES

27

smartgatewayCeilometerEventsManifest

Override the SmartGatewayobject creation for Ceilometerevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-ceilometer-notification -oyaml

Override parameter Description Retrieval command

412 Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest For a list of availablemanifest override parameters see Section 41 ldquoCustomizing the deploymentrdquo The default ServiceTelemetry object is stf-default Use oc get servicetelemetry to list the available STFdeployments

TIP

The oc edit command loads the default system editor To override the default editor pass or setthe environment variable EDITOR to the preferred editor For example EDITOR=nano oc edit servicetelemetry stf-default

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Load the ServiceTelemetry object into an editor

4 To modify the ServiceTelemetry object provide a manifest override parameter and thecontents of the manifest to write to OCP instead of the defaults provided by STF

NOTE

The trailing pipe (|) after entering the manifest override parameter indicatesthat the value provided is multi-line

oc project service-telemetry

oc edit servicetelemetry stf-default

$ oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata annotations kubectlkubernetesiolast-applied-configuration | apiVersioninfrawatchv1alpha1kindServiceTelemetrymetadataannotationsnamestf-defaultnamespaceservice-telemetryspecmetricsEnabledtrue creationTimestamp 2020-04-14T202942Z

Red Hat OpenStack Platform 160 Service Telemetry Framework

28

1

2

Manifest override parameter is defined in the spec of the ServiceTelemetry object

End of the manifest override content

5 Save and close

42 ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager Alert rules in Prometheusservers send alerts to an Alertmanager which manages the alerts Alertmanager can silenceinhibit or aggregate alerts and send notifications using email on-call notification systems or chatplatforms

To create an alert complete the following tasks

1 Create an alert rule in Prometheus For more information see Section 421 ldquoCreating analert rule in Prometheusrdquo

generation 1 name stf-default namespace service-telemetry resourceVersion 1949423 selfLink apisinfrawatchv1alpha1namespacesservice-telemetryservicetelemetrysstf-default uid d058bc41-1bb0-49f5-9a8b-642f4b8adb95spec metricsEnabled true smartgatewayCollectdMetricsManifest | 1 apiVersion smartgatewayinfrawatchv2alpha1 kind SmartGateway metadata name stf-default-collectd-telemetry namespace service-telemetry spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdtelemetry debug true prefetch 15000 serviceType metrics size 1 useTimestamp true 2status conditions - ansibleResult changed 0 completion 2020-04-14T203219079508 failures 0 ok 52 skipped 1 lastTransitionTime 2020-04-14T202959Z message Awaiting next reconciliation reason Successful status True type Running

CHAPTER 4 ADVANCED FEATURES

29

2 Create an alert route in Alertmanager For more information see Section 423 ldquoCreating analert route in Alertmanagerrdquo

Additional resourcesFor more information about alerts or notifications with Prometheus and Alertmanager seehttpsprometheusiodocsalertingoverview

To view an example set of alerts that you can use with Service Telemetry Framework (STF) seehttpsgithubcominfrawatchservice-telemetry-operatortreemasterdeployalerts

421 Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications If the rule condition returns an emptyresult set the condition is false Otherwise the rule is true and it triggers an alert

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Create a PrometheusRule object that contains the alert rule The Prometheus Operatorloads the rule into Prometheus

4 To verify that the rules have been loaded into Prometheus by the Operator create a podwith access to curl

5 Run curl to access the prometheus-operated service to return the rules loaded intomemory

oc project service-telemetry

oc apply -f - ltltEOFapiVersion monitoringcoreoscomv1kind PrometheusRulemetadata creationTimestamp null labels prometheus stf-default role alert-rules name prometheus-alarm-rules namespace service-telemetryspec groups - name openstackrules rules - alert Metric Listener down expr collectd_qpid_router_status lt 1 To change the rule edit the value of the expr parameterEOF

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl prometheus-operated9090apiv1rulesstatussuccessdatagroups

Red Hat OpenStack Platform 160 Service Telemetry Framework

30

6 To verify that the output shows the rules loaded into the PrometheusRule object forexample the output contains the defined openstackrules exit from the pod

7 Clean up the environment by deleting the curl pod

Additional resourcesFor more information on alerting see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

422 Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 421ldquoCreating an alert rule in Prometheusrdquo

Procedure

1 Use the oc edit command

2 Edit the PrometheusRules manifest

3 Save and close

Additional resourcesFor more information about configuring alerting rules seehttpsprometheusiodocsprometheuslatestconfigurationalerting_rules

For more information about PrometheusRules objects seehttpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

423 Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system such as email IRC or other notificationchannel The Prometheus Operator manages the Alertmanager configuration as anRed Hat OpenShift Container Platform (OCP) secret STF by default deploys a basic configurationthat results in no receivers

[nameopenstackrulesfileetcprometheusrulesprometheus-stf-default-rulefiles-0service-telemetry-prometheus-alarm-rulesyamlrules[nameMetric Listener downquerycollectd_qpid_router_status u003c 1duration0labelsannotationsalerts[]healthoktypealerting]interval30]

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

oc edit prometheusrules prometheus-alarm-rules

alertmanageryaml |- global resolve_timeout 5m route

CHAPTER 4 ADVANCED FEATURES

31

To deploy a custom Alertmanager route with STF an alertmanagerConfigManifest parameter mustbe passed to the Service Telemetry Operator that results in an updated secret managed by thePrometheus Operator For more information see Section 412 ldquoOverriding a managed manifestrdquo

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object for your STF deployment

4 Add a new parameter alertmanagerConfigManifest and the Secret object contents todefine the alertmanageryaml configuration for Alertmanager

NOTE

This loads the default template that is already managed by ServiceTelemetry Operator To validate the changes are populating correctlychange a value return the alertmanager-stf-default secret and verify thatthe new value is loaded into memory for example changing the value globalresolve_timeout from 5m to 10m

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

oc project service-telemetry

oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec metricsEnabled true alertmanagerConfigManifest | apiVersion v1 kind Secret metadata name alertmanager-stf-default namespace service-telemetry type Opaque stringData alertmanageryaml |- global resolve_timeout 10m route

Red Hat OpenStack Platform 160 Service Telemetry Framework

32

5 Verify that the configuration was applied to the secret

6 To verify the configuration has been loaded into Alertmanager create a pod with access to curl

7 Run curl against the alertmanager-operated service to retrieve the status and configYAMLcontents and review the supplied configuration matches the configuration loaded intoAlertmanager

8 Verify that the configYAML field contains the expected changes Exit from the pod

9 To clean up the environment delete the curl pod

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

$ oc get secret alertmanager-stf-default -o go-template=index data alertmanageryaml | base64decode

global resolve_timeout 10mroute group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver nullreceivers- name null

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl alertmanager-operated9093apiv1status

statussuccessdataconfigYAMLglobaln resolve_timeout 10mn http_config n smtp_hello localhostn smtp_require_tls truen pagerduty_url httpseventspagerdutycomv2enqueuen hipchat_api_url httpsapihipchatcomn opsgenie_api_url httpsapiopsgeniecomn wechat_api_url httpsqyapiweixinqqcomcgi-binn victorops_api_url httpsalertvictoropscomintegrationsgeneric20131114alertnrouten receiver nulln group_byn - jobn group_wait 30sn group_interval 5mn repeat_interval 12hnreceiversn- name nullntemplates []n

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

CHAPTER 4 ADVANCED FEATURES

33

Additional resourcesFor more information about the Red Hat OpenShift Container Platform secret and the Prometheusoperator see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

43 HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover fromfailures in its component services Although Red Hat OpenShift Container Platform (OCP) restartsa failed pod if nodes are available to schedule the workload this recovery process might take morethan one minute during which time events and metrics are lost A high availability configurationincludes multiple copies of STF components reducing recovery time to approximately 2 secondsTo protect against failure of an OCP node deploy STF to an OCP cluster with three or more nodes

NOTE

STF is not yet a fully fault tolerant system Delivery of metrics and events during therecovery period is not guaranteed

Enabling high availability has the following effects

Two AMQ Interconnect pods run instead of the default 1

Three ElasticSearch pods run instead of the default 1

Recovery time from a lost pod in either of these services reduces to approximately 2seconds

431 Configuring high availability

To configure STF for high availability add highAvailabilityEnabled true to the ServiceTelemetryobject in OCP You can this set this parameter at installation time or if you already deployed STFcomplete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Use the oc command to edit the ServiceTelemetry object

4 Add highAvailabilityEnabled true to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry

spec eventsEnabled true metricsEnabled true highAvailabilityEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

34

44 DASHBOARDS

Use third-party application Grafana to visualize system-level metrics gathered by collectd for eachindividual host node For more information about configuring collectd see Section 33ldquoConfiguring Red Hat OpenStack Platform overcloud for Service Telemetry Frameworkrdquo

441 Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment so you mustdeploy the Grafana Operator from OperatorHubio

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Clone the dashboard repository

4 Deploy the Grafana operator

5 To verify that the operator launched successfully run the oc get csv command If the valueof the PHASE column is Succeeded the operator launched successfully

6 Launch a Grafana instance

7 Verify that the Grafana instance deployed

8 Create the datasource and dashboard resources

oc project service-telemetry

git clone httpsgithubcominfrawatchdashboardscd dashboards

oc create -f deploysubscriptionyaml

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEgrafana-operatorv320 Grafana Operator 320 Succeeded

$ oc create -f deploygrafanayaml

$ oc get pod -l app=grafana

NAME READY STATUS RESTARTS AGEgrafana-deployment-7fc7848b56-sbkhv 11 Running 0 1m

CHAPTER 4 ADVANCED FEATURES

35

9 Verify that the resources installed correctly

10 Navigate to httpsltgrafana-route-addressgt in a web browser Use the oc get routescommand to retrieve the Grafana route address

11 To view the dashboard click Dashboards and Manage

Additional resources

For more information about enabling the OperatorHubio catalog source see Section 235ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

4411 Viewing and editing queries

Procedure

1 Log in to Red Hat OpenShift Container Platform To view and edit queries log in as the admin user

2 Change to the service-telemetry namespace

3 To retrieve the default username and password describe the Grafana object using oc describe

442 The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time Select a node from theupper left corner of the dashboard

4421 Top panels

Title Unit Description

oc create -f deploydatasourceyaml -f deployrhos-dashboardyaml

$ oc get grafanadashboards

NAME AGErhos-dashboard 7d21h

$ oc get grafanadatasourcesNAME AGEservice-telemetry-grafanadatasource 1m

oc get routes

oc project service-telemetry

oc describe grafana service-telemetry-grafana

Red Hat OpenStack Platform 160 Service Telemetry Framework

36

Current Global Alerts - Current alerts fired byPrometheus

Recent Global Alerts - Recently fired alerts in 5m timesteps

Status Panel - Node status up downunavailable

Uptime smhdMY Total operational time of node

CPU Cores cores Total number of cores

Memory bytes Total memory

Disk Size bytes Total storage size

Processes processes Total number of processes listedby type

Load Average processes Load average represents theaverage number of running anduninterruptible processes residingin the kernel execution queue

4422 Networking panels

Panels that display the network interfaces of the node

Panel Unit Description

Physical Interfaces Ingress Errors errors Total errors with incoming data

Physical Interfaces Egress Errors errors Total errors with outgoing data

Physical Interfaces Ingress ErrorRates

errorss Rate of incoming data errors

Physical Interfaces egress ErrorRates

errorss Rate of outgoing data errors

Physical Interfaces PacketsIngress pps Incoming packets persecond

Physical Interfaces PacketsEgress

pps

Outgoing packets per second Physical Interfaces Data Ingress bytess

CHAPTER 4 ADVANCED FEATURES

37

Incoming data rates Physical Interfaces Data Egress bytess

Outgoing data rates Physical Interfaces Drop RateIngress

pps

Incoming packets drop rate Physical Interfaces Drop RateEgress

pps

4423 CPU panels

Panels that display CPU usage of the node

Panel Unit Description

Current CPU Usage percent Instantaneous usage at the timeof the last query

Aggregate CPU Usage percent Average non-idle CPU activity ofall cores on a node

Aggr CPU Usage by Type percent Shows time spent for each type ofthread averaged across all cores

4424 Memory panels

Panels that display memory usage on the node

Panel Unit Description

Memory Used percent Amount of memory being used attime of last query

Huge Pages Used hugepages Number of hugepages beingused

Memory

4425 Diskfile system

Panels that display space used on disk

Panel Unit Description Notes

Disk Space Usage percent Total disk use at time oflast query

Red Hat OpenStack Platform 160 Service Telemetry Framework

38

Inode Usage percent Total inode use at timeof last query

Aggregate Disk SpaceUsage

bytes Total disk space usedand reserved

Because this query relieson the df plugintemporary file systemsthat do not necessarilyuse disk space areincluded in the resultsThe query tries to filterout most of these but itmight not be exhaustive

Disk Traffic bytess Shows rates for bothreading and writing

Disk Load percent Approximatepercentage of total diskbandwidth being usedThe weighted IO timeseries includes thebacklog that might beaccumulating For moreinformation see thecollectd disk plugin docs

Operationss opss Operations done persecond

Average IO OperationTime

seconds Average time each IOoperation took tocomplete This averageis not accurate see thecollectd disk plugin docs

Panel Unit Description Notes

45 CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance ofService Telemetry Framework (STF)

1 Plan the AMQP address prefixes that you want to use for each cloud For more informationsee Section 451 ldquoPlanning AMQP address prefixesrdquo

2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on thecorresponding address prefixes For more information see Section 452 ldquoDeploying SmartGatewaysrdquo

3 Configure each cloud to send its metrics and events to STF on the correct address Formore information see Section 453 ldquoCreating the OpenStack environment filerdquo

CHAPTER 4 ADVANCED FEATURES

39

Figure 41 Two Red Hat OpenStack Platform clouds connect to STF

451 Planning AMQP address prefixes

By default Red Hat OpenStack Platform nodes get data through two data collectors collectd andCeilometer These components send telemetry data or notifications to the respective AMQPaddresses for example collectdtelemetry where STF Smart Gateways listen on those addressesfor monitoring data

To support multiple clouds and to identify which cloud generated the monitoring data configureeach cloud to send data to a unique address Prefix a cloud identifier to the second part of theaddress The following list shows some example addresses and identifiers

collectdcloud1-telemetry

collectdcloud1-notify

anycastceilometercloud1-eventsample

collectdcloud2-telemetry

collectdcloud2-notify

Red Hat OpenStack Platform 160 Service Telemetry Framework

40

anycastceilometercloud2-eventsample

collectdus-east-1-telemetry

collectdus-west-3-telemetry

452 Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud one forcollectd metrics one for collectd events and one for Ceilometer events Configure each of theSmart Gateways to listen on the AMQP address that you define for the corresponding cloud

When you deploy STF for the first time Smart Gateway manifests are created that define the initialSmart Gateways for a single cloud When deploying Smart Gateways for multiple cloud supportyou deploy multiple Smart Gateways for each of the data collection types that handle the metricsand the events data for each cloud The initial Smart Gateways act as a template to createadditional Smart Gateways along with any authentication information required to connect to thedata stores

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

oc project service-telemetry

3 Use the initially deployed Smart Gateways as a template for additional Smart GatewaysList the currently deployed Smart Gateways with the oc get smartgateways command Forexample if you deployed STF with metricsEnabled true and eventsEnabled true thefollowing Smart Gateways are displayed in the output

$ oc get smartgateways

NAME AGEstf-default-ceilometer-notification 14dstf-default-collectd-notification 14dstf-default-collectd-telemetry 14d

4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary filewhich you can modify later and use to create the new set of Smart Gateways

truncate --size 0 tmpcloud1-smartgatewaysyaml ampamp for sg in $(oc get smartgateways -oname)do echo --- gtgt tmpcloud1-smartgatewaysyaml oc get $sg -oyaml --export gtgt tmpcloud1-smartgatewaysyamldone

5 Modify the Smart Gateway manifest in the tmpcloud1-smartgatewaysyaml file Adjustthe metadataname and specamqpUrl fields to include the cloud identifier from yourschema For more information see ] To view example Smart Gateway manifests seeltltexample-manifests_advanced-features[

6 Deploy your new Smart Gateways

CHAPTER 4 ADVANCED FEATURES

41

oc apply -f tmpcloud1-smartgatewaysyaml

7 Verify that each Smart Gateway is running This can take several minutes depending on thenumber of Smart Gateways

oc get po -l app=smart-gateway

4521 Example manifests

IMPORTANT

The content in the following examples might be different to the file content in yourdeployment Copy the manifests in your deployment

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that youwant to use for your clouds For more information see Section 451 ldquoPlanning AMQP addressprefixesrdquo

NOTE

Your output may have some additional metadata parameters that you can removefrom the manifests you that load into OCP

apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-ceilometer-notification-cloud1 1spec amqpDataSource ceilometer amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672anycastceilometercloud1-eventsample 2 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-notification-cloud1 3spec amqpDataSource collectd amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-

Red Hat OpenStack Platform 160 Service Telemetry Framework

42

1

2

3

4

5

6

Name for Ceilometer notifications for cloud1

AMQP Address for Ceilometer notifications for cloud1

Name for collectd telemetry for cloud1

AMQP Address for collectd telemetry for cloud1

Name for collectd notifications for cloud1

AMQP Address for collectd notifications for cloud1

453 Creating the OpenStack environment file

To label traffic according to the cloud of origin you must create a configuration with cloud-specificinstance names Create an stf-connectorsyaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefixscheme For more information see Section 451 ldquoPlanning AMQP address prefixesrdquo

notify 4 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-telemetry-cloud1 5spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-telemetry 6 debug false prefetch 15000 serviceType metrics size 1 useTimestamp true

CHAPTER 4 ADVANCED FEATURES

43

WARNING

Remove enable-stfyaml and ceilometer-write-qdryaml environment filereferences from your overcloud deployment This configuration is redundantand results in duplicate information being sent from each cloud node

Procedure

1 Create the stf-connectorsyaml file and modify it to match the AMQP address that youwant for this cloud deployment

resource_registry OSTripleOServicesCollectd usrshareopenstack-tripleo-heat-templatesdeploymentmetricscollectd-container-puppetyaml OSTripleOServicesMetricsQdr usrshareopenstack-tripleo-heat-templatesdeploymentmetricsqdr-container-puppetyaml OSTripleOServicesCeilometerAgentCentral usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-central-container-puppetyaml OSTripleOServicesCeilometerAgentNotification usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-notification-container-puppetyaml OSTripleOServicesCeilometerAgentIpmi usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-ipmi-container-puppetyaml OSTripleOServicesComputeCeilometerAgent usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-compute-container-puppetyaml OSTripleOServicesRedis usrshareopenstack-tripleo-heat-templatesdeploymentdatabaseredis-pacemaker-puppetyaml

parameter_defaults EnableSTF true

EventPipelinePublishers [] CeilometerEnablePanko false CeilometerQdrPublishEvents true CeilometerQdrEventsConfig driver amqp topic cloud1-event 1

CollectdConnectionType amqp1 CollectdAmqpInterval 5 CollectdDefaultPollingInterval 5

CollectdAmqpInstances cloud1-notify 2 notify true format JSON presettle false cloud1-telemetry 3 format JSON presettle true

MetricsQdrAddresses

Red Hat OpenStack Platform 160 Service Telemetry Framework

44

+ lt1gt Define the topic for Ceilometer events This value is the address format of anycastceilometercloud1-eventsample lt2gt Define the topic for collectd events This value is theformat of collectdcloud1-notify lt3gt Define the topic for collectd metrics This value is the formatof collectdcloud1-telemetry lt4gt Adjust the MetricsQdrConnectors host to the address of the STFroute

1 Ensure that the naming convention in the stf-connectorsyaml file aligns with the specamqpUrl field in the Smart Gateway configuration For example configure the CeilometerQdrEventsConfigtopic field to a value of cloud1-event

2 Save the file in a directory for custom environment files for example homestackcustom_templates

3 Source the authentication file

4 Include the stf-connectorsyaml file in the overcloud deployment command along with anyother environment files relevant to your environment

Additional resourcesFor information about validating the deployment see Section 333 ldquoValidating client-sideinstallationrdquo

454 Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it wasscraped from You can use this label to query data from a specific cloud

To query data from a specific cloud use a Prometheus promql query that matches the associatedservice label for example collectd_uptimeservice=stf-default-collectd-telemetry-cloud1-smartgateway

- prefix collectd distribution multicast - prefix anycastceilometer distribution multicast

MetricsQdrSSLProfiles - name sslProfile

MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch 4 port 443 role edge verifyHostname false sslProfile sslProfile

[stackundercloud-0 ~]$ source stackrc

(undercloud) [stackundercloud-0 ~]$

(undercloud) [stackundercloud-0 ~]$ openstack overcloud deploy --templates usrshareopenstack-tripleo-heat-templates -e homestackcustom_templatesstf-connectorsyaml

CHAPTER 4 ADVANCED FEATURES

45

46 EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storingdata in your Red Hat OpenShift Container Platform (OCP) cluster Ephemeral storage is notrecommended in a production environment due to the volatility of the data in the platform whenoperating correctly and as designed For example restarting a pod or rescheduling the workload toanother node results in the loss of any local data written since the pod started

If you enable ephemeral storage in STF the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests

461 Configuring ephemeral storage

To configure STF for ephemeral storage add storageEphemeralEnabled true to theServiceTelemetry object in OCP You can add storageEphemeralEnabled true at installation timeor if you already deployed STF complete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object

4 Add the storageEphemeralEnabled true parameter to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry stf-default

spec eventsEnabled true metricsEnabled true storageEphemeralEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

46

APPENDIX A COLLECTD PLUG-INSThis section contains a complete list of collectd plug-ins and configurations forRed Hat OpenStack Platform 160

collectd-aggregation

collectdpluginaggregationaggregators

collectdpluginaggregationinterval

collectd-amqp1

collectd-apache

collectdpluginapacheinstances (ex localhost rArr url rArr httplocalhostmod_statusauto)

collectdpluginapacheinterval

collectd-apcups

collectd-battery

collectdpluginbatteryvalues_percentage

collectdpluginbatteryreport_degraded

collectdpluginbatteryquery_state_fs

collectdpluginbatteryinterval

collectd-ceph

collectdplugincephdaemons

collectdplugincephlongrunavglatency

collectdplugincephconvertspecialmetrictypes

collectd-cgroups

collectdplugincgroupsignore_selected

collectdplugincgroupsinterval

collectd-conntrack

None

collectd-contextswitch

collectdplugincontextswitchinterval

collectd-cpu

APPENDIX A COLLECTD PLUG-INS

47

collectdplugincpureportbystate

collectdplugincpureportbycpu

collectdplugincpuvaluespercentage

collectdplugincpureportnumcpu

collectdplugincpureportgueststate

collectdplugincpusubtractgueststate

collectdplugincpuinterval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectdplugincsvdatadir

collectdplugincsvstorerates

collectdplugincsvinterval

collectd-df

collectdplugindfdevices

collectdplugindffstypes

collectdplugindfignoreselected

collectdplugindfmountpoints

collectdplugindfreportbydevice

collectdplugindfreportinodes

collectdplugindfreportreserved

collectdplugindfvaluesabsolute

collectdplugindfvaluespercentage

collectdplugindfinterval

collectd-disk

collectdplugindiskdisks

collectdplugindiskignoreselected

collectdplugindiskudevnameattr

Red Hat OpenStack Platform 160 Service Telemetry Framework

48

collectdplugindiskinterval

collectd-entropy

collectdpluginentropyinterval

collectd-ethstat

collectdpluginethstatinterfaces

collectdpluginethstatmaps

collectdpluginethstatmappedonly

collectdpluginethstatinterval

collectd-exec

collectdpluginexeccommands

collectdpluginexeccommands_defaults

collectdpluginexecglobals

collectdpluginexecinterval

collectd-fhcount

collectdpluginfhcountvaluesabsolute

collectdpluginfhcountvaluespercentage

collectdpluginfhcountinterval

collectd-filecount

collectdpluginfilecountdirectories

collectdpluginfilecountinterval

collectd-fscache

None

collectd-hddtemp

collectdpluginhddtemphost

collectdpluginhddtempport

collectdpluginhddtempinterval

collectd-hugepages

collectdpluginhugepagesreport_per_node_hp

APPENDIX A COLLECTD PLUG-INS

49

collectdpluginhugepagesreport_root_hp

collectdpluginhugepagesvalues_pages

collectdpluginhugepagesvalues_bytes

collectdpluginhugepagesvalues_percentage

collectdpluginhugepagesinterval

collectd-intel_rdt

collectd-interface

collectdplugininterfaceinterfaces

collectdplugininterfaceignoreselected

collectdplugininterfacereportinactive

Collectdplugininterfaceinterval

collectd-ipc

None

collectd-ipmi

collectdpluginipmiignore_selected

collectdpluginipminotify_sensor_add

collectdpluginipminotify_sensor_remove

collectdpluginipminotify_sensor_not_present

collectdpluginipmisensors

collectdpluginipmiinterval

collectd-irq

collectdpluginirqirqs

collectdpluginirqignoreselected

collectdpluginirqinterval

collectd-load

collectdpluginloadreport_relative

collectdpluginloadinterval

collectd-logfile

collectdpluginlogfilelog_level

Red Hat OpenStack Platform 160 Service Telemetry Framework

50

collectdpluginlogfilelog_level

collectdpluginlogfilelog_file

collectdpluginlogfilelog_timestamp

collectdpluginlogfileprint_severity

collectdpluginlogfileinterval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectdpluginmemcachedinstances

collectdpluginmemcachedinterval

collectd-memory

collectdpluginmemoryvaluesabsolute

collectdpluginmemoryvaluespercentage

collectdpluginmemoryinterval collectd-multimeter

collectd-multimeter

collectd-mysql

collectdpluginmysqlinterval

collectd-netlink

collectdpluginnetlinkinterfaces

collectdpluginnetlinkverboseinterfaces

collectdpluginnetlinkqdiscs

collectdpluginnetlinkclasses

collectdpluginnetlinkfilters

collectdpluginnetlinkignoreselected

collectdpluginnetlinkinterval

collectd-network

collectdpluginnetworktimetolive

collectdpluginnetworkmaxpacketsize

APPENDIX A COLLECTD PLUG-INS

51

collectdpluginnetworkforward

collectdpluginnetworkreportstats

collectdpluginnetworklisteners

collectdpluginnetworkservers

collectdpluginnetworkinterval

collectd-nfs

collectdpluginnfsinterval

collectd-ntpd

collectdpluginntpdhost

collectdpluginntpdport

collectdpluginntpdreverselookups

collectdpluginntpdincludeunitid

collectdpluginntpdinterval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectdpluginopenvpnstatusfile

collectdpluginopenvpnimprovednamingschema

collectdpluginopenvpncollectcompression

collectdpluginopenvpncollectindividualusers

collectdpluginopenvpncollectusercount

collectdpluginopenvpninterval

collectd-ovs_events

collectdpluginovs_eventsaddress

collectdpluginovs_eventsdispatch

collectdpluginovs_eventsinterfaces

collectdpluginovs_eventssend_notification

Red Hat OpenStack Platform 160 Service Telemetry Framework

52

collectdpluginovs_events$port

collectdpluginovs_eventssocket

collectd-ovs_stats

collectdpluginovs_statsaddress

collectdpluginovs_statsbridges

collectdpluginovs_statsport

collectdpluginovs_statssocket

collectd-ping

collectdpluginpinghosts

collectdpluginpingtimeout

collectdpluginpingttl

collectdpluginpingsource_address

collectdpluginpingdevice

collectdpluginpingmax_missed

collectdpluginpingsize

collectdpluginpinginterval

collectd-powerdns

collectdpluginpowerdnsinterval

collectdpluginpowerdnsservers

collectdpluginpowerdnsrecursors

collectdpluginpowerdnslocal_socket

collectdpluginpowerdnsinterval

collectd-processes

collectdpluginprocessesprocesses

collectdpluginprocessesprocess_matches

collectdpluginprocessescollect_context_switch

collectdpluginprocessescollect_file_descriptor

collectdpluginprocessescollect_memory_maps

collectdpluginpowerdnsinterval

APPENDIX A COLLECTD PLUG-INS

53

collectd-protocols

collectdpluginprotocolsignoreselected

collectdpluginprotocolsvalues

collectd-python

collectd-serial

collectd-smart

collectdpluginsmartdisks

collectdpluginsmartignoreselected

collectdpluginsmartinterval

collectd-snmp_agent

collectd-statsd

collectdpluginstatsdhost

collectdpluginstatsdport

collectdpluginstatsddeletecounters

collectdpluginstatsddeletetimers

collectdpluginstatsddeletegauges

collectdpluginstatsddeletesets

collectdpluginstatsdcountersum

collectdpluginstatsdtimerpercentile

collectdpluginstatsdtimerlower

collectdpluginstatsdtimerupper

collectdpluginstatsdtimersum

collectdpluginstatsdtimercount

collectdpluginstatsdinterval

collectd-swap

collectdpluginswapreportbydevice

collectdpluginswapreportbytes

collectdpluginswapvaluesabsolute

Red Hat OpenStack Platform 160 Service Telemetry Framework

54

collectdpluginswapvaluespercentage

collectdpluginswapreportio

collectdpluginswapinterval

collectd-syslog

collectdpluginsysloglog_level

collectdpluginsyslognotify_level

collectdpluginsysloginterval

collectd-table

collectdplugintabletables

collectdplugintableinterval

collectd-tail

collectdplugintailfiles

collectdplugintailinterval

collectd-tail_csv

collectdplugintail_csvmetrics

collectdplugintail_csvfiles

collectd-tcpconns

collectdplugintcpconnslocalports

collectdplugintcpconnsremoteports

collectdplugintcpconnslistening

collectdplugintcpconnsallportssummary

collectdplugintcpconnsinterval

collectd-ted

collectd-thermal

collectdpluginthermaldevices

collectdpluginthermalignoreselected

collectdpluginthermalinterval

collectd-threshold

APPENDIX A COLLECTD PLUG-INS

55

collectdpluginthresholdtypes

collectdpluginthresholdplugins

collectdpluginthresholdhosts

collectdpluginthresholdinterval

collectd-turbostat

collectdpluginturbostatcore_c_states

collectdpluginturbostatpackage_c_states

collectdpluginturbostatsystem_management_interrupt

collectdpluginturbostatdigital_temperature_sensor

collectdpluginturbostattcc_activation_temp

collectdpluginturbostatrunning_average_power_limit

collectdpluginturbostatlogical_core_names

collectd-unixsock

collectd-uptime

collectdpluginuptimeinterval

collectd-users

collectdpluginusersinterval

collectd-uuid

collectdpluginuuiduuid_file

collectdpluginuuidinterval

collectd-virt

collectdpluginvirtconnection

collectdpluginvirtrefresh_interval

collectdpluginvirtdomain

collectdpluginvirtblock_device

collectdpluginvirtinterface_device

collectdpluginvirtignore_selected

collectdpluginvirthostname_format

Red Hat OpenStack Platform 160 Service Telemetry Framework

56

collectdpluginvirtinterface_format

collectdpluginvirtextra_stats

collectdpluginvirtinterval

collectd-vmem

collectdpluginvmemverbose

collectdpluginvmeminterval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectdpluginwrite_graphitecarbons

collectdpluginwrite_graphitecarbon_defaults

collectdpluginwrite_graphiteglobals

collectd-write_kafka

collectdpluginwrite_kafkakafka_host

collectdpluginwrite_kafkakafka_port

collectdpluginwrite_kafkakafka_hosts

collectdpluginwrite_kafkatopics

collectd-write_log

collectdpluginwrite_logformat

collectd-zfs_arc

None

APPENDIX A COLLECTD PLUG-INS

57

  • Table of Contents
  • CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
    • 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
    • 12 INSTALLATION SIZE
      • CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
        • 21 THE CORE COMPONENTS OF STF
        • 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
          • 221 Persistent volumes
            • 2211 Using ephemeral storage
              • 222 Resource allocation
              • 223 Node tuning operator
                • 23 DEPLOYING STF TO THE OCP ENVIRONMENT
                  • 231 Deploying STF to the OCP environment with ElasticSearch
                  • 232 Deploying STF to the OCP environment without ElasticSearch
                  • 233 Creating a namespace
                  • 234 Creating an OperatorGroup
                  • 235 Enabling the OperatorHubio Community Catalog Source
                  • 236 Enabling Red Hat STF Operator Source
                  • 237 Subscribing to the AMQ Certificate Manager Operator
                  • 238 Subscribing to the Elastic Cloud on Kubernetes Operator
                  • 239 Subscribing to the Service Telemetry Operator
                  • 2310 Creating a ServiceTelemetry object in OCP
                    • 24 REMOVING STF FROM THE OCP ENVIRONMENT
                      • 241 Deleting the namespace
                      • 242 Removing the OperatorSource
                          • CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
                            • 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
                            • 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
                            • 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
                              • 331 Retrieving the AMQ Interconnect route address
                              • 332 Configuring the STF connection for the overcloud
                              • 333 Validating client-side installation
                                  • CHAPTER 4 ADVANCED FEATURES
                                    • 41 CUSTOMIZING THE DEPLOYMENT
                                      • 411 Manifest override parameters
                                      • 412 Overriding a managed manifest
                                        • Procedure
                                            • 42 ALERTS
                                              • Additional resources
                                              • 421 Creating an alert rule in Prometheus
                                                • Procedure
                                                • Additional resources
                                                  • 422 Configuring custom alerts
                                                    • Procedure
                                                    • Additional resources
                                                      • 423 Creating an alert route in Alertmanager
                                                        • Procedure
                                                        • Additional resources
                                                            • 43 HIGH AVAILABILITY
                                                              • 431 Configuring high availability
                                                                • Procedure
                                                                    • 44 DASHBOARDS
                                                                      • 441 Setting up Grafana to host the dashboard
                                                                        • 4411 Viewing and editing queries
                                                                          • 442 The Grafana infrastructure dashboard
                                                                            • 4421 Top panels
                                                                            • 4422 Networking panels
                                                                            • 4423 CPU panels
                                                                            • 4424 Memory panels
                                                                            • 4425 Diskfile system
                                                                                • 45 CONFIGURING MULTIPLE CLOUDS
                                                                                  • 451 Planning AMQP address prefixes
                                                                                  • 452 Deploying Smart Gateways
                                                                                    • Procedure
                                                                                    • 4521 Example manifests
                                                                                      • 453 Creating the OpenStack environment file
                                                                                        • Procedure
                                                                                        • Additional resources
                                                                                          • 454 Querying metrics data from multiple clouds
                                                                                            • 46 EPHEMERAL STORAGE
                                                                                              • 461 Configuring ephemeral storage
                                                                                                  • APPENDIX A COLLECTD PLUG-INS
Page 6: Red Hat OpenStack Platform 16 · CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK 3.2. DEPLOYING WITH ROUTED L3 NETWORKS 3.3. CONFIGURING RED HAT OPENSTACK PLATFORM

Table of Contents

3

CHAPTER 1 INTRODUCTION TOSERVICE TELEMETRY FRAMEWORK

Service Telemetry Framework (STF) provides automated collection of measurements and data fromremote clients - Red Hat OpenStack Platform or third-party nodes - and transmission of thatinformation to a centralized receiving Red Hat OpenShift Container Platform (OCP) deployment forstorage retrieval and monitoring The data can be either of two types

Metric

a numeric measurement of an application or system

Event

irregular and discrete occurrences that happen in a system

The collection components that are required on the clients are lightweight The multicast message busthat is shared by all clients and the deployment provides fast and reliable data transport Other modularcomponents for receiving and storing data are deployed in containers on OCP

STF provides access to monitoring functions such as alert generation visualization through dashboardsand single source of truth telemetry analysis to support orchestration

11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE

Service Telemetry Framework (STF) uses the components described in Table 11 ldquoSTF componentsrdquo

Table 11 STF components

Client Component Server(OCP)

yes An AMQP 1x compatible messaging bus to shuttle the metrics to STF forstorage in Prometheus

yes

no Smart Gateway to pick metrics and events from the AMQP 1x bus and todeliver events to ElasticSearch or to provide metrics to Prometheus

yes

no Prometheus as time-series data storage yes

no ElasticSearch as events data storage yes

yes collectd to collect infrastructure metrics and events no

yes Ceilometer to collect Red Hat OpenStack Platform metrics and events no

Figure 11 Service Telemetry Framework architecture overview

Red Hat OpenStack Platform 160 Service Telemetry Framework

4

Figure 11 Service Telemetry Framework architecture overview

NOTE

The Service Telemetry Framework data collection components collectd and Ceilometerand the transport components AMQ Interconnect and Smart Gateway are fullysupported The data storage components Prometheus and ElasticSearch including theOperator artifacts and visualization component Grafana are community-supported andare not officially supported

For metrics on the client side collectd collects high-resolution metrics collectd delivers the data toPrometheus by using the AMQP1 plugin which places the data onto the message bus On the serverside a Golang application called the Smart Gateway takes the data stream from the bus and exposes itas a local scrape endpoint for Prometheus

If you plan to collect and store events collectd or Ceilometer delivers event data to the server side byusing the AMQP1 plugin which places the data onto the message bus Another Smart Gateway writesthe data to the ElasticSearch datastore

Server-side STF monitoring infrastructure consists of the following layers

Service Telemetry Framework 1x (STF)

Red Hat OpenShift Container Platform (OCP)

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

5

Infrastructure platform

Figure 12 Server-side STF monitoring infrastructure

For more information about how to deploy Red Hat OpenShift Container Platform see the OCPproduct documentation You can install OCP on cloud platforms or on bare metal For more informationabout STF performance and scaling see httpsaccessredhatcomarticles4907241

NOTE

Do not install OCP on the same infrastructure that you want to monitor

12 INSTALLATION SIZE

The size of your Red Hat OpenShift Container Platform installation depends on the following factors

The number of nodes you want to monitor

The number of metrics you want to collect

The resolution of metrics

The length of time that you want to store the data

Installation of Service Telemetry Framework (STF) depends on the existingRed Hat OpenShift Container Platform environment Ensure that you install monitoring for

Red Hat OpenStack Platform 160 Service Telemetry Framework

6

Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platformenvironment You can install Red Hat OpenShift Container Platform (OCP) on baremetal or othersupported cloud platforms For more information about installing OCP see OpenShift ContainerPlatform 43 Documentation

The size of your OCP environment depends on the infrastructure you select For more informationabout minimum resources requirements when installing OCP on baremetal see Minimum resourcerequirements in the Installing a cluster on bare metal guide For installation requirements of the variouspublic and private cloud platforms which you can install see the corresponding installationdocumentation for your cloud platform of choice

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

7

CHAPTER 2 INSTALLING THE CORE COMPONENTS OFSERVICE TELEMETRY FRAMEWORK

Before you install Service Telemetry Framework (STF) ensure thatRed Hat OpenShift Container Platform (OCP) version 4x is running and that you understand the corecomponents of the framework As part of the OCP installation planning process ensure that theadministrator provides persistent storage and enough resources to run the STF components on top ofthe OCP environment

WARNING

Red Hat OpenShift Container Platform version 43 or later is currently required for asuccessful installation of STF

21 THE CORE COMPONENTS OF STF

The following STF core components are managed by Operators

Prometheus and AlertManager

ElasticSearch

Smart Gateway

AMQ Interconnect

Each component has a corresponding Operator that you can use to load the various applicationcomponents and objects

Additional resources

For more information about Operators see the Understanding Operators guide

22 PREPARING YOUR OCP ENVIRONMENT FOR STF

As you prepare your OCP environment for STF you must plan for persistent storage adequateresources and event storage

Ensure that persistent storage is available in your Red Hat OpenShift Container Platform clusterto permit a production grade deployment For more information see Section 221 ldquoPersistentvolumesrdquo

Ensure that enough resources are available to run the Operators and the application containersFor more information see Section 222 ldquoResource allocationrdquo

To install ElasticSearch you must use a community catalog source If you do not want to use acommunity catalog or if you do not want to store events see Section 23 ldquoDeploying STF to theOCP environmentrdquo

STF uses ElasticSearch to store events which requires a larger than normal

Red Hat OpenStack Platform 160 Service Telemetry Framework

8

vmmax_map_count The vmmax_map_count value is set by default inRed Hat OpenShift Container Platform For more information about how to edit the value of vmmax_map_count see Section 223 ldquoNode tuning operatorrdquo

221 Persistent volumes

STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus andElasticSearch can store metrics and events

Additional resources

For more information about configuring persistent storage for OCP see Understanding persistentstorage

2211 Using ephemeral storage

WARNING

You can use ephemeral storage with STF However if you use ephemeral storageyou might experience data loss if a pod is restarted updated or rescheduled ontoanother node Use ephemeral storage only for development or testing and notproduction environments

Procedure

To enable ephemeral storage for STF set storageEphemeralEnabled true in your ServiceTelemetry manifest

Additional resources

For more information about enabling ephemeral storage for STF see Section 461 ldquoConfiguringephemeral storagerdquo

222 Resource allocation

To enable the scheduling of pods within the OCP infrastructure you need resources for thecomponents that are running If you do not allocate enough resources pods remain in a Pending statebecause they cannot be scheduled

The amount of resources that you require to run STF depends on your environment and the number ofnodes and clouds that you want to monitor

Additional resources

For recommendations about sizing for metrics collection seehttpsaccessredhatcomarticles4907241

For information about sizing requirements for ElasticSearch seehttpswwwelasticcoguideencloud-on-k8scurrentk8s-managing-compute-resourceshtml

223 Node tuning operator

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

9

STF uses ElasticSearch to store events which requires a larger than normal vmmax_map_count The vmmax_map_count value is set by default in Red Hat OpenShift Container Platform

If you want to edit the value of vmmax_map_count you cannot apply node tuning manually using the sysctl command because Red Hat OpenShift Container Platform manages nodes directly To configurevalues and apply them to the infrastructure you must use the node tuning operator For moreinformation see Using the Node Tuning Operator

In an OCP deployment the default node tuning operator specification provides the required profiles forElasticSearch workloads or pods scheduled on nodes To view the default cluster node tuningspecification run the following command

The output of the default specification is documented at Default profiles set on a cluster Theassignment of profiles is managed in the recommend section where profiles are applied to a node whencertain conditions are met When scheduling ElasticSearch to a node in STF one of the following profilesis applied

openshift-control-plane-es

openshift-node-es

When scheduling an ElasticSearch pod there must be a label present that matches tunedopenshiftioelasticsearch If the label is present one of the two profiles is assigned to the podNo action is required by the administrator if you use the recommended Operator for ElasticSearch Ifyou use a custom-deployed ElasticSearch with STF ensure that you add the tunedopenshiftioelasticsearch label to all scheduled pods

Additional resources

For more information about virtual memory usage by ElasticSearch seehttpswwwelasticcoguideenelasticsearchreferencecurrentvm-max-map-counthtml

For more information about how the profiles are applied to nodes see Custom tuning specification

23 DEPLOYING STF TO THE OCP ENVIRONMENT

You can deploy STF to the OCP environment in one of two ways

Deploy STF and store events with ElasticSearch For more information see Section 231ldquoDeploying STF to the OCP environment with ElasticSearchrdquo

Deploy STF without ElasticSearch and disable events support For more information seeSection 232 ldquoDeploying STF to the OCP environment without ElasticSearchrdquo

231 Deploying STF to the OCP environment with ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 235 ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

oc get Tuneddefault -o yaml -n openshift-cluster-node-tuning-operator

Red Hat OpenStack Platform 160 Service Telemetry Framework

10

4 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

5 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

6 Section 238 ldquoSubscribing to the Elastic Cloud on Kubernetes Operatorrdquo

7 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

8 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

232 Deploying STF to the OCP environment without ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

4 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

5 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

6 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

233 Creating a namespace

Create a namespace to hold the STF components The service-telemetry namespace is usedthroughout the documentation

Procedure

Enter the following command

234 Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods

Procedure

Enter the following command

oc new-project service-telemetry

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorGroupmetadata name service-telemetry-operator-group namespace service-telemetryspec targetNamespaces - service-telemetryEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

11

Additional resources

For more information see OperatorGroups

235 Enabling the OperatorHubio Community Catalog Source

Before you install ElasticSearch you must have access to the resources on the OperatorHubioCommunity Catalog Source

Procedure

Enter the following command

236 Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform you must enable the operatorsource

Procedure

1 Install an OperatorSource that contains the Service Telemetry Operator and the SmartGateway Operator

2 To validate the creation of your OperatorSource use the oc get operatorsources command A

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind CatalogSourcemetadata name operatorhubio-operators namespace openshift-marketplacespec sourceType grpc image quayiooperator-frameworkupstream-community-operatorslatest displayName OperatorHubio Operators publisher OperatorHubioEOF

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorSourcemetadata labels opsrc-provider redhat-operators-stf name redhat-operators-stf namespace openshift-marketplacespec authorizationToken displayName Red Hat STF Operators endpoint httpsquayiocnr publisher Red Hat registryNamespace redhat-operators-stf type appregistryEOF

Red Hat OpenStack Platform 160 Service Telemetry Framework

12

2 To validate the creation of your OperatorSource use the oc get operatorsources command Asuccessful import results in the MESSAGE field returning a result of The object has been successfully reconciled

3 To validate that the Operators are available from the catalog use the oc get packagemanifestcommand

237 Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STFcomponents because the AMQ Certificate Manager Operator runs globally-scoped and is notcompatible with the dependency management of Operator Lifecycle Manager when used with othernamespace-scoped operators

Procedure

1 Subscribe to the AMQ Certificate Manager Operator create the subscription and validate theAMQ7 Certificate Manager

NOTE

The AMQ Certificate Manager is installed globally for all namespaces so the namespace value provided is openshift-operators You might not see your amq7-cert-managerv100 ClusterServiceVersion in the service-telemetrynamespace for a few minutes until the processing executes against thenamespace

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME TYPE ENDPOINT REGISTRY DISPLAYNAME PUBLISHER STATUS MESSAGEredhat-operators-stf appregistry httpsquayiocnr redhat-operators-stf Red Hat STF Operators Red Hat Succeeded The object has been successfully reconciled

$ oc get packagemanifests | grep Red Hat STF

smartgateway-operator Red Hat STF Operators 2m50sservicetelemetry-operator Red Hat STF Operators 2m50s

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name amq7-cert-manager namespace openshift-operatorsspec channel alpha installPlanApproval Automatic name amq7-cert-manager source redhat-operators sourceNamespace openshift-marketplaceEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

13

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-managerv100 has a phase Succeeded

238 Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator and if you plan to store events in ElasticSearch youmust enable the Elastic Cloud Kubernetes Operator

Procedure

1 Apply the following manifest to your OCP environment to enable the Elastic Cloud onKubernetes Operator

2 To verify that the ClusterServiceVersion for ElasticSearch Cloud on Kubernetes succeededenter the oc get csv command

239 Subscribing to the Service Telemetry Operator

To instantiate an STF instance create the ServiceTelemetry object to allow the Service TelemetryOperator to create the environment

Procedure

1 To create the Service Telemetry Operator subscription enter the oc apply -f command

$ oc get --namespace openshift-operators csv

NAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name elastic-cloud-eck namespace service-telemetryspec channel stable installPlanApproval Automatic name elastic-cloud-eck source operatorhubio-operators sourceNamespace openshift-marketplaceEOF

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1

Red Hat OpenStack Platform 160 Service Telemetry Framework

14

2 To validate the Service Telemetry Operator and the dependent operators enter the followingcommand

2310 Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework you must create an instance of ServiceTelemetry in OCPBy default eventsEnabled is set to false If you do not want to store events in ElasticSearch ensurethat eventsEnabled is set to false For more information see Section 232 ldquoDeploying STF to the OCPenvironment without ElasticSearchrdquo

The following core parameters are available for a ServiceTelemetry manifest

Table 21 Core parameters for a ServiceTelemetry manifest

Parameter Description Default Value

eventsEnabled Enable events support in STFRequires prerequisite steps toensure ElasticSearch can bestarted For more information seeSection 238 ldquoSubscribing to theElastic Cloud on KubernetesOperatorrdquo

false

metricsEnabled Enable metrics support in STF true

kind Subscriptionmetadata name servicetelemetry-operator namespace service-telemetryspec channel stable installPlanApproval Automatic name servicetelemetry-operator source redhat-operators-stf sourceNamespace openshift-marketplaceEOF

$ oc get csv --namespace service-telemetryNAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeededamq7-interconnect-operatorv120 Red Hat Integration - AMQ Interconnect 120 Succeededelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeededprometheusoperator0370 Prometheus Operator 0370 prometheusoperator0320 Succeededservice-telemetry-operatorv102 Service Telemetry Operator 102 service-telemetry-operatorv101 Succeededsmart-gateway-operatorv101 Smart Gateway Operator 101 smart-gateway-operatorv100 Succeeded

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

15

highAvailabilityEnabled Enable high availability in STF Formore information see Section 43ldquoHigh availabilityrdquo

false

storageEphemeralEnabled Enable ephemeral storagesupport in STF For moreinformation see Section 46ldquoEphemeral storagerdquo

false

Parameter Description Default Value

Procedure

1 To store events in ElasticSearch set eventsEnabled to true during deployment

2 To view the STF deployment logs in the Service Telemetry Operator use the oc logscommand

PLAY RECAP localhost ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3 View the pods and the status of each pod to determine that all workloads are operatingnominally

NOTE

If you set eventsEnabled true the notification Smart Gateways will Error andCrashLoopBackOff for a period of time before ElasticSearch starts

oc apply -f - ltltEOFapiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec eventsEnabled true metricsEnabled trueEOF

oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

$ oc get pods

NAME READY STATUS RESTARTS AGEalertmanager-stf-default-0 22 Running 0 26melastic-operator-645dc8b8ff-jwnzt 11 Running 0 88melasticsearch-es-default-0 11 Running 0 26minterconnect-operator-6fd49d9fb9-4bl92 11 Running 0

Red Hat OpenStack Platform 160 Service Telemetry Framework

16

24 REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality

Complete the following tasks

1 Section 241 ldquoDeleting the namespacerdquo

2 Section 242 ldquoRemoving the OperatorSourcerdquo

241 Deleting the namespace

To remove the operational resources for STF from OCP delete the namespace

Procedure

1 Run the oc delete command

2 Verify that the resources have been deleted from the namespace

242 Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again delete the OperatorSourceWhen you remove the OperatorSource PackageManifests related to STF are removed from theOperator Lifecycle Manager catalog

Procedure

1 Delete the OperatorSource

46mprometheus-operator-bf7d97fb9-kwnlx 11 Running 0 46mprometheus-stf-default-0 33 Running 0 26mservice-telemetry-operator-54f4c99d9b-k7ll6 22 Running 0 46msmart-gateway-operator-7ff58bcf94-66rvx 22 Running 0 46mstf-default-ceilometer-notification-smartgateway-6675df547q4lbj 11 Running 0 26mstf-default-collectd-notification-smartgateway-698c87fbb7-xj528 11 Running 0 26mstf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn 11 Running 0 26mstf-default-interconnect-7458fd4d69-nqbfs 11 Running 0 26m

oc delete project service-telemetry

$ oc get allNo resources found

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

17

2 Verify that the STF PackageManifests are removed from the platform If successful thefollowing command returns no result

3 If you enabled the OperatorHubio Community Catalog Source during the installationprocess and you no longer need this catalog source delete it

Additional resources

For more information about the OperatorHubio Community Catalog Source see Section 23ldquoDeploying STF to the OCP environmentrdquo

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stfoperatorsourceoperatorscoreoscom redhat-operators-stf deleted

$ oc get packagemanifests | grep Red Hat STF

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operatorscatalogsourceoperatorscoreoscom operatorhubio-operators deleted

Red Hat OpenStack Platform 160 Service Telemetry Framework

18

CHAPTER 3 COMPLETING THESERVICE TELEMETRY FRAMEWORK CONFIGURATION

31 CONNECTING RED HAT OPENSTACK PLATFORM TOSERVICE TELEMETRY FRAMEWORK

To collect metrics events or both and to send them to the Service Telemetry Framework (STF)storage domain you must configure the Red Hat OpenStack Platform overcloud to enable datacollection and transport

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes thatemploy routed L3 domains such as distributed compute node (DCN) or spine-leaf see Section 32ldquoDeploying to non-standard network topologiesrdquo

32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network you must makeconfiguration adjustments so that AMQ Interconnect can transport data to theService Telemetry Framework (STF) server instance This scenario is typical in a spine-leaf or aDCN topology For more information about DCN configuration see the Spine Leaf Networkingguide

If you use STF with Red Hat OpenStack Platform 160 and plan to monitor your Ceph Block orObject storage nodes you must make configuration changes that are similar to the configurationchanges that you make to the spine-leaf and DCN network configuration To monitor Ceph nodesuse the CephStorageExtraConfig parameter to define which network interface to load into theAMQ Interconnect and collectd configuration files

Similarly you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters ifyour environment uses Block and Object storage roles

The deployment of a spine-leaf topology involves creating roles and networks then assigningthose networks to the available roles When you configure data collection and transport for STF foran Red Hat OpenStack Platform deployment the default network for roles is InternalApi For CephBlock and Object storage roles the default network is Storage Because a spine-leaf configurationcan result in different networks being assigned to different Leaf groupings and those names aretypically unique additional configuration is required in the parameter_defaults section of theRed Hat OpenStack Platform environment files

Procedure

1 Document which networks are available for each of the Leaf roles For examples of networkname definitions see Creating a network data file in the Spine Leaf Networking guide Formore information about the creation of the Leaf groupings (roles) and assignment of thenetworks to those groupings see Creating a roles data file in the Spine Leaf Networkingguide

CephStorageExtraConfig tripleoprofilebasemetricscollectdamqp_host hiera(storage) tripleoprofilebasemetricsqdrlistener_addr hiera(storage) tripleoprofilebaseceilometeragentnotificationnotifier_host_addr hiera(storage)

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

19

2 Add the following configuration example to the ExtraConfig section for each of the leafroles In this example internal_api_subnet is the value defined in the name_lowerparameter of your network definition (with _subnet appended to the name for Leaf 0) andis the network to which the ComputeLeaf0 leaf role is connected In this case the networkidentification of 0 corresponds to the Compute role for leaf 0 and represents a value thatis different from the default internal API network nameFor the ComputeLeaf0 leaf role specify extra configuration to perform a hiera lookup todetermine which network interface for a particular network to assign to the collectd AMQPhost parameter Perform the same configuration for the AMQ Interconnect listeneraddress parameter

Additional leaf roles typically replace _subnet with _leafN where N represents a uniqueindentifier for the leaf

This example configuration is on a CephStorage leaf role

33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUDFOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud you must configure the data collectionapplications and the data transport to STF and deploy the overcloud

To configure the Red Hat OpenStack Platform overcloud complete the following tasks

1 Section 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

2 Section 332 ldquoConfiguring the STF connection for the overcloudrdquo

3 Section 333 ldquoValidating client-side installationrdquo

331 Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF you must provide theAMQ Interconnect route address in the STF connection file

Procedure

1 Log in to your Red Hat OpenShift Container Platform (OCP) environment

2 In the service-telemetry project retrieve the AMQ Interconnect route address

ComputeLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_subnet)

ComputeLeaf1ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_leaf1)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_leaf1)

CephStorageLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(storage_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(storage_subnet)

Red Hat OpenStack Platform 160 Service Telemetry Framework

20

NOTE

If your STF installation differs from the documentation ensure that youretrieve the correct AMQ Interconnect route address

332 Configuring the STF connection for the overcloud

To configure the STF connection you must create a file that contains the connection configurationof the AMQ Interconnect for the overcloud to the STF deployment Enable the collection of eventsand storage of the events in STF and deploy the overcloud

Procedure

1 Log in to the Red Hat OpenStack Platform undercloud as the stack user

2 Create a configuration file called stf-connectorsyaml in the homestack directory

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all dataingestion and data storage components for single cloud deployments Toshare the data storage domain with multiple clouds see Section 45ldquoConfiguring multiple cloudsrdquo

3 In the stf-connectorsyaml file configure the MetricsQdrConnectors address to connectthe AMQ Interconnect on the overcloud to the STF deployment

Add the CeilometerQdrPublishEvents true parameter to enable collection andtransport of Ceilometer events to STF

Replace the host parameter with the value of HOSTPORT that you retrieved inSection 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

4 Add the following files to your Red Hat OpenStack Platform director deployment to setupcollectd and AMQ Interconnect

the stf-connectorsyaml environment file

the enable-stfyaml file that ensures that the environment is being used during theovercloud deployment

$ oc get routes -ogo-template= range items printf sn spechost end | grep -5671stf-default-interconnect-5671-service-telemetryappsinfrawatch

parameter_defaults EventPipelinePublishers [] CeilometerQdrPublishEvents true MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch port 443 role edge sslProfile sslProfile verifyHostname false

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

21

the ceilometer-write-qdryaml file that ensures that Ceilometer telemetry is sent to STF

5 Deploy the Red Hat OpenStack Platform overcloud

333 Validating client-side installation

To validate data collection from the STF storage domain query the data sources for delivereddata To validate individual nodes in the Red Hat OpenStack Platform deployment connect to theconsole using SSH

Procedure

1 Log in to an overcloud node for example controller-0

2 Ensure that metrics_qdr container is running on the node

3 Return the internal network address on which AMQ Interconnect is running for example 17217144 listening on port 5666

4 Return a list of connections to the local AMQ Interconnect

openstack overcloud deploy ltother argumentsgt --templates usrshareopenstack-tripleo-heat-templates --environment-file ltother-environment-filesgt --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsmetricsceilometer-write-qdryaml --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsenable-stfyaml --environment-file homestackstf-connectorsyaml

$ sudo podman container inspect --format StateStatus metrics_qdr

running

$ sudo podman exec -it metrics_qdr cat etcqpid-dispatchqdrouterdconf

listener host 17217144 port 5666 authenticatePeer no saslMechanisms ANONYMOUS

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --connections

Connections id host container role dir security authentication tenant ============================================================================================================================================================================================================================================================================================ 1 stf-default-interconnect-5671-service-telemetryappsinfrawatch443 stf-default-

Red Hat OpenStack Platform 160 Service Telemetry Framework

22

There are four connections

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat clientThe outbound STF connection is provided to the MetricsQdrConnectors hostparameter and is the route for the STF storage domain The other hosts are internalnetwork addresses of the client connections to this AMQ Interconnect

5 To ensure that messages are being delivered list the links and view the _edge address inthe deliv column for delivery of messages

interconnect-7458fd4d69-bgzfb edge out TLSv12(DHE-RSA-AES256-GCM-SHA384) anonymous-user 12 1721714460290 openstackorgomcontainercontroller-0ceilometer-agent-notification255c02cee550f143ec9ea030db5cccba14 normal in no-security no-auth 16 1721714436408 metrics normal in no-security anonymous-user 899 1721714439500 10a2e99d-1b8a-4329-b48c-4335e5f75c84 normal in no-security no-auth

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --linksRouter Links type dir conn id id peer class addr phs cap pri undel unsett deliv presett psdrop acc rej rel mod delay rate =========================================================================================================================================================== endpoint out 1 5 local _edge 250 0 0 0 2979926 2979924 0 0 0 2 0 0 0 endpoint in 1 6 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 7 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 8 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 9 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 10 250 0 0 0 911 911 0 0 0 0 0 911 0 endpoint in 1 11 250 0 0 0 0 911 0 0 0 0 0 0 0 endpoint out 12 32 local templSY6Mcicol4J2Kp 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 16 41 250 0 0 0 2979924 2979924 0 0 0 0 0 0 0 endpoint in 912 1834 mobile $management 0 250 0 0 0 1

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

23

6 To list the addresses from Red Hat OpenStack Platform nodes to STF connect to OCP toget the AMQ Interconnect pod name and list the connections List the availableAMQ Interconnect pods

7 Connect to the pod and run the qdstat --connections command to list the knownconnections

In this example there are three edge connections from the Red Hat OpenStack Platformnodes with connection id 22 23 and 24

8 To view the number of messages delivered by the network use each address with the oc exec command

0 0 1 0 0 0 0 0 endpoint out 912 1835 local temp9Ok2resI9tmt+CT 250 0 0 0 0 0 0 0 0 0 0 0 0

$ oc get pods -l application=stf-default-interconnect

NAME READY STATUS RESTARTS AGEstf-default-interconnect-7458fd4d69-bgzfb 11 Running 0 6d21h

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 182547243852 UTCstf-default-interconnect-7458fd4d69-bgzfb

Connections id host container role dir security authentication tenant last dlv uptime ====================================================================================================================================================================================================== 1 1012922143062 rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv] normal in no-security anonymous-user 000000000 006215025 2 1013005255754 rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5] normal in no-security anonymous-user 000212557 006214936 3 1013005143110 rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6] normal in no-security anonymous-user 000213653 006214909 22 101280151948 Routerceph-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 23 101280151950 Routercompute-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 24 101280152082 Routercontroller-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000000 000220834 27 12700142202 c2f541c1-4c97-4b37-a189-a396c08fb079 normal in no-security no-auth 000000000 000000000

Red Hat OpenStack Platform 160 Service Telemetry Framework

24

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 182010293258 UTCstf-default-interconnect-7458fd4d69-bgzfb

Router Addresses class addr phs distrib pri local remote in out thru fallback ==================================================================================================================== mobile anycastceilometereventsample 0 balanced - 1 0 1553 1553 0 0 mobile collectdnotify 0 multicast - 1 0 10 10 0 0 mobile collectdtelemetry 0 multicast - 1 0 7798049 7798049 0 0

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

25

CHAPTER 4 ADVANCED FEATURESThe following optional features can provide additional functionality to theService Telemetry Framework (STF)

Customizing the deployment For more information see Section 41 ldquoCustomizing thedeploymentrdquo

Alerts For more information see Section 42 ldquoAlertsrdquo

High availability For more information see Section 43 ldquoHigh availabilityrdquo

Dashboards For more information see Section 44 ldquoDashboardsrdquo

Multiple clouds For more information see Section 45 ldquoConfiguring multiple cloudsrdquo

Ephemeral storage For more information see Section 46 ldquoEphemeral storagerdquo

41 CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load intoRed Hat OpenShift Container Platform (OCP) The Operator then creates other objects in memorywhich results in the dependent Operators creating the workloads they are responsible formanaging

WARNING

When you override the manifest you must provide the entire manifest contentsincluding object names or namespaces There is no dynamic parametersubstitution when you override a manifest

To override a manifest successfully with Service Telemetry Framework (STF) deploy a defaultenvironment using the core options only For more information about the core options seeSection 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo When you deploy STF use the oc getcommand to retrieve the default deployed manifest When you use a manifest that was originallygenerated by Service Telemetry Operator the manifest is compatible with the other objects thatare managed by the Operators

For example when the metricsEnabled true parameter is configured in the ServiceTelemetrymanifest the Service Telemetry Operator requests components for metrics retrieval and storageusing the default manifests In some cases you might want to override the default manifest Formore information see Section 411 ldquoManifest override parametersrdquo

411 Manifest override parameters

This table describes the available parameters that you can use to override a manifest along withthe corresponding retrieval commands

Table 41 Manifest override parameters

Red Hat OpenStack Platform 160 Service Telemetry Framework

26

Override parameter Description Retrieval command

alertmanagerManifest Override the Alertmanagerobject creation The PrometheusOperator watches for Alertmanager objects

oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest Override the Secret thatcontains the Alertmanagerconfiguration The PrometheusOperator uses a secret named alertmanager- alertmanager-name forexample stf-default to providethe alertmanageryamlconfiguration to Alertmanager

oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest Override the ElasticSearchobject creation The Elastic Cloudon Kuberneters Operator watchesfor ElasticSearch objects

oc get elasticsearch elasticsearch -oyaml

interconnectManifest Override the Interconnectobject creation The AMQInterconnect Operator watchesfor Interconnect objects

oc get interconnect stf-default-interconnect -oyaml

prometheusManifest Override the Prometheusobject creation The PrometheusOperator watches for Prometheus objects

oc get prometheus stf-default -oyaml

servicemonitorManifest Override the ServiceMonitorobject creation The PrometheusOperator watches for ServiceMonitor objects

oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest

Override the SmartGatewayobject creation for collectdmetrics The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest

Override the SmartGatewayobject creation for collectdevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-notification -oyaml

CHAPTER 4 ADVANCED FEATURES

27

smartgatewayCeilometerEventsManifest

Override the SmartGatewayobject creation for Ceilometerevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-ceilometer-notification -oyaml

Override parameter Description Retrieval command

412 Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest For a list of availablemanifest override parameters see Section 41 ldquoCustomizing the deploymentrdquo The default ServiceTelemetry object is stf-default Use oc get servicetelemetry to list the available STFdeployments

TIP

The oc edit command loads the default system editor To override the default editor pass or setthe environment variable EDITOR to the preferred editor For example EDITOR=nano oc edit servicetelemetry stf-default

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Load the ServiceTelemetry object into an editor

4 To modify the ServiceTelemetry object provide a manifest override parameter and thecontents of the manifest to write to OCP instead of the defaults provided by STF

NOTE

The trailing pipe (|) after entering the manifest override parameter indicatesthat the value provided is multi-line

oc project service-telemetry

oc edit servicetelemetry stf-default

$ oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata annotations kubectlkubernetesiolast-applied-configuration | apiVersioninfrawatchv1alpha1kindServiceTelemetrymetadataannotationsnamestf-defaultnamespaceservice-telemetryspecmetricsEnabledtrue creationTimestamp 2020-04-14T202942Z

Red Hat OpenStack Platform 160 Service Telemetry Framework

28

1

2

Manifest override parameter is defined in the spec of the ServiceTelemetry object

End of the manifest override content

5 Save and close

42 ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager Alert rules in Prometheusservers send alerts to an Alertmanager which manages the alerts Alertmanager can silenceinhibit or aggregate alerts and send notifications using email on-call notification systems or chatplatforms

To create an alert complete the following tasks

1 Create an alert rule in Prometheus For more information see Section 421 ldquoCreating analert rule in Prometheusrdquo

generation 1 name stf-default namespace service-telemetry resourceVersion 1949423 selfLink apisinfrawatchv1alpha1namespacesservice-telemetryservicetelemetrysstf-default uid d058bc41-1bb0-49f5-9a8b-642f4b8adb95spec metricsEnabled true smartgatewayCollectdMetricsManifest | 1 apiVersion smartgatewayinfrawatchv2alpha1 kind SmartGateway metadata name stf-default-collectd-telemetry namespace service-telemetry spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdtelemetry debug true prefetch 15000 serviceType metrics size 1 useTimestamp true 2status conditions - ansibleResult changed 0 completion 2020-04-14T203219079508 failures 0 ok 52 skipped 1 lastTransitionTime 2020-04-14T202959Z message Awaiting next reconciliation reason Successful status True type Running

CHAPTER 4 ADVANCED FEATURES

29

2 Create an alert route in Alertmanager For more information see Section 423 ldquoCreating analert route in Alertmanagerrdquo

Additional resourcesFor more information about alerts or notifications with Prometheus and Alertmanager seehttpsprometheusiodocsalertingoverview

To view an example set of alerts that you can use with Service Telemetry Framework (STF) seehttpsgithubcominfrawatchservice-telemetry-operatortreemasterdeployalerts

421 Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications If the rule condition returns an emptyresult set the condition is false Otherwise the rule is true and it triggers an alert

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Create a PrometheusRule object that contains the alert rule The Prometheus Operatorloads the rule into Prometheus

4 To verify that the rules have been loaded into Prometheus by the Operator create a podwith access to curl

5 Run curl to access the prometheus-operated service to return the rules loaded intomemory

oc project service-telemetry

oc apply -f - ltltEOFapiVersion monitoringcoreoscomv1kind PrometheusRulemetadata creationTimestamp null labels prometheus stf-default role alert-rules name prometheus-alarm-rules namespace service-telemetryspec groups - name openstackrules rules - alert Metric Listener down expr collectd_qpid_router_status lt 1 To change the rule edit the value of the expr parameterEOF

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl prometheus-operated9090apiv1rulesstatussuccessdatagroups

Red Hat OpenStack Platform 160 Service Telemetry Framework

30

6 To verify that the output shows the rules loaded into the PrometheusRule object forexample the output contains the defined openstackrules exit from the pod

7 Clean up the environment by deleting the curl pod

Additional resourcesFor more information on alerting see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

422 Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 421ldquoCreating an alert rule in Prometheusrdquo

Procedure

1 Use the oc edit command

2 Edit the PrometheusRules manifest

3 Save and close

Additional resourcesFor more information about configuring alerting rules seehttpsprometheusiodocsprometheuslatestconfigurationalerting_rules

For more information about PrometheusRules objects seehttpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

423 Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system such as email IRC or other notificationchannel The Prometheus Operator manages the Alertmanager configuration as anRed Hat OpenShift Container Platform (OCP) secret STF by default deploys a basic configurationthat results in no receivers

[nameopenstackrulesfileetcprometheusrulesprometheus-stf-default-rulefiles-0service-telemetry-prometheus-alarm-rulesyamlrules[nameMetric Listener downquerycollectd_qpid_router_status u003c 1duration0labelsannotationsalerts[]healthoktypealerting]interval30]

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

oc edit prometheusrules prometheus-alarm-rules

alertmanageryaml |- global resolve_timeout 5m route

CHAPTER 4 ADVANCED FEATURES

31

To deploy a custom Alertmanager route with STF an alertmanagerConfigManifest parameter mustbe passed to the Service Telemetry Operator that results in an updated secret managed by thePrometheus Operator For more information see Section 412 ldquoOverriding a managed manifestrdquo

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object for your STF deployment

4 Add a new parameter alertmanagerConfigManifest and the Secret object contents todefine the alertmanageryaml configuration for Alertmanager

NOTE

This loads the default template that is already managed by ServiceTelemetry Operator To validate the changes are populating correctlychange a value return the alertmanager-stf-default secret and verify thatthe new value is loaded into memory for example changing the value globalresolve_timeout from 5m to 10m

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

oc project service-telemetry

oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec metricsEnabled true alertmanagerConfigManifest | apiVersion v1 kind Secret metadata name alertmanager-stf-default namespace service-telemetry type Opaque stringData alertmanageryaml |- global resolve_timeout 10m route

Red Hat OpenStack Platform 160 Service Telemetry Framework

32

5 Verify that the configuration was applied to the secret

6 To verify the configuration has been loaded into Alertmanager create a pod with access to curl

7 Run curl against the alertmanager-operated service to retrieve the status and configYAMLcontents and review the supplied configuration matches the configuration loaded intoAlertmanager

8 Verify that the configYAML field contains the expected changes Exit from the pod

9 To clean up the environment delete the curl pod

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

$ oc get secret alertmanager-stf-default -o go-template=index data alertmanageryaml | base64decode

global resolve_timeout 10mroute group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver nullreceivers- name null

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl alertmanager-operated9093apiv1status

statussuccessdataconfigYAMLglobaln resolve_timeout 10mn http_config n smtp_hello localhostn smtp_require_tls truen pagerduty_url httpseventspagerdutycomv2enqueuen hipchat_api_url httpsapihipchatcomn opsgenie_api_url httpsapiopsgeniecomn wechat_api_url httpsqyapiweixinqqcomcgi-binn victorops_api_url httpsalertvictoropscomintegrationsgeneric20131114alertnrouten receiver nulln group_byn - jobn group_wait 30sn group_interval 5mn repeat_interval 12hnreceiversn- name nullntemplates []n

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

CHAPTER 4 ADVANCED FEATURES

33

Additional resourcesFor more information about the Red Hat OpenShift Container Platform secret and the Prometheusoperator see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

43 HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover fromfailures in its component services Although Red Hat OpenShift Container Platform (OCP) restartsa failed pod if nodes are available to schedule the workload this recovery process might take morethan one minute during which time events and metrics are lost A high availability configurationincludes multiple copies of STF components reducing recovery time to approximately 2 secondsTo protect against failure of an OCP node deploy STF to an OCP cluster with three or more nodes

NOTE

STF is not yet a fully fault tolerant system Delivery of metrics and events during therecovery period is not guaranteed

Enabling high availability has the following effects

Two AMQ Interconnect pods run instead of the default 1

Three ElasticSearch pods run instead of the default 1

Recovery time from a lost pod in either of these services reduces to approximately 2seconds

431 Configuring high availability

To configure STF for high availability add highAvailabilityEnabled true to the ServiceTelemetryobject in OCP You can this set this parameter at installation time or if you already deployed STFcomplete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Use the oc command to edit the ServiceTelemetry object

4 Add highAvailabilityEnabled true to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry

spec eventsEnabled true metricsEnabled true highAvailabilityEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

34

44 DASHBOARDS

Use third-party application Grafana to visualize system-level metrics gathered by collectd for eachindividual host node For more information about configuring collectd see Section 33ldquoConfiguring Red Hat OpenStack Platform overcloud for Service Telemetry Frameworkrdquo

441 Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment so you mustdeploy the Grafana Operator from OperatorHubio

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Clone the dashboard repository

4 Deploy the Grafana operator

5 To verify that the operator launched successfully run the oc get csv command If the valueof the PHASE column is Succeeded the operator launched successfully

6 Launch a Grafana instance

7 Verify that the Grafana instance deployed

8 Create the datasource and dashboard resources

oc project service-telemetry

git clone httpsgithubcominfrawatchdashboardscd dashboards

oc create -f deploysubscriptionyaml

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEgrafana-operatorv320 Grafana Operator 320 Succeeded

$ oc create -f deploygrafanayaml

$ oc get pod -l app=grafana

NAME READY STATUS RESTARTS AGEgrafana-deployment-7fc7848b56-sbkhv 11 Running 0 1m

CHAPTER 4 ADVANCED FEATURES

35

9 Verify that the resources installed correctly

10 Navigate to httpsltgrafana-route-addressgt in a web browser Use the oc get routescommand to retrieve the Grafana route address

11 To view the dashboard click Dashboards and Manage

Additional resources

For more information about enabling the OperatorHubio catalog source see Section 235ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

4411 Viewing and editing queries

Procedure

1 Log in to Red Hat OpenShift Container Platform To view and edit queries log in as the admin user

2 Change to the service-telemetry namespace

3 To retrieve the default username and password describe the Grafana object using oc describe

442 The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time Select a node from theupper left corner of the dashboard

4421 Top panels

Title Unit Description

oc create -f deploydatasourceyaml -f deployrhos-dashboardyaml

$ oc get grafanadashboards

NAME AGErhos-dashboard 7d21h

$ oc get grafanadatasourcesNAME AGEservice-telemetry-grafanadatasource 1m

oc get routes

oc project service-telemetry

oc describe grafana service-telemetry-grafana

Red Hat OpenStack Platform 160 Service Telemetry Framework

36

Current Global Alerts - Current alerts fired byPrometheus

Recent Global Alerts - Recently fired alerts in 5m timesteps

Status Panel - Node status up downunavailable

Uptime smhdMY Total operational time of node

CPU Cores cores Total number of cores

Memory bytes Total memory

Disk Size bytes Total storage size

Processes processes Total number of processes listedby type

Load Average processes Load average represents theaverage number of running anduninterruptible processes residingin the kernel execution queue

4422 Networking panels

Panels that display the network interfaces of the node

Panel Unit Description

Physical Interfaces Ingress Errors errors Total errors with incoming data

Physical Interfaces Egress Errors errors Total errors with outgoing data

Physical Interfaces Ingress ErrorRates

errorss Rate of incoming data errors

Physical Interfaces egress ErrorRates

errorss Rate of outgoing data errors

Physical Interfaces PacketsIngress pps Incoming packets persecond

Physical Interfaces PacketsEgress

pps

Outgoing packets per second Physical Interfaces Data Ingress bytess

CHAPTER 4 ADVANCED FEATURES

37

Incoming data rates Physical Interfaces Data Egress bytess

Outgoing data rates Physical Interfaces Drop RateIngress

pps

Incoming packets drop rate Physical Interfaces Drop RateEgress

pps

4423 CPU panels

Panels that display CPU usage of the node

Panel Unit Description

Current CPU Usage percent Instantaneous usage at the timeof the last query

Aggregate CPU Usage percent Average non-idle CPU activity ofall cores on a node

Aggr CPU Usage by Type percent Shows time spent for each type ofthread averaged across all cores

4424 Memory panels

Panels that display memory usage on the node

Panel Unit Description

Memory Used percent Amount of memory being used attime of last query

Huge Pages Used hugepages Number of hugepages beingused

Memory

4425 Diskfile system

Panels that display space used on disk

Panel Unit Description Notes

Disk Space Usage percent Total disk use at time oflast query

Red Hat OpenStack Platform 160 Service Telemetry Framework

38

Inode Usage percent Total inode use at timeof last query

Aggregate Disk SpaceUsage

bytes Total disk space usedand reserved

Because this query relieson the df plugintemporary file systemsthat do not necessarilyuse disk space areincluded in the resultsThe query tries to filterout most of these but itmight not be exhaustive

Disk Traffic bytess Shows rates for bothreading and writing

Disk Load percent Approximatepercentage of total diskbandwidth being usedThe weighted IO timeseries includes thebacklog that might beaccumulating For moreinformation see thecollectd disk plugin docs

Operationss opss Operations done persecond

Average IO OperationTime

seconds Average time each IOoperation took tocomplete This averageis not accurate see thecollectd disk plugin docs

Panel Unit Description Notes

45 CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance ofService Telemetry Framework (STF)

1 Plan the AMQP address prefixes that you want to use for each cloud For more informationsee Section 451 ldquoPlanning AMQP address prefixesrdquo

2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on thecorresponding address prefixes For more information see Section 452 ldquoDeploying SmartGatewaysrdquo

3 Configure each cloud to send its metrics and events to STF on the correct address Formore information see Section 453 ldquoCreating the OpenStack environment filerdquo

CHAPTER 4 ADVANCED FEATURES

39

Figure 41 Two Red Hat OpenStack Platform clouds connect to STF

451 Planning AMQP address prefixes

By default Red Hat OpenStack Platform nodes get data through two data collectors collectd andCeilometer These components send telemetry data or notifications to the respective AMQPaddresses for example collectdtelemetry where STF Smart Gateways listen on those addressesfor monitoring data

To support multiple clouds and to identify which cloud generated the monitoring data configureeach cloud to send data to a unique address Prefix a cloud identifier to the second part of theaddress The following list shows some example addresses and identifiers

collectdcloud1-telemetry

collectdcloud1-notify

anycastceilometercloud1-eventsample

collectdcloud2-telemetry

collectdcloud2-notify

Red Hat OpenStack Platform 160 Service Telemetry Framework

40

anycastceilometercloud2-eventsample

collectdus-east-1-telemetry

collectdus-west-3-telemetry

452 Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud one forcollectd metrics one for collectd events and one for Ceilometer events Configure each of theSmart Gateways to listen on the AMQP address that you define for the corresponding cloud

When you deploy STF for the first time Smart Gateway manifests are created that define the initialSmart Gateways for a single cloud When deploying Smart Gateways for multiple cloud supportyou deploy multiple Smart Gateways for each of the data collection types that handle the metricsand the events data for each cloud The initial Smart Gateways act as a template to createadditional Smart Gateways along with any authentication information required to connect to thedata stores

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

oc project service-telemetry

3 Use the initially deployed Smart Gateways as a template for additional Smart GatewaysList the currently deployed Smart Gateways with the oc get smartgateways command Forexample if you deployed STF with metricsEnabled true and eventsEnabled true thefollowing Smart Gateways are displayed in the output

$ oc get smartgateways

NAME AGEstf-default-ceilometer-notification 14dstf-default-collectd-notification 14dstf-default-collectd-telemetry 14d

4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary filewhich you can modify later and use to create the new set of Smart Gateways

truncate --size 0 tmpcloud1-smartgatewaysyaml ampamp for sg in $(oc get smartgateways -oname)do echo --- gtgt tmpcloud1-smartgatewaysyaml oc get $sg -oyaml --export gtgt tmpcloud1-smartgatewaysyamldone

5 Modify the Smart Gateway manifest in the tmpcloud1-smartgatewaysyaml file Adjustthe metadataname and specamqpUrl fields to include the cloud identifier from yourschema For more information see ] To view example Smart Gateway manifests seeltltexample-manifests_advanced-features[

6 Deploy your new Smart Gateways

CHAPTER 4 ADVANCED FEATURES

41

oc apply -f tmpcloud1-smartgatewaysyaml

7 Verify that each Smart Gateway is running This can take several minutes depending on thenumber of Smart Gateways

oc get po -l app=smart-gateway

4521 Example manifests

IMPORTANT

The content in the following examples might be different to the file content in yourdeployment Copy the manifests in your deployment

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that youwant to use for your clouds For more information see Section 451 ldquoPlanning AMQP addressprefixesrdquo

NOTE

Your output may have some additional metadata parameters that you can removefrom the manifests you that load into OCP

apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-ceilometer-notification-cloud1 1spec amqpDataSource ceilometer amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672anycastceilometercloud1-eventsample 2 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-notification-cloud1 3spec amqpDataSource collectd amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-

Red Hat OpenStack Platform 160 Service Telemetry Framework

42

1

2

3

4

5

6

Name for Ceilometer notifications for cloud1

AMQP Address for Ceilometer notifications for cloud1

Name for collectd telemetry for cloud1

AMQP Address for collectd telemetry for cloud1

Name for collectd notifications for cloud1

AMQP Address for collectd notifications for cloud1

453 Creating the OpenStack environment file

To label traffic according to the cloud of origin you must create a configuration with cloud-specificinstance names Create an stf-connectorsyaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefixscheme For more information see Section 451 ldquoPlanning AMQP address prefixesrdquo

notify 4 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-telemetry-cloud1 5spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-telemetry 6 debug false prefetch 15000 serviceType metrics size 1 useTimestamp true

CHAPTER 4 ADVANCED FEATURES

43

WARNING

Remove enable-stfyaml and ceilometer-write-qdryaml environment filereferences from your overcloud deployment This configuration is redundantand results in duplicate information being sent from each cloud node

Procedure

1 Create the stf-connectorsyaml file and modify it to match the AMQP address that youwant for this cloud deployment

resource_registry OSTripleOServicesCollectd usrshareopenstack-tripleo-heat-templatesdeploymentmetricscollectd-container-puppetyaml OSTripleOServicesMetricsQdr usrshareopenstack-tripleo-heat-templatesdeploymentmetricsqdr-container-puppetyaml OSTripleOServicesCeilometerAgentCentral usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-central-container-puppetyaml OSTripleOServicesCeilometerAgentNotification usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-notification-container-puppetyaml OSTripleOServicesCeilometerAgentIpmi usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-ipmi-container-puppetyaml OSTripleOServicesComputeCeilometerAgent usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-compute-container-puppetyaml OSTripleOServicesRedis usrshareopenstack-tripleo-heat-templatesdeploymentdatabaseredis-pacemaker-puppetyaml

parameter_defaults EnableSTF true

EventPipelinePublishers [] CeilometerEnablePanko false CeilometerQdrPublishEvents true CeilometerQdrEventsConfig driver amqp topic cloud1-event 1

CollectdConnectionType amqp1 CollectdAmqpInterval 5 CollectdDefaultPollingInterval 5

CollectdAmqpInstances cloud1-notify 2 notify true format JSON presettle false cloud1-telemetry 3 format JSON presettle true

MetricsQdrAddresses

Red Hat OpenStack Platform 160 Service Telemetry Framework

44

+ lt1gt Define the topic for Ceilometer events This value is the address format of anycastceilometercloud1-eventsample lt2gt Define the topic for collectd events This value is theformat of collectdcloud1-notify lt3gt Define the topic for collectd metrics This value is the formatof collectdcloud1-telemetry lt4gt Adjust the MetricsQdrConnectors host to the address of the STFroute

1 Ensure that the naming convention in the stf-connectorsyaml file aligns with the specamqpUrl field in the Smart Gateway configuration For example configure the CeilometerQdrEventsConfigtopic field to a value of cloud1-event

2 Save the file in a directory for custom environment files for example homestackcustom_templates

3 Source the authentication file

4 Include the stf-connectorsyaml file in the overcloud deployment command along with anyother environment files relevant to your environment

Additional resourcesFor information about validating the deployment see Section 333 ldquoValidating client-sideinstallationrdquo

454 Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it wasscraped from You can use this label to query data from a specific cloud

To query data from a specific cloud use a Prometheus promql query that matches the associatedservice label for example collectd_uptimeservice=stf-default-collectd-telemetry-cloud1-smartgateway

- prefix collectd distribution multicast - prefix anycastceilometer distribution multicast

MetricsQdrSSLProfiles - name sslProfile

MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch 4 port 443 role edge verifyHostname false sslProfile sslProfile

[stackundercloud-0 ~]$ source stackrc

(undercloud) [stackundercloud-0 ~]$

(undercloud) [stackundercloud-0 ~]$ openstack overcloud deploy --templates usrshareopenstack-tripleo-heat-templates -e homestackcustom_templatesstf-connectorsyaml

CHAPTER 4 ADVANCED FEATURES

45

46 EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storingdata in your Red Hat OpenShift Container Platform (OCP) cluster Ephemeral storage is notrecommended in a production environment due to the volatility of the data in the platform whenoperating correctly and as designed For example restarting a pod or rescheduling the workload toanother node results in the loss of any local data written since the pod started

If you enable ephemeral storage in STF the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests

461 Configuring ephemeral storage

To configure STF for ephemeral storage add storageEphemeralEnabled true to theServiceTelemetry object in OCP You can add storageEphemeralEnabled true at installation timeor if you already deployed STF complete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object

4 Add the storageEphemeralEnabled true parameter to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry stf-default

spec eventsEnabled true metricsEnabled true storageEphemeralEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

46

APPENDIX A COLLECTD PLUG-INSThis section contains a complete list of collectd plug-ins and configurations forRed Hat OpenStack Platform 160

collectd-aggregation

collectdpluginaggregationaggregators

collectdpluginaggregationinterval

collectd-amqp1

collectd-apache

collectdpluginapacheinstances (ex localhost rArr url rArr httplocalhostmod_statusauto)

collectdpluginapacheinterval

collectd-apcups

collectd-battery

collectdpluginbatteryvalues_percentage

collectdpluginbatteryreport_degraded

collectdpluginbatteryquery_state_fs

collectdpluginbatteryinterval

collectd-ceph

collectdplugincephdaemons

collectdplugincephlongrunavglatency

collectdplugincephconvertspecialmetrictypes

collectd-cgroups

collectdplugincgroupsignore_selected

collectdplugincgroupsinterval

collectd-conntrack

None

collectd-contextswitch

collectdplugincontextswitchinterval

collectd-cpu

APPENDIX A COLLECTD PLUG-INS

47

collectdplugincpureportbystate

collectdplugincpureportbycpu

collectdplugincpuvaluespercentage

collectdplugincpureportnumcpu

collectdplugincpureportgueststate

collectdplugincpusubtractgueststate

collectdplugincpuinterval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectdplugincsvdatadir

collectdplugincsvstorerates

collectdplugincsvinterval

collectd-df

collectdplugindfdevices

collectdplugindffstypes

collectdplugindfignoreselected

collectdplugindfmountpoints

collectdplugindfreportbydevice

collectdplugindfreportinodes

collectdplugindfreportreserved

collectdplugindfvaluesabsolute

collectdplugindfvaluespercentage

collectdplugindfinterval

collectd-disk

collectdplugindiskdisks

collectdplugindiskignoreselected

collectdplugindiskudevnameattr

Red Hat OpenStack Platform 160 Service Telemetry Framework

48

collectdplugindiskinterval

collectd-entropy

collectdpluginentropyinterval

collectd-ethstat

collectdpluginethstatinterfaces

collectdpluginethstatmaps

collectdpluginethstatmappedonly

collectdpluginethstatinterval

collectd-exec

collectdpluginexeccommands

collectdpluginexeccommands_defaults

collectdpluginexecglobals

collectdpluginexecinterval

collectd-fhcount

collectdpluginfhcountvaluesabsolute

collectdpluginfhcountvaluespercentage

collectdpluginfhcountinterval

collectd-filecount

collectdpluginfilecountdirectories

collectdpluginfilecountinterval

collectd-fscache

None

collectd-hddtemp

collectdpluginhddtemphost

collectdpluginhddtempport

collectdpluginhddtempinterval

collectd-hugepages

collectdpluginhugepagesreport_per_node_hp

APPENDIX A COLLECTD PLUG-INS

49

collectdpluginhugepagesreport_root_hp

collectdpluginhugepagesvalues_pages

collectdpluginhugepagesvalues_bytes

collectdpluginhugepagesvalues_percentage

collectdpluginhugepagesinterval

collectd-intel_rdt

collectd-interface

collectdplugininterfaceinterfaces

collectdplugininterfaceignoreselected

collectdplugininterfacereportinactive

Collectdplugininterfaceinterval

collectd-ipc

None

collectd-ipmi

collectdpluginipmiignore_selected

collectdpluginipminotify_sensor_add

collectdpluginipminotify_sensor_remove

collectdpluginipminotify_sensor_not_present

collectdpluginipmisensors

collectdpluginipmiinterval

collectd-irq

collectdpluginirqirqs

collectdpluginirqignoreselected

collectdpluginirqinterval

collectd-load

collectdpluginloadreport_relative

collectdpluginloadinterval

collectd-logfile

collectdpluginlogfilelog_level

Red Hat OpenStack Platform 160 Service Telemetry Framework

50

collectdpluginlogfilelog_level

collectdpluginlogfilelog_file

collectdpluginlogfilelog_timestamp

collectdpluginlogfileprint_severity

collectdpluginlogfileinterval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectdpluginmemcachedinstances

collectdpluginmemcachedinterval

collectd-memory

collectdpluginmemoryvaluesabsolute

collectdpluginmemoryvaluespercentage

collectdpluginmemoryinterval collectd-multimeter

collectd-multimeter

collectd-mysql

collectdpluginmysqlinterval

collectd-netlink

collectdpluginnetlinkinterfaces

collectdpluginnetlinkverboseinterfaces

collectdpluginnetlinkqdiscs

collectdpluginnetlinkclasses

collectdpluginnetlinkfilters

collectdpluginnetlinkignoreselected

collectdpluginnetlinkinterval

collectd-network

collectdpluginnetworktimetolive

collectdpluginnetworkmaxpacketsize

APPENDIX A COLLECTD PLUG-INS

51

collectdpluginnetworkforward

collectdpluginnetworkreportstats

collectdpluginnetworklisteners

collectdpluginnetworkservers

collectdpluginnetworkinterval

collectd-nfs

collectdpluginnfsinterval

collectd-ntpd

collectdpluginntpdhost

collectdpluginntpdport

collectdpluginntpdreverselookups

collectdpluginntpdincludeunitid

collectdpluginntpdinterval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectdpluginopenvpnstatusfile

collectdpluginopenvpnimprovednamingschema

collectdpluginopenvpncollectcompression

collectdpluginopenvpncollectindividualusers

collectdpluginopenvpncollectusercount

collectdpluginopenvpninterval

collectd-ovs_events

collectdpluginovs_eventsaddress

collectdpluginovs_eventsdispatch

collectdpluginovs_eventsinterfaces

collectdpluginovs_eventssend_notification

Red Hat OpenStack Platform 160 Service Telemetry Framework

52

collectdpluginovs_events$port

collectdpluginovs_eventssocket

collectd-ovs_stats

collectdpluginovs_statsaddress

collectdpluginovs_statsbridges

collectdpluginovs_statsport

collectdpluginovs_statssocket

collectd-ping

collectdpluginpinghosts

collectdpluginpingtimeout

collectdpluginpingttl

collectdpluginpingsource_address

collectdpluginpingdevice

collectdpluginpingmax_missed

collectdpluginpingsize

collectdpluginpinginterval

collectd-powerdns

collectdpluginpowerdnsinterval

collectdpluginpowerdnsservers

collectdpluginpowerdnsrecursors

collectdpluginpowerdnslocal_socket

collectdpluginpowerdnsinterval

collectd-processes

collectdpluginprocessesprocesses

collectdpluginprocessesprocess_matches

collectdpluginprocessescollect_context_switch

collectdpluginprocessescollect_file_descriptor

collectdpluginprocessescollect_memory_maps

collectdpluginpowerdnsinterval

APPENDIX A COLLECTD PLUG-INS

53

collectd-protocols

collectdpluginprotocolsignoreselected

collectdpluginprotocolsvalues

collectd-python

collectd-serial

collectd-smart

collectdpluginsmartdisks

collectdpluginsmartignoreselected

collectdpluginsmartinterval

collectd-snmp_agent

collectd-statsd

collectdpluginstatsdhost

collectdpluginstatsdport

collectdpluginstatsddeletecounters

collectdpluginstatsddeletetimers

collectdpluginstatsddeletegauges

collectdpluginstatsddeletesets

collectdpluginstatsdcountersum

collectdpluginstatsdtimerpercentile

collectdpluginstatsdtimerlower

collectdpluginstatsdtimerupper

collectdpluginstatsdtimersum

collectdpluginstatsdtimercount

collectdpluginstatsdinterval

collectd-swap

collectdpluginswapreportbydevice

collectdpluginswapreportbytes

collectdpluginswapvaluesabsolute

Red Hat OpenStack Platform 160 Service Telemetry Framework

54

collectdpluginswapvaluespercentage

collectdpluginswapreportio

collectdpluginswapinterval

collectd-syslog

collectdpluginsysloglog_level

collectdpluginsyslognotify_level

collectdpluginsysloginterval

collectd-table

collectdplugintabletables

collectdplugintableinterval

collectd-tail

collectdplugintailfiles

collectdplugintailinterval

collectd-tail_csv

collectdplugintail_csvmetrics

collectdplugintail_csvfiles

collectd-tcpconns

collectdplugintcpconnslocalports

collectdplugintcpconnsremoteports

collectdplugintcpconnslistening

collectdplugintcpconnsallportssummary

collectdplugintcpconnsinterval

collectd-ted

collectd-thermal

collectdpluginthermaldevices

collectdpluginthermalignoreselected

collectdpluginthermalinterval

collectd-threshold

APPENDIX A COLLECTD PLUG-INS

55

collectdpluginthresholdtypes

collectdpluginthresholdplugins

collectdpluginthresholdhosts

collectdpluginthresholdinterval

collectd-turbostat

collectdpluginturbostatcore_c_states

collectdpluginturbostatpackage_c_states

collectdpluginturbostatsystem_management_interrupt

collectdpluginturbostatdigital_temperature_sensor

collectdpluginturbostattcc_activation_temp

collectdpluginturbostatrunning_average_power_limit

collectdpluginturbostatlogical_core_names

collectd-unixsock

collectd-uptime

collectdpluginuptimeinterval

collectd-users

collectdpluginusersinterval

collectd-uuid

collectdpluginuuiduuid_file

collectdpluginuuidinterval

collectd-virt

collectdpluginvirtconnection

collectdpluginvirtrefresh_interval

collectdpluginvirtdomain

collectdpluginvirtblock_device

collectdpluginvirtinterface_device

collectdpluginvirtignore_selected

collectdpluginvirthostname_format

Red Hat OpenStack Platform 160 Service Telemetry Framework

56

collectdpluginvirtinterface_format

collectdpluginvirtextra_stats

collectdpluginvirtinterval

collectd-vmem

collectdpluginvmemverbose

collectdpluginvmeminterval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectdpluginwrite_graphitecarbons

collectdpluginwrite_graphitecarbon_defaults

collectdpluginwrite_graphiteglobals

collectd-write_kafka

collectdpluginwrite_kafkakafka_host

collectdpluginwrite_kafkakafka_port

collectdpluginwrite_kafkakafka_hosts

collectdpluginwrite_kafkatopics

collectd-write_log

collectdpluginwrite_logformat

collectd-zfs_arc

None

APPENDIX A COLLECTD PLUG-INS

57

  • Table of Contents
  • CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
    • 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
    • 12 INSTALLATION SIZE
      • CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
        • 21 THE CORE COMPONENTS OF STF
        • 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
          • 221 Persistent volumes
            • 2211 Using ephemeral storage
              • 222 Resource allocation
              • 223 Node tuning operator
                • 23 DEPLOYING STF TO THE OCP ENVIRONMENT
                  • 231 Deploying STF to the OCP environment with ElasticSearch
                  • 232 Deploying STF to the OCP environment without ElasticSearch
                  • 233 Creating a namespace
                  • 234 Creating an OperatorGroup
                  • 235 Enabling the OperatorHubio Community Catalog Source
                  • 236 Enabling Red Hat STF Operator Source
                  • 237 Subscribing to the AMQ Certificate Manager Operator
                  • 238 Subscribing to the Elastic Cloud on Kubernetes Operator
                  • 239 Subscribing to the Service Telemetry Operator
                  • 2310 Creating a ServiceTelemetry object in OCP
                    • 24 REMOVING STF FROM THE OCP ENVIRONMENT
                      • 241 Deleting the namespace
                      • 242 Removing the OperatorSource
                          • CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
                            • 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
                            • 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
                            • 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
                              • 331 Retrieving the AMQ Interconnect route address
                              • 332 Configuring the STF connection for the overcloud
                              • 333 Validating client-side installation
                                  • CHAPTER 4 ADVANCED FEATURES
                                    • 41 CUSTOMIZING THE DEPLOYMENT
                                      • 411 Manifest override parameters
                                      • 412 Overriding a managed manifest
                                        • Procedure
                                            • 42 ALERTS
                                              • Additional resources
                                              • 421 Creating an alert rule in Prometheus
                                                • Procedure
                                                • Additional resources
                                                  • 422 Configuring custom alerts
                                                    • Procedure
                                                    • Additional resources
                                                      • 423 Creating an alert route in Alertmanager
                                                        • Procedure
                                                        • Additional resources
                                                            • 43 HIGH AVAILABILITY
                                                              • 431 Configuring high availability
                                                                • Procedure
                                                                    • 44 DASHBOARDS
                                                                      • 441 Setting up Grafana to host the dashboard
                                                                        • 4411 Viewing and editing queries
                                                                          • 442 The Grafana infrastructure dashboard
                                                                            • 4421 Top panels
                                                                            • 4422 Networking panels
                                                                            • 4423 CPU panels
                                                                            • 4424 Memory panels
                                                                            • 4425 Diskfile system
                                                                                • 45 CONFIGURING MULTIPLE CLOUDS
                                                                                  • 451 Planning AMQP address prefixes
                                                                                  • 452 Deploying Smart Gateways
                                                                                    • Procedure
                                                                                    • 4521 Example manifests
                                                                                      • 453 Creating the OpenStack environment file
                                                                                        • Procedure
                                                                                        • Additional resources
                                                                                          • 454 Querying metrics data from multiple clouds
                                                                                            • 46 EPHEMERAL STORAGE
                                                                                              • 461 Configuring ephemeral storage
                                                                                                  • APPENDIX A COLLECTD PLUG-INS
Page 7: Red Hat OpenStack Platform 16 · CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK 3.2. DEPLOYING WITH ROUTED L3 NETWORKS 3.3. CONFIGURING RED HAT OPENSTACK PLATFORM

CHAPTER 1 INTRODUCTION TOSERVICE TELEMETRY FRAMEWORK

Service Telemetry Framework (STF) provides automated collection of measurements and data fromremote clients - Red Hat OpenStack Platform or third-party nodes - and transmission of thatinformation to a centralized receiving Red Hat OpenShift Container Platform (OCP) deployment forstorage retrieval and monitoring The data can be either of two types

Metric

a numeric measurement of an application or system

Event

irregular and discrete occurrences that happen in a system

The collection components that are required on the clients are lightweight The multicast message busthat is shared by all clients and the deployment provides fast and reliable data transport Other modularcomponents for receiving and storing data are deployed in containers on OCP

STF provides access to monitoring functions such as alert generation visualization through dashboardsand single source of truth telemetry analysis to support orchestration

11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE

Service Telemetry Framework (STF) uses the components described in Table 11 ldquoSTF componentsrdquo

Table 11 STF components

Client Component Server(OCP)

yes An AMQP 1x compatible messaging bus to shuttle the metrics to STF forstorage in Prometheus

yes

no Smart Gateway to pick metrics and events from the AMQP 1x bus and todeliver events to ElasticSearch or to provide metrics to Prometheus

yes

no Prometheus as time-series data storage yes

no ElasticSearch as events data storage yes

yes collectd to collect infrastructure metrics and events no

yes Ceilometer to collect Red Hat OpenStack Platform metrics and events no

Figure 11 Service Telemetry Framework architecture overview

Red Hat OpenStack Platform 160 Service Telemetry Framework

4

Figure 11 Service Telemetry Framework architecture overview

NOTE

The Service Telemetry Framework data collection components collectd and Ceilometerand the transport components AMQ Interconnect and Smart Gateway are fullysupported The data storage components Prometheus and ElasticSearch including theOperator artifacts and visualization component Grafana are community-supported andare not officially supported

For metrics on the client side collectd collects high-resolution metrics collectd delivers the data toPrometheus by using the AMQP1 plugin which places the data onto the message bus On the serverside a Golang application called the Smart Gateway takes the data stream from the bus and exposes itas a local scrape endpoint for Prometheus

If you plan to collect and store events collectd or Ceilometer delivers event data to the server side byusing the AMQP1 plugin which places the data onto the message bus Another Smart Gateway writesthe data to the ElasticSearch datastore

Server-side STF monitoring infrastructure consists of the following layers

Service Telemetry Framework 1x (STF)

Red Hat OpenShift Container Platform (OCP)

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

5

Infrastructure platform

Figure 12 Server-side STF monitoring infrastructure

For more information about how to deploy Red Hat OpenShift Container Platform see the OCPproduct documentation You can install OCP on cloud platforms or on bare metal For more informationabout STF performance and scaling see httpsaccessredhatcomarticles4907241

NOTE

Do not install OCP on the same infrastructure that you want to monitor

12 INSTALLATION SIZE

The size of your Red Hat OpenShift Container Platform installation depends on the following factors

The number of nodes you want to monitor

The number of metrics you want to collect

The resolution of metrics

The length of time that you want to store the data

Installation of Service Telemetry Framework (STF) depends on the existingRed Hat OpenShift Container Platform environment Ensure that you install monitoring for

Red Hat OpenStack Platform 160 Service Telemetry Framework

6

Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platformenvironment You can install Red Hat OpenShift Container Platform (OCP) on baremetal or othersupported cloud platforms For more information about installing OCP see OpenShift ContainerPlatform 43 Documentation

The size of your OCP environment depends on the infrastructure you select For more informationabout minimum resources requirements when installing OCP on baremetal see Minimum resourcerequirements in the Installing a cluster on bare metal guide For installation requirements of the variouspublic and private cloud platforms which you can install see the corresponding installationdocumentation for your cloud platform of choice

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

7

CHAPTER 2 INSTALLING THE CORE COMPONENTS OFSERVICE TELEMETRY FRAMEWORK

Before you install Service Telemetry Framework (STF) ensure thatRed Hat OpenShift Container Platform (OCP) version 4x is running and that you understand the corecomponents of the framework As part of the OCP installation planning process ensure that theadministrator provides persistent storage and enough resources to run the STF components on top ofthe OCP environment

WARNING

Red Hat OpenShift Container Platform version 43 or later is currently required for asuccessful installation of STF

21 THE CORE COMPONENTS OF STF

The following STF core components are managed by Operators

Prometheus and AlertManager

ElasticSearch

Smart Gateway

AMQ Interconnect

Each component has a corresponding Operator that you can use to load the various applicationcomponents and objects

Additional resources

For more information about Operators see the Understanding Operators guide

22 PREPARING YOUR OCP ENVIRONMENT FOR STF

As you prepare your OCP environment for STF you must plan for persistent storage adequateresources and event storage

Ensure that persistent storage is available in your Red Hat OpenShift Container Platform clusterto permit a production grade deployment For more information see Section 221 ldquoPersistentvolumesrdquo

Ensure that enough resources are available to run the Operators and the application containersFor more information see Section 222 ldquoResource allocationrdquo

To install ElasticSearch you must use a community catalog source If you do not want to use acommunity catalog or if you do not want to store events see Section 23 ldquoDeploying STF to theOCP environmentrdquo

STF uses ElasticSearch to store events which requires a larger than normal

Red Hat OpenStack Platform 160 Service Telemetry Framework

8

vmmax_map_count The vmmax_map_count value is set by default inRed Hat OpenShift Container Platform For more information about how to edit the value of vmmax_map_count see Section 223 ldquoNode tuning operatorrdquo

221 Persistent volumes

STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus andElasticSearch can store metrics and events

Additional resources

For more information about configuring persistent storage for OCP see Understanding persistentstorage

2211 Using ephemeral storage

WARNING

You can use ephemeral storage with STF However if you use ephemeral storageyou might experience data loss if a pod is restarted updated or rescheduled ontoanother node Use ephemeral storage only for development or testing and notproduction environments

Procedure

To enable ephemeral storage for STF set storageEphemeralEnabled true in your ServiceTelemetry manifest

Additional resources

For more information about enabling ephemeral storage for STF see Section 461 ldquoConfiguringephemeral storagerdquo

222 Resource allocation

To enable the scheduling of pods within the OCP infrastructure you need resources for thecomponents that are running If you do not allocate enough resources pods remain in a Pending statebecause they cannot be scheduled

The amount of resources that you require to run STF depends on your environment and the number ofnodes and clouds that you want to monitor

Additional resources

For recommendations about sizing for metrics collection seehttpsaccessredhatcomarticles4907241

For information about sizing requirements for ElasticSearch seehttpswwwelasticcoguideencloud-on-k8scurrentk8s-managing-compute-resourceshtml

223 Node tuning operator

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

9

STF uses ElasticSearch to store events which requires a larger than normal vmmax_map_count The vmmax_map_count value is set by default in Red Hat OpenShift Container Platform

If you want to edit the value of vmmax_map_count you cannot apply node tuning manually using the sysctl command because Red Hat OpenShift Container Platform manages nodes directly To configurevalues and apply them to the infrastructure you must use the node tuning operator For moreinformation see Using the Node Tuning Operator

In an OCP deployment the default node tuning operator specification provides the required profiles forElasticSearch workloads or pods scheduled on nodes To view the default cluster node tuningspecification run the following command

The output of the default specification is documented at Default profiles set on a cluster Theassignment of profiles is managed in the recommend section where profiles are applied to a node whencertain conditions are met When scheduling ElasticSearch to a node in STF one of the following profilesis applied

openshift-control-plane-es

openshift-node-es

When scheduling an ElasticSearch pod there must be a label present that matches tunedopenshiftioelasticsearch If the label is present one of the two profiles is assigned to the podNo action is required by the administrator if you use the recommended Operator for ElasticSearch Ifyou use a custom-deployed ElasticSearch with STF ensure that you add the tunedopenshiftioelasticsearch label to all scheduled pods

Additional resources

For more information about virtual memory usage by ElasticSearch seehttpswwwelasticcoguideenelasticsearchreferencecurrentvm-max-map-counthtml

For more information about how the profiles are applied to nodes see Custom tuning specification

23 DEPLOYING STF TO THE OCP ENVIRONMENT

You can deploy STF to the OCP environment in one of two ways

Deploy STF and store events with ElasticSearch For more information see Section 231ldquoDeploying STF to the OCP environment with ElasticSearchrdquo

Deploy STF without ElasticSearch and disable events support For more information seeSection 232 ldquoDeploying STF to the OCP environment without ElasticSearchrdquo

231 Deploying STF to the OCP environment with ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 235 ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

oc get Tuneddefault -o yaml -n openshift-cluster-node-tuning-operator

Red Hat OpenStack Platform 160 Service Telemetry Framework

10

4 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

5 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

6 Section 238 ldquoSubscribing to the Elastic Cloud on Kubernetes Operatorrdquo

7 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

8 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

232 Deploying STF to the OCP environment without ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

4 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

5 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

6 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

233 Creating a namespace

Create a namespace to hold the STF components The service-telemetry namespace is usedthroughout the documentation

Procedure

Enter the following command

234 Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods

Procedure

Enter the following command

oc new-project service-telemetry

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorGroupmetadata name service-telemetry-operator-group namespace service-telemetryspec targetNamespaces - service-telemetryEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

11

Additional resources

For more information see OperatorGroups

235 Enabling the OperatorHubio Community Catalog Source

Before you install ElasticSearch you must have access to the resources on the OperatorHubioCommunity Catalog Source

Procedure

Enter the following command

236 Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform you must enable the operatorsource

Procedure

1 Install an OperatorSource that contains the Service Telemetry Operator and the SmartGateway Operator

2 To validate the creation of your OperatorSource use the oc get operatorsources command A

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind CatalogSourcemetadata name operatorhubio-operators namespace openshift-marketplacespec sourceType grpc image quayiooperator-frameworkupstream-community-operatorslatest displayName OperatorHubio Operators publisher OperatorHubioEOF

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorSourcemetadata labels opsrc-provider redhat-operators-stf name redhat-operators-stf namespace openshift-marketplacespec authorizationToken displayName Red Hat STF Operators endpoint httpsquayiocnr publisher Red Hat registryNamespace redhat-operators-stf type appregistryEOF

Red Hat OpenStack Platform 160 Service Telemetry Framework

12

2 To validate the creation of your OperatorSource use the oc get operatorsources command Asuccessful import results in the MESSAGE field returning a result of The object has been successfully reconciled

3 To validate that the Operators are available from the catalog use the oc get packagemanifestcommand

237 Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STFcomponents because the AMQ Certificate Manager Operator runs globally-scoped and is notcompatible with the dependency management of Operator Lifecycle Manager when used with othernamespace-scoped operators

Procedure

1 Subscribe to the AMQ Certificate Manager Operator create the subscription and validate theAMQ7 Certificate Manager

NOTE

The AMQ Certificate Manager is installed globally for all namespaces so the namespace value provided is openshift-operators You might not see your amq7-cert-managerv100 ClusterServiceVersion in the service-telemetrynamespace for a few minutes until the processing executes against thenamespace

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME TYPE ENDPOINT REGISTRY DISPLAYNAME PUBLISHER STATUS MESSAGEredhat-operators-stf appregistry httpsquayiocnr redhat-operators-stf Red Hat STF Operators Red Hat Succeeded The object has been successfully reconciled

$ oc get packagemanifests | grep Red Hat STF

smartgateway-operator Red Hat STF Operators 2m50sservicetelemetry-operator Red Hat STF Operators 2m50s

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name amq7-cert-manager namespace openshift-operatorsspec channel alpha installPlanApproval Automatic name amq7-cert-manager source redhat-operators sourceNamespace openshift-marketplaceEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

13

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-managerv100 has a phase Succeeded

238 Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator and if you plan to store events in ElasticSearch youmust enable the Elastic Cloud Kubernetes Operator

Procedure

1 Apply the following manifest to your OCP environment to enable the Elastic Cloud onKubernetes Operator

2 To verify that the ClusterServiceVersion for ElasticSearch Cloud on Kubernetes succeededenter the oc get csv command

239 Subscribing to the Service Telemetry Operator

To instantiate an STF instance create the ServiceTelemetry object to allow the Service TelemetryOperator to create the environment

Procedure

1 To create the Service Telemetry Operator subscription enter the oc apply -f command

$ oc get --namespace openshift-operators csv

NAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name elastic-cloud-eck namespace service-telemetryspec channel stable installPlanApproval Automatic name elastic-cloud-eck source operatorhubio-operators sourceNamespace openshift-marketplaceEOF

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1

Red Hat OpenStack Platform 160 Service Telemetry Framework

14

2 To validate the Service Telemetry Operator and the dependent operators enter the followingcommand

2310 Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework you must create an instance of ServiceTelemetry in OCPBy default eventsEnabled is set to false If you do not want to store events in ElasticSearch ensurethat eventsEnabled is set to false For more information see Section 232 ldquoDeploying STF to the OCPenvironment without ElasticSearchrdquo

The following core parameters are available for a ServiceTelemetry manifest

Table 21 Core parameters for a ServiceTelemetry manifest

Parameter Description Default Value

eventsEnabled Enable events support in STFRequires prerequisite steps toensure ElasticSearch can bestarted For more information seeSection 238 ldquoSubscribing to theElastic Cloud on KubernetesOperatorrdquo

false

metricsEnabled Enable metrics support in STF true

kind Subscriptionmetadata name servicetelemetry-operator namespace service-telemetryspec channel stable installPlanApproval Automatic name servicetelemetry-operator source redhat-operators-stf sourceNamespace openshift-marketplaceEOF

$ oc get csv --namespace service-telemetryNAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeededamq7-interconnect-operatorv120 Red Hat Integration - AMQ Interconnect 120 Succeededelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeededprometheusoperator0370 Prometheus Operator 0370 prometheusoperator0320 Succeededservice-telemetry-operatorv102 Service Telemetry Operator 102 service-telemetry-operatorv101 Succeededsmart-gateway-operatorv101 Smart Gateway Operator 101 smart-gateway-operatorv100 Succeeded

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

15

highAvailabilityEnabled Enable high availability in STF Formore information see Section 43ldquoHigh availabilityrdquo

false

storageEphemeralEnabled Enable ephemeral storagesupport in STF For moreinformation see Section 46ldquoEphemeral storagerdquo

false

Parameter Description Default Value

Procedure

1 To store events in ElasticSearch set eventsEnabled to true during deployment

2 To view the STF deployment logs in the Service Telemetry Operator use the oc logscommand

PLAY RECAP localhost ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3 View the pods and the status of each pod to determine that all workloads are operatingnominally

NOTE

If you set eventsEnabled true the notification Smart Gateways will Error andCrashLoopBackOff for a period of time before ElasticSearch starts

oc apply -f - ltltEOFapiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec eventsEnabled true metricsEnabled trueEOF

oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

$ oc get pods

NAME READY STATUS RESTARTS AGEalertmanager-stf-default-0 22 Running 0 26melastic-operator-645dc8b8ff-jwnzt 11 Running 0 88melasticsearch-es-default-0 11 Running 0 26minterconnect-operator-6fd49d9fb9-4bl92 11 Running 0

Red Hat OpenStack Platform 160 Service Telemetry Framework

16

24 REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality

Complete the following tasks

1 Section 241 ldquoDeleting the namespacerdquo

2 Section 242 ldquoRemoving the OperatorSourcerdquo

241 Deleting the namespace

To remove the operational resources for STF from OCP delete the namespace

Procedure

1 Run the oc delete command

2 Verify that the resources have been deleted from the namespace

242 Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again delete the OperatorSourceWhen you remove the OperatorSource PackageManifests related to STF are removed from theOperator Lifecycle Manager catalog

Procedure

1 Delete the OperatorSource

46mprometheus-operator-bf7d97fb9-kwnlx 11 Running 0 46mprometheus-stf-default-0 33 Running 0 26mservice-telemetry-operator-54f4c99d9b-k7ll6 22 Running 0 46msmart-gateway-operator-7ff58bcf94-66rvx 22 Running 0 46mstf-default-ceilometer-notification-smartgateway-6675df547q4lbj 11 Running 0 26mstf-default-collectd-notification-smartgateway-698c87fbb7-xj528 11 Running 0 26mstf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn 11 Running 0 26mstf-default-interconnect-7458fd4d69-nqbfs 11 Running 0 26m

oc delete project service-telemetry

$ oc get allNo resources found

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

17

2 Verify that the STF PackageManifests are removed from the platform If successful thefollowing command returns no result

3 If you enabled the OperatorHubio Community Catalog Source during the installationprocess and you no longer need this catalog source delete it

Additional resources

For more information about the OperatorHubio Community Catalog Source see Section 23ldquoDeploying STF to the OCP environmentrdquo

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stfoperatorsourceoperatorscoreoscom redhat-operators-stf deleted

$ oc get packagemanifests | grep Red Hat STF

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operatorscatalogsourceoperatorscoreoscom operatorhubio-operators deleted

Red Hat OpenStack Platform 160 Service Telemetry Framework

18

CHAPTER 3 COMPLETING THESERVICE TELEMETRY FRAMEWORK CONFIGURATION

31 CONNECTING RED HAT OPENSTACK PLATFORM TOSERVICE TELEMETRY FRAMEWORK

To collect metrics events or both and to send them to the Service Telemetry Framework (STF)storage domain you must configure the Red Hat OpenStack Platform overcloud to enable datacollection and transport

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes thatemploy routed L3 domains such as distributed compute node (DCN) or spine-leaf see Section 32ldquoDeploying to non-standard network topologiesrdquo

32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network you must makeconfiguration adjustments so that AMQ Interconnect can transport data to theService Telemetry Framework (STF) server instance This scenario is typical in a spine-leaf or aDCN topology For more information about DCN configuration see the Spine Leaf Networkingguide

If you use STF with Red Hat OpenStack Platform 160 and plan to monitor your Ceph Block orObject storage nodes you must make configuration changes that are similar to the configurationchanges that you make to the spine-leaf and DCN network configuration To monitor Ceph nodesuse the CephStorageExtraConfig parameter to define which network interface to load into theAMQ Interconnect and collectd configuration files

Similarly you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters ifyour environment uses Block and Object storage roles

The deployment of a spine-leaf topology involves creating roles and networks then assigningthose networks to the available roles When you configure data collection and transport for STF foran Red Hat OpenStack Platform deployment the default network for roles is InternalApi For CephBlock and Object storage roles the default network is Storage Because a spine-leaf configurationcan result in different networks being assigned to different Leaf groupings and those names aretypically unique additional configuration is required in the parameter_defaults section of theRed Hat OpenStack Platform environment files

Procedure

1 Document which networks are available for each of the Leaf roles For examples of networkname definitions see Creating a network data file in the Spine Leaf Networking guide Formore information about the creation of the Leaf groupings (roles) and assignment of thenetworks to those groupings see Creating a roles data file in the Spine Leaf Networkingguide

CephStorageExtraConfig tripleoprofilebasemetricscollectdamqp_host hiera(storage) tripleoprofilebasemetricsqdrlistener_addr hiera(storage) tripleoprofilebaseceilometeragentnotificationnotifier_host_addr hiera(storage)

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

19

2 Add the following configuration example to the ExtraConfig section for each of the leafroles In this example internal_api_subnet is the value defined in the name_lowerparameter of your network definition (with _subnet appended to the name for Leaf 0) andis the network to which the ComputeLeaf0 leaf role is connected In this case the networkidentification of 0 corresponds to the Compute role for leaf 0 and represents a value thatis different from the default internal API network nameFor the ComputeLeaf0 leaf role specify extra configuration to perform a hiera lookup todetermine which network interface for a particular network to assign to the collectd AMQPhost parameter Perform the same configuration for the AMQ Interconnect listeneraddress parameter

Additional leaf roles typically replace _subnet with _leafN where N represents a uniqueindentifier for the leaf

This example configuration is on a CephStorage leaf role

33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUDFOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud you must configure the data collectionapplications and the data transport to STF and deploy the overcloud

To configure the Red Hat OpenStack Platform overcloud complete the following tasks

1 Section 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

2 Section 332 ldquoConfiguring the STF connection for the overcloudrdquo

3 Section 333 ldquoValidating client-side installationrdquo

331 Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF you must provide theAMQ Interconnect route address in the STF connection file

Procedure

1 Log in to your Red Hat OpenShift Container Platform (OCP) environment

2 In the service-telemetry project retrieve the AMQ Interconnect route address

ComputeLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_subnet)

ComputeLeaf1ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_leaf1)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_leaf1)

CephStorageLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(storage_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(storage_subnet)

Red Hat OpenStack Platform 160 Service Telemetry Framework

20

NOTE

If your STF installation differs from the documentation ensure that youretrieve the correct AMQ Interconnect route address

332 Configuring the STF connection for the overcloud

To configure the STF connection you must create a file that contains the connection configurationof the AMQ Interconnect for the overcloud to the STF deployment Enable the collection of eventsand storage of the events in STF and deploy the overcloud

Procedure

1 Log in to the Red Hat OpenStack Platform undercloud as the stack user

2 Create a configuration file called stf-connectorsyaml in the homestack directory

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all dataingestion and data storage components for single cloud deployments Toshare the data storage domain with multiple clouds see Section 45ldquoConfiguring multiple cloudsrdquo

3 In the stf-connectorsyaml file configure the MetricsQdrConnectors address to connectthe AMQ Interconnect on the overcloud to the STF deployment

Add the CeilometerQdrPublishEvents true parameter to enable collection andtransport of Ceilometer events to STF

Replace the host parameter with the value of HOSTPORT that you retrieved inSection 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

4 Add the following files to your Red Hat OpenStack Platform director deployment to setupcollectd and AMQ Interconnect

the stf-connectorsyaml environment file

the enable-stfyaml file that ensures that the environment is being used during theovercloud deployment

$ oc get routes -ogo-template= range items printf sn spechost end | grep -5671stf-default-interconnect-5671-service-telemetryappsinfrawatch

parameter_defaults EventPipelinePublishers [] CeilometerQdrPublishEvents true MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch port 443 role edge sslProfile sslProfile verifyHostname false

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

21

the ceilometer-write-qdryaml file that ensures that Ceilometer telemetry is sent to STF

5 Deploy the Red Hat OpenStack Platform overcloud

333 Validating client-side installation

To validate data collection from the STF storage domain query the data sources for delivereddata To validate individual nodes in the Red Hat OpenStack Platform deployment connect to theconsole using SSH

Procedure

1 Log in to an overcloud node for example controller-0

2 Ensure that metrics_qdr container is running on the node

3 Return the internal network address on which AMQ Interconnect is running for example 17217144 listening on port 5666

4 Return a list of connections to the local AMQ Interconnect

openstack overcloud deploy ltother argumentsgt --templates usrshareopenstack-tripleo-heat-templates --environment-file ltother-environment-filesgt --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsmetricsceilometer-write-qdryaml --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsenable-stfyaml --environment-file homestackstf-connectorsyaml

$ sudo podman container inspect --format StateStatus metrics_qdr

running

$ sudo podman exec -it metrics_qdr cat etcqpid-dispatchqdrouterdconf

listener host 17217144 port 5666 authenticatePeer no saslMechanisms ANONYMOUS

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --connections

Connections id host container role dir security authentication tenant ============================================================================================================================================================================================================================================================================================ 1 stf-default-interconnect-5671-service-telemetryappsinfrawatch443 stf-default-

Red Hat OpenStack Platform 160 Service Telemetry Framework

22

There are four connections

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat clientThe outbound STF connection is provided to the MetricsQdrConnectors hostparameter and is the route for the STF storage domain The other hosts are internalnetwork addresses of the client connections to this AMQ Interconnect

5 To ensure that messages are being delivered list the links and view the _edge address inthe deliv column for delivery of messages

interconnect-7458fd4d69-bgzfb edge out TLSv12(DHE-RSA-AES256-GCM-SHA384) anonymous-user 12 1721714460290 openstackorgomcontainercontroller-0ceilometer-agent-notification255c02cee550f143ec9ea030db5cccba14 normal in no-security no-auth 16 1721714436408 metrics normal in no-security anonymous-user 899 1721714439500 10a2e99d-1b8a-4329-b48c-4335e5f75c84 normal in no-security no-auth

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --linksRouter Links type dir conn id id peer class addr phs cap pri undel unsett deliv presett psdrop acc rej rel mod delay rate =========================================================================================================================================================== endpoint out 1 5 local _edge 250 0 0 0 2979926 2979924 0 0 0 2 0 0 0 endpoint in 1 6 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 7 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 8 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 9 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 10 250 0 0 0 911 911 0 0 0 0 0 911 0 endpoint in 1 11 250 0 0 0 0 911 0 0 0 0 0 0 0 endpoint out 12 32 local templSY6Mcicol4J2Kp 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 16 41 250 0 0 0 2979924 2979924 0 0 0 0 0 0 0 endpoint in 912 1834 mobile $management 0 250 0 0 0 1

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

23

6 To list the addresses from Red Hat OpenStack Platform nodes to STF connect to OCP toget the AMQ Interconnect pod name and list the connections List the availableAMQ Interconnect pods

7 Connect to the pod and run the qdstat --connections command to list the knownconnections

In this example there are three edge connections from the Red Hat OpenStack Platformnodes with connection id 22 23 and 24

8 To view the number of messages delivered by the network use each address with the oc exec command

0 0 1 0 0 0 0 0 endpoint out 912 1835 local temp9Ok2resI9tmt+CT 250 0 0 0 0 0 0 0 0 0 0 0 0

$ oc get pods -l application=stf-default-interconnect

NAME READY STATUS RESTARTS AGEstf-default-interconnect-7458fd4d69-bgzfb 11 Running 0 6d21h

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 182547243852 UTCstf-default-interconnect-7458fd4d69-bgzfb

Connections id host container role dir security authentication tenant last dlv uptime ====================================================================================================================================================================================================== 1 1012922143062 rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv] normal in no-security anonymous-user 000000000 006215025 2 1013005255754 rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5] normal in no-security anonymous-user 000212557 006214936 3 1013005143110 rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6] normal in no-security anonymous-user 000213653 006214909 22 101280151948 Routerceph-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 23 101280151950 Routercompute-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 24 101280152082 Routercontroller-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000000 000220834 27 12700142202 c2f541c1-4c97-4b37-a189-a396c08fb079 normal in no-security no-auth 000000000 000000000

Red Hat OpenStack Platform 160 Service Telemetry Framework

24

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 182010293258 UTCstf-default-interconnect-7458fd4d69-bgzfb

Router Addresses class addr phs distrib pri local remote in out thru fallback ==================================================================================================================== mobile anycastceilometereventsample 0 balanced - 1 0 1553 1553 0 0 mobile collectdnotify 0 multicast - 1 0 10 10 0 0 mobile collectdtelemetry 0 multicast - 1 0 7798049 7798049 0 0

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

25

CHAPTER 4 ADVANCED FEATURESThe following optional features can provide additional functionality to theService Telemetry Framework (STF)

Customizing the deployment For more information see Section 41 ldquoCustomizing thedeploymentrdquo

Alerts For more information see Section 42 ldquoAlertsrdquo

High availability For more information see Section 43 ldquoHigh availabilityrdquo

Dashboards For more information see Section 44 ldquoDashboardsrdquo

Multiple clouds For more information see Section 45 ldquoConfiguring multiple cloudsrdquo

Ephemeral storage For more information see Section 46 ldquoEphemeral storagerdquo

41 CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load intoRed Hat OpenShift Container Platform (OCP) The Operator then creates other objects in memorywhich results in the dependent Operators creating the workloads they are responsible formanaging

WARNING

When you override the manifest you must provide the entire manifest contentsincluding object names or namespaces There is no dynamic parametersubstitution when you override a manifest

To override a manifest successfully with Service Telemetry Framework (STF) deploy a defaultenvironment using the core options only For more information about the core options seeSection 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo When you deploy STF use the oc getcommand to retrieve the default deployed manifest When you use a manifest that was originallygenerated by Service Telemetry Operator the manifest is compatible with the other objects thatare managed by the Operators

For example when the metricsEnabled true parameter is configured in the ServiceTelemetrymanifest the Service Telemetry Operator requests components for metrics retrieval and storageusing the default manifests In some cases you might want to override the default manifest Formore information see Section 411 ldquoManifest override parametersrdquo

411 Manifest override parameters

This table describes the available parameters that you can use to override a manifest along withthe corresponding retrieval commands

Table 41 Manifest override parameters

Red Hat OpenStack Platform 160 Service Telemetry Framework

26

Override parameter Description Retrieval command

alertmanagerManifest Override the Alertmanagerobject creation The PrometheusOperator watches for Alertmanager objects

oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest Override the Secret thatcontains the Alertmanagerconfiguration The PrometheusOperator uses a secret named alertmanager- alertmanager-name forexample stf-default to providethe alertmanageryamlconfiguration to Alertmanager

oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest Override the ElasticSearchobject creation The Elastic Cloudon Kuberneters Operator watchesfor ElasticSearch objects

oc get elasticsearch elasticsearch -oyaml

interconnectManifest Override the Interconnectobject creation The AMQInterconnect Operator watchesfor Interconnect objects

oc get interconnect stf-default-interconnect -oyaml

prometheusManifest Override the Prometheusobject creation The PrometheusOperator watches for Prometheus objects

oc get prometheus stf-default -oyaml

servicemonitorManifest Override the ServiceMonitorobject creation The PrometheusOperator watches for ServiceMonitor objects

oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest

Override the SmartGatewayobject creation for collectdmetrics The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest

Override the SmartGatewayobject creation for collectdevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-notification -oyaml

CHAPTER 4 ADVANCED FEATURES

27

smartgatewayCeilometerEventsManifest

Override the SmartGatewayobject creation for Ceilometerevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-ceilometer-notification -oyaml

Override parameter Description Retrieval command

412 Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest For a list of availablemanifest override parameters see Section 41 ldquoCustomizing the deploymentrdquo The default ServiceTelemetry object is stf-default Use oc get servicetelemetry to list the available STFdeployments

TIP

The oc edit command loads the default system editor To override the default editor pass or setthe environment variable EDITOR to the preferred editor For example EDITOR=nano oc edit servicetelemetry stf-default

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Load the ServiceTelemetry object into an editor

4 To modify the ServiceTelemetry object provide a manifest override parameter and thecontents of the manifest to write to OCP instead of the defaults provided by STF

NOTE

The trailing pipe (|) after entering the manifest override parameter indicatesthat the value provided is multi-line

oc project service-telemetry

oc edit servicetelemetry stf-default

$ oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata annotations kubectlkubernetesiolast-applied-configuration | apiVersioninfrawatchv1alpha1kindServiceTelemetrymetadataannotationsnamestf-defaultnamespaceservice-telemetryspecmetricsEnabledtrue creationTimestamp 2020-04-14T202942Z

Red Hat OpenStack Platform 160 Service Telemetry Framework

28

1

2

Manifest override parameter is defined in the spec of the ServiceTelemetry object

End of the manifest override content

5 Save and close

42 ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager Alert rules in Prometheusservers send alerts to an Alertmanager which manages the alerts Alertmanager can silenceinhibit or aggregate alerts and send notifications using email on-call notification systems or chatplatforms

To create an alert complete the following tasks

1 Create an alert rule in Prometheus For more information see Section 421 ldquoCreating analert rule in Prometheusrdquo

generation 1 name stf-default namespace service-telemetry resourceVersion 1949423 selfLink apisinfrawatchv1alpha1namespacesservice-telemetryservicetelemetrysstf-default uid d058bc41-1bb0-49f5-9a8b-642f4b8adb95spec metricsEnabled true smartgatewayCollectdMetricsManifest | 1 apiVersion smartgatewayinfrawatchv2alpha1 kind SmartGateway metadata name stf-default-collectd-telemetry namespace service-telemetry spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdtelemetry debug true prefetch 15000 serviceType metrics size 1 useTimestamp true 2status conditions - ansibleResult changed 0 completion 2020-04-14T203219079508 failures 0 ok 52 skipped 1 lastTransitionTime 2020-04-14T202959Z message Awaiting next reconciliation reason Successful status True type Running

CHAPTER 4 ADVANCED FEATURES

29

2 Create an alert route in Alertmanager For more information see Section 423 ldquoCreating analert route in Alertmanagerrdquo

Additional resourcesFor more information about alerts or notifications with Prometheus and Alertmanager seehttpsprometheusiodocsalertingoverview

To view an example set of alerts that you can use with Service Telemetry Framework (STF) seehttpsgithubcominfrawatchservice-telemetry-operatortreemasterdeployalerts

421 Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications If the rule condition returns an emptyresult set the condition is false Otherwise the rule is true and it triggers an alert

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Create a PrometheusRule object that contains the alert rule The Prometheus Operatorloads the rule into Prometheus

4 To verify that the rules have been loaded into Prometheus by the Operator create a podwith access to curl

5 Run curl to access the prometheus-operated service to return the rules loaded intomemory

oc project service-telemetry

oc apply -f - ltltEOFapiVersion monitoringcoreoscomv1kind PrometheusRulemetadata creationTimestamp null labels prometheus stf-default role alert-rules name prometheus-alarm-rules namespace service-telemetryspec groups - name openstackrules rules - alert Metric Listener down expr collectd_qpid_router_status lt 1 To change the rule edit the value of the expr parameterEOF

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl prometheus-operated9090apiv1rulesstatussuccessdatagroups

Red Hat OpenStack Platform 160 Service Telemetry Framework

30

6 To verify that the output shows the rules loaded into the PrometheusRule object forexample the output contains the defined openstackrules exit from the pod

7 Clean up the environment by deleting the curl pod

Additional resourcesFor more information on alerting see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

422 Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 421ldquoCreating an alert rule in Prometheusrdquo

Procedure

1 Use the oc edit command

2 Edit the PrometheusRules manifest

3 Save and close

Additional resourcesFor more information about configuring alerting rules seehttpsprometheusiodocsprometheuslatestconfigurationalerting_rules

For more information about PrometheusRules objects seehttpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

423 Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system such as email IRC or other notificationchannel The Prometheus Operator manages the Alertmanager configuration as anRed Hat OpenShift Container Platform (OCP) secret STF by default deploys a basic configurationthat results in no receivers

[nameopenstackrulesfileetcprometheusrulesprometheus-stf-default-rulefiles-0service-telemetry-prometheus-alarm-rulesyamlrules[nameMetric Listener downquerycollectd_qpid_router_status u003c 1duration0labelsannotationsalerts[]healthoktypealerting]interval30]

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

oc edit prometheusrules prometheus-alarm-rules

alertmanageryaml |- global resolve_timeout 5m route

CHAPTER 4 ADVANCED FEATURES

31

To deploy a custom Alertmanager route with STF an alertmanagerConfigManifest parameter mustbe passed to the Service Telemetry Operator that results in an updated secret managed by thePrometheus Operator For more information see Section 412 ldquoOverriding a managed manifestrdquo

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object for your STF deployment

4 Add a new parameter alertmanagerConfigManifest and the Secret object contents todefine the alertmanageryaml configuration for Alertmanager

NOTE

This loads the default template that is already managed by ServiceTelemetry Operator To validate the changes are populating correctlychange a value return the alertmanager-stf-default secret and verify thatthe new value is loaded into memory for example changing the value globalresolve_timeout from 5m to 10m

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

oc project service-telemetry

oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec metricsEnabled true alertmanagerConfigManifest | apiVersion v1 kind Secret metadata name alertmanager-stf-default namespace service-telemetry type Opaque stringData alertmanageryaml |- global resolve_timeout 10m route

Red Hat OpenStack Platform 160 Service Telemetry Framework

32

5 Verify that the configuration was applied to the secret

6 To verify the configuration has been loaded into Alertmanager create a pod with access to curl

7 Run curl against the alertmanager-operated service to retrieve the status and configYAMLcontents and review the supplied configuration matches the configuration loaded intoAlertmanager

8 Verify that the configYAML field contains the expected changes Exit from the pod

9 To clean up the environment delete the curl pod

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

$ oc get secret alertmanager-stf-default -o go-template=index data alertmanageryaml | base64decode

global resolve_timeout 10mroute group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver nullreceivers- name null

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl alertmanager-operated9093apiv1status

statussuccessdataconfigYAMLglobaln resolve_timeout 10mn http_config n smtp_hello localhostn smtp_require_tls truen pagerduty_url httpseventspagerdutycomv2enqueuen hipchat_api_url httpsapihipchatcomn opsgenie_api_url httpsapiopsgeniecomn wechat_api_url httpsqyapiweixinqqcomcgi-binn victorops_api_url httpsalertvictoropscomintegrationsgeneric20131114alertnrouten receiver nulln group_byn - jobn group_wait 30sn group_interval 5mn repeat_interval 12hnreceiversn- name nullntemplates []n

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

CHAPTER 4 ADVANCED FEATURES

33

Additional resourcesFor more information about the Red Hat OpenShift Container Platform secret and the Prometheusoperator see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

43 HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover fromfailures in its component services Although Red Hat OpenShift Container Platform (OCP) restartsa failed pod if nodes are available to schedule the workload this recovery process might take morethan one minute during which time events and metrics are lost A high availability configurationincludes multiple copies of STF components reducing recovery time to approximately 2 secondsTo protect against failure of an OCP node deploy STF to an OCP cluster with three or more nodes

NOTE

STF is not yet a fully fault tolerant system Delivery of metrics and events during therecovery period is not guaranteed

Enabling high availability has the following effects

Two AMQ Interconnect pods run instead of the default 1

Three ElasticSearch pods run instead of the default 1

Recovery time from a lost pod in either of these services reduces to approximately 2seconds

431 Configuring high availability

To configure STF for high availability add highAvailabilityEnabled true to the ServiceTelemetryobject in OCP You can this set this parameter at installation time or if you already deployed STFcomplete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Use the oc command to edit the ServiceTelemetry object

4 Add highAvailabilityEnabled true to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry

spec eventsEnabled true metricsEnabled true highAvailabilityEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

34

44 DASHBOARDS

Use third-party application Grafana to visualize system-level metrics gathered by collectd for eachindividual host node For more information about configuring collectd see Section 33ldquoConfiguring Red Hat OpenStack Platform overcloud for Service Telemetry Frameworkrdquo

441 Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment so you mustdeploy the Grafana Operator from OperatorHubio

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Clone the dashboard repository

4 Deploy the Grafana operator

5 To verify that the operator launched successfully run the oc get csv command If the valueof the PHASE column is Succeeded the operator launched successfully

6 Launch a Grafana instance

7 Verify that the Grafana instance deployed

8 Create the datasource and dashboard resources

oc project service-telemetry

git clone httpsgithubcominfrawatchdashboardscd dashboards

oc create -f deploysubscriptionyaml

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEgrafana-operatorv320 Grafana Operator 320 Succeeded

$ oc create -f deploygrafanayaml

$ oc get pod -l app=grafana

NAME READY STATUS RESTARTS AGEgrafana-deployment-7fc7848b56-sbkhv 11 Running 0 1m

CHAPTER 4 ADVANCED FEATURES

35

9 Verify that the resources installed correctly

10 Navigate to httpsltgrafana-route-addressgt in a web browser Use the oc get routescommand to retrieve the Grafana route address

11 To view the dashboard click Dashboards and Manage

Additional resources

For more information about enabling the OperatorHubio catalog source see Section 235ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

4411 Viewing and editing queries

Procedure

1 Log in to Red Hat OpenShift Container Platform To view and edit queries log in as the admin user

2 Change to the service-telemetry namespace

3 To retrieve the default username and password describe the Grafana object using oc describe

442 The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time Select a node from theupper left corner of the dashboard

4421 Top panels

Title Unit Description

oc create -f deploydatasourceyaml -f deployrhos-dashboardyaml

$ oc get grafanadashboards

NAME AGErhos-dashboard 7d21h

$ oc get grafanadatasourcesNAME AGEservice-telemetry-grafanadatasource 1m

oc get routes

oc project service-telemetry

oc describe grafana service-telemetry-grafana

Red Hat OpenStack Platform 160 Service Telemetry Framework

36

Current Global Alerts - Current alerts fired byPrometheus

Recent Global Alerts - Recently fired alerts in 5m timesteps

Status Panel - Node status up downunavailable

Uptime smhdMY Total operational time of node

CPU Cores cores Total number of cores

Memory bytes Total memory

Disk Size bytes Total storage size

Processes processes Total number of processes listedby type

Load Average processes Load average represents theaverage number of running anduninterruptible processes residingin the kernel execution queue

4422 Networking panels

Panels that display the network interfaces of the node

Panel Unit Description

Physical Interfaces Ingress Errors errors Total errors with incoming data

Physical Interfaces Egress Errors errors Total errors with outgoing data

Physical Interfaces Ingress ErrorRates

errorss Rate of incoming data errors

Physical Interfaces egress ErrorRates

errorss Rate of outgoing data errors

Physical Interfaces PacketsIngress pps Incoming packets persecond

Physical Interfaces PacketsEgress

pps

Outgoing packets per second Physical Interfaces Data Ingress bytess

CHAPTER 4 ADVANCED FEATURES

37

Incoming data rates Physical Interfaces Data Egress bytess

Outgoing data rates Physical Interfaces Drop RateIngress

pps

Incoming packets drop rate Physical Interfaces Drop RateEgress

pps

4423 CPU panels

Panels that display CPU usage of the node

Panel Unit Description

Current CPU Usage percent Instantaneous usage at the timeof the last query

Aggregate CPU Usage percent Average non-idle CPU activity ofall cores on a node

Aggr CPU Usage by Type percent Shows time spent for each type ofthread averaged across all cores

4424 Memory panels

Panels that display memory usage on the node

Panel Unit Description

Memory Used percent Amount of memory being used attime of last query

Huge Pages Used hugepages Number of hugepages beingused

Memory

4425 Diskfile system

Panels that display space used on disk

Panel Unit Description Notes

Disk Space Usage percent Total disk use at time oflast query

Red Hat OpenStack Platform 160 Service Telemetry Framework

38

Inode Usage percent Total inode use at timeof last query

Aggregate Disk SpaceUsage

bytes Total disk space usedand reserved

Because this query relieson the df plugintemporary file systemsthat do not necessarilyuse disk space areincluded in the resultsThe query tries to filterout most of these but itmight not be exhaustive

Disk Traffic bytess Shows rates for bothreading and writing

Disk Load percent Approximatepercentage of total diskbandwidth being usedThe weighted IO timeseries includes thebacklog that might beaccumulating For moreinformation see thecollectd disk plugin docs

Operationss opss Operations done persecond

Average IO OperationTime

seconds Average time each IOoperation took tocomplete This averageis not accurate see thecollectd disk plugin docs

Panel Unit Description Notes

45 CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance ofService Telemetry Framework (STF)

1 Plan the AMQP address prefixes that you want to use for each cloud For more informationsee Section 451 ldquoPlanning AMQP address prefixesrdquo

2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on thecorresponding address prefixes For more information see Section 452 ldquoDeploying SmartGatewaysrdquo

3 Configure each cloud to send its metrics and events to STF on the correct address Formore information see Section 453 ldquoCreating the OpenStack environment filerdquo

CHAPTER 4 ADVANCED FEATURES

39

Figure 41 Two Red Hat OpenStack Platform clouds connect to STF

451 Planning AMQP address prefixes

By default Red Hat OpenStack Platform nodes get data through two data collectors collectd andCeilometer These components send telemetry data or notifications to the respective AMQPaddresses for example collectdtelemetry where STF Smart Gateways listen on those addressesfor monitoring data

To support multiple clouds and to identify which cloud generated the monitoring data configureeach cloud to send data to a unique address Prefix a cloud identifier to the second part of theaddress The following list shows some example addresses and identifiers

collectdcloud1-telemetry

collectdcloud1-notify

anycastceilometercloud1-eventsample

collectdcloud2-telemetry

collectdcloud2-notify

Red Hat OpenStack Platform 160 Service Telemetry Framework

40

anycastceilometercloud2-eventsample

collectdus-east-1-telemetry

collectdus-west-3-telemetry

452 Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud one forcollectd metrics one for collectd events and one for Ceilometer events Configure each of theSmart Gateways to listen on the AMQP address that you define for the corresponding cloud

When you deploy STF for the first time Smart Gateway manifests are created that define the initialSmart Gateways for a single cloud When deploying Smart Gateways for multiple cloud supportyou deploy multiple Smart Gateways for each of the data collection types that handle the metricsand the events data for each cloud The initial Smart Gateways act as a template to createadditional Smart Gateways along with any authentication information required to connect to thedata stores

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

oc project service-telemetry

3 Use the initially deployed Smart Gateways as a template for additional Smart GatewaysList the currently deployed Smart Gateways with the oc get smartgateways command Forexample if you deployed STF with metricsEnabled true and eventsEnabled true thefollowing Smart Gateways are displayed in the output

$ oc get smartgateways

NAME AGEstf-default-ceilometer-notification 14dstf-default-collectd-notification 14dstf-default-collectd-telemetry 14d

4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary filewhich you can modify later and use to create the new set of Smart Gateways

truncate --size 0 tmpcloud1-smartgatewaysyaml ampamp for sg in $(oc get smartgateways -oname)do echo --- gtgt tmpcloud1-smartgatewaysyaml oc get $sg -oyaml --export gtgt tmpcloud1-smartgatewaysyamldone

5 Modify the Smart Gateway manifest in the tmpcloud1-smartgatewaysyaml file Adjustthe metadataname and specamqpUrl fields to include the cloud identifier from yourschema For more information see ] To view example Smart Gateway manifests seeltltexample-manifests_advanced-features[

6 Deploy your new Smart Gateways

CHAPTER 4 ADVANCED FEATURES

41

oc apply -f tmpcloud1-smartgatewaysyaml

7 Verify that each Smart Gateway is running This can take several minutes depending on thenumber of Smart Gateways

oc get po -l app=smart-gateway

4521 Example manifests

IMPORTANT

The content in the following examples might be different to the file content in yourdeployment Copy the manifests in your deployment

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that youwant to use for your clouds For more information see Section 451 ldquoPlanning AMQP addressprefixesrdquo

NOTE

Your output may have some additional metadata parameters that you can removefrom the manifests you that load into OCP

apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-ceilometer-notification-cloud1 1spec amqpDataSource ceilometer amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672anycastceilometercloud1-eventsample 2 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-notification-cloud1 3spec amqpDataSource collectd amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-

Red Hat OpenStack Platform 160 Service Telemetry Framework

42

1

2

3

4

5

6

Name for Ceilometer notifications for cloud1

AMQP Address for Ceilometer notifications for cloud1

Name for collectd telemetry for cloud1

AMQP Address for collectd telemetry for cloud1

Name for collectd notifications for cloud1

AMQP Address for collectd notifications for cloud1

453 Creating the OpenStack environment file

To label traffic according to the cloud of origin you must create a configuration with cloud-specificinstance names Create an stf-connectorsyaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefixscheme For more information see Section 451 ldquoPlanning AMQP address prefixesrdquo

notify 4 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-telemetry-cloud1 5spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-telemetry 6 debug false prefetch 15000 serviceType metrics size 1 useTimestamp true

CHAPTER 4 ADVANCED FEATURES

43

WARNING

Remove enable-stfyaml and ceilometer-write-qdryaml environment filereferences from your overcloud deployment This configuration is redundantand results in duplicate information being sent from each cloud node

Procedure

1 Create the stf-connectorsyaml file and modify it to match the AMQP address that youwant for this cloud deployment

resource_registry OSTripleOServicesCollectd usrshareopenstack-tripleo-heat-templatesdeploymentmetricscollectd-container-puppetyaml OSTripleOServicesMetricsQdr usrshareopenstack-tripleo-heat-templatesdeploymentmetricsqdr-container-puppetyaml OSTripleOServicesCeilometerAgentCentral usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-central-container-puppetyaml OSTripleOServicesCeilometerAgentNotification usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-notification-container-puppetyaml OSTripleOServicesCeilometerAgentIpmi usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-ipmi-container-puppetyaml OSTripleOServicesComputeCeilometerAgent usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-compute-container-puppetyaml OSTripleOServicesRedis usrshareopenstack-tripleo-heat-templatesdeploymentdatabaseredis-pacemaker-puppetyaml

parameter_defaults EnableSTF true

EventPipelinePublishers [] CeilometerEnablePanko false CeilometerQdrPublishEvents true CeilometerQdrEventsConfig driver amqp topic cloud1-event 1

CollectdConnectionType amqp1 CollectdAmqpInterval 5 CollectdDefaultPollingInterval 5

CollectdAmqpInstances cloud1-notify 2 notify true format JSON presettle false cloud1-telemetry 3 format JSON presettle true

MetricsQdrAddresses

Red Hat OpenStack Platform 160 Service Telemetry Framework

44

+ lt1gt Define the topic for Ceilometer events This value is the address format of anycastceilometercloud1-eventsample lt2gt Define the topic for collectd events This value is theformat of collectdcloud1-notify lt3gt Define the topic for collectd metrics This value is the formatof collectdcloud1-telemetry lt4gt Adjust the MetricsQdrConnectors host to the address of the STFroute

1 Ensure that the naming convention in the stf-connectorsyaml file aligns with the specamqpUrl field in the Smart Gateway configuration For example configure the CeilometerQdrEventsConfigtopic field to a value of cloud1-event

2 Save the file in a directory for custom environment files for example homestackcustom_templates

3 Source the authentication file

4 Include the stf-connectorsyaml file in the overcloud deployment command along with anyother environment files relevant to your environment

Additional resourcesFor information about validating the deployment see Section 333 ldquoValidating client-sideinstallationrdquo

454 Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it wasscraped from You can use this label to query data from a specific cloud

To query data from a specific cloud use a Prometheus promql query that matches the associatedservice label for example collectd_uptimeservice=stf-default-collectd-telemetry-cloud1-smartgateway

- prefix collectd distribution multicast - prefix anycastceilometer distribution multicast

MetricsQdrSSLProfiles - name sslProfile

MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch 4 port 443 role edge verifyHostname false sslProfile sslProfile

[stackundercloud-0 ~]$ source stackrc

(undercloud) [stackundercloud-0 ~]$

(undercloud) [stackundercloud-0 ~]$ openstack overcloud deploy --templates usrshareopenstack-tripleo-heat-templates -e homestackcustom_templatesstf-connectorsyaml

CHAPTER 4 ADVANCED FEATURES

45

46 EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storingdata in your Red Hat OpenShift Container Platform (OCP) cluster Ephemeral storage is notrecommended in a production environment due to the volatility of the data in the platform whenoperating correctly and as designed For example restarting a pod or rescheduling the workload toanother node results in the loss of any local data written since the pod started

If you enable ephemeral storage in STF the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests

461 Configuring ephemeral storage

To configure STF for ephemeral storage add storageEphemeralEnabled true to theServiceTelemetry object in OCP You can add storageEphemeralEnabled true at installation timeor if you already deployed STF complete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object

4 Add the storageEphemeralEnabled true parameter to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry stf-default

spec eventsEnabled true metricsEnabled true storageEphemeralEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

46

APPENDIX A COLLECTD PLUG-INSThis section contains a complete list of collectd plug-ins and configurations forRed Hat OpenStack Platform 160

collectd-aggregation

collectdpluginaggregationaggregators

collectdpluginaggregationinterval

collectd-amqp1

collectd-apache

collectdpluginapacheinstances (ex localhost rArr url rArr httplocalhostmod_statusauto)

collectdpluginapacheinterval

collectd-apcups

collectd-battery

collectdpluginbatteryvalues_percentage

collectdpluginbatteryreport_degraded

collectdpluginbatteryquery_state_fs

collectdpluginbatteryinterval

collectd-ceph

collectdplugincephdaemons

collectdplugincephlongrunavglatency

collectdplugincephconvertspecialmetrictypes

collectd-cgroups

collectdplugincgroupsignore_selected

collectdplugincgroupsinterval

collectd-conntrack

None

collectd-contextswitch

collectdplugincontextswitchinterval

collectd-cpu

APPENDIX A COLLECTD PLUG-INS

47

collectdplugincpureportbystate

collectdplugincpureportbycpu

collectdplugincpuvaluespercentage

collectdplugincpureportnumcpu

collectdplugincpureportgueststate

collectdplugincpusubtractgueststate

collectdplugincpuinterval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectdplugincsvdatadir

collectdplugincsvstorerates

collectdplugincsvinterval

collectd-df

collectdplugindfdevices

collectdplugindffstypes

collectdplugindfignoreselected

collectdplugindfmountpoints

collectdplugindfreportbydevice

collectdplugindfreportinodes

collectdplugindfreportreserved

collectdplugindfvaluesabsolute

collectdplugindfvaluespercentage

collectdplugindfinterval

collectd-disk

collectdplugindiskdisks

collectdplugindiskignoreselected

collectdplugindiskudevnameattr

Red Hat OpenStack Platform 160 Service Telemetry Framework

48

collectdplugindiskinterval

collectd-entropy

collectdpluginentropyinterval

collectd-ethstat

collectdpluginethstatinterfaces

collectdpluginethstatmaps

collectdpluginethstatmappedonly

collectdpluginethstatinterval

collectd-exec

collectdpluginexeccommands

collectdpluginexeccommands_defaults

collectdpluginexecglobals

collectdpluginexecinterval

collectd-fhcount

collectdpluginfhcountvaluesabsolute

collectdpluginfhcountvaluespercentage

collectdpluginfhcountinterval

collectd-filecount

collectdpluginfilecountdirectories

collectdpluginfilecountinterval

collectd-fscache

None

collectd-hddtemp

collectdpluginhddtemphost

collectdpluginhddtempport

collectdpluginhddtempinterval

collectd-hugepages

collectdpluginhugepagesreport_per_node_hp

APPENDIX A COLLECTD PLUG-INS

49

collectdpluginhugepagesreport_root_hp

collectdpluginhugepagesvalues_pages

collectdpluginhugepagesvalues_bytes

collectdpluginhugepagesvalues_percentage

collectdpluginhugepagesinterval

collectd-intel_rdt

collectd-interface

collectdplugininterfaceinterfaces

collectdplugininterfaceignoreselected

collectdplugininterfacereportinactive

Collectdplugininterfaceinterval

collectd-ipc

None

collectd-ipmi

collectdpluginipmiignore_selected

collectdpluginipminotify_sensor_add

collectdpluginipminotify_sensor_remove

collectdpluginipminotify_sensor_not_present

collectdpluginipmisensors

collectdpluginipmiinterval

collectd-irq

collectdpluginirqirqs

collectdpluginirqignoreselected

collectdpluginirqinterval

collectd-load

collectdpluginloadreport_relative

collectdpluginloadinterval

collectd-logfile

collectdpluginlogfilelog_level

Red Hat OpenStack Platform 160 Service Telemetry Framework

50

collectdpluginlogfilelog_level

collectdpluginlogfilelog_file

collectdpluginlogfilelog_timestamp

collectdpluginlogfileprint_severity

collectdpluginlogfileinterval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectdpluginmemcachedinstances

collectdpluginmemcachedinterval

collectd-memory

collectdpluginmemoryvaluesabsolute

collectdpluginmemoryvaluespercentage

collectdpluginmemoryinterval collectd-multimeter

collectd-multimeter

collectd-mysql

collectdpluginmysqlinterval

collectd-netlink

collectdpluginnetlinkinterfaces

collectdpluginnetlinkverboseinterfaces

collectdpluginnetlinkqdiscs

collectdpluginnetlinkclasses

collectdpluginnetlinkfilters

collectdpluginnetlinkignoreselected

collectdpluginnetlinkinterval

collectd-network

collectdpluginnetworktimetolive

collectdpluginnetworkmaxpacketsize

APPENDIX A COLLECTD PLUG-INS

51

collectdpluginnetworkforward

collectdpluginnetworkreportstats

collectdpluginnetworklisteners

collectdpluginnetworkservers

collectdpluginnetworkinterval

collectd-nfs

collectdpluginnfsinterval

collectd-ntpd

collectdpluginntpdhost

collectdpluginntpdport

collectdpluginntpdreverselookups

collectdpluginntpdincludeunitid

collectdpluginntpdinterval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectdpluginopenvpnstatusfile

collectdpluginopenvpnimprovednamingschema

collectdpluginopenvpncollectcompression

collectdpluginopenvpncollectindividualusers

collectdpluginopenvpncollectusercount

collectdpluginopenvpninterval

collectd-ovs_events

collectdpluginovs_eventsaddress

collectdpluginovs_eventsdispatch

collectdpluginovs_eventsinterfaces

collectdpluginovs_eventssend_notification

Red Hat OpenStack Platform 160 Service Telemetry Framework

52

collectdpluginovs_events$port

collectdpluginovs_eventssocket

collectd-ovs_stats

collectdpluginovs_statsaddress

collectdpluginovs_statsbridges

collectdpluginovs_statsport

collectdpluginovs_statssocket

collectd-ping

collectdpluginpinghosts

collectdpluginpingtimeout

collectdpluginpingttl

collectdpluginpingsource_address

collectdpluginpingdevice

collectdpluginpingmax_missed

collectdpluginpingsize

collectdpluginpinginterval

collectd-powerdns

collectdpluginpowerdnsinterval

collectdpluginpowerdnsservers

collectdpluginpowerdnsrecursors

collectdpluginpowerdnslocal_socket

collectdpluginpowerdnsinterval

collectd-processes

collectdpluginprocessesprocesses

collectdpluginprocessesprocess_matches

collectdpluginprocessescollect_context_switch

collectdpluginprocessescollect_file_descriptor

collectdpluginprocessescollect_memory_maps

collectdpluginpowerdnsinterval

APPENDIX A COLLECTD PLUG-INS

53

collectd-protocols

collectdpluginprotocolsignoreselected

collectdpluginprotocolsvalues

collectd-python

collectd-serial

collectd-smart

collectdpluginsmartdisks

collectdpluginsmartignoreselected

collectdpluginsmartinterval

collectd-snmp_agent

collectd-statsd

collectdpluginstatsdhost

collectdpluginstatsdport

collectdpluginstatsddeletecounters

collectdpluginstatsddeletetimers

collectdpluginstatsddeletegauges

collectdpluginstatsddeletesets

collectdpluginstatsdcountersum

collectdpluginstatsdtimerpercentile

collectdpluginstatsdtimerlower

collectdpluginstatsdtimerupper

collectdpluginstatsdtimersum

collectdpluginstatsdtimercount

collectdpluginstatsdinterval

collectd-swap

collectdpluginswapreportbydevice

collectdpluginswapreportbytes

collectdpluginswapvaluesabsolute

Red Hat OpenStack Platform 160 Service Telemetry Framework

54

collectdpluginswapvaluespercentage

collectdpluginswapreportio

collectdpluginswapinterval

collectd-syslog

collectdpluginsysloglog_level

collectdpluginsyslognotify_level

collectdpluginsysloginterval

collectd-table

collectdplugintabletables

collectdplugintableinterval

collectd-tail

collectdplugintailfiles

collectdplugintailinterval

collectd-tail_csv

collectdplugintail_csvmetrics

collectdplugintail_csvfiles

collectd-tcpconns

collectdplugintcpconnslocalports

collectdplugintcpconnsremoteports

collectdplugintcpconnslistening

collectdplugintcpconnsallportssummary

collectdplugintcpconnsinterval

collectd-ted

collectd-thermal

collectdpluginthermaldevices

collectdpluginthermalignoreselected

collectdpluginthermalinterval

collectd-threshold

APPENDIX A COLLECTD PLUG-INS

55

collectdpluginthresholdtypes

collectdpluginthresholdplugins

collectdpluginthresholdhosts

collectdpluginthresholdinterval

collectd-turbostat

collectdpluginturbostatcore_c_states

collectdpluginturbostatpackage_c_states

collectdpluginturbostatsystem_management_interrupt

collectdpluginturbostatdigital_temperature_sensor

collectdpluginturbostattcc_activation_temp

collectdpluginturbostatrunning_average_power_limit

collectdpluginturbostatlogical_core_names

collectd-unixsock

collectd-uptime

collectdpluginuptimeinterval

collectd-users

collectdpluginusersinterval

collectd-uuid

collectdpluginuuiduuid_file

collectdpluginuuidinterval

collectd-virt

collectdpluginvirtconnection

collectdpluginvirtrefresh_interval

collectdpluginvirtdomain

collectdpluginvirtblock_device

collectdpluginvirtinterface_device

collectdpluginvirtignore_selected

collectdpluginvirthostname_format

Red Hat OpenStack Platform 160 Service Telemetry Framework

56

collectdpluginvirtinterface_format

collectdpluginvirtextra_stats

collectdpluginvirtinterval

collectd-vmem

collectdpluginvmemverbose

collectdpluginvmeminterval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectdpluginwrite_graphitecarbons

collectdpluginwrite_graphitecarbon_defaults

collectdpluginwrite_graphiteglobals

collectd-write_kafka

collectdpluginwrite_kafkakafka_host

collectdpluginwrite_kafkakafka_port

collectdpluginwrite_kafkakafka_hosts

collectdpluginwrite_kafkatopics

collectd-write_log

collectdpluginwrite_logformat

collectd-zfs_arc

None

APPENDIX A COLLECTD PLUG-INS

57

  • Table of Contents
  • CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
    • 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
    • 12 INSTALLATION SIZE
      • CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
        • 21 THE CORE COMPONENTS OF STF
        • 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
          • 221 Persistent volumes
            • 2211 Using ephemeral storage
              • 222 Resource allocation
              • 223 Node tuning operator
                • 23 DEPLOYING STF TO THE OCP ENVIRONMENT
                  • 231 Deploying STF to the OCP environment with ElasticSearch
                  • 232 Deploying STF to the OCP environment without ElasticSearch
                  • 233 Creating a namespace
                  • 234 Creating an OperatorGroup
                  • 235 Enabling the OperatorHubio Community Catalog Source
                  • 236 Enabling Red Hat STF Operator Source
                  • 237 Subscribing to the AMQ Certificate Manager Operator
                  • 238 Subscribing to the Elastic Cloud on Kubernetes Operator
                  • 239 Subscribing to the Service Telemetry Operator
                  • 2310 Creating a ServiceTelemetry object in OCP
                    • 24 REMOVING STF FROM THE OCP ENVIRONMENT
                      • 241 Deleting the namespace
                      • 242 Removing the OperatorSource
                          • CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
                            • 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
                            • 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
                            • 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
                              • 331 Retrieving the AMQ Interconnect route address
                              • 332 Configuring the STF connection for the overcloud
                              • 333 Validating client-side installation
                                  • CHAPTER 4 ADVANCED FEATURES
                                    • 41 CUSTOMIZING THE DEPLOYMENT
                                      • 411 Manifest override parameters
                                      • 412 Overriding a managed manifest
                                        • Procedure
                                            • 42 ALERTS
                                              • Additional resources
                                              • 421 Creating an alert rule in Prometheus
                                                • Procedure
                                                • Additional resources
                                                  • 422 Configuring custom alerts
                                                    • Procedure
                                                    • Additional resources
                                                      • 423 Creating an alert route in Alertmanager
                                                        • Procedure
                                                        • Additional resources
                                                            • 43 HIGH AVAILABILITY
                                                              • 431 Configuring high availability
                                                                • Procedure
                                                                    • 44 DASHBOARDS
                                                                      • 441 Setting up Grafana to host the dashboard
                                                                        • 4411 Viewing and editing queries
                                                                          • 442 The Grafana infrastructure dashboard
                                                                            • 4421 Top panels
                                                                            • 4422 Networking panels
                                                                            • 4423 CPU panels
                                                                            • 4424 Memory panels
                                                                            • 4425 Diskfile system
                                                                                • 45 CONFIGURING MULTIPLE CLOUDS
                                                                                  • 451 Planning AMQP address prefixes
                                                                                  • 452 Deploying Smart Gateways
                                                                                    • Procedure
                                                                                    • 4521 Example manifests
                                                                                      • 453 Creating the OpenStack environment file
                                                                                        • Procedure
                                                                                        • Additional resources
                                                                                          • 454 Querying metrics data from multiple clouds
                                                                                            • 46 EPHEMERAL STORAGE
                                                                                              • 461 Configuring ephemeral storage
                                                                                                  • APPENDIX A COLLECTD PLUG-INS
Page 8: Red Hat OpenStack Platform 16 · CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK 3.2. DEPLOYING WITH ROUTED L3 NETWORKS 3.3. CONFIGURING RED HAT OPENSTACK PLATFORM

Figure 11 Service Telemetry Framework architecture overview

NOTE

The Service Telemetry Framework data collection components collectd and Ceilometerand the transport components AMQ Interconnect and Smart Gateway are fullysupported The data storage components Prometheus and ElasticSearch including theOperator artifacts and visualization component Grafana are community-supported andare not officially supported

For metrics on the client side collectd collects high-resolution metrics collectd delivers the data toPrometheus by using the AMQP1 plugin which places the data onto the message bus On the serverside a Golang application called the Smart Gateway takes the data stream from the bus and exposes itas a local scrape endpoint for Prometheus

If you plan to collect and store events collectd or Ceilometer delivers event data to the server side byusing the AMQP1 plugin which places the data onto the message bus Another Smart Gateway writesthe data to the ElasticSearch datastore

Server-side STF monitoring infrastructure consists of the following layers

Service Telemetry Framework 1x (STF)

Red Hat OpenShift Container Platform (OCP)

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

5

Infrastructure platform

Figure 12 Server-side STF monitoring infrastructure

For more information about how to deploy Red Hat OpenShift Container Platform see the OCPproduct documentation You can install OCP on cloud platforms or on bare metal For more informationabout STF performance and scaling see httpsaccessredhatcomarticles4907241

NOTE

Do not install OCP on the same infrastructure that you want to monitor

12 INSTALLATION SIZE

The size of your Red Hat OpenShift Container Platform installation depends on the following factors

The number of nodes you want to monitor

The number of metrics you want to collect

The resolution of metrics

The length of time that you want to store the data

Installation of Service Telemetry Framework (STF) depends on the existingRed Hat OpenShift Container Platform environment Ensure that you install monitoring for

Red Hat OpenStack Platform 160 Service Telemetry Framework

6

Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platformenvironment You can install Red Hat OpenShift Container Platform (OCP) on baremetal or othersupported cloud platforms For more information about installing OCP see OpenShift ContainerPlatform 43 Documentation

The size of your OCP environment depends on the infrastructure you select For more informationabout minimum resources requirements when installing OCP on baremetal see Minimum resourcerequirements in the Installing a cluster on bare metal guide For installation requirements of the variouspublic and private cloud platforms which you can install see the corresponding installationdocumentation for your cloud platform of choice

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

7

CHAPTER 2 INSTALLING THE CORE COMPONENTS OFSERVICE TELEMETRY FRAMEWORK

Before you install Service Telemetry Framework (STF) ensure thatRed Hat OpenShift Container Platform (OCP) version 4x is running and that you understand the corecomponents of the framework As part of the OCP installation planning process ensure that theadministrator provides persistent storage and enough resources to run the STF components on top ofthe OCP environment

WARNING

Red Hat OpenShift Container Platform version 43 or later is currently required for asuccessful installation of STF

21 THE CORE COMPONENTS OF STF

The following STF core components are managed by Operators

Prometheus and AlertManager

ElasticSearch

Smart Gateway

AMQ Interconnect

Each component has a corresponding Operator that you can use to load the various applicationcomponents and objects

Additional resources

For more information about Operators see the Understanding Operators guide

22 PREPARING YOUR OCP ENVIRONMENT FOR STF

As you prepare your OCP environment for STF you must plan for persistent storage adequateresources and event storage

Ensure that persistent storage is available in your Red Hat OpenShift Container Platform clusterto permit a production grade deployment For more information see Section 221 ldquoPersistentvolumesrdquo

Ensure that enough resources are available to run the Operators and the application containersFor more information see Section 222 ldquoResource allocationrdquo

To install ElasticSearch you must use a community catalog source If you do not want to use acommunity catalog or if you do not want to store events see Section 23 ldquoDeploying STF to theOCP environmentrdquo

STF uses ElasticSearch to store events which requires a larger than normal

Red Hat OpenStack Platform 160 Service Telemetry Framework

8

vmmax_map_count The vmmax_map_count value is set by default inRed Hat OpenShift Container Platform For more information about how to edit the value of vmmax_map_count see Section 223 ldquoNode tuning operatorrdquo

221 Persistent volumes

STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus andElasticSearch can store metrics and events

Additional resources

For more information about configuring persistent storage for OCP see Understanding persistentstorage

2211 Using ephemeral storage

WARNING

You can use ephemeral storage with STF However if you use ephemeral storageyou might experience data loss if a pod is restarted updated or rescheduled ontoanother node Use ephemeral storage only for development or testing and notproduction environments

Procedure

To enable ephemeral storage for STF set storageEphemeralEnabled true in your ServiceTelemetry manifest

Additional resources

For more information about enabling ephemeral storage for STF see Section 461 ldquoConfiguringephemeral storagerdquo

222 Resource allocation

To enable the scheduling of pods within the OCP infrastructure you need resources for thecomponents that are running If you do not allocate enough resources pods remain in a Pending statebecause they cannot be scheduled

The amount of resources that you require to run STF depends on your environment and the number ofnodes and clouds that you want to monitor

Additional resources

For recommendations about sizing for metrics collection seehttpsaccessredhatcomarticles4907241

For information about sizing requirements for ElasticSearch seehttpswwwelasticcoguideencloud-on-k8scurrentk8s-managing-compute-resourceshtml

223 Node tuning operator

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

9

STF uses ElasticSearch to store events which requires a larger than normal vmmax_map_count The vmmax_map_count value is set by default in Red Hat OpenShift Container Platform

If you want to edit the value of vmmax_map_count you cannot apply node tuning manually using the sysctl command because Red Hat OpenShift Container Platform manages nodes directly To configurevalues and apply them to the infrastructure you must use the node tuning operator For moreinformation see Using the Node Tuning Operator

In an OCP deployment the default node tuning operator specification provides the required profiles forElasticSearch workloads or pods scheduled on nodes To view the default cluster node tuningspecification run the following command

The output of the default specification is documented at Default profiles set on a cluster Theassignment of profiles is managed in the recommend section where profiles are applied to a node whencertain conditions are met When scheduling ElasticSearch to a node in STF one of the following profilesis applied

openshift-control-plane-es

openshift-node-es

When scheduling an ElasticSearch pod there must be a label present that matches tunedopenshiftioelasticsearch If the label is present one of the two profiles is assigned to the podNo action is required by the administrator if you use the recommended Operator for ElasticSearch Ifyou use a custom-deployed ElasticSearch with STF ensure that you add the tunedopenshiftioelasticsearch label to all scheduled pods

Additional resources

For more information about virtual memory usage by ElasticSearch seehttpswwwelasticcoguideenelasticsearchreferencecurrentvm-max-map-counthtml

For more information about how the profiles are applied to nodes see Custom tuning specification

23 DEPLOYING STF TO THE OCP ENVIRONMENT

You can deploy STF to the OCP environment in one of two ways

Deploy STF and store events with ElasticSearch For more information see Section 231ldquoDeploying STF to the OCP environment with ElasticSearchrdquo

Deploy STF without ElasticSearch and disable events support For more information seeSection 232 ldquoDeploying STF to the OCP environment without ElasticSearchrdquo

231 Deploying STF to the OCP environment with ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 235 ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

oc get Tuneddefault -o yaml -n openshift-cluster-node-tuning-operator

Red Hat OpenStack Platform 160 Service Telemetry Framework

10

4 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

5 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

6 Section 238 ldquoSubscribing to the Elastic Cloud on Kubernetes Operatorrdquo

7 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

8 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

232 Deploying STF to the OCP environment without ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

4 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

5 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

6 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

233 Creating a namespace

Create a namespace to hold the STF components The service-telemetry namespace is usedthroughout the documentation

Procedure

Enter the following command

234 Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods

Procedure

Enter the following command

oc new-project service-telemetry

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorGroupmetadata name service-telemetry-operator-group namespace service-telemetryspec targetNamespaces - service-telemetryEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

11

Additional resources

For more information see OperatorGroups

235 Enabling the OperatorHubio Community Catalog Source

Before you install ElasticSearch you must have access to the resources on the OperatorHubioCommunity Catalog Source

Procedure

Enter the following command

236 Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform you must enable the operatorsource

Procedure

1 Install an OperatorSource that contains the Service Telemetry Operator and the SmartGateway Operator

2 To validate the creation of your OperatorSource use the oc get operatorsources command A

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind CatalogSourcemetadata name operatorhubio-operators namespace openshift-marketplacespec sourceType grpc image quayiooperator-frameworkupstream-community-operatorslatest displayName OperatorHubio Operators publisher OperatorHubioEOF

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorSourcemetadata labels opsrc-provider redhat-operators-stf name redhat-operators-stf namespace openshift-marketplacespec authorizationToken displayName Red Hat STF Operators endpoint httpsquayiocnr publisher Red Hat registryNamespace redhat-operators-stf type appregistryEOF

Red Hat OpenStack Platform 160 Service Telemetry Framework

12

2 To validate the creation of your OperatorSource use the oc get operatorsources command Asuccessful import results in the MESSAGE field returning a result of The object has been successfully reconciled

3 To validate that the Operators are available from the catalog use the oc get packagemanifestcommand

237 Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STFcomponents because the AMQ Certificate Manager Operator runs globally-scoped and is notcompatible with the dependency management of Operator Lifecycle Manager when used with othernamespace-scoped operators

Procedure

1 Subscribe to the AMQ Certificate Manager Operator create the subscription and validate theAMQ7 Certificate Manager

NOTE

The AMQ Certificate Manager is installed globally for all namespaces so the namespace value provided is openshift-operators You might not see your amq7-cert-managerv100 ClusterServiceVersion in the service-telemetrynamespace for a few minutes until the processing executes against thenamespace

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME TYPE ENDPOINT REGISTRY DISPLAYNAME PUBLISHER STATUS MESSAGEredhat-operators-stf appregistry httpsquayiocnr redhat-operators-stf Red Hat STF Operators Red Hat Succeeded The object has been successfully reconciled

$ oc get packagemanifests | grep Red Hat STF

smartgateway-operator Red Hat STF Operators 2m50sservicetelemetry-operator Red Hat STF Operators 2m50s

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name amq7-cert-manager namespace openshift-operatorsspec channel alpha installPlanApproval Automatic name amq7-cert-manager source redhat-operators sourceNamespace openshift-marketplaceEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

13

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-managerv100 has a phase Succeeded

238 Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator and if you plan to store events in ElasticSearch youmust enable the Elastic Cloud Kubernetes Operator

Procedure

1 Apply the following manifest to your OCP environment to enable the Elastic Cloud onKubernetes Operator

2 To verify that the ClusterServiceVersion for ElasticSearch Cloud on Kubernetes succeededenter the oc get csv command

239 Subscribing to the Service Telemetry Operator

To instantiate an STF instance create the ServiceTelemetry object to allow the Service TelemetryOperator to create the environment

Procedure

1 To create the Service Telemetry Operator subscription enter the oc apply -f command

$ oc get --namespace openshift-operators csv

NAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name elastic-cloud-eck namespace service-telemetryspec channel stable installPlanApproval Automatic name elastic-cloud-eck source operatorhubio-operators sourceNamespace openshift-marketplaceEOF

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1

Red Hat OpenStack Platform 160 Service Telemetry Framework

14

2 To validate the Service Telemetry Operator and the dependent operators enter the followingcommand

2310 Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework you must create an instance of ServiceTelemetry in OCPBy default eventsEnabled is set to false If you do not want to store events in ElasticSearch ensurethat eventsEnabled is set to false For more information see Section 232 ldquoDeploying STF to the OCPenvironment without ElasticSearchrdquo

The following core parameters are available for a ServiceTelemetry manifest

Table 21 Core parameters for a ServiceTelemetry manifest

Parameter Description Default Value

eventsEnabled Enable events support in STFRequires prerequisite steps toensure ElasticSearch can bestarted For more information seeSection 238 ldquoSubscribing to theElastic Cloud on KubernetesOperatorrdquo

false

metricsEnabled Enable metrics support in STF true

kind Subscriptionmetadata name servicetelemetry-operator namespace service-telemetryspec channel stable installPlanApproval Automatic name servicetelemetry-operator source redhat-operators-stf sourceNamespace openshift-marketplaceEOF

$ oc get csv --namespace service-telemetryNAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeededamq7-interconnect-operatorv120 Red Hat Integration - AMQ Interconnect 120 Succeededelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeededprometheusoperator0370 Prometheus Operator 0370 prometheusoperator0320 Succeededservice-telemetry-operatorv102 Service Telemetry Operator 102 service-telemetry-operatorv101 Succeededsmart-gateway-operatorv101 Smart Gateway Operator 101 smart-gateway-operatorv100 Succeeded

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

15

highAvailabilityEnabled Enable high availability in STF Formore information see Section 43ldquoHigh availabilityrdquo

false

storageEphemeralEnabled Enable ephemeral storagesupport in STF For moreinformation see Section 46ldquoEphemeral storagerdquo

false

Parameter Description Default Value

Procedure

1 To store events in ElasticSearch set eventsEnabled to true during deployment

2 To view the STF deployment logs in the Service Telemetry Operator use the oc logscommand

PLAY RECAP localhost ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3 View the pods and the status of each pod to determine that all workloads are operatingnominally

NOTE

If you set eventsEnabled true the notification Smart Gateways will Error andCrashLoopBackOff for a period of time before ElasticSearch starts

oc apply -f - ltltEOFapiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec eventsEnabled true metricsEnabled trueEOF

oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

$ oc get pods

NAME READY STATUS RESTARTS AGEalertmanager-stf-default-0 22 Running 0 26melastic-operator-645dc8b8ff-jwnzt 11 Running 0 88melasticsearch-es-default-0 11 Running 0 26minterconnect-operator-6fd49d9fb9-4bl92 11 Running 0

Red Hat OpenStack Platform 160 Service Telemetry Framework

16

24 REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality

Complete the following tasks

1 Section 241 ldquoDeleting the namespacerdquo

2 Section 242 ldquoRemoving the OperatorSourcerdquo

241 Deleting the namespace

To remove the operational resources for STF from OCP delete the namespace

Procedure

1 Run the oc delete command

2 Verify that the resources have been deleted from the namespace

242 Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again delete the OperatorSourceWhen you remove the OperatorSource PackageManifests related to STF are removed from theOperator Lifecycle Manager catalog

Procedure

1 Delete the OperatorSource

46mprometheus-operator-bf7d97fb9-kwnlx 11 Running 0 46mprometheus-stf-default-0 33 Running 0 26mservice-telemetry-operator-54f4c99d9b-k7ll6 22 Running 0 46msmart-gateway-operator-7ff58bcf94-66rvx 22 Running 0 46mstf-default-ceilometer-notification-smartgateway-6675df547q4lbj 11 Running 0 26mstf-default-collectd-notification-smartgateway-698c87fbb7-xj528 11 Running 0 26mstf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn 11 Running 0 26mstf-default-interconnect-7458fd4d69-nqbfs 11 Running 0 26m

oc delete project service-telemetry

$ oc get allNo resources found

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

17

2 Verify that the STF PackageManifests are removed from the platform If successful thefollowing command returns no result

3 If you enabled the OperatorHubio Community Catalog Source during the installationprocess and you no longer need this catalog source delete it

Additional resources

For more information about the OperatorHubio Community Catalog Source see Section 23ldquoDeploying STF to the OCP environmentrdquo

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stfoperatorsourceoperatorscoreoscom redhat-operators-stf deleted

$ oc get packagemanifests | grep Red Hat STF

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operatorscatalogsourceoperatorscoreoscom operatorhubio-operators deleted

Red Hat OpenStack Platform 160 Service Telemetry Framework

18

CHAPTER 3 COMPLETING THESERVICE TELEMETRY FRAMEWORK CONFIGURATION

31 CONNECTING RED HAT OPENSTACK PLATFORM TOSERVICE TELEMETRY FRAMEWORK

To collect metrics events or both and to send them to the Service Telemetry Framework (STF)storage domain you must configure the Red Hat OpenStack Platform overcloud to enable datacollection and transport

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes thatemploy routed L3 domains such as distributed compute node (DCN) or spine-leaf see Section 32ldquoDeploying to non-standard network topologiesrdquo

32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network you must makeconfiguration adjustments so that AMQ Interconnect can transport data to theService Telemetry Framework (STF) server instance This scenario is typical in a spine-leaf or aDCN topology For more information about DCN configuration see the Spine Leaf Networkingguide

If you use STF with Red Hat OpenStack Platform 160 and plan to monitor your Ceph Block orObject storage nodes you must make configuration changes that are similar to the configurationchanges that you make to the spine-leaf and DCN network configuration To monitor Ceph nodesuse the CephStorageExtraConfig parameter to define which network interface to load into theAMQ Interconnect and collectd configuration files

Similarly you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters ifyour environment uses Block and Object storage roles

The deployment of a spine-leaf topology involves creating roles and networks then assigningthose networks to the available roles When you configure data collection and transport for STF foran Red Hat OpenStack Platform deployment the default network for roles is InternalApi For CephBlock and Object storage roles the default network is Storage Because a spine-leaf configurationcan result in different networks being assigned to different Leaf groupings and those names aretypically unique additional configuration is required in the parameter_defaults section of theRed Hat OpenStack Platform environment files

Procedure

1 Document which networks are available for each of the Leaf roles For examples of networkname definitions see Creating a network data file in the Spine Leaf Networking guide Formore information about the creation of the Leaf groupings (roles) and assignment of thenetworks to those groupings see Creating a roles data file in the Spine Leaf Networkingguide

CephStorageExtraConfig tripleoprofilebasemetricscollectdamqp_host hiera(storage) tripleoprofilebasemetricsqdrlistener_addr hiera(storage) tripleoprofilebaseceilometeragentnotificationnotifier_host_addr hiera(storage)

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

19

2 Add the following configuration example to the ExtraConfig section for each of the leafroles In this example internal_api_subnet is the value defined in the name_lowerparameter of your network definition (with _subnet appended to the name for Leaf 0) andis the network to which the ComputeLeaf0 leaf role is connected In this case the networkidentification of 0 corresponds to the Compute role for leaf 0 and represents a value thatis different from the default internal API network nameFor the ComputeLeaf0 leaf role specify extra configuration to perform a hiera lookup todetermine which network interface for a particular network to assign to the collectd AMQPhost parameter Perform the same configuration for the AMQ Interconnect listeneraddress parameter

Additional leaf roles typically replace _subnet with _leafN where N represents a uniqueindentifier for the leaf

This example configuration is on a CephStorage leaf role

33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUDFOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud you must configure the data collectionapplications and the data transport to STF and deploy the overcloud

To configure the Red Hat OpenStack Platform overcloud complete the following tasks

1 Section 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

2 Section 332 ldquoConfiguring the STF connection for the overcloudrdquo

3 Section 333 ldquoValidating client-side installationrdquo

331 Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF you must provide theAMQ Interconnect route address in the STF connection file

Procedure

1 Log in to your Red Hat OpenShift Container Platform (OCP) environment

2 In the service-telemetry project retrieve the AMQ Interconnect route address

ComputeLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_subnet)

ComputeLeaf1ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_leaf1)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_leaf1)

CephStorageLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(storage_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(storage_subnet)

Red Hat OpenStack Platform 160 Service Telemetry Framework

20

NOTE

If your STF installation differs from the documentation ensure that youretrieve the correct AMQ Interconnect route address

332 Configuring the STF connection for the overcloud

To configure the STF connection you must create a file that contains the connection configurationof the AMQ Interconnect for the overcloud to the STF deployment Enable the collection of eventsand storage of the events in STF and deploy the overcloud

Procedure

1 Log in to the Red Hat OpenStack Platform undercloud as the stack user

2 Create a configuration file called stf-connectorsyaml in the homestack directory

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all dataingestion and data storage components for single cloud deployments Toshare the data storage domain with multiple clouds see Section 45ldquoConfiguring multiple cloudsrdquo

3 In the stf-connectorsyaml file configure the MetricsQdrConnectors address to connectthe AMQ Interconnect on the overcloud to the STF deployment

Add the CeilometerQdrPublishEvents true parameter to enable collection andtransport of Ceilometer events to STF

Replace the host parameter with the value of HOSTPORT that you retrieved inSection 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

4 Add the following files to your Red Hat OpenStack Platform director deployment to setupcollectd and AMQ Interconnect

the stf-connectorsyaml environment file

the enable-stfyaml file that ensures that the environment is being used during theovercloud deployment

$ oc get routes -ogo-template= range items printf sn spechost end | grep -5671stf-default-interconnect-5671-service-telemetryappsinfrawatch

parameter_defaults EventPipelinePublishers [] CeilometerQdrPublishEvents true MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch port 443 role edge sslProfile sslProfile verifyHostname false

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

21

the ceilometer-write-qdryaml file that ensures that Ceilometer telemetry is sent to STF

5 Deploy the Red Hat OpenStack Platform overcloud

333 Validating client-side installation

To validate data collection from the STF storage domain query the data sources for delivereddata To validate individual nodes in the Red Hat OpenStack Platform deployment connect to theconsole using SSH

Procedure

1 Log in to an overcloud node for example controller-0

2 Ensure that metrics_qdr container is running on the node

3 Return the internal network address on which AMQ Interconnect is running for example 17217144 listening on port 5666

4 Return a list of connections to the local AMQ Interconnect

openstack overcloud deploy ltother argumentsgt --templates usrshareopenstack-tripleo-heat-templates --environment-file ltother-environment-filesgt --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsmetricsceilometer-write-qdryaml --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsenable-stfyaml --environment-file homestackstf-connectorsyaml

$ sudo podman container inspect --format StateStatus metrics_qdr

running

$ sudo podman exec -it metrics_qdr cat etcqpid-dispatchqdrouterdconf

listener host 17217144 port 5666 authenticatePeer no saslMechanisms ANONYMOUS

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --connections

Connections id host container role dir security authentication tenant ============================================================================================================================================================================================================================================================================================ 1 stf-default-interconnect-5671-service-telemetryappsinfrawatch443 stf-default-

Red Hat OpenStack Platform 160 Service Telemetry Framework

22

There are four connections

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat clientThe outbound STF connection is provided to the MetricsQdrConnectors hostparameter and is the route for the STF storage domain The other hosts are internalnetwork addresses of the client connections to this AMQ Interconnect

5 To ensure that messages are being delivered list the links and view the _edge address inthe deliv column for delivery of messages

interconnect-7458fd4d69-bgzfb edge out TLSv12(DHE-RSA-AES256-GCM-SHA384) anonymous-user 12 1721714460290 openstackorgomcontainercontroller-0ceilometer-agent-notification255c02cee550f143ec9ea030db5cccba14 normal in no-security no-auth 16 1721714436408 metrics normal in no-security anonymous-user 899 1721714439500 10a2e99d-1b8a-4329-b48c-4335e5f75c84 normal in no-security no-auth

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --linksRouter Links type dir conn id id peer class addr phs cap pri undel unsett deliv presett psdrop acc rej rel mod delay rate =========================================================================================================================================================== endpoint out 1 5 local _edge 250 0 0 0 2979926 2979924 0 0 0 2 0 0 0 endpoint in 1 6 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 7 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 8 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 9 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 10 250 0 0 0 911 911 0 0 0 0 0 911 0 endpoint in 1 11 250 0 0 0 0 911 0 0 0 0 0 0 0 endpoint out 12 32 local templSY6Mcicol4J2Kp 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 16 41 250 0 0 0 2979924 2979924 0 0 0 0 0 0 0 endpoint in 912 1834 mobile $management 0 250 0 0 0 1

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

23

6 To list the addresses from Red Hat OpenStack Platform nodes to STF connect to OCP toget the AMQ Interconnect pod name and list the connections List the availableAMQ Interconnect pods

7 Connect to the pod and run the qdstat --connections command to list the knownconnections

In this example there are three edge connections from the Red Hat OpenStack Platformnodes with connection id 22 23 and 24

8 To view the number of messages delivered by the network use each address with the oc exec command

0 0 1 0 0 0 0 0 endpoint out 912 1835 local temp9Ok2resI9tmt+CT 250 0 0 0 0 0 0 0 0 0 0 0 0

$ oc get pods -l application=stf-default-interconnect

NAME READY STATUS RESTARTS AGEstf-default-interconnect-7458fd4d69-bgzfb 11 Running 0 6d21h

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 182547243852 UTCstf-default-interconnect-7458fd4d69-bgzfb

Connections id host container role dir security authentication tenant last dlv uptime ====================================================================================================================================================================================================== 1 1012922143062 rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv] normal in no-security anonymous-user 000000000 006215025 2 1013005255754 rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5] normal in no-security anonymous-user 000212557 006214936 3 1013005143110 rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6] normal in no-security anonymous-user 000213653 006214909 22 101280151948 Routerceph-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 23 101280151950 Routercompute-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 24 101280152082 Routercontroller-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000000 000220834 27 12700142202 c2f541c1-4c97-4b37-a189-a396c08fb079 normal in no-security no-auth 000000000 000000000

Red Hat OpenStack Platform 160 Service Telemetry Framework

24

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 182010293258 UTCstf-default-interconnect-7458fd4d69-bgzfb

Router Addresses class addr phs distrib pri local remote in out thru fallback ==================================================================================================================== mobile anycastceilometereventsample 0 balanced - 1 0 1553 1553 0 0 mobile collectdnotify 0 multicast - 1 0 10 10 0 0 mobile collectdtelemetry 0 multicast - 1 0 7798049 7798049 0 0

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

25

CHAPTER 4 ADVANCED FEATURESThe following optional features can provide additional functionality to theService Telemetry Framework (STF)

Customizing the deployment For more information see Section 41 ldquoCustomizing thedeploymentrdquo

Alerts For more information see Section 42 ldquoAlertsrdquo

High availability For more information see Section 43 ldquoHigh availabilityrdquo

Dashboards For more information see Section 44 ldquoDashboardsrdquo

Multiple clouds For more information see Section 45 ldquoConfiguring multiple cloudsrdquo

Ephemeral storage For more information see Section 46 ldquoEphemeral storagerdquo

41 CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load intoRed Hat OpenShift Container Platform (OCP) The Operator then creates other objects in memorywhich results in the dependent Operators creating the workloads they are responsible formanaging

WARNING

When you override the manifest you must provide the entire manifest contentsincluding object names or namespaces There is no dynamic parametersubstitution when you override a manifest

To override a manifest successfully with Service Telemetry Framework (STF) deploy a defaultenvironment using the core options only For more information about the core options seeSection 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo When you deploy STF use the oc getcommand to retrieve the default deployed manifest When you use a manifest that was originallygenerated by Service Telemetry Operator the manifest is compatible with the other objects thatare managed by the Operators

For example when the metricsEnabled true parameter is configured in the ServiceTelemetrymanifest the Service Telemetry Operator requests components for metrics retrieval and storageusing the default manifests In some cases you might want to override the default manifest Formore information see Section 411 ldquoManifest override parametersrdquo

411 Manifest override parameters

This table describes the available parameters that you can use to override a manifest along withthe corresponding retrieval commands

Table 41 Manifest override parameters

Red Hat OpenStack Platform 160 Service Telemetry Framework

26

Override parameter Description Retrieval command

alertmanagerManifest Override the Alertmanagerobject creation The PrometheusOperator watches for Alertmanager objects

oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest Override the Secret thatcontains the Alertmanagerconfiguration The PrometheusOperator uses a secret named alertmanager- alertmanager-name forexample stf-default to providethe alertmanageryamlconfiguration to Alertmanager

oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest Override the ElasticSearchobject creation The Elastic Cloudon Kuberneters Operator watchesfor ElasticSearch objects

oc get elasticsearch elasticsearch -oyaml

interconnectManifest Override the Interconnectobject creation The AMQInterconnect Operator watchesfor Interconnect objects

oc get interconnect stf-default-interconnect -oyaml

prometheusManifest Override the Prometheusobject creation The PrometheusOperator watches for Prometheus objects

oc get prometheus stf-default -oyaml

servicemonitorManifest Override the ServiceMonitorobject creation The PrometheusOperator watches for ServiceMonitor objects

oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest

Override the SmartGatewayobject creation for collectdmetrics The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest

Override the SmartGatewayobject creation for collectdevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-notification -oyaml

CHAPTER 4 ADVANCED FEATURES

27

smartgatewayCeilometerEventsManifest

Override the SmartGatewayobject creation for Ceilometerevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-ceilometer-notification -oyaml

Override parameter Description Retrieval command

412 Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest For a list of availablemanifest override parameters see Section 41 ldquoCustomizing the deploymentrdquo The default ServiceTelemetry object is stf-default Use oc get servicetelemetry to list the available STFdeployments

TIP

The oc edit command loads the default system editor To override the default editor pass or setthe environment variable EDITOR to the preferred editor For example EDITOR=nano oc edit servicetelemetry stf-default

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Load the ServiceTelemetry object into an editor

4 To modify the ServiceTelemetry object provide a manifest override parameter and thecontents of the manifest to write to OCP instead of the defaults provided by STF

NOTE

The trailing pipe (|) after entering the manifest override parameter indicatesthat the value provided is multi-line

oc project service-telemetry

oc edit servicetelemetry stf-default

$ oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata annotations kubectlkubernetesiolast-applied-configuration | apiVersioninfrawatchv1alpha1kindServiceTelemetrymetadataannotationsnamestf-defaultnamespaceservice-telemetryspecmetricsEnabledtrue creationTimestamp 2020-04-14T202942Z

Red Hat OpenStack Platform 160 Service Telemetry Framework

28

1

2

Manifest override parameter is defined in the spec of the ServiceTelemetry object

End of the manifest override content

5 Save and close

42 ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager Alert rules in Prometheusservers send alerts to an Alertmanager which manages the alerts Alertmanager can silenceinhibit or aggregate alerts and send notifications using email on-call notification systems or chatplatforms

To create an alert complete the following tasks

1 Create an alert rule in Prometheus For more information see Section 421 ldquoCreating analert rule in Prometheusrdquo

generation 1 name stf-default namespace service-telemetry resourceVersion 1949423 selfLink apisinfrawatchv1alpha1namespacesservice-telemetryservicetelemetrysstf-default uid d058bc41-1bb0-49f5-9a8b-642f4b8adb95spec metricsEnabled true smartgatewayCollectdMetricsManifest | 1 apiVersion smartgatewayinfrawatchv2alpha1 kind SmartGateway metadata name stf-default-collectd-telemetry namespace service-telemetry spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdtelemetry debug true prefetch 15000 serviceType metrics size 1 useTimestamp true 2status conditions - ansibleResult changed 0 completion 2020-04-14T203219079508 failures 0 ok 52 skipped 1 lastTransitionTime 2020-04-14T202959Z message Awaiting next reconciliation reason Successful status True type Running

CHAPTER 4 ADVANCED FEATURES

29

2 Create an alert route in Alertmanager For more information see Section 423 ldquoCreating analert route in Alertmanagerrdquo

Additional resourcesFor more information about alerts or notifications with Prometheus and Alertmanager seehttpsprometheusiodocsalertingoverview

To view an example set of alerts that you can use with Service Telemetry Framework (STF) seehttpsgithubcominfrawatchservice-telemetry-operatortreemasterdeployalerts

421 Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications If the rule condition returns an emptyresult set the condition is false Otherwise the rule is true and it triggers an alert

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Create a PrometheusRule object that contains the alert rule The Prometheus Operatorloads the rule into Prometheus

4 To verify that the rules have been loaded into Prometheus by the Operator create a podwith access to curl

5 Run curl to access the prometheus-operated service to return the rules loaded intomemory

oc project service-telemetry

oc apply -f - ltltEOFapiVersion monitoringcoreoscomv1kind PrometheusRulemetadata creationTimestamp null labels prometheus stf-default role alert-rules name prometheus-alarm-rules namespace service-telemetryspec groups - name openstackrules rules - alert Metric Listener down expr collectd_qpid_router_status lt 1 To change the rule edit the value of the expr parameterEOF

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl prometheus-operated9090apiv1rulesstatussuccessdatagroups

Red Hat OpenStack Platform 160 Service Telemetry Framework

30

6 To verify that the output shows the rules loaded into the PrometheusRule object forexample the output contains the defined openstackrules exit from the pod

7 Clean up the environment by deleting the curl pod

Additional resourcesFor more information on alerting see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

422 Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 421ldquoCreating an alert rule in Prometheusrdquo

Procedure

1 Use the oc edit command

2 Edit the PrometheusRules manifest

3 Save and close

Additional resourcesFor more information about configuring alerting rules seehttpsprometheusiodocsprometheuslatestconfigurationalerting_rules

For more information about PrometheusRules objects seehttpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

423 Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system such as email IRC or other notificationchannel The Prometheus Operator manages the Alertmanager configuration as anRed Hat OpenShift Container Platform (OCP) secret STF by default deploys a basic configurationthat results in no receivers

[nameopenstackrulesfileetcprometheusrulesprometheus-stf-default-rulefiles-0service-telemetry-prometheus-alarm-rulesyamlrules[nameMetric Listener downquerycollectd_qpid_router_status u003c 1duration0labelsannotationsalerts[]healthoktypealerting]interval30]

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

oc edit prometheusrules prometheus-alarm-rules

alertmanageryaml |- global resolve_timeout 5m route

CHAPTER 4 ADVANCED FEATURES

31

To deploy a custom Alertmanager route with STF an alertmanagerConfigManifest parameter mustbe passed to the Service Telemetry Operator that results in an updated secret managed by thePrometheus Operator For more information see Section 412 ldquoOverriding a managed manifestrdquo

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object for your STF deployment

4 Add a new parameter alertmanagerConfigManifest and the Secret object contents todefine the alertmanageryaml configuration for Alertmanager

NOTE

This loads the default template that is already managed by ServiceTelemetry Operator To validate the changes are populating correctlychange a value return the alertmanager-stf-default secret and verify thatthe new value is loaded into memory for example changing the value globalresolve_timeout from 5m to 10m

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

oc project service-telemetry

oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec metricsEnabled true alertmanagerConfigManifest | apiVersion v1 kind Secret metadata name alertmanager-stf-default namespace service-telemetry type Opaque stringData alertmanageryaml |- global resolve_timeout 10m route

Red Hat OpenStack Platform 160 Service Telemetry Framework

32

5 Verify that the configuration was applied to the secret

6 To verify the configuration has been loaded into Alertmanager create a pod with access to curl

7 Run curl against the alertmanager-operated service to retrieve the status and configYAMLcontents and review the supplied configuration matches the configuration loaded intoAlertmanager

8 Verify that the configYAML field contains the expected changes Exit from the pod

9 To clean up the environment delete the curl pod

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

$ oc get secret alertmanager-stf-default -o go-template=index data alertmanageryaml | base64decode

global resolve_timeout 10mroute group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver nullreceivers- name null

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl alertmanager-operated9093apiv1status

statussuccessdataconfigYAMLglobaln resolve_timeout 10mn http_config n smtp_hello localhostn smtp_require_tls truen pagerduty_url httpseventspagerdutycomv2enqueuen hipchat_api_url httpsapihipchatcomn opsgenie_api_url httpsapiopsgeniecomn wechat_api_url httpsqyapiweixinqqcomcgi-binn victorops_api_url httpsalertvictoropscomintegrationsgeneric20131114alertnrouten receiver nulln group_byn - jobn group_wait 30sn group_interval 5mn repeat_interval 12hnreceiversn- name nullntemplates []n

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

CHAPTER 4 ADVANCED FEATURES

33

Additional resourcesFor more information about the Red Hat OpenShift Container Platform secret and the Prometheusoperator see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

43 HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover fromfailures in its component services Although Red Hat OpenShift Container Platform (OCP) restartsa failed pod if nodes are available to schedule the workload this recovery process might take morethan one minute during which time events and metrics are lost A high availability configurationincludes multiple copies of STF components reducing recovery time to approximately 2 secondsTo protect against failure of an OCP node deploy STF to an OCP cluster with three or more nodes

NOTE

STF is not yet a fully fault tolerant system Delivery of metrics and events during therecovery period is not guaranteed

Enabling high availability has the following effects

Two AMQ Interconnect pods run instead of the default 1

Three ElasticSearch pods run instead of the default 1

Recovery time from a lost pod in either of these services reduces to approximately 2seconds

431 Configuring high availability

To configure STF for high availability add highAvailabilityEnabled true to the ServiceTelemetryobject in OCP You can this set this parameter at installation time or if you already deployed STFcomplete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Use the oc command to edit the ServiceTelemetry object

4 Add highAvailabilityEnabled true to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry

spec eventsEnabled true metricsEnabled true highAvailabilityEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

34

44 DASHBOARDS

Use third-party application Grafana to visualize system-level metrics gathered by collectd for eachindividual host node For more information about configuring collectd see Section 33ldquoConfiguring Red Hat OpenStack Platform overcloud for Service Telemetry Frameworkrdquo

441 Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment so you mustdeploy the Grafana Operator from OperatorHubio

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Clone the dashboard repository

4 Deploy the Grafana operator

5 To verify that the operator launched successfully run the oc get csv command If the valueof the PHASE column is Succeeded the operator launched successfully

6 Launch a Grafana instance

7 Verify that the Grafana instance deployed

8 Create the datasource and dashboard resources

oc project service-telemetry

git clone httpsgithubcominfrawatchdashboardscd dashboards

oc create -f deploysubscriptionyaml

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEgrafana-operatorv320 Grafana Operator 320 Succeeded

$ oc create -f deploygrafanayaml

$ oc get pod -l app=grafana

NAME READY STATUS RESTARTS AGEgrafana-deployment-7fc7848b56-sbkhv 11 Running 0 1m

CHAPTER 4 ADVANCED FEATURES

35

9 Verify that the resources installed correctly

10 Navigate to httpsltgrafana-route-addressgt in a web browser Use the oc get routescommand to retrieve the Grafana route address

11 To view the dashboard click Dashboards and Manage

Additional resources

For more information about enabling the OperatorHubio catalog source see Section 235ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

4411 Viewing and editing queries

Procedure

1 Log in to Red Hat OpenShift Container Platform To view and edit queries log in as the admin user

2 Change to the service-telemetry namespace

3 To retrieve the default username and password describe the Grafana object using oc describe

442 The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time Select a node from theupper left corner of the dashboard

4421 Top panels

Title Unit Description

oc create -f deploydatasourceyaml -f deployrhos-dashboardyaml

$ oc get grafanadashboards

NAME AGErhos-dashboard 7d21h

$ oc get grafanadatasourcesNAME AGEservice-telemetry-grafanadatasource 1m

oc get routes

oc project service-telemetry

oc describe grafana service-telemetry-grafana

Red Hat OpenStack Platform 160 Service Telemetry Framework

36

Current Global Alerts - Current alerts fired byPrometheus

Recent Global Alerts - Recently fired alerts in 5m timesteps

Status Panel - Node status up downunavailable

Uptime smhdMY Total operational time of node

CPU Cores cores Total number of cores

Memory bytes Total memory

Disk Size bytes Total storage size

Processes processes Total number of processes listedby type

Load Average processes Load average represents theaverage number of running anduninterruptible processes residingin the kernel execution queue

4422 Networking panels

Panels that display the network interfaces of the node

Panel Unit Description

Physical Interfaces Ingress Errors errors Total errors with incoming data

Physical Interfaces Egress Errors errors Total errors with outgoing data

Physical Interfaces Ingress ErrorRates

errorss Rate of incoming data errors

Physical Interfaces egress ErrorRates

errorss Rate of outgoing data errors

Physical Interfaces PacketsIngress pps Incoming packets persecond

Physical Interfaces PacketsEgress

pps

Outgoing packets per second Physical Interfaces Data Ingress bytess

CHAPTER 4 ADVANCED FEATURES

37

Incoming data rates Physical Interfaces Data Egress bytess

Outgoing data rates Physical Interfaces Drop RateIngress

pps

Incoming packets drop rate Physical Interfaces Drop RateEgress

pps

4423 CPU panels

Panels that display CPU usage of the node

Panel Unit Description

Current CPU Usage percent Instantaneous usage at the timeof the last query

Aggregate CPU Usage percent Average non-idle CPU activity ofall cores on a node

Aggr CPU Usage by Type percent Shows time spent for each type ofthread averaged across all cores

4424 Memory panels

Panels that display memory usage on the node

Panel Unit Description

Memory Used percent Amount of memory being used attime of last query

Huge Pages Used hugepages Number of hugepages beingused

Memory

4425 Diskfile system

Panels that display space used on disk

Panel Unit Description Notes

Disk Space Usage percent Total disk use at time oflast query

Red Hat OpenStack Platform 160 Service Telemetry Framework

38

Inode Usage percent Total inode use at timeof last query

Aggregate Disk SpaceUsage

bytes Total disk space usedand reserved

Because this query relieson the df plugintemporary file systemsthat do not necessarilyuse disk space areincluded in the resultsThe query tries to filterout most of these but itmight not be exhaustive

Disk Traffic bytess Shows rates for bothreading and writing

Disk Load percent Approximatepercentage of total diskbandwidth being usedThe weighted IO timeseries includes thebacklog that might beaccumulating For moreinformation see thecollectd disk plugin docs

Operationss opss Operations done persecond

Average IO OperationTime

seconds Average time each IOoperation took tocomplete This averageis not accurate see thecollectd disk plugin docs

Panel Unit Description Notes

45 CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance ofService Telemetry Framework (STF)

1 Plan the AMQP address prefixes that you want to use for each cloud For more informationsee Section 451 ldquoPlanning AMQP address prefixesrdquo

2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on thecorresponding address prefixes For more information see Section 452 ldquoDeploying SmartGatewaysrdquo

3 Configure each cloud to send its metrics and events to STF on the correct address Formore information see Section 453 ldquoCreating the OpenStack environment filerdquo

CHAPTER 4 ADVANCED FEATURES

39

Figure 41 Two Red Hat OpenStack Platform clouds connect to STF

451 Planning AMQP address prefixes

By default Red Hat OpenStack Platform nodes get data through two data collectors collectd andCeilometer These components send telemetry data or notifications to the respective AMQPaddresses for example collectdtelemetry where STF Smart Gateways listen on those addressesfor monitoring data

To support multiple clouds and to identify which cloud generated the monitoring data configureeach cloud to send data to a unique address Prefix a cloud identifier to the second part of theaddress The following list shows some example addresses and identifiers

collectdcloud1-telemetry

collectdcloud1-notify

anycastceilometercloud1-eventsample

collectdcloud2-telemetry

collectdcloud2-notify

Red Hat OpenStack Platform 160 Service Telemetry Framework

40

anycastceilometercloud2-eventsample

collectdus-east-1-telemetry

collectdus-west-3-telemetry

452 Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud one forcollectd metrics one for collectd events and one for Ceilometer events Configure each of theSmart Gateways to listen on the AMQP address that you define for the corresponding cloud

When you deploy STF for the first time Smart Gateway manifests are created that define the initialSmart Gateways for a single cloud When deploying Smart Gateways for multiple cloud supportyou deploy multiple Smart Gateways for each of the data collection types that handle the metricsand the events data for each cloud The initial Smart Gateways act as a template to createadditional Smart Gateways along with any authentication information required to connect to thedata stores

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

oc project service-telemetry

3 Use the initially deployed Smart Gateways as a template for additional Smart GatewaysList the currently deployed Smart Gateways with the oc get smartgateways command Forexample if you deployed STF with metricsEnabled true and eventsEnabled true thefollowing Smart Gateways are displayed in the output

$ oc get smartgateways

NAME AGEstf-default-ceilometer-notification 14dstf-default-collectd-notification 14dstf-default-collectd-telemetry 14d

4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary filewhich you can modify later and use to create the new set of Smart Gateways

truncate --size 0 tmpcloud1-smartgatewaysyaml ampamp for sg in $(oc get smartgateways -oname)do echo --- gtgt tmpcloud1-smartgatewaysyaml oc get $sg -oyaml --export gtgt tmpcloud1-smartgatewaysyamldone

5 Modify the Smart Gateway manifest in the tmpcloud1-smartgatewaysyaml file Adjustthe metadataname and specamqpUrl fields to include the cloud identifier from yourschema For more information see ] To view example Smart Gateway manifests seeltltexample-manifests_advanced-features[

6 Deploy your new Smart Gateways

CHAPTER 4 ADVANCED FEATURES

41

oc apply -f tmpcloud1-smartgatewaysyaml

7 Verify that each Smart Gateway is running This can take several minutes depending on thenumber of Smart Gateways

oc get po -l app=smart-gateway

4521 Example manifests

IMPORTANT

The content in the following examples might be different to the file content in yourdeployment Copy the manifests in your deployment

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that youwant to use for your clouds For more information see Section 451 ldquoPlanning AMQP addressprefixesrdquo

NOTE

Your output may have some additional metadata parameters that you can removefrom the manifests you that load into OCP

apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-ceilometer-notification-cloud1 1spec amqpDataSource ceilometer amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672anycastceilometercloud1-eventsample 2 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-notification-cloud1 3spec amqpDataSource collectd amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-

Red Hat OpenStack Platform 160 Service Telemetry Framework

42

1

2

3

4

5

6

Name for Ceilometer notifications for cloud1

AMQP Address for Ceilometer notifications for cloud1

Name for collectd telemetry for cloud1

AMQP Address for collectd telemetry for cloud1

Name for collectd notifications for cloud1

AMQP Address for collectd notifications for cloud1

453 Creating the OpenStack environment file

To label traffic according to the cloud of origin you must create a configuration with cloud-specificinstance names Create an stf-connectorsyaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefixscheme For more information see Section 451 ldquoPlanning AMQP address prefixesrdquo

notify 4 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-telemetry-cloud1 5spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-telemetry 6 debug false prefetch 15000 serviceType metrics size 1 useTimestamp true

CHAPTER 4 ADVANCED FEATURES

43

WARNING

Remove enable-stfyaml and ceilometer-write-qdryaml environment filereferences from your overcloud deployment This configuration is redundantand results in duplicate information being sent from each cloud node

Procedure

1 Create the stf-connectorsyaml file and modify it to match the AMQP address that youwant for this cloud deployment

resource_registry OSTripleOServicesCollectd usrshareopenstack-tripleo-heat-templatesdeploymentmetricscollectd-container-puppetyaml OSTripleOServicesMetricsQdr usrshareopenstack-tripleo-heat-templatesdeploymentmetricsqdr-container-puppetyaml OSTripleOServicesCeilometerAgentCentral usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-central-container-puppetyaml OSTripleOServicesCeilometerAgentNotification usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-notification-container-puppetyaml OSTripleOServicesCeilometerAgentIpmi usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-ipmi-container-puppetyaml OSTripleOServicesComputeCeilometerAgent usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-compute-container-puppetyaml OSTripleOServicesRedis usrshareopenstack-tripleo-heat-templatesdeploymentdatabaseredis-pacemaker-puppetyaml

parameter_defaults EnableSTF true

EventPipelinePublishers [] CeilometerEnablePanko false CeilometerQdrPublishEvents true CeilometerQdrEventsConfig driver amqp topic cloud1-event 1

CollectdConnectionType amqp1 CollectdAmqpInterval 5 CollectdDefaultPollingInterval 5

CollectdAmqpInstances cloud1-notify 2 notify true format JSON presettle false cloud1-telemetry 3 format JSON presettle true

MetricsQdrAddresses

Red Hat OpenStack Platform 160 Service Telemetry Framework

44

+ lt1gt Define the topic for Ceilometer events This value is the address format of anycastceilometercloud1-eventsample lt2gt Define the topic for collectd events This value is theformat of collectdcloud1-notify lt3gt Define the topic for collectd metrics This value is the formatof collectdcloud1-telemetry lt4gt Adjust the MetricsQdrConnectors host to the address of the STFroute

1 Ensure that the naming convention in the stf-connectorsyaml file aligns with the specamqpUrl field in the Smart Gateway configuration For example configure the CeilometerQdrEventsConfigtopic field to a value of cloud1-event

2 Save the file in a directory for custom environment files for example homestackcustom_templates

3 Source the authentication file

4 Include the stf-connectorsyaml file in the overcloud deployment command along with anyother environment files relevant to your environment

Additional resourcesFor information about validating the deployment see Section 333 ldquoValidating client-sideinstallationrdquo

454 Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it wasscraped from You can use this label to query data from a specific cloud

To query data from a specific cloud use a Prometheus promql query that matches the associatedservice label for example collectd_uptimeservice=stf-default-collectd-telemetry-cloud1-smartgateway

- prefix collectd distribution multicast - prefix anycastceilometer distribution multicast

MetricsQdrSSLProfiles - name sslProfile

MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch 4 port 443 role edge verifyHostname false sslProfile sslProfile

[stackundercloud-0 ~]$ source stackrc

(undercloud) [stackundercloud-0 ~]$

(undercloud) [stackundercloud-0 ~]$ openstack overcloud deploy --templates usrshareopenstack-tripleo-heat-templates -e homestackcustom_templatesstf-connectorsyaml

CHAPTER 4 ADVANCED FEATURES

45

46 EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storingdata in your Red Hat OpenShift Container Platform (OCP) cluster Ephemeral storage is notrecommended in a production environment due to the volatility of the data in the platform whenoperating correctly and as designed For example restarting a pod or rescheduling the workload toanother node results in the loss of any local data written since the pod started

If you enable ephemeral storage in STF the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests

461 Configuring ephemeral storage

To configure STF for ephemeral storage add storageEphemeralEnabled true to theServiceTelemetry object in OCP You can add storageEphemeralEnabled true at installation timeor if you already deployed STF complete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object

4 Add the storageEphemeralEnabled true parameter to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry stf-default

spec eventsEnabled true metricsEnabled true storageEphemeralEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

46

APPENDIX A COLLECTD PLUG-INSThis section contains a complete list of collectd plug-ins and configurations forRed Hat OpenStack Platform 160

collectd-aggregation

collectdpluginaggregationaggregators

collectdpluginaggregationinterval

collectd-amqp1

collectd-apache

collectdpluginapacheinstances (ex localhost rArr url rArr httplocalhostmod_statusauto)

collectdpluginapacheinterval

collectd-apcups

collectd-battery

collectdpluginbatteryvalues_percentage

collectdpluginbatteryreport_degraded

collectdpluginbatteryquery_state_fs

collectdpluginbatteryinterval

collectd-ceph

collectdplugincephdaemons

collectdplugincephlongrunavglatency

collectdplugincephconvertspecialmetrictypes

collectd-cgroups

collectdplugincgroupsignore_selected

collectdplugincgroupsinterval

collectd-conntrack

None

collectd-contextswitch

collectdplugincontextswitchinterval

collectd-cpu

APPENDIX A COLLECTD PLUG-INS

47

collectdplugincpureportbystate

collectdplugincpureportbycpu

collectdplugincpuvaluespercentage

collectdplugincpureportnumcpu

collectdplugincpureportgueststate

collectdplugincpusubtractgueststate

collectdplugincpuinterval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectdplugincsvdatadir

collectdplugincsvstorerates

collectdplugincsvinterval

collectd-df

collectdplugindfdevices

collectdplugindffstypes

collectdplugindfignoreselected

collectdplugindfmountpoints

collectdplugindfreportbydevice

collectdplugindfreportinodes

collectdplugindfreportreserved

collectdplugindfvaluesabsolute

collectdplugindfvaluespercentage

collectdplugindfinterval

collectd-disk

collectdplugindiskdisks

collectdplugindiskignoreselected

collectdplugindiskudevnameattr

Red Hat OpenStack Platform 160 Service Telemetry Framework

48

collectdplugindiskinterval

collectd-entropy

collectdpluginentropyinterval

collectd-ethstat

collectdpluginethstatinterfaces

collectdpluginethstatmaps

collectdpluginethstatmappedonly

collectdpluginethstatinterval

collectd-exec

collectdpluginexeccommands

collectdpluginexeccommands_defaults

collectdpluginexecglobals

collectdpluginexecinterval

collectd-fhcount

collectdpluginfhcountvaluesabsolute

collectdpluginfhcountvaluespercentage

collectdpluginfhcountinterval

collectd-filecount

collectdpluginfilecountdirectories

collectdpluginfilecountinterval

collectd-fscache

None

collectd-hddtemp

collectdpluginhddtemphost

collectdpluginhddtempport

collectdpluginhddtempinterval

collectd-hugepages

collectdpluginhugepagesreport_per_node_hp

APPENDIX A COLLECTD PLUG-INS

49

collectdpluginhugepagesreport_root_hp

collectdpluginhugepagesvalues_pages

collectdpluginhugepagesvalues_bytes

collectdpluginhugepagesvalues_percentage

collectdpluginhugepagesinterval

collectd-intel_rdt

collectd-interface

collectdplugininterfaceinterfaces

collectdplugininterfaceignoreselected

collectdplugininterfacereportinactive

Collectdplugininterfaceinterval

collectd-ipc

None

collectd-ipmi

collectdpluginipmiignore_selected

collectdpluginipminotify_sensor_add

collectdpluginipminotify_sensor_remove

collectdpluginipminotify_sensor_not_present

collectdpluginipmisensors

collectdpluginipmiinterval

collectd-irq

collectdpluginirqirqs

collectdpluginirqignoreselected

collectdpluginirqinterval

collectd-load

collectdpluginloadreport_relative

collectdpluginloadinterval

collectd-logfile

collectdpluginlogfilelog_level

Red Hat OpenStack Platform 160 Service Telemetry Framework

50

collectdpluginlogfilelog_level

collectdpluginlogfilelog_file

collectdpluginlogfilelog_timestamp

collectdpluginlogfileprint_severity

collectdpluginlogfileinterval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectdpluginmemcachedinstances

collectdpluginmemcachedinterval

collectd-memory

collectdpluginmemoryvaluesabsolute

collectdpluginmemoryvaluespercentage

collectdpluginmemoryinterval collectd-multimeter

collectd-multimeter

collectd-mysql

collectdpluginmysqlinterval

collectd-netlink

collectdpluginnetlinkinterfaces

collectdpluginnetlinkverboseinterfaces

collectdpluginnetlinkqdiscs

collectdpluginnetlinkclasses

collectdpluginnetlinkfilters

collectdpluginnetlinkignoreselected

collectdpluginnetlinkinterval

collectd-network

collectdpluginnetworktimetolive

collectdpluginnetworkmaxpacketsize

APPENDIX A COLLECTD PLUG-INS

51

collectdpluginnetworkforward

collectdpluginnetworkreportstats

collectdpluginnetworklisteners

collectdpluginnetworkservers

collectdpluginnetworkinterval

collectd-nfs

collectdpluginnfsinterval

collectd-ntpd

collectdpluginntpdhost

collectdpluginntpdport

collectdpluginntpdreverselookups

collectdpluginntpdincludeunitid

collectdpluginntpdinterval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectdpluginopenvpnstatusfile

collectdpluginopenvpnimprovednamingschema

collectdpluginopenvpncollectcompression

collectdpluginopenvpncollectindividualusers

collectdpluginopenvpncollectusercount

collectdpluginopenvpninterval

collectd-ovs_events

collectdpluginovs_eventsaddress

collectdpluginovs_eventsdispatch

collectdpluginovs_eventsinterfaces

collectdpluginovs_eventssend_notification

Red Hat OpenStack Platform 160 Service Telemetry Framework

52

collectdpluginovs_events$port

collectdpluginovs_eventssocket

collectd-ovs_stats

collectdpluginovs_statsaddress

collectdpluginovs_statsbridges

collectdpluginovs_statsport

collectdpluginovs_statssocket

collectd-ping

collectdpluginpinghosts

collectdpluginpingtimeout

collectdpluginpingttl

collectdpluginpingsource_address

collectdpluginpingdevice

collectdpluginpingmax_missed

collectdpluginpingsize

collectdpluginpinginterval

collectd-powerdns

collectdpluginpowerdnsinterval

collectdpluginpowerdnsservers

collectdpluginpowerdnsrecursors

collectdpluginpowerdnslocal_socket

collectdpluginpowerdnsinterval

collectd-processes

collectdpluginprocessesprocesses

collectdpluginprocessesprocess_matches

collectdpluginprocessescollect_context_switch

collectdpluginprocessescollect_file_descriptor

collectdpluginprocessescollect_memory_maps

collectdpluginpowerdnsinterval

APPENDIX A COLLECTD PLUG-INS

53

collectd-protocols

collectdpluginprotocolsignoreselected

collectdpluginprotocolsvalues

collectd-python

collectd-serial

collectd-smart

collectdpluginsmartdisks

collectdpluginsmartignoreselected

collectdpluginsmartinterval

collectd-snmp_agent

collectd-statsd

collectdpluginstatsdhost

collectdpluginstatsdport

collectdpluginstatsddeletecounters

collectdpluginstatsddeletetimers

collectdpluginstatsddeletegauges

collectdpluginstatsddeletesets

collectdpluginstatsdcountersum

collectdpluginstatsdtimerpercentile

collectdpluginstatsdtimerlower

collectdpluginstatsdtimerupper

collectdpluginstatsdtimersum

collectdpluginstatsdtimercount

collectdpluginstatsdinterval

collectd-swap

collectdpluginswapreportbydevice

collectdpluginswapreportbytes

collectdpluginswapvaluesabsolute

Red Hat OpenStack Platform 160 Service Telemetry Framework

54

collectdpluginswapvaluespercentage

collectdpluginswapreportio

collectdpluginswapinterval

collectd-syslog

collectdpluginsysloglog_level

collectdpluginsyslognotify_level

collectdpluginsysloginterval

collectd-table

collectdplugintabletables

collectdplugintableinterval

collectd-tail

collectdplugintailfiles

collectdplugintailinterval

collectd-tail_csv

collectdplugintail_csvmetrics

collectdplugintail_csvfiles

collectd-tcpconns

collectdplugintcpconnslocalports

collectdplugintcpconnsremoteports

collectdplugintcpconnslistening

collectdplugintcpconnsallportssummary

collectdplugintcpconnsinterval

collectd-ted

collectd-thermal

collectdpluginthermaldevices

collectdpluginthermalignoreselected

collectdpluginthermalinterval

collectd-threshold

APPENDIX A COLLECTD PLUG-INS

55

collectdpluginthresholdtypes

collectdpluginthresholdplugins

collectdpluginthresholdhosts

collectdpluginthresholdinterval

collectd-turbostat

collectdpluginturbostatcore_c_states

collectdpluginturbostatpackage_c_states

collectdpluginturbostatsystem_management_interrupt

collectdpluginturbostatdigital_temperature_sensor

collectdpluginturbostattcc_activation_temp

collectdpluginturbostatrunning_average_power_limit

collectdpluginturbostatlogical_core_names

collectd-unixsock

collectd-uptime

collectdpluginuptimeinterval

collectd-users

collectdpluginusersinterval

collectd-uuid

collectdpluginuuiduuid_file

collectdpluginuuidinterval

collectd-virt

collectdpluginvirtconnection

collectdpluginvirtrefresh_interval

collectdpluginvirtdomain

collectdpluginvirtblock_device

collectdpluginvirtinterface_device

collectdpluginvirtignore_selected

collectdpluginvirthostname_format

Red Hat OpenStack Platform 160 Service Telemetry Framework

56

collectdpluginvirtinterface_format

collectdpluginvirtextra_stats

collectdpluginvirtinterval

collectd-vmem

collectdpluginvmemverbose

collectdpluginvmeminterval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectdpluginwrite_graphitecarbons

collectdpluginwrite_graphitecarbon_defaults

collectdpluginwrite_graphiteglobals

collectd-write_kafka

collectdpluginwrite_kafkakafka_host

collectdpluginwrite_kafkakafka_port

collectdpluginwrite_kafkakafka_hosts

collectdpluginwrite_kafkatopics

collectd-write_log

collectdpluginwrite_logformat

collectd-zfs_arc

None

APPENDIX A COLLECTD PLUG-INS

57

  • Table of Contents
  • CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
    • 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
    • 12 INSTALLATION SIZE
      • CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
        • 21 THE CORE COMPONENTS OF STF
        • 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
          • 221 Persistent volumes
            • 2211 Using ephemeral storage
              • 222 Resource allocation
              • 223 Node tuning operator
                • 23 DEPLOYING STF TO THE OCP ENVIRONMENT
                  • 231 Deploying STF to the OCP environment with ElasticSearch
                  • 232 Deploying STF to the OCP environment without ElasticSearch
                  • 233 Creating a namespace
                  • 234 Creating an OperatorGroup
                  • 235 Enabling the OperatorHubio Community Catalog Source
                  • 236 Enabling Red Hat STF Operator Source
                  • 237 Subscribing to the AMQ Certificate Manager Operator
                  • 238 Subscribing to the Elastic Cloud on Kubernetes Operator
                  • 239 Subscribing to the Service Telemetry Operator
                  • 2310 Creating a ServiceTelemetry object in OCP
                    • 24 REMOVING STF FROM THE OCP ENVIRONMENT
                      • 241 Deleting the namespace
                      • 242 Removing the OperatorSource
                          • CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
                            • 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
                            • 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
                            • 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
                              • 331 Retrieving the AMQ Interconnect route address
                              • 332 Configuring the STF connection for the overcloud
                              • 333 Validating client-side installation
                                  • CHAPTER 4 ADVANCED FEATURES
                                    • 41 CUSTOMIZING THE DEPLOYMENT
                                      • 411 Manifest override parameters
                                      • 412 Overriding a managed manifest
                                        • Procedure
                                            • 42 ALERTS
                                              • Additional resources
                                              • 421 Creating an alert rule in Prometheus
                                                • Procedure
                                                • Additional resources
                                                  • 422 Configuring custom alerts
                                                    • Procedure
                                                    • Additional resources
                                                      • 423 Creating an alert route in Alertmanager
                                                        • Procedure
                                                        • Additional resources
                                                            • 43 HIGH AVAILABILITY
                                                              • 431 Configuring high availability
                                                                • Procedure
                                                                    • 44 DASHBOARDS
                                                                      • 441 Setting up Grafana to host the dashboard
                                                                        • 4411 Viewing and editing queries
                                                                          • 442 The Grafana infrastructure dashboard
                                                                            • 4421 Top panels
                                                                            • 4422 Networking panels
                                                                            • 4423 CPU panels
                                                                            • 4424 Memory panels
                                                                            • 4425 Diskfile system
                                                                                • 45 CONFIGURING MULTIPLE CLOUDS
                                                                                  • 451 Planning AMQP address prefixes
                                                                                  • 452 Deploying Smart Gateways
                                                                                    • Procedure
                                                                                    • 4521 Example manifests
                                                                                      • 453 Creating the OpenStack environment file
                                                                                        • Procedure
                                                                                        • Additional resources
                                                                                          • 454 Querying metrics data from multiple clouds
                                                                                            • 46 EPHEMERAL STORAGE
                                                                                              • 461 Configuring ephemeral storage
                                                                                                  • APPENDIX A COLLECTD PLUG-INS
Page 9: Red Hat OpenStack Platform 16 · CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK 3.2. DEPLOYING WITH ROUTED L3 NETWORKS 3.3. CONFIGURING RED HAT OPENSTACK PLATFORM

Infrastructure platform

Figure 12 Server-side STF monitoring infrastructure

For more information about how to deploy Red Hat OpenShift Container Platform see the OCPproduct documentation You can install OCP on cloud platforms or on bare metal For more informationabout STF performance and scaling see httpsaccessredhatcomarticles4907241

NOTE

Do not install OCP on the same infrastructure that you want to monitor

12 INSTALLATION SIZE

The size of your Red Hat OpenShift Container Platform installation depends on the following factors

The number of nodes you want to monitor

The number of metrics you want to collect

The resolution of metrics

The length of time that you want to store the data

Installation of Service Telemetry Framework (STF) depends on the existingRed Hat OpenShift Container Platform environment Ensure that you install monitoring for

Red Hat OpenStack Platform 160 Service Telemetry Framework

6

Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platformenvironment You can install Red Hat OpenShift Container Platform (OCP) on baremetal or othersupported cloud platforms For more information about installing OCP see OpenShift ContainerPlatform 43 Documentation

The size of your OCP environment depends on the infrastructure you select For more informationabout minimum resources requirements when installing OCP on baremetal see Minimum resourcerequirements in the Installing a cluster on bare metal guide For installation requirements of the variouspublic and private cloud platforms which you can install see the corresponding installationdocumentation for your cloud platform of choice

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

7

CHAPTER 2 INSTALLING THE CORE COMPONENTS OFSERVICE TELEMETRY FRAMEWORK

Before you install Service Telemetry Framework (STF) ensure thatRed Hat OpenShift Container Platform (OCP) version 4x is running and that you understand the corecomponents of the framework As part of the OCP installation planning process ensure that theadministrator provides persistent storage and enough resources to run the STF components on top ofthe OCP environment

WARNING

Red Hat OpenShift Container Platform version 43 or later is currently required for asuccessful installation of STF

21 THE CORE COMPONENTS OF STF

The following STF core components are managed by Operators

Prometheus and AlertManager

ElasticSearch

Smart Gateway

AMQ Interconnect

Each component has a corresponding Operator that you can use to load the various applicationcomponents and objects

Additional resources

For more information about Operators see the Understanding Operators guide

22 PREPARING YOUR OCP ENVIRONMENT FOR STF

As you prepare your OCP environment for STF you must plan for persistent storage adequateresources and event storage

Ensure that persistent storage is available in your Red Hat OpenShift Container Platform clusterto permit a production grade deployment For more information see Section 221 ldquoPersistentvolumesrdquo

Ensure that enough resources are available to run the Operators and the application containersFor more information see Section 222 ldquoResource allocationrdquo

To install ElasticSearch you must use a community catalog source If you do not want to use acommunity catalog or if you do not want to store events see Section 23 ldquoDeploying STF to theOCP environmentrdquo

STF uses ElasticSearch to store events which requires a larger than normal

Red Hat OpenStack Platform 160 Service Telemetry Framework

8

vmmax_map_count The vmmax_map_count value is set by default inRed Hat OpenShift Container Platform For more information about how to edit the value of vmmax_map_count see Section 223 ldquoNode tuning operatorrdquo

221 Persistent volumes

STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus andElasticSearch can store metrics and events

Additional resources

For more information about configuring persistent storage for OCP see Understanding persistentstorage

2211 Using ephemeral storage

WARNING

You can use ephemeral storage with STF However if you use ephemeral storageyou might experience data loss if a pod is restarted updated or rescheduled ontoanother node Use ephemeral storage only for development or testing and notproduction environments

Procedure

To enable ephemeral storage for STF set storageEphemeralEnabled true in your ServiceTelemetry manifest

Additional resources

For more information about enabling ephemeral storage for STF see Section 461 ldquoConfiguringephemeral storagerdquo

222 Resource allocation

To enable the scheduling of pods within the OCP infrastructure you need resources for thecomponents that are running If you do not allocate enough resources pods remain in a Pending statebecause they cannot be scheduled

The amount of resources that you require to run STF depends on your environment and the number ofnodes and clouds that you want to monitor

Additional resources

For recommendations about sizing for metrics collection seehttpsaccessredhatcomarticles4907241

For information about sizing requirements for ElasticSearch seehttpswwwelasticcoguideencloud-on-k8scurrentk8s-managing-compute-resourceshtml

223 Node tuning operator

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

9

STF uses ElasticSearch to store events which requires a larger than normal vmmax_map_count The vmmax_map_count value is set by default in Red Hat OpenShift Container Platform

If you want to edit the value of vmmax_map_count you cannot apply node tuning manually using the sysctl command because Red Hat OpenShift Container Platform manages nodes directly To configurevalues and apply them to the infrastructure you must use the node tuning operator For moreinformation see Using the Node Tuning Operator

In an OCP deployment the default node tuning operator specification provides the required profiles forElasticSearch workloads or pods scheduled on nodes To view the default cluster node tuningspecification run the following command

The output of the default specification is documented at Default profiles set on a cluster Theassignment of profiles is managed in the recommend section where profiles are applied to a node whencertain conditions are met When scheduling ElasticSearch to a node in STF one of the following profilesis applied

openshift-control-plane-es

openshift-node-es

When scheduling an ElasticSearch pod there must be a label present that matches tunedopenshiftioelasticsearch If the label is present one of the two profiles is assigned to the podNo action is required by the administrator if you use the recommended Operator for ElasticSearch Ifyou use a custom-deployed ElasticSearch with STF ensure that you add the tunedopenshiftioelasticsearch label to all scheduled pods

Additional resources

For more information about virtual memory usage by ElasticSearch seehttpswwwelasticcoguideenelasticsearchreferencecurrentvm-max-map-counthtml

For more information about how the profiles are applied to nodes see Custom tuning specification

23 DEPLOYING STF TO THE OCP ENVIRONMENT

You can deploy STF to the OCP environment in one of two ways

Deploy STF and store events with ElasticSearch For more information see Section 231ldquoDeploying STF to the OCP environment with ElasticSearchrdquo

Deploy STF without ElasticSearch and disable events support For more information seeSection 232 ldquoDeploying STF to the OCP environment without ElasticSearchrdquo

231 Deploying STF to the OCP environment with ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 235 ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

oc get Tuneddefault -o yaml -n openshift-cluster-node-tuning-operator

Red Hat OpenStack Platform 160 Service Telemetry Framework

10

4 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

5 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

6 Section 238 ldquoSubscribing to the Elastic Cloud on Kubernetes Operatorrdquo

7 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

8 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

232 Deploying STF to the OCP environment without ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

4 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

5 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

6 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

233 Creating a namespace

Create a namespace to hold the STF components The service-telemetry namespace is usedthroughout the documentation

Procedure

Enter the following command

234 Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods

Procedure

Enter the following command

oc new-project service-telemetry

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorGroupmetadata name service-telemetry-operator-group namespace service-telemetryspec targetNamespaces - service-telemetryEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

11

Additional resources

For more information see OperatorGroups

235 Enabling the OperatorHubio Community Catalog Source

Before you install ElasticSearch you must have access to the resources on the OperatorHubioCommunity Catalog Source

Procedure

Enter the following command

236 Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform you must enable the operatorsource

Procedure

1 Install an OperatorSource that contains the Service Telemetry Operator and the SmartGateway Operator

2 To validate the creation of your OperatorSource use the oc get operatorsources command A

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind CatalogSourcemetadata name operatorhubio-operators namespace openshift-marketplacespec sourceType grpc image quayiooperator-frameworkupstream-community-operatorslatest displayName OperatorHubio Operators publisher OperatorHubioEOF

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorSourcemetadata labels opsrc-provider redhat-operators-stf name redhat-operators-stf namespace openshift-marketplacespec authorizationToken displayName Red Hat STF Operators endpoint httpsquayiocnr publisher Red Hat registryNamespace redhat-operators-stf type appregistryEOF

Red Hat OpenStack Platform 160 Service Telemetry Framework

12

2 To validate the creation of your OperatorSource use the oc get operatorsources command Asuccessful import results in the MESSAGE field returning a result of The object has been successfully reconciled

3 To validate that the Operators are available from the catalog use the oc get packagemanifestcommand

237 Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STFcomponents because the AMQ Certificate Manager Operator runs globally-scoped and is notcompatible with the dependency management of Operator Lifecycle Manager when used with othernamespace-scoped operators

Procedure

1 Subscribe to the AMQ Certificate Manager Operator create the subscription and validate theAMQ7 Certificate Manager

NOTE

The AMQ Certificate Manager is installed globally for all namespaces so the namespace value provided is openshift-operators You might not see your amq7-cert-managerv100 ClusterServiceVersion in the service-telemetrynamespace for a few minutes until the processing executes against thenamespace

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME TYPE ENDPOINT REGISTRY DISPLAYNAME PUBLISHER STATUS MESSAGEredhat-operators-stf appregistry httpsquayiocnr redhat-operators-stf Red Hat STF Operators Red Hat Succeeded The object has been successfully reconciled

$ oc get packagemanifests | grep Red Hat STF

smartgateway-operator Red Hat STF Operators 2m50sservicetelemetry-operator Red Hat STF Operators 2m50s

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name amq7-cert-manager namespace openshift-operatorsspec channel alpha installPlanApproval Automatic name amq7-cert-manager source redhat-operators sourceNamespace openshift-marketplaceEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

13

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-managerv100 has a phase Succeeded

238 Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator and if you plan to store events in ElasticSearch youmust enable the Elastic Cloud Kubernetes Operator

Procedure

1 Apply the following manifest to your OCP environment to enable the Elastic Cloud onKubernetes Operator

2 To verify that the ClusterServiceVersion for ElasticSearch Cloud on Kubernetes succeededenter the oc get csv command

239 Subscribing to the Service Telemetry Operator

To instantiate an STF instance create the ServiceTelemetry object to allow the Service TelemetryOperator to create the environment

Procedure

1 To create the Service Telemetry Operator subscription enter the oc apply -f command

$ oc get --namespace openshift-operators csv

NAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name elastic-cloud-eck namespace service-telemetryspec channel stable installPlanApproval Automatic name elastic-cloud-eck source operatorhubio-operators sourceNamespace openshift-marketplaceEOF

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1

Red Hat OpenStack Platform 160 Service Telemetry Framework

14

2 To validate the Service Telemetry Operator and the dependent operators enter the followingcommand

2310 Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework you must create an instance of ServiceTelemetry in OCPBy default eventsEnabled is set to false If you do not want to store events in ElasticSearch ensurethat eventsEnabled is set to false For more information see Section 232 ldquoDeploying STF to the OCPenvironment without ElasticSearchrdquo

The following core parameters are available for a ServiceTelemetry manifest

Table 21 Core parameters for a ServiceTelemetry manifest

Parameter Description Default Value

eventsEnabled Enable events support in STFRequires prerequisite steps toensure ElasticSearch can bestarted For more information seeSection 238 ldquoSubscribing to theElastic Cloud on KubernetesOperatorrdquo

false

metricsEnabled Enable metrics support in STF true

kind Subscriptionmetadata name servicetelemetry-operator namespace service-telemetryspec channel stable installPlanApproval Automatic name servicetelemetry-operator source redhat-operators-stf sourceNamespace openshift-marketplaceEOF

$ oc get csv --namespace service-telemetryNAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeededamq7-interconnect-operatorv120 Red Hat Integration - AMQ Interconnect 120 Succeededelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeededprometheusoperator0370 Prometheus Operator 0370 prometheusoperator0320 Succeededservice-telemetry-operatorv102 Service Telemetry Operator 102 service-telemetry-operatorv101 Succeededsmart-gateway-operatorv101 Smart Gateway Operator 101 smart-gateway-operatorv100 Succeeded

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

15

highAvailabilityEnabled Enable high availability in STF Formore information see Section 43ldquoHigh availabilityrdquo

false

storageEphemeralEnabled Enable ephemeral storagesupport in STF For moreinformation see Section 46ldquoEphemeral storagerdquo

false

Parameter Description Default Value

Procedure

1 To store events in ElasticSearch set eventsEnabled to true during deployment

2 To view the STF deployment logs in the Service Telemetry Operator use the oc logscommand

PLAY RECAP localhost ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3 View the pods and the status of each pod to determine that all workloads are operatingnominally

NOTE

If you set eventsEnabled true the notification Smart Gateways will Error andCrashLoopBackOff for a period of time before ElasticSearch starts

oc apply -f - ltltEOFapiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec eventsEnabled true metricsEnabled trueEOF

oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

$ oc get pods

NAME READY STATUS RESTARTS AGEalertmanager-stf-default-0 22 Running 0 26melastic-operator-645dc8b8ff-jwnzt 11 Running 0 88melasticsearch-es-default-0 11 Running 0 26minterconnect-operator-6fd49d9fb9-4bl92 11 Running 0

Red Hat OpenStack Platform 160 Service Telemetry Framework

16

24 REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality

Complete the following tasks

1 Section 241 ldquoDeleting the namespacerdquo

2 Section 242 ldquoRemoving the OperatorSourcerdquo

241 Deleting the namespace

To remove the operational resources for STF from OCP delete the namespace

Procedure

1 Run the oc delete command

2 Verify that the resources have been deleted from the namespace

242 Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again delete the OperatorSourceWhen you remove the OperatorSource PackageManifests related to STF are removed from theOperator Lifecycle Manager catalog

Procedure

1 Delete the OperatorSource

46mprometheus-operator-bf7d97fb9-kwnlx 11 Running 0 46mprometheus-stf-default-0 33 Running 0 26mservice-telemetry-operator-54f4c99d9b-k7ll6 22 Running 0 46msmart-gateway-operator-7ff58bcf94-66rvx 22 Running 0 46mstf-default-ceilometer-notification-smartgateway-6675df547q4lbj 11 Running 0 26mstf-default-collectd-notification-smartgateway-698c87fbb7-xj528 11 Running 0 26mstf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn 11 Running 0 26mstf-default-interconnect-7458fd4d69-nqbfs 11 Running 0 26m

oc delete project service-telemetry

$ oc get allNo resources found

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

17

2 Verify that the STF PackageManifests are removed from the platform If successful thefollowing command returns no result

3 If you enabled the OperatorHubio Community Catalog Source during the installationprocess and you no longer need this catalog source delete it

Additional resources

For more information about the OperatorHubio Community Catalog Source see Section 23ldquoDeploying STF to the OCP environmentrdquo

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stfoperatorsourceoperatorscoreoscom redhat-operators-stf deleted

$ oc get packagemanifests | grep Red Hat STF

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operatorscatalogsourceoperatorscoreoscom operatorhubio-operators deleted

Red Hat OpenStack Platform 160 Service Telemetry Framework

18

CHAPTER 3 COMPLETING THESERVICE TELEMETRY FRAMEWORK CONFIGURATION

31 CONNECTING RED HAT OPENSTACK PLATFORM TOSERVICE TELEMETRY FRAMEWORK

To collect metrics events or both and to send them to the Service Telemetry Framework (STF)storage domain you must configure the Red Hat OpenStack Platform overcloud to enable datacollection and transport

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes thatemploy routed L3 domains such as distributed compute node (DCN) or spine-leaf see Section 32ldquoDeploying to non-standard network topologiesrdquo

32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network you must makeconfiguration adjustments so that AMQ Interconnect can transport data to theService Telemetry Framework (STF) server instance This scenario is typical in a spine-leaf or aDCN topology For more information about DCN configuration see the Spine Leaf Networkingguide

If you use STF with Red Hat OpenStack Platform 160 and plan to monitor your Ceph Block orObject storage nodes you must make configuration changes that are similar to the configurationchanges that you make to the spine-leaf and DCN network configuration To monitor Ceph nodesuse the CephStorageExtraConfig parameter to define which network interface to load into theAMQ Interconnect and collectd configuration files

Similarly you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters ifyour environment uses Block and Object storage roles

The deployment of a spine-leaf topology involves creating roles and networks then assigningthose networks to the available roles When you configure data collection and transport for STF foran Red Hat OpenStack Platform deployment the default network for roles is InternalApi For CephBlock and Object storage roles the default network is Storage Because a spine-leaf configurationcan result in different networks being assigned to different Leaf groupings and those names aretypically unique additional configuration is required in the parameter_defaults section of theRed Hat OpenStack Platform environment files

Procedure

1 Document which networks are available for each of the Leaf roles For examples of networkname definitions see Creating a network data file in the Spine Leaf Networking guide Formore information about the creation of the Leaf groupings (roles) and assignment of thenetworks to those groupings see Creating a roles data file in the Spine Leaf Networkingguide

CephStorageExtraConfig tripleoprofilebasemetricscollectdamqp_host hiera(storage) tripleoprofilebasemetricsqdrlistener_addr hiera(storage) tripleoprofilebaseceilometeragentnotificationnotifier_host_addr hiera(storage)

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

19

2 Add the following configuration example to the ExtraConfig section for each of the leafroles In this example internal_api_subnet is the value defined in the name_lowerparameter of your network definition (with _subnet appended to the name for Leaf 0) andis the network to which the ComputeLeaf0 leaf role is connected In this case the networkidentification of 0 corresponds to the Compute role for leaf 0 and represents a value thatis different from the default internal API network nameFor the ComputeLeaf0 leaf role specify extra configuration to perform a hiera lookup todetermine which network interface for a particular network to assign to the collectd AMQPhost parameter Perform the same configuration for the AMQ Interconnect listeneraddress parameter

Additional leaf roles typically replace _subnet with _leafN where N represents a uniqueindentifier for the leaf

This example configuration is on a CephStorage leaf role

33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUDFOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud you must configure the data collectionapplications and the data transport to STF and deploy the overcloud

To configure the Red Hat OpenStack Platform overcloud complete the following tasks

1 Section 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

2 Section 332 ldquoConfiguring the STF connection for the overcloudrdquo

3 Section 333 ldquoValidating client-side installationrdquo

331 Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF you must provide theAMQ Interconnect route address in the STF connection file

Procedure

1 Log in to your Red Hat OpenShift Container Platform (OCP) environment

2 In the service-telemetry project retrieve the AMQ Interconnect route address

ComputeLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_subnet)

ComputeLeaf1ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_leaf1)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_leaf1)

CephStorageLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(storage_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(storage_subnet)

Red Hat OpenStack Platform 160 Service Telemetry Framework

20

NOTE

If your STF installation differs from the documentation ensure that youretrieve the correct AMQ Interconnect route address

332 Configuring the STF connection for the overcloud

To configure the STF connection you must create a file that contains the connection configurationof the AMQ Interconnect for the overcloud to the STF deployment Enable the collection of eventsand storage of the events in STF and deploy the overcloud

Procedure

1 Log in to the Red Hat OpenStack Platform undercloud as the stack user

2 Create a configuration file called stf-connectorsyaml in the homestack directory

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all dataingestion and data storage components for single cloud deployments Toshare the data storage domain with multiple clouds see Section 45ldquoConfiguring multiple cloudsrdquo

3 In the stf-connectorsyaml file configure the MetricsQdrConnectors address to connectthe AMQ Interconnect on the overcloud to the STF deployment

Add the CeilometerQdrPublishEvents true parameter to enable collection andtransport of Ceilometer events to STF

Replace the host parameter with the value of HOSTPORT that you retrieved inSection 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

4 Add the following files to your Red Hat OpenStack Platform director deployment to setupcollectd and AMQ Interconnect

the stf-connectorsyaml environment file

the enable-stfyaml file that ensures that the environment is being used during theovercloud deployment

$ oc get routes -ogo-template= range items printf sn spechost end | grep -5671stf-default-interconnect-5671-service-telemetryappsinfrawatch

parameter_defaults EventPipelinePublishers [] CeilometerQdrPublishEvents true MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch port 443 role edge sslProfile sslProfile verifyHostname false

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

21

the ceilometer-write-qdryaml file that ensures that Ceilometer telemetry is sent to STF

5 Deploy the Red Hat OpenStack Platform overcloud

333 Validating client-side installation

To validate data collection from the STF storage domain query the data sources for delivereddata To validate individual nodes in the Red Hat OpenStack Platform deployment connect to theconsole using SSH

Procedure

1 Log in to an overcloud node for example controller-0

2 Ensure that metrics_qdr container is running on the node

3 Return the internal network address on which AMQ Interconnect is running for example 17217144 listening on port 5666

4 Return a list of connections to the local AMQ Interconnect

openstack overcloud deploy ltother argumentsgt --templates usrshareopenstack-tripleo-heat-templates --environment-file ltother-environment-filesgt --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsmetricsceilometer-write-qdryaml --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsenable-stfyaml --environment-file homestackstf-connectorsyaml

$ sudo podman container inspect --format StateStatus metrics_qdr

running

$ sudo podman exec -it metrics_qdr cat etcqpid-dispatchqdrouterdconf

listener host 17217144 port 5666 authenticatePeer no saslMechanisms ANONYMOUS

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --connections

Connections id host container role dir security authentication tenant ============================================================================================================================================================================================================================================================================================ 1 stf-default-interconnect-5671-service-telemetryappsinfrawatch443 stf-default-

Red Hat OpenStack Platform 160 Service Telemetry Framework

22

There are four connections

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat clientThe outbound STF connection is provided to the MetricsQdrConnectors hostparameter and is the route for the STF storage domain The other hosts are internalnetwork addresses of the client connections to this AMQ Interconnect

5 To ensure that messages are being delivered list the links and view the _edge address inthe deliv column for delivery of messages

interconnect-7458fd4d69-bgzfb edge out TLSv12(DHE-RSA-AES256-GCM-SHA384) anonymous-user 12 1721714460290 openstackorgomcontainercontroller-0ceilometer-agent-notification255c02cee550f143ec9ea030db5cccba14 normal in no-security no-auth 16 1721714436408 metrics normal in no-security anonymous-user 899 1721714439500 10a2e99d-1b8a-4329-b48c-4335e5f75c84 normal in no-security no-auth

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --linksRouter Links type dir conn id id peer class addr phs cap pri undel unsett deliv presett psdrop acc rej rel mod delay rate =========================================================================================================================================================== endpoint out 1 5 local _edge 250 0 0 0 2979926 2979924 0 0 0 2 0 0 0 endpoint in 1 6 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 7 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 8 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 9 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 10 250 0 0 0 911 911 0 0 0 0 0 911 0 endpoint in 1 11 250 0 0 0 0 911 0 0 0 0 0 0 0 endpoint out 12 32 local templSY6Mcicol4J2Kp 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 16 41 250 0 0 0 2979924 2979924 0 0 0 0 0 0 0 endpoint in 912 1834 mobile $management 0 250 0 0 0 1

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

23

6 To list the addresses from Red Hat OpenStack Platform nodes to STF connect to OCP toget the AMQ Interconnect pod name and list the connections List the availableAMQ Interconnect pods

7 Connect to the pod and run the qdstat --connections command to list the knownconnections

In this example there are three edge connections from the Red Hat OpenStack Platformnodes with connection id 22 23 and 24

8 To view the number of messages delivered by the network use each address with the oc exec command

0 0 1 0 0 0 0 0 endpoint out 912 1835 local temp9Ok2resI9tmt+CT 250 0 0 0 0 0 0 0 0 0 0 0 0

$ oc get pods -l application=stf-default-interconnect

NAME READY STATUS RESTARTS AGEstf-default-interconnect-7458fd4d69-bgzfb 11 Running 0 6d21h

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 182547243852 UTCstf-default-interconnect-7458fd4d69-bgzfb

Connections id host container role dir security authentication tenant last dlv uptime ====================================================================================================================================================================================================== 1 1012922143062 rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv] normal in no-security anonymous-user 000000000 006215025 2 1013005255754 rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5] normal in no-security anonymous-user 000212557 006214936 3 1013005143110 rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6] normal in no-security anonymous-user 000213653 006214909 22 101280151948 Routerceph-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 23 101280151950 Routercompute-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 24 101280152082 Routercontroller-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000000 000220834 27 12700142202 c2f541c1-4c97-4b37-a189-a396c08fb079 normal in no-security no-auth 000000000 000000000

Red Hat OpenStack Platform 160 Service Telemetry Framework

24

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 182010293258 UTCstf-default-interconnect-7458fd4d69-bgzfb

Router Addresses class addr phs distrib pri local remote in out thru fallback ==================================================================================================================== mobile anycastceilometereventsample 0 balanced - 1 0 1553 1553 0 0 mobile collectdnotify 0 multicast - 1 0 10 10 0 0 mobile collectdtelemetry 0 multicast - 1 0 7798049 7798049 0 0

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

25

CHAPTER 4 ADVANCED FEATURESThe following optional features can provide additional functionality to theService Telemetry Framework (STF)

Customizing the deployment For more information see Section 41 ldquoCustomizing thedeploymentrdquo

Alerts For more information see Section 42 ldquoAlertsrdquo

High availability For more information see Section 43 ldquoHigh availabilityrdquo

Dashboards For more information see Section 44 ldquoDashboardsrdquo

Multiple clouds For more information see Section 45 ldquoConfiguring multiple cloudsrdquo

Ephemeral storage For more information see Section 46 ldquoEphemeral storagerdquo

41 CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load intoRed Hat OpenShift Container Platform (OCP) The Operator then creates other objects in memorywhich results in the dependent Operators creating the workloads they are responsible formanaging

WARNING

When you override the manifest you must provide the entire manifest contentsincluding object names or namespaces There is no dynamic parametersubstitution when you override a manifest

To override a manifest successfully with Service Telemetry Framework (STF) deploy a defaultenvironment using the core options only For more information about the core options seeSection 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo When you deploy STF use the oc getcommand to retrieve the default deployed manifest When you use a manifest that was originallygenerated by Service Telemetry Operator the manifest is compatible with the other objects thatare managed by the Operators

For example when the metricsEnabled true parameter is configured in the ServiceTelemetrymanifest the Service Telemetry Operator requests components for metrics retrieval and storageusing the default manifests In some cases you might want to override the default manifest Formore information see Section 411 ldquoManifest override parametersrdquo

411 Manifest override parameters

This table describes the available parameters that you can use to override a manifest along withthe corresponding retrieval commands

Table 41 Manifest override parameters

Red Hat OpenStack Platform 160 Service Telemetry Framework

26

Override parameter Description Retrieval command

alertmanagerManifest Override the Alertmanagerobject creation The PrometheusOperator watches for Alertmanager objects

oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest Override the Secret thatcontains the Alertmanagerconfiguration The PrometheusOperator uses a secret named alertmanager- alertmanager-name forexample stf-default to providethe alertmanageryamlconfiguration to Alertmanager

oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest Override the ElasticSearchobject creation The Elastic Cloudon Kuberneters Operator watchesfor ElasticSearch objects

oc get elasticsearch elasticsearch -oyaml

interconnectManifest Override the Interconnectobject creation The AMQInterconnect Operator watchesfor Interconnect objects

oc get interconnect stf-default-interconnect -oyaml

prometheusManifest Override the Prometheusobject creation The PrometheusOperator watches for Prometheus objects

oc get prometheus stf-default -oyaml

servicemonitorManifest Override the ServiceMonitorobject creation The PrometheusOperator watches for ServiceMonitor objects

oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest

Override the SmartGatewayobject creation for collectdmetrics The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest

Override the SmartGatewayobject creation for collectdevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-notification -oyaml

CHAPTER 4 ADVANCED FEATURES

27

smartgatewayCeilometerEventsManifest

Override the SmartGatewayobject creation for Ceilometerevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-ceilometer-notification -oyaml

Override parameter Description Retrieval command

412 Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest For a list of availablemanifest override parameters see Section 41 ldquoCustomizing the deploymentrdquo The default ServiceTelemetry object is stf-default Use oc get servicetelemetry to list the available STFdeployments

TIP

The oc edit command loads the default system editor To override the default editor pass or setthe environment variable EDITOR to the preferred editor For example EDITOR=nano oc edit servicetelemetry stf-default

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Load the ServiceTelemetry object into an editor

4 To modify the ServiceTelemetry object provide a manifest override parameter and thecontents of the manifest to write to OCP instead of the defaults provided by STF

NOTE

The trailing pipe (|) after entering the manifest override parameter indicatesthat the value provided is multi-line

oc project service-telemetry

oc edit servicetelemetry stf-default

$ oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata annotations kubectlkubernetesiolast-applied-configuration | apiVersioninfrawatchv1alpha1kindServiceTelemetrymetadataannotationsnamestf-defaultnamespaceservice-telemetryspecmetricsEnabledtrue creationTimestamp 2020-04-14T202942Z

Red Hat OpenStack Platform 160 Service Telemetry Framework

28

1

2

Manifest override parameter is defined in the spec of the ServiceTelemetry object

End of the manifest override content

5 Save and close

42 ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager Alert rules in Prometheusservers send alerts to an Alertmanager which manages the alerts Alertmanager can silenceinhibit or aggregate alerts and send notifications using email on-call notification systems or chatplatforms

To create an alert complete the following tasks

1 Create an alert rule in Prometheus For more information see Section 421 ldquoCreating analert rule in Prometheusrdquo

generation 1 name stf-default namespace service-telemetry resourceVersion 1949423 selfLink apisinfrawatchv1alpha1namespacesservice-telemetryservicetelemetrysstf-default uid d058bc41-1bb0-49f5-9a8b-642f4b8adb95spec metricsEnabled true smartgatewayCollectdMetricsManifest | 1 apiVersion smartgatewayinfrawatchv2alpha1 kind SmartGateway metadata name stf-default-collectd-telemetry namespace service-telemetry spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdtelemetry debug true prefetch 15000 serviceType metrics size 1 useTimestamp true 2status conditions - ansibleResult changed 0 completion 2020-04-14T203219079508 failures 0 ok 52 skipped 1 lastTransitionTime 2020-04-14T202959Z message Awaiting next reconciliation reason Successful status True type Running

CHAPTER 4 ADVANCED FEATURES

29

2 Create an alert route in Alertmanager For more information see Section 423 ldquoCreating analert route in Alertmanagerrdquo

Additional resourcesFor more information about alerts or notifications with Prometheus and Alertmanager seehttpsprometheusiodocsalertingoverview

To view an example set of alerts that you can use with Service Telemetry Framework (STF) seehttpsgithubcominfrawatchservice-telemetry-operatortreemasterdeployalerts

421 Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications If the rule condition returns an emptyresult set the condition is false Otherwise the rule is true and it triggers an alert

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Create a PrometheusRule object that contains the alert rule The Prometheus Operatorloads the rule into Prometheus

4 To verify that the rules have been loaded into Prometheus by the Operator create a podwith access to curl

5 Run curl to access the prometheus-operated service to return the rules loaded intomemory

oc project service-telemetry

oc apply -f - ltltEOFapiVersion monitoringcoreoscomv1kind PrometheusRulemetadata creationTimestamp null labels prometheus stf-default role alert-rules name prometheus-alarm-rules namespace service-telemetryspec groups - name openstackrules rules - alert Metric Listener down expr collectd_qpid_router_status lt 1 To change the rule edit the value of the expr parameterEOF

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl prometheus-operated9090apiv1rulesstatussuccessdatagroups

Red Hat OpenStack Platform 160 Service Telemetry Framework

30

6 To verify that the output shows the rules loaded into the PrometheusRule object forexample the output contains the defined openstackrules exit from the pod

7 Clean up the environment by deleting the curl pod

Additional resourcesFor more information on alerting see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

422 Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 421ldquoCreating an alert rule in Prometheusrdquo

Procedure

1 Use the oc edit command

2 Edit the PrometheusRules manifest

3 Save and close

Additional resourcesFor more information about configuring alerting rules seehttpsprometheusiodocsprometheuslatestconfigurationalerting_rules

For more information about PrometheusRules objects seehttpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

423 Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system such as email IRC or other notificationchannel The Prometheus Operator manages the Alertmanager configuration as anRed Hat OpenShift Container Platform (OCP) secret STF by default deploys a basic configurationthat results in no receivers

[nameopenstackrulesfileetcprometheusrulesprometheus-stf-default-rulefiles-0service-telemetry-prometheus-alarm-rulesyamlrules[nameMetric Listener downquerycollectd_qpid_router_status u003c 1duration0labelsannotationsalerts[]healthoktypealerting]interval30]

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

oc edit prometheusrules prometheus-alarm-rules

alertmanageryaml |- global resolve_timeout 5m route

CHAPTER 4 ADVANCED FEATURES

31

To deploy a custom Alertmanager route with STF an alertmanagerConfigManifest parameter mustbe passed to the Service Telemetry Operator that results in an updated secret managed by thePrometheus Operator For more information see Section 412 ldquoOverriding a managed manifestrdquo

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object for your STF deployment

4 Add a new parameter alertmanagerConfigManifest and the Secret object contents todefine the alertmanageryaml configuration for Alertmanager

NOTE

This loads the default template that is already managed by ServiceTelemetry Operator To validate the changes are populating correctlychange a value return the alertmanager-stf-default secret and verify thatthe new value is loaded into memory for example changing the value globalresolve_timeout from 5m to 10m

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

oc project service-telemetry

oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec metricsEnabled true alertmanagerConfigManifest | apiVersion v1 kind Secret metadata name alertmanager-stf-default namespace service-telemetry type Opaque stringData alertmanageryaml |- global resolve_timeout 10m route

Red Hat OpenStack Platform 160 Service Telemetry Framework

32

5 Verify that the configuration was applied to the secret

6 To verify the configuration has been loaded into Alertmanager create a pod with access to curl

7 Run curl against the alertmanager-operated service to retrieve the status and configYAMLcontents and review the supplied configuration matches the configuration loaded intoAlertmanager

8 Verify that the configYAML field contains the expected changes Exit from the pod

9 To clean up the environment delete the curl pod

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

$ oc get secret alertmanager-stf-default -o go-template=index data alertmanageryaml | base64decode

global resolve_timeout 10mroute group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver nullreceivers- name null

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl alertmanager-operated9093apiv1status

statussuccessdataconfigYAMLglobaln resolve_timeout 10mn http_config n smtp_hello localhostn smtp_require_tls truen pagerduty_url httpseventspagerdutycomv2enqueuen hipchat_api_url httpsapihipchatcomn opsgenie_api_url httpsapiopsgeniecomn wechat_api_url httpsqyapiweixinqqcomcgi-binn victorops_api_url httpsalertvictoropscomintegrationsgeneric20131114alertnrouten receiver nulln group_byn - jobn group_wait 30sn group_interval 5mn repeat_interval 12hnreceiversn- name nullntemplates []n

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

CHAPTER 4 ADVANCED FEATURES

33

Additional resourcesFor more information about the Red Hat OpenShift Container Platform secret and the Prometheusoperator see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

43 HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover fromfailures in its component services Although Red Hat OpenShift Container Platform (OCP) restartsa failed pod if nodes are available to schedule the workload this recovery process might take morethan one minute during which time events and metrics are lost A high availability configurationincludes multiple copies of STF components reducing recovery time to approximately 2 secondsTo protect against failure of an OCP node deploy STF to an OCP cluster with three or more nodes

NOTE

STF is not yet a fully fault tolerant system Delivery of metrics and events during therecovery period is not guaranteed

Enabling high availability has the following effects

Two AMQ Interconnect pods run instead of the default 1

Three ElasticSearch pods run instead of the default 1

Recovery time from a lost pod in either of these services reduces to approximately 2seconds

431 Configuring high availability

To configure STF for high availability add highAvailabilityEnabled true to the ServiceTelemetryobject in OCP You can this set this parameter at installation time or if you already deployed STFcomplete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Use the oc command to edit the ServiceTelemetry object

4 Add highAvailabilityEnabled true to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry

spec eventsEnabled true metricsEnabled true highAvailabilityEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

34

44 DASHBOARDS

Use third-party application Grafana to visualize system-level metrics gathered by collectd for eachindividual host node For more information about configuring collectd see Section 33ldquoConfiguring Red Hat OpenStack Platform overcloud for Service Telemetry Frameworkrdquo

441 Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment so you mustdeploy the Grafana Operator from OperatorHubio

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Clone the dashboard repository

4 Deploy the Grafana operator

5 To verify that the operator launched successfully run the oc get csv command If the valueof the PHASE column is Succeeded the operator launched successfully

6 Launch a Grafana instance

7 Verify that the Grafana instance deployed

8 Create the datasource and dashboard resources

oc project service-telemetry

git clone httpsgithubcominfrawatchdashboardscd dashboards

oc create -f deploysubscriptionyaml

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEgrafana-operatorv320 Grafana Operator 320 Succeeded

$ oc create -f deploygrafanayaml

$ oc get pod -l app=grafana

NAME READY STATUS RESTARTS AGEgrafana-deployment-7fc7848b56-sbkhv 11 Running 0 1m

CHAPTER 4 ADVANCED FEATURES

35

9 Verify that the resources installed correctly

10 Navigate to httpsltgrafana-route-addressgt in a web browser Use the oc get routescommand to retrieve the Grafana route address

11 To view the dashboard click Dashboards and Manage

Additional resources

For more information about enabling the OperatorHubio catalog source see Section 235ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

4411 Viewing and editing queries

Procedure

1 Log in to Red Hat OpenShift Container Platform To view and edit queries log in as the admin user

2 Change to the service-telemetry namespace

3 To retrieve the default username and password describe the Grafana object using oc describe

442 The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time Select a node from theupper left corner of the dashboard

4421 Top panels

Title Unit Description

oc create -f deploydatasourceyaml -f deployrhos-dashboardyaml

$ oc get grafanadashboards

NAME AGErhos-dashboard 7d21h

$ oc get grafanadatasourcesNAME AGEservice-telemetry-grafanadatasource 1m

oc get routes

oc project service-telemetry

oc describe grafana service-telemetry-grafana

Red Hat OpenStack Platform 160 Service Telemetry Framework

36

Current Global Alerts - Current alerts fired byPrometheus

Recent Global Alerts - Recently fired alerts in 5m timesteps

Status Panel - Node status up downunavailable

Uptime smhdMY Total operational time of node

CPU Cores cores Total number of cores

Memory bytes Total memory

Disk Size bytes Total storage size

Processes processes Total number of processes listedby type

Load Average processes Load average represents theaverage number of running anduninterruptible processes residingin the kernel execution queue

4422 Networking panels

Panels that display the network interfaces of the node

Panel Unit Description

Physical Interfaces Ingress Errors errors Total errors with incoming data

Physical Interfaces Egress Errors errors Total errors with outgoing data

Physical Interfaces Ingress ErrorRates

errorss Rate of incoming data errors

Physical Interfaces egress ErrorRates

errorss Rate of outgoing data errors

Physical Interfaces PacketsIngress pps Incoming packets persecond

Physical Interfaces PacketsEgress

pps

Outgoing packets per second Physical Interfaces Data Ingress bytess

CHAPTER 4 ADVANCED FEATURES

37

Incoming data rates Physical Interfaces Data Egress bytess

Outgoing data rates Physical Interfaces Drop RateIngress

pps

Incoming packets drop rate Physical Interfaces Drop RateEgress

pps

4423 CPU panels

Panels that display CPU usage of the node

Panel Unit Description

Current CPU Usage percent Instantaneous usage at the timeof the last query

Aggregate CPU Usage percent Average non-idle CPU activity ofall cores on a node

Aggr CPU Usage by Type percent Shows time spent for each type ofthread averaged across all cores

4424 Memory panels

Panels that display memory usage on the node

Panel Unit Description

Memory Used percent Amount of memory being used attime of last query

Huge Pages Used hugepages Number of hugepages beingused

Memory

4425 Diskfile system

Panels that display space used on disk

Panel Unit Description Notes

Disk Space Usage percent Total disk use at time oflast query

Red Hat OpenStack Platform 160 Service Telemetry Framework

38

Inode Usage percent Total inode use at timeof last query

Aggregate Disk SpaceUsage

bytes Total disk space usedand reserved

Because this query relieson the df plugintemporary file systemsthat do not necessarilyuse disk space areincluded in the resultsThe query tries to filterout most of these but itmight not be exhaustive

Disk Traffic bytess Shows rates for bothreading and writing

Disk Load percent Approximatepercentage of total diskbandwidth being usedThe weighted IO timeseries includes thebacklog that might beaccumulating For moreinformation see thecollectd disk plugin docs

Operationss opss Operations done persecond

Average IO OperationTime

seconds Average time each IOoperation took tocomplete This averageis not accurate see thecollectd disk plugin docs

Panel Unit Description Notes

45 CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance ofService Telemetry Framework (STF)

1 Plan the AMQP address prefixes that you want to use for each cloud For more informationsee Section 451 ldquoPlanning AMQP address prefixesrdquo

2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on thecorresponding address prefixes For more information see Section 452 ldquoDeploying SmartGatewaysrdquo

3 Configure each cloud to send its metrics and events to STF on the correct address Formore information see Section 453 ldquoCreating the OpenStack environment filerdquo

CHAPTER 4 ADVANCED FEATURES

39

Figure 41 Two Red Hat OpenStack Platform clouds connect to STF

451 Planning AMQP address prefixes

By default Red Hat OpenStack Platform nodes get data through two data collectors collectd andCeilometer These components send telemetry data or notifications to the respective AMQPaddresses for example collectdtelemetry where STF Smart Gateways listen on those addressesfor monitoring data

To support multiple clouds and to identify which cloud generated the monitoring data configureeach cloud to send data to a unique address Prefix a cloud identifier to the second part of theaddress The following list shows some example addresses and identifiers

collectdcloud1-telemetry

collectdcloud1-notify

anycastceilometercloud1-eventsample

collectdcloud2-telemetry

collectdcloud2-notify

Red Hat OpenStack Platform 160 Service Telemetry Framework

40

anycastceilometercloud2-eventsample

collectdus-east-1-telemetry

collectdus-west-3-telemetry

452 Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud one forcollectd metrics one for collectd events and one for Ceilometer events Configure each of theSmart Gateways to listen on the AMQP address that you define for the corresponding cloud

When you deploy STF for the first time Smart Gateway manifests are created that define the initialSmart Gateways for a single cloud When deploying Smart Gateways for multiple cloud supportyou deploy multiple Smart Gateways for each of the data collection types that handle the metricsand the events data for each cloud The initial Smart Gateways act as a template to createadditional Smart Gateways along with any authentication information required to connect to thedata stores

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

oc project service-telemetry

3 Use the initially deployed Smart Gateways as a template for additional Smart GatewaysList the currently deployed Smart Gateways with the oc get smartgateways command Forexample if you deployed STF with metricsEnabled true and eventsEnabled true thefollowing Smart Gateways are displayed in the output

$ oc get smartgateways

NAME AGEstf-default-ceilometer-notification 14dstf-default-collectd-notification 14dstf-default-collectd-telemetry 14d

4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary filewhich you can modify later and use to create the new set of Smart Gateways

truncate --size 0 tmpcloud1-smartgatewaysyaml ampamp for sg in $(oc get smartgateways -oname)do echo --- gtgt tmpcloud1-smartgatewaysyaml oc get $sg -oyaml --export gtgt tmpcloud1-smartgatewaysyamldone

5 Modify the Smart Gateway manifest in the tmpcloud1-smartgatewaysyaml file Adjustthe metadataname and specamqpUrl fields to include the cloud identifier from yourschema For more information see ] To view example Smart Gateway manifests seeltltexample-manifests_advanced-features[

6 Deploy your new Smart Gateways

CHAPTER 4 ADVANCED FEATURES

41

oc apply -f tmpcloud1-smartgatewaysyaml

7 Verify that each Smart Gateway is running This can take several minutes depending on thenumber of Smart Gateways

oc get po -l app=smart-gateway

4521 Example manifests

IMPORTANT

The content in the following examples might be different to the file content in yourdeployment Copy the manifests in your deployment

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that youwant to use for your clouds For more information see Section 451 ldquoPlanning AMQP addressprefixesrdquo

NOTE

Your output may have some additional metadata parameters that you can removefrom the manifests you that load into OCP

apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-ceilometer-notification-cloud1 1spec amqpDataSource ceilometer amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672anycastceilometercloud1-eventsample 2 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-notification-cloud1 3spec amqpDataSource collectd amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-

Red Hat OpenStack Platform 160 Service Telemetry Framework

42

1

2

3

4

5

6

Name for Ceilometer notifications for cloud1

AMQP Address for Ceilometer notifications for cloud1

Name for collectd telemetry for cloud1

AMQP Address for collectd telemetry for cloud1

Name for collectd notifications for cloud1

AMQP Address for collectd notifications for cloud1

453 Creating the OpenStack environment file

To label traffic according to the cloud of origin you must create a configuration with cloud-specificinstance names Create an stf-connectorsyaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefixscheme For more information see Section 451 ldquoPlanning AMQP address prefixesrdquo

notify 4 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-telemetry-cloud1 5spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-telemetry 6 debug false prefetch 15000 serviceType metrics size 1 useTimestamp true

CHAPTER 4 ADVANCED FEATURES

43

WARNING

Remove enable-stfyaml and ceilometer-write-qdryaml environment filereferences from your overcloud deployment This configuration is redundantand results in duplicate information being sent from each cloud node

Procedure

1 Create the stf-connectorsyaml file and modify it to match the AMQP address that youwant for this cloud deployment

resource_registry OSTripleOServicesCollectd usrshareopenstack-tripleo-heat-templatesdeploymentmetricscollectd-container-puppetyaml OSTripleOServicesMetricsQdr usrshareopenstack-tripleo-heat-templatesdeploymentmetricsqdr-container-puppetyaml OSTripleOServicesCeilometerAgentCentral usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-central-container-puppetyaml OSTripleOServicesCeilometerAgentNotification usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-notification-container-puppetyaml OSTripleOServicesCeilometerAgentIpmi usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-ipmi-container-puppetyaml OSTripleOServicesComputeCeilometerAgent usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-compute-container-puppetyaml OSTripleOServicesRedis usrshareopenstack-tripleo-heat-templatesdeploymentdatabaseredis-pacemaker-puppetyaml

parameter_defaults EnableSTF true

EventPipelinePublishers [] CeilometerEnablePanko false CeilometerQdrPublishEvents true CeilometerQdrEventsConfig driver amqp topic cloud1-event 1

CollectdConnectionType amqp1 CollectdAmqpInterval 5 CollectdDefaultPollingInterval 5

CollectdAmqpInstances cloud1-notify 2 notify true format JSON presettle false cloud1-telemetry 3 format JSON presettle true

MetricsQdrAddresses

Red Hat OpenStack Platform 160 Service Telemetry Framework

44

+ lt1gt Define the topic for Ceilometer events This value is the address format of anycastceilometercloud1-eventsample lt2gt Define the topic for collectd events This value is theformat of collectdcloud1-notify lt3gt Define the topic for collectd metrics This value is the formatof collectdcloud1-telemetry lt4gt Adjust the MetricsQdrConnectors host to the address of the STFroute

1 Ensure that the naming convention in the stf-connectorsyaml file aligns with the specamqpUrl field in the Smart Gateway configuration For example configure the CeilometerQdrEventsConfigtopic field to a value of cloud1-event

2 Save the file in a directory for custom environment files for example homestackcustom_templates

3 Source the authentication file

4 Include the stf-connectorsyaml file in the overcloud deployment command along with anyother environment files relevant to your environment

Additional resourcesFor information about validating the deployment see Section 333 ldquoValidating client-sideinstallationrdquo

454 Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it wasscraped from You can use this label to query data from a specific cloud

To query data from a specific cloud use a Prometheus promql query that matches the associatedservice label for example collectd_uptimeservice=stf-default-collectd-telemetry-cloud1-smartgateway

- prefix collectd distribution multicast - prefix anycastceilometer distribution multicast

MetricsQdrSSLProfiles - name sslProfile

MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch 4 port 443 role edge verifyHostname false sslProfile sslProfile

[stackundercloud-0 ~]$ source stackrc

(undercloud) [stackundercloud-0 ~]$

(undercloud) [stackundercloud-0 ~]$ openstack overcloud deploy --templates usrshareopenstack-tripleo-heat-templates -e homestackcustom_templatesstf-connectorsyaml

CHAPTER 4 ADVANCED FEATURES

45

46 EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storingdata in your Red Hat OpenShift Container Platform (OCP) cluster Ephemeral storage is notrecommended in a production environment due to the volatility of the data in the platform whenoperating correctly and as designed For example restarting a pod or rescheduling the workload toanother node results in the loss of any local data written since the pod started

If you enable ephemeral storage in STF the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests

461 Configuring ephemeral storage

To configure STF for ephemeral storage add storageEphemeralEnabled true to theServiceTelemetry object in OCP You can add storageEphemeralEnabled true at installation timeor if you already deployed STF complete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object

4 Add the storageEphemeralEnabled true parameter to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry stf-default

spec eventsEnabled true metricsEnabled true storageEphemeralEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

46

APPENDIX A COLLECTD PLUG-INSThis section contains a complete list of collectd plug-ins and configurations forRed Hat OpenStack Platform 160

collectd-aggregation

collectdpluginaggregationaggregators

collectdpluginaggregationinterval

collectd-amqp1

collectd-apache

collectdpluginapacheinstances (ex localhost rArr url rArr httplocalhostmod_statusauto)

collectdpluginapacheinterval

collectd-apcups

collectd-battery

collectdpluginbatteryvalues_percentage

collectdpluginbatteryreport_degraded

collectdpluginbatteryquery_state_fs

collectdpluginbatteryinterval

collectd-ceph

collectdplugincephdaemons

collectdplugincephlongrunavglatency

collectdplugincephconvertspecialmetrictypes

collectd-cgroups

collectdplugincgroupsignore_selected

collectdplugincgroupsinterval

collectd-conntrack

None

collectd-contextswitch

collectdplugincontextswitchinterval

collectd-cpu

APPENDIX A COLLECTD PLUG-INS

47

collectdplugincpureportbystate

collectdplugincpureportbycpu

collectdplugincpuvaluespercentage

collectdplugincpureportnumcpu

collectdplugincpureportgueststate

collectdplugincpusubtractgueststate

collectdplugincpuinterval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectdplugincsvdatadir

collectdplugincsvstorerates

collectdplugincsvinterval

collectd-df

collectdplugindfdevices

collectdplugindffstypes

collectdplugindfignoreselected

collectdplugindfmountpoints

collectdplugindfreportbydevice

collectdplugindfreportinodes

collectdplugindfreportreserved

collectdplugindfvaluesabsolute

collectdplugindfvaluespercentage

collectdplugindfinterval

collectd-disk

collectdplugindiskdisks

collectdplugindiskignoreselected

collectdplugindiskudevnameattr

Red Hat OpenStack Platform 160 Service Telemetry Framework

48

collectdplugindiskinterval

collectd-entropy

collectdpluginentropyinterval

collectd-ethstat

collectdpluginethstatinterfaces

collectdpluginethstatmaps

collectdpluginethstatmappedonly

collectdpluginethstatinterval

collectd-exec

collectdpluginexeccommands

collectdpluginexeccommands_defaults

collectdpluginexecglobals

collectdpluginexecinterval

collectd-fhcount

collectdpluginfhcountvaluesabsolute

collectdpluginfhcountvaluespercentage

collectdpluginfhcountinterval

collectd-filecount

collectdpluginfilecountdirectories

collectdpluginfilecountinterval

collectd-fscache

None

collectd-hddtemp

collectdpluginhddtemphost

collectdpluginhddtempport

collectdpluginhddtempinterval

collectd-hugepages

collectdpluginhugepagesreport_per_node_hp

APPENDIX A COLLECTD PLUG-INS

49

collectdpluginhugepagesreport_root_hp

collectdpluginhugepagesvalues_pages

collectdpluginhugepagesvalues_bytes

collectdpluginhugepagesvalues_percentage

collectdpluginhugepagesinterval

collectd-intel_rdt

collectd-interface

collectdplugininterfaceinterfaces

collectdplugininterfaceignoreselected

collectdplugininterfacereportinactive

Collectdplugininterfaceinterval

collectd-ipc

None

collectd-ipmi

collectdpluginipmiignore_selected

collectdpluginipminotify_sensor_add

collectdpluginipminotify_sensor_remove

collectdpluginipminotify_sensor_not_present

collectdpluginipmisensors

collectdpluginipmiinterval

collectd-irq

collectdpluginirqirqs

collectdpluginirqignoreselected

collectdpluginirqinterval

collectd-load

collectdpluginloadreport_relative

collectdpluginloadinterval

collectd-logfile

collectdpluginlogfilelog_level

Red Hat OpenStack Platform 160 Service Telemetry Framework

50

collectdpluginlogfilelog_level

collectdpluginlogfilelog_file

collectdpluginlogfilelog_timestamp

collectdpluginlogfileprint_severity

collectdpluginlogfileinterval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectdpluginmemcachedinstances

collectdpluginmemcachedinterval

collectd-memory

collectdpluginmemoryvaluesabsolute

collectdpluginmemoryvaluespercentage

collectdpluginmemoryinterval collectd-multimeter

collectd-multimeter

collectd-mysql

collectdpluginmysqlinterval

collectd-netlink

collectdpluginnetlinkinterfaces

collectdpluginnetlinkverboseinterfaces

collectdpluginnetlinkqdiscs

collectdpluginnetlinkclasses

collectdpluginnetlinkfilters

collectdpluginnetlinkignoreselected

collectdpluginnetlinkinterval

collectd-network

collectdpluginnetworktimetolive

collectdpluginnetworkmaxpacketsize

APPENDIX A COLLECTD PLUG-INS

51

collectdpluginnetworkforward

collectdpluginnetworkreportstats

collectdpluginnetworklisteners

collectdpluginnetworkservers

collectdpluginnetworkinterval

collectd-nfs

collectdpluginnfsinterval

collectd-ntpd

collectdpluginntpdhost

collectdpluginntpdport

collectdpluginntpdreverselookups

collectdpluginntpdincludeunitid

collectdpluginntpdinterval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectdpluginopenvpnstatusfile

collectdpluginopenvpnimprovednamingschema

collectdpluginopenvpncollectcompression

collectdpluginopenvpncollectindividualusers

collectdpluginopenvpncollectusercount

collectdpluginopenvpninterval

collectd-ovs_events

collectdpluginovs_eventsaddress

collectdpluginovs_eventsdispatch

collectdpluginovs_eventsinterfaces

collectdpluginovs_eventssend_notification

Red Hat OpenStack Platform 160 Service Telemetry Framework

52

collectdpluginovs_events$port

collectdpluginovs_eventssocket

collectd-ovs_stats

collectdpluginovs_statsaddress

collectdpluginovs_statsbridges

collectdpluginovs_statsport

collectdpluginovs_statssocket

collectd-ping

collectdpluginpinghosts

collectdpluginpingtimeout

collectdpluginpingttl

collectdpluginpingsource_address

collectdpluginpingdevice

collectdpluginpingmax_missed

collectdpluginpingsize

collectdpluginpinginterval

collectd-powerdns

collectdpluginpowerdnsinterval

collectdpluginpowerdnsservers

collectdpluginpowerdnsrecursors

collectdpluginpowerdnslocal_socket

collectdpluginpowerdnsinterval

collectd-processes

collectdpluginprocessesprocesses

collectdpluginprocessesprocess_matches

collectdpluginprocessescollect_context_switch

collectdpluginprocessescollect_file_descriptor

collectdpluginprocessescollect_memory_maps

collectdpluginpowerdnsinterval

APPENDIX A COLLECTD PLUG-INS

53

collectd-protocols

collectdpluginprotocolsignoreselected

collectdpluginprotocolsvalues

collectd-python

collectd-serial

collectd-smart

collectdpluginsmartdisks

collectdpluginsmartignoreselected

collectdpluginsmartinterval

collectd-snmp_agent

collectd-statsd

collectdpluginstatsdhost

collectdpluginstatsdport

collectdpluginstatsddeletecounters

collectdpluginstatsddeletetimers

collectdpluginstatsddeletegauges

collectdpluginstatsddeletesets

collectdpluginstatsdcountersum

collectdpluginstatsdtimerpercentile

collectdpluginstatsdtimerlower

collectdpluginstatsdtimerupper

collectdpluginstatsdtimersum

collectdpluginstatsdtimercount

collectdpluginstatsdinterval

collectd-swap

collectdpluginswapreportbydevice

collectdpluginswapreportbytes

collectdpluginswapvaluesabsolute

Red Hat OpenStack Platform 160 Service Telemetry Framework

54

collectdpluginswapvaluespercentage

collectdpluginswapreportio

collectdpluginswapinterval

collectd-syslog

collectdpluginsysloglog_level

collectdpluginsyslognotify_level

collectdpluginsysloginterval

collectd-table

collectdplugintabletables

collectdplugintableinterval

collectd-tail

collectdplugintailfiles

collectdplugintailinterval

collectd-tail_csv

collectdplugintail_csvmetrics

collectdplugintail_csvfiles

collectd-tcpconns

collectdplugintcpconnslocalports

collectdplugintcpconnsremoteports

collectdplugintcpconnslistening

collectdplugintcpconnsallportssummary

collectdplugintcpconnsinterval

collectd-ted

collectd-thermal

collectdpluginthermaldevices

collectdpluginthermalignoreselected

collectdpluginthermalinterval

collectd-threshold

APPENDIX A COLLECTD PLUG-INS

55

collectdpluginthresholdtypes

collectdpluginthresholdplugins

collectdpluginthresholdhosts

collectdpluginthresholdinterval

collectd-turbostat

collectdpluginturbostatcore_c_states

collectdpluginturbostatpackage_c_states

collectdpluginturbostatsystem_management_interrupt

collectdpluginturbostatdigital_temperature_sensor

collectdpluginturbostattcc_activation_temp

collectdpluginturbostatrunning_average_power_limit

collectdpluginturbostatlogical_core_names

collectd-unixsock

collectd-uptime

collectdpluginuptimeinterval

collectd-users

collectdpluginusersinterval

collectd-uuid

collectdpluginuuiduuid_file

collectdpluginuuidinterval

collectd-virt

collectdpluginvirtconnection

collectdpluginvirtrefresh_interval

collectdpluginvirtdomain

collectdpluginvirtblock_device

collectdpluginvirtinterface_device

collectdpluginvirtignore_selected

collectdpluginvirthostname_format

Red Hat OpenStack Platform 160 Service Telemetry Framework

56

collectdpluginvirtinterface_format

collectdpluginvirtextra_stats

collectdpluginvirtinterval

collectd-vmem

collectdpluginvmemverbose

collectdpluginvmeminterval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectdpluginwrite_graphitecarbons

collectdpluginwrite_graphitecarbon_defaults

collectdpluginwrite_graphiteglobals

collectd-write_kafka

collectdpluginwrite_kafkakafka_host

collectdpluginwrite_kafkakafka_port

collectdpluginwrite_kafkakafka_hosts

collectdpluginwrite_kafkatopics

collectd-write_log

collectdpluginwrite_logformat

collectd-zfs_arc

None

APPENDIX A COLLECTD PLUG-INS

57

  • Table of Contents
  • CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
    • 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
    • 12 INSTALLATION SIZE
      • CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
        • 21 THE CORE COMPONENTS OF STF
        • 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
          • 221 Persistent volumes
            • 2211 Using ephemeral storage
              • 222 Resource allocation
              • 223 Node tuning operator
                • 23 DEPLOYING STF TO THE OCP ENVIRONMENT
                  • 231 Deploying STF to the OCP environment with ElasticSearch
                  • 232 Deploying STF to the OCP environment without ElasticSearch
                  • 233 Creating a namespace
                  • 234 Creating an OperatorGroup
                  • 235 Enabling the OperatorHubio Community Catalog Source
                  • 236 Enabling Red Hat STF Operator Source
                  • 237 Subscribing to the AMQ Certificate Manager Operator
                  • 238 Subscribing to the Elastic Cloud on Kubernetes Operator
                  • 239 Subscribing to the Service Telemetry Operator
                  • 2310 Creating a ServiceTelemetry object in OCP
                    • 24 REMOVING STF FROM THE OCP ENVIRONMENT
                      • 241 Deleting the namespace
                      • 242 Removing the OperatorSource
                          • CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
                            • 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
                            • 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
                            • 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
                              • 331 Retrieving the AMQ Interconnect route address
                              • 332 Configuring the STF connection for the overcloud
                              • 333 Validating client-side installation
                                  • CHAPTER 4 ADVANCED FEATURES
                                    • 41 CUSTOMIZING THE DEPLOYMENT
                                      • 411 Manifest override parameters
                                      • 412 Overriding a managed manifest
                                        • Procedure
                                            • 42 ALERTS
                                              • Additional resources
                                              • 421 Creating an alert rule in Prometheus
                                                • Procedure
                                                • Additional resources
                                                  • 422 Configuring custom alerts
                                                    • Procedure
                                                    • Additional resources
                                                      • 423 Creating an alert route in Alertmanager
                                                        • Procedure
                                                        • Additional resources
                                                            • 43 HIGH AVAILABILITY
                                                              • 431 Configuring high availability
                                                                • Procedure
                                                                    • 44 DASHBOARDS
                                                                      • 441 Setting up Grafana to host the dashboard
                                                                        • 4411 Viewing and editing queries
                                                                          • 442 The Grafana infrastructure dashboard
                                                                            • 4421 Top panels
                                                                            • 4422 Networking panels
                                                                            • 4423 CPU panels
                                                                            • 4424 Memory panels
                                                                            • 4425 Diskfile system
                                                                                • 45 CONFIGURING MULTIPLE CLOUDS
                                                                                  • 451 Planning AMQP address prefixes
                                                                                  • 452 Deploying Smart Gateways
                                                                                    • Procedure
                                                                                    • 4521 Example manifests
                                                                                      • 453 Creating the OpenStack environment file
                                                                                        • Procedure
                                                                                        • Additional resources
                                                                                          • 454 Querying metrics data from multiple clouds
                                                                                            • 46 EPHEMERAL STORAGE
                                                                                              • 461 Configuring ephemeral storage
                                                                                                  • APPENDIX A COLLECTD PLUG-INS
Page 10: Red Hat OpenStack Platform 16 · CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK 3.2. DEPLOYING WITH ROUTED L3 NETWORKS 3.3. CONFIGURING RED HAT OPENSTACK PLATFORM

Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platformenvironment You can install Red Hat OpenShift Container Platform (OCP) on baremetal or othersupported cloud platforms For more information about installing OCP see OpenShift ContainerPlatform 43 Documentation

The size of your OCP environment depends on the infrastructure you select For more informationabout minimum resources requirements when installing OCP on baremetal see Minimum resourcerequirements in the Installing a cluster on bare metal guide For installation requirements of the variouspublic and private cloud platforms which you can install see the corresponding installationdocumentation for your cloud platform of choice

CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

7

CHAPTER 2 INSTALLING THE CORE COMPONENTS OFSERVICE TELEMETRY FRAMEWORK

Before you install Service Telemetry Framework (STF) ensure thatRed Hat OpenShift Container Platform (OCP) version 4x is running and that you understand the corecomponents of the framework As part of the OCP installation planning process ensure that theadministrator provides persistent storage and enough resources to run the STF components on top ofthe OCP environment

WARNING

Red Hat OpenShift Container Platform version 43 or later is currently required for asuccessful installation of STF

21 THE CORE COMPONENTS OF STF

The following STF core components are managed by Operators

Prometheus and AlertManager

ElasticSearch

Smart Gateway

AMQ Interconnect

Each component has a corresponding Operator that you can use to load the various applicationcomponents and objects

Additional resources

For more information about Operators see the Understanding Operators guide

22 PREPARING YOUR OCP ENVIRONMENT FOR STF

As you prepare your OCP environment for STF you must plan for persistent storage adequateresources and event storage

Ensure that persistent storage is available in your Red Hat OpenShift Container Platform clusterto permit a production grade deployment For more information see Section 221 ldquoPersistentvolumesrdquo

Ensure that enough resources are available to run the Operators and the application containersFor more information see Section 222 ldquoResource allocationrdquo

To install ElasticSearch you must use a community catalog source If you do not want to use acommunity catalog or if you do not want to store events see Section 23 ldquoDeploying STF to theOCP environmentrdquo

STF uses ElasticSearch to store events which requires a larger than normal

Red Hat OpenStack Platform 160 Service Telemetry Framework

8

vmmax_map_count The vmmax_map_count value is set by default inRed Hat OpenShift Container Platform For more information about how to edit the value of vmmax_map_count see Section 223 ldquoNode tuning operatorrdquo

221 Persistent volumes

STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus andElasticSearch can store metrics and events

Additional resources

For more information about configuring persistent storage for OCP see Understanding persistentstorage

2211 Using ephemeral storage

WARNING

You can use ephemeral storage with STF However if you use ephemeral storageyou might experience data loss if a pod is restarted updated or rescheduled ontoanother node Use ephemeral storage only for development or testing and notproduction environments

Procedure

To enable ephemeral storage for STF set storageEphemeralEnabled true in your ServiceTelemetry manifest

Additional resources

For more information about enabling ephemeral storage for STF see Section 461 ldquoConfiguringephemeral storagerdquo

222 Resource allocation

To enable the scheduling of pods within the OCP infrastructure you need resources for thecomponents that are running If you do not allocate enough resources pods remain in a Pending statebecause they cannot be scheduled

The amount of resources that you require to run STF depends on your environment and the number ofnodes and clouds that you want to monitor

Additional resources

For recommendations about sizing for metrics collection seehttpsaccessredhatcomarticles4907241

For information about sizing requirements for ElasticSearch seehttpswwwelasticcoguideencloud-on-k8scurrentk8s-managing-compute-resourceshtml

223 Node tuning operator

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

9

STF uses ElasticSearch to store events which requires a larger than normal vmmax_map_count The vmmax_map_count value is set by default in Red Hat OpenShift Container Platform

If you want to edit the value of vmmax_map_count you cannot apply node tuning manually using the sysctl command because Red Hat OpenShift Container Platform manages nodes directly To configurevalues and apply them to the infrastructure you must use the node tuning operator For moreinformation see Using the Node Tuning Operator

In an OCP deployment the default node tuning operator specification provides the required profiles forElasticSearch workloads or pods scheduled on nodes To view the default cluster node tuningspecification run the following command

The output of the default specification is documented at Default profiles set on a cluster Theassignment of profiles is managed in the recommend section where profiles are applied to a node whencertain conditions are met When scheduling ElasticSearch to a node in STF one of the following profilesis applied

openshift-control-plane-es

openshift-node-es

When scheduling an ElasticSearch pod there must be a label present that matches tunedopenshiftioelasticsearch If the label is present one of the two profiles is assigned to the podNo action is required by the administrator if you use the recommended Operator for ElasticSearch Ifyou use a custom-deployed ElasticSearch with STF ensure that you add the tunedopenshiftioelasticsearch label to all scheduled pods

Additional resources

For more information about virtual memory usage by ElasticSearch seehttpswwwelasticcoguideenelasticsearchreferencecurrentvm-max-map-counthtml

For more information about how the profiles are applied to nodes see Custom tuning specification

23 DEPLOYING STF TO THE OCP ENVIRONMENT

You can deploy STF to the OCP environment in one of two ways

Deploy STF and store events with ElasticSearch For more information see Section 231ldquoDeploying STF to the OCP environment with ElasticSearchrdquo

Deploy STF without ElasticSearch and disable events support For more information seeSection 232 ldquoDeploying STF to the OCP environment without ElasticSearchrdquo

231 Deploying STF to the OCP environment with ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 235 ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

oc get Tuneddefault -o yaml -n openshift-cluster-node-tuning-operator

Red Hat OpenStack Platform 160 Service Telemetry Framework

10

4 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

5 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

6 Section 238 ldquoSubscribing to the Elastic Cloud on Kubernetes Operatorrdquo

7 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

8 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

232 Deploying STF to the OCP environment without ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

4 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

5 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

6 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

233 Creating a namespace

Create a namespace to hold the STF components The service-telemetry namespace is usedthroughout the documentation

Procedure

Enter the following command

234 Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods

Procedure

Enter the following command

oc new-project service-telemetry

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorGroupmetadata name service-telemetry-operator-group namespace service-telemetryspec targetNamespaces - service-telemetryEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

11

Additional resources

For more information see OperatorGroups

235 Enabling the OperatorHubio Community Catalog Source

Before you install ElasticSearch you must have access to the resources on the OperatorHubioCommunity Catalog Source

Procedure

Enter the following command

236 Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform you must enable the operatorsource

Procedure

1 Install an OperatorSource that contains the Service Telemetry Operator and the SmartGateway Operator

2 To validate the creation of your OperatorSource use the oc get operatorsources command A

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind CatalogSourcemetadata name operatorhubio-operators namespace openshift-marketplacespec sourceType grpc image quayiooperator-frameworkupstream-community-operatorslatest displayName OperatorHubio Operators publisher OperatorHubioEOF

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorSourcemetadata labels opsrc-provider redhat-operators-stf name redhat-operators-stf namespace openshift-marketplacespec authorizationToken displayName Red Hat STF Operators endpoint httpsquayiocnr publisher Red Hat registryNamespace redhat-operators-stf type appregistryEOF

Red Hat OpenStack Platform 160 Service Telemetry Framework

12

2 To validate the creation of your OperatorSource use the oc get operatorsources command Asuccessful import results in the MESSAGE field returning a result of The object has been successfully reconciled

3 To validate that the Operators are available from the catalog use the oc get packagemanifestcommand

237 Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STFcomponents because the AMQ Certificate Manager Operator runs globally-scoped and is notcompatible with the dependency management of Operator Lifecycle Manager when used with othernamespace-scoped operators

Procedure

1 Subscribe to the AMQ Certificate Manager Operator create the subscription and validate theAMQ7 Certificate Manager

NOTE

The AMQ Certificate Manager is installed globally for all namespaces so the namespace value provided is openshift-operators You might not see your amq7-cert-managerv100 ClusterServiceVersion in the service-telemetrynamespace for a few minutes until the processing executes against thenamespace

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME TYPE ENDPOINT REGISTRY DISPLAYNAME PUBLISHER STATUS MESSAGEredhat-operators-stf appregistry httpsquayiocnr redhat-operators-stf Red Hat STF Operators Red Hat Succeeded The object has been successfully reconciled

$ oc get packagemanifests | grep Red Hat STF

smartgateway-operator Red Hat STF Operators 2m50sservicetelemetry-operator Red Hat STF Operators 2m50s

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name amq7-cert-manager namespace openshift-operatorsspec channel alpha installPlanApproval Automatic name amq7-cert-manager source redhat-operators sourceNamespace openshift-marketplaceEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

13

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-managerv100 has a phase Succeeded

238 Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator and if you plan to store events in ElasticSearch youmust enable the Elastic Cloud Kubernetes Operator

Procedure

1 Apply the following manifest to your OCP environment to enable the Elastic Cloud onKubernetes Operator

2 To verify that the ClusterServiceVersion for ElasticSearch Cloud on Kubernetes succeededenter the oc get csv command

239 Subscribing to the Service Telemetry Operator

To instantiate an STF instance create the ServiceTelemetry object to allow the Service TelemetryOperator to create the environment

Procedure

1 To create the Service Telemetry Operator subscription enter the oc apply -f command

$ oc get --namespace openshift-operators csv

NAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name elastic-cloud-eck namespace service-telemetryspec channel stable installPlanApproval Automatic name elastic-cloud-eck source operatorhubio-operators sourceNamespace openshift-marketplaceEOF

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1

Red Hat OpenStack Platform 160 Service Telemetry Framework

14

2 To validate the Service Telemetry Operator and the dependent operators enter the followingcommand

2310 Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework you must create an instance of ServiceTelemetry in OCPBy default eventsEnabled is set to false If you do not want to store events in ElasticSearch ensurethat eventsEnabled is set to false For more information see Section 232 ldquoDeploying STF to the OCPenvironment without ElasticSearchrdquo

The following core parameters are available for a ServiceTelemetry manifest

Table 21 Core parameters for a ServiceTelemetry manifest

Parameter Description Default Value

eventsEnabled Enable events support in STFRequires prerequisite steps toensure ElasticSearch can bestarted For more information seeSection 238 ldquoSubscribing to theElastic Cloud on KubernetesOperatorrdquo

false

metricsEnabled Enable metrics support in STF true

kind Subscriptionmetadata name servicetelemetry-operator namespace service-telemetryspec channel stable installPlanApproval Automatic name servicetelemetry-operator source redhat-operators-stf sourceNamespace openshift-marketplaceEOF

$ oc get csv --namespace service-telemetryNAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeededamq7-interconnect-operatorv120 Red Hat Integration - AMQ Interconnect 120 Succeededelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeededprometheusoperator0370 Prometheus Operator 0370 prometheusoperator0320 Succeededservice-telemetry-operatorv102 Service Telemetry Operator 102 service-telemetry-operatorv101 Succeededsmart-gateway-operatorv101 Smart Gateway Operator 101 smart-gateway-operatorv100 Succeeded

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

15

highAvailabilityEnabled Enable high availability in STF Formore information see Section 43ldquoHigh availabilityrdquo

false

storageEphemeralEnabled Enable ephemeral storagesupport in STF For moreinformation see Section 46ldquoEphemeral storagerdquo

false

Parameter Description Default Value

Procedure

1 To store events in ElasticSearch set eventsEnabled to true during deployment

2 To view the STF deployment logs in the Service Telemetry Operator use the oc logscommand

PLAY RECAP localhost ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3 View the pods and the status of each pod to determine that all workloads are operatingnominally

NOTE

If you set eventsEnabled true the notification Smart Gateways will Error andCrashLoopBackOff for a period of time before ElasticSearch starts

oc apply -f - ltltEOFapiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec eventsEnabled true metricsEnabled trueEOF

oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

$ oc get pods

NAME READY STATUS RESTARTS AGEalertmanager-stf-default-0 22 Running 0 26melastic-operator-645dc8b8ff-jwnzt 11 Running 0 88melasticsearch-es-default-0 11 Running 0 26minterconnect-operator-6fd49d9fb9-4bl92 11 Running 0

Red Hat OpenStack Platform 160 Service Telemetry Framework

16

24 REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality

Complete the following tasks

1 Section 241 ldquoDeleting the namespacerdquo

2 Section 242 ldquoRemoving the OperatorSourcerdquo

241 Deleting the namespace

To remove the operational resources for STF from OCP delete the namespace

Procedure

1 Run the oc delete command

2 Verify that the resources have been deleted from the namespace

242 Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again delete the OperatorSourceWhen you remove the OperatorSource PackageManifests related to STF are removed from theOperator Lifecycle Manager catalog

Procedure

1 Delete the OperatorSource

46mprometheus-operator-bf7d97fb9-kwnlx 11 Running 0 46mprometheus-stf-default-0 33 Running 0 26mservice-telemetry-operator-54f4c99d9b-k7ll6 22 Running 0 46msmart-gateway-operator-7ff58bcf94-66rvx 22 Running 0 46mstf-default-ceilometer-notification-smartgateway-6675df547q4lbj 11 Running 0 26mstf-default-collectd-notification-smartgateway-698c87fbb7-xj528 11 Running 0 26mstf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn 11 Running 0 26mstf-default-interconnect-7458fd4d69-nqbfs 11 Running 0 26m

oc delete project service-telemetry

$ oc get allNo resources found

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

17

2 Verify that the STF PackageManifests are removed from the platform If successful thefollowing command returns no result

3 If you enabled the OperatorHubio Community Catalog Source during the installationprocess and you no longer need this catalog source delete it

Additional resources

For more information about the OperatorHubio Community Catalog Source see Section 23ldquoDeploying STF to the OCP environmentrdquo

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stfoperatorsourceoperatorscoreoscom redhat-operators-stf deleted

$ oc get packagemanifests | grep Red Hat STF

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operatorscatalogsourceoperatorscoreoscom operatorhubio-operators deleted

Red Hat OpenStack Platform 160 Service Telemetry Framework

18

CHAPTER 3 COMPLETING THESERVICE TELEMETRY FRAMEWORK CONFIGURATION

31 CONNECTING RED HAT OPENSTACK PLATFORM TOSERVICE TELEMETRY FRAMEWORK

To collect metrics events or both and to send them to the Service Telemetry Framework (STF)storage domain you must configure the Red Hat OpenStack Platform overcloud to enable datacollection and transport

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes thatemploy routed L3 domains such as distributed compute node (DCN) or spine-leaf see Section 32ldquoDeploying to non-standard network topologiesrdquo

32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network you must makeconfiguration adjustments so that AMQ Interconnect can transport data to theService Telemetry Framework (STF) server instance This scenario is typical in a spine-leaf or aDCN topology For more information about DCN configuration see the Spine Leaf Networkingguide

If you use STF with Red Hat OpenStack Platform 160 and plan to monitor your Ceph Block orObject storage nodes you must make configuration changes that are similar to the configurationchanges that you make to the spine-leaf and DCN network configuration To monitor Ceph nodesuse the CephStorageExtraConfig parameter to define which network interface to load into theAMQ Interconnect and collectd configuration files

Similarly you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters ifyour environment uses Block and Object storage roles

The deployment of a spine-leaf topology involves creating roles and networks then assigningthose networks to the available roles When you configure data collection and transport for STF foran Red Hat OpenStack Platform deployment the default network for roles is InternalApi For CephBlock and Object storage roles the default network is Storage Because a spine-leaf configurationcan result in different networks being assigned to different Leaf groupings and those names aretypically unique additional configuration is required in the parameter_defaults section of theRed Hat OpenStack Platform environment files

Procedure

1 Document which networks are available for each of the Leaf roles For examples of networkname definitions see Creating a network data file in the Spine Leaf Networking guide Formore information about the creation of the Leaf groupings (roles) and assignment of thenetworks to those groupings see Creating a roles data file in the Spine Leaf Networkingguide

CephStorageExtraConfig tripleoprofilebasemetricscollectdamqp_host hiera(storage) tripleoprofilebasemetricsqdrlistener_addr hiera(storage) tripleoprofilebaseceilometeragentnotificationnotifier_host_addr hiera(storage)

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

19

2 Add the following configuration example to the ExtraConfig section for each of the leafroles In this example internal_api_subnet is the value defined in the name_lowerparameter of your network definition (with _subnet appended to the name for Leaf 0) andis the network to which the ComputeLeaf0 leaf role is connected In this case the networkidentification of 0 corresponds to the Compute role for leaf 0 and represents a value thatis different from the default internal API network nameFor the ComputeLeaf0 leaf role specify extra configuration to perform a hiera lookup todetermine which network interface for a particular network to assign to the collectd AMQPhost parameter Perform the same configuration for the AMQ Interconnect listeneraddress parameter

Additional leaf roles typically replace _subnet with _leafN where N represents a uniqueindentifier for the leaf

This example configuration is on a CephStorage leaf role

33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUDFOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud you must configure the data collectionapplications and the data transport to STF and deploy the overcloud

To configure the Red Hat OpenStack Platform overcloud complete the following tasks

1 Section 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

2 Section 332 ldquoConfiguring the STF connection for the overcloudrdquo

3 Section 333 ldquoValidating client-side installationrdquo

331 Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF you must provide theAMQ Interconnect route address in the STF connection file

Procedure

1 Log in to your Red Hat OpenShift Container Platform (OCP) environment

2 In the service-telemetry project retrieve the AMQ Interconnect route address

ComputeLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_subnet)

ComputeLeaf1ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_leaf1)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_leaf1)

CephStorageLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(storage_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(storage_subnet)

Red Hat OpenStack Platform 160 Service Telemetry Framework

20

NOTE

If your STF installation differs from the documentation ensure that youretrieve the correct AMQ Interconnect route address

332 Configuring the STF connection for the overcloud

To configure the STF connection you must create a file that contains the connection configurationof the AMQ Interconnect for the overcloud to the STF deployment Enable the collection of eventsand storage of the events in STF and deploy the overcloud

Procedure

1 Log in to the Red Hat OpenStack Platform undercloud as the stack user

2 Create a configuration file called stf-connectorsyaml in the homestack directory

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all dataingestion and data storage components for single cloud deployments Toshare the data storage domain with multiple clouds see Section 45ldquoConfiguring multiple cloudsrdquo

3 In the stf-connectorsyaml file configure the MetricsQdrConnectors address to connectthe AMQ Interconnect on the overcloud to the STF deployment

Add the CeilometerQdrPublishEvents true parameter to enable collection andtransport of Ceilometer events to STF

Replace the host parameter with the value of HOSTPORT that you retrieved inSection 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

4 Add the following files to your Red Hat OpenStack Platform director deployment to setupcollectd and AMQ Interconnect

the stf-connectorsyaml environment file

the enable-stfyaml file that ensures that the environment is being used during theovercloud deployment

$ oc get routes -ogo-template= range items printf sn spechost end | grep -5671stf-default-interconnect-5671-service-telemetryappsinfrawatch

parameter_defaults EventPipelinePublishers [] CeilometerQdrPublishEvents true MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch port 443 role edge sslProfile sslProfile verifyHostname false

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

21

the ceilometer-write-qdryaml file that ensures that Ceilometer telemetry is sent to STF

5 Deploy the Red Hat OpenStack Platform overcloud

333 Validating client-side installation

To validate data collection from the STF storage domain query the data sources for delivereddata To validate individual nodes in the Red Hat OpenStack Platform deployment connect to theconsole using SSH

Procedure

1 Log in to an overcloud node for example controller-0

2 Ensure that metrics_qdr container is running on the node

3 Return the internal network address on which AMQ Interconnect is running for example 17217144 listening on port 5666

4 Return a list of connections to the local AMQ Interconnect

openstack overcloud deploy ltother argumentsgt --templates usrshareopenstack-tripleo-heat-templates --environment-file ltother-environment-filesgt --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsmetricsceilometer-write-qdryaml --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsenable-stfyaml --environment-file homestackstf-connectorsyaml

$ sudo podman container inspect --format StateStatus metrics_qdr

running

$ sudo podman exec -it metrics_qdr cat etcqpid-dispatchqdrouterdconf

listener host 17217144 port 5666 authenticatePeer no saslMechanisms ANONYMOUS

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --connections

Connections id host container role dir security authentication tenant ============================================================================================================================================================================================================================================================================================ 1 stf-default-interconnect-5671-service-telemetryappsinfrawatch443 stf-default-

Red Hat OpenStack Platform 160 Service Telemetry Framework

22

There are four connections

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat clientThe outbound STF connection is provided to the MetricsQdrConnectors hostparameter and is the route for the STF storage domain The other hosts are internalnetwork addresses of the client connections to this AMQ Interconnect

5 To ensure that messages are being delivered list the links and view the _edge address inthe deliv column for delivery of messages

interconnect-7458fd4d69-bgzfb edge out TLSv12(DHE-RSA-AES256-GCM-SHA384) anonymous-user 12 1721714460290 openstackorgomcontainercontroller-0ceilometer-agent-notification255c02cee550f143ec9ea030db5cccba14 normal in no-security no-auth 16 1721714436408 metrics normal in no-security anonymous-user 899 1721714439500 10a2e99d-1b8a-4329-b48c-4335e5f75c84 normal in no-security no-auth

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --linksRouter Links type dir conn id id peer class addr phs cap pri undel unsett deliv presett psdrop acc rej rel mod delay rate =========================================================================================================================================================== endpoint out 1 5 local _edge 250 0 0 0 2979926 2979924 0 0 0 2 0 0 0 endpoint in 1 6 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 7 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 8 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 9 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 10 250 0 0 0 911 911 0 0 0 0 0 911 0 endpoint in 1 11 250 0 0 0 0 911 0 0 0 0 0 0 0 endpoint out 12 32 local templSY6Mcicol4J2Kp 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 16 41 250 0 0 0 2979924 2979924 0 0 0 0 0 0 0 endpoint in 912 1834 mobile $management 0 250 0 0 0 1

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

23

6 To list the addresses from Red Hat OpenStack Platform nodes to STF connect to OCP toget the AMQ Interconnect pod name and list the connections List the availableAMQ Interconnect pods

7 Connect to the pod and run the qdstat --connections command to list the knownconnections

In this example there are three edge connections from the Red Hat OpenStack Platformnodes with connection id 22 23 and 24

8 To view the number of messages delivered by the network use each address with the oc exec command

0 0 1 0 0 0 0 0 endpoint out 912 1835 local temp9Ok2resI9tmt+CT 250 0 0 0 0 0 0 0 0 0 0 0 0

$ oc get pods -l application=stf-default-interconnect

NAME READY STATUS RESTARTS AGEstf-default-interconnect-7458fd4d69-bgzfb 11 Running 0 6d21h

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 182547243852 UTCstf-default-interconnect-7458fd4d69-bgzfb

Connections id host container role dir security authentication tenant last dlv uptime ====================================================================================================================================================================================================== 1 1012922143062 rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv] normal in no-security anonymous-user 000000000 006215025 2 1013005255754 rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5] normal in no-security anonymous-user 000212557 006214936 3 1013005143110 rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6] normal in no-security anonymous-user 000213653 006214909 22 101280151948 Routerceph-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 23 101280151950 Routercompute-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 24 101280152082 Routercontroller-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000000 000220834 27 12700142202 c2f541c1-4c97-4b37-a189-a396c08fb079 normal in no-security no-auth 000000000 000000000

Red Hat OpenStack Platform 160 Service Telemetry Framework

24

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 182010293258 UTCstf-default-interconnect-7458fd4d69-bgzfb

Router Addresses class addr phs distrib pri local remote in out thru fallback ==================================================================================================================== mobile anycastceilometereventsample 0 balanced - 1 0 1553 1553 0 0 mobile collectdnotify 0 multicast - 1 0 10 10 0 0 mobile collectdtelemetry 0 multicast - 1 0 7798049 7798049 0 0

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

25

CHAPTER 4 ADVANCED FEATURESThe following optional features can provide additional functionality to theService Telemetry Framework (STF)

Customizing the deployment For more information see Section 41 ldquoCustomizing thedeploymentrdquo

Alerts For more information see Section 42 ldquoAlertsrdquo

High availability For more information see Section 43 ldquoHigh availabilityrdquo

Dashboards For more information see Section 44 ldquoDashboardsrdquo

Multiple clouds For more information see Section 45 ldquoConfiguring multiple cloudsrdquo

Ephemeral storage For more information see Section 46 ldquoEphemeral storagerdquo

41 CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load intoRed Hat OpenShift Container Platform (OCP) The Operator then creates other objects in memorywhich results in the dependent Operators creating the workloads they are responsible formanaging

WARNING

When you override the manifest you must provide the entire manifest contentsincluding object names or namespaces There is no dynamic parametersubstitution when you override a manifest

To override a manifest successfully with Service Telemetry Framework (STF) deploy a defaultenvironment using the core options only For more information about the core options seeSection 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo When you deploy STF use the oc getcommand to retrieve the default deployed manifest When you use a manifest that was originallygenerated by Service Telemetry Operator the manifest is compatible with the other objects thatare managed by the Operators

For example when the metricsEnabled true parameter is configured in the ServiceTelemetrymanifest the Service Telemetry Operator requests components for metrics retrieval and storageusing the default manifests In some cases you might want to override the default manifest Formore information see Section 411 ldquoManifest override parametersrdquo

411 Manifest override parameters

This table describes the available parameters that you can use to override a manifest along withthe corresponding retrieval commands

Table 41 Manifest override parameters

Red Hat OpenStack Platform 160 Service Telemetry Framework

26

Override parameter Description Retrieval command

alertmanagerManifest Override the Alertmanagerobject creation The PrometheusOperator watches for Alertmanager objects

oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest Override the Secret thatcontains the Alertmanagerconfiguration The PrometheusOperator uses a secret named alertmanager- alertmanager-name forexample stf-default to providethe alertmanageryamlconfiguration to Alertmanager

oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest Override the ElasticSearchobject creation The Elastic Cloudon Kuberneters Operator watchesfor ElasticSearch objects

oc get elasticsearch elasticsearch -oyaml

interconnectManifest Override the Interconnectobject creation The AMQInterconnect Operator watchesfor Interconnect objects

oc get interconnect stf-default-interconnect -oyaml

prometheusManifest Override the Prometheusobject creation The PrometheusOperator watches for Prometheus objects

oc get prometheus stf-default -oyaml

servicemonitorManifest Override the ServiceMonitorobject creation The PrometheusOperator watches for ServiceMonitor objects

oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest

Override the SmartGatewayobject creation for collectdmetrics The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest

Override the SmartGatewayobject creation for collectdevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-notification -oyaml

CHAPTER 4 ADVANCED FEATURES

27

smartgatewayCeilometerEventsManifest

Override the SmartGatewayobject creation for Ceilometerevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-ceilometer-notification -oyaml

Override parameter Description Retrieval command

412 Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest For a list of availablemanifest override parameters see Section 41 ldquoCustomizing the deploymentrdquo The default ServiceTelemetry object is stf-default Use oc get servicetelemetry to list the available STFdeployments

TIP

The oc edit command loads the default system editor To override the default editor pass or setthe environment variable EDITOR to the preferred editor For example EDITOR=nano oc edit servicetelemetry stf-default

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Load the ServiceTelemetry object into an editor

4 To modify the ServiceTelemetry object provide a manifest override parameter and thecontents of the manifest to write to OCP instead of the defaults provided by STF

NOTE

The trailing pipe (|) after entering the manifest override parameter indicatesthat the value provided is multi-line

oc project service-telemetry

oc edit servicetelemetry stf-default

$ oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata annotations kubectlkubernetesiolast-applied-configuration | apiVersioninfrawatchv1alpha1kindServiceTelemetrymetadataannotationsnamestf-defaultnamespaceservice-telemetryspecmetricsEnabledtrue creationTimestamp 2020-04-14T202942Z

Red Hat OpenStack Platform 160 Service Telemetry Framework

28

1

2

Manifest override parameter is defined in the spec of the ServiceTelemetry object

End of the manifest override content

5 Save and close

42 ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager Alert rules in Prometheusservers send alerts to an Alertmanager which manages the alerts Alertmanager can silenceinhibit or aggregate alerts and send notifications using email on-call notification systems or chatplatforms

To create an alert complete the following tasks

1 Create an alert rule in Prometheus For more information see Section 421 ldquoCreating analert rule in Prometheusrdquo

generation 1 name stf-default namespace service-telemetry resourceVersion 1949423 selfLink apisinfrawatchv1alpha1namespacesservice-telemetryservicetelemetrysstf-default uid d058bc41-1bb0-49f5-9a8b-642f4b8adb95spec metricsEnabled true smartgatewayCollectdMetricsManifest | 1 apiVersion smartgatewayinfrawatchv2alpha1 kind SmartGateway metadata name stf-default-collectd-telemetry namespace service-telemetry spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdtelemetry debug true prefetch 15000 serviceType metrics size 1 useTimestamp true 2status conditions - ansibleResult changed 0 completion 2020-04-14T203219079508 failures 0 ok 52 skipped 1 lastTransitionTime 2020-04-14T202959Z message Awaiting next reconciliation reason Successful status True type Running

CHAPTER 4 ADVANCED FEATURES

29

2 Create an alert route in Alertmanager For more information see Section 423 ldquoCreating analert route in Alertmanagerrdquo

Additional resourcesFor more information about alerts or notifications with Prometheus and Alertmanager seehttpsprometheusiodocsalertingoverview

To view an example set of alerts that you can use with Service Telemetry Framework (STF) seehttpsgithubcominfrawatchservice-telemetry-operatortreemasterdeployalerts

421 Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications If the rule condition returns an emptyresult set the condition is false Otherwise the rule is true and it triggers an alert

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Create a PrometheusRule object that contains the alert rule The Prometheus Operatorloads the rule into Prometheus

4 To verify that the rules have been loaded into Prometheus by the Operator create a podwith access to curl

5 Run curl to access the prometheus-operated service to return the rules loaded intomemory

oc project service-telemetry

oc apply -f - ltltEOFapiVersion monitoringcoreoscomv1kind PrometheusRulemetadata creationTimestamp null labels prometheus stf-default role alert-rules name prometheus-alarm-rules namespace service-telemetryspec groups - name openstackrules rules - alert Metric Listener down expr collectd_qpid_router_status lt 1 To change the rule edit the value of the expr parameterEOF

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl prometheus-operated9090apiv1rulesstatussuccessdatagroups

Red Hat OpenStack Platform 160 Service Telemetry Framework

30

6 To verify that the output shows the rules loaded into the PrometheusRule object forexample the output contains the defined openstackrules exit from the pod

7 Clean up the environment by deleting the curl pod

Additional resourcesFor more information on alerting see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

422 Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 421ldquoCreating an alert rule in Prometheusrdquo

Procedure

1 Use the oc edit command

2 Edit the PrometheusRules manifest

3 Save and close

Additional resourcesFor more information about configuring alerting rules seehttpsprometheusiodocsprometheuslatestconfigurationalerting_rules

For more information about PrometheusRules objects seehttpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

423 Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system such as email IRC or other notificationchannel The Prometheus Operator manages the Alertmanager configuration as anRed Hat OpenShift Container Platform (OCP) secret STF by default deploys a basic configurationthat results in no receivers

[nameopenstackrulesfileetcprometheusrulesprometheus-stf-default-rulefiles-0service-telemetry-prometheus-alarm-rulesyamlrules[nameMetric Listener downquerycollectd_qpid_router_status u003c 1duration0labelsannotationsalerts[]healthoktypealerting]interval30]

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

oc edit prometheusrules prometheus-alarm-rules

alertmanageryaml |- global resolve_timeout 5m route

CHAPTER 4 ADVANCED FEATURES

31

To deploy a custom Alertmanager route with STF an alertmanagerConfigManifest parameter mustbe passed to the Service Telemetry Operator that results in an updated secret managed by thePrometheus Operator For more information see Section 412 ldquoOverriding a managed manifestrdquo

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object for your STF deployment

4 Add a new parameter alertmanagerConfigManifest and the Secret object contents todefine the alertmanageryaml configuration for Alertmanager

NOTE

This loads the default template that is already managed by ServiceTelemetry Operator To validate the changes are populating correctlychange a value return the alertmanager-stf-default secret and verify thatthe new value is loaded into memory for example changing the value globalresolve_timeout from 5m to 10m

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

oc project service-telemetry

oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec metricsEnabled true alertmanagerConfigManifest | apiVersion v1 kind Secret metadata name alertmanager-stf-default namespace service-telemetry type Opaque stringData alertmanageryaml |- global resolve_timeout 10m route

Red Hat OpenStack Platform 160 Service Telemetry Framework

32

5 Verify that the configuration was applied to the secret

6 To verify the configuration has been loaded into Alertmanager create a pod with access to curl

7 Run curl against the alertmanager-operated service to retrieve the status and configYAMLcontents and review the supplied configuration matches the configuration loaded intoAlertmanager

8 Verify that the configYAML field contains the expected changes Exit from the pod

9 To clean up the environment delete the curl pod

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

$ oc get secret alertmanager-stf-default -o go-template=index data alertmanageryaml | base64decode

global resolve_timeout 10mroute group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver nullreceivers- name null

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl alertmanager-operated9093apiv1status

statussuccessdataconfigYAMLglobaln resolve_timeout 10mn http_config n smtp_hello localhostn smtp_require_tls truen pagerduty_url httpseventspagerdutycomv2enqueuen hipchat_api_url httpsapihipchatcomn opsgenie_api_url httpsapiopsgeniecomn wechat_api_url httpsqyapiweixinqqcomcgi-binn victorops_api_url httpsalertvictoropscomintegrationsgeneric20131114alertnrouten receiver nulln group_byn - jobn group_wait 30sn group_interval 5mn repeat_interval 12hnreceiversn- name nullntemplates []n

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

CHAPTER 4 ADVANCED FEATURES

33

Additional resourcesFor more information about the Red Hat OpenShift Container Platform secret and the Prometheusoperator see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

43 HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover fromfailures in its component services Although Red Hat OpenShift Container Platform (OCP) restartsa failed pod if nodes are available to schedule the workload this recovery process might take morethan one minute during which time events and metrics are lost A high availability configurationincludes multiple copies of STF components reducing recovery time to approximately 2 secondsTo protect against failure of an OCP node deploy STF to an OCP cluster with three or more nodes

NOTE

STF is not yet a fully fault tolerant system Delivery of metrics and events during therecovery period is not guaranteed

Enabling high availability has the following effects

Two AMQ Interconnect pods run instead of the default 1

Three ElasticSearch pods run instead of the default 1

Recovery time from a lost pod in either of these services reduces to approximately 2seconds

431 Configuring high availability

To configure STF for high availability add highAvailabilityEnabled true to the ServiceTelemetryobject in OCP You can this set this parameter at installation time or if you already deployed STFcomplete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Use the oc command to edit the ServiceTelemetry object

4 Add highAvailabilityEnabled true to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry

spec eventsEnabled true metricsEnabled true highAvailabilityEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

34

44 DASHBOARDS

Use third-party application Grafana to visualize system-level metrics gathered by collectd for eachindividual host node For more information about configuring collectd see Section 33ldquoConfiguring Red Hat OpenStack Platform overcloud for Service Telemetry Frameworkrdquo

441 Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment so you mustdeploy the Grafana Operator from OperatorHubio

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Clone the dashboard repository

4 Deploy the Grafana operator

5 To verify that the operator launched successfully run the oc get csv command If the valueof the PHASE column is Succeeded the operator launched successfully

6 Launch a Grafana instance

7 Verify that the Grafana instance deployed

8 Create the datasource and dashboard resources

oc project service-telemetry

git clone httpsgithubcominfrawatchdashboardscd dashboards

oc create -f deploysubscriptionyaml

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEgrafana-operatorv320 Grafana Operator 320 Succeeded

$ oc create -f deploygrafanayaml

$ oc get pod -l app=grafana

NAME READY STATUS RESTARTS AGEgrafana-deployment-7fc7848b56-sbkhv 11 Running 0 1m

CHAPTER 4 ADVANCED FEATURES

35

9 Verify that the resources installed correctly

10 Navigate to httpsltgrafana-route-addressgt in a web browser Use the oc get routescommand to retrieve the Grafana route address

11 To view the dashboard click Dashboards and Manage

Additional resources

For more information about enabling the OperatorHubio catalog source see Section 235ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

4411 Viewing and editing queries

Procedure

1 Log in to Red Hat OpenShift Container Platform To view and edit queries log in as the admin user

2 Change to the service-telemetry namespace

3 To retrieve the default username and password describe the Grafana object using oc describe

442 The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time Select a node from theupper left corner of the dashboard

4421 Top panels

Title Unit Description

oc create -f deploydatasourceyaml -f deployrhos-dashboardyaml

$ oc get grafanadashboards

NAME AGErhos-dashboard 7d21h

$ oc get grafanadatasourcesNAME AGEservice-telemetry-grafanadatasource 1m

oc get routes

oc project service-telemetry

oc describe grafana service-telemetry-grafana

Red Hat OpenStack Platform 160 Service Telemetry Framework

36

Current Global Alerts - Current alerts fired byPrometheus

Recent Global Alerts - Recently fired alerts in 5m timesteps

Status Panel - Node status up downunavailable

Uptime smhdMY Total operational time of node

CPU Cores cores Total number of cores

Memory bytes Total memory

Disk Size bytes Total storage size

Processes processes Total number of processes listedby type

Load Average processes Load average represents theaverage number of running anduninterruptible processes residingin the kernel execution queue

4422 Networking panels

Panels that display the network interfaces of the node

Panel Unit Description

Physical Interfaces Ingress Errors errors Total errors with incoming data

Physical Interfaces Egress Errors errors Total errors with outgoing data

Physical Interfaces Ingress ErrorRates

errorss Rate of incoming data errors

Physical Interfaces egress ErrorRates

errorss Rate of outgoing data errors

Physical Interfaces PacketsIngress pps Incoming packets persecond

Physical Interfaces PacketsEgress

pps

Outgoing packets per second Physical Interfaces Data Ingress bytess

CHAPTER 4 ADVANCED FEATURES

37

Incoming data rates Physical Interfaces Data Egress bytess

Outgoing data rates Physical Interfaces Drop RateIngress

pps

Incoming packets drop rate Physical Interfaces Drop RateEgress

pps

4423 CPU panels

Panels that display CPU usage of the node

Panel Unit Description

Current CPU Usage percent Instantaneous usage at the timeof the last query

Aggregate CPU Usage percent Average non-idle CPU activity ofall cores on a node

Aggr CPU Usage by Type percent Shows time spent for each type ofthread averaged across all cores

4424 Memory panels

Panels that display memory usage on the node

Panel Unit Description

Memory Used percent Amount of memory being used attime of last query

Huge Pages Used hugepages Number of hugepages beingused

Memory

4425 Diskfile system

Panels that display space used on disk

Panel Unit Description Notes

Disk Space Usage percent Total disk use at time oflast query

Red Hat OpenStack Platform 160 Service Telemetry Framework

38

Inode Usage percent Total inode use at timeof last query

Aggregate Disk SpaceUsage

bytes Total disk space usedand reserved

Because this query relieson the df plugintemporary file systemsthat do not necessarilyuse disk space areincluded in the resultsThe query tries to filterout most of these but itmight not be exhaustive

Disk Traffic bytess Shows rates for bothreading and writing

Disk Load percent Approximatepercentage of total diskbandwidth being usedThe weighted IO timeseries includes thebacklog that might beaccumulating For moreinformation see thecollectd disk plugin docs

Operationss opss Operations done persecond

Average IO OperationTime

seconds Average time each IOoperation took tocomplete This averageis not accurate see thecollectd disk plugin docs

Panel Unit Description Notes

45 CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance ofService Telemetry Framework (STF)

1 Plan the AMQP address prefixes that you want to use for each cloud For more informationsee Section 451 ldquoPlanning AMQP address prefixesrdquo

2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on thecorresponding address prefixes For more information see Section 452 ldquoDeploying SmartGatewaysrdquo

3 Configure each cloud to send its metrics and events to STF on the correct address Formore information see Section 453 ldquoCreating the OpenStack environment filerdquo

CHAPTER 4 ADVANCED FEATURES

39

Figure 41 Two Red Hat OpenStack Platform clouds connect to STF

451 Planning AMQP address prefixes

By default Red Hat OpenStack Platform nodes get data through two data collectors collectd andCeilometer These components send telemetry data or notifications to the respective AMQPaddresses for example collectdtelemetry where STF Smart Gateways listen on those addressesfor monitoring data

To support multiple clouds and to identify which cloud generated the monitoring data configureeach cloud to send data to a unique address Prefix a cloud identifier to the second part of theaddress The following list shows some example addresses and identifiers

collectdcloud1-telemetry

collectdcloud1-notify

anycastceilometercloud1-eventsample

collectdcloud2-telemetry

collectdcloud2-notify

Red Hat OpenStack Platform 160 Service Telemetry Framework

40

anycastceilometercloud2-eventsample

collectdus-east-1-telemetry

collectdus-west-3-telemetry

452 Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud one forcollectd metrics one for collectd events and one for Ceilometer events Configure each of theSmart Gateways to listen on the AMQP address that you define for the corresponding cloud

When you deploy STF for the first time Smart Gateway manifests are created that define the initialSmart Gateways for a single cloud When deploying Smart Gateways for multiple cloud supportyou deploy multiple Smart Gateways for each of the data collection types that handle the metricsand the events data for each cloud The initial Smart Gateways act as a template to createadditional Smart Gateways along with any authentication information required to connect to thedata stores

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

oc project service-telemetry

3 Use the initially deployed Smart Gateways as a template for additional Smart GatewaysList the currently deployed Smart Gateways with the oc get smartgateways command Forexample if you deployed STF with metricsEnabled true and eventsEnabled true thefollowing Smart Gateways are displayed in the output

$ oc get smartgateways

NAME AGEstf-default-ceilometer-notification 14dstf-default-collectd-notification 14dstf-default-collectd-telemetry 14d

4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary filewhich you can modify later and use to create the new set of Smart Gateways

truncate --size 0 tmpcloud1-smartgatewaysyaml ampamp for sg in $(oc get smartgateways -oname)do echo --- gtgt tmpcloud1-smartgatewaysyaml oc get $sg -oyaml --export gtgt tmpcloud1-smartgatewaysyamldone

5 Modify the Smart Gateway manifest in the tmpcloud1-smartgatewaysyaml file Adjustthe metadataname and specamqpUrl fields to include the cloud identifier from yourschema For more information see ] To view example Smart Gateway manifests seeltltexample-manifests_advanced-features[

6 Deploy your new Smart Gateways

CHAPTER 4 ADVANCED FEATURES

41

oc apply -f tmpcloud1-smartgatewaysyaml

7 Verify that each Smart Gateway is running This can take several minutes depending on thenumber of Smart Gateways

oc get po -l app=smart-gateway

4521 Example manifests

IMPORTANT

The content in the following examples might be different to the file content in yourdeployment Copy the manifests in your deployment

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that youwant to use for your clouds For more information see Section 451 ldquoPlanning AMQP addressprefixesrdquo

NOTE

Your output may have some additional metadata parameters that you can removefrom the manifests you that load into OCP

apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-ceilometer-notification-cloud1 1spec amqpDataSource ceilometer amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672anycastceilometercloud1-eventsample 2 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-notification-cloud1 3spec amqpDataSource collectd amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-

Red Hat OpenStack Platform 160 Service Telemetry Framework

42

1

2

3

4

5

6

Name for Ceilometer notifications for cloud1

AMQP Address for Ceilometer notifications for cloud1

Name for collectd telemetry for cloud1

AMQP Address for collectd telemetry for cloud1

Name for collectd notifications for cloud1

AMQP Address for collectd notifications for cloud1

453 Creating the OpenStack environment file

To label traffic according to the cloud of origin you must create a configuration with cloud-specificinstance names Create an stf-connectorsyaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefixscheme For more information see Section 451 ldquoPlanning AMQP address prefixesrdquo

notify 4 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-telemetry-cloud1 5spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-telemetry 6 debug false prefetch 15000 serviceType metrics size 1 useTimestamp true

CHAPTER 4 ADVANCED FEATURES

43

WARNING

Remove enable-stfyaml and ceilometer-write-qdryaml environment filereferences from your overcloud deployment This configuration is redundantand results in duplicate information being sent from each cloud node

Procedure

1 Create the stf-connectorsyaml file and modify it to match the AMQP address that youwant for this cloud deployment

resource_registry OSTripleOServicesCollectd usrshareopenstack-tripleo-heat-templatesdeploymentmetricscollectd-container-puppetyaml OSTripleOServicesMetricsQdr usrshareopenstack-tripleo-heat-templatesdeploymentmetricsqdr-container-puppetyaml OSTripleOServicesCeilometerAgentCentral usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-central-container-puppetyaml OSTripleOServicesCeilometerAgentNotification usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-notification-container-puppetyaml OSTripleOServicesCeilometerAgentIpmi usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-ipmi-container-puppetyaml OSTripleOServicesComputeCeilometerAgent usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-compute-container-puppetyaml OSTripleOServicesRedis usrshareopenstack-tripleo-heat-templatesdeploymentdatabaseredis-pacemaker-puppetyaml

parameter_defaults EnableSTF true

EventPipelinePublishers [] CeilometerEnablePanko false CeilometerQdrPublishEvents true CeilometerQdrEventsConfig driver amqp topic cloud1-event 1

CollectdConnectionType amqp1 CollectdAmqpInterval 5 CollectdDefaultPollingInterval 5

CollectdAmqpInstances cloud1-notify 2 notify true format JSON presettle false cloud1-telemetry 3 format JSON presettle true

MetricsQdrAddresses

Red Hat OpenStack Platform 160 Service Telemetry Framework

44

+ lt1gt Define the topic for Ceilometer events This value is the address format of anycastceilometercloud1-eventsample lt2gt Define the topic for collectd events This value is theformat of collectdcloud1-notify lt3gt Define the topic for collectd metrics This value is the formatof collectdcloud1-telemetry lt4gt Adjust the MetricsQdrConnectors host to the address of the STFroute

1 Ensure that the naming convention in the stf-connectorsyaml file aligns with the specamqpUrl field in the Smart Gateway configuration For example configure the CeilometerQdrEventsConfigtopic field to a value of cloud1-event

2 Save the file in a directory for custom environment files for example homestackcustom_templates

3 Source the authentication file

4 Include the stf-connectorsyaml file in the overcloud deployment command along with anyother environment files relevant to your environment

Additional resourcesFor information about validating the deployment see Section 333 ldquoValidating client-sideinstallationrdquo

454 Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it wasscraped from You can use this label to query data from a specific cloud

To query data from a specific cloud use a Prometheus promql query that matches the associatedservice label for example collectd_uptimeservice=stf-default-collectd-telemetry-cloud1-smartgateway

- prefix collectd distribution multicast - prefix anycastceilometer distribution multicast

MetricsQdrSSLProfiles - name sslProfile

MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch 4 port 443 role edge verifyHostname false sslProfile sslProfile

[stackundercloud-0 ~]$ source stackrc

(undercloud) [stackundercloud-0 ~]$

(undercloud) [stackundercloud-0 ~]$ openstack overcloud deploy --templates usrshareopenstack-tripleo-heat-templates -e homestackcustom_templatesstf-connectorsyaml

CHAPTER 4 ADVANCED FEATURES

45

46 EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storingdata in your Red Hat OpenShift Container Platform (OCP) cluster Ephemeral storage is notrecommended in a production environment due to the volatility of the data in the platform whenoperating correctly and as designed For example restarting a pod or rescheduling the workload toanother node results in the loss of any local data written since the pod started

If you enable ephemeral storage in STF the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests

461 Configuring ephemeral storage

To configure STF for ephemeral storage add storageEphemeralEnabled true to theServiceTelemetry object in OCP You can add storageEphemeralEnabled true at installation timeor if you already deployed STF complete the following steps

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object

4 Add the storageEphemeralEnabled true parameter to the spec section

5 Save your changes and close the object

oc project service-telemetry

$ oc edit ServiceTelemetry stf-default

spec eventsEnabled true metricsEnabled true storageEphemeralEnabled true

Red Hat OpenStack Platform 160 Service Telemetry Framework

46

APPENDIX A COLLECTD PLUG-INSThis section contains a complete list of collectd plug-ins and configurations forRed Hat OpenStack Platform 160

collectd-aggregation

collectdpluginaggregationaggregators

collectdpluginaggregationinterval

collectd-amqp1

collectd-apache

collectdpluginapacheinstances (ex localhost rArr url rArr httplocalhostmod_statusauto)

collectdpluginapacheinterval

collectd-apcups

collectd-battery

collectdpluginbatteryvalues_percentage

collectdpluginbatteryreport_degraded

collectdpluginbatteryquery_state_fs

collectdpluginbatteryinterval

collectd-ceph

collectdplugincephdaemons

collectdplugincephlongrunavglatency

collectdplugincephconvertspecialmetrictypes

collectd-cgroups

collectdplugincgroupsignore_selected

collectdplugincgroupsinterval

collectd-conntrack

None

collectd-contextswitch

collectdplugincontextswitchinterval

collectd-cpu

APPENDIX A COLLECTD PLUG-INS

47

collectdplugincpureportbystate

collectdplugincpureportbycpu

collectdplugincpuvaluespercentage

collectdplugincpureportnumcpu

collectdplugincpureportgueststate

collectdplugincpusubtractgueststate

collectdplugincpuinterval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectdplugincsvdatadir

collectdplugincsvstorerates

collectdplugincsvinterval

collectd-df

collectdplugindfdevices

collectdplugindffstypes

collectdplugindfignoreselected

collectdplugindfmountpoints

collectdplugindfreportbydevice

collectdplugindfreportinodes

collectdplugindfreportreserved

collectdplugindfvaluesabsolute

collectdplugindfvaluespercentage

collectdplugindfinterval

collectd-disk

collectdplugindiskdisks

collectdplugindiskignoreselected

collectdplugindiskudevnameattr

Red Hat OpenStack Platform 160 Service Telemetry Framework

48

collectdplugindiskinterval

collectd-entropy

collectdpluginentropyinterval

collectd-ethstat

collectdpluginethstatinterfaces

collectdpluginethstatmaps

collectdpluginethstatmappedonly

collectdpluginethstatinterval

collectd-exec

collectdpluginexeccommands

collectdpluginexeccommands_defaults

collectdpluginexecglobals

collectdpluginexecinterval

collectd-fhcount

collectdpluginfhcountvaluesabsolute

collectdpluginfhcountvaluespercentage

collectdpluginfhcountinterval

collectd-filecount

collectdpluginfilecountdirectories

collectdpluginfilecountinterval

collectd-fscache

None

collectd-hddtemp

collectdpluginhddtemphost

collectdpluginhddtempport

collectdpluginhddtempinterval

collectd-hugepages

collectdpluginhugepagesreport_per_node_hp

APPENDIX A COLLECTD PLUG-INS

49

collectdpluginhugepagesreport_root_hp

collectdpluginhugepagesvalues_pages

collectdpluginhugepagesvalues_bytes

collectdpluginhugepagesvalues_percentage

collectdpluginhugepagesinterval

collectd-intel_rdt

collectd-interface

collectdplugininterfaceinterfaces

collectdplugininterfaceignoreselected

collectdplugininterfacereportinactive

Collectdplugininterfaceinterval

collectd-ipc

None

collectd-ipmi

collectdpluginipmiignore_selected

collectdpluginipminotify_sensor_add

collectdpluginipminotify_sensor_remove

collectdpluginipminotify_sensor_not_present

collectdpluginipmisensors

collectdpluginipmiinterval

collectd-irq

collectdpluginirqirqs

collectdpluginirqignoreselected

collectdpluginirqinterval

collectd-load

collectdpluginloadreport_relative

collectdpluginloadinterval

collectd-logfile

collectdpluginlogfilelog_level

Red Hat OpenStack Platform 160 Service Telemetry Framework

50

collectdpluginlogfilelog_level

collectdpluginlogfilelog_file

collectdpluginlogfilelog_timestamp

collectdpluginlogfileprint_severity

collectdpluginlogfileinterval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectdpluginmemcachedinstances

collectdpluginmemcachedinterval

collectd-memory

collectdpluginmemoryvaluesabsolute

collectdpluginmemoryvaluespercentage

collectdpluginmemoryinterval collectd-multimeter

collectd-multimeter

collectd-mysql

collectdpluginmysqlinterval

collectd-netlink

collectdpluginnetlinkinterfaces

collectdpluginnetlinkverboseinterfaces

collectdpluginnetlinkqdiscs

collectdpluginnetlinkclasses

collectdpluginnetlinkfilters

collectdpluginnetlinkignoreselected

collectdpluginnetlinkinterval

collectd-network

collectdpluginnetworktimetolive

collectdpluginnetworkmaxpacketsize

APPENDIX A COLLECTD PLUG-INS

51

collectdpluginnetworkforward

collectdpluginnetworkreportstats

collectdpluginnetworklisteners

collectdpluginnetworkservers

collectdpluginnetworkinterval

collectd-nfs

collectdpluginnfsinterval

collectd-ntpd

collectdpluginntpdhost

collectdpluginntpdport

collectdpluginntpdreverselookups

collectdpluginntpdincludeunitid

collectdpluginntpdinterval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectdpluginopenvpnstatusfile

collectdpluginopenvpnimprovednamingschema

collectdpluginopenvpncollectcompression

collectdpluginopenvpncollectindividualusers

collectdpluginopenvpncollectusercount

collectdpluginopenvpninterval

collectd-ovs_events

collectdpluginovs_eventsaddress

collectdpluginovs_eventsdispatch

collectdpluginovs_eventsinterfaces

collectdpluginovs_eventssend_notification

Red Hat OpenStack Platform 160 Service Telemetry Framework

52

collectdpluginovs_events$port

collectdpluginovs_eventssocket

collectd-ovs_stats

collectdpluginovs_statsaddress

collectdpluginovs_statsbridges

collectdpluginovs_statsport

collectdpluginovs_statssocket

collectd-ping

collectdpluginpinghosts

collectdpluginpingtimeout

collectdpluginpingttl

collectdpluginpingsource_address

collectdpluginpingdevice

collectdpluginpingmax_missed

collectdpluginpingsize

collectdpluginpinginterval

collectd-powerdns

collectdpluginpowerdnsinterval

collectdpluginpowerdnsservers

collectdpluginpowerdnsrecursors

collectdpluginpowerdnslocal_socket

collectdpluginpowerdnsinterval

collectd-processes

collectdpluginprocessesprocesses

collectdpluginprocessesprocess_matches

collectdpluginprocessescollect_context_switch

collectdpluginprocessescollect_file_descriptor

collectdpluginprocessescollect_memory_maps

collectdpluginpowerdnsinterval

APPENDIX A COLLECTD PLUG-INS

53

collectd-protocols

collectdpluginprotocolsignoreselected

collectdpluginprotocolsvalues

collectd-python

collectd-serial

collectd-smart

collectdpluginsmartdisks

collectdpluginsmartignoreselected

collectdpluginsmartinterval

collectd-snmp_agent

collectd-statsd

collectdpluginstatsdhost

collectdpluginstatsdport

collectdpluginstatsddeletecounters

collectdpluginstatsddeletetimers

collectdpluginstatsddeletegauges

collectdpluginstatsddeletesets

collectdpluginstatsdcountersum

collectdpluginstatsdtimerpercentile

collectdpluginstatsdtimerlower

collectdpluginstatsdtimerupper

collectdpluginstatsdtimersum

collectdpluginstatsdtimercount

collectdpluginstatsdinterval

collectd-swap

collectdpluginswapreportbydevice

collectdpluginswapreportbytes

collectdpluginswapvaluesabsolute

Red Hat OpenStack Platform 160 Service Telemetry Framework

54

collectdpluginswapvaluespercentage

collectdpluginswapreportio

collectdpluginswapinterval

collectd-syslog

collectdpluginsysloglog_level

collectdpluginsyslognotify_level

collectdpluginsysloginterval

collectd-table

collectdplugintabletables

collectdplugintableinterval

collectd-tail

collectdplugintailfiles

collectdplugintailinterval

collectd-tail_csv

collectdplugintail_csvmetrics

collectdplugintail_csvfiles

collectd-tcpconns

collectdplugintcpconnslocalports

collectdplugintcpconnsremoteports

collectdplugintcpconnslistening

collectdplugintcpconnsallportssummary

collectdplugintcpconnsinterval

collectd-ted

collectd-thermal

collectdpluginthermaldevices

collectdpluginthermalignoreselected

collectdpluginthermalinterval

collectd-threshold

APPENDIX A COLLECTD PLUG-INS

55

collectdpluginthresholdtypes

collectdpluginthresholdplugins

collectdpluginthresholdhosts

collectdpluginthresholdinterval

collectd-turbostat

collectdpluginturbostatcore_c_states

collectdpluginturbostatpackage_c_states

collectdpluginturbostatsystem_management_interrupt

collectdpluginturbostatdigital_temperature_sensor

collectdpluginturbostattcc_activation_temp

collectdpluginturbostatrunning_average_power_limit

collectdpluginturbostatlogical_core_names

collectd-unixsock

collectd-uptime

collectdpluginuptimeinterval

collectd-users

collectdpluginusersinterval

collectd-uuid

collectdpluginuuiduuid_file

collectdpluginuuidinterval

collectd-virt

collectdpluginvirtconnection

collectdpluginvirtrefresh_interval

collectdpluginvirtdomain

collectdpluginvirtblock_device

collectdpluginvirtinterface_device

collectdpluginvirtignore_selected

collectdpluginvirthostname_format

Red Hat OpenStack Platform 160 Service Telemetry Framework

56

collectdpluginvirtinterface_format

collectdpluginvirtextra_stats

collectdpluginvirtinterval

collectd-vmem

collectdpluginvmemverbose

collectdpluginvmeminterval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectdpluginwrite_graphitecarbons

collectdpluginwrite_graphitecarbon_defaults

collectdpluginwrite_graphiteglobals

collectd-write_kafka

collectdpluginwrite_kafkakafka_host

collectdpluginwrite_kafkakafka_port

collectdpluginwrite_kafkakafka_hosts

collectdpluginwrite_kafkatopics

collectd-write_log

collectdpluginwrite_logformat

collectd-zfs_arc

None

APPENDIX A COLLECTD PLUG-INS

57

  • Table of Contents
  • CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
    • 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
    • 12 INSTALLATION SIZE
      • CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
        • 21 THE CORE COMPONENTS OF STF
        • 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
          • 221 Persistent volumes
            • 2211 Using ephemeral storage
              • 222 Resource allocation
              • 223 Node tuning operator
                • 23 DEPLOYING STF TO THE OCP ENVIRONMENT
                  • 231 Deploying STF to the OCP environment with ElasticSearch
                  • 232 Deploying STF to the OCP environment without ElasticSearch
                  • 233 Creating a namespace
                  • 234 Creating an OperatorGroup
                  • 235 Enabling the OperatorHubio Community Catalog Source
                  • 236 Enabling Red Hat STF Operator Source
                  • 237 Subscribing to the AMQ Certificate Manager Operator
                  • 238 Subscribing to the Elastic Cloud on Kubernetes Operator
                  • 239 Subscribing to the Service Telemetry Operator
                  • 2310 Creating a ServiceTelemetry object in OCP
                    • 24 REMOVING STF FROM THE OCP ENVIRONMENT
                      • 241 Deleting the namespace
                      • 242 Removing the OperatorSource
                          • CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
                            • 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
                            • 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
                            • 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
                              • 331 Retrieving the AMQ Interconnect route address
                              • 332 Configuring the STF connection for the overcloud
                              • 333 Validating client-side installation
                                  • CHAPTER 4 ADVANCED FEATURES
                                    • 41 CUSTOMIZING THE DEPLOYMENT
                                      • 411 Manifest override parameters
                                      • 412 Overriding a managed manifest
                                        • Procedure
                                            • 42 ALERTS
                                              • Additional resources
                                              • 421 Creating an alert rule in Prometheus
                                                • Procedure
                                                • Additional resources
                                                  • 422 Configuring custom alerts
                                                    • Procedure
                                                    • Additional resources
                                                      • 423 Creating an alert route in Alertmanager
                                                        • Procedure
                                                        • Additional resources
                                                            • 43 HIGH AVAILABILITY
                                                              • 431 Configuring high availability
                                                                • Procedure
                                                                    • 44 DASHBOARDS
                                                                      • 441 Setting up Grafana to host the dashboard
                                                                        • 4411 Viewing and editing queries
                                                                          • 442 The Grafana infrastructure dashboard
                                                                            • 4421 Top panels
                                                                            • 4422 Networking panels
                                                                            • 4423 CPU panels
                                                                            • 4424 Memory panels
                                                                            • 4425 Diskfile system
                                                                                • 45 CONFIGURING MULTIPLE CLOUDS
                                                                                  • 451 Planning AMQP address prefixes
                                                                                  • 452 Deploying Smart Gateways
                                                                                    • Procedure
                                                                                    • 4521 Example manifests
                                                                                      • 453 Creating the OpenStack environment file
                                                                                        • Procedure
                                                                                        • Additional resources
                                                                                          • 454 Querying metrics data from multiple clouds
                                                                                            • 46 EPHEMERAL STORAGE
                                                                                              • 461 Configuring ephemeral storage
                                                                                                  • APPENDIX A COLLECTD PLUG-INS
Page 11: Red Hat OpenStack Platform 16 · CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK 3.2. DEPLOYING WITH ROUTED L3 NETWORKS 3.3. CONFIGURING RED HAT OPENSTACK PLATFORM

CHAPTER 2 INSTALLING THE CORE COMPONENTS OFSERVICE TELEMETRY FRAMEWORK

Before you install Service Telemetry Framework (STF) ensure thatRed Hat OpenShift Container Platform (OCP) version 4x is running and that you understand the corecomponents of the framework As part of the OCP installation planning process ensure that theadministrator provides persistent storage and enough resources to run the STF components on top ofthe OCP environment

WARNING

Red Hat OpenShift Container Platform version 43 or later is currently required for asuccessful installation of STF

21 THE CORE COMPONENTS OF STF

The following STF core components are managed by Operators

Prometheus and AlertManager

ElasticSearch

Smart Gateway

AMQ Interconnect

Each component has a corresponding Operator that you can use to load the various applicationcomponents and objects

Additional resources

For more information about Operators see the Understanding Operators guide

22 PREPARING YOUR OCP ENVIRONMENT FOR STF

As you prepare your OCP environment for STF you must plan for persistent storage adequateresources and event storage

Ensure that persistent storage is available in your Red Hat OpenShift Container Platform clusterto permit a production grade deployment For more information see Section 221 ldquoPersistentvolumesrdquo

Ensure that enough resources are available to run the Operators and the application containersFor more information see Section 222 ldquoResource allocationrdquo

To install ElasticSearch you must use a community catalog source If you do not want to use acommunity catalog or if you do not want to store events see Section 23 ldquoDeploying STF to theOCP environmentrdquo

STF uses ElasticSearch to store events which requires a larger than normal

Red Hat OpenStack Platform 160 Service Telemetry Framework

8

vmmax_map_count The vmmax_map_count value is set by default inRed Hat OpenShift Container Platform For more information about how to edit the value of vmmax_map_count see Section 223 ldquoNode tuning operatorrdquo

221 Persistent volumes

STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus andElasticSearch can store metrics and events

Additional resources

For more information about configuring persistent storage for OCP see Understanding persistentstorage

2211 Using ephemeral storage

WARNING

You can use ephemeral storage with STF However if you use ephemeral storageyou might experience data loss if a pod is restarted updated or rescheduled ontoanother node Use ephemeral storage only for development or testing and notproduction environments

Procedure

To enable ephemeral storage for STF set storageEphemeralEnabled true in your ServiceTelemetry manifest

Additional resources

For more information about enabling ephemeral storage for STF see Section 461 ldquoConfiguringephemeral storagerdquo

222 Resource allocation

To enable the scheduling of pods within the OCP infrastructure you need resources for thecomponents that are running If you do not allocate enough resources pods remain in a Pending statebecause they cannot be scheduled

The amount of resources that you require to run STF depends on your environment and the number ofnodes and clouds that you want to monitor

Additional resources

For recommendations about sizing for metrics collection seehttpsaccessredhatcomarticles4907241

For information about sizing requirements for ElasticSearch seehttpswwwelasticcoguideencloud-on-k8scurrentk8s-managing-compute-resourceshtml

223 Node tuning operator

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

9

STF uses ElasticSearch to store events which requires a larger than normal vmmax_map_count The vmmax_map_count value is set by default in Red Hat OpenShift Container Platform

If you want to edit the value of vmmax_map_count you cannot apply node tuning manually using the sysctl command because Red Hat OpenShift Container Platform manages nodes directly To configurevalues and apply them to the infrastructure you must use the node tuning operator For moreinformation see Using the Node Tuning Operator

In an OCP deployment the default node tuning operator specification provides the required profiles forElasticSearch workloads or pods scheduled on nodes To view the default cluster node tuningspecification run the following command

The output of the default specification is documented at Default profiles set on a cluster Theassignment of profiles is managed in the recommend section where profiles are applied to a node whencertain conditions are met When scheduling ElasticSearch to a node in STF one of the following profilesis applied

openshift-control-plane-es

openshift-node-es

When scheduling an ElasticSearch pod there must be a label present that matches tunedopenshiftioelasticsearch If the label is present one of the two profiles is assigned to the podNo action is required by the administrator if you use the recommended Operator for ElasticSearch Ifyou use a custom-deployed ElasticSearch with STF ensure that you add the tunedopenshiftioelasticsearch label to all scheduled pods

Additional resources

For more information about virtual memory usage by ElasticSearch seehttpswwwelasticcoguideenelasticsearchreferencecurrentvm-max-map-counthtml

For more information about how the profiles are applied to nodes see Custom tuning specification

23 DEPLOYING STF TO THE OCP ENVIRONMENT

You can deploy STF to the OCP environment in one of two ways

Deploy STF and store events with ElasticSearch For more information see Section 231ldquoDeploying STF to the OCP environment with ElasticSearchrdquo

Deploy STF without ElasticSearch and disable events support For more information seeSection 232 ldquoDeploying STF to the OCP environment without ElasticSearchrdquo

231 Deploying STF to the OCP environment with ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 235 ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo

oc get Tuneddefault -o yaml -n openshift-cluster-node-tuning-operator

Red Hat OpenStack Platform 160 Service Telemetry Framework

10

4 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

5 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

6 Section 238 ldquoSubscribing to the Elastic Cloud on Kubernetes Operatorrdquo

7 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

8 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

232 Deploying STF to the OCP environment without ElasticSearch

Complete the following tasks

1 Section 233 ldquoCreating a namespacerdquo

2 Section 234 ldquoCreating an OperatorGrouprdquo

3 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo

4 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo

5 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo

6 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo

233 Creating a namespace

Create a namespace to hold the STF components The service-telemetry namespace is usedthroughout the documentation

Procedure

Enter the following command

234 Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods

Procedure

Enter the following command

oc new-project service-telemetry

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorGroupmetadata name service-telemetry-operator-group namespace service-telemetryspec targetNamespaces - service-telemetryEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

11

Additional resources

For more information see OperatorGroups

235 Enabling the OperatorHubio Community Catalog Source

Before you install ElasticSearch you must have access to the resources on the OperatorHubioCommunity Catalog Source

Procedure

Enter the following command

236 Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform you must enable the operatorsource

Procedure

1 Install an OperatorSource that contains the Service Telemetry Operator and the SmartGateway Operator

2 To validate the creation of your OperatorSource use the oc get operatorsources command A

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind CatalogSourcemetadata name operatorhubio-operators namespace openshift-marketplacespec sourceType grpc image quayiooperator-frameworkupstream-community-operatorslatest displayName OperatorHubio Operators publisher OperatorHubioEOF

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorSourcemetadata labels opsrc-provider redhat-operators-stf name redhat-operators-stf namespace openshift-marketplacespec authorizationToken displayName Red Hat STF Operators endpoint httpsquayiocnr publisher Red Hat registryNamespace redhat-operators-stf type appregistryEOF

Red Hat OpenStack Platform 160 Service Telemetry Framework

12

2 To validate the creation of your OperatorSource use the oc get operatorsources command Asuccessful import results in the MESSAGE field returning a result of The object has been successfully reconciled

3 To validate that the Operators are available from the catalog use the oc get packagemanifestcommand

237 Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STFcomponents because the AMQ Certificate Manager Operator runs globally-scoped and is notcompatible with the dependency management of Operator Lifecycle Manager when used with othernamespace-scoped operators

Procedure

1 Subscribe to the AMQ Certificate Manager Operator create the subscription and validate theAMQ7 Certificate Manager

NOTE

The AMQ Certificate Manager is installed globally for all namespaces so the namespace value provided is openshift-operators You might not see your amq7-cert-managerv100 ClusterServiceVersion in the service-telemetrynamespace for a few minutes until the processing executes against thenamespace

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME TYPE ENDPOINT REGISTRY DISPLAYNAME PUBLISHER STATUS MESSAGEredhat-operators-stf appregistry httpsquayiocnr redhat-operators-stf Red Hat STF Operators Red Hat Succeeded The object has been successfully reconciled

$ oc get packagemanifests | grep Red Hat STF

smartgateway-operator Red Hat STF Operators 2m50sservicetelemetry-operator Red Hat STF Operators 2m50s

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name amq7-cert-manager namespace openshift-operatorsspec channel alpha installPlanApproval Automatic name amq7-cert-manager source redhat-operators sourceNamespace openshift-marketplaceEOF

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

13

2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-managerv100 has a phase Succeeded

238 Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator and if you plan to store events in ElasticSearch youmust enable the Elastic Cloud Kubernetes Operator

Procedure

1 Apply the following manifest to your OCP environment to enable the Elastic Cloud onKubernetes Operator

2 To verify that the ClusterServiceVersion for ElasticSearch Cloud on Kubernetes succeededenter the oc get csv command

239 Subscribing to the Service Telemetry Operator

To instantiate an STF instance create the ServiceTelemetry object to allow the Service TelemetryOperator to create the environment

Procedure

1 To create the Service Telemetry Operator subscription enter the oc apply -f command

$ oc get --namespace openshift-operators csv

NAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name elastic-cloud-eck namespace service-telemetryspec channel stable installPlanApproval Automatic name elastic-cloud-eck source operatorhubio-operators sourceNamespace openshift-marketplaceEOF

$ oc get csv

NAME DISPLAY VERSION REPLACES PHASEelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeeded

oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1

Red Hat OpenStack Platform 160 Service Telemetry Framework

14

2 To validate the Service Telemetry Operator and the dependent operators enter the followingcommand

2310 Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework you must create an instance of ServiceTelemetry in OCPBy default eventsEnabled is set to false If you do not want to store events in ElasticSearch ensurethat eventsEnabled is set to false For more information see Section 232 ldquoDeploying STF to the OCPenvironment without ElasticSearchrdquo

The following core parameters are available for a ServiceTelemetry manifest

Table 21 Core parameters for a ServiceTelemetry manifest

Parameter Description Default Value

eventsEnabled Enable events support in STFRequires prerequisite steps toensure ElasticSearch can bestarted For more information seeSection 238 ldquoSubscribing to theElastic Cloud on KubernetesOperatorrdquo

false

metricsEnabled Enable metrics support in STF true

kind Subscriptionmetadata name servicetelemetry-operator namespace service-telemetryspec channel stable installPlanApproval Automatic name servicetelemetry-operator source redhat-operators-stf sourceNamespace openshift-marketplaceEOF

$ oc get csv --namespace service-telemetryNAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeededamq7-interconnect-operatorv120 Red Hat Integration - AMQ Interconnect 120 Succeededelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeededprometheusoperator0370 Prometheus Operator 0370 prometheusoperator0320 Succeededservice-telemetry-operatorv102 Service Telemetry Operator 102 service-telemetry-operatorv101 Succeededsmart-gateway-operatorv101 Smart Gateway Operator 101 smart-gateway-operatorv100 Succeeded

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

15

highAvailabilityEnabled Enable high availability in STF Formore information see Section 43ldquoHigh availabilityrdquo

false

storageEphemeralEnabled Enable ephemeral storagesupport in STF For moreinformation see Section 46ldquoEphemeral storagerdquo

false

Parameter Description Default Value

Procedure

1 To store events in ElasticSearch set eventsEnabled to true during deployment

2 To view the STF deployment logs in the Service Telemetry Operator use the oc logscommand

PLAY RECAP localhost ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3 View the pods and the status of each pod to determine that all workloads are operatingnominally

NOTE

If you set eventsEnabled true the notification Smart Gateways will Error andCrashLoopBackOff for a period of time before ElasticSearch starts

oc apply -f - ltltEOFapiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec eventsEnabled true metricsEnabled trueEOF

oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

$ oc get pods

NAME READY STATUS RESTARTS AGEalertmanager-stf-default-0 22 Running 0 26melastic-operator-645dc8b8ff-jwnzt 11 Running 0 88melasticsearch-es-default-0 11 Running 0 26minterconnect-operator-6fd49d9fb9-4bl92 11 Running 0

Red Hat OpenStack Platform 160 Service Telemetry Framework

16

24 REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality

Complete the following tasks

1 Section 241 ldquoDeleting the namespacerdquo

2 Section 242 ldquoRemoving the OperatorSourcerdquo

241 Deleting the namespace

To remove the operational resources for STF from OCP delete the namespace

Procedure

1 Run the oc delete command

2 Verify that the resources have been deleted from the namespace

242 Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again delete the OperatorSourceWhen you remove the OperatorSource PackageManifests related to STF are removed from theOperator Lifecycle Manager catalog

Procedure

1 Delete the OperatorSource

46mprometheus-operator-bf7d97fb9-kwnlx 11 Running 0 46mprometheus-stf-default-0 33 Running 0 26mservice-telemetry-operator-54f4c99d9b-k7ll6 22 Running 0 46msmart-gateway-operator-7ff58bcf94-66rvx 22 Running 0 46mstf-default-ceilometer-notification-smartgateway-6675df547q4lbj 11 Running 0 26mstf-default-collectd-notification-smartgateway-698c87fbb7-xj528 11 Running 0 26mstf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn 11 Running 0 26mstf-default-interconnect-7458fd4d69-nqbfs 11 Running 0 26m

oc delete project service-telemetry

$ oc get allNo resources found

CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

17

2 Verify that the STF PackageManifests are removed from the platform If successful thefollowing command returns no result

3 If you enabled the OperatorHubio Community Catalog Source during the installationprocess and you no longer need this catalog source delete it

Additional resources

For more information about the OperatorHubio Community Catalog Source see Section 23ldquoDeploying STF to the OCP environmentrdquo

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stfoperatorsourceoperatorscoreoscom redhat-operators-stf deleted

$ oc get packagemanifests | grep Red Hat STF

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operatorscatalogsourceoperatorscoreoscom operatorhubio-operators deleted

Red Hat OpenStack Platform 160 Service Telemetry Framework

18

CHAPTER 3 COMPLETING THESERVICE TELEMETRY FRAMEWORK CONFIGURATION

31 CONNECTING RED HAT OPENSTACK PLATFORM TOSERVICE TELEMETRY FRAMEWORK

To collect metrics events or both and to send them to the Service Telemetry Framework (STF)storage domain you must configure the Red Hat OpenStack Platform overcloud to enable datacollection and transport

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes thatemploy routed L3 domains such as distributed compute node (DCN) or spine-leaf see Section 32ldquoDeploying to non-standard network topologiesrdquo

32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network you must makeconfiguration adjustments so that AMQ Interconnect can transport data to theService Telemetry Framework (STF) server instance This scenario is typical in a spine-leaf or aDCN topology For more information about DCN configuration see the Spine Leaf Networkingguide

If you use STF with Red Hat OpenStack Platform 160 and plan to monitor your Ceph Block orObject storage nodes you must make configuration changes that are similar to the configurationchanges that you make to the spine-leaf and DCN network configuration To monitor Ceph nodesuse the CephStorageExtraConfig parameter to define which network interface to load into theAMQ Interconnect and collectd configuration files

Similarly you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters ifyour environment uses Block and Object storage roles

The deployment of a spine-leaf topology involves creating roles and networks then assigningthose networks to the available roles When you configure data collection and transport for STF foran Red Hat OpenStack Platform deployment the default network for roles is InternalApi For CephBlock and Object storage roles the default network is Storage Because a spine-leaf configurationcan result in different networks being assigned to different Leaf groupings and those names aretypically unique additional configuration is required in the parameter_defaults section of theRed Hat OpenStack Platform environment files

Procedure

1 Document which networks are available for each of the Leaf roles For examples of networkname definitions see Creating a network data file in the Spine Leaf Networking guide Formore information about the creation of the Leaf groupings (roles) and assignment of thenetworks to those groupings see Creating a roles data file in the Spine Leaf Networkingguide

CephStorageExtraConfig tripleoprofilebasemetricscollectdamqp_host hiera(storage) tripleoprofilebasemetricsqdrlistener_addr hiera(storage) tripleoprofilebaseceilometeragentnotificationnotifier_host_addr hiera(storage)

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

19

2 Add the following configuration example to the ExtraConfig section for each of the leafroles In this example internal_api_subnet is the value defined in the name_lowerparameter of your network definition (with _subnet appended to the name for Leaf 0) andis the network to which the ComputeLeaf0 leaf role is connected In this case the networkidentification of 0 corresponds to the Compute role for leaf 0 and represents a value thatis different from the default internal API network nameFor the ComputeLeaf0 leaf role specify extra configuration to perform a hiera lookup todetermine which network interface for a particular network to assign to the collectd AMQPhost parameter Perform the same configuration for the AMQ Interconnect listeneraddress parameter

Additional leaf roles typically replace _subnet with _leafN where N represents a uniqueindentifier for the leaf

This example configuration is on a CephStorage leaf role

33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUDFOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud you must configure the data collectionapplications and the data transport to STF and deploy the overcloud

To configure the Red Hat OpenStack Platform overcloud complete the following tasks

1 Section 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

2 Section 332 ldquoConfiguring the STF connection for the overcloudrdquo

3 Section 333 ldquoValidating client-side installationrdquo

331 Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF you must provide theAMQ Interconnect route address in the STF connection file

Procedure

1 Log in to your Red Hat OpenShift Container Platform (OCP) environment

2 In the service-telemetry project retrieve the AMQ Interconnect route address

ComputeLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_subnet)

ComputeLeaf1ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_leaf1)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_leaf1)

CephStorageLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(storage_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(storage_subnet)

Red Hat OpenStack Platform 160 Service Telemetry Framework

20

NOTE

If your STF installation differs from the documentation ensure that youretrieve the correct AMQ Interconnect route address

332 Configuring the STF connection for the overcloud

To configure the STF connection you must create a file that contains the connection configurationof the AMQ Interconnect for the overcloud to the STF deployment Enable the collection of eventsand storage of the events in STF and deploy the overcloud

Procedure

1 Log in to the Red Hat OpenStack Platform undercloud as the stack user

2 Create a configuration file called stf-connectorsyaml in the homestack directory

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all dataingestion and data storage components for single cloud deployments Toshare the data storage domain with multiple clouds see Section 45ldquoConfiguring multiple cloudsrdquo

3 In the stf-connectorsyaml file configure the MetricsQdrConnectors address to connectthe AMQ Interconnect on the overcloud to the STF deployment

Add the CeilometerQdrPublishEvents true parameter to enable collection andtransport of Ceilometer events to STF

Replace the host parameter with the value of HOSTPORT that you retrieved inSection 331 ldquoRetrieving the AMQ Interconnect route addressrdquo

4 Add the following files to your Red Hat OpenStack Platform director deployment to setupcollectd and AMQ Interconnect

the stf-connectorsyaml environment file

the enable-stfyaml file that ensures that the environment is being used during theovercloud deployment

$ oc get routes -ogo-template= range items printf sn spechost end | grep -5671stf-default-interconnect-5671-service-telemetryappsinfrawatch

parameter_defaults EventPipelinePublishers [] CeilometerQdrPublishEvents true MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch port 443 role edge sslProfile sslProfile verifyHostname false

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

21

the ceilometer-write-qdryaml file that ensures that Ceilometer telemetry is sent to STF

5 Deploy the Red Hat OpenStack Platform overcloud

333 Validating client-side installation

To validate data collection from the STF storage domain query the data sources for delivereddata To validate individual nodes in the Red Hat OpenStack Platform deployment connect to theconsole using SSH

Procedure

1 Log in to an overcloud node for example controller-0

2 Ensure that metrics_qdr container is running on the node

3 Return the internal network address on which AMQ Interconnect is running for example 17217144 listening on port 5666

4 Return a list of connections to the local AMQ Interconnect

openstack overcloud deploy ltother argumentsgt --templates usrshareopenstack-tripleo-heat-templates --environment-file ltother-environment-filesgt --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsmetricsceilometer-write-qdryaml --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsenable-stfyaml --environment-file homestackstf-connectorsyaml

$ sudo podman container inspect --format StateStatus metrics_qdr

running

$ sudo podman exec -it metrics_qdr cat etcqpid-dispatchqdrouterdconf

listener host 17217144 port 5666 authenticatePeer no saslMechanisms ANONYMOUS

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --connections

Connections id host container role dir security authentication tenant ============================================================================================================================================================================================================================================================================================ 1 stf-default-interconnect-5671-service-telemetryappsinfrawatch443 stf-default-

Red Hat OpenStack Platform 160 Service Telemetry Framework

22

There are four connections

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat clientThe outbound STF connection is provided to the MetricsQdrConnectors hostparameter and is the route for the STF storage domain The other hosts are internalnetwork addresses of the client connections to this AMQ Interconnect

5 To ensure that messages are being delivered list the links and view the _edge address inthe deliv column for delivery of messages

interconnect-7458fd4d69-bgzfb edge out TLSv12(DHE-RSA-AES256-GCM-SHA384) anonymous-user 12 1721714460290 openstackorgomcontainercontroller-0ceilometer-agent-notification255c02cee550f143ec9ea030db5cccba14 normal in no-security no-auth 16 1721714436408 metrics normal in no-security anonymous-user 899 1721714439500 10a2e99d-1b8a-4329-b48c-4335e5f75c84 normal in no-security no-auth

$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --linksRouter Links type dir conn id id peer class addr phs cap pri undel unsett deliv presett psdrop acc rej rel mod delay rate =========================================================================================================================================================== endpoint out 1 5 local _edge 250 0 0 0 2979926 2979924 0 0 0 2 0 0 0 endpoint in 1 6 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 7 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 8 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 9 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 10 250 0 0 0 911 911 0 0 0 0 0 911 0 endpoint in 1 11 250 0 0 0 0 911 0 0 0 0 0 0 0 endpoint out 12 32 local templSY6Mcicol4J2Kp 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 16 41 250 0 0 0 2979924 2979924 0 0 0 0 0 0 0 endpoint in 912 1834 mobile $management 0 250 0 0 0 1

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

23

6 To list the addresses from Red Hat OpenStack Platform nodes to STF connect to OCP toget the AMQ Interconnect pod name and list the connections List the availableAMQ Interconnect pods

7 Connect to the pod and run the qdstat --connections command to list the knownconnections

In this example there are three edge connections from the Red Hat OpenStack Platformnodes with connection id 22 23 and 24

8 To view the number of messages delivered by the network use each address with the oc exec command

0 0 1 0 0 0 0 0 endpoint out 912 1835 local temp9Ok2resI9tmt+CT 250 0 0 0 0 0 0 0 0 0 0 0 0

$ oc get pods -l application=stf-default-interconnect

NAME READY STATUS RESTARTS AGEstf-default-interconnect-7458fd4d69-bgzfb 11 Running 0 6d21h

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 182547243852 UTCstf-default-interconnect-7458fd4d69-bgzfb

Connections id host container role dir security authentication tenant last dlv uptime ====================================================================================================================================================================================================== 1 1012922143062 rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv] normal in no-security anonymous-user 000000000 006215025 2 1013005255754 rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5] normal in no-security anonymous-user 000212557 006214936 3 1013005143110 rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6] normal in no-security anonymous-user 000213653 006214909 22 101280151948 Routerceph-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 23 101280151950 Routercompute-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 24 101280152082 Routercontroller-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000000 000220834 27 12700142202 c2f541c1-4c97-4b37-a189-a396c08fb079 normal in no-security no-auth 000000000 000000000

Red Hat OpenStack Platform 160 Service Telemetry Framework

24

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 182010293258 UTCstf-default-interconnect-7458fd4d69-bgzfb

Router Addresses class addr phs distrib pri local remote in out thru fallback ==================================================================================================================== mobile anycastceilometereventsample 0 balanced - 1 0 1553 1553 0 0 mobile collectdnotify 0 multicast - 1 0 10 10 0 0 mobile collectdtelemetry 0 multicast - 1 0 7798049 7798049 0 0

CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

25

CHAPTER 4 ADVANCED FEATURESThe following optional features can provide additional functionality to theService Telemetry Framework (STF)

Customizing the deployment For more information see Section 41 ldquoCustomizing thedeploymentrdquo

Alerts For more information see Section 42 ldquoAlertsrdquo

High availability For more information see Section 43 ldquoHigh availabilityrdquo

Dashboards For more information see Section 44 ldquoDashboardsrdquo

Multiple clouds For more information see Section 45 ldquoConfiguring multiple cloudsrdquo

Ephemeral storage For more information see Section 46 ldquoEphemeral storagerdquo

41 CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load intoRed Hat OpenShift Container Platform (OCP) The Operator then creates other objects in memorywhich results in the dependent Operators creating the workloads they are responsible formanaging

WARNING

When you override the manifest you must provide the entire manifest contentsincluding object names or namespaces There is no dynamic parametersubstitution when you override a manifest

To override a manifest successfully with Service Telemetry Framework (STF) deploy a defaultenvironment using the core options only For more information about the core options seeSection 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo When you deploy STF use the oc getcommand to retrieve the default deployed manifest When you use a manifest that was originallygenerated by Service Telemetry Operator the manifest is compatible with the other objects thatare managed by the Operators

For example when the metricsEnabled true parameter is configured in the ServiceTelemetrymanifest the Service Telemetry Operator requests components for metrics retrieval and storageusing the default manifests In some cases you might want to override the default manifest Formore information see Section 411 ldquoManifest override parametersrdquo

411 Manifest override parameters

This table describes the available parameters that you can use to override a manifest along withthe corresponding retrieval commands

Table 41 Manifest override parameters

Red Hat OpenStack Platform 160 Service Telemetry Framework

26

Override parameter Description Retrieval command

alertmanagerManifest Override the Alertmanagerobject creation The PrometheusOperator watches for Alertmanager objects

oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest Override the Secret thatcontains the Alertmanagerconfiguration The PrometheusOperator uses a secret named alertmanager- alertmanager-name forexample stf-default to providethe alertmanageryamlconfiguration to Alertmanager

oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest Override the ElasticSearchobject creation The Elastic Cloudon Kuberneters Operator watchesfor ElasticSearch objects

oc get elasticsearch elasticsearch -oyaml

interconnectManifest Override the Interconnectobject creation The AMQInterconnect Operator watchesfor Interconnect objects

oc get interconnect stf-default-interconnect -oyaml

prometheusManifest Override the Prometheusobject creation The PrometheusOperator watches for Prometheus objects

oc get prometheus stf-default -oyaml

servicemonitorManifest Override the ServiceMonitorobject creation The PrometheusOperator watches for ServiceMonitor objects

oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest

Override the SmartGatewayobject creation for collectdmetrics The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest

Override the SmartGatewayobject creation for collectdevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-collectd-notification -oyaml

CHAPTER 4 ADVANCED FEATURES

27

smartgatewayCeilometerEventsManifest

Override the SmartGatewayobject creation for Ceilometerevents The Smart GatewayOperator watches for SmartGateway objects

oc get smartgateway stf-default-ceilometer-notification -oyaml

Override parameter Description Retrieval command

412 Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest For a list of availablemanifest override parameters see Section 41 ldquoCustomizing the deploymentrdquo The default ServiceTelemetry object is stf-default Use oc get servicetelemetry to list the available STFdeployments

TIP

The oc edit command loads the default system editor To override the default editor pass or setthe environment variable EDITOR to the preferred editor For example EDITOR=nano oc edit servicetelemetry stf-default

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Load the ServiceTelemetry object into an editor

4 To modify the ServiceTelemetry object provide a manifest override parameter and thecontents of the manifest to write to OCP instead of the defaults provided by STF

NOTE

The trailing pipe (|) after entering the manifest override parameter indicatesthat the value provided is multi-line

oc project service-telemetry

oc edit servicetelemetry stf-default

$ oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata annotations kubectlkubernetesiolast-applied-configuration | apiVersioninfrawatchv1alpha1kindServiceTelemetrymetadataannotationsnamestf-defaultnamespaceservice-telemetryspecmetricsEnabledtrue creationTimestamp 2020-04-14T202942Z

Red Hat OpenStack Platform 160 Service Telemetry Framework

28

1

2

Manifest override parameter is defined in the spec of the ServiceTelemetry object

End of the manifest override content

5 Save and close

42 ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager Alert rules in Prometheusservers send alerts to an Alertmanager which manages the alerts Alertmanager can silenceinhibit or aggregate alerts and send notifications using email on-call notification systems or chatplatforms

To create an alert complete the following tasks

1 Create an alert rule in Prometheus For more information see Section 421 ldquoCreating analert rule in Prometheusrdquo

generation 1 name stf-default namespace service-telemetry resourceVersion 1949423 selfLink apisinfrawatchv1alpha1namespacesservice-telemetryservicetelemetrysstf-default uid d058bc41-1bb0-49f5-9a8b-642f4b8adb95spec metricsEnabled true smartgatewayCollectdMetricsManifest | 1 apiVersion smartgatewayinfrawatchv2alpha1 kind SmartGateway metadata name stf-default-collectd-telemetry namespace service-telemetry spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdtelemetry debug true prefetch 15000 serviceType metrics size 1 useTimestamp true 2status conditions - ansibleResult changed 0 completion 2020-04-14T203219079508 failures 0 ok 52 skipped 1 lastTransitionTime 2020-04-14T202959Z message Awaiting next reconciliation reason Successful status True type Running

CHAPTER 4 ADVANCED FEATURES

29

2 Create an alert route in Alertmanager For more information see Section 423 ldquoCreating analert route in Alertmanagerrdquo

Additional resourcesFor more information about alerts or notifications with Prometheus and Alertmanager seehttpsprometheusiodocsalertingoverview

To view an example set of alerts that you can use with Service Telemetry Framework (STF) seehttpsgithubcominfrawatchservice-telemetry-operatortreemasterdeployalerts

421 Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications If the rule condition returns an emptyresult set the condition is false Otherwise the rule is true and it triggers an alert

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Create a PrometheusRule object that contains the alert rule The Prometheus Operatorloads the rule into Prometheus

4 To verify that the rules have been loaded into Prometheus by the Operator create a podwith access to curl

5 Run curl to access the prometheus-operated service to return the rules loaded intomemory

oc project service-telemetry

oc apply -f - ltltEOFapiVersion monitoringcoreoscomv1kind PrometheusRulemetadata creationTimestamp null labels prometheus stf-default role alert-rules name prometheus-alarm-rules namespace service-telemetryspec groups - name openstackrules rules - alert Metric Listener down expr collectd_qpid_router_status lt 1 To change the rule edit the value of the expr parameterEOF

oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty

[ rootcurl ]$ curl prometheus-operated9090apiv1rulesstatussuccessdatagroups

Red Hat OpenStack Platform 160 Service Telemetry Framework

30

6 To verify that the output shows the rules loaded into the PrometheusRule object forexample the output contains the defined openstackrules exit from the pod

7 Clean up the environment by deleting the curl pod

Additional resourcesFor more information on alerting see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

422 Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 421ldquoCreating an alert rule in Prometheusrdquo

Procedure

1 Use the oc edit command

2 Edit the PrometheusRules manifest

3 Save and close

Additional resourcesFor more information about configuring alerting rules seehttpsprometheusiodocsprometheuslatestconfigurationalerting_rules

For more information about PrometheusRules objects seehttpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd

423 Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system such as email IRC or other notificationchannel The Prometheus Operator manages the Alertmanager configuration as anRed Hat OpenShift Container Platform (OCP) secret STF by default deploys a basic configurationthat results in no receivers

[nameopenstackrulesfileetcprometheusrulesprometheus-stf-default-rulefiles-0service-telemetry-prometheus-alarm-rulesyamlrules[nameMetric Listener downquerycollectd_qpid_router_status u003c 1duration0labelsannotationsalerts[]healthoktypealerting]interval30]

[ rootcurl ]$ exit

$ oc delete pod curl

pod curl deleted

oc edit prometheusrules prometheus-alarm-rules

alertmanageryaml |- global resolve_timeout 5m route

CHAPTER 4 ADVANCED FEATURES

31

To deploy a custom Alertmanager route with STF an alertmanagerConfigManifest parameter mustbe passed to the Service Telemetry Operator that results in an updated secret managed by thePrometheus Operator For more information see Section 412 ldquoOverriding a managed manifestrdquo

Procedure

1 Log in to Red Hat OpenShift Container Platform

2 Change to the service-telemetry namespace

3 Edit the ServiceTelemetry object for your STF deployment

4 Add a new parameter alertmanagerConfigManifest and the Secret object contents todefine the alertmanageryaml configuration for Alertmanager

NOTE

This loads the default template that is already managed by ServiceTelemetry Operator To validate the changes are populating correctlychange a value return the alertmanager-stf-default secret and verify thatthe new value is loaded into memory for example changing the value globalresolve_timeout from 5m to 10m

group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null

oc project service-telemetry

oc edit servicetelemetry stf-default

apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec metricsEnabled true alertmanagerConfigManifest | apiVersion v1 kind Secret metadata name alertmanager-stf-default namespace service-telemetry type Opaque stringData alertmanageryaml |- global resolve_timeout 10m route

Red Hat OpenStack Platform 160 Service Telemetry Framework

32

5 Verify that the configuration was applied to the secret

6 To verify the configuration has been loaded into Alertmanager create a pod with access to curl

7 Run curl against the alertmanager-operated service to retrieve the status and configY