KaaS User Guide version beta
Contents
Copyright notice
Preface
    Intended audience
    Documentation history
Create and manage a KaaS child cluster
    Create and manage a baremetal-based KaaS child cluster
        Create a child cluster
        Add a bare metal host
        Add a machine
        Add a Ceph cluster
        Delete a child cluster
    Create and manage an OpenStack-based KaaS child cluster
        Create a child cluster
        Add a machine
        Delete a child cluster
    Create and manage an AWS-based KaaS child cluster
        Create a child cluster
        Add a machine
        Delete a child cluster
    Change a cluster configuration
        Update a child cluster
        Delete a machine
Manage a KaaS management cluster
Connect to a KaaS cluster
Manage IAM
    IAM CLI
        Configure IAM CLI
        Available IAM CLI commands
    Role list
Manage StackLight
    Access StackLight web UIs
Mirantis Kubernetes-as-a-Service User Guide version beta
©2020, Mirantis Inc. Page i
    View Grafana dashboards
    View Kibana dashboards
    Available StackLight alerts
        Alertmanager
            AlertmanagerFailedReload, AlertmanagerMembersInconsistent, AlertmanagerNotificationFailureWarning, AlertmanagerAlertsInvalidWarning
        Calico
            CalicoDataplaneFailuresHigh, CalicoDataplaneAddressMsgBatchSizeHigh, CalicoDatapaneIfaceMsgBatchSizeHigh, CalicoIPsetErrorsHigh, CalicoIptablesSaveErrorsHigh, CalicoIptablesRestoreErrorsHigh
        Ceph
            CephClusterHealthMinor, CephClusterHealthCritical, CephMonQuorumAtRisk, CephOsdDownMinor, CephOSDDiskNotResponding, CephOSDDiskUnavailable, CephClusterNearFull, CephClusterCriticallyFull, CephOsdPgNumTooHighWarning, CephOsdPgNumTooHighCritical, CephMonHighNumberOfLeaderChanges, CephNodeDown, CephDataRecoveryTakingTooLong, CephPGRepairTakingTooLong, CephOSDVersionMismatch, CephMonVersionMismatch
        Elasticsearch
            ElasticHeapUsageTooHigh, ElasticHeapUsageWarning, ElasticClusterRed, ElasticClusterYellow, NumberOfRelocationShards, NumberOfInitializingShards, NumberOfUnassignedShards, NumberOfPendingTasks, ElasticNoNewDocuments
        etcd
            etcdInsufficientMembers, etcdNoLeader, etcdHighNumberOfLeaderChanges, etcdGRPCRequestsSlow, etcdMemberCommunicationSlow, etcdHighNumberOfFailedProposals, etcdHighFsyncDurations, etcdHighCommitDurations
        General alerts
            TargetDown, NodeDown, Watchdog
        General node alerts
            SystemCpuFullWarning, SystemLoadTooHighWarning, SystemLoadTooHighCritical, SystemDiskFullWarning, SystemDiskFullMajor, SystemMemoryFullWarning, SystemMemoryFullMajor, SystemDiskInodesFullWarning, SystemDiskInodesFullMajor, SystemDiskErrorsTooHigh
        Ironic
            IronicMetricsMissing, IronicApiOutage
        Kubernetes applications
            KubePodCrashLooping, KubePodNotReady, KubeDeploymentGenerationMismatch, KubeDeploymentReplicasMismatch, KubeStatefulSetReplicasMismatch, KubeStatefulSetGenerationMismatch, KubeStatefulSetUpdateNotRolledOut, KubeDaemonSetRolloutStuck, KubeDaemonSetNotScheduled, KubeDaemonSetMisScheduled, KubeCronJobRunning, KubeJobCompletion, KubeJobFailed
        Kubernetes resources
            KubeCPUOvercommitPods, KubeMemOvercommitPods, KubeCPUOvercommitNamespaces, KubeMemOvercommitNamespaces, KubeQuotaExceeded, CPUThrottlingHigh
        Kubernetes storage
            KubePersistentVolumeUsageCritical, KubePersistentVolumeFullInFourDays, KubePersistentVolumeErrors
        Kubernetes system
            KubeNodeNotReady, KubeVersionMismatch, KubeClientErrors, KubeletTooManyPods
            KubeAPILatencyHighWarning, KubeAPILatencyHighCritical, KubeAPIErrorsHighCritical, KubeAPIErrorsHighWarning, KubeAPIResourceErrorsHighCritical, KubeAPIResourceErrorsHighWarning, KubeClientCertificateExpirationInSevenDays, KubeClientCertificateExpirationInOneDay, ContainerScrapeError
        MongoDB
            MongodbCursorsOpenTooMany, MongodbCursorTimeouts, MongodbConnectionsTooMany, MongodbMemoryUsageWarning
        Netchecker
            NetCheckerAgentErrors, NetCheckerReportsMissing, NetCheckerTCPServerDelay, NetCheckerDNSSlow
        NGINX
            NginxServiceDown, NginxDroppedIncomingConnections
        Node network
            SystemRxPacketsErrorTooHigh, SystemTxPacketsErrorTooHigh, SystemRxPacketsDroppedTooHigh, SystemTxPacketsDroppedTooHigh, NodeNetworkInterfaceFlapping
        Node time
            ClockSkewDetected
        Prometheus
            PrometheusConfigReloadFailed, PrometheusNotificationQueueRunningFull
            PrometheusErrorSendingAlertsWarning, PrometheusErrorSendingAlertsCritical, PrometheusNotConnectedToAlertmanagers, PrometheusTSDBReloadsFailing, PrometheusTSDBCompactionsFailing, PrometheusTSDBWALCorruptions, PrometheusNotIngestingSamples, PrometheusTargetScrapesDuplicate, PrometheusRuleEvaluationsFailed
        Salesforce notifier
            SfNotifierDown, SfNotifierAuthFailure
        SMART disks
            SystemSMARTDiskUDMACrcErrorsTooHigh, SystemSMARTDiskHealthStatus, SystemSMARTDiskReadErrorRate, SystemSMARTDiskSeekErrorRate, SystemSMARTDiskTemperatureHigh, SystemSMARTDiskReallocatedSectorsCount, SystemSMARTDiskCurrentPendingSectors, SystemSMARTDiskReportedUncorrectableErrors, SystemSMARTDiskOfflineUncorrectableSectors, SystemSMARTDiskEndToEndError
        SSL certificates
            SSLCertExpirationWarning, SSLCertExpirationCritical
        Telemeter
            TelemeterClientAuthenticationFailed, TelemeterClientFederationFailed
    Disable workload monitoring
Copyright notice
2020 Mirantis, Inc. All rights reserved.
This product is protected by U.S. and international copyright and intellectual property laws. No part of this publication may be reproduced in any written, electronic, recording, or photocopying form without written permission of Mirantis, Inc.
Mirantis, Inc. reserves the right to modify the content of this document at any time without prior notice. Functionality described in the document may not be available at the moment. The document contains the latest information at the time of publication.
Mirantis, Inc. and the Mirantis Logo are trademarks of Mirantis, Inc. and/or its affiliates in the United States and other countries. Third party trademarks, service marks, and names mentioned in this document are the properties of their respective owners.
Preface
This documentation provides information on how to use Mirantis products to deploy cloud environments. The information is for reference purposes and is subject to change.
Intended audience
This documentation assumes that the reader is familiar with network and cloud concepts and is intended for the following users:

• Infrastructure Operator
    • Is a member of the IT operations team
    • Has working knowledge of Linux, virtualization, Kubernetes API and CLI, and OpenStack to support the application development team
    • Accesses Mirantis KaaS and Kubernetes through a local machine or web UI
    • Provides verified artifacts through a central repository to the Tenant DevOps engineers
• Tenant DevOps engineer
    • Is a member of the application development team and reports to line-of-business (LOB)
    • Has working knowledge of Linux, virtualization, Kubernetes API and CLI to support application owners
    • Accesses Mirantis KaaS and Kubernetes through a local machine or web UI
    • Consumes artifacts from a central repository approved by the Infrastructure Operator
Documentation history
The documentation set refers to Mirantis KaaS beta as the latest released beta version of the product. For details about the release dates of the KaaS beta minor releases, refer to KaaS releases.
Create and manage a KaaS child cluster
Note
This tutorial applies only to KaaS web UI users with the writer or operator access role assigned by the Infrastructure Operator.
After you deploy the KaaS management cluster, you can start creating KaaS child clusters based on the same cloud provider type as the KaaS management cluster: OpenStack, AWS, or bare metal.
The deployment procedure is performed using the KaaS web UI and comprises the following steps:

1. Create an initial cluster configuration depending on the provider type.
2. For a baremetal-based child cluster, create and configure bare metal hosts with corresponding labels for machines such as worker, control plane, or storage.
3. Add the required number of machines with the corresponding configuration to the child cluster.
4. For a baremetal-based child cluster, add a Ceph cluster.
Create and manage a baremetal-based KaaS child cluster
After bootstrapping your baremetal-based KaaS management cluster as described in KaaS Deployment Guide: Deploy a baremetal-based management cluster, you can start creating baremetal-based KaaS child clusters using the KaaS web UI.
Create a child cluster
This section describes how to configure and deploy a Mirantis KaaS child cluster based on the baremetal-based Mirantis KaaS management cluster through the KaaS web UI.
To create a Mirantis KaaS child cluster on bare metal:

1. Log in to the KaaS web UI with the operator or writer permissions.
2. Select the required namespace.
3. On the SSH keys page, click Add key (the + icon) to upload the public SSH key that will be used for SSH access to VMs.
4. In the Clusters block, click Create cluster (the + icon).
5. Configure the new cluster in the Create new cluster wizard that opens:
1. Define general and Kubernetes parameters:
Create new cluster: General, Provider, and Kubernetes
    General settings
        • Name - The cluster name.
        • Provider - Select Baremetal.
        • Region - From the drop-down list, select Baremetal.
        • Release version - The Mirantis KaaS version.
    Kubernetes applications
        • Istio - Select to enable Istio service mesh for application owners.
          Caution!
          Istio is deprecated since the Cluster releases 3.1.0 and 2.2.0 and will be removed in future releases.
        • Kubernetes Dashboard - Select to enable the Kubernetes Dashboard to manage applications that run on a Kubernetes cluster as well as troubleshoot them using the web UI.
    SSH keys
        • From the drop-down list, select the SSH key name that you have previously added for SSH access to VMs.
    Provider
        • LB host IP - The IP address of the load balancer endpoint that will be used to access the Kubernetes API of the new cluster. This IP address must be from the same subnet as used for DHCP in Metal³.
        • LB address range - The range of IP addresses that can be assigned to load balancers for Kubernetes Services by MetalLB.
    Kubernetes
        • Node CIDR - The Kubernetes worker nodes CIDR block. For example, 10.10.10.0/24.
        • Services CIDR blocks - The Kubernetes Services CIDR blocks. For example, 10.233.0.0/18.
        • Pods CIDR blocks - The Kubernetes pods CIDR blocks. For example, 10.233.64.0/18.
2. Optional, recommended. Enable and configure StackLight:
StackLight configuration
    StackLight
        • Enabled - Select to enable StackLight monitoring.
          Note
          You can also enable, disable, or configure StackLight parameters after deploying a KaaS child cluster. For details, see Change a cluster configuration.
        • Multiserver mode - Select to enable StackLight monitoring in the HA mode. For the differences between HA and non-HA modes, see KaaS Reference Architecture: StackLight deployment architecture.
        • Elasticsearch retention time - The Elasticsearch logs retention period in Logstash.
        • Elasticsearch persistent volume claim size - The Elasticsearch persistent volume claim size.
        • Prometheus retention time - The Prometheus database retention period.
        • Prometheus retention size - The Prometheus database retention size.
        • Prometheus persistent volume claim size - The Prometheus persistent volume claim size.
        • Enable Watchdog alert - Select to enable the Watchdog alert that fires as long as the entire alerting pipeline is functional.
        • Custom alerts - Specify alerting rules for new custom alerts or upload a YAML file in the following exemplary format:

              - alert: HighErrorRate
                expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
                for: 10m
                labels:
                  severity: page
                annotations:
                  summary: High request latency

          For details, see Official Prometheus documentation: Alerting rules. For the list of the predefined StackLight alerts, see KaaS User Guide: Available StackLight alerts.
    StackLight email alerts
        • Enabled - Select to enable the StackLight email alerts.
        • Send resolved - Select to enable notifications about resolved StackLight alerts.
        • Require TLS - Select to enable transmitting emails through TLS.
        • Email alerts configuration for StackLight - Fill out the following email alerts parameters as required:
            • To - the email address to send notifications to.
            • From - the sender address.
            • SmartHost - the SMTP host through which the emails are sent.
            • Authentication username - the SMTP user name.
            • Authentication password - the SMTP password.
            • Authentication identity - the SMTP identity.
            • Authentication secret - the SMTP secret.
    StackLight Slack alerts
        • Enabled - Select to enable the StackLight Slack alerts.
        • Send resolved - Select to enable notifications about resolved StackLight alerts.
        • Slack alerts configuration for StackLight - Fill out the following Slack alerts parameters as required:
            • API URL - The Slack webhook URL.
            • Channel - The channel to send notifications to, for example, #channel-for-alerts.
6. Click Create.
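The network values entered in the Create new cluster wizard have to fit together before you click Create. The following Python sketch is not part of KaaS; it only illustrates one way to pre-check these values with the standard ipaddress module. It verifies the rule stated above that the LB host IP belongs to the subnet used for DHCP in Metal³ and, as an added assumption based on common Kubernetes practice, that the Node, Services, and Pods CIDR blocks do not overlap. The function name check_baremetal_network, the DHCP subnet, and the LB host IP used below are hypothetical.

```python
import ipaddress


def check_baremetal_network(lb_host_ip: str, dhcp_subnet: str, node_cidr: str,
                            services_cidr: str, pods_cidr: str) -> list[str]:
    """Hypothetical pre-check: return a list of problems (empty if none)."""
    problems = []

    # From this guide: the LB host IP must be in the subnet used for DHCP in Metal3.
    if ipaddress.ip_address(lb_host_ip) not in ipaddress.ip_network(dhcp_subnet):
        problems.append(f"LB host IP {lb_host_ip} is outside DHCP subnet {dhcp_subnet}")

    # Assumption: the Node, Services, and Pods CIDR blocks must not overlap.
    blocks = {
        "Node CIDR": ipaddress.ip_network(node_cidr),
        "Services CIDR": ipaddress.ip_network(services_cidr),
        "Pods CIDR": ipaddress.ip_network(pods_cidr),
    }
    names = list(blocks)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if blocks[first].overlaps(blocks[second]):
                problems.append(f"{first} {blocks[first]} overlaps {second} {blocks[second]}")
    return problems


# CIDR examples from this guide, plus a hypothetical DHCP subnet and LB host IP.
print(check_baremetal_network("172.16.0.90", "172.16.0.0/24",
                              "10.10.10.0/24", "10.233.0.0/18", "10.233.64.0/18"))  # prints []
```

An empty list means that none of the checked conditions failed; it does not guarantee that the values suit your physical network.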
Now, proceed to Add a bare metal host.
Add a bare metal host
Before you proceed with adding a bare metal host, verify that the physical network on the server has been configured correctly. See KaaS Reference Architecture: Network fabric for details.
To add a bare metal host to a baremetal-based KaaS child cluster:
1. Log in to the KaaS web UI with the operator permissions.
2. Select the required namespace.
3. Add unique credentials for a new bare metal host:
    1. On the upper right side of the namespace page, click Credentials. The Credentials page opens.
    2. Click Add Credential (the + icon).
    3. Type in a credential name.
    4. Select the Baremetal credential type.
    5. Select the Baremetal credential region.
    6. Enter Username and Password.
    7. Click Create.
    Note
    Every bare metal host requires its own credentials.
4. On the upper right side of the namespace page, click Baremetal. The Baremetal page opens.
5. On the Baremetal page, click Add BM Host (the + icon).
6. Fill out the Add new BM host form as required:
    • Name
      Specify the name of the new bare metal host.
    • Credential
      Select the credentials that you created for the host in step 3.
    • Boot MAC Address
      Specify the MAC address of the PXE network interface.
    • Address
      Specify the URL to access the BMC. The URL must start with https://.
    • Label
      Assign the machine label to the new host that defines which type of machine may be deployed on this bare metal host. Only one label can be assigned to a host. The supported labels include:
        • Worker
          Assigned by default. The host with this label may be used to deploy the worker machine type. Assign this label to the bare metal hosts that have sufficient CPU and RAM resources, as described in KaaS Reference Architecture: Reference hardware configuration.
        • Storage
          Assign this label to the bare metal hosts that have sufficient storage devices to match KaaS Reference Architecture: Reference hardware configuration. Hosts with this label will be used to deploy machines with the storage type that run Ceph OSDs.
        • Control plane
          Assign this label to the bare metal hosts that may be used to deploy machines with the control plane type. These hosts must match the CPU and RAM requirements from KaaS Reference Architecture: Reference hardware configuration.
7. Click Create.
   While adding the bare metal host, Mirantis KaaS discovers and inspects the hardware of the bare metal host and adds it to BareMetalHost.spec for future reference.
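The form rules above (separate credentials per host, a BMC URL that starts with https://, and a single supported label) can be checked before you fill out the form for many hosts. The sketch below is a hypothetical helper, not a KaaS API: the dictionary keys name, credential, boot_mac, bmc_address, and label, as well as the lowercase label spellings, are assumptions made for illustration.

```python
import re

# Labels described in this guide; the lowercase spelling is an assumption.
SUPPORTED_LABELS = {"worker", "storage", "control plane"}
# Six colon-separated hex pairs, for example 0c:c4:7a:12:34:56.
MAC_PATTERN = re.compile(r"([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}")


def check_bm_hosts(hosts: list[dict]) -> list[str]:
    """Hypothetical pre-check for planned bare metal hosts; returns problems found."""
    problems = []
    seen_credentials = set()
    for host in hosts:
        name = host["name"]
        if not MAC_PATTERN.fullmatch(host["boot_mac"]):
            problems.append(f"{name}: invalid boot MAC address {host['boot_mac']!r}")
        if not host["bmc_address"].startswith("https://"):
            problems.append(f"{name}: BMC address must start with https://")
        # The web UI assigns the Worker label by default.
        label = host.get("label", "worker")
        if label not in SUPPORTED_LABELS:
            problems.append(f"{name}: unsupported label {label!r}")
        # Every bare metal host requires its own credentials.
        if host["credential"] in seen_credentials:
            problems.append(f"{name}: credentials {host['credential']!r} are already used")
        seen_credentials.add(host["credential"])
    return problems


hosts = [{"name": "bm-host-1", "credential": "bm-host-1-cred", "boot_mac": "0c:c4:7a:12:34:56",
          "bmc_address": "https://192.168.100.11", "label": "worker"}]
print(check_bm_hosts(hosts))  # prints []
```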
Now, you can proceed to Add a machine.
Add a machine
After you add a bare metal host to the child cluster as described in Add a bare metal host, you can create a Kubernetes machine in your cluster.
To add a Kubernetes machine to a baremetal-based KaaS child cluster:

1. Log in to the KaaS web UI with the operator or writer permissions.
2. Select the namespace in which to add the machine.
3. In the Clusters block, click the required cluster name. The Machines page opens.
4. On the Machines page, click Create machine (the + icon).
5. Fill out the Create new machine form as required:
    • Count
      Specify the number of machines to add.
    • Control Plane
      Select Control Plane to create a Kubernetes control plane node. Otherwise, a Kubernetes worker node will be created. The recommended minimum number of machines is three for the control plane HA and two for the KaaS workloads.
    • Bare metal host label
      Assign the role to the new machine(s) to link the machine to a previously created bare metal host with the corresponding label. You can assign one role type per machine. The supported labels include:
        • Worker
          The default role for any node in a child cluster. Only the kubelet service is running on the machines of this type.
        • Control plane
          This node hosts the control plane services of the child cluster. For reliability reasons, KaaS does not permit running end-user workloads on the control plane nodes or using them as storage nodes.
        • Storage
          This node is a worker node that also hosts Ceph OSD daemons and provides its disk resources to Ceph. KaaS permits end users to run workloads on storage nodes by default.
6. Click Create.

At this point, Mirantis KaaS adds the new machine object to the specified KaaS child cluster, and the Bare Metal Operator controller associates the machine with a BareMetalHost whose label matches the machine role.
Provisioning of the newly created machine starts when the machine object is created and includes the following stages:
1. Creation of partitions on the local disks as required by the operating system and the Mirantis KaaS architecture.
2. Configuration of the network interfaces on the host as required by the operating system and the Mirantis KaaS architecture.
3. Installation and configuration of the KaaS LCM agent.
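Because each new machine is linked to a free bare metal host whose label matches the machine role, a request can only be satisfied if enough hosts with that label exist. The hypothetical helper below compares a planned set of machines against the labels of the free hosts and also flags the recommended minimums from this guide (three control plane machines and two machines for workloads). Counting storage machines toward workload capacity follows from the note that KaaS permits workloads on storage nodes; the role names and the function name are illustrative assumptions.

```python
from collections import Counter


def check_machine_plan(free_host_labels: list[str], planned: dict[str, int]) -> list[str]:
    """Hypothetical check of planned machines per role against free labeled hosts."""
    available = Counter(free_host_labels)
    problems = []
    for role, count in planned.items():
        # A Counter returns 0 for labels that no free host carries.
        if count > available[role]:
            problems.append(f"{role}: {count} requested, {available[role]} free host(s) with this label")
    # Recommended minimums from this guide.
    if planned.get("control plane", 0) < 3:
        problems.append("fewer than 3 control plane machines: no control plane HA")
    if planned.get("worker", 0) + planned.get("storage", 0) < 2:
        problems.append("fewer than 2 machines can run workloads")
    return problems


free_hosts = ["control plane"] * 3 + ["worker"] * 2 + ["storage"] * 3
print(check_machine_plan(free_hosts, {"control plane": 3, "worker": 2, "storage": 3}))  # prints []
```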
See also
• Add a Ceph cluster
• Connect to a KaaS cluster
Add a Ceph cluster
After you add machines to your new bare metal KaaS child cluster as described in Add a machine, you can create a Ceph cluster on top of this child cluster using the KaaS web UI.
The procedure below enables you to create a Ceph cluster with a minimum of three nodes that provides persistent volumes to the Kubernetes workloads in the KaaS child cluster.
To create a Ceph cluster in the KaaS child cluster:

1. Log in to the KaaS web UI with the operator or writer permissions.
2. Select the required namespace.
3. In the Ceph block, click Create Ceph cluster (the + icon).
4. Configure the Ceph cluster in the Create new Ceph cluster wizard that opens:

    Create new Ceph cluster
    General settings
        • Name - The Ceph cluster name.
        • Cluster - Select the name of the KaaS child cluster that will host the new Ceph cluster.
    Machines / Machine #1-3
        • Select machine - Select the name of the Kubernetes machine that will host the corresponding Ceph node in the Ceph cluster.
        • Manager, Monitor - Select the required Ceph services to install on the Ceph node.
        • Devices - Select the disk that Ceph will use.
          Warning
          Do not select the device used for system services, for example, sda.
5. To add more Ceph nodes to the new Ceph cluster, click + next to any Ceph Machine title in the Machines tab. Configure a Ceph node as required.
   Warning
   Do not add more than 3 Manager and/or Monitor services to the Ceph cluster.
6. After you add and configure all nodes in your Ceph cluster, click Create.
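The constraints from this procedure (at least three Ceph nodes, no more than three Manager and Monitor services, and no system disk among the selected devices) can be checked before you click Create. The following sketch is illustrative only: the node dictionaries and the function name are hypothetical, it reads the Manager/Monitor warning as a limit of three for each service separately, and the set of system devices defaults to sda only because this guide uses it as an example. Supply the actual system disks of your hosts.

```python
def check_ceph_layout(nodes: list[dict],
                      system_devices: frozenset = frozenset({"sda"})) -> list[str]:
    """Hypothetical check of a planned Ceph cluster layout; returns problems found."""
    problems = []
    if len(nodes) < 3:
        problems.append(f"only {len(nodes)} Ceph node(s); this procedure expects at least 3")
    # Assumption: the limit of 3 applies to Manager and Monitor services separately.
    for service in ("manager", "monitor"):
        count = sum(1 for node in nodes if node.get(service))
        if count > 3:
            problems.append(f"{count} {service} services; do not add more than 3")
    # Devices that host system services must not be given to Ceph.
    for node in nodes:
        for device in node.get("devices", []):
            if device in system_devices:
                problems.append(f"{node['machine']}: {device} is a system device")
    return problems


nodes = [{"machine": f"machine-{i}", "manager": True, "monitor": True, "devices": ["sdb"]}
         for i in range(1, 4)]
print(check_ceph_layout(nodes))  # prints []
```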
Delete a child cluster
Deleting a baremetal-based KaaS child cluster does not require a preliminary deletion of the machines running on the cluster.
To delete a baremetal-based KaaS child cluster:

1. Log in to the KaaS web UI with the writer permissions.
2. Select the required namespace.
3. Click the Delete cluster icon next to the name of the cluster you need to remove.
4. Verify the list of machines to be removed. Confirm the deletion.
5. Optional. If you do not plan to reuse the credentials of the deleted cluster, delete them:
    1. On the upper right side of the required namespace page, click Credentials.
    2. On the Credentials page, click the Delete credential action icon next to the name of the credentials to be deleted. Confirm the deletion.
    Warning
    You can delete credentials only after deleting the KaaS cluster they relate to.

Deleting a cluster automatically frees up the resources allocated for this cluster, for example, instances, load balancers, networks, floating IPs, and so on.
Create and manage an OpenStack-based KaaS child cluster
After bootstrapping your OpenStack-based KaaS management cluster as described in KaaS Deployment Guide: Deploy an OpenStack-based management cluster, you can create OpenStack-based KaaS child clusters using the KaaS web UI.
Create a child cluster
This section describes how to create an OpenStack-based KaaS child cluster using the KaaS web UI of the OpenStack-based KaaS management cluster.
To create an OpenStack-based KaaS child cluster:

1. Log in to the KaaS web UI with the writer permissions.
2. Select the required namespace.
3. On the upper right side of the namespace page, click SSH keys. The SSH keys page opens.
4. On the SSH keys page, click Add key (the + icon) to upload the public SSH key that will be used for the OpenStack VMs creation.
5. On the upper right side of the namespace page, click Credentials. The Credentials page opens.
6. On the Credentials page, click Add credential (the + icon) to add your OpenStack credentials. You can either upload your OpenStack clouds.yaml configuration file or fill in the fields manually.
7. In the Clusters block, click Create cluster (the + icon) and fill out the form with the following parameters as required:
1. Configure general settings and the Kubernetes parameters:
KaaS child cluster configuration
    General settings
        • Name - The cluster name.
        • Provider - Select OpenStack.
        • Provider credential - From the drop-down list, select the OpenStack credentials name that you created in the previous step.
        • Release version - The Mirantis KaaS version.
    Kubernetes applications
        • Istio - Select to enable Istio service mesh for application owners.
          Caution!
          Istio is deprecated since the Cluster releases 3.1.0 and 2.2.0 and will be removed in future releases.
        • Kubernetes Dashboard - Select to enable the Kubernetes Dashboard to manage applications that run on a Kubernetes cluster as well as troubleshoot them using the web UI.
    SSH keys
        • From the drop-down list, select the SSH key name that you have previously added for SSH access to VMs.
    Provider
        • External network - Type of the external network in the OpenStack cloud provider.
        • DNS name servers - Comma-separated list of the DNS host IPs for the OpenStack VMs configuration.
    Kubernetes
        • Node CIDR - The Kubernetes nodes CIDR block. For example, 10.10.10.0/24.
        • Services CIDR blocks - The Kubernetes Services CIDR block. For example, 10.233.0.0/18.
        • Pods CIDR blocks - The Kubernetes Pods CIDR block. For example, 10.233.64.0/18.
2. Optional, recommended. Enable and configure StackLight:
StackLight configuration
    StackLight
        • Enabled - Select to enable StackLight monitoring.
          Note
          You can also enable, disable, or configure StackLight parameters after deploying a KaaS child cluster. For details, see Change a cluster configuration.
        • Multiserver mode - Select to enable StackLight monitoring in the HA mode. For the differences between HA and non-HA modes, see KaaS Reference Architecture: StackLight deployment architecture.
        • Elasticsearch retention time - The Elasticsearch logs retention period in Logstash.
        • Elasticsearch persistent volume claim size - The Elasticsearch persistent volume claim size.
        • Prometheus retention time - The Prometheus database retention period.
        • Prometheus retention size - The Prometheus database retention size.
        • Prometheus persistent volume claim size - The Prometheus persistent volume claim size.
        • Enable Watchdog alert - Select to enable the Watchdog alert that fires as long as the entire alerting pipeline is functional.
        • Custom alerts - Specify alerting rules for new custom alerts or upload a YAML file in the following exemplary format:

              - alert: HighErrorRate
                expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
                for: 10m
                labels:
                  severity: page
                annotations:
                  summary: High request latency

          For details, see Official Prometheus documentation: Alerting rules. For the list of the predefined StackLight alerts, see KaaS User Guide: Available StackLight alerts.
    StackLight email alerts
        • Enabled - Select to enable the StackLight email alerts.
        • Send resolved - Select to enable notifications about resolved StackLight alerts.
        • Require TLS - Select to enable transmitting emails through TLS.
        • Email alerts configuration for StackLight - Fill out the following email alerts parameters as required:
            • To - the email address to send notifications to.
            • From - the sender address.
            • SmartHost - the SMTP host through which the emails are sent.
            • Authentication username - the SMTP user name.
            • Authentication password - the SMTP password.
            • Authentication identity - the SMTP identity.
            • Authentication secret - the SMTP secret.
    StackLight Slack alerts
        • Enabled - Select to enable the StackLight Slack alerts.
        • Send resolved - Select to enable notifications about resolved StackLight alerts.
        • Slack alerts configuration for StackLight - Fill out the following Slack alerts parameters as required:
            • API URL - The Slack webhook URL.
            • Channel - The channel to send notifications to, for example, #channel-for-alerts.
8. Click Create.
   To view the deployment status, use the Status column in the Clusters tab. Once the Updating status disappears, the deployment is complete.
9. Proceed with Add a machine.
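The Custom alerts field accepts Prometheus alerting rules in YAML. Once such a file has been parsed (for example, with yaml.safe_load from the third-party PyYAML package), a quick structural check can catch typos before StackLight loads the rules. The sketch below is an approximation under stated assumptions, not the Prometheus validator: the function name is hypothetical, it requires the alert and expr keys, checks the for duration against a simplified form of the Prometheus duration syntax, and expects labels and annotations to map strings to strings.

```python
import re

# Simplified Prometheus duration syntax: one or more <number><unit> pairs,
# for example "10m" or "1h30m". Unit ordering is not enforced here.
DURATION_PATTERN = re.compile(r"(\d+(ms|s|m|h|d|w|y))+")


def check_alert_rules(rules: list[dict]) -> list[str]:
    """Hypothetical structural check of parsed custom alerting rules."""
    problems = []
    for index, rule in enumerate(rules):
        name = rule.get("alert") or f"rule #{index}"
        for key in ("alert", "expr"):
            if not rule.get(key):
                problems.append(f"{name}: missing required key {key!r}")
        duration = rule.get("for")
        if duration is not None and not DURATION_PATTERN.fullmatch(str(duration)):
            problems.append(f"{name}: invalid 'for' duration {duration!r}")
        for key in ("labels", "annotations"):
            value = rule.get(key, {})
            if not isinstance(value, dict) or not all(
                    isinstance(k, str) and isinstance(v, str) for k, v in value.items()):
                problems.append(f"{name}: {key} must map strings to strings")
    return problems


# The exemplary rule from the Custom alerts row, after YAML parsing.
example_rule = {
    "alert": "HighErrorRate",
    "expr": 'job:request_latency_seconds:mean5m{job="myjob"} > 0.5',
    "for": "10m",
    "labels": {"severity": "page"},
    "annotations": {"summary": "High request latency"},
}
print(check_alert_rules([example_rule]))  # prints []
```

This check does not validate the PromQL expression itself; use the Prometheus tooling for that.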
See also
Delete a child cluster
Add a machine
After you create a new OpenStack-based KaaS child cluster as described in Create a child cluster, proceed with adding machines to this cluster using the KaaS web UI.
You can also use the instruction below to scale up an existing KaaS child cluster.
To add a machine to an OpenStack-based KaaS child cluster:

1. Log in to the KaaS web UI with the writer permissions.
2. Select the required namespace.
3. In the Clusters block, click the required cluster name. The Machines page opens.
4. On the Machines page, click Create machine (the + icon).
5. Fill out the form with the following parameters as required:

    KaaS machine configuration
        • Count - Add the required number of machines to create. The recommended minimum number of machines is three for the control plane HA and two for the KaaS workloads. Select Control Plane for a machine with the control plane role. Otherwise, the machine will have the worker role.
        • Flavor - From the drop-down list, select the required hardware configuration for the machine. The list of available flavors corresponds to the one in your OpenStack environment. For the hardware requirements, see Mirantis KaaS Reference Architecture.
        • Image - From the drop-down list, select the cloud image with Ubuntu 18.04. If you do not have this image in the list, add it to your OpenStack environment using the Horizon web UI by downloading the image from the Ubuntu official website.
        • Availability zone - From the drop-down list, select the availability zone from which the new machine will be launched.
6. Click Create.
   To view the deployment status, use the Status column on the Machines page. Once the status changes from Pending and Updating to Ready, the deployment is complete.
7. Repeat the steps above for the remaining number of machines.
8. Verify the status of the cluster nodes as described in Connect to a KaaS cluster.
Deleting a machine does not require preliminary actions. You can delete a machine using the Delete machine icon on the Machines page of the KaaS web UI. Deleting a machine automatically frees up the resources allocated for this machine.
Warning
An operational KaaS child cluster should contain a minimum of 3 Kubernetes control plane nodes and 2 Kubernetes worker nodes. To maintain the etcd quorum and to prevent deployment failure, scaling down the control plane nodes is prohibited.
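The warning above follows from how etcd reaches consensus: a write needs agreement from a majority (quorum) of members, floor(n/2) + 1, so a cluster of n members tolerates n - quorum failed members. The short Python sketch below computes both values; it illustrates the general majority rule rather than any KaaS-specific logic, and the function names are illustrative.

```python
def etcd_quorum(members: int) -> int:
    """Number of healthy members etcd needs to agree on a write (a majority)."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while the cluster keeps its quorum."""
    return members - etcd_quorum(members)


for n in range(1, 6):
    print(f"{n} member(s): quorum {etcd_quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
# 1 member(s): quorum 1, tolerates 0 failure(s)
# 2 member(s): quorum 2, tolerates 0 failure(s)
# 3 member(s): quorum 2, tolerates 1 failure(s)
# 4 member(s): quorum 3, tolerates 1 failure(s)
# 5 member(s): quorum 3, tolerates 2 failure(s)
```

Three control plane nodes are therefore the smallest layout that survives the loss of one node, while a second node on its own adds no fault tolerance.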
Delete a child cluster
Deleting a KaaS child cluster does not require a preliminary deletion of VMs that run on this cluster.
To delete an OpenStack-based KaaS child cluster:

1. Log in to the KaaS web UI with the writer permissions.
2. Select the required namespace.
3. In the Clusters block, click the Delete cluster action icon next to the name of the cluster to be deleted.
4. Verify the list of machines to be removed. Confirm the deletion.
   Deleting a cluster automatically frees up the resources allocated for this cluster, for example, instances, load balancers, networks, floating IPs.
5. If Istio and Harbor were enabled, verify the OpenStack volumes. Since the Istio and Harbor storage is external, manually delete the corresponding resources using the OpenStack web UI or API.
   Caution!
   Deprecation notes
   Since the Cluster releases 3.1.0 and 2.2.0, Harbor support is removed and Istio support is deprecated. Istio will be removed in future Cluster releases.
6. If the cluster deletion hangs and the "The cluster is being deleted" message does not disappear for a while:
    1. In the upper right corner of the KaaS web UI, click the arrow next to your user name to open the drop-down menu.
    2. In the drop-down menu, click Download kubeconfig to download the kubeconfig of your KaaS management cluster.
    3. Log in to any local machine with kubectl installed.
    4. Copy the downloaded kubeconfig to this machine.
    5. Run the following command:

           kubectl --kubeconfig <KUBECONFIG_PATH> edit -n <NAMESPACE_NAME> cluster <CHILD_CLUSTER_NAME>

    6. In the Cluster object that opens in the editor, remove the following lines, then save the changes and close the editor:

           finalizers:
           - cluster.cluster.k8s.io
7. Optional. If you do not plan to reuse the credentials of the deleted cluster, delete them:
    1. On the upper right side of the required namespace page, click Credentials.
    2. On the Credentials page, click the Delete credential action icon next to the name of the credentials to be deleted. Confirm the deletion.
    Warning
    You can delete credentials only after deleting the KaaS cluster they relate to.
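The manual edit in step 6 removes the cluster.cluster.k8s.io finalizer so that Kubernetes can finish deleting the Cluster object. If you prefer a non-interactive variant, the hypothetical Python sketch below expresses the same change in two forms: as a transformation of a parsed manifest, and as an argument list for kubectl patch with a JSON merge patch that replaces the finalizers list with the remaining entries. The helper names, kubeconfig path, namespace, and cluster name are illustrative assumptions. The command is only built, not executed, so you can review it before passing it to subprocess.run.

```python
import json

FINALIZER = "cluster.cluster.k8s.io"


def drop_finalizer(manifest: dict, finalizer: str = FINALIZER) -> dict:
    """Return a copy of a Kubernetes object manifest without the given finalizer."""
    metadata = dict(manifest.get("metadata", {}))
    metadata["finalizers"] = [f for f in metadata.get("finalizers", []) if f != finalizer]
    return {**manifest, "metadata": metadata}


def build_patch_command(kubeconfig: str, namespace: str, cluster_name: str,
                        remaining_finalizers: list[str]) -> list[str]:
    """Build a kubectl patch call that sets metadata.finalizers to the given list."""
    # A JSON merge patch replaces lists as a whole, so only the remaining entries are kept.
    patch = json.dumps({"metadata": {"finalizers": remaining_finalizers}})
    return ["kubectl", "--kubeconfig", kubeconfig, "-n", namespace,
            "patch", "cluster", cluster_name, "--type", "merge", "-p", patch]


cluster = {"kind": "Cluster",
           "metadata": {"name": "child-1", "finalizers": [FINALIZER]}}
remaining = drop_finalizer(cluster)["metadata"]["finalizers"]
print(build_patch_command("kubeconfig.yaml", "child-ns", "child-1", remaining))
```

Removing a finalizer skips whatever cleanup the owning controller would have performed, so use either variant only when the deletion is genuinely stuck.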
Create and manage an AWS-based KaaS child cluster
After bootstrapping your AWS-based KaaS management cluster as described in KaaS Deployment Guide: Deploy an AWS-based management cluster, you can create AWS-based KaaS child clusters using the KaaS web UI.
Create a child clusterThis section describes how to create an AWS-based KaaS child cluster using the KaaS web UI ofthe AWS-based KaaS management cluster.To create an AWS-based KaaS child cluster:
1. Log in to the KaaS web UI with the writer permissions.2. Select the required namespace.3. On the upper right side of the namespace page, click SSH keys. The SSH keys page opens.
4. On the SSH keys page, click Add key (the + icon) to upload the public SSH key that will be used for creating the AWS VMs.
5. On the upper right side of the namespace page, click Credentials. The Credentials page opens.
6. On the Credentials page, click Add credential (the + icon) and fill in the required fields to add your AWS credentials.
7. Return to the namespace page.
8. In the Clusters block, click Create cluster (the + icon) and fill out the form with the following parameters as required:
1. Configure general settings and the Kubernetes parameters:
KaaS child cluster configuration

General settings:
• Name - The cluster name.
• Provider - Select AWS.
• Provider credential - From the drop-down list, select the name of the previously created AWS credentials.
• Release version - The Mirantis KaaS version.
• Kubernetes applications:
   • Istio - Select to enable the Istio service mesh for application owners.

     Caution!

     Istio is deprecated since the Cluster releases 3.1.0 and 2.2.0 and will be removed in future releases.

   • Kubernetes Dashboard - Select to enable the Kubernetes Dashboard to manage applications that run on a Kubernetes cluster and to troubleshoot them using the web UI.
• SSH keys - From the drop-down list, select the name of the SSH key that you have previously added for SSH access to VMs.

Provider:
• AWS region - Type in the AWS Region for the KaaS child cluster. For example, us-east-2.
• Services CIDR blocks - The Kubernetes Services CIDR block. For example, 10.233.0.0/18.
• Pods CIDR blocks - The Kubernetes Pods CIDR block. For example, 10.233.64.0/18.
2. Optional, recommended. Enable and configure StackLight:
StackLight configuration

StackLight:
• Enabled - Select to enable StackLight monitoring.

  Note

  You can also enable, disable, or configure StackLight parameters after deploying a KaaS child cluster. For details, see Change a cluster configuration.

• Multiserver mode - Select to enable StackLight monitoring in the HA mode. For the differences between the HA and non-HA modes, see KaaS Reference Architecture: StackLight deployment architecture.
• Elasticsearch retention time - The Elasticsearch logs retention period in Logstash.
• Elasticsearch persistent volume claim size - The Elasticsearch persistent volume claim size.
• Prometheus retention time - The Prometheus database retention period.
• Prometheus retention size - The Prometheus database retention size.
• Prometheus persistent volume claim size - The Prometheus persistent volume claim size.
• Enable Watchdog alert - Select to enable the Watchdog alert that fires as long as the entire alerting pipeline is functional.
• Custom alerts - Specify alerting rules for new custom alerts or upload a YAML file in the following exemplary format:

    - alert: HighErrorRate
      expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
      for: 10m
      labels:
        severity: page
      annotations:
        summary: High request latency

  For details, see Official Prometheus documentation: Alerting rules. For the list of the predefined StackLight alerts, see KaaS User Guide: Available StackLight alerts.
StackLight email alerts:
• Enabled - Select to enable the StackLight email alerts.
• Send resolved - Select to enable notifications about resolved StackLight alerts.
• Require TLS - Select to enable transmitting emails through TLS.
• Email alerts configuration for StackLight - Fill out the following email alerts parameters as required:
   • To - the email address to send notifications to.
   • From - the sender address.
   • SmartHost - the SMTP host through which the emails are sent.
   • Authentication username - the SMTP user name.
   • Authentication password - the SMTP password.
   • Authentication identity - the SMTP identity.
   • Authentication secret - the SMTP secret.

StackLight Slack alerts:
• Enabled - Select to enable the StackLight Slack alerts.
• Send resolved - Select to enable notifications about resolved StackLight alerts.
• Slack alerts configuration for StackLight - Fill out the following Slack alerts parameters as required:
   • API URL - The Slack webhook URL.
   • Channel - The channel to send notifications to, for example, #channel-for-alerts.

9. Click Create.
To view the deployment status, use the Status column in the Clusters tab. Once the Updating status disappears, the deployment is complete.

10. Proceed with Add a machine.

See also

Delete a child cluster
Add a machine

After you create a new AWS-based KaaS child cluster as described in Create a child cluster, proceed with adding machines to this cluster using the KaaS web UI.

You can also use the instruction below to scale up an existing KaaS child cluster.

To add a machine to an AWS-based KaaS child cluster:
1. Log in to the KaaS web UI with the writer permissions.
2. Select the required namespace.
3. In the Clusters block, click the required cluster name. The Machines page opens.
4. On the Machines page, click Create machine (the + icon) and fill out the form with the following parameters as required:
KaaS machine configuration

• Count - Add the required number of machines to create. The recommended minimum number of machines is three for the control plane HA and two for the KaaS workloads. Select Control Plane for a machine with the control plane role. Otherwise, the machine will have the worker role.
• Instance type - Type in the AWS instance type, for example, c5d.2xlarge.
• AMI ID - Type in the required AMI ID of Ubuntu 18.04. For example, ami-033a0960d9d83ead0.
• Root device size - Select the required root device size, 40 by default.
5. Click Create.
6. Repeat the steps above for the remaining number of machines.

To view the deployment status, use the Status column on the Machines page. Once the status changes from Pending and Updating to Ready, the deployment is complete.

7. Verify the status of the cluster nodes as described in Connect to a KaaS cluster.

Deleting a machine does not require preliminary actions. You can delete a machine using the Delete machine icon on the Machines page of the KaaS web UI. Deleting a machine automatically frees up the resources allocated for this machine.
Warning

An operational KaaS child cluster must contain a minimum of 3 Kubernetes control plane nodes and 2 Kubernetes worker nodes. To maintain the etcd quorum and prevent deployment failure, scaling down the control plane nodes is prohibited.
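To check these minimums from the command line, you can evaluate the output of kubectl get nodes with a short script. This is a minimal sketch that rests on stated assumptions. The check_child_cluster_nodes function name is hypothetical. Control plane nodes are assumed to show master or control-plane in the ROLES column, and every other node is counted as a worker. The script also lists any node whose STATUS is not exactly Ready.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "kubectl get nodes --no-headers" output on stdin,
# print nodes whose STATUS is not exactly "Ready", then print the node counts.
# Returns non-zero if any node is not Ready or if there are fewer than
# 3 control plane nodes or 2 worker nodes.
# Assumption: control plane nodes show "master" or "control-plane" in the
# ROLES (third) column; every other node is counted as a worker.
check_child_cluster_nodes() {
  awk '
    BEGIN { cp = 0; workers = 0; bad = 0 }
    NF == 0 { next }
    $3 ~ /master|control-plane/ { cp++ }
    $3 !~ /master|control-plane/ { workers++ }
    $2 != "Ready" { print "not ready: " $1; bad = 1 }
    END {
      printf "control plane: %d, workers: %d\n", cp, workers
      if (cp < 3 || workers < 2) bad = 1
      exit bad
    }'
}
```

Usage, with the child cluster kubeconfig exported as described in Connect to a KaaS cluster: kubectl get nodes --no-headers | check_child_cluster_nodes. To wait until all nodes are Ready instead of checking once, you can use kubectl wait --for=condition=Ready nodes --all --timeout=10m.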
Delete a child cluster

Deleting a KaaS child cluster does not require a preliminary deletion of VMs that run on this cluster.

To delete an AWS-based KaaS child cluster:
1. Log in to the KaaS web UI with the writer permissions.
2. Select the required namespace.
3. In the Clusters block, click the Delete cluster action icon next to the name of the cluster to be deleted.
4. Verify the list of machines to be removed. Confirm the deletion.
Deleting a cluster automatically removes the Amazon Virtual Private Cloud (VPC) connected with this cluster and frees up the resources allocated for this cluster, for example, instances, load balancers, networks, floating IPs.

5. If Istio and Harbor were enabled, verify the AWS volumes. Since the Istio and Harbor storage is external, manually delete the corresponding resources using the AWS API or AWS Management Console.
Caution!

Deprecation notes

Since the Cluster releases 3.1.0 and 2.2.0, Harbor support is removed and Istio support is deprecated. Istio will be removed in future Cluster releases.
6. Optional. If you do not plan to reuse the credentials of the deleted cluster, delete them:

1. On the upper right side of the required namespace page, click Credentials.
2. On the Credentials page, click the Delete credential action icon next to the name of the credentials to be deleted. Confirm the deletion.

Warning

You can delete credentials only after deleting the KaaS cluster they relate to.
Change a cluster configuration

After deploying a KaaS child cluster, you can change the configuration of the following cluster components using the KaaS web UI:

• Enable or disable Istio (deprecated since KaaS release 1.4.0) and Kubernetes Dashboard
• Enable or disable StackLight and configure its parameters if enabled
To change a cluster configuration:
1. Log in to the KaaS web UI with the writer permissions.
2. Select the required namespace.
3. On the right side of the required cluster block, click the gear icon.
4. In the Configure cluster window, select or deselect the required Kubernetes application. If StackLight is enabled, configure its parameters as required.
5. Click Update to apply the changes.
Update a child cluster

A KaaS management cluster automatically upgrades to a new available KaaS release version that supports new Cluster releases. Once done, a newer version of a Cluster release becomes available for KaaS child clusters and the Update button appears in the KaaS web UI.
Caution!
Mirantis highly recommends updating your clusters that are based on Kubernetes 1.16 to the latest supported Cluster release that is based on Kubernetes 1.17. Be aware that:

• Before the KaaS release 1.9.0, any Cluster release was supported by at least two KaaS releases.
• Starting from the KaaS release 1.9.0:
   • For the sake of development and the upcoming UCP-based Cluster release, one KaaS release supports only one Cluster release that is based on Kubernetes 1.16 and continues supporting two Cluster releases that are based on Kubernetes 1.17.
   • Only new deployments of the KaaS clusters based on Kubernetes 1.16 are supported.
   • An update from a previous KaaS release based on Kubernetes 1.16 is no longer supported.
Caution!
Make sure to update the Cluster release version of your KaaS child cluster before the current Cluster release version becomes unsupported by a new KaaS release version. Otherwise, KaaS stops auto-upgrade and eventually Mirantis KaaS itself becomes unsupported.
This section describes how to update a KaaS child cluster of any provider type using the KaaS web UI.

To update a KaaS child cluster:
1. Log in to the KaaS web UI with the writer permissions.
2. Select the required namespace.
3. In the Clusters block, click Update where available.
4. In the Release update window, select the required Cluster release to update your child cluster to.

   The Description section contains the list of component versions to be installed with a new Cluster release. The release notes for each KaaS and Cluster release are available at KaaS Release Notes: KaaS releases and KaaS Release Notes: Cluster releases.

5. Click Update.

To view the update status, verify the Updating status of the cluster in the Clusters block. Once the Updating status disappears, the update is complete.
Delete a machine

This section describes how to scale down an existing KaaS child cluster through the KaaS web UI.

Warning

A machine with the control plane node role cannot be deleted manually. A machine with such a role is automatically deleted during the KaaS child cluster deletion.
To delete a machine from a KaaS child cluster:
1. Log in to the KaaS web UI with the writer permissions.
2. Select the required namespace.
3. Click the cluster name to open the list of machines running in it.
4. Click the Delete machine icon next to the machine you want to remove. Confirm the deletion.

Deleting a machine automatically frees up the resources allocated to this machine.
Warning

An operational KaaS child cluster must contain a minimum of 3 Kubernetes control plane nodes and 2 Kubernetes worker nodes. To maintain the etcd quorum and prevent deployment failure, scaling down the control plane nodes is prohibited.
Manage a KaaS management cluster

The KaaS web UI enables you to perform the following operations with a KaaS management cluster:

• View the cluster details (such as cluster ID, creation date, nodes count, and so on) as well as obtain a list of the cluster endpoints including the StackLight components, depending on your deployment configuration.

  To view generic cluster details, in the Clusters block, click the Cluster info action icon next to the name of the required management cluster.

• Verify the current release version of the cluster including the list of installed components with their versions and the cluster release change log.

  To view the cluster release version details, in the Clusters block, click the version next to the name of the required management cluster.

  A management cluster upgrade to a newer version is performed automatically once a new KaaS version is released. For more details about the KaaS release upgrade mechanism, see: KaaS Reference Architecture: KaaS release controller.
Warning

Due to architecture limitations, a baremetal-based management cluster upgrade from the KaaS release 1.6.0 to 1.7.0 is not supported.

See also

Connect to a KaaS cluster
Connect to a KaaS cluster

After you deploy a KaaS management or child cluster, connect to the cluster to verify the availability and status of the nodes as described below.

This section also describes how to SSH to a node of a cluster where a Bastion host is used for SSH access, for example, on the OpenStack-based management cluster or AWS-based management and child clusters.

To connect to a KaaS child cluster:
1. Log in to the KaaS web UI with the writer permissions.
2. Select the required namespace.
3. In the Clusters block, click the required cluster name. The Machines page opens.
4. Verify the status of the control plane nodes. Once the first control plane node is deployed and has the Ready status, the Download kubeconfig action icon for the cluster being deployed becomes active.
5. Click Download kubeconfig:
1. Enter your user password.
2. Not recommended. Select Offline token to generate an offline IAM token. Otherwise, for security reasons, the kubeconfig token expires after 30 minutes of KaaS API idle time, and you have to download kubeconfig again with a newly generated token.
3. Click Download.

6. Verify the availability of the KaaS child cluster machines:

1. Export the kubeconfig parameters to your local machine with access to kubectl. For example:
export KUBECONFIG=~/Downloads/kubeconfig-test-cluster.yml
2. Obtain the list of available KaaS machines:
kubectl get nodes -o wide
The system response must contain the details of the nodes in the Ready status.

To connect to a KaaS management cluster:
1. Log in to a local machine where your KaaS management cluster kubeconfig is located and where kubectl is installed.

Note

The KaaS management cluster kubeconfig is created during the last stage of the KaaS management cluster bootstrap.
2. Obtain the list of available KaaS management cluster machines:
kubectl get nodes -o wide
The system response must contain the details of the nodes in the Ready status.

To SSH to a KaaS cluster node if Bastion is used:

1. Obtain kubeconfig of the KaaS management or child cluster as described in the procedures above.
2. Obtain the internal IP address of a node you require access to:
kubectl get nodes -o wide
3. Obtain the Bastion public IP:
kubectl get cluster -o jsonpath='{.status.providerStatus.bastion.publicIP}' \
  -n <namespace> <cluster_name>
4. Run the following command:
ssh -i <private_key> ubuntu@<node_internal_ip> -o "proxycommand ssh -W %h:%p \
  -i <private_key> ubuntu@<bastion_public_ip>"
Substitute the parameters enclosed in angle brackets with the corresponding values of your cluster obtained in the previous steps. The <private_key> for a KaaS management cluster is located at ~/.ssh/openstack_tmp. For a KaaS child cluster, this is the private key that corresponds to the public SSH key you added in the KaaS web UI before the child cluster creation.
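You can wrap the last two steps of the Bastion procedure in a small function. This is a minimal sketch that assumes bash and the ubuntu user from the command above. The kaas_ssh function name and the SSH_BIN override are hypothetical. The private key path must not contain spaces because it is embedded in the ProxyCommand string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: SSH to a KaaS cluster node through the Bastion host.
# Arguments: private key path, node internal IP, Bastion public IP, and an
# optional remote command. Assumption: the key path contains no spaces.
# SSH_BIN can be overridden (for example, SSH_BIN=echo) to print the command.
kaas_ssh() {
  local key="$1" node_ip="$2" bastion_ip="$3"
  shift 3
  "${SSH_BIN:-ssh}" -i "$key" "ubuntu@${node_ip}" \
    -o "ProxyCommand=ssh -W %h:%p -i ${key} ubuntu@${bastion_ip}" "$@"
}
```

For example, with the Bastion public IP obtained through the kubectl jsonpath query above stored in a bastion_ip variable, run kaas_ssh <private_key> <node_internal_ip> "$bastion_ip".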
Manage IAM

IAM CLI

IAM CLI is a user-facing command-line tool for managing scopes, roles, and grants. Using your personal credentials, you can perform different IAM operations through the iamctl tool. For example, you can verify the current status of the IAM service, request or revoke service tokens, and verify your own grants within Mirantis KaaS as well as your token details.
Configure IAM CLI

The iamctl command-line interface uses the iamctl.yaml configuration file to interact with IAM.

To create the IAM CLI configuration file:

1. Log in to the KaaS management cluster.
2. Change the directory to one of the following:

   • $HOME/.iamctl
   • $HOME
   • $HOME/etc
   • /etc/iamctl

3. Create iamctl.yaml with the following exemplary parameters and values that correspond to your deployment:
server: <IAM_API_ADDRESS>
timeout: 60
verbose: 99 # Verbosity level, from 0 to 99

tls:
  enabled: true
  ca: <PATH_TO_CA_BUNDLE>

auth:
  issuer: <IAM_REALM_IN_KEYCLOAK>
  ca: <PATH_TO_CA_BUNDLE>
  client_id: iam
  client_secret:
The <IAM_REALM_IN_KEYCLOAK> value has the <keycloak-url>/auth/realms/<realm-name> format, where <realm-name> defaults to iam.
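You can also generate the configuration file with a script. This is a minimal sketch that assumes the $HOME/.iamctl location from the list above and the default iam realm. The write_iamctl_config function name is hypothetical, and client_secret is left empty as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write $HOME/.iamctl/iamctl.yaml in the format shown above.
# Arguments: IAM API address, Keycloak URL, CA bundle path, and an optional
# realm name that defaults to "iam".
write_iamctl_config() {
  local server="$1" keycloak_url="${2%/}" ca="$3" realm="${4:-iam}"
  local dir="$HOME/.iamctl"
  mkdir -p "$dir" || return 1
  # The issuer follows the <keycloak-url>/auth/realms/<realm-name> format.
  cat > "$dir/iamctl.yaml" <<EOF
server: ${server}
timeout: 60
verbose: 99 # Verbosity level, from 0 to 99

tls:
  enabled: true
  ca: ${ca}

auth:
  issuer: ${keycloak_url}/auth/realms/${realm}
  ca: ${ca}
  client_id: iam
  client_secret:
EOF
}
```

Usage: write_iamctl_config <IAM_API_ADDRESS> <keycloak-url> <PATH_TO_CA_BUNDLE>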
Available IAM CLI commands

Using iamctl, you can perform different role-based access control operations in your Kubernetes cluster. For example:
• Grant or revoke access to a Kubernetes cluster to a specific user for troubleshooting
• Grant or revoke access to a KaaS namespace that contains several Kubernetes clusters
• Create or delete tokens for the KaaS services with a specific set of grants as well as identify when a service token was last used

The iamctl command-line interface contains the following set of commands:

• General commands
• Account information commands
• Scope commands
• Role commands
• Grant commands
• Service token commands
• User commands

The following sections describe the iamctl commands.
General commands

• iamctl --help, iamctl help - Output the list of available commands.
• iamctl help <command> - Output the description of a specific command.

Account information commands

• iamctl account info - Output detailed account information such as user email, user name, the details of their active and offline sessions, token statuses and expiration dates.
• iamctl account login - Log in the current user. The system prompts you to enter your authentication credentials. After a successful login, your user token is added to the $HOME/.iamctl directory.
• iamctl account logout - Log out the current user. Once done, the user information is removed from $HOME/.iamctl.
Scope commands

• iamctl scope list - List the IAM scopes available for the current environment. Example output:

  +---------------+------------------+
  | NAME          | DESCRIPTION      |
  +---------------+------------------+
  | m:iam         | IAM scope        |
  | m:kaas        | KaaS scope       |
  | m:k8s:managed |                  |
  | m:k8s         | Kubernetes scope |
  | m:cloud       | Cloud scope      |
  +---------------+------------------+

• iamctl scope list [prefix] - Output the list of scopes that match the specified prefix. For example: iamctl scope list m:k8s.

Role commands

• iamctl role list <scope> - List the roles for the specified scope in IAM.
• iamctl role show <scope> <role> - Output the details of the specified scope role including the role name (admin, viewer, reader), its description, and an example of the grant command. For example: iamctl role show m:iam admin.
Grant commands

• iamctl grant give [username] [scope] [role] - Provide a user with a role in a scope. For example, the iamctl grant give jdoe m:iam admin command provides the IAM admin role in the m:iam scope to John Doe. For the list of supported IAM scopes and roles, see: Role list.

  Note

  To lock or disable a user, use LDAP or Google OAuth depending on the external provider integrated into your deployment.
• iamctl grant list <username> - List the grants provided to the specified user. For example: iamctl grant list jdoe. Example output:

  +--------+--------+---------------+
  | SCOPE  | ROLE   | GRANT FQN     |
  +--------+--------+---------------+
  | m:iam  | admin  | m:iam@admin   |
  | m:sl   | viewer | m:sl@viewer   |
  | m:kaas | writer | m:kaas@writer |
  +--------+--------+---------------+

   • m:iam@admin - admin rights in all IAM-related applications
   • m:sl@viewer - viewer rights in all StackLight-related applications
   • m:kaas@writer - writer rights in KaaS

• iamctl grant revoke [username] [scope] [role] - Revoke the grants provided to the user.
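The grant commands act on one user at a time. To give the same role to several users, you can call iamctl grant give in a loop. This is a minimal sketch that assumes iamctl is configured as described in Configure IAM CLI. The grant_role_to_users function name and the IAMCTL override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: give one role in one scope to several users by running
# "iamctl grant give <username> <scope> <role>" once per user.
# Stops at the first failing call.
# IAMCTL can be overridden (for example, IAMCTL=echo) to print the commands.
grant_role_to_users() {
  local scope="$1" role="$2" user
  shift 2
  for user in "$@"; do
    "${IAMCTL:-iamctl}" grant give "$user" "$scope" "$role" || return 1
  done
}
```

For example, grant_role_to_users m:kaas reader jdoe asmith gives the KaaS reader role to both users. You can then verify each user with iamctl grant list <username>.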
Service token commands

• iamctl servicetoken list [--all] - List the details of all service tokens created by the current user. The output includes the following service token details:

   • ID
   • Alias, for example, nova, jenkins-ci
   • Creation date and time
   • Creation owner
   • Grants
   • Last refresh date and time
   • IP address

• iamctl servicetoken show [ID] - Output the details of a service token with the specified ID.
• iamctl servicetoken create [alias] [service] [grant1 grant2...] - Create a token for a specific service with the specified set of grants. For example, iamctl servicetoken create new-token iam m:iam@viewer.
• iamctl servicetoken delete [ID1 ID2...] - Delete the service tokens with the specified IDs.
User commands

• iamctl user list - List user names and emails of all current users.
• iamctl user show <username> - Output the details of the specified user.
Role list

Mirantis KaaS creates the IAM roles in scopes. For each application type, such as iam, k8s, or kaas, KaaS creates a scope in Keycloak, and every scope contains a set of roles such as admin, user, viewer. The default IAM roles can be changed during a KaaS child cluster deployment. You can grant or revoke a role access using the IAM CLI. For details, see: IAM CLI.

Example of the structure of a cluster-admin role in a Kubernetes cluster:
m:k8s:kaas-tenant-name:k8s-cluster-name@cluster-admin
• m - prefix for all IAM roles in Mirantis KaaS
• k8s - application type, Kubernetes
• kaas-tenant-name:k8s-cluster-name - a Kubernetes cluster identifier in KaaS (CLUSTER_ID)
• @ - delimiter between a scope and role
• cluster-admin - name of the role within the Kubernetes scope
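A grant FQN is a scope and a role joined by @, so you can compose and split it with plain shell parameter expansion. This is a minimal sketch with hypothetical helper names. It assumes that role names never contain @.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for IAM grant FQNs of the form <scope>@<role>.
# Assumption: role names never contain "@".

# Build the Kubernetes cluster scope m:k8s:<kaas-tenant-name>:<k8s-cluster-name>.
k8s_cluster_scope() {
  printf 'm:k8s:%s:%s\n' "$1" "$2"
}

# Join a scope and a role into a grant FQN, for example m:kaas@writer.
grant_fqn() {
  printf '%s@%s\n' "$1" "$2"
}

# Return the part before the last "@" (the scope).
grant_scope() {
  printf '%s\n' "${1%@*}"
}

# Return the part after the last "@" (the role).
grant_role() {
  printf '%s\n' "${1##*@}"
}
```

For example, grant_fqn "$(k8s_cluster_scope kaas-tenant-name k8s-cluster-name)" cluster-admin prints the cluster-admin grant shown above.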
The following sections describe the scopes and their roles for each Mirantis KaaS component:

• IAM
• KaaS
• Kubernetes
• StackLight
IAM
Scope identifier m:iam:

• admin - Grant example: m:iam@admin [1]. Access Keycloak, the IAM API and web UI.
• user - Grant example: m:iam@user [1]. Access the IAM API and web UI.
• viewer - Grant example: m:iam@viewer [1]. Access the data to be used by the monitoring systems.

KaaS

Scope identifier m:kaas:

• reader - Grant example: m:kaas@reader [1]. List the Kubernetes clusters within the KaaS scope.
• writer - Grant example: m:kaas@writer [1]. Create or delete the Kubernetes clusters within the KaaS scope.
• operator - Grant example: m:kaas@operator. Add or delete the bare metal hosts within the KaaS scope.

Scope identifier m:kaas:$<CLUSTER_ID>:

• reader - Grant example: m:kaas:$<CLUSTER_ID>@reader. List the Kubernetes clusters within the specified KaaS cluster ID.
• writer - Grant example: m:kaas:$<CLUSTER_ID>@writer. Create or delete the Kubernetes clusters within the specified KaaS cluster ID.

[1] Grant is available by default. Other grants can be added during a KaaS management and child cluster deployment.
Kubernetes
Scope identifier m:k8s:<CLUSTER_ID>:

• cluster-admin - Grant example: m:k8s:<CLUSTER_ID>@cluster-admin. Allow the super-user access to perform any action on any resource on the cluster level. When used in ClusterRoleBinding, provide full control over every resource in a cluster and all Kubernetes namespaces.
StackLight
Scope identifier m:sl:$<CLUSTER_ID> or m:sl:$<CLUSTER_ID>:<SERVICE_NAME>:

• admin - Assign roles to other users within the scope. Grant examples:

   • m:sl:$<CLUSTER_ID>@admin
   • m:sl:$<CLUSTER_ID>:alerta@admin
   • m:sl:$<CLUSTER_ID>:alertmngmnt@admin
   • m:sl:$<CLUSTER_ID>:kibana@admin
   • m:sl:$<CLUSTER_ID>:graphana@admin
   • m:sl:$<CLUSTER_ID>:prometheus@admin

• viewer - Access the specified web UI(s) within the scope. The m:sl:$<CLUSTER_ID>@viewer grant provides access to all StackLight web UIs: Prometheus, Alerta, Alertmanager, Kibana, Grafana. Grant examples:

   • m:sl:$<CLUSTER_ID>@viewer
   • m:sl:$<CLUSTER_ID>:alerta@viewer
   • m:sl:$<CLUSTER_ID>:alertmngmnt@viewer
   • m:sl:$<CLUSTER_ID>:kibana@viewer
   • m:sl:$<CLUSTER_ID>:graphana@viewer
   • m:sl:$<CLUSTER_ID>:prometheus@viewer
Manage StackLight

Using StackLight, you can monitor the components deployed in Mirantis KaaS and be quickly notified of critical conditions that may occur in the system to prevent service downtimes.
Access StackLight web UIs

StackLight provides five web UIs: Prometheus, Alertmanager, Alerta, Kibana, and Grafana. This section describes how to access any of these web UIs.

To access a StackLight web UI:
1. Log in to the KaaS web UI.
2. Select the required namespace.
3. In the Clusters tab, click the required cluster.
4. In the dialog box with the cluster information, copy the required endpoint IP from the StackLight endpoints parameter.
5. Paste the copied IP into a web browser and use the default credentials to log in to the web UI.
Once done, you are automatically authenticated to all StackLight web UIs.
See also

• KaaS Reference Architecture: Deployment architecture
• KaaS Reference Architecture: Authentication flow
View Grafana dashboards

Using the Grafana web UI, you can view the visual representation of the metric graphs based on the time series databases.

To view the Grafana dashboards:
1. Log in to the Grafana web UI as described in Access StackLight web UIs.
2. From the drop-down list, select the required dashboard to inspect the status and statistics of the corresponding service in your KaaS management or child cluster:
Ceph cluster

• Ceph cluster - Provides the overall health status of the Ceph cluster, capacity, latency, and recovery metrics.
• Ceph Nodes (available since KaaS 1.5.0) - Provides an overview of the host-related metrics, such as the number of monitors, OSD hosts, average usage of resources across the cluster, network and hosts load.

  Note

  Since KaaS 1.5.0, Ceph hosts overview is renamed to Ceph Nodes.

• Ceph OSD (available since KaaS 1.5.0) - Provides metrics for Ceph OSDs, including the OSD read and write latencies, distribution of PGs per OSD, Ceph OSDs and physical device performance.

  Note

  Prior to KaaS 1.5.0, Ceph OSDs metrics are included in the Ceph OSDs overview and Ceph OSDs details Grafana dashboards.

• Ceph pools (available since KaaS 1.5.0) - Provides metrics for Ceph pools, including the client IOPS and throughput by pool and pools capacity usage.

  Note

  Since KaaS 1.5.0, Ceph pool overview is renamed to Ceph Pools.
KaaS clusters

• Clusters overview (available since KaaS 1.8.0) - Represents the main cluster capacity statistics for all clusters of a KaaS deployment where StackLight is installed.

  Note

  This dashboard is not available yet for the bare metal provider.

Kubernetes services

• Kubernetes Calico - Provides metrics of the entire Calico cluster usage, including the cluster status, host status, and Felix resources.
• Kubernetes cluster - Provides metrics for the entire Kubernetes cluster, including the cluster status, host status, and resources consumption.
• Kubernetes deployments - Provides information on the desired and current state of all KaaS cluster service replicas deployed.
• Kubernetes namespace - Provides the pods state summary and the CPU, MEM, network, and IOPS resources consumption per namespace.
• Kubernetes node - Provides charts showing resources consumption per KaaS cluster node.
• Kubernetes pod - Provides charts showing resources consumption per deployed pod.

MongoDB

• MongoDB - Provides the summary for the query operations, information about the database health and resource consumption.
NGINX

• NGINX - Provides the overall status of the NGINX cluster and information about NGINX requests and connections.

StackLight

• Alertmanager - Provides performance metrics on the overall health status of the Prometheus Alertmanager service, the number of firing and resolved alerts received for various periods, the rate of successful and failed notifications, and the resources consumption.
• Elasticsearch - Provides information about the overall health status of the Elasticsearch cluster, including the resources consumption and the state of the shards.
• Grafana - Provides performance metrics for the Grafana service, including the total number of Grafana entities, CPU and memory consumption.
• Prometheus (available since KaaS 1.5.0) - Provides the availability and performance behavior of the Prometheus servers, the sample ingestion rate, and system usage statistics per server. Also, provides statistics about the overall status and uptime of the Prometheus service, the chunks number of the local storage memory, target scrapes, and queries duration.

  Note

  Prior to KaaS 1.5.0, Prometheus metrics are included in the Prometheus performances and Prometheus stats Grafana dashboards.

• Pushgateway - Provides performance metrics and the overall health status of the service, the rate of samples received for various periods, and the resources consumption.
• Prometheus Relay - Provides service status and resources consumption metrics.
• Telemeter server (available since KaaS 1.8.0) - Provides statistics and the overall health status of the Telemeter service.

  Note

  This dashboard is not available yet for the bare metal provider.

System

• System - Provides a detailed resource consumption and operating system information per KaaS cluster node.
View Kibana dashboards

Using the Kibana web UI, you can view the visual representation of logs and Kubernetes events of your deployment.

To view the Kibana dashboards:

1. Log in to the Kibana web UI as described in Access StackLight web UIs.
2. Click the required dashboard to inspect the visualizations or perform a search:

   • Logs - Provides visualization and search of logs.
   • Kubernetes events (available since KaaS 1.3.0) - Provides visualization and search of Kubernetes events.
Available StackLight alerts

This section provides an overview of the available predefined StackLight alerts. To view the alerts, use the Prometheus, Alertmanager, or Alerta web UI.

Alertmanager

This section describes the alerts for the Alertmanager service.

• AlertmanagerFailedReload
• AlertmanagerMembersInconsistent
• AlertmanagerNotificationFailureWarning
• AlertmanagerAlertsInvalidWarning
AlertmanagerFailedReload
Severity
Warning
Summary
Failure to reload the Alertmanager configuration.
Description
Reloading the Alertmanager configuration failed for {{ $labels.namespace }}/{{ $labels.pod }}.
AlertmanagerMembersInconsistent
Severity
Critical
Summary
Alertmanager did not detect all cluster members.
Description
Alertmanager did not detect all other members of the cluster.
AlertmanagerNotificationFailureWarning
Severity
Warning
Summary
Alertmanager has failed notifications.
Description
An average of {{ $value }} Alertmanager {{ $labels.integration }} notifications on the {{ $labels.instance }} instance fail for 2 minutes.
AlertmanagerAlertsInvalidWarning
Severity
Warning
Summary
Alertmanager has invalid alerts.
Description
An average of {{ $value }} Alertmanager {{ $labels.integration }} alerts on the {{ $labels.instance }} instance have been invalid for 2 minutes.
Calico
This section describes the alerts for Calico.
• CalicoDataplaneFailuresHigh
• CalicoDataplaneAddressMsgBatchSizeHigh
• CalicoDatapaneIfaceMsgBatchSizeHigh
• CalicoIPsetErrorsHigh
• CalicoIptablesSaveErrorsHigh
• CalicoIptablesRestoreErrorsHigh
CalicoDataplaneFailuresHigh
Severity
Warning
Summary
High number of data plane failures within Felix.
Description
The {{ $labels.instance }} Felix instance has {{ $value }} data plane failures within the last hour.
CalicoDataplaneAddressMsgBatchSizeHigh
Severity
Warning
Summary
Felix address message batch size is higher than 5.
Description
The size of the data plane address message batch on the {{ $labels.instance }} Felix instance is {{ $value }}.
CalicoDatapaneIfaceMsgBatchSizeHigh
Severity
Warning
Summary
Felix interface message batch size is higher than 5.
Description
The size of the data plane interface message batch on the {{ $labels.instance }} Felix instance is {{ $value }}.
CalicoIPsetErrorsHigh
Severity
Warning
Summary
More than 5 IPset errors occur in Felix per hour.
Description
The {{ $labels.instance }} Felix instance has {{ $value }} IPset errors within the last hour.
CalicoIptablesSaveErrorsHigh
Severity
Warning
Summary
More than 5 iptables save errors occur in Felix per hour.
Description
The {{ $labels.instance }} Felix instance has {{ $value }} iptables save errors within the last hour.
CalicoIptablesRestoreErrorsHigh
Severity
Warning
Summary
More than 5 iptables restore errors occur in Felix per hour.
Description
The {{ $labels.instance }} Felix instance has {{ $value }} iptables restore errors within the last hour.
Ceph
This section describes the alerts for the Ceph cluster.
• CephClusterHealthMinor
• CephClusterHealthCritical
• CephMonQuorumAtRisk
• CephOsdDownMinor
• CephOSDDiskNotResponding
• CephOSDDiskUnavailable
• CephClusterNearFull
• CephClusterCriticallyFull
• CephOsdPgNumTooHighWarning
• CephOsdPgNumTooHighCritical
• CephMonHighNumberOfLeaderChanges
• CephNodeDown
• CephDataRecoveryTakingTooLong
• CephPGRepairTakingTooLong
• CephOSDVersionMismatch
• CephMonVersionMismatch
CephClusterHealthMinor
Severity
Minor
Summary
Ceph cluster health is WARNING.
Description
The Ceph cluster is in the WARNING state. For details, run ceph -s.
CephClusterHealthCritical
Severity
Critical
Summary
Ceph cluster health is CRITICAL.
Description
The Ceph cluster is in the CRITICAL state. For details, run ceph -s.
CephMonQuorumAtRisk
Severity
Critical
Summary
Storage quorum is at risk.
Description
The storage cluster quorum is low.
CephOsdDownMinor
Severity
Minor
Summary
Ceph OSDs are down.
Description
{{ $value }} of Ceph OSD nodes in the Ceph cluster are down. For details, run ceph osd tree.
CephOSDDiskNotResponding
Severity
Critical
Summary
Disk is not responding.
Description
The {{ $labels.device }} disk device is not responding on the {{ $labels.host }} host.
CephOSDDiskUnavailable
Severity
Critical
Summary
Disk is not accessible.
Description
The {{ $labels.device }} disk device is not accessible on the {{ $labels.host }} host.
CephClusterNearFull
Severity
Warning
Summary
Storage cluster is nearly full. Expansion is required.
Description
The storage cluster utilization exceeds 85%.
CephClusterCriticallyFull
Severity
Critical
Summary
Storage cluster is critically full and needs immediate expansion.
Description
The storage cluster utilization exceeds 95%.
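Both capacity alerts compare the share of used raw storage with a fixed threshold. The following Python sketch shows that two-tier check, assuming that utilization means used raw bytes divided by total raw bytes (the values that `ceph df` reports under RAW STORAGE) and that the thresholds are strict `>` comparisons; the function and constant names are illustrative, not part of StackLight.

```python
from typing import Optional

# Thresholds from the CephClusterNearFull and CephClusterCriticallyFull
# descriptions; the strict comparison (>) is an assumption.
NEAR_FULL_RATIO = 0.85
CRITICALLY_FULL_RATIO = 0.95


def ceph_capacity_alert(used_bytes: int, total_bytes: int) -> Optional[str]:
    """Return the most severe capacity alert for the given raw usage."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    utilization = used_bytes / total_bytes
    if utilization > CRITICALLY_FULL_RATIO:
        return "CephClusterCriticallyFull"
    if utilization > NEAR_FULL_RATIO:
        return "CephClusterNearFull"
    return None
```

The sketch returns only the most severe tier. In Prometheus, the two rules are evaluated independently, so at 96% utilization both alerts can be active at once unless an Alertmanager inhibition rule suppresses the lower one.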
CephOsdPgNumTooHighWarning
Severity
Warning
Summary
Some Ceph OSDs have more than 200 PGs.
Description
Some Ceph OSDs contain more than 200 PGs. This may have a negative impact on the cluster performance. For details, run ceph pg dump.
CephOsdPgNumTooHighCritical
Severity
Critical
Summary
Some Ceph OSDs have more than 300 PGs.
Description
Some Ceph OSDs contain more than 300 PGs. This may have a negative impact on the cluster performance. For details, run ceph pg dump.
CephMonHighNumberOfLeaderChanges
Severity
Warning
Summary
Many leader changes occur in the storage cluster.
Description
{{ $value }} leader changes per minute occur for the {{ $labels.instance }} instance of the {{ $labels.job }} Ceph Monitor.
CephNodeDown
Severity
Critical
Summary
Storage node {{ $labels.node }} went down.
Description
The {{ $labels.node }} storage node is down and requires immediate verification.
CephDataRecoveryTakingTooLong
Severity
Warning
Summary
Data recovery is slow.
Description
Data recovery has been active for more than two hours.
CephPGRepairTakingTooLong
Severity
Warning
Summary
Self-heal issues detected.
Description
The self-heal operations take an excessive amount of time.
CephOSDVersionMismatch
Severity
Warning
Summary
Multiple versions of storage services are running.
Description
{{ $value }} different versions of Ceph OSD components are running.
CephMonVersionMismatch
Severity
Warning
Summary
Multiple versions of storage services are running.
Description
{{ $value }} different versions of Ceph Monitor components are running.
Elasticsearch
This section describes the alerts for the Elasticsearch service.
• ElasticHeapUsageTooHigh
• ElasticHeapUsageWarning
• ElasticClusterRed
• ElasticClusterYellow
• NumberOfRelocationShards
• NumberOfInitializingShards
• NumberOfUnassignedShards
• NumberOfPendingTasks
• ElasticNoNewDocuments
ElasticHeapUsageTooHigh
Severity
Critical
Summary
Elasticsearch heap usage is too high (>90%).
Description
Elasticsearch heap usage is over 90% for 5 minutes.
ElasticHeapUsageWarning
Severity
Warning
Summary
Elasticsearch heap usage is high (>80%).
Description
Elasticsearch heap usage is over 80% for 5 minutes.
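Many alerts on this page fire only when a condition holds for a period, for example, heap usage over 90% for 5 minutes. In Prometheus, this is the `for` clause of an alerting rule: the alert becomes pending when the expression first matches and firing only after it has matched at every evaluation for the whole duration. The Python sketch below imitates that behavior, assuming samples arrive as `(timestamp_seconds, heap_percent)` pairs at each evaluation; the `alert_state` function is hypothetical.

```python
from typing import Iterable, Tuple


def alert_state(samples: Iterable[Tuple[float, float]],
                threshold: float, for_seconds: float) -> str:
    """Return 'inactive', 'pending', or 'firing' after the last sample.

    Sketches a rule of the form `expr > threshold` with `for: <duration>`:
    the condition must hold at every evaluation for `for_seconds` before
    the alert fires, and any evaluation at or below the threshold resets it.
    """
    active_since = None
    state = "inactive"
    for timestamp, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = timestamp
            state = "firing" if timestamp - active_since >= for_seconds else "pending"
        else:
            active_since = None
            state = "inactive"
    return state


# Heap usage of 95% sampled every minute, checked against the
# ElasticHeapUsageTooHigh threshold (90%) and duration (5 minutes).
heap = [(t, 95.0) for t in range(0, 301, 60)]
print(alert_state(heap, threshold=90, for_seconds=300))  # firing
```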
ElasticClusterRed
Severity
Critical
Summary
Elasticsearch cluster is RED.
Description
The Elasticsearch cluster status is RED.
ElasticClusterYellow
Severity
Warning
Summary
Elasticsearch cluster is YELLOW.
Description
The Elasticsearch cluster status is YELLOW.
NumberOfRelocationShards
Severity
Critical
Summary
Shards relocation takes more than 20 minutes.
Description
Elasticsearch has {{ $value }} relocating shards for 20 minutes.
NumberOfInitializingShards
Severity
Critical
Summary
Shards initialization takes more than 10 minutes.
Description
Elasticsearch has {{ $value }} shards being initialized for 10 minutes.
NumberOfUnassignedShards
Severity
Critical
Summary
Shards have unassigned status for 5 minutes.
Description
Elasticsearch has {{ $value }} unassigned shards for 5 minutes.
NumberOfPendingTasks
Severity
Warning
Summary
Tasks have pending state for 10 minutes.
Description
Elasticsearch has {{ $value }} pending tasks for 10 minutes. The cluster works slowly.
ElasticNoNewDocuments
Severity
Warning
Summary
Elasticsearch has no new documents for 10 minutes.
Description
Elasticsearch obtains no new documents for 10 minutes.
etcd
This section describes the alerts for the etcd service.
• etcdInsufficientMembers
• etcdNoLeader
• etcdHighNumberOfLeaderChanges
• etcdGRPCRequestsSlow
• etcdMemberCommunicationSlow
• etcdHighNumberOfFailedProposals
• etcdHighFsyncDurations
• etcdHighCommitDurations
etcdInsufficientMembers
Severity
Critical
Summary
The etcd cluster has insufficient members.
Description
The {{ $labels.job }} etcd cluster has insufficient members ({{ $value }}).
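etcd replicates data with the Raft consensus algorithm and can commit writes only while a majority (quorum) of its members is available, which is the condition etcdInsufficientMembers watches. The quorum arithmetic is sketched below in Python; the function names are illustrative, and treating "fewer healthy members than the quorum" as the exact rule expression is an assumption.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest majority of a Raft cluster with `cluster_size` members."""
    return cluster_size // 2 + 1


def has_insufficient_members(cluster_size: int, healthy_members: int) -> bool:
    """True when the healthy members can no longer form a quorum."""
    return healthy_members < quorum_size(cluster_size)
```

For example, a three-member cluster keeps its quorum with one member down, while a five-member cluster tolerates two failures. A four-member cluster needs three members for a quorum and therefore tolerates no more failures than a three-member one, which is why odd cluster sizes are usually recommended.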
etcdNoLeader
Severity
Critical
Summary
The etcd cluster has no leader.
Description
The {{ $labels.instance }} member of the {{ $labels.job }} etcd cluster has no leader.
etcdHighNumberOfLeaderChanges
Severity
Warning
Summary
More than 3 leader changes occurred in the etcd cluster within the last hour.
Description
The {{ $labels.instance }} instance of the {{ $labels.job }} etcd cluster has {{ $value }} leader changes within the last hour.
etcdGRPCRequestsSlow
Severity
Critical
Summary
The etcd cluster has slow gRPC requests.
Description
The gRPC requests to {{ $labels.grpc_method }} take {{ $value }}s on the {{ $labels.instance }} instance of the {{ $labels.job }} etcd cluster.
etcdMemberCommunicationSlow
Severity
Warning
Summary
The etcd cluster has slow member communication.
Description
The member communication with {{ $labels.To }} on the {{ $labels.instance }} instance of the {{ $labels.job }} etcd cluster takes {{ $value }}s.
etcdHighNumberOfFailedProposals
Severity
Warning
Summary
The etcd cluster has more than 5 proposal failures.
Description
The {{ $labels.job }} etcd cluster has {{ $value }} proposal failures on the {{ $labels.instance }} etcd instance within the last hour.
etcdHighFsyncDurations
Severity
Warning
Summary
The etcd cluster has high fsync durations.
Description
The 99th percentile of fsync operation durations on the {{ $labels.instance }} instance of the {{ $labels.job }} etcd cluster is {{ $value }}s.
etcdHighCommitDurations
Severity
Warning
Summary
The etcd cluster has high commit duration.
Description
The 99th percentile of commit operation durations on the {{ $labels.instance }} instance of the {{ $labels.job }} etcd cluster is {{ $value }}s.
General alerts
This section lists the general available alerts.
• TargetDown
• NodeDown
• Watchdog
TargetDown
Severity
Critical
Summary
The {{ $labels.job }} target is down.
Description
The {{ $labels.job }}/{{ $labels.instance }} target is down.
NodeDown
Severity
Critical
Summary
The {{ $labels.node }} node is down.
Description
The {{ $labels.node }} node is down. Kubernetes treats {{ $labels.node }} as NotReady, and the kubelet is not accessible from Prometheus.
Watchdog
Severity
None
Summary
Watchdog alert that is always firing.
Description
This alert ensures that the entire alerting pipeline is functional. This alert should always be firing in Alertmanager against a receiver. Some integrations with various notification mechanisms can send a notification when this alert is not firing, for example, the DeadMansSnitch integration in PagerDuty.
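The Watchdog alert is meant to be consumed by an external dead man's switch: the receiver expects a notification at a regular interval and raises an alarm when notifications stop arriving. The Python sketch below shows that receiving side; the `DeadMansSwitch` class, its methods, and the timeout value are hypothetical and not part of StackLight or of any PagerDuty integration.

```python
import time
from typing import Optional


class DeadMansSwitch:
    """Raises an alarm when heartbeats stop arriving within `timeout` seconds."""

    def __init__(self, timeout: float, now: Optional[float] = None):
        self.timeout = timeout
        # Treat creation time as the first heartbeat so a new switch is quiet.
        self.last_heartbeat = time.monotonic() if now is None else now

    def heartbeat(self, now: Optional[float] = None) -> None:
        """Record a Watchdog notification delivered by Alertmanager."""
        self.last_heartbeat = time.monotonic() if now is None else now

    def should_alarm(self, now: Optional[float] = None) -> bool:
        """True when the alerting pipeline appears to be broken."""
        current = time.monotonic() if now is None else now
        return current - self.last_heartbeat > self.timeout
```

Alertmanager resends a notification for an alert that is still firing only once per `repeat_interval` of the matching route, so the timeout of such a switch should be longer than that interval to avoid false alarms.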
General node alerts
This section lists the general alerts for Kubernetes nodes.
• SystemCpuFullWarning
• SystemLoadTooHighWarning
• SystemLoadTooHighCritical
• SystemDiskFullWarning
• SystemDiskFullMajor
• SystemMemoryFullWarning
• SystemMemoryFullMajor
• SystemDiskInodesFullWarning
• SystemDiskInodesFullMajor
• SystemDiskErrorsTooHigh
SystemCpuFullWarning
Severity
Warning
Summary
High CPU consumption.
Description
The average CPU consumption on the {{ $labels.node }} node is {{ $value }}% for 2 minutes.
SystemLoadTooHighWarning
Severity
Warning
Summary
System load is more than 1 per CPU.
Description
The system load per CPU on the {{ $labels.node }} node is {{ $value }} for 5 minutes.
SystemLoadTooHighCritical
Severity
Critical
Summary
System load is more than 2 per CPU.
Description
The system load per CPU on the {{ $labels.node }} node is {{ $value }} for 5 minutes.
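The two load alerts divide the system load average by the number of CPUs and compare the result with 1 and 2. A Python sketch of that classification follows; which load average window the StackLight rules use is not stated here, so reading the 5-minute average is an assumption, and `load_alert` is an illustrative name.

```python
import os
from typing import Optional


def load_alert(load_average: float, cpu_count: int) -> Optional[str]:
    """Classify a load average using the per-CPU thresholds of the
    SystemLoadTooHigh alerts (more than 1 or more than 2 per CPU)."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    per_cpu = load_average / cpu_count
    if per_cpu > 2:
        return "SystemLoadTooHighCritical"
    if per_cpu > 1:
        return "SystemLoadTooHighWarning"
    return None


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    # os.getloadavg() returns the 1-, 5- and 15-minute load averages (Unix only).
    _, five_minute, _ = os.getloadavg()
    print(load_alert(five_minute, os.cpu_count() or 1))
```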
SystemDiskFullWarning
Severity
Warning
Summary
Disk partition {{ $labels.mountpoint }} is 85% full.
Description
The {{ $labels.mountpoint }} partition of the {{ $labels.device }} disk on the {{ $labels.node }} node is {{ $value }}% full for 2 minutes.
SystemDiskFullMajor
Severity
Major
Summary
Disk partition {{ $labels.mountpoint }} is 95% full.
Description
The {{ $labels.mountpoint }} partition of the {{ $labels.device }} disk on the {{ $labels.node }} node is {{ $value }}% full for 2 minutes.
SystemMemoryFullWarning
Severity
Warning
Summary
More than 90% of memory is used or less than 8 GB is available.
Description
The {{ $labels.node }} node consumes {{ $value }}% of memory for 2 minutes.
SystemMemoryFullMajor
Severity
Major
Summary
More than 95% of memory is used or less than 4 GB of memory is available.
Description
The {{ $labels.node }} node consumes {{ $value }}% of memory for 2 minutes.
SystemDiskInodesFullWarning
Severity
Warning
Summary
The {{ $labels.mountpoint }} volume uses 85% of inodes.
Description
The {{ $labels.device }} disk on the {{ $labels.node }} node consumes {{ $value }}% of disk inodes in the {{ $labels.mountpoint }} volume for 2 minutes.
SystemDiskInodesFullMajor
Severity
Warning
Summary
The {{ $labels.mountpoint }} volume uses 95% of inodes.
Description
The {{ $labels.device }} disk on the {{ $labels.node }} node consumes {{ $value }}% of disk inodes in the {{ $labels.mountpoint }} volume for 2 minutes.
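The disk and inode alerts express usage as a percentage of a file system's capacity. On Unix systems, `os.statvfs()` exposes the underlying counters: `f_blocks` and `f_bavail` for blocks, `f_files` and `f_ffree` for inodes. The Python sketch below derives both percentages and maps them to the 85% and 95% thresholds from the summaries. Counting blocks reserved for root as used (by using `f_bavail`) and using `>=` comparisons are assumptions, so the results may differ slightly from `df` and from the StackLight rules. The sketch returns alert-name suffixes, not severities, because SystemDiskInodesFullMajor is listed above with the Warning severity.

```python
import os
from typing import Optional


def usage_percent(total: int, free: int) -> float:
    """Share of `total` units (blocks or inodes) that are not free, in %."""
    if total == 0:
        return 0.0  # some pseudo file systems report no inodes at all
    return (total - free) * 100 / total


def tier(percent: float, warning: float = 85.0, major: float = 95.0) -> Optional[str]:
    """Map a usage percentage to the alert-name suffix used on this page."""
    if percent >= major:
        return "Major"
    if percent >= warning:
        return "Warning"
    return None


def check_mountpoint(path: str) -> dict:
    """Check block and inode usage of the file system mounted at `path`."""
    st = os.statvfs(path)  # Unix only
    blocks = usage_percent(st.f_blocks, st.f_bavail)
    inodes = usage_percent(st.f_files, st.f_ffree)
    return {"disk": tier(blocks), "inodes": tier(inodes)}
```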
SystemDiskErrorsTooHigh
Severity
Warning
Summary
The {{ $labels.device }} disk is failing.
Description
The {{ $labels.device }} disk on the {{ $labels.node }} node is reporting errors for 5 minutes.
Ironic
This section describes the alerts for Ironic. The alerted events include Ironic API availability and Ironic processes availability.
• IronicMetricsMissing
• IronicApiOutage
IronicMetricsMissing
Severity
Critical
Summary
Ironic metrics missing.
Description
Metrics retrieved from the Ironic API are not available for 2 minutes.
IronicApiOutage
Severity
Critical
Summary
Ironic API outage.
Description
The Ironic API is not accessible.
Kubernetes applications
This section lists the alerts for Kubernetes applications.
• KubePodCrashLooping
• KubePodNotReady
• KubeDeploymentGenerationMismatch
• KubeDeploymentReplicasMismatch
• KubeStatefulSetReplicasMismatch
• KubeStatefulSetGenerationMismatch
• KubeStatefulSetUpdateNotRolledOut
• KubeDaemonSetRolloutStuck
• KubeDaemonSetNotScheduled
• KubeDaemonSetMisScheduled
• KubeCronJobRunning
• KubeJobCompletion
• KubeJobFailed
KubePodCrashLooping
Severity
Critical
Summary
The {{ $labels.pod }} Pod is restarting.
Description
The {{ $labels.namespace }}/{{ $labels.pod }} Pod ({{ $labels.container }}) is restarting {{ printf "%.2f" $value }} times per 5 minutes.
KubePodNotReady
Severity
Critical
Summary
The {{ $labels.pod }} Pod is in the non-ready state.
Description
The {{ $labels.namespace }}/{{ $labels.pod }} Pod is in the non-ready state for longer than an hour.
KubeDeploymentGenerationMismatch
Severity
Critical
Summary
The {{ $labels.deployment }} deployment generation does not match the metadata.
Description
The deployment generation for {{ $labels.namespace }}/{{ $labels.deployment }} does not match the metadata, indicating that the deployment failed but has not been rolled back.
KubeDeploymentReplicasMismatch
Severity
Critical
Summary
The {{ $labels.deployment }} deployment has a wrong number of replicas.
Description
The {{ $labels.namespace }}/{{ $labels.deployment }} deployment does not match the expected number of replicas for longer than one hour.
KubeStatefulSetReplicasMismatch
Severity
Critical
Summary
The {{ $labels.statefulset }} StatefulSet has a wrong number of replicas.
Description
The {{ $labels.namespace }}/{{ $labels.statefulset }} StatefulSet does not match the expected number of replicas for longer than 15 minutes.
KubeStatefulSetGenerationMismatch
Severity
Critical
Summary
The {{ $labels.statefulset }} StatefulSet generation does not match the metadata.
Description
The StatefulSet generation for {{ $labels.namespace }}/{{ $labels.statefulset }} does not match the metadata, indicating that the StatefulSet failed but has not been rolled back.
KubeStatefulSetUpdateNotRolledOut
Severity
Critical
Summary
The {{ $labels.statefulset }} StatefulSet update has not been rolled out.
Description
The {{ $labels.namespace }}/{{ $labels.statefulset }} StatefulSet update has not been rolled out.
KubeDaemonSetRolloutStuck
Severity
Critical
Summary
The {{ $labels.daemonset }} DaemonSet is not ready.
Description
Only {{ $value }}% of the desired Pods of the {{ $labels.namespace }}/{{ $labels.daemonset }} DaemonSet are scheduled and ready.
KubeDaemonSetNotScheduled
Severity
Warning
Summary
The {{ $labels.daemonset }} DaemonSet has unscheduled Pods.
Description
The {{ $labels.namespace }}/{{ $labels.daemonset }} DaemonSet has {{ $value }} unscheduled Pods.
KubeDaemonSetMisScheduled
Severity
Warning
Summary
The {{ $labels.daemonset }} DaemonSet has incorrectly scheduled Pods.
Description
The {{ $labels.namespace }}/{{ $labels.daemonset }} DaemonSet has {{ $value }} Pods running where they are not supposed to run.
KubeCronJobRunning
Severity
Warning
Summary
The {{ $labels.cronjob }} CronJob has not been ready for more than one hour.
Description
The {{ $labels.namespace }}/{{ $labels.cronjob }} CronJob takes more than one hour to complete.
KubeJobCompletion
Severity
Warning
Summary
The {{ $labels.job_name }} job has not been ready for more than one hour.
Description
The {{ $labels.namespace }}/{{ $labels.job_name }} job takes more than one hour to complete.
KubeJobFailed
Severity
Warning
Summary
The {{ $labels.job_name }} job failed.
Description
The {{ $labels.namespace }}/{{ $labels.job_name }} job failed to complete.
Kubernetes resources
This section lists the alerts for Kubernetes resources.
• KubeCPUOvercommitPods
• KubeMemOvercommitPods
• KubeCPUOvercommitNamespaces
• KubeMemOvercommitNamespaces
• KubeQuotaExceeded
• CPUThrottlingHigh
KubeCPUOvercommitPods
Severity
Warning
Summary
Cluster has overcommitted CPU requests.
Description
The cluster has overcommitted CPU resource requests for Pods and cannot tolerate node failure.
KubeMemOvercommitPods
Severity
Warning
Summary
Cluster has overcommitted memory requests.
Description
The cluster has overcommitted memory resource requests for Pods and cannot tolerate node failure.
KubeCPUOvercommitNamespaces
Severity
Warning
Summary
Cluster has overcommitted CPU requests for namespaces.
Description
The cluster has overcommitted CPU resource requests for namespaces.
KubeMemOvercommitNamespaces
Severity
Warning
Summary
Cluster has overcommitted memory requests for namespaces.
Description
The cluster has overcommitted memory resource requests for namespaces.
KubeQuotaExceeded
Severity
Warning
Summary
The {{ $labels.namespace }} namespace consumes more than 90% of its {{ $labels.resource }} quota.
Description
The {{ $labels.namespace }} namespace consumes {{ printf "%0.0f" $value }}% of its {{ $labels.resource }} quota.
CPUThrottlingHigh
Severity
Warning
Summary
The {{ $labels.pod_name }} Pod has CPU throttling.
Description
The CPU in the {{ $labels.namespace }} namespace for the {{ $labels.container_name }} container in the {{ $labels.pod_name }} Pod has {{ printf "%0.0f" $value }}% throttling.
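With CPU limits, the Linux CFS bandwidth controller divides time into enforcement periods and reports per cgroup, in the `cpu.stat` file, how many periods elapsed (`nr_periods`) and in how many of them the cgroup was throttled (`nr_throttled`); cAdvisor exposes the same counters as `container_cpu_cfs_periods_total` and `container_cpu_cfs_throttled_periods_total`. Assuming the throttling percentage above is the increase of throttled periods divided by the increase of all periods, it can be computed as in this Python sketch; the function names and sample readings are illustrative, and the window used by the StackLight rule is not stated here.

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse `key value` lines from a cgroup cpu.stat file into integers."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttling_percent(before: dict, after: dict) -> float:
    """Percentage of CFS periods that were throttled between two readings."""
    periods = after["nr_periods"] - before["nr_periods"]
    throttled = after["nr_throttled"] - before["nr_throttled"]
    if periods <= 0:
        return 0.0
    return throttled * 100 / periods


before = parse_cpu_stat("nr_periods 1000\nnr_throttled 100\nthrottled_time 5000\n")
after = parse_cpu_stat("nr_periods 1200\nnr_throttled 150\nthrottled_time 9000\n")
print(f"{throttling_percent(before, after):.0f}% throttling")  # 25% throttling
```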
Kubernetes storage
This section lists the alerts for Kubernetes storage.
• KubePersistentVolumeUsageCritical
• KubePersistentVolumeFullInFourDays
• KubePersistentVolumeErrors
KubePersistentVolumeUsageCritical
Severity
Critical
Summary
The {{ $labels.persistentvolumeclaim }} PersistentVolume has less than 3% of free space.
Description
The PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in the {{ $labels.namespace }} namespace is only {{ printf "%0.2f" $value }}% free.
KubePersistentVolumeFullInFourDays
Severity
Critical
Summary
The {{ $labels.persistentvolumeclaim }} PersistentVolume is expected to fill up in 4 days.
Description
Based on the recent sampling, the PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in the {{ $labels.namespace }} namespace is expected to fill up within four days. Currently, {{ printf "%0.2f" $value }}% of free space is available.
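Prometheus provides the `predict_linear()` function, which fits a simple linear regression to the samples of a range and extrapolates it. Assuming the four-day prediction works along those lines, the calculation resembles the following Python sketch. The function names and sample data are illustrative, and, unlike `predict_linear()`, which extrapolates from the evaluation time, this sketch extrapolates from the last sample for simplicity.

```python
from typing import Sequence, Tuple

DAY = 24 * 3600  # seconds
FOUR_DAYS = 4 * DAY


def predict_linear(samples: Sequence[Tuple[float, float]], horizon: float) -> float:
    """Fit value = intercept + slope * time by least squares and
    extrapolate `horizon` seconds past the last sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples need distinct timestamps")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    return intercept + slope * (samples[-1][0] + horizon)


def fills_within(samples: Sequence[Tuple[float, float]], horizon: float = FOUR_DAYS) -> bool:
    """True when the extrapolated free space drops below zero within `horizon`."""
    return predict_linear(samples, horizon) < 0


# Free bytes shrinking by 10 GB per day: extrapolated to -30 GB in 4 days.
shrinking = [(0, 30e9), (DAY, 20e9), (2 * DAY, 10e9)]
print(fills_within(shrinking))  # True
```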
KubePersistentVolumeErrors
Severity
Critical
Summary
The status of the {{ $labels.persistentvolume }} PersistentVolume is {{ $labels.phase }}.
Description
The status of the {{ $labels.persistentvolume }} PersistentVolume is {{ $labels.phase }}.
Kubernetes system
This section lists the alerts for the Kubernetes system.
• KubeNodeNotReady
• KubeVersionMismatch
• KubeClientErrors
• KubeletTooManyPods
• KubeAPILatencyHighWarning
• KubeAPILatencyHighCritical
• KubeAPIErrorsHighCritical
• KubeAPIErrorsHighWarning
• KubeAPIResourceErrorsHighCritical
• KubeAPIResourceErrorsHighWarning
• KubeClientCertificateExpirationInSevenDays
• KubeClientCertificateExpirationInOneDay
• ContainerScrapeError
KubeNodeNotReady
Severity
Warning
Summary
The {{ $labels.node }} node has not been ready for more than one hour.
Description
The Kubernetes {{ $labels.node }} node has not been ready for more than one hour.
KubeVersionMismatch
Severity
Warning
Summary
Kubernetes components have mismatching versions.
Description
Kubernetes has components with {{ $value }} different semantic versions running.
KubeClientErrors
Severity
Warning
Summary
Kubernetes API client has errors for more than 1% of requests.
Description
The {{ $labels.job }}/{{ $labels.instance }} Kubernetes API server client has {{ printf "%0.0f" $value }}% errors.
KubeletTooManyPods
Severity
Warning
Summary
The kubelet reached 90% of its Pod limit.
Description
The {{ $labels.instance }}/{{ $labels.node }} kubelet runs {{ $value }} Pods, close to the limit of 110.
KubeAPILatencyHighWarning
Severity
Warning
Summary
The API server has a 99th percentile latency of more than 1 second.
Description
The API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.
KubeAPILatencyHighCritical
Severity
Critical
Summary
The API server has a 99th percentile latency of more than 4 seconds.
Description
The API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.
KubeAPIErrorsHighCritical
Severity
Critical
Summary
API server returns errors for more than 3% of requests.
Description
The API server returns errors for {{ $value }}% of requests.
KubeAPIErrorsHighWarning
Severity
Warning
Summary
API server returns errors for more than 1% of requests.
Description
The API server returns errors for {{ $value }}% of requests.
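KubeAPIErrorsHighWarning and KubeAPIErrorsHighCritical compare the share of failed API server requests with 1% and 3%. Assuming, as in common kube-prometheus rules, that a failed request is one answered with a 5xx status code, the classification works as in the following Python sketch; the counting window and the exact status codes used by StackLight are assumptions, and the function names are illustrative.

```python
from collections import Counter
from typing import Iterable, Optional


def error_percent(status_codes: Iterable[int]) -> float:
    """Percentage of requests answered with a 5xx status code."""
    classes = Counter(code // 100 for code in status_codes)
    total = sum(classes.values())
    if total == 0:
        return 0.0
    return classes[5] * 100 / total


def api_error_alert(percent: float) -> Optional[str]:
    """Map an error percentage to the KubeAPIErrorsHigh* alert it would trigger."""
    if percent > 3:
        return "KubeAPIErrorsHighCritical"
    if percent > 1:
        return "KubeAPIErrorsHighWarning"
    return None
```

Client errors such as 404 are not counted in this sketch, because they usually reflect the request rather than the health of the API server.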
KubeAPIResourceErrorsHighCritical
Severity
Critical
Summary
API server returns errors for 10% of requests.
Description
The API server returns errors for {{ $value }}% of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource }}.
KubeAPIResourceErrorsHighWarning
Severity
Warning
Summary
API server returns errors for 5% of requests.
Description
The API server returns errors for {{ $value }}% of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource }}.
KubeClientCertificateExpirationInSevenDays
Severity
Warning
Summary
An authentication client certificate for the API server expires in less than 7.0 days.
Description
A client certificate used to authenticate to the API server expires in less than 7.0 days.
KubeClientCertificateExpirationInOneDay
Severity
Critical
Summary
An authentication client certificate for the API server expires in less than 24.0 hours.
Description
A client certificate used to authenticate to the API server expires in less than 24.0 hours.
ContainerScrapeError
Severity
Warning
Summary
Failure to get Kubernetes container metrics.
Description
Prometheus was not able to scrape metrics from the container on the {{ $labels.node }} Kubernetes node.
MongoDB
This section lists the alerts for the MongoDB service.
• MongodbCursorsOpenTooMany
• MongodbCursorTimeouts
• MongodbConnectionsTooMany
• MongodbMemoryUsageWarning
MongodbCursorsOpenTooMany
Severity
Warning
Summary
MongoDB has a high number of open cursors.
Description
{{ $value }} MongoDB cursors are open for clients of the {{ $labels.instance }} instance.
MongodbCursorTimeouts
Severity
Warning
Summary
MongoDB cursor timeouts.
Description
{{ $value }} MongoDB cursors timed out for the {{ $labels.instance }} instance.
MongodbConnectionsTooMany
Severity
Warning
Summary
Too many connections in MongoDB.
Description
The MongoDB {{ $labels.instance }} instance has {{ $value }} active connections.
MongodbMemoryUsageWarning
Severity
Warning
Summary
MongoDB high memory consumption.
Description
The MongoDB {{ $labels.instance }} instance virtual memory reached 80% of memory available to the container.
Netchecker
Warning: This feature is available starting from the KaaS release version 1.3.0.
This section lists the alerts for the Netchecker service.
• NetCheckerAgentErrors
• NetCheckerReportsMissing
• NetCheckerTCPServerDelay
• NetCheckerDNSSlow
NetCheckerAgentErrors
Severity
Warning
Summary
Netchecker has a high number of errors.
Description
The {{ $labels.agent }} Netchecker agent had {{ $value }} errors within the last hour.
NetCheckerReportsMissing
Severity
Warning
Summary
The number of agent reports is lower than expected.
Description
The {{ $labels.agent }} Netchecker agent has not reported anything for the last 5 minutes.
NetCheckerTCPServerDelay
Severity
Warning
Summary
The TCP connection to the Netchecker server takes too much time.
Description
The {{ $labels.agent }} Netchecker agent TCP connection time to the Netchecker server has increased by {{ $value }} within the last 5 minutes.
NetCheckerDNSSlow
Severity
Warning
Summary
The DNS lookup time is too high.
Description
The DNS lookup time on the {{ $labels.agent }} Netchecker agent has increased by {{ $value }} within the last 5 minutes.
NGINX
This section lists the alerts for the NGINX service.
• NginxServiceDown
• NginxDroppedIncomingConnections
NginxServiceDown
Severity
Minor
Summary
The NGINX service is down.
Description
The NGINX service on the {{ $labels.node }} node is down.
NginxDroppedIncomingConnections
Severity
Minor
Summary
NGINX drops incoming connections.
Description
NGINX on the {{ $labels.node }} node drops {{ $value }} accepted connections per second for 5 minutes.
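The NGINX `stub_status` module reports cumulative `accepts` and `handled` connection counters, and the NGINX documentation notes that `handled` normally equals `accepts` unless a resource limit such as `worker_connections` has been reached. Assuming NginxDroppedIncomingConnections is derived from the difference between these counters (the exporter and metric names that StackLight uses are not stated here), the per-second drop rate between two scrapes can be computed as in this Python sketch; the function names and sample outputs are illustrative.

```python
def parse_stub_status(text: str) -> dict:
    """Extract the accepts/handled/requests counters from stub_status output."""
    lines = [line.strip() for line in text.strip().splitlines()]
    # The counters follow the "server accepts handled requests" header line.
    header = lines.index("server accepts handled requests")
    accepts, handled, requests = (int(field) for field in lines[header + 1].split())
    return {"accepts": accepts, "handled": handled, "requests": requests}


def dropped_per_second(before: dict, after: dict, interval_seconds: float) -> float:
    """Average rate of accepted but not handled connections between two scrapes."""
    dropped_before = before["accepts"] - before["handled"]
    dropped_after = after["accepts"] - after["handled"]
    return (dropped_after - dropped_before) / interval_seconds


sample_1 = """Active connections: 10
server accepts handled requests
 1000 1000 5000
Reading: 0 Writing: 1 Waiting: 9
"""
sample_2 = """Active connections: 12
server accepts handled requests
 1600 1300 9000
Reading: 0 Writing: 2 Waiting: 10
"""
print(dropped_per_second(parse_stub_status(sample_1), parse_stub_status(sample_2), 300))  # 1.0
```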
Node network
This section lists the alerts for a Kubernetes node network.
• SystemRxPacketsErrorTooHigh
• SystemTxPacketsErrorTooHigh
• SystemRxPacketsDroppedTooHigh
• SystemTxPacketsDroppedTooHigh
• NodeNetworkInterfaceFlapping
SystemRxPacketsErrorTooHigh
Severity
Warning
Summary
The {{ $labels.node }} node has packet receive errors.
Description
The {{ $labels.device }} network interface has receive errors on the {{ $labels.namespace }}/{{ $labels.pod }} node exporter.
SystemTxPacketsErrorTooHigh
Severity
Warning
Summary
The {{ $labels.node }} node has packet transmit errors.
Description
The {{ $labels.device }} network interface has transmit errors on the {{ $labels.namespace }}/{{ $labels.pod }} node exporter.
SystemRxPacketsDroppedTooHigh
Severity
Warning
Summary
60 or more received packets were dropped.
Description
{{ $value }} packets received by the {{ $labels.device }} interface on the {{ $labels.node }} node were dropped during the last minute.
SystemTxPacketsDroppedTooHigh
Severity
Warning
Summary
100 or more transmitted packets were dropped.
Description
{{ $value }} packets transmitted by the {{ $labels.device }} interface on the {{ $labels.node }} node were dropped during the last minute.
NodeNetworkInterfaceFlapping
Severity
Warning
Summary
The {{ $labels.node }} node has a flapping interface.
Description
The {{ $labels.device }} network interface often changes its UP status on the {{ $labels.namespace }}/{{ $labels.pod }} node exporter.
Node time
This section lists the time-related alerts for Kubernetes nodes.
ClockSkewDetected
Severity
Warning
Summary
The NTP offset reached the limit of 0.03 seconds.
Description
Clock skew was detected on the {{ $labels.namespace }}/{{ $labels.pod }} node exporter. Verify that NTP is configured correctly on this host.
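In NTP (RFC 5905), a client estimates its clock offset from a server using four timestamps: client transmit time T1, server receive time T2, server transmit time T3, and client receive time T4. The offset of the server relative to the client is ((T2 - T1) + (T3 - T4)) / 2. The Python sketch below applies that formula and the 0.03-second limit from the ClockSkewDetected summary; the function names and timestamps are illustrative, and the alert itself presumably relies on offset values reported by the node exporter rather than on this calculation.

```python
NTP_OFFSET_LIMIT = 0.03  # seconds, from the ClockSkewDetected summary


def ntp_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """Clock offset of the server relative to the client (RFC 5905).

    t1: client transmit, t2: server receive, t3: server transmit,
    t4: client receive, each read from the respective local clock.
    """
    return ((t2 - t1) + (t3 - t4)) / 2


def clock_skew_detected(offset: float, limit: float = NTP_OFFSET_LIMIT) -> bool:
    """True when the absolute offset exceeds the limit."""
    return abs(offset) > limit


# The server clock is roughly 50 ms ahead of the client clock.
offset = ntp_offset(0.000, 0.050, 0.051, 0.002)
print(clock_skew_detected(offset))  # True
```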
Prometheus
This section describes the alerts for the Prometheus service.
• PrometheusConfigReloadFailed
• PrometheusNotificationQueueRunningFull
• PrometheusErrorSendingAlertsWarning
• PrometheusErrorSendingAlertsCritical
• PrometheusNotConnectedToAlertmanagers
• PrometheusTSDBReloadsFailing
• PrometheusTSDBCompactionsFailing
• PrometheusTSDBWALCorruptions
• PrometheusNotIngestingSamples
• PrometheusTargetScrapesDuplicate
• PrometheusRuleEvaluationsFailed
PrometheusConfigReloadFailed
Severity
Warning
Summary
Failure to reload the Prometheus configuration.
Description
Reloading of the Prometheus configuration failed for {{$labels.namespace}}/{{$labels.pod}}.
PrometheusNotificationQueueRunningFull
Severity
Warning
Summary
Prometheus alert notification queue is running full.
Description
The Prometheus alert notification queue is running full for {{$labels.namespace}}/{{ $labels.pod}}.
PrometheusErrorSendingAlertsWarning
Severity
Warning
Summary
Errors occur while sending alerts from Prometheus.
Description
Errors occur for 1% of the alerts sent from Prometheus {{$labels.namespace}}/{{ $labels.pod}} to Alertmanager {{$labels.Alertmanager}}.
PrometheusErrorSendingAlertsCritical
Severity
Critical
Summary
Errors occur while sending alerts from Prometheus.
Description
Errors occur for 3% of the alerts sent from Prometheus {{$labels.namespace}}/{{ $labels.pod}} to Alertmanager {{$labels.Alertmanager}}.
PrometheusNotConnectedToAlertmanagers
Severity
Warning
Summary
Prometheus is not connected to Alertmanager.
Description
Prometheus {{ $labels.namespace }}/{{ $labels.pod}} is not connected to any Alertmanager instance.
PrometheusTSDBReloadsFailing
Severity
Warning
Summary
Prometheus has issues reloading data blocks from disk.
Description
The Prometheus server on the {{$labels.instance}} instance has {{$value | humanize}} reload failures over the last four hours.
PrometheusTSDBCompactionsFailing
Severity
Warning
Summary
Prometheus has issues compacting sample blocks.
Description
The Prometheus server on the {{$labels.instance}} instance has {{$value | humanize}} compaction failures over the last four hours.
PrometheusTSDBWALCorruptions
Severity
Warning
Summary
Prometheus write-ahead log is corrupted.
Description
The Prometheus server on the {{$labels.instance}} instance has a corrupted write-ahead log (WAL).
PrometheusNotIngestingSamples
Severity
Warning
Summary
Prometheus does not ingest samples.
Description
Prometheus {{ $labels.namespace }}/{{ $labels.pod}} does not ingest samples.
PrometheusTargetScrapesDuplicate
Severity
Warning
Summary
Prometheus has many rejected samples.
Description
Prometheus {{ $labels.namespace }}/{{ $labels.pod }} has many rejected samples that have duplicate timestamps but different values.
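To illustrate which samples this alert counts, the following Python sketch models appends to a single time series in a simplified way. It assumes the Prometheus TSDB behavior where a sample with the same timestamp as the previous one is silently ignored if the value is equal and rejected if the value differs; out-of-order samples are also rejected by Prometheus but are not counted as duplicates here. The function name append_samples is hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (timestamp in milliseconds, value)


def append_samples(samples: List[Sample]) -> Tuple[List[Sample], int]:
    """Append samples to one series; return (stored samples, duplicates rejected)."""
    stored: List[Sample] = []
    rejected_duplicates = 0
    for ts, value in samples:
        if stored and ts < stored[-1][0]:
            continue  # out-of-order sample: rejected, but not a duplicate
        if stored and ts == stored[-1][0]:
            if value != stored[-1][1]:
                rejected_duplicates += 1  # same timestamp, different value
            continue  # same timestamp and same value: silently ignored
        stored.append((ts, value))
    return stored, rejected_duplicates


stored, rejected = append_samples([(1000, 1.0), (1000, 1.0), (1000, 2.0), (2000, 3.0)])
print(stored)    # [(1000, 1.0), (2000, 3.0)]
print(rejected)  # 1
```

In practice, such duplicates usually mean that two targets or exporters expose the same series with conflicting values.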
PrometheusRuleEvaluationsFailed
Severity
Warning
Summary
Prometheus failed to evaluate recording rules.
Description
Prometheus {{ $labels.namespace }}/{{ $labels.pod }} has failed evaluations for recording rules. Verify the rules state in the Status/Rules section of the Prometheus web UI.
Salesforce notifier
This section lists the alerts for the Salesforce notifier service.
• SfNotifierDown
• SfNotifierAuthFailure
SfNotifierDown
Severity
Critical
Summary
The sf-notifier service is down.
Description
The sf-notifier service has been down for 2 minutes.
SfNotifierAuthFailure
Severity
Critical
Summary
Failure to authenticate to Salesforce.
Description
The sf-notifier service has been failing to authenticate to Salesforce for 2 minutes.
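Both sf-notifier alerts fire only after their condition has persisted for 2 minutes. The following Python sketch shows this behavior under the assumption that it follows the pending and firing states of a Prometheus alerting rule with a for clause: the condition must be true on every evaluation for the whole duration, and a single false evaluation resets the timer. The function name alert_state and its input format are hypothetical.

```python
from typing import List, Optional, Tuple


def alert_state(evaluations: List[Tuple[int, bool]], for_seconds: int) -> str:
    """Return 'inactive', 'pending', or 'firing' after the last evaluation.

    `evaluations` is a time-ordered list of (unix_time, condition_is_true).
    """
    active_since: Optional[int] = None
    for ts, condition in evaluations:
        if condition:
            if active_since is None:
                active_since = ts  # condition has just become true
        else:
            active_since = None  # any false evaluation resets the timer
    if active_since is None:
        return "inactive"
    last_ts = evaluations[-1][0]
    return "firing" if last_ts - active_since >= for_seconds else "pending"


# The service is down on every 30-second evaluation for 2 minutes.
print(alert_state([(t, True) for t in range(0, 121, 30)], for_seconds=120))  # firing
```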
Mirantis Kubernetes-as-a-Service User Guide version beta
©2020, Mirantis Inc. Page 77
SMART disks
This section describes the alerts for SMART disks.
• SystemSMARTDiskUDMACrcErrorsTooHigh
• SystemSMARTDiskHealthStatus
• SystemSMARTDiskReadErrorRate
• SystemSMARTDiskSeekErrorRate
• SystemSMARTDiskTemperatureHigh
• SystemSMARTDiskReallocatedSectorsCount
• SystemSMARTDiskCurrentPendingSectors
• SystemSMARTDiskReportedUncorrectableErrors
• SystemSMARTDiskOfflineUncorrectableSectors
• SystemSMARTDiskEndToEndError
SystemSMARTDiskUDMACrcErrorsTooHigh
Severity
Warning
Summary
The {{ $labels.device }} disk has UDMA CRC errors.
Description
The {{ $labels.device }} disk on the {{ $labels.host }} node has been reporting SMART UDMA CRC errors for 5 minutes.
SystemSMARTDiskHealthStatus
Severity
Warning
Summary
The {{ $labels.device }} disk has bad health.
Description
The {{ $labels.device }} disk on the {{ $labels.host }} node has been reporting a bad health status for 1 minute.
SystemSMARTDiskReadErrorRate
Severity
Warning
Summary
The {{ $labels.device }} disk has read errors.
Description
The {{ $labels.device }} disk on the {{ $labels.host }} node has been reporting an increased read error rate for 5 minutes.
SystemSMARTDiskSeekErrorRate
Severity
Warning
Summary
The {{ $labels.device }} disk has seek errors.
Description
The {{ $labels.device }} disk on the {{ $labels.host }} node has been reporting an increased seek error rate for 5 minutes.
SystemSMARTDiskTemperatureHigh
Severity
Warning
Summary
The {{ $labels.device }} disk temperature is high.
Description
The {{ $labels.device }} disk on the {{ $labels.host }} node has had a temperature of {{ $value }}C for 5 minutes.
SystemSMARTDiskReallocatedSectorsCount
Severity
Major
Summary
The {{ $labels.device }} disk has reallocated sectors.
Description
The {{ $labels.device }} disk on the {{ $labels.host }} node has reallocated {{ $value }} sectors.
SystemSMARTDiskCurrentPendingSectors
Severity
Major
Summary
The {{ $labels.device }} disk has current pending sectors.
Description
The {{ $labels.device }} disk on the {{ $labels.host }} node has {{ $value }} current pending sectors.
SystemSMARTDiskReportedUncorrectableErrors
Severity
Major
Summary
The {{ $labels.device }} disk has reported uncorrectable errors.
Description
The {{ $labels.device }} disk on the {{ $labels.host }} node has {{ $value }} reported uncorrectable errors.
SystemSMARTDiskOfflineUncorrectableSectors
Severity
Major
Summary
The {{ $labels.device }} disk has offline uncorrectable sectors.
Description
The {{ $labels.device }} disk on the {{ $labels.host }} node has {{ $value }} offline uncorrectable sectors.
SystemSMARTDiskEndToEndError
Severity
Major
Summary
The {{ $labels.device }} disk has end-to-end errors.
Description
The {{ $labels.device }} disk on the {{ $labels.host }} node has {{ $value }} end-to-end errors.
SSL certificates
This section lists the alerts for SSL certificates.
• SSLCertExpirationWarning
• SSLCertExpirationCritical
SSLCertExpirationWarning
Severity
Warning
Summary
SSL certificate expires in 30 days.
Description
The SSL certificate for {{ $labels.instance }} expires in 30 days.
SSLCertExpirationCritical
Severity
Critical
Summary
SSL certificate expires in 10 days.
Description
The SSL certificate for {{ $labels.instance }} expires in 10 days.
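The following Python sketch shows how the 30-day and 10-day thresholds of the SSL certificate alerts could be applied to a certificate expiration date. It uses the standard ssl.cert_time_to_seconds function to parse a notAfter string in the format that OpenSSL and ssl.getpeercert() report. It assumes that a certificate exactly at a threshold already triggers the alert; the function name cert_expiry_severity is hypothetical and is not how StackLight evaluates these alerts.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def cert_expiry_severity(not_after: str, now: Optional[float] = None) -> Optional[str]:
    """Classify a certificate by the number of days left until its notAfter date.

    `not_after` uses the certificate time format, for example "Jan  5 09:34:43 2018 GMT".
    """
    if now is None:
        now = time.time()
    days_left = (ssl.cert_time_to_seconds(not_after) - now) / SECONDS_PER_DAY
    if days_left <= 10:
        return "Critical"  # SSLCertExpirationCritical threshold (includes expired certificates)
    if days_left <= 30:
        return "Warning"  # SSLCertExpirationWarning threshold
    return None  # more than 30 days left: no alert


expires = ssl.cert_time_to_seconds("Jan  5 09:34:43 2018 GMT")
print(cert_expiry_severity("Jan  5 09:34:43 2018 GMT", now=expires - 20 * SECONDS_PER_DAY))  # Warning
```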
Telemeter
Warning
This feature is available starting from the KaaS release 1.8.0.
Caution!
The Telemeter support for the bare metal provider is currently under development and will be announced shortly.
This section describes the alerts for the Telemeter service.
• TelemeterClientAuthenticationFailed
• TelemeterClientFederationFailed
TelemeterClientAuthenticationFailed
Severity
Warning
Summary
Telemeter client failed to authenticate to the server.
Description
The Telemeter client has failed to authenticate to the Telemeter server twice over the last 30 minutes. Verify the telemeter-client container logs. Typically, this error occurs when an incorrect ClusterID or Token is set in the telemeter-client settings.
TelemeterClientFederationFailed
Severity
Warning
Summary
Telemeter client failed to send data to the server.
Description
The Telemeter client has failed to send data to the Telemeter server twice over the last 30 minutes. Verify the telemeter-client container logs.
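Both Telemeter alerts count failures within a sliding 30-minute window. The following Python sketch illustrates that counting, assuming that "twice" means at least two failures and that a failure exactly 30 minutes old still falls inside the window. The function name telemeter_alert and the list of failure timestamps are hypothetical inputs; the actual alerts are evaluated by Prometheus from telemeter-client metrics.

```python
from typing import List

WINDOW_SECONDS = 30 * 60  # the 30-minute window used by both Telemeter alerts


def telemeter_alert(failure_times: List[float], now: float, threshold: int = 2) -> bool:
    """Return True if at least `threshold` failures occurred in the last 30 minutes.

    `failure_times` are Unix timestamps of failed authentication or send attempts.
    """
    recent = [t for t in failure_times if now - WINDOW_SECONDS <= t <= now]
    return len(recent) >= threshold


print(telemeter_alert([100.0, 200.0], now=1000.0))  # True
print(telemeter_alert([100.0], now=1000.0))         # False
```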
Disable workload monitoring
Caution!
This feature is available starting from the KaaS release 1.6.0.
On clusters that run large-scale workloads, workload monitoring generates a large number of resource-consuming metrics. To prevent the generation of excessive metrics, you can disable workload monitoring in the StackLight metrics and monitor only the infrastructure.
The feature is implemented using the metricFilter parameter, which enables the cAdvisor (Container Advisor) and kubeStateMetrics metric ingestion filters for Prometheus. The feature is disabled by default. If enabled, you can select the namespaces to which the filter applies.
To disable workload monitoring on a KaaS child or management cluster:
1. Log in to the KaaS web UI with writer permissions.
2. Select the required namespace.
3. In the upper right corner of the KaaS web UI, click the arrow next to your user name to open the drop-down menu.
4. In the drop-down menu, click Download kubeconfig to download the kubeconfig of your KaaS management cluster.
5. Log in to any local machine with kubectl installed.
6. Copy the downloaded kubeconfig to this machine.
7. Run one of the following commands:
• For a KaaS management cluster:
kubectl --kubeconfig <KUBECONFIG_PATH> edit -n <NAMESPACE_NAME> cluster <MANAGEMENT_CLUSTER_NAME>
• For a KaaS child cluster:
kubectl --kubeconfig <KUBECONFIG_PATH> edit -n <NAMESPACE_NAME> cluster <CHILD_CLUSTER_NAME>
8. Edit the opened manifest. For example:
spec:
  providerSpec:
    value:
      helmReleases:
      - name: stacklight
        values:
          metricFilter:
            enabled: true
            action: keep
            namespaces:
              kube-system: true
              stacklight: true
              kaas: true
• enabled - enable or disable metricFilter using true or false
• action - action for Prometheus to take:
  • keep - keep only metrics from the namespaces defined in the namespaces list
  • drop - ignore metrics from the namespaces defined in the namespaces list
• namespaces - list of namespaces to keep or drop metrics from, regardless of the boolean value set for each namespace
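The following Python sketch illustrates the keep and drop semantics of metricFilter described above on a few label sets. It is a simplified model, not the StackLight implementation: it assumes that filtering is done on the namespace label (similar to the Prometheus relabeling keep and drop actions), that series without a namespace label count as not listed, and that only the namespace names matter, not their boolean values. The function name filter_metrics is hypothetical.

```python
from typing import Dict, Iterable, List

Labels = Dict[str, str]


def filter_metrics(series: Iterable[Labels], action: str,
                   namespaces: Dict[str, bool], enabled: bool = True) -> List[Labels]:
    """Return the series that Prometheus would ingest under the given filter."""
    if action not in ("keep", "drop"):
        raise ValueError(f"unsupported action: {action!r}")
    listed = set(namespaces)  # boolean values are ignored, only names matter
    result: List[Labels] = []
    for labels in series:
        in_list = labels.get("namespace") in listed
        if not enabled:
            result.append(labels)  # filter disabled: ingest everything
        elif action == "keep" and in_list:
            result.append(labels)  # keep only listed namespaces
        elif action == "drop" and not in_list:
            result.append(labels)  # drop listed namespaces, keep the rest
    return result


series = [{"__name__": "a", "namespace": "kube-system"},
          {"__name__": "b", "namespace": "default"}]
filters = {"kube-system": True, "stacklight": True, "kaas": True}
print(filter_metrics(series, "keep", filters))  # only the kube-system series
```

With action set to keep, as in the example manifest, workload metrics from namespaces that are not listed, such as default, are no longer ingested.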