Configure Kubernetes Autoscaling with Custom Metrics


Autoscaling is natively supported in Kubernetes. To learn more about autoscaling, see the following documents:

By default, you can automatically scale the number of Kubernetes pods based on the observed CPU utilization. However, in many situations you want to scale your application based on other monitored metrics, such as the number of incoming requests or the memory consumption. Starting with Kubernetes 1.7, you can do that by combining Prometheus with the Kubernetes aggregator layer.

Kubernetes aggregator layer

Kubernetes 1.7 introduces a new concept called the aggregator layer. The aggregator layer allows you to install additional Kubernetes-style APIs. This means you can register custom API servers, and the new APIs they serve, with the Kubernetes cluster. A detailed explanation of the aggregator layer can be found in the official Kubernetes documentation.


Prometheus is widely used to monitor all the components of a Kubernetes cluster including the control plane, the worker nodes, and the applications running on the cluster.

This guide walks through configuring the Kubernetes cluster, setting up a Prometheus system to monitor the application, and autoscaling the application based on a sample custom metric: the number of incoming requests. All sample manifests used in this guide can be found in the Kubeless GitHub repository.

NOTE: The configurations below are only applicable on Kubernetes version v1.7 or later.

Assumptions and prerequisites

This guide makes the following assumptions:

  • You have a Docker environment running.
  • You have a Kubernetes cluster running.
  • You have the kubectl command line (kubectl CLI) installed.

The following are the steps you will complete in this guide:

  • Step 1: Configure the Kubernetes cluster to enable the aggregator layer and autoscaling API group.
  • Step 2: Deploy a Prometheus monitoring system.
  • Step 3: Deploy a custom API server and register it to the aggregator layer.
  • Step 4: Deploy a sample application and test the autoscaling.

Step 1: Cluster configuration

Before getting started, ensure that the Kubernetes control plane is configured to run autoscaling with custom metrics. As of Kubernetes 1.7, this requires enabling the aggregator layer on the API server and configuring the controller manager to use the metrics APIs via their REST clients. This guide runs on a Docker-in-Docker Kubernetes cluster provisioned by kubeadm.

Starting the cluster via kubeadm

  $ wget
  $ chmod +x
  $ ./ up

Checking the status of the cluster

Once your cluster has started, check the status of the containers and the cluster by running the commands below:

  $ docker ps
  $ kubectl cluster-info

Cluster configuration

Now that the cluster is up and running, configure the Kubernetes control plane to enable the aggregator layer by modifying the manifests below in the /etc/kubernetes/manifests folder on the master server. The kubelet watches this folder and restarts the affected control plane components as soon as a manifest changes.

The following configurations must be set:

  • Enable and configure the aggregator layer in the kube-apiserver.yaml manifest file as follows:

    --requestheader-client-ca-file=<path to aggregator CA cert>
    --proxy-client-cert-file=<path to aggregator proxy cert>
    --proxy-client-key-file=<path to aggregator proxy key>
  • Enable the autoscaling/v2alpha1 API group to support additional metrics for autoscaling in the kube-apiserver.yaml manifest file:

  • Configure the HPA controller to consume metrics via REST clients and configure the following settings in the kube-controller-manager.yaml manifest file:

  • Once the kube-apiserver is configured and running, kubectl will auto-discover all enabled API groups. To check that the autoscaling/v2alpha1 API group is enabled, use the following command:

    $ kubectl api-versions

    This is the output you should see. Note that autoscaling/v2alpha1 must appear among the other API groups:

The Kubernetes cluster is now ready to register additional API servers and autoscale with custom metrics. In the next step, we deploy a Prometheus system and register a custom Prometheus-based API server.
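
The API group check from this step can also be scripted, which helps when the cluster is rebuilt often. The following is a minimal sketch, not part of the original setup: the helper names (missing_api_groups, kubectl_api_versions) are hypothetical, and the sample output is illustrative rather than captured from a real cluster. It only assumes that kubectl api-versions prints one group/version per line.

```python
import subprocess
from typing import Iterable, List


def missing_api_groups(api_versions_output: str, required: Iterable[str]) -> List[str]:
    """Return the required group/versions that are absent from the
    output of `kubectl api-versions` (one entry per line)."""
    available = {line.strip() for line in api_versions_output.splitlines() if line.strip()}
    return [group for group in required if group not in available]


def kubectl_api_versions() -> str:
    """Run `kubectl api-versions` and return its raw output.
    Assumes kubectl is on the PATH and points at the right cluster."""
    result = subprocess.run(
        ["kubectl", "api-versions"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Illustrative output only; a real cluster lists many more groups.
sample_output = """\
apps/v1beta1
autoscaling/v1
autoscaling/v2alpha1
v1
"""

print(missing_api_groups(sample_output, ["autoscaling/v2alpha1"]))  # []
```

Against a live cluster, pass kubectl_api_versions() instead of sample_output. An empty list means every required API group is enabled.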

Step 2: Deploy a Prometheus monitoring system

The Prometheus setup contains a CoreOS Prometheus operator and a Prometheus instance. We will deploy both of them using the commands below:

$ kubectl create -f prometheus-operator.yaml
$ kubectl create -f sample-prometheus-instance.yaml

Check the cluster services in order to make sure that Prometheus has been successfully deployed:

$ kubectl get svc

NAME                  CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
kubernetes         <none>        443/TCP          6d
prometheus-operated   None            <none>        9090/TCP         1h

Step 3: Deploy a custom API server

The custom API server that we deploy provides the API group/version and allows the HPA controller to query custom metrics from it. The custom API server used here is a Prometheus adapter, which collects metrics from Prometheus and serves them to the HPA controller via REST queries (that is why we previously configured the HPA controller to use REST clients via the --horizontal-pod-autoscaler-use-rest-clients flag):

$ kubectl create -f custom-metrics.yaml

namespace "custom-metrics" created
serviceaccount "custom-metrics-apiserver" created
clusterrolebinding "custom-metrics:system:auth-delegator" created
rolebinding "custom-metrics-auth-reader" created
clusterrole "custom-metrics-read" created
clusterrolebinding "custom-metrics-read" created
deployment "custom-metrics-apiserver" created
service "api" created
apiservice "" created
clusterrole "custom-metrics-server-resources" created
clusterrolebinding "hpa-controller-custom-metrics" created

This step deploys the custom API server and registers it with the aggregator layer, so it now appears in the list of enabled API versions:

$ kubectl api-versions

Now, the custom API server is running:

$ kubectl get po -n custom-metrics

NAME                                        READY     STATUS    RESTARTS   AGE
custom-metrics-apiserver-2956926076-wcgmw   1/1       Running   0          1h

$ kubectl get --raw /apis/


Step 4: Deploy a sample application

Now we can deploy a sample application and a sample HPA rule that autoscales on the http_requests metric collected and exposed via Prometheus. The HPA rule scales the application between 2 and 10 replicas, targeting a total of 100 requests per second across all pods.

$ cat sample-metrics-app.yaml
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2alpha1
metadata:
  name: sample-metrics-app-hpa
spec:
  scaleTargetRef:
    kind: Deployment
    name: sample-metrics-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      target:
        kind: Service
        name: sample-metrics-app
      metricName: http_requests
      targetValue: 100

Apply the recently created HPA rule as shown below:

$ kubectl create -f sample-metrics-app.yaml

deployment "sample-metrics-app" created
service "sample-metrics-app" created
servicemonitor "sample-metrics-app" created
horizontalpodautoscaler "sample-metrics-app-hpa" created

$ kubectl get hpa

NAME                     REFERENCE                       TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
sample-metrics-app-hpa   Deployment/sample-metrics-app   866m / 100   2         10        2          1h
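
The TARGETS column above reads as "current value / target value", where 866m is Kubernetes quantity notation for 0.866. The sketch below illustrates how the HPA controller turns such a reading into a replica count using the documented rule desiredReplicas = ceil(currentReplicas * currentValue / targetValue), clamped to the configured minimum and maximum. It is a simplified model: the function names are hypothetical, only plain and milli ("m") quantities are parsed, and the controller's tolerance band and rate limiting are ignored.

```python
import math


def parse_quantity(text: str) -> float:
    """Parse a plain or milli-unit quantity such as '100' or '866m'."""
    text = text.strip()
    if text.endswith("m"):
        return float(text[:-1]) / 1000.0
    return float(text)


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, min_replicas: int,
                     max_replicas: int) -> int:
    """Simplified HPA rule: scale proportionally to value/target,
    then clamp to the [min_replicas, max_replicas] range."""
    ratio = current_value / target_value
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


target = parse_quantity("100")

# The idle reading from the output above: far below target, so the
# replica count stays at the configured minimum.
print(desired_replicas(2, parse_quantity("866m"), target, 2, 10))  # 2

# A hypothetical reading of 350 requests per second on 2 replicas.
print(desired_replicas(2, 350.0, target, 2, 10))  # 7
```

With the values from the sample HPA rule, any reading above 500 requests per second on 2 replicas is capped at the maximum of 10 replicas.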

To see the HPA controller scale up the number of application pods, generate load by sending requests to the sample application service.
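
One way to generate that load is a small script run from a pod inside the cluster. The following is a minimal sketch under stated assumptions: SERVICE_URL is a hypothetical address (it assumes the sample-metrics-app service lives in the default namespace and listens on port 80), and the function names are illustrative. Adjust the URL to match your deployment.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-cluster address of the sample service; adjust as needed.
SERVICE_URL = "http://sample-metrics-app.default.svc.cluster.local/"


def fetch_once(url: str = SERVICE_URL, timeout: float = 2.0) -> bool:
    """Send one GET request and report whether it returned a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def generate_load(fetch, total_requests: int, workers: int = 20) -> int:
    """Call fetch() total_requests times across a thread pool and
    return how many calls succeeded."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda _: fetch(), range(total_requests))
        return sum(1 for ok in results if ok)


# Example (run from inside the cluster):
#   succeeded = generate_load(fetch_once, total_requests=5000)
#   print(succeeded, "requests succeeded")
```

While the script runs, repeat kubectl get hpa to watch the TARGETS value rise and the REPLICAS count follow it.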


In this guide we covered the steps for setting up a basic monitoring system based on Prometheus and a custom API server to extend the Kubernetes autoscaling feature with custom metrics. We also gave an example of how to scale your application based on the incoming traffic metric. However, the appropriate metric for autoscaling depends on the application. You need to understand which part of the application causes high load and configure a scaling policy that allows the application to survive peak times.

To learn more about the topics discussed in this guide, see the following links: