Collecting Metrics

Yatai supports using Prometheus to collect metrics from BentoDeployment resources.

Note

This documentation covers BentoDeployment metrics only, not metrics for Yatai itself.

Prerequisites

  • yatai-deployment

    Because the collected metrics belong to BentoDeployment resources, this setup relies on yatai-deployment.

  • Kubernetes

    A Kubernetes cluster running version 1.20 or newer.

    Note

    If you do not have a production Kubernetes cluster and want to install Yatai for development and testing purposes, you can use minikube to set up a local Kubernetes cluster.

  • Dynamic Volume Provisioning

    Because Prometheus needs storage for its metrics, you need to enable dynamic volume provisioning in your Kubernetes cluster. For details, refer to the Kubernetes documentation on Dynamic Volume Provisioning.

  • Helm

    We use Helm to install Prometheus Stack.
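The prerequisites above can be checked from a shell before continuing. The following is a minimal sketch, assuming kubectl and helm are on the PATH and kubectl points at the target cluster. The function names are illustrative and not part of Yatai, and a StorageClass marked "(default)" is only one common sign that dynamic volume provisioning is available.

```shell
# Succeeds when version $1 (MAJOR.MINOR) is at least version $2.
version_at_least() {
  local have_major="${1%%.*}" have_minor="${1#*.}"
  local want_major="${2%%.*}" want_minor="${2#*.}"
  # Drop anything after the minor number, e.g. "24+" or "24.3" -> "24".
  have_minor="${have_minor%%[!0-9]*}"
  want_minor="${want_minor%%[!0-9]*}"
  [ "$have_major" -gt "$want_major" ] && return 0
  [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]
}

# Prints the API server version as MAJOR.MINOR, parsed from the /version endpoint.
server_version() {
  kubectl get --raw /version |
    sed -n 's/.*"gitVersion": *"v\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Checks the Kubernetes version, a default StorageClass, and the helm binary.
check_prerequisites() {
  local v
  v="$(server_version)"
  [ -n "$v" ] || { echo "could not read the Kubernetes server version" >&2; return 1; }
  version_at_least "$v" 1.20 || { echo "Kubernetes $v is older than 1.20" >&2; return 1; }
  kubectl get storageclass | grep -q '(default)' || { echo "no default StorageClass found" >&2; return 1; }
  helm version --short >/dev/null || { echo "helm is not installed" >&2; return 1; }
  echo "prerequisites look satisfied"
}

# Example:
#   check_prerequisites
```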

Quick setup

Note

This quick setup script is intended for development and testing purposes only.

This script will automatically install the following dependencies inside the yatai-monitoring namespace of the Kubernetes cluster:

  • Prometheus Operator

  • Prometheus

  • Grafana

  • Alertmanager

bash <(curl -s "https://raw.githubusercontent.com/bentoml/yatai/main/scripts/quick-setup-yatai-monitoring.sh")
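Piping a remote script straight into bash runs it before anyone has read it. As a hedged alternative, the sketch below downloads the same script to a local file so it can be reviewed first. The function name and the default destination path are illustrative choices, not part of Yatai.

```shell
SCRIPT_URL="https://raw.githubusercontent.com/bentoml/yatai/main/scripts/quick-setup-yatai-monitoring.sh"

# Download the quick-setup script and print the path it was saved to.
fetch_setup_script() {
  # $1: destination path (the default is a hypothetical location)
  local dest="${1:-/tmp/quick-setup-yatai-monitoring.sh}"
  # -f fails on HTTP errors, -sS stays quiet but still reports errors, -L follows redirects.
  curl -fsSL "$SCRIPT_URL" -o "$dest" || return 1
  # Parse the script without executing it, which catches a truncated download.
  bash -n "$dest" || return 1
  printf '%s\n' "$dest"
}

# Example:
#   script="$(fetch_setup_script)" && less "$script" && bash "$script"
```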

Setup steps

1. Install Prometheus Stack

1. Create a namespace for Prometheus Stack

kubectl create ns yatai-monitoring

2. Install prometheus-operator

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update prometheus-community

cat <<EOF | helm install prometheus prometheus-community/kube-prometheus-stack -n yatai-monitoring -f -
grafana:
  enabled: false
  forceDeployDatasources: true
  forceDeployDashboards: true
EOF
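Running `helm install` a second time fails once the release exists. If the step may need to be repeated, `helm upgrade --install` is an idempotent alternative. The sketch below passes the same values as the command above; the function name is illustrative, and `--create-namespace` is added so the namespace step is not strictly required.

```shell
# Install or upgrade the kube-prometheus-stack release with the values shown above.
install_prometheus_stack() {
  local namespace="${1:-yatai-monitoring}"
  # "upgrade --install" installs the release if it is absent and upgrades it otherwise.
  # "--create-namespace" creates the namespace when it does not exist yet.
  cat <<EOF | helm upgrade --install prometheus prometheus-community/kube-prometheus-stack \
    -n "$namespace" --create-namespace -f -
grafana:
  enabled: false
  forceDeployDatasources: true
  forceDeployDashboards: true
EOF
}

# Example:
#   install_prometheus_stack
```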

3. Verify that Prometheus is running

kubectl -n yatai-monitoring get pod -l release=prometheus

The output of the command above should look something like this:

NAME                                                   READY   STATUS    RESTARTS   AGE
prometheus-kube-prometheus-operator-6f5c99cd68-6kshn   1/1     Running   0          21h
prometheus-kube-state-metrics-668449846c-tm2nb         1/1     Running   0          21h
prometheus-prometheus-node-exporter-ljlxk              1/1     Running   0          20h
prometheus-prometheus-node-exporter-fnxs2              1/1     Running   0          20h
prometheus-prometheus-node-exporter-gqq8c              1/1     Running   0          20h
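Instead of re-running `get pod` until every pod shows Running, `kubectl wait` can block until the pods are Ready. This is a small sketch; the function name and the 300s default timeout are assumptions chosen for illustration.

```shell
# Block until every pod matching a label selector reports Ready, or fail on timeout.
wait_for_pods() {
  local selector="$1" timeout="${2:-300s}"
  kubectl -n yatai-monitoring wait --for=condition=Ready pod -l "$selector" --timeout="$timeout"
}

# Example:
#   wait_for_pods release=prometheus
```

Note that `kubectl wait` returns an error immediately when no pod matches the selector yet, so it may need a retry right after installation.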

4. Verify that the CRDs of prometheus-operator have been established

kubectl wait --for condition=established --timeout=120s crd/prometheuses.monitoring.coreos.com
kubectl wait --for condition=established --timeout=120s crd/servicemonitors.monitoring.coreos.com

The output of the commands above should look something like this:

customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com condition met

5. Verify that the Prometheus service is running

kubectl -n yatai-monitoring get pod -l app.kubernetes.io/instance=prometheus-kube-prometheus-prometheus

The output of the command above should look something like this:

NAME                                                 READY   STATUS    RESTARTS   AGE
prometheus-prometheus-kube-prometheus-prometheus-0   2/2     Running   0          15m

6. Verify that the Alertmanager service is running

kubectl -n yatai-monitoring get pod -l app.kubernetes.io/instance=prometheus-kube-prometheus-alertmanager

The output of the command above should look something like this:

NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          18m

7. Install Grafana

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update grafana

cat <<EOF | helm install grafana grafana/grafana -n yatai-monitoring -f -
adminUser: admin
adminPassword: $(LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 20)
persistence:
  enabled: true
sidecar:
  dashboards:
    enabled: true
  datasources:
    enabled: true
  notifiers:
    enabled: true
EOF
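Because the heredoc delimiter above is unquoted, the shell expands `$(...)` before Helm reads the values, so each run generates a new random admin password. The sketch below isolates that step in a function, which makes it easier to see what it produces. The function name is illustrative.

```shell
# Print a random alphanumeric password of the requested length (default 20).
generate_password() {
  local length="${1:-20}"
  # LC_ALL=C makes tr treat /dev/urandom as raw bytes; -dc deletes everything
  # except the listed characters, and head stops after $length of them.
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$length"
}

# Example:
#   GRAFANA_PASSWORD="$(generate_password 20)"
```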

8. Verify that the Grafana service is running

kubectl -n yatai-monitoring get pod -l app.kubernetes.io/name=grafana

The output of the command above should look something like this:

NAME                       READY   STATUS    RESTARTS   AGE
grafana-796c6947b7-r7gr4   3/3     Running   0          3m40s

9. Visit the Prometheus web UI

You can create an ingress for the prometheus-kube-prometheus-prometheus service or port-forward the service to local port 9090:

kubectl -n yatai-monitoring port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 --address 0.0.0.0

Then visit the Prometheus web UI via http://localhost:9090

Prometheus web UI
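With the port-forward above still running, Prometheus can also be checked from the command line through its HTTP API. This is a minimal sketch, assuming the server is reachable at http://localhost:9090; the function names and the PROMETHEUS_URL variable are illustrative.

```shell
PROMETHEUS_URL="${PROMETHEUS_URL:-http://localhost:9090}"

# Succeeds when Prometheus answers its health endpoint.
prometheus_healthy() {
  curl -fsS "$PROMETHEUS_URL/-/healthy" >/dev/null
}

# Run an instant PromQL query and print the raw JSON response.
prometheus_query() {
  # -G sends the url-encoded parameter as a GET query string.
  curl -fsS -G "$PROMETHEUS_URL/api/v1/query" --data-urlencode "query=$1"
}

# Example:
#   prometheus_healthy && prometheus_query up
```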

10. Visit the Grafana web UI

You can create an ingress for the grafana service or port-forward the service to local port 8888:

kubectl -n yatai-monitoring port-forward svc/grafana 8888:80 --address 0.0.0.0

Then visit the Grafana web UI via http://localhost:8888

Note

Use the following command to get the Grafana username:

kubectl -n yatai-monitoring get secret grafana -o jsonpath='{.data.admin-user}' | base64 -d

Use the following command to get the Grafana password:

kubectl -n yatai-monitoring get secret grafana -o jsonpath='{.data.admin-password}' | base64 -d

Grafana web UI
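The two credential commands in the note above differ only in the key read from the grafana Secret. A small helper, sketched here with an illustrative function name, makes that explicit. Kubernetes stores Secret values base64-encoded, which is why the output is piped through `base64 -d`.

```shell
# Print one decoded field of the "grafana" Secret, e.g. admin-user or admin-password.
grafana_secret_field() {
  kubectl -n yatai-monitoring get secret grafana -o "jsonpath={.data.$1}" | base64 -d
}

# Example:
#   grafana_secret_field admin-user
#   grafana_secret_field admin-password
```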

2. Collect BentoDeployment metrics

1. Create PodMonitor for BentoDeployment

kubectl apply -f https://raw.githubusercontent.com/bentoml/yatai/main/scripts/monitoring/bentodeployment-podmonitor.yaml

After a short while, the service discovery page in the Prometheus web UI should show that the BentoDeployment has been discovered:

Prometheus service discovery header menu

Prometheus service discovery

The expression input box in the Prometheus web UI should now auto-complete BentoML metric names:

Prometheus metrics auto complete

Prometheus BentoML metrics
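The same check can be made without the web UI by listing the metric names Prometheus knows about. The sketch below filters the response of the `/api/v1/label/__name__/values` endpoint by prefix. The `bentoml_` prefix is an assumption about how BentoML names its metrics, and the function name is illustrative.

```shell
# Read a Prometheus /api/v1/label/__name__/values response on stdin and print
# the metric names that start with the given prefix, one per line.
metric_names_with_prefix() {
  grep -o "\"$1[A-Za-z0-9_:]*\"" | tr -d '"'
}

# Example (with the Prometheus port-forward still running):
#   curl -fsS http://localhost:9090/api/v1/label/__name__/values | metric_names_with_prefix bentoml_
```

To confirm that the PodMonitor itself exists, `kubectl get podmonitors --all-namespaces` lists every PodMonitor in the cluster.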

3. Create Grafana Dashboard for BentoDeployment

1. Download the BentoDeployment and BentoFunction Grafana dashboard JSON files

curl -L https://raw.githubusercontent.com/bentoml/yatai/main/scripts/monitoring/bentodeployment-dashboard.json -o /tmp/bentodeployment-dashboard.json
curl -L https://raw.githubusercontent.com/bentoml/yatai/main/scripts/monitoring/bentofunction-dashboard.json -o /tmp/bentofunction-dashboard.json

2. Create the Grafana dashboard ConfigMaps

kubectl -n yatai-monitoring create configmap bentodeployment-dashboard --from-file=/tmp/bentodeployment-dashboard.json
kubectl -n yatai-monitoring label configmap bentodeployment-dashboard grafana_dashboard=1

kubectl -n yatai-monitoring create configmap bentofunction-dashboard --from-file=/tmp/bentofunction-dashboard.json
kubectl -n yatai-monitoring label configmap bentofunction-dashboard grafana_dashboard=1
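`kubectl create configmap` fails if the ConfigMap already exists, which gets in the way when a dashboard file is updated later. One common workaround, sketched below, renders the ConfigMap locally with `--dry-run=client` and pipes it to `kubectl apply`. The function name is illustrative.

```shell
# Create or update a dashboard ConfigMap from a JSON file and label it for the Grafana sidecar.
apply_dashboard() {
  local name="$1" file="$2"
  # Render the ConfigMap locally, then apply it, so re-runs update instead of failing.
  kubectl -n yatai-monitoring create configmap "$name" --from-file="$file" \
    --dry-run=client -o yaml | kubectl -n yatai-monitoring apply -f - || return 1
  # --overwrite keeps the label step from failing when the label is already set.
  kubectl -n yatai-monitoring label --overwrite configmap "$name" grafana_dashboard=1
}

# Example:
#   apply_dashboard bentodeployment-dashboard /tmp/bentodeployment-dashboard.json
#   apply_dashboard bentofunction-dashboard /tmp/bentofunction-dashboard.json
```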

3. Go to the Grafana web UI to check out the BentoDeployment dashboard

Note

Wait a few minutes for Grafana to reload the configuration automatically.

Grafana BentoDeployment dashboard