A Look at the LogFileMetricExporter Custom Resource

Introduction

The year is almost over. This is the Day 24 entry of the OpenShift Advent Calendar 2024. Thank you for stopping by during the busy year-end season.

LogFileMetricExporter

This article looks at the exporter that was split out of the collector in OpenShift Cluster Logging 5.8. The product documentation describes it as follows.

In logging version 5.8 and newer versions, the LogFileMetricExporter is no longer deployed with the collector by default. You must manually create a LogFileMetricExporter custom resource (CR) to generate metrics from the logs produced by running containers.

If you do not create the LogFileMetricExporter CR, you may see a No datapoints found message in the OpenShift Container Platform web console dashboard for Produced Logs.

In other words, to deploy the exporter you need to define a LogFileMetricExporter custom resource. Without it, the Produced Logs panels in the web console dashboard show no data.

So what kind of information does it provide?

Configuring the logging collector - Logging | Observability | OpenShift Container Platform 4.16

Metrics that can be collected

Let's deploy it and see which metrics it exposes. I have no special requirements, so I use the sample from the documentation as is.

apiVersion: logging.openshift.io/v1alpha1
kind: LogFileMetricExporter
metadata:
  name: instance
  namespace: openshift-logging
spec:
  nodeSelector: {}
  resources:
    limits:
      cpu: 500m
      memory: 256Mi
    requests:
      cpu: 200m
      memory: 128Mi
  tolerations: []

Once the resource is defined, a logfilesmetricexporter DaemonSet appears and a logfilesmetricexporter pod starts on each node.

NAME                                           READY   STATUS    RESTARTS   AGE
cluster-logging-operator-75bfb4b58c-xclvh      1/1     Running   0          8h
collector-5pcnz                                1/1     Running   0          14m
collector-7mwlt                                1/1     Running   0          14m
collector-hs62g                                1/1     Running   0          14m
collector-j66g8                                1/1     Running   0          14m
collector-mrw6g                                1/1     Running   0          14m
logfilesmetricexporter-8gkfx                   1/1     Running   0          14m
logfilesmetricexporter-hb5hb                   1/1     Running   0          14m
logfilesmetricexporter-hn9sm                   1/1     Running   0          14m
logfilesmetricexporter-qwvqx                   1/1     Running   0          14m
logfilesmetricexporter-v86x5                   1/1     Running   0          14m
logging-loki-compactor-0                       1/1     Running   0          3h37m
logging-loki-distributor-5b4756f5bb-pc8dc      1/1     Running   0          3h37m
logging-loki-gateway-5f67b59847-n55z9          2/2     Running   0          3h37m
logging-loki-gateway-5f67b59847-rxrhx          2/2     Running   0          3h37m
logging-loki-index-gateway-0                   1/1     Running   0          3h37m
logging-loki-ingester-0                        1/1     Running   0          3h37m
logging-loki-querier-d666d57c-x4qjt            1/1     Running   0          3h37m
logging-loki-query-frontend-7c78459847-b4xfw   1/1     Running   0          3h37m
logging-loki-ruler-0                           1/1     Running   0          3h37m
logging-view-plugin-7597965997-rmb9w           1/1     Running   0          5h57m

Checking Observe > Targets shows that the metrics endpoint is https://<pod ip>:2112/metrics.

Let's collect the metrics by opening a shell in one of the pods and accessing the endpoint.

$ oc rsh logfilesmetricexporter-8gkfx
sh-5.1#
sh-5.1# curl -sk https://localhost:2112/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 5.6023e-05
go_gc_duration_seconds{quantile="0.25"} 0.000183166
go_gc_duration_seconds{quantile="0.5"} 0.000266248
go_gc_duration_seconds{quantile="0.75"} 0.00034949
go_gc_duration_seconds{quantile="1"} 0.001925796
go_gc_duration_seconds_sum 0.004054525
go_gc_duration_seconds_count 10
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 14
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.22.7 (Red Hat 1.22.7-1.el9_5)"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 4.524896e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 1.8430024e+07
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 7742
# HELP go_memstats_frees_total Total number of frees.
...

The output is long, so here is a trimmed summary (Go runtime metrics are omitted).

Metric	Description	Metric type
log_logged_bytes_total	Total number of bytes written to a single log file path, accounting for rotations	counter
process_cpu_seconds_total	Total user and system CPU time spent in seconds.	counter
process_max_fds	Maximum number of open file descriptors.	gauge
process_open_fds	Number of open file descriptors.	gauge
process_resident_memory_bytes	Resident memory size in bytes.	gauge
process_start_time_seconds	Start time of the process since unix epoch in seconds.	gauge
process_virtual_memory_bytes	Virtual memory size in bytes.	gauge
process_virtual_memory_max_bytes	Maximum amount of virtual memory available in bytes.	gauge
promhttp_metric_handler_requests_in_flight	Current number of scrapes being served.	gauge
promhttp_metric_handler_requests_total	Total number of scrapes by HTTP status code.	counter

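There are quite a few, but the main metric appears to be log_logged_bytes_total: the total number of bytes written to a single log file path, accounting for rotations. It should let us see how much log output each container has produced.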

Checking the dashboard

As mentioned earlier, the web console already ships with graphs for this metric. Open Observe > Dashboards and select the Logging/Collection dashboard.

About halfway down the dashboard there is a section called Produced Logs.

It contains two panels: Top producing containers and Top producing containers in last 24 hours.

Top producing containers

The query behind this graph is shown below. It plots the time series of the 10 containers with the highest log output rate, computed over a 5-minute window.

topk(10, round(rate(log_logged_bytes_total[5m])))

Top producing containers in last 24 hours

topk(10, sum(increase(log_logged_bytes_total[24h])) by (exported_namespace,  podname, containername))

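This query, by contrast, lists the 10 containers that produced the most logs, based on the increase over the last 24 hours.

Looking at the DaemonSet definition

To see how the exporter gathers this information, let's look at the DaemonSet definition.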

$ oc describe ds logfilesmetricexporter
Name:           logfilesmetricexporter
Selector:       component=logfilesmetricexporter,logging-infra=logfilesmetricexporter,provider=openshift
Node-Selector:  kubernetes.io/os=linux
Labels:         app.kubernetes.io/component=logfilesmetricexporter
                app.kubernetes.io/instance=logfilesmetricexporter
                app.kubernetes.io/managed-by=cluster-logging-operator
                app.kubernetes.io/name=lfme-service
                app.kubernetes.io/part-of=cluster-logging
                app.kubernetes.io/version=5.8.0
                component=logfilesmetricexporter
                implementation=
                logging-infra=logfilesmetricexporter
                pod-security.kubernetes.io/enforce=privileged
                provider=openshift
                security.openshift.io/scc.podSecurityLabelSync=false
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 5
Current Number of Nodes Scheduled: 5
Number of Nodes Scheduled with Up-to-date Pods: 5
Number of Nodes Scheduled with Available Pods: 5
Number of Nodes Misscheduled: 0
Pods Status:  5 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app.kubernetes.io/component=logfilesmetricexporter
                    app.kubernetes.io/instance=logfilesmetricexporter
                    app.kubernetes.io/managed-by=cluster-logging-operator
                    app.kubernetes.io/name=lfme-service
                    app.kubernetes.io/part-of=cluster-logging
                    app.kubernetes.io/version=5.8.0
                    component=logfilesmetricexporter
                    implementation=
                    logging-infra=logfilesmetricexporter
                    pod-security.kubernetes.io/enforce=privileged
                    provider=openshift
                    security.openshift.io/scc.podSecurityLabelSync=false
  Annotations:      target.workload.openshift.io/management: {"effect": "PreferredDuringScheduling"}
  Service Account:  logfilesmetricexporter
  Containers:
   logfilesmetricexporter:
    Image:      registry.redhat.io/openshift-logging/log-file-metric-exporter-rhel9@sha256:daa8fa30560a787835f7f3fa6d584555ddf8f10beda023a1d747b1f7ab8e2d36
    Port:       2112/TCP
    Host Port:  0/TCP
    Command:
      /bin/bash
    Args:
      -c
      /usr/local/bin/log-file-metric-exporter -verbosity=2 -dir=/var/log/pods -http=:2112 -keyFile=/etc/logfilemetricexporter/metrics/tls.key -crtFile=/etc/logfilemetricexporter/metrics/tls.crt -tlsMinVersion=VersionTLS12 -cipherSuites=TLS_AES_128_GCM_SHA256,TLS_AES_256_GCM_SHA384,TLS_CHACHA20_POLY1305_SHA256,ECDHE-ECDSA-AES128-GCM-SHA256,ECDHE-RSA-AES128-GCM-SHA256,ECDHE-ECDSA-AES256-GCM-SHA384,ECDHE-RSA-AES256-GCM-SHA384,ECDHE-ECDSA-CHACHA20-POLY1305,ECDHE-RSA-CHACHA20-POLY1305,DHE-RSA-AES128-GCM-SHA256,DHE-RSA-AES256-GCM-SHA384
    Limits:
      cpu:     500m
      memory:  256Mi
    Requests:
      cpu:        200m
      memory:     128Mi
    Environment:  <none>
    Mounts:
      /etc/logfilemetricexporter/metrics from lfme-metrics (ro)
      /var/log/containers from varlogcontainers (ro)
      /var/log/pods from varlogpods (ro)
  Volumes:
   varlogcontainers:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/containers
    HostPathType:
   varlogpods:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/pods
    HostPathType:
   lfme-metrics:
    Type:               Secret (a volume populated by a Secret)
    SecretName:         lfme-secret
    Optional:           false
  Priority Class Name:  system-node-critical
Events:
  Type    Reason            Age   From                   Message
  ----    ------            ----  ----                   -------
  Normal  CreateObject      38m   logfilemetricexporter  CreateObject DaemonSet openshift-logging/logfilesmetricexporter
  Normal  SuccessfulCreate  38m   daemonset-controller   Created pod: logfilesmetricexporter-hn9sm
  Normal  SuccessfulCreate  38m   daemonset-controller   Created pod: logfilesmetricexporter-hb5hb
  Normal  SuccessfulCreate  38m   daemonset-controller   Created pod: logfilesmetricexporter-v86x5
  Normal  SuccessfulCreate  38m   daemonset-controller   Created pod: logfilesmetricexporter-qwvqx
  Normal  SuccessfulCreate  38m   daemonset-controller   Created pod: logfilesmetricexporter-8gkfx

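The pod mounts /var/log/containers and /var/log/pods from the host read-only through HostPath volumes, and the -dir=/var/log/pods argument indicates that the exporter watches the files under /var/log/pods.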
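To make the link between files under /var/log/pods and the labels used in the dashboard queries more concrete, here is a minimal sketch that derives namespace, pod, and container names from a log path. It assumes the usual kubelet layout of <namespace>_<pod>_<uid>/<container>/<n>.log; the PodLogLabels type, the ParsePodLogPath function, and the UID in the example are hypothetical names made up for illustration, not the exporter's own code.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// PodLogLabels is a hypothetical holder for labels derived from a log path.
type PodLogLabels struct {
	Namespace, Pod, UID, Container string
}

// ParsePodLogPath extracts labels from a path shaped like
// <root>/<namespace>_<pod>_<uid>/<container>/<n>.log (assumed layout).
// It returns false for anything that does not match that shape.
func ParsePodLogPath(root, path string) (PodLogLabels, bool) {
	rel, err := filepath.Rel(root, path)
	if err != nil {
		return PodLogLabels{}, false
	}
	parts := strings.Split(filepath.ToSlash(rel), "/")
	if len(parts) != 3 || !strings.HasSuffix(parts[2], ".log") {
		return PodLogLabels{}, false
	}
	// Namespace and pod names cannot contain "_", so it is a safe separator.
	ids := strings.Split(parts[0], "_")
	if len(ids) != 3 {
		return PodLogLabels{}, false
	}
	return PodLogLabels{Namespace: ids[0], Pod: ids[1], UID: ids[2], Container: parts[1]}, true
}

func main() {
	// The UID "1234" is a placeholder for illustration only.
	p := "/var/log/pods/openshift-logging_collector-5pcnz_1234/collector/0.log"
	if labels, ok := ParsePodLogPath("/var/log/pods", p); ok {
		fmt.Println(labels.Namespace, labels.Pod, labels.Container)
	}
}
```

Directories and paths outside the root are rejected, which mirrors the idea that only per-container log files contribute to the metric.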

Looking at the source

Let's find the source code. Searching the Red Hat Ecosystem Catalog for log file metric exporter leads to the following container image.

https://catalog.redhat.com/software/containers/openshift-logging/log-file-metric-exporter-rhel9/6447991f3270b32a256c4e2a?gs=&q=log%20file%20metric&container-tabs=technical-information

The Technical Information tab shows that the source code lives at https://github.com/ViaQ/log-file-metric-exporter.

The part that appears to do the actual work is watcher.go.

https://github.com/ViaQ/log-file-metric-exporter/blob/main/pkg/logwatch/watcher.go#L123

    stat, err := os.Stat(path)
    if err != nil {
        return err
    }
    if stat.IsDir() {
        log.V(3).Info("Ignoring path given it is a directory", "path", path)
        return nil // Ignore directories
    }
    defer w.mutex.Unlock()
    w.mutex.Lock()
    lastSize, size := w.sizes[l], float64(stat.Size())
    log.V(3).Info("Stats", "path", path, "lastSize", lastSize, "size", size)
    w.sizes[l] = size
    var add float64
    if size > lastSize {
        // File has grown, add the difference to the counter.
        add = size - lastSize
    } else if size < lastSize {
        // File truncated, starting over. Add the size.
        add = size
    }

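The code fetches the file's stat information and checks its size. If the file has grown since the last check, the difference is added to the counter. If it has shrunk, the file is treated as a new one and its current size is added. This repeats on every update, and it is presumably what "accounting for rotations" means.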
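The same bookkeeping can be tried out in isolation. The following is a minimal sketch of the grow-or-truncate rule only, with file sizes passed in directly instead of coming from os.Stat; SizeTracker, NewSizeTracker, and Observe are hypothetical names for this example and do not appear in the exporter.

```go
package main

import "fmt"

// SizeTracker remembers the last observed size for each log file path.
type SizeTracker struct {
	sizes map[string]float64
}

// NewSizeTracker returns a tracker with no recorded sizes.
func NewSizeTracker() *SizeTracker {
	return &SizeTracker{sizes: map[string]float64{}}
}

// Observe records the current size of path and returns the number of
// bytes that should be added to the counter for this observation.
func (t *SizeTracker) Observe(path string, size float64) float64 {
	last := t.sizes[path] // zero for a path seen for the first time
	t.sizes[path] = size
	switch {
	case size > last:
		return size - last // file has grown: add the difference
	case size < last:
		return size // file was truncated or rotated: start over
	default:
		return 0 // unchanged
	}
}

func main() {
	tracker := NewSizeTracker()
	total := 0.0
	// Sizes seen over time; the drop from 250 to 40 simulates a rotation.
	for _, size := range []float64{100, 250, 40, 40, 90} {
		total += tracker.Observe("0.log", size)
	}
	fmt.Println(total) // prints 340
}
```

Treating a smaller size as a fresh file keeps the running total monotonic, which is what a Prometheus counter such as log_logged_bytes_total needs.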

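Summary

We looked at the exporter that is now deployed separately through the LogFileMetricExporter resource. What was split out is the metric that measures the volume of logs produced by containers. The dashboard panels are still defined, so they show no data unless a LogFileMetricExporter resource is created, and we were able to confirm what values the exporter reports.

That clears up my questions about LogFileMetricExporter. Maybe now I can see the year out with peace of mind.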