問題
EKSにてdatadog agentをDamonSetで動かしkube-state-metrics
を導入すると以下のようなエラーが発生します。
2020-07-21 04:58:16 UTC | CORE | ERROR | (pkg/collector/runner/runner.go:292 in work) | Error running check kubelet: [{"message": "Unable to detect the kubelet URL automatically.", "traceback": "Traceback (most recent call last):\n File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 820, in run\n self.check(instance)\n File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/kubelet.py\", line 291, in check\n raise CheckException(\"Unable to detect the kubelet URL automatically.\")\ndatadog_checks.base.errors.CheckException: Unable to detect the kubelet URL automatically.\n"}] 2020-07-21 04:58:19 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:118 in LogMessage) | disk:e5dffb8bef24336f | (disk.py:77) | Unable to get disk metrics for /host/proc/sys/fs/binfmt_misc: [Errno 40] Too many levels of symbolic links: '/host/proc/sys/fs/binfmt_misc' 2020-07-21 04:58:20 UTC | CORE | ERROR | (pkg/autodiscovery/config_poller.go:123 in collect) | Unable to collect configurations from provider kubernetes: temporary failure in kubeutil, will retry later: try delay not elapsed yet 2020-07-21 04:58:30 UTC | CORE | ERROR | (pkg/autodiscovery/config_poller.go:123 in collect) | Unable to collect configurations from provider kubernetes: temporary failure in kubeutil, will retry later: try delay not elapsed yet 2020-07-21 04:58:31 UTC | CORE | ERROR | (pkg/collector/python/kubeutil.go:38 in getConnections) | connection to kubelet failed: temporary failure in kubeutil, will retry later: try delay not elapsed yet 2020-07-21 04:58:31 UTC | CORE | ERROR | (pkg/collector/runner/runner.go:292 in work) | Error running check kubelet: [{"message": "Unable to detect the kubelet URL automatically.", "traceback": "Traceback (most recent call last):\n File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 820, in run\n self.check(instance)\n File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/kubelet.py\", line 291, in check\n raise CheckException(\"Unable to detect the kubelet URL automatically.\")\ndatadog_checks.base.errors.CheckException: Unable to detect the kubelet URL automatically.\n"}] 2020-07-21 04:58:34 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:118 in LogMessage) | disk:e5dffb8bef24336f | (disk.py:77) | Unable to get disk metrics for /host/proc/sys/fs/binfmt_misc: [Errno 40] Too many levels of symbolic links: '/host/proc/sys/fs/binfmt_misc' 2020-07-21 04:58:40 UTC | CORE | ERROR | (pkg/autodiscovery/config_poller.go:123 in collect) | Unable to collect configurations from provider kubernetes: temporary failure in kubeutil, will retry later: try delay not elapsed yet 2020-07-21 04:58:40 UTC | CORE | INFO | (pkg/autodiscovery/autoconfig.go:356 in initListenerCandidates) | kubelet listener cannot start, will retry: temporary failure in kubeutil, will retry later: try delay not elapsed yet 2020-07-21 04:58:46 UTC | CORE | ERROR | (pkg/collector/python/kubeutil.go:38 in getConnections) | connection to kubelet failed: temporary failure in kubeutil, will retry later: try delay not elapsed yet
これはEKS上のkubeletがHTTPS(10250)のみしか待ち受けていないにも関わらず、datadog agentがHTTP通信(10255)を行うために発生するエラーです。
回避策
dataagentのpodのenvの設定にDD_KUBELET_TLS_VERIFY = true
を入れてください。
env: ~ - name: DD_KUBELET_TLS_VERIFY value: "true" ~
これによりHTTPS通信を行うようになります。