フラミナル

A place where I jot down ideas and things I have looked into. Mostly IT and technical articles.

How to fix kubelet connection failures when running the Datadog Agent on AWS EKS


Problem

When you run the Datadog Agent as a DaemonSet on EKS and install kube-state-metrics, errors like the following appear in the agent log.

2020-07-21 04:58:16 UTC | CORE | ERROR | (pkg/collector/runner/runner.go:292 in work) | Error running check kubelet: [{"message": "Unable to detect the kubelet URL automatically.", "traceback": "Traceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 820, in run\n    self.check(instance)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/kubelet.py\", line 291, in check\n    raise CheckException(\"Unable to detect the kubelet URL automatically.\")\ndatadog_checks.base.errors.CheckException: Unable to detect the kubelet URL automatically.\n"}]
2020-07-21 04:58:19 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:118 in LogMessage) | disk:e5dffb8bef24336f | (disk.py:77) | Unable to get disk metrics for /host/proc/sys/fs/binfmt_misc: [Errno 40] Too many levels of symbolic links: '/host/proc/sys/fs/binfmt_misc'
2020-07-21 04:58:20 UTC | CORE | ERROR | (pkg/autodiscovery/config_poller.go:123 in collect) | Unable to collect configurations from provider kubernetes: temporary failure in kubeutil, will retry later: try delay not elapsed yet
2020-07-21 04:58:30 UTC | CORE | ERROR | (pkg/autodiscovery/config_poller.go:123 in collect) | Unable to collect configurations from provider kubernetes: temporary failure in kubeutil, will retry later: try delay not elapsed yet
2020-07-21 04:58:31 UTC | CORE | ERROR | (pkg/collector/python/kubeutil.go:38 in getConnections) | connection to kubelet failed: temporary failure in kubeutil, will retry later: try delay not elapsed yet
2020-07-21 04:58:31 UTC | CORE | ERROR | (pkg/collector/runner/runner.go:292 in work) | Error running check kubelet: [{"message": "Unable to detect the kubelet URL automatically.", "traceback": "Traceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 820, in run\n    self.check(instance)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/kubelet.py\", line 291, in check\n    raise CheckException(\"Unable to detect the kubelet URL automatically.\")\ndatadog_checks.base.errors.CheckException: Unable to detect the kubelet URL automatically.\n"}]
2020-07-21 04:58:34 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:118 in LogMessage) | disk:e5dffb8bef24336f | (disk.py:77) | Unable to get disk metrics for /host/proc/sys/fs/binfmt_misc: [Errno 40] Too many levels of symbolic links: '/host/proc/sys/fs/binfmt_misc'
2020-07-21 04:58:40 UTC | CORE | ERROR | (pkg/autodiscovery/config_poller.go:123 in collect) | Unable to collect configurations from provider kubernetes: temporary failure in kubeutil, will retry later: try delay not elapsed yet
2020-07-21 04:58:40 UTC | CORE | INFO | (pkg/autodiscovery/autoconfig.go:356 in initListenerCandidates) | kubelet listener cannot start, will retry: temporary failure in kubeutil, will retry later: try delay not elapsed yet
2020-07-21 04:58:46 UTC | CORE | ERROR | (pkg/collector/python/kubeutil.go:38 in getConnections) | connection to kubelet failed: temporary failure in kubeutil, will retry later: try delay not elapsed yet

This error occurs because the kubelet on EKS listens only on HTTPS (port 10250), while the Datadog Agent tries to connect over HTTP (port 10255).
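To confirm which of the two ports a node actually exposes, a small TCP probe can help. The following is a minimal sketch, not part of the original setup: the function names `port_is_open` and `probe_kubelet` are hypothetical, and it assumes you run it from somewhere that can reach the node's IP (for example, a pod on that node). It only checks whether the port accepts TCP connections; it does not validate TLS or authenticate against the kubelet.

```python
import socket

# Kubelet ports named in the text: 10250 serves HTTPS, 10255 is the HTTP port.
KUBELET_PORTS = {"https": 10250, "http": 10255}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def probe_kubelet(host: str, ports: dict = KUBELET_PORTS) -> dict:
    """Map each scheme name to whether its port accepts TCP connections."""
    return {scheme: port_is_open(host, port) for scheme, port in ports.items()}
```

Calling `probe_kubelet` with the node's IP returns a dictionary keyed by `"https"` and `"http"`. If the situation described above applies, you would expect the HTTPS entry to be open and the HTTP entry to be closed.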

Workaround

Add DD_KUBELET_TLS_VERIFY = true to the env settings of the Datadog Agent pod.

        env:
        # ... (other settings omitted)
        - name: DD_KUBELET_TLS_VERIFY
          value: "true"
        # ... (other settings omitted)

With this setting, the agent communicates with the kubelet over HTTPS.