I. Environment

1. OS: CentOS 7
2. Kubernetes version: 1.6.2
3. Network: Calico
4. prometheus-operator 0.11.3 installed

II. Problem

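Summary: after installing prometheus-operator, all containers were running, but the grafana pod could not be pinged and Grafana could not be reached.
Diagnosis: calicoctl get workloadendpoint lists no entry for the grafana pod (pod IP 172.16.63.172).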

1. k8s status

[root@docker225 ~]# kubectl get pods -o wide -n monitoring
NAME                                  READY     STATUS    RESTARTS   AGE       IP               NODE
alertmanager-main-0                   2/2       Running   0          1y        172.16.63.164    192.168.14.225
alertmanager-main-1                   2/2       Running   0          16h       172.16.63.173    192.168.14.225
alertmanager-main-2                   2/2       Running   0          16h       172.16.63.174    192.168.14.225
grafana-1482444427-c75ts              2/2       Running   0          1y        172.16.63.172    192.168.14.225
kube-state-metrics-1587421894-hs01m   1/1       Running   0          1y        172.16.63.165    192.168.14.225
node-exporter-zbf01                   1/1       Running   0          1y        192.168.14.225   192.168.14.225
prometheus-k8s-0                      2/2       Running   0          1y        172.16.63.163    192.168.14.225
prometheus-k8s-1                      2/2       Running   0          16h       172.16.63.175    192.168.14.225
prometheus-operator-277826456-q7t8h   1/1       Running   2          1y        172.16.63.166    192.168.14.225

Services:

[root@docker225 ~]# kubectl get svc -n monitoring
NAME                    CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
alertmanager-main       10.254.197.166   <nodes>       9093:30903/TCP      1y
alertmanager-operated   None             <none>        9093/TCP,6783/TCP   1y
grafana                 10.254.213.82    <nodes>       3000:30902/TCP      1y
kube-state-metrics      10.254.59.234    <none>        8080/TCP            1y
node-exporter           None             <none>        9100/TCP            1y
prometheus-k8s          10.254.219.93    <nodes>       9090:30900/TCP      1y
prometheus-operated     None             <none>        9090/TCP            1y
prometheus-operator     10.254.191.143   <none>        8080/TCP            1y

The IPs involved (grafana pod and grafana service):

grafana-1482444427-c75ts 2/2 Running 0 1y 172.16.63.172 192.168.14.225
grafana 10.254.213.82 3000:30902/TCP 1y

Problem:
Requests to http://192.168.14.225:30902 fail because the pod IP 172.16.63.172 is unreachable.

  • Ping 172.16.63.172 from the master node
      [root@docker225 ~]# ping 172.16.63.172 
      connect: Network is unreachable
      kubectl get pods -o wide --all-namespaces
    
  • The decisive evidence: the output of calicoctl get workloadendpoint
    below has no entry for grafana-1482444427-c75ts

      [root@docker225 ~]# calicoctl get workloadendpoint
      NODE        ORCHESTRATOR   WORKLOAD                                            NAME   
      docker225   k8s            app-test1.app-test4-service1-4-4-3832218445-jm9vz   eth0   
      docker225   k8s            default.prometheus-operator-277826456-txw91         eth0   
      docker225   k8s            kube-system.heapster-v1.3.0-2713982671-0n9bm        eth0   
      docker225   k8s            kube-system.kube-dns-3412393464-2hrk2               eth0   
      docker225   k8s            kube-system.monitoring-influxdb-grafana-v4-dz9wj    eth0   
      docker225   k8s            monitoring.alertmanager-main-0                      eth0   
      docker225   k8s            monitoring.alertmanager-main-1                      eth0   
      docker225   k8s            monitoring.alertmanager-main-2                      eth0   
      docker225   k8s            monitoring.kube-state-metrics-1587421894-hs01m      eth0   
      docker225   k8s            monitoring.prometheus-k8s-0                         eth0   
      docker225   k8s            monitoring.prometheus-k8s-1                         eth0   
      docker225   k8s            monitoring.prometheus-operator-277826456-q7t8h      eth0   
    
  • The container has no ifconfig command
    (logged in with kubectl exec -it grafana-1482444427-c75ts -n monitoring sh)

      [root@docker225 kube-prometheus]# kubectl exec -it grafana-1482444427-c75ts -n monitoring sh
      Defaulting container name to grafana.
      Use 'kubectl describe pod/grafana-1482444427-c75ts' to see all of the containers in this pod.
      # ifconfig
      sh: 1: ifconfig: not found
    
  • Following the reference below, check ip route: there is no route for 172.16.63.172
     [root@docker225 ~]# ip route
     default via 192.168.14.1 dev eno16777736  proto static  metric 100 
     blackhole 172.16.63.128/26  proto bird 
     172.16.63.163 dev calid75abf4f5e0  scope link 
     172.16.63.164 dev cali11239f98883  scope link 
     172.16.63.165 dev cali6023fc95ae3  scope link 
     172.16.63.166 dev cali7071edc8f29  scope link 
     172.16.63.167 dev cali4b6e0fbac0b  scope link 
     172.16.63.168 dev caliaa714b4436c  scope link 
     172.16.63.169 dev cali9e4b965a8fa  scope link 
     172.16.63.170 dev calicdafc5613c9  scope link 
     172.16.63.171 dev cali95761c45089  scope link 
     172.16.63.173 dev cali8bac6c0ff3f  scope link 
     172.16.63.174 dev cali15c7619fccc  scope link 
     172.16.63.175 dev cali1abad8afd57  scope link 
     172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 
     192.168.14.0/24 dev eno16777736  proto kernel  scope link  src 192.168.14.225  metric 100 
     [root@docker225 ~]# 
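
The manual comparison of pod IPs against ip route can be sketched as a small shell function. This is only an illustration under stated assumptions: the name pods_without_route and the two input files are hypothetical and not part of the original troubleshooting. The function expects the header line and column layout of kubectl get pods -o wide shown above (IP in column 6, NODE in column 7), and it skips host-network pods such as node-exporter, whose IP equals the node IP.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post).
# $1: saved output of `kubectl get pods -o wide -n <namespace>` (with header)
# $2: saved output of `ip route` taken on the node that runs the pods
# Prints "<pod name> <pod IP>" for each pod whose IP has no route via a cali* device.
pods_without_route() {
  local pods_file=$1 routes_file=$2
  awk '
    # First file: remember destinations routed through a Calico interface.
    FILENAME == ARGV[1] { if ($2 == "dev" && $3 ~ /^cali/) routed[$1] = 1; next }
    FNR == 1 { next }   # skip the kubectl header line
    $6 == $7 { next }   # host-network pod: its IP is the node IP
    !($6 in routed) { print $1, $6 }
  ' "$routes_file" "$pods_file"
}

# Example usage on the node:
#   kubectl get pods -o wide -n monitoring > pods.txt
#   ip route > routes.txt
#   pods_without_route pods.txt routes.txt
```

Run against the two listings above, this should report only the grafana pod, which matches what the manual check found.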
    

Reference

"Cluster IP unreachable with curl between Kubernetes hosts" (original title: Kubernetes主机间curl cluster ip时不通)

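III. Solution

Deleting the existing grafana resources and re-applying them fixed the problem. Presumably something went wrong when grafana was first deployed to k8s.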

cd prometheus-operator-0.11.3/contrib/kube-prometheus
# Delete the grafana resources from k8s
kubectl -n monitoring delete -f manifests/grafana/
# Re-apply them
kubectl -n monitoring apply -f manifests/grafana/
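
After re-applying, the new pod takes a moment to start. A minimal sketch for waiting on it, assuming the usual kubectl get pods column layout (NAME, READY, STATUS); the function name pod_is_ready is hypothetical and not part of the original steps:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post).
# Reads `kubectl get pods` output on stdin and succeeds if at least one pod
# whose name starts with $1 is Running with all of its containers ready.
pod_is_ready() {
  local prefix=$1
  awk -v p="$prefix" '
    index($1, p) == 1 && $3 == "Running" {
      split($2, ready, "/")            # READY column, e.g. "2/2"
      if (ready[1] == ready[2]) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Example usage: poll until the re-created grafana pod is ready.
#   until kubectl get pods -n monitoring | pod_is_ready grafana-; do sleep 5; done
```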

Check the new pod's network; monitoring.grafana-1482444427-wp5nz is its workload name in Calico (namespace.pod name).

[root@docker225 kube-prometheus]# calicoctl get workloadendpoint --workload=monitoring.grafana-1482444427-wp5nz
NODE        ORCHESTRATOR   WORKLOAD                              NAME   
docker225   k8s            monitoring.grafana-1482444427-wp5nz   eth0  


[root@docker225 kube-prometheus]# calicoctl get workloadendpoint
NODE        ORCHESTRATOR   WORKLOAD                                            NAME   
docker225   k8s            app-test1.app-test4-service1-4-4-3832218445-jm9vz   eth0   
docker225   k8s            default.prometheus-operator-277826456-txw91         eth0   
docker225   k8s            kube-system.heapster-v1.3.0-2713982671-0n9bm        eth0   
docker225   k8s            kube-system.kube-dns-3412393464-2hrk2               eth0   
docker225   k8s            kube-system.monitoring-influxdb-grafana-v4-dz9wj    eth0   
docker225   k8s            monitoring.alertmanager-main-0                      eth0   
docker225   k8s            monitoring.alertmanager-main-1                      eth0   
docker225   k8s            monitoring.alertmanager-main-2                      eth0   
docker225   k8s            monitoring.grafana-1482444427-wp5nz                 eth0   
docker225   k8s            monitoring.kube-state-metrics-1587421894-hs01m      eth0   
docker225   k8s            monitoring.prometheus-k8s-0                         eth0   
docker225   k8s            monitoring.prometheus-k8s-1                         eth0   
docker225   k8s            monitoring.prometheus-operator-277826456-q7t8h      eth0  
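
To confirm that no other pod in the namespace is missing its endpoint, the comparison made by eye here can be sketched as follows. The function name pods_without_endpoint and its input files are assumptions for illustration. It relies on the calicoctl output layout shown in this post, where the WORKLOAD column is namespace.pod name; other Calico versions may print different columns.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post).
# $1: namespace, e.g. monitoring
# $2: saved output of `kubectl get pods -o wide -n <namespace>` (with header)
# $3: saved output of `calicoctl get workloadendpoint` (with header)
# Prints the name of every pod that has no matching workload endpoint.
pods_without_endpoint() {
  local namespace=$1 pods_file=$2 endpoints_file=$3
  awk -v ns="$namespace" '
    # First file: collect the WORKLOAD column (third field), skipping the header.
    FILENAME == ARGV[1] { if (FNR > 1) known[$3] = 1; next }
    FNR == 1 { next }   # skip the kubectl header line
    $6 == $7 { next }   # host-network pod: Calico creates no endpoint for it
    { key = ns "." $1; if (!(key in known)) print $1 }
  ' "$endpoints_file" "$pods_file"
}

# Example usage:
#   kubectl get pods -o wide -n monitoring > pods.txt
#   calicoctl get workloadendpoint > endpoints.txt
#   pods_without_endpoint monitoring pods.txt endpoints.txt
```

An empty result means every pod in the namespace that uses the pod network has an endpoint.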

Grafana is now reachable.

IV. Commands used

List the services in a given namespace
kubectl get svc -n monitoring

Without a namespace, the default namespace is used
kubectl get svc

List the services in all namespaces
kubectl get svc --all-namespaces

List pods
kubectl get pod[s] -n monitoring

Inspect pod errors (if pod_name is omitted, every pod in the namespace is described)
kubectl [-n monitoring] describe pod [pod_name]
kubectl describe pod [-n monitoring]
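
When describing every pod in a namespace, the part that usually matters for errors is the Events section of each pod. A rough sketch for filtering it out, assuming the describe output starts each pod with an unindented "Name:" line and each event list with an unindented "Events:" line; the function name describe_events is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post).
# Reads `kubectl describe pod` output on stdin and prints, for each pod,
# a "== <pod name>" marker followed by that pod's Events section.
describe_events() {
  awk '
    /^Name:/   { name = $2; in_events = 0 }            # a new pod begins
    /^Events:/ { in_events = 1; print "== " name }     # its events begin
    in_events                                          # print lines inside Events
  '
}

# Example usage:
#   kubectl -n monitoring describe pod | describe_events
```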

Reposted from: http://www.manongjc.com/article/54045.html