50 K8S Cluster Monitoring: Performance Analysis and Alerting

Performance Analysis

Install the prometheus-stack

First, go to https://github.com/prometheus-operator/kube-prometheus and identify the branch that is compatible with your current Kubernetes version.
In this example, the branch matching Kubernetes 1.23 is release-0.10.
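To confirm which Kubernetes version the cluster is actually running before picking a branch, a quick check (any equivalent command works):
kubectl version --short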
Clone the matching branch:
git clone -b release-0.10 https://github.com/prometheus-operator/kube-prometheus.git

cd kube-prometheus

Install the Prometheus Operator:
kubectl apply --server-side -f manifests/setup

kubectl apply -f manifests/

Check the registered CRDs:
kubectl get crd | grep monitoring
root@node1:~/kube-prometheus# kubectl get crd | grep monitoring
alertmanagerconfigs.monitoring.coreos.com             2022-11-15T01:26:29Z
alertmanagers.monitoring.coreos.com                   2022-11-15T01:26:29Z
podmonitors.monitoring.coreos.com                     2022-11-15T01:26:30Z
probes.monitoring.coreos.com                          2022-11-15T01:26:30Z
prometheuses.monitoring.coreos.com                    2022-11-15T01:26:30Z
prometheusrules.monitoring.coreos.com                 2022-11-15T01:26:30Z
servicemonitors.monitoring.coreos.com                 2022-11-15T01:26:30Z
thanosrulers.monitoring.coreos.com                    2022-11-15T01:26:30Z

View the related API resources:
kubectl api-resources | grep monitoring
root@node1:~/kube-prometheus# kubectl api-resources | grep monitoring
alertmanagerconfigs                            monitoring.coreos.com/v1alpha1         true         AlertmanagerConfig
alertmanagers                                  monitoring.coreos.com/v1               true         Alertmanager
podmonitors                                    monitoring.coreos.com/v1               true         PodMonitor
probes                                         monitoring.coreos.com/v1               true         Probe
prometheuses                                   monitoring.coreos.com/v1               true         Prometheus
prometheusrules                                monitoring.coreos.com/v1               true         PrometheusRule
servicemonitors                                monitoring.coreos.com/v1               true         ServiceMonitor
thanosrulers                                   monitoring.coreos.com/v1               true         ThanosRuler

Check the Pod status:
kubectl get pod -n monitoring
root@node1:~/kube-prometheus# kubectl get pod -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   0          2m22s
alertmanager-main-1                    2/2     Running   0          2m22s
alertmanager-main-2                    2/2     Running   0          2m22s
blackbox-exporter-6b79c4588b-p5czf     3/3     Running   0          3m28s
grafana-7fd69887fb-94dgw               1/1     Running   0          3m27s
kube-state-metrics-55f67795cd-rgwms    3/3     Running   0          3m27s
node-exporter-8qf7s                    2/2     Running   0          3m27s
node-exporter-rv54c                    2/2     Running   0          3m27s
node-exporter-x7tjn                    2/2     Running   0          3m27s
prometheus-adapter-5565cc8d76-bmrpg    1/1     Running   0          3m25s
prometheus-adapter-5565cc8d76-kwlmn    1/1     Running   0          3m25s
prometheus-k8s-0                       2/2     Running   0          2m21s
prometheus-k8s-1                       2/2     Running   0          2m21s
prometheus-operator-6dc9f66cb7-cn9cc   2/2     Running   0          3m25s

View the Services:
kubectl get svc -n monitoring
root@node1:~/kube-prometheus# kubectl get svc -n monitoring
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-main       ClusterIP   10.97.66.233     <none>        9093/TCP,8080/TCP            4m10s
alertmanager-operated   ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   3m4s
blackbox-exporter       ClusterIP   10.106.215.152   <none>        9115/TCP,19115/TCP           4m10s
grafana                 ClusterIP   10.101.130.249   <none>        3000/TCP                     4m9s
kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP            4m9s
node-exporter           ClusterIP   None             <none>        9100/TCP                     4m9s
prometheus-adapter      ClusterIP   10.106.237.229   <none>        443/TCP                      4m7s
prometheus-k8s          ClusterIP   10.104.4.173     <none>        9090/TCP,8080/TCP            4m8s
prometheus-operated     ClusterIP   None             <none>        9090/TCP                     3m3s
prometheus-operator     ClusterIP   None             <none>        8443/TCP                     4m7s

Map the prometheus-k8s Service to NodePort 30110:
kubectl patch svc -n monitoring prometheus-k8s  -p '{"spec":{"type": "NodePort"}}'
kubectl patch service prometheus-k8s --namespace=monitoring --type='json' --patch='[{"op": "replace", "path": "/spec/ports/0/nodePort", "value":30110}]'

Map the grafana Service to NodePort 30120:
kubectl patch svc -n monitoring grafana  -p '{"spec":{"type": "NodePort"}}'
kubectl patch service grafana --namespace=monitoring --type='json' --patch='[{"op": "replace", "path": "/spec/ports/0/nodePort", "value":30120}]'

Map the alertmanager Service to NodePort 30130:
kubectl patch svc -n monitoring alertmanager-main  -p '{"spec":{"type": "NodePort"}}'
kubectl patch service alertmanager-main --namespace=monitoring --type='json' --patch='[{"op": "replace", "path": "/spec/ports/0/nodePort", "value":30130}]'

Note: adjust the Alertmanager replica count (set replicas to 1 if high availability is not required):
nano manifests/alertmanager-alertmanager.yaml
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.23.0
  name: main
  namespace: monitoring
spec:
  image: quay.io/prometheus/alertmanager:v0.23.0
  nodeSelector:
    kubernetes.io/os: linux
  podMetadata:
    labels:
      app.kubernetes.io/component: alert-router
      app.kubernetes.io/instance: main
      app.kubernetes.io/name: alertmanager
      app.kubernetes.io/part-of: kube-prometheus
      app.kubernetes.io/version: 0.23.0
  replicas: 3 # change to 1 if high availability is not required
  resources:
    limits:
      cpu: 100m
      memory: 100Mi
    requests:
      cpu: 4m
      memory: 100Mi
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: alertmanager-main
  version: 0.23.0
kubectl apply -f manifests/alertmanager-alertmanager.yaml

Adjust the Prometheus replica count (set replicas to 1 if high availability is not required):
nano manifests/prometheus-prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.32.1
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - apiVersion: v2
      name: alertmanager-main
      namespace: monitoring
      port: web
  enableFeatures: []
  externalLabels: {}
  image: quay.io/prometheus/prometheus:v2.32.1
  nodeSelector:
    kubernetes.io/os: linux
  podMetadata:
    labels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/instance: k8s
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: kube-prometheus
      app.kubernetes.io/version: 2.32.1
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  probeNamespaceSelector: {}
  probeSelector: {}
  replicas: 2 # change to 1 if high availability is not required
  resources:
    requests:
      memory: 400Mi
  ruleNamespaceSelector: {}
  ruleSelector: {}
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: 2.32.1
kubectl apply -f manifests/prometheus-prometheus.yaml

Change the image address in kubeStateMetrics-deployment.yaml (needed when k8s.gcr.io cannot be reached, e.g. from mainland China), then apply the edited manifest:
nano manifests/kubeStateMetrics-deployment.yaml
kubectl apply -f manifests/kubeStateMetrics-deployment.yaml
- args:
        - --host=127.0.0.1
        - --port=8081
        - --telemetry-host=127.0.0.1
        - --telemetry-port=8082
        image: bitnami/kube-state-metrics:2.3.0

Replace k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0 with bitnami/kube-state-metrics:2.3.0.

If you use Lens, select Prometheus Operator monitoring/prometheus-k8s:9090 on the metrics configuration page.
Viewing the dashboards

50 K8S群集监控:性能分析和告警-1.jpg

Recommended dashboards

  • k8s-views-global: 15757
  • k8s-views-namespaces: 15758
  • k8s-views-nodes: 15759
  • k8s-views-pods: 15760
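Before importing these dashboards, a quick PromQL sanity check in the Prometheus UI (http://node1:30110) confirms that all jobs are being scraped; for example:
count by (job) (up == 1)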


Monitoring Pattern Analysis

Metrics monitoring pattern

View the ServiceMonitor objects:
kubectl get servicemonitor -n monitoring
root@node1:~/kube-prometheus# kubectl get servicemonitor -n monitoring
NAME                      AGE
alertmanager-main         30m
blackbox-exporter         30m
coredns                   30m
grafana                   30m
kube-apiserver            30m
kube-controller-manager   30m
kube-scheduler            30m
kube-state-metrics        30m
kubelet                   30m
node-exporter             30m
prometheus-adapter        30m
prometheus-k8s            30m
prometheus-operator       30m

Taking kubelet as an example, inspect how it communicates:
netstat -lntp | grep kubelet
root@node1:~# netstat -lntp | grep kubelet
tcp        0      0 127.0.0.1:40125         0.0.0.0:*               LISTEN      866/kubelet
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN      866/kubelet
tcp6       0      0 :::10250                :::*                    LISTEN      866/kubelet

Try accessing the kubelet metrics endpoint:
curl -s --cacert /var/lib/kubelet/pki/kubelet.crt --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt --key /etc/kubernetes/pki/apiserver-kubelet-client.key  https://node1:10250/metrics | tail -l
root@node1:~# curl -s --cacert /var/lib/kubelet/pki/kubelet.crt --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt --key /etc/kubernetes/pki/apiserver-kubelet-client.key  https://node1:10250/metrics | tail -l
workqueue_work_duration_seconds_bucket{name="DynamicCABundle-client-ca-bundle",le="9.999999999999999e-06"} 0
workqueue_work_duration_seconds_bucket{name="DynamicCABundle-client-ca-bundle",le="9.999999999999999e-05"} 1
workqueue_work_duration_seconds_bucket{name="DynamicCABundle-client-ca-bundle",le="0.001"} 2
workqueue_work_duration_seconds_bucket{name="DynamicCABundle-client-ca-bundle",le="0.01"} 2
workqueue_work_duration_seconds_bucket{name="DynamicCABundle-client-ca-bundle",le="0.1"} 2
workqueue_work_duration_seconds_bucket{name="DynamicCABundle-client-ca-bundle",le="1"} 2
workqueue_work_duration_seconds_bucket{name="DynamicCABundle-client-ca-bundle",le="10"} 2
workqueue_work_duration_seconds_bucket{name="DynamicCABundle-client-ca-bundle",le="+Inf"} 2
workqueue_work_duration_seconds_sum{name="DynamicCABundle-client-ca-bundle"} 0.000138202
workqueue_work_duration_seconds_count{name="DynamicCABundle-client-ca-bundle"} 2

View the Grafana dashboard:

50 K8S群集监控:性能分析和告警-2.jpg
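The kubelet target also exposes cAdvisor container metrics; as a quick example (run on the Prometheus query page), the following sums per-namespace container CPU usage:
sum by (namespace) (rate(container_cpu_usage_seconds_total{image!=""}[5m]))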

Exporter pattern

View the node-exporter configuration:
kubectl get pod -n monitoring | grep node-exporter
root@node1:~# kubectl get pod -n monitoring | grep node-exporter
node-exporter-8qf7s                    2/2     Running   0          2d4h
node-exporter-rv54c                    2/2     Running   0          2d4h
node-exporter-x7tjn                    2/2     Running   0          2d4h

View the ServiceMonitor:
kubectl get servicemonitor -n monitoring node-exporter -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
  creationTimestamp: "2022-11-15T01:26:36Z"
  generation: 1
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: node-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 1.3.1
  name: node-exporter
  namespace: monitoring
  resourceVersion: "3805"
  uid: 766a747a-2680-4ebc-ae25-dfd27270fb11
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 15s
    port: https
    relabelings:
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __meta_kubernetes_pod_node_name
      targetLabel: instance
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: app.kubernetes.io/name
  selector:
    matchLabels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: node-exporter
      app.kubernetes.io/part-of: kube-prometheus

View the Service selected by the node-exporter ServiceMonitor:
kubectl get svc -n monitoring -l app.kubernetes.io/name=node-exporter
root@node1:~# kubectl get svc -n monitoring -l app.kubernetes.io/name=node-exporter
NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
node-exporter   ClusterIP   None         <none>        9100/TCP   4h13m

View the node-exporter Endpoints:
kubectl get ep node-exporter -n monitoring
NAME            ENDPOINTS                                                  AGE
node-exporter   192.168.1.231:9100,192.168.1.232:9100,192.168.1.233:9100   4h14m

Check node-exporter's listening ports:
netstat -lntp | grep 9100
root@node1:~/kube-prometheus# netstat -lntp | grep 9100
tcp        0      0 192.168.1.231:9100      0.0.0.0:*               LISTEN      13110/kube-rbac-pro
tcp        0      0 127.0.0.1:9100          0.0.0.0:*               LISTEN      12800/node_exporter

Collect metrics from the current node (node1). The loopback address is used here because it allows plain HTTP, which keeps the demonstration simple:
curl -s "http://127.0.0.1:9100/metrics" | tail -l
root@node1:~# curl -s "http://127.0.0.1:9100/metrics" | tail -l
promhttp_metric_handler_errors_total{cause="encoding"} 0
promhttp_metric_handler_errors_total{cause="gathering"} 0
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 1386
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0

Load dashboard 14513 (Linux Exporter Node) to view the analysis.

50 K8S群集监控:性能分析和告警-3.jpg
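The same node-exporter data can be queried directly in Prometheus; for example, this common expression approximates per-node CPU utilization in percent:
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)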

Cloud-native application monitoring example: monitoring etcd

Check etcd's listening ports:
netstat -lntp | grep etcd
root@node1:~# netstat -lntp | grep etcd
tcp        0      0 192.168.1.231:2379      0.0.0.0:*               LISTEN      1937/etcd
tcp        0      0 127.0.0.1:2379          0.0.0.0:*               LISTEN      1937/etcd
tcp        0      0 192.168.1.231:2380      0.0.0.0:*               LISTEN      1937/etcd
tcp        0      0 127.0.0.1:2381          0.0.0.0:*               LISTEN      1937/etcd

Try accessing etcd's metrics endpoint:
curl 192.168.1.231:2379/metrics
curl https://192.168.1.231:2379/metrics
root@node1:~# curl 192.168.1.231:2379/metrics
curl: (52) Empty reply from server
root@node1:~# curl https://192.168.1.231:2379/metrics
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

The two attempts fail with errors 52 and 60 respectively.
Check the etcd certificate configuration:
grep -E "key-file|cert-file" /etc/kubernetes/manifests/etcd.yaml
root@node1:~# grep -E "key-file|cert-file" /etc/kubernetes/manifests/etcd.yaml
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key

Access etcd's metrics endpoint again, this time with the certificates:
curl -s --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key https://192.168.1.231:2379/metrics -k | tail -1
root@node1:~# curl -s --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key https://192.168.1.231:2379/metrics -k | tail -1
promhttp_metric_handler_requests_total{code="503"} 0

Create an etcd Service:
nano  etcd-svc.yaml
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    app: etcd-prom
  name: etcd-prom
  namespace: kube-system
subsets:
- addresses:
  - ip: 192.168.1.231 # etcd server IP address
  ports:
  - name: https-metrics
    port: 2379   # etcd port
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: etcd-prom
  name: etcd-prom
  namespace: kube-system
spec:
  ports:
  - name: https-metrics
    port: 2379
    protocol: TCP
    targetPort: 2379
  type: ClusterIP

Create the Service:
kubectl apply -f etcd-svc.yaml

View the etcd Service:
kubectl get svc -A | grep etcd
root@node1:~# kubectl get svc -A | grep etcd
kube-system   etcd-prom               ClusterIP   10.99.101.218    <none>        2379/TCP                        8s
Note the Service's cluster IP, then use it to access the metrics endpoint:
curl -s --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key https://10.99.101.218:2379/metrics -k | tail -1
root@node1:~# curl -s --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key https://10.99.101.218:2379/metrics -k | tail -1
promhttp_metric_handler_requests_total{code="503"} 0

Create a Secret containing the etcd client certificates:
kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key --from-file=/etc/kubernetes/pki/etcd/ca.crt

View the Secret:
kubectl get secret -n monitoring | grep etcd
root@node1:~# kubectl get secret -n monitoring | grep etcd
etcd-certs                        Opaque                                3      9s

Edit the Prometheus definition and add the certificates:
KUBE_EDITOR="nano"  kubectl edit prometheus k8s -n monitoring
replicas: 1
  secrets: # add this block
  - etcd-certs
  resources:

Watch the Prometheus Pods restart:
kubectl get pod -n monitoring | grep k8s
root@node1:~# kubectl get pod -n monitoring | grep k8s
prometheus-k8s-0                       2/2     Running   0          21s

Verify that the etcd certificates were mounted into the Prometheus Pod:
kubectl exec -n  monitoring prometheus-k8s-0  -c prometheus -- ls /etc/prometheus/secrets/etcd-certs
root@node1:~# kubectl exec -n  monitoring prometheus-k8s-0  -c prometheus -- ls /etc/prometheus/secrets/etcd-certs
ca.crt
healthcheck-client.crt
healthcheck-client.key

Create the ServiceMonitor configuration:
nano servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd
  namespace: monitoring
  labels:
    app: etcd
spec:
  jobLabel: k8s-app
  endpoints:
    - interval: 30s
      port: https-metrics  # must match Service.spec.ports.name
      scheme: https
      tlsConfig:
        caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
        certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
        keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
        insecureSkipVerify: true  # disable certificate verification
  selector:
    matchLabels:
      app: etcd-prom  # must match the Service's labels
  namespaceSelector:
    matchNames:
    - kube-system
kubectl apply -f servicemonitor.yaml

Verify that the ServiceMonitor was loaded:
kubectl get servicemonitor -n monitoring | grep etcd
root@node1:~# kubectl get servicemonitor -n monitoring | grep etcd
etcd                      21s

On the Prometheus Status --> Configuration page (http://node1:30110/config), check the etcd scrape configuration.

50 K8S群集监控:性能分析和告警-4.jpg

On the Prometheus Status --> Targets page, check the etcd target.

50 K8S群集监控:性能分析和告警-5.jpg

On the Prometheus Status --> Service Discovery page, check the etcd entry.

50 K8S群集监控:性能分析和告警-6.jpg

From the Prometheus home page, try querying etcd-related metrics.

50 K8S群集监控:性能分析和告警-7.jpg
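For example, these two queries (using the standard etcd metric names) check whether each member sees a leader and whether any proposals are failing:
etcd_server_has_leader
rate(etcd_server_proposals_failed_total[5m])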

Load dashboard 3070 in Grafana to view the etcd data.

50 K8S群集监控:性能分析和告警-8.jpg

Non-cloud-native application monitoring example: MySQL Exporter

Create a sample MySQL deployment:
kubectl create deploy mysql --image=registry.cn-beijing.aliyuncs.com/dotbalo/mysql:5.7.23

Check the Pod status:
kubectl get pod
root@node1:~# kubectl get pod
NAME                     READY   STATUS              RESTARTS   AGE
mysql-686695c696-rxhlt   0/1     ContainerCreating   0          11s

Get the Pod logs:
kubectl logs mysql-686695c696-rxhlt
root@node1:~# kubectl logs mysql-686695c696-rxhlt
error: database is uninitialized and password option is not specified
  You need to specify one of MYSQL_ROOT_PASSWORD, MYSQL_ALLOW_EMPTY_PASSWORD and MYSQL_RANDOM_ROOT_PASSWORD

Set the MySQL root password:
kubectl set env deploy/mysql MYSQL_ROOT_PASSWORD=mysql

Check the Pod status again:
kubectl get pod
root@node1:~# kubectl get pod
NAME                    READY   STATUS    RESTARTS   AGE
mysql-d869bcc87-s4gqb   1/1     Running   0          5s

Create the Service:
kubectl expose deploy mysql --port 3306

View the Service:
kubectl get svc
kubectl get svc -l app=mysql
root@node1:~# kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP    206d
mysql        ClusterIP   10.105.85.107   <none>        3306/TCP   9s
root@node1:~# kubectl get svc -l app=mysql
NAME    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
mysql   ClusterIP   10.105.85.107   <none>        3306/TCP   16s

Create the credentials the exporter needs:
kubectl get pod
kubectl exec -ti mysql-d869bcc87-s4gqb -- bash
mysql -uroot -pmysql
CREATE USER 'exporter'@'%' IDENTIFIED BY 'exporter' WITH MAX_USER_CONNECTIONS 3;

GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';
exit

exit
root@node1:~# kubectl get pod
NAME                    READY   STATUS    RESTARTS   AGE
mysql-d869bcc87-s4gqb   1/1     Running   0          104s
root@node1:~# kubectl exec -ti mysql-d869bcc87-s4gqb -- bash
root@mysql-d869bcc87-s4gqb:/# mysql -uroot -pmysql
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.23 MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE USER 'exporter'@'%' IDENTIFIED BY 'exporter' WITH MAX_USER_CONNECTIONS 3;
Query OK, 0 rows affected (0.00 sec)

mysql>
mysql> GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';
Query OK, 0 rows affected (0.00 sec)

mysql> exit
Bye
root@mysql-d869bcc87-s4gqb:/# exit
exit
command terminated with exit code 127
root@node1:~#

Create the exporter:
nano mysql-exporter.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: mysql-exporter
  template:
    metadata:
      labels:
        k8s-app: mysql-exporter
    spec:
      containers:
      - name: mysql-exporter
        image: registry.cn-beijing.aliyuncs.com/dotbalo/mysqld-exporter
        env:
         - name: DATA_SOURCE_NAME
           value: "exporter:exporter@(mysql.default:3306)/"
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9104
---
apiVersion: v1
kind: Service
metadata:
  name: mysql-exporter
  namespace: monitoring
  labels:
    k8s-app: mysql-exporter
spec:
  type: ClusterIP
  selector:
    k8s-app: mysql-exporter
  ports:
  - name: api
    port: 9104
    protocol: TCP
kubectl apply -f mysql-exporter.yaml

Verify that the exporter is running:
kubectl get pod -n monitoring | grep mysql

kubectl get svc -n monitoring | grep mysql
root@node1:~# kubectl get pod -n monitoring | grep mysql
mysql-exporter-84b6d8889b-r72vl        1/1     Running   0          56s
root@node1:~#
root@node1:~# kubectl get svc -n monitoring | grep mysql
mysql-exporter          ClusterIP   10.102.106.129   <none>        9104/TCP                        58s

View the MySQL metrics:
curl -s "http://10.102.106.129:9104/metrics" | tail -l
root@node1:~# curl -s "http://10.102.106.129:9104/metrics" | tail -l
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 70
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0

Create the ServiceMonitor:
nano mysql-sm.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mysql-exporter
  namespace: monitoring
  labels:
    k8s-app: mysql-exporter
    namespace: monitoring
spec:
  jobLabel: k8s-app
  endpoints:
  - port: api
    interval: 30s
    scheme: http
  selector:
    matchLabels:
      k8s-app: mysql-exporter
  namespaceSelector:
    matchNames:
    - monitoring
kubectl apply -f mysql-sm.yaml

Verify the ServiceMonitor:
kubectl get servicemonitor -n monitoring | grep mysql
root@node1:~# kubectl get servicemonitor -n monitoring | grep mysql
mysql-exporter            16s

Check the configuration: on the Prometheus Status --> Configuration page, check the MySQL scrape configuration.
On the Prometheus Status --> Targets page, check the MySQL target.
On the Prometheus Status --> Service Discovery page, check the MySQL entry.
From the Prometheus home page, try querying MySQL-related metrics.
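For example, mysql_up should return 1 when the exporter can reach the database (the job label value comes from the k8s-app label, per jobLabel above):
mysql_up{job="mysql-exporter"}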
Load dashboard 17320 in Grafana to view the MySQL data (other recommended dashboards: 7362, 6239, 14057).

50 K8S群集监控:性能分析和告警-9.jpg
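One refinement worth considering: the DATA_SOURCE_NAME above carries the exporter credentials in plain text in the Deployment. A minimal sketch of moving them into a Secret instead (the Secret name mysql-exporter-dsn is illustrative):
apiVersion: v1
kind: Secret
metadata:
  name: mysql-exporter-dsn # illustrative name
  namespace: monitoring
stringData:
  datasource: "exporter:exporter@(mysql.default:3306)/"
Then reference it in the exporter Deployment instead of the literal value:
        env:
        - name: DATA_SOURCE_NAME
          valueFrom:
            secretKeyRef:
              name: mysql-exporter-dsn
              key: datasource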

Blackbox Monitoring

Check the blackbox-exporter Pod and Service:
kubectl get pod -n monitoring -l app.kubernetes.io/name=blackbox-exporter

kubectl get svc -n monitoring -l app.kubernetes.io/name=blackbox-exporter
root@node1:~# kubectl get pod -n monitoring -l app.kubernetes.io/name=blackbox-exporter
NAME                                 READY   STATUS    RESTARTS   AGE
blackbox-exporter-6b79c4588b-p5czf   3/3     Running   0          5h18m
root@node1:~#
root@node1:~# kubectl get svc -n monitoring -l app.kubernetes.io/name=blackbox-exporter
NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)              AGE
blackbox-exporter   ClusterIP   10.106.215.152   <none>        9115/TCP,19115/TCP   5h18m

Use blackbox-exporter to probe a website:
curl -s "http://10.106.215.152:19115/probe?target=cloudzun.com&module=http_2xx" | tail -l
root@node1:~# curl -s "http://10.106.215.152:19115/probe?target=cloudzun.com&module=http_2xx" | tail -l
probe_http_version 1.1
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 2.469229557e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1

Check the blackbox-exporter configuration:
kubectl get cm blackbox-exporter-configuration   -n  monitoring -o yaml
apiVersion: v1
data:
  config.yml: |-
    "modules":
      "http_2xx":
        "http":
          "preferred_ip_protocol": "ip4"
        "prober": "http"
      "http_post_2xx":
        "http":
          "method": "POST"
          "preferred_ip_protocol": "ip4"
        "prober": "http"
      "irc_banner":
        "prober": "tcp"
        "tcp":
          "preferred_ip_protocol": "ip4"
          "query_response":
          - "send": "NICK prober"
          - "send": "USER prober prober prober :prober"
          - "expect": "PING :([^ ]+)"
            "send": "PONG ${1}"
          - "expect": "^:[^ ]+ 001"
      "pop3s_banner":
        "prober": "tcp"
        "tcp":
          "preferred_ip_protocol": "ip4"
          "query_response":
          - "expect": "^+OK"
          "tls": true
          "tls_config":
            "insecure_skip_verify": false
      "ssh_banner":
        "prober": "tcp"
        "tcp":
          "preferred_ip_protocol": "ip4"
          "query_response":
          - "expect": "^SSH-2.0-"
      "tcp_connect":
        "prober": "tcp"
        "tcp":
          "preferred_ip_protocol": "ip4"
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
  creationTimestamp: "2022-11-15T01:26:34Z"
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: blackbox-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.19.0
  name: blackbox-exporter-configuration
  namespace: monitoring
  resourceVersion: "3658"
  uid: f89f1661-0e9d-4b7e-b13e-56bf9509780e

Static Configuration

Monitoring web URLs

Create the static configuration file:
touch prometheus-additional.yaml

Create the corresponding Secret and inspect it:
kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
kubectl get secret additional-configs   -n monitoring

kubectl describe secret additional-configs -n monitoring
root@node1:~# kubectl get secret additional-configs   -n monitoring
NAME                 TYPE     DATA   AGE
additional-configs   Opaque   1      16s
root@node1:~#
root@node1:~# kubectl describe secret additional-configs -n monitoring
Name:         additional-configs
Namespace:    monitoring
Labels:       <none>
Annotations:  <none>

Type:  Opaque

Data
====
prometheus-additional.yaml:  0 bytes

Edit the Prometheus definition and add the additionalScrapeConfigs section:
KUBE_EDITOR="nano"  kubectl edit prometheus k8s -n monitoring
image: quay.io/prometheus/prometheus:v2.32.1
  additionalScrapeConfigs: # add this block
    key: prometheus-additional.yaml
    name: additional-configs
    optional: true

Update the static configuration file:
nano prometheus-additional.yaml
- job_name: 'blackbox'
  metrics_path: /probe
  params:
    module: [http_2xx] # Look for a HTTP 200 response.
  static_configs:
    - targets:
      - http://cloudzun.com # Target to probe with http.
      - https://www.google.com # Target to probe with https.
      - https://chengzhweb1030.azurewebsites.net/
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: blackbox-exporter.monitoring:19115 # The blackbox exporter's real hostname:port.

Hot-reload the Secret:
kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml --dry-run=client -o yaml | kubectl replace -f - -n monitoring

Check the configuration: on the Prometheus Status --> Configuration page, check the blackbox scrape configuration.
On the Prometheus Status --> Targets page, check the blackbox targets.
On the Prometheus Status --> Service Discovery page, check the blackbox entry.
From the Prometheus home page, try querying blackbox-related metrics.
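For example, probe_success should be 1 for each configured target, and the per-phase latency can be summed per instance:
probe_success{job="blackbox"}
sum by (instance) (probe_http_duration_seconds{job="blackbox"})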
Load dashboard 13659 in Grafana to view the blackbox data (another recommended dashboard: 7587).

50 K8S群集监控:性能分析和告警-10.jpg

Monitoring Windows Exporter

Download and install Windows Exporter:
https://github.com/prometheus-community/windows_exporter/releases
View the metrics exposed by the Windows host:
curl -s "http://192.168.1.6:9182/metrics" | tail -l
root@node1:~# curl -s "http://192.168.1.6:9182/metrics" | tail -l
windows_system_system_calls_total 2.17871068e+08
# HELP windows_system_system_up_time System boot time (WMI source: PerfOS_System.SystemUpTime)
# TYPE windows_system_system_up_time gauge
windows_system_system_up_time 1.6683326515004222e+09
# HELP windows_system_threads Current number of threads (WMI source: PerfOS_System.Threads)
# TYPE windows_system_threads gauge
windows_system_threads 6023
# HELP windows_textfile_scrape_error 1 if there was an error opening or reading a file, 0 otherwise
# TYPE windows_textfile_scrape_error gauge
windows_textfile_scrape_error 0

Update the static configuration file:
nano prometheus-additional.yaml
- job_name: 'WindowsServerMonitor'
  static_configs:
    - targets:
      - "192.168.1.6:9182" # 被监控的windows机器的IP
      labels:
        server_type: 'windows'
  relabel_configs:
    - source_labels: [__address__]
      target_label: instance

Hot-reload the Secret:
kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml --dry-run=client -o yaml | kubectl replace -f - -n monitoring

Check the configuration: on the Prometheus Status --> Configuration page, check the Windows scrape configuration.
On the Prometheus Status --> Targets page, check the Windows target.
On the Prometheus Status --> Service Discovery page, check the Windows entry.
From the Prometheus home page, try querying Windows-related metrics.
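For example, this expression approximates CPU utilization of the monitored Windows host:
100 - (avg by (instance) (rate(windows_cpu_time_total{mode="idle"}[5m])) * 100)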
Load dashboard 10467 in Grafana to view the Windows data (also recommended: 15453).

50 K8S群集监控:性能分析和告警-11.jpg

Alerting Configuration

Enabling Alertmanager

Edit alertmanager-secret.yaml and update the configuration:
cd kube-prometheus/manifests/
nano alertmanager-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.23.0
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    "global":
      "resolve_timeout": "5m"
      smtp_from: "simaaofu@163.com"
      smtp_smarthost: "smtp.163.com:465"
      smtp_auth_username: "simaaofu@163.com"
      smtp_auth_password: "SMTPAUTHPASSWORD"
      smtp_require_tls: false
      smtp_hello: "163.com"
    "inhibit_rules":
    - "equal":
      - "namespace"
      - "alertname"
      "source_matchers":
      - "severity = critical"
      "target_matchers":
      - "severity =~ warning|info"
    - "equal":
      - "namespace"
      - "alertname"
      "source_matchers":
      - "severity = warning"
      "target_matchers":
      - "severity = info"
    "receivers":
    - "name": "Default"
      email_configs:
      - to : "simaaofu@163.com"
        send_resolved: true
    - "name": "Watchdog"
      email_configs:
      - to : "chengzunhua@msn.com"
        send_resolved: true
    - "name": "Critical"
      email_configs:
      - to : "chengzunhua@msn.com"
        send_resolved: true
    "route":
      "group_by":
      - "namespace"
      - "job"
      - "alertname"
      "group_interval": "5m"
      "group_wait": "30s"
      "receiver": "Default"
      "repeat_interval": "12h"
      "routes":
      - "matchers":
        - "alertname = Watchdog"
        "receiver": "Watchdog"
      - "matchers":
        - "severity = critical"
        "receiver": "Critical"
type: Opaque
kubectl apply -f  alertmanager-secret.yaml

Check the updated alert groups in the Alertmanager UI:

50 K8S群集监控:性能分析和告警-12.jpg

Check the updated config on the Alertmanager Status page:

50 K8S群集监控:性能分析和告警-13.jpg
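The running configuration can also be pulled from the Alertmanager API (assuming the NodePort 30130 mapping created earlier), for example:
curl -s http://node1:30130/api/v2/status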

View the PrometheusRule objects:
kubectl get prometheusrule -n monitoring

View the PrometheusRule for node-exporter:
kubectl get prometheusrule node-exporter-rules -n monitoring -o yaml
- alert: NodeClockNotSynchronising
      annotations:
        description: Clock on {{ $labels.instance }} is not synchronising. Ensure
          NTP is configured on this host.
        runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodeclocknotsynchronising
        summary: Clock not synchronising.
      expr: |
        min_over_time(node_timex_sync_status[5m]) == 0
        and
        node_timex_maxerror_seconds >= 16
      for: 10m
      labels:
        severity: warning

On the Prometheus --> Rules page, view the NodeClockNotSynchronising alert definition.

50 K8S群集监控:性能分析和告警-14.jpg

On the Prometheus --> Alerts page, view the firing alerts.

50 K8S群集监控:性能分析和告警-15.jpg

Check the alert email:

50 K8S群集监控:性能分析和告警-16.jpg

On the Prometheus --> Alerts page, view the Watchdog alert.

50 K8S群集监控:性能分析和告警-17.jpg
According to the configured routing rules, the Watchdog email is delivered to a different mailbox.

50 K8S群集监控:性能分析和告警-18.jpg
Alert configuration example: website access latency alert

Create the website alert rule:
nano  web-rule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: blackbox-exporter
    prometheus: k8s
    role: alert-rules
  name: blackbox
  namespace: monitoring
spec:
  groups:
  - name: blackbox-exporter
    rules:
    - alert: DomainAccessDelayExceeds1s
      annotations:
        description: Probe latency for domain {{ $labels.instance }} is greater than 1 second; current value is {{ $value }}
        summary: Domain probe access latency exceeds 1 second
      expr: sum(probe_http_duration_seconds{job=~"blackbox"}) by (instance) > 1
      for: 1m
      labels:
        severity: warning
        type: blackbox
kubectl apply -f web-rule.yaml

View the newly created rule:
kubectl get PrometheusRule -n monitoring
root@node1:~# kubectl get PrometheusRule -n monitoring
NAME                              AGE
alertmanager-main-rules           6h11m
blackbox                          30s
kube-prometheus-rules             6h11m
kube-state-metrics-rules          6h11m
kubernetes-monitoring-rules       6h11m
node-exporter-rules               6h11m
prometheus-k8s-prometheus-rules   6h11m
prometheus-operator-rules         6h11m

On the Prometheus --> Alerts page, view the alert:

50 K8S群集监控:性能分析和告警-19.jpg
Check the alert email in the mailbox:

50 K8S群集监控:性能分析和告警-20.jpg

This concludes the lab. If you need help or more details, please feel free to contact me. Thank you!