Prometheus Installation

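    1. Configure a registry mirror located in mainland China to speed up Docker image pulls

    2. Stop the host firewall

      systemctl stop firewalld.service
    3. Pull the standard container images; retry a few times if a pull times out

      docker pull prom/prometheus
      docker pull prom/alertmanager
      docker pull consul
    4. Prepare the test machines for deployment

      OS: CentOS-7.5.1804
      host1: 127.0.0.1
      host2: 127.0.0.2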
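
The advice to retry a timed-out pull can be scripted. The following is a minimal sketch, assuming a POSIX-style shell; the helper name `pull_with_retry` and its defaults (5 attempts, 10 seconds between attempts) are illustrative choices, not part of the original procedure.

```shell
# Retry `docker pull` a few times before giving up.
# pull_with_retry IMAGE [MAX_ATTEMPTS] [DELAY_SECONDS]   (hypothetical helper)
pull_with_retry() {
    local image="$1"
    local max_attempts="${2:-5}"
    local delay="${3:-10}"
    local attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if docker pull "$image"; then
            return 0
        fi
        echo "pull of $image failed (attempt $attempt/$max_attempts)" >&2
        attempt=$((attempt + 1))
        # Only pause when another attempt will follow.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage:
#   for image in prom/prometheus prom/alertmanager consul; do
#       pull_with_retry "$image" || exit 1
#   done
```

The function returns non-zero once all attempts are exhausted, so a calling script can stop instead of continuing without the image.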

    The architecture is as follows:

    1. run consul
      Start the instance:

      docker volume create consul-data
      docker run --name consul01 --volume consul-data:/consul/data -d -p 8300:8300 -p 8400:8400 -p 8500:8500 -p 8600:8600 consul
    2. run alertmanager

      The url in the configuration file is the alert callback endpoint of monitor. Alerts are sent to monitor first; monitor then processes them, shows them in its web UI, and forwards them to the recipients configured for each alert.

      Start the instance:

      docker volume create alertmanager-data
      docker run --name alertmanager01 --volume alertmanager-data:/alertmanager --volume /app/docker/alertmanager:/etc/alertmanager -d -p 9093:9093 -p 9094:9094 prom/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --web.listen-address=":9093" --cluster.listen-address=":9094"
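
The alertmanager.yml itself is not shown here. As a rough sketch of what a webhook-only configuration might look like, the helper below writes one; the function name, the receiver name `monitor`, the timing values, and any URL passed in are assumptions for illustration, since the real callback URL of monitor is not given.

```shell
# Write a minimal alertmanager.yml that forwards every alert to one webhook.
# write_alertmanager_config WEBHOOK_URL [GROUP_WAIT] [OUTPUT_FILE]   (hypothetical helper)
write_alertmanager_config() {
    local webhook_url="$1"
    local group_wait="${2:-30s}"
    local output="${3:-/app/docker/alertmanager/alertmanager.yml}"
    # Unquoted EOF so that the two shell variables are expanded.
    cat > "$output" <<EOF
route:
  receiver: monitor
  group_by: ['alertname']
  group_wait: ${group_wait}
  group_interval: 5m
  repeat_interval: 1h
receivers:
  - name: monitor
    webhook_configs:
      - url: '${webhook_url}'
        send_resolved: true
EOF
}

# Usage (placeholder URL, replace with the real monitor callback):
#   write_alertmanager_config "http://monitor.example.internal/alert/callback"
```

The second argument makes it easy to give each node its own group_wait when more than one alertmanager is deployed.
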
    3. run prometheus

      Configuration file: /app/docker/prometheus/prometheus.yml
      Example:

      # my global config
      global:
        scrape_interval: 10s
        evaluation_interval: 10s

      # Alertmanager configuration
      alerting:
        alertmanagers:
          - static_configs:
              - targets:
                  - 127.0.0.1:9093

      # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
      rule_files:
        - /etc/prometheus/rules/*.yml
        # - "first_rules.yml"
        # - "second_rules.yml"

      # Scrape configurations: Prometheus itself, plus targets discovered from consul.
      scrape_configs:
        # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
        - job_name: 'prometheus'
          # metrics_path defaults to '/metrics'
          # scheme defaults to 'http'.
          static_configs:
            - targets: ['127.0.0.1:9090']

        - job_name: 'consul'
          # Discover scrape targets from the services registered in consul.
          consul_sd_configs:
            - server: 127.0.0.1:8500
              scheme: http
              services: []

      Notes on the configuration file:
      global -> scrape_interval: default scrape interval
      alerting -> targets: address of alertmanager
      rule_files: path of the alerting rule files
      scrape_configs -> job consul: discovers the scrape targets from consul
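
No rule file is shown for the rule_files path. As an illustrative sketch only, the helper below writes one alerting rule that fires when a target has been unreachable for a minute. The file name, group name, severity label, and the 1m duration are assumptions; the default output path assumes the container's /etc/prometheus is backed by /app/docker/prometheus on the host, as in the docker run command of this step.

```shell
# Write one example alerting rule into the directory matched by rule_files.
# write_example_rule [OUTPUT_FILE]   (hypothetical helper)
write_example_rule() {
    local output="${1:-/app/docker/prometheus/rules/instance_down.yml}"
    mkdir -p "$(dirname "$output")"
    # Quoted 'EOF' keeps {{ ... }} and $labels literal for Prometheus templating.
    cat > "$output" <<'EOF'
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} of job {{ $labels.job }} is down"
EOF
}

# Usage:
#   write_example_rule
```

Rule files are re-read when the configuration is reloaded, so a new file takes effect after the next reload.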

      Start the instance:

      docker volume create prometheus-tsdb
      docker run --name prometheus01 --volume prometheus-tsdb:/prometheus --volume /app/docker/prometheus:/etc/prometheus -d -p 9090:9090 prom/prometheus --config.file=/etc/prometheus/prometheus.yml --web.enable-lifecycle

      Hot-reload the configuration (this endpoint requires the --web.enable-lifecycle flag set above):

      curl -X POST http://127.0.0.1:9090/-/reload
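
A reload can be made a little safer by validating the file first and checking the HTTP status afterwards. This is a sketch under assumptions: the helper name `check_and_reload` is made up, and it assumes `promtool` is available inside the prometheus01 container.

```shell
# Validate prometheus.yml inside the container, then trigger a hot reload.
# check_and_reload [CONTAINER] [BASE_URL]   (hypothetical helper)
check_and_reload() {
    local container="${1:-prometheus01}"
    local base_url="${2:-http://127.0.0.1:9090}"
    local status

    # Stop early if the configuration does not parse.
    docker exec "$container" promtool check config /etc/prometheus/prometheus.yml || return 1

    # -o /dev/null discards the body; -w prints only the HTTP status code.
    status=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$base_url/-/reload")
    if [ "$status" = "200" ]; then
        echo "reload ok"
        return 0
    fi
    echo "reload failed (HTTP $status)" >&2
    return 1
}

# Usage:
#   check_and_reload prometheus01 http://127.0.0.1:9090
```
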
    4. Register an exporter

      curl -X PUT -d '{"id": "node31","name": "node31","address": "127.0.0.1","port": 9100,"tags": ["host"],"checks": [{"http": "http://127.0.0.1:9100/","interval": "10s"}]}' http://127.0.0.1:8500/v1/agent/service/register
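
When several exporters have to be registered, the payload above can be generated from parameters. The sketch below reproduces the same JSON shape; the helper names are hypothetical, and using the service id as the service name simply mirrors the node31 example.

```shell
# Build the registration payload for one exporter.
# service_payload ID ADDRESS PORT   (hypothetical helper)
service_payload() {
    local id="$1" address="$2" port="$3"
    printf '{"id": "%s","name": "%s","address": "%s","port": %s,"tags": ["host"],"checks": [{"http": "http://%s:%s/","interval": "10s"}]}' \
        "$id" "$id" "$address" "$port" "$address" "$port"
}

# register_exporter ID ADDRESS PORT [CONSUL_URL]
register_exporter() {
    local consul_url="${4:-http://127.0.0.1:8500}"
    # -d @- makes curl read the request body from stdin.
    service_payload "$1" "$2" "$3" |
        curl -s -X PUT -d @- "$consul_url/v1/agent/service/register"
}

# deregister_exporter ID [CONSUL_URL]
deregister_exporter() {
    local consul_url="${2:-http://127.0.0.1:8500}"
    curl -s -X PUT "$consul_url/v1/agent/service/deregister/$1"
}

# Usage:
#   register_exporter node31 127.0.0.1 9100
#   curl -s http://127.0.0.1:8500/v1/agent/services   # list registered services
```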

    deploy_as_1

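    alive_check can be deployed on host2 to monitor the state of prometheus01:

      if prometheus01 down
        check the state of host1
        if host1 up
          try a limited number of times to bring prometheus01 back up; if it recovers -> return
        modify the prometheus02 configuration and reload it to activate the standby node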
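
The alive_check logic could be sketched in shell roughly as follows. Everything concrete here is an assumption made for illustration: the variable names, probing prometheus01 through its /-/healthy endpoint, checking host1 with ping, restarting the container over ssh, the prometheus02 URL derived from the host2 address, and the prepared file prometheus.active.yml that is copied into place to enable the standby.

```shell
# Assumed settings; override through the environment as needed.
PROM01_URL="${PROM01_URL:-http://127.0.0.1:9090}"
PROM02_URL="${PROM02_URL:-http://127.0.0.2:9090}"
HOST1="${HOST1:-127.0.0.1}"
MAX_RESTARTS="${MAX_RESTARTS:-3}"
RETRY_DELAY="${RETRY_DELAY:-10}"

prometheus01_up()      { curl -sf -o /dev/null --max-time 5 "$PROM01_URL/-/healthy"; }
host1_up()             { ping -c 1 -W 2 "$HOST1" > /dev/null 2>&1; }
# Assumes host2 can reach host1 over ssh without a password prompt.
restart_prometheus01() { ssh "$HOST1" docker restart prometheus01 > /dev/null; }

# Put the prepared active configuration in place and hot-reload prometheus02.
activate_standby() {
    cp /app/docker/prometheus/prometheus.active.yml /app/docker/prometheus/prometheus.yml &&
        curl -sf -X POST "$PROM02_URL/-/reload"
}

alive_check() {
    # Nothing to do while the primary answers.
    prometheus01_up && return 0
    if host1_up; then
        local attempt=1
        while [ "$attempt" -le "$MAX_RESTARTS" ]; do
            restart_prometheus01
            sleep "$RETRY_DELAY"
            prometheus01_up && return 0
            attempt=$((attempt + 1))
        done
    fi
    # host1 is unreachable, or the restarts did not help.
    activate_standby
}

# Usage (for example from cron on host2):
#   alive_check
```
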
    1. host1 & host2 run consul

      docker run --name consul01 -d -p 8300:8300 -p 8400:8400 -p 8500:8500 -p 8600:8600 consul
    2. host1 run alertmanager

      docker volume create alertmanager-data
      docker run --name alertmanager01 --volume alertmanager-data:/alertmanager --volume /app/docker/alertmanager:/etc/alertmanager -d -p 9093:9093 -p 9094:9094 prom/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --web.listen-address=":9093" --cluster.listen-address=":9094"
    3. host2 run alertmanager

      Configure a different group_wait in alertmanager.yml on this node. This guards against the edge case in which the standby node sends an alert before it has received the corresponding alert information from the primary node, by making the standby wait a little longer.

      docker volume create alertmanager-data
      docker run --name alertmanager02 --volume alertmanager-data:/alertmanager --volume /app/docker/alertmanager:/etc/alertmanager -d -p 9093:9093 -p 9094:9094 prom/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --web.listen-address=":9093" --cluster.listen-address=":9094" --cluster.peer="127.0.0.1:9094"
    4. host1 run prometheus

      docker volume create prometheus-tsdb
      docker run --name prometheus01 --volume prometheus-tsdb:/prometheus --volume /app/docker/prometheus:/etc/prometheus -d -p 9090:9090 prom/prometheus --config.file=/etc/prometheus/prometheus.yml --web.enable-lifecycle
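    5. host2 run prometheus

      Edit prometheus.yml: replace the consul scrape job used on node 01 with a job that pulls from the prometheus on node 01 (targets: '127.0.0.1:9090'), so that host2 stays in sync with the data on 01.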
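
The exact scrape job for host2 is not given. One plausible reading is Prometheus federation, because scraping 127.0.0.1:9090 on the default /metrics path would return only the internal metrics of prometheus01, not the series it has collected. The sketch below prints such a job for pasting under scrape_configs; the helper name, the job name, and the catch-all match[] selector are assumptions.

```shell
# Print a federation scrape job that pulls all series carrying a job label
# from the primary Prometheus.
# federate_job [PRIMARY_TARGET]   (hypothetical helper)
federate_job() {
    local primary="${1:-127.0.0.1:9090}"
    cat <<EOF
  - job_name: 'federate-prometheus01'
    # Keep the job/instance labels as recorded by the primary.
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~".+"}'
    static_configs:
      - targets: ['${primary}']
EOF
}

# Usage:
#   federate_job 127.0.0.1:9090
```

A narrower match[] selector would reduce the volume transferred on each scrape if only some jobs need to be mirrored.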