This example shows how to run a batch job and raise an alert when it has not completed successfully.

The code that runs the batch job:

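    import io.prometheus.client.CollectorRegistry;
    import io.prometheus.client.Gauge;
    import io.prometheus.client.exporter.PushGateway;

    void executeBatchJob() throws Exception {
        // A dedicated registry, so only this job's metrics are pushed.
        CollectorRegistry registry = new CollectorRegistry();
        // Records how long the job took; the client requires help text.
        Gauge duration = Gauge.build()
            .name("my_batch_job_duration_seconds")
            .help("Duration of my batch job in seconds.")
            .register(registry);
        Gauge.Timer durationTimer = duration.startTimer();
        try {
            // Your code here.

            // This is only added to the registry after success,
            // so that a previous success in the Pushgateway isn't overwritten on failure.
            // The name matches the metric queried by the alert rule.
            Gauge lastSuccess = Gauge.build()
                .name("my_batch_job_last_success_unixtime")
                .help("Last time my batch job succeeded, in unixtime.")
                .register(registry);
            lastSuccess.setToCurrentTime();
        } finally {
            // Always record the duration and push, whether or not the job succeeded.
            durationTimer.setDuration();
            PushGateway pg = new PushGateway("127.0.0.1:9091");
            pg.pushAdd(registry, "my_batch_job");
        }
    }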

To alert when the job has not run recently, add the following alerting rule to the rules of the Prometheus server that scrapes the Pushgateway. Firing alerts are then sent to Alertmanager:

    ALERT MyBatchJobNotCompleted
      IF min(time() - my_batch_job_last_success_unixtime{job="my_batch_job"}) > 60 * 60
      FOR 5m
      WITH { severity="page" }
      SUMMARY "MyBatchJob has not completed successfully in over an hour"
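
To make the rule's condition concrete, here is a minimal sketch of the same arithmetic in plain Java: the job counts as overdue once the current time minus the last-success timestamp exceeds 60 * 60 seconds. The class name `StalenessCheck` and the method `isStale` are hypothetical and exist only for illustration. Prometheus evaluates the real condition in its own expression language, and the `FOR 5m` clause (the condition must hold for five minutes before the alert fires) is not modelled here.

```java
public class StalenessCheck {
    // Threshold taken from the rule expression: 60 * 60 seconds (one hour).
    static final long MAX_AGE_SECONDS = 60 * 60;

    /**
     * Hypothetical helper mirroring
     * time() - my_batch_job_last_success_unixtime > 60 * 60.
     * Both arguments are Unix timestamps in seconds.
     */
    static boolean isStale(double nowUnixSeconds, double lastSuccessUnixSeconds) {
        return nowUnixSeconds - lastSuccessUnixSeconds > MAX_AGE_SECONDS;
    }

    public static void main(String[] args) {
        double now = System.currentTimeMillis() / 1000.0;
        // Last success 30 minutes ago: within the hour, so not stale.
        System.out.println(isStale(now, now - 30 * 60));
        // Last success 2 hours ago: past the threshold, so stale.
        System.out.println(isStale(now, now - 2 * 60 * 60));
    }
}
```

The comparison is strict, so a job that last succeeded exactly one hour ago does not yet satisfy the condition.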