Getting started

Download the latest release of Prometheus for your platform, then extract it.
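The extraction step can be sketched as follows. This assumes the release archive was saved to the current directory under the prometheus-*.tar.gz naming used by the official release tarballs; unpack_prometheus is a hypothetical helper name, not part of Prometheus.

```shell
# Hypothetical helper: unpack the downloaded release and enter its directory.
# Assumes exactly one prometheus-*.tar.gz archive in the current directory.
# The trailing slash in the cd glob restricts the match to the extracted
# directory, so the archive itself is not passed to cd.
unpack_prometheus() {
  tar xzf prometheus-*.tar.gz && cd prometheus-*/
}
```

Run unpack_prometheus from the download directory. Afterwards the shell is inside the extracted directory, next to the prometheus binary.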

Before starting Prometheus, let's configure it.

Configuring Prometheus to monitor itself

Prometheus collects metrics from monitored targets by scraping metrics HTTP endpoints on these targets. Since Prometheus also exposes data in the same manner about itself, it can also scrape and monitor its own health.

While a Prometheus server that collects only data about itself is not very useful in practice, it is a good starting example. Save the following basic Prometheus configuration as a file named prometheus.yml:

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

For a complete specification of configuration options, see the configuration documentation.

To start Prometheus with your newly created configuration file, change to the directory containing the Prometheus binary and run:

# Start Prometheus.
# By default, Prometheus stores its database in ./data (flag --storage.tsdb.path).
./prometheus --config.file=prometheus.yml

Prometheus should start up. You should also be able to browse to a status page about itself at localhost:9090. Give it a couple of seconds to collect data about itself from its own HTTP metrics endpoint.

You can also verify that Prometheus is serving metrics about itself by navigating to its metrics endpoint: localhost:9090/metrics
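The same check works from a terminal. This is a minimal sketch, assuming curl is installed and Prometheus listens on localhost:9090; show_metric is a hypothetical helper name.

```shell
# Hypothetical helper: fetch the raw metrics page and keep only the lines
# that mention the given metric name.
show_metric() {
  curl -s http://localhost:9090/metrics | grep "$1"
}

show_metric prometheus_target_interval_length_seconds ||
  echo "no matching lines; is Prometheus running on localhost:9090?"
```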

Using the expression browser

Let us try looking at some data that Prometheus has collected about itself. To use Prometheus's built-in expression browser, navigate to http://localhost:9090/graph and choose the "Console" view within the "Graph" tab. Then enter the following expression:

prometheus_target_interval_length_seconds

This should return a number of different time series (along with the latest value recorded for each), all with the metric name prometheus_target_interval_length_seconds, but with different labels. These labels designate different latency percentiles and target group intervals.

If we were only interested in the 99th percentile latencies, we could add a label matcher to the query so that it returns only those series.
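One way to express this is the selector prometheus_target_interval_length_seconds{quantile="0.99"}, assuming the percentile is carried in a label named quantile, as is usual for Prometheus summaries. The sketch below sends that selector to the server's HTTP query API rather than typing it into the expression browser. It assumes curl is installed and Prometheus listens on localhost:9090; promql is a hypothetical helper name.

```shell
# Hypothetical helper: run an instant query against the local server's HTTP API.
# -G sends the request as GET; --data-urlencode URL-encodes the expression.
promql() {
  curl -sG http://localhost:9090/api/v1/query --data-urlencode "query=$1"
}

# Keep only the series whose quantile label equals 0.99.
promql 'prometheus_target_interval_length_seconds{quantile="0.99"}' ||
  echo "query failed; is Prometheus running on localhost:9090?"
```

The same selector can be pasted directly into the expression browser.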

To count the number of returned time series, you could write:

count(prometheus_target_interval_length_seconds)

For more about the expression language, see the expression language documentation.

To graph expressions, navigate to http://localhost:9090/graph and use the "Graph" tab.

For example, enter the following expression to graph the per-second rate of chunks being created in the self-scraped Prometheus:

rate(prometheus_tsdb_head_chunks_created_total[1m])

Experiment with the graph range parameters and other settings.

Starting up some sample targets

Let us make this more interesting and start some example targets for Prometheus to scrape.

The Go client library includes an example which exports fictional RPC latencies for three services with different latency distributions.

Download the Go client library for Prometheus and run three of these example processes:

# Fetch the client library code and compile example.
git clone https://github.com/prometheus/client_golang.git
cd client_golang/examples/random
go get -d
go build
# Start 3 example targets in separate terminals:
./random -listen-address=:8080
./random -listen-address=:8081
./random -listen-address=:8082

You should now have example targets listening on http://localhost:8080/metrics, http://localhost:8081/metrics, and http://localhost:8082/metrics.
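A quick way to confirm that all three targets respond is sketched below. It assumes curl is installed; check_targets is a hypothetical helper name.

```shell
# Hypothetical helper: report whether each example target answers on /metrics.
# -f makes curl return a non-zero status on HTTP errors as well as on
# connection failures.
check_targets() {
  for port in 8080 8081 8082; do
    if curl -sf "http://localhost:${port}/metrics" > /dev/null; then
      echo "localhost:${port} up"
    else
      echo "localhost:${port} down"
    fi
  done
}

check_targets
```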

Now we will configure Prometheus to scrape these new targets. Let's group all three endpoints into one job called example-random. However, imagine that the first two endpoints are production targets, while the third one represents a canary instance. To model this in Prometheus, we can add several groups of endpoints to a single job, adding extra labels to each group of targets. In this example, we will add the group="production" label to the first group of targets, while adding group="canary" to the second.

To achieve this, add a job definition for example-random to the scrape_configs section in your prometheus.yml and restart your Prometheus instance. The full configuration shown in the recording rules section below includes this job definition.

Go to the expression browser and verify that Prometheus now has information about time series that these example endpoints expose, such as the rpc_durations_seconds metric.

Configure rules for aggregating scraped data into new time series

Though not a problem in our example, queries that aggregate over thousands of time series can get slow when computed ad-hoc. To make this more efficient, Prometheus allows you to prerecord expressions into completely new persisted time series via configured recording rules. Let's say we are interested in recording the per-second rate of example RPCs (rpc_durations_seconds_count) averaged over all instances (but preserving the job and service dimensions) as measured over a window of 5 minutes. We could write this as:

avg(rate(rpc_durations_seconds_count[5m])) by (job, service)

Try graphing this expression.

To record the time series resulting from this expression into a new metric called job_service:rpc_durations_seconds_count:avg_rate5m, create a file with the following recording rule and save it as prometheus.rules.yml:

groups:
- name: example
  rules:
  - record: job_service:rpc_durations_seconds_count:avg_rate5m
    expr: avg(rate(rpc_durations_seconds_count[5m])) by (job, service)
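Before pointing Prometheus at the new file, it can help to validate it with promtool, the command-line tool shipped alongside the prometheus binary in the release archives. This is a minimal sketch: the PROMTOOL variable and the check_rules helper are hypothetical names introduced here, and the default path assumes you are in the directory that contains promtool.

```shell
# Assumption: promtool sits next to the prometheus binary, as in the release
# archives; set PROMTOOL to point elsewhere if it does not.
PROMTOOL=${PROMTOOL:-./promtool}

# Hypothetical helper: validate a rules file before Prometheus loads it.
check_rules() {
  "$PROMTOOL" check rules "$1"
}

check_rules prometheus.rules.yml ||
  echo "rules check failed (or promtool was not found)"
```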

To make Prometheus pick up this new rule, add a rule_files statement in your prometheus.yml. The config should now look like this:

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  external_labels:
    monitor: 'codelab-monitor'
rule_files:
  - 'prometheus.rules.yml'
scrape_configs:
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name:       'example-random'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:8080', 'localhost:8081']
        labels:
          group: 'production'
      - targets: ['localhost:8082']