Getting started with Data Prepper

    If you are migrating from Open Distro Data Prepper, see Migrating from Open Distro.

    1. Installing Data Prepper

    There are two ways to install Data Prepper: you can run the Docker image or build from source.

    The easiest way to use Data Prepper is by running the Docker image. We suggest that you use this approach if you have Docker available. Run the following command:
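
    As a minimal sketch, assuming the image name and tag used in the run command later on this page (opensearchproject/data-prepper:latest), pulling the image with the standard docker pull subcommand might look like the following. The snippet prints the command as a dry run; remove the leading echo to execute it.

```shell
# Image name and tag taken from the run command later on this page.
IMAGE="opensearchproject/data-prepper:latest"

# Dry run: print the pull command. Remove "echo" to actually pull the image.
echo docker pull "$IMAGE"
```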

    If you have special requirements that require you to build from source, or if you want to contribute, see the Developer Guide.

    2. Configuring Data Prepper

    Two configuration files are required to run a Data Prepper instance. Optionally, you can configure a Log4j 2 configuration file. See Configuring Log4j for more information. The following list describes the purpose of each configuration file:

    • pipelines.yaml: This file describes which data pipelines to run, including sources, processors, and sinks.
    • data-prepper-config.yaml: This file contains the Data Prepper server configuration.
    • log4j2-rolling.properties (optional): This file contains Log4j 2 configuration options and can be a JSON, YAML, XML, or .properties file type.

    For Data Prepper versions earlier than 2.0, the .jar file expects the pipeline configuration file path to be followed by the server configuration file path. See the following configuration path example:

        java -jar data-prepper-core-$VERSION.jar pipelines.yaml data-prepper-config.yaml

    Optionally, you can add "-Dlog4j.configurationFile=config/log4j2.properties" to the command to pass a custom Log4j 2 configuration file. If you don’t provide a properties file, Data Prepper defaults to the log4j2.properties file in the shared-config directory.
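
    A minimal sketch of the full pre-2.0 command with a custom Log4j 2 file. JVM system properties (-D options) must come before -jar; anything after the .jar name is passed to Data Prepper as a program argument. The version number 1.5.0 is only an illustrative default, and the snippet prints the command as a dry run.

```shell
# Illustrative default; set VERSION to the release you downloaded.
VERSION="${VERSION:-1.5.0}"
JAR="data-prepper-core-$VERSION.jar"

# JVM option pointing at the custom Log4j 2 configuration file.
LOG4J_OPT="-Dlog4j.configurationFile=config/log4j2.properties"

# Dry run: -D option first, then -jar, then pipeline config, then server config.
# Remove "echo" to start Data Prepper.
echo java "$LOG4J_OPT" -jar "$JAR" pipelines.yaml data-prepper-config.yaml
```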

    Starting with Data Prepper 2.0, you can launch Data Prepper by using the following data-prepper script, which does not require any additional command line arguments:

        bin/data-prepper

    Configuration files are read from specific subdirectories in the application’s home directory:

    1. pipelines/: Used for pipeline configurations. Pipeline configurations can be written in one or more YAML files.
    2. config/data-prepper-config.yaml: Used for the Data Prepper server configuration.
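
    The layout above can be sketched as follows. APP_HOME is a hypothetical variable standing in for the application's home directory, and the config/ subdirectory name is an assumption based on the Log4j 2 path (config/log4j2.properties) mentioned later on this page. The snippet works in a temporary directory so that it does not touch a real installation.

```shell
# Hypothetical stand-in for the application's home directory.
APP_HOME="$(mktemp -d)"

# pipelines/ holds one or more pipeline YAML files; config/ is assumed to hold
# the server configuration.
mkdir -p "$APP_HOME/pipelines" "$APP_HOME/config"

# Empty placeholder files; replace them with your real configuration.
: > "$APP_HOME/pipelines/pipelines.yaml"
: > "$APP_HOME/config/data-prepper-config.yaml"

# List the resulting layout.
find "$APP_HOME" -type f | sort
```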

    You can still pass your own pipeline configuration file path, followed by the server configuration file path, as arguments to the script. However, this method will not be supported in a future release.
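
    A sketch of that invocation, assuming both files sit in the directory you launch from (the file names are the ones used elsewhere on this page). The snippet prints the command as a dry run.

```shell
# Paths are illustrative; point them at your own files.
PIPELINES_FILE="pipelines.yaml"
SERVER_CONFIG_FILE="data-prepper-config.yaml"

# Dry run: pipeline configuration path first, server configuration path second.
# Remove "echo" to launch Data Prepper.
echo bin/data-prepper "$PIPELINES_FILE" "$SERVER_CONFIG_FILE"
```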

    To configure Data Prepper, see the following information for each use case:

    • Trace analytics: Learn how to collect trace data and customize a pipeline that ingests and transforms that data.
    • Log analytics: Learn how to set up Data Prepper for log observability.

    3. Defining a pipeline

    Create a Data Prepper pipeline file named pipelines.yaml using the following configuration:

        simple-sample-pipeline:
          delay: "5000"
          source:
            random:
          sink:
            - stdout:
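
    If you prefer to create the file from a terminal, a shell here-document is one way to do it. This is a convenience sketch, not part of Data Prepper itself; it writes the configuration shown above to pipelines.yaml in the current directory, overwriting any existing file of that name.

```shell
# Write the sample pipeline to pipelines.yaml in the current directory.
# The quoted 'EOF' delimiter prevents the shell from expanding anything inside.
cat > pipelines.yaml <<'EOF'
simple-sample-pipeline:
  delay: "5000"
  source:
    random:
  sink:
    - stdout:
EOF

# Show the result so that the indentation can be checked.
cat pipelines.yaml
```

    YAML is indentation-sensitive, so source and sink must stay nested under the pipeline name, as in the listing.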

    4. Running Data Prepper

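    Run the following command, which mounts your pipeline configuration YAML into the container's pipelines/ directory:

        docker run --name data-prepper \
            -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml \
            opensearchproject/data-prepper:latest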

    The example pipeline configuration above demonstrates a simple pipeline with a source (random) sending data to a sink (stdout). For examples of more advanced pipeline configurations, see Pipelines.

    A few seconds after starting Data Prepper, you should see log output followed by some UUIDs.

    The remainder of this page provides examples for running Data Prepper from the Docker image. If you built it from source, refer to the Developer Guide for more information.

    However you configure your pipeline, you’ll run Data Prepper the same way. You run the Docker image and modify both the pipelines.yaml and data-prepper-config.yaml files.

    For Data Prepper 2.0 or later, use this command:
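
      A minimal sketch for 2.0 or later, assuming the container's home directory is /usr/share/data-prepper and that the files are mounted into the pipelines/ and config/ subdirectories described earlier. The target paths are assumptions derived from that layout, so check them against the image you run. The snippet prints the command as a dry run.

```shell
# Assumed home directory of Data Prepper inside the container.
DP_HOME="/usr/share/data-prepper"

# Host file on the left of the colon, container path on the right.
PIPELINES_MOUNT="${PWD}/pipelines.yaml:${DP_HOME}/pipelines/pipelines.yaml"
CONFIG_MOUNT="${PWD}/data-prepper-config.yaml:${DP_HOME}/config/data-prepper-config.yaml"

# Dry run: remove "echo" to start the container.
echo docker run --name data-prepper -p 4900:4900 \
  -v "$PIPELINES_MOUNT" \
  -v "$CONFIG_MOUNT" \
  opensearchproject/data-prepper:latest
```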

      For Data Prepper versions earlier than 2.0, use this command:

          docker run --name data-prepper -p 4900:4900 \
              -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml \
              -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml \
              opensearchproject/data-prepper:1.x

      After Data Prepper starts, it processes data until it is shut down. When you are done, shut it down with the following command:
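
      As a sketch, assuming the Data Prepper server API is reachable on the published port 4900 over plain HTTP and exposes a shutdown endpoint at /shutdown, the request could be sent with curl. Both the endpoint path and the scheme are assumptions here; use https if SSL is enabled in your server configuration. The snippet prints the command as a dry run.

```shell
# Assumed address of the Data Prepper server API (port published with -p 4900:4900).
SHUTDOWN_URL="http://localhost:4900/shutdown"

# Dry run: remove "echo" to send the request.
echo curl -X POST "$SHUTDOWN_URL"
```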

      For Data Prepper 2.0 or later, the Log4j 2 configuration file is read from config/log4j2.properties in the application’s home directory. By default, it uses log4j2-rolling.properties in the shared-config directory.
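
      When running the Docker image, one way to supply a custom file is to mount it at that location. The following is a sketch, assuming the container's home directory is /usr/share/data-prepper and that a file named log4j2.properties exists in the current directory. Pipeline and server configuration mounts are left out for brevity, and the snippet prints the command as a dry run.

```shell
# Assumed home directory of Data Prepper inside the container.
DP_HOME="/usr/share/data-prepper"

# Mount a local Log4j 2 file where Data Prepper 2.0 or later reads it.
LOG4J_MOUNT="${PWD}/log4j2.properties:${DP_HOME}/config/log4j2.properties"

# Dry run: remove "echo" to start the container.
echo docker run --name data-prepper -p 4900:4900 \
  -v "$LOG4J_MOUNT" \
  opensearchproject/data-prepper:latest
```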

      For Data Prepper 1.5 or earlier, optionally add "-Dlog4j.configurationFile=config/log4j2.properties" to the command if you want to pass a custom Log4j 2 properties file. If no properties file is provided, Data Prepper defaults to the log4j2.properties file in the shared-config directory.

      Trace analytics is an important Data Prepper use case. If you haven’t yet configured it, see Trace analytics.

      Log ingestion is also an important Data Prepper use case. To learn more, see Log analytics.

      To learn how to run Data Prepper with a Logstash configuration, see Migrating from Logstash.

      For information on how to monitor Data Prepper, see Monitoring.

      More examples

      For more examples of Data Prepper, see the examples in the Data Prepper repo.