Data Prepper configuration reference

    General pipeline options

    Option | Required | Description
    workers | No | Integer, default 1. The number of application threads. As a starting point, try setting this value to the number of CPU cores on the machine.
    delay | No | Integer (milliseconds), default 3,000. How long workers wait between buffer read attempts.
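
    As a rough sketch of where these two options sit, the fragment below places them at the top level of a single pipeline. The pipeline name sample-pipeline is hypothetical, the worker count assumes a 4-core machine, and the stdout sink name is an assumption (the console sink is described at the end of this page without a name).

    ```yaml
    # Hypothetical pipeline name; workers and delay are set per pipeline.
    sample-pipeline:
      workers: 4      # assumes a 4-core machine
      delay: 3000     # milliseconds between buffer read attempts (the default)
      source:
        stdin:        # console input, no options
      sink:
        - stdout:     # assumed name of the console sink
    ```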

    Sources

    Sources define where your data comes from.

    otel_trace_source

    Source for the OpenTelemetry Collector.

    Option | Required | Description
    port | No | Integer, the port the OTel trace source runs on. Default is .
    request_timeout | No | Integer, the request timeout in milliseconds. Default is 10,000.
    health_check_service | No | Boolean, enables a gRPC health check service under grpc.health.v1/Health/Check. Default is false.
    proto_reflection_service | No | Boolean, enables a reflection service for Protobuf services (see and gRPC Server Reflection Tutorial docs). Default is false.
    unframed_requests | No | Boolean, enables requests that are not framed using the gRPC wire protocol.
    thread_count | No | Integer, the number of threads to keep in the ScheduledThreadPool. Default is 200.
    max_connection_count | No | Integer, the maximum allowed number of open connections. Default is 500.
    ssl | No | Boolean, enables connections to the OTel source port over TLS/SSL. Defaults to true.
    sslKeyCertChainFile | Conditionally | String, file-system path or AWS S3 path to the security certificate (e.g. “config/demo-data-prepper.crt” or “s3://my-secrets-bucket/demo-data-prepper.crt”). Required if ssl is set to true.
    sslKeyFile | Conditionally | String, file-system path or AWS S3 path to the security key (e.g. “config/demo-data-prepper.key” or “s3://my-secrets-bucket/demo-data-prepper.key”). Required if ssl is set to true.
    useAcmCertForSSL | No | Boolean, enables TLS/SSL using the certificate and private key from AWS Certificate Manager (ACM). Default is false.
    acmCertificateArn | Conditionally | String, the ACM certificate ARN. An ACM certificate takes precedence over an S3 or local file system certificate. Required if useAcmCertForSSL is set to true.
    awsRegion | Conditionally | String, the AWS Region to use for ACM or S3. Required if useAcmCertForSSL is set to true or sslKeyCertChainFile and are AWS S3 paths.
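
    The following is a minimal sketch of a TLS-enabled trace source that uses local certificate files. It assumes the source plugin is named otel_trace_source and that the console sink is named stdout; the pipeline name entry-pipeline is hypothetical, and the file paths are the sample paths from the table above.

    ```yaml
    # Hypothetical pipeline name.
    entry-pipeline:
      source:
        otel_trace_source:              # assumed plugin name for the OTel trace source
          ssl: true                     # the default; requires the two files below
          sslKeyCertChainFile: "config/demo-data-prepper.crt"
          sslKeyFile: "config/demo-data-prepper.key"
          request_timeout: 10000        # milliseconds (the default)
          health_check_service: true    # exposes the gRPC health check service
      sink:
        - stdout:                       # assumed name of the console sink
    ```

    For a local test without certificates, setting ssl to false removes the need for sslKeyCertChainFile and sslKeyFile.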

    file

    Source for flat file input.

    Option | Required | Description
    path | Yes | String, path to the input file (e.g. logs/my-log.log).

    pipeline

    Source for reading from another pipeline.

    stdin

    Source for console input. Can be useful for testing. No options.

    Buffers

    bounded_blocking

    The default buffer. Memory-based.

    Option | Required | Description
    buffer_size | No | Integer, default 512. The maximum number of records the buffer accepts.
    batch_size | No | Integer, default 8. The maximum number of records the buffer drains after each read.
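
    A sketch of a pipeline that overrides both buffer defaults might look like the following. It assumes the default buffer plugin is named bounded_blocking and that it is configured under a buffer key in the pipeline; the pipeline name and the two values are illustrative only, not tuning advice.

    ```yaml
    # Hypothetical pipeline name.
    sample-pipeline:
      source:
        stdin:
      buffer:
        bounded_blocking:       # assumed name of the default in-memory buffer
          buffer_size: 1024     # maximum number of records the buffer accepts
          batch_size: 256       # maximum number of records drained per read
      sink:
        - stdout:               # assumed name of the console sink
    ```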

    Preppers

    Preppers perform some action on your data: filter, transform, enrich, etc.

    otel_trace_raw_prepper

    Converts OpenTelemetry data to OpenSearch-compatible JSON documents.

    Option | Required | Description
    root_span_flush_delay | No | Integer, representing the time interval in seconds to flush all the root spans in the prepper together with their descendants. Defaults to 30.
    trace_flush_interval | No | Integer, representing the time interval in seconds to flush all the descendant spans without any root span. Defaults to 180.
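
    As an illustration, the fragment below attaches this prepper to a pipeline and spells out both defaults. It assumes preppers are listed under a prepper key, that the trace source is named otel_trace_source, and that the console sink is named stdout; the pipeline name raw-pipeline is hypothetical.

    ```yaml
    # Hypothetical pipeline name.
    raw-pipeline:
      source:
        otel_trace_source:      # assumed plugin name for the OTel trace source
          ssl: false            # no certificate files needed for a local test
      prepper:
        - otel_trace_raw_prepper:
            root_span_flush_delay: 30    # seconds (the default)
            trace_flush_interval: 180    # seconds (the default)
      sink:
        - stdout:               # assumed name of the console sink
    ```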

    service_map_stateful

    Uses OpenTelemetry data to create a distributed service map for visualization in OpenSearch Dashboards.

    Option | Required | Description
    window_duration | No | Integer, representing the fixed time window in seconds to evaluate service-map relationships. Defaults to 180.
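
    A shorter evaluation window could be configured roughly as shown here. The 60-second value is purely illustrative, the pipeline name service-map-pipeline is hypothetical, and the otel_trace_source and stdout plugin names are assumptions.

    ```yaml
    # Hypothetical pipeline name.
    service-map-pipeline:
      source:
        otel_trace_source:      # assumed plugin name for the OTel trace source
          ssl: false
      prepper:
        - service_map_stateful:
            window_duration: 60    # seconds; the default is 180
      sink:
        - stdout:               # assumed name of the console sink
    ```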

    peer_forwarder

    Forwards ExportTraceServiceRequests via gRPC to other Data Prepper instances. Required for operating Data Prepper in a clustered deployment.

    string_converter

    Converts strings to uppercase or lowercase.

    Option | Required | Description
    upper_case | No | Boolean, whether to convert to uppercase (true) or lowercase (false).

    Sinks

    Sinks define where Data Prepper writes your data.

    opensearch

    Sink for an OpenSearch cluster.

    Option | Required | Description
    hosts | Yes | List of OpenSearch hosts to write to (e.g. [“https://localhost:9200”, “”]).
    cert | No | String, path to the security certificate (e.g. “config/root-ca.pem”) if the cluster uses the OpenSearch security plugin.
    username | No | String, username for HTTP basic authentication.
    password | No | String, password for HTTP basic authentication.
    aws_sigv4 | No | Boolean, whether to use IAM signing to connect to an Amazon OpenSearch Service domain. For your access key, secret key, and optional session token, Data Prepper uses the default credential chain (environment variables, Java system properties, ~/.aws/credentials, etc.).
    aws_region | No | String, AWS Region (e.g. “us-east-1”) for the domain if you are connecting to Amazon OpenSearch Service.
    aws_sts_role | No | String, the IAM role that the sink plugin assumes to sign requests to Amazon OpenSearch Service. If not provided, the plugin uses the default credentials.
    trace_analytics_raw | No | Boolean, default false. Whether to export as trace data to the otel-v1-apm-span-* index pattern (alias otel-v1-apm-span) for use with the Trace Analytics OpenSearch Dashboards plugin.
    trace_analytics_service_map | No | Boolean, default false. Whether to export as trace data to the otel-v1-apm-service-map index for use with the service map component of the Trace Analytics OpenSearch Dashboards plugin.
    index | No | String, name of the index to export to. Only required if you don’t use the trace_analytics_raw or trace_analytics_service_map presets.
    template_file | No | String, the path to a JSON index template file (e.g. /your/local/template-file.json) if you do not use the trace_analytics_raw or trace_analytics_service_map presets. See for an example.
    document_id_field | No | String, the field from the source data to use for the OpenSearch document ID (e.g. “my-field”) if you don’t use the trace_analytics_raw or trace_analytics_service_map presets.
    dlq_file | No | String, the path to your preferred dead letter queue file (e.g. /your/local/dlq-file). Data Prepper writes to this file when it fails to index a document on the OpenSearch cluster.
    bulk_size | No | Integer (long), default 5. The maximum size (in MiB) of bulk requests to the OpenSearch cluster. Values below 0 indicate an unlimited size. If a single document exceeds the maximum bulk request size, Data Prepper sends it individually.
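
    To show how several of these options combine, here is a sketch of a sink that writes raw trace data to a local cluster over HTTPS with basic authentication. The pipeline name is hypothetical, the username and password values are placeholders, and the otel_trace_source plugin name is an assumption; the paths are the sample paths from the table.

    ```yaml
    # Hypothetical pipeline name.
    raw-pipeline:
      source:
        otel_trace_source:      # assumed plugin name for the OTel trace source
          ssl: false
      prepper:
        - otel_trace_raw_prepper:
      sink:
        - opensearch:
            hosts: ["https://localhost:9200"]
            cert: "config/root-ca.pem"          # needed with the security plugin
            username: "my-user"                 # placeholder
            password: "my-password"             # placeholder
            trace_analytics_raw: true           # writes to otel-v1-apm-span-*
            dlq_file: "/your/local/dlq-file"    # receives documents that fail to index
            bulk_size: 5                        # MiB per bulk request (the default)
    ```

    Because trace_analytics_raw is set, index, template_file, and document_id_field are left out; per the table, they apply only when neither preset is used.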

    file

    Sink for flat file output.

    Option | Required | Description
    path | Yes | String, path for the output file (e.g. logs/my-transformed-log.log).
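
    Paired with the file source described earlier, a file-to-file pipeline can be sketched as follows. The pipeline name file-pipeline is hypothetical, and both paths are the sample paths from the option tables.

    ```yaml
    # Hypothetical pipeline name.
    file-pipeline:
      source:
        file:
          path: "logs/my-log.log"                  # input file
      sink:
        - file:
            path: "logs/my-transformed-log.log"    # output file
    ```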

    pipeline

    Sink for writing to another pipeline.
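
    This page lists no options for the pipeline source or the pipeline sink, so the sketch below rests on an assumption: that each one takes a name option identifying the pipeline to read from or write to. The pipeline names and the stdout sink name are likewise assumptions.

    ```yaml
    # Two hypothetical pipelines chained together.
    entry-pipeline:
      source:
        stdin:
      sink:
        - pipeline:
            name: "downstream-pipeline"   # assumed option: pipeline to write to
    downstream-pipeline:
      source:
        pipeline:
          name: "entry-pipeline"          # assumed option: pipeline to read from
      sink:
        - stdout:                         # assumed name of the console sink
    ```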

    stdout

    Sink for console output. Can be useful for testing. No options.