Data streams

    If you ingest time-series data such as logs, you’re likely in a scenario where:

    • The number of documents grows rapidly.
    • You don’t need to update older documents.
    • Your searches generally target the newer documents.

    A typical workflow to manage time-series data is as follows:

    • To split your data into an index for each day, use the rollover operation.
    • To perform searches on a virtual index name that gets expanded to the underlying indices, create an index alias.
    • To perform a write operation on an index alias, configure the latest index as the write index.
    • To configure new indices, extract common mappings and settings into an index template.

    Even after you perform all these operations, nothing enforces the best practices for time-series data. For example, you can still modify the indices directly, and you can still ingest documents without a timestamp field, which might result in slower queries.

    Data streams abstract the complexity and enforce the best practices for managing time-series data.

    With data streams, you can store append-only time-series data across multiple indices and use a single endpoint for ingesting and searching that data. Data streams replace index aliases for time-series data.

    A data stream consists of one or more hidden, auto-generated backing indices. These backing indices are named using the following convention:

    .ds-<data-stream-name>-<generation-id>

    For example, .ds-logs-nginx-000001, where generation-id is a six-digit, zero-padded integer that acts as a cumulative count of the data stream’s rollovers, starting at 000001.
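The naming scheme can be sketched in a few lines of Python. This is only an illustration: `backing_index_name` is a hypothetical helper, not part of any OpenSearch client, and it assumes the `.ds-<data-stream-name>-<generation-id>` form seen in the sample responses on this page.

```python
def backing_index_name(data_stream: str, generation: int) -> str:
    """Build a backing index name of the form .ds-<data-stream-name>-<generation-id>."""
    if generation < 1:
        raise ValueError("the generation ID starts at 1")
    # The generation ID is zero-padded to six digits.
    return f".ds-{data_stream}-{generation:06d}"


print(backing_index_name("logs-nginx", 1))
print(backing_index_name("logs-redis", 2))
```

OpenSearch generates these names itself; a helper like this is only useful for predicting or parsing them on the client side.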

    The most recently created backing index is the data stream’s write index. You can’t add documents directly to any of the backing indices. You can only add them through the data stream handle.

    The data stream routes search requests to all of its backing indices. It uses the timestamp field to route search requests to the right set of indices and shards.

    [Figure: data stream indexing diagram]

    The following operations are not supported on the write index because they might hinder the indexing operation:

    • close
    • clone
    • delete
    • shrink
    • split
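As a rough illustration of this restriction, the following Python sketch models a client-side guard. `BLOCKED_ON_WRITE_INDEX` and `is_allowed` are hypothetical names introduced here; OpenSearch enforces the rule on the server, so this only mirrors the list above.

```python
# Operations the list above names as unsupported on the write index.
BLOCKED_ON_WRITE_INDEX = frozenset({"close", "clone", "delete", "shrink", "split"})


def is_allowed(operation: str, index: str, write_index: str) -> bool:
    """Return False when a blocked operation targets the current write index."""
    return not (index == write_index and operation in BLOCKED_ON_WRITE_INDEX)


# Shrinking an older backing index passes the check; shrinking the write index does not.
print(is_allowed("shrink", ".ds-logs-redis-000001", ".ds-logs-redis-000002"))
print(is_allowed("shrink", ".ds-logs-redis-000002", ".ds-logs-redis-000002"))
```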

    Get started with data streams

    Step 1: Create an index template

    To create a data stream, you first need to create an index template that configures a set of indices as a data stream. The data_stream object indicates that the template defines a data stream and not a regular index. The index pattern matches the name of the data stream:

    PUT _index_template/logs-template
    {
      "index_patterns": [
        "my-data-stream",
        "logs-*"
      ],
      "data_stream": {},
      "priority": 100
    }

    To use a custom timestamp field instead of the default @timestamp field, define timestamp_field in the data_stream object. You can also add index settings, as you would for a regular index template:

    PUT _index_template/logs-template-nginx
    {
      "index_patterns": ["logs-nginx"],
      "data_stream": {
        "timestamp_field": {
          "name": "request_time"
        }
      },
      "priority": 200,
      "template": {
        "settings": {
          "number_of_shards": 1
        }
      }
    }

    In this case, the name logs-nginx matches both the logs-template and logs-template-nginx templates. When more than one index template matches, OpenSearch selects the matching template with the higher priority value.
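The selection rule can be modeled with a short Python sketch. It is an approximation under stated assumptions: `TEMPLATES` and `select_template` are hypothetical names, the dictionary mirrors only the patterns and priorities of the two templates above, and `fnmatchcase` stands in for OpenSearch's own wildcard matching.

```python
from fnmatch import fnmatchcase

# Patterns and priorities copied from the two index templates above.
TEMPLATES = {
    "logs-template": {"index_patterns": ["my-data-stream", "logs-*"], "priority": 100},
    "logs-template-nginx": {"index_patterns": ["logs-nginx"], "priority": 200},
}


def select_template(name, templates=TEMPLATES):
    """Return the name of the matching template with the highest priority, or None."""
    matching = [
        (body["priority"], template_name)
        for template_name, body in templates.items()
        if any(fnmatchcase(name, pattern) for pattern in body["index_patterns"])
    ]
    if not matching:
        return None
    # Tuples compare by priority first, so max() picks the highest priority.
    return max(matching)[1]


print(select_template("logs-nginx"))
print(select_template("logs-redis"))
```

With these inputs, logs-nginx resolves to logs-template-nginx (priority 200), while logs-redis only matches logs-template.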

    Step 2: Create a data stream

    After you create an index template, you can create a data stream. You can use the data stream API to explicitly create a data stream. The data stream API initializes the first backing index:

    PUT _data_stream/logs-redis
    PUT _data_stream/logs-nginx

    You can also directly start ingesting data without creating a data stream.

    Because we have a matching index template with a data_stream object, OpenSearch automatically creates the data stream:

    POST logs-staging/_doc
    {
      "message": "login attempt failed",
      "@timestamp": "2013-03-01T00:00:00"
    }

    To see information about a specific data stream:

    GET _data_stream/logs-nginx

    Sample response

    {
      "data_streams" : [
        {
          "name" : "logs-nginx",
          "timestamp_field" : {
            "name" : "request_time"
          },
          "indices" : [
            {
              "index_name" : ".ds-logs-nginx-000001",
              "index_uuid" : "-VhmuhrQQ6ipYCmBhn6vLw"
            }
          ],
          "generation" : 1,
          "status" : "GREEN",
          "template" : "logs-template-nginx"
        }
      ]
    }

    You can see the name of the timestamp field, the list of the backing indices, and the template that’s used to create the data stream. You can also see the health of the data stream, which represents the lowest status of all its backing indices.
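The "lowest status" rule can be expressed as a small Python sketch. `SEVERITY` and `data_stream_status` are hypothetical names, and the ordering assumes the usual GREEN, YELLOW, RED health levels, with RED being the worst.

```python
# Higher number means worse health; assumes the usual GREEN/YELLOW/RED levels.
SEVERITY = {"GREEN": 0, "YELLOW": 1, "RED": 2}


def data_stream_status(backing_index_statuses):
    """Return the worst status found among the backing indices."""
    if not backing_index_statuses:
        raise ValueError("a data stream has at least one backing index")
    return max(backing_index_statuses, key=lambda status: SEVERITY[status])


print(data_stream_status(["GREEN", "YELLOW", "GREEN"]))
```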

    To see more details about the data stream, use the _stats endpoint:

    GET _data_stream/logs-nginx/_stats

    Sample response

    {
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "failed" : 0
      },
      "data_stream_count" : 1,
      "backing_indices" : 1,
      "total_store_size_bytes" : 208,
      "data_streams" : [
        {
          "data_stream" : "logs-nginx",
          "backing_indices" : 1,
          "store_size_bytes" : 208,
          "maximum_timestamp" : 0
        }
      ]
    }

    Step 3: Ingest data into the data stream

    To ingest data into a data stream, use the regular indexing APIs. Every document that you index must have a timestamp field. If you try to ingest a document that doesn’t have one, OpenSearch returns an error.

    POST logs-redis/_doc
    {
      "message": "login attempt",
      "@timestamp": "2013-03-01T00:00:00"
    }
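Because a document without a timestamp field is rejected, a client may want to check documents before sending them. The following Python sketch shows one way to do that. `has_timestamp` is a hypothetical helper, and it assumes the timestamp field is named @timestamp unless the index template defines a custom one, such as request_time.

```python
def has_timestamp(document: dict, timestamp_field: str = "@timestamp") -> bool:
    """Return True when the document carries a non-null timestamp field."""
    return document.get(timestamp_field) is not None


valid = {"message": "login attempt", "@timestamp": "2013-03-01T00:00:00"}
invalid = {"message": "login attempt"}

print(has_timestamp(valid))
print(has_timestamp(invalid))
# For a stream whose template sets a custom timestamp field:
print(has_timestamp({"request_time": "2013-03-01T00:00:00"}, "request_time"))
```

This check only tests for the presence of the field; it doesn't validate the date format, which OpenSearch checks on ingestion.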

    Step 4: Search a data stream

    You can search a data stream just like you search a regular index or an index alias. The search operation applies to all of the backing indices (all data present in the stream).

    Sample response

    {
      "took" : 514,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 0.2876821,
        "hits" : [
          {
            "_index" : ".ds-logs-redis-000001",
            "_type" : "_doc",
            "_id" : "-rhVmXoBL6BAVWH3mMpC",
            "_score" : 0.2876821,
            "_source" : {
              "message" : "login attempt",
              "@timestamp" : "2013-03-01T00:00:00"
            }
          }
        ]
      }
    }

    Step 5: Roll over a data stream

    A rollover operation creates a new backing index that becomes the data stream’s new write index.

    To perform a manual rollover operation on the data stream:

    POST logs-redis/_rollover

    Sample response

    {
      "acknowledged" : true,
      "shards_acknowledged" : true,
      "old_index" : ".ds-logs-redis-000001",
      "new_index" : ".ds-logs-redis-000002",
      "rolled_over" : true,
      "dry_run" : false,
      "conditions" : { }
    }

    If you now perform a GET operation on the logs-redis data stream, you see that the generation ID is incremented from 1 to 2.
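The effect of a rollover on the generation ID and the write index can be modeled in memory. The sketch below is a toy model, not OpenSearch code: `DataStreamModel` is a hypothetical class that only tracks names, assuming the `.ds-<data-stream-name>-<generation-id>` naming seen in the sample responses.

```python
from dataclasses import dataclass, field


@dataclass
class DataStreamModel:
    """Toy model of a data stream: a name, a generation, and backing index names."""
    name: str
    generation: int = 1
    indices: list = field(default_factory=list)

    def __post_init__(self):
        # A new data stream starts with its first backing index.
        if not self.indices:
            self.indices.append(self._index_name())

    def _index_name(self) -> str:
        return f".ds-{self.name}-{self.generation:06d}"

    @property
    def write_index(self) -> str:
        # The most recently created backing index is the write index.
        return self.indices[-1]

    def rollover(self) -> dict:
        old_index = self.write_index
        self.generation += 1
        self.indices.append(self._index_name())
        return {"old_index": old_index, "new_index": self.write_index, "rolled_over": True}


stream = DataStreamModel("logs-redis")
print(stream.rollover())
print(stream.generation, stream.write_index)
```

After one rollover, the model holds two backing indices, and only the newest one acts as the write index.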

    If you use an Index State Management (ISM) policy to roll over a data stream, you don’t need to provide the rollover_alias setting, because the ISM policy infers this information from the backing index.

    Step 6: Manage data streams in OpenSearch Dashboards

    To manage data streams from OpenSearch Dashboards, open OpenSearch Dashboards, choose Index Management, and then select Indices or Policy managed indices.

    You see a toggle switch for data streams that you can use to show or hide indices belonging to a data stream.

    When you enable this switch, you see a data stream multi-select dropdown menu that you can use for filtering data streams. You also see a data stream column that shows you the name of the parent data stream the index is contained in.

    You can select one or more data streams and apply an ISM policy on them. You can also apply a policy on any individual backing index.

    You can create visualizations on a data stream just like you would on a regular index or index alias.

    Step 7: Delete a data stream

    The delete operation first deletes the backing indices of a data stream and then deletes the data stream itself.

    To delete a data stream and all of its hidden backing indices:

    DELETE _data_stream/<name_of_data_stream>

    You can use wildcards to delete more than one data stream.

    We recommend deleting data from a data stream using an ISM policy.

    You can also use asynchronous search and PPL to query your data stream directly, and you can use the Security plugin to define granular permissions on the data stream name.