Search backpressure

    To decide whether to apply search backpressure, OpenSearch periodically measures the following resource consumption statistics for each search request:

    • CPU usage
    • Heap usage
    • Elapsed time

    An observer thread periodically measures the resource usage of the node. If OpenSearch determines that the node is under duress, OpenSearch examines the resource usage of each search shard task and compares it against configurable thresholds. OpenSearch considers CPU usage, heap usage, and elapsed time and assigns each task a cancellation score that is then used to cancel the most resource-intensive tasks.
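
    The scoring step described above can be illustrated with a short sketch. The source does not give the exact formula, so the scoring rule here (summing usage-to-threshold ratios for every exceeded threshold) is an assumption, and the `TaskUsage` class, the threshold values, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskUsage:
    """Resource usage of one search shard task (hypothetical structure)."""
    task_id: str
    cpu_time_ms: float
    heap_bytes: float
    elapsed_ms: float


# Hypothetical thresholds chosen for illustration only; not OpenSearch defaults.
THRESHOLDS = {"cpu_time_ms": 15_000, "heap_bytes": 50_000_000, "elapsed_ms": 30_000}


def cancellation_score(task, thresholds=THRESHOLDS):
    """Assumed rule: add usage/threshold for each threshold the task exceeds."""
    score = 0.0
    for name, limit in thresholds.items():
        usage = getattr(task, name)
        if usage >= limit:
            score += usage / limit
    return score


def rank_for_cancellation(tasks, thresholds=THRESHOLDS):
    """Return IDs of tasks that exceed a threshold, most resource-intensive first."""
    scored = [(cancellation_score(task, thresholds), task) for task in tasks]
    eligible = [(score, task) for score, task in scored if score > 0]
    eligible.sort(key=lambda pair: pair[0], reverse=True)
    return [task.task_id for _, task in eligible]
```

    Tasks that stay below every threshold receive a score of 0 and are never candidates for cancellation in this sketch.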

    OpenSearch limits the number of cancellations to a fraction of successful task completions. Additionally, it limits the number of cancellations per unit time. OpenSearch continues to monitor and cancel tasks until the node is no longer under duress.
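
    One common way to enforce both limits is a pair of token buckets: one refilled by successful task completions and one refilled by elapsed time. The sketch below assumes that mechanism for illustration; the class names, parameters, and default values are hypothetical and are not taken from the OpenSearch source.

```python
class TokenBucket:
    """A bucket that holds up to `burst` tokens and refills proportionally."""

    def __init__(self, refill_per_unit, burst):
        self.refill_per_unit = refill_per_unit
        self.burst = burst
        self.tokens = burst  # start full so an initial burst is allowed

    def refill(self, units):
        # Add tokens for the given number of units, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + units * self.refill_per_unit)


class CancellationLimiter:
    """Allows a cancellation only when both buckets hold at least one token."""

    def __init__(self, ratio=0.1, rate_per_second=1.0, burst=2.0):
        self.by_completions = TokenBucket(ratio, burst)      # fraction of completions
        self.by_time = TokenBucket(rate_per_second, burst)   # cancellations per second

    def on_task_completed(self, count=1):
        self.by_completions.refill(count)

    def on_time_elapsed(self, seconds):
        self.by_time.refill(seconds)

    def try_cancel(self):
        if self.by_completions.tokens >= 1.0 and self.by_time.tokens >= 1.0:
            self.by_completions.tokens -= 1.0
            self.by_time.tokens -= 1.0
            return True
        return False
```

    With `ratio=0.1`, ten successful completions earn one cancellation token, so cancellations cannot outpace roughly a tenth of completed tasks once the initial burst is spent.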

    When a query is canceled, OpenSearch may return partial results if only some shards failed. If all shards failed, OpenSearch returns an error from the server.

    Search backpressure adds several settings to the standard OpenSearch cluster settings. These settings are dynamic, so you can change the default behavior of this feature without restarting your cluster.
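
    Because the settings are dynamic, they can be changed through the cluster settings API. The following sketch only builds the HTTP request and does not send it. The base URL and the value 200 are assumptions for illustration; the setting name is the one mentioned in the heap usage statistics on this page, and `build_settings_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_settings_request(base_url, settings):
    """Build a PUT _cluster/settings request that applies persistent settings."""
    body = json.dumps({"persistent": settings}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed local, unsecured cluster; adjust the URL and add authentication as needed.
request = build_settings_request(
    "http://localhost:9200",
    {"search_backpressure.search_shard_task.heap_moving_average_window_size": 200},
)
# Sending is left to the caller, for example: urllib.request.urlopen(request)
```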

    Introduced 2.4

    You can use the nodes stats API operation to monitor server-side request cancellations.

    Sample request

    To retrieve the statistics, use the following request:
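
    As a minimal sketch, the statistics can be fetched from a script by calling the nodes stats endpoint filtered to the `search_backpressure` metric. The base URL is an assumption (a local, unsecured cluster), and `stats_url` and `fetch_search_backpressure_stats` are hypothetical helper names.

```python
import json
import urllib.request


def stats_url(base_url):
    """Return the nodes stats URL limited to search backpressure statistics."""
    return f"{base_url.rstrip('/')}/_nodes/stats/search_backpressure"


def fetch_search_backpressure_stats(base_url, timeout=10):
    """Send the GET request and return the parsed JSON response as a dict."""
    with urllib.request.urlopen(stats_url(base_url), timeout=timeout) as response:
        return json.load(response)
```

    For example, `fetch_search_backpressure_stats("http://localhost:9200")` returns the response as a Python dictionary when a cluster is reachable at that address.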

    Sample response

    The response contains the following fields.

    Field Name | Data type | Description
    search_backpressure | Object | Statistics about search backpressure.
    search_backpressure.search_shard_task | Object | Statistics specific to the search shard task.
    search_backpressure.search_shard_task.resource_tracker_stats | Object | Statistics about the current tasks.
    search_backpressure.search_shard_task.cancellation_stats | Object | Statistics about the tasks canceled since the node last restarted.
    search_backpressure.mode | String | The mode for search backpressure.

    The resource_tracker_stats object contains the statistics for each resource tracker: elapsed_time_tracker, heap_usage_tracker, and cpu_usage_tracker.

    elapsed_time_tracker

    The elapsed_time_tracker object contains the following statistics related to the elapsed time.

    heap_usage_tracker

    The heap_usage_tracker object contains the following statistics related to the heap usage.

    Field Name | Data type | Description
    cancellation_count | Integer | The number of tasks canceled because of excessive heap usage since the node last restarted.
    current_max_bytes | Integer | The maximum heap usage for all tasks currently running on the node, in bytes.
    current_avg_bytes | Integer | The average heap usage for all tasks currently running on the node, in bytes.
    rolling_avg_bytes | Integer | The rolling average heap usage for the n most recent tasks, in bytes. n is configurable and defined by the search_backpressure.search_shard_task.heap_moving_average_window_size setting. The default value for this setting is 100.
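
    A rolling average over the n most recent tasks can be maintained with a fixed-size window, as in the following sketch. This illustrates the general technique rather than the OpenSearch implementation; the `MovingAverage` class is hypothetical, and only the default window size of 100 comes from the setting described above.

```python
from collections import deque


class MovingAverage:
    """Average of the most recent `window_size` recorded values."""

    def __init__(self, window_size=100):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window = deque(maxlen=window_size)
        self.total = 0

    def record(self, value):
        # When the window is full, the oldest value is about to be evicted.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value

    def average(self):
        return self.total / len(self.window) if self.window else 0.0
```

    Keeping a running total makes each update constant time, regardless of the window size.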

    cpu_usage_tracker

    The cancellation_stats object contains the following statistics for canceled tasks.

    Field Name | Data type | Description
    cancellation_count | Integer | The total number of tasks canceled since the node last restarted.
    cancellation_limit_reached_count | Integer | The number of times that the number of tasks eligible for cancellation exceeded the set cancellation threshold.
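
    The fields above can be pulled out of a parsed response with a small helper. This sketch assumes the response is keyed by node ID under a top-level `nodes` object, with the field paths listed in the tables on this page; the function name and any sample data passed to it are hypothetical.

```python
def summarize_cancellations(stats):
    """Map each node ID to its search shard task cancellation counters."""
    summary = {}
    for node_id, node in stats.get("nodes", {}).items():
        task_stats = node.get("search_backpressure", {}).get("search_shard_task", {})
        cancellation = task_stats.get("cancellation_stats", {})
        summary[node_id] = {
            "cancellation_count": cancellation.get("cancellation_count", 0),
            "cancellation_limit_reached_count": cancellation.get(
                "cancellation_limit_reached_count", 0
            ),
        }
    return summary
```

    Missing objects default to zero counts, so the helper also works for nodes that report no search backpressure statistics.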