Metric aggregations

Metric aggregations are of two types: single-value metric aggregations and multi-value metric aggregations.

Single-value metric aggregations return a single metric. Examples include sum, min, max, avg, cardinality, and value_count.

Multi-value metric aggregations return more than one metric. Examples include stats, extended_stats, matrix_stats, percentiles, percentile_ranks, geo_bounds, top_hits, and scripted_metric.

sum, min, max, avg

The sum, min, max, and avg metrics are single-value metric aggregations that return the sum, minimum, maximum, and average values of a field, respectively.

The following example response shows the total sum of the taxful_total_price field, as returned by a sum aggregation named sum_taxful_total_price:

Example response

  ...
    "aggregations" : {
      "sum_taxful_total_price" : {
        "value" : 350884.12890625
      }
    }
  }

In a similar fashion, you can find the minimum, maximum, and average values of a field.
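To make the semantics concrete, the following Python sketch computes what sum, min, max, and avg would return for one field over a few hypothetical documents. It is an illustration only, not OpenSearch code. The single_value_metrics helper and the documents are hypothetical, and the sketch assumes the default behavior in which documents that lack the field are ignored.

```python
# Hypothetical sample documents; one of them lacks the aggregated field.
docs = [
    {"taxful_total_price": 43.98},
    {"taxful_total_price": 10.99},
    {"customer_id": 24},  # no taxful_total_price, so it is skipped
    {"taxful_total_price": 32.99},
]


def single_value_metrics(docs, field):
    """Sketch of sum, min, max, and avg over one numeric field."""
    values = [doc[field] for doc in docs if field in doc]
    return {
        "sum": sum(values),
        "min": min(values) if values else None,
        "max": max(values) if values else None,
        "avg": sum(values) / len(values) if values else None,
    }


print(single_value_metrics(docs, "taxful_total_price"))
```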

cardinality

The cardinality metric is a single-value metric aggregation that counts the number of unique or distinct values of a field.

The following example finds the number of unique products in an eCommerce store:

  GET opensearch_dashboards_sample_data_ecommerce/_search
  {
    "size": 0,
    "aggs": {
      "unique_products": {
        "cardinality": {
          "field": "products.product_id"
        }
      }
    }
  }

Example response

  ...
    "aggregations" : {
      "unique_products" : {
        "value" : 7033
      }
    }
  }

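The cardinality count is approximate. If you have tens of thousands of products in your hypothetical store, an exact cardinality calculation requires loading all the values into a hash set and returning its size. This approach doesn’t scale well: it requires large amounts of memory and can cause high latencies.

The precision_threshold setting controls the trade-off between memory use and accuracy. Counts below this threshold are expected to be close to accurate, and counts above it can be less accurate. The default value is 3,000 and the maximum supported value is 40,000. The following example raises the threshold to 10,000: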

  GET opensearch_dashboards_sample_data_ecommerce/_search
  {
    "size": 0,
    "aggs": {
      "unique_products": {
        "cardinality": {
          "field": "products.product_id",
          "precision_threshold": 10000
        }
      }
    }
  }
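The cardinality aggregation is based on the HyperLogLog++ algorithm. For intuition, the following Python sketch implements plain HyperLogLog, a simpler relative without the HyperLogLog++ refinements. Each value is hashed, the first p bits of the hash select one of 2^p registers, and each register keeps the largest "position of the first 1 bit" seen in the rest of the hash. Memory stays fixed at 2^p small registers no matter how many values are added, and the price is an approximate count. The HyperLogLog class, its methods, and the sample values are hypothetical and are not OpenSearch code.

```python
import hashlib
import math


class HyperLogLog:
    """Plain HyperLogLog sketch (no HyperLogLog++ refinements)."""

    def __init__(self, p=12):
        self.p = p                        # number of index bits
        self.m = 1 << p                   # number of registers (4,096 for p=12)
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128

    def add(self, value):
        # Take 64 bits of a SHA-1 hash of the value's string form.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)              # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1 bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting on the empty registers.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


hll = HyperLogLog()
for i in range(20000):
    hll.add(f"product-{i % 5000}")  # 5,000 distinct values, each added 4 times
print(hll.count())  # an estimate close to 5000
```

Adding a value that is already present leaves the registers unchanged, so duplicates never inflate the count.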

value_count

The value_count metric is a single-value metric aggregation that counts the number of values that an aggregation is based on.

For example, you can use the value_count metric together with the avg metric to find how many values the aggregation uses to calculate an average. The following example counts the values of the taxful_total_price field:

  GET opensearch_dashboards_sample_data_ecommerce/_search
  {
    "size": 0,
    "aggs": {
      "number_of_values": {
        "value_count": {
          "field": "taxful_total_price"
        }
      }
    }
  }

Example response

  ...
    "aggregations" : {
      "number_of_values" : {
        "value" : 4675
      }
    }
  }
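The following Python sketch shows how value_count relates to avg: the average is the sum of the values divided by their count. It assumes that documents without the field contribute no values and that each element of a multi-valued (array) field counts as a separate value. The helper functions and documents are hypothetical. The last line checks the relationship against the sum and value_count reported earlier in this section for taxful_total_price.

```python
# Hypothetical documents: a single value, an array, and one missing the field.
docs = [
    {"taxful_total_price": 43.98},
    {"taxful_total_price": [10.99, 32.99]},
    {"customer_id": 24},
]


def field_values(doc, field):
    """Return the field's values as a list (empty when the field is missing)."""
    value = doc.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def value_count(docs, field):
    return sum(len(field_values(doc, field)) for doc in docs)


def average(docs, field):
    values = [v for doc in docs for v in field_values(doc, field)]
    count = value_count(docs, field)
    return sum(values) / count if count else None


print(value_count(docs, "taxful_total_price"))  # 3
print(average(docs, "taxful_total_price"))       # about 29.32

# sum / value_count, using the figures reported for taxful_total_price.
print(350884.12890625 / 4675)  # about 75.0554
```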

stats, extended_stats, matrix_stats

The stats metric is a multi-value metric aggregation that returns all the basic metrics (min, max, sum, avg, and value_count, which is reported as count) in a single aggregation query.

The following example returns the basic stats for the taxful_total_price field:

  GET opensearch_dashboards_sample_data_ecommerce/_search
  {
    "size": 0,
    "aggs": {
      "stats_taxful_total_price": {
        "stats": {
          "field": "taxful_total_price"
        }
      }
    }
  }
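Each basic stat can be combined from partial results. The overall count and sum are the totals of the partial counts and sums, and the overall min and max are the smallest and largest of the partial values. Because of this, one stats summary can be built from separate partial summaries, such as one per shard. The following Python sketch illustrates the idea. The Stats class and its methods are hypothetical and do not describe OpenSearch's internal implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Stats:
    """Hypothetical running summary mirroring the stats response fields."""
    count: int = 0
    min_value: float = math.inf
    max_value: float = -math.inf
    total: float = 0.0

    def add(self, value):
        self.count += 1
        self.min_value = min(self.min_value, value)
        self.max_value = max(self.max_value, value)
        self.total += value

    def merge(self, other):
        # Partial summaries combine without revisiting the underlying values.
        return Stats(
            self.count + other.count,
            min(self.min_value, other.min_value),
            max(self.max_value, other.max_value),
            self.total + other.total,
        )

    def to_response(self):
        avg = self.total / self.count if self.count else None
        return {"count": self.count, "min": self.min_value,
                "max": self.max_value, "avg": avg, "sum": self.total}


shard_a, shard_b = Stats(), Stats()
for v in (10.0, 20.0):
    shard_a.add(v)
shard_b.add(60.0)
print(shard_a.merge(shard_b).to_response())
# {'count': 3, 'min': 10.0, 'max': 60.0, 'avg': 30.0, 'sum': 90.0}
```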


The extended_stats aggregation is an extended version of the stats aggregation. Apart from including basic stats, extended_stats also returns stats such as sum_of_squares, variance, and std_deviation.

  GET opensearch_dashboards_sample_data_ecommerce/_search
  {
    "size": 0,
    "aggs": {
      "extended_stats_taxful_total_price": {
        "extended_stats": {
          "field": "taxful_total_price"
        }
      }
    }
  }

Example response

  ...
    "aggregations" : {
      "extended_stats_taxful_total_price" : {
        "count" : 4675,
        "min" : 6.98828125,
        "max" : 2250.0,
        "avg" : 75.05542864304813,
        "sum" : 350884.12890625,
        "sum_of_squares" : 3.9367749294174194E7,
        "variance" : 2787.59157113862,
        "variance_population" : 2787.59157113862,
        "variance_sampling" : 2788.187974983536,
        "std_deviation" : 52.79764740155209,
        "std_deviation_population" : 52.79764740155209,
        "std_deviation_sampling" : 52.80329511482722,
        "std_deviation_bounds" : {
          "upper" : 180.6507234461523,
          "lower" : -30.53986616005605,
          "upper_population" : 180.6507234461523,
          "lower_population" : -30.53986616005605,
          "upper_sampling" : 180.66201887270256,
          "lower_sampling" : -30.551161586606312
        }
      }
    }
  }
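The variance and standard deviation fields are related by standard formulas. The following Python sketch checks them against the figures in the response above. It derives the population variance from sum_of_squares, count, and avg. It then applies Bessel's correction, n / (n - 1), to get the sampling variance, and takes a square root to get the standard deviation. The last lines show the same population versus sample distinction on hypothetical values, using Python's statistics module.

```python
import math
import statistics

# Figures from the extended_stats response above.
count = 4675
avg = 75.05542864304813
sum_of_squares = 3.9367749294174194e7
variance_population = 2787.59157113862

# Population variance = mean of the squares minus the square of the mean.
print(sum_of_squares / count - avg ** 2)           # about 2787.5916
# Sampling variance rescales by n / (n - 1) (Bessel's correction).
print(variance_population * count / (count - 1))  # about 2788.1880
# Standard deviation is the square root of the variance.
print(math.sqrt(variance_population))              # about 52.7976

# The same distinction on hypothetical raw values.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(statistics.pvariance(values), statistics.variance(values))  # 4.0 and 32/7
```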

The std_deviation_bounds object returns an interval of plus or minus two standard deviations around the mean (avg). To use a different number of standard deviations, for example 3, set sigma to 3:

  GET opensearch_dashboards_sample_data_ecommerce/_search
  {
    "size": 0,
    "aggs": {
      "extended_stats_taxful_total_price": {
        "extended_stats": {
          "field": "taxful_total_price",
          "sigma": 3
        }
      }
    }
  }
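Each bound is the mean plus or minus sigma standard deviations. The following Python sketch reproduces the default (sigma = 2) bounds and computes the bounds for sigma = 3. It uses a hypothetical helper and the avg and std_deviation values from the extended_stats response.

```python
def std_deviation_bounds(avg, std_deviation, sigma=2.0):
    """Hypothetical helper: avg plus or minus sigma standard deviations."""
    return {
        "upper": avg + sigma * std_deviation,
        "lower": avg - sigma * std_deviation,
    }


avg = 75.05542864304813            # avg from the extended_stats response
std_deviation = 52.79764740155209  # population standard deviation

print(std_deviation_bounds(avg, std_deviation))           # sigma = 2 (default)
print(std_deviation_bounds(avg, std_deviation, sigma=3))  # sigma = 3
```

With sigma = 2, the lower bound is negative even though every price is positive. The bounds are calculated from the mean and standard deviation, so they are not observed minimum and maximum values.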

The matrix_stats aggregation generates advanced stats for multiple fields in a matrix form. The following example returns advanced stats in a matrix form for the taxful_total_price and products.base_price fields:

  GET opensearch_dashboards_sample_data_ecommerce/_search
  {
    "aggs": {
      "matrix_stats_taxful_total_price": {
        "matrix_stats": {
          "fields": ["taxful_total_price", "products.base_price"]
        }
      }
    }
  }

Example response

  ...
    "aggregations" : {
      "matrix_stats_taxful_total_price" : {
        "doc_count" : 4675,
        "fields" : [
          {
            "name" : "products.base_price",
            "count" : 4675,
            "mean" : 34.994239430147196,
            "variance" : 360.5035285833703,
            "skewness" : 5.530161335032702,
            "kurtosis" : 131.16306324042148,
            "covariance" : {
              "products.base_price" : 360.5035285833703,
              "taxful_total_price" : 846.6489362233166
            },
            "correlation" : {
              "products.base_price" : 1.0,
              "taxful_total_price" : 0.8444765264325268
            }
          },
          {
            "name" : "taxful_total_price",
            "count" : 4675,
            "mean" : 75.05542864304839,
            "variance" : 2788.1879749835402,
            "skewness" : 15.812149139924037,
            "kurtosis" : 619.1235507385902,
            "covariance" : {
              "products.base_price" : 846.6489362233166,
              "taxful_total_price" : 2788.1879749835402
            },
            "correlation" : {
              "products.base_price" : 0.8444765264325268,
              "taxful_total_price" : 1.0
            }
          }
        ]
      }
    }
  }

percentiles, percentile_ranks

A percentile is the value at or below which a given percentage of the data falls. For example, 95% of the values are at or below the 95th percentile.

The percentiles metric is a multi-value metric aggregation that lets you find outliers in your data or understand how your data is distributed.

The following example calculates the percentiles of the taxful_total_price field:

  GET opensearch_dashboards_sample_data_ecommerce/_search
  {
    "size": 0,
    "aggs": {
      "percentile_taxful_total_price": {
        "percentiles": {
          "field": "taxful_total_price"
        }
      }
    }
  }

Example response

  ...
    "aggregations" : {
      "percentile_taxful_total_price" : {
        "values" : {
          "1.0" : 21.984375,
          "5.0" : 27.984375,
          "25.0" : 44.96875,
          "50.0" : 64.22061688311689,
          "75.0" : 93.0,
          "95.0" : 156.0,
          "99.0" : 222.0
        }
      }
    }
  }
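For intuition about what the values in the response mean, the following Python sketch computes exact percentiles by interpolating linearly between the closest ranks. This technique was chosen for clarity. OpenSearch approximates percentiles (by default with the t-digest algorithm) instead of sorting every value, so its results can differ slightly. The percentile function and the sample values are hypothetical.

```python
import math


def percentile(values, p):
    """Exact p-th percentile using linear interpolation between closest ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    pos = (len(ordered) - 1) * p / 100.0  # fractional index into the sorted list
    lower, upper = math.floor(pos), math.ceil(pos)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


values = [10, 20, 30, 40, 50]  # hypothetical order totals
# The same percents that appear in the default response above.
print({p: percentile(values, p) for p in (1, 5, 25, 50, 75, 95, 99)})
```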

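The percentile_ranks metric works in the opposite direction. For each value that you specify, it returns the percentage of values that are at or below that value. For example, if a value is greater than or equal to 80% of the values, its percentile rank is 80.

In the following example response, a percentile_ranks aggregation on the taxful_total_price field returns the percentile ranks of the values 10 and 15:

Example response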

  ...
    "aggregations" : {
      "percentile_rank_taxful_total_price" : {
        "values" : {
          "10.0" : 0.055096056411283456,
          "15.0" : 0.0830092961834656
        }
      }
    }
  }
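The following Python sketch computes an exact percentile rank, which is the percentage of values at or below a threshold. The helper name and data are hypothetical. The values in the response above are percentages, so 0.055 means about 0.055 percent of orders, not 5.5 percent. OpenSearch computes them approximately, which is presumably why they do not correspond to a whole number of orders divided by the total.

```python
def percentile_rank(values, threshold):
    """Percentage of values that are at or below threshold."""
    if not values:
        raise ValueError("no values")
    at_or_below = sum(1 for v in values if v <= threshold)
    return 100.0 * at_or_below / len(values)


values = list(range(1, 11))  # hypothetical values 1 through 10
print(percentile_rank(values, 8))  # 80.0: eight of the ten values are <= 8
```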

geo_bounds

The geo_bounds metric is a multi-value metric aggregation that calculates the bounding box, in terms of latitude and longitude, around the values of a geo_point field.

The following example returns the geo_bounds metric for the geoip.location field:

  GET opensearch_dashboards_sample_data_ecommerce/_search
  {
    "size": 0,
    "aggs": {
      "geo": {
        "geo_bounds": {
          "field": "geoip.location"
        }
      }
    }
  }

Example response

  ...
    "aggregations" : {
      "geo" : {
        "bounds" : {
          "top_left" : {
            "lat" : 52.49999997206032,
            "lon" : -118.20000001229346
          },
          "bottom_right" : {
            "lat" : 4.599999985657632,
            "lon" : 55.299999956041574
          }
        }
      }
    }
  }
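In the simple case, the bounding box is formed by the extreme latitudes and longitudes of the points. The top-left corner takes the maximum latitude and minimum longitude, and the bottom-right corner takes the minimum latitude and maximum longitude. The following Python sketch uses a hypothetical helper and hypothetical points. Unlike OpenSearch, whose geo_bounds aggregation has a wrap_longitude option that lets a box cross the antimeridian, the sketch does not handle boxes that span the 180th meridian.

```python
def geo_bounds(points):
    """Naive bounding box of (lat, lon) points; no antimeridian wrapping."""
    if not points:
        raise ValueError("no points")
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        "top_left": {"lat": max(lats), "lon": min(lons)},
        "bottom_right": {"lat": min(lats), "lon": max(lons)},
    }


points = [(30.1, 31.3), (52.5, 13.4), (4.6, -74.1)]  # hypothetical (lat, lon)
print(geo_bounds(points))
```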

top_hits

The top_hits metric is a multi-value metric aggregation that ranks the matching documents by relevance score and returns the top-ranked documents for the aggregation.

You can specify the following options:

  • from: The offset of the first hit to return.
  • size: The maximum number of hits to return. The default value is 3.
  • sort: How the matching hits are sorted. By default, the hits are sorted by the relevance score of the aggregation query.

The following example returns the top five documents (orders) in the eCommerce sample data:

  GET opensearch_dashboards_sample_data_ecommerce/_search
  {
    "size": 0,
    "aggs": {
      "top_hits_products": {
        "top_hits": {
          "size": 5
        }
      }
    }
  }

Example response

  ...
    "aggregations" : {
      "top_hits_products" : {
        "hits" : {
          "total" : {
            "value" : 4675,
            "relation" : "eq"
          },
          "max_score" : 1.0,
          "hits" : [
            {
              "_index" : "opensearch_dashboards_sample_data_ecommerce",
              "_type" : "_doc",
              "_id" : "glMlwXcBQVLeQPrkHPtI",
              "_source" : {
                "category" : [
                  "Women's Accessories",
                  "Women's Clothing"
                ],
                "currency" : "EUR",
                "customer_first_name" : "rania",
                "customer_full_name" : "rania Evans",
                "customer_gender" : "FEMALE",
                "customer_id" : 24,
                "customer_last_name" : "Evans",
                "customer_phone" : "",
                "day_of_week" : "Sunday",
                "day_of_week_i" : 6,
                "email" : "rania@evans-family.zzz",
                "manufacturer" : [
                  "Tigress Enterprises"
                ],
                "order_date" : "2021-02-28T14:16:48+00:00",
                "order_id" : 583581,
                "products" : [
                  {
                    "base_price" : 10.99,
                    "discount_percentage" : 0,
                    "quantity" : 1,
                    "manufacturer" : "Tigress Enterprises",
                    "tax_amount" : 0,
                    "product_id" : 19024,
                    "category" : "Women's Accessories",
                    "sku" : "ZO0082400824",
                    "taxless_price" : 10.99,
                    "unit_discount_amount" : 0,
                    "min_price" : 5.17,
                    "_id" : "sold_product_583581_19024",
                    "discount_amount" : 0,
                    "created_on" : "2016-12-25T14:16:48+00:00",
                    "product_name" : "Snood - white/grey/peach",
                    "price" : 10.99,
                    "taxful_price" : 10.99,
                    "base_unit_price" : 10.99
                  },
                  {
                    "base_price" : 32.99,
                    "discount_percentage" : 0,
                    "quantity" : 1,
                    "manufacturer" : "Tigress Enterprises",
                    "tax_amount" : 0,
                    "product_id" : 19260,
                    "category" : "Women's Clothing",
                    "sku" : "ZO0071900719",
                    "taxless_price" : 32.99,
                    "unit_discount_amount" : 0,
                    "min_price" : 17.15,
                    "_id" : "sold_product_583581_19260",
                    "discount_amount" : 0,
                    "created_on" : "2016-12-25T14:16:48+00:00",
                    "product_name" : "Cardigan - grey",
                    "price" : 32.99,
                    "taxful_price" : 32.99,
                    "base_unit_price" : 32.99
                  }
                ],
                "sku" : [
                  "ZO0082400824",
                  "ZO0071900719"
                ],
                "taxful_total_price" : 43.98,
                "taxless_total_price" : 43.98,
                "total_quantity" : 2,
                "total_unique_products" : 2,
                "type" : "order",
                "user" : "rani",
                "geoip" : {
                  "country_iso_code" : "EG",
                  "location" : {
                    "lon" : 31.3,
                    "lat" : 30.1
                  },
                  "region_name" : "Cairo Governorate",
                  "continent_name" : "Africa",
                  "city_name" : "Cairo"
                },
                "event" : {
                  "dataset" : "sample_ecommerce"
                }
              }
              ...
            }
          ]
        }
      }
    }
  }
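In the example response above, max_score is 1.0 because the request has no query. Every document matches with the same score, so the five returned hits are effectively arbitrary. To rank by a field such as taxful_total_price, use the sort option. The following Python sketch shows how from, size, and sort interact. The top_hits and by_score functions and the orders are hypothetical and are not the OpenSearch API. The parameter is named from_ because from is a reserved word in Python.

```python
def by_score(doc):
    """Default ordering key: the document's relevance score."""
    return doc.get("_score", 0.0)


def top_hits(docs, size=3, from_=0, key=by_score, descending=True):
    """Hypothetical sketch: sort the hits, skip from_, then return up to size."""
    ranked = sorted(docs, key=key, reverse=descending)
    return ranked[from_:from_ + size]


orders = [
    {"order_id": 1, "_score": 1.0, "taxful_total_price": 43.98},
    {"order_id": 2, "_score": 1.0, "taxful_total_price": 10.99},
    {"order_id": 3, "_score": 1.0, "taxful_total_price": 2250.0},
    {"order_id": 4, "_score": 1.0, "taxful_total_price": 75.0},
]

# Equal scores: Python's stable sort keeps the input order.
print([d["order_id"] for d in top_hits(orders, size=2)])  # [1, 2]

# Rank by price in descending order instead of by score.
by_price = lambda d: d["taxful_total_price"]
print([d["order_id"] for d in top_hits(orders, size=2, key=by_price)])  # [3, 4]

# from_=1 skips the highest-priced order.
print([d["order_id"] for d in top_hits(orders, size=2, from_=1, key=by_price)])  # [4, 1]
```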

scripted_metric

The scripted_metric metric is a multi-value metric aggregation that returns metrics calculated from a specified script.

A scripted_metric aggregation runs its scripts in four stages: the initial stage, the map stage, the combine stage, and the reduce stage.

  • init_script: (OPTIONAL) Sets the initial state and executes before any documents are collected.
  • map_script: Executes once for each collected document and updates the state. In the following example, it checks the response code of each log entry and increments the matching counter.
  • combine_script: Executes once on each shard after document collection and combines that shard's state into a single value, which is returned to the coordinating node.
  • reduce_script: Executes once on the coordinating node. It has access to the states variable, an array that contains the combine_script result from each shard, and it returns the final result.

  GET opensearch_dashboards_sample_data_logs/_search
  {
    "size": 0,
    "aggregations": {
      "responses.counts": {
        "scripted_metric": {
          "init_script": "state.responses = ['error':0L,'success':0L,'other':0L]",
          "map_script": """
            def code = doc['response.keyword'].value;
            if (code.startsWith('5') || code.startsWith('4')) {
              state.responses.error += 1;
            } else if (code.startsWith('2')) {
              state.responses.success += 1;
            } else {
              state.responses.other += 1;
            }
          """,
          "combine_script": "state.responses",
          "reduce_script": """
            def counts = ['error': 0L, 'success': 0L, 'other': 0L];
            for (responses in states) {
              counts.error += responses['error'];
              counts.success += responses['success'];
              counts.other += responses['other'];
            }
            return counts;
          """
        }
      }
    }
  }

Example response

  ...
    "aggregations" : {
      "responses.counts" : {
        "value" : {
          "other" : 0,
          "success" : 12832,
          "error" : 1242
        }
      }
    }
  }
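To show how the four stages fit together, the following Python sketch mirrors the Painless scripts above with ordinary functions. For each hypothetical shard, it runs init once, runs map once per document, and runs combine once. It then runs reduce once over the combined per-shard states. The function names and shard data are hypothetical. The real scripts are Painless and run inside OpenSearch, not in Python.

```python
def init_state():
    # init_script: starting state for one shard.
    return {"error": 0, "success": 0, "other": 0}


def map_doc(state, code):
    # map_script: runs once for each document on the shard.
    if code.startswith("5") or code.startswith("4"):
        state["error"] += 1
    elif code.startswith("2"):
        state["success"] += 1
    else:
        state["other"] += 1


def combine(state):
    # combine_script: the shard's result sent to the coordinating node.
    return state


def reduce_states(states):
    # reduce_script: merges the per-shard results into the final value.
    counts = {"error": 0, "success": 0, "other": 0}
    for responses in states:
        for key in counts:
            counts[key] += responses[key]
    return counts


def scripted_metric(shards):
    states = []
    for shard_docs in shards:
        state = init_state()
        for code in shard_docs:
            map_doc(state, code)
        states.append(combine(state))
    return reduce_states(states)


# Hypothetical response codes split across two shards.
shards = [["200", "404", "200"], ["503", "200", "301"]]
print(scripted_metric(shards))  # {'error': 2, 'success': 3, 'other': 1}
```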