Ruby client

    To install the Ruby gem for the Ruby client, run the following command:

    gem install opensearch-ruby

    To use the client, import it as a module:

    require 'opensearch'

    Connecting to OpenSearch

    To connect to the default OpenSearch host, create a client object, passing the default host address in the constructor:

    client = OpenSearch::Client.new(host: 'http://localhost:9200')

    The following example creates a client object with a custom URL and the log option set to true. It sets the retry_on_failure parameter to retry a failed request five times rather than the default three times. Finally, it increases the timeout by setting the request_timeout parameter to 120 seconds. It then returns the basic cluster health information:

    client = OpenSearch::Client.new(
      url: "http://localhost:9200",
      retry_on_failure: 5,
      request_timeout: 120,
      log: true
    )

    client.cluster.health

    The output is as follows:

    2022-08-25 14:24:52 -0400: GET http://localhost:9200/ [status:200, request:0.048s, query:n/a]
    2022-08-25 14:24:52 -0400: < {
      "name" : "opensearch",
      "cluster_name" : "docker-cluster",
      "cluster_uuid" : "Aw0F5Pt9QF6XO9vXQHIs_w",
      "version" : {
        "distribution" : "opensearch",
        "number" : "2.2.0",
        "build_type" : "tar",
        "build_hash" : "b1017fa3b9a1c781d4f34ecee411e0cdf930a515",
        "build_date" : "2022-08-09T02:27:25.256769336Z",
        "build_snapshot" : false,
        "lucene_version" : "9.3.0",
        "minimum_wire_compatibility_version" : "7.10.0",
        "minimum_index_compatibility_version" : "7.0.0"
      },
      "tagline" : "The OpenSearch Project: https://opensearch.org/"
    }
    2022-08-25 14:24:52 -0400: GET http://localhost:9200/_cluster/health [status:200, request:0.018s, query:n/a]
    2022-08-25 14:24:52 -0400: < {"cluster_name":"docker-cluster","status":"yellow","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"discovered_master":true,"discovered_cluster_manager":true,"active_primary_shards":10,"active_shards":10,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":8,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":55.55555555555556}
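The health response above reports a yellow status with 8 unassigned shards. A yellow status means that all primary shards are assigned but some replicas are not. The following minimal sketch condenses such a response into a few fields. The health_summary helper is hypothetical and not part of the client, and the sketch assumes that the response behaves like a Ruby hash with string keys.

```ruby
require 'json'

# Hypothetical helper (not part of the client): condenses a cluster health
# response into the fields that matter for a quick check.
def health_summary(health)
  {
    status: health['status'],
    # Green and yellow clusters can serve requests; red means that at least
    # one primary shard is unassigned.
    usable: %w[green yellow].include?(health['status']),
    unassigned_shards: health['unassigned_shards']
  }
end

# Trimmed copy of the cluster health response shown above.
health = JSON.parse(
  '{"cluster_name":"docker-cluster","status":"yellow","number_of_nodes":1,"unassigned_shards":8}'
)

puts health_summary(health)

# With a live cluster (not executed here):
#   health_summary(client.cluster.health)
```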

    Creating an index

    You don’t need to create an index explicitly in OpenSearch. Once you upload a document into an index that does not exist, OpenSearch creates the index automatically. Alternatively, you can create an index explicitly to specify settings like the number of primary and replica shards. To create an index with non-default settings, create an index body hash with those settings:

    index_body = {
      'settings': {
        'index': {
          'number_of_shards': 1,
          'number_of_replicas': 2
        }
      }
    }

    client.indices.create(
      index: 'students',
      body: index_body
    )
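Creating an index that already exists returns an error, so a script that may run more than once can check for the index first. The following is a minimal sketch of that guard. The ensure_index helper is hypothetical, and the sketch assumes that the client exposes indices.exists? alongside indices.create.

```ruby
# Hypothetical helper (not part of the client): creates the index only when it
# does not exist yet. Returns true if the index was created and false if it
# was already there.
def ensure_index(client, name, body)
  return false if client.indices.exists?(index: name)

  client.indices.create(index: name, body: body)
  true
end

# With a live cluster (not executed here):
#   ensure_index(client, 'students', index_body)
```

The check and the create call are two separate requests, so this guard is not safe against two processes creating the same index at the same moment.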

    Mappings

    OpenSearch uses dynamic mapping to infer field types of the documents that are indexed. However, to have more control over the schema of your document, you can pass an explicit mapping to OpenSearch. You can define data types for some or all fields of your document in this mapping. To create a mapping for an index, use the put_mapping method:

    client.indices.put_mapping(
      index: 'students',
      body: {
        properties: {
          first_name: { type: 'keyword' },
          last_name: { type: 'keyword' }
        }
      }
    )

    By default, string fields are mapped as text, but in the mapping above, the first_name and last_name fields are mapped as keyword. This mapping signals to OpenSearch that these fields should not be analyzed and should support only full case-sensitive matches.

    You can verify the index’s mappings using the get_mapping method:

    response = client.indices.get_mapping(index: 'students')

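    If the mapping's dynamic parameter is set to strict, you can index a document with a missing field, but you cannot index a document with a new field. For example, with a strict mapping that also defines the gpa and grad_year fields, indexing the following document with a misspelled grad_yea field fails: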

    document = {
      first_name: 'Connor',
      last_name: 'James',
      gpa: 3.93,
      grad_yea: 2021
    }

    client.index(
      index: 'students',
      body: document,
      id: 100,
      refresh: true
    )

    OpenSearch returns a mapping error:

    {"error":{"root_cause":[{"type":"strict_dynamic_mapping_exception","reason":"mapping set to strict, dynamic introduction of [grad_yea] within [_doc] is not allowed"}],"type":"strict_dynamic_mapping_exception","reason":"mapping set to strict, dynamic introduction of [grad_yea] within [_doc] is not allowed"},"status":400}
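The error above occurs only when the index mapping is strict, which the earlier put_mapping example does not configure. The following sketch shows one mapping body that would produce this behavior, assuming that the dynamic mapping parameter is set to strict and that all four student fields are mapped. The gpa and grad_year field types are assumptions. The unmapped_fields helper is hypothetical and only imitates the server-side rule locally for top-level fields.

```ruby
# Assumed mapping body: with dynamic set to 'strict', OpenSearch rejects
# documents that contain fields not listed under properties.
strict_mapping = {
  dynamic: 'strict',
  properties: {
    first_name: { type: 'keyword' },
    last_name: { type: 'keyword' },
    gpa: { type: 'float' },         # assumed type
    grad_year: { type: 'integer' }  # assumed type
  }
}

# Hypothetical local check: lists top-level document fields that the mapping
# does not define. Missing fields are fine; extra fields are not.
def unmapped_fields(document, mapping)
  document.keys.map(&:to_sym) - mapping[:properties].keys
end

document = { first_name: 'Connor', last_name: 'James', gpa: 3.93, grad_yea: 2021 }
p unmapped_fields(document, strict_mapping) # prints [:grad_yea]

# With a live cluster (not executed here):
#   client.indices.put_mapping(index: 'students', body: strict_mapping)
```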

    Indexing one document

    To index one document, use the index method:

    document = {
      first_name: 'Connor',
      last_name: 'James',
      gpa: 3.93,
      grad_year: 2021
    }

    client.index(
      index: 'students',
      body: document,
      id: 100,
      refresh: true
    )

    Updating a document

    To update a document, use the update method:

    client.update(
      index: 'students',
      id: 100,
      body: { doc: { gpa: 3.25 } },
      refresh: true
    )

    Deleting a document

    To delete a document, use the delete method:

    client.delete(
      index: 'students',
      id: 100,
      refresh: true
    )

    Bulk operations

    You can perform several operations at the same time by using the bulk method. The operations may be of the same type or of different types.

    You can index multiple documents using the bulk method:

    actions = [
      { index: { _index: 'students', _id: '200' } },
      { first_name: 'James', last_name: 'Rodriguez', gpa: 3.91, grad_year: 2019 },
      { index: { _index: 'students', _id: '300' } },
      { first_name: 'Nikki', last_name: 'Wolf', gpa: 3.87, grad_year: 2020 }
    ]

    client.bulk(body: actions, refresh: true)
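The bulk body above alternates an action line with the document that the action applies to. When the documents come from existing data, the pairs can be generated instead of written by hand. The following is a minimal sketch of that idea. The bulk_index_actions helper is hypothetical, and the input is assumed to be a hash that maps document IDs to document sources.

```ruby
# Hypothetical helper (not part of the client): turns a hash of id => source
# into the alternating action/source entries used by the bulk example above.
def bulk_index_actions(index_name, records)
  records.flat_map do |id, source|
    [{ index: { _index: index_name, _id: id } }, source]
  end
end

students = {
  '200' => { first_name: 'James', last_name: 'Rodriguez', gpa: 3.91, grad_year: 2019 },
  '300' => { first_name: 'Nikki', last_name: 'Wolf', gpa: 3.87, grad_year: 2020 }
}

actions = bulk_index_actions('students', students)
puts actions.length # prints 4: one action entry plus one source entry per document

# With a live cluster (not executed here):
#   client.bulk(body: actions, refresh: true)
```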

    You can delete multiple documents as follows:

    # Deleting multiple documents.
    actions = [
      { delete: { _index: 'students', _id: 200 } },
      { delete: { _index: 'students', _id: 300 } }
    ]

    client.bulk(body: actions, refresh: true)
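A bulk request can succeed as a whole even though individual actions in it fail, so it is worth inspecting the response. The following minimal sketch collects the failed items. It assumes that the bulk response contains a top-level errors flag and an items array with one entry per action. The failed_bulk_items helper is hypothetical, and the response hash is a hand-written stand-in rather than real cluster output.

```ruby
# Hypothetical helper (not part of the client): returns the per-action results
# that carry an error object.
def failed_bulk_items(response)
  return [] unless response['errors']

  # Each item is a one-key hash such as { 'index' => { ...details... } }.
  response['items'].flat_map(&:values).select { |item| item.key?('error') }
end

# Hand-written stand-in for a bulk response (not real cluster output).
response = {
  'errors' => true,
  'items' => [
    { 'delete' => { '_index' => 'students', '_id' => '200', 'status' => 200 } },
    { 'index' => { '_index' => 'students', '_id' => '300', 'status' => 400,
                   'error' => { 'type' => 'strict_dynamic_mapping_exception' } } }
  ]
}

failed_bulk_items(response).each do |item|
  puts "#{item['_id']}: #{item['error']['type']}"
end

# With a live cluster (not executed here):
#   failed_bulk_items(client.bulk(body: actions, refresh: true))
```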

    You can perform different operations when using bulk as follows:
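The sample program at the end of this section combines index, update, and delete actions in one request. A shortened version of that request is sketched here. It uses the data key, as the sample program does, to carry each document inside its action entry instead of on a separate line. The op_types variable is added only as a local sanity check.

```ruby
index_name = 'students'

# Index two documents, update the first one, and delete the second one.
actions = [
  { index: { _index: index_name, _id: 100, data: { first_name: 'Paulo', last_name: 'Santos', gpa: 3.29, grad_year: 2022 } } },
  { index: { _index: index_name, _id: 200, data: { first_name: 'Shirley', last_name: 'Rodriguez', gpa: 3.92, grad_year: 2020 } } },
  { update: { _index: index_name, _id: 100, data: { doc: { gpa: 3.73 } } } },
  { delete: { _index: index_name, _id: 200 } }
]

# List the operation type of each action, in request order.
op_types = actions.map { |action| action.keys.first }
puts op_types.join(', ') # prints index, index, update, delete

# With a live cluster (not executed here):
#   client.bulk(body: actions, refresh: true)
```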

    Searching for a document

    To search for a document, use the search method. The following example searches for a student whose first or last name is “James.” It uses a multi_match query to search two fields (first_name and last_name) and boosts the relevance of the last_name field by using caret notation (last_name^2).

    q = 'James'
    query = {
      'size': 5,
      'query': {
        'multi_match': {
          'query': q,
          'fields': ['first_name', 'last_name^2']
        }
      }
    }

    response = client.search(
      body: query,
      index: 'students'
    )

    If you omit the request body in the search method, your query becomes a match_all query and returns all documents in the index:

    client.search(index: 'students')
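The matching documents are nested under hits.hits in the search response. The following minimal sketch reduces a response to the ID, score, and source of each hit. The summarize_hits helper is hypothetical, the response hash is a hand-written stand-in rather than real cluster output, and the sketch assumes that the response behaves like a hash with string keys.

```ruby
# Hypothetical helper (not part of the client): keeps only the ID, score, and
# source of each hit.
def summarize_hits(response)
  response['hits']['hits'].map do |hit|
    { id: hit['_id'], score: hit['_score'], source: hit['_source'] }
  end
end

# Hand-written stand-in for a search response (not real cluster output).
response = {
  'hits' => {
    'total' => { 'value' => 1, 'relation' => 'eq' },
    'hits' => [
      { '_index' => 'students', '_id' => '100', '_score' => 1.0,
        '_source' => { 'first_name' => 'Connor', 'last_name' => 'James' } }
    ]
  }
}

summarize_hits(response).each do |hit|
  puts "#{hit[:id]}: #{hit[:source]['last_name']}" # prints 100: James
end

# With a live cluster (not executed here):
#   summarize_hits(client.search(index: 'students'))
```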

    Boolean query

    The Ruby client exposes full OpenSearch query capability. In addition to simple searches that use the match query, you can create a more complex Boolean query to search for students who graduated in 2022 and sort them by last name. In the example below, search is limited to 10 documents.

    query = {
      'query': {
        'bool': {
          'filter': {
            'term': {
              'grad_year': 2022
            }
          }
        }
      },
      'sort': {
        'last_name': {
          'order': 'asc'
        }
      }
    }

    response = client.search(index: 'students', from: 0, size: 10, body: query)

    Multi search

    You can combine several queries into one request and perform a multi-search using the msearch method. The following code searches for students whose GPAs are outside the 3.1–3.9 range:

    actions = [
      {},
      { query: { range: { gpa: { gt: 3.9 } } } },
      {},
      { query: { range: { gpa: { lt: 3.1 } } } }
    ]

    response = client.msearch(index: 'students', body: actions)
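In the msearch body, each query is preceded by a header entry, and an empty header ({}) falls back to the index given in the request. The response holds one result per query, in request order, under the responses key. The following minimal sketch reads the hit count of each query. The hits_per_query helper is hypothetical, and the response hash is a hand-written stand-in rather than real cluster output.

```ruby
# Hypothetical helper (not part of the client): returns the total hit count of
# each query, or nil for a query that failed and therefore has no hits object.
def hits_per_query(msearch_response)
  msearch_response['responses'].map do |result|
    result.dig('hits', 'total', 'value')
  end
end

# Hand-written stand-in for a multi-search response (not real cluster output).
response = {
  'responses' => [
    { 'hits' => { 'total' => { 'value' => 3, 'relation' => 'eq' }, 'hits' => [] } },
    { 'error' => { 'type' => 'index_not_found_exception' }, 'status' => 404 }
  ]
}

p hits_per_query(response) # prints [3, nil]

# With a live cluster (not executed here):
#   hits_per_query(client.msearch(index: 'students', body: actions))
```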

    Scroll

    You can paginate your search results using the Scroll API:

    response = client.search(index: 'students', scroll: '2m', size: 2)

    while response['hits']['hits'].size.positive?
      scroll_id = response['_scroll_id']
      puts(response['hits']['hits'].map { |doc| "#{doc['_source']['first_name']} #{doc['_source']['last_name']}" })
      response = client.scroll(scroll: '1m', body: { scroll_id: scroll_id })
    end

    First, you issue a search query, specifying the scroll and size parameters. The scroll parameter tells OpenSearch how long to keep the search context. In this case, it is set to two minutes. The size parameter specifies how many documents you want to return in each request.

    The response to the initial search query contains a _scroll_id that you can use to get the next set of documents. To do this, you use the scroll method, again specifying the scroll parameter and passing the _scroll_id in the body. You don’t need to specify the query or index to the scroll method. The scroll method returns the next set of documents and the _scroll_id. It’s important to use the latest _scroll_id when requesting the next batch of documents because _scroll_id can change between requests.
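The scroll loop described above can be wrapped in a reusable method. The following is a minimal sketch under two assumptions: the each_scrolled_hit helper is hypothetical and not part of the client, and the client is assumed to provide a clear_scroll method that accepts the scroll ID in the body, which releases the search context instead of waiting for it to expire.

```ruby
# Hypothetical helper (not part of the client): yields every hit of a scrolled
# search and clears the search context afterward.
def each_scrolled_hit(client, index:, size: 2, keep_alive: '1m')
  response = client.search(index: index, scroll: keep_alive, size: size)
  scroll_id = response['_scroll_id']

  while response['hits']['hits'].any?
    response['hits']['hits'].each { |hit| yield hit }
    response = client.scroll(scroll: keep_alive, body: { scroll_id: scroll_id })
    # Always continue with the latest ID because it can change between requests.
    scroll_id = response['_scroll_id']
  end
ensure
  # Runs even if the block raises; skipped if the initial search failed.
  client.clear_scroll(body: { scroll_id: scroll_id }) if scroll_id
end

# With a live cluster (not executed here):
#   each_scrolled_hit(client, index: 'students') do |hit|
#     puts hit['_source']['last_name']
#   end
```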

    Deleting an index

    You can delete the index using the indices.delete method:

    response = client.indices.delete(index: 'students')

    Sample program

    The following is a complete sample program that illustrates all of the concepts described in the preceding sections. The Ruby client’s methods return responses as Ruby hashes, which are hard to read. To display JSON responses in a pretty format, the sample program uses the MultiJson.dump method.

    require 'opensearch'

    client = OpenSearch::Client.new(host: 'http://localhost:9200')

    # Create an index with non-default settings
    index_name = 'students'
    index_body = {
      'settings': {
        'index': {
          'number_of_shards': 1,
          'number_of_replicas': 2
        }
      }
    }

    client.indices.create(
      index: index_name,
      body: index_body
    )

    # Create a mapping
    client.indices.put_mapping(
      index: index_name,
      body: {
        properties: {
          first_name: { type: 'keyword' },
          last_name: { type: 'keyword' }
        }
      }
    )

    # Get mappings
    response = client.indices.get_mapping(index: index_name)
    puts 'Mappings for the students index:'
    puts MultiJson.dump(response, pretty: true)

    # Add one document to the index
    puts 'Adding one document:'
    document = {
      first_name: 'Connor',
      last_name: 'James',
      gpa: 3.93,
      grad_year: 2021
    }
    id = 100

    client.index(
      index: index_name,
      body: document,
      id: id,
      refresh: true
    )

    response = client.search(index: index_name)
    puts MultiJson.dump(response, pretty: true)

    # Update a document
    puts 'Updating a document:'
    client.update(index: index_name, id: id, body: { doc: { gpa: 3.25 } }, refresh: true)
    # Search again so that the response reflects the update
    response = client.search(index: index_name)
    puts MultiJson.dump(response, pretty: true)
    print 'The updated gpa is '
    puts response['hits']['hits'].map { |doc| doc['_source']['gpa'] }

    # Add many documents in bulk
    documents = [
      { index: { _index: index_name, _id: '200' } },
      { first_name: 'James', last_name: 'Rodriguez', gpa: 3.91, grad_year: 2019 },
      { index: { _index: index_name, _id: '300' } },
      { first_name: 'Nikki', last_name: 'Wolf', gpa: 3.87, grad_year: 2020 }
    ]
    client.bulk(body: documents, refresh: true)

    # Get all documents in the index
    response = client.search(index: index_name)
    puts 'All documents in the index after bulk upload:'
    puts MultiJson.dump(response, pretty: true)

    # Search for a document using a multi_match query
    puts 'Searching for documents that match "James":'
    q = 'James'
    query = {
      'size': 5,
      'query': {
        'multi_match': {
          'query': q,
          'fields': ['first_name', 'last_name^2']
        }
      }
    }

    response = client.search(
      body: query,
      index: index_name
    )
    puts MultiJson.dump(response, pretty: true)

    # Delete the document
    response = client.delete(
      index: index_name,
      id: id,
      refresh: true
    )

    response = client.search(index: index_name)
    puts 'Documents in the index after one document was deleted:'
    puts MultiJson.dump(response, pretty: true)

    # Delete multiple documents
    actions = [
      { delete: { _index: index_name, _id: 200 } },
      { delete: { _index: index_name, _id: 300 } }
    ]
    client.bulk(body: actions, refresh: true)

    response = client.search(index: index_name)
    puts 'Documents in the index after all documents were deleted:'
    puts MultiJson.dump(response, pretty: true)

    # Bulk several operations together
    actions = [
      { index: { _index: index_name, _id: 100, data: { first_name: 'Paulo', last_name: 'Santos', gpa: 3.29, grad_year: 2022 } } },
      { index: { _index: index_name, _id: 200, data: { first_name: 'Shirley', last_name: 'Rodriguez', gpa: 3.92, grad_year: 2020 } } },
      { index: { _index: index_name, _id: 300, data: { first_name: 'Akua', last_name: 'Mansa', gpa: 3.95, grad_year: 2022 } } },
      { index: { _index: index_name, _id: 400, data: { first_name: 'John', last_name: 'Stiles', gpa: 3.72, grad_year: 2019 } } },
      { index: { _index: index_name, _id: 500, data: { first_name: 'Li', last_name: 'Juan', gpa: 3.94, grad_year: 2022 } } },
      { index: { _index: index_name, _id: 600, data: { first_name: 'Richard', last_name: 'Roe', gpa: 3.04, grad_year: 2020 } } },
      { update: { _index: index_name, _id: 100, data: { doc: { gpa: 3.73 } } } },
      { delete: { _index: index_name, _id: 200 } }
    ]
    client.bulk(body: actions, refresh: true)

    # Page through all documents using the Scroll API
    puts 'All documents in the index after bulk operations with scrolling:'
    response = client.search(index: index_name, scroll: '2m', size: 2)

    while response['hits']['hits'].size.positive?
      scroll_id = response['_scroll_id']
      puts(response['hits']['hits'].map { |doc| "#{doc['_source']['first_name']} #{doc['_source']['last_name']}" })
      response = client.scroll(scroll: '1m', body: { scroll_id: scroll_id })
    end

    # Multi search
    actions = [
      {},
      { query: { range: { gpa: { gt: 3.9 } } } },
      {},
      { query: { range: { gpa: { lt: 3.1 } } } }
    ]
    response = client.msearch(index: index_name, body: actions)
    puts 'Multi search results:'
    puts MultiJson.dump(response, pretty: true)

    # Boolean query
    query = {
      'query': {
        'bool': {
          'filter': {
            'term': {
              'grad_year': 2022
            }
          }
        }
      },
      'sort': {
        'last_name': {
          'order': 'asc'
        }
      }
    }

    response = client.search(index: index_name, from: 0, size: 10, body: query)
    puts 'Boolean query search results:'
    puts MultiJson.dump(response, pretty: true)

    # Delete the index
    puts 'Deleting the index:'
    response = client.indices.delete(index: index_name)
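The sample program relies on MultiJson for pretty-printing. As an alternative, Ruby's standard library provides JSON.pretty_generate, which produces similar indented output. The following minimal sketch shows the call. The response hash is a hand-written stand-in rather than real cluster output, and the sketch assumes that the client response is a plain hash or can be converted to one.

```ruby
require 'json'

# Hand-written stand-in for a client response (not real cluster output).
response = {
  'acknowledged' => true,
  'shards_acknowledged' => true,
  'index' => 'students'
}

# Print the hash as indented, multi-line JSON.
puts JSON.pretty_generate(response)
```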

    Ruby AWS Sigv4 Client