Introduction to OpenSearch

    Unsurprisingly, people often use search engines like OpenSearch as the backend for a search application—think Wikipedia or an online store. It offers excellent performance and can scale up and down as the needs of the application grow or shrink.

    An equally popular, but less obvious use case is log analytics, in which you take the logs from an application, feed them into OpenSearch, and use the rich search and visualization functionality to identify issues. For example, a malfunctioning web server might throw a 500 error 0.5% of the time, which can be hard to notice unless you have a real-time graph of all HTTP status codes that the server has thrown in the past four hours. You can use OpenSearch Dashboards to build these sorts of visualizations from data in OpenSearch.

    OpenSearch's distributed design means that you interact with OpenSearch clusters. Each cluster is a collection of one or more nodes: servers that store your data and process search requests.

    You can run OpenSearch locally on a laptop—its system requirements are minimal—but you can also scale a single cluster to hundreds of powerful machines in a data center.
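
    Either way, a quick way to see how many nodes a cluster contains is its health endpoint, part of the REST API described later in this section. A minimal sketch, with placeholder host and port values:

        GET https://<host>:<port>/_cluster/health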

    OpenSearch organizes data into indices. Each index is a collection of JSON documents. If you have a set of raw encyclopedia articles or log lines that you want to add to OpenSearch, you must first convert them to JSON. A simple JSON document for a movie might look like this:
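
        {
          "title": "The Wind Rises",
          "release_date": "2013-07-20"
        }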

    When you add the document to an index, OpenSearch adds some metadata, such as the unique document ID:
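
        {
          "_index": "<index-name>",
          "_id": "<document-id>",
          "_version": 1,
          "_source": {
            "title": "The Wind Rises",
            "release_date": "2013-07-20"
          }
        }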

    Indices also contain mappings and settings:

    • A mapping is the collection of fields that documents in the index have. In this case, those fields are title and release_date (see the sketch after this list).

    • Settings include index metadata such as the index name, creation date, and number of shards.
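
    As a rough sketch, a mapping for the movie document above might look like the following. OpenSearch can also infer field types automatically when you index a document, and the inferred types may differ from this example:

        {
          "properties": {
            "title": { "type": "text" },
            "release_date": { "type": "date" }
          }
        }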

    OpenSearch splits indices into shards for even distribution across nodes in a cluster. For example, a 400 GB index might be too large for any single node in your cluster to handle, but when it is split into ten 40 GB shards, OpenSearch can distribute those shards across ten nodes and work with each shard individually.
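
    You choose the number of primary shards when you create an index. As a minimal sketch, assuming a hypothetical index named movies and placeholder host and port values, the following request creates the index with ten primary shards:

        PUT https://<host>:<port>/movies
        {
          "settings": {
            "index": {
              "number_of_shards": 10
            }
          }
        }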

    Despite being a piece of an OpenSearch index, each shard is actually a full Lucene index—confusing, we know. This detail is important, though, because each instance of Lucene is a running process that consumes CPU and memory. More shards is not necessarily better. Splitting a 400 GB index into 1,000 shards, for example, would place needless strain on your cluster. A good rule of thumb is to keep shard sizes between 10 GB and 50 GB.

    You interact with OpenSearch clusters using the REST API, which offers a lot of flexibility. You can use clients like curl or any programming language that can send HTTP requests. The examples that follow use placeholder values for the host, port, index name, and document ID. To add a JSON document to an OpenSearch index (i.e. index a document), you send an HTTP request:
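
        PUT https://<host>:<port>/<index-name>/_doc/<document-id>
        {
          "title": "The Wind Rises",
          "release_date": "2013-07-20"
        }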

    To run a search for the document:
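
        GET https://<host>:<port>/<index-name>/_search?q=wind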

    To delete the document:
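
        DELETE https://<host>:<port>/<index-name>/_doc/<document-id>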