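Imagine that we want to search for documents about "full-text search," but we want to give more weight to documents that also mention "Elasticsearch" or "Lucene." By more weight, we mean that documents mentioning "Elasticsearch" or "Lucene" will receive a higher relevance _score than those that don't, which means that they will appear higher in the list of results.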

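A simple bool query allows us to write this fairly complex logic as follows:

    GET /_search
    {
        "query": {
            "bool": {
                "must": {
                    "match": {
                        "content": { (1)
                            "query": "full text search",
                            "operator": "and"
                        }
                    }
                },
                "should": [ (2)
                    { "match": { "content": "Elasticsearch" }},
                    { "match": { "content": "Lucene" }}
                ]
            }
        }
    }

  1. The content field must contain all of the words full, text, and search.
  2. If the content field also contains "Elasticsearch" or "Lucene", the document will receive a higher _score.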
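The two rules above can be mimicked with a toy in-memory check. This is only a sketch of the boolean structure (required words versus optional words), not of how Lucene computes a _score: the sample documents, the `toy_bool_match` helper, and the simple lowercase tokenizer are all assumptions made for illustration.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_bool_match(content, must_words, should_words):
    """Return None if any required word is missing; otherwise return the
    number of optional words found (a stand-in for relevance, not a _score)."""
    found = tokens(content)
    if not all(word in found for word in must_words):
        return None  # fails the must clause, so the document is excluded
    return sum(1 for word in should_words if word in found)


# Hypothetical documents, invented for this example.
docs = [
    "Full text search with Lucene",
    "Elasticsearch builds full text search on Lucene",
    "Full text search basics",
    "Text search in Elasticsearch",  # lacks "full", so it cannot match
]
must = ["full", "text", "search"]
should = ["elasticsearch", "lucene"]

scored = [(toy_bool_match(doc, must, should), doc) for doc in docs]
ranked = sorted((pair for pair in scored if pair[0] is not None), reverse=True)

for matched, doc in ranked:
    print(matched, doc)
```

In this toy model, a document that misses any required word is dropped entirely, while the optional words only change the ordering of the documents that remain.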

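The more should clauses that match, the more relevant the document. So far, so good. But what if we want to give more weight to the documents that contain "Lucene" and even more weight to the documents containing "Elasticsearch"?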

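We can control the relative weight of any query clause by specifying a boost value, which defaults to 1. A boost value greater than 1 increases the relative weight of that clause. So we could rewrite the preceding query as follows: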

    GET /_search
    {
        "query": {
            "bool": {
                "must": {
                    "match": { (1)
                        "content": {
                            "query": "full text search",
                            "operator": "and"
                        }
                    }
                },
                "should": [
                    { "match": {
                        "content": {
                            "query": "Elasticsearch",
                            "boost": 3 (2)
                        }
                    }},
                    { "match": {
                        "content": {
                            "query": "Lucene",
                            "boost": 2 (3)
                        }
                    }}
                ]
            }
        }
    }

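  1. These clauses use the default boost of 1.
  2. This clause is the most important, as it has the highest boost.
  3. This clause is more important than the default, but not as important as the "Elasticsearch" clause.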
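When such a request is assembled in application code, the boost values can be kept in one place. The following is a minimal sketch that only builds the request body as a Python dictionary; the helper names `match_clause` and `boosted_bool_query` are hypothetical, and sending the body to `GET /_search` is left to whatever HTTP client you use.

```python
import json


def match_clause(field, text, boost=1, operator=None):
    """Build a match clause in expanded form; boost is omitted when it is
    the default of 1, and operator is omitted when not given."""
    body = {"query": text}
    if operator is not None:
        body["operator"] = operator
    if boost != 1:
        body["boost"] = boost
    return {"match": {field: body}}


def boosted_bool_query(field, required_text, optional_boosts):
    """Build a bool query: one required match clause (all words must appear)
    plus one optional should clause per term, each with its own boost."""
    return {
        "query": {
            "bool": {
                "must": match_clause(field, required_text, operator="and"),
                "should": [
                    match_clause(field, term, boost=boost)
                    for term, boost in optional_boosts.items()
                ],
            }
        }
    }


query = boosted_bool_query(
    "content",
    "full text search",
    {"Elasticsearch": 3, "Lucene": 2},
)
print(json.dumps(query, indent=2))
```

The printed JSON has the same structure as the request shown above, with the must clause left at the default boost.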

Note: The boost parameter is used to increase the relative weight of a clause (with a boost greater than 1) or decrease the relative weight (with a boost between 0 and 1), but the increase or decrease is not linear. In other words, a boost of 2 does not result in double the _score.

Instead, the new _score is normalized after the boost is applied. Each type of query has its own normalization algorithm, and the details are beyond the scope of this book. Suffice it to say that a higher boost value results in a higher _score.

If you are implementing your own scoring model not based on TF/IDF and you need more control over the boosting process, you can use the function_score query to manipulate a document's boost without the normalization step.

We present other ways of combining queries in the next chapter. But first, let's take a look at the other important feature of queries: text analysis.