Scoring In RediSearch

If you prefer a custom scoring function, it is possible to add more functions.

These are the pre-bundled scoring functions available in RediSearch and how they work. Each function is listed by its registered name, which can be passed as the SCORER argument in FT.SEARCH.

TFIDF (default)

Basic TF-IDF scoring with a few extra features:

For each term in each result, we calculate the TF-IDF score of that term for that document. Frequencies are weighted according to the predetermined field weights, and each term’s frequency is normalized by the highest term frequency in the document.
We multiply the total TF-IDF for the query term by the a priori document score given on FT.ADD.

So for N terms in a document D, T1...Tn, the resulting score could be described with this Python function:
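
A minimal sketch of the two steps described above, not RediSearch's actual implementation. The function name, its parameters, and the smoothed IDF form `log2(1 + total_docs / docs_with_term)` are assumptions made for illustration. Field weighting is assumed to be already folded into the frequencies passed in, and the slop penalty is left out.

```python
import math


def tfidf_score(query_terms, term_freqs, doc_score, total_docs, docs_with_term):
    """Sum TF-IDF over the query terms, then scale by the a priori document score.

    term_freqs: term -> field-weighted frequency in this document (assumed input)
    docs_with_term: term -> number of indexed documents containing it (assumed input)
    """
    # Normalize by the highest term frequency found in this document.
    max_freq = max(term_freqs.values(), default=0)
    if max_freq == 0:
        return 0.0

    score = 0.0
    for term in query_terms:
        matching_docs = docs_with_term.get(term, 0)
        if matching_docs == 0:
            continue  # term does not occur anywhere in the index
        tf = term_freqs.get(term, 0) / max_freq
        idf = math.log2(1 + total_docs / matching_docs)  # assumed IDF form
        score += tf * idf

    # Multiply the summed TF-IDF by the document score given at indexing time.
    return score * doc_score


# tf = 2/4 = 0.5, idf = log2(1 + 3/1) = 2.0, document score = 1.0
print(tfidf_score(["hello"], {"hello": 2, "world": 4}, 1.0, 3, {"hello": 1}))  # 1.0
```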

TFIDF.DOCNORM

Identical to the default TFIDF scorer, with one important distinction:

Term frequencies are normalized by the length of the document (in number of terms). The length is weighted, so that if a document contains two terms, one in a field that has a weight of 1 and one in a field with a weight of 5, the total frequency is 6, not 2.
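
The weighted length can be illustrated with a short sketch. The helper names and the field names `title` and `body` are hypothetical and chosen only for this example; the arithmetic follows the description above.

```python
def weighted_doc_length(field_term_counts, field_weights):
    """Document length in terms, with each term counted at its field's weight."""
    return sum(count * field_weights[field]
               for field, count in field_term_counts.items())


def docnorm_tf(weighted_term_freq, field_term_counts, field_weights):
    """Term frequency normalized by the weighted document length."""
    length = weighted_doc_length(field_term_counts, field_weights)
    return weighted_term_freq / length if length else 0.0


# One term in a field of weight 5 and one term in a field of weight 1.
counts = {"title": 1, "body": 1}
weights = {"title": 5, "body": 1}
print(weighted_doc_length(counts, weights))  # 6
```

Under these assumptions, a term that appears once in the weight-5 field has a normalized frequency of 5/6, while a term that appears once in the weight-1 field has 1/6.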

A variation on the basic TF-IDF scorer.

We also multiply the relevance score for each document by the a priori document score, and apply a penalty based on slop as in TFIDF.

DISMAX

It is not a one-to-one implementation of Solr’s DISMAX algorithm, but follows it in broad terms.

FT.SEARCH myIndex "foo" SCORER DISMAX

A scoring function that just returns the a priori score of the document without applying any calculations to it. Since document scores can be updated, this can be useful if you’d like to use an external score and nothing further.

HAMMING

Scoring by the (inverse) Hamming distance between the document’s payload and the query payload. Since we are interested in the nearest neighbors, we invert the Hamming distance (1/(1+d)) so that a distance of 0 gives a perfect score of 1 and the highest rank.

This works only if:

The document has a payload.
The query has a payload.
Both are exactly the same length.

Payloads are binary safe, and having payloads with a length that’s a multiple of 64 bits yields slightly faster results.
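
The scoring rule can be sketched in a few lines of Python. This is an illustration of the formula 1/(1+d) over the bits of two payloads, not the module's own code. The function name is hypothetical, and returning 0 when one of the three conditions is not met is an assumption.

```python
def hamming_score(doc_payload: bytes, query_payload: bytes) -> float:
    """Inverse Hamming distance, 1/(1+d), between two equal-length payloads."""
    # Assumption: a missing payload or a length mismatch yields a score of 0.
    if not doc_payload or not query_payload:
        return 0.0
    if len(doc_payload) != len(query_payload):
        return 0.0

    # Count the differing bits byte by byte.
    distance = sum(bin(a ^ b).count("1")
                   for a, b in zip(doc_payload, query_payload))
    return 1.0 / (1.0 + distance)


print(hamming_score(b"aaaabbbb", b"aaaabbbc"))  # 0.5  (distance 1)
print(hamming_score(b"aaaacccc", b"aaaabbbc"))  # 0.25 (distance 3)
```

The bytes `b` (0x62) and `c` (0x63) differ in a single bit, which is why each mismatched character in these payloads adds exactly 1 to the distance.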

127.0.0.1:6379> FT.CREATE idx SCHEMA foo TEXT
OK
127.0.0.1:6379> FT.ADD idx 1 1 PAYLOAD "aaaabbbb" FIELDS foo hello
OK
127.0.0.1:6379> FT.ADD idx 2 1 PAYLOAD "aaaacccc" FIELDS foo bar
OK
127.0.0.1:6379> FT.SEARCH idx "*" PAYLOAD "aaaabbbc" SCORER HAMMING WITHSCORES
1) (integer) 2
2) "1"
3) "0.5" // hamming distance of 1 --> 1/(1+1) == 0.5
4) 1) "foo"
   2) "hello"
5) "2"
6) "0.25" // hamming distance of 3 --> 1/(1+3) == 0.25
7) 1) "foo"
   2) "bar"
