Scoring in RediSearch

    If you prefer a custom scoring function, it is possible to add more functions using the Extension API.

    These are the pre-bundled scoring functions available in RediSearch and how they work. Each function is listed by its registered name, which can be passed as the SCORER argument to FT.SEARCH.

    TFIDF (default)

    Basic TF-IDF scoring with a few extra features:

    1. For each term in each result, we calculate the TF-IDF score of that term for that document. Frequencies are weighted by the predetermined field weights, and each term’s frequency is normalized by the highest term frequency in the document.

    2. We multiply the total TF-IDF for the query terms by the a priori document score given on FT.ADD.

    So for the N query terms T1...TN matched in document D, the resulting score is the sum of the per-term TF-IDF values, multiplied by the document score.
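As a rough illustration of steps 1 and 2, here is a minimal Python sketch. The function name `tfidf_score`, its parameters, and the `log2(1 + total_docs / docs_with_term)` form of the IDF are assumptions made for this example, not RediSearch's actual code. Any further adjustments the scorer applies are left out.

```python
import math


def tfidf_score(query_terms, term_freqs, doc_score, total_docs, docs_with_term):
    """Sketch of steps 1 and 2 only (hypothetical helper, not the real scorer).

    term_freqs:     term -> field-weighted frequency of the term in the document
    docs_with_term: term -> number of indexed documents containing the term
    """
    # Highest (weighted) term frequency in this document, used to normalize TF.
    max_freq = max(term_freqs.values(), default=0)
    if max_freq == 0:
        return 0.0

    score = 0.0
    for term in query_terms:
        # Step 1: term frequency normalized by the highest frequency in the doc.
        tf = term_freqs.get(term, 0) / max_freq

        # ASSUMPTION: one common IDF formulation; the source does not state it.
        n = docs_with_term.get(term, 0)
        idf = math.log2(1 + total_docs / n) if n else 0.0

        score += tf * idf

    # Step 2: multiply the summed TF-IDF by the a priori document score.
    return score * doc_score


if __name__ == "__main__":
    freqs = {"hello": 2, "world": 4}
    df = {"hello": 8, "world": 8}
    print(tfidf_score(["hello", "world"], freqs, 0.5, 8, df))
```

In this sketch a document score of 0 always yields a final score of 0, which matches the multiplication described in step 2.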

    TFIDF.DOCNORM

    Identical to the default TFIDF scorer, with one important distinction:

    Term frequencies are normalized by the length of the document (expressed as the total number of terms). The length is weighted, so that if a document contains two terms, one in a field with a weight of 1 and one in a field with a weight of 5, the total length is 6, not 2.
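The weighted length from the example above can be worked through in a few lines of Python. This is only a sketch: the field names `title` and `body` and both helper functions are hypothetical and chosen for illustration.

```python
def weighted_doc_length(term_counts_by_field, field_weights):
    """Sum of (number of terms in a field) * (weight of that field)."""
    return sum(
        field_weights[field] * count
        for field, count in term_counts_by_field.items()
    )


def docnorm_tf(weighted_term_freq, weighted_length):
    """Term frequency normalized by the weighted document length."""
    return weighted_term_freq / weighted_length if weighted_length else 0.0


if __name__ == "__main__":
    # One term in a field of weight 1, one term in a field of weight 5.
    weights = {"body": 1, "title": 5}
    counts = {"body": 1, "title": 1}

    length = weighted_doc_length(counts, weights)
    print(length)  # 6, not 2

    # A term that occurs once in "title" has a weighted frequency of 5.
    print(docnorm_tf(5, length))
```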

      A variation on the basic TF-IDF scorer.

      We also multiply the relevance score for each document by the a priori document score and apply a penalty based on slop, as in TFIDF.

      DISMAX

      It is not a one-to-one implementation of the original DISMAX algorithm but follows it in broad terms.

      FT.SEARCH myIndex "foo" SCORER DISMAX

      A scoring function that just returns the a priori score of the document without applying any calculations to it. Since document scores can be updated, this can be useful if you’d like to use an external score and nothing further.

      HAMMING

      Scoring by the (inverse) Hamming distance between the document’s payload and the query payload. Since we are interested in the nearest neighbors, we invert the Hamming distance d (1/(1+d)) so that a distance of 0 gives a perfect score of 1 and is the highest rank.

      This works only if:

      1. The document has a payload.
      2. The query has a payload.
      3. Both are exactly the same length.

      Payloads are binary-safe, and having payloads with a length that’s a multiple of 64 bits yields slightly faster results.

      127.0.0.1:6379> FT.CREATE idx SCHEMA foo TEXT
      OK
      127.0.0.1:6379> FT.ADD idx 1 1 PAYLOAD "aaaabbbb" FIELDS foo hello
      OK
      127.0.0.1:6379> FT.ADD idx 2 1 PAYLOAD "aaaacccc" FIELDS foo bar
      1) (integer) 2
      2) "1"
      3) "0.5" // hamming distance of 1 --> 1/(1+1) == 0.5
      4) 1) "foo"
         2) "hello"
      5) "2"
      6) "0.25" // hamming distance of 3 --> 1/(1+3) == 0.25
         2) "bar"
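The arithmetic in the comments above can be reproduced with a short Python sketch. Several things here are assumptions: the distance is counted bit by bit, the function names are made up for illustration, returning 0 when the three conditions are not met is a choice of this example, and the query payload `aaaabbbc` is a hypothetical value picked because it lies at distance 1 and 3 from the two document payloads.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("payloads must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def hamming_score(doc_payload: bytes, query_payload: bytes) -> float:
    """Inverse Hamming distance: 1 / (1 + d), so d == 0 scores 1.0."""
    # ASSUMPTION: score 0 when a payload is missing or the lengths differ.
    if not doc_payload or not query_payload:
        return 0.0
    if len(doc_payload) != len(query_payload):
        return 0.0
    return 1.0 / (1 + hamming_distance(doc_payload, query_payload))


if __name__ == "__main__":
    query = b"aaaabbbc"  # hypothetical query payload
    print(hamming_score(b"aaaabbbb", query))  # distance 1
    print(hamming_score(b"aaaacccc", query))  # distance 3
```

The characters `b` (0x62) and `c` (0x63) differ in exactly one bit, which is why each mismatched position adds 1 to the distance in this example.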