Chinese support in Redis Search

    Chinese support allows Chinese documents to be added and tokenized using segmentation rather than simple tokenization by whitespace and/or punctuation.

    Indexing a Chinese document is different from indexing a document in most other languages because of how tokens are extracted. While most languages delimit their tokens with whitespace and punctuation, written Chinese does not separate words this way, so a segmentation algorithm is needed to split the text into tokens.

    Redis Search uses the Friso Chinese tokenization library for this purpose. This is largely transparent to the user, and often no additional configuration is required.

    To use it, specify chinese as the language when creating the index and when searching, so that both the document text and the query are segmented the same way.
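    In pseudo-code, such a session might look like the following (a minimal redis-cli sketch; the index name, key, and sample text are illustrative, and RediSearch 2.x hash indexing is assumed):

    ```
    # Create an index whose default language is Chinese
    FT.CREATE idx ON HASH LANGUAGE chinese SCHEMA txt TEXT

    # Add a document containing Chinese text
    # ("Redis supports master-replica synchronization")
    HSET doc:1 txt "Redis支持主从同步"

    # Search, telling the engine to segment the query as Chinese as well
    FT.SEARCH idx "同步" LANGUAGE chinese HIGHLIGHT
    ```

    Passing LANGUAGE chinese on the query ensures the query string is segmented with the same tokenizer that was used at indexing time.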

    Using custom dictionaries

    If you wish to use a custom dictionary, you can do so at the module level when loading the module. The setting points to a configuration file that contains the relevant settings and the paths to the dictionary files.
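    As a sketch, this can be done via the FRISO_INI module argument; the paths below are illustrative and depend on where the module and Friso configuration live on your system:

    ```
    # redis.conf: load the search module with a custom Friso configuration.
    # friso.ini in turn lists the dictionary files to load.
    loadmodule /path/to/redisearch.so FRISO_INI /opt/friso/friso.ini
    ```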