Chinese support in Redis Search

    Chinese support allows Chinese documents to be added and tokenized using segmentation rather than simple tokenization by whitespace and/or punctuation.

    Indexing a Chinese document is different from indexing a document in most other languages because of how tokens are extracted. While most languages delimit their tokens with whitespace and punctuation, written Chinese does not separate words this way, so a segmentation algorithm is needed to split the text into tokens.

    Redis Search uses the Friso Chinese tokenization library for this purpose. This is largely transparent to the user, and often no additional configuration is required.

    To use it, specify chinese as the language when creating the index and when searching, so that both the document text and the query are segmented the same way.
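    In pseudo-code, such a session might look like the following (a minimal redis-cli sketch; the index name, key, and sample text are illustrative, and RediSearch 2.x hash indexing is assumed):

    ```
    # Create an index whose default language is Chinese
    FT.CREATE idx ON HASH LANGUAGE chinese SCHEMA txt TEXT

    # Add a document containing Chinese text
    # ("Redis supports master-replica synchronization")
    HSET doc:1 txt "Redis支持主从同步"

    # Search, telling the engine to segment the query as Chinese as well
    FT.SEARCH idx "同步" LANGUAGE chinese HIGHLIGHT
    ```

    Passing LANGUAGE chinese on the query ensures the query string is segmented with the same tokenizer that was used at indexing time.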

    Using custom dictionaries

    If you wish to use a custom dictionary, you can do so at the module level when loading the module. The setting points to a configuration file that contains the relevant settings and the paths to the dictionary files.
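    As a sketch, this can be done via the FRISO_INI module argument; the paths below are illustrative and depend on where the module and Friso configuration live on your system:

    ```
    # redis.conf: load the search module with a custom Friso configuration.
    # friso.ini in turn lists the dictionary files to load.
    loadmodule /path/to/redisearch.so FRISO_INI /opt/friso/friso.ini
    ```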