Typo tolerance

Example

On a movie dataset, let’s search for .

We use a prefix Levenshtein algorithm (opens new window) to check if the words match. The only difference with a Levenshtein algorithm is that it accepts every word that starts with the query words too. Therefore, words are accepted if they start with or have equal length.

substitution of a character of M by a character other than P. (e.g. kitten → sitten)
deletion of a character from M. (e.g. saturday → satuday)

There are some rules about what can be considered “similar”. These rules are by word and not for the whole query string.

If the query word is between 1 and 4 characters long, therefore, no typo is allowed. Only documents that contain words that start with or are of equal length with this query word are considered valid for this request.
If the query word is between 5 and 8 characters long, one typo is allowed. Documents that contain words that match with one typo are retained for the next steps.

“saturday” is accepted because it is the same word.
“sat” is not accepted because the query word is not a prefix of it (it is the opposite).
“satuday” is accepted because it contains one typo.