Tag Fields
The main differences between tag fields and full-text fields are:
We do not perform stemming on tag indexes.
Tokenization is simpler: the user can specify a separator (a comma by default) for multiple tags, and we only do whitespace trimming at the end of tags. Thus, tags can contain spaces, punctuation marks, accents, etc. The only two transformations we perform are lower-casing (for Latin languages only, as of now) and whitespace trimming.
Tags cannot be found from a general full-text search. If a document has a field called “tags” with the values “foo” and “bar”, searching for foo or bar without a special tag modifier (see below) will not return this document.
The index is much simpler and more compressed: we do not store frequencies, offset vectors, or field flags. The index contains only document IDs encoded as deltas. This means that an entry in a tag index is usually one or two bytes long, which makes tag indexes very memory-efficient and fast.
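The tokenization and delta-encoding rules above can be sketched in Python. This is an illustration only, not the engine's implementation: the function names are hypothetical, trimming both ends of a tag and dropping empty entries are assumptions, and Python's `str.lower()` lower-cases all scripts rather than Latin text only.

```python
from __future__ import annotations


def tokenize_tags(value: str, separator: str = ",") -> list[str]:
    """Split a raw field value into tags, trimming whitespace and lower-casing."""
    tags = []
    for raw in value.split(separator):
        # No stemming: spaces, punctuation, and accents inside a tag are kept.
        tag = raw.strip().lower()
        if tag:  # assumption: empty entries (e.g. a trailing separator) are dropped
            tags.append(tag)
    return tags


def delta_encode(doc_ids: list[int]) -> list[int]:
    """Store sorted document IDs as the gaps between consecutive IDs."""
    deltas, previous = [], 0
    for doc_id in sorted(doc_ids):
        deltas.append(doc_id - previous)
        previous = doc_id
    return deltas


def delta_decode(deltas: list[int]) -> list[int]:
    """Rebuild the sorted document IDs from their gaps."""
    doc_ids, current = [], 0
    for delta in deltas:
        current += delta
        doc_ids.append(current)
    return doc_ids
```

Because consecutive document IDs tend to be close together, most gaps are small numbers, which is why an entry can usually fit in one or two bytes once the gaps are written with a variable-length integer encoding.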
Tag fields can be added to the schema in FT.CREATE with the following syntax:
FT.CREATE <index> SCHEMA <field_name> TAG [SEPARATOR <sep>]
SEPARATOR defaults to a comma (,), and can be any printable ASCII character. For example:
FT.CREATE idx SCHEMA tags TAG SEPARATOR ";"
As mentioned above, just searching for a tag without any modifiers will not retrieve documents containing it.
The syntax for matching tags in a query is as follows (the curly braces are part of the syntax in this case):
@<field_name>:{ <tag> | <tag> | ...}
e.g.
@tags:{ foo | bar }
Notice that multiple tags in the same clause create a union of the documents containing any of those tags. To create an intersection of documents containing all tags, repeat the tag filter once for each tag.
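The union and intersection semantics can be sketched with Python sets. This is a toy model only: the in-memory `tag_index` and the function names `match_clause` and `match_query` are hypothetical and say nothing about how the engine evaluates queries internally.

```python
# Hypothetical in-memory tag index: tag -> set of document IDs.
tag_index = {
    "foo": {1, 2},
    "bar": {2, 3},
    "baz": {2, 4},
}


def match_clause(index, tags):
    """One @field:{a | b | ...} clause: union of the documents for each tag."""
    result = set()
    for tag in tags:
        result |= index.get(tag, set())
    return result


def match_query(index, clauses):
    """Several tag clauses in one query: intersection of the per-clause results."""
    results = [match_clause(index, tags) for tags in clauses]
    return set.intersection(*results) if results else set()


# A single clause with two tags matches documents holding either tag.
either = match_clause(tag_index, ["foo", "bar"])
# Three repeated clauses match only documents holding all three tags.
all_three = match_query(tag_index, [["foo"], ["bar"], ["baz"]])
```

In this model `either` contains documents 1, 2, and 3, while `all_three` contains only document 2.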
For example, imagine an index of travellers, with a tag field for the cities each traveller has visited:
FT.CREATE myIndex SCHEMA cities TAG
For this index, the following query will return all the people who visited at least one of the following cities:
FT.SEARCH myIndex "@cities:{ New York | Los Angeles | Barcelona }"
But the next query will return only the people who have visited all three cities:
FT.SEARCH myIndex "@cities:{ New York } @cities:{ Los Angeles } @cities:{ Barcelona }"
Tags can be composed of multiple words, or include punctuation marks other than the field’s separator (a comma by default). Punctuation marks in tags should be escaped with a backslash (\).
It is also recommended (but not mandatory) to escape spaces, because a multi-word tag that includes stopwords will cause a syntax error. So tags like “to be or not to be” should be escaped as “to\ be\ or\ not\ to\ be”. For good measure, you can escape all spaces within tags.
@tags:{foo\ bar\ baz | hello\ world}
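Escaping by hand is easy to get wrong, so a client can build the clause programmatically. The sketch below is one possible approach, not part of any client library: `escape_tag` and `tag_clause` are hypothetical helpers, and escaping every character that is not a letter, digit, or underscore is a deliberately conservative assumption.

```python
import re


def escape_tag(tag: str) -> str:
    """Backslash-escape spaces and punctuation in a tag for use in a query."""
    # Assumption: any character that is not a letter, digit, or underscore
    # is escaped; accented letters count as letters and are left alone.
    return re.sub(r"([^\w])", r"\\\1", tag)


def tag_clause(field: str, tags: list) -> str:
    """Build an @field:{tag | tag | ...} clause from unescaped tag values."""
    return "@%s:{%s}" % (field, " | ".join(escape_tag(tag) for tag in tags))


# Reproduces the escaped query shown above from plain tag values.
clause = tag_clause("tags", ["foo bar baz", "hello world"])
```

Escaping every space, as recommended above, means the helper does not need to know which words the index treats as stopwords.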