Text search parsers are responsible for splitting raw document text into tokens and identifying each token’s type, where the set of possible types is defined by the parser itself. Note that a parser does not modify the text at all — it simply identifies plausible word boundaries. Because of this limited scope, there is less need for application-specific custom parsers than there is for custom dictionaries. At present Greenplum Database provides just one built-in parser, which has been found to be useful for a wide range of applications.
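
To see the split into typed tokens directly, the parser can be called on a short string. This is a minimal sketch that assumes the built-in parser is registered under the name `default`, as it is in PostgreSQL, and that the PostgreSQL function `ts_parse` is available in your Greenplum release. The sample sentence is made up for illustration.

```sql
-- Assumption: the built-in parser is named 'default'.
-- ts_parse returns one row per token: a numeric token type id and the raw token text.
SELECT tokid, token
FROM ts_parse('default', 'The quick brown fox');
```

Each token should come back exactly as it appears in the input, which reflects the point that the parser only marks boundaries and assigns types.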

The built-in parser recognizes 23 token types, shown in the following table.
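
The token types can also be listed from the database itself. The query below is a sketch that assumes the parser name `default` and the PostgreSQL function `ts_token_type`; neither is stated in the text above.

```sql
-- Assumption: the built-in parser is named 'default'.
-- ts_token_type returns one row per token type the parser can emit.
SELECT tokid, alias, description
FROM ts_token_type('default')
ORDER BY tokid;
```

The `alias` column holds the short name that text search configurations use to map a token type to dictionaries.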

The parser does not support all valid email characters as defined by RFC 5322. Specifically, the only non-alphanumeric characters it accepts in email user names are period, dash, and underscore.
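
One way to check how a particular address is handled is to parse it and look up the type of each token. This is a sketch only: the parser name `default` and both addresses are assumptions chosen for illustration.

```sql
-- Assumption: the built-in parser is named 'default'.
-- Join each parsed token to its token type alias.
SELECT t.alias, p.token
FROM ts_parse('default', 'jane.doe@example.com jane+doe@example.com') AS p
JOIN ts_token_type('default') AS t ON t.tokid = p.tokid;
```

The first address uses only a period in the user name, so it is expected to be returned as a single token. The second contains a plus sign, which is outside the supported set, so it is expected to be split into several tokens.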

This behavior is desirable because it allows searches to match both the whole compound word and its components. Here is another instructive example:

    alias | description | token
    ------+-------------+-------------
    host  | Host        | example.com
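
Output with these three columns can be produced with the PostgreSQL function `ts_debug`. The query below is a hedged sketch: the `english` configuration and the input string are assumptions, since the original query is not shown here.

```sql
-- Assumptions: the 'english' text search configuration exists,
-- and the input string is only a stand-in for the original example.
SELECT alias, description, token
FROM ts_debug('english', 'example.com/stuff/index.html');
```

A URL-like string may be reported as several overlapping tokens, so the same text can be matched as a whole or by its parts.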
