12.8. Testing and Debugging Text Search

    The ts_debug function allows easy testing of a text search configuration.

    ts_debug displays information about every token of document as produced by the parser and processed by the configured dictionaries. It uses the configuration specified by config, or default_text_search_config if that argument is omitted.
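    For reference, a sketch of the function's signature (as documented for PostgreSQL; there is also a one-argument form that omits config):

    ts_debug(config regconfig, document text,
             OUT alias text,
             OUT description text,
             OUT token text,
             OUT dictionaries regdictionary[],
             OUT dictionary regdictionary,
             OUT lexemes text[])
         returns setof record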

    ts_debugreturns one row for each token identified in the text by the parser. The columns returned are

    • alias text — short name of the token type
    • description text — description of the token type
    • token text — text of the token
    • dictionaries regdictionary[] — the dictionaries selected by the configuration for this token type
    • dictionary regdictionary — the dictionary that recognized the token, or NULL if none did
    • lexemes text[] — the lexeme(s) produced by the dictionary that recognized the token, or NULL if none did; an empty array ({}) means it was recognized as a stop word

    Here is a simple example:

    SELECT * FROM ts_debug('english', 'a fat cat sat on a mat - it ate a fat rats');
       alias   |   description   | token |  dictionaries  |  dictionary  | lexemes
    -----------+-----------------+-------+----------------+--------------+---------
     asciiword | Word, all ASCII | a     | {english_stem} | english_stem | {}
     blank     | Space symbols   |       | {}             |              |
     asciiword | Word, all ASCII | fat   | {english_stem} | english_stem | {fat}
     blank     | Space symbols   |       | {}             |              |
     asciiword | Word, all ASCII | cat   | {english_stem} | english_stem | {cat}
     blank     | Space symbols   |       | {}             |              |
     asciiword | Word, all ASCII | sat   | {english_stem} | english_stem | {sat}
     blank     | Space symbols   |       | {}             |              |
     asciiword | Word, all ASCII | on    | {english_stem} | english_stem | {}
     blank     | Space symbols   |       | {}             |              |
     asciiword | Word, all ASCII | a     | {english_stem} | english_stem | {}
     blank     | Space symbols   |       | {}             |              |
     asciiword | Word, all ASCII | mat   | {english_stem} | english_stem | {mat}
     blank     | Space symbols   |       | {}             |              |
     blank     | Space symbols   | -     | {}             |              |
     asciiword | Word, all ASCII | it    | {english_stem} | english_stem | {}
     blank     | Space symbols   |       | {}             |              |
     asciiword | Word, all ASCII | ate   | {english_stem} | english_stem | {ate}
     blank     | Space symbols   |       | {}             |              |
     asciiword | Word, all ASCII | a     | {english_stem} | english_stem | {}
     blank     | Space symbols   |       | {}             |              |
     asciiword | Word, all ASCII | fat   | {english_stem} | english_stem | {fat}
     blank     | Space symbols   |       | {}             |              |
     asciiword | Word, all ASCII | rats  | {english_stem} | english_stem | {rat}
    Here is a more extensive example:

    CREATE TEXT SEARCH CONFIGURATION public.english ( COPY = pg_catalog.english );

    CREATE TEXT SEARCH DICTIONARY english_ispell (
        TEMPLATE = ispell,
        DictFile = english,
        AffFile = english,
        StopWords = english
    );

    ALTER TEXT SEARCH CONFIGURATION public.english
       ALTER MAPPING FOR asciiword WITH english_ispell, english_stem;
    SELECT * FROM ts_debug('public.english', 'The Brightest supernovaes');
       alias   |   description   |    token    |         dictionaries          |   dictionary   |   lexemes
    -----------+-----------------+-------------+-------------------------------+----------------+-------------
     asciiword | Word, all ASCII | The         | {english_ispell,english_stem} | english_ispell | {}
     blank     | Space symbols   |             | {}                            |                |
     asciiword | Word, all ASCII | Brightest   | {english_ispell,english_stem} | english_ispell | {bright}
     blank     | Space symbols   |             | {}                            |                |
     asciiword | Word, all ASCII | supernovaes | {english_ispell,english_stem} | english_stem   | {supernova}

    In this example, the word Brightest was recognized by the parser as an ASCII word (alias asciiword). For this token type the dictionary list is english_ispell and english_stem. The word was recognized by english_ispell, which reduced it to the noun bright. The word supernovaes is unknown to the english_ispell dictionary so it was passed to the next dictionary, and, fortunately, was recognized (in fact, english_stem is a Snowball dictionary which recognizes everything; that is why it was placed at the end of the dictionary list).

    The word The was recognized by the english_ispell dictionary as a stop word (Section 12.6.1) and will not be indexed. The spaces are discarded too, since the configuration provides no dictionaries at all for them.

    You can reduce the width of the output by explicitly specifying which columns you want to see:
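    A sketch of such a query, assuming the public.english configuration built above (the column names are those listed earlier):

    SELECT alias, token, dictionary, lexemes
    FROM ts_debug('public.english', 'The Brightest supernovaes');
       alias   |    token    |   dictionary   |   lexemes
    -----------+-------------+----------------+-------------
     asciiword | The         | english_ispell | {}
     blank     |             |                |
     asciiword | Brightest   | english_ispell | {bright}
     blank     |             |                |
     asciiword | supernovaes | english_stem   | {supernova}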

    The following functions allow direct testing of a text search parser.

    ts_parse(parser_name text, document text,
             OUT tokid integer, OUT token text) returns setof record
    ts_parse(parser_oid oid, document text,
             OUT tokid integer, OUT token text) returns setof record
    ts_parse parses the given document and returns a series of records, one for each token produced by parsing. Each record includes a tokid giving the assigned token type and the token text. For example:

    SELECT * FROM ts_parse('default', '123 - a number');
     tokid | token
    -------+--------
        22 | 123
        12 |
        12 | -
         1 | a
        12 |
         1 | number
    ts_token_type(parser_name text, OUT tokid integer,
                  OUT alias text, OUT description text) returns setof record
    ts_token_type(parser_oid oid, OUT tokid integer,
                  OUT alias text, OUT description text) returns setof record

    ts_token_type returns a table which describes each type of token the specified parser can recognize. For each token type, the table gives the integer tokid that the parser uses to label a token of that type, the alias that names the token type in configuration commands, and a short description. For example:
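    The following is a sketch with the default parser; only the first few rows of the full output are shown:

    SELECT * FROM ts_token_type('default');
     tokid |   alias   |       description
    -------+-----------+--------------------------
         1 | asciiword | Word, all ASCII
         2 | word      | Word, all letters
         3 | numword   | Word, letters and digits
    ...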

    The ts_lexize function facilitates dictionary testing.

    ts_lexize(dict regdictionary, token text) returns text[]

    ts_lexize returns an array of lexemes if the input token is known to the dictionary, or an empty array if the token is known to the dictionary but it is a stop word, or NULL if it is an unknown word.

    Examples:

    SELECT ts_lexize('english_stem', 'stars');
     ts_lexize
    -----------
     {star}

    SELECT ts_lexize('english_stem', 'a');
     ts_lexize
    -----------
     {}
    SELECT ts_lexize('thesaurus_astro', 'supernovae stars') is null;
     ?column?
    ----------
     t

    The thesaurus dictionary thesaurus_astro does know the phrase supernovae stars, but ts_lexize fails since it does not parse the input text but treats it as a single token. Use plainto_tsquery or to_tsvector to test thesaurus dictionaries, for example:
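    A sketch, assuming a thesaurus_astro dictionary that maps the phrase supernovae stars to the single lexeme sn, as in the thesaurus example earlier in this chapter (the actual output depends on how the thesaurus is configured):

    SELECT plainto_tsquery('supernovae stars');
     plainto_tsquery
    -----------------
     'sn'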