12.8. Testing and Debugging
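The function ts_debug allows easy testing of a text search configuration.
ts_debug([ config regconfig, ] document text,
         OUT alias text,
         OUT description text,
         OUT token text,
         OUT dictionaries regdictionary[],
         OUT dictionary regdictionary,
         OUT lexemes text[])
         returns setof record
ts_debug displays information about every token of document as produced by the parser and processed by the configured dictionaries. It uses the configuration specified by config, or default_text_search_config if that argument is omitted.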
ts_debug returns one row for each token identified in the text by the parser. The columns returned are:
alias text: short name of the token type
description text: description of the token type
token text: text of the token
dictionaries regdictionary[]: the dictionaries selected by the configuration for this token type
dictionary regdictionary: the dictionary that recognized the token, or NULL if none did
lexemes text[]: the lexeme(s) produced by the dictionary that recognized the token, or NULL if none did; an empty array ({}) means it was recognized as a stop word
Here is a simple example:
SELECT * FROM ts_debug('english','a fat cat sat on a mat - it ate a fat rats');
alias | description | token | dictionaries | dictionary | lexemes
-----------+-----------------+-------+----------------+--------------+---------
asciiword | Word, all ASCII | a | {english_stem} | english_stem | {}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | fat | {english_stem} | english_stem | {fat}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | cat | {english_stem} | english_stem | {cat}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | sat | {english_stem} | english_stem | {sat}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | on | {english_stem} | english_stem | {}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | a | {english_stem} | english_stem | {}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | mat | {english_stem} | english_stem | {mat}
blank | Space symbols | | {} | |
blank | Space symbols | - | {} | |
asciiword | Word, all ASCII | it | {english_stem} | english_stem | {}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | ate | {english_stem} | english_stem | {ate}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | a | {english_stem} | english_stem | {}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | fat | {english_stem} | english_stem | {fat}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | rats | {english_stem} | english_stem | {rat}
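The following Python sketch (not part of PostgreSQL) makes the column semantics concrete. It interprets a few rows copied from the output above. It assumes the rows have already been fetched into (alias, token, dictionary, lexemes) tuples, with SQL NULL represented as None. The helper names classify and indexed_lexemes are hypothetical.

```python
from typing import List, Optional, Tuple

# (alias, token, dictionary, lexemes) for a few rows of the ts_debug output
# above; None stands for SQL NULL. Fetching the rows is assumed, not shown.
Row = Tuple[str, str, Optional[str], Optional[List[str]]]

ROWS: List[Row] = [
    ("asciiword", "a", "english_stem", []),
    ("blank", " ", None, None),
    ("asciiword", "fat", "english_stem", ["fat"]),
    ("blank", " ", None, None),
    ("asciiword", "rats", "english_stem", ["rat"]),
]


def classify(row: Row) -> str:
    """Interpret one row according to the ts_debug column descriptions."""
    _alias, _token, dictionary, lexemes = row
    if dictionary is None:
        # No dictionary recognized the token (blanks have no dictionaries).
        return "not indexed"
    if lexemes == []:
        # An empty array means the token was recognized as a stop word.
        return "stop word"
    return "indexed"


def indexed_lexemes(rows: List[Row]) -> List[str]:
    """Collect, in order, the lexemes produced for recognized non-stop words."""
    result: List[str] = []
    for row in rows:
        lexemes = row[3]
        if classify(row) == "indexed" and lexemes is not None:
            result.extend(lexemes)
    return result


print(indexed_lexemes(ROWS))  # ['fat', 'rat']
```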
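For a more extensive demonstration, we first create a public.english configuration and an Ispell dictionary for the English language:
CREATE TEXT SEARCH CONFIGURATION public.english ( COPY = pg_catalog.english );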
CREATE TEXT SEARCH DICTIONARY english_ispell (
TEMPLATE = ispell,
DictFile = english,
AffFile = english,
StopWords = english
);
ALTER TEXT SEARCH CONFIGURATION public.english
ALTER MAPPING FOR asciiword WITH english_ispell, english_stem;
SELECT * FROM ts_debug('public.english', 'The Brightest supernovaes');
alias | description | token | dictionaries | dictionary | lexemes
-----------+-----------------+-------------+-------------------------------+----------------+-------------
asciiword | Word, all ASCII | The | {english_ispell,english_stem} | english_ispell | {}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | Brightest | {english_ispell,english_stem} | english_ispell | {bright}
blank | Space symbols | | {} | |
In this example, the word Brightest was recognized by the parser as an ASCII word (alias asciiword). For this token type the dictionary list is english_ispell and english_stem. The word was recognized by english_ispell, which reduced it to the noun bright. The word supernovaes is unknown to the english_ispell dictionary, so it was passed to the next dictionary and, fortunately, was recognized (in fact, english_stem is a Snowball dictionary which recognizes everything; that is why it was placed at the end of the dictionary list).
The word The was recognized by the english_ispell dictionary as a stop word (Section 12.6.1) and will not be indexed. The spaces are discarded too, since the configuration provides no dictionaries at all for them.
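The lookup order described above can be modeled in a few lines of Python. This is a toy sketch under stated assumptions, not PostgreSQL's implementation. toy_ispell and toy_stem are hypothetical stand-ins for english_ispell and english_stem with tiny hard-coded behavior; toy_stem only strips a trailing "s" and is not the Snowball stemmer. What the sketch does reproduce is the rule that dictionaries are consulted in order. The first one that returns something other than NULL decides the result, and that includes an empty stop-word array.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Each toy dictionary returns lexemes, [] for a stop word, or None if unknown.
Lexize = Callable[[str], Optional[List[str]]]


def toy_ispell(token: str) -> Optional[List[str]]:
    """Hypothetical stand-in for english_ispell with a tiny fixed vocabulary."""
    word = token.lower()
    if word in {"the", "a"}:
        return []  # stop word
    return {"brightest": ["bright"]}.get(word)  # None when not in vocabulary


def toy_stem(token: str) -> Optional[List[str]]:
    """Hypothetical stand-in for english_stem: accepts every token.

    The 'stemming' only drops a trailing 's'; it is NOT the Snowball algorithm.
    """
    word = token.lower()
    return [word[:-1] if word.endswith("s") else word]


def lookup(token: str, chain: Sequence[Tuple[str, Lexize]]):
    """Consult dictionaries in order; the first non-None answer wins."""
    for name, lexize in chain:
        lexemes = lexize(token)
        if lexemes is not None:  # [] (stop word) also ends the search
            return name, lexemes
    return None, None  # no dictionary recognized the token


CHAIN = [("english_ispell", toy_ispell), ("english_stem", toy_stem)]

for tok in ("The", "Brightest", "supernovaes"):
    print(tok, lookup(tok, CHAIN))
```

Reversing CHAIN shows why a catch-all dictionary belongs at the end. Because toy_stem accepts every token, english_ispell would never be consulted, and "Brightest" would no longer be reduced to bright. An empty chain, like the one mapped for blanks, recognizes nothing.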
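You can reduce the width of the output by explicitly specifying which columns you want to see:
SELECT alias, token, dictionary, lexemes
FROM ts_debug('public.english', 'The Brightest supernovaes');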
The following functions allow direct testing of a text search parser.
ts_parse(parser_name text, document text,
         OUT tokid integer, OUT token text) returns setof record
ts_parse(parser_oid oid, document text,
         OUT tokid integer, OUT token text) returns setof record
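ts_parse parses the given document and returns a series of records, one for each token produced by parsing. Each record includes a tokid showing the assigned token type and a token, which is the text of the token. For example:
SELECT * FROM ts_parse('default', '123 - a number');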
tokid | token
-------+--------
22 | 123
12 |
12 | -
1 | a
12 |
1 | number
ts_token_type(parser_name text, OUT tokid integer,
              OUT alias text, OUT description text) returns setof record
ts_token_type(parser_oid oid, OUT tokid integer,
              OUT alias text, OUT description text) returns setof record
ts_token_type returns a table which describes each type of token the specified parser can recognize. For each token type, the table gives the integer tokid that the parser uses to label a token of that type, the alias that names the token type in configuration commands, and a short description. For example:
SELECT * FROM ts_token_type('default');
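ts_parse reports only the numeric tokid, so it is often handy to pair its output with the aliases from ts_token_type. The Python sketch below does that pairing on the ts_parse('default', '123 - a number') output shown earlier. The TOKEN_TYPES mapping is an assumption: it is an excerpt covering only the three ids that appear in that output, so confirm it against ts_token_type('default') on your own server. The exact whitespace in the blank tokens is likewise assumed, since the displayed output does not show it.

```python
from typing import Dict, List, Tuple

# (tokid, token) rows from ts_parse('default', '123 - a number') above.
# Token text is copied as displayed; the whitespace in blanks is assumed.
PARSED: List[Tuple[int, str]] = [
    (22, "123"), (12, " "), (12, "-"), (1, "a"), (12, " "), (1, "number"),
]

# ASSUMPTION: excerpt of ts_token_type('default') covering only the ids above.
TOKEN_TYPES: Dict[int, str] = {1: "asciiword", 12: "blank", 22: "uint"}


def label_tokens(parsed: List[Tuple[int, str]],
                 token_types: Dict[int, str]) -> List[Tuple[str, str]]:
    """Replace each numeric tokid with its alias ('?' if the id is unknown)."""
    return [(token_types.get(tokid, "?"), token) for tokid, token in parsed]


for alias, token in label_tokens(PARSED, TOKEN_TYPES):
    print(f"{alias:10} {token!r}")
```

In SQL, the same pairing is a join on tokid, for example SELECT t.alias, p.token FROM ts_parse('default', '123 - a number') AS p JOIN ts_token_type('default') AS t ON p.tokid = t.tokid; (add an ORDER BY if row order matters).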
The ts_lexize function facilitates dictionary testing.
ts_lexize(dict regdictionary, token text) returns text[]
ts_lexize returns an array of lexemes if the input token is known to the dictionary, an empty array if the token is known to the dictionary but is a stop word, or NULL if it is an unknown word.
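When ts_lexize is called from a client program, the three outcomes usually arrive as a list, an empty list and a null value. The Python sketch below assumes a driver that maps text[] to a list and NULL to None. psycopg2 does this by default, but treat that as an assumption for your driver. The function name interpret_lexize is hypothetical. The order of the checks matters: an empty list and None are both falsy in Python, so testing `is None` first keeps stop words and unknown words apart.

```python
from typing import List, Optional


def interpret_lexize(result: Optional[List[str]]) -> str:
    """Translate a ts_lexize result into its documented meaning."""
    # NULL (None): the dictionary does not know the token at all.
    if result is None:
        return "unknown word"
    # Empty array: the token is known but is a stop word.
    if not result:
        return "stop word"
    # Otherwise: the lexemes the dictionary produced.
    return "lexemes: " + ", ".join(result)


# One value for each of the three documented outcomes.
for r in (["star"], [], None):
    print(interpret_lexize(r))
```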
Examples:
SELECT ts_lexize('english_stem', 'stars');
ts_lexize
-----------
{star}
SELECT ts_lexize('english_stem', 'a');
ts_lexize
-----------
{}
SELECT ts_lexize('thesaurus_astro','supernovae stars') is null;
?column?
----------
t
The thesaurus dictionary thesaurus_astro does know the phrase supernovae stars, but ts_lexize fails because it does not parse the input text; it treats the whole string as a single token. Use plainto_tsquery or to_tsvector to test thesaurus dictionaries, for example: