Exact Values Versus Full Text
Exact values are exactly what they sound like. Examples are a date or a
user ID, but they can also include exact strings such as a username or an
email address. The exact value "Foo" is not the same as the exact value
"foo". The exact value 2014 is not the same as the exact value 2014-09-15.
Full text, on the other hand, refers to textual data — usually written in
some human language — like the text of a tweet or the body of an email.
Full text is often referred to as "unstructured data", which is a misnomer:
natural language is highly structured. The problem is that the rules of
natural languages are complex, which makes them difficult for computers to
parse correctly. For instance, consider this sentence:
May is fun but June bores me.
Does it refer to months or to people?
Exact values are easy to query. The decision is binary — a value either
matches the query, or it doesn’t. This kind of query is easy to express with
SQL:
    WHERE name = 'John Smith'
    AND date > '2014-09-15'
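To make the binary nature of exact matching concrete, here is a minimal Python sketch of the same filter. The records and their field names are made up for illustration; only the two conditions come from the SQL fragment above.

```python
from datetime import date

# Hypothetical records, invented for this example.
records = [
    {"name": "John Smith", "date": date(2014, 9, 20)},
    {"name": "john smith", "date": date(2014, 9, 20)},  # differs only in case
    {"name": "John Smith", "date": date(2014, 9, 1)},   # date is too early
]


def matches(record):
    """Binary decision: the record satisfies both conditions or it does not."""
    return record["name"] == "John Smith" and record["date"] > date(2014, 9, 15)


# Only the first record passes both tests.
hits = [r for r in records if matches(r)]
print(hits)
```

The second record is rejected purely because of letter case: to an exact comparison, "john smith" and "John Smith" are different values.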
Querying full text data is much more subtle. We are not just asking "Does
this document match the query?" but "How well does this document match the
query?" In other words, how relevant is this document to the given query?
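The difference between "does it match" and "how well does it match" can be sketched with a toy scoring function. The score used here (the fraction of query terms found in the document) is an assumption chosen for simplicity, not the way any particular search engine computes relevance, and the sample documents are invented.

```python
def relevance(query, document):
    """Toy score: fraction of distinct query terms that occur in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


# Hypothetical documents, invented for this example.
docs = [
    "the quick brown fox",
    "a quick brown dog",
    "a lazy cat",
]

# Rank documents from most to least relevant instead of filtering yes/no.
ranked = sorted(docs, key=lambda d: relevance("quick fox", d), reverse=True)
for doc in ranked:
    print(relevance("quick fox", doc), doc)
```

Unlike the exact-value case, the result is an ordering rather than a yes/no answer: a document that contains only some of the query terms still appears, just lower down.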
We seldom want to match the whole full text field exactly. Instead, we want
to search within text fields. Not only that, but we expect search to
understand our intent:
- A search for "jump" should also match "jumped", "jumps", "jumping", and
  perhaps even "leap".
- "fox news hunting" should return stories about hunting on Fox News, while
  "fox hunting news" should return news stories about fox hunting.
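Two of the expectations above can be illustrated with a rough Python sketch. Both helpers are hypothetical and deliberately crude: `normalize` strips a few English suffixes so that inflected forms collapse to one term, and `bigrams` keeps adjacent word pairs so that word order is not lost. Real analyzers are far more careful than this.

```python
def normalize(word):
    """Lowercase a word and strip a few common English suffixes (crude)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bigrams(text):
    """Adjacent word pairs, which preserve the order a bag of words loses."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


# All four forms collapse to the same term.
forms = ["jump", "jumped", "jumps", "jumping"]
print({normalize(w) for w in forms})

# Same words, different order: only the word pairs tell the queries apart.
a = "fox news hunting"
b = "fox hunting news"
print(sorted(a.split()) == sorted(b.split()))
print(bigrams(a) == bigrams(b))
```

Suffix stripping of this kind also produces mistakes (it would turn "news" into "new"), which is one reason understanding intent is harder than exact matching.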